i’ve alluded to it several times, but for the past two weeks i’ve been working on a blogroll plugin for wordpress that sorts blogs from the most recently updated to the least recently updated.
sounds simple enough, right?
i started with a world-accessible web-interface for adding blogs. once i got that working, i realized it’d be great if it could auto-discover the feed url and blog name from the blog url. so i wrote some code to do that with an html parser i found. i figured out some things about RSS/Atom feeds so i could get the last modified date (not easy!) and last post title for each blog, using a pretty good rss parser. then i had to figure out how to use php on the command line so i could have a cron job call a method to poll all the feeds in the blogroll. fairly simple, but something i hadn’t done before.
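for the curious, the auto-discovery step can be sketched roughly like this. this is not the plugin's code: it swaps in php's built-in DOMDocument for the html parser i actually used, the function names `discover_feed` and `resolve_url` are made up for illustration, and the relative-url handling is deliberately naive (it ignores ports and `<base>` tags).

```php
<?php
// hypothetical helper: turn a possibly-relative href into an absolute url.
// naive on purpose: ignores ports, query strings and <base> tags.
function resolve_url($href, $base_url)
{
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    $parts = parse_url($base_url);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    if (substr($href, 0, 1) === '/') {
        return $root . $href;
    }
    return rtrim($base_url, '/') . '/' . $href;
}

// hypothetical helper: pull the page title and the first rss/atom
// <link rel="alternate"> out of a blog's html.
function discover_feed($html, $base_url)
{
    $result = array('title' => null, 'feed_url' => null);
    if (trim($html) === '') {
        return $result;
    }

    // real-world html is rarely valid, so collect parser warnings quietly
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $result['title'] = trim($titles->item(0)->textContent);
    }

    $feed_types = array('application/rss+xml', 'application/atom+xml');
    foreach ($doc->getElementsByTagName('link') as $link) {
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        $type = strtolower(trim($link->getAttribute('type')));
        if (in_array('alternate', $rels) && in_array($type, $feed_types)) {
            $result['feed_url'] = resolve_url($link->getAttribute('href'), $base_url);
            break;
        }
    }
    return $result;
}
```

blogs that don't advertise a feed in their `<head>` come back with a null `feed_url`, so the feed url still has to be entered by hand for those.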
i installed wordpress to help in transforming what i had done into a plugin, and i discovered that i needed another plugin that would evaluate php code in the body of a blog post.
that got me to what i called v0.1, a custom proof of concept which is currently running quite nicely over at OrangePolitics.org (check it out). the motivation for doing this in the first place was my desire to create a tool to support/reflect a local community of bloggers, which grew out of some discussions on orangepolitics sparked by anton’s upcoming conference on blogging and community. this may not be that tool, but that’s another blog post…
since then, i’ve gone over v0.1 with a finer-toothed comb in order to get the code to a place where i’d be willing to let anyone else see it. it’s designed to be highly configurable: there are 24 different settings that affect how it looks and works.
however, i have to say setting it up is not for the faint of heart. so far it has only been tested on my clean wordpress install and ruby’s reasonably-modded orangepolitics. it requires a number of different components working together to achieve the intended effect. of course if anyone does install it, i’d love to hear reactions, feedback, and questions.
there are some more detailed instructions at the beginning of dynamic_blogroll.php in the archive above, but you should feel comfortable with the following before attempting to install this plugin:
- extracting the files (dynamic_blogroll.php, poll_feeds.php, run_php.php) from the archive above into wp-content/plugins
- downloading MagpieRSS and HTML Parser and extracting them in wp-content/plugins
- editing settings in dynamic_blogroll.php and poll_feeds.php (see instructions in dynamic_blogroll.php)
- activating Run PHP and Dynamic Blogroll in wordpress
- adding the following to a wordpress post:
<phpcode> $dynamic_blogroll = new DynamicBlogroll(); $dynamic_blogroll->run(); </phpcode>
- setting up a cron job to run poll_feeds.php every hour
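for the cron step, here's a small sketch of the kind of guard a cron-driven script can start with, so that it only does its work when run from the command line and not when someone requests the file through the web server. the function name `running_from_command_line` is made up, the crontab line in the comment uses placeholder paths, and i'm not claiming poll_feeds.php does exactly this. note that some hosts run a cgi build of php from cron, which reports a different sapi name, so treat the check as an assumption about your setup.

```php
<?php
// example crontab entry (placeholder paths, adjust for your server):
//   0 * * * * /usr/bin/php /path/to/wp-content/plugins/poll_feeds.php

// hypothetical guard: true only when php was started from the command line
function running_from_command_line($sapi = PHP_SAPI)
{
    return $sapi === 'cli';
}

if (!running_from_command_line()) {
    // refuse to poll when the file is requested through the web server
    exit("this script is meant to be run from cron\n");
}
```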
be aware that other plugins may conflict with dynamic blogroll, in particular v0.9 of Scott Reilly’s auto-hyperlink urls plugin. however i found that it plays well with the latest version (v2.01).
Wow, I got dropped from the friendblogs list. Guess I should post once in a while.
ha, i resort the list every so often. you’re actually still there (and still a friend), just commented out. hope you’re doing well.
i haven’t actually looked at the script and am completely unfamiliar with the parsers you’re using, but wouldn’t it be easier for the script to automatically check when the last poll feeds check was executed each time the run() method was called, eliminating the need to set a cron job? more overhead for the php execution, but it would make installation one step simpler.
also, could you wrap the parsers into the plugin so that they’re bundled with the blogroll plugin and install themselves automatically when it detects they aren’t already installed? or maybe just bundle the parsers with your plugins and have the blogroll plugin install and activate the parsers. i guess this only really saves a few clicks, but i hate complex installation procedures.
so are you going to transition to wordpress, or is this all academic?
ryan, i like the idea of having the run method check to see whether the feeds need updating. (damn, wish i had thought of that!)
however, the overhead (in terms of time) to poll all the feeds is quite great, which would slow down the page load significantly for that one user who happened to trigger the update.
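the check itself is cheap, at least. a minimal sketch, assuming a timestamp file that gets touched after every poll (the file and the function names `feeds_need_polling` and `mark_feeds_polled` are hypothetical, not part of the plugin):

```php
<?php
// hypothetical: true if the feeds have never been polled, or if the
// last poll happened at least $max_age_seconds ago.
function feeds_need_polling($stamp_file, $max_age_seconds, $now = null)
{
    if ($now === null) {
        $now = time();
    }
    // php caches stat results, so clear them before looking at the file
    clearstatcache();
    if (!file_exists($stamp_file)) {
        return true;
    }
    return ($now - filemtime($stamp_file)) >= $max_age_seconds;
}

// hypothetical: record that a poll just finished
function mark_feeds_polled($stamp_file, $now = null)
{
    touch($stamp_file, $now === null ? time() : $now);
}
```

run() would call `feeds_need_polling()` first and only poll (then call `mark_feeds_polled()`) when it returns true. the slow part is still the polling itself, which this does nothing to speed up.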
btw, do you know if there is a way in php to say “go run this function” but continue executing the code that follows? i think that would be akin to spawning a separate thread?
ha, actually i was thinking about writing my own html parser partly for fun (to figure out how), but mostly cause i didn’t want to deal with the implications of the various licensing issues. magpie rss is GPL, the html parser is apache. so for this first try i just decided to duck the issue. i’ll sort it out for the next release.
lastly, wordpress is just academic at this time.
if you wanted to run the function but didn’t care about using whatever it does, you could just stick it at the bottom of the script after you’ve closed out the html. i don’t know for sure, but i have this feeling that PHP doesn’t do threading. no idea actually.
yeah, i was thinking that, but being that this is a plugin, i have little control over when execution occurs outside the plugin… i’ll do some testing and see what happens.
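one way to try the "run it at the very end" idea from inside a plugin is php's `register_shutdown_function`, which runs a callback after the rest of the script has finished. a rough sketch, where the job queue and the names `defer_job` and `run_deferred_jobs` are made up for illustration. this is not threading, and depending on the server the browser may still show the page as loading until the deferred work is done.

```php
<?php
// hypothetical queue of slow jobs to run once the page has been generated
$GLOBALS['deferred_jobs'] = array();

function defer_job($callback)
{
    $GLOBALS['deferred_jobs'][] = $callback;
}

// runs every queued job once and returns how many were run
function run_deferred_jobs()
{
    // keep going even if the visitor closes the connection
    ignore_user_abort(true);
    // ask php to push pending output towards the web server first
    flush();

    $ran = 0;
    while ($job = array_shift($GLOBALS['deferred_jobs'])) {
        call_user_func($job);
        $ran++;
    }
    return $ran;
}

// php calls this after the main script (and the page output) is done
register_shutdown_function('run_deferred_jobs');
```

the plugin would call something like `defer_job('poll_feeds')` during run(), with `poll_feeds` standing in for whatever function does the polling.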
well, found this – not sure if it’s any help:
i suspect it has the same problem with execution time.
i also found this Process Control Functions. note the caveat, though: “Process Control should not be enabled within a webserver environment and unexpected results may happen if any Process Control functions are used within a webserver environment”
yeah, i just checked and my server hasn’t been compiled with pcntl
yeah that’s way too long to wait, i would have cancelled the request and found something else to look at.
i like the idea though, maybe i’ll take a look at the code so i have a better idea of what’s going on. surely it can work faster…
i don’t understand why it takes so long – does it just take forever to get the rss feed? (dynamic_blogroll.php line 694 “$feed = @fetch_rss($url);”)
ok, one last question (for now…)
wordpress is pre-configured to ping ping-o-matic when a post is published, and you can set links in a blogroll to organize based on most recently posted. this is all based on the rpc thing (which i’m not very familiar with). anyway, it would be nice to integrate this into what you’re doing (essentially the same thing but using RSS) so that pages that aren’t syndicated will be handled the same as those that are, only without including the post title. the other benefit of doing this is that wordpress does this very quickly, making me think that if you started by checking to see which blogs had been updated using the ping thing, you could then only get the RSS feed for those sites, reducing the time needed to execute the poll_feeds function.
the time it takes is based on how long it takes for the http request for the rss feed to be fulfilled.
so for instance, paul jones’ blog (and feed) on ibiblio which takes forever to load, slows the whole process down.
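one way to keep a single slow feed from dragging everything down is to put a timeout on each request and skip the feeds that don't answer in time. a minimal sketch using php's stream contexts instead of the rss parser's own fetching (the function names `fetch_feed_body` and `fetch_all_feeds` are hypothetical, and fetching http urls this way assumes `allow_url_fopen` is enabled on the server):

```php
<?php
// hypothetical helper: fetch one feed, giving up when a read stalls for
// longer than $timeout_seconds. returns the body, or null on failure.
function fetch_feed_body($url, $timeout_seconds)
{
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeout_seconds),
    ));
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// poll every feed, skipping the ones that fail so the rest still update
function fetch_all_feeds($urls, $timeout_seconds)
{
    $bodies = array();
    foreach ($urls as $url) {
        $body = fetch_feed_body($url, $timeout_seconds);
        if ($body !== null) {
            $bodies[$url] = $body;
        }
    }
    return $bodies;
}
```

a feed that gets skipped just keeps its old last-updated date until the next poll, and a stalled server then costs roughly the timeout instead of however long it feels like taking.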
so i’ve been “resorted” to the special invisible place. cool.
Glad Ryan finally has someone to talk to about this stuff. Good luck…
yeah, the performance hit is pretty gruesome, even if it only happens to one user every hour.
I just released a plugin for the exact same purpose – LinkedList. Do check it out.