5 years ago this month, i answered an email that came across a listserv i was on as a part of my residential computing consultant job. i wish i still had it. a project at the carolina population center was looking for someone to work on a microsoft access database. i was looking for a summer job.
i answered the email, as i was in the habit of doing at that point in my life (i was working several campus jobs concurrently). i knew microsoft word, powerpoint, and excel, so why not access? after interviewing, i was offered the position, to replace someone who was leaving in a month for the peace corps. i was going to be responsible for a financial database that would track $47 million.
later i would go on to build a health indicators database that would send me to kazakhstan, cambodia, tanzania, and ghana. somehow in the midst of all this i managed to graduate from unc with a bachelor’s degree in linguistics and a master of science in information science. for my master’s project i redeveloped the access-based financial system for the web using php and mysql. then i wrote a sixty-six page paper describing what i’d done.
over the last year i’ve been adjusting to life on the outside. i worked on a few other projects, i traveled to ghana in november, but otherwise i’ve worked pretty consistently on developing and extending the web-based financial system. until now. april 2005 is my last month at measure evaluation.
i’ve accepted a position as a senior web producer with o’reilly media, inc. starting in may.
i worry that, contrary to popular belief, the computer actually inhibits flow. maybe it’s not the computer per se, but its capacity to offer almost limitless distraction, exacerbated by my desire for the greatest amount of stimulation with the least amount of effort.
in the early history of the computer, j.c.r. licklider predicted that computers would become like an extension of our cognitive faculties, doing all of the things people do poorly or slowly (organization, memory, arithmetic) so that we could all be creators (and not get bogged down in the adding, retyping, filing, etc). however, this seems to brush aside the fact that routine and predictable tasks can be an aid to creative thought, distracting the mind’s frenetic central executive so that the big iron can do some heavy lifting.
because the computer is essentially a passive creature, i spend most of my time directing it to do the tedious, and then kind of stare, slack-jawed, as it sits there waiting for me to be creative so i can tell it what to do next. all too often nothing occurs to me because i expect a sort of give and take, a back and forth in the style of human interaction, but i get nothing that i don’t initiate.
so i check my email. or bloglines. or my blog. and repeat. because those internetty things (courtesy of there being a human at the other end) respond with information that’s novel and thus satisfying. while the computer just sits there, waiting for me to tell it what to do.
Update: This post is not intended for the faint of heart. It describes in some detail what it took for me to move my Blogger blog (circa March 2005) to WordPress 1.5. Much of it is obsolete, as WordPress 2.0 has a much simpler, more user-friendly process for importing posts and comments from Blogger. If you’re interested in moving your Haloscan comments to WordPress 2.x along with your Blogger blog, please see my “new and improved” version of this post, Importing Haloscan comments into WordPress from Blogger.
when i moved from blogger to wordpress 1.5 (a few weeks ago) the process proved to be a little less than painless. partly because the following two requirements of mine were somewhat outside the scope of the provided blogger import script.
i wanted the existing blogger permalink post urls (based on the year, month, and post title) to be the same in wordpress (ignoring the .html difference which could be solved with mod_rewrite). Update: I’ve written a new post with information on maintaining your permalinks when moving from Blogger to WordPress. (17-Oct-2006)
as a result, what you’re reading below is a partial simplification of the 3 tries it took before i was able to successfully import everything. knowing the following sql:
RENAME TABLE tbl_name TO new_tbl_name;
SHOW CREATE TABLE tbl_name;
was extremely useful when i needed to rename a table with munged data and then recreate it without reinstalling wordpress.
haloscan preparations
i had to make the following two changes to import-blogger.php (based on code written by ravingmadness) to ensure that the unique id blogger assigns to every post got included in the content of the blog post as an html comment. since haloscan uses this id to associate comments with each post, the number is crucial in maintaining that link when importing the comments later.
uncomment line 61: $post_number = $postinfo[3];
change line 127 to this: $post_content = addslashes("<!--" . $post_number . "-->" . $post_content);
it occurs to me now that it probably would have been easier to include the post id as a comment in the simplified blogger template below, rather than having to modify import-blogger.php. for an example of this, see my post importing haloscan comments into wordpress.
standard blogger import
after running the blogger import script (import-blogger.php) for the first time, it instructed me to replace the blogger template with the following:
then change a few other settings in blogger, republish the whole blog to the root wordpress directory, and finally start the import process.
maintaining the post page filenames from blogger
Update: I’ve written an additional post entitled Maintain permalinks moving from Blogger to WordPress that enables you to automatically import your Blogger permalinks (aka the “post slug”) into WordPress 2.0 along with your posts and comments. (17-Oct-2006)
in order to maintain the permalink urls for each post created by blogger, i had to make sure the “post slug” (aka dirified post title) in wordpress matched the filename of the post page created by blogger. unfortunately, blogger’s dirify algorithm is different than wordpress’, leaving out articles (e.g. “a”, “the”) and truncating the title length.
first i fixed several imported posts that didn’t have titles at all (oops). blogger still dirified them (based on the first few words of the content), but wordpress did not. luckily there were only three, which i fixed by hand:
ran the following query: SELECT LEFT(post_content,80) FROM wp_posts WHERE post_title = '';
based on the content, looked up the posts in blogger to figure out how it had dirified them
updated the post slug and title in wordpress manually
next i had to get the actual permalink urls out of blogger and into a format that i could use to compare with the newly imported posts in wordpress. i created the following template, with a tab between the two tags to get a list of the titles and their dirified permalinks.
i concatenated all the new monthly archive files blogger produced:
cat *_wordpress.php > blogger_post_titles.txt
then ran some regular expressions to remove blank lines, strip .html from the ends of the permalinks, and strip the hostname/path from the beginning. at this point i had a nice clean tab-separated file of post titles and slugs to import into mysql.
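i didn’t keep the exact expressions, so here is only a rough sketch of that cleanup, not what i actually ran. it assumes each line is a title, a tab, and a full permalink url, and the function name clean_blogger_titles is made up for illustration.

```shell
# hypothetical helper: reads "title<TAB>permalink" lines on stdin and
# writes "title<TAB>slug" lines on stdout
clean_blogger_titles() {
  awk -F '\t' -v OFS='\t' '
    NF < 2 { next }                # skip blank lines and lines without a tab
    {
      slug = $2
      sub(/[ \t\r]+$/, "", slug)   # trim trailing whitespace
      sub(/\.html$/, "", slug)     # drop the .html extension
      sub(/^.*\//, "", slug)       # drop the hostname and /yyyy/mm/ path
      print $1, slug
    }
  '
}

# example usage (file names are assumptions):
#   clean_blogger_titles < raw_titles.txt > blogger_post_titles.txt
```

using awk rather than a chain of sed expressions keeps the tab handling in one place, but any tool that can do the three substitutions would work.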
i created a table in the wordpress database to store the (thankfully unique) post titles and slugs.
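something like the following would do. this is a sketch rather than the exact statement i used: the table name has to match the text file’s name (mysqlimport derives the table from the filename) and the column names come from the update query further down, but the column sizes are assumptions.

```sql
-- sketch only: column sizes are guesses
-- column order matches the tab-separated file (title first, then slug)
CREATE TABLE blogger_post_titles (
  post_title VARCHAR(255) NOT NULL,
  post_slug  VARCHAR(200) NOT NULL
);
```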
i used the mysqlimport utility to import the tab-separated text file into the database table.
mysqlimport -p wp blogger_post_titles.txt
the following query updated the post slugs in wordpress to be equal to the filenames i’d extracted from blogger.
-- inner join, so posts without a matching title keep their existing slug
UPDATE wp_posts
INNER JOIN blogger_post_titles ON wp_posts.post_title = blogger_post_titles.post_title
SET wp_posts.post_name = blogger_post_titles.post_slug;
rewrite rule to match blogger urls with wordpress permalinks
now i had to write a rewrite rule that would catch requests for the old post page urls with .html extensions (e.g. /2004/02/hello-world.html) and redirect them to urls without (e.g. /2004/02/hello-world/). figuring this out took much longer than i expected, mostly because i was trying to accomplish it with an apache redirect directive. it turned out to interact badly with the rewrite rules created by wordpress. so i ended up writing a rewrite rule instead (outside the block of rules generated by wordpress).
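a rule along these lines does the job. treat it as a sketch and not necessarily the exact rule i ended up with: it assumes mod_rewrite is enabled, the blog lives at the site root, the rule sits in the root .htaccess above the wordpress block, and the old urls all follow the /yyyy/mm/post-slug.html pattern.

```apache
# sketch: send old blogger post urls to their wordpress equivalents
# (place this above the block of rules generated by wordpress)
RewriteEngine On
RewriteRule ^([0-9]{4})/([0-9]{2})/([^/]+)\.html$ /$1/$2/$3/ [R=301,L]
```

the R=301 flag tells browsers and search engines the move is permanent, and L stops processing so the wordpress rules never see the old url.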
this was hard. haloscan allows comment exporting using the caif format, so i started by hacking up phil ringnalda’s caif2mt.php code. i got this almost all the way there when i discovered that haloscan was only exporting the raw text of the comments with none of the html tags. what a waste. i sent a support email in, and still haven’t heard back.
ravingmadness’ previously-cited post got the comment data by requesting each individual comment webpage from haloscan, scraping the page with some regular expressions, and then inserting the appropriate fields into the wordpress database. this is just what i had hoped to avoid by using caif.
unfortunately the code was a little hard to read and the regular expressions depended on a typical haloscan template layout—mine had been significantly modified. i wanted to make things a little easier on myself, so i added the raw haloscan data in an html comment beneath each actual comment, but before {HSCommentEnd}. this allowed me to write and debug the import script without having to turn off the ability to leave comments.
note: i’ve never collected commenters’ email addresses, so i didn’t bother to collect them here.
update: it occurred to me that the ability to change the haloscan template requires that you’ve donated some amount of money to become a premium customer. since i imagine few people have taken that step, you may be better off using ravingmadness’ haloscan import script which uses regular expressions to extract the comment data based on the standard haloscan template.
update: ravingmadness’s haloscan import script is out of date and probably won’t work for you unless you’re comfortable hacking and debugging php. i hope to write a new and improved import-haloscan script for wordpress at some point in the future. (13-jun-2005)
update: i’ve written a script that imports haloscan comments directly from the exported CAIF files into wordpress. see importing haloscan comments into wordpress for more information. as a result you can safely disregard most of the information in this post about importing comments. (23-jun-2005)
then i wrote a script that loops through each imported wordpress post, gets the wordpress post ID and the old blogger post id from within the html comment in post_content. then it fetches the appropriate haloscan url using the unique blogger post id, parses the html for the commented out comment content, and inserts the comment fields into wp_comments. though i don’t recommend this script for public consumption it might help someone out there: import-haloscan.phps.
before using it, i’d recommend commenting out line 110 mysql_query($sql); (which inserts the comment into the wordpress wp_comments table) to make sure the code is working for you.
there really isn’t enough time in the day for me to do all the things i want to do. that’s why this blog is called justinsomnia. because all too often i find myself squeezing just another hour out of the day (at the expense of getting 8ish hours of sleep) to read blogs or write for my own.
back to work today after 6 days away. dove headlong into the email that had built up while i was in san diego. pine, it turns out, has become something of a liability when traveling. it’s my preferred email client, but due to the nature of the ssh connection i was using it over, checking my unc email using pine remotely is almost impossible. and i hesitate to use a local mail client like thunderbird because of the security issues of sending my password over the wireless network in plain text in a room full of some of the brightest minds in hackerdom. does anyone know a simple way to use an email client like thunderbird more securely? and unc’s webmail doesn’t filter out the spam like i’ve configured pine to do, though there’s probably a way to do that as well. suffice it to say, i let some emails lie fallow.
the virtues of idleness and getting things done have been fighting it out in my mind lately. i just want to be less distracted. and more focused. so i dove headlong into work today, hacking away at code even though i felt nowhere near flow. i think flow is overrated. i mean, i love feeling totally in the groove of something i’m doing, but usually it’s to the detriment of everything else.
i caught up with alice for dinner and heard all about her news, which is cool. chatted with the parents on the way over to jane’s apartment. distracted jane from her homework which was distracting her from her masters paper. now i’m home. and it’s after midnight.
i have a wordpress feature request. i wish assigning posts to categories was more like these folksonomy tagging systems that are all the rage (e.g. del.icio.us, technorati, flickr). i don’t want to figure out the categories in advance. i just want to categorize them on the fly right now, and sort things out later.
larry lessig riled up the crowd with a powerpoint duet (where the powerpoint presentation automatically responded to/emphasized whatever he was saying). it was pretty effective, but almost too theatrical to take seriously. i kept wondering whether the presentation might keep going ahead all milli vanilli-style if he got tripped up. maybe he had some well practiced graduate assistant keeping things in sync. oh, and he showed some funny popcultural mashups. note to self: any successful powerpoint presentation must include video.
chris anderson talked about the long tail which everyone will try to apply to every phenomenon from now on. if “long tail” isn’t a term on the yahoo buzz game, it should be. if every conceivable domain name with “long tail” in it hasn’t been snatched up i would be surprised. if any business plan doesn’t mention how it will be milking the long tail (using those words), it should be aborted asap.
evan williams of blogger fame demoed a dev version of odeo. imagine a recording studio (albeit greatly simplified) in your web browser. record audio through the mic. mix it with other audio clips (recorded or uploaded). publish it. very cool.
and then it was done.
later that night paul, patrick, and i trekked out to the yardhouse to celebrate a successful etech and a happy st. patrick’s day. good conversation was had and excellent beer was consumed, some of it in glasses half a yard tall. are there pictures? hell yeah.