Moving from Blogger to WordPress 1.5

Update: This post is not intended for the faint of heart. It describes in some detail what it took for me to move my Blogger blog (circa March 2005) to WordPress 1.5. Much of it is obsolete, as WordPress 2.0 has a much simpler, more user-friendly process for importing posts and comments from Blogger. If you’re interested in moving your Haloscan comments to WordPress 2.x along with your Blogger blog, please see my “new and improved” version of this post, Importing Haloscan comments into WordPress from Blogger.

when i moved from blogger to wordpress 1.5 (a few weeks ago) the process proved to be a little less than painless, partly because the following two requirements of mine were somewhat outside the scope of the provided blogger import script.

  1. i wanted the existing blogger permalink post urls (based on the year, month, and post title) to be the same in wordpress (ignoring the .html difference which could be solved with mod_rewrite). Update: I’ve written a new post with information on maintaining your permalinks when moving from Blogger to WordPress. (17-Oct-2006)
  2. i wanted to import my haloscan comments. update: i’ve written a new post with information on importing haloscan comments into wordpress. (23-Jun-2005)

as a result, what you’re reading below is a partial simplification of the 3 tries it took before i was able to successfully import everything. knowing the following sql:

RENAME TABLE tbl_name TO new_tbl_name;

SHOW CREATE TABLE tbl_name;

was extremely useful when i needed to rename a table with munged data and then recreate it without reinstalling wordpress.

haloscan preparations

i had to make the following two changes to import-blogger.php (based on code written by ravingmadness) to ensure that the unique id blogger assigns to every post got included in the content of the blog post as an html comment. since haloscan uses this id to associate comments with each post, the number is crucial in maintaining that link when importing the comments later.

uncomment line 61:
$post_number = $postinfo[3];

change line 127 to this:
$post_content = addslashes("<!--" . $post_number . "-->" . $post_content);

it occurs to me now that it probably would have been easier to include the post id as a comment in the simplified blogger template below, rather than having to modify import-blogger.php. for an example of this, see my post importing haloscan comments into wordpress.

standard blogger import

after running the blogger import script (import-blogger.php) for the first time, it instructed me to replace the blogger template with the following:

<blogger><wordpresspost><$BlogItemDateTime$>|||<$BlogItemAuthorNickname$>|||<$BlogItemBody$>|||<$BlogItemNumber$>|||<$BlogItemSubject$></wordpresspost></blogger>

then change a few other settings in blogger, republish the whole blog to the root wordpress directory, and finally start the import process.
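to make the template's output format concrete, here is a minimal python sketch (not the code the import script uses) of how one republished page could be split back into fields. the field order follows the template above, which is also why the post number ends up at index 3, matching $postinfo[3]. the function name, the sample date format, and the sample values are made up for illustration.

```python
import re

# matches one post block produced by the import template above
POST_RE = re.compile(r"<wordpresspost>(.*?)</wordpresspost>", re.DOTALL)


def parse_posts(html):
    """Return a list of dicts, one per <wordpresspost> block (hypothetical helper)."""
    posts = []
    for raw in POST_RE.findall(html):
        # field order follows the template: date, author, body, number, subject
        fields = raw.split("|||")
        if len(fields) != 5:
            continue  # skip malformed blocks, e.g. a body that itself contains "|||"
        date, author, body, number, subject = (f.strip() for f in fields)
        posts.append({
            "date": date,
            "author": author,
            "body": body,
            "number": number,
            "subject": subject,
        })
    return posts


# invented sample data; the real date format depends on the blogger settings
sample = (
    "<wordpresspost>03/01/2005 10:15:00 AM|||someone|||<p>hello</p>"
    "|||110967|||hello world</wordpresspost>"
)
posts = parse_posts(sample)
```

a post body that happens to contain "|||" would break the split, which is why the sketch drops blocks that don't have exactly five fields rather than guessing.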

maintaining the post page filenames from blogger

Update: I’ve written an additional post entitled Maintain permalinks moving from Blogger to WordPress that enables you to automatically import your Blogger permalinks (aka the “post slug”) into WordPress 2.0 along with your posts and comments. (17-Oct-2006)

in order to maintain the permalink urls for each post created by blogger, i had to make sure the “post slug” (aka dirified post title) in wordpress matched the filename of the post page created by blogger. unfortunately, blogger’s dirify algorithm is different than wordpress’, leaving out articles (e.g. “a”, “the”) and truncating the title length.

first i fixed several imported posts that didn’t have titles at all (oops). blogger still dirified them (based on the first few words of the content), but wordpress did not. luckily there were only three, which i fixed by hand:

  1. ran the following query:
    SELECT LEFT(post_content,80) FROM wp_posts WHERE post_title = '';
  2. based on the content, looked up the posts in blogger to figure out how it had dirified them
  3. updated the post slug and title in wordpress manually

next i had to get the actual permalink urls out of blogger and into a format that i could use to compare with the newly imported posts in wordpress. i created the following template, with a tab between the two tags to get a list of the titles and their dirified permalinks.

<blogger>
<$BlogItemTitle$>	<$BlogItemPermalinkURL$>
</blogger>

i concatenated all the new monthly archive files blogger produced:

cat *_wordpress.php > blogger_post_titles.txt

then ran some regular expressions to remove the blank lines, strip .html from the ends of the permalinks, and strip the hostname/path from the beginning. at this point i had a nice clean tab-separated file of post titles and slugs to import into mysql.
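the cleanup step can be sketched in python roughly as follows. this is not the set of regular expressions actually used, just an illustration that assumes each useful line looks like "title, tab, full permalink url" and that the slug is the last path component of the url. the function name and the example.com url are invented.

```python
import re


def clean_line(line):
    """Turn 'title<TAB>permalink url' into (title, slug), or None for junk lines."""
    line = line.strip("\r\n")
    if not line.strip() or "\t" not in line:
        return None  # blank line or something that isn't a title/url pair
    title, url = line.rsplit("\t", 1)
    slug = re.sub(r"\.html$", "", url.strip())  # drop the .html ending
    slug = re.sub(r"^.*/", "", slug)            # drop hostname and path
    return title.strip(), slug


# invented sample input with the blank lines blogger tends to leave around
raw = "\nhello world\thttp://example.com/2004/02/hello-world.html\n\n"
rows = [r for r in map(clean_line, raw.split("\n")) if r]
```

writing the resulting pairs back out with a tab between them gives the kind of file mysqlimport expects by default.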

i created a table in the wordpress database to store the (thankfully unique) post titles and slugs.

CREATE TABLE blogger_post_titles (post_title text, post_slug text);

i used the mysqlimport utility to import the tab-separated text file into the database table.

mysqlimport -p wp blogger_post_titles.txt

the following query updated the post slugs in wordpress to be equal to the filenames i’d extracted from blogger.

-- inner join: posts without a matching blogger title keep their existing slug
UPDATE wp_posts
INNER JOIN blogger_post_titles ON wp_posts.post_title = blogger_post_titles.post_title
SET wp_posts.post_name = blogger_post_titles.post_slug;
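to see what the title-matching update does without touching a real database, here is a small python sketch using an in-memory sqlite database. sqlite doesn't accept mysql's multi-table UPDATE syntax, so the sketch swaps in a correlated subquery that only touches posts whose title has a match. the table contents and slugs are invented, and the wp_posts table is cut down to three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_name TEXT);
    CREATE TABLE blogger_post_titles (post_title TEXT, post_slug TEXT);
    -- invented rows: one post with a blogger match, one without
    INSERT INTO wp_posts (post_title, post_name) VALUES
        ('a day at the beach', 'a-day-at-the-beach'),
        ('about', 'about');
    INSERT INTO blogger_post_titles VALUES ('a day at the beach', 'day-at-beach');
""")

# copy the blogger slug onto every post with a matching title;
# the WHERE EXISTS clause leaves unmatched posts alone
conn.execute("""
    UPDATE wp_posts
    SET post_name = (SELECT b.post_slug FROM blogger_post_titles b
                     WHERE b.post_title = wp_posts.post_title)
    WHERE EXISTS (SELECT 1 FROM blogger_post_titles b
                  WHERE b.post_title = wp_posts.post_title)
""")
slugs = dict(conn.execute("SELECT post_title, post_name FROM wp_posts"))
```

matching on title only works because the titles are unique; with duplicates the subquery would pick an arbitrary slug.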

rewrite rule to match blogger urls with wordpress permalinks

now i had to write a rewrite rule that would catch requests for the old post page urls with .html extensions (e.g. /2004/02/hello-world.html) and redirect them to urls without (e.g. /2004/02/hello-world/). figuring this out took much longer than i expected, mostly because i was trying to accomplish it with an apache redirect directive. it turned out to interact badly with the rewrite rules created by wordpress. so i ended up writing a rewrite rule instead (outside the block of rules generated by wordpress).

RewriteRule ^([0-9]{4})/([0-9]{1,2})/([^/]+)\.html$ $1/$2/$3/ [QSA,R]
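a quick way to check which urls the pattern catches is to try the same regular expression outside apache. the python sketch below assumes the rule lives in .htaccess, where apache strips the leading slash before matching. the function name is invented, and the sketch only mimics the pattern and substitution, not the QSA and R flags.

```python
import re

# same pattern as the RewriteRule above
OLD_URL = re.compile(r"^([0-9]{4})/([0-9]{1,2})/([^/]+)\.html$")


def rewrite(path):
    """Return the redirect target for an old blogger path, or None if it doesn't match."""
    m = OLD_URL.match(path)
    if m is None:
        return None
    # equivalent of the $1/$2/$3/ substitution
    return "{}/{}/{}/".format(*m.groups())


new_path = rewrite("2004/02/hello-world.html")
```

paths that already end in a slash, or that don't start with a year and month, fall through untouched so wordpress' own rules can handle them.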

importing comments from haloscan

this was hard. haloscan allows comment exporting using the caif format, so i started by hacking up phil ringnalda’s caif2mt.php code. i got this almost all the way there when i discovered that haloscan was only exporting the raw text of the comments with none of the html tags. what a waste. i sent a support email in, and still haven’t heard back.

ravingmadness’ previously-cited post got the comment data by requesting each individual comment webpage from haloscan, scraping the page with some regular expressions, and then inserting the appropriate fields into the wordpress database. this is just what i had hoped to avoid by using caif.

unfortunately the code was a little hard to read and the regular expressions depended on a typical haloscan template layout—mine had been significantly modified. i wanted to make things a little easier on myself, so i added the raw haloscan data in an html comment beneath each actual comment, but before {HSCommentEnd}. this allowed me to write and debug the import script without having to turn off the ability to leave comments.

<!--
<export>{HSCommentName}
|||{HSCommentUrl}
|||{HSCommentDate}
|||{HSCommentMessage}
</export>
-->

note: i’ve never collected commenters’ email addresses, so i didn’t bother to collect them here.
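for illustration, pulling those hidden blocks back out of a fetched comment page could look something like the python sketch below. it is not the php import script, and the function name, the sample date format, and the sample comment are all invented. it assumes the page contains the export block exactly as laid out in the template above.

```python
import re

# one hidden block per comment, as emitted by the modified haloscan template
EXPORT_RE = re.compile(r"<export>(.*?)</export>", re.DOTALL)


def parse_comments(html):
    """Return one dict per <export> block found in the page (hypothetical helper)."""
    comments = []
    for raw in EXPORT_RE.findall(html):
        # split into at most 4 parts so a "|||" inside the message survives
        parts = raw.split("|||", 3)
        if len(parts) != 4:
            continue  # not a complete name/url/date/message block
        name, url, date, message = (p.strip() for p in parts)
        comments.append({"name": name, "url": url, "date": date, "message": message})
    return comments


# invented sample page fragment
page = """<!--
<export>someone
|||http://example.com/
|||03.01.05 - 10:15 am
|||nice <b>post</b>!
</export>
-->"""
comments = parse_comments(page)
```

because the message is the last field, the html tags inside it come through untouched, which was the whole point of avoiding the caif export.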

update: it occurred to me that the ability to change the haloscan template requires that you’ve donated some amount of money to become a premium customer. since i imagine few people have taken that step, you may be better off using ravingmadness’ haloscan import script which uses regular expressions to extract the comment data based on the standard haloscan template.

update: ravingmadness’ haloscan import script is out of date and probably won’t work for you unless you’re comfortable hacking and debugging php. i hope to write a new and improved import-haloscan script for wordpress at some point in the future. (13-jun-2005)

update: i’ve written a script that imports haloscan comments directly from the exported CAIF files into wordpress. see importing haloscan comments into wordpress for more information. as a result you can safely disregard most of the information in this post about importing comments. (23-jun-2005)

then i wrote a script that loops through each imported wordpress post and gets the wordpress post id along with the old blogger post id from the html comment in post_content. it then fetches the appropriate haloscan url using the unique blogger post id, parses the html for the commented-out comment content, and inserts the comment fields into wp_comments. though i don’t recommend this script for public consumption, it might help someone out there: import-haloscan.phps.

before using it, i’d recommend commenting out the mysql_query($sql); call on line 110 (which inserts the comment into the wordpress wp_comments table) to make sure the code is working for you.
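the overall shape of that loop, including the "don't insert yet" test run, can be sketched in python like this. it is an outline under assumptions, not a port of import-haloscan.phps: the function names are invented, the blogger id is assumed to be all digits at the very start of post_content, and fetching and inserting are passed in as callables so nothing here talks to haloscan or mysql.

```python
import re

# the modified import prepends "<!--NUMBER-->" to each post's content
ID_RE = re.compile(r"^<!--(\d+)-->")


def blogger_id(post_content):
    """Return the blogger post id embedded at the start of post_content, or None."""
    m = ID_RE.match(post_content)
    return m.group(1) if m else None


def import_comments(posts, fetch_comments, insert, dry_run=True):
    """Walk (wp_post_id, post_content) pairs and hand each comment to insert().

    fetch_comments(blogger_id) and insert(wp_post_id, comment) are hypothetical
    callables. With dry_run=True nothing is inserted, which plays the same role
    as commenting out the mysql_query call. Returns the number of comments seen.
    """
    count = 0
    for wp_id, content in posts:
        bid = blogger_id(content)
        if bid is None:
            continue  # post was imported without an embedded id
        for comment in fetch_comments(bid):
            if not dry_run:
                insert(wp_id, comment)
            count += 1
    return count


# invented sample: one post with an embedded id, one without
sample_posts = [(1, "<!--110967--><p>hi</p>"), (2, "<p>no id</p>")]
seen = import_comments(sample_posts, lambda bid: ["first!"], lambda wp_id, c: None)
```

running with dry_run left on first shows how many comments would be imported before anything is written.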
