Moving from Blogger to WordPress 1.5

Update: This post is not intended for the faint of heart. It describes in some detail what it took for me to move my Blogger blog (circa March 2005) to WordPress 1.5. Much of it is obsolete, as WordPress 2.0 has a much simpler, more user-friendly process for importing posts and comments from Blogger. If you’re interested in moving your Haloscan comments to WordPress 2.x along with your Blogger blog, please see my “new and improved” version of this post, Importing Haloscan comments into WordPress from Blogger.

when i moved from blogger to wordpress 1.5 (a few weeks ago) the process proved to be a little less than painless, partly because the following two requirements of mine were somewhat outside the scope of the provided blogger import script.

  1. i wanted the existing blogger permalink post urls (based on the year, month, and post title) to be the same in wordpress (ignoring the .html difference which could be solved with mod_rewrite). Update: I’ve written a new post with information on maintaining your permalinks when moving from Blogger to WordPress. (17-Oct-2006)
  2. i wanted to import my haloscan comments. update: i’ve written a new post with information on importing haloscan comments into wordpress. (23-Jun-2005)

as a result, what you’re reading below is a partial simplification of the 3 tries it took before i was able to successfully import everything. knowing the following sql:

RENAME TABLE tbl_name TO new_tbl_name;

SHOW CREATE TABLE tbl_name;

was extremely useful when i needed to rename a table with munged data and then recreate it without reinstalling wordpress.

haloscan preparations

i had to make the following two changes to import-blogger.php (based on code written by ravingmadness) to ensure that the unique id blogger assigns to every post got included in the content of the blog post as an html comment. since haloscan uses this id to associate comments with each post, the number is crucial in maintaining that link when importing the comments later.

uncomment line 61:
$post_number = $postinfo[3];

change line 127 to this:
$post_content = addslashes("<!--" . $post_number . "-->" . $post_content);

it occurs to me now that it probably would have been easier to include the post id as a comment in the simplified blogger template below, rather than having to modify import-blogger.php. for an example of this, see my post importing haloscan comments into wordpress.

standard blogger import

after running the blogger import script (import-blogger.php) for the first time, it instructed me to replace the blogger template with the following:

<blogger><wordpresspost><$BlogItemDateTime$>|||<$BlogItemAuthorNickname$>|||<$BlogItemBody$>|||<$BlogItemNumber$>|||<$BlogItemSubject$></wordpresspost></blogger>

then change a few other settings in blogger, republish the whole blog to the root wordpress directory, and finally start the import process.
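to make the template format concrete, here is a minimal python sketch of how the republished output can be split back into fields. it is not the code from import-blogger.php; the function name, field names, and sample values (including the date format) are made up for illustration. the field order follows the template above, so the post number sits at index 3.

```python
import re

# field order follows the blogger template; the names themselves are hypothetical
FIELD_NAMES = ("date_time", "author", "body", "post_number", "title")

def parse_blogger_export(text):
    """Return one dict per <wordpresspost> block found in the published file."""
    posts = []
    for raw in re.findall(r"<wordpresspost>(.*?)</wordpresspost>", text, re.DOTALL):
        fields = raw.split("|||")
        if len(fields) != len(FIELD_NAMES):
            continue  # skip blocks that don't have exactly five fields
        posts.append(dict(zip(FIELD_NAMES, (f.strip() for f in fields))))
    return posts

# made-up sample data, not real blogger output
sample = (
    "<wordpresspost>3/1/2005 10:15:00 AM|||justin|||<p>hello</p>"
    "|||110969000000000001|||hello world</wordpresspost>"
)
posts = parse_blogger_export(sample)
print(posts[0]["post_number"], posts[0]["title"])
```

a post whose body happens to contain ||| would not split into five fields, so this sketch simply skips it.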

maintaining the post page filenames from blogger

Update: I’ve written an additional post entitled Maintain permalinks moving from Blogger to WordPress that enables you to automatically import your Blogger permalinks (aka the “post slug”) into WordPress 2.0 along with your posts and comments. (17-Oct-2006)

in order to maintain the permalink urls for each post created by blogger, i had to make sure the “post slug” (aka dirified post title) in wordpress matched the filename of the post page created by blogger. unfortunately, blogger’s dirify algorithm differs from wordpress’: blogger leaves out articles (e.g. “a”, “the”) and truncates long titles, while wordpress does neither.

first i fixed several imported posts that didn’t have titles at all (oops). blogger still dirified them (based on the first few words of the content), but wordpress did not. luckily there were only three, which i fixed by hand:

  1. ran the following query:
    SELECT LEFT(post_content,80) FROM wp_posts WHERE post_title = '';
  2. based on the content, looked up the posts in blogger to figure out how it had dirified them
  3. updated the post slug and title in wordpress manually

next i had to get the actual permalink urls out of blogger and into a format that i could use to compare with the newly imported posts in wordpress. i created the following template, with a tab between the two tags to get a list of the titles and their dirified permalinks.

<blogger>
<$BlogItemTitle$>	<$BlogItemPermalinkURL$>
</blogger>

i concatenated all the new monthly archive files blogger produced:

cat *_wordpress.php > blogger_post_titles.txt

then ran some regular expressions to remove blank lines, strip .html from the ends of the permalinks, and strip the hostname/path from the beginning. at this point i had a nice clean tab-separated file of post titles and slugs to import into mysql.
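i didn’t keep the exact regular expressions, but the cleanup amounts to something like the following python sketch. the function name, the example.com urls, and the sample titles are assumptions for illustration only.

```python
import re
from urllib.parse import urlparse

def clean_line(line):
    """Turn 'title<TAB>permalink url' into (title, slug), or None for blank lines."""
    line = line.strip()
    if not line or "\t" not in line:
        return None
    title, url = line.split("\t", 1)
    path = urlparse(url.strip()).path      # drops the scheme and hostname
    slug = path.rsplit("/", 1)[-1]         # drops the /yyyy/mm/ directories
    slug = re.sub(r"\.html$", "", slug)    # drops the .html extension
    return title.strip(), slug

# made-up sample of the concatenated archive output
raw = "\n".join([
    "",
    "hello world\thttp://example.com/2004/02/hello-world.html",
    "   ",
    "a day at the beach\thttp://example.com/2004/03/day-at-beach.html",
])
rows = [r for r in map(clean_line, raw.splitlines()) if r]
for title, slug in rows:
    print(f"{title}\t{slug}")
```

the printed lines are tab-separated title/slug pairs, which is the shape mysqlimport expects for a two-column table.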

i created a table in the wordpress database to store the (thankfully unique) post titles and slugs.

CREATE TABLE blogger_post_titles (post_title text, post_slug text)

i used the mysqlimport utility to import the tab-separated text file into the database table.

mysqlimport -p wp blogger_post_titles.txt

the following query updated the post slugs in wordpress to be equal to the filenames i’d extracted from blogger.

UPDATE wp_posts
INNER JOIN blogger_post_titles ON wp_posts.post_title = blogger_post_titles.post_title
SET wp_posts.post_name = blogger_post_titles.post_slug;
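if you want to try the slug update on throwaway data before touching the real database, here is a minimal sketch using python’s built-in sqlite3 module. the table contents are made up, and sqlite has no UPDATE ... JOIN syntax, so a correlated subquery stands in for the mysql join; treat it as an illustration of the idea rather than the query i ran.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_name TEXT);
    CREATE TABLE blogger_post_titles (post_title TEXT, post_slug TEXT);
    INSERT INTO wp_posts (post_title, post_name) VALUES
        ('a day at the beach', 'a-day-at-the-beach'),
        ('written after the move', 'written-after-the-move');
    INSERT INTO blogger_post_titles VALUES ('a day at the beach', 'day-at-beach');
""")

# sqlite has no UPDATE ... JOIN, so a correlated subquery stands in for it.
# the WHERE EXISTS guard leaves posts without a blogger match untouched.
conn.execute("""
    UPDATE wp_posts
    SET post_name = (SELECT b.post_slug FROM blogger_post_titles b
                     WHERE b.post_title = wp_posts.post_title)
    WHERE EXISTS (SELECT 1 FROM blogger_post_titles b
                  WHERE b.post_title = wp_posts.post_title)
""")
slugs = dict(conn.execute("SELECT post_title, post_name FROM wp_posts"))
print(slugs)
```

the guard matters: without it (or, as far as i can tell, with a left join in mysql) a post that has no match in blogger_post_titles would have its slug overwritten with NULL or an empty value.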

rewrite rule to match blogger urls with wordpress permalinks

now i had to write a rewrite rule that would catch requests for the old post page urls with .html extensions (e.g. /2004/02/hello-world.html) and redirect them to urls without (e.g. /2004/02/hello-world/). figuring this out took much longer than i expected, mostly because i was trying to accomplish it with an apache redirect directive. it turned out to interact badly with the rewrite rules created by wordpress. so i ended up writing a rewrite rule instead (outside the block of rules generated by wordpress).

RewriteRule ^([0-9]{4})/([0-9]{1,2})/([^/]+)\.html$ $1/$2/$3/ [QSA,R]
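to check what the pattern does and doesn’t catch, the same regular expression can be exercised in python. this is only a sketch of the matching logic, not of apache itself; the function name and sample paths are made up. the paths have no leading slash because that is how a rule in .htaccess sees them.

```python
import re

# same pattern as the RewriteRule above
OLD_POST_URL = re.compile(r"^([0-9]{4})/([0-9]{1,2})/([^/]+)\.html$")

def rewrite(path):
    """Return the redirect target for an old blogger path, or None if it doesn't match."""
    match = OLD_POST_URL.match(path)
    if match is None:
        return None
    year, month, slug = match.groups()
    return f"{year}/{month}/{slug}/"

for path in ("2004/02/hello-world.html", "2004/02/hello-world/", "archives/2004_02.html"):
    print(path, "->", rewrite(path))
```

only the first sample path is redirected; the wordpress-style url and the monthly archive page fall through to whatever rules come next.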

importing comments from haloscan

this was hard. haloscan allows comment exporting using the caif format, so i started by hacking up phil ringnalda’s caif2mt.php code. i got this almost all the way there when i discovered that haloscan was only exporting the raw text of the comments with none of the html tags. what a waste. i sent a support email in, and still haven’t heard back.

ravingmadness’ previously-cited post got the comment data by requesting each individual comment webpage from haloscan, scraping the page with some regular expressions, and then inserting the appropriate fields into the wordpress database. this is just what i had hoped to avoid by using caif.

unfortunately the code was a little hard to read and the regular expressions depended on a typical haloscan template layout—mine had been significantly modified. i wanted to make things a little easier on myself, so i added the raw haloscan data in an html comment beneath each actual comment, but before {HSCommentEnd}. this allowed me to write and debug the import script without having to turn off the ability to leave comments.

<!--
<export>{HSCommentName}
|||{HSCommentUrl}
|||{HSCommentDate}
|||{HSCommentMessage}
</export>
-->

note: i’ve never collected commenters’ email addresses, so i didn’t bother to collect them here.

update: it occurred to me that the ability to change the haloscan template requires that you’ve donated some amount of money to become a premium customer. since i imagine few people have taken that step, you may be better off using ravingmadness’ haloscan import script which uses regular expressions to extract the comment data based on the standard haloscan template.

update: ravingmadness’s haloscan import script is out of date and probably won’t work for you unless you’re comfortable hacking and debugging php. i hope to write a new and improved import-haloscan script for wordpress at some point in the future. (13-jun-2005)

update: i’ve written a script that imports haloscan comments directly from the exported CAIF files into wordpress. see importing haloscan comments into wordpress for more information. as a result you can safely disregard most of the information in this post about importing comments. (23-jun-2005)

then i wrote a script that loops through each imported wordpress post, gets the wordpress post ID and the old blogger post id from within the html comment in post_content. then it fetches the appropriate haloscan url using the unique blogger post id, parses the html for the commented out comment content, and inserts the comment fields into wp_comments. though i don’t recommend this script for public consumption it might help someone out there: import-haloscan.phps.
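the two parsing steps at the heart of that script look roughly like this in python. this is a sketch, not a port of import-haloscan.phps: the function names, dictionary keys, and sample data (including the date format) are assumptions, and fetching the haloscan page and inserting into wp_comments are left out.

```python
import re

POST_ID = re.compile(r"^<!--(\d+)-->")                      # id comment added at import time
EXPORT = re.compile(r"<export>(.*?)</export>", re.DOTALL)   # block added to the haloscan template

def blogger_post_id(post_content):
    """Pull the old blogger post id out of the leading html comment, if present."""
    match = POST_ID.match(post_content)
    return match.group(1) if match else None

def parse_comments(page_html):
    """Return one dict per <export> block found in a haloscan comment page."""
    comments = []
    for raw in EXPORT.findall(page_html):
        # maxsplit=3 keeps any ||| inside the message from splitting it further
        parts = [p.strip() for p in raw.split("|||", 3)]
        if len(parts) == 4:
            name, url, date, message = parts
            comments.append({"author": name, "url": url, "date": date, "content": message})
    return comments

# made-up sample data
page = """<!--
<export>beth
|||http://example.com/
|||03.20.05 - 9:41 pm
|||great <a href="http://example.org/">post</a>!
</export>
-->"""
comments = parse_comments(page)
print(blogger_post_id("<!--110969000000000001--><p>hello</p>"), comments[0]["author"])
```

because the raw template fields sit inside an html comment, the html in each message survives intact, which is the whole reason for not relying on the caif export.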

before using it, i’d recommend commenting out the mysql_query($sql); call on line 110 (which inserts the comment into the wordpress wp_comments table) to make sure the code is working for you.


32 Comments

Beth

I need to do same this weekend and have a few questions if you don’t mind helping. 1) If I’ve already run the import script without the post_number mod, do I need to start over? 2) Where exactly do I add the html comment with the raw haloscan data? Thanks in advance!

beth, technically yes. you need to start over. but it’s easy.

  1. copy the create table sql:
    show create table wp_posts
  2. rename the posts table:
    rename table wp_posts to wp_posts_old
  3. run the create table sql you copied:
    create table wp_posts...
  4. then re-run the blogger import script with the modifications to include the blogger post_id as an html comment.

regarding the haloscan comment code, put it anywhere between the {HSCommentStart} and {HSCommentEnd}. I put mine right before the {HSCommentEnd} tag.

Hello Justin,
I came across your page while looking for Blog Roll plugin for WP. (but it’s too hard for me…*sob*) Then I noticed your post about importing from blogger to WP. (but I did it a LONG way. going through MovableType first, then to WP)

I love your blog design a lot. I especially *LOVE* your MONTHCHUNKS archives. (It’s tiny, clean and stylish!) Is it possible to post a tutorial on how to achieve that sort of layout? I really really wish to add the same MONTHCHUNKS to my blog :D

Thanks !!!

golfy, here is your answer: monthchunks howto

Moving from blogger to wordpress 1.5: A (very) detailed step by step look at moving your prolific Blogger blog to WordPress. I know there are lots of tutorials out there, but the attention to details of this writeup really caught MY attention.

Why didn’t I find this a week ago!?!?! I wasted SO much time… this is awesome. Well done!

My previous attempts to import my Blogger site into a sub-category of this site have failed, but I’m looking forward to attempting it again, following these guidelines…

Justin did it the hard way, importing into WordPress not only the content from his Blogger blog, but also the comments that were in HaloScan, rather than in Blogger itself. He has been public-spirited enough to post on how he did this.

WordPress Theme : tutorials

A transição de dados do Blogger para o WordPress, apesar de muita dificuldade, foi feita com sucesso.

Translation: “The transfer of data from Blogger to WordPress, despite much difficulty, was completed successfully.”

Jax

how did u make this in a nutshell, can u giv me the html code so i can just copy and paste it please

jax, i’m not sure what you’re asking.

Hi.

Like Beth, I already imported my Blogger posts and now I need to import my 3000+ Haloscan comments. Here are my questions: 1. Where exactly do I change the tables you told Beth to change? 2. Do I have to delete any of the files from the previous Blogger post import? 3. Is this even worth doing or are there a lot of tweaks that need to be made? 4. Is Haloscan’s XML export of any value?

Thanks!

Hilary, you have to get access to the MySQL database that stores your wordpress posts. You can do this by using the mysql client at the command line:

mysql -u username -p -h hostname databasename

or you can use a web interface like phpmyadmin, which dreamhost provides, but which I am less familiar with.

If you have not posted to your blogger blog since creating the files to be imported into wordpress, you should not have to delete the import files.

I felt that it was worth doing because I didn’t want to lose my comments. A lot of the discussion above describes the mistakes I made along the way, in the hope that it would make the process easier for the next person, or would help the wordpress team improve the import process.

The last time I checked, Haloscan did not export any html (like links in comments, etc) which is why I had to use a screen scraping script.

Unfortunately I found the process complex to get right, and I’ve made notes where I realized in retrospect that it could have been easier if I’d done something differently. I might recommend printing out the post so you can refer to it while you’re in the process of making the changes. Feel free to point out anything in the post that doesn’t make sense or could be clarified. Thanks.

I moved a site from Blogger to WP for a friend. To keep the Blogger permalinks (and the filename with html extension) I went to the “Options” tab and set the Permalink Structure to this:
/%year%/%monthnum%/%postname%.html

Lee, great suggestion. Do realize that solution won’t make up for the problem that Blogger would choose filenames without including any article words (a, an, the) whereas WordPress allows them.

After looking at *thousands* of posts that didn’t have a title, we figured it wasn’t worth the effort to keep the permalinks working. Personally I use the Textpattern cms and coded a plugin to perform some smart translation of URLs.

My previous comment was more of a way to avoid touching the .htaccess file – I’m pretty sure that improperly configured mod_rewrites are the number one cause of rickets. Or is it scurvy? Either way, eat plenty of citrus and drink your milk.

[…] I have to say, I was so impressed by WordPress that I toyed around with the idea of moving Randomdialogue over to the system as well. You can see my results over here (warning: url will expire). I ended up deciding to stick with Blogger for now, but during my mulling period I did a lot of research. What you see on that dev page is the result of a night’s worth of hard work, and without resources I wouldn’t have been able to do it. If you’re thinking of moving a Blogger blog to WordPress, I highly recommend Justinomia’s How-To and Andy Skelton’s improved transfer program. […]

[…] The script I used was the one on the codex. I also had help from Skelton’s site and Justinsomniac’s site. Hopefully all is figured out. I think I got my feed correct. I’m so glad I’m with feedburner, because it was easy to simply replace the originating atom feed with this new rss feed and leave the url to the feedburner rss feed untouched. […]

[…] Blogger truncates the title after a certain length, and does not include articles (“a,” “an,” “and,” “the”). WordPress uses the entire title. SO, if you have your own domain, your permalinks will no longer match up. There’s a way around this, but it’s long, involved, and requires some pretty good technical knowledge. If you really want to attempt it, you can view the instructions at Justinsomnia.org. If this doesn’t bother you, you’re fine. If you were using a .blogspot.com domain, then it won’t matter. […]

Moving from blogger to wordpress 1.5: A (very) detailed step by step look at moving your Blogger blog to WordPress.

Hi Justin,
Any interest in taking on the transition for my blog? This looks a little too involved for my skill level. Here’s my blog

[…] Import Blogger posts into WordPress on the remote host. Justinsomnia has the most detailed and useful instructions. If you are planning to import Haloscan comments, you’ll need to make one slight change to WordPress’s standard Blogger import file before importing posts to make sure the comments can get properly associated with posts. My import ran exceedingly slowly and showed MySQL lost database connection errors until I removed one little space from wp-blog-header.php. Check out this thread on the WordPress support site for details and for the fix. […]

[…] A great tutorial for moving my Haloscan comments to WordPress. Read it through a few times but it worked flawlessly for me. […]

Shane

How can I convert the regular Blogger comments to WordPress, or does that automatically happen with the import tool?

Shane, if you’re importing your Blogger blog into WordPress 2.0, then yes, the Blogger comments are imported.

nice article, interested info for blogger :)

I have blogs in WP and Blogger – both are useful with slightly different strengths – I don’t use the exact same content however, more parallel.

Justin, this tutorial saved me hours and hours during my upgrade from MT -> WP. I just dropped a few $$ in the tip jar.

Thanks!

I exported my entries from Movable Type, then imported into WordPress. But I needed to truncate the post slugs so that my links would be backward compatible.

Since I already had the post IDs (because I had already imported into MT), I was able to use those instead of the post titles (which, on my blog are unruly, non-unique, and have long characters). Worked like a charm!

Cheers!
Frank

thanks, interesting. we increasingly get requests to help transfer blogger blogs to wordpress.

Hi,
I moved my posts from blogger beta to wordpress 2.x.x very easily. I wrote instructions to my blog page, you can find that post in http://tjantunen.com/2007/03/21/import-blogger-posts-to-wordpress/
