i need to learn useful programming

project: i’ve got 68 fairly structured html files (example). they contain a long, non-uniform header that needs to be junked, an <H1> title, a line with author(s), a date, some large chunks of text, and a footer that needs to be junked.

i want to write a quick program to loop through the files, parse them, and import them into a mysql table. this program will be one time use only, but having the knowledge to write this kind of program will be useful in the near future. i feel like perl is the language of champions here, but i haven’t used perl in ages. java? php maybe?

i feel like i should time myself to see how long it will take me to figure this out in perl versus just doing a whole lotta cutting and pasting (4 chunks of information x 68 documents = repetitive stress injury).

update: so far i’ve spent one hour and i’ve figured out enough perl to loop through every file in the directory, spitting out every line in each file.
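for comparison, here is a rough sketch of that first step in python rather than perl. it is not the script from this post: the function name `iter_lines` and the `*.html` pattern are assumptions made for illustration, and the files are read as utf-8 with undecodable bytes replaced.

```python
from pathlib import Path


def iter_lines(directory, pattern="*.html"):
    """Yield (filename, line) for every line of every matching file.

    `pattern` is an assumed glob; adjust it if the files use another extension.
    """
    # sorted() keeps the file order stable from run to run
    for path in sorted(Path(directory).glob(pattern)):
        # errors="replace" avoids crashing on stray bytes in old html files
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield path.name, line.rstrip("\n")
```

calling `for name, line in iter_lines("."): print(name, line)` would spit out every line of every html file in the current directory.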

final update: it took me another 3 hours to get a completely working solution. i had to figure out the perl DBI, which was actually the easiest part. mostly it just took time getting the regular expressions to do what i needed them to do. here’s the code. amazing, huh?
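the general shape of the parse-and-insert step might look something like the sketch below. it is python, not the perl linked above, and it leans on several assumptions: that the author line and the date are the first two non-empty lines after the title, that the footer starts at a hypothetical `<!-- footer -->` marker, and that the table is called `articles`. python's built-in sqlite3 stands in for mysql so the example runs on its own; a mysql driver would take its place in practice.

```python
import re
import sqlite3

# case-insensitive so both <H1> and <h1> match; DOTALL lets a title span lines
TITLE_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
FOOTER_MARK = "<!-- footer -->"  # hypothetical marker; the real files may differ


def strip_tags(text):
    """Remove anything that looks like an html tag and trim whitespace."""
    return TAG_RE.sub("", text).strip()


def parse_document(html):
    """Split one page into title, authors, date, and body text."""
    match = TITLE_RE.search(html)
    if match is None:
        raise ValueError("no <h1> title found")
    title = strip_tags(match.group(1))
    # everything before the title is the junk header; drop it
    rest = html[match.end():]
    # everything from the (assumed) footer marker onward is junk too
    rest = rest.split(FOOTER_MARK, 1)[0]
    lines = [strip_tags(line) for line in rest.splitlines()]
    lines = [line for line in lines if line]
    if len(lines) < 3:
        raise ValueError("expected authors, date, and body after the title")
    # assumed layout: authors line, then date line, then the text chunks
    authors, published, *body = lines
    return {"title": title, "authors": authors,
            "published": published, "body": "\n".join(body)}


def create_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(title TEXT, authors TEXT, published TEXT, body TEXT)"
    )


def store(conn, doc):
    """Insert one parsed document using placeholders, not string pasting."""
    conn.execute(
        "INSERT INTO articles (title, authors, published, body) "
        "VALUES (?, ?, ?, ?)",
        (doc["title"], doc["authors"], doc["published"], doc["body"]),
    )
    conn.commit()
```

letting the database driver fill in the placeholders means quotes and apostrophes in the article text cannot break the insert statement, which matters when the body is several paragraphs of prose.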
