Work Archives, page 9

It’s hard not to blog about work. It’s hard to blog about work.

From the Belly of the Beasts, a eulogy

When I worked at O’Reilly Media, the Online Publishing Group (OPG) had this external-facing blog they called “From the Belly of the Beasts”, a reference to O’Reilly’s famous animal-branded technical books. The blog lived at blogs.oreillynet.com/beasts/.

Screenshot of OPG's 'From the Belly of the Beasts' blog

It was started in October 2004 by Terrie Miller, the group’s manager. I arrived in May 2005, hired partly to assist with O’Reilly’s blog and feed projects. I loved that our group had a blog of its own, so I started posting to it within days of my arrival.

In order to justify blogging at work (to myself), I treated it as a knowledge sharing tool. I’d usually post useful things I found online and then pass the URL around to my co-workers. Frequently I’d refer back to an old post to jog my memory (or someone else’s). It was also a place to be a little silly, to lighten the mood. I posted there frequently until I left O’Reilly in June 2006. After my last post only two more posts appeared, one by Mark and one by Terrie, and from that point on the blog has lain dormant. Update: Terrie left O’Reilly in March 2008.

I’ve always been a little surprised, and of course happy that those old posts still exist out there on the web. They come from a special time in my life, my first year in California, my first job outside of the university. I definitely felt I was leaving behind something dear when I left O’Reilly. I expected the blog to just cease to exist one day, presumably after someone unplugged some dusty server in the corner that happened to be blogs.oreillynet.com.

Anyway yesterday I heard from Mark that the Beasts blog is indeed getting “unplugged”, and he wanted to know whether I wanted to save anything beforehand. Yikes, yes! So I’m going to recreate each of the posts I wrote there on Justinsomnia, categorize them in such a way that they can be pulled out together, and postdate them appropriately. I’m actually really excited about this, to me it’s like getting back a piece of myself for my archives.

The new view, panoramic

The panoramic view from my desk, in Sausalito, CA

The view from my new desk

The new view from my desk

I felt sort of bad posting the old view from my desk, but after the FM tech team moved to a new building across the street today, I feel like I’ve received a significant upgrade.

View from my new desk at Federated Media in Sausalito: San Francisco

Some light reading at work

A present from FM Tech:

Justin reading the Cucko's egg
photo credit: Ivan’s iPhone

In other news, Justinsomnia has reappeared in Google. Hurray!

Load object by id + database call or by array?

I closed my post I heart foreach with the following question:

What I don’t know is how the performance savings of select * from table compares to the memory savings of select id from table + select * from table where id = $id.

Tonight I decided to answer that question. First I installed Xdebug. I already had Pear, so this did the trick:

sudo apt-get install php5-dev
sudo pecl install xdebug-beta

Then I added extension=xdebug.so to php.ini and restarted Apache. Thanks to Install xdebug for PHP5 on Ubuntu Feisty for the steps.

Then I had some fun. I downloaded datasets from Frequently Occurring First Names and Surnames From the 1990 Census, turned them into PHP arrays, and then generated a table with 100,000 records consisting of a primary key id, a random first name, and a random last name. Here’s a sample:

mysql> select * from names order by rand() limit 10;
+-------+------------+-----------+
| id    | first_name | last_name |
+-------+------------+-----------+
|  2535 | Lynelle    | Schmucker | 
| 25092 | Arnette    | Treglia   | 
| 92914 | Pandora    | Vonruden  | 
| 84120 | Nicholas   | Granados  | 
| 34593 | Roseline   | Salvitti  | 
| 40300 | Blondell   | Guerard   | 
| 50564 | Ivelisse   | Schuur    | 
| 15215 | Lynetta    | Naiman    | 
| 22300 | Antonia    | Lineback  | 
| 70832 | Jesse      | Agoras    | 
+-------+------------+-----------+

As a starting point, I wanted to test the “array of objects” design, and profile it using xdebug_time_index, xdebug_memory_usage, and xdebug_peak_memory_usage. The Xdebug documentation doesn’t really explain what those measure (seconds and bytes, I assume), but I figured at least it’d at least give me a way to judge relative performance and memory load.

So here was my reference code:

<?php
mysql_connect('localhost', 'root', '');
mysql_select_db('test');

class Name
{
  public $id;
  public $first_name;
  public $last_name;

  public function __construct($id = 0)
  {
    if ($id > 0) {
      $rst = mysql_query("select * from names where id = $id");
      $row = mysql_fetch_array($rst);
      $this->id         = $row['id'];
      $this->first_name = $row['first_name'];
      $this->last_name  = $row['last_name'];
    }
  }

  public static function bySQL($sql)
  {
    $rst = mysql_query($sql);
    $names = array();
    while ($row = mysql_fetch_array($rst)) {
      $names[] = new Name($row['id']);
    }
    return $names;
  }
}

And here was the test:

foreach (Name::bySQL("select id from names") as $name) {
  // do something
}

print "time: " . xdebug_time_index()        . "<br />";
print "pmem: " . xdebug_peak_memory_usage() . "<br />";
print "mem:  " . xdebug_memory_usage()      . "<br />";

The result was pretty scary. It took a long time to build and loop through all 100,000 objects:

time: 50.3534600735 (50-70 sec)
pmem: 39,267,200
mem: 3,205,784

Just for grins and giggles, I decided to make a modification to the constructor and bySQL functions, to pass along the complete array representing the database row, rather than having to go back and query the database again:

class Name
{
  public $id;
  public $first_name;
  public $last_name;

  public function __construct($id = 0)
  {
    if (is_array($id)) {
      $this->id         = $id['id'];
      $this->first_name = $id['first_name'];
      $this->last_name  = $id['last_name'];
    
    } elseif ($id > 0) {
      $rst = mysql_query("select * from names where id = $id");
      $row = mysql_fetch_array($rst);
      $this->id         = $row['id'];
      $this->first_name = $row['first_name'];
      $this->last_name  = $row['last_name'];
    }
  }

  public static function bySQL($sql)
  {
    $rst = mysql_query($sql);
    $names = array();
    while($row = mysql_fetch_array($rst)) {
      $names[] = new Name($row);
    }
    return $names;
  }
}

Here was the test this time (select * instead of id):

foreach (Name::bySQL("select * from names") as $name) {
   //do something
}

The result was astounding:

time: 8.96280193329 (6-9 secs)
pmem: 34,094,768
mem: 3,207,200

Peak memory usage dropped only slightly, but the script execution time dropped by almost an order of magnitude. Clearly most of the time was being spent executing those 100,000 additional selects (duh!), and there was no memory cost associated with doing an initial select *.

Next I tried the Collection class detailed in I heart foreach, minus the mysql_connect and mysql_select_db function calls. The test looked like this:

foreach (new Collection("Name", "select id from names") as $name) {
  // do something
}

Once again, performance was surprising. I was expecting an all around performance and memory usage improvement, but it turns out the Collection class actually ran slower:

time: 62.3732869625
pmem: 77,040
mem: 76,944

Of course memory-wise, the Collection object kills. From 39MB to 77KB! Granted this was still using the select id design, so I modified Collection::cacheNext() to pass along the whole array (not just the id value):

$this->value = new $this->class_name($row);

Using the second version of the Name class above, the test became:

foreach (new Collection("Name", "select * from names") as $name) {
  // do something
}

And finally I got the results I was looking for. Memory usage was less, and time dropped from 60 seconds to 13.

time: 12.9740481377
pmem: 75,960
mem: 66,872

Moral of the story: select * from table is much faster than select id from table + select * from table where id = $id and isn’t burdened by any sort of memory tax.