Load object by id + database call or by array?

I closed my post I heart foreach with the following question:

What I don’t know is how the performance savings of select * from table compares to the memory savings of select id from table + select * from table where id = $id.

Tonight I decided to answer that question. First I installed Xdebug. I already had Pear, so this did the trick:

sudo apt-get install php5-dev
sudo pecl install xdebug-beta

Then I added extension=xdebug.so to php.ini and restarted Apache. Thanks to Install xdebug for PHP5 on Ubuntu Feisty for the steps.

Then I had some fun. I downloaded datasets from Frequently Occurring First Names and Surnames From the 1990 Census, turned them into PHP arrays, and then generated a table with 100,000 records consisting of a primary key id, a random first name, and a random last name. Here’s a sample:

mysql> select * from names order by rand() limit 10;
+-------+------------+-----------+
| id    | first_name | last_name |
+-------+------------+-----------+
|  2535 | Lynelle    | Schmucker | 
| 25092 | Arnette    | Treglia   | 
| 92914 | Pandora    | Vonruden  | 
| 84120 | Nicholas   | Granados  | 
| 34593 | Roseline   | Salvitti  | 
| 40300 | Blondell   | Guerard   | 
| 50564 | Ivelisse   | Schuur    | 
| 15215 | Lynetta    | Naiman    | 
| 22300 | Antonia    | Lineback  | 
| 70832 | Jesse      | Agoras    | 
+-------+------------+-----------+

As a starting point, I wanted to test the “array of objects” design, and profile it using xdebug_time_index, xdebug_memory_usage, and xdebug_peak_memory_usage. The Xdebug documentation doesn’t really explain what those measure (seconds and bytes, I assume), but I figured at least it’d at least give me a way to judge relative performance and memory load.

So here was my reference code:

<?php
mysql_connect('localhost', 'root', '');
mysql_select_db('test');

class Name
{
  public $id;
  public $first_name;
  public $last_name;

  public function __construct($id = 0)
  {
    if ($id > 0) {
      $rst = mysql_query("select * from names where id = $id");
      $row = mysql_fetch_array($rst);
      $this->id         = $row['id'];
      $this->first_name = $row['first_name'];
      $this->last_name  = $row['last_name'];
    }
  }

  public static function bySQL($sql)
  {
    $rst = mysql_query($sql);
    $names = array();
    while ($row = mysql_fetch_array($rst)) {
      $names[] = new Name($row['id']);
    }
    return $names;
  }
}

And here was the test:

foreach (Name::bySQL("select id from names") as $name) {
  // do something
}

print "time: " . xdebug_time_index()        . "<br />";
print "pmem: " . xdebug_peak_memory_usage() . "<br />";
print "mem:  " . xdebug_memory_usage()      . "<br />";

The result was pretty scary. It took a long time to build and loop through all 100,000 objects:

time: 50.3534600735 (50-70 sec)
pmem: 39,267,200
mem: 3,205,784

Just for grins and giggles, I decided to make a modification to the constructor and bySQL functions, to pass along the complete array representing the database row, rather than having to go back and query the database again:

class Name
{
  public $id;
  public $first_name;
  public $last_name;

  public function __construct($id = 0)
  {
    if (is_array($id)) {
      $this->id         = $id['id'];
      $this->first_name = $id['first_name'];
      $this->last_name  = $id['last_name'];
    
    } elseif ($id > 0) {
      $rst = mysql_query("select * from names where id = $id");
      $row = mysql_fetch_array($rst);
      $this->id         = $row['id'];
      $this->first_name = $row['first_name'];
      $this->last_name  = $row['last_name'];
    }
  }

  public static function bySQL($sql)
  {
    $rst = mysql_query($sql);
    $names = array();
    while($row = mysql_fetch_array($rst)) {
      $names[] = new Name($row);
    }
    return $names;
  }
}

Here was the test this time (select * instead of id):

foreach (Name::bySQL("select * from names") as $name) {
   //do something
}

The result was astounding:

time: 8.96280193329 (6-9 secs)
pmem: 34,094,768
mem: 3,207,200

Peak memory usage dropped only slightly, but the script execution time dropped by almost an order of magnitude. Clearly most of the time was being spent executing those 100,000 additional selects (duh!), and there was no memory cost associated with doing an initial select *.

Next I tried the Collection class detailed in I heart foreach, minus the mysql_connect and mysql_select_db function calls. The test looked like this:

foreach (new Collection("Name", "select id from names") as $name) {
  // do something
}

Once again, performance was surprising. I was expecting an all around performance and memory usage improvement, but it turns out the Collection class actually ran slower:

time: 62.3732869625
pmem: 77,040
mem: 76,944

Of course memory-wise, the Collection object kills. From 39MB to 77KB! Granted this was still using the select id design, so I modified Collection::cacheNext() to pass along the whole array (not just the id value):

$this->value = new $this->class_name($row);

Using the second version of the Name class above, the test became:

foreach (new Collection("Name", "select * from names") as $name) {
  // do something
}

And finally I got the results I was looking for. Memory usage was less, and time dropped from 60 seconds to 13.

time: 12.9740481377
pmem: 75,960
mem: 66,872

Moral of the story: select * from table is much faster than select id from table + select * from table where id = $id and isn’t burdened by any sort of memory tax.

Care to Comment?