Load object by id + database call or by array?
I closed my post I heart foreach with the following question:
What I don’t know is how the performance savings of select * from table compares to the memory savings of select id from table + select * from table where id = $id.
Tonight I decided to answer that question. First I installed Xdebug. I already had PEAR, so this did the trick:
sudo apt-get install php5-dev
sudo pecl install xdebug-beta
Then I added extension=xdebug.so to php.ini and restarted Apache. Thanks to Install xdebug for PHP5 on Ubuntu Feisty for the steps.
Then I had some fun. I downloaded datasets from Frequently Occurring First Names and Surnames From the 1990 Census, turned them into PHP arrays, and then generated a table with 100,000 records consisting of a primary key id, a random first name, and a random last name. Here’s a sample:
mysql> select * from names order by rand() limit 10;
+-------+------------+-----------+
| id    | first_name | last_name |
+-------+------------+-----------+
|  2535 | Lynelle    | Schmucker |
| 25092 | Arnette    | Treglia   |
| 92914 | Pandora    | Vonruden  |
| 84120 | Nicholas   | Granados  |
| 34593 | Roseline   | Salvitti  |
| 40300 | Blondell   | Guerard   |
| 50564 | Ivelisse   | Schuur    |
| 15215 | Lynetta    | Naiman    |
| 22300 | Antonia    | Lineback  |
| 70832 | Jesse      | Agoras    |
+-------+------------+-----------+
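(The post doesn’t include the script that built the table. As a rough sketch of how it might have been populated, assuming the census files have already been turned into $first_names and $last_names arrays and a MySQL connection is already open; none of this is the original code:)

// Hypothetical generation script: build the names table and fill it with
// 100,000 rows of random first/last name combinations.
mysql_query("create table names (
    id int unsigned not null auto_increment primary key,
    first_name varchar(30),
    last_name varchar(30)
)");
for ($i = 0; $i < 100000; $i++) {
    $first = mysql_real_escape_string($first_names[array_rand($first_names)]);
    $last  = mysql_real_escape_string($last_names[array_rand($last_names)]);
    mysql_query("insert into names (first_name, last_name) values ('$first', '$last')");
}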
As a starting point, I wanted to test the “array of objects” design and profile it using xdebug_time_index, xdebug_memory_usage, and xdebug_peak_memory_usage. The Xdebug documentation doesn’t really explain what those measure (seconds and bytes, I assume), but I figured they’d at least give me a way to judge relative performance and memory load.
So here was my reference code:
<?php
mysql_connect('localhost', 'root', '');
mysql_select_db('test');

class Name {
    public $id;
    public $first_name;
    public $last_name;

    public function __construct($id = 0) {
        if ($id > 0) {
            $rst = mysql_query("select * from names where id = $id");
            $row = mysql_fetch_array($rst);
            $this->id = $row['id'];
            $this->first_name = $row['first_name'];
            $this->last_name = $row['last_name'];
        }
    }

    public static function bySQL($sql) {
        $rst = mysql_query($sql);
        $names = array();
        while ($row = mysql_fetch_array($rst)) {
            $names[] = new Name($row['id']);
        }
        return $names;
    }
}
And here was the test:
foreach (Name::bySQL("select id from names") as $name) { // do something } print "time: " . xdebug_time_index() . "<br />"; print "pmem: " . xdebug_peak_memory_usage() . "<br />"; print "mem: " . xdebug_memory_usage() . "<br />";
The result was pretty scary. It took a long time to build and loop through all 100,000 objects:
time: 50.3534600735 (50-70 sec)
pmem: 39,267,200
mem: 3,205,784
Just for grins and giggles, I modified the constructor and bySQL functions to pass along the complete array representing the database row, rather than going back and querying the database again:
class Name {
    public $id;
    public $first_name;
    public $last_name;

    public function __construct($id = 0) {
        if (is_array($id)) {
            $this->id = $id['id'];
            $this->first_name = $id['first_name'];
            $this->last_name = $id['last_name'];
        } elseif ($id > 0) {
            $rst = mysql_query("select * from names where id = $id");
            $row = mysql_fetch_array($rst);
            $this->id = $row['id'];
            $this->first_name = $row['first_name'];
            $this->last_name = $row['last_name'];
        }
    }

    public static function bySQL($sql) {
        $rst = mysql_query($sql);
        $names = array();
        while ($row = mysql_fetch_array($rst)) {
            $names[] = new Name($row);
        }
        return $names;
    }
}
Here was the test this time (select * instead of id):
foreach (Name::bySQL("select * from names") as $name) { //do something }
The result was astounding:
time: 8.96280193329 (6-9 secs)
pmem: 34,094,768
mem: 3,207,200
Peak memory usage dropped only slightly, but the script execution time dropped by almost an order of magnitude. Clearly most of the time was being spent executing those 100,000 additional selects (duh!), and there was no memory cost associated with doing an initial select *.
Next I tried the Collection class detailed in I heart foreach, minus the mysql_connect and mysql_select_db function calls. The test looked like this:
foreach (new Collection("Name", "select id from names") as $name) { // do something }
Once again, performance was surprising. I was expecting an all-around improvement in performance and memory usage, but it turns out the Collection class actually ran slower:
time: 62.3732869625
pmem: 77,040
mem: 76,944
Of course, memory-wise the Collection object kills: from 39 MB down to 77 KB! Granted, this was still using the select id design, so I modified Collection::cacheNext() to pass along the whole array (not just the id value):
$this->value = new $this->class_name($row);
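(The Collection class itself is detailed in I heart foreach. For context, here’s a rough sketch of the idea: an Iterator-based class that runs the query once and only ever instantiates the object for the current row. The code below is my reconstruction under those assumptions, not the original class, but it shows where the modified cacheNext() line lives.)

class Collection implements Iterator {
    private $class_name;  // class to instantiate for each row
    private $sql;         // query that produces the rows
    private $rst;         // MySQL result resource
    private $key = 0;
    private $value = null;

    public function __construct($class_name, $sql) {
        $this->class_name = $class_name;
        $this->sql = $sql;
    }

    public function rewind() {
        // Run the query and cache the first object.
        $this->rst = mysql_query($this->sql);
        $this->key = 0;
        $this->cacheNext();
    }

    private function cacheNext() {
        if ($row = mysql_fetch_array($this->rst)) {
            // The modified line: hand the whole row to the constructor.
            $this->value = new $this->class_name($row);
        } else {
            $this->value = null;
        }
    }

    public function valid()   { return $this->value !== null; }
    public function current() { return $this->value; }
    public function key()     { return $this->key; }
    public function next()    { $this->key++; $this->cacheNext(); }
}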
Using the second version of the Name class above, the test became:
foreach (new Collection("Name", "select * from names") as $name) { // do something }
And finally I got the results I was looking for. Memory usage was lower, and the time dropped from over 60 seconds to 13.
time: 12.9740481377
pmem: 75,960
mem: 66,872
Moral of the story: select * from table is much faster than select id from table + select * from table where id = $id, and it isn’t burdened by any sort of memory tax.