23rd September 2005 @ 03:25 EDT

I just wanted to follow up on Davey's post about extending PDO, because I think he isn't being very clear about what he's doing (no offence, Davey!)

Davey's Cortex framework allows you to pass in either a DSN or an already-instantiated-PDO object to its data object class, and Davey's post claims, quite rightly, that it is faster to not create brand new connections each time you create an instance of his framework objects.

Let's see if we can come up with a slightly more scientific and perhaps more fair set of tests.

When benchmarking it's important to get some decent numbers. If your test is over in less than 1 second, your readings are probably wildly inaccurate because your system may not have had a chance to properly utilize hardware or OS level caches or otherwise adjust in the same way that it would be doing under a consistent level of load.

If you're running a quick test, try to make it last more than 10 seconds. If you want to get a better numbers, run the test for longer; 5 or 10 minutes should give pretty decent results for a given code fragment, and an hour is probably as good as you could ever hope for.

Here's a simple test harness that runs for approximately 5 minutes:

<?php
    $start = time();
    $deadline = $start + (5 * 60); // run for 5 minutes
    $iters = 0;
    do {    
      something();
      ++$iters;
    } while (time() <= $deadline);
    $end = time();
    $diff = $end - $start;
    printf("Ran %.2f iterations per minute (%d/%d)\n",
        (60.0 * $iters) / $diff, $iters, $diff);
    ?>

This harness simply repeats a task until the time limit is more or less up, and then summarizes how many times it managed to run within the specified time, normalizing it to iterations per minute.

Notice that I'm not particular bothered about sub-second time intervals here, because they don't really have much impact when compared to a 5 minute time duration--5 minutes plus or minus half a second is still near as damn it 5 minutes.

Our first test creates some kind of object that does some kind of work on a database connection. We'll make this one re-connect to the database each time; this is equivalent to Davey's extending PDO case:

<?php
   // represents some object in your framework
   class TestObject {
      var $db;
      function __construct($db) {
         $this->db = $db;
      }
      function doWork() {
         # Limited to 100 rows, because the connection cost
         # will be lost in the noise of the fetch otherwise
         array_reverse($this->db->query("select * from words LIMIT 100")->fetchAll());
      }
   }
   function something() {
       $db = new PDO($dsn, $user, $pass);
       $obj = new TestObject($db);
       $obj->doWork();
   }
   ?>

The next test uses the same test object class, but caches the PDO instance. This is equivalent to Davey's call proxying case:

<?php
   function something() {
       static $db = null;
       if ($db === null) $db = new PDO($dsn, $user, $pass);
       $obj = new TestObject($db);
       $obj->doWork();
   }
   ?>

The third test uses persistent connections; this is equivalent to Davey's extending PDO case, but "smarter"; even though $db falls out of scope and is destroyed at the end of each call to the something() function, the underlying connection is cached so that subsequent calls don't need to re-connect. This is transparent to the calling script, except for the extra parameter to the constructor, and is generally a very good thing to do with database connections:

<?php
   function something() {
       $db = new PDO($dsn, $user, $pass, array(PDO::ATTR_PERSISTENT => true));
       $obj = new TestObject($db);
       $obj->doWork();
   }
   ?>

Here are the results I got; since I'm lazy I'm only running mine for about 30 seconds each. I used a sqlite database with the contents of /usr/share/dict/words inserted into it (234937 words).

   one:   Ran 46734.19 iterations per minute (24146/31)
   two:   Ran 68504.52 iterations per minute (35394/31)
   three: Ran 64689.68 iterations per minute (33423/31)

The results speak for themselves; if you're initiating connections every time to you want to do some work, it's the slowest. If you cache the connection in a PHP variable it's faster than making persistent connections to PDO, because it doesn't need to create a new object each time. Persistent connections are "almost" as fast as caching in PHP variables; they need to create a new object but still reference the same connection internally.

It's work mentioning that benchmarks are tricky things. For instance, if you take out the "LIMIT 100" clause from the SELECT statement, the connection overhead time becomes so small in comparison to the time it takes to fetch the data, that all the tests wind up the same (about 18 iterations per minute with my test data). Similarly, if you limit the fetch to 1 row, you'll see a more exaggerated difference in the numbers, because the benchmark script is exercising your system differently.

If you're running against mysql, the differences between test one and test two will be greater because there is more overhead in establishing a connection over a socket than there is for sqlite to open up a file or two. You'll see a bigger difference again when connection to Oracle, because it does a pretty hefty amount of work at connection time.

The main lesson to be learned here is that benchmarking an artifical code fragment will give you artificial results; they can help you guage how fast something will run in general, but the answers you get back depend on the questions you're asking. If you don't ask appropriate or even enough questions (eg: Davey's quick tests didn't include persistent connections), you're not going to get all the information you need to tune your application effectively.

PS: There's a fourth test case that I didn't cover; it's important and probably yields the best results out of all the cases presented here. Anyone care to suggest what that case might be?

by Wez Furlong in .
Post a comment
23rd September 2005 @ 07:18 EDT

considering the nature of the benchmark it seems that the quickest would be to cache the dataset result and not bother with the connection or the query at all.

by boots in .
23rd September 2005 @ 12:42 EDT

but that's answering a slightly different question.

Dataset caching will massively improve the results for these tests, because they would only need to issue the query on the first time through. What you'll end up seeing if you do that is all three test cases running with approximately the same number of iterations per minute--dataset caching changes the system and you wind up measuring how fast PHP can copy arrays rather than how fast PDO can give you a working database connection.

Contrived benchmarks like this are tricky beasts, eh?

The missing 4th case doesn't stray very far from what I've already covered in the blog post; any other ideas?

23rd September 2005 @ 16:45 EDT

How about both caching the PDO instance and using persistent connections?

<?php
   function something() {
       static $db = null;
       if ($db === null) $db = new PDO($dsn, $user, $pass, array(PDO::ATTR_PERSISTENT => true));
       $obj = new TestObject($db);
       $obj->doWork();
   }
   ?>
by dkaylor in .
24th September 2005 @ 01:29 EDT

I expect the numbers for this case to be (near as damn it) identical to test 2, but this way is better because we're caching the connection for the next request to be run. Our test harness (as it stands right now) can't measure that gain, so the numbers will appear to be identical.

When running for real, the second and subsequent runs through the web server will shave off the connection overhead completely.

What have we learned? Two things:

  • It's good practice to cache expensive objects and resources in your script
  • It's good practice to use the persistent options for those objects/resources where available

We haven't necessarily learned enough to boldly state that this will improve performance by X% across the board. These kind of numbers are usually very flawed because of the limited scope of the contrived test cases. Suffice to say that the numbers we've seen are enough to prove that this is faster, within the scope of the tests that we ran.

(yes, that sounds like a lawyer talking :)

24th September 2005 @ 06:56 EDT

Hi wez. Yes, I was obviously skewing the benchmark with a out-of-band optimization (which is often de-rigor for benchmarks, unfortunately). As you noted, extraordinary gains come from caching results -- the rest is really just academic in real-world performance when dealing with good overall driver implementations. PDO obviously meets that expectation -- hats off to you for yet another fantastic contribution to PHP and the community in general.

Respect and many thanks!

by boots in .
Post a comment