PDO fetch performance problems with many bind parameters

  111694
August 26, 2020 13:02 dino.pejakovic@voxdiversa.hr (=?UTF-8?Q?Dino_Pejakovi=c4=87?=)
--------------E4D9AC3C93DB72AEAA49E019
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

Hi,


First time messing around with PHP internal code and mailing lists, so 
sorry if I'm in the wrong place or doing something wrong in general.


I recently noticed some weird performance issues while doing bulk 
inserts with prepared statements (single INSERT with a lot of VALUES) 
and using RETURNING clause to get back IDs and other columns.

So I wrote a little benchmark to insert 8000 random rows (3 columns 
each) into a table and spent some time tracking down why it's slow. 
Suprisingly it seems that INSERT itself takes 100-200ms, but 
fetch/fetchAll returning id and one of the columns takes 2-3 seconds.

I'm sending the benchmark script and postgres database schema used.


After digging around PHP source code (pulled master branch), the problem 
seems to be in PDO calling param_hook with PDO_PARAM_EVT_FETCH_PRE and 
again PDO_PARAM_EVT_FETCH_POST  for each fetched row, which causes 
param_hook to be executed for each row x each param twice. In my little 
benchmark inserting 8000 rows with 3 columns and returning 2 columns for 
each row that means param_hook is called 8000x3x8000x2 = 384 000 000 
times! So I took a look at pgsql_stmt_param_hook in 
ext/pdo_pgsql/pgsql_statement.c and it doesn't seem to do anything for 
PDO_PARAM_EVT_FETCH_PRE or PDO_PARAM_EVT_FETCH_POST. So if my 
understanding is correct, it's calling a function that does nothing 
meaningful 384 000 000 times, and this number grows exponentially with 
the number of rows and columns.

I'm using postgres, but looking at ext/pdo_mysql code this seems to also 
be the case for mysql's drivers, didn't benchmark it though.


As a test I commented out the lines dispatching those events in 
ext/pdo/pdo_stmt.c:

|diff --git a/ext/pdo/pdo_stmt.c b/ext/pdo/pdo_stmt.c||
||index 96f7574638..49703a7d68 100644||
||--- a/ext/pdo/pdo_stmt.c||
||+++ b/ext/pdo/pdo_stmt.c||
||@@ -592,9 +592,9 @@ static int do_fetch_common(pdo_stmt_t *stmt, enum 
pdo_fetch_orientation ori, zen||
||                return 0;||
||        }||
||||
||-       if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_PRE)) {||
||-               return 0;||
||-       }||
||+       // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_PRE)) {||
||+       //      return 0;||
||+       // }||
||||
||        if (!stmt->methods->fetcher(stmt, ori, offset)) {||
||                return 0;||
||@@ -605,9 +605,9 @@ static int do_fetch_common(pdo_stmt_t *stmt, enum 
pdo_fetch_orientation ori, zen||
||                return 0;||
||        }||
||||
||-       if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {||
||-               return 0;||
||-       }||
||+       // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {||
||+       //      return 0;||
||+       // }||
||||
||        if (do_bind && stmt->bound_columns) {||
||                /* update those bound column variables now */|

|
|

After this change, fetching takes ~5ms and nothing seems broken, but the 
whole INSERT/RETURNING is now 10 times faster.


Am I understanding this right? Could this be solved by letting each 
pdo_* driver set some kind of a flag (bitmask?) telling PDO which hooks 
it wants called from dispatch_param_event? For example for pdo_pgsql 
that flag would be |PDO_PARAM_EVT_FREE | PDO_PARAM_EVT_NORMALIZE | 
PDO_PARAM_EVT_ALLOC | PDO_PARAM_EVT_EXEC_PRE|, as other case statements 
seem empty.


--------------E4D9AC3C93DB72AEAA49E019
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 8bit


  

    
  
  
    

Hi,


First time messing around with PHP internal code and mailing lists, so sorry if I'm in the wrong place or doing something wrong in general.


I recently noticed some weird performance issues while doing bulk inserts with prepared statements (single INSERT with a lot of VALUES) and using RETURNING clause to get back IDs and other columns.

So I wrote a little benchmark to insert 8000 random rows (3 columns each) into a table and spent some time tracking down why it's slow. Suprisingly it seems that INSERT itself takes 100-200ms, but fetch/fetchAll returning id and one of the columns takes 2-3 seconds.

I'm sending the benchmark script and postgres database schema used.


After digging around PHP source code (pulled master branch), the problem seems to be in PDO calling param_hook with PDO_PARAM_EVT_FETCH_PRE and again PDO_PARAM_EVT_FETCH_POST  for each fetched row, which causes param_hook to be executed for each row x each param twice. In my little benchmark inserting 8000 rows with 3 columns and returning 2 columns for each row that means param_hook is called 8000x3x8000x2 = 384 000 000 times! So I took a look at pgsql_stmt_param_hook in ext/pdo_pgsql/pgsql_statement.c and it doesn't seem to do anything for PDO_PARAM_EVT_FETCH_PRE or PDO_PARAM_EVT_FETCH_POST. So if my understanding is correct, it's calling a function that does nothing meaningful 384 000 000 times, and this number grows exponentially with the number of rows and columns.

I'm using postgres, but looking at ext/pdo_mysql code this seems to also be the case for mysql's drivers, didn't benchmark it though.


As a test I commented out the lines dispatching those events in ext/pdo/pdo_stmt.c:

diff --git a/ext/pdo/pdo_stmt.c b/ext/pdo/pdo_stmt.c
index 96f7574638..49703a7d68 100644
--- a/ext/pdo/pdo_stmt.c
+++ b/ext/pdo/pdo_stmt.c
@@ -592,9 +592,9 @@ static int do_fetch_common(pdo_stmt_t *stmt, enum pdo_fetch_orientation ori, zen
                return 0;
        }
 
-       if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_PRE)) {
-               return 0;
-       }
+       // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_PRE)) {
+       //      return 0;
+       // }
 
        if (!stmt->methods->fetcher(stmt, ori, offset)) {
                return 0;
@@ -605,9 +605,9 @@ static int do_fetch_common(pdo_stmt_t *stmt, enum pdo_fetch_orientation ori, zen
                return 0;
        }
 
-       if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {
-               return 0;
-       }
+       // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {
+       //      return 0;
+       // }
 
        if (do_bind && stmt->bound_columns) {
                /* update those bound column variables now */


After this change, fetching takes ~5ms and nothing seems broken, but the whole INSERT/RETURNING is now 10 times faster.


Am I understanding this right? Could this be solved by letting each pdo_* driver set some kind of a flag (bitmask?) telling PDO which hooks it wants called from dispatch_param_event? For example for pdo_pgsql that flag would be PDO_PARAM_EVT_FREE | PDO_PARAM_EVT_NORMALIZE | PDO_PARAM_EVT_ALLOC | PDO_PARAM_EVT_EXEC_PRE, as other case statements seem empty.

--------------E4D9AC3C93DB72AEAA49E019--
  111695
August 26, 2020 16:38 php@beccati.com (Matteo Beccati)
Hi Dino,

On 26/08/2020 15:02, Dino Pejaković wrote:
> [snip] > ||+       // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {|| > ||+       //      return 0;|| > ||+       // }|| > || || > ||        if (do_bind && stmt->bound_columns) {|| > ||                /* update those bound column variables now */| > > After this change, fetching takes ~5ms and nothing seems broken, but the > whole INSERT/RETURNING is now 10 times faster. > > > Am I understanding this right? Could this be solved by letting each > pdo_* driver set some kind of a flag (bitmask?) telling PDO which hooks > it wants called from dispatch_param_event? For example for pdo_pgsql > that flag would be |PDO_PARAM_EVT_FREE | PDO_PARAM_EVT_NORMALIZE | > PDO_PARAM_EVT_ALLOC | PDO_PARAM_EVT_EXEC_PRE|, as other case statements > seem empty.
Nice find, thanks. I'll have a look into it, although I think anything we do in that area would target 8.1. Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
  111696
August 26, 2020 17:28 george.banyard@gmail.com ("G. P. B.")
Just that you are aware the emails from both of you went to my spam folder.

Back to the matter, this is the correct list to email Dino, however could
you
open a pull request on GitHub for this change so that more eyes (and
especially eyes from people which know PDO) can have a look?

Please provide the relevant information again in the pull request and
possibly host your files on a gist.

This can still target PHP 8.0 as this seems to be akin to a bug this
might even be applicable to PHP 7.3/7.4. Anyways, good catch.

Best regards

George P. Banyard
  111697
August 26, 2020 18:06 dino.pejakovic@voxdiversa.hr (Dino Pejakovic)
> Just that you are aware the emails from both of you went to my spam folder. Thanks for the info. If anyone has any idea why we went to spam, please
let us know. The reason I didn't create a pull request immediately is that I don't have a concrete fix. The diff that I sent just shows where I commented out the lines to confirm this was the problem. PDO_PARAM_EVT_FETCH_POST seems to be used by pdo_firebird so I can't just remove the hooks. Also, pdo_pgsql seems to be processing some kind of INPUT_OUTPUT parameters and parsing postgres bools. This is in a code path that is maybe still used while fetching: |if (PDO_PARAM_TYPE(param->param_type) == PDO_PARAM_BOOL &&|| ||            ((param->param_type & PDO_PARAM_INPUT_OUTPUT) != PDO_PARAM_INPUT_OUTPUT)) {|| ||            const char *s = zend_is_true(¶m->parameter) ? "t" : "f";|| ||            param->param_type = PDO_PARAM_STR;|| ||            zval_ptr_dtor(¶m->parameter);|| ||            ZVAL_STRINGL(¶m->parameter, s, 1);|| ||        }| I'm not familiar with what this does so I wanted someone more familiar with PDO to see it first. I don't think I can create a useful pull request that doesn't break anything. On 8/26/20 7:28 PM, G. P. B. wrote:
> Just that you are aware the emails from both of you went to my spam folder. > > Back to the matter, this is the correct list to email Dino, however could > you > open a pull request on GitHub for this change so that more eyes (and > especially eyes from people which know PDO) can have a look? > > Please provide the relevant information again in the pull request and > possibly host your files on a gist. > > This can still target PHP 8.0 as this seems to be akin to a bug this > might even be applicable to PHP 7.3/7.4. Anyways, good catch. > > Best regards > > George P. Banyard >
  111699
August 27, 2020 08:23 php@beccati.com (Matteo Beccati)
Hi George,

On 26/08/2020 19:28, G. P. B. wrote:
> Just that you are aware the emails from both of you went to my spam folder.
Thanks. Alas, big G doesn't like the fact I'm using my own email server, despite all DKIM/SPF/whatnot being properly configured, according to them.
> Back to the matter, this is the correct list to email Dino, however could > you > open a pull request on GitHub for this change so that more eyes (and > especially eyes from people which know PDO) can have a look? > > Please provide the relevant information again in the pull request and > possibly host your files on a gist. > > This can still target PHP 8.0 as this seems to be akin to a bug this > might even be applicable to PHP 7.3/7.4. Anyways, good catch.
You could be right. I've created a draft PR for review: Dino could you please benchmark it and get back with results? https://github.com/php/php-src/pull/6047 Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
  111701
August 28, 2020 07:18 php@beccati.com (Matteo Beccati)
Hi George,

/cc release managers - pls don't hate me ;-)

On 27/08/2020 10:23, Matteo Beccati wrote:
> > On 26/08/2020 19:28, G. P. B. wrote: >> This can still target PHP 8.0 as this seems to be akin to a bug this >> might even be applicable to PHP 7.3/7.4. Anyways, good catch. > > You could be right. I've created a draft PR for review: Dino could yo> please benchmark it and get back with results?> > https://github.com/php/php-src/pull/6047
The PR seems to fix the issue: https://bugs.php.net/bug.php?id=80027 Dino's bench script takes 3s on vanilla PHP8 and 120ms with the patch, so it's seems a fairly good win. The fetching part alone goes down from 2.9s to 3ms. The way it's been fixed should be backwards and forwards compatible with no real need to bump PDO_VERSION_API: external PDO driver extensions wanting to use the param_evt_skip flags could simply set them via a preprocessor macro when compiled for PHP8+. For now I've optimised the pdo_pgsql/mysql/sqlite extensions, but I will look into the other bundled ones too. Would you think it's sensible to treat this as a bug fix and target 7.3+? Or is it better to do PHP8 only? Or? Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
  111702
August 28, 2020 11:20 cmbecker69@gmx.de ("Christoph M. Becker")
On 28.08.2020 at 09:18, Matteo Beccati wrote:

> Hi George, > > /cc release managers - pls don't hate me ;-) > > On 27/08/2020 10:23, Matteo Beccati wrote: >> >> On 26/08/2020 19:28, G. P. B. wrote: >>> This can still target PHP 8.0 as this seems to be akin to a bug this >>> might even be applicable to PHP 7.3/7.4. Anyways, good catch. >> >> You could be right. I've created a draft PR for review: Dino could yo> please benchmark it and get back with results?> > > https://github.com/php/php-src/pull/6047 > The PR seems to fix the issue: > > https://bugs.php.net/bug.php?id=80027 > > Dino's bench script takes 3s on vanilla PHP8 and 120ms with the patch, > so it's seems a fairly good win. The fetching part alone goes down from > 2.9s to 3ms. > > The way it's been fixed should be backwards and forwards compatible with > no real need to bump PDO_VERSION_API: external PDO driver extensions > wanting to use the param_evt_skip flags could simply set them via a > preprocessor macro when compiled for PHP8+. > > For now I've optimised the pdo_pgsql/mysql/sqlite extensions, but I will > look into the other bundled ones too. > > Would you think it's sensible to treat this as a bug fix and target > 7.3+? Or is it better to do PHP8 only? Or?
Might be better to do it for PHP 8 only, or maybe for PHP 7.4 as well. I'm not strictly against doing it for PHP 7.3, though. -- Christoph M. Becker
  111703
August 28, 2020 12:50 carusogabriel@php.net (Gabriel Caruso)
On Fri, 28 Aug 2020 at 09:19, Matteo Beccati <php@beccati.com> wrote:

> Hi George, > > /cc release managers - pls don't hate me ;-) > > On 27/08/2020 10:23, Matteo Beccati wrote: > > > > On 26/08/2020 19:28, G. P. B. wrote: > >> This can still target PHP 8.0 as this seems to be akin to a bug this > >> might even be applicable to PHP 7.3/7.4. Anyways, good catch. > > > > You could be right. I've created a draft PR for review: Dino could yo> > please benchmark it and get back with results?> > > https://github.com/php/php-src/pull/6047 > The PR seems to fix the issue: > > https://bugs.php.net/bug.php?id=80027 > > Dino's bench script takes 3s on vanilla PHP8 and 120ms with the patch, > so it's seems a fairly good win. The fetching part alone goes down from > 2.9s to 3ms. > > The way it's been fixed should be backwards and forwards compatible with > no real need to bump PDO_VERSION_API: external PDO driver extensions > wanting to use the param_evt_skip flags could simply set them via a > preprocessor macro when compiled for PHP8+. > > For now I've optimised the pdo_pgsql/mysql/sqlite extensions, but I will > look into the other bundled ones too. > > Would you think it's sensible to treat this as a bug fix and target > 7.3+? Or is it better to do PHP8 only? Or? > > > Cheers > -- > Matteo Beccati > > Development & Consulting - http://www.beccati.com/
Hello Matteo Let's have this patch merged into PHP 8.0 (`master` branch as of today), I have nothing against it. About PHPs 7.3 and 7.4: if the performance improvements are this massive as describe in this email, I'm :+1: to have it on those versions as well. Kind regards,
  111704
August 28, 2020 13:51 dino.pejakovic@voxdiversa.hr (=?UTF-8?Q?Dino_Pejakovi=c4=87?=)
The script I used to pinpoint this problem and test my temporary fix, 
and later Matteo's PR is linked in the bug report I oppened. Should be 
simple to run if anyone is interested in trying. I found this on 
pdo_pgsql, but looking at the code it looks like the problem should 
happen on other pdo_* drivers too, I can do more tests if it matters.


On 28. 08. 2020. 14:50, Gabriel Caruso wrote:
> On Fri, 28 Aug 2020 at 09:19, Matteo Beccati <php@beccati.com> wrote: > >> Hi George, >> >> /cc release managers - pls don't hate me ;-) >> >> On 27/08/2020 10:23, Matteo Beccati wrote: >>> On 26/08/2020 19:28, G. P. B. wrote: >>>> This can still target PHP 8.0 as this seems to be akin to a bug this >>>> might even be applicable to PHP 7.3/7.4. Anyways, good catch. >>> You could be right. I've created a draft PR for review: Dino could yo> >> please benchmark it and get back with results?> > >> https://github.com/php/php-src/pull/6047 >> The PR seems to fix the issue: >> >> https://bugs.php.net/bug.php?id=80027 >> >> Dino's bench script takes 3s on vanilla PHP8 and 120ms with the patch, >> so it's seems a fairly good win. The fetching part alone goes down from >> 2.9s to 3ms. >> >> The way it's been fixed should be backwards and forwards compatible with >> no real need to bump PDO_VERSION_API: external PDO driver extensions >> wanting to use the param_evt_skip flags could simply set them via a >> preprocessor macro when compiled for PHP8+. >> >> For now I've optimised the pdo_pgsql/mysql/sqlite extensions, but I will >> look into the other bundled ones too. >> >> Would you think it's sensible to treat this as a bug fix and target >> 7.3+? Or is it better to do PHP8 only? Or? >> >> >> Cheers >> -- >> Matteo Beccati >> >> Development & Consulting - http://www.beccati.com/ > > > Hello Matteo > > Let's have this patch merged into PHP 8.0 (`master` branch as of today), I > have nothing against it. > > About PHPs 7.3 and 7.4: if the performance improvements are this massive as > describe in this email, I'm :+1: to have it on those versions as well. > > Kind regards, >
  111705
August 28, 2020 18:36 php@beccati.com (Matteo Beccati)
Hi Gabriel,

> Let's have this patch merged into PHP 8.0 (`master` branch as of today), I > have nothing against it.> > About PHPs 7.3 and 7.4: if the performance improvements are this massive as > describe in this email, I'm :+1: to have it on those versions as well.
The mileage may very: running the Doctrine test suite, for example, shows no difference at all. On the other side of the spectrum, a SELECT * FROM tbl WHERE id IN (?, ....) and 65535 parameters: https://gist.github.com/mbeccati/6f2cd094c38b094cb4bfc4c0245a23ab root@jameson /tmp # /tmp/php-7.4-std bench_pdo3.php Got rows: 65535 [Timings] prep: 0.018, exec: 0.210, fetch: 35.812 root@jameson /tmp # /tmp/php-7.4-opt bench_pdo3.php Got rows: 65535 [Timings] prep: 0.019, exec: 0.208, fetch: 0.026 That's almost 36 seconds vs 26ms. Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
  111706
August 28, 2020 20:57 cmbecker69@gmx.de ("Christoph M. Becker")
On 28.08.2020 at 20:36, Matteo Beccati wrote:

> Hi Gabriel, > >> Let's have this patch merged into PHP 8.0 (`master` branch as of today), I >> have nothing against it.> >> About PHPs 7.3 and 7.4: if the performance improvements are this massive as >> describe in this email, I'm :+1: to have it on those versions as well. > > The mileage may very: running the Doctrine test suite, for example, > shows no difference at all. > > On the other side of the spectrum, a SELECT * FROM tbl WHERE id IN (?, > ....) and 65535 parameters: > > https://gist.github.com/mbeccati/6f2cd094c38b094cb4bfc4c0245a23ab > > root@jameson /tmp # /tmp/php-7.4-std bench_pdo3.php > Got rows: 65535 > [Timings] prep: 0.018, exec: 0.210, fetch: 35.812 > > root@jameson /tmp # /tmp/php-7.4-opt bench_pdo3.php > Got rows: 65535 > [Timings] prep: 0.019, exec: 0.208, fetch: 0.026 > > That's almost 36 seconds vs 26ms.
Can we be certain that the relevant bits of the formerly _reserved_flags are zero-filled for all existing drivers? -- Christoph M. Becker
  111707
August 29, 2020 07:30 php@beccati.com (Matteo Beccati)
Hi Christoph,

On 28/08/2020 22:57, Christoph M. Becker wrote:
> Can we be certain that the relevant bits of the formerly _reserved_flags > are zero-filled for all existing drivers?
We can, that's basically the premise of the PR itself: 1. The flags are in the pdo_dbh_t struct which is the dbh->inner part 2. dbh->inner = ecalloc(1, sizeof(pdo_dbh_t)): 3. stmt->dbh is set as dbh->inner, in prepare and query 4. drivers can eventually set some bits to 1. A few quick links: https://github.com/php/php-src/blob/658f6ff2dae42c4f2b5c6f8d6adc8c8f09b52b47/ext/pdo/php_pdo_driver.h#L514 https://github.com/php/php-src/blob/658f6ff2dae42c4f2b5c6f8d6adc8c8f09b52b47/ext/pdo/pdo_dbh.c#L562 https://github.com/php/php-src/blob/658f6ff2dae42c4f2b5c6f8d6adc8c8f09b52b47/ext/pdo/pdo_dbh.c#L1085 https://github.com/php/php-src/blob/658f6ff2dae42c4f2b5c6f8d6adc8c8f09b52b47/ext/pdo/pdo_dbh.c#L1481 Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
  111736
August 31, 2020 07:19 cmbecker69@gmx.de ("Christoph M. Becker")
On 29.08.2020 at 09:30, Matteo Beccati wrote:

> Hi Christoph, > > On 28/08/2020 22:57, Christoph M. Becker wrote: >> Can we be certain that the relevant bits of the formerly _reserved_flags >> are zero-filled for all existing drivers? > > We can, that's basically the premise of the PR itself: > > 1. The flags are in the pdo_dbh_t struct which is the dbh->inner part > 2. dbh->inner = ecalloc(1, sizeof(pdo_dbh_t)): > 3. stmt->dbh is set as dbh->inner, in prepare and query > 4. drivers can eventually set some bits to 1. > > A few quick links: > > https://github.com/php/php-src/blob/658f6ff2dae42c4f2b5c6f8d6adc8c8f09b52b47/ext/pdo/php_pdo_driver.h#L514 > https://github.com/php/php-src/blob/658f6ff2dae42c4f2b5c6f8d6adc8c8f09b52b47/ext/pdo/pdo_dbh.c#L562 > https://github.com/php/php-src/blob/658f6ff2dae42c4f2b5c6f8d6adc8c8f09b52b47/ext/pdo/pdo_dbh.c#L1085 > https://github.com/php/php-src/blob/658f6ff2dae42c4f2b5c6f8d6adc8c8f09b52b47/ext/pdo/pdo_dbh.c#L1481
Thanks! So there only would be a problem, if a driver modifies _reserved_flags (what they're not supposed to do). Looks to be okay then for the stable release branches. -- Christoph M. Becker