Sunday, May 22, 2011

MySQL: ON DUPLICATE KEY UPDATE breaks generated key list of extended INSERT

When extended INSERT (insert with several tuples in one query) is used with ON DUPLICATE KEY UPDATE clause and there're unique key conflict the Connector/J's Statement will return incorrect list of generated identifiers that cannot be matched with tuples from INSERT query at all. There are steps to reproduce this issue:
  1. create a table with auto increment primary key and unique key:
    create table TEST_TABLE(
            id int auto_increment, 
            test_field varchar(255), 
            primary key (id), 
            unique key(test_field)
    ) engine=InnoDB;

  2. insert the following row:
    insert into TEST_TABLE(id, test_field) values(100, 'test1');

  3. perform an extended insert with ON DUPLICATE KEY UPDATE clause via JDBC driver:
    insert into TABLE(test_field) values('test2'),('test1'),('test3') 
    on duplicate key update id=LAST_INSERT_ID(id);
and check returned generated keys:
  • at first, you'll get 4 values instead of 3: [1, 2, 3, 4]
  • at second, there's no correct id for test1 row in returned list
Thus only for the first row returned id will be correct, if test1 will be first in a query whole list will be incorrect and cannot be matched with inserted rows. The only workaround is to use separate query to retrive generated identifiers to reduce count of Data Base access and avoid the split of this extended insert in separate queries. I've looked thru the code in com.mysql.jdbc.StatementImpl and found that It retrieves count of affected rows and just increments the value of LAST_INSERT_ID for each update and gets sequential list of identifiers. But in fact it can be nonsequential: for my case I expect the result set: [1, 100, 3] <-- 3 is here because of the 2nd value, or even [1, NULL, 3] will be normal. But current result: [1, 2, 3, 4] is very confused and cannot be matched with inserted rows.

In fact MySql returns unexpected count of updated rows in such case because of this behavior:
For INSERT ... ON DUPLICATE KEY UPDATE statements, the affected-rows value is 1 if the row is inserted as a new row and 2 if an existing row is updated.
I've reported new #61213 issue for MySQL Connector/J and according to Mark Matthews the fix raises a modification in MySQL server protocol because there's currently no way to get generated keys for whole data set though LAST_INSERT_ID returns only the first one. Thus be aware of this issue.
 
Blogged.com Technology Blogs