如何在 MySQL 中進行 insert if not exists

相信很多 Developer 都會真心的幹譙懷疑為什麼 MySQL 經過這麼多年都沒有 PostgreSQL 的 INSERT … ON DUPLICATE KEY DO NOTHING 語法，偏偏這真的是一個很常會碰到的使用情境

所以接下來探討一些單一 SQL 語法方面可行的做法

準備

環境為 MySQL server 8.0.37

首先建立以下 table (只是為了情境驗證，合理性就先不計較)

drop table if exists test_user;
create table if not exists test_user (
    id int unsigned auto_increment primary key,
    user_id int unsigned not null,
    is_active boolean not null default 1,
    constraint uk_user_id_user unique (user_id)
) engine = innodb charset = utf8mb4 collate = utf8mb4_0900_ai_ci;

情境就是要在表中新增一筆 user 紀錄，如果該 user id 已經存在，就不做任何變更

每個情境測試前都會重新建立整個 table，以避免其他問題影響測試結果

取得 auto increment 數字前，可能需要 analyze table 以校正數據

`insert`

第一個方法就是直接 insert

mysql> insert into test_user (user_id) value (1);
Query OK, 1 row affected (0.01 sec)

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_schema = database() and table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              2 |
+------------+----------------+
1 row in set (0.00 sec)

mysql> insert into test_user (user_id) value (1);
ERROR 1062 (23000): Duplicate entry '1' for key 'test_user.uk_user_id_user'

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_schema = database() and table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              3 |
+------------+----------------+
1 row in set (0.00 sec)

可以看到功能完全可以達成

但是有以下兩個問題

第二次以後的 insert 就會直接噴 duplicate key error
即使第二次實際上並沒有變更資料，auto increment 還是會 +1

因為 insert 實際上會先按照原有的 auto increment 準備一個新的 row，然後 auto increment 會 +1，接著在檢查到 duplicate key 後，直接拋 duplicate key error 並 reject statement 的執行，auto increment 並不會 rollback

如果這些問題可以處理或忽略，也不妨考慮這個方式，因為語意上比較明確，也可以讓 client 針對各種可能拋出的錯誤做不同的處理

但有時候在某些考量下，例如 Java 的 catch exception 有其 overhead，還是希望有不會 raise duplicate key error 的作法

`insert ignore`

如果不希望 raise duplicate key error，首先可以考慮 insert ignore

mysql> insert ignore into test_user (user_id) value (1);
Query OK, 1 row affected (0.01 sec)

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_schema = database() and table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              2 |
+------------+----------------+
1 row in set (0.00 sec)

mysql> insert ignore into test_user (user_id) value (1);
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+---------------------------------------------------------+
| Level   | Code | Message                                                 |
+---------+------+---------------------------------------------------------+
| Warning | 1062 | Duplicate entry '1' for key 'test_user.uk_user_id_user' |
+---------+------+---------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_schema = database() and table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              3 |
+------------+----------------+
1 row in set (0.00 sec)

可以看到 insert ignore 的結果基本上與 insert 幾乎一樣，只差在原本第二次 insert 的時候會拋 duplicate key error，現在會被 ignore 關鍵字 suppress 成 duplicate key warning

這在使用上需要特別小心，因為 ignore 是真的會把所有 error 都忽略，可能會有些你意料之外的錯誤不應該忽略的也會被忽略

而且如果是 insert multiple rows 而且只有其中一些 row 有問題的狀況下，沒有問題的 rows 仍然會被成功 insert 不會整個 insert statement 被 reject，當然這個看狀況可能是缺點也有可能是優點

`replace`

不希望 raise duplicate key error 另一個選擇是採用 replace

mysql> replace into test_user (user_id) value (1);
Query OK, 1 row affected (0.01 sec)

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_schema = database() and table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              2 |
+------------+----------------+
1 row in set (0.00 sec)

mysql> replace into test_user (user_id) value (1);
Query OK, 2 rows affected (0.04 sec)

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  2 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_schema = database() and table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              3 |
+------------+----------------+
1 row in set (0.00 sec)

可以看到 replace 跟 insert 也幾乎一樣，只差在第二次 replace 時是 2 rows affected 並且不會有 error 或 warning，並且在第二次 replace 後，該筆資料的 id 被變更為 2

這是因為 replace 實際上是先做 delete 再做 insert，因此被影響的是 2 個 row，因此也不會有 duplicate key error，因為實際 insert 的時候，原本會 duplicate 的 row 已經被刪掉了

replace 除了 id 會被改變、auto increment 還是會 kind of 浪費一個以外，最大的問題是如果 id 被其他 table 設為 foreign key 的時候，可能會產生你意料之外的結果，例如 cascade delete

`insert … on duplicate key update`

但如果我們既不想 id 被改變，也不希望 raise duplicate key error/warning 呢？

那可以先考慮 insert ... on duplicate key update 的做法

mysql> insert into test_user (user_id) value (1) as new on duplicate key update user_id = new.user_id;
Query OK, 1 row affected (0.05 sec)

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_schema = database() and table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              2 |
+------------+----------------+
1 row in set (0.00 sec)

mysql> insert into test_user (user_id) value (1) as new on duplicate key update user_id = new.user_id;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_schema = database() and table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              3 |
+------------+----------------+
1 row in set (0.00 sec)

如此就達成既不會改變 id，也不會 raise duplicate key error/warning

但是仍然會造成 auto increment 被多 +1

另外，因為在 8.0.20 之後的 MySQL，values() 功能已 deprecated，所以改用 8.0.19 引入的 alias 語法以避免 warning

如果是 8.0.20 以前的版本，可以改採用 values() 寫法以避免 syntax error

insert into test_user (user_id) value (1) on duplicate key update user_id = values(user_id)

`insert … on duplicate key update id = id`

這個方式還有一個變種，網路流傳這樣寫，MySQL 會自動辨認如果有 duplicate key 要改作 update 也不會造成 auto increment +1，立刻就來做流言驗證！

mysql> insert into test_user (user_id) value (1) on duplicate key update id = id;
Query OK, 1 row affected (0.06 sec)

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              2 |
+------------+----------------+
1 row in set (0.00 sec)

mysql> insert into test_user (user_id) value (1) on duplicate key update id = id;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              3 |
+------------+----------------+
1 row in set (0.00 sec)

結論不用多說了，MYTH BUSTED!

`insert … select … not exists`

如果我們既不想 raise duplicate key error/warning，也不想改變 id，還不想造成 auto increment waste 呢？(小孩子才做選擇，我全都要)

直接上驗證結果

mysql> insert into test_user (user_id) select temp.* from (select 1 as user_id) as temp where not exists(select * from test_user where user_id = 1) limit 1;
Query OK, 1 row affected (0.06 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_schema = database() and table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              2 |
+------------+----------------+
1 row in set (0.00 sec)

mysql> insert into test_user (user_id) select temp.* from (select 1 as user_id) as temp where not exists(select * from test_user where user_id = 1) limit 1;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_schema = database() and table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              2 |
+------------+----------------+
1 row in set (0.00 sec)

在正常情況下，這個方式差不多已經是最優解，既不會 raise duplicate key error/warning，也不改變 id，還不造成 auto increment 浪費

但這是正常情況下，當你有 high concurrent 的情況下就不見得如此了

簡單來說是因為 insert ... select 不是保證 atomic 的，select 跟 insert 是分段進行

如果剛好有相同 unique key 的兩次 insert … select … not exists 在幾乎相同的時間執行，可能還是會形成兩個 select 都認為沒有 duplicate row 而執行 insert，因而造成其中一個 statement 噴出 duplicate key error

`insert … select … not exists … on duplicate key`

mysql> insert into test_user (user_id) select temp.* from (select 1 as user_id) as temp where not exists (select * from test_user where user_id = 1) limit 1 on duplicate key update id = id;
Query OK, 1 row affected (0.04 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              2 |
+------------+----------------+
1 row in set (0.00 sec)

mysql> insert into test_user (user_id) select temp.* from (select 1 as user_id) as temp where not exists (select * from test_user where user_id = 1) limit 1 on duplicate key update id = id;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              2 |
+------------+----------------+
1 row in set (0.00 sec)

這個方案算是結合 insert … select … not exists 跟 insert … on duplicate key，也就是當 insert … select … not exists 因為 concurrent 的關係準備要 insert 一個 duplicate row 的時候，就會改為執行 on duplicate key 的部分，而因為 insert … on duplicate key 是 atomic 的，所以這樣的操作不會拋出 duplicate key error/warning

在正常的情況下這個 statement 不會拋出 duplicate key error，id 不會變化，也不會造成 auto increment 浪費

但是在 concurrent 而形成其中一個 statement 會 insert duplicate row 的情況下，其實跟 insert … on duplicate key 的問題是相同的，就是有可能會造成 auto increment 浪費

`insert ignore … select … not exists`

mysql> insert ignore into test_user (user_id) select temp.* from (select 1 as user_id) as temp where not exists (select * from test_user where user_id = 1) limit 1;
Query OK, 1 row affected (0.05 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.01 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              2 |
+------------+----------------+
1 row in set (0.00 sec)

mysql> insert ignore into test_user (user_id) select temp.* from (select 1 as user_id) as temp where not exists (select * from test_user where user_id = 1) limit 1;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select * from test_user;
+----+---------+-----------+
| id | user_id | is_active |
+----+---------+-----------+
|  1 |       1 |         1 |
+----+---------+-----------+
1 row in set (0.00 sec)

mysql> select table_name, auto_increment from information_schema.tables where table_name = 'test_user';
+------------+----------------+
| TABLE_NAME | AUTO_INCREMENT |
+------------+----------------+
| test_user  |              2 |
+------------+----------------+
1 row in set (0.00 sec)

這個方案是將 insert … select … not exists 加上 ingore 關鍵字，也就式在 concurrent 時，如果發生 duplicate 的狀況，duplicate key error 會被 suppress 成 duplicate key warning

在一般狀況下同樣不會拋出 duplicate key error，造成 id 改變，或造成 auto increment 浪費

但是在 concurrent 而形成其中一個 statement 會 insert duplicate row 的情況下，其實跟 insert ignore 的問題是相同的，就是有可能會造成 auto increment 浪費，而且萬一有其他應該被處理的 error 也會同樣被 suppress

Conclusion

直接做表格幫大家整理結論，MySQL 並沒有一個完美解決方案，只能根據狀況做取捨

statement	raise duplicate key error/warning	auto increment burn	id change
`insert`	error and reject	N	N
`insert ignore`	warning only	Y	N
`replace`	NA	Y	Y
`insert … on duplicate key update`	NA	Y	N
`insert … select ... not exists`	possible error and reject when concurrent	possible when concurrent	N
`insert … select ... not exists … on duplicate key update id = id`	NA	possible when concurrent	N
`insert ignore … select … not exists`	possible warning when concurrent	possible when concurrent	N

另外，本文僅針對各種單一 statement 的方法的硬限制作討論，並不涉及各 statement 速度、空間利用，也不比較 multiple statement 甚至牽涉 client 端的解決方案，而且僅針對預設設定做比較，不涉及改動 isolation level 或 MySQL server config

Reference

Last modified on 2025-02-07