千万级别mysql合并表快速去重简析_MySQL
内容导读
互联网集市收集整理的这篇技术教程文章主要介绍了千万级别mysql合并表快速去重简析_MySQL,小编现在分享给大家,供广大互联网技能从业者学习和参考。文章包含4740字,纯文字阅读大概需要7分钟。
内容图文
![千万级别mysql合并表快速去重简析_MySQL](/upload/InfoBanner/zyjiaocheng/574/054e6ca8e9584271b43a6f975c038083.jpg)
bitsCN.com
千万级别mysql合并表快速去重简析 mysql合并表去重目标:现有表a和b,把两个表中的数据合并去重到c表中。其中a和b表中数据量大概在2千万左右。基本情况操作系统版本:CentOS release 5.6 64位操作系统内存:8G数据库版本:5.1.56-community 64位数据库初始化参数:默认 数据库表和数据量表a: mysql> desc a2kw;+-------+-------------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-------------+------+-----+---------+-------+| c1 | varchar(20) | YES | MUL | NULL | || c2 | varchar(30) | YES | | NULL | || c3 | varchar(12) | YES | | NULL | || c4 | varchar(20) | YES | | NULL | |+-------+-------------+------+-----+---------+-------+4 rows in set (0.00 sec)表bmysql> desc b2kw;+-------+-------------+------+-----+---------+-------+| Field | Type | Null | Key | Default | Extra |+-------+-------------+------+-----+---------+-------+| c1 | varchar(20) | YES | | NULL | || c2 | varchar(30) | YES | | NULL | || c3 | varchar(12) | YES | | NULL | || c4 | varchar(20) | YES | | NULL | |+-------+-------------+------+-----+---------+-------+4 rows in set (0.00 sec) a和b表的数据概况如下mysql> select * from a2kw limit 10;+-----------+-----------+------+----------+| c1 | c2 | c3 | c4 |+-----------+-----------+------+----------+| 662164461 | 131545534 | TOM0 | 20120520 || 226662142 | 605685564 | TOM0 | 20120516 || 527008225 | 172557633 | TOM0 | 20120514 || 574408183 | 350897450 | TOM0 | 20120510 || 781619324 | 583989494 | TOM0 | 20120510 || 158872754 | 775676430 | TOM0 | 20120512 || 815875622 | 631631832 | TOM0 | 20120514 || 905943640 | 477433083 | TOM0 | 20120514 || 660790641 | 616774715 | TOM0 | 20120512 || 999083595 | 953186525 | TOM0 | 20120513 |+-----------+-----------+------+----------+10 rows in set (0.01 sec) 基本步骤 1、在B表上创建索引mysql> select count(*) from b2kw;+----------+| count(*) |+----------+| 20000002 |+----------+1 row in set (0.00 sec)mysql> create index ind_b2kw_c1 on b2kw(c1);Query OK, 20000002 rows affected (1 min 2.94 sec)Records: 20000002 Duplicates: 0 Warnings: 0数据量为:20000002 ,时间为:1 min 2.94 sec2、把a、b分别插入中间表temp表中 创建中间表mysql> create table temp select * from c2kw where 1=2;Query OK, 0 rows affected (0.00 sec)Records: 0 Duplicates: 0 Warnings: 0插入数据mysql> insert into temp select * from a2kw;Query OK, 20000002 rows affected (13.23 sec)Records: 20000002 Duplicates: 0 Warnings: 0mysql> insert into temp select * from b2kw;Query OK, 20000002 rows affected (13.27 sec)Records: 20000002 Duplicates: 0 Warnings: 0 mysql> select count(*) from temp;+----------+| count(*) |+----------+| 40000004 |+----------+1 row in set (0.00 sec)数据量为:40000004 ,时间为:26.50 sec3、temp建立联合索引,强制索引去掉重复数据mysql> create index ind_temp_c123 on temp(c1,c2,c3);Query OK, 40000004 rows affected (3 min 43.87 sec)Records: 40000004 Duplicates: 0 Warnings: 0查看执行计划mysql> explain select c1,c2,c3,max(c4) from temp FORCE INDEX
(ind_temp_c123) group by c1,c2,c3 ;+----+-------------+-------+-------+---------------+----------
-----+---------+------+----------+-------+| id | select_type | table | type | possible_keys | key
| key_len | ref | rows | Extra |+----+-------------+-------+-------+---------------+-------------
--+---------+------+----------+-------+| 1 | SIMPLE | temp | index | NULL | ind_temp_c123 | 71
| NULL | 40000004 | |+----+-------------+-------+ -------+---------------+--------
-------+---------+------+----------+-------+1 row in set (0.05 sec) mysql> insert into c2kw select c1,c2,c3,max(c4) from temp
FORCE INDEX (ind_temp_c123) group by c1,c2,c3 ;Query OK, 20000004 rows affected (2 min 0.85 sec)Records: 20000004 Duplicates: 0 Warnings: 0实际大约花费实际为:6 min
4、删除中间表mysql> drop table temp;Query OK, 0 rows affected (0.99 sec)实际大约花费实际为:1 sec
5、建立c索引mysql> create index ind_c2kw_c1 on c2kw(c1);Query OK, 20000004 rows affected (49.74 sec)Records: 20000004 Duplicates: 0 Warnings: 0mysql> create index ind_c2kw_c2 on c2kw(c2);Query OK, 20000004 rows affected (1 min 47.20 sec)Records: 20000004 Duplicates: 0 Warnings: 0mysql> create index ind_c2kw_c3 on c2kw(c3);Query OK, 20000004 rows affected (2 min 42.02 sec)Records: 20000004 Duplicates: 0 Warnings: 0实际大约花费实际为:5分钟
6、清空a、b表mysql> truncate table a2kw;Query OK, 0 rows affected (1.15 sec)mysql> truncate table b2kw;Query OK, 0 rows affected (1.34 sec)实际大约花费实际为:3sec 一共花费的时间大概在15分钟左右 作者 RuleV5 bitsCN.com
内容总结
以上是互联网集市为您收集整理的千万级别mysql合并表快速去重简析_MySQL全部内容,希望文章能够帮你解决千万级别mysql合并表快速去重简析_MySQL所遇到的程序开发问题。
如果觉得互联网集市技术教程内容还不错,欢迎将互联网集市网站推荐给程序员好友。
内容备注
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 gblab@vip.qq.com 举报,一经查实,本站将立刻删除。
内容手机端
来源:【匿名】