百万级sql server数据库优化案例分享

        在我们的IT职业生涯中,能有一次百万级的数据库的优化经历是很难得的,如果你遇到了恭喜你,你的职业生涯将会更加完美,如果你遇到并解决了,那么一定足够你炫耀很多年。

        这里我将要分享一次完美的百万级数据库优化经历,希望能给在IT行业的小伙伴一点启发和帮助。本篇分三步走:1、遇到问题;2、分析问题;3、解决问题;

1、遇到问题

        我负责一个web项目,数据主表差不多500w的历史数据,最近老是听运营部门反馈系统卡顿,甚至发生过站点宕机的情况。

        这里省略一万字,经历了各种排查,最终定性为db性能问题,进入下一步。

2、分析问题

        面对上面的问题,经过了各种常规慢查询、sql跟踪查询分析、连接数查询,始终无法解决问题,后来通过各种渠道的打听,得到了以下解决方法。

        第一步,获取数据库表阻塞sql信息:

select * from (
		 SELECT   spid,
		 blocked,
		 DB_NAME(sp.dbid) AS DBName,
		 program_name,
		 waitresource,
		 lastwaittype,
		 sp.loginame,
		 sp.hostname,
		 a.[Text] AS [TextData],
		 SUBSTRING(A.text, sp.stmt_start / 2,
		 (CASE WHEN sp.stmt_end = -1 THEN DATALENGTH(A.text) ELSE sp.stmt_end
		 END - sp.stmt_start) / 2) AS [current_cmd]
		 FROM     sys.sysprocesses AS sp OUTER APPLY sys.dm_exec_sql_text (sp.sql_handle) AS A
) a join (select blocked as blocked2
			from (select * from sys.sysprocesses where blocked>0 ) a
			where not exists(select * from (select * from sys.sysprocesses where blocked>0 ) b
where a.blocked=spid)) b on a.blocked=b.blocked2

        第二步,获取当前执行所有sql信息:

SELECT  SPID = er.session_id,Status = ses.status,[Login] = ses.login_name,Host = ses.host_name,BlkBy = er.blocking_session_id,
        DBName = DB_Name(er.database_id),CommandType = er.command,SQLStatement = st.text,ObjectName = OBJECT_NAME(st.objectid),
        ElapsedMS = er.total_elapsed_time,CPUTime = er.cpu_time,IOReads = er.logical_reads + er.reads,IOWrites = er.writes,
        LastWaitType = er.last_wait_type,StartTime = er.start_time,Protocol = con.net_transport,ConnectionWrites = con.num_writes,
        ConnectionReads = con.num_reads,ClientAddress = con.client_net_address,Authentication = con.auth_scheme  
        FROM sys.dm_exec_requests er  OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) st  
                                      LEFT JOIN sys.dm_exec_sessions ses  ON ses.session_id = er.session_id  
                                      LEFT JOIN sys.dm_exec_connections con  ON con.session_id = ses.session_id  
        WHERE er.session_id > 50  ORDER BY  er.blocking_session_id DESC,er.session_id

      第三步,添加自动监控程序,每隔10s查询需要监控的数据库信息,然后把表阻塞和所有执行sql情况进行入库分析。

private void Process()
        {
            //获取itom监控数据
            string strBlock = @"select * from (
                                 SELECT   spid,
                                 blocked,
                                 DB_NAME(sp.dbid) AS DBName,
                                 program_name,
                                 waitresource,
                                 lastwaittype,
                                 sp.loginame,
                                 sp.hostname,
                                 a.[Text] AS [TextData],
                                 SUBSTRING(A.text, sp.stmt_start / 2,
                                 (CASE WHEN sp.stmt_end = -1 THEN DATALENGTH(A.text) ELSE sp.stmt_end
                                 END - sp.stmt_start) / 2) AS [current_cmd]
                                 FROM     sys.sysprocesses AS sp OUTER APPLY sys.dm_exec_sql_text (sp.sql_handle) AS A
                        ) a join (select blocked as blocked2
			                        from (select * from sys.sysprocesses where blocked>0 ) a
			                        where not exists(select * from (select * from sys.sysprocesses where blocked>0 ) b
                        where a.blocked=spid)) b on a.blocked=b.blocked2";
            string strBlockInfo = @"SELECT  SPID = er.session_id,Status = ses.status,[Login] = ses.login_name,Host = ses.host_name,BlkBy = er.blocking_session_id,
                                    DBName = DB_Name(er.database_id),CommandType = er.command,SQLStatement = st.text,ObjectName = OBJECT_NAME(st.objectid),
                                    ElapsedMS = er.total_elapsed_time,CPUTime = er.cpu_time,IOReads = er.logical_reads + er.reads,IOWrites = er.writes,
                                    LastWaitType = er.last_wait_type,StartTime = er.start_time,Protocol = con.net_transport,ConnectionWrites = con.num_writes,
                                    ConnectionReads = con.num_reads,ClientAddress = con.client_net_address,Authentication = con.auth_scheme  
                                    FROM sys.dm_exec_requests er  OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) st  
                                                                  LEFT JOIN sys.dm_exec_sessions ses  ON ses.session_id = er.session_id  
                                                                  LEFT JOIN sys.dm_exec_connections con  ON con.session_id = ses.session_id  
                                    WHERE er.session_id > 50  ORDER BY  er.blocking_session_id DESC,er.session_id";
            DataTable dtBlock = SqlHelper.ExecuteDataTable(connectionTTItom, CommandType.Text, strBlock);
            DataTable dtSqlInfo = SqlHelper.ExecuteDataTable(connectionTTItom, CommandType.Text, strBlockInfo);

            //数据入库
            if (dtBlock != null && dtBlock.Rows.Count > 0)
            {
                foreach (DataRow dr in dtBlock.Rows)
                {
                    try
                    {
                        Insert_BlockedInfo(dr, "itom");
                    }
                    catch (Exception ex)
                    {
                        _logger.Info("Insert_BlockedInfo方法异常,"+ex.ToString());
                    }
                    
                }
            }

            if (dtSqlInfo != null && dtSqlInfo.Rows.Count > 0)
            {
                foreach (DataRow dr in dtSqlInfo.Rows)
                {
                    try
                    {
                        Insert_SqlInfo(dr, "itom");
                    }
                    catch (Exception ex)
                    {
                        _logger.Info("Insert_SqlInfo方法异常," + ex.ToString());
                    }
                    
                }
            }
        }

3、解决问题

        有了以上的准备工作,有了所有的db执行情况数据,我们就很容易对db进行分析了。当业务反馈访问异常的时候,我们只需要打开监控数据,根据监控时间范围就能很容易得看到导致表阻塞的语句。

select top 200 * from dbo.Monitor_BlockedInfo where createtime>'xxxx' and createtime<'xxxx' order by id desc 

        除了查看当前阻塞语句,我们还可以根据当前时间点的并行sql数量来监控数据库高负载时间段,具体情况如下:

-- 先检查时间段内sql堆积数量
select convert(char(19),createtime,120),COUNT(1)
from dbo.Monitor_SqlInfo 
where createtime>'2023-07-17 08:00:00' --and createtime<'2023-06-21 08:59:00'
group by convert(char(19),createtime,120)
order by COUNT(1) desc

 

         最后,我们根据高负载时间点,查询当前时间点的所有sql执行情况,有了上面的数据,db的运行情况就被拿捏的稳稳的了。