Thank you for attending my 22nd July 2016 webinar titled “Top Most Overlooked MySQL Performance Optimizations”. In this blog, I will provide answers to the Q&A for that webinar. Look out for Part II, coming soon.
For hardware, which disk RAID level do you suggest? Is RAID 5 suggested performance-wise and data integrity-wise?
RAID 5 comes with high write overhead: each small write turns into a sequence of four physical I/O operations, two reads (old data and old parity) and two writes (new data and new parity). This write penalty can noticeably affect performance on spinning disks, so in most cases we advise using alternative RAID levels. Use RAID 5 when disk capacity is more important than performance (e.g., archive databases that aren’t used often). On SSDs, where write performance is less of a problem but capacity is expensive, RAID 5 can help by wasting less disk space.
Regarding collecting table statistics, do you have any suggestions for analyzing large tables (over 300GB) since we had issues with MySQL detecting the wrong cardinality?
The MySQL optimizer chooses an execution plan (which you can inspect with EXPLAIN) based on table and index statistics. These statistics are re-estimated automatically, and you can re-estimate them explicitly by running the ANALYZE TABLE statement for the table, or the OPTIMIZE TABLE statement for InnoDB tables (which rebuilds the table and then performs an ANALYZE for the table).
When the MySQL optimizer is not picking the right index in EXPLAIN, the cause may be outdated or wrong statistics (optimizer bugs aside). When you optimize the table, you rebuild it so that the data is stored more compactly (assuming it has changed a lot in the past), and then statistics are re-estimated from a random sample of pages in the table. As a result, you get statistics that better match the data you have at the moment, which allows the optimizer to choose a better plan. Alternatively, when you add an explicit index hint, you reduce the optimizer's possible choices, so it can use a good enough plan even with wrong statistics.
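As a minimal sketch of the options described above, the statements below refresh the statistics, inspect the resulting cardinality estimates, and fall back to an index hint. The table orders, the column customer_id, and the index idx_customer are hypothetical names used only for illustration.

```sql
-- Hypothetical table, column, and index names; adjust to your schema.

-- Re-estimate index statistics without rebuilding the table.
ANALYZE TABLE orders;

-- Inspect the Cardinality column that the optimizer works from.
SHOW INDEX FROM orders;

-- Check which index the optimizer picks for the query.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- If the plan is still wrong, an index hint narrows the optimizer's choices.
EXPLAIN SELECT * FROM orders FORCE INDEX (idx_customer) WHERE customer_id = 42;
```

Comparing the two EXPLAIN outputs shows whether the hint changes the chosen index and the estimated row count.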
If you use versions 5.6.x or 5.7.x with InnoDB tables, there is a way to store and fix statistics once the plans are good. With Persistent Optimizer Statistics, the statistics are stored on disk and survive restarts; if you also disable automatic recalculation (innodb_stats_auto_recalc, or STATS_AUTO_RECALC=0 for a single table), they change only when you run ANALYZE TABLE. It’s recommended that you run ANALYZE TABLE to calculate statistics (if really needed) during off-peak time and make sure the table in question is not in use. Check this blog post too.
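One possible way to pin statistics for a single large table is sketched below. The table name orders is hypothetical, and the sample size of 64 pages is an assumed value, not a recommendation from the webinar. Sampling more pages generally gives better estimates at the cost of a slower ANALYZE TABLE.

```sql
-- Hypothetical table name; STATS_* are InnoDB table options in MySQL 5.6+.
ALTER TABLE orders
  STATS_PERSISTENT = 1,     -- keep statistics on disk
  STATS_AUTO_RECALC = 0,    -- do not recalculate them automatically
  STATS_SAMPLE_PAGES = 64;  -- assumed sample size, tune for your data

-- Recalculate statistics explicitly, ideally during off-peak time.
ANALYZE TABLE orders;

-- Review the stored estimates and when they were last updated.
SELECT index_name, stat_name, stat_value, sample_size, last_update
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'orders';
```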
Regarding the buffer pool, when do you think using multiple buffer pool instances makes sense?
Multiple InnoDB buffer pool instances were introduced in MySQL 5.5, where the default value of innodb_buffer_pool_instances was 1. In MySQL 5.6, the default value is 8. Enabling multiple instances is useful in highly concurrent workloads, as it may reduce contention on the global buffer pool mutexes. It helps improve scalability on multi-core machines: access to the buffer pool is split across all instances, so no single mutex controls all access. The innodb_buffer_pool_instances setting only takes effect when innodb_buffer_pool_size is set to at least 1GB, and the total specified size is divided among all the buffer pool instances. Further, innodb_buffer_pool_instances is not a dynamic option, so changing it requires a server restart.
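To see how the total buffer pool size is split, you can query the current settings. This is a sketch that assumes a MySQL 5.6 or later server; the column aliases are illustrative only.

```sql
-- Current number of instances, total size, and the share each instance gets.
SELECT
  @@innodb_buffer_pool_instances AS instances,
  @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS total_gb,
  @@innodb_buffer_pool_size / @@innodb_buffer_pool_instances / 1024 / 1024
    AS mb_per_instance;

-- Per-instance view: one row per buffer pool instance (POOL_SIZE is in pages).
SELECT POOL_ID, POOL_SIZE, FREE_BUFFERS, DATABASE_PAGES
FROM information_schema.INNODB_BUFFER_POOL_STATS;
```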
What do you mean by “PK is appended to secondary index”?
In InnoDB, each secondary index entry is stored along with the corresponding primary key value. InnoDB uses this primary key value to search for the row in the clustered index. So, the primary key columns are implicitly appended to every secondary index.
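The effect can be observed with EXPLAIN. The sketch below uses a hypothetical users table; the table, column, and index names are illustrative assumptions.

```sql
-- Hypothetical table for illustration.
CREATE TABLE users (
  id    INT NOT NULL,
  email VARCHAR(100) NOT NULL,
  name  VARCHAR(100),
  PRIMARY KEY (id),
  KEY idx_email (email)
) ENGINE=InnoDB;

-- idx_email is effectively stored as (email, id), so this query can be
-- answered from the secondary index alone. EXPLAIN typically reports
-- "Using index" in the Extra column.
EXPLAIN SELECT id FROM users WHERE email = 'a@example.com';

-- name is not part of idx_email, so InnoDB takes the id found in the
-- secondary index and looks up the full row in the clustered index.
EXPLAIN SELECT name FROM users WHERE email = 'a@example.com';
```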
About Duplicate Keys, if I have a UNIQUE KEY on two columns, is it ok then to set a key for each of these columns also? Or should I only keep the unique key on the columns and get rid of regular key on each column also?
As I mentioned during the talk, a composite index can be used only through its leftmost prefix. For example, if you have a UNIQUE INDEX on columns (A,B), this index is not used for lookups by the query below:
SELECT * FROM test WHERE B='xxx';
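For that query, you need a separate index on column B. A separate index on column A alone, however, would be redundant, because lookups on A can already use the leftmost prefix of (A,B).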
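The leftmost-prefix rule can be checked with EXPLAIN. This is a sketch only: the id and payload columns and the index names uk_a_b and idx_b are assumptions added so that the example is self-contained.

```sql
-- Hypothetical definition of the test table.
CREATE TABLE test (
  id      INT NOT NULL AUTO_INCREMENT,
  A       VARCHAR(20) NOT NULL,
  B       VARCHAR(20) NOT NULL,
  payload VARCHAR(100),
  PRIMARY KEY (id),
  UNIQUE KEY uk_a_b (A, B)
) ENGINE=InnoDB;

-- Both queries filter on the leftmost column, so uk_a_b is usable.
EXPLAIN SELECT * FROM test WHERE A = 'xxx';
EXPLAIN SELECT * FROM test WHERE A = 'xxx' AND B = 'yyy';

-- B is not a leftmost prefix of (A, B), so uk_a_b cannot be used for lookup.
EXPLAIN SELECT * FROM test WHERE B = 'xxx';

-- A separate index on B makes the lookup possible.
ALTER TABLE test ADD INDEX idx_b (B);
EXPLAIN SELECT * FROM test WHERE B = 'xxx';
```

Comparing the key column of the last two EXPLAIN outputs shows the difference before and after adding idx_b.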
Stay tuned for Part II where we finish up the second half of the Q&A.