by Tibor Köröcz and Francisco Bordenave

10 more essential MySQL performance tuning tips

feature
Nov 07, 202214 mins
DatabasesMySQLRelational Databases

Schema design, indexes, queries, configs, I/O... what could go wrong? Follow these 10 essential tips to keep your MySQL servers humming.

mechanic wrench fix engine repair
Credit: Thinkstock

MySQL is the world’s most widely used open source database, and ranks a close second in popularity among databases overall. It’s an effective relational database management system that has been at the heart of popular applications for years. However, it can be challenging to use and there are many opportunities to improve performance.

There have been some important new developments in the last few years for MySQL as well. This article updates a previous set of MySQL performance tuning tips provided by Baron Schwartz. Although the earlier article is still relevant, there are additional steps you can take to achieve the best performance for your MySQL deployment. Here are 10 more MySQL performance tuning tips to add to your list.

MySQL performance tip No. 1: Schema design is just as important as any other MySQL settings

Schema design is one of the most important things that you will do in your database. This is a cross relational database technology principle, as normal forms were introduced back in the 1970s. Since MySQL moved to InnoDB as the default storage engine in version 5.6, the schema design becomes even more important.

Why is this? In InnoDB, everything is a primary key! This relates to the way InnoDB organizes the data. In InnoDB, the primary key is clustered and every secondary key adds an entry pointer to the primary key. If you don’t take this into account in your schema design, then your performance will be negatively impacted.

The data is also stored using B-tree indexes, so inserting data in an ordered way (i.e. using quasi-sequential values) prevents primary key fragmentation and thus reduces I/O operations required to find leaf nodes.

There are some use cases where sequential primary keys are not the right choice — a good example here is the Universally Unique IDentifier or UUID. You can find a deeper dive into issues around UUIDs and primary keys here. However, generally speaking, we recommend using sequential primary keys for most use cases.

MySQL performance tip No. 2: Secondary keys are not your enemy

Secondary keys are updated by a background process. As a result, the performance impact is not as serious as you would expect. Instead, the problem is around disk footprint because adding secondary keys will increase the storage requirements.

Filtering on a field that does not have an index could result in a full table scan every time the query runs. This can, of course, result in a huge performance impact. It’s therefore better to have a secondary key than miss one.

That being said, you should not over-index your databases, as running many indexes may not provide the performance improvements you want to achieve. At the same time, these additional indexes may increase your storage costs, and InnoDB has to perform many background operations to keep them up to date.

MySQL performance tip No. 3: Rows can be served from indexes

InnoDB can find and actually serve rows directly from indexes, whereas a secondary key points to the primary key and the primary key contains the row itself. If the InnoDB Buffer Pool is big enough, it can hold most data in memory too. You can even use composite keys, which are usually more effective for queries than individual per-column keys. MySQL can use one index per table access, so if you are running queries with a clause like WHERE x=1 and y=2 then having an index over x,y is better than having individual indexes over each column.

Furthermore, a composite index over x,y also can improve the performance of the following query:

SELECT y FROM table WHERE x=1

MySQL will use the covering index and serve y from the index, which is in memory.

In practice, you can improve performance by using a composite index when you have the chance to do so. Whenever you’re designing indexes you need to think about them in the natural way that they are read. What this means is that indexes are read always from the left to right, so given a query like this:

SELECT a,b,c FROM table WHERE a=1 and b=2

Then an index over a,b will help with the query. But if the query is in this format:

SELECT a,b,c FROM table WHERE b=2

Then the index will be useless and will cause a full table scan. The idea of always reading the indexes from the left also applies to some other cases. For example, given the following query:

SELECT a,b,c FROM table WHERE a=1 and c=2

Then an index over a,b,c will read only the first column because there is no WHERE clause filtering by column b. So in this case MySQL can partially read the index, which is better than a full table scan, but still not good enough to get the best performance of the query.

Another element related to query design is the leftmost index approach, as this is a common optimization used in MySQL. For example, an index on a,b,c will not cover a query like select a,c where c=x because the query cannot skip the first part of the index, which is a,b. The same goes for a query like select c,count(c) where a=x group by c. This query cannot use the index on a,b,c for the group by because it cannot skip the index on b. However, if you have a query like select c,count(c) where a=x and b=y group by c, which filters on a,b and performs a group by on c, then one index on a,b,c can help with both the filtering and the group by.

MySQL performance tip No. 4 : Query reviews, query reviews, query reviews

Just having a Formula One car doesn’t win the race. Not if you put an inexperienced driver behind the wheel, and they crash it on the first corner. Similarly, you might have the best-tuned MySQL server on earth, but if you have bad queries your database will be slower than it should be.

You should regularly review your query design over time as your application changes with new features and bug fixes. The dataset and usage patterns of the application are likely to change over time as well, all of which can impact the query performance.

Setting aside time for query reviews and monitoring query execution time is very important. You can use a slow query log or Performance Schema for this, but implementing a monitoring tool will help you get even better data.

Keep in mind that it’s not always the slowest query that is the most important one to fix. For example, you might have a query that takes 30 seconds but runs twice a day alongside one that takes one second and runs 100 times a minute. For a big win, you should start optimizing the second query, as improving that one could save a lot of time and resources over the longer term.

MySQL performance tip No. 5: Visibility matters

Monitoring is one of the key elements of performance tuning. Without knowing the current workload and patterns it is hard to give any specific recommendations. In recent years, MySQL has improved its exposure of low-level MySQL/InnoDB metrics, which can help in understanding the workload.

For instance, in earlier versions, the Performance Schema was a bottleneck and had considerable impact, especially if you had many tables. In the recent versions of MySQL, many changes like the new Data Dictionary have improved performance, and now you can have many tables without significant impact.

Most of the modern monitoring tools are using Performance Schema in some way, so a good recommendation is to check out these tools and choose the one that best fits your needs. This visibility into performance data can be a huge asset in your investigations.

MySQL performance tip No. 6: Be careful with tuning tools

Some general recommendations given by tuning tools will work in most use cases. However, every workload and every schema is different. In some cases the general recommendations of tuning tools do not work, and it is wise to be careful when trusting these recommendations. Even innodb_dedicated_server, which is Oracle’s own tool and available in MySQL, can make questionable changes to the configuration.

For example, setting innodb_buffer_pool_size to 75% of total RAM is a good general rule of thumb. However, nowadays you can have servers with hundreds of gigabytes of RAM. If you have 512GB RAM, that will leave 128GB free and not dedicated to the buffer pool, which is a lot of waste.

innodb_log_file_size and innodb_log_files_in_group are defined based on the amount of RAM too. On servers with more than 128GB of RAM, this setting makes little sense as it will create 64 redo log files (yes, 64!) of 2GB each. This will result in 128GB of redo logs stored on disk. In most cases there is no need for such big redo log files, even in the busiest environments. This is therefore not a good recommendation.

innodb_flushing_method is the only value configured properly when automatic configuration is enabled. This variable sets the flushing method to O_DIRECT_NO_FSYNC, which is the recommended method when using Ext4 or XFS file systems, as it avoids double buffering of data.

A good recommendation would be to set innodb_buffer_pool_size to 75% or 80% on dedicated servers. On servers with large amounts of RAM, i.e., more than 128GB, increase this to 90% or even more with proper profiling of memory consumption. Similarly, for most cases with innodb_log_file_size and innodb_log_files_in_group, start with two files of 2GB each and monitor write log operations. Normally it is advisable to cover roughly one hour of writes when sizing redo logs.

Regarding innodb_flush_method, this option should be set to either O_DIRECT or O_DIRECT_NO_FSYNC for modern Linux file systems like Ext4 or XFS.

MySQL performance tip No. 7: I/O operations are still costly

MySQL and InnoDB try to minimize the number of I/O operations they carry out because accessing the storage layer is costly in terms of application performance. There are a few settings that can impact how many I/O operations InnoDB performs. Two of these settings are frequently misunderstood, and changing them will often cause performance issues.

innodb_io_capacity and innodb_io_capacity_max are variables that are related to the number of I/O operations for flushing in the background. Many customers increase the values of these settings to take advantage of modern SSDs that can provide very high I/O capacity at relatively low latencies. While the idea seems logical, increasing the I/O capacity settings can lead to a few problems.

The first issue is performance degradation by making InnoDB flush dirty pages too quickly, thus reducing the opportunity to modify a page more than once before being flushed. Keeping dirty pages in memory can significantly decrease the I/O operations needed to write data to storage.

Secondly, SSDs have an expected number of writes before they see a drop in performance. Increasing the amount of write operations can therefore affect the life span of your SSDs, even if you’re using high-end drives.

Cloud hosting is popular these days, and running your MySQL service instance in the cloud can be very useful. However, servers in the cloud will often have I/O limits or will charge more for using more I/O. By being aware of these limitations, you can carefully configure these parameters to make sure these limits are not reached and that I/O operations are minimized.

It’s important to mention innodb_lru_scan_depth as well because this setting controls how far down the buffer pool LRU page list the page cleaner thread scans for dirty pages to flush. If you have a write-heavy workload with a big buffer pool and many buffer pool instances, you can try decreasing this variable to use fewer I/O operations.

A good recommendation to follow is keep the defaults unless you know you need to change them.

It is also worth mentioning that the latest SSDs are specifically optimized for transactional databases. One example is Western Digital, which sought out expert assistance to help them meet the requirements for the new wave of applications being created.

MySQL performance tip No. 8: Take advantage of common table expressions

MySQL 8.0 saw the introduction of common table expressions (CTEs), which help to get rid of nested queries that will create derived tables. This new functionality allows you to create a custom query and reference the results as if they were a temporary table or a view. The difference is that CTEs can be referenced several times within a transaction without the need of explicitly creating and dropping them.

Given that CTEs are materialized only once, they tend to be faster in complex transactions that run multiple queries. Plus, CTE recursion is supported, so you can easily create complex structures in the SQL language like hierarchical models and series. If you want more details on CTEs, you’ll find an introduction here.

MySQL performance tip No. 9: Be aware of the cloud

There are many different cloud options worth considering for a MySQL deployment, from implementing a MySQL server instance in a VM that you manage, to using a database as a service (DBaaS) solution. The range of options is vast.

Many of these services promise to deliver a significant performance boost and to make all of your issues go away. In some simple use cases that may be true. However, even in the cloud, you have to know and understand the basic principles of databases, or your costs will increase significantly. This cost increase often happens because you are essentially solving problems by throwing more hardware at the problem rather than fixing the design.

All of the tips in this article are important and relevant in the cloud as well. It is important to monitor and understand your workload, to design proper schemas, and to change settings only when necessary. Tools like MySQL Enterprise Monitor and the open source Percona Monitoring and Management can be a great help for monitoring your servers.

MySQL performance tip No. 10: Keep the replicas up to date

In recent years there have been many improvements around MySQL replication, which could not keep up with the write workload on the primary server in many instances. In the latest MySQL major version, replication is parallel by default, which means that multiple replication threads are running and trying to apply transactions at the same time.

Of course the efficiency depends heavily on the write workload of your application, but in most cases parallel replication can help the replicas to keep up. You can control this with the replica_parallel_type and replica_parallel_workers variables. With the LOGICAL_CLOCK type, transactions are applied in parallel and dependencies are tracked based on the timestamps.

Overall, MySQL is the leading open source database for millions of developers and it will continue to be the platform of choice for many applications created around the world. By looking at issues around schema design, indexing, tuning, and I/O, you can improve the performance of your application dramatically. And don’t forget, deployment approaches like moving to the cloud will have their own performance implications too.

Tibor Köröcz is an architect for Percona, and Francisco Bordenave is a senior architect for Percona. They work on large-scale open source database deployments and services, covering MySQL, PostgreSQL, and MongoDB. Percona is a leading provider of unbiased open source database solutions.