How to optimise your SQL Database to handle millions of records – part 2
Optimise your SQL database can be troublesome, let’s continue the series after the part 1.
Create indexes for your table columns
Yes, create indexes for your tables, but don’t overdo it.
- Create composite indexes if possible – don’t create multiple indexes for every single column
- Select the correct index type –use UNIQUE type for the column you want to be unique; use SPATIAL type for geometry data type column.
Make sure you only create indexes on those columns involved in filter or sort operations, don’t create any unused or unnecessary indexes, because indexes will consume your storage.
Use EXPLAIN to get the total records count
Previously, we did like this.
SELECT COUNT(id) FROM Table;
After that, we replaced it with this. It may not accurate most of the time, but it’s good enough for our case.
EXPLAIN SELECT id FROM Table;
Surprisingly, this little hack works like charm.
Use WHERE and ORDER BY clause only on indexed columns
Don’t use WHERE or ORDER BY clause on the non-indexed columns. Even you filter using the indexed columns, you must follows the index sequence, the order of the indexed columns do matter on the performance.
We indexed the first_name column follow by last_name column.
CREATE INDEX full_name ON table (first_name, last_name);
Following statements won’t perform index scanning even though the columns were indexed because they aren’t follow the indexed orders.
- SELECT * FROM table WHERE last_name = ‘Doe’ AND first_name = ‘John’;
- SELECT * FROM table ORDER BY last_name, first_name;
Instead, we should alway follows the indexed orders like such :
- SELECT * FROM table WHERE first_name = ‘John’ AND last_name = ‘Doe’;
- SELECT * FROM table ORDER BY first_name, last_name;
Avoid leading wildcard for LIKE operator
SELECT * FROM Users WHERE Email LIKE ‘%someone@hotmail%’;
LIKE operator with leading wildcard will not use index scan but will perform a full table scan due to the nature of B-Tree index behaviour.
Archive unused data
If your data are unused, archive it with any data storage/object storage if possible.
FIX your slow queries
Make sure you enable slow query logs for your SQL database, identify the slow queries and fix it.
Limit your recordset
Any query statement without a LIMIT clause can be very dangerous, it consumes more CPU and memory if the target table has over million records. It’s recommended to put a LIMIT clause on every query statement.
Don’t do this.
SELECT * FROM VeryBigTable;
Should do this instead.
SELECT * FROM VeryBigTable LIMIT 100;
No stored procedure
If you still using the stored procedure, remove it! Move those logic into your application layer, don’t place it inside your database, let your application handle the business logic instead.
Upgrade your SQL database
Upgrade your SQL database to the latest version. Yes, latest version of SQL database do ensure query performance improvement in most of the time. For example, MySQL 8.0 improved their query performance by 34% for columns which using charset of utf8mb4.
I often hear people say SQL database is bad, we should consider other alternative database, such as NoSQL database, wide-column database, etc. In my opinion, this is conceptually wrong, the reason is not the database is bad, but the developer is bad.
To better handle the data load in production, we do :
- We limit client side filtering and sorting, only allow client side to do filtering or sorting on the indexed columns.
- We did benchmark on certain SQL statements to check QPS.
GitHub - si3nloong/toxic-queries: Benchmark on toxic queries
Benchmark on toxic queries. Contribute to si3nloong/toxic-queries development by creating an account on GitHub.
- We use our own SQL ORM to avoid toxic queries.
GitHub - si3nloong/sqlike: Golang Sequel ORM that supports Enum, JSON, Spatial, and many more
A Golang SQL ORM which anti-toxic query and focus on the latest features. go get github.com/si3nloong/sqlike Fully…
Basically, we used different databases for different use cases.
In summary, there are no one solution to fit all problems, we should choose the best solution which suits our problem.