How to optimise your SQL Database to handle millions of records – part 2

SianLoong
4 min readJul 16, 2022

--

Optimise your SQL database can be troublesome, let’s continue the series after the part 1.

Create indexes for your table columns

Yes, create indexes for your tables, but don’t overdo it.

  • Create composite indexes if possible – don’t create multiple indexes for every single column
  • Select the correct index type –use UNIQUE type for the column you want to be unique; use SPATIAL type for geometry data type column.

Make sure you only create indexes on those columns involved in filter or sort operations, don’t create any unused or unnecessary indexes, because indexes will consume your storage.

Use EXPLAIN to get the total records count

Previously, we did like this.

SELECT COUNT(id) FROM Table;

After that, we replaced it with this. It may not accurate most of the time, but it’s good enough for our case.

EXPLAIN SELECT id FROM Table;

Surprisingly, this little hack works like charm.

Use WHERE and ORDER BY clause only on indexed columns

Don’t use WHERE or ORDER BY clause on the non-indexed columns. Even you filter using the indexed columns, you must follows the index sequence, the order of the indexed columns do matter on the performance.

For example:

We indexed the first_name column follow by last_name column.

CREATE INDEX full_name ON table (first_name, last_name);

Following statements won’t perform index scanning even though the columns were indexed because they aren’t follow the indexed orders.

  • SELECT * FROM table WHERE last_name = ‘Doe’ AND first_name = ‘John’;
  • SELECT * FROM table ORDER BY last_name, first_name;

Instead, we should alway follows the indexed orders like such :

  • SELECT * FROM table WHERE first_name = ‘John’ AND last_name = ‘Doe’;
  • SELECT * FROM table ORDER BY first_name, last_name;

Avoid leading wildcard for LIKE operator

SELECT * FROM Users WHERE Email LIKE ‘%someone@hotmail%’;

LIKE operator with leading wildcard will not use index scan but will perform a full table scan due to the nature of B-Tree index behaviour.

MySQL documentation do mention this

Archive unused data

If your data are unused, archive it with any data storage/object storage if possible.

Archive technique

FIX your slow queries

Make sure you enable slow query logs for your SQL database, identify the slow queries and fix it.

Identify your slow queries using slow query log

Limit your recordset

Any query statement without a LIMIT clause can be very dangerous, it consumes more CPU and memory if the target table has over million records. It’s recommended to put a LIMIT clause on every query statement.

Don’t do this.

SELECT * FROM VeryBigTable;

Should do this instead.

SELECT * FROM VeryBigTable LIMIT 100;

No stored procedure

If you still using the stored procedure, remove it! Move those logic into your application layer, don’t place it inside your database, let your application handle the business logic instead.

Benchmark on INSERT statement with stored procedure and without stored procedure

Upgrade your SQL database

Upgrade your SQL database to the latest version. Yes, latest version of SQL database do ensure query performance improvement in most of the time. For example, MySQL 8.0 improved their query performance by 34% for columns which using charset of utf8mb4.

MySQL 8.0 is perform better than MySQL 5.7 while using utf8mb4 collation

I often hear people say SQL database is bad, we should consider other alternative database, such as NoSQL database, wide-column database, etc. In my opinion, this is conceptually wrong, the reason is not the database is bad, but the developer is bad.

To better handle the data load in production, we do :

  • We limit client side filtering and sorting, only allow client side to do filtering or sorting on the indexed columns.
  • We did benchmark on certain SQL statements to check QPS.
  • We use our own SQL ORM to avoid toxic queries.
  • We use Elasticsearch for tracing, logging, searching and reporting operation.
  • We use Redis for caching the HOT data.
  • We use MongoDB to store unstructured data such as user reviews and comments.

Basically, we used different databases for different use cases.

In summary, there are no one solution to fit all problems, we should choose the best solution which suits our problem.

--

--

SianLoong

Frontend most of the time. Sometime backend. Sometime low-level.