Hugh Garnier de Blog

The Impact of Database Schema Changes

Feb 17, 2023

Once upon a time, I was tasked with improving the performance of a PostgreSQL database for a client. After analyzing the database, I found that a poorly designed schema was the root cause of the performance issues. I proposed schema changes, but the client was hesitant to make such significant modifications to their database. To convince them, I had to demonstrate how a redesigned schema could improve overall performance.

The first change I made was to denormalize some of the tables. Denormalization is a technique in which redundant data is added to a table to improve query performance. In the original schema there were two tables, one for customers and one for orders. Whenever a query fetched all orders for a particular customer together with the customer's details, the database had to join the two tables, which was expensive at the table sizes involved and slowed the database down. To improve performance, I denormalized the orders table by copying the customer information into it. This eliminated the join from those queries and improved their performance significantly.

Here is the SQL code for denormalizing the orders table:

-- Add a redundant copy of the customer's name to each order row
ALTER TABLE orders ADD COLUMN customer_name TEXT;

-- Backfill the new column from the customers table
UPDATE orders SET customer_name = customers.name
FROM customers
WHERE orders.customer_id = customers.id;
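
The one-time backfill above leaves open the question of keeping customer_name in sync as new orders arrive. Here is a minimal sketch of one way to handle that with a trigger; the function and trigger names are hypothetical, and EXECUTE FUNCTION requires PostgreSQL 11 or later:

-- Hypothetical trigger to populate customer_name on insert/update
CREATE OR REPLACE FUNCTION sync_order_customer_name() RETURNS trigger AS $$
BEGIN
    SELECT name INTO NEW.customer_name
    FROM customers WHERE id = NEW.customer_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_customer_name_sync
    BEFORE INSERT OR UPDATE OF customer_id ON orders
    FOR EACH ROW EXECUTE FUNCTION sync_order_customer_name();

Note that this only covers changes arriving through the orders table; if customer names themselves can change, a companion trigger on customers would be needed as well.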

The second change I made was to split large tables into smaller ones. The original schema kept every order in a single massive table, so even date-bounded queries had to scan far more data than they needed. To improve performance, I split the orders table into smaller per-year tables based on the order date, so that queries touching a single year could run against a much smaller table.

Here is the SQL code for splitting the orders table:

CREATE TABLE orders_2020 (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    order_date DATE NOT NULL,
    ...
);

CREATE TABLE orders_2021 (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    order_date DATE NOT NULL,
    ...
);
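
For what it is worth, recent PostgreSQL versions can express this kind of date-based split natively through declarative partitioning (available since PostgreSQL 10, with primary keys on partitioned tables since 11), which lets the planner skip irrelevant partitions automatically. A rough sketch, with the column list abbreviated like the tables above, and noting that the partition key must be part of the primary key:

CREATE TABLE orders (
    id BIGSERIAL,
    customer_id INTEGER NOT NULL,
    order_date DATE NOT NULL,
    PRIMARY KEY (id, order_date)
) PARTITION BY RANGE (order_date);

-- One partition per year; FROM is inclusive, TO is exclusive
CREATE TABLE orders_2020 PARTITION OF orders
    FOR VALUES FROM ('2020-01-01') TO ('2021-01-01');

CREATE TABLE orders_2021 PARTITION OF orders
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');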

The third change I made was to use indexes on frequently queried columns. In the original schema there were no indexes beyond the ones PostgreSQL creates automatically for primary keys, which meant that queries filtering on other columns, such as customer_id or order_date, had to scan the entire table to find the required data. Adding indexes on those frequently queried columns reduced the query execution time and improved the overall performance.

Here is the SQL code for adding indexes:

CREATE INDEX orders_customer_id_idx ON orders (customer_id);
CREATE INDEX orders_order_date_idx ON orders (order_date);
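
It is worth confirming that the planner actually uses a new index rather than assuming it does. A quick check with EXPLAIN, using an arbitrary example customer id:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- The plan should now report something like
--   Index Scan using orders_customer_id_idx on orders
-- instead of a sequential scan over the whole table.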

While making these changes, I had to ensure that I did not break any of the existing functionality of the database. Therefore, I tested each change thoroughly and made sure that all existing queries and applications continued to work as expected.

After implementing the changes to the database schema, I performed several performance tests to compare the before and after performance of the database. The results were impressive. The query time for some of the most frequently executed queries had decreased by more than 50%. The overall database performance had improved, and the client was pleased with the results.
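
One convenient way to run this kind of before-and-after comparison is the pg_stat_statements extension, which records per-statement timing. As a sketch: the extension must be listed in shared_preload_libraries, and the mean_exec_time column used here is the PostgreSQL 13+ name (older versions call it mean_time):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Ten slowest statements by average execution time
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;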

However, while making these changes, I was also mindful that schema changes can sometimes hurt performance. For example, on an earlier project a table had been denormalized to improve read performance, but the change increased the volume of writes: because the data was stored redundantly, every update had to touch the redundant copies as well. The result was a slower application and a larger database.
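
To make that trade-off concrete with the schema from earlier: once customer_name is duplicated into orders, renaming a customer changes one row in a normalized design but potentially thousands in the denormalized one:

-- Normalized: a single row changes
UPDATE customers SET name = 'New Name' WHERE id = 42;

-- Denormalized: every order for that customer must be rewritten too
UPDATE orders SET customer_name = 'New Name' WHERE customer_id = 42;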

Another issue I had encountered in the past came from splitting a table and indexing its frequently queried columns. While this improved query performance, it also increased the size of the database: indexes take up space of their own, and adding too many of them can cause the database size to balloon.
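
PostgreSQL makes this overhead straightforward to measure with its built-in size functions:

-- Combined size of all indexes on the orders table
SELECT pg_size_pretty(pg_indexes_size('orders'));

-- Size of one specific index
SELECT pg_size_pretty(pg_relation_size('orders_customer_id_idx'));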

To avoid these issues, it is important to carefully analyze the database schema and understand the application’s requirements before making any changes. It is also essential to test the changes thoroughly before implementing them in a production environment.

In addition to the changes I had made to the database schema, there are other techniques that can be used to improve PostgreSQL database performance. These include tuning the PostgreSQL configuration settings, optimizing queries, and using connection pooling. However, in my experience, changing the database schema is one of the most effective ways to improve database performance.
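
As a sketch of the first of those techniques, settings can be adjusted from SQL with ALTER SYSTEM rather than by editing postgresql.conf by hand; the values below are placeholders, not recommendations, since sensible numbers depend entirely on the hardware:

ALTER SYSTEM SET shared_buffers = '4GB';        -- takes effect only after a restart
ALTER SYSTEM SET work_mem = '64MB';             -- memory budget per sort/hash operation
ALTER SYSTEM SET effective_cache_size = '12GB'; -- planner's estimate of OS cache
SELECT pg_reload_conf();                        -- applies the reloadable settings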

To summarize, changing the database schema can have a significant impact on the performance of a PostgreSQL database. By denormalizing tables, splitting large tables into smaller ones, and using indexes on frequently queried columns, the performance of the database can be improved significantly. However, it is important to carefully analyze the database schema, test each change thoroughly, and avoid making changes that may have an adverse effect on performance. By following these best practices, you can optimize the performance of your PostgreSQL database and ensure that it meets the needs of your application.

Tags: performance postgresql schema