Navigate back to the homepage

Improve query caching with IN clause padding

Mark Brown
October 31st, 2020 · 3 min read

👋 Introduction

In my experience, the number one cause of application performance problems is not your application code – it’s your persistence layer. Problems in this area can be caused by many different ‘sins’; Improper entity relationships (think LAZY vs EAGER fetching), Inefficient queries or indeed the cardinal sin – N+1 queries! A lot of the time the application can end up in this way as a result of a lack of awareness of what you are asking the persistence layer to do. Often, you’ll find that simply enabling logging of sql statements will open your eyes to the problem. In fact, when using a data access framework that generates statements on your behalf – it should be mandatory that you inspect the generated statements to ensure both their effectiveness, and their performance.

Today we’re going to talk about a simple optimisation trick that comes after you have done all you can do for the query, it’s been carefully developed, tested, and you have given consideration to query optimisation – now what? Well you can enable IN clause padding to get the query cache on your side so that less execution plans will be required for your query.

All code examples in this post are available on my GitHub.

🤨 The Problem

Execution plans

Execution plans (or query plans) are built by the query optimiser and is an attempt to calculate the most efficient way to process the request (the SQL query). SQL Server has to build an execution plan for each request that it has to execute. The execution plan is build based on several considerations;

The tables it needs to join. The Indexes to use The sub-queries it has to execute. How aggregations of GROUP BY are calculated. The estimated cost and load the operations place on the system. Other even more complex considerations. Based on combinations of the above, it cannot always be guaranteed that ‘best’ execution plan will be chosen – For example if your query is not making use of correct filtering, the optimiser may decide not to use any indexes to fulfil the query results. Therefore, it is imperative that you are producing carefully considered queries.

SQL has to put a lot of work to build an execution plan, so it caches the execution plan in memory to avoid having to do the same work over and over again.

The query cache

SQL Server uses the Query Cache to reuse plans. SQL Server can avoid the overhead of calculating the execution plan for each request and speed up the execution of the queries. The query cache allows SQL Server to reuse Execution Plans for subsequent requests. Within the query cache, additional information is captured, for example the number of times a query has been executed, resources used, etc, so inspecting the contents of the query cache can be a way of diagnosing performance issues in your query statements.

The query cache can be cleared by executing T-SQL commands, but this is not recommended in production code.

Default behaviour

Consider the following entity:

1@Entity
2public class Customer {
3
4 @Id
5 @GeneratedValue(strategy = GenerationType.IDENTITY)
6 private Long id;
7
8 private String firstName;
9
10 private String lastName;
11
12 // Getters/Setters omitted
13}

Now imagine you want to load multiple Customer entities by id , so you have written the following query:

1@Query("FROM Customer WHERE id IN :ids")
2Set<Customer> findAllById(@Param("ids") Collection<Long> ids);

When running the follow test cases:

1def "Find all by id"(Set<Long> ids) {
2 expect: "Size to be equal"
3 customerRepository.findAllById(ids).size() == ids.size()
4 where:
5 ids | _
6 [1L, 2L] | _
7 [2L, 3L] | _
8 [3L, 4L, 5L] | _
9 [1L, 2L, 3L, 4L, 5L] | _
10 [4L, 5L, 6L, 7L, 8L, 9L] | _
11 [4L, 5L, 6L, 7L, 8L, 9L, 10L] | _
12 [3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L] | _
13}

The following SQL statements are executed by Hibernate:

1select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ?)
2select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ?)
3select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ? , ?)
4select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ? , ? , ? , ?)
5select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ? , ? , ? , ? , ?)
6select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ? , ? , ? , ? , ? , ?)
7select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ? , ? , ? , ? , ? , ? , ?)

The query was executed 7 times in total in the test, only 2 of which will be sharing an execution plan as the other 5 queries had a differing number of bind parameters.

🧐 Enabling IN clause padding

If you enable the ‘padding’ property in Hibernate; hibernate.query.in_clause_parameter_padding, e.g;

1spring.jpa.properties.hibernate.query.in_clause_parameter_padding=true

And re-running the test cases, the following statements are executed.

1select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ?)
2select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ?)
3select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ? , ? , ?)
4select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ? , ? , ? , ? , ? , ? , ?)
5select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ? , ? , ? , ? , ? , ? , ?)
6select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ? , ? , ? , ? , ? , ? , ?)
7select customer0_.id as id1_1_, customer0_.first_name as first_na2_1_, customer0_.last_name as last_nam3_1_ from customer customer0_ where customer0_.id in (? , ? , ? , ? , ? , ? , ? , ?)

This time, only 3 execution plans are required – the statements being executed are using either 2, 4, or 8 bind parameters.

This is possible because Hibernate is now padding parameters to the next power of two. Test cases 4-7 in my example were very deliberately outlined to exaggerate this problem – each of those can now use just one query statement with 8 bind parameters, previously this was 4 separate query statements with 5, 6, 7, 8 parameters respectively.

🔥 Conclusion

As you can see, this one-liner is an easy way to boost your application’s statement caching potential – free performance gains!

So what’s the downside? Well, JDBC has a limit on the amount of parameters that can be bound – 2100 to be precise – so you need to be careful that rounding does not exceed this. Honestly though, if your queries do bind that many parameters, then you should probably also have a look at those too as that will be another source of performance issues.

More articles from Mark Brown

Using Testcontainers for Integration Testing

As a follow on from my last post about running sql server in docker, I thought i’d write about something that I have just introduced for the first time into one of my projects – in the hope that some of you feel as passionate as I do about testing your production code (and not just the equivalent in H2-specific syntax 😱). I hope that through this post I can show you how easy it is to include this dependency into your project, and be on your way to having your integration tests running against your production database!

October 20th, 2020 · 2 min read

Running Microsoft SQL Server.. on a Mac!

A few years ago I switched my main work machine from Windows to Mac – despite the reliance I have on SQL Server to run our application (Well, there’s also the dev-only H2 database, but even then I knew that wouldn’t fly due to some of the ‘subtle’ differences).

October 6th, 2020 · 1 min read
© 2020–2021 Mark Brown
Link to $https://twitter.com/marktjbrownLink to $https://github.com/mtjbLink to $https://instagram.com/marktjbrownLink to $https://www.linkedin.com/in/mark-brown-9952b4a8/