Navigate back to the homepage

How to introduce a Cartesian Product into your generated queries

Mark Brown
February 24th, 2021 · 1 min read

…and why you shouldn’t!

👋 Introduction

When working with an ORM, it is essential that you inspect your generated queries. Often when doing so you will find they are not entirely what you expected. The CriteriaBuilder API can be very useful for building dynamic queries, but you must take great care – especially if you let your UI generate many wonderful queries to fulfil all of its filtering/sorting/etc.

If you aren’t aware of what queries are being generated by your application, perhaps this post will show you that you should be – and also how easy it can be to get wrong.

🕵🏻‍♂️ Criteria Query

The problem

Below is an innocent looking CriteriaQuery; Find all Cars that are in a Garage, filtering to those matching the user input of searchTerm – in this case ‘Browns”.

1String searchTerm = "Browns";
2CriteriaBuilder cb = entityManager.getCriteriaBuilder();
3CriteriaQuery<Car> query = cb.createQuery(Car.class);
4Root<Car> carRoot = query.from(Car.class);
5
6query.select(carRoot)
7 .where(cb.and(
8 cb.equal(carRoot.get("garage").get("name"), searchTerm),
9 cb.equal(carRoot.get("make"), make)
10 ));
11
12return entityManager.createQuery(query)
13 .getResultList();

Can you spot the issue? Lets take a look at the generated query;

1select car0_.id as id1_0_,
2 car0_.garage_id as garage_i4_0_,
3 car0_.make as make2_0_,
4 car0_.model as model3_0_
5from car car0_
6cross join car_garage cargarage1_
7where car0_.garage_id=cargarage1_.id and
8 cargarage1_.name=? and
9 car0_.make=?

I was expecting an INNER JOIN based on the code, but instead a CROSS JOIN was generated instead. This join type produces a cartesian product between all Car entities and all Garage entities – this is very inefficient.

The fix

Luckily, these can easily be updated like so;

1String searchTerm = "Browns";
2CriteriaBuilder cb = entityManager.getCriteriaBuilder();
3CriteriaQuery<Car> query = cb.createQuery(Car.class);
4Root<Car> carRoot = query.from(Car.class);
5
6query.select(carRoot)
7 .where(cb.and(
8 cb.equal(carRoot.join("garage").get("name"), searchTerm),
9 cb.equal(carRoot.get("make"), make)
10 ));
11
12return entityManager.createQuery(query)
13 .getResultList();

Note this use of .join("garage") instead of .get("garage")!

With the updated code, the generated query is now like so;

1select car0_.id as id1_0_,
2 car0_.garage_id as garage_i4_0_,
3 car0_.make as make2_0_,
4 car0_.model as model3_0_
5from car car0_
6inner join car_garage cargarage1_ on car0_.garage_id=cargarage1_.id
7where cargarage1_.name=? and
8 car0_.make=?

Easy, right?

🏎 Conclusion

In this simple example, you can see how easy it is to neglect queries that are generated by your application – and how they can have a crippling effect on your application performance. Often when using the Criteria API to generate queries, the requirement is a lot more complicated than this example – so it is crucial that you are always aware of what is being generated by your application.

More articles from Mark Brown

Activate Hibernate Query Statistics to investigate your performance issues

Often, you’re may find yourself wondering why application performance has degraded – and almost always that issue originates from your persistence layer. You can interrogate your query statistics in SQL Server using some of the built in (Transact SQL) views – which have been invaluable before to me investigating issues in customer environments.

January 31st, 2021 · 1 min read

Improve query caching with IN clause padding

In my experience, the number one cause of application performance problems is not your application code – it’s your persistence layer. Problems in this area can be caused by many different ‘sins’; Improper entity relationships (think LAZY vs EAGER fetching), Inefficient queries or indeed the cardinal sin – N+1 queries! A lot of the time the application can end up in this way as a result of a lack of awareness of what you are asking the persistence layer to do. Often, you’ll find that simply enabling logging of sql statements will open your eyes to the problem. In fact, when using a data access framework that generates statements on your behalf – it should be mandatory that you inspect the generated statements to ensure both their effectiveness, and their performance.

October 31st, 2020 · 3 min read
© 2020–2021 Mark Brown
Link to $https://twitter.com/marktjbrownLink to $https://github.com/mtjbLink to $https://instagram.com/marktjbrownLink to $https://www.linkedin.com/in/mark-brown-9952b4a8/