A Division of Technology Associates International Corporation
Maximo Blog

Performance Tuning SQL statements: The From Clause

September 25, 2008 in SQL, SQL Server by Michael Chrisman 0 Comments

I want to talk about techniques for speeding up SQL statements. As a database developer, this is the biggest thing I am asked to do improve the performance of my application. Like most developers, I have found that performance tuning SQL Statements involved some type of Voodoo magic. I have even found most DBA not very helpful in this area (I believe it is because, other than creating indexes, they don't have a clue how to make it better either). This started me on a path to found out. Of all the material I have read, only two things are consistent: indexes and trial and error. I really think the trial and error is the most importing one. Anytime you try to performance tune your SQL, always use some tool that will tell you how long it took to execute the SQL state. Then get a baseline before you start.

To understand how to speed up a query, you have to understand a little on how the database processes them. The first step is that it processes the FROM clause. It has to build the record set before it can proceed with the SQL statement. Next comes the WHERE clause. This eliminates records from the record set returned by the FROM clause. Next it processes the SELECT clause and then lastly it processed the ORDER BY clause.

Because of this hierarchical processing, the more records you can eliminate in the FROM clause, the fewer records that will have to be processed in the where clause. Tuning the FROM clause has the greatest potential for improving the performance of your SQL Statement.

ELIMATING RECORDS IN THE FROM CLAUSE

First of all, you should be doing all joins in the from clause. Just about all databases now support the ANSI method of doing joins in the FROM clause. It may not be provide the much performance increase, but it does allow you do some record elimination in the from clause. For example, let's say we are joining the sales database to the employee database and we want sales from California. Normally you would write the SQL like this:

   1:  SELECT
   2:  FROM sales INNER JOIN employee ON sales.empid = employee.empid
   3:  WHERE sales.state = 'CA'
   4:  AND

Well, if we have a million+ records in sales, but only 100,000 from California, then this statement will process all 1 million records with the WHERE clause. However, we can eliminate all the non-California records in the FROM clause. That SQL statement would look like this:

   1:  SELECT
   2:  FROM sales INNER JOIN employee on sales.empid = employee.empid AND sales.state = 'CA'
   3:  WHERE

Obviously, this simple version doesn't look like it would help much, but where this really helps is you have a complex WHERE clause. For example, let's do the same query, but this time we also need all sales for last year. Having the FROM clause eliminate 900,000 records before we do the date check would greatly help.

Another place it can help is when dealing with indexes. Lets look at all sales for last year for California.

   1:  SELECT
   2:  FROM sales
   3:  WHERE state = 'CA'
   4:  AND YEAR(salesDate) = 2007;

You find that your SQL is slow and is doing a full table scan (that's bad). You tell your DBA that you need an index on sales for state and the YEAR(salesDate). Your DBA just laughs at you and goes back to doing what he was doing before. As you sit at your desk, calling into question his heritage, you play with the query and find that there is an index on just state and without the YEAR(salesDate), it returns the 100,000 records in under a second. Knowing this, you could re-write your SQL statement to take advantage of this:

   1:  SELECT
   2:  FROM (SELECT * FROM sales WHERE state = 'CA') sales
   3:  WHERE YEAR(salesDate) = 2007;

Now your SQL statement uses the index to eliminate 900,000 records and you only have to scan 100,000. Be careful here, you will gain nothing by moving your entire WHERE clause into a sub-select in the FROM clause. As always, trial and error rules the day.

ELIMATING OUTER JOINS

Outer joins are very costly. Having to do a join that may or may not be there just takes time. If however, you are just joining to the table to pull a couple of values for the SELECT clause and you are not using them in the WHERE clause, then you can actually remove the Outer join. You do this by using a sub select in the SELECT clause. For example:

   1:  SELECT 
   2:    sales.companyName, 
   3:    sales.salesDate, 
   4:    employee.employeename
   5:  FROM sales LEFT JOIN employee ON sales.salesPersonID = employee.empID
   6:  WHERE sales.state = 'CA'
   7:  AND YEAR(salesDate) = 2007;

For the above query, let's assume that sales has 1 million records and the SQL statement returns 5,000 records. This query would do the outer joining for all 1 million records. However, if we write it this way:

   1:  SELECT 
   2:    sales.companyName,
   3:    sales.salesDate,
   4:    (SELECT employeeName FROM employee WHERE sales.salesPersonID = employee.empID) as employeeName
   5:  FROM sales
   6:  WHERE sales.state = 'CA'
   7:  AND YEAR(salesDate) = 2007;

This new SQL statement would return the same results but since the outer join is done in the SELECT clause, we are only outer joining 5,000 records instead of 1 million.

Cartesian JOINS

Now I am a Certified DBA and if I see you try to run a Cartesian join in a database I am going to have a very strong talk with you. I am not saying that there is no use for Cartesian joins, I am just saying the valid reasons are few and far behind. Cartesian joins a very costly (joining every record from one table with every record from another table). My first thought is to try not to do this, if you can. If you have to use one, then instead of using tables in the FROM clause, use sub selects and eliminate as many records before you do the Cartesian join.

Post a Comment

Remember my personal information.
Notify me of follow-up comments?

We don't know if you're a human. Confirm below: