SQL Formatting

This is something that drives me crazy. The other day I needed to debug a problem report so I asked the developer to send me SQL he was using to generate the report. The SQL looked something like this:

select distinct
c.[Column1] as Col1,
cn.Column2,
cn.Column3,
cn.Column4,
cn.Column5 as Col5,
o.Column6,
b.Column7 as Col7,
pa.Column8 as Col8, s.Column9,
oc.Column10 AS Col10,
eoc.Column11 as Col11,
case ht.Column12
when 'Val1' then 'Yes'
when 'Val2' then 'No'
when 'Val3' then 'Yes'
end as Col12,
pt.Column13,
oc.Column14 as Col14,
oc.Column15 AS Col15,
oc.Column16 AS Col16,
oc.Column17 AS Col17,
case eoc.Column18
when 1 then 'Y'
when 0 then 'N'
when 2 then 'Y'
end as Col18,
POs.Column19,
POs.Column20 as Col20,
CASE WHEN EOC.Column21 IS NOT NULL
THEN JC.Column22 ELSE CAST(O.Column23 AS VARCHAR(22)) END AS Col23,
case
when jc.Column24 like '%$%' then 'E'
when jc.Column25 like '%#%' then 'H'
else ''
end as Col25,
FL.COLUMN26 as Col26,
eoc.Column27 AS Col27,
eoc.Column28 AS Col28,
eoc.Column29,
sac.Column30 as Col30

from table1 oc with(nolock)
inner join table2 o on oc.tblID = o.tblID
inner join table3 eoc on oc.tblID = eoc.tblID
left outer join table4 c on eoc.tblID = c.tblID
left outer join table5 on eoc.tblID = POs.tblID
inner join table6 cn on oc.tblID = cn.tblID
LEFT OUTER JOIN table7 sac ON eoc.tblID = sac.tblID
LEFT outer JOIN table8 pa ON oc.tblID =pa.[tblID]
left outer join table9 s on pa.tblID = s.tblID
inner join table10 pt on oc.tblID = pt.tblID
left outer join table11 ct on CN.tblID = ct.tblID
left outer join table12 jcl on EoC.tblID = jcl.tblID and cn.tblID = jcl.tblID
left outer join table13 jcl1 on O.TBLID = jcl1.tblID and cn.tblID = jcl1.tblID
left outer join table14 jc on jcl.tblID = jc.tblID
left outer join table15 ht on oc.tblID = ht.tblID
LEFT OUTER JOIN table16 fl ON EOC.TBLID=fL.tblID
left outer join table17 bl on cn.tblID = bl.tblID and cn.tblID = bl.tblID
left outer join table18 b on bl.tblID = b.tblID
WHERE c.column like '%condition%'

Sure it ran, but trying to read it was a major pain. If I had written that query it probably would have looked something like this:

SELECT DISTINCT
c.Column1 AS Col1
,cn.Column2
,cn.Column3
,cn.Column4
,cn.Column5 AS Col5
,o.Column6
,b.Column7 AS Col7
,pa.Column8 AS Col8
,s.Column9
,oc.Column10 AS Col10
,eoc.Column11 AS Col11
,CAS E ht.Column12
WHEN 'Val1'
THEN 'Yes'
WHEN 'Val2'
THEN 'No'
WHEN 'Val3'
THEN 'Yes'
END AS Col12
,pt.Column13
,oc.Column14 AS Col14
,oc.Column15 AS Col15
,oc.Column16 AS Col16
,oc.Column17 AS Col17
,CAS E eoc.Column18
WHEN 1
THEN 'Y'
WHEN 0
THEN 'N'
WHEN 2
THEN 'Y'
END AS Col18
,POs.Column19
,POs.Column20 AS Col20
,CAS E
WHEN EOC.Column21 IS NOT NULL
THEN JC.Column22
ELSE CAS T(O.Column23 AS VARCHAR(22))
END AS Col23
,CAS E
WHEN jc.Column24 like '%$%'
THEN 'E'
WHEN jc.Column25 like '%#%'
THEN 'H'
ELSE ''
END AS Col25
,FL.COLUMN26 AS Col26
,eoc.Column27 AS Col27
,eoc.Column28 AS Col28
,eoc.Column29
,sac.Column30 AS Col30
FROM
table1 oc
INNER JOIN
table2 o
ON oc.tblID=o.tblID
INNER JOIN
table3 eoc
ON oc.tblID=eoc.tblID
LEFT OUTER JOIN
table4 c
ON eoc.tblID=c.tblID
INNER JOIN
table6 cn
ON oc.tblID=cn.tblID
INNER JOIN
table10 pt
ON oc.tblID=pt.tblID
LEFT OUTER JOIN
table5
ON eoc.tblID=POs.tblID
LEFT OUTER JOIN
table7 sac
ON eoc.tblID=sac.tblID
LEFT OUTER JOIN
table8 pa
ON oc.tblID=pa.tblID
LEFT OUTER JOIN
table9 s
ON pa.tblID=s.tblID
LEFT OUTER JOIN
table11 ct
ON CN.tblID=ct.tblID
LEFT OUTER JOIN
table12 jcl
ON EoC.tblID=jcl.tblID
AND cn.tblID =jcl.tblID
LEFT OUTER JOIN
table13 jcl1
ON O.TBLID =jcl1.tblID
AND cn.tblID=jcl1.tblID
LEFT OUTER JOIN
table14 jc
ON jcl.tblID=jc.tblID
LEFT OUTER JOIN
table15 ht
ON oc.tblID=ht.tblID
LEFT OUTER JOIN
table16 fl
ON EOC.TBLID=fL.tblID
LEFT OUTER JOIN
table17 bl
ON cn.tblID=bl.tblID
AND cn.tblID=bl.tblID
LEFT OUTER JOIN
table18 b
ON bl.tblID=b.tblID
WHERE
c.column like '%condition%'

This formatting takes up a lot more screen real estate, but it is much more readable. Now, I'm not saying everyone has to format their queries the way I do, but I am saying that developers should try to be consistent and make their code human-readable. As am sure we have all heard, the majority of the costs for any software project stem from maintenance. Taking a little time up front to format code in a consistent manner can really help make the process of debugging and maintaining code much easier. Plus it helps in making wrong code look wrong.

Some of My SQL Formatting Conventions

Leading Commas

I've been bitten by the missing comma syntax error way too many times. I find that placing commas at the beginning of the line helps me find those errors a lot faster.

Quick quiz, which line is missing the comma?

SELECT DISTINCT
c.[Column1] AS Col1,
cn.Column2,
cn.Column3,
cn.Column4,
cn.Column5 AS Col5,
o.Column6,
b.Column7 AS Col7,
pa.Column8 AS Col8
s.Column9,
oc.Column10 AS Col10,
eoc.Column11 AS Col11,
CASE ht.Column12
WHEN 'Val1'
THEN 'Yes'
WHEN 'Val2'
THEN 'No'
WHEN 'Val3'
THEN 'Yes'
END AS Col12,
pt.Column13,
oc.Column14 AS Col14,
oc.Column15 AS Col15,
oc.Column16 AS Col16,
oc.Column17 AS Col17,
CASE eoc.Column18
WHEN 1
THEN 'Y'
WHEN 0
THEN 'N'
WHEN 2
THEN 'Y'
END AS Col18,
POs.Column19
And now, which line is missing the comma?
SELECT DISTINCT
c.[Column1] AS Col1
,cn.Column2
,cn.Column3
,cn.Column4
,cn.Column5 AS Col5
,o.Column6
,b.Column7 AS Col7
,pa.Column8 AS Col8
s.Column9
,oc.Column10 AS Col10
,eoc.Column11 AS Col11
,CASE ht.Column12
WHEN 'Val1'
THEN 'Yes'
WHEN 'Val2'
THEN 'No'
WHEN 'Val3'
THEN 'Yes'
END AS Col12
,pt.Column13
,oc.Column14 AS Col14
,oc.Column15 AS Col15
,oc.Column16 AS Col16
,oc.Column17 AS Col17
,CASE eoc.Column18
WHEN 1
THEN 'Y'
WHEN 0
THEN 'N'
WHEN 2
THEN 'Y'
END AS Col18
,POs.Column19

Wasn't it easier to find the problem line in the second example?

WHERE Predicates

When writing where predicates I always put the operator at the beginning of the line and only put one condition per line. This allows me to easily comment a single predicate at a time, very useful when debugging complex report queries.

WHERE
a.col = 'Some Condition'
AND b.col = 1234
-- AND c.col = 'Some Other Condition'

INNER JOINS before OUTER JOINS

You may have noticed in my re-formatting of the query above I moved all the INNER JOINs before the OUTER JOINs. While this used to be recommended in order to get the best query performance, most query optimizers are now smart enough to figure this out on their own. However I still do this because it makes debugging queries with several joins much easier. Nothing is worse than trying to figure out why a report isn't returning the expected results only to find an INNER JOIN hidden among thirty or so OUTER JOINs.

Tools

If you find you have a big nasty query to untangle (not one you wrote of course;) there are several tools out there that can format SQL for you. The one I use is SQLinForm. I've been using the free online version for awhile now, but the developer recently released a free desktop version which is what I currently use.

If anyone has any other SQL formatting tips or has a recommendation for another SQL formatting tool (one that integrates with Eclipse perhaps?) please share.

Comments
Lola LB's Gravatar Umm . . . in the second problem example, it no longer is "incorrect", because the comma exists there, as such: ,pa.Column8 AS Col8
# Posted By Lola LB | 7/11/07 11:11 AM
Nathan Mische's Gravatar Lola, the error was actually on a different line for the second example. I guess it makes more sense to have the same error in both examples so I'm changing them now.
# Posted By Nathan Mische | 7/11/07 12:14 PM
Boyan's Gravatar I've tried a bunch of online SQL formatters but the one I like best is at http://www.wangz.net/cgi-bin/pp/gsqlparser/sqlpp/s...
# Posted By Boyan | 7/11/07 1:52 PM
James's Gravatar Here is another free online sql formatter:
http://www.wangz.net/gsqlparser/sqlpp/sqlformat.ht...
# Posted By James | 11/29/07 8:33 AM
sumanth's Gravatar you can try www.sqlinform.com
# Posted By sumanth | 8/28/08 2:13 AM
BlogCFC was created by Raymond Camden. This blog is running version 5.8.001.