Beware the Performance Impact of the T-SQL FORMAT Function
admin | Posted on |
Since SQL Server 2012, the FORMAT function has been a benefit to developers and DBAs, offering an easy way to convert dates and numbers to strings, akin to how it would be done in a language like C# or Java.
After happily coding with the function for a number of years, performance has arisen as a reason not to use it.
After recently reviewing a developer’s code, I saw an ugly and antiquated string expression that I immediately replaced with the FORMAT function. The original string expression was used to place leading zeros in front of a number, similar to this:
@Prefix + RIGHT('00000'+CAST(c.MyNumericColumn AS VARCHAR(5)),5) AS Formatted_Expression
With indignation, the code was overhauled to something prettier like this:
@Prefix + FORMAT(c.MyNumericColumn,'00000') AS Formatted_Expression
With an air of triumph, the review was completed and thereafter performance testing was done to evaluate the change. The above expressions were evaluated on a relatively large number of rows and, surprisingly, the revised code ran slower. Not much code had changed in the review, but even still, It took a bit to figure out that it was the FORMAT function that had slowed the code down enough to be noticeable.
A quick script demonstrates the significance of the problem:
USE tempdb
GO
CREATE PROCEDURE test_oldstyle
AS
SET NOCOUNT ON
SELECT TOP 500000
c.name,RIGHT('00000'+CAST(c.column_id AS VARCHAR(5)),5) AS Column_Id
INTO #temp
FROM sys.columns c
CROSS JOIN sys.columns c1
ORDER BY c.object_id,c.column_id
GO
CREATE PROCEDURE test_format
AS
SET NOCOUNT ON
SELECT TOP 500000
c.name,FORMAT(c.column_id,'00000') AS Column_Id
INTO #temp
FROM sys.columns c
CROSS JOIN sys.columns c1
ORDER BY c.object_id,c.column_id
GO
DECLARE @i INT=5
WHILE @i>0
BEGIN
EXEC test_oldstyle
EXEC test_format
SET @i-=1
END
GO
There are two procedures created (“test_oldstyle” and “test_format”) to demonstrate the cost of using the FORMAT function over a large number of rows. Using a CROSS JOIN on a clean tempdb database yields around 800K rows. The only difference between the queries is the “column_id” expression.
The SQL dynamic management view sys.dm_exec_procedure_stats can be used to gauge the results:
SELECT o.name,
avg_cpu_s = total_worker_time/1000000.0/execution_count,
avg_elp_s = total_elapsed_time/1000000.0/execution_count,
avg_reads = total_logical_reads/execution_count,
ps.*
FROM sys.objects o
JOIN sys.dm_exec_procedure_stats ps ON ps.object_id=o.object_id
WHERE o.name IN ('test_oldstyle','test_format')
On an idle SQL Server 2019 instance (15.0.4043.16), the results for five executions are as follows:
name avg_cpu_s avg_elp_s avg_reads
test_format 6.713 6.713 3314
test_oldstyle 0.220 0.220 3107
Similar results for a SQL Server 2014 instance (12.0.6118.4) are:
name avg_cpu_s avg_elp_s avg_reads
test_format 7.674 7.744 3499
test_oldstyle 0.218 0.219 2827
The old, klunky and ugly string expression is the clear winner by an astonishingly large amount, though the gap is closed a bit on SQL Server 2019 when compared to 2014.
Conclusion
This is one of many instances where good SQL programming habits (e.g. a concise and clean expression) can cause performance troubles. Other familiar examples of being punished by using good SQL coding practices such as code modularity and reusability will often get you into deep performance trouble. So in instances when performance is a necessity, though very painful, preferred SQL coding habits need to be tossed into the rubbish.