Beware the Performance Impact of the T-SQL FORMAT Function

Since SQL Server 2012, the FORMAT function has been a benefit to developers and DBAs, offering an easy way to convert dates and numbers to strings, akin to how it would be done in a language like C# or Java.

After happily coding with the function for a number of years, performance has arisen as a reason not to use it.

After recently reviewing a developer’s code, I saw an ugly and antiquated string expression that I immediately replaced with the FORMAT function. The original string expression was used to place leading zeros in front of a number, similar to this:

@Prefix + RIGHT('00000'+CAST(c.MyNumericColumn AS VARCHAR(5)),5) AS Formatted_Expression

With indignation, the code was overhauled to something prettier like this:

@Prefix + FORMAT(c.MyNumericColumn,'00000') AS Formatted_Expression

With an air of triumph, the review was completed and thereafter performance testing was done to evaluate the change. The above expressions were evaluated on a relatively large number of rows and, surprisingly, the revised code ran slower. Not much code had changed in the review, but even still, It took a bit to figure out that it was the FORMAT function that had slowed the code down enough to be noticeable.

A quick script demonstrates the significance of the problem:

USE tempdb
GO

CREATE PROCEDURE test_oldstyle
AS
SET NOCOUNT ON


SELECT TOP 500000
c.name,RIGHT('00000'+CAST(c.column_id AS VARCHAR(5)),5) AS Column_Id
  INTO #temp
  FROM sys.columns c
CROSS JOIN sys.columns c1
ORDER BY c.object_id,c.column_id
GO

CREATE PROCEDURE test_format
AS
SET NOCOUNT ON

SELECT TOP 500000
c.name,FORMAT(c.column_id,'00000') AS Column_Id
  INTO #temp
  FROM sys.columns c
CROSS JOIN sys.columns c1
ORDER BY c.object_id,c.column_id
GO

DECLARE @i INT=5
WHILE @i>0
BEGIN
    EXEC test_oldstyle
    EXEC test_format
    SET @i-=1
END
GO

There are two procedures created (“test_oldstyle” and “test_format”) to demonstrate the cost of using the FORMAT function over a large number of rows. Using a CROSS JOIN on a clean tempdb database yields around 800K rows. The only difference between the queries is the “column_id” expression.

The SQL dynamic management view sys.dm_exec_procedure_stats can be used to gauge the results:

SELECT o.name,
       avg_cpu_s = total_worker_time/1000000.0/execution_count,
       avg_elp_s = total_elapsed_time/1000000.0/execution_count,
       avg_reads = total_logical_reads/execution_count,
       ps.*
  FROM sys.objects o
  JOIN sys.dm_exec_procedure_stats ps ON ps.object_id=o.object_id
 WHERE o.name IN ('test_oldstyle','test_format')

On an idle SQL Server 2019 instance (15.0.4043.16), the results for five executions are as follows:

name            avg_cpu_s    avg_elp_s    avg_reads
test_format 6.713 6.713 3314
test_oldstyle 0.220 0.220 3107

 

Similar results for a SQL Server 2014 instance (12.0.6118.4) are:

name            avg_cpu_s    avg_elp_s    avg_reads
test_format 7.674 7.744 3499
test_oldstyle 0.218 0.219 2827

The old, klunky and ugly string expression is the clear winner by an astonishingly large amount, though the gap is closed a bit on SQL Server 2019 when compared to 2014.

 

Conclusion

This is one of many instances where good SQL programming habits (e.g. a concise and clean expression) can cause performance troubles. Other familiar examples of being punished by using good SQL coding practices such as code modularity and reusability will often get you into deep performance trouble. So in instances when performance is a necessity, though very painful, preferred SQL coding habits need to be tossed into the rubbish.