By: Jeffrey Yao |Comments (4) | Related: > TSQL
Problem
The SQL SeverORDER BY clause is commonly used in T-SQL scripts, if used properly, it cansolve lots of problems. Such as:
- How can I sample [N] records out of a table randomly?
- How can I find themedian valueof a numeric column?
- How can I order a product table by always putting a specific category atthe front of the result?
- How can I find the biggest value records based on a column in each group?
Each question may be resolved multiple ways, but I will focus on using ORDERBY in a simple SELECT statement.
Solution
ORDER BY has seen its biggest change in SQL Server 2012 when OFFSET and FETCHkey words are introduced. But before that, ORDER BY can still do lots of interestingthings. We will look at a few examples here.
Example 1
I have a big table, and I just want to sample 10 records randomlyfor my testing purpose, how can I do it?
We will prepare a sample table and then code against it.
use tempdb-- prepare test tableif object_id('dbo.product', 'U') is not null drop table dbo.product;gocreate table dbo.product (id int identity primary key, [name] varchar(100), [color] varchar(20), [price] int)go-- populate the table with some data, -- there are three distinct colors for each product; with L0 as (select 1 as c union all select 1 as c), L1 as (select L0.c from L0 cross join L0 as T ), L2 as (select L1.c from L1 cross join L1 as T ), L3 as (select L2.c from L2 cross join L2 as T ), L4 as (select L3.c from L3 cross join L3 as T ), L5 as (select rn = ROW_NUMBER() over (order by (select null) ) from L4)insert into dbo.product (name, color, price)select [name]='Product' + cast(rn as varchar(5)), color = case ( cast (ceiling(rand(rn*7)*100000) as int) %3) when 0 then 'red' when 1 then 'green' else 'blue' end, price = cast(ceiling(rand(rn*7)*123456) as int)%56789from L5;
Now let's list 10 records randomly.
-- to list top 10 records randomly, this query is good for medium to large (5K) data set.-- for small data set, using the 2nd way select top 10 * from dbo.productorder by datepart(ns, getdate())% cast((rand(id)*11111111) as int); -- another way commonly knownselect top 10 * from dbo.productorder by newid();
Each run will return different results as shown below.
Actually there is a third way using OFFSET and FETCH as shown below. This approachwill always randomly grab 10 records in sequence.
select * from dbo.productorder by (select null)offset cast(ceiling(rand()*1234567) as int)% (select count(*) from dbo.product) rowsfetch next 10 rows only
Question 2
Using the same table, we will create a small sample data and then find the medianvalue of [Price].
We will populate the table with 9 records and later with 10 records to see whetherour solution works.
truncate table dbo.productdeclare @i int = 1;while @i < 10begin insert into dbo.product (name, color, price) select 'product'+cast(@i as varchar(3)), case @i%3 when 0 then 'blue' when 1 then 'green' else 'red' end, @i set @i += 1; endselect * from dbo.product
The results are shown below:
We can see the median value is 5 for [Price]. So to calculate this in T-SQL,we can use the following code:
-- find the row(s) with median value select * from dbo.product order by price offset (select count(*)-1 from dbo.product)/2 rowsfetch next (case (select count(*) from dbo.product) % 2 when 1 then 1 else 2 end) rows onlygo-- find the real median value for [Price] columnwith c as (select *from dbo.product order by price offset (select count(*)-1 from dbo.product)/2 rowsfetch next (case (select count(*) from dbo.product) % 2 when 1 then 1 else 2 end) rows only)select median = avg(price*1.) from c
Assume we insert another record into the table to make it 10 records:
truncate table dbo.productdeclare @i int = 1;while @i < 11 -- change 10 to 11begin insert into dbo.product (name, color, price) select 'product'+cast(@i as varchar(3)), case @i%3 when 0 then 'blue' when 1 then 'green' else 'red' end, @i set @i += 1; endselect * from dbo.product
The result will be like the following, we will see two median value rows justas expected:
The key here is the OFFSET and FETCH setting, which are decided by the row count,if row count is an odd number, we set the OFFSET to (count-1)/2 and then FETCH next1 row. But if row count is an even number, we set the same OFFSET value while setFETCH for the next 2 rows.
For an example, if row count = 9, OFFSET = (9-1)/2 = 4, FETCH = 1. If row count=10,OFFSET=(10-1)/2 = 4 (4.5 becomes 4 due to integer conversion), and FETCH=2
Question 3
In the test table created above, there are three distinct colors, Red, Green,Blue, now for my report, I want Green color product to be always on the top of myreport, how can I do so in one T-SQL statement?
First let's prepare the sample data, we will still use the same table as createdin Question 1, but we will insert 10 records for a better visualization.
-- populate the test table with 10 records onlytruncate table dbo.product;; with L0 as (select 1 as c union all select 1 as c), L1 as (select L0.c from L0 cross join L0 T), L2 as (select L1.c from L1 cross join L1 T), L3 as (select rn = ROW_NUMBER() over (order by (select null) ) from L2)insert into dbo.product (name, color, price)select [name]='Product' + cast(rn as varchar(5)), color = case ( cast (ceiling(rand(rn*7)*100000) as int) %3) when 0 then 'red' when 1 then 'green' else 'blue' end, price = cast(ceiling(rand(rn*7)*123456) as int)%100from L3where rn <=10;--list records with Green product on the topselect * from dbo.productorder by case color when 'green' then 0 else 1 end asc
The results are shown below:
Notice color Blue and Red are randomly displayed. So another extension to thequestion could be how to list records in the sequence of Green, Red and Blue
--list records with Green product on the topselect * from dbo.productorder by case color when 'green' then 0 when 'red' then 1 else 2 end asc
In this ORDER BY clause, we can assign the [color] value to a value, such as[green] to 0, [red] to 1. Since the ORDER BY is defined in ASC order, so when ORDERBY is evaluated, Green will be listed before Red because 0 is less than 1 with ASCorder.
The results are shown below:
Question 4
How can I list the most expensive i.e. [price], product in each color group?
We will switch back to the first big sample of data to do the demo, we can justre-run the first table population script to reset up the data.
-- find the most expensive product in each color group.select top 1 with ties *from dbo.productorder by row_number() over (partition by color order by price desc);
ROW_NUMBER() will list the records in each color group, so the 1st record ofeach group will have the same value of 1 and they will be automatically selectedvia TOP 1 WITH TIES. When I first used ORDER BY this way long ago,I was totally amazed by the powerful beauty of ORDER BY clause.
The results are shown below:
Sometimes, there can be a few products that are same cost, in this case, insteadof using row_number(), we should use dense_rank().
-- if there are product that are tied in price in the same group, -- we should use dense_rank() instead of row_number()select top 1 with ties *from dbo.productorder by dense_rank() over (partition by color order by price desc)
The results are shown below:
To understand this behavior, we should know the difference ofROW_Number andDense_Rank, the ROW_NUMBER numbers all rows sequentially (for example 1, 2,3, 4, 5). While DENSE_RANK provides the same numeric value for ties, the rank ofa row is one plus the number of distinct ranks that come before the row in question.
In other words, if two values are the same and are both the biggest value, thenboth rows will have a rank number = 1, while in row_number scenario, one row hasrow number =1 while another will be 2, thus only the row with 1 will be selectedby TOP 1 WITH TIES.
Summary
In this tip, we have examined some interesting uses of ORDER BY, which are veryuseful under some niche business requirements. Without ORDER BY, we may write lengthyscripts to achieve the same result.
Currently, OFFSET and FETCH still have some limitations, one of which is thatthe OVER clause does not support OFFSET and FETCH. If this is supported, we maybe able to write more elegant code when doing group processing, such as findingthe Nth largest value in each group.
All code is tested with SQL Server 2012 Developer Edition.
Next Steps
One item worth digging deeper is to compare the performance of other solutionswith the solutions provided in this tip. We can compare their execution plan andthe statistics IO/CPU data.
For question 2, if the requirement is to find the median value of each colorgroup, what is your solution?
Also the following links may help to better understand the ORDER BY clause:
- Different ways to get random data for SQL Server data sampling
- SQL Server Random Sorted Result Set
- Randomly Retrieve SQL Server Records
About the author
Jeffrey Yao is a senior SQL Server consultant, striving to automate DBA work as much as possible to have more time for family, life and more automation.
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips