Tuesday, September 9, 2014

A look at how PowerShell handles Case Sensitivity

I was asked today by an attendee at one of my PowerShell workshops why sorting data by case did not behave as expected, so I thought I would dig a bit deeper into that.

As we know, PowerShell is Case Insensitive by default... well, most things. The CmdLets Select-Object (with -Unique) and Get-Unique actually seem to be Case Sensitive, so it's almost everything by default.

When it comes to sorting data with Sort-Object and the -CaseSensitive switch parameter, you may also be surprised that the result is not ordered in the way you might think.

Let's take a fairly obvious array of strings to start with, changing the case of one character at a time:
$Strings = "aabbcc","aabbcC","aabbCc","aaBbcc","aAbbcc","Aabbcc","aabBcc";
$Strings | Sort-Object -CaseSensitive;

This sorts as you would expect:

aabbcc
aabbcC
aabbCc
aabBcc
aaBbcc
aAbbcc
Aabbcc

However, if we take a more complicated array, where the strings differ not only in the case of an individual character but also in the order of characters, or in the characters themselves, like this:
$Strings = "abc", "acc", "adc", "aec", "afc", "agc", "ahc","aBc", "aCc", `
"aDc", "aEc", "aFc", "aGc", "aHc", "Abc", "Acc", "Adc", "Aec", "Afc", "Agc", "Ahc";
$Strings | Sort-Object -CaseSensitive;

Then the behavior changes to something you may not expect, and results in:

abc
aBc
Abc
acc
aCc
Acc
adc
aDc
Adc
aec
aEc
Aec
afc
aFc
Afc
agc
aGc
Agc
ahc
aHc
Ahc

Notice that the strings starting with lowercase 'a' are not all together?

So what is happening here? It seems that the strings are first sorted regardless of case (i.e. all the 'AEC' strings appear together), and only then does PowerShell evaluate the case of each character within the string, working from left to right. So in the case of the 3 strings "aec", "aEc" and "Aec", the all-lowercase version wins outright, then the next string with a lowercase first 'a' wins, and finally the string with the uppercase 'A' is displayed. But why do these all beat "afc"? Because even though they all start with 'a', 'ae' beats 'af', and most importantly it does so regardless of case.
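The two-stage behavior described above can be checked directly with the .NET String.Compare method. This is only a minimal sketch: the variable names are mine, and it assumes InvariantCulture so that the results do not depend on the regional settings of your machine.

```powershell
# Assumption: InvariantCulture is used here purely to keep the results machine independent.
$culture = [System.Globalization.CultureInfo]::InvariantCulture

# Ignoring case, "aec" and "Aec" are the same string, so the result is 0.
$sameLetters = [string]::Compare("aec", "Aec", $true, $culture)

# With case considered, the lowercase version sorts first (negative result).
$caseTieBreak = [string]::Compare("aec", "Aec", $false, $culture)

# Different letters decide the order before case does: 'e' sorts before 'f',
# so "Aec" still comes before "afc" (negative result).
$lettersFirst = [string]::Compare("Aec", "afc", $false, $culture)

# Show the sign of each result.
"{0} {1} {2}" -f $sameLetters, [Math]::Sign($caseTieBreak), [Math]::Sign($lettersFirst)
# 0 -1 -1
```

In other words, case only decides the order of "aec" and "Aec" because nothing else separates them.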

Let's look at a slightly more complicated example to highlight this further:
$Strings = "abc", "acc", "adc", "aec", "afc", "agc", "ahc","aBc", "aCc", `
"aDc", "aEc", "aFc", "aGc", "aHc", "Abc", "Acc", "Adc", "Aec", "Afc", "Agc", "Ahc", `
"aab","abb","Aab","Abb","ABb", "AbB";
$Strings | Sort-Object -CaseSensitive;

Which returns:

aab
Aab
abb
Abb
AbB
ABb
abc
aBc
Abc
acc
aCc
Acc
adc
aDc
Adc
aec
aEc
Aec
afc
aFc
Afc
agc
aGc
Agc
ahc
aHc
Ahc

Play around with this: try sorting file names in C:\Windows\System32 that mix lower and upper case, and you will see this behavior.

So why is this important? If you are just using CmdLets, then Case Sensitive sorting is the only place you will really notice this behavior. However, if you are using the Case Sensitive operators, particularly -CLT, -CLE, -CGT and -CGE, then you are definitely going to need to care about this, as the outcome may not be what you expect. For example:

#You would expect the string 'abc' to be lower than 'ABb' due to the lowercase first character 'a', but in fact this returns False.
"abc" -CLT "ABb"

Which returns False

In fact there is no difference between the above and the following, which (as expected) also returns False, but it helps to highlight how PowerShell is applying its logic.
"abc" -LT "abb"

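To see where case does make a difference to these operators, compare a pair of strings with the same letters against a pair with different letters. This is a small sketch with variable names of my own choosing, not something taken from the workshop.

```powershell
# Same letters, different case: case is the tie-breaker and lowercase sorts first.
$sameLetters = "abb" -clt "ABb"        # True

# Different letters: 'c' sorts after 'b', so case never gets a say.
$differentLetters = "abc" -clt "ABb"   # False

# The case-insensitive operator gives the same answer for the second pair.
$ignoreCase = "abc" -lt "ABb"          # False

$sameLetters, $differentLetters, $ignoreCase
```

Only the first comparison is actually decided by case.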

So in summary, Case Sensitive ordering only comes into play when strings contain exactly the same characters, because PowerShell (or more accurately .NET) first orders by the characters regardless of case, and then sorts by case within each group of otherwise identical strings.
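For contrast, if you were expecting a strict character-code ordering, one option is an ordinal comparison through .NET. Treat this as a sketch under my own assumptions rather than a recommendation: it sorts by raw character values, so every uppercase letter comes before every lowercase one.

```powershell
$Strings = "abc", "aBc", "Abc", "acc", "aCc", "Acc"

# [Array]::Sort sorts in place, so cast to a new string[] first and leave $Strings untouched.
[string[]]$ordinal = $Strings

# StringComparer.Ordinal compares raw character codes ('A' = 65 sorts before 'a' = 97).
[Array]::Sort($ordinal, [System.StringComparer]::Ordinal)

$ordinal -join ", "
# Abc, Acc, aBc, aCc, abc, acc
```

Here the strings are grouped by the case of their first character, which is what the Sort-Object results above do not do.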


If you want to know which CmdLets support the CaseSensitive parameter, use these statements:

#find the other cmdlets with CaseSensitive param
Get-Module -ListAvailable; Get-Command -ParameterName CaseSensitive;

# If you don't want to list the available modules first (to read them into the session), use this instead
Get-Command | ForEach-Object {
    try { if ($_.Parameters -and $_.Parameters.ContainsKey("CaseSensitive")) { "Found in: $($_.Name)" } } catch {}
}



Just to extend this investigation further, I fired up Visual Studio and created a basic Windows Form with a List Box and the following code.

//create the array
String[] sStrings = new string[] {"abc", "acc", "adc", "aec", "afc", "agc", "ahc","aBc", "aCc", "aDc", "aEc", "aFc", "aGc", "aHc", "Abc", "Acc", "Adc", "Aec", "Afc", "Agc", "Ahc", "aab","abb","Aab","Abb","ABb", "AbB"};

//Sort the values
Array.Sort(sStrings);

//add the strings to the list box 
listBox1.Items.Clear();
listBox1.Items.AddRange(sStrings);

This outputs the values in the same order as above, as in:

....
Aab
abb
Abb
AbB
ABb
abc
aBc
Abc
acc
aCc
Acc
....

UPDATE: According to one of my .NET programmer mates (thanks Steve), this is all known as lexicographical order.



Legal Stuff: As always, the contents of this blog are provided "as-is". The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility arising from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.