Faster, Better String Comparisons
Note: Inspired by this video on YouTube.
The Problem
When performing string comparisons in C#, using ToLower() or ToUpper() has some issues:
- A new string is allocated for every comparison; in a loop, that gives the garbage collector more work to clean up.
- Not all language cultures will work correctly (or at least as expected).
- It's slow.
For example, in a new console app:
dotnet new console
Create the following code:
string str = "ISubscribed";
Console.WriteLine(str.ToLower());
Console.WriteLine(str.ToLower() == "isubscribed");
Run this:
dotnet run
This returns the following, as expected:
isubscribed
True
However, if we change the current culture to Turkish, for example:
Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("tr-TR");
string str = "ISubscribed";
Console.WriteLine(str.ToLower());
Console.WriteLine(str.ToLower() == "isubscribed");
Running this gives the following output:
ısubscribed
False
In Turkish, a capital I lowercases to the dotless 'ı', so the strings no longer match as expected.
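The allocation issue from the list at the top of this section can be checked in a similar way. The following is a rough sketch rather than a proper benchmark: it assumes a runtime where GC.GetAllocatedBytesForCurrentThread() is available, and the iteration count is arbitrary.

```csharp
using System;

// Rough sketch: compare managed allocations on this thread for both approaches.
const int iterations = 1_000; // arbitrary count, chosen for illustration
string str = "ISubscribed";
bool match = false;

long before = GC.GetAllocatedBytesForCurrentThread();
for (int i = 0; i < iterations; i++)
{
    // ToLower() builds a new lowercased string on every pass.
    match = str.ToLower() == "isubscribed";
}
long toLowerBytes = GC.GetAllocatedBytesForCurrentThread() - before;

before = GC.GetAllocatedBytesForCurrentThread();
for (int i = 0; i < iterations; i++)
{
    // string.Equals compares the existing strings without creating a new one.
    match = string.Equals(str, "isubscribed", StringComparison.OrdinalIgnoreCase);
}
long equalsBytes = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"ToLower:       {toLowerBytes} bytes allocated");
Console.WriteLine($"string.Equals: {equalsBytes} bytes allocated");
Console.WriteLine($"Last result:   {match}");
```

The exact numbers will vary by runtime and machine, but the ToLower() loop should report noticeably more allocated bytes.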
The Solution
We can get around this by passing a StringComparison value to the comparison:
Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("tr-TR");
string str = "ISubscribed";
Console.WriteLine(str.ToLower());
Console.WriteLine(string.Equals("isubscribed", str, StringComparison.OrdinalIgnoreCase));
The lowercased output is still the Turkish form, but the comparison now gives the expected result:
ısubscribed
True
Here the comparison is performed correctly despite the culture differences.
Note: If we know the string is not null, the following may be easier:
Console.WriteLine(str.Equals("isubscribed", StringComparison.InvariantCultureIgnoreCase));
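To illustrate why the null check matters in the note above, here is a minimal sketch. The variable name missing is made up for the example, and the string? annotation assumes C# 8 or later.

```csharp
using System;

string? missing = null; // hypothetical value that may arrive as null

// The static method accepts null arguments and simply reports no match.
bool staticResult = string.Equals(missing, "isubscribed", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(staticResult); // False

bool threw = false;
try
{
    // The instance method is called on the null reference itself, so it throws.
    missing!.Equals("isubscribed", StringComparison.OrdinalIgnoreCase);
}
catch (NullReferenceException)
{
    threw = true;
}
Console.WriteLine(threw); // True
```

So the static string.Equals form is the safer default when either side might be null.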
Different Comparison Types
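Comparing with just StringComparison.Ordinal is the fastest approach. It compares the raw UTF-16 character values with no linguistic rules, so it is great for keys, tokens and case-sensitive identifiers (use OrdinalIgnoreCase for case-insensitive ones, such as file names on Windows).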
Comparing with StringComparison.CurrentCulture takes the current culture into account, so in the Turkish example above the CurrentCultureIgnoreCase variant would return False in the comparison check, which might be what we want for UIs, etc.
Comparing with StringComparison.InvariantCulture applies the rules of the culture-independent invariant culture instead of the current one. This is great when you want consistent results across different machines and users, regardless of their cultural settings, for example with data serialisation, hashing, etc.
All the above have IgnoreCase variants, for example InvariantCultureIgnoreCase.
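As a quick way to see the comparison types side by side, the sketch below runs the same two strings through several of them under the Turkish culture. The ordinal results are fixed; the culture-aware results depend on the culture data available at runtime, so treat the comments on those lines as expectations rather than guarantees.

```csharp
using System;
using System.Globalization;
using System.Threading;

Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");

string a = "ISubscribed";
string b = "isubscribed";

// Ordinal modes never consult the culture.
bool ordinal = string.Equals(a, b, StringComparison.Ordinal);                     // False: 'I' and 'i' differ
bool ordinalIgnoreCase = string.Equals(a, b, StringComparison.OrdinalIgnoreCase); // True

// Culture-aware modes: expected False for tr-TR and True for the invariant culture
// when full culture data is available.
bool currentIgnoreCase = string.Equals(a, b, StringComparison.CurrentCultureIgnoreCase);
bool invariantIgnoreCase = string.Equals(a, b, StringComparison.InvariantCultureIgnoreCase);

Console.WriteLine($"Ordinal:                    {ordinal}");
Console.WriteLine($"OrdinalIgnoreCase:          {ordinalIgnoreCase}");
Console.WriteLine($"CurrentCultureIgnoreCase:   {currentIgnoreCase}");
Console.WriteLine($"InvariantCultureIgnoreCase: {invariantIgnoreCase}");
```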
A rule of thumb:
- If the result is something the user will see, you probably want to use CurrentCulture variants.
- If it's for logs or internal/back-end processes, using Ordinal will be faster.
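Sorting is one place where this rule of thumb becomes visible. In the sketch below (the sample names are made up), the ordinal sort orders by character value, so uppercase letters come before lowercase ones, while the culture-aware sort follows whatever rules the current culture defines.

```csharp
using System;

string[] names = { "apple", "Banana", "cherry" };

// Ordinal: 'B' (66) sorts before 'a' (97), which is rarely what a user expects to see.
string[] ordinalSorted = (string[])names.Clone();
Array.Sort(ordinalSorted, StringComparer.Ordinal);
Console.WriteLine(string.Join(", ", ordinalSorted)); // Banana, apple, cherry

// CurrentCulture: order depends on the machine's culture settings.
string[] cultureSorted = (string[])names.Clone();
Array.Sort(cultureSorted, StringComparer.CurrentCulture);
Console.WriteLine(string.Join(", ", cultureSorted));
```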
Objects with Comparers
Some collections, such as Dictionary and HashSet, accept a comparer when they are created, so this is also useful there:
var comments = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    { "Mick", "I say hello to the world!" },
    { "Fatima", "Merhaba Dünya!" }
};
Console.WriteLine(comments.ContainsKey("mick"));
This prints True because the dictionary was created with an ordinal, case-insensitive comparer.
Similarly for HashSet (with a language tweak to prove the point):
Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("tr-TR");
var comments = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "MIck",
    "Fatima"
};
Console.WriteLine(comments.Contains("mick"));
Still returns True, as expected.
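The same comparers can also be passed to LINQ methods that take an IEqualityComparer. A small sketch, with made-up sample data:

```csharp
using System;
using System.Linq;

string[] commenters = { "Mick", "MICK", "mick", "Fatima" };

// Distinct treats the three spellings of "Mick" as one entry.
int distinctCount = commenters.Distinct(StringComparer.OrdinalIgnoreCase).Count();
Console.WriteLine(distinctCount); // 2

// Contains finds the name regardless of case.
bool hasFatima = commenters.Contains("FATIMA", StringComparer.OrdinalIgnoreCase);
Console.WriteLine(hasFatima); // True
```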
EF Core Considerations
If you are using SQL Server with EF Core, you are likely on the default case-insensitive collation, so this may work:
var filter = "michael";
var user = db.Users.Where(u => u.username == filter);
So if the DB is using the default settings this would return the stored record with a username of "Michael".
However, if the database has been configured with a different collation it may not, so it's not guaranteed.
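You could try something like this, although on SQL Server LIKE also follows the column's collation, so check the result against your own database: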
var filter = "michael";
var user = db.Users.Where(u => EF.Functions.Like(u.username, filter));
The key thing to remember is: don't use ToLower() in the query, as it is translated to SQL and applied to every row being compared, which can stop the database from using an index.
Benchmarks
To see the performance differences, run the following (add the BenchmarkDotNet package first, for example with dotnet add package BenchmarkDotNet):
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Engines;
public class Program
{
    public static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<StringComparisonBenchmarks>();
    }
}

[SimpleJob(RunStrategy.Monitoring, iterationCount: 1000, invocationCount: 1)]
public class StringComparisonBenchmarks
{
    private const string str = "Hello, Universe!";

    [Benchmark]
    public bool StringEquals()
    {
        return string.Equals(str, "hello, universe!", StringComparison.OrdinalIgnoreCase);
    }

    [Benchmark]
    public bool ToLower()
    {
        return str.ToLower() == "hello, universe!";
    }
}
Run this in Release mode (or BenchmarkDotNet will complain!):
dotnet run -c Release
This gives results like this:
| Method | Mean | Error | StdDev | Median |
|---|---|---|---|---|
| StringEquals | 731.4 ns | 446.7 ns | 4,280.0 ns | 600.0 ns |
| ToLower | 7,734.4 ns | 18,817.5 ns | 180,306.0 ns | 1,900.0 ns |