Faster, Better String Comparisons

Note: Inspired by this video on YouTube.

The Problem

When performing string comparisons in C#, using ToLower() or ToUpper() to normalise case first has some drawbacks:

A new string is allocated on every call, so in a loop the garbage collector has more work to clean up.
Not all cultures behave correctly (or at least as expected).
It's slower than comparing the strings directly.
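Here is a minimal sketch that makes the allocation point concrete. It counts the bytes allocated by each approach over a loop. It assumes .NET Core 3.0 or later, where GC.GetAllocatedBytesForCurrentThread() is available. The input strings, the iteration count and the MeasureAllocations helper are illustrative choices, not part of the original example.

```csharp
using System;

// Illustrative input: mixed case, so ToLower() has to build a new string.
const string input = "ISubscribed";
const string target = "isubscribed";
const int iterations = 10_000;

// Hypothetical helper: bytes allocated by the current thread while running the action.
static long MeasureAllocations(Action action)
{
    action(); // warm-up run so one-off JIT/static initialisation isn't counted
    long before = GC.GetAllocatedBytesForCurrentThread();
    action();
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

long toLowerBytes = MeasureAllocations(() =>
{
    for (int i = 0; i < iterations; i++)
    {
        // Each call returns a brand-new lowercase string for the GC to collect later.
        _ = input.ToLower() == target;
    }
});

long equalsBytes = MeasureAllocations(() =>
{
    for (int i = 0; i < iterations; i++)
    {
        // Compares the two strings in place; no intermediate string is created.
        _ = string.Equals(input, target, StringComparison.OrdinalIgnoreCase);
    }
});

Console.WriteLine($"ToLower():       {toLowerBytes:N0} bytes");
Console.WriteLine($"string.Equals(): {equalsBytes:N0} bytes");
```

The exact byte counts depend on the runtime version. Still, the ToLower() loop allocates a new string on every iteration, while the string.Equals loop should allocate little or nothing.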

For example, in a new console app:

dotnet new console

Replace the contents of Program.cs with the following code:

string str = "ISubscribed";
Console.WriteLine(str.ToLower());
Console.WriteLine(str.ToLower() == "isubscribed");

Run this:

dotnet run

This prints the following, as expected:

isubscribed
True

However, if we change the current culture to, for example, Turkish:

Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("tr-TR");
string str = "ISubscribed";
Console.WriteLine(str.ToLower());
Console.WriteLine(str.ToLower() == "isubscribed");

Running this again gives the following output:

ısubscribed
False

In Turkish, a capital I lowercases to the dotless 'ı' (U+0131), so the strings no longer match.
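You can see the same effect without changing the thread's culture by passing a CultureInfo to ToLower()/ToUpper() directly. The small sketch below shows this. The en-US culture is there only for contrast. The sketch assumes the runtime has full globalization data, meaning it is not running in invariant globalization mode.

```csharp
using System;
using System.Globalization;

var turkish = new CultureInfo("tr-TR");
var english = new CultureInfo("en-US");

// Passing the culture explicitly leaves Thread.CurrentThread.CurrentCulture untouched.
string turkishLower = "I".ToLower(turkish);      // dotless ı (U+0131)
string englishLower = "I".ToLower(english);      // ordinary i
string invariantLower = "I".ToLowerInvariant();  // ordinary i, whatever the current culture

// The reverse also differs: in Turkish, lowercase i uppercases to dotted İ (U+0130).
string turkishUpper = "i".ToUpper(turkish);

Console.WriteLine($"{turkishLower} {englishLower} {invariantLower} {turkishUpper}");
```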

The Solution

We can get around this by comparing with string.Equals and an explicit StringComparison, instead of lowercasing first:

Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("tr-TR");
string str = "ISubscribed";
Console.WriteLine(str.ToLower());
Console.WriteLine(string.Equals("isubscribed", str, StringComparison.OrdinalIgnoreCase));

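The ToLower() line still prints the Turkish 'ısubscribed', but the comparison now gives the expected result:

ısubscribed
True

An ordinal, case-insensitive comparison does not depend on the current culture, so it works correctly despite the culture differences.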

Note: if we know str is not null, the instance method may be easier to read (calling it on a null string throws a NullReferenceException):

Console.WriteLine(str.Equals("isubscribed", StringComparison.OrdinalIgnoreCase));
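The null caveat exists because the static and instance forms behave differently when the string on the left is null. Here is a quick sketch, using a hypothetical maybeNull variable:

```csharp
using System;

string? maybeNull = null;

// The static string.Equals accepts null on either side and simply returns false.
bool staticResult = string.Equals(maybeNull, "isubscribed", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(staticResult); // False

// The instance method needs an object to be called on, so a null receiver throws.
bool threw = false;
try
{
    maybeNull!.Equals("isubscribed", StringComparison.OrdinalIgnoreCase);
}
catch (NullReferenceException)
{
    threw = true;
}
Console.WriteLine(threw); // True
```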

Different Comparison Types

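Comparing with StringComparison.Ordinal is the fastest approach. It compares the raw UTF-16 code units (char values) with no linguistic rules, so it is great for keys, tokens, identifiers and file paths. For file paths on Windows, use OrdinalIgnoreCase, because the file system there is case-insensitive.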

Comparing with StringComparison.CurrentCulture takes the user's current culture into account. In the Turkish example above, a CurrentCultureIgnoreCase comparison would therefore return False. That might be what we want for text shown in a UI, such as sorted lists.

Comparing with StringComparison.InvariantCulture uses linguistic rules that don't depend on the current culture. This is great when you want consistent results across different machines and users regardless of their cultural settings, for example when sorting data that gets persisted. For non-linguistic data such as serialised keys or hashes, Ordinal is usually the better fit.

All of the above have IgnoreCase variants: OrdinalIgnoreCase, CurrentCultureIgnoreCase and InvariantCultureIgnoreCase.
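To compare the options side by side, the sketch below runs the Turkish example through every StringComparison value. It assumes .NET 5 or later for the generic Enum.GetValues&lt;T&gt;(). Like the earlier examples, it also assumes a runtime with full culture data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Threading;

// Same setup as the Turkish example: the current culture drives the CurrentCulture* options.
Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");

const string lower = "isubscribed";
const string mixed = "ISubscribed";

// Record and print the result of comparing the pair under each option.
var results = new Dictionary<StringComparison, bool>();
foreach (StringComparison comparison in Enum.GetValues<StringComparison>())
{
    results[comparison] = string.Equals(lower, mixed, comparison);
    Console.WriteLine($"{comparison,-28} {results[comparison]}");
}
```

With tr-TR as the current culture, only OrdinalIgnoreCase and InvariantCultureIgnoreCase should report True. CurrentCultureIgnoreCase treats I and i as different letters, just as ToLower() did.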

A rule of thumb:

If the result is something the user will see, you probably want to use CurrentCulture variants.
If it's for logs, keys or internal/back-end processing, use Ordinal (or OrdinalIgnoreCase). It is faster and gives the same result on every machine.
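The rule of thumb matters most for ordering. The minimal sketch below sorts a hypothetical list both ways. Ordinal sorts by character code, so every uppercase ASCII letter comes before every lowercase one. A culture-aware comparer keeps each letter's cases together, as a user would expect.

```csharp
using System;
using System.Collections.Generic;

var names = new List<string> { "b", "a", "B", "A" };

// Ordinal: sorts by UTF-16 code unit, and 'A' (65), 'B' (66) come before 'a' (97), 'b' (98).
var ordinal = new List<string>(names);
ordinal.Sort(StringComparer.Ordinal);

// Culture-aware: letters are grouped alphabetically, which suits text shown in a UI.
var cultural = new List<string>(names);
cultural.Sort(StringComparer.CurrentCulture);

Console.WriteLine(string.Join(", ", ordinal)); // A, B, a, b
Console.WriteLine(string.Join(", ", cultural));
```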

Objects with Comparers

Collections such as Dictionary and HashSet accept an optional comparer (an IEqualityComparer&lt;string&gt;), so you can pass one of the StringComparer instances there too:

var comments = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    { "Mick", "I say hello to the world!" },
    { "Fatima", "Merhaba Dünya!" }
};

Console.WriteLine(comments.ContainsKey("mick"));

This prints True because the dictionary was created with StringComparer.OrdinalIgnoreCase, so keys are matched ordinally, ignoring case.
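It is easy to overlook that the comparer is used for inserts as well as lookups. The sketch below recreates the comments dictionary from above and uses TryAdd and TryGetValue as typical calls:

```csharp
using System;
using System.Collections.Generic;

var comments = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    { "Mick", "I say hello to the world!" },
    { "Fatima", "Merhaba Dünya!" }
};

// The comparer applies to inserts too, so "MICK" counts as an existing key and is rejected.
bool added = comments.TryAdd("MICK", "A second comment");

// Lookups with any casing find the original entry.
bool found = comments.TryGetValue("mIcK", out string? comment);

Console.WriteLine(added);          // False
Console.WriteLine(found);          // True
Console.WriteLine(comment);        // I say hello to the world!
Console.WriteLine(comments.Count); // 2
```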

Similarly for HashSet (switching to the Turkish culture to prove the point):

Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo("tr-TR");

var comments = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
{
    "MIck",
    "Fatima"
};

Console.WriteLine(comments.Contains("mick"));

Still returns True, as expected.
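For contrast, the sketch below builds the same set three ways:

- with the default (case-sensitive) comparer,
- with OrdinalIgnoreCase,
- with a Turkish case-insensitive comparer from StringComparer.Create.

Only the ordinal, case-insensitive set finds "mick". The Turkish one does not, because in that culture I is the uppercase of ı rather than i. Like the other culture examples, this assumes full globalization data is available.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

string[] names = { "MIck", "Fatima" };

// Default comparer: ordinal and case-sensitive.
var caseSensitive = new HashSet<string>(names);

// Ordinal, ignoring case: the culture never comes into it.
var ordinalIgnoreCase = new HashSet<string>(names, StringComparer.OrdinalIgnoreCase);

// Turkish-aware, ignoring case: 'I' pairs with 'ı', not with 'i'.
var turkishIgnoreCase = new HashSet<string>(names, StringComparer.Create(new CultureInfo("tr-TR"), ignoreCase: true));

Console.WriteLine(caseSensitive.Contains("mick"));     // False
Console.WriteLine(ordinalIgnoreCase.Contains("mick")); // True
Console.WriteLine(turkishIgnoreCase.Contains("mick")); // False
```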

EF Core Considerations

If you are using SQL Server with EF Core, the database most likely uses a case-insensitive collation by default (for example SQL_Latin1_General_CP1_CI_AS), so this may work:

var filter = "michael";

var user = db.Users.FirstOrDefault(u => u.username == filter);

If the database uses the default settings, this returns the stored record with a username of "Michael".

However, if the database or column has been configured with a case-sensitive collation, it won't, so this is not guaranteed.

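You could try something like this instead:

var filter = "michael";

var user = db.Users.FirstOrDefault(u => EF.Functions.Like(u.username, filter));

Be aware that SQL Server's LIKE also follows the column's collation, so against a case-sensitive column it is still case-sensitive. If you need a guarantee, EF Core 5.0 and later can apply a collation to a single query with EF.Functions.Collate:

var user = db.Users.FirstOrDefault(u => EF.Functions.Collate(u.username, "SQL_Latin1_General_CP1_CI_AS") == filter);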

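The key thing to remember is: don't use ToLower() in the query. It is translated to LOWER() in SQL, so it runs against every row in the table and stops SQL Server from using an index on the column. Also note that EF Core does not translate the string.Equals(..., StringComparison) overloads used earlier, so those belong in in-memory code, not in queries.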

Benchmarks

To see the performance difference, add the BenchmarkDotNet package (dotnet add package BenchmarkDotNet) and run the following:

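using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args)
    {
        // Runs every [Benchmark] method in the class and prints a summary table.
        var summary = BenchmarkRunner.Run<StringComparisonBenchmarks>();
    }
}

// Monitoring strategy, 1,000 iterations, one invocation per iteration:
// each measurement times a single call, so expect noisy results at nanosecond scale.
[SimpleJob(RunStrategy.Monitoring, iterationCount: 1000, invocationCount: 1)]
public class StringComparisonBenchmarks
{
    private const string str = "Hello, Universe!";

    // Case-insensitive ordinal comparison: no intermediate string is allocated.
    [Benchmark]
    public bool StringEquals()
    {
        return string.Equals(str, "hello, universe!", StringComparison.OrdinalIgnoreCase);
    }

    // Lowercases first (allocating a new string), then compares.
    [Benchmark]
    public bool ToLower()
    {
        return str.ToLower() == "hello, universe!";
    }
}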

Run this in Release mode (or BenchmarkDotNet will complain!):

dotnet run -c Release

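This gives results like the following (exact numbers will vary by machine):

Method        Mean        Error        StdDev        Median
StringEquals  731.4 ns    446.7 ns     4,280.0 ns    600.0 ns
ToLower       7,734.4 ns  18,817.5 ns  180,306.0 ns  1,900.0 ns

Because each iteration times a single call, outliers dominate the Error and StdDev columns, so the Median is the most reliable figure here. By that measure string.Equals is roughly three times faster (600 ns vs 1,900 ns), and it doesn't allocate a new string on each call.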