Two strings are anagrams if they are made up of the same set of characters. Examples
“hello” and “loleh”“123123” and “312312”“qwerty” and “wretqy”
The degree of “anagram-ness” can vary
ignore case?ignore non-numeric characters?ignore whitespace?
In this post we’ll only consider word-characters only and the comparison will be case-insensitive to make the problem more interesting.
We’ll write a function that accepts two integers and returns a boolean true if the strings are anagrams, otherwise false.
We’ll look at two solutions out of many that exist out there
using a character mapusing string sort
What is a character map? It is a map where the key is of type char and the value if of type integer. We collect the unique characters of a string and count how many times each character occurs in the string. E.g.
CharCount‘f’2‘g’1‘i’2‘d’1‘o’1
We do that for both strings and compare the counts of each unique character.
Let’s start with a skeleton
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
namespace Algorithms
{
public class Anagram
{
public bool AreAnagrams(string first, string second)
{
if (first == null || second == null) return false;
return AreAnagramsWithCharMap(first, second);
//return AreAnagramsWithSortedStrings(first, second);
}
private bool AreAnagramsWithCharMap(string first, string second)
{
return false;
}
private bool AreAnagramsWithSortedStrings(string first, string second)
{
return false;
}
private Dictionary<char, int> MakeCharacterMap(string input)
{
string cleaned = CleanInput(input);
return cleaned.Distinct().ToDictionary(c => c, c => cleaned.Count(s => s == c));
}
private string CleanInput(string input)
{
return Regex.Replace(input, @"[_]+|[^\w]+|[\d-]+", "").ToLower();
}
}
}
We start by some null-checking and return false if either of the two inputs is null. The AreAnagramsWithCharMap function has been wired up but it’s not yet implemented. The function for the second solution AreAnagramsWithSortedStrings has also been prepared.
We have two private functions as well
CleanInputthis one takes a string and strips all underscores, white space and non-word characters from it and returns the lower-case version of itMakeCharacterMapfirst we clean the incoming input stringsecond we use a couple of LINQ operators to build the character mapDistinct() to gather the unique characters from the stringToDictionary() to build the map itself, the key will be the character itself and for the value we count the number of occurrences of that character in the string
Let’s look at the implementation of AreAnagramsWithCharMap
private bool AreAnagramsWithCharMap(string first, string second)
{
var charMapFirst = MakeCharacterMap(first);
var charMapSecond = MakeCharacterMap(second);
if (charMapFirst.Count != charMapSecond.Count)
{
return false;
}
return charMapFirst.All(kvp => charMapSecond.ContainsKey(kvp.Key) ? kvp.Value == charMapSecond[kvp.Key] false);
}
We first create the two character maps. If they differ in size then we can immediately return false. It means that one of the strings has at least one more character than the other so it’s pointless to continue.
Otherwise we make use of the All LINQ operator which return true of all the elements of a collection fulfil a certain condition. The condition is based on two parameters
whether the character map contains the character as the key in the first placewhether that character occurs with the same frequency as in the source map
If both conditions are fulfilled for all characters in the character maps then we return true.
Here’s a set of unit tests
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace Algorithms.Tests
{
[TestClass]
public class AnagramTests
{
[TestMethod]
public void AreAnagramsTests()
{
Anagram a = new Anagram();
Assert.IsTrue(a.AreAnagrams("hello", "___ hllOe!! 456 ???"));
Assert.IsFalse(a.AreAnagrams("sdfs", null));
Assert.IsFalse(a.AreAnagrams(null, "sdfs"));
Assert.IsFalse(a.AreAnagrams(null, null));
Assert.IsFalse(a.AreAnagrams("qwerty", "yewr"));
Assert.IsFalse(a.AreAnagrams("qwerty", "qwertyuiop"));
Assert.IsTrue(a.AreAnagrams("? par**lIame%%nt !", "partIAL men"));
Assert.IsTrue(a.AreAnagrams("a gentleman", "elegant man"));
}
}
}
The will all pass.
The second solution takes the two cleaned strings, puts them in order and compares them. This solution is slightly less performant than the first one due to the string ordering though. The map comparison doesn’t involve any ordering so it’s somewhat quicker.
Here’s the implemented function
private bool AreAnagramsWithSortedStrings(string first, string second)
{
string sortedOne = string.Concat(CleanInput(first).OrderBy(c => c));
string sortedTwo = string.Concat(CleanInput(second).OrderBy(c => c));
return sortedOne == sortedTwo;
}
We again clean the input string and then call the OrderBy LINQ operator. It returns an ordered collection of characters from the string, i.e. not an ordered string. Hence we need to embed this bit of code in string.Concat so that we build the string again from the characters. Finally we simply compare the two strings.
Wire up this function from the main one and rerun the unit tests. They will still pass.
public bool AreAnagrams(string first, string second)
{
if (first == null || second == null) return false;
//return AreAnagramsWithCharMap(first, second);
return AreAnagramsWithSortedStrings(first, second);
}