The Data Science Lab

Lightweight Mathematical Combinations Using C#

After previously discussing permutations, Dr. James McCaffrey of Microsoft Research uses step-by-step examples and full code presentations to explore combinations.

A zero-based mathematical (n, k) combination is a subset of k integers from 0 to n-1. For example, if n = 5 and k = 3 there are 10 combinations:

0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4

It's usual to list combinations in what's called lexicographical order. If each combination is interpreted as an ordinary integer, the combinations are listed from smallest (12) to largest (234).

The number of (n,k) combinations is n! / (k! * (n-k)!) where n! is factorial(n) = n * (n-1) * (n-2) * . . 1. The function is often called Choose(). For example, Choose(5, 3) = 5! / (3! * (5-3)!) = 120 / (6 * 2) = 120 / 12 = 10.

The number of combinations gets very, very large as n and k increase. For example, Choose(500, 100) =

204,169,423,593,847,671,561,387,240,724,193,094,
030,165,460,090,325,932,474,143,661,505,758,330,
944,415,515,994,945,349,100,113,181,687,417,345

which is significantly larger than the estimated number of atoms in the Universe.

To deal with combinations using the C# language it's necessary to use the BigInteger data type which can handle arbitrarily large values.

Mathematical combinations are related to, but quite different from, mathematical permutations. A zero-based mathematical permutation of order n is a rearrangement of the integers from 0 to n-1. For example, one permutation of order n = 5 is (2, 0, 1, 4, 3).

Figure 1: Combinations Using C# in Action
[Click on image for larger view.]Figure 1: Combinations Using C# in Action

A good way to see where this article is headed is to take a look at the screenshot of a demo program in Figure 1. The demo illustrates how to create and display an (n,k) combination, compute Choose(n, k) using the BigInteger type, display all (n,k) combinations using a Successor() function, and compute a specific combination element directly.

This article assumes you have intermediate or better programming skill with the C# language but doesn't assume you know anything about mathematical combinations. The complete demo code is presented in this article and is also available in the accompanying download. All normal error checking has been removed to keep the main ideas as clear as possible.

To create the demo program I used Visual Studio 2022 (the free Community edition) with .NET 6 (in spite of the new "de-proved" Console template). However, the demo has no significant dependencies so any relatively recent versions of VS and .NET can be used.

Defining and Displaying a Combination
The simplest possible way to define a mathematical combination is as an array of integer. The demo program implements a MakeComb() function as:

static int[] MakeComb(int n, int k)
{
  int[] result = new int[k];
  for (int i = 0; i < k; i++)
    result[i] = i;
  return result;
}

The MakeComb() function can be called like so:

int n = 5; int k = 3;
int[] c = MakeComb(n, k); 
Console.WriteLine("Initial Combination(n=5, k=3) is: ");
ShowComb(c);

The ShowComb() function is defined as:

static void ShowComb(int[] comb)
{
  int n = comb.Length;
  for (int i = 0; i < n; ++i)
    Console.Write(comb[i] + " ");
  Console.WriteLine("");
}

An alternative design is to define a dedicated Combination class, but in most scenarios using an integer array is simple and effective.

Defining a Choose Function
The demo program uses the BigInteger type to define a Choose() function as presented in Listing 1. The function uses several math tricks for efficiency. The first trick is based on the idea that Choose(n, k) = Choose(n, n-k), for example, Choose(100, 97) has the same value as Choose(100, 3). This trick is useful because smaller values of k are easier to compute.

The second trick is best explained by a concrete example. Computing Choose() using the basic definition is inefficient because the n! and k! factorial terms can be very large. An alternate definition of Choose() looks like this:

Choose(100, 3) = (100 * 99 * 98) / (3 * 2 * 1)
               = 50 * 33 * 98
               = 161,700

In words, the denominator is k! (which, by using the first trick can be small) and the numerator is k terms in the form n * (n-1) * (n-2) * . . . which is much smaller than n!

Listing 1: A Choose() Function

using System.Numerics; 
static BigInteger Choose(int n, int k)
{
  // number combinations
  if (n < 0 || k < 0)
    throw new Exception("Negative argument in Choose()");
  if (n < k) return 0; // special
  if (n == k) return 1; // short-circuit

  int delta, iMax;

  if (k < n - k) // ex: Choose(100,3)
  {
    delta = n - k; iMax = k;
  }
  else           // ex: Choose(100,97)
  {
    delta = k; iMax = n - k;
  }

  BigInteger ans = delta + 1;
  for (int i = 2; i <= iMax; ++i)
    ans = (ans * (delta + i)) / i;

  return ans;
}

The BigInteger type is defined in the System.Numerics namespace. The demo code defines Choose(n, k) as 0 when n < k as a special case.

The demo program calls the Choose() function like this:

BigInteger numCombs = Choose(5,3);
Console.WriteLine("Number of n=5, k=3 combs is: ");
Console.WriteLine(numCombs);

numCombs = Choose(100, 10);
Console.WriteLine("Number ways choose 10 from 100: ");
Console.WriteLine(numCombs.ToString("N0"));

Because the Choose() function result can be very large, it's often helpful to use the "N0" formatting specification to place commas in the output.

Defining a Successor Function
Defining a function that returns the successor combination to a given combination is a well-known problem. The demo program defines a Successor() function in Listing 2.

Listing 2: The Combination Successor() Function

static int[] Successor(int[] comb, int n, int k)
{
  if (IsLast(comb, n, k) == true)
    return null;

  int[] result = new int[k];  // make copy
  for (int i = 0; i < k; ++i)
    result[i] = comb[i];

  int idx = k - 1;
  while (idx > 0 && result[idx] == n - k + idx)
    --idx;
      
  ++result[idx];

  for (int j = idx; j < k - 1; ++j)
    result[j + 1] = result[j] + 1;

  return result;
}

Suppose n = 10 and k = 6 and a current combination is (0, 2, 3, 4, 8, 9). The successor combination is (0, 2, 3, 5, 6, 7).

The Successor() function first finds the left-most index (idx) of the value to increment. In this example idx = [3]. The value at [3] is 4 and it is incremented giving an intermediate result of (0, 2, 3, 5, 8, 9). Then the algorithm sequentially increments the tail values to the right of idx giving the final successor result of (0, 2, 3, 5, 6, 7).

With a Successor() function in hand, it's possible to iterate through all combinations with code like:

int[] c = MakeComb(5, 3);  // [0,1,2]
Console.WriteLine("All n=5, k=3 combinations: ");
while (c != null)
{
  ShowComb(c);
  c = Successor(c, n, k);
}

The demo Successor() function returns null if the current combination is the last combination. For example, for n = 8 and k = 3, the last combination in lexicographical order is (5, 6, 7). The demo program defines an IsLast() function as:

static bool IsLast(int[] comb, int n, int k)
{
  // is comb(8,3) like [5,6,7] ?
  if (comb[0] == n - k)
    return true;
  else
    return false;
}

The last (n, k) combination will have the form (n-k), (n-k+1), . . (n-1). Because of lexicographical ordering, it's enough to check the first value in the combination to see if it is n-k because that condition is true only for the last combination.

The mth Lexicographical Combination Element Directly
Suppose you want a specific combination element. For example, all 10 n = 5, k = 3 combinations are:

[0] 0 1 2
[1] 0 1 3
[2] 0 1 4
[3] 0 2 3
[4] 0 2 4
[5] 0 3 4
[6] 1 2 3
[7] 1 2 4
[8] 1 3 4
[9] 2 3 4

You want a function named Element that accepts an index, m, and returns the specified combination, for example, for n = 5 and k = 3, Element(m=5) returns (0, 3, 4).

Because there are only Choose(5, 3) = 10 possible combinations you could start with the initial (0, 1, 2) combination and then call the Successor() function inside a for-loop with five iterations and end up at the desired combination.

But this simple approach only works for very small values of n and k. Somewhat surprisingly, it is possible to write a function that directly returns a specified combination. The demo program defines an Element() function in Listing 3.

Listing 3: Directly Computing the mth Combination Element

static int[] Element(BigInteger m, int n, int k)
{
  // compute element [m] using the combinadic
  BigInteger maxM = Choose(n, k) - 1;

  if (m > maxM)
    throw new Exception("m value is too large in Element");

  int[] ans = new int[k];

  int a = n;
  int b = k;
  BigInteger x = maxM - m; // x is the "dual" of m

  for (int i = 0; i < k; ++i)
  {
    ans[i] = LargestV(a, b, x); // see helper below    
    x = x - Choose(ans[i], b);
    a = ans[i];
    b = b - 1;
  }

  for (int i = 0; i < k; ++i)
    ans[i] = (n - 1) - ans[i];

  return ans;
}

static int LargestV(int a, int b, BigInteger x)
{
  int v = a - 1;
  while (Choose(v, b) > x)
    --v;
  return v;
}

The Element() function calls a helper LargestV() function. The Element() function can be called like this:

int n = 100;
int k = 10;
BigInteger m = BigInteger.Parse("9999999999");
int[] c = Element(m, n, k);
Console.WriteLine("Combination [" + m + "] " +
  " of C(n=100,k=10): ");
ShowComb(c);

The algorithm for the Element() function is beautiful but quite complicated. The algorithm is based on expressing an integer as a "combinadic," which is a linear combination of Choose() function values. You can find a detailed explanation of the algorithm in my blog post.

Wrapping Up
The examples in this article use zero-based combinations. This is convenient because in many practical uses of combinations, combination values map to array indices. In a pure mathematical context, one-based combinations are more common.

An idea that's closely related to mathematical combinations has the rather odd name of "Stirling numbers of the second kind," usually written as S(n, k). The function returns the number of ways to partition n items into k subsets. For example, suppose n = 4 and k = 2, then S(4, 2) = 7. If the n items are (0, 1, 2, 3) then the 7 partitions are: (0 | 123), (1 | 023), (2 | 013), (3 | 012), (01 | 23), (02 | 13), and (03 | 12). It's possible to implement the S(n, k) function using the ideas presented in this article.

About the Author

Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Azure and Bing. James can be reached at [email protected].

comments powered by Disqus

Featured

  • AI for GitHub Collaboration? Maybe Not So Much

    No doubt GitHub Copilot has been a boon for developers, but AI might not be the best tool for collaboration, according to developers weighing in on a recent social media post from the GitHub team.

  • Visual Studio 2022 Getting VS Code 'Command Palette' Equivalent

    As any Visual Studio Code user knows, the editor's command palette is a powerful tool for getting things done quickly, without having to navigate through menus and dialogs. Now, we learn how an equivalent is coming for Microsoft's flagship Visual Studio IDE, invoked by the same familiar Ctrl+Shift+P keyboard shortcut.

  • .NET 9 Preview 3: 'I've Been Waiting 9 Years for This API!'

    Microsoft's third preview of .NET 9 sees a lot of minor tweaks and fixes with no earth-shaking new functionality, but little things can be important to individual developers.

  • Data Anomaly Detection Using a Neural Autoencoder with C#

    Dr. James McCaffrey of Microsoft Research tackles the process of examining a set of source data to find data items that are different in some way from the majority of the source items.

  • What's New for Python, Java in Visual Studio Code

    Microsoft announced March 2024 updates to its Python and Java extensions for Visual Studio Code, the open source-based, cross-platform code editor that has repeatedly been named the No. 1 tool in major development surveys.

Subscribe on YouTube