What does foreach really do? (.NET)

Cross-posted from Ryan's blog.

In this post I’ll explain how the C# compiler interprets the foreach keyword. I’ll start by looking at the common pattern of enumerating a list.

var numbers = new List<int>() { 1, 2, 3, 4, 5 };
foreach(var n in numbers) 
{
    Console.WriteLine(n);
}

List<T> is consumable by foreach because it implements IEnumerable<T>. Let’s take a look at what IEnumerable<T> and IEnumerator<T> define.

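interface IEnumerable<T>
{
    IEnumerator<T> GetEnumerator();
}

// Simplified: the real IEnumerator<T> also derives from the non-generic
// IEnumerator, which is where MoveNext() and Reset() are declared.
interface IEnumerator<T> : IDisposable
{
    bool MoveNext();
    T Current { get; }
    void Reset();
}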

Each call to GetEnumerator() will return a new instance of an enumerator. An enumerator does the heavy lifting for foreach. When an enumerator is created, it is positioned before the first item, so the first call to MoveNext() advances the current item pointer to the first item. Reset() is meant to return the enumerator to this initial state, although many enumerators (including those generated from iterator blocks) do not support it and throw NotSupportedException. Each subsequent call to MoveNext() attempts to move the current pointer to the next item. MoveNext() returns true if Current now points to a new item. It returns false if the end of the enumerable was reached.

The compiler utilizes IEnumerator's members to internally rewrite a foreach loop to code that is roughly equivalent to the following:

var numbers = new List<int>() { 1, 2, 3, 4, 5 };
IEnumerator<int> enumerator = numbers.GetEnumerator();
try {
    while(enumerator.MoveNext()) {
        int n = enumerator.Current;
        Console.WriteLine(n); // The inner scope of the foreach block is embedded here
    }
} finally {
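    // IEnumerator<int> is known to be IDisposable, so the compiler emits a direct call:
    ((System.IDisposable) enumerator).Dispose();
    // If the enumerator type is not known to be IDisposable, it emits a runtime check instead:
    // System.IDisposable d = enumerator as System.IDisposable;
    // if (d != null) d.Dispose();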
}

There are a few things to note about this code:

  • Enumerators are read-only and forward-only.
  • foreach’s implementation does not use anonymous methods. The compiler actually rewrites the code at compile time.
  • Separate foreach statements will always consume their own instance of the enumerator.
  • The enumerator is only disposed if it implements IDisposable. If the enumerator is a sealed type that does not implement IDisposable, the finally block will be empty.
  • Enumeration does not have exclusive access to the underlying collection. It is inherently not thread-safe. If you need thread-safe enumeration, use a type from the System.Collections.Concurrent namespace (introduced in .NET 4.0).
  • If the underlying collection is modified (an item is added or removed), the next call to MoveNext() or Reset() will throw InvalidOperationException.
  • Enumerators should not be shared between threads. Built-in enumerators do not detect cross-thread use and will not throw for it; interleaved calls to MoveNext() and Current from multiple threads simply produce unpredictable results.
  • If the underlying collection is an array, the compiler will optimize this code by using a for loop and comparing the index to the array length. When iterating a multidimensional array, the rightmost index is increased first. (But don’t use multidimensional arrays in the first place.)

(The above rules are true for built-in .NET enumerators and iterator blocks. Details can be found in the C# language specification, ECMA-334: 8.8.4.)
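Two of the rules above are easy to check directly. The following is a minimal sketch, not part of the original post; the class and method names (EnumerationRules, ModifyingDuringForeachThrows, RectangularOrder) are made up for illustration. It modifies a List<int> while enumerating it and records the visiting order of a rectangular array.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; the names are illustrative only.
static class EnumerationRules
{
    // Returns true if modifying the list inside foreach raised InvalidOperationException.
    public static bool ModifyingDuringForeachThrows()
    {
        var numbers = new List<int> { 1, 2, 3 };
        try
        {
            foreach (var n in numbers)
            {
                numbers.Add(n); // invalidates the live enumerator
            }
        }
        catch (InvalidOperationException)
        {
            return true; // thrown by the next MoveNext() call
        }
        return false;
    }

    // Collects the elements of a 2x3 rectangular array in foreach order.
    public static List<int> RectangularOrder()
    {
        var grid = new int[2, 3] { { 1, 2, 3 }, { 4, 5, 6 } };
        var seen = new List<int>();
        foreach (var cell in grid)
        {
            seen.Add(cell); // the rightmost index advances first
        }
        return seen;
    }

    static void Main()
    {
        Console.WriteLine(ModifyingDuringForeachThrows());       // True
        Console.WriteLine(string.Join(",", RectangularOrder())); // 1,2,3,4,5,6
    }
}
```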

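foreach can also be used on any type that exposes a public method named GetEnumerator(), even if the type does not implement IEnumerable. Likewise, the compiler treats any type with a public MoveNext() method and a Current property as an enumerator. This pattern-based lookup dates back to C# 1.0, before generics, where it let collections return strongly typed enumerators without boxing each item. It is resolved at compile time and is unrelated to .NET 4.0’s dynamic keyword. You might see this pattern in legacy code, and built-in collections such as List<T> still use it to return a struct enumerator.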
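As a rough illustration of this pattern, here is a sketch of a type that foreach accepts even though it implements neither IEnumerable nor IEnumerator. Countdown, its nested Enumerator struct, and Program.Run are hypothetical names invented for this example.

```csharp
using System;
using System.Text;

// Hypothetical type: implements neither IEnumerable nor IEnumerator.
class Countdown
{
    private readonly int _start;
    public Countdown(int start) { _start = start; }

    // foreach only needs a public GetEnumerator() method...
    public Enumerator GetEnumerator() { return new Enumerator(_start); }

    // ...whose return type has a public MoveNext() method and a Current property.
    public struct Enumerator
    {
        private int _next;
        private int _current;
        internal Enumerator(int start) { _next = start; _current = 0; }

        public int Current { get { return _current; } }

        public bool MoveNext()
        {
            if (_next <= 0) return false; // nothing left to count
            _current = _next--;
            return true;
        }
    }
}

static class Program
{
    // Enumerates a Countdown with foreach and returns the visited values as text.
    public static string Run(int start)
    {
        var sb = new StringBuilder();
        foreach (var n in new Countdown(start))
        {
            sb.Append(n).Append(' ');
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Run(3)); // 3 2 1
    }
}
```

Because Enumerator is a struct and is not IDisposable, the compiler calls MoveNext() and Current on it directly and has nothing to dispose.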

Enumerating Strings

foreach can also enumerate a string one char at a time, because string implements IEnumerable<char>.

foreach(var c in "foo") {
    Console.Write(c);
    Console.Write(' ');
}
// outputs: f o o 

Watch out for multiple enumeration

When working with a raw IEnumerable (such as the result of some LINQ extension methods), it’s possible to do extra work by enumerating it multiple times. If the enumerable is the result of a database query, each enumeration will query the database again.

var squares = ints().Select(x => x * x);    // var is IEnumerable<int>
foreach(var s in squares) {
    Console.Write(s);
}
foreach(var s in squares) {
    // This second loop reruns the Select projection for every item.
    Console.Write(s);
}

To prevent this, add .ToList() or .ToArray() after the Select call. This runs the query once and stores the results, so later loops enumerate the stored copy. ReSharper will automatically detect multiple enumeration and warn you.
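To make the cost visible, the sketch below counts how often the projection runs when the same sequence is enumerated twice, with and without materializing it first. The MultipleEnumeration class and CountProjections method are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names are illustrative only.
static class MultipleEnumeration
{
    // Counts how many times the projection runs when the sequence is
    // enumerated twice, with or without materializing it first.
    public static int CountProjections(bool materialize)
    {
        int calls = 0;
        IEnumerable<int> squares = new[] { 1, 2, 3 }.Select(x => { calls++; return x * x; });

        if (materialize)
        {
            squares = squares.ToList(); // runs the projection once, right here
        }

        foreach (var s in squares) { }
        foreach (var s in squares) { }
        return calls;
    }

    static void Main()
    {
        Console.WriteLine(CountProjections(false)); // 6: three items, projected twice
        Console.WriteLine(CountProjections(true));  // 3: projected once by ToList()
    }
}
```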


In my next post, I’ll discuss iterator blocks, a special syntax for finer-grained control over iteration.