Friday, May 25, 2012

Next, Please! - A Closer Look at IEnumerable (Part 5 - Usage Tips & Wrap-up)

This is a continuation of "Next, Please! - A Closer Look at IEnumerable".  The articles in the series are collected together here: JeremyBytes - Downloads.

Last time, we took a look at the Strategy pattern and then used that pattern to create a StrategicSequence that allows us to pass in whatever algorithm we want to generate the sequence.  This gives us a class that can be extended without being modified directly, and it gives the application (client) the opportunity to select which algorithm to use.

Today, we'll explore some tips for using classes that implement IEnumerable<T> (whether they are classes we create ourselves or classes from the .NET Framework).

IEnumerable<T> Tips
As we've seen, IEnumerable<T> is a fairly simple interface that gives us access to a lot of functionality.  But there are a number of things we should take into consideration when deciding how to use that functionality in our applications.

Tip: Don't Modify Collection Items in a foreach Loop
This is actually a "don't ever do this" rather than a "tip".  When using a foreach loop (or using the enumerator more directly), it is very tempting to modify the items as you are iterating them.  However, this only causes problems (some more obvious than others).

First, what happens if we try to add or remove an item in a foreach loop?  Consider the following code:
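The original listing isn't reproduced here, so this is a minimal console sketch of the same idea.  The Person class, the sample names, and the listBox list (standing in for the UI list box) are hypothetical, and the exception is caught only so the result can be checked.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's Person class.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class RemoveInForeachDemo
{
    // Returns the name of the exception type thrown by the loop, or null if none was thrown.
    public static string Run()
    {
        var people = new List<Person>
        {
            new Person { FirstName = "John", LastName = "Smith" },
            new Person { FirstName = "Mary", LastName = "Jones" },
            new Person { FirstName = "John", LastName = "Brown" },
        };
        var listBox = new List<Person>(); // stands in for the UI list box

        try
        {
            foreach (var person in people)
            {
                if (person.FirstName == "John")
                    people.Remove(person);   // changes the list while it is being enumerated
                listBox.Add(person);
            }
        }
        catch (InvalidOperationException ex)
        {
            // Thrown by the next MoveNext(): "Collection was modified; ..."
            return ex.GetType().Name;
        }
        return null;
    }
}
```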


The idea behind this code is that we iterate through our people collection (with a foreach loop).  If the FirstName property of the person is "John", then we want to remove that person from the collection.  Then we want to add that person object to the list box of our UI.

But we are not allowed to add or remove items from a List<T> while we are enumerating it.  This code will compile just fine, but if we try to run it, we get the following Exception:
System.InvalidOperationException: "Collection was modified; enumeration operation may not execute."
So adding and removing items is not allowed.  What about modifying items?  That's where things get a bit interesting.  Let's look at the following code:


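Inside the foreach loop, we first add the "person" to the list box of our UI.  Then, if the FirstName property is "John", we change it to "Test".  We would expect to see the "John" items in the list box (since we add them before changing them).  But when we run the application, the list box shows "Test" for those items instead!

This is not the output we would expect from this code, and notice that no exception is thrown.  List<T> does not track changes to the properties of the objects it contains, so the enumerator has nothing to complain about.  The strangeness comes from the fact that the list box does not store a copy of each person; it stores a reference to the same Person object that we change on the very next line.  By the time the list box actually displays its items (after our loop has finished), FirstName has already been changed to "Test".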
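To see this without a UI, here is a console sketch of the same loop.  The Person class is hypothetical, and a List<Person> stands in for the list box: like a WPF ListBox, it keeps references to the objects rather than copies, and the display values are produced later.  The sketch shows that the enumerator raises no exception when an item's property changes, and that anything holding a reference to that item sees the new value.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the article's Person class.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class ModifyInForeachDemo
{
    // Returns the first names the way a "list box" would show them once the loop has finished.
    public static List<string> Run()
    {
        var people = new List<Person>
        {
            new Person { FirstName = "John", LastName = "Smith" },
            new Person { FirstName = "Mary", LastName = "Jones" },
        };
        var listBox = new List<Person>(); // holds references, not copies

        foreach (var person in people)
        {
            listBox.Add(person);               // adds a reference to the same object
            if (person.FirstName == "John")
                person.FirstName = "Test";     // no exception: the list itself is unchanged
        }

        // The "display" is produced after the loop, so it sees the modified object.
        return listBox.Select(p => p.FirstName).ToList();
    }
}
```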

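Based on these problems, we never want to modify a collection as we are iterating it.  In most situations, it is fairly easy to come up with another solution that accomplishes the same goal by handling the modification outside of the iteration (for example, adding the people to the list box in the loop and then calling people.RemoveAll(p => p.FirstName == "John") afterward).  Note: "for" loops do not exhibit this same issue since they do not use the enumerator; this is sometimes a good solution (but not always).  For example, removing items inside a "for" loop that counts up skips the item right after each one we remove, so we would need to count down from the end instead.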

Tip: Avoid Creating a Custom Class
In our samples from Part 3 and Part 4, we created classes that return a sequence of numbers.  I purposely chose this type of sequence because we could base our "MoveNext()" on a calculation of some type.  In most situations, though, there is no need to create a custom class that implements IEnumerable<T>.

As mentioned in Part 1, almost every .NET collection implements the IEnumerable interface.  This includes arrays, generic lists, linked lists, queues, stacks, and dictionaries.  Generally speaking, if you find yourself in a situation where you need an IEnumerable implementation (because you need to iterate through a set of items), you will probably also need the more advanced collection functionality that you get with one of these framework classes.

In most scenarios, we should start by looking at the .NET collection classes.  One of these classes will most likely fulfill our needs.

Tip: Use a Custom Class to Hide Functionality
Sometimes the .NET collections have more functionality than we want to expose in our application.  For example, let's say that we need a collection that we can iterate through, but we don't want to use "List<T>" because we don't want the collection items to be directly modifiable.

Here's an example of how we can use a custom class to wrap a List<T> object and hide its functionality:
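The article's listing isn't reproduced here; this is a sketch of the same idea with hypothetical names (ReadOnlyPeople, Person).  It assumes the constructor receives the items and copies them into its private list, so the caller can't modify the collection through a reference of its own.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical item type.
public class Person
{
    public string FirstName { get; set; }
}

// Hypothetical wrapper: exposes iteration, but no Add, Remove, IndexOf, or indexer.
public class ReadOnlyPeople : IEnumerable<Person>
{
    private readonly List<Person> _people;

    public ReadOnlyPeople(IEnumerable<Person> people)
    {
        // Copy into a private list so callers can't change it through their own reference.
        _people = new List<Person>(people);
    }

    // Reuse the List<T> enumerator instead of writing our own.
    public IEnumerator<Person> GetEnumerator()
    {
        return _people.GetEnumerator();
    }

    // The non-generic version simply calls the generic version.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

Note that this protects the collection, not the items: code that iterates a ReadOnlyPeople can still change the properties of each Person.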


We have a custom class that implements IEnumerable<T> (so we'll get our iteration functionality).  The class contains a private list that is not visible externally.  The list gets initialized by the class constructor.

Notice the GetEnumerator() method.  Instead of implementing our own IEnumerator<T> class or using "yield return", we are simply calling the internal list's GetEnumerator() method.  This is perfectly valid.  And it makes sense to do it in this situation -- the IEnumerator is already implemented by an object in our class, so there is no reason for us to create a custom implementation.

The result of this class is that we have a list of objects that we can iterate through, but we have no way to modify the collection externally.  Note that we are also hiding the other List<T> members of the private list (such as IndexOf), but we still have access to all of the IEnumerable<T> extension methods, such as Single() and Where(), if we are searching for particular items in the collection.

Tip: Use a Custom Class to Reduce Overhead
Sometimes we don't need all of the functionality provided by a collection object.  In that case, we can create a custom class that implements IEnumerable<T> that only has the functionality that we need.  This is exactly what we did with our sequence classes: IntegerSequence, FibonacciSequence, and StrategicSequence.

We did not need any of the overhead associated with a collection since our values are calculated as we need them.  Granted, our example is a rather contrived one.  But it demonstrates that it is possible and practical to have a class that implements IEnumerable<T> without other collection-type functionality.

IEnumerable<T> Review
The last several articles have shown a lot of different aspects of IEnumerable<T>.  Let's do a quick review:

IEnumerable<T> Interface Members
We started by looking at the IEnumerable<T> and IEnumerator<T> interfaces. We saw the methods and properties (such as GetEnumerator(), Current, and MoveNext()) that are specified in each.

Iterator Pattern
The Iterator pattern is described by the Gang of Four as a way to get items one-by-one from a collection -- "Next, Please!"  The IEnumerable<T> interface describes an implementation of this pattern.

foreach Loop
We can use a "foreach" loop to iterate through the items contained in any class that implements IEnumerable<T>.  This is an easier way to interact with the class than if we were to explicitly call the GetEnumerator() method along with MoveNext() and Current.
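As a rough sketch of what foreach saves us from writing, here are two versions of the same summing loop.  SumManually and SumWithForeach are hypothetical names, and the hand-written version is a simplification of what the compiler actually generates (for example, for a List<T> it can use the list's struct enumerator directly rather than going through IEnumerator<int>).

```csharp
using System.Collections.Generic;

public static class ForeachExpansionDemo
{
    // Roughly what "foreach (int value in values) { ... }" does, written out by hand.
    public static int SumManually(IEnumerable<int> values)
    {
        int sum = 0;
        using (IEnumerator<int> enumerator = values.GetEnumerator())
        {
            while (enumerator.MoveNext())       // advance; false means no more items
            {
                int value = enumerator.Current; // read the current item
                sum += value;
            }
        } // foreach also disposes the enumerator when the loop ends
        return sum;
    }

    // The same loop using foreach.
    public static int SumWithForeach(IEnumerable<int> values)
    {
        int sum = 0;
        foreach (int value in values)
            sum += value;
        return sum;
    }
}
```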

Extension Methods
Extension methods give us a way to add functionality to a class without modifying the class directly.  From a syntactic standpoint, extension methods behave as if they are native methods of the class.
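As a refresher on the syntax, here is a minimal extension method; IntExtensions and IsEven are hypothetical names used only for illustration.

```csharp
// Extension methods must be declared in a non-generic, non-nested static class.
public static class IntExtensions
{
    // The "this" modifier on the first parameter lets us call 7.IsEven()
    // as if IsEven were a method defined on int itself.
    public static bool IsEven(this int value)
    {
        return value % 2 == 0;
    }
}
```

Under the covers, 7.IsEven() is compiled into an ordinary static call: IntExtensions.IsEven(7).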

IEnumerable<T> and LINQ
LINQ (Language INtegrated Query) provides us with a myriad of extension methods on the IEnumerable<T> interface.  This gives us the ability to write queries and perform all types of functions against any class that implements IEnumerable<T>, including (but not limited to) filtering, sorting, aggregation, and grouping.
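A small sketch of those four kinds of operations against a plain int array (the LinqDemo class and its data are made up for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LinqDemo
{
    public static readonly int[] Numbers = { 5, 3, 8, 1, 9, 2 };

    // Filtering and sorting.
    public static List<int> EvensSorted()
    {
        return Numbers.Where(n => n % 2 == 0)
                      .OrderBy(n => n)
                      .ToList();
    }

    // Aggregation.
    public static int Total()
    {
        return Numbers.Sum();
    }

    // Grouping: split into even (true) and odd (false) buckets and count each one.
    public static Dictionary<bool, int> CountByParity()
    {
        return Numbers.GroupBy(n => n % 2 == 0)
                      .ToDictionary(g => g.Key, g => g.Count());
    }
}
```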

Implementing IEnumerable<T>: IntegerSequence
Our first custom implementation of IEnumerable<T> let us take a look at the members of IEnumerable<T> and IEnumerator<T>.  We saw what each member is designed for, and we used the members to create a class that returns a series of consecutive positive integers.

yield return
"yield return" provides us with shortcut syntax when implementing an enumerator.  It lets us create a method that retains its state between calls.  Using this, we can mimic a full implementation of the IEnumerator<T> class with a single method.

Implementing IEnumerable<T>: FibonacciSequence
Our second custom implementation of IEnumerable<T> was a bit more complex.  We used what we learned from the simple IntegerSequence and created a class that returns the Fibonacci sequence, where each value is the sum of the previous two values.

Strategy Pattern
The Strategy pattern is described by the Gang of Four as a way to create a set of encapsulated, interchangeable algorithms.  We took a look at the advantages of using the pattern as well as some of the negative consequences.

The StrategicSequence Class
Our final custom implementation of IEnumerable<T> let us explore the Strategy pattern in a bit more detail.  The StrategicSequence class accepts a strategy (the algorithm for calculating the sequence) as a constructor parameter.  This lets us externalize the algorithms into separate classes.  We created the IntegerStrategy, FibonacciStrategy, and SquareStrategy classes that implement these algorithms.  And we saw the advantages that this gave to our particular application.

Wrap Up
In summary: IEnumerable<T> is awesome!  Okay, I might be a little too excited about an interface, but it offers us a ton of functionality with very little effort.  Implementing the interface ourselves is not that difficult, and even better, there are lots of implementations already provided for us in the .NET framework.

I find myself using the LINQ extension methods all of the time.  It's so easy to just drop in a Where() method to do a filter of a collection, or an OrderBy() to sort the list, or a Single() to pick out a specific item. And all we need to take advantage of this functionality is a class that implements IEnumerable<T> and a good understanding of lambda expressions (but we've already got that, right?).

Hopefully you are as excited about IEnumerable<T> as I am (or at least a bit more interested than you were when we started).  Learning the features included in the .NET framework lets us take advantage of extremely powerful functionality that is already there just waiting for us.

Happy Coding!

Thursday, May 24, 2012

Next, Please! - A Closer Look at IEnumerable (Part 4 - Strategic Sequence)

This is a continuation of "Next, Please! - A Closer Look at IEnumerable".  The articles in the series are collected together here: JeremyBytes - Downloads.

Last time, we created the IntegerSequence and FibonacciSequence classes that both implemented IEnumerable<T>.  We saw how we could implement these with separate classes that implement IEnumerator<T> (IntegerEnumerator and FibonacciEnumerator) or with the "yield return" statement.

This time, we'll see how we can use the Strategy pattern to create a single IEnumerable<T> class to which we can pass a specific sequence algorithm.  This will allow us to easily add other sequence algorithms in the future.

The sample code can be downloaded from here: JeremyBytes - Downloads.

The Strategy Pattern
The Strategy pattern is one of the Gang of Four design patterns.  Here is the GoF description of the Strategy pattern:
"Define a family of algorithms, encapsulate each one, and make them interchangeable.  Strategy lets the algorithm vary independently from clients that use it." [Gamma, Helm, Johnson, and Vlissides. Design Patterns. Addison-Wesley, 1995.]
This means that rather than having the algorithm hard-coded in the class, it is passed in (often as a constructor parameter).  This allows us to create different algorithms that are all interchangeable and can be used with the same class.

Benefits of Strategy
One of the benefits of the Strategy pattern is that it passes control of which algorithm will be used from the class to the client using the class.  This means that the client is responsible for picking which strategy to use and then passing it to the class that will use it.  We'll see how this works in just a bit.

The Strategy pattern also allows us to create multiple implementations of the same behavior.  This means that we can have different algorithms that ultimately perform the same function but have different priorities -- perhaps optimizing for speed or memory usage, depending on what is important to the client.  For example, a mobile application may want an algorithm designed to minimize network usage, while an intranet application may want to maximize speed without worrying about how much of the internal high-speed network it uses.

Consequences of Strategy
As we've discussed previously (Design Patterns: Understand Your Tools), design patterns have both pros and cons.  One of the consequences of the Strategy pattern is that the client must be aware of the different strategies that are available.  Often in non-strategy implementations, the client passes a string or enum to select an algorithm, but it has no direct knowledge of the algorithms themselves.  When using the Strategy pattern, the client is responsible for instantiating the strategy class, and so it must be aware of what classes are available.

Another consequence is that we generally end up with an increased number of classes.  As always, we need to weigh the benefits and consequences before we decide to implement a specific pattern.

Strategy and the Sequence Classes
What we saw in Part 3 is that the IntegerSequence and FibonacciSequence classes differed only in the specific IEnumerator<T> concrete type that they "new" up.  This makes them good candidates for a single, more generic class that takes that concrete type as a parameter.

The client will need to be aware of the various concrete types.  This is okay in our case because the client was already responsible for choosing the specific IEnumerable<T> concrete type.  As for ending up with more classes, we are willing to accept this since we'll have isolated and easily-interchangeable classes.

As a reminder, the Strategy Pattern is a design pattern; it does not specify a particular implementation.  The implementation we use here is just one way to implement the pattern.  There are countless others.

The StrategicSequence Class
Let's do some planning before we implement the StrategicSequence class.  Let's start by looking at the IntegerSequence class from the last article:


To make this conducive to the Strategy pattern, we'll do a couple of things.  First, we will create a private field for the strategy (which will mirror the IntegerEnumerator in this case).  Then we will update our constructor to accept that strategy as a parameter.

Things get a little more complicated here: we need to figure out how to get the "NumberOfValues" value into the strategy.  We could do this with the strategy constructor (like we have above in the IntegerEnumerator constructor), but this may not be the best choice.  What we will do instead is create a "NumberOfValues" property in the strategy that we can use to set this value.

IStrategicEnumerator Interface
Since each of our strategy classes needs to have a property for "NumberOfValues", it makes sense for us to create an interface that includes this property.  In the sample solution, the SequenceLibrary project contains a folder called StrategicEnumerators.  We'll add a new class under this folder: right-click the folder, select "Add", then "Class", then type "IStrategicEnumerator.cs" in the dialog.

Then we'll update the file as follows:


Notice that IStrategicEnumerator is an interface that also specifies the IEnumerator<int> interface.  This is so that we will get all of the properties and methods contained in IEnumerator<T> in addition to the NumberOfValues property.
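Based on that description, the interface could look like the following sketch (the downloadable code may differ in details such as the namespace, which is omitted here):

```csharp
using System.Collections.Generic;

// Each strategy is an IEnumerator<int> that also knows how many values to produce.
public interface IStrategicEnumerator : IEnumerator<int>
{
    int NumberOfValues { get; set; }
}
```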

IntegerStrategy Class
Next, we'll add our first strategy class that will implement the IStrategicEnumerator interface.  This will be "IntegerStrategy" that will mirror the functionality of the IntegerEnumerator class that we implemented last time.  (Note: if this were a "real life" application, we would probably just re-purpose the IntegerEnumerator class.  But in this case, we'll leave the original intact so we can refer to the code if we want.)

Add the "IntegerStrategy" class under the StrategicEnumerators folder, add "using System.Collections", and specify the IStrategicEnumerator interface:


Just like last time, we'll right-click on "IStrategicEnumerator" and select "Implement Interface" to let Visual Studio stub out the properties and methods for us.  Note that the NumberOfValues property is implemented as well as the properties and methods of IEnumerator<T>.

After a little re-ordering, and converting "NumberOfValues" to an automatic property, we end up with the following:


For our implementation, we can copy from the IntegerEnumerator class that we created previously.  Here are the fields and properties (including "_position"):


And here are the methods:


Other than "NumberOfValues" being a property rather than a field, all of the other code is the same.
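Putting the pieces together, a sketch of the complete IntegerStrategy might look like the following.  It assumes MoveNext() compares _position to NumberOfValues with "<=", as described for IntegerEnumerator in Part 3, and it repeats the interface so the block compiles on its own.

```csharp
using System.Collections;
using System.Collections.Generic;

public interface IStrategicEnumerator : IEnumerator<int>
{
    int NumberOfValues { get; set; }
}

public class IntegerStrategy : IStrategicEnumerator
{
    private int _position = 0; // 0 means "before the first value"

    // An automatic property replaces the old _numberOfValues field.
    public int NumberOfValues { get; set; }

    public int Current
    {
        get { return _position; }
    }

    // The non-generic version delegates to the generic one.
    object IEnumerator.Current
    {
        get { return Current; }
    }

    public bool MoveNext()
    {
        _position++;
        return _position <= NumberOfValues; // false once we pass the last value
    }

    public void Reset()
    {
        _position = 0;
    }

    public void Dispose()
    {
        // Nothing to release.
    }
}
```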

StrategicSequence Class
Now that we have our first strategy class, let's create the class that can use it.  The StrategicSequence class will allow us to pass in whichever concrete strategy we want.

Follow these steps:
  1. Add the new "StrategicSequence" class to the root of the project.
  2. Add "using System.Collections".
  3. Add "using SequenceLibrary.StrategicEnumerators" (where IStrategicEnumerator is located).
  4. Make the class public.
  5. Specify that StrategicSequence implements "IEnumerable<int>".
  6. Implement the IEnumerable<T> interface.
That gets us a good start.  Let's look at the completed code:


Let's compare this to the original IntegerSequence.  First, we have a new private field to hold the strategy.  Next, we have the NumberOfValues property (just like before).  Then, the constructor takes both a strategy and the number of values, and it updates the appropriate class values.

GetEnumerator() is a bit different.  Here, we set the NumberOfValues property of the strategy based on the NumberOfValues property of our StrategicSequence class.  Then we return the strategy -- this is valid since the strategy implements the IEnumerator<int> interface.

This gives us a working sequence class that implements the Strategy pattern.
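Here is a sketch of the completed StrategicSequence along with a compact IntegerStrategy (repeated so the block compiles on its own).  One addition is not in the description above and is labeled in the code: GetEnumerator() calls Reset() on the strategy.  Because the same strategy object is returned every time, without that call a second foreach over the same sequence would start where the first one stopped (and, after a complete pass, return nothing).

```csharp
using System.Collections;
using System.Collections.Generic;

public interface IStrategicEnumerator : IEnumerator<int>
{
    int NumberOfValues { get; set; }
}

public class IntegerStrategy : IStrategicEnumerator
{
    private int _position;
    public int NumberOfValues { get; set; }
    public int Current { get { return _position; } }
    object IEnumerator.Current { get { return Current; } }
    public bool MoveNext() { _position++; return _position <= NumberOfValues; }
    public void Reset() { _position = 0; }
    public void Dispose() { }
}

public class StrategicSequence : IEnumerable<int>
{
    private readonly IStrategicEnumerator _strategy;

    public int NumberOfValues { get; private set; }

    public StrategicSequence(IStrategicEnumerator strategy, int numberOfValues)
    {
        _strategy = strategy;
        NumberOfValues = numberOfValues;
    }

    public IEnumerator<int> GetEnumerator()
    {
        _strategy.NumberOfValues = NumberOfValues;
        // Addition (not in the article's description): rewind the shared strategy
        // so the sequence can be enumerated more than once.
        _strategy.Reset();
        return _strategy; // valid: the strategy is an IEnumerator<int>
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

Even with Reset(), every enumeration shares one strategy object, so nested loops over the same StrategicSequence would still interfere with each other.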

Using the StrategicSequence Class
Now let's flip over to the console application to see the output of our class.  We'll need to add a using statement for "SequenceLibrary.StrategicEnumerators" (since that is where our strategy class is).

Here's our updated code:


First, we create an instance of our strategy by calling "new IntegerStrategy()".  Then we create a new StrategicSequence and pass in our strategy and the number of values.  The rest of our code is exactly the same as before.

And we get the expected output:


But the advantage of using the strategy pattern is that we can create multiple algorithms that are all interchangeable.

Additional Strategies
Now that we get the idea, we can create additional strategies.  We won't go through all of the code details here since the implementations will mirror what we did with the IntegerStrategy.  You can check the download for the completed code.  The "Starter" solution contains these strategies in the "StrategicEnumerators" folder.  You can add them to the SequenceLibrary project by right-clicking the StrategicEnumerators folder, selecting "Add", then "Existing Item", and then locating the 2 files: FibonacciStrategy.cs and SquareStrategy.cs.

FibonacciStrategy contains the algorithm for the Fibonacci Sequence.  Again, this is very similar to the FibonacciEnumerator class that we implemented previously (with a few updates such as changing the _numberOfValues field to the NumberOfValues property).
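The FibonacciStrategy code is not shown here, so this is only a sketch under a couple of assumptions: it seeds the calculation with the values 1 and 0 so that every step uses the same "add the previous two values" rule, and it uses a checked addition that throws OverflowException when the next value would not fit in an int (the downloadable version may handle that case differently).

```csharp
using System.Collections;
using System.Collections.Generic;

public interface IStrategicEnumerator : IEnumerator<int>
{
    int NumberOfValues { get; set; }
}

public class FibonacciStrategy : IStrategicEnumerator
{
    private int _position;      // how many values have been returned so far
    private int _previous = 1;  // seed values chosen so the first two results are 1, 1
    private int _current = 0;

    public int NumberOfValues { get; set; }

    public int Current { get { return _current; } }
    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext()
    {
        if (_position >= NumberOfValues)
            return false;

        // checked() throws OverflowException instead of silently wrapping to a
        // negative number; for an int, this happens on the 47th value.
        int next = checked(_previous + _current);
        _previous = _current;
        _current = next;
        _position++;
        return true;
    }

    public void Reset()
    {
        _position = 0;
        _previous = 1;
        _current = 0;
    }

    public void Dispose() { }
}
```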

SquareStrategy contains an algorithm that returns a series of squares (by returning "_position * _position").

Error Handling
The FibonacciStrategy and SquareStrategy classes contain a check to ensure that we do not overflow Int32.  As we saw last time, the Fibonacci Sequence overflows at the 47th value (2,971,215,073 is larger than Int32.MaxValue, 2,147,483,647).  For the sequence of squares, this occurs at the 46,341st value, since 46,340 squared (2,147,395,600) is the largest square that fits in an Int32.  We don't have to worry about this error check for the IntegerStrategy since there is a one-to-one relationship between the returned integer value and the total number of values returned (meaning, we can't overflow the return value without overflowing the NumberOfValues first).
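Here is a sketch of SquareStrategy with that kind of check.  It assumes checked arithmetic that throws OverflowException (the downloadable code may use a different check), and the interface is repeated so the block compiles on its own.

```csharp
using System.Collections;
using System.Collections.Generic;

public interface IStrategicEnumerator : IEnumerator<int>
{
    int NumberOfValues { get; set; }
}

public class SquareStrategy : IStrategicEnumerator
{
    private int _position;
    private int _current;

    public int NumberOfValues { get; set; }

    public int Current { get { return _current; } }
    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext()
    {
        if (_position >= NumberOfValues)
            return false;

        _position++;
        // 46,340 squared still fits in an int; 46,341 squared does not,
        // so checked() throws OverflowException at that point.
        _current = checked(_position * _position);
        return true;
    }

    public void Reset()
    {
        _position = 0;
        _current = 0;
    }

    public void Dispose() { }
}
```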

As a side note, the reason that we are using "int" for the Fibonacci Sequence instead of a "long" is so that we will have the same generic type ("int") across all of our strategies.  We could change this to a "long" for all of our classes if we wish; I left it as "int" for simplicity.

Using the FibonacciStrategy
In order to use the FibonacciStrategy in our console app, we only need to change the concrete type for our IStrategicEnumerator:


Which gives us the following output:


It is just as easy to switch this over to the "SquareStrategy".

Benefits of the StrategicSequence
We get a few different benefits from implementing the Strategy pattern in our StrategicSequence class.  First, our sequence class has been made extensible by abstracting out the algorithm (the "strategy").  This means that we can easily create new strategies for other sequences -- such as a prime number strategy or even a random number strategy.

Next, the client (the console application in our case) gets to decide which strategy will be used.  Even though our console application is making this decision at compile time, we can easily move this decision to run time. Picture the following application:


In this case, we have a set of radio buttons that let us pick which algorithm we want to use to generate our sequence.  Based on this selection, the client will "new" up the appropriate strategy and pass it to the StrategicSequence class.  (Note: this application is not included in the code download; it is left as an exercise for the reader.)

Although the Strategy pattern is designed around having multiple strategies available for the client to select from, we can also think about how our code would work when using an Inversion of Control (IoC) container.  In this scenario, the container would be responsible for instantiating the appropriate strategy based on configuration (similar to how we handled dynamically loading a repository in the final sample of IEnumerable, ISaveable, IDontGetIt: Interfaces in .NET).  Then the strategy will be passed to the sequence class.

Things to Think About
There are a couple of things to think about regarding our implementation of the Strategy pattern.  The biggest concern is that the NumberOfValues property in the strategy is a publicly exposed property.  This means that we can alter this value directly in our client application -- including while the sequence is being generated.

Consider the following:
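Here is a minimal sketch of that situation, using compact copies of the interface, IntegerStrategy, and StrategicSequence (the ChangeMidIterationDemo class is hypothetical).  Because the client still holds a reference to the strategy, it can shrink NumberOfValues partway through the loop.

```csharp
using System.Collections;
using System.Collections.Generic;

public interface IStrategicEnumerator : IEnumerator<int>
{
    int NumberOfValues { get; set; }
}

public class IntegerStrategy : IStrategicEnumerator
{
    private int _position;
    public int NumberOfValues { get; set; }
    public int Current { get { return _position; } }
    object IEnumerator.Current { get { return Current; } }
    public bool MoveNext() { _position++; return _position <= NumberOfValues; }
    public void Reset() { _position = 0; }
    public void Dispose() { }
}

public class StrategicSequence : IEnumerable<int>
{
    private readonly IStrategicEnumerator _strategy;
    public int NumberOfValues { get; private set; }

    public StrategicSequence(IStrategicEnumerator strategy, int numberOfValues)
    {
        _strategy = strategy;
        NumberOfValues = numberOfValues;
    }

    public IEnumerator<int> GetEnumerator()
    {
        _strategy.NumberOfValues = NumberOfValues;
        return _strategy;
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class ChangeMidIterationDemo
{
    public static List<int> Run()
    {
        var strategy = new IntegerStrategy();
        var sequence = new StrategicSequence(strategy, 12);
        var output = new List<int>();
        foreach (int value in sequence)
        {
            output.Add(value);
            if (value == 3)
                strategy.NumberOfValues = 5; // the client still has a reference to the strategy
        }
        return output; // stops after 5 values instead of 12
    }
}
```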


This results in the sequence returning 5 values instead of the originally-specified 12.

Another option would be to make NumberOfValues a constructor parameter (similar to our original FibonacciEnumerator class).  In this scenario, NumberOfValues could be a private field of the strategy class rather than a publicly exposed property.  It would be used as follows:
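A sketch of this alternative, assuming IntegerStrategy takes the count in its constructor and StrategicSequence takes only the strategy (these signatures are hypothetical; this is not the version the article settles on).  With the count moved into the strategy's constructor, the interface no longer needs a NumberOfValues property.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical variant: nothing left to add beyond IEnumerator<int>.
public interface IStrategicEnumerator : IEnumerator<int>
{
}

public class IntegerStrategy : IStrategicEnumerator
{
    private readonly int _numberOfValues; // private: the client can't change it mid-iteration
    private int _position;

    // Convention only: an interface can't require this constructor.
    public IntegerStrategy(int numberOfValues)
    {
        _numberOfValues = numberOfValues;
    }

    public int Current { get { return _position; } }
    object IEnumerator.Current { get { return Current; } }
    public bool MoveNext() { _position++; return _position <= _numberOfValues; }
    public void Reset() { _position = 0; }
    public void Dispose() { }
}

public class StrategicSequence : IEnumerable<int>
{
    private readonly IStrategicEnumerator _strategy;

    public StrategicSequence(IStrategicEnumerator strategy)
    {
        _strategy = strategy;
    }

    public IEnumerator<int> GetEnumerator() { return _strategy; }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

With this version, the client writes new StrategicSequence(new IntegerStrategy(12)), so the count is handed to the strategy rather than to the sequence.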


The advantage to this implementation is that we can no longer modify the NumberOfValues while the sequence is being iterated.  But there are a couple of disadvantages.  First, the usage is not as intuitive -- instead of passing the NumberOfValues to the StrategicSequence constructor, we pass it to the strategy object constructor.  This strikes me as being the wrong place for this value; it makes more sense to tell the sequence how many values you want rather than telling the strategy.

Another disadvantage is that you cannot put a constructor into an interface.  This means that by using interfaces, we cannot enforce that each strategy class would include a constructor with 1 parameter.  This would become a convention rather than something that would cause a compiler error.

Because of these disadvantages, I decided to go with the first option (the publicly exposed property).  Compared to the other options, I am willing to accept the risk that someone might modify the sequence length while it is being generated.  I'm sure that there are other options as well.  Be sure to drop me a note if you have a better implementation.

Next Time
Today, we took a look at the Strategy pattern and how we can use it to create a StrategicSequence.  This shifts the control of what type of sequence is generated to the client.  We also have an easy way to plug in new sequence algorithms.  Since StrategicSequence implements IEnumerable<T>, we still get all of the advantages of that interface.

Next time, we'll review the IEnumerable<T> interface, the advantages that the interface provides, and what we've done with various implementations of the interface.  Until then...

Happy Coding!

Wednesday, May 23, 2012

Next, Please! - A Closer Look at IEnumerable (Part 3 - Fibonacci Sequence)

This is a continuation of "Next, Please! - A Closer Look at IEnumerable".  The articles in the series are collected together here: JeremyBytes - Downloads.

Last time, we took a look at Extension Methods and the added functionality we get from LINQ when using classes that implement IEnumerable<T>.  Today, we'll create our own class that implements IEnumerable<T> (a couple of classes, actually).  We'll see what it takes to implement the interface, a really cool shortcut that we can use, and how we can use LINQ with our newly created classes.

Implementing IEnumerable<T>: IntegerSequence
A few articles back, we explored using the BackgroundWorker Component with MVVM.  While the article did not focus on it, our Model (the ProcessModel object) implemented the IEnumerable<T> interface.  This was so that we could use the model with a foreach loop to simulate a long-running process.

Today, we'll be creating a very similar object: the IntegerSequence.  When we use this class with a foreach loop, it will return a sequence of consecutive positive integers.  A property (NumberOfValues) will specify how many integers should be part of the sequence.  The sample code can be downloaded here: JeremyBytes - Downloads.  Note: the code download includes the sample code for this article as well as the following article, so if you open up the sample projects, you will see some items that we will discuss next time.

Project Setup
The download contains both "Starter" and "Completed" solutions.  If you open the "Starter" solution, you can follow along with the steps here.  The "Completed" solution has all of the code already in place.  Each solution contains 2 projects: SequenceLibrary contains our classes; SequenceLibrary.ConsoleApp contains a console application that we will use to view the output.  So far, both of these projects are nearly empty.  So, let's write some code!

IntegerSequence Class
We'll start by adding a new class to the SequenceLibrary project called "IntegerSequence".  (Just right-click the project, select "Add", then "Class", then type "IntegerSequence.cs" in the dialog.)

First, add "using System.Collections" at the top of the file (we'll see why in just a bit).  Then make the class "public" and specify that it implements "IEnumerable<int>":


Visual Studio will help us implement the interface (we've seen this previously: Updating Interface Implementation).  All we need to do is right-click on "IEnumerable", select "Implement Interface" and then "Implement Interface" again.  This will stub out the interface members for us:


This gives us the 2 members that we saw in Part 1: the generic version of GetEnumerator(), and the non-generic version of GetEnumerator().  Notice that the non-generic version (the 2nd one) uses explicit implementation of the IEnumerable (non-generic) interface.  For more information, refer to Explicit Interface Implementation.

The reason that we added "using System.Collections" is because IEnumerable (non-generic) is in this namespace.  Without this using statement, the non-generic versions of IEnumerable and IEnumerator would need to be fully qualified.

But we have a problem: GetEnumerator needs to return IEnumerator<int>.  That means that we need a class that implements this interface.  (Well, technically, there's a shortcut that we'll see a little later; let's run through the full implementation for now.)

IntegerEnumerator Class
Now we'll add the IntegerEnumerator class that will implement IEnumerator<int>.  Let's add a new class to the project: right-click the project, select "Add", then "Class", then type "IntegerEnumerator.cs" in the dialog.  Add "using System.Collections" at the top of the file; then make the class "public" and specify that it implements "IEnumerator<int>":


Then we'll run through the same steps as above to implement the interface.  I've re-arranged the methods just a little bit (to keep the 2 "Current" properties together and move the "Dispose()" method to the end):


Again, we'll see the properties and methods that we saw in Part 1.  Let's run through the logic: we need to iterate through integer values starting with 1 and ending with the total number of values.  To do this, we'll need a couple of internal fields.

Let's start by adding the fields, a constructor, and the "Current" properties:


"_numberOfValues" will tell us when to end our enumeration, and we'll initialize this with a constructor.  "_position" will hold the current position in the enumeration.

Notice that we have 2 "Current" properties: one of type "int" and one of type "object".  The "int" version returns the value of the "_position" field.  The "object" version returns the "int" version of "Current".  This makes IEnumerator<T> backward compatible with IEnumerator.  Although we could return "_position" for both calls to "Current", it's generally considered a best practice to have the non-generic version simply reference the generic version (we'll see this again with IEnumerable<T> and IEnumerable).

The MoveNext() method is where we do the work for our iterator.  Here is MoveNext() along with Reset() and Dispose():


MoveNext() increments the _position field (which starts at 0, the default value for an int).  This has the effect of making the "Current" value one higher than the previous value.  As a reminder, MoveNext() needs to return true if it successfully moved to the next item, and false when it hits the end of the enumeration.  To accomplish this, we compare the current _position to _numberOfValues: as long as _position has not gone past _numberOfValues, we return true.  The two are equal for the last value since we're doing a simple integer sequence.

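Reset() simply puts the _position back to its initial state.  Dispose() allows us to free any resources that we might be using.  In our case, we don't have anything that needs to be disposed, so we'll just call the Garbage Collector's SuppressFinalize() method.  (This isn't actually necessary: SuppressFinalize() only has an effect on objects whose class defines a finalizer, where it keeps the object from having to wait for the finalizer to run.  IntegerEnumerator has no finalizer, so an empty Dispose() method would work just as well.)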

Now that we have a fully-implemented enumerator, we can go back to our IntegerSequence class.

Finishing IntegerSequence
As we noted initially, we need to have a NumberOfValues property.  Let's add that property and create a constructor to initialize the value.  Then we just need to implement the GetEnumerator methods:


The generic version of GetEnumerator() will create a new IntegerEnumerator, passing in the NumberOfValues property.  The non-generic version of GetEnumerator simply calls the generic version of GetEnumerator.  Just like we saw with "Current" (above), this keeps our IEnumerable<T> compatible with IEnumerable.
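Here is a sketch of the two classes together, based on the steps above.  The field and property names follow the article's description; details such as the readonly modifier and the private setter are assumptions.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class IntegerEnumerator : IEnumerator<int>
{
    private readonly int _numberOfValues; // when to stop
    private int _position;                // defaults to 0, before the first value

    public IntegerEnumerator(int numberOfValues)
    {
        _numberOfValues = numberOfValues;
    }

    public int Current { get { return _position; } }

    // The non-generic version delegates to the generic one.
    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext()
    {
        _position++;
        return _position <= _numberOfValues; // false once we pass the last value
    }

    public void Reset() { _position = 0; }

    public void Dispose()
    {
        // As in the article; this has no effect because the class has no finalizer.
        GC.SuppressFinalize(this);
    }
}

public class IntegerSequence : IEnumerable<int>
{
    public int NumberOfValues { get; private set; }

    public IntegerSequence(int numberOfValues)
    {
        NumberOfValues = numberOfValues;
    }

    // Each call creates a fresh enumerator, so the sequence can be iterated many times.
    public IEnumerator<int> GetEnumerator()
    {
        return new IntegerEnumerator(NumberOfValues);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```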

We now have a working implementation of IEnumerable<T>!

Using IntegerSequence
To use IntegerSequence, we'll flip over to our SequenceLibrary.ConsoleApp project.  The "Starter" project already has a reference to the SequenceLibrary project and has the SequenceLibrary namespace in the "using" section.

So, let's "new" up an IntegerSequence, run it through a foreach loop, and output the results:


And the output:


Since IntegerSequence implements IEnumerable<T>, we can use it in a foreach loop.  But, we can also use the IEnumerable<T> extension methods that we saw in Part 2.

Let's say that we only want to output the even numbers between 1 and 12.  We can accomplish this by using the "Where()" extension method to filter our results:


Note how we can call the "Where()" method directly on the sequence variable, right in the foreach statement.  The lambda expression (the parameter of Where) uses the modulo operator (%) to see if the current value (i) is evenly divisible by 2.  If so, the number is even, and it is included in the results.
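A minimal sketch of the filtering step.  To keep the block self-contained, it accepts any IEnumerable<int>; Enumerable.Range(1, 12) produces the same 1 to 12 values as new IntegerSequence(12) and can stand in for it.  The EvenNumbersDemo class and EvenValues method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EvenNumbersDemo
{
    // Works with any IEnumerable<int>, including an IntegerSequence.
    public static List<int> EvenValues(IEnumerable<int> sequence)
    {
        var result = new List<int>();
        // Where() is applied right in the foreach statement;
        // only values evenly divisible by 2 come through.
        foreach (int value in sequence.Where(i => i % 2 == 0))
        {
            result.Add(value);
        }
        return result;
    }
}
```

For example, EvenNumbersDemo.EvenValues(Enumerable.Range(1, 12)) returns 2, 4, 6, 8, 10, 12.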

The Shortcut: yield return
There is a much shorter way that we can implement IEnumerable<T>: using the "yield return" statement.  When a method contains "yield return", the compiler generates the enumerator for us.  Each "yield return" hands back one value and saves the current state of the method; when the next item is asked for, the method continues where it left off.

This sounds a little confusing, so let's look at a sample.  Back to our IntegerSequence:


Take a look at the GetEnumerator() method.  Instead of explicitly returning an IEnumerator<T>, we use the "yield return" statement.  The "while" loop has the same effect as "MoveNext" from our IntegerEnumerator class (by comparing the current position to the total number of values).  The "return ++position" statement has 2 purposes.  First, it pre-increments the position variable, then it returns that value.  Since we have the "yield" statement, the state of the position variable is preserved.
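A sketch of IntegerSequence rewritten with "yield return", assuming the same NumberOfValues property and constructor as before.  The exact loop shape is an assumption based on the description: a local position variable, a "while" check that plays the role of MoveNext(), and a pre-increment inside the "yield return".

```csharp
using System.Collections;
using System.Collections.Generic;

public class IntegerSequence : IEnumerable<int>
{
    public int NumberOfValues { get; private set; }

    public IntegerSequence(int numberOfValues)
    {
        NumberOfValues = numberOfValues;
    }

    public IEnumerator<int> GetEnumerator()
    {
        int position = 0;
        // Same test as MoveNext(): stop after NumberOfValues items.
        while (position < NumberOfValues)
        {
            // Increment first, then hand the value to the caller.
            // Execution pauses here and resumes on the next MoveNext().
            yield return ++position;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

The compiler-generated enumerator now holds the state that IntegerEnumerator used to keep in its fields.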

Notice that with this version, we no longer need the IntegerEnumerator class.  But, since we went through the exercise of explicitly implementing the IEnumerator<T> interface, we have a much better idea of what the "yield return" statement is actually doing.

Implementing IEnumerable<T>: FibonacciSequence
The IntegerSequence class is pretty trivial.  Since it simply returns sequential values, it could easily be replaced by a "for" loop.  So, let's take a look at something a little more complicated: implementing the Fibonacci Sequence.

The Fibonacci Sequence has the following values: 1, 1, 2, 3, 5, 8, 13, 21...

I won't go into the mathematical definition (you can look that up if you're curious).  Each number in the sequence is calculated by adding the 2 previous values.  So, 1 + 1 = 2, 1 + 2 = 3, 2 + 3 = 5, 3 + 5 = 8, and so forth.

We'll start with the explicit implementation (both IEnumerable<T> and IEnumerator<T>), then we'll switch it to use the "yield return" version.  (There's a reason for doing the full implementation first; it will lead us directly into the next article.)

FibonacciSequence Class
Just like with the IntegerSequence class, we'll create a new class (calling the file "FibonacciSequence.cs"), add "using System.Collections", stub out the implementation of IEnumerable<T>, and add the NumberOfValues property and the constructor.


This looks almost exactly like our IntegerSequence class.  Before we can continue, we need to create the enumerator class.

FibonacciEnumerator Class
Next, we'll create a FibonacciEnumerator class.  We'll run through the same initial steps as with the IntegerEnumerator class: create a new class, add "using System.Collections", add the IEnumerator<int> interface, and stub out the implementation:


Since calculating the Fibonacci Sequence is a bit more complicated than simply incrementing integers, we'll need a few more fields to handle our calculations:


We have "_numberOfValues" (just like before) to keep track of how many values we will enumerate.  "_position" will keep track of our current position in the sequence so that we'll know when we are done.  "_previousTotal" and "_currentTotal" will be used for our calculations.  Notice that the "Current" property returns the "_currentTotal" field.

Reset() and Dispose() are pretty straightforward:


MoveNext() is a bit more complicated:


The "if (_position == 0)" condition will handle our initial state (before we have 2 previous values to add together).  The "else" will handle all the other numbers by adding together the _currentTotal and _previousTotal (the last 2 numbers), and then updating the _previousTotal and _currentTotal to the new values.

Our return value determines whether we have completed our enumeration.  Now we can complete our FibonacciSequence.
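A sketch of FibonacciEnumerator following the description above.  The seed values are an assumption: on the first call, _previousTotal is set to 0 and _currentTotal to 1, so the next call computes 0 + 1 = 1 and the sequence comes out as 1, 1, 2, 3, 5, and so on.  The overflow check discussed later is not included yet.

```csharp
using System.Collections;
using System.Collections.Generic;

public class FibonacciEnumerator : IEnumerator<int>
{
    private readonly int _numberOfValues;
    private int _position;      // how many values have been produced so far
    private int _previousTotal; // the value before Current
    private int _currentTotal;  // the value returned by Current

    public FibonacciEnumerator(int numberOfValues)
    {
        _numberOfValues = numberOfValues;
    }

    public int Current { get { return _currentTotal; } }

    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext()
    {
        if (_position == 0)
        {
            // Initial state: there are no 2 previous values yet, so seed them.
            _previousTotal = 0;
            _currentTotal = 1;
        }
        else
        {
            // Add the last 2 numbers, then shift them along.
            int newTotal = _previousTotal + _currentTotal;
            _previousTotal = _currentTotal;
            _currentTotal = newTotal;
        }
        _position++;
        return _position <= _numberOfValues;
    }

    public void Reset()
    {
        _position = 0;
        _previousTotal = 0;
        _currentTotal = 0;
    }

    public void Dispose()
    {
        // Nothing to release.
    }
}
```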

Finishing FibonacciSequence
Now we can fill in the GetEnumerator() methods of the FibonacciSequence class:


Using FibonacciSequence
Now, we can update the console application to use the FibonacciSequence class:


And the output:


We can use the extension methods with the FibonacciSequence as well.  We could return every other value, or only values that are divisible by 3, or any number of creative uses of the extension methods.
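As a sketch of those creative uses, here are two filters that work on any IEnumerable<int>, including a FibonacciSequence.  FibonacciQueries is a hypothetical helper class; the "every other value" filter relies on the overload of Where() whose lambda also receives the item's zero-based index.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FibonacciQueries
{
    // Every other value: keep the items at even indexes (0, 2, 4, ...).
    public static IEnumerable<int> EveryOtherValue(IEnumerable<int> sequence)
    {
        return sequence.Where((value, index) => index % 2 == 0);
    }

    // Only the values that are evenly divisible by 3.
    public static IEnumerable<int> DivisibleByThree(IEnumerable<int> sequence)
    {
        return sequence.Where(value => value % 3 == 0);
    }
}
```

Applied to the first 12 values (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144), EveryOtherValue() yields 1, 2, 5, 13, 34, 89 and DivisibleByThree() yields 3, 21, 144.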

Error Handling
Because of the nature of the Fibonacci Sequence, we may run into some issues.  Consider what happens if we create our sequence with 50 values: "new FibonacciSequence(50)" instead of 12.  Here's the output:


It looks like we have a problem toward the end of our values.  Why do we have negative numbers?  Those should not be part of the Fibonacci Sequence.

Our problem is that we are overflowing the standard Int32 that we are using for our values.  C# integer arithmetic is unchecked by default, so when a sum is too large, it silently wraps around; since int is signed, the overflow surfaces as negative values.  And once we try to continue from there, our values are completely useless.

We'll take care of this by doing a check in our FibonacciEnumerator class.  If we have a possible overflow in the MoveNext() method, we will throw an exception:


What we do is convert the _previousTotal and _currentTotal to long integers (so the addition itself cannot overflow), and then see if their sum is greater than Int32.MaxValue.  If so, then we throw an exception.
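A minimal sketch of that check, pulled out into a hypothetical FibonacciMath helper so it can be shown on its own; in the enumerator, the same logic would sit in the "else" branch of MoveNext().  As an alternative, C#'s "checked" keyword makes the runtime throw an OverflowException by itself; both versions are shown.

```csharp
using System;

public static class FibonacciMath
{
    // Returns previousTotal + currentTotal, or throws OverflowException
    // if the sum will not fit in an Int32.
    public static int NextTotal(int previousTotal, int currentTotal)
    {
        // Widen to long first so the addition itself cannot overflow.
        long sum = (long)previousTotal + (long)currentTotal;
        // Fibonacci values are never negative, so only the upper bound matters.
        if (sum > int.MaxValue)
        {
            throw new OverflowException(
                "The next Fibonacci value is too large for an Int32.");
        }
        return (int)sum;
    }

    // Alternative: checked arithmetic throws OverflowException on its own.
    public static int NextTotalChecked(int previousTotal, int currentTotal)
    {
        return checked(previousTotal + currentTotal);
    }
}
```

For example, NextTotal(701408733, 1134903170) returns 1836311903 (the 46th Fibonacci number), while NextTotal(1134903170, 1836311903) throws.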

This means that we need to handle this exception in our console app:


Here, we're checking for the OverflowException and printing the message to the console.  Here's the output:


So, now we have some appropriate error handling to make sure that we do not get inappropriate values.  Note: this limits our FibonacciSequence class to 46 values; the 47th Fibonacci number (2,971,215,073) is larger than Int32.MaxValue (2,147,483,647), so that is where we get the overflow.  If we wanted to make this class more robust, we could update it to use "long" instead of "int".  But we'll keep this as "int" for now; this will make more sense when we look at Part 4.

FibonacciSequence with yield return
Finally, we can update the GetEnumerator() function with the "yield return" statement to eliminate the separate FibonacciEnumerator class.  This is a bit more complex than the previous implementation but still not too difficult.  Basically, we combine the MoveNext() and Current into a single method:


Now we have local variables for _previousTotal and _currentTotal.  We use a "for" loop to return the right number of values.  The rest is similar to what we had in the MoveNext() method.
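A sketch of the "yield return" version, with the overflow check folded in.  As with the enumerator, the seed values (0 and 1) and the exception message are assumptions rather than the original code.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class FibonacciSequence : IEnumerable<int>
{
    public int NumberOfValues { get; private set; }

    public FibonacciSequence(int numberOfValues)
    {
        NumberOfValues = numberOfValues;
    }

    public IEnumerator<int> GetEnumerator()
    {
        // Local variables replace the enumerator's fields.
        int previousTotal = 0;
        int currentTotal = 1;

        for (int position = 0; position < NumberOfValues; position++)
        {
            if (position > 0)
            {
                // Same overflow guard as MoveNext(): widen to long before adding.
                long next = (long)previousTotal + currentTotal;
                if (next > int.MaxValue)
                {
                    throw new OverflowException(
                        "The next Fibonacci value is too large for an Int32.");
                }
                previousTotal = currentTotal;
                currentTotal = (int)next;
            }
            yield return currentTotal;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

One detail worth knowing: because the body of an iterator method runs lazily, "new FibonacciSequence(50)" by itself does not throw.  The OverflowException surfaces while the foreach loop is running, so the try/catch needs to wrap the loop.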

Next Time
This time, we took a look at how to implement IEnumerable<T> and IEnumerator<T>.  We saw that when we create classes that implement IEnumerable<T>, we can use foreach and the LINQ extension methods.  We also saw how we can use the "yield return" statement to eliminate the need for a separate enumerator class.

But take another look at our IntegerSequence and FibonacciSequence when we *are* using the separate IEnumerator class:



Notice anything about these two classes?  They are exactly the same with the exception of what specific enumerator class they "new" up in the GetEnumerator() method.  This strikes me as a great place for us to add an abstraction.  Using the Strategy Pattern, we can create a generic Sequence class to which we can pass a specific enumerator class.  Next time, we'll take a look at the Strategy Pattern and how to implement it with these classes.

Happy Coding!

Tuesday, May 22, 2012

Next, Please! - A Closer Look at IEnumerable (Part 2 - Extension Methods and LINQ)

This is a continuation of "Next, Please! - A Closer Look at IEnumerable".  The articles in the series are collected together here: JeremyBytes - Downloads.

Last time we took a look at the Iterator Pattern and how IEnumerable<T> lets us use the "foreach" loop.  IEnumerable<T> also gives us access to a huge number of extension methods that make up LINQ (Language INtegrated Query).

Extension Methods
Extension methods allow us to add methods to a class -- with no subtyping required.  This means that we can even add methods to sealed classes, and the classes will behave as if our methods are native to the class.  (Technically, we are not adding methods to the class, but the usage appears that way.)  To explore extension methods a little further, we'll take a look at a quick sample.  The source code is available here: Quick Byte: Extension Methods.

In this sample, we will add an extension method to IEnumerable<T>.  Here's the definition of our method:


The idea behind this method is that we take a collection (any collection that implements IEnumerable<T>), pass in a delimiter (such as a comma or pipe), and the method will output a string with all of the elements delimited with the specified value.

On its face, this method should be pretty easy to figure out.  First, it is a public static method (ToDelimitedString<T>()) that is part of a public static class (JBExtensions).  The only strange part of this method is that there is the "this" keyword before the first parameter (IEnumerable<T> input).  We'll come back to this in just a bit.
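A minimal sketch of ToDelimitedString<T>() that matches the description.  The method body is an assumption (the original may build the string differently); here, string.Join does the work, calling ToString() on each element and placing the delimiter between them.

```csharp
using System.Collections.Generic;

public static class JBExtensions
{
    // "this" on the first parameter makes this an extension method
    // on any IEnumerable<T>.
    public static string ToDelimitedString<T>(this IEnumerable<T> input, string delimiter)
    {
        // Joins each element's ToString() value, separated by the delimiter.
        return string.Join(delimiter, input);
    }
}
```

Both JBExtensions.ToDelimitedString(months, ", ") and months.ToDelimitedString(", ") return the same string; for a list holding January, February, and March, that is "January, February, March".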

We can use this method just like we would use any other static method:


This is a button click event in a simple WPF application.  First, we create a variable called "months" which is a list of strings.  The Months.GetMonths() method will populate this list with the 12 calendar months.  Next, we populate a text box based on our static method.  Notice that we call this with the static class (JBExtensions), then the static method (ToDelimitedString<string>).  There are 2 parameters: our list (months) and the delimiter (a comma and a space).  This results in the following output:


But remember how we described extension methods: the ability to add a method to a class (or at least appear to do so).  Because we used the "this" keyword before the first parameter, we can treat the method as if it were a method on that first parameter.  In our example, this means that "ToDelimitedString" behaves as if it were a method of the "months" object (a List<string>).  Here's what that looks like:


Now instead of calling "JBExtensions.ToDelimitedString...", we call "months.ToDelimitedString...".  The parameters of the method are then any remaining parameters (after the first one).  In this case, we have the delimiter parameter.  If we compare the syntax to what we had before, this version is much more readable.  It is clear what is happening: we are operating on the "months" object by calling "ToDelimitedString" with a delimiter as the parameter.  The great thing about Visual Studio is that we get full IntelliSense as well.  When we type "months." we will see "ToDelimitedString" in the list of methods that are available (assuming that all of the requirements for an extension method are met).

Here are the requirements:
  • Extension methods must be static methods in a static class (the class cannot be generic or nested).  Making them public lets them be used from other assemblies.  The class name itself is unimportant.
  • Extension methods are declared by including the "this" keyword in front of the first parameter.  The "this" keyword can only be used with the first parameter.
  • Extension methods are used by including the namespace of the public static class in the scope where the methods are to be used.  This means that extension methods can be collected in a shared library that is used across projects, if desired.
That's all there is to it.

IEnumerable<T> and LINQ
So, why all of this talk about extension methods?  Well, it turns out that much of Language INtegrated Query (LINQ) is implemented as extension methods on IEnumerable<T>.  This gives us a ton of really cool functionality that we can use with our collections.  The one qualification is that we need to include a using statement for "System.Linq" in order for the extension methods to be available (the 3rd bullet point, above).  The good news is that Visual Studio includes System.Linq as part of the default "using" statements for most code-file types.
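To see that the LINQ methods really are extension methods, this sketch calls Where() both ways: with extension-method syntax, and as the ordinary static method System.Linq.Enumerable.Where() that it actually is.  LinqIsExtensionMethods is a hypothetical class name used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq; // brings the Enumerable extension methods into scope

public static class LinqIsExtensionMethods
{
    public static bool SameResult(List<int> numbers)
    {
        // Called as an extension method on the list...
        IEnumerable<int> viaExtension = numbers.Where(n => n > 2);
        // ...and as a plain static method on the Enumerable class.
        IEnumerable<int> viaStatic = Enumerable.Where(numbers, n => n > 2);
        return viaExtension.SequenceEqual(viaStatic);
    }
}
```

Without the "using System.Linq" directive, the numbers.Where(...) line would not compile.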

LINQ offers us multiple syntaxes for implementation.  The first is using Query Syntax.  This syntax looks a lot like a SQL query (except the "select" is at the end instead of the beginning).  Here's a sample:


The "people" variable is the same one we saw last time.  It is a list of Person that is populated by the static "GetPeople" method (just a list of hard-coded values).  If you are familiar with SQL queries, then the second statement should look pretty familiar.  We are looking in the "people" object, doing a filter based on the FirstName property, and sorting the records based on the StartDate property.

If we take a look at the members of the IEnumerable<T> interface, we will see all of the Extension Methods: IEnumerable<T> Interface.  And when I say that there is a ton of functionality, this includes over 50 unique extension methods -- and several of those extension methods have multiple overloads.

By using these methods directly, we can implement LINQ by using what is often called Fluent Syntax.  Here's what the same query looks like using the fluent syntax:


The "Where" extension method takes an IEnumerable<T> and returns an IEnumerable<T> (and "OrderBy" returns an IOrderedEnumerable<T>, which is also an IEnumerable<T>).  This means that we can keep chaining the extension methods, so we end up with people.Where().OrderBy().  As you can imagine, this could make our code lines quite long.  Fortunately, C# allows us to add line breaks before the dots:


You can see that this is a bit more legible.  One thing to point out about LINQ extension methods is that they generally take lambda expressions in the parameters.  For a full discussion of Lambda Expressions and LINQ, please refer to Learn to Love Lambdas.
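Here is a sketch of the same query in both syntaxes, side by side.  The Person class below is a trimmed-down, hypothetical stand-in for the one used in the series, and the "John" filter value is an assumption for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the series' Person class.
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime StartDate { get; set; }
}

public static class SyntaxComparison
{
    // Query syntax: reads like SQL, with "select" at the end.
    public static List<Person> QuerySyntax(IEnumerable<Person> people)
    {
        var query = from p in people
                    where p.FirstName == "John"
                    orderby p.StartDate
                    select p;
        return query.ToList();
    }

    // Fluent syntax: the same query as chained extension methods,
    // with a line break before each dot.
    public static List<Person> FluentSyntax(IEnumerable<Person> people)
    {
        return people
            .Where(p => p.FirstName == "John")
            .OrderBy(p => p.StartDate)
            .ToList();
    }
}
```

The compiler translates the query syntax into the same Where() and OrderBy() calls, so both methods return the same people in the same order.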

When trying to decide between Query Syntax and Fluent Syntax, there are a few things to note.  First, not all of the extension methods have Query Syntax keywords.  The basics are there (from, where, orderby, join, group...by, select), but others are not (such as First(), Single(), Count(), Average()).  This means that there are times when you may need to mix the Query Syntax and Fluent Syntax.  And that's okay.  Ultimately, which syntax you use is up to you.  Personally, if you are comfortable with lambda expressions, I think that the Fluent Syntax is easier to read and work with.  But that's just my preference.
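A sketch of mixing the two syntaxes: the query is written with query keywords, wrapped in parentheses, and then Count() or First() is called on the result.  MixedSyntaxDemo is a hypothetical class; it works with any list of month names, such as the one Months.GetMonths() returns.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MixedSyntaxDemo
{
    // There is no query keyword for Count(), so call it on the query result.
    public static int CountMonthsStartingWithJ(IEnumerable<string> months)
    {
        return (from m in months
                where m.StartsWith("J")
                select m).Count();
    }

    // Same idea with First(): the first month name longer than 8 characters.
    public static string FirstLongMonth(IEnumerable<string> months)
    {
        return (from m in months
                where m.Length > 8
                select m).First();
    }
}
```

With the 12 calendar months, CountMonthsStartingWithJ() returns 3 (January, June, July) and FirstLongMonth() returns "September".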

Look at the IEnumerable<T> Extension Methods
Look at the extension methods that LINQ provides.  No, really.  Look at the Extension Methods.  There are a ton of useful methods in there.  They aren't all just "query" type methods (where, order by, grouping, etc.).  They also include aggregations such as average and count, and methods that return a single value or element, such as min, max, first, and last.  So, look through the list.

Here's a sample of how the extension methods can make your code much more concise.  This sample is taken from Introduction to Data Templates and Value Converters in Silverlight (this works in WPF and Windows Phone as well).  Here is the output:


The Person class includes a Rating field, which is an integer value from 0 to 10.  For the UI above, we want to take the value of the "Rating (Stars)" text box and count the number of asterisks that it contains.  We want to ignore any character that is not an asterisk.  Here is the "traditional" way of doing this (this is located in the "RatingStarConverter" in the "Converters.cs" file):


This takes the incoming value (from the text box), and assigns it to the "input" variable.  Then it uses a foreach loop to iterate through the characters of that "string" (remember from last time that string implements IEnumerable<char>, so we can use it with foreach).  If the current character is an asterisk, then we increment our rating value.
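A sketch of that loop as a standalone method.  In the sample, this logic lives in the RatingStarConverter value converter; here it is pulled out into a hypothetical RatingStars.CountStarsWithLoop() method so it runs without WPF or Silverlight.

```csharp
public static class RatingStars
{
    // The "traditional" approach: string implements IEnumerable<char>,
    // so we can walk its characters with foreach.
    public static int CountStarsWithLoop(string input)
    {
        int rating = 0;
        foreach (char ch in input)
        {
            // Ignore anything that is not an asterisk.
            if (ch == '*')
                rating++;
        }
        return rating;
    }
}
```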

But the IEnumerable<T> extension methods give us a much quicker way of writing this: Count().


This code does exactly the same thing as the foreach loop.  Count() has an overload that takes a predicate parameter, which lets us set a condition on the items that we want to count.  In this case, we only count the characters that are asterisks.
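The same counting logic written with Count(), again as a hypothetical standalone method rather than the converter itself:

```csharp
using System.Linq;

public static class RatingStarsLinq
{
    // Count(predicate) replaces the whole foreach loop:
    // only characters that are asterisks are counted.
    public static int CountStars(string input)
    {
        return input.Count(c => c == '*');
    }
}
```

For an input of "**x*", CountStars() returns 3, ignoring the "x".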

There are tons of other useful methods.  Some of my favorites are Average, Count, Skip, SingleOrDefault, OrderBy, Except (which returns the items of one sequence that are not in a second sequence), and Sum.  Many of these methods take a Func<> of some type in the declaration.  Wherever you see "Func<>", treat it as a big sign that says, "PUT YOUR LAMBDA EXPRESSION HERE."  Of course, the more comfortable you are with delegates and lambda expressions, the better off you'll be.  You can refer to Learn to Love Lambdas and Get Func<>-y: Delegates in .NET for additional information and samples.
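A quick sketch exercising several of those methods on a small, made-up array of ratings.  FavoriteMethods is a hypothetical class name used only for illustration.

```csharp
using System.Linq;

public static class FavoriteMethods
{
    public static readonly int[] Ratings = { 7, 3, 9, 5, 6 };

    // Sum and Average aggregate the whole sequence.
    public static int Total() { return Ratings.Sum(); }
    public static double Mean() { return Ratings.Average(); }

    // Skip bypasses the first n items.
    public static int[] AfterFirstTwo() { return Ratings.Skip(2).ToArray(); }

    // SingleOrDefault returns the one match, default(int) (0) if there is none,
    // and throws if more than one item matches.
    public static int OnlyNine() { return Ratings.SingleOrDefault(r => r == 9); }

    // Except is a set difference: ratings that are not in the second sequence.
    public static int[] WithoutLowScores() { return Ratings.Except(new[] { 3, 5 }).ToArray(); }
}
```

For these values, Total() returns 30, Mean() returns 6, AfterFirstTwo() returns 9, 5, 6, and WithoutLowScores() returns 7, 9, 6.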

The extension methods on the IEnumerable<T> interface make this an extremely powerful interface.  By simply including the System.Linq namespace, we get over 50 additional functions that can make our code more concise and readable.

Next Time
Today, we took a look at what extension methods are and how we can use them to add functionality to existing classes.  Then we saw how LINQ adds a huge number of extension methods to the IEnumerable<T> interface.  These extension methods let us harness the power of LINQ in any of our collections (or other classes that implement IEnumerable<T>).

Next time, we'll create our own class that implements the IEnumerable<T> interface.  This will give us a chance to explore the interface in a bit more detail and to see exactly what we need to do in order to create our own enumerable classes.

Happy Coding!