Friday, February 11, 2011

C# 3.0 and LINQ

In my previous post (an introduction to LINQ), I described LINQ as a language-agnostic technology. LINQ can work in Visual Basic just as well as it can work in C#. However, both languages introduced new features to facilitate and enhance the LINQ experience. In this article, we want to investigate the new features of C# 3.0. While the C# designers put together many of these features with LINQ in mind, the features can also be used outside of query expressions to enhance other areas of code.

Extension methods

Extension methods allow us to create the illusion of new methods on an existing type - even if the existing type lives in an assembly outside of our control. For instance, the System.String class is compiled into the .NET framework as a sealed class. We can't inherit from the String class, and we can't change the String class and recompile the framework. In the past, we might package an assortment of commonly used string utility methods into a static class like the following.
// Extending string via static methods (pre C# 3.0)
public static class StringUtils
{
    public static double ToDouble(string data)
    {
        double result = double.Parse(data);
        return result;
    }
}

// Usage:
string text = "43.35";
double data = StringUtils.ToDouble(text);
With C# 3.0, we can make our ToDouble method appear as a member of the String class. When the first parameter of a method includes a this modifier, the method is an extension method. We must define extension methods inside a non-generic, static class, like the StringExtensions class shown below.
// Extending String via extension methods
public static class StringExtensions
{
    public static double ToDouble(this string data)
    {
        double result = double.Parse(data);
        return result;
    }

    public static int ToInt32(this string data)
    {
        int result = Int32.Parse(data);
        return result;
    }
}
StringExtensions defines two extension methods. We can now invoke either method using instance method syntax. It appears as if the methods are members of System.String, and we don't need to use the name of the static class.
string text = "43.35";
double data = text.ToDouble();
Despite appearances, extension methods are not members of the type they are extending. Technically, the ToDouble method we are using is still just a static method on another type, but the C# compiler can hide this fact. It is important to realize that extension methods have no special privileges with respect to member access of the target object. Our extension methods cannot access protected and private members of the target object.
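To make the "still just a static method" point concrete, here is a minimal sketch that calls the same ToDouble method both ways. The ExtensionCallDemo class is hypothetical, and the use of CultureInfo.InvariantCulture is an assumption added so that "43.35" parses the same way on any machine; the article's own version calls double.Parse(data) directly.

```csharp
using System;
using System.Globalization;

public static class StringExtensions
{
    // Same idea as the article's ToDouble; InvariantCulture is an added assumption.
    public static double ToDouble(this string data)
    {
        return double.Parse(data, CultureInfo.InvariantCulture);
    }
}

public static class ExtensionCallDemo
{
    public static void Main()
    {
        string text = "43.35";

        // Instance syntax: the compiler rewrites this into the static call below.
        double viaInstanceSyntax = text.ToDouble();

        // Static syntax: what actually gets compiled.
        double viaStaticSyntax = StringExtensions.ToDouble(text);

        Console.WriteLine(viaInstanceSyntax == viaStaticSyntax); // True
    }
}
```

Both calls should compile to the same IL, which is why an extension method gets no extra access to the private or protected members of its target.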

Extension Method Resolution

An extension method is available to call on any object instance that is convertible to the type of the method's first parameter. For instance, the following extension method is available for any object that is convertible to IList<T>.
using System;
using System.Collections.Generic;

public static class IListExtensions
{
    static Random rand = new Random();

    // Shuffles the list in place (Fisher-Yates).
    public static void Randomize<T>(this IList<T> input)
    {
        if (input == null)
            return;

        for (int i = input.Count - 1; i > 0; i--)
        {
            int n = rand.Next(i + 1);
            T temp = input[i];
            input[i] = input[n];
            input[n] = temp;
        }
    }
}
We can use the extension method with an array of strings, as an array implements the IList<T> interface. Notice we need to check for null, as it is legal and possible for a program to invoke an extension method on a null object reference.
void RandomizeArray()
{
    string[] cities = { "Boston", "Los Angeles",
                        "Seattle", "London", "Hyderabad" };
    cities.Randomize();

    foreach (string city in cities)
    {
        Console.WriteLine(city);
    }
}
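The null check matters because an extension method call on a null reference still reaches the method body. The following is a small sketch of that behavior; IsNullOrBlank and NullReceiverDemo are hypothetical names used only for illustration.

```csharp
using System;

public static class NullSafeExtensions
{
    // Hypothetical helper: "value" may legitimately be null here, because the
    // call is really NullSafeExtensions.IsNullOrBlank(value).
    public static bool IsNullOrBlank(this string value)
    {
        return value == null || value.Trim().Length == 0;
    }
}

public static class NullReceiverDemo
{
    public static void Main()
    {
        string missing = null;

        // No NullReferenceException: no member of "missing" is dereferenced
        // until the method body decides to do so.
        Console.WriteLine(missing.IsNullOrBlank());  // True
        Console.WriteLine("Boston".IsNullOrBlank()); // False
    }
}
```

An instance method call on the same null reference would throw before any of the method's code ran, so the two call styles only look alike.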
An extension method cannot override or hide any instance methods. For instance, if we defined a ToString extension method, the method would never be invoked because every object already has a ToString method. The compiler will only bind a method call to an extension method when it cannot find an instance method with a usable signature.
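A short sketch of that binding rule, using a hypothetical City class with its own Describe method: the extension method with the same signature is never chosen through instance syntax, while an overload with a different parameter list is.

```csharp
using System;

public class City
{
    public string Name { get; set; }

    public string Describe()
    {
        return "instance: " + Name;
    }
}

public static class CityExtensions
{
    // Same usable signature as the instance method: ignored by city.Describe().
    public static string Describe(this City city)
    {
        return "extension: " + city.Name;
    }

    // No instance method accepts a string argument, so this one can bind.
    public static string Describe(this City city, string prefix)
    {
        return prefix + city.Name;
    }
}

public static class BindingDemo
{
    public static void Main()
    {
        City city = new City { Name = "London" };

        Console.WriteLine(city.Describe());               // instance: London
        Console.WriteLine(city.Describe("City: "));       // City: London
        Console.WriteLine(CityExtensions.Describe(city)); // extension: London
    }
}
```

The shadowed extension method is still reachable, but only through the static class name.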
An extension method is only available when you import the namespace containing the static class that defines the extension method (with a using directive). Although you can always invoke the extension method via the namespace-qualified class name (Foo.StringExtensions.ToDouble()), there is no mechanism available to reach a single extension method using instance syntax unless you import the method's namespace. Once you import the namespace, all extension methods defined inside will be in scope. For this reason you should be careful when designing namespaces to properly scope your extension methods (see the LINQ Framework Design Guidelines).

Extension Methods and LINQ

LINQ defines a large number of extension methods in the System.Linq namespace. These extension methods extend IEnumerable<T> and IQueryable<T> to implement the standard query operators, like Where, Select, OrderBy, and many more. For instance, when we include the System.Linq namespace, we can use the Where extension method to filter an enumerable collection of items.
private static void UseWhereExtensionMethod()
{
    string[] cities = { "Boston", "Los Angeles",
                        "Seattle", "London", "Hyderabad" };

    IEnumerable<string> filteredList =
        cities.Where(
            delegate(string s) { return s.StartsWith("L"); });

    // filteredList will include only London and Los Angeles
}
The definition for Where would look like the following:
namespace System.Linq
{
    public static class Enumerable
    {
        // ...
        public static IEnumerable<TSource> Where<TSource>(
                                this IEnumerable<TSource> source,
                                Func<TSource, bool> predicate)
        {
            // ...
        }

        // ...
    }
}
As you can see, the second parameter to Where is of type Func<TSource, bool>. This parameter is a predicate: a function that the Where method can invoke to test each element. If the predicate returns true, the element is included in the result; otherwise the element is excluded. Just remember that different LINQ providers may implement the filtering using different strategies. For example, when using the LINQ to SQL provider with an IQueryable<T> data source, the provider analyzes the predicate to build a WHERE clause for the SQL command it sends to the database (more on this later).
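To show how an in-memory operator might use such a predicate, here is a minimal sketch of a Where-like extension method. Filter and FilterSketch are hypothetical names, and this is not the framework's actual source, only one plausible way to get the same observable behavior for IEnumerable<T>.

```csharp
using System;
using System.Collections.Generic;

public static class FilterSketch
{
    // Hypothetical stand-in for Enumerable.Where.
    public static IEnumerable<T> Filter<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (predicate == null) throw new ArgumentNullException("predicate");
        return FilterIterator(source, predicate);
    }

    // The iterator runs lazily: the predicate is invoked once per element,
    // and only while the caller is enumerating the result.
    private static IEnumerable<T> FilterIterator<T>(
        IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
                yield return item;
        }
    }
}

public static class FilterDemo
{
    public static void Main()
    {
        string[] cities = { "Boston", "Los Angeles",
                            "Seattle", "London", "Hyderabad" };

        IEnumerable<string> filtered =
            cities.Filter(delegate(string s) { return s.StartsWith("L"); });

        foreach (string city in filtered)
            Console.WriteLine(city); // Los Angeles, then London
    }
}
```

Splitting the argument checks from the iterator is a design choice: it makes bad arguments fail at the call site rather than on the first MoveNext.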
We built the predicate by constructing a delegate using an anonymous method. It turns out that little snippets of in-line code like this are quite useful when working with LINQ. So useful, in fact, that the C# designers gave us a new feature for constructing anonymous methods with an even more succinct syntax - lambda expressions.

Lambda Expressions

In the days of C# 1.0, we did not have anonymous methods. Even the simplest delegate required construction with a named method. Creating a named method seemed heavy-handed when you only needed a one-line event handler. Fortunately, C# 2.0 added anonymous methods. Anonymous methods were perfect for simple event handlers and expressions – like the predicate we constructed earlier. Anonymous methods also had the advantage of being nested inside the scope of another method, which allowed us to pack related logic into a single section of code and create lexical closures.
C# 3.0 has added the lambda expression. Lambda expressions are an even more compact syntax for defining anonymous functions (anonymous methods and lambda expressions are collectively referred to as anonymous functions). The following section of code filters a list of cities as we did before, but this time using a lambda expression.
string[] cities = { "Boston", "Los Angeles",
                    "Seattle", "London", "Hyderabad" };

IEnumerable<string> filteredList =
    cities.Where(s => s.StartsWith("L"));
The distinguishing feature in this code is the "goes to" operator (=>). The left hand side of the => operator defines the function signature, while the right hand side of the => operator defines the function body.
You'll notice that the lambda expression doesn't require a delegate keyword. The C# compiler automatically converts lambda expressions into a compatible delegate type or expression tree (more on expression trees later in the article). You'll also notice the function signature doesn't include any type information for the parameter named s. We are using an implicitly typed signature and letting the C# compiler figure out the type of the parameter based on the context. In this case, the C# compiler can deduce that s is of type string. We could also choose to type our parameter explicitly.
IEnumerable<string> filteredList =
    cities.Where((string s) => s.StartsWith("L"));
Parentheses are required around the function signature when we use explicit typing, or when there are zero or more than one parameter. In other words, the parentheses are optional when we are using a single, implicitly typed parameter. An empty set of parentheses is required when there are no parameters for the expression.
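The parenthesis rules are easier to see side by side. This is a small sketch with hypothetical names; each lambda is assigned to a Func delegate so the compiler has a target type.

```csharp
using System;

public static class LambdaFormsDemo
{
    public static int Total()
    {
        // No parameters: empty parentheses are required.
        Func<int> zero = () => 42;

        // One implicitly typed parameter: parentheses are optional.
        Func<int, int> one = x => x + 1;

        // Explicitly typed parameter: parentheses are required.
        Func<int, int> oneTyped = (int x) => x + 1;

        // More than one parameter: parentheses are required.
        Func<int, int, int> two = (x, y) => x + y;

        return zero() + one(1) + oneTyped(1) + two(1, 2);
    }

    public static void Main()
    {
        Console.WriteLine(Total()); // 49 (42 + 2 + 2 + 3)
    }
}
```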
The right hand side of a lambda may contain an expression or a statement block inside { and } delimiters. Our city filter above uses an expression. Here is the same filter using a statement block.
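IEnumerable<string> filteredList =
    cities.Where(s =>
    {
        // Lower-casing first makes the test case-insensitive,
        // so we must compare against a lowercase "l".
        string temp = s.ToLower();
        return temp.StartsWith("l");
    });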
Notice the statement block form allows us to use multiple statements and provide local variable declarations. We also need to include a return statement. Statement blocks are useful in more complex lambda expressions, but the general guideline is to keep lambda expressions short.

Invoking Lambdas

Once we assign a lambda to a delegate type, we can put the lambda to use. The framework class library includes two sets of generic delegates that can fit most scenarios. Consider the following code:
Func<int, int> square = x => x * x;
Func<int, int, int> mult = (x, y) => x * y;
Action<int> print = x => Console.WriteLine(x);

print(square(mult(3, 5)));
// displays 225
The Func generic delegate comes in 5 versions in .NET 3.5. Func<TResult> encapsulates a method that takes no parameters and returns a value of type TResult. Our square delegate is of type Func<int, int> - a method that takes 1 parameter of type int and returns an int, while mult is of type Func<int, int, int> - a method that takes 2 parameters of type int and returns an int. The type of the return value is always specified as the last generic parameter. Func variations are defined that can encapsulate methods with up to 4 parameters (Func<T1, T2, T3, T4, TResult>).
Action delegates represent methods that do not return a value. Action<T> takes a single parameter of type T. We use an instance of Action<int> in the above code. There is also Action (no parameters), Action<T1, T2> (2 parameters), Action<T1, T2, T3> (3 parameters), and Action<T1, T2, T3, T4> (4 parameters). We can put all these delegates to work with lambda expressions instead of defining our own custom delegate types.
Remember the definition of the Where extension method for IEnumerable<T>? We created a lambda expression for the parameter of type Func<TSource, bool>. The implementation of the Where method for IEnumerable<T> executes the lambda expression via a delegate to filter the collection of in-memory objects.
What happens if we aren't operating on an in-memory collection? How could a lambda expression influence a SQL query sent to a database engine for execution? This is where IQueryable and expression trees come into play.

Expression Trees

Let's add a twist to our last example.
Expression<Func<int, int>> squareExpression =
                            x => x * x;
Expression<Func<int, int, int>> multExpression =
                            (x, y) => x * y;
Expression<Action<int>> printExpression =
                            x => Console.WriteLine(x);

Console.WriteLine(squareExpression);
Console.WriteLine(multExpression);
Console.WriteLine(printExpression);
Instead of assigning our lambdas to an Action or Func delegate type, we've assigned them to an Expression<TDelegate> type. This code prints out the following:
x => (x * x)
(x, y) => (x * y)
x => WriteLine(x)
It would appear that the output is echoing our code. This is because the C# compiler treats Expression<TDelegate> as a special case. Instead of compiling the lambda expression into intermediate language for execution, the C# compiler compiles the lambda expression into an expression tree.
Our code is now data that we can inspect and act on at runtime. If we look at what the C# compiler produces for the squareExpression with a tool like Reflector, we'd see something similar to the following.
ParameterExpression x;
Expression<Func<int, int>> squareExpression =
    Expression.Lambda<Func<int, int>>(
        Expression.Multiply(x = Expression.Parameter(typeof(int), "x"), x),
        new ParameterExpression[] { x });
If we wanted to invoke the expression, we'd first have to compile the expression:
Func<int, int> square = squareExpression.Compile();

int y = 3;
int ySquared = square(y);
Console.WriteLine(ySquared); // prints 9
However, the real power of an expression tree is not in dynamic compilation but in runtime inspection. Again, our code is now data: a tree of Expression nodes that we can inspect. In fact, the C# samples for Visual Studio 2008 include an expression tree visualizer, a plugin for the Visual Studio debugger to view expression trees.
[Figure: the expression tree visualizer displaying squareExpression]

Imagine writing some code that walks through the tree and inspects the types, names, and parameters. In fact, this is exactly what technologies like LINQ to SQL do. LINQ to SQL inspects the expression tree and converts the expression tree into a SQL command.
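As a rough idea of what "walking the tree" looks like, the sketch below inspects the body of squareExpression and turns it into a string. Describe and TreeWalkDemo are hypothetical names; a real provider such as LINQ to SQL handles far more node types, and this handles only the three needed here.

```csharp
using System;
using System.Linq.Expressions;

public static class TreeWalkDemo
{
    // Recursively renders a tiny subset of expression node types.
    public static string Describe(Expression node)
    {
        switch (node.NodeType)
        {
            case ExpressionType.Multiply:
            {
                BinaryExpression binary = (BinaryExpression)node;
                return "(" + Describe(binary.Left) + " times "
                           + Describe(binary.Right) + ")";
            }
            case ExpressionType.Parameter:
                return ((ParameterExpression)node).Name;
            case ExpressionType.Constant:
                return Convert.ToString(((ConstantExpression)node).Value);
            default:
                return "[" + node.NodeType + "]";
        }
    }

    public static void Main()
    {
        Expression<Func<int, int>> squareExpression = x => x * x;

        Console.WriteLine(squareExpression.Body.NodeType);      // Multiply
        Console.WriteLine(squareExpression.Parameters[0].Name); // x
        Console.WriteLine(Describe(squareExpression.Body));     // (x times x)
    }
}
```

Swapping the "times" text for SQL fragments is, in spirit, how a provider can turn the same tree into a query string.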
LINQ to SQL classes implement the IQueryable<T> interface, and the System.Linq namespace adds all the standard query operators as extension methods for IQueryable<T>. Instead of taking delegates as parameters like the IEnumerable<T> Where method does, the IQueryable<T> operators take Expression<TDelegate> parameters. We are not passing code into IQueryable<T>; we are passing expression trees (code as data) for the LINQ provider to analyze.
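We can observe the difference without a database by wrapping an array with AsQueryable, which gives us the in-memory IQueryable<T> implementation that ships with System.Linq. The sketch below, with hypothetical names QueryableDemo and DescribeOperator, shows that the same lambda now ends up recorded in the query's expression tree instead of being invoked directly.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableDemo
{
    // Reports which operator produced the outermost node of the query's tree.
    public static string DescribeOperator(IQueryable<string> query)
    {
        MethodCallExpression call = (MethodCallExpression)query.Expression;
        return call.Method.DeclaringType.Name + "." + call.Method.Name;
    }

    public static void Main()
    {
        string[] cities = { "Boston", "Los Angeles",
                            "Seattle", "London", "Hyderabad" };

        // Binds to Queryable.Where, so the lambda becomes an expression tree.
        IQueryable<string> query =
            cities.AsQueryable().Where(s => s.StartsWith("L"));

        Console.WriteLine(DescribeOperator(query)); // Queryable.Where

        // Enumerating asks the provider to execute the recorded tree.
        foreach (string city in query)
            Console.WriteLine(city); // Los Angeles, then London
    }
}
```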
With an understanding of lambda expressions, expression trees, and extension methods, we are finally able to tackle one of the beauties of LINQ in C# - the query expression.

Query expressions

Query expressions provide the "language integrated" experience of LINQ. Query expressions use syntax similar to SQL to describe a query.
string[] cities = { "Boston", "Los Angeles",
                    "Seattle", "London", "Hyderabad" };

IEnumerable<string> filteredCities =
    from city in cities
    where city.StartsWith("L") && city.Length < 15
    orderby city
    select city;
A query expression begins with a from clause and ends with a select or group clause. Other valid clauses for the middle of the expression include from, let, where, join, and orderby. We'll delve into these clauses in a future article.
The C# compiler translates a query expression into method invocations. The where clause will translate into a call to a Where method, the orderby clause will translate into a call to an OrderBy method, and so on. These methods must be extension methods or instance methods on the type being queried, and each has a particular signature and return type. It is the implementation of the methods, not the compiler, that will determine how to execute the query at runtime. Our query expression above would transform into the following:
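IEnumerable<string> filteredCities =
    cities.Where(c => c.StartsWith("L") && c.Length < 15)
          .OrderBy(c => c);
          // "select city" is a degenerate projection here (it would be
          // .Select(c => c)), so the compiler omits the Select call.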
Since we are using an IEnumerable<T>, these method calls are the extension methods for IEnumerable<T>. From our earlier discussion we know that the compiler assigns each lambda expression to a delegate for the extension methods to invoke on the in-memory collection. If we were working with an IQueryable<T> data source, the compiler would create expression trees for a LINQ provider to parse.
Most of the lambda expressions in this query are simple, like c => c, meaning in the OrderBy case that given a string parameter c, sort the result by the value of c (alphabetical). We could have also said orderby city.Length, which translates into the lambda expression c => c.Length, meaning given a parameter c, sort the items by the value of c's Length property.
It's important to reinforce the fact that the C# compiler is performing a translation of the query expression and looking for matching methods to invoke. This means the compiler will use the IEnumerable<T> or IQueryable<T> extension methods, when available. However, in our own classes we could add methods like Where or OrderBy to replace the standard LINQ implementations (remember, instance methods always take precedence over extension methods). We could also leave out the System.Linq namespace and write our own extension methods to replace the standard LINQ implementations completely. This extensibility in LINQ is a powerful feature.
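Here is a minimal sketch of that extensibility. CityList is a hypothetical class with its own Where and Select instance methods, and the file deliberately has no using System.Linq directive, so the query expression can only bind to the methods defined here.

```csharp
using System;
using System.Collections.Generic;

public class CityList
{
    private readonly List<string> items = new List<string>();

    public CityList(params string[] cities)
    {
        items.AddRange(cities);
    }

    // The compiler binds the "where" clause to this instance method.
    public CityList Where(Func<string, bool> predicate)
    {
        Console.WriteLine("CityList.Where called");
        CityList result = new CityList();
        foreach (string city in items)
        {
            if (predicate(city))
                result.items.Add(city);
        }
        return result;
    }

    // The compiler binds the "select" clause to this instance method.
    public List<TResult> Select<TResult>(Func<string, TResult> selector)
    {
        Console.WriteLine("CityList.Select called");
        List<TResult> result = new List<TResult>();
        foreach (string city in items)
            result.Add(selector(city));
        return result;
    }
}

public static class CustomOperatorsDemo
{
    public static void Main()
    {
        CityList cities = new CityList("Boston", "Los Angeles",
                                       "Seattle", "London", "Hyderabad");

        List<string> shouting =
            from c in cities
            where c.StartsWith("L")
            select c.ToUpper();

        foreach (string city in shouting)
            Console.WriteLine(city); // LOS ANGELES, then LONDON
    }
}
```

Note that CityList does not implement IEnumerable<T> at all; the query syntax only requires that methods with suitable names and signatures exist.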

Intermission

We've now covered all the C# features that make LINQ work. The compiler translates query expressions into method calls. These method calls are generally extension methods on the queried type. The extension method approach doesn't force the queried type to use a specific base class or implement a particular interface, and allows us to customize behavior for specific types. Sorting, filtering, and grouping logic in the query expression is passed into the method calls using lambda expressions. The compiler can convert lambda expressions into delegates, for the method to invoke, or expression trees, for the method to parse and analyze. Expression trees allow LINQ providers to transform the query into SQL, XPath, or some other native query language that works best for the data source.
There are other features in C# that aren't required for all this magic to work, but they do offer everyday conveniences. These features include type inference, anonymous types, object and collection initializers, and partial methods.

Implicit Typing and Var

C# 3.0 introduced the var keyword. We can use the new keyword to define implicitly typed local variables. Unlike the var keyword in JavaScript, the C# version does not introduce a weak, loose, or dynamically typed variable. The compiler will infer the type of the variable at compile time. The following code provides an example.
var name = "Scott";
var x = 3.0;
var y = 2;
var z = x * y;

// all lines print "True"
Console.WriteLine(name is string);
Console.WriteLine(x is double);
Console.WriteLine(y is int);
Console.WriteLine(z is double);
Each variable has a type that the compiler deduced using the variable's initializer. For example, the compiler sees the code assign the name variable an initial value of "Scott", so the compiler deduces that name is a string. We cannot change this type later. The following code is full of compiler errors: things you can't do with var.
// ERROR: Implicitly-typed local variables must be initialized
var i;

// ERROR: Implicitly-typed local variables cannot have multiple declarators
var j, k = 0;

// ERROR: Cannot assign <null> to an implicitly-typed local variable
var n = null;

// ERROR: Cannot assign lambda expression to an implicitly-typed local variable
var l = x => x + 1;

var number = "42";
// ERROR: Cannot implicitly convert type 'string' to 'int'
int x = number + 1;
Implicitly typed variables must have an initializer, and the initializer must allow the compiler to infer the type of the variable. Thus, we can't assign null to an implicitly typed variable: any reference type can accept the value null, so the compiler can't determine the correct type. Lambda expressions themselves are not associated with a specific type, so we have to tell the compiler which delegate or expression type to use for a lambda. Finally, we can see that once the compiler has determined the type, we can't change or morph the type like we can in some dynamic languages. Thus the number variable declared in the above code will always and forever be strongly typed as a string.
For practical purposes, the var keyword is important in two scenarios. The first scenario is when we can use var to remove redundancy from our code. For instance, when constructing generic types, all the type information and angled brackets can clutter the code (Dictionary<int, string> d = new Dictionary<int, string>()). Do we need to see the type information twice to understand the variable is a dictionary of int and string?
While the first scenario is entirely a matter of style and personal preference, the second scenario requires the use of var. The scenario is when you don't know the name of the type you are consuming – in other words, you are using an anonymous type.

Anonymous types

Anonymous types are nameless class types. We can create an anonymous type using an object initializer, as shown in several examples below.
var employee = new { Name = "Scott", Department = "Engineering" };
Console.WriteLine("{0}:{1}", employee.Name, employee.Department);

var processList =
    from process in Process.GetProcesses()
    orderby process.Threads.Count descending,
            process.ProcessName ascending
    select new
    {
        Name = process.ProcessName,
        ThreadCount = process.Threads.Count
    };

Console.WriteLine("Process List");
foreach (var process in processList)
{
    Console.WriteLine("{0,25} {1,4:D}", process.Name, process.ThreadCount);
}
The first example in the above code constructs an anonymous type to represent an employee. Between the curly braces, the object initializer lists a sequence of properties and their initial values. The compiler will use type inference to determine the types of these properties (both are strings in this case), then create an anonymous type derived from System.Object with the read-only properties Name and Department.
Notice we can use the anonymous type in a strongly typed fashion. In fact, once we've typed employee. into the Visual Studio editor, an IntelliSense window will show us the Name and Department properties of the object.
We cannot use this object in a strongly typed fashion outside of the local scope. In other words, if we needed to return the employee object from the current method, the method would have to specify a return type of object. If we wanted to pass the variable to another method, we'd have to pass the variable as an object reference. We can only use the var keyword for local variable declaration – var is not allowed as a return type, parameter type, or field type. Outside of the scope of the current method, we'd have to use reflection to retrieve the property values on the anonymously typed object (which is how anonymously typed objects work with data-binding features in .NET platforms).
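The sketch below illustrates that limitation: once the anonymous object leaves the method that created it, we only hold an object reference and have to fall back on reflection. ReadProperty, CreateEmployee, and AnonymousTypeDemo are hypothetical names for illustration.

```csharp
using System;
using System.Reflection;

public static class AnonymousTypeDemo
{
    // The anonymous type has no name we can write, so the return type is object.
    public static object CreateEmployee()
    {
        return new { Name = "Scott", Department = "Engineering" };
    }

    // Looks up a public property by name; returns null if it does not exist.
    public static object ReadProperty(object source, string propertyName)
    {
        PropertyInfo property = source.GetType().GetProperty(propertyName);
        return property == null ? null : property.GetValue(source, null);
    }

    public static void Main()
    {
        object employee = CreateEmployee();

        // employee.Name would not compile here: the static type is object.
        Console.WriteLine(ReadProperty(employee, "Name"));       // Scott
        Console.WriteLine(ReadProperty(employee, "Department")); // Engineering
    }
}
```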
Anonymous types are also useful in query expressions. In the second half of the code above, we create a report of running processes on the machine. Instead of the query expression returning a Process object, which has dozens of properties, we've projected a new anonymous type with two properties: the process name and the process thread count. These types of projections are useful when creating data transfer objects or restricting the amount of data coming back in a LINQ to SQL query.
The initializers we've used are not only for anonymous types. In fact, C# has added a number of shortcuts for constructing new objects.

Initializers

The following two classes use auto-implemented properties to describe themselves.
public class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public Address HomeAddress { get; set; }
}

public class Address
{
    public string City { get; set; }
    public string Country { get; set; }
}
Given those class definitions, we can use the object initializer syntax to construct new instances of the classes like so:
Address myAddress = new Address { City = "Hagerstown", Country = "USA" };

Employee employee = new Employee
{
    ID = 1,
    Name = "Sami",
    HomeAddress = new Address { City = "Sharpsburg", Country = "USA" }
};
The initializer syntax allows us to set accessible properties and fields on an object during construction of the object. We can even nest an initializer inside an initializer, as we have done for the employee's HomeAddress property. The new keyword can be omitted in a nested initializer (HomeAddress = { City = "...", Country = "..." }), but that form sets members on the object the property already holds. Since Employee does not create an Address in its constructor, HomeAddress starts out null and the shorter form would throw a NullReferenceException at runtime, so we use new Address here. Closely related to the object initializer is the collection initializer.
List<Employee> employees = new List<Employee>()
{
    new Employee { ID = 1, Name = "...", HomeAddress = new Address { City = "...", Country = "..." } },
    new Employee { ID = 2, Name = "...", HomeAddress = new Address { City = "...", Country = "..." } },
    new Employee { ID = 3, Name = "...", HomeAddress = new Address { City = "...", Country = "..." } }
};
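A collection initializer is shorthand for constructing the collection and then calling Add once per element. The following sketch, using hypothetical names and simple string elements, compares the two forms and shows the nested-brace form used when Add takes more than one argument.

```csharp
using System;
using System.Collections.Generic;

public static class CollectionInitializerDemo
{
    public static List<string> ViaInitializer()
    {
        return new List<string> { "Boston", "London" };
    }

    // Roughly what the compiler generates for the initializer above.
    public static List<string> ViaAdd()
    {
        List<string> list = new List<string>();
        list.Add("Boston");
        list.Add("London");
        return list;
    }

    public static void Main()
    {
        Console.WriteLine(ViaInitializer().Count == ViaAdd().Count); // True

        // Dictionary's Add takes a key and a value, so each element gets its own braces.
        Dictionary<int, string> byId =
            new Dictionary<int, string> { { 1, "Scott" }, { 2, "Sami" } };
        Console.WriteLine(byId[2]); // Sami
    }
}
```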

Partial methods

Our coverage of C# features for LINQ wouldn't be complete without talking about partial methods. Partial methods were introduced into the C# language at the same time as LINQ. Partial methods, like partial classes, are an extensibility mechanism for working with designer-generated code. The LINQ to SQL class designer uses partial methods like the following.
public partial class Account
{
    public Account()
    {
        OnCreated();
    }

    partial void OnCreated();
}
The Account class is a partial class, meaning we can augment the class with additional methods, properties, and fields with our own partial definition. Partial classes have been around since C# 2.0.
The partial method OnCreated is defined in the above code. Partial methods are always private members of a class, and their return type must always be void. We have the option of providing an implementation for OnCreated in our own partial class definition. The implementation is optional, however. If we do not provide an implementation of OnCreated for the class, the compiler will remove the method declaration and all method calls to the partial method.
Providing an implementation is just defining a partial class with a partial method that carries an implementation for the method.
public partial class Account
{
    partial void OnCreated()
    {
        Console.WriteLine("Account created...");
    }
}
Partial methods are an optimization for designer-generated code that wants to provide some extensibility hooks. We can "catch" the OnCreated method if we need it for this class. If we do not provide an implementation, the compiler will remove all OnCreated method calls from the code and we won't pay a penalty for extensibility that we do not use.
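The removal is complete: when a partial method has no implementation, the compiler drops the call together with the evaluation of its arguments. The sketch below uses a variation of the Account class with hypothetical members (a message parameter on OnCreated, an OnLoaded hook, and a DescribeCalls counter) to make that visible.

```csharp
using System;

public partial class Account
{
    public static int DescribeCalls;

    public Account()
    {
        // OnCreated is never implemented, so this call and the
        // Describe() argument evaluation are both removed.
        OnCreated(Describe());

        // OnLoaded is implemented in the other part, so this call stays.
        OnLoaded();
    }

    private static string Describe()
    {
        DescribeCalls++;
        return "new account";
    }

    partial void OnCreated(string message);
    partial void OnLoaded();
}

public partial class Account
{
    partial void OnLoaded()
    {
        Console.WriteLine("Account loaded...");
    }
}

public static class PartialMethodDemo
{
    public static void Main()
    {
        new Account();                            // prints "Account loaded..."
        Console.WriteLine(Account.DescribeCalls); // 0
    }
}
```

This is also why partial methods must return void and cannot have out parameters: the surrounding code has to remain valid when the call disappears.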

Summary

In this article we've covered most of the new features introduced into C# for LINQ. Although lambda expressions, extension methods, anonymous types and the like were primarily introduced to facilitate LINQ, I hope you'll find some uses for these powerful features outside of query expressions.
