Introducing LINQ (3) Functional Programming

Datetime:2016-08-23 05:13:46          Topic: Functional Program           Share

Programming paradigms

Programming paradigm is the fundamental style of programming. There are many paradigms for programming , for example:

etc.

One programming language can adopt multiple paradigms. C# supports:

  • declarative programming: attributes, data annotations, code contracts, etc.
    • functional programming: first class functions, lambda expressions, LINQ query expressions, etc.
  • dynamic programming: dynamic type
  • event-driven programming: events, event handlers
  • generic programming: generics
  • imperative programming: by default
    • object-oriented and class-based programming: classes, encapsulation, inheritance, polymorphism, etc.
    • procedural programming: static class, static method, using static, etc.
  • metaprogramming: code DOM, expression tree, IL emit, compiler as a service, etc.
    • reflective programming: reflection

C# is such a powerful, flexible and productive language for general purpose, and all these C# language features live in in harmony. This tutorial discusses functional programming of C#, but other features, like generics, objects, attributes, expression trees, etc., is used a lot in functional C# code.

Imperative vs. declarative

Functional programming is declarative, and describes what to do; Object-oriented programming is imperative, and specifies how to do. To compare these 2 paradigms. The following is a declarative LINQ example that queries the types in mscorlib.dll assembly, the core library provided in .NET FCL containing the primitive and fundamental types. It needs to:

  • filter the types to get delegate types
  • group the delegate types by their namespaces
  • sort the groups by each group’s delegate type count in descending order, and if groups have identical delegate type count, then sort them by their namespaces

The following example implements this with traditional C# object-oriented programming. It is imperative. The code is a sequence of statements and commands, specifying how to execute the query:

internal static void DelegateTypes()
{
    Assembly mscorlib = typeof(object).Assembly;
    Dictionary<string, List<Type>> delegateTypes = new Dictionary<string, List<Type>>();
    foreach (Type type in mscorlib.GetExportedTypes())
    {
        if (type.BaseType == typeof(MulticastDelegate))
        {
            List<Type> namespaceTypes;
            if (!delegateTypes.TryGetValue(type.Namespace, out namespaceTypes))
            {
                namespaceTypes = delegateTypes[type.Namespace] = new List<Type>();
            }
            namespaceTypes.Add(type);
        }
    }
    List<KeyValuePair<string, List<Type>>> delegateTypesList =
        new List<KeyValuePair<string, List<Type>>>(delegateTypes);
    for (int index = 0; index < delegateTypesList.Count - 1; index++)
    {
        int currentIndex = index;
        KeyValuePair<string, List<Type>> after = delegateTypesList[index + 1];
        while (currentIndex >= 0)
        {
            KeyValuePair<string, List<Type>> before = delegateTypesList[currentIndex];
            int compare = before.Value.Count.CompareTo(after.Value.Count);
            if (compare == 0)
            {
                compare = after.Key.CompareTo(before.Key);
            }
            if (compare >= 0)
            {
                break;
            }
            delegateTypesList[currentIndex + 1] = delegateTypesList[currentIndex];
            currentIndex--;
        }
        delegateTypesList[currentIndex + 1] = after;
    }
    foreach (KeyValuePair<string, List<Type>> namespaceTypes in delegateTypesList) // Output.
    {
        Trace.Write(namespaceTypes.Value.Count + " " + namespaceTypes.Key + ":");
        foreach (Type delegateType in namespaceTypes.Value)
        {
            Trace.Write(" " + delegateType.Name);
        }
        Trace.WriteLine(null);
    }
    // 30 System: Action`1 Action Action`2 Action`3 Action`4 Func`1 Func`2 Func`3 Func`4 Func`5 Action`5 Action`6 Action`7 Action`8 Func`6 Func`7 Func`8 Func`9 Comparison`1 Converter`2 Predicate`1 ResolveEventHandler AssemblyLoadEventHandler AppDomainInitializer CrossAppDomainDelegate AsyncCallback ConsoleCancelEventHandler EventHandler EventHandler`1 UnhandledExceptionEventHandler
    // 8 System.Threading: SendOrPostCallback ContextCallback ParameterizedThreadStart WaitCallback WaitOrTimerCallback IOCompletionCallback ThreadStart TimerCallback
    // 3 System.Reflection: ModuleResolveEventHandler MemberFilter TypeFilter
    // 3 System.Runtime.CompilerServices: TryCode CleanupCode CreateValueCallback
    // 2 System.Runtime.Remoting.Messaging: MessageSurrogateFilter HeaderHandler
    // 1 System.Runtime.InteropServices: ObjectCreationDelegate
    // 1 System.Runtime.Remoting.Contexts: CrossContextDelegate
}

The following example is functional LINQ implementation, it is declarative. The code describes the logic, without specifying the execution details:

internal static partial class LinqToObjects
{
    internal static void DelegateTypesQueryExpression()
    {
        Assembly mscorlib = typeof(object).Assembly;
        IEnumerable<IGrouping<string, Type>> delegateTypes =
            from type in mscorlib.GetExportedTypes()
            where type.BaseType == typeof(MulticastDelegate)
            group type by type.Namespace into namespaceTypes
            orderby namespaceTypes.Count() descending, namespaceTypes.Key
            select namespaceTypes;
        foreach (IGrouping<string, Type> namespaceTypes in delegateTypes) // Output.
        {
            Trace.Write(namespaceTypes.Count() + " " + namespaceTypes.Key + ":");
            foreach (Type delegateType in namespaceTypes)
            {
                Trace.Write(" " + delegateType.Name);
            }
            Trace.WriteLine(null);
        }
    }
}

The following is the identical query in query method syntax:

internal static partial class LinqToObjects
{
    internal static void DelegateTypesQueryMethods()
    {
        Assembly mscorlib = typeof(object).Assembly;
        IEnumerable<IGrouping<string, Type>> delegateTypes = mscorlib.GetExportedTypes()
            .Where(type => type.BaseType == typeof(MulticastDelegate))
            .GroupBy(type => type.Namespace)
            .OrderByDescending(namespaceTypes => namespaceTypes.Count())
            .ThenBy(namespaceTypes => namespaceTypes.Key);
        foreach (IGrouping<string, Type> namespaceTypes in delegateTypes) // Output.
        {
            Trace.Write(namespaceTypes.Count() + " " + namespaceTypes.Key + ":");
            foreach (Type delegateType in namespaceTypes)
            {
                Trace.Write(" " + delegateType.Name);
            }
            Trace.WriteLine(null);
        }
    }
}

So imperative programming and declarative programming are quite different paradigms and philosophies. Imperative programming has a history to think from lower level up. The computer hardware’s implementation usually is imperative and stateful, so they are designed to execute imperative machine code and change state. Then low level programming languages are designed, which usually have strong correspondence to the machine code, with a little or no abstractions, so they are also imperative and stateful, like assembly language. Later, higher level programming languages are designed as abstraction of low level languages, which is usually more portable, but still imperative and stateful, for example, C is the abstractions of assembly languages, C++ was initially called C with Classes and designed as extension of C. When Microsoft designed modern languages, Microsoft designed C#, rooted in C family of languages to make immediately familiar to C, C++, and Java programmers, etc., so by default C# is imperative and stateful too - Actually C# was initially called COOL (C-like Object Oriented Language). In above imperative example, all execution details of logic have to be specified.

  • how to filter: scan the types, if a type is not a delegate type, ignore it.
  • how to group: use a dictionary to store the groups, where each dictionary key is namespace, and each dictionary value is a list of delegate types under a namespace; for each delegate type, if the dictionary does not have the delegate type’s namespace as a key yet, add a key-value pair to the dictionary, where key is the namespace, and value is an empty list of types; now the current namespace must have a corresponding type list, so add the delegate type to the type list.
  • and how to sort: copy the groups (key-value pairs of dictionary) to a list, so that the groups have an order. then scan the list of groups to apply insertion sort; when comparing 2 groups, first comparing their delegate type counts, if they have the same count, then compare their namespaces; after growing the sorted sub list of groups, eventually all groups are sorted in place.

When reading the above sequence of statements and commands, it is a control flow, the logic and semantic are less intuitive.

In contrast, declarative programming is to think from higher level. It is usually abstractions of the mathematics and logic, disregarding how exactly the operations should be executed. This usually includes avoiding specifying how to change state and how to mutate data. In above LINQ examples, the query simply declares:

  • what is the filter logic: keep delegate types
  • what is the group logic: group delegate types by namespaces
  • what is the sorting logic: sort the delegate type groups in descending order of delegate type counts, then in ascending order of namespaces

When reading the above logic description, it is a data flow, the logic and semantic are very natural and intuitive.

In previous part, the traditional XML data and SQL database queries are also in imperative, object-oriented paradigm. They specify how exactly to access the specific data sources, like opening SQL database connection, etc., pass the query logic to data source with domain specific SQL and XPath languages, etc. In contrast, the LINQ to XML and LINQ to SQL queries are functional and declarative, they describe the query logic without specifying execution details.

Regarding computer hardware is usually imperative, declarative code eventually needs to translated to imperative code to execute in hardware. This process is usually done by compilers at compile time, and by APIs at runtime, so that at design time it can be non-imperatively. Later, this tutorial will discuss how functional and declarative LINQ is implemented by C# compiler and LINQ query methods.

Besides LINQ and functional programming, C# and .NET also provide other declarative features and technologies. For example, attribute is a powerful feature to associate declarative information with code, including assemblies, modules, types, type members:

[TestClass]
public class QueryMethodsTests
{
    [TestMethod]
    public void FilteringTest()
    {
        // Unit test.
    }

    [TestMethod]
    public void GroupingTest()
    {
        // Unit test.
    }
}

Attributes are widely used in C# and .NET programming. For example, data annotation is a technology to use attributes to modeling, display, and validate data entities. The following type uses attributes to declare validation rules for its properties, and the error messages when the validation fails:

public class Contact
{
    [Required(ErrorMessageResourceType = typeof(Resources), ErrorMessageResourceName = nameof(Resources.NameRequired))]
    [StringLength(maximumLength: 50, MinimumLength = 1, ErrorMessageResourceType = typeof(Resources), ErrorMessageResourceName = nameof(Resources.NameInvalid))]
    public string Name { get; set; }

    [EmailAddress(ErrorMessageResourceType = typeof(Resources), ErrorMessageResourceName = nameof(Resources.EmailInvalid))]
    public string Email { get; set; }
}

Code contracts is also a declarative technology to describes the behavior of code. The following example describes type members’ precondition, postcondition, and purity, which is intuitive and readable:

public class Product
{
    private readonly string name;

    private readonly decimal price;

    public Product(string name, decimal price)
    {
        Contract.Requires<ArgumentNullException>(!string.IsNullOrWhiteSpace(name));
        Contract.Requires<ArgumentOutOfRangeException>(price >= 0);

        this.name = name;
        this.price = price;
    }

    public string Name
    {
        [Pure]
        get
        {
            Contract.Ensures(!string.IsNullOrWhiteSpace(Contract.Result<string>()));

            return this.name;
        }
    }

    public decimal Price
    {
        [Pure]
        get
        {
            Contract.Ensures(Contract.Result<int>() >= 0);

            return this.price;
        }
    }
}

Object-oriented vs. functional

In object oriented programming, objects can have behaviors in the form of methods, comparing to functions in functional programming, they are both modularized, reusable code, they can both be called, and they can both have parameters and return values. The main difference is, functional programming is a subtype of declarative programming. Besides being declarative, describing what to do, and not specifying how to do, functional programming usually models operations as the application of mathematical functions, and avoids state changes and data mutation. A mathematical function is a relation between a set of inputs and a set of outputs, and each certain input is related to a certain output. In another word, a mathematical function’s output only depends on the input. This also means it is self contained and does not produce side effects, like state changes which do nor depend on the input. As a result, if a mathematical function is applied (called) with the same input for multiple times, it produces the same output. In C#, functions are represented with methods:

internal static int Add(int a, int b)
{
    return a + b;
}

internal static void AddToConsole(int a, int b)
{
    Console.WriteLine("{0} => {1}", DateTime.Now.ToString("o"), a + b);
}

Add method is a mathematical function. It has a pair integers as input, and relates them to a new sum integer as output. The output sum integer only depends on the input pair of integers. And Add does not change any state. If Add is called with the same pair of integers for multiple times, it produces the same sum integer. In contrast, AddToConsole method is not a mathematical function. It has a pair integers as input, and produce side effect to change console’s display state, which depends on input pair of integers, and current time. When AddToConsole is called with the same pair of integers for multiple times, it changes console’s display state differently. Apparently, being mathematical improves productivity. A self contained, side effect free, stateless function is more readable, debuggable, maintainable, parallelizable, and testable. For example, to test these 2 methods:

[TestMethod]
public void AddTest()
{
    Assert.AreEqual(3, Add(1, 2));
}

[TestMethod]
public void AddToConsoleTest()
{
    StringBuilder consoleState = new StringBuilder();
    Console.SetOut(new StringWriter(consoleState));
    DateTime begin = DateTime.Now;
    AddToConsole(1, 2);
    DateTime end = DateTime.Now;
    string consoleMessage = consoleState.ToString(); // 2016-08-13T19:23:17.0278158-07:00 => 3
    Match match = Regex.Match(consoleMessage, @"(\d{4}-\d{2}-\d{2}T\d{2}\:\d{2}\:\d{2}\.\d{7}[+\-]{1}\d{2}\:\d{2}) => (\d+)");
    DateTime consoleDateTime = Convert.ToDateTime(match.Groups[1].Value);
    int consoleSum = Convert.ToInt32(match.Groups[2].Value);
    Assert.IsTrue(begin < consoleDateTime && consoleDateTime < end);
    Assert.AreEqual(3, consoleSum);
}

Testing Add just need to call it with input, and verify its output. Testing AddToConsole is more complex, because the side effect (console display state change) needs to be captured and verified.

In above functional LINQ queries, there is no state changes or data mutation (like dictionary manipulation, list manipulation, variable reassignment, etc.), there is no flow control (like for loop, if condition etc.). There are just function calls, and all the involved functions are mathematical functions:

  • Where’s parameter type => type.BaseType == typeof(MulticastDelegate) is a mathematical function. It has Type object as input (left side of the => operator), and relates to new bool value as output (right side of the => operator), which indicates if that Type object represents a delegate type. This syntax is called lambda expression, which will be discussed in details later. The output bool value only depends on the input Type object. And this function does not change states. If it is called with the the same Type object for multiple times, it produces the same bool value.
  • GroupBy’s parameter type => type.Namespace is a mathematical function. It has Type object as input, and relates to namespace string value as output, which is used as the grouping key. Again, the output namespace string value only depends on the input Type object. And this function does not change states. If it is called with the same Type object for multiple times, it produces the sane namespace string.
  • OrderByDescending’s parameter namespaceTypes => namespaceTypes.Count() is a mathematical function. It has group of Type objects as input, and relates to that group’s object count integer value as output, which is used as the sorting key. Again, the output object count integer value only depends on the input group. And this function does not change states. If it function is called with the same group for multiple times, it produces the sane count integer.
  • Similarly, ThenBy’s parameter namespaceTypes => namespaceTypes.Key is also a mathematical function.
  • Where, GroupBy, OrderByDescending, ThenBy are called LINQ query methods, they are also mathematical functions, When they are called, they do not actually execute the filtering, grouping, and sorting logic. They have a source sequence and a function as input, and relate to a new generator object as output, which wraps the input source sequence and input function. They do not change state either. If each of these query methods is called with the same source sequence and function, it produces the same generator. This will be discussed later in detail.

Object-oriented programming has first class objects. And in functional programming, functions are first class citizens. To demonstrate the difference, the following document building example is in object-oriented paradigm. It downloads HTML content from the specified URL, converts it to a word document file, and upload to OneDrive to share:

internal class WebContentDownloader
{
    internal string Download(string url)
    {
        throw new NotImplementedException();
    }
}

internal class WordConverter
{
    internal FileInfo Convert(string html)
    {
        throw new NotImplementedException();
    }
}

internal class OneDriveUploader
{
    internal void Upload(FileInfo file)
    {
        throw new NotImplementedException();
    }
}

internal class DocumentBuilder
{
    private readonly WebContentDownloader downloader;
    private readonly WordConverter converter;
    private readonly OneDriveUploader uploader;

    internal DocumentBuilder(WebContentDownloader downloader, WordConverter converter, OneDriveUploader uploader)
    {
        this.downloader = downloader;
        this.converter = converter;
        this.uploader = uploader;
    }

    internal void Build(string url)
    {
        string html = this.downloader.Download(url);
        FileInfo word = this.converter.Convert(html);
        this.uploader.Upload(word);
    }
}

The above WebContentDownloader class is defined for a single responsibility, downloading HTML content with the specified URL. WordConverter class is defined to convert HTML content to Word document file. And OneDriveUploader class is defined to upload the converted Word document file to OneDrive and share. To focus on the paradigm, the implementations are omitted (If interested, the complete web content to Word document building implementation can be foundhere). To build the document, DocumentBuilder class is defined to compose everything together. This is how it works:

internal partial class Imperative
{
    internal static void BuildDocument()
    {
        DocumentBuilder builder = new DocumentBuilder(
            new WebContentDownloader(), new WordConverter(), new OneDriveUploader());
        builder.Build("https://weblogs.asp.net/dixin/linq-via-csharp");
    }
}

In functional paradigm, the above 4 classes are not needed, they can be can be replaced by functions, represented by the following static methods:

internal static partial class Functional
{
    internal static string DownloadWebContent(string url)
    {
        throw new NotImplementedException();
    }

    internal static FileInfo ConvertToWord(string html)
    {
        throw new NotImplementedException();
    }

    internal static void UploadToOneDrive(FileInfo file)
    {
        throw new NotImplementedException();
    }

    internal static Action<string> GetDocumentBuilder(
        Func<string, string> download, Func<string, FileInfo> covert, Action<FileInfo> upload)
    {
        return url =>
            {
                string html = download(url);
                FileInfo word = covert(html);
                upload(word);
            };
    }
}

They can be put together to work:

internal static partial class Functional
{
    internal static void BuildDocument()
    {
        Action<string> buildDocument = GetDocumentBuilder(
            DownloadWebContent, ConvertToWord, UploadToOneDrive);
        buildDocument("https://weblogs.asp.net/dixin/linq-via-csharp");
    }
}

Here GetDocumentBuilder function is called with DownloadWebContent, ConvertToWord, and UploadToOneDrive functions as arguments, and its return value is a buildDocument function. And function names, like buildDocument, works like object variable name, with a type Action<string>, which means accepting a string parameter, and returns void. This demonstrates functions are first class citizens in C#, just like objects. First class functions, and other functional features of C#, will be discussed in detail in this tutorial.

Functional programming features, like lambda expression, are relatively newer to C#, but the concepts and the paradigm has a long history. Lambda calculus, where lambda expression and functional programming come from, was invented in 1930s. The first functional programming language, Lisp, was designed in 1950s. Lisp is also the second oldest high level programming language still widely used today. It is only younger than Fortran, an imperative programming language, by 1 year.





About List