Wilsonhut

Deal with it or don't

Tag Archives: Linq

Obvious Extension Methods: IEnumerable Lazy DefaultIfEmpty

Is your Aggregate throwing an InvalidOperationException: Sequence contains no elements?
This line of code certainly would:
new string[]{}.Aggregate((ag, s) => ag + ", " + s)

Use this instead:
new string[]{}.DefaultIfEmpty("[NA]").Aggregate((ag, s) => ag + ", " + s)

I was writing some extension methods for things such as Aggregate whose only purpose was to pass in default values. That was not very smart, since you can always use DefaultIfEmpty() and pass your default value there. What I really needed was an overload of DefaultIfEmpty that takes a Func to get the default value, in case the default value is expensive to produce (a database look-up, for example).

So here that is:

public static IEnumerable<TSource> DefaultIfEmpty<TSource>(this IEnumerable<TSource> source, Func<TSource> getDefaultValue)
{
	using (var enumerator = source.GetEnumerator())
	{
		if (enumerator.MoveNext())
		{
			do
			{
				yield return enumerator.Current;
			}
			while (enumerator.MoveNext());
		}
		else
			yield return getDefaultValue();
	}
}

Now you can say
new string[]{}.DefaultIfEmpty(() => goGetTheDefaultValue()).Aggregate((ag, s) => ag + ", " + s)
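To see what the Func buys you, here is a minimal, self-contained sketch. It restates the overload (checking the empty case first), and `LookUpDefault` is a hypothetical stand-in for goGetTheDefaultValue that counts how often it runs, so you can confirm the look-up only happens when the sequence really is empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDefaultExtensions
{
    // Yields every element of source; only if source is empty does it call getDefaultValue.
    public static IEnumerable<TSource> DefaultIfEmpty<TSource>(
        this IEnumerable<TSource> source, Func<TSource> getDefaultValue)
    {
        using (var enumerator = source.GetEnumerator())
        {
            if (!enumerator.MoveNext())
            {
                yield return getDefaultValue();
                yield break;
            }
            do
            {
                yield return enumerator.Current;
            }
            while (enumerator.MoveNext());
        }
    }
}

public static class LazyDefaultDemo
{
    // Number of times the (hypothetical) expensive look-up has run.
    public static int LookUpCount;

    // Hypothetical stand-in for a database look-up.
    static string LookUpDefault()
    {
        LookUpCount++;
        return "[NA]";
    }

    public static string JoinOrDefault(IEnumerable<string> items)
    {
        return items.DefaultIfEmpty(() => LookUpDefault()).Aggregate((ag, s) => ag + ", " + s);
    }
}
```

JoinOrDefault(new[] { "a", "b" }) gives "a, b" without touching the look-up; JoinOrDefault(new string[0]) gives "[NA]" and runs it exactly once.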


Obvious Extension Method: IEnumerable MaxOrDefault and MinOrDefault

This one is almost too obvious to comment on:

public static TResult MaxOrDefault<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, TResult defaultValue)
{
	return source.Select(selector).DefaultIfEmpty(defaultValue).Max();
}

public static TResult MaxOrDefault<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
	return source.Select(selector).DefaultIfEmpty().Max();
}

public static TResult MinOrDefault<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector, TResult defaultValue)
{
	return source.Select(selector).DefaultIfEmpty(defaultValue).Min();
}

public static TResult MinOrDefault<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
	return source.Select(selector).DefaultIfEmpty().Min();
}

It just makes the code a little more succinct when you’re trying to avoid the InvalidOperationException: Sequence contains no elements.
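Here is a quick way to try the pair on an empty list, as a sketch: the overloads that take a default are restated with their generic parameters, and `Order` with its `Total` property is a hypothetical type made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MinMaxExtensions
{
    // Max of the selected values, or defaultValue when source is empty.
    public static TResult MaxOrDefault<TSource, TResult>(this IEnumerable<TSource> source,
        Func<TSource, TResult> selector, TResult defaultValue)
    {
        return source.Select(selector).DefaultIfEmpty(defaultValue).Max();
    }

    // Min of the selected values, or defaultValue when source is empty.
    public static TResult MinOrDefault<TSource, TResult>(this IEnumerable<TSource> source,
        Func<TSource, TResult> selector, TResult defaultValue)
    {
        return source.Select(selector).DefaultIfEmpty(defaultValue).Min();
    }
}

// Hypothetical type, only for the example.
public class Order
{
    public decimal Total { get; set; }
}
```

With this, new Order[0].MaxOrDefault(o => o.Total, 0m) returns 0 instead of throwing.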

Obvious Extension Methods: IEnumerable AnyAndAll

This might not be intuitive to you (it wasn’t to me).

What does the following line return?

Enumerable.Empty<string>().All(t => t == "I don't think so")

…or, maybe even more confusing:

Enumerable.Empty<string>().All(t => t != t)

Well… both of those return true. (See Vacuous Truth)

I didn’t want to have to tack a call to Any() onto my code for this situation, so I wrote AnyAndAll (which also performs better, since it won’t even begin to enumerate the list twice).

Here it is:

public static bool AnyAndAll<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    if (source == null)
        throw new ArgumentNullException("source");
    if (predicate == null)
        throw new ArgumentNullException("predicate");
    bool hasAny = false;
    foreach (var item in source)
    {
        hasAny = true;
        if (!predicate(item))
            return false;
    }
    return hasAny;
}

You’re welcome.
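A small sketch to check both claims: All is vacuously true on an empty sequence while AnyAndAll is false, and Any() && All(...) starts enumerating the source twice where AnyAndAll starts once. `Evens` is a hypothetical source that counts how often an enumeration begins.

```csharp
using System;
using System.Collections.Generic;

public static class AnyAndAllExtensions
{
    // True only if source has at least one element and every element satisfies predicate.
    public static bool AnyAndAll<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (predicate == null)
            throw new ArgumentNullException("predicate");
        bool hasAny = false;
        foreach (var item in source)
        {
            hasAny = true;
            if (!predicate(item))
                return false;
        }
        return hasAny;
    }
}

public static class AnyAndAllDemo
{
    // How many times an enumeration of Evens() has started.
    public static int Enumerations;

    // Hypothetical source; the counter bumps on the first MoveNext of each enumeration.
    public static IEnumerable<int> Evens()
    {
        Enumerations++;
        yield return 2;
        yield return 4;
    }
}
```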

Obvious Extension Method: TakeAllButLast

I had a previous post about TakeAllButLast, but here’s a more succinct version which also has a callback for the untaken records.

Basically, Linq provides us with the extension method Take(x) to get the first x items in an enumerable. It also gives us Skip(x) to take all but the first x items. There is no “TakeAllButLast(x)” or “SkipLast(x)” equivalent (at least not in the .NET Framework; .NET Core 2.0 and later did add Enumerable.SkipLast).

public static class EnumerableExtensions
{
    public static IEnumerable<T> TakeAllButLast<T>(this IEnumerable<T> items, int count, Action<T> onUntaken = null)
    {
        if (items == null)
            throw new ArgumentNullException("items");

        if (count < 0)
            throw new ArgumentOutOfRangeException("count");

        var buffer = new Queue<T>(count + 1);

        foreach (var item in items)
        {
            buffer.Enqueue(item);

            if (buffer.Count == count + 1)
            {
                yield return buffer.Dequeue();
            }
        }
        if (onUntaken != null)
        {
            foreach (var x in buffer)
            {
                onUntaken(x);
            }
        }
    }
}

 

You’d use it like this, for example, when you’re processing a file that has ONE trailer record:

using (var reader = new StreamReader(_inputFile))
{
    var fileLines = reader.GetLines().TakeAllButLast(1);
}

…which also uses my GetLines method for a TextReader.

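And now, with the callback, you can take a peek at the last line that it skipped. Keep in mind that the callback only fires once fileLines has been enumerated all the way to the end:

using (var reader = new StreamReader(_inputFile))
{
    string lastLine = null;
    var fileLines = reader.GetLines().TakeAllButLast(1, line => lastLine = line);
    foreach (var fileLine in fileLines) { /* process each detail line */ }
    // The callback has run by now, so lastLine holds the trailer record.
}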
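To watch the callback timing without a file, here is a sketch over an in-memory array; the record strings and the `TrailerDemo.ReadBody` helper are hypothetical. TakeAllButLast is restated so the block compiles by itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeAllButLastExtensions
{
    public static IEnumerable<T> TakeAllButLast<T>(this IEnumerable<T> items, int count, Action<T> onUntaken = null)
    {
        if (items == null)
            throw new ArgumentNullException("items");
        if (count < 0)
            throw new ArgumentOutOfRangeException("count");

        // Hold back `count` items; anything pushed past that is safe to yield.
        var buffer = new Queue<T>(count + 1);
        foreach (var item in items)
        {
            buffer.Enqueue(item);
            if (buffer.Count == count + 1)
                yield return buffer.Dequeue();
        }
        // Only reached when the caller enumerates all the way to the end.
        if (onUntaken != null)
        {
            foreach (var x in buffer)
                onUntaken(x);
        }
    }
}

public static class TrailerDemo
{
    // Returns the detail lines and hands back the trailer through `trailer`.
    public static List<string> ReadBody(IEnumerable<string> lines, out string trailer)
    {
        string last = null;
        var body = lines.TakeAllButLast(1, line => last = line).ToList();
        trailer = last;   // set by the callback once ToList() drained the sequence
        return body;
    }
}
```

Note that something like First() stops early, so in that case the callback never runs at all.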

Obvious Extension Method: TextReader GetLines()

I don’t know about you, but looping through a TextReader or StreamReader looks ugly to me. I’d rather have the lines of the file returned as an IEnumerable<string>.

So here it is. Easy Schpeasy.

 

public static class TextReaderExtensions
{
    public static IEnumerable<string> GetLines(this TextReader streamReader)
    {
        string fileLine;
        while ((fileLine = streamReader.ReadLine()) != null)
        {
            yield return fileLine;
        }
    }
}

Now the code using it looks like this:

using (var reader = new StreamReader("someFile.txt"))
{
    foreach (var fileLine in reader.GetLines())
    {
        //pretty!
    }
}

…and you can use all your favorite Linq on there!
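For example, here’s a sketch that filters and transforms lines with plain Linq. It uses a StringReader (also a TextReader) in place of a file so it runs anywhere; `NonBlankUpper` is a made-up helper. Since GetLines is lazy, the query is materialized with ToArray() before the reader is disposed.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TextReaderExtensions
{
    public static IEnumerable<string> GetLines(this TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

public static class GetLinesDemo
{
    // Hypothetical helper: skip blank lines, upper-case the rest.
    public static string[] NonBlankUpper(string text)
    {
        using (var reader = new StringReader(text))
        {
            // Materialize inside the using block; GetLines reads lazily.
            return reader.GetLines()
                         .Where(line => line.Trim().Length > 0)
                         .Select(line => line.ToUpperInvariant())
                         .ToArray();
        }
    }
}
```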

Obvious Extension Methods: The Simple Join

Maybe you just want to know the things from list1 where they have a match in list2.

You can do that yesterday and today with list1.Join(list2, item1=>item1, item2=>item2, (item1,item2)=>item1);

But today and tomorrow, you can simply do list1.Join(list2).

Obvious, right?

public static class Extensions
{
    public static IEnumerable<T> Join<T>(this IEnumerable<T> outer,
        IEnumerable<T> inner)
    {
        return outer.Join(inner, a => a, b => b, (a, b) => a);
    }
}

Obvious Extension Methods: IEnumerable Cache

So you don’t want to do a ToList(), but you don’t want to enumerate your IEnumerable<T> more than once? How about an extension method called “Cache” that caches the output as you go?

using System.Collections;
using System.Collections.Generic;

public static class Extensions
{
    public static CachedEnumerable<T> Cache<T>(this IEnumerable<T> enumerable)
    {
        return new CachedEnumerable<T>(enumerable);
    }
}
// Note: not thread-safe.
public class CachedEnumerable<T> : IEnumerable<T>
{
    IEnumerator<T> _originalEnumerator;
    readonly IEnumerable<T> _originalEnumerable;
    readonly List<T> _cache = new List<T>();
    public CachedEnumerable(IEnumerable<T> enumerable)
    {
        _originalEnumerable = enumerable;
    }
    public IEnumerator<T> GetEnumerator()
    {
        // Walk the cache by index rather than with foreach, so that another
        // enumerator adding to _cache mid-loop (e.g. x.Zip(x, ...)) can't invalidate this one.
        for (var index = 0; ; index++)
        {
            if (index == _cache.Count)
            {
                // Past the cached items: pull the next one from the shared original enumerator.
                if (_originalEnumerator == null)
                {
                    _originalEnumerator = _originalEnumerable.GetEnumerator();
                }
                if (!_originalEnumerator.MoveNext())
                {
                    yield break;
                }
                _cache.Add(_originalEnumerator.Current);
            }
            yield return _cache[index];
        }
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

That’s it. If you want to see it in action, open LINQPad, select Language: “C# Program”, delete everything, paste in the above, then finally, paste in the below:

void Main()
{
  var x = GetNumbers().Cache();

  "TAKE 2.".Dump();
  x.Take(2).Dump("TWO:");

  "Get them all.".Dump();
  x.Dump("ALL:");

  "Get them all again.".Dump();
  x.Dump("ALL:");
}

public IEnumerable<int> GetNumbers()
{
  yield return 1.Dump("Numbers is hard");
  Thread.Sleep(500);
  yield return 2.Dump("Numbers is hard");
  Thread.Sleep(500);
  yield return 3.Dump("Numbers is hard");
  Thread.Sleep(500);
  yield return 4.Dump("Numbers is hard");
  Thread.Sleep(500);
  yield return 5.Dump("Numbers is hard");
  Thread.Sleep(500);
  yield return 6.Dump("Numbers is hard");
}

There.
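No LINQPad? Here’s a console-friendly sketch of the same experiment. It restates the class with the cache walked by index (so two enumerators can interleave, as with Zip, without a “Collection was modified” exception), and `CacheDemo.Pulled` is a hypothetical counter of how many items the source has actually produced.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class CacheExtensions
{
    public static CachedEnumerable<T> Cache<T>(this IEnumerable<T> enumerable)
    {
        return new CachedEnumerable<T>(enumerable);
    }
}

// Caches items as they are first produced; not thread-safe.
public class CachedEnumerable<T> : IEnumerable<T>
{
    readonly IEnumerable<T> _originalEnumerable;
    readonly List<T> _cache = new List<T>();
    IEnumerator<T> _originalEnumerator;

    public CachedEnumerable(IEnumerable<T> enumerable)
    {
        _originalEnumerable = enumerable;
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (var index = 0; ; index++)
        {
            // Ran past the cache: pull one more item from the shared source enumerator.
            if (index == _cache.Count)
            {
                if (_originalEnumerator == null)
                    _originalEnumerator = _originalEnumerable.GetEnumerator();
                if (!_originalEnumerator.MoveNext())
                    yield break;
                _cache.Add(_originalEnumerator.Current);
            }
            yield return _cache[index];
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class CacheDemo
{
    // How many items GetNumbers() has actually produced.
    public static int Pulled;

    public static IEnumerable<int> GetNumbers()
    {
        for (var i = 1; i <= 6; i++)
        {
            Pulled++;
            yield return i;
        }
    }
}
```

Taking 2 pulls only 2 items from the source; summing everything afterwards pulls the remaining 4, and summing again pulls nothing new.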

Obvious Methods: Distinct with equality selector

You have an IEnumerable<SomeType> on which you want to do a .Distinct(). “SomeType” is not equatable – at least not in the way you want. Your only choice is to write an IEqualityComparer… Until today!!!

Here’s what you WANT to write:

myEnumerable.Distinct(item => item.Id);

…assuming that the Ids make them unique. And here’s the extension method overload of Distinct that lets you do this:

public static class Extensions
{
  public static IEnumerable<T> Distinct<T, TCompare>(this IEnumerable<T> items, Func<T, TCompare> keySelector)
  {
    var distinctKeys = new HashSet<TCompare>();
    foreach (var item in items)
    {
      var key = keySelector(item);
      // HashSet.Add returns false if this key has already been seen
      if (distinctKeys.Add(key))
        yield return item;
    }
  }
}

Now, you’re probably asking: couldn’t I just write this?

myEnumerable.GroupBy(item => item.Id).Select(g => g.First());

It gives you the same output, but this new Distinct method is:

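  1. faster… like an order of magnitude faster (GroupBy has to put every element into a group before it yields anything; the HashSet only keeps the keys)
  2. easier to read… like an order of magnitude easier to read.
  3. lazier. Consider the following contrived enumerable: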

var list = new[]
            {
              new {x = 1, y = "one"},
              new {x = 1, y = "won"},
              new {x = 1, y = "juan"},
              new {x = 0, y = "zero"},
            };

…and this call to .Distinct:

list.Distinct(item => 1 / item.x).Take(1);

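With the GroupBy approach, enumerating this throws a DivideByZeroException, because GroupBy computes the key for every element before it yields the first group. But with the new Distinct, you get no exception, because it is so dang lazy: Take(1) stops after the first item, so 1 / 0 is never evaluated.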
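Here is the contrived example as a runnable sketch, using plain ints instead of the anonymous type. The overload is restated, and `DistinctDemo.Key` is a hypothetical key selector that computes 1 / x and counts its calls, so you can see that Take(1) stops after one key while GroupBy hits the division by zero.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctExtensions
{
    // Yields the first item seen for each distinct key, lazily.
    public static IEnumerable<T> Distinct<T, TCompare>(this IEnumerable<T> items, Func<T, TCompare> keySelector)
    {
        var distinctKeys = new HashSet<TCompare>();
        foreach (var item in items)
        {
            if (distinctKeys.Add(keySelector(item)))
                yield return item;
        }
    }
}

public static class DistinctDemo
{
    // How many keys have been computed so far.
    public static int KeysComputed;

    // Hypothetical key selector; throws DivideByZeroException when x == 0.
    public static int Key(int x)
    {
        KeysComputed++;
        return 1 / x;
    }
}
```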

Chunk

I wrote before about partitioning a string, which was a way to bust a string into Chunks. It used the string like an IEnumerable<char>. I thought it would be nice to be able to Chunk any enumerable (for batching, or whatever).

The Partition was written without regard to performance, but this time, it’s all about performance. Just try to make it faster.

Here’s the extension method:

    1 public static class Extensions
    2 {
    3   public static IEnumerable<IEnumerable<T>> ToChunks<T>(this IEnumerable<T> list, int chunkSize)
    4   {
    5     var enumerator = list.GetEnumerator();
    6
    7     for (;;)
    8     {
    9       var chunk = enumerator.GetNext(chunkSize);
   10       if (chunk.Length == 0)
   11       {
   12         break;
   13       }
   14       yield return chunk;
   15     }
   16   }
   17
   18   private static T[] GetNext<T>(this IEnumerator<T> enumerator, int count)
   19   {
   20     var ts = new T[count];
   21     int i;
   22     for (i = 0; i < count; i++)
   23     {
   24       if (!enumerator.MoveNext()) break;
   25       ts[i] = enumerator.Current;
   26     }
   27     if (i < count)
   28     {
   29       Array.Resize(ref ts, i);
   30     }
   31     return ts;
   32   }
   33 }

When I needed this recently, I also needed to know in each chunk where I was in the original IEnumerable, so instead of returning an IEnumerable of IEnumerables, I returned an IEnumerable of a new Chunk type that implements IEnumerable<T>. I only had to change lines 3, 7, and 14:

    1 public static class Extensions
    2 {
    3   public static IEnumerable<Chunk<T>> ToChunks<T>(this IEnumerable<T> list, int chunkSize)
    4   {
    5     var enumerator = list.GetEnumerator();
    6
    7     for (var i = 0;; i++)
    8     {
    9       var chunk = enumerator.GetNext(chunkSize);
   10       if (chunk.Length == 0)
   11       {
   12         break;
   13       }
   14       yield return new Chunk<T>(chunk, i*chunkSize, chunk.Length);
   15     }
   16   }
   17
   18   private static T[] GetNext<T>(this IEnumerator<T> enumerator, int count)
   19   {
   20     var ts = new T[count];
   21     int i;
   22     for (i = 0; i < count; i++)
   23     {
   24       if (!enumerator.MoveNext()) break;
   25       ts[i] = enumerator.Current;
   26     }
   27     if (i < count)
   28     {
   29       Array.Resize(ref ts, i);
   30     }
   31     return ts;
   32   }
   33 }

…and here’s Chunk

public class Chunk<T> : IEnumerable<T>
{
  private readonly IEnumerable<T> _chunk;

  public Chunk(IEnumerable<T> chunk, int first, int length)
  {
    _chunk = chunk;
    FirstIndex = first;
    Length = length;
  }

  public int FirstIndex { get; private set; }
  public int Length { get; private set; }
  public int LastIndex { get { return FirstIndex + Length - 1; } }

  public IEnumerator<T> GetEnumerator()
  {
    return _chunk.GetEnumerator();
  }

  IEnumerator IEnumerable.GetEnumerator()
  {
    return GetEnumerator();
  }
}
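Putting the pieces together, here is a self-contained sketch with LastIndex computed as FirstIndex + Length - 1. One addition that is not in the code above: ToChunks wraps the source enumerator in a using block so it gets disposed, which matters if the source is something like a file or a data reader.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class Chunk<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _chunk;

    public Chunk(IEnumerable<T> chunk, int first, int length)
    {
        _chunk = chunk;
        FirstIndex = first;
        Length = length;
    }

    public int FirstIndex { get; private set; }
    public int Length { get; private set; }
    public int LastIndex { get { return FirstIndex + Length - 1; } }

    public IEnumerator<T> GetEnumerator() { return _chunk.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class ChunkExtensions
{
    public static IEnumerable<Chunk<T>> ToChunks<T>(this IEnumerable<T> list, int chunkSize)
    {
        // Addition: dispose the source enumerator when iteration ends or is abandoned.
        using (var enumerator = list.GetEnumerator())
        {
            for (var i = 0; ; i++)
            {
                var chunk = enumerator.GetNext(chunkSize);
                if (chunk.Length == 0)
                    yield break;
                yield return new Chunk<T>(chunk, i * chunkSize, chunk.Length);
            }
        }
    }

    // Reads up to `count` items; a shorter (possibly empty) array means the source ran out.
    private static T[] GetNext<T>(this IEnumerator<T> enumerator, int count)
    {
        var ts = new T[count];
        int i;
        for (i = 0; i < count; i++)
        {
            if (!enumerator.MoveNext()) break;
            ts[i] = enumerator.Current;
        }
        if (i < count)
            Array.Resize(ref ts, i);
        return ts;
    }
}
```

Chunking 1 through 7 by 3 gives three chunks covering indexes 0 to 2, 3 to 5, and 6 to 6.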

.Where? .WhereNot

Ever have some code like this?

CollectionOfThings.Where(x => !ThingMeetsSomeCondition(x))

If only there were a WhereNot method, then you could use it with a method group, like so:

CollectionOfThings.WhereNot(ThingMeetsSomeCondition)

So Where is your WhereNot? Here:

  public static class Extensions
  {
    public static IEnumerable<T> WhereNot<T>(this IEnumerable<T> items, Func<T, bool> predicate)
    {
      return items.Where(x => !predicate(x));
    }
  }

 

Glad that’s done.
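A minimal sketch of the method-group payoff: string.IsNullOrWhiteSpace is just a convenient existing predicate, and `WhereNotDemo.NonBlank` is a made-up helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereNotExtensions
{
    // Keeps the items for which predicate returns false.
    public static IEnumerable<T> WhereNot<T>(this IEnumerable<T> items, Func<T, bool> predicate)
    {
        return items.Where(x => !predicate(x));
    }
}

public static class WhereNotDemo
{
    // Method group instead of x => !string.IsNullOrWhiteSpace(x)
    public static string[] NonBlank(IEnumerable<string> lines)
    {
        return lines.WhereNot(string.IsNullOrWhiteSpace).ToArray();
    }
}
```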