2009 09 02 how to write a linq provider the simple way again
Published on September 2nd, 2009 at 8:32
This is the second part of a two-part series of posts. Read the first part for a very short introduction to re-linq, read Stefan Wenig’s post or my whitepaper for more background.
As promised, here’s an introduction to the steps that need to be taken to implement a LINQ provider using re-linq.
First, let’s take a look at the classes and interfaces LINQ and re-linq require you to implement. To start with, you need to provide an implementation of IQueryable<T>. That’s LINQ’s main query interface, and all of LINQ’s query methods, such as Queryable.Where, Queryable.OrderBy, or Queryable.Select, are written against it. re-linq provides a base class, QueryableBase<T>, from which you can derive to implement this interface. Doing so is fairly trivial: it only requires adding two constructors, one used by your provider’s clients and one used by the LINQ infrastructure in the .NET framework.
Then, you need an implementation of IQueryProvider. LINQ query methods use this interface to create new queries around an existing IQueryable<T> and to actually execute queries. For example, a call to Queryable.Where will take an existing query and wrap its expression so that it now represents a query with a where clause. A call to Queryable.Single will use the IQueryProvider.Execute method to actually execute the query. Enumerating queries will also delegate to IQueryProvider.Execute.
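To make the "wrapping" a bit more tangible, here is a small sketch that uses only the standard LINQ-to-Objects queryable (AsQueryable), not re-linq. The class and method names (WrapDemo, OutermostCall) are made up for this illustration. It shows that each query method call produces a new query whose expression tree has that method call as its outermost node.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class WrapDemo
{
    // Returns the name of the outermost query method in a query's expression
    // tree, or "(source)" when the tree is just the root query object.
    public static string OutermostCall(IQueryable query)
    {
        var call = query.Expression as MethodCallExpression;
        return call == null ? "(source)" : call.Method.Name;
    }

    public static void Main()
    {
        IQueryable<int> source = new[] { 5, 12, 30 }.AsQueryable();
        IQueryable<int> filtered = source.Where(n => n > 10);
        IQueryable<int> sorted = filtered.OrderBy(n => n);

        Console.WriteLine(OutermostCall(source));   // (source)
        Console.WriteLine(OutermostCall(filtered)); // Where
        Console.WriteLine(OutermostCall(sorted));   // OrderBy
    }
}
```

Nothing is executed while the tree is being built; execution only happens once the query is enumerated or a method such as Single is called.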
re-linq provides an abstract base class, QueryProviderBase, and a default implementation, DefaultQueryProvider, which implement the IQueryProvider interface. Usually, DefaultQueryProvider is completely sufficient, so QueryableBase<T> uses that implementation by default.
While DefaultQueryProvider implements the query creation part of IQueryProvider, it of course cannot pre-implement the actual execution of a query against the target query system. Instead, it does the following:
- First, it parses the query which is to be executed into a QueryModel. That’s a structured, interlinked object model defined by re-linq, which is much easier to understand and to transform than the native LINQ expression trees. If you’re interested in how the parsing works, take a look at the QueryParser class and the expression node parsers.
- Then, it passes the QueryModel on to an implementation of IQueryExecutor.
IQueryExecutor is an interface representing the details of executing a query against a target queryable system. This means it needs to be implemented by you, of course, since you are the one who knows how to build queries for that system.
When you take a look at IQueryExecutor, you can see that it has three methods: ExecuteScalar, ExecuteSingle, and ExecuteCollection.
Let’s start with ExecuteCollection, since that is the simplest of the three methods. Take a look at the following code:
var query = from o in QueryFactory.CreateLinqQuery<Order>()
            where o.OrderNumber > 10
            select o;
foreach (var order in query)
{
  Console.WriteLine (order.OrderNumber);
}
When you execute that code, the query is enumerated and expected to return a collection (or sequence) of items. That’s why IQueryExecutor.ExecuteCollection() is called for that query (at least when the object returned by QueryFactory.CreateLinqQuery<T>() is based on QueryableBase<T>). ExecuteCollection is passed a QueryModel that has exactly one MainFromClause, one WhereClause, and one SelectClause. In short, the QueryModel directly corresponds to the LINQ query written above.
Now, what about ExecuteSingle and ExecuteScalar? Take a look at the following two queries:
var count = (from o in QueryFactory.CreateLinqQuery<Order>()
             where o.OrderNumber > 10
             select o).Count();
var item = (from o in QueryFactory.CreateLinqQuery<Order>()
            where o.OrderNumber > 10
            select o).First();
These two queries are different in that they are not expected to return collections. Instead, they are expected to return scalar, calculated values and single items from the sequence, respectively. Their QueryModels have operators attached to them that represent the calculation or single item selection. re-linq calls those ResultOperators.
The first query has a CountResultOperator, which represents a scalar value calculated from the query’s result sequence, therefore IQueryExecutor.ExecuteScalar is called in order to execute it. Other scalar operators are LongCountResultOperator, ContainsResultOperator, SumResultOperator, and AverageResultOperator.
The second query has a FirstResultOperator, which represents a single item that is selected from the result sequence, therefore IQueryExecutor.ExecuteSingle is called in order to execute it. Other single operators are SingleResultOperator, LastResultOperator, MinResultOperator, and MaxResultOperator. All of those choose a single item from the query sequence, so all of them are treated the same way. Note that even when those operators return a scalar value because the query returns a sequence of scalar values, they still invoke ExecuteSingle because a single item is chosen from the list rather than calculated.
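The difference between "chosen" and "calculated" can be sketched with a toy, in-memory executor. Everything in this sketch (ToyQuery, ToyExecutor, the string-based operator names) is a simplified stand-in invented for illustration; re-linq’s real IQueryExecutor methods receive a QueryModel with result operator objects instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a parsed query. re-linq's real QueryModel holds
// clauses and result operator objects, not a delegate and a string.
public sealed class ToyQuery<T>
{
    public IEnumerable<T> Source;
    public Func<T, bool> Predicate;
    public string ResultOperator; // "Count", "First", "Last", "Max", or null
}

public static class ToyExecutor
{
    // Sequence result: no result operator involved.
    public static IEnumerable<T> ExecuteCollection<T>(ToyQuery<T> query)
    {
        return query.Source.Where(query.Predicate);
    }

    // Single item: one element is chosen from the result sequence.
    public static T ExecuteSingle<T>(ToyQuery<T> query)
    {
        IEnumerable<T> items = ExecuteCollection(query);
        switch (query.ResultOperator)
        {
            case "First": return items.First();
            case "Last": return items.Last();
            case "Max": return items.Max();
            default:
                throw new NotSupportedException("Not a single-item operator: " + query.ResultOperator);
        }
    }

    // Scalar: a value is calculated from the result sequence.
    public static int ExecuteScalar<T>(ToyQuery<T> query)
    {
        if (query.ResultOperator != "Count")
            throw new NotSupportedException("Not a scalar operator: " + query.ResultOperator);
        return ExecuteCollection(query).Count();
    }

    public static void Main()
    {
        var numbers = new[] { 5, 12, 30, 7 };
        var count = new ToyQuery<int> { Source = numbers, Predicate = n => n > 10, ResultOperator = "Count" };
        var max = new ToyQuery<int> { Source = numbers, Predicate = n => n > 10, ResultOperator = "Max" };

        Console.WriteLine(ExecuteScalar(count)); // 2
        Console.WriteLine(ExecuteSingle(max));   // 30
    }
}
```

Max yields an int here, which looks like a scalar, but it goes through ExecuteSingle because the value is picked from the sequence rather than computed from it.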
For many target queryable systems it will be possible to simply implement ExecuteCollection and just delegate to that from ExecuteSingle or ExecuteScalar. For others, it might be important to take note of the semantic differences. Whichever path you follow, you’ll finally have to pose one important question: “How the heck do I create a query in my target system’s format from a QueryModel?”
And the answer is, of course, “That depends on your target system!” :)
However, re-linq gives you two important tools to do so: IQueryModelVisitor and RelinqExpressionVisitor.
The first of those two visitors operates on a large scale: it provides a way to execute specific code for each clause within a QueryModel, allowing you to translate one clause at a time. You can collect the partial results of your translations, and finally make one query for your target system from those parts.
The simplest way to make use of IQueryModelVisitor is to derive from QueryModelVisitorBase. That class implements the interface by automatically iterating over sub-clauses and collections, dispatching to the correct visitor methods for every element of the query. It’s also hardened against modifications of the QueryModel being iterated, but more about this later. Simply override its Visit... methods for the query components you want to handle, and generate your target query parts accordingly. Note that you need to handle all the clauses, result operators, and so on defined by re-linq. If you don’t at least throw an exception for those constructs you simply cannot translate, you’ll get invalid query translations.
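The "translate one clause at a time, collect the parts, assemble at the end" idea can be sketched without re-linq at all. The clause classes and the ToyHqlBuilder below are hypothetical stand-ins that carry ready-made strings; re-linq’s real clauses carry expression trees, and a real visitor would derive from QueryModelVisitorBase instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for query clauses (not re-linq types).
public abstract class ToyClause
{
    public string Text { get; private set; }
    protected ToyClause(string text) { Text = text; }
}
public sealed class FromPart : ToyClause { public FromPart(string text) : base(text) { } }
public sealed class WherePart : ToyClause { public WherePart(string text) : base(text) { } }
public sealed class SelectPart : ToyClause { public SelectPart(string text) : base(text) { } }
public sealed class OrderPart : ToyClause { public OrderPart(string text) : base(text) { } }

public sealed class ToyHqlBuilder
{
    private string _from;
    private string _select;
    private readonly List<string> _wheres = new List<string>();

    public static string Build(IEnumerable<ToyClause> clauses)
    {
        var builder = new ToyHqlBuilder();
        foreach (ToyClause clause in clauses)
            builder.Visit(clause);
        return builder.Assemble();
    }

    // One branch per clause kind; anything unknown fails loudly instead of
    // being silently dropped from the generated query.
    private void Visit(ToyClause clause)
    {
        if (clause is FromPart) _from = clause.Text;
        else if (clause is WherePart) _wheres.Add(clause.Text);
        else if (clause is SelectPart) _select = clause.Text;
        else throw new NotSupportedException("Cannot translate " + clause.GetType().Name);
    }

    private string Assemble()
    {
        string hql = "select " + _select + " from " + _from;
        if (_wheres.Count > 0)
            hql += " where " + string.Join(" and ", _wheres.ToArray());
        return hql;
    }

    public static void Main()
    {
        var clauses = new ToyClause[]
        {
            new FromPart("Order o"), new WherePart("o.OrderNumber > 10"), new SelectPart("o")
        };
        Console.WriteLine(Build(clauses)); // select o from Order o where o.OrderNumber > 10
    }
}
```

The OrderPart class is deliberately left unhandled: passing one to Build throws a NotSupportedException, which is the behavior the note above asks for.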
While you’re visiting the clauses and result operators, you’ll notice that some of them contain LINQ Expressions. For example, WhereClause.Predicate contains an Expression, SelectClause.Selector does, and even MainFromClause.FromExpression is an expression tree. Now, haven’t I said earlier that LINQ expressions are inherently complex and hard to understand?
They are, but the expressions you can find in re-linq’s clauses have already been simplified. In them,
- references to outer variables (closures) and other evaluatable expressions have already been pre-evaluated into constants,
- sub-queries have been parsed and replaced by QueryModels wrapped in SubQueryExpressions, and, most importantly,
- transparent identifiers have been removed and references to query sources (from clauses, joins) have been replaced by QuerySourceReferenceExpressions, which link back to the respective query source.
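To see why the first of these simplifications matters, here is a minimal sketch of what a captured variable looks like in a raw expression tree and how it can be folded into a constant. It uses the ExpressionVisitor class from the .NET base class library (available since .NET 4), not re-linq’s own partial evaluator, and the names Order, ClosureEvaluator, and ClosureDemo are assumptions made for this example. It only handles the simplest case, a member read directly from a captured constant.

```csharp
using System;
using System.Linq.Expressions;

public class Order
{
    public int OrderNumber { get; set; }
}

// Replaces "member of a captured constant" nodes with plain constants.
public sealed class ClosureEvaluator : ExpressionVisitor
{
    protected override Expression VisitMember(MemberExpression node)
    {
        if (node.Expression is ConstantExpression)
        {
            // Compile and run just this sub-tree to obtain its current value.
            object value = Expression.Lambda(node).Compile().DynamicInvoke();
            return Expression.Constant(value, node.Type);
        }
        return base.VisitMember(node);
    }
}

public static class ClosureDemo
{
    public static LambdaExpression Simplify(LambdaExpression predicate)
    {
        return (LambdaExpression) new ClosureEvaluator().Visit(predicate);
    }

    public static void Main()
    {
        int limit = 10;
        Expression<Func<Order, bool>> predicate = o => o.OrderNumber > limit;

        var before = (BinaryExpression) predicate.Body;
        var after = (BinaryExpression) Simplify(predicate).Body;

        Console.WriteLine(before.Right.NodeType); // MemberAccess
        Console.WriteLine(after.Right.NodeType);  // Constant
    }
}
```

Returning a new node from a Visit method is enough; the base visitor rebuilds the parent nodes around it.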
Therefore, the expressions you find in re-linq’s clauses are usually quite straight-forward to translate to the target query system. Depending on the target query system, of course.
To implement the translation of expressions, you derive a class from ExpressionTreeVisitor or, better, ThrowingExpressionTreeVisitor. Both of them are meant to iterate over an expression tree and to visit each of the nodes in the tree, but ThrowingExpressionTreeVisitor throws an exception for unsupported node types by default.
Simply override the Visit... methods for those node types you want to support, and generate a semantically equivalent query element for your target query system. Then, from your IQueryModelVisitor, take the elements and integrate them into the current query part.
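As a rough sketch of such a translating visitor, the following class turns a small predicate into query text and throws for every node type it does not know. Note the assumptions: it derives from the base class library’s ExpressionVisitor as a stand-in for re-linq’s ThrowingExpressionTreeVisitor, the output format is invented, and Order and PredicateToText are hypothetical names.

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

public class Order
{
    public int OrderNumber { get; set; }
}

public sealed class PredicateToText : ExpressionVisitor
{
    private readonly StringBuilder _text = new StringBuilder();

    public static string Translate(Expression expression)
    {
        var visitor = new PredicateToText();
        visitor.Visit(expression);
        return visitor._text.ToString();
    }

    // Only explicitly listed node types are translated; everything else throws.
    public override Expression Visit(Expression node)
    {
        if (node == null) return null;
        switch (node.NodeType)
        {
            case ExpressionType.Lambda:
            case ExpressionType.GreaterThan:
            case ExpressionType.AndAlso:
            case ExpressionType.MemberAccess:
            case ExpressionType.Constant:
            case ExpressionType.Parameter:
                return base.Visit(node);
            default:
                throw new NotSupportedException("Cannot translate node type " + node.NodeType);
        }
    }

    protected override Expression VisitLambda<T>(Expression<T> node)
    {
        Visit(node.Body); // only the body ends up in the query text
        return node;
    }

    protected override Expression VisitBinary(BinaryExpression node)
    {
        _text.Append("(");
        Visit(node.Left);
        _text.Append(node.NodeType == ExpressionType.AndAlso ? " and " : " > ");
        Visit(node.Right);
        _text.Append(")");
        return node;
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        Visit(node.Expression);
        _text.Append("." + node.Member.Name);
        return node;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        _text.Append(node.Name);
        return node;
    }

    protected override Expression VisitConstant(ConstantExpression node)
    {
        _text.Append(node.Value);
        return node;
    }

    public static void Main()
    {
        Expression<Func<Order, bool>> predicate = o => o.OrderNumber > 10;
        Console.WriteLine(Translate(predicate)); // (o.OrderNumber > 10)
    }
}
```

A predicate such as o => o.OrderNumber == 10 hits the default branch and fails with a NotSupportedException, which is usually preferable to emitting a wrong query.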
All of this works fine. Unless, of course, you encounter a construct that’s just way incompatible with your target query system. What now, throw a NotSupportedException? Realistically, you’ll have to do that, sometimes. But in other cases, it would actually be possible to support some of these constructs, although you’d have to simulate them using other query mechanisms… somehow…
For example, your target query system might not support sub-queries in from clauses. But sometimes, sub-queries in from clauses can be flattened, thus turning the unsupported query into a supported one.
Or, in other scenarios, you might want to move a Where clause from one side of a join to the other side in order to avoid creating a dependent sub-query. Or you might want to detect group clauses with aggregates if those are well-translatable into your target query system.
While re-linq does not – and cannot – pre-implement all conceivable query model transformations, it does provide a lot of infrastructural support for them. Here’s a list of what we do in order to make transformations less difficult:
- Apart from QuerySourceReferenceExpressions, there are no ordering dependencies between clauses in a QueryModel. You can simply remove clauses from the model, move them around, or insert new ones without any problems. Only when there are QuerySourceReferenceExpressions that reference those clauses do you need to be more careful. Usually, referenced query sources must stay in the query, prior to the point where they are referenced, or the references must be updated (see below).
- All properties of clauses are settable, i.e. it’s easy to replace a WhereClause’s predicate or change an AdditionalFromClause’s item name.
- If both the original and the transformed QueryModel must be retained, the QueryModel.Clone() method provides a simple way of generating a deep copy (including clones of all query elements) of the QueryModel before it is transformed.
- QueryModel.TransformExpressions() provides an easy-to-use mechanism to transform all expressions held by a query model in one go.
- ReferenceReplacingExpressionTreeVisitor provides an easy-to-use mechanism to replace references to query sources after they were modified or removed, even across sub-queries. Use it in combination with QueryModel.TransformExpressions() whenever replacing a query source or moving a clause from one QueryModel to another.
- ExpressionTreeVisitor supports custom modification of the expression tree being visited. Simply return new nodes from any of its Visit... methods, and ExpressionTreeVisitor will automatically create an expression tree containing your new nodes.
- QueryModelVisitorBase is hardened against changes made to the QueryModel while it is being visited. This means that from any QueryModelVisitorBase.Visit... method, you can modify any element of the QueryModel without having to fear exceptions because you’ve just modified a collection being iterated.
- Whenever you need to get information about the data produced by a QueryModel or a result operator, you can use the GetOutputDataInfo() methods to calculate the kind (single item, scalar value, sequence) and type of the data being returned.
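To illustrate the problem that the hardening of QueryModelVisitorBase solves, here is a deliberately simple sketch that iterates over a snapshot of a list so that the callback can change the live list. This is one common way to get that effect and is an assumption for illustration only; it is not a description of how re-linq implements it, and SnapshotIteration is a made-up name.

```csharp
using System;
using System.Collections.Generic;

public static class SnapshotIteration
{
    // Visits every item that was in the list when the call started. The
    // callback may add to or remove from the live list without breaking the loop.
    public static void VisitAll<T>(IList<T> items, Action<T, IList<T>> visit)
    {
        var snapshot = new List<T>(items);
        foreach (T item in snapshot)
            visit(item, items);
    }

    public static void Main()
    {
        var clauses = new List<string> { "where a", "where true", "where b" };

        VisitAll(clauses, (clause, live) =>
        {
            if (clause == "where true")
                live.Remove(clause); // would throw inside a plain foreach over 'clauses'
        });

        Console.WriteLine(string.Join(", ", clauses.ToArray())); // where a, where b
    }
}
```

A snapshot still visits items that were removed earlier in the same pass, so a transformation written this way has to tolerate that.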
Last, but not least, you may also run into situations where you’d like to have support for a certain feature that is not supported by re-linq or even LINQ. It happens quite often that LINQ providers define their own, target system-specific query methods; for example to implement full-text querying or query hinting.
For such scenarios, re-linq provides options on several levels. On the query method level, you can implement a custom IExpressionNode parser class. These classes are used to analyze the structure of a LINQ expression tree and to build the QueryModel corresponding to that tree. To make use of this extension point, derive from the MethodCallExpressionNodeBase or ResultOperatorExpressionNodeBase classes, depending on your scenario. Then, create a MethodCallExpressionNodeTypeRegistry instance and register your new parser classes. Pass that registry to the DefaultQueryProvider from your QueryableBase<T> implementation.
On the QueryModel level, you can provide custom IBodyClause implementations, derive from MainFromClause and SelectClause, or subclass ResultOperatorBase. How you integrate them into the QueryModel depends on your use case, but most often, you’ll integrate them from your expression node parser’s (see above) Apply methods.
Now, this text, which has turned out to become more an article than a blog post, has given a short overview about the concepts and features of re-linq and how to use them when writing a LINQ provider.
All the options provided by re-linq may seem a little overwhelming, but actually, re-linq is quite straightforward. A basic LINQ provider only needs to implement a few interfaces to start with, as well as two visitors: one for the QueryModel, one for the expression trees. Sample code for this can be found at the Linq 2 HQL repository; the sample builds a LINQ provider for the open-source O/R mapper NHibernate based on the query language HQL.
As the LINQ provider evolves, it will need to support queries that are more difficult to translate to the target system, so it will start using query transformations. Transformations are incremental, so you can add new transformations on a feature-by-feature basis. Sophisticated LINQ providers will also want to provide their own query methods in addition to the standard query operators, and again, re-linq supports this in an incremental fashion.
All in all, I’m quite proud of re-linq’s architecture; I think we’ve managed to build a robust piece of framework code with great utility. So, as I said in part I:
Are you planning to write a LINQ provider? Try re-linq – it’s open-source (LGPL) – and it will save you a lot of headaches.
- Fabian
Since Fabian is off for two weeks of well-deserved vacation, I’ll just post the linq, er, link, to the code sample that matches this post:
Fantastic stuff, been poring over the hql sample for the last few nights. Question: how would the IQueryExecutor materialize to anonymous types via query projections?
Ray,
Depending on whether you had a single query (Single, First, Last, Min, Max) or a collection query, ExecuteSingle<T> or ExecuteCollection<T>, respectively, would be called with T being the anonymous type. In your SelectClause’s Selector, there’ll be a NewExpression (IIRC) that constructs the custom query.
For handling this within a specific LINQ provider, it doesn’t really matter whether the query projection constructs an anonymous type or an explicit constructor call, you’d handle both the same way.
You’d use an ExpressionTreeVisitor to analyze the Selector and to generate the actual projection in your target query language (e.g. SQL). At the same time, you’d construct a LambdaExpression that can take the result of your generated target query, pull out the required data, and put it into the right places in the constructor call.
Because that is rather abstract, I’ll try to write a blog post detailing the implementation some time next week.
Fabian,
Good to hear that it just works. Will you also be addressing usage of the ‘backend’ infrastructure for sql generation? Related question- unless I’m missing something, both the SqlServer generation and the hql backend-generator don’t support grouping? If so, what’s the deal with that?
The backend is currently not a good place to look at for re-linq examples: it’s based on an old implementation of the QueryModel which was a lot more constrained and had fewer features.
It’s planned to rewrite the SQL-generating backend some time to form a better example of how to use re-linq, but I’m not sure about when we (= rubicon) will be able to schedule this. Once it’s done, I’ll of course blog about it.
About grouping support: re-linq does support grouping, of course, but LINQ-style grouping is much different from SQL-style grouping. If you take a look at Queryable.GroupBy (or the C# group keyword), you’ll see that it returns an IGrouping<TKey, TElement>, which is very hard to translate to SQL or HQL.
I guess Steve Strong, who’s currently implementing the real NH LINQ provider based on re-linq (http://blogs.imeta.co.uk/sstrong/archive/2009/09/15/756.aspx) will only implement a subset of grouping at first, where the IGrouping stuff is never enumerated, but only accessed via aggregate functions or via its Key property. This is then easily translatable to SQL or HQL (see How to support “group into” with aggregates about this). The rest of the grouping functionality can either be executed in memory (re-linq supports this) – or simply throw a NotSupportedException.
About the sample HQL provider: it was just out of scope for the CodeProject sample.
OK, I definitely get the complexities of grouping. It just seems like, if one’s goal was to leverage re-linq for the purposes of generating sql, that using the backend stuff would be where you start, instead of starting at the querymodel and expression visitors. Because starting with nothing but visitors puts you more or less where you would be if you weren’t using re-linq to begin with, defeating much of the whole purpose. Just trying to grasp how you would get beyond the Frans Bouma ‘toy’ scenario to full on sql-generating linq provider, that’s all. I’ll stay tuned.
Ray,
re-linq is much more than just a couple of visitors, even without its SQL-generating backend. I’ve tried to explain this in this post: More Than a Couple of Visitors.
BTW, we’ve a new Google Group dedicated to questions about re-motion, including re-linq: http://groups.google.com/group/re-motion-users.
Can you provide some samples for ‘Writing custom extensions’ ?
Greetings