Cached and NotCached should return IEnumerable instead of IQueryable

Oct 16, 2014 at 9:38 AM
Let's take the following example:
var person = dbContext.People.Cached().FirstOrDefault(p => p.Name == "Joe");
It compiles fine and returns the first person named Joe; however, the result will never be cached, so it can never be retrieved from the cache either.

This is because applying Cached (the same goes for NotCached) means "cache the result of the query to the left of the Cached extension method". It behaves this way because the cache key will be the SQL generated from the query to the left of the Cached call.

However, query composition can continue even after the Cached extension method, because Cached returns an IQueryable. In the example above, the real query that will be sent to the database as SQL is the whole dbContext.People.FirstOrDefault(p => p.Name == "Joe"), which will contain the WHERE clause as well.

Since the cache key will never be equal to the SQL that is being sent to the database, the result will never be cached.
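A quick way to see the mismatch (a minimal sketch; it assumes an EF6 DbContext with a People DbSet, and relies on the fact that calling ToString() on a DbSet/DbQuery in EF6 returns the generated SQL):
var markedPart = dbContext.People;                                 // the part Cached() sees - its SQL becomes the cache key
var executedPart = dbContext.People.Where(p => p.Name == "Joe");  // FirstOrDefault additionally adds TOP(1) at execution time
Console.WriteLine(markedPart.ToString());    // SQL without the WHERE clause
Console.WriteLine(executedPart.ToString());  // SQL with the WHERE clause - a different string, so no cache hit ever occurs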

This can be solved by changing the return type: instead of returning an IQueryable:
public static IQueryable<T> Cached<T>(this IQueryable<T> source)
...

public static IQueryable<T> NotCached<T>(this IQueryable<T> source)
...
the methods would return an IEnumerable:
public static IEnumerable<T> Cached<T>(this IQueryable<T> source)
...

public static IEnumerable<T> NotCached<T>(this IQueryable<T> source)
...
This way Cached would become a strict boundary separating the "query that is being sent to the database and that is also being cached" from "that part of the query that is applied to the (probably already) cached result".

Thus the following would work properly:
var person = dbContext.People.Cached().FirstOrDefault(p => p.Name == "Joe");
by retrieving and caching all the people from the database and then applying the in-memory version of FirstOrDefault to the list.

If we want to cache only the people who are called "Joe" (e.g. because there are too many people in our database), then we can write this:
var person = dbContext.People.Where(p => p.Name == "Joe").Cached().FirstOrDefault();
thus the in-memory version of FirstOrDefault will be applied to a much smaller list.
Coordinator
Oct 26, 2014 at 4:48 PM
If I am reading your proposal correctly, the change is to enforce query execution in the Cached()\NotCached() methods. However, the purpose of the Cached()\NotCached() methods is merely to mark queries that should be cached (not cached), and this does not have to be tied to query execution (even though in most cases it probably will be). From this perspective returning IQueryable<T> and not IEnumerable<T> is fine. Yes, you need to know that Cached()\NotCached() applies to what's on the left side, but I think this is how Linq works in general.
Another reason I don't want to change the current behavior is that I don't want to add a method to the set of well-known methods that send a query to the database. That can be a really costly operation, and I feel that in case of misuse it might be better not to cache the results than to query the database.

Note that the pattern you proposed:
 var person = dbContext.People.Where(p => p.Name == "Joe").Cached().FirstOrDefault();
should work with the way things are done now.

If you need this functionality, however, you could write your own Cached()\NotCached()-like extension methods which would return IEnumerable<T> and would just call the existing Cached()\NotCached() methods and do .ToArray() on the result.
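A minimal sketch of what such wrappers could look like (the CachedNow/NotCachedNow names are made up for illustration; it assumes the existing EFCache Cached()/NotCached() extension methods are in scope):
using System.Collections.Generic;
using System.Linq;

public static class EagerCachingExtensions
{
    // Marks the query via the existing Cached() method, then materializes it,
    // so nothing can be composed onto the query after the caching boundary.
    public static IEnumerable<T> CachedNow<T>(this IQueryable<T> source) where T : class
    {
        return source.Cached().ToArray();
    }

    public static IEnumerable<T> NotCachedNow<T>(this IQueryable<T> source) where T : class
    {
        return source.NotCached().ToArray();
    }
}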

Thanks,
Pawel
Oct 27, 2014 at 11:52 AM
Edited Oct 27, 2014 at 11:54 AM
I think you have misunderstood me, probably because I wasn't clear enough.

I have not proposed instant query execution (i.e. eager evaluation) inside the Cached method. Actually, my Cached method differs from yours only in the return type; the implementation is otherwise identical:
        public static IEnumerable<T> Cached<T>(this IQueryable<T> source)
            where T : class
        {
            if (source == null)
            {
                throw new ArgumentNullException("source");
            }

            var objectQuery = TryGetObjectQuery(source) ?? source as ObjectQuery;

            if (objectQuery != null)
            {
                AlwaysCachedQueriesRegistrar.Instance.AddCachedQuery(
                    objectQuery.Context.MetadataWorkspace, objectQuery.ToTraceString());
            }

            return source;
        }
So there are no ToArray/ToList etc. calls inside Cached.

The Cached method would remain what it currently is in your version: it marks queries that should be cached. And of course it marks the part of the query that resides on its left side.

However, by changing the return type from IQueryable to IEnumerable we would end the query composition at the call of the Cached method, thus ensuring that the query sent to the database is exactly the query we have marked as to-be-cached. And that is exactly what we want. I cannot imagine any other scenario.

Note that in the following code it is not the Cached method that causes the query execution, but the FirstOrDefault method - which behaves this way regardless of caching:
Person person = dbContext.People.Cached().FirstOrDefault(p => p.Name == "Joe"); // query execution will happen here because of FirstOrDefault
Person person2 = dbContext.People.FirstOrDefault(p => p.Name == "Joe"); // query execution will happen here because of FirstOrDefault
Using the Cached method itself will not cause query execution, due to the lazy evaluation achieved by the combination of IQueryable and IEnumerable queries:
IEnumerable<Person> people = dbContext.People.Cached().Where(person => person.Name == "Joe"); // query execution will NOT happen here
IEnumerable<Person> people2 = dbContext.People.Where(person => person.Name == "Joe").Cached(); // query execution will NOT happen here
IQueryable<Person> people3 = dbContext.People.Where(person => person.Name == "Joe"); // query execution will NOT happen here
These queries will be evaluated only if we ask them to yield their elements by calling ToArray, ToList or by enumerating over their elements by foreach, etc.

In case of people3 all of the query (including that Where clause) will be sent to the database (when we force it to evaluate). No caching here.

In case of people2 all of the query (including that Where clause) will be sent to the database (when we force it to evaluate), and will be cached as well, since all of the query resides on the left of the Cached method.

In case of people, only part of the query (excluding the Where clause) will be sent to the database (when we force it to evaluate), and it will be cached as well, since the Where clause resides on the right of the Cached method. The Where part of the query will be evaluated in-memory rather than by the database server, but regardless of this the query remains lazily evaluated.

I know that the code snippet you have quoted from me works properly by using the original version of the Cached method:
// USING THE ORIGINAL VERSION OF CACHED METHOD
var person = dbContext.People.Where(p => p.Name == "Joe").Cached().FirstOrDefault(); // will be cached properly
However, the fact that this query can be written in an alternative form - one that will not be cached properly - makes the usage of the original Cached method very error-prone:
// USING THE ORIGINAL VERSION OF CACHED METHOD
var person = dbContext.People.Cached().FirstOrDefault(p => p.Name == "Joe"); // will NOT be cached properly
My proposal avoids this kind of error by defining a strict, guaranteed-by-type (IQueryable vs IEnumerable) boundary between the cached and non-cached parts of the query, thus ensuring proper caching while preserving the laziness of the query evaluation.

Thanks,
David
Oct 28, 2014 at 10:05 AM
I think I was wrong in one thing: although I do think that the return type of Cached should be changed from IQueryable to IEnumerable, this is not true for NotCached which should stay as it is now (i.e. IQueryable).

This is because query composition should be allowed to continue after NotCached, so that the whole query can be sent to the database when the query is evaluated. NotCached is really just a marker for a given query (which might be a part of the whole query) that would otherwise be cached according to the CachingPolicy, but which we want to exclude from caching.
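For example (a sketch of the intended behavior, reusing the entity names from above):
// Composition continues after NotCached(): the whole query, including the Where clause,
// is still translated to SQL and executed on the database server when evaluated.
// NotCached() itself only marks the part to its left (dbContext.People here).
IQueryable<Person> notCachedJoes = dbContext.People.NotCached().Where(p => p.Name == "Joe");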

So none of the problems I mentioned for Cached emerge for NotCached.
Coordinator
Dec 15, 2014 at 2:59 AM
Hi David,

Sorry for not getting back to you sooner. I decided that I don't want to change the current behavior. There are two main reasons for this. First, it would be a breaking change: existing projects using .Cached() may no longer compile. Secondly, in case of the following query:
dbContext.People.Cached().Where(person => person.Name == "Joe");
what happens today is that the results are just not cached. After changing the .Cached() method as you propose, instead of filtering the results on the database side, EF would bring all the entities for the given entity set (potentially thousands of rows) and they would be filtered locally on the client. This could have a very severe impact on performance (transferring all the data over the wire and materializing entities can get very costly) and memory use (all the data would first be cached by EFCache and then cached again, in the form of entities, by EF in the ObjectStateManager for the lifetime of the context). Not caching a single entity therefore has more benign consequences than fetching and caching half of the database twice.
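To illustrate the second point (a sketch of what the proposed IEnumerable-returning Cached() would do for the query above; entity names as above):
// With Cached() returning IEnumerable<T>, the Where below binds to Enumerable.Where, not Queryable.Where.
// When the query is enumerated, EF would execute the query for the whole People set (no WHERE clause in the SQL),
// materialize every row, and the Name == "Joe" filter would then run in memory on the client.
IEnumerable<Person> joes = dbContext.People.Cached().Where(p => p.Name == "Joe");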

Thanks,
Pawel
Feb 8, 2015 at 4:16 PM
Hi Pawel,

the problem with NotCached is more severe, since results that are supposed to be blacklisted get cached anyway. In my case it was a record updated by an external process to trigger a routine. Since NotCached uses a cache key SQL which doesn't contain the TOP(1) bit inserted by the FirstOrDefault call, when the query is actually executed and the results are handled by EFCache, no matching blacklist entry is found, and the complete query and its results are cached. As a result the trigger fails, since subsequent calls get the results from the cache.

I don't know the internals of EF or the hook points EF provides, but if there is a way to evaluate the query lazily, obtain the complete SQL and use it as the cache key, that is what's needed to make NotCached work as it's supposed to. Otherwise it's safer to use the caching policy.

Thanks for the much needed package by the way!
Coordinator
Feb 9, 2015 at 5:19 AM
@kassobasi - can you show me sample code? I want to make sure I understand your scenario.
Feb 9, 2015 at 12:07 PM
Sure. The customer uploads a huge amount of data to our database to be processed weekly. When the upload is done, they create a record in the BatchRuns table with status Queued. A windows service polls the table every other minute to catch the upload-done record and starts processing the data.

When using FirstOrDefault() directly like this:
var entity = _context.BatchRuns.Where(x => x.Status == (int)StatusEnum.Queued).OrderBy(x => x.Id).NotCached().FirstOrDefault();
The SQL query registered in the blacklist is something like this:
SELECT 
   Extend1.Id,
   Extend1.Status
FROM BatchRuns Extend1
WHERE Extend1.Status = 0
ORDER BY Extend1.Id
However, when FirstOrDefault is executed and the caching handler tries to find out whether the results should be cached or not, IsQueryBlacklisted is called with a different query:
SELECT TOP(1)
   Extend1.Id,
   Extend1.Status
FROM BatchRuns Extend1
WHERE Extend1.Status = 0
ORDER BY Extend1.Id
Hence, the results are cached under a key containing TOP(1). If the first call is a hit - that is, a new batch run is queued - data processing changes its status to Running and then to Complete. In this case, since the change in the entity set invalidates the cached entry, it works fine. However, if the first call is a miss, then nothing invalidates the cache, and the cache returns no results in the next call even after a new record is created in BatchRuns.

I don't want to mess around with strings in CachingPolicy, and I use it only to prevent caching for function calls and views. So my workaround is to get all the records and then use FirstOrDefault to pick the next batch to process:
var query = _context.BatchRuns.Where(x => x.Status == (int)StatusEnum.Queued).OrderBy(x => x.Id).NotCached();
var entity = query.ToArray().FirstOrDefault();
It's ok for my situation to use it this way, since the table won't grow huge and in this particular query one or no rows are expected. But unlike what you said earlier in the discussion about Cached, this is not only a performance impact: NotCached simply does not work with FirstOrDefault.
Coordinator
Feb 10, 2015 at 6:40 AM
Edited Feb 10, 2015 at 6:41 AM
Hi kassobasi,

The issue here is that FirstOrDefault() does not return a query that (possibly) will be executed in the future (i.e. an IQueryable) but the actual result of a query that has already been executed. Changing the extension method as proposed by the OP won't help. The idea is that the .Cached()/.NotCached() extension methods apply to the portion of the query that is to the left of the .Cached()/.NotCached() call. (This is very similar and tightly coupled to how the .ToString() method in EF works - see below.) In your case:
var entity = _context.BatchRuns.Where(x => x.Status == (int)StatusEnum.Queued).OrderBy(x => x.Id).NotCached().FirstOrDefault();
NotCached applies to the _context.BatchRuns.Where(x => x.Status == (int)StatusEnum.Queued).OrderBy(x => x.Id) part, which is exactly equivalent to the SQL query you posted as blacklisted.

I think your issue can be worked around in one of three ways:

1) You could use a slightly different query. Instead of using .FirstOrDefault() you could try using Take(1). Take() returns IQueryable so you can do .Cached()/.NotCached() on it. In this case, even if you do .ToArray().FirstOrDefault() on the not-cached result, you will not incur the cost of materializing and filtering results on the client, because Take(x) should be translated to TOP x on the server side. The drawback is that you need to remember not to use .FirstOrDefault() directly, which can be hard to enforce since this is the most straightforward way of doing this. (See the sketch after this list.)

2) Use a caching policy. I know you did not want to mess with it, but I am not convinced that the "CacheAll" policy was really a good choice as the default caching policy. I think in real life it is a very rare case where you want to cache results for each and every query sent to the database.

3) (Which might be the most suitable in your case.) The Cached()\NotCached() methods are really just syntactic sugar. What they do is just invoke .ToString() on the portion of the query to the left of the Cached()\NotCached() call and add the result (which is a SQL query) to the appropriate registrar. However, the registrars are public and you can add any arbitrary SQL query to them yourself. Just call BlacklistedQueriesRegistrar.Instance.AddBlacklistedQuery(), provide the metadata and your query, and I think that should fix your issue. Note that the query has to be an exact match to what EF would produce. (See the sketch after this list.)
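A rough sketch of options 1) and 3) (assumptions: the BatchRuns query from above; that ToString() on the query returns the generated SQL, as it does in EF6; and that AddBlacklistedQuery takes a MetadataWorkspace plus the exact SQL string, mirroring the AddCachedQuery call shown earlier - treat this as illustrative, not verbatim API usage):
// Option 1: keep TOP(1) inside the part that NotCached() sees, so the blacklist key
// matches the SQL that is actually executed.
var entity = _context.BatchRuns
    .Where(x => x.Status == (int)StatusEnum.Queued)
    .OrderBy(x => x.Id)
    .Take(1)         // translated to TOP(1) on the server
    .NotCached()     // blacklists the SQL including TOP(1)
    .ToArray()       // executes the (at most one-row) query
    .FirstOrDefault();

// Option 3: register the exact SQL yourself instead of calling NotCached().
// The registered SQL must match exactly what EF will execute.
var query = _context.BatchRuns
    .Where(x => x.Status == (int)StatusEnum.Queued)
    .OrderBy(x => x.Id)
    .Take(1);
var objectContext = ((IObjectContextAdapter)_context).ObjectContext;
BlacklistedQueriesRegistrar.Instance.AddBlacklistedQuery(
    objectContext.MetadataWorkspace, query.ToString());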

Hope this helps,
Pawel
Feb 10, 2015 at 2:42 PM
Edited Feb 10, 2015 at 2:42 PM
Hi Pawel,

thanks for your reply. I didn't know that I could use the registrars publicly (apparently I didn't pay attention when I read your blog post). Now that I think of it, I wouldn't want to register strings I produce there either.

As for Take(1), it's not only FirstOrDefault; you would have to use it for Max/Min with ordering as well. And how would you do Count()?

I didn't take Cached/NotCached as syntactic sugar but - due to my lack of knowledge - as a complete alternative to CachingPolicy for hand-picking queries to cache. Now, with this limitation, the only viable option for me is to do all the work in the CachingPolicy (option 1 is hard to enforce as you said, and option 3 is redundant and more error-prone than CachingPolicy). I wanted to avoid that because I didn't want to express the query logic in two separate locations or risk caching problems due to a typo in the strings, but I guess I can cover most of the queries using the context classes' type names, as in the sketch below.
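A minimal sketch of that approach (it assumes EFCache's CachingPolicy exposes an overridable CanBeCached that receives the entity sets affected by a query; the class name and the set of excluded entity sets are mine, and the exact signature may differ):
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Data.Entity.Core.Metadata.Edm;
using System.Linq;

public class SelectiveCachingPolicy : CachingPolicy
{
    // Entity sets that must never be cached, picked by set name instead of raw SQL strings.
    private static readonly string[] _nonCachedSets = { "BatchRuns" };

    protected override bool CanBeCached(
        ReadOnlyCollection<EntitySetBase> affectedEntitySets,
        string sql,
        IEnumerable<KeyValuePair<string, object>> parameters)
    {
        if (affectedEntitySets.Any(s => _nonCachedSets.Contains(s.Name)))
        {
            return false;
        }

        return base.CanBeCached(affectedEntitySets, sql, parameters);
    }
}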

I'm not suggesting that changing the return type to IEnumerable would solve this problem. The only safe way I can see for Cached/NotCached to work as I expected would be to hook into the IQueryable translation (or some hook after the translation happens) to get the final SQL, rather than the SQL for the part to the left of the Cached/NotCached call. That seems not to be possible with EF 6.1.

Thanks.