Aggregate deeply nested JSONField data in Django

Written by on October 24, 2018, 6:19 am

I ran across this problem the other day and it seems there aren't useful answers on this particular issue.

A combination of Django and Postgres makes it very easy to persist JSON data in your models. I think generally when building an application data schema storing JSON data is usually a bad idea. In fact this post and its partial solution is an example of issues you can have expecting JSON fields to behave predictably (in ORM terms).

In this case I wanted to store responses from a third party API and didn't think I would need to query data in those responses (hence no formal schema for the values within). Of course a requirement eventuated that required querying and then aggregation on values in the JSON I was storing and that is where I hit the problem described below.

(N.B. As I write this, with some hindsight, I think the correct solution is actually to migrate the values out of the JSON and store them in dedictated fields in the model, but anyway).

So if you want to filter by deeply nested values in a JSONField that part is fine. It actually works nicely with query_set syntax that matches doing joins across reverse relationships or foreign keys.

class ThingResponse(TimeStampedModel):
user = models.ForeignKey(User, on_delete=models.PROTECT, related_name="thing_responses")
    data = JSONField(_('The actual response from thing'))

If an example data looks like:

{
  "house": {
    "name": "Lannister",
    "seat": "Casterly Rock",
    "scion": "Tywin",
    "heir": "Jaime",
    "chapter_introduced": 5
  },
  "name" : "Tyrion Lannister",
  "age" : 24
}

and you want to find all the Lannisters, you could write a queryset like this:

ThingResponse.objects.filter(data__house__name="Lannister")

Things get tricky when you want to aggregate data in the JSONField. Aggregating on a value at the top level is fine (as described in this SO answer):

from django.db.models.functions import Cast
from django.db.models import Min, IntegerField
from django.contrib.postgres.fields.jsonb import KeyTextTransform

ThingResponse.objects.annotate(age_value=Cast(KeyTextTransform('age', 'data'), IntegerField())).aggregate(Min('age_value'))

You can annotate the queryset with nested values by nesting the KeyTextTransform like this:

ThingResponse.objects.annotate(chapter=Cast(KeyTextTransform('chapter_introduced', KeyTextTransform('house', 'data')), IntegerField()))

So this works fine and will annotate your result with a 'chapter' property with the nested value in it. Problem happens when you try and aggregate on that nested value:

ThingResponse.objects.annotate(chapter=Cast(KeyTextTransform('chapter_introduced', KeyTextTransform('house', 'data')), IntegerField())).aggregate(Min('chapter_introduced'))

You'll get an error like this:

File "[env path]/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 128, in get_group_by if (sql, tuple(params)) not in seen: TypeError: unhashable type: 'list'

I think what is happening here is the KeyTextTransform hasn't been resolved by this time so instead of a value it is getting a list of arguments.

So this is as far as I got with this and ended up implementing the aggregation in memory. If you do have a similar issue and find a solution to the issue with aggregating on nested JSON values I would love to hear from you.

Permalink - Tags: Development