Turning rows into nested objects
Beating object-relational impedance mismatch with Postgres
The data representation when querying a relational database often doesn’t match the representation we use inside our programming language of choice.
This challenge is called Object-relational impedance mismatch, and is apparent when we want to translate denormalised data in rows to nested objects and arrays, especially when joining across tables in a one-to-many or many-to-many scenario.
Fortunately, Postgres provides LATERAL subqueries coupled with native JSON processing to make this easy!
Don’t blindly trust the recommendations of a stranger on the internet!
If you are taking this approach - especially in your application’s hot path - then I’d suggest that you benchmark it against the alternatives (ie. returning joined rows and processing in the app).
I personally rely on this method when doing any investigation, especially when
coupled with jq
.
JOINing in
Given the following DDL the creates 3 tables with a ‘many-to-many’ relationship:
Which we seed this with some data:
Now we can query these tables using regular join
s to attach tags to posts:
Which results in the following…
id | name | content | post_id | tag_id | id | name |
---|---|---|---|---|---|---|
bc963376-b5c4-4cf4-9705-999abd515c96 | Chasing the Gopher | … | bc963376-b5c4-4cf4-9705-999abd515c96 | a64b2f87-504d-4f3e-b989-916ffc0a507b | a64b2f87-504d-4f3e-b989-916ffc0a507b | statically-typed |
fd154bb5-d112-4a54-845e-bdc57d72ab08 | Feeling Rusty | … | fd154bb5-d112-4a54-845e-bdc57d72ab08 | a64b2f87-504d-4f3e-b989-916ffc0a507b | a64b2f87-504d-4f3e-b989-916ffc0a507b | statically-typed |
ceb9082f-eb92-4cc6-b0f5-3f358ff71bd0 | Taming the Python | … | ceb9082f-eb92-4cc6-b0f5-3f358ff71bd0 | 31fad421-385a-484c-ad78-72dc4b9afaba | 31fad421-385a-484c-ad78-72dc4b9afaba | dynamically-typed |
bc963376-b5c4-4cf4-9705-999abd515c96 | Chasing the Gopher | … | bc963376-b5c4-4cf4-9705-999abd515c96 | dde3159b-f7bf-46e0-b096-a9ec123bec90 | dde3159b-f7bf-46e0-b096-a9ec123bec90 | garbage-collected |
ceb9082f-eb92-4cc6-b0f5-3f358ff71bd0 | Taming the Python | … | ceb9082f-eb92-4cc6-b0f5-3f358ff71bd0 | dde3159b-f7bf-46e0-b096-a9ec123bec90 | dde3159b-f7bf-46e0-b096-a9ec123bec90 | garbage-collected |
fd154bb5-d112-4a54-845e-bdc57d72ab08 | Feeling Rusty | … | fd154bb5-d112-4a54-845e-bdc57d72ab08 | 46b649c6-7e5e-4327-8a2a-610c86800f03 | 46b649c6-7e5e-4327-8a2a-610c86800f03 | manual-memory-management |
As you can see, each post is duplicated for every associate tag.
Moving LATERALy
Leveraging Postgres’
LATERAL subqueries,
and json aggeregation functions such as json_agg
, we can vastly simplify the
resulting data:
And now, when we query this view:
id | name | content | tags |
---|---|---|---|
bc963376-b5c4-4cf4-9705-999abd515c96 | Chasing the Gopher | … | [{"id":"a64b2f87-504d-4f3e-b989-916ffc0a507b","name":"statically-typed"},{"id":"dde3159b-f7bf-46e0-b096-a9ec123bec90","name":"garbage-collected"}] |
fd154bb5-d112-4a54-845e-bdc57d72ab08 | Feeling Rusty | … | [{"id":"a64b2f87-504d-4f3e-b989-916ffc0a507b","name":"statically-typed"},{"id":"46b649c6-7e5e-4327-8a2a-610c86800f03","name":"manual-memory-management"}] |
ceb9082f-eb92-4cc6-b0f5-3f358ff71bd0 | Taming the Python | … | [{"id":"31fad421-385a-484c-ad78-72dc4b9afaba","name":"dynamically-typed"},{"id":"dde3159b-f7bf-46e0-b096-a9ec123bec90","name":"garbage-collected"}] |
And we can even spit the whole thing out as JSON:
A note on denormalisation
Another way of tackling this problem is through data denormalisation.
This approach involves storing related data together so that reads are simpler and faster due to lack of joins. For example, this might mean storing tags and comments as part of a post.
In Postgres you could achieve this with an array or JSON column, and heavily denormalised systems generally leverage dedicated document DBs (ie. MongoDB), completely foregoing relational DBs.
While this optimises for reads, it then becomes more costly and difficult to update,
as you now have to update any related resources in every place it is stored.
For example, if a tag was renamed from k8s
to kubernetes
, every post must
then be updated individually to reflect this change.
One high profile instance of this was Instagram - source.