Querying JSONB data in PostgreSQL
Introduction
I use columns with JSONB data type for storing things in PostgreSQL databases quite often and find myself googling or seeking help from LLMs whenever a requirement for querying them arises.
PostgreSQL documentation is awesome but I wanted to understand different common querying scenarios while dealing with JSONB data types. Hence, I am writing this blog as a documentation for me to use in future.
This is not a complete list of scenarios possible while dealing with JSONB data types. There can be other scenarios as well which are not covered here 🙂
You can read up on JSON / JSONB data types, and the differences between them here.
Pre-requisites
Before we proceed further, we need to understand a few available JSONB operators and functions in PostgreSQL
Arrow operator
This operator helps in accessing specific elements of a JSONB / JSON array or object
Consider an object like this -
{
"name": "Bob",
"age": 21
}
Assuming this value is stored in a column named data
, if we do data->'name'
, we’ll get "Bob"
as the output.
Similarly, for arrays, we can use this operator to access elements at a specific index. Consider the following scenario -
[
{
"name": "Bob",
"age": 21
}
]
Here, data->0
gives us the object present at index 0. You can further chain these operators to get the desired value. For example, in this case, you can do data->0->'age'
to get the age
from the first object.
There’s a difference between the
->
and->>
operator.
->
operator returns a value which is of typejsonb
/jsonb
(depending on the input).->>
operator returns the value as atext
.
Containment operator
This operator helps in checking whether a JSON / JSONB array / object contains a property / element. Returns a boolean value.
Examples -
// Objects
{ "name": "Bob", "age": 21 } @> {"name": "Bob"} // True
{ "name": "Bob", "age": 21 } @> { "age": 45 } // False
{ "name": "Bob", "age": 21 } @> { "name": "Bob", "age": 21 } // True
// Arrays
[1, 2, 3] @> [1, 3] // True
jsonb_array_elements
This function takes in a JSONB / JSON array as an input and expands the top-level values to individual JSON values
Consider the following query -
select jsonb_array_elements('[{"name": "Bob"}, {"name": "Alex"}]'::jsonb) as data;
It outputs the following result -
data |
----------------+
{"name": "Bob"} |
{"name": "Alex"}|
Every object inside the array got mapped to a separate row in the result
The above operators and functions are just a pre-requisite for this blog. PostgreSQL offers many other useful JSON(B) functions and operators which you can find here
Scenarios
Now that we have an understanding of the above function / operators, we can proceed to discussing the different scenarios while querying JSONB
Simple objects
We’ll be dealing with the following table in this section.
Table - simple_object
id|data |
--+------------------------------+
1|{"age": 21, "name": "Bob"} |
2|{"age": 18, "name": "Sam"} |
3|{"age": 22, "name": "Michael"}|
Here’s what the schema looks like -
id
-int4
data
-jsonb
The data
column contains the name
and age
of different students.
Get entries where name is “Sam”
select * from simple_object where data->>'name' = 'Sam';
id|data |
--+--------------------------+
2|{"age": 18, "name": "Sam"}|
Here, we are using the ->>
operator to get the name
as a text
value and then performing an equality check with the string Sam
If we wanted to do this with the ->
operator, here’s how it would look like -
select * from simple_object where data->'name' = '"Sam"'::jsonb;
As you can see, we are casting the string Sam
to jsonb
so that we can compare it with the output from data->name
(which is of jsonb
type)
Get entries where age is greater than 18
select * from simple_object where (data->'age')::int > 18;
id|data |
--+------------------------------+
1|{"age": 21, "name": "Bob"} |
3|{"age": 22, "name": "Michael"}|
Here, we are casting the output from data->age
to int
so that it can be used with the comparison operator
Simple objects with arrays
We have now updated our data
column to also store the subjects
information for each student
Table - simple_object_with_array
id|data |
--+-------------------------------------------------------------------------+
1|{"age": 21, "name": "Bob", "subjects": ["english", "spanish"]} |
2|{"age": 18, "name": "Sam", "subjects": ["japanese", "korean"]} |
3|{"age": 22, "name": "Michael", "subjects": ["english", "hindi"]} |
4|{"age": 22, "name": "Chris", "subjects": ["english", "hindi", "spanish"]}|
Get all entries where “english” is one of the subjects
select * from simple_object_with_array where data->'subjects' @> '"english"'::jsonb
id|data |
--+-------------------------------------------------------------------------+
1|{"age": 21, "name": "Bob", "subjects": ["english", "spanish"]} |
3|{"age": 22, "name": "Michael", "subjects": ["english", "hindi"]} |
4|{"age": 22, "name": "Chris", "subjects": ["english", "hindi", "spanish"]}|
Here, the important thing to note is the casting that we are doing for the english
text. Since the data->'subjects'
operation returns an array with jsonb
data, this casting is necessary for the containment operator to work.
Get all entries which have “english” or “spanish” as one of their subjects
Do not confuse this query with
data->'subjects' @> '["english", "spanish"]'
. This will return only entries which has atleast English and Spanish as the subject + more.We want entries with either English or Spanish or both
select * from simple_object_with_array where data->'subjects' @> '"english"' or data->'subjects' @> '"spanish"'
id|data |
--+-------------------------------------------------------------------------+
1|{"age": 21, "name": "Bob", "subjects": ["english", "spanish"]} |
3|{"age": 22, "name": "Michael", "subjects": ["english", "hindi"]} |
4|{"age": 22, "name": "Chris", "subjects": ["english", "hindi", "spanish"]}|
Notice how we also have an entry with only english
as the subject (2nd entry).
In the above query, we were using the containment operator with an OR condition to achieve the result. Another way is to use the jsonb_array_elements_text
(similar to jsonb_array_elements
but outputs an array of texts) function with IN
operator
select * from simple_object_with_array
where exists (
select 1
from jsonb_array_elements_text(data->'subjects') AS subjects
where subjects IN ('english', 'spanish')
);
Here, we are going through each row and checking if the subjects
array contains english
or spanish
after expanding the subjects
property into an array of texts
Arrays with objects
In this section, we are going to deal with some different data but the table schema remains the same
Table - array_with_object
id|data |
--+---------------------------------------------------------------------------------------------------------------------------------------+
1|[{"name": "Oats", "price": 100, "category": ["grocery", "healthy"]}, {"name": "Facewash", "price": 50, "category": ["toiletry"]}] |
2|[{"name": "Bread", "price": 40, "category": ["grocery"]}, {"name": "Shampoo", "price": 150, "category": ["toiletry"]}] |
3|[{"name": "Laptop", "price": 50000, "category": ["electronics"]}, {"name": "Book", "price": 300, "category": ["education"]}] |
4|[{"name": "Rice", "price": 60, "category": ["grocery"]}, {"name": "Toothpaste", "price": 30, "category": ["toiletry"]}] |
5|[{"name": "Milk", "price": 50, "category": ["grocery"]}, {"name": "Soap", "price": 20, "category": ["toiletry"]}] |
6|[{"name": "Notebook", "price": 40, "category": ["stationery"]}, {"name": "Pen", "price": 10, "category": ["stationery"]}] |
7|[{"name": "Apples", "price": 150, "category": ["fruits"]}, {"name": "Orange Juice", "price": 120, "category": ["beverages"]}] |
8|[{"name": "Eggs", "price": 70, "category": ["grocery"]}, {"name": "Conditioner", "price": 200, "category": ["toiletry"]}] |
9|[{"name": "Chicken", "price": 250, "category": ["meat"]}, {"name": "Fish", "price": 300, "category": ["seafood"]}] |
10|[{"name": "Bananas", "price": 50, "category": ["fruits"]}, {"name": "Grapes", "price": 90, "category": ["fruits"]}] |
11|[{"name": "Shirt", "price": 700, "category": ["clothing"]}, {"name": "Jeans", "price": 1200, "category": ["clothing"]}] |
12|[{"name": "Television", "price": 30000, "category": ["electronics"]}, {"name": "Speakers", "price": 5000, "category": ["electronics"]}]|
13|[{"name": "Pillow", "price": 400, "category": ["home"]}, {"name": "Blanket", "price": 800, "category": ["home"]}] |
14|[{"name": "Chair", "price": 1500, "category": ["furniture"]}, {"name": "Table", "price": 2500, "category": ["furniture"]}] |
15|[{"name": "Fan", "price": 2000, "category": ["electronics"]}, {"name": "Air Conditioner", "price": 30000, "category": ["electronics"]}]|
16|[{"name": "Football", "price": 500, "category": ["sports"]}, {"name": "Tennis Racket", "price": 2000, "category": ["sports"]}] |
17|[{"name": "Handbag", "price": 1500, "category": ["accessories"]}, {"name": "Belt", "price": 800, "category": ["accessories"]}] |
18|[{"name": "Bike", "price": 80000, "category": ["vehicles"]}, {"name": "Helmet", "price": 1500, "category": ["vehicles"]}] |
19|[{"name": "Coffee", "price": 400, "category": ["beverages"]}, {"name": "Tea", "price": 200, "category": ["beverages"]}] |
20|[{"name": "Printer", "price": 8000, "category": ["office supplies"]}, {"name": "Paper", "price": 200, "category": ["office supplies"]}]|
The data
column contains an array of objects. Each entry is an order and the data
column contains the different items part of the order.
Every item has the following properties -
name
- Name
price
- Price (eg: 20, 10)
category
- array of categories (eg: [‘electronics’, ‘office supplies’])
Get all entries which have an item of category “grocery”
select * from array_with_object where data @> '[{"category": ["grocery"]}]'::jsonb;
id|data |
--+---------------------------------------------------------------------------------------------------------------------------------+
2|[{"name": "Bread", "price": 40, "category": ["grocery"]}, {"name": "Shampoo", "price": 150, "category": ["toiletry"]}] |
4|[{"name": "Rice", "price": 60, "category": ["grocery"]}, {"name": "Toothpaste", "price": 30, "category": ["toiletry"]}] |
5|[{"name": "Milk", "price": 50, "category": ["grocery"]}, {"name": "Soap", "price": 20, "category": ["toiletry"]}] |
8|[{"name": "Eggs", "price": 70, "category": ["grocery"]}, {"name": "Conditioner", "price": 200, "category": ["toiletry"]}] |
1|[{"name": "Oats", "price": 100, "category": ["grocery", "healthy"]}, {"name": "Facewash", "price": 50, "category": ["toiletry"]}]|
This is a very simple and clean query which utilizes the power of the containment operator. It does all the heavy lifting and fetches all entries which have an item with the category grocery
(even when the item has multiple categories and grocery
is one of them)
Another approach for fetching this data is the following
select * from array_with_object where exists (select 1 from jsonb_array_elements(data) as item where item->'category' @> '"grocery"');
Here, we are going through each row and then going through each item in its data
column to check if the category is grocery
or not
Get all entries where total purchase amount is more than 2000
with parsed_orders as (
select
id,
jsonb_array_elements(data) as item
from array_with_object
),
order_totals as (
select
id,
sum((item->>'price')::numeric) as total_price
from parsed_orders
group by id
)
select
id,
total_price
from order_totals
where total_price > 2000
order by id;`
Here, we are using CTEs for fetching our data.
- First, we expand all the items for a row Assuming every row has 2 items and we have a total of 20 rows, after the first operation, we’ll get 40 rows
- Next, we are calculating the sum of all items and grouping them by the id
- And at last, we are just filtering the ids which have a total sum greater than 2000
Another approach for fetching this data, is to use a Cross Join
select id, sum((item->>'price')::numeric) from array_with_object, jsonb_array_elements(data) as item group by id having SUM((item->>'price')::numeric) > 2000 order by id;
The idea is the same as the first approach. We just do a cartesian product of each row with the items in its data
column and then the rest of the logic remains the same.
Nested objects
For this section, we’ll be modifying the structure from the second section a bit. Now, instead of having the name
and age
information for a student at the root level, we’ll be moving it inside a property named info
Here’s how the updated table will look like
Table - nested_object
id|data |
--+--------------------------------------------------------------------------+
1|{"info": {"age": 21, "name": "Bob"}, "subjects": ["english", "spanish"]} |
2|{"info": {"age": 18, "name": "Sam"}, "subjects": ["japanese", "korean"]} |
3|{"info": {"age": 22, "name": "Michael"}, "subjects": ["english", "hindi"]}|
Get entries where name is “Sam”
select * from nested_object where data->'info'->>'name' = 'Sam';
id|data |
--+------------------------------------------------------------------------+
2|{"info": {"age": 18, "name": "Sam"}, "subjects": ["japanese", "korean"]}|
As you can see, this is almost the same as the query we had in the first section. We’ve just added one additional ->
operator as the data is nested now
Wrapping up
That’s mostly it. I’ll be updating this blog in future if I come across any other interesting use cases. Hope you learnt something new.