Himanshu Borah

Querying JSONB data in PostgreSQL


Introduction

I use columns with JSONB data type for storing things in PostgreSQL databases quite often and find myself googling or seeking help from LLMs whenever a requirement for querying them arises.

PostgreSQL documentation is awesome but I wanted to understand different common querying scenarios while dealing with JSONB data types. Hence, I am writing this blog as a documentation for me to use in future.

This is not a complete list of scenarios possible while dealing with JSONB data types. There can be other scenarios as well which are not covered here 🙂

You can read up on JSON / JSONB data types, and the differences between them here.

Pre-requisites

Before we proceed further, we need to understand a few available JSONB operators and functions in PostgreSQL

Arrow operator

This operator helps in accessing specific elements of a JSONB / JSON array or object

Consider an object like this -

{
  "name": "Bob",
  "age": 21
}

Assuming this value is stored in a column named data, if we do data->'name', we’ll get "Bob" as the output.

Similarly, for arrays, we can use this operator to access elements at a specific index. Consider the following scenario -

[
  {
    "name": "Bob",
    "age": 21
  }
]

Here, data->0 gives us the object present at index 0. You can further chain these operators to get the desired value. For example, in this case, you can do data->0->'age' to get the age from the first object.

There’s a difference between the -> and ->> operator.

Containment operator

This operator helps in checking whether a JSON / JSONB array / object contains a property / element. Returns a boolean value.

Examples -

// Objects

{ "name": "Bob", "age": 21 } @> {"name": "Bob"} // True

{ "name": "Bob", "age": 21 } @> { "age": 45 } // False

{ "name": "Bob", "age": 21 } @> { "name": "Bob", "age": 21 } // True

// Arrays

[1, 2, 3] @> [1, 3] // True

jsonb_array_elements

This function takes in a JSONB / JSON array as an input and expands the top-level values to individual JSON values

Consider the following query -

select jsonb_array_elements('[{"name": "Bob"}, {"name": "Alex"}]'::jsonb) as data;

It outputs the following result -

data            |
----------------+
{"name": "Bob"} |
{"name": "Alex"}|

Every object inside the array got mapped to a separate row in the result

The above operators and functions are just a pre-requisite for this blog. PostgreSQL offers many other useful JSON(B) functions and operators which you can find here

Scenarios

Now that we have an understanding of the above function / operators, we can proceed to discussing the different scenarios while querying JSONB

Simple objects

We’ll be dealing with the following table in this section.

Table - simple_object

id|data                          |
--+------------------------------+
 1|{"age": 21, "name": "Bob"}    |
 2|{"age": 18, "name": "Sam"}    |
 3|{"age": 22, "name": "Michael"}|

Here’s what the schema looks like -

The data column contains the name and age of different students.

Get entries where name is “Sam”

select * from simple_object where data->>'name' = 'Sam';
id|data                      |
--+--------------------------+
 2|{"age": 18, "name": "Sam"}|

Here, we are using the ->> operator to get the name as a text value and then performing an equality check with the string Sam

If we wanted to do this with the -> operator, here’s how it would look like -

select * from simple_object where data->'name' = '"Sam"'::jsonb;

As you can see, we are casting the string Sam to jsonb so that we can compare it with the output from data->name (which is of jsonb type)

Get entries where age is greater than 18

select * from simple_object where (data->'age')::int > 18;
id|data                          |
--+------------------------------+
 1|{"age": 21, "name": "Bob"}    |
 3|{"age": 22, "name": "Michael"}|

Here, we are casting the output from data->age to int so that it can be used with the comparison operator

Simple objects with arrays

We have now updated our data column to also store the subjects information for each student

Table - simple_object_with_array

id|data                                                                     |
--+-------------------------------------------------------------------------+
 1|{"age": 21, "name": "Bob", "subjects": ["english", "spanish"]}           |
 2|{"age": 18, "name": "Sam", "subjects": ["japanese", "korean"]}           |
 3|{"age": 22, "name": "Michael", "subjects": ["english", "hindi"]}         |
 4|{"age": 22, "name": "Chris", "subjects": ["english", "hindi", "spanish"]}|

Get all entries where “english” is one of the subjects

select * from simple_object_with_array where data->'subjects' @> '"english"'::jsonb
id|data                                                                     |
--+-------------------------------------------------------------------------+
 1|{"age": 21, "name": "Bob", "subjects": ["english", "spanish"]}           |
 3|{"age": 22, "name": "Michael", "subjects": ["english", "hindi"]}         |
 4|{"age": 22, "name": "Chris", "subjects": ["english", "hindi", "spanish"]}|

Here, the important thing to note is the casting that we are doing for the english text. Since the data->'subjects' operation returns an array with jsonb data, this casting is necessary for the containment operator to work.

Get all entries which have “english” or “spanish” as one of their subjects

Do not confuse this query with data->'subjects' @> '["english", "spanish"]'. This will return only entries which has atleast English and Spanish as the subject + more.

We want entries with either English or Spanish or both

select * from simple_object_with_array where data->'subjects' @> '"english"' or data->'subjects' @> '"spanish"'
id|data                                                                     |
--+-------------------------------------------------------------------------+
 1|{"age": 21, "name": "Bob", "subjects": ["english", "spanish"]}           |
 3|{"age": 22, "name": "Michael", "subjects": ["english", "hindi"]}         |
 4|{"age": 22, "name": "Chris", "subjects": ["english", "hindi", "spanish"]}|

Notice how we also have an entry with only english as the subject (2nd entry).

In the above query, we were using the containment operator with an OR condition to achieve the result. Another way is to use the jsonb_array_elements_text (similar to jsonb_array_elements but outputs an array of texts) function with IN operator

select * from simple_object_with_array
where exists (
  select 1
  from jsonb_array_elements_text(data->'subjects') AS subjects
  where subjects IN ('english', 'spanish')
);

Here, we are going through each row and checking if the subjects array contains english or spanish after expanding the subjects property into an array of texts

Arrays with objects

In this section, we are going to deal with some different data but the table schema remains the same

Table - array_with_object

id|data                                                                                                                                   |
--+---------------------------------------------------------------------------------------------------------------------------------------+
 1|[{"name": "Oats", "price": 100, "category": ["grocery", "healthy"]}, {"name": "Facewash", "price": 50, "category": ["toiletry"]}]      |
 2|[{"name": "Bread", "price": 40, "category": ["grocery"]}, {"name": "Shampoo", "price": 150, "category": ["toiletry"]}]                 |
 3|[{"name": "Laptop", "price": 50000, "category": ["electronics"]}, {"name": "Book", "price": 300, "category": ["education"]}]           |
 4|[{"name": "Rice", "price": 60, "category": ["grocery"]}, {"name": "Toothpaste", "price": 30, "category": ["toiletry"]}]                |
 5|[{"name": "Milk", "price": 50, "category": ["grocery"]}, {"name": "Soap", "price": 20, "category": ["toiletry"]}]                      |
 6|[{"name": "Notebook", "price": 40, "category": ["stationery"]}, {"name": "Pen", "price": 10, "category": ["stationery"]}]              |
 7|[{"name": "Apples", "price": 150, "category": ["fruits"]}, {"name": "Orange Juice", "price": 120, "category": ["beverages"]}]          |
 8|[{"name": "Eggs", "price": 70, "category": ["grocery"]}, {"name": "Conditioner", "price": 200, "category": ["toiletry"]}]              |
 9|[{"name": "Chicken", "price": 250, "category": ["meat"]}, {"name": "Fish", "price": 300, "category": ["seafood"]}]                     |
10|[{"name": "Bananas", "price": 50, "category": ["fruits"]}, {"name": "Grapes", "price": 90, "category": ["fruits"]}]                    |
11|[{"name": "Shirt", "price": 700, "category": ["clothing"]}, {"name": "Jeans", "price": 1200, "category": ["clothing"]}]                |
12|[{"name": "Television", "price": 30000, "category": ["electronics"]}, {"name": "Speakers", "price": 5000, "category": ["electronics"]}]|
13|[{"name": "Pillow", "price": 400, "category": ["home"]}, {"name": "Blanket", "price": 800, "category": ["home"]}]                      |
14|[{"name": "Chair", "price": 1500, "category": ["furniture"]}, {"name": "Table", "price": 2500, "category": ["furniture"]}]             |
15|[{"name": "Fan", "price": 2000, "category": ["electronics"]}, {"name": "Air Conditioner", "price": 30000, "category": ["electronics"]}]|
16|[{"name": "Football", "price": 500, "category": ["sports"]}, {"name": "Tennis Racket", "price": 2000, "category": ["sports"]}]         |
17|[{"name": "Handbag", "price": 1500, "category": ["accessories"]}, {"name": "Belt", "price": 800, "category": ["accessories"]}]         |
18|[{"name": "Bike", "price": 80000, "category": ["vehicles"]}, {"name": "Helmet", "price": 1500, "category": ["vehicles"]}]              |
19|[{"name": "Coffee", "price": 400, "category": ["beverages"]}, {"name": "Tea", "price": 200, "category": ["beverages"]}]                |
20|[{"name": "Printer", "price": 8000, "category": ["office supplies"]}, {"name": "Paper", "price": 200, "category": ["office supplies"]}]|

The data column contains an array of objects. Each entry is an order and the data column contains the different items part of the order.

Every item has the following properties - name - Name price - Price (eg: 20, 10) category - array of categories (eg: [‘electronics’, ‘office supplies’])

Get all entries which have an item of category “grocery”

select * from array_with_object where data @> '[{"category": ["grocery"]}]'::jsonb;
id|data                                                                                                                             |
--+---------------------------------------------------------------------------------------------------------------------------------+
 2|[{"name": "Bread", "price": 40, "category": ["grocery"]}, {"name": "Shampoo", "price": 150, "category": ["toiletry"]}]           |
 4|[{"name": "Rice", "price": 60, "category": ["grocery"]}, {"name": "Toothpaste", "price": 30, "category": ["toiletry"]}]          |
 5|[{"name": "Milk", "price": 50, "category": ["grocery"]}, {"name": "Soap", "price": 20, "category": ["toiletry"]}]                |
 8|[{"name": "Eggs", "price": 70, "category": ["grocery"]}, {"name": "Conditioner", "price": 200, "category": ["toiletry"]}]        |
 1|[{"name": "Oats", "price": 100, "category": ["grocery", "healthy"]}, {"name": "Facewash", "price": 50, "category": ["toiletry"]}]|

This is a very simple and clean query which utilizes the power of the containment operator. It does all the heavy lifting and fetches all entries which have an item with the category grocery (even when the item has multiple categories and grocery is one of them)

Another approach for fetching this data is the following

select * from array_with_object where exists (select 1 from jsonb_array_elements(data) as item where item->'category' @> '"grocery"');

Here, we are going through each row and then going through each item in its data column to check if the category is grocery or not

Get all entries where total purchase amount is more than 2000

with parsed_orders as (
	select
	id,
	jsonb_array_elements(data) as item
	from array_with_object
),
order_totals as (
	select
	id,
	sum((item->>'price')::numeric) as total_price
	from parsed_orders
	group by id
)
select
id,
total_price
from order_totals
where total_price > 2000
order by id;`

Here, we are using CTEs for fetching our data.

Another approach for fetching this data, is to use a Cross Join

select id, sum((item->>'price')::numeric) from array_with_object, jsonb_array_elements(data) as item group by id having SUM((item->>'price')::numeric) > 2000 order by id;

The idea is the same as the first approach. We just do a cartesian product of each row with the items in its data column and then the rest of the logic remains the same.

Nested objects

For this section, we’ll be modifying the structure from the second section a bit. Now, instead of having the name and age information for a student at the root level, we’ll be moving it inside a property named info

Here’s how the updated table will look like

Table - nested_object

id|data                                                                      |
--+--------------------------------------------------------------------------+
 1|{"info": {"age": 21, "name": "Bob"}, "subjects": ["english", "spanish"]}  |
 2|{"info": {"age": 18, "name": "Sam"}, "subjects": ["japanese", "korean"]}  |
 3|{"info": {"age": 22, "name": "Michael"}, "subjects": ["english", "hindi"]}|

Get entries where name is “Sam”

select * from nested_object where data->'info'->>'name' = 'Sam';
id|data                                                                    |
--+------------------------------------------------------------------------+
 2|{"info": {"age": 18, "name": "Sam"}, "subjects": ["japanese", "korean"]}|

As you can see, this is almost the same as the query we had in the first section. We’ve just added one additional -> operator as the data is nested now

Wrapping up

That’s mostly it. I’ll be updating this blog in future if I come across any other interesting use cases. Hope you learnt something new.