- Print
- DarkLight
A JSON document does not contain tabular data, unlike a CSV or XLS file, which is why data sources based on them work a little different.
You have fine-grained control over which parts of the JSON document end up in your data source by using JMESPath expressions. JMESPath is a powerful tool to select and transform JSON data, but the syntax needs some getting used to.
On this page we demonstrate how to set up a simple data source with JMESPath expressions, followed by some in-depth examples.
Try it yourself
You can download the file below to play around with the examples yourself.
Upload it as file asset in the asset manager and then create a data source based on it. You can then edit the structure of the data source and fill in the JMESPath expressions explained below.
Define a source
Firstly, we have to define which part of our JSON document contains the data that we are interested in.
Depending on the document, that may be fairly obvious – for instance if the document is one large JSON array. However, oftentimes a JSON document may be the output of an API endpoint or an export from an external system and hence contains a lot more information besides what we are interested in.
To create a data source, we need an array or object and we can select (or create) one from our document with a JMESPath expression. The system will then create a data source row for each item in the array resp. property value of the object (as defined by the column expressions).
Let's look at an example document:
{
"products": [
{
"id": 21,
"title": "Cucumber"
},
{
"id": 22,
"title": "Dog Food"
},
{
"id": 23,
"title": "Eggs"
}
],
"categories": ["ingredients", "furniture", "pets"]
}
Here we are only interested in the products property, which holds an array of objects. The JMESPath expression to get it is fairly simple – it’s just the name of the property:
products
This will return the array of objects associated with the products
property. For each item in the array, a data source row will be created.
Define columns
Now that we have our source, we must define which parts of it we want to convert into data source columns.
Let’s say each product object from the example above is a bit more complex and looks like this:
{
"id": 21,
"title": "Cucumber",
"description": "Crisp and hydrating cucumbers, ideal for salads, snacks, or as a refreshing side.",
"category": "groceries",
"price": 1.49,
"discountPercentage": 11.44,
"rating": 4.71,
"stock": 22,
"tags": [
"vegetables",
"onsale"
],
"sku": "6KGF2K6Z",
"weight": 9,
"warrantyInformation": "5 year warranty",
"shippingInformation": "Ships overnight",
"availabilityStatus": "In Stock",
"reviews": [
{
"rating": 4,
"comment": "Very satisfied!",
"date": "2024-05-23T08:56:21.620Z",
"reviewerName": "Elijah Hill",
"reviewerEmail": "elijah.hill@x.dummyjson.com"
},
{
"rating": 5,
"comment": "Fast shipping!",
"date": "2024-05-23T08:56:21.620Z",
"reviewerName": "Liam Garcia",
"reviewerEmail": "liam.garcia@x.dummyjson.com"
},
{
"rating": 4,
"comment": "Excellent quality!",
"date": "2024-05-23T08:56:21.620Z",
"reviewerName": "Ella Cook",
"reviewerEmail": "ella.cook@x.dummyjson.com"
}
],
"returnPolicy": "30 days return policy",
"minimumOrderQuantity": 7,
"images": [
"https://cdn.dummyjson.com/products/images/groceries/Cucumber/1.png"
],
"thumbnail": "https://cdn.dummyjson.com/products/images/groceries/Cucumber/thumbnail.png"
}
As you can see, each product contains nested data and we may not be interested in all of it and we have to bring it into a tabular form.
To do so, we create as many column definitions as we need, each one defined by a unique name and type (as with other data sources). The data for each column can come from anywhere in the object, it doesn’t have to be a top-level property.
Here are some examples for column definitions:
Column: Id
Type: Number
Path:id
Result:21
Column: Title
Type: Text
Path:title
Result:Cucumber
Column: Reviews
Type: List<Text>
Path:reviews[*].comment
Result:[ "Very satisfied!", "Fast shipping!", "Excellent quality!" ]
Note how in the last column we are able to pick the comment
from each object in the reviews
array and get a single list of strings as a result, which we can store in a List<Text>
column in our data source.
JMESPath allows us to select and transform data in a myriad of ways – here are a few more advanced examples:
Get the URL of the first image (or
null
if theimages
array is empty or missing):
Path:images[0]
Result:https://cdn.dummyjson.com/products/images/groceries/Cucumber/1.png
Combine all items in the
tags
array into 1 string:
Path:join(',', tags)
Result:vegetables,onsale
Get the average rating across all reviews of the product:
Path:avg(reviews[*].rating)
Result:4.333333333333333
Advanced Scenarios
In this section we tackle a few more in-depth problems that you may run across in your data sources. We recommend that you familiarize yourself with JMESPath a bit before diving into these samples.
JMESPath Community Edition
The JMESPath Community Edition enhances JMESPath with a number of useful features that we will make use of in the examples below. You won’t find information about these features on the official JMESPath website. Both sites are a valuable source of information and examples, but please keep in mind that only the Community Edition contains information about the newer features.
You can try the examples yourself by using the below JSON file for your data source.
Here is an excerpt from that file so you can get an idea of what we’re dealing with:
{
"products": {
"p1": {
"title": "Snare Boot",
"productType": "Boots",
"options": [
{
"name": "Color",
"optionValues": [
{
"name": "Black"
},
{
"name": "Blue"
}
]
}]}}} // collapsed for brevity
This example showcases 2 common scenarios you may come across:
inside
products
the property names are used as unique keys to identify each product (“p1"
)compare this to the simple example above, where we have an
id
property in each product object, which is easier to access as we’ll soon learn
the
options
array can contain multiple objects which are identified by theirname
propertythis makes the
“Color"
option more difficult to access, as we’ll see
Property names as keys
Let’s assume that we want to store the product IDs (“p1"
, “p2"
etc.) as an id
column in our data source. We can use products
as our source expression, which seems good at first but unfortunately, only the property values inside products
will be available to our column expressions – the property names will be lost.
We can use
items(products)
instead, which will turn each property name/value pair into an array[ <name>, <value> ]
insteadWith that, we can go to our column definitions and use
[0]
to get the property name (“p1"
) and[1]
to access the property value (e.g.[1].title
would yield“Snare Boot"
)While this works, it makes all our column definitions unwieldy, as most will have to start with
[1]
Fortunately, we can transform the source even more to make our column definitions easier:
map(&merge({id: [0]}, [1]), items(products))
Let’s break down what happens in this expression:
as before, we use
items(products)
to get an array of[ <name>, <value> ]
arraysthis array is passed to the
map
function as its 2nd argument.map
applies an expression to each item of an array, thus transforming it. These function-like expressions start with&
the expression we want to apply to each item is
merge({id: [0]}, [1])
which takes the product object ([1]
) and merges it with another object that we create in placeThis in-place created object is used to store our property name in an “id” property:
{id : [0]}
The result of this expression is an array of objects that now contain the product id as an easy-to-access property:
{
"id": "p1",
"title": "Snare Boot",
"productType": "Boots",
"options": [
{
"name": "Color",
"optionValues": [
{
"name": "Black"
},
{
"name": "Blue"
}
]
}
]
}
Filtering to find objects
Let’s continue the previous example by creating a column definition to hold all colors of a product as a List<Text>
.
As we outlined in the introduction, the options
array contains a number of objects that are identified by their name
property, which makes it more difficult for us to access the “Color"
object. Let’s look at the final expression again and then break it down:
options[?name=='Color'].optionValues[*].name[]
JMESPath allows us to filter arrays by writing a filter expression:
[? name == ‘Color’]
will return all objects from the array that have aname
property with the value“Color"
Next we need to access the
optionValues
to get to the actual color names;optionValues[*].name
will do that for usWe now get an array of names inside an array as a result:
[["Black", "Blue"]]
(due to how projection and filters work in JMESPath)The flatten operator
[]
will collapse these nested arrays into a single array["Black", "Blue"]
which we can finally use for our column
Get all Colors across all Products
You may want to create a data source that contains all possible product colors, e.g. to allow users to filter products by their preferred color. Unfortunately, our source JSON doesn’t list all available colors explicitly. Each product indicates in which colors it is available, but it is our job to combine all these options into a single table.
We can use the filter expression from the previous example, but apply it to all products:
products.*[].options[?name=='Color'].optionValues[][].name
As you can see we have to use the flatten operator []
several times here – it’s best if you just copy the expression and take it apart yourself to see the individual results and what each subsequent expression does.
While this solves our problem, we do get duplicate color names. If we only want distinct color names in our data source, we have to augment the above expression even further. We can use the group_by
function to group all our color objects by their name
(thus eliminating any duplicates) and then use the keys
function to return only the property names. Again, we recommend you play around with the expression yourself to see the in-between steps
keys(group_by(products.*[].options[?name=='Color'].optionValues[][], &name))
Create a Permutation / Cross-Join Table
Let’s say we want to create a data source as shown below, where we list each product in each color it is available:
One use case for a table such as this is to filter by product title, to end up with all available colors for that product. Note that you would usually want to use a product ID and not a title for something like this, but since the ID isn’t easily accessible in this scenario (as demonstrated in the 1st example), we will use title
for simplicity. You can of course combine the 1st example with this one to use the product ID instead of the title.
Here is the source expression to arrive at this table:
map(&let $product = @ in options[?name=='Color'].optionValues[*].{ title: $product.title, color: name}[], values(products))[]
We are using the
map
function again so that we can process each product individuallyThe key trick in this example is to introduce a variable with the
let
expression, which allows us to store the current object@
in the variable$product
so that we can use it further on.In JMESPath, there is no way to access a ‘parent’ object once we navigated down to e.g. the Color level in an object. We can store such an object in a variable though and thus have access to it at that level.
The
Color
filter has already been showcased in previous examples, the last noteworthy action here is the in-place creation of a new object which holds the product title and color name:{ title: $product.title, color: name}