With the variety of deployment options available, getting started with StackStorm is now easier than ever. However, it's easy to run into small data processing issues when working with complex workflows.
You have the right data, you’re processing it in your workflow, but there’s just something that doesn’t quite work until you find the right technique. Read this blog to learn 5 useful techniques for processing data in StackStorm.
In This Post
- Jinja2 vs. YAQL
- Nested Variable Interpolation
- Managing JSON Data with Jinja Filters
- Jinja Loop inside Variable Publish
- Duplicate Variable Publish
- Selecting Data from a with: items: Based Task
Up First: Jinja2 vs. YAQL
Each templating language excels in its own area of expertise—they can even be used interchangeably in most cases. However, the cases where Jinja2 and YAQL differ can make a huge impact on your data processing. It’s important to learn when to use each solution to unlock one of the most flexible parts of StackStorm.
Jinja2

- Excels at filtering and converting collections and objects to other data types, e.g. to_json_string
- Has more programmatic features, e.g. the default filter for supplying a fallback when there is no value

YAQL

- More reliable at selecting parts of data or manipulating data objects
- Has fewer problems with special characters appearing in data
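As a rough illustration of the two Jinja2 features called out above, here is a plain-Python sketch (not StackStorm code; the payload and field names are hypothetical):

```python
import json

# Hypothetical object published by an earlier task
payload = {"name": "stackstorm"}

# Roughly what {{ payload | to_json_string }} does: object -> JSON string
as_string = json.dumps(payload)

# Roughly what {{ payload.region | default('us-east-1') }} does:
# fall back to a default when no value is present
region = payload.get("region", "us-east-1")

print(as_string)
print(region)
```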
StackStorm Data Processing Techniques
With the necessary context out of the way, let’s get started with 5 techniques to improve your workflows!
1. Nested Variable Interpolation
One issue that can arise with centrally hosted but distributed packs is that different environments may use different variable names for the same field and data. Consider an example with a Development and a Production Jira instance, where the particular customfield_xxxxx key holding the payload.fields.data object differs between the two environments. Production may expose your data at payload.fields.customfield_12345, while Dev's customfield for the same object may be a totally different integer, e.g. payload.fields.customfield_24680. How can you design your actions and packs to be more flexible with respect to their environments?
One solution is to add a setting to the pack config schema that defines the correct customfield for your environment. Let's use the Production field value from the data above for this example:
> config.schema.yaml
jira_data_field:
description: "Jira custom field ID associated with `data`"
type: "string"
default: "customfield_12345"
required: true
However, you cannot call the pack config_context directly in workflows. You'll need to modify your action's metadata file to include the config_context parameter:
> action-metadata.yaml
jira_payload:
required: true
type: object
jira_data_field:
required: true
default: "{{ config_context.jira_data_field }}"
type: string
After that, you still need to specify the new input in the workflow:
> action-workflow.yaml
version: 1.0
input:
- jira_payload
- jira_data_field
With all of that in place, you can now do nested variable interpolation using YAQL!
- jira_data: <% ctx().jira_payload.fields.get(ctx().jira_data_field) %>
YAQL will first resolve the inner ctx().jira_data_field expression, fetching whichever customfield value was passed in from the config.schema.yaml. With that value inserted, .get() then looks up that key on the payload's fields object. Essentially what happens is:
- jira_data: <% ctx().jira_payload.fields.get(ctx().jira_data_field) %>
- jira_data: <% ctx().jira_payload.fields.customfield_12345 %>
- jira_data: "{data}"
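For intuition, the same two-step resolution can be sketched in plain Python (not StackStorm code; the payload values are made up):

```python
# Hypothetical payload, plus the field name supplied by the pack config
jira_payload = {"fields": {"customfield_12345": "{data}"}}
jira_data_field = "customfield_12345"

# Step 1: resolve the inner expression (the configured field name).
# Step 2: look that key up on the fields object, like YAQL's .get().
jira_data = jira_payload["fields"].get(jira_data_field)
print(jira_data)  # {data}
```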
Note: This is only doable with YAQL. When testing these patterns, Jinja2 was unable to "resolve down" regardless of whether it used the expression delimiter or the statement delimiter, while YAQL works exactly the way it should. If you've been able to get it working with Jinja2, hit me up in Bitovi's Community Discord #devops channel!
2. Managing JSON Data with Jinja2 Filters
As hinted above, one of the most useful parts of Jinja2 is the ability to easily string-ify data and turn it back into objects. StackStorm generally prefers string variables, so having ways to easily convert them to other data types and process them is extremely helpful.
If you manually reach out to an API using core.local, core.http, or some other command-line method, you will likely receive a string-based response in your action's result or stdout. Having this pattern available is very useful when integrating new APIs:
fetch_data:
  action: core.http
  input:
    url: "{{ ctx().api_url }}"
    method: GET
    verify_ssl_cert: true
  next:
    - publish:
        - json_string_response: <% task(fetch_data).result.body %>
      do: convert_to_json_object

convert_to_json_object:
  action: core.noop
  next:
    - when: <% succeeded() %>
      publish:
        # Load response as JSON object so we can filter/select
        - json_object: "{{ ctx().json_string_response | from_json_string }}"
      do: send_to_db

send_to_db:
  action: my_pack.backup_to_mongodb
  input:
    ip: "{{ ctx().mdb_instance_ip }}"
    db: "{{ ctx().mdb_db_name }}"
    collection: "{{ ctx().mdb_collection_name }}"
    db_username: "{{ ctx().mdb_db_username }}"
    db_password: "{{ ctx().mdb_db_password }}"
    json_data: "{{ ctx().json_object.data | to_json_string }}"
Because you first converted json_string_response to a json_object, you were able to select out the json_object.data key in the send_to_db task. If you had not first converted the object type, ctx().json_object.data would fail with an "expected object type 'dict' got 'string'" error.
At the same time you’re selecting your data from the object, you’re still able to convert the data back to a json string, should the action require a string object type. Best of both worlds!
This could also be slightly condensed by publishing the initial fetch_data response directly to a JSON object with from_json_string, but I wanted to show the conversion as a distinct step.
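In plain Python, the round trip in the workflow above corresponds roughly to json.loads and json.dumps (the response body here is invented for illustration):

```python
import json

# Hypothetical string body returned by core.http
json_string_response = '{"data": {"id": 42, "status": "ok"}, "meta": {}}'

# ~ from_json_string: string -> dict, so keys become selectable
json_object = json.loads(json_string_response)

# Select .data, then ~ to_json_string to hand a string to the next action
json_data = json.dumps(json_object["data"])
print(json_data)
```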
3. Jinja Loop inside Variable Publish
One of the more interesting data processing techniques is the loop inside a variable publish. For example, let's say you're developing a workflow that could receive a list of multiple alerts bundled together. Each alert has three fields: host, port, and message.
For example:
{"alerts":
[{"host":"hostA", "port":"12345", "message":"Unable to connect."},
{"host":"hostB", "port":"24680", "message":"No route to host."},
…]
}
You'd want to collect this information and format it cleanly, so that a Jira ticket relating to the alerts is more readable than a pasted JSON object. A neat trick is to use a Jinja for loop within a variable publish to format multiple lines together:
format_output:
  action: core.noop
  next:
    - when: <% succeeded() %>
      publish:
        - formatted_alerts: |
            {% for alert in ctx().alerts %}
            Connection to {{ alert.host }}:{{ alert.port }} failed!
            Error Code: {{ alert.message }}
            ---
            {% endfor -%}
This will give you a formatted_alerts variable containing a nicely formatted text block:
Connection to hostA:12345 failed!
Error Code: "Unable to connect."
---
Connection to hostB:24680 failed!
Error Code: "No route to host."
---
...
The resulting contents of the above variable can be easily added as an e-mail output or any place you want to send the formatted messaging.
Using this inline Jinja for loop can be more useful than StackStorm's with: items: functionality, as you do not need to specify the object you are passing in as part of the task metadata beforehand. As long as you have the array you want to work with, this pattern can be used almost anywhere inside a workflow.
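The same formatting logic, sketched in plain Python for comparison (the alert list mirrors the example payload above):

```python
alerts = [
    {"host": "hostA", "port": "12345", "message": "Unable to connect."},
    {"host": "hostB", "port": "24680", "message": "No route to host."},
]

# Build the same text block the Jinja for-loop publish produces
lines = []
for alert in alerts:
    lines.append(f"Connection to {alert['host']}:{alert['port']} failed!")
    lines.append(f"Error Code: {alert['message']}")
    lines.append("---")
formatted_alerts = "\n".join(lines)
print(formatted_alerts)
```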
4. Duplicate Variable Publish
Let's say you have some data you want to apply both a Jinja filter and a YAQL selector to, without having individual tasks for each. Is this even possible? Yes!
Normally, mixing YAQL and Jinja within the same field causes immediate errors. However, you can publish the same variable multiple times, using whichever templating language you need each time, including publishing the same variable multiple times in a single task.
format_output:
  action: core.noop
  next:
    - when: <% succeeded() %>
      publish:
        - selected_data: <% ctx().data.where($.type = 'foo').select($.data) %>
        - selected_data: "{{ ctx().selected_data | to_json_string }}"
If you call selected_data after this step, the result will be the data matching type = 'foo', in the form of a JSON string.
We're not technically mixing YAQL and Jinja here, as each exists within its own statement. The publish block is a list rather than a dictionary, so duplicate keys are allowed, which is what makes the 'double publish' to the same variable possible.
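Conceptually, the double publish is just two sequential assignments to the same name, which this plain-Python sketch mimics (the data is hypothetical):

```python
import json

# Hypothetical context data
data = [
    {"type": "foo", "data": {"id": 1}},
    {"type": "bar", "data": {"id": 2}},
]

# First publish: the YAQL-style where/select
selected_data = [item["data"] for item in data if item["type"] == "foo"]

# Second publish: the Jinja-style to_json_string over the previous value
selected_data = json.dumps(selected_data)
print(selected_data)  # [{"id": 1}]
```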
5. Selecting Data from a with: items: Based Task
Rounding out our data processing tips is a simple pointer on how to use the data output by a with: items: based workflow task.
If you've tried selecting task(foo) output data before, you may have noticed that you occasionally need to select data from a result.result key if the task does not specifically export values.
When using the with: items: task pattern, e.g. curl-ing against a single endpoint IP using an array of ports:
test_endpoint:
  action: core.local
  with: port in <% ctx().endpoint_ports %>
  input:
    cmd: "curl -w '{\"http_code\":\"%{http_code}\", \"remote_ip\":\"%{remote_ip}\", \"remote_port\":\"%{remote_port}\"}' '<% ctx().endpoint_url %>:<% item(port) %>' -o /dev/null -m 60"
  next:
    - when: <% succeeded() %>
      publish:
        - curl_results: <% task(test_endpoint).result.items.result.stdout %>
You'll need to select result.items.result, even for an array with a single item. If the task above was given only a single port, the output would still be nested under result.items.result.
The -w flag writes out only the specified information, which here has been manually formatted and escaped into a JSON object. -o /dev/null suppresses all other output. The local version of curl was slightly out of date; otherwise you could have used -w '%{json}' (introduced in curl 7.70) to output all variables in JSON format instead of formatting them manually.
Even though this task loops, it might seem like each action would begin its own branching workflow, or would otherwise clobber the variable so it only contains the last result. Instead, curl_results will contain the results of every curl: each new task result from the list of items is appended to the published variable as an array, for example:
> curl_results:
[{"http_code":"401", "remote_ip":"2.4.6.8", "remote_port":"3000"},
{"http_code":"200", "remote_ip":"1.3.5.7", "remote_port":"80"},
{"http_code":"200", "remote_ip":"1.3.5.7", "remote_port":"3821"}]
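To illustrate how the per-item results aggregate, here is a plain-Python sketch: each item's stdout is one curl write-out string, and parsing each one yields the array shown above (the values are the hypothetical ones from the example):

```python
import json

# One stdout string per with-items iteration (hypothetical curl write-outs)
stdout_per_item = [
    '{"http_code":"401", "remote_ip":"2.4.6.8", "remote_port":"3000"}',
    '{"http_code":"200", "remote_ip":"1.3.5.7", "remote_port":"80"}',
    '{"http_code":"200", "remote_ip":"1.3.5.7", "remote_port":"3821"}',
]

# ~ task(test_endpoint).result.items.result.stdout, parsed into objects
curl_results = [json.loads(stdout) for stdout in stdout_per_item]
print(curl_results[0]["http_code"])  # 401
```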
Takeaways
It's easy to get hung up on small data processing issues when you're getting started with StackStorm, as there are many corner-case pitfalls you can run into. The most important thing to remember is that both YAQL and Jinja excel in certain ways, so if you're having issues with one of the languages, perhaps there's a case to be made for using the other. Having both on hand as alternatives to each other is one of StackStorm's greatest strengths.
If you found these patterns helpful, consider saying thanks in Bitovi's Community Discord #devops channel. Or if you have ideas or tips and tricks of your own that you want to chat about, the best place to share those thoughts is the StackStorm Community Slack!
Need Help?
Bitovi has consultants that can help. Drop into Bitovi's Community Discord and talk to us in the #devops channel!
Need DevOps Consulting Services? Head over to https://www.bitovi.com/devops-consulting and book a free consultation.