With the variety of deployment options available, getting started with StackStorm is now easier than ever. However, it's easy to run into small data processing issues when working with complex workflows.
You have the right data, you’re processing it in your workflow, but there’s just something that doesn’t quite work until you find the right technique. Read this blog to learn 5 useful techniques for processing data in StackStorm.
In This Post
- Jinja2 vs. YAQL
- Nested Variable Interpolation
- Managing JSON Data with Jinja Filters
- Jinja Loop inside Variable Publish
- Duplicate Variable Publish
- Selecting Data from a with: items: Based Task
Up First: Jinja2 vs. YAQL
Each templating language excels in its own area of expertise—they can even be used interchangeably in most cases. However, the cases where Jinja2 and YAQL differ can make a huge impact on your data processing. It’s important to learn when to use each solution to unlock one of the most flexible parts of StackStorm.
Jinja2

- Excels at filtering and converting collections and objects to other data types, e.g. to_json_string
- Has more programmatic features, e.g. the default filter for supplying a fallback when there is no value

YAQL

- More reliable at selecting parts of data or manipulating data objects
- Has fewer problems with special characters appearing in data
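As a rough illustration of the two Jinja2 features called out above, here is a plain-Python sketch (not StackStorm code; the payload and field names are hypothetical):

```python
import json

# Hypothetical object published by an earlier task
payload = {"name": "stackstorm"}

# Roughly what {{ payload | to_json_string }} does: object -> JSON string
as_string = json.dumps(payload)

# Roughly what {{ payload.region | default('us-east-1') }} does:
# fall back to a default when no value is present
region = payload.get("region", "us-east-1")

print(as_string)
print(region)
```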
StackStorm Data Processing Techniques
With the necessary context out of the way, let’s get started with 5 techniques to improve your workflows!
1. Nested Variable Interpolation
One issue that can arise with centrally hosted but distributed packs is that different environments may use different variable names for the same field and data. Consider an example with a Development and a Production Jira instance, where the particular customfield_xxxxx key holding the payload.fields.data object differs between the two environments. Production may expose your data at payload.fields.customfield_12345, while Dev's customfield for the same object may be a totally different integer, e.g. payload.fields.customfield_24680. How can you design your actions and packs to be more flexible with respect to their environments?
One solution is to add a setting to the pack config schema that defines the correct customfield for your environment. Let's use the Production field value from the data above for this example:
> config.schema.yaml
jira_data_field:
description: "Jira custom field ID associated with `data`"
type: "string"
default: "customfield_12345"
required: true
However, you cannot call the pack config_context directly in workflows. You'll need to modify your action's metadata file to include the config_context parameter:
> action-metadata.yaml
jira_payload:
required: true
type: object
jira_data_field:
required: true
default: "{{ config_context.jira_data_field }}"
type: string
After that, you still need to specify the new input in the workflow:
> action-workflow.yaml
version: 1.0
input:
- jira_payload
- jira_data_field
With all of that in place, you can now do nested variable interpolation using YAQL!
- jira_data: <% ctx().jira_payload.fields.get(ctx().jira_data_field) %>
YAQL will first resolve the inner ctx().jira_data_field expression, fetching whichever customfield value was passed in from the config.schema.yaml. With that value inserted, .get() then looks up that key on the payload's fields object. Essentially what happens is:
- jira_data: <% ctx().jira_payload.fields.get(ctx().jira_data_field) %>
- jira_data: <% ctx().jira_payload.fields.customfield_12345 %>
- jira_data: "{data}"
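For intuition, the same two-step resolution can be sketched in plain Python (not StackStorm code; the payload values are made up):

```python
# Hypothetical payload, plus the field name supplied by the pack config
jira_payload = {"fields": {"customfield_12345": "{data}"}}
jira_data_field = "customfield_12345"

# Step 1: resolve the inner expression (the configured field name).
# Step 2: look that key up on the fields object, like YAQL's .get().
jira_data = jira_payload["fields"].get(jira_data_field)
print(jira_data)  # {data}
```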
Note: This is only doable with YAQL. When testing these patterns, Jinja2 was unable to "resolve down" regardless of whether it used the expression delimiter or the statement delimiter, while YAQL works exactly the way it should. If you've been able to get it working with Jinja2, hit me up in Bitovi's Community Discord #devops channel!
2. Managing JSON Data with Jinja2 Filters
As hinted above, one of the most useful parts of Jinja2 is the ability to easily string-ify data and turn it back into objects. StackStorm generally prefers string variables, so having ways to easily convert them to other data types and process them is extremely helpful.
If you manually reach out to an API using core.local, core.http, or some other command-line method, you will likely receive a string-based response in your action's result or stdout. Having this pattern available is very useful when integrating new APIs:
fetch_data:
  action: core.http
  input:
    url: "{{ ctx().api_url }}"
    method: GET
    verify_ssl_cert: true
  next:
    - publish:
        - json_string_response: <% task(fetch_data).result.body %>
      do: convert_to_json_object

convert_to_json_object:
  action: core.noop
  next:
    - when: <% succeeded() %>
      publish:
        # Load response as JSON object so we can filter/select
        - json_object: "{{ ctx().json_string_response | from_json_string }}"
      do: send_to_db

send_to_db:
  action: my_pack.backup_to_mongodb
  input:
    ip: "{{ ctx().mdb_instance_ip }}"
    db: "{{ ctx().mdb_db_name }}"
    collection: "{{ ctx().mdb_collection_name }}"
    db_username: "{{ ctx().mdb_db_username }}"
    db_password: "{{ ctx().mdb_db_password }}"
    json_data: "{{ ctx().json_object.data | to_json_string }}"
Because you first converted json_string_response to a json_object, you were able to select out the json_object.data key in the send_to_db task. If you had not first converted the object type, ctx().json_object.data would fail with an "expected object type 'dict' got 'string'" error.
At the same time you’re selecting your data from the object, you’re still able to convert the data back to a json string, should the action require a string object type. Best of both worlds!
This could also be slightly condensed by publishing the initial fetch_data response directly to a JSON object with from_json_string, but I wanted to show the conversion as a distinct step.
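In plain Python, the round trip in the workflow above corresponds roughly to json.loads and json.dumps (the response body here is invented for illustration):

```python
import json

# Hypothetical string body returned by core.http
json_string_response = '{"data": {"id": 42, "status": "ok"}, "meta": {}}'

# ~ from_json_string: string -> dict, so keys become selectable
json_object = json.loads(json_string_response)

# Select .data, then ~ to_json_string to hand a string to the next action
json_data = json.dumps(json_object["data"])
print(json_data)
```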
3. Jinja Loop inside Variable Publish
One of the more interesting data processing techniques is the loop inside a variable publish. For example, let's say you're developing a workflow that could receive a list of multiple alerts bundled together. Each alert has three fields: host, port, and message.
For example:
{"alerts":
[{"host":"hostA", "port":"12345", "message":"Unable to connect."},
{"host":"hostB", "port":"24680", "message":"No route to host."},
…]
}
You'd want to collect this information and format it cleanly, so that a Jira ticket relating to the alerts is more readable than a pasted JSON object. A neat trick is to use a Jinja for loop within a variable publish to format multiple lines together:
format_output:
  action: core.noop
  next:
    - when: <% succeeded() %>
      publish:
        - formatted_alerts: |
            {% for alert in ctx().alerts %}
            Connection to {{ alert.host }}:{{ alert.port }} failed!
            Error Code: {{ alert.message }}
            ---
            {% endfor -%}
This will give you a formatted_alerts variable containing a nicely formatted text block:
Connection to hostA:12345 failed!
Error Code: "Unable to connect."
---
Connection to hostB:24680 failed!
Error Code: "No route to host."
---
...
The resulting contents of the above variable can be easily added as an e-mail output or any place you want to send the formatted messaging.
Using this inline Jinja for loop can be more useful than StackStorm's with: items: functionality, as you do not need to specify the object you are passing in as part of the task metadata beforehand. As long as you have the array you want to work with, this pattern can be used almost anywhere inside a workflow.
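The same formatting logic, sketched in plain Python for comparison (the alert list mirrors the example payload above):

```python
alerts = [
    {"host": "hostA", "port": "12345", "message": "Unable to connect."},
    {"host": "hostB", "port": "24680", "message": "No route to host."},
]

# Build the same text block the Jinja for-loop publish produces
lines = []
for alert in alerts:
    lines.append(f"Connection to {alert['host']}:{alert['port']} failed!")
    lines.append(f"Error Code: {alert['message']}")
    lines.append("---")
formatted_alerts = "\n".join(lines)
print(formatted_alerts)
```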
4. Duplicate Variable Publish
Let's say you have some data you want to apply both a Jinja filter and a YAQL selector to, without having individual tasks for each. Is this even possible? Yes!
Normally, mixing YAQL and Jinja within the same field causes immediate errors. However, you can publish the same variable multiple times, using whichever templating language you need each time, including publishing the same variable multiple times in a single task.
format_output:
  action: core.noop
  next:
    - when: <% succeeded() %>
      publish:
        - selected_data: <% ctx().data.where($.type = 'foo').select($.data) %>
        - selected_data: "{{ ctx().selected_data | to_json_string }}"
If you call selected_data after this step, the result will be the data matching type = 'foo', in the form of a JSON string.
We're not technically mixing YAQL and Jinja here, as each exists within its own statement. The publish block is a list rather than a dictionary, so duplicate keys are allowed, which is what makes the 'double publish' to the same variable possible.
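Conceptually, the double publish is just two sequential assignments to the same name, which this plain-Python sketch mimics (the data is hypothetical):

```python
import json

# Hypothetical context data
data = [
    {"type": "foo", "data": {"id": 1}},
    {"type": "bar", "data": {"id": 2}},
]

# First publish: the YAQL-style where/select
selected_data = [item["data"] for item in data if item["type"] == "foo"]

# Second publish: the Jinja-style to_json_string over the previous value
selected_data = json.dumps(selected_data)
print(selected_data)  # [{"id": 1}]
```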
5. Selecting Data from a with: items: Based Task
Rounding out our data processing tips is a simple pointer on how to use the data output by a with: items: based workflow task.
If you've tried selecting task(foo) output data before, you may have noticed that you occasionally need to select data from a result.result key if the task does not specifically export values.
When using the with: items: task pattern, e.g. curl-ing against a single endpoint IP using an array of ports:
test_endpoint:
  action: core.local
  with: port in <% ctx().endpoint_ports %>
  input:
    cmd: "curl -w '{\"http_code\":\"%{http_code}\", \"remote_ip\":\"%{remote_ip}\", \"remote_port\":\"%{remote_port}\"}' '<% ctx().endpoint_url %>:<% item(port) %>' -o /dev/null -m 60"
  next:
    - when: <% succeeded() %>
      publish:
        - curl_results: <% task(test_endpoint).result.items.result.stdout %>
You'll need to select result.items.result, even for an array with a single item. If the task above was given only a single port, the output would still be nested under result.items.result.
The -w flag writes out only the specified information, which here has been manually formatted and escaped into a JSON object. -o /dev/null suppresses all other output. The local version of curl was slightly out of date; otherwise you could have used -w '%{json}' (introduced in curl 7.70) to output all variables in JSON format instead of formatting them manually.
Even though this task loops, it might seem like each action would begin its own branching workflow, or would otherwise clobber the variable so it only contains the last result. Instead, curl_results will contain the results of every curl: each new task result from the list of items is appended to the published variable as an array, for example:
> curl_results:
[{"http_code":"401", "remote_ip":"2.4.6.8", "remote_port":"3000"},
{"http_code":"200", "remote_ip":"1.3.5.7", "remote_port":"80"},
{"http_code":"200", "remote_ip":"1.3.5.7", "remote_port":"3821"}]
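To illustrate how the per-item results aggregate, here is a plain-Python sketch: each item's stdout is one curl write-out string, and parsing each one yields the array shown above (the values are the hypothetical ones from the example):

```python
import json

# One stdout string per with-items iteration (hypothetical curl write-outs)
stdout_per_item = [
    '{"http_code":"401", "remote_ip":"2.4.6.8", "remote_port":"3000"}',
    '{"http_code":"200", "remote_ip":"1.3.5.7", "remote_port":"80"}',
    '{"http_code":"200", "remote_ip":"1.3.5.7", "remote_port":"3821"}',
]

# ~ task(test_endpoint).result.items.result.stdout, parsed into objects
curl_results = [json.loads(stdout) for stdout in stdout_per_item]
print(curl_results[0]["http_code"])  # 401
```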
Takeaways
It's easy to get hung up on small data processing issues when you're getting started with StackStorm, as there are many corner-case pitfalls you can run into. The most important thing to remember is that both YAQL and Jinja excel in certain ways, so if you're having issues with one of the languages, perhaps there's a case to be made for using the other. Having both on hand as alternatives to each other is one of StackStorm's greatest strengths.
If you found these patterns helpful, consider saying thanks in Bitovi's Community Discord #devops channel. Or if you have ideas or tips and tricks of your own that you want to chat about, the best place to share those thoughts is the StackStorm Community Slack!
Need Help?
Bitovi has consultants that can help. Drop into Bitovi's Community Discord and talk to us in the #devops channel!
Need DevOps Consulting Services? Head over to https://www.bitovi.com/devops-consulting and book a free consultation.