WSSCode Blog

Pathom Updates 7, Pathom 3 goes Async

January 20, 2021

Welcome to one more edition of Pathom updates!

Recursive queries

To start, I'd like to talk about recursive queries.

This important feature was missing from Pathom 3 until recently, but not anymore!

You can learn more about this feature on its documentation page.
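As a tiny illustration (a toy query of my own, not taken from that page), EQL expresses recursion by putting `...` or a depth number in place of a subquery:

[:dir/name {:dir/children ...}] ; unbounded: keep following :dir/children as deep as the data goes
[:dir/name {:dir/children 3}]   ; bounded: recurse at most 3 levels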

Hacker News scraper tutorial

This is a recent addition to the tutorials on the documentation site.

In this tutorial, I model the Hacker News data with Pathom. For the implementation, I used a scraping strategy, extracting data from the HTML.

This tutorial is medium-sized and touches on many aspects of Pathom. If you like to learn through building (which is one of the most effective ways, IMO), check it out!

Async support

Pathom now has a new runner implementation that allows resolvers to use async processes.

In the async runner, when a resolver or mutation returns a future-like value, Pathom waits for that future to be realized before moving on.

For the underlying implementation, Pathom uses Promesa. Promesa is fast and uses good native primitives under the hood: on the JVM it uses CompletableFuture, and in JS it uses Promises.
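To give an idea of how it looks, here is a minimal, self-contained sketch (a toy resolver of my own, using my usual aliases; check the docs for the exact API). The "async work" is simulated with a Promesa delay:

(require '[com.wsscode.pathom3.connect.indexes :as pci]
         '[com.wsscode.pathom3.connect.operation :as pco]
         '[com.wsscode.pathom3.interface.async.eql :as p.a.eql]
         '[promesa.core :as p])

;; a resolver can return a future-like value; the async runner waits
;; for it before moving on
(pco/defresolver answer []
  {::pco/output [:answer]}
  (p/do!
    (p/delay 50) ; simulate async work
    {:answer 42}))

(def env (pci/register answer))

;; the async EQL interface also returns a future-like value
(-> (p.a.eql/process env [:answer])
    (p/then println))
;; prints {:answer 42}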

I’m quite happy with its performance (benchmarks further down in this article).

It’s also extensible. I documented how to extend it to support core.async channels instead of futures; you can find this in the async documentation.

Benchmarks

In Pathom 2, I used core.async as the primary building block for the async support.

When I started the Pathom 3 async support, I did the same. After measuring the performance, it wasn’t that good: processing the same items through the new async runner was considerably slower than through the sync one.

So I decided to take a second pass at it and try something else, which was Promesa.

With Promesa, I got performance very close to the serial runner!

Here are the benchmark results; they also include the Pathom 2 runners:

note

All these tests were run on the JVM, using Criterium to measure the executions.

Each table below shows the results for one benchmark scenario.

Runner                       Mean         Variance
Pathom 3 Serial Cached Plan  0.009 ms     1.000x
Pathom 3 Serial              0.042 ms     4.982x
Pathom 2 Serial              0.057 ms     6.715x
Pathom 2 Async               0.108 ms     12.714x
Pathom 2 Parallel            0.145 ms     17.048x

Runner                       Mean         Variance
Pathom 3 Serial Cached Plan  0.013 ms     1.000x
Pathom 3 Serial              0.028 ms     2.047x
Pathom 2 Serial              0.574 ms     42.605x
Pathom 2 Async               1.086 ms     80.521x
Pathom 2 Parallel            1.682 ms     124.760x

Runner                       Mean         Variance
Pathom 3 Serial Cached Plan  19.404 ms    1.340x
Pathom 3 Serial              14.479 ms    1.000x
Pathom 2 Serial              111.461 ms   7.698x
Pathom 2 Async               228.621 ms   15.790x
Pathom 2 Parallel            123.812 ms   8.551x

Runner                       Mean         Variance
Pathom 3 Serial Cached Plan  19.699 ms    1.025x
Pathom 3 Serial              19.219 ms    1.000x
Pathom 2 Serial              114.969 ms   5.982x
Pathom 2 Async               236.170 ms   12.289x
Pathom 2 Parallel            69.532 ms    3.618x

Runner                       Mean         Variance
Pathom 3 Serial Cached Plan  18.412 ms    1.000x
Pathom 3 Serial              19.476 ms    1.058x
Pathom 2 Serial              149.716 ms   8.131x
Pathom 2 Async               281.580 ms   15.293x
Pathom 2 Parallel            124.575 ms   6.766x

Runner                       Mean         Variance
Pathom 3 Serial Cached Plan  17.396 ms    1.000x
Pathom 3 Serial              18.194 ms    1.046x
Pathom 2 Serial              139.458 ms   8.017x
Pathom 2 Async               280.384 ms   16.117x
Pathom 2 Parallel            141.067 ms   8.109x

Runner                       Mean         Variance
Pathom 3 Serial Cached Plan  21.772 ms    1.000x
Pathom 3 Serial              22.292 ms    1.024x
Pathom 2 Serial              140.549 ms   6.456x
Pathom 2 Async               308.395 ms   14.165x
Pathom 2 Parallel            97.979 ms    4.500x

Runner                       Mean         Variance
Pathom 3 Serial Cached Plan  208.209 ms   1.000x
Pathom 3 Serial              211.243 ms   1.015x
Pathom 2 Serial              220.532 ms   1.059x
Pathom 2 Async               240.124 ms   1.153x
Pathom 2 Parallel            214.609 ms   1.031x

Runner                       Mean         Variance
Pathom 3 Serial Cached Plan  29.681 ms    1.000x
Pathom 3 Serial              31.858 ms    1.073x
Pathom 2 Serial              300.165 ms   10.113x
Pathom 2 Async               327.642 ms   11.039x
Pathom 2 Parallel            87.656 ms    2.953x

I don’t think this is a signal that core.async is slow. The way I’m using core.async probably has a big impact. Since core.async doesn’t have error propagation built in, I had to create my own constructs for it. This means checking for errors at every channel read, which adds overhead.
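To illustrate what I mean, here is a hypothetical sketch of that kind of construct (not Pathom’s actual internal code): errors travel through the channel as values, and a helper re-throws them at every read.

(require '[clojure.core.async :refer [go <!]])

;; hypothetical helper: re-throw any Throwable coming out of the channel
(defmacro <? [ch]
  `(let [v# (<! ~ch)]
     (if (instance? Throwable v#)
       (throw v#)
       v#)))

(defn risky-step []
  (go
    (try
      (throw (ex-info "boom" {})) ; pretend some work failed
      (catch Throwable e e))))    ; the error travels as a regular value

;; every read now pays for an extra type check
(go
  (try
    (println (<? (risky-step)))
    (catch Throwable e
      (println "recovered:" (ex-message e)))))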

I think core.async is just not appropriate for the usage at hand. A future-based mechanism ended up suiting this situation better.

Parallel support (not available yet)

Parallel support isn’t available yet, but it will probably come as an extension to this same async runner.

Making it do some blind parallelism is easy: during the collection process, for example, I could trigger all the items at the same time (see the sketch after the list below).

A non-naive implementation of parallel support also requires resource management. I want to allow users to configure things like:

  • How many items should run in parallel for a given sequence
  • How many “operations” a single request can do in parallel
  • Which thread pools to use for parallel processing
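To make the contrast concrete, here is a rough sketch using plain Promesa (not Pathom’s internals): the naive version fires everything at once, while the bounded version is one crude way to implement the “items per sequence” knob.

(require '[promesa.core :as p])

;; blind parallelism: start every item at once and wait for all of them;
;; `fetch-item` stands for any step that returns a promise
(defn process-all-naive [fetch-item items]
  (p/all (mapv fetch-item items)))

;; bounded parallelism: walk the sequence in batches of `n`, so at most
;; `n` items are in flight at any time
(defn process-all-bounded [fetch-item items n]
  (reduce (fn [acc-p batch]
            (p/let [acc     acc-p
                    results (p/all (mapv fetch-item batch))]
              (into acc results)))
          (p/resolved [])
          (partition-all n items)))

(p/all here simply waits for a group of promises and collects their results.)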

In Pathom 2, the parallel processing added a lot of overhead. Due to structural changes in Pathom 3, this is likely to be different this time.

Another big difference is that in Pathom 2 the runner had to recalculate paths multiple times when things went wrong. The new planner already knows every possible path ahead of time, so the new parallel runner implementation will be much simpler thanks to this pre-work from the planner.

Once those pieces are in place, the same code you write today will be able to run in parallel!

Porting repl-tooling

Once I got the basics of async working, I wanted to use it in some real applications.

Luckily for me, there is repl-tooling, used by the Chlorine editor.

Mauricio Szabo used Pathom to compute editor-related information, and there are some interesting, complex dependencies in this process. At the same time, the code is small, which makes it a great candidate for the porting experiment.

note

This is a good example of using Pathom outside the API realm. The task of Pathom is to handle data realization via declarative attribute relationships. This is a property you can leverage in any domain you are working with!

Porting the code was easy. Repl-tooling doesn’t use any plugins or fancy features. In the process, I simplified the code by using the implicit inputs feature.
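For context, implicit inputs mean the resolver’s input is inferred from its argument destructuring, and the output from the returned map. A toy example of the same resolver written both ways (not repl-tooling’s actual code):

(require '[com.wsscode.pathom3.connect.operation :as pco])

;; explicit: input and output declared by hand
(pco/defresolver full-name [{:user/keys [first-name last-name]}]
  {::pco/input  [:user/first-name :user/last-name]
   ::pco/output [:user/full-name]}
  {:user/full-name (str first-name " " last-name)})

;; implicit: Pathom infers the input from the destructuring and the
;; output from the returned map, so the config map disappears
(pco/defresolver full-name [{:user/keys [first-name last-name]}]
  {:user/full-name (str first-name " " last-name)})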

The other change was at the interface edge: replacing the parser usage with the new async EQL processor.

Then I ran the tests. Almost all of them failed.

The exercise was cool. With some debugging, I figured out some issues with the planner and the runner. Over the weekend, those got fixed, so if you are using Pathom 3, be sure to upgrade to get those fixes.

Some problems were easy to fix, like the code that filters the EQL output losing record types when they were present in the data, or a problem with lists getting reversed.

tip

One of the bugs was a consequence of a bad assumption. I assumed the following code would always output the same collection in the end:

(into (empty coll) coll)

It turns out this is not true. In the case of lists, the output has the items in reversed order.
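A quick REPL check shows the difference:

(into (empty [1 2 3]) [1 2 3])   ; => [1 2 3]
(into (empty '(1 2 3)) '(1 2 3)) ; => (3 2 1), conj adds to the head of a list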

Most of the time, I show small graphs here that I use for testing. This time I have the opportunity to show you one from a real application, and this is what it looks like:

Chlorine Graph

The Pathom 3 algorithm is still new, so I expect to find more bugs like these. With a growing number of tests, I hope the issues will become less frequent.

Next step: tooling

Next I’ll work on tooling!

I plan to extend the current Pathom Viz app to also support Pathom 3, so the same app will work with both versions.

I think I can reuse the query editor; the index is almost the same, and this should make the porting easy.

The tracer, I’m not sure about yet. The Pathom 3 process is too different from Pathom 2’s, and I need to do some experimentation to see if I can reuse it or if it needs to be something new. I do believe a complete view of the query, like the tracer gives, is essential to enable query debugging at a glance.

There is also the new graph visualization that I’ve shown in some previous posts here. This view is likely to be integrated into the timeline view, so you can inspect the graph for each execution entity.

These are the new challenges. If you’d like to discuss any of these things, reach out in #pathom on the Clojurians Slack.

That’s it for today, see you!


Follow closer

If you'd like to know more details about my projects, check my open Roam database, where you can see development details almost daily.

Support my work

I'm currently an independent developer, and I spend quite a lot of my personal time doing open-source work. If my work is valuable to you or your company, please consider supporting my work through Patreon; this way you can help me have more available time to keep doing this work. Thanks!

Current supporters

And here I'd like to give thanks to my current supporters:

Albrecht Schmidt
Alister Lee
Austin Finlinson
Daemian Mack
Jochen Bedersdorfer
Kendall Buchanan
Mark Wardle
Michael Glaesemann
Oleg, Iar, Anton
West