Revisiting 'Clojure Don'ts: Concat'

Nostalgia City

I've recently started maintaining a Clojure codebase that hasn't been touched for over a decade - all the Clojure devs who built and maintained it are long gone. It's using Java 8, Clojure 1.6 and libs like korma and noir - remember those? Contrary to the prevailing Clojure lore, upgrading Clojure will not be just a matter of changing version numbers in the lein project.clj.

I find one of the most dated aspects of the project is the laziness. I only use laziness as an explicit choice and have done so for many years. Laziness is a feature I find I rarely need, but is sometimes just the right fit.

A lot of the original Clojure collection functions are lazy and it is still common to see new code written with them - I think because they are still seen as an idiomatic default, rather than a conscious choice. Non-lazy versions like mapv and filterv came later and transducers later still, but of course the old functions must continue to work as before.

Investigating a bug in the codebase led me back to this great blog post, Clojure Don'ts: Concat, also written around a decade ago. The rest of this post will discuss that post, so if you haven't read it, please read that first (and ofc the rest of the Don'ts series is also good).

Revisiting the post

I had first read the post many years ago and had forgotten the details. The main thing I remembered was 'don't use concat', which is maybe a good heuristic but misses the main point, which could be phrased as: build lazy sequences starting from the outside. I'll explain the outside thing further on.

Reading it again, I had to go over it a couple of times to fully understand it - if it was crystal clear to you then you've no need to read on. To check your understanding, answer this: what difference would it make (wrt overflow) to change the order of the args to concat in the build-result function?

Following is my attempt to make the post's message even clearer.

The post mentions that seq realises the collection and causes the overflow. Just in case it is not clear, seq does not in general realise lazy collections in their entirety; it just realises the first element.

To demonstrate that, have a look at the following, which is like range but the numbers in the sequence descend to one:


(defn range-descending [x]
  (when (pos? x)
    (lazy-seq
      (cons x (range-descending (dec x))))))

(let [_ (seq (range-descending 4000))]
  nil) ; => ok, no overflow

This is what one might call an outside-in lazy sequence. As the sequence is generated, one might picture it like this:

(4000, LazySeq-obj)
(4000, 3999, LazySeq-obj)
(4000, 3999, 3998, LazySeq-obj)
...

When seq is called on the collection, only the first element is realised, so there is no overflow.
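To see this for yourself, here is a minimal sketch that counts how many elements actually get realised. The counter atom and the name counted-range-descending are additions for illustration only; the sequence logic is the same as range-descending above.

```clojure
;; Illustrative variant of range-descending: `counter` (an atom) is bumped
;; each time the body of a lazy-seq actually runs.
(defn counted-range-descending [counter x]
  (when (pos? x)
    (lazy-seq
      (swap! counter inc)
      (cons x (counted-range-descending counter (dec x))))))

(def realised (atom 0))

;; seq only needs the head of the sequence
(seq (counted-range-descending realised 4000))
@realised ; => 1

;; walking three elements realises exactly three
(reset! realised 0)
(doall (take 3 (counted-range-descending realised 4000)))
@realised ; => 3
```

Each cons cell holds its value plus an unrealised LazySeq for the rest, so the work done is proportional to how far the consumer walks.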

The equivalent to the way concat was used in the original post would be more like this:

(defn range-descending-ohno [x]
  (when (pos? x)
    (lazy-seq
      (conj (range-descending-ohno (dec x)) x))))

Now visualising the sequence generation, it would look more like this:

(conj LazySeq-obj 4000)
(conj (conj LazySeq-obj 3999) 4000)
...
(conj (conj ... (conj (conj nil 1) 2) ... 3999) 4000)

Now when calling seq (as in (seq (range-descending-ohno 4000))), the whole sequence needs to be realised for seq to get to the first element (4000 in the example). As the post says: seq has to recurse through them until it finds an actual value. One might call this an inside-out lazy sequence.
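The same counting trick makes the difference visible. This is a sketch with a deliberately small input so that it does not overflow; the counter atom and the name counted-range-descending-ohno are illustrative additions, not part of the original post.

```clojure
;; Illustrative variant of range-descending-ohno with a realisation counter.
(defn counted-range-descending-ohno [counter x]
  (when (pos? x)
    (lazy-seq
      (swap! counter inc)
      ;; conj on a LazySeq has to realise it first, so this recurses
      ;; all the way down before the outermost element can be returned
      (conj (counted-range-descending-ohno counter (dec x)) x))))

(def realised-ohno (atom 0))

(seq (counted-range-descending-ohno realised-ohno 5)) ; => (5 4 3 2 1)
@realised-ohno ; => 5, every level was realised just to get the head
```

With an input of 5 that is harmless, but the nesting depth grows with the input, which is where the stack overflow comes from.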

Conclusion

The original post concludes 'Don't use lazy sequence operations in a non-lazy loop' - to which I would add: don't use laziness at all unless required.

If deciding to use laziness, avoid building sequences inside-out - this might be in your direct usage of e.g. lazy-seq or hiding in plain sight in your usage of e.g. clojure.core functions such as concat.
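As a sketch of the eager alternative: the loop below is an assumed reconstruction of the general shape of build-result (accumulating results in a non-lazy loop), with concat swapped for into on a vector. The names next-results and build-result-eager, and the body of next-results, are stand-ins for illustration rather than a copy of the original post's code.

```clojure
;; Stand-in for whatever produces each batch of results (assumption).
(defn next-results [n]
  (range 1 n))

;; Eager accumulation: `into` adds each batch to the vector immediately,
;; so no chain of unrealised lazy sequences builds up between iterations.
(defn build-result-eager [n]
  (loop [counter 1
         results []]
    (if (< counter n)
      (recur (inc counter) (into results (next-results counter)))
      results)))

(build-result-eager 5)            ; => [1 1 2 1 2 3]
(first (build-result-eager 1000)) ; => 1
```

Because the vector is fully built at every step, getting the first element afterwards does no deferred work, however many iterations the loop ran.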

Clojurescript using JS libraries via importmap

In this post I am going to look at using the importmap feature (supported by all modern browsers), as an alternative way for Clojurescript apps to access npm dependencies.

The Problem

When a Clojurescript app depends on a regular JS library, such as React for example, then it is typical to:

have the code from the npm library 'processed' in some fashion (e.g. to target a specific JS version)
bundle the 3rd party code with the application code and have it delivered from the application server

There are different ways this can happen, for example:

use :target :bundle with a bundler such as webpack
use shadow-cljs with its default shadow provider

I am a fan of shadow-cljs and so would typically use the second option. What this actually does is apply :simple optimizations to 3rd-party code, which means the Google Closure compiler is going to read 3rd-party libs when the app is being built. Sometimes though, Closure cannot understand the 3rd party code, for example because it doesn't have support for class fields. In this really interesting talk from Alex Davis, he says he is seeing more and more popular JS libraries that Closure can't handle. I've only had the issue once myself and thankfully was able to configure shadow to use a different file than the problem one.

So, what to do?

Using importmap

Sticking with shadow but using it with a different provider (e.g. webpack) is an option, but for browser apps there is another interesting option to consider: get pre-processed 3rd party libraries directly in the browser via a script tag (e.g. from a CDN such as unpkg).

I used a version of this approach for my experiments that called the deja-fu library's rationale into question. There, I just had a couple of script tags for 3rd party libs, followed by a script tag getting the application code. This meant script tags were order-sensitive and would not scale well because transitive dependencies would not be fetched automatically. Still, for a simple app it worked fine.

Recently though, all browsers have got support for importmap, which is best explained by example:


<script type="importmap">
{
  "imports": {
    "react": "https://esm.sh/react@18.2.0",
    "react-dom/client": "https://esm.sh/react-dom@18.2.0",
    "@tanstack/react-router": "https://esm.sh/@tanstack/react-router",
    "my-demo-app": "/cljs-importmap-demo/cljs-out/main.js"
  }
}
</script>
<script type="module">
    import start from "my-demo-app";
    start();
</script>

The imports map is a bit like package.json dependencies - it says what libraries are needed and details of how to get them - all must arrive as ES6 modules. Transitive dependencies are also retrieved. After the importmap, the script tag with type=module says to interpret the code within as an ES6 module. Here, it just imports the application code and starts it.

The module my-demo-app just contains the application code, not any 3rd party libraries. Generating the module from Clojurescript code is just a matter of using shadow-cljs's documented options for that, for example:

{:target     :esm
 :js-options {:js-provider :import}
 :modules    {:main {:exports {'default 'com.widdindustries.demo-app.app/init}}}}
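On the Clojurescript side, the application namespace can then require the libraries by the same names used as keys in the importmap. The following is a minimal sketch rather than the demo app's actual source: the namespace and init function match the :exports entry above, but the element id "app" and the rendered content are assumptions for illustration. It needs a browser and the importmap to run.

```clojure
(ns com.widdindustries.demo-app.app
  (:require ["react" :as react]
            ["react-dom/client" :as rdom]))

;; Exported as the module's default export by the build config,
;; so the page's inline module script can import and call it.
(defn init []
  (let [container (js/document.getElementById "app") ; hypothetical element id
        root      (rdom/createRoot container)]
    (.render root (react/createElement "h1" nil "Hello from importmap"))))
```

With :js-provider :import, shadow-cljs leaves these requires as plain ES module imports in the output, and the browser resolves them through the importmap at load time.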

Here is an example app demonstrating this technique and here is the source code for that.

Pros and cons

This is the first time I've tried using importmap. I failed to google any experience reports of anyone using it from Clojurescript, hence this post. Here are some pros and cons I am aware of so far:

Using importmap, 3rd party libraries can be cached. The application code will also be cached, but it is likely to change at a faster rate than library versions do, so it will be downloaded more frequently. On the other hand, it will be smaller than the traditional bundled version.

The importmap is specified in the html file, but will also need to be specified again for a page that loads tests for example. Also, it may be required to use a dev-time version of a library locally, but deploy with the optimized one. For example, React performance profiling tools only work with the dev-time React version. It is possible to conditionally create the importmap, for example, if on localhost, create one map, if deployed a different one.
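One way to avoid repeating the importmap by hand is to generate it, for example when the html page is produced. Here is a rough sketch in Clojure, under a few assumptions: the function names are hypothetical, the ?dev query suffix is assumed to be how the CDN (esm.sh here) serves development builds, and pr-str is used as a stand-in for JSON string quoting, which is only safe for simple ASCII keys and URLs like these.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: importmap entries for a given environment.
;; The "?dev" suffix is an assumption about the CDN's development builds.
(defn importmap-imports [dev?]
  (let [suffix (if dev? "?dev" "")]
    [["react"            (str "https://esm.sh/react@18.2.0" suffix)]
     ["react-dom/client" (str "https://esm.sh/react-dom@18.2.0" suffix)]
     ["my-demo-app"      "/cljs-importmap-demo/cljs-out/main.js"]]))

;; Renders the entries as an importmap script tag.
(defn importmap-tag [dev?]
  (str "<script type=\"importmap\">\n"
       "{\"imports\": {\n"
       (str/join ",\n"
                 (for [[k v] (importmap-imports dev?)]
                   (str "  " (pr-str k) ": " (pr-str v))))
       "\n}}\n"
       "</script>"))

(println (importmap-tag true))
```

The same function could be called from the page that loads tests, so the app page and the test page share one definition of the map.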

The 3rd party libraries are being retrieved wholesale - ie no dead code elimination could happen here. Is significant dead code elimination a thing in JS-land these days though? I've heard of Rollup, but I haven't tried out how it would help trim down React and the like.

importmap is a relatively recent addition to browsers - so might not be suitable for some potential users.

Loading speed vs bundled apps, aka time-to-interactive (TTI)? I haven't measured anything yet. Please comment if you have experience of this.

Any more you'd add? Please use the link below to discuss.

Published: 2023-11-08

Tagged: clojure

Taming a Clojurescript mega-project with Shadow and Kaocha

In this post I am going to look at applying my regular dev setup to a project with a lot of code and tests that take a very long time to run.

Firstly, here are my Clojurescript dev setup requirements:

run a single test from the IDE - ie the one under the cursor
run all tests in namespace similar to above
run any subset of tests based on ns-pattern
see nicely formatted test output - for example clearly highlighting the diff in expected/actual maps
tests in CI output a standard format like junit xml, so results can be interpreted by CI tool
tests in CI can run under advanced compilation (so as to be similar to the production build)
fast incremental compile+reload

Anything you'd add? If so, please see the link at the bottom for how to discuss.

I should also say that I'm generally developing in multi-person teams building SPAs with considerable business complexity. However, the list is still the same even for my own open source or hobby projects, but in some situations not everything above is a must-have. For example, if you only have a small number of fast-running tests then the items above about running subsets of tests will not be so important.

I recently started to do some work on a project with around 800 Clojurescript (browser) tests - which in itself may not sound massive, but there are a number of slower DOM-clicking tests, so the total test-run time is above 20 minutes. It goes without saying that one would not want to run all the tests in one go - but this fact meant that builds and tests had been split apart and this had led to quite a bit of complexity in the build setup: local dev was done with figwheel+webpack, configured with multiple extra mains, while CI used shadow+karma.

With the existing build setup, on saving a file you could be waiting 10+ seconds to see the incremental build finish - ouch! To run a single test required knowing which figwheel 'extra-main' the test would be compiled into and loading the auto-test browser page for that, and then doing some other steps I'm not even going to get into... all in all, not ideal.

So... what to do? My preferred Clojurescript build and test setup for the past couple of years has been shadow plus kaocha-cljs2. Everyone knows shadow of course, but kaocha-cljs2 seems weirdly unstarred (< 20) and un-discussed on the interwebs. The combination of these two gives me the above wishlist of course - that's why I chose it. But how well would it scale to the new megaproject? How easy would it be to set up?

Setup

Possibly one reason kaocha-cljs2 seems under-appreciated is that by design there are more moving parts compared to other Clojurescript test setups I have used - for example one needs an extra server (Funnel) for 2-way communication between jvm and js environments.

However, setup couldn't have been easier - and that's because I have a little ready-made shadow+kaocha_cljs2 template that I use in all my projects and libraries. I've called this template tiado-cljs2 and if you want to try it on your project, you'll have it up and running in minutes - see the README for instructions.

Does it scale?

In the way I set up Shadow on this project, there are 4 'watches' going at once, one for the main app, one for tests and a couple of others for some miscellaneous apps. Shadow incrementally compiles just what is needed so if changing a test file, just the test build kicks in. So compared to the old build, incremental compile is often around 5-10x faster.

When it comes to the tests, I have used kaocha suites to split the tests based on ns-patterns - which in CI can be run concurrently. Kaocha-cljs2 doesn't support 'ns-patterns' out of the box yet (as kaocha does for clojure tests) - but luckily kaocha supports user-defined hooks so adding it was not difficult.
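The core of ns-pattern selection is just regex matching on namespace names. The sketch below shows that idea in isolation; it is not kaocha's hook API or the actual hook used on the project, and the function names and example namespaces are made up for illustration.

```clojure
;; True when the namespace name matches at least one of the regex strings.
(defn matches-any-pattern? [ns-patterns ns-sym]
  (boolean (some #(re-find (re-pattern %) (str ns-sym)) ns-patterns)))

;; Keeps only the namespaces selected by a suite's ns-patterns.
(defn select-test-namespaces [ns-patterns ns-syms]
  (filterv #(matches-any-pattern? ns-patterns %) ns-syms))

(select-test-namespaces ["^app\\.dom\\."]
                        '[app.dom.click-test app.logic.calc-test])
;; => [app.dom.click-test]
```

A hook would apply this kind of filter to the test plan for each suite, so that each suite (and each concurrent CI job) only runs its own slice of the namespaces.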

With these mega test-suites, the default timeout was not enough. User-defined timeouts are not currently respected by kaocha-cljs2 - so a little monkey patching was needed.

As well as running individual suites, the holy grail of clojurescript testing is surely having an IDE hotkey to run the test under the caret. This is achieved with a simple macro invoking (kaocha.repl/run xxx) - a macro was required (rather than a function) so that it can be invoked when either in a cljs or clj repl.

I could have had multiple test watches instead of one - each watching a subset of tests (this is a shadow feature). However the single one works well enough and makes life easier for developers as there is just a single place for tests to run.

So, whilst compilation and test-running times still seem a bit bigger compared to what I'm used to, the whole thing now feels far more manageable. There may be scope for modularization of the app - I don't know yet, but I'm much happier to investigate that and experiment with a solid, speedier build+test setup.

Finally

The fact that all this works as well as it does is thanks to the shadow and kaocha maintainers of course. Clojurescript would not be in such a good place without them!

Published: 2023-08-17