Revisiting 'Clojure Don'ts : concat

Nostalgia City

I've recently started maintaining a Clojure codebase that hasn't been touched for over a decade - all Clojure devs that built and maintained it are long gone. It's using java8, Clojure 1.6 and libs like korma and noir - remember those? Contrary to the prevailing Clojure lore, upgrading Clojure will not be just a matter of changing version numbers in the lein project.clj.

I find one of the most dated aspects of the project is the laziness. I only use laziness as an explicit choice and have done so for many years. Laziness is a feature I find I rarely need, but is sometimes just the right fit.

A lot of the original Clojure collection functions are lazy and it is still common to see new code written with them - I think because they are still seen as an idiomatic default, rather than a conscious choice. Non-lazy versions like mapv and filterv came later and transducers later still, but of course the old functions must continue to work as before.

Investigating a bug in the codebase led me back to this great blog post, Clojure Dont's: Concat also written around a decade ago. The rest of this post will discuss that post, so if you haven't read it, please read that first (and ofc the rest of the 'Dont's series is also good').

Revisiting the post

I had first read the post many years ago and had forgotten the details - I guess, the main thing I remembered was 'don't use concat' - which is maybe a good heuristic but actually missed the main point which could be phrased as build lazy sequences starting from the outside - I'll explain the outside thing further on.

Reading it again, I had go over it couple of times to fully understand it - if it was crystal clear to you then you've no need to read on. To check your understanding - answer this: what difference would it make (wrt overflow) to change the order of the args to concat in the build-result function?

Following is my attempt to make the post's message even clearer.

The post mentions that seq realises the collection and causes the overflow. Just in case it is not clear, seq does not in general realise lazy collections in entirety, it just realises the first element.

To demonstrate that, have a look at the following, which is like range but the numbers in the sequence descend to one :


(defn range-descending [x]
  (when (pos? x)
    (lazy-seq
      (cons x (range-descending (dec x))))))

(let [_ (seq (range-descending 4000))]
  nil) ; => ok, no overflow

This is what one might call an outside-in lazy sequence. As the sequence is generated, one might picture it like this:

(4000, LazySeq-obj)
(4000, 3999, LazySeq-obj)
(4000, 3999, 3998, LazySeq-obj)
...

Calling seq on the collection, only the first element is realized, so no overflow.

The equivalent to the way concat was used in the original post would be more like this:

  (defn range-descending-ohno [x]
    (when (pos? x)
      (lazy-seq
        (conj (range-descending-ohno (dec x)) x))))

Now visualising the sequence generation, it would look more like this:

(conj LazySeq-obj 4000)
(conj (conj LazySeq-obj 3999) 4000)
...
(conj `...` (conj nil 1) `...` 4000)

Now when calling seq (as in (seq (range-descending-ohno 4000))), the whole sequence needs to be realised for seq to get to the first element (4000 in the example). As the post says: seq has to recurse through them until it finds an actual value. One might call this an inside-out lazy sequence.

Conclusion

The original post concludes Don’t use lazy sequence operations in a non-lazy loop - which I would update to add don't use laziness at all unless required.

If deciding to use laziness, avoid building sequences inside-out - this might be in your direct usage of e.g. lazy-seq or hiding in plain sight in your usage of e.g. clojure.core functions such as concat.

Henry Widd's Blog

Revisiting 'Clojure Don'ts : concat

Nostalgia City

Revisiting the post

Conclusion

Further Reading