Log data, not strings - with SLF4J

A recent release of the popular Java logging abstraction SLF4J has a new API enabling structured logging. New Clojure logging macros using this, e.g. (log/info "request-for-help" {"priority" "high"}) are available as slf4clj.

Why Log data?

A couple of years ago I made a survey of all the logging libraries one might use from Clojure. The majority of contenders (clojure.tools.logging foremost among them) I ruled out fairly early because they only log strings, not data.

To explain this limitation, in all logging frameworks you can format messages as json or something and get {level: "INFO", message "Commissioner Gordon called because Gotham city is under attack from the Joker"}. Whilst useful, what is more queryable is to log arbitrary data (key value pairs) and have that data passed as-is to appenders, which might serialise to json, or write to a database. An equivalent message formatted as JSON might be {level: "INFO", message-type: "request for help", urgency: "high", caller: "Commissioner Gordon", foe: "Joker", target:"Gotham" }. Assuming you're not still shelling into boxes and grepping log files, IMO structured log data is something you can't go back from.

Options for Logging data from Clojure

The requirement to log data left two main contenders, MuLog and Log4j2. At the time I decided Log4j2 seemed like the boring, safe choice. Well, that didn't turn out to be quite right haha!

As a result of opting for Log4j2, I put some helper functions in a lib for Clojure users writing log statements against Log4j2. Note: the README for those contains more detailed comparison of existing Clojure logging libraries.

Released since I made that review, and created as a result of log4shell is Amperity's Dialog which is a logging backend for slf4j(1.x, string based) - the de facto Java logging facade. Dialog also provides a Logging API based on logging strings or suggests you use clojure.tools.logging (strings again).

Logging from Library Code

Log4j2 is a 'logging implementation' or 'backend'. Ideally one would write log statements against a logging abstraction, where log statements get channeled to whatever logging backend is in place. This is especially important when writing library code. Users of e.g. Carmine will find logs coming via Timbre whether they like it or not.

Looking at the options here, assuming we want to log data ofc, MuLog might be a choice. It has be made to plug into an slf4j 1.x backend, so surely can be made to plug into other things.

The most obvious choice though is SLF4J, apparently the most popular Java library of any kind. The 1.x version of this has been around for a long time and as you'd expect from an API dating from the noughties it only logs strings. With the 2.0 release, that has changed.

Logging data with Slf4j

Enter slf4j 2.0 - which was released toward the end of 2022. The 2.0 version of this popular logging facade newly includes an API for logging data, whilst remaining backwards compatible with the 1.x API.

The fluent API contains the addKeyValue(String key, Object value) method for structured logging. It's up to implementations as to what to do with the structured data. They may merge it to the MDC for that message for example as Log4j2-slf4j bridge does. The MDC is a map of String->String though, which is a problem if the value happens to be anything other than a string.

It is straightforward to use from Clojure as-is, but there are some convenience macros released as a new library slf4clj. The aim is not to re-create the whole API, just offer some shorthand for the majority of use cases.

As you'd expect there are macros for debug, info, warn etc and the args for each are deliberately the same as Clojure's ex-info, namely (<level> msg map) or (<level> msg map cause).

Here is an example:

(require '[com.widdindustries.slf4clj.core :as log])

(log/info "request-for-help" {"urgency" "high", "caller" "Commissioner Gordon", "foe" "Joker", "target" "Gotham"})

Migration path

If you're using clojure.tools.logging, you can keep your existing setup and just start writing slj4j 2.0 logging statements and that will likely 'just work' in that the data will get printed out in some string format according to your pattern config). Separately you can change your logging backend to something that does more with structured logs than just turning them into strings, like writing them as JSON for example.

Logging APIs bundled with the JVM

If you're logging from a library it is possible to avoid having any logging dependency by using APIs included with the jvm.

java.util.logging (JUL) is the one you have most likely heard of. As you might guess though for something created in the early part of this century, it is a string-based logging API. This question on Stackoverflow goes into some details about JUL but in the threads is mention of a newer platform Logging facade called System.Logger - which might be interesting but doesn't seem to have gained traction AFAICT.


Unrelated FYI

This is my first blog post since moving to Quickblog - a blogging tool powered by Clojure - thanks Borkdude!

Discuss this post here.

Published: 2023-02-09

Tagged: clojure

Clojurescript jar embeds some core libraries

Update: As of mid-2022, Clojurescript has fixed the problems described in this post.

Why having clojurescript in the classpath may lead to unexpected behaviour

The clojurescript maven artifact lists compile dependencies which include: data.json, tools.reader and transit-clj and transit-java.

However the clojurescript jar itself is something like an uberjar: It includes compiled data.json, tools.reader and transit-clj and transit-java namespaces inside itself. That means that although it declares dependencies on those libraries, when you use Clojurescript yourself, those libraries' artifacts are not used at all (despite being on your disk and in the classpath).

The output of this command shows the dependencies I am referring to:

clj -Sdeps '{:deps {org.clojure/clojurescript {:mvn/version "1.11.4" } }}' -Stree

The effect is that if you want to use a different version of one of those libraries compared to the one Clojurescript was compiled with, you can't. Well, you can't with the regular clojurescript artifact, you can use the non-aot version: org.clojure/clojurescript$slim {:mvn/version “1.11.4”}.

This was not an issue for a long time because those libraries didn't change. Now e.g. clojure.data.json has changed, hence why I hit the problem.

A handy technique I usually use to answer the question 'where on the classpath is namespace x.y.z getting loaded from?' is to call io/resource on it. Doing that with data.json and Clojurescript in the classpath gives result as follows:

 (clojure.java.io/resource "clojure/data/json.clj")
 => ".../.m2/repository/org/clojure/data.json/2.4.0/data.json-2.4.0.jar!/clojure/data/json.clj"]

which is the wrong answer! I still don't understand why io/resource shows the file there, whereas calling require on that ns returns the one embedded in Clojurescript.

One might ask why I would be using Clojurescript and clojure.data.json together in the same jvm. Well, in my case, in development I tend to have my server and client dependencies combined, so I run cljs compile and server side stuff in one vm. When deploying and testing, I separate them (meaning Clojurescript jar is only on the classpath when cljs source is being compiled). It is possible to run separate server and cljs jvm's locally of course, but that then means I can't have a single .nrepl.edn file for example. There could be other reasons for using these 2 together though, writing data-reader functions that use json possibly.

I raised this on clojure slack and now Clojurescript's maintainers are aware, so hopefully this gets fixed.

A fix is likely to involve shading. This is where a library wants to use a fixed version of another library, so it copies the sources of that library into itself, but changes the namespaces/packages of the source library to be something different, and specific to itself.

My thanks go to Alex Miller for explaining this.

Discuss this post here.

Published: 2022-02-09

Tagged: clojure

you Promise to be Lazy?

Combining Java's promises with Clojure's laziness yields some interesting behaviour


(doseq [x (range 1000000)])

Since range returns a lazy sequence and doseq does not retain the head of the sequence, there will only be one element of the sequence realized at every step of the doseq.

Now let's split the creation and consumption of the lazy sequence over chained promises. I am using the promesa library, which on the jvm is a thin wrapper over java.util.concurrent#CompletableFuture.

(-> (p/resolved (range 1000000))
    (p/then (fn [xs]
              (doseq [x xs]))))

It looks like it should be just as lazy. Nowhere is the code above retaining a reference to the head of the sequence - and yet it is!

The then promise internally has a reference to the preceding promise and that promise has a reference to its result - the head of the sequence. When the first promise returns, the sequence is unrealized, but as the subsequent then promise consumes the sequence it is realized and the head retained by the preceding promise!

What happens if there is a longer chain of promises? A promise executing in a chain only has reference to the preceding one. The preceding one has lost its reference to the next one upstream of itself, so in a chain just the current and immediately preceding one are not gc-able.

So? Well imagine you are streaming results out of a db for example - that might be modelled as a lazy seq, which is consumed through e.g. doseq and written out to a stream. Sounds like a nice memory-friendly solution, but if the db request results in a promise it might seem natural to keep chaining that result on.

Further thoughts

Does this apply to

I haven't investigated.

Related to this topic is Stuart Sierra's Lazy Effects post, in which he says never mix laziness and side-effects. Good advice I would say.

Discuss this post here.

Published: 2022-02-03

Tagged: clojure