Christophe Grand edited this page Sep 16, 2016 · 4 revisions

xforms is a collection of transducer-related functions.

Familiar functions

Before getting to what is original in xforms, it's worth introducing its cousins of some familiar Clojure functions.
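The examples on this page refer to the library through the alias x. A minimal setup sketch, assuming the xforms artifact (net.cgrand/xforms) is already on the classpath, could be:

```clojure
;; Assumption: the xforms library is already on the classpath.
;; The alias `x` matches the prefix used by every example on this page.
(require '[net.cgrand.xforms :as x])
```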

x/into

x/into has the 2-arg ([to from]) and 3-arg ([to xform from]) arities of the regular into, with identical behavior. It also accepts a single argument ([to]), in which case it returns a transducer that adds each item to the to collection and, when the transducing context completes, emits the resulting collection.

=> (into [:core-into-vector] (x/into [:x-into-vector]) (range 10))
[:core-into-vector [:x-into-vector 0 1 2 3 4 5 6 7 8 9]]

This extra nesting may seem pointless but it is going to make sense soon.
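To make the three arities concrete, here is a small sketch. It assumes xforms is on the classpath and aliased as x; the inputs are arbitrary illustrations.

```clojure
(require '[net.cgrand.xforms :as x])

;; 2-arg and 3-arg arities behave like clojure.core/into
(x/into [] (range 3))             ;; => [0 1 2]
(x/into [] (map inc) (range 3))   ;; => [1 2 3]

;; 1-arg arity: a transducer that emits one item, the built collection,
;; when the enclosing transducing context completes
(into [] (x/into #{}) [1 1 2])    ;; => [#{1 2}]
(sequence (x/into []) (range 3))  ;; => ([0 1 2])
```

Whatever the outer context is (into, sequence, ...), it only ever sees a single item: the collection built by x/into.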

x/reduce

With 1 argument ([rf]) or 2 arguments ([rf init]), x/reduce returns a transducer that performs the specified reduction and emits its result when the transducing context completes. It treats rf the way transduce (and not reduce) does: when no init is given, (rf) is called to obtain it and, upon completion, (rf acc) is called.

=> (into [] (x/reduce +) (range 10))
[45]

Again, it will make more sense soon.
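To see what "emits its result on completion" means, here is a sketch of such a transducer written with clojure.core only. The name reduce-on-completion is hypothetical and this is not xforms' actual implementation: it ignores early termination (reduced) inside f, for instance.

```clojure
(defn reduce-on-completion
  "Hypothetical stand-in for 1-arg x/reduce, for illustration only."
  [f]
  (fn [rf]
    ;; No init supplied: start from (f), as transduce does.
    (let [acc (volatile! (f))]
      (fn
        ([] (rf))
        ([result]
         ;; Completion: finish the inner reduction with (f acc),
         ;; emit that single value downstream, then complete downstream.
         (rf (unreduced (rf result (f @acc)))))
        ([result input]
         ;; Step: accumulate privately, emit nothing yet.
         (vswap! acc f input)
         result)))))

(into [] (reduce-on-completion +) (range 10)) ;; => [45]
(into [] (reduce-on-completion +) [])         ;; => [0], that is (+ (+))
```

The second call shows the transduce-like treatment of rf: with no input at all, the emitted value is (rf (rf)).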

x/by-key, a Janus-like function

Like the Roman deity Janus, x/by-key has two faces: it's both a transducer and a transducing context!

Let's examine a simple example:

=> (into {} (x/by-key (map inc)) {:a 1 :b 2 :c 3 :d 4})
{:a 2, :b 3, :c 4, :d 5}

For each key, x/by-key establishes a transducing context and runs (map inc) in this context. These nested contexts are terminated when the parent context (the one in which x/by-key runs) is completed.

When a subcontext does not emit values, the matching key is never emitted. See:

=> (into {} (x/by-key (filter odd?)) {:a 1 :b 2 :c 3 :d 4})
{:a 1, :c 3}

A subcontext may also emit several values for a single key:

=> (into {} (x/by-key (mapcat (fn [v] [v (- v)]))) {:a 1 :b 2 :c 3 :d 4})
{:a -1, :b -2, :c -3, :d -4}

It may seem like x/by-key emits only the last value for each key but it's not the case. The last-value-wins semantics are those of into {}. Let's switch to a less discarding into []:

=> (into [] (x/by-key (mapcat (fn [v] [v (- v)]))) {:a 1 :b 2 :c 3 :d 4})
[[:a 1] [:a -1] [:b 2] [:b -2] [:c 3] [:c -3] [:d 4] [:d -4]]

We may also receive several vals for a given key:

=> (into [] (x/by-key (map inc)) (concat {:a 1 :b 2} {:a 3 :b 4}))
[[:a 2] [:b 3] [:a 4] [:b 5]]

And that's where x/into and x/reduce start to make sense:

=> (into {} (x/by-key (x/into [])) (concat {:a 1 :b 2} {:a 3 :b 4}))
{:a [1 3], :b [2 4]}
=> (into {} (x/by-key (x/reduce +)) (concat {:a 1 :b 2} {:a 3 :b 4}))
{:a 4, :b 6}

They let you perform an into or a reduce independently on each key!

One thing that is not immediately obvious is that x/into and x/reduce do not need a transducer-aware variant (like 3-arg into or transduce) because they are transducers themselves. So instead of writing (x/by-key (x/into [] (map inc))) or (x/by-key (x/transduce (map inc) +)) (neither of which works), you write (x/by-key (comp (map inc) (x/into []))) or (x/by-key (comp (map inc) (x/reduce +))).
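As a sketch of the comp forms above, reusing the input from the earlier examples (xforms is assumed to be on the classpath and aliased as x; the var names are only for illustration):

```clojure
(require '[net.cgrand.xforms :as x])

(def entries (concat {:a 1 :b 2} {:a 3 :b 4}))

;; Increment each value, then collect per key
(def incremented-groups
  (into {} (x/by-key (comp (map inc) (x/into []))) entries))
;; => {:a [2 4], :b [3 5]}

;; Increment each value, then sum per key
(def incremented-sums
  (into {} (x/by-key (comp (map inc) (x/reduce +))) entries))
;; => {:a 6, :b 8}
```

In both cases x/into or x/reduce sits last in the comp, so it sees the already-transformed values of its key.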

The optional args: kfn, vfn and pair

By default x/by-key expects to receive pairs (map entries or vectors) and assumes the first item to be the key and the second item the value. It also assumes that in the end you are going to recombine the key with each value output by the matching subcontext (remember mapcat) into a pair, thus potentially yielding 0, 1 or more pairs for the same key.

So x/by-key has 4 arities:

  • The 1-arg arity, which we have already seen, with the defaults described in the previous paragraph.
  • The 2-arg arity [kfn xform], which specifies a function to extract the key and changes the default assumption about the value: the value is now the whole input item.
=> (into {} (x/by-key :continent (x/into []))
     [{:country "USA" :continent "North America"}
      {:country "Canada" :continent "North America"}
      {:country "France" :continent "Europe"}
      {:country "Germany" :continent "Europe"}])
{"North America" [{:country "USA", :continent "North America"} {:country "Canada", :continent "North America"}], "Europe" [{:country "France", :continent "Europe"} {:country "Germany", :continent "Europe"}]}

which is a 1-line reimplementation of group-by.

  • The 3-arg arity [kfn vfn xform], which lets you define what the value is:
=> (into {} (x/by-key :continent :country (x/into []))
     [{:country "USA" :continent "North America"}
      {:country "Canada" :continent "North America"}
      {:country "France" :continent "Europe"}
      {:country "Germany" :continent "Europe"}])
{"North America" ["USA" "Canada"], "Europe" ["France" "Germany"]}

This alone is quite handy: you don't have to retransform the map returned by group-by. You can transform (or even reduce) each group while the grouping happens, without retaining each group in memory. x/by-key allows for efficient computation of all kinds of groupings and rollups.

  • The 4-arg arity [kfn vfn pair xform], which gives you control over how keys and values are recombined:
=> (into [] (x/by-key :continent :country 
              (fn [continent countries]
                {:continent continent :countries countries})
              (x/into []))
     [{:country "USA" :continent "North America"}
      {:country "Canada" :continent "North America"}
      {:country "France" :continent "Europe"}
      {:country "Germany" :continent "Europe"}])
[{:continent "North America", :countries ["USA" "Canada"]} {:continent "Europe", :countries ["France" "Germany"]}]
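The earlier remarks about reimplementing group-by and about reducing while the grouping happens can be sketched as follows. The var names are illustrative, xforms is assumed to be on the classpath and aliased as x, and completing comes from clojure.core.

```clojure
(require '[net.cgrand.xforms :as x])

(def countries
  [{:country "USA" :continent "North America"}
   {:country "Canada" :continent "North America"}
   {:country "France" :continent "Europe"}
   {:country "Germany" :continent "Europe"}])

;; The 2-arg arity combined with x/into builds the same map as group-by
(def same-as-group-by?
  (= (group-by :continent countries)
     (into {} (x/by-key :continent (x/into [])) countries)))
;; => true

;; Count countries per continent without ever building the groups.
;; completing supplies the 1-arg arity that is called on completion.
(def counts
  (into {} (x/by-key :continent
                     (x/reduce (completing (fn [n _] (inc n))) 0))
        countries))
;; => {"North America" 2, "Europe" 2}
```

Only a running count is kept per continent, which is the point of reducing inside the grouping rather than after it.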

On efficiency

Computations expressed with xforms functions can leverage several optimizations without sacrificing expressiveness.

Transients everywhere

The 1-line clone of group-by is more efficient than the original: transients are used (because of into and x/into) for both the map and the nested vectors, while group-by only uses transients for the map.

One pass computations

Thanks to nested transducing contexts, one does not have to keep whole collections in memory just to summarize them in a subsequent transformation.

0-alloc pairs

When composing xforms transducers that manipulate pairs, allocations of pairs are elided. For example, the map-vals function below works without allocating a single pair.

(defn map-vals [m f]
  (x/into m (x/by-key (map f)) m))
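A short usage sketch for map-vals follows. The definition is repeated so the snippet stands alone, xforms is assumed to be on the classpath and aliased as x, and the sample map is arbitrary.

```clojure
(require '[net.cgrand.xforms :as x])

;; Same definition as above, repeated so this snippet is self-contained
(defn map-vals [m f]
  (x/into m (x/by-key (map f)) m))

(map-vals {:a 1 :b 2 :c 3} inc)
;; => {:a 2, :b 3, :c 4}

;; For comparison, a clojure.core-only version producing the same result
(reduce-kv (fn [acc k v] (assoc acc k (inc v))) {} {:a 1 :b 2 :c 3})
;; => {:a 2, :b 3, :c 4}
```

Since m is used both as the source and as the to collection, every key is simply re-associated with its transformed value.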