Introduction
xforms is a collection of transducer-related functions.
Before getting to know the original stuff in xforms, it's worth introducing xforms' cousins of some Clojure functions.
x/into has arities 2 ([to from]) and 3 ([to xform from]), which are identical to regular into, but it also accepts 1 argument ([to]), in which case it returns a transducer that builds a collection by adding items to the to collection and, when the transducing context is done, emits the resulting collection as a single item.
=> (into [:core-into-vector] (x/into [:x-into-vector]) (range 10))
[:core-into-vector [:x-into-vector 0 1 2 3 4 5 6 7 8 9]]
This extra nesting may seem pointless but it is going to make sense soon.
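To make the "emit on completion" behaviour concrete, here is a minimal sketch of such a transducer written with clojure.core only. The name into-xf is hypothetical and this is not the library's implementation (no transients, no special handling of pairs); it only illustrates the shape described above.

```clojure
(defn into-xf
  "Hypothetical stand-in for 1-arg x/into; not the xforms implementation."
  [to]
  (fn [rf]
    ;; per-context accumulator, starting from the `to` collection
    (let [acc (volatile! to)]
      (fn
        ([] (rf))
        ([result]
         ;; on completion: emit the accumulated collection as ONE item,
         ;; then let the downstream reducing function complete
         (rf (unreduced (rf result @acc))))
        ([result item]
         ;; on each input: accumulate, emit nothing downstream
         (vswap! acc conj item)
         result)))))

(into [:core-into-vector] (into-xf [:x-into-vector]) (range 10))
;; => [:core-into-vector [:x-into-vector 0 1 2 3 4 5 6 7 8 9]]
```

Nothing reaches the outer into until the context completes, which is why the whole inner vector shows up as a single nested item.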
x/reduce, in its 1-arg ([rf]) or 2-arg ([rf init]) form, performs the specified reduction and emits its result on completion of the transducing context. It treats rf like transduce (and not reduce) does: when no init is supplied, (rf) is called to obtain the initial value and, upon completion, (rf acc) is called.
=> (into [] (x/reduce +) (range 10))
[45]
Again, it will make more sense soon.
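The difference between the transduce and reduce conventions mentioned above can be seen with clojure.core alone. The snippet below does not use xforms; it only illustrates the two conventions that the description of x/reduce refers to.

```clojure
;; reduce without an init seeds the accumulator with the first item:
(reduce conj [[0] 1 2])
;; => [0 1 2]

;; transduce without an init calls (conj) to get [] and, once the input
;; is exhausted, calls (conj acc) as the completion step:
(transduce identity conj [[0] 1 2])
;; => [[0] 1 2]

;; completing makes the completion call visible: + does the reduction,
;; str runs exactly once at the end.
(transduce identity (completing + str) (range 10))
;; => "45"
```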
Like the Roman deity Janus, x/by-key has two faces: it's both a transducer and a transducing context!
Let's examine a simple example:
=> (into {} (x/by-key (map inc)) {:a 1 :b 2 :c 3 :d 4})
{:a 2, :b 3, :c 4, :d 5}
For each key, x/by-key establishes a transducing context and runs (map inc) in this context. These nested contexts are terminated when the parent context (the one in which x/by-key runs) is completed.
When a subcontext does not emit values, the matching key is never emitted. See:
=> (into {} (x/by-key (filter odd?)) {:a 1 :b 2 :c 3 :d 4})
{:a 1, :c 3}
A subcontext may also emit several values for a single key:
=> (into {} (x/by-key (mapcat (fn [v] [v (- v)]))) {:a 1 :b 2 :c 3 :d 4})
{:a -1, :b -2, :c -3, :d -4}
It may seem like x/by-key emits only the last value for each key, but that's not the case. The last-value-wins semantics are those of into {}. Let's switch to the less lossy into []:
=> (into [] (x/by-key (mapcat (fn [v] [v (- v)]))) {:a 1 :b 2 :c 3 :d 4})
[[:a 1] [:a -1] [:b 2] [:b -2] [:c 3] [:c -3] [:d 4] [:d -4]]
We may also receive several values for a given key:
=> (into [] (x/by-key (map inc)) (concat {:a 1 :b 2} {:a 3 :b 4}))
[[:a 2] [:b 3] [:a 4] [:b 5]]
And that's where x/into and x/reduce start to make sense:
=> (into {} (x/by-key (x/into [])) (concat {:a 1 :b 2} {:a 3 :b 4}))
{:a [1 3], :b [2 4]}
=> (into {} (x/by-key (x/reduce +)) (concat {:a 1 :b 2} {:a 3 :b 4}))
{:a 4, :b 6}
They let you perform an into or a reduce independently on each key!
One thing that is not immediately obvious is that x/into and x/reduce do not need a transducer-aware variant (like 3-arg into or transduce) because they are transducers themselves. So instead of writing (x/by-key (x/into [] (map inc))) or (x/by-key (x/transduce (map inc) +)) (neither works), you write (x/by-key (comp (map inc) (x/into []))) or (x/by-key (comp (map inc) (x/reduce +))).
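As a quick check of the composed forms, the following sketch reuses the input from the earlier examples. It assumes the library is on the classpath and required under the x alias (namespace net.cgrand.xforms); the name pairs is introduced here only for illustration.

```clojure
(require '[net.cgrand.xforms :as x])

;; same input as in the earlier examples: each key appears twice
(def pairs (concat {:a 1 :b 2} {:a 3 :b 4}))

;; (map inc) runs first inside each per-key context, then x/into collects
(into {} (x/by-key (comp (map inc) (x/into []))) pairs)
;; => {:a [2 4], :b [3 5]}

;; same composition, with a per-key sum instead of a per-key vector
(into {} (x/by-key (comp (map inc) (x/reduce +))) pairs)
;; => {:a 6, :b 8}
```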
By default, x/by-key expects to receive pairs (map entries or vectors) and assumes the first item to be the key and the second item the value. It also assumes that in the end you are going to recombine a key with each value output by the matching subcontext (remember mapcat) in a pair (thus potentially yielding 0, 1 or more pairs for the same key).
So x/by-key has 4 arities:
- The 1-arg arity, which we have already seen, with the defaults described in the previous paragraph.
- The 2-arg arity [kfn xform], which specifies a function to extract the key and changes the default assumption about what the value is: the value is now the whole input item.
=> (into {} (x/by-key :continent (x/into []))
     [{:country "USA" :continent "North America"}
      {:country "Canada" :continent "North America"}
      {:country "France" :continent "Europe"}
      {:country "Germany" :continent "Europe"}])
{"North America" [{:country "USA", :continent "North America"} {:country "Canada", :continent "North America"}], "Europe" [{:country "France", :continent "Europe"} {:country "Germany", :continent "Europe"}]}
This is a 1-line reimplementation of group-by.
- The 3-arg arity [kfn vfn xform], which lets you define what the value is:
=> (into {} (x/by-key :continent :country (x/into []))
     [{:country "USA" :continent "North America"}
      {:country "Canada" :continent "North America"}
      {:country "France" :continent "Europe"}
      {:country "Germany" :continent "Europe"}])
{"North America" ["USA" "Canada"], "Europe" ["France" "Germany"]}
This alone is quite handy: you don't have to retransform the map returned by group-by. You can transform (or even reduce) each group directly while the grouping happens, without retaining each group in memory. x/by-key allows for efficient computation of all kinds of groupings and rollups.
- The 4-arg arity [kfn vfn pair xform], which gives you control over the recombination of keys and values:
=> (into [] (x/by-key :continent :country
              (fn [continent countries]
                {:continent continent :countries countries})
              (x/into []))
     [{:country "USA" :continent "North America"}
      {:country "Canada" :continent "North America"}
      {:country "France" :continent "Europe"}
      {:country "Germany" :continent "Europe"}])
[{:continent "North America", :countries ["USA" "Canada"]} {:continent "Europe", :countries ["France" "Germany"]}]
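The earlier remark about not having to retransform the map returned by group-by can be illustrated with clojure.core alone. The sketch below counts countries per continent twice: once by grouping and then summarizing, once by summarizing while the items stream by. The hand-written reduce is only a stand-in for what x/by-key combined with a reducing transducer expresses more generally; the names countries, two-pass and one-pass are introduced for illustration.

```clojure
(def countries
  [{:country "USA" :continent "North America"}
   {:country "Canada" :continent "North America"}
   {:country "France" :continent "Europe"}
   {:country "Germany" :continent "Europe"}])

;; Two steps: group-by materializes every group as a vector,
;; then a second pass turns each group into its summary.
(def two-pass
  (into {}
        (map (fn [[continent group]] [continent (count group)]))
        (group-by :continent countries)))

;; One step: the per-key summary is updated as each item arrives,
;; so no intermediate per-group vector is ever built.
(def one-pass
  (reduce (fn [acc {:keys [continent]}]
            (update acc continent (fnil inc 0)))
          {}
          countries))

;; both evaluate to {"North America" 2, "Europe" 2}
```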
Computations expressed with xforms functions can leverage several optimizations without sacrificing expressivity.
The 1-line clone of group-by is more efficient than the original: transients are used (because of into and x/into) for both the map and the nested vectors, while group-by only uses transients for the map.
Thanks to nested transducing contexts one does not have to keep whole collections in memory just to summarize them in a subsequent transformation.
When composing xforms transducers that manipulate pairs, allocations of pairs are elided. For example, the map-vals function below works without allocating a single pair.
(defn map-vals [m f]
  (x/into m (x/by-key (map f)) m))