by-keys and assoc-in? #11
=> (xforms/into {}
(xforms/by-key :a
(comp
(xforms/by-key :b identity)
(xforms/into {})))
[{:a 1 :b 1} {:a 1 :b 2} {:a 2 :b 1} {:a 2 :b 2}])
{1 {1 {:a 1, :b 1}, 2 {:a 1, :b 2}},
2 {1 {:a 2, :b 1}, 2 {:a 2, :b 2}}}
The contract for your proposed
Did you consider something like:
(transduce
(xforms/by-key (juxt :a :b) identity) ; or a (map (juxt (juxt :a :b) identity))
(completing (fn [m [ks v]] (assoc-in m ks v))) {}
[{:a 1 :b 1} {:a 1 :b 2} {:a 2 :b 1} {:a 2 :b 2}])
More generally: what's your broader use case? A rollup?
Edit: closing because this is not an issue, but we can keep discussing the problem. |
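The `map`/`juxt` variant mentioned in the code comment above can be spelled out with clojure.core alone. This is a minimal sketch, not xforms code; `assoc-in-rf` and `nested` are hypothetical names introduced only for illustration:

```clojure
(def data [{:a 1 :b 1} {:a 1 :b 2} {:a 2 :b 1} {:a 2 :b 2}])

;; Hypothetical helper: a reducing function that expects [path value]
;; pairs and assoc-in's each value at its path.
(defn assoc-in-rf
  ([] {})                              ; init: empty map
  ([m] m)                              ; completion: nothing to finalize
  ([m [ks v]] (assoc-in m ks v)))      ; step: nest v under the key path ks

;; (juxt :a :b) builds the key path, identity keeps the whole item as value.
(def nested
  (transduce (map (juxt (juxt :a :b) identity)) assoc-in-rf data))

;; nested
;; => {1 {1 {:a 1, :b 1}, 2 {:a 1, :b 2}},
;;     2 {1 {:a 2, :b 1}, 2 {:a 2, :b 2}}}
```

Because the reducing function keeps everything in the map it returns, later items with the same path simply overwrite earlier ones; aggregating them instead would need `update-in` with a combining function.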
The original example assumed the transducer would emit a pair of
The broader use-case is indeed a rollup where a
In any case it seems easily achieved in your first solution by replacing
Only curious whether you think there is an even broader use-case here that could be fleshed out? |
A gist from a couple of months ago:
https://gist.github.com/cgrand/01d4a5dec24acc912501c97bcaa1ceb7
|
This is brilliant! Another goal we had was to use the transducer incrementally. That is, after processing an initial collection, we could update the previous result with a new value.
I was unable to figure out what
(let [xf (rollup [:continent :country] :population)
data [{:continent "Europe" :country "France" :population 66}
{:continent "Europe" :country "Germany" :population 80}
{:continent "Europe" :country "Belarus" :population 9}
{:continent "North-America" :country "USA" :population 319}
{:continent "North-America" :country "Canada" :population 35}]
acc (into {} xf data)
new-val {:continent "North-America" :country "Mexico" :population 100}
rf (xf ???)]
(rf acc new-val))
;; Would return:
;; {:detail
;; {"Europe" {:detail {"France" 66, "Germany" 80, "Belarus" 9}, :total 155},
;; "North-America" {:detail {"USA" 319, "Canada" 35, "Mexico" 100}, :total 454}},
;; :total 609} |
Good old
=> (let [xf (rollup [:continent :country] :population)
data [{:continent "Europe" :country "France" :population 66}
{:continent "Europe" :country "Germany" :population 80}
{:continent "Europe" :country "Belarus" :population 9}
{:continent "North-America" :country "USA" :population 319}
{:continent "North-America" :country "Canada" :population 35}]
acc (into {} xf data)
new-vals [{:continent "North-America" :country "Mexico" :population 100}]]
(transduce xf (fn mrg
([x] x)
([x x']
(if (:detail x) ; leaf detection could be better
{:detail (merge-with mrg (:detail x) (:detail x'))
:total (+ (:total x) (:total x'))}
(+ x x')))) acc new-vals))
{:detail
{"Europe"
{:detail
{"France" 66,
"Germany" 80,
"Belarus" 9},
:total 155},
"North-America"
{:detail
{"USA" 319,
"Canada" 35,
"Mexico" 100},
:total 454}},
:total 609}
By the way, be very careful when doing |
@cgrand thanks for mentioning
Interestingly, your last example reminds me of how reduce functions need to be written for Spark, in that you need to provide an extra merge function to combine results from different partitions. For RDDs this extra step seemed inevitable, but it's been bothering me since I started looking at this incremental approach to rollups with transducers. It just seems that we get those merges for free when we process the initial
Sorry, I'm rambling, but I feel like I'm missing something here and was wondering if you had any thoughts on how to improve that situation? |
@julienfantin it's not quite like the merge of RDDs. The model used for RDDs (or
Transducers are really tied to linear traversal, so the only incrementalism you could get for free is "append-only". Except that it may be difficult because of what I consider a bug in the way transducers are initialized. That's just a gut feeling; I haven't thought enough about incremental transducers. Yet. |
What I was trying to point out is that even "append-only" incrementalism can be non-trivial when the result is not a simple seq. Take your incremental rollup example: the
My observation, and frustration, is that we have to redefine something that the instantiated transducing fn already does during its lifecycle, i.e. "append" new values to the rollup. To be more concrete, do you think there is a way that you could have written the |
Right now I don't see a way. If transducers were pure, then all state would be in the accumulator and you would get what you want for free. (i.e. call
Would it be possible to make some transducers opt in to such behavior? Using something akin to punctuated streams? I don't think so. Well, I could certainly make something xforms-only, but this would break with any non-xforms-aware transducer. (To be contrasted with kv transducers, which are xforms-only but only an optimization and can interop with regular transducers.)
Our discussion makes me realize that I should rename "1-item transducers" to "aggregate transducers" (similar to SQL aggregate functions). Maybe starting from there I could build something not too confusing... |
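To make the "all state in the accumulator" point concrete, here is a minimal sketch of a pure reducing function that builds the same `:detail`/`:total` shape as the rollup example earlier in the thread. It uses clojure.core only and is not how xforms or the gist implement `rollup`; `add-at` and `rollup-rf` are hypothetical names. Since nothing is hidden in a volatile, the previous result can be fed straight back in with a new value:

```clojure
;; Hypothetical helper: add v along the key path ks, bumping :total at
;; every level on the way down. Leaves are plain numbers.
(defn add-at [node ks v]
  (if-let [[k & more] (seq ks)]
    (-> (or node {})
        (update :total (fnil + 0) v)
        (update-in [:detail k] add-at more v))
    (+ (or node 0) v)))

;; Hypothetical pure rf: the whole rollup lives in the accumulator.
(defn rollup-rf [ks vf]
  (fn
    ([] {})
    ([acc] acc)
    ([acc x] (add-at acc (map #(% x) ks) (vf x)))))

(def data
  [{:continent "Europe" :country "France" :population 66}
   {:continent "Europe" :country "Germany" :population 80}
   {:continent "Europe" :country "Belarus" :population 9}
   {:continent "North-America" :country "USA" :population 319}
   {:continent "North-America" :country "Canada" :population 35}])

(def rf (rollup-rf [:continent :country] :population))

;; Initial payload, then one incremental update using the same rf.
(def acc (reduce rf (rf) data))
(def acc2 (rf acc {:continent "North-America" :country "Mexico" :population 100}))

;; (:total acc)  => 509
;; (:total acc2) => 609
```

The trade-off is the one discussed above: every step rebuilds the path of persistent maps, so this gives up the transient/volatile optimizations that a stateful aggregate can use.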
I think I've done a poor job of getting my point across. Please bear with me as I try and explain it from a high level.
The initial use-case that prompted me to look into
In our case however, the rollup is a process with a lifecycle. We receive thousands of events in an initial payload and much smaller updates throughout the application lifecycle. Our main goals are to:
The tradeoff we're willing to deal with is impurity, or statefulness of the transducing fn. Given our goals, it seems only natural that we'd have to trade time for space. We also do not have unbounded growth situations such as the use of
Now for what I don't understand...
In your rollup example, what's most different from common uses of transducers I've seen -- and also what we're really after -- is the deeply-nested associative behavior. Our goals above seem like they could be achieved if we got that associative behavior when using the 2-arity only. However we only get what we expect from "finalizing" the result with the 1-arity.
From a high level again, as I understand it we have two stateful aspects going on here:
It seems our problem is that we cannot get the transducer to exercise its associative behavior required for the aggregation (1.) without finalizing it (2.), thus negating our initial goals of reuse?
Now if the previous statement is correct, the next thing I don't understand is whether this is required by the nature of transducers in general or due to the particular implementations of |
You can also use a roadroller approach: borrow some code from Powderkeg which would allow you to "fork" any mutable stuff. So you would be able to clone a stateful rf, complete it to get a snapshot and keep aggregating with the original one. |
Unfortunately we need our solution to be portable to cljs!
I've been digging into some of the
As far as I understand the implementation of
The only problem I see with the whole online approach is that it'll preclude using a transient for the
Am I off the mark here? |
Can't reply in detail now, but touching the acc is not possible. See slide #20:
https://cdn.rawgit.com/cgrand/xforms/resources/Lost%20in%20Transduction.pdf
|
Thanks for the slides, wish I could have attended! |
So, some quirks about transducers: you can't touch the
A little thing that may interest you:
=> (sequence (x/by-key odd? (x/reduce +)) (range 10))
([false 20] [true 25])
=> (sequence (x/by-key odd? (x/reductions +)) (range 10))
([false 0]
[false 0]
[true 0]
[true 1]
[false 2]
[true 4]
[false 6]
[true 9]
[false 12]
[true 16]
[false 20]
[true 25])
=> (reductions conj {} (sequence (x/by-key odd? (x/reductions +)) (range 10)))
({}
{false 0}
{false 0}
{false 0, true 0}
{false 0, true 1}
{false 2, true 1}
{false 2, true 4}
{false 6, true 4}
{false 6, true 9}
{false 12, true 9}
{false 12, true 16}
{false 20, true 16}
{false 20, true 25}) |
To me the key is to provide running versions of all aggregating xforms (those which always return only one item) |
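As an illustration of what a "running" aggregate looks like, the sketch below emits the updated per-key value on every step instead of once at completion. It is written against clojure.core only and is not xforms' `by-key` or `reductions`; `running-by-key` is a hypothetical name, and unlike `x/reductions` it does not emit the initial value for each key:

```clojure
;; Hypothetical running aggregate: keeps one accumulator per key in a
;; volatile and passes [k running-value] downstream on every input.
(defn running-by-key [kfn f init]
  (fn [rf]
    (let [state (volatile! {})]            ; per-key accumulators
      (fn
        ([] (rf))
        ([acc] (rf acc))                   ; nothing left to flush at completion
        ([acc x]
         (let [k (kfn x)
               v (f (get @state k init) x)]
           (vswap! state assoc k v)
           (rf acc [k v])))))))            ; emit a snapshot at each step

;; Every intermediate value is visible downstream:
;; (sequence (running-by-key odd? + 0) (range 10))
;; => ([false 0] [true 1] [false 2] [true 4] [false 6]
;;     [true 9] [false 12] [true 16] [false 20] [true 25])

;; Conj-ing the pairs into a map keeps only the latest snapshot per key:
;; (into {} (running-by-key odd? + 0) (range 10))
;; => {false 20, true 25}
```

The downstream reducing function sees a value at every step, so the latest result is always available from the accumulator without calling the completion arity.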
Indeed
Please correct me if I'm wrong, but the key then would be to make sure we call the 2-arity rf in the 2-arity of the transducer? This is probably not quite correct, but I got a basic example working with this:
(defn x-reduce
([f]
(fn [rf]
(let [vacc (volatile! (f))]
(let [f (x/ensure-kvrf f)]
(x/kvrf
([] (rf))
([acc] (rf acc))
([acc x] (rf acc (vswap! vacc f x)) )
([acc k v] (rf acc (vswap! vacc f k v) k)))))))
([f init]
(x-reduce (fn ([] init) ([acc] (f acc)) ([acc x] (f acc x))))))
(into [] (x-reduce +) (range 10))
;; => [0 1 3 6 10 15 21 28 36 45]
More generally do you think |
Under the hood, most aggregation transducers are just
I think the best way to approach that is to give a rf the opportunity to yield a snapshot. I can see several designs:
(defprotocol Snap1
  (snap1 [rf acc] "returns [acc' snapshot-value]"))
; pros: clean, functional looking ; cons: pair alloc
(defprotocol Snap2
  (snap2 [rf vacc] "returns snapshot-value, and vacc has been updated to contain the new accumulator"))
; pros: easy, even handy when acc is already in a volatile ; cons: mutation
(defprotocol Snap3
  (pause [rf acc] "returns paused-acc")
  (snap [rf paused-acc] "returns snapshot-value")
  (resume [rf paused-acc] "returns acc'"))
; pros: clean & functional looking ; cons: verbose |
It's interesting that your original question leads to a problem I had in the back of my mind: there's |
That sounds interesting, but I'm not sure I see how this would work for reusing the acc across multiple transducing contexts? Would you need new transducing contexts that'd call |
The not-quite-formed idea is to leverage composability, so that rollup would become "runningable" because its constituents are. |
I opened #12 |
Hi @cgrand,
Sorry this is more of a question than an issue.
I was wondering if it was possible to achieve something very close to what xforms/by-key already does, but instead using a kfn that returns a vector and using those keys to assoc-in the accumulator? I.e. something like this