I’ve been using Marquez for tracking data lineage, and I’ve noticed that the current column lineage visualization in the UI does not distinguish between different types of relationships between the input and output columns. All connections between the columns are represented by the same color lines, regardless of whether the relationship is direct or indirect.
According to the OpenLineage specification, relationships can be categorized as:
Direct:
Identity: Output value is taken as is from the input.
Transformation: Output value is a transformed source value from the input row.
Aggregation: Output value is an aggregation of source values from multiple input rows.
Indirect:
Join: Input is used in a join condition.
GroupBy: Output is aggregated based on input (e.g., GROUP BY clause).
Filter: Input is used as a filtering condition (e.g., WHERE clause).
Order: Output is sorted based on input field.
Window: Output is windowed based on input field.
Conditional: Input value is used in IF or CASE WHEN statements.
However, in Marquez, these relationships between input and output columns are not visually differentiated. Can this be achieved in Marquez?
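To make the request concrete, here is a minimal sketch of the lookup a UI would need to style edges by relationship type. The subtype keys are simply the names from the list above, lowercased, and the `solid`/`dashed` styles and the `edge_style` helper are hypothetical; none of this is existing Marquez code.

```python
# Hypothetical mapping from the relationship names listed above to their
# top-level category. The keys are the list entries lowercased, not
# necessarily the exact identifiers used in the OpenLineage spec.
SUBTYPE_TO_TYPE = {
    "identity": "DIRECT",
    "transformation": "DIRECT",
    "aggregation": "DIRECT",
    "join": "INDIRECT",
    "groupby": "INDIRECT",
    "filter": "INDIRECT",
    "order": "INDIRECT",
    "window": "INDIRECT",
    "conditional": "INDIRECT",
}

# Hypothetical line styles; any visual distinction (color, dash pattern) would do.
TYPE_TO_STYLE = {"DIRECT": "solid", "INDIRECT": "dashed"}


def edge_style(subtype: str) -> str:
    """Return the line style for a column-lineage edge of the given subtype."""
    return TYPE_TO_STYLE[SUBTYPE_TO_TYPE[subtype.lower()]]
```

With this, `edge_style("Identity")` gives a solid line and `edge_style("Join")` a dashed one, so direct and indirect relationships can be told apart at a glance.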
We would need to integrate this with our query parsers and integrations on the OL side to build such a feature. I do agree that basing column lineage on a sort-by field does not make much sense.
On the Marquez side, we don't really have the capacity to distinguish between these. I'd ping the OL folks about this one.
Thanks for reporting this @rohansun! You're absolutely right, we can do way more here, and the ColumnLineageDatasetFacet does define DIRECT and INDIRECT for InputField.transformations.type. I've added your suggestion to our UI.v2 roadmap. As we think of ways to best visualize these relationships in the UI, can you share some example OpenLineage events that we can use for testing as we mock up the UI?
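As a starting point for such test data, below is a rough sketch of a column lineage facet fragment and a helper that flattens it into typed edges. The dataset and column names are made up, the `GROUP_BY` subtype spelling and the exact key layout (`fields`, `inputFields`, `transformations`) are assumptions that should be checked against the facet schema, and `iter_edges` is a hypothetical helper rather than part of Marquez or OpenLineage.

```python
from typing import Iterator, NamedTuple


class Edge(NamedTuple):
    input_dataset: str
    input_field: str
    output_field: str
    type: str
    subtype: str


# Illustrative facet fragment; all names and values are invented for testing.
SAMPLE_FACET = {
    "fields": {
        "total_amount": {
            "inputFields": [
                {
                    "namespace": "example_ns",
                    "name": "orders",
                    "field": "amount",
                    "transformations": [
                        {"type": "DIRECT", "subtype": "AGGREGATION"}
                    ],
                },
                {
                    "namespace": "example_ns",
                    "name": "orders",
                    "field": "customer_id",
                    "transformations": [
                        {"type": "INDIRECT", "subtype": "GROUP_BY"}
                    ],
                },
            ]
        }
    }
}


def iter_edges(facet: dict) -> Iterator[Edge]:
    """Yield one edge per (input field, output field, transformation)."""
    for output_field, spec in facet.get("fields", {}).items():
        for inp in spec.get("inputFields", []):
            dataset = f'{inp["namespace"]}.{inp["name"]}'
            for t in inp.get("transformations", []):
                yield Edge(
                    dataset,
                    inp["field"],
                    output_field,
                    t.get("type", "UNKNOWN"),
                    t.get("subtype", ""),
                )
```

Each edge carries its `type`, so a renderer could group or style edges by `DIRECT` versus `INDIRECT` without re-parsing the facet.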