Distinguishing Direct and Indirect Relationships between the input and output columns in Column Lineage Visualization #2874

Open
rohansun opened this issue Aug 12, 2024 · 3 comments

Comments

@rohansun

I’ve been using Marquez to track data lineage, and I’ve noticed that the column lineage visualization in the UI does not distinguish between the different types of relationships between input and output columns. Every connection between columns is drawn as a line of the same color, regardless of whether the relationship is direct or indirect.

According to the OpenLineage specification, these relationships fall into two categories:

Direct:
Identity: Output value is taken as is from the input.
Transformation: Output value is a transformed source value from the input row.
Aggregation: Output value is an aggregation of source values from multiple input rows.

Indirect:
Join: Input is used in a join condition.
GroupBy: Output is aggregated based on input (e.g., GROUP BY clause).
Filter: Input is used as a filtering condition (e.g., WHERE clause).
Order: Output is sorted based on input field.
Window: Output is windowed based on input field.
Conditional: Input value is used in IF or CASE WHEN statements.

However, in Marquez, these relationships between input and output columns are not visually differentiated. Can this be achieved in Marquez?
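
To illustrate the request, here is a minimal Python sketch of how the relationship types listed above could be mapped to a category and then to a line style. The subtype names are taken from the list above (`SORT` is included as an assumed alias for Order), and the style values are purely hypothetical; none of this reflects how Marquez currently renders lineage.

```python
# Subtype names follow the list in this issue; the exact strings emitted by
# a given OpenLineage producer may differ (assumption).
DIRECT_SUBTYPES = {"IDENTITY", "TRANSFORMATION", "AGGREGATION"}
INDIRECT_SUBTYPES = {
    "JOIN", "GROUPBY", "FILTER", "ORDER", "SORT", "WINDOW", "CONDITIONAL",
}

# Hypothetical styles, for illustration only.
EDGE_STYLES = {
    "DIRECT": {"stroke": "solid"},
    "INDIRECT": {"stroke": "dashed"},
    "UNKNOWN": {"stroke": "dotted"},
}


def classify(subtype: str) -> str:
    """Return DIRECT, INDIRECT, or UNKNOWN for a relationship subtype."""
    # Normalize so that "GroupBy", "GROUP_BY", and "group by" all match.
    key = subtype.strip().upper().replace("_", "").replace(" ", "")
    if key in DIRECT_SUBTYPES:
        return "DIRECT"
    if key in INDIRECT_SUBTYPES:
        return "INDIRECT"
    return "UNKNOWN"


def edge_style(subtype: str) -> dict:
    """Pick a line style for an edge based on its relationship category."""
    return EDGE_STYLES[classify(subtype)]
```

With something like this, an Identity edge would be drawn solid and a Filter edge dashed, which is the kind of visual distinction I am asking about.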


boring-cyborg bot commented Aug 12, 2024

Thanks for opening your first issue in the Marquez project! Please be sure to follow the issue template!

@phixMe
Member

phixMe commented Aug 20, 2024

Thanks for opening an issue.

To build such a feature, we would need to integrate it with our query parsers and integrations on the OL side. I do agree that basing column lineage on a sort-by field does not make much sense.

On the Marquez side, we don't really have the capacity to distinguish between these. I'd ping the OL folks about this one.

@wslulciuc wslulciuc modified the milestones: 0.52.0, Roadmap Oct 23, 2024
@wslulciuc
Member

Thanks for reporting this @rohansun! You're absolutely right, we can do way more here, and the ColumnLineageDatasetFacet does define DIRECT and INDIRECT for InputField.transformations.type. I've added your suggestion to our UI.v2 roadmap. As we think about how best to visualize these relationships in the UI, can you share some example OpenLineage events that we can use for testing as we mock up the UI?
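
For reference, a rough Python sketch of the kind of payload that would be useful: a columnLineage facet fragment whose input fields carry `transformations` entries with a `type`, plus a helper that flattens it into edges a UI could style. The namespace, dataset, and field names are made up, the `subtype` values are assumptions, and `flatten_edges` is a hypothetical helper, not part of Marquez or OpenLineage.

```python
# Hypothetical columnLineage facet fragment for one output dataset.
# All namespaces, dataset names, and field names are invented for testing.
column_lineage_facet = {
    "fields": {
        "total_amount": {
            "inputFields": [
                {
                    "namespace": "example_ns",
                    "name": "public.orders",
                    "field": "amount",
                    "transformations": [
                        {"type": "DIRECT", "subtype": "AGGREGATION"}
                    ],
                },
                {
                    "namespace": "example_ns",
                    "name": "public.orders",
                    "field": "customer_id",
                    "transformations": [
                        {"type": "INDIRECT", "subtype": "GROUP_BY"}
                    ],
                },
            ]
        }
    }
}


def flatten_edges(facet: dict) -> list:
    """Return (input_column, output_column, type, subtype) tuples."""
    edges = []
    for out_field, spec in facet.get("fields", {}).items():
        for inp in spec.get("inputFields", []):
            source = f'{inp["name"]}.{inp["field"]}'
            # Events without a transformations list get an UNKNOWN edge.
            for t in inp.get("transformations") or [{"type": "UNKNOWN"}]:
                edges.append((source, out_field, t["type"], t.get("subtype")))
    return edges


for edge in flatten_edges(column_lineage_facet):
    print(edge)
```

Events covering each relationship type, along with at least one that has no `transformations` at all, would let us check both the styled and the fallback rendering.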
