Dataset visualization issue #2915
Comments
Thanks for opening your first issue in the Marquez project! Please be sure to follow the issue template!
It looks like your configuration is good, since you've got jobs in the system. Are you using Airflow operators that support lineage extraction (i.e., not the Python or k8s operators)? Have you tried turning on debug logs in Airflow to check that the inputs and outputs are actually populated with schemas? Are there any namespaces created for your datasets?
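One way to check the namespace question is to ask the Marquez REST API directly. The sketch below is a minimal, unofficial example: it assumes the API is reachable at `http://localhost:5000` (a placeholder; substitute your own service URL) and uses the `/api/v1/namespaces` and `/api/v1/namespaces/{namespace}/datasets` endpoints. The helper names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: placeholder address of the Marquez API; replace with your own.
MARQUEZ_URL = "http://localhost:5000"


def namespaces_url(base_url):
    """URL that lists all namespaces known to Marquez."""
    return f"{base_url.rstrip('/')}/api/v1/namespaces"


def datasets_url(base_url, namespace):
    """URL that lists the datasets in one namespace (name is URL-encoded)."""
    encoded = quote(namespace, safe="")
    return f"{base_url.rstrip('/')}/api/v1/namespaces/{encoded}/datasets"


def get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def list_datasets_by_namespace(base_url):
    """Return {namespace name: [dataset names]} for every namespace."""
    result = {}
    for ns in get_json(namespaces_url(base_url)).get("namespaces", []):
        name = ns["name"]
        datasets = get_json(datasets_url(base_url, name)).get("datasets", [])
        result[name] = [d["name"] for d in datasets]
    return result


# Usage (requires a running Marquez instance):
#   for ns, names in list_datasets_by_namespace(MARQUEZ_URL).items():
#       print(ns, names)
```

If every namespace comes back with an empty dataset list, the jobs are probably being reported without inputs and outputs, which matches what the UI shows.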
I tried to create a DAG based on the example (by the way, the example from the repo, https://github.com/MarquezProject/marquez/blob/main/examples/airflow/airflow.md, works fine with the Postgres operator), but when I tried to create the same DAG with SparkSubmitOperator, I can see the jobs but no datasets.
def run_stage(stage):
if __name__ == "__main__":
# Import necessary methods and variables
from draif_common.common_env import get_default_spark_conf, LIB_FOLDER
default_args = {
dag = DAG(
conf = get_default_spark_conf()
generate_tables = SparkSubmitOperator(
merge_tables = SparkSubmitOperator(
process_final_dataset = SparkSubmitOperator(
generate_tables >> merge_tables >> process_final_dataset
@salamandra2508 I see you're trying to define outlets manually in the operator. How about turning on the Spark OpenLineage listener and getting the lineage automatically from the Spark job? See https://openlineage.io/docs/integrations/spark/
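As a rough sketch of that suggestion, the helper below merges the OpenLineage listener settings into an existing Spark conf dictionary, which could then be passed to `SparkSubmitOperator(conf=...)`. The helper name `with_openlineage`, the Marquez URL, and the namespace are hypothetical placeholders; the `spark.extraListeners` and `spark.openlineage.*` keys follow the OpenLineage Spark integration docs, but verify them against the version you install.

```python
# Listener class shipped in the openlineage-spark package.
OPENLINEAGE_LISTENER = "io.openlineage.spark.agent.OpenLineageSparkListener"


def with_openlineage(conf, marquez_url, namespace):
    """Return a copy of `conf` with the OpenLineage listener settings added.

    `marquez_url` and `namespace` are caller-supplied (assumed values here).
    Any listeners already configured are preserved.
    """
    merged = dict(conf or {})

    # Append the listener to existing ones rather than overwriting them.
    existing = merged.get("spark.extraListeners", "")
    listeners = [item for item in existing.split(",") if item]
    if OPENLINEAGE_LISTENER not in listeners:
        listeners.append(OPENLINEAGE_LISTENER)
    merged["spark.extraListeners"] = ",".join(listeners)

    # Send lineage events over HTTP to the Marquez API.
    merged["spark.openlineage.transport.type"] = "http"
    merged["spark.openlineage.transport.url"] = marquez_url
    merged["spark.openlineage.namespace"] = namespace
    return merged


# Usage inside a DAG file (not executed here; arguments are illustrative):
#   generate_tables = SparkSubmitOperator(
#       task_id="generate_tables",
#       application="<path to your Spark application>",
#       conf=with_openlineage(get_default_spark_conf(),
#                             "http://marquez:5000", "spark_jobs"),
#       packages="io.openlineage:openlineage-spark_2.12:<version>",
#   )
```

The openlineage-spark jar also has to be on the Spark classpath, for example through the operator's `packages` argument as in the comment above; the exact artifact name and version depend on your Spark and Scala versions.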
After installing and setting up Marquez in my environment (AWS EKS + Helm + Argo CD), I can see the Airflow DAGs in the Marquez UI, but I do not see any datasets.
![Datasets-Repo-https-code-rbi-tech-raiffeisen-ua-data-airflow-examples-git-Branch-dev-new-10-02-2024_09_33_AM](https://github.com/user-attachments/assets/4bf50c17-4fe8-4764-925c-a488a14f80d5)
How can I achieve this? Or can someone provide an example DAG so I can check it? I'm pretty new to this stuff.