
[SUPPORT] execute hivesql in hive have some exception after add new column in hudi table #12637

Open · liucongjy opened this issue Jan 15, 2025 · 4 comments

Comments

@liucongjy


Describe the problem you faced

After adding a new column to a Hudi table, executing select * from hudiTable in Hive fails with an exception: Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: Field ext4 not found in log schema.

To Reproduce

Steps to reproduce the behavior:

  1. Execute SQL to add a column:
     alter table ehr_etbj_csyxzm add columns(ext4 string comment 'extension field 4');
  2. Execute the query in Hive:
     select * from etbj_csyxzm;

Expected behavior

The query should succeed, returning NULL for the new ext4 column on records written before the column was added.

Environment Description

  • Hudi version : 0.15

  • Spark version : 3.3

  • Hive version : 2.3.9

  • Hadoop version : 3.0

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) :


Stacktrace
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hudi.exception.HoodieException: Field ext4 not found in log schema. Query cannot proceed! Derived Schema Fields: [ dzlbdm2, cjrxm, csyxzmbh, fqmz, qfjgmc, dz_cj2, dz_sz2, scsj, _hoodie_partition_path, xsecsrqsj, _hoodie_commit_seqno, fqsfzjlbdm, qfrq, _hoodie_commit_time, jgmc, scwdcs, jlsj, mqxm, lybs, dz_hm2, hzjcsyh, jsryxm, _hoodie_file_name, fqxm, dzqc2, lzryxm, mqgj, gxrxm, qfryxm, jgdm, gxsj, dz_xq2, qfjgdm, mqsfzjhm, csyz, qfrybh, xsexbdm, dz_xz2, mqnl, pch, mqzz, cstz, scwdbz, fqgj, mqmz, dz_sq2, sjzt, yzbm2, wdsyh, cssc, xsexm, mqsfzjlbdm, _hoodie_record_key, fqsfzjhm, fqnl, fqzz, jsrybh, plan_id]]
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:279)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:265)
at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCResultSetImpl.next(JDBCResultSetImpl.java:272)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCResultSetImpl.nextRow(JDBCResultSetImpl.java:180)
... 7 more
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hudi.exception.HoodieException: Field ext4 not found in log schema. Query cannot proceed! Derived Schema Fields: [ dzlbdm2, cjrxm, csyxzmbh, fqmz, qfjgmc, dz_cj2, dz_sz2, scsj, _hoodie_partition_path, xsecsrqsj, _hoodie_commit_seqno, fqsfzjlbdm, qfrq, _hoodie_commit_time, jgmc, scwdcs, jlsj, mqxm, lybs, dz_hm2, hzjcsyh, jsryxm, _hoodie_file_name, fqxm, dzqc2, lzryxm, mqgj, gxrxm, qfryxm, jgdm, gxsj, dz_xq2, qfjgdm, mqsfzjhm, csyz, qfrybh, xsexbdm, dz_xz2, mqnl, pch, mqzz, cstz, scwdbz, fqgj, mqmz, dz_sq2, sjzt, yzbm2, wdsyh, cssc, xsexm, mqsfzjlbdm, _hoodie_record_key, fqsfzjhm, fqnl, fqzz, jsrybh, plan_id]]
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:499)
at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:307)
at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:878)
at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy37.fetchResults(Unknown Source)
at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:559)
at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:751)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:567)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: org.apache.hudi.exception.HoodieException: Field ext4 not found in log schema. Query cannot proceed! Derived Schema Fields: [ dzlbdm2, cjrxm, csyxzmbh, fqmz, qfjgmc, dz_cj2, dz_sz2, scsj, _hoodie_partition_path, xsecsrqsj, _hoodie_commit_seqno, fqsfzjlbdm, qfrq, _hoodie_commit_time, jgmc, scwdcs, jlsj, mqxm, lybs, dz_hm2, hzjcsyh, jsryxm, _hoodie_file_name, fqxm, dzqc2, lzryxm, mqgj, gxrxm, qfryxm, jgdm, gxsj, dz_xq2, qfjgdm, mqsfzjhm, csyz, qfrybh, xsexbdm, dz_xz2, mqnl, pch, mqzz, cstz, scwdbz, fqgj, mqmz, dz_sq2, sjzt, yzbm2, wdsyh, cssc, xsexm, mqsfzjlbdm, _hoodie_record_key, fqsfzjhm, fqnl, fqzz, jsrybh, plan_id]]
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:494)
... 24 more
Caused by: java.lang.RuntimeException: org.apache.hudi.exception.HoodieException: Field ext4 not found in log schema. Query cannot proceed! Derived Schema Fields: [ dzlbdm2, cjrxm, csyxzmbh, fqmz, qfjgmc, dz_cj2, dz_sz2, scsj, _hoodie_partition_path, xsecsrqsj, _hoodie_commit_seqno, fqsfzjlbdm, qfrq, _hoodie_commit_time, jgmc, scwdcs, jlsj, mqxm, lybs, dz_hm2, hzjcsyh, jsryxm, _hoodie_file_name, fqxm, dzqc2, lzryxm, mqgj, gxrxm, qfryxm, jgdm, gxsj, dz_xq2, qfjgdm, mqsfzjhm, csyz, qfrybh, xsexbdm, dz_xz2, mqnl, pch, mqzz, cstz, scwdbz, fqgj, mqmz, dz_sq2, sjzt, yzbm2, wdsyh, cssc, xsexm, mqsfzjlbdm, _hoodie_record_key, fqsfzjhm, fqnl, fqzz, jsrybh, plan_id]
at org.apache.hudi.avro.HoodieAvroUtils.generateProjectionSchema(HoodieAvroUtils.java:582)
at org.apache.hudi.hadoop.avro.HoodieAvroParquetReader.<init>(HoodieAvroParquetReader.java:69)
at org.apache.hudi.hadoop.avro.HoodieTimestampAwareParquetInputFormat.createRecordReader(HoodieTimestampAwareParquetInputFormat.java:42)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:94)
at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60)
at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReaderInternal(HoodieParquetInputFormat.java:129)
at org.apache.hudi.hadoop.HoodieParquetInputFormat.getRecordReader(HoodieParquetInputFormat.java:121)
at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
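
The innermost frames show exactly where the read fails: HoodieAvroUtils.generateProjectionSchema builds the column projection Hive requested, and throws because the Hive Metastore schema now contains ext4 while the Avro schema derived from the existing data/log files (all written before the ALTER TABLE) does not. Below is a minimal sketch of that failing check using plain Avro; the class and method names are illustrative stand-ins, not the actual Hudi internals:

```java
// Minimal sketch (plain Avro, not real Hudi code) of the check that fails
// inside HoodieAvroUtils.generateProjectionSchema.
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

import java.util.Arrays;
import java.util.List;

public class ProjectionFailureSketch {

  // Look up every requested column in the file schema; throw on a miss.
  static Schema generateProjection(Schema fileSchema, List<String> requested) {
    SchemaBuilder.FieldAssembler<Schema> assembler =
        SchemaBuilder.record("projection").fields();
    for (String name : requested) {
      Schema.Field field = fileSchema.getField(name);
      if (field == null) {
        throw new RuntimeException(
            "Field " + name + " not found in log schema. Query cannot proceed!");
      }
      assembler = assembler.name(name).type(field.schema()).noDefault();
    }
    return assembler.endRecord();
  }

  public static void main(String[] args) {
    // Schema of the files written BEFORE "alter table ... add columns(ext4 ...)".
    Schema fileSchema = SchemaBuilder.record("rec").fields()
        .requiredString("csyxzmbh")
        .requiredString("cjrxm")
        .endRecord();
    // Hive now requests every column in the Metastore schema, including ext4.
    generateProjection(fileSchema, Arrays.asList("csyxzmbh", "cjrxm", "ext4"));
    // -> RuntimeException: Field ext4 not found in log schema. Query cannot proceed!
  }
}
```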


@rangareddy

Hi @liucongjy

How was the Hudi table created (using Spark, Flink, or Hive), and where are you running the ALTER command?

danny0405 changed the title from "execute hivesql in hive have some exception after add new column in hudi table" to "[SUPPORT] execute hivesql in hive have some exception after add new column in hudi table" on Jan 15, 2025
@danny0405
Contributor

It looks like hadoop-mr does not support schema evolution currently.

@cshuo Can you take care of this issue and file a fix for it?

@cshuo
Contributor

cshuo commented Jan 15, 2025

> It looks like hadoop-mr does not support schema evolution currently.
>
> @cshuo Can you take care of this issue and file a fix for it?

Sure, I'll take it.
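
While the actual patch is not shown in this thread, one plausible read-side direction can be sketched for illustration (plain Avro, hypothetical code, not the real fix): instead of throwing when a requested column is absent from a file's schema, project it as a nullable field with a null default, so rows from pre-ALTER files simply return NULL for the new column.

```java
// Illustrative only, NOT the actual Hudi patch: a lenient variant of the
// projection builder that tolerates columns missing from older files.
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

import java.util.List;

public class LenientProjectionSketch {

  static Schema generateProjection(Schema fileSchema, List<String> requested) {
    SchemaBuilder.FieldAssembler<Schema> assembler =
        SchemaBuilder.record("projection").fields();
    for (String name : requested) {
      Schema.Field field = fileSchema.getField(name);
      if (field != null) {
        assembler = assembler.name(name).type(field.schema()).noDefault();
      } else {
        // Column added after these files were written: project it as an
        // optional field (union with null, default null) so readers emit NULL.
        // A real fix would take the declared type from the evolved table
        // schema; string is just a stand-in here.
        assembler = assembler.optionalString(name);
      }
    }
    return assembler.endRecord();
  }
}
```

This mirrors how Avro itself resolves reader/writer schemas during evolution: a reader field that is missing from the writer schema is filled from its default value.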

@cshuo
Contributor

cshuo commented Jan 17, 2025
