[WIP] Create Cassandra db schema on session initialization #5922

Draft
wants to merge 15 commits into
base: main

akstron
Contributor

@akstron akstron commented Sep 2, 2024

Create Schema (if not present) on Session Initialization

Once a session is established with Cassandra, the added code parses the template file that contains the schema-creation queries and builds individual queries from it. It then executes those queries to create the required types and tables.

Which problem is this PR solving?

Resolves #5797

Description of the changes

  • The PR includes the following changes:
    1. Embedding the template files into the binary.
    2. Creating the database schema during initialization, once the session to the database is established.
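
As a rough illustration of the two changes listed above, the sketch below renders a CQL template with Go's `text/template`. The template text, the fields of `TemplateParams`, and the helper `renderSchema` are assumptions made for this example and are not taken from the PR; in the PR the template would presumably come from a file embedded with `//go:embed` rather than from a string constant.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// schemaTmpl stands in for the embedded template file. A constant is used
// here (instead of //go:embed) so the example runs without extra files.
const schemaTmpl = `CREATE TYPE IF NOT EXISTS {{.Keyspace}}.keyvalue (
    key text,
    value_type text
);
CREATE TABLE IF NOT EXISTS {{.Keyspace}}.service_names (
    service_name text,
    PRIMARY KEY (service_name)
) WITH default_time_to_live = {{.TraceTTLSeconds}};
`

// TemplateParams holds hypothetical values substituted into the template.
type TemplateParams struct {
	Keyspace        string
	TraceTTLSeconds int
}

// renderSchema parses the template text and executes it with params.
func renderSchema(tmplText string, params TemplateParams) (string, error) {
	tmpl, err := template.New("schema").Parse(tmplText)
	if err != nil {
		return "", fmt.Errorf("parse schema template: %w", err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, params); err != nil {
		return "", fmt.Errorf("render schema template: %w", err)
	}
	return buf.String(), nil
}

func main() {
	out, err := renderSchema(schemaTmpl, TemplateParams{
		Keyspace:        "jaeger_v1_dc1",
		TraceTTLSeconds: 172800,
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The rendered text would still need to be split into individual statements before execution, which is the part the description refers to as creating queries out of the template.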

How was this change tested?

  • Schema rendering is covered by a unit test.
  • bash scripts/cassandra-integration-test.sh 4 v004 v2 -s

Checklist

plugin/storage/cassandra/factory.go Outdated Show resolved Hide resolved
return result
}

func constructQueriesFromTemplateFiles(session cassandra.Session, params *StorageConfigParams) ([]cassandra.Query, error) {
Member

Is cassandra.Session not able to execute multiple queries at once?

Contributor Author

Are you talking about running individual queries in parallel or a batch query option? I didn't try executing parallel queries.

plugin/storage/cassandra/factory.go Outdated Show resolved Hide resolved
@akstron akstron force-pushed the create-database-scheme-cassandra branch 2 times, most recently from 69275fb to bcad4c0 Compare October 25, 2024 13:34
@akstron akstron marked this pull request as ready for review October 25, 2024 13:52
@akstron akstron requested a review from a team as a code owner October 25, 2024 13:52
cmd/jaeger/config-cassandra.yaml Outdated Show resolved Hide resolved
pkg/cassandra/config/config.go Outdated Show resolved Hide resolved
pkg/cassandra/config/config.go Outdated Show resolved Hide resolved
plugin/storage/cassandra/schema/schema.go Outdated Show resolved Hide resolved
@akstron akstron force-pushed the create-database-scheme-cassandra branch from f9dd90e to 90368b1 Compare October 28, 2024 10:46
…ution for initialize database

Signed-off-by: Alok Kumar Singh <[email protected]>
@akstron akstron force-pushed the create-database-scheme-cassandra branch from 90368b1 to afc786d Compare October 28, 2024 10:50
//Datacenter is the name for network topology
Datacenter string `mapstructure:"datacenter" valid:"optional"`
// TraceTTL is Time To Live (TTL) for the trace data in seconds
TraceTTL int `mapstructure:"trace_ttl" valid:"optional"`
Member

can the type here be time.Duration so that the user could specify 72h?

Contributor Author

@akstron akstron Oct 29, 2024

I am assuming there should be validation to make sure that the user can't specify something like "ms"?

Member

that's the point of using strong type - the business logic does not need to worry about validations, they should happen separately during parsing. I believe if you simply change int to time.Duration and use 10ms as a value in YAML, the parser will do the right thing.

Contributor Author

@akstron akstron Nov 9, 2024

TraceTTL and DependecyTTL set default_time_to_live (https://cassandra.apache.org/doc/latest/cassandra/developing/cql/ddl.html), which must be supplied in seconds. Won't making it time.Duration and allowing users to specify something like 10ms complicate things? For example, we would have to reject durations more precise than seconds, such as milliseconds or nanoseconds.

Member

the option is either we name the field trace_ttl_seconds and force the user to deal with ridiculous numbers like 1209600 (2w - can you tell? I can't without doing math), or we keep a clean name trace_ttl and let the user specify easy-to-read values like 24h or 14d
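
To make the trade-off in this thread concrete, here is a minimal sketch of converting a `time.Duration` TTL into the whole seconds that `default_time_to_live` expects, rejecting sub-second precision. The helper name `ttlSeconds` is hypothetical and not part of the PR. Note that Go's standard `time.ParseDuration` only accepts units up to `h`, so a value such as `14d` would need custom parsing or would have to be written as `336h` (1209600 seconds).

```go
package main

import (
	"fmt"
	"time"
)

// ttlSeconds converts a TTL into the whole number of seconds expected by
// Cassandra's default_time_to_live. It rejects negative values and values
// that are not a whole number of seconds (for example 10ms).
func ttlSeconds(ttl time.Duration) (int64, error) {
	if ttl < 0 {
		return 0, fmt.Errorf("ttl %v must not be negative", ttl)
	}
	if ttl%time.Second != 0 {
		return 0, fmt.Errorf("ttl %v must be a whole number of seconds", ttl)
	}
	return int64(ttl / time.Second), nil
}

func main() {
	for _, raw := range []string{"72h", "336h", "10ms", "14d"} {
		d, err := time.ParseDuration(raw)
		if err != nil {
			// "14d" ends up here: the standard parser has no day unit.
			fmt.Println(raw, "->", err)
			continue
		}
		secs, err := ttlSeconds(d)
		if err != nil {
			fmt.Println(raw, "->", err)
			continue
		}
		fmt.Println(raw, "->", secs, "seconds")
	}
}
```

With this shape the validation lives in one place, and the rest of the code only ever sees a value that is already valid for Cassandra.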

// CasVersion is version of cassandra used
CasVersion int `mapstructure:"cas_version" valid:"optional"`
// CompactionWindow of format "^[0-9]+[mhd]$" tells the compaction window of the db
CompactionWindow string `mapstructure:"compaction_window" valid:"optional"`
Member

so this is defining a time interval? Can we then also use time.Duration type?

Contributor Author

Only the time units from https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/TimeUnit.html are allowed, as per https://cassandra.apache.org/doc/latest/cassandra/managing/operating/compaction/twcs.html.

Also, the format below is based on the script currently in use. Should we go ahead and break compatibility with it?
// CompactionWindow of format "^[0-9]+[mhd]$" tells the compaction window of the db

Member

we are creating new config, not bound by previous restrictions. If the destination API does not allow smaller units we can always add validation cw >= time.Minute
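
A possible shape for that validation, sketched under the assumption that the window is later passed to TimeWindowCompactionStrategy as a size plus a unit of MINUTES, HOURS, or DAYS. The helper `compactionWindow` is hypothetical and not part of the PR.

```go
package main

import (
	"fmt"
	"time"
)

// compactionWindow validates cw and converts it into a size/unit pair,
// choosing the largest unit that divides the duration evenly.
func compactionWindow(cw time.Duration) (size int64, unit string, err error) {
	const day = 24 * time.Hour
	switch {
	case cw < time.Minute:
		return 0, "", fmt.Errorf("compaction window %v must be at least 1m", cw)
	case cw%time.Minute != 0:
		return 0, "", fmt.Errorf("compaction window %v must be a whole number of minutes", cw)
	case cw%day == 0:
		return int64(cw / day), "DAYS", nil
	case cw%time.Hour == 0:
		return int64(cw / time.Hour), "HOURS", nil
	default:
		return int64(cw / time.Minute), "MINUTES", nil
	}
}

func main() {
	windows := []time.Duration{2 * time.Hour, 90 * time.Minute, 48 * time.Hour, 30 * time.Second}
	for _, cw := range windows {
		size, unit, err := compactionWindow(cw)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%v -> size=%d unit=%s\n", cw, size, unit)
	}
}
```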

pkg/cassandra/config/config.go Outdated Show resolved Hide resolved
plugin/storage/cassandra/schema/v004-go-tmpl-test.cql.tmpl Outdated Show resolved Hide resolved
plugin/storage/cassandra/schema/schema.go Outdated Show resolved Hide resolved
Signed-off-by: Alok Kumar Singh <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
@@ -0,0 +1,43 @@
-- There are total 4 queries here
Member

what's the point of test schema?

Contributor Author

@akstron akstron Oct 31, 2024

Instead of testing all the queries and manually creating the expected result for each of them, I reduced the number of queries to make the expected result easier to create.

Member

that doesn't really make sense to me. We can either test that the template is correct or that the code using the template is correct. You're not doing the former by having another template, and the latter you can do with the primary template. We don't need to validate that the output of running template is "as expected", that's like testing the Go template package.

Contributor Author

> We don't need to validate that the output of running template is "as expected", that's like testing the Go template package.

We also remove comments and construct individual query strings by iterating over the lines of the template output. The test checks this construction of individual query strings, which involves more than just the template package.

> and the latter you can do with the primary template.

Sure, I think I can write an integration test for this.
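
For context, the comment removal and query construction described in this reply could look roughly like the following. This is a sketch only: `splitQueries` is a hypothetical name, and it assumes that comments start with `--` and that every statement ends with a `;` at the end of a line.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitQueries drops "--" comments and blank lines from rendered CQL and
// returns one string per statement, using a trailing ";" as the terminator.
func splitQueries(rendered string) ([]string, error) {
	var queries []string
	var current []string
	scanner := bufio.NewScanner(strings.NewReader(rendered))
	for scanner.Scan() {
		line := scanner.Text()
		// Naive comment removal: assumes "--" never occurs inside a
		// string literal.
		if idx := strings.Index(line, "--"); idx >= 0 {
			line = line[:idx]
		}
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		current = append(current, line)
		if strings.HasSuffix(line, ";") {
			queries = append(queries, strings.Join(current, "\n"))
			current = nil
		}
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	if len(current) > 0 {
		return nil, fmt.Errorf("unterminated statement: %q", strings.Join(current, " "))
	}
	return queries, nil
}

func main() {
	rendered := "-- There are total 2 queries here\n" +
		"CREATE TYPE IF NOT EXISTS ks.keyvalue (\n    key text -- tag key\n);\n\n" +
		"CREATE TABLE IF NOT EXISTS ks.service_names (service_name text PRIMARY KEY);\n"
	queries, err := splitQueries(rendered)
	if err != nil {
		panic(err)
	}
	for i, q := range queries {
		fmt.Printf("query %d:\n%s\n", i+1, q)
	}
}
```

Logic of this kind is independent of the template package, so it can be unit tested with a short inline input rather than a second template file.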

Contributor Author

Hi @yurishkuro, I have the integration test script ready. Should I merge it with the current cassandra-integration-test.sh? This would require adding a fourth parameter, skip_apply_schema, for running the script.

@akstron akstron changed the title Create Cassandra db schema on session initialization [WIP] Create Cassandra db schema on session initialization Nov 9, 2024
Signed-off-by: Alok Kumar Singh <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>
Signed-off-by: Alok Kumar Singh <[email protected]>

codecov bot commented Nov 9, 2024

Codecov Report

Attention: Patch coverage is 62.70270% with 69 lines in your changes missing coverage. Please review.

Project coverage is 96.18%. Comparing base (640615e) to head (1dc16cd).
Report is 23 commits behind head on main.

Files with missing lines          Patch %   Lines
pkg/cassandra/config/schema.go    57.44%    44 Missing and 16 partials ⚠️
pkg/cassandra/config/config.go    79.54%    6 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5922      +/-   ##
==========================================
- Coverage   96.41%   96.18%   -0.24%     
==========================================
  Files         353      355       +2     
  Lines       20135    20310     +175     
==========================================
+ Hits        19414    19535     +121     
- Misses        535      572      +37     
- Partials      186      203      +17     
Flag Coverage Δ
badger_v1 8.16% <0.00%> (-0.16%) ⬇️
badger_v2 1.64% <0.00%> (-0.04%) ⬇️
cassandra-4.x-v1 15.25% <62.70%> (+0.85%) ⬆️
cassandra-4.x-v2 1.59% <0.00%> (-0.04%) ⬇️
cassandra-5.x-v1 15.25% <62.70%> (+0.85%) ⬆️
cassandra-5.x-v2 1.59% <0.00%> (-0.04%) ⬇️
elasticsearch-6.x-v1 18.27% <0.00%> (-0.23%) ⬇️
elasticsearch-7.x-v1 18.35% <0.00%> (-0.24%) ⬇️
elasticsearch-8.x-v1 18.51% <0.00%> (-0.24%) ⬇️
elasticsearch-8.x-v2 1.64% <0.00%> (-0.03%) ⬇️
grpc_v1 9.31% <0.00%> (-0.22%) ⬇️
grpc_v2 6.88% <0.00%> (-0.14%) ⬇️
kafka-v1 8.72% <0.00%> (-0.17%) ⬇️
kafka-v2 1.64% <0.00%> (-0.04%) ⬇️
memory_v2 1.64% <0.00%> (-0.03%) ⬇️
opensearch-1.x-v1 18.40% <0.00%> (-0.25%) ⬇️
opensearch-2.x-v1 18.39% <0.00%> (-0.25%) ⬇️
opensearch-2.x-v2 1.63% <0.00%> (-0.05%) ⬇️
tailsampling-processor 0.46% <0.00%> (-0.02%) ⬇️
unittests 94.86% <36.21%> (-0.47%) ⬇️


Signed-off-by: Alok Kumar Singh <[email protected]>
@akstron akstron changed the title [WIP] Create Cassandra db schema on session initialization Create Cassandra db schema on session initialization Nov 9, 2024
@akstron akstron changed the title Create Cassandra db schema on session initialization [WIP] Create Cassandra db schema on session initialization Nov 10, 2024
@akstron akstron marked this pull request as draft November 10, 2024 05:21
// NewSession creates a new Cassandra session
func (c *Configuration) NewSession() (cassandra.Session, error) {
err := c.newSessionPrerequisites()
Member

this is redundant. Create the session as before and call GenerateSchemaIfNotPresent just before returning from NewSession
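
A sketch of the ordering suggested here: create the session as before, then generate the schema as the last step before returning. `Session`, the function fields on `Configuration`, and `fakeSession` are stand-ins invented for this example so that it runs without a database; only the names `NewSession` and `GenerateSchemaIfNotPresent` come from the discussion.

```go
package main

import (
	"errors"
	"fmt"
)

// Session is a hypothetical stand-in for cassandra.Session.
type Session interface {
	Close()
}

// Configuration is a hypothetical stand-in for the PR's config type. The
// function fields replace the real connection and schema logic.
type Configuration struct {
	connect        func() (Session, error)
	generateSchema func(Session) error // plays the role of GenerateSchemaIfNotPresent
}

var errSchema = errors.New("keyspace unavailable")

// NewSession creates the session first and generates the schema just
// before returning, closing the session if schema generation fails.
func (c *Configuration) NewSession() (Session, error) {
	session, err := c.connect()
	if err != nil {
		return nil, err
	}
	if err := c.generateSchema(session); err != nil {
		session.Close() // avoid leaking the session on failure
		return nil, fmt.Errorf("generate schema: %w", err)
	}
	return session, nil
}

// fakeSession records whether Close was called.
type fakeSession struct{ closed bool }

func (s *fakeSession) Close() { s.closed = true }

func main() {
	fake := &fakeSession{}
	cfg := &Configuration{
		connect:        func() (Session, error) { return fake, nil },
		generateSchema: func(Session) error { return errSchema },
	}
	_, err := cfg.NewSession()
	fmt.Println("error:", err, "session closed:", fake.closed)
}
```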

@@ -0,0 +1,43 @@
-- There are total 4 queries here
Member

I already asked - why do we need this file?

@@ -58,6 +58,18 @@ type Schema struct {
// while connecting to the Cassandra Cluster. This is useful for connecting to clusters, like Azure Cosmos DB,
// that do not support SnappyCompression.
DisableCompression bool `mapstructure:"disable_compression"`
// Datacenter is the name for network topology
Member

we need to add CreateSchema bool (default false) for backwards compatibility

CompactionWindowUnit string `mapstructure:"compaction_window_unit" valid:"optional"`
}

func DefaultParams() TemplateParams {
Member

not needed. There is DefaultConfig function somewhere, it should populate these settings there.
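
One way this could look, sketched with a simplified `Schema` struct. The field set and every default value below are illustrative placeholders rather than the project's real defaults; `CreateSchema` defaults to false, as requested in the review for backwards compatibility.

```go
package main

import (
	"fmt"
	"time"
)

// Schema is a simplified, hypothetical version of the config struct
// discussed in the review.
type Schema struct {
	Keyspace         string
	CreateSchema     bool
	Datacenter       string
	TraceTTL         time.Duration
	CompactionWindow time.Duration
}

// Configuration is a hypothetical wrapper holding the schema settings.
type Configuration struct {
	Schema Schema
}

// DefaultConfig populates the schema settings in one place, so that no
// separate DefaultParams function is needed. Values are placeholders.
func DefaultConfig() Configuration {
	return Configuration{
		Schema: Schema{
			Keyspace:         "jaeger_v1_dc1",
			CreateSchema:     false, // opt-in, keeps existing deployments unchanged
			Datacenter:       "dc1",
			TraceTTL:         48 * time.Hour,
			CompactionWindow: 2 * time.Hour,
		},
	}
}

func main() {
	cfg := DefaultConfig()
	if !cfg.Schema.CreateSchema {
		fmt.Println("schema creation disabled by default")
	}
	fmt.Printf("%+v\n", cfg.Schema)
}
```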


expOutputQueries := []string{
`CREATE TYPE IF NOT EXISTS jaeger_v1_dc1.keyvalue (
key text,
Member

all these details are not relevant for the unit test - after all we don't know if they are correct anyway since we're not testing against a real DB (but the integration test will test against it). So at most you can verify that CREATE commands are issued for expected db objects, but there's no need to match full queries.
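
A sketch of the looser check described in this comment: instead of matching full query text, extract which objects the CREATE statements target and compare only those. The regular expression and the helper `createdObjects` are assumptions made for this example, not code from the PR.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// createStmt captures the kind and the object name of a CREATE statement,
// e.g. "CREATE TYPE IF NOT EXISTS ks.keyvalue (" yields "TYPE", "keyvalue".
var createStmt = regexp.MustCompile(
	`(?i)^\s*CREATE\s+(TYPE|TABLE|INDEX)\s+(?:IF\s+NOT\s+EXISTS\s+)?(?:\w+\.)?(\w+)`)

// createdObjects returns "KIND name" for each CREATE statement in queries,
// ignoring anything that does not look like a CREATE statement.
func createdObjects(queries []string) []string {
	var objects []string
	for _, q := range queries {
		if m := createStmt.FindStringSubmatch(q); m != nil {
			objects = append(objects, strings.ToUpper(m[1])+" "+m[2])
		}
	}
	return objects
}

func main() {
	queries := []string{
		"CREATE TYPE IF NOT EXISTS jaeger_v1_dc1.keyvalue (\n    key text\n);",
		"CREATE TABLE IF NOT EXISTS jaeger_v1_dc1.traces (\n    trace_id blob PRIMARY KEY\n);",
	}
	fmt.Println(createdObjects(queries))
}
```

A unit test built on this would assert only the list of created objects and leave column-level correctness to the integration test against a real database.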

Comment on lines +16 to +20
s)
SKIP_APPLY_SCHEMA="true"
;;
*)
;;
Member

please use two-space indentation, not tabs

Development

Successfully merging this pull request may close these issues.

Create database schema in Cassandra automatically
3 participants