
[WIP] Refactor storage factories to hold one configuration #6156

Draft · mahadzaryab1 wants to merge 21 commits into base: main
Conversation

mahadzaryab1 (Collaborator):
Which problem is this PR solving?

Description of the changes

How was this change tested?

Checklist

pkg/es/config/flags.go: 3 resolved review threads (outdated)
@@ -118,7 +118,7 @@ func (*Factory) getFactoryOfType(factoryType string) (storage.Factory, error) {
 	case cassandraStorageType:
 		return cassandra.NewFactory(), nil
 	case elasticsearchStorageType, opensearchStorageType:
-		return es.NewFactory(), nil
+		return es.NewFactory(es.PrimaryNamespace), nil
mahadzaryab1 (Collaborator, Author):
@yurishkuro can you take another look at the changes? I minimized the code movements to simply remove the others field from Options. The problem now is that this isn't initializing the CLI flags for es-archive. Any thoughts on how to get around this?
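To make the gap concrete, here is a minimal self-contained sketch of the flag-registration problem being described. The "es" and "es-archive" prefixes mirror the existing CLI namespaces; the types, flag names, and use of the standard flag package are illustrative stand-ins, not the actual Jaeger code.

```go
package main

import (
	"flag"
	"fmt"
)

// nsFactory holds the configuration for exactly one namespace and registers
// its CLI flags under that namespace's prefix.
type nsFactory struct{ namespace string }

func newFactory(namespace string) *nsFactory { return &nsFactory{namespace: namespace} }

func (f *nsFactory) AddFlags(fs *flag.FlagSet) {
	fs.String(f.namespace+".server-urls", "http://127.0.0.1:9200", "comma-separated ES server URLs")
}

func main() {
	fs := flag.NewFlagSet("jaeger", flag.ContinueOnError)
	// The meta-factory only constructs the primary factory, so only "es.*"
	// flags get registered; nothing creates the "es-archive.*" flags anymore.
	newFactory("es").AddFlags(fs)
	fs.VisitAll(func(fl *flag.Flag) { fmt.Println(fl.Name) }) // prints: es.server-urls
}
```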

yurishkuro (Member):

see #6156 (comment)

The query service instantiates just a single factory and casts it to ArchiveFactory. With your changes (which are in the right direction, but of insufficient scope) it never gets a chance to create archive CLI flags, because that has to happen via different factories.

mahadzaryab1 (Collaborator, Author):

@yurishkuro is the archive factory only needed for the query service?

yurishkuro (Member):

yes

)

var ( // interface conformance checks
_ storage.Factory = (*Factory)(nil)
_ storage.ArchiveFactory = (*Factory)(nil)
yurishkuro (Member) commented Nov 3, 2024:
We need to be very careful about removing the ArchiveFactory interface, because the query service uses it via a runtime cast; unless we have integration tests (many of them disable archive tests, as I recall) you can introduce a breaking change:

archiveFactory, ok := storageFactory.(storage.ArchiveFactory)
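For illustration, a self-contained sketch of why this runtime cast matters; the interfaces and method signatures below are simplified stand-ins, not the real storage package.

```go
package main

import "fmt"

type Factory interface {
	CreateSpanReader() (string, error)
}

// ArchiveFactory is the optional capability the query service probes for.
type ArchiveFactory interface {
	CreateArchiveSpanReader() (string, error)
}

type esFactory struct{}

func (esFactory) CreateSpanReader() (string, error)        { return "primary reader", nil }
func (esFactory) CreateArchiveSpanReader() (string, error) { return "archive reader", nil }

func initArchiveStorage(f Factory) {
	// If the concrete factory stops implementing ArchiveFactory, this branch
	// silently returns and archive support disappears without any
	// compile-time error; only integration tests would catch it.
	af, ok := f.(ArchiveFactory)
	if !ok {
		fmt.Println("archive storage not supported by this factory")
		return
	}
	reader, _ := af.CreateArchiveSpanReader()
	fmt.Println("archive storage enabled:", reader)
}

func main() {
	initArchiveStorage(esFactory{})
}
```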

mahadzaryab1 (Collaborator, Author) commented Nov 3, 2024:
@yurishkuro yep, noted. I was thinking that, to avoid breaking changes, we can remove the ArchiveFactory within this PR and use an archive storage factory wherever it is needed.

mahadzaryab1 (Collaborator, Author) commented Nov 3, 2024:

@yurishkuro we've got 3 call sites for InitArchiveStorage that currently do the runtime cast. I had a couple of questions on how to move forward here and wanted to get your thoughts.

  1. https://github.com/jaegertracing/jaeger/blob/main/cmd/jaeger/internal/extension/jaegerquery/server.go#L137-L139. The previous check to get the traces archive factory should be sufficient. If it exists, we can set the archive span reader and archive span writer in the query options. Does that make sense?
  2. func (qOpts *QueryOptions) BuildQueryServiceOptions(storageFactory storage.Factory, logger *zap.Logger) *querysvc.QueryServiceOptions. This is called by cmd/query and cmd/all-in-one. Should we instantiate a new storage factory here that is the archive storage factory and pass that down to the linked function?
  3. https://github.com/jaegertracing/jaeger/blob/main/cmd/remote-storage/app/server.go#L81. This is used by the remote storage in cmd/remote-storage. Should we do the same thing here as suggested in (2)?

yurishkuro (Member):
I don't think you need to change (1), but we need to change InitArchiveStorage() not to cast but to use the storage directly.

(2) yes.

(3) I think you don't need to change it, because if the caller needs an archive storage it should instantiate a different remote storage. As I understand it, there's no dedicated gRPC API for archive storage, which could go away the same way as ArchiveFactory.
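A rough sketch of that direction for (2): the caller wires up a factory that is already configured as the archive backend, and InitArchiveStorage uses it directly with no ArchiveFactory cast. All types and method names here are simplified stand-ins, not the real querysvc signatures.

```go
package main

import "fmt"

type SpanReader interface {
	GetTrace(traceID string) (string, error)
}

type stubReader struct{}

func (stubReader) GetTrace(traceID string) (string, error) { return "trace " + traceID, nil }

type Factory interface {
	CreateSpanReader() (SpanReader, error)
}

type archiveESFactory struct{}

func (archiveESFactory) CreateSpanReader() (SpanReader, error) { return stubReader{}, nil }

type QueryServiceOptions struct {
	ArchiveSpanReader SpanReader
}

// InitArchiveStorage no longer probes for an optional interface; it assumes
// the factory it receives is the archive backend chosen by the caller.
func (o *QueryServiceOptions) InitArchiveStorage(archiveFactory Factory) error {
	reader, err := archiveFactory.CreateSpanReader()
	if err != nil {
		return err
	}
	o.ArchiveSpanReader = reader
	return nil
}

func main() {
	opts := &QueryServiceOptions{}
	if err := opts.InitArchiveStorage(archiveESFactory{}); err != nil {
		fmt.Println("init failed:", err)
		return
	}
	fmt.Println(opts.ArchiveSpanReader.GetTrace("abc"))
}
```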

mahadzaryab1 (Collaborator, Author):
@yurishkuro Got it. For (2): this is how the primary storage factory is initialized. How would we go about initializing the archive storage factory here to expose the CLI flags for storages that have archive flags (Cassandra/ES)?

mahadzaryab1 (Collaborator, Author):
@yurishkuro For v1, what do you think of passing an isArchive flag into https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/factory.go#L116? This way, we can create a new archive storage using NewFactory, which we can pass down to es.NewFactory() and any other storage configs that need it.
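A self-contained sketch of this proposal follows. getFactoryOfType and es.PrimaryNamespace appear in the diff above; the isArchive parameter and the "es-archive" namespace constant are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	primaryNamespace = "es"
	archiveNamespace = "es-archive" // assumed counterpart of es.PrimaryNamespace
)

type esFactory struct{ namespace string }

// getFactoryOfType is a stand-in for the meta-factory switch: the isArchive
// flag picks which namespace the ES factory is constructed with.
func getFactoryOfType(factoryType string, isArchive bool) (*esFactory, error) {
	switch factoryType {
	case "elasticsearch", "opensearch":
		if isArchive {
			return &esFactory{namespace: archiveNamespace}, nil
		}
		return &esFactory{namespace: primaryNamespace}, nil
	default:
		return nil, errors.New("unknown storage type: " + factoryType)
	}
}

func main() {
	primary, _ := getFactoryOfType("elasticsearch", false)
	archive, _ := getFactoryOfType("elasticsearch", true)
	fmt.Println(primary.namespace, archive.namespace) // es es-archive
}
```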

Comment on lines +81 to +82
// TODO: what should we do here?
// _ = qOpts.InitArchiveStorage(f, logger)
mahadzaryab1 (Collaborator, Author):

@yurishkuro any thoughts on how we should handle this case here? Previously, this wouldn't initialize the archive storage if the factory didn't implement the ArchiveFactory interface, but now it always will.

Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Mahad Zaryab <[email protected]>
codecov bot commented Nov 4, 2024:

Codecov Report

Attention: Patch coverage is 98.75000% with 1 line in your changes missing coverage. Please review.

Project coverage is 96.44%. Comparing base (0a24f6d) to head (4194aab).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
plugin/storage/es/factory.go 97.56% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6156      +/-   ##
==========================================
- Coverage   96.47%   96.44%   -0.04%     
==========================================
  Files         354      354              
  Lines       20126    19997     -129     
==========================================
- Hits        19417    19286     -131     
- Misses        524      526       +2     
  Partials      185      185              
Flag Coverage Δ
badger_v1 8.42% <0.00%> (+0.10%) ⬆️
badger_v2 1.70% <0.00%> (+0.02%) ⬆️
cassandra-4.x-v1 14.15% <35.00%> (-0.24%) ⬇️
cassandra-4.x-v2 1.64% <0.00%> (+0.02%) ⬆️
cassandra-5.x-v1 14.15% <35.00%> (-0.24%) ⬇️
cassandra-5.x-v2 1.64% <0.00%> (+0.02%) ⬆️
elasticsearch-6.x-v1 18.18% <38.75%> (-0.43%) ⬇️
elasticsearch-7.x-v1 18.26% <38.75%> (-0.43%) ⬇️
elasticsearch-8.x-v1 18.43% <38.75%> (-0.42%) ⬇️
elasticsearch-8.x-v2 1.69% <0.00%> (+0.01%) ⬆️
grpc_v1 ?
grpc_v2 6.91% <3.75%> (-0.09%) ⬇️
kafka-v1 8.99% <0.00%> (+0.11%) ⬆️
kafka-v2 1.70% <0.00%> (+0.02%) ⬆️
memory_v2 1.70% <0.00%> (+0.02%) ⬆️
opensearch-1.x-v1 18.30% <38.75%> (-0.44%) ⬇️
opensearch-2.x-v1 18.31% <38.75%> (-0.43%) ⬇️
opensearch-2.x-v2 1.69% <0.00%> (+0.01%) ⬆️
tailsampling-processor 0.47% <0.00%> (+<0.01%) ⬆️
unittests 95.29% <83.75%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown.

mahadzaryab1 (Collaborator, Author):
@yurishkuro For ES Archive, it looks like the archive flag is used in two places:

For Cassandra, it's a bit more straightforward:

Do you have any thoughts on how we should proceed? Do we still want to expose an isArchive or setAsArchive flag?

yurishkuro (Member):
The metrics namespacing for Cassandra can easily be done elsewhere; it does not need to be based on isArchive. It's only needed there now because the primary/archive distinction is made internally in the factory.

For ES:

  • "To choose the suffix for the index name" - not needed since the user can do that themselves. One significant change in v2 is that we cannot provide different defaults in the config for primary/archive, and having the same index prefix by mistake will be bad for the user. Maybe we can introduce additional validation for configs of the same type and catch that as a configuration error.
  • "to add sorting and a search after clause if we're not querying the archive index" - I don't understand the purpose of that difference. Any ideas? Would it hurt if the logic for archive was the same as for primary?
  • There is a 3rd, most important usage of isArchive - in the GetIndicesFn. That's the one where I wonder if we could replace isArchive with a different logic based on the lookback parameter.

mahadzaryab1 (Collaborator, Author):

> The metrics namespacing for Cassandra can easily be done elsewhere; it does not need to be based on isArchive. It's only needed there now because the primary/archive distinction is made internally in the factory.

With the setup of v2, how would we make that distinction?

> For ES:
>
>   • "To choose the suffix for the index name" - not needed since the user can do that themselves. One significant change in v2 is that we cannot provide different defaults in the config for primary/archive, and having the same index prefix by mistake will be bad for the user. Maybe we can introduce additional validation for configs of the same type and catch that as a configuration error.

@yurishkuro Okay I see. But if the configurations are being held in different factories - how would we perform validation there?

  • "to add sorting and a search after clause if we're not querying the archive index" - I don't understand the purpose of that difference. Any ideas? Would it hurt if the logic for archive was the same as for primary?

I was thinking the same as well. I'm guessing its an optimization to avoid sorting the archive storage which would be larger than the a non-archive storage. Here is the documentation for Search After. Would we have a performance degradation here if we enabled this for archive as well?

  • There is a 3rd, most important usage of isArchive - in the GetIndicesFn. That's the one where I wonder if we could replace isArchive with a different logic based on the lookback parameter.

Ah yes. I was only looking at the reader. Are you referring to getSpanAndServiceIndexFn? This looks to once again be creating a different suffix based on whether the storage is archive or not (https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/es/spanstore/writer.go#L104-L111). Can we not use the same approach here as the first point for the ES reader?

yurishkuro (Member) commented Nov 8, 2024:

> With the setup of v2, how would we make that distinction?

The storage extension manages factories and knows the storage names; it can use those names to bind the MetricsFactory to a specific label.
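As a toy illustration of that idea (stand-in types, not the real metrics API): metrics are scoped by the backend name the storage extension already knows, so the Cassandra factory itself no longer needs an isArchive flag.

```go
package main

import "fmt"

// metricsFactory is a minimal stand-in for a metrics factory that supports tags.
type metricsFactory struct{ tags map[string]string }

func (m metricsFactory) withTag(key, value string) metricsFactory {
	merged := map[string]string{key: value}
	for k, v := range m.tags {
		merged[k] = v
	}
	return metricsFactory{tags: merged}
}

func main() {
	root := metricsFactory{}
	// Backend names come from the storage extension config; "some_archive"
	// is a hypothetical user-chosen name, not a special value.
	for _, name := range []string{"primary_store", "some_archive"} {
		scoped := root.withTag("storage_name", name)
		fmt.Println(scoped.tags)
	}
}
```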

> Okay I see. But if the configurations are being held in different factories - how would we perform validation there?

No, configurations are passed to factories but held in a single place in the storage extension, which can invoke additional validation here:

for storageName, cfg := range s.config.Backends {
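A hypothetical sketch of that additional validation: while iterating the configured backends, reject two Elasticsearch/OpenSearch backends that share the same index prefix. Type and field names are illustrative, not the actual storage extension config.

```go
package main

import "fmt"

// esConfig is an illustrative stand-in for an Elasticsearch backend config.
type esConfig struct{ IndexPrefix string }

func validateUniqueIndexPrefixes(backends map[string]esConfig) error {
	seen := map[string]string{} // index prefix -> backend name
	for name, cfg := range backends {
		if other, ok := seen[cfg.IndexPrefix]; ok {
			return fmt.Errorf("storages %q and %q use the same index prefix %q", other, name, cfg.IndexPrefix)
		}
		seen[cfg.IndexPrefix] = name
	}
	return nil
}

func main() {
	backends := map[string]esConfig{
		"primary_store": {IndexPrefix: "jaeger"},
		"some_archive":  {IndexPrefix: "jaeger"}, // misconfiguration to catch
	}
	fmt.Println(validateUniqueIndexPrefixes(backends))
}
```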

> Would we have a performance degradation here if we enabled this for archive as well?

The thing is, we never search for traces in archive storage; we only retrieve a trace by ID, so sorting in this case would only apply to the spans within a trace. Yes, it could have overhead for a very large trace, but it would still be small.

yurishkuro (Member):
> Are you referring to getSpanAndServiceIndexFn? This looks to once again be creating a different suffix

Not just the suffix: when it's primary storage with manually rotated indices, the indices also have the date pattern in the name, but the archive index never has that (because it doesn't grow large). One compromise we could make is to recommend that users don't use archive storage with manually rotated indices, only with ILM. Unless there's another way that I am not seeing.

Btw, reader also has similar branching in index naming logic:

return addRemoteReadClusters(func(indexPrefix, _ /* indexDateLayout */ string, _ /* startTime */ time.Time, _ /* endTime */ time.Time, _ /* reduceDuration */ time.Duration) []string {
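To illustrate the two naming schemes being discussed: the "jaeger-span-<date>" and "jaeger-span-archive" names follow Jaeger's documented ES index conventions, while the function shapes below are simplified and not the actual reader/writer code.

```go
package main

import (
	"fmt"
	"time"
)

// rotatedSpanIndices returns one dated index per day in [start, end], which
// is how primary storage with manually rotated indices is addressed.
func rotatedSpanIndices(prefix string, start, end time.Time) []string {
	var names []string
	for d := start; !d.After(end); d = d.AddDate(0, 0, 1) {
		names = append(names, prefix+"jaeger-span-"+d.Format("2006-01-02"))
	}
	return names
}

// archiveSpanIndex is a single undated index, which is why the archive path
// needs no date pattern and no lookback-based expansion.
func archiveSpanIndex(prefix string) []string {
	return []string{prefix + "jaeger-span-archive"}
}

func main() {
	start := time.Date(2024, 11, 6, 0, 0, 0, 0, time.UTC)
	end := time.Date(2024, 11, 8, 0, 0, 0, 0, time.UTC)
	fmt.Println(rotatedSpanIndices("", start, end))
	fmt.Println(archiveSpanIndex(""))
}
```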

mahadzaryab1 (Collaborator, Author):

> With the setup of v2, how would we make that distinction?

> The storage extension manages factories and knows the storage names; it can use those names to bind the MetricsFactory to a specific label.

So this is where the Cassandra storage is initialized in the extension. Are you suggesting we can pass the label into the constructor here? If so, how would we make the distinction based on just the name?

> Okay I see. But if the configurations are being held in different factories - how would we perform validation there?

> No, configurations are passed to factories but held in a single place in the storage extension, which can invoke additional validation here:
> for storageName, cfg := range s.config.Backends {

Oh okay, I see. So whenever we're processing an ES config, we would go through all the other ones that exist and make sure that the index prefixes are not the same?

> Would we have a performance degradation here if we enabled this for archive as well?

> The thing is, we never search for traces in archive storage; we only retrieve a trace by ID, so sorting in this case would only apply to the spans within a trace. Yes, it could have overhead for a very large trace, but it would still be small.

Sounds good. I can remove the indirection here then.

mahadzaryab1 (Collaborator, Author):

> Are you referring to getSpanAndServiceIndexFn? This looks to once again be creating a different suffix

> Not just the suffix: when it's primary storage with manually rotated indices, the indices also have the date pattern in the name, but the archive index never has that (because it doesn't grow large). One compromise we could make is to recommend that users don't use archive storage with manually rotated indices, only with ILM. Unless there's another way that I am not seeing.

> Btw, reader also has similar branching in index naming logic:
> return addRemoteReadClusters(func(indexPrefix, _ /* indexDateLayout */ string, _ /* startTime */ time.Time, _ /* endTime */ time.Time, _ /* reduceDuration */ time.Duration) []string {

How would making that recommendation simplify the archive branching for us?

mahadzaryab1 (Collaborator, Author):
@yurishkuro Regarding getSourceFn: the ES archive integration tests seem to fail when the SearchAfter clause is added, as they are unable to find the trace.

Signed-off-by: Mahad Zaryab <[email protected]>