Phase out the distinction between primary and archive storage #6065

yurishkuro · 2024-10-06T20:41:28Z

Update Jan 7 2025: the intention of this issue is to do away with awareness in the code of the differences between primary and archive storage. The two implementations are generally identical, have identical reader/writer API, with only minor changes in behavior that can be managed with more explicit parameters instead if a hardcoded isArchive flag. For example, in the "legacy" mode the ES primary storage uses manual rotation of indices, e.g. by creating a new index name every rollover period, whereas archive storage always uses a fixed index name. The latter behavior can be reproduced by setting use_aliases flag instead of having an explicit understanding of isArchive mode.

This unification becomes especially important in Jaeger-v2 where storage backends are already declared uniformly in the jaeger_storage extension without any knowledge of primary/archive distinction. The distinction only appears once jaegerquery extension retrieves a specific backend by name and uses it as either primary or archive backend, by which point the backend itself is already initialized. The intent is to expose required configuration options to the user such that they can customize the behavior of the backend used for archive as they see fit, e.g. if they still want to rotate the ES indices, they can do so.

This thread #6060 (comment)

The text was updated successfully, but these errors were encountered:

mahadzaryab1 · 2024-10-17T02:34:01Z

@yurishkuro dug into this a little bit while redoing the configurations and here's what I found

The factory creates a primary and archive namespace for each provided configuration (

jaeger/plugin/storage/es/factory.go

Lines 88 to 95 in f411b3c

    
           defaultConfig := DefaultConfig() 
        
           cfg.ApplyDefaults(&defaultConfig) 
        
           archive := make(map[string]*namespaceConfig) 
        
           archive[archiveNamespace] = &namespaceConfig{ 
        
           	Configuration: cfg, 
        
           	namespace:     archiveNamespace, 
        
           }

CreateArchiveSpanReader and CreateArchiveSpanWriter are called by InitArchiveStorage in v1's query service (https://github.com/jaegertracing/jaeger/blob/main/cmd/query/app/querysvc/query_service.go#L130)
The InitArchiveStorage method is called by the v2 jaeger query extension (https://github.com/jaegertracing/jaeger/blob/main/cmd/jaeger/internal/extension/jaegerquery/server.go#L137)

So I believe that this is working as expected. Let me know if there's anything I'm missing here.

One other thing to note is that in v1, the additional namespaces need to be explictly enabled (https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/es/options.go#L110) but the archive namespace is enabled by default in v2 (https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/es/options.go#L437).

yurishkuro · 2024-10-17T14:16:12Z

@mahadzaryab1 In v1 the workflow is:

the binary creates a Factory
asks factory to initialize itself from CLI flags
creates main storage implementations via CreateSpanWriter
optionally calls CreateArchiveSpanWriter

The key observation here is that the distinction between primary and archive is handled inside the Factory because the caller clearly indicates the intent by calling InitArchiveStorage and CreateArchiveSpanWriter. in other words, the caller delegates the specific initialization required for archive storage to the Factory - a single Factory that manages both primary and archive.

In contrast, in v2 the workflow is supposed to be different

the user designates a configuration as primary or archive, but the configuration otherwise is the same and each configuration is represented by a unique Factory object
when initializing query service the v2 extension is supposed to call CreateSpanWriter on the primary factory, and also call CreateSpanWriter on the archive factory (calling the same API)
instead, v2 extension still falls back onto calling two different APIs due to invoking InitArchiveStorage

So v2 is serendipitously working right now, but not as expected. We can see it in how Memory storage is handled - when I configured all-in-one with primary and archive storage, originally it did not work for me because Memory storage factory never implemented the Archive sub-API, so I had to add it (and the side effect of it is that archive for memory cannot be turned off now). But that was exactly the opposite of what I wanted to happen - I wanted the config to be able to configure two storages and designate each as primary / archive, so that the v2 extension would only interact with the factories via CreateSpanWriter.

mahadzaryab1 · 2024-10-19T02:25:06Z

@yurishkuro Thanks so much for the helpful context. Do you have any thoughts on how we should proceed here? Do we want to make changes to the query extension to not use InitArchiveStorage?

yurishkuro · 2024-10-19T16:51:04Z

Yes that would be ideal. It may need refactoring of query service, eg perhaps it should be taking a separate archive factory instead of using a different interface on the main factory.

mahadzaryab1 · 2024-10-19T21:47:50Z

@yurishkuro just looking for a bit of clarification

so today, we're doing the following:

	f, err := jaegerstorage.GetStorageFactory(s.config.Storage.TracesArchive, host)
	if err != nil {
		return fmt.Errorf("cannot find archive storage factory: %w", err)
	}

	if !opts.InitArchiveStorage(f, s.telset.Logger) {
		s.telset.Logger.Info("Archive storage not initialized")
	}

Is the goal to be able to just do the following? Why do we need to introduce a new archive factory?

	f, err := jaegerstorage.GetStorageFactory(s.config.Storage.TracesArchive, host)
	if err != nil {
		return fmt.Errorf("cannot find archive storage factory: %w", err)
	}

	spanReader, err := f.CreateSpanReader()
	if err != nil {
		return fmt.Errorf("cannot create span reader: %w", err)
	}
        opts.ArchiveSpanReader = f.spanReader

yurishkuro · 2024-10-19T23:33:37Z

yes

mahadzaryab1 · 2024-10-19T23:54:09Z

@yurishkuro do we still need a separate interface to do the above? One other question I had was, will simply calling CreateSpanReader behave the same way as CreateArchiveSpanReader does? How would there be a differentiation between normal storage logic and archive storage logic?

yurishkuro · 2024-10-20T17:18:18Z

That is a good question, but if you look at all implementations of GetArchiveSomething they are no different from regular method. I think the only difference is in ES implementation which needs the isArchive flag. We can expose that flag as part of the ES config - not ideal because the user has to remember to set it, otherwise they might get limited look back. But I think in ES the archive storage requires separate settings anyway, eg you may not want to rollover index every day.

mahadzaryab1 · 2024-10-20T17:52:08Z

@yurishkuro How would it work if we add the is_archive flag to the ES config? We're currently storing separate configs in the ES factory as follows so we have no way of knowing if CreateSpanSomething is being called for the regular config or the archive config.

	primaryConfig *config.Configuration
	archiveConfig *config.Configuration

mahadzaryab1 · 2024-10-20T18:02:33Z

Do we want to refactor the ES factory to only only hold one config/client and then propagate the isArchive flag to it?

yurishkuro · 2024-10-20T18:48:12Z

In v2 we have two independent factories. I'd say yes, we want to refactor the factories to represent just one kind of storage. It will require changes to query service. I think only Cassandra and ES factories are storing two different namespaces.

mahadzaryab1 · 2024-10-20T18:53:28Z

@yurishkuro Got it! I'll get started on that. Thanks!

yurishkuro · 2024-10-22T19:31:39Z

This is not blocking for v2 GA, I confirmed that a proper code path is being invoked, e.g. by introducing a panic in CreateArchiveSpanReader:

$ go run ./cmd/jaeger --config ./cmd/jaeger/config-elasticsearch.yaml
...
panic: archive reader not implemented

goroutine 1 [running]:
github.com/jaegertracing/jaeger/plugin/storage/es.(*Factory).CreateArchiveSpanReader(0x107fe3c80?)
	/Users/ysh/dev/jaegertracing/jaeger/plugin/storage/es/factory.go:201 +0x54
github.com/jaegertracing/jaeger/cmd/query/app/querysvc.(*QueryServiceOptions).InitArchiveStorage(0x14000a8ef60, {0x106ad1898?, 0x14000eb6e00}, 0x14000eb8980)
	/Users/ysh/dev/jaegertracing/jaeger/cmd/query/app/querysvc/query_service.go:136 +0x64
github.com/jaegertracing/jaeger/cmd/jaeger/internal/extension/jaegerquery.(*server).addArchiveStorage(0x14000ea5c20, 0x14000a8ef60, {0x106ab6d20?, 0x14000a45760?})
	/Users/ysh/dev/jaegertracing/jaeger/cmd/jaeger/internal/extension/jaegerquery/server.go:137 +0x98
github.com/jaegertracing/jaeger/cmd/jaeger/internal/extension/jaegerquery.(*server).Start(0x14000ea5c20, {0x14000ec6030?, 0x14000f711d0?}, {0x106ab6d20, 0x14000a45760})
	/Users/ysh/dev/jaegertracing/jaeger/cmd/jaeger/internal/extension/jaegerquery/server.go:76 +0x204

mahadzaryab1 · 2024-10-29T01:36:13Z

@yurishkuro just wanted to walk through the plan for this refactoring with a couple of questions

Should the archive storage now also conform to storage.Factory?

type Factory interface {
	// Initialize performs internal initialization of the factory, such as opening connections to the backend store.
	// It is called after all configuration of the factory itself has been done.
	Initialize(metricsFactory metrics.Factory, logger *zap.Logger) error

	// CreateSpanReader creates a spanstore.Reader.
	CreateSpanReader() (spanstore.Reader, error)

	// CreateSpanWriter creates a spanstore.Writer.
	CreateSpanWriter() (spanstore.Writer, error)

	// CreateDependencyReader creates a dependencystore.Reader.
	CreateDependencyReader() (dependencystore.Reader, error)
}

It looks like for each ES storage configuration, we configure both a primary store and an archive store with the same config. As a result, do we need to expose an isArchive flag?

yurishkuro · 2024-10-29T02:14:22Z

(1) Yes on the first point. I'd say this is more of a question to the binaries using the factory to create readers/writers, right now those binaries cast the Factory to ArchiveFactory, and it will need to change.

(2) It's not from the same config, primary/archive have their own configs (but the latter is derived from the former for defaults, so that people don't need to duplicate CLI flags - we don't plan to support such reuse in v2 config). The isArchive flag only exists in ES storage because it affects how the storage queries the index. We could probably reuse the MaxSpanAge parameter for that.

mahadzaryab1 · 2024-10-29T02:23:36Z

@yurishkuro Following up on (2) - do primary/archive have their own configs in v2 as well? Or is this just in v1?

Another question I had was the jaegerquery extension initializes the ES storage as follows:

		case cfg.Elasticsearch != nil:
			factory, err = es.NewFactoryWithConfig(*cfg.Elasticsearch, mf, s.telset.Logger)

How would it work if we changed the factory to only initialize one config (primary or archive) instead of having both in the same configuration?

yurishkuro · 2024-10-29T02:40:52Z

do primary/archive have their own configs in v2 as well?

They do. In v2 config we already make no distinction - you declare a storage with some name X of some backend type (e.g. Elastic). Then the query extension has two foreign key fields, PrimaryStore and ArchiveStore. In theory they could point to the same name X (but then the data will be written to the same db), but if they point to different names then you have different configs.

How would it work if we changed the factory to only initialize one config (primary or archive) instead of having both in the same configuration?

The caller is supposed to use different factories for different purposes.

mahadzaryab1 · 2024-10-29T02:47:42Z

do primary/archive have their own configs in v2 as well?

They do. In v2 config we already make no distinction - you declare a storage with some name X of some backend type (e.g. Elastic). Then the query extension has two foreign key fields, PrimaryStore and ArchiveStore. In theory they could point to the same name X (but then the data will be written to the same db), but if they point to different names then you have different configs.

I see. But from the user's perspective in v2 - wouldn't they just define one config and they would get the archive storage out of the box with same configurations?

How would it work if we changed the factory to only initialize one config (primary or archive) instead of having both in the same configuration?

The caller is supposed to use different factories for different purposes.

Got it. But what would the factory get set to in this case? Would the Start function set both the primary and the archive configuration?

yurishkuro · 2024-10-29T02:50:28Z

factory is not supposed to know if it's used for primary or archive storage, that's the point.

mahadzaryab1 · 2024-11-02T02:39:04Z

@yurishkuro Sorry my question was actually about the extension (https://github.com/jaegertracing/jaeger/blob/main/cmd/jaeger/internal/extension/jaegerstorage/extension.go#L133). Currently, this part here will initialize both the primary and archive config. If we refactor es.NewFactoryWithConfig to only initialize one storage type, how would the query extension deal with that to ensure both are initialized?

yurishkuro · 2024-11-02T03:33:06Z

storageextension doesn't care about primary/archive. queryservice does, and it creates to different factories, e.g.

jaeger/cmd/jaeger/internal/extension/jaegerquery/server.go

Line 132 in b644701

f, err := jaegerstorage.GetStorageFactory(s.config.Storage.TracesArchive, host)

mahadzaryab1 · 2024-11-02T13:39:27Z

@yurishkuro Got it. So when creating the factory using es.NewFactoryWithConfig, how do we specify that the config is primary/archive? Do we have the factory hold an isArchive flag that is passed in through the constructor? You mentioned earlier that it might also be something we expose to the user through the configuration - do we want to take that route and if so, how would that work?

yurishkuro · 2024-11-02T18:41:54Z

We discussed elsewhere if we even need this flag, or if the same effect can be achieved by tweaking the existing parameters.

mahadzaryab1 · 2024-11-02T18:49:08Z

@yurishkuro I think I'm a bit confused about how the config is laid out. For ES, we have a config like the following:

     some_storage:
        elasticsearch:
          indices:
            index_prefix: "jaeger-main"
            spans:
              date_layout: "2006-01-02"
              rollover_frequency: "day"
              shards: 5
              replicas: 1
            services:
              date_layout: "2006-01-02"
              rollover_frequency: "day"
              shards: 5
              replicas: 1
            dependencies:
              date_layout: "2006-01-02"
              rollover_frequency: "day"
              shards: 5
              replicas: 1
            sampling:
              date_layout: "2006-01-02"
              rollover_frequency: "day"
              shards: 5
              replicas: 1
      another_storage:
        elasticsearch:
          indices:
            index_prefix: "jaeger-archive"

How does each config get "chosen" as the primary or the archive configuration? Is it because of this delegation here?

  jaeger_query:
    storage:
      traces: some_storage
      traces_archive: another_storage

mahadzaryab1 · 2024-11-02T18:51:46Z

We discussed elsewhere if we even need this flag, or if the same effect can be achieved by tweaking the existing parameters.

Yep - that discussion was in #6065 (comment). How would we use maxspanage to decide if we should set isArchive or not?

mahadzaryab1 · 2024-11-02T22:55:12Z

@yurishkuro Another question I had was related to the setup of the config. Currently, the Options config is held in the factory which looks like this. Do we want to remove this struct and have everything be tied to namespaceConfig or even Configuration? I'm not entirely sure if that's possible given that Options is tied to the initialization for v1 storage.

yurishkuro · 2024-11-03T05:19:28Z

The namespace configs should go away, they were a mechanism for a single factory to represent multiple storages, which is what we're trying to move away from. One factory, one storage namespace, one role (primary/archive).

yurishkuro · 2024-11-03T05:20:30Z

How would we use maxspanage to decide if we should set isArchive or not?

it's used for both reading and writing, by affecting which index names are utilized. We need to inspect that logic to see if the same selections can be made based on existing parameters.

mahadzaryab1 · 2025-01-10T02:20:54Z

@yurishkuro If we don't want to change the behaviour of the archive storage for v1, then can we still upgrade the v2 storages in place? If we update the storages to the v2 interface in place and get rid of the v1 implementation, we'll lose the CreateArchiveSpanReader and CreateArchiveSpanWriter functions in the process.

…factories (#6526) ## Which problem is this PR solving? - Towards #6065 ## Description of the changes - Some of the storage factories were currently adding the `role` tag to the metrics factory that it was getting to differentiate between `primary` and `archive`. Given that goal in v2 is not have that distinction, the adding of the `primary` tag was moved to the v1 meta-factory. This tag can then be overriden by the archive specific methods in the storage factories (mainly `CreateArchiveSpanReader` and `CreateArchiveSpanWriter`) with `role=archive`. ## How was this change tested? - CI ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `npm run lint` and `npm run test` --------- Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>

…factories (jaegertracing#6526) ## Which problem is this PR solving? - Towards jaegertracing#6065 ## Description of the changes - Some of the storage factories were currently adding the `role` tag to the metrics factory that it was getting to differentiate between `primary` and `archive`. Given that goal in v2 is not have that distinction, the adding of the `primary` tag was moved to the v1 meta-factory. This tag can then be overriden by the archive specific methods in the storage factories (mainly `CreateArchiveSpanReader` and `CreateArchiveSpanWriter`) with `role=archive`. ## How was this change tested? - CI ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `npm run lint` and `npm run test` --------- Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>

…6555) ## Which problem is this PR solving? - Towards #6065 ## Description of the changes - This PR implements a reverse adapter (SpanWriter) that wraps a native v2 storage writer (tracestore.Writer) and downgrades it to implement the v1 writer (spanstore.Writer). - This will be used by #6519 to downgrade a native v2 writer so that it can be used by the v1 query service ## How was this change tested? - Added unit tests ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `npm run lint` and `npm run test` --------- Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>

#6519) ## Which problem is this PR solving? - Towards #6065 ## Description of the changes - This PR changes the jaegerquery extension to remove usages of `CreateArchiveSpanReader` and `CreateArchiveSpanWriter` and replace them with their primary storage counterparts from the v2 storage API `CreateTraceReader` and `CreateTraceWriter`. - 🛑 **This PR contains a breaking changes for users of jaeger-v2 that have an ElasticSearch storage configured for `traces_archive`** 🛑 - The distinction between primary and archive storage was removed which causes some breaking changes that can be remediated. Refer to Step 9 of the test report below for remediation steps. ## Test Report ### 1. Establish Ground Truth on `main` To begin, configure the storage and query settings in the `all-in-one.yaml` file as follows: ```yaml jaeger_storage: backends: some_storage: elasticsearch: indices: index_prefix: "jaeger-main" spans: date_layout: "2006-01-02" rollover_frequency: "day" shards: 5 replicas: 1 services: date_layout: "2006-01-02" rollover_frequency: "day" shards: 5 replicas: 1 dependencies: date_layout: "2006-01-02" rollover_frequency: "day" shards: 5 replicas: 1 sampling: date_layout: "2006-01-02" rollover_frequency: "day" shards: 5 replicas: 1 another_storage: elasticsearch: indices: index_prefix: "jaeger-archive" ``` ### 2. Spin Up Elasticsearch Using Docker To bring up Elasticsearch, run the following commands: ```zsh jaeger % cd docker-compose/elasticsearch/v8 v8 % docker compose up ``` ### 3. Start Jaeger Start the Jaeger service: ```zsh jaeger % go run ./cmd/jaeger ``` ### 4. Archive a Trace From the Jaeger UI, select a trace and archive it. **traceID**: `0dc3e460bd9b8e0dddfa29a2f751cfb9` ![Trace Screenshot](https://github.com/user-attachments/assets/c9ef72a6-30da-4496-9356-20b5db42663f) ### 5. Update `index_prefix` Stop Jaeger and modify the `index_prefix` in the primary configuration to `jaeger-main-1`. This ensures the query for the same trace is no longer found in the primary storage but will be found in the archive storage. ### 6. Query for the Same Trace ```zsh curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.data[].spans[] | {traceID, operationName}' { "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "operationName": "/api/services" } { "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "operationName": "GetService" } ``` ```zsh curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.result.resourceSpans[].scopeSpans[].spans[] | {traceId, name}' { "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "name": "GetService" } { "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "name": "/api/services" } ``` ### 7. Test Changes from This PR Stop Jaeger, checkout this PR, and restart Jaeger: ```zsh gh pr checkout 6519 go run ./cmd/jaeger ``` ### 8. Query for the Same Trace After PR Changes ```zsh curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq . { "data": null, "total": 0, "limit": 0, "offset": 0, "errors": [ { "code": 404, "msg": "trace not found" } ] } ``` ```zsh curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq . { "error": { "httpCode": 404, "message": "No traces found" } } ``` 🛑 **This is where the breaking change occurs** 🛑 ### 9. Mitigation for Users #### Set `use_aliases` for Archive Storage To mitigate the issue, set the `use_aliases` configuration for your archive storage to `true`. Update the configuration as follows: ```yaml another_storage: elasticsearch: indices: index_prefix: "jaeger-archive" use_aliases: true ``` #### Add Alias from Old Index to New Index To ensure backwards compatibility, add an alias from the old index to the new index. You can query the current set of aliases in Elasticsearch with: ```bash curl -X GET "http://localhost:9200/_aliases?pretty" ``` ```json { "jaeger-main-1-jaeger-span-2025-01-16": { "aliases": {} }, "jaeger-main-2-jaeger-span-2025-01-16": { "aliases": {} }, "jaeger-archive-jaeger-span-archive": { "aliases": {} }, "jaeger-main-1-jaeger-service-2025-01-16": { "aliases": {} } } ``` To link the new index (`jaeger-archive-jaeger-span-read`) with the old index (`jaeger-archive-jaeger-span-archive`), run the following: ```bash curl -X POST "http://localhost:9200/_aliases" -H 'Content-Type: application/json' -d' { "actions": [ { "add": { "index": "jaeger-archive-jaeger-span-archive", "alias": "jaeger-archive-jaeger-span-read" } } ] }' ``` Confirm that the alias has been added: ```bash curl -X GET "http://localhost:9200/_aliases?pretty" ``` ```json { "jaeger-main-1-jaeger-span-2025-01-16": { "aliases": {} }, "jaeger-archive-jaeger-span-archive": { "aliases": { "jaeger-archive-jaeger-span-read": {} } }, "jaeger-main-2-jaeger-span-2025-01-16": { "aliases": {} }, "jaeger-main-2-jaeger-service-2025-01-16": { "aliases": {} }, "jaeger-main-1-jaeger-service-2025-01-16": { "aliases": {} } } ``` Note that if you already had `use_aliases` set to true for your archive storage before the upgrade, then Jaeger would've been using an index name with -read suffix: `jaeger-archive-jaeger-span-archive-read`. Then for the example above, you would create an alias `jaeger-archive-jaeger-span-read` pointing to `jaeger-archive-jaeger-span-archive-read`. ### 10. Restart Jaeger and Retry the Query Finally, restart Jaeger and run the same trace queries again: ```zsh curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.data[].spans[] | {traceID, operationName}' { "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "operationName": "/api/services" } { "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "operationName": "GetService" } ``` ```zsh curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.result.resourceSpans[].scopeSpans[].spans[] | {traceId, name}' { "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "name": "/api/services" } { "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "name": "GetService" } ``` ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `npm run lint` and `npm run test` --------- Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com> Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>

…aegertracing#6555) ## Which problem is this PR solving? - Towards jaegertracing#6065 ## Description of the changes - This PR implements a reverse adapter (SpanWriter) that wraps a native v2 storage writer (tracestore.Writer) and downgrades it to implement the v1 writer (spanstore.Writer). - This will be used by jaegertracing#6519 to downgrade a native v2 writer so that it can be used by the v1 query service ## How was this change tested? - Added unit tests ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `npm run lint` and `npm run test` --------- Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>

jaegertracing#6519) ## Which problem is this PR solving? - Towards jaegertracing#6065 ## Description of the changes - This PR changes the jaegerquery extension to remove usages of `CreateArchiveSpanReader` and `CreateArchiveSpanWriter` and replace them with their primary storage counterparts from the v2 storage API `CreateTraceReader` and `CreateTraceWriter`. - 🛑 **This PR contains a breaking changes for users of jaeger-v2 that have an ElasticSearch storage configured for `traces_archive`** 🛑 - The distinction between primary and archive storage was removed which causes some breaking changes that can be remediated. Refer to Step 9 of the test report below for remediation steps. ## Test Report ### 1. Establish Ground Truth on `main` To begin, configure the storage and query settings in the `all-in-one.yaml` file as follows: ```yaml jaeger_storage: backends: some_storage: elasticsearch: indices: index_prefix: "jaeger-main" spans: date_layout: "2006-01-02" rollover_frequency: "day" shards: 5 replicas: 1 services: date_layout: "2006-01-02" rollover_frequency: "day" shards: 5 replicas: 1 dependencies: date_layout: "2006-01-02" rollover_frequency: "day" shards: 5 replicas: 1 sampling: date_layout: "2006-01-02" rollover_frequency: "day" shards: 5 replicas: 1 another_storage: elasticsearch: indices: index_prefix: "jaeger-archive" ``` ### 2. Spin Up Elasticsearch Using Docker To bring up Elasticsearch, run the following commands: ```zsh jaeger % cd docker-compose/elasticsearch/v8 v8 % docker compose up ``` ### 3. Start Jaeger Start the Jaeger service: ```zsh jaeger % go run ./cmd/jaeger ``` ### 4. Archive a Trace From the Jaeger UI, select a trace and archive it. **traceID**: `0dc3e460bd9b8e0dddfa29a2f751cfb9` ![Trace Screenshot](https://github.com/user-attachments/assets/c9ef72a6-30da-4496-9356-20b5db42663f) ### 5. Update `index_prefix` Stop Jaeger and modify the `index_prefix` in the primary configuration to `jaeger-main-1`. This ensures the query for the same trace is no longer found in the primary storage but will be found in the archive storage. ### 6. Query for the Same Trace ```zsh curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.data[].spans[] | {traceID, operationName}' { "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "operationName": "/api/services" } { "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "operationName": "GetService" } ``` ```zsh curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.result.resourceSpans[].scopeSpans[].spans[] | {traceId, name}' { "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "name": "GetService" } { "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "name": "/api/services" } ``` ### 7. Test Changes from This PR Stop Jaeger, checkout this PR, and restart Jaeger: ```zsh gh pr checkout 6519 go run ./cmd/jaeger ``` ### 8. Query for the Same Trace After PR Changes ```zsh curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq . { "data": null, "total": 0, "limit": 0, "offset": 0, "errors": [ { "code": 404, "msg": "trace not found" } ] } ``` ```zsh curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq . { "error": { "httpCode": 404, "message": "No traces found" } } ``` 🛑 **This is where the breaking change occurs** 🛑 ### 9. Mitigation for Users #### Set `use_aliases` for Archive Storage To mitigate the issue, set the `use_aliases` configuration for your archive storage to `true`. Update the configuration as follows: ```yaml another_storage: elasticsearch: indices: index_prefix: "jaeger-archive" use_aliases: true ``` #### Add Alias from Old Index to New Index To ensure backwards compatibility, add an alias from the old index to the new index. You can query the current set of aliases in Elasticsearch with: ```bash curl -X GET "http://localhost:9200/_aliases?pretty" ``` ```json { "jaeger-main-1-jaeger-span-2025-01-16": { "aliases": {} }, "jaeger-main-2-jaeger-span-2025-01-16": { "aliases": {} }, "jaeger-archive-jaeger-span-archive": { "aliases": {} }, "jaeger-main-1-jaeger-service-2025-01-16": { "aliases": {} } } ``` To link the new index (`jaeger-archive-jaeger-span-read`) with the old index (`jaeger-archive-jaeger-span-archive`), run the following: ```bash curl -X POST "http://localhost:9200/_aliases" -H 'Content-Type: application/json' -d' { "actions": [ { "add": { "index": "jaeger-archive-jaeger-span-archive", "alias": "jaeger-archive-jaeger-span-read" } } ] }' ``` Confirm that the alias has been added: ```bash curl -X GET "http://localhost:9200/_aliases?pretty" ``` ```json { "jaeger-main-1-jaeger-span-2025-01-16": { "aliases": {} }, "jaeger-archive-jaeger-span-archive": { "aliases": { "jaeger-archive-jaeger-span-read": {} } }, "jaeger-main-2-jaeger-span-2025-01-16": { "aliases": {} }, "jaeger-main-2-jaeger-service-2025-01-16": { "aliases": {} }, "jaeger-main-1-jaeger-service-2025-01-16": { "aliases": {} } } ``` Note that if you already had `use_aliases` set to true for your archive storage before the upgrade, then Jaeger would've been using an index name with -read suffix: `jaeger-archive-jaeger-span-archive-read`. Then for the example above, you would create an alias `jaeger-archive-jaeger-span-read` pointing to `jaeger-archive-jaeger-span-archive-read`. ### 10. Restart Jaeger and Retry the Query Finally, restart Jaeger and run the same trace queries again: ```zsh curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.data[].spans[] | {traceID, operationName}' { "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "operationName": "/api/services" } { "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "operationName": "GetService" } ``` ```zsh curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.result.resourceSpans[].scopeSpans[].spans[] | {traceId, name}' { "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "name": "/api/services" } { "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9", "name": "GetService" } ``` ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `npm run lint` and `npm run test` --------- Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com> Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>

…terfaces (#6567) ## Which problem is this PR solving? - Towards #6065 ## Description of the changes - This PR completely removes the interface `storage.ArchiveFactory` by refactoring all the storage implementations to remove the distinction between a primary and archive interface. Note that the concept of archive storage remains the same within Jaeger, we just now use the same interface to handle both primary and archive storages. - 🛑 Breaking change for users of GRPC storage 🛑 - The GRPC storage was changed to only dispatch to the primary storage instead of being able to dispatch to a primary and archive storage. - Mitigation for jaeger-v1 users: In order to restore your archive storage, configure a new GRPC server on a different port and specify it via `--grpc-storage-archive.server`. Archive storage will also need to be enabled via `--grpc-storage-archive.enabled=true` - Mitigation for jaeger-v2 users: In order to restore your archive storage, configured a new GRPC server on a different port and specify it via a new storage backend in `jaeger_storage.backends` (an example of this can be viewed at https://github.com/jaegertracing/jaeger/blob/main/cmd/jaeger/config-remote-storage.yaml) ## How was this change tested? ### gRPC Storage #### On main (establish ground truth) Start the remote storage binary (uses memory storage by default which implements the ArchiveFactory interface) ``` go run ./cmd/remote-storage ``` Start the all in one binary configured with grpc storage (jaeger-v1) ``` SPAN_STORAGE_TYPE=grpc go run ./cmd/all-in-one --grpc-storage.server=http://localhost:17271 ``` Traces can be archived from the UI <img width="1469" alt="Screenshot 2025-01-23 at 10 17 32 PM" src="https://github.com/user-attachments/assets/76a7341b-0344-479b-ad78-380a32412ee4" /> For jaeger-v2, change the extension section of `cmd/jaeger/internal/all-in-one.yaml` to be the following ```yaml jaeger_query: storage: traces: some_storage traces_archive: some_storage jaeger_storage: backends: some_storage: grpc: endpoint: localhost:17271 tls: insecure: true ``` and then start the binary as follows: ``` go run ./cmd/jaeger/ ``` #### For current PR Stop both binaries and checkout this PR ``` gh pr checkout 6567 ``` Start two remote storage binaries (in two separate terminals) ``` go run ./cmd/remote-storage --admin.http.host-port=:17270 --grpc.host-port=:17271 go run ./cmd/remote-storage --admin.http.host-port=:17272 --grpc.host-port=:17273 ``` Start the all-in-one binary with explicit archive configurations ``` SPAN_STORAGE_TYPE=grpc go run ./cmd/all-in-one --grpc-storage.server=http://localhost:17271 --grpc-storage-archive.enabled=true --grpc-storage-archive.server=http://localhost:17273 ``` Traces can be once again archived from the UI <img width="1469" alt="Screenshot 2025-01-23 at 10 25 29 PM" src="https://github.com/user-attachments/assets/1b4ca69f-e94b-4b08-8854-04d3db2650fc" /> For jaeger-v2, the configuration was changed to the following: ```yaml extensions: jaeger_query: storage: traces: some_storage traces_archive: another_storage jaeger_storage: backends: some_storage: grpc: endpoint: localhost:17271 tls: insecure: true another_storage: grpc: endpoint: localhost:17273 tls: insecure: true ``` Try running all-in-one without archive-storage enabled ``` SPAN_STORAGE_TYPE=grpc go run ./cmd/all-in-one --grpc-storage.server=http://localhost:17271 ``` We cannot archive traces <img width="1469" alt="Screenshot 2025-01-23 at 10 26 53 PM" src="https://github.com/user-attachments/assets/0ae7cdb6-25a5-4a74-be63-fb421d77d611" /> ### CLI Flags #### ElasticSearch CLI Flags ``` git checkout main SPAN_STORAGE_TYPE=elasticsearch go run ./cmd/collector help > es_main git checkout v1-es-archive SPAN_STORAGE_TYPE=elasticsearch go run ./cmd/collector help > es_current ``` The diff can be viewed [here](https://www.diffchecker.com/xXuFqnOc/). There is no difference. #### Cassandra CLI Flags ``` git checkout main SPAN_STORAGE_TYPE=cassandra go run ./cmd/collector help > cassandra_main git checkout v1-es-archive SPAN_STORAGE_TYPE=cassandra go run ./cmd/collector help > cassandra_current ``` The diff can be viewed [here](https://www.diffchecker.com/x74PKvhA/). There are a few here differences here in which `cassandra-archive.*` gains some new configuration options that were previously only existed for the primary storage. `cassandra-archive.*` also gains some defaults that will be the same as the the primary storage. ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `npm run lint` and `npm run test` --------- Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com> Signed-off-by: Mahad Zaryab <43658574+mahadzaryab1@users.noreply.github.com>

…handler (#6611) ## Which problem is this PR solving? - Towards #6065 ## Description of the changes - 🛑 This PR deprecates two fields in the `CapabilitiesResponse`; `ArchiveSpanReader` and `ArchiveSpanWriter` as these capabilities do not exist within grpc storage anymore and are configured externally. 🛑 - 🛑 This PR deprecates the ArchiveSpanReaderPlugin and ArchvieSpanWriterPlugin 🛑 ## How was this change tested? - CI ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `npm run lint` and `npm run test` --------- Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>

…handler (jaegertracing#6611) ## Which problem is this PR solving? - Towards jaegertracing#6065 ## Description of the changes - 🛑 This PR deprecates two fields in the `CapabilitiesResponse`; `ArchiveSpanReader` and `ArchiveSpanWriter` as these capabilities do not exist within grpc storage anymore and are configured externally. 🛑 - 🛑 This PR deprecates the ArchiveSpanReaderPlugin and ArchvieSpanWriterPlugin 🛑 ## How was this change tested? - CI ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `npm run lint` and `npm run test` --------- Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>

mahadzaryab1 · 2025-01-27T00:00:38Z

@yurishkuro Anything else we want to address as part of this issue?

yurishkuro · 2025-01-27T00:18:43Z

I think we can close, I can't anything high priority.

…writer (#6625) ## Which problem is this PR solving? - Towards #6065 ## Description of the changes - This PR cleans up the integration test suite for jaeger-v1 by removing the archive related fields for all tests since they were being skipped anyway, with the exception of elasticsearch. - Since the only storage that has a difference between primary and archive storage is ElasticSearch, this PR moves the archive trace test to `elasticsearch_test.go`. ## How was this change tested? - CI ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully - for `jaeger`: `make lint test` - for `jaeger-ui`: `npm run lint` and `npm run test` --------- Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>

yurishkuro added this to Jaeger V2 Oct 6, 2024

yurishkuro converted this from a draft issue Oct 6, 2024

yurishkuro added the help wanted Features that maintainers are willing to accept but do not have cycles to implement label Oct 6, 2024

yurishkuro mentioned this issue Oct 6, 2024

[jaeger-v2] Refactor ElasticSearch/OpenSearch Storage Configurations #6060

Merged

4 tasks

dosubot bot added area/storage storage/elasticsearch labels Oct 6, 2024

mahadzaryab1 mentioned this issue Nov 3, 2024

[WIP] Remove dependency on ArchiveFactory #6156

Closed

4 tasks

This was referenced Jan 4, 2025

Upgrade Storage Backends to V2 Storage API #6458

Open

[storage] Remove dependency on archive flag in ES reader #6490

Merged

yurishkuro changed the title ~~Investigate if ES Storage is passed a proper isArchive flag~~ Phase out the distinction between primary and archive storage Jan 7, 2025

mahadzaryab1 mentioned this issue Jan 10, 2025

[v2][query] Create archive reader/writer using regular factory methods #6519

Merged

4 tasks

mahadzaryab1 mentioned this issue Jan 11, 2025

[refactor][storage] Remove metrics factory distinctions from storage factories #6526

Merged

4 tasks

mahadzaryab1 mentioned this issue Jan 16, 2025

[v2][storage] Implement reverse adapter to translate v2 writer to v1 #6555

Merged

4 tasks

mahadzaryab1 mentioned this issue Jan 20, 2025

[storage] Remove distinction between primary and archive storage interfaces #6567

Merged

4 tasks

mahadzaryab1 mentioned this issue Jan 26, 2025

[refactor] Remove archive reader and writer from remote storage grpc handler #6611

Merged

4 tasks

yurishkuro closed this as completed Jan 27, 2025

github-project-automation bot moved this to Done in Jaeger V2 Jan 27, 2025

mahadzaryab1 mentioned this issue Jan 27, 2025

[chore][tests] Clean up integration tests to remove archive reader / writer #6625

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phase out the distinction between primary and archive storage #6065

Phase out the distinction between primary and archive storage #6065

yurishkuro commented Oct 6, 2024 •

edited

Loading

mahadzaryab1 commented Oct 17, 2024 •

edited

Loading

yurishkuro commented Oct 17, 2024

mahadzaryab1 commented Oct 19, 2024

yurishkuro commented Oct 19, 2024

mahadzaryab1 commented Oct 19, 2024

yurishkuro commented Oct 19, 2024

mahadzaryab1 commented Oct 19, 2024 •

edited

Loading

yurishkuro commented Oct 20, 2024

mahadzaryab1 commented Oct 20, 2024 •

edited

Loading

mahadzaryab1 commented Oct 20, 2024

yurishkuro commented Oct 20, 2024

mahadzaryab1 commented Oct 20, 2024

yurishkuro commented Oct 22, 2024

mahadzaryab1 commented Oct 29, 2024 •

edited

Loading

yurishkuro commented Oct 29, 2024

mahadzaryab1 commented Oct 29, 2024 •

edited

Loading

yurishkuro commented Oct 29, 2024

mahadzaryab1 commented Oct 29, 2024 •

edited

Loading

yurishkuro commented Oct 29, 2024

mahadzaryab1 commented Nov 2, 2024 •

edited

Loading

yurishkuro commented Nov 2, 2024

mahadzaryab1 commented Nov 2, 2024 •

edited

Loading

yurishkuro commented Nov 2, 2024

mahadzaryab1 commented Nov 2, 2024 •

edited

Loading

mahadzaryab1 commented Nov 2, 2024

mahadzaryab1 commented Nov 2, 2024

yurishkuro commented Nov 3, 2024

yurishkuro commented Nov 3, 2024

mahadzaryab1 commented Jan 10, 2025

mahadzaryab1 commented Jan 27, 2025

yurishkuro commented Jan 27, 2025

Phase out the distinction between primary and archive storage #6065

Phase out the distinction between primary and archive storage #6065

Comments

yurishkuro commented Oct 6, 2024 • edited Loading

mahadzaryab1 commented Oct 17, 2024 • edited Loading

yurishkuro commented Oct 17, 2024

mahadzaryab1 commented Oct 19, 2024

yurishkuro commented Oct 19, 2024

mahadzaryab1 commented Oct 19, 2024

yurishkuro commented Oct 19, 2024

mahadzaryab1 commented Oct 19, 2024 • edited Loading

yurishkuro commented Oct 20, 2024

mahadzaryab1 commented Oct 20, 2024 • edited Loading

mahadzaryab1 commented Oct 20, 2024

yurishkuro commented Oct 20, 2024

mahadzaryab1 commented Oct 20, 2024

yurishkuro commented Oct 22, 2024

mahadzaryab1 commented Oct 29, 2024 • edited Loading

yurishkuro commented Oct 29, 2024

mahadzaryab1 commented Oct 29, 2024 • edited Loading

yurishkuro commented Oct 29, 2024

mahadzaryab1 commented Oct 29, 2024 • edited Loading

yurishkuro commented Oct 29, 2024

mahadzaryab1 commented Nov 2, 2024 • edited Loading

yurishkuro commented Nov 2, 2024

mahadzaryab1 commented Nov 2, 2024 • edited Loading

yurishkuro commented Nov 2, 2024

mahadzaryab1 commented Nov 2, 2024 • edited Loading

mahadzaryab1 commented Nov 2, 2024

mahadzaryab1 commented Nov 2, 2024

yurishkuro commented Nov 3, 2024

yurishkuro commented Nov 3, 2024

mahadzaryab1 commented Jan 10, 2025

mahadzaryab1 commented Jan 27, 2025

yurishkuro commented Jan 27, 2025

yurishkuro commented Oct 6, 2024 •

edited

Loading

mahadzaryab1 commented Oct 17, 2024 •

edited

Loading

mahadzaryab1 commented Oct 19, 2024 •

edited

Loading

mahadzaryab1 commented Oct 20, 2024 •

edited

Loading

mahadzaryab1 commented Oct 29, 2024 •

edited

Loading

mahadzaryab1 commented Oct 29, 2024 •

edited

Loading

mahadzaryab1 commented Oct 29, 2024 •

edited

Loading

mahadzaryab1 commented Nov 2, 2024 •

edited

Loading

mahadzaryab1 commented Nov 2, 2024 •

edited

Loading

mahadzaryab1 commented Nov 2, 2024 •

edited

Loading