Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resilience4J bulkhead metrics missing at /actutor/prometheus #121

Closed
tjuchniewicz opened this issue Nov 5, 2021 · 11 comments · Fixed by #130
Closed

Resilience4J bulkhead metrics missing at /actutor/prometheus #121

tjuchniewicz opened this issue Nov 5, 2021 · 11 comments · Fixed by #130
Labels
bug Something isn't working
Milestone

Comments

@tjuchniewicz
Copy link

tjuchniewicz commented Nov 5, 2021

How to reproduce

  1. use two different circuit breaker (CB) instances (cb1, cb2)
  2. define resilience4j.thread-pool-bulkhead.configs.default.coreThreadPoolSiz: 5
  3. define custom config for only one CB instance resilience4j.thread-pool-bulkhead.instances.cb2.coreThreadPoolSize: 10
  4. you will not see metrics for cb1 (e.g. resilience4j.bulkhead.max.thread.pool.size) at /prometheus but /metrics is fine

Explanation
Prometheus requires that all meters with the same name have the same set of tag keys. See: micrometer-metrics/micrometer#877
When we define custom config for circuit breaker, Spring Cloud Circuit Breaker (SCCB) will initialize CB instance at startup and this automatically register metric in Micrometer without any tags.
At runtime, for all CB instances, SCCB executes Resilience4jBulkheadProvider.run() and initializes other CB using group tag. See:

final io.vavr.collection.Map<String, String> tags = io.vavr.collection.HashMap.of(CIRCUIT_BREAKER_GROUP_TAG,

The result is that Prometheus will ignore metrics for cb1.

Workaround
Remove group tag in your code:

@Bean
public MeterFilter removeGroupTagFromResilience4jBulkheadMeterFilter() {
	return new MeterFilter() {
		@Override
		public Meter.Id map(Meter.Id id) {
			if(id.getName().startsWith("resilience4j.bulkhead")) {
				List<Tag> tags = id.getTags().stream().filter(t -> !"group".equals(t.getKey())).collect(Collectors.toList());
				return id.replaceTags(tags);
			}
		   return id;
		}
	};
}

Proposed fix
Do not pass group tag to Resilience4jBulkheadProvider. Do we really need group as tag? Looks like the same value is used as name tag (created from id).

@ryanjbaxter
Copy link
Contributor

When we define custom config for circuit breaker, Spring Cloud Circuit Breaker (SCCB) will initialize CB instance at startup and this automatically register metric in Micrometer without any tags

Where does this happen? I am not seeing it

Also if we removed the group tag wouldn't that result in the CB not being part of the group?

Wonder if @Ferioney might care to comment as he originally wrote this code.

@tjuchniewicz
Copy link
Author

tjuchniewicz commented Nov 8, 2021

Where does this happen? I am not seeing it

See: ThreadPoolBulkheadConfiguration.threadPoolBulkheadRegistry(): https://github.com/resilience4j/resilience4j/blob/809080dfcdf2a72efd27c5b1ad0e78d8c4c00176/resilience4j-spring/src/main/java/io/github/resilience4j/bulkhead/configure/threadpool/ThreadPoolBulkheadConfiguration.java#L72

Also if we removed the group tag wouldn't that result in the CB not being part of the group?

In my opinion tags sent to Resilience4j are only used for metrics. Grouping is implemented in Spring Cloud and is not directly related to any of Resilience4j feature. Will not break anything in my opinion.

@ryanjbaxter
Copy link
Contributor

@tjuchniewicz would it be possible to provide a sample that reproduces the problem so I can dig into it further?

@tjuchniewicz
Copy link
Author

ryanjbaxter pushed a commit to ryanjbaxter/spring-cloud-circuitbreaker that referenced this issue Feb 7, 2022
@ryanjbaxter ryanjbaxter linked a pull request Feb 7, 2022 that will close this issue
@ryanjbaxter ryanjbaxter added bug Something isn't working and removed feedback-provided labels Feb 10, 2022
@ryanjbaxter ryanjbaxter added this to the 2.1.1 milestone Feb 10, 2022
ryanjbaxter pushed a commit that referenced this issue Feb 10, 2022
rodbate pushed a commit to rodbate/spring-cloud-circuitbreaker that referenced this issue Feb 13, 2022
Use MeterFilter instead of MeterRegistryCustomizer
@tjuchniewicz
Copy link
Author

@ryanjbaxter is this really fixed? I'm using:

<dependency>
   <groupId>org.springframework.cloud</groupId>
   <artifactId>spring-cloud-starter-circuitbreaker-resilience4j</artifactId>
   <version>2.1.1</version>
</dependency>	

and issue still exists?

@ryanjbaxter
Copy link
Contributor

It should be yes, I tested it with the sample you provided in the issue

@tjuchniewicz
Copy link
Author

It should be yes, I tested it with the sample you provided in the issue

Can you show what exactly you change in the sample? I double checked using spring-cloud-starter-circuitbreaker-resilience4j 2.1.1 and issue still exists.

@ryanjbaxter
Copy link
Contributor

The only change I made was switch the boot version to 2.6.3 and the cloud version to 2021.0.1.

I also fixed the property names in application properties

resilience4j.thread-pool-bulkhead.configs.default.coreThreadPoolSize=10
resilience4j.timelimiter.configs.default.coreThreadPoolSize=10s

If I go to http://localhost:8080/actuator/prometheus using 2021.0.0 I get the following...store-one missing.

# HELP resilience4j_bulkhead_thread_pool_size The thread pool size
# TYPE resilience4j_bulkhead_thread_pool_size gauge
resilience4j_bulkhead_thread_pool_size{name="store-two",} 1.0

If I go to http://localhost:8080/actuator/prometheus using 2021.0.1 I get the following...

# HELP resilience4j_bulkhead_core_thread_pool_size The core thread pool size
# TYPE resilience4j_bulkhead_core_thread_pool_size gauge
resilience4j_bulkhead_core_thread_pool_size{group="none",name="store-two",} 9.0
resilience4j_bulkhead_core_thread_pool_size{group="store-one",name="store-one",} 10.0

@tjuchniewicz
Copy link
Author

@ryanjbaxter See tjuchniewicz/r4j-metrics-issue@69bb56d. I've checked several times.. For me issue still exists... Could you double check on your side again?

@ryanjbaxter
Copy link
Contributor

Can you add feign.circuitbreaker.enabled=true to your application.properties?

@tjuchniewicz
Copy link
Author

Works with feign.circuitbreaker.enabled=true. For my use-case this is fine. I use Resilience4J only with Feign.
Not sure about other use cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants