Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

if endTime is set, so must startTime be set #281

Closed
adisharoon opened this issue Nov 21, 2017 · 12 comments
Closed

if endTime is set, so must startTime be set #281

adisharoon opened this issue Nov 21, 2017 · 12 comments

Comments

@adisharoon
Copy link

adisharoon commented Nov 21, 2017

For some of our cassandra reaper instances, the web UI has stopped displaying running repairs, and the javascript console shows 500 errors. The only clue I've found is the following exception which occurs frequently in the log. This is for the new version 1.0.0

ERROR [2017-11-21 16:42:42,853] io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: 61b8c4120d508373
! java.lang.IllegalStateException: if endTime is set, so must startTime be set
! at com.google.common.base.Preconditions.checkState(Preconditions.java:174)
! at io.cassandrareaper.core.RepairSegment$Builder.build(RepairSegment.java:182)
! at io.cassandrareaper.storage.CassandraStorage.createRepairSegmentFromRow(CassandraStorage.java:693)
! at io.cassandrareaper.storage.CassandraStorage.getRepairSegmentsForRun(CassandraStorage.java:663)
! at io.cassandrareaper.storage.CassandraStorage.getClusterRunStatuses(CassandraStorage.java:935)
! at io.cassandrareaper.resources.ClusterResource.viewCluster(ClusterResource.java:127)
! at io.cassandrareaper.resources.ClusterResource.getCluster(ClusterResource.java:112)
! at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
! at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
! at java.lang.reflect.Method.invoke(Method.java:498)
! at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
! at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
! at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
! at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
! at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
! at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
! at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
! at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
! at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
! at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
! at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
! at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
! at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
! at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
! at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
! at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
! at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
! at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
! at io.dropwizard.jetty.NonblockingServletHolder.handle(NonblockingServletHolder.java:49)
! at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
! at io.dropwizard.servlets.ThreadNameFilter.doFilter(ThreadNameFilter.java:34)
! at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
! at io.dropwizard.jersey.filter.AllowedMethodsFilter.handle(AllowedMethodsFilter.java:50)
! at io.dropwizard.jersey.filter.AllowedMethodsFilter.doFilter(AllowedMethodsFilter.java:44)
! at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
! at org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:308)
! at org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:262)
! at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
! at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
! at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1174)
! at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
! at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1106)
! at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
! at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
! at com.codahale.metrics.jetty9.InstrumentedHandler.handle(InstrumentedHandler.java:240)
! at io.dropwizard.jetty.RoutingHandler.handle(RoutingHandler.java:51)
! at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:459)
! at io.dropwizard.jetty.BiDiGzipHandler.handle(BiDiGzipHandler.java:68)
! at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:169)
! at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
! at org.eclipse.jetty.server.Server.handle(Server.java:524)
! at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:319)
! at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
! at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
! at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
! at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
! at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
! at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
! at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
! at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
! at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
! at java.lang.Thread.run(Thread.java:748)
@michaelsembwever
Copy link
Member

@adisharoon would you be interested in uploading your logs (and even a cql export of the reaper_db keyspace)? if you're unsure about privacy concerns you could email it directly to me on mick@thelastpickle.com

@adejanovski
Copy link
Contributor

@adisharoon : I created a snapshot release to issue a quick fix >> https://github.com/thelastpickle/cassandra-reaper/releases/tag/1.0.1-SNAPSHOT

Please tell us how it works, and we'd gladly take the logs as @michaelsembwever to try and understand how this could have happened.

Thanks !

michaelsembwever added a commit that referenced this issue Nov 23, 2017
move the setting of the startTime to when the segment is triggered, as this is definite, rather than the jmx notification recieved afterwards that the segment has started (as we don't always get it).
@clangaxon
Copy link

1.0.1 SNAPSHOT allows repairs to continue, but does log warnings followed by an exception.

INFO   [2017-11-23 00:37:03,690] [cas-auditdb-cluster:00000000-0000-0003-0000-000000000000] i.c.s.RepairRunner - Next segment to run : 00000000-0000-0993-0000-000000000000 
INFO   [2017-11-23 00:37:03,751] [cas-auditdb-cluster:00000000-0000-0003-0000-000000000000:00000000-0000-0993-0000-000000000000] i.c.s.SegmentRunner - It is ok to repair segment '00000000-0000-0993-0000-000000000000' on repair run with id '00000000-0000-0003-0000-000000000000' 
INFO   [2017-11-23 00:37:03,753] [cas-auditdb-cluster:00000000-0000-0003-0000-000000000000:00000000-0000-0993-0000-000000000000] i.c.j.JmxProxy - Triggering repair of range (-9012888123792682497,-8992675264498560447] for keyspace "mystuff" on host 1.2.3.4, with repair parallelism parallel, in cluster with Cassandra version '3.0.14' (can use DATACENTER_AWARE 'true'), for column families: [] 
INFO   [2017-11-23 00:37:03,760] [cas-auditdb-cluster:00000000-0000-0003-0000-000000000000:00000000-0000-0993-0000-000000000000] i.c.s.SegmentRunner - Repair for segment 00000000-0000-0993-0000-000000000000 started, status wait will timeout in 1800000 millis 

<Repair runs from 00:37:03,763 to 00:37:03,778>

WARN   [2017-11-23 00:37:03,784] [cas-auditdb-cluster:00000000-0000-0003-0000-000000000000:00000000-0000-0993-0000-000000000000] i.c.c.RepairSegment - endTime is set but startTime is not on segment 00000000-0000-0993-0000-000000000000 for repair 00000000-0000-0003-0000-000000000000. This should not be happening :( 
WARN   [2017-11-23 00:37:03,784] [cas-auditdb-cluster:00000000-0000-0003-0000-000000000000:00000000-0000-0993-0000-000000000000] i.c.c.RepairSegment - startTime can only be unset if segment state is NOT_STARTED (currently = DONE) on segment 00000000-0000-0993-0000-000000000000 for repair 00000000-0000-0003-0000-000000000000. This should not be happening :( 
WARN   [2017-11-23 00:37:03,785] [cas-auditdb-cluster] i.c.c.RepairSegment - endTime is set but startTime is not on segment 00000000-0000-0993-0000-000000000000 for repair 00000000-0000-0003-0000-000000000000. This should not be happening :( 
WARN   [2017-11-23 00:37:03,785] [cas-auditdb-cluster] i.c.c.RepairSegment - startTime can only be unset if segment state is NOT_STARTED (currently = DONE) on segment 00000000-0000-0993-0000-000000000000 for repair 00000000-0000-0003-0000-000000000000. This should not be happening :( 
WARN   [2017-11-23 00:37:03,785] [cas-auditdb-cluster:00000000-0000-0003-0000-000000000000:00000000-0000-0993-0000-000000000000] i.c.c.RepairSegment - endTime is set but startTime is not on segment 00000000-0000-0993-0000-000000000000 for repair 00000000-0000-0003-0000-000000000000. This should not be happening :( 
WARN   [2017-11-23 00:37:03,785] [cas-auditdb-cluster:00000000-0000-0003-0000-000000000000:00000000-0000-0993-0000-000000000000] i.c.c.RepairSegment - startTime can only be unset if segment state is NOT_STARTED (currently = DONE) on segment 00000000-0000-0993-0000-000000000000 for repair 00000000-0000-0003-0000-000000000000. This should not be happening :( 
WARN   [2017-11-23 00:37:03,785] [cas-auditdb-cluster:00000000-0000-0003-0000-000000000000:00000000-0000-0993-0000-000000000000] i.c.c.RepairSegment - endTime is set but startTime is not on segment 00000000-0000-0993-0000-000000000000 for repair 00000000-0000-0003-0000-000000000000. This should not be happening :( 
WARN   [2017-11-23 00:37:03,785] [cas-auditdb-cluster:00000000-0000-0003-0000-000000000000:00000000-0000-0993-0000-000000000000] i.c.c.RepairSegment - startTime can only be unset if segment state is NOT_STARTED (currently = DONE) on segment 00000000-0000-0993-0000-000000000000 for repair 00000000-0000-0003-0000-000000000000. This should not be happening :( 
INFO   [2017-11-23 00:37:03,785] [cas-auditdb-cluster:00000000-0000-0003-0000-000000000000:00000000-0000-0993-0000-000000000000] i.c.s.SegmentRunner - Repair command 43 on segment 00000000-0000-0993-0000-000000000000 returned with state DONE 
ERROR  [2017-11-23 00:37:03,797] [cas-auditdb-cluster:00000000-0000-0003-0000-000000000000:00000000-0000-0993-0000-000000000000] i.c.s.RepairRunner - Executing SegmentRunner failed 
java.lang.IllegalArgumentException: ReadableInstant objects must not be null
	at org.joda.time.base.BaseSingleFieldPeriod.between(BaseSingleFieldPeriod.java:69)
	at org.joda.time.Seconds.secondsBetween(Seconds.java:101)
	at io.cassandrareaper.service.SegmentRunner.runRepair(SegmentRunner.java:332)
	at io.cassandrareaper.service.SegmentRunner.run(SegmentRunner.java:133)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

@michaelsembwever
Copy link
Member

michaelsembwever commented Nov 23, 2017

I will fix that @clangaxon

In fact it should already be fixed in mck/fix-repair-segment-starttime.

The problem with removing the preconditions is that it just moves the error to other places in the code, as the IllegalArgumentException above shows.

@michaelsembwever
Copy link
Member

Fixed in #282

Expect the 1.0.1 release with 24 hrs.

@clangaxon
Copy link

Thanks!

@adejanovski
Copy link
Contributor

The problem should be fixed in 1.0.1-SNAPSHOT

@clangaxon @adisharoon : any chance you could test this release and tell us if the fix works for you?

We'll then release a final 1.0.1.

Thanks

@adejanovski adejanovski reopened this Nov 24, 2017
@clangaxon
Copy link

So far, it appears to be functioning properly and has stopped all the warnings and errors.

@adejanovski
Copy link
Contributor

thanks for the feedback @clangaxon.

Closing the issue.

@ludekdolejsky
Copy link

ludekdolejsky commented Dec 4, 2017

I am unfortunately seeing this issue even after upgrading to 1.0.1.

I am running one Reaper per DC and the "endTime is set, so must startTime be set" error is happening occasionally (I'd say less frequently than prior to upgrading to 1.0.1).

please see attached log files:

reaper-dc2.log
reaper-dc1.log

data causing the issue (you can see they come from dc2 log file):

cqlsh> select segment_start_time, segment_end_time from reaper_db.repair_run where id=231ee1e0-d871-11e7-8db9-7bb3e751dc82 and segment_id=23212c16-d871-11e7-8db9-7bb3e751dc82;

 segment_start_time | segment_end_time
--------------------+---------------------------------
               null | 2017-12-04 00:36:45.261000+0000

(1 rows)

everything goes back to normal when I fix the data by running:

update reaper_db.repair_run set segment_end_time=null where id=231ee1e0-d871-11e7-8db9-7bb3e751dc82 and segment_id=23212c16-d871-11e7-8db9-7bb3e751dc82;

@michaelsembwever
Copy link
Member

michaelsembwever commented Dec 5, 2017

@ludekdolejsky , i'll look into this. but i'll need more logs than what you provided. all logs since that particular repair started would be best.

also it needs to be opened as a new ticket, as this failure message is only a symptom to various underlying causes that need to be addressed separately… (and hopefully most of those causes have been fixed already). but this can be done when more info is provided.

@ludekdolejsky
Copy link

@michaelsembwever I have opened #291 with full logs. Hope it helps. Thanks, Ludek

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants