CASSSIDECAR-203: Created Endpoint that Triggers an Immediate Schema Report #198

Open
wants to merge 1 commit into
base: trunk
from

Contributor

@5 5 commented Feb 19, 2025

@5 5 force-pushed the trunk branch 3 times, most recently from be55528 to bfa7acc Compare February 19, 2025 03:25
@yifan-c yifan-c changed the title Created Endpoint that Triggers an Immediate Schema Report CASSSIDECAR-203: Created Endpoint that Triggers an Immediate Schema Report Feb 20, 2025
@@ -8,7 +9,7 @@
* Sidecar schema initialization can be executed on multiple thread (CASSSIDECAR-200)
* Make sidecar operations resilient to down Cassandra nodes (CASSSIDECAR-201)
* Fix Cassandra instance not found error (CASSSIDECAR-192)
- * Implemented Schema Reporter for Integration with DataHub (CASSSIDECAR-191)
+ * Implement Schema Reporter for Integration with DataHub (CASSSIDECAR-191)
Contributor

Do not change the existing entries in the CHANGES.txt

Contributor Author

Sure.

Is there a specific reason you'd like me to keep this typo I myself accidentally introduced in there forever?

Contributor

@yifan-c yifan-c Feb 25, 2025

It is mainly to avoid amending the change log. It is more important to have the git history of each line linked to the correct commit.
Btw, it is not really a typo. "Implemented something" describes the change just as clearly.
If you look at the log, there are places where "Adds", "Add", and "Adding" are all used. There is no strict rule on the verb's form.

@@ -63,6 +63,9 @@ public class BasicPermissions
public static final Permission READ_OPERATIONAL_JOB = new DomainAwarePermission("OPERATIONAL_JOB:READ", OPERATION_SCOPE);
public static final Permission DECOMMISSION_NODE = new DomainAwarePermission("NODE:DECOMMISSION", OPERATION_SCOPE);

// Permissions related to Schema Reporting
public static final Permission REPORT_SCHEMA = new DomainAwarePermission("SCHEMA:REPORT", CLUSTER_SCOPE);
Contributor

I do not think an extra permission is required.
The permission needed in order to publish to DataHub is SCHEMA:READ, which already exists. cc: @sarankk

Contributor

The permission is not for someone to read the schema, but for someone to trigger the schema report on demand. So I think the permission is different. Would love to hear input from @sarankk

Contributor

The sentiment here is to show restraint in adding new verbs. Ideally, there should be a fixed set of verbs to avoid operational pain.
The reason that READ should work here is that the reporter is reading the Cassandra schema. When it publishes (i.e. sends requests to DataHub), the authorization should be enforced by the server (DataHub), not the client (Sidecar).

Contributor

@sarankk sarankk Feb 20, 2025

I feel having two different permissions is better in general, where granting read permission automatically grants report permission as well. But in this particular case, since we already have a periodic task to report schema irrespective of this endpoint, can we use the enable flag for schema reporting to control whether reporting is allowed, without adding a separate permission for it?

Contributor Author

I was planning to get to this tomorrow, but see there's confusion about this one.

Changing permissions to the existing SCHEMA:READ will be logically equivalent to the following statement:

"Everyone who is allowed to see cluster schema is also allowed to perform DoS attacks on Sidecar."

Is that actually true?

Contributor

I think we need a separate permission to allow a user to trigger schema reporting. We should not conflate the SCHEMA:READ permission with the ability to report schemas.

Contributor

Let's also think about the future. We do not want to go down the route of introducing new verbs for new actions, and that is what triggers my concern here.

I am fine with letting PUBLISH pass. But we should really think about it and be mindful about introducing new verbs.
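
A minimal sketch of the distinction debated in this thread. It is not Sidecar's actual authorization code: `PermissionSketch` and `isAllowed` are hypothetical names, and permissions are reduced to plain strings. It only illustrates that guarding the trigger endpoint with `SCHEMA:READ` lets every schema reader trigger a report, while a separate `SCHEMA:REPORT` permission does not.

```java
import java.util.Set;

// Hypothetical stand-in for an authorization check; not Sidecar's real classes.
final class PermissionSketch
{
    static final String SCHEMA_READ = "SCHEMA:READ";
    static final String SCHEMA_REPORT = "SCHEMA:REPORT";

    // Returns true when the caller holds the permission that guards the endpoint.
    static boolean isAllowed(Set<String> granted, String required)
    {
        return granted.contains(required);
    }

    public static void main(String[] args)
    {
        Set<String> reader = Set.of(SCHEMA_READ);
        Set<String> operator = Set.of(SCHEMA_READ, SCHEMA_REPORT);

        // Endpoint guarded by SCHEMA:READ: any schema reader can trigger a report.
        System.out.println(isAllowed(reader, SCHEMA_READ));     // prints true
        // Endpoint guarded by SCHEMA:REPORT: a plain reader is rejected.
        System.out.println(isAllowed(reader, SCHEMA_REPORT));   // prints false
        System.out.println(isAllowed(operator, SCHEMA_REPORT)); // prints true
    }
}
```

The cost of the second option is one more verb to administer, which is the operational concern raised above.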

*
* @param metadata the metadata fetcher
* @param executor executor pools for blocking executions
* @param reporter executor pools for blocking executions
Contributor

copy-paste error

executorPools.service()
             .runBlocking(() -> metadataFetcher.runOnFirstAvailableInstance(instance ->
                                schemaReporter.process(instance.delegate().metadata())))
             .onSuccess(context::json)
Contributor

The previous block does not return any value. What does the response json look like?

Comment on lines +100 to +113
     * Iterate through the local instances and run the {@link Consumer} on the first available one,
     * so no {@link CassandraUnavailableException} or {@link OperationUnavailableException} is thrown for the operations
     *
     * @param consumer a {@link Consumer} that processes {@link InstanceMetadata} and returns no result
     * @throws CassandraUnavailableException if all local instances were exhausted
     */
    public void runOnFirstAvailableInstance(Consumer<InstanceMetadata> consumer) throws CassandraUnavailableException
    {
        callOnFirstAvailableInstance(metadata ->
        {
            consumer.accept(metadata);
            return null;
        });
    }
Contributor

Can you remove this method? It is unnecessary. It does not retrieve anything from the instance metadata fetcher, which is supposed to "fetch something". I have suggested a different implementation in the new handler without using this method.
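
The suggested implementation is not shown on this page, so the following is only a sketch of the general idea under stated assumptions: a function-based "first available instance" call that returns a value, which makes a separate consumer-based wrapper unnecessary. `FirstAvailableSketch`, `Instance`, `UnavailableException`, `describeSchema`, and `callOnFirstAvailable` are hypothetical stand-ins, not Sidecar's classes or methods.

```java
import java.util.List;
import java.util.function.Function;

final class FirstAvailableSketch
{
    // Hypothetical stand-in for instance metadata.
    static final class Instance
    {
        final String name;
        final boolean available;

        Instance(String name, boolean available)
        {
            this.name = name;
            this.available = available;
        }
    }

    // Hypothetical stand-in for an "instance unavailable" exception.
    static final class UnavailableException extends RuntimeException
    {
        UnavailableException(String message)
        {
            super(message);
        }
    }

    // Hypothetical operation that fails when the instance is down.
    static String describeSchema(Instance instance)
    {
        if (!instance.available)
        {
            throw new UnavailableException(instance.name + " is down");
        }
        return "schema of " + instance.name;
    }

    // Applies the function to each instance in order and returns the first successful result.
    static <T> T callOnFirstAvailable(List<Instance> instances, Function<Instance, T> function)
    {
        for (Instance instance : instances)
        {
            try
            {
                return function.apply(instance);
            }
            catch (UnavailableException e)
            {
                // This instance is unavailable; try the next one.
            }
        }
        throw new UnavailableException("All local instances were exhausted");
    }

    public static void main(String[] args)
    {
        List<Instance> instances = List.of(new Instance("a", false), new Instance("b", true));
        System.out.println(callOnFirstAvailable(instances, FirstAvailableSketch::describeSchema)); // prints schema of b
    }
}
```

Because the call returns a value, the caller also has something concrete to put in the HTTP response.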

}

@Test
@SuppressWarnings("deprecation")
Contributor

remove

}

@Test
@SuppressWarnings("deprecation")
Contributor

remove

CountDownLatch latch = new CountDownLatch(1);
server.close()
      .onSuccess(future -> latch.countDown());
latch.await(TIMEOUT.toMillis(), TimeUnit.MILLISECONDS);
Contributor

close the client too.

String expected = IOUtils.readFully("/datahub/empty_cluster.json");
emitter = new JsonEmitter();

client.get(server.actualPort(), LOCALHOST, ENDPOINT)
Contributor

add an assertion that emitter.content is empty, before making the http request.
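
A minimal sketch of why the precondition matters, assuming a recording emitter similar in spirit to the one in the test. `RecordingEmitter`, `triggerReport`, and `check` are hypothetical stand-ins, and the emitted JSON is a placeholder rather than the real report. Asserting emptiness first ensures the later check cannot pass because of content left over from an earlier step.

```java
final class EmitterPreconditionSketch
{
    // Hypothetical stand-in for an emitter that records what it is asked to emit.
    static final class RecordingEmitter
    {
        private final StringBuilder content = new StringBuilder();

        void emit(String json)
        {
            content.append(json);
        }

        String content()
        {
            return content.toString();
        }
    }

    // Hypothetical stand-in for the HTTP request that triggers a report.
    static void triggerReport(RecordingEmitter emitter)
    {
        emitter.emit("{\"placeholder\":true}");
    }

    static void check(boolean condition, String message)
    {
        if (!condition)
        {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args)
    {
        RecordingEmitter emitter = new RecordingEmitter();

        // Precondition: nothing has been emitted before the request is made.
        check(emitter.content().isEmpty(), "emitter must start empty");

        triggerReport(emitter);

        // Postcondition: the content can only have come from this request.
        check(!emitter.content().isEmpty(), "request must emit content");
    }
}
```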

@@ -21,7 +21,7 @@
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

import com.google.common.collect.ImmutableList;
Contributor

Revert the changes in this file once you remove runOnFirstAvailableInstance

Contributor

yifan-c commented Feb 20, 2025

Let's keep the patch succinct, i.e. no unrelated changes, and add code only when it is definitely required.

// Schema Reporting
protectedRouteBuilderFactory.get()
                            .router(router)
                            .method(HttpMethod.GET)
Contributor

The verb should not be GET. This should either be PUT or POST:

  • PUT is more suitable for idempotent operations
  • POST is more suitable for operations that are not idempotent

Also, this endpoint might be suitable for the operations framework. Something to consider
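
The idempotency distinction above can be sketched with a toy in-memory resource. `IdempotencySketch`, `put`, and `post` are hypothetical names that model HTTP semantics only; they are not part of Sidecar or Vert.x. Repeating the PUT-like call leaves the state unchanged, while repeating the POST-like call records a new action each time, which is closer to what triggering a report does.

```java
import java.util.ArrayList;
import java.util.List;

final class IdempotencySketch
{
    private String desiredState = "idle";
    private final List<String> triggeredReports = new ArrayList<>();

    // PUT-like: sets the resource to a given state; repeating it changes nothing further.
    void put(String state)
    {
        desiredState = state;
    }

    // POST-like: every call performs a new action, so repeating it has additional effects.
    void post(String requestId)
    {
        triggeredReports.add(requestId);
    }

    String state()
    {
        return desiredState;
    }

    int reportCount()
    {
        return triggeredReports.size();
    }

    public static void main(String[] args)
    {
        IdempotencySketch resource = new IdempotencySketch();

        resource.put("reporting-enabled");
        resource.put("reporting-enabled");
        System.out.println(resource.state());       // prints reporting-enabled

        resource.post("first");
        resource.post("second");
        System.out.println(resource.reportCount()); // prints 2
    }
}
```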

Contributor

@frankgh frankgh left a comment

I left a few comments. I think we also need to add support on the client side so that this new endpoint can be called.

4 participants