Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/refresh #273

Merged
merged 19 commits into from
Oct 29, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions src/server/__mocks__/mockDataFromES.js
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,18 @@ const mockPing = () => {
.reply(200, 'hello');
};

const mockRefresh = () => {
if (config.allowRefresh) {
nock(config.esConfig.host)
.post('/_refresh')
.reply(200, '[Server] guppy refreshed successfully');
} else {
nock(config.esConfig.host)
.post('/_refresh')
.reply(404, '[Server] guppy _refresh functionality is not enabled');
}
};

const mockResourcePath = () => {
const queryResource = {
size: 0,
Expand Down Expand Up @@ -405,6 +417,7 @@ const mockArrayConfig = () => {
const setup = () => {
mockArborist();
mockPing();
mockRefresh();
mockResourcePath();
mockESMapping();
mockArrayConfig();
Expand Down
7 changes: 7 additions & 0 deletions src/server/__tests__/config.test.js
Original file line number Diff line number Diff line change
Expand Up @@ -97,4 +97,11 @@ describe('config', () => {
expect(config.esConfig.aggregationIncludeMissingData).toBe(true);
expect(config.esConfig.missingDataAlias).toEqual(alias);
});

/* --------------- For _refresh testing --------------- */
test('could not access _refresh method if not in config', async () => {
process.env.GUPPY_CONFIG_FILEPATH = `${__dirname}/testConfigFiles/test-no-refresh-option-provided.json`;
const config = require("../config").default;
expect(config.allowRefresh).toBe(false);
});
});
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{}
4 changes: 3 additions & 1 deletion src/server/auth/__tests__/authHelper.test.js
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
// eslint-disable-next-line
import nock from 'nock'; // must import this to enable mock data by nock
import nock from 'nock'; // must import this to enable mock data by nock
import getAuthHelperInstance from '../authHelper';
import esInstance from '../../es/index';
import setupMockDataEndpoint from '../../__mocks__/mockDataFromES';
Expand All @@ -12,6 +12,8 @@ setupMockDataEndpoint();
describe('AuthHelper', () => {
test('could create auth helper instance', async () => {
const authHelper = await getAuthHelperInstance('fake-jwt');
expect(authHelper.getAccessibleCreateResources()).toEqual(['internal-project-1', 'internal-project-3', 'internal-project-4', 'internal-project-5']);
expect(authHelper.getAccessibleCreateResources()).not.toContain(['internal-project-2', 'internal-project-6']);
expect(authHelper.getAccessibleResources()).toEqual(['internal-project-1', 'internal-project-2', 'internal-project-4', 'internal-project-5']);
expect(authHelper.getAccessibleResources()).not.toContain(['internal-project-3', 'internal-project-6']);
expect(authHelper.getUnaccessibleResources()).toEqual(['external-project-1', 'external-project-2']);
Expand Down
58 changes: 58 additions & 0 deletions src/server/auth/arboristClient.js
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,64 @@ class ArboristClient {
},
);
}

listAuthorizedCreateResources(jwt) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this function is duplicating listAuthorizedResources(jwt), except for the part that check's the user for certain service + methods.
We don't need to duplicate those codes, we should be extracting the part of making request to arborist and checking for user's access in the returned auth mapping into two separated functions, and just reuse them. In this way, we can avoid sending duplicated requests to arborist
See how the fetch for user's auth mapping and check for user's access can be separated in another service
https://github.com/uc-cdis/data-portal/blob/master/src/actions.js#L513
https://github.com/uc-cdis/data-portal/blob/master/src/authMappingUtils.js#L53

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok getAuthMapping function has been created and used twice, once for the reader permissions checking, and again for the guppy admin permissions checking. See 713fc1f

// Make request to arborist for list of resources with access
const resourcesEndpoint = `${this.baseEndpoint}/auth/mapping`;
log.debug('[ArboristClient] listAuthorizedCreateResources jwt: ', jwt);

const headers = (jwt) ? { Authorization: `bearer ${jwt}` } : {};
return fetch(
resourcesEndpoint,
{
method: 'POST',
headers,
},
).then(
(response) => {
if (response.status === 400) {
// Retry with GET instead of POST. Older version of Arborist POST auth/mapping
// didn't support token authentication.
// This catch block can be removed in a little while, when it will likely not cause issues
return fetch(
resourcesEndpoint,
{
method: 'GET',
headers,
},
).then((res) => res.json());
}
return response.json();
},
(err) => {
log.error(err);
throw new CodedError(500, err);
},
).then(
(result) => {
const data = {
resources: [],
};
Object.keys(result).forEach((key) => {
// logic: you have access to a project if you have the following access:
// method 'create' (or '*' - all methods) to service 'guppy' (or '*' - all services)
// on the project resource.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too broad, and it sort of binds the two roles of data submitter and Guppy admin togather. The restarting of Guppy should be an "admin-level" feature. I think we'd better to have a dedicated "guppy_admin" policy for this feature, which could be service == guppy, method == *, or service == guppy, method == admin_access

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see 713fc1f

if (result[key] && result[key].some((x) => (
(x.method === 'create' || x.method === '*')
&& (x.service === 'guppy' || x.service === '*')
))) {
data.resources.push(key);
}
});
log.debug('[ArboristClient] data: ', data);
return data;
},
(err) => {
log.error(err);
throw new CodedError(500, err);
},
);
}
}

export default new ArboristClient(config.arboristEndpoint);
8 changes: 8 additions & 0 deletions src/server/auth/authHelper.js
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ import {
getRequestResourceListFromFilter,
buildFilterWithResourceList,
getAccessibleResourcesFromArboristasync,
getAccessibleCreateResourcesFromArboristasync
} from './utils';
import config from '../config';

Expand All @@ -18,6 +19,9 @@ export class AuthHelper {
this._accessibleResourceList = await getAccessibleResourcesFromArboristasync(this._jwt);
log.debug('[AuthHelper] accessible resources:', this._accessibleResourceList);

this._accessibleCreateResourceList = await getAccessibleCreateResourcesFromArboristasync(this._jwt);
log.debug('[AuthHelper] accessible resources with create method:', this._accessibleCreateResourceList);

const promiseList = [];
config.esConfig.indices.forEach(({ index, type }) => {
const subListPromise = this.getOutOfScopeResourceList(index, type);
Expand All @@ -39,6 +43,10 @@ export class AuthHelper {
return this._accessibleResourceList;
}

getAccessibleCreateResources(){
return this._accessibleCreateResourceList
}

getUnaccessibleResources() {
return this._unaccessibleResourceList;
}
Expand Down
25 changes: 25 additions & 0 deletions src/server/auth/utils.js
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,31 @@ export const getAccessibleResourcesFromArboristasync = async (jwt) => {
return resources;
};

export const getAccessibleCreateResourcesFromArboristasync = async (jwt) => {
let data;
if (config.internalLocalTest) {
data = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider fix the failed es-lint check here

Copy link
Contributor Author

@matthewpeterkort matthewpeterkort Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

resources: [ // these are just for testing
'/programs/DEV/projects/test',
'/programs/jnkns/projects/jenkins',
],
};
} else {
data = await arboristClient.listAuthorizedCreateResources(jwt);
}

log.debug('[authMiddleware] list create resources: ', JSON.stringify(data, null, 4));
if (data && data.error) {
// if user is not in arborist db, assume has no access to any
if (data.error.code === 404) {
return [];
}
throw new CodedError(data.error.code, data.error.message);
}
const resources = data.resources ? _.uniq(data.resources) : [];
return resources;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to execute all those expensive array manipulations for this feature, if your goal is to determine whether am user is Guppy admin or not. This function can simply returns a boolean based on the return value from listAuthorizedCreateResources()
Of couse variable and function names should be updated to relflect this change

Copy link
Contributor Author

@matthewpeterkort matthewpeterkort Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok boolean is returned now when checking for refresh permissions

};

export const getRequestResourceListFromFilter = async (
esIndex,
esType,
Expand Down
1 change: 1 addition & 0 deletions src/server/config.js
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ const config = {
analyzedTextFieldSuffix: '.analyzed',
matchedTextHighlightTagName: 'em',
allowedMinimumSearchLen: 2,
allowRefresh: inputConfig.allowRefresh || false,
};

if (process.env.GEN3_ES_ENDPOINT) {
Expand Down
112 changes: 73 additions & 39 deletions src/server/server.js
Original file line number Diff line number Diff line change
Expand Up @@ -19,22 +19,50 @@ import downloadRouter from './download';
import CodedError from './utils/error';
import { statusRouter, versionRouter } from './endpoints';

let server;
const app = express();
app.use(cors());
app.use(helmet());
app.use(bodyParser.json({ limit: '50mb' }));

const refreshRouter = async (req, res, next) => {
res.setHeader('Content-Type', 'application/json; charset=utf-8');
try {
if (config.allowRefresh) {
log.debug('[Refresh] ', JSON.stringify(req.body, null, 4));
const jwt = headerParser.parseJWT(req);
if (!jwt) {
const noJwtError = new CodedError(401, '[Refresh] no JWT user token provided to _refresh function');
throw noJwtError;
}
const authHelper = await getAuthHelperInstance(jwt);
console.log("AUTH HELPER: ", authHelper)
if (authHelper._accessibleCreateResourceList === undefined || authHelper._accessibleCreateResourceList.length === 0) {
const noPermsUser = new CodedError(401, '[Refresh] User cannot refresh Guppy without a valid token that has read access to at least one project');
throw noPermsUser;
}
await server.stop()
await initializeAndStartServer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution is generally ok, but it only works if there is only 1 guppy pod/container/server in the system.
In the cases of how Gen3 is usually deployed, which is on a k8s cluster with a load balancer / reverse proxy in front of pods, and also if there are replicas of guppy pods. This solution will only triggers 1 guppy pod to restart each time, which will causes a discrepancy in the states of pods

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfshao thanks for all the feedback. In the case of replica guppy pods, is there a link you could provide for how requests from the clients are distributed to the guppy replicas? In other words, is there a nginx or helm configuration that fans out the requests?

Copy link

@bwalsh bwalsh Jul 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfshao Alternatively, as a brainstorm: guppy could checking mappings on a configurable interval and initiate a restart if the version changed. For example, for index gen3.aced.io_file_0

$ curl  localhost:9200/gen3.aced.io_file_0  | jq '.["gen3.aced.io_file_0"].settings.index.version.created '
>>> "7170399"

On startup, guppy could cache this value for each index, and then monitor the index for changes

} else if (config.allowRefresh === false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check should happens first, before checking for user's token

const disabledRefresh = new CodedError(404, '[Refresh] guppy _refresh functionality is not enabled');
throw disabledRefresh;
}
res.send("[Refresh] guppy refreshed successfully")
} catch (err) {
log.error(err);
next(err);
}
return 0;
};

const startServer = async () => {
// build schema and resolvers by parsing elastic search fields and types,
const typeDefs = getSchema(config.esConfig, esInstance);
const resolvers = getResolver(config.esConfig, esInstance);
const schema = makeExecutableSchema({ typeDefs, resolvers });
const schemaWithMiddleware = applyMiddleware(
schema,
...middlewares,
);
// create graphql server instance
const server = new ApolloServer({
const schemaWithMiddleware = applyMiddleware(schema, ...middlewares);
// create graphql server instance
server = new ApolloServer({
mocks: false,
schema: schemaWithMiddleware,
validationRules: [depthLimit(10)],
Expand All @@ -57,43 +85,49 @@ const startServer = async () => {
path: config.path,
}),
);
log.info(`[Server] guppy listening on port ${config.port}!`);
};

// simple health check endpoint
// eslint-disable-next-line no-unused-vars
app.get('/_status', statusRouter, (req, res, err, next) => {
if (err instanceof CodedError) {
// deepcode ignore ServerLeak: no important information exists in error
res.status(err.code).send(err.msg);
} else {
// deepcode ignore ServerLeak: no important information exists in error
res.status(500).send(err);
}
});
const initializeAndStartServer = async () => {
await esInstance.initialize();
await startServer();
};
// simple health check endpoint
// eslint-disable-next-line no-unused-vars
app.get('/_status', statusRouter, (req, res, err, next) => {
if (err instanceof CodedError) {
// deepcode ignore ServerLeak: no important information exists in error
res.status(err.code).send(err.msg);
} else {
// deepcode ignore ServerLeak: no important information exists in error
res.status(500).send(err);
}
});

// eslint-disable-next-line no-unused-vars
app.get('/_version', versionRouter);
// eslint-disable-next-line no-unused-vars
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this for the next line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

app.get('/_version', versionRouter);

// download endpoint for fetching data directly from es
app.post(
'/download',
downloadRouter,
(err, req, res, next) => { // eslint-disable-line no-unused-vars
if (err instanceof CodedError) {
// deepcode ignore ServerLeak: no important information exists in error
res.status(err.code).send(err.msg);
} else {
// deepcode ignore ServerLeak: no important information exists in error
res.status(500).send(err);
}
},
);
// download endpoint for fetching data directly from es
app.post('/download', downloadRouter, (err, req, res, next) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put // eslint-disable-next-line no-unused-vars above this to satisfy the linter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

// eslint-disable-line no-unused-vars
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

if (err instanceof CodedError) {
// deepcode ignore ServerLeak: no important information exists in error
res.status(err.code).send(err.msg);
} else {
// deepcode ignore ServerLeak: no important information exists in error
res.status(500).send(err);
}
});

app.listen(config.port, () => {
log.info(`[Server] guppy listening on port ${config.port}!`);
});
};
app.post('/_refresh',refreshRouter, (err, req, res, next) => {
if (err instanceof CodedError) {
res.status(err.code).send(err.msg);
}else{
res.status(500).send(err)
}
});

// need to connect to ES and initialize before setting up a server
esInstance.initialize().then(() => {
startServer();
app.listen(config.port, async () => {
await initializeAndStartServer();
});
2 changes: 1 addition & 1 deletion startServer.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,4 +4,4 @@ if [[ -z "$DD_TRACE_ENABLED" ]]; then
export DD_TRACE_ENABLED=false
fi

node --max-http-header-size 16000 --require dd-trace/init dist/server/server.js
node --max-http-header-size 16000 --require dd-trace/init dist/server/server.js
Loading