Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: be/distributed uptime #1510

Merged
merged 12 commits into from
Jan 2, 2025
Merged

feat: be/distributed uptime #1510

merged 12 commits into from
Jan 2, 2025

Conversation

ajhollid
Copy link
Collaborator

@ajhollid ajhollid commented Jan 2, 2025

This PR lays the groundwork for distributed uptime monitoring. Distributed uptime monitoring will work by sending a POST request to UpRock servers, payload will include a callback URL. UpRock will POST to the callback URL when they have results.

  • Add ngrok to the project to receieve callbacks on local dev server
  • Add route/controller/db pathway for Distributed Uptime monitoring
  • Add a DistributedUptimeCheck model
  • Update monitors to allow "distributed_http" type

@ajhollid ajhollid merged commit d46cc3d into develop Jan 2, 2025
2 checks passed
@ajhollid ajhollid deleted the feat/be/distributed-uptime branch January 2, 2025 20:08
Copy link

@llamapreview llamapreview bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto Pull Request Review from LlamaPReview

1. Overview

1.1 PR Summary

  • Business value and requirements alignment: This PR introduces distributed uptime monitoring by integrating with UpRock servers, enhancing the system's capability to monitor uptime through external services.
  • Key components modified:
    • Added ngrok for local development to receive callbacks.
    • Introduced a new DistributedUptimeCheck model.
    • Updated monitors to include a "distributed_http" type.
    • Added new routes, controllers, and database pathways for distributed uptime monitoring.
  • Impact assessment: The changes introduce a new paradigm for uptime monitoring, adding dependencies on external services and altering data flow and processing logic.
  • System dependencies and integration impacts: The system now relies on external UpRock servers for certain uptime checks, introducing new network interactions and callback handling mechanisms.

1.2 Architecture Changes

  • System design modifications: The architecture now includes a distributed uptime monitoring feature, which involves sending POST requests to UpRock servers and handling callbacks.
  • Component interactions: The new DistributedUptimeController handles callbacks from UpRock, and the DistributedUptimeCheck model stores the results. The system interacts with UpRock servers via network requests.
  • Integration points: The integration points include the new callback route and the interaction with UpRock servers. The ngrok tool is used for local development to receive callbacks.

2. Detailed Technical Analysis

2.1 Code Logic Deep-Dive

Core Logic Changes

  • Server/controllers/distributedUptimeController.js - DistributedUptimeController

    • Submitted PR Code:
      import { handleError } from "./controllerUtils.js";
      import { successMessages } from "../utils/messages.js";
      
      const SERVICE_NAME = "DistributedUptimeQueueController";
      
      class DistributedUptimeController {
      	constructor() {}
      
      	async resultsCallback(req, res, next) {
      		try {
      			console.log(req.body);
      			res.status(200).json({ message: "OK" });
      		} catch (error) {
      			throw handleError(error, SERVICE_NAME, "resultsCallback");
      		}
      	}
      }
      export default DistributedUptimeController;
    • Analysis:
      • The resultsCallback function currently only logs the request body and returns a generic "OK" response. There's no logic to process and persist the received uptime check results, no validation of the request origin, and minimal error handling.
    • LlamaPReview Suggested Improvements:
      import { handleError } from "./controllerUtils.js";
      import { successMessages } from "../utils/messages.js";
      import DistributedUptimeCheck from "../db/models/DistributedUptimeCheck.js"; // LlamaPReview Improvement: Import the model
      import Monitor from "../db/models/Monitor.js"; // LlamaPReview Improvement: Import the Monitor model
      
      const SERVICE_NAME = "DistributedUptimeQueueController";
      const UPROCK_ALLOWED_ORIGINS = [process.env.UPROCK_CALLBACK_ORIGIN]; // LlamaPReview Improvement: Define allowed origins
      
      class DistributedUptimeController {
      	constructor() {}
      
      	async resultsCallback(req, res, next) {
      		try {
      			// LlamaPReview Improvement: Validate request origin
      			if (!UPROCK_ALLOWED_ORIGINS.includes(req.get('origin'))) {
      				return res.status(403).json({ message: "Unauthorized callback origin" });
      			}
      
      			// LlamaPReview Improvement: Basic data validation (expand as needed)
      			const { monitorId, success, responseTime, details } = req.body;
      			if (!monitorId || typeof success !== 'boolean' || responseTime === undefined) {
      				return res.status(400).json({ message: "Invalid callback payload" });
      			}
      
      			// LlamaPReview Improvement: Find the associated monitor
      			const monitor = await Monitor.findById(monitorId);
      			if (!monitor || monitor.type !== 'distributed_http') {
      				return res.status(400).json({ message: "Invalid monitor for callback" });
      			}
      
      			// LlamaPReview Improvement: Create a new DistributedUptimeCheck record
      			await DistributedUptimeCheck.create({
      				monitorId: monitorId,
      				success: success,
      				responseTime: responseTime,
      				details: details,
      				// Add other relevant fields from the callback payload
      			});
      
      			res.status(200).json({ message: "OK" });
      		} catch (error) {
      			// LlamaPReview Improvement: More specific error logging
      			handleError(error, SERVICE_NAME, "resultsCallback", `Error processing callback from UpRock: ${error.message}`);
      			next(error); // Pass error to error handling middleware
      		}
      	}
      }
      export default DistributedUptimeController;
    • Improvement rationale:
      • Technical benefits: Adds crucial security by validating the callback origin. Implements basic data validation to ensure the integrity of received data. Persists the received monitoring data into the database, making the distributed uptime feature functional. Provides more specific error logging for better debugging.
      • Business value: Enables the core functionality of distributed uptime monitoring, providing users with insights from external checks.
      • Risk assessment: Mitigates the risk of unauthorized data injection and ensures that the system correctly processes and stores monitoring results.
  • Server/db/models/DistributedUptimeCheck.js - DistributedUptimeCheckSchema

    • Submitted PR Code:
      import mongoose from "mongoose";
      import { BaseCheckSchema } from "./Check.js";
      
      const DistributedUptimeCheckSchema = mongoose.Schema(
      	{
      		...BaseCheckSchema.obj,
      	},
      	{ timestamps: true }
      );
      
      DistributedUptimeCheckSchema.index({ createdAt: 1 });
      DistributedUptimeCheckSchema.index({ monitorId: 1, createdAt: 1 });
      DistributedUptimeCheckSchema.index({ monitorId: 1, createdAt: -1 });
      
      export default mongoose.model("DistributedUptimeCheck", DistributedUptimeCheckSchema);
    • Analysis:
      • While the schema inherits from BaseCheckSchema, it lacks specific fields for storing data received from the UpRock callback, such as the raw response or any specific error codes returned by the external service.
    • LlamaPReview Suggested Improvements:
      import mongoose from "mongoose";
      import { BaseCheckSchema } from "./Check.js";
      
      const DistributedUptimeCheckSchema = mongoose.Schema(
      	{
      		...BaseCheckSchema.obj,
      		// LlamaPReview Improvement: Add fields specific to distributed uptime checks
      		externalResponseTime: {
      			type: Number,
      			description: "Response time reported by the external service",
      		},
      		externalRawResponse: {
      			type: mongoose.Schema.Types.Mixed,
      			description: "Raw response data from the external service",
      		},
      		externalStatusCode: {
      			type: Number,
      			description: "HTTP status code from the external service",
      		},
      		externalError: {
      			type: String,
      			description: "Error message from the external service, if any",
      		},
      	},
      	{ timestamps: true }
      );
      
      DistributedUptimeCheckSchema.index({ createdAt: 1 });
      DistributedUptimeCheckSchema.index({ monitorId: 1, createdAt: 1 });
      DistributedUptimeCheckSchema.index({ monitorId: 1, createdAt: -1 });
      
      export default mongoose.model("DistributedUptimeCheck", DistributedUptimeCheckSchema);
    • Improvement rationale:
      • Technical benefits: Adds specific fields to store relevant data from the external service, providing a more comprehensive record of the distributed uptime check. This avoids potential misuse or misinterpretation of fields inherited from BaseCheckSchema.
      • Business value: Provides richer data for analysis and debugging of distributed uptime checks.
      • Risk assessment: Reduces the risk of data inconsistencies and improves the clarity and accuracy of the stored monitoring information.
  • Server/index.js - startApp

    • Submitted PR Code:
      const startApp = async () => {
      	const app = express();
      	if (process.env.NODE_ENV === "development") {
      		try {
      			ngrokUrl = await ngrok.connect({
      				proto: "http",
      				addr: PORT,
      				authtoken: process.env.NGROK_AUTH_TOKEN,
      				api_addr: false,
      			});
      			process.env.NGROK_URL = ngrokUrl;
      			logger.info({
      				message: `ngrok url: ${ngrokUrl}`,
      				service: SERVICE_NAME,
      				method: "startApp",
      			});
      		} catch (error) {
      			logger.error({
      				message: `Error connecting to ngrok`,
      				service: SERVICE_NAME,
      				method: "startApp",
      				stack: error.stack,
      			});
      		}
      	}
    • Analysis:
      • The use of ngrok and its implications are correctly identified. However, there's no mechanism to ensure that the ngrok tunnel is properly closed when the development server shuts down, potentially leading to orphaned tunnels.
    • LlamaPReview Suggested Improvements:
      let ngrokUrl;
      
      const startApp = async () => {
      	const app = express();
      	if (process.env.NODE_ENV === "development") {
      		try {
      			ngrokUrl = await ngrok.connect({
      				proto: "http",
      				addr: PORT,
      				authtoken: process.env.NGROK_AUTH_TOKEN,
      				api_addr: false,
      			});
      			// LlamaPReview Improvement: Log ngrok URL with more context
      			logger.info({
      				message: `ngrok tunnel established at: ${ngrokUrl}`,
      				service: SERVICE_NAME,
      				method: "startApp",
      			});
      		} catch (error) {
      			logger.error({
      				message: `Error connecting to ngrok`,
      				service: SERVICE_NAME,
      				method: "startApp",
      				stack: error.stack,
      			});
      		}
      	}
      
      	// ... rest of the startApp function
      };
      
      const shutdown = async () => {
      	if (isShuttingDown) {
      		return;
      	}
      	isShuttingDown = true;
      	logger.info({ message: "Attempting graceful shutdown" });
      
      	if (process.env.NODE_ENV === "development" && ngrokUrl) { // LlamaPReview Improvement: Close ngrok tunnel on shutdown
      		try {
      			await ngrok.disconnect(ngrokUrl);
      			logger.info({ message: `ngrok tunnel disconnected from: ${ngrokUrl}` });
      		} catch (error) {
      			logger.error({ message: `Error disconnecting ngrok: ${error.message}`, stack: error.stack });
      		}
      	}
      
      	// ... rest of the shutdown function
      };
    • Improvement rationale:
      • Technical benefits: Ensures that the ngrok tunnel is closed when the server shuts down in development, preventing resource leaks. Makes the logging for ngrok more informative.
      • Business value: Improves the development experience by ensuring clean shutdown of development resources.
      • Risk assessment: Reduces the risk of orphaned ngrok tunnels and potential security issues associated with them.
  • Server/routes/distributedUptimeRoute.js - DistributedUptimeRoutes

    • Submitted PR Code:
      import { Router } from "express";
      
      class DistributedUptimeRoutes {
      	constructor(distributedUptimeController) {
      		this.router = Router();
      		this.distributedUptimeController = distributedUptimeController;
      		this.initRoutes();
      	}
      	initRoutes() {
      		this.router.post("/callback", this.distributedUptimeController.resultsCallback);
      	}
      
      	getRouter() {
      		return this.router;
      	}
      }
      
      export default DistributedUptimeRoutes;
    • Analysis:
      • The absence of any authentication or authorization middleware on the /callback route is a critical oversight. This means any external entity could potentially send POST requests to this endpoint, leading to data manipulation or denial-of-service.
    • LlamaPReview Suggested Improvements:
      import { Router } from "express";
      // LlamaPReview Improvement: Import a middleware for verifying UpRock requests
      import { verifyUpRockCallback } from "../middleware/verifyUpRockCallback.js";
      
      class DistributedUptimeRoutes {
      	constructor(distributedUptimeController) {
      		this.router = Router();
      		this.distributedUptimeController = distributedUptimeController;
      		this.initRoutes();
      	}
      	initRoutes() {
      		// LlamaPReview Improvement: Apply middleware to the callback route
      		this.router.post("/callback", verifyUpRockCallback, this.distributedUptimeController.resultsCallback);
      	}
      
      	getRouter() {
      		return this.router;
      	}
      }
      
      export default DistributedUptimeRoutes;
    • Improvement rationale:
      • Technical benefits: Adds a crucial layer of security by ensuring that only authenticated requests from UpRock are processed by the resultsCallback function. This prevents unauthorized access and data manipulation.
      • Business value: Protects the integrity of the monitoring data and the system's stability.
      • Risk assessment: Significantly reduces the risk of security vulnerabilities associated with the callback mechanism. This addresses a major security concern identified in the initial review.

Cross-cutting Concerns

  • Data flow analysis: The data flow for distributed uptime checks involves sending a POST request to UpRock servers and receiving a callback with the results. The resultsCallback function processes the callback data and stores it in the DistributedUptimeCheck model.
  • State management implications: The state management for distributed uptime checks is handled by the DistributedUptimeCheck model, which stores the results of the checks. The system needs to ensure that the state is consistent and accurate.
  • Error propagation paths: Errors in the resultsCallback function are logged and passed to the error handling middleware. The system should ensure that errors are handled gracefully and that the system remains stable even if errors occur.
  • Edge case handling across components: The system should handle edge cases such as invalid callback payloads, unauthorized callback origins, and errors in processing the callback data. The suggested improvements address these edge cases by adding validation and error handling.

Algorithm & Data Structure Analysis

  • Complexity analysis: The complexity of the resultsCallback function is O(1) as it performs a constant amount of work regardless of the input size. The suggested improvements add validation and error handling, but the complexity remains O(1).
  • Performance implications: The suggested improvements add validation and error handling, which may slightly increase the processing time for the resultsCallback function. However, the performance impact is minimal and the benefits of added security and data integrity outweigh the performance cost.
  • Memory usage considerations: The suggested improvements add fields to the DistributedUptimeCheck model to store additional data from the callback. This may slightly increase memory usage, but the impact is minimal and the benefits of richer data for analysis outweigh the memory cost.

2.2 Implementation Quality

  • Code organization and structure: The code is well-organized and follows a modular structure. The suggested improvements add validation and error handling, which improves the code quality and maintainability.
  • Design patterns usage: The code uses design patterns such as the controller pattern for handling callbacks and the model pattern for storing data. The suggested improvements add validation and error handling, which aligns with best practices for secure and robust code.
  • Error handling approach: The suggested improvements add more specific error logging and ensure that errors are passed to the error handling middleware. This improves the error handling approach and makes the system more robust.
  • Resource management: The suggested improvements ensure that the ngrok tunnel is properly closed when the server shuts down in development, preventing resource leaks. This improves the resource management of the system.

3. Critical Findings

3.1 Potential Issues

  • 🔴 Critical Issues

    • Lack of request origin validation: The resultsCallback function does not validate the request origin, allowing unauthorized callbacks.
      • Impact: Potential data manipulation and security vulnerabilities.
      • Recommendation: Add validation to ensure that only authorized requests from UpRock are processed.
    • Lack of data validation: The resultsCallback function does not validate the callback payload, which could lead to invalid data being stored.
      • Impact: Data integrity issues and potential errors in processing the callback data.
      • Recommendation: Add basic data validation to ensure the integrity of received data.
    • Lack of error handling: The resultsCallback function does not handle errors specifically, which could lead to unclear error messages and difficult debugging.
      • Impact: Reduced system stability and difficult debugging.
      • Recommendation: Add more specific error logging and ensure that errors are passed to the error handling middleware.
  • 🟡 Warnings

    • Potential resource leaks: The ngrok tunnel is not properly closed when the server shuts down in development, which could lead to orphaned tunnels.
      • Potential risks: Resource leaks and potential security issues associated with orphaned ngrok tunnels.
      • Suggested improvements: Ensure that the ngrok tunnel is properly closed when the server shuts down in development.
    • Lack of authentication middleware: The /callback route does not have any authentication or authorization middleware, which could lead to unauthorized access.
      • Potential risks: Data manipulation and denial-of-service attacks.
      • Suggested improvements: Add authentication or authorization middleware to the /callback route to ensure that only authenticated requests from UpRock are processed.

3.2 Code Quality Concerns

  • Maintainability aspects: The suggested improvements add validation and error handling, which improves the maintainability of the code. The code is well-organized and follows a modular structure, which makes it easier to maintain.
  • Readability issues: The suggested improvements add validation and error handling, which improves the readability of the code. The code is well-commented and follows best practices for secure and robust code.
  • Performance bottlenecks: The suggested improvements add validation and error handling, which may slightly increase the processing time for the resultsCallback function. However, the performance impact is minimal and the benefits of added security and data integrity outweigh the performance cost.

4. Security Assessment

  • Authentication/Authorization impacts: The suggested improvements add validation to ensure that only authorized requests from UpRock are processed. This improves the security of the callback mechanism and prevents unauthorized access.
  • Data handling concerns: The suggested improvements add basic data validation to ensure the integrity of received data. This improves the data handling concerns and prevents potential data manipulation.
  • Input validation: The suggested improvements add input validation to ensure that only valid data is processed and stored. This improves the input validation and prevents potential data integrity issues.
  • Security best practices: The suggested improvements align with security best practices by adding validation and error handling. This improves the overall security of the system and prevents potential security vulnerabilities.
  • Potential security risks: The lack of request origin validation and data validation in the resultsCallback function are potential security risks. The suggested improvements address these risks by adding validation and error handling.
  • Mitigation strategies: The suggested improvements add validation and error handling, which mitigate the potential security risks and improve the overall security of the system.
  • Security testing requirements: The system should be tested for security vulnerabilities such as unauthorized access, data manipulation, and denial-of-service attacks. The suggested improvements address these security concerns and should be tested to ensure that they are effective.

5. Testing Strategy

5.1 Test Coverage

  • Unit test analysis: The suggested improvements add validation and error handling, which should be covered by unit tests. The unit tests should cover edge cases such as invalid callback payloads, unauthorized callback origins, and errors in processing the callback data.
  • Integration test requirements: The system should be tested for integration with the UpRock service. The integration tests should cover scenarios such as successful callbacks, failed callbacks, delayed callbacks, and invalid data in the callback.
  • Edge cases coverage: The suggested improvements add validation and error handling, which should be tested for edge cases. The edge cases should include invalid callback payloads, unauthorized callback origins, and errors in processing the callback data.

5.2 Test Recommendations

Suggested Test Cases

// Example unit test for resultsCallback function
describe('DistributedUptimeController', () => {
  describe('resultsCallback', () => {
    it('should return 403 for unauthorized callback origin', async () => {
      const req = {
        get: jest.fn().mockReturnValue('unauthorized_origin'),
        body: { monitorId: '123', success: true, responseTime: 100, details: {} },
      };
      const res = {
        status: jest.fn().mockReturnThis(),
        json: jest.fn(),
      };
      const next = jest.fn();

      await distributedUptimeController.resultsCallback(req, res, next);

      expect(res.status).toHaveBeenCalledWith(403);
      expect(res.json).toHaveBeenCalledWith({ message: 'Unauthorized callback origin' });
    });

    it('should return 400 for invalid callback payload', async () => {
      const req = {
        get: jest.fn().mockReturnValue(process.env.UPROCK_CALLBACK_ORIGIN),
        body: { success: true, responseTime: 100, details: {} },
      };
      const res = {
        status: jest.fn().mockReturnThis(),
        json: jest.fn(),
      };
      const next = jest.fn();

      await distributedUptimeController.resultsCallback(req, res, next);

      expect(res.status).toHaveBeenCalledWith(400);
      expect(res.json).toHaveBeenCalledWith({ message: 'Invalid callback payload' });
    });

    it('should create a DistributedUptimeCheck record for valid callback', async () => {
      const req = {
        get: jest.fn().mockReturnValue(process.env.UPROCK_CALLBACK_ORIGIN),
        body: { monitorId: '123', success: true, responseTime: 100, details: {} },
      };
      const res = {
        status: jest.fn().mockReturnThis(),
        json: jest.fn(),
      };
      const next = jest.fn();

      await distributedUptimeController.resultsCallback(req, res, next);

      const check = await DistributedUptimeCheck.findOne({ monitorId: '123' });
      expect(check).not.toBeNull();
      expect(res.status).toHaveBeenCalledWith(200);
      expect(res.json).toHaveBeenCalledWith({ message: 'OK' });
    });
  });
});
  • Coverage improvements: The suggested improvements add validation and error handling, which should be covered by unit tests. The unit tests should cover edge cases such as invalid callback payloads, unauthorized callback origins, and errors in processing the callback data.
  • Performance testing needs: The system should be tested for performance under load, especially for the resultsCallback function. The performance tests should cover scenarios such as high volume of callbacks and delayed callbacks.

6. Documentation & Maintenance

  • Documentation updates needed: The documentation should be updated to include the new DistributedUptimeCheck model and the distributed_http monitor type. The documentation should also include information on how to configure the ngrok tool for local development.
  • Long-term maintenance considerations: The system should be maintained to ensure that the ngrok tool is properly configured and that the ngrok tunnel is properly closed when the server shuts down in development. The system should also be maintained to ensure that the resultsCallback function is secure and that the data validation and error handling are effective.
  • Technical debt and monitoring requirements: The system should be monitored for technical debt and potential security vulnerabilities. The system should also be monitored for performance and stability, especially for the resultsCallback function.

7. Deployment & Operations

  • Deployment impact and strategy: The deployment of the changes should be carefully planned to ensure that the ngrok tool is properly configured and that the ngrok tunnel is properly closed when the server shuts down in development. The deployment should also ensure that the resultsCallback function is secure and that the data validation and error handling are effective.
  • Key operational considerations: The system should be operated to ensure that the ngrok tool is properly configured and that the ngrok tunnel is properly closed when the server shuts down in development. The system should also be operated to ensure that the resultsCallback function is secure and that the data validation and error handling are effective.

8. Summary & Recommendations

8.1 Key Action Items

  1. Implement request origin validation in the resultsCallback function to ensure that only authorized requests from UpRock are processed.
  2. Add basic data validation in the resultsCallback function to ensure the integrity of received data.
  3. Improve error handling in the resultsCallback function to ensure that errors are logged specifically and passed to the error handling middleware.
  4. Ensure that the ngrok tunnel is properly closed when the server shuts down in development to prevent resource leaks.
  5. Add authentication or authorization middleware to the /callback route to ensure that only authenticated requests from UpRock are processed.

8.2 Future Considerations

  • Technical evolution path: The system should evolve to include more robust security measures and data validation for the resultsCallback function. The system should also evolve to include more comprehensive monitoring and alerting for the ngrok tool and the resultsCallback function.
  • Business capability evolution: The system should evolve to include more advanced uptime monitoring capabilities, such as real-time alerts and more detailed reporting. The system should also evolve to include more comprehensive integration with external services, such as UpRock.
  • System integration impacts: The system should evolve to include more comprehensive integration with external services, such as UpRock. The system should also evolve to include more robust security measures and data validation for the resultsCallback function.

💡 Help Shape LlamaPReview
How's this review format working for you? Vote in our Github Discussion Polls to help us improve your review experience!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant