Re-implement token caching for Vercel AI SDK usage #60

Merged · 9 commits · Mar 4, 2025
6 changes: 6 additions & 0 deletions .changeset/convert-to-zod.md
@@ -0,0 +1,6 @@
---
'mycoder-agent': minor
'mycoder': minor
---

Convert tool parameters and return values from JsonSchema7Type to ZodSchema, as required for Vercel AI SDK integration.
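
For reference, a minimal sketch of what this conversion looks like for a single tool, mirroring the pattern in the test changes below; the `input` field is illustrative:

```typescript
import { z } from 'zod';

// Old shape: parameters described as a JSON Schema literal.
const parametersJsonSchema = {
  type: 'object',
  properties: {
    input: { type: 'string', description: 'Test input' },
  },
  required: ['input'],
};

// New shape: the same contract as a Zod schema, which the Vercel AI SDK
// consumes directly and which yields a static type for free.
const parameters = z.object({
  input: z.string().describe('Test input'),
});

type Params = z.infer<typeof parameters>; // { input: string }
```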
5 changes: 5 additions & 0 deletions .changeset/implement-token-caching.md
@@ -0,0 +1,5 @@
---
"mycoder-agent": patch
---

Re-implemented token caching for Vercel AI SDK usage with Anthropic provider to reduce token consumption during repeated API calls.
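
A hedged sketch of how prompt caching is typically enabled through the AI SDK's Anthropic provider: the `providerOptions`/`cacheControl` names follow that provider's conventions (some 4.x releases spell the field `experimental_providerMetadata`), and `longSystemPrompt` is a stand-in for the real prompt.

```typescript
import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

const longSystemPrompt = '...'; // stands in for the large, stable system prompt

// Mark the stable system prompt as cacheable so repeated calls hit
// Anthropic's prompt cache instead of re-sending the full token count.
const { text } = await generateText({
  model: anthropic('claude-3-7-sonnet-20250219'),
  messages: [
    {
      role: 'system',
      content: longSystemPrompt,
      providerOptions: {
        anthropic: { cacheControl: { type: 'ephemeral' } },
      },
    },
    { role: 'user', content: 'Continue with the next task.' },
  ],
});
```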
14 changes: 14 additions & 0 deletions docs/LargeCodeBase_Plan.md
@@ -11,16 +11,19 @@ This document presents research findings on how leading AI coding tools handle l
While detailed technical documentation on Claude Code's internal architecture is limited in public sources, we can infer several approaches from Anthropic's general AI architecture and Claude Code's capabilities:

1. **Chunking and Retrieval Augmentation**:

- Claude Code likely employs retrieval-augmented generation (RAG) to handle large codebases
- Files are likely chunked into manageable segments with semantic understanding
- Relevant code chunks are retrieved based on query relevance

2. **Hierarchical Code Understanding**:

- Builds a hierarchical representation of code (project → modules → files → functions)
- Maintains a graph of relationships between code components
- Prioritizes context based on relevance to the current task

3. **Incremental Context Management**:

- Dynamically adjusts the context window to include only relevant code
- Maintains a "working memory" of recently accessed or modified files
- Uses sliding context windows to process large files sequentially
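
To make the retrieval idea concrete, a minimal sketch of chunk ranking by embedding similarity; `embed` is a hypothetical helper, not part of MyCoder:

```typescript
// Chunk files, embed them, rank by cosine similarity to the query.
declare function embed(text: string): Promise<number[]>;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function topChunks(query: string, chunks: string[], k = 5) {
  const q = await embed(query);
  const scored = await Promise.all(
    chunks.map(async (c) => ({ chunk: c, score: cosine(q, await embed(c)) })),
  );
  return scored.sort((a, b) => b.score - a.score).slice(0, k);
}
```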
@@ -35,16 +38,19 @@ While detailed technical documentation on Claude Code's internal architecture is
Aider's approach to handling large codebases can be inferred from its open-source codebase and documentation:

1. **Git Integration**:

- Leverages Git to track file changes and understand repository structure
- Uses Git history to prioritize recently modified files
- Employs Git's diff capabilities to minimize context needed for changes

2. **Selective File Context**:

- Only includes relevant files in the context rather than the entire codebase
- Uses heuristics to identify related files based on imports, references, and naming patterns
- Implements a "map-reduce" approach where it first analyzes the codebase structure, then selectively processes relevant files

3. **Prompt Engineering and Chunking**:

- Designs prompts that can work with limited context by focusing on specific tasks
- Chunks large files and processes them incrementally
- Uses summarization to compress information about non-focal code parts
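
A rough sketch of the selective-context idea — following relative imports outward from a focal file. The resolution logic is deliberately naive (it assumes `.ts` files at the resolved paths):

```typescript
import { readFileSync } from 'node:fs';
import { dirname, resolve } from 'node:path';

// Collect files reachable from `entry` through relative imports,
// up to a fixed depth, skipping anything that fails to resolve.
function relatedFiles(entry: string, depth = 1, seen = new Set<string>()): Set<string> {
  if (seen.has(entry) || depth < 0) return seen;
  seen.add(entry);
  let source: string;
  try {
    source = readFileSync(entry, 'utf8');
  } catch {
    return seen; // unresolved import; skip it
  }
  const importRe = /from\s+['"](\.{1,2}\/[^'"]+)['"]/g;
  for (const match of source.matchAll(importRe)) {
    const target = resolve(dirname(entry), match[1]) + '.ts'; // naive resolution
    relatedFiles(target, depth - 1, seen);
  }
  return seen;
}
```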
@@ -90,6 +96,7 @@ Based on the research findings, we recommend the following enhancements to MyCod
```

**Implementation Details:**

- Create a lightweight indexer that runs during project initialization
- Generate embeddings for code files, focusing on API definitions, function signatures, and documentation
- Build a graph of relationships between files based on imports/exports and references
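
One possible shape for the index records this describes; all names are illustrative, not an existing MyCoder API:

```typescript
// One entry per file: exported signatures, import edges for the
// project graph, an optional embedding, and mtime for incremental
// re-indexing.
interface FileIndexEntry {
  path: string;
  exports: string[];    // function/class signatures and docs
  imports: string[];    // resolved import targets (graph edges)
  embedding?: number[]; // vector for semantic search
  mtimeMs: number;      // last-modified time, for cheap invalidation
}

type ProjectIndex = Map<string, FileIndexEntry>;
```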
@@ -120,6 +127,7 @@ Based on the research findings, we recommend the following enhancements to MyCod
```

**Implementation Details:**

- Develop a working set manager that tracks currently relevant files
- Implement a relevance scoring algorithm that considers:
- Semantic similarity to the current task
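
An illustrative scoring function combining these signals; the weights are assumptions, not tuned values:

```typescript
// Signals the working-set manager would track per candidate file.
interface FileSignals {
  similarity: number; // semantic similarity to the task, 0..1
  recency: number;    // decays with time since last access, 0..1
  refCount: number;   // references from files already in the working set
}

function relevance(s: FileSignals): number {
  return 0.5 * s.similarity + 0.3 * s.recency + 0.2 * Math.min(s.refCount / 5, 1);
}
```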
@@ -148,6 +156,7 @@ Based on the research findings, we recommend the following enhancements to MyCod
```

**Implementation Details:**

- Chunk files at meaningful boundaries (functions, classes, modules)
- Implement overlapping chunks to maintain context across boundaries
- Develop a progressive loading strategy:
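
A minimal line-based sketch of overlapping chunks; a real implementation would cut at function/class boundaries as described above, but the overlap mechanics are the same:

```typescript
// Split lines into fixed-size windows that overlap by `overlap` lines,
// so context is preserved across chunk boundaries.
function chunkLines(lines: string[], size = 120, overlap = 20): string[][] {
  const chunks: string[][] = [];
  for (let start = 0; start < lines.length; start += size - overlap) {
    chunks.push(lines.slice(start, start + size));
    if (start + size >= lines.length) break; // last window reached the end
  }
  return chunks;
}
```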
@@ -181,6 +190,7 @@ Based on the research findings, we recommend the following enhancements to MyCod
```

**Implementation Details:**

- Implement a multi-level caching system:
- Token cache: Store tokenized representations of files to avoid re-tokenization
- Embedding cache: Store vector embeddings for semantic search
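
A sketch of content-hash keying for these cache levels, so edits invalidate entries naturally; the names are illustrative:

```typescript
import { createHash } from 'node:crypto';

const tokenCache = new Map<string, number[]>();     // tokenized file contents
const embeddingCache = new Map<string, number[]>(); // vector per chunk

// Key by path + content hash: any edit changes the key, so stale
// entries are simply never hit again.
function cacheKey(path: string, content: string): string {
  return `${path}:${createHash('sha256').update(content).digest('hex')}`;
}

function getOrCompute<T>(cache: Map<string, T>, key: string, compute: () => T): T {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = compute();
  cache.set(key, value);
  return value;
}
```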
@@ -209,6 +219,7 @@ Based on the research findings, we recommend the following enhancements to MyCod
```

**Implementation Details:**

- Improve task decomposition to identify parallelizable sub-tasks
- Implement smart context distribution to sub-agents:
- Provide each sub-agent with only the context it needs
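
A minimal sketch of the distribution idea — each sub-task carries only its own file slice; `SubTask` and `runSubAgent` are hypothetical names:

```typescript
interface SubTask {
  prompt: string;
  files: string[]; // the minimal file set this sub-task needs
}

// Fan sub-tasks out in parallel; each sub-agent sees only its slice.
async function runInParallel(
  tasks: SubTask[],
  runSubAgent: (t: SubTask) => Promise<string>,
): Promise<string[]> {
  return Promise.all(tasks.map((t) => runSubAgent(t)));
}
```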
Expand All @@ -222,16 +233,19 @@ Based on the research findings, we recommend the following enhancements to MyCod
## Implementation Roadmap

### Phase 1: Foundation (1-2 months)

- Develop the basic indexing system for project structure and file metadata
- Implement a simple relevance-based context selection mechanism
- Create a basic chunking strategy for large files

### Phase 2: Advanced Features (2-3 months)

- Implement the semantic indexing system with code embeddings
- Develop the full context management system with working sets
- Create the multi-level caching system

### Phase 3: Optimization and Integration (1-2 months)

- Enhance sub-agent coordination for parallel processing
- Optimize performance with better caching and context management
- Integrate all components into a cohesive system
7 changes: 6 additions & 1 deletion docs/SentryIntegration.md
@@ -17,6 +17,7 @@ npm install @sentry/node --save
## Configuration

By default, Sentry is:

- Enabled in production environments
- Disabled in development environments (unless explicitly enabled)
- Configured to capture 100% of transactions
@@ -56,7 +57,9 @@ Sentry.init({
tracesSampleRate: 1.0,
environment: process.env.NODE_ENV || 'development',
release: `mycoder@${packageVersion}`,
enabled: process.env.NODE_ENV !== 'development' || process.env.ENABLE_SENTRY === 'true',
enabled:
process.env.NODE_ENV !== 'development' ||
process.env.ENABLE_SENTRY === 'true',
});

// Capture errors
Expand All @@ -76,6 +79,7 @@ mycoder test-sentry
```

This command will:

1. Generate a test error that includes the package version
2. Report it to Sentry.io
3. Output the result to the console
Expand All @@ -85,6 +89,7 @@ Note: In development environments, you may need to set `ENABLE_SENTRY=true` for
## Privacy

Error reports sent to Sentry include:

- Stack traces
- Error messages
- Environment information
5 changes: 4 additions & 1 deletion packages/agent/package.json
@@ -44,13 +44,16 @@
"author": "Ben Houston",
"license": "MIT",
"dependencies": {
"@anthropic-ai/sdk": "^0.37",
"@ai-sdk/anthropic": "^1.1.13",
"@ai-sdk/openai": "^1.2.0",
"@mozilla/readability": "^0.5.0",
"@playwright/test": "^1.50.1",
"@vitest/browser": "^3.0.5",
"ai": "^4.1.50",
"chalk": "^5",
"dotenv": "^16",
"jsdom": "^26.0.0",
"ollama-ai-provider": "^1.2.0",
"playwright": "^1.50.1",
"uuid": "^11",
"zod": "^3",
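For context, a small sketch of what the swapped dependencies enable — a single `generateText` call that can target either provider; the model ids are examples, not project defaults:

```typescript
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { generateText } from 'ai';

const useAnthropic = true;

// One call site, two providers: the AI SDK abstracts the vendor API.
const { text } = await generateText({
  model: useAnthropic
    ? anthropic('claude-3-7-sonnet-20250219')
    : openai('gpt-4o'),
  prompt: 'Summarize the repository structure.',
});
console.log(text);
```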
5 changes: 3 additions & 2 deletions packages/agent/src/core/tokens.ts
@@ -1,4 +1,4 @@
import Anthropic from '@anthropic-ai/sdk';
//import Anthropic from '@anthropic-ai/sdk';

import { LogLevel } from '../utils/logger.js';

@@ -34,14 +34,15 @@ export class TokenUsage {
return usage;
}

/*
static fromMessage(message: Anthropic.Message) {
const usage = new TokenUsage();
usage.input = message.usage.input_tokens;
usage.cacheWrites = message.usage.cache_creation_input_tokens ?? 0;
usage.cacheReads = message.usage.cache_read_input_tokens ?? 0;
usage.output = message.usage.output_tokens;
return usage;
}
}*/

static sum(usages: TokenUsage[]) {
const usage = new TokenUsage();
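Since the Anthropic-specific `fromMessage` is now commented out, usage presumably flows from AI SDK results instead. A hedged sketch of that path — the field names follow ai 4.x and the Anthropic provider's metadata conventions, and may vary between releases:

```typescript
import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

const result = await generateText({
  model: anthropic('claude-3-7-sonnet-20250219'),
  prompt: 'Hello',
});

// Populate the TokenUsage class from this file using the SDK result.
const usage = new TokenUsage();
usage.input = result.usage.promptTokens;
usage.output = result.usage.completionTokens;

// Cache read/write counts surface through provider metadata.
const meta = result.providerMetadata?.anthropic;
usage.cacheReads = Number(meta?.cacheReadInputTokens ?? 0);
usage.cacheWrites = Number(meta?.cacheCreationInputTokens ?? 0);
```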
29 changes: 2 additions & 27 deletions packages/agent/src/core/toolAgent.respawn.test.ts
@@ -1,3 +1,4 @@
import { anthropic } from '@ai-sdk/anthropic';
import { describe, it, expect, vi, beforeEach } from 'vitest';

import { toolAgent } from '../../src/core/toolAgent.js';
@@ -15,32 +16,6 @@ const toolContext: ToolContext = {
pageFilter: 'simple',
tokenTracker: new TokenTracker(),
};
// Mock Anthropic SDK
vi.mock('@anthropic-ai/sdk', () => {
return {
default: vi.fn().mockImplementation(() => ({
messages: {
create: vi
.fn()
.mockResolvedValueOnce({
content: [
{
type: 'tool_use',
name: 'respawn',
id: 'test-id',
input: { respawnContext: 'new context' },
},
],
usage: { input_tokens: 10, output_tokens: 10 },
})
.mockResolvedValueOnce({
content: [],
usage: { input_tokens: 5, output_tokens: 5 },
}),
},
})),
};
});

describe('toolAgent respawn functionality', () => {
const tools = getTools();
@@ -56,7 +31,7 @@ describe('toolAgent respawn functionality', () => {
tools,
{
maxIterations: 2, // Need at least 2 iterations for respawn + empty response
model: 'test-model',
model: anthropic('claude-3-7-sonnet-20250219'),
maxTokens: 100,
temperature: 0,
getSystemPrompt: () => 'test system prompt',
26 changes: 19 additions & 7 deletions packages/agent/src/core/toolAgent.test.ts
@@ -1,4 +1,6 @@
import { anthropic } from '@ai-sdk/anthropic';
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import { z } from 'zod';

import { MockLogger } from '../utils/mockLogger.js';

@@ -19,7 +21,7 @@
// Mock configuration for testing
const testConfig = {
maxIterations: 50,
model: 'claude-3-7-sonnet-latest',
model: anthropic('claude-3-7-sonnet-20250219'),
maxTokens: 4096,
temperature: 0.7,
getSystemPrompt: () => 'Test system prompt',
@@ -64,7 +66,11 @@ describe('toolAgent', () => {
const mockTool: Tool = {
name: 'mockTool',
description: 'A mock tool for testing',
parameters: {
parameters: z.object({
input: z.string().describe('Test input'),
}),
returns: z.string().describe('The processed result'),
parametersJsonSchema: {
type: 'object',
properties: {
input: {
Expand All @@ -74,7 +80,7 @@ describe('toolAgent', () => {
},
required: ['input'],
},
returns: {
returnsJsonSchema: {
type: 'string',
description: 'The processed result',
},
@@ -84,7 +90,11 @@
const sequenceCompleteTool: Tool = {
name: 'sequenceComplete',
description: 'Completes the sequence',
parameters: {
parameters: z.object({
result: z.string().describe('The final result'),
}),
returns: z.string().describe('The final result'),
parametersJsonSchema: {
type: 'object',
properties: {
result: {
Expand All @@ -94,7 +104,7 @@ describe('toolAgent', () => {
},
required: ['result'],
},
returns: {
returnsJsonSchema: {
type: 'string',
description: 'The final result',
},
@@ -133,12 +143,14 @@
const errorTool: Tool = {
name: 'errorTool',
description: 'A tool that always fails',
parameters: {
parameters: z.object({}),
returns: z.string().describe('Error message'),
parametersJsonSchema: {
type: 'object',
properties: {},
required: [],
},
returns: {
returnsJsonSchema: {
type: 'string',
description: 'Error message',
},