Feature: Token reporting usage #46

Closed
fmunteanu opened this issue Mar 28, 2025 · 5 comments

@fmunteanu commented Mar 28, 2025

First, thank you for the great product; it works wonders. I was wondering if it is possible to implement token usage reporting on each prompt, something like Zed does (see the 13/200k screenshot).

Image

Claude's Suggested Implementation

My ask:

Please review https://github.com/rusiaaman/wcgw and determine whether it is possible to implement token usage reporting within Claude Desktop:
* Add logging within the wcgw Python code (e.g., in `/src/wcgw/client/mcp_server`)
* Estimate token counts locally using a tokenizer (e.g., Anthropic’s tokenizer or a compatible one) for inputs and outputs processed through the MCP
* Output the usage result in each Claude Desktop conversation prompt
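The estimation step above could be sketched roughly as follows (a hypothetical sketch, not wcgw code; tiktoken's `cl100k_base` encoding only approximates Anthropic's tokenizer, and a crude chars/4 heuristic is assumed when tiktoken is absent):

```python
# Hypothetical sketch: local token estimation for MCP inputs/outputs.
try:
    import tiktoken

    _enc = tiktoken.get_encoding("cl100k_base")

    def estimate_tokens(text: str) -> int:
        """Approximate token count via tiktoken (cl100k_base)."""
        return len(_enc.encode(text))
except ImportError:
    def estimate_tokens(text: str) -> int:
        """Very rough fallback: assume ~4 characters per token."""
        return (len(text) + 3) // 4

def log_usage(prompt: str, output: str) -> dict:
    """Return an estimated usage record for one MCP tool call."""
    return {
        "input_tokens": estimate_tokens(prompt),
        "output_tokens": estimate_tokens(output),
    }
```

Either way these are estimates; the real counts only Anthropic's API can report.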

Image

As a side note, it might be beneficial to enable Discussions in your repository, for general questions.

@rusiaaman (Owner)

I'm glad you like the project.

Token counting is possible and will be helpful.

There are two ways we can report the results back:

  1. As tool output that Claude can also read
  2. As a resource that only the user can open

I like the second approach better as it doesn't hinder the user's conversation flow with the model. What is your opinion? I'm not sure if by "Claude Desktop conversation prompt" you mean the tool output or the resource.

@rusiaaman (Owner)

It's an interesting idea. Also, I've enabled Discussions.

@fmunteanu (Author) commented Mar 28, 2025

> I'm glad you like the project.
>
> Token counting is possible and will be helpful.
>
> There are two ways we can report the results back:
>
>   1. As tool output that Claude can also read
>   2. As a resource that only user can open.
>
> I like the second approach better as it doesn't hinder the user conversation flow with the model. What is your opinion? I'm not sure if by "Claude Desktop conversation prompt" you mean the tool output or resource?

I updated the OP with additional details and findings. I'm also inclined to select the second approach. Related to your question about the conversation prompt: what I like in the Zed editor is that you can see the token usage in the upper-right corner, so it does not hinder the conversation. We could have a switch like `token_usage=false` to disable it; it should be enabled by default. I'll trust your judgment, since you have way more experience with MCPs. 😊

Edit: @rusiaaman Note Claude's suggestion for implementation: Add token usage display in Claude Desktop interface.

That means it is possible to display the token usage in the Claude Desktop interface, like Zed does. See below what Claude suggests. We might need to use the first approach (tool output that Claude can also read) in order to display it in the Claude Desktop interface.

Second conversation ask:

Related to: Add token usage display in Claude Desktop interface
Where would be a possible place to display the token usage in Claude Desktop, next to the other buttons in the chat text input area?

Image

Which option would you pick?

  1. Bottom-right corner of chat input area, next to send button
  2. Status bar beneath text input field, alongside character count
  3. As a small badge in the right sidebar

Image

About the suggested format `123 in | 456 out | 579 total`: we should use Zed's approach, `579 / 200k`, since we only care about the total, not the in/out numbers.
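For illustration, a tiny helper that renders the Zed-style total (hypothetical; `format_usage` is not part of wcgw, and the 200k context limit is an assumption):

```python
def format_usage(total_tokens: int, context_limit: int = 200_000) -> str:
    """Render token usage Zed-style, e.g. '579 / 200k'.

    Shows only the running total against the context limit, abbreviating
    round-thousand limits with a 'k' suffix.
    """
    if context_limit % 1000 == 0:
        limit_str = f"{context_limit // 1000}k"
    else:
        limit_str = str(context_limit)
    return f"{total_tokens} / {limit_str}"
```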

@fmunteanu (Author) commented Mar 29, 2025

@rusiaaman I'm almost done with a PR; I will finish it tomorrow. This is what the implementation will look like (Claude does not render its Desktop UI well, so the implementation will look much better live):

Image

IMO this is the best location for the counter (next to the Send button), as it is non-invasive for the end user.

I also added a `"continue": False` key/value to the Initialize MCP tool, defaulting to `False`. If `True` is passed into Initialize, the MCP server will automatically submit a Continue response when Claude hits the max length for a message and the paused-response message is detected in the Claude Desktop UI:

Image

The MCP server stops sending the Continue response when the token usage reaches or exceeds the `token_threshold`. By default, this threshold is set to 90% (0.9). All implementation details will be covered in the PR.
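The pause/threshold logic described above might look roughly like this (a hypothetical sketch; the name `should_auto_continue` and the 200k context limit are assumptions, not the PR's actual code):

```python
def should_auto_continue(total_tokens: int,
                         context_limit: int = 200_000,
                         token_threshold: float = 0.9,
                         auto_continue: bool = False) -> bool:
    """Decide whether the server should submit an automatic Continue.

    Continues only while the feature was enabled via the Initialize
    `continue` flag AND usage is still below the threshold fraction of
    the context limit (90% by default).
    """
    if not auto_continue:
        return False
    return total_tokens < token_threshold * context_limit
```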

For dependencies, we use `anthropic_tokenizer` with a fallback to `tiktoken`:

# Prefer anthropic_tokenizer; fall back to tiktoken if it is not available
try:
    from anthropic_tokenizer import count_tokens
except ImportError:
    import tiktoken

    def count_tokens(text: str) -> int:
        """Count tokens using the tiktoken fallback (cl100k_base encoding)"""
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))

@rusiaaman (Owner)

It's quite a lot for me to review.

On your suggestions

First things first: there's no way for an MCP server like wcgw to control or change the UI of an MCP client like Claude Desktop.

Claude doesn't have innate knowledge of MCP, and that's why it has given you wrong information about what's possible.

What's possible

As an MCP server, we can return output in a tool call, or we can attach resources. The tool output is meant for Claude to know the result of a tool execution, and it isn't readily consumable by the user. It's also not obvious what its impact will be on Claude's subsequent response. In Zed, the LLMs aren't shown the token count information.

Image

We could use resources, which are also meant to be shared with the user. Using a resource, it would look like the following (see the counter):

Image

In order to read the token count, we'll have to click it. It then gets attached as pasted text, which you can open and read.

Image

Fresh thoughts on this feature

None of the UI decisions matter, because of the following things I realised.

Unfortunately, an MCP server has no way to know when a new conversation starts, so the counter will carry over into a new conversation.

We also don't know if the server has restarted, so in the middle of a conversation the token counter can reset.

In both of these cases our token counting will be way off.

Since we also don't know about the tokens outside the MCP server's calls, like what the user has pasted or written, or what Claude has shared in the form of artifacts or other MCP calls, the counts we show won't be representative of the real conversation size.

Current outlook

Given these important limitations, I don't think it's worth adding this feature to an MCP server. This has to be handled by the client. MCP clients like Zed and Cline already show this information, and Claude should also show it.

In its current form, it introduces complexity without proportional benefit.
