Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sampling support #217

Merged
merged 9 commits into from
Apr 7, 2017
Merged

Add sampling support #217

merged 9 commits into from
Apr 7, 2017

Conversation

OsvaldoRosado
Copy link
Member

Addresses #61

This also places some of the groundwork for refactoring SDK functions into isolated telemetry processors

@msftclas
Copy link

@OsvaldoRosado,
Thanks for your contribution as a Microsoft full-time employee or intern. You do not need to sign a CLA.
Thanks,
Microsoft Pull Request Bot

@OsvaldoRosado OsvaldoRosado added this to the 0.20.0 milestone Mar 30, 2017
return true;
} else if (contextObjects.correlationContext && contextObjects.correlationContext.operation) {
// If we're using dependency correlation, sampling should retain all telemetry from a given request
isSampledIn = getSamplingHashCode(contextObjects.correlationContext.operation.id) < samplingPercentage;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in .NET we use User_ID if available and if it is not available - we fall back to operation ID. Will it be hard to replicate that logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe we currently keep user id within the correlation context, so that will work for requests but no dependencies. We could update the correlation logic but that's probably a separate PR

var isSampledIn = false;

if (samplingPercentage === null || samplingPercentage === undefined || samplingPercentage >= 100) {
return true;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand. If it's null we want to calculate it, correct?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to skip if it was already calculated so you can chain multiple sampling processors

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The envelope will be prepopulated with the default level of sampling before any telemetry processors. Other telemetry processors can change the envelope sample rate field at will. This telemetry processor will always run last, reading the sampleRate field off of the envelope and deciding to keep or reject the data based off of it

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the logic is - if telemetry processor set samplingRate to undefined - this telemetry processor will do nothing? This looks strange

}

/** Ported from AI .NET SDK */
export function getSamplingHashCode(input: string): number {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More and more I feel we need to share the npm module with "Base" SDK for JavaScript that will define interface like trackTrace as well as telemetryprocessor interface and such

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not used for basically historical reasons :) I had already written this together for Breeze and simply moved it across. When I wrote it the first time I didn't know JS had hashing logic at all. I believe they're basically the same however. A base SDK would seem useful for this...

@MSLaguana
Copy link
Member

Is it possible to query the SDK to work out whether an event would be filtered? I'm thinking of a scenario where an application would have to perform extra work to construct a telemetry event (getting appropriate data for it) that could be avoided if we know that the event will be ignored anyway.

@OsvaldoRosado
Copy link
Member Author

OsvaldoRosado commented Mar 31, 2017

@MSLaguana it's not possible to know this ahead of time because of the possible random component to sampling. It is also by design that sampling is the last action to happen to an envelope before it is sent for sending. This allows customers to add their own telemetry processors that adjust the sample rate per item. For example, if you detect an item that you know is important and rare (say a dependency with a recorded duration of over 2 minutes), you have a chance to adjust its individual samplerate to 100 before it reaches sampling so that it is kept for sending.

@KamilSzostak
Copy link

Could you add a sample code with a short description to the README?

@OsvaldoRosado
Copy link
Member Author

@KamilSzostak Good idea. Updated!

if (result) accepted++;
}

assert.ok(accepted > (iterations * 0.40), "data should pass more than 40% of the time");
Copy link

@KamilSzostak KamilSzostak Mar 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test will fail randomly once in a ~2.72846×10^8 runs... I guess we can live with it :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆

Copy link

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a CHANGELOG.md file in this repo? It proven to be very useful in -dotnet-* repositories to track which version has what changes for release notes generation

Copy link

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't realize - I couldn't find a place in samplingTelemetryProcessor where you set a samplingRate on envelope. Do you?

@OsvaldoRosado
Copy link
Member Author

The sampleRate field is pre-set when the envelope is constructed. User processors can change the field as desired.

@OsvaldoRosado
Copy link
Member Author

OsvaldoRosado commented Apr 7, 2017

I don't think we have a changelog file. Instead all PR #'s are attached to the GitHub release info on new releases.

@OsvaldoRosado OsvaldoRosado merged commit 5229f7a into develop Apr 7, 2017
OsvaldoRosado added a commit that referenced this pull request May 10, 2017
* Add build script command

* Adds schema and typescript files to npmignore

* Add tagOverrides and contextObjects (#210)

* Add tagOverrides and contextObjects

Also updated documentation. Yes, it's 1 commit. Yes, I'm a terrible person for doing that.

* Add contextObject + formatting

* Fix CORS error when using library with browserify

Fixes an obscure problem when trying to use the ApplicationInsights node.js library with a browserify application.

* Add missing docs (#218)

* Update cross component dependency type (#219)

* migrate from typings to @types (#221)

* Add sampling support (#217)

* Add sampling support

* Fix test failure

* Update readme

* Fix race condition in readme example

* Fix build error from typings changes

* update package install and build scripts (#224)

* update package install and build scripts

* Adds a prepare script to replace the prepublish script for `npm install`
in npm@5+.
* Built artifacts from tsc are aggregated and stored in ./out for easier
workspace management.
* New test case which isn't transpiled by tsc to test some scenarios.
Includes a test for loading the transpiled module.

PR-URL:
Reviewed-By:
Reviewed-By:

* fixup! update package install and build scripts

* edits to comments in applicationinsights.ts (#223)

* edits to comments

* fixup! edits to comments

* Use autogenerated schemas (#228)

* Update BOND schema, update generated typescript with new comment generation

* Update autogenerated schemas. Migrate project to only use autogenerated schema classes

* Fix build errors

* Address PR feedback

* Add missing metric

* Change metric names to be in line with .NET SDK for common metrics

* Fix broken percentage units in performance counters

* pick up generated Declaration submodule (#232)

* Set the host app version to the context tags. #183 (#233)

* Sets version to the context tags

* No Readme changes

* Correct unit test

* Update Client.ts

Typo on the comment solved and switched the method name from setVersion to overrideApplicationVersion.

* Update Client.tests.ts

I apologize for the inconvenience, unit test corrected.

* Updating how cross-application correlations are tracked (#231)

* Updating how cross-application correlations are tracked

Instead of using a hash of the instrumentation key we now use the appId, matching the .NET sdk.
We also use different headers to match the .NET sdk.

* Updating to only issue appId requests once per ikey

* Exposing profileQueryEndpoint property

Allows for the appId query target to be configured separately to the telemetry endpoint.
It may be specified either by the APPINSIGHTS_PROFILE_QUERY_ENDPOINT environment variable,
or explicitly via client.config.profileQueryEndpoint. Note that it must be set synchronously
with the creation of the Config otherwise the value will not be used.

* Allowing appId lookups to be cancelled if a new endpoint is specified

* Adding operationId to outbound HTTP headers

* Provide access to contracts (#234)

* Provide access to contracts

* Test for access to contracts

* Change access to Contracts from Client.ts to applicationinsights.ts

* Fix baseTypes (#236)

* update README

* address Alex's feedback

* Fixing CorrelationId

There was a typo which lead to the `correlationIdPrefix` being null.

* Adding diagnostic-channel (#235)

* Adding diagnostic-channel

By using diagnostic-channel and diagnostic-channel-publishers, we can support context tracking through third-party libraries as well as getting additional telemetry for those dependencies.

* Fixing cyclical reference and supporting multiple AI clients in diagnostic-channel subscribers

* Updating readme with diagnostic-channel information

* Update Readme (#238)

* Enable automatic correlation by default (#239)

* Include typescript definitions in NPM package (#240)

* Update package.json to 0.20 (#241)
@OsvaldoRosado OsvaldoRosado deleted the osrosado/sampling branch May 11, 2017 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants