
Standard deviation for Array lengths #51

Open
@d4nirod

Description


Below is example output (edited for brevity) for one field of interest from a collection analysis: Tasks, which can optionally have Assignees (an array).
For the number of assignees (the array length), it would be very useful to have the standard deviation in addition to the already provided average_length.

{
   "_id" : ObjectId("5a8d71276397ce1a2dd42bbe"), 
   "name" : "assignees", 
   "path" : "assignees", 
   "count" : NumberInt(44), 
   "types" : [
       {
           "name" : "Undefined", 
           "type" : "Undefined", 
           "path" : "assignees", 
           "count" : NumberInt(56), 
           "total_count" : NumberInt(0), 
           "probability" : 0.56, 
           "unique" : NumberInt(1), 
           "has_duplicates" : true
       }, 
       {
           "name" : "Array", 
           "bsonType" : "Array", 
           "path" : "assignees", 
           "count" : NumberInt(30), 
           "types" : [
               {
                   "name" : "DBRef", 
                   "bsonType" : "DBRef", 
                   "path" : "assignees", 
                   "count" : NumberInt(37), 
                   "values" : [
                       DBRef("cw_user", ObjectId("577e7f1f300488c6676b3406")), 
                       DBRef("cw_user", ObjectId("577e7f08300488c6676b33f5")), 
                       DBRef("cw_role", ObjectId("582493383004c0551c10bc5d")), 
                       DBRef("cw_user", ObjectId("577e7f08300488c6676b33f5")), 
                       DBRef("cw_user", ObjectId("577e7f08300488c6676b33f5")), 
                       DBRef("cw_user", ObjectId("577e7f08300488c6676b33f5")), 
                       DBRef("cw_role", ObjectId("5a7c46bd39c3cc64d3683a18")), 
                       DBRef("cw_user", ObjectId("577e7f08300488c6676b33f5")), 
                       DBRef("cw_user", ObjectId("577e7f08300488c6676b33f5")), 
                       DBRef("cw_role", ObjectId("5a7c46bd39c3cc64d3683a18")), 
                       DBRef("cw_user", ObjectId("577e7f85300488c6676b344c")), 
                       DBRef("cw_user", ObjectId("577e7f08300488c6676b33f5")), 
                       DBRef("cw_user", ObjectId("5a8d51cd6397ce5a3b44496c"))
                   ], 
                   "total_count" : NumberInt(0), 
                   "probability" : NumberInt(1), 
                   "unique" : NumberInt(1), 
                   "has_duplicates" : true
               }
           ], 
           "lengths" : [
               NumberInt(2), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(2), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(3), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(3), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(1), 
               NumberInt(2), 
               NumberInt(1), 
               NumberInt(0), 
               NumberInt(2), 
               NumberInt(2), 
               NumberInt(0), 
               NumberInt(1)
           ], 
           "total_count" : NumberInt(37), 
           "probability" : 0.3, 
           "average_length" : 1.2333333333333334
       }, 
       {
           "name" : "Null", 
           "bsonType" : "Null", 
           "path" : "assignees", 
           "count" : NumberInt(14), 
           "total_count" : NumberInt(0), 
           "probability" : 0.14, 
           "unique" : NumberInt(1), 
           "has_duplicates" : true
       }
   ], 
   "total_count" : NumberInt(100), 
   "type" : [
       "Undefined", 
       "Array", 
       "Null"
   ], 
   "has_duplicates" : true, 
   "probability" : 0.44
}
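The lengths array already contains everything needed to compute this. As a minimal sketch in plain JavaScript (this is not mongodb-schema's code), the snippet below recomputes average_length from the 30 values above and adds the population standard deviation using the usual two-pass formula:

```javascript
// Array lengths copied from the "lengths" field of the example output above
const lengths = [
  2, 1, 1, 1, 1, 1, 1, 1, 1, 1,
  1, 2, 1, 1, 3, 1, 1, 3, 1, 1,
  1, 1, 1, 2, 1, 0, 2, 2, 0, 1,
];

// Arithmetic mean: should reproduce the reported average_length
const mean = lengths.reduce((sum, x) => sum + x, 0) / lengths.length;

// Population standard deviation: square root of the mean squared deviation
const variancePop =
  lengths.reduce((sum, x) => sum + (x - mean) ** 2, 0) / lengths.length;
const stdDevPop = Math.sqrt(variancePop);

console.log(mean, stdDevPop);
```

With these values, mean comes out as 37 / 30 = 1.2333333333333334, identical to the reported average_length, and stdDevPop is roughly 0.67.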

Activity

d4nirod (author) commented on Feb 23, 2018
  • From my limited understanding, the population and sample standard deviations are two different things: the population version applies when you are dealing with the whole dataset, the sample version when you have only a subset but want to extrapolate its results to the whole.
  • MongoDB aggregations readily give you both, via $stdDevPop and $stdDevSamp.
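For comparison, a sketch of the equivalent MongoDB aggregation pipeline. It assumes the tasks live in a collection named `tasks` (a hypothetical name) and MongoDB 3.6 or later, where `$type: "array"` matches array fields directly; the output field names are illustrative only:

```javascript
// Assumed collection name: "tasks" (hypothetical, stands in for the Tasks collection)
const pipeline = [
  // Keep only documents where assignees is an actual array, mirroring the
  // "Array" type bucket above (missing and null values are excluded)
  { $match: { assignees: { $type: "array" } } },
  // Replace each document with the length of its assignees array
  { $project: { n: { $size: "$assignees" } } },
  // Collapse all lengths into the mean plus both standard deviations
  {
    $group: {
      _id: null,
      average_length: { $avg: "$n" },
      std_dev_pop: { $stdDevPop: "$n" },
      std_dev_samp: { $stdDevSamp: "$n" },
    },
  },
];

// In the mongo shell this would be run as: db.tasks.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Run against the collection, this should return a single document carrying all three statistics.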

So why not provide both, either by default or behind an option?
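One possible shape for such an option, sketched as a standalone function. `lengthStats`, its `stdDev` option (`"pop"`, `"samp"` or `"both"`) and the output field names are all hypothetical and not part of mongodb-schema. Both estimators share the same sum of squared deviations and differ only in the divisor: n for the population, n − 1 for the sample.

```javascript
// Hypothetical helper: summarises array lengths the way a schema analyser
// might, with an option choosing which standard deviation(s) to report.
// Function, option and field names are illustrative, not mongodb-schema's API.
function lengthStats(lengths, { stdDev = "both" } = {}) {
  const n = lengths.length;
  if (n === 0) return { average_length: null };

  const mean = lengths.reduce((sum, x) => sum + x, 0) / n;
  // Sum of squared deviations from the mean, shared by both estimators
  const ss = lengths.reduce((sum, x) => sum + (x - mean) ** 2, 0);

  const stats = { average_length: mean };
  if (stdDev === "pop" || stdDev === "both") {
    // Population: divide by n (the lengths are treated as the whole dataset)
    stats.stddev_pop_length = Math.sqrt(ss / n);
  }
  if (stdDev === "samp" || stdDev === "both") {
    // Sample: divide by n - 1 (Bessel's correction); undefined for a single value
    stats.stddev_samp_length = n > 1 ? Math.sqrt(ss / (n - 1)) : null;
  }
  return stats;
}

console.log(lengthStats([2, 1, 1, 3, 0]));
console.log(lengthStats([2, 1, 1, 3, 0], { stdDev: "pop" }));
```

Reporting both by default costs almost nothing, since the squared deviations are computed once either way.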


Participants: @d4nirod, @pzrq


          Standard deviation for Array lengths · Issue #51 · mongodb-js/mongodb-schema