-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy pathproblem56
16 lines (13 loc) · 1.02 KB
/
problem56
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
scala> val b = sc.parallelize(List(1,2,3,4,5,6,7,8,2,4,2,1,1,1,1,1))
b: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[4] at parallelize at <console>:27
scala> b.countByValue().collect
<console>:30: error: missing arguments for method collect in trait TraversableLike;
follow this method with `_' if you want to treat it as a partially applied function
b.countByValue().collect
^
scala> b.countByValue()
[Stage 3:> (0 + 0) / 2[Stage 3:> (0 + 1) / 2 res6: scala.collection.Map[Int,Long] = Map(5 -> 1, 1 -> 6, 6 -> 1, 2 -> 3, 7 -> 1, 3 -> 1, 8 -> 1, 4 -> 2)
countByValue
Returns a map that contains all unique values of the RDD and their respective occurrence counts. (Warning: This operation will finally aggregate the information in a single reducer.)
Listing Variants
def countByValue(): Map[T, Long]