Using Base Data

Christopher Bradford edited this page Oct 17, 2018 · 2 revisions

Running SELECT queries against a DSE cluster is difficult, if not impossible, without knowing what data the tables contain. The FetchBaseData class, combined with the configuration params, pulls column values into a CSV file. It only requires the table's partition key(s) and the columns you want.

The FetchBaseData class connects to the cluster and, for the given keyspace and table, iterates over all the nodes in the datacenter. For each node it picks a random selection of token ranges (tokenRangesPerHost of them) and queries each range for up to maxPartitionKeys partition keys, writing the fetched columns to a CSV file that can be used in a Feeder.
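The sampling and CSV steps described above can be sketched roughly as follows. This is a minimal illustration, not the FetchBaseData implementation: `TokenRange`, `BaseDataSketch`, `sampleRanges`, and `toCsv` are hypothetical names, and the token ranges are plain values here rather than ranges read from the driver's cluster metadata.

```scala
import scala.util.Random

// Hypothetical stand-in for a driver token range; real ranges would come from cluster metadata.
final case class TokenRange(start: Long, end: Long)

object BaseDataSketch {
  // Picks up to `perHost` token ranges at random from each host's list of ranges.
  def sampleRanges(
      rangesByHost: Map[String, Seq[TokenRange]],
      perHost: Int,
      rng: Random
  ): Map[String, Seq[TokenRange]] =
    rangesByHost.map { case (host, ranges) => host -> rng.shuffle(ranges).take(perHost) }

  // Renders fetched rows as CSV text with a header line.
  // Assumption: values contain no commas or quotes, so no escaping is done.
  def toCsv(columns: Seq[String], rows: Seq[Map[String, String]]): String = {
    val header = columns.mkString(",")
    val body = rows.map(row => columns.map(c => row.getOrElse(c, "")).mkString(","))
    (header +: body).mkString("\n")
  }
}
```

A host with fewer ranges than `perHost` simply contributes all of its ranges.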

Used Settings

| Param | Type | Notes |
| --- | --- | --- |
| keyspace | string | Cluster keyspace to use |
| table | string | Cluster table to use |
| dataFile | string | CSV file to write the found values to. Uses the general.dataDir param as the base directory. |
| appendToFile | boolean | Append values to an existing dataFile, or reset the file on each run |
| perPartitionDisabled | boolean | With C* 3.6+, whether the PER PARTITION LIMIT query option should be disabled. With C* < 3.6, it defaults to disabled and falls back to the in-memory check for duplicate values. |
| tokenRangesPerHost | int | Number of token ranges to use per host found in the cluster. If dcName is set in the configuration, only nodes in the connected datacenter are used. |
| maxPartitionKeys | int | Number of unique partition keys (with their columns) per token range to write into the CSV |
| paginationSize | int | Driver page size, limiting the number of rows returned per request |
| partitionKeyColumns | list | The table's partition key columns |
| columnsToFetch | list | Columns to fetch from the table and place in the CSV |
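To make the effect of perPartitionDisabled more concrete, here is a hedged sketch of how a token-range query might be built from these settings, together with an in-memory duplicate check for the fallback case. `FetchSettings`, `RangeQuerySketch`, `build`, and `uniqueByKey` are hypothetical names, and the exact statement FetchBaseData issues may differ. Only the CQL itself (`token(...)` range predicates and `PER PARTITION LIMIT`, available from Cassandra 3.6) is standard.

```scala
// Hypothetical holder for a subset of the params above; not the project's own class.
final case class FetchSettings(
    keyspace: String,
    table: String,
    partitionKeyColumns: List[String],
    columnsToFetch: List[String],
    maxPartitionKeys: Int,
    perPartitionDisabled: Boolean
)

object RangeQuerySketch {
  // Builds a CQL statement that scans one token range (start, end] using two bind markers.
  def build(s: FetchSettings): String = {
    val pk = s.partitionKeyColumns.mkString(", ")
    // Assumption: partition key columns are always selected so duplicates can be detected.
    val selected = (s.partitionKeyColumns ++ s.columnsToFetch).distinct.mkString(", ")
    val base = s"SELECT $selected FROM ${s.keyspace}.${s.table} " +
      s"WHERE token($pk) > ? AND token($pk) <= ?"
    if (s.perPartitionDisabled) base // duplicates are then filtered in memory
    else s"$base PER PARTITION LIMIT 1 LIMIT ${s.maxPartitionKeys}"
  }

  // In-memory fallback: keeps the first row seen per partition key, up to `max` keys.
  def uniqueByKey(
      rows: Iterator[Map[String, String]],
      keyCols: Seq[String],
      max: Int
  ): Seq[Map[String, String]] = {
    val seen = scala.collection.mutable.LinkedHashMap.empty[Seq[String], Map[String, String]]
    while (rows.hasNext && seen.size < max) {
      val row = rows.next()
      val key = keyCols.map(row) // throws if a key column is missing from the row
      if (!seen.contains(key)) seen += (key -> row)
    }
    seen.values.toSeq
  }
}
```

With `PER PARTITION LIMIT 1` the server returns one row per partition, so the client does not need to track keys itself. The in-memory path trades that for extra rows over the wire, which is where paginationSize matters.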

Example Using FetchBaseData

Configuration

defaults {
  keyspace = load_example
  table = order_data
  dataFile = my.csv
  perPartitionDisabled = false
  tokenRangesPerHost = 10
  paginationSize = 100
  maxPartitionKeys = 500
  appendToFile = false
  partitionKeyColumns = [order_no]
  columnsToFetch = [order_no]
}

Simulation

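// Query the cluster and write the sampled partition keys to the CSV file
new FetchBaseData(simConf, cass).createBaseDataCsv()

// Build a feeder that picks rows from that CSV at random
val feederFile = getDataPath(simConf)
val csvFeeder = csv(feederFile).random

// Each virtual user takes a row from the feeder before running the read
val readScenario = scenario("OrderRead")
  .feed(csvFeeder)
  .exec(orderActions.readOrder)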

The code above connects to the cluster using the parameters in the cassandra conf section. For the load_example.order_data table, it iterates through 10 random token ranges per host, fetching the order_no column from up to 500 unique partition keys per range. The Simulation's readScenario then uses the generated CSV file, picking rows at random and using each row's order_no in a SELECT query.
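For intuition about the random feeder step, the sketch below mimics it in plain Scala: a feeder is essentially an iterator of records (column name to value), and the random strategy returns an arbitrary record on each pull. `RandomFeederSketch`, `parse`, and `random` are hypothetical helpers written for illustration; they are not Gatling's implementation and handle only simple CSV without quoted fields.

```scala
import scala.util.Random

object RandomFeederSketch {
  // Parses simple CSV text (header row first, no quoted fields) into records.
  def parse(csvText: String): Vector[Map[String, String]] = {
    val lines = csvText.split("\n").map(_.trim).filter(_.nonEmpty).toVector
    require(lines.nonEmpty, "CSV text needs at least a header line")
    val header = lines.head.split(",").map(_.trim)
    lines.tail.map(line => header.zip(line.split(",").map(_.trim)).toMap)
  }

  // Endless iterator that returns a randomly chosen record on every call to next().
  def random(records: Vector[Map[String, String]], rng: Random): Iterator[Map[String, String]] = {
    require(records.nonEmpty, "feeder needs at least one record")
    Iterator.continually(records(rng.nextInt(records.length)))
  }
}
```

Because rows are drawn with replacement, the same order_no can be read many times during a run, which is usually what a read-load test wants.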