Using Base Data
Running `SELECT` queries against a DSE cluster without knowing what data is in the tables is difficult, if not impossible. The `FetchBaseData` class, combined with the configuration params, pulls column values into a CSV file, requiring only the table's partition key(s) and the columns wanted.

The `FetchBaseData` class connects to the cluster and, for the given keyspace and table, iterates over the nodes in the datacenter, picks a random set of token ranges, and queries each range for a configurable number of partition keys, writing the fetched columns to a CSV that can be used in a Feeder.
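For a rough sense of what that per-range querying looks like, the CQL has roughly the shape below. This is an illustrative sketch only; the exact statement `FetchBaseData` builds is internal to the class, and the keyspace, table, and column names here are taken from the example configuration further down.

```scala
// Sketch of the per-token-range query shape, using the example table
// from the Configuration section below. PER PARTITION LIMIT 1 (C* 3.6+)
// keeps one row per partition key; on older versions FetchBaseData
// falls back to an in-memory duplicate check instead.
val tokenRangeQuery: String =
  """SELECT order_no
    |FROM load_example.order_data
    |WHERE token(order_no) > :rangeStart AND token(order_no) <= :rangeEnd
    |PER PARTITION LIMIT 1""".stripMargin
```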
Param | Type | Notes |
---|---|---|
keyspace | string | Cluster keyspace to use |
table | string | Cluster table to use |
dataFile | string | CSV file to write the found values to. Uses the general.dataDir param as the base directory. |
appendToFile | boolean | Append values to an existing dataFile, or reset the file on each run |
perPartitionDisabled | boolean | Whether the PER PARTITION LIMIT query option (C* 3.6+) should be disabled. On C* < 3.6 it defaults to disabled and an in-memory check for duplicate values is used instead |
tokenRangesPerHost | int | Number of token ranges to use per host found in the cluster. If dcName is set in the configuration, only nodes in the connected datacenter are used |
maxPartitionKeys | int | Number of unique partition keys (and their columns) per token range to write to the CSV |
paginationSize | int | Driver request paging size, limiting the number of rows returned per request |
partitionKeyColumns | list | The table's partition key columns |
columnsToFetch | list | Columns to fetch from the table and write to the CSV |
Configuration
```hocon
defaults {
  keyspace = load_example
  table = order_data
  dataFile = my.csv
  perPartitionDisabled = false
  tokenRangesPerHost = 10
  paginationSize = 100
  maxPartitionKeys = 500
  appendToFile = false
  partitionKeyColumns = [order_no]
  columnsToFetch = [order_no]
}
```
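Since the block above is HOCON, it can be read with Typesafe Config. The sketch below shows one way the params might be loaded; `simConf` here is an assumption standing in for whatever config object the tool actually hands to `FetchBaseData`.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical loading sketch: resolve the defaults block and read a
// few of the params FetchBaseData consumes. Key names match the param
// table in the previous section.
val simConf: Config = ConfigFactory.load().getConfig("defaults")
val keyspace: String = simConf.getString("keyspace")               // "load_example"
val tokenRangesPerHost: Int = simConf.getInt("tokenRangesPerHost") // 10
```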
Simulation
```scala
new FetchBaseData(simConf, cass).createBaseDataCsv()

val feederFile = getDataPath(simConf)
val csvFeeder = csv(feederFile).random

val readScenario = scenario("OrderRead")
  .feed(csvFeeder)
  .exec(orderActions.readOrder)
```
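To actually run `readScenario`, the simulation still needs a Gatling `setUp` call. A minimal sketch, assuming the Gatling 3.x injection DSL (older Gatling 2.x builds phrase this as `rampUsers(n) over (d)`); the injection numbers are placeholders, and the plugin's protocol configuration is omitted:

```scala
import io.gatling.core.Predef._
import scala.concurrent.duration._

// Hypothetical injection profile inside the Simulation class; the CQL
// protocol object the plugin requires (attached via .protocols(...))
// is omitted here.
setUp(
  readScenario.inject(
    rampUsers(50).during(30.seconds)
  )
)
```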
The above code connects to the cluster using the `cassandra` conf section's parameters and, for the `load_example.order_data` table, iterates through 10 random token ranges per host, fetching the `order_no` column from up to 500 unique partition keys. The simulation's `readScenario` then reads the created CSV file, randomly picking rows, and uses each row's `order_no` with a `SELECT` query.
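The query inside `orderActions.readOrder` is not shown on this page, but given the feeder, it would bind the session's `order_no` roughly as below. This is a sketch of the statement only; the plugin's execution DSL is what actually wires it into the scenario.

```scala
// Illustrative CQL for the read side: the :order_no named parameter is
// filled from the order_no column the feeder placed in the session.
val readOrderCql: String =
  "SELECT * FROM load_example.order_data WHERE order_no = :order_no"
```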