
Commit 476b205

Docs: Fix language in Schema Design docs (#17010)

1 parent 175636b commit 476b205
File tree

1 file changed: +5 -5 lines changed

docs/ingestion/schema-design.md (+5 -5)
@@ -57,7 +57,7 @@ In Druid, on the other hand, it is common to use totally flat datasources that d
 the example of the "sales" table, in Druid it would be typical to store "product_id", "product_name", and
 "product_category" as dimensions directly in a Druid "sales" datasource, without using a separate "products" table.
 Totally flat schemas substantially increase performance, since the need for joins is eliminated at query time. As an
-an added speed boost, this also allows Druid's query layer to operate directly on compressed dictionary-encoded data.
+added speed boost, this also allows Druid's query layer to operate directly on compressed dictionary-encoded data.
 Perhaps counter-intuitively, this does _not_ substantially increase storage footprint relative to normalized schemas,
 since Druid uses dictionary encoding to effectively store just a single integer per row for string columns.
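To make the flat-schema advice in this hunk concrete, a "sales" datasource could list the product attributes directly in its dimensionsSpec, along these lines (a minimal sketch; only the three product column names come from the docs, and Druid dictionary-encodes string dimensions automatically, with no extra configuration):

```json
{
  "dimensionsSpec": {
    "dimensions": ["product_id", "product_name", "product_category"]
  }
}
```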

@@ -101,7 +101,7 @@ see [partitioning and sorting](./partitioning.md) below for details).
 * Create other dimensions for attributes attached to your data points. These are often called "tags" in timeseries
 database systems.
 * Create [metrics](../querying/aggregations.md) corresponding to the types of aggregations that you want to be able
-to query. Typically this includes "sum", "min", and "max" (in one of the long, float, or double flavors). If you want the ability
+to query. Typically, this includes "sum", "min", and "max" (in one of the long, float, or double flavors). If you want the ability
 to compute percentiles or quantiles, use Druid's [approximate aggregators](../querying/aggregations.md#approximate-aggregations).
 * Consider enabling [rollup](./rollup.md), which will allow Druid to potentially combine multiple points into one
 row in your Druid datasource. This can be useful if you want to store data at a different time granularity than it is
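A metricsSpec following the "sum", "min", and "max" advice in this hunk might look like the fragment below (a sketch only; the `quantity` and `price` input fields are hypothetical examples, not from the docs):

```json
"metricsSpec": [
  { "type": "longSum",   "name": "quantity_sum", "fieldName": "quantity" },
  { "type": "doubleMin", "name": "price_min",    "fieldName": "price" },
  { "type": "doubleMax", "name": "price_max",    "fieldName": "price" }
]
```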
@@ -160,7 +160,7 @@ approximate distinct counts, and you'll reduce your storage footprint.
 
 Sketches reduce memory footprint at query time because they limit the amount of data that needs to be shuffled between
 servers. For example, in a quantile computation, instead of needing to send all data points to a central location
-so they can be sorted and the quantile can be computed, Druid instead only needs to send a sketch of the points. This
+so that they can be sorted and the quantile can be computed, Druid instead only needs to send a sketch of the points. This
 can reduce data transfer needs to mere kilobytes.
 
 For details about the sketches available in Druid, see the
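As a concrete instance of the quantile example above, the DataSketches extension's quantiles sketch aggregator can be applied at ingestion time. A minimal sketch, assuming the `druid-datasketches` extension is loaded and a hypothetical `latency` input column:

```json
{ "type": "quantilesDoublesSketch", "name": "latency_sketch", "fieldName": "latency", "k": 128 }
```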
@@ -255,7 +255,7 @@ Druid can infer the schema for your data in one of two ways:
 
 You can have Druid infer the schema and types for your data partially or fully by setting `dimensionsSpec.useSchemaDiscovery` to `true` and defining some or no dimensions in the dimensions list.
 
-When performing type-aware schema discovery, Druid can discover all of the columns of your input data (that aren't in
+When performing type-aware schema discovery, Druid can discover all the columns of your input data (that are not present in
 the exclusion list). Druid automatically chooses the most appropriate native Druid type among `STRING`, `LONG`,
 `DOUBLE`, `ARRAY<STRING>`, `ARRAY<LONG>`, `ARRAY<DOUBLE>`, or `COMPLEX<json>` for nested data. For input formats with
 native boolean types, Druid ingests these values as longs if `druid.expressions.useStrictBooleans` is set to `true`
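In spec form, the setting this hunk edits looks roughly like the following (a minimal sketch; the excluded column name is a hypothetical example):

```json
"dimensionsSpec": {
  "useSchemaDiscovery": true,
  "dimensionExclusions": ["internal_debug_field"]
}
```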
@@ -298,7 +298,7 @@ If you previously used string-based schema discovery and want to migrate to type
 ### Including the same column as a dimension and a metric
 
 One workflow with unique IDs is to be able to filter on a particular ID, while still being able to do fast unique counts on the ID column.
-If you are not using schema-less dimensions, this use case is supported by setting the `name` of the metric to something different than the dimension.
+If you are not using schema-less dimensions, this use case is supported by setting the `name` of the metric to something different from the dimension.
 If you are using schema-less dimensions, the best practice here is to include the same column twice, once as a dimension, and as a `hyperUnique` metric. This may involve
 some work at ETL time.
 
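Putting the corrected sentence into practice, the same ID column (here a hypothetical `user_id`) can appear once as a dimension and once as a `hyperUnique` metric whose `name` differs from the dimension name:

```json
"dimensionsSpec": { "dimensions": ["user_id"] },
"metricsSpec": [
  { "type": "hyperUnique", "name": "user_id_unique", "fieldName": "user_id" }
]
```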
