| 1 | +osmzen [](https://travis-ci.org/paulmach/osmzen) [](https://godoc.org/github.com/paulmach/osmzen) |
| 2 | +====== |
| 3 | + |
| 4 | +This is a port of [tilezen/vector-datasource](https://github.com/tilezen/vector-datasource) developed by |
| 5 | +[Mapzen](https://mapzen.com/). It converts [Open Street Map](https://www.openstreetmap.org/) data |
| 6 | +directly into GeoJSON with properties that are understood by [Mapzen house |
| 7 | +styles](https://mapzen.com/products/maps/). |
| 8 | + |
| 9 | + |
| 10 | +A Postgres database is not required to evaluate the logic that is originally defined in a combination |
| 11 | +of SQL and Python. This allows for the quick mapping of any OSM element(s) to a `kind`/`kind_detail` |
| 12 | +normalization. Such a normalization is non-trivial given the "diversity" of OSM tagging so projects |
| 13 | +like tilezen/vector-datasource (and many others) are necessary.
| 14 | + |
| 15 | +The port currently implements almost all features applicable to evaluating zoom 14+ tile data. |
| 16 | +These features include: |
| 17 | + |
| 18 | +* all filter, min_zoom and output logic defined in the `yaml/*.yaml` files, |
| 19 | +* all transforms that apply (implementation-specific data transforms are skipped),
| 20 | +* the CSV matcher post processor to set the `scale_rank` and `sort_rank` properties, |
| 21 | +* geometry clipping and label placement logic. |
| 22 | + |
| 23 | +A lot of post processors still need to be ported, but only a few of the missing ones apply |
| 24 | +to zooms 14+. Missing post processors include: landuse_kind intercuts, merging line strings |
| 25 | +and merging buildings with building parts.
| 26 | + |
| 27 | +It would also be nice to port some of the integration tests as they would give confidence that |
| 28 | +things are really working as expected. Right now there are just some unit tests and some |
| 29 | +high level sanity checks. |
| 30 | + |
| 31 | +#### Changes from the original tilezen/vector-datasource |
| 32 | + |
| 33 | +The goal is for there to be no functional differences for zooms 14+. The YAML definition files are |
| 34 | +unchanged; there are just a few minor changes to the post processor filtering in `queries.yaml`. See
| 35 | +the [github diff](https://github.com/tilezen/vector-datasource/compare/master...paulmach:master). |
| 36 | + |
| 37 | +The port is based on the [v1.4.0ish](https://github.com/tilezen/vector-datasource/releases/tag/v1.4.0)
| 38 | +version of vector-datasource. The [fork](https://github.com/paulmach/vector-datasource) and the
| 39 | +[github diff](https://github.com/paulmach/vector-datasource/compare/master...tilezen:master) between
| 40 | +it and upstream/master are kept as the intended "reference".
| 41 | + |
| 42 | +Usage |
| 43 | +----- |
| 44 | + |
| 45 | +1. Load and compile the `queries.yaml`, `yaml/*.yaml` and `spreadsheets/*_rank/*.csv` files. This can |
| 46 | + be done by loading the files directly using the implied directory structure: |
| 47 | + |
| 48 | + config, err := osmzen.Load("config/queries.yaml") |
| 49 | + |
| 50 | + or if you want to use the "official" ported config files but don't want to distribute them with |
| 51 | + the binary you can make use of the `embeddedconfig` subpackage which uses |
| 52 | + [go-bindata](https://github.com/jteeuwen/go-bindata) to "compile in" the files: |
| 53 | + |
| 54 | + config, err := osmzen.LoadEmbeddedConfig(embeddedconfig.Asset) |
| 55 | + |
| 56 | + If there are mistakes in the YAML the error will contain a lot of information to help debug: |
| 57 | + |
| 58 | + // use a separate name so the original err is still visible in the else branch
| 59 | + if cerr, ok := errors.Cause(err).(*filter.CompileError); ok {
| 60 | +     log.Printf("error: %v", cerr.Error())
| 61 | +     log.Printf("cause: %v", cerr.Cause)
| 62 | +     log.Printf("yaml:\n%s", cerr.YAML()) // chunk of marshalled YAML with the issue
| 63 | + } else if err != nil {
| 64 | +     log.Printf("other err: %v", err)
| 65 | + }
| 65 | + |
| 66 | +2. Process some OSM data: |
| 67 | + |
| 68 | + data := osm.OSM{} |
| 69 | + layers, err := config.Process(data, geo.Bound(-180, 180, -90, 90), zoom) |
| 70 | + |
| 71 | + // layers is defined as `map[string]*geojson.FeatureCollection` |
| 72 | + |
| 73 | + Layers can also be processed individually: |
| 74 | + |
| 75 | + featureCollection, err := config.Layers["buildings"].Process(data, zoom) |
| 76 | + |
| 77 | +The result is a GeoJSON feature collection with `kind`, `kind_detail` etc. properties that |
| 78 | +are understood by [Mapzen house styles](https://mapzen.com/products/maps/). |
| 79 | + |
| 80 | +## Example |
| 81 | + |
| 82 | +A more complete example that loads a zoom 16 area from the OSM API and |
| 83 | +then processes the tile (minus error checking):
| 84 | + |
| 85 | +```go |
| 86 | +package main |
| 87 | + |
| 88 | +import ( |
| 89 | + "context" |
| 90 | + "encoding/json" |
| 91 | + "fmt" |
| 92 | + |
| 93 | + "github.com/paulmach/osmzen" |
| 94 | + "github.com/paulmach/osmzen/embeddedconfig" |
| 95 | + |
| 96 | + "github.com/paulmach/orb/maptile" |
| 97 | + "github.com/paulmach/osm" |
| 98 | + "github.com/paulmach/osm/osmapi" |
| 99 | +) |
| 100 | + |
| 101 | +func main() { |
| 102 | + tile := maptile.New(19613, 29310, 16) |
| 103 | + |
| 104 | + // load osmzen config |
| 105 | + config, _ := osmzen.LoadEmbeddedConfig(embeddedconfig.Asset) |
| 106 | + |
| 107 | + // get osm data for a tile from the official api.
| 108 | + bounds, _ := osm.NewBoundsFromTile(tile) |
| 109 | + data, _ := osmapi.Map(context.Background(), bounds) |
| 110 | + |
| 111 | + // process the data |
| 112 | + // The tile bound is used to exclude interesting nodes
| 113 | + // and labels that fall outside the tile.
| 114 | + layers, _ := config.Process(data, tile.Bound(), tile.Z) |
| 115 | + |
| 116 | + // pretty print the json |
| 117 | + pretty, _ := json.MarshalIndent(layers, "", " ") |
| 118 | + fmt.Println(string(pretty)) |
| 119 | +} |
| 120 | +``` |
| 121 | + |
| 122 | +Implementation details |
| 123 | +---------------------- |
| 124 | + |
| 125 | +At a high level [tilezen/vector-datasource](https://github.com/tilezen/vector-datasource) filters and |
| 126 | +processes its data using the following steps:
| 127 | + |
| 128 | +1. find relevant elements for a layer using the SQL query defined in `data/{layer_name}.jinja`, |
| 129 | +2. filter the elements using filter *conditions* defined in `yaml/{layer_name}.yaml`, |
| 130 | +3. generate properties for each element using the matching filter's output *expressions*, |
| 131 | +4. apply *transforms* to each element independently, |
| 132 | +5. apply *post processes* to all the layers together. |
| 133 | + |
| 134 | +The transforms and post processes that apply to each layer and zoom are defined in `queries.yaml`. |
| 135 | +For a lot more details see the official tilezen/vector-datasource [project |
| 136 | +overview](https://github.com/tilezen/vector-datasource/blob/master/CONTRIBUTING.md). |
| 137 | + |
| 138 | +As this package is a port of that code it follows the same steps, except for step 1 since the data |
| 139 | +is passed in directly. |
| 140 | + |
| 141 | +### Loading and compiling config |
| 142 | + |
| 143 | +During the loading of the YAML+CSV config files everything is compiled to make sure all the |
| 144 | +expressions and function references are known. If there is a typo, or something new/unsupported, an |
| 145 | +error will be returned. See above for how to get useful information from the error. The initial |
| 146 | +compile step allows for the checking of config errors at startup. Also since the types are converted |
| 147 | +up front there is a nice performance boost of about 10x. |
| 148 | + |
| 149 | +The filters and outputs defined in the `yaml/*.yaml` files are basically a set of statements that |
| 150 | +act like: "if the element tags look like this, output these kind, kind_detail, etc. properties". |
| 151 | + |
| 152 | +The filters define a condition, yes/no matching, that evaluates into a boolean. During the compile |
| 153 | +step these are converted into concrete types that implement the `filter.Condition` interface. The |
| 154 | +interface is defined as: |
| 155 | + |
| 156 | + type filter.Condition interface { |
| 157 | + Eval(*filter.Context) bool |
| 158 | + } |
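
To make the shape of a compiled condition concrete, here is a minimal, self-contained sketch. It does not use the real `filter` package: `Context`, `tagEquals` and `allOf` are hypothetical stand-ins invented for illustration, and only the `Eval(*Context) bool` shape is taken from the interface above.

```go
package main

import "fmt"

// Context is a simplified, hypothetical stand-in for filter.Context.
// It only carries the OSM tags of the element being evaluated.
type Context struct {
	Tags map[string]string
}

// Condition mirrors the shape of the interface described above.
type Condition interface {
	Eval(*Context) bool
}

// tagEquals (hypothetical) matches when a tag has exactly the given value.
type tagEquals struct{ key, value string }

func (c tagEquals) Eval(ctx *Context) bool {
	return ctx.Tags[c.key] == c.value
}

// allOf (hypothetical) matches when every child condition matches.
type allOf []Condition

func (a allOf) Eval(ctx *Context) bool {
	for _, c := range a {
		if !c.Eval(ctx) {
			return false
		}
	}
	return true
}

func main() {
	cond := allOf{
		tagEquals{"highway", "residential"},
		tagEquals{"tunnel", "yes"},
	}
	ctx := &Context{Tags: map[string]string{
		"highway": "residential",
		"tunnel":  "yes",
	}}
	fmt.Println(cond.Eval(ctx))
}
```

Because the conditions are plain values built once, a typo in the config can be rejected while the tree is being constructed rather than when data is evaluated.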
| 159 | + |
| 160 | +The output for each filter defines what properties should be assigned to the element's GeoJSON |
| 161 | +feature. They output things such as booleans (is_tunnel), strings (kind), numbers (area) or nil to |
| 162 | +be ignored. The interface is defined as: |
| 163 | + |
| 164 | + type filter.Expression interface {
| 165 | + Eval(*filter.Context) interface{} |
| 166 | + } |
| 167 | + |
| 168 | + type filter.NumExpression interface { |
| 169 | + filter.Expression |
| 170 | + EvalNum(*filter.Context) float64 |
| 171 | + } |
| 172 | + |
| 173 | +The `filter.NumExpression` is also implemented by expressions that must be a number (e.g. area, |
| 174 | +building height). Using it avoids a type assertion on the `interface{}` result when we know we
| 175 | +need numbers, for example in the `min` and `max` expressions.
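
The following sketch illustrates why a numeric interface is convenient for something like `min`. It is not the package's code: `Context`, `constant`, `area` and `minExpr` are hypothetical names, and only the two interface shapes come from the definitions above.

```go
package main

import (
	"fmt"
	"math"
)

// Context is a simplified, hypothetical stand-in for filter.Context.
type Context struct {
	Area float64 // pretend this was computed once and cached
}

// Expression and NumExpression mirror the shapes described above.
type Expression interface {
	Eval(*Context) interface{}
}

type NumExpression interface {
	Expression
	EvalNum(*Context) float64
}

// constant (hypothetical) is a fixed number from the config.
type constant float64

func (c constant) Eval(*Context) interface{} { return float64(c) }
func (c constant) EvalNum(*Context) float64  { return float64(c) }

// area (hypothetical) reads the cached area from the context.
type area struct{}

func (area) Eval(ctx *Context) interface{} { return ctx.Area }
func (area) EvalNum(ctx *Context) float64  { return ctx.Area }

// minExpr (hypothetical) returns the smallest of its arguments.
// It calls EvalNum directly, so no interface{} value has to be
// unpacked with a type assertion for each argument.
type minExpr []NumExpression

func (m minExpr) EvalNum(ctx *Context) float64 {
	result := math.Inf(1)
	for _, e := range m {
		result = math.Min(result, e.EvalNum(ctx))
	}
	return result
}

func (m minExpr) Eval(ctx *Context) interface{} { return m.EvalNum(ctx) }

func main() {
	expr := minExpr{area{}, constant(100)}
	fmt.Println(expr.EvalNum(&Context{Area: 250})) // prints 100
}
```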
| 176 | + |
| 177 | +The `filter.Context` is passed in at runtime and contains info about the element being evaluated |
| 178 | +like the OSM tags and geometry. It also caches "expensive" things like the area and volume that can |
| 179 | +be used by multiple filters. |
| 180 | + |
| 181 | +### Transforms and post processes
| 182 | + |
| 183 | +After elements for a layer are matched and GeoJSON features are created, a set of transforms is |
| 184 | +applied. The transforms edit the element properties based on some logic, sometimes requiring the |
| 185 | +set of relations the original OSM element is a member of. |
| 186 | + |
| 187 | +The **transforms** are matched while loading the config to a function of the form: |
| 188 | + |
| 189 | + func(*filter.Context, *geojson.Feature) |
| 190 | + |
| 191 | +Transforms can only change a feature; they can't remove a feature if it's "bad" for any reason, like
| 192 | +too small for the zoom. Transforms also don't know about other features so they can't be used to |
| 193 | +remove duplicates or merge features, like parts of the same road. However, transforms can be used to |
| 194 | +do things like fix one-way direction, add the correct highway shield text, abbreviate road names, |
| 195 | +etc. |
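
As a rough illustration of the transform shape, the sketch below edits a feature's properties in place. `Context`, `Feature` and `abbreviateStreet` are hypothetical stand-ins (the real functions take a `*filter.Context` and a `*geojson.Feature`), and the abbreviation rule is invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Context and Feature are simplified, hypothetical stand-ins for
// filter.Context and geojson.Feature.
type Context struct {
	Tags map[string]string
}

type Feature struct {
	Properties map[string]interface{}
}

// Transform mirrors the function form described above: it may edit
// the feature in place but has no way to drop it or see other features.
type Transform func(*Context, *Feature)

// abbreviateStreet (hypothetical) shortens a trailing "Street" in the name.
func abbreviateStreet(_ *Context, f *Feature) {
	name, ok := f.Properties["name"].(string)
	if !ok {
		return
	}
	if strings.HasSuffix(name, " Street") {
		f.Properties["name"] = strings.TrimSuffix(name, " Street") + " St."
	}
}

func main() {
	f := &Feature{Properties: map[string]interface{}{"name": "Main Street"}}
	for _, t := range []Transform{abbreviateStreet} {
		t(&Context{}, f)
	}
	fmt.Println(f.Properties["name"])
}
```

The function has no return value, which matches the constraint that a transform cannot remove a feature.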
| 196 | + |
| 197 | +The **post processes** are compiled to load files and check the parameters. They are mapped to an |
| 198 | +object implementing the `postprocess.Function` interface defined as: |
| 199 | + |
| 200 | + type postprocess.Function interface { |
| 201 | + Eval(*postprocess.Context, map[string]*geojson.FeatureCollection) |
| 202 | + } |
| 203 | + |
| 204 | +The function takes all the layers as input. Some examples of post processing are clipping to the |
| 205 | +tile bounds, setting sort_rank and scale_rank, removing duplicate features, removing small areas, |
| 206 | +merging lines, etc. |
| 207 | + |
| 208 | +### Evaluating some data |
| 209 | + |
| 210 | +Once everything is set up we can start evaluating data against the filters and apply the
| 211 | +transforms and post processes. The input is OSM data, a bound, plus a zoom. The bound is used to |
| 212 | +clip geometry and check if a label should be included. The zoom is used to filter out |
| 213 | +things that are "too small" as defined by the `min_zoom` output in the `yaml/*.yaml` files. To |
| 214 | +include everything, use a high zoom, such as 20. |
| 215 | + |
| 216 | +The evaluation proceeds in the following steps: |
| 217 | + |
| 218 | +1. Convert OSM data to GeoJSON |
| 219 | + |
| 220 | + The data is run through [osm/osmgeojson](https://github.com/paulmach/osm/tree/master/osmgeojson) |
| 221 | + which is a port of the [osmtogeojson](https://github.com/tyrasd/osmtogeojson) node.js library. |
| 222 | + This groups nodes into ways and ways into polygons. For example, we don't care about the 4 nodes |
| 223 | + that define a building, we just want the building polygon. |
| 224 | + |
| 225 | +2. Run each OSM element GeoJSON feature through the filters |
| 226 | + |
| 227 | + We find the first filter in each layer to match and then compute the filter's outputs. Note, |
| 228 | + that an element can match in multiple layers, for example a building polygon and a POI. |
| 229 | + The input and output are both GeoJSON, however, the input contains properties based on OSM tags, |
| 230 | + but the output has properties from the filter like the `kind` and `kind_detail` etc. |
| 231 | + |
| 232 | +3. Apply the transforms |
| 233 | + |
| 234 | + The new GeoJSON object is updated a bit. This can include reversing the geometry or simplifying |
| 235 | + the name. |
| 236 | + |
| 237 | +4. Apply the post processes to all the layers. |
| 238 | + |
| 239 | +The end result is a layer, or set of layers, that matches those produced by `tilezen`.
| 240 | +Note that this whole process can be applied to a single element. |
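
The first-match and `min_zoom` behaviour in steps 2 and onward can be sketched as below. This is a simplified model, not the port's code: `Context`, `Filter` and `firstMatch` are hypothetical, and the exact `min_zoom` comparison used by the package is an assumption here.

```go
package main

import "fmt"

// Context is a simplified, hypothetical stand-in for filter.Context.
type Context struct {
	Tags map[string]string
}

// Filter (hypothetical) pairs a match function with the outputs it produces.
type Filter struct {
	Match   func(*Context) bool
	MinZoom float64
	Kind    string
}

// firstMatch returns the output properties of the first filter that
// matches. It returns nil if nothing matches, or if the matched
// filter's min_zoom is above the requested zoom (assumed comparison).
func firstMatch(filters []Filter, ctx *Context, zoom float64) map[string]interface{} {
	for _, f := range filters {
		if !f.Match(ctx) {
			continue
		}
		if f.MinZoom > zoom {
			return nil
		}
		return map[string]interface{}{"kind": f.Kind, "min_zoom": f.MinZoom}
	}
	return nil
}

func main() {
	hasTag := func(key string) func(*Context) bool {
		return func(ctx *Context) bool { _, ok := ctx.Tags[key]; return ok }
	}
	filters := []Filter{
		{Match: hasTag("building:part"), MinZoom: 16, Kind: "building_part"},
		{Match: hasTag("building"), MinZoom: 13, Kind: "building"},
	}
	ctx := &Context{Tags: map[string]string{"building": "yes"}}
	fmt.Println(firstMatch(filters, ctx, 14)["kind"])
}
```

Passing a high zoom such as 20 makes the `min_zoom` check pass for every filter in this model, which corresponds to the "include everything" advice above.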
| 241 | + |
| 242 | +### Benchmarks |
| 243 | + |
| 244 | +The first two benchmarks evaluate a single element against ALL the filters and outputs |
| 245 | +in that layer. Normally you can stop after the first match and only evaluate that one output. |
| 246 | +The third benchmark is more typical of normal usage and converts data from a zoom 16 tile.
| 247 | + |
| 248 | +``` |
| 249 | +BenchmarkBuildings-4 200000 9969 ns/op 1040 B/op 42 allocs/op |
| 250 | +BenchmarkPOIs-4 10000 171457 ns/op 6816 B/op 450 allocs/op |
| 251 | +BenchmarkFullTile-4 100 11292314 ns/op 3611916 B/op 26555 allocs/op |
| 252 | +``` |
| 253 | + |
| 254 | +These benchmarks were run on a 2017 MacBook Pro with a 3.1 GHz processor and 8 GB of RAM.
| 255 | +No concurrency is used in this package. |
| 256 | + |
| 257 | +#### This library makes use of the following packages: |
| 258 | + |
| 259 | +* [github.com/pkg/errors](https://github.com/pkg/errors) - for rich errors with stack traces |
| 260 | +* [gopkg.in/yaml.v2](http://gopkg.in/yaml.v2) - YAML parsing |
| 261 | +* [github.com/paulmach/orb](https://github.com/paulmach/orb) - geometry area, centroid, clipping, etc. |
| 262 | +* [github.com/paulmach/osm](https://github.com/paulmach/osm) |