Commit f2e66b6

author
Lara Chiara Ost
committed
SoCG
1 parent be5dd8c commit f2e66b6

7 files changed, +101 -37 lines changed

README.md

+26 -5
@@ -1,6 +1,6 @@
 # Banana Trees
 
-This implements the banana tree data structure introduced by Cultrera di Montesano, Edelsbrunner, Henzinger and Ost at SODA 2024.
+This implements the banana tree data structure introduced in Cultrera di Montesano et al. "Dynamically Maintaining the Persistent Homology of Time Series" at SODA 2024.
 
 # Build Instructions
 
@@ -39,9 +39,8 @@ The paramters $m$ and $D$ are user specified (`-m` and `-d`, respectively).
 The size of the left interval relative to the total time series is given by the option `-c`.
 
 Both `ex_local_maintenance` and `ex_topological_maintenance` can run worst-case scenarios by selecting the appropriate subcommand.
-The `num_items` option works slightly differently in these executables than described in the help string:
-the format is `min number_of_divisions max`; `number_of_divisions` values are selected from the interval `[min, max]`,
-such that they are spaced evenly on a logarithmic scale.
+They use the correct generator by default.
+To mix a random walk into the input, use generators `local-wc` and `cut-wc`, respectively, with the appropriate options; see `docs/generators.md` for details.
 
 `ex_time_series construct` reads a time series from standard input in the form of a sequence of function values and constructs the banana tree.
 
@@ -53,5 +52,27 @@ This output can be converted into a csv-file using the python script `tools/stru
 # License
 
 This repository, except files in `ext/`, is published under the MIT license.
-
 See the files in `ext/` for the respective licenses.
+
+If you publish results using our algorithms, please acknowledge our work by citing the corresponding papers:
+```
+@inproceedings{cultrera24,
+author = {Sebastiano Cultrera di Montesano and Herbert Edelsbrunner and Monika Henzinger and Lara Ost},
+title = {Dynamically Maintaining the Persistent Homology of Time Series},
+booktitle = {Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)},
+year = {2024},
+chapter = {},
+pages = {243-295},
+doi = {10.1137/1.9781611977912.11},
+}
+@misc{ost25,
+title={Banana Trees for the Persistence in Time Series Experimentally},
+author={Lara Ost and Sebastiano Cultrera di Montesano and Herbert Edelsbrunner},
+year={2025},
+eprint={2405.17920},
+archivePrefix={arXiv},
+primaryClass={cs.DS},
+note={To appear},
+url={https://arxiv.org/abs/2405.17920}
+}
+```

docs/generators.md

+48 -10
@@ -1,26 +1,49 @@
 # Synthetic Data Generators
 
+The general format of the string passed to `--gen-args` is `<gen>:<list-of-params>`,
+where `<gen>` is the name of the generator and `<list-of-params>` is a semicolon-separated list of values.
+Parameters have to be provided in the given order.
+Some may be optional, but if one optional parameter is omitted, all subsequent ones must be omitted, too.
+Numerical parameters are converted to a number by `std::stod(...)`.
+
 ## Gaussian Random Walks
 
-Generate a random walk where $f(x_i) = f(x_{i-1}) + \cal{N}(\mu, \sigma)$.
+Generate a random walk where $R(x_i;\mu,\sigma) = R(x_{i-1};\mu,\sigma) + \cal{N}(\mu, \sigma)$,
+with $R(0;\mu,\sigma) = 0$.
+The term $\cal{N}(\mu,\sigma)$ denotes a Gaussian random variable with mean $\mu$ and standard deviation $\sigma$.
 
 Pass to `--gen_args` as
 ```
 grw:<mean>;<sd>
 ```
 `<mean>` is the mean of the normal distribution, `<sd>` its standard deviation.
-All parameters (`<...>`) are converted to a number by `std::stod(...)`.
-Parameter `<sd>` is optional.
+Parameter `<sd>` is optional with default `1`.
+
+## Linear Case for Local Maintenance and Topological Maintenance
+
+Pass to `--gen_args` as
+```
+<gen>:<noise>;<mean>;<sd>
+```
 
-## Linear Case for Local Maintenance
+Here, `<gen>` is either `local-wc` or `cut-wc`.
+`<noise>` is the amount of noise mixed into the signal: 0 for no noise, 1 for only noise.
+The noise is a Gaussian random walk with mean `<mean>` and standard deviation `<sd>`.
+All parameters (except `<gen>`) are optional.
 
-Pass to `--gen_args` as `local-wc`.
+Default values are:
+- `<noise>`: `0`
+- `<mean>`: `0`
+- `<sd>`: `1`
+
+The executable `generate_data` also accepts `glue-wc` for `<gen>`; this produces the same series as `cut-wc`.
 
 ## Quasi-Periodic Functions
 
 ### Method 1 -- Sine Wave + Random Walk
 
 $f(x) = a \cdot sin(\omega x) + R(x; \mu, \sigma)$, where $R(x; \mu, \sigma)$ is a Gaussian random walk.
+Items are generated for $x=0,1,\dots$.
 
 Parameters:
 - $a$: amplitude of the sine wave
@@ -32,13 +55,19 @@ Pass to `--gen_args` as
 ```
 sqp:<period>;<amplitude>;<mean>;<sd>
 ```
-All parameters (`<...>`) are converted to a number by `std::stod(...)`.
-All are optional, but must be specified in the given order,
-i.e., if `<mean>` is given, then `<period>` and `<amplitude>` cannot be omitted.
+All parameters are optional, but must be specified in the given order,
+e.g., if `<mean>` is given, then `<period>` and `<amplitude>` cannot be omitted.
+
+Default values are:
+- `<period>`: `100`
+- `<amplitude>`: `1`
+- `<mean>`: `0`
+- `<sd>`: `1`
 
 ### Method 2 -- Sine Wave Modulating a Random Walk
 
 $f(x) = R(x; a \cdot sin(wx), \sigma)$, where $R(x; \mu, \sigma)$ is a Gaussian random walk.
+Items are generated for $x=0,1,\dots$.
 
 Parameters:
 - $a$: amplitude of the sine wave
@@ -49,6 +78,15 @@ Pass to `--gen_args` as
 ```
 mqp:<number of periods>;<amplitude>;<sd>
 ```
-All parameters (`<...>`) are converted to a number by `std::stod(...)`.
 All are optional, but must be specified in the given order,
-i.e., if `<sd>` is given, then `<number of periods>` and `<amplitude>` cannot be omitted.
+e.g., if `<sd>` is given, then `<number of periods>` and `<amplitude>` cannot be omitted.
+
+Default values are:
+- `<number of periods>`: `5.5`
+- `<amplitude>`: `1`
+- `<sd>`: `1`
+
+### A Note
+Both of these methods do essentially the same thing.
+For "historical" reasons `sqp` takes the period of the sine wave as input, while in `mqp` you select the number of periods over the whole input.
+Furthermore, `sqp` allows to bias the random walk, which `mqp` does not.
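
The parameter rules described in `docs/generators.md` (positional, semicolon-separated, trailing parameters optional, numbers converted with `std::stod`) can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's parser: `parsed_gen_args` and `parse_gen_args` are hypothetical names, and the default values are supplied by the caller. Note that the option is registered as `--gen-args` in `cli_options.h`, while several lines of the docs write `--gen_args`.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: generator name plus numeric parameters.
struct parsed_gen_args {
    std::string name;
    std::vector<double> params;
};

// Hypothetical helper: parses "<gen>:<p1>;<p2>;..." and fills in defaults
// for every trailing parameter that was omitted.
inline parsed_gen_args parse_gen_args(const std::string& arg,
                                      std::vector<double> defaults) {
    assert(!arg.empty() && "generator string must not be empty");
    parsed_gen_args result;
    const auto colon = arg.find(':');
    result.name = arg.substr(0, colon);   // whole string if there is no ':'
    result.params = std::move(defaults);  // start from the defaults
    if (colon == std::string::npos) {
        return result;                    // no parameter list given
    }
    std::istringstream list{arg.substr(colon + 1)};
    std::string token;
    std::size_t index = 0;
    // Overwrite defaults in order; surplus tokens are ignored.
    while (std::getline(list, token, ';') && index < result.params.size()) {
        result.params[index++] = std::stod(token);
    }
    return result;
}
```

For example, `parse_gen_args("sqp:50;2", {100.0, 1.0, 0.0, 1.0})` overrides `<period>` and `<amplitude>` and keeps the documented `sqp` defaults for `<mean>` and `<sd>`. An empty token, as in `sqp:;2`, makes `std::stod` throw `std::invalid_argument`, which is consistent with the rule that a parameter can only be omitted together with everything after it.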

src/app/experiments/ex_local_maintenance.cpp

+2 -2
@@ -246,7 +246,7 @@ int main(int argc, char** argv) {
 
     CLI::App app("Construction Experiments");
 
-    add_num_items_option(app, num_item_limits) -> required();
+    add_logspace_num_items_option(app, num_item_limits) -> required();
     add_seed_option(app, seed);
     add_num_reps_option(app, num_reps);
     add_gudhi_flag(app, run_gudhi);
@@ -272,7 +272,7 @@ int main(int argc, char** argv) {
     CLI11_PARSE(app, argc, argv);
 
     if (num_item_limits[0] < 2 || num_item_limits[1] == 0 || num_item_limits[2] < num_item_limits[0]) {
-        std::cerr << "num_items needs to be of the form min step max, with min >= 2, step >= 1 and max >= min.\n";
+        std::cerr << "num_items needs to be of the form min number_of_steps max, with min >= 2, number_of_steps >= 1 and max >= min.\n";
         std::cerr << app.help() << std::endl;
         return 1;
     }

src/app/experiments/ex_topological_maintenance.cpp

+10 -10
@@ -252,7 +252,7 @@ int main(int argc, char** argv) {
 
     add_seed_option(app, seed);
     add_num_reps_option(app, num_reps);
-    add_num_items_option(app, num_item_limits) -> required();
+    add_logspace_num_items_option(app, num_item_limits) -> required();
     add_gudhi_flag(app, run_gudhi);
     add_persistence1d_flag(app, run_persistence1d);
     auto* gen_opt = add_gen_args_option(app, generator_args);
@@ -275,7 +275,7 @@ int main(int argc, char** argv) {
     const auto step_num_items = num_item_limits[1];
     const auto max_num_items = num_item_limits[2];
     if (min_num_items < 2 || step_num_items == 0 || max_num_items < min_num_items) {
-        std::cerr << "num_items needs to be of the form min step max, with min >= 2, step >= 1 and max >= min.\n";
+        std::cerr << "num_items needs to be of the form min number_of_steps max, with min >= 2, number_of_steps >= 1 and max >= min.\n";
         std::cerr << app.help() << std::endl;
         return 1;
     }
@@ -334,18 +334,18 @@ int main(int argc, char** argv) {
         std::cout << "# Linear-time case for cutting.\n";
         for (auto num_items: logspace_items) {
             // Need an odd number of items
-            if (num_items % 2 == 0) {
+            if (num_items % 2 != 0) {
                 num_items++;
             }
-            if (num_items % 4 != 1) {
+            if (num_items % 4 != 0) {
                 num_items += 2;
             }
-            massert(num_items % 4 == 1, "Expected number of items to be a number divisible by 4 plus 1.");
+            massert(num_items % 4 == 0, "Expected number of items to be divisible by 4.");
             cut_experiment<topological_worst_case_generator<false, decltype(rng)>>(num_items, 0.5, {rng, gen_param_string}, num_reps, run_gudhi, run_persistence1d);
             std::cout << "--\n";
         }
     } else if (app.got_subcommand(wc_glue_app)) {
-        if (*gen_opt && gen_name != topological_worst_case_generator<true>::get_name()) {
+        if (*gen_opt && gen_name != topological_worst_case_generator<false>::get_name()) {
             std::cerr << "wc-glue app requires cut-wc generator.\n";
             return 1;
         }
@@ -356,14 +356,14 @@ int main(int argc, char** argv) {
         std::cout << "# Linear-time case for gluing.\n";
         for (auto num_items: logspace_items) {
             // Need an odd number of items
-            if (num_items % 2 == 0) {
+            if (num_items % 2 != 0) {
                 num_items++;
             }
-            if (num_items % 4 != 1) {
+            if (num_items % 4 != 0) {
                 num_items += 2;
             }
-            massert(num_items % 4 == 1, "Expected number of items to be a number divisible by 4 plus 1.");
-            glue_experiment<topological_worst_case_generator<true, decltype(rng)>>(num_items, 0.5, {rng, gen_param_string}, num_reps, run_gudhi, run_persistence1d);
+            massert(num_items % 4 == 0, "Expected number of items to be divisible by 4.");
+            glue_experiment<topological_worst_case_generator<false, decltype(rng)>>(num_items, 0.5, {rng, gen_param_string}, num_reps, run_gudhi, run_persistence1d);
             std::cout << "--\n";
         }
     }
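
The adjusted checks in both loops move `num_items` up to the next multiple of 4. The unchanged context comment still reads "Need an odd number of items", which described the old behaviour. A standalone sketch of that arithmetic, using the hypothetical name `round_up_to_multiple_of_4` and a plain `assert` in place of the repository's `massert`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the two-step adjustment in the loops above.
inline std::size_t round_up_to_multiple_of_4(std::size_t num_items) {
    if (num_items % 2 != 0) {
        num_items++;      // odd -> next even number
    }
    if (num_items % 4 != 0) {
        num_items += 2;   // even but 2 mod 4 -> next multiple of 4
    }
    assert(num_items % 4 == 0);
    return num_items;
}
```

The two-step adjustment gives the same result as `(n + 3) / 4 * 4`; for example, 5, 6, 7 and 8 all map to 8.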

src/app/experiments/generate_data.cpp

+6 -6
@@ -70,6 +70,10 @@ int main(int argc, char** argv) {
             std::cerr << "Need at least 4 items for topological maintenance worst-case\n";
             return 1;
         }
+        if (num_items % 4 != 0) {
+            std::cerr << "Number of items needs to be a multiple of 4 for topological maintenance worst-case.\n";
+            return 1;
+        }
         random_number_generator rng{seed};
         topological_worst_case_generator<false> gen{{rng, gen_param_string}};
         gen(values, num_items);
@@ -78,12 +82,8 @@ int main(int argc, char** argv) {
             std::cerr << "Need at least 4 items for topological maintenance worst-case\n";
             return 1;
         }
-        if (num_items % 2 == 0) {
-            std::cerr << "Need an odd number of items for topological maintenance worst-case\n";
-            return 1;
-        }
-        if ((num_items/2) % 2 != 0) {
-            std::cerr << "Number of items needs to be a multiple of 4 plus 1 for topological maintenance worst-case.\n";
+        if (num_items % 4 != 0) {
+            std::cerr << "Number of items needs to be a multiple of 4 for topological maintenance worst-case.\n";
             return 1;
         }
         random_number_generator rng{seed};

src/app/experiments/utility/cli_options.h

+6
@@ -33,6 +33,12 @@ inline CLI::Option* add_num_items_option(CLI::App& app, std::array<size_t, 3>& n
                           "Number of items in the form 'min step max'");
 }
 
+inline CLI::Option* add_logspace_num_items_option(CLI::App& app, std::array<size_t, 3>& num_item_limits) {
+    return app.add_option("num_items",
+                          num_item_limits,
+                          "Number of items in the form 'min number_of_steps max'; uniformly spaced on a logarithmic scale");
+}
+
 inline CLI::Option* add_gen_args_option(CLI::App& app, std::string& generator_args) {
     return app.add_option("--gen-args",
                           generator_args,
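
The new help string says the `num_items` values are uniformly spaced on a logarithmic scale. How the repository computes them is not part of this diff. The sketch below shows one common way to do it, assuming both endpoints are included, results are rounded to the nearest integer, and a single step returns only the minimum. `logspace_num_items` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: `count` values between min_items and max_items,
// evenly spaced in log space (assumption: endpoints included).
inline std::vector<std::size_t> logspace_num_items(std::size_t min_items,
                                                   std::size_t count,
                                                   std::size_t max_items) {
    // Same constraints as the error message in the experiment executables.
    assert(min_items >= 2 && count >= 1 && max_items >= min_items);
    std::vector<std::size_t> items;
    items.reserve(count);
    if (count == 1) {
        items.push_back(min_items);  // assumption: one step means "min only"
        return items;
    }
    const double log_min = std::log(static_cast<double>(min_items));
    const double log_max = std::log(static_cast<double>(max_items));
    for (std::size_t i = 0; i < count; ++i) {
        // t runs from 0 to 1 in equal increments.
        const double t = static_cast<double>(i) / static_cast<double>(count - 1);
        const double value = std::exp(log_min + t * (log_max - log_min));
        items.push_back(static_cast<std::size_t>(std::llround(value)));
    }
    return items;
}
```

With these assumptions, `logspace_num_items(10, 3, 1000)` yields 10, 100 and 1000.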

src/app/experiments/utility/data_generation.h

+3 -4
@@ -450,12 +450,11 @@ struct topological_worst_case_generator {
             decrease = false;
             long value = 1;
             auto size_offset = two_stage ? values.size() : 0;
-            while (values.size() < target_size - 1 + size_offset) {
-                values.push_back(static_cast<function_value_type>(value) + 0.1);
-                values.push_back(static_cast<function_value_type>(-value) + 0.1);
+            while (values.size() < target_size + size_offset) {
+                values.push_back(static_cast<function_value_type>(value) - 0.1);
+                values.push_back(static_cast<function_value_type>(-value) - 0.1);
                 value++;
             }
-            values.push_back(static_cast<function_value_type>(value) + 0.1);
         }
         if (noise_amount != 0) {
             auto scale = stage_size/2;
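
Taken in isolation, the rewritten loop appends the pair `value - 0.1`, `-value - 0.1` until the target size is reached. Because the trailing `push_back` was removed, the loop alone produces an even number of items, in line with the new multiple-of-4 requirement. The sketch below covers only this loop, assuming `function_value_type` is `double` and ignoring `size_offset`, any code before the hunk, and the later noise step. `zigzag_values` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone version of the loop in the hunk above.
inline std::vector<double> zigzag_values(std::size_t target_size) {
    assert(target_size % 2 == 0);  // items are appended two at a time
    std::vector<double> values;
    values.reserve(target_size);
    long value = 1;
    while (values.size() < target_size) {
        values.push_back(static_cast<double>(value) - 0.1);   // peak
        values.push_back(static_cast<double>(-value) - 0.1);  // valley
        ++value;
    }
    return values;
}
```

For instance, `zigzag_values(4)` produces approximately 0.9, -1.1, 1.9, -2.1, an oscillation whose amplitude grows by one per pair.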
