Releases: ovis-hpc/ldms

Release 4.4.5

15 Feb 23:25

What's Changed

  • [b4.4] Fix ldmsd_controller completion without ldmsd connection by @narategithub in #1438
  • Fix incorrect remote hostname and port produced by ldms_xprt_names() by @nichamon in #1440
  • fix device group detection for mellanox in filesingle in b4.4 on CTS2 machines by @baallan in #1441
  • ldmsd hanging on delete_thread fix by @jennfshr in #1449
  • python2 to 3 conversion backport to b4.4 by @jennfshr in #1451
  • Fix ldms_ls crash dumping a decomposition by @tom95858 in #1455
  • Introduce Sampler Advertisement by @nichamon in #1459
  • Perform validity checks on set name by @tom95858 in #1465
  • Fix set_sec_mod command error response by @narategithub in #1466
  • Fix libfabric build option (b4.4 backport) by @morrone in #1473
  • Prevent use-after-free in stream publisher RBN key by @nichamon in #1479
  • Document LDMS_DELET_TIMEOUT environment variable by @nichamon in #1482
  • Address mixed use of ldmsd_log() and ovis_log() by @nichamon in #1481
  • (for b4.4) Add ldms-containers dispatcher on release by @narategithub in #1478
  • ovis-roll-over.py conversion to python3 (backport to b4.4) by @morrone in #1467
  • b4.4 bug: Fix free/put of uninitialized pointers in ldmsd_request error handling paths by @baallan in #1484
  • Improve error reporting in ldmsd_controller by @tom95858 in #1493
  • [b4.4] Fix static decomposition by @narategithub in #1490
  • b4.4 bug: Store csv lents by @baallan in #1487
  • wip: b4.4 bug: blob_stream_writer plugin: stop generating empty files by @baallan in #1485
  • Expand characters disallowed in set names by @tom95858 in #1494
  • [b4.4] Set record_sampler non-array metric array_len spec to 0 by @narategithub in #1505
  • [b4.4] Fix "fill" handling in static decomposition by @narategithub in #1500
  • [b4.4] rdc_sampler fixes by @morrone in #1503
  • b4.4 bug: deprecate -t since it is gone in main, mixes poorly with -l, and exits when there is no preexisting log by @baallan in #1488
  • [b4.4] Fix value increment in test_sampler by @narategithub in #1499
  • [b4.4] Fix incorrect type mapping in store_sos by @narategithub in #1498
  • b4.4 deprecation fix: fix warnings about using reconnect instead of interval by @baallan in #1480
  • [b4.4] static decomposition man update by @narategithub in #1515
  • [b4.4] Update mean operation in decomp_static by @narategithub in #1511
  • Make ldmsd continue startup on producer's hostname resolution failure by @nichamon in #1492
  • [b4.4] record_sampler with_name option by @narategithub in #1510
  • Fix SEGV when 'group' not specified. by @narategithub in #1521
  • [b4.4] ldmsd row cache cleanup by @narategithub in #1525
  • Add multi-schema support to the json_stream_sampler by @tom95858 in #1516
  • Fix Sampler Advertisement bugs by @nichamon in #1529
  • Improve spank notifier transport error handling by @tom95858 in #1523
  • [b4.4] slingshot_metrics: don't return an error from sample() when a device's counter lookup fails by @morrone in #1535
  • slingshot_metrics: include rc in log message by @morrone in #1539
  • store_sos: Replace assertion with type mismatch error message by @nichamon in #1528
  • fix free of uninit pointer in json sampler callback err handling by @baallan in #1545
  • [b4.4] rowcache cleanup note and doc by @narategithub in #1541
  • Rename and update Sampler Advertisement man page by @nichamon in #1531
  • Fix set_route handler in ldmsd Python module and interface by @nichamon in #1552
  • Document format 3 and add option to dump format examples. by @baallan in #1513
  • Standardize error response in three request handlers by @nichamon in #1544
  • Add requests to enable/disable streams communication by @tom95858 in #1543
  • Yaml support by @nick-enoent in #1461
  • Fix python definition of STREAM_DISABLE message by @tom95858 in #1554
  • update test scripts to use reconnect by @baallan in #1556
  • Add the column header 'Type' to the prdcr_status output by @nichamon in #1560
  • Add 'cache_ip' to the optional attribute list of prdcr_add by @nichamon in #1558
  • Use peer's hostname and listen port for advertised producer's names by @nichamon in #1557
  • blob writer: reduce minimum rollover interval from 1 min to 5 sec, for latency by @baallan in #1561
  • Fix how thread_stats and updtr_status handle 'reset' in ldmsd_controller by @nichamon in #1562
  • Add the stats command to ldmsd_controller man page by @nichamon in #1563
  • Enhance permission string parsing by @morrone in #1565
  • fix errno report in blobwriter, stop warning about repeat subscribe by @baallan in #1567
  • Update YAML permissions in configuration by @nick-enoent in #1564
  • [b4.4] Add Cython3 support (still support Cython 0.29) by @narategithub in #1572
  • Bug fix when "stores" is not present in YAML config by @nick-enoent in #1576
  • Bug fix for updater parsing by @nick-enoent in #1581
  • Regex matching for YAML configuration by @nick-enoent in #1586
  • Correct stat() error check in decomposition library loading by @nichamon in #1585
  • Prevent race between update scheduling and set deletion by @nichamon in #1590
  • Prevent race between look-up and set deletion by @nichamon in #1592
  • Add parameter to account for local mode when using --generate-config-path by @nick-enoent in #1591
  • Prevent deadlock between xprt_list_lock and Zap credit lock by @nichamon in #1603
  • Fix json_entity_dump() in ovis_json by @nichamon in #1602
  • Replace assertions with log messages in prdcr_stream_status handling by @nichamon in #1601
  • Use maestro ver OVIS-4.4.5 to build containers in github action by @narategithub in #1605

Full Changelog: v4.4.4...b4.4.5

v4.4.4

03 Sep 16:45
Release OVIS-4.4.4

* Decomposition fixes
* Fix a leak in store_sos.c
* Workaround for GitHub workflow dropping CentOS7 support
* Make ldmsd_controller exit on connection error
* Update kokkos_appmon.c to not pad extra characters to node / name
* Add 'reconnect' to prdcr_start* to deprecate 'interval'
* Make LDMSD support interval strings
* Add 'reconnect' in prdcr_add to deprecate 'interval'
* Refactor JSON parsing to use libjansson
* Support for operators in storage decomposition
* A port of `json_stream_sampler` from OVIS-4
* Add lsf/slurm to linux_proc_sampler.job test and fix nits in format 3
* Add format 3 for better commonality across start/end & resource managers.
* Add rollover to blob_stream_writer in b4.4

v4.4.3

13 May 16:30

This is Release 4.4.3

It includes the following features and fixes to 4.4.2:

  • Features

    • Updates to the Darshan Store fields
    • Updates to the heartbeat sampler
    • Updates to the store_avro_kafka store
    • Updates to the rdc_sampler
    • Updates to the slingshot sampler
  • Fixes

    • Decomposition fixes
    • Always generate and push store_avro_kafka schema to registry
    • Fix error path memory leaks in store_avro_kafka
    • Clarify dependencies in README.md
    • Fix crash in linux_proc_sampler

OVIS-4.4.2

23 Feb 20:28
This is release OVIS-4.4.2

The principal improvements in this release are as follows:

- Enhancements needed for production convenience
- New slingshot switch samplers designed to run on the slingshot switch
- Config changes in support of slingshot_metrics
- Fix mis-sizing of string in jbuf implementation
- Invalidate RDMA memory descriptors on set delete
- Remove memory leaks in the as_is decomp plugin
- Fix segfault caused by attempting to flush syslog
- Fix handling of DCGM string fields
- Fix `zap_sock` rejected endpoint leak

OVIS-4.4.1

10 Jan 23:02
OVIS Release 4.4.1

Release OVIS-4.3.11

10 Apr 14:19

What's Changed

  • Combined remote configuration into a single Python3 module called ldmsd_communicator
  • Disabled support for multiple lists in store_sos & store_csv when decomposition is not used
  • Added linux_proc_sampler streams store for SOS
  • Fixed wildcard address handling in ldms_xprt_listen_by_name()
  • Added a global message logging library (ovis_log) to incrementally replace message log pointers
  • Moved all sampler plugins to their own sub-directory
  • Deprecated asynchronous sampling mode
  • Added missing ldms_list_tail API
  • Added LDMS_V_TIMESTAMP accessor functions
  • Added support for the Slurm2 sampler in sampler_base
  • Made metric types signed in the lustre_mdc sampler
  • Added an Avro-Kafka Store to republish metric set data as Avro encoded Kafka messages
  • Added ovis_log messages to the LDMS authentication plugins
  • Added user_debug option to control job logging when debugging the slurm_notifier

Release OVIS-4.3.10

29 Jan 18:23
LDMS Release 4.3.10

LDMS 4.3.10 remains binary compatible with OVIS 4.3 releases back to
OVIS 4.3.3.

If you are storing data from the new samplers that use lists
and records, please do so with a storage decomposition configuration.
Storing these metric sets (procstat2, procnet2, slurm2, ...) without
a decomposition configuration can lead to confusing results.

There are 78 commits in this release that add new samplers and
include many scalability and resiliency improvements.

Release OVIS-4.3.9

29 Sep 21:18

Welcome to the long-awaited OVIS 4.3.9

OVIS 4.3.9 introduces several new features, including:

  • Variable length metric values
    • lists
    • records
  • Automatic scaling of I/O threads based on demand
  • Decomposition of LDMS metric set data into multiple storage rows based on a configurable decomposition strategy
  • ...

LDMS 4.3.9 remains binary compatible with OVIS 4.3 releases back to OVIS 4.3.3

OVIS-4.3.8

03 Feb 17:19
* Numerous bug fixes
* Multi-threaded low-level Zap transport event handlers
* Command line option support in configuration files
* Summary set, transport, producer, and thread statistics
* Kokkos Appmon store
* Darshan store
* Non-blocking event logging
* Netlink notifier stream sampler

Release OVIS-4.3.7

19 Apr 21:03
This is the OVIS-4.3.7 release

New Features:
* Improved LDMSD Streams Performance
* Improved ib_verbs backward compatibility
* Per-device procnet sampler
* Per-device ibmad sampler
* AMD GPU sampler
* Per-mount Lustre samplers
* Various reliability and resiliency improvements

Fixes:
* LDMSD Streams Memory Leak fixes
* Resolved confusing uGNI error messages on exit
* Fixed store rename issues in CSV store