Releases: aeron-io/aeron
1.41.0
- Allow `NameResolver` to be configured for the `ConsensusModule` in order to support custom name resolution when configuring the ingress channel.
- Delay election state transitions if there is an active leader to avoid unnecessary reset and new election.
- Make `AeronCluster.asyncConnect` work completely asynchronously. Don't report exceptions to the error handler that are used for async resources.
- Add a system property and API to allow changing a directory where an Archive mark file (`archive-mark.dat`) is stored.
- Check the state of the interface when trying to resolve the multicast interface. Only use interfaces that are up. Issue #1387
- CnC file length validation. Issue #1410
- Fix issue of not capturing return code when recording signal arrives after an error to the archive client.
- Support migrating segments to the beginning or end of an existing archive recording.
- [C] Fix issue of using transport after it had been removed.
- [Java] Fix concurrent close of receive destination counters on multi-destination subscriptions.
- [C] Fix `remove_if` methods on pointer value maps which previously could miss an item.
- Add debug logging for clustered service acking.
- Add a specific error for archive replication failing to create a remote connection.
- Fix leak with Archive replay session if the async publication has a session clash.
- Shorten duration of cluster election after a leader has closed gracefully.
- [C] Fix image rejoin by swapping correcting cooldown map insertion and removal. PR #1338
- Candidate ballot for 5+ node cluster cannot be cut short on quorum otherwise most up to date member may not be elected.
- [C] Allow for attempted recreation of an Image if initial attempt fails. PR #1435
- Perform most replay validations before sending OK to the client so errors are synchronous when starting a replay.
- Delete all recording segment files when a recording is truncated to its start position.
- Close `ArchiveMarkFile` last when shutting down Archive to capture all errors.
- [C++] Apply `std::forward` to fragment handler to avoid unnecessary copy. PR #1405
- Fix handling of padding greater than max message length in Archive replay.
- Add debug logging for Archive recording signals.
- Close log subscription first when clustered service is cleanly closed to drop follower out of flow control as soon as possible.
- Drop cluster follower as soon as possible out of flow control to allow cluster to progress when follower is cleanly closed.
- [C] Report timeout accurately when driver keepalive is beyond timeout. PR #1429
- Add ability to run Archive with only IPC control channels for clients.
- Add `ClusterTool.isLeader` method.
- Add `Image` to `Subscription` before calling available handler rather than after.
- Set URI in receiver counters to match subscription channel.
- Add cluster member node state file and migrate out state that needs to be persistent, such as `candidateTermId` and member list, so the mark file can be in /dev/shm.
- [C] Fix issue with removing naming resolver neighbor that deleted adjacent memory.
- [C] Improve socket error handling on Windows.
- [Java] Add `toString()` to many Aeron classes to help debugging.
- [C] Improve parsing of unsigned 32-bit integers.
- [C] Set max of resource free queue length and resource free limit to `INT32_MAX`. This stops them being incorrectly set to 0 by aeron_config_parse_uint32 when comparing against int32 0. PR #1421
- Deprecate cluster dynamic join feature. This is to be replaced with a more robust and user friendly premium offering.
- [C] Fix counter leak when subscription fails.
- [C] Fix spy channel memory leak when destination is removed for multi-destination subscription.
- [C] Fix channel memory leak on error when creating publications or subscriptions.
- Fix NPE on timeout exception for cluster client in some connect states.
- [Java] Improve efficiency of URI parsing.
- [C] Fix error messages with incorrect varargs.
- Warnings clean up in codebase to have less noisy CodeQL analysis.
- Support having mark files for `Archive`, `ConsensusModule`, and `ClusteredServiceContainer` in an alternative directory such as /dev/shm so timeouts can be avoided when recording writes queue up on a network filesystem.
- Add timestamp params to stripped channel for pass through to Archive operations.
- Queue resource freeing operations in driver to avoid timeouts when unmapping operations are slow.
- [C++] Work around compiler concurrency bug for `AtomicArrayUpdater` that can impact client Subscriptions causing image list to become corrupted.
- Improve javadoc for recording signal usage.
- Be strict on handling cluster leader liveness to the current leadership term.
- Only try unblocking a client command after liveness timeout to avoid "lost" commands. PR #1369
- Make archive counters unique so multiple archives can run on the same media driver.
- Truncate files after `ArchiveTool.compact` is invoked to free disk space.
- Fix basic auction cluster tutorial configuration.
- Improve `ClusterConfig` sample to allow for ingress configuration.
- Add counters for the number of active recordings or replays in an Archive.
- Add counters for reporting on read and write operations in an Archive.
- Support allowing a `ClusteredService` to be started before the `ConsensusModule`.
- Improve false sharing protections for more consistent latency.
- Simplify `ReplayMerge` samples to not require entity tags.
- Add batch script for launching low-latency media driver on Windows.
- Support message lengths greater than MTU in ping pong samples.
- Fix options handling in `cping` sample.
- Improve handling of timeouts in cluster elections for more robust state transitions when network is unstable. Effects are more pronounced in 5+ member clusters.
- [Java] Add `Aeron.asyncAddSubscription` for non-blocking setup.
- Compute source identity of images more precisely based on channel configuration.
- Improved handling of out of disk space errors.
- Support taking a cluster consensus module snapshot when member names are greater than MTU in length.
- Allow a follower to veto a member being elected cluster leader if they believe the leader is not valid. This is important in 5+ node clusters.
- Extend debugging for voting in cluster elections.
- Increment error counter when invalid version exceptions occur.
- Handle backpressure from commands between dedicated threads in driver with controlled polls to avoid live locks.
- [C] Add support for controlled poll operations on SPSC and MPSC ring buffers.
- Increase command queues to allow for more concurrent active changes in publications and images.
- Serve cluster backup queries from followers to take load from the leader.
- [C] Fix build when dot is used as thousands separator. PR #1372
- Upgrade to JUnit 5.9.2.
- Upgrade to BND 6.4.0.
- Upgrade to ByteBuddy 1.14.3.
- Upgrade to Mockito 4.11.0.
- Upgrade to Versions 0.46.0.
- Upgrade to Gradle 7.6.
- Upgrade to SBE 1.28.1.
- Upgrade to Agrona 1.18.0.
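Several of the election changes in this release single out clusters of five or more members. As a rough illustration of why cluster size matters, the sketch below computes a simple majority quorum. The class and method names are hypothetical and this is not Aeron's election code. With five members a quorum is three, so a ballot that stopped the moment quorum was reached could leave two members unheard.

```java
public final class QuorumSketch
{
    // Simple majority: more than half of the members (hypothetical helper, not an Aeron API).
    public static int quorumThreshold(final int memberCount)
    {
        return (memberCount / 2) + 1;
    }

    // Members whose votes are still outstanding if a ballot stops exactly at quorum.
    public static int unheardAtQuorum(final int memberCount)
    {
        return memberCount - quorumThreshold(memberCount);
    }

    public static void main(final String[] args)
    {
        for (final int members : new int[] {3, 5, 7})
        {
            System.out.println(members + " members: quorum=" + quorumThreshold(members) +
                ", unheard at quorum=" + unheardAtQuorum(members));
        }
    }
}
```

In a three-member cluster only one vote can be outstanding at quorum, which may be why the notes call out larger clusters specifically.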
1.40.0
- Memory align allocated buffers in `PublicationTest` so it works on Apple M1 processors.
- Check that `NoOpLock` is only allowed to be used when using Aeron client in invoker mode.
- Handle case of a delayed concurrent offer to a publication in which other threads have raced terms ahead without throwing an exception.
- Collapse term appenders into publications to reduce memory footprint and avoid data dependent loads.
- Short circuit Image polling operation when bound limit is less than current position to prevent term overrun.
- Add different aliases for consensus module/service container subscriptions. PR #1366.
- Stop an active cluster log replay when `ClusterBackup` is closed rather than waiting for timeout.
- Send unavailable counter events to Aeron clients when a client closes or times out.
- Allow Consensus Module Agent to be run via an Invoker in addition to having its own thread.
- Apply liveness checks to Archive and Cluster mark files so that multiple instances cannot be run in the same directory and corrupt files.
- [Java] Use fixed format for timestamps in agent debug logs.
- Allow Archive replicate to overwrite all metadata for an empty recording.
- [C] Handle log buffer files with `term_length == AERON_LOGBUFFER_TERM_MAX_LENGTH` on Windows. PR #1360.
- [C] Fix inclusion of symbols for debug builds on Windows.
- Remove `localhost` defaults for Archive and Cluster to help avoid mis-configuration in production. PR #1356.
- Await 'REPLICATE_END' when catching up as a follower across multiple leadership terms to avoid clashing session-id.
- Allow setting of receive socket buffer and window on cluster log channel subscribers. PR #1345.
- Fix application of send socket buffer lengths as configured when using MDC.
- Fix `ArchiveTool.dump` when fragment length is set <= 0.
- Capture closing sessions into snapshot so session close event is not lost on cluster shutdown.
- Remove brackets from counters labels to make extraction to Prometheus easier.
- Send cluster client session open acknowledgement before appending to the log to avoid race with service sending egress on open event. Issue #1351.
- [C] Fix off by one error local socket address into channel indicator counter.
- Add protocol version support to cluster consensus protocol.
- Add more context to error messages on Archive `ReplaySession`. PR #1349.
- Apply strict validation of consensus module snapshot state when messages are offered from clustered services. A number of customers have not been strict with all cluster nodes being deterministic and doing exactly the same thing which can result in corrupted and diverged snapshots.
- Consensus module state snapshot can be inspected with the `describe-latest-cm-snapshot` option to `ClusterTool`.
- If a consensus module snapshot is shown to be corrupt it may be fixed by running `ConsensusModuleSnapshotPendingServiceMessagesPatch` and if non-support customers wish to have help then they can contact sales@aeron.io. The patch can fix the leader and the fixed snapshot then needs to be replicated to the followers which can be done with `AeronArchive.replicate` using the correct recording ids.
- Add a tool to replicate a specific recording between archives. PR #1363.
- [C++] Use `getAsString` calls for pollers for record descriptors for channel fields. Add test from PR #1348.
- Add `ClusteredService.doBackgroundWork` which can be used for maintaining external connections beyond ingress and egress.
- Increase default message timeout from 5 to 10 seconds for Archive clients.
- Add EOS flag to status messages (SMs) once a stream is totally received so the sender can take clean up action.
- When EOS status message is received by a sender then allow the publication linger on unicast to be cut short so resources are released sooner.
- When EOS status message is received by a sender then remove the receiver from flow control for multicast and MDC with tagged and min FC.
- Fix the closing of session specific subscriptions to prevent resource leak.
- Add scripts for testing raw network performance on Windows.
- Close egress from cluster on change of leader so clients can detect it before a new leader is elected.
- Don't timeout and close cluster client session if quorum cannot be temporarily reached.
- Add logging support for `ClusterBackup` state changes.
- Close cluster clients when complete cluster is restarted.
- Support automatic reconnect from cluster client when the same leader is re-elected after a net split or temporarily losing quorum.
- Add authentication for `ClusterBackup` to a cluster.
- Validate Archive mark file length before reading when mapped read-only to avoid access violations.
- Preserve iteration order for cluster client session based on session id so snapshots can have binary compatibility.
- Capture leadership term id for cluster backup queries.
- Account for padding when sweeping pending services messages to avoid out of bounds exception.
- Prevent `-1` leadership term ids appearing in the `RecordingLog`.
- Allow Archive replication and replay request to specify session level file IO max buffer length for throttling a stream.
- Add support for custom app version validation to clustered services with `AppVersionValidator`.
- Add false sharing protection to `DutyCycleTracker`.
- Update doc on `ReplayMerge` to indicate the `AeronArchive` client should not be shared. Issue #1340.
- Upgrade to Versions 0.43.0.
- Upgrade to Mockito 4.8.1.
- Upgrade to Google Test 1.12.1.
- Upgrade to JUnit 5.9.1.
- Upgrade to ByteBuddy 1.12.18.
- Upgrade to Gradle 7.5.1.
- Upgrade to SBE 1.27.0.
- Upgrade to Agrona 1.17.1.
Java binaries can be found here.
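The 1.40.0 note on preserving session iteration order by session id ties in with the stricter snapshot validation: replicas can only produce byte-identical snapshots if they write the same entries in the same order. The sketch below illustrates that idea with a sorted map from the JDK. `SessionOrderSketch`, `snapshot`, and the session names are made up for the example and do not reflect how the consensus module actually stores or encodes sessions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public final class SessionOrderSketch
{
    // Serialise sessions in ascending session-id order; TreeMap iterates in key order.
    public static byte[] snapshot(final Map<Long, String> sessions)
    {
        final StringBuilder sb = new StringBuilder();
        for (final Map.Entry<Long, String> entry : new TreeMap<>(sessions).entrySet())
        {
            sb.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(final String[] args)
    {
        // Two hypothetical nodes that learned of the same sessions in a different order.
        final Map<Long, String> nodeA = new LinkedHashMap<>();
        nodeA.put(3L, "carol");
        nodeA.put(1L, "alice");
        nodeA.put(2L, "bob");

        final Map<Long, String> nodeB = new LinkedHashMap<>();
        nodeB.put(2L, "bob");
        nodeB.put(3L, "carol");
        nodeB.put(1L, "alice");

        System.out.println("identical snapshots: " + Arrays.equals(snapshot(nodeA), snapshot(nodeB)));
    }
}
```

Iterating the `LinkedHashMap` instances directly would instead follow insertion order, and the two byte arrays would differ.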
1.39.0
- [Java] Fix `IllegalStateException` that could exist for an MDS subscription on the rapid recycling of `ReplayMerge` operations.
- [C] Align ring buffer implementations and feature set with Java.
- [Java] Make sure that C and Java are aligned on resend window. Re-instate the max message length being accounted in the bottom of the resend window for Java.
- Add duty cycle duration tracking to all agents across all modules.
- [C++] Improve efficiency by reducing the number of copy operations for fragment assembly when a stream has many fragmented messages.
- [C] Default to CLOCK_REALTIME for send/receive timestamps.
- [Java] Add setters for send/receive timestamp clocks to the `MediaDriver.Context`.
- Fix handling of fragment assemble when `reliable=false` is set for a channel and loss occurs.
- Improve handling of short sends on MDC publication to back off from overloading a socket.
- Add round-robin facility to MDC publication for increased fairness.
- [Java] Publish `aeron-test-support` package as a JAR.
- [Java] Downgrade "unknown replay" errors to warnings for cluster catchup.
- [Java] Add `appVersion` to event logging for consensus module and check for correct app version when replaying log.
- [Java] Prevent timeout warnings with cluster dynamic nodes and log replication.
- [Java] Add cluster dynamic join state change logging events.
- Add counters for the number of receivers in min and tagged flow control strategies.
- [Java] Avoid race unmapping buffers on concurrent close of media drivers.
- Modify flow control strategies to have new method for when elicited setups are sent and add counters manager to `init` methods. Modify Min and Tagged flow control to use setup `snd-lmt` as min position until timeout or receiver added on SM.
- [Java] Account for possible padding in log buffer when checking for bottom resend window for retransmits.
- [C] Flush output when printing configuration.
- [C] Raise warning on failure to setup media timestamping.
- [Java] Update `recordingId` on any signal with a valid recording id when handling signals for snapshot replication.
- [Java] When attempting `ClientSession.tryClaim`, ensure that there is enough buffer space when returning a mocked offer for a follower.
- [C] Ensure publication image is released before it is freed.
- [C] Fix `scanf` that could result in buffer overflow when parsing HTTP for configuration.
- [Java] Change default cluster session timeout from 5 to 10 seconds.
- Prevent receiver joining min/tagged flow control if they are more than a window behind.
- [C] Add sample for working with large messages.
- [Java] Add logging event for appending a cluster session close.
- Upgrade to BND 6.3.1.
- Upgrade to Mockito 4.6.1.
- Upgrade to ByteBuddy 1.12.10.
- Upgrade to SBE 1.26.0.
- Upgrade to Agrona 1.16.0.
Java binaries can be found here.
1.38.2
C Driver/Client Release Only
- [C] Driver - Ensure the correct control address is used when adding multicast destinations with MDS.
- [C] Driver - Allow thread affinity on CPU 0.
- [C] API - Check handler parameter before polls. Check images for NULL before polling images.
No Java binaries for this release.
1.38.1
1.38.0
- [Java/C/C++] Ensure driver is in ready state when requesting termination from client.
- [Java] Reduce allocation when listing archive directories to find segment files.
- [Java] Add flag to `ClusterTerminationException` to indicate if the termination was expected.
- [Java] Expand agent logging for consensus module operations, be careful if using `all` for cluster events as volume may now be greatly expanded.
- [C] Use connect and send to improve latency in C driver when sending data at lower volumes.
- [Java] Improve reliability of transferring snapshots to `ClusterBackup` via archive replication with improved re-try semantics.
- [Java] Support adding an IPC ingress destination to cluster leader for ingress optimisation.
- [Java] Create replay publication asynchronously to reduce latency pauses in Archive.
- [Java/C++] Add new `RecordingSignal.REPLICATE_END` recording signal to indicate end of a replication operation.
- [Java/C++] Make delivery of `RecordingSignal`s to archive client sessions reliable and ordered.
- [Java] Support specifying interface with endpoints in cluster config for multi-home members. PR #1290.
- [C] Add thread affinity support to C media driver. PR #1298.
- [C/C++] Update CMake build to use `FetchContent` instead of `ExternalProject`.
- [C/C++] Fix build on ARM with clang. PR #1291.
- [Java] Improve progress tracking and retry semantics for cluster members catching up in elections.
- [C/C++] Enable support for parallel build on Windows.
- [Java] Add ability to async remove/close a publication by registration id.
- [Java] Fix publication leak in `ClusterBackup` when backup response times out.
- [C] Improve agent logging in C media driver to be more consistent with Java driver.
- [C] Allow for configurable IO vector for `sendmmsg` and `recvmmsg` in the C media driver. PR #1285.
- [C] Support static linking of the C media driver. PR #1261.
- [Java/C] Support ability to extend concurrent publications by setting initial values to be equivalent to exclusive publications.
- [Java] Fixed bug in `PriorityHeapTimerService.cancelTimerByCorrelationId`. PR #1281.
- [C++] Improve error reporting in Archive client when a response is not received.
- [Java/C++] Additional user specified delegating Invoker for Archive client to be used for progressing actions when awaiting responses.
- [Java] Rename Archive segment files before delete to avoid races with streams being extended.
- [C++] Fixes for `ChannelUriStringBuilder`. PR #1268.
- [Java] Add admin command so that cluster snapshot can be triggered remotely via an authorised session.
- [Java] Support authorisation of service actions with a new API `AuthorisationService`. The hooks for this have been added to Archive requests and Cluster Snapshot requests.
- [Java/C] Support adding spy and IPC destinations to MDS subscriptions so destinations can be all channel types.
- [Java] Ensure Cluster will start on a consistent initial term id when racing to create first term.
- [Java] Prevent unnecessary creation of `RecordingLog` files when using `ClusterTool`.
- [Java] Add cluster session timeout to set adjusted when debugging.
- [C] Fixes to prevent message duplication and unnecessary sending of messages in MDS.
- Minimum CMake version was raised to 3.14.
- Upgrade to HdrHistogram_c 1.11.4.
- Upgrade to BND 6.2.0.
- Upgrade to Versions 0.42.0.
- Upgrade to Mockito 4.4.0.
- Upgrade to ByteBuddy 1.12.9.
- Upgrade to Shadow 7.1.2.
- Upgrade to Gradle 7.4.2.
- Upgrade to JUnit 5.8.2.
- Upgrade to Checkstyle 9.3.
- Upgrade to SBE 1.25.2.
- Upgrade to Agrona 1.15.0.
Java binaries can be found here.
1.37.0
- [Java] Improve error messages on channel conflicts.
- [C] Remove replicated command prefix in debug agent logging.
- [Java] Use async publication add for async connect to an Archive to minimise the impact of name resolution pauses.
- [Java] Make `ClusterConfig.calculatePort` public.
- [C] Correct channel length on metadata for stream counters.
- [Java] Extract channel value from counter label when longer than what will fit in metadata for `StreamStat`.
- [Java] Relocate HdrHistogram and ByteBuddy in `aeron-all` JAR.
- Upgrade to BND 6.1.0.
- Upgrade to ByteBuddy 1.12.2.
- Upgrade to Mockito 4.1.0.
- Upgrade to SBE 1.25.1.
- Upgrade to Agrona 1.14.0.
Java binaries can be found here.
1.36.0
- [C/C++] Handle SIGINT in code samples.
- [Java] Retry adding cluster member publication in election canvass to address late name registration in containers such as Kubernetes.
- [Java] Log resolution failures in Cluster as warning event rather than exception.
- [Java] Fix timestamp when publishing new leadership terms. PR #1254.
- [C] Use separate transport bindings for the conductor doing name resolution. PR #1253.
- [Java/C++] Allow the setting of a `RecordingSignalConsumer` in the archive client context which is delegated to when processing control channel responses.
- [C] Improve error handling and logging on Windows when dealing with network system calls.
- [Java] Verify cluster log is always contiguous when joining a new image in a service.
- [Java] Fix race condition when sending `RecordingSignal.SYNC` during archive replication. PR #1252.
- [Java/C] Improve choice of subscription for choosing channel URI when labelling receiver counters.
- [Java] Sort counters displayed with `StreamStat` so they are logically grouped.
- [Java] Improve error messages so they are more contextual.
- [Java] Extend debugging logging for archive and cluster operations.
- [Java] Check for errors when cluster snapshots are replayed.
- [Java] Improve tracking of cluster commit position when replicating during an election.
- [Java] Allow replication to skip over empty leadership terms due to failed elections when initially starting cluster.
- [C] Better handling of finding user for default `aeron.dir` when `USER` is not set in environment.
- [Java/C++] Reduce cache invalidations when using pollers for archive and cluster response streams.
- [Java] Add support for changing cluster log params by truncating to the latest snapshot and resetting configuration. PR #1233.
- [Java] Don't catch subclasses of `Throwable` and instead catch `Exception` so that the JVM can handle subclasses of `Error`.
- [Java/C] Improve validation of ports used in channel URIs.
- [C] Support building on Apple ARM.
- [Java] Add priority heap backing implementation for cluster timers as an alternative to the default timer wheel implementation
- Upgrade to Mockito 4.0.0.
- Upgrade to Shadow 7.1.0.
- Upgrade to BND 6.0.0.
- Upgrade to Gradle 7.2.
- Upgrade to ByteBuddy 1.12.1.
- Upgrade to Checkstyle 9.1.
- Upgrade to SBE 1.25.0.
- Upgrade to Agrona 1.13.0.
Java binaries can be found here.
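The 1.36.0 change from catching `Throwable` to catching `Exception` is easy to demonstrate in isolation. In this minimal sketch (class and method names are hypothetical, not Aeron code) a handler that catches `Exception` recovers from an ordinary failure but lets an `Error` propagate to the caller, which is the behaviour the change aims for.

```java
public final class CatchScopeSketch
{
    // Runs a task and handles ordinary failures only.
    public static String runTask(final Runnable task)
    {
        try
        {
            task.run();
            return "ok";
        }
        catch (final Exception ex) // deliberately Exception, not Throwable
        {
            return "handled: " + ex.getMessage();
        }
    }

    public static void main(final String[] args)
    {
        // A RuntimeException is caught inside runTask.
        System.out.println(runTask(() -> { throw new IllegalStateException("bad state"); }));

        // An Error is not an Exception, so it escapes runTask and reaches this caller.
        try
        {
            runTask(() -> { throw new Error("fatal"); });
        }
        catch (final Error err)
        {
            System.out.println("propagated: " + err.getMessage());
        }
    }
}
```

Had `runTask` caught `Throwable`, conditions such as `OutOfMemoryError` would be swallowed alongside recoverable failures.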
1.35.1
1.35.0
- [Java] Fix truncation of linger timeout in `ChannelUriStringBuilder` which led to a short linger of Archive replays.
- [Java] Remove incorrect publication linger validation.
- [C] Add sanitize build for MSVC and fix issues found.
- [C] Add missing free of counters associated with Cubic congestion control.
- [C++] Fix missing use of `FragmentAssembler` in Archive response and clean up type warnings.
- [Java] Fix packaging declaration in POM file.
- [Java] Separate thread factories for replay and recording agents in Archive for when setting thread affinity is required.
- [Java] Javadoc improvements.
- [C] Agent logging fixes. PR #1198.
- [Java/C] Support a list of bootstrap neighbours for fault tolerance in gossip protocol for driver naming.
- [C] Handle connection reset without error when polling a socket on Windows.
- [C++] Don't progress with archive connect until response subscription is available. PR #1196.
- [Java] Use async publication adding for response channels from the Archive and response channels for egress and backup queries from the Cluster to reduce latency pauses for existing operations.
- [Java] Ability to add publications asynchronously to Aeron client.
- [C/Java] Support timestamping of packets for channel send and receive plus media/hardware receive timestamping if supported. PR #1195.
- [Java] Ensure termination hook is run on unexpected interrupt during cluster election.
- [Java] Reset cluster election state if in election and an exception happens outside the election work cycle.
- [Java] Finish deleting archive recordings pending deletion on shutdown.
- [Java] Ensure cluster log recording has stopped before restarting the election process to avoid spurious election failure from past recording stopping.
- Upgrade to Google Test 1.11.0.
- Upgrade to Mockito 3.11.2.
- Upgrade to ByteBuddy 1.11.9.
- Upgrade to Gradle 7.1.1.
- Upgrade to SBE 1.24.0.
- Upgrade to Agrona 1.12.0.
Java binaries can be found here.
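The linger truncation fix in 1.35.0 is a reminder of how easily a nanosecond timeout outgrows 32 bits. The sketch below is only an assumption about the general class of bug (a narrowing `long` to `int` conversion), not a reconstruction of the actual `ChannelUriStringBuilder` defect, and every name in it is hypothetical.

```java
import java.util.concurrent.TimeUnit;

public final class LingerTruncationSketch
{
    // A narrowing cast keeps only the low 32 bits, silently changing large values.
    public static int truncated(final long lingerNs)
    {
        return (int)lingerNs;
    }

    // Alternative that fails loudly when the value does not fit in an int.
    public static int checked(final long lingerNs)
    {
        return Math.toIntExact(lingerNs);
    }

    public static void main(final String[] args)
    {
        final long lingerNs = TimeUnit.SECONDS.toNanos(5); // 5_000_000_000 ns

        System.out.println("requested ns: " + lingerNs);
        System.out.println("after cast ns: " + truncated(lingerNs)); // 705032704, roughly 0.7 s

        try
        {
            checked(lingerNs);
        }
        catch (final ArithmeticException ex)
        {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

Keeping such timeouts as `long` end to end avoids the problem altogether; the checked conversion is only useful where an `int` is unavoidable.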