#
## Copyright (C) Mellanox Technologies Ltd. 2001-2015.  ALL RIGHTS RESERVED.
## Copyright (C) UT-Battelle, LLC. 2014-2015. ALL RIGHTS RESERVED.
## Copyright (C) ARM Ltd. 2017-2018.  ALL RIGHTS RESERVED.
##
## See file LICENSE for terms.
##
#

## 1.6.1 (September 23, 2019)
Features:
- Add Bull Atos HCA device IDs
- Add Azure Pipelines testing

Bugfixes:
- Multiple static checker fixes
- Remove pkg.m4 dependency
- Multiple clang static checker fixes
- Fix mem type support with generic datatype

## 1.6.0 (July 17, 2019)
Features:
- Modular architecture for UCT transports
- ROCm transport re-design: support for managed memory, direct copy, ROCm GDR
- Random scheduling policy for DC transport
- Optimized out-of-box settings for multi-rail
- Added support for OmniPath (using Verbs)
- Support for PCI atomics with IB transports
- Reduced UCP address size for homogeneous environments

Bugfixes:
- Multiple stability and performance improvements in TCP transport
- Multiple stability fixes in Verbs and MLX5 transports
- Multiple stability fixes in UCM memory hooks
- Multiple stability fixes in UGNI transport
- RPM Spec file cleanup
- Fix compilation issues with the most recent clang and gcc compilers
- Fix incorrect alias names
- Fix data race in UCP wireup
- Fix segfault when libuct.so is reloaded - issue #3558
- Include Java sources in distribution
- Handle EADDRNOTAVAIL in rdma_cm connection manager
- Disable ibcm on RHEL7+ by default
- Fix data race in UCP proxy endpoint
- Static checker fixes
- Fallback to ibv_create_cq() if ibv_create_cq_ex() returns ENOSYS
- Fix malloc hooks test
- Fix checking return status in ucp_client_server example
- Fix gdrcopy libdir config value
- Fix printing atomic capabilities in ucx_info
- Fix perftest warmup iterations to be non-zero
- Fix default values for configure logic
- Fix race condition updating fired_events from multiple threads
- Fix madvise() hook

Tested configurations:
- RDMA: MLNX_OFED 4.5, distribution inbox drivers, rdma-core 22.1
- CUDA: gdrcopy 1.3.2, cuda 9.2, ROCm 2.2
- XPMEM: 2.6.2
- KNEM: 1.1.3

## 1.5.1 (April 1, 2019)
Bugfixes:
- Fix dc_mlx5 transport support check for inbox libmlx5 drivers - issue #3301
- Fix compilation warnings with gcc9 and clang
- ROCm - reduce log level of device-not-found message

## 1.5.0 (February 14, 2019)
Features:
- New emulation mode enabling full UCX functionality (Atomic, Put, Get)
  over TCP and RDMA-CORE interconnects that don't implement full RDMA semantics
- Non-blocking API for all one-sided operations. All blocking communication APIs are marked
  as deprecated
- New client/server connection establishment API, which allows connected handover between workers
- Support for rdma-core direct-verbs (DEVX) and DC with mlx5 transports
- GPU - Support for stream API and receive side pipelining
- Malloc hooks using binary instrumentation instead of symbol override
- Statistics for UCT tag API
- GPU-to-Infiniband HCA affinity support based on locality/distance (PCIe)

Bugfixes:
- Fix overflow in RC/DC flush operations
- Update description in SPEC file and README
- Fix RoCE source port for dc_mlx5 flow control
- Improve ucx_info help message
- Fix segfault in UCP, due to int truncation in count_one_bits()
- Multiple other bugfixes (full list on github)

Tested configurations:
- InfiniBand: MLNX_OFED 4.4-4.5, distribution inbox drivers, rdma-core
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2

## 1.4.0-rc2 (October 23, 2018)

Features:
- Improved support for installation with latest ROCm
- Improved support for latest rdma-core
- Added support for CUDA IPC for intra-node GPU
- Added support for CUDA memory allocation cache for mem-type detection
- Added support for latest Mellanox devices
- Added support for Nvidia GPU managed memory
- Added support for multiple connections between the same pair of workers
- Added support for large worker address for client/server connection establishment
  and INADDR_ANY
- Added support for bitwise atomics operations

Bugfixes:
- Performance fixes for rendezvous protocol
- Memory hook fixes
- Clang support fixes
- Self tl multi-rail fix
- Thread safety fixes in IB/RDMA transport
- Compilation fixes with upstream rdma-core
- Multiple minor bugfixes (full list on github)
- Segfault fix for code generated by the armclang compiler
- UCP memory-domain index fix for zero-copy active messages

Tested configurations:
- InfiniBand: MLNX_OFED 4.2-4.4, distribution inbox drivers, rdma-core
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2

Known issues:
  #2919 - Segfault in CUDA support when KNEM is not present and CMA is the active
  intra-node RMA transport. As a workaround, the user can disable CMA support at
  compile time: --disable-cma. Alternatively, the user can remove CMA from the
  UCX_TLS list, for example: UCX_TLS=mm,rc,cuda_copy,cuda_ipc,gdr_copy.
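
As a rough illustration of the second workaround for #2919, the sketch below sets UCX_TLS from inside the application rather than in the launching shell. The transport list is the one quoted above. The helper names `set_ucx_tls_without_cma` and `tls_list_contains` are hypothetical and not part of UCX, and the sketch assumes the variable is set before UCX reads its configuration.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdlib.h>
#include <string.h>

/* Transport list from the release note above: cma is left out. */
#define UCX_TLS_NO_CMA "mm,rc,cuda_copy,cuda_ipc,gdr_copy"

/* Return 1 if `name` is an exact entry of the comma-separated `list`. */
static int tls_list_contains(const char *list, const char *name)
{
    size_t name_len = strlen(name);

    while (list != NULL && *list != '\0') {
        size_t entry_len = strcspn(list, ",");

        if (entry_len == name_len && strncmp(list, name, name_len) == 0) {
            return 1;
        }
        list += entry_len;
        if (*list == ',') {
            list++;  /* skip the separator */
        }
    }
    return 0;
}

/*
 * Set UCX_TLS for the current process (hypothetical helper).
 * Returns 0 on success, -1 if setenv() fails.
 */
static int set_ucx_tls_without_cma(void)
{
    return setenv("UCX_TLS", UCX_TLS_NO_CMA, 1);
}
```

An application would call `set_ucx_tls_without_cma()` early in `main()`, before any UCX initialization. Exporting the same value in the shell that launches the job should have the same effect.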

## 1.3.1 (August 20, 2018)

Bugfixes:
- Prevent potential out-of-order sending in shared memory active messages
- CUDA: Include cudamem.h in source tarball, pass cudaFree memory size
- Registration cache: fix large range lookup, handle shmat(REMAP)/mmap(FIXED)
- Limit IB CQE size for specific ARM boards
- RPM: explicitly set gcc-c++ as requirement
- Multiple bugfixes (full list on github)

Tested configurations:
- InfiniBand: MLNX_OFED 4.2, inbox OFED drivers.
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2

## 1.3.0 (February 15, 2018)

Features:
- Added stream-based communication API to UCP
- Added support for GPU platforms: Nvidia CUDA and AMD ROCm software stacks
- Added API for client/server based connection establishment
- Added support for TCP transport
- Support for InfiniBand tag-matching offload for DC and accelerated transports
- Multi-rail support for eager and rendezvous protocols
- Added support for tag-matching communications with CUDA buffers
- Added ucp_rkey_ptr() to obtain pointer for shared memory region
- Avoid progress overhead on unused transports
- Improved scalability of software tag-matching by using a hash table
- Added transparent huge-pages allocator
- Added non-blocking flush and disconnect for UCP
- Support fixed-address memory allocation via ucp_mem_map()
- Added ucp_tag_send_nbr() API to avoid send request allocation
- Support global addressing in all IB transports
- Add support for external epoll fd and edge-triggered events
- Added registration cache for knem
- Initial support for Java bindings

Bugfixes:
- Multiple bugfixes (full list on github)

Tested configurations:
- InfiniBand: MLNX_OFED 4.2, inbox OFED drivers.
- CUDA: gdrcopy 1.2, cuda 9.1.85
- XPMEM: 2.6.2
- KNEM: 1.1.2

Known issues:
  #2047 - UCP: ucp_do_am_bcopy_multi drops data on UCS_ERROR_NO_RESOURCE
  #2047 - failure in ud/uct_flush_test.am_zcopy_flush_ep_nb/1
  #1977 - failure in shm/test_ucp_rma.blocking_small/0
  #1926 - Timeout in mpi_test_suite with HW TM
  #1920 - transport retry count exceeded in many-to-one tests
  #1689 - Segmentation fault on memory hooks test in jenkins


## 1.2.2 (January 4, 2018)

Main:
- Support including UCX API headers from C++ code
- UD transport to handle unicast flood on RoCE fabric
- Compilation fixes for gcc 7.1.1, clang 3.6, clang 5

Details:
- When UD transport is used with RoCE, packets intended for other peers may
  arrive on different adapters (as a result of unicast flooding).
- This change adds packet filtering based on destination GIDs. A packet is now
  silently dropped if its destination GID does not match the local GID.
- Added a new device ID for InfiniBand HCA
- [packaging] Move `examples/` and `perftest/` into doc
- [packaging] Update spec to work on old distros while remaining compliant with
  Fedora guidelines
- [cleanup] Removed unused ptmalloc version (2.83)
- [cleanup] Fixup license headers
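
The destination-GID filtering described in the details above can be sketched roughly as follows. This is not the UCX implementation: the type `example_gid_t` and the function `ud_accept_packet` are hypothetical names, and extracting the destination GID from the received packet header is assumed to have happened already.

```c
#include <stdint.h>
#include <string.h>

#define EXAMPLE_GID_LEN 16  /* a GID is 128 bits wide */

/* Hypothetical container for a raw GID. */
typedef struct {
    uint8_t raw[EXAMPLE_GID_LEN];
} example_gid_t;

/*
 * Return 1 if the packet should be processed, 0 if it should be
 * silently dropped because it was addressed to another peer.
 */
static int ud_accept_packet(const example_gid_t *local_gid,
                            const example_gid_t *packet_dgid)
{
    return memcmp(local_gid->raw, packet_dgid->raw, EXAMPLE_GID_LEN) == 0;
}
```

A receive path would call `ud_accept_packet()` for every incoming packet and skip further processing when it returns 0, without reporting an error to the sender.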

## 1.2.1 (August 28, 2017)

- Compilation fixes for gcc 7.1
- Spec file cleanups
- Versioning cleanups

## 1.2.0 (June 15, 2017)

Supported platforms
  - Shared memory: KNEM, CMA, XPMEM, SYSV, Posix
  - Verbs over InfiniBand and RoCE.
    Verbs over other RDMA interconnects (iWARP, OmniPath, etc.) is available
    for community evaluation and has not been tested in context of this release
  - Cray Gemini and Aries
  - Architectures: x86_64, ARMv8 (64bit), Power64

Features:
  - Added support for InfiniBand DC and UD transports, including accelerated verbs for Mellanox devices
  - Full support for PGAS/SHMEM interfaces, blocking and non-blocking APIs
  - Support for MPI tag matching, both in software and offload mode
  - Zero copy protocols and rendezvous, registration cache
  - Handling transport errors
  - Flow control for DC/RC
  - Datatypes support: contiguous, IOV, generic
  - Multi-threading support
  - Support for ARMv8 64bit architecture
  - A new API for efficient memory polling
  - Support for malloc-hooks and memory registration caching

Bugfixes:
  - Multiple bugfixes improving overall stability of the library

Known issues:
  #1604 - Failure in ud/test_ud_slow_timer.retransmit1/1 with valgrind
  #1588 - Fix reading cpuinfo timebase for ppc
  #1579 - Ud/test_ud.ca_md test takes too long to complete
  #1576 - Failure in ud/test_ud_slow_timer.retransmit1/0 with valgrind
  #1569 - Send completion with error with dc_verbs
  #1566 - Segfault in malloc_hook.fork on arm
  #1565 - Hang in udrc/test_ucp_rma.nonblocking_stream_get_nbi_flush_worker
  #1534 - Wireup.c:473 Fatal: endpoint reconfiguration not supported yet
  #1533 - Stack overflow under Valgrind 'rc_mlx5/uct_p2p_err_test.local_access_error/0'
  #1513 - Hang in MPI_Finalize with UCX_TLS=rc[_x],sm on the bsend2 test
  #1504 - Failure in cm/uct_p2p_am_test.am_bcopy/1
  #1492 - Hang when using polling fd
  #1489 - Hang on the osu_fop_latency test with RoCE
  #1005 - RoCE problem with OMPI direct modex - UD assertion

## 1.1.0 (September 1, 2015)

Features:
  - Added support for AM based on FIFO in `mm` shared memory transport
  - Added support for UCT `knem` shared memory transport (http://knem.gforge.inria.fr)
  - Added support for UCT `mm/xpmem` shared memory transport (https://github.com/hjelmn/xpmem)



## 1.0.0 (July 22, 2015)

Features:

  - Added support for UCT `cma` shared memory transport (Cross-Memory Attach)
  - Added support for UCT `mm` shared memory transport with mmap/sysv APIs
  - Added support for UCT `rc` transport based on InfiniBand/RC with verbs
  - Added support for UCT `mlx5_rc` transport based on InfiniBand/RC with accelerated verbs
  - Added support for UCT `cm` transport based on InfiniBand/SIDR (Service ID Resolution)
  - Added support for UCT `ugni` transport based on Cray/UGNI
  - Added support for Doxygen based documentation generation
  - Added support for UCP basic protocol layer to fit PGAS paradigm (RMA, AMO)
  - Added ucx_perftest utility to exercise major UCX flows and provide performance metrics
  - Added test script for jenkins (contrib/test_jenkins.sh)
  - Added packaging for RPM/DEB based linux distributions (see contrib/buildrpm.sh)
  - Added unit-test infrastructure for UCX functionality based on Google Test framework (see test/gtest/)
  - Added initial integration for OpenMPI with UCX for PGAS/SHMEM API
    (see: https://github.com/openucx/ompi-mirror/pull/1)
  - Added end-to-end testing infrastructure based on MTT (see contrib/mtt/README_MTT)
