OSU Micro Benchmarks v5.4.3 (07/23/18)
* Bug Fixes
    - Fix buffer overflow in osu_reduce_scatter
        - Thanks to Matias A Cabral @Intel for reporting the issue and patch
        - Thanks to Gilles Gouaillardet for creating the patch
    - Fix buffer overflow in one sided tests
        - Thanks to John Byrne @HPE for reporting this issue
    - Fix buffer overflow in multi threaded latency test
    - Fix issues with freeing buffers for one-sided tests
    - Fix issues with freeing buffers for CUDA-enabled tests
    - Fix warning messages for benchmarks that do not support CUDA and/or
      Managed memory
        - Thanks to Carl Ponder @NVIDIA for reporting this issue
    - Fix compilation warnings

OSU Micro Benchmarks v5.4.2 (04/30/18)
* New Features & Enhancements
    - Add "-W --window-size" option to osu_bw and osu_bibw

* Bug Fixes
    - Fix issues with out of tree builds
        - Thanks to Adam Moody @LLNL for reporting the issue
    - Fix PGI and XLC builds by using the correct generated cpp files
    - Fix crash with osu_mbw_mr for large messages
    - Fix minor error in Makefile
    - Fix compilation warnings

OSU Micro Benchmarks v5.4.1 (02/19/18)
* New Features & Enhancements
    - Enhanced help messages and runtime parameters

* Bug Fixes
    - Fix compile and runtime issues in PGAS benchmarks (OpenSHMEM, UPC, and
      UPC++) exposed by PGI compiler
    - Add warning message to display memory limitation when running benchmarks
      with very large messages
    - Fix memory leaks for device buffers
    - Fix issues with type overflows
    - Fix an issue with pWrk symmetric heap allocation in osu_oshm_reduce
      benchmark
        - Thanks to Naveen Ravichandrasekaran @Cray for the report

OSU Micro Benchmarks v5.4.0 (10/30/17)
* New Features & Enhancements
    - Introduce new OpenSHMEM Non-blocking Benchmarks
        * osu_oshm_get_mr_nb
        * osu_oshm_get_nb
        * osu_oshm_put_mr_nb
        * osu_oshm_put_nb
        * osu_oshm_put_overlap
    - Automatically build OpenSHMEM 1.3 benchmarks when library support
      is detected
    - Add ability to specify minimum and maximum message size for
      point-to-point and one-sided benchmarks
    - Enhanced error handling for MPI benchmarks
    - Code clean-ups and unification of utility functions across benchmarks
    - Enhanced help messages and runtime parameters

* Bug Fixes
    - Fix compile-time warnings
    - Fix peer calculation formula in UPC/UPC++ benchmarks
    - Use the correct number of warmup iterations in osu_barrier benchmark

OSU Micro Benchmarks v5.3.2 (09/08/16)
* New Features & Enhancements
    - Allow specifying very large message sizes (>2GB) for collective benchmarks

* Bug Fixes
    - Fix compilation errors due to missing type casting

OSU Micro Benchmarks v5.3.1 (08/08/16)
* New Features & Enhancements
    - Add option to control whether CUDA kernels are built
    - Add runtime option to specify number of threads for osu_latency_mt

* Bug Fixes
    - Check if -lrt or -lpthread is needed
    - Fix compilation warnings
    - Fix non-blocking collective memory leak
    - Correct documentation for osu_multi_lat

OSU Micro Benchmarks v5.3 (03/25/16)
* New Features & Enhancements
    - Introduce new UPC++ Benchmarks
        * osu_upcxx_allgather
        * osu_upcxx_alltoall
        * osu_upcxx_async_copy_get
        * osu_upcxx_async_copy_put
        * osu_upcxx_bcast
        * osu_upcxx_gather
        * osu_upcxx_reduce
        * osu_upcxx_scatter

* Bug Fixes
    - Determine page size at runtime in OpenSHMEM benchmarks (fixes issue seen
      on OpenPower machines)

OSU Micro Benchmarks v5.2 (02/05/16)
* New Features & Enhancements
    - Support for CUDA-Aware Managed memory
        * osu_bibw
        * osu_bw
        * osu_latency
        * osu_allgather
        * osu_allgatherv
        * osu_allreduce
        * osu_alltoall
        * osu_alltoallv
        * osu_bcast
        * osu_gather
        * osu_gatherv
        * osu_reduce
        * osu_reduce_scatter
        * osu_scatter
        * osu_scatterv
    - Add ability to specify minimum message size in addition to maximum
      message size for all collective benchmarks

OSU Micro Benchmarks v5.1 (11/10/15)
* New Features & Enhancements
    - Introduce non-blocking collective v-variants as well as ialltoallw
        * osu_iallgatherv
        * osu_ialltoallv
        * osu_igatherv
        * osu_iscatterv
        * osu_ialltoallw
    - Add support for benchmarking GPU-Aware non-blocking collectives.  Overlap
      can be computed using either CPU or GPU kernels
        * osu_iallgather
        * osu_iallgatherv
        * osu_ialltoall
        * osu_ialltoallv
        * osu_ialltoallw
        * osu_ibcast
        * osu_igather
        * osu_igatherv
        * osu_iscatter
        * osu_iscatterv
    - Allow users to specify zero warmup iterations

* Bug Fixes
    - Fix OpenACC pragma

OSU Micro Benchmarks v5.0 (08/17/15)
* New Features & Enhancements
    - Support for a set of non-blocking collectives. The benchmarks can display
      both the amount of time spent in the collectives and the amount of
      overlap achievable
        * osu_iallgather
        * osu_ialltoall
        * osu_ibarrier
        * osu_ibcast
        * osu_igather
        * osu_iscatter
    - Add startup benchmarks to measure the time it takes for an MPI library
      to complete MPI_Init
        * osu_init
        * osu_hello
    - Allocate and align data dynamically
        - Thanks to Devendar Bureddy from Mellanox for the suggestion
    - Add options for number of warmup iterations [-x] and number of iterations
      used per message size [-i] to MPI benchmarks
        - Thanks to Devendar Bureddy from Mellanox for the suggestion

* Bug Fixes
    - Do not truncate user specified max memory limits
        - Thanks to Devendar Bureddy from Mellanox for the report and patch

OSU Micro Benchmarks v4.4.1 (10/30/14)
* Bug Fixes
    - Add missing MPI-3 guard for WIN_ALLOCATE
    - Capture getopt return value in an int instead of a char

OSU Micro Benchmarks v4.4 (8/23/14)
* New Features & Enhancements
    - Support for MPI-3 RMA (one-sided) and atomic operations using GPU buffers
        * osu_acc_latency
        * osu_cas_latency
        * osu_fop_latency
        * osu_get_bw
        * osu_get_latency
        * osu_put_bibw
        * osu_put_bw
        * osu_put_latency

* Bug Fixes
    - Remove use of AC_FUNC_MALLOC to avoid undefined rpl_malloc reference
    - Add missing UPC benchmarks to the make dist rule

OSU Micro Benchmarks v4.3.1 (6/20/14)
* Bug Fixes
    - Fix typo in MPI collective benchmark help message
    - Explicitly mention that -m and -M parameters are specified in bytes

OSU Micro Benchmarks v4.3 (3/24/14)
* New Features & Enhancements
    - This new suite includes several new (or updated) benchmarks to measure
      performance of MPI-3 RMA communication operations with options to select
      different window creation (WIN_CREATE, WIN_DYNAMIC, and WIN_ALLOCATE) and
      synchronization functions (LOCK, PSCW, FENCE, FLUSH, FLUSH_LOCAL, and
      LOCK_ALL) in each benchmark
        * osu_acc_latency
        * osu_cas_latency
        * osu_fop_latency
        * osu_get_acc_latency
        * osu_get_bw
        * osu_get_latency
        * osu_put_bibw
        * osu_put_bw
        * osu_put_latency
    - New UPC Collective Benchmarks
        * osu_upc_all_barrier
        * osu_upc_all_broadcast
        * osu_upc_all_exchange
        * osu_upc_all_gather
        * osu_upc_all_gather_all
        * osu_upc_all_reduce
        * osu_upc_all_scatter
    - Build MPI3 benchmarks when MPI library support is detected

* Bug Fixes
    - Add shmem_quiet() in OpenSHMEM Message Rate benchmark to ensure all
      previously issued operations are completed
    - Allocate pWrk from symmetric heap in OpenSHMEM Reduce benchmark

OSU Micro Benchmarks v4.2 (11/08/13)
* New Features & Enhancements
    - New OpenSHMEM benchmarks
        * osu_oshm_fcollect
    - Enable handling of GPU device buffers in all MPI collective benchmarks
    - Add device binding for OpenACC benchmarks

* Bug Fixes
    - Add upc_fence after memput in osu_upc_memput benchmark
    - Correct CUDA configuration example in README
    - Fix several warnings

OSU Micro Benchmarks v4.1 (8/24/13)
* New Features & Enhancements
    - New OpenSHMEM benchmarks
        * osu_oshm_barrier
        * osu_oshm_broadcast
        * osu_oshm_collect
        * osu_oshm_reduce
    - New MPI-3 RMA Atomics benchmarks
        * osu_cas_flush
        * osu_fop_flush

OSU Micro Benchmarks v4.0.1 (5/06/13)
* Bug Fixes
    - Fix several warnings

OSU Micro Benchmarks v4.0 (4/16/13)
* New Features & Enhancements
    - Support buffer allocation using OpenACC and CUDA in osu_alltoall,
      osu_gather, and osu_scatter benchmarks
    - Limit amount of memory allocated by collective benchmarks dynamically
      based on number of processes
        - Memory limit can also be explicitly set by the user through the -m
          option
    - Support for 64-bit atomic operations in osu_oshm_atomics

* Bug Fixes
    - Fix numerical overflow error with reporting bandwidth in osu_mbw_mr

OSU Micro Benchmarks v3.9 (2/28/13)
* New Features & Enhancements
    - Support buffer allocation using OpenACC in GPU benchmarks
    - Use average time instead of max time for calculating the bandwidth and
      message rate in osu_mbw_mr
        - Thanks to Alex Mikheev from Mellanox for the patch
* Bug Fixes
    - Properly initialize host buffers for DH and HD transfers in GPU
      benchmarks

OSU Micro Benchmarks v3.8 (11/07/12)
* New Features & Enhancements
    - New UPC benchmarks
        * osu_upc_memput
        * osu_upc_memget

OSU Micro Benchmarks v3.7 (9/07/12)
* New Features & Enhancements
    - New OpenSHMEM benchmarks
        * osu_oshm_get
        * osu_oshm_put_mr
        * osu_oshm_atomics
        * osu_oshm_put
    - Organize installation directory according to benchmark type
* Bug Fixes
    - Destroy CUDA context before exiting

OSU Micro Benchmarks v3.6 (4/30/12)
* New Features & Enhancements
    - New collective benchmarks
        * osu_allgather
        * osu_allgatherv
        * osu_allreduce
        * osu_alltoall
        * osu_alltoallv
        * osu_barrier
        * osu_bcast
        * osu_gather
        * osu_gatherv
        * osu_reduce
        * osu_reduce_scatter
        * osu_scatter
        * osu_scatterv
* Bug Fixes
    - Fix GPU binding issue when running with HH mode

OSU Micro Benchmarks v3.5.2 (3/22/12)
* Bug Fixes
    - Fix typo that led to the use of incorrect buffers

OSU Micro Benchmarks v3.5.1 (2/02/12)
* New Features & Enhancements
    - Provide script to set GPU affinity for MPI processes
* Bug Fixes
    - Remove GPU binding after MPI_Init to avoid switching contexts

OSU Micro Benchmarks v3.5 (11/09/11)
* New Features & Enhancements
    - Extend osu_latency, osu_bw, and osu_bibw benchmarks to evaluate the
      performance of MPI_Send/MPI_Recv operations with NVIDIA GPU devices and
      CUDA support
        - This functionality is exposed when configured with the --enable-cuda
          option
    - Flexibility for using buffers in NVIDIA GPU device (D) and host memory (H)
    - Flexibility for selecting data movement between D->D, D->H and H->D

OSU Micro Benchmarks v3.4 (09/13/11)
* New Features & Enhancements
    - Add passive one-sided communication benchmarks
    - Update one-sided communication benchmarks to provide shared memory hint
      in MPI_Alloc_mem calls
    - Update one-sided communication benchmarks to use MPI_Alloc_mem for buffer
      allocation
    - Give default values to configure definitions (can now build directly with
      mpicc)
    - Update latency benchmarks to begin from 0-byte messages
* Bug Fixes
    - Remove memory leaks in one-sided communication benchmarks
    - Update benchmarks to touch buffers before using them for communication
    - Fix osu_get_bw test to use different buffers for concurrent communication
      operations
    - Fix compilation warnings
