-----------------------------------------
Release Notes for Trilinos Package Kokkos
-----------------------------------------

Trilinos 10.6.1:
----------------

* Fixed some ansi/pednatic build warnings in the node tests.
* Added sync() method to all nodes, modified node timings test to use it, documented it in Kokkos Node API module. Added cudaSyncThreads() to native CUDA/Thrust timings. Broke node timings into multiple exectuables, to keep TBB and TPI from stepping on each other.
* Added native threading examples to Kokkos Node API tests for timing comparisons, kokkos/NodeAPI/test.
* Added ifdefed TPI_Block() and TPI_Unblock() around GEMM calls in Kokkos::DefaultArithmetic<TPINode>
* Added missing clearStatistics() method to CUDANodeMemoryModel

Trilinos 10.6:
--------------

Significant internal/external changes to both the Kokkos Node API and the
Kokkos Linear Algebra library. Most of these changes are centered around 
CrsGraph and CrsMatrix and their kernels. Some exciting developments regarding 
sparse mat-vec on multi-core/GPUs did not make it in this release; look for more
development in 10.6.1.

* Lots of additional documentation, testing and examples in Kokkos.
* Improved debugging in Kokkos Node API
* Added isHostNode static bool to all Kokkos nodes (false for ThrustGPUNode)
* Imported select Teuchos memory management classes/methods into the Kokkos namespace.
* Minor bug fixes, warnings addressed.

Changes breaking backwards compatbility:
* Kokkos CRS classes (i.e., CrsGraph and CrsMatrix) are now templated on the 
  sparse kernel operator, allowing specialization of the class data 
  according to the implementation of the kernel.
* Kokkos CRS classes now contain host-allocated memory, instead of node-allocated
  memory. This means that use of these buffers by the node will, in general, require
  a copy. 


Trilinos 10.4:
--------------

(*) Change to sparse ops
- Combined DefaultSparseSolve and DefaultSparseMultiply into DefaultSparseOps
- Added new traits class, DefaultKernels, to specify default sparse and default block sparse objects
- Added new classes supporting Tpetra::VbrMatrix

(*) MultiVector ops
- some methods were missing "inline" specifier. This could (should?) improve performance for TPINode.

(*) CUDANodeMemoryModel
- support for tracing/profiling data movement between host and GPU

