Metadata-Version: 2.2
Name: nvidia-cusparselt-cu12
Version: 0.7.1
Summary: NVIDIA cuSPARSELt
Home-page: https://developer.nvidia.com/cusparselt
Author: NVIDIA Corporation
Author-email: cuda_installer@nvidia.com
License: NVIDIA Proprietary Software
Keywords: cuda,nvidia,machine learning,high-performance computing
Classifier: Topic :: Scientific/Engineering
Classifier: Environment :: GPU :: NVIDIA CUDA
Classifier: Environment :: GPU :: NVIDIA CUDA :: 12
Description-Content-Type: text/x-rst
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: summary

###################################################################################
cuSPARSELt: A High-Performance CUDA Library for Sparse Matrix-Matrix Multiplication
###################################################################################

**NVIDIA cuSPARSELt** is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

.. math::

    D = Activation(\alpha \, op(A) \cdot op(B) + \beta \, op(C) + bias) \cdot scale

where :math:`op(A)/op(B)` refers to in-place operations such as transpose/non-transpose, and :math:`\alpha, \beta, scale` are scalars.
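
To make the formula concrete, the following is a minimal pure-Python reference sketch of the same computation on dense nested lists. It does not call cuSPARSELt and is not its API: the function names (``reference_matmul``, ``relu``), the choice of ReLU as the activation, the per-row broadcast of ``bias``, and leaving ``C`` untransposed are all assumptions made for illustration.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(x):
    """Example activation (an assumption; any scalar function would do here)."""
    return max(0.0, x)


def reference_matmul(a, b, c, bias, alpha=1.0, beta=0.0, scale=1.0,
                     activation=relu, trans_a=False, trans_b=False):
    """Compute D = activation(alpha * op(A) * op(B) + beta * C + bias) * scale.

    bias is assumed to hold one value per output row, broadcast across columns.
    """
    op_a = transpose(a) if trans_a else a
    op_b = transpose(b) if trans_b else b
    ab = matmul(op_a, op_b)
    return [
        [activation(alpha * ab[i][j] + beta * c[i][j] + bias[i]) * scale
         for j in range(len(ab[0]))]
        for i in range(len(ab))
    ]


# A is 2x4 with two zeros in each group of four values, B is 4x2.
A = [[1.0, 0.0, 2.0, 0.0],
     [0.0, -3.0, 0.0, 1.0]]
B = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0],
     [7.0, 8.0]]
C = [[1.0, 1.0],
     [1.0, 1.0]]
bias = [0.5, -0.5]

D = reference_matmul(A, B, C, bias, alpha=1.0, beta=2.0, scale=0.5)
print(D)
```

A reference like this is mainly useful for checking small results by hand; the library itself performs the operation on the GPU and exploits the sparsity of one operand.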

The *cuSPARSELt APIs* offer flexibility in algorithm/operation selection, the epilogue, and matrix characteristics, including memory layout, alignment, and data types.

**Download:** `developer.nvidia.com/cusparselt/downloads <https://developer.nvidia.com/cusparselt/downloads>`_

**Provide Feedback:** `Math-Libs-Feedback@nvidia.com <mailto:Math-Libs-Feedback@nvidia.com?subject=cuSPARSELt-Feedback>`_

**Examples**:
`cuSPARSELt Example 1 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul>`_,
`cuSPARSELt Example 2 <https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul_advanced>`_

**Blog posts and talks**:

- `Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt <https://developer.nvidia.com/blog/exploiting-ampere-structured-sparsity-with-cusparselt/>`_
- `Structured Sparsity in the NVIDIA Ampere Architecture and Applications in Search Engines <https://developer.nvidia.com/blog/structured-sparsity-in-the-nvidia-ampere-architecture-and-applications-in-search-engines/>`__
- `Making the Most of Structured Sparsity in the NVIDIA Ampere Architecture <https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31552/>`__

================================================================================
Key Features
================================================================================

* *NVIDIA Sparse MMA tensor core* support
* Mixed-precision computation support:

    +--------------+----------------+-----------------+-------------+
    | Input A/B    | Input C        | Output D        | Compute     |
    +==============+================+=================+=============+
    | `FP32`       | `FP32`         | `FP32`          | `FP32`      |
    +--------------+----------------+-----------------+-------------+
    | `FP16`       | `FP16`         | `FP16`          | `FP32`      |
    +              +                +                 +-------------+
    |              |                |                 | `FP16`      |
    +--------------+----------------+-----------------+-------------+
    | `BF16`       | `BF16`         | `BF16`          | `FP32`      |
    +--------------+----------------+-----------------+-------------+
    | `INT8`       | `INT8`         | `INT8`          | `INT32`     |
    +              +----------------+-----------------+             +
    |              | `INT32`        | `INT32`         |             |
    +              +----------------+-----------------+             +
    |              | `FP16`         | `FP16`          |             |
    +              +----------------+-----------------+             +
    |              | `BF16`         | `BF16`          |             |
    +--------------+----------------+-----------------+-------------+
    | `E4M3`       | `FP16`         | `E4M3`          | `FP32`      |
    +              +----------------+-----------------+             +
    |              | `BF16`         | `E4M3`          |             |
    +              +----------------+-----------------+             +
    |              | `FP16`         | `FP16`          |             |
    +              +----------------+-----------------+             +
    |              | `BF16`         | `BF16`          |             |
    +              +----------------+-----------------+             +
    |              | `FP32`         | `FP32`          |             |
    +--------------+----------------+-----------------+-------------+
    | `E5M2`       | `FP16`         | `E5M2`          | `FP32`      |
    +              +----------------+-----------------+             +
    |              | `BF16`         | `E5M2`          |             |
    +              +----------------+-----------------+             +
    |              | `FP16`         | `FP16`          |             |
    +              +----------------+-----------------+             +
    |              | `BF16`         | `BF16`          |             |
    +              +----------------+-----------------+             +
    |              | `FP32`         | `FP32`          |             |
    +--------------+----------------+-----------------+-------------+

* Matrix pruning and compression functionalities
* Activation functions, bias vector, and output scaling
* Batched computation (multiple matrices in a single run)
* GEMM Split-K mode
* Auto-tuning functionality (see `cusparseLtMatmulSearch()`)
* NVTX ranging and logging functionalities
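
As a rough illustration of what matrix pruning and compression mean for structured sparsity, the sketch below applies a 2:4 pattern (at most two nonzero values in every group of four) to one row and then stores only the kept values with their in-group positions. This is a conceptual sketch in plain Python, not the library's routines: the 2:4 pattern is assumed from the Ampere structured-sparsity material linked above, magnitude-based selection is an assumed pruning criterion, the function name ``prune_and_compress_2_4`` is hypothetical, and the storage layout shown here does not correspond to cuSPARSELt's internal compressed format.

```python
def prune_and_compress_2_4(row):
    """Prune a row to a 2:4 pattern and return (pruned, values, positions).

    In every group of four consecutive values, the two with the largest
    magnitude are kept (ties favor the earlier element) and the others are
    zeroed. `values` holds the kept entries and `positions` their index
    (0-3) inside the group.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")

    pruned, values, positions = [], [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Rank indices by descending magnitude, keep the top two, restore order.
        ranked = sorted(range(4), key=lambda i: (-abs(group[i]), i))
        keep = sorted(ranked[:2])
        pruned.extend(group[i] if i in keep else 0.0 for i in range(4))
        values.extend(group[i] for i in keep)
        positions.extend(keep)
    return pruned, values, positions


row = [0.1, -2.0, 0.3, 4.0, 1.0, 1.0, -1.0, 0.5]
pruned, values, positions = prune_and_compress_2_4(row)
print(pruned)     # dense row with half of the entries zeroed
print(values)     # half as many stored values as the dense row
print(positions)  # where each stored value sits inside its group of four
```

The point of the compressed form is that only half of the values need to be stored and multiplied, which is what the sparse tensor core path takes advantage of.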

================================================================================
Support
================================================================================

* *Supported SM Architectures*: `SM 8.0`, `SM 8.6`, `SM 8.9`, `SM 9.0`, `SM 10.0`, `SM 12.0`
* *Supported CPU architectures and operating systems*:

    +------------+--------------------+
    | OS         | CPU archs          |
    +============+====================+
    | `Windows`  | `x86_64`           |
    +------------+--------------------+
    | `Linux`    | `x86_64`, `Arm64`  |
    +------------+--------------------+

================================================================================
Documentation
================================================================================

Please refer to https://docs.nvidia.com/cuda/cusparselt/index.html for the cuSPARSELt documentation.

================================================================================
Installation
================================================================================

The cuSPARSELt wheel can be installed as follows:

.. code-block:: bash

    pip install nvidia-cusparselt-cuXX

where XX is the CUDA major version (currently, only CUDA 12 is supported).
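
After installing, one way to confirm that the wheel is present is to query the package metadata from Python. The sketch below uses only the standard library (``importlib.metadata``); the helper name ``installed_cusparselt_version`` is hypothetical, and the distribution name is built from the ``nvidia-cusparselt-cuXX`` pattern given above.

```python
from importlib import metadata


def installed_cusparselt_version(cuda_major=12):
    """Return the installed wheel version as a string, or None if absent."""
    dist_name = f"nvidia-cusparselt-cu{cuda_major}"
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_cusparselt_version()
if version is None:
    print("nvidia-cusparselt-cu12 is not installed in this environment")
else:
    print(f"nvidia-cusparselt-cu12 {version} is installed")
```

This only checks that the distribution is registered with the Python environment; it does not load the shared library or verify that a supported GPU is available.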