Sparse linear algebra kernels are integral to many domains, including graph analytics, artificial intelligence, and high-performance computing. Unlike dense kernels, which access memory in regular patterns, sparse kernels depend on indirection, intersection, and union operations over compressed index structures. Modern architectures rely heavily on large-scale vector units for computational power and offer high levels of parallelism, but they are designed for regular workloads and therefore struggle with the irregular access patterns that sparse data induces. Our goal is to improve the capability of these architectures to run sparse scientific and AI applications.
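To make the irregularity concrete, below is a minimal Python sketch of a sparse matrix-vector product (illustrative only; the function name csr_spmv and the choice of CSR layout are our assumptions, not part of the proposed design). Every nonzero forces an indirect load through the column-index array, which is exactly the gather pattern that wide vector units handle poorly.

```python
import numpy as np

def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    Each nonzero performs an indirect (gathered) load x[col_idx[k]]:
    the irregular memory access that dense-oriented vector hardware
    struggles to keep fed.
    """
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):          # one output per row
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]  # indirection via col_idx
    return y

# 2x3 matrix [[5, 0, 2], [0, 3, 0]] times x = [1, 1, 1] -> [7, 3]
print(csr_spmv([5.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3], np.ones(3)))
```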
While indirection can be supported with scatter-gather operations, intersection and union rely on costly index-matching operations. All of these operations are memory-centric, making the corresponding applications memory-bound. We therefore aim to design a novel memory architecture that handles these sparse operations near the cache, reducing both bandwidth and computational requirements. By transforming the index-matching problem into one of hash lookup, we can extend the memory architecture of large-scale vector units so that it efficiently supports all of these sparse operations.
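As a software analogy for this transformation, the sketch below contrasts the classic merge-style index matching of a sparse dot product with a hash-lookup formulation (the function names and the use of a Python dict as the hash table are our illustrative stand-ins for the near-cache hardware):

```python
def dot_merge(idx_a, val_a, idx_b, val_b):
    """Index matching as a two-pointer merge over sorted index lists:
    a serial compare-and-advance loop that vectorizes poorly."""
    i = j = 0
    acc = 0.0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:        # indices intersect
            acc += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            i += 1
        else:
            j += 1
    return acc

def dot_hash(idx_a, val_a, idx_b, val_b):
    """Index matching recast as hash lookup: build a table from one
    operand, then every nonzero of the other becomes an independent
    probe, the kind of operation near-cache hardware can serve."""
    table = dict(zip(idx_a, val_a))
    return sum(v * table[k] for k, v in zip(idx_b, val_b) if k in table)

# Sparse vectors a = {0: 1.0, 4: 2.0} and b = {4: 3.0, 7: 1.0} -> 6.0
print(dot_merge([0, 4], [1.0, 2.0], [4, 7], [3.0, 1.0]))
print(dot_hash([0, 4], [1.0, 2.0], [4, 7], [3.0, 1.0]))
```

The key difference is that the probes in dot_hash are independent of one another and can be issued in parallel, whereas each step of dot_merge depends on the outcome of the previous comparison.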
With this memory architecture, we aim to achieve the following impact on scientific and AI applications:
1. Enhanced performance: the proposed architecture's acceleration of sparse operations translates directly into improved performance for the target applications.
2. Area reduction: the proposed architecture requires less area. This has a three-fold impact:
a. Cost savings in manufacturing.
b. Lower energy consumption, since a smaller design draws less static power.
c. Better sustainability, since fewer raw resources (e.g., silicon) are consumed.
Acknowledging the irregular nature of sparse data, we aim to evaluate our memory architecture on real-world matrices from the SuiteSparse Matrix Collection. This way, we can show that our approach improves the computational efficiency of actual scientific and AI applications operating on real data.
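For reference, SuiteSparse matrices are distributed in Matrix Market (.mtx) format and can be inspected with standard tools; the snippet below is a minimal sketch using SciPy (the file name bcsstk13.mtx is only a placeholder for any matrix downloaded from https://sparse.tamu.edu):

```python
import numpy as np
from scipy.io import mmread

# Placeholder file: any matrix from the SuiteSparse Matrix Collection.
A = mmread("bcsstk13.mtx").tocsr()

# The spread of nonzeros per row is one simple measure of irregularity.
nnz_per_row = np.diff(A.indptr)
print(f"{A.shape[0]} x {A.shape[1]}, {A.nnz} nonzeros, "
      f"nonzeros per row min/max: {nnz_per_row.min()}/{nnz_per_row.max()}")
```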
Ultimately, our proposed memory architecture aims to bridge the gap between current hardware limitations and the demands of modern sparse workloads, paving the way for more efficient and sustainable computing in science and AI.