BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:1.0
BEGIN:VEVENT
DTSTART:20141118T231500Z
DTEND:20141119T010000Z
LOCATION:New Orleans Theater Lobby
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Sparse matrix-vector and multiple-vector multiplications (SpMV and SpMM) are performance bottlenecks in numerous applications. We implemented two SpMM kernels to integrate into our library of auto-tuned kernels for GPUs. Our kernels use registers to exploit data reuse in SpMM. DIA-SpMM targets structured matrices and ELL-SpMM targets matrices with uniform row lengths. Work is continuing on SpMM kernels for unstructured matrices.=0A=0AExecuting on an NVIDIA Kepler Tesla K40m, DIA-SpMM is 2.4x faster than NVIDIA CUSP DIA-SpMV. ELL-SpMM is 2.8x faster than CUSP ELL-SpMV.=0A=0ADIA-SpMM is 5.2x faster than the highly optimized NVIDIA CUSPARSE CSR-SpMV. The maximum speedup is 6.5x. ELL-SpMM is 3.9x faster than CUSPARSE CSR-SpMV. The maximum speedup is 8.3x.=0A=0ADIA-SpMM is 2x faster than CUSPARSE CSR-SpMM. ELL-SpMM is 1.6x faster.=0A=0AFor structured matrices, DIA-SpMM on the K40m GPU is 7.2x faster than Intel MKL CSR-SpMV on a dual-socket 10-core Intel Ivy Bridge E5-2690. The maximum speedup is 12.3x.
SUMMARY:Performance of Sparse Matrix-Multiple Vectors Multiplication on Multicore and GPUs
PRIORITY:3
END:VEVENT
END:VCALENDAR