SciPy Sparse Data

What is Sparse Data

Sparse data is data that has mostly unused elements (elements that don't carry any information).

It can be an array like this one:

[1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 0]

Note: Sparse data is a data set in which most of the item values are zero. A dense array is the opposite of a sparse array: most of its values are not zero.
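One way to put a number on "mostly zero" is to compute the fraction of elements that are zero. The sketch below does this for the example array using NumPy's count_nonzero(). It assumes NumPy is installed, and "sparsity" is simply a descriptive variable name, not a SciPy term.

```python
import numpy as np

arr = np.array([1, 0, 2, 0, 0, 3, 0, 0, 0, 0, 0, 0])

# Fraction of elements that are zero: 9 of the 12 values here
sparsity = 1.0 - np.count_nonzero(arr) / arr.size
print(sparsity)  # 0.75
```

A value close to 1 indicates that storing only the non-zero items, as the sparse formats below do, is likely to pay off.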

In scientific computing, sparse data often comes up when dealing with partial derivatives in linear algebra.

How to Work With Sparse Data

SciPy has a module, scipy.sparse, that provides functions to deal with sparse data.

There are primarily two types of sparse matrices that we use:

CSC - Compressed Sparse Column. Efficient arithmetic and fast column slicing.

CSR - Compressed Sparse Row. Fast row slicing and faster matrix-vector products.

We will use the CSR matrix in this tutorial.
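The two formats store the same non-zero values in a different order. The sketch below builds both from a small made-up 3x3 array. CSR keeps the values row by row, with their column indices and an indptr array that marks where each row starts. CSC does the same column by column. It assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

dense = np.array([[0, 0, 3],
                  [4, 0, 0],
                  [0, 5, 0]])

row_major = csr_matrix(dense)  # values stored row by row
col_major = csc_matrix(dense)  # values stored column by column

# CSR: values in row order, their column indices, and where each row starts
print(row_major.data, row_major.indices, row_major.indptr)  # [3 4 5] [2 0 1] [0 1 2 3]

# CSC: values in column order, their row indices, and where each column starts
print(col_major.data, col_major.indices, col_major.indptr)  # [4 5 3] [1 2 0] [0 1 2 3]

# Slicing along the compressed axis is the cheap operation for each format
print(row_major[1].toarray())     # row 1 of the matrix
print(col_major[:, 2].toarray())  # column 2 as a 3x1 array
```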

CSR Matrix

We can create a CSR matrix by passing an array into the function scipy.sparse.csr_matrix().

Printing the resulting matrix lists each stored (non-zero) element as its (row, column) position followed by its value.
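A minimal sketch of creating a CSR matrix. The input array is an illustrative assumption. A 1-D array becomes a matrix with a single row, so every stored element sits in row 0.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])

mat = csr_matrix(arr)  # a 1-D input becomes a 1 x 9 matrix
print(mat)             # lists (row, column) positions with their values

# Read the stored positions back explicitly: (0, 5), (0, 6) and (0, 8)
rows, cols = mat.nonzero()
print(list(zip(rows.tolist(), cols.tolist())))
```

The exact layout of print(mat) differs between SciPy versions, so the sketch also reads the positions back with nonzero().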

Sparse Matrix Methods

Viewing stored data (not the zero items) with the data property:
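A short sketch using a made-up 3x3 array. The data property returns only the stored values, read row by row for a CSR matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

mat = csr_matrix(arr)
print(mat.data)  # [1 1 2]: the stored values, row by row
```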


Counting nonzeros with the count_nonzero() method:
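The same illustrative 3x3 array, this time counting its non-zero entries:

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

mat = csr_matrix(arr)
print(mat.count_nonzero())  # 3
```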


Removing explicitly stored zero entries from the matrix with the eliminate_zeros() method:
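A stored entry can end up holding a zero, for example after its value is overwritten. The sketch below creates such an entry by writing into mat.data directly; that is an illustrative shortcut, not the only way it happens. It then compares nnz (the number of stored entries) with count_nonzero() before and after eliminate_zeros(), which modifies the matrix in place.

```python
import numpy as np
from scipy.sparse import csr_matrix

mat = csr_matrix(np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]]))

mat.data[0] = 0             # the entry stays stored, now as an explicit zero
print(mat.nnz)              # 3 stored entries, including the explicit zero
print(mat.count_nonzero())  # 2 actual non-zero values

mat.eliminate_zeros()       # drops the explicit zero in place
print(mat.nnz)              # 2
```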


Eliminating duplicate entries (entries stored more than once at the same position) by summing them together with the sum_duplicates() method:
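Duplicates can appear when a CSR matrix is built directly from its (data, indices, indptr) arrays. In the sketch below, the index arrays are hand-picked for illustration so that column 0 of the single row is listed twice. sum_duplicates() works in place and merges the two entries into one.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 1 x 3 matrix whose row lists column 0 twice (values 1 and 2)
data = np.array([1, 2, 3])
indices = np.array([0, 0, 2])
indptr = np.array([0, 3])
mat = csr_matrix((data, indices, indptr), shape=(1, 3))

print(mat.nnz)        # 3 stored entries, two of them at (0, 0)
mat.sum_duplicates()  # in place: the (0, 0) entries become one entry with value 3
print(mat.nnz)        # 2
print(mat.toarray())  # [[3 0 3]]
```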


Converting from CSR to CSC with the tocsc() method:
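A short sketch with the same illustrative 3x3 array. The conversion keeps the values and changes only the storage order.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

mat = csr_matrix(arr)
newmat = mat.tocsc()  # same values, now stored column by column
print(newmat.format)  # csc
```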

Note: Apart from the sparse-specific operations above, sparse matrices support many of the operations that dense matrices support, e.g. reshaping, summing, arithmetic and broadcasting.
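A few of these operations applied to an illustrative CSR matrix. The sketch covers summing, scalar arithmetic, reshaping and a matrix-vector product. It assumes NumPy and SciPy are installed, and the exact printed layout may vary between versions.

```python
import numpy as np
from scipy.sparse import csr_matrix

mat = csr_matrix(np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]]))

total = mat.sum()            # sum of all elements: 4
col_sums = mat.sum(axis=0)   # column sums 1, 0, 3 (returned as a dense 1x3 matrix)
doubled = mat * 2            # scalar arithmetic keeps the result sparse
flat = mat.reshape((1, 9))   # reshaping also returns a sparse matrix
vec = np.array([1, 1, 1])
product = mat @ vec          # matrix-vector product: [0 1 3]

print(total, col_sums, product)
print(doubled.toarray())
```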
