SciPy Spatial Data

  • Spatial data is data represented in a geometric space, such as points on a coordinate system.
  • A triangulation of a polygon divides the polygon into triangles, from which the polygon's area can be computed.
  • A convex hull is the smallest convex polygon that encloses all of the given points.
  • KDTrees are a data structure optimized for nearest-neighbor queries.
  • Many distance metrics are used in data science to measure the distance between two points, e.g. Euclidean distance and cosine distance.
  • Euclidean distance is the straight-line distance between two points.
  • Cityblock (Manhattan) distance is computed using only four directions of movement: up, down, left, and right.
  • Cosine distance is one minus the cosine of the angle between two points A and B, viewed as vectors from the origin.
  • Hamming distance is the proportion of positions at which two binary sequences differ.

Working with Spatial Data

Spatial data refers to data that is represented in a geometric space.

E.g. points on a coordinate system.

Spatial data problems come up in many tasks.

E.g. finding if a point is inside a boundary or not.

SciPy provides us with the module scipy.spatial, which has functions for working with spatial data.

Triangulation

A triangulation of a polygon divides the polygon into triangles, from which the polygon's area can be computed.

A triangulation of a set of points creates a surface composed of triangles in which every given point is a vertex of at least one triangle.

One method to generate such a triangulation from points is the Delaunay triangulation, available as the Delaunay() class.
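
The idea can be sketched as follows. The five sample points (the corners of a unit square plus its center) are an illustrative assumption, not data from this tutorial. The sketch builds a Delaunay triangulation and then sums the triangle areas to recover the area of the square.

```python
import numpy as np
from scipy.spatial import Delaunay

# Illustrative input (assumed): corners of a unit square plus its center.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])

tri = Delaunay(points)

# Each row of tri.simplices holds three indices into `points`,
# i.e. the corners of one triangle.
corners = points[tri.simplices]  # shape: (n_triangles, 3, 2)

# Area of each triangle from the 2D cross product of two edge vectors.
ab = corners[:, 1] - corners[:, 0]
ac = corners[:, 2] - corners[:, 0]
areas = 0.5 * np.abs(ab[:, 0] * ac[:, 1] - ab[:, 1] * ac[:, 0])

total_area = areas.sum()
print(tri.simplices)
print(total_area)
```

Because the triangles tile the square without overlap, their areas should add up to the area of the square.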

Note: The simplices attribute holds the triangles as rows of indices into the input points. A simplex is the generalization of a triangle to any number of dimensions.

Convex Hull

A convex hull is the smallest convex polygon that encloses all of the given points.

Use the ConvexHull() class to create a convex hull.
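
A minimal sketch, using assumed sample points: the four corners of a 4 x 3 rectangle plus two interior points. Only the corners should end up on the hull.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Illustrative input (assumed): rectangle corners (indices 0-3)
# followed by two points strictly inside the rectangle.
points = np.array([[0, 0], [4, 0], [4, 3], [0, 3], [2, 1], [1, 2]])

hull = ConvexHull(points)

# hull.vertices lists the indices of the points that lie on the hull.
hull_indices = sorted(hull.vertices.tolist())

# For 2D input, hull.volume is the enclosed area
# and hull.area is the perimeter.
hull_area = hull.volume
hull_perimeter = hull.area

print(hull_indices, hull_area, hull_perimeter)
```

Note the naming: for two-dimensional input the `volume` attribute holds the area, while `area` holds the perimeter.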

KDTrees

KDTrees are a data structure optimized for nearest-neighbor queries.

E.g. given a set of points, a KDTree lets us efficiently ask which points are nearest to a certain given point.

Calling KDTree() on a set of points builds and returns a KDTree object.

The query() method returns the distance to the nearest neighbor and the index of that neighbor in the original data.
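
A short sketch of a nearest-neighbor lookup. The four points and the query location are illustrative assumptions.

```python
from scipy.spatial import KDTree

# Illustrative input (assumed).
points = [(1, -1), (2, 3), (-2, 3), (2, -3)]

tree = KDTree(points)

# query() returns the distance to the nearest neighbor and
# the index of that neighbor in `points`.
distance, index = tree.query((1, 1))

nearest_point = points[index]
print(distance, index, nearest_point)
```

Passing `k=2` (or larger) to `query()` asks for several nearest neighbors at once, in which case arrays of distances and indices are returned.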

Distance Metrics

Many distance metrics are used in data science to measure the distance between two points, e.g. Euclidean distance and cosine distance.

The distance between two vectors need not be the length of the straight line between them; it can also be the angle between them as seen from the origin, the number of unit steps required to get from one to the other, and so on.

The performance of many machine learning algorithms, e.g. "K Nearest Neighbors" or "K Means", depends greatly on the distance metric used.

Let us look at some of the distance metrics:

Euclidean Distance

The Euclidean distance is the length of the straight line between two given points.
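
A minimal sketch with two assumed sample points that form a 3-4-5 right triangle with the axes:

```python
from scipy.spatial.distance import euclidean

# Illustrative input (assumed).
p1 = (0, 0)
p2 = (3, 4)

# Straight-line distance: sqrt(3**2 + 4**2)
euclidean_dist = euclidean(p1, p2)
print(euclidean_dist)
```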

Cityblock Distance (Manhattan Distance)

The cityblock distance is the distance computed using only four directions of movement.

E.g. we can only move: up, down, right, or left, not diagonally.
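
As a sketch with assumed sample points, the cityblock distance is the sum of the absolute differences along each axis:

```python
from scipy.spatial.distance import cityblock

# Illustrative input (assumed).
p1 = (1, 0)
p2 = (10, 2)

# |10 - 1| + |2 - 0|
cityblock_dist = cityblock(p1, p2)
print(cityblock_dist)
```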

Cosine Distance

The cosine distance is one minus the cosine of the angle between two points A and B, viewed as vectors from the origin.
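
A small sketch with assumed sample vectors that are 45 degrees apart. It compares SciPy's result with one minus the cosine of the angle, computed by hand from the dot product and the vector lengths.

```python
import math
from scipy.spatial.distance import cosine

# Illustrative input (assumed): two vectors 45 degrees apart.
a = (1, 0)
b = (1, 1)

cosine_dist = cosine(a, b)

# Manual check: 1 - (a . b) / (|a| * |b|)
dot = a[0] * b[0] + a[1] * b[1]
manual = 1 - dot / (math.hypot(*a) * math.hypot(*b))

print(cosine_dist, manual)
```

Vectors pointing in the same direction give a distance of 0 regardless of their lengths, and perpendicular vectors give a distance of 1.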

Hamming Distance

The Hamming distance is the proportion of positions at which two sequences of bits differ.

It's a way to measure distance for binary sequences.
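
A minimal sketch with two assumed four-bit sequences that differ in two positions:

```python
from scipy.spatial.distance import hamming

# Illustrative input (assumed): the sequences differ at positions 1 and 2.
bits_a = [1, 0, 1, 1]
bits_b = [1, 1, 0, 1]

# Proportion of differing positions: 2 out of 4.
hamming_dist = hamming(bits_a, bits_b)
print(hamming_dist)
```

Multiplying the result by the sequence length gives the count of differing positions rather than the proportion.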
