Tools for Data Science#

So far in this (over-long) introduction, we’ve covered how technical communities find their heading (e.g. engineering practice) and why roads are needed to travel efficiently in that direction (e.g. data-driven fields need data infrastructure). The last thing this metaphor needs is vehicles to get us to our destination: domain-specific tools (methods, algorithms, frameworks, etc.).

To keep things interesting, key tools will be introduced in the context of their use from here on out. Rather than go into great detail, this page keeps a maintained table of tools for each topic, with notes on selected tools at the end.

Tables, Vectors, & Graphs#

Basic ways of representing and manipulating data.

| Tool Name | Description | Docs/Tutorials | R option | Notes |
|---|---|---|---|---|
| Numpy/Scipy | matrix (dense or sparse) manipulation and routines | quickstart | matrix | |
| Pandas | All-purpose tabular data loader, manipulator, and writer. | docs | data.frame | Important! See below |
| pyJanitor | Convenient methods to (sanely) clean up your data-frame, in-line | | | |
| XArray | N-dimensional extension of Pandas’ “named arrays”, based on NetCDF | docs | tidync | |
| NetworkX | Graphs (vertices+edges) as general-purpose dictionaries with methods. | | tidygraph | |
| graph-tool | C-based network analysis, focused on stochastic block models. | | | |
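
The table describes NetworkX graphs as general-purpose dictionaries with methods. A minimal sketch of what that looks like in practice follows; the node names and the `weight` and `kind` attributes are made up for illustration and do not come from the course material.

```python
import networkx as nx

# Build a small undirected graph; edge attributes are stored as plain dicts.
G = nx.Graph()
G.add_edge("pump", "valve", weight=2.0)
G.add_edge("valve", "tank", weight=1.0)

# Node attributes live in a dict per node (hypothetical "kind" attribute).
G.nodes["pump"]["kind"] = "rotating"

# Adjacency reads like nested dictionaries: G[u] maps neighbors to edge data.
neighbors = sorted(G["valve"])
weight = G["pump"]["valve"]["weight"]

# Algorithms are ordinary functions (or methods) that take the graph.
path = nx.shortest_path(G, "pump", "tank")

print(neighbors, weight, path)
```

Because nodes can be any hashable Python object and attributes are ordinary dictionaries, existing records can usually be attached to a graph without first reshaping them.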

Machine Learning#

| Tool Name | Description | Docs/Tutorials | R option | Notes |
|---|---|---|---|---|
| Scikit-Learn | Standard for ML in Python | Scikit-Learn Course | | |

Natural Language Processing#

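| Tool Name | Description | Docs/Tutorials | R option | Notes |
|---|---|---|---|---|
| NLTK | | | | |
| Spacy/Textacy | | | | |
| Gensim | | | | |
| flair | | | | |
| huggingface | | | | |
| cleantext | | | | |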

Visualization#

| Tool Name | Description | Docs/Tutorials | R option | Notes |
|---|---|---|---|---|
| matplotlib.pyplot | | | | |
| seaborn | | | | |
| pyviz | | | | see below |

Notes#

Pandas#

For an excellent guide to using pandas more effectively (which we all need to do), see Tom Augspurger’s series, Modern Pandas. For this course, make sure you read the method chaining post! It will dramatically alter (read: enhance) your code quality and maintainability while using pandas for data munging.
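
To give a flavor of method chaining before you read the post, here is a minimal sketch. The toy data frame and its column names are hypothetical, and the particular steps are just one plausible cleaning pipeline, not the one used in Modern Pandas or later in this course.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
raw = pd.DataFrame(
    {
        "Machine ID": ["A", "A", "B", "B", "C"],
        "Repair Hours": [2.0, None, 5.5, 1.5, 3.0],
        "Status": ["done", "done", "open", "done", "done"],
    }
)

# Each step returns a new DataFrame, so the pipeline reads top to bottom
# without intermediate variables or in-place mutation.
summary = (
    raw
    .rename(columns=lambda name: name.lower().replace(" ", "_"))
    .dropna(subset=["repair_hours"])            # drop rows missing a duration
    .query("status == 'done'")                  # keep completed repairs only
    .assign(repair_minutes=lambda df: df["repair_hours"] * 60)
    .groupby("machine_id", as_index=False)
    .agg(total_minutes=("repair_minutes", "sum"))
    .sort_values("total_minutes", ascending=False)
    .reset_index(drop=True)
)

print(summary)
```

Passing a `lambda` to `assign` matters here: it receives the data frame as it exists at that point in the chain, after the rename and filters, rather than the original `raw`.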

Jax#

Pyviz#