Overview
Things that turn “natural language” into something worth processing
Tips, Tricks, & Unsolicited Data Science Opinions for the Aspiring Text Analyst
Chapter Contents
Expressions of Intent
Keywords and Tokens (and tidy-text!)
Finding Entities & Writing Relations (regex+ontology)
Driving with Data
Local Sequences and Probabilities (n-gram language models)
Global Frequencies and Context (vector semantics)
Perfectly Balanced? (distributional semantic embeddings)
Reading Materials Referenced
Chapter Reading Materials
Finding content (regex, keywords, tokens): Chapter 2.1-2.4
Rules, WordNet: Chapter 18.1-18.3
Local frequency, Markov models: Chapter 3.1-3.4
Global context, vector models: Chapter 6
Table of Contents