Overview

Things that turn “natural language” into something worth processing

Tips, Tricks, & Unsolicited Data Science Opinions for the Aspiring Text Analyst

Chapter Contents

  1. Expressions of Intent

    • Keywords and Tokens (and tidy-text!)

    • Finding Entities & Writing Relations (regex+ontology)

  2. Driving with Data

    • Local Sequences and Probabilities (n-gram language models)

    • Global Frequencies and Context (vector semantics)

    • Perfectly Balanced? (distributional semantic embeddings)

Chapter Reading Materials

  • Speech & Language Processing

    • Finding content (regex, keywords, tokens): Chapter 2.1-2.4

    • Rules, WordNet: Chapter 18.1-18.3

    • Local frequency, Markov models: Chapter 3.1-3.4

    • Global context, vector models: Chapter 6

Table of Contents