MTGJSON

MTGJSON is an open-source project that catalogs all Magic: The Gathering cards in a portable format. A dedicated group of fans maintains and supplies data for a variety of projects and sites in the community. Using an aggregation process, we fetch data from multiple resources and approved partners and combine it into various JSON files that you can learn about and download from this website.

mtgjson.com

from pathlib import Path

import pandas as pd
from dvc.repo import Repo  # Repo lives in dvc.repo; dvc.api does not expose it
from IPython.display import Code, HTML

import hvplot.pandas  # registers the .hvplot accessor on DataFrames
import seaborn as sns
import pandera as pa

# Locate the DVC project root, then the feather file holding the card table.
data_dir = Path(Repo.find_root())/'resources'/'data'/'mtg'
df = pd.read_feather(data_dir/'mtg.feather')

We’ve done some work to extract a useful tabular form from the original (nested) JSON format. It is now stored as a feather file to speed up read times.
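
The flattening step itself is not shown here. As a rough sketch of the idea, assuming a nested layout in which each set holds a list of cards, `pandas.json_normalize` can produce one row per card. The toy records, field names, and set codes below are invented for illustration and are not the course's actual extraction code.

```python
import pandas as pd

# Toy stand-in for a nested card dump: sets keyed by code, each holding a
# list of cards. All names and values here are invented for illustration.
nested = {
    "data": {
        "AAA": {
            "releaseDate": "2001-01-01",
            "cards": [
                {"name": "Example Bear", "text": "Trample", "keywords": ["Trample"]},
                {"name": "Example Bolt", "text": "Deal 3 damage.", "flavorText": "Zap."},
            ],
        },
        "BBB": {
            "releaseDate": "2002-06-01",
            "cards": [
                {"name": "Example Wall", "text": "Defender", "keywords": ["Defender"]},
            ],
        },
    }
}

# Turn the dict of sets into a list of records that carry their own set code.
records = [{"setCode": code, **body} for code, body in nested["data"].items()]

# One row per card; the set-level fields are repeated on every row via `meta`.
flat = pd.json_normalize(records, record_path="cards", meta=["setCode", "releaseDate"])

# Tidy up: snake_case the column names and parse the release dates.
flat = flat.rename(columns={"flavorText": "flavor_text", "releaseDate": "release_date"})
flat["release_date"] = pd.to_datetime(flat["release_date"])

print(flat[["name", "setCode", "release_date"]])
```

Calling `flat.to_feather(path)` on a table like this would give a fast-loading file of the kind described above; that call needs `pyarrow` installed.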

Validation is done using the following pandera schema:

from tlp.data import mtg, styleprops_longtext
from inspect import getsourcelines
Code(''.join(getsourcelines(mtg.MTGSchema)[0]), language='python')

There are key text columns that will be of use to this course, namely:

name

the name of the card

text

the rules-text displayed on the main “body” of the card-face.

flavor_text

the “story” and “fantasy” bit, which may not always be present, and is usually prose.

keywords

special, meaningful terms that appear in the “text”, which have gameplay impacts
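
To make the relationship between these columns concrete, here is a small sketch on invented rows. It assumes `keywords` holds a list of strings per card and that `flavor_text` is missing when a card has none; the helper `keywords_in_text` is hypothetical and not part of `tlp`.

```python
import pandas as pd

# Toy rows shaped like the columns described above (all values are invented).
cards = pd.DataFrame({
    "name": ["Example Bear", "Example Bolt", "Example Drake"],
    "text": ["Trample", "Deal 3 damage.", "Flying, vigilance"],
    "flavor_text": [None, "Zap.", None],
    "keywords": [["Trample"], [], ["Flying", "Vigilance"]],
})

def keywords_in_text(row: pd.Series) -> bool:
    """True when every listed keyword occurs in the rules text (case-insensitive)."""
    text = row["text"].lower() if isinstance(row["text"], str) else ""
    return all(kw.lower() in text for kw in row["keywords"])

# Check the claim that keywords are terms appearing in the rules text.
cards["keywords_found"] = cards.apply(keywords_in_text, axis=1)

# Flavor text is optional, so record which cards actually have it.
cards["has_flavor"] = cards["flavor_text"].notna()

print(cards[["name", "keywords_found", "has_flavor"]])
```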

(df[['name', 'text','flavor_text']]
 .sample(10, random_state=2).fillna('').style
 .set_properties(**styleprops_longtext(['text','flavor_text']))
 .hide(axis='index')  # Styler.hide_index() was removed in pandas 2.0
)

There are a number of other potential sources of “fortuitous data”, as well:

%%HTML
<link href="//cdn.jsdelivr.net/npm/mana-font@latest/css/mana.min.css" rel="stylesheet" type="text/css" />

mtg.style_table(df.sample(10, random_state=2),
                hide_columns=['text','flavor_text'])

Symbols are for visualization only; the original data consists of lists of letters: ['W', 'U'], etc. “Mana font” is made by Andrew Gioia.
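
As an illustration of how a list of letters could become symbols, the sketch below builds the `<i>` tags that Mana font styles. The function `mana_icons` is hypothetical, and the `ms ms-<letter> ms-cost` class pattern is an assumption based on Mana font's usual naming; whether `mtg.style_table` works this way is not confirmed here.

```python
from html import escape

def mana_icons(colors):
    """Render color letters such as ['W', 'U'] as Mana-font <i> tags.

    Assumes the Mana font stylesheet is loaded and that symbols follow the
    `ms ms-<letter>` class pattern, with `ms-cost` for the round cost style.
    """
    return "".join(
        f'<i class="ms ms-{escape(c.lower())} ms-cost"></i>' for c in colors
    )

print(mana_icons(["W", "U"]))
```

Because the underlying column still holds plain letters, any filtering or counting should be done on the lists, with the HTML generated only at display time.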

(df
 .set_index('release_date')
 .sort_index()
 .resample('Y')
 .apply(lambda grp: grp.flavor_text.notna().sum()/grp.shape[0])
).hvplot(rot=45, title='What fraction of cards have Flavor Text each year?')
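
The yearly fraction computed above is the mean of a boolean "has flavor text" flag within each year. A minimal sketch on invented rows, grouping on the calendar year instead of resampling (a swapped-in but equivalent technique for years that contain cards):

```python
import pandas as pd

# Toy data: two cards released in 2001 and four in 2002 (values are invented).
toy = pd.DataFrame({
    "release_date": pd.to_datetime([
        "2001-03-01", "2001-09-01",
        "2002-02-01", "2002-05-01", "2002-11-01", "2002-12-01",
    ]),
    "flavor_text": ["Zap.", None, None, "Roar.", "Hush.", "Boom."],
})

# notna() gives True/False per card; the mean of booleans is the fraction of
# True values, which matches notna().sum() / number_of_rows.
has_flavor = toy["flavor_text"].notna()
by_year = has_flavor.groupby(toy["release_date"].dt.year).mean()

print(by_year)
```

One difference worth keeping in mind: grouping on the year only returns years that contain at least one card, whereas resampling also emits a bin for any empty year in between.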