The Musical Corpus of Flow

A fundamental tenet of science is its basis in hard evidence: data. The main dataset used throughout is a corpus of digital rap transcriptions called the Musical Corpus of Flow, or MCFlow for short. MCFlow currently contains transcriptions of 124 popular rap songs, which are freely available on the downloads page.


There are two important steps in creating a good scientific dataset:

1) Finding and selecting the data.
2) Measuring and recording the data.

Selecting Flow Data

Selecting which data points to use is called sampling, because the data you have are always just a "sample" of all the data in the world. The sampling scheme behind MCFlow is explained on the sampling page; information about the 124 songs that have already been sampled is shared there as well.
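As a rough illustration of what drawing a sample means, here is a minimal Python sketch of simple random sampling without replacement. It is not the corpus's actual sampling scheme, which is documented on the sampling page and may work quite differently. The population of 500 placeholder titles, the function name draw_sample, and the fixed seed are all assumptions made for the example.

```python
import random

def draw_sample(population, k, seed=0):
    """Return k items chosen uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the draw reproducible
    return rng.sample(population, k)

# Hypothetical population: placeholder titles, not real corpus entries.
population = [f"song_{i:03d}" for i in range(1, 501)]

# Draw as many items as the corpus currently has songs.
sample = draw_sample(population, 124)
print(len(sample), len(set(sample)))  # 124 124: no song is drawn twice
```

Fixing the seed means anyone can rerun the draw and get the same list, which is useful when a sample has to be documented and shared.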

Measuring Flow Data

The approach in MCFlow is to "measure" rap flow by creating reductive symbolic transcriptions of rap. The features of flow that are transcribed, and the details of how they are encoded, are described on the encoding page.

The current version of MCFlow is 0.5. The 124 songs sampled in version 0.5 are approximately half of the complete sampling target. Version 0.5 transcriptions do not yet include full prosodic or syntactic encodings. Version 1.0, targeted for completion in mid-2017, will include the complete set of encodings described on the encoding page. Later versions (2.0, 3.0, etc.) will add transcriptions of the remainder of the sampling target.

