Rap Theory


The material of rap flow is the sound of language. Thus, to understand rap flow, we need to understand a number of concepts from linguistics: the scientific study of language. The linguistic theory we need falls into two categories:
1) The sounds which make up syllables.
2) The sound of language across multiple syllables, which is called prosody.


A syllable is the smallest individual language sound we can speak. Syllables are used to form words: some words consist of a single syllable, while other words (like the word "syllable") contain multiple syllables. However, from a musical perspective, syllables are a more fundamental unit than words. In other words, syllables are the musical building blocks of rap flow.

Syllables themselves consist of one or more phonemes. Though we all learn some phonemes when we learn to read, English spelling actually does a poor job of representing the sounds of spoken English: as we all know, some sounds can be spelled in multiple ways (like the sound starting "cat" and "kite"), and some spellings can result in different sounds (like the sounds starting "cat" and "city"). In fact, though we only have 26 letters in our alphabet, modern General American English actually uses over forty different sounds! In order to transcribe language more accurately, linguists invented the International Phonetic Alphabet, or IPA for short.

The set of phonemes used on rapscience is based on my own dialect of American English, which is pretty close to "generic" American. Forty-one phonemes are distinguished: 15 vowels and 26 consonants.


Simple vowels:

i = beat; ɪ = bit; ɛ = bet; æ = bat; ʌ = above; ə = above; ɚ = bird; u = boot; ʊ = book; ɑ = bought;

Diphthong (double) vowels:

eɪ = bait; aɪ = bite; aʊ = bout; ɔɪ = boy; oʊ = boat;



Consonants:

t / d = tip / dip; p / b = pit / bit; k / g = kill / gill; ʔ = gotta (pronounced like "gah-uh"); ɾ = gotta (pronounced like "gah-duh");


s / z = sip / zip; f / v = fan / van; ʃ / ʒ = sure / seizure; θ / ð = thin / this; h = hand;


tʃ / dʒ = chin / gin;


n = sun; m = sum; ŋ = sung;


j = yes; w = west; l = led; r = red;

Syllable Structure

Syllables can be broken down into three parts: the onset, nucleus, and coda. The onset and coda can each contain zero or more phonemes, but the nucleus must always be present.

the Nucleus

The core of a syllable is the nucleus: all syllables must have a nucleus, or else they are not really syllables. Some syllables consist of just a nucleus, with no onset or coda, as in "I," "oh," or "ooh." The nucleus of a syllable is nearly always a vowel. However, in English some continuous phonemes can serve as the syllable nucleus. The most common non-vowel syllable nuclei are:
l, as in "li-ttle."
r, as in "sir."
m, as in "hmm."
n, as in "isn't."
Another possible example is the "sh" (ʃ) when you hush someone by saying "shhh."

the Onset

The onset of a syllable includes any phonemes that come before the nucleus. Onsets can contain anywhere from zero to three consonants. Zero phonemes, of course, simply means there is no onset, as in words like "ark," "oops," or "on." Examples of single-phoneme onsets are "cat" (k) and "talk" (t). When two or more consonants are packed into the same part of the syllable, it's called a consonant cluster. Some examples of syllables with consonant clusters in their onsets are "brand" (br), "snap" (sn), and "strain" (str).

the Coda

The coda of a syllable includes any phonemes that come after the nucleus. A coda consists of between zero and four phonemes. Examples of zero-phoneme codas are "do," "go," and "bra." Examples of single-phoneme codas are "squat" (t) and "man" (n). Some examples of syllables with consonant clusters in their codas are "snapped" (pt), "length" (ŋkθ), and "strengths" (ŋkθs).
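The onset/nucleus/coda split described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the syllable is already given as a list of phonemes, the diphthong spellings (eɪ, aɪ, aʊ, oʊ) are standard IPA rather than something taken from this page, and the helper name `split_syllable` is made up for the example.

```python
# Vowel symbols follow the phoneme list on this page; the diphthong
# spellings (eɪ, aɪ, aʊ, oʊ) are standard IPA and an assumption here.
VOWELS = {"i", "ɪ", "ɛ", "æ", "ʌ", "ə", "ɚ", "u", "ʊ", "ɑ",
          "eɪ", "aɪ", "aʊ", "ɔɪ", "oʊ"}
# Consonants that can stand in as a nucleus when there is no vowel.
SYLLABIC_CONSONANTS = {"l", "r", "m", "n"}


def split_syllable(phonemes):
    """Split one syllable (a list of phonemes) into (onset, nucleus, coda)."""
    # Prefer a vowel as the nucleus; fall back to a syllabic consonant.
    for candidates in (VOWELS, SYLLABIC_CONSONANTS):
        for index, phoneme in enumerate(phonemes):
            if phoneme in candidates:
                return phonemes[:index], phoneme, phonemes[index + 1:]
    raise ValueError("no nucleus found, so this is not a syllable")


# "strengths": three-consonant onset, vowel nucleus, four-consonant coda.
onset, nucleus, coda = split_syllable(["s", "t", "r", "ɛ", "ŋ", "k", "θ", "s"])
print(onset, nucleus, coda)
```

Trying a vowel first and a syllabic consonant second mirrors the rule that the nucleus is nearly always a vowel. A real syllabifier would also have to decide where one syllable ends and the next begins, which this sketch does not attempt.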

Not all syllables are created equal. Some syllables are spoken in a way which emphasizes them relative to other syllables. This is called stress. Stress is essential to the meaning of words: an "IM-port" is a thing (noun) that you can "im-PORT" (verb), where capital letters mark the stressed syllable. Similarly, I can "per-FECT" something, unless that thing is already "PER-fect." Generally, English tends to alternate stressed and unstressed syllables in a roughly regular rhythm.

Stress is created in several different ways. Stressed syllables are pronounced slightly longer and louder than unstressed syllables. Also, the vowels in stressed syllables are pronounced fully, whereas unstressed vowels are often reduced, which means they are pronounced more like the sound "uh" (ə). For instance, if you listen to yourself saying either "OB-ject" (the noun, stressed on "ob") or "ob-JECT" (the verb, stressed on "ject"), you will notice that you pronounce the "ob" differently when it is unstressed: most likely it sounds like "uhb," not "ahb."

Rhyme is a special perceptual relationship between two or more spoken utterances. Rhyme occurs when two or more syllables, or groups of syllables, share some, but not all, of their phonemes.

The most common definition of rhyme is that two words rhyme when they share all the same phonemes except for the onset of their first syllable, as in "cat" and "bat."
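That common definition is simple enough to check mechanically. The sketch below assumes each word is already transcribed as a list of phonemes (the transcriptions in the example are rough ones made up for illustration) and that everything before the first vowel counts as the first syllable's onset. The function names are hypothetical.

```python
# Vowel symbols follow the phoneme list on this page; the diphthong
# spellings (eɪ, aɪ, aʊ, oʊ) are standard IPA and an assumption here.
VOWELS = {"i", "ɪ", "ɛ", "æ", "ʌ", "ə", "ɚ", "u", "ʊ", "ɑ",
          "eɪ", "aɪ", "aʊ", "ɔɪ", "oʊ"}


def strip_first_onset(phonemes):
    """Return (first onset, everything from the first vowel onward)."""
    for index, phoneme in enumerate(phonemes):
        if phoneme in VOWELS:
            return phonemes[:index], phonemes[index:]
    return phonemes, []  # no vowel found: treat everything as onset


def is_common_rhyme(word_a, word_b):
    """True when the words match everywhere except the first onset."""
    onset_a, rest_a = strip_first_onset(word_a)
    onset_b, rest_b = strip_first_onset(word_b)
    # Requiring different onsets keeps a word from "rhyming" with itself.
    return bool(rest_a) and rest_a == rest_b and onset_a != onset_b


print(is_common_rhyme(["k", "æ", "t"], ["b", "æ", "t"]))            # cat / bat
print(is_common_rhyme(["m", "eɪ", "k", "ɚ"], ["p", "eɪ", "p", "ɚ"]))  # maker / paper
```

Under this strict definition "cat" and "bat" rhyme, while "maker" and "paper" do not, because their middle consonants differ. The looser definition given just before it (sharing some, but not all, phonemes) would accept that second pair, and would need a similarity score rather than an exact comparison.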

Prosody is a term for sonic patterns which stretch across multiple syllables, especially rhythm and pitch. Here at rapscience.net we mostly discuss rhythm from a musical perspective on our music theory page; therefore, we won't talk much about "rhythmic prosody." Instead, we'll focus on an aspect of prosody that is more important to rap flow: pitch prosody, or intonation.

Many people mistakenly believe that rap flow has no pitch, that rap flow is "monotone." This misconception occurs because rap flow tends to utilize linguistic pitch (a.k.a. intonation) instead of musical pitch.

Musical pitch versus linguistic pitch

In musical pitch (singing or instrumental), a limited set of stable frequencies, called a "scale," is used. The pitch of a singer's voice will jump between the individual frequencies (the "steps") of the scale. The following graphic shows the fundamental frequency contour of Eminem's voice, as he sings the first verse of "Hailie's Song":

The vocal pitch of part of the first verse of Eminem's "Hailie's Song."

Notice how the pitch of his voice makes big jumps between relatively flat, stable areas, which align (more or less) with the musical letter names shown on the left side.

Linguistic intonation is very different; the pitch of the voice doesn't pick out stable repeated pitches, but instead smoothly slides between nonspecific pitches. It doesn't matter exactly what pitch happens at any given time (there are no scales). What matters is how the pitch slides and changes: the pitch contour. This sort of pitch intonation happens in normal, day-to-day speech, but is also the norm in rap flow. This next image shows the pitch of Eminem's voice when rapping later in the same song:

The vocal pitch of part of the first verse of Eminem's "Hailie's Song."

Notice that this time there are no stable flat areas in the pitch contour. However, there is a lot of pitch movement going on; it's not a monotone!
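One rough way to put a number on the difference between the two contours is to ask how much of the time the pitch holds still. The sketch below is an assumption-laden illustration, not the method used to make the figures: it takes a contour as a list of frequencies in Hz (one per analysis frame), converts them to semitones with the standard formula 12 · log2(f / reference), and counts the fraction of frame-to-frame steps smaller than an arbitrary tolerance of a quarter semitone. The contours are synthetic, not measured from any recording.

```python
import math


def hz_to_semitones(frequency_hz, reference_hz=440.0):
    """Convert a frequency to semitones relative to a reference pitch."""
    return 12.0 * math.log2(frequency_hz / reference_hz)


def stable_fraction(contour_hz, tolerance=0.25):
    """Fraction of frame-to-frame steps that move less than `tolerance` semitones."""
    semitones = [hz_to_semitones(f) for f in contour_hz]
    steps = [abs(b - a) for a, b in zip(semitones, semitones[1:])]
    if not steps:
        return 0.0
    return sum(step < tolerance for step in steps) / len(steps)


# Synthetic, made-up contours for illustration only.
sung = [220.0] * 5 + [261.63] * 5 + [220.0] * 5        # flat plateaus with jumps
spoken = [220.0 * 2 ** (-i / 14) for i in range(15)]   # one steady downward glide

print(stable_fraction(sung))    # mostly stable: 12 of 14 steps are flat
print(stable_fraction(spoken))  # never stable: every step slides
```

Note that the gliding contour scores zero even though it moves through a full octave. Low stability is not the same as monotone: the pitch is moving constantly, it just never settles on the steps of a scale. Real pitch tracks are noisy, so the tolerance would need tuning on actual data.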

Intonation contours are essential to language, as they give it a shape and structure that helps communicate meaning. There are two important ways in which pitch prosody is relevant to rap flow:
1) Pitch accents
2) Prosodic units

Pitch Accents

By slightly raising (or lowering) pitch we can emphasize certain words or syllables, creating pitch accents. By combining pitch accents with basic syllable stress, we can create quite a variety of levels of syllabic prominence. In rap flow, this comes into play in the creation of rhythmic layers.

Prosodic Units

One of the most important functions of prosody is breaking speech into logical units, letting us know when one thought ends and another begins. Thus, prosody acts like punctuation (periods, commas, etc.) in written language. Rhythm is essential to prosodic segmentation: we pause between ideas, and often slow down and elongate the last syllables in a unit. However, pitch is also essential. Most utterances start at a relatively high pitch and end on a relatively low pitch, with one or two pitch accents somewhere in between. This basic prosodic contour (high, then accent, then low) is essential to understanding spoken English. This can be further elaborated at a micro level, creating boundary contours, or at a macro level, as a part of a general pitch declination.
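The rhythmic side of this segmentation (pausing between ideas) can be sketched very simply. The example below assumes syllables arrive as hypothetical (text, start, end) tuples with times in seconds, and starts a new unit whenever the silence before a syllable reaches an arbitrary threshold. It ignores final lengthening and pitch entirely, so it covers only one of the cues described above.

```python
def split_into_units(syllables, min_pause=0.25):
    """Group (text, start, end) syllables into units separated by pauses."""
    units = []
    current = []
    previous_end = None
    for text, start, end in syllables:
        # A long enough silence before this syllable closes the current unit.
        if previous_end is not None and start - previous_end >= min_pause:
            units.append(current)
            current = []
        current.append(text)
        previous_end = end
    if current:
        units.append(current)
    return units


# Made-up timings: a 0.4 second pause separates the two units.
timed = [
    ("one", 0.00, 0.20), ("two", 0.22, 0.40), ("three", 0.42, 0.80),
    ("four", 1.20, 1.40), ("five", 1.42, 1.90),
]
print(split_into_units(timed))
```

The 0.25 second threshold is a guess for illustration; in fast flows the pauses between units can be much shorter, which is part of why the pitch cues matter.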

Boundary Contours

One of the most important ways that emcees create boundaries in their flow is by repeating a clear intonation contour at the ends of phrases; these contours are called boundary tones. The most common boundary contour is a slow drop in pitch on a syllable, and that syllable is often rhythmically elongated too. The following example shows the pitch of Biggie Smalls's voice at the beginning of the song "Suicidal Thoughts."

Illustration of phrase-final boundaries in Biggie Smalls's "Suicidal Thoughts"

Notice how the pitch drops off in a slow curve on the words "hell" and "tell." These pitch drops help us hear that these are two distinct phrases. The matching contours on "hell" and "tell" also help make the rhyme stronger. (We can also see one pitch accent in each phrase, on the words "fuck" and "shit" respectively.)

A dropping boundary contour, like the one in "Suicidal Thoughts," is by far the most common one in rap flow. However, emcees do sometimes create other types of boundary contours. For example, in the song "Money Maker" Ludacris ends each of the first four phrases of each verse with a dramatic upward-sliding contour:
Shake, shake, shake your money ma-ker
Like you've been shakin' it for some pa-per
Took your momma nine months to make ya
Might as well shake what your momma gave ya

Pitch Declination

So far we've discussed how individual utterances (like sentences) usually start with a high pitch and drop to a low boundary. However, this same basic high-to-low pattern can actually be stretched across multiple utterances. Each phrase has its own internal high-to-low shape, and at the same time the overall height of each utterance drops lower and lower. This overall drop is called pitch declination. Each phrase starts a little bit lower than the previous phrase started, and the low boundary contour at the end of each utterance is lower than the previous one. The resulting shape looks something like this:

In this image, each of the three hills in the contour represents a whole phrase. At the same time, the three phrases as a whole form one group. These multiple levels of prosodic segmentation are called the phonological hierarchy. The biggest level is the "full intonational phrase"; subphrases are called "intermediate intonational phrases."
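The two conditions that define declination (each phrase starts lower than the last, and each phrase ends lower than the last) can be checked with a plain least-squares slope. The sketch below uses made-up pitch values in Hz for three phrases; the function names and numbers are illustrative assumptions, not measurements.

```python
def linear_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    count = len(values)
    if count < 2:
        raise ValueError("need at least two points")
    mean_x = (count - 1) / 2
    mean_y = sum(values) / count
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(count))
    return numerator / denominator


def shows_declination(phrases):
    """True when both phrase starts and phrase ends trend downward."""
    starts = [phrase[0] for phrase in phrases]
    ends = [phrase[-1] for phrase in phrases]
    return linear_slope(starts) < 0 and linear_slope(ends) < 0


# Hypothetical pitch values (Hz): each phrase falls, and each one sits lower.
phrases = [
    [230, 200, 150],
    [210, 185, 135],
    [195, 170, 120],
]
print(linear_slope([p[0] for p in phrases]))  # slope of the phrase starts
print(shows_declination(phrases))
```

Fitting the starts and the ends separately keeps the two levels apart: the fall inside each phrase is the phrase's own contour, while the downward trend across phrases is the declination of the larger group.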

For a real world example, consider this zoomed out look at the pitch contour of Eminem's voice in the first verse and chorus of "Without Me":

Illustration of pitch declination patterns in Eminem's "Without Me."

Notice how there's a jagged sawtooth pattern, with the pitch starting high and then gradually dropping down. Each of these six sawtooth cycles actually takes up four measures of music. The high points which start each of the six lines go with the words:
1) I've cre-a-ted a monster...
2) Some vod-ka that'll jump your heart...
3) You wai-ted this long so stop debating...
4) So the F.C.C. won't let me be...
5) So, come on and dip, bum on your lips...
6) Now, this looks like a job for me...

Listen to the song to see if you can hear the pattern!

Singing vs Rapping

So far we've focused on how rap flow is full of linguistic pitch (intonation). However, the truth is that rap flow is very flexible: emcees do sometimes sing, and often the boundary between rapping and singing is blurry. Emcees can mix stable, clear musical pitch with smooth, sliding linguistic intonation. A great example is Andre 3000's verse in OutKast's "Da Art of Storytellin' (Part 1)": in this verse Andre 3000 switches between sections that are clearly "rapped," with no musical pitch, and other sections which are more like, but not quite, singing.

Example of pitched rap in OutKast's "Da Art of Storytellin'."

In this passage, Andre 3000 pretty clearly hits the musical pitches D, F#, and A. However, the overall feel of the passage is still more rap-like, unlike the sung verse of "Hailie's Song" above.

