MIDIfiles are more or less MIDI data plus timing stored in a file.
Each file contains one or more parallel streams of events.
All files contain a header chunk followed by one or more track chunks.
MIDIfile data is stored in 'BigEndian' format, with multibyte numbers stored high-byte first in the file. Hence, multibyte data has to be byte-swapped on little-endian machines such as PCs.
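As an illustration of reading big-endian fields without manual byte swapping, here is a minimal Python sketch. It assumes every chunk begins with a 4-byte identifier followed by a 4-byte big-endian length, as the header and track chunk layouts in this document suggest. The function name `iter_chunks` is made up for this example, and 'MTrk' in the usage note is the standard track chunk identifier.

```python
import struct


def iter_chunks(data: bytes):
    """Yield (chunk_id, body) pairs from the raw bytes of a MIDI file."""
    pos = 0
    while pos + 8 <= len(data):
        # '>' requests big-endian order: 4 raw bytes, then an unsigned 32-bit int.
        chunk_id, length = struct.unpack_from(">4sI", data, pos)
        body = data[pos + 8 : pos + 8 + length]
        yield chunk_id, body
        pos += 8 + length  # skip the 8-byte chunk header plus the body
```

For a well-formed file, the first pair yielded should have the identifier `b"MThd"`, and each later pair should be a track chunk (`b"MTrk"`).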
Variable Length Numbers
Many values in MIDI files are stored in 'variable length' multi-byte format. The numbers are stored at 7 bits per byte with the most-significant end coming first. All bytes except the last have the most significant bit set. Hence, values between 0 and 127 fit simply into a single byte. Numbers up to 14 bits in length need two bytes and so on.
|0x001FFFFF||FF FF 7F|
|0x08000000||C0 80 80 00|
|0x0FFFFFFF||FF FF FF 7F|
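Although the format could theoretically store numbers of any length, a limit is imposed: the largest permitted value is 0x0FFFFFFF (28 bits), which needs 4 bytes, so that a decoded value always fits in a 32-bit integer.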
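The encoding described above can be sketched in Python as follows. The function names `encode_vlq` and `decode_vlq` are invented for this example; the code follows the 7-bits-per-byte rule and does not enforce any upper limit on the value.

```python
from typing import Tuple


def encode_vlq(value: int) -> bytes:
    """Encode a non-negative integer as a variable-length number."""
    if value < 0:
        raise ValueError("variable-length numbers cannot be negative")
    out = [value & 0x7F]  # last byte: high bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(out))  # most-significant group comes first


def decode_vlq(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_pos) for the variable-length number at data[pos]."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:  # high bit clear marks the final byte
            return value, pos
```

Encoding the three values in the table above reproduces the byte sequences shown there, for instance `encode_vlq(0x001FFFFF)` gives the bytes `FF FF 7F`.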
|0x0000||4||char||Magic||'MThd' (See Note #1)|
|0x0004||4||int||Length of rest of header||6|
|0x0008||2||short||Format||See Note #2|
|0x000A||2||short||Num track chunks||-|
|0x000C||2||short/bytes||Tempo||See Note #3|
- NOTE #1
- Should say 'MThd'.
- If it says 'RIFF' then it's an evil bastardized Microsoftian MIDI file (an RMID wrapper). Toss out the RIFF wrapper (typically 20 bytes: 'RIFF', a 4-byte length, 'RMID', 'data' and another 4-byte length) and, theoretically, you'll find the 'MThd' header.
- NOTE #2
- 0 - one, single multi-channel track
- 1 - one or more simultaneous tracks
- 2 - one or more sequentially independent single-track patterns
- NOTE #3
- If tempo is negative:
- The absolute value of the high byte = Number of frames per second.
- Low byte: Resolution within one frame (delta-time ticks per frame).
- If tempo is positive:
- Tempo is the number of delta-time ticks per quarter-note.
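A minimal Python sketch of reading the header chunk, using the offsets from the table above. The function name `parse_header` and the dictionary keys are illustrative choices, not part of the format; the variable `division` holds the field the table calls 'Tempo'.

```python
import struct


def parse_header(data: bytes) -> dict:
    """Decode the 14-byte header chunk at the start of a MIDI file."""
    # Big-endian: 4-byte magic, 32-bit length, two 16-bit fields, signed 16-bit 'Tempo'.
    magic, length, fmt, ntracks, division = struct.unpack(">4sIHHh", data[:14])
    if magic != b"MThd":
        raise ValueError("missing 'MThd' magic")
    info = {"header_length": length, "format": fmt, "tracks": ntracks}
    if division < 0:
        raw = division & 0xFFFF
        # High byte is a negative number: its absolute value is frames per second.
        info["frames_per_second"] = 256 - (raw >> 8)
        # Low byte: resolution within one frame.
        info["ticks_per_frame"] = raw & 0xFF
    else:
        info["ticks_per_quarter_note"] = division
    return info
```

For example, a header whose last two bytes are `E7 28` would be read as 25 frames per second with a resolution of 40 per frame.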
Track chunks are essentially a stream of MIDI data preceded by delta times.
Each event comprises a delta-time and the event data. The delta-time is a variable-length quantity and stores the time that has to elapse after the previous event before this event is due. A time of 0 means that the event happens at the same moment as the previous one.
At the start of each track chunk there is a track chunk header:
|0x0004||4||int||Length of rest of track chunk in bytes||0|
All events start with a variable length number (see above) indicating the 'delta time' for this event, i.e. the delay from the previous event (or from the start of the track, for the first event) to this one.
Then, each event starts with either:
- A byte with the high bit set - the lower 7 bits indicating the type of the event.
- A byte with the high bit zeroed indicating that this event has the same type as the previous event in this track - and this is the first byte of the actual data for the event. This is called a 'running status' event.
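The two cases above could be handled along these lines. This is only a sketch: `resolve_status` is a hypothetical helper name, and the caller is assumed to remember the most recent type byte seen in the current track.

```python
from typing import Optional, Tuple


def resolve_status(data: bytes, pos: int, running_status: Optional[int]) -> Tuple[int, int]:
    """Return (status, next_pos) for the event whose first byte is data[pos]."""
    byte = data[pos]
    if byte & 0x80:
        # High bit set: this byte is the type byte, so consume it.
        return byte, pos + 1
    if running_status is None:
        raise ValueError("data byte found with no previous event type")
    # High bit clear: reuse the previous type; this byte is already event data.
    return running_status, pos
```

Note that in the running-status case the position is not advanced, because the byte that was inspected belongs to the event's data.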
There are three kinds of event:
- MIDI event - type fields between 0x80 and 0xEF
- System exclusive event (SysEx) - the type field is either 0xF7 or 0xF0
- Meta event (non-MIDI information) - the type field is 0xFF
For MIDI events, the type field (which always has the high bit set) contains the event code in the top four bits and the channel id in the lower 4 bits. Hence there are 16 MIDI channels, and the MIDI event codes are 0x8_, 0x9_, 0xA_, 0xB_, 0xC_, 0xD_ and 0xE_.
The seven MIDI event types are:
|Type code||Additional bytes||Meaning|
|0x8_||nn vv||Note off event|
|0x9_||nn vv||Note on event|
|0xA_||nn vv||Note aftertouch event|
|0xB_||cc xx||Controller event|
|0xC_||pp||Program change event|
|0xD_||vv||Channel aftertouch event|
|0xE_||pppp||Pitch bend event|
|0xF_||(Not a MIDI event - this is either a SysEx or a Meta event)|
- nn - the note number (0..127).
- The frequency of the note 'nn' is: 2^((nn-69)/12) x 440 Hz.
- The note that best approximates a frequency of 'freq' is: 69+12 x log2(freq/440Hz).
- vv - the 'velocity' with which a note is struck or released on a scale of (0..127).
- cc - the number of a controller (0..127).
- xx - the value of a particular controller (0..127).
- pp - a 'program' - typically an instrument type (0..127). For 'general MIDI' there is a standardised list of instruments. But for an arbitrary synth, it just means "preset number pp" - whatever that is.
- pppp - a 14 bit 'bend amount', sent as two data bytes of 7 bits each, least significant byte first. A value of 0x2000 means no bend.
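The table and formulas above can be sketched in Python as below. All names are illustrative. The pitch bend helper assumes the usual MIDI wire encoding, in which the bend amount arrives as two data bytes carrying 7 bits each, least significant byte first.

```python
import math

# Number of data bytes that follow each MIDI event code (top four bits of the type).
DATA_BYTES = {0x8: 2, 0x9: 2, 0xA: 2, 0xB: 2, 0xC: 1, 0xD: 1, 0xE: 2}


def split_status(status: int):
    """Split a MIDI event type byte into (event code, channel id)."""
    return status >> 4, status & 0x0F


def note_to_freq(nn: int) -> float:
    """Frequency in Hz of note number nn (note 69 is 440 Hz)."""
    return 440.0 * 2.0 ** ((nn - 69) / 12)


def freq_to_note(freq: float) -> int:
    """Note number that best approximates a frequency in Hz."""
    return round(69 + 12 * math.log2(freq / 440.0))


def pitch_bend_value(lsb: int, msb: int) -> int:
    """Combine the two 7-bit pitch bend data bytes into one value."""
    return (msb << 7) | lsb
```

With these helpers, a type byte of 0x93 splits into event code 0x9 (note on) on channel id 3, and the data bytes `00 40` combine to 0x2000.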
|0x0002||?||variable-length number||Length of data in bytes|
Predefined Meta events:
|FF||00||02||ssss||Sequence Number||Note #1.|
|FF||01||len||text||Text Event||ASCII text describing anything.|
|FF||05||len||text||Lyric||Each syllable should be a lyric event with the appropriate timing.|
|FF||06||len||text||Section Marker||A rehearsal note or section name (eg "Second chorus")|
|FF||07||len||text||Cue Point||Something happening in a related film/stage performance.|
|FF||2F||00||(nothing)||End of Track||Provides timing for final delay at the end of the track.|
|FF||51||03||3-byte int||Tempo change||Note #2|
|FF||54||05||hour min sec frame ff||SMPTE Offset||Note #3|
|FF||58||04||nn dd cc bb||Time Signature||Note #4|
|FF||59||02||sf mi||Key Signature||Note #5|
|FF||7F||len||data||Sequencer-Specific Meta-Event||Note #6|
- Note #1
- Sequence number:
- This optional event must occur at the beginning of a track, before any time has elapsed (i.e. before any non-zero delta-time) and before any MIDI events that have to be sent.
- Note #2
- Tempo change:
- Can be envisaged as usec per MIDI quarter-note.
- Can be envisaged as 24ths of a microsecond per MIDI clock.
- Note #3
- SMPTE Offset:
- Designates the SMPTE time at which the track chunk is supposed to start.
- The hour is encoded in SMPTE format, same as MIDI Time Code.
- The ff field contains fractional frames, in 100ths of a frame.
- Note #4
- Time Signature:
- The time signature is four 1 byte numbers.
- nn & dd are the numerator and denominator of the time signature as it would be notated.
- dd is a negative power of two:
- 2 = quarter-note,
- 3 = eighth-note, etc.
- cc : number of MIDI clocks in a metronome click.
- bb : number of notated 32nd-notes in a MIDI quarter-note (24 MIDI Clocks).
- Note #5
- Key Signature
- sf field:
- sf = -7: 7 flats
- sf = -1: 1 flat
- sf = 0: key of C
- sf = 1: 1 sharp
- sf = 7: 7 sharps
- mi field:
- mi = 0: major key
- mi = 1: minor key
- Note #6
- Sequencer-Specific Meta-Event
- First byte: is a manufacturer ID.
- After that - all bets are off!
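As a rough illustration of the notes above, the following Python sketch decodes the data bytes of three of the meta events: tempo change, time signature and key signature. The function names and the returned structures are assumptions made for this example; each function expects only the data bytes that follow the length field.

```python
def tempo_to_bpm(payload: bytes) -> float:
    """Convert the 3-byte tempo value (usec per quarter-note) to quarter-notes per minute."""
    usec_per_quarter = int.from_bytes(payload, "big")
    return 60_000_000 / usec_per_quarter


def decode_time_signature(payload: bytes) -> dict:
    """Decode the four 1-byte fields nn dd cc bb."""
    nn, dd, cc, bb = payload
    return {
        "numerator": nn,
        "denominator": 2 ** dd,  # dd is stored as a power of two
        "clocks_per_click": cc,
        "32nds_per_quarter": bb,
    }


def decode_key_signature(payload: bytes):
    """Decode sf (signed count of sharps/flats) and mi (0 = major, 1 = minor)."""
    sf = int.from_bytes(payload[:1], "big", signed=True)
    mode = "minor" if payload[1] else "major"
    return sf, mode
```

For example, the tempo bytes `07 A1 20` are 500000 usec per quarter-note, which `tempo_to_bpm` converts to 120 quarter-notes per minute, and a time signature of `06 03 24 08` decodes as 6/8.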
System Exclusive Events
SysEx is used as an escape sequence that specifies completely arbitrary data to be sent to a sequencer or synthesizer.
There are two versions:
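- The 0xF0 event sends a 0xF0 byte followed by the data. The 0xF7 event sends only the data, with no leading 0xF0, so use it for continuation packets or for raw bytes that must not be prefixed with 0xF0...otherwise use 0xF0.
- A complete SysEx message must end with a 0xF7 byte.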
|0x0000||1||byte||ID||0xF0 or 0xF7|