2 Informal standard M. Nilsson
3 Document: id3v2.4.0-structure.txt 16 September 2001
6 ID3 tag version 2.4.0 - Main Structure
8 Status of this document
This document is an informal standard and replaces the ID3v2.3.0
standard [ID3v2]. A formal standard will use another revision number
even if the content is identical to this document. The contents of this
document may change for clarifications but never for added or altered
functionality.
16 Distribution of this document is unlimited.
21 This document describes the main structure of ID3v2.4.0, which is a
22 revised version of the ID3v2 informal standard [ID3v2] version
23 2.3.0. The ID3v2 offers a flexible way of storing audio meta
24 information within the audio file itself. The information may be
25 technical information, such as equalisation curves, as well as
26 title, performer, copyright etc.
28 ID3v2.4.0 is meant to be as close as possible to ID3v2.3.0 in order
29 to allow for implementations to be revised as easily as possible.
34 Status of this document
37 2. Conventions in this document
41 3.2. ID3v2 extended header
44 4. ID3v2 frames overview
45 4.1. Frame header flags
46 4.1.1. Frame status flags
47 4.1.2. Frame format flags
50 6.1. The unsynchronisation scheme
51 6.2. Synchsafe integers
57 2. Conventions in this document
59 Text within "" is a text string exactly as it appears in a tag.
60 Numbers preceded with $ are hexadecimal and numbers preceded with %
61 are binary. $xx is used to indicate a byte with unknown content. %x
62 is used to indicate a bit with unknown content. The most significant
63 bit (MSB) of a byte is called 'bit 7' and the least significant bit
64 (LSB) is called 'bit 0'.
66 A tag is the whole tag described in this document. A frame is a block
67 of information in the tag. The tag consists of a header, frames and
68 optional padding. A field is a piece of information; one value, a
69 string etc. A numeric string is a string that consists of the
70 characters "0123456789" only.
72 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
73 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
74 document are to be interpreted as described in RFC 2119 [KEYWORDS].
79 ID3v2 is a general tagging format for audio, which makes it possible
80 to store meta data about the audio inside the audio file itself. The
81 ID3 tag described in this document is mainly targeted at files
82 encoded with MPEG-1/2 layer I, MPEG-1/2 layer II, MPEG-1/2 layer III
83 and MPEG-2.5, but may work with other types of encoded audio or as a
84 stand alone format for audio meta data.
86 ID3v2 is designed to be as flexible and expandable as possible to
87 meet new meta information needs that might arise. To achieve that
88 ID3v2 is constructed as a container for several information blocks,
89 called frames, whose format need not be known to the software that
encounters them. At the start of every frame is a unique and
predefined identifier, a size descriptor that allows software to skip
unknown frames, and a flags field. The flags describe encoding
details and whether the frame should remain in the tag, should it be
unknown to the software, if the file is altered.
96 The bitorder in ID3v2 is most significant bit first (MSB). The
97 byteorder in multibyte numbers is most significant byte first (e.g.
98 $12345678 would be encoded $12 34 56 78), also known as big endian
99 and network byte order.
101 Overall tag structure:
  +-----------------------------+
  |      Header (10 bytes)      |
  +-----------------------------+
  |       Extended Header       |
  | (variable length, OPTIONAL) |
  +-----------------------------+
  |   Frames (variable length)  |
  +-----------------------------+
  |           Padding           |
  | (variable length, OPTIONAL) |
  +-----------------------------+
  | Footer (10 bytes, OPTIONAL) |
  +-----------------------------+
117 In general, padding and footer are mutually exclusive. See details in
118 sections 3.3, 3.4 and 5.
The first part of the ID3v2 tag is the 10 byte tag header, laid out
as follows:

  ID3v2/file identifier      "ID3"
  ID3v2 version              $04 00
  ID3v2 flags                %abcd0000
  ID3v2 size             4 * %0xxxxxxx
131 The first three bytes of the tag are always "ID3", to indicate that
132 this is an ID3v2 tag, directly followed by the two version bytes. The
133 first byte of ID3v2 version is its major version, while the second
134 byte is its revision number. In this case this is ID3v2.4.0. All
135 revisions are backwards compatible while major versions are not. If
136 software with ID3v2.4.0 and below support should encounter version
137 five or higher it should simply ignore the whole tag. Version or
138 revision will never be $FF.
The version is followed by the ID3v2 flags field, of which currently
four flags are used:
144 a - Unsynchronisation
146 Bit 7 in the 'ID3v2 flags' indicates whether or not
147 unsynchronisation is applied on all frames (see section 6.1 for
148 details); a set bit indicates usage.
153 The second bit (bit 6) indicates whether or not the header is
154 followed by an extended header. The extended header is described in
155 section 3.2. A set bit indicates the presence of an extended
159 c - Experimental indicator
161 The third bit (bit 5) is used as an 'experimental indicator'. This
162 flag SHALL always be set when the tag is in an experimental stage.
167 Bit 4 indicates that a footer (section 3.4) is present at the very
168 end of the tag. A set bit indicates the presence of a footer.
All the other flags MUST be cleared. If one of these undefined flags
is set, the tag might not be readable for a parser that does not
know the flag's function.
175 The ID3v2 tag size is stored as a 32 bit synchsafe integer (section
176 6.2), making a total of 28 effective bits (representing up to 256MB).
178 The ID3v2 tag size is the sum of the byte length of the extended
header, the padding and the frames after unsynchronisation. If a
footer is present this equals ('total size' - 20) bytes, otherwise
('total size' - 10) bytes.
An ID3v2 tag can be detected with the following pattern:

  $49 44 33 yy yy xx zz zz zz zz

Where yy is less than $FF, xx is the 'flags' byte and zz is less than
$80.
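
The header layout and the detection pattern can be sketched in Python as
follows. This is a minimal illustration, not a reference implementation:
the names TagHeader and parse_tag_header, and the max_major parameter,
are hypothetical and do not come from this document.

```python
from typing import NamedTuple, Optional


class TagHeader(NamedTuple):
    major: int
    revision: int
    unsynchronisation: bool  # flag a, bit 7
    extended_header: bool    # flag b, bit 6
    experimental: bool       # flag c, bit 5
    footer_present: bool     # flag d, bit 4
    size: int                # value of the 'ID3v2 size' field
    total_size: int          # size plus header and, if present, footer


def parse_tag_header(data: bytes, max_major: int = 4) -> Optional[TagHeader]:
    """Return the parsed header, or None if data does not start a usable tag."""
    if len(data) < 10 or data[0:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    # Version and revision are never $FF; unknown major versions are ignored.
    if major == 0xFF or revision == 0xFF or major > max_major:
        return None
    size = 0
    for byte in data[6:10]:
        if byte >= 0x80:  # every size byte must be less than $80
            return None
        size = (size << 7) | byte
    footer = bool(flags & 0x10)
    return TagHeader(
        major=major,
        revision=revision,
        unsynchronisation=bool(flags & 0x80),
        extended_header=bool(flags & 0x40),
        experimental=bool(flags & 0x20),
        footer_present=footer,
        size=size,
        total_size=size + (20 if footer else 10),
    )
```

The sketch does not reject headers with undefined flag bits set; a stricter
parser might treat a non-zero value of (flags & 0x0F) as unreadable.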
191 The extended header contains information that can provide further
192 insight in the structure of the tag, but is not vital to the correct
193 parsing of the tag information; hence the extended header is
196 Extended header size 4 * %0xxxxxxx
197 Number of flag bytes $01
200 Where the 'Extended header size' is the size of the whole extended
201 header, stored as a 32 bit synchsafe integer. An extended header can
202 thus never have a size of fewer than six bytes.
204 The extended flags field, with its size described by 'number of flag
205 bytes', is defined as:
209 Each flag that is set in the extended header has data attached, which
210 comes in the order in which the flags are encountered (i.e. the data
211 for flag 'b' comes before the data for flag 'c'). Unset flags cannot
212 have any attached data. All unknown flags MUST be unset and their
213 corresponding data removed when a tag is modified.
215 Every set flag's data starts with a length byte, which contains a
216 value between 0 and 127 ($00 - $7f), followed by data that has the
217 field length indicated by the length byte. If a flag has no attached
218 data, the value $00 is used as length byte.
223 If this flag is set, the present tag is an update of a tag found
224 earlier in the present file or stream. If frames defined as unique
225 are found in the present tag, they are to override any
226 corresponding ones found in the earlier tag. This flag has no
If this flag is set, CRC-32 [ISO-3309] data is included in the
extended header. The CRC is calculated on all the data between the
header and footer as indicated by the header's tag length field,
minus the extended header. Note that this includes the padding (if
there is any), but excludes the footer. The CRC-32 is stored as a
35 bit synchsafe integer, leaving the upper four bits always
zeroed.
242 Total frame CRC 5 * %0xxxxxxx
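
A possible way to produce and read the five byte CRC field is sketched
below. It assumes that Python's zlib.crc32 computes the CRC-32 meant by
[ISO-3309]; the function names crc_field and crc_from_field are
hypothetical.

```python
import zlib


def crc_field(frames_and_padding: bytes) -> bytes:
    """Return the CRC-32 of the given data as a 5 byte synchsafe field."""
    crc = zlib.crc32(frames_and_padding) & 0xFFFFFFFF
    # Seven payload bits per byte, most significant group first.
    return bytes((crc >> shift) & 0x7F for shift in (28, 21, 14, 7, 0))


def crc_from_field(field: bytes) -> int:
    """Decode a 5 byte synchsafe CRC field back into a 32 bit value."""
    if len(field) != 5 or any(byte & 0x80 for byte in field):
        raise ValueError("not a 5 byte synchsafe integer")
    if field[0] & 0xF0:
        raise ValueError("upper four bits must be zeroed")
    value = 0
    for byte in field:
        value = (value << 7) | byte
    return value
```

The data passed to crc_field is everything between the extended header
and the footer, padding included.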
246 For some applications it might be desired to restrict a tag in more
247 ways than imposed by the ID3v2 specification. Note that the
248 presence of these restrictions does not affect how the tag is
249 decoded, merely how it was restricted before encoding. If this flag
250 is set the tag is restricted as follows:
253 Restrictions %ppqrrstt
255 p - Tag size restrictions
257 00 No more than 128 frames and 1 MB total tag size.
258 01 No more than 64 frames and 128 KB total tag size.
259 10 No more than 32 frames and 40 KB total tag size.
260 11 No more than 32 frames and 4 KB total tag size.
262 q - Text encoding restrictions
265 1 Strings are only encoded with ISO-8859-1 [ISO-8859-1] or
268 r - Text fields size restrictions
271 01 No string is longer than 1024 characters.
272 10 No string is longer than 128 characters.
273 11 No string is longer than 30 characters.
Note that nothing is said about how many bytes are used to
represent those characters, since it is encoding dependent. If a
text frame consists of more than one string, the sum of the
strings is restricted as stated.
280 s - Image encoding restrictions
283 1 Images are encoded only with PNG [PNG] or JPEG [JFIF].
285 t - Image size restrictions
288 01 All images are 256x256 pixels or smaller.
289 10 All images are 64x64 pixels or smaller.
290 11 All images are exactly 64x64 pixels, unless required
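
The restrictions byte %ppqrrstt can be split into its five fields with
shifts and masks. The sketch below only extracts the raw field values;
the function name and the dictionary keys are hypothetical.

```python
def decode_restrictions(byte: int) -> dict:
    """Split the restrictions byte %ppqrrstt into its five fields."""
    return {
        "tag_size": (byte >> 6) & 0b11,         # p, bits 7-6
        "text_encoding": (byte >> 5) & 0b1,     # q, bit 5
        "text_field_size": (byte >> 3) & 0b11,  # r, bits 4-3
        "image_encoding": (byte >> 2) & 0b1,    # s, bit 2
        "image_size": byte & 0b11,              # t, bits 1-0
    }
```

For example, a tag_size value of 0b01 corresponds to the restriction
"No more than 64 frames and 128 KB total tag size".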
296 It is OPTIONAL to include padding after the final frame (at the end
297 of the ID3 tag), making the size of all the frames together smaller
298 than the size given in the tag header. A possible purpose of this
299 padding is to allow for adding a few additional frames or enlarge
300 existing frames within the tag without having to rewrite the entire
301 file. The value of the padding bytes must be $00. A tag MUST NOT have
302 any padding between the frames or between the tag header and the
frames. Furthermore it MUST NOT have any padding when a tag footer is
present.
309 To speed up the process of locating an ID3v2 tag when searching from
310 the end of a file, a footer can be added to the tag. It is REQUIRED
311 to add a footer to an appended tag, i.e. a tag located after all
312 audio data. The footer is a copy of the header, but with a different
315 ID3v2 identifier "3DI"
317 ID3v2 flags %abcd0000
318 ID3v2 size 4 * %0xxxxxxx
321 4. ID3v2 frame overview
All ID3v2 frames consist of one frame header followed by one or more
fields containing the actual information. The header is always 10
bytes and laid out as follows:

  Frame ID      $xx xx xx xx  (four characters)
  Size      4 * %0xxxxxxx
  Flags         $xx xx
331 The frame ID is made out of the characters capital A-Z and 0-9.
332 Identifiers beginning with "X", "Y" and "Z" are for experimental
333 frames and free for everyone to use, without the need to set the
334 experimental bit in the tag header. Bear in mind that someone else
335 might have used the same identifier as you. All other identifiers are
336 either used or reserved for future use.
338 The frame ID is followed by a size descriptor containing the size of
339 the data in the final frame, after encryption, compression and
340 unsynchronisation. The size is excluding the frame header ('total
341 frame size' - 10 bytes) and stored as a 32 bit synchsafe integer.
343 In the frame header the size descriptor is followed by two flag
344 bytes. These flags are described in section 4.1.
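
Reading a frame header could look like the sketch below. The names
FrameHeader and parse_frame_header are hypothetical; the two flag bytes
are returned undecoded.

```python
import re
from typing import NamedTuple

# A frame ID consists of four characters out of A-Z and 0-9.
FRAME_ID_RE = re.compile(rb"[A-Z0-9]{4}")


class FrameHeader(NamedTuple):
    frame_id: str
    size: int          # size of the frame data, excluding this header
    status_flags: int
    format_flags: int


def parse_frame_header(data: bytes) -> FrameHeader:
    """Parse the 10 byte frame header at the start of data."""
    if len(data) < 10:
        raise ValueError("a frame header is 10 bytes long")
    frame_id = data[0:4]
    if not FRAME_ID_RE.fullmatch(frame_id):
        raise ValueError("invalid frame ID")
    size = 0
    for byte in data[4:8]:
        if byte & 0x80:
            raise ValueError("frame size is not a synchsafe integer")
        size = (size << 7) | byte
    return FrameHeader(frame_id.decode("ascii"), size, data[8], data[9])
```

Because the size lets software skip frames it does not know, the next
frame header starts 10 + size bytes after the current one.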
346 There is no fixed order of the frames' appearance in the tag,
347 although it is desired that the frames are arranged in order of
348 significance concerning the recognition of the file. An example of
349 such order: UFID, TIT2, MCDI, TRCK ...
351 A tag MUST contain at least one frame. A frame must be at least 1
352 byte big, excluding the header.
354 If nothing else is said, strings, including numeric strings and URLs
355 [URL], are represented as ISO-8859-1 [ISO-8859-1] characters in the
range $20 - $FF. Such strings are represented in frame descriptions
as <text string>, or <full text string> if newlines are allowed. If
nothing else is said, newline characters are forbidden. In ISO-8859-1 a
newline is represented, when allowed, with $0A only.

Frames that allow different types of text encoding contain a text
encoding description byte. Possible encodings:
364 $00 ISO-8859-1 [ISO-8859-1]. Terminated with $00.
365 $01 UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
366 strings in the same frame SHALL have the same byteorder.
367 Terminated with $00 00.
368 $02 UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
369 Terminated with $00 00.
370 $03 UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.
372 Strings dependent on encoding are represented in frame descriptions
373 as <text string according to encoding>, or <full text string
374 according to encoding> if newlines are allowed. Any empty strings of
375 type $01 which are NULL-terminated may have the Unicode BOM followed
376 by a Unicode NULL ($FF FE 00 00 or $FE FF 00 00).
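
The four encodings and their terminators can be handled as in the
following sketch. The mapping to Python codec names and the function
name read_string are assumptions made for illustration. Note that
Python's "utf-16" codec falls back to the platform byte order when no
BOM is present, which a stricter parser may want to reject.

```python
from typing import Tuple

# encoding byte -> (Python codec name, terminator width in bytes)
ENCODINGS = {
    0x00: ("latin-1", 1),    # ISO-8859-1, terminated with $00
    0x01: ("utf-16", 2),     # UTF-16 with BOM, terminated with $00 00
    0x02: ("utf-16-be", 2),  # UTF-16BE without BOM, terminated with $00 00
    0x03: ("utf-8", 1),      # UTF-8, terminated with $00
}


def read_string(encoding_byte: int, data: bytes) -> Tuple[str, bytes]:
    """Decode one terminated string; return it and the bytes that follow."""
    codec, width = ENCODINGS[encoding_byte]
    end = len(data)  # used when the string runs to the end of the field
    # Step by the code unit width so a UTF-16 terminator is found aligned.
    for pos in range(0, len(data) - width + 1, width):
        if data[pos:pos + width] == b"\x00" * width:
            end = pos
            break
    return data[:end].decode(codec), data[end + width:]
```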
378 The timestamp fields are based on a subset of ISO 8601. When being as
379 precise as possible the format of a time string is
380 yyyy-MM-ddTHH:mm:ss (year, "-", month, "-", day, "T", hour (out of
381 24), ":", minutes, ":", seconds), but the precision may be reduced by
removing as many time indicators as wanted. Hence valid timestamps
are
yyyy, yyyy-MM, yyyy-MM-dd, yyyy-MM-ddTHH, yyyy-MM-ddTHH:mm and
yyyy-MM-ddTHH:mm:ss. All time stamps are UTC. For durations, use
the slash character as described in ISO 8601, and for multiple
non-contiguous dates, use multiple strings, if allowed by the frame
definition.
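
A reduced-precision timestamp can be parsed by selecting the format from
the string length, as in this sketch. The function name parse_timestamp
is hypothetical, durations written with a slash are not handled, and
missing fields default to the earliest value (for example 1 January for
a bare year).

```python
from datetime import datetime, timezone

# string length -> strptime format for each allowed precision
TIMESTAMP_FORMATS = {
    4: "%Y",
    7: "%Y-%m",
    10: "%Y-%m-%d",
    13: "%Y-%m-%dT%H",
    16: "%Y-%m-%dT%H:%M",
    19: "%Y-%m-%dT%H:%M:%S",
}


def parse_timestamp(text: str) -> datetime:
    """Parse an ID3v2.4.0 timestamp; all time stamps are UTC."""
    fmt = TIMESTAMP_FORMATS.get(len(text))
    if fmt is None:
        raise ValueError("not a valid timestamp length: %r" % text)
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
```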
390 The three byte language field, present in several frames, is used to
391 describe the language of the frame's content, according to ISO-639-2
392 [ISO-639-2]. The language should be represented in lower case. If the
393 language is not known the string "XXX" should be used.
395 All URLs [URL] MAY be relative, e.g. "picture.png", "../doc.txt".
397 If a frame is longer than it should be, e.g. having more fields than
398 specified in this document, that indicates that additions to the
399 frame have been made in a later version of the ID3v2 standard. This
400 is reflected by the revision number in the header of the tag.
403 4.1. Frame header flags
405 In the frame header the size descriptor is followed by two flag
406 bytes. All unused flags MUST be cleared. The first byte is for
407 'status messages' and the second byte is a format description. If an
408 unknown flag is set in the first byte the frame MUST NOT be changed
409 without that bit cleared. If an unknown flag is set in the second
byte the frame is likely to not be readable. Some flags in the second
byte indicate that extra information is added to the header. These
fields of extra information are ordered as the flags that indicate
them. The flags field is defined as follows (l and o left out because
of their resemblance to one and zero):

  %0abc0000 %0h00kmnp
Some frame format flags indicate that additional information fields
are added to the frame. This information is added after the frame
header and before the frame data in the same order as the flags that
indicate them. I.e. the four bytes of decompressed size will precede
the encryption method byte. These additions affect the 'frame size'
field, but are not subject to encryption or compression.
425 The default status flags setting for a frame is, unless stated
426 otherwise, 'preserved if tag is altered' and 'preserved if file is
427 altered', i.e. %00000000.
430 4.1.1. Frame status flags
432 a - Tag alter preservation
434 This flag tells the tag parser what to do with this frame if it is
435 unknown and the tag is altered in any way. This applies to all
436 kinds of alterations, including adding more padding and reordering
439 0 Frame should be preserved.
440 1 Frame should be discarded.
443 b - File alter preservation
445 This flag tells the tag parser what to do with this frame if it is
446 unknown and the file, excluding the tag, is altered. This does not
447 apply when the audio is completely replaced with other audio data.
449 0 Frame should be preserved.
450 1 Frame should be discarded.
455 This flag, if set, tells the software that the contents of this
456 frame are intended to be read only. Changing the contents might
457 break something, e.g. a signature. If the contents are changed,
458 without knowledge of why the frame was flagged read only and
459 without taking the proper means to compensate, e.g. recalculating
460 the signature, the bit MUST be cleared.
463 4.1.2. Frame format flags
465 h - Grouping identity
467 This flag indicates whether or not this frame belongs in a group
468 with other frames. If set, a group identifier byte is added to the
469 frame. Every frame with the same group identifier belongs to the
472 0 Frame does not contain group information
473 1 Frame contains group information
478 This flag indicates whether or not the frame is compressed.
479 A 'Data Length Indicator' byte MUST be included in the frame.
481 0 Frame is not compressed.
  1   Frame is compressed using zlib [ZLIB] deflate method.
483 If set, this requires the 'Data Length Indicator' bit
489 This flag indicates whether or not the frame is encrypted. If set,
490 one byte indicating with which method it was encrypted will be
491 added to the frame. See description of the ENCR frame for more
492 information about encryption method registration. Encryption
493 should be done after compression. Whether or not setting this flag
494 requires the presence of a 'Data Length Indicator' depends on the
495 specific algorithm used.
497 0 Frame is not encrypted.
498 1 Frame is encrypted.
500 n - Unsynchronisation
502 This flag indicates whether or not unsynchronisation was applied
503 to this frame. See section 6 for details on unsynchronisation.
504 If this flag is set all data from the end of this header to the
505 end of this frame has been unsynchronised. Although desirable, the
506 presence of a 'Data Length Indicator' is not made mandatory by
509 0 Frame has not been unsynchronised.
  1   Frame has been unsynchronised.
512 p - Data length indicator
514 This flag indicates that a data length indicator has been added to
515 the frame. The data length indicator is the value one would write
516 as the 'Frame length' if all of the frame format flags were
517 zeroed, represented as a 32 bit synchsafe integer.
519 0 There is no Data Length Indicator.
  1   A Data Length Indicator has been added to the frame.
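
The status and format flags of sections 4.1.1 and 4.1.2 can be decoded
as sketched below. The sketch assumes the flag layout
%0abc0000 %0h00kmnp of the published ID3v2.4.0 specification; the
dictionary keys and the function name are hypothetical.

```python
# Assumed layout: status byte %0abc0000, format byte %0h00kmnp.
STATUS_FLAGS = {
    "tag_alter_discard": 0x40,   # a
    "file_alter_discard": 0x20,  # b
    "read_only": 0x10,           # c
}
FORMAT_FLAGS = {
    "grouping_identity": 0x40,      # h
    "compression": 0x08,            # k
    "encryption": 0x04,             # m
    "unsynchronisation": 0x02,      # n
    "data_length_indicator": 0x01,  # p
}


def decode_frame_flags(status: int, fmt: int) -> dict:
    """Return the named flags plus any unknown bits that were set."""
    flags = {name: bool(status & mask) for name, mask in STATUS_FLAGS.items()}
    flags.update({name: bool(fmt & mask) for name, mask in FORMAT_FLAGS.items()})
    flags["unknown_status_bits"] = status & 0xFF & ~0x70
    flags["unknown_format_bits"] = fmt & 0xFF & ~0x4F
    return flags
```

A non-zero unknown_format_bits value suggests the frame is likely not
readable, while a non-zero unknown_status_bits value means the frame
must not be changed without clearing that bit.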
525 The default location of an ID3v2 tag is prepended to the audio so
526 that players can benefit from the information when the data is
527 streamed. It is however possible to append the tag, or make a
528 prepend/append combination. When deciding upon where an unembedded
529 tag should be located, the following order of preference SHOULD be
534 2. Prepend a tag with all vital information and add a second tag at
535 the end of the file, before tags from other tagging systems. The
536 first tag is required to have a SEEK frame.
538 3. Add a tag at the end of the file, before tags from other tagging
In cases 2 and 3 the tag can simply be appended if no other known tags
are present. The suggested method to find ID3v2 tags is:
544 1. Look for a prepended tag using the pattern found in section 3.1.
546 2. If a SEEK frame was found, use its values to guide further
549 3. Look for a tag footer, scanning from the back of the file.
551 For every new tag that is found, the old tag should be discarded
552 unless the update flag in the extended header (section 3.2) is set.
557 The only purpose of unsynchronisation is to make the ID3v2 tag as
558 compatible as possible with existing software and hardware. There is
no use in 'unsynchronising' tags if the file is only to be processed
by ID3v2 aware software and hardware. Unsynchronisation is only
561 useful with tags in MPEG 1/2 layer I, II and III, MPEG 2.5 and AAC
565 6.1. The unsynchronisation scheme
567 Whenever a false synchronisation is found within the tag, one zeroed
568 byte is inserted after the first false synchronisation byte. The
format of synchronisations that should be altered by ID3 encoders is
as follows:

  %11111111 111xxxxx

and should be replaced with:

  %11111111 00000000 111xxxxx
578 This has the side effect that all $FF 00 combinations have to be
579 altered, so they will not be affected by the decoding process.
580 Therefore all the $FF 00 combinations have to be replaced with the
581 $FF 00 00 combination during the unsynchronisation.
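
The scheme and its reversal can be sketched as follows. The function
names are hypothetical. A trailing $FF is left untouched here; that
special case is discussed further down in this section.

```python
def unsynchronise(data: bytes) -> bytes:
    """Insert $00 after every $FF that precedes %111xxxxx or $00."""
    out = bytearray()
    for i, byte in enumerate(data):
        out.append(byte)
        if byte == 0xFF and i + 1 < len(data):
            following = data[i + 1]
            # $FF 00 must also be altered so that decoding is unambiguous.
            if following == 0x00 or (following & 0xE0) == 0xE0:
                out.append(0x00)
    return bytes(out)


def resynchronise(data: bytes) -> bytes:
    """Reverse the scheme by dropping the $00 that follows each $FF."""
    return data.replace(b"\xff\x00", b"\xff")
```

After unsynchronisation every $FF inside the data is followed by $00,
so removing that one byte restores the original content.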
583 To indicate usage of the unsynchronisation, the unsynchronisation
584 flag in the frame header should be set. This bit MUST be set if the
585 frame was altered by the unsynchronisation and SHOULD NOT be set if
586 unaltered. If all frames in the tag are unsynchronised the
587 unsynchronisation flag in the tag header SHOULD be set. It MUST NOT
588 be set if the tag has a frame which is not unsynchronised.
590 Assume the first byte of the audio to be $FF. The special case when
591 the last byte of the last frame is $FF and no padding nor footer is
592 used will then introduce a false synchronisation. This can be solved
by adding a footer, adding padding or unsynchronising the frame and
adding $00 to the end of the frame data, thus adding one more byte to
the frame size than a normal unsynchronisation would. Although not
596 preferred, it is allowed to apply the last method on all frames
599 It is preferred that the tag is either completely unsynchronised or
600 not unsynchronised at all. A completely unsynchronised tag has no
false synchronisations in it, as defined above, and does not end with
602 $FF. A completely non-unsynchronised tag contains no unsynchronised
603 frames, and thus the unsynchronisation flag in the header is cleared.
605 Do bear in mind, that if compression or encryption is used, the
unsynchronisation scheme MUST be applied afterwards. When decoding an
unsynchronised frame, the unsynchronisation scheme MUST be reversed
first, decryption and decompression afterwards.
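
The ordering can be illustrated for a compressed, unencrypted frame
body. This is only a sketch under assumptions: the function names are
hypothetical, the 'Data Length Indicator' is taken to precede the
compressed data, and zlib.compress is assumed to produce the stream
meant by [ZLIB].

```python
import re
import zlib


def encode_frame_body(payload: bytes) -> bytes:
    """Compress first, then unsynchronise the result."""
    compressed = zlib.compress(payload)
    length = len(payload)
    # Data Length Indicator: uncompressed size as a 32 bit synchsafe integer.
    indicator = bytes((length >> shift) & 0x7F for shift in (21, 14, 7, 0))
    # Insert $00 after every $FF that is followed by $00 or %111xxxxx.
    return re.sub(rb"\xff(?=[\x00\xe0-\xff])", b"\xff\x00", indicator + compressed)


def decode_frame_body(body: bytes) -> bytes:
    """Reverse the unsynchronisation first, then decompress."""
    raw = body.replace(b"\xff\x00", b"\xff")
    expected = 0
    for byte in raw[:4]:
        expected = (expected << 7) | byte
    payload = zlib.decompress(raw[4:])
    if len(payload) != expected:
        raise ValueError("Data Length Indicator does not match the payload")
    return payload
```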
611 6.2. Synchsafe integers
In some parts of the tag it is inconvenient to use the
unsynchronisation scheme because the size of unsynchronised data is
not known in advance, which is particularly problematic with size
descriptors. The solution in ID3v2 is to use synchsafe integers, in
which there can never be any false synchs. Synchsafe integers are
integers that keep the highest bit (bit 7) of every byte zeroed, making
seven bits out of eight available. Thus a 32 bit synchsafe integer can
store 28 bits of information.
624 255 (%11111111) encoded as a 16 bit synchsafe integer is 383
625 (%00000001 01111111).
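
Encoding and decoding of synchsafe integers can be sketched as follows.
The function names and the length parameter are hypothetical; the
example of 255 stored in 16 bits reproduces the byte pair
%00000001 01111111 given above.

```python
def synchsafe_encode(value: int, length: int = 4) -> bytes:
    """Encode value into length bytes carrying seven payload bits each."""
    if not 0 <= value < 1 << (7 * length):
        raise ValueError("value does not fit in the requested synchsafe width")
    return bytes((value >> (7 * i)) & 0x7F for i in reversed(range(length)))


def synchsafe_decode(data: bytes) -> int:
    """Decode a synchsafe integer, rejecting bytes with bit 7 set."""
    value = 0
    for byte in data:
        if byte & 0x80:
            raise ValueError("bit 7 set: not a synchsafe integer")
        value = (value << 7) | byte
    return value
```

With these helpers, synchsafe_encode(255, 2) yields the bytes $01 7F,
which read as a plain 16 bit number give 383.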
630 Copyright (C) Martin Nilsson 2000. All Rights Reserved.
632 This document and translations of it may be copied and furnished to
633 others, and derivative works that comment on or otherwise explain it
634 or assist in its implementation may be prepared, copied, published
635 and distributed, in whole or in part, without restriction of any
636 kind, provided that a reference to this document is included on all
637 such copies and derivative works. However, this document itself may
638 not be modified in any way and reissued as the original document.
640 The limited permissions granted above are perpetual and will not be
643 This document and the information contained herein is provided on an
644 'AS IS' basis and THE AUTHORS DISCLAIMS ALL WARRANTIES, EXPRESS OR
645 IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
646 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
647 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
652 [ID3v2] Martin Nilsson, 'ID3v2 informal standard'.
654 <url:http://www.id3.org/id3v2.3.0.txt>
656 [ISO-639-2] ISO/FDIS 639-2.
657 'Codes for the representation of names of languages, Part 2: Alpha-3
658 code.' Technical committee / subcommittee: TC 37 / SC 2
661 'Information Processing Systems--Data Communication High-Level Data
662 Link Control Procedure--Frame Structure', IS 3309, October 1984, 3rd
665 [ISO-8859-1] ISO/IEC DIS 8859-1.
666 '8-bit single-byte coded graphic character sets, Part 1: Latin
667 alphabet No. 1.' Technical committee / subcommittee: JTC 1 / SC 2
669 [JFIF] 'JPEG File Interchange Format, version 1.02'
671 <url:http://www.w3.org/Graphics/JPEG/jfif.txt>
673 [KEYWORDS] S. Bradner, 'Key words for use in RFCs to Indicate
674 Requirement Levels', RFC 2119, March 1997.
676 <url:ftp://ftp.isi.edu/in-notes/rfc2119.txt>
678 [MPEG] ISO/IEC 11172-3:1993.
679 'Coding of moving pictures and associated audio for digital storage
680 media at up to about 1,5 Mbit/s, Part 3: Audio.'
681 Technical committee / subcommittee: JTC 1 / SC 29
684 'Generic coding of moving pictures and associated audio information,
686 Technical committee / subcommittee: JTC 1 / SC 29
689 'Generic coding of moving pictures and associated audio information,
690 Part 3: Audio (Revision of ISO/IEC 13818-3:1995)'
692 [PNG] 'Portable Network Graphics, version 1.0'
694 <url:http://www.w3.org/TR/REC-png-multi.html>
696 [UNICODE] The Unicode Consortium,
697 'The Unicode Standard Version 3.0', ISBN 0-201-61633-5.
699 <url:http://www.unicode.org/unicode/standard/versions/Unicode3.0.htm>
701 [URL] T. Berners-Lee, L. Masinter & M. McCahill, 'Uniform Resource
702 Locators (URL)', RFC 1738, December 1994.
704 <url:ftp://ftp.isi.edu/in-notes/rfc1738.txt>
706 [UTF-8] F. Yergeau, 'UTF-8, a transformation format of ISO 10646',
707 RFC 2279, January 1998.
709 <url:ftp://ftp.isi.edu/in-notes/rfc2279.txt>
711 [UTF-16] F. Yergeau, 'UTF-16, an encoding of ISO 10646', RFC 2781,
714 <url:ftp://ftp.isi.edu/in-notes/rfc2781.txt>
716 [ZLIB] P. Deutsch, Aladdin Enterprises & J-L. Gailly, 'ZLIB
717 Compressed Data Format Specification version 3.3', RFC 1950,
720 <url:ftp://ftp.isi.edu/in-notes/rfc1950.txt>
732 Email: nilsson@id3.org