Wednesday, November 10, 2010

MP4 File Format Part 2

ISO IEC 14496-12 defines the base media file format, which provides the file structure for MPEG files.

ISO IEC 14496-15 Information technology — Coding of audio-visual objects — Part 15: Advanced Video Coding (AVC) file format extends part 12 to provide specific atom/box type for AVC (H.264)

Mdat Atom

Because mdat holds the frame data, I need to describe the AVC sample structure first.


ISO 14496-15 defines the AVC sample as externally framed: the size of each sample is supplied by the file format's own framing rather than by the stream itself. Within a sample, an AVC access unit is stored as a set of NAL units, where each entry consists of

  • a length field, usually 4 bytes, giving the size of the NAL unit
  • followed by the NAL unit itself

With that in mind, the red box in the screenshot marks the length field (4 bytes). The blue box marks the start of the NAL unit, which in this case is an H.264 non-IDR slice. See section 5.2.3 of that document.
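
The length-prefixed layout described above can be walked with a few lines of code. This is a minimal Python sketch, not code from the specification: the helper name split_avc_sample and the sample bytes are made up for illustration, and it assumes a 4-byte big-endian length field (the real size is given by lengthSizeMinusOne in the decoder configuration record).

```python
def split_avc_sample(sample: bytes, length_size: int = 4):
    """Yield (nal_ref_idc, nal_unit_type, nal_bytes) for each NAL unit in a sample.

    Hypothetical helper: assumes every NAL unit is preceded by a big-endian
    length field that is `length_size` bytes long.
    """
    pos = 0
    while pos < len(sample):
        if pos + length_size > len(sample):
            raise ValueError("truncated length field")
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if nal_len == 0 or pos + nal_len > len(sample):
            raise ValueError("NAL unit length out of range")
        nal = sample[pos:pos + nal_len]
        pos += nal_len
        header = nal[0]                    # one-byte NAL unit header
        nal_ref_idc = (header >> 5) & 0x03
        nal_unit_type = header & 0x1F      # 1 = non-IDR slice, 5 = IDR slice
        yield nal_ref_idc, nal_unit_type, nal


# Made-up sample: two NAL units, each behind a 4-byte length prefix.
sample = bytes.fromhex("00000003" "41aabb" "00000002" "6588")
for ref_idc, nal_type, nal in split_avc_sample(sample):
    print(ref_idc, nal_type, nal.hex())
```

The low five bits of the first NAL byte give the NAL unit type, so a header byte of 0x41 is a non-IDR slice (type 1) and 0x65 is an IDR slice (type 5).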

AVC Decoder Configuration Record

H.264 requires decoder configuration data to initialize the decoder before any decoding takes place. An MP4 file must therefore carry this record in the Movie Box.

The decoder configuration record is stored in the stsd (Sample Description Box), inside the Visual Sample Entry. For H.264, this is an avc1 atom.

Before going into that, here is some background on the AVC decoder configuration record. Below are its fields, in the order they appear in its class structure.


configurationVersion - 8-bit integer that is always 1. If a decoder sees an unrecognized version, it should not decode the stream

AVCProfileIndication - 8-bit integer that contains the profile code as defined in ISO IEC 14496-10

profile_compatibility - 8-bit value that is exactly the same byte that occurs between profile_IDC and level_IDC in the SPS

AVCLevelIndication - 8-bit integer that contains the level code

lengthSizeMinusOne - 2-bit integer equal to the length in bytes of the NALUnitLength field in an AVC video sample, minus one. Values 0, 1 and 3 correspond to lengths of 1, 2 and 4 bytes

numOfSequenceParameterSets - 5-bit integer that indicates the number of SPSs that are used as the initial set of SPSs for decoding the AVC elementary stream

sequenceParameterSetLength - 16-bit integer that indicates the length in bytes of the SPS

sequenceParameterSetNALUnit - the actual SPS. Its length is given by the preceding sequenceParameterSetLength field

numOfPictureParameterSets - 8-bit integer that indicates the number of PPSs that are used as the initial set of PPSs for decoding the AVC elementary stream

pictureParameterSetLength - 16-bit integer that indicates the length in bytes of the PPS

pictureParameterSetNALUnit - the actual PPS. Its length is given by the preceding pictureParameterSetLength field
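
The field list above maps directly onto a small parser. The following Python sketch is only an illustration under some assumptions: the function name parse_avc_config and the dictionary keys are my own, the example bytes are made up (the SPS and PPS in it are not valid parameter sets), and it assumes the 2-bit and 5-bit fields sit in the low bits of the fifth and sixth bytes, with reserved bits above them. Any data that may follow the PPS list is ignored.

```python
import struct


def parse_avc_config(record: bytes) -> dict:
    """Parse the fields of an AVCDecoderConfigurationRecord (illustrative sketch)."""
    if len(record) < 7:
        raise ValueError("record too short")
    version, profile, compat, level, b4, b5 = struct.unpack_from(">6B", record, 0)
    if version != 1:
        raise ValueError("unrecognized configurationVersion")
    pos = 6

    def read_sets(count: int) -> list:
        # Each parameter set is a 16-bit big-endian length followed by the NAL unit.
        nonlocal pos
        sets = []
        for _ in range(count):
            (length,) = struct.unpack_from(">H", record, pos)
            pos += 2
            nal = record[pos:pos + length]
            if len(nal) != length:
                raise ValueError("truncated parameter set")
            sets.append(nal)
            pos += length
        return sets

    sps = read_sets(b5 & 0x1F)       # low 5 bits: numOfSequenceParameterSets
    if pos >= len(record):
        raise ValueError("missing numOfPictureParameterSets")
    num_pps = record[pos]
    pos += 1
    pps = read_sets(num_pps)
    return {
        "configuration_version": version,
        "profile_indication": profile,
        "profile_compatibility": compat,
        "level_indication": level,
        "nal_length_size": (b4 & 0x03) + 1,  # lengthSizeMinusOne + 1
        "sps": sps,
        "pps": pps,
    }


# Made-up record: one 4-byte "SPS" and one 2-byte "PPS", for illustration only.
record = bytes.fromhex("0142c015" "ff" "e1" "0004" "6742c015" "01" "0002" "68ce")
config = parse_avc_config(record)
print(config["profile_indication"], config["level_indication"], config["nal_length_size"])
```

The nal_length_size value is what a reader of the mdat samples needs in order to know how many bytes each NAL length prefix occupies.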

avc1 - Sample Description Name And Format


avc1 is an AVC visual sample entry. It has an avcC (AVC Configuration Box) atom. This atom contains an AVCDecoderConfigurationRecord as stated above.

I have broken down and highlighted the values in the avcC atom

Red - these 4 bytes in red denote the length of the atom, including the length and type field
Green - these 4 bytes in green denote the type field
Purple - the bytes surrounded by purple are the AVCDecoderConfigurationRecord. So, the length of the AVCDecoderConfigurationRecord is the length of the atom - 8
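
The size/type/payload split described above can be sketched in Python as follows. The function name read_box_header and the example bytes are made up for illustration, and the sketch assumes the plain 32-bit size form only: it does not handle the special size values 0 and 1 that the base media file format also allows.

```python
import struct


def read_box_header(data: bytes, offset: int = 0):
    """Return (size, type, payload) for the atom starting at `offset`.

    Illustrative helper: `size` counts the 4-byte length and 4-byte type
    fields, so the payload is size - 8 bytes long.
    """
    size, box_type = struct.unpack_from(">I4s", data, offset)
    if size < 8 or offset + size > len(data):
        raise ValueError("unsupported or truncated atom")
    payload = data[offset + 8:offset + size]
    return size, box_type.decode("latin-1"), payload


# Made-up avcC atom: 4-byte length (15), 4-byte type, then a 7-byte record.
atom = bytes.fromhex("0000000f") + b"avcC" + bytes.fromhex("0142c015ffe000")
size, box_type, payload = read_box_header(atom)
print(size, box_type, len(payload))
```

For an avcC atom, the returned payload is the AVCDecoderConfigurationRecord itself.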

By following the AVCDecoderConfigurationRecord structure, you can parse the values easily.

For example,

0x01 - the first byte is configurationVersion = 1

0x42 - the second byte is AVCProfileIndication = 66

0xC0 - the third byte is profile_compatibility = 192

0x15 - the fourth byte is AVCLevelIndication = 21

Reference: ISO IEC 14496-12 and ISO IEC 14496-15

16 comments:

  1. Thanks for these useful posts. Which software are you using to browse the file structure shown in the screenshots?

  2. It is mp4parser.

    http://code.google.com/p/mp4parser/

  3. Hello guys,

    I have a question, please help me. Thanks a lot.
    My question is that QuickTime cannot play the mp4 file produced by ffmpeg. I tried filling the h264 extradata with the AVC Decoder Configuration Record, but the result is the same as before. QuickTime still cannot play it, so it seems I am missing something. Do you have any idea? Please help me, thanks a lot.

  4. Hi psychesnet,

    I suggest you localize the issue first.

    Can the output mp4 file be played by VLC? If not, your file structure or AVC configuration has some issue.

    If VLC can play it, it is likely QuickTime being fussy. What you can do is lower the profile or level, remove the B-frame setting, etc.

  5. Dear Thompson Ng,
    If I do not insert extradata for h264, VLC can play the mp4 file, but QuickTime cannot.
    If I insert
    0142001effe1001b6742001ee90283f7fe0000030001c48006ddd000cdfe600d88109401000468ce3152
    (
    SPS:6742001ee90283f7fe0000030001c48006ddd000cdfe600d881094
    PPS:68ce3152
    )
    VLC cannot play it, and there is not even a video screen.
    By the way, I do not have B-frames and my video frames come from the driver (camera). So I do not need to encode or decode the frames, I just put them into the AVPacket (ffmpeg).
    Dear Thompson Ng, please help me, Thanks a lot.

  6. Dear Thompson Ng,
    May I send an email to you and attach my output file?
    Thanks a lot.

  7. Sure. Please send it to tngcy@hotmail.com, along with a description of what you want to do and the error you see. I will get back to you asap

  8. How can I send AVC via RTP? Which headers does the NAL unit in the first picture contain? Is it RFC 3984 compatible? For example, what does a 0x41 header byte mean? A single NAL unit?
    PS: I can already stream this video file via RTP, but the player plays it with some artifacts. I think the key frames (0x65 - IDR) have a wrong header.

  9. You need to read the H.264 specification, or at least RFC3984, to really know what 0x41 means.

    In fact, RFC3984 - 5.3. NAL Unit Octet Usage gives a brief description of the H.264 NAL unit header. From there, you can see that 0x41 is a non-IDR slice (typically a P-frame). And as you said, 0x65 is an IDR slice (an I-frame).

    As for the artifacts, they could have many different causes: dropped frames, incomplete frames, missing SPS and PPS information, etc. The best tool to help you is Wireshark, where you can dig down to the network packet level to understand what is happening.

  10. Hi Thompson,
    Your site helped me a lot in understanding the mp4 file format. I have one question: what is nal size? When I mux h264 data into an mp4 file and play that file in VLC, it gives this warning:
    AVC: nal size -1683583999
    no frame!
    [0x981755c] avcodec decoder warning: cannot decode one frame (13258 bytes)
    Can you explain why I get that warning?
    thanks in advance

  11. How did you mux the h264 frame? Did you mux 1 NAL at a time?

    If you look at the ffmpeg code, that error may occur when you mux 2 or more NAL units into 1 mp4 mdat sample.

    You can see the source code here http://ffmpeg.org/doxygen/trunk/h264_8c-source.html

  12. Hi Thompson,
    Thanks for the reply. I did not set proper sample size values, which is why I got the error. Now it is working fine.
    Your mp4 file format part 1 helped a lot in understanding the mp4 file.
    Thanks,
    Raghav

  13. Really glad that you have fixed the issue ^^

  14. nice post about mp4 file format part 2, you gave me more known with mp4 format. thank you.

  15. Hi Thompson,
    MP4 blog 1 & 2 is absolutely brilliant stuff. Got very valuable insights. Thx a lot.


    Abhishek
