Saturday, March 13, 2010

RFC 3640 for AAC

RFC 3640 defines the RTP payload format for transporting MPEG-4 elementary streams. Here I am only interested in how to set up the SDP for AAC in its high bit-rate (AAC-hbr) and low bit-rate (AAC-lbr) modes.

First, the RTP payload needs an AU header section. It starts with a 2-byte AU-headers-length field that gives the total length of all the AU headers in bits. Each AU header then carries a list of optional fields; for AAC you will mostly care only about AU-size and AU-Index. See section 3.2.1 for details.
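
To make that layout concrete, here is a minimal Python sketch that builds an AU header section from a list of frame sizes. The function name `build_au_header_section` and its parameters are illustrative and do not come from the RFC. It assumes every AU header carries only AU-size and AU-Index(-delta), that the index is always 0, and it defaults to the 13-bit size and 3-bit index of the AAC-hbr mode described below.

```python
def build_au_header_section(frame_sizes, size_length=13, index_length=3):
    """Return AU-headers-length (2 bytes) followed by the packed AU headers.

    Assumption: each AU header is just AU-size + AU-Index(-delta),
    and the index bits are always 0 (no interleaving).
    """
    header_bits = size_length + index_length
    packed = 0
    for size in frame_sizes:
        if not 0 <= size < (1 << size_length):
            raise ValueError("frame size does not fit in AU-size field")
        # AU-size sits in the upper bits, the (zero) index in the lower bits.
        packed = (packed << header_bits) | (size << index_length)

    total_bits = header_bits * len(frame_sizes)
    total_bytes = (total_bits + 7) // 8
    # Pad with zero bits up to a whole number of bytes.
    packed <<= total_bytes * 8 - total_bits

    # AU-headers-length is expressed in bits, big-endian.
    return total_bits.to_bytes(2, "big") + packed.to_bytes(total_bytes, "big")
```

For a single 16-byte frame this yields the bytes 00 10 00 80: 16 bits of AU headers, then the size 16 shifted left past the 3 index bits.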

Next, the SDP defines the structure of the RTP payload. If the mode parameter is present, RFC 3640 has to be used. I will focus on AAC-lbr and AAC-hbr.

mode=AAC-lbr

  • The maximum size of a frame is 63 bytes
  • AAC frames must not be fragmented
  • 6 bits for AU-size
  • 2 bits for AU-Index(-delta)
  • The AU-Index field must be 0
  • The SDP must include sizeLength, indexLength, and indexDeltaLength
  • config is the hexadecimal value of the AudioSpecificConfig defined in ISO/IEC 14496-3

Sample SDP entry

m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/22050/1
a=fmtp:96 streamtype=5; profile-level-id=14; mode=AAC-lbr; config=1388; sizeLength=6; indexLength=2; indexDeltaLength=2; constantDuration=1024; maxDisplacement=5

See section 3.3.5 for details.
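
A receiver needs sizeLength, indexLength and indexDeltaLength from the fmtp line before it can parse the AU headers. The following is a small sketch of pulling them out in Python; `parse_fmtp` is a hypothetical helper, not part of any SDP library. Keys are lower-cased because senders differ in capitalization (one SDP in the comments below writes sizelength).

```python
def parse_fmtp(line):
    """Parse 'a=fmtp:<pt> key=value; key=value' into a dict.

    Keys are lower-cased; values are kept as stripped strings.
    """
    # Everything after the first space is the parameter list.
    _, _, params = line.partition(" ")
    result = {}
    for item in params.split(";"):
        key, sep, value = item.partition("=")
        if sep:  # skip empty or malformed items
            result[key.strip().lower()] = value.strip()
    return result


fmtp = parse_fmtp(
    "a=fmtp:96 streamtype=5; profile-level-id=14; mode=AAC-lbr; "
    "config=1388; sizeLength=6; indexLength=2; indexDeltaLength=2; "
    "constantDuration=1024; maxDisplacement=5"
)
size_length = int(fmtp["sizelength"])    # bits of AU-size
index_length = int(fmtp["indexlength"])  # bits of AU-Index
```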

mode=AAC-hbr

  • The maximum size of a frame is 8191 bytes
  • AAC frames can be interleaved, and hence receivers must support de-interleaving
  • 13 bits for AU-size
  • 3 bits for AU-Index(-delta)
  • The AU-Index field must be 0
  • The SDP must include sizeLength, indexLength, and indexDeltaLength
  • config is the hexadecimal value of the AudioSpecificConfig defined in ISO/IEC 14496-3
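
The config value can be checked by hand, since the first 13 bits of an AudioSpecificConfig are a 5-bit audio object type, a 4-bit sampling frequency index and a 4-bit channel configuration. The sketch below decodes only that simple form and makes no attempt at escape values or extensions such as SBR signalling. The function name is illustrative; the frequency table follows ISO/IEC 14496-3.

```python
# Sampling frequencies by samplingFrequencyIndex (ISO/IEC 14496-3).
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000,
                24000, 22050, 16000, 12000, 11025, 8000, 7350]


def decode_audio_specific_config(config_hex):
    """Return (object_type, sample_rate, channel_config) from a config= value.

    Assumption: plain form only (object type < 31, table-based frequency).
    """
    total_bits = len(config_hex) * 4
    if total_bits < 13:
        raise ValueError("config too short")
    # Keep only the leading 13 bits.
    top = int(config_hex, 16) >> (total_bits - 13)
    object_type = top >> 8
    freq_index = (top >> 4) & 0xF
    channel_config = top & 0xF
    if object_type == 31 or freq_index >= len(SAMPLE_RATES):
        raise ValueError("extended AudioSpecificConfig form not handled")
    return object_type, SAMPLE_RATES[freq_index], channel_config
```

With this, config=1388 decodes to object type 2 (AAC LC), 22050 Hz, channel configuration 1, which agrees with the rtpmap line mpeg4-generic/22050/1 in the AAC-lbr sample. Likewise 11B0 decodes to 48000 Hz and channel configuration 6, matching mpeg4-generic/48000/6.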

Sample SDP entry

m=audio 49230 RTP/AVP 96
a=rtpmap:96 mpeg4-generic/48000/6
a=fmtp:96 streamtype=5; profile-level-id=16; mode=AAC-hbr; config=11B0; sizeLength=13; indexLength=3; indexDeltaLength=3; constantDuration=1024

Since each AAC frame in the RTP payload can be decoded independently, you can extract the raw AAC data from the RTP packet as follows:

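1. The AU header section starts with the 2-byte AU-headers-length field, which gives the length of all the AU headers in bits. Read those 2 bytes as a big-endian integer and divide by 8 (rounding up) to get the number of bytes the AU headers occupy; the raw AAC data starts right after them. Dividing the bit count by the size of one AU header (sizeLength + indexLength) gives the number of AAC frames in the packet.

2. Next, based on the sizeLength in the SDP, read the AU-size from each AU header. It gives the size in bytes of the corresponding raw AAC frame in the payload. Slice the frames out one after another according to those sizes.

See section 3.3.6 for details.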
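
The two steps can be sketched in Python as follows. This is a minimal illustration, not a complete depacketizer: `split_aac_payload` is a made-up name, the RTP header is assumed to be stripped already, frames are assumed to be unfragmented and not interleaved, and every AU header is assumed to be exactly sizeLength + indexLength bits (which holds when indexLength equals indexDeltaLength, as in both sample SDPs).

```python
from typing import List


def split_aac_payload(payload: bytes, size_length: int = 13,
                      index_length: int = 3) -> List[bytes]:
    """Split one RTP payload into the raw AAC frames it carries."""
    if len(payload) < 2:
        raise ValueError("payload too short for AU-headers-length")

    # Step 1: AU-headers-length is a big-endian bit count.
    headers_bits = int.from_bytes(payload[:2], "big")
    header_bits = size_length + index_length
    if headers_bits % header_bits:
        raise ValueError("AU-headers-length is not a multiple of header size")
    count = headers_bits // header_bits
    headers_bytes = (headers_bits + 7) // 8
    data_start = 2 + headers_bytes
    if len(payload) < data_start:
        raise ValueError("payload too short for AU headers")

    # Treat all AU headers as one big integer and drop the padding bits.
    packed = int.from_bytes(payload[2:data_start], "big")
    packed >>= headers_bytes * 8 - headers_bits

    # Step 2: read each AU-size and slice the frame out of the data section.
    frames = []
    offset = data_start
    mask = (1 << header_bits) - 1
    for i in range(count):
        au_header = (packed >> ((count - 1 - i) * header_bits)) & mask
        au_size = au_header >> index_length  # the index bits are ignored here
        frame = payload[offset:offset + au_size]
        if len(frame) != au_size:
            raise ValueError("payload truncated")
        frames.append(frame)
        offset += au_size
    return frames
```

For AAC-lbr, call it with size_length=6 and index_length=2.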

18 comments:

  1. Even though it's in the RFC, your explanation was more clear, saved me some time.

    Thanks.

  2. How can I insert raw AAC data into an RTP payload?
    Are the first two bytes of the RTP payload the size of the raw data?
    I have audio frames with different raw AAC data sizes. Does that mean I should add an AU header section? Do you have an example?

    A lot of thanks!

  3. It depends on which AAC mode you are using: low bit rate or high bit rate.

    I will give an example for AAC-hbr, which supports AAC frame sizes up to 8191 octets, with each RTP packet carrying only 1 raw AAC frame.

    In AAC-hbr mode (specified in the SDP), each AU header is 2 octets.

    The AU headers must be preceded by the 16-bit AU-headers-length field.

    Let's say my AAC data size is 16 bytes.

    So, in your RTP payload, the first 2 bytes are the AU-headers-length, and its value should be 16 (bits), because there is one 2-octet AU header. The bytes are 00 10

    Then, the next 2 bytes are the AU header: the first 13 bits are the size of the AAC data and the last 3 bits are the AU index. With an AAC data size of 16 and an AU index of 0, you get 00 80

    Then the rest is your AAC data. So, the complete payload is

    00 10 00 80 ...(your AAC data of 16 bytes )...

    Please read RFC 3640

    Especially

    3.2. RTP Payload Structure
    3.3.6. High Bit-rate AAC

    for details

  4. Thank you very much. All works fine!

  5. Any idea how I can extract from raw AAC the parameters that I should put in the SDP (streamtype, config, indexLength)?
    Thanks

  6. I don't think you can extract the AAC parameters from the raw frames themselves.

  7. I am trying to decode mp4a-latm content using ffmpeg. I used Codec_ID_AAC_LATM as the CodecID, extracted the config parameter from the SDP, extracted the audio content from the RTP packets by removing the RTP header, and sent it to the decodeAudio4 function, but the decode function always returns a negative value.

    I have gone through RFC 3016 and RFC 6416 and the code of rtpdec_latm.c in the ffmpeg source, but I cannot find where my mistake is or what exactly the procedure is for sending the content to the decoder.

    In general, we need to send the config value from the SDP and then the raw audio packets from the RTP after removing the header, but it's not working.

    Can you tell me the exact procedure to depacketize the audio out of the RTP packets?

  8. I have not worked with AAC LATM before, but the concept should be the same.

    From what you describe, your procedure seems correct.

    When I look at RFC 3016, the sample SDP does have a config= parameter; you need that for initializing your ffmpeg AAC decoder.

    And, if you are using LATM, does your AAC bitstream contain more than 1 distinct audio frame? LATM is a multiplexing format, which may mean that you have to post-process your depacketized RTP payload into N distinct audio payloads before decoding each of them.

    Example workflow:

    1. Receive the RTP packet
    2. Depacketize it based on RFC 3016 section 4.1
    3. Demux the depacketized payload based on muxConfigPresent. See RFC 3016 section 5.3. In general, if I am not wrong, that is the config= parameter in your SDP
    4. Based on that, demux the audioMuxElement into N different audio frames.
    5. Decode each frame with ffmpeg

  9. Thanks pal! I got the workflow; up to #3 it's fine, but for #4 I am not sure how to do that. I have looked at the ffmpeg code, but nothing seems clear on that front either.

    I have cpresent=1, so do we need the StreamMuxInfo, i.e. the config value, before decoding? Demuxing into the N different audio frames is the part I am not sure how to do.

    Any idea on this?

  10. cpresent=1 means the configuration (StreamMuxConfig) is multiplexed into your RTP payload.

    You need to demux before decoding.

    For how to demux, look at the ISO/IEC 14496-3 (MPEG-4 Audio) document; it describes AudioMuxElement(), StreamMuxConfig(), etc. in pseudocode.

    With that, you may want to look for the same methods in ffmpeg. Sometimes it has identical utility methods, since it implements ISO/IEC 14496-3.

  11. Oh yeah, thanks man.

    I made a mistake: it is actually cpresent=0, which means the configuration is not multiplexed into the RTP payload, but the size of the RTP payload differs for each RTP packet.

    I will go through the ISO doc and see how we can demux the content.

    I have also found a libfaad filter, written in Delphi, that does the LATM to ADTS conversion, at the following link; I will see if that helps.

    http://www.dvbviewer.tv/forum/topic/23347-libfaad2-wrapper-filter/

  12. Great to know you had found your mistake!! :)

  13. hey man..

    I have understood things, but the way I extract the content from the RTP packets still seems wrong: the output file is not correct, as I cannot play it in any player.

    I have processed the config and appended the required bytes in front of each frame and saved everything in one file, but nothing works.

    I am trying to find the algorithm but have not found it anywhere. I have looked at the code of the live555 library and the GStreamer source, which do the same thing, and I have tried to do the same in my C# code, but the results are not correct.

    So it seems like I am making some small mistake in understanding the code of these libraries.

    Do you have any idea where I can find the algorithm to do that?

    For GStreamer, I looked at the code in the gstrtpmp4adepay.c file

    http://gstreamer.freedesktop.org/documentation/plugins.html

    My SDP says:
    fmtp:97 profile-level-id=15;cpresent=0;object=2;config=400026103FC0;SBR-enabled=1

    and frame starts like this:

    ab:01:40:20:06:cd:ea:40:02:00............1f
    ab:01:40:40:06:9d:02:80:00:00............1f

    Going by my understanding of the RFC, my header comes out as
    1f e0, and then I append the whole frame as above, but the output file does not play.

  14. Where is the streamtype parameter defined? Where can I find clarification on its value (in your example streamtype=5)?
    Thanks

  15. Based on RFC 3640,

    streamType:
    The integer value that indicates the type of MPEG-4 stream that is
    carried; its coding corresponds to the values of the streamType,
    as defined in Table 9 (streamType Values) in ISO/IEC 14496-1

    ISO/IEC 14496-1 is part 1 (Systems) of the ISO/IEC 14496 series, which specifies MPEG-4.

  16. Hi, Thompson.
    Thanks for sharing this.
    I have a problem extracting raw AAC from an AAC-hbr frame.

    My SDP is:

    a=rtpmap:97 MPEG4-GENERIC/8000/2
    a=fmtp:97 streamtype=5;profile-level-id=16;mode=AAC-hbr;sizelength=13;indexlength=3;indexdeltalength=3;config=1590;constantDuration=1024

    Live555 give me the audio data 225 bytes like:

    011e9eda 2a51b65b 8323acb7 03c5102d 47900be9 ....

    or

    01209eda 8a682398 d96e078a 2d879a20 5a0f4cc9 ....

    Based on RFC 3640, this does not look like a correct header format. The size in the AU header does not seem correct.
    Do you have any advice about it?
    Many thanks for your help.

  17. Thompson Ng, I think there is an error here:

    "Then, the next 2 bytes are AU header with the first 13 bits are size of AAC data and last 3 bits are AU index. Using my AAC data size as 16, you will get following 00 80"

    The AU size value is a number of octets (bytes), so for AAC data of 16 bytes, wouldn't that be 0x0010 (assuming an AU Index of 0x00)? 0x0080 seems like a number of bits, am I missing something?

    Thanks for the post, it's been helpful!

    Replies
    1. Ok, after working out how to fill in the 13-bit AU size portion, it's not as straightforward as converting 0x80 into decimal to see the size.

      The field is 13 bits, most significant bit (msb) first, so this results in
      0000000010000 | 000. The left part is the 13-bit AU size and the right part is the 3-bit AU index. The leftmost bit is 2^12 because msb-first puts the most significant bit at the lowest bit/byte address. The single 1 bit is then 2^4, which is 16 (our size).

      But when these 16 bits are viewed as two bytes, they look like 0x0080, because the rightmost bit of the whole 16-bit word is taken as 2^0 instead of the rightmost bit of our AU size field. In other words, 16 << 3 = 128 = 0x80.
