MPD/M3U8 question

Dear Lazyweb,

I have two files, 5 minutes long, MP4 video-only and M4A audio-only. I would like to hand them to the <VIDEO> tag and have it play both in sync. Is there a way to express this using an MPD or M3U8 file?

I am hoping to avoid having to run ffmpeg or otherwise parse MPEG frames or split the files. Or mux them.

I've tried a bunch of things but the MPD spec is baffling.


10 Responses:

  1. Cody says:

    Would this approach solve your problem?

    • jwz says:

      I think onseeking only fires when you hit the scrollbar. That seems unlikely to keep the two synchronized after minutes have passed and network and scheduling vagaries have built up.

      I'm pretty sure it's possible to do this with a properly crafted MPD or M3U8 file, I just haven't figured it out.

  2. Kyzer says:

    My understanding is that you're hoping that by some magic incantation, you can create a "playlist" and make use of HLS (unofficial, unsupported in most browsers) or MPEG-DASH to make a browser sync two static streams... because Javascript is hard? Except no real browsers support MPEG-DASH; it's achieved with a polyfill, aka someone else's Javascript. You're trying to come up with an incantation that pleases dash.js, Shaka Player or similar Javascript libraries.

    If you want a simple solution, i.e. you don't need adaptive streams for different bitrates and resolutions, you just need to sync a <video> and an <audio> tag, then you should sync them yourself with Javascript. A fairly simple solution is given here: if the video/audio desync goes beyond a threshold, the audio is resynced.
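
    A sketch of that resync-on-drift idea (the element selectors, event choices and the 0.3 s threshold are my own assumptions, not taken from any linked code):

    ```javascript
    // Hypothetical sketch of keeping a separate <audio> element synced to
    // a <video>. Nothing here is authoritative; threshold is invented.
    const DRIFT_THRESHOLD = 0.3; // seconds of tolerated desync

    // Pure helper: should the audio clock be snapped back to the video clock?
    function needsResync(videoTime, audioTime, threshold = DRIFT_THRESHOLD) {
      return Math.abs(videoTime - audioTime) > threshold;
    }

    function attachSync(video, audio) {
      video.addEventListener('play',  () => audio.play());
      video.addEventListener('pause', () => audio.pause());
      video.addEventListener('seeking', () => { audio.currentTime = video.currentTime; });
      video.addEventListener('timeupdate', () => {
        // timeupdate fires a few times per second during normal playback,
        // so drift is corrected continuously, not only on seeks.
        if (needsResync(video.currentTime, audio.currentTime)) {
          audio.currentTime = video.currentTime;
        }
      });
    }

    // attachSync(document.querySelector('video'), document.querySelector('audio'));
    ```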

    If you don't want to write your own code to mux in the browser, or use someone else's, your best option is to actually mux the files with ffmpeg. Disk space is cheap, right?

    • rollcat says:

      @jwz the short answer to get stuff done, is use ffmpeg with the copy codec and merge them into one file.
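
      For reference, the stream-copy mux is a one-liner (filenames are placeholders; `-c copy` avoids re-encoding, so it's fast and lossless):

      ```shell
      # Remux a video-only MP4 and an audio-only M4A into one file without
      # re-encoding. Input/output filenames here are placeholders.
      ffmpeg -i video.mp4 -i audio.m4a -c copy -movflags +faststart muxed.mp4
      ```

      The `-movflags +faststart` is optional; it moves the moov atom to the front so the result starts playing before it has fully downloaded.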

      @Kyzer - both HLS and DASH solve more or less the same set of problems: adaptive playback of video streams (adaptive = your connection might be dodgy but the stream will keep playing with best quality available). They also support multiple codecs, alternative audio and subtitle tracks. If you don't need any of that, they're overkill.

      Both are standards. HLS came first, originally created by Apple, and everything Apple makes supports it really well (including "it just works" playback in Safari, mobile included). Apple was always very open with the spec and it eventually got blessed as RFC 8216. DASH, on the other hand, speaking from my experience, is a total mess: just avoid it.

      If you want to put a hack together, I really recommend working with HLS. It's made for humans; I've been pulling off hacks using grep / sed / awk / simple Python. ffmpeg also does an OK job, and Apple has excellent command-line tools.
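
      For the record, the HLS shape for demuxed audio/video is an EXT-X-MEDIA audio rendition referenced from the variant's AUDIO attribute. A hypothetical sketch: the filenames, BANDWIDTH and CODECS values below are made up and would need to match the real streams:

      ```
      #EXTM3U
      #EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="main",DEFAULT=YES,AUTOSELECT=YES,URI="audio.m3u8"
      #EXT-X-STREAM-INF:BANDWIDTH=2000000,CODECS="avc1.640028,mp4a.40.2",AUDIO="aud"
      video.m3u8
      ```

      Each media playlist then lists its own segments; with unsegmented fMP4 files you would still need an EXT-X-MAP entry pointing at the initialization segment, which reintroduces the magic-byte problem.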

      Source: my day job for the past year.

      • jwz says:

        Look, the reason I'm trying to avoid using ffmpeg is that nobody has ported ffmpeg to JavaScript yet.

        I don't understand why having two AdaptationSets of one file each doesn't work, but apparently it really wants that exactly-N-magical-bytes initialization segment to be split out. And you can't know N without parsing MPEG frames.

        Likewise, MPD appears to have a syntax saying "here are two sub-playlists, one for audio and one for video" but I have not found a way to make that go, either.
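
        For what it's worth, the two-AdaptationSet shape that the on-demand profile expects looks roughly like this. A hypothetical sketch: filenames, codec strings and bandwidths are invented, and the "..." byte ranges are exactly the values you can't know without parsing the MP4 boxes:

        ```xml
        <?xml version="1.0"?>
        <MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
             profiles="urn:mpeg:dash:profile:isoff-on-demand:2011"
             mediaPresentationDuration="PT5M" minBufferTime="PT2S">
          <Period>
            <AdaptationSet contentType="video" mimeType="video/mp4" codecs="avc1.640028">
              <Representation id="video" bandwidth="2000000">
                <BaseURL>video.mp4</BaseURL>
                <SegmentBase indexRange="...">
                  <Initialization range="..."/>
                </SegmentBase>
              </Representation>
            </AdaptationSet>
            <AdaptationSet contentType="audio" mimeType="audio/mp4" codecs="mp4a.40.2">
              <Representation id="audio" bandwidth="128000">
                <BaseURL>audio.m4a</BaseURL>
                <SegmentBase indexRange="...">
                  <Initialization range="..."/>
                </SegmentBase>
              </Representation>
            </AdaptationSet>
          </Period>
        </MPD>
        ```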

        • rollcat says:

          > nobody has ported ffmpeg to JavaScript yet

          Hm, could you elaborate on your use case? I suppose you can't re-host the files, or process them on some backend? Normally I'd avoid doing too much heavy lifting on the client; it kills battery and/or performance.

          > Likewise, MPD appears to have a syntax saying "here are two sub-playlists, one for audio and one for video" but I have not found a way to make that go, either.

          HLS (and, as far as I can tell, DASH) is made for segmented media: chunks of anywhere between 1 and 10 seconds, typically 6 s, stored as separate files. I'm not sure whether a 5-minute segment will make a particular player unhappy (many will only start playing a segment once it's completely buffered).

          Can you PM me these files, maybe I can try to stitch together a playlist?

          • jwz says:

            It would be interesting to know whether this approach would make some players unhappy, but I haven't gotten them to play at all. Any two de-muxed files will do:

            youtubedown --suffix --fmt 140 ''
            youtubedown --suffix --fmt 137 ''

            • rollcat says:

              > Any two de-muxed files will do

              Not entirely true. The trickiest bit with hand-crafting manifests is figuring out the magic numbers to put in the CODECS attribute in the master .m3u8 playlist - the player needs this to make a decision whether it can or cannot play a given variant. Your tooling (encoder, packager, etc) would normally do that for you, but we once needed to deliver an HLS asset using an exotic codec variant and it was a major headache.

              ffprobe would theoretically give you all of the relevant information for almost anything you throw at it, but in a format that cannot be directly consumed by a player:

              % curl -s | grep CODECS | head -1
              % curl -sO
              % ffprobe seq-0.ts
              ffprobe version 4.1.3 Copyright (c) 2007-2019 the FFmpeg developers
                built with Apple LLVM version 10.0.1 (clang-1001.0.46.4)
              Input #0, mpegts, from 'seq-0.ts':
                Duration: 00:00:02.00, start: 0.083333, bitrate: 40 kb/s
                Program 1
                  Stream #0:0[0x102]: Video: h264 (Main) ([27][0][0][0] / 0x001B), yuv420p(progressive), 426x180 [SAR 7680:7739 DAR 256:109], 24 fps, 24 tbr, 90k tbn, 48 tbc

              Other tools (like mp4info from bento4) will give you the correct magic numbers on a plate, but I've had mixed success with more exotic codecs.

              The keyword is CMAF, and the MDN docs shed some light on how the browser interprets CODECS.
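
              Once you do have the magic numbers, the browser can be asked directly whether it will play them. A small sketch (the helper function and the codec strings are my own invention):

              ```javascript
              // Hypothetical helper: build the MIME string that a player derives
              // from the CODECS attribute and would feed to
              // MediaSource.isTypeSupported() in a browser.
              function mimeFor(container, codecs) {
                return `${container}; codecs="${codecs.join(', ')}"`;
              }

              const probe = mimeFor('video/mp4', ['avc1.4d401e', 'mp4a.40.2']);
              // In a browser console: MediaSource.isTypeSupported(probe)
              ```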

  3. Tinus says:

    Apparently the Hippo Media Server can do this automatically. The documentation coincidentally even has an example MPD of a separate video and audio stream. I presume the purpose would be to serve different language streams without wasting bandwidth.