Concatenating videos and adding audio samples using FFmpeg

Concatenating many videos

I was trying to watch a video on a Chromecast, but it kept temporarily pausing/stuttering every 20 seconds. The weird thing is, it played continuously in the phone browser, so it wasn’t a buffering issue. Well no matter, I can probably just download the video and cast with VLC media player.

To download, I tried the Video DownloadHelper browser extension, which shows the media on the current page. In this case, there was a lot of media: roughly 128 different video fragments at sequentially numbered URLs. That explains the pauses: Chromecast had to keep loading new videos, and either the website’s casting code didn’t handle this well or the machine itself couldn’t multitask. Either way, this became a problem of downloading video fragments and concatenating them into one big video.

The command that eventually worked is mostly taken from the FFmpeg wiki page on concatenation:

ffmpeg -f concat -safe 0 -i fragments.txt -shortest -c copy merged.mp4


The full script goes in a few stages:

  1. Generate a list of video URLs.
  2. Download video fragments in parallel with aria2c. Its default is 5 simultaneous downloads.
  3. Generate a list of files for ffmpeg.
  4. Merge videos with ffmpeg.
  5. Remove the temporary files.

# Create the list of URLs.
for i in $(seq 1 128); do
  printf "\n" $i
done > urls.txt

# Download video fragments to the fragments/ directory.
mkdir -p fragments
aria2c --dir=fragments --input-file=urls.txt

# Create list of downloaded files for ffmpeg.
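# Read urls.txt and write fragments.txt in one pass.
# (Redirecting the loop's output back into urls.txt would truncate the file being read.)
while read -r url; do
  printf "file fragments/%s\n" "$(basename "$url")"
done < urls.txt > fragments.txt

# Merge video fragments.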
ffmpeg -f concat -safe 0 -i fragments.txt -shortest -c copy merged.mp4

# Clean up everything but merged.mp4.
rm -f urls.txt fragments.txt fragments/*
rmdir fragments
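
The list-building step is the fiddly part, so here is a slightly more defensive sketch of it. It wraps each path in single quotes and escapes any single quote inside a filename, which is how the concat demuxer expects special characters to be handled. The function name make_concat_list and the example.com URLs are made up for illustration; they are not from the site I was downloading from.

```shell
# Turn a list of URLs (one per line on stdin) into concat-demuxer directives.
# The fragments/ prefix matches the download directory used in the script above.
make_concat_list() {
  while IFS= read -r url; do
    [ -n "$url" ] || continue   # skip blank lines
    name=$(basename "$url")
    # Escape single quotes for the concat list: ' becomes '\''
    escaped=$(printf '%s' "$name" | sed "s/'/'\\\\''/g")
    printf "file 'fragments/%s'\n" "$escaped"
  done
}

# Hypothetical URLs, for illustration only.
printf '%s\n' 'https://example.com/video/1.mp4' 'https://example.com/video/2.mp4' |
  make_concat_list
```

Redirecting the output of that last pipeline to fragments.txt gives a file that can be passed to ffmpeg -f concat as before.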

Adding audio to a video sample

A few days later, I wanted to make a simple meme out of a screen recording. The basic idea here was to merge audio into a silent video. But before that, I also needed to crop the video, extract the funny segment, and re-encode it to be a reasonable size.

Let’s look at these steps individually. However, it’s best to keep re-encoding to a minimum, so we’ll combine all the steps into one command at the end.

Extract audio. As a first step, we want to extract the audio track from a video obtained with youtube-dl. Since this won’t be the final product, we can be a bit sloppy and re-encode it:

ffmpeg -nostdin -i audio_original.mkv -vn -b:a 128k -y audio_track.m4a

We could also have been careful and extracted the original audio. Running ffprobe audio_original.mkv shows that the audio codec is Opus, which is usually stored in an Ogg container such as an .oga file. Extracting the original would be:

ffmpeg -nostdin -i audio_original.mkv -vn -c copy -y audio_original.oga

However, we don’t go this route! Our final video will be an .mp4, which is not a well-supported container for Opus audio, so we’ll have to re-encode the audio anyway.

Crop video. The first step for the video is cropping. In this case, the original video was 1440x2960 and I ended up removing the top 540 pixels and the bottom 420 pixels (420=2960-2000-540). Also, it made sense to output a 700x1000 video instead of 1400x2000. This took some trial and error of course, so I previewed various options using ffplay. My final preview command looks a lot like the ffmpeg command to encode it:

ffplay -i video_original.mp4 -vf crop=1400:2000:0:540,scale=700:1000
ffmpeg -nostdin -i video_original.mp4 -vf crop=1400:2000:0:540,scale=700:1000 -an -b:v 1M -y video_track.mp4
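
The crop arithmetic is easy to get wrong during trial and error, so here is a small sketch that builds the filter string from the margins instead of doing the subtraction by hand. In crop=w:h:x:y, the w and h are the size of the kept window and x:y is its top-left corner, so with x=0 and a width of 1400 the rightmost 40 pixels of the 1440-wide original are dropped too. The function crop_filter and its argument order are my own invention for this example.

```shell
# Build a crop+scale filter string from the source height and the margins to drop.
# Arguments: source height, top margin, bottom margin, kept width, scale divisor.
crop_filter() {
  src_h=$1 top=$2 bottom=$3 out_w=$4 divisor=$5
  out_h=$((src_h - top - bottom))
  printf 'crop=%d:%d:0:%d,scale=%d:%d\n' \
    "$out_w" "$out_h" "$top" "$((out_w / divisor))" "$((out_h / divisor))"
}

# The numbers from this post: 2960 tall, drop 540 on top and 420 on the bottom,
# keep 1400 wide, then halve both dimensions.
crop_filter 2960 540 420 1400 2
# prints crop=1400:2000:0:540,scale=700:1000
```

The printed string can be passed to -vf for either ffplay or ffmpeg.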

Trim audio and video samples. Whatever you want to call this operation, it’s pretty easy: -ss sets the start position and -t sets the duration.

ffmpeg -nostdin -ss 00:03:28.5 -t 00:00:46.5 -i audio_track.m4a -c copy -y audio_sample.m4a
ffmpeg -nostdin -ss 00:03:31.5 -t 00:00:46.5 -i video_track.mp4 -c copy -y video_sample.mp4
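
When hunting for the right offsets, I find it easier to think in plain seconds, which ffmpeg also accepts for -ss and -t. Here is a minimal sketch that converts the HH:MM:SS.s timestamps above; the helper name to_seconds is made up, and it assumes the input always has exactly three colon-separated fields.

```shell
# Convert an HH:MM:SS(.fraction) timestamp to seconds with one decimal place.
to_seconds() {
  printf '%s\n' "$1" | awk -F: '{ printf "%.1f\n", $1 * 3600 + $2 * 60 + $3 }'
}

to_seconds 00:03:28.5   # audio start, prints 208.5
to_seconds 00:03:31.5   # video start, prints 211.5
to_seconds 00:00:46.5   # duration, prints 46.5
```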

Merge audio and video. Finally, merge the audio and video. I had trouble keeping the audio and video in sync but eventually found the +genpts option. Unfortunately, I couldn’t figure out how to use this without re-encoding the video:

ffmpeg -nostdin -i audio_sample.m4a -i video_sample.mp4 -fflags +genpts -acodec copy -b:v 1M -y meme.mp4

As one command

In the steps above, the video is re-encoded twice. To avoid extra artifacts, the final product should be made with one command:

ffmpeg -nostdin \
  -ss 00:03:28.5 -i audio_original.mkv \
  -ss 00:03:31.5 -i video_original.mp4 \
  -t 00:00:46.5 \
  -map 0:a -map 1:v \
  -vf crop=1400:2000:0:540,scale=700:1000 \
  -fflags +genpts \
  -b:a 128k -b:v 1M -y meme.mp4

Instead of setting the video bitrate with -b:v 1M, you could play around with the H.264 Constant Rate Factor (CRF). For me, the default CRF of 23 yielded a video bitrate of 1236k, but after using slower presets and tuning for animation, I was able to obtain a bitrate of 1008k with a lower (better) CRF of 22. The final command is:

ffmpeg -nostdin \
  -ss 00:03:28.5 -i audio_original.mkv \
  -ss 00:03:31.5 -i video_original.mp4 \
  -t 00:00:46.5 \
  -map 0:a -map 1:v \
  -vf crop=1400:2000:0:540,scale=700:1000 \
  -fflags +genpts \
  -b:a 128k -crf 22 -preset veryslow -tune animation -y meme.mp4
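
To judge whether a bitrate is "a reasonable size", a rough estimate is (video bitrate + audio bitrate) x duration / 8. The sketch below does that arithmetic, assuming "k" means 1000 bits per second and ignoring container overhead; the actual audio bitrate can also drift from the 128k target, so treat the result as a ballpark. The function name estimate_bytes is hypothetical.

```shell
# Estimate output size in bytes from bitrates (in kbit/s) and duration (in seconds).
# Assumes 1k = 1000 bit/s and no container overhead.
estimate_bytes() {
  awk -v v="$1" -v a="$2" -v t="$3" 'BEGIN { printf "%d\n", (v + a) * 1000 * t / 8 }'
}

# 1008k video + 128k audio over 46.5 seconds.
estimate_bytes 1008 128 46.5
# prints 6603000
```

So the CRF 22 encode should land at roughly 6.6 MB.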