Converting SUB/IDX subtitles to SRT on Debian

Introduction

This page describes how I converted SUB/IDX subtitles in an MKV file to SRT format on Debian 12. Warning: this procedure does not work! The resulting subtitles are very badly synchronised. I keep it here in case I come back to it.

Procedure

  1. Either:
    1. This section is still full of errors, don’t use it … yet!
    2. Create a debootstrap root as follows:
      #  run this as root!
      CHROOT_DIR=$HOME/.cache/vobsub2srt         # or wherever
      apt-get -y install debootstrap
      rm -fr $CHROOT_DIR $CHROOT_DIR.tmp
      mkdir $CHROOT_DIR.tmp                      #  work in a .tmp dir so setup can be made atomic
      # Populate temp chroot
      debootstrap --arch amd64 bullseye $CHROOT_DIR.tmp http://deb.debian.org/debian/ > /dev/null
      mount -t proc proc $CHROOT_DIR.tmp/proc
      mount -t sysfs sysfs $CHROOT_DIR.tmp/sys
      cp /etc/hosts $CHROOT_DIR.tmp/etc/hosts
      rm -f $CHROOT_DIR.tmp/etc/mtab # wasn't there but be sure
      ln -s ../proc/self/mounts $CHROOT_DIR.tmp/etc/mtab
    3. Work in the chroot:
      chroot $CHROOT_DIR.tmp
    4. Install vobsub2srt as follows:
      apt-get -y install libtesseract4 tesseract-ocr-eng wget ffmpeg
      cd /tmp
      wget https://www.deb-multimedia.org/pool/main/v/vobsub2srt-dmo/vobsub2srt_1.0~pre7+20171219-dmo2_amd64.deb
      dpkg -i --force-depends *.deb
      apt-get -y --fix-broken install
      mkdir -p /srv/tessdata
      cd /srv/tessdata
      wget https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata
    5. Exit the chroot, then unmount its pseudo-filesystems and move it to its final location:
      umount $CHROOT_DIR.tmp/sys
      umount $CHROOT_DIR.tmp/proc
      # Amend chroot-rdiff-backup source so that it refers to the chroot-python,
      # but in a way that it is callable from outside the chroot.
      sed -i "s@/usr/bin/python3@$CHROOT_DIR/usr/bin/python3.9@" $CHROOT_DIR.tmp/usr/bin/rdiff-backup
      # Move temp chroot to final location.
      mv $CHROOT_DIR.tmp $CHROOT_DIR

    or:

    1. Log in to a Debian 11 system as root.
    2. Follow the instructions in the “Install vobsub2srt as follows” step above.
  2. Switch to a normal (non-root) user for the remaining steps.
  3. Set some variables:
    MOVIE="Big-Bunny.mkv"           #  or whatever; due to a vobsub2srt bug, be sure there is only one '.' in the name
    TESSDATA=/srv/tessdata          #  or whatever
    export TESSDATA                 #  $TESSDATA will be read by vobsub2srt, so it must be in the environment
  4. Verify that the format of the subtitles in the file actually is SUB/IDX and note the subtitles’ stream number, as in this example:
    sugo$ ffmpeg -i "$MOVIE" 2>&1 | grep Subtitle
    Stream #0:2(eng): Subtitle: dvd_subtitle, 720x576
    sugo$

    (dvd_subtitle is what we are looking for. The stream number is 2.)

  5. Accordingly, set another variable:
    STREAMNO=2                     #  or whatever
  6. Extract the subtitle stream:
    mkvextract tracks "$MOVIE" $STREAMNO:"${MOVIE%.mkv}"
    

    (That should produce a .sub and a .idx file.)

  7. Convert the subtitles:
    vobsub2srt --lang en "${MOVIE%.mkv}"

    (That should produce a .srt file. Note that the ffmpeg command above reported the subtitle language in ISO 639-2 format, i.e. eng, whereas vobsub2srt wants the language in ISO 639-1 format, i.e. en.)

  8. Make a copy of the movie without the original SUB/IDX subtitles (otherwise mplayer will play the built-in SUB/IDX subtitles):
    ffmpeg -i "$MOVIE" -c:v copy -c:a copy -sn "${MOVIE%.mkv}-nosubs.mkv"
  9. Use mplayer to verify the .srt file works:
    mplayer "${MOVIE%.mkv}-nosubs.mkv" -sub "${MOVIE%.mkv}.srt"
    
  10. If necessary, correct the .srt file. (I use a combination of jubler and aspell to do this.)
  11. Add the SRT subtitles to the subtitle-free copy of the movie:
    
    ffmpeg -i "${MOVIE%.mkv}-nosubs.mkv" -i "${MOVIE%.mkv}.srt" -map 0 -map 1 -c copy -metadata:s:s:0 language=eng "${MOVIE%.mkv}-srtsubs.mkv"
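
The manual checks in the steps above (a file name with only one '.', finding the subtitle stream number, and translating the ISO 639-2 language code into ISO 639-1) could be scripted roughly as follows. This is only a sketch: the function names are hypothetical ones of my own, the language table covers just a handful of codes, and the stream parser assumes ffmpeg prints its stream lines in the format shown in the example above.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from vobsub2srt,
# ffmpeg or mkvtoolnix.

# Succeed only if the given name ends in .mkv and contains no other '.'
# (the vobsub2srt file name bug mentioned above). The string is checked
# exactly as given, so a path such as ./movie.mkv is rejected too.
check_movie_name() {
    case "$1" in
        *.mkv) ;;
        *) echo "not an .mkv file: $1" >&2; return 1 ;;
    esac
    case "${1%.mkv}" in
        *.*) echo "more than one '.' in: $1" >&2; return 1 ;;
    esac
    return 0
}

# Read 'ffmpeg -i' output on stdin and print "<stream number> <codec>"
# for every subtitle stream found.
list_subtitle_streams() {
    sed -n 's/^ *Stream #[0-9]*:\([0-9]*\)[^:]*: Subtitle: \([a-z_0-9]*\).*/\1 \2/p'
}

# Map a few ISO 639-2 codes (as reported by ffmpeg) to ISO 639-1 codes
# (as wanted by vobsub2srt --lang). The table is deliberately incomplete.
iso639_2_to_1() {
    case "$1" in
        eng) echo en ;;
        fre|fra) echo fr ;;
        ger|deu) echo de ;;
        ita) echo it ;;
        spa) echo es ;;
        *) echo "unknown language code: $1" >&2; return 1 ;;
    esac
}
```

With these functions defined in the current shell, they could be used like this:

    check_movie_name "$MOVIE" && ffmpeg -i "$MOVIE" 2>&1 | list_subtitle_streams
    vobsub2srt --lang "$(iso639_2_to_1 eng)" "${MOVIE%.mkv}"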

See also