UPD : Final fork available at : whisper.cpp
Objectives
- Transcribe part or all of a YouTube video
- Least amount of downloading
- Translation, if required
- CLI
11:21:50 : Found this. It claims to download the video, then transcribe and translate it
- Seems legit.
- To do: make it work for a portion of the input
- Avoid a full download, i.e., download only the required portion
11:42:33 : Apparently, if you give ffmpeg a remote video source, it can download only the specified portion
# Ask yt-dlp for the direct stream URLs (one per line: video first, then audio)
org_url="$(yt-dlp -g "${source_url}")"
# Split on newlines into an array, similar to Python's str.split("\n")
mapfile -t vid_aud_url <<< "${org_url}"
video_url="${vid_aud_url[0]}"
audio_url="${vid_aud_url[1]}"
# Seek both inputs to the start point, then encode only ${duration} of output
ffmpeg -ss "${start_point}" -i "${video_url}" -ss "${start_point}" -i "${audio_url}" -map 0:v -map 1:a -t "${duration}" -c:v libx264 -c:a aac "${temp_dir}/vod.mp4"
11:52:58 : Need some bash scripting to
- Get both audio and video streams separately from yt-dlp
- Use -1 to download the full video
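A rough sketch of that scripting, kept to POSIX shell so it also runs under bash. The helper names (nth_url, duration_args, fetch_clip) are made up for these notes, and treating -1 as a duration value that means "no limit" is an assumption. It also assumes yt-dlp -g prints two lines, video URL first and audio URL second.

```shell
# Sketch only: helper names are hypothetical, not part of yt-dlp or ffmpeg.

# Print the Nth line (1-based) of the multi-line output of `yt-dlp -g`.
# Args: urls_text line_number
nth_url() {
  printf '%s\n' "$1" | sed -n "${2}p"
}

# Print the ffmpeg duration option, or nothing when the duration is -1
# (assumed here to mean "take everything from the start point").
# Args: duration
duration_args() {
  if [ "$1" != "-1" ]; then
    printf '%s %s' '-t' "$1"
  fi
}

# Download a clip from separate video and audio stream URLs.
# Plain sh has no `local`, so the variables below are global.
# Args: source_url start_point duration output_file
fetch_clip() {
  urls="$(yt-dlp -g "$1")" || return 1
  video_url="$(nth_url "${urls}" 1)"
  audio_url="$(nth_url "${urls}" 2)"
  # $(duration_args ...) is unquoted on purpose: "-t 30" must split into
  # two words, and an empty result must disappear entirely.
  ffmpeg -ss "$2" -i "${video_url}" -ss "$2" -i "${audio_url}" \
    -map 0:v -map 1:a $(duration_args "$3") \
    -c:v libx264 -c:a aac "$4"
}
```

Example call: fetch_clip "${source_url}" 00:01:00 30 "${temp_dir}/vod.mp4" for a 30 second clip, or pass -1 as the duration to keep everything after the start point.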
11:59:34 : Forking whisper.cpp is the best idea
- Set up the small model
- Keep results in the res folder
- WO : Getting with a single res.mp4
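The whisper step might look roughly like the sketch below. WHISPER_BIN, WHISPER_MODEL, and the function names are assumptions; the binary is ./main in some whisper.cpp builds and whisper-cli in others, so adjust to whatever the fork produces. whisper.cpp expects 16 kHz 16-bit WAV input, hence the conversion step.

```shell
# Sketch only: WHISPER_BIN, WHISPER_MODEL and the function names are assumptions.
WHISPER_BIN="${WHISPER_BIN:-./main}"   # some builds name the binary whisper-cli
WHISPER_MODEL="${WHISPER_MODEL:-models/ggml-small.bin}"

# One-time setup from the whisper.cpp checkout:
#   bash ./models/download-ggml-model.sh small

# Extract 16 kHz, 16-bit mono WAV audio from the clip for whisper.cpp.
# Args: input_video output_wav
to_wav() {
  ffmpeg -y -i "$1" -vn -ar 16000 -ac 1 -c:a pcm_s16le "$2"
}

# Transcribe a WAV file to <output_prefix>.srt; pass "yes" as the third
# argument to also translate to English.
# Args: wav_file output_prefix translate(yes|no)
transcribe() {
  wav="$1"
  out_prefix="$2"
  translate="$3"
  mkdir -p "$(dirname "${out_prefix}")"
  # Rebuild the positional parameters as the whisper argument list.
  set -- -m "${WHISPER_MODEL}" -f "${wav}" -l auto -osrt -of "${out_prefix}"
  if [ "${translate}" = "yes" ]; then
    set -- "$@" -tr
  fi
  "${WHISPER_BIN}" "$@"
}
```

Example: to_wav "${temp_dir}/vod.mp4" "${temp_dir}/vod.wav" followed by transcribe "${temp_dir}/vod.wav" res/vod yes should leave res/vod.srt behind.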
12:29:34 : Final Pipeline
- Give a YouTube URL with the starting point and duration you want to clip
- yt-dlp resolves the direct video and audio stream URLs
- ffmpeg downloads only the requested part of the video
- The whisper CLI tool transcribes it and, if needed, translates it to English, storing the result as an .srt file
- Use ffmpeg to embed this .srt file into the video
- Clean up the residual files
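The last two steps could be sketched as below. The function names are hypothetical, and "embed" is read here as adding a soft subtitle track without re-encoding, which is an assumption about the intended result. For an MP4 output the subtitle codec would be mov_text.

```shell
# Sketch only: function names and file layout are assumptions.

# Mux the subtitles into the clip as a soft subtitle track. Video and audio
# are stream-copied, so nothing is re-encoded.
# Args: input_video srt_file output_video
embed_subs() {
  ffmpeg -y -i "$1" -i "$2" -map 0:v -map 0:a -map 1:0 \
    -c:v copy -c:a copy -c:s mov_text "$3"
}

# Remove the scratch directory, refusing an empty or root path.
# Args: temp_dir
cleanup() {
  case "$1" in
    ""|"/") echo "refusing to remove '$1'" >&2; return 1 ;;
  esac
  rm -rf -- "$1"
}
```

Example: embed_subs "${temp_dir}/vod.mp4" res/vod.srt res/res.mp4, with trap 'cleanup "${temp_dir}"' EXIT set right after the temp directory is created so the residual files go away even if a step fails. If burned-in subtitles are wanted instead, ffmpeg's subtitles video filter is the usual route, at the cost of re-encoding the video.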