Adding captions to videos is a tedious process, and adding them in multiple languages is even more so! Oliver Peters takes a look at the online transcription and translation service Simon Says.
Whether you need to document an interview or prepare captions for a master, transcription has become a vital part of post-production. It's also an area that has benefited greatly from improvements in artificial intelligence. Numerous companies offer both AI-based speech-to-text and human-assisted transcription. As the technology has advanced, AI-based services have steadily improved their accuracy, coming close to that of human transcription.
I've tested and used a number of services, but lately I have been looking at Simon Says Transcription, which has been written about here at FCP.co before. Simon Says already supported a range of NLEs and DAWs and has recently added support for DaVinci Resolve. Depending on your workflow and editing system, you can use Simon Says' services through their website, the Mac desktop app (a free download from the Mac App Store), the Final Cut Pro extension, or Resolve scripting.
The desktop app is Mac-only, but Windows editors can still benefit from Simon Says by working through the website. Moving between these tools is seamless, since they cover the same functions and your projects are editable from any of them. Once you install the Mac desktop app, you can also install the FCP extension and the Resolve scripts.
Accuracy is key
AI speech-to-text requires an internet connection, because the necessary processing resources can't be supported locally on your computer. (Simon Says does offer custom on-premises implementations for enterprise users.) The same will be true of Adobe's upcoming speech-to-text feature for Premiere Pro. The good news is that in my tests, Simon Says was lightning fast, taking only a few minutes to process about 45 minutes of content. By default, only the audio is uploaded, unless you opt to include proxy video. For best results, leave the audio raw rather than pre-filtering it with a noise-reduction plug-in.
I uploaded three 15-minute interview clips, plus a five-minute edited program. These featured male and female English speakers, plus a French Canadian man whose English was heavily accented. I was pleasantly surprised by the accuracy of the transcription, which required very little editing, even for the accented speaker. The usual issues remain, such as when the interviewer interrupts or talks over the interviewee. Simon Says Transcription has good context filtering and generally leaves out common verbal crutches, such as the frequent use of "um" and "ah." You won't have to edit those out manually later.
Speech-to-text algorithms are based on breaking the audio down into its component phonemes. Phonemes are abstract units of sound, which humans combine to create spoken words and sentences. English is commonly said to use 44 phonemes. Because different words can share the same sequence of phonemes, this approach produces common phonetic mistakes, such as the speaker saying "segue" and the transcription reading "Segway." You'll encounter those issues with any human or automatic transcription. Nevertheless, the Simon Says results were better than those of any AI-based system I've previously used or tested.
As with other services of this kind, there's a robust online text editor where you can make live changes, much like in a Google Doc. The text in the editor plays in sync with the uploaded audio. You can correct text, add markers and notes, designate speakers, and change the start timecode.
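Changing the start timecode matters because camera files often begin at a non-zero timecode such as 01:00:00:00, while a transcript is timed from the first frame of the file. The Python below is a minimal sketch of that offset, assuming non-drop-frame timecode at an integer frame rate (29.97 drop-frame timecode needs extra handling). The function names are hypothetical and are not part of Simon Says.

```python
def tc_to_frames(tc: str, fps: int) -> int:
    """Convert non-drop-frame timecode 'HH:MM:SS:FF' to a total frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    if not 0 <= ff < fps:
        raise ValueError(f"frame field {ff} is out of range for {fps} fps")
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def frames_to_tc(frames: int, fps: int) -> str:
    """Convert a total frame count back to non-drop-frame 'HH:MM:SS:FF'."""
    ss, ff = divmod(frames, fps)
    mm, ss = divmod(ss, 60)
    hh, mm = divmod(mm, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def offset_timestamp(seconds: float, start_tc: str, fps: int) -> str:
    """Map a time measured from the start of the media file onto the clip's own timecode."""
    return frames_to_tc(tc_to_frames(start_tc, fps) + round(seconds * fps), fps)


if __name__ == "__main__":
    # A word spoken 90.5 seconds into a 24 fps clip that starts at 01:00:00:00
    print(offset_timestamp(90.5, "01:00:00:00", 24))  # 01:01:30:12
```

Working in whole frames rather than fractional seconds keeps the round trip exact, which is why both helpers convert through a frame count.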
Transcriptions, notes, captions, and subtitles
Simon Says can handle several common transcription tasks. Sometimes you simply need a written document, which a producer or story editor may use to select possible soundbites for the editor. Upload a clip to transcribe, edit the result in the online text editor, and export a Word document, with or without timecode. Creative editors often want range-based notes or markers for the raw footage in the bin, which helps them build the program. Export the format specific to your NLE and you are good to go.
If you don't deal with captions now, you soon will. Broadcasters and streaming networks in many countries require caption files to comply with accessibility laws. You may be tasked with creating either subtitles (open captions) or closed captions for your timeline. Simon Says has tailored its workflow for each of these needs in the top NLEs. For example, Final Cut Pro users can use the extension, whereas Resolve editors would use scripting. Avid editors can use the desktop app in conjunction with the website.
Final Cut Pro and DaVinci Resolve integration
Final Cut Pro editors have the easiest workflow, thanks to the extension panel. It's a simple integration linking Final Cut with the Simon Says desktop app. If you want a range-based transcription of your raw footage, simply drag any event that contains clips to the extension window. The clips are automatically uploaded and transcribed. You can either edit these online first and then download, or download right away if you feel the transcription is close enough. Remember that this is for the purpose of finding and reviewing edit choices, so a perfect transcription usually isn't essential.
Drag the purple icon in the extension panel back to the event to populate it with clips organized by keyword ranges. The text for each range appears in the Notes column, and clicking between ranges advances the clip accordingly. Unfortunately, FCP doesn't support searching the text in the Notes column, so you can't use the Find command to locate a particular word or phrase.
To generate titles or closed captions, upload an event containing the project and transcribe it. Next, move to the Simon Says Visual Subtitle Editor and set the formatting parameters. Then select either titles or captions from the export menu.
For FCP titles, drag the purple icon to an event, which in turn creates a project with open captions (i.e., subtitles). If you opt for closed captions instead, Simon Says downloads an SRT caption file, which you can import directly into Final Cut Pro as well as other NLEs. If you prefer an SCC caption file, various apps can convert the SRT file to SCC. Once inside Final Cut, adjust the caption text, style, and timing as needed.
The newest addition to Simon Says Transcription is integration with DaVinci Resolve. Instead of an extension, Resolve uses Python scripting. These scripts only work with Resolve downloaded from the Blackmagic Design website or purchased through a reseller. They do not work with the App Store version of Resolve, because sandboxing means the pertinent directories are missing. Three scripts are available from the Workspace pulldown menu. The first sends clips to Simon Says for transcription, the second applies markers and text back to the clips after transcription, and the third adds subtitles to the timeline.
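An SRT file is plain text: each caption gets a sequence number, a start and end time in HH:MM:SS,mmm form separated by "-->", the caption text, and a blank line. The Python below is a minimal sketch of writing that structure from timed caption segments; the Caption class and function names are illustrative and are not Simon Says' code. SCC files, by contrast, store CEA-608 caption data as hex-encoded byte pairs, which is why a separate conversion tool is needed.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    start: float  # seconds from the start of the timeline
    end: float    # seconds from the start of the timeline
    text: str     # one or two lines of caption text


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hh, rem = divmod(total_ms, 3_600_000)
    mm, rem = divmod(rem, 60_000)
    ss, ms = divmod(rem, 1000)
    return f"{hh:02d}:{mm:02d}:{ss:02d},{ms:03d}"


def to_srt(captions: list[Caption]) -> str:
    """Render captions as SRT: numbered cues, each followed by a blank line."""
    cues = []
    for index, cap in enumerate(captions, start=1):
        cues.append(f"{index}\n{srt_time(cap.start)} --> {srt_time(cap.end)}\n{cap.text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([Caption(1.0, 3.5, "Hello"), Caption(4.0, 6.0, "World")]))
```

Because the format is this simple, an SRT file is easy to inspect or hand-correct in a text editor before importing it into an NLE.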
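Simon Says' own scripts aren't shown here, so the following is only a hypothetical sketch of what "applying markers and text back" involves: converting each transcript range's start and end times into a frame position and duration, then calling AddMarker on a clip or timeline through Resolve's Python scripting API. The helper names, the Blue marker color, and the 32-character name limit are my assumptions; DaVinciResolveScript, scriptapp, and AddMarker come from Blackmagic's scripting documentation, and the module is only importable when Resolve's scripting environment is set up.

```python
from typing import Any


def ranges_to_markers(ranges: list[tuple[float, float, str]], fps: float) -> list[dict]:
    """Turn (start_sec, end_sec, text) transcript ranges into AddMarker arguments."""
    markers = []
    for start, end, text in ranges:
        markers.append({
            "frameId": round(start * fps),                 # marker position in frames
            "color": "Blue",                               # assumed color choice
            "name": text[:32],                             # assumed short label
            "note": text,                                  # full range text
            "duration": max(1, round((end - start) * fps)),
        })
    return markers


def apply_markers(item: Any, markers: list[dict]) -> int:
    """Add markers to a Resolve MediaPoolItem or Timeline; return how many succeeded."""
    added = 0
    for m in markers:
        # AddMarker(frameId, color, name, note, duration) returns True on success
        # and fails, for example, if a marker already exists at that frame.
        if item.AddMarker(m["frameId"], m["color"], m["name"], m["note"], m["duration"]):
            added += 1
    return added


def current_timeline():
    """Fetch the current timeline (requires a running, scriptable Resolve)."""
    import DaVinciResolveScript as dvr_script  # ships with Resolve; path setup varies by OS
    resolve = dvr_script.scriptapp("Resolve")
    project = resolve.GetProjectManager().GetCurrentProject()
    return project.GetCurrentTimeline()
```

Keeping the conversion separate from the Resolve calls means the frame math can be checked without Resolve running; only current_timeline() touches the application.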
There's language translation, too!
There's no more need to copy and paste text into Google Translate. Let's say you have a Spanish speaker and need English subtitles. Step one is to upload and transcribe the clip in its original language, in this case Spanish. (Simon Says Transcription currently supports 100 languages.) Next, edit the transcription. Since you can invite others to share a Simon Says project, a native Spanish speaker can log in and make any needed corrections.
Once the Spanish text corrections are complete, go to the export menu and select translation for the rest of the process; in this example, you would translate from Spanish to English. Simon Says creates a separate internal project with the translated files. Then export or download the file formats you need.
Translation can be tricky, since word and phrase order within a sentence often differs from one language to another. I speak German and English, so I ran a short test. The German translation (from the transcribed English) was reasonably good. A cool feature is that when I downloaded the Word document, the original English appeared in the margin as comments. That's quite handy.
Simon Says is definitely a huge time saver for any type of transcription need. You can get the app and set up an account for free. Pricing is based solely on transcription time, with several annual and monthly plans available. There's also a "pay as you go" account at $15/hour. Since you are billed only for the length of the media being transcribed, you won't get dinged for downloading various files from the same transcription.
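To put the pay-as-you-go rate in concrete terms, here is a small Python sketch of the arithmetic. It assumes the $15/hour rate is prorated by the second, which is my assumption; the actual plan may bill in other increments. Exports are deliberately absent from the calculation, since only media length is billed.

```python
PAYG_RATE_PER_HOUR = 15.00  # pay-as-you-go rate quoted in this review, in dollars


def payg_cost(media_seconds: list[float], rate_per_hour: float = PAYG_RATE_PER_HOUR) -> float:
    """Estimate cost from total media length only (assumes per-second proration)."""
    total_hours = sum(media_seconds) / 3600
    return round(total_hours * rate_per_hour, 2)


if __name__ == "__main__":
    # The test media from this review: three 15-minute interviews plus a 5-minute program
    print(payg_cost([15 * 60] * 3 + [5 * 60]))  # 12.5
```

Downloading Word, SRT, or NLE marker files from that same transcription would not change the total.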
Simon Says Transcription offers short workflow tutorials for every system it supports, and the desktop app includes numerous demo projects when you log in. These are transcribed versions of the tutorials themselves, so getting started couldn't be easier.