How can you speed up your edits? If you have a lot of dialogue in your production, one way is to use transcription. Jake Carvey details his FCPX workflow on his recent Habitat for Humanity project.
When Habitat For Humanity approached us in September to produce a short promotional video for their Annual Construction Innovation Awards, we jumped at the chance. I knew immediately that the project would be a great opportunity for upgrading our multi-cam event production workflow, including integration of audio transcription into our post-production process.
The project came to us via recommendation after a pro-bono project for another non-profit. While the budget was small enough to present certain challenges, we were confident we could deliver with a minimal 3-person crew while still maintaining professional standards.
The brief originally called for a single promotional video, based on 1 full day of shooting, including (3) 45-minute presentation sessions, (8-10) on-camera interviews, and B-Roll of all activities through the day, from staff prep to guest arrival to cocktail party and awards presentation.
As the pre-production progressed, we were asked to expand the delivery to also include a “next-day” edit: a 2 minute sizzle reel to be played back the next evening at the closing gala dinner.
From the beginning, I knew that transcription would be an important part of our workflow. The client had expressed interest early on in helping us choose sound bites and video clips. Because the tight-knit client team had spent weeks coaching the presenters in completing and polishing their proposal pitches, they knew the material inside and out, in a way that would have been difficult for our team to quickly absorb.
The plan was to deliver a text transcript of the relevant content late that same night. This way the team would have it first thing in the morning, and could quickly assist us with selecting sound bites, even before they would have access to the video dailies. We also wanted to be able to merge that transcript in FCPX into our multicams, for searchability and eventual captioning.
My editing background has been continuous over the years, mostly in support of my other career roles: Animation Director for TV and broadcast, VFX for TV and film, Lead Artist for video games, and recently, independent Director/Producer on a wide assortment of media and film projects. My software and post-production systems experience has run the gamut from ¾" offline editing, through Panasonic M2 and then DigiBeta online, and NLEs from Matrox to Avid to (early) Premiere to Final Cut 7 to Sony Vegas to Premiere Pro, and recently Resolve and Apple's Final Cut Pro X. I use FCPX much more frequently now, as 70-80% of my projects involve teams already heavily invested in it on macOS.
Coremelt’s Scribeomatic transcription software is a solution I’d already been using for some time, having participated in alpha and beta testing. I have leveraged it for some shorter projects, including multicam interviews. This project was a great opportunity to put the Scribeomatic integration through its paces on a more complex project, with longer clips than we commonly handle.
My monthly work schedule throws me into many different types of projects and roles, so I frequently shoot with very different gear packages. It’s a fun challenge, and one that ensures I don’t easily get too comfortable.
Here in Bangkok, the film and media production industry tends to be fairly small, with lots of overlap, so most people who have been working here for more than a few years, generally end up knowing each other, at least by reputation and name.
This has an upside and a downside. Producers, shooters and editors often compete for the same jobs, in a market known for trying to pit producers against one another purely to get the best price. There is also a huge talent pool of “preditors”, many or most of whom work “under the table”, outside of the legitimate business environment. There is also a disparity between rates for local talent and foreigners, which can cause issues as well. But the community mostly tends to play fair, and we do our best to help each other out whenever possible, especially those of us with longer-standing, professional ties to this country. We know enough shooters and editors that even if one person is booked on another job, they can generally vouch for someone else, and cut through a lot of negotiation and “getting to know ya”.
For me, while I am not afraid of one-man, multicam scenarios, I far prefer to have a team on hand, and in this case, 3 of us for a 2-camera shoot was viable, although having a dedicated sound recordist would have been ideal. Our loadout on this project was planned to allow us to travel with only what each of us could carry. Our camera package was Panasonic GH5 and GX85 with f4 Canon glass (70-250mm and 17-40mm). For support, we each had sticks, and the GH5 was on a Feiyu Tech a1000 gimbal for B roll, with a quick release plate underneath for dropping back onto the tripod at any time.
For sound, we used a Tascam DR-70D recorder with both shotgun and lav mics for interviews, and a Tascam DR-05 recording the mixed audio out from the stage sound, both recording WAV files at 24 bits.
For presentation lighting, we were more or less at the mercy of the house and stage lights, with a predictably nauseating mix of unfriendly, dirty color temps, a constant in corporate event filming. To give some hope of clean skin tones, I added a single Godox SL-60 lamp just off the stage, cutting the light from the screen with a set of compact, Bowens mount barndoors. (While nothing beats the quality of light from real tungsten fixtures, I love these new color-correct, powerful LED spotlights from Aputure, Godox etc, especially with such a wide range of Bowens mount fixtures and modifiers available). This serves to give at least one light source with clean, accurate white balance, which helps tremendously with saving skin tones from getting all mushy and nasty.
We filmed interviews after the main presentations. For those, I threw a big softbox on the Godox to get some more flattering light on the interviewees’ faces, and let the background windows go wild. We used a clapboard for each take, in case there were any problems with (automatic) audio synchronization in post. I feel it also helps to clarify things for both crew and talent, to focus attention on the task and subject at hand.
Dailies / Rushes
We use several different ingest systems, but regardless of which we use on each project, we find that adding specific information into the filenames is extremely helpful for managing files, making them human readable, and maximizing search indexing methods across OS, editing, and effects software. We preface each filename with natural sorting format ( YYYYMMDD “20190916” ), then the project name, camera model and role / number, followed by the original file number. The resulting filenames look something like: “20190921_HFH_CamA_GH5_P1044056.MP4”
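The renaming convention above can be sketched in a few lines of Python. This is a minimal illustration, not our actual ingest tooling; the `ingest_name` helper and its parameters are hypothetical names for the pattern described.

```python
import datetime
import pathlib

def ingest_name(original: pathlib.Path, project: str, cam_role: str,
                cam_model: str, shoot_date: datetime.date) -> str:
    """Build a human-readable, naturally-sorting ingest filename:
    YYYYMMDD_PROJECT_CamRole_CamModel_ORIGINALNAME.EXT"""
    prefix = shoot_date.strftime("%Y%m%d")  # zero-padded date sorts naturally
    return f"{prefix}_{project}_{cam_role}_{cam_model}_{original.name}"

name = ingest_name(pathlib.Path("P1044056.MP4"), "HFH", "CamA", "GH5",
                   datetime.date(2019, 9, 21))
print(name)  # 20190921_HFH_CamA_GH5_P1044056.MP4
```

In practice a batch rename tool does the same thing; the point is that every piece of metadata a human (or a search index) might need lives in the filename itself.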
Unfortunately in this scenario, we ended up resorting to straight duplications of the SD cards on our first offload. The upside was that both myself and the B Cam operator (who was to function as Assistant Editor on the upcoming promo video) each walked away from production with a full set of footage and audio clips.
Back at my studio, I immediately offloaded simultaneously to my dedicated project SSD and my central backup.
There really was no time for the (still extremely busy) client team to review the dailies for the next day edit. My focus after offloading the gear was getting the files and project prepped for the edit session later in the morning, and sending off the interview transcripts so the client team would have them first thing. I also mocked up some quick lower thirds and main titles, using the client-provided brand identity guidelines.
For our looming next day edit, which would be due 18 hours after our shoot ended, we knew that our transcripts were going to be key in helping us and the client team quickly find the bits and pieces we wanted to pull from, especially as I had needed to direct the interviews and present the questions without a client producer on hand. (They did provide specific written questions and minimal background information on the interviewees).
I spent some time organizing and keywording to ensure all footage was clearly tagged with the camera name.
Transcription was strategized from the beginning as an important client collaboration tool, and as I needed to create multicam clips with the interview video and audio anyway, I knew it was worth the time to further organize and carefully keyword our interview media clips, as well as the longer presentation clips, as we would be relying on them for the duration of the project. I also knew that I should plan for the potential of expanding the scope to include additional deliverables, including special formats and edits for social media.
Cost and turnaround time contributed significantly to the decision to use an automated, cloud-based transcription service such as Scribeomatic. I had 3.5 hours of audio to transcribe on a very fast turnaround (ideally within a couple of hours at most), and the usage was mostly for fast review of the subject matter. In my research, human-based transcription services tend to cost $1.00 / audio minute with average delivery times of 3 days or so. Rush can go as high as $2.50 / minute, and still tends to require a 12-hour turnaround.
Cost for transcribing our roughly 225 minutes with Scribeomatic on “fast” setting was around $45 USD. By comparison, a human transcript would have been approx $560, at rush rates. We also didn’t need a 100% accurate transcript with multiple approvals.
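The cost comparison works out as follows. A quick worked example, using the rates quoted above (the per-minute automated rate is inferred from the roughly $45 total, so treat it as approximate):

```python
minutes = 225  # roughly 3.5 hours of interview and presentation audio

automated_total = 45.00            # actual Scribeomatic "fast" cost, ~$0.20/min
human_standard = minutes * 1.00    # typical human rate, ~3-day turnaround
human_rush = minutes * 2.50        # rush human rate, still ~12-hour turnaround

print(f"automated:        ${automated_total:.2f}")   # $45.00
print(f"human (standard): ${human_standard:.2f}")    # $225.00
print(f"human (rush):     ${human_rush:.2f}")        # $562.50
```

At rush rates the human transcript would have cost over twelve times as much, and still would not have arrived in time to be useful for a next-day edit.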
Having used and tested Scribeomatic before, I was confident it would perform as I expected. The transcription process with Scribeomatic is quite flexible, including multiple methods for importing media and timelines, as well as exporting various formats to disk, and round-tripping seamlessly to Final Cut Pro X.
In FCPX, I started by double checking that all the footage was sorted correctly into one event for the presentations, and one event dedicated to interviews. I continued to refine the keywording, focusing within the Interviews - Multicam event. As I proceeded, I was double checking the client’s event outline to ensure names and titles were also correctly assigned, and then adding those names and titles to a temporary project timeline for safekeeping.
Once I was confident these were adequately organized, I dragged the Interview event into the Scribeomatic app workspace, saved it as an SCM workspace file, and was ready to transcribe.
I knew that either audio track would be fine for transcribing, but as the shotgun mic (track S2) seemed to have a bit more volume and definition, I selected all the S2 tracks, checked my language and accuracy settings, and pushed TRANSCRIBE to set it all in motion.
While waiting for the transcription to finish (30 minutes or so), I re-checked that the presentation clips were ready, and did a quick sync test in PluralEyes (although I was already sure I would just use FCPX's built-in audio sync for the interviews).
When finished, I roughly reviewed the transcript just to ensure there was enough readable transcription to be of some help, but didn’t allow myself to get sucked into manically editing the output. We weren’t expecting anything close to perfect results given the wide range of American, European and South Asian accents; we just needed a scannable overview for the teams.
From Scribeomatic, I exported the results as both PDF and CSV files, uploaded the CSV files to Google Sheets, then sent the Google link and the PDFs via email and IM to the client. This provided a useful rough tool, but also proof of concept, and a way to establish communication with the client first thing in the morning. This put the playbook in their hands early, while we could get some rest in preparation to run with the ball through the day.
With the Interview transcripts sent off to the client, I sent the transcription back to Final Cut Pro from Scribeomatic by choosing “Send to Final Cut Pro”, with both keyword ranges and markers. (I also exported FCPXML to disk with subtitles in case I wanted to include them later.)
While not generally necessary, I usually send my results back to a fresh, new library, so I can double check them before integrating them into my edit. In many situations, it is much more streamlined to simply replace the event media when prompted, but I tend to be overly cautious when working with new software and workflows, especially under tight deadlines. In this case, since we hadn’t yet done any major work, I chose to “replace” the media when prompted.
With my Scribeomatic Interview transcription results all deployed, I sent the Presentations event off to Scribeomatic, selected the stereo audio tracks, and started the transcription of the 2-3 hours of presentations and awards across 4 separate sessions.
I left this to work overnight (best I can figure, it took a little under 1.5 hours), and headed off for some shuteye and brain cell regeneration.
Post-Production: Next Day Edit
Our first deliverable was an approximately 2 minute long recap / sizzle reel, due 18 hours after wrapping production. This “next day” edit was to be played back during a gala dinner celebrating the winners and presenting the goals of the organization for the next year.
As we reviewed the footage and the transcripts in the morning, I did a very quick metadata pass as we went, marking favorites and adding additional keywords with names of interviewees, key points, company names, affiliations, etc. Additionally, we use keyword “folders” and Smart Folders for additional organization (Subject, Synch Clips, Multicam Clips, B roll). We often use FCPX Events more like traditional “bins”, to provide additional organization on projects, and because keyword collections cannot be exported directly as FCPXML. Using events as our basic organization allows for more direct communication back and forth with Scribeomatic, as Scribeomatic remembers the original event name and can merge, duplicate, or replace media in those events.
If we had been shooting multiple days, or for the same client across multiple events, but under the same working project, we would have added the date as a prefix to each event, with natural sorting format ( “20190916_” ), so these events would sort naturally using built in tools. In this case, this specific project was a single shoot day, so there was no need for the additional detail.
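The date-prefix convention works because zero-padded YYYYMMDD strings sort lexicographically in chronological order, so any tool's plain alphabetical sort doubles as a chronological sort. A tiny sketch (the event names here are made up for illustration):

```python
# Hypothetical event names using the YYYYMMDD_ prefix convention.
events = [
    "20190921_HFH_Interviews",
    "20190916_HFH_ScoutDay",
    "20191002_HFH_Gala",
]

# A plain string sort yields chronological order, no date parsing needed.
events.sort()
print(events)
# ['20190916_HFH_ScoutDay', '20190921_HFH_Interviews', '20191002_HFH_Gala']
```

This is the same reason the ingest filenames lead with the date: every OS file browser, FCPX event list, and search result falls into shoot order for free.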
Since the clock was ticking towards delivery (and presumably, a hurricane of revisions), I knew I needed to be aggressive on format and clip selection, while still investing the right amount of time where it counted. In the end I chose to use only wild audio for the presentation sound bites, while the Interview clips received the full multicam treatment and more carefully edited audio.
The client also had strict branding guides, but luckily also had a dedicated staff member who could focus specifically on oversight of those key branding points, while others could focus more on pacing and content.
My first step was to pick several sound bites from the interviews to set the tone for the B Roll selections, and start finding a way to keep the editing well paced, and deliver a bit of anticipation and follow through. Although it was fairly clear early on that the client was going to prefer a linear, chronological progression for the most part, this also helped to choose which visuals we would likely be focusing on, as well as which sound bites might work best from the presentations and judges’ critiques.
First I needed to create the multicams for the Interviews, which went surprisingly quickly and easily, although I did need to finesse the order and naming of the tracks for consistency across the clips as I proceeded. Having the keyword collections was very helpful in organizing the clips and audio tracks, as it allowed me to quickly jump around among clips, timelines and multicam clips (something otherwise often onerous in FCPX due to the lack of a tabbed interface). Taking the time in advance to set these things up well goes a long way towards easing that pain.
I selected the (2) audio tracks and (2) video tracks for each interview, in turn. In one of the interviews we had a brief battery outage, so that interview had an additional clip from the A Cam. Choosing sync by audio and double checking the specifications, the sync was very quick and simple. If there had been a lot more interview sessions, or lots of camera stops and starts, I would have been more likely to set up the sync in PluralEyes instead, but as it was, all went very smoothly and gave great results.
I made sure to go back quickly through each multicam clip, making sure all the tracks were arranged consistently from top to bottom (Cam A, Cam B, Audio 1, Audio 2) and that audio roles were assigned. (We use a custom “Camera-Audio” role for all our wild audio from the cameras, and standard “Dialogue”).
One of the cool things about the transcription was that the markers carried through: anytime I needed to find a phrase, I could just open the multicam to search the index, copy the timecode at that spot, and quickly find it in the browser or temp project timeline.
A very quick adjustment to set the audio levels and reduce background noise, and a very rough color pass using a combination of Final Cut’s Color Boards and Coremelt Chromatic, and I was ready to start editing like a furious Banshee.
One thing to keep in mind about Multicam clips is that unlike their predecessor, Sync clips, effects and adjustments applied to the clips INSIDE the multicam ripple through wherever that clip is used in a timeline, and are updated anytime the clips inside the Multicam are updated (which is AWESOME and quite powerful, sort of like Master Clips in Premiere). This DOES NOT, however, always apply the same way to audio / video roles and track enable/disable toggles; it is always important to set these on the OUTSIDE of the clip as well (the “browser” instance of the clip) and double check on all timelines to ensure those settings are passed through correctly. This bit us a couple times, as it doesn’t seem to be entirely consistent, and the toggles seem to revert sometimes when closing and reloading a library. Luckily it only happened a couple times during the rough cut. By the time I hit the final cut stage, we had it pretty well figured out.
The edit proceeded very quickly, roughing in a very fast assembly edit to get all the major features included, with temp music laid down. (Constant adjustment of audio keyframes was a massive pain in the backside; what I wouldn’t give for automatic ducking in FCPX, or even just an audio “adjustment layer” concept.)
We used frame.io continuously to present edit revisions, although the client preferred to provide feedback through their familiar communication channels of LINE, WhatsApp and email, referencing timecode from the frame.io deliveries.
At one point, while waiting on feedback from the first rough cut delivery, I quickly made a new timeline from each of the interview multicams, dropped a timecode effect on it (“Source Timecode” from Coremelt, I believe), and exported and uploaded all of them as 720p clips so the client could have an additional reference. This helped us locate several new sound bites, which we were able to very quickly chop up and add to the edit.
The rest was just lots of calculated, iterative, strategic hard work, sending off revisions at each major stepping stone to ensure the client had the latest version on hand in case something went horribly wrong at some point. (I’ve been down this road more than a few times, and around the corner on a few. It pays to be safe, and to stay fluid and light on the toes.)
We got through it, delivered all but the most last minute, esoteric notes, and it went from net to screen without a hitch.
Phew. The rest would take a lot longer, but be much easier.
Post-Production: Promo/Overview Edit
The promotional edit was to be less than 3 minutes in total running time, yet include a thorough overview of the event and participants. It was to be used for ongoing publicity and marketing efforts to reach and inspire participants, donors, sponsors and volunteers.
There was a lot of downtime, as client staff recovered from weeks of intense planning and execution, and took staggered holidays. Having the Scribeomatic transcription spreadsheets, and the time-code window burns accessible to client teams, was extremely helpful in reducing confusion, and always bringing creative decisions back on track.
Our first order of business was to finally correctly sync the multicam clips for the presentations. As always with FCPX, I manually selected the specific clips in sets so that FCPX didn’t have to think too hard about which clips to include and which to ignore. As we had already keyworded the interviews by the interviewees’ names, that was dead simple and the multicam syncs went perfectly. PluralEyes does a much better job at “dump and sync”, but even the PluralEyes workflow benefits tremendously from well tagged and keyworded media, and some manual sorting and prioritizing.
Other than some long gaps of communication during holiday and travel periods, the Promo edit proceeded professionally, along fairly standard, corporate lines, with a few levels of approval and interacting with various members of staff sharing duties while others were absent.
There were a few points where the client really wanted to include some very specific information remembered from the event. This was another point at which the transcriptions proved invaluable: I was always able to point them back to the transcriptions, from which they could then backtrack to the footage to see if they could find the exact bit they wanted. Genius!
As with most projects, things don’t always move in predictable ways, and a big part of wearing the producer’s hat means sometimes we gotta tug that brim, rub our chin, and quickly come up with creative solutions. Sometimes this results in new techniques that make it into our permanent toolbox. Sometimes it creates a ripple-down effect that takes some good hard elbow grease to work through. There were a few things I learned, and a few things we will certainly consider doing differently in the future.
- Stronger media management. Always have spare hard drives / SSDs on hand, and a fast laptop or dedicated media copy device that can ideally handle multiple, simultaneous backups at the fastest possible transfer speeds.
- Deliver a full complement of dailies online via frame.io or similar.
- Record the director’s questions via a dedicated microphone, or even via smartphone headset.
- Pre-synchronize the multicams before transcoding if at all possible, to ensure timecode stays consistent from the beginning.
- Fully use PluralEyes earlier on to sync B Cam footage to the presentations for proper multicam clips and cutaways.
All in all a good result - we had an aggressive first delivery, and a long, drawn out second delivery - two of the most challenging client management situations in my experience. We learned a couple new lessons and found some new workflows and tools, which will help streamline future projects even more.
Leveraging transcription via Scribeomatic was definitely a great decision, and served us well all through the process. I was very happy with how smoothly our multicam synchronization went, and how Scribeomatic transcriptions assisted our client relations from the moment production was wrapped and we moved immediately into the next day edit, and preparing for the secondary deliveries.
In the end, there is often just no substitute for hard, manual work, but whenever our tech can step in and take away some of the drudgery, contribute more accuracy, and give our workflow and creativity a boost, then it is a good day on the battlefield, indeed.
Jake Carvey is an independent media producer, animator, and designer, obsessed with documenting, archiving and sharing his travels around the world. His career spans 25 years of film and video production, game development, and computer animation. You can find him on Twitter @jakecarvey or on his website.