To build the fastest video editing experience on the web, Runway combines a real-time, cloud-based machine learning inference pipeline with traditional graphics rendering techniques. This lets Runway's video editor automate the mundane and time-consuming parts of video editing (like replacing the background of any video with Green Screen, or removing anything from a video with Inpainting) while also applying effects and filters driven by derived video bands that the machine learning models generate behind the scenes — in particular, depth-map and optical flow estimates. In this post, we will take a deep dive into how these bands are fed to the real-time shaders responsible for effects and filters. We will dissect the pipeline with practical, editable code examples to understand what's behind some of Runway's most powerful features. There's a lot to cover, so let's get started!
This is an interactive article with editable code sandboxes.
High-definition film and video usually runs at 24 frames per second or more, but for demo purposes we will analyze a single frame of a video. Every time a video is uploaded to Runway, an array of machine learning models runs in real time to estimate, among other things, a temporally consistent depth channel (a Depth Map band) and the motion of the video (an Optical Flow band). Once generated, these unique bands can be exported as single layers or used as the basis for creating and transforming the content of a video.
With the Depth Map and Optical Flow bands generated automatically, shader operations can be applied at the pixel level, using those bands as guides to create color adjustments, blurs, distortions, and more.
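To make "bands as guides" concrete, here is a minimal sketch of a band-guided pixel operation, written in Python/NumPy rather than an actual fragment shader; the frame data, depth values, and fog color are made up for illustration. It fades distant pixels toward a fog color, using the depth band as a per-pixel weight — the same kind of blend a shader would compute per fragment.

```python
import numpy as np

# Hypothetical 4x4 RGB frame (floats in [0, 1]) standing in for a video frame,
# and a matching depth band where 0.0 is nearest the camera and 1.0 is farthest.
rng = np.random.default_rng(0)
frame = rng.random((4, 4, 3))
depth = rng.random((4, 4))

# A simple band-guided color adjustment: blend each pixel toward a fog color
# in proportion to its depth, just as a fragment shader would per pixel.
fog_color = np.array([0.8, 0.85, 0.9])
weight = depth[..., None]  # broadcast the band across the RGB channels
foggy = frame * (1.0 - weight) + fog_color * weight
```

Because the result is a convex blend of two in-range images, the output stays in [0, 1] with no extra clamping.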
Note: Check out The Book of Shaders for a deep dive into the world of shaders and how they work.
The code in the sandbox below illustrates how the estimated depth can be used to create the illusion of depth of field. Depth of field is an optical artifact, inherent to analog photography, in which the lens and aperture cause one region of the image (the focal plane) to appear sharp while everything in front of and behind it appears blurred or out of focus. With digital cameras, cinematographers deliberately use or imitate this artifact as a composition technique: it helps create a sense of volume and/or intimacy in a scene. For our example we will take a naive approach, applying blur to pixels at a specific distance from the camera using the ML-estimated depth.
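In spirit, the sandbox's shader does something like the following sketch, written here in Python/NumPy instead of GLSL so the per-pixel math is easy to read. The `box_blur` and `depth_of_field` helpers, along with the `focus` and `spread` parameters, are our own illustrative choices, not Runway's API:

```python
import numpy as np

def box_blur(img, radius):
    """Naive box blur via shifted sums, clamping at the borders."""
    h, w = img.shape[:2]
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (2 * radius + 1) ** 2

def depth_of_field(frame, depth, focus, spread=0.2, max_radius=2):
    """Blend a sharp and a blurred copy per pixel, driven by the depth band.
    Pixels whose depth is close to `focus` stay sharp; the rest fade to blur."""
    blurred = box_blur(frame, max_radius)
    # Blur weight: 0 at the focal plane, 1 once |depth - focus| >= spread.
    coc = np.clip(np.abs(depth - focus) / spread, 0.0, 1.0)[..., None]
    return frame * (1.0 - coc) + blurred * coc

# Usage on a tiny synthetic frame: the columns whose depth matches `focus`
# stay sharp, while the others blend toward the blurred copy.
frame = np.linspace(0.0, 1.0, 48).reshape(4, 4, 3)
depth = np.tile(np.array([0.0, 0.5, 0.5, 1.0]), (4, 1))
result = depth_of_field(frame, depth, focus=0.5)
```

A real implementation would vary the blur radius continuously with the distance from the focal plane; blending between just two copies keeps the sketch short.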
The example above is only a naive, simple approach. In further articles we will improve on this technique to address some of its issues, such as foreground elements not blurring properly into the background (as they should) but instead ending sharply at the contour of their silhouettes.
When the user selects an object with Green Screen or one of Runway's other magic tools, Runway can perform new types of operations, like creating a segmentation mask, an SDF of it, and/or an inpainted version of the selection.
The sandbox below shows how three segmented elements produce different bands (mask, SDF, and inpainted) that can be used for different purposes. The Mask band is a black-and-white texture that lets each pixel on the frame know whether it is inside or outside the segmented area. The SDF, which stands for Signed Distance Field, tells each pixel how far away it is from the mask's edge. Finally, the inpainted texture provides a synthetically hallucinated image of the original frame without the segmented object.
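To make the SDF band concrete, here is a brute-force sketch in Python/NumPy of how a signed distance field can be derived from a binary mask. Production engines use much faster algorithms (e.g. jump flooding on the GPU), and the function name and sign convention here are our own illustrative choices:

```python
import numpy as np

def signed_distance_field(mask):
    """Brute-force SDF for a binary mask: positive outside the mask, negative
    inside, with magnitude equal to the Euclidean distance (in pixels) to the
    nearest pixel of the opposite class, approximating distance to the edge."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    inside = np.argwhere(mask).astype(float)
    outside = np.argwhere(~mask).astype(float)

    def nearest(points, targets):
        # (N, 2) x (M, 2) -> per-pixel distance to the closest target pixel.
        d = np.linalg.norm(points[:, None, :] - targets[None, :, :], axis=-1)
        return d.min(axis=1).reshape(h, w)

    # Negative inside the mask, positive outside.
    return np.where(mask, -nearest(coords, outside), nearest(coords, inside))

# Example: a single segmented pixel in the middle of a 5x5 frame.
mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True
sdf = signed_distance_field(mask)
```

A shader can then read the SDF band to, say, feather an effect smoothly across the mask edge instead of cutting it off at a hard boundary. (Note this O(N·M) sketch assumes the mask contains both inside and outside pixels.)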
Finally, here is an empty example for you to have fun with. It has all the above bands, plus the optical flow band, which we will cover in depth in a follow-up article.
That's it for now. Hopefully this gives you an idea of the power of our engine and sparks your imagination about what could be done with it. In future articles, we will discuss some of the more complex bands, like optical flow, in greater depth and dissect more advanced filters.