
Make-A-Video - Pytorch (wip) (free) Download Full

Implementation of Make-A-Video, new SOTA text to video generator - Make-A-Video - Pytorch (wip)


Published Date: 2024-04-15

Make-A-Video - Pytorch (wip) Free Download

Introducing Make-A-Video - Pytorch (WiP), the cutting-edge video generation tool that empowers you to create stunning visuals with ease. Harnessing the power of deep learning, this open-source platform enables you to turn your imagination into captivating videos. Its user-friendly interface makes it accessible to creators of all levels and lets them explore the realm of video production with unprecedented freedom.

Make-A-Video - Pytorch (WiP) is built on PyTorch, a popular deep learning framework, giving you a robust foundation for video generation. It offers a comprehensive suite of features, including text-to-video generation, video interpolation, and video editing, giving you the tools to bring your creative visions to life. Join the growing community of creators embracing Make-A-Video - Pytorch (WiP) to unlock the full potential of video generation and push the boundaries of visual storytelling.


Make-A-Video - Pytorch (wip): Implementation of Make-A-Video, the new SOTA text-to-video generator from Meta AI, in Pytorch. They combine pseudo-3D convolutions (axial convolutions) and temporal attention, and show much better temporal fusion. Pseudo-3D convolutions are not a new concept; they have been explored before in other contexts, for example for protein contact prediction as "dimensional hybrid residual networks". The gist of the paper comes down to: take a SOTA text-to-image model (here they use DALL-E 2, but the same learning points would easily apply to Imagen), make a few minor modifications for attention across time and other ways to skimp on the compute cost, do frame interpolation correctly, and get a great video model out.

When passing in images (say, if one were to pretrain on images first), both the temporal convolution and the temporal attention are automatically skipped. In other words, you can use this straightforwardly in your 2D U-Net and then port it over to a 3D U-Net once that phase of the training is done.
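
To make the pseudo-3D idea concrete, here is a minimal, self-contained sketch (plain PyTorch with einops, not the repository's actual module) of a factorized convolution: a 2D convolution over space followed by a 1D convolution over time, where the temporal step is skipped whenever the input is a 4-dimensional batch of images rather than a 5-dimensional video tensor. The identity initialization of the temporal kernel is an extra assumption made here so that, right after porting a 2D-pretrained network to 3D, the block's output on video frames matches the purely spatial block.

```python
import torch
from torch import nn
from einops import rearrange


class PseudoConv3d(nn.Module):
    """Sketch of a factorized (pseudo-3D) convolution: space first, then time."""

    def __init__(self, dim, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        # spatial convolution, applied to every frame independently
        self.spatial_conv = nn.Conv2d(dim, dim, kernel_size, padding=padding)
        # temporal convolution, applied across frames at every spatial location
        self.temporal_conv = nn.Conv1d(dim, dim, kernel_size, padding=padding)
        # assumed detail: identity-initialize the temporal conv so the block
        # initially behaves exactly like the 2D-pretrained spatial conv
        nn.init.dirac_(self.temporal_conv.weight.data)
        nn.init.zeros_(self.temporal_conv.bias.data)

    def forward(self, x):
        # x is either (batch, channels, height, width)          -> images
        #          or (batch, channels, frames, height, width)  -> video
        is_video = x.ndim == 5

        if is_video:
            b, c, f, h, w = x.shape
            x = rearrange(x, 'b c f h w -> (b f) c h w')

        x = self.spatial_conv(x)

        if not is_video:
            # image input: the temporal convolution is skipped entirely
            return x

        # fold spatial positions into the batch and convolve over frames
        x = rearrange(x, '(b f) c h w -> (b h w) c f', b=b, f=f)
        x = self.temporal_conv(x)
        x = rearrange(x, '(b h w) c f -> b c f h w', b=b, h=h, w=w)
        return x


if __name__ == '__main__':
    conv = PseudoConv3d(dim=64)
    video = torch.randn(1, 64, 8, 32, 32)  # (batch, channels, frames, h, w)
    images = torch.randn(1, 64, 32, 32)    # (batch, channels, h, w)
    print(conv(video).shape)   # torch.Size([1, 64, 8, 32, 32])
    print(conv(images).shape)  # torch.Size([1, 64, 32, 32])
```

The same pattern of branching on the input's dimensionality is what allows the block to be dropped into a 2D U-Net during image pretraining and reused unchanged once training moves to video.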