Bad Apple!! AI color generated with Stable Diffusion and ControlNet


I am not the first person to run Bad Apple!! through Stable Diffusion, but as soon as I saw other similar videos, I wanted to try this myself.

For those into Stable Diffusion, or who want to get into it, here is my workflow:
I am using the Automatic1111 web interface, working in the img2img tab with the Batch sub-tab. In this case I am not using Deforum. When I started developing this video, ControlNet was very new and there was some instability between ControlNet and Deforum. It's been a month since then, and I see other people making videos with Deforum, so the issues I experienced have most likely been fixed by now.

Since I am using img2img batch processing, it's a 1-for-1 image replacement. That's why this video is very flashy/flickery. I tried to focus on the color generation and on giving the Touhou characters backgrounds that make the most sense for where you might find them.

The first step was to split the original Bad Apple!! video into individual PNG files. I got the original mp4 file here: https://archive.org/details/TouhouBad...

I then used Blender to extract the images. I imported the video into the Video Sequencer and rendered the animation with a File Format of PNG. This writes the frames into a folder as files like 0001.png, 0002.png, ..., 000N.png.
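If you would rather script the extraction than go through the Blender GUI, something like the following OpenCV sketch should produce the same numbered PNGs. The file and folder names here are placeholders, not the ones I actually used.

# Hypothetical alternative to the Blender extraction step: dump every frame
# of the source video to zero-padded PNGs with OpenCV.
import cv2
import os

video_path = "bad_apple.mp4"   # assumed local copy of the source video
out_dir = "frames_all"         # the complete extraction folder
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
frame_num = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break  # no more frames
    frame_num += 1
    # zero-padded names so the files sort in playback order (0001.png, 0002.png, ...)
    cv2.imwrite(os.path.join(out_dir, f"{frame_num:04d}.png"), frame)
cap.release()
print(f"Wrote {frame_num} frames to {out_dir}")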

Now we create a scene, which is a string of related images. Picking the correct images for a scene is an art in itself. For each scene, we create an input and output directory, so I did something like:
scene_1_input_reimu_dancing/
scene_1_output_reimu_dancing/

I then copied just the frames I wanted for the scene from the complete extraction folder into that scene's input directory.
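I did this copying by hand, but a rough sketch of the same step in Python would look like this. The frame range and folder names are made up for illustration; the actual ranges were picked by eye.

# Populate one scene's input folder from the complete extraction folder.
import shutil
import os

src_dir = "frames_all"
scene_in = "scene_1_input_reimu_dancing"
os.makedirs(scene_in, exist_ok=True)

start, end = 120, 240  # hypothetical frame range for this scene
for i in range(start, end + 1):
    name = f"{i:04d}.png"
    shutil.copy2(os.path.join(src_dir, name), os.path.join(scene_in, name))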

Now we can start generating the actual color images. You will need to get the ControlNet extension for Automatic1111 to make this work. The mistake I made at first was that I thought I could just run the original Bad Apple!! video through Stable Diffusion and it would just automatically apply it's pictures to it. But Stable Diffusion doesn't work this way. Instead we use ControlNet, and set the Preprocessor to "canny" and the Model to "control_v11p_sd15_canny" (this version will certainly change over time, just make sure it has "canny" in it). This will then generate black and white images in a format that Stable Diffusion can understand. Once you click generate, Stable Diffusion will create both the Canny edge detection images and your final image together.
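This is not part of the Automatic1111 workflow itself (the "canny" preprocessor runs internally), but if you want to see roughly what those black and white intermediate images are, they are Canny edge maps of each frame. A minimal OpenCV sketch, with arbitrary threshold values:

# Preview what ControlNet's canny preprocessor produces for one frame.
import cv2

frame = cv2.imread("scene_1_input_reimu_dancing/0120.png", cv2.IMREAD_GRAYSCALE)
# Threshold values here are guesses; the extension uses its own settings.
edges = cv2.Canny(frame, 100, 200)   # white edges on a black background
cv2.imwrite("0120_canny_preview.png", edges)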

The last remaining infrastructure piece is the model I used for generating the images. I used OrangeMixs AOM3A1B:
https://huggingface.co/WarriorMama777...

This model has a lot of the Touhou characters, which is why I used it.

Now with the workflow mechanics worked out, we can set the values as desired. The positive prompts are what you want to see, and the negative prompts are what you don't want to see. There are a lot of other resources on prompt generation, so I won't go into that here.

The other values I then tweaked were:
Sampling steps: I start at 20 while figuring out what I want a scene to look like. I keep generating, deleting images, and re-tweaking other values until I get something close to what I want. Then I crank the Sampling steps to 150 (the current maximum) for the best image quality. This will take a while to render.

I then would adjust Control Weight and CFG Scale. It takes a while to figure out which values work best for a particular scene. Settings that made sense in one scene may not work for another similar scene, so it's just a matter of playing with them to get what works best.

Sometimes I played with the seed if I didn't like what it was generating. By default it is -1 (random). Once I found a seed I liked, I would keep using it until I didn't like it anymore, then go back to -1 until I found something else that was good.

I usually left the Denoising strength at 1. I think a couple times it made sense to turn it down but in general I found I liked having it at 1.
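I adjusted all of these values directly in the web UI, but as a summary, here is a sketch of the knobs I was tweaking per scene, written as a Python dict. The specific prompt text and numbers are only examples, not the exact values I used.

# Illustrative per-scene settings; every value here is an assumption.
scene_settings = {
    "scene_1_reimu_dancing": {
        "prompt": "reimu hakurei, shrine, autumn leaves, ...",   # positive prompt
        "negative_prompt": "lowres, bad anatomy, ...",
        "sampling_steps": 150,      # 20 while iterating, 150 for the final render
        "cfg_scale": 7,             # tuned per scene
        "controlnet_weight": 1.0,   # tuned per scene
        "seed": -1,                 # -1 = random; pin a value once a good one turns up
        "denoising_strength": 1.0,  # almost always left at 1
    },
}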

I used two systems for rendering the images, one with an RTX 2060 Super and one with a GTX 1080 Ti. The 2060 Super was significantly faster as it has Tensor cores, and the 1080 Ti does not. But the 1080 Ti still worked, so I used it for small scenes to help decrease the overall development time.

I then created another Blender instance, went into the Video Sequencer, and then added the Image/Sequence of each scene to produce the final video. Then I added the watermark with Cyberlink PowerDirector 365.
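If you prefer not to assemble the frames in Blender, an ffmpeg call does the same job. A sketch in Python below; the frame rate and paths are assumptions, not necessarily what I used.

# Assemble one scene's numbered output frames into an mp4 with ffmpeg.
import subprocess

subprocess.run([
    "ffmpeg",
    "-framerate", "30",                              # assumed output frame rate
    "-i", "scene_1_output_reimu_dancing/%04d.png",   # numbered output frames
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p",                           # broad player compatibility
    "scene_1.mp4",
], check=True)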

Bad Apple!! feat.nomico originally from Masayoshi Minoshima / ALSTROEMERIA RECORDS:
   • [Alstroemeria Records]Bad Apple!! fea...  

Original song and characters created by ZUN.
