Revolutionary AI Short Video Workflow with N8N

Hello everyone! Today, we’re bringing you a revolutionary N8N automation workflow for creating viral short videos on platforms like Douyin and TikTok. No shooting, no editing – just one click to submit your topic, and this workflow can generate viral short videos in thousands of styles.
This workflow supports 24/7 unsupervised operation, covering the entire process from intelligent creative scriptwriting and batch AI image generation to dynamic video creation and automatic background music and beautification. It automates everything from zero to the final product. The finished videos are automatically saved to your Notion knowledge base, allowing you to truly batch-produce viral short videos effortlessly.
Workflow Test Results: Examples in Action
Let’s take a look at the test results achieved by this N8N video automation workflow. Click on “N8N Video Automation Workflow Test” and scroll down. We’ll examine the effects achieved by this workflow across three scenarios today.
Scenario 1: Realistic Scenes (Inspired by BBC Documentaries)
For the first scenario, we chose a realistic setting, mimicking the style of a BBC “Planet Earth” or “Animal World” documentary to create an AI-generated short.
Looking at this 25-second video, you’ll find it almost impossible to discern with the naked eye that it’s AI-generated. The voiceover is also perfectly matched, achieving that slow, immersive “Animal World” presentation with a BBC-like vocal quality. This is a horizontal video.
Let’s look at the vertical video. We might notice a small flaw here, but overall, the quality is remarkably high.
Notice how the sound gradually fades out at the end. All my tests involve shorter videos, around 20 seconds, as this is the typical length for current short videos. This is a realistic example; let’s look at some of its parameters. We generated a scene introducing African savanna lions, mimicking a BBC “African Animal World” documentary. The visuals are in a documentary style, showcasing the raw beauty of the “king of the beasts,” the lion. This was the prompt for this set.
The main model used today is Flux. For the Lora, we selected a highly effective documentary-style F.1 model created by an acclaimed author. The quality is exceptionally high. When used for image generation, as you can see, the resulting images are virtually indistinguishable from real photographs. Building on high-quality Lora models like this, we can generate tens of thousands of different short-video styles. This is just the first example; you can configure different content types to match your actual needs. This is the first type.
Background Music:
https://pub-b65afb21c951453a872a026d19411abe.r2.dev/dongwu.MP3
Trigger Words
Video for Analysis:
https://pub-b65afb21c951453a872a026d19411abe.r2.dev/dongwushijie.mp4
Example Prompt for the Model:
Film Photography, A tree on the wasteland, half cut down and only a section of it, stood a crow, and there were many crows hovering in the sky in the distance.
Scenario 2: Animated/Lego Style (A Mini-Story)
The second scenario explores an animated or Lego-style setting. This actually comprises three short videos that form a small story. Let’s take a look.
Video 1: The Beginning of a Love Story
The first video depicts a beautiful love story of an urban couple, from their first meeting to getting to know each other, falling in love, and eventually accompanying each other. This was my input prompt for it.
You can see it’s a 9:16 vertical video, and the overall quality is very high. The character rendering and movements are excellent, especially the ambiance. As you watch, even with sound on, you’ll notice the rainy atmosphere. Look at the dynamic emotions exchanged between our two protagonists: the man talks animatedly while the woman quietly eats. The mood displayed here is something I truly appreciate; I feel these short videos are very immersive and have a healing quality. Notice the woman has her eyes closed while the man makes a gentle caressing gesture, all while it’s raining outside. The little cat’s movements are quite leisurely, its tail swaying. So, I believe this short video is of very high quality. This is the first segment.
Video 2: Family Life and Growth
Having met and fallen in love, the couple’s story now moves into the warm scenes of starting a family: getting married, getting pregnant, having a child, and watching the child grow up. I chose videos around 20 seconds for this. The audio also has gradual fade-in and fade-out effects.
If we look closely at this short video, I think the generated atmosphere is also quite good. We can watch it frame by frame; it’s still raining outside, the cat looks sleepy, and the man and woman embracing convey a sense of warmth. Then, pregnancy! Especially, look at the child’s movements here; I feel they are incredibly realistic. So, I believe the quality of AI-generated video is truly reaching the same level as text and image generation. Watch the child’s foot movements, the cat’s wobbly gait, and its tail wagging. The campfire is burning, and we’ve generated this in a Lego-style animation, so I think the quality is very high. The little stars decorating the scene are also quite healing.
Video 3: Time’s Passage and Bittersweet Moments
Next, I designed a third story: it follows the man and woman from falling in love and getting married through the passage of time, focusing on a poignant scene. This poignant scene sets the overall tone of the video: it starts from the beauty of love, transitions to the warmth of family, and finally reaches this bittersweet moment. Of course, I also selected suitable background music. For the second video, if you’ve seen the Korean drama “Song Chunying Must Win,” the track is one of my favorite songs from it; the first one is an English song, also good. Let’s look at the third, poignant scene.
We see the man and woman in their youth, experiencing love and companionship. Then, the female protagonist falls ill; the man cares for her as it rains. This depicts a state of illness or slow recovery, and finally, with the passage of time, we notice the cat’s rather sad expression. This is the entire scene we generated.
So, if I were to launch a viral short video initiative today, we would need to select different story themes and different protagonists. The entire workflow is comprehensive. Thus, from love to warmth, to marriage, and the passage of time – this forms a complete story. Of course, all the videos I generated are only around ten seconds long, but they convey the atmosphere exceptionally well. This set is actually my favorite among all the short videos; it’s animated, focusing on character-driven short video generation. Let’s continue. I’ve also listed the general parameters used for this short video.
Background Music:
https://pub-b65afb21c951453a872a026d19411abe.r2.dev/aiqing.mp3
Trigger Word:
3D
Video for Analysis:
https://pub-b65afb21c951453a872a026d19411abe.r2.dev/aiqingshipin.mp4
Example Prompt:
3D, A pixel-style little man in a Japanese boy style, with no decorations on his hair, wearing a black top and a black watch, sitting on the bed and looking at his phone with one hand while hugging a long-haired girl with his other hand. The long-haired girl, with her hair covering the blanket, is sleeping in the embrace of the boy looking at the phone. Next to them, there is a British Shorthair gray and white cat. The two are lying on the same bed, with rain outside the window and a small night light beside it, and the room lighting is dim.
Scenario 3: Miniature Work Scenes
The third scenario features miniature or micro-scale work scenes. This one depicts farming: “Design a miniature series of scenes showing crops from planting to harvesting and construction.” Let’s play it.
Our machinery is operating: some pieces move backward while others move forward. It has that miniature feel, with a nice contrast in perspective, and you can see the quality of the generated models is quite high. The final result is good. It’s a very pleasant, healing style, showing a detective watching, or perhaps eating. Crucially, this video is generated fully automatically without human intervention; you can just let it generate as you wish, and its composition is relatively simple. Okay, that’s our fifth video. We have another set to look at: watermelons, also an agricultural crop. Starting from spraying pesticides, machinery lifts and loads the watermelons onto our truck. Now one is cut open, and everyone cheers happily, ready to eat the watermelon. This is also a miniature scene, in a different style.
We’ve successfully handled various themes: from animals to realistic scenes, to animation, to stories, to cozy settings. Let’s scroll down further.
The Philosophy Behind Viral Short Videos
I approach everything with a philosophical framework: “What is it? Why is it? How to do it?” So, the first step is to ask: “What are the categories of short videos on Douyin, TikTok, and Shorts?” This was the first question that came to mind before creating these videos. Therefore, I conducted extensive research by watching many videos and summarized my findings. Of course, this summary is not official, nor is it supported by academic research or papers; it’s based on my own analysis and intuition. I believe short videos primarily fall into three categories:
- Sensory
- Narrative/Plot
- Conceptual/Ideological
Naturally, each category represents a progressively higher level of complexity.
1. Sensory Videos
Sensory videos primarily aim to deliver a visual impact. Through high-quality visual presentation and perfectly matched sound, they instantly capture the audience’s attention and stimulate a sensory and visual delight. You gain an audio-visual experience from them. They mainly focus on handling the details of visuals and sound.
2. Narrative/Plot Videos
The second level is narrative. Building on the basic visuals and sound, we must inject storytelling and emotional elements. Through characters, narration, and dialogue, the short video becomes more immersive and shareable. If the first type is merely listening to music and watching an MV, the second type tells a story. It has distinct plotlines, character actions, subjective emotions, and thoughts. The core elements here are dialogue and narration.
3. Conceptual/Ideological Videos
Moving up a level, the difference in form from the narrative level is not large. If you can integrate values and your reflections into the dialogue and narration, building upon the visual and narrative foundations, then the video reaches the conceptual level, where your values are conveyed. If your short video reaches this conceptual level, it can move hearts, evoke resonance, and, more importantly, trigger a behavioral transformation in your viewers. This is the conceptual level. The main elements here are not just stories, dialogues, or narrations. In the process of generating these series of short videos, you gradually build an IP, establish values, and drive behavioral output. This is the conceptual level. In fact, there is no fundamental difference between narrative and conceptual; the key is whether you imbue the video with meaning.
Today, we can use the “Charlie the Dog King” short video series to compare these three levels.
Comparing “Charlie the Dog King” Videos
Sensory Level: Let’s first look at the sensory level. This video showcases the prelude to “Charlie the Dog King,” laying the groundwork for his story and his dramatic appearance. What is this? From the visuals, we see dogs fighting, then Charlie arrives and exerts dominance. This is a simple combination of visuals and sound in a short video. Of course, this particular video went viral because it presented powerful visuals. Doesn’t it primarily convey a visual and auditory experience? The intense argument in the visuals immediately grabs our attention, and there’s rhythmic, melodic music in the background. So, this first video is purely sensory; it achieves an immersive audio-visual experience through the combination of visuals and sound, making it attractive and dynamic, drawing your attention. This is the first level.
Narrative Level: Let’s look at the narrative level. This video, building on the original foundation, incorporates dialogue, narration, and even character dialogue (which we’ll discuss later). By adding dialogue and storytelling, it transforms a common scene of dogs fighting into a narrative. There’s a plot here, with characters like “Long Hair,” “Black Old Four,” and even “Charlie the Dog King” as he’s known internationally. Characters have emerged, and there’s a story with a sequence of events. This is our narrative level, the second level of short video.
Conceptual Level: Now, let’s see how it injects our values. I won’t show you the full long story, but it tells of a dog king’s passing, with surrounding dogs showing respect and remembrance, guarding him as he is sent off. This embodies the human sentiment of respect and love for those who have passed. It conveys the emptiness felt when losing a beloved member, which already implies a value system, containing genuine human emotions. This “Charlie the Dog King” video was also generated through secondary creation abroad, but I believe it primarily infused a conceptual meaning compared to previous videos. It’s not just about knowing the storyline. I initially paid attention to a simple short video about dogs fighting, then someone introduced me to it by adding plot, telling me who A, B, Charlie, Long Hair, and Black Old Four were. It introduced different roles, different dog characters, guards, etc., showing me that there’s a deep story behind these dog videos. Finally, we discover it has inherent emotion. In fact, the core viral element of “Charlie the Dog King” comes from its output of values. This dog, from being small and bullied at three months old, gradually, through its own struggle and unyielding spirit, persisted, never giving up, confronting others to establish its status, or endlessly pursuing its goals. That’s why we like “Charlie the Dog King.” It embodies emotion and values; we humans are upward-striving beings who must persevere. This value brings us to the conceptual level. These are the three types of videos I’m discussing.
Comparing Movies to Video Levels
Let’s also look at it from a cinematic perspective:
Sensory Level: This is characterized by soothing music and healing visuals, which constitute the sensory level.
Narrative Level: If we look at “A Better Tomorrow” (Xiao Ma Ge), from the perspective of dialogue, it tells a story through the protagonist’s conversations, embodying the narrative aspect. This draws us deeper into understanding Xiao Ma Ge and his past experiences, with the actor’s expressions truly conveying the character. This is our second, narrative level.
Conceptual Level: What is art? What are values? My favorite film, “To Live” (活着), captures this: “No matter what, you have to endure it, you have to suffer it. We must live and come back, and live well.” Different protagonists convey a single truth: “You owe me a life; you must live well.” Do you know what value is embedded here? It’s the resilient character of a nation. Over five thousand years, enduring various hardships and experiences, such a nation stands tall. It truly showcases the character of a nation. Is that not a value output? This film differs from all previous films not through special effects or plot, but by telling you the entire story of a nation, its struggles, how a stubborn spirit and national character are rooted in its soul and bloodline. This is a value output.
So, after discussing all this, let’s return to the presentation. First, understand that when producing short videos, there are three levels. Which level do you want your video to reach? Let’s use the “Dog King” example, as its popularity is quite recent. This particular blogger initially focused on creating visually appealing videos of dogs. They filmed over 800 videos. I really like this blogger because, emotionally, they probably never expected to go viral. They simply loved dogs and filmed their daily lives. They primarily started with sensory elements, focusing on visuals and sounds—the intense scenes of dogs fighting, and also the more relaxed moments, like dogs comfortably lying in the yard on a hot summer day. It was mostly about conveying visuals and sounds. But when you film over 800 videos, stories naturally start to emerge. What stories? Each dog becomes a character—Long Hair, Black Old Four, and other distinct personalities. Storytelling is infused into them. These stories, whether meaning is attributed through subsequent secondary creations or inherent in the original content, become richer. Of course, with more secondary creations, people extract values from these simple visuals: an unyielding spirit, striving from a young age, and a down-to-earth approach. So, the “Dog King” story demonstrates how a short video channel can go viral: from initially pursuing a sensory level, as you film more videos, you develop a narrative, and eventually reach the conceptual level.
The N8N workflow created here, in my opinion, has achieved some narrative elements. Why do I particularly like it? The couple’s story we saw earlier already possessed some narrative elements, from meeting to falling in love and getting to know each other. It reached the second level. However, the short video workflow I’m discussing today primarily focuses on the sensory level, emphasizing the contrast between visuals and sound. Previously, I created a fully automated workflow where every scene, voiceover, plot, and storyboard was automated. But this high degree of freedom in each module made it difficult to convey emotions and atmosphere, lacking the warm feeling I mentioned earlier. So, recently, after a long period, I completely scrapped that workflow and rebuilt it. Since I couldn’t achieve all conditions, I decided to focus on the sensory level to create this workflow.
Now, with this workflow, you can first pursue sensory impact through visuals and sound, and then gradually incorporate narrative elements to elevate your short videos to the next level. That covers “what a short video is”; today, the focus is on the sensory level. The next questions are: what strategy do I need to produce viral short videos at the sensory level, and how do I implement it?
Strategy for Viral Sensory Short Videos
Since the sensory level involves matching visuals and sound, I’ve delved deep into each area to explain how to produce viral short video visuals and sound.
Visuals: Key Requirements and Tool Selection
For visuals, my primary requirements are:
The workflow needs to integrate various visual styles; it cannot be limited to just one. Because a workflow needs to be universal and adaptable. You can’t develop a workflow today that only generates one type of video with a consistent visual style; that would be inefficient after investing so much time in building it. So, my first requirement today is that this short video workflow must support a diversity of styles. Furthermore, once you fix a style, the entire style must be consistently reproducible. Because if you look at viral short video accounts today, they generally publish videos of a single type. For example, the “Dog King” series is all one type of video. So, within a single style, you must be able to consistently produce high-quality output. If you can’t, your quality will be insufficient. The visual requirement is to consistently produce high-quality output when focusing on a specific style.
What tool did I choose? As mentioned in my previous video about comic videos, it’s still LiblibAI’s API. This is a very important image generation platform in our domestic market. It has over 100,000 built-in image generation models and Lora models, supports API calls, and offers reasonable pricing with a flexible pay-as-you-go model. You can add various plugins and even different Lora models with varying weights to achieve your desired results. As long as you can generate an image in LiblibAI, I can convert that image into a video for you. This is why we chose it: it supports multiple styles while also ensuring stable output for a single style.
Sound: Key Requirements and Selection
How do we select sounds? If we look at some viral channels, their audio is quite consistent. Typically, 10 to 20 or even dozens of videos will use the same track. This is because our auditory perception has inertia: if you like a piece of music, you’ll grow to love it even more. At first, you might feel the melody is out of tune or not to your taste, but the more you listen, the more that auditory inertia kicks in, and you’ll begin to enjoy it. This is why, in the comic video I mentioned earlier, the music, whether from “Song Chunying” or “Animal Crossing,” is among my favorites, because of this inertia. So, the audio needs to be fixed, but its *duration* must be adjusted to match each short video.
Therefore, my suggested process is to first filter top accounts on platforms like Douyin and TikTok. From their most frequently used music, filter out the ones you like. Based on their duration, emotional tone, and rhythmic points (you can look at their waveform graphs to identify beat points, etc.), fix these sounds. Of course, to truly go viral, you need to generate your own unique sounds. Currently, many sounds are created by specialized AI voice actors or sound engineers. Overall, we’ve identified two paths for implementation. Later, in the setup process, I’ll go into detail about the many configuration tips for LiblibAI. Here, I’m primarily discussing the underlying principles and theoretical logic.
How to Implement: The Director’s Five-Step Method
So, what to do, how to do it? That involves our workflow. If you’re producing a short video, it’s like being a director. So, we need to see how a director works. I’ve summarized this into a five-step method:
Script Writing: First, your short video must have a script.
Scene Breakdown and Arrangement: After writing the script, you need to break it down into scenes. “Action One,” “Action Two,” and so on. After scene breakdown, you arrange them by chapters or sections for shooting.
Image Creation: You’re not holding a camera and directing actors. You first create images. This is our image generation phase.
Video Generation from Images: Then, you generate video based on these images. Currently, you *can* generate video directly, but as I said, to ensure consistent style within a video, we first choose to generate images. Images are easy to control in terms of style: you have various Lora models, and the Flux model’s capabilities are strong, ensuring stable output. So, we first generate images to fix the style, then pass these images to our video generation model to create the video.
Post-Production/Synthesis: Finally, it’s our post-production, including voiceover synthesis, video splicing, adding fade-in/fade-out effects to the sound, and even adding subtitles and watermarks. All of this is part of our post-production. So, I will build this workflow today following a director’s workflow.
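Before wiring up any nodes, it helps to see these five steps as one linear pipeline. The sketch below is purely conceptual; every function name is a hypothetical placeholder standing in for the group of N8N nodes described later, not part of the actual workflow export.

```python
# Conceptual outline of the director's five-step method. Every function here is a
# hypothetical placeholder for a group of N8N nodes described later in this tutorial.

def write_script(topic: str) -> str:                        # 1. script writing (LLM)
    return f"A {topic} story told in a calm, healing tone."

def break_into_scenes(script: str, n: int) -> list[str]:    # 2. scene breakdown and ordering
    return [f"Scene {i + 1}: {script}" for i in range(n)]

def generate_image(scene: str) -> str:                      # 3. image creation (LiblibAI + Lora)
    return f"https://example.com/images/{abs(hash(scene)) % 10000}.png"

def animate_image(image_url: str, scene: str) -> str:       # 4. image-to-video (via Fal.AI)
    return image_url.replace(".png", ".mp4")

def post_produce(clips: list[str], music_url: str) -> str:  # 5. splicing, music, fades, watermark (NCA Toolkit)
    return "https://example.com/final_video.mp4"

def produce_short_video(topic: str, scene_count: int, music_url: str) -> str:
    script = write_script(topic)
    scenes = break_into_scenes(script, scene_count)
    images = [generate_image(s) for s in scenes]
    clips = [animate_image(img, s) for img, s in zip(images, scenes)]
    return post_produce(clips, music_url)

print(produce_short_video("African savanna lions", 4, "https://example.com/bgm.mp3"))
```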
Who is the Director? Multimodal Large Language Models
So, who will be the director? I’ve concluded that it should be a multimodal large model. Speaking of multimodal capabilities, let’s go to my website to demonstrate. I’ve been discussing the role of multimodal models for a long time. For example, in Make.com, I was the first to show how multimodal models can generate voiceover scripts for 25-second videos. It was the first multimodal Make workflow on the entire network, effortlessly generating voiceover scripts from factual text. Back then, it utilized multimodal models to directly extract elements from viral short videos and generate voiceover scripts for us. And I’ve kept organizing these ideas in articles such as “What are the new tracks for multimodal models?”, which outline many directions. One of those multimodal video examples used the Flux children’s picture book model, which leads directly into today’s LiblibAI workflow. So, my videos actually follow a logical progression. If you’re interested, you can check out “What are the new tracks for multimodal models?”; it’s presented quite well there.
Returning to the presentation, why use a multimodal model as the director? In the early stages of creating my first video, I tried to make a text-based large model the director. I’d ask the text large model to extract the voiceover script from Douyin, then generate a script from that, then break it into scenes. I found the results weren’t good. Why? Because the large model couldn’t *see* the visuals. A short video, especially one targeting the sensory level like this workflow, is fundamentally a combination of visuals and sound. Our current workflow doesn’t even have a voiceover script, so how could a text-based large model be the director? Therefore, what kind of large model should be the director? You should let a large model that can *recognize video* be the director. This is where multimodal models come in. Furthermore, Google’s Gemini large model, particularly Flash, is free and supports uploading 2GB or even larger files, allowing it to directly analyze short videos.
So, today, I’m using Google’s multimodal large model. Based on the script’s theme, the large model’s multimodal capabilities will deeply analyze the viral propagation mechanisms, visual hotspots, and atmosphere creation techniques within viral videos. Based on your reference videos, it generates visual scene prompts. This is like having the director, the large model, first watch the best movies of the same genre from IMDb’s Top 250. After it watches them, the large model learns. Then you tell it, “Okay, now, generate a script for me in the style of ‘To Live’.” I’ve tested this, and the results are quite impressive. If you don’t have this workflow, you can also upload a video to Google AI Studio and ask it to analyze the viral points within that video; it will analyze them incredibly thoroughly. This is our first point.
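Before moving to the next point, here is a minimal sketch of that “director watches the reference video” step, using the google-generativeai Python SDK. The model name, file name, and prompt wording are placeholders; inside N8N you can reproduce the same call with an HTTP Request or Gemini node.

```python
# pip install google-generativeai
import time

import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")  # from Google AI Studio, "Get API Key"

# Upload a local copy of a reference short video; the Files API accepts large video files.
video = genai.upload_file(path="reference_viral_video.mp4")
while video.state.name == "PROCESSING":   # wait until the upload has been processed
    time.sleep(5)
    video = genai.get_file(video.name)

# Model name is a placeholder: use whichever Flash/Pro version your account exposes.
model = genai.GenerativeModel("gemini-2.5-flash")
response = model.generate_content([
    video,
    "Analyze this viral short video: identify its visual hotspots, pacing, and "
    "atmosphere-building techniques, then propose four image-generation scene "
    "prompts in the same style.",
])
print(response.text)
```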
Then, it generates scene prompts. These prompts don’t just contain image elements; they also encapsulate the atmosphere, which it summarizes quite well within the prompts. After generating the images, creating the animation effect from images to video also requires prompts. This also requires multimodal capabilities: letting the large model directly view the entire image, analyze its dynamic elements and cinematic language, and then generate video prompts for the video generation model. In fact, all steps in this workflow today are performed by AI. Essentially, you’re entrusting all the work of a professional director to a multimodal large model. Because multimodal large models can now not only read scripts but also watch videos, listen to sounds, and see visuals. This is the core technique of our entire workflow today, and it’s a first in the entire field, whether in Chinese or English. Among all N8N workflows I’ve seen, I haven’t yet seen one that embeds the Gemini multimodal large model workflow into the short video production domain. This is actually quite difficult, and my nodes are quite numerous, so I’ll explain it in detail later. This covers “what we have,” “what we do,” and “how we do it,” which are the most important aspects.
N8N Workflow Steps: A Summary
Now, let’s implement this on our N8N workflow through a series of steps. Each step has detailed node configurations. Here’s a little jingle to remember it:
- AI automatically extracts, ideas for topics flow endlessly.
- Multimodal deeply analyzes, scene scripts auto-compose.
- Lora constrains to produce masterpieces, thousands of styles at your choice.
- Intelligent recognition of motion points, images instantly come alive.
- One-click addition of sound and watermark, viral short videos appear instantly.
That’s the entire logic and concept behind this workflow.
Key Features of this Workflow
Now, let’s summarize the features of this short video workflow:
Batch Production: It supports 24-hour unsupervised batch production of short videos.
Versatile Style: It utilizes LiblibAI’s commercially usable models. You can use them to generate videos, with one-click switching between tens of thousands of styles, adapting to all scenarios. As we just tested, whether it’s real people, anime characters, or animated themes, it’s fully universal. This is because LiblibAI hosts models trained by various image-model experts, resulting in quite stable generation.
Multimodal Dynamic Analysis: This is a first on the entire network, combining viral video analysis features. It leverages Gemini’s powerful multimodal capabilities to deeply extract viral video characteristics, and prompts are also generated based on multimodal input, creating dynamic prompts for our videos.
Cutting-Edge Video Generation: Video generation runs through Fal.AI, a platform that allows quick switching between models such as MiniMax and Hunyuan. Google’s second- and third-generation video models haven’t been integrated yet, but I expect them soon, and they’ll be available for use (a minimal call sketch follows this list).
NCA Toolkit Integration: We’re also introducing a new open-source tool, the NCA Toolkit. It truly enables cloud-based video processing, free watermark addition, music synthesis, and effects processing, all in the cloud. This is a key part of today’s video process. It’s also multi-language.
Flexible and Open: Based on the N8N open-source framework, it offers flexible parameter configuration. This is the most complex workflow I have ever created.
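To make the Fal.AI hand-off concrete, here is a minimal sketch of submitting an image-to-video job through fal.ai’s Python client. The model ID, argument names, and result fields are assumptions based on fal.ai’s public model gallery, not part of my workflow export; check the model’s page for the exact schema.

```python
# pip install fal-client; authentication reads the FAL_KEY environment variable.
import fal_client

# Hypothetical example: the model ID, argument names, and result fields come from the
# model's page on fal.ai and may differ from what is shown here.
result = fal_client.subscribe(
    "fal-ai/kling-video/v1/standard/image-to-video",  # assumed model ID
    arguments={
        "image_url": "https://storage.googleapis.com/n8n-33-test/scene_01.png",
        "prompt": "Slow push-in, rain on the window, warm lamp light, the cat's tail swaying",
        "duration": "5",  # seconds, matching this workflow's per-clip length
    },
)
print(result["video"]["url"])  # URL of the generated clip (exact field name may vary)
```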
Let me also explain the entire evolution process of this workflow. Initially, I primarily relied on a large model’s generation strategy to achieve full automation of the entire process in one go. However, in practical application, I found that the overall matching accuracy was low. So, this workflow was incredibly challenging for me, truly very difficult. The initial setup took a very long time, and ultimately, the results weren’t good, so I completely scrapped it and started over. I was so frustrated I even deleted the workflow. Then, there was no choice. After in-depth research and repeated testing, I adopted a strategy of local constraints plus intelligent analysis. Through Lora models, I constrained the image styles, leading to the current controllable workflow with thousands of video styles that can be changed at will.
Post-Production Tools: The NCA Toolkit
Now, let’s look at the primary tool used for post-production in this short video workflow: the NCA Toolkit. This open-source tool was developed by a prominent blogger and developer in the Make and N8N community, who generously made it available to us. This toolkit is high-quality and free. This developer also provides instructions on how to deploy it on Google Cloud for free, and you can then call it via N8N and Make. More importantly, it’s free. If you’re familiar with Make and N8N platforms, calling mature APIs typically costs $40 or even $50 per month, which is very expensive. Furthermore, these are usually monthly subscriptions, not pay-as-you-go. This is why this video was delayed—this particular problem hadn’t been solved. Because if you use N8N and Make for video generation, video synthesis is always involved. With this open-source tool, we can now perform this operation very well.
Let me open its official website for you to see; this is the entire open-source tool. With such excellent open-source tools, I hope everyone can give it more stars and support to help it develop further. It supports many functions. I’ve also opened my own website for you. Writing these two tutorial posts took me about half a day, roughly three hours. Click on “Categories,” and you’ll find a complete tutorial for the NCA Toolkit. This includes a step-by-step Chinese tutorial for free installation on Google Cloud Platform, with dozens of carefully prepared screenshots; I’ll also walk you through it in this video. It also shows how to set up Google Cloud Storage, which I’ve often been asked about as a direct-link option besides R2; Google’s solution is also quite useful. This covers the entire process of deploying Google Cloud Storage and how to call it directly in N8N. The same logic applies to Make; you can use this tool with both. This is a complete tutorial I’ve prepared for everyone, available on my website, along with a practical guide introducing the functions of the NCA Toolkit. Let’s take a look:
- Add subtitles to video
- Video splicing
- Extract thumbnails
- Edit and split video
- Audio/video splicing
- Overlay and mix audio
- Static image to video conversion
- General format conversion
- Content extraction and transcription
- It also supports Python code execution and cloud deployment.
Some of these parameters might be incorrect because I also used AI to write them, but that’s okay. You can understand the function of each node through the Chinese explanation. For specific parameters, the official website provides detailed information, so you can go there to see the introductions for each parameter. First, understand them in Chinese, then when you actually use them, you can send the entire page to GPT and ask it to help you construct the request body. This is perfectly fine; you can understand what each module does by simply reading the Chinese. So, the supported functions I just mentioned are numerous. This truly “adds wings” to N8N and Make for video automation; simply put, it enables “one-click takeoff,” right? This is the entire tool, and I’ll explain how to set it up later. Let’s continue.
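To make this concrete, here is the general shape of a request to a deployed NCA Toolkit instance, written in Python for readability (in N8N it would be an HTTP Request node). The endpoint path, header name, and body fields below are illustrative assumptions on my part; as suggested above, take the exact schema from the toolkit’s official documentation.

```python
# A hedged sketch of the general calling pattern for a deployed NCA Toolkit instance.
# The endpoint path, header name, and body fields below are illustrative assumptions;
# take the real schema from the toolkit's official documentation.
import requests

NCA_URL = "https://nca-toolkit-xxxxx.a.run.app"  # your Cloud Run service URL
API_KEY = "123"                                  # the API key variable set during deployment

payload = {
    "video_url": "https://storage.googleapis.com/n8n-33-test/scenes_merged.mp4",
    "audio_url": "https://pub-b65afb21c951453a872a026d19411abe.r2.dev/dongwu.MP3",
}
resp = requests.post(
    f"{NCA_URL}/v1/ffmpeg/compose",              # assumed endpoint for audio/video splicing
    headers={"x-api-key": API_KEY},              # assumed header name; check the docs
    json=payload,
    timeout=300,
)
resp.raise_for_status()
print(resp.json())  # typically returns a public URL of the processed file in your bucket
```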
Practical Implementation: Setting Up NCA Toolkit on Google Cloud
Now that I’ve provided so much preliminary information, let’s move on to the practical part and see how the entire workflow is implemented step-by-step. First, I’ll help you install the NCA Toolkit on your Google Cloud. We can start from the beginning. The first step is to go to the Google Cloud Platform website. I can also put this URL in the video description. You need an account; simply log in with your Google account. Click “Get started for free” in the top right. After clicking, a screen will pop up. You’ll need to select your country or region. I recommend choosing the United States, as my N8N is also deployed there. After selecting the US, you need to add a payment profile. Click “Add.” There will be fields for your address. I recommend entering your real address, but if you don’t have one, I’ve also provided a tool to generate one; you can paste it in there. After generating the profile, which includes your payment method, you’ll need to add a credit card for verification. GCP will perform a small pre-authorization to verify the card’s validity; this amount will be refunded. So, you need a dual-currency credit card that supports Google’s verification. To my knowledge, Chinese mainland credit cards can be used for this deduction; virtual cards are not supported. Once the card is added and a small amount is deducted, your account will be verified, and you can activate a project. For any future changes, you can check the video comments, as many users encounter issues here, and I’ve listed important notes there. Once you’ve entered the card, you should succeed, and your account will generally be activated. If not, you can also search on Google for “Google Cloud Platform activation methods”; many short videos explain this. I won’t go into too much detail here, but this is the general process. Afterward, you just need to fill out a short survey to complete the project setup.
GCP Project and API Setup
Now, with the preparation done, I can demonstrate. Let’s start with the preparation. The first step is to click the top-left corner and create a new project. I’ll guide you through it here. We’ll enter a name, for example, “n8n-test.” It will check; the actual project ID will be generated. You can enter a unique name based on your input. For organization, select “No organization,” then click “Create” to create a new project. Okay, wait for the project to finish. Once created, it will appear in the top-right. You can click “Select Project” or click here to open it. I have many projects, so I’ll select the “n8n-test” project. Make sure you switch to your corresponding project.
Click on the left, then select “API & Services” -> “Enabled APIs & Services.” This is where we need to enable APIs. We need to enable a few APIs. First, you can directly copy the name “Cloud Storage API” from the website. Check it carefully, then click “Open.” Mine is already enabled; if yours isn’t, be sure to click “Enable.” It will then show “API enabled,” which is fine. Okay, let’s continue with other APIs. Search for “Cloud Storage JSON API.” Pay close attention to the details. Click “Manage,” then “Enable.” This is also enabled, so no problem. Click “Back.” Also, “Cloud Run.” We’ll use the Cloud Run module to deploy this code, Docker. Click to enable it. Okay, once enabled, we need to create a service account. The navigation is usually on the left because there are too many functions, so it’s in a sub-menu. We select “IAM & Admin,” scroll down, and find “Service Accounts.”
Creating a Service Account and Key
We now need to create a new service account. Click “Create Service Account.” Enter a name here; I’ll copy-paste and, to keep it consistent, add “test” as a suffix. We can add a description; it’s optional but good for clarity: “Used for NCA Toolkit to access GCP resources.” It’s good to add these small details so you can identify the account later. Click “Create and continue.” Next, we need to assign it some roles. Search for “storage admin” and click to select it. Then continue and add “viewer”: search for “viewer” again and add it. Once added, click “Done.” The service account is now created. Now we need to perform an action on it. Select the one you just created; this is where the description we added earlier comes in handy. Click its name, then “Manage keys.” Find the corresponding “Manage keys,” then select “Add key” and “Create new key.” We’ll keep the JSON format. After clicking “Create,” it will download a file to your local machine. You must save this file carefully; we’ll use it later by copying its entire content as the JSON key. Because we enabled the Cloud Storage API earlier, this allows the deployed Docker service to use the JSON key to access our Google Cloud Storage, making it convenient to store processed videos directly in our bucket. That’s its purpose, and why we activated that API earlier. I’ll connect the dots for you later; let’s first go through the specific steps.
Creating a Cloud Storage Bucket
After creating and downloading the key locally, we now need to create our Cloud Storage bucket. Click the top-left corner, then “Cloud Storage,” then “Overview.” We’ll start at the bucket overview. Click “Create bucket.” This name must be unique globally, so make sure to choose a unique name, e.g., “n8n-test-video-33.” You must choose your own name, perhaps incorporating your username, to ensure global uniqueness. “n8n-33-test,” no problem, let’s continue. We’ll keep the US multi-region. Click “Continue.” Here, there’s an option we need to modify: we must disable “Enforce public access prevention on this bucket.” This means our Cloud Storage bucket will be publicly accessible. Please note: all files in this bucket will be publicly accessible on the internet. There’s no need to add further authentication here because the generated videos will definitely be publicly accessible. N8N and Make will need to directly access and use them later, and our AI-generated videos don’t involve any privacy concerns. So, you must turn this off; it’s a very important option to remember. Okay, click “Continue.” Click “Create.” The bucket is now being created. We’ll use this Cloud Storage bucket name later: “n8n-33-test.” It’s now created. You can open this website later and upload files here. After uploading, you’ll see its public access link, which you can copy and share anywhere. This includes Lora models, for example; if you’re using custom Lora, you need a direct link, and you can use this Google Cloud Storage bucket for that.
Of course, we need to modify the bucket’s permissions. Click “Permissions,” then “Grant Access.” Here, we need to add a role. Enter “allUsers” as the principal. What permission should it have? “Storage Object Viewer.” Copy and paste “Storage Object Viewer.” (In fact, this is why I always insist on pairing the video with an illustrated written tutorial: a video alone is less convenient when you need to copy values like these.) Once you grant this role, make sure it shows as successfully added, then click “Save Changes,” confirm “Set to public,” allow public access, and click “Confirm.” Okay, let’s recap: first, we created a new project; then we enabled the necessary APIs; then we created a service account; and then we created a storage bucket. With the bucket created, we can now use the Cloud Run module to deploy our open-source tool. Actually, you can deploy this anywhere as long as you have a server, so it’s quite flexible. Because it’s packaged as a Docker image, anything that supports Docker can run it; you just need to configure the variables correctly.
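If you’d rather verify the bucket from code than from the console, here is a small sketch using the official google-cloud-storage client together with the JSON key downloaded earlier. The key filename and bucket name follow this walkthrough’s examples; replace them with your own.

```python
# pip install google-cloud-storage
from google.cloud import storage

# Authenticate with the service-account JSON key downloaded earlier
# (the filename here is a placeholder for whatever your download was called).
client = storage.Client.from_service_account_json("nca-toolkit-key.json")
bucket = client.bucket("n8n-33-test")  # your globally unique bucket name

# Upload a test file; public_url works because allUsers has Storage Object Viewer.
blob = bucket.blob("test/dongwu.MP3")
blob.upload_from_filename("dongwu.MP3")
print(blob.public_url)  # e.g. https://storage.googleapis.com/n8n-33-test/test/dongwu.MP3
```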
Deploying NCA Toolkit with Cloud Run
Why did I choose Google? Because it has a free trial period. Please note: after the trial, be sure to close your account promptly, as it will send emails to prevent excessive charges. This is an important consideration. To be honest, you can use a new credit card to start another test and achieve a kind of continuous use. In fact, its resource consumption shouldn’t be much; if you run it long-term, it’s still viable, because what we’re using is an instance that can be stopped. Now, let’s start deploying this Docker on Cloud Run. Click the left, scroll to Cloud Run, and enter. We’ll deploy a container, or directly create a service. Click “Create Service.” Okay, so we’re deploying the image directly from Docker Hub. We can directly enter the Docker Hub address. I’ll copy the container location link, and it will configure it directly for you. I usually choose a region with lower carbon emissions, like Oregon. I’m not sure what it’s called. Scroll down. Confirm the address and service name. Scroll down further. After pasting, you’ll see it’s “latest,” meaning the newest version will be deployed. For authentication, we select “Allow unauthenticated invocations.” We allow unauthenticated access because N8N doesn’t need authentication; it just needs direct access. Also, be sure to remember this endpoint address; you’ll use it later. This is the fixed URL for all your API requests. Then, select “Allow all.” We choose “Based on instances,” not “Based on requests.” It has a lifecycle; it doesn’t spin up for every request. Scroll down further.
Next, expand the container settings and add the parameters. For the minimum number of instances, you can set it to 0 so the service scales down when idle and reduces cost (at the price of a cold start); let’s set it to 0 for now. The port is 8080. For CPU and memory, use 4 CPUs and 16GB; that handles a lot of video processing, and if budget allows you can choose larger resources for faster completion. Choose according to your needs; I generally stick with the simpler, cheaper option of 16GB and 4 CPUs (a GPU would be even better, of course). For concurrency, set it to 1, since my requests will be processed one at a time. Choose the “Second generation” execution environment, set max instances to 1, enable CPU acceleration, and set the request timeout to 300. To review the configuration: 16GB of memory, 4 CPUs, 300s request timeout, second-generation environment, 0 minimum instances, 1 maximum instance, concurrency of 1, CPU acceleration enabled, automatic scale-down to 0 instances when idle, all users allowed to access, US region, image tag “latest.” Okay, this is the content of our Docker deployment.
Configuring Variables and Keys in Cloud Run
Next, find “Variables and Secrets,” which is where we set our environment variables. Add the first one, which must be our API Key, the value you’ll use for verification. I’ll use “123” for simplicity; you should set your own value. A simple value is usually enough, because no one will know your access URL, and combined with the API key it’s quite secure. Remember what you set here, as it will be sent with every future request; this is your API key. Second, you need to copy the entire content of the JSON key file we downloaded earlier. Open it with Notepad or any text editor without altering its content, select all, and paste it in (you can also copy the variable names from my tutorial; double-check for correctness). This JSON key essentially gives the deployed NCA Toolkit access to your Google Cloud Storage, allowing it to upload files and return public URLs. Then, add the next variable, the bucket name: `Bucket Name`. Be careful about trailing spaces and other small details. You can open another Google Cloud Storage window to find the name. First, always make sure you’ve selected the correct project if you have multiple; otherwise, you’ll often encounter errors like “password incorrect” because permissions were set in Project A but you’re calling Project B’s API key. So, select “n8n-test” and then Google Cloud Storage within it. As you can see, “n8n-33-test” is the bucket name; go back here and paste it in. This is a very small detail that often causes errors. Once these variables are set, click “Create,” and it will be deployed soon. You can deploy other services the same way. Wait for it to finish running, then we can start making requests. We’ll need the generated service URL later; it’s what we’ll be requesting, so keep the page open. Now, let’s go directly to N8N and set up our workflow; we’ll revisit this page when needed.
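Once the service shows as deployed, a quick sanity check confirms that the URL and API key variable are wired up correctly. The test path below is an assumption on my part; any documented endpoint of the toolkit will do.

```python
import requests

NCA_URL = "https://nca-toolkit-xxxxx.a.run.app"  # the URL shown on the Cloud Run service page
API_KEY = "123"                                  # whatever you set for the API key variable

# Hypothetical health-check call; substitute any endpoint from the toolkit's docs.
resp = requests.get(f"{NCA_URL}/v1/toolkit/test", headers={"x-api-key": API_KEY}, timeout=60)
print(resp.status_code, resp.text)
```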
N8N Workflow Setup
Let’s open N8N. First, I’ll show you what’s in my storage. These are the workflows I mentioned deleting earlier; I deleted the main workflow. For every step of character splicing and synthesis I tried everything, but the results weren’t good, so I had no choice but to start over. Click “Create” in the top right. We’re familiar with how N8N creates workflows, and I’ll demonstrate from scratch. First, when building this workflow, I usually rename it, here to “33.” I typically start workflow creation from a form. Why? Because a form can be fixed, and once you submit it, you can consistently re-test it later. At the same time, you can quickly add the parameters you need in the form. This automation workflow doesn’t use any external knowledge bases. I’ve seen many foreign N8N and Make tutorials that first build a very complex Notion or Airtable no-code database; you need to understand those knowledge bases first, and while Notion is simple, other tables might have nesting, making it very complex. So, this workflow is completely detached from any knowledge base; all data transfer and generation happen within N8N itself, making it simpler to use and easier to follow the logic, without needing to prepare a separate database to serve it. Also, when importing my workflow, make sure your N8N is updated to the latest version, because I always use the latest version; that’s just how I am, I prefer new things over old. Using a form is simpler, and a form can replace the function of a database: you don’t need to jump to Notion, click into an item, and then run it.
Form Trigger Setup
Click “Add,” then “Form.” Okay, we’ll enter a name. My old tradition is to rename it first. Why? Because this name determines how we’ll write expressions for data mapping later, so the name must be correct. Let’s give it a title: “Workflow Name Generator.” This is its name. Now, let’s add its elements. N8N also has a Chinese version now, but I don’t recommend using it. Why? It doesn’t have many words; they’re quite simple, and you’ll remember them over time. This is just the field name. And if you upgrade to the Chinese version, you might encounter issues later. N8N is continuously rolling out new features, so it’s essential to follow the official pace. It will become simpler and simpler later. Now, it has community nodes that can be added directly to N8N, which are good features. If you update to the Chinese version, you might encounter data or conflict issues. I even hesitate to deploy locally because sometimes after a long period of disuse, folders might be deleted, causing problems. So, I generally prefer the freedom of the cloud; I can open my computer anywhere, log in, and start using it immediately. This is also more flexible and not dependent on the computer; if the computer has issues, you might face data problems. So, I generally use cloud-based solutions, but local is free, so choose according to your needs.
For “Video Theme,” sometimes when it’s a long text, you enter “Text Area.” For short text, it’s “Text.” It doesn’t matter. Next, the field name is very important. “Scene Count.” I’ll continue to add it. “Scene Count” is a choice because it’s a number. “Text” is fine. This is “Branch,” whether you want to use automation or submit through the form yourself. Use the form for self-testing, and automation for automated use. So, this should be a dropdown menu. Select “Automatic.” And when creating your N8N workflow, fix these names yourself. “Automation,” “Form.” In the future, if you call it “Form,” always call it “Form.” This makes it easier to copy many fields directly from other workflows without having to rewrite them. Once you use them more often, it becomes easier. I now distinguish clearly between “Automation” and “Form,” and I’ve gradually formed this habit.
Next, we add the “Form Element.” For “MP4,” you don’t need to change anything here because it relates to the binary name for file download. You must use English here; Chinese will result in garbled characters. This is a small detail. We select “File.” Generally, we don’t enable “Multiple file uploads” here; only one file can be uploaded. Okay, once these fields are set, click “Add additional fields” and turn off the N8N identifier. Change “Button Label” to “Create Video.” These are just my small habits. For “Path,” I’ll name it “Workflow Video.” Any place where “Workflow” is written is a place you can modify; you can change it to your own name. For example, if your name is John, you can change it to “John-519.” This helps me indicate where you can make changes. Why? Sometimes I might forget to hint where you can modify something. The first form is generated, so we can quickly run it. Enter “11,” select one. I won’t upload a file. “Create.” Okay, no problem for the first module. After the second module, we need two “Merge” routers. Press “M” for Merge. Click “Add.” Okay, move it here. Let’s duplicate it. “Inactive.” Set it to “Active” and copy it. These two are for building our automation module, which we’ll use later. Let’s first go through the form logic, and once it’s working, we’ll use automation. Here, we’ll open it and modify the name. This router is “Automation,” coming from the automation side. There are three branches here. Click “Rename.” Click “Close.” Okay. The parameters used are “Combine by Position,” with “Number” set to 3. And also enable “Include other inputs.” This is configured. Now, let’s configure the second router. Router “Form,” also three. “Combine by Position.” Set to 3. Rename it to “Form.” Click “Rename.” Since it was copied, this module should also be open. Okay. Let’s place these two modules first and connect them to the router above. We’ll continue later.
Setting Parameters (Edit Fields Node)
Next is a frequently used section for setting parameters. If you import my workflow, the main thing you’ll need to modify is this section; other parameters rarely need changing. It’s mainly this “Edit Fields” section, which I’ll name “Comprehensive Parameter Settings.” This section has many fields. If you need to modify any API keys, generally do it here, not later, because I’ve centralized all keys here. If you import my workflow, you just need to modify this section and the overall workflow will run. It has the following fields:
- Video Theme: This comes from our form, accessed via JSON. I generally map the “Video Theme” field directly from the JSON, as this is your field name.
- Scene Count: This is the second field.
- Prompt Example: This is also inherited from the previous steps.
- Music URL: We also need to add a music URL. Generally, each video series will use fixed music, so this track is applied automatically to a batch of videos. We can directly put an openly accessible URL here. This is why the bucket we just created is so convenient: you can use it anywhere, and most importantly it gives a direct link ending in .mp3 or .mp4, meaning the file can be downloaded and accessed directly.
- Main Model ID: I’ll copy and paste this. Because the workflow supports thousands of styles, you’ll definitely need to switch between different models, whether Flux, other specific checkpoints, or other services, and you switch models by their IDs. This is the main model ID, our Checkpoint; after it comes the Lora ID. These are all defined by LiblibAI; I’ll explain where to find them later, since I haven’t covered LiblibAI yet.
- Trigger Words: Since Loras often rely on trigger words, try to include them. My trigger words are set for the current video style, so switch them when changing styles to match the new style’s trigger words.
- Special Image Requirements: I added this field so you can layer extra instructions on top of the automation, guiding the AI with your own ideas. This one is for images.
- Special Video Requirements: I’ll copy this. Both are currently placeholders used to keep a consistent style.
- Single Video Duration: We’ll select 5 seconds.
- Branch: Which path you are taking.
Okay, I took a short break. I was talking for over an hour and was really tired. Let’s continue now. We’ve set up the branch. Below it is a timestamp, because LiblibAI needs a timestamp for API verification. I’ll set up today’s parameters first, then explain the timestamp. After the timestamp, there’s a random number, which we’ll also input. These all use N8N’s built-in functions to generate random numbers or strings, achieving our encryption goal. Randomness is for LiblibAI image generation.
LiblibAI has many endpoints, and each one feeds into the key (signature) calculation through its URL path. Text-to-Image is one such endpoint, and the later steps are configured around it. We also add the LiblibAI result endpoint: once you've generated an image from text, you still need to query its status. Then we add the fixed LiblibAI base URL: `open.liblib.cloud`. I'd actually suggest LiblibAI simplify this, since most large-model APIs now authenticate with a single key; of course, it has many parameters, which may be down to different design considerations.
Now, let's continue and copy in the Google Cloud Bucket Name. This name still comes from our "n8n-33-test" bucket; replace it with your own. All of these need to be changed by you if you've followed my steps.

For Gemini's API key, where do you apply? Click this link or go to aistudio.google.com — this is the Google AI Studio interface. Click "Get API Key" in the top right. The key you copy is free to use, with free usage quotas and usage tracking. If you can only use the free tier, use Flash — currently 2.5 — and copy the specific Flash model name here. If your free account can't use a model, you'll need to bind a card for payment. In fact, binding a Chinese card, which we already did during the Google Cloud Platform setup, should work; it's just a matter of adding your payment information, and domestic cards should be supported. After binding a domestic dual-currency credit card you can use Pro, and for the best results you'll definitely want Pro; if you don't have it, Flash works too. Copy the key, paste it here, and you can use it. You could also add dedicated modules, but since there's no mature multimodal node for this, we simply paste the key here.

We continue with the NCA API Key. We set it as "123" earlier; mine is "123." For the NCA URL, as I mentioned, you can copy the URL from Google Cloud Platform after we generated the service, and paste it in.

For Fal.AI, I'll paste the key first and then show you the site. This is the Fal.AI website; you can find it by searching the keyword. Click "Home," and you'll find "API Key" there — create your own and paste it in. We use Fal.AI as an intermediary platform for image-to-video generation with Keling.

Continue by adding the Logo URL. Since we'll be adding a watermark later, you need to prepare a transparent logo yourself; if you're a developer, it's quite simple. Let me show you — the one I set up is white, so you can't see it here; it's a transparent logo. Design your own, perhaps with "Workflow" on it (like the "Workflow" branding of our blog), at a size of 500. You can also download my logo from this URL and make yours the same dimensions I used, so it matches exactly; otherwise you'll need to adjust the watermark parameters.

Two more values need to be added here: the Liblib Access Key and the Liblib Secret Key. Since these can't simply be swapped out by everyone, I won't display the two values. Instead, let's use this spot to briefly explain Liblib's verification scheme. Go to the Liblib official website; there's an API platform entry in the bottom left. Click in, and it will tell you which models it supports, such as Xingliu 3 — you can use that too, but you'll need to modify the request body. It supports custom models and high-precision control, works with Flux and Stable Diffusion 3/1.5, and fully supports commercially licensed private models, including the famous AW Portrait. It also lists unified pricing — generally a few cents per image. You can recharge 10 yuan, try it out yourself, and get invoices as well. That's how you recharge. It may also give free credits upon registration, but I'm not sure, because I recharged right away.
Click the “API Documentation” on the left, and you can see all its interfaces. Let’s continue to scroll down.
Its logic for generating API keys is as follows: first, it has an Access Key, a Signature, a timestamp, and a random node. The random node and timestamp are randomly generated in N8N, and these will be automatically assigned to you. Just follow my instructions, and it will work. For the Access Key, after you log in and recharge, there will be an interface with your Access Key and Access Key Secret. The Secret is hidden. Copy those two values and paste them here, and it will be fine. I won’t copy them here because the Access Key cannot be modified, so I won’t display it to everyone. The process is actually very simple. The process is that the original text of your request content is the URL address. This address is what we discussed earlier; there are different endpoints for image generation and result retrieval. You map all of them to variables, so you retrieve that variable. During the image generation process, you get this URL, then use this connector to link the millisecond timestamp and random string. This is the original text, then you need to add HMAC-SHA1 encryption to generate the signature. This signature can then be used in subsequent requests. Of course, it also provides Java and JS code, but these two codes don’t work well in the N8N environment. Why? Because N8N’s encryption library doesn’t seem to include this format. So, I’ll use my own method, which I’ll discuss shortly. But here, I’ll briefly introduce how to concatenate these and then encrypt them. There are specific processes for how to obtain the Access Key and Secret Key, and how to acquire both the Key and Secret, which I’ve all explained. After logging in and recharging, copy those two codes and paste them here, and we can start referencing them. Okay, we’ve set up the parameter settings. Let’s continue.
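To make the signing process concrete, here is a minimal Node.js sketch of the same logic — concatenate the endpoint, timestamp, and nonce, then HMAC-SHA1 with the Secret Key and Base64-encode. It's an illustration of the scheme described above, not the workflow's actual nodes (which use the Crypto node), and the endpoint path is an assumption to verify against Liblib's API documentation:

```js
const crypto = require('crypto');

// Illustrative inputs — in the workflow these come from the comprehensive parameter settings.
const uri = '/api/generate/webui/text2img';              // assumed text-to-image endpoint path
const timestamp = Date.now();                            // millisecond timestamp
const nonce = Math.random().toString(36).slice(2, 12);   // random string
const secretKey = 'YOUR_LIBLIB_SECRET_KEY';

// 1. The "original text": endpoint + "&" + timestamp + "&" + nonce.
const raw = `${uri}&${timestamp}&${nonce}`;

// 2. HMAC-SHA1 with the Secret Key, Base64-encoded.
//    (The result still has to be made URL-safe — see the Code node later.)
const signature = crypto.createHmac('sha1', secretKey).update(raw).digest('base64');

console.log({ timestamp, nonce, signature });
```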
Code Node for Video Upload (Google Gemini)
This is a Code node. After importing, change the modifiable parts to your own values first. Below it is a section of code. Google's multimodal node in n8n can upload images but not videos natively, so we have to construct the request ourselves in n8n to complete the video upload via the Gemini Files API. Let me open the URL to show you; this is the Chinese introduction. It's a simple flow: the code calls the Files API endpoint, reads the file and extracts its size, and then uploads it. I've replicated this process in n8n for you to use; this is the source, and you can keep it as a reference. For the actual build, we add a "Code" node. The code I wrote can handle MP4, MP3, PDF, PNG, and JPG uploads. Let's rename this node. After that, we add the next node, an "Edit Fields" node, and rename it "Video Pre-processing." The pre-processing is needed because the upload goes through a couple of raw HTTP requests rather than a ready-made node. Okay, let's rename it to "Video Pre-processing."
It first makes a POST request. The URL is our Upload File request endpoint. Here, you need to open “Query Parameters”; the request parameter uses a Key. This Key is our Gemini API Key. You’ve already assigned it through Gemini in the parameter settings, obtained from Google AI Studio, as it uses this to verify who we are. Next, we send Headers. There are several parameters for this Header, which are fixed Google upload protocols. This fixed parameter adds “Upload Command,” which tells it to start the upload action. Adding it will generate a URL or code telling it where to upload. This is “JSON File Size”; just enter it directly. For “Content Type,” we enter “application/json.” Once these parameters are set, we open the “Body” request and copy and paste this request body. If you want to use JSON directly, use “Use JSON.” Open “Expression” and paste it; it will directly extract and process the JSON file. This is our endpoint setup. You can review it again, and I’ll also double-check to ensure no details are wrong. My voice is getting a bit heavy now; I’m a bit tired. I’m explaining this without any cuts, pauses, or repetitions, and without any draft preparation. I didn’t even write a script; I put more time and effort into video production. Perhaps what I’m doing is still a bit rough, but I hope you can understand me. Okay, connect it here, because we’ll need the actual file uploaded later, so the binary data will also be extracted. I’ll connect it here. The next two forms will be fine.
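If it helps to see what that first request amounts to outside n8n, here is a rough Node.js sketch of the "start upload" call, following Google's resumable-upload protocol for the Files API; the file name and size are placeholders, and it assumes you run it inside an async context:

```js
// Step 1: ask the Gemini Files API for an upload URL (the "start" command).
const apiKey = process.env.GEMINI_API_KEY;   // the key from Google AI Studio
const fileSizeBytes = 12345678;              // size of the MP4 binary, in bytes

const startRes = await fetch(
  `https://generativelanguage.googleapis.com/upload/v1beta/files?key=${apiKey}`,
  {
    method: 'POST',
    headers: {
      'X-Goog-Upload-Protocol': 'resumable',
      'X-Goog-Upload-Command': 'start',
      'X-Goog-Upload-Header-Content-Length': String(fileSizeBytes),
      'X-Goog-Upload-Header-Content-Type': 'video/mp4',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ file: { display_name: 'reference-video.mp4' } }),
  }
);

// Google returns the real upload URL in a response header.
const uploadUrl = startRes.headers.get('x-goog-upload-url');
```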
Uploading the Video (HTTP Request Node)
After the form, the workflow preprocesses, and once that's generated we continue adding new elements. Next is the video upload node — another HTTP request. Let's first rename it "Upload Video." I've built this process already, and many parts could be simplified or done in a single line of code, but I generally stick with what worked during testing and don't change it much. There may be some redundant nodes later; just know that better approaches exist. All roads lead to Rome, so it doesn't matter — you can use your own method. The Key is again copied from the comprehensive parameter settings (Gemini). Then scroll down and fill in the header parameters: "Content-Length" is your file size — just paste it in. My habits in Make differ from n8n: I often run Make because its syntax is simpler and you can click through directly; in n8n I avoid writing mappings by hand because it's more complex and would waste even more time, so I generally paste values and don't re-run. But as I explain the upload, I'll walk through these request parameters. The earlier request was the start; this one completes the upload — that's what it means. Now open the request body. What do you send here? n8n's binary file. Because this node actually uploads the file, its body type is "Binary File," and the "Input Data Field Name" is "MP4." Why? Because "MP4" is the name given to the binary file from the upload earlier, so it can be fetched directly by that name. The node requests the MP4 binary file directly; once the intermediate steps run, the binary would otherwise be lost, since everything gets converted to JSON. So what do we do? We use the merge (aggregation) node to carry this file along. That's the logic — it can then upload the binary file straight away. Let's check again: the POST URL is fine, the headers have these few parameters, and below, the Body is set to "File." No problem.
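Continuing the sketch above, the second request pushes the binary itself and finalizes the upload; `uploadUrl`, `fileSizeBytes`, and `videoBuffer` are the illustrative names from that sketch, with `videoBuffer` standing in for the "MP4" binary the router carries along:

```js
// Step 2: send the MP4 bytes to the upload URL and finalize in one request.
const uploadRes = await fetch(uploadUrl, {
  method: 'POST',
  headers: {
    'Content-Length': String(fileSizeBytes),
    'X-Goog-Upload-Offset': '0',
    'X-Goog-Upload-Command': 'upload, finalize',
  },
  body: videoBuffer, // the raw MP4 data
});

const fileInfo = await uploadRes.json();
// fileInfo.file.uri is the file URI we hand to Gemini for analysis later on.
console.log(fileInfo.file.uri);
```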
After this request is uploaded, it will… after uploading, we need to extract the parameters again, which requires another Code node, because it returns from the URL. I forgot. We use this Code node to rename the truly needed data fields. “Check video type.” Why? Is there an error? You see, while I was explaining, I got mixed up; this part is wrong. After uploading the video, it needs to wait. I made a mistake again. So you see, my mind is really overloaded. So, is the next one a Wait node? It needs to wait for a period of time, because after all, you need to process a file, and Google will also analyze this file with a large model. So, you might need to adjust this. I usually select “1” for the command, giving one minute per file. If your file is longer, you need to increase this, otherwise, it won’t be able to get the address or the actual ID. So, everyone must pay attention; this might need modification. So, even if you import my video, you can simply watch it. After waiting for a period, we’ll add another “Edit Fields” node. This is where we set the prompt. Why did I create a new field? It’s for easy editing and mapping, because this is the most important location for the prompt. We’ll add a field named “Prompt,” and its String will be an Expression field. I’ll copy my prompt here and click to open this field. I intentionally kept this prompt relatively short, because the large model has already seen the video itself, so you don’t need to define what type it is or what its viral points are. You just need it to analyze the video. Giving more words to the prompt is not necessarily a good option, especially with such powerful large models now. Look, my prompt is: “Human, your task is to accurately analyze the reference viral video provided by the user, provide its successful propagation techniques, methods, visual hotspots, and skillfully combine the style and characteristic elements from the reference image prompts provided by the user. Focusing on the new video theme specified by the user, design a series of viral image prompts for display. These prompts should precisely replicate the successful elements of the reference video and creatively apply the image style, ensuring each scene has strong viral potential.” This is its core definition.
We provide this prompt example for you to use. For example, we go to LiblibAI and use this Lora. Let’s filter for a Lora. This is a Lora. If I use this Lora, this Lora will typically provide some reference prompts, and these prompts might often contain keywords and atmospheric words commonly used within that model. So, we need to input these words into our large model, allowing it to combine them with the atmosphere of the image prompt. You’re definitely not generating a video identical to the original; you’re creating a new theme. So, here you can also set your own new theme, and specify how many scene images, how many quantities it should generate. This is the scene count, all from the previous requirements. Also, do you need to place trigger words at the beginning? And what are the pre-requisites for the image prompt? These are all its outputs. For the output format, I ask it to wrap everything in JSON, with each prompt containing a certain number of characters, which is optimal for Flacks image best practices. Since I’m currently using Flacks 1, if you’re using Stable Diffusion, you can modify it, or you can add all of this here, it doesn’t matter. The prompt I set is of relatively high quality, and I feel what’s important is the content generated. This is the prompt. After giving it, we’ll close it. We also need to open “Other Input Fields.” It’s okay to open all of them. No problem. Now that this prompt is set, we can truly start calling Google’s multimodal large model to analyze our video and generate scenes. Just imagine, the large model directly uses your scene breakdown and analyzes the video to generate scenes for you. Isn’t that being a director? The most important task for a director is to design visuals based on the script, right? Now, we’re letting the large model be the director. Then, we give it a name: “Analyze Viral Video,” and click “Paste.”
Analyzing Viral Videos with Gemini
In fact, much of what I’m doing now is development in disguise; it’s a prototype in disguise. If you’re a programmer, I wouldn’t dare say I can replace a programmer’s work, because programmers need more redundancy. What I’m doing can be used by one person, but if you need 1,000 or 10,000 people to use it, that’s different. I’m currently only focused on making it usable for myself. So, for software development, you might need to add computational frameworks or redundancy mechanisms, including these contents. However, programmers can refer to my ideas, which is perfectly fine. In fact, what I do every day is essentially the simple development of a program over a month; it’s all of that difficulty. This is a URL, the endpoint you’ll request. Here’s what’s important: when requesting the model, you *must* change this. To simplify for everyone, if you’re using Flash, Flash is free. If you want to use a paid model, remember what I told you earlier about where to find it on Google. That page has been closed. Let me open Google AI Studio for you. It’s essentially here: you click on which model you want to use. For example, if I’m using the free one, I’ll use Flash. Copy this and replace it there. If you’re paying, then use Pro. I feel Pro will definitely have better results because, in fact, the entire creative part of the video is handled here. So, I use Pro myself. You can use the free version, and you can switch it later. This is the request body process. This is “Generate Content” for generation. No problem. So, let’s continue to add and open. Query definitely needs to include our Key, right? That Key is the code we set in the parameters earlier. Continue to add our Body, still JSON, Use JSON. The request body is complex; paste it. You see, copy the text. This is the complete part of the entire prompt. Why did I ask you to combine and map the fields in the previous node? It’s to make it easy to process the different fields here. This Type also uses the previous parameter; what kind of URL is it? This is the most important file URL, the ID of the video you’re analyzing. So, the previous half-day of uploading was just to get this ID. The rest is its prompt. Okay, it will analyze such a video for you.
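For reference, the request this node sends boils down to something like the following sketch (continuing the names from the upload sketches; the model name and prompt variable are placeholders you'd swap for your own):

```js
// Ask Gemini to watch the uploaded video and write the scene-by-scene image prompts.
const model = 'gemini-2.5-flash'; // or a Pro model if you have billing enabled

const res = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent?key=${apiKey}`,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            // The video we uploaded earlier, referenced by its file URI.
            { file_data: { mime_type: 'video/mp4', file_uri: fileInfo.file.uri } },
            // The "director" prompt assembled in the previous Edit Fields node.
            { text: analysisPrompt },
          ],
        },
      ],
    }),
  }
);

const analysis = await res.json();
```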
We continue to add. After analysis, we need to extract this prompt, which requires “Edit Fields.” Let’s add it. I’ll rename the Expression field to “Image Prompts” and paste it here. This uses a prompt to directly extract JSON from it, right? Because it inherently means this. After extracting and parsing, we’re done. After that, we’ll add a traversal node. Now, scene generation has truly begun; we’re actually going to generate images. For traversal, we’ll send it from “Data” and rename it. Click “Traversal.” Here, it’s the scene breakdown within “Image Prompts.” This must be “Fix,” not “Expression.” Okay, no problem. These two nodes are now set. Let’s run it now, let’s really test it. Click “Run.” Our theme is fine. Set the scene count to 4. For “Branch,” this is definitely a form submission. Let’s select one. For “Download Media,” I’ll just upload a random video. Click “Create.” We can minimize it now; it doesn’t affect anything, as closing it directly also works. It’s inherently a… here, you haven’t finished uploading yet, so I can’t close it; I have to start over. Okay, “Lion.” 4. Select “Form Upload.” It needs to process first. “Create.” In fact, you can modify this part yourself or directly provide a URL. If it’s a URL, we can close it. Okay, submitted. It’s given to us via Webhook, and then it aggregates. Processing. As expected, an error occurred. It didn’t get the binary file. We can check and see what happened. You didn’t enable “Include Other Input Fields,” so it didn’t receive the binary file, which led to an error. This is why we need to test. My general workflow doesn’t have this error, but when explaining, it’s inevitable to have these small details and setting errors. So, if you can reproduce errors during testing, it’s good to test more.
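If the model wraps its answer in a Markdown code fence, one way to turn it into a real object before the traversal step is an expression like the sketch below; it assumes the raw answer sits in a field I'm calling `text`, which you'd replace with your actual field:

```js
// n8n expression: strip the Markdown code fence and parse the model's answer into an object,
// so the scenes inside "Image Prompts" can be iterated over later.
{{ JSON.parse($json.text.replace(/`{3}json|`{3}/g, '').trim()) }}
```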
Let’s try again: “Lion.” 4. Select “Form,” and let’s switch the file to a lion and upload it. As you can see, it no longer errors because you enabled that option, and it’s carrying the binary file with it. This is fine. In fact, you don’t have to use this form; you can enable the binary options for all these nodes as well. But I generally don’t process it that way; I prefer to aggregate through routing. Why? Because this field actually analyzes binary files, so all these options should be open, bringing other fields, and bringing the binary file in addition to the JSON data for analysis, then continue processing. And the uploaded file also needs a binary file. So, what do I do? I’ll pass this binary file directly to it via the Merge Form router, by means of “Position” aggregation, and let it upload. This is the complex logic of this workflow. Because some nodes must have binary file input, you need to use techniques like setting up additional fields within modules or using routers to bring binary files from elsewhere. Now, it has started generating, waiting for one minute to process. After obtaining the file URI, setting the prompt, and giving it to the general director, it designs the scene breakdown for us. No problem. After extraction, we notice a process in our actual tests: it returns a complete JSON, but why didn’t it parse it? Because you treated it as a string, so here it should be “Object.” And I didn’t set this here; I made a small mistake in my actual video settings. So, this should be “Object.” After that, you tell it that this is data, and it will parse it for you. If it doesn’t parse, you can’t iterate later, so we have to go through it again. I’ll save it again and go through this small process again. So, there are indeed many small details, but my own workflow is fine, and the one I provide to you won’t have issues because I’ve tested it with dozens of videos without problems.
We'll wait for the whole analysis to complete, which takes about a minute, before the subsequent steps are triggered. That covers the form-driven way of entering requirements. What about automation? I often explain automation in Make — with automated triggers, each of my workflows can run completely on its own — and you just need to add that mode here. Using today's example, I'll show you how to set up automation inside the n8n framework. Automation is simply another trigger node, and we generally use a time-based one. Select "Add another trigger," then "On Schedule." This is a custom time setting; let's drag it over here, and I'll adjust it to my own settings. This is the automation node. How many times a day do you want it to fire? I usually choose 5:00 PM sharp, triggering once, so it runs once a day. That's its daily schedule. Note that the workflow must be switched to "Active" for this trigger to actually fire — a small but important detail. Once this is set, you have an automated trigger. You can add more schedules: specific times each day, or even every 10 minutes if you like, so it effectively runs 24/7. Once the trigger is in place, we need to pre-set some requirements in an "Edit Fields" node — the same kind of inputs you would normally type into the form. This is the theme. Click "Add." First, "Scene Count": in an automated run it's generally fixed; you won't want 4 scenes today and 5 tomorrow. Fixing it tells the director how long today's video should be — say, 20 seconds — and with each scene lasting 5 seconds, that determines the scene count. The image prompt example also generally stays unchanged in automation: if I'm generating a single style, the example prompt here is the one provided by the Lora model for that style. Paste it in, and it stays fixed, mainly as a style reference. Continue to add.
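For reference, the daily 5:00 PM run corresponds to a cron rule like the one below if you prefer the Schedule Trigger's custom cron option instead of the hour/minute fields described above:

```js
// n8n Schedule Trigger, custom cron mode — fire once a day at 17:00:
const cronExpression = '0 17 * * *'; // minute, hour, day of month, month, day of week
```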
Reference video: For example, which viral short video series are you referencing? I’ll put it here. Who do you want it to learn from? These are all fixed; generally, for generating a single style, we don’t need to change them. There are two parameters below: “Theme Summary” and “Beach.” Let me explain the specific logic behind these two. I’ll paste it first. These are all for if you want to test generating one at a time. If you test daily, “Theme Summary” is one option. But if you want automation, you definitely don’t want to input it manually every time, you want the large model to generate it. So, I’ve listed 30 numbers for it. If you have any theme style you want to input, you can keep the first 30 unchanged and then specify the themes you want to generate. You can have the large model generate dozens, or as many as you want. After generating 30, the large model will randomly pick one to generate. This is the meaning of automation. This is the theme, also generated by the large model itself, but within your selected range. This is our module setup today. With the theme, next is the AI Agent. Add an AI Agent, etc. Click “Generate Theme.” The AI Agent definitely won’t be through the dialogue below, but “I require specific output format.” Paste this Prompt. I’ll also paste the system prompt. First, paste it. Let me explain this section. Click “Function,” then “Open.” The meaning here is to select a theme from the dozens of numbers you provide, then generate a Chinese and English description for it, and generate an appropriate number of scenes. This is automated generation; this is its prompt. You receive all the information in the “Theme Summary,” and you randomly pick one. The way it picks is through random generation here. Because this is a prompt, the example scene count is fixed, and the theme summary is also fixed, I used an expression here to make it automatically provide a number from 1 to 30. Of course, for example, if 30 isn’t enough, and you generate 100, then you change this to 100, and change the “Theme Summary” to 100, and it will automatically extract one of them for you to generate. Why provide a prompt example? It’s to tell the large model how to set the atmosphere, like a sad feeling.
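That random pick is just a one-line expression; a sketch (change 30 to however many themes your summary actually contains):

```js
// Pick a random theme number between 1 and 30 for the AI Agent prompt:
{{ Math.floor(Math.random() * 30) + 1 }}
```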
Okay, now that this is set up, there’s another point to modify. If you want to summarize multiple items, you can use the “Beach” data. For example, if I want to select many from “Beach,” then when mapping, you’ll use “Beach” instead of “Theme Summary.” You must change it to the “Beach” from earlier. This is because I’ve made it convenient for you to add many series yourself; you can modify this series here freely. I usually keep one item in “Theme Summary” for testing, so I don’t need to input anything. Every time I test, I just click run, and it only selects that one, making it easier to control the generated result later. If you want multiple, just modify it. Then, you’ll definitely need to add a Model. We’ll choose this. 2.5. I usually default to 2.5 now, it’s pretty good and free. We’ll select “Text Parser” to parse it, “Structure Output.” Click “Open.” This field is the example it generated based on your prompt, so you put it here, and it will parse and extract it according to that. After extraction, we’ll rename it; this is the actual theme for the generated video, for the content. After generation, we need to add another “Edit Fields” node to summarize the fields. Okay, what are these fields respectively? Let’s open and see. I’ll rename it “Set Parameters.” The first field is “Video Theme,” getting the video theme from inside, still in JSON format. For “Scene Count,” we click “Add,” also getting the scene from earlier. There’s definitely an issue with the “Branch” here; the “Branch” field needs to be fixed. This is the automated prompt example. Click “Add,” and we’ll add the prompt example here. Once these items are processed, we’ll close it.
We’ll connect these items here, pass them to it, and slide this up a bit to offset it. There’s also the middle one, which is our HTTP request. Since you need binary data, I’ll download it for you here. Download “Analyze Video,” select “Rename,” select “GET.” Its URL is the reference video you provided as a field earlier. Scroll down, and change the “Response” to “File.” No problem. Okay, where is it? It’s here. Drag it over. The “Theme” branches out from here and goes to the second one. Because this part needs a binary file, you’ll pass this binary file to it as well, and also the previous content. And for the theme’s video analysis URL, if you’re automating, you need to input all these URLs beforehand, so that automation provides it. This part also needs a binary file, so what do I do? I’ll also pass it to the second one. Okay, now click “Save.” So, have we processed both paths clearly? Let’s check; these 4 scenes have been processed. No problem, let’s test again. Generally, align them; this one goes up, that one goes down. The form, because this part requires binary code, it passes it through the router. Since both of them have binary file options open, it makes a selection. This part requires a binary file, so it also passes a binary file to it through a router, directly obtaining it from there. It proceeds backward without issue. This path is pre-set for automatic startup. Whether based on your provided theme summary or a fixed single summary, it generates various parameters for the large model. Downloading the video is also sent here, for this module to use. The downloaded binary video is also directly sent to the upload video via the router. This is the logic. Now, we can directly click “Run” and see if there are any errors.
Okay, let’s test. As expected, we got an error again. Where? It didn’t get the MP4 file. Why? Click “Open” and check the “Download” section. I didn’t rename it. It must be named “MP4” for it to be recognized later, because everything afterward uses “MP4.” Whether it’s PDF or PNG, you must rename it to “MP4” for the file to be identified later. So, I will test more while explaining to everyone, because there are many small details, and it’s impossible to explain every single one in such detail. I’ll run it again. In fact, here, I can also refer to images. JPG images can also be used. For example, if I can’t find a reference video for this direction, what do I do? I generate an image, let it generate a PPT-formatted image, let it read and understand it, and then set it directly from the image. So, there’s no other way. The result can be generated as images and PNGs. However, the Name of the downloaded file must be “MP4,” because it needs to run later. Okay, it’s running now; let’s continue with the next step.
Image Generation Loop
Next is a large loop. The meaning of “Loop Over Items” is not traversal; it means to limit one at a time. I also interpret it as traversal, setting it to one at a time to limit it. Click “Loop,” rename it. Okay, after the loop, we’re actually going to generate images. Image generation involves splicing many HTTP requests. So, let’s continue to add a “Code” node, close it, and connect it here. Close this. Okay, let’s rename it to “LiblibAI Stitch URL.” As we moved the Liblib tutorial forward earlier, it uses an API key that requires us to first stitch the address with milliseconds and random strings. So, let’s stitch it first. We’ll use “Raw,” which is the original text. The String is the previous content. This is why you know it; our comprehensive parameter settings for LiblibAI image generation have an endpoint, and then there’s the timestamp from the parameter settings plus the random string. Stitch it once first. After stitching, we continue to add an encrypted signature. “Crypto.” Close. “Crypto.” Let’s rename it. This is where I encountered a problem that stumped me for a month. Here, we’re using HMAC. For “Type,” if we select from the dropdown, there’s actually no SHA1. However, my N8N code library supports this field. So, by chance, I chose this expression. Previously, I tried to encrypt it using third-party API requests, but I felt it was insecure to give my key to others for encryption. Then I looked for third-party “Community Nodes,” but they required embedding code libraries, which I also felt was unsuitable; I didn’t want to install many other things in N8N. So, there was no other way. I tested it many, many times, trying whether it would work without, or directly using Code, but nothing worked. Finally, I used this method. Although it has a red star (indicating “is not supported”), it runs successfully. It takes the value, which is the result of the stitching, and gives it here. Then, “Secret Property Name” is passed to it, and the Secret Key, which you get from our N8N LiblibAI, is placed here, encrypted together using Base64 format. However, this function is successful despite being marked with a red star “is not supported.” I achieved it through expressions. So, I actually wanted to talk about Liblib for a long time but kept delaying it because I hadn’t solved this problem. Previously, you asked if I could make videos in certain directions, and I said there were sticking points; that’s what it was, just couldn’t figure it out. But now, one day, out of boredom, I decided to modify it, and it worked.
Then, for the URL, we process it with a Code node and give it a name, adding something later. The generated result here also needs to be processed by a Code node because it must be URL-safe, meaning you need to replace certain characters in the string. Some characters are not supported in URLs, so we need to process the generated ciphertext. This reason comes from here: it generates a URL-safe Base64 signature, so this is another piece of code to process it. So, I recommend Liblib’s official tool, which is very good, and can make this simpler for us users. This is also the first time I’ve seen someone explain LiblibAI; I’m again a pioneer. Currently, there are many N8N workflows, but you won’t find the results I’ve shown in my videos, whether abroad or domestically. You can search anywhere. So, it might take a long time for me to produce videos; sometimes I’m not even sure if I can produce one video every one or two months. The challenge is getting harder for me. But I hope to share these unique things with everyone. Even if I want to talk about something repetitive, I must find my own innovative angle before I can talk about it. If not, I won’t. Click “Send Header.” This is the actual LiblibAI request. Of course, in a program, it can be achieved with one line of code, but we’re working within the N8N framework. “Open Liblib Cloud” is its fixed URL, and the parameter “LiblibAI Generate Image” is its endpoint. What is its Access Key? What is the signature? What is the timestamp? What is “Nonce,” the random string? They combine to form a very complex URL for requesting. Whether you’re generating images or getting content, they’re all different. Then, for “Content-Type,” select “application/json.” I believe that if you can use this workflow well today, it will be worth the cost of your “Small Newsboy” subscription or half a cup of coffee, even if I’m not running an advertisement. Okay, copy and paste this JSON here. This is a key part I want to emphasize.
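The character replacement that Code node performs is tiny; here is a minimal sketch of what "URL-safe Base64" means in practice — the `signature` field name is an assumption, so match it to whatever your Crypto node outputs:

```js
// n8n Code node: convert the Crypto node's Base64 signature into URL-safe Base64
// (replace "+" with "-", "/" with "_", and drop trailing "=").
const items = $input.all();
for (const item of items) {
  const sig = item.json.signature || ''; // assumed output field of the Crypto node
  item.json.signature = sig
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}
return items;
```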
This is where the actual image generation parameters are matched. Let’s open it and look at the details. “Template UUID” is the ID for each template used, which is fixed. Let’s take a look. On LiblibAI, you can actually use Xingliu 3. It also has a “Template ID,” which is fixed. If you use Xingliu for text-to-image generation, you can, but you’ll need to replace this ID. You can also use LiblibAI’s custom models. I use this model. Scrolling down, its complete parameter “Template ID” is here. Is it E10? E10, no problem. So, you need to see whether you’re choosing a custom model or a Xingliu model. I even remember it should support ComfyUI. The text-to-image model’s “Template ID” is also different. So, this workflow is very suitable if you’re already an image generation expert, a Lora training expert. Then you can use this to batch produce videos. For you, parameters like ComfyUI and iteration deployment are all very simple. So, you can use this ID. This workflow is very powerful in terms of style; you can deploy your ComfyUI models on LiblibAI and have N8N call them. It’s like adding something extra. I’ve forgotten the idiom. It’s on the tip of my tongue, really forgot. This part is for sparking ideas. If you’re a powerful expert, you’ll have many ideas here.
Then, for “Checkpoint ID,” this is your main model’s ID. How do you find it? For example, if you’re looking for a large model, click here, then click “Filter.” Go to the official website, open “Bubbles.” Right? This is a parameter for all large models. I usually use the F1 standard official model. You can filter by “Most runs.” I typically use this one. Click “Copy” and “Open.” There’s a UUID Version, a UUID 412B. This parameter is a very important ID. Where should you modify it? It’s in the parameter settings. Let’s close this. In the parameter settings, “Main Model ID,” you need to modify this according to the model you choose. I’ll close all of them. I generally use the F1 large model, which is the official large model. Then use its ID, and that’s fine. Of course, it also supports various other large models, such as Midjourney realistic, AW Portrait, right? These are all very famous models, right? Midjourney, these are all very powerful large models; you just need to pair them well for image generation. Once you’ve selected the large model ID, let’s continue.
Next are the Lora IDs. Let’s continue to open LiblibAI Image Generation here, and click “Open.” The “Prompt” is definitely the prompt for our scene generation from earlier. Let’s click “Open.” Inside the loop, there’s “Chinese Prompt,” which should be “English Prompt,” generated by it, and it also carries trigger words. So, we need to pass it here. Then, I’ll gradually explain the next few parameters. Generally, how do I recommend finding models? I’ve written a LiblibAI tutorial here. In this entire workflow, I believe the most important thing is that you cannot control image-to-video generation; the model you choose, whether it’s 1.6 or Keling 2.1, doesn’t offer many steps to improve it. It might have small flaws; this is a limitation of image-to-video generation. But the most important thing is that you *can* control the Lora. So, when looking for Lora, if you want the entire workflow to run stably without flaws every time, you must choose a high-quality Lora. What is a high-quality Lora? A high-quality Lora has a fixed style. For example, it generates a Lego style, but when you give it different prompts, it can accurately reproduce every character and environment, and it’s also divergent, with strong generalization capabilities for various scenes. This is what makes a good model. For example, there’s a creator I mentioned earlier, “Wan Junping” (referring to the author of the Lora model used in the examples), whose models I’ve tested extensively. I can recommend them based on the official website. Open this. This is not the realistic one. I tested the author’s models, and almost every one has a very high success rate. Why? Because if you find a good Lora model, it means the author is truly serious about training image generation models in that field. To be honest, you can’t tell the difference from surface-level images now; every one is beautiful. How do you judge if a Lora is high-quality? You truly test an idea in practice. For example, I initially generated miniature models. Let’s say I’m looking for this one; I’ll immediately find a good prompt model, click “Generate Similar,” and then based on that, you yourself generate 10 prompts for different scenes in a similar direction. Look at the generated results within these 10 images to see if they are stable. If stable, then apply it to the workflow. What does stability mean? It means this person was very meticulous in labeling when training this Lora model, and their ability is strong. They understand the various details generated, and they are careful, because I feel that the most important aspect of Lora training is being able to find a very rich set of images, and at the same time, each image can be richly labeled. This depends on the person’s style of work. So, once you find one good Lora, you can try their other Lora models. Also, why do I use LiblibAI? As I mentioned in video 19, for children’s illustration picture books, we downloaded the model and ran it under the Fal.AI framework. But now, it’s encrypted, so you can’t. If you want to use “Wan Junping”‘s model, you can only use the LiblibAI platform. This is why our entire image generation model uses so many modules to implement this API request; this is the reason. So, for example, if I don’t want to use LiblibAI and want to use another platform, can I? Then you need to see if your model supports it. If you have free models that can be downloaded, you can use them on other platforms. But the model I use is encrypted, so there’s no other way. 
This also means we’ve grown larger; after all, if people do this seriously, they need a certain return, and I support this behavior. On the contrary, I don’t support the idea of “it’s free, it’s free.” In fact, free is often the most expensive. So, here, you must choose a high-quality model. This is your selection logic. Let me summarize again: find an image, use its example image, “generate similar,” then use a large model to generate dozens or 10 prompts in similar directions. Test it for a long time, because with LiblibAI, you can quickly generate images and see their effects, and check if its Lora is stable. As long as the Lora is stable, you can use it.
Once stable, then it involves what your negative prompt should be. You can reference the Lora model’s own negative prompt, copy it, and paste it here. Generally, negative prompts are fixed. Then, its iteration steps. For “Sampler,” where do we find it? Let’s look at the “Wan Junping” model I mentioned earlier; my favorite is this one. Its sampler is Euler A. How do you know this sampler method? I’ll open LiblibAI, scroll down. Model selection 5.5. Sampling method. For example, if it uses “kWh,” you change it to 15. So, you adjust the sampling method here based on the model you use, including the iteration steps. Some are 30, some are 28, some are 25; you modify it according to its image generation examples. For example, if the prompt CFG is 3.5, Scale is -1, and iteration steps are 30, you adjust these parameters accordingly. Then, for the specific image generation style, if it’s 16:9, you set it to 16:9. If it’s 9:16, you set it accordingly. Generally, for horizontal screens, I use 1024×576; you can switch them. If it’s a vertical screen, put it at the top; for horizontal, put it at the bottom. This is it. And here, it’s generally fixed to generate one image. Then, for other Lora models, you can select their Lora ID. This is where you find the “v” User ID and ID. You copy and paste it into our parameter settings, set the Lora ID, and specify its weight, some are 0.8, some are 0.7; you need to modify it. It also supports high-definition resolution repair. What are the repair iteration steps? And the denoising strength is 0.7. Then, its upscaling parameters. There’s also an upscaling model; are you using 10, or the more commonly used 8x or 16x? You adjust this progressively based on your image generation model. Ultimately, the two parameters for the images you generate must match: whether you’re generating 2x or 1.5x. This is the most complex part. So, if you’re an expert in image generation models, you don’t need to test; you’ll know how to modify it yourself. You can even write this with your eyes closed. If you don’t understand, then find these models and their original examples, and even there are many different parameters below. For example, if you modify all these, and then “generate similar,” you can quickly extract these parameters. That’s it. So, you see, the generated model quality is very high, and most importantly, I feel the characters are consistent, so we no longer have character consistency issues, because every male character has the same appearance, and every female character has the same appearance. So, I really like this model, and I also want to give a shout-out to the author. Of course, there are many very high-quality models on LiblibAI that I haven’t tested further because I have limited time. I’ve only tested one high-quality one, but there are many, many others for you to test, which is fine. Various styles.
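To tie these parameters together, here is a rough skeleton of the kind of text-to-image request body involved. The field names are my reading of LiblibAI's API and should be treated as assumptions — verify them against the official API documentation and the body that ships with the workflow; the values are simply the ones discussed above (sampler, steps, CFG, size, Lora weight):

```js
// Illustrative LiblibAI text-to-image body — confirm every field name against the official docs.
const body = {
  templateUuid: 'YOUR_TEMPLATE_UUID',           // fixed template ID (custom model vs. Xingliu differ)
  generateParams: {
    checkPointId: 'YOUR_MAIN_MODEL_ID',         // the Checkpoint / main model ID from the parameter settings
    prompt: englishPromptWithTriggerWords,      // scene prompt plus the Lora's trigger words
    negativePrompt: loraNegativePrompt,         // copied from the Lora's example page
    sampler: 1,                                 // sampling method matching the Lora's examples (e.g. Euler a)
    steps: 30,                                  // iteration steps from the Lora's examples
    cfgScale: 3.5,
    width: 1024,                                // 1024x576 for horizontal; swap for vertical
    height: 576,
    imgCount: 1,                                // generate one image per scene
    additionalNetwork: [
      { modelId: 'YOUR_LORA_ID', weight: 0.8 }, // Lora ID and weight from the parameter settings
    ],
  },
};
```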
Video Generation with Fal.AI
Okay, we’ve covered the image generation model in detail, as it involves complex parameters that you need to understand how to modify. So, listen carefully to this section. It uses our example to generate this result. Okay, with the image generation model, let’s continue. After image generation, we add a “Wait” node, because it’s not real-time. Click “Wait.” Rename it. Image generation is usually fast, so I generally set it to one minute. After image generation, you need to retrieve the results. Since the endpoint is different, you’ll stitch it again. This is that part. So, let’s continue to add a plus sign. There are tens of thousands of models in LiblibAI, so I had to find a solution for this complex content to provide to everyone. Because it’s very simple; later, when you change models, it’s very fast. If you’re proficient, just click. This combination has changed; it’s not the previous result, not the previous image generation; it’s its result. Combine it. Okay. Then, let’s have it sign again. For HMAC, the encryption method, we choose “Expression” and enter “SHA1” in uppercase. Its “Value” is “JSON Raw,” and then it applies to it. Then, we select its “Secret,” and we copy our own Secret Key, choosing Base64. This will sign it for us. Then, we need to add a “Code” node to process it, making it URL-safe. Then, paste the code.
Below is an HTTP request, actually getting the generated result. Click “Result,” then “Paste.” Okay, we click “Send POST” to query its status. This URL is also quite long; it’s stitched together using the same method as before. Open “Header,” select “Content Type,” still JSON. For “Body,” we don’t need the original layer; this is the generated ID. You can directly input the ID generated by the previous field here, and it will retrieve it and return the result. We also need to add two conditional checks using “If.” Because it might contain sensitive keywords, it would stop generating. Otherwise, what happens to the loop? I’ll add a condition to check if it contains sensitive words. You know, for example, even if I don’t add it, I can still explain this workflow to everyone, but then you might not be very stable. Of course, I can’t guarantee that my workflow will always be stable now, but I’ve done a lot of detailed testing to ensure it’s stable in my opinion. Select “2” here. This means to check if it contains any “sensitive” or “invalid” content. Because sometimes your request body is invalid, and you need to include that. And if it’s sensitive, you also need to include that. If it contains sensitive content, what do you do? Where do you go? You go to the next one, right? I won’t go through the entire image; I won’t generate it at all. I’ll wait a bit. Shrink it a bit. I have so much content. Shrink it a bit and let it go directly to the loop. As soon as it detects sensitive or invalid content, it goes back and says, “Let’s discard this scene. Next.” Then, we can all copy it. If it doesn’t contain sensitive content, then we can continue to check if it’s completed. Click “Rename.” Okay, the returned result is whether an “Image URL” exists. Does it contain “HTTP”? Because as long as it contains “HTTP,” it means there’s an image. This is a judgment condition I frequently use. Delete this. Generally, if this field’s content contains “HTTPS,” it means the image has been generated and is complete, so it moves on. If it’s not complete, what do you do? You wait a bit longer, right? So, I’ll connect it to wait for a while. You generate the image, then wait, then query. Query. If it doesn’t contain sensitive content, has it completed? If not, wait a bit longer. Okay, continue. If it’s not complete, wait a bit longer until it’s complete and moves on. This is the conclusion. Then, below, I added a node to download the image. Why download it? Because later, we’ll give it to the large model to find dynamic elements based on this image. Click “Download Image,” paste, rename. Click “Response,” we select “File.” The name is “Data”; we don’t need to modify it. Okay, this is what we’ve tested. We can run it first; it will run automatically. So, we can see that in the process of creating this workflow, I’ve always been studying the content in great detail before publishing each workflow, to understand how the prompts are generated. So, if you want to see this process, you can join my “Small Newsboy” channel, where I often prepare relevant documents before each video to explain how I generate from images. What are the logics involved? For example, if we click this URL, we can directly jump to the complete content that I explained about image-to-video prompts, what parts they consist of, and why they are designed this way. You can see, scrolling down, there are some director’s notes, right? How the screen is constructed, resolution, duration, and parameters used. 
These are all for different image generation models. Further down, I mainly look at this part, “Structured Replacement.” This means that if you want to turn an image into a video, it must involve the use of cinematic language. So, you need to, right? In the process of constructing the prompt, how do you use the subject, what are the actions, what is the scene, what is the environment, what is the style, what is the aesthetic? How does the subject move, run, jump, happily jump, rush away quickly? These things form the atmosphere of our video’s dynamic effect. The short video we’re creating at the sensory level must have atmosphere. If it lacks atmosphere, the video has no core, no focus. So, how these prompts are designed, I’ve prepared quite lengthy texts for all of them.
And today, there are also things like how to create scripts. We can scroll down to see what its main content types are, right? What types of viral short videos are there now? How to construct prompts? How to construct characters? If you don’t want to read these things, you can send them to a large model. Send them to a large model and ask it to extract a prompt based on this, or optimize the prompt for you. Google now has context windows of tens of thousands, hundreds of thousands, so you can have Google optimize these prompts for you. These are our foundations. This is how I always prepare my thoughts before each explanation. Of course, my official website is also updated with similar content. For example, for short videos, there will be, right? These two from “Small Newsboy” that you can read after subscribing. There’s also “Decoding TikTok and YouTube Shorts.” These are things I usually study to prepare for such videos. I hope to learn in a philosophical way, right? You recognize the world before you transform it. How can you transform the world if you don’t recognize it? This involves analyzing what makes TikTok Shorts videos viral, what elements they use. After watching these videos, I analyzed and concluded that short videos have three levels: sensory, narrative, and conceptual. After understanding these three levels, I realized that my current video creation direction is sensory. Since it’s visual, what methods do I need to use? I use prompts to give the visuals a certain atmosphere. This is all covered in the scriptwriting I’ve prepared, and there are relevant videos. This is what I wrote previously, right? Tutorials for on-camera voiceovers, guides for transitioning self-media niches, and even other workflows. You can often log in to my official website to check them out. There are NCA tools for cross-border e-commerce, SEO tools, right? These multimodal categories are all available. I’ve probably written over 100 articles by now, of course, with the help of AI tools or automated workflows. But this doesn’t matter, because sometimes if you don’t want to read, you can let a large model read it and summarize the core information, which is also a good way to get information. I will continue to update in this way.
Post-Processing: Video Stitching, Audio, Watermark
Okay, after the image is generated, we can see how the image generation turned out. This image, you’d say BBC themselves took it. No problem at all. Look at the lion’s expression, the sunlight, the atmosphere, especially the sense of perspective, right? The virtual perspective. I think the quality is very high, and now there’s also high-definition repair; you can make the image very high-definition, and you can add different Lora models. I feel with LiblibAI, if you truly know its powerful capabilities, you’ll know you can combine different Lora models, right? Different ComfyUI setups, so that the entire… actually, now, due to the limited capabilities of our large models, the direction of our video generation theme still needs constraints.
Okay, after generating the images, we continue to set up the video. We add an AI. Indeed. You see, the latest N8N workflow already has community nodes; you can install them directly here. I haven’t tested the Agent yet. Click “Video Prompt Dynamic,” then set this part. I’ll copy all the prompts. System. Expression. Okay, are there any other points? Be sure to open “Automatic Pass Through Binary Images.” Let’s review this again; this is also quite important. The large model has written the scene breakdown for us, and now the large model wants to figure out how to shoot it. This is also the large model acting as a director. Let’s open it. Your role definition is based on image input: “You are a top-tier narrative director and AI video expert. Your task is to deeply interpret the static images provided by the user, and with precise conception, create highly optimized English video prompts. These prompts should be designed for AI image-to-video models to guide the creation of dynamic, fluent 5-10 second videos.” Isn’t that it? It tells it to act as a director, to analyze the dynamic content within the user’s static images, including the main subject, multiple subjects, scene, environment, lens style, simple prompt practice, and even output format. So, isn’t this what’s in our file here? This is the script; this is an extraction from the complete image-to-video content. It was precisely because I read this long article and analyzed it that I was able to write such a prompt. So, this is the most important second step in being a director, and you need to pass the images to it. And then, you can also pass the special video requirements from the comprehensive parameter settings to it, and also the generated image prompt. This allows it to generate images and act as a director once more for us. This is what it generates. We click “Add this MS2.5520.” Then, I actually used regular expressions to process it. We’ll clip and add. In fact, I still think a workflow doesn’t need too much; it just needs to be handy and concise. So, I don’t pursue quantity, nor will I rush to publish one quickly. Perhaps once every one or two months is fine, but it must be insightful. I hope that when you see this, you can truly support me. Support me by purchasing my “Small Newsboy” subscription or buying me half a cup of coffee, so I can spend more time creating even better workflows.
Then, for these two video prompts, you just need to use regular expressions to remove the outer formatting and directly extract the prompt content. Additionally, I also put the image prompt here. When generating images, you have two choices: you can use it to act as a director, or if in the future you don’t want it to act as a director, you can just use the image prompt to generate directly. I’ve left both options, you can choose according to your needs. After setting the parameters, we need Keling. HTTP. Click “Open.” Click to rename it. This is essentially a Fal.AI request, which is relatively simpler. I’ll set up the POST request for you first with its URL. Click “POST.” You’ll need to register for it. If you’re following my “Small Newsboy,” you should have received about $70 in credits, so generating hundreds of videos should be no problem. So, actually, this workflow today is free. Then, paste the Key. I’ll explain the steps first, then paste the request body, click “JSON Open.” Okay, this is its request body.
Let’s first discuss the content here. CFG is generally set to 0.5. I haven’t changed the negative prompt. When generating horizontal or vertical videos, you must adjust it here according to whether it’s 9:16 or 16:9. The parameters you choose must be supported by your large model, whether it’s an image large model or a video large model, as long as they correspond. For duration, we select from the settings. You can set it to 5 or 10, as Keling currently only supports these two options. For the prompt, you can choose the previous video prompt. If you want to use the image prompt directly without analysis, you can choose the image prompt; it’s actually selected here. For these two parameters. Then, the URL is the image we generated from LiblibAI earlier. Since it supports URLs, I’ll directly take the image URL from there and put it here, which is fine. Then, I’ll focus on explaining the Fal.AI platform. In fact, for this website, I have $22, but I remember I personally received over $70 in credits. If you want to know, I’ll click “Home” here, then “API.” It will often tell you how to receive $100-$200 credits, which is also worth it, right? It gives $10 for models, Fal.AI gave $20, and later another $50, so it should be over $70, no problem. These are all from various experts, I think “Geek” is the expert who shared it, and I also shared this good news in “Small Newsboy” so everyone can receive such benefits. Then, there are Bimi and coffee. For models, you generally click “Excel,” and you can generate API keys directly from here. In Excel, when you open it, it’s usually “image to video v6.” Now, for “image to video,” Keling has released 2.1, or maybe 2.0. Most of my previous videos were generated using 1.6 because 2.1 wasn’t available when I was discussing videos. Now it’s out, so the quality is quite good. You can click to open it. If you want to use a specific URL, for example, “image to test,” right? If you want to use this video URL, make sure to click “API,” then click here. This URL is your actual API request URL. You then paste this URL here to replace it. I’m using “2.1 Master Image to Video 9,” so this model is actually 2.1, and 2.2 is out now, which I didn’t even know. We can check if there’s 2.1, 1.6. As expected, there’s 2.1. I’m using 2.1, just switched to it. Most previous examples used 1.6, so modify this position according to your needs. But make sure the request prompts are similar. For example, if you select “image to video v9” here, after entering all the content, the request body will appear directly, and then you can paste it. This is a small detail of its generation. With such free credits, you can use it. Then, it will generate the video.
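Stripped of the n8n plumbing, the Fal.AI call amounts to roughly the sketch below. The model path shown is my understanding of the 2.1 Master image-to-video endpoint — copy the exact URL from the model's API page as described above — and the parameter names should likewise be checked against that page:

```js
// Submit an image-to-video job for Keling through Fal.AI's queue API.
const falKey = process.env.FAL_KEY;

const submit = await fetch(
  'https://queue.fal.run/fal-ai/kling-video/v2.1/master/image-to-video', // assumed model path
  {
    method: 'POST',
    headers: {
      Authorization: `Key ${falKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      prompt: videoPrompt,        // the director's video prompt (or the raw image prompt)
      image_url: liblibImageUrl,  // the image LiblibAI generated for this scene
      duration: '5',              // Keling currently supports 5 or 10 seconds
      cfg_scale: 0.5,
      negative_prompt: negativePrompt,
      aspect_ratio: '9:16',       // or '16:9' for horizontal videos
    }),
  }
);

const { request_id } = await submit.json();
// Poll the queue with this request_id later to fetch the finished video URL.
```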
Then we add another “Wait” node, because this step takes a long time; change it to one minute. For video, we’re currently at the sensory level; later I’ll add the narrative level. There’s now a very powerful Google model, VideoPoet, which can generate dialogue, so the workflow can develop toward our scene-and-story direction. I actually feel the video field is gradually reaching a disruptive moment. Just as Disciplined Agile and Rebar can do structured writing, letting us truly write for WeChat official accounts and Toutiao, video has reached the same moment, and I will automate it soon. If you don’t learn automation, I expect you’ll soon fall behind, because a very, very large number of videos today are already AI-generated, and AI video, including digital humans, can gradually replace real footage. So everyone needs to learn automation to improve their video efficiency.
Then comes the HTTP request that retrieves the video. Set the method to GET and copy and paste the URL. For “Send Headers,” just send your key. You could also store this as a built-in credential you create yourself, but for convenience in the demo this method is simpler, since many people don’t know how to add one, so I put the key directly here. Enter the key here and it will run. After the image is generated and the video request is submitted, the API returns a URL; that URL is the request’s address, and through it you can fetch the video. The red warning is fine, by the way; it only appears because nothing has been generated yet or the node isn’t connected properly. Once I connect it, this node works.
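For reference, the GET step looks roughly like this outside of N8N. The status URL and the video.url field are assumptions based on the usual Fal.AI queue response, so check them against the payload you actually receive.

```typescript
// A sketch of polling for the finished video (run as an ES module).
// FAL_KEY and the status URL are placeholders: the URL comes back from the
// POST request above, and the response shape may differ per model.
const FAL_KEY = 'YOUR_FAL_KEY';
const statusUrl = 'https://queue.fal.run/fal-ai/kling-video/requests/REQUEST_ID';

const res = await fetch(statusUrl, {
  method: 'GET',
  headers: { Authorization: `Key ${FAL_KEY}` }, // same key header as the POST
});
const data = await res.json();

// When generation has finished, the result should contain a video URL;
// otherwise the Wait node gives it more time and we poll again.
if (data?.video?.url) {
  console.log('Video ready:', data.video.url);
} else {
  console.log('Still processing, current status:', data?.status);
}
```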
Then, do we keep making judgments? Click “Add,” then “If”: has the video been generated, i.e., does this URL exist? As before, we check the URL against “http,” meaning that if it exists, the workflow can continue. If it exists, we add another “Edit Fields” node. (This tutorial will probably end up over two hours long; I’ve spent a lot of time preparing it, certainly not just tens or hundreds of hours, probably about a month.) The result can be saved as “Video URL.” Because Google Cloud Storage won’t hand you a publicly accessible link directly, I combine and concatenate the pieces here so that it produces a public link; since the file is actually publicly accessible, you can download it as long as you have that link. Let’s rename the node. This part cost me a lot of time, but no problem: this is the actual final video URL, and the video has indeed been generated. Click “Open” and see the “The evening sun washes the savannah in gold” clip we tested earlier: 5 seconds, no problem. Generated well.
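As a rough illustration of that check and the concatenation, here is a small sketch; the response shape, bucket name, object path, and URL pattern are all hypothetical placeholders rather than the workflow’s exact expressions.

```typescript
// A sketch of the "If" check plus the "Edit Fields" concatenation.
// Everything below (response shape, bucket, object path) is a placeholder.
const response = { video: { url: '' } };      // e.g. the GET node's output
const bucket = 'my-video-bucket';             // hypothetical storage bucket
const objectPath = 'videos/final-clip.mp4';   // hypothetical object path

// "If" node: only continue when a real URL came back.
const hasVideo = response.video.url.startsWith('http');

// "Edit Fields" node: when no public link is returned directly, build one by
// concatenating a public base URL with the bucket and object path.
const videoUrl = hasVideo
  ? response.video.url
  : `https://storage.googleapis.com/${bucket}/${objectPath}`;

console.log({ hasVideo, videoUrl });
```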
Saving and Finalizing the Workflow
Okay, after setting up and extracting the parameters, we need to handle the branching, because the workflow now starts from a form. Let’s use a “Switch” node to make that judgment: click it, rename it to “Branch Judgment,” and open it. Since you have a form, the workflow will error if you never reach a final form ending, so we branch here. For the condition, we generally check the trigger content: paste “Form” for one branch and “Automation” for the other. The switch then decides which direction the run goes: if it came from the form, it leads to a node that returns to the form later; if not, it skips that path. For the form branch, we add a “Form” node, a “Form Ending”: click “Redirect,” and rename it. It can return plain text, or you can redirect straight to the video; either is fine, and the redirect field can take the video URL, that’s up to you. I just return a TXT log and make it finish within 1.1 minutes, otherwise the workflow would keep running, so I constrain its time. That’s the redirection; close it and scroll down.

Finally, we can save to the Notion knowledge base. For Notion, select “Create New Database Page” and connect it. The judgment condition is: if the run came from the form, return a form ending and then save; if not, save directly to Notion. Here I select my Notion account, choose “Automated Videos,” and pick your Notion database. The page title is usually the Workflow ID plus the Execution ID, since those two together are unique. Then you just add a video-link property and paste the final video link into Notion. Let’s rename the node to “Save Video.” As for the Notion template, it’s quite simple: just a title and a video link, and you can create one yourself. This means all generated videos are automatically saved here, keyed by the execution ID and the workflow ID. That’s the direct save; of course there are many other output forms, but ultimately everything lands in a Notion knowledge base.
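For anyone curious what the “Save Video” step amounts to underneath the Notion node, here is a hedged sketch of the equivalent Notion API call. The database ID and the property names “Name” and “Video Link” are assumptions; match them to your own template.

```typescript
// A sketch of creating the Notion database page that stores the video
// (run as an ES module). Token, database ID, and property names are
// placeholders; the N8N Notion node does all of this for you.
const NOTION_TOKEN = 'YOUR_NOTION_TOKEN';
const DATABASE_ID = 'your-automated-videos-database-id';

const workflowId = 'wf_123';    // in N8N: the workflow ID
const executionId = 'exec_456'; // in N8N: the execution ID
const videoUrl = 'https://example.com/final-video.mp4'; // from the Edit Fields node

await fetch('https://api.notion.com/v1/pages', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${NOTION_TOKEN}`,
    'Notion-Version': '2022-06-28',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    parent: { database_id: DATABASE_ID },
    properties: {
      // Title = workflow ID + execution ID, so each saved video is unique.
      Name: { title: [{ text: { content: `${workflowId}-${executionId}` } }] },
      // URL property holding the final public video link.
      'Video Link': { url: videoUrl },
    },
  }),
});
console.log('Saved to Notion.');
```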
Okay, I’ve walked you through the whole build today. Now let’s go back over it and explain the details of using this workflow yourself. After importing it, first update your N8N to the latest version; you can see there’s already another update, 1.95.2. If you don’t know where to update, my “Small Newsboy” has professional tutorials on how. After importing, you primarily need to modify the parameter-settings node. You don’t need to touch any of the fields that are in JSON format, and you can use the default music URL for a first test. Keep the main model and the Lora as they are for a trial run; the trigger words and images are also fixed. The most important part comes later, which I won’t go into now: you need to replace certain API keys, such as the Gemini API key, with your own. If some of the initial inputs are the fixed values I set, that’s fine, and you can run it as-is.

After a successful run, find a new style you want to replicate. For example, if I want to use a Lora with Flex 1 again, what do I do? I click “Filter” and select, and of course the corresponding base model is Flex 1. Then find your Lora: you can filter by “Most runs,” which surfaces the strongest models, or by “Hottest.” After finding one, say I want to generate character portraits, like the realistic portraits we saw earlier that had minor hand flaws, I can use that model. Click to open it: does it have a hand-repair plugin? Then test it: click “Generate Similar” and run it twenty-odd times. If there are no issues, copy its parameters, for example the positive prompt, and paste it into the prompt example, then adjust the number of scenes to generate. For character portraits, I’ll also find a reference video, a viral short video on Douyin, download it, upload it to my Google Cloud Storage, and get a direct link that can be downloaded directly. You can then add your own prompt, for example to generate a Japanese-style, Korean-style, or Taiwanese-style image, no problem. Once it generates and you’ve adjusted the parameters, you move on to the most important part: the configuration. Modify all the parameters you might use here according to your LiblibAI model, and then test progressively.
I recommend watching the entire workflow video, because it’s quite complex. You first need to configure and deploy the NCA Toolkit on Google Cloud. After binding a credit card, you can also use Google’s Pro model, which is supported. Keep monitoring your billing to avoid unexpected charges: for example, if you’re approaching your limit, shut down the project and stop the trial, then open a new one, so you can build this workflow completely for free. Also keep an eye on “Small Newsboy” for any free methods I publish; those are hands-on tutorials for claiming these credits. Okay, what if you’ve used this workflow and want to extend it? Then you need to understand it well. For example, if I want to add Google’s model, what do I do? You can modify the Fal.AI node here to switch to a different model. If I want to add ComfyUI, you swap that part over to ComfyUI. More importantly, you can also handle character voiceover generation here, and that is all achievable. Adding effects, subtitles, anything you can imagine, this workflow can achieve.
My video ends here today. If you’ve watched this far (the completion rate for my long videos is really low), I hope you’ll keep liking and commenting, and support my “Small Newsboy” and “Bimi Coffee.” I will do my best to keep focusing on high-quality workflows, and I hope that every time you watch my videos you learn something, see that I’m serious about the small details, and come away with some inspiration and a sense of accomplishment. You’re also welcome to build higher-quality workflows on top of mine and share them with me; that’s fine too. Today’s video is over: video 33, viral short videos, my first on this topic, and I will keep updating it. As for Make.com, it’s not that I don’t post about it; I only post when there are good, innovative application scenarios, and whether those appear isn’t up to me. Make.com has also shipped many new features, and I will continue to release videos on it. So this series is essentially Make.com plus N8N. I hope you enjoyed the video. See you next time.