Monday, August 25, 2025

My Colab text-to-video prompt


My First Try with Image-to-Video AI on Google Colab

Today I tried something exciting: setting up my own image-to-video AI workflow on Google Colab. My goal was simple: upload an image, write a prompt describing what I wanted, and let the AI generate either a new image aligned with that prompt or even a short video based on it.

The idea behind this is powerful. Imagine taking a single picture and transforming it into a living, moving story, all with the help of AI. Many platforms already provide this kind of service, but most of them are subscription-based and costly. Since I could not afford those, I decided to explore the free alternative that Colab offers.

The results were interesting. The AI did manage to create images and even videos based on my input, but they were far from perfect. The original image wasn’t always preserved well — faces and objects would sometimes distort, or the AI would hallucinate completely new details that weren’t there before. While this can look creative in its own way, it wasn’t what I was aiming for.
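One likely reason the source photo drifted is that, in the script further below, the uploaded image is only resized and never actually passed to either pipeline, so the generations start from the prompt alone. Here is a minimal sketch of how the picture could be kept in the loop using diffusers' StableDiffusionImg2ImgPipeline; the file name, strength value, and prompt are illustrative assumptions, not what I actually ran.

# Sketch: img2img keeps the uploaded photo as the starting point (illustrative, not the code I ran)
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"   # same base model as in the main script
).to("cuda")

init_image = Image.open("my_photo.png").convert("RGB").resize((512, 512))  # hypothetical file name

result = img2img(
    prompt="A realistic 19th-century oil painting of the same woman, same pose and face",
    image=init_image,
    strength=0.4,               # lower strength = stay closer to the original image
    num_inference_steps=25,
).images[0]
result.save("painting_from_photo.png")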

Still, the experience taught me a lot. Running such models on free resources like Colab is not easy — performance is limited, and results can be unpredictable. But it also opened up exciting possibilities. With refinement, better models, and more resources, these AI tools could become truly impressive.
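Most of that unpredictability on the free tier comes down to GPU memory. A couple of standard diffusers options help keep Stable Diffusion within the free GPU's limits; this is a hedged sketch of those switches, not necessarily what the Gemini-adjusted version used.

# Sketch: common memory-saving options for Stable Diffusion on a free Colab GPU
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,      # half precision roughly halves GPU memory use
).to("cuda")
pipe.enable_attention_slicing()     # trade a little speed for a lower peak memory footprint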

For now, I’m happy I managed to bring my idea to life, even in this rough version. This is just the beginning, and I’ll try it again in the future. It was particularly helpful that Gemini on Colab adjusted the code and solved the issues, allowing it to run successfully.


Here’s the Google Colab Python code snippet, with the prompts for the planned image and video.

(This version is the one adjusted by Gemini on Colab, so it runs successfully.)


# --- Setup (for Google Colab with GPU) ---
!pip install diffusers transformers accelerate safetensors torch controlnet_aux modelscope --quiet

import torch
from diffusers import StableDiffusionPipeline
from PIL import Image
from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys
from google.colab import files

# --- Upload your image ---
uploaded = files.upload()
input_image_path = list(uploaded.keys())[0]
image = Image.open(input_image_path).convert("RGB")
image = image.resize((512, 512))

# --- Load pipelines ---
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")
video_pipe = pipeline('text-to-video-synthesis', 'damo-vilab/text-to-video-ms-1.7b')

# --- Prompt for historical painting ---
prompt = (
    "A realistic 19th-century painting of a historical woman, "
    "sitting at a table, calm expression, elegant historical attire, "
    "cinematic lighting, highly detailed, classic oil painting style"
)

# --- Generate Image ---
result = pipe(prompt, num_inference_steps=25).images[0]
result.save("historical_painting.png")
print("✅ Historical painting generated and saved as historical_painting.png")

# --- Prompt for video animation ---
video_prompt = (
    "A short cinematic animation of the historical woman in 19th-century attire, "
    "classical painting style, sitting calmly at a table, elegant lighting"
)

# --- Generate Video ---
video_result = video_pipe({'text': video_prompt})
with open('historical_video_painting.mp4', 'wb') as f:
    f.write(video_result[OutputKeys.OUTPUT_VIDEO])
print("✅ Historical painting video generated and saved as historical_video_painting.mp4")
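To preview the result without downloading it, the saved file can be shown inline in the notebook. A small sketch, assuming the video was written to historical_video_painting.mp4 as above:

# Sketch: display the generated MP4 inline in the Colab notebook
import base64
from IPython.display import HTML

with open("historical_video_painting.mp4", "rb") as f:
    video_b64 = base64.b64encode(f.read()).decode()

HTML(f'<video width="512" controls src="data:video/mp4;base64,{video_b64}"></video>')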
