This was a personal project from the beginning of the year. During the Christmas break, I was re-reading Michael Ende’s The Neverending Story, and with the advancement of AI I started wondering whether I could stream an actual story that never ends, one with some sort of structure that can be listened to on the go.

That’s what started “A Story That Never Ends” - I didn’t want to have any problems with copyright infringement… Haha.

What I found interesting about this was the approach I was going to take, which would help me learn a few different aspects of the AI space. I wanted it to:

Run completely locally. My own resources, not relying on third-party servers to generate the story (or any of the output).
Be narrated, so it can’t be just text.
Stream 24/7 on a platform like YouTube, with a looping video (lofi girl style). That means each episode must take less time to generate than it takes to play back.

Using local AI models can be pretty painful without some of the publicly available frameworks out there. I spent some time exploring Ollama, vLLM, LM Studio, etc. All of these tools required a very careful setup to work, so creating the right environments would be crucial. This was cool because it got me doing some Docker container development. I had used Docker before, but only to set up other people’s containers to run tools; this got me building my own.

Docker quickly became invaluable because I realized that some libraries used by the AI tools needed to generate the story would conflict with the frameworks I would use for text-to-speech. With Docker I could create a separate container for that without any conflict, and then a third one to generate the subtitles from the audio.

I had a pretty specific prompt that would tell my local LLM what the task is and what is expected of it. It would generate a story, and that would kick off the text-to-speech (OpenTTS) to generate the audio for it. Once that was done, a separate model (WhisperX) would turn that audio into useful “.srt” subtitles. Then the system would kick off another story and start again.

This was good, but there was no continuity. So I updated the prompt to include the previous X stories, so the model could continue where it left off. Then I added chapter generation. Then proper narrative structure. And so on.
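
To make the "previous X stories" idea concrete, here is a minimal sketch of a rolling context window. It is not the project's actual prompt code: `build_prompt`, `MAX_PREVIOUS`, the prompt wording and the placeholder episode text are all assumptions for illustration.

```python
from collections import deque

# Hypothetical values and names; the real prompt and the real "X" are not shown in the post.
MAX_PREVIOUS = 3  # how many earlier stories to feed back into the prompt


def build_prompt(task: str, previous: deque) -> str:
    """Combine the task instructions with the most recent stories."""
    if not previous:
        return f"{task}\n\nStart a new story."
    recap = "\n\n".join(
        f"Earlier part {i}:\n{text}" for i, text in enumerate(previous, start=1)
    )
    return f"{task}\n\n{recap}\n\nContinue where the last part left off."


# A deque with maxlen drops the oldest story automatically once it is full.
history = deque(maxlen=MAX_PREVIOUS)
for n in range(1, 6):
    prompt = build_prompt("You are a narrator.", history)
    story = f"Episode {n} text"  # stand-in for the local LLM's output
    history.append(story)
```

Capping the window keeps the prompt size bounded, which matters for a stream that is meant to run indefinitely.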

I kept iterating. There were many problems I had to resolve, like how to stop the AI from getting stuck in a loop, repeating itself too much, or telling the same story again. In the end I had a story generator with the following tasks:

Pick a story structure (for example Freytag’s Pyramid) at random. Then write an outline of 10 to 15 chapters that follows that structure, with a brief description of each chapter.
For each chapter, write a full-length story with a detailed depiction of the events. Keep track of main events, characters, plot holes, etc. If there is previous knowledge of the story events, keep it in mind for consistency.
Once the story is generated, send it off for narration, subtitles and combining as video.
When a whole story ends, repeat the process, except that in between we add a few transition episodes so the narrative moves naturally from one story to the next. This way, the story never ends; it simply follows a different path.
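
The cycle above can be sketched as an endless generator of episode jobs. This is only an outline of the scheduling logic under stated assumptions: the LLM, narration and subtitle steps are left out, `episode_plan` is a hypothetical name, the number of transition episodes is a guess, and only Freytag’s Pyramid comes from the post (the other structures are placeholders).

```python
import random
from itertools import islice

# Only "Freytag's Pyramid" is named in the post; the others are added for illustration.
STRUCTURES = ["Freytag's Pyramid", "Three-Act Structure", "Hero's Journey"]


def episode_plan(rng: random.Random, transitions: int = 2):
    """Yield (story_no, kind, index, structure) tuples forever."""
    story_no = 0
    while True:
        story_no += 1
        structure = rng.choice(STRUCTURES)   # pick a structure at random
        chapters = rng.randint(10, 15)       # outline of 10 to 15 chapters
        for i in range(1, chapters + 1):
            yield (story_no, "chapter", i, structure)
        # A few bridging episodes lead into the next story.
        for i in range(1, transitions + 1):
            yield (story_no, "transition", i, structure)


# Take the first 40 jobs from the never-ending plan.
plan = list(islice(episode_plan(random.Random(0)), 40))
```

Each tuple would then be handed to the chapter writer, followed by narration and subtitles.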

But here’s where I encountered the first problem. Generation was too slow: each video was roughly 3-5 minutes long, but generating it took much longer. So I spent some time optimizing, trying a few different approaches:
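
A quick way to reason about this constraint is to compare generation time with playback time. The sketch below is a simplified model with made-up numbers, not measurements from the project; `real_time_factor` and `buffer_after` are hypothetical helpers.

```python
def real_time_factor(generation_s: float, audio_s: float) -> float:
    """Seconds of compute spent per second of narrated audio.

    Above 1.0 means the stream plays audio faster than it is produced.
    """
    return generation_s / audio_s


def buffer_after(episodes, start_buffer_s: float = 0.0) -> float:
    """Return the queued audio (in seconds) left after generating each episode.

    `episodes` is an iterable of (generation_s, audio_s) pairs. While an
    episode is being generated the stream keeps playing, so the buffer
    shrinks by generation_s and then grows by audio_s once it is done.
    A negative result means the stream would have run dry.
    """
    buffer_s = start_buffer_s
    for generation_s, audio_s in episodes:
        buffer_s += audio_s - generation_s
    return buffer_s


# Illustrative only: a 4-minute episode that takes 6 minutes to generate.
rtf = real_time_factor(360, 240)
remaining = buffer_after([(360, 240)] * 3, start_buffer_s=600)
```

With these example numbers every episode eats two minutes of buffer, so a ten-minute head start is down to four minutes after three episodes. The optimizations need to bring the factor below 1.0 for the stream to run indefinitely.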

Lighter Models: It was a fine line between “quick models vs good story generator vs good memory management”
- I was lucky that I had the RTX A6000 I had won in the NVIDIA Omniverse contest, so I was able to find a pretty sweet spot with deepseek-r1:32b-qwen-distill-q4_K_M
Split GPUs: This was a pretty useful time saver. Since getting the A6000, I put my 3080 in a box and never used it again. So I used this opportunity to try and run a dual GPU system where one card would do the story generation, and the other would take care of other tasks in parallel. It took some work to get it fully working (using Docker containers helped big time) but it saved a ton of time.
Skip Video Generation: Originally, the video was also being put together during generation, with the crops timed so that one video would seamlessly connect with the next. Knowing that I had a video that looped perfectly on its own, I realized I could take a different approach. All I had to do was set OBS to play that video on a loop, add the audio and subtitles, and stream that to YouTube.
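
The split-GPU idea amounts to a two-stage pipeline: while one stage narrates a chapter, the other is already writing the next one. Here is a minimal sketch using threads and a queue. In the real project the stages ran in separate Docker containers, each on its own card, so the functions below are stand-ins and all names are hypothetical.

```python
import queue
import threading


def generate_story(n: int) -> str:
    """Stand-in for the LLM stage (assumed to run on the first GPU)."""
    return f"chapter {n} text"


def narrate(text: str) -> str:
    """Stand-in for text-to-speech and subtitles (assumed to run on the second GPU)."""
    return f"audio for {text}"


def writer(out_q: queue.Queue, count: int) -> None:
    for n in range(1, count + 1):
        out_q.put(generate_story(n))  # blocks if the narrator falls too far behind
    out_q.put(None)  # sentinel: nothing more to narrate


def narrator(in_q: queue.Queue, results: list) -> None:
    while (text := in_q.get()) is not None:
        results.append(narrate(text))


handoff = queue.Queue(maxsize=2)  # small buffer between the two stages
finished = []
threads = [
    threading.Thread(target=writer, args=(handoff, 5)),
    threading.Thread(target=narrator, args=(handoff, finished)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The bounded queue keeps the writer from racing far ahead of the narrator, and the single consumer preserves chapter order.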

And after loads of tweaking… Voila! I had my first stream!

This ran for nearly 2 months, and I would listen to it while driving, on walks, etc. I realized that a constant stream covers so much ground that by the next day you can’t tell where the story is up to. There’s a lot I felt I could do to improve, from the story generation to the audio output, but in the end I decided to pause this because my electricity bill was going through the roof…

If you’d like to see the code, find it in my public-portfolio-repo. This one is somewhat intact (though I did hide some personal data that was hardcoded into the scripts).

One day I reckon I’ll pick it up again. Here’s a segment of the stories that were generated and streamed live on YouTube: