
    Prometheus

    The pursuit of digital immortality.

    Purpose

    Project Prometheus is about creating a digital twin: an AI that captures who I am, not just my mannerisms. The central question is how close we can get to an AI that is indistinguishable from talking to the real me.

    Model and Technology Design

    The 'brain' is Hermes, a language model for conversations. But a digital twin also needs my voice, so I'm testing Text-to-Speech (TTS) technologies, such as Tortoise TTS and Coqui TTS, to replicate it.
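    The split between a conversational 'brain' and a voice can be pictured as a two-stage pipeline: a message goes to the language model, and its text reply goes to the TTS engine. This is a minimal, hypothetical sketch; `LanguageModel`, `SpeechSynthesizer`, `DigitalTwin`, and the stub classes are illustrative names, not part of Hermes or any TTS library. A real setup would put Hermes and a TTS backend behind these two interfaces.

```python
from typing import Protocol


class LanguageModel(Protocol):
    """Anything that turns a user message into a text reply (e.g. Hermes)."""

    def reply(self, message: str) -> str: ...


class SpeechSynthesizer(Protocol):
    """Anything that turns text into audio in the cloned voice."""

    def synthesize(self, text: str) -> bytes: ...


class DigitalTwin:
    """Chains the text 'brain' and the voice: message -> reply text -> audio."""

    def __init__(self, brain: LanguageModel, voice: SpeechSynthesizer) -> None:
        self.brain = brain
        self.voice = voice

    def respond(self, message: str) -> tuple[str, bytes]:
        text = self.brain.reply(message)
        audio = self.voice.synthesize(text)
        return text, audio


# Hypothetical stand-ins so the sketch runs without any model installed.
class EchoBrain:
    def reply(self, message: str) -> str:
        return f"You said: {message}"


class FakeVoice:
    def synthesize(self, text: str) -> bytes:
        # A real TTS backend would return WAV/PCM audio here.
        return text.encode("utf-8")


twin = DigitalTwin(EchoBrain(), FakeVoice())
text, audio = twin.respond("Hello")
print(text)        # You said: Hello
print(len(audio))  # 15
```

    Keeping the two stages behind small interfaces means a TTS engine can be swapped (for example after a model comparison) without touching the conversational side.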

    Training Data

    TTS models learn from audio samples. I recorded samples covering different sounds, intonations, and expressions to capture how I speak. The recording includes every letter sound of the alphabet, plus my name, as a baseline for training.
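    Before recording, a quick check can confirm that a script touches every letter of the alphabet. This is a minimal sketch, assuming the script is plain English text; the example sentence and the `<NAME>` placeholder are hypothetical. Letters are only a rough proxy for sounds, because English spelling maps loosely to pronunciation, so a phoneme-level check would need a pronunciation dictionary.

```python
import string


def missing_letters(script: str) -> set[str]:
    """Return the lowercase ASCII letters that never appear in the script."""
    present = {ch for ch in script.lower() if ch in string.ascii_lowercase}
    return set(string.ascii_lowercase) - present


# Hypothetical recording script: a pangram plus a placeholder for the speaker's name.
script = "The quick brown fox jumps over the lazy dog. My name is <NAME>."
print(sorted(missing_letters(script)))  # []

# A script that needs more lines before it covers the alphabet.
print(sorted(missing_letters("Hello world")))
```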

    Training Input Audio
    Training Data Sample

    TTS Model Comparisons

    Finding the right TTS model took time. High-quality commercial options tend to be closed-source and expensive, and good open-source alternatives are rare. After some research, I found a few worth testing.

    Coqui TTS

    Coqui TTS is an open-source library with models like Tacotron, VITS, and Glow-TTS. It supports voice cloning, multilingual speech, and custom training. It has an active community and works for both testing and production.

    Coqui TTS Output

    E2/F5 TTS

    E2/F5 TTS refers to two related models, E2 TTS and F5-TTS, that focus on voice cloning. Their 'one-shot' cloning can replicate a voice from a short audio sample.
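    Because one-shot cloning depends on a single short reference clip, it helps to check the clip's length before feeding it in. This is a minimal sketch using Python's standard `wave` module, assuming an uncompressed PCM WAV file. The 3 to 15 second bounds and the helper names are hypothetical placeholders, not limits documented for E2 or F5; the silent test clip only exists so the example runs without a recording.

```python
import io
import wave


def clip_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a PCM WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(wav_bytes: bytes, min_s: float = 3.0, max_s: float = 15.0) -> bool:
    """Hypothetical bounds: long enough to carry the voice, short enough to stay a 'short sample'."""
    return min_s <= clip_seconds(wav_bytes) <= max_s


def silent_wav(seconds: float, rate: int = 24_000) -> bytes:
    """Build a mono 16-bit silent WAV in memory so the example needs no file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(clip_seconds(silent_wav(6.0)))         # 6.0
print(is_usable_reference(silent_wav(6.0)))  # True
print(is_usable_reference(silent_wav(1.0)))  # False
```

    With a real recording, the bytes would come from something like `open("reference.wav", "rb").read()`, where the file name is hypothetical.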

    E2/F5 TTS Output

    Resemble.ai

    Resemble.ai is a voice synthesis platform with cloning capabilities. It can clone voices from minimal data and focuses on realistic, expressive output.

    Resemble.ai Output

    E2/F5 TTS: The Winner

    Of the models tested, E2/F5 TTS works best for voice cloning. It captures my voice accurately from minimal input and avoids common accent artifacts.

    Digital Likeness

    With voice synthesis working, the next step is visual representation. The goal is a 3D avatar that resembles me and can be animated, which a complete digital twin requires.

    3D Head Scan with RealityScan

    For the 3D model, I'm using RealityScan, an app by Epic Games that creates 3D models from smartphone photos using photogrammetry.
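    Photogrammetry rebuilds geometry from many overlapping photos of the same subject, so even coverage around the head matters more than any single shot. The page doesn't describe a capture pattern; this is a hypothetical planning aid, assuming the head sits at the origin and photos are taken on a few horizontal rings around it. The radius, ring heights, and 24 shots per ring are placeholder values, not RealityScan recommendations.

```python
import math

Point3 = tuple[float, float, float]


def orbit_positions(radius: float, heights: list[float], shots_per_ring: int) -> list[Point3]:
    """Evenly spaced camera positions on horizontal rings around a subject at the origin."""
    positions = []
    for h in heights:
        for k in range(shots_per_ring):
            angle = 2 * math.pi * k / shots_per_ring
            positions.append((radius * math.cos(angle), radius * math.sin(angle), h))
    return positions


def step_degrees(shots_per_ring: int) -> float:
    """Angle between neighbouring shots on one ring; smaller steps give more overlap."""
    return 360 / shots_per_ring


# Hypothetical plan: three rings (below, level with, and above the face), 0.8 m away.
plan = orbit_positions(0.8, [-0.3, 0.0, 0.3], 24)
print(len(plan), step_degrees(24))  # 72 15.0
```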

    Creating a digital human from scratch is complex, so I'm using game development tools to build a facial mesh that supports lip-syncing and expressions.
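    The page doesn't say how the facial mesh will be rigged. One common game-engine technique for lip-sync and expressions is blendshapes (also called morph targets): each expression is stored as a displaced copy of the neutral mesh, and the animated face is the neutral mesh plus a weighted sum of those displacements. This is a minimal sketch of that technique, assuming a mesh given as a list of vertex positions; the `jaw_open` target and the two-vertex "mouth" are hypothetical.

```python
Vec3 = tuple[float, float, float]


def apply_blendshapes(
    neutral: list[Vec3],
    targets: dict[str, list[Vec3]],
    weights: dict[str, float],
) -> list[Vec3]:
    """Linear blendshapes: vertex = neutral + sum_k w_k * (target_k - neutral)."""
    result = []
    for i, (x, y, z) in enumerate(neutral):
        dx = dy = dz = 0.0
        for name, w in weights.items():
            tx, ty, tz = targets[name][i]
            dx += w * (tx - x)
            dy += w * (ty - y)
            dz += w * (tz - z)
        result.append((x + dx, y + dy, z + dz))
    return result


# Toy "mouth": upper-lip and lower-lip vertices, plus one hypothetical target shape.
neutral = [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
targets = {"jaw_open": [(0.0, 1.5, 0.0), (0.0, -2.0, 0.0)]}

# Half-open jaw: each vertex moves halfway toward the target.
print(apply_blendshapes(neutral, targets, {"jaw_open": 0.5}))
# [(0.0, 1.25, 0.0), (0.0, -1.5, 0.0)]
```

    For lip-sync, an animation system would drive weights like these over time from the audio, one target per mouth shape.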

    AI-Generated Digital Twin with Google VEO3

    I also tested Google's VEO3 video generation model. It generates video from text prompts and reference images, offering another way to create a moving digital version of me.


    The video shows what VEO3 generated from my reference images.