Project V: Generating Images with the Variational Auto-Encoder

December 2, 2023

Project Steps:

Data Collection: Download and prepare the CelebA Faces dataset, which supplies the celebrity face images used to train the VAE model.
Data Preprocessing: Resize images if necessary, normalize pixel values, and split the data into training and testing sets.
VAE Architecture Creation: Define the architecture of the Variational Autoencoder, including the encoder and decoder components. The encoder compresses each input image into the parameters (mean and variance) of a latent distribution, while the decoder maps latent vectors sampled from that distribution back to images.
Loss Function Definition: Establish a custom loss function for the VAE that combines a reconstruction loss with a Kullback-Leibler (KL) divergence loss. The reconstruction loss measures how closely the decoder's output matches the input image, while the KL loss encourages the latent distribution to stay close to a standard normal distribution.
VAE Training: Train the VAE model on the CelebA Faces training dataset. Monitor loss during training and adjust hyperparameters if necessary.
Image Generation: Once the VAE model is trained, use it to generate new celebrity face images. Sample points from the latent space and pass them through the decoder to produce different faces.
Evaluation: Assess image quality through visual inspection and metrics such as the Structural Similarity Index (SSIM). Because SSIM compares an image against a reference, it applies to reconstructions of test images rather than to faces sampled from scratch.
Results Display: Arrange the generated celebrity faces in an image grid to visualize the results.
Optimization and Iterations: Iterate on training, architecture, and hyperparameters to enhance the quality of generated images.
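
The loss described in the steps above can be sketched in a few lines. This is a minimal NumPy illustration, not the project's actual implementation: the function names, the use of squared error for the reconstruction term, the log-variance parameterization, and the kl_weight parameter are all assumptions made for the example. A real training loop would compute the same quantities in a deep learning framework so that gradients are available.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions, one value per sample."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)


def reconstruction_loss(x, x_hat):
    """Sum of squared pixel errors per sample (assumed choice; binary cross-entropy is another)."""
    diff = (x - x_hat).reshape(x.shape[0], -1)
    return np.sum(diff ** 2, axis=1)


def vae_loss(x, x_hat, mu, log_var, kl_weight=1.0):
    """Batch mean of reconstruction loss plus weighted KL loss."""
    per_sample = reconstruction_loss(x, x_hat) + kl_weight * kl_divergence(mu, log_var)
    return float(np.mean(per_sample))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in data: 4 "images" of 8x8x3 and a 16-dimensional latent space (both hypothetical).
    x = rng.random((4, 8, 8, 3))
    x_hat = rng.random((4, 8, 8, 3))
    mu = rng.standard_normal((4, 16))
    log_var = rng.standard_normal((4, 16))
    z = reparameterize(mu, log_var, rng)
    print("latent sample shape:", z.shape)
    print("loss:", vae_loss(x, x_hat, mu, log_var))
```

The kl_weight parameter is one of the hyperparameters that could be adjusted while iterating: a larger value pulls the latent distribution closer to the standard normal, usually at some cost in reconstruction sharpness.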

CelebA Faces Dataset:

The CelebA Faces dataset is widely used for face attribute recognition and face image generation tasks. It comprises more than 200,000 celebrity face images covering over 10,000 identities, which makes it a valuable resource for diverse image generation. Key features of the dataset include its size, its diverse facial characteristics, the annotations accompanying each image (40 binary attribute labels and 5 landmark locations), and its JPG image format.
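
Once the JPG files are decoded into arrays, the array-handling steps at either end of the pipeline (preprocessing and results display) might look like the sketch below. It is illustrative only: the helper names, the 64x64 image size, and the 20% test fraction are assumptions, and a random uint8 batch stands in for real CelebA images so the example runs without the dataset.

```python
import numpy as np


def normalize_pixels(images):
    """Convert uint8 pixel values in [0, 255] to float32 values in [0, 1]."""
    return images.astype(np.float32) / 255.0


def train_test_split(images, test_fraction=0.2, seed=0):
    """Shuffle along the first axis and split into (train, test) arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    n_test = int(round(len(images) * test_fraction))
    return images[order[n_test:]], images[order[:n_test]]


def make_grid(images, n_cols):
    """Tile (N, H, W, C) images into one (rows*H, n_cols*W, C) array, zero-padding empty cells."""
    n, h, w, c = images.shape
    n_rows = -(-n // n_cols)  # ceiling division
    padded = np.zeros((n_rows * n_cols, h, w, c), dtype=images.dtype)
    padded[:n] = images
    # (rows, cols, H, W, C) -> (rows, H, cols, W, C) -> (rows*H, cols*W, C)
    grid = padded.reshape(n_rows, n_cols, h, w, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(n_rows * h, n_cols * w, c)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for a batch of decoded, resized face images.
    raw = rng.integers(0, 256, size=(10, 64, 64, 3), dtype=np.uint8)
    train, test = train_test_split(normalize_pixels(raw))
    grid = make_grid(train, n_cols=4)
    print(train.shape, test.shape, grid.shape)
```

The resulting grid array can be passed to any image viewer or plotting library; after training, the same helper would tile decoder outputs instead of training images.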

Link to GitHub Repository