
πŸ–ΌοΈ RealFill: Reproduction, Exploration, and Improvement

APAI3010/STAT3010 Image Processing and Computer Vision Group Project (Spring 2025), The University of Hong Kong

This repository contains the code and resources for our group project focused on the paper "RealFill: Reference-Driven Generation for Authentic Image Completion" by Tang et al. (SIGGRAPH 2024). Our objective was to reproduce the core results, analyze the method's strengths and weaknesses, and explore potential extensions.



🎯 Project Overview

Image completion, particularly achieving authentic results faithful to the original scene, is a challenging task. RealFill tackles this by fine-tuning a diffusion inpainting model (Stable Diffusion v2 Inpainting) using a small set of reference images and Low-Rank Adaptation (LoRA).

This project involved:

  1. Reproduction: Implementing the RealFill pipeline and reproducing key results from the paper on the RealBench dataset.
  2. Exploration & Analysis: Evaluating the reproduced model’s performance, identifying limitations (especially concerning geometric consistency and computational cost), and testing on custom real-world data.
  3. Extension (ReFill): Proposing and implementing a two-stage iterative refinement process ("ReFill") that uses LoFTR-ranked generated images as augmented references, inspired by related work such as FaithFill.
  4. Benchmarking: Developing a comprehensive benchmarking suite to evaluate image completion quality using various metrics (PSNR, SSIM, LPIPS, DreamSim, DINO, CLIP).

πŸ§‘β€πŸ’» Team Members

| Name | UID | Profile |
|------|-----|---------|
| Cheng Ho Ming | 3036216734 | Cheng Ho Ming |
| Chung Shing Hei | 3036216760 | Chung Shing Hei |
| Chan Hin Chun | 3036218017 | Chan Hin Chun |

(See Appendix A.1 in the Project Report for a detailed breakdown of contributions.)

πŸ“ Repository Structure

β”œβ”€β”€ benchmark/                     # Scripts for individual metric calculations (PSNR, SSIM, LPIPS, etc.)
β”œβ”€β”€ data/                          # Placeholder for example data (full datasets usually downloaded separately)
β”œβ”€β”€ project_documents/             # Contains the final report LaTeX template
β”œβ”€β”€ README-Realfill.md             # Original README from the forked base repository
β”œβ”€β”€ LICENSE                        # MIT License file covering base code and our modifications
β”œβ”€β”€ benchmarks.py                  # Main script to orchestrate metric calculation and analysis
β”œβ”€β”€ infer.py                       # Script for running inference with a trained RealFill model
β”œβ”€β”€ loftr_ranking.py               # Script for ranking images based on LoFTR correspondences
β”œβ”€β”€ requirements.txt               # Core dependencies for training and inference
β”œβ”€β”€ requirements-benchmarks.txt    # Additional dependencies for the benchmarking suite
β”œβ”€β”€ train_realfill.ipynb           # Jupyter Notebook for running experiments (primarily on Google Colab)
β”œβ”€β”€ train_realfill.py              # Python script for training/fine-tuning the RealFill model

βš™οΈ Setup

  1. Clone the Repository:

     git clone https://github.com/eric15342335/realfill
     cd realfill
    
  2. Create a Virtual Environment (Recommended):

     python -m venv .venv
     source .venv/bin/activate  # Linux/macOS
     # .venv\Scripts\activate  # Windows
    
  3. Install Dependencies:
    • For Training & Inference:

        # Using pip:
        pip install -r requirements.txt
        # Or using the faster uv:
        # uv pip install -r requirements.txt
      
    • For Benchmarking:

        # Using pip:
        pip install -r requirements-benchmarks.txt
        # Or using uv:
        # uv pip install -r requirements-benchmarks.txt
      

    ⚠️ GPU Acceleration (PyTorch): The requirements.txt file installs the CPU-only version of PyTorch by default to ensure basic compatibility. For GPU acceleration (highly recommended for training, and for faster inference and benchmarking), you must manually install the GPU-enabled PyTorch build matching your CUDA version after installing the requirements. For example, with CUDA 12.8:

        pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

    Visit the official PyTorch installation guide for instructions.

  4. Dataset:
    • The experiments primarily use the RealBench dataset subset provided by the original RealFill authors and our custom dataset.
    • The train_realfill.ipynb notebook includes cells to download and extract the necessary datasets (realfill_data_release_full.zip, jensen_images.zip) within the Colab environment. Follow instructions there.
    • For local use, download these datasets manually and place them in the appropriate directory structure (e.g., ./realfill_data_release_full/, ./jensen_images/).

▶️ Usage

The primary workflow for this project was developed and tested using the train_realfill.ipynb notebook, especially on Google Colab. We recommend using it for reproducing experiments.

Alternatively, you can use the Python scripts directly as follows:

1. Training / Fine-tuning

Running on Low-Memory GPUs (e.g., Google Colab T4, 16GB VRAM) 🔋

Training RealFill typically requires significant VRAM. To run fine-tuning successfully on memory-limited hardware such as the 16GB GPUs available on the Google Colab free tier, several optimizations are essential:

  1. Mixed Precision: Enable FP16 mixed precision via Accelerate configuration or by passing --mixed_precision=fp16 (if overriding config).
  2. 8-bit Adam Optimizer: Use the memory-efficient Adam variant via --use_8bit_adam. Requires bitsandbytes.
  3. xFormers: Enable memory-efficient attention mechanisms with --enable_xformers_memory_efficient_attention. Requires xformers.
  4. Set Grads to None: Further reduce memory by setting gradients to None instead of zeroing them using --set_grads_to_none.
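
Combined, these flags might look like the following invocation. This is a sketch: the dataset path, output directory, and hyperparameter values are illustrative placeholders rather than our exact configuration, so check `python train_realfill.py --help` for the authoritative argument list.

```shell
# Illustrative low-VRAM fine-tuning run; paths and hyperparameter
# values are placeholders, not our exact experimental configuration.
accelerate launch train_realfill.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-inpainting" \
  --train_data_dir="./realfill_data_release_full/RealBench/0" \
  --output_dir="./realfill-model" \
  --resolution=512 \
  --train_batch_size=16 \
  --mixed_precision=fp16 \
  --use_8bit_adam \
  --enable_xformers_memory_efficient_attention \
  --set_grads_to_none
```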

Monitoring Training with TensorBoard 📊

Monitoring the training process is crucial, especially to see how well the model is learning to inpaint the target region during fine-tuning. We incorporated TensorBoard logging for this purpose.
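
Assuming the run writes its event files under the training output directory (the exact log location depends on your Accelerate/TensorBoard configuration), the dashboard can be launched with:

```shell
# Point TensorBoard at the training output directory; the path is a
# placeholder for wherever your run writes its event files.
tensorboard --logdir ./realfill-model/logs
```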

2. Inference
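
A typical invocation might look like the following; the flag names are indicative of such an inference script's interface rather than guaranteed, so consult `python infer.py --help` for the actual options.

```shell
# Sketch of an inference run; flag names and paths are assumptions,
# verify them against `python infer.py --help`.
python infer.py \
  --model_path="./realfill-model" \
  --validation_image="./data/target/target.png" \
  --validation_mask="./data/target/mask.png" \
  --output_dir="./results"
```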

3. Benchmarking 📈
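
benchmarks.py orchestrates the per-metric scripts in benchmark/. Since the exact flags are defined in the script itself, start from its help text:

```shell
# List the benchmarking suite's options (PSNR, SSIM, LPIPS, DreamSim,
# DINO, CLIP) before running a full evaluation.
python benchmarks.py --help
```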

4. LoFTR Ranking/Selection
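
loftr_ranking.py scores candidate images by their LoFTR correspondences with the reference set; its exact interface is defined in the script, so inspect its options first:

```shell
# Show the ranking script's options; it orders generated images by
# LoFTR correspondence strength against the original references.
python loftr_ranking.py --help
```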

✨ Our Extension: ReFill

Based on our analysis and inspiration from concurrent work FaithFill, we proposed ReFill. The core idea is a two-stage iterative refinement:

  1. Run the standard RealFill fine-tuning and inference process.
  2. Use loftr_ranking.py to identify the best generated images from step 1 based on correspondence with the original references.
  3. Augment the original reference set with these top-ranked generated images (up to a limit, e.g., 5 total references).
  4. Perform a second RealFill fine-tuning pass using this augmented reference set.
  5. Run inference again using the model from step 4.
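
The five steps above can be sketched as a shell pipeline; every path and flag below is illustrative, and the full flag sets are elided:

```shell
# Stage 1: standard RealFill fine-tuning and inference.
accelerate launch train_realfill.py --output_dir ./stage1   # plus the usual flags
python infer.py                                             # generate candidates

# Rank Stage-1 outputs by LoFTR correspondence with the references.
python loftr_ranking.py                                     # pick the top-ranked images

# Stage 2: augment the reference set (up to 5 images total) with the
# top-ranked outputs, then fine-tune and run inference again.
accelerate launch train_realfill.py --output_dir ./stage2
python infer.py
```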

The hypothesis was that adding high-quality, view-diverse generated references could improve the model’s understanding of the scene geometry and lead to more authentic completions.

(See Section 4 in the Project Report for implementation details and results.)

📊 Results

Detailed quantitative results, qualitative examples, comparisons, and analysis of both the baseline RealFill reproduction and our ReFill extension can be found in the Project Report.

Key findings, including the baseline's limitations around geometric consistency and computational cost, are discussed in full in the report.

📄 License

This repository builds upon the unofficial RealFill implementation by thuanz123, which is licensed under MIT.

Our project, including all modifications, extensions (ReFill), benchmarking suite, and custom code, is also released under the MIT License. This permits anyone to use, modify, and distribute this software, provided the original copyright notice and permission notice are included.

See the LICENSE file for full details.

πŸ™ Acknowledgements

