# APAI3010/STAT3010 Image Processing and Computer Vision - Group Project (Spring 2025)

The University of Hong Kong
This repository contains the code and resources for our group project on the paper "RealFill: Reference-Driven Generation for Authentic Image Completion" by Tang et al. (SIGGRAPH 2024). Our objective was to reproduce the core results, analyze the method's strengths and weaknesses, and explore potential extensions.
Image completion, particularly achieving authentic results faithful to the original scene, is a challenging task. RealFill tackles this by fine-tuning a diffusion inpainting model (Stable Diffusion v2 Inpainting) using a small set of reference images and Low-Rank Adaptation (LoRA).
This project involved reproducing the core results, benchmarking the method, and exploring an extension (ReFill).

## Team
| Name | UID | Profile |
| --- | --- | --- |
| Cheng Ho Ming | 3036216734 | |
| Chung Shing Hei | 3036216760 | |
| Chan Hin Chun | 3036218017 | |
(See Appendix A.1 in the Project Report for a detailed breakdown of contributions.)
## Repository Structure

```text
├── benchmark/                     # Scripts for individual metric calculations (PSNR, SSIM, LPIPS, etc.)
├── data/                          # Placeholder for example data (full datasets usually downloaded separately)
├── project_documents/             # Contains the final report LaTeX template
├── README-Realfill.md             # Original README from the forked base repository
├── LICENSE                        # MIT License file covering base code and our modifications
├── benchmarks.py                  # Main script to orchestrate metric calculation and analysis
├── infer.py                       # Script for running inference with a trained RealFill model
├── loftr_ranking.py               # Script for ranking images based on LoFTR correspondences
├── requirements.txt               # Core dependencies for training and inference
├── requirements-benchmarks.txt    # Additional dependencies for the benchmarking suite
├── train_realfill.ipynb           # Jupyter Notebook for running experiments (primarily on Google Colab)
└── train_realfill.py              # Python script for training/fine-tuning the RealFill model
```
## Installation

**Clone the repository:**

```bash
git clone https://github.com/eric15342335/realfill
cd realfill
```
**Create a virtual environment (recommended):**

```bash
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows
```
**Install dependencies for training & inference:**

```bash
# Using pip:
pip install -r requirements.txt
# Or using the faster uv:
# uv pip install -r requirements.txt
```
**Install additional dependencies for benchmarking:**

```bash
# Using pip:
pip install -r requirements-benchmarks.txt
# Or using uv:
# uv pip install -r requirements-benchmarks.txt
```
> ⚠️ **GPU Acceleration (PyTorch):** `requirements.txt` installs the CPU-only build of PyTorch by default to ensure basic compatibility. For GPU acceleration (highly recommended for training and faster inference/benchmarking), you must manually install the GPU-enabled PyTorch build matching your CUDA version after installing the requirements. For example, if you have CUDA 12.8 installed, run:
>
> ```bash
> pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
> ```
Visit the official PyTorch installation guide for instructions.
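To confirm the GPU build is active after installation, a quick sanity check (assuming PyTorch is installed in the current environment):

```bash
# Prints the installed PyTorch version and "True" if CUDA is usable:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```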
## Datasets

We use the `RealBench` dataset subset provided by the original RealFill authors, plus our own custom dataset.

- **On Colab:** the `train_realfill.ipynb` notebook includes cells to download and extract the necessary datasets (`realfill_data_release_full.zip`, `jensen_images.zip`) within the Colab environment. Follow the instructions there.
- **Locally:** extract the archives into the repository root (`./realfill_data_release_full/`, `./jensen_images/`), as sketched below.
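A minimal local-setup sketch, assuming you have already obtained the two archives (on Colab the notebook downloads them automatically):

```bash
# Extract both archives into the repository root so the paths used by the
# training/inference commands below resolve correctly:
unzip realfill_data_release_full.zip -d .
unzip jensen_images.zip -d .
```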
## Usage

The primary workflow for this project was developed and tested using the `train_realfill.ipynb` notebook, especially on Google Colab. We recommend using it for reproducing experiments.
Alternatively, you can use the Python scripts directly as follows:
### Training

Fine-tuning is performed by the `train_realfill.py` script, launched via `accelerate`. Key arguments:

- `--pretrained_model_name_or_path`: base model (e.g., `stabilityai/stable-diffusion-2-inpainting`).
- `--train_data_dir`: path to the specific scene directory.
- `--output_dir`: where to save LoRA checkpoints/model.

Training RealFill typically requires significant VRAM. To run fine-tuning on hardware with limited memory, such as the 16 GB GPUs available on the Google Colab free tier, several optimizations are essential:
- **FP16 mixed precision:** `--mixed_precision=fp16` (if overriding the `accelerate` config).
- **8-bit Adam optimizer:** `--use_8bit_adam` (requires `bitsandbytes`).
- **Memory-efficient attention:** `--enable_xformers_memory_efficient_attention` (requires `xformers`).
- **Gradients set to `None`:** `--set_grads_to_none` sets gradients to `None` instead of zeroing them.

**Example command** (adapted from our Colab setup for RealBench scene 23). This command incorporates the necessary flags for low-memory training and includes monitoring/checkpointing flags (see the next section):
```bash
# --- Set Environment Variables ---
export MODEL_NAME="stabilityai/stable-diffusion-2-inpainting"
export BENCHMARK="RealBench"
export DATASET_NUMBER=23
export TRAIN_DIR="realfill_data_release_full/$BENCHMARK/$DATASET_NUMBER"
export OUTPUT_DIR="$BENCHMARK-$DATASET_NUMBER-model" # Example output dir

# --- Launch Training ---
accelerate launch train_realfill.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR \
  --output_dir=$OUTPUT_DIR \
  --resolution=512 \
  --train_batch_size=16 \
  --gradient_accumulation_steps=1 \
  --use_8bit_adam `# Use 8-bit Adam` \
  --enable_xformers_memory_efficient_attention `# Use xFormers` \
  --set_grads_to_none `# Set Grads to None` \
  --unet_learning_rate=2e-4 \
  --text_encoder_learning_rate=4e-5 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=100 \
  --max_train_steps=2000 \
  --lora_rank=8 \
  --lora_dropout=0.1 \
  --lora_alpha=16 \
  --mixed_precision=fp16 `# Explicitly set mixed precision` \
  --resume_from_checkpoint="latest" `# Resume if checkpoints exist` \
  --report_to tensorboard `# Enable TensorBoard logging` \
  --checkpointing_steps 200 `# Save checkpoint every 200 steps` \
  --validation_steps 100 `# Run validation every 100 steps` \
  --num_validation_images 4 `# Generate 4 validation images`
```
Some options worth considering:

- `--gradient_checkpointing`: enables gradient checkpointing to save memory at the cost of speed. Useful if you are trying to run on a GPU with less VRAM than a Google Colab T4 (16 GB); see the sketch after this list.
- `--allow_tf32`: enables TensorFloat-32 (TF32) on NVIDIA Ampere GPUs (e.g., A100, RTX 30/40 series). This can improve performance but is not available on all GPUs.
- `--mixed_precision=bf16`: uses Brain Float 16 (BF16) precision if supported by your hardware. BF16 is generally more numerically robust than FP16 but requires specific GPU support (e.g., A100, H100); the Google Colab T4 does not support BF16.

Note that by default `--train_batch_size` has no effect if the value is larger than the number of available images in the training set (reference & target images). If you have access to hardware with more VRAM, consider `--pad_to_full_batch` to pad the input batch to the full batch size.
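For instance, a lower-VRAM variant of the training command above might look like the following sketch (same environment variables as before; the batch size and step count here are illustrative, not tuned values):

```bash
accelerate launch train_realfill.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR \
  --output_dir=$OUTPUT_DIR \
  --resolution=512 \
  --train_batch_size=4 \
  --gradient_checkpointing `# Trade speed for memory` \
  --use_8bit_adam \
  --enable_xformers_memory_efficient_attention \
  --set_grads_to_none \
  --mixed_precision=fp16 \
  --max_train_steps=2000
```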
### Monitoring Training (TensorBoard)

Monitoring the training process is crucial, especially to see how well the model is learning to inpaint the target region during fine-tuning. We incorporated TensorBoard logging for this purpose.
- **Enable logging:** add the `--report_to tensorboard` flag to your `accelerate launch` command.
- `--validation_steps <N>`: runs the validation loop every `N` training steps. Validation involves generating sample inpainted images using the current state of the model.
- `--num_validation_images <K>`: generates `K` sample images during each validation run.
- `--checkpointing_steps <M>`: saves the model state every `M` steps.

**Viewing logs:** while training is running (or after it finishes), navigate to the parent directory of your `OUTPUT_DIR` in your terminal and run:
```bash
tensorboard --logdir <OUTPUT_DIR>/logs
```
(Note: `<OUTPUT_DIR>` is the directory specified in your training command, e.g., `RealBench-23-model`.)
Open the URL provided by TensorBoard (usually http://localhost:6006/) in your browser. The generated validation images will appear under the "Images" tab, allowing you to visually inspect the learning progress.
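If training runs on a remote machine rather than your own, TensorBoard can be bound to all network interfaces so the dashboard is reachable from your local browser (check your firewall rules first):

```bash
# --bind_all exposes the dashboard on the machine's external interfaces:
tensorboard --logdir <OUTPUT_DIR>/logs --port 6006 --bind_all
```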
### Inference

Generate completions with the `infer.py` script after training. Key arguments:

- `--model_path`: path to the trained model directory (containing `unet/` and `text_encoder/` subfolders).
- `--validation_image`: path to the target image (`target.png`).
- `--validation_mask`: path to the mask image (`mask.png`).
- `--output_dir`: directory to save the 16 generated output images.

**Example command:**
```bash
accelerate launch infer.py \
  --model_path="./RealBench-23-model" \
  --validation_image="./realfill_data_release_full/RealBench/23/target/target.png" \
  --validation_mask="./realfill_data_release_full/RealBench/23/target/mask.png" \
  --output_dir="./realfill_results/RealBench-23-results"
```
### Benchmarking

Use the `benchmarks.py` script to evaluate generated results against ground truth. Key arguments:

- `--results_base_dir`: the parent directory containing multiple scene result folders (e.g., `./realfill_results/` or your Google Drive path). Folders should follow a pattern like `RealBench-X-results` or `Custom-Y-results`.
- `--realbench_dataset_dir`: path to the base directory of the original RealBench dataset (needed for finding corresponding GT/mask files).
- `--custom_dataset_dir`: path to the base directory of the custom dataset (if applicable).
- `--cache_dir`: directory to store intermediate and final metric caches (speeds up re-runs).
- `--output_file`: path to save the final text report.
- `--metrics`: (optional) specific metrics to run (e.g., `PSNR SSIM LPIPS`). Defaults to all configured metrics.
- `--force_recalc`: (optional) force recalculation for specific metrics, or `all`, or `loftr`.

For each result folder, the script:

- locates the corresponding ground truth (`gt.png`) and mask (`mask.png`) in the dataset directories;
- runs the individual metric scripts in `benchmark/` in parallel;
- caches results (`master_results_cache.json` and `per_scene_cache/`) to avoid recomputing;
- runs LoFTR ranking (`loftr_ranking.py --rank-only`) on RealBench results if the script is found;
- writes the final report (`benchmark_report.txt`).

**Example command:**
```bash
python benchmarks.py \
  --results_base_dir="./realfill_results" \
  --realbench_dataset_dir="./realfill_data_release_full" \
  --custom_dataset_dir="./jensen_images" \
  --cache_dir="./benchmark_cache" \
  --output_file="./benchmark_report.txt"
```
### LoFTR Ranking

The `loftr_ranking.py` script serves two purposes:

1. **Reference augmentation (used by `train_realfill.ipynb`):** when `USE_GENERATED_REF_IMAGE=True` in the notebook, this script is called to rank images from a previous run's output directory against the original references. It copies the top N candidates (based on `--target-count`) into the current run's `ref/` directory before training starts.
2. **Ranking for analysis (used by `benchmarks.py`):** the benchmarking script calls `loftr_ranking.py --rank-only`, which ranks the 16 generated images within a result folder against the original references and saves the ranking scores to `loftr_ranking_scores.json` inside that result folder (see the sketch below). This data is used for the LoFTR filtering analysis in the final report.
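A hypothetical standalone invocation of the rank-only mode; apart from `--rank-only`, the argument names here are illustrative placeholders, so consult the script's `--help` for the real interface:

```bash
# Rank the generated images in one result folder against the scene's original
# references (flag names other than --rank-only are assumptions):
python loftr_ranking.py --rank-only \
  --results-dir "./realfill_results/RealBench-23-results" \
  --ref-dir "./realfill_data_release_full/RealBench/23/ref"
```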
## ReFill: Our Proposed Extension

Based on our analysis, and inspired by the concurrent work FaithFill, we proposed ReFill. The core idea is a two-stage iterative refinement:

1. Fine-tune and run inference as in standard RealFill to produce an initial set of completions.
2. Use `loftr_ranking.py` to identify the best generated images from stage 1 based on correspondence with the original references, add them to the reference set, and fine-tune again.

The hypothesis was that adding high-quality, view-diverse generated references could improve the model's understanding of the scene geometry and lead to more authentic completions.
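As a rough end-to-end sketch of this loop (in practice the notebook automates it via `USE_GENERATED_REF_IMAGE=True`; the `loftr_ranking.py` flags other than `--target-count` are again illustrative placeholders):

```bash
export MODEL_NAME="stabilityai/stable-diffusion-2-inpainting"
export TRAIN_DIR="realfill_data_release_full/RealBench/23"

# Stage 1: standard RealFill fine-tuning and inference.
accelerate launch train_realfill.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR --output_dir=stage1-model \
  --resolution=512 --mixed_precision=fp16 --max_train_steps=2000
accelerate launch infer.py \
  --model_path=stage1-model \
  --validation_image=$TRAIN_DIR/target/target.png \
  --validation_mask=$TRAIN_DIR/target/mask.png \
  --output_dir=stage1-results

# Copy the top-ranked stage-1 outputs into the reference folder
# (flag names other than --target-count are assumptions):
python loftr_ranking.py --target-count 3 \
  --results-dir stage1-results --ref-dir $TRAIN_DIR/ref

# Stage 2: fine-tune again on the augmented reference set.
accelerate launch train_realfill.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_data_dir=$TRAIN_DIR --output_dir=stage2-model \
  --resolution=512 --mixed_precision=fp16 --max_train_steps=2000
```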
(See Section 4 in the Project Report for implementation details and results.)
## Results

Detailed quantitative results, qualitative examples, comparisons, and analysis of both the baseline RealFill reproduction and our ReFill extension can be found in the Project Report, along with a summary of our key findings.
## Acknowledgements and License

This repository builds upon the unofficial RealFill implementation by thuanz123, which is licensed under MIT.
Our project, including all modifications, extensions (ReFill), benchmarking suite, and custom code, is also released under the MIT License. This permits anyone to use, modify, and distribute this software, provided the original copyright notice and permission notice are included.
See the LICENSE file for full details.