I am Jiayi Wu, a 3rd-year Ph.D. student supervised by Prof. Yiannis Aloimonos at the Perception & Robotics Group, University of Maryland, College Park.

My research focuses on 3D vision and robotics, especially 3D/4D generation, active vision, and field robotics with an emphasis on underwater robotics. Inspired by nature, I aim to build world generators, simulators, and robot perception systems that challenge existing paradigms and redefine the future of 3D vision and robotics.

Before joining UMD, I worked with Prof. Md Jahidul Islam and completed my Master's thesis under his supervision.

Research Interests

  • World Reconstruction and Generation

    3D/4D reconstruction, view synthesis, and world models for simulation and embodied AI.

  • Robust and Active Vision

    Reliable perception in harsh conditions and closed-loop vision for deliberate sensing.

  • Physics-Informed Generative Models

    Generative modeling constrained by physical structure for plausible dynamics and simulation.

News

  1. Single-Step Latent Diffusion for Underwater Image Restoration accepted by TPAMI

    2025/07/02

    The paper will be presented at ICCP 2025.

  2. Learning Normal Flow Directly From Event Neighborhoods accepted by ICCV 2025

    2025/06/25

    Work on event-based normal flow estimation with collaborators at UMD.

  3. ViewActive accepted by IROS 2025

    2025/06/15

    Active viewpoint optimization from a single image.

  4. Event3DGS accepted by CoRL 2024

    2024/09/04

    Event-based 3D Gaussian Splatting for high-speed robot egomotion.

  5. MARVIS accepted by IROS 2024

    2024/06/30

    Motion and geometry aware real and virtual image segmentation.

Academic CV

Education

  1. University of Maryland, College Park

    Aug. 2023 - Present

    Ph.D. in Computer Science. Supervised by Prof. Yiannis Aloimonos.

  2. University of Florida

    Aug. 2021 - May 2023

    M.S. (Thesis) in Electrical and Computer Engineering. Supervised by Prof. Md Jahidul Islam.

  3. Zhejiang Sci-Tech University

    Sept. 2017 - Jun. 2021

    B.E. in Mechatronic Engineering. 2021 Outstanding Graduate.

Experience

  1. Applied Scientist Intern

    Mar. 2026 - Present

    3D-consistent and physically plausible video generation at Prime Video & Amazon MGM Studios.

    Seattle, WA, United States.

  2. Ph.D. in Computer Science

    Aug. 2023 - Present

    Computer vision and robotics research at the University of Maryland.

    University of Maryland, College Park, MD, United States.

    • Proposed SLURPP, a single-stage latent diffusion model with a dual-branch architecture tailored for physically accurate scattering medium decomposition, achieving a ~3 dB PSNR gain and a 200× speedup for underwater image restoration (accepted by TPAMI).
    • Proposed a point-based normal flow estimator using a local point cloud encoder to predict per-event flow, enabling sharp, robust, and transferable predictions with uncertainty quantification, and supporting an IMU-based egomotion solver for challenging scenarios (accepted by ICCV 2025).
    • Proposed ViewActive, a novel framework for active viewpoint optimization that mimics human-like spatial reasoning to enhance robotic perception (accepted by IROS 2025).
    • Proposed Event3DGS, an event-based 3D Gaussian Splatting method that achieves state-of-the-art reconstruction quality and significantly accelerates training and rendering (accepted by CoRL 2024).
    • Proposed MARVIS, a cutting-edge solution for real-virtual image segmentation near water surfaces, effectively leveraging synthetic data and domain-invariant features (accepted by IROS 2024).
    • Teaching Assistant for CMSC426 Computer Vision.
  3. Ph.D. Research Intern

    May 2025 - Aug. 2025

    3D live broadcasting for large-scale sports with high-fidelity scene reconstruction.

    Dolby Laboratories, Sunnyvale, CA, United States.

    • Proposed a 3D live broadcasting solution for large-scale sports, integrating a 3D tracking foundation model prior to 4D sparse-view inverse rendering for high-fidelity scene reconstruction (Patent pending).
  4. 3D Vision Research Intern (Master’s Thesis)

    Jan. 2022 - Jun. 2023

    Master’s thesis research on underwater 3D vision and depth estimation in RoboPI Lab.

    University of Florida · RoboPI Lab, Gainesville, FL, United States.

    • Proposed AquaFuse, a physics-based waterbody fusion method for underwater imagery that preserves depth and object geometry, enabling geometry-consistent data augmentation and accurate 3D view synthesis while preserving 94% depth consistency and 90–95% structural similarity (accepted by RA-L).
    • Proposed a 3D underwater reconstruction pipeline using pixel-wise depth guidance for sparse-to-semi-dense point clouds, reducing 2D feature reliance and achieving 3× faster inference with higher reconstruction accuracy (Best Paper Award at IEEE CAI 2023).
    • Formulated UDepth, a robust and efficient monocular depth estimation model that incorporates underwater domain knowledge into its supervised learning pipeline (accepted by ICRA 2023).
  5. Student Research Assistant (Automated Script Modeling)

    Jan. 2022 - Jan. 2023

    Programmatic 3D modeling and dynamic scene generation for remote sensing applications.

    Remote Sensing Laboratory, University of Florida, Gainesville, FL, United States.

    • Generated complex 3D models programmatically.
    • Connected the program to a database so that the 3D models update dynamically in real time.
    • Three papers accepted by IGARSS 2023, IEEE TGRS, and IEEE JSTARS.
  6. Digital Audio and Video Algorithm Intern

    May 2022 - Aug. 2022

    Learning-based video retrieval and psychoacoustic audio fingerprint encoding.

    Vobile, Santa Clara, CA, United States.

    • Developed and implemented a learning-based video retrieval system that fuses global and local features. Also wrote the system's user manual and a guide to targeted model performance optimization.
    • Conducted a number of qualitative phase-shift auditory tests and found a relationship between the phase-shift cases and the psychoacoustic model. Upgraded the audio fingerprint encoding algorithm based on the classic psychoacoustic model. The upgraded algorithm can encode not only the sound pressure level of the audio fingerprint but also the threshold of its phase shift (implemented in C and MATLAB).

Honors and Awards

  1. UMIACS Fellowship

    2024
  2. UF Herbert Wertheim College of Engineering Achievement Award

    2021 / 2022
  3. First Class School Financial Aid for Overseas Exchange Program

    2019
  4. National University Graduate Design Competition, Individual 1st Prize

    2021.06

    Only two people won this award nationwide.

  5. National 3D Digital Innovative Design Competition, Provincial 1st Prize

    2019.10

Academic Services

Reviewer

ICLR, NeurIPS, ICRA, IROS, TPAMI, RA-L, IEEE JOE, and others.

Research

  • Single-Step Latent Diffusion for Underwater Image Restoration

    TPAMI & ICCP 2025

    Single-step latent diffusion for fast underwater image restoration in challenging scattering media.

  • Learning Normal Flow Directly From Event Neighborhoods

    ICCV 2025

    Event-based normal flow estimation directly from local event neighborhoods.

  • ViewActive

    ViewActive: Active Viewpoint Optimization From a Single Image

    IROS 2025

    Active viewpoint optimization from a single image for perception-aware scene understanding.

  • Event3DGS

    Event3DGS: Event-Based 3D Gaussian Splatting for High-Speed Robot Egomotion

    CoRL 2024

    Event-based 3D Gaussian Splatting for reconstructing high-speed robot egomotion.

  • MARVIS

    MARVIS: Motion & Geometry Aware Real and Virtual Image Segmentation

    IROS 2024

    Motion and geometry aware segmentation across real and virtual images.

  • 3D Reconstruction of Underwater Scenes

    3D Reconstruction of Underwater Scenes Using Nonlinear Domain Projection

    IEEE CAI 2023, Best Paper Award

    Depth-guided underwater 3D reconstruction with nonlinear domain projection.

  • UDepth

    UDepth: Fast Monocular Depth Estimation for Visually-Guided Underwater Robots

    ICRA 2023

    Fast monocular underwater depth estimation for visually guided robotic systems.

  • AquaFuse

    AquaFuse: Waterbody Fusion for Physics Guided View Synthesis of Underwater Scenes

    RA-L

    Physics-guided underwater view synthesis by fusing waterbody information with scene geometry.

  • Stilt Type Deformation Wheel

    CN212400777U

    Patent for a deformable stilt-type wheel mechanism.

  • Unmanned Automobile Automatic Charging System

    Unmanned Automobile Automatic Charging System and Charging Docking Method

    CN113511087A

    Patent for an automatic charging system and docking method for unmanned automobiles.