Module 10 (9 concepts)

3D Vision

Depth estimation, point clouds, and 3D reconstruction.

01

3D Gaussian Splatting

3D Gaussian Splatting represents scenes as collections of learnable 3D Gaussian primitives that are rasterized via differentiable tile-based splatting, achieving NeRF-quality novel views at real-time rendering speeds, often exceeding 100 FPS on a modern GPU.
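
As a rough illustration of the splatting step, the sketch below evaluates each projected 2D Gaussian at a single pixel and blends the splats front to back. The dictionary keys (`mean`, `cov`, `color`, `opacity`, `depth`) and both function names are assumptions made for this example. Projecting 3D covariances to 2D and the per-tile GPU scheduling are omitted.

```python
import math

def splat_alpha(pixel, mean, cov, opacity):
    """Opacity contributed by one projected 2D Gaussian at a pixel.

    cov is (a, b, c) for the symmetric 2x2 covariance [[a, b], [b, c]].
    """
    dx, dy = pixel[0] - mean[0], pixel[1] - mean[1]
    a, b, c = cov
    det = a * c - b * b
    # Quadratic form d^T cov^{-1} d, using the closed-form 2x2 inverse.
    power = -0.5 * (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return opacity * math.exp(power)

def composite_pixel(pixel, splats):
    """Blend splats front to back: C = sum_i T_i * alpha_i * c_i."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for s in sorted(splats, key=lambda s: s["depth"]):
        alpha = splat_alpha(pixel, s["mean"], s["cov"], s["opacity"])
        for k in range(3):
            color[k] += transmittance * alpha * s["color"][k]
        transmittance *= 1.0 - alpha
    return color
```

Because every operation here is differentiable with respect to the means, covariances, colors, and opacities, the same blending rule can be used to optimize the primitives against training images.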

02

3D Object Detection

3D object detection localizes objects with oriented 3D bounding boxes (x, y, z, width, height, length, yaw) from LiDAR point clouds, camera images, or fused sensor inputs.
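
To make the seven box parameters concrete, here is a minimal sketch that expands them into eight corner points. The axis convention is an assumption for this example (z up, yaw about z, length along the box's local x axis, width along local y); datasets and libraries differ on this.

```python
import math

def box_corners(x, y, z, width, height, length, yaw):
    """Return the 8 corners of a yaw-rotated 3D box as (x, y, z) tuples.

    Assumed convention: (x, y, z) is the box center, z is up, and yaw
    rotates the box about the z axis.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-width / 2, width / 2):
            for dz in (-height / 2, height / 2):
                # Rotate the local offset about z, then translate to the center.
                corners.append((x + c * dx - s * dy,
                                y + s * dx + c * dy,
                                z + dz))
    return corners
```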

03

3D Reconstruction

3D reconstruction recovers the shape and appearance of objects or scenes from sensor observations, producing explicit representations (meshes, voxel grids, point clouds) or neural implicit surfaces (signed distance functions, occupancy fields).
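
As a small example of an implicit surface, the sketch below defines the signed distance function of a sphere and derives an occupancy value from its sign. The function names and the sign convention (negative inside) are assumptions for illustration; a neural implicit method would replace the analytic formula with a learned network.

```python
import math

def sdf_sphere(point, center, radius):
    """Signed distance to a sphere: negative inside, zero on the surface."""
    dist = math.sqrt(sum((p - c) ** 2 for p, c in zip(point, center)))
    return dist - radius

def occupancy(point, center, radius):
    """Occupancy field derived from the SDF: 1 inside or on the surface, else 0."""
    return 1 if sdf_sphere(point, center, radius) <= 0.0 else 0
```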

04

Depth Estimation

Depth estimation recovers per-pixel distance from the camera to the scene, either from a single image (monocular) or from stereo image pairs, enabling 3D understanding from 2D observations.
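
For the stereo case, depth follows from disparity via Z = f * B / d in a rectified pair. The sketch below assumes a focal length in pixels, a baseline in meters, and a disparity in pixels; the function and parameter names are chosen for this example.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

Since depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant points than for near ones.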

05

Multi-View Geometry

Multi-view geometry provides the mathematical framework for relating 2D image observations from multiple cameras to 3D scene structure, grounded in epipolar geometry, the fundamental matrix, and triangulation.
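
The epipolar constraint can be checked numerically. This sketch assumes normalized image coordinates (identity intrinsics), in which case the fundamental matrix reduces to the essential matrix E = [t]x R; it further assumes R is the identity, so E = [t]x. All function names are illustrative.

```python
def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def project(X):
    """Pinhole projection to normalized homogeneous image coordinates."""
    return (X[0] / X[2], X[1] / X[2], 1.0)

def epipolar_residual(x1, x2, E):
    """Evaluate x2^T E x1, which is zero for a true correspondence."""
    Ex1 = [sum(E[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * Ex1[i] for i in range(3))

# Camera 2 sees points as X2 = X1 + t (pure translation, no rotation).
t = (-0.3, 0.0, 0.0)
E = skew(t)
X1 = (0.5, -0.2, 4.0)
X2 = tuple(a + b for a, b in zip(X1, t))
residual = epipolar_residual(project(X1), project(X2), E)
```

Here `residual` is zero up to floating-point error, while a mismatched point in the second image generally gives a nonzero value.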

06

Neural Radiance Fields (NeRF)

NeRF represents a 3D scene as a continuous volumetric function, implemented by an MLP that maps 5D coordinates (3D position plus 2D viewing direction) to color and density, enabling photorealistic novel view synthesis.
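
Turning per-sample color and density into a pixel color uses the standard volume rendering quadrature. The sketch below assumes the MLP outputs along one ray are already available as lists; `render_ray` and its arguments are names chosen for this example.

```python
import math

def render_ray(sigmas, colors, deltas):
    """Composite samples along a ray, ordered near to far.

    sigmas: densities, colors: (r, g, b) tuples, deltas: distances
    between adjacent samples. Returns the pixel color and per-sample weights.
    """
    transmittance = 1.0  # probability the ray reaches the current sample
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        weights.append(weight)
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return rgb, weights
```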

07

Point Cloud Processing

Point cloud processing handles unordered sets of 3D points acquired from LiDAR, depth cameras, or photogrammetry, using specialized data structures and algorithms for efficient spatial reasoning.
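
One common spatial structure is a voxel hash grid. The sketch below uses it for downsampling by replacing all points in a voxel with their centroid; it is a plain-Python illustration with an assumed function name, not the API of any particular library.

```python
import math
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Merge points that fall in the same cubic voxel into their centroid."""
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis serves as the hash key.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    return [tuple(sum(axis) / len(pts) for axis in zip(*pts))
            for pts in buckets.values()]
```

The same dictionary of buckets also supports approximate neighbor queries, since nearby points share a key or sit in adjacent voxels.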

08

PointNet

PointNet consumes raw, unordered 3D point clouds directly via shared MLPs and a symmetric max-pooling function, bypassing the need for voxelization or mesh conversion.
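
The role of the symmetric function can be seen in a toy version: apply the same small layer to every point, then take a channel-wise maximum. The single linear-plus-ReLU layer and the names below are simplifications assumed for this sketch; the real network stacks several learned layers.

```python
def shared_mlp(point, weights, biases):
    """One linear layer with ReLU, applied identically to every point."""
    return [max(0.0, sum(w * p for w, p in zip(row, point)) + b)
            for row, b in zip(weights, biases)]

def global_feature(points, weights, biases):
    """Channel-wise max over per-point features (order independent)."""
    feats = [shared_mlp(p, weights, biases) for p in points]
    return [max(channel) for channel in zip(*feats)]
```

Because `max` ignores ordering, shuffling the input points leaves the global feature unchanged, which is the property that lets the network consume unordered sets.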

09

SLAM (Simultaneous Localization and Mapping)

SLAM simultaneously estimates a sensor’s pose (localization) and builds a map of the environment (mapping), solving the chicken-and-egg problem in which localizing requires a map and mapping requires a known pose.
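
The coupling between pose and map shows up even in a 2D toy: odometry updates the pose, and the pose is then needed to place an observed landmark in the world frame. This sketch covers only those two building blocks under assumed names; it omits the uncertainty modeling and optimization that a full SLAM system performs.

```python
import math

def compose(pose, delta):
    """Apply a relative motion (dx, dy, dtheta), given in the sensor frame."""
    x, y, theta = pose
    dx, dy, dtheta = delta
    c, s = math.cos(theta), math.sin(theta)
    new_theta = theta + dtheta
    # Wrap the heading to (-pi, pi].
    new_theta = math.atan2(math.sin(new_theta), math.cos(new_theta))
    return (x + c * dx - s * dy, y + s * dx + c * dy, new_theta)

def landmark_to_world(pose, observation):
    """Map a landmark seen at (ox, oy) in the sensor frame into the world frame."""
    x, y, theta = pose
    ox, oy = observation
    c, s = math.cos(theta), math.sin(theta)
    return (x + c * ox - s * oy, y + s * ox + c * oy)
```

Any error in the estimated pose is copied directly into the landmark position, which is why the two estimates have to be refined jointly.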