Breakdown
1. Interactive CUDA Pathtracer: A real-time pathtracing renderer (C++, CUDA, OpenGL) | 2013
Starting from a CUDA-OpenGL interop framework, I built everything else myself, writing a CUDA ray tracer first and expanding it into a path tracer. Features include full global illumination, super-sampled anti-aliasing, per-ray parallelization, motion blur, depth of field, Fresnel reflection and refraction, and separated direct and indirect illumination.
Taking advantage of the massively parallel nature of the GPU, the CUDA pathtracer achieves global illumination by brute-force Monte Carlo integration, just at a much faster speed. Anti-aliasing is implemented trivially by jittering the direction of the primary rays every iteration. A major speed-up comes from parallelizing by ray instead of by pixel: a ray pool holds all rays currently being traced. It is initialized with primary rays, and when a ray hits a surface it is either replaced by a new ray heading in another direction (reflection, refraction, or diffuse sampling) or terminated if no intersection is found. With a loop of kernel calls and the help of stream compaction, each bounce works with fewer rays, and hardware resources are allocated among the live rays much more efficiently. Translational motion blur is simple: the object is translated slightly every iteration. Depth of field is not difficult in theory, but at first the result was not blurred uniformly; there was a visible pattern in the blur for objects out of focus. I solved this by jittering the camera position differently for each primary ray and each iteration, vertically within a certain aperture.
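To make the ray-pool idea concrete, here is a minimal sketch of the bounce loop with stream compaction, assuming Thrust for the compaction step (the Ray struct and kernel names are illustrative, not the project's actual code):

```cuda
#include <thrust/device_vector.h>
#include <thrust/remove.h>

struct Ray {
    float3 origin, dir, throughput;
    int    pixel;   // which pixel this ray contributes to
    bool   alive;   // false once the ray misses or is absorbed
};

struct IsDead {
    __host__ __device__ bool operator()(const Ray& r) const { return !r.alive; }
};

__global__ void traceBounceKernel(Ray* rays, int n /*, scene data elided */) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Intersect rays[i] with the scene; on a hit, replace it with a
    // reflected/refracted/diffuse-sampled ray, otherwise mark it dead.
}

void traceIteration(thrust::device_vector<Ray>& pool, int maxDepth) {
    for (int depth = 0; depth < maxDepth && !pool.empty(); ++depth) {
        int n = (int)pool.size();
        traceBounceKernel<<<(n + 255) / 256, 256>>>(
            thrust::raw_pointer_cast(pool.data()), n);
        // Stream compaction: drop terminated rays so later bounces launch
        // only as many threads as there are live rays.
        auto newEnd = thrust::remove_if(pool.begin(), pool.end(), IsDead());
        pool.erase(newEnd, pool.end());
    }
}
```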
Due to the nature of pathtracing, convergence is slow, even in a GPU implementation; noise remains visible after several hundred iterations. I partially solved this by separating direct and indirect illumination. This requires precisely accounting for the contributions of the primary and secondary rays, since any double-counted or missing luminance from either part quickly degrades the image. It took me a lot of time working through the math, but the end result is a relatively stable image that closely resembles the converged image after only 50 iterations.
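My exact derivation is not reproduced here, but the standard way to realize this split is next event estimation, roughly as in this sketch (the Hit struct and the helpers intersect, sampleLightDirect, sampleBSDF, evalBSDF, makeRay, plus the float3 operators, e.g. from CUDA's helper_math.h, are all assumptions, not project code):

```cuda
#include <curand_kernel.h>

// Hedged sketch: direct light is sampled explicitly at every bounce, so
// emission is only added when it cannot have been counted already.
__device__ float3 tracePath(Ray r, curandState* rng, int maxDepth) {
    float3 L    = make_float3(0.0f, 0.0f, 0.0f); // accumulated radiance
    float3 beta = make_float3(1.0f, 1.0f, 1.0f); // path throughput
    bool prevSpecular = true;  // the camera ray acts like a specular bounce
    for (int depth = 0; depth < maxDepth; ++depth) {
        Hit h;
        if (!intersect(r, &h)) break;
        // Add emission only when the previous vertex could not have sampled
        // this light directly; otherwise its luminance is counted twice.
        if (prevSpecular) L += beta * h.emission;
        // Direct illumination: explicitly sample a point on a light source.
        L += beta * sampleLightDirect(h, rng);
        // Indirect illumination: continue the path in a sampled direction.
        float pdf;
        float3 wi = sampleBSDF(h, rng, &pdf);
        if (pdf <= 0.0f) break;
        beta *= evalBSDF(h, wi) * fabsf(dot(wi, h.normal)) / pdf;
        prevSpecular = h.isSpecular;
        r = makeRay(h.position, wi);
    }
    return L;
}
```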
2. FFT Ocean Simulation on WebGL: An ocean simulation based on the Fast Fourier Transform, in WebGL | 2013 | Collaborated with Guanyu He
FFT-based height-field ocean simulation has been around in the industry for many years and has become a sample program in many SDKs, such as CUDA and DirectX. In WebGL, however, there had been no implementation that simulates an ocean with FFT. Since I am very interested in water simulation in general, know the Fast Fourier Transform well, and wanted to explore the potential of WebGL myself, I teamed up with Guanyu He and implemented an ocean simulation using a custom FFT shader. I was responsible for building the simulation and rendering framework, implementing the height-field simulation of an ocean patch on top of a WebGL FFT library in JavaScript that I developed, and delivering the simulated height field through a clean interface for him to render. The program runs at 60 fps on a desktop with an i7 processor and a GTX Titan graphics card, in both Chrome and Firefox.
The simulation uses a classic general-purpose GPU (GPGPU) approach. The initial spectrum, containing a real part and an imaginary part, is calculated once at the start of the program following an oceanographic spectrum model, the Phillips spectrum, and stored in a floating-point texture. Then for each frame, a full-screen quad the size of the ocean patch is rendered; the initial spectrum is fetched and developed in time according to the dispersion relation, producing the spectrum texture for that frame. After this simulation pass, a custom FFT shader takes the spectrum texture as input and outputs a height-field texture for use in rendering.
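For concreteness, here is a small host-side sketch of what the shaders compute, following Tessendorf's well-known formulation (the real computation happens per texel in GLSL; the wind and amplitude parameters here are illustrative assumptions):

```cuda
#include <cmath>
#include <complex>
#include <random>

const float g = 9.81f;  // gravity

// Phillips spectrum: P(k) = A exp(-1/(kL)^2) / k^4 * |k_hat . w_hat|^2,
// with L = V^2/g the largest wave arising from wind speed V.
float phillips(float kx, float ky, float wx, float wy, float A, float V) {
    float k2 = kx * kx + ky * ky;
    if (k2 < 1e-8f) return 0.0f;
    float L = V * V / g;
    float align = (kx * wx + ky * wy) / std::sqrt(k2);  // alignment with wind
    return A * std::exp(-1.0f / (k2 * L * L)) / (k2 * k2) * align * align;
}

// Initial spectrum h0(k): Gaussian noise shaped by the Phillips spectrum.
// Computed once per texel and stored in the floating-point texture.
std::complex<float> h0(float kx, float ky, std::mt19937& rng) {
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    float amp = std::sqrt(phillips(kx, ky, 1.0f, 0.0f, 3e-7f, 30.0f) * 0.5f);
    return { gauss(rng) * amp, gauss(rng) * amp };
}

// Per-frame development with the deep-water dispersion relation w(k) = sqrt(g|k|):
// h(k, t) = h0(k) e^{iwt} + conj(h0(-k)) e^{-iwt}
std::complex<float> develop(std::complex<float> h0k, std::complex<float> h0mk,
                            float kLen, float t) {
    float w = std::sqrt(g * kLen);
    std::complex<float> e(std::cos(w * t), std::sin(w * t));
    return h0k * e + std::conj(h0mk) * std::conj(e);
}
```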
Having worked through the Fast Fourier Transform both by hand and in MATLAB, I designed a JavaScript 2D FFT shader to perform the inverse FFT on the spectrum produced by the simulation shader. Exploiting the separability of the 2D FFT, my FFT shader is composed of two passes, one over each row of the source spectrum texture and the other over each column of the intermediate texture. Within each pass there are several stages of butterfly computation, with a texture "ping-pong" approach used to pass results between stages. Butterfly weights and indices are precomputed before the main render loop starts and fetched in each FFT stage.
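As an illustration of what gets precomputed, here is a sketch of the butterfly table for one 1D pass, laid out as a standard iterative radix-2 FFT with the bit-reversal permutation folded into the first stage (the library stores this in a texture; the names here are illustrative):

```cuda
#include <cmath>
#include <vector>

struct Butterfly { int srcA, srcB; float wr, wi; };  // out = A + w * B

static int bitReverse(int x, int bits) {
    int r = 0;
    for (int b = 0; b < bits; ++b) { r = (r << 1) | (x & 1); x >>= 1; }
    return r;
}

// table[stage][i] tells output element i of that stage which two inputs to
// fetch and which complex twiddle weight to use. sign = +1 gives the inverse
// FFT (its final 1/N normalization is applied separately).
std::vector<std::vector<Butterfly>> makeButterflyTable(int N, float sign) {
    int stages = (int)std::log2((float)N);
    std::vector<std::vector<Butterfly>> table(stages, std::vector<Butterfly>(N));
    for (int s = 0; s < stages; ++s) {
        int m = 2 << s, half = m >> 1;        // butterfly span at this stage
        for (int i = 0; i < N; ++i) {
            int j = i % m;
            int a = (j < half) ? i : i - half;
            int b = (j < half) ? i + half : i;
            // Stage 0 reads the input in bit-reversed order.
            if (s == 0) { a = bitReverse(a, stages); b = bitReverse(b, stages); }
            float ang = sign * 2.0f * 3.14159265f * j / m;
            table[s][i] = { a, b, std::cos(ang), std::sin(ang) };
        }
    }
    return table;
}
```

Each shader stage then fetches its row of this table, reads the two source texels from the "ping" texture, applies the complex weight, and writes the result to the "pong" texture.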
Before the height-field data is usable, we need to invert the sign of every grid point where x + y is even, for 0 <= x, y < N. We have not worked out the theory behind this, but it proved necessary both in our experiments and in the ocean FFT sample from NVIDIA's CUDA SDK.
The sun, sky, and ocean rendering was done by Guanyu He, my teammate. Thanks to his work, the simulation looks even more convincing.
3. SimBubble: A bubble simulator plugin integrated with the Maya fluid solver (C++, MEL) | 2013 | Collaborated with Sijie Tian
This is a particle-based bubble simulator Maya plugin that Sijie and I developed as the final project for CIS660: Advanced Computer Graphics and Animation, based on the SIGGRAPH 2012 paper Animating Bubble Interactions in a Liquid Foam by Busaryev et al. The plugin integrates seamlessly into the normal Maya Fluid workflow and enhances it with large numbers of non-deformable, physically realistic bubbles. The user only needs to set up the fluid simulation as usual, add bubble emitters under the water (or above it, as long as they are inside the Maya fluid container), and adjust a few intuitive simulation parameters. Simply click "play" and the bubble simulation runs along with the fluid simulation!
The bubbles are represented as non-deformable spheres under the small-bubble assumption. We used the Delaunay triangulation from the CGAL library to build the data structures that store topology information. The triangulation yields a set of edges, which are categorized as either "alpha complex edges" or normal edges, where alpha complex edges connect bubbles close enough to form a foam structure. For each connected pair we store the distance between the bubbles, their radii, and a wetness coefficient. Within this framework, a number of foam behaviors, including foam structures under Plateau's laws, bubble coalescing and bursting, and bubble-liquid and bubble-solid coupling, are handled by the different forces proposed in the paper. These forces are then used to update particle positions and velocities under an implicit integration scheme. At each time step, the required liquid-surface and velocity-field information is extracted from the fluid solver.
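The real classification comes out of the weighted alpha complex that CGAL derives from the triangulation of the weighted points; purely as an illustration of the per-edge data we store, a simplified distance-based proxy might look like this (all names and the gap criterion are assumptions, not the paper's actual test):

```cuda
#include <cmath>
#include <vector>

struct Bubble { float x, y, z, r; };   // center and radius (weighted point)

struct Edge {
    int   a, b;          // indices of the two connected bubbles
    float dist;          // center-to-center distance, used by the foam forces
    bool  alphaComplex;  // close enough to form a foam structure?
};

// Hypothetical proxy criterion: an edge joins the alpha complex when the two
// bubbles are within a small gap of touching.
Edge classify(const std::vector<Bubble>& bs, int a, int b, float gap) {
    float dx = bs[a].x - bs[b].x;
    float dy = bs[a].y - bs[b].y;
    float dz = bs[a].z - bs[b].z;
    float d = std::sqrt(dx * dx + dy * dy + dz * dz);
    return { a, b, d, d < bs[a].r + bs[b].r + gap };
}
```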
I was primarily responsible for integrating Maya with the Computational Geometry Algorithms Library (CGAL), which provides the sophisticated "weighted point" data structures we use to represent bubbles, as well as the Voronoi diagram and Delaunay triangulation algorithms. It took me quite a while to figure out how to compile this multi-gigabyte library with the correct settings for use in our OpenGL previewer and Maya plugin, but solving those compiling and linking problems, and working with CMake (which the library requires), was a valuable experience. Because of the plugin's deep integration with Maya's infrastructure, I also learned a great deal about how Maya works internally, as a graph of interlinked nodes, and about the interface exposed by Maya's fluid dynamics. I spent over a week before finally devising a node network that works properly alongside the Maya fluid nodes.
4. Efficient SPH Surface Reconstruction: An implementation that substantially reduces computation cost (C++, OpenGL) | 2013 | Collaborated with Yuqin Shao
Using base code from an open-source SPH framework, this project is concerned mainly with surface reconstruction rather than the SPH simulation itself. The traditional surface reconstruction method (marching cubes) is strongly affected by the grid cell size: obtaining a smooth, artifact-free surface requires very small cells, which considerably increases computational complexity. Following the SIGGRAPH 2012 paper Parallel Surface Reconstruction for Particle-Based Fluids (Akinci et al.), we implemented the described algorithm, which improves efficiency by considering only the grid vertices that lie in a narrow band near the surface.
To boost the performance of the scalar-field computation during neighborhood search, I implemented a Z-index sort for the marching cubes grid cells, which ensures spatially close cells are also close in memory and so reduces traffic between memory and cache. To do this, we group every particle with its containing marching cubes cell into a pair, and compute the cell ID by interleaving the bits of the cell's x, y, and z indices. We call this pair a handler; it is used in the scalar-field computation. Once we have a handler for every particle, we sort the handler array by cell ID and create an additional start-indices array with one entry per cell. Each non-empty cell stores the lowest index of its entries in the sorted handler array.
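Here is a sketch of the bit interleaving and start-indices construction just described (the 10-bits-per-axis Morton spread is a standard trick and assumes power-of-two grid dimensions; the struct names follow our description):

```cuda
#include <algorithm>
#include <cstdint>
#include <vector>

// Spread the lower 10 bits of x so there are two zero bits between each bit.
static uint32_t spreadBits(uint32_t x) {
    x &= 0x3FF;
    x = (x | (x << 16)) & 0x030000FF;
    x = (x | (x << 8))  & 0x0300F00F;
    x = (x | (x << 4))  & 0x030C30C3;
    x = (x | (x << 2))  & 0x09249249;
    return x;
}

// Interleave the x, y, z cell indices into one Morton code: spatially close
// cells get numerically close IDs, which keeps them close in memory.
static uint32_t mortonId(uint32_t cx, uint32_t cy, uint32_t cz) {
    return spreadBits(cx) | (spreadBits(cy) << 1) | (spreadBits(cz) << 2);
}

struct Handler { uint32_t cellId; int particle; };

// Sort handlers by cell ID, then record, for each non-empty cell, the lowest
// index of its run in the sorted array (-1 marks an empty cell).
std::vector<int> buildStartIndices(std::vector<Handler>& handlers, size_t numCells) {
    std::sort(handlers.begin(), handlers.end(),
              [](const Handler& a, const Handler& b) { return a.cellId < b.cellId; });
    std::vector<int> start(numCells, -1);
    for (int i = (int)handlers.size() - 1; i >= 0; --i)
        start[handlers[i].cellId] = i;   // overwritten until the lowest index wins
    return start;
}
```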
The entire program is written in C++. The output meshes are then batch-rendered in Maya with Mental Ray.
5. TornadoGeddon: A Unity3D game featuring the control of a devastating tornado (C#) | 2013 | Collaborated with Cheng Xie and Katherine Anderson
Labeled by many players as "novel", "fun", and "awesome", TornadoGeddon features a fully animated tornado built with Unity's Shuriken particle system, a tornado force-field model that is simple but realistic, and a destructible environment. When I first pitched the concept, my teammates doubted it was feasible, so I searched through a number of resources to show them that the Unity particle system is indeed capable of animating a tornado and that scripts and plugins exist in 3DS Max for fracturing models. In addition, I designed a tornado wind-field model from scratch, both by consulting the tornado literature and by experimenting to balance the end effects. In the end I was able to persuade them, get them excited about this unexplored genre, and together we wrote a design doc outlining the formal elements of the game and the development schedule.
Throughout the implementation phase, which lasted about a month, I was mainly in charge of writing the major C# scripts, such as tornado control, camera control, tornado forces, and terrain traversal rules, while Cheng, with his expertise in 3DS Max, modeled buildings and trees, fractured them, and imported them into Unity as ready-to-use game assets. Katherine worked on the GUI, and both of them worked on level design. This is the first serious game I have ever made, and through several brainstorming and playtesting sessions it underwent critical gameplay changes, from removing the health bar and changing the control mechanics to completely changing the game's goals. We did not have much to reference because none of us had ever seen a game like this, but our adviser, Dr. Lane, gave us tons of useful advice, such as destroying objects selectively, introducing a richer control scheme for the tornado, and adding the ability to pick up and drop off special items.
I learned a lot designing this game: assigning tasks according to specialty, communication, source control, persuading and compromising, building on an existing game engine, adjusting gameplay based on feedback, and project management. At the beginning of my last semester, I reassembled the team in the hope of using the free time to make the game better.
6. CUDA Rasterizer: Complete parallel software rasterization pipeline using CUDA (C++, CUDA, OpenGL) | 2013
CUDA Rasterizer is a rasterization pipeline implemented from scratch on the GPU using CUDA. It mimics the behavior of an OpenGL pipeline and has all the stages present in a standard one, including vertex assembly, vertex shader, primitive assembly, rasterization, and fragment shader.
Before the vertex shader, an OBJ file is loaded and converted into VBO format for both vertices and normals. In the vertex shader, the vertices are transformed into screen space using the model-view-projection matrix and the viewport transform matrix. The primitive assembly stage assembles the vertices and normals into primitives (triangles); here an optional backface culling step discards triangles facing away from the viewer. The rasterization stage is parallelized by primitive and uses a scanline algorithm. An early depth test is also performed here using CUDA atomics, so occluded fragments are thrown away early. The last stage, the fragment shader, shades the fragments with the Phong shading model and is parallelized by fragment. Presenting the pixel buffer on screen is the only place where OpenGL is used in this project.
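As a sketch of how the atomic early depth test can work (illustrative, not the project's exact code): positive IEEE floats compare the same as their bit patterns reinterpreted as signed ints, so depth can live in an int buffer and be resolved with atomicMin.

```cuda
#include <cuda_runtime.h>

struct Fragment { float3 color; float3 normal; };  // illustrative payload

// Early depth test with atomicMin. The depth buffer is an int buffer
// initialized to __float_as_int(FLT_MAX).
__device__ void depthTestAndWrite(int x, int y, float depth, Fragment frag,
                                  int* depthBuf, Fragment* fragBuf, int width) {
    int idx = y * width + x;
    int d = __float_as_int(depth);            // ordering valid for depth >= 0
    int old = atomicMin(&depthBuf[idx], d);
    if (d < old) {
        // We won the depth test; occluded fragments never reach shading.
        // Note: this payload write can still race with a closer fragment
        // arriving at the same moment; a fully robust version packs depth
        // and payload into one 64-bit word updated with atomicCAS.
        fragBuf[idx] = frag;
    }
}
```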
I also implemented super-sampled anti-aliasing and an interactive camera.
A Buddha model (100,000 triangles) renders at 60-70 fps.
7. WebGL Virtual Globe: A simulated globe rendered in WebGL (HTML, JavaScript, WebGL) | 2013
This is a WebGL experiment I did in CIS565: GPU Programming. Features include rendering a sphere with texture mapping, JavaScript mouse interaction, bump-mapped terrain, rim lighting to simulate the atmosphere, night-time lights on the dark side of the globe, specular mapping, and moving clouds. To increase realism, I also implemented a space box (skybox) and ray-traced cloud shadows in WebGL.
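The rim-lighting trick is worth a line of explanation: the atmosphere halo is simply a function of how far the surface normal turns away from the viewer. A minimal sketch of the formula follows (the project's version lives in a GLSL fragment shader; this is an illustrative CUDA-style rendering of the same math, with a local dot product for self-containment):

```cuda
__device__ float dot3(float3 a, float3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// The rim brightens where the normal turns away from the viewer, producing a
// halo along the globe's silhouette; 'power' controls how tight the halo is.
__device__ float rimLight(float3 normal, float3 toViewer, float power) {
    float facing = fmaxf(dot3(normal, toViewer), 0.0f);  // 1 at center, 0 at rim
    return powf(1.0f - facing, power);
}
```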