In this project, I built a small ray tracer that simulates how light rays interact with objects in a scene. This physically-based rendering approach explains 'why we see what we see'. Many ideas and details are explored, including ray–scene intersection, materials and BRDFs, optics fundamentals such as reflection and refraction, and acceleration structures such as the bounding volume hierarchy.
In this part we assume a pinhole camera that looks along the z axis of its own coordinate space. The position of the camera determines the origin of rays. We also have a sensor plane, a pixel grid with its own pixel coordinate space; the size of the sensor plane is determined by the field of view of the camera. As we randomly sample N points within each pixel, the position of each sample determines the direction of the corresponding ray. With both the camera and the viewing window, we are able to generate rays to be traced through the entire scene. Note that the camera coordinate space differs from the pixel coordinate space, so we need to normalize coordinates when passing from one space to the other. Finally, once we have a ray in camera space, we transform it to world space to be ready for ray tracing.
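As a concrete sketch of this pipeline, camera-space ray generation might look as follows. This is a minimal illustration, not the project's actual code: the function name, the hfov/vfov parameters, and placing the sensor plane at z = -1 are all assumptions.

```python
import math

def generate_ray(x, y, hfov_deg, vfov_deg):
    """Map a sample (x, y) in normalized pixel space ([0, 1]^2) to a
    camera-space ray. hfov/vfov are the camera's fields of view."""
    # The sensor plane sits at z = -1 (assumed); its half-extent is set by the FOV.
    half_w = math.tan(math.radians(hfov_deg) / 2.0)
    half_h = math.tan(math.radians(vfov_deg) / 2.0)
    # Normalize pixel coordinates into camera coordinates on the sensor plane.
    dx = (2.0 * x - 1.0) * half_w
    dy = (2.0 * y - 1.0) * half_h
    d = (dx, dy, -1.0)
    norm = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
    # Rays originate at the camera position, the origin of camera space.
    return (0.0, 0.0, 0.0), (d[0] / norm, d[1] / norm, d[2] / norm)
```

In the renderer, such a ray would then be transformed by the camera-to-world matrix before tracing.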
The number of samples per pixel is a parameter of this rendering program. As will be shown in later sections, both the number of samples per pixel and the sampling method we choose affect the quality of the rendered results.
Now we consider the basics of ray tracing: intersecting rays with primitives. Earlier we showed that rays are represented with an origin and a direction; here we use t as the distance along the ray to describe its intersection point with objects. Note that a valid t value is one within the ray's t range, denoted [t_{min}, t_{max}]. For the sake of this project, we only show how rays intersect with triangles and spheres.
Ray–triangle intersection: The idea is to view each triangle as part of a plane, perform a ray–plane intersection, and then determine whether the intersection point lies inside the triangle boundary. We used the Möller–Trumbore (MT) algorithm to efficiently calculate the intersection point without precomputing the plane equation. The MT algorithm parameterizes the intersection point in barycentric coordinates and solves a linear system using Cramer's rule.
More detailed steps are as follows:
1) Parameterize point P in barycentric coordinates: P = uA + vB + (1 - u - v)C.
2) Write P as P = O + tD.
3) Solve a linear system with equations from 1) and 2) using Cramer's rule (gives the solution in terms of determinant).
4) Write out the determinant as cross product and dot product.
5) Return false if the ray is parallel to the plane.
6) The solution is only valid if u, v and 1 - u - v are all within [0, 1]; otherwise return false.
7) Update the max t value of the ray with the valid intersection t value.
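The steps above can be sketched in Python as follows. This is a minimal, unoptimized illustration; the barycentric labeling may differ from the write-up's by a relabeling of vertices.

```python
def moller_trumbore(o, d, a, b, c, t_min=0.0, t_max=float("inf")):
    """Ray-triangle test via Moller-Trumbore; returns t, or None on a miss.
    o/d: ray origin and direction; a/b/c: triangle vertices (3-tuples)."""
    sub = lambda u, v: (u[0] - v[0], u[1] - v[1], u[2] - v[2])
    dot = lambda u, v: u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    cross = lambda u, v: (u[1] * v[2] - u[2] * v[1],
                          u[2] * v[0] - u[0] * v[2],
                          u[0] * v[1] - u[1] * v[0])

    e1, e2 = sub(b, a), sub(c, a)
    pvec = cross(d, e2)
    det = dot(e1, pvec)
    if abs(det) < 1e-12:
        return None                  # ray parallel to the triangle's plane
    inv_det = 1.0 / det
    tvec = sub(o, a)
    u = dot(tvec, pvec) * inv_det    # barycentric coordinate via Cramer's rule
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, e1)
    v = dot(d, qvec) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None                  # point falls outside the triangle
    t = dot(e2, qvec) * inv_det
    return t if t_min <= t <= t_max else None
```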
Ray–sphere intersection: We plug the ray equation into the implicit sphere equation and solve the resulting quadratic with the quadratic formula. There can be no real roots (no intersection), two equal roots (one intersection, a tangent ray) or two distinct roots (two intersections).
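A sketch of this quadratic approach (illustrative Python, not the project's code):

```python
def intersect_sphere(o, d, center, radius, t_min=0.0, t_max=float("inf")):
    """Solve |o + t*d - center|^2 = radius^2 for t.
    Returns the nearest valid t, or None when the ray misses."""
    oc = tuple(o[i] - center[i] for i in range(3))
    a = sum(d[i] * d[i] for i in range(3))
    b = 2.0 * sum(oc[i] * d[i] for i in range(3))
    c = sum(oc[i] * oc[i] for i in range(3)) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None                       # no real roots: no intersection
    sqrt_disc = disc ** 0.5
    for t in ((-b - sqrt_disc) / (2.0 * a), (-b + sqrt_disc) / (2.0 * a)):
        if t_min <= t <= t_max:           # nearer root is tried first
            return t
    return None
```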
Once we find an intersection point, we store it in a structure that contains:
1) t value of the ray that causes this intersection
2) the surface normal at this hit point
3) the primitive that contains this hit point
4) the surface BSDF that is later used to render different materials.
We also want to take the depth of objects into consideration, so that when a ray hits the first object it encounters, it won't propagate to farther objects. This is achieved by updating the t range of the ray once it hits a primitive: we update t_{max} with the t value of the hit point.
The current version of the ray tracer takes a long time to render even mildly complicated geometry, since its complexity is linear: a single ray checks all the primitives in the scene to determine which one it hits, making rendering inefficient. To keep rendering time tolerable, for now I'm only showing some small rendered .dae files:

We noticed in part 1 that it takes too long to render a slightly complex scene without any acceleration structure. This part improves rendering efficiency with a bounding volume hierarchy (BVH). A BVH is a binary-tree-like structure that recursively divides all the primitives in a scene into two parts. At each internal node, we test the ray against each child's bounding box and keep searching only through the branches the ray intersects, until we reach a leaf node. Within that leaf node, we test the ray against all of its primitives to get the hit point. This structure decreases the computational complexity from linear to logarithmic, and is thus able to handle rather complicated scenes. Below I discuss BVH construction and then the intersection algorithm.
BVH Construction
Construction Algorithm Overview:
1) Construct a world bounding box for all the primitives in the scene and create a root node.
2) If the current node contains a number of primitives below a threshold, return this node as a leaf node with the current set of primitives.
3) If the current node still contains more primitives than the threshold, split the bounding box in half at a split point (calculating the split point is discussed below).
4) Put one half as the left child and the other as the right child, and keep recursively splitting within each of the two children.
Get the split point:
I first pick an axis to split on, chosen as the largest dimension of the bounding box. In the majority of cases I simply use the midpoint of the axis as the split point. However, there are two special cases that need to be considered:
1) when one of the children contains no primitives (see below figure on the left): this happens when the primitives in the current bounding box are not distributed evenly, so one of the children ends up empty. In this case, instead of using the midpoint of the axis, I use the midpoint of the primitives' centers. This is not the most efficient way of splitting the bounding box, but it guarantees no empty children after splitting.
2) when both children contain no primitives: this happens when the node itself is empty, so splitting it results in both children being empty. In this case, I treat the empty node as a leaf node and return.
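The split-point heuristic above might be sketched like this (illustrative Python; `choose_split` and its arguments are made-up names, and the empty-node case is assumed to be handled by the caller returning a leaf):

```python
def choose_split(centroids, bbox_min, bbox_max):
    """Pick a split axis and coordinate for a BVH node.
    centroids: list of primitive-centroid 3-tuples inside the node."""
    extent = [bbox_max[i] - bbox_min[i] for i in range(3)]
    axis = extent.index(max(extent))                  # largest box dimension
    split = 0.5 * (bbox_min[axis] + bbox_max[axis])   # default: box midpoint
    left = [c for c in centroids if c[axis] <= split]
    if len(left) == 0 or len(left) == len(centroids):
        # Degenerate split (one empty child): fall back to the midpoint of
        # the primitives' centers so neither child ends up empty.
        split = sum(c[axis] for c in centroids) / len(centroids)
    return axis, split
```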

BVH Intersection:
Bounding Box Intersection Algorithm Overview:
As shown in the above figure on the right, we can get the t value of the intersection with a slab by calculating $t = (p.x - o.x)/d.x$ for the x-dimension. The main idea is to compute an intersection range per dimension and then intersect all the resulting t ranges.
1) Initialize t_{min} with -INFINITY and t_{max} with +INFINITY.
2) For the x-dimension, calculate tx_{min} and tx_{max}.
3) For the y-dimension, calculate ty_{min} and ty_{max}.
4) For the z-dimension, calculate tz_{min} and tz_{max}.
5) Find the intersection of the three ranges [tx_{min}, tx_{max}], [ty_{min}, ty_{max}] and [tz_{min}, tz_{max}].
6) If the three ranges have no common intersection, the ray doesn't intersect the bounding box.
7) If the three ranges do intersect, and the intersected range also overlaps the valid t range of the ray, then the ray intersects the bounding box.
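The slab test above can be sketched as follows (illustrative Python; the caller passes in the ray's current valid t range):

```python
def intersect_bbox(o, d, bmin, bmax, t_min, t_max):
    """Ray-AABB slab test; returns True iff the per-axis t ranges overlap
    each other and the ray's valid [t_min, t_max] range."""
    for i in range(3):
        if d[i] == 0.0:
            # Ray parallel to this slab pair: it must start between them.
            if o[i] < bmin[i] or o[i] > bmax[i]:
                return False
            continue
        t0 = (bmin[i] - o[i]) / d[i]
        t1 = (bmax[i] - o[i]) / d[i]
        if t0 > t1:
            t0, t1 = t1, t0          # order so t0 is the entry point
        t_min = max(t_min, t0)       # intersect with the running range
        t_max = min(t_max, t1)
        if t_min > t_max:
            return False             # empty intersection: no hit
    return True
```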
Closest Hit during BVH Traversal:
We want BVH traversal to return the closest hit point, but we don't need to handle this explicitly. Recall that whenever we find an intersection, we update the ray's valid t range with the intersection's t value, so any farther primitives are no longer considered. The t_{max} value of the ray therefore guarantees that we always keep the closest hit point.
Below I show some .dae files that were too complicated to render before but are now rendered efficiently thanks to the BVH acceleration structure.

Now we want to add more complexity to our rendering by considering light sources in the scene. We first deal with direct illumination effects such as area-light soft shadows and ambient occlusion. At a high level, for each hit point we want to sum the radiance contributed by all light sources in the scene.
Algorithm Overview:
1) For each hit point, we loop through all light sources in the scene.
2) For each light source, we randomly sample N points on the light given the hit point (N = 1 if the light source is a delta light).
3) We calculate the incoming radiance and the distance from the hit point to the light sample.
4) If the sampled light point lies behind the surface, we ignore this sample and continue to the next one.
5) Otherwise we cast a shadow ray into the scene to check for an intersection.
6) Only if there is no intersection do we evaluate the BSDF given the incoming and outgoing directions.
7) Accumulate a weighted spectrum value; the weight depends on the pdf of the sample, the number of samples, and the angle between the incoming ray and the normal.
Implementation details:
1) when casting the shadow ray, we offset its origin by a small amount (e.g. EPS_D) so that the shadow ray doesn't re-intersect the surface it starts on due to floating point imprecision.
2) take the absolute value of the cosine, since we don't want to weight the spectrum by a negative value.
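Putting the steps and details together, a direct-lighting estimator might look like this sketch (illustrative Python; `sample_light`, `occluded`, and `bsdf_f` are stand-in callables, not the project's actual interfaces, and spectra are modeled as plain floats):

```python
def estimate_direct(hit_p, normal, lights, occluded, bsdf_f, n_samples):
    """Monte Carlo direct-lighting estimate.
    lights: callables returning (radiance, wi, dist, pdf) for one sample
    toward the light. occluded(origin, wi, dist): shadow-ray query.
    bsdf_f(wi): BSDF value for the sampled incoming direction."""
    EPS_D = 1e-4
    total = 0.0
    for sample_light in lights:
        for _ in range(n_samples):
            radiance, wi, dist, pdf = sample_light(hit_p)
            cos_theta = sum(wi[i] * normal[i] for i in range(3))
            if cos_theta <= 0.0:
                continue  # light sample lies behind the surface
            # Offset the shadow-ray origin to avoid self-intersection.
            origin = tuple(hit_p[i] + EPS_D * wi[i] for i in range(3))
            if occluded(origin, wi, dist):
                continue  # shadow ray blocked: sample contributes nothing
            total += bsdf_f(wi) * radiance * abs(cos_theta) / (pdf * n_samples)
    return total
```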
Below I show some files rendered with direct illumination:

The noise level of rendered scene depends on the number of samples per area light. For the bunny scene, I compared the noise levels in soft shadows when rendering with different numbers of light rays.

We can observe clear differences among the rendered files above. When l=1, the entire scene is noisy, since each time we randomly sample only a single point within the area light. There are also almost no soft shadows, because we are making a binary decision: either the shadow ray connects with the light or it doesn't. As we increase the number of samples per light source, we see softer shadows under the bunny, and the scene becomes less noisy. By averaging the spectra returned by more shadow rays, we better approximate the true radiance value per pixel.
Now we consider indirect illumination. The idea is to trace rays recursively for each sample ray until it hits a light source. We also need to terminate a ray if it bounces too many times without hitting a light. Here we use Russian roulette to randomly decide whether to terminate a ray, based on the reflectance of the BSDF spectrum: the lower the reflectance, the higher the probability of terminating the ray.
Algorithm Overview:
1) Given the outgoing radiance direction at the intersection point, sample an incoming radiance direction along with its pdf value.
2) Evaluate the BSDF spectrum and apply it to Russian Roulette to get a terminating probability.
3) If terminating, return an empty spectrum. Otherwise trace a new ray with the hit point as origin in the incoming radiance direction, recursing to get an approximation of the incoming radiance.
4) Scale the approximated incoming radiance to get the outgoing radiance. The scale depends on the terminating probability, the BSDF, the cosine factor as well as the pdf.
Implementation details:
1) we don't want a ray to terminate too early, which results in noisy rendering, so we scale the BSDF spectrum by a factor of 10 or 20 when computing the continuation probability, guaranteeing a reasonable number of bounces per ray.
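The Russian-roulette decision might be sketched as follows (illustrative Python; `boost` encodes the 10-20x scaling mentioned above, and the reflectance is reduced to a single float):

```python
import random

def roulette_continue(reflectance, boost=10.0, rng=random.random):
    """Decide whether a path continues. The continuation probability grows
    with the BSDF reflectance, scaled by `boost` so rays don't die too
    early. Returns (should_continue, probability)."""
    p = min(1.0, reflectance * boost)
    return rng() < p, p
```

The returned probability is then used to divide the radiance of continued paths, which keeps the estimator unbiased despite the random termination.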
Below I show the bunny scene rendered with only direct lighting and with only indirect lighting.

Below I compare rendered views with various values of max ray depth. Notice that with only Lambertian surfaces, there is little difference between different max ray depths. Rendered images appear slightly brighter with a bigger max ray depth, as diffuse lighting acts like a low-pass filter on the environment lighting, so more bounces per ray result in a more uniformly lit scene. However, when we include specular surfaces in the next part, we will see that ray depth plays a much more important role in rendering the scene.

Below I compare rendered views with various samples-per-pixel rates, ranging up to 1024. Fewer samples per pixel result in a noisier image, and the noise level decreases as we increase the sample count. The more samples we have, the better we approximate the real scene lighting. When s reaches 1024, the scene converges to a stable state.

Previous parts allow us to render Lambertian surfaces, but in the real world there are many other materials that are not Lambertian, such as mirror and glass. In this part we consider both mirror and glass surfaces.
Algorithm Overview:
1) For a mirror surface, we only have reflection, so we return a cosine-weighted spectrum.
2) For a glass surface, we have both refraction and reflection. We first check whether the ray is entering from air or from glass by inspecting the angle between the ray direction and the normal (which always points toward the air).
3) Check for total internal reflection if the ray travels from glass to air.
4) If total internal reflection doesn't happen, apply Schlick's approximation to get the probability of reflection versus refraction.
5) If reflection is chosen, return a cosine-weighted reflectance; otherwise, if refraction is chosen, return a cosine-weighted transmittance.
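Schlick's approximation from step 4 can be written as a short function (illustrative Python):

```python
def schlick(cos_i, eta_i, eta_o):
    """Schlick's approximation to the Fresnel reflection coefficient:
    R(theta) = R0 + (1 - R0) * (1 - cos(theta))^5."""
    r0 = ((eta_i - eta_o) / (eta_i + eta_o)) ** 2
    return r0 + (1.0 - r0) * (1.0 - abs(cos_i)) ** 5
```

At normal incidence for air-to-glass (indices 1.0 to 1.5) this gives about 4% reflection, rising toward 100% at grazing angles.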
Reflection:
Reflection is simple: the reflected ray is symmetric with the incoming ray about the normal vector.
Refraction:
Below I show an illustration of refraction. The left figure captures a ray passing from glass into air, while the right figure shows a ray passing from air into glass. We have to take care of total internal reflection when the ray enters a medium with a lower refractive index. According to Snell's law we can then calculate the outgoing ray direction. Snell's law gives us the angle of the outgoing ray, but we would like the 3D direction of the outgoing ray. Taking the left refraction illustration as an example, we know that the incoming and outgoing rays lie in the same plane. Also: w_{o} = w_{o}^{y} + w_{o}^{x} and w_{i} = w_{i}^{y} + w_{i}^{x}. By Snell's law, the ratio between the magnitudes of w_{o}^{x} and w_{i}^{x} is n_{i}/n_{o}. We can therefore calculate the incoming ray given the outgoing ray, or vice versa.
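This tangential/normal decomposition leads to the standard refraction formula, sketched below (illustrative Python; `wi` is assumed to be a unit vector pointing away from the surface on the same side as the normal):

```python
import math

def refract(wi, n, eta_i, eta_o):
    """Refract direction wi across a surface with unit normal n, going
    from a medium with index eta_i into one with index eta_o.
    Returns the refracted direction, or None on total internal reflection."""
    cos_i = sum(wi[k] * n[k] for k in range(3))
    eta = eta_i / eta_o                      # Snell: eta_i sin_i = eta_o sin_o
    sin2_o = eta * eta * max(0.0, 1.0 - cos_i * cos_i)
    if sin2_o >= 1.0:
        return None                          # total internal reflection
    cos_o = math.sqrt(1.0 - sin2_o)
    # Tangential component scales by eta; normal component flips to the
    # far side of the surface with magnitude cos_o.
    return tuple(-eta * wi[k] + (eta * cos_i - cos_o) * n[k] for k in range(3))
```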

In the earlier part, when we only had diffuse surfaces, the max ray depth didn't affect the rendered image much. However, now that the scene contains both diffuse and specular surfaces, setting the max ray depth to different values results in very different renders. More specifically:
m=1: we can render the diffuse surfaces with one bounce, but not reflection or refraction; thus the glass and mirror balls are black.
m=2: we can render diffuse surfaces and reflection with 2 bounces; thus we can render the mirror ball, which only reflects light, while the glass ball is almost dark. This is because most rays are refracted when entering the glass ball and stay inside it, so we only observe the small portion of light being reflected, making the glass ball very dim.
m=3: the mirror ball is rendered in the same manner, with only reflection. This time we are able to observe refraction through the glass ball, as we let the ray refract twice to exit the ball and reach the camera.
m=4: we observe the light source's image on the floor, since these rays go through 4 bounces before entering the camera (3 bounces to get in and out of the glass ball and hit the floor, and 1 bounce from the floor to the camera).
m=5: we observe some light on the right half of the wall, since these rays go through 5 bounces before entering the camera (3 bounces to get in and out of the glass ball and hit the floor, 1 bounce from the floor to the right half of the wall, and 1 bounce from the wall to the camera).
m=100: to reach a stable rendering and get a smooth scene, we set the max ray depth to a large number so that Russian roulette fully controls ray termination, better approximating the real ray paths.

Below I show rendered images with different numbers of samples per pixel. Similar to what the previous part showed with only diffuse surfaces, the more samples we have per pixel, the less noisy the rendered image is.

Below is a converged rendered image with 1024 samples per pixel.
