Plotted below are the training and validation loss curves over 25 epochs, using a learning rate of 1e-3 and a batch size of 4. Blue is the loss on the training data and orange is the loss on the testing data.
If we take a closer look at the failure photos, we can see that the model is detecting points where the local pattern resembles that of a nose. For example, with the woman, the dimples in her smile create a pattern similar to that of a nose, and the last man's eye bags have a similar pattern as well. Thus I believe that the filters in the first layer, which are trained to recognize the pixel composition of a nose, are mistaking these other facial features for a nose.
For data augmentation, I incorporated random changes in brightness and saturation, random crops of the image within a specified window, and random rotations from -15 to 15 degrees. Some examples are shown below.
For task two I actually forwent the advice and stayed with 4 thicc convolutional layers instead of 5-6. After extensive experimentation, I got far better results with 4 layers, and I hypothesize this is because my image size is still relatively small: I chose to pass in images of size 160x120.
Convolutional layer 1: a 20-channel 5x5 convolution followed by a ReLU and a max pool.
Convolutional layer 2: another 20-channel 5x5x20 convolution followed by a ReLU and a max pool.
Convolutional layer 3: a 40-channel 5x5x20 convolution followed by a ReLU and a max pool.
Convolutional layer 4: a 60-channel 3x3x40 convolution feeding straight into the fully connected layers.
Fully connected layer 1: a 640x7560 weight matrix followed by a ReLU.
Fully connected layer 2: a 116x640 weight matrix.
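The layer list above can be sketched as a PyTorch module. A grayscale 1-channel, 120x160 input and 116 outputs (presumably 58 (x, y) landmarks) are assumptions consistent with the stated sizes; with default stride and no padding, the flattened feature size works out to exactly the 7560 in the first fully connected layer.

```python
import torch
import torch.nn as nn

# Sketch of the 4-conv-layer network described above.
# Assumes grayscale 120x160 inputs; kernel/channel sizes follow the text.
class LandmarkNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, 5), nn.ReLU(), nn.MaxPool2d(2),   # conv layer 1
            nn.Conv2d(20, 20, 5), nn.ReLU(), nn.MaxPool2d(2),  # conv layer 2
            nn.Conv2d(20, 40, 5), nn.ReLU(), nn.MaxPool2d(2),  # conv layer 3
            nn.Conv2d(40, 60, 3),                              # conv layer 4, no pool
        )
        self.fc1 = nn.Linear(7560, 640)  # 60 channels * 9 * 14 spatial positions
        self.fc2 = nn.Linear(640, 116)   # 116 outputs, e.g. 58 (x, y) landmarks

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.fc2(torch.relu(self.fc1(x)))
```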
As we can see, instead of limiting my channel counts to 12-28, I actually increased them to about 40-60 for the last two layers and got far better results. When I was experimenting with the 5-layered CNN with 12-28 channels, my predictions tended to collapse toward an average of the facial positions.
Once again, I used a learning rate of 1e-3 and a batch size of 4.
As seen above, my model is able to handle faces that are rotated, but it struggles with faces turned strongly to one side. As the bottom examples show, the model has trouble identifying the orientation of the face.
Since there was no validation dataset, I was not able to obtain a validation loss curve, so please ignore the orange line. As above, my learning rate was 1e-3 and my batch size was 4.
I opted to use resnet18, which is 18 layers deep and has the special property of adding each block's input (the "residual" shortcut) back to that block's output. I changed the input channel size and output size to fit our specific data.