date: 2024-12-20
title: CV-Neural Network
status: DONE
author:
- AllenYGY
tags:
- NOTE
publish: True
CV-Neural Network
Cross-entropy loss is a commonly used loss function for classification tasks, particularly in multi-class problems. For a single sample, the loss is defined as:

$$
L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)
$$

where:

- $C$ is the number of classes,
- $y_i$ is the ground-truth label for class $i$ (1 for the correct class, 0 otherwise),
- $\hat{y}_i$ is the predicted probability for class $i$ (e.g., the output of a softmax layer).
Multi-Class Classification: Cross-entropy is widely used in deep learning models for tasks like image classification (e.g., CNNs, transformers).
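As a concrete check on the formula, here is a minimal NumPy sketch; the two samples, labels, and predicted probabilities are made up for illustration.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)                    # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1).mean()

# Toy 3-class example with two samples
y_true = np.array([[1, 0, 0], [0, 1, 0]])                 # one-hot labels
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])     # softmax outputs
print(cross_entropy(y_true, y_pred))                      # ≈ 0.29
```

In practice, frameworks usually fuse the softmax and the log for numerical stability (for example, PyTorch's `nn.CrossEntropyLoss` takes raw logits rather than probabilities).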
- Cross-validation: measure prediction error on held-out validation data rather than on the training set (see the sketch after this list).
- Underfitting: the model is too simple to capture the underlying pattern, so both training and validation error are high.
- Overfitting: the model fits noise in the training data, so training error is low but validation error is high.
- Regularization: constrain model capacity (e.g., L2 weight decay, dropout) to reduce overfitting.
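A minimal sketch of these ideas together, assuming scikit-learn and a made-up toy dataset (neither appears in the original note): Ridge regression adds an L2 penalty (a form of regularization), and 5-fold cross-validation estimates the prediction error on held-out data.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge   # linear regression with an L2 penalty

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)    # toy data for illustration

fold_errors = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = Ridge(alpha=1.0)              # alpha controls regularization strength
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[val_idx])
    fold_errors.append(np.mean((pred - y[val_idx]) ** 2))  # validation MSE per fold

print(np.mean(fold_errors))               # cross-validated prediction error
```

Increasing `alpha` shrinks the weights more aggressively, trading a little extra bias (risk of underfitting) for less variance (less overfitting).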
Nonlinearity in neural networks is introduced through activation functions, which are applied to each neuron's weighted input to produce its output. Without them, a stack of layers would collapse into a single linear map; with them, the network can learn complex patterns. The table below compares common activation functions, and a short implementation sketch follows it.
Activation Function | Formula | Range | Properties | Advantages | Disadvantages |
---|---|---|---|---|---|
Sigmoid | $\sigma(x) = \frac{1}{1+e^{-x}}$ | $(0, 1)$ | Smooth, differentiable | Useful for probabilistic output | Vanishing gradient problem; not zero-centered |
Tanh | $\tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$ | $(-1, 1)$ | Smooth, zero-centered | Zero-centered output; better gradient flow | Still suffers from vanishing gradients |
ReLU | $\max(0, x)$ | $[0, \infty)$ | Sparse activation | Efficient; mitigates vanishing gradients | "Dead neurons" if weights drive inputs negative |
Leaky ReLU | $\max(\alpha x, x)$, small fixed $\alpha$ (e.g., 0.01) | $(-\infty, \infty)$ | Allows small gradient for negatives | Avoids dead neuron problem | Small negative slope can still slow learning on negative inputs |
PReLU | $\max(\alpha x, x)$, learnable $\alpha$ | $(-\infty, \infty)$ | Learnable negative slope | Adaptive negative slope | Risk of overfitting due to extra parameters |
Softmax | $\frac{e^{x_i}}{\sum_j e^{x_j}}$ | $(0, 1)$, sums to 1 | Converts logits to probabilities | Used for classification output layers | Not for hidden layers; computationally expensive |
Swish | $x \cdot \sigma(x)$ | $\approx [-0.28, \infty)$ | Smooth, differentiable | Improves training; no dead neurons | Computationally expensive |
GELU | $x \cdot \Phi(x)$, $\Phi$ the Gaussian CDF | $\approx [-0.17, \infty)$ | Smooth, differentiable; combines ReLU and Sigmoid concepts | Better for Transformer models | Slower than ReLU |
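The formulas in the table translate directly into code. Below is a minimal NumPy sketch of a few of them (the common tanh approximation of GELU is used instead of the exact Gaussian CDF):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # squashes to (0, 1)

def relu(x):
    return np.maximum(0.0, x)                  # zero for negatives

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)       # small slope for negatives

def swish(x):
    return x * sigmoid(x)                      # smooth, no dead neurons

def gelu(x):
    # tanh approximation of x * Phi(x)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    z = x - np.max(x, axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

x = np.linspace(-3.0, 3.0, 7)
print(relu(x))
print(swish(x))
print(softmax(x))                              # sums to 1 along the last axis
```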
Adding More Layers
How: Simply add more hidden layers between the input and output layers. This increases the depth of the neural network.
Why: More layers allow the network to learn more complex, hierarchical representations of the data: each additional layer can capture higher-order features built on top of those below it, which lets the model handle more abstract patterns. The increased depth raises the model's capacity to learn complex mappings between inputs and outputs, which is what makes deep networks suitable for tasks like image recognition and language modeling.
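A minimal PyTorch sketch of this idea; the layer widths (784, 256, 128, 64, 10) are arbitrary choices for illustration, not from the original note:

```python
import torch.nn as nn

# A shallow MLP: one hidden layer between input and output.
shallow = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Increasing depth: same input/output sizes, but more hidden layers,
# each of which can build higher-order features on top of the previous one.
deeper = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),   # extra hidden layer
    nn.Linear(128, 64),  nn.ReLU(),   # extra hidden layer
    nn.Linear(64, 10),
)
```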
Residual Connections (Skip Connections)
How: In architectures like ResNet, residual connections are used where the input to a layer is added directly to its output, bypassing the transformation at that layer. This is often referred to as "skip connections."
Why: Residual connections address the vanishing gradient problem by allowing gradients to flow more easily during backpropagation, even in very deep networks. This makes it easier to train deep networks without worrying about gradients becoming too small to update the weights effectively. These connections also help the network maintain performance by allowing it to learn both the residual (new information) and the identity (previous knowledge) mapping.
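A simplified residual block in PyTorch, sketching the idea rather than the exact ResNet block; the channel count and layer choices here are illustrative:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes F(x) and adds the input back in: out = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1   = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2   = nn.BatchNorm2d(channels)
        self.relu  = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)      # skip connection: identity added to the transform

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)      # torch.Size([1, 64, 32, 32])
```

Because the gradient of `out + x` with respect to `x` always contains an identity term, gradients can flow straight through the skip path even when the convolutional path saturates.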
Stacking Blocks of Layers
How: Instead of adding individual layers, a network can be built by stacking blocks of layers. For example, a convolutional block might consist of several convolutional layers followed by pooling layers. These blocks are repeated multiple times to form a deeper network.
Why: Stacking blocks of layers allows for more efficient learning. Each block can perform specific types of feature extraction (like edge detection in convolutional layers), and by stacking them, the network can learn progressively more abstract features. For example, early blocks may detect edges in images, while later blocks can recognize more complex shapes or objects. This modular approach also improves the reusability of network components, which makes training deeper networks more manageable.
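A minimal PyTorch sketch of the block-stacking pattern; the channel widths (3, 32, 64, 128) are illustrative:

```python
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """A reusable block: two conv layers followed by pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),               # halve spatial resolution after each block
    )

# Repeating the same block builds a deeper network: early blocks tend to learn
# low-level features (edges), later blocks progressively more abstract ones.
backbone = nn.Sequential(
    conv_block(3, 32),
    conv_block(32, 64),
    conv_block(64, 128),
)
```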