Assignment 3 Report

To complete the classification task over the 101 categories of the Caltech-101 dataset, a VGG-style convolutional neural network built from repeated blocks is used.

VGG

The network structure is implemented as follows.

vgg.py

from torch import nn


def vgg_block(num_convs, in_channels, out_channels):
    """
    Build a single VGG block.
    :param num_convs: number of convolutional layers
    :param in_channels: number of input channels
    :param out_channels: number of output channels
    :return: a sequential model made of convolutional layers, activations and a pooling layer
    """
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)


# VGG network
def vgg(conv_arch):
    conv_blks = []
    in_channels = 3
    # convolutional part
    for (num_convs, out_channels) in conv_arch:
        conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels

    # the input dimension of the fully connected part depends on the output size of the last block
    return nn.Sequential(
        *conv_blks,
        nn.Flatten(),
        # fully connected part
        nn.Linear(out_channels * 7 * 7, 4096),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(4096, 4096),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(4096, 102)
    )

The network structure is analyzed below:

  1. First, the basic VGG block is defined. The block is parameterized by three externally supplied arguments, namely the number of convolutional layers, the number of input channels, and the number of output channels; from these three parameters the block is assembled and returned as a sequential model consisting of convolutional layers, activation functions, and a pooling layer.

  2. The VGG network is then built from this block. Because a color image has three input channels (R, G, B), the network starts from three input channels; a loop over the architecture tuples then appends one VGG block per entry and updates the channel counts accordingly.

  3. The vgg function returns a model that flattens the multi-dimensional tensor produced by the convolutional part into a one-dimensional vector and then applies three fully connected layers:

    First fully connected layer (nn.Linear(out_channels * 7 * 7, 4096))

    • The input dimension is the size of the flattened feature map: the number of channels output by the last convolutional block (out_channels) times its spatial size (7x7), i.e. out_channels * 7 * 7 (verified by the shape-check sketch after this list).
    • The output dimension is 4096, i.e. the layer has 4096 neurons.

    nn.ReLU(): applies the ReLU activation to the output, adding non-linearity.

    nn.Dropout(0.5): during training, 50% of the neurons are randomly dropped to reduce overfitting.

    Second fully connected layer (nn.Linear(4096, 4096)): a fully connected layer with 4096 neurons; its input and output sizes are both 4096.

    Third fully connected layer (nn.Linear(4096, 102))

    • The output dimension is 102, so the network emits one score (logit) per class; loading 101_ObjectCategories with ImageFolder yields 102 classes, the 101 object categories plus the background folder.
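
A quick way to verify these dimensions is to push a dummy batch through the network and print the shape after each top-level module. This is a minimal sketch, assuming vgg.py is importable and using the same conv_arch as in main.py; with a 224x224 input, the five max-pooling layers halve the spatial size down to 224 / 2^5 = 7.

import torch
from vgg import vgg

conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
net = vgg(conv_arch)

X = torch.randn(1, 3, 224, 224)  # one dummy RGB image
for blk in net:
    X = blk(X)
    print(blk.__class__.__name__, 'output shape:', X.shape)
# The fifth VGG block outputs (1, 512, 7, 7); after nn.Flatten the feature
# vector has 512 * 7 * 7 = 25088 elements, and the final nn.Linear produces
# a (1, 102) tensor of class logits.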

Hyperparameters and Optimizer

Training uses stochastic gradient descent as the optimizer (SGD with momentum 0.9 and a weight decay of 2e-4, as configured in util.py) and cross-entropy as the loss function. The learning rate is set to 0.01, the number of epochs to 20, and the batch size batch_size to 32.
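
Expressed in code, this configuration corresponds to the following setup. This is a minimal sketch that mirrors what train() in util.py does and assumes vgg.py is importable:

from torch import nn, optim
from vgg import vgg

net = vgg(((1, 64), (1, 128), (2, 256), (2, 512), (2, 512)))
batch_size, lr, num_epochs = 32, 0.01, 20

# SGD with momentum and a small weight decay, as in util.py
optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=2e-4)
# cross-entropy loss over the 102 class logits
criterion = nn.CrossEntropyLoss()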

Results

The results of training and testing the model are as follows:

training on cuda
epoch: 1, loss: 4.1256, accuracy: 11.84%
epoch: 2, loss: 3.3503, accuracy: 29.57%
epoch: 3, loss: 2.7933, accuracy: 39.18%
epoch: 4, loss: 2.3512, accuracy: 46.23%
epoch: 5, loss: 1.9799, accuracy: 53.01%
epoch: 6, loss: 1.6415, accuracy: 60.11%
epoch: 7, loss: 1.3198, accuracy: 66.47%
epoch: 8, loss: 1.0045, accuracy: 73.82%
epoch: 9, loss: 0.7526, accuracy: 80.01%
epoch: 10, loss: 0.5546, accuracy: 85.41%
epoch: 11, loss: 0.4127, accuracy: 88.86%
epoch: 12, loss: 0.3165, accuracy: 91.59%
epoch: 13, loss: 0.2522, accuracy: 92.89%
epoch: 14, loss: 0.2186, accuracy: 94.30%
epoch: 15, loss: 0.1374, accuracy: 96.43%
epoch: 16, loss: 0.1213, accuracy: 97.03%
epoch: 17, loss: 0.1369, accuracy: 96.51%
epoch: 18, loss: 0.1459, accuracy: 96.16%
epoch: 19, loss: 0.1010, accuracy: 97.10%
epoch: 20, loss: 0.1033, accuracy: 97.48%

(Figure: training loss and accuracy curves over the 20 epochs, as plotted by util.py)

Data Augmentation

The data augmentation strategy is as follows; a quick sanity check of these transforms is sketched after the two lists below.

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),                     # random horizontal flip
    transforms.RandomRotation(10),                         # random rotation of +/- 10 degrees
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random crop and resize
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2),  # color jitter
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
test_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

For the training set:

  • All images are resized to a uniform size of 224x224 pixels so that they match the input size expected by the network.
  • Images are randomly flipped horizontally, which helps the model learn features that do not depend on object orientation.
  • Images are randomly rotated by +/- 10 degrees, helping the model learn rotation-tolerant features.
  • Images are randomly cropped and resized back to 224x224 pixels, which helps the model learn features at different scales and positions.
  • Brightness, contrast, saturation, and hue are randomly jittered. These adjustments simulate images taken under different lighting and color conditions and make the model more robust to such variations.
  • The image (PIL image or ndarray) is converted to a torch.Tensor.
  • Each channel is normalized using the ImageNet statistics, matching the input distribution expected by models pretrained on ImageNet.

For the test set:

  • Images are resized to 224x224 pixels.
  • Images are converted to tensors.
  • Each channel is normalized in the same way as the training set.
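
The following sketch is a quick sanity check of the two pipelines; it reuses the train_transform and test_transform defined above, and "sample.jpg" is a hypothetical path standing in for any image from the dataset:

from PIL import Image
import torch

img = Image.open("sample.jpg").convert("RGB")   # hypothetical sample image

x = train_transform(img)                        # one random augmented training view
print(x.shape, x.dtype)                         # torch.Size([3, 224, 224]) torch.float32

y = test_transform(img)                         # deterministic test-time view
print(torch.equal(y, test_transform(img)))      # True: no randomness at test time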

Helper Code

util.py

from torch import nn
from matplotlib import pyplot as plt
import torch
from torch import optim


def train(net, train_iter, test_iter, num_epochs, lr, device):
    """Train the model on the GPU (test_iter is accepted but not used here; only training metrics are logged)."""
    def init_weights(m):
        if type(m) == nn.Linear or type(m) == nn.Conv2d:
            nn.init.xavier_uniform_(m.weight)
    net.apply(init_weights)
    print("training on", device)
    net.to(device)
    optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=2e-4)
    loss = nn.CrossEntropyLoss()

    # lists for recording training statistics
    train_losses = []
    train_accuracies = []

    for epoch in range(num_epochs):
        # accumulated training loss, number of correct predictions, number of samples
        net.train()
        correct = 0
        total = 0
        epoch_loss = 0.0

        for i, (X, y) in enumerate(train_iter):
            optimizer.zero_grad()                 # clear gradients
            X, y = X.to(device), y.to(device)     # move batch to the GPU
            y_hat = net(X)                        # forward pass
            l = loss(y_hat, y)                    # compute loss
            l.backward()                          # backpropagation
            optimizer.step()                      # parameter update

            # accumulate loss
            epoch_loss += l.item()
            # accumulate accuracy statistics
            _, predicted = torch.max(y_hat, 1)
            total += y.size(0)
            correct += (predicted == y).sum().item()

        # average loss and accuracy of this epoch
        avg_loss = epoch_loss / len(train_iter)
        accuracy = correct / total * 100  # percentage

        # record for plotting
        train_losses.append(avg_loss)
        train_accuracies.append(accuracy)

        print(f"epoch: {epoch+1}, loss: {avg_loss:.4f}, accuracy: {accuracy:.2f}%")

    # plot the training loss and accuracy curves
    plt.figure(figsize=(12, 5))

    # loss curve
    plt.subplot(1, 2, 1)
    plt.plot(range(1, num_epochs + 1), train_losses, label='Train Loss', color='blue')
    plt.title('Training Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()

    # accuracy curve
    plt.subplot(1, 2, 2)
    plt.plot(range(1, num_epochs + 1), train_accuracies, label='Train Accuracy', color='green')
    plt.title('Training Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy (%)')
    plt.legend()

    plt.tight_layout()
    plt.show()

Main Program

main.py

from vgg import *
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms
from util import train

if __name__ == "__main__":
    conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
    net = vgg(conv_arch)  # VGG network model

    dataset_path = 'caltech-101\\caltech-101\\101_ObjectCategories\\101_ObjectCategories'

    # data augmentation via transforms

    # augmentation strategy 1
    # train_transform = transforms.Compose([
    #     transforms.Resize((224, 224)),
    #     transforms.RandomHorizontalFlip(),
    #     transforms.ToTensor(),
    #     transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    # ])
    # final accuracy: 85.60%, loss = 0.5337

    # augmentation strategy 2
    train_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(),                     # random horizontal flip
        transforms.RandomRotation(10),                         # random rotation of +/- 10 degrees
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random crop and resize
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.2),  # color jitter
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    test_transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    # final accuracy: 84.76%, loss = 0.5694

    # dataset
    dataset = datasets.ImageFolder(root=dataset_path, transform=train_transform)

    # train/test split
    train_size = int(0.8 * len(dataset))
    test_size = len(dataset) - train_size
    train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

    # apply the test transform
    # (note: train_dataset and test_dataset are Subsets of the same underlying
    #  ImageFolder, so this assignment changes the transform used by both subsets)
    test_dataset.dataset.transform = test_transform

    # data loaders
    batch_size, lr, epochs = 32, 0.01, 20
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    # define the optimizer, loss function, and learning-rate schedule (handled inside util.train)
    # optimizer = optim.SGD(net.parameters(), lr=lr, momentum=0.9, weight_decay=2e-4)
    # criterion = nn.CrossEntropyLoss()  # cross-entropy loss
    # scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train(net, train_loader, test_loader, epochs, lr, device)