
VGG16 Transfer Learning Tutorial: A Hands-On MNIST Guide

2025-09-05 09:33:36


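Transfer Learning with VGG16 for MNIST Digit Recognition

This tutorial shows how to apply transfer learning with VGG16 to MNIST handwritten digit recognition. It covers building the model, loading pretrained weights, adapting the input size, and troubleshooting GPU configuration problems. The goal is a classifier that labels handwritten digits accurately and exposes logits for later gradient-based attacks.

Introduction to Transfer Learning

Transfer learning reuses a model trained on one task for another, related task. In image recognition, the usual approach is to take a model pretrained on a large dataset such as ImageNet and fine-tune it for the task at hand. VGG16 is a classic convolutional neural network that performs well on ImageNet, which makes it a common base model for transfer learning.

Environment Setup and Troubleshooting

Before starting, make sure the following libraries are installed: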

TensorFlow
Keras
NumPy

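If you run into a Kernel Restarting problem, first check whether TensorFlow detects and uses the GPU. Try the following steps:

Check the TensorFlow version: make sure you are using a TensorFlow build that supports the GPU.
Check the GPU driver: make sure the installed GPU driver is compatible with your TensorFlow version.
Verify GPU availability: use the following code to confirm that TensorFlow can detect the GPU: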

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  print("GPU is available")
  print("Num GPUs Available: ", len(gpus))
else:
  print("GPU is not available")

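If the output is "GPU is not available", check the GPU driver and the TensorFlow installation. On an Apple M2 Max chip, make sure TensorFlow is configured to use the Metal framework, which is typically done through the tensorflow-metal plugin.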
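Kernel restarts can also come from running out of memory rather than from a missing GPU. As a hedged sketch that is not part of the steps above, the snippet below asks TensorFlow to allocate GPU memory on demand through tf.config.experimental.set_memory_growth. The helper name enable_memory_growth is hypothetical, and whether this setting helps on a particular setup (Metal included) is an assumption to verify.

```python
import tensorflow as tf

def enable_memory_growth():
  """Ask TensorFlow to grow GPU memory on demand; returns how many GPUs were configured."""
  gpus = tf.config.list_physical_devices('GPU')
  configured = 0
  for gpu in gpus:
    try:
      tf.config.experimental.set_memory_growth(gpu, True)
      configured += 1
    except (RuntimeError, ValueError):
      # Raised when the GPU is already initialized or the option is not supported
      pass
  return configured

num_configured = enable_memory_growth()
print("GPUs with memory growth enabled:", num_configured)
```

The call has to run before any tensors are placed on the GPU, so it belongs at the top of the notebook or script.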

Building the VGG16 Transfer Learning Model

The following code shows how to use VGG16 for transfer learning on MNIST digit recognition:

import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

class VGG16TransferLearning(tf.keras.Model):
  def __init__(self, base_model):
    super(VGG16TransferLearning, self).__init__()
    # base model
    self.base_model = base_model
    self.base_model.trainable = False # Freeze the base model

    # other layers
    self.flatten = layers.Flatten()
    self.dense1 = layers.Dense(512, activation='relu')
    self.dense2 = layers.Dense(512, activation='relu')
    self.dense3 = layers.Dense(10) # 10 classes for MNIST digits

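  def call(self, x, training=False):
    # Keep the frozen base in inference mode regardless of the training flag
    x = self.base_model(x, training=False)
    x = self.flatten(x)
    x = self.dense1(x)
    x = self.dense2(x)
    # Always return raw logits: the loss is built with from_logits=True, and
    # validation and gradient-based attacks need logits as well
    return self.dense3(x)

Code explanation:

VGG16TransferLearning subclasses tf.keras.Model to define a custom model.
base_model receives the pretrained VGG16 model.
base_model.trainable = False freezes the VGG16 weights so that training does not modify them. This is the key step in transfer learning: the pretrained feature extractor is reused as is.
flatten flattens the VGG16 output.
dense1, dense2, and dense3 are fully connected layers used for classification. dense3 has 10 outputs, one per MNIST digit class.
call defines the model's forward pass.
call returns logits in both training and inference, which is what a loss configured with from_logits=True expects during validation as well. Apply tf.nn.softmax to the output whenever probabilities are needed.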
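The number of features that flatten hands to dense1 depends on the input resolution. As a rough worked example, the sketch below relies on two properties of the Keras VGG16: its convolutions use 'same' padding, and its five 2x2 max-pooling layers each halve height and width, rounding down. The helper names vgg16_feature_shape and head_parameter_count are hypothetical.

```python
def vgg16_feature_shape(height, width):
  """Output shape of VGG16 with include_top=False: five 2x2 max-pools, 512 channels."""
  for _ in range(5):
    height, width = height // 2, width // 2
  return height, width, 512

def head_parameter_count(flat_features, hidden=512, classes=10):
  """Weights plus biases of the Dense(512) -> Dense(512) -> Dense(10) head."""
  sizes = [flat_features, hidden, hidden, classes]
  return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

for size in (32, 75, 224):
  h, w, c = vgg16_feature_shape(size, size)
  flat = h * w * c
  print(f"input {size}x{size}: features {(h, w, c)}, flattened {flat}, "
        f"head parameters {head_parameter_count(flat)}")
```

For a 75x75 input this gives a 2x2x512 feature map, so the three dense layers hold about 1.3 million trainable parameters, next to roughly 14.7 million frozen ones in the VGG16 base.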

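Data Preprocessing

MNIST images are 28x28 grayscale, whereas VGG16 expects three-channel (RGB) input at a larger size. The data therefore needs preprocessing:

Resize: scale the images to a size VGG16 accepts, for example 75x75 or 224x224. With include_top=False, the Keras VGG16 requires a width and height of at least 32; 224x224 is the resolution the ImageNet weights were trained at.
Convert to RGB: turn the grayscale images into three-channel RGB images.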

import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import mnist

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

img_height, img_width = 75, 75  # Or 224, 224 (needs far more memory)

def preprocess(images, batch_size=1000):
  """Resize 28x28 grayscale digits, scale to [0, 1], and replicate to 3 channels."""
  out = np.empty((len(images), img_height, img_width, 3), dtype='float32')
  for start in range(0, len(images), batch_size):
    # Add a channel axis and normalize pixel values
    batch = images[start:start + batch_size, ..., np.newaxis].astype('float32') / 255.0
    batch = tf.image.resize(batch, (img_height, img_width))
    # Repeat the single gray channel three times to get an RGB-shaped tensor
    out[start:start + batch_size] = tf.image.grayscale_to_rgb(batch).numpy()
  return out

x_train_resized = preprocess(x_train)
x_test_resized = preprocess(x_test)

print("Shape of x_train_resized:", x_train_resized.shape) # (60000, 75, 75, 3) for 75x75
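The choice between 75x75 and 224x224 has a large effect on memory, because the resized training set is held in one dense float32 array. The back-of-the-envelope sketch below estimates that footprint; the helper name array_gigabytes is hypothetical, and the figures cover only the array itself, not the model or any intermediate copies.

```python
def array_gigabytes(num_images, height, width, channels=3, bytes_per_value=4):
  """Size in GiB of a dense float32 image array."""
  return num_images * height * width * channels * bytes_per_value / 1024**3

for size in (75, 224):
  print(f"{size}x{size}: {array_gigabytes(60000, size, size):.1f} GiB for the training set")
```

At 75x75 the training array needs a little under 4 GiB, while 224x224 needs well over 30 GiB, which on many machines is enough to cause the kernel restarts discussed earlier.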

Compiling and Training the Model

# Load VGG16 model
base_model = VGG16(weights="imagenet", include_top=False, input_shape=(img_height, img_width, 3))

# Instantiate the transfer learning model
model = VGG16TransferLearning(base_model)

# Compile the model
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

# Train the model
model.fit(x_train_resized, y_train, epochs=10, validation_data=(x_test_resized, y_test))

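Code explanation:

VGG16(weights="imagenet", include_top=False, input_shape=(img_height, img_width, 3)) loads the pretrained VGG16 model. include_top=False drops VGG16's original classifier head, and input_shape sets the input image dimensions.
model.compile sets the loss function, optimizer, and evaluation metric. SparseCategoricalCrossentropy(from_logits=True) suits multi-class classification with integer labels and expects the model output to be logits.
model.fit trains the model.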
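To see why from_logits=True matters, the NumPy sketch below computes sparse categorical cross-entropy by hand, once from raw logits and once from values that have already been passed through softmax. The functions and the toy numbers are illustrative assumptions, not the Keras implementation.

```python
import numpy as np

def softmax(z):
  """Numerically stable softmax over the last axis."""
  z = z - z.max(axis=-1, keepdims=True)
  e = np.exp(z)
  return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_from_logits(logits, labels):
  """Mean sparse categorical cross-entropy, treating the inputs as logits."""
  log_probs = np.log(softmax(logits))
  return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[8.0, 0.5, -1.0], [0.2, 6.0, 0.1]])
labels = np.array([0, 1])

correct = cross_entropy_from_logits(logits, labels)
# Mistake: probabilities passed where logits are expected (softmax applied twice)
double_softmax = cross_entropy_from_logits(softmax(logits), labels)

print("loss from logits:        ", correct)
print("loss from softmax output:", double_softmax)
```

Both predictions are confident and correct, yet the second loss stays above 0.5 because probabilities are confined to [0, 1] and a second softmax flattens them. A model that outputs softmax probabilities into a from_logits=True loss reports misleading loss values for the same reason.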

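Getting Logits for Gradient-Based Attacks

After training, you can use the model to obtain logits for subsequent gradient-based attacks. This relies on call returning the raw dense3 output rather than softmax probabilities at inference time.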

# Get logits for a sample image
sample_image = x_test_resized[0:1] # Slicing with 0:1 keeps the batch axis: shape (1, img_height, img_width, 3)
logits = model(sample_image)

print("Logits shape:", logits.shape)
print("Logits:", logits)
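The source does not name a specific attack. As one common example, the sketch below implements a single step of the fast gradient sign method (FGSM) with tf.GradientTape. It assumes the model returns raw logits and that pixel values lie in [0, 1]. The function fgsm_perturb, the epsilon value, and the small stand-in classifier are hypothetical and exist only so that the snippet runs on its own.

```python
import tensorflow as tf

def fgsm_perturb(model, images, labels, epsilon=0.05):
  """One FGSM step: shift each pixel by epsilon in the direction that raises the loss."""
  images = tf.convert_to_tensor(images, dtype=tf.float32)
  loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
  with tf.GradientTape() as tape:
    tape.watch(images)  # images are not variables, so watch them explicitly
    logits = model(images, training=False)  # assumes the model returns logits
    loss = loss_fn(labels, logits)
  gradient = tape.gradient(loss, images)
  adversarial = images + epsilon * tf.sign(gradient)
  return tf.clip_by_value(adversarial, 0.0, 1.0)  # stay in the normalized pixel range

# Stand-in classifier so the sketch is self-contained; swap in the trained model
toy_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])
toy_images = tf.random.uniform((2, 8, 8, 3))
toy_labels = tf.constant([3, 7])

adv_images = fgsm_perturb(toy_model, toy_images, toy_labels)
print("Max pixel change:", float(tf.reduce_max(tf.abs(adv_images - toy_images))))
```

With the trained model, the call would look like fgsm_perturb(model, x_test_resized[0:1], y_test[0:1]); comparing predictions before and after the perturbation shows whether the attack changed the predicted digit.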