使用scikit-learn为PyTorch 模型进行超参数网格搜索
liuian 2025-05-08 19:41 3 浏览
scikit-learn是Python中最好的机器学习库,而PyTorch又为我们构建模型提供了方便的操作,能否将它们的优点整合起来呢?在本文中,我们将介绍如何使用 scikit-learn中的网格搜索功能来调整 PyTorch 深度学习模型的超参数:
- 如何包装 PyTorch 模型以用于 scikit-learn 以及如何使用网格搜索
- 如何网格搜索常见的神经网络参数,如学习率、Dropout、epochs、神经元数
- 在自己的项目上定义自己的超参数调优实验
如何在 scikit-learn 中使用 PyTorch 模型
要让PyTorch 模型可以在 scikit-learn 中使用的一个最简单的方法是使用skorch包。,这个包为 PyTorch 模型提供与 scikit-learn 兼容的 API。 在skorch中,有分类神经网络的NeuralNetClassifier和回归神经网络的NeuralNetRegressor。
pip install skorch
要使用这些包装器,必须使用 nn.Module 将 PyTorch 模型定义为类,然后在构造 NeuralNetClassifier 类时将类的名称传递给模块参数。 例如:
class MyClassifier(nn.Module):
def __init__(self):
super().__init__()
...
def forward(self, x):
...
return x
# create the skorch wrapper
model = NeuralNetClassifier(
module=MyClassifier
)
NeuralNetClassifier 类的构造函数可以获得传递给 model.fit() 调用的参数(在 scikit-learn 模型中调用训练循环的方法),例如轮次数和批量大小等。 例如:
model = NeuralNetClassifier(
module=MyClassifier,
max_epochs=150,
batch_size=10
)
NeuralNetClassifier类的构造函数也可以接受新的参数,这些参数可以传递给你的模型类的构造函数,要求是必须在它前面加上module__(两个下划线)。这些新参数可能在构造函数中带有默认值,但当包装器实例化模型时,它们将被覆盖。例如:
import torch.nn as nn
from skorch import NeuralNetClassifier
class SonarClassifier(nn.Module):
def __init__(self, n_layers=3):
super().__init__()
self.layers = []
self.acts = []
for i in range(n_layers):
self.layers.append(nn.Linear(60, 60))
self.acts.append(nn.ReLU())
self.add_module(f"layer{i}", self.layers[-1])
self.add_module(f"act{i}", self.acts[-1])
self.output = nn.Linear(60, 1)
def forward(self, x):
for layer, act in zip(self.layers, self.acts):
x = act(layer(x))
x = self.output(x)
return x
model = NeuralNetClassifier(
module=SonarClassifier,
max_epochs=150,
batch_size=10,
module__n_layers=2
)
我们可以通过初始化一个模型并打印来验证结果:
print(model.initialize())
#结果如下:
<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
module_=SonarClassifier(
(layer0): Linear(in_features=60, out_features=60, bias=True)
(act0): ReLU()
(layer1): Linear(in_features=60, out_features=60, bias=True)
(act1): ReLU()
(output): Linear(in_features=60, out_features=1, bias=True)
),
)
在scikit-learn中使用网格搜索
网格搜索是一种模型超参数优化技术。它只是简单地穷尽超参数的所有组合,并找到给出最佳分数的组合。在scikit-learn中,GridSearchCV类提供了这种技术。在构造这个类时,必须在param_grid参数中提供一个超参数字典。这是模型参数名和要尝试的值数组的映射。
默认使用精度作为优化的分数,但其他分数可以在GridSearchCV构造函数的score参数中指定。GridSearchCV将为每个参数组合构建一个模型进行评估。并且使用默认的3倍交叉验证,这些都是可以通过参数来进行设置的。
下面是定义一个简单网格搜索的例子:
param_grid = {
'epochs': [10,20,30]
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
通过将GridSearchCV构造函数中的n_jobs参数设置为 -1表示将使用机器上的所有核心。否则,网格搜索进程将只在单线程中运行,这在多核cpu中较慢。
运行完毕就可以在grid.fit()返回的结果对象中访问网格搜索的结果。best_score提供了在优化过程中观察到的最佳分数,best_params_描述了获得最佳结果的参数组合。
示例问题描述
我们的示例都将在一个小型标准机器学习数据集上进行演示,该数据集是一个糖尿病发作分类数据集。这是一个小型数据集,所有的数值属性都很容易处理。
如何调优批大小和训练的轮次
在第一个简单示例中,我们将介绍如何调优批大小和拟合网络时使用的epoch数。
我们将简单评估从10到100的不批大小,代码清单如下所示:
import random
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV
# load the dataset, split into input (X) and output (y) variables
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
# PyTorch classifier
class PimaClassifier(nn.Module):
def __init__(self):
super().__init__()
self.layer = nn.Linear(8, 12)
self.act = nn.ReLU()
self.output = nn.Linear(12, 1)
self.prob = nn.Sigmoid()
def forward(self, x):
x = self.act(self.layer(x))
x = self.prob(self.output(x))
return x
# create model with skorch
model = NeuralNetClassifier(
PimaClassifier,
criterion=nn.BCELoss,
optimizer=optim.Adam,
verbose=False
)
# define the grid search parameters
param_grid = {
'batch_size': [10, 20, 40, 60, 80, 100],
'max_epochs': [10, 50, 100]
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))
结果如下:
Best: 0.714844 using {'batch_size': 10, 'max_epochs': 100}
0.665365 (0.020505) with: {'batch_size': 10, 'max_epochs': 10}
0.588542 (0.168055) with: {'batch_size': 10, 'max_epochs': 50}
0.714844 (0.032369) with: {'batch_size': 10, 'max_epochs': 100}
0.671875 (0.022326) with: {'batch_size': 20, 'max_epochs': 10}
0.696615 (0.008027) with: {'batch_size': 20, 'max_epochs': 50}
0.714844 (0.019918) with: {'batch_size': 20, 'max_epochs': 100}
0.666667 (0.009744) with: {'batch_size': 40, 'max_epochs': 10}
0.687500 (0.033603) with: {'batch_size': 40, 'max_epochs': 50}
0.707031 (0.024910) with: {'batch_size': 40, 'max_epochs': 100}
0.667969 (0.014616) with: {'batch_size': 60, 'max_epochs': 10}
0.694010 (0.036966) with: {'batch_size': 60, 'max_epochs': 50}
0.694010 (0.042473) with: {'batch_size': 60, 'max_epochs': 100}
0.670573 (0.023939) with: {'batch_size': 80, 'max_epochs': 10}
0.674479 (0.020752) with: {'batch_size': 80, 'max_epochs': 50}
0.703125 (0.026107) with: {'batch_size': 80, 'max_epochs': 100}
0.680990 (0.014382) with: {'batch_size': 100, 'max_epochs': 10}
0.670573 (0.013279) with: {'batch_size': 100, 'max_epochs': 50}
0.687500 (0.017758) with: {'batch_size': 100, 'max_epochs': 100}
可以看到'batch_size': 10, 'max_epochs': 100达到了约71%的精度的最佳结果。
如何调整训练优化器
下面我们看看如何调整优化器,我们知道有很多个优化器可以选择比如SDG,Adam等,那么如何选择呢?
完整的代码如下:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV
# load the dataset, split into input (X) and output (y) variables
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
# PyTorch classifier
class PimaClassifier(nn.Module):
def __init__(self):
super().__init__()
self.layer = nn.Linear(8, 12)
self.act = nn.ReLU()
self.output = nn.Linear(12, 1)
self.prob = nn.Sigmoid()
def forward(self, x):
x = self.act(self.layer(x))
x = self.prob(self.output(x))
return x
# create model with skorch
model = NeuralNetClassifier(
PimaClassifier,
criterion=nn.BCELoss,
max_epochs=100,
batch_size=10,
verbose=False
)
# define the grid search parameters
param_grid = {
'optimizer': [optim.SGD, optim.RMSprop, optim.Adagrad, optim.Adadelta,
optim.Adam, optim.Adamax, optim.NAdam],
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))
输出如下:
Best: 0.721354 using {'optimizer': <class 'torch.optim.adamax.Adamax'>}
0.674479 (0.036828) with: {'optimizer': <class 'torch.optim.sgd.SGD'>}
0.700521 (0.043303) with: {'optimizer': <class 'torch.optim.rmsprop.RMSprop'>}
0.682292 (0.027126) with: {'optimizer': <class 'torch.optim.adagrad.Adagrad'>}
0.572917 (0.051560) with: {'optimizer': <class 'torch.optim.adadelta.Adadelta'>}
0.714844 (0.030758) with: {'optimizer': <class 'torch.optim.adam.Adam'>}
0.721354 (0.019225) with: {'optimizer': <class 'torch.optim.adamax.Adamax'>}
0.709635 (0.024360) with: {'optimizer': <class 'torch.optim.nadam.NAdam'>}
可以看到对于我们的模型和数据集Adamax优化算法是最佳的,准确率约为72%。
如何调整学习率
虽然pytorch里面学习率计划可以让我们根据轮次动态调整学习率,但是作为样例,我们将学习率和学习率的参数作为网格搜索的一个参数来进行演示。在PyTorch中,设置学习率和动量的方法如下:
optimizer = optim.SGD(lr=0.001, momentum=0.9)
在skorch包中,使用前缀optimizer__将参数路由到优化器。
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV
# load the dataset, split into input (X) and output (y) variables
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
# PyTorch classifier
class PimaClassifier(nn.Module):
def __init__(self):
super().__init__()
self.layer = nn.Linear(8, 12)
self.act = nn.ReLU()
self.output = nn.Linear(12, 1)
self.prob = nn.Sigmoid()
def forward(self, x):
x = self.act(self.layer(x))
x = self.prob(self.output(x))
return x
# create model with skorch
model = NeuralNetClassifier(
PimaClassifier,
criterion=nn.BCELoss,
optimizer=optim.SGD,
max_epochs=100,
batch_size=10,
verbose=False
)
# define the grid search parameters
param_grid = {
'optimizer__lr': [0.001, 0.01, 0.1, 0.2, 0.3],
'optimizer__momentum': [0.0, 0.2, 0.4, 0.6, 0.8, 0.9],
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))
结果如下:
Best: 0.682292 using {'optimizer__lr': 0.001, 'optimizer__momentum': 0.9}
0.648438 (0.016877) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.0}
0.671875 (0.017758) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.2}
0.674479 (0.022402) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.4}
0.677083 (0.011201) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.6}
0.679688 (0.027621) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.8}
0.682292 (0.026557) with: {'optimizer__lr': 0.001, 'optimizer__momentum': 0.9}
0.671875 (0.019918) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.0}
0.648438 (0.024910) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.2}
0.546875 (0.143454) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.4}
0.567708 (0.153668) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.6}
0.552083 (0.141790) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.8}
0.451823 (0.144561) with: {'optimizer__lr': 0.01, 'optimizer__momentum': 0.9}
0.348958 (0.001841) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.0}
0.450521 (0.142719) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.2}
0.450521 (0.142719) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.4}
0.450521 (0.142719) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.6}
0.348958 (0.001841) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.8}
0.348958 (0.001841) with: {'optimizer__lr': 0.1, 'optimizer__momentum': 0.9}
0.444010 (0.136265) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.0}
0.450521 (0.142719) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.2}
0.348958 (0.001841) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.4}
0.552083 (0.141790) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.6}
0.549479 (0.142719) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.8}
0.651042 (0.001841) with: {'optimizer__lr': 0.2, 'optimizer__momentum': 0.9}
0.552083 (0.141790) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.0}
0.348958 (0.001841) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.2}
0.450521 (0.142719) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.4}
0.552083 (0.141790) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.6}
0.450521 (0.142719) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.8}
0.450521 (0.142719) with: {'optimizer__lr': 0.3, 'optimizer__momentum': 0.9}
对于SGD,使用0.001的学习率和0.9的动量获得了最佳结果,准确率约为68%。
如何激活函数
激活函数控制单个神经元的非线性。我们将演示评估PyTorch中可用的一些激活函数。
import numpy as np
import torch
import torch.nn as nn
import torch.nn.init as init
import torch.optim as optim
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV
# load the dataset, split into input (X) and output (y) variables
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
# PyTorch classifier
class PimaClassifier(nn.Module):
def __init__(self, activation=nn.ReLU):
super().__init__()
self.layer = nn.Linear(8, 12)
self.act = activation()
self.output = nn.Linear(12, 1)
self.prob = nn.Sigmoid()
# manually init weights
init.kaiming_uniform_(self.layer.weight)
init.kaiming_uniform_(self.output.weight)
def forward(self, x):
x = self.act(self.layer(x))
x = self.prob(self.output(x))
return x
# create model with skorch
model = NeuralNetClassifier(
PimaClassifier,
criterion=nn.BCELoss,
optimizer=optim.Adamax,
max_epochs=100,
batch_size=10,
verbose=False
)
# define the grid search parameters
param_grid = {
'module__activation': [nn.Identity, nn.ReLU, nn.ELU, nn.ReLU6,
nn.GELU, nn.Softplus, nn.Softsign, nn.Tanh,
nn.Sigmoid, nn.Hardsigmoid]
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))
结果如下:
Best: 0.699219 using {'module__activation': <class 'torch.nn.modules.activation.ReLU'>}
0.687500 (0.025315) with: {'module__activation': <class 'torch.nn.modules.linear.Identity'>}
0.699219 (0.011049) with: {'module__activation': <class 'torch.nn.modules.activation.ReLU'>}
0.674479 (0.035849) with: {'module__activation': <class 'torch.nn.modules.activation.ELU'>}
0.621094 (0.063549) with: {'module__activation': <class 'torch.nn.modules.activation.ReLU6'>}
0.674479 (0.017566) with: {'module__activation': <class 'torch.nn.modules.activation.GELU'>}
0.558594 (0.149189) with: {'module__activation': <class 'torch.nn.modules.activation.Softplus'>}
0.675781 (0.014616) with: {'module__activation': <class 'torch.nn.modules.activation.Softsign'>}
0.619792 (0.018688) with: {'module__activation': <class 'torch.nn.modules.activation.Tanh'>}
0.643229 (0.019225) with: {'module__activation': <class 'torch.nn.modules.activation.Sigmoid'>}
0.636719 (0.022326) with: {'module__activation': <class 'torch.nn.modules.activation.Hardsigmoid'>}
ReLU激活函数获得了最好的结果,准确率约为70%。
如何调整Dropout参数
在本例中,我们将尝试在0.0到0.9之间的dropout百分比(1.0没有意义)和在0到5之间的MaxNorm权重约束值。
import numpy as np
import torch
import torch.nn as nn
import torch.nn.init as init
import torch.optim as optim
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV
# load the dataset, split into input (X) and output (y) variables
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
# PyTorch classifier
class PimaClassifier(nn.Module):
def __init__(self, dropout_rate=0.5, weight_constraint=1.0):
super().__init__()
self.layer = nn.Linear(8, 12)
self.act = nn.ReLU()
self.dropout = nn.Dropout(dropout_rate)
self.output = nn.Linear(12, 1)
self.prob = nn.Sigmoid()
self.weight_constraint = weight_constraint
# manually init weights
init.kaiming_uniform_(self.layer.weight)
init.kaiming_uniform_(self.output.weight)
def forward(self, x):
# maxnorm weight before actual forward pass
with torch.no_grad():
norm = self.layer.weight.norm(2, dim=0, keepdim=True).clamp(min=self.weight_constraint / 2)
desired = torch.clamp(norm, max=self.weight_constraint)
self.layer.weight *= (desired / norm)
# actual forward pass
x = self.act(self.layer(x))
x = self.dropout(x)
x = self.prob(self.output(x))
return x
# create model with skorch
model = NeuralNetClassifier(
PimaClassifier,
criterion=nn.BCELoss,
optimizer=optim.Adamax,
max_epochs=100,
batch_size=10,
verbose=False
)
# define the grid search parameters
param_grid = {
'module__weight_constraint': [1.0, 2.0, 3.0, 4.0, 5.0],
'module__dropout_rate': [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))
结果如下:
Best: 0.701823 using {'module__dropout_rate': 0.1, 'module__weight_constraint': 2.0}
0.669271 (0.015073) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 1.0}
0.692708 (0.035132) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 2.0}
0.589844 (0.170180) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 3.0}
0.561198 (0.151131) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 4.0}
0.688802 (0.021710) with: {'module__dropout_rate': 0.0, 'module__weight_constraint': 5.0}
0.697917 (0.009744) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 1.0}
0.701823 (0.016367) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 2.0}
0.694010 (0.010253) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 3.0}
0.686198 (0.025976) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 4.0}
0.679688 (0.026107) with: {'module__dropout_rate': 0.1, 'module__weight_constraint': 5.0}
0.701823 (0.029635) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 1.0}
0.682292 (0.014731) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 2.0}
0.701823 (0.009744) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 3.0}
0.701823 (0.026557) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 4.0}
0.687500 (0.015947) with: {'module__dropout_rate': 0.2, 'module__weight_constraint': 5.0}
0.686198 (0.006639) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 1.0}
0.656250 (0.006379) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 2.0}
0.565104 (0.155608) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 3.0}
0.700521 (0.028940) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 4.0}
0.669271 (0.012890) with: {'module__dropout_rate': 0.3, 'module__weight_constraint': 5.0}
0.661458 (0.018688) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 1.0}
0.669271 (0.017566) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 2.0}
0.652344 (0.006379) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 3.0}
0.680990 (0.037783) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 4.0}
0.692708 (0.042112) with: {'module__dropout_rate': 0.4, 'module__weight_constraint': 5.0}
0.666667 (0.006639) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 1.0}
0.652344 (0.011500) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 2.0}
0.662760 (0.007366) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 3.0}
0.558594 (0.146610) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 4.0}
0.552083 (0.141826) with: {'module__dropout_rate': 0.5, 'module__weight_constraint': 5.0}
0.548177 (0.141826) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 1.0}
0.653646 (0.013279) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 2.0}
0.661458 (0.008027) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 3.0}
0.553385 (0.142719) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 4.0}
0.669271 (0.035132) with: {'module__dropout_rate': 0.6, 'module__weight_constraint': 5.0}
0.662760 (0.015733) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 1.0}
0.636719 (0.024910) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 2.0}
0.550781 (0.146818) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 3.0}
0.537760 (0.140094) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 4.0}
0.542969 (0.138144) with: {'module__dropout_rate': 0.7, 'module__weight_constraint': 5.0}
0.565104 (0.148654) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 1.0}
0.657552 (0.008027) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 2.0}
0.428385 (0.111418) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 3.0}
0.549479 (0.142719) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 4.0}
0.648438 (0.005524) with: {'module__dropout_rate': 0.8, 'module__weight_constraint': 5.0}
0.540365 (0.136861) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 1.0}
0.605469 (0.053083) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 2.0}
0.553385 (0.139948) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 3.0}
0.549479 (0.142719) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 4.0}
0.595052 (0.075566) with: {'module__dropout_rate': 0.9, 'module__weight_constraint': 5.0}
可以看到,10%的Dropout和2.0的权重约束获得了70%的最佳精度。
如何调整隐藏层神经元的数量
单层神经元的数量是一个需要调优的重要参数。一般来说,一层神经元的数量控制着网络的表示能力,至少在拓扑的这一点上是这样。
理论上来说:由于通用逼近定理,一个足够大的单层网络可以近似任何其他神经网络。
在本例中,将尝试从1到30的值,步骤为5。一个更大的网络需要更多的训练,至少批大小和epoch的数量应该与神经元的数量一起优化。
import numpy as np
import torch
import torch.nn as nn
import torch.nn.init as init
import torch.optim as optim
from skorch import NeuralNetClassifier
from sklearn.model_selection import GridSearchCV
# load the dataset, split into input (X) and output (y) variables
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
class PimaClassifier(nn.Module):
def __init__(self, n_neurons=12):
super().__init__()
self.layer = nn.Linear(8, n_neurons)
self.act = nn.ReLU()
self.dropout = nn.Dropout(0.1)
self.output = nn.Linear(n_neurons, 1)
self.prob = nn.Sigmoid()
self.weight_constraint = 2.0
# manually init weights
init.kaiming_uniform_(self.layer.weight)
init.kaiming_uniform_(self.output.weight)
def forward(self, x):
# maxnorm weight before actual forward pass
with torch.no_grad():
norm = self.layer.weight.norm(2, dim=0, keepdim=True).clamp(min=self.weight_constraint / 2)
desired = torch.clamp(norm, max=self.weight_constraint)
self.layer.weight *= (desired / norm)
# actual forward pass
x = self.act(self.layer(x))
x = self.dropout(x)
x = self.prob(self.output(x))
return x
# create model with skorch
model = NeuralNetClassifier(
PimaClassifier,
criterion=nn.BCELoss,
optimizer=optim.Adamax,
max_epochs=100,
batch_size=10,
verbose=False
)
# define the grid search parameters
param_grid = {
'module__n_neurons': [1, 5, 10, 15, 20, 25, 30]
}
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))
结果如下:
Best: 0.708333 using {'module__n_neurons': 30}
0.654948 (0.003683) with: {'module__n_neurons': 1}
0.666667 (0.023073) with: {'module__n_neurons': 5}
0.694010 (0.014382) with: {'module__n_neurons': 10}
0.682292 (0.014382) with: {'module__n_neurons': 15}
0.707031 (0.028705) with: {'module__n_neurons': 20}
0.703125 (0.030758) with: {'module__n_neurons': 25}
0.708333 (0.015733) with: {'module__n_neurons': 30}
你可以看到,在隐藏层中有30个神经元的网络获得了最好的结果,准确率约为71%。
总结
在这篇文章中,我们介绍了如何使用PyTorch和scikit-learn在Python中优化深度学习网络的超参数。
作者:Jason Brownlee
相关推荐
- 深入解析 MySQL 8.0 JSON 相关函数:解锁数据存储的无限可能
-
引言在现代应用程序中,数据的存储和处理变得愈发复杂多样。MySQL8.0引入了丰富的JSON相关函数,为我们提供了更灵活的数据存储和检索方式。本文将深入探讨MySQL8.0中的JSON...
- MySQL的Json类型个人用法详解(mysql json类型对应java什么类型)
-
前言虽然MySQL很早就添加了Json类型,但是在业务开发过程中还是很少设计带这种类型的表。少不代表没有,当真正要对Json类型进行特定查询,修改,插入和优化等操作时,却感觉一下子想不起那些函数怎么使...
- MySQL的json查询之json_array(mysql json_search)
-
json_array顾名思义就是创建一个数组,实际的用法,我目前没有想到很好的使用场景。使用官方的例子说明一下吧。例一selectjson_array(1,2,3,4);json_array虽然单独...
- 头条创作挑战赛#一、LSTM 原理 长短期记忆网络
-
#头条创作挑战赛#一、LSTM原理长短期记忆网络(LongShort-TermMemory,LSTM)是一种特殊类型的循环神经网络(RNN),旨在解决传统RNN在处理长序列数据时面临的梯度...
- TensorBoard最全使用教程:看这篇就够了
-
机器学习通常涉及在训练期间可视化和度量模型的性能。有许多工具可用于此任务。在本文中,我们将重点介绍TensorFlow的开源工具套件,称为TensorBoard,虽然他是TensorFlow...
- 图神经网络版本的Kolmogorov Arnold(KAN)代码实现和效果对比
-
本文约4600字,建议阅读10分钟本文介绍了图神经网络版本的对比。KolmogorovArnoldNetworks(KAN)最近作为MLP的替代而流行起来,KANs使用Kolmogorov-Ar...
- kornia,一个实用的 Python 库!(python kkb_tools)
-
大家好,今天为大家分享一个实用的Python库-kornia。Github地址:https://github.com/kornia/kornia/Kornia是一个基于PyTorch的开源计算...
- 图像分割掩码标注转YOLO多边形标注
-
Ultralytics团队付出了巨大的努力,使创建自定义YOLO模型变得非常容易。但是,处理大型数据集仍然很痛苦。训练yolo分割模型需要数据集具有其特定格式,这可能与你从大型数据集中获得的...
- [python] 向量检索库Faiss使用指北
-
Faiss是一个由facebook开发以用于高效相似性搜索和密集向量聚类的库。它能够在任意大小的向量集中进行搜索。它还包含用于评估和参数调整的支持代码。Faiss是用C++编写的,带有Python的完...
- 如何把未量化的 70B 大模型加载到笔记本电脑上运行?
-
并行运行70B大模型我们已经看到,量化已经成为在低端GPU(比如Colab、Kaggle等)上加载大型语言模型(LLMs)的最常见方法了,但这会降低准确性并增加幻觉现象。那如果你和你的朋友们...
- ncnn+PPYOLOv2首次结合!全网最详细代码解读来了
-
编辑:好困LRS【新智元导读】今天给大家安利一个宝藏仓库miemiedetection,该仓库集合了PPYOLO、PPYOLOv2、PPYOLOE三个算法pytorch实现三合一,其中的PPYOL...
- 人工智能——图像识别(人工智能图像识别流程)
-
概述图像识别(ImageRecognition)是计算机视觉的核心任务之一,旨在通过算法让计算机理解图像内容,包括分类(识别物体类别)、检测(定位并识别多个物体)、分割(像素级识别)等,常见的应用场...
- PyTorch 深度学习实战(15):Twin Delayed DDPG (TD3) 算法
-
在上一篇文章中,我们介绍了DeepDeterministicPolicyGradient(DDPG)算法,并使用它解决了Pendulum问题。本文将深入探讨TwinDelayed...
- 大模型中常用的注意力机制GQA详解以及Pytorch代码实现
-
分组查询注意力(GroupedQueryAttention)是一种在大型语言模型中的多查询注意力(MQA)和多头注意力(MHA)之间进行插值的方法,它的目标是在保持MQA速度的同时...
- pytorch如何快速创建具有特殊意思的tensor张量?
-
专栏推荐正文我们通过值可以看到torch.empty并没有进行初始化创建tensor并进行随机初始化操作,常用rand/rand_like,randint正态分布(0,1)指定正态分布的均值还有方差i...
- 一周热门
-
-
Python实现人事自动打卡,再也不会被批评
-
Psutil + Flask + Pyecharts + Bootstrap 开发动态可视化系统监控
-
一个解决支持HTML/CSS/JS网页转PDF(高质量)的终极解决方案
-
再见Swagger UI 国人开源了一款超好用的 API 文档生成框架,真香
-
【验证码逆向专栏】vaptcha 手势验证码逆向分析
-
网页转成pdf文件的经验分享 网页转成pdf文件的经验分享怎么弄
-
C++ std::vector 简介
-
python使用fitz模块提取pdf中的图片
-
《人人译客》如何规划你的移动电商网站(2)
-
Jupyterhub安装教程 jupyter怎么安装包
-
- 最近发表
-
- 深入解析 MySQL 8.0 JSON 相关函数:解锁数据存储的无限可能
- MySQL的Json类型个人用法详解(mysql json类型对应java什么类型)
- MySQL的json查询之json_array(mysql json_search)
- 头条创作挑战赛#一、LSTM 原理 长短期记忆网络
- TensorBoard最全使用教程:看这篇就够了
- 图神经网络版本的Kolmogorov Arnold(KAN)代码实现和效果对比
- kornia,一个实用的 Python 库!(python kkb_tools)
- 图像分割掩码标注转YOLO多边形标注
- [python] 向量检索库Faiss使用指北
- 如何把未量化的 70B 大模型加载到笔记本电脑上运行?
- 标签列表
-
- python判断字典是否为空 (50)
- crontab每周一执行 (48)
- aes和des区别 (43)
- bash脚本和shell脚本的区别 (35)
- canvas库 (33)
- dataframe筛选满足条件的行 (35)
- gitlab日志 (33)
- lua xpcall (36)
- blob转json (33)
- python判断是否在列表中 (34)
- python html转pdf (36)
- 安装指定版本npm (37)
- idea搜索jar包内容 (33)
- css鼠标悬停出现隐藏的文字 (34)
- linux nacos启动命令 (33)
- gitlab 日志 (36)
- adb pull (37)
- table.render (33)
- uniapp textarea (33)
- python判断元素在不在列表里 (34)
- python 字典删除元素 (34)
- react-admin (33)
- vscode切换git分支 (35)
- vscode美化代码 (33)
- python bytes转16进制 (35)