First Steps with PyTorch

The last time I wrote a neural network was back in my junior year, when I was learning TensorFlow, and I barely used it afterwards. A recent assignment requires PyTorch, so this is a good opportunity to put together a first lesson on getting started with it.
PyTorch resembles NumPy in many ways and is easy to use; its adoption has recently been on track to overtake TensorFlow.
This section starts from a simple NumPy network and improves it step by step until it uses PyTorch's high-level interfaces.

  • Step 0: Define every operation by hand in NumPy: the forward pass, the loss, the backward pass, and the weight update
    (if any of this is unfamiliar, review the basics of neural networks first). A short derivation of the gradients used in the backward pass follows the code.
import numpy as np

N, D_in, H, D_out = 64, 1000, 100, 10
# randomly create some training data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.dot(w1)              # N * H
    h_relu = np.maximum(h, 0)  # N * H
    y_pred = h_relu.dot(w2)    # N * D_out

    # compute loss (sum of squared errors, matching the gradient below)
    loss = np.square(y_pred - y).sum()
    if it % 100 == 0: print(it, loss)

    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
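
For reference, here is a short sketch (my own notation, not from the original post) of where those gradient expressions come from. Writing $h = x w_1$, $h_{relu} = \max(h, 0)$, $\hat{y} = h_{relu} w_2$ and $L = \sum (\hat{y} - y)^2$, the chain rule gives

$$
\frac{\partial L}{\partial \hat{y}} = 2(\hat{y} - y), \quad
\frac{\partial L}{\partial w_2} = h_{relu}^{\top}\,\frac{\partial L}{\partial \hat{y}}, \quad
\frac{\partial L}{\partial h_{relu}} = \frac{\partial L}{\partial \hat{y}}\, w_2^{\top}, \quad
\frac{\partial L}{\partial h} = \frac{\partial L}{\partial h_{relu}} \odot \mathbb{1}[h > 0], \quad
\frac{\partial L}{\partial w_1} = x^{\top}\,\frac{\partial L}{\partial h},
$$

which correspond line by line to grad_y_pred, grad_w2, grad_h_relu, grad_h, and grad_w1 above.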
  • Step 1: Now switch the data over to torch tensors.
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
# randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.mm(w1)             # N * H
    h_relu = h.clamp(min=0)  # N * H, clamp to a lower bound of 0 (ReLU)
    y_pred = h_relu.mm(w2)   # N * D_out

    # compute loss
    loss = (y_pred - y).pow(2).sum().item()
    if it % 100 == 0: print(it, loss)

    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2


Comparing the two versions, there is essentially no difference; only the matrix-operation calls change. A quick sanity-check sketch of the mapping follows.
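
As a quick check (a sketch of my own, not from the original post), torch.from_numpy makes it easy to confirm that the operations really are one-to-one replacements:

import numpy as np
import torch

a_np = np.random.randn(4, 3)
b_np = np.random.randn(3, 2)
a, b = torch.from_numpy(a_np), torch.from_numpy(b_np)

# np.dot             ->  torch.mm
assert np.allclose(a_np.dot(b_np), a.mm(b).numpy())
# np.maximum(h, 0)   ->  h.clamp(min=0)
assert np.allclose(np.maximum(a_np, 0), a.clamp(min=0).numpy())
# arr.T / arr.copy() ->  tensor.t() / tensor.clone()
assert np.allclose(a_np.T, a.t().numpy())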

  • Step 2: One of PyTorch's most important features is autograd: once you have defined the forward pass and computed the loss, PyTorch can automatically compute the gradients of all model parameters for you. A PyTorch Tensor represents a node in a computation graph. If x is a Tensor with x.requires_grad=True, then x.grad is another tensor holding the current gradient of x with respect to some scalar (usually the loss). A minimal autograd example is shown right after the training loop below.
import torch

N, D_in, H, D_out = 64, 1000, 100, 10

# randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # compute loss
    loss = (y_pred - y).pow(2).sum()   # builds the computation graph
    if it % 100 == 0: print(it, loss.item())

    # Backward pass
    loss.backward()

    # update weights of w1 and w2
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
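
To see autograd in isolation (a minimal sketch of my own, not part of the original post): for a scalar function y = x^2 + 3x, calling backward() fills x.grad with dy/dx.

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x   # y is a node in the computation graph built from x
y.backward()         # autograd computes dy/dx and stores it in x.grad
print(x.grad)        # tensor(7.) because dy/dx = 2*x + 3 = 7 at x = 2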

  • Step 3: Use torch.nn.Sequential to define the network. A short note on inspecting the resulting model follows the code.
import torch
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10
# randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False),   # x @ w_1 (bias disabled here)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

# re-initialize the weights with a standard normal, matching the earlier steps
torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = model(x)   # calls model.forward()

    # compute loss
    loss = loss_fn(y_pred, y)   # builds the computation graph
    if it % 100 == 0: print(it, loss.item())

    # Backward pass
    loss.backward()

    # update the weights of both Linear layers
    with torch.no_grad():
        for param in model.parameters():   # each param carries its .grad
            param -= learning_rate * param.grad

    model.zero_grad()
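
A small aside of my own about Sequential: its layers can be indexed like a list (which is what model[0] and model[2] rely on above), and model.named_parameters() lists every learnable tensor that the update loop iterates over:

# assumes `model` from the block above
print(model[0])   # Linear(in_features=1000, out_features=100, bias=False)
for name, param in model.named_parameters():
    print(name, tuple(param.shape))   # e.g. 0.weight (100, 1000) and 2.weight (10, 100)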

  • Step 4: This time we no longer update the model weights by hand; instead we use the optim package to update the parameters for us. torch.optim provides a variety of optimization methods, including SGD with momentum, RMSProp, Adam, and so on. A sketch of swapping in a different optimizer follows the code.
import torch
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False),   # x @ w_1 (bias disabled here)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')
# learning_rate = 1e-4
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

learning_rate = 1e-6
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x)   # calls model.forward()

    # compute loss
    loss = loss_fn(y_pred, y)   # builds the computation graph
    if it % 100 == 0: print(it, loss.item())

    optimizer.zero_grad()   # reset the gradients to zero
    # Backward pass
    loss.backward()

    # update model parameters
    optimizer.step()
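
Swapping in a different optimizer only changes the construction line. A sketch (the learning rates here are typical starting points, not tuned for this example):

# SGD with momentum
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6, momentum=0.9)
# RMSProp
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
# Adam (as in the commented-out lines above, usually with a larger learning rate)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)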

  • Step 5: This model inherits from the nn.Module class. Whenever you need something more complex than a Sequential model, define it as an nn.Module subclass. A sketch of such a more complex model follows the code.
import torch
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# randomly create some training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # define the model architecture
        self.linear1 = torch.nn.Linear(D_in, H, bias=False)
        self.linear2 = torch.nn.Linear(H, D_out, bias=False)

    def forward(self, x):
        y_pred = self.linear2(self.linear1(x).clamp(min=0))
        return y_pred

model = TwoLayerNet(D_in, H, D_out)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x)   # calls model.forward()

    # compute loss
    loss = loss_fn(y_pred, y)   # builds the computation graph
    if it % 100 == 0: print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()

    # update model parameters
    optimizer.step()
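
To illustrate the kind of model that Sequential cannot express on its own (a hypothetical sketch of my own, not part of the original post), here is a variant with a skip connection, where the forward pass combines two branches:

import torch

class TwoLayerNetWithSkip(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)
        self.skip = torch.nn.Linear(D_in, D_out)   # extra branch straight from input to output

    def forward(self, x):
        # the two branches are summed here, which plain Sequential cannot do
        return self.linear2(self.linear1(x).clamp(min=0)) + self.skip(x)

model = TwoLayerNetWithSkip(1000, 100, 10)
print(model(torch.randn(64, 1000)).shape)   # torch.Size([64, 10])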

Improving the network step by step, we went from raw NumPy all the way to PyTorch's high-level interfaces. What a rewarding day. 🤪
