In this post, we code up a model that stacks several fully-connected layers.
First, import the required libraries:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
import os
import matplotlib.pyplot as plt
import numpy as np
You can check the installed PyTorch version with:
torch.__version__
Next, declare the device the computation will run on.
Set the device to GPU (falling back to CPU if none is available) as follows:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.manual_seed(2891)
num_gpu = 1
if torch.cuda.device_count() > 1:
    num_gpu = torch.cuda.device_count()
print("Let's use", num_gpu, "GPUs!") # 1
print('device', device) # cuda
Next, implement a simple MLP (multi-layer perceptron) model.
The input is the MNIST dataset.
Each MNIST sample has shape (28, 28), so it must be flattened before it can pass through the MLP,
i.e., reshaped from (28, 28) to (1, 784).
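As a quick illustration (a minimal sketch of my own using the torch imported above, not part of the original notebook), the flattening looks like this:

dummy = torch.randn(28, 28)   # one fake MNIST-sized image
flat = dummy.view(-1, 784)    # flatten (28, 28) -> (1, 784)
print(flat.shape)             # torch.Size([1, 784])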
class MnistMLP(nn.Module):
    def __init__(self, num_class, drop_prob):
        super(MnistMLP, self).__init__()
        # input is 28x28, flattened to 784
        self.dropout = nn.Dropout(p=drop_prob)
        self.linear1 = nn.Linear(784, 512)
        self.linear2 = nn.Linear(512, 256)
        self.linear3 = nn.Linear(256, 10)
        self.reduce_layer = nn.Linear(10, num_class)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        x = x.float()
        mlp1 = F.relu(self.linear1(x.view(-1, 784)))
        mlp1 = self.dropout(mlp1)
        mlp2 = F.relu(self.linear2(mlp1))
        mlp2 = self.dropout(mlp2)
        mlp3 = F.relu(self.linear3(mlp2))
        mlp3 = self.dropout(mlp3)
        output = self.reduce_layer(mlp3)
        return self.logsoftmax(output)
Now declare the model and move it to the GPU as shown below.
model = MnistMLP(10, 0.3)
model.to(device)
'''
MnistMLP(
(dropout): Dropout(p=0.3, inplace=False)
(linear1): Linear(in_features=784, out_features=512, bias=True)
(linear2): Linear(in_features=512, out_features=256, bias=True)
(linear3): Linear(in_features=256, out_features=10, bias=True)
(reduce_layer): Linear(in_features=10, out_features=10, bias=True)
(logsoftmax): LogSoftmax(dim=1)
)
'''
MNIST has 10 classes, so 10 is passed as the first argument, and dropout is applied with a probability of 30%.
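As a small aside (an illustrative sketch of my own, not from the original post), nn.Dropout with p=0.3 zeroes roughly 30% of activations during training and rescales the survivors by 1/(1-0.3); in eval mode it is the identity:

drop = nn.Dropout(p=0.3)
x = torch.ones(8)
drop.train()
print(drop(x))   # about 30% of entries become 0, the rest are scaled to ~1.43
drop.eval()
print(drop(x))   # unchanged: dropout is disabled at evaluation time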
The parameter shapes of every layer can be checked as follows:
# model shape
for p in model.parameters():
    print(p.size())
'''
torch.Size([512, 784])
torch.Size([512])
torch.Size([256, 512])
torch.Size([256])
torch.Size([10, 256])
torch.Size([10])
torch.Size([10, 10])
torch.Size([10])
'''
The total number of trainable parameters can be checked as follows:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

model_hp = count_parameters(model)
print("model's trainable parameters:", model_hp)
'''
model's trainable parameters: 535928
'''
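The 535,928 figure can be reproduced by hand from the layer shapes above (each Linear layer has out_features x in_features weights plus one bias per output); this is just a sanity-check calculation, not part of the original post:

# linear1:      784*512 + 512 = 401,920
# linear2:      512*256 + 256 = 131,328
# linear3:      256*10  + 10  =   2,570
# reduce_layer: 10*10   + 10  =     110
print(784*512 + 512 + 512*256 + 256 + 256*10 + 10 + 10*10 + 10)  # 535928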
The model is now declared; next, we need to load the data.
The code below downloads the MNIST dataset and builds train and test loaders.
batch_size = 128
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=True, download=True, transform=transforms.ToTensor()),
    batch_size=batch_size, shuffle=True)
print(len(train_loader)) # 469 batches: ceil(60000 / 128) = 469
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=False, transform=transforms.ToTensor()),
    batch_size=1000)
print(len(test_loader)) # 10 batches: 10 * 1000 = 10000
'''
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz
9913344/? [04:54<00:00, 33670.03it/s]
Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
29696/? [00:01<00:00, 26891.25it/s]
Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 503: Service Unavailable
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
1649664/? [00:00<00:00, 3911534.90it/s]
Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
5120/? [00:00<00:00, 159181.34it/s]
Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw
Processing...
Done!
469
10
/usr/local/lib/python3.7/dist-packages/torchvision/datasets/mnist.py:502: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:143.)
return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
'''
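As a quick shape check (an illustrative snippet, not in the original post), one batch from train_loader looks like this; ToTensor scales pixel values into [0, 1]:

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([128, 1, 28, 28])
print(labels.shape)  # torch.Size([128])
print(images.min().item(), images.max().item())  # roughly 0.0 and 1.0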
The optimizer is declared as follows; we use Adam, the most widely used optimizer, with a learning rate of 1e-4.
optimizer = optim.Adam(model.parameters(), lr=1e-4)
Now run training (only up to 10 epochs).
model.train()
epochs = 10
total_loss = 0
total_acc = 0
train_loss = []
train_accuracy = []
i = 0
for epoch in range(epochs):
    for data, target in train_loader:
        data = data.to(device)
        target = target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()   # compute gradients
        optimizer.step()  # update parameters
        total_loss += loss.item()
        train_loss.append(total_loss / (i + 1))
        prediction = output.data.max(1)[1]  # index of the max log-probability
        accuracy = prediction.eq(target.data).sum().item() / len(target) * 100
        total_acc += accuracy
        train_accuracy.append(total_acc / (i + 1))
        if i % 10 == 0:
            print('Epoch: {}\t Train Step: {}\tLoss: {:.3f}\tAccuracy: {:.3f}'.format(epoch+1, i, loss.item(), accuracy))
        i += 1
    print('Epoch: {} finished'.format(epoch+1))
'''
Epoch: 9 Train Step: 4200 Loss: 0.652 Accuracy: 78.906
Epoch: 9 Train Step: 4210 Loss: 0.422 Accuracy: 85.156
Epoch: 9 Train Step: 4220 Loss: 0.496 Accuracy: 61.719
Epoch: 9 finished
Epoch: 10 Train Step: 4230 Loss: 0.432 Accuracy: 84.375
Epoch: 10 Train Step: 4240 Loss: 0.435 Accuracy: 89.062
Epoch: 10 Train Step: 4250 Loss: 0.370 Accuracy: 86.719
Epoch: 10 Train Step: 4260 Loss: 0.468 Accuracy: 83.594
Epoch: 10 Train Step: 4270 Loss: 0.479 Accuracy: 85.156
Epoch: 10 Train Step: 4280 Loss: 0.422 Accuracy: 85.156
Epoch: 10 Train Step: 4290 Loss: 0.538 Accuracy: 78.906
Epoch: 10 Train Step: 4300 Loss: 0.493 Accuracy: 87.500
Epoch: 10 Train Step: 4310 Loss: 0.531 Accuracy: 82.031
Epoch: 10 Train Step: 4320 Loss: 0.524 Accuracy: 82.031
Epoch: 10 Train Step: 4330 Loss: 0.520 Accuracy: 83.594
Epoch: 10 Train Step: 4340 Loss: 0.557 Accuracy: 82.812
Epoch: 10 Train Step: 4350 Loss: 0.597 Accuracy: 80.469
Epoch: 10 Train Step: 4360 Loss: 0.272 Accuracy: 90.625
Epoch: 10 Train Step: 4370 Loss: 0.402 Accuracy: 85.938
Epoch: 10 Train Step: 4380 Loss: 0.552 Accuracy: 78.906
Epoch: 10 Train Step: 4390 Loss: 0.450 Accuracy: 85.156
Epoch: 10 Train Step: 4400 Loss: 0.505 Accuracy: 85.156
Epoch: 10 Train Step: 4410 Loss: 0.498 Accuracy: 79.688
Epoch: 10 Train Step: 4420 Loss: 0.550 Accuracy: 77.344
Epoch: 10 Train Step: 4430 Loss: 0.515 Accuracy: 84.375
Epoch: 10 Train Step: 4440 Loss: 0.556 Accuracy: 78.125
Epoch: 10 Train Step: 4450 Loss: 0.363 Accuracy: 88.281
Epoch: 10 Train Step: 4460 Loss: 0.376 Accuracy: 88.281
Epoch: 10 Train Step: 4470 Loss: 0.409 Accuracy: 86.719
Epoch: 10 Train Step: 4480 Loss: 0.494 Accuracy: 84.375
Epoch: 10 Train Step: 4490 Loss: 0.550 Accuracy: 82.031
Epoch: 10 Train Step: 4500 Loss: 0.349 Accuracy: 90.625
Epoch: 10 Train Step: 4510 Loss: 0.465 Accuracy: 82.812
Epoch: 10 Train Step: 4520 Loss: 0.577 Accuracy: 78.906
Epoch: 10 Train Step: 4530 Loss: 0.412 Accuracy: 85.938
Epoch: 10 Train Step: 4540 Loss: 0.557 Accuracy: 81.250
Epoch: 10 Train Step: 4550 Loss: 0.481 Accuracy: 83.594
Epoch: 10 Train Step: 4560 Loss: 0.373 Accuracy: 86.719
Epoch: 10 Train Step: 4570 Loss: 0.445 Accuracy: 84.375
Epoch: 10 Train Step: 4580 Loss: 0.543 Accuracy: 77.344
Epoch: 10 Train Step: 4590 Loss: 0.358 Accuracy: 88.281
Epoch: 10 Train Step: 4600 Loss: 0.408 Accuracy: 87.500
Epoch: 10 Train Step: 4610 Loss: 0.523 Accuracy: 82.812
Epoch: 10 Train Step: 4620 Loss: 0.418 Accuracy: 86.719
Epoch: 10 Train Step: 4630 Loss: 0.423 Accuracy: 85.938
Epoch: 10 Train Step: 4640 Loss: 0.512 Accuracy: 79.688
Epoch: 10 Train Step: 4650 Loss: 0.625 Accuracy: 77.344
Epoch: 10 Train Step: 4660 Loss: 0.379 Accuracy: 86.719
Epoch: 10 Train Step: 4670 Loss: 0.440 Accuracy: 82.812
Epoch: 10 Train Step: 4680 Loss: 0.499 Accuracy: 81.250
Epoch: 10 finished
'''
Use matplotlib to visualize the training loss and accuracy.
plt.figure()
plt.plot(np.arange(len(train_loss)), train_loss)
plt.show()
#plt.savefig('./train_loss_result.png')
plt.figure()
plt.plot(np.arange(len(train_accuracy)), train_accuracy)
plt.show()
#plt.savefig('./train_accuracy_result.png')
To evaluate the model's real performance, run evaluation on the test data that was never used during training:
model.eval()
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        data = data.to(device)
        target = target.to(device)
        output = model(data)
        prediction = output.data.max(1)[1]
        correct += prediction.eq(target.data).sum().item()
print('\nTest set: Accuracy: {:.2f}%'.format(100. * correct / len(test_loader.dataset)))
# Test set: Accuracy: 96.04%
With just a simple 3-layer MLP, we reached about 96% accuracy on MNIST.