
[BC] Deep Learning for Everyone 4 - RNN

Part-4 RNN

Lab-11-0 RNN Intro

  • Recurrent Neural Network (RNN)

RNN

  • For sequential data
  • word, sentence, time series, …
  • hidden state: a value carried along inside the network from step to step rather than emitted as output
  • every cell shares the same parameters

$h_t = f(h_{t-1}, x_t)$
$h_t = \tanh(W_h h_{t-1} + W_x x_t)$
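
As a concrete check of the update rule, here is a minimal single-step sketch in PyTorch (the sizes and random weights are illustrative, not from the lecture):

import torch

# one RNN step: h_t = tanh(W_h h_{t-1} + W_x x_t)
input_size, hidden_size = 4, 3
W_x = torch.randn(hidden_size, input_size)   # input-to-hidden weights
W_h = torch.randn(hidden_size, hidden_size)  # hidden-to-hidden weights

h_prev = torch.zeros(hidden_size)            # h_{t-1}
x_t = torch.randn(input_size)                # x_t
h_t = torch.tanh(W_h @ h_prev + W_x @ x_t)   # the same W_h, W_x are reused at every step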

  • LSTM, GRU, …

Usage of RNN

  • one to one (e.g., a plain feed-forward network)
  • one to many (e.g., image captioning)
  • many to one (e.g., sentiment classification)
  • many to many (e.g., machine translation)

Lab-11-1 RNN basic

  • Recurrent Neural Network (RNN)
  • Hidden State
rnn = torch.nn.RNN(input_size, hidden_size, batch_first=True) # Cell A; batch_first puts the batch on dim 0
outputs, _status = rnn(input_data) # input_data = X_t, shape = (batch_size, seq_len, input_size)
# outputs shape = (batch_size, seq_len, hidden_size)
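
The snippet leaves input_data undefined; a runnable sketch with assumed toy shapes (batch_first=True is what makes the (batch, seq, feature) layout in the comments valid):

import torch

input_size, hidden_size = 4, 2
rnn = torch.nn.RNN(input_size, hidden_size, batch_first=True)

input_data = torch.randn(1, 3, input_size)  # (batch_size=1, seq_len=3, input_size=4)
outputs, _status = rnn(input_data)
print(outputs.shape)                        # torch.Size([1, 3, 2]) = (batch, seq, hidden)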

Lab-11-2 RNN hihello and charseq

  • Recurrent Neural Network (RNN)
  • Character Sequence
import torch

import numpy as np
char_set = ["h", "i", "e", "l", "o"]
# hyper params
input_size = len(char_set)
hidden_size = len(char_set)
learning_rate = 0.1
# data setting
x_data = [[0, 1, 0, 2, 3, 3]]
x_one_hot = [[
    [1, 0, 0, 0, 0], # h
    [0, 1, 0, 0, 0], # i
    [1, 0, 0, 0, 0], # h
    [0, 0, 1, 0, 0], # e
    [0, 0, 0, 1, 0], # l
    [0, 0, 0, 1, 0] # l
]]

y_data = [[1, 0, 2, 3, 3, 4]] # i h e l l o

# transform as torch tensor var
X = torch.FloatTensor(x_one_hot)
Y = torch.LongTensor(y_data)
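
Writing the one-hot table by hand does not scale; it is equivalent to indexing an identity matrix, which is exactly what the charseq example below does. A quick check (assumes only numpy):

import numpy as np

x_data = [[0, 1, 0, 2, 3, 3]]    # indices of "hihell"
x_one_hot_auto = [np.eye(5)[x] for x in x_data]
print(x_one_hot_auto[0])         # reproduces the hand-written table row for row
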
sample = "if you want you"
# make dict
char_set = list(set(sample))
char_dict = {c:i for i, c in enumerate(char_set)}

# hyper params
dic_size = len(char_dict)
hidden_size = len(char_dict)
learning_rate = 0.1

# data setting
sample_idx = [char_dict[c] for c in sample]
x_data = [sample_idx[:-1]]
x_one_hot = [np.eye(dic_size)[x] for x in x_data]
y_data = [sample_idx[1:]]

X = torch.FloatTensor(x_one_hot)
Y = torch.LongTensor(y_data)
# RNN
rnn = torch.nn.RNN(dic_size, hidden_size, batch_first=True) # (B, S, F)

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)

# training
for i in range(100):
    optimizer.zero_grad()
    outputs, _status = rnn(X)
    loss = criterion(outputs.view(-1, dic_size), Y.view(-1))
    loss.backward()
    optimizer.step()
    result = outputs.data.numpy().argmax(axis=2)
    result_str = "".join([char_set[c] for c in np.squeeze(result)])
    if i % 10 == 0:
        print(i, "loss: ", loss.item(), "pred: ", result, "true Y: ", y_data, "pred str: ", result_str)
0 loss:  2.358027935028076 pred:  [[0 5 0 3 5 6 0 5 4 5 6 5 9 5]] true Y:  [[8, 0, 4, 1, 9, 0, 5, 2, 3, 7, 0, 4, 1, 9]] pred str:   w nwi wywiwuw
10 loss:  1.1300115585327148 pred:  [[8 0 4 1 9 0 5 2 3 7 0 4 1 9]] true Y:  [[8, 0, 4, 1, 9, 0, 5, 2, 3, 7, 0, 4, 1, 9]] pred str:  f you want you
20 loss:  0.8823428750038147 pred:  [[8 0 4 1 9 0 5 2 3 7 0 4 1 9]] true Y:  [[8, 0, 4, 1, 9, 0, 5, 2, 3, 7, 0, 4, 1, 9]] pred str:  f you want you
30 loss:  0.830751359462738 pred:  [[8 0 4 1 9 0 5 2 3 7 0 4 1 9]] true Y:  [[8, 0, 4, 1, 9, 0, 5, 2, 3, 7, 0, 4, 1, 9]] pred str:  f you want you
40 loss:  0.8180180788040161 pred:  [[8 0 4 1 9 0 5 2 3 7 0 4 1 9]] true Y:  [[8, 0, 4, 1, 9, 0, 5, 2, 3, 7, 0, 4, 1, 9]] pred str:  f you want you
50 loss:  0.8141887784004211 pred:  [[8 0 4 1 9 0 5 2 3 7 0 4 1 9]] true Y:  [[8, 0, 4, 1, 9, 0, 5, 2, 3, 7, 0, 4, 1, 9]] pred str:  f you want you
60 loss:  0.8121195435523987 pred:  [[8 0 4 1 9 0 5 2 3 7 0 4 1 9]] true Y:  [[8, 0, 4, 1, 9, 0, 5, 2, 3, 7, 0, 4, 1, 9]] pred str:  f you want you
70 loss:  0.8108598589897156 pred:  [[8 0 4 1 9 0 5 2 3 7 0 4 1 9]] true Y:  [[8, 0, 4, 1, 9, 0, 5, 2, 3, 7, 0, 4, 1, 9]] pred str:  f you want you
80 loss:  0.8099285960197449 pred:  [[8 0 4 1 9 0 5 2 3 7 0 4 1 9]] true Y:  [[8, 0, 4, 1, 9, 0, 5, 2, 3, 7, 0, 4, 1, 9]] pred str:  f you want you
90 loss:  0.8091897964477539 pred:  [[8 0 4 1 9 0 5 2 3 7 0 4 1 9]] true Y:  [[8, 0, 4, 1, 9, 0, 5, 2, 3, 7, 0, 4, 1, 9]] pred str:  f you want you
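
Note the reshape before the loss: CrossEntropyLoss expects (N, C) logits against (N,) targets, so the (batch, seq, class) RNN output is flattened across time. The same reshape in isolation (sizes illustrative):

import torch

batch, seq_len, num_classes = 1, 6, 5
logits = torch.randn(batch, seq_len, num_classes)       # stand-in for the RNN output
targets = torch.randint(num_classes, (batch, seq_len))  # one class index per time step

loss = torch.nn.CrossEntropyLoss()(
    logits.view(-1, num_classes),  # (batch * seq_len, num_classes)
    targets.view(-1))              # (batch * seq_len,)
print(loss.item())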

Lab-11-3 Long sequence

  • Recurrent Neural Network (RNN)
  • Character Sequence
sentence = ("if you want to build a ship, don't drum up people together to "
            "collect wood and don't assign them tasks and work, but rather "
            "teach them to long for the endless immensity of the sea.")
# make dictionary
char_set = list(set(sentence))
char_dic = {c: i for i, c in enumerate(char_set)}

# hyper parameters
dic_size = len(char_dic)
hidden_size = len(char_dic)
sequence_length = 10 
learning_rate = 0.1
# data setting
x_data = []
y_data = []

for i in range(0, len(sentence) - sequence_length):
    x_str = sentence[i:i + sequence_length]
    y_str = sentence[i + 1: i + sequence_length + 1]
    print(i, x_str, '->', y_str)

    x_data.append([char_dic[c] for c in x_str])  # x str to index
    y_data.append([char_dic[c] for c in y_str])  # y str to index

x_one_hot = [np.eye(dic_size)[x] for x in x_data]
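
To see what the sliding window produces, a tiny standalone example (a made-up 5-character string with a window of 2):

text = "hello"
seq_len = 2
for i in range(0, len(text) - seq_len):
    print(text[i:i + seq_len], '->', text[i + 1:i + seq_len + 1])
# he -> el
# el -> ll
# ll -> lo
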
X = torch.FloatTensor(x_one_hot)
Y = torch.LongTensor(y_data)
class Net(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, layers):
        super(Net, self).__init__()
        self.rnn = torch.nn.RNN(input_dim, hidden_dim, num_layers=layers, batch_first=True)
        self.fc = torch.nn.Linear(hidden_dim, hidden_dim, bias=True)

    def forward(self, x):
        x, _status = self.rnn(x)
        x = self.fc(x)
        return x


net = Net(dic_size, hidden_size, 2)
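
A quick shape check of the stacked model (note the fc layer maps hidden_dim to hidden_dim, which only yields per-character logits because hidden_size was set equal to dic_size):

# sketch: one forward pass through the untrained net
with torch.no_grad():
    out = net(X)
print(out.shape)  # (num_windows, sequence_length, dic_size)
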
# loss & optimizer setting
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), learning_rate)

# start training
for i in range(100):
    optimizer.zero_grad()
    outputs = net(X)
    loss = criterion(outputs.view(-1, dic_size), Y.view(-1))
    loss.backward()
    optimizer.step()

    results = outputs.argmax(dim=2)
    predict_str = ""
    for j, result in enumerate(results):
        # print(i, j, ''.join([char_set[t] for t in result]), loss.item())
        if j == 0:
            predict_str += ''.join([char_set[t] for t in result])
        else:
            predict_str += char_set[result[-1]]

    print(predict_str)

Lab-11-4 RNN timeseries

  • Recurrent Neural Network (RNN)
  • Time Series Data

Lab-11-5 RNN seq2seq

  • Recurrent Neural Network (RNN)
  • Sequence-To-Sequence

Apply Seq2Seq: Encoder-Decoder

import torch
import torch.nn as nn
import torch.optim as optim
import random
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

raw = ["I feel hungry.	나는 배가 고프다.",
       "Pytorch is very easy.	파이토치는 매우 쉽다.",
       "Pytorch is a framework for deep learning.	파이토치는 딥러닝을 위한 프레임워크이다.",
       "Pytorch is very clear to use.	파이토치는 사용하기 매우 직관적이다."]

# fix token for "start of sentence" and "end of sentence"
SOS_token = 0
EOS_token = 1

# class for vocabulary related information of data
class Vocab:
    def __init__(self):
        self.vocab2index = {"<SOS>": SOS_token, "<EOS>": EOS_token}
        self.index2vocab = {SOS_token: "<SOS>", EOS_token: "<EOS>"}
        self.vocab_count = {}
        self.n_vocab = len(self.vocab2index)

    def add_vocab(self, sentence):
        for word in sentence.split(" "):
            if word not in self.vocab2index:
                self.vocab2index[word] = self.n_vocab
                self.vocab_count[word] = 1
                self.index2vocab[self.n_vocab] = word
                self.n_vocab += 1
            else:
                self.vocab_count[word] += 1
                
# filter out the long sentence from source and target data
def filter_pair(pair, source_max_length, target_max_length):
    return len(pair[0].split(" ")) < source_max_length and len(pair[1].split(" ")) < target_max_length

# read and preprocess the corpus data
def preprocess(corpus, source_max_length, target_max_length):
    print("reading corpus...")
    pairs = []
    for line in corpus:
        pairs.append([s for s in line.strip().lower().split("\t")])
    print("Read {} sentence pairs".format(len(pairs)))

    pairs = [pair for pair in pairs if filter_pair(pair, source_max_length, target_max_length)]
    print("Trimmed to {} sentence pairs".format(len(pairs)))

    source_vocab = Vocab()
    target_vocab = Vocab()

    print("Counting words...")
    for pair in pairs:
        source_vocab.add_vocab(pair[0])
        target_vocab.add_vocab(pair[1])
    print("source vocab size =", source_vocab.n_vocab)
    print("target vocab size =", target_vocab.n_vocab)

    return pairs, source_vocab, target_vocab

# declare simple encoder
class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, x, hidden):
        x = self.embedding(x).view(1, 1, -1)
        x, hidden = self.gru(x, hidden)
        return x, hidden
    
# declare simple decoder
class Decoder(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(Decoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x, hidden):
        x = self.embedding(x).view(1, 1, -1)
        x, hidden = self.gru(x, hidden)
        x = self.softmax(self.out(x[0]))
        return x, hidden
    
# convert sentence to the index tensor
def tensorize(vocab, sentence):
    indexes = [vocab.vocab2index[word] for word in sentence.split(" ")]
    indexes.append(vocab.vocab2index["<EOS>"])
    return torch.Tensor(indexes).long().to(device).view(-1, 1)

# training seq2seq
def train(pairs, source_vocab, target_vocab, encoder, decoder, n_iter, print_every=1000, learning_rate=0.01):
    loss_total = 0

    encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
    decoder_optimizer = optim.SGD(decoder.parameters(), lr=learning_rate)

    training_batch = [random.choice(pairs) for _ in range(n_iter)]
    training_source = [tensorize(source_vocab, pair[0]) for pair in training_batch]
    training_target = [tensorize(target_vocab, pair[1]) for pair in training_batch]

    criterion = nn.NLLLoss()
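    # NLLLoss expects log-probabilities, which the decoder's LogSoftmax provides (together equivalent to CrossEntropyLoss on raw logits)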

    for i in range(1, n_iter + 1):
        source_tensor = training_source[i - 1]
        target_tensor = training_target[i - 1]

        encoder_hidden = torch.zeros([1, 1, encoder.hidden_size]).to(device)

        encoder_optimizer.zero_grad()
        decoder_optimizer.zero_grad()

        source_length = source_tensor.size(0)
        target_length = target_tensor.size(0)

        loss = 0

        for enc_input in range(source_length):
            _, encoder_hidden = encoder(source_tensor[enc_input], encoder_hidden)

        decoder_input = torch.Tensor([[SOS_token]]).long().to(device)
        decoder_hidden = encoder_hidden # connect encoder output to decoder input

        for di in range(target_length):
            decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
            loss += criterion(decoder_output, target_tensor[di])
            decoder_input = target_tensor[di]  # teacher forcing

        loss.backward()

        encoder_optimizer.step()
        decoder_optimizer.step()

        loss_iter = loss.item() / target_length
        loss_total += loss_iter

        if i % print_every == 0:
            loss_avg = loss_total / print_every
            loss_total = 0
            print("[{} - {}%] loss = {:05.4f}".format(i, i / n_iter * 100, loss_avg))
            
# insert given sentence to check the training
def evaluate(pairs, source_vocab, target_vocab, encoder, decoder, target_max_length):
    for pair in pairs:
        print(">", pair[0])
        print("=", pair[1])
        source_tensor = tensorize(source_vocab, pair[0])
        source_length = source_tensor.size()[0]
        encoder_hidden = torch.zeros([1, 1, encoder.hidden_size]).to(device)

        for ei in range(source_length):
            _, encoder_hidden = encoder(source_tensor[ei], encoder_hidden)

        decoder_input = torch.Tensor([[SOS_token]]).long().to(device)
        decoder_hidden = encoder_hidden
        decoded_words = []

        for di in range(target_max_length):
            decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
            _, top_index = decoder_output.data.topk(1)
            if top_index.item() == EOS_token:
                decoded_words.append("<EOS>")
                break
            else:
                decoded_words.append(target_vocab.index2vocab[top_index.item()])

            decoder_input = top_index.squeeze().detach()

        predict_words = decoded_words
        predict_sentence = " ".join(predict_words)
        print("<", predict_sentence)
        print("")
SOURCE_MAX_LENGTH = 10
TARGET_MAX_LENGTH = 12
# preprocess the corpus
load_pairs, load_source_vocab, load_target_vocab = preprocess(raw, SOURCE_MAX_LENGTH, TARGET_MAX_LENGTH)
print(random.choice(load_pairs))
reading corpus...
Read 4 sentence pairs
Trimmed to 4 sentence pairs
Counting words...
source vocab size = 17
target vocab size = 13
['i feel hungry.', '나는 배가 고프다.']
# declare the encoder and the decoder
enc_hidden_size = 16
dec_hidden_size = enc_hidden_size
enc = Encoder(load_source_vocab.n_vocab, enc_hidden_size).to(device)
dec = Decoder(dec_hidden_size, load_target_vocab.n_vocab).to(device)
# train seq2seq model
train(load_pairs, load_source_vocab, load_target_vocab, enc, dec, 5000, print_every=1000)
[1000 - 20.0%] loss = 0.7360
[2000 - 40.0%] loss = 0.1106
[3000 - 60.0%] loss = 0.0338
[4000 - 80.0%] loss = 0.0182
[5000 - 100.0%] loss = 0.0124
# check the model with given data
evaluate(load_pairs, load_source_vocab, load_target_vocab, enc, dec, TARGET_MAX_LENGTH)
> i feel hungry.
= 나는 배가 고프다.
< 나는 배가 고프다. <EOS>

> pytorch is very easy.
= 파이토치는 매우 쉽다.
< 파이토치는 매우 쉽다. <EOS>

> pytorch is a framework for deep learning.
= 파이토치는 딥러닝을 위한 프레임워크이다.
< 파이토치는 딥러닝을 위한 프레임워크이다. <EOS>

> pytorch is very clear to use.
= 파이토치는 사용하기 매우 직관적이다.
< 파이토치는 사용하기 매우 직관적이다. <EOS>

Lab-11-6 PackedSequence

  • Recurrent Neural Network (RNN)
  • PackedSequence
  • Padding
Two steps for handling variable-length sequences in a batch (a minimal sketch follows):
  1. pad
  2. pack
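
A minimal sketch of both utilities from torch.nn.utils.rnn (the toy sequences are illustrative):

import torch
from torch.nn.utils.rnn import pad_sequence, pack_sequence

# variable-length sequences, already sorted by decreasing length (pack_sequence requires this by default)
seqs = [torch.tensor([1., 2., 3.]), torch.tensor([4., 5.]), torch.tensor([6.])]

# 1. pad: stack into one rectangular tensor, filling the tails with zeros
padded = pad_sequence(seqs, batch_first=True)
print(padded)              # shape (3, 3); shorter rows are zero-padded

# 2. pack: store only the real time steps as a PackedSequence
packed = pack_sequence(seqs)
print(packed.batch_sizes)  # tensor([3, 2, 1])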