Trying out the text generation example
keras/lstm_text_generation.py at master · keras-team/keras · GitHub
The Colab notebook is here:
https://colab.research.google.com/drive/1lbSMhWEUxenIXVPSCl8MaYcz_rJbuB6o
The source text is:
https://s3.amazonaws.com/text-datasets/nietzsche.txt
First, the text is sliced into 40-character chunks with a stride of 3: characters 1-40, 4-43, 7-46, and so on. At the same time, the single character that immediately follows each chunk (the 41st, 44th, 47th, ...) is appended to next_chars.
```python
maxlen = 40  # length of each input chunk
step = 3     # stride between chunks
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])  # 40-char window
    next_chars.append(text[i + maxlen])    # the character right after it
print('nb sequences:', len(sentences))
```
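To see how the windowing works, here is a minimal sketch on a short string (the variable names mirror the script's; the text and the smaller maxlen are invented for illustration):

```python
maxlen = 5  # window length (the real script uses 40)
step = 3    # stride between windows

text = "supposing that truth is a woman"

sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])  # 5-char window
    next_chars.append(text[i + maxlen])    # the character right after it

print(sentences[:3])  # ['suppo', 'posin', 'ing t']
print(next_chars[:3])  # ['s', 'g', 'h']
```

Each window overlaps the previous one by maxlen - step characters, so the model sees (almost) every position of the corpus as a prediction target.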
Next, the data is vectorized (one-hot encoded):
```python
print('Vectorization...')
# np.bool is deprecated (removed in NumPy 1.24); use the builtin bool
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
```
The character corresponding to each index is stored in indices_char (char_indices holds the reverse, character-to-index mapping):
{0: '\n', 1: ' ', 2: '!', 3: '"', 4: "'", 5: '(', 6: ')', 7: ',', 8: '-', 9: '.', 10: '0', 11: '1', 12: '2', 13: '3', 14: '4', 15: '5', 16: '6', 17: '7', 18: '8', 19: '9', 20: ':', 21: ';', 22: '=', 23: '?', 24: '[', 25: ']', 26: '_', 27: 'a', 28: 'b', 29: 'c', 30: 'd', 31: 'e', 32: 'f', 33: 'g', 34: 'h', 35: 'i', 36: 'j', 37: 'k', 38: 'l', 39: 'm', 40: 'n', 41: 'o', 42: 'p', 43: 'q', 44: 'r', 45: 's', 46: 't', 47: 'u', 48: 'v', 49: 'w', 50: 'x', 51: 'y', 52: 'z', 53: 'ä', 54: 'æ', 55: 'é', 56: 'ë'}
x has the shape (len(sentences), maxlen, len(chars)).
x[i][j][k] is True if the j-th character of the i-th sentence is the character with id k, and False otherwise. (For convenience I'm calling each of the 40-character chunks from the previous step a "sentence".)
For example:
x[0][0] → array([False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False])
This says that the first character of the first sentence (preface\n\n\nsupposing that truth is a woma) is the character with id 42 (= 'p').
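The encoding is easy to invert: taking argmax along the last axis recovers the character ids, which indices_char maps back to text. A self-contained sketch with a tiny made-up two-character vocabulary (not the Nietzsche corpus):

```python
import numpy as np

# tiny invented corpus for illustration
sentences = ['ab', 'ba']
chars = sorted(set('ab'))                          # ['a', 'b']
char_indices = {c: i for i, c in enumerate(chars)}
indices_char = {i: c for i, c in enumerate(chars)}
maxlen = 2

# one-hot encode, exactly as in the script
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1

# decode the first row back: argmax recovers each character id
decoded = ''.join(indices_char[k] for k in x[0].argmax(axis=-1))
print(decoded)  # 'ab'
```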
Next, build the LSTM model:
```python
# build the model: a single LSTM
print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation='softmax'))
# `lr` was renamed to `learning_rate` in recent Keras versions
optimizer = RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
```
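As a rough sanity check on the model size (a sketch in plain arithmetic; 57 is the number of distinct characters in this corpus, i.e. len(chars)): an LSTM layer holds four gates, each with an input kernel, a recurrent kernel, and a bias, so its parameter count is 4 * units * (input_dim + units + 1); the softmax Dense layer adds a kernel plus a bias.

```python
units = 128
n_chars = 57  # len(chars) for this corpus

# four gates, each with input kernel, recurrent kernel, and bias
lstm_params = 4 * units * (n_chars + units + 1)
# softmax layer: kernel (units x n_chars) + bias (n_chars)
dense_params = units * n_chars + n_chars

print(lstm_params, dense_params)  # 95232 7353
```

These numbers should match what model.summary() reports for the two layers.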
Helper functions for printing progress during training:
```python
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


def on_epoch_end(epoch, _):
    # Function invoked at end of each epoch. Prints generated text.
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            # one-hot encode the current 40-char context
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            # slide the context window forward by one character
            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
```
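The temperature in sample() controls how adventurous the sampling is: low temperatures sharpen the predicted distribution (conservative, repetitive text), high temperatures flatten it toward uniform (more surprising, more garbled text). A sketch of just the reweighting step, with a made-up distribution:

```python
import numpy as np

def reweight(preds, temperature):
    # the same reweighting as in sample(), without the random draw
    preds = np.asarray(preds).astype('float64')
    preds = np.exp(np.log(preds) / temperature)
    return preds / np.sum(preds)

preds = np.array([0.5, 0.3, 0.2])  # invented model output

low = reweight(preds, 0.2)   # sharper: most mass on the top character
high = reweight(preds, 1.2)  # flatter: closer to uniform

print(low.round(3))
print(high.round(3))
```

This is why diversity 0.2 below loops on "the sense of the power" while diversity 1.2 produces near-nonsense words.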
Run the training:
```python
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=60,
          callbacks=[print_callback])
```
The training results are below.
The generated sentences are probably somewhat unnatural, but my English isn't good enough to judge just how unnatural they are...
```
Epoch 60/60
200285/200285 [==============================] - 148s 741us/step - loss: 1.3291

----- Generating text after Epoch: 59
----- diversity: 0.2
----- Generating with seed: "ar as it is desired and demanded by wome"
ar as it is desired and demanded by women and and all the sense and subject of the problems of the power the sense to the most sense of the present the former the strength, the strength of the sense of the power the sense of the sense of the sense of the power of the participated to the sense and account the consciousness of the power of the power of the problemse--and the problemse--he wishes to the problemse--he who would not the prob

----- diversity: 0.5
----- Generating with seed: "ar as it is desired and demanded by wome"
ar as it is desired and demanded by women and believe to the unhered in the stands and entirely but all in his learned to extent as a standard new and affation of a power the result of the instances, and fatherly, which is the contempt in the person the power and a consciousness of the standate of a participation of the man and entire power, which may be present is he was personal of the self-sake and struggle and our acts and instinct

----- diversity: 1.0
----- Generating with seed: "ar as it is desired and demanded by wome"
ar as it is desired and demanded by women; the counter" in pitts. the latter hand, in. a notainly ragulour, in the goodan; ard dogmes really who experience by the "made of a liuitest power--from the cosgrable oh, an true.--we really, indeed and outicated of every such a book of its "me the one's act of worl! for hthe personal grancts, to position--have will, well to cause about self-my tereomtic grow where the lightness, with its choie

----- diversity: 1.2
----- Generating with seed: "ar as it is desired and demanded by wome"
ar as it is desired and demanded by women, he some good utiesnely it cleards who gif and purtant, numblinariation, ourselves.
the secret curioe, in emess,--above spiritually tuld insiestiaing make. this kanishons this kind of at personne--it not "let peraportly aginives ever also, thanc easists of delusion fseeses: suth on wind nothin! herguimed portagh highle eighlesing, any kundod systems the svenses its power overwary acvedges? w) as
```