Getting Started with Keras - Text Generation Part 1: Running the Sample

Let's try running the text generation sample.

keras/lstm_text_generation.py at master · keras-team/keras · GitHub

The Colab notebook is here:

https://colab.research.google.com/drive/1lbSMhWEUxenIXVPSCl8MaYcz_rJbuB6o

The original corpus is here:

https://s3.amazonaws.com/text-datasets/nietzsche.txt
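
In the sample, the corpus is downloaded and lowercased before any slicing happens. A minimal sketch of that step (the exact import path for get_file can differ between Keras versions):

import io
from keras.utils.data_utils import get_file

# Download the corpus (cached locally by Keras) and lowercase it.
path = get_file('nietzsche.txt',
                origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
with io.open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('corpus length:', len(text))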

First, the text is sliced into 40-character windows, shifting 3 characters at a time (characters 1-40, 4-43, 7-46, and so on), and the single character that follows each window (the 41st, 44th, 47th, ...) is stored in next_chars.

maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))
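
For example, the first window and its target character look like this (the window is the same string that shows up again below when inspecting x):

print(repr(sentences[0]))   # 'preface\n\n\nsupposing that truth is a woma'
print(repr(next_chars[0]))  # 'n' -- the corpus continues "...truth is a woman"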

Next, the data is apparently vectorized (one-hot encoded):

print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

The id-to-character mapping is stored in indices_char:

{0: '\n',
 1: ' ',
 2: '!',
 3: '"',
 4: "'",
 5: '(',
 6: ')',
 7: ',',
 8: '-',
 9: '.',
 10: '0',
 11: '1',
 12: '2',
 13: '3',
 14: '4',
 15: '5',
 16: '6',
 17: '7',
 18: '8',
 19: '9',
 20: ':',
 21: ';',
 22: '=',
 23: '?',
 24: '[',
 25: ']',
 26: '_',
 27: 'a',
 28: 'b',
 29: 'c',
 30: 'd',
 31: 'e',
 32: 'f',
 33: 'g',
 34: 'h',
 35: 'i',
 36: 'j',
 37: 'k',
 38: 'l',
 39: 'm',
 40: 'n',
 41: 'o',
 42: 'p',
 43: 'q',
 44: 'r',
 45: 's',
 46: 't',
 47: 'u',
 48: 'v',
 49: 'w',
 50: 'x',
 51: 'y',
 52: 'z',
 53: 'ä',
 54: 'æ',
 55: 'é',
 56: 'ë'}
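
For reference, chars, char_indices and indices_char are built in the sample from the sorted set of characters in the corpus, roughly like this:

chars = sorted(list(set(text)))
print('total chars:', len(chars))                         # 57 for this corpus
char_indices = dict((c, i) for i, c in enumerate(chars))  # character -> id
indices_char = dict((i, c) for i, c in enumerate(chars))  # id -> character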

x has the shape (len(sentences), maxlen, len(chars)).

x[i][j][k] is True if the j-th character of the i-th sentence is the character with id k, and False otherwise (I call each of the 40-character chunks from above a "sentence" here for convenience).

For example:

x[0][0]

→ array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False,  True, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False])

This shows that the first character of the first sentence (preface\n\n\nsupposing that truth is a woma) is the character with id 42, i.e. 'p'.
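
As a sanity check, the whole first window can be decoded back out of x by taking the argmax of each row; a small sketch assuming x and indices_char from above:

import numpy as np

decoded = ''.join(indices_char[int(np.argmax(row))] for row in x[0])
print(repr(decoded))  # 'preface\n\n\nsupposing that truth is a woma'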

Next, build the LSTM model:

# build the model: a single LSTM
print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation='softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
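
The model maps a one-hot window of shape (maxlen, len(chars)) to a softmax distribution over the 57 possible next characters. A quick shape check, assuming the x built during vectorization:

probs = model.predict(x[:1], verbose=0)
print(probs.shape)  # (1, 57) == (1, len(chars))
print(probs.sum())  # ~1.0 thanks to the softmax output layer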

These helpers print generated text while training is in progress:


def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


def on_epoch_end(epoch, _):
    # Function invoked at end of each epoch. Prints generated text.
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
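
sample() divides the log probabilities by a temperature before renormalizing and drawing: temperatures below 1 sharpen the distribution toward the most likely character, while temperatures above 1 flatten it, which is exactly what the diversity loop above varies. A toy illustration with made-up probabilities:

import numpy as np

def rescale(preds, temperature):
    # Same rescaling as in sample(): softmax of log(p) / temperature.
    preds = np.log(np.asarray(preds, dtype='float64')) / temperature
    exp_preds = np.exp(preds)
    return exp_preds / np.sum(exp_preds)

toy = [0.5, 0.3, 0.2]  # made-up next-character probabilities
for t in [0.2, 0.5, 1.0, 1.2]:
    print(t, np.round(rescale(toy, t), 3))
# A low temperature piles the mass onto the top choice (more repetitive text);
# a high temperature spreads it out (more surprising, more garbled text).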

Run the training:

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=60,
          callbacks=[print_callback])

The training results are below.

The generated sentences are probably strange, but my English isn't good enough to tell just how unnatural they are...

Epoch 60/60
200285/200285 [==============================] - 148s 741us/step - loss: 1.3291

----- Generating text after Epoch: 59
----- diversity: 0.2
----- Generating with seed: "ar as it is desired and demanded by wome"
ar as it is desired and demanded by women and and all the sense and subject of the problems of the power the sense to the most sense of the present the former the strength, the strength of the sense of the power the sense of the sense of the sense of the power of the participated to the sense and account the consciousness of the power of the power of the problemse--and the problemse--he wishes to the problemse--he who would not the prob
----- diversity: 0.5
----- Generating with seed: "ar as it is desired and demanded by wome"
ar as it is desired and demanded by women and believe to the unhered in the stands and entirely but all in his learned to extent as a standard new and affation of a power the result of the instances, and fatherly, which is the contempt in the person the power and a consciousness of the standate of a participation of the
man and entire power, which may be present is he was personal of the self-sake and struggle and our acts and instinct 
----- diversity: 1.0
----- Generating with seed: "ar as it is desired and demanded by wome"
ar as it is desired and demanded by women; the counter" in pitts. the
latter hand, in. a notainly ragulour, in the goodan; ard dogmes really who
experience by the "made of a liuitest power--from the cosgrable oh, an true.--we really, indeed and outicated of every such a book of its "me the one's act of worl! for hthe personal
grancts, to position--have will, well to cause about  self-my tereomtic grow where the lightness, with its choie
----- diversity: 1.2
----- Generating with seed: "ar as it is desired and demanded by wome"
ar as it is desired and demanded by women, he some good utiesnely it cleards who gif and purtant, numblinariation, ourselves. the secret curioe,
in emess,--above
spiritually tuld
insiestiaing make. this kanishons this kind of at personne--it
not "let peraportly aginives ever also, thanc
easists of delusion fseeses: suth on wind nothin! herguimed portagh highle
eighlesing, any
kundod
systems the svenses its power overwary acvedges? w) as