由Goodfellow等人在2014年推出的生成對抗神經網路適用於學習圖像空間VAE的替代方法。藉由強制讓聲稱ˊ圖像與真實圖片在統計上幾乎無法分別的條件下,生成相當逼真的合成圖像。

若要以直覺的方式來理解GAN,可以想像一個偽造者試圖創造衣服偽造的畢卡索畫作。起先,偽造者的偽造作品非常糟糕。然後他將依些假貨與真正的畫作混合在一起,並給鑑定師鑑定。鑑定師對每幅畫進行真實性評估,並給予位作者意見回饋。接著偽造者根據意見回饋再準備一些新的假貨,隨時間的推移偽造者跟鑑定師也越來越專業,到最後便能產出一個專業偽造者與鑑定師。

這就是GAN的運作原理:為偽造神經網路以及專家神經網路。因此由兩部分組成:

1.生成器神經網路generator:將隨機向量作為輸入,並將其解碼生成合成圖片。 2.鑑別器神經網路discriminator:將圖像輸入並預測是來自訓練集還是生成器。

生成器神經網路為了能夠欺騙鑑別器神經網路,因此隨著訓練的發展,將逐步產生越來越逼真的圖像,讓鑑別神經網路無法判定兩者的差別,而在同時鑑別器也會逐漸改進能力,進而對生成圖像訂出更高的鑑定標準。

當訓練結束時生成器就能夠將其輸入空間中的任何點轉換為可信的圖像,不同於VAE,GAN的潛在空間較少具有明確意義的結構,尤其是他不具有連續性質。

值得注意的是,gan是一個優化最小化不固定的系統,與其他的深度學習訓練設定較不一樣,通常梯度下降主要是在靜態損失趨勢中持續往下移動,但是gan會使得往下移動的每一步都會改變損失趨勢概況,也就是每次經過梯度下降後,損失函數的趨勢都可能會改變,我們無法照著一個固定的趨勢去進行優化。這是一個動態系統,其優化過程尋求的不是最小,而是兩例平衡。出於這個原因,GAN的參數是非常難訓練,要讓GAN的工作需要大量此係調整模型架構與訓練參數。

我在這邊會以一種特定深度卷積的GAN:Deep Convolutional GAN,DCGAN,其中生成器與鑑別器都是這種深度神經的卷積網路,且主要使用Conv2Dtranspose層進行生成器的圖像取樣。

Data: CIFAR10 32*32的RGB圖像資料集,這裡指使用青蛙的類別。

概念上GAN應該這樣運作:
1.生成器將shape的向量對應到shape = (32,32,3)的圖像
2.鑑別器將shape的圖像對應到二元分類器
3.gan神經網路將生成器與鑑別器連接,使神經網路對潛在空間向量進行真實性的評估,即鑑別器對生成器解碼的這些潛在向量進行真實性的評估。
4.我們可以使用針和假的圖像對應於“t”/“f”的標籤來訓練鑑別器,如同訓練一般的二元分類器一樣。
5.要訓練生成器,我們可以使用生成器權重設定之餘gan模型損失的梯度。這表示在每個步驟中,將生成器的權重一痛到使鑑別器更可能將生成器解碼的圖像歸類為“t”的方向,也就是說要訓練生成器來欺騙鑑別器。

有一些gan的技巧,他們主要是經驗法則而不是理論支持的特點。

1.用tanh最為生成器中最後一個啟動函數,而不是其他的激活函數。 2.使用常態分佈從前在空間中取樣點,比均勻分布來的更好。
3.良好隨機性可以誘導模型的完整性。由於gan訓練的目的在於找到一個動態平衡,所以很可能會以各種方式陷入困境。因此若能在訓練期間加入隨機性有助於防止這種情況,有兩種比較常見的方式可以引入隨機性:在鑑別器使用dropout;y94ru041u,在鑑別器的資料標籤增加隨機雜訊。
4.稀疏梯度可能阻礙gan訓練,在深度學習中,稀疏性通常是理想的屬性,但gan則不然,有兩件事會造成梯度的稀疏:最大池化以及relu。因此建議使用跨步卷積神經網路以及leakyReLU來取代之。
5.生成圖像時,通常會看到由於生成器中向速空間的不均於覆蓋導致的棋盤格偽影圖(如下圖)。為了解決這個問題,可以使用可被步長大小整除的大小。

A sample

The generator

首先我們先定義一個生成器模型,該模型將向量轉換為圖像,gan常常會遇到許多問題之一是,生成器卡在生成圖像看起來像雜訊,可能可以用丟棄法改善。

import keras
from keras import layers
import numpy as np

latent_dim = 32
height = 32
width = 32
channels = 3

generator_input = keras.Input(shape=(latent_dim,))

# First, transform the input into a 16x16 128-channels feature map
x = layers.Dense(128 * 16 * 16)(generator_input)
x = layers.LeakyReLU()(x)
x = layers.Reshape((16, 16, 128))(x)

# Then, add a convolution layer
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)

# Upsample to 32x32
x = layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)
x = layers.LeakyReLU()(x)

# Few more conv layers
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)

# Produce a 32x32 1-channel feature map
x = layers.Conv2D(channels, 7, activation='tanh', padding='same')(x)
generator = keras.models.Model(generator_input, x)
generator.summary()
Using TensorFlow backend.


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 32768)             1081344   
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 32768)             0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 16, 16, 128)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 16, 16, 256)       819456    
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 16, 16, 256)       0         
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 32, 32, 256)       1048832   
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 32, 32, 256)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 256)       1638656   
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU)    (None, 32, 32, 256)       0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 32, 32, 256)       1638656   
_________________________________________________________________
leaky_re_lu_5 (LeakyReLU)    (None, 32, 32, 256)       0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 32, 32, 3)         37635     
=================================================================
Total params: 6,264,579
Trainable params: 6,264,579
Non-trainable params: 0
_________________________________________________________________

The discriminator

一個鑑別器的功能,主要是將候選圖像最為輸入,並將其分為兩類:真實or產生器。

discriminator_input = layers.Input(shape=(height, width, channels))
x = layers.Conv2D(128, 3)(discriminator_input)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Flatten()(x)

# One dropout layer
x = layers.Dropout(0.4)(x)

# Binary Classification layer
x = layers.Dense(1, activation='sigmoid')(x)

# 將輸入的圖片轉換為二元分類決策
discriminator = keras.models.Model(discriminator_input, x)
discriminator.summary()


discriminator_optimizer = keras.optimizers.RMSprop(lr=0.0008, clipvalue=1.0, decay=1e-8)
discriminator.compile(optimizer=discriminator_optimizer, loss='binary_crossentropy')
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 30, 30, 128)       3584      
_________________________________________________________________
leaky_re_lu_6 (LeakyReLU)    (None, 30, 30, 128)       0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 14, 14, 128)       262272    
_________________________________________________________________
leaky_re_lu_7 (LeakyReLU)    (None, 14, 14, 128)       0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 6, 6, 128)         262272    
_________________________________________________________________
leaky_re_lu_8 (LeakyReLU)    (None, 6, 6, 128)         0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 2, 2, 128)         262272    
_________________________________________________________________
leaky_re_lu_9 (LeakyReLU)    (None, 2, 2, 128)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 512)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 513       
=================================================================
Total params: 790,913
Trainable params: 790,913
Non-trainable params: 0
_________________________________________________________________

The adversarial network

設定對抗網路,連結兩個主要神經網路,經過訓練後該模型會驅使生成器往提高欺騙能力當作方向前進,這個模型將潛在空間點轉換成分類決策,即判斷真假,並表示使用始終“是實體”的圖像標籤進行訓練。因此訓練gan會在查看假圖像時,使鑑別器更有可能預測“真的”的情況下更新生成器的權重。另外在訓練期間不能凍結鑑別器的設定。

# Set discriminator weights to non-trainable
# (will only apply to the `gan` model)
discriminator.trainable = False

gan_input = keras.Input(shape=(latent_dim,))
gan_output = discriminator(generator(gan_input))
gan = keras.models.Model(gan_input, gan_output)

gan_optimizer = keras.optimizers.RMSprop(lr=0.0004, clipvalue=1.0, decay=1e-8)
gan.compile(optimizer=gan_optimizer, loss='binary_crossentropy')

訓練DCGAN

現在我們可以開始訓練DCGAN。訓練循環的示意:

繪製於潛在空間(隨機噪聲)的隨機點。
生成帶使用隨機噪聲圖像。
混合以假亂真生成的圖像。
使用這些混和圖像與對應的目標標籤訓練鑑別器,目標標籤包括“T”/“F” 。
於潛在空間繪製新的隨機點。
使用這些隨機向量訓練GAN,目標都是要判斷“t”。這樣會持續更新生成器的權重,訓練生成器來欺騙鑑別器。

Let’s implement it:

import os
from keras.preprocessing import image

# Load CIFAR10 data
(x_train, y_train), (_, _) = keras.datasets.cifar10.load_data()

# Select frog images (class 6)
x_train = x_train[y_train.flatten() == 6]

# Normalize data
x_train = x_train.reshape(
    (x_train.shape[0],) + (height, width, channels)).astype('float32') / 255.

iterations = 10000
batch_size = 20
save_dir = '/home/ubuntu/gan_images/'

# Start training loop
start = 0
for step in range(iterations):
    # Sample random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))

    # Decode them to fake images
    generated_images = generator.predict(random_latent_vectors)

    # Combine them with real images
    stop = start + batch_size
    real_images = x_train[start: stop]
    combined_images = np.concatenate([generated_images, real_images])

    # Assemble labels discriminating real from fake images
    labels = np.concatenate([np.ones((batch_size, 1)),
                             np.zeros((batch_size, 1))])
    # Add random noise to the labels - important trick!
    labels += 0.05 * np.random.random(labels.shape)

    # Train the discriminator
    d_loss = discriminator.train_on_batch(combined_images, labels)

    # sample random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))

    # Assemble labels that say "all real images"
    misleading_targets = np.zeros((batch_size, 1))

    # Train the generator (via the gan model,
    # where the discriminator weights are frozen)
    a_loss = gan.train_on_batch(random_latent_vectors, misleading_targets)
    
    start += batch_size
    if start > len(x_train) - batch_size:
      start = 0

    # Occasionally save / plot
    if step % 100 == 0:
        # Save model weights
        gan.save_weights('gan.h5')

        # Print metrics
        print('discriminator loss at step %s: %s' % (step, d_loss))
        print('adversarial loss at step %s: %s' % (step, a_loss))

        # Save one generated image
        img = image.array_to_img(generated_images[0] * 255., scale=False)
        img.save(os.path.join(save_dir, 'generated_frog' + str(step) + '.png'))

        # Save one real image, for comparison
        img = image.array_to_img(real_images[0] * 255., scale=False)
        img.save(os.path.join(save_dir, 'real_frog' + str(step) + '.png'))
discriminator loss at step 0: 0.685675
adversarial loss at step 0: 0.667591
discriminator loss at step 100: 0.756201
adversarial loss at step 100: 0.820905
discriminator loss at step 200: 0.699047
adversarial loss at step 200: 0.776581
discriminator loss at step 300: 0.684602
adversarial loss at step 300: 0.513813
discriminator loss at step 400: 0.707092
adversarial loss at step 400: 0.716778
discriminator loss at step 500: 0.686278
adversarial loss at step 500: 0.741214
discriminator loss at step 600: 0.692786
adversarial loss at step 600: 0.745891
discriminator loss at step 700: 0.69771
adversarial loss at step 700: 0.781026
discriminator loss at step 800: 0.69236
adversarial loss at step 800: 0.748769
discriminator loss at step 900: 0.663193
adversarial loss at step 900: 0.689923
discriminator loss at step 1000: 0.706922
adversarial loss at step 1000: 0.741314
discriminator loss at step 1100: 0.682189
adversarial loss at step 1100: 0.76548
discriminator loss at step 1200: 0.687244
adversarial loss at step 1200: 0.746018
discriminator loss at step 1300: 0.697884
adversarial loss at step 1300: 0.766032
discriminator loss at step 1400: 0.691977
adversarial loss at step 1400: 0.735184
discriminator loss at step 1500: 0.696238
adversarial loss at step 1500: 0.738426
discriminator loss at step 1600: 0.698334
adversarial loss at step 1600: 0.741093
discriminator loss at step 1700: 0.70315
adversarial loss at step 1700: 0.736702
discriminator loss at step 1800: 0.693836
adversarial loss at step 1800: 0.742768
discriminator loss at step 1900: 0.69059
adversarial loss at step 1900: 0.741162
discriminator loss at step 2000: 0.696293
adversarial loss at step 2000: 0.755151
discriminator loss at step 2100: 0.686166
adversarial loss at step 2100: 0.755129
discriminator loss at step 2200: 0.692612
adversarial loss at step 2200: 0.772408
discriminator loss at step 2300: 0.704013
adversarial loss at step 2300: 0.776998
discriminator loss at step 2400: 0.693268
adversarial loss at step 2400: 0.70731
discriminator loss at step 2500: 0.684289
adversarial loss at step 2500: 0.742162
discriminator loss at step 2600: 0.700483
adversarial loss at step 2600: 0.734719
discriminator loss at step 2700: 0.699952
adversarial loss at step 2700: 0.759745
discriminator loss at step 2800: 0.697416
adversarial loss at step 2800: 0.733726
discriminator loss at step 2900: 0.697604
adversarial loss at step 2900: 0.740891
discriminator loss at step 3000: 0.698498
adversarial loss at step 3000: 0.754564
discriminator loss at step 3100: 0.695516
adversarial loss at step 3100: 0.759486
discriminator loss at step 3200: 0.693453
adversarial loss at step 3200: 0.769369
discriminator loss at step 3300: 1.5083
adversarial loss at step 3300: 0.726621
discriminator loss at step 3400: 0.686934
adversarial loss at step 3400: 0.747121
discriminator loss at step 3500: 0.689791
adversarial loss at step 3500: 0.751882
discriminator loss at step 3600: 0.71331
adversarial loss at step 3600: 0.704916
discriminator loss at step 3700: 0.690504
adversarial loss at step 3700: 0.853764
discriminator loss at step 3800: 0.688844
adversarial loss at step 3800: 0.791077
discriminator loss at step 3900: 0.679162
adversarial loss at step 3900: 0.724979
discriminator loss at step 4000: 0.676585
adversarial loss at step 4000: 0.69554
discriminator loss at step 4100: 0.693313
adversarial loss at step 4100: 0.742666
discriminator loss at step 4200: 0.678367
adversarial loss at step 4200: 0.778793
discriminator loss at step 4300: 0.699712
adversarial loss at step 4300: 0.740457
discriminator loss at step 4400: 0.697605
adversarial loss at step 4400: 0.755847
discriminator loss at step 4500: 0.710596
adversarial loss at step 4500: 0.814832
discriminator loss at step 4600: 0.706518
adversarial loss at step 4600: 0.83636
discriminator loss at step 4700: 0.687217
adversarial loss at step 4700: 0.775736
discriminator loss at step 4800: 0.769103
adversarial loss at step 4800: 0.774639
discriminator loss at step 4900: 0.692414
adversarial loss at step 4900: 0.775192
discriminator loss at step 5000: 0.715357
adversarial loss at step 5000: 0.775003
discriminator loss at step 5100: 0.703434
adversarial loss at step 5100: 0.940242
discriminator loss at step 5200: 0.704034
adversarial loss at step 5200: 0.708327
discriminator loss at step 5300: 0.698559
adversarial loss at step 5300: 0.730377
discriminator loss at step 5400: 0.684378
adversarial loss at step 5400: 0.759259
discriminator loss at step 5500: 0.693699
adversarial loss at step 5500: 0.700122
discriminator loss at step 5600: 0.715242
adversarial loss at step 5600: 0.808961
discriminator loss at step 5700: 0.689339
adversarial loss at step 5700: 0.621725
discriminator loss at step 5800: 0.679717
adversarial loss at step 5800: 0.787711
discriminator loss at step 5900: 0.700126
adversarial loss at step 5900: 0.742493
discriminator loss at step 6000: 0.692087
adversarial loss at step 6000: 0.839669
discriminator loss at step 6100: 0.677867
adversarial loss at step 6100: 0.797158
discriminator loss at step 6200: 0.70392
adversarial loss at step 6200: 0.842135
discriminator loss at step 6300: 0.688377
adversarial loss at step 6300: 0.718633
discriminator loss at step 6400: 0.781234
adversarial loss at step 6400: 0.710833
discriminator loss at step 6500: 0.682696
adversarial loss at step 6500: 0.739674
discriminator loss at step 6600: 0.693081
adversarial loss at step 6600: 0.747336
discriminator loss at step 6700: 0.681836
adversarial loss at step 6700: 0.780143
discriminator loss at step 6800: 0.728136
adversarial loss at step 6800: 0.838522
discriminator loss at step 6900: 0.660475
adversarial loss at step 6900: 0.717434
discriminator loss at step 7000: 0.672144
adversarial loss at step 7000: 0.948783
discriminator loss at step 7100: 0.692428
adversarial loss at step 7100: 0.837047
discriminator loss at step 7200: 0.731133
adversarial loss at step 7200: 0.728315
discriminator loss at step 7300: 0.671766
adversarial loss at step 7300: 0.793155
discriminator loss at step 7400: 0.712387
adversarial loss at step 7400: 0.807759
discriminator loss at step 7500: 0.68638
adversarial loss at step 7500: 0.967421
discriminator loss at step 7600: 0.690096
adversarial loss at step 7600: 0.811904
discriminator loss at step 7700: 0.702784
adversarial loss at step 7700: 0.867017
discriminator loss at step 7800: 0.674138
adversarial loss at step 7800: 0.837909
discriminator loss at step 7900: 0.674747
adversarial loss at step 7900: 0.743664
discriminator loss at step 8000: 0.680357
adversarial loss at step 8000: 0.810859
discriminator loss at step 8100: 0.688885
adversarial loss at step 8100: 0.786809
discriminator loss at step 8200: 0.671557
adversarial loss at step 8200: 0.784159
discriminator loss at step 8300: 0.70359
adversarial loss at step 8300: 0.95692
discriminator loss at step 8400: 0.720167
adversarial loss at step 8400: 1.14066
discriminator loss at step 8500: 0.747376
adversarial loss at step 8500: 0.630725
discriminator loss at step 8600: 0.688931
adversarial loss at step 8600: 0.849245
discriminator loss at step 8700: 0.707559
adversarial loss at step 8700: 0.713202
discriminator loss at step 8800: 0.673593
adversarial loss at step 8800: 0.832419
discriminator loss at step 8900: 0.6777
adversarial loss at step 8900: 0.773395
discriminator loss at step 9000: 0.659887
adversarial loss at step 9000: 0.77255
discriminator loss at step 9100: 0.675182
adversarial loss at step 9100: 0.749544
discriminator loss at step 9200: 0.687147
adversarial loss at step 9200: 0.836509
discriminator loss at step 9300: 0.690807
adversarial loss at step 9300: 0.829561
discriminator loss at step 9400: 0.656649
adversarial loss at step 9400: 0.788181
discriminator loss at step 9500: 0.703494
adversarial loss at step 9500: 0.78302
discriminator loss at step 9600: 0.680718
adversarial loss at step 9600: 0.813078
discriminator loss at step 9700: 0.704956
adversarial loss at step 9700: 0.761652
discriminator loss at step 9800: 0.673504
adversarial loss at step 9800: 0.853213
discriminator loss at step 9900: 0.669288
adversarial loss at step 9900: 0.677691

顯示一些生成的假圖像:

import matplotlib.pyplot as plt

# Sample random points in the latent space
random_latent_vectors = np.random.normal(size=(10, latent_dim))

# Decode them to fake images
generated_images = generator.predict(random_latent_vectors)

for i in range(generated_images.shape[0]):
    img = image.array_to_img(generated_images[i] * 255., scale=False)
    plt.figure()
    plt.imshow(img)
    
plt.show()