A Neural Style Transfer Tutorial with PyTorch

What is neural style transfer, and how does it work?

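Neural style transfer is a technique for generating an image in the style of another image. The neural style algorithm takes a content image as input, uses a style image, and returns the content image as if it had been painted in the artistic style of the style image.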

Original image (left) | Starry Night style applied (middle) | Styled result (right)

How does the neural style algorithm work?

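To understand all the mathematics behind the algorithm, I recommend reading the original paper by Leon A. Gatys et al. To implement it, we define two distances: one for content (Dc) and one for style (Ds). Dc measures how different the content of two images is, while Ds measures how different their style is. We then take a third image, the input, and transform it so as to minimize both its content distance to the content image and its style distance to the style image.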
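
The objective described above can be written compactly. This is a sketch under assumed notation: x is the input image, c the content image, s the style image, and the weights \alpha and \beta are symbols introduced here for illustration (they play the role of the content_weight and style_weight arguments used later in the code).

```latex
\[
  x^{*} \;=\; \arg\min_{x}\;
  \alpha \, D_{C}(x, c) \;+\; \beta \, D_{S}(x, s)
\]
```

A large ratio \beta / \alpha pushes the result toward the style image's textures, while a small one keeps it closer to the content image.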

Implementation in PyTorch

To implement the algorithm, we need to import the following Python packages:

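torch, torch.nn and numpy provide neural networks and scientific computing. torch.optim implements various optimization algorithms. PIL, PIL.Image and matplotlib.pyplot load and display images. torchvision.transforms processes PIL images and converts them into torch tensors. torchvision.models is used to train or load pretrained models. copy is used to deep-copy models.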

from __future__ import print_function

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from PIL import Image
import matplotlib.pyplot as plt

import torchvision.transforms as transforms
import torchvision.models as models

import copy

Using CUDA

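If your computer has a GPU, you can run larger neural networks. torch.cuda.is_available() returns True when a CUDA-capable GPU can be used. You then set the torch.device that this script will use. The .to(device) method moves a tensor or module to the desired device; to move it back to the CPU, use the .cpu() method. The Python code is as follows: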

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
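
As a minimal sketch of how .to(device) and .cpu() behave, the snippet below moves a small tensor to the selected device and back. The tensor x is placeholder data introduced only for illustration; it is not part of the tutorial's pipeline.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical placeholder data, used only to illustrate device movement.
x = torch.ones(2, 3)

x_dev = x.to(device)  # tensor on the chosen device (unchanged if already there)
x_cpu = x_dev.cpu()   # back on the CPU, e.g. before plotting with matplotlib

print(x_dev.device.type)  # "cuda" or "cpu", depending on the machine
print(x_cpu.device.type)
```

On a machine without a GPU both calls leave the tensor on the CPU, so the same script runs unchanged in either environment.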

Loading the images

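We start by importing the style and content images. We then scale them to the desired output size and convert them to torch tensors. The loaded images must be the same size. The images used in this tutorial come from Pixabay and can be downloaded from https://pixabay.com/en/fig-cheese-bread-baguette-eat-3640553/ and https://pixabay.com/en/apple-fruits-fruit-apple-tree-ripe-3640970/. I also used GIMP to resize them to the same dimensions.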

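The images are converted to torch tensors with values between 0 and 1. This matters because the neural network was trained on image tensors in the 0-1 range.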

The Python code is as follows:

# desired size of the output image; use a smaller size if no GPU is available
imsize = 512 if torch.cuda.is_available() else 128

loader = transforms.Compose([
    transforms.Resize(imsize),  # scale the imported image
    transforms.ToTensor()])     # convert it into a torch tensor in [0, 1]

def image_loader(image_name):
    image = Image.open(image_name)
    # add a fake batch dimension to fit the network's input dimensions
    image = loader(image).unsqueeze(0)
    return image.to(device, torch.float)

style_img = image_loader("images/apple.jpg")
content_img = image_loader("images/fig.jpg")

assert style_img.size() == content_img.size(), \
    "You need to import style and content images of the same size"

Displaying the images

We use Matplotlib's plt.imshow to display the images. This involves several steps:

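Clone the tensor so that the original is not modified. Remove the fake batch dimension. Convert the tensor back into a PIL image. Plot the image with imshow. Pause briefly so that the figure is updated.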

The Python code is as follows:

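unloader = transforms.ToPILImage()  # converts a tensor back into a PIL image

plt.ion()

def imshow(tensor, title=None):
    image = tensor.cpu().clone()  # clone the tensor so we do not change it
    image = image.squeeze(0)      # remove the fake batch dimension
    image = unloader(image)
    plt.imshow(image)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause briefly so that the plot is updated

plt.figure()
imshow(style_img, title='Style Image')

plt.figure()
imshow(content_img, title='Content Image')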

Content loss function

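The content loss is a function that takes the feature maps of a layer in the network as input and returns the content distance between the input image and the content image. It is implemented as a torch module whose constructor takes the target content as an argument; the weighting is applied later, when the losses are combined.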

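The mean squared error between the two sets of feature maps can be computed with the standard nn.MSELoss (or its functional form, F.mse_loss). The content loss of each layer is added to the neural network as an additional module. That way, every time an image is fed to the network, all the content losses are computed at the desired layers, and autograd handles all the gradient computation. For this to work, the module's forward method returns its input.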

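The module thus becomes a transparent layer of the neural network, and the computed loss is saved as an attribute of the module (self.loss). Because the loss is built from differentiable operations, autograd can recover its gradient when backward is called during gradient descent. The stored value is also printed while the optimization runs, to show how the style and content losses evolve.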

The Python code is as follows:

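class ContentLoss(nn.Module):

    def __init__(self, target):
        super(ContentLoss, self).__init__()
        # detach the target from the graph: it is a fixed value,
        # not a variable that should receive gradients
        self.target = target.detach()

    def forward(self, input):
        self.loss = F.mse_loss(input, self.target)
        return input  # pass the input through unchanged (transparent layer)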
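
The following sketch illustrates the "transparent layer" behaviour on tiny tensors. It redefines ContentLoss so that the snippet is self-contained; the tensors target and features are made-up values, not real VGG activations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentLoss(nn.Module):
    def __init__(self, target):
        super().__init__()
        self.target = target.detach()

    def forward(self, input):
        self.loss = F.mse_loss(input, self.target)
        return input

# Made-up 1x1x2x2 "feature maps" for illustration only.
target = torch.zeros(1, 1, 2, 2)
features = torch.ones(1, 1, 2, 2, requires_grad=True)

layer = ContentLoss(target)
out = layer(features)    # the input passes through unchanged
layer.loss.backward()    # autograd computes d(loss)/d(features)

print(out is features)   # True
print(layer.loss.item()) # 1.0, the mean of (1 - 0)^2 over 4 elements
print(features.grad)     # every entry is 2 * (1 - 0) / 4 = 0.5
```

The layer does not alter the activations flowing through the network; it only records a loss as a side effect, which is what lets it be inserted anywhere in a Sequential model.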

Style loss

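For style, we define a function that computes the Gram matrix of the feature maps produced by a layer of the neural network. We then normalize the Gram matrix by dividing its values by the total number of elements in the feature maps (a * b * c * d in the code).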

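def gram_matrix(input):
    # a = batch size (1), b = number of feature maps,
    # (c, d) = dimensions of a feature map
    a, b, c, d = input.size()

    features = input.view(a * b, c * d)  # resize F_XL into \hat F_XL

    G = torch.mm(features, features.t())  # compute the Gram product

    # normalize by the total number of elements in the feature maps
    return G.div(a * b * c * d)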
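
A small worked example may help in checking the arithmetic. The sketch below repeats gram_matrix so that it runs on its own and applies it to a hand-made tensor with two 1x2 feature maps; the values are arbitrary.

```python
import torch

def gram_matrix(input):
    a, b, c, d = input.size()
    features = input.view(a * b, c * d)
    G = torch.mm(features, features.t())
    return G.div(a * b * c * d)

# Two feature maps of size 1x2: [1, 2] and [3, 4] (arbitrary values).
x = torch.tensor([[[[1.0, 2.0]], [[3.0, 4.0]]]])  # shape (1, 2, 1, 2)

G = gram_matrix(x)
print(G)
# Unnormalized products: [[1*1+2*2, 1*3+2*4], [3*1+4*2, 3*3+4*4]]
#                      = [[5, 11], [11, 25]]
# Divided by a*b*c*d = 4: [[1.25, 2.75], [2.75, 6.25]]
```

Entry (i, j) is the inner product of feature maps i and j, so the matrix is symmetric and its size depends only on the number of feature maps, not on their spatial dimensions.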

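The style loss module is implemented in exactly the same way as the content loss module; however, it compares the Gram matrix of the target with the Gram matrix of the input.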

class StyleLoss(nn.Module):

    def __init__(self, target_feature):
        super(StyleLoss, self).__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        G = gram_matrix(input)
        self.loss = F.mse_loss(G, self.target)
        return input
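
One way to see why the Gram matrix captures style rather than content is that it sums over spatial positions and therefore ignores where features occur. The sketch below, which uses random placeholder features rather than real VGG activations, mirrors the feature maps horizontally: the content distance becomes clearly nonzero while the style loss stays at numerical zero.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(input):
    a, b, c, d = input.size()
    features = input.view(a * b, c * d)
    return torch.mm(features, features.t()).div(a * b * c * d)

class StyleLoss(nn.Module):
    def __init__(self, target_feature):
        super().__init__()
        self.target = gram_matrix(target_feature).detach()

    def forward(self, input):
        self.loss = F.mse_loss(gram_matrix(input), self.target)
        return input

torch.manual_seed(0)
feats = torch.rand(1, 3, 4, 4)            # random placeholder feature maps
mirrored = torch.flip(feats, dims=[3])    # same features, mirrored left-right

style = StyleLoss(feats)
style(mirrored)

content_distance = F.mse_loss(mirrored, feats)

print(style.loss.item())        # approximately 0: the Gram matrices match
print(content_distance.item())  # clearly greater than 0
```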

Loading the neural network

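As in the paper, we use a pretrained VGG network with 19 layers (VGG19). The PyTorch implementation is a module divided into two child Sequential modules: features, which contains the convolution and pooling layers, and classifier, which contains the fully connected layers. We only need features here, and we put it in evaluation mode with .eval().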

cnn = models.vgg19(pretrained=True).features.to(device).eval()

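VGG networks are trained on images whose channels are normalized with mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. We use these values to normalize the image before sending it to the network.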

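cnn_normalization_mean = torch.tensor([0.485, 0.456, 0.406]).to(device)
cnn_normalization_std = torch.tensor([0.229, 0.224, 0.225]).to(device)

# a module that normalizes the input image so that it can be
# placed at the start of an nn.Sequential
class Normalization(nn.Module):

    def __init__(self, mean, std):
        super(Normalization, self).__init__()
        # view the mean and std as [C x 1 x 1] so that they broadcast
        # against image tensors of shape [B x C x H x W]
        self.mean = torch.tensor(mean).view(-1, 1, 1)
        self.std = torch.tensor(std).view(-1, 1, 1)

    def forward(self, img):
        # normalize img
        return (img - self.mean) / self.std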
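
The view(-1, 1, 1) call is what makes the per-channel statistics line up with a batched image. As a rough sketch on a made-up uniform grey image, the broadcasting works like this:

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(-1, 1, 1)  # shape (3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(-1, 1, 1)   # shape (3, 1, 1)

# Made-up image: batch of 1, 3 channels, 2x2 pixels, every value 0.5.
img = torch.full((1, 3, 2, 2), 0.5)

# (1, 3, 2, 2) op (3, 1, 1): each channel uses its own mean and std.
normalized = (img - mean) / std

print(normalized.shape)     # torch.Size([1, 3, 2, 2])
print(normalized[0, :, 0, 0])  # one value per channel
```

Without the reshape, a tensor of shape (3,) would be matched against the last (width) dimension instead of the channel dimension, which either fails or silently normalizes the wrong axis.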

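We want to add our style and content loss modules as transparent layers at the desired depths of the network. To do this, we build a new Sequential module to which we add the modules from vgg19 and our loss modules in the right order.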

# desired depth layers at which to compute the style/content losses
content_layers_default = ['conv_4']
style_layers_default = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']

def get_style_model_and_losses(cnn, normalization_mean, normalization_std,
                               style_img, content_img,
                               content_layers=content_layers_default,
                               style_layers=style_layers_default):
    cnn = copy.deepcopy(cnn)

    # normalization module
    normalization = Normalization(normalization_mean, normalization_std).to(device)

    # keep references to the loss modules so their values can be read later
    content_losses = []
    style_losses = []

    # build a new nn.Sequential to hold the modules in activation order
    model = nn.Sequential(normalization)

    i = 0  # incremented every time we see a conv layer
    for layer in cnn.children():
        if isinstance(layer, nn.Conv2d):
            i += 1
            name = 'conv_{}'.format(i)
        elif isinstance(layer, nn.ReLU):
            name = 'relu_{}'.format(i)
            # the in-place version does not work well with the loss
            # modules inserted below, so use an out-of-place ReLU instead
            layer = nn.ReLU(inplace=False)
        elif isinstance(layer, nn.MaxPool2d):
            name = 'pool_{}'.format(i)
        elif isinstance(layer, nn.BatchNorm2d):
            name = 'bn_{}'.format(i)
        else:
            raise RuntimeError('Unrecognized layer: {}'.format(layer.__class__.__name__))

        model.add_module(name, layer)

        if name in content_layers:
            # add content loss
            target = model(content_img).detach()
            content_loss = ContentLoss(target)
            model.add_module("content_loss_{}".format(i), content_loss)
            content_losses.append(content_loss)

        if name in style_layers:
            # add style loss
            target_feature = model(style_img).detach()
            style_loss = StyleLoss(target_feature)
            model.add_module("style_loss_{}".format(i), style_loss)
            style_losses.append(style_loss)

    # trim off the layers after the last content or style loss
    for i in range(len(model) - 1, -1, -1):
        if isinstance(model[i], ContentLoss) or isinstance(model[i], StyleLoss):
            break

    model = model[:(i + 1)]

    return model, style_losses, content_losses
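
The naming scheme in the loop above determines which layers names such as 'conv_4' refer to. The sketch below applies the same counting rule to a small toy network, a hypothetical stand-in for the VGG19 features module, so that it runs without downloading any weights.

```python
import torch.nn as nn

# Toy stand-in for vgg19.features (hypothetical, much smaller).
toy = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    nn.Conv2d(4, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

names = []
i = 0  # counts convolution layers seen so far
for layer in toy.children():
    if isinstance(layer, nn.Conv2d):
        i += 1
        names.append('conv_{}'.format(i))
    elif isinstance(layer, nn.ReLU):
        names.append('relu_{}'.format(i))
    elif isinstance(layer, nn.MaxPool2d):
        names.append('pool_{}'.format(i))

print(names)  # ['conv_1', 'relu_1', 'pool_1', 'conv_2', 'relu_2']
```

The index only increases at convolution layers, so 'relu_2' and any pooling layer that follows it share the number of the convolution that precedes them.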

In this tutorial we use the content image as the input image. You can use a different image instead, but it must have the same dimensions.

input_img = content_img.clone()

plt.figure()
imshow(input_img, title='Input Image')

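The authors suggest using the L-BFGS algorithm to run the gradient descent. Rather than training a network, we optimize the input image itself to minimize the content and style losses. We create a PyTorch L-BFGS optimizer, optim.LBFGS, and pass it the image as the tensor to optimize. We call .requires_grad_() to make sure the image requires gradients.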

def get_input_optimizer(input_img):
    # the input image is the parameter being optimized, so it needs gradients
    optimizer = optim.LBFGS([input_img.requires_grad_()])
    return optimizer

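At each step we must feed the updated input to the network so that the new losses are computed. We then call backward on the weighted sum of the losses to compute the gradients dynamically and perform a gradient descent step. The optimizer requires a closure as its argument: a function that re-evaluates the model and returns the loss. One small difficulty is that the optimized image may take values anywhere between -∞ and +∞ rather than staying between 0 and 1. We therefore constrain the optimization so that the input image keeps valid values, by clamping the image to the 0-1 range at every step.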

def run_style_transfer(cnn, normalization_mean, normalization_std,
                       content_img, style_img, input_img, num_steps=300,
                       style_weight=1000000, content_weight=1):
    model, style_losses, content_losses = get_style_model_and_losses(cnn,
        normalization_mean, normalization_std, style_img, content_img)
    optimizer = get_input_optimizer(input_img)

    print('Optimizing..')
    run = [0]  # a list so that the closure can update the counter
    while run[0] <= num_steps:

        def closure():
            # correct the values of the updated input image
            input_img.data.clamp_(0, 1)

            optimizer.zero_grad()
            model(input_img)  # forward pass fills in every loss module's .loss
            style_score = 0
            content_score = 0

            for sl in style_losses:
                style_score += sl.loss
            for cl in content_losses:
                content_score += cl.loss

            style_score *= style_weight
            content_score *= content_weight

            loss = style_score + content_score
            loss.backward()

            run[0] += 1
            if run[0] % 50 == 0:
                print("run {}:".format(run[0]))
                print('Style Loss : {:4f} Content Loss: {:4f}'.format(
                    style_score.item(), content_score.item()))
                print()

            return style_score + content_score

        optimizer.step(closure)

    # a last correction to keep the final image in the valid range
    input_img.data.clamp_(0, 1)

    return input_img
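
The closure pattern can be tried in isolation on a toy problem. In the sketch below, the two-element tensor x stands in for the input image and a simple squared distance to a made-up target stands in for the style and content losses; the step count of 5 is an arbitrary choice.

```python
import torch
import torch.optim as optim

target = torch.tensor([0.3, 0.7])                  # made-up optimum inside [0, 1]
x = torch.tensor([2.0, -1.0], requires_grad=True)  # starts outside [0, 1]

optimizer = optim.LBFGS([x])

def closure():
    # keep the optimized values in the valid range, as done for the image
    with torch.no_grad():
        x.clamp_(0, 1)
    optimizer.zero_grad()
    loss = ((x - target) ** 2).sum()  # stand-in for the weighted losses
    loss.backward()
    return loss

for _ in range(5):
    optimizer.step(closure)  # L-BFGS may call the closure several times

with torch.no_grad():
    x.clamp_(0, 1)           # final correction after the last step

print(x)  # should be close to the target values
```

L-BFGS re-evaluates the objective several times within a single step, which is why it takes a function rather than a precomputed loss, and why the tutorial's step counter is incremented inside the closure.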

Conclusion

Now let's look at the newly generated image, rendered in the artistic style of the style image. The Python code is as follows:

output = run_style_transfer(cnn, cnn_normalization_mean, cnn_normalization_std,
                            content_img, style_img, input_img)

plt.figure()
imshow(output, title='Output Image')

plt.ioff()
plt.show()