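RamenStyleAsYouLike:
Image Generation from Mask Images with Style Features

Jaehyeong Cho, Kaimu Okamoto, Wataru Shimoda, Keiji Yanai

Department of Informatics, The University of Electro-Communications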

Demo Page

Demo UI Explanation

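Abstract

This work addresses a limitation of conventional real-image generation from region mask images: the style of each region cannot be specified. We extract style features from each region of a real image with Mask Average Pooling and, at generation time, combine the mask image with the style features of each mask region, so that users can freely control the style of every region.
Combining this method with the UEC-Ramen555 dataset makes it possible to generate images while freely specifying the styles of the soup, bowl, chashu, and other regions. Users can pick their preferred topping styles from many ramen images and generate an "ultimate ramen image" from those features.
We implemented the proposed method as a web-based system, "RamenStyleAsYouLike", and will make its online demo available for a limited time during MIRU. We invite you to try generating your own "ultimate bowl".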
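As a rough illustration of the per-region style selection described above, the following sketch picks one style vector per region from several source images and copies it onto the pixels of that region. The function names, the image names (`ramen_a`, `ramen_b`), and the label IDs (1 and 2) are hypothetical and chosen only for this example. How the actual generator consumes the resulting style map is not specified here.

```python
from typing import Dict, List

StyleCodes = Dict[int, List[float]]  # region label -> style vector


def pick_styles(sources: Dict[str, StyleCodes], choice: Dict[int, str]) -> StyleCodes:
    """For each region label, take the style vector from the chosen source image."""
    return {label: sources[name][label] for label, name in choice.items()}


def broadcast_styles(mask: List[List[int]], styles: StyleCodes) -> List[List[List[float]]]:
    """Build an H x W x C style map by copying each region's vector to its pixels."""
    return [[list(styles[label]) for label in row] for row in mask]


# Hypothetical style vectors already extracted from two style images.
sources = {
    "ramen_a": {1: [0.1, 0.2], 2: [0.3, 0.4]},
    "ramen_b": {1: [0.9, 0.8], 2: [0.7, 0.6]},
}
# Region 1 takes its style from ramen_a, region 2 from ramen_b (assumed labels).
choice = {1: "ramen_a", 2: "ramen_b"}
mask = [[1, 2],
        [2, 2]]

style_map = broadcast_styles(mask, pick_styles(sources, choice))
```

Each pixel of `style_map` then carries the style vector of the region it belongs to, so regions of a single mask can draw on different style images.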

Style Encoder

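To control the style of images generated from a sketch image, we propose a Style Encoder that extracts a mask style feature for each region mask element from a given style image.
The Style Encoder takes a semantic segmentation mask and a style image as input, and extracts a style feature from each mask element of the style image (for a face, the hair, skin, and mouth regions).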
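The pooling step can be sketched as follows. This is a minimal, dependency-free illustration of Mask Average Pooling, assuming the feature map is already aligned with the mask at the same resolution. The function name and the toy data are made up for the example, and the real encoder's network layers are not modeled.

```python
from typing import Dict, List

Feature = List[List[List[float]]]  # H x W x C feature map
Mask = List[List[int]]             # H x W map of region label IDs


def mask_average_pooling(features: Feature, mask: Mask) -> Dict[int, List[float]]:
    """Average the feature vectors of all pixels that share a region label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for feat_row, mask_row in zip(features, mask):
        for vec, label in zip(feat_row, mask_row):
            acc = sums.setdefault(label, [0.0] * len(vec))
            for c, value in enumerate(vec):
                acc[c] += value
            counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


# Toy 2x2 feature map with 2 channels and two regions (labels 0 and 1).
features = [[[1.0, 0.0], [3.0, 2.0]],
            [[5.0, 4.0], [7.0, 6.0]]]
mask = [[0, 0],
        [1, 1]]

styles = mask_average_pooling(features, mask)
```

Each region ends up with a single vector whose length equals the number of channels, regardless of how many pixels the region covers.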

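Dataset

UEC-Ramen555 is a ramen image dataset containing 555 ramen images. Each image comes with a segmentation mask of ramen attributes. The semantic masks consist of manually annotated pixel-level semantic labels in 11 classes. In addition, a category annotation with 5 classes of ramen soup is included. The semantic labels cover the background, bowl, soup, renge (soup spoon), chopsticks, and toppings (sliced egg, nori, chashu, etc.). The soup category labels are shio (salt), shoyu (soy sauce), miso, tonkotsu (pork bone), and spicy soup.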
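For illustration, a label map with 11 semantic classes can be split into one binary mask per class as sketched below. The integer encoding of the labels, the order of the soup category list, and the function name are assumptions made for this example. The dataset's actual file format and label IDs are not described here.

```python
from typing import List

NUM_SEMANTIC_CLASSES = 11  # number of semantic classes stated in the dataset description
# The five soup categories; the order of this list is an assumption.
SOUP_CATEGORIES = ["shio", "shoyu", "miso", "tonkotsu", "spicy"]


def one_hot_masks(label_map: List[List[int]],
                  num_classes: int = NUM_SEMANTIC_CLASSES) -> List[List[List[int]]]:
    """Return num_classes binary H x W masks, one per semantic label."""
    for row in label_map:
        for label in row:
            if not 0 <= label < num_classes:
                raise ValueError(f"label {label} outside 0..{num_classes - 1}")
    return [[[1 if label == k else 0 for label in row] for row in label_map]
            for k in range(num_classes)]


# Toy 2x2 label map using three of the eleven (hypothetical) label IDs.
label_map = [[0, 1],
             [2, 2]]
masks = one_hot_masks(label_map)
```

Every pixel is set in exactly one of the returned masks, which is the form a per-region pooling step would typically consume.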

Download page

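Experiments

Results of image generation using styles extracted from a single style image

Results of image generation using styles extracted from multiple style images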

Demo UI Explanation

Segmentation

Style

Generation
