diff --git a/.gitattributes b/.gitattributes
index 886ac0c..d08a1ef 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,38 +1,81 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
-*.bin.* filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
-*.zstandard filter=lfs diff=lfs merge=lfs -text
-*.tfevents* filter=lfs diff=lfs merge=lfs -text
-*.db* filter=lfs diff=lfs merge=lfs -text
-*.ark* filter=lfs diff=lfs merge=lfs -text
-**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
-**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
-**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.gguf* filter=lfs diff=lfs merge=lfs -text
-*.ggml filter=lfs diff=lfs merge=lfs -text
-*.llamafile* filter=lfs diff=lfs merge=lfs -text
-*.pt2 filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+images/overflow.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/1.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/10.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/11.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/12.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/14.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/15.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/16.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/17.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/18.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/19.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/2.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/21.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/22.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/23.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/24.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/26.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/27.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/28.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/29.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/3.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/30.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/31.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/32.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/34.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/35.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/36.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/38.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/39.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/4.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/40.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/41.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/42.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/43.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/44.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/45.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/46.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/47.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/48.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/49.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/50.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/6.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/7.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/8.png filter=lfs diff=lfs merge=lfs -text
+images/show_case/9.png filter=lfs diff=lfs merge=lfs -text
+diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
diff --git a/README.md b/README.md
index fb1c19f..e2adc62 100644
--- a/README.md
+++ b/README.md
@@ -1,47 +1,187 @@
 ---
-license: Apache License 2.0
-
-#model-type:
-## e.g. gpt, phi, llama, chatglm, baichuan, etc.
-#- gpt
-
-#domain:
-## e.g. nlp, cv, audio, multi-modal
-#- nlp
-
-#language:
-## list of language codes https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
-#- cn
-
-#metrics:
-## e.g. CIDEr, Blue, ROUGE, etc.
-#- CIDEr
-
-#tags:
-## free-form custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
-#- pretrained
-
-#tools:
-## e.g. vllm, fastchat, llamacpp, AdaSeq, etc.
-#- vllm
+license: apache-2.0
 ---
-### The contributors of this model have not provided a more detailed model description. Model files and weights are available on the "Model Files" page.
-#### You can download the model with the following git clone command or with the ModelScope SDK
+
+
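+If you are a contributor to this model, we invite you to complete the model card promptly, following the model contribution documentation.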
\ No newline at end of file
+EcomID aims to generate customized images from a single reference ID image, ensuring strong semantic consistency while being controlled by keypoints.
+
+This repository provides the EcomID method and model, combining the strengths of [PuLID](https://github.com/ToTheBeginning/PuLID) and [InstantID](https://github.com/instantX-research/InstantID) for better background consistency, facial keypoint control, and realistic facial representation with improved similarity.
+
+# EcomID Overview
+
+## EcomID Structure
+
+
+
+- **IP-Adapter of PuLID**: EcomID incorporates the ID-Encoder and cross-attention components from PuLID, which are trained with an alignment loss.
+This reduces the interference of ID embeddings with text embeddings in the cross-attention layers, minimizing disruption to the underlying model's text-to-image capabilities.
+- **InstantID’s IdentityNet Architecture**: Trained on **a dataset of 2 million aesthetically pleasing portrait images**, IdentityNet enhances keypoint control, improving ID consistency and facial realism. During training, the IP-Adapter is frozen and only IdentityNet is trained. Facial landmarks serve as conditional inputs, while face embeddings are integrated into IdentityNet via cross-attention.
+
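The structure notes above say that face embeddings enter IdentityNet through cross-attention. As a rough illustration of that mechanism only, here is a minimal single-head scaled dot-product cross-attention in plain Python. It is a sketch under stated assumptions, not EcomID's implementation: the names `cross_attention`, `image_features`, `id_tokens_k`, and `id_tokens_v` are hypothetical, the vectors are toy values, and the learned query/key/value projections that a real model applies are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: one vector per image-feature position (hypothetical stand-in
             for hidden states inside IdentityNet).
    keys, values: one vector per ID token (hypothetical stand-in for
             projected face embeddings).
    Returns (outputs, weights): one output vector and one weight row per query.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every ID token, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        row = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(row, values))
            for j in range(len(values[0]))
        ]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights


# Toy data (assumption): three image positions attend over two ID tokens.
image_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
id_tokens_k = [[1.0, 0.0], [0.0, 1.0]]
id_tokens_v = [[10.0, 0.0], [0.0, 10.0]]

attended, attn = cross_attention(image_features, id_tokens_k, id_tokens_v)
for position, row in enumerate(attn):
    print(position, [round(w, 3) for w in row])
```

Each image position receives a weighted mix of the ID tokens, with weights that sum to 1. A position equally similar to both tokens (the third one here) gets an even blend of their values.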
+# Show Cases
+## Comparison with Other Methods
+### 1. Preserved Text-to-Image Capability
+
+| Prompt | Reference Image | EcomID | InstantID |
+|---|---|---|---|
+| girl, white skin, black hair, long wavy hair, in European style living room, Retro tone, decorations, depth of field. | ![]() | ![]() | ![]() |
+
+| Prompt | Reference Image | EcomID | InstantID | PuLID |
+|---|---|---|---|---|
+| A close-up portrait of a man standing in the library, holding two smiling toddlers next to him. | ![]() | ![]() | ![]() | ![]() |
+
+| Prompt | Reference Image | EcomID | InstantID | PuLID |
+|---|---|---|---|---|
+| A close-up portrait of a little girl with double braids, wearing a white dress, standing on the beach during sunset. | ![]() | ![]() | ![]() | ![]() |
+| A close-up portrait of a very little girl with double braids, wearing a hat and white dress, standing on the beach during sunset. | ![]() | ![]() | ![]() | ![]() |
+| A grizzled detective, fedora casting a shadow over his square jaw, a cigar dangling from his lips, his trench coat evocative of film noir, in a rainy alley. | ![]() | ![]() | ![]() | ![]() |
+| A smiling girl with bangs and long hair in a school uniform stands under cherry trees, holding a book. | ![]() | ![]() | ![]() | ![]() |
+| A very old witch, wearing a black cloak, with a pointed hat, holding a magic wand, against a background of a misty forest. | ![]() | ![]() | ![]() | ![]() |
+| A man clad in cyberpunk fashion: neon accents, reflective sunglasses, and a leather jacket with glowing circuit patterns. He stands stoically amidst a soaked cityscape. | ![]() | ![]() | ![]() | ![]() |
+
+| SDXL models | Prompt | Reference Image | EcomID | InstantID | PuLID |
+|---|---|---|---|---|---|
+| sd-xl-base-1.0 | girl, solo, brown hair, holding a little teddy bear on her hands, wearing a school uniform, standing in the library, cartoon style. | ![]() | ![]() | ![]() | ![]() |
+| EcomXL | A close-up portrait of a very little girl with double braids, wearing a hat and white dress, standing on the beach during sunset. | ![]() | ![]() | ![]() | ![]() |
+| DreamShaperXL | solo, looking_at_viewer, smile, brown_hair, upper_body, open_clothes, teeth, open_jacket, black_jacket, blurry_background, realistic | ![]() | ![]() | ![]() | ![]() |
+| leosam_xl_v7 | A close-up portrait of a girl, solo, dress, jewelry, beach and sea, pink_dress, realistic. | ![]() | ![]() | ![]() | ![]() |
+
+
+
+
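+- **PuLID's IP-Adapter**: EcomID borrows the ID-Encoder and cross-attention components from PuLID, which are trained with an alignment loss.
+As a result, the method effectively reduces the interference of the ID embedding with the text embedding in the cross-attention layers, minimizing the impact on the base model's text-to-image capability.
+
+- **InstantID's IdentityNet architecture**: IdentityNet was trained on *a dataset of 2 million aesthetically pleasing portrait images*, which strengthens keypoint control and improves ID consistency and facial realism. During training, the IP-Adapter is frozen and only IdentityNet is trained. Facial keypoints serve as the conditional input, while face embeddings are integrated into IdentityNet through cross-attention.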
+
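+# Show Cases
+## Comparison with Other Methods
+### 1. Preserved Text-to-Image Capability
+| Prompt | Reference Image | EcomID | InstantID |
+|---|---|---|---|
+| girl, white skin, black hair, long curly hair, in a European-style living room, retro tone, decorations, depth of field. | ![]() | ![]() | ![]() |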
+
+| Prompt | Reference Image | EcomID | InstantID | PuLID |
+|---|---|---|---|---|
+| A close-up portrait of a man standing in front of the library, holding two smiling toddlers. | ![]() | ![]() | ![]() | ![]() |
+
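+| Prompt | Reference Image | EcomID | InstantID | PuLID |
+|---|---|---|---|---|
+| A close-up portrait of a little girl with double braids, wearing a white dress, on the beach at dusk. | ![]() | ![]() | ![]() | ![]() |
+| A very little girl with double braids, wearing | ![]() | ![]() | ![]() | ![]() |
+| A stubble-faced detective wearing a hat that casts a shadow over his square jaw, a cigarette dangling from his lips, his trench coat evocative of film noir, in a rainy alley. | ![]() | ![]() | ![]() | ![]() |
+| A smiling girl with straight bangs and long hair, wearing a school uniform, standing under cherry trees and holding a book. | ![]() | ![]() | ![]() | ![]() |
+| A | ![]() | ![]() | ![]() | ![]() |
+| A man dressed in cyberpunk style: neon accessories, reflective sunglasses, and a leather jacket with glowing circuit patterns. He stands calmly amid a wet cityscape. | ![]() | ![]() | ![]() | ![]() |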
+
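+| SDXL models | Prompt | Reference Image | EcomID | InstantID | PuLID |
+|---|---|---|---|---|---|
+| sd-xl-base-1.0 | girl, solo, brown hair, holding a little teddy bear in her hands, wearing a school uniform, standing in the library, cartoon style. | ![]() | ![]() | ![]() | ![]() |
+| EcomXL | A close-up portrait of a very little girl with double braids, wearing | ![]() | ![]() | ![]() | ![]() |
+| DreamShaperXL | solo, looking at viewer, smile, brown hair, upper body, open clothes, teeth, open jacket, black jacket, blurry background, realistic | ![]() | ![]() | ![]() | ![]() |
+| leosam_xl_v7 | A close-up portrait, girl, solo, dress, jewelry, beach and sea, pink dress, realistic. | ![]() | ![]() | ![]() | ![]() |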
+