Update README.md

2026-04-02 12:22:53 +08:00 · 2024-10-24 13:04:22 +08:00
parent eb1df5e44e
commit 6ce0f40291
57 changed files with 1351 additions and 57 deletions
--- a/README.md
+++ b/README.md
@@ -1,47 +1,187 @@
 ---
-license: Apache License 2.0
-
-#model-type:
-## e.g. gpt, phi, llama, chatglm, baichuan
-#- gpt
-
-#domain:
-## e.g. nlp, cv, audio, multi-modal
-#- nlp
-
-#language:
-## Language code list: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
-#- cn
-
-#metrics:
-## e.g. CIDEr, BLEU, ROUGE
-#- CIDEr
-
-#tags:
-## Custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
-#- pretrained
-
-#tools:
-## e.g. vllm, fastchat, llamacpp, AdaSeq
-#- vllm
+license: apache-2.0
 ---
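-### The contributors of this model have not provided a more detailed model description. The model files and weights are available on the "Model Files" page.
-#### You can download the model with the following git clone command or with the ModelScope SDK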
+<div style="display: flex; justify-content: center; align-items: center;">
+  <img src="./images/images_alibaba.png" alt="alibaba" style="width: 20%; height: auto; margin-right: 5%;">
+  <img src="./images/images_alimama.png" alt="alimama" style="width: 20%; height: auto;">
+</div>

-SDK download
-```bash
-# Install ModelScope
-pip install modelscope
-```
-```python
-# Download the model with the SDK
-from modelscope import snapshot_download
-model_dir = snapshot_download('alimama-creative/SDXL-EcomID')
-```
-Git download
-```
-# Download the model with Git
-git clone https://www.modelscope.cn/alimama-creative/SDXL-EcomID.git
-```
+[中文版Readme](./README_ZH.md)

-<p style="color: lightgrey;">If you are a contributor to this model, we invite you to complete the model card promptly, following the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
+EcomID aims to generate customized images from a single reference ID image, ensuring strong semantic consistency while being controlled by keypoints.
+
+This repository provides the EcomID method and model, combining the strengths of [PuLID](https://github.com/ToTheBeginning/PuLID) and [InstantID](https://github.com/instantX-research/InstantID) for better background consistency, facial keypoint control, and realistic facial representation with improved similarity.
+
+# EcomID Overview
+
+## EcomID Structure
+  <img src="./images/overflow.png" alt="alibaba" style="width: 100%; height: auto; margin-right: 5%;">
+
+
+- **IP-Adapter of PuLID**: EcomID incorporates the ID-Encoder and cross-attention components from PuLID, trained with an alignment loss. This reduces the interference of ID embeddings with text embeddings in the cross-attention layers, minimizing disruption to the underlying model's text-to-image capabilities.
+- **InstantID’s IdentityNet Architecture**: Trained on **a dataset of 2 million aesthetically pleasing portrait images**, IdentityNet enhances keypoint control, improving ID consistency and facial realism. During training, the IP-Adapter is frozen and only IdentityNet is trained. Facial landmarks are used as conditional inputs, while face embeddings are integrated into IdentityNet via cross-attention.
+
+# Showcases
+## Comparison with Other Methods
+### 1. Preserved Text-to-Image Capability
+
+<table>
+    <tr>
+        <th style="width: 28%;">Prompt</th>
+        <th style="width: 24%;">Reference Image</th>
+        <th style="width: 24%;">EcomID</th>
+        <th style="width: 24%;">InstantID</th>
+    </tr>
+    <tr>
+        <td style="font-size: 12px;">girl, white skin, black hair, long wavy hair, <span style="color:red"><strong>in European style living room, Retro tone, decorations</strong></span>, depth of field.</td>
+        <td><img src="images/show_case/50.png" alt="参考图像" width="100%"></td>
+        <td><img src="images/show_case/49.png" alt="EcomID图像" width="100%"></td>
+        <td><img src="images/show_case/48.png" alt="InstantID图像" width="100%"></td>
+    </tr>
+</table>
+
+As shown above, EcomID ***preserves background generation abilities while minimizing stylization, greatly enhancing realism***. 
+The visualizations highlight more authentic portraits with improved background semantic consistency, showcasing EcomID's advantage in generating realistic images.
+
+### 2. Improved Facial Control and Consistency
+<table>
+    <tr>
+        <th style="width: 24%;">Prompt</th>
+        <th style="width: 19%;">Reference Image</th>
+        <th style="width: 19%;">EcomID</th>
+        <th style="width: 19%;">InstantID</th>
+        <th style="width: 19%;">PuLID</th>
+    </tr>
+    <tr>
+        <td style="font-size: 12px;">A close-up portrait of a man standing in the library, holding <span style="color:red"><strong>two smiling toddlers</strong></span> next to him.</td>
+        <td><img src="images/show_case/20.png" alt="参考图像" width="100%"></td>
+        <td><img src="images/show_case/17.png" alt="EcomID图像" width="100%"></td>
+        <td><img src="images/show_case/18.png" alt="InstantID图像" width="100%"></td>
+        <td><img src="images/show_case/19.png" alt="PuLID图像" width="100%"></td>
+    </tr>
+</table>
+
+As shown above, EcomID employs keypoints as conditional inputs for training, ***allowing for precise adjustments of facial positions, sizes, and orientations***. This capability ensures that the generated portraits are more controllable while further enhancing facial similarity and the overall quality of the images.
+
+### More Showcases
+EcomID enhances portrait representation, delivering a more authentic and aesthetically pleasing appearance while ensuring semantic consistency and greater similarity in intrinsic ID features (i.e., traits that do not vary with age, hairstyle, glasses, or other physical changes).
+
+<table>
+    <tr>
+        <th style="width: 24%;">Prompt</th>
+        <th style="width: 19%;">Reference Image</th>
+        <th style="width: 19%;">EcomID</th>
+        <th style="width: 19%;">InstantID</th>
+        <th style="width: 19%;">PuLID</th>
+    </tr>
+    <tr>
+        <td style="font-size: 12px;">A close-up portrait of a <span style="color:red"><strong>little girl with double braids</strong></span>, wearing a white dress, standing on the beach during sunset.</td>
+        <td><img src="images/show_case/21.png" alt="参考图像" width="100%"></td>
+        <td><img src="images/show_case/22.png" alt="EcomID图像" width="100%"></td>
+        <td><img src="images/show_case/23.png" alt="InstantID图像" width="100%"></td>
+        <td><img src="images/show_case/24.png" alt="PuLID图像" width="100%"></td>
+    </tr>
+    <tr>
+        <td style="font-size: 12px;">A close-up portrait of a <span style="color:red"><strong>very little girl</strong></span> with double braids, wearing <span style="color:red"><strong>a hat</strong></span> and white dress, standing on the beach during sunset.</td>
+        <td><img src="images/show_case/44.png" alt="参考图像" width="100%"></td>
+        <td><img src="images/show_case/47.png" alt="EcomID图像" width="100%"></td>
+        <td><img src="images/show_case/46.png" alt="InstantID图像" width="100%"></td>
+        <td><img src="images/show_case/45.png" alt="PuLID图像" width="100%"></td>
+    </tr>
+    <tr>
+        <td style="font-size: 12px;">A grizzled detective, <span style="color:red"><strong>fedora</strong></span> casting a shadow over his square jaw, a <span style="color:red"><strong>cigar dangling from his lips</strong></span>, his trench coat evocative of film noir, in a <span style="color:red"><strong>rainy alley</strong></span>.</td>
+        <td><img src="images/show_case/25.png" alt="参考图像" width="100%"></td>
+        <td><img src="images/show_case/26.png" alt="EcomID图像" width="100%"></td>
+        <td><img src="images/show_case/27.png" alt="InstantID图像" width="100%"></td>
+        <td><img src="images/show_case/28.png" alt="PuLID图像" width="100%"></td>
+    </tr>
+    <tr>
+        <td style="font-size: 12px;">A smiling girl with <span style="color:red"><strong>bangs and long hair</strong></span> in a school uniform stands under cherry trees, holding a book.</td>
+        <td><img src="images/show_case/29.png" alt="参考图像" width="100%"></td>
+        <td><img src="images/show_case/30.png" alt="EcomID图像" width="100%"></td>
+        <td><img src="images/show_case/31.png" alt="InstantID图像" width="100%"></td>
+        <td><img src="images/show_case/32.png" alt="PuLID图像" width="100%"></td>
+    </tr>
+    <tr>
+        <td style="font-size: 12px;">A <span style="color:red"><strong>very old</strong></span> witch, wearing a black cloak, with a pointed hat, holding a magic wand, against a background of a misty forest.</td>
+        <td><img src="images/show_case/33.png" alt="参考图像" width="100%"></td>
+        <td><img src="images/show_case/34.png" alt="EcomID图像" width="100%"></td>
+        <td><img src="images/show_case/35.png" alt="InstantID图像" width="100%"></td>
+        <td><img src="images/show_case/36.png" alt="PuLID图像" width="100%"></td>
+    </tr>
+    <tr>
+        <td style="font-size: 12px;">A man clad in cyberpunk fashion: <span style="color:red"><strong>neon accents, reflective sunglasses,</strong></span> and a leather jacket with glowing circuit patterns. He stands stoically amidst a soaked cityscape.</td>
+        <td><img src="images/show_case/37.png" alt="参考图像" width="100%"></td>
+        <td><img src="images/show_case/38.png" alt="EcomID图像" width="100%"></td>
+        <td><img src="images/show_case/39.png" alt="InstantID图像" width="100%"></td>
+        <td><img src="images/show_case/40.png" alt="PuLID图像" width="100%"></td>
+    </tr>
+
+</table>
+
+### More Base Models, Resolutions, and Styles
+<table>
+    <tr>
+        <th style="width: 12%;">SDXL models</th>
+        <th style="width: 24%;">Prompt</th>
+        <th style="width: 16%;">Reference Image</th>
+        <th style="width: 16%;">EcomID</th>
+        <th style="width: 16%;">InstantID</th>
+        <th style="width: 16%;">PuLID</th>
+    </tr>
+    <tr>
+        <td>sd-xl-base-1.0</td>
+        <td style="font-size: 12px;">girl, solo, brown hair, holding a little teddy bear on her hands, wearing a school uniform, standing in the library, <span style="color:red"><strong>cartoon style</strong></span>.</td>
+        <td><img src="images/show_case/1.png" alt="参考图像" width="100%"></td>
+        <td><img src="images/show_case/2.png" alt="EcomID图像" width="100%"></td>
+        <td><img src="images/show_case/3.png" alt="InstantID图像" width="100%"></td>
+        <td><img src="images/show_case/4.png" alt="PuLID图像" width="100%"></td>
+    </tr>
+    <tr>
+        <td>EcomXL</td>
+        <td style="font-size: 12px;">A close-up portrait of a <span style="color:red"><strong>very little girl</strong></span> with double braids, wearing <span style="color:red"><strong>a hat</strong></span> and white dress, standing on the beach during sunset.</td>
+        <td><img src="images/show_case/44.png" alt="参考图像" width="100%"></td>
+        <td><img src="images/show_case/47.png" alt="EcomID图像" width="100%"></td>
+        <td><img src="images/show_case/46.png" alt="InstantID图像" width="100%"></td>
+        <td><img src="images/show_case/45.png" alt="PuLID图像" width="100%"></td>
+    </tr>
+    <tr>
+        <td>DreamShaperXL</td>
+        <td style="font-size: 12px;">solo, looking_at_viewer, smile, brown_hair, upper_body, open_clothes, teeth, open_jacket, black_jacket, blurry_background, realistic</td>
+        <td><img src="images/show_case/44.png" alt="参考图像" width="100%"></td>
+        <td><img src="images/show_case/6.png" alt="EcomID图像" width="100%"></td>
+        <td><img src="images/show_case/7.png" alt="InstantID图像" width="100%"></td>
+        <td><img src="images/show_case/8.png" alt="PuLID图像" width="100%"></td>
+    </tr>
+    <tr>
+        <td>leosam_xl_v7</td>
+        <td style="font-size: 12px;">A close-up portrait of a girl, solo, dress, jewelry, beach and sea, pink_dress, realistic.</td>
+        <td><img src="images/show_case/9.png" alt="参考图像" width="100%"></td>
+        <td><img src="images/show_case/15.png" alt="EcomID图像" width="100%"></td>
+        <td><img src="images/show_case/14.png" alt="InstantID图像" width="100%"></td>
+        <td><img src="images/show_case/16.png" alt="PuLID图像" width="100%"></td>
+    </tr>
+</table>
+
+### Notes
+- Unless otherwise specified, the showcases are generated with the base model EcomXL. EcomID is also highly compatible with other SDXL-based models, such as [leosams-helloworld-xl](https://civitai.com/models/43977/leosams-helloworld-xl), [dreamshaper-xl](https://civitai.com/models/112902/dreamshaper-xl), and [stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
+- It works very well with SDXL Turbo/Lightning, [EcomXL Inpainting ControlNet](https://huggingface.co/alimama-creative/EcomXL_controlnet_inpaint) and [EcomXL Softedge ControlNet](https://huggingface.co/alimama-creative/EcomXL_controlnet_softedge).
+
+# How to use
+
+## ComfyUI
+
+- The EcomID_ComfyUI node has been released: [click here](https://github.com/alimama-creative/SDXL_EcomID_ComfyUI)
+
+# Training Details
+
+The model is trained on 2M Taobao images in which the proportion occupied by human faces is greater than 3%. Each image has a resolution greater than 800 pixels and an aesthetic score above 5.5.
+
+Mixed precision: fp16
+
+Learning rate: 1e-4
+
+Batch size: 2
+
+Image size: 1024x1024
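
The data-selection criteria and hyperparameters listed under Training Details can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: `ImageRecord`, `TrainingConfig`, `keep_for_training`, and all field names are hypothetical, and reading "resolution greater than 800" as "shorter side above 800 pixels" is an assumption, since the README does not say which side is meant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical per-image metadata; the field names are assumptions."""
    width: int
    height: int
    face_area_ratio: float   # fraction of the image covered by the face, 0.0 to 1.0
    aesthetic_score: float


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters copied from the Training Details list."""
    mixed_precision: str = "fp16"
    learning_rate: float = 1e-4
    batch_size: int = 2
    image_size: tuple = (1024, 1024)


def keep_for_training(rec, min_face_ratio=0.03, min_resolution=800, min_aesthetic=5.5):
    """Return True if the record passes all three stated thresholds.

    Assumption: the resolution threshold is applied to the shorter side.
    All comparisons are strict, matching "greater than" / "above".
    """
    return (
        rec.face_area_ratio > min_face_ratio
        and min(rec.width, rec.height) > min_resolution
        and rec.aesthetic_score > min_aesthetic
    )


records = [
    ImageRecord(1024, 1024, 0.10, 6.1),  # passes every threshold
    ImageRecord(1024, 768, 0.10, 6.1),   # shorter side is not above 800
    ImageRecord(1024, 1024, 0.02, 6.1),  # face covers too little of the image
    ImageRecord(1024, 1024, 0.10, 5.5),  # aesthetic score is not strictly above 5.5
]
kept = [r for r in records if keep_for_training(r)]
config = TrainingConfig()
print(len(kept), config)
```

Keeping the thresholds as keyword arguments makes it easy to try a different reading of the criteria (for example, applying the resolution threshold to the longer side) without touching the filter logic.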