update readme

This commit is contained in:
ai-modelscope
2024-10-15 17:44:00 +08:00
parent 672f1a6beb
commit c0c78aeff3
11 changed files with 2376 additions and 56 deletions

28
.gitattributes vendored

@@ -1,38 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bin.* filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zstandard filter=lfs diff=lfs merge=lfs -text
*.tfevents* filter=lfs diff=lfs merge=lfs -text
*.db* filter=lfs diff=lfs merge=lfs -text
*.ark* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
**/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.gguf* filter=lfs diff=lfs merge=lfs -text
*.ggml filter=lfs diff=lfs merge=lfs -text
*.llamafile* filter=lfs diff=lfs merge=lfs -text
*.pt2 filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
T2I.png filter=lfs diff=lfs merge=lfs -text
inpaint.png filter=lfs diff=lfs merge=lfs -text
diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text

119
README.md

@@ -1,47 +1,82 @@
---
license: Apache License 2.0
#model-type:
##e.g. gpt, phi, llama, chatglm, baichuan, etc.
#- gpt
#domain:
##e.g. nlp, cv, audio, multi-modal
#- nlp
#language:
##language code list: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn
#metrics:
##e.g. CIDEr, BLEU, ROUGE, etc.
#- CIDEr
#tags:
##custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, etc.
#- pretrained
#tools:
##e.g. vllm, fastchat, llamacpp, AdaSeq, etc.
#- vllm
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
language:
- en
base_model: black-forest-labs/FLUX.1-dev
library_name: diffusers
tags:
- Text-to-Image
- FLUX
- Stable Diffusion
pipeline_tag: text-to-image
---
### The contributors of this model have not provided a more detailed introduction. Model files and weights can be browsed on the "Model Files" page.
#### You can download the model with the git clone command below, or via the ModelScope SDK
SDK download
```bash
# Install ModelScope
pip install modelscope
```
```python
# Download the model via the SDK
from modelscope import snapshot_download
model_dir = snapshot_download('alimama-creative/FLUX.1-Turbo-Alpha')
```
Git download
```bash
# Download the model via git; git-lfs is required for the LFS-tracked weight files
git lfs install
git clone https://www.modelscope.cn/alimama-creative/FLUX.1-Turbo-Alpha.git
```
<div style="display: flex; justify-content: center; align-items: center;">
<img src="./images/images_alibaba.png" alt="alibaba" style="width: 20%; height: auto; margin-right: 5%;">
<img src="./images/images_alimama.png" alt="alimama" style="width: 20%; height: auto;">
</div>
[Chinese README](./README_ZH.md)
This repository provides an 8-step distilled LoRA for the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) model, released by the AlimamaCreative Team.
# Description
This checkpoint is an 8-step distilled LoRA trained on top of the FLUX.1-dev model. We use a multi-head discriminator to improve distillation quality. The model can be used for text-to-image generation, the inpainting ControlNet, and other FLUX-related models. The recommended settings are guidance_scale=3.5 and lora_scale=1. A lower-step version will be released later.
- Text-to-Image.
![](./images/T2I.png)
- With [alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta](https://huggingface.co/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta): our distilled LoRA adapts well to the inpainting ControlNet, and the accelerated results closely follow the original outputs.
![](./images/inpaint.png)
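The multi-head discriminator described above can be pictured as one small prediction head tapped off each frozen transformer layer, with the per-layer logits pooled into a single realism score. The sketch below is a toy numpy illustration only; all dimensions, the linear-head design, and the mean pooling are assumptions, not the released training code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the model card.
num_layers = 4   # transformer layers tapped by the discriminator
hidden = 64      # per-token feature width
tokens = 16      # sequence length

class DiscriminatorHead:
    """A tiny linear head producing one realism logit per token."""
    def __init__(self, dim):
        self.w = rng.normal(scale=dim ** -0.5, size=(dim, 1))

    def __call__(self, feats):
        return feats @ self.w  # (tokens, 1)

# One head per (frozen) transformer layer.
heads = [DiscriminatorHead(hidden) for _ in range(num_layers)]

# Stand-in for the per-layer activations of the frozen FLUX backbone.
layer_feats = [rng.normal(size=(tokens, hidden)) for _ in range(num_layers)]

# The discriminator score pools all per-layer, per-token logits.
logits = np.concatenate([h(f) for h, f in zip(heads, layer_feats)])
score = logits.mean()
print(logits.shape, float(score))
```

In this setup only the small heads (and the student LoRA) would receive gradients; the backbone stays fixed, which is what makes the discriminator cheap relative to the full model.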
# How to use
## diffusers
This model can be used directly with diffusers:
```python
import torch
from diffusers.pipelines import FluxPipeline
model_id = "black-forest-labs/FLUX.1-dev"
adapter_id = "alimama-creative/FLUX.1-Turbo-Alpha"
pipe = FluxPipeline.from_pretrained(
model_id,
torch_dtype=torch.bfloat16
)
pipe.to("cuda")
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()
prompt = "A DSLR photo of a shiny VW van that has a cityscape painted on it. A smiling sloth stands on grass in front of the van and is wearing a leather jacket, a cowboy hat, a kilt and a bowtie. The sloth is holding a quarterstaff and a big book."
image = pipe(
prompt=prompt,
guidance_scale=3.5,
height=1024,
width=1024,
num_inference_steps=8,
max_sequence_length=512).images[0]
```
<p style="color: lightgrey;">If you are a contributor to this model, we invite you to complete the model card according to the <a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">model contribution documentation</a>.</p>
## comfyui
- T2I turbo workflow: [click here](./workflows/t2I_flux_turbo.json)
- Inpainting controlnet turbo workflow: [click here](./workflows/alimama_flux_inpainting_turbo_8step.json)
# Training Details
The model is trained on 1M images from open-source and internal sources, with aesthetic scores of 6.3+ and resolutions greater than 800. We use adversarial training to improve quality. Our method freezes the original FLUX.1-dev transformer as the discriminator backbone and adds multiple heads to every transformer layer. We fix the guidance scale at 3.5 during training and use a time shift of 3.
Mixed precision: bf16
Learning rate: 2e-5
Batch size: 64
Image size: 1024x1024
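The "time shift of 3" mentioned above warps the sampling/training sigmas toward the high-noise end. The exact schedule is not stated in this card; the formula below is the common flow-matching shift used by FLUX-style schedulers (e.g. diffusers' FlowMatchEulerDiscreteScheduler) and should be read as an assumption:

```python
def shift_sigma(sigma: float, shift: float = 3.0) -> float:
    """Assumed flow-matching time shift: sigma' = s*sigma / (1 + (s-1)*sigma)."""
    return shift * sigma / (1 + (shift - 1) * sigma)

# 8 uniformly spaced sigmas from 1.0 down to 1/8, then shifted with shift=3.
sigmas = [i / 8 for i in range(8, 0, -1)]
shifted = [round(shift_sigma(s), 4) for s in sigmas]
print(shifted)
```

Note how the shift concentrates the schedule near sigma = 1 (high noise), which matters when only 8 denoising steps are taken.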

81
README_ZH.md Normal file

@@ -0,0 +1,81 @@
---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
language:
- en
base_model: black-forest-labs/FLUX.1-dev
library_name: diffusers
tags:
- Text-to-Image
- FLUX
- Stable Diffusion
pipeline_tag: text-to-image
---
<div style="display: flex; justify-content: center; align-items: center;">
<img src="./images/images_alibaba.png" alt="alibaba" style="width: 20%; height: auto; margin-right: 5%;">
<img src="./images/images_alimama.png" alt="alimama" style="width: 20%; height: auto;">
</div>
This repository provides an 8-step distilled LoRA for the [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) model, developed by the AlimamaCreative Team.
# Introduction
This model is an 8-step distilled LoRA based on the FLUX.1-dev model. We use a specially designed discriminator to improve distillation quality. The model can be used for text-to-image generation, the inpainting ControlNet, and other FLUX-related models. The recommended settings are guidance_scale=3.5 and lora_scale=1. A lower-step version will be released later.
- Text-to-Image.
![](./images/T2I.png)
- With [alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta](https://huggingface.co/alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Beta): our model adapts well to the inpainting ControlNet and stays close to the original outputs.
![](./images/inpaint.png)
# How to Use
## diffusers
The model can be used directly with diffusers:
```python
import torch
from diffusers.pipelines import FluxPipeline
model_id = "black-forest-labs/FLUX.1-dev"
adapter_id = "alimama-creative/FLUX.1-Turbo-Alpha"
pipe = FluxPipeline.from_pretrained(
model_id,
torch_dtype=torch.bfloat16
)
pipe.to("cuda")
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()
prompt = "A DSLR photo of a shiny VW van that has a cityscape painted on it. A smiling sloth stands on grass in front of the van and is wearing a leather jacket, a cowboy hat, a kilt and a bowtie. The sloth is holding a quarterstaff and a big book."
image = pipe(
prompt=prompt,
guidance_scale=3.5,
height=1024,
width=1024,
num_inference_steps=8,
max_sequence_length=512).images[0]
```
## comfyui
- Text-to-image accelerated workflow: [click here](./workflows/t2I_flux_turbo.json)
- Inpainting ControlNet accelerated workflow: [click here](./workflows/alimama_flux_inpainting_turbo_8step.json)
# Training Details
The model is trained on 1M images from open-source and internal sources, with aesthetic scores of 6.3+ and resolutions greater than 800. We use adversarial training to improve quality. Our method freezes the original FLUX.1-dev transformer as the discriminator's feature extractor and adds discriminator head networks to every transformer layer. During training we fix the guidance scale at 3.5 and use a time shift of 3.
Mixed precision: bf16
Learning rate: 2e-5
Batch size: 64
Training resolution: 1024x1024

1
configuration.json Normal file

@@ -0,0 +1 @@
{"framework": "pytorch", "task": "text-to-image", "allow_remote": true}

BIN
diffusion_pytorch_model.safetensors (Stored with Git LFS) Normal file


BIN
images/T2I.png (Stored with Git LFS) Normal file


BIN
images/images_alibaba.png Normal file


BIN
images/images_alimama.png Normal file


BIN
images/inpaint.png (Stored with Git LFS) Normal file


File diff suppressed because it is too large

@@ -0,0 +1,528 @@
{
"last_node_id": 106,
"last_link_id": 196,
"nodes": [
{
"id": 4,
"type": "DualCLIPLoader",
"pos": {
"0": -182.46112060546875,
"1": 35.274688720703125
},
"size": {
"0": 315,
"1": 106
},
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "CLIP",
"type": "CLIP",
"links": [
2,
27
],
"slot_index": 0,
"shape": 3,
"label": "CLIP"
}
],
"properties": {
"Node name for S&R": "DualCLIPLoader"
},
"widgets_values": [
"clip_l.safetensors",
"t5xxl_fp16.safetensors",
"flux"
]
},
{
"id": 7,
"type": "VAEDecode",
"pos": {
"0": 1028,
"1": -107
},
"size": {
"0": 210,
"1": 46
},
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 6,
"slot_index": 0,
"label": "samples"
},
{
"name": "vae",
"type": "VAE",
"link": 7,
"label": "vae"
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
79
],
"slot_index": 0,
"shape": 3,
"label": "IMAGE"
}
],
"properties": {
"Node name for S&R": "VAEDecode"
}
},
{
"id": 6,
"type": "EmptyLatentImage",
"pos": {
"0": 665,
"1": -145
},
"size": {
"0": 315,
"1": 106
},
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
177
],
"slot_index": 0,
"shape": 3,
"label": "LATENT"
}
],
"properties": {
"Node name for S&R": "EmptyLatentImage"
},
"widgets_values": [
832,
1248,
1
]
},
{
"id": 19,
"type": "CLIPTextEncodeFlux",
"pos": {
"0": 206,
"1": 116
},
"size": {
"0": 400,
"1": 200
},
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 27,
"slot_index": 0,
"label": "clip"
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
26
],
"slot_index": 0,
"shape": 3,
"label": "CONDITIONING"
}
],
"properties": {
"Node name for S&R": "CLIPTextEncodeFlux"
},
"widgets_values": [
"",
"(bad hand,bad finger),logo,Backlight,nsfw,(worst quality,low resolution,bad hands),distorted,twisted,watermark,",
3.5
]
},
{
"id": 5,
"type": "CLIPTextEncodeFlux",
"pos": {
"0": 202,
"1": -146
},
"size": {
"0": 400,
"1": 200
},
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 2,
"slot_index": 0,
"label": "clip"
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
18
],
"slot_index": 0,
"shape": 3,
"label": "CONDITIONING"
}
],
"properties": {
"Node name for S&R": "CLIPTextEncodeFlux"
},
"widgets_values": [
"((Asian Face)), baby girl, age 2, , , , , , , , , , (falling white curtain as background, minimalist, white tone, very soft, bright), photography, masterpiece, best quality, 8K, HDR, highres, front to camera",
"((Asian Face)), baby girl, age 2, , , , , , , , , , (falling white curtain as background, minimalist, white tone, very soft, bright), photography, masterpiece, best quality, 8K, HDR, highres, front to camera",
3.5
]
},
{
"id": 55,
"type": "UNETLoader",
"pos": {
"0": -177,
"1": 204
},
"size": {
"0": 308.9964904785156,
"1": 83.4256591796875
},
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
195
],
"slot_index": 0,
"shape": 3,
"label": "MODEL"
}
],
"properties": {
"Node name for S&R": "UNETLoader"
},
"widgets_values": [
"flux1-dev-fp8.safetensors",
"fp8_e4m3fn"
]
},
{
"id": 106,
"type": "LoraLoaderModelOnly",
"pos": {
"0": -184,
"1": 375
},
"size": {
"0": 315,
"1": 82
},
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 195
}
],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
196
],
"shape": 3,
"slot_index": 0
}
],
"properties": {
"Node name for S&R": "LoraLoaderModelOnly"
},
"widgets_values": [
"flux_turbo_v1_1.safetensors",
1
]
},
{
"id": 8,
"type": "VAELoader",
"pos": {
"0": -179.46112060546875,
"1": -70.72531127929688
},
"size": {
"0": 315,
"1": 58
},
"flags": {},
"order": 3,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "VAE",
"type": "VAE",
"links": [
7
],
"slot_index": 0,
"shape": 3,
"label": "VAE"
}
],
"properties": {
"Node name for S&R": "VAELoader"
},
"widgets_values": [
"ae.safetensors"
]
},
{
"id": 3,
"type": "XlabsSampler",
"pos": {
"0": 654,
"1": 12
},
"size": {
"0": 342.5999755859375,
"1": 282
},
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 196,
"slot_index": 0,
"label": "model"
},
{
"name": "conditioning",
"type": "CONDITIONING",
"link": 18,
"label": "conditioning"
},
{
"name": "neg_conditioning",
"type": "CONDITIONING",
"link": 26,
"label": "neg_conditioning"
},
{
"name": "latent_image",
"type": "LATENT",
"link": 177,
"label": "latent_image"
},
{
"name": "controlnet_condition",
"type": "ControlNetCondition",
"link": null,
"label": "controlnet_condition"
}
],
"outputs": [
{
"name": "latent",
"type": "LATENT",
"links": [
6
],
"shape": 3,
"label": "latent"
}
],
"properties": {
"Node name for S&R": "XlabsSampler"
},
"widgets_values": [
24,
"fixed",
8,
1,
2,
0,
1
]
},
{
"id": 21,
"type": "PreviewImage",
"pos": {
"0": 1026,
"1": 19
},
"size": {
"0": 210,
"1": 318
},
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 79,
"slot_index": 0,
"label": "images"
}
],
"outputs": [],
"title": "t2i output",
"properties": {
"Node name for S&R": "PreviewImage"
}
}
],
"links": [
[
2,
4,
0,
5,
0,
"CLIP"
],
[
6,
3,
0,
7,
0,
"LATENT"
],
[
7,
8,
0,
7,
1,
"VAE"
],
[
18,
5,
0,
3,
1,
"CONDITIONING"
],
[
26,
19,
0,
3,
2,
"CONDITIONING"
],
[
27,
4,
0,
19,
0,
"CLIP"
],
[
79,
7,
0,
21,
0,
"IMAGE"
],
[
177,
6,
0,
3,
3,
"LATENT"
],
[
195,
55,
0,
106,
0,
"MODEL"
],
[
196,
106,
0,
3,
0,
"MODEL"
]
],
"groups": [
{
"title": "Load Model",
"bounding": [
-210,
-187,
371,
700
],
"color": "#3f789e",
"font_size": 24,
"flags": {}
}
],
"config": {},
"extra": {
"ds": {
"scale": 1.1918176537727374,
"offset": [
438.12831553640723,
376.2590792694179
]
}
},
"version": 0.4
}