diff --git a/ICEdit-MoE-LoRA.safetensors b/ICEdit-MoE-LoRA.safetensors
new file mode 100644
index 0000000..dda2217
--- /dev/null
+++ b/ICEdit-MoE-LoRA.safetensors
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fa03f92c4f1ffb5c3107236c314ef1c2872e6485544130bf144331cb233aba58
+size 134
diff --git a/README.md b/README.md
index cd8363b..921029d 100644
--- a/README.md
+++ b/README.md
@@ -1,47 +1,179 @@
 ---
-license: Apache License 2.0
-
-#model-type:
-##e.g. gpt, phi, llama, chatglm, baichuan, etc.
-#- gpt
-
-#domain:
-##e.g. nlp, cv, audio, multi-modal
-#- nlp
-
-#language:
-##list of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
-#- cn
-
-#metrics:
-##e.g. CIDEr, BLEU, ROUGE, etc.
-#- CIDEr
-
-#tags:
-##custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
-#- pretrained
-
-#tools:
-##e.g. vllm, fastchat, llamacpp, AdaSeq, etc.
-#- vllm
+license: apache-2.0
+datasets:
+- osunlp/MagicBrush
+- TIGER-Lab/OmniEdit-Filtered-1.2M
+language:
+- en
+base_model:
+- black-forest-labs/FLUX.1-Fill-dev
+pipeline_tag: image-to-image
+library_name: diffusers
+tags:
+- art
 ---
-### The contributors of this model have not yet provided a more detailed model introduction. The model files and weights are available on the "Model Files" page.
-#### You can download the model via the following git clone command or with the ModelScope SDK
+
+ We present In-Context Edit, a novel approach that achieves state-of-the-art instruction-based editing using just 0.5% of the training data and 1% of the parameters required by prior SOTA methods. The first row illustrates a series of multi-turn edits, executed with high precision, while the second and third rows highlight diverse, visually impressive single-turn editing results from our method.
-If you are a contributor to this model, we invite you to complete the model card promptly according to the model contribution documentation.
\ No newline at end of file
+## Download pretrained weights
+
+If you can connect to Hugging Face, you do not need to download the weights manually. Otherwise, download them locally:
+
+- [Flux.1-fill-dev](https://huggingface.co/black-forest-labs/flux.1-fill-dev)
+- [ICEdit-MoE-LoRA](https://huggingface.co/sanaka87/ICEdit-MoE-LoRA)
+
+## Inference in bash (w/o VLM Inference-time Scaling)
+
+Now you can give it a try!
+
+> Our model can **only edit images with a width of 512 pixels** (there is no restriction on the height). If you pass in an image with a different width, the model will automatically resize it to 512 pixels.
+
+> If the model fails to generate the expected results, try changing the `--seed` parameter. Inference-time scaling with a VLM can substantially improve the results.
+
+```bash
+python scripts/inference.py --image assets/girl.png \
+  --instruction "Make her hair dark green and her clothes checked." \
+  --seed 42
+```
+
+Editing a 512×768 image requires 35 GB of GPU memory. If you need to run on a system with 24 GB of GPU memory (for example, an NVIDIA RTX 3090), you can add the `--enable-model-cpu-offload` flag:
+
+```bash
+python scripts/inference.py --image assets/girl.png \
+  --instruction "Make her hair dark green and her clothes checked." \
+  --enable-model-cpu-offload
+```
+
+If you have downloaded the pretrained weights locally, pass their paths at inference time:
+
+```bash
+python scripts/inference.py --image assets/girl.png \
+  --instruction "Make her hair dark green and her clothes checked." \
+  --flux-path /path/to/flux.1-fill-dev \
+  --lora-path /path/to/ICEdit-MoE-LoRA
+```
+
+## Inference in Gradio Demo
+
+We provide a Gradio demo so you can edit images in a more user-friendly way. Run the following command to start the demo.
+
+```bash
+python scripts/gradio_demo.py --port 7860
+```
+
+As with the inference script, add the `--enable-model-cpu-offload` flag to run the demo on a system with 24 GB of GPU memory, and pass the weight paths if you have downloaded them locally. All three of the extra flags below are optional:
+
+```bash
+python scripts/gradio_demo.py --port 7860 \
+  --flux-path /path/to/flux.1-fill-dev \
+  --lora-path /path/to/ICEdit-MoE-LoRA \
+  --enable-model-cpu-offload
+```
+
+Then open the link in your browser to edit images.
+
+### 🎨 Enjoy your editing!
+
+
+
+# Comparison with Commercial Models
+
+
+ Compared with commercial models such as Gemini and GPT-4o, our method is comparable to, and in some cases superior to, them in character ID preservation and instruction following. It is also open-source, with lower cost, faster speed (about 9 seconds per edited image), and strong performance.
+
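As noted in the usage section above, the model only edits images that are 512 pixels wide and automatically resizes other inputs. A minimal sketch of that resize arithmetic, assuming a scale that preserves the aspect ratio and rounds the height to the nearest pixel (`resize_to_width_512` is a hypothetical helper, not the repository's code):

```python
def resize_to_width_512(width: int, height: int) -> tuple:
    """Scale (width, height) so the width becomes exactly 512 px while
    preserving the aspect ratio (hypothetical helper; the actual
    preprocessing lives in scripts/inference.py)."""
    scale = 512 / width
    return 512, round(height * scale)

# e.g. a 1024x1536 input is scaled down to 512x768 before editing.
```

This also shows why the height is unrestricted: it is simply scaled by the same factor as the width.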
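The `ICEdit-MoE-LoRA.safetensors` entry at the top of this diff is a Git LFS pointer file, not the weights themselves, which is why a plain clone without `git lfs` yields only a few bytes. An illustrative parser for such a pointer, using the oid and size from the diff above (`parse_lfs_pointer` is not part of this repository):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fa03f92c4f1ffb5c3107236c314ef1c2872e6485544130bf144331cb233aba58
size 134
"""

info = parse_lfs_pointer(pointer)
# "oid" identifies the referenced object; "size" is that object's byte size.
```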