```
RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes, or try to use _set_static_graph() as a workaround if this module graph does not change during the training loop. 2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters being used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases by default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations. Parameter at index 447 with name base_model.model.model.layers.31.mlp.down_proj.lora_B.default.weight has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration.
```
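As a side note, the workaround named in the error is a one-line call on the DDP wrapper. A minimal sketch, assuming `torch.distributed.init_process_group()` has already run and `local_rank` is this process's GPU index; `_set_static_graph()` is a private API, and PyTorch 1.11+ exposes the same behavior via the public `static_graph=True` constructor argument:

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# `model` and `local_rank` are placeholders for the real module and GPU index;
# torch.distributed.init_process_group() is assumed to have run already.
model = nn.Linear(16, 16).cuda(local_rank)
ddp_model = DDP(model, device_ids=[local_rank])
ddp_model._set_static_graph()  # declare the autograd graph identical across iterations
# On PyTorch >= 1.11 the public spelling is:
# ddp_model = DDP(model, device_ids=[local_rank], static_graph=True)
```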
```python
# Inside the loop over model.named_parameters(); requires
# from deepspeed.utils import safe_get_full_fp32_param
fp32_param = safe_get_full_fp32_param(param)
if fp32_param is None:
    print("Skip a param with None fp32_param:", param)
    continue
lora_param = fp32_param.clone()
```
With the problematic LoRA parameters skipped, training continued smoothly.
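For context, here is a minimal sketch of how that skip sits inside a weight-gathering loop; `gather_lora_fp32_weights` is a hypothetical name, and the loop assumes DeepSpeed ZeRO is managing the parameters:

```python
from deepspeed.utils import safe_get_full_fp32_param

def gather_lora_fp32_weights(model):
    """Hypothetical helper: collect full-precision LoRA weights under ZeRO,
    skipping any parameter whose fp32 copy cannot be gathered."""
    lora_state = {}
    for name, param in model.named_parameters():
        if "lora_" not in name:
            continue
        fp32_param = safe_get_full_fp32_param(param)
        if fp32_param is None:  # the problematic parameters from the error above
            print("Skip a param with None fp32_param:", name)
            continue
        lora_state[name] = fp32_param.clone().cpu()
    return lora_state
```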
Training complete.
```
{'train_runtime': 4772.0564, 'train_samples_per_second': 6.287, 'train_steps_per_second': 0.786, 'train_loss': 0.24712346842542562, 'epoch': 3.0}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 3750/3750 [1:19:31<00:00, 1.27s/it]
Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41. Non-default generation parameters: {'max_length': 4096}
```
Evaluating the model was also rocky. First, a version conflict between transformers and the tokenizer made loading the LLaVA model with `AutoModelForCausalLM` fail, so I switched to LLaVA's own `LlavaLlamaForCausalLM`. That in turn broke the `if 'llava' in model_name.lower():` check, which I changed to `if 'llava' in model_name.lower() or 'llava' not in model_name.lower():`, i.e. the branch now always runs.
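A minimal sketch of the swap, assuming the official LLaVA repo is installed; `model_path` is a placeholder:

```python
import torch
from transformers import AutoTokenizer
from llava.model import LlavaLlamaForCausalLM

model_path = "/path/to/llava-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
# Load with LLaVA's own class instead of AutoModelForCausalLM
model = LlavaLlamaForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
).cuda()
```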
Then I hacked the loading code to force-load the image_processor.
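Roughly what that patch looks like, as a sketch: take the image_processor from the model's vision tower if it is there, and otherwise load a `CLIPImageProcessor` directly. The CLIP checkpoint name below is an assumption and should match the `mm_vision_tower` entry in the model's config:

```python
from transformers import CLIPImageProcessor

vision_tower = model.get_vision_tower()
if not vision_tower.is_loaded:
    vision_tower.load_model()
image_processor = getattr(vision_tower, "image_processor", None)
if image_processor is None:
    # Assumed checkpoint: the CLIP tower used by LLaVA-1.5
    image_processor = CLIPImageProcessor.from_pretrained(
        "openai/clip-vit-large-patch14-336"
    )
```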