[Document] Update Mac deployment instructions #899
Conversation
Update PROJECT.md
[Document] Update Mac deployment - FILE: Readme.md - ADD: OPENMP; MPS
[Document] Update Mac deployment - FILE: README.md; README_en.md - ADD: OPENMP; MPS

Details: taking the [chatglm-6b-int4](https://huggingface.co/THUDM/chatglm-6b-int4) quantized model as an example, the following configuration is made:
- steps for installing libomp;
- gcc compile flags for the quantized model;
- an explanation of enabling MPS for the quantized model.
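The MPS point in the list above can be sketched as follows (a minimal device-selection sketch, not the repo's own code; `torch.backends.mps.is_available()` is standard PyTorch):

```python
import torch

# Enabling MPS: the quantized chatglm-6b-int4 CUDA kernels do not run on
# Apple Silicon, so the model is loaded in fp32 on the MPS device when the
# PyTorch build exposes Metal support, and falls back to CPU otherwise.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Any tensor created with this device string runs on the chosen backend.
x = torch.randn(2, 3, device=device)
print(device, x.device.type)
```

In the actual deployment this corresponds to `model.float().to(device)` rather than loading in half precision.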
[Document] Update Mac deployment
[Document] Update Mac deployment - FILE: README.md/README_en.md - ADD: OPENMP; MPS

Details: taking the [chatglm-6b-int4](https://huggingface.co/THUDM/chatglm-6b-int4) quantized model as an example, the following configuration is made:
- steps for installing libomp;
- gcc compile flags for the quantized model;
- an explanation of enabling MPS for the quantized model;
- shortened text length.
My system is also macOS 13.3.1, and MPS computation in half precision works fine for me. What error do you get when computing in half precision?
```python
# eg: web_demo.py
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float().to('mps')
model = model.eval()
```

You also replied in the non-quantized issue-462 (6B itself is fine; only chatglm-6b-int4 has the problem). The cause is that the quantization_code file (a bz2-compressed ELF/.so file) is NVIDIA-specific, so MPS currently has no effect there. Enabling it would probably require substantial changes to the quantization code. Error log:
```
--- Logging error ---
Traceback (most recent call last):
  File "/Users/yifanyang/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 19, in <module>
    from cpm_kernels.kernels.base import LazyKernelCModule, KernelFunction, round_up
  File "/Users/yifanyang/miniconda3/lib/python3.10/site-packages/cpm_kernels/__init__.py", line 1, in <module>
    from . import library
  File "/Users/yifanyang/miniconda3/lib/python3.10/site-packages/cpm_kernels/library/__init__.py", line 1, in <module>
    from . import nvrtc
  File "/Users/yifanyang/miniconda3/lib/python3.10/site-packages/cpm_kernels/library/nvrtc.py", line 5, in <module>
    nvrtc = Lib("nvrtc")
  File "/Users/yifanyang/miniconda3/lib/python3.10/site-packages/cpm_kernels/library/base.py", line 59, in __init__
    raise RuntimeError("Unknown platform: %s" % sys.platform)
RuntimeError: Unknown platform: darwin

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

To create a public link, set
```
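The failure mode in the traceback can be reproduced in isolation: `cpm_kernels` probes `sys.platform` while loading its CUDA libraries and raises at import time on macOS. A minimal sketch of a guarded import (the `HAS_CPM_KERNELS` flag name is my own, not from the repo):

```python
# cpm_kernels loads nvrtc/cudart via ctypes and only recognizes Linux and
# Windows platform names; on macOS (sys.platform == "darwin") its loader
# raises RuntimeError("Unknown platform: darwin") during import.
try:
    from cpm_kernels.kernels.base import LazyKernelCModule  # noqa: F401
    HAS_CPM_KERNELS = True
except (ImportError, RuntimeError):
    HAS_CPM_KERNELS = False

print(HAS_CPM_KERNELS)
```

A guard like this only hides the error; the quantized kernels still cannot run, which is why fp32 on MPS or CPU is the workaround discussed in this thread.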
You need to, starting from …
Used the latest (2023/05) …
in torch==2.0.0 (the Anaconda mirrors widely used in mainland China have not synced it …
---------------- error logs ----------------

```
loc("varianceEps"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/97f6331a-ba75-11ed-a4bc-863efbbaf80d/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":228:0)): error: input types 'tensor<1x5x1xf16>' and 'tensor<1xf32>' are not broadcast compatible
```
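The error above says an fp16 tensor and an fp32 tensor met in a fused MPSGraph op. A minimal sketch of the mismatch and the usual workaround, casting both operands to one dtype (shapes taken from the error message; this sketch runs on CPU as well):

```python
import torch

# Shapes/dtypes from the error: tensor<1x5x1xf16> vs tensor<1xf32>.
a = torch.ones(1, 5, 1, dtype=torch.float16)
b = torch.ones(1, dtype=torch.float32)

# Casting to a single dtype sidesteps the MPS broadcast incompatibility;
# running the whole model in float32 (model.float()) is the same idea
# applied globally.
c = a.to(torch.float32) * b
print(c.dtype, tuple(c.shape))
```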
It runs, but very slowly. How can this be fixed? (Mac M1)
Only the CPU is used, as the title says; earlier in this issue it was explained why MPS invocation is broken for the quantized model. If memory is short, check memory (GPU memory) usage. My M1 machines have 64GB (MacBook Pro M1 Max) and 128GB (Mac Studio); observed memory usage is fairly high, but as long as the (context) token count is not that large it is not a big problem. Monitor memory usage while running, for example:

```shell
while :; do clear; top -l 1 | grep "python" | awk '{print "MEM="$9 "\tRPRVT="$10}'; sleep 2; done
```

Replace the … in it. As for the details, more logs would be needed. Running inference on a Mac is only one feasible option. In summary, the possible causes:
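A variant of the monitoring loop above that avoids parsing `top` output (the PID lookup by script name is an assumption; `ps -o rss=` works on both macOS and Linux):

```shell
#!/bin/sh
# Poll the resident set size (RSS, in KB) and virtual size of a process
# every 2 seconds. Assumption: the demo was started as `python web_demo.py`;
# adjust the pgrep pattern to match your own process.
PID=$(pgrep -f web_demo.py | head -1)
while :; do
  ps -o rss=,vsz= -p "$PID"
  sleep 2
done
```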
Update Mac deployment instructions

Specific updates

Taking the chatglm-6b-int4 quantized model as an example, the following configuration is made:
Enabling OMP on a Mac involves modifying quantization.py in https://huggingface.co/THUDM/chatglm-6b-int4. Since this requires manually installing some dependencies, the change is not committed separately but is described directly in the instructions. Verified environments:
Mac M1 Ultra 128GB
Mac OS: 13.3.1
GCC: Apple clang version 14.0.3 (clang-1403.0.22.14.1)
conda 23.3.1
torch (two versions, with MPS)
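For reference, the libomp setup described by this PR can be sketched roughly as follows (hedged: the exact include/lib paths depend on the Homebrew prefix, typically /opt/homebrew on Apple Silicon; verify against the README changes themselves):

```shell
# Install the OpenMP runtime for Apple clang (assumes Homebrew is installed).
brew install libomp

# Apple clang does not accept a bare -fopenmp; the kernel compile for the
# quantized model needs flags along these lines instead:
#   -Xclang -fopenmp -I$(brew --prefix libomp)/include \
#   -L$(brew --prefix libomp)/lib -lomp
brew --prefix libomp
```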