Serving the ChatGLM2 Large Model Online with Flask
1. Pull the image
docker pull swr.cn-central-221.ovaijisuan.com/mindformers/mindformers_dev_mindspore_2_0:mindformers_0.6.0dev_20230616_py39_37
2. Create docker.sh
-p 8080:8080 maps port 8080 on the host to port 8080 inside the container.
Do not add --ipc=host --net=host here: host networking bypasses port mapping and conflicts with -p.
# --device specifies which NPU devices the container may use
# -v mounts a host path into the container
# --name sets a custom container name
docker run -it -u root -p 8080:8080 \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    -v /etc/localtime:/etc/localtime \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /var/log/npu/:/usr/slog \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /home/:/home/ \
    --name 8080-test \
    swr.cn-central-221.ovaijisuan.com/mindformers/mindformers_dev_mindspore_2_0:mindformers_0.6.0dev_20230616_py39_37 \
    /bin/bash
After running the script, use docker ps to confirm the container is up; container port 8080 is mapped to the host so it can be reached from outside.
Next, write the API code and serve it with Flask on port 8080.
import json

from flask import Flask, request
from mindformers import AutoConfig, AutoModel, AutoTokenizer
import mindspore as ms

app = Flask(__name__)

# Run on the first Ascend NPU in graph mode.
ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend", device_id=0)

# Load the ChatGLM2-6B config, checkpoint and tokenizer once, at startup.
config = AutoConfig.from_pretrained("glm2_6b")
config.checkpoint_name_or_path = "../mindformers/checkpoint_download/glm2/glm2_6b.ckpt"
model = AutoModel.from_config(config)
tokenizer = AutoTokenizer.from_pretrained("glm2_6b")

@app.route('/glm2_bot', methods=['POST'])
def say_hello_func():
    print("----------- in hello func ----------")
    data = json.loads(request.get_data(as_text=True))
    text = data['text']
    # Wrap the user text in the ChatGLM2 prompt template and tokenize it.
    inputs = tokenizer(tokenizer.build_prompt(text))["input_ids"]
    print(tokenizer.decode(inputs))
    # Generate the reply token ids and decode them back to text.
    outputs = model.generate(inputs, max_length=14096)
    outputs_text = tokenizer.decode(outputs)
    return json.dumps({"response": outputs_text}, ensure_ascii=False, indent=4)

@app.route('/goodbye', methods=['GET'])
def say_goodbye_func():
    print("----------- in goodbye func ----------")
    return '\nGoodbye!\n'

@app.route('/', methods=['POST'])
def default_func():
    print("----------- in default func ----------")
    data = json.loads(request.get_data(as_text=True))
    return '\n called default func !\n {} \n'.format(str(data))

# host must be "0.0.0.0" so the service is reachable through the mapped port; port must match -p (8080)
if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8080)
Local curl test:
curl -i -k -H 'Accept:application/json' -H 'Content-Type:application/json;charset=utf8' -X POST -d '{"text":"你好"}' http://172.17.0.2:8080/glm2_bot
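Besides curl, the endpoint can be called from Python. The following is a minimal client sketch using the requests library; the address 172.17.0.2:8080, the /glm2_bot route and the ask_glm2 helper name simply mirror the example above and should be adjusted to your own deployment.

# Minimal client sketch for the /glm2_bot endpoint (adjust host/port to your deployment).
import json
import requests

def ask_glm2(text, url="http://172.17.0.2:8080/glm2_bot"):
    # The server expects a JSON body with a "text" field and returns {"response": ...}.
    resp = requests.post(
        url,
        data=json.dumps({"text": text}),
        headers={"Content-Type": "application/json;charset=utf8"},
        timeout=300,  # generation can take a while, especially on the first request
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask_glm2("你好"))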
Tips:
Exposing the common default port 8080 on a public server invites port scanning; change the port or restrict access with an IP whitelist.
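One way to add such a whitelist is a before_request hook on the Flask app defined above. The sketch below is illustrative only: ALLOWED_IPS is a hypothetical allow-list you maintain yourself, and request.remote_addr only reflects the real client address when no reverse proxy sits in front of the service.

# Illustrative IP whitelist via a before_request hook; attach to the existing `app`.
from flask import request, abort

ALLOWED_IPS = {"127.0.0.1", "172.17.0.1"}  # hypothetical example; list the clients you trust

@app.before_request
def restrict_to_whitelist():
    # request.remote_addr is the direct peer address; behind a reverse proxy you
    # would instead inspect X-Forwarded-For set by a proxy you trust.
    if request.remote_addr not in ALLOWED_IPS:
        abort(403)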