Deploying the ChatGLM2 Large Model Online with Flask

1. Pull the image

docker pull swr.cn-central-221.ovaijisuan.com/mindformers/mindformers_dev_mindspore_2_0:mindformers_0.6.0dev_20230616_py39_37

2. Create docker.sh

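-p 8080:8080 maps port 8080 on the host to port 8080 in the container.

Adding --ipc=host --net=host conflicts with -p: in host network mode Docker discards published ports.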

# --device selects which NPU cards (by device number) the container can use
# -v mounts a directory from outside the container
# --name sets a custom container name
docker run -it -u root -p 8080:8080 \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    -v /etc/localtime:/etc/localtime \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /var/log/npu/:/usr/slog \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    --name 8080-test -v /home/:/home/ \
    swr.cn-central-221.ovaijisuan.com/mindformers/mindformers_dev_mindspore_2_0:mindformers_0.6.0dev_20230616_py39_37 \
    /bin/bash

Run the script, then use docker ps to confirm that the container's port 8080 is mapped so it can be reached from outside.
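Besides docker ps, the mapping can be checked by attempting a TCP connection to the published port. The following is a minimal sketch using only the Python standard library; the function name port_is_open is an illustrative assumption. Nothing listens on port 8080 until the Flask service is running, so the check is only meaningful after the app has been started.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Illustrative helper (not part of Docker or Flask).
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False
```

For example, port_is_open("127.0.0.1", 8080) run on the host should return True once the container is up and the Flask app is listening.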

Write the API code and have Flask listen on port 8080:

import json

import mindspore as ms
from flask import Flask, request
from mindformers import AutoConfig, AutoModel, AutoTokenizer

app = Flask(__name__)

# Load the model and tokenizer once at startup, in graph mode on Ascend device 0
ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend", device_id=0)
config = AutoConfig.from_pretrained("glm2_6b")
config.checkpoint_name_or_path = "../mindformers/checkpoint_download/glm2/glm2_6b.ckpt"
model = AutoModel.from_config(config)
tokenizer = AutoTokenizer.from_pretrained("glm2_6b")


@app.route('/glm2_bot', methods=['POST'])
def say_hello_func():
    print("----------- in hello func ----------")
    # The request body is expected to be JSON of the form {"text": "..."}
    data = json.loads(request.get_data(as_text=True))
    text = data['text']
    # Wrap the text in the model's prompt template, then tokenize it
    inputs = tokenizer(tokenizer.build_prompt(text))["input_ids"]
    print(tokenizer.decode(inputs))
    outputs = model.generate(inputs, max_length=14096)
    outputs_text = tokenizer.decode(outputs)
    return json.dumps({"response": outputs_text}, ensure_ascii=False, indent=4)


@app.route('/goodbye', methods=['GET'])
def say_goodbye_func():
    print("----------- in goodbye func ----------")
    return '\nGoodbye!\n'


@app.route('/', methods=['POST'])
def default_func():
    print("----------- in default func ----------")
    data = json.loads(request.get_data(as_text=True))
    return '\n called default func !\n {} \n'.format(str(data))


# host must be "0.0.0.0", port must be 8080
if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8080)
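The /glm2_bot handler above assumes that the body is valid JSON containing a text field; if it is not, the exception propagates out of the handler and Flask answers with an error response. A minimal sketch of validating the payload first is shown here. extract_text is a hypothetical helper, not part of Flask or mindformers, and the handler would still need to catch ValueError and return a 400 response itself.

```python
import json


def extract_text(raw_body: str) -> str:
    """Parse a JSON request body and return its non-empty 'text' field.

    Hypothetical helper: raises ValueError with a readable message on bad input.
    """
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"body is not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")
    text = data.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' must be a non-empty string")
    return text
```

Inside the handler, this would replace the json.loads and data['text'] lines with a single call such as extract_text(request.get_data(as_text=True)).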

Call the service locally with curl:

curl -i -k -H 'Accept:application/json' -H 'Content-Type:application/json;charset=utf8' -X POST -d '{"text":"你好"}' http://172.17.0.2:8080/glm2_bot
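The same request can also be sent from Python. This is a minimal sketch using only the standard library; build_request and ask are illustrative names, the URL is the one from the curl command, and the timeout value is an assumption chosen because generation can be slow.

```python
import json
import urllib.request


def build_request(text: str, url: str) -> urllib.request.Request:
    """Build the POST request that the curl command sends."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json;charset=utf8",
        },
        method="POST",
    )


def ask(text: str, url: str = "http://172.17.0.2:8080/glm2_bot", timeout: float = 300.0):
    """Send the text to the service and return the value of the "response" field."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the service running, ask("你好") returns whatever the handler placed under "response".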

Tips:
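Exposing the commonly used default port 8080 on a server invites port-scanning attacks. Consider changing the port or restricting access with an IP allowlist.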
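One way to implement the IP allowlist is to compare each client's address against a set of permitted networks before handling the request. The following is a minimal sketch using the standard library's ipaddress module; ALLOWED_NETWORKS and is_allowed are illustrative names, and the networks listed are placeholders rather than values taken from this deployment.

```python
import ipaddress

# Placeholder networks (assumptions): loopback and a typical Docker bridge range
ALLOWED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("172.17.0.0/16"),
]


def is_allowed(client_ip: str, networks=ALLOWED_NETWORKS) -> bool:
    """Return True if client_ip parses as an IP address inside one of the networks."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Not a valid IP address string, so reject it
        return False
    return any(addr in net for net in networks)
```

In the Flask app, this could be called from a function registered with @app.before_request, passing request.remote_addr and calling abort(403) when it returns False. If the service sits behind a reverse proxy, remote_addr is the proxy's address, so the check would need to account for that.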