Asynchronously scraping Youdao Dictionary (an introduction to JS reverse engineering)

Capture the POST request and analyze the form it submits.

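Comparing several captured requests shows that only salt, lts, and sign change between them, which suggests that a JavaScript file on the page generates (encrypts) these values. The next step is to find that script and work out how the values are produced. Because the script runs when the "Translate" button is clicked, the event must be bound to that button: locate the element and use its event listeners to find the script.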
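The comparison step can also be done mechanically. The following is a minimal sketch, assuming each captured form has been copied into a Python dict; the helper name changed_fields and all sample values are made up for illustration and are not real captures.

```python
def changed_fields(payloads):
    """Return the sorted keys whose values differ across the given payloads."""
    keys = set().union(*(p.keys() for p in payloads))
    return sorted(k for k in keys if len({p.get(k) for p in payloads}) > 1)


# Made-up sample forms for the same word, for illustration only
first = {"i": "dog", "client": "fanyideskweb",
         "salt": "16370000000001", "lts": "1637000000000", "sign": "aaa"}
second = {"i": "dog", "client": "fanyideskweb",
          "salt": "16370000000057", "lts": "1637000000005", "sign": "bbb"}

diff = changed_fields([first, second])
print(diff)  # ['lts', 'salt', 'sign']
```

Fields that never change can be hardcoded; the ones reported here are the ones to trace back into the script.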

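The script turns out to be more than nine thousand lines long, but that is not a big problem: use Ctrl+F to search for the variable names. Look for the last place each variable appears, since the analysis has to work backward from there.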

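salt and the other values turn out to be properties of an object r, so look for r next: r = v.generateSaltSign(n); From there, look for the definition of that method on v.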

Then look for r again.
Having found the little rascal, start analyzing it:
var r = function(e) {
    var t = n.md5(navigator.appVersion)
      , r = "" + (new Date).getTime() // timestamp; pay attention to its format
      , i = r + parseInt(10 * Math.random(), 10); // try parseInt(10 * Math.random(), 10) in the console
    return {
        ts: r,
        bv: t, // don't forget that this one is a constant
        salt: i,
        sign: n.md5("fanyideskweb" + e + i + "Y2FYu%TNSbMCxc3t2u^XT") // MD5 hash of the concatenated string
    }
};
Evaluated in the console, parseInt(10 * Math.random(), 10) returns a random integer from 0 to 9.
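Before porting the whole function, it can help to check each piece in Python on its own. The sketch below mirrors the three building blocks of the JavaScript; the helper names are hypothetical, and the app_version string is an assumed example of what navigator.appVersion might contain, so the resulting bv depends on the browser and need not match any value seen in a real request.

```python
import hashlib
import random
import time


def js_timestamp_ms() -> str:
    """Mimic `"" + (new Date).getTime()`: milliseconds since the epoch, as a string."""
    return str(int(time.time() * 1000))


def random_digit() -> int:
    """Mimic `parseInt(10 * Math.random(), 10)`: an integer from 0 to 9."""
    return int(10 * random.random())


def md5_hex(text: str) -> str:
    """Hex MD5 digest of a string, like the page's n.md5()."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


ts = js_timestamp_ms()
salt = ts + str(random_digit())

# Assumed example value; the real string comes from the browser in use
app_version = ("5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36")
bv = md5_hex(app_version)

print(ts, salt, bv)
```

Note that time.time() returns seconds as a float, while getTime() returns integer milliseconds, which is why the value is multiplied by 1000 and truncated.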
That completes the analysis; here is the code.
import asyncio, hashlib, random, time
import aiohttp
def generate_formdata(word):
    '''
    Build the POST form for one word, mirroring the page's JavaScript:
    ts:   r = "" + (new Date).getTime()
    salt: i = r + parseInt(10 * Math.random(), 10)
    sign: n.md5("fanyideskweb" + e + i + "Y2FYu%TNSbMCxc3t2u^XT"), where e is the input text
    '''
    ts = str(int(time.time() * 1000))  # millisecond timestamp, as a string
    salt = ts + str(random.randint(0, 9))  # timestamp plus one random digit
    temp = "fanyideskweb" + word + salt + "Y2FYu%TNSbMCxc3t2u^XT"
    sign = hashlib.md5(temp.encode("utf-8")).hexdigest()
    formdata = {
        "i": word,
        "from": "AUTO",
        "to": "AUTO",
        'smartresult': 'dict',
        'client': 'fanyideskweb',
        "salt": salt,
        "sign": sign,
        "lts": ts,
        "bv": "c795a332c678d5063a1ee5eb15253848",  # constant, as noted in the analysis
        "doctype": "json",
        "version": "2.1",
        'keyfrom': "fanyi.web",
        'action': 'FY_BY_REALTlME'
    }
    return formdata
async def get_ans(data):
    url = 'https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36",
        "Referer": "https://fanyi.youdao.com/"
    }
    # ClientSession has to be instantiated, and the form must be sent as a POST
    async with aiohttp.ClientSession() as session:
        async with session.post(url=url, headers=headers, data=data) as resp:
            ans = await resp.text()
            print(ans)
if __name__ == '__main__':
    words = ['dog', 'cat', 'snack', 'stack', 'mask']
    datas = map(generate_formdata, words)

    async def main():
        # Run all requests concurrently and wait until every one has finished
        await asyncio.gather(*(get_ans(i) for i in datas))

    asyncio.run(main())
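The script prints each response as it arrives. If the results should instead be collected in input order, or the number of simultaneous requests capped, one possible approach is asyncio.gather combined with a semaphore. This is only a sketch: fake_translate is a hypothetical stand-in that sleeps instead of touching the network, and the limit of 3 is an arbitrary choice.

```python
import asyncio


async def fake_translate(word: str) -> str:
    """Hypothetical stand-in for the real request; no network access."""
    await asyncio.sleep(0.01)
    return word.upper()


async def translate_all(words, limit: int = 3):
    """Run the lookups concurrently, at most `limit` at a time, keeping input order."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(word):
        async with semaphore:
            return await fake_translate(word)

    # gather returns results in the order the awaitables were passed in
    return await asyncio.gather(*(bounded(w) for w in words))


results = asyncio.run(translate_all(['dog', 'cat', 'snack']))
print(results)  # ['DOG', 'CAT', 'SNACK']
```

Swapping fake_translate for a coroutine that returns the response text would give a list of responses aligned with the input words.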