某团商铺查询 token参数解析与生成

  • 2020-02-29
  • 9,108
  • 15

一、分析接口

Network中可以找到返回是json格式数据的查询接口。

请求url中可以观察到请求的参数如下

二、token参数解析

首先看下token长什么样子

eJx1j0tvgkAUhf/LbCUwM7zEpAtEoGixSBWqTRcgb3nJoLQ0/e+dJjZpF01ucs589+TkzgforAjMEIQKhAy4xh2YAcRCVgIM6AndiFM8lSQoKVgRGHD8y3hZZkDYeQswe0GQhwxCEn79Ri4lL0jBFMEpfGV+eyzQ+U5ZNASyvm/JjOPSka3ivL8ENXtsKo56kuUcPeOfAKAN1ZY2UD3dNLhp//O26XdoBcnTmrp4OZTFFj8OhbrJYkUTotWb9lwsnnJhY7irXBjmvqNWRRGjxtEDbeDVtWt77UVN/F2HnO1EqYVNceRLUsYkGebxtoMbf5AnSceF6UMx6prsrZZX06+qzGgP6VDKonV/Fj0r0xxThzweF2PNS/tDOFrqsTZcJT/BMrEP5mjKxA3O+jpansXIHJdOYXlms67uJ49T1WhJI+pe79rXkSv3D3HUNVHSDfiScbv3LOftBKZK2IYRCZoox8gzrN4hPe9ypzvw+QWfPJQk

一眼看上去有点像base64编码,先用base64解析一下试试

import base64
s = 'eJx1j0tvgkAUhf/LbCUwM7zEpAtEoGixSBWqTRcgb3nJoLQ0/e+dJjZpF01ucs589+TkzgforAjMEIQKhAy4xh2YAcRCVgIM6AndiFM8lSQoKVgRGHD8y3hZZkDYeQswe0GQhwxCEn79Ri4lL0jBFMEpfGV+eyzQ+U5ZNASyvm/JjOPSka3ivL8ENXtsKo56kuUcPeOfAKAN1ZY2UD3dNLhp//O26XdoBcnTmrp4OZTFFj8OhbrJYkUTotWb9lwsnnJhY7irXBjmvqNWRRGjxtEDbeDVtWt77UVN/F2HnO1EqYVNceRLUsYkGebxtoMbf5AnSceF6UMx6prsrZZX06+qzGgP6VDKonV/Fj0r0xxThzweF2PNS/tDOFrqsTZcJT/BMrEP5mjKxA3O+jpansXIHJdOYXlms67uJ49T1WhJI+pe79rXkSv3D3HUNVHSDfiScbv3LOftBKZK2IYRCZoox8gzrN4hPe9ypzvw+QWfPJQk'
print(s)
result = base64.b64decode(s)
print(result)

先试下decode为utf-8,报错 无法直接解析,可能是经过压缩,尝试解压

import zlib
result2 = zlib.decompress(result)
print(result2)

结果如下

b'{"rId":100900,"ver":"1.0.6","ts":1582866069294,"cts":1582866069377,"brVD":[1030,1162],"brR":[[1920,1080],[1920,1080],24,24],"bI":["https://gz.meituan.com/meishi/","https://gz.meituan.com/"],"mT":[],"kT":[],"aT":[],"tT":[],"aM":"","sign":"eJwljT2OwjAQhe9C4dKxCXjDSi4QFRKi4wBWPAmjje1oPEaCw3ANRMVpuAfWUr1PT+9n4Qjc3lslesfwBeTr0QWw7+fr/bgLjzEC7VKJvGWmmhFpZgwl75IHq5VIhCPGE032zDzn36YZbzIAcnFR9ik0lfMZGzG7sRaqENdJq5dGzJPjIVGoNmH+O8AFpso5EVtRMvz/lYLedrodfrw2uh/Uyhi3Mf0g9bpbdsaodi21VFItPst3R/k="}'

 

基本能看懂了大部分参数,ts 和 cts 是时间戳,注意位数 ts为时间戳*1000左右,cts比ts稍微大一些,其他参数为常量 sign用同样的方法尝试解析

sign = 'eJwljT2OwjAQhe9C4dKxCXjDSi4QFRKi4wBWPAmjje1oPEaCw3ANRMVpuAfWUr1PT+9n4Qjc3lslesfwBeTr0QWw7+fr/bgLjzEC7VKJvGWmmhFpZgwl75IHq5VIhCPGE032zDzn36YZbzIAcnFR9ik0lfMZGzG7sRaqENdJq5dGzJPjIVGoNmH+O8AFpso5EVtRMvz/lYLedrodfrw2uh/Uyhi3Mf0g9bpbdsaodi21VFItPst3R/k='
result3 = zlib.decompress(base64.b64decode(sign))
print(result3)

结果为一系列参数拼接:

b'"areaId=0&cateId=0&cityName=\xe5\xb9\xbf\xe5\xb7\x9e&dinnerCountAttrId=&optimusCode=10&originUrl=https://gz.meituan.com/meishi/&page=1&partner=126&platform=1&riskLevel=1&sort=&userId=&uuid=813f7d161cf0466a96cf.1582866035.1.0.0"'

 

三、尝试构造Token完成请求

大概就是一波反向操作,其中需要注意几个问题 sign中参数最后需要转化为字符串类型而非b

import base64
import zlib
import time
import requests
from urllib import parse


def decode_token(s):
    a = base64.b64decode(s)
    print(a)
    result = zlib.decompress(a)
    return result


def encode_sign(sign):
    result = base64.b64encode(zlib.compress(sign))
    return str(result, 'utf-8')


def encode_token():
    d = {}
    d["rId"] = 100900
    d["ver"] = "1.0.6"
    d["ts"] = int(time.time() * 1000)
    # d["ts"] = 1582704455270
    d["cts"] = int(time.time() * 1000 + 75)
    # d["cts"] = 1582704455345
    d["brVD"] = [822, 1151]
    d["brR"] = [[1920, 1080], [1920, 1080], 24, 24]
    d["bI"] = ["https://gz.meituan.com/meishi/", "https://gz.meituan.com/"]
    d["mT"] = []
    d["kT"] = []
    d["aT"] = []
    d["tT"] = []
    d["aM"] = ""
    sign = b'"areaId=0&cateId=0&cityName=\xe5\xb9\xbf\xe5\xb7\x9e&dinnerCountAttrId=&optimusCode=10&originUrl=https://gz.meituan.com/meishi/&page=1&partner=126&platform=1&riskLevel=1&sort=&userId=&uuid=27feb019adbd4186a43e.1582704408.1.0.0"'
    d["sign"] = encode_sign(sign)
    return d


def get_token():
    d = encode_token()
    data = str(d).replace("'", '"').replace(' ', '').encode()
    # 进行 url 编码
    token = parse.quote(encode_sign(data))
    return token


if __name__ == '__main__':
    token = get_token()
    url = 'https://gz.meituan.com/meishi/api/poi/getPoiList?cityName=%E5%B9%BF%E5%B7%9E&cateId=0&areaId=0&sort=&dinnerCountAttrId=&page=1&userId=&uuid=27feb019adbd4186a43e.1582704408.1.0.0&platform=1&partner=126&originUrl=https%3A%2F%2Fgz.meituan.com%2Fmeishi%2F&riskLevel=1&optimusCode=10&_token=' + token
    headers = {
        # 'Cookie': '_lxsdk_cuid=17061bdb99fc8-02d1bc12d68259-3c604504-1fa400-17061bdb99fc8; _hc.v=4c9bb6b2-be41-8c38-5b85-55b5e525029d.1582187854; iuuid=F27BCC814DA3D3DE87F8AF31680A50D3481F769B5E586743E973F4D665CA8F1A; cityname=%E5%B9%BF%E5%B7%9E; _lxsdk=F27BCC814DA3D3DE87F8AF31680A50D3481F769B5E586743E973F4D665CA8F1A; webp=1; latlng=23.130873,113.314218,1582596340177; i_extend=C_b1Gimthomepagecategory11H__a; PHPSESSID=glju3e7tram9mm14oi6gl8pdq2; Hm_lvt_f66b37722f586a240d4621318a5a6ebe=1582704399; Hm_lpvt_f66b37722f586a240d4621318a5a6ebe=1582704399; __utma=211559370.1822528469.1582704399.1582704399.1582704399.1; __utmc=211559370; __utmz=211559370.1582704399.1.1.utmcsr=baidu|utmccn=baidu|utmcmd=organic|utmcct=zt_search; uuid=27feb019adbd4186a43e.1582704408.1.0.0; _lx_utm=utm_source%3DBaidu%26utm_medium%3Dorganic; ci=20; __mta=209023277.1582187792859.1582187792859.1582704411604.2; client-id=50dea765-bdec-4ac7-b5b4-5f92afa9f54c; _lxsdk_s=17080d79353-d36-553-5%7C%7C4',
        'Host': 'gz.meituan.com',
        'Referer': 'https://gz.meituan.com/meishi/',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    }
    r = requests.get(url, headers=headers)
    print(r.text)

 

结果:

目前店铺页数据提取比较普通 这里不多赘述。观察发现,数据中显示totalCounts有以前多条,但是实际上用浏览器操作的过程中只有67页的数据,所以如果想要获得完整的数据,细分区域,分类这些参数,通过排列组合分次请求才能获得更多数据。

评论

  • yibing回复

    太赞了!!!看了你的帖子终于爬成功了!!!感谢!

    • KaGen回复

      诶嘿!不客气

  • 十叁回复

    😀 😮 😯 喔~~~~~~

  • stephengbc回复

    想问下大佬我这里返回显示未登录具体可以怎么解决呀😢感恩!!!

    • KaGen回复

      检查下是不是没有携带cookie进行访问哈?邮箱有点东西哈哈哈哈哈

    • KaGen回复

      我提供的cookie可能已经过期,你需要用你自己的cookie进行访问

      • stephengbc回复

        哈哈哈哈,非常感谢您的回复~但是我在想一直使用自己的cookie应该会被识别到在爬然后被封号吧?使用timesleep间隔爬的话可以避免被封号吗,或者我也看到有人仿造cookie。因为我要爬不同区域的商家数据,加起来可能有好几百页,我觉得大概率会遇到反爬,您当时是没有遇到封ip这种反爬吗?

    • KaGen回复

      timesleep可以控制访问速度 几百页数据不多 控制速度不会那么容易被封

      • stephengbc回复

        ok,那我设置时间长些,再次感谢~

  • stephengbc回复

    您回复的好快,感动!最近急需爬他们的数据,所以可能问题有点多,实在抱歉,耽误您时间了哈哈哈

    • guoke回复

      留个爪,一起玩,我也在整
      二五灵灵八四七七零四

  • KaGen回复

    测试邮件

  • LL回复

    大神,大众点评滑块验证的_token和behavior可以请教下你么

    • KaGen回复

      emmmm 这个我也没有探究过 可能帮不上忙啦

苟活时长: Copyright © 2019-2020 OJO