New to Python: using it to download every Douyin video that the person I like has liked

LEO-屹铭 2020-12-18 PM

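I had always used PHP as my main language for scraping and never planned to learn another one, so I only half understood Python. Recently I wanted to download a video from Douyin, but when I saved it to my phone it carried a heavy watermark. I searched online for a way to remove it, and every result was written in Python, so I had no choice but to grit my teeth and read through it. To my surprise, Python turned out to be remarkably simple; no wonder so many people around me use it.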
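By analyzing their code, I worked out how watermark-free parsing of Douyin videos works, and it is not very hard.
Step 1:
To find the watermark-free video, the first thing we need is a Douyin link.
Visit an ordinary Douyin short-video page, for example: https://v.douyin.com/JY5TXWJ/
Open this link in a browser with the developer tools (F12) open. After the request completes, look at the response headers returned by the server; the short link redirects to a URL like this: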

https://www.iesdouyin.com/share/video/**6851144902827396352**/?region=CN&mid=6851144917600324366&u_code=i7l6amic&titleType=title&timestamp=1595312889&utm_campaign=client_share&app=aweme&utm_medium=ios&tt_from=copy&utm_source=copy

The bolded part of the URL above is the video ID.
Step 2:
Splice this ID into another link: https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids=6851144902827396352
Then visit that link. It returns the JSON below; let's analyze it.
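The ID extraction described above can be sketched in Python as follows. This is only an illustration: the function name `extract_video_id` is hypothetical, and it assumes the redirected URL always has the `/share/video/<digits>/` shape shown in the example.

```python
import re
from typing import Optional

# Assumption: the redirected share URL contains "/share/video/<digits>".
_VIDEO_ID_RE = re.compile(r"/share/video/(\d+)")


def extract_video_id(redirected_url: str) -> Optional[str]:
    """Return the numeric video ID from a redirected share URL, or None if absent."""
    match = _VIDEO_ID_RE.search(redirected_url)
    return match.group(1) if match else None


example = (
    "https://www.iesdouyin.com/share/video/6851144902827396352/"
    "?region=CN&mid=6851144917600324366"
)
print(extract_video_id(example))
```

Anchoring the pattern on `/share/video/` rather than taking the first run of digits makes the helper return `None` for a URL that has not been redirected yet, instead of a wrong ID.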

{
status_code: 0,
item_list: [
{
author: {
nickname: "小刘不带刺",
signature: "承蒙大家厚爱 叫我小刘就可以 每个作品都只是想点醒深陷其中的你 感谢抖音平台让我能和你们相遇：小刘要补水 VX: liuhappy6666（注明来意） 警告：脾气不好 你咬我 我会加倍还给你",
avatar_thumb: {
uri: "3167b000ccac6d8a53e6a",
url_list: [
"https://p9-dy.byteimg.com/aweme/100x100/3167b000ccac6d8a53e6a.jpeg?from=4010531038",
"https://p1-dy-ipv6.byteimg.com/aweme/100x100/3167b000ccac6d8a53e6a.jpeg?from=4010531038",
"https://p3-dy-ipv6-test.byteimg.com/aweme/100x100/3167b000ccac6d8a53e6a.jpeg?from=4010531038"
]
},
avatar_medium: {
url_list: [
"https://p26-dy.byteimg.com/aweme/720x720/3167b000ccac6d8a53e6a.jpeg?from=4010531038",
"https://p9-dy.byteimg.com/aweme/720x720/3167b000ccac6d8a53e6a.jpeg?from=4010531038",
"https://p6-dy-ipv6.byteimg.com/aweme/720x720/3167b000ccac6d8a53e6a.jpeg?from=4010531038"
],
uri: "3167b000ccac6d8a53e6a"
},
followers_detail: null,
policy_version: null,
uid: "2541701236807328",
short_id: "3648299346",
avatar_larger: {
uri: "3167b000ccac6d8a53e6a",
url_list: [
"https://p3-dy-ipv6.byteimg.com/aweme/1080x1080/3167b000ccac6d8a53e6a.jpeg?from=4010531038",
"https://p9-dy.byteimg.com/aweme/1080x1080/3167b000ccac6d8a53e6a.jpeg?from=4010531038",
"https://p26-dy.byteimg.com/aweme/1080x1080/3167b000ccac6d8a53e6a.jpeg?from=4010531038"
]
},
unique_id: "Nanxx666",
platform_sync_info: null,
geofencing: null,
type_label: null
},
text_extra: [
{
start: 34,
end: 42,
user_id: "70258503077",
type: 0,
hashtag_name: "",
hashtag_id: 0
},
{
hashtag_name: "喝酒",
hashtag_id: 1558374815785985,
start: 30,
end: 33,
type: 1
}
],
author_user_id: 2541701236807328,
geofencing: null,
music: {
author: "小刘不带刺",
cover_hd: {
uri: "3167b000ccac6d8a53e6a",
url_list: [
"https://p6-dy-ipv6.byteimg.com/aweme/1080x1080/3167b000ccac6d8a53e6a.jpeg?from=4010531038",
"https://p3-dy-ipv6.byteimg.com/aweme/1080x1080/3167b000ccac6d8a53e6a.jpeg?from=4010531038",
"https://p29-dy.byteimg.com/aweme/1080x1080/3167b000ccac6d8a53e6a.jpeg?from=4010531038"
]
},
cover_medium: {
uri: "3167b000ccac6d8a53e6a",
url_list: [
"https://p3-dy-ipv6.byteimg.com/aweme/720x720/3167b000ccac6d8a53e6a.jpeg?from=4010531038",
"https://p6-dy-ipv6.byteimg.com/aweme/720x720/3167b000ccac6d8a53e6a.jpeg?from=4010531038",
"https://p1-dy-ipv6.byteimg.com/aweme/720x720/3167b000ccac6d8a53e6a.jpeg?from=4010531038"
]
},
play_url: {
uri: "https://sf3-dycdn-tos.pstatp.com/obj/ies-music/6851144919285762830.mp3",
url_list: [
"https://sf3-dycdn-tos.pstatp.com/obj/ies-music/6851144919285762830.mp3",
"https://sf6-dycdn-tos.pstatp.com/obj/ies-music/6851144919285762830.mp3"
]
},
position: null,
id: 6851144917600325000,
mid: "6851144917600324366",
title: "@小刘不带刺创作的原声",
cover_large: {
uri: "3167b000ccac6d8a53e6a",
url_list: [
"https://p6-dy-ipv6.byteimg.com/aweme/1080x1080/3167b000ccac6d8a53e6a.jpeg?from=4010531038",
"https://p3-dy-ipv6.byteimg.com/aweme/1080x1080/3167b000ccac6d8a53e6a.jpeg?from=4010531038",
"https://p29-dy.byteimg.com/aweme/1080x1080/3167b000ccac6d8a53e6a.jpeg?from=4010531038"
]
},
cover_thumb: {
uri: "3167b000ccac6d8a53e6a",
url_list: [
"https://p9-dy.byteimg.com/img/3167b000ccac6d8a53e6a~c5_168x168.webp?from=4010531038",
"https://p26-dy.byteimg.com/img/3167b000ccac6d8a53e6a~c5_168x168.webp?from=4010531038",
"https://p3-dy-ipv6.byteimg.com/img/3167b000ccac6d8a53e6a~c5_168x168.webp?from=4010531038",
"https://p9-dy.byteimg.com/img/3167b000ccac6d8a53e6a~c5_168x168.jpeg?from=4010531038"
]
},
duration: 10,
status: 1
},
video_labels: null,
is_live_replay: false,
comment_list: null,
long_video: null,
cha_list: [
{
type: 0,
view_count: 0,
hash_tag_profile: "",
is_commerce: false,
cid: "1558374815785985",
cha_name: "喝酒",
desc: "",
user_count: 0,
connect_music: null
}
],
video: {
cover: {
uri: "tos-cn-p-0015/8f6ac0a244034f978f98f629e8669f79",
url_list: [
"https://p29-dy.byteimg.com/img/tos-cn-p-0015/8f6ac0a244034f978f98f629e8669f79~c5_300x400.jpeg?from=2563711402_large",
"https://p6-dy-ipv6.byteimg.com/img/tos-cn-p-0015/8f6ac0a244034f978f98f629e8669f79~c5_300x400.jpeg?from=2563711402_large",
"https://p9-dy.byteimg.com/img/tos-cn-p-0015/8f6ac0a244034f978f98f629e8669f79~c5_300x400.jpeg?from=2563711402_large"
]
},
width: 720,
origin_cover: {
uri: "tos-cn-p-0015/9116553af46e4b1aad7e30860890e384_1595156482",
url_list: [
"https://p29-dy.byteimg.com/tos-cn-p-0015/9116553af46e4b1aad7e30860890e384_1595156482~tplv-dy-360p.jpeg?from=2563711402",
"https://p3-dy-ipv6.byteimg.com/tos-cn-p-0015/9116553af46e4b1aad7e30860890e384_1595156482~tplv-dy-360p.jpeg?from=2563711402",
"https://p26-dy.byteimg.com/tos-cn-p-0015/9116553af46e4b1aad7e30860890e384_1595156482~tplv-dy-360p.jpeg?from=2563711402"
]
},
has_watermark: true,
duration: 10798,
vid: "v0200f850000bsa2fvf4nj82nk76jh60",
play_addr: {
uri: "v0200f850000bsa2fvf4nj82nk76jh60",
url_list: [
"https://aweme.snssdk.com/aweme/v1/playwm/?video_id=v0200f850000bsa2fvf4nj82nk76jh60&ratio=720p&line=0"
]
},
height: 1280,
dynamic_cover: {
uri: "tos-cn-p-0015/4cf86095ac7b41eca87ef29ba328c0bc_1595156482",
url_list: [
"https://p26-dy.byteimg.com/obj/tos-cn-p-0015/4cf86095ac7b41eca87ef29ba328c0bc_1595156482?from=2563711402_large",
"https://p6-dy-ipv6.byteimg.com/obj/tos-cn-p-0015/4cf86095ac7b41eca87ef29ba328c0bc_1595156482?from=2563711402_large",
"https://p29-dy.byteimg.com/obj/tos-cn-p-0015/4cf86095ac7b41eca87ef29ba328c0bc_1595156482?from=2563711402_large"
]
},
ratio: "540p",
bit_rate: null
},
duration: 10798,
is_preview: 0,
group_id: 6851144902827396000,
statistics: {
aweme_id: "6851144902827396352",
comment_count: 3233,
digg_count: 59095,
play_count: 0
},
aweme_type: 4,
aweme_id: "6851144902827396352",
create_time: 1595156481,
share_info: {
share_weibo_desc: "#在抖音，记录美好生活#你以为我让你喝酒是在害你吗宝贝？ #喝酒 @DOU+小助手",
share_desc: "在抖音，记录美好生活",
share_title: "你以为我让你喝酒是在害你吗宝贝？ #喝酒 @DOU+小助手"
},
promotions: null,
forward_id: "0",
video_text: null,
risk_infos: {
content: "",
warn: false,
type: 0
},
share_url: "https://www.iesdouyin.com/share/video/6851144902827396352/?region=&mid=6851144917600324366&u_code=48&titleType=title&did=0&iid=0",
image_infos: null,
desc: "你以为我让你喝酒是在害你吗宝贝？ #喝酒 @DOU+小助手",
label_top_text: null
}
],
extra: {
now: 1608300352000,
logid: "2020121822055201019806014400261D69"
}
}

Step 3:
The value at item_list[0].video.play_addr.url_list[0] is a video link. Visiting it confirms that it really is the video, but it still has the watermark. What now?

https://aweme.snssdk.com/aweme/v1/playwm/?video_id=v0200f850000bsa2fvf4nj82nk76jh60&ratio=720p&line=0
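Steps 2 and 3 can be sketched as two small helpers. The names `build_item_info_url` and `get_play_addr` are made up for this illustration, and the sample dictionary is a trimmed-down copy of the response shown above that keeps only the fields used here; the real endpoint may change or stop responding at any time.

```python
from typing import Any, Dict

ITEM_INFO_API = "https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/"


def build_item_info_url(video_id: str) -> str:
    """Build the item-info URL for one video ID (Step 2)."""
    return f"{ITEM_INFO_API}?item_ids={video_id}"


def get_play_addr(payload: Dict[str, Any]) -> str:
    """Return item_list[0].video.play_addr.url_list[0] from the parsed JSON (Step 3)."""
    items = payload.get("item_list") or []
    if not items:
        raise ValueError("item_list is empty; the video may be unavailable")
    return items[0]["video"]["play_addr"]["url_list"][0]


# Trimmed-down sample of the response; only the fields read above are kept.
sample = {
    "status_code": 0,
    "item_list": [
        {
            "video": {
                "play_addr": {
                    "url_list": [
                        "https://aweme.snssdk.com/aweme/v1/playwm/"
                        "?video_id=v0200f850000bsa2fvf4nj82nk76jh60&ratio=720p&line=0"
                    ]
                }
            }
        }
    ],
}

print(build_item_info_url("6851144902827396352"))
print(get_play_addr(sample))
```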

Step 4:
I found that removing the wm from the link (playwm becomes play) and visiting it again gives the watermark-free video.

https://aweme.snssdk.com/aweme/v1/play/?video_id=v0200f850000bsa2fvf4nj82nk76jh60&ratio=720p&line=0
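As a minimal sketch, the rewrite in Step 4 is a single string replacement. The helper name `to_no_watermark_url` is hypothetical, and it assumes the watermarked address always contains the path segment `/playwm/`.

```python
def to_no_watermark_url(play_url: str) -> str:
    """Turn a ".../playwm/..." address into the matching ".../play/..." address."""
    # Replace only the first occurrence, and only the whole path segment.
    return play_url.replace("/playwm/", "/play/", 1)


watermarked = (
    "https://aweme.snssdk.com/aweme/v1/playwm/"
    "?video_id=v0200f850000bsa2fvf4nj82nk76jh60&ratio=720p&line=0"
)
print(to_no_watermark_url(watermarked))
```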

Step 5:
Visiting the link above redirects to a new playback address, which is the real, watermark-free video address.
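One possible way to follow that redirect without downloading the video body is sketched below. It uses the standard library's `urllib.request` instead of `requests`, which is a substitution made for this example only. The name `resolve_final_url` and the `opener` parameter are hypothetical; the parameter exists so the function can be tried without network access. Whether the server still redirects this way is an assumption.

```python
import urllib.request

# Mobile User-Agent string, the same one used in the script later in this article.
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) "
    "AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 "
    "Mobile/15A372 Safari/604.1"
)


def resolve_final_url(url, opener=urllib.request.urlopen, timeout=5.0):
    """Follow redirects for `url` and return the final address.

    urlopen follows HTTP redirects automatically; the response body is
    not read here, so the video itself is not downloaded.
    """
    request = urllib.request.Request(url, headers={"User-Agent": MOBILE_UA})
    with opener(request, timeout=timeout) as response:
        return response.geturl()
```

Calling `resolve_final_url(...)` with the play address from Step 4 would then give the address to download from, assuming the request succeeds.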

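Now that we can parse a single video, can we parse every video on a profile page?
With that question in mind, I looked at a Douyin profile page. After some analysis with F12, I found that the user information comes from a single link.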

"https://www.iesdouyin.com/web/api/v2/user/info/?sec_uid=" + sec_uid

Visiting this link returns basic information about the user, such as the number of liked videos and the number of posts.

"https://www.iesdouyin.com/web/api/v2/aweme/like/?sec_uid=" + sec_uid + "&count=21&max_cursor=" + str(max_cursor) + "&aid=1128&_signature=Sxj4PgAAFMzhN7i-Sr.Ci0sY-C&dytk="

This link returns information about the videos the user has liked, including the download links.
Here str(max_cursor) is the current timestamp.
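The two URLs above can also be assembled with `urllib.parse.urlencode` rather than string concatenation, as in the sketch below. The function names are hypothetical, the parameter values (`count=21`, `aid=1128`, the `_signature` string, the empty `dytk`) are copied from the link above, and whether that signature is still accepted by the server is not something this example can confirm.

```python
from urllib.parse import urlencode

USER_INFO_API = "https://www.iesdouyin.com/web/api/v2/user/info/"
LIKE_LIST_API = "https://www.iesdouyin.com/web/api/v2/aweme/like/"


def build_user_info_url(sec_uid: str) -> str:
    """Build the user-info URL for one sec_uid."""
    return f"{USER_INFO_API}?{urlencode({'sec_uid': sec_uid})}"


def build_like_list_url(
    sec_uid: str,
    max_cursor: int,
    count: int = 21,
    signature: str = "Sxj4PgAAFMzhN7i-Sr.Ci0sY-C",  # value copied from the article
) -> str:
    """Build the liked-videos URL for one page of results."""
    params = {
        "sec_uid": sec_uid,
        "count": count,
        "max_cursor": max_cursor,
        "aid": 1128,
        "_signature": signature,
        "dytk": "",
    }
    return f"{LIKE_LIST_API}?{urlencode(params)}"


# "EXAMPLE_SEC_UID" is a placeholder, not a real user.
print(build_user_info_url("EXAMPLE_SEC_UID"))
print(build_like_list_url("EXAMPLE_SEC_UID", max_cursor=0))
```

Using `urlencode` also escapes any characters in `sec_uid` that are not safe in a query string, which plain concatenation does not do.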

At this point most of the work is done; what remains is the download code.
The core code is attached below.

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
# https://github.com/missuo/douyin


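import re

import requests

# Paste the share text as-is; there is no need to strip the Chinese text around the link
all_url = input("Enter the link to parse (surrounding Chinese text is fine): ")
# Regular expression that matches a URL
pattern = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
url_list = re.findall(pattern, all_url)  # extracts every URL in the string and returns them as a list
url = url_list[0]
#print(url)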

# Follow the short link's redirects and return the Douyin video ID
# (the first run of digits in the final URL)
def get_redirect_url(url):
    header = {
        'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1',
        'Upgrade-Insecure-Requests': '1',
    }
    data = requests.get(headers=header, url=url, timeout=5)
    vid = re.findall(r'\d+',data.url)
    return vid[0]
vid = get_redirect_url(url)
#print(vid)

# Send a GET request to the API
response = requests.get('https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids='+str(vid))
item = response.json().get("item_list")[0]
#print(item)

# Extract play_addr, the real video link, and replace playwm with play to get the watermark-free link
mp4 = item.get("video").get("play_addr").get("url_list")[0].replace("playwm", "play")
print('Real video link:', mp4)

# Download the video; it is saved under the current working directory
headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Mobile/15A372 Safari/604.1',
    'Upgrade-Insecure-Requests': '1',
}
res = requests.get(mp4, headers=headers, allow_redirects=True)
mp4url = res.url
desc = item.get("desc")
video = requests.get(url=mp4url, headers=headers)
with open(desc+".mp4", 'wb') as f:
    f.write(video.content)  # the with block closes the file automatically
print("Download complete.")
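The script above uses the video description directly as the file name, which can fail if the description is empty or contains characters that a file system rejects, such as `/` or `:`. A small helper along the following lines could guard against that. The name `safe_filename`, the fallback name, and the 80-character limit are all choices made for this sketch, not part of the original script.

```python
import re

# Characters that are invalid in file names on Windows and/or Unix,
# plus ASCII control characters.
_UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(desc: str, fallback: str = "douyin_video", max_length: int = 80) -> str:
    """Turn a video description into a file name ending in .mp4."""
    cleaned = _UNSAFE_CHARS.sub("_", desc)
    # Leading or trailing spaces and dots cause trouble on some systems.
    cleaned = cleaned.strip(" .")
    cleaned = cleaned[:max_length].rstrip(" .")
    return (cleaned or fallback) + ".mp4"


print(safe_filename("a/b:c?"))
print(safe_filename(""))
```

In the script, `open(desc+".mp4", 'wb')` would then become `open(safe_filename(desc), 'wb')`.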
