Yuan Keyi (袁可一): Difference between revisions

Revision as of 22:43, 22 December 2023 (Fri)

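Information technology teacher at Shanghai High School (上海中学), currently teaching at both the main campus and the international division.

Bilibili account: ykykyky.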

Education

Attended Yangpu Senior High School (杨浦高级中学) and went on to Shanghai Normal University (上海师范大学).

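Anecdotes

Once mentioned this site while teaching "web crawlers" in the 2604 class:

“Do you know what a website is?”

“If not, do you at least know hywiki?”

“Every lesson there are students who want to crawl hywiki. I have no idea what is so interesting about it.”

“You can try crawling hywiki too and see whether you succeed.”

The author of this entry happened to be browsing hywiki at that moment, hence this record.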

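Later, a student did succeed in crawling hywiki's pages, obtaining a folder with the text of every page in the main namespace. The folder was named “[绝密]华育中学WIKI源代码全泄露” ("[Top Secret] Huayu Middle School WIKI Source Code Fully Leaked"), was about 2.75 MB in size, and held 1,733 files. Yuan Keyi expressed his appreciation. The code used for the crawler: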

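 import json
 import os
 import re

 import requests
 from urllib import parse

 header = {"Cookie": ""}  # fill in a session cookie if the wiki requires one

 def getraw(title):
     """Fetch the latest wikitext of one page and save it as a .txt file."""
     url = ("http://hywiki.xyz/api.php?action=query&prop=revisions&rvlimit=1"
            "&rvprop=content&format=json&titles=" + parse.quote(title))
     html = requests.get(url, headers=header)
     html.encoding = "utf-8"
     pages = json.loads(html.text)["query"]["pages"]
     # A single-title query returns exactly one entry, keyed by page ID.
     page = next(iter(pages.values()))
     print(url)
     if "revisions" not in page:
         # Missing or unreadable page: nothing to save.
         return
     text = page["revisions"][0]["*"]
     print(text)
     # Replace characters that Windows forbids in file names (e.g. "/" in subpage titles).
     safe_title = re.sub(r'[\\/:*?"<>|]', "_", title)
     os.makedirs("D:/hywiki", exist_ok=True)
     filename = "D:/hywiki/" + safe_title + ".txt"
     with open(filename, "w", encoding="utf-8") as file:
         file.write(text)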

 url_list = "http://hywiki.xyz/api.php?action=query&list=allpages&format=json"
 htmllist = requests.get(url_list, headers=header)
 htmllist.encoding = "utf-8"
 ap = json.loads(htmllist.text)
 threshold = 1000  # safety cap on the number of batches
 done = 0
 while done <= threshold:
     # Iterate over what the batch actually holds; the last one may have fewer than 10 titles.
     for entry in ap["query"]["allpages"]:
         getraw(entry["title"])
     # The API omits "continue" once the final batch has been returned.
     if "continue" not in ap:
         break
     url_list_loop = url_list + "&apfrom=" + parse.quote(ap["continue"]["apcontinue"])
     htmllist_loop = requests.get(url_list_loop, headers=header)
     htmllist_loop.encoding = "utf-8"
     ap = json.loads(htmllist_loop.text)
     done += 1
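
The pagination loop above could also be written as a generator that follows the API's "continue" object. What follows is only a minimal sketch and not the student's code: the names `iter_all_titles` and `fetch_json` are hypothetical, the endpoint is the one from the crawler above, and the network call is passed in as a function so that the paging logic can be tried without contacting the site.

```python
import json
from urllib import parse, request

API = "http://hywiki.xyz/api.php"  # endpoint taken from the crawler above


def fetch_json(params):
    """Hypothetical helper: GET the API with the given query parameters."""
    url = API + "?" + parse.urlencode(params)
    with request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_all_titles(fetch=fetch_json, max_batches=1000):
    """Yield every title from list=allpages until "continue" disappears."""
    params = {"action": "query", "list": "allpages", "format": "json"}
    for _ in range(max_batches):
        data = fetch(params)
        for entry in data["query"]["allpages"]:
            yield entry["title"]
        if "continue" not in data:
            return
        # Merge the continuation values (e.g. apcontinue) into the next request.
        params = {**params, **data["continue"]}
```

With this shape, a caller would write `for title in iter_all_titles(): getraw(title)`, and the batch size no longer needs to be assumed anywhere.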