This repository was archived by the owner on Oct 23, 2024. It is now read-only.

Mccranky83/aistudy-docs-crawler


AiStudy Documents Crawler

Following the official notice that "the transition period for this legacy version runs until July 31, 2025", I'm archiving this project.

This project is a web crawler that collects educational resources from the 上海市中小学数字教学系统 (Shanghai Primary and Secondary School Digital Teaching System).

The crawler uses Puppeteer, a Node.js library, to drive a Chromium browser and simulate user interactions in order to extract download links. It then calls the curl command-line utility to recursively download those resources to the local system.
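The two stages described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `collectLinks`, `buildCurlArgs`, and `downloadAll`, the CSS selector parameter, and the destination directory are all hypothetical. Puppeteer is loaded lazily, so the curl helpers can be used without it installed.

```javascript
const path = require("node:path");
const { execFileSync } = require("node:child_process");

// Hypothetical helper: open a page in Chromium and collect the href of
// every element matching `selector`.
async function collectLinks(pageUrl, selector, headless = true) {
  const puppeteer = require("puppeteer"); // loaded only when crawling
  const browser = await puppeteer.launch({ headless });
  try {
    const page = await browser.newPage();
    await page.goto(pageUrl, { waitUntil: "networkidle2" });
    return await page.$$eval(selector, (anchors) => anchors.map((a) => a.href));
  } finally {
    await browser.close();
  }
}

// Build the curl argument list for one download. The file name is taken
// from the last segment of the URL path (an assumption for this sketch).
function buildCurlArgs(url, destDir) {
  const fileName = decodeURIComponent(path.basename(new URL(url).pathname));
  return ["-L", "--fail", "--create-dirs", "-o", path.join(destDir, fileName), url];
}

// Download every link with curl. `run` is injectable so the loop can be
// exercised without touching the network.
function downloadAll(links, destDir, run = execFileSync) {
  for (const url of links) {
    run("curl", buildCurlArgs(url, destDir), { stdio: "inherit" });
  }
}
```

Passing the arguments to `execFileSync` as an array avoids shell quoting problems with file names that contain spaces or non-ASCII characters.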

README_demo.mov

Installation

# Clone the repository
git clone https://github.com/Mccranky83/aistudy-docs-crawler.git
cd aistudy-docs-crawler
npm i # Installs project dependencies, including a compatible Chrome build
npm run start # Runs the start script, which executes `app/start.js`

Examples

Crawl first

prompt> npm run start

Directly download or crawl first? (d/C) 
Run in headless mode? (Y/n) 
subjectIndex [1-17]: 1
Crawl documents or answer sheets? (D/a) 
subjectIndex [1-2]: 1
Startup grade [Default: 0]: 
Offset [Default: 100]: 
Startup semester [Default: 0]: 
Offset [Default: 100]: 
Startup unit [Default: 0]: 
Offset [Default: 100]: 
Startup course [Default: 0]: 
Offset [Default: 100]: 
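The README does not spell out what the "Startup" and "Offset" prompts mean. One plausible reading, assumed here, is that the startup value is a zero-based index into a list (grades, semesters, units, or courses) and the offset is the maximum number of items to process from that point. Under that assumption, the selection could look like this sketch, where `parseOrDefault` and `selectWindow` are hypothetical names:

```javascript
// Parse a prompt answer as an integer, falling back to the default when
// the user just presses Enter or types something non-numeric.
function parseOrDefault(input, fallback) {
  const value = Number.parseInt(input.trim(), 10);
  return Number.isNaN(value) ? fallback : value;
}

// Assumed semantics: take at most `offset` items starting at index `start`.
function selectWindow(items, start = 0, offset = 100) {
  return items.slice(start, start + offset);
}

// Example: accept the defaults for a short list of units.
const units = ["unit-1", "unit-2", "unit-3", "unit-4"];
const chosen = selectWindow(units, parseOrDefault("", 0), parseOrDefault("", 100));
console.log(chosen.length); // 4, since the default window covers the whole list
```

With defaults of 0 and 100, accepting every prompt would cover each level in full unless a level has more than 100 entries.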

Direct download

prompt> npm run start

Directly download or crawl first? (d/C) d
劳动 - 6.json
sitemapName:

No available linkmaps

prompt> npm run start

Directly download or crawl first? (d/C) d
No linkmaps available!
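The direct-download flow shown above appears to list previously saved linkmap files (such as `劳动 - 6.json`) and to stop with a message when none exist. A minimal sketch of that check follows. It assumes linkmaps are `.json` files in a single directory; the directory location and the function names `listLinkmaps` and `describeLinkmaps` are hypothetical.

```javascript
const fs = require("node:fs");

// Return the names of all .json files in `dir`, sorted, or an empty
// array when the directory does not exist.
function listLinkmaps(dir) {
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".json"))
    .sort();
}

// Produce the text to show the user before asking for a sitemap name.
function describeLinkmaps(dir) {
  const names = listLinkmaps(dir);
  return names.length === 0 ? "No linkmaps available!" : names.join("\n");
}
```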
