
Due to the official notice that
本旧版过渡使用期至2025年7月31日
, I'm archiving this project.
This project encompasses a sophisticated web crawler engineered to systematically acquire educational resources from the 上海市中小学数字教学系统.
The crawler leverages Puppeteer, a Node.js library, to simulate human-like interactions with the Chromium browser, enabling the efficient extraction of download links. Subsequently, the tool employs the
curl
command-line utility to facilitate the recursive downloading of these resources to the local system.
README_demo.mov
# Clone the repository
npm i # Installs project dependencies, including compatible Chrome
npm run start # Executes the start script, which runs `app/start.js`
Crawl first
prompt> npm run start Directly download or crawl first? (d/C) Run in headless mode? (Y/n) subjectIndex [1-17]: 1 Crawl documents or answer sheets? (D/a) subjectIndex [1-2]: 1 Startup grade [Default: 0]: Offset [Default: 100]: Startup semester [Default: 0]: Offset [Default: 100]: Startup unit [Default: 0]: Offset [Default: 100]: Startup course [Default: 0]: Offset [Default: 100]:
Direct download
prompt> npm run start Directly download or crawl first? (d/C) d 劳动 - 6.json sitemapName:
No available linkmaps
prompt> npm run start Directly download or crawl first? (d/C) d No linkmaps available!