Scraper API and CLI

This is a step-by-step guide to using this simple data extraction package, which supports both CLI and API requests.

Built with 🛠️

Axios - Promise-based HTTP client
Cheerio - Library for parsing and manipulating HTML
Express - Web framework for Node.js
Jest - JavaScript Testing Framework
Node.js - JavaScript runtime environment
NPM - Package manager for Node.js

Installation ⚙️

Run npm i to install the package dependencies.

CLI Usage ✅

Run:

  node cli-scraper.js <htmlSource> <selectorSource>

htmlSource can be either an HTML file or a web URL.
selectorSource is a JSON file whose keys map to CSS selectors.
For repeated data, the __root property is required.

E.g. node cli-scraper.js examples/input1.html examples/selector1.json

The results are logged to the console and written to the scrapedData.json file inside the examples folder.
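
As a rough illustration of the htmlSource rule above, the sketch below shows one way a CLI could tell a web URL apart from a local file path. The helper name isWebUrl is hypothetical and is not taken from this package; it relies only on Node's built-in URL class.

```javascript
// Hypothetical helper (not part of this package): decides whether the
// htmlSource argument should be fetched over HTTP or read from disk.
function isWebUrl(source) {
  try {
    const { protocol } = new URL(source);
    // Only http(s) counts as a web URL; anything else is treated as a file path.
    return protocol === "http:" || protocol === "https:";
  } catch {
    // new URL() throws a TypeError for relative paths such as examples/input1.html.
    return false;
  }
}

console.log(isWebUrl("https://github.com/"));   // true
console.log(isWebUrl("examples/input1.html"));  // false
```

Checking the protocol explicitly keeps inputs such as Windows drive paths, which can parse as URLs with an unexpected scheme, on the file branch.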

API Usage ✅

Run npm run dev to start the server.
Use curl, Postman, or another API testing tool to make your API requests.
The HTTP method should be POST, and the body should be a JSON object with html and selectors properties:

html can be either a stringified HTML document or a web URL.
selectors is an object whose keys map to CSS selectors.
For repeated data, the __root property is required.

E.g.

curl -X POST http://localhost:3000/scrape -H "Content-Type: application/json" -d '{"html": "https://github.com/", "selectors": {"title": "h1:first-child"}}'
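
The same request can also be sent from Node.js. The sketch below is a minimal example under stated assumptions: Node.js 18 or later (for the built-in fetch) and the server listening on http://localhost:3000 as in the curl example. The names buildScrapeRequest and scrape are hypothetical helpers, not part of this package, and because the response format is not described here, the body is returned as plain text.

```javascript
// Hypothetical helper: builds the fetch options for a POST to /scrape.
function buildScrapeRequest(html, selectors) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ html, selectors }),
  };
}

// Hypothetical helper: sends the request and returns the raw response body.
// Assumes the server was started with `npm run dev` and listens on port 3000.
async function scrape(html, selectors, endpoint = "http://localhost:3000/scrape") {
  const response = await fetch(endpoint, buildScrapeRequest(html, selectors));
  if (!response.ok) {
    throw new Error(`Scrape request failed with status ${response.status}`);
  }
  return response.text();
}

// Usage (requires a running server):
// scrape("https://github.com/", { title: "h1:first-child" }).then(console.log);
```

Keeping the payload construction separate from the network call makes the request shape easy to check without a running server.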

Requirements ⚙️

Node.js
NPM
A text editor like Visual Studio Code
An API testing platform like Postman

Notes 📋

I chose the libraries based on the most popular and most downloaded npm options.
The first example provided in the challenge description is wrong, since there is no "p" element that is a child of "h1".
The second example provided in the challenge description was modified to include tbody because of how the Cheerio load function behaves.
