IP address strategies for web scraping.md

created: 2024-09-13T10:51
modified: 2024-09-13 15:52
tags: web, scraping, web-scraping, data, browser, ip, network, ip-address
type: note
status: in-progress

Proxy Server

A proxy server is an intermediary that sits between a client (which sends requests) and a destination server (which sends responses).

flowchart LR
    Client -->|Request| ProxyServer -->|Forwarded Request| DestinationServer
    DestinationServer -->|Response| ProxyServer -->|Forwarded Response| Client

A proxy server simply relays communication back and forth between the client and the destination server. The destination server can see the IP address of the proxy server but not the client's IP address (unless the proxy server intentionally forwards it, for example in an `X-Forwarded-For` header).
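As a rough sketch of the relay described above, the snippet below builds an opener that routes requests through a proxy using only Python's standard library. It also shows how a destination server might recover the client IP when a proxy chooses to forward it in `X-Forwarded-For`. The proxy address `203.0.113.10:8080` is a placeholder from a documentation IP range, and both helper names are hypothetical.

```python
import urllib.request


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def client_ip_seen_by_server(remote_addr: str, headers: dict) -> str:
    """Return the IP a destination server would attribute the request to.

    remote_addr is the address of whoever opened the connection (the proxy).
    If the proxy added X-Forwarded-For, its first entry is the claimed client IP.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    first = forwarded.split(",")[0].strip()
    return first or remote_addr


# Placeholder proxy address (assumption, not a real service).
opener = build_proxy_opener("http://203.0.113.10:8080")
# opener.open("https://example.com")  # would be relayed through the proxy
```

Without the header the server only ever sees the proxy's address. The header value is supplied by the proxy (or by anyone upstream of it), so a real server would treat it as a claim rather than as proof.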

VPN (Virtual Private Network)

A VPN consists of both a VPN client (local encryption/decryption software) and a VPN server. The VPN uses its own encryption method, making the network traffic unintelligible to the client's Internet Service Provider (ISP).

flowchart TD
	Client -->|<span style="font-size: 1.5em;">👁️</span>request| VPNclient
	VPNclient -->|<span style="font-size: 1.5em;">👁️</span>forwarded response| Client
	VPNclient -->|<span style="font-size: 1.5em;">🔒</span>forwarded request| ISP
	ISP -->|<span style="font-size: 1.5em;">🔒</span>forwarded response| VPNclient
	ISP -->|<span style="font-size: 1.5em;">🔒</span>forwarded request| VPNServer
	VPNServer -->|<span style="font-size: 1.5em;">🔒</span>forwarded response| ISP
	VPNServer -->|<span style="font-size: 1.5em;">👁️</span>forwarded request| DestinationServer
	DestinationServer -->|<span style="font-size: 1.5em;">👁️</span>response| VPNServer

The destination server (e.g. a website) sees only the IP address of the VPN server, not that of the client. VPNs typically share a pool of IP addresses amongst their users.

| Strategy | Description | Cost | Pros | Cons | Notes |
| --- | --- | --- | --- | --- | --- |
| Use free proxy server(s) | | Free | | - Proxy server sees our IP address | - https://scrapfly.io/blog/introduction-to-proxies-in-web-scraping/<br>- If using HTTPS, the proxy server probably can't read the encrypted request body (look out for certificate warnings)<br>- When rotating to a new proxy, make sure that its IP subnet is different |
| Use paid proxy server(s) | | | | | |
| Use a free VPN | A VPN is a proxy server shared by multiple users.<br>Typically, you don't have direct access to their HTTP servers. | | | VPN IPs are more likely to face CAPTCHAs etc. | |
| Use a paid VPN | | | | | |
| Use Tor | | | | Tor IPs that get blocked ruin Tor for everyone | |
| Use a cloud-based scraping service like ScraperAPI or BrightData | | | | | |
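
The note about rotating to a proxy on a different subnet can be checked with Python's `ipaddress` module. This is a minimal sketch that assumes a /24 prefix as the subnet boundary (the note does not say which size to use). The helper names are hypothetical and the addresses are placeholders from documentation ranges.

```python
import ipaddress
from typing import List, Optional


def same_subnet(ip_a: str, ip_b: str, prefix: int = 24) -> bool:
    """True if both addresses fall in the same /prefix network."""
    network = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in network


def next_proxy(current_ip: str, pool: List[str], prefix: int = 24) -> Optional[str]:
    """Pick the first proxy in pool that sits outside current_ip's subnet."""
    for candidate in pool:
        if not same_subnet(current_ip, candidate, prefix):
            return candidate
    return None  # every candidate shares the current subnet


# Placeholder pool (assumption): two proxies share 198.51.100.0/24.
pool = ["198.51.100.20", "198.51.100.77", "203.0.113.5"]
chosen = next_proxy("198.51.100.20", pool)
```

Here `chosen` skips `198.51.100.77` because it shares the current proxy's /24. A site that blocks by range rather than by single address would likely block both together.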
