| created | modified | tags | type | status |
|---|---|---|---|---|
| 2024-09-13 15:52 | | | | |
A proxy server is an intermediary server that sits between a client (which sends the request) and a destination server (which sends the response).
```mermaid
flowchart LR
    Client -->|Request| ProxyServer -->|Forwarded Request| DestinationServer
    DestinationServer -->|Response| ProxyServer -->|Forwarded Response| Client
```
A proxy server relays communication back and forth between the client and the destination server. The destination server sees the IP address of the proxy server, not the client's IP address (unless the proxy server intentionally forwards it, e.g. in an `X-Forwarded-For` header).
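A minimal Python sketch of the paragraph above, using only the standard library. `build_proxied_opener` routes a scraper's requests through a proxy via `urllib.request.ProxyHandler`. `forward_headers` is a hypothetical toy model (not part of any library) of the one way a proxy typically reveals the client, the `X-Forwarded-For` header. The proxy address is a placeholder from the 203.0.113.0/24 documentation range, not a real server.

```python
import urllib.request

# Placeholder proxy (203.0.113.0/24 is a documentation-only range); swap in a real one.
PROXY_URL = "http://203.0.113.10:8080"


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends both http:// and https:// requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # build_opener drops its default ProxyHandler (which reads env vars) in favour of ours.
    return urllib.request.build_opener(handler)


def forward_headers(headers: dict, client_ip: str, reveal_client: bool) -> dict:
    """Hypothetical toy model of the request headers a proxy relays onward.

    The destination always sees the proxy as the connecting IP; it only learns
    the client's IP if the proxy chooses to append it to X-Forwarded-For.
    """
    forwarded = dict(headers)  # copy so the caller's dict is untouched
    if reveal_client:
        prior = forwarded.get("X-Forwarded-For")
        forwarded["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    else:
        # An anonymising proxy strips any existing forwarding header.
        forwarded.pop("X-Forwarded-For", None)
    return forwarded
```

With a real proxy, `build_proxied_opener(PROXY_URL).open("https://example.com")` would make the request from the proxy's IP. For `https://` URLs urllib tunnels through the proxy with `CONNECT`, so the proxy relays encrypted bytes it can't read.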
A VPN consists of a VPN client (local software that encrypts and decrypts traffic) and a VPN server. Traffic between the two is encrypted with the VPN's own protocol, making it unintelligible to the client's Internet Service Provider (ISP).
```mermaid
flowchart TD
    Client -->|<span style="font-size: 1.5em;">👁️</span>request| VPNclient
    VPNclient -->|<span style="font-size: 1.5em;">👁️</span>forwarded response| Client
    VPNclient -->|<span style="font-size: 1.5em;">🔒</span>forwarded request| ISP
    ISP -->|<span style="font-size: 1.5em;">🔒</span>forwarded response| VPNclient
    ISP -->|<span style="font-size: 1.5em;">🔒</span>forwarded request| VPNServer
    VPNServer -->|<span style="font-size: 1.5em;">🔒</span>forwarded response| ISP
    VPNServer -->|<span style="font-size: 1.5em;">👁️</span>forwarded request| DestinationServer
    DestinationServer -->|<span style="font-size: 1.5em;">👁️</span>response| VPNServer
```
The destination server (e.g. a website) cannot see the client's IP address, only the IP address of the VPN server. VPN providers typically share a pool of IP addresses among their users.
| Strategy | Description | Cost | Pros | Cons | Notes |
|---|---|---|---|---|---|
| Use free proxy server(s) | | Free | | Proxy server sees our IP address | - https://scrapfly.io/blog/introduction-to-proxies-in-web-scraping/<br>- If using HTTPS, the proxy server probably can't read the encrypted request body (look out for certificate warnings, which may mean the proxy is intercepting TLS)<br>- When rotating to a new proxy, make sure that the IP subnet is different |
| Use paid proxy server(s) | | | | | |
| Use a free VPN | A VPN server is effectively a proxy shared by many users. Typically, you don't get direct access to their servers (e.g. as an HTTP proxy). VPN IPs are more likely to face CAPTCHAs etc. | | | | |
| Use a paid VPN | | | | | |
| Use Tor | | | | Getting Tor exit-node IPs blocked ruins Tor for everyone | |
| Use a cloud-based scraping service like ScraperAPI or BrightData | | | | | |
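The rotation tip in the free-proxy Notes ("make sure that the IP subnet is different") can be sketched in Python with the standard `ipaddress` module. This assumes IPv4 proxies written as `host:port` and treats a /24 as "the subnet"; the prefix length and the helper names `subnet_of` and `next_proxy` are assumptions for illustration, not taken from the scrapfly guide.

```python
import ipaddress
import random
from typing import Optional


def subnet_of(proxy: str, prefix: int = 24) -> ipaddress.IPv4Network:
    """Return the /prefix IPv4 network containing a proxy given as "host:port"."""
    host = proxy.rsplit(":", 1)[0]
    # strict=False masks off the host bits, e.g. 203.0.113.10/24 -> 203.0.113.0/24
    return ipaddress.IPv4Network(f"{host}/{prefix}", strict=False)


def next_proxy(
    pool: list[str],
    current: Optional[str],
    prefix: int = 24,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick a random proxy from pool that is NOT in the current proxy's subnet.

    Returns None if the pool is empty or every proxy shares the current subnet.
    """
    if rng is None:
        rng = random.Random()
    if current is None:
        # First pick: any proxy will do.
        return rng.choice(pool) if pool else None
    current_net = subnet_of(current, prefix)
    candidates = [p for p in pool if subnet_of(p, prefix) != current_net]
    return rng.choice(candidates) if candidates else None
```

For example, `next_proxy(pool, current="203.0.113.10:8080")` never returns another `203.0.113.x` address. Returning `None` when every candidate shares the current subnet leaves it to the caller to wait, fetch a fresh proxy list, or fall back to a same-subnet proxy.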
- GitHub repo which fetches a proxy list from proxy sites
- scrapfly.io: A nice guide to proxy rotation strategies
- YouTube: "VPN vs Proxy: BIG Difference!"