Commit 8374bcc: v1.0

Tomen committed Mar 21, 2022
1 parent cbde60f, commit 8374bcc

Showing 7 changed files with 56 additions and 32 deletions.
50 changes: 36 additions & 14 deletions README.md
@@ -1,24 +1,37 @@
 # subscrape
 A Python scraper for substrate chains that uses Subscan.
 
-## Usage
-- copy config/sample_scrape_config.json to config/scrape_config.json and configure to your desire.
-- make sure there is a data/parachains folder
-- run
-- corresponding files will be created in data/
+The basic workflow of `scrape.py` uses the configuration in `config/scrape_config.json`
+to traverse the given chains and perform the configured operations for each chain.
+Currently, only scraping extrinsics is supported.
 
+Data is stored locally using `SubscrapeDB`. It can then be queried. An example is provided in `transform.py`.
 
-If a file already exists in data/, that operation will be skipped in subsequent runs.
+Subsequent runs will only fetch deltas.
 
-### Configuration
+## Limitations
+Error handling is not very sophisticated, so if the scrape is interrupted by an uncaught exception,
+the delta might be incomplete and subsequent runs might miss some data. To remedy the issue,
+delete the delta and run the scraper again.
 
-To query extrinsics from Substrate chains, only the module and call is needed.
+## Usage
+- If you have a Subscan API key, drop it in a file named `config/subscan-key`.
+- Copy `config/sample_scrape_config.json` to `config/scrape_config.json` and configure it as desired.
+- Run `scrape.py`.
+- The corresponding files will be created in `data/`.
 
-Utility.batch() and Utility.batch_all() calls are also recursively searched for the extrinsic.
+## Configuration
 
-Filters can be applied:
+To query extrinsics from Substrate chains, only the module and call are needed. Filters can be applied.
 
-"_filter": [{"block_timestamp": [{"<":1644796800}]}],
+### Scraper: extrinsics
+Scrapes extrinsics by their `call_module` and `call_name`.
+
+### Filter: _skip
+Skips the current scope of the config.
+
+### Filter: _filter
+`"_filter": [{"block_timestamp": [{"<":1644796800}]}],`
 
 To query transactions from Moonbeam chains, the contract address and the hex-formatted method ID are needed. The method ID can be found by scrolling to the input of a transaction on Moonbeam and copying it. Example: https://moonriver.moonscan.io/tx/0x35c7d7bdd33c77c756e7a9b41b587a6b6193b5a94956d2c8e4ea77af1df2b9c3
 
@@ -32,11 +45,20 @@ We use the following methods in the projects:
 - Async Operations: https://docs.python.org/3/library/asyncio-task.html
 
 ### SubscanWrapper
-There is a class SubscanWrapper that encapsulates the logic around calling Subscan.
+There is a class `SubscanWrapper` that encapsulates the logic around calling Subscan.
 API: https://docs.api.subscan.io/
-If you have a Subscan API key, you can put it in the main folder in a file called "subscan-key" and it will be applied to your calls.
 
 ### ParachainScraper
-This is a scraoer that knows how to use the SubscanWrapper to fetch data for a parachain and serialize it to disk.
+`ParachainScraper` knows how to use the `SubscanWrapper` to fetch data for a parachain and serialize it to disk.
 
 Currently it knows how to fetch extrinsics.
+
+### MoonscanWrapper
+Analogous to `SubscanWrapper`.
+
+### MoonbeamScraper
+Analogous to `ParachainScraper`.
+
+Currently it knows how to fetch addresses and extrinsics.
+### SubscrapeDB
+`SubscrapeDB` serializes extracted data to disk and deserializes it later.
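Under the new Usage instructions, an optional Subscan API key is read from `config/subscan-key`. The following is a minimal sketch of how a wrapper might load that file and attach the key to its request headers. The helper names are hypothetical, and the `X-API-Key` header name is an assumption based on Subscan's public API documentation; the commit itself does not show how `SubscanWrapper` sends the key.

```python
import os
from typing import Dict, Optional

DEFAULT_KEY_PATH = "config/subscan-key"  # location given in the README's Usage section


def load_subscan_key(path: str = DEFAULT_KEY_PATH) -> Optional[str]:
    """Return the key stored in `path`, or None if the file is missing or empty."""
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as f:
        key = f.read().strip()
    return key or None


def build_subscan_headers(path: str = DEFAULT_KEY_PATH) -> Dict[str, str]:
    """Build JSON request headers, adding the key only when one is configured.

    The "X-API-Key" header name is an assumption taken from Subscan's API docs.
    """
    headers = {"Content-Type": "application/json"}
    key = load_subscan_key(path)
    if key is not None:
        headers["X-API-Key"] = key
    return headers
```

Because a missing or empty key file simply yields no `X-API-Key` entry, the same code path works with and without a key.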
8 changes: 7 additions & 1 deletion config/sample_scrape_config.json
@@ -1,10 +1,16 @@
 {
+"_version": 1,
 "kusama": {
 "extrinsics": {
 "_filter": [{"block_timestamp": [{"<":1644796800}]}],
 "system": [
 "remark"
-]
+],
+"utility":{
+"_skip": true,
+"batch":{},
+"batch_all":{}
+}
 }
 },
 "moonriver": {
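The sample config now shows both `_skip` and `_filter`. The sketch below is a hypothetical illustration of how such a config could be walked, not subscrape's actual implementation. It assumes that keys starting with `_` are settings for their scope rather than module names, that a module scope with `"_skip": true` is ignored, that a scope-level `_filter` applies to every call below it, and that a filter such as `{"block_timestamp": [{"<": 1644796800}]}` keeps only items whose field satisfies every listed comparison. The operator set is also an assumption.

```python
import operator
from typing import Any, Dict, Iterator, List, Tuple

# Comparison operators this sketch accepts inside a _filter (an assumption).
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}

Filter = List[Dict[str, List[Dict[str, Any]]]]


def matches(item: Dict[str, Any], filters: Filter) -> bool:
    """Return True if `item` satisfies every comparison in `filters`."""
    for entry in filters:
        for field, conditions in entry.items():
            for condition in conditions:
                for op, bound in condition.items():
                    if not OPS[op](item[field], bound):
                        return False
    return True


def iter_calls(extrinsics_config: Dict[str, Any]) -> Iterator[Tuple[str, str, Filter]]:
    """Yield (module, call, filter) for every call that is not skipped."""
    scope_filter = extrinsics_config.get("_filter", [])
    for module, calls in extrinsics_config.items():
        if module.startswith("_"):
            continue  # scope settings such as _filter, not a module name
        if isinstance(calls, dict):
            if calls.get("_skip", False):
                continue  # the whole module scope is skipped
            names = [name for name in calls if not name.startswith("_")]
        else:
            names = list(calls)  # plain list of call names, e.g. "system": ["remark"]
        for name in names:
            yield module, name, scope_filter


config = {
    "_filter": [{"block_timestamp": [{"<": 1644796800}]}],
    "system": ["remark"],
    "utility": {"_skip": True, "batch": {}, "batch_all": {}},
}
calls = list(iter_calls(config))
print(calls)  # [('system', 'remark', [{'block_timestamp': [{'<': 1644796800}]}])]
print(matches({"block_timestamp": 1644700000}, calls[0][2]))  # True
print(matches({"block_timestamp": 1644800000}, calls[0][2]))  # False
```

With the sample's `kusama` section, `utility.batch` and `utility.batch_all` are never yielded, because their module scope carries `"_skip": true`.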
11 changes: 6 additions & 5 deletions subscrape/db/subscrape_db.py
@@ -7,9 +7,10 @@
 # one DB per Parachain
 class SubscrapeDB:
 
-def __init__(self, name):
+def __init__(self, parachain):
 self.logger = logging.getLogger("SubscrapeDB")
-self._path = f"data/parachains/{name}_"
+self._path = f"data/parachains/{parachain}_"
+self._parachain = parachain
 # the name of the currently loaded extrinsics
 self._extrinsics_name = None
 # the name of the currently loaded extrinsics sector
@@ -18,7 +19,7 @@ def __init__(self, name):
 self._extrinsics = None
 # a dirty flag that keeps track of unsaved changes
 self._dirty = False
-self.digits_per_sector = 6
+self.digits_per_sector = 4
 
 def _extrinsics_folder(self, name):
 return f"{self._path}extrinsics_{name}/"
@@ -40,7 +41,7 @@ def set_active_extrinsics_call(self, call_module, call_name):
 # make sure folder exists
 folder_path = self._extrinsics_folder(call_string)
 if not os.path.exists(folder_path):
-os.mkdir(folder_path)
+os.makedirs(folder_path)
 
 # the method assumes that consumers go through sorted block lists
 # it will load a sector from disk when it becomes active
@@ -64,7 +65,7 @@ def write_extrinsic(self, extrinsic):
 # load sector file or create empty dict
 self._extrinsics_sector_name = sector
 self._extrinsics = self._load_extrinsics_sector(self._extrinsics_name, sector)
-self.logger.info(f"Switched to sector {self._extrinsics_sector_name}. {len(self._extrinsics)} active entries")
+self.logger.info(f"{self._parachain} {self._extrinsics_name}: Switched to sector {self._extrinsics_sector_name}. {len(self._extrinsics)} active entries")
 
 
 # do we already know this extrinsic?
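This file's changes replace `os.mkdir` with `os.makedirs`, which also creates missing parent directories such as `data/parachains/` (the README drops its old "make sure there is a data/parachains folder" step in the same commit), and lower the default `digits_per_sector` from 6 to 4. The sketch below illustrates one way sector-based storage with a "do we already know this extrinsic?" check could work. The class `SectorStore`, the sector formula (block number with its last `digits_per_sector` digits dropped, so one file per 10,000 blocks at 4 digits), and the JSON file layout are all assumptions, not taken from `SubscrapeDB`.

```python
import json
import os
from typing import Any, Dict, Optional


class SectorStore:
    """Hypothetical store: one JSON file per 10**digits_per_sector blocks."""

    def __init__(self, folder: str, digits_per_sector: int = 4):
        self.folder = folder
        self.digits_per_sector = digits_per_sector
        # makedirs also creates missing parents; exist_ok avoids an error on reruns
        os.makedirs(folder, exist_ok=True)
        self._sector: Optional[int] = None
        self._entries: Dict[str, Any] = {}
        self._dirty = False

    def sector_of(self, block_num: int) -> int:
        return block_num // 10 ** self.digits_per_sector

    def _file(self, sector: int) -> str:
        return os.path.join(self.folder, f"{sector}.json")

    def write(self, block_num: int, index: str, data: Any) -> bool:
        """Store an entry; return False if it is already known (delta runs)."""
        sector = self.sector_of(block_num)
        if sector != self._sector:
            # switching sectors: persist the old one, then load or start the new one
            self.flush()
            self._sector = sector
            path = self._file(sector)
            if os.path.exists(path):
                with open(path, encoding="utf-8") as f:
                    self._entries = json.load(f)
            else:
                self._entries = {}
        if index in self._entries:
            return False
        self._entries[index] = data
        self._dirty = True
        return True

    def flush(self) -> None:
        """Write the active sector to disk if it has unsaved changes."""
        if self._dirty and self._sector is not None:
            with open(self._file(self._sector), "w", encoding="utf-8") as f:
                json.dump(self._entries, f)
            self._dirty = False
```

Like the comment in the diff, this design assumes entries arrive in sorted block order: each sector switch writes out the previous sector, so unsorted input would cause repeated loads and writes of the same files.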
2 changes: 1 addition & 1 deletion subscrape/scrapers/parachain_scrape_config.py
@@ -5,7 +5,7 @@ def __init__(self, config):
 self.filter = None
 self.processor_name = None
 self.skip = False
-self.digits_per_sector = 6
+self.digits_per_sector = None
 self._set_config(config)
 
 def _set_config(self, config):
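The scrape config's default for `digits_per_sector` changes from 6 to `None`, and `parachain_scraper.py` in this commit only overrides the database's own default (now 4) when a value is configured. Previously the config default was assigned unconditionally, so it always replaced the database default. A minimal sketch of the "None means inherit" pattern follows; `CallConfig`, `SectorDB`, and `apply_sector_setting` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallConfig:
    """Hypothetical per-call scrape config; None means "use the DB default"."""
    digits_per_sector: Optional[int] = None


@dataclass
class SectorDB:
    """Hypothetical stand-in for the database; 4 mirrors the new default."""
    digits_per_sector: int = 4


def apply_sector_setting(db: SectorDB, call_config: CallConfig) -> None:
    # Override only when a value was configured. Using `is not None` rather than
    # truthiness keeps an explicit 0 from being treated as "unset".
    if call_config.digits_per_sector is not None:
        db.digits_per_sector = call_config.digits_per_sector


db = SectorDB()
apply_sector_setting(db, CallConfig())
print(db.digits_per_sector)  # 4
apply_sector_setting(db, CallConfig(digits_per_sector=6))
print(db.digits_per_sector)  # 6
```

Both in this sketch and in the diff, the override is stored on a shared database object. A later call whose config leaves the value unset therefore keeps the previous override instead of reverting to 4.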
9 changes: 5 additions & 4 deletions subscrape/scrapers/parachain_scraper.py
@@ -47,7 +47,7 @@ async def scrape(self, operations):
 continue
 
 # go
-await self.fetch_extrinsics(module, call, call_config.filter, call_config.digits_per_sector)
+await self.fetch_extrinsics(module, call, call_config)
 elif key == "addresses":
 await self.fetch_addresses()
 elif key.startswith("_"):
@@ -56,10 +56,11 @@ async def scrape(self, operations):
 self.logger.error(f"config contained an operation that does not exist: {key}")
 exit
 
-async def fetch_extrinsics(self, call_module, call_name, filter, digits_per_sector):
+async def fetch_extrinsics(self, call_module, call_name, call_config):
 call_string = f"{call_module}_{call_name}"
 
-self.db.digits_per_sector = digits_per_sector
+if call_config.digits_per_sector is not None:
+self.db.digits_per_sector = call_config.digits_per_sector
 self.db.set_active_extrinsics_call(call_module, call_name)
 
 self.logger.info(f"Fetching extrinsics {call_string} from {self.api.endpoint}")
@@ -73,7 +74,7 @@ async def fetch_extrinsics(self, call_module, call_name, filter, digits_per_sect
 self.db.write_extrinsic,
 list_key="extrinsics",
 body=body,
-filter=filter
+filter=call_config.filter
 )
 
 self.db.flush_extrinsics()
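`scrape()` dispatches each configuration key to an operation (`extrinsics`, `addresses`), passes over keys starting with `_`, and logs an error for anything else. In the context lines, the bare `exit` only evaluates the built-in name and does not stop execution. The sketch below shows a table-driven dispatcher that raises on unknown keys instead. `run_operations`, `UnknownOperationError`, and the handler signature are hypothetical and not part of subscrape.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Handler = Callable[[Any], Awaitable[None]]


class UnknownOperationError(ValueError):
    """Raised when the config names an operation that has no handler."""


async def run_operations(operations: Dict[str, Any], handlers: Dict[str, Handler]) -> None:
    """Await the handler registered for each operation key, in config order."""
    for key, value in operations.items():
        if key.startswith("_"):
            continue  # scope settings such as _filter, not operations
        handler = handlers.get(key)
        if handler is None:
            # Raising stops the run; a bare `exit` expression would not.
            raise UnknownOperationError(
                f"config contained an operation that does not exist: {key}")
        await handler(value)


seen = []


async def fetch_extrinsics(cfg: Any) -> None:  # hypothetical handler
    seen.append(("extrinsics", cfg))


asyncio.run(run_operations({"_filter": [], "extrinsics": {"system": ["remark"]}},
                           {"extrinsics": fetch_extrinsics}))
print(seen)  # [('extrinsics', {'system': ['remark']})]
```

A handler table keeps the set of supported operations in one place. Adding an operation then means registering another coroutine, not extending an `if`/`elif` chain.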
6 changes: 0 additions & 6 deletions test.py

This file was deleted.

2 changes: 1 addition & 1 deletion transform.py
@@ -60,7 +60,7 @@ def fetch_batch_contributions():
 memo_call = calls[1]
 if memo_call["call_module"] == "Crowdloan" and memo_call["call_name"] == "add_memo":
 memo = memo_call["params"][1]["value"]
-referral = ss58.ss58_encode(f"0x{memo}", ss58_format=42)
+referral = ss58.ss58_encode(f"0x{memo}", ss58_format=2)
 else:
 referral = json.dumps(memo_call)
 value = contribute_call["params"][1]["value"]
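The `transform.py` change switches the SS58 prefix used to encode crowdloan referral memos from 42 (the generic Substrate format) to 2 (Kusama). The following self-contained implementation of the simple SS58 encoding is for illustration only. It is not the `ss58` module that `transform.py` imports, and it handles only 32-byte keys with one-byte prefixes (0 to 63). It shows that the same public key yields a different address under each format.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Bitcoin-alphabet base58, as used by SS58."""
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = BASE58_ALPHABET[rem] + out
    # each leading zero byte is encoded as "1"
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def ss58_encode(public_key: bytes, ss58_format: int) -> str:
    """Encode a 32-byte public key with a one-byte SS58 prefix."""
    if len(public_key) != 32:
        raise ValueError("expected a 32-byte public key")
    if not 0 <= ss58_format < 64:
        raise ValueError("this sketch only handles one-byte prefixes (0-63)")
    payload = bytes([ss58_format]) + public_key
    # checksum: first 2 bytes of BLAKE2b-512 over "SS58PRE" + prefix + key
    checksum = hashlib.blake2b(b"SS58PRE" + payload, digest_size=64).digest()[:2]
    return base58_encode(payload + checksum)
```

If `memo` holds a 64-character hex string without a `0x` prefix (an assumption about the memo contents), `ss58_encode(bytes.fromhex(memo), 2)` would produce the Kusama-formatted referral address.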
