A data scraping pipeline was set up to mine relevant data and sources from the Uttar Pradesh fiscal data portal, Koshvani.
Platform Name : Koshvani web -- A Gateway to Finance Activities in the State of Uttar Pradesh
Platform URL : http://koshvani.up.nic.in/
A more detailed analysis of the platform and in-scope data can be found here.
Though the data on the Koshvani platform is available in a structured format for us to use and analyse, scraping it through traditional methods turned out to be a challenge.
Keeping the platform's structure and behaviour in mind, Selenium was selected as the tool for mining and storing the data. The Selenium framework automates browser actions, which makes it possible to extract the in-scope datasets.
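As a rough illustration of this approach, here is a minimal sketch and not the pipeline's actual code. It assumes the Selenium Python bindings and a local Chrome install, and it assumes the in-scope data is rendered as HTML tables. The function names `scrape_tables` and `rows_to_csv` are hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise a list of rows (each a list of cell strings) to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def scrape_tables(url):
    """Load `url` in headless Chrome and return the cell text of every table row.

    Assumption: the page exposes its data in plain <table> elements.
    """
    # Imported lazily so rows_to_csv() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        rows = []
        for tr in driver.find_elements(By.CSS_SELECTOR, "table tr"):
            cells = tr.find_elements(By.CSS_SELECTOR, "th, td")
            if cells:  # skip rows that contain no cells
                rows.append([cell.text.strip() for cell in cells])
        return rows
    finally:
        # Always close the browser, even if the page fails to load.
        driver.quit()
```

Under these assumptions, `rows_to_csv(scrape_tables("http://koshvani.up.nic.in/"))` would return the table contents of the landing page as CSV text, ready to be written to a file. Pages that need clicks or form selections before the data appears would need extra browser actions between `driver.get` and the row extraction.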
Instructions for setting up the data pipeline.
<<TBD>>
The following challenges were faced while mining the data during the scraping exercise. The respective resolutions for those challenges are also documented here.
Challenge | Resolution |
---|---|
Refer to the contributing guidelines to understand how to contribute.
root
└── contribute/
└── CODE-OF-CONDUCT.md
└── CONTRIBUTING.md
└── LICENSE.md
└── README.md