A Python library for automating interaction with websites.

dbrobins/MechanicalSoup

MechanicalSoup

A Python library for automating interaction with websites. MechanicalSoup automatically stores and sends cookies, follows redirects, and can follow links and submit forms. It does not execute JavaScript.

I was a fond user of the Mechanize library, but unfortunately it's incompatible with Python 3 and development is inactive. MechanicalSoup provides a similar API, built on Python giants Requests (for HTTP sessions) and BeautifulSoup (for document navigation).

Installation

From PyPI

 pip install MechanicalSoup

Python versions 2.6-2.7, 3.3-3.6, PyPy and PyPy3 are supported (and tested against).

Example

From example.py, code to log into the GitHub website:

"""Example app to login to GitHub using the StatefulBrowser class."""

from __future__ import print_function
import argparse
import mechanicalsoup
from getpass import getpass

parser = argparse.ArgumentParser(description="Login to GitHub.")
parser.add_argument("username")
args = parser.parse_args()

args.password = getpass("Please enter your GitHub password: ")

browser = mechanicalsoup.StatefulBrowser()
# Uncomment for a more verbose output:
# browser.set_verbose(2)

browser.open("https://github.com")
browser.follow_link("login")
browser.select_form('#login form')
browser["login"] = args.username
browser["password"] = args.password
resp = browser.submit_selected()

# Uncomment to launch a web browser on the current page:
# browser.launch_browser()

# verify we are now logged in
page = browser.get_current_page()
messages = page.find("div", class_="flash-messages")
if messages:
    print(messages.text)
assert page.select(".logout-form")

print(page.title.text)

# verify we remain logged in (thanks to cookies) as we browse the rest of
# the site
page3 = browser.open("https://github.com/hickford/MechanicalSoup")
assert page3.soup.select(".logout-form")

For an example with a more complex form (checkboxes, radio buttons and textareas), read tests/test_browser.py and tests/test_form.py.

Common problems

"No parser was explicitly specified"

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

Recent versions of BeautifulSoup show a harmless warning to encourage you to specify which HTML parser to use. You can do this in MechanicalSoup:

mechanicalsoup.Browser(soup_config={'features':'html.parser'})

Or if you have the parser lxml installed:

mechanicalsoup.Browser(soup_config={'features':'lxml'})

See also https://www.crummy.com/software/BeautifulSoup/bs4/doc/#you-need-a-parser

Development

Tests

py.test

Roadmap

See also
