Add sponsored search results and Pandas DataFrame conversion #69

Open: engjellavdiu wants to merge 4 commits into master.

engjellavdiu

Summary

This PR adds two new features to the library, as suggested in Issue #11:

  1. Sponsored search results: the search function now accepts a sponsored boolean flag to fetch sponsored search results.
  2. Pandas DataFrame conversion: a new to_df function converts search results into a Pandas DataFrame.

Changes

  • Modified the SearchResult class to include an is_sponsored attribute.
  • Updated the search function to accept the sponsored flag.
  • Added a to_df function to convert search results to a Pandas DataFrame.
  • Updated the README to reflect these changes.
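A minimal sketch of the SearchResult change, written as a stand-in dataclass because the library's actual class definition is not shown in this PR. It assumes the class keeps url, title and description as plain attributes; giving is_sponsored a default of False is an assumption made here so that existing three-argument constructor calls keep working.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Hypothetical stand-in for the library's SearchResult class."""

    url: str
    title: str
    description: str
    # New field from this PR; the False default is an assumption that keeps
    # older SearchResult(url, title, description) calls valid.
    is_sponsored: bool = False


organic = SearchResult("https://example.com", "Example", "An organic result")
ad = SearchResult("https://ads.example.com", "Ad", "A sponsored result", is_sponsored=True)
print(organic.is_sponsored, ad.is_sponsored)  # False True
```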

How to Test

  1. Perform a search with the sponsored=True flag.
  2. Convert the search results to a Pandas DataFrame using the to_df function.

Issue Link

Closes #11

This commit introduces the ability to fetch sponsored search results from Google.

- Extended the `SearchResult` class to include an `is_sponsored` boolean field.
- Modified the `search` function to accept a new `sponsored` parameter that toggles whether sponsored results are included.
- Added parsing logic to identify sponsored results and include them in the output.

Note: Ensure compliance with Google's terms of service when using this feature.
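The commit does not include the parsing code, so this sketch only illustrates the toggle itself: once each parsed result carries an `is_sponsored` flag, `sponsored=False` drops ads and `sponsored=True` keeps them alongside organic results. `filter_sponsored` is a hypothetical helper, not a function in the library, and reading "include" as "ads plus organic results" (rather than ads only) is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SearchResult:
    """Hypothetical stand-in for the library's SearchResult class."""

    url: str
    title: str
    description: str
    is_sponsored: bool = False


def filter_sponsored(results: Iterable[SearchResult], sponsored: bool = False) -> Iterator[SearchResult]:
    """Hypothetical helper: yield all results if sponsored is True, else only organic ones."""
    for result in results:
        if sponsored or not result.is_sponsored:
            yield result


parsed = [
    SearchResult("https://example.com", "Example", "Organic result"),
    SearchResult("https://ads.example.com", "Ad", "Sponsored result", is_sponsored=True),
]
print([r.url for r in filter_sponsored(parsed)])                  # ['https://example.com']
print([r.url for r in filter_sponsored(parsed, sponsored=True)])  # both URLs
```

In this sketch the default (`sponsored=False`) returns only organic results, so callers that never pass the new flag see the same output as before.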

This commit adds a utility function `to_df` that converts search results to a Pandas DataFrame for easier manipulation and analysis.

- Created a `to_df` function that takes an iterable of `SearchResult` objects.
- The function iterates over the search results, extracting `url`, `title`, and `description`, plus `is_sponsored` when present.
- Returns a Pandas DataFrame containing these fields.

This feature enhances data manipulation capabilities, making it easier to process and analyze search results.
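A minimal sketch of a `to_df` along the lines the commit describes, assuming each result exposes `url`, `title` and `description` attributes. The `SearchResult` dataclass is a stand-in for illustration, and using `hasattr` to make `is_sponsored` optional is this sketch's interpretation of "if present", not code taken from the PR.

```python
from dataclasses import dataclass
from typing import Iterable

import pandas as pd


@dataclass
class SearchResult:
    """Hypothetical stand-in for the library's SearchResult class."""

    url: str
    title: str
    description: str
    is_sponsored: bool = False


def to_df(results: Iterable) -> pd.DataFrame:
    """Build one row per result with url, title, description, and is_sponsored if the object has it."""
    rows = []
    for result in results:
        row = {"url": result.url, "title": result.title, "description": result.description}
        # Only add the column when the attribute exists, so objects without it still convert.
        if hasattr(result, "is_sponsored"):
            row["is_sponsored"] = result.is_sponsored
        rows.append(row)
    return pd.DataFrame(rows)


df = to_df([
    SearchResult("https://example.com", "Example", "Organic result"),
    SearchResult("https://ads.example.com", "Ad", "Sponsored result", is_sponsored=True),
])
print(df[df["is_sponsored"]]["url"].tolist())  # ['https://ads.example.com']
```

Collecting plain dicts and calling `pd.DataFrame` once avoids growing the frame row by row. If only some objects have `is_sponsored`, pandas fills the missing cells with NaN, so a boolean filter like the one above would need those gaps filled first.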

This commit updates the README to reflect the recent changes and new features added to the library:

- Added section on fetching sponsored results using the `sponsored` parameter.
- Included information about the `to_df` function for converting search results to a Pandas DataFrame.

The update aims to provide users with a comprehensive guide to using the latest version of the library.
Nv7-GitHub (Owner) left a comment:

Looks good, have you tested this? Thanks for the PR!

@@ -4,7 +4,7 @@
from requests import get
from .user_agents import get_useragent
import urllib

import pandas as pd
Owner: Did you add this to requirements.txt?

Author: Just did :)

@@ -52,29 +70,52 @@ def search(term, num_results=10, lang="en", proxy=None, advanced=False, sleep_in
start = 0
while start < num_results:
# Send request
resp = _req(escaped_term, num_results - start,
resp = _req(escaped_term,num_results - start,
Owner: Why did you delete the space?

Author: It could be from the formatter.

engjellavdiu (Author) commented:

> Looks good, have you tested this? Thanks for the PR!

Yes.

Nv7-GitHub (Owner) left a comment:

It looks good. I need to pull the code and test it on my machine, but unfortunately I am extremely busy with school and other projects right now, so I don't really have time to do this. I'll try to get it done whenever I have time.

@@ -1,2 +1,3 @@
beautifulsoup4>=4.9
requests>=2.20
pandas
Owner: Probably add a version constraint, just to make sure it keeps working in the future and we have fewer version issues.

Successfully merging this pull request may close this issue: Extract ads links (possible improvement).