API Reference
Trustpilot scraping Python package
- fakepilot.extract_info(file, with_reviews=False, nreviews=5)
Return the information of a company page.
- Parameters:
- Returns:
Company’s information: name ('name'), URL ('url'), number of reviews in Trustpilot ('nreviews'), score ('score') and whether the company’s profile is claimed ('is_claimed') by the company. The categories, email ('email'), phone number ('phone'), address ('address') and rating distribution ('rating_distribution') are also included if they are on the page. The reviews are included under the key 'reviews'; for each one, the returned values are the author’s name ('author_name'), the author’s id ('author_id') in Trustpilot, the rating ('star_rating'), the date of publication ('date'), the text content ('content'), the number of reviews made by the author of the review ('nreviews'), the country that the author is from ('country'), the date of experience ('date_experience') and whether the review is verified ('is_verified').
- Return type:
dict
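As an illustration of how the documented keys might be consumed, here is a minimal sketch. It does not call fakepilot; `summarize_company` and the `sample` dictionary are hypothetical, and the sample values (including the integer ratings) are invented for the example. The point is only that keys described as optional are best read with `.get()`.

```python
def summarize_company(info):
    """Build a one-line summary from a dict shaped like the documented return value."""
    # Keys documented as always present.
    summary = f"{info['name']} ({info['url']}): {info['nreviews']} reviews"
    summary += ", claimed" if info["is_claimed"] else ", unclaimed"
    # Keys documented as optional: .get() avoids a KeyError when they are absent.
    email = info.get("email")
    if email is not None:
        summary += f", contact: {email}"
    # Reviews may be absent; default to an empty list.
    ratings = [review["star_rating"] for review in info.get("reviews", [])]
    if ratings:
        summary += f", sampled ratings: {ratings}"
    return summary


# Hypothetical data for illustration only; not real Trustpilot content.
sample = {
    "name": "Example Shop",
    "url": "www.example.com",
    "nreviews": 120,
    "is_claimed": True,
    "reviews": [{"star_rating": 5}, {"star_rating": 3}],
}
print(summarize_company(sample))
```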
- fakepilot.get_reviews(company_page, nreviews)
Get the reviews’ data included in a company’s Trustpilot page.
The number of extracted reviews is the minimum of nreviews and the number of reviews in the company’s page.
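The "minimum of both" rule can be sketched as follows. This is not the package's implementation; `take_reviews` is a hypothetical helper operating on a plain list, and the guard against negative values is an added assumption.

```python
def take_reviews(reviews_on_page, nreviews):
    """Return at most nreviews items from the reviews found on a page."""
    # Never request more than the page holds, and never a negative amount.
    count = max(0, min(nreviews, len(reviews_on_page)))
    return reviews_on_page[:count]


# Placeholder strings standing in for parsed reviews.
page_reviews = ["review A", "review B", "review C"]
print(len(take_reviews(page_reviews, 2)))
print(len(take_reviews(page_reviews, 10)))
```

Asking for more reviews than the page contains therefore returns every review on the page rather than raising an error.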
Defines how the data is scraped from the Trustpilot site.
- fakepilot.xray.concat_strings(node)
Concatenate the strings contained in node into a single, complete string.
We need to check whether node contains just one string or several. In the latter case, we need to concatenate them. See https://www.crummy.com/software/BeautifulSoup/bs4/doc/#string
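The one-string versus several-strings distinction can be shown with a small analogue. The sketch below deliberately uses the standard library's `xml.etree.ElementTree` as a stand-in rather than BeautifulSoup, so it runs without extra dependencies; `concat_text` is a hypothetical name and is not the package's `concat_strings`.

```python
import xml.etree.ElementTree as ET


def concat_text(element):
    """Join every text fragment under element into one string."""
    # itertext() walks the element and its descendants in document order.
    return "".join(element.itertext())


single = ET.fromstring("<p>Great service</p>")
nested = ET.fromstring("<p>Great <b>service</b> overall</p>")

# With one text child, the direct text attribute is already complete.
print(repr(single.text))
# With nested markup, the direct text attribute holds only the first fragment.
print(repr(nested.text))
# Joining all fragments recovers the full sentence.
print(concat_text(nested))
```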
- fakepilot.xray.extract_contact_info(tag)
Extract the phone, address and email fields.
- Returns:
A tuple whose first element is the phone number, followed by the email and finally the address.
- fakepilot.xray.extract_is_claimed(tag)
Indicate if the Trustpilot company’s page is claimed by the company.
- fakepilot.xray.extract_number_reviews_author(tag)
Extract the number of reviews made by the author of the current review.
- fakepilot.xray.extract_percentage_stars(tag)
Extract the percentage of reviews that the company has received for each rating (1 star, 2 stars, etc.).
- fakepilot.xray.extract_rating_stats(tag)
Extract the number of reviews and the TrustScore.
Both attributes are extracted simultaneously because they are in the same tag.
- fakepilot.xray.extract_review_content(tag)
Extract the content or body of the review.
It is returned as a Unicode string.
- fakepilot.xray.extract_url(tag)
Return the URL of the company.
Trustpilot uses the company’s registered URL to uniquely identify a company. However, these URLs aren’t normalized: they can appear as www.company-site.es or as company-site.es. The returned value is the URL of the company as it is stored in Trustpilot.
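Because the stored URLs are not normalized, callers who need to compare companies may want a comparison key of their own. The following is a minimal sketch of one possible approach. `normalize_company_url` is a hypothetical helper, not part of fakepilot, and treating the www. prefix, the scheme, letter case and a trailing slash as insignificant is an assumption of this example.

```python
def normalize_company_url(url):
    """Return a comparison key for a company URL (hypothetical helper)."""
    key = url.strip().lower()
    # Drop a scheme if one is present.
    for scheme in ("https://", "http://"):
        if key.startswith(scheme):
            key = key[len(scheme):]
    # Drop a leading "www." and any trailing slash.
    if key.startswith("www."):
        key = key[len("www."):]
    return key.rstrip("/")


# Both spellings mentioned above map to the same key.
print(normalize_company_url("www.company-site.es") == normalize_company_url("company-site.es"))
```

The value returned by extract_url is best kept unchanged when it is used to identify the company in Trustpilot; a key like this one is only useful for comparisons on the caller’s side.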