API Reference

Trustpilot scraping Python package

fakepilot.extract_info(file, with_reviews=False, nreviews=5)

Return the information of a company page.

Parameters:
  • file (file object) – Company’s Trustpilot page.

  • with_reviews (bool, optional) – Indicates whether the company’s reviews are extracted.

  • nreviews (int, optional) – Number of reviews to be extracted. Ignored if with_reviews is False.

Returns:

Company’s information: name ('name'), URL ('url'), number of reviews in Trustpilot ('nreviews'), score ('score') and whether the company’s profile is claimed by the company ('is_claimed'). The categories, email ('email'), phone number ('phone'), address ('address') and rating distribution ('rating_distribution') are also included if they are on the page.

If with_reviews is True, the reviews are included under the key 'reviews'. For each review, the returned values are the author’s name ('author_name'), the author’s id in Trustpilot ('author_id'), the rating ('star_rating'), the date of publication ('date'), the text content ('content'), the number of reviews written by the author ('nreviews'), the author’s country ('country'), the date of experience ('date_experience') and whether the review is verified ('is_verified').

Return type:

dict(str, )
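
Since several keys are only present when the page provides them, callers may want to read those with dict.get. The following is a minimal sketch of that pattern. The dict below is a hand-written placeholder with invented values standing in for a real result; it is not actual output of the package.

```python
# Placeholder standing in for the dict that a call such as
#   fakepilot.extract_info(file, with_reviews=True, nreviews=2)
# is documented to return. All values are invented for illustration.
info = {
    "name": "Example Company",
    "url": "www.example.com",
    "nreviews": 120,
    "is_claimed": True,
    "reviews": [
        {"author_name": "A. Reader", "star_rating": 5, "content": "Great."},
        {"author_name": "B. Writer", "star_rating": 2, "content": "Slow."},
    ],
}

# Keys that are always documented as present can be indexed directly.
print(info["name"], info["nreviews"])

# Optional keys may be missing, so fall back to a default.
email = info.get("email")          # None when the page has no email
phone = info.get("phone", "n/a")   # explicit default instead of None

# 'reviews' is only present when with_reviews is True.
ratings = [review["star_rating"] for review in info.get("reviews", [])]
print(ratings)
```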

fakepilot.get_reviews(company_page, nreviews)

Get the reviews’ data included in a company’s Trustpilot page.

The number of extracted reviews is the minimum of nreviews and the number of reviews in the company’s page.

Parameters:
  • company_page (bs4.BeautifulSoup) – Parsed HTML of the company’s page from which the reviews are extracted.

  • nreviews (int) – Number of reviews to be extracted.

Returns:

Reviews of a company.

Return type:

list(dict(str,))
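
The minimum rule described above can be illustrated without any scraping. This is a sketch of the documented behaviour only, not the package's implementation; take_reviews and the sample list are hypothetical names introduced for the example.

```python
def take_reviews(all_reviews, nreviews):
    """Return at most nreviews items, mirroring the documented minimum rule."""
    count = min(nreviews, len(all_reviews))
    return all_reviews[:count]


# Invented stand-in for the reviews found on a page.
page_reviews = [{"content": "first"}, {"content": "second"}, {"content": "third"}]

few = take_reviews(page_reviews, 2)    # fewer requested than available
many = take_reviews(page_reviews, 10)  # more requested than available
print(len(few), len(many))
```

Asking for more reviews than the page contains is therefore not an error under this rule; the result is simply shorter than requested.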

Defines how the data is scraped from the Trustpilot site.

fakepilot.xray.concat_strings(node)

Concatenate the strings contained in node into a single, complete string.

A node may contain one string or several. When it contains several, they have to be concatenated explicitly. See https://www.crummy.com/software/BeautifulSoup/bs4/doc/#string
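
The distinction can be sketched with BeautifulSoup's .string and .strings attributes. This is an assumption about how such a helper might look, not the package's actual code; concat_strings_sketch is a hypothetical name.

```python
from bs4 import BeautifulSoup


def concat_strings_sketch(node):
    """Join every string found under node into one string."""
    if node.string is not None:
        # The node holds a single string, so .string returns it directly.
        return str(node.string)
    # With several strings, .string is None, so join .strings instead.
    return "".join(node.strings)


soup = BeautifulSoup("<p>Hello, <b>world</b>!</p><i>single</i>", "html.parser")
joined = concat_strings_sketch(soup.p)  # <p> holds three separate strings
single = concat_strings_sketch(soup.i)  # <i> holds exactly one string
print(joined)
print(single)
```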

fakepilot.xray.extract_authors_country(tag)

Extract the country where the author is from.

fakepilot.xray.extract_categories(tag)

Return the company’s category list.

fakepilot.xray.extract_company_info(tag)

Extract the data of a company.

fakepilot.xray.extract_company_name(tag)

Return the name of the company.

fakepilot.xray.extract_contact_info(tag)

Extract the phone, email and address fields.

Returns:

A tuple with the phone number, the email and the address, in that order.

fakepilot.xray.extract_date_experience(tag)

Extract the date of experience of the review.

fakepilot.xray.extract_is_claimed(tag)

Indicate whether the company’s Trustpilot page is claimed by the company.

fakepilot.xray.extract_is_verified(tag)

Extract whether the review is verified.

fakepilot.xray.extract_number_reviews_author(tag)

Extract the number of reviews made by the author of the current review.

fakepilot.xray.extract_percentage_stars(tag)

Extract the percentage of reviews that the company has received for each rating (1 star, 2 stars, etc.).

fakepilot.xray.extract_rating_stats(tag)

Extract the number of reviews and the TrustScore.

Both attributes are extracted simultaneously because they are in the same tag.

fakepilot.xray.extract_review_author_id(tag)

Extract the review’s author id.

fakepilot.xray.extract_review_author_name(tag)

Extract the name of the review’s author.

fakepilot.xray.extract_review_content(tag)

Extract the content or body of the review.

It is returned as a Unicode string.

fakepilot.xray.extract_review_date(tag)

Extract the date the review was posted.

fakepilot.xray.extract_review_info(tag)

Extract the review’s data.

fakepilot.xray.extract_review_rating(tag)

Extract the rating in the review.

fakepilot.xray.extract_review_title(tag)

Extract the title of the review.

fakepilot.xray.extract_url(tag)

Return the URL of the company.

Trustpilot uses the company’s registered URL to uniquely identify a company. However, these URLs are not normalized: the same site may appear as www.company-site.es or as company-site.es.

Returns:

URL of the company as it is stored in Trustpilot.
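
Because the URL is returned as stored, callers who compare or deduplicate companies may want to normalize it themselves. The helper below is a hypothetical sketch and is not part of fakepilot; it only lowercases the value and drops a leading "www.", which is one possible convention among several.

```python
def normalize_company_url(url):
    """Lowercase the URL and drop a leading 'www.' (hypothetical helper)."""
    url = url.strip().lower()
    if url.startswith("www."):
        url = url[len("www."):]
    return url


# Both spellings mentioned above collapse to the same key.
first = normalize_company_url("www.company-site.es")
second = normalize_company_url("company-site.es")
same = first == second
print(first, second, same)
```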

fakepilot.xray.has_attr(attr_name)

Return a function that checks if a tag has an attribute.
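
A function of this kind is typically a closure that can be passed to BeautifulSoup's find_all as a filter. The sketch below shows the pattern under that assumption; has_attr_sketch is a hypothetical stand-in and may differ from the package's implementation.

```python
from bs4 import BeautifulSoup


def has_attr_sketch(attr_name):
    """Return a predicate that is true for tags carrying attr_name."""
    def check(tag):
        # Tag.has_attr reports whether the attribute is present on the tag.
        return tag.has_attr(attr_name)
    return check


html = '<div data-id="1">a</div><div>b</div><span data-id="2">c</span>'
soup = BeautifulSoup(html, "html.parser")

# find_all accepts a function and keeps the tags for which it returns True.
matches = soup.find_all(has_attr_sketch("data-id"))
names = [tag.name for tag in matches]
print(names)
```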

fakepilot.xray.parse_page(page)

Parse page with BeautifulSoup.

The lxml parser is used if it is installed. Otherwise, html.parser is used.

Parameters:

page (str) – HTML document to be parsed.

Returns:

Page parsed as a BeautifulSoup object.

Return type:

bs4.BeautifulSoup
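
The parser fallback described for parse_page can be sketched as follows. This is one possible way to implement such a fallback, written under the assumption that availability is checked by importing lxml; parse_page_sketch is a hypothetical name and the package may do it differently.

```python
from bs4 import BeautifulSoup


def parse_page_sketch(page):
    """Parse an HTML string, preferring lxml when it can be imported."""
    try:
        import lxml  # noqa: F401  (imported only to check availability)
        parser = "lxml"
    except ImportError:
        # Fall back to the parser bundled with the standard library.
        parser = "html.parser"
    return BeautifulSoup(page, parser)


soup = parse_page_sketch("<html><body><h1>Example Company</h1></body></html>")
title = soup.h1.get_text()
print(title)
```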