API Reference

Defines how the data is scraped from the Trustpilot site.

fakepilot.xray.extract_author_id(tag)

Extract the review’s author id.

fakepilot.xray.extract_author_name(tag)

Extract the name of the review’s author.

fakepilot.xray.extract_categories(tag)

Return the company’s category list.

fakepilot.xray.extract_company_info(tag)

Extract the data of a company.

fakepilot.xray.extract_company_name(tag)

Return the name of the company.

fakepilot.xray.extract_contact_info(tag)

Extract the phone, email and address fields.

Returns:

A tuple of three elements: the phone number, then the email, and finally the address.
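Because the result is positional, unpacking it in the documented order keeps the fields from being mixed up. The following is a minimal sketch rather than fakepilot code: the `contact` tuple is a placeholder standing in for a real call, its values are made up, and the `ContactInfo` name is hypothetical.

```python
from typing import NamedTuple


class ContactInfo(NamedTuple):
    """Hypothetical wrapper; fakepilot itself is documented to return a plain tuple."""

    phone: str
    email: str
    address: str


# Placeholder standing in for extract_contact_info(tag); the values are made up.
contact = ("+00 000 000 000", "info@example.com", "1 Example Street")

# Unpack in the documented order: phone, then email, then address.
phone, email, address = contact

# Optionally give the positions names for more readable access.
info = ContactInfo(*contact)
```

Naming the positions is a convenience for the caller only; it does not change what the function returns.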

fakepilot.xray.extract_content(tag)

Extract the content or body of the review.

It is returned as a Unicode string.

fakepilot.xray.extract_date(tag)

Extract the date the review was posted.

fakepilot.xray.extract_rating(tag)

Extract the rating in the review.

fakepilot.xray.extract_rating_stats(tag)

Extract the number of reviews and the TrustScore.

Both attributes are extracted simultaneously because they are in the same tag.

fakepilot.xray.extract_review_info(tag)

Extract the review’s data.

fakepilot.xray.extract_url(tag)

Return the URL of the company.

Trustpilot uses the company’s registered URL to uniquely identify a company. However, these URLs aren’t normalized: the same site may be stored as www.company-site.es or as company-site.es. The URL is returned as it is stored in Trustpilot.
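Since the stored URL may or may not carry a www. prefix, callers who compare or deduplicate companies may want to normalize it themselves. The following is a minimal sketch under that assumption; `normalize_company_url` is a hypothetical helper and is not part of fakepilot.

```python
from urllib.parse import urlsplit


def normalize_company_url(url: str) -> str:
    """Hypothetical helper: reduce a company URL to a bare lowercase host."""
    # urlsplit only fills in the network location when "//" is present,
    # so prepend it for bare hosts such as "company-site.es".
    if "//" not in url:
        url = "//" + url
    # .hostname is already lowercased and excludes any port or credentials.
    host = urlsplit(url).hostname or ""
    # Drop a single leading "www." so both spellings compare equal.
    if host.startswith("www."):
        host = host[len("www."):]
    return host
```

With this helper, www.company-site.es and company-site.es both reduce to the same key. Stripping www. is a design choice that fits this use case; it is not a general rule that the two hosts always serve the same site.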

fakepilot.xray.parse_page(page)

Parse page with BeautifulSoup.

The lxml parser is used if it is installed. Otherwise, html.parser is used.

Parameters:

page (str) – HTML document to be parsed.

Returns:

The parsed page as a BeautifulSoup object.

Return type:

bs4.BeautifulSoup
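The parser fallback described for parse_page can be sketched as follows. This illustrates the idea and is not the library’s actual implementation: `choose_parser` and `parse_page_sketch` are hypothetical names, and the second function assumes the beautifulsoup4 package is installed.

```python
import importlib.util


def choose_parser() -> str:
    """Return "lxml" when the lxml package is importable, else "html.parser"."""
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


def parse_page_sketch(page: str):
    """Illustrative stand-in for parse_page; requires beautifulsoup4."""
    # Imported lazily so choose_parser() stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    # The second argument names the parser BeautifulSoup should use.
    return BeautifulSoup(page, choose_parser())
```

Choosing the parser explicitly keeps results predictable, since BeautifulSoup otherwise picks whichever parser it finds and warns that none was specified.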