API Reference

Defines how the data is scraped from the Trustpilot site.

fakepilot.xray.extract_author_id(tag): Extract the review’s author id.

fakepilot.xray.extract_author_name(tag): Extract the review’s author’s name.

fakepilot.xray.extract_categories(tag): Return the company’s category list.

fakepilot.xray.extract_company_info(tag): Extract the data of a company.

fakepilot.xray.extract_company_name(tag): Return the name of the company.

fakepilot.xray.extract_contact_info(tag)

Extract the phone, email and address fields.

Returns:: A tuple of three elements: the phone number, the email and the address, in that order.

fakepilot.xray.extract_content(tag)

Extract the content or body of the review.

It is returned as a Unicode string.

fakepilot.xray.extract_date(tag): Extract the date the review was posted.

fakepilot.xray.extract_rating(tag): Extract the rating in the review.

fakepilot.xray.extract_rating_stats(tag)

Extract the number of reviews and the TrustScore.

Both attributes are extracted simultaneously because they are in the same tag.
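
The idea of reading both values from one tag can be sketched as follows. This is only an illustration: the sample text, the pattern and the helper name `split_rating_stats` are assumptions made for the example, not the markup Trustpilot actually serves nor the code fakepilot uses.

```python
import re
from typing import Tuple

# Hypothetical text of the shared tag; the real Trustpilot markup may differ.
SAMPLE_TEXT = "1,234 reviews | TrustScore 4.5"

# One pattern captures both values in a single pass over the same text.
STATS_PATTERN = re.compile(r"([\d,]+)\s+reviews.*?TrustScore\s+(\d+(?:\.\d+)?)")


def split_rating_stats(text: str) -> Tuple[int, float]:
    """Return (number of reviews, TrustScore) parsed from one piece of text."""
    match = STATS_PATTERN.search(text)
    if match is None:
        raise ValueError(f"unrecognized rating text: {text!r}")
    # Drop thousands separators before converting the review count.
    review_count = int(match.group(1).replace(",", ""))
    trust_score = float(match.group(2))
    return review_count, trust_score


review_count, trust_score = split_rating_stats(SAMPLE_TEXT)
```

Returning both values from a single call avoids locating and parsing the same tag twice.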

fakepilot.xray.extract_review_info(tag): Extract the review’s data.

fakepilot.xray.extract_url(tag)

Return the URL of the company.

Trustpilot uses the company’s registered URL to uniquely identify a company. However, these URLs aren’t normalized: the same site may appear as www.company-site.es or company-site.es. The URL is returned as it is stored in Trustpilot.
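
Since the stored URLs are not normalized, code that compares companies may want to normalize them first. A minimal sketch, assuming the values are bare host names like the two examples above; `normalize_company_url` is a hypothetical helper and not part of fakepilot.

```python
def normalize_company_url(url: str) -> str:
    """Lower-case a company URL and drop a leading "www." prefix.

    Assumes the input is a bare host name without scheme or path.
    """
    url = url.strip().lower()
    if url.startswith("www."):
        url = url[len("www."):]
    return url


# Both spellings of the same site collapse to one key.
same_site = (
    normalize_company_url("www.company-site.es")
    == normalize_company_url("company-site.es")
)
```

Keep the original value when it is needed to look the company up on Trustpilot, and use the normalized form only for comparisons.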

fakepilot.xray.parse_page(page)

Parse the page with BeautifulSoup.

The lxml parser is used if it is installed; otherwise, html.parser is used.

Parameters:: page (str) – HTML document to be parsed.
Returns:: The parsed page as a BeautifulSoup object.
Return type:: bs4.BeautifulSoup
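
The parser fallback described for parse_page can be sketched like this. It is one possible way to do it, not necessarily how fakepilot implements it; `choose_parser` and `parse_page_sketch` are hypothetical names used only for illustration.

```python
from importlib.util import find_spec


def choose_parser() -> str:
    """Return "lxml" when the lxml package is importable, else "html.parser"."""
    return "lxml" if find_spec("lxml") is not None else "html.parser"


def parse_page_sketch(page: str):
    """Parse an HTML document with the best parser available."""
    # Imported here so choose_parser() stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    return BeautifulSoup(page, choose_parser())
```

Checking for lxml up front keeps the choice explicit: "lxml" and "html.parser" are both parser names that BeautifulSoup accepts as its second argument.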