AI tool that generates parsing code for scraping web pages with similar styles. Users input web page URLs for training, choose a parser prompt, and submit to build the parser.
Automatically generates code to parse web pages with similar visual styles. Users input URLs from the same website to train the parser.
Allows users to choose and customize parsing prompts to tailor the data extraction process.
When a training case shows 'Fail to Load' because the site has blocked the HTML downloader, users can save the page's HTML themselves and upload it to Coparser to train the parser.
Provides guidance and ready-to-use commands for installing the Python packages, such as lxml and playwright, that are needed to run the generated code.
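A minimal sketch of checking that these packages are present before running generated code. The install commands in the comments are the usual pip and Playwright ones, not necessarily the exact commands Coparser displays. The helper name missing_packages is hypothetical, and cssselect is included on the assumption that the generated code uses lxml's CSS selector support, which depends on it.

```python
# Typical install commands (assumed; check the tool's own guidance):
#   pip install lxml cssselect playwright
#   playwright install chromium
from importlib.util import find_spec
from typing import Iterable, List

# Assumed package list; adjust it to whatever the generated code imports.
REQUIRED = ("lxml", "cssselect", "playwright")


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```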
Allows users to add, delete, and manage training cases that help tailor the parser to specific inputs or web page structures.
Enables the user to regenerate code based on alterations or new training data, ensuring that the parser remains effective with updated website layouts.
Allows users to add and manage training cases for the parser. It supports attaching specific URLs to train the parser on different datasets, ensuring accuracy in data extraction.
Empowers users to generate and modify custom parser code using the provided input and sample data, offering flexibility to tailor data extraction logic to specific needs.
Includes visual output review features to verify and validate the extraction results against provided sample pages, helping ensure the parser is functioning correctly.
Uses Playwright for browsing and CSSSelector for parsing HTML to extract product details from web pages.
Extracts and cleans price information from the product page using CSS selectors.
Retrieves the product name from the HTML content using a CSS selector for the product name element.
Determines the availability of a product by checking specific HTML elements for delivery information.
Extracts the URL of the product image from the HTML content using a CSS selector.
Utilizes Playwright to launch a browser and extract the HTML content from a given Amazon product URL, which is then saved to a local file for further processing.
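A rough sketch of this download-and-save step using Playwright's synchronous API. The function names, the timeout value, and the choice of headless Chromium are assumptions made for illustration; the generated code may differ.

```python
from pathlib import Path


def fetch_html(url: str, timeout_ms: int = 30_000) -> str:
    """Open the URL in headless Chromium and return the page's HTML."""
    # Imported here so the file helpers below work even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


def save_html(html_text: str, path: Path) -> Path:
    """Write the HTML to a local file as UTF-8 and return the path."""
    path.write_text(html_text, encoding="utf-8")
    return path


def load_html(path: Path) -> str:
    """Read a previously saved HTML file back into a string."""
    return path.read_text(encoding="utf-8")


# Example usage (placeholder URL; needs Playwright and a browser installed):
#   saved = save_html(fetch_html("https://example.com/"), Path("page.html"))
#   html_text = load_html(saved)
```

Keeping the download separate from parsing means a page saved by hand, for instance after a 'Fail to Load' result, can be parsed through the same load_html path.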
Extracts the sale price of a product from the HTML content using a CSS selector, handling exceptions when the data is not found on the first attempt.
Gathers a price range when multiple prices are present by extracting data from the specified CSS selectors.
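Once the price text has been selected, cleaning it and reducing several prices to a range might look like the sketch below. It assumes US-style formatting (comma as thousands separator, dot as decimal point), and the names clean_price and price_range are hypothetical.

```python
import re
from decimal import Decimal
from typing import Iterable, Optional, Tuple

# Digits with optional thousands commas and an optional decimal part.
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def clean_price(text: Optional[str]) -> Optional[Decimal]:
    """Return the first price found in the text, or None if there is none."""
    match = _PRICE_RE.search(text or "")
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def price_range(texts: Iterable[str]) -> Optional[Tuple[Decimal, Decimal]]:
    """Return (lowest, highest) over every price found in the given texts."""
    prices = [
        Decimal(raw.replace(",", ""))
        for text in texts
        for raw in _PRICE_RE.findall(text or "")
    ]
    if not prices:
        return None
    return min(prices), max(prices)


print(clean_price("$1,299.99"))
print(price_range(["$19.99 - $24.99"]))
```

Decimal is used rather than float so that prices are not altered by binary rounding.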
Extracts the product name from the HTML content using defined CSS selectors to locate the product title and retrieve its text value.
Fetches the total number of customer reviews available for a product by selecting the appropriate CSS element and retrieving its text.
Checks and retrieves the availability status of a product by inspecting the designated CSS selector for stock information.
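One way to turn the selected stock text into a status is a simple phrase check, sketched below. The phrase lists are guesses made for illustration, since real pages may word availability differently, and availability_from_text is a hypothetical name.

```python
from typing import Optional

# Hypothetical phrase lists; adjust them to the wording of the target site.
OUT_OF_STOCK_HINTS = ("out of stock", "currently unavailable")
IN_STOCK_HINTS = ("in stock", "delivery", "available to ship")


def availability_from_text(text: Optional[str]) -> Optional[bool]:
    """Return True or False when the text signals stock status, else None."""
    if not text:
        return None
    # Collapse whitespace and lowercase so the phrase checks are uniform.
    lowered = " ".join(text.split()).lower()
    # Negative phrases are checked first so they take precedence.
    if any(hint in lowered for hint in OUT_OF_STOCK_HINTS):
        return False
    if any(hint in lowered for hint in IN_STOCK_HINTS):
        return True
    return None
```

Returning None for unrecognized text keeps "unknown" distinct from "out of stock".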
Extracts URLs of the product images using CSS selectors, ensuring that visual content is included in the parsed data.
Calculates and returns the average review score for a product by extracting and converting the numeric rating from the HTML content.
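The review count and average score steps both come down to converting selected text into numbers. A small sketch follows, assuming the text looks like "1,234 ratings" and "4.5 out of 5 stars"; those formats and the function names are assumptions, not taken from the generated code.

```python
import re
from typing import Optional


def parse_review_count(text: Optional[str]) -> Optional[int]:
    """Return the first integer in the text, ignoring thousands commas."""
    match = re.search(r"\d[\d,]*", text or "")
    return int(match.group().replace(",", "")) if match else None


def parse_average_rating(text: Optional[str]) -> Optional[float]:
    """Return the first number in the text as a float, e.g. the 4.5 in a star rating."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


print(parse_review_count("1,234 ratings"))
print(parse_average_rating("4.5 out of 5 stars"))
```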
Automatically extracts detailed information about clothing products from Amazon, such as product names and details.
Allows users to add specific URLs as training cases to test and refine the parser's ability to extract data accurately.
Generates custom Python code to facilitate specific data extraction tasks using libraries like Playwright and CSS Selectors.
Automatically extracts detailed product information such as price, product name, and average review from Amazon links.
Allows users to add training cases for improving parser accuracy using specific Amazon product pages.
Users can regenerate the parsing code based on updated training cases or requirements.
Utilizes a Python-based parser to extract product details from websites using Playwright for browser automation.
Allows users to add and manage training cases for the parser, with URLs to example pages and status indicators for completion.
Automatically generates Python code based on the prompt and provided training cases, using libraries like Playwright and lxml to fetch and parse data.