Downloader Tools
enhancedtoolkits.downloader.URLContentDownloader ¶
URLContentDownloader(byparr_enabled: Optional[bool] = None, max_retries: int = URL_DOWNLOADER_MAX_RETRIES, timeout: int = URL_DOWNLOADER_TIMEOUT, user_agent_rotation: bool = True, enable_caching: bool = False, add_instructions: bool = True, **kwargs)
Bases: StrictToolkit
URL Content Downloader Tool v1.1
A production-ready universal file downloading toolkit with BYPARR integration, anti-bot bypass capabilities, and smart content processing for any file type.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
byparr_enabled | Optional[bool] | Whether to use BYPARR service (None = auto-detect) | None |
max_retries | int | Maximum number of retry attempts | URL_DOWNLOADER_MAX_RETRIES |
timeout | int | Request timeout in seconds | URL_DOWNLOADER_TIMEOUT |
user_agent_rotation | bool | Whether to rotate user agents | True |
enable_caching | bool | Whether to cache downloaded content | False |
add_instructions | bool | Whether to add LLM usage instructions | True |
Source code in src/enhancedtoolkits/downloader.py
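The following sketch shows how the downloader might be instantiated, using the module path from the signature above; the numeric values are illustrative overrides, not the library's URL_DOWNLOADER_* defaults.

```python
from enhancedtoolkits.downloader import URLContentDownloader

# Illustrative instantiation; numeric values are placeholders, not the
# URL_DOWNLOADER_MAX_RETRIES / URL_DOWNLOADER_TIMEOUT defaults.
downloader = URLContentDownloader(
    byparr_enabled=None,       # None = auto-detect the BYPARR service
    max_retries=3,             # placeholder retry count
    timeout=30,                # placeholder timeout in seconds
    user_agent_rotation=True,  # rotate user agents between requests
    enable_caching=False,      # do not cache downloaded content
    add_instructions=True,     # expose LLM usage instructions
)
```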
Functions¶
access_website_content ¶
Access, download, and parse website content from a URL, with anti-bot bypass. Automatically detects the content type and applies the appropriate processing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | URL to download content from | required |
output | str | Output format ("auto", "markdown", "text", "html", or "binary") | 'auto' |
Returns:
Type | Description |
---|---|
str | Parsed content in the specified format |
Raises:
Type | Description |
---|---|
URLDownloadError | If download fails |
ContentParsingError | If content parsing fails |
Source code in src/enhancedtoolkits/downloader.py
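A minimal usage sketch, assuming the `downloader` instance from the example above; the URL is a placeholder, and the exceptions are caught generically because their import path is not shown on this page.

```python
try:
    content = downloader.access_website_content(
        url="https://example.com/article",  # placeholder URL
        output="markdown",                  # or "auto", "text", "html", "binary"
    )
    print(content[:500])
except Exception as exc:  # URLDownloadError / ContentParsingError per the table above
    print(f"Download or parsing failed: {exc}")
```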
get_file_from_url ¶
Download any file from a URL with smart content processing. Uses MarkItDown for HTML content and handles binary files appropriately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | URL to download file from | required |
output | str | Output format ("auto", "markdown", "text", "html", or "binary") | 'auto' |
Returns:
Type | Description |
---|---|
str | Processed content or file information |
Raises:
Type | Description |
---|---|
URLDownloadError | If download fails |
ContentParsingError | If content parsing fails |
Source code in src/enhancedtoolkits/downloader.py
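A sketch of downloading an arbitrary file and letting the toolkit pick the processing strategy, assuming the same `downloader` instance; the URL is a placeholder.

```python
# "auto" lets the toolkit choose: MarkItDown for HTML content,
# binary handling for everything else.
result = downloader.get_file_from_url(
    url="https://example.com/report.pdf",  # placeholder URL
    output="auto",
)
print(result)
```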
download_multiple_urls ¶
Download content from multiple URLs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
urls | List[str] | List of URLs to download | required |
format | str | Output format for all URLs | 'markdown' |
Returns:
Type | Description |
---|---|
str | JSON string containing results for all URLs |
Source code in src/enhancedtoolkits/downloader.py
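Since the return value is documented as a JSON string, a caller would typically decode it. A sketch assuming the same `downloader` instance, with placeholder URLs:

```python
import json

results_json = downloader.download_multiple_urls(
    urls=[
        "https://example.com/page-1",  # placeholder URLs
        "https://example.com/page-2",
    ],
    format="markdown",
)
results = json.loads(results_json)  # results for all URLs
```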
get_url_metadata ¶
Extract metadata from a URL without downloading full content.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | URL to extract metadata from | required |
Returns:
Type | Description |
---|---|
str | JSON string containing URL metadata |
Source code in src/enhancedtoolkits/downloader.py
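A sketch of a metadata lookup, again decoding the documented JSON string; the URL is a placeholder, and the available fields are whatever the toolkit returns.

```python
import json

metadata = json.loads(
    downloader.get_url_metadata(url="https://example.com")  # placeholder URL
)
print(list(metadata.keys()))  # inspect which metadata fields were returned
```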
check_url_accessibility ¶
Check if a URL is accessible without downloading content.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | URL to check | required |
Returns:
Type | Description |
---|---|
str | JSON string with accessibility status |
Source code in src/enhancedtoolkits/downloader.py
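A sketch of a pre-flight check before committing to a full download; the URL is a placeholder, and since the field names inside the returned JSON are not specified here, the result is simply printed.

```python
import json

status = json.loads(
    downloader.check_url_accessibility(url="https://example.com")  # placeholder URL
)
print(status)  # accessibility status as reported by the toolkit
```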
get_llm_usage_instructions staticmethod ¶
Returns detailed instructions for LLMs on how to use the URL Content Downloader.
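Because this is a static method, it can be called on the class itself, for example to feed the instructions into an agent's system prompt; a minimal sketch:

```python
from enhancedtoolkits.downloader import URLContentDownloader

instructions = URLContentDownloader.get_llm_usage_instructions()
print(instructions)
```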