Downloading Tools (URL Content Downloader)

The DownloadingTools toolkit downloads content from URLs and returns either:

  • extracted text/markdown/html, or
  • minimal metadata for binary content (or when extraction is not possible)

It supports optional BYPARR integration for anti-bot bypass.

All public functions return strings.

📦 Installation

This toolkit requires markitdown (imported at module load time):

pip install markitdown

🤖 AI Agent Setup (Agno)

from agno.agent import Agent
from enhancedtoolkits import DownloadingTools

agent = Agent(
    name="Content Researcher",
    model="gpt-4",
    tools=[
        DownloadingTools(
            byparr_enabled=None,      # None = auto-detect via env
            max_retries=3,
            timeout=30,
            user_agent_rotation=True,
            enable_caching=False,
        )
    ],
)

⚙️ Configuration

Constructor parameters for DownloadingTools:

| Parameter | Type | Default | Notes |
|---|---|---|---|
| byparr_enabled | bool \| None | None | If None, uses env BYPARR_ENABLED |
| max_retries | int | URL_DOWNLOADER_MAX_RETRIES | Clamped internally (1..10) |
| timeout | int | URL_DOWNLOADER_TIMEOUT | Clamped internally (5..300) |
| user_agent_rotation | bool | True | Rotates headers/user agents |
| enable_caching | bool | False | In-memory per-instance cache (no TTL) |
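
The clamping noted in the table can be pictured with a small sketch. The `clamp` and `effective_settings` helpers and the two range constants are illustrative names, not the toolkit's internals. Only the ranges (1..10 and 5..300) come from the table.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Return value limited to the inclusive range [low, high]."""
    return max(low, min(high, value))


# Ranges taken from the table above; the constant names are hypothetical.
MAX_RETRIES_RANGE = (1, 10)
TIMEOUT_RANGE = (5, 300)


def effective_settings(max_retries: int, timeout: int) -> dict:
    """Show which values would take effect after clamping (illustration only)."""
    return {
        "max_retries": clamp(max_retries, *MAX_RETRIES_RANGE),
        "timeout": clamp(timeout, *TIMEOUT_RANGE),
    }


print(effective_settings(max_retries=50, timeout=1))
# {'max_retries': 10, 'timeout': 5}
```

Under this reading, out-of-range values are silently pulled to the nearest bound rather than rejected. Whether the toolkit also logs a warning is not stated here.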

Environment variables

  • BYPARR_URL (default http://byparr:8191/v1)
  • BYPARR_TIMEOUT (default 60)
  • BYPARR_ENABLED (true/false)
  • URL_DOWNLOADER_MAX_RETRIES (default 3)
  • URL_DOWNLOADER_TIMEOUT (default 30)
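
As a rough sketch of how these variables and the constructor's `byparr_enabled=None` auto-detection might fit together, the hypothetical `resolve_settings` helper below reads each variable with the defaults listed above. Two points are assumptions rather than documented behavior: that an unset BYPARR_ENABLED means disabled, and which spellings count as true.

```python
import os

# Assumption: spellings treated as "true"; the docs only mention true/false.
TRUTHY = {"1", "true", "yes", "on"}


def resolve_settings(byparr_enabled=None, env=None):
    """Hypothetical helper: merge an explicit flag with environment defaults."""
    env = os.environ if env is None else env
    if byparr_enabled is None:
        # Assumption: an unset BYPARR_ENABLED is treated as disabled.
        raw = env.get("BYPARR_ENABLED", "false")
        byparr_enabled = raw.strip().lower() in TRUTHY
    return {
        "byparr_enabled": byparr_enabled,
        "byparr_url": env.get("BYPARR_URL", "http://byparr:8191/v1"),
        "byparr_timeout": int(env.get("BYPARR_TIMEOUT", "60")),
        "max_retries": int(env.get("URL_DOWNLOADER_MAX_RETRIES", "3")),
        "timeout": int(env.get("URL_DOWNLOADER_TIMEOUT", "30")),
    }


# An explicit argument wins over the environment in this sketch.
print(resolve_settings(byparr_enabled=False, env={"BYPARR_ENABLED": "true"}))
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.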

📥 Available Functions

get_file_from_url(url, output='auto')

Downloads the content at url and returns it as a string in the requested output format.

output options (see DownloadingTools.SUPPORTED_FORMATS):

  • auto
  • markdown
  • text
  • html
  • binary

access_website_content(url, output='auto')

Alias for get_file_from_url().

download_multiple_urls(urls, output='auto')

Downloads up to 10 URLs and returns a JSON string of results.
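
Since a single call handles at most 10 URLs, a longer list has to be split up by the caller. The `batches` helper below is a minimal sketch of that splitting. It is not part of the toolkit, and what the toolkit itself does with an oversized list (truncate or reject) is not stated here.

```python
from typing import Iterator, List

BATCH_LIMIT = 10  # per-call limit stated above


def batches(urls: List[str], size: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of urls, each holding at most size items."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


urls = [f"https://example.com/page/{i}" for i in range(23)]
print([len(chunk) for chunk in batches(urls)])
# [10, 10, 3]
```

Each chunk could then be passed on as `downloader.download_multiple_urls(urls=chunk, output="text")`.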

get_url_metadata(url)

Makes a HEAD request and returns a JSON string with basic metadata.

check_url_accessibility(url)

Makes a HEAD request and returns a JSON string with accessibility + timing.
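
Because the JSON-returning functions hand back strings, callers need to decode them before use. The `decode_json_result` helper below is a hedged sketch of a defensive decode. The sample payload and its field names (`url`, `accessible`) are invented for illustration; the real response fields are not documented on this page.

```python
import json
from typing import Any, Optional


def decode_json_result(raw: str) -> Optional[Any]:
    """Parse a JSON string from a tool call; return None if it is not valid JSON."""
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None


# Stand-in string with hypothetical fields, used instead of a live request.
sample = '{"url": "https://example.com", "accessible": true}'

result = decode_json_result(sample)
if result is not None:
    print(result["accessible"])
```

Returning None on a decode failure lets the caller fall back to treating the string as plain text, for example an error message.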

✅ Examples

from enhancedtoolkits import DownloadingTools

downloader = DownloadingTools(byparr_enabled=False)

md = downloader.get_file_from_url("https://example.com", output="markdown")
meta = downloader.get_url_metadata("https://example.com")

batch = downloader.download_multiple_urls(
    urls=["https://example.com", "https://example.org"],
    output="text",
)

API Reference