What is a URL Node?

The URL Node lets you add a URL to a flow and scrape a website's HTML or metadata to use as input to the LLM. If an LLM node returns a URL as its output, that output can feed into the URL node, so scraping a website becomes one step in a more complex workflow.

Mode

Defines what type of content to fetch from the provided URL.

  • Page HTML: Downloads the full HTML content of the page. Suitable for use cases like content parsing, summarization, or extraction of visible page elements.
  • Metadata only: Fetches only metadata (e.g., <title>, <meta> tags such as description and Open Graph data). Useful for lightweight previews or indexing.
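The difference between the two modes can be illustrated with a minimal Python sketch. It is not the node's actual implementation: the function name `fetch_content`, the class `MetadataParser`, and the mode strings `"page_html"` and `"metadata_only"` are hypothetical names chosen for this example, and the HTML is passed in as a string rather than downloaded.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text and <meta> name/property -> content pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Standard tags use name="..."; Open Graph tags use property="og:...".
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_content(html, mode):
    """Hypothetical helper: full HTML for 'page_html', a dict for 'metadata_only'."""
    if mode == "page_html":
        return html
    if mode == "metadata_only":
        parser = MetadataParser()
        parser.feed(html)
        return {"title": parser.title.strip(), "meta": parser.meta}
    raise ValueError(f"unknown mode: {mode}")


SAMPLE_HTML = """<html><head>
<title>Example Page</title>
<meta name="description" content="A short summary.">
<meta property="og:title" content="Example OG Title">
</head><body><p>Hello</p></body></html>"""

metadata = fetch_content(SAMPLE_HTML, "metadata_only")
print(metadata)
```

Metadata-only output is much smaller than the full page, which is why it suits previews and indexing, while the full HTML is needed when the LLM has to read the visible content.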

Scrape Subpages

When enabled, the node will attempt to crawl and include linked subpages within the same domain.
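As a rough illustration of what "within the same domain" can mean, the sketch below collects the links on a page and keeps only those whose host matches the starting URL. How the node actually selects subpages is not documented here, so treat this as an assumption; `same_domain_links` and `LinkParser` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_domain_links(base_url, html):
    """Return absolute http(s) links on the page that share base_url's host."""
    parser = LinkParser()
    parser.feed(html)
    base_host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urldefrag(urljoin(base_url, href)).url
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skips mailto:, javascript:, and similar
        if parsed.netloc != base_host:
            continue  # skips links to other domains
        if absolute != base_url and absolute not in links:
            links.append(absolute)
    return links


PAGE = """<a href="/pricing">Pricing</a>
<a href="guide">Guide</a>
<a href="https://other.org/x">External</a>
<a href="#top">Top</a>
<a href="mailto:team@example.com">Mail</a>"""

subpages = same_domain_links("https://example.com/docs/", PAGE)
print(subpages)
```

A real crawler would also cap the number of pages and the link depth; those limits are left out to keep the sketch short.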

Enable URL as Input

When checked, the node accepts URLs dynamically from upstream nodes or user input instead of using a static value hardcoded in the interface.
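The choice between a static and a dynamic URL can be pictured with the sketch below. It assumes the upstream value is free text (for example an LLM response) that may contain a URL somewhere inside it. The function `resolve_url` and its parameters are hypothetical and do not correspond to any documented API of the node.

```python
import re
from urllib.parse import urlparse

# Matches an http(s) URL up to the next whitespace or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>]+")


def resolve_url(static_url, upstream_text=None, url_as_input=False):
    """Return the static URL, or the first URL found in upstream text."""
    if not url_as_input:
        return static_url
    match = URL_PATTERN.search(upstream_text or "")
    if match is None:
        raise ValueError("no URL found in upstream input")
    # Trim punctuation that often trails a URL inside a sentence.
    candidate = match.group(0).rstrip(".,;:!?)\"'")
    if not urlparse(candidate).netloc:
        raise ValueError(f"not a usable URL: {candidate}")
    return candidate


dynamic = resolve_url(
    "https://example.com",
    upstream_text="You can read more at https://docs.example.org/start. Thanks!",
    url_as_input=True,
)
print(dynamic)
```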

How to use the URL Node

  1. Add a URL node to your flow.
  2. Connect the URL node to an LLM node.
  3. Mention the URL node in the LLM node by pressing "/" and selecting the URL node.
  4. Add an Output node to your flow.
  5. Connect the LLM node to the Output node.

URL Node Settings

Click the gear icon on the node to see the available settings.

Chunking Settings

  • Chunking Algorithm: Defines how the data is split (e.g., Sentence-based).
  • Chunk Overlap: The number of overlapping tokens between chunks.
  • Chunk Length: The maximum length of each chunk sent to the LLM.
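To show how chunk length and chunk overlap interact, here is a minimal sliding-window sketch in Python. It is deliberately simpler than a sentence-based algorithm: it treats whitespace-separated words as tokens, which is an assumption, since the node's tokenizer is not specified. The name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens, chunk_length, chunk_overlap):
    """Split tokens into windows of chunk_length that share chunk_overlap tokens."""
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")
    if not 0 <= chunk_overlap < chunk_length:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_length")
    step = chunk_length - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_length])
        if start + chunk_length >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


words = "a b c d e f g h i j".split()
chunks = chunk_tokens(words, chunk_length=4, chunk_overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
# a b c d
# d e f g
# g h i j
```

Each chunk repeats the last token of the previous one, so context that straddles a boundary appears in both. A larger overlap preserves more context at the cost of sending more duplicated text to the LLM.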

Additional Features

  • Advanced Data Extraction: Enable more precise field-level parsing (toggle option).
  • Text in Images (OCR): Extract and include text from profile images or banners (toggle option).

How to expose the URL node to your users

  1. Go to the Export tab.
  2. Enable the URL Node in the Inputs section.
  3. Press Save Interface to save your changes.
  4. Your users should now see a URL input field in the interface.