How it works

  • The Web Connector crawls a site starting from the base URL you provide.
  • It indexes only content on the same domain and under the same base path as that URL.
  • The connector indexes pages that are reachable via hyperlinks starting from the base URL.
  • The text content of each page is cleaned up using heuristics, and metadata (e.g., the page title) is extracted.
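
The behavior described above can be sketched roughly as follows. This is an illustration only, not Hymalaia's actual implementation: the names PageParser, in_scope, crawl, and fetch are hypothetical, the "same base path" rule is assumed to be a simple path-prefix match, and the text cleanup is reduced to dropping script and style content.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class PageParser(HTMLParser):
    """Collects the <title>, visible text, and href targets of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text = []
        self.links = []
        self._in_title = False
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif not self._skip and data.strip():
            self.text.append(data.strip())


def in_scope(base_url, url):
    """Assumed rule: same host, and path equal to or nested under the base path."""
    base, cand = urlsplit(base_url), urlsplit(url)
    if cand.scheme not in ("http", "https"):
        return False
    if cand.netloc.lower() != base.netloc.lower():
        return False
    root = base.path.rstrip("/")
    return cand.path == root or cand.path.startswith(root + "/")


def crawl(base_url, fetch):
    """Breadth-first walk from base_url; fetch(url) returns HTML or None."""
    seen, queue, pages = {base_url}, deque([base_url]), {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        pages[url] = {"title": parser.title, "text": " ".join(parser.text)}
        for href in parser.links:
            # Resolve relative links and drop #fragments before the scope check.
            target = urldefrag(urljoin(url, href)).url
            if target not in seen and in_scope(base_url, target):
                seen.add(target)
                queue.append(target)
    return pages


# In-memory stand-in for a website, so the sketch runs without network access.
site = {
    "https://example.com/docs/": '<title>Docs</title><a href="intro">Intro</a><a href="/blog/">Blog</a>',
    "https://example.com/docs/intro": "<title>Intro</title><p>Hello</p><script>x()</script>",
    "https://example.com/blog/": "<title>Blog</title>",
    "https://example.com/docs/orphan": "<title>Orphan</title>",
}
pages = crawl("https://example.com/docs/", site.get)
```

In this sketch, the blog page is skipped because it lies outside the base path, and the orphan page is skipped because no hyperlink leads to it from the base URL.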

Setting up

Authorization

No additional authorization is necessary as long as the page is reachable.
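
Before indexing, it can be useful to confirm that a page really is reachable without credentials. The following is a minimal sketch using only the Python standard library; is_reachable and its opener parameter are hypothetical helpers introduced here for illustration, and treating "reachable" as "an unauthenticated GET returns a 2xx status" is an assumption, not a documented Hymalaia rule.

```python
import urllib.error
import urllib.request


def is_reachable(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if an unauthenticated GET of url ends in a 2xx response."""
    try:
        request = urllib.request.Request(
            url, headers={"User-Agent": "reachability-check"}
        )
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, TimeoutError, ValueError):
        # HTTP errors such as 401/403, connection failures, timeouts,
        # and malformed URLs all count as "not reachable".
        return False
```

The opener parameter exists only so the check can be exercised without network access; calling is_reachable with a single URL argument uses urllib.request.urlopen.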

Indexing

  1. Navigate to the Admin Dashboard and select the Web Connector.
  2. Enter the base URL to index and click Index.
  3. To check the status of the indexing, visit the Connectors Status page (top left).

Once indexing is complete, the content from the provided web pages will be available for search in Hymalaia.