The Ingestion API lets you send arbitrary documents directly to Hymalaia’s backend for indexing and search. This is useful when data doesn’t originate from an existing connector, or when you want to supplement or override specific content.


📚 Typical Uses

Use the Ingestion API when:

  • You have documents not tied to any connector but useful for search.
  • You want to programmatically ingest documents instead of setting up a connector.
  • You want to edit existing docs without altering their original source.
  • You want to enhance connector docs (e.g., attach a README to a GitHub project).

🚀 Example: Ingesting a Web Document

This example sends a document of type "web" to Hymalaia via curl.

curl --location 'localhost:8080/hymalaia-api/ingestion' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR_API_TOKEN>' \
--data '{
  "document": {
    "id": "ingestion_document_1",
    "sections": [
      {
        "text": "This is the contents of the document that will be processed and saved into the vector+keyword document index. ",
        "link": "https://docs.hymalaia.app/introduction#what-is-hymalaia"
      },
      {
        "text": "You can include multiple content sections each with their own link or combine them. ",
        "link": "https://docs.hymalaia.app/introduction#main-features"
      }
    ],
    "source": "web",
    "semantic_identifier": "Hymalaia Ingestion Example",
    "metadata": {
        "tag": "informational",
        "topics": ["hymalaia", "api"]
    },
    "doc_updated_at": "2024-04-25T08:20:00Z"
  },
  "cc_pair_id": 1
}'

ℹ️ Note: The bearer token is generated at server startup in Hymalaia MIT. For more robust auth, API Keys are available in Hymalaia EE.
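For callers that prefer a script over curl, the same request can be sketched in Python using only the standard library. The host, port, and token values below are placeholders carried over from the curl example, and the helper names `build_ingestion_request` and `ingest` are hypothetical, not part of Hymalaia.

```python
import json
import urllib.request

# Assumed values: adjust the host, port, and token for your deployment.
BASE_URL = "http://localhost:8080"
API_TOKEN = "<YOUR_API_TOKEN>"


def build_ingestion_request(document, cc_pair_id=1, base_url=BASE_URL, token=API_TOKEN):
    """Build (but do not send) a POST request for the ingestion endpoint."""
    body = json.dumps({"document": document, "cc_pair_id": cc_pair_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/hymalaia-api/ingestion",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def ingest(document, **kwargs):
    """Send the document and return the raw response body as text."""
    request = build_ingestion_request(document, **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Splitting request construction from sending makes it possible to inspect the URL, headers, and body before anything reaches the server. Calling `ingest(document)` requires a running Hymalaia instance and a valid token.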


🔍 Field Breakdown

| Field | Description |
| --- | --- |
| id | Unique document ID. If omitted, it’s generated from semantic_identifier. Existing docs with the same ID are updated. |
| sections | List of content sections. Each has text and optionally a link. Sections are used for chunking and influence search results. |
| source | Source type (e.g. "web"). The full list is found under DocumentSource in the Hymalaia code. |
| semantic_identifier | Acts as the title of the document in the UI. |
| metadata | Metadata such as tag or topics. These are displayed as document tags. Accepts a string or an array. |
| doc_updated_at | Timestamp of the last update. Hymalaia uses this to apply recency-based scoring. |
| cc_pair_id | Connector ID the doc should belong to. Use 1 for the default. This links the doc to connector groups and deletion. |
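The fields above can be assembled programmatically. The following is a minimal sketch, assuming the timestamp format shown in the curl example (`2024-04-25T08:20:00Z`) is what the server expects; `make_document` is a hypothetical helper, not a Hymalaia function.

```python
from datetime import datetime, timezone


def make_document(sections, source, semantic_identifier,
                  doc_id=None, metadata=None, updated_at=None):
    """Assemble the "document" object from (text, link) pairs and field values."""
    document = {
        # Each section carries text and, optionally, a link.
        "sections": [
            {"text": text, "link": link} if link else {"text": text}
            for text, link in sections
        ],
        "source": source,
        "semantic_identifier": semantic_identifier,
        # Metadata values may be strings or lists of strings.
        "metadata": metadata or {},
    }
    if doc_id is not None:
        # Re-sending the same id updates the existing document.
        document["id"] = doc_id
    if updated_at is not None:
        # Pass a timezone-aware datetime; it is converted to UTC and
        # formatted like the timestamp in the curl example.
        utc = updated_at.astimezone(timezone.utc)
        document["doc_updated_at"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    return document
```

Leaving `doc_id` unset omits the `id` field, in which case the ID is generated from `semantic_identifier` as described in the table.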

📥 Checking Ingested Documents

You can view all documents that have been indexed through the ingestion API using the corresponding endpoint (e.g. /hymalaia-api/ingestion-docs).
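A lookup against that endpoint might look like the sketch below. It assumes the endpoint accepts a GET request with the same bearer token as the ingestion call and returns JSON; neither detail is confirmed here, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_list_request(base_url="http://localhost:8080", token="<YOUR_API_TOKEN>"):
    """Build a GET request for the ingested-documents endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/hymalaia-api/ingestion-docs",
        # Assumption: the same bearer token used for ingestion applies here.
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_ingested_documents(**kwargs):
    """Fetch the response and decode it as JSON (the shape is not specified here)."""
    with urllib.request.urlopen(build_list_request(**kwargs)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Inspect the decoded result against your own deployment before relying on any particular field in it.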


📘 See Also


Need help or want to go deeper? Ping the Hymalaia team on Slack or Discord!