Skip to main content

Datasources

Most LLM models are trained on public datasets. These datasets are large and contain a lot of information but they might not be relevant to your use case. For example, if you are building a chatbot for your store, you might want the chatbot to respond to questions about your inventory, your return policy, etc. By combining the capabilities of Language Models with your custom data, you can create powerful and personalized AI applications.

LLMStack allows you to seamlessly integrate your custom data with the pre-trained language models. You can simply upload your files, add urls or connect your external data sources and start using them as data sources in your LLM apps. LLMStack does the heavy lifting for you by chunking, tokenizing and indexing your data into the included vectorstore. You can then use the vectorstore to augment your language models and build powerful AI applications.

Datasources

Supported Data Sources

LLMStack supports the following data sources:

Local Files

You can upload your files to LLMStack and use them as data sources. LLMStack supports the following file formats:

  • Text files
  • PDF files
  • CSV files
  • DOCX, PPTX, XLSX files
  • Markdown files
  • Media files (audio, video) - extract text from media files using speech to text APIs

URLs

You can add URLs to LLMStack and use them as data sources. LLMStack will download the content from the URL and use it as a data source. You can add sitemap URLs to LLMStack and it will crawl the sitemap and download the content from the URLs in the sitemap.

External Data Sources

You can connect your external data sources to LLMStack and use them as data sources. LLMStack supports the following external data sources:

  • Google Drive
  • Amazon S3