Up(sun) and ready with Pandoc
With the enthusiasm for AI assistants, many people now expect these assistants to provide technical information about your product.
After years of well-known conventions such as robots.txt, security.txt, and humans.txt, a new standard has been proposed to the web ecosystem and could soon become essential for the web: llms.txt.
LLMs.txt was conceived by Jeremy Howard, co-founder of Answer.AI, to address a fundamental challenge in AI-human interaction.
When AI assistants attempt to process standard web pages, they struggle with non-essential elements like navigation menus, scripts, and styling. These elements consume valuable context space without contributing to the actual content understanding. LLMs.txt provides an elegant solution: it delivers precisely curated information in a format that AI systems can efficiently process and understand.
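To make the format more concrete, here is a minimal sketch of an llms.txt file following the layout of the llms.txt proposal: an H1 with the project name (the only required part), a blockquote summary, then H2 sections listing Markdown links with short notes. The helper name write_llms_skeleton, the project name, and the example.com URLs are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# write_llms_skeleton: hypothetical helper writing a minimal llms.txt that
# follows the llms.txt proposal layout:
#   - an H1 with the project name (the only required section)
#   - a blockquote with a short summary
#   - H2 sections listing Markdown links, each with optional notes
#   - an "Optional" section whose links can be skipped for a shorter context
# Usage: write_llms_skeleton <output-file> <project-name> <summary>
write_llms_skeleton() {
  local out="$1" name="$2" summary="$3"
  cat > "$out" <<EOF
# ${name}

> ${summary}

## Docs

- [API](https://example.com/learn/api.html): hypothetical API reference page
- [Applications](https://example.com/learn/applications.html): hypothetical guide

## Optional

- [Home](https://example.com/index.html): landing page, can be skipped
EOF
}
```

For example, write_llms_skeleton public/llms.txt "My HTML App" "Demo static site deployed on Upsun." writes such a file. The approach in this guide goes further and puts the full content of every page into llms.txt.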
If you need to convert files from one markup format to another, Pandoc is your Swiss Army knife.
Developed by John MacFarlane, Pandoc is a Haskell library for converting from one markup format to another, and his pandoc repo also provides a command-line tool built on that library: easy to install and ready to convert.
In this how-to guide, we will see how to install the pandoc command-line tool on your Upsun project.
Assumptions:
- You already have an Upsun account. If you don’t, please register for a trial account. You can sign up with an email address or an existing GitHub, Bitbucket, or Google account. If you choose one of these accounts, you can set a password for your Upsun account later.
- You have the Upsun CLI installed locally.
- You have the Git CLI installed locally.
For this tutorial, we will start with a basic HTML application.
The main goal of this tutorial is to showcase how to install pandoc on your project and quickly generate an llms.txt file from your HTML pages.
Prepare your local HTML project
To quickly showcase the strength of Pandoc, we will simulate a simple HTML application, like one you could obtain with a static website generator such as Hugo. The proposed structure is:
- config.yaml
- api.html
- applications.html
- index.html
To do so, in your Terminal, execute the following commands:
mkdir my-html-app
cd my-html-app
mkdir public
curl -L https://raw.githubusercontent.com/upsun/snippets/refs/heads/main/src/llms/html-app-example.tar.gz | tar -xvzf - -C public
git init && git add . && git commit -m "init HTML app"
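If you cannot download the archive (offline machine, restrictive proxy), a rough stand-in is to create a few placeholder pages yourself. This is a hypothetical sketch: the helper names, page names, and page contents are made up and only mimic the index.html plus learn/*.html layout of the archive.

```shell
#!/usr/bin/env bash
# Hypothetical offline stand-in for html-app-example.tar.gz.
# Page names and contents are placeholders, not the archive's real files.

# Write one small HTML page: <file> <title> <paragraph>.
# The <nav> block mimics the kind of markup llms.txt tries to get rid of.
write_page() {
  cat > "$1" <<EOF
<!DOCTYPE html>
<html>
  <head><title>$2</title></head>
  <body>
    <nav><a href="/index.html">Home</a></nav>
    <h1>$2</h1>
    <p>$3</p>
  </body>
</html>
EOF
}

# Create <root>/index.html and two pages under <root>/learn/.
make_sample_site() {
  local root="${1:-public}"
  mkdir -p "$root/learn" || return 1
  write_page "$root/index.html" "Home" "Welcome to my HTML app."
  write_page "$root/learn/api.html" "API" "How to call the API."
  write_page "$root/learn/applications.html" "Applications" "How applications are configured."
}
```

Run make_sample_site public from the my-html-app folder, then commit as above.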
The html-app-example.tar.gz file contains all the HTML files (index.html, ./learn/*.html) from the src/llms folder of the upsun/snippets repository.
Init your Upsun config
The Upsun CLI provides a command to initialize a basic config for your local code. As this is a simple HTML app, we will generate a minimal configuration file with the following command:
➜ my-html-app git:(main) upsun project:init
Welcome to Upsun!
Let's get started with a few questions.
We need to know a bit more about your project. This will only take a minute!
What language is your project using? We support the following: [JavaScript/Node.js]
Tell us your project's application name: [app]
(\_/)
We’re almost done... =(^.^)=
Last but not least, unless you’re creating a static website, your project uses services. Let’s define them:
Select all the services you are using: []
You have not selected any service, would you like to proceed anyway? [Yes]
┌───────────────────────────────────────────────────┐
│ CONGRATULATIONS! │
│ │
│ We have created the following files for your: │
│ - .environment │
│ - .upsun/config.yaml │
│ │
│ We’re jumping for joy! ⍢ │
└───────────────────────────────────────────────────┘
│ /
│/
│
(\ /)
( . .)
o (_(“)(“)
When prompted, select:
- Javascript/Node.js
- application name: app
- no service
Your HTML application is almost ready to be deployed on Upsun, one more step to go.
In the newly created .upsun/config.yaml file, set the root of the "/" location (applications.app.web.locations."/".root) to "public" so that your public folder is served, and then commit your updates:
git add .upsun/config.yaml && git commit -m "change locations.root to the public folder"
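For reference, here is a sketch of a minimal .upsun/config.yaml serving the public folder, written from the shell so you can compare it with the file generated by upsun project:init. The helper name write_static_config is hypothetical, and the app name "app", the "nodejs:20" runtime version, and the no-op sleep infinity start command are assumptions; note that the helper overwrites any existing config file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a minimal .upsun/config.yaml that serves the
# public/ folder as static files. Assumptions: app name "app", runtime
# "nodejs:20", and a no-op start command since no Node.js process is needed.
# WARNING: overwrites <dir>/.upsun/config.yaml.
write_static_config() {
  local dir="${1:-.}"
  mkdir -p "$dir/.upsun" || return 1
  cat > "$dir/.upsun/config.yaml" <<'EOF'
applications:
  app:
    source:
      root: "/"
    type: "nodejs:20"
    web:
      commands:
        # Nothing to run: static files are served by the container's web server.
        start: "sleep infinity"
      locations:
        "/":
          # locations.root points to the public/ folder.
          root: "public"
          passthru: false
          index:
            - "index.html"

routes:
  "https://{default}/":
    type: upstream
    upstream: "app:http"
EOF
}
```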
Create an Upsun project
Next, create an Upsun project and deploy your code by running these commands and following the prompts:
upsun project:create
upsun push
Install Pandoc
There are two ways to install pandoc on your project:
Using a shell script
John MacFarlane's Pandoc repo offers a quick and easy way to install Pandoc.
We’ve prepared a shell script for you (source) that can be used to install the latest version of Pandoc.
Update your .upsun/config.yaml file to fetch and run this script with curl in your applications.app.hooks.build step.
The install-pandoc.sh script installs the pandoc binary from the Pandoc repo into the /app/.global/bin folder of your application container.
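We don't reproduce install-pandoc.sh here. As a rough idea of what such an installer does, the hypothetical functions below download a Linux release tarball from Pandoc's GitHub releases and extract only the pandoc binary into a target folder. The function names are made up, and the asset naming pattern pandoc-VERSION-linux-ARCH.tar.gz (with the binary at pandoc-VERSION/bin/pandoc inside it) is an assumption worth checking against the releases page.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for install-pandoc.sh (not the actual script).

# Print the download URL of a Linux release tarball: <version> [arch].
pandoc_asset_url() {
  local version="$1" arch="${2:-amd64}"
  echo "https://github.com/jgm/pandoc/releases/download/${version}/pandoc-${version}-linux-${arch}.tar.gz"
}

# Ask the GitHub API for the latest release tag (needs network access).
latest_pandoc_version() {
  curl -fsSL https://api.github.com/repos/jgm/pandoc/releases/latest \
    | grep '"tag_name"' | head -n 1 | cut -d '"' -f 4
}

# Download and extract only the pandoc binary into <dest-dir>.
# Usage: install_pandoc <dest-dir> [version] [arch]
install_pandoc() {
  local dest="$1"
  local version="${2:-$(latest_pandoc_version)}"
  local arch="${3:-amd64}"
  [ -n "$version" ] || { echo "could not resolve Pandoc version" >&2; return 1; }
  mkdir -p "$dest" || return 1
  # Strip the two leading path components (pandoc-<version>/bin/) so the
  # binary lands directly in <dest-dir>.
  curl -fsSL "$(pandoc_asset_url "$version" "$arch")" \
    | tar -xzf - -C "$dest" --strip-components=2 "pandoc-${version}/bin/pandoc" \
    || return 1
  "$dest/pandoc" --version | head -n 1
}
```

In a build hook, you would source these functions and call install_pandoc /app/.global/bin; pinning a version as the second argument keeps builds reproducible.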
Using the Composable image
The Upsun Composable image provides enhanced flexibility when defining your app. It allows you to install several runtimes and tools in your application container, in a “one image to rule them all” approach.
The Composable image is built on Nix, and the good news is that a Pandoc package is available in Nixpkgs.
Update your .upsun/config.yaml by commenting out the default type parameter and adding the Composable image type, with a stack that includes the pandoc package.
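As an illustration only, here is a sketch of what the Composable variant of the minimal config could look like, written from the shell. The helper name write_composable_config is hypothetical, and both the image version ("composable:24.05") and the flat stack list syntax are assumptions: newer versions of the Composable image may expect a different stack layout, so check the Upsun documentation. Like the earlier sketch, it overwrites the existing file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a Composable-image variant of the minimal config.
# Assumptions: image "composable:24.05" and a flat "stack" list mixing a
# runtime ("nodejs@20") and a Nixpkgs package ("pandoc").
# WARNING: overwrites <dir>/.upsun/config.yaml.
write_composable_config() {
  local dir="${1:-.}"
  mkdir -p "$dir/.upsun" || return 1
  cat > "$dir/.upsun/config.yaml" <<'EOF'
applications:
  app:
    source:
      root: "/"
    # type: "nodejs:20"    (default type, commented out)
    type: "composable:24.05"
    stack: [ "nodejs@20", "pandoc" ]
    web:
      commands:
        start: "sleep infinity"
      locations:
        "/":
          root: "public"
          passthru: false
          index:
            - "index.html"

routes:
  "https://{default}/":
    type: upstream
    upstream: "app:http"
EOF
}
```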
And then, deploy your updates:
git add .upsun/config.yaml .environment && git commit -m "install Pandoc"
upsun push
Test it
You can now use pandoc in your project to generate a public/llms.txt file that concatenates all the HTML pages, converted to Markdown.
Update your .upsun/config.yaml to run a pandoc $(find... command once Pandoc is installed. This command concatenates all existing .html files located in the public folder into a single ./public/llms.txt file and converts them to Markdown.
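The original pandoc $(find... command is not reproduced here. As a sketch of the same idea, the hypothetical functions below collect the .html files in a stable, sorted order and hand them all to a single pandoc call; pandoc concatenates multiple input files before parsing them. The function names and the -t gfm --wrap=none choice are assumptions, not necessarily what the original command uses.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the llms.txt generation step.

# List every .html file under <dir>, sorted so pages are always
# concatenated in the same order.
list_html_pages() {
  find "${1:-public}" -type f -name '*.html' | LC_ALL=C sort
}

# Convert all pages to Markdown and concatenate them into <dir>/llms.txt.
generate_llms_txt() {
  local dir="${1:-public}" page
  local pages=()
  command -v pandoc >/dev/null || { echo "pandoc not found in PATH" >&2; return 1; }
  # Read the list into an array so paths containing spaces stay intact.
  while IFS= read -r page; do
    pages+=("$page")
  done < <(list_html_pages "$dir")
  [ "${#pages[@]}" -gt 0 ] || { echo "no HTML pages in $dir" >&2; return 1; }
  # One pandoc call: inputs are concatenated, then written as GitHub-flavored
  # Markdown without hard line wrapping.
  pandoc -f html -t gfm --wrap=none -o "$dir/llms.txt" "${pages[@]}"
}
```

You could call generate_llms_txt public from your build hook once Pandoc is available. Swap -t gfm for -t markdown if you prefer Pandoc's own Markdown flavor.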
And then, deploy your updates:
git add .upsun/config.yaml && git commit -m "Use Pandoc to generate a public/llms.txt file"
upsun push
To check that it works, open the file by appending /llms.txt to your environment URL, which you can display with:
upsun env:url --primary
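If you prefer checking from the terminal, here is a small sketch. The helper names are hypothetical, and it assumes the --pipe option of upsun env:url, which prints the URL instead of opening a browser.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to check the deployed llms.txt.

# Join an environment URL and the llms.txt path without doubling the slash.
llms_url() {
  local base="$1"
  echo "${base%/}/llms.txt"
}

# Print the first lines of the deployed file (needs network access and a
# logged-in Upsun CLI; assumes `upsun env:url --primary --pipe` prints the URL).
check_llms_txt() {
  local base
  base="$(upsun env:url --primary --pipe)" || return 1
  curl -fsS "$(llms_url "$base")" | head -n 20
}
```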
Conclusion
Et voilà, we saw how to use pandoc to convert all existing HTML pages into a single Markdown public/llms.txt file. Perhaps the next step would be to train an AI assistant with this llms.txt file…
Stay tuned.
Discover how to deploy a personal Chainlit AI assistant on Upsun by reading this great blog post: Experiment with Chainlit AI interface with RAG on Upsun