Up(sun) and ready with Pandoc

February 12, 2025 · Florent Huck · Reading time: 6 minutes

With the growing enthusiasm for AI assistants, many people now turn to these assistants for technical information about products like yours. After years of well-known web conventions such as robots.txt, security.txt, and humans.txt, a new standard has been proposed to the web ecosystem and may soon become essential: llms.txt. The llms.txt proposal was conceived by Jeremy Howard, co-founder of Answer.AI, to address a fundamental challenge in AI-human interaction.

When AI assistants attempt to process standard web pages, they struggle with non-essential elements like navigation menus, scripts, and styling. These elements consume valuable context window space without contributing to the understanding of the actual content. llms.txt provides an elegant solution: it delivers precisely curated information as plain Markdown, a format that AI systems can efficiently process and understand.
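To make the format concrete, here is a minimal sketch of a hand-curated llms.txt index. It assumes the layout described in the llmstxt.org proposal: an H1 with the site name, a blockquote summary, then H2 sections listing Markdown links with optional notes, where a section named "Optional" holds links an assistant may skip when context is short. The write_llms_index helper, the site name, the summary, and the example.com URLs are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Minimal sketch: write a hand-curated llms.txt index following the layout
# proposed at llmstxt.org. Name, summary and URLs are hypothetical placeholders.
set -euo pipefail

# write_llms_index OUT_FILE: hypothetical helper that writes the index to OUT_FILE.
write_llms_index() {
  local out="$1"
  # The quoted 'EOF' disables variable expansion inside the here-document.
  cat > "$out" <<'EOF'
# My HTML App

> Hypothetical example site used to illustrate the llms.txt layout.

## Docs

- [API](https://example.com/learn/api.html): overview of the API
- [Applications](https://example.com/learn/applications.html): configuring applications

## Optional

- [Home](https://example.com/index.html)
EOF
}
```

Calling write_llms_index public/llms.txt would write this index into a public folder. The Pandoc approach in this guide produces the full text of every page instead of such a curated index, trading brevity for completeness.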

If you need to convert files from one markup format to another, Pandoc is your Swiss Army knife. Developed by John MacFarlane, Pandoc is a Haskell library for converting between markup formats, and its jgm/pandoc repository also provides a command-line tool built on that library. It is easy to install and ready to convert.

In this how-to guide, we will see how to install the pandoc command-line tool in your Upsun project.

Assumptions:

  • You already have an Upsun account. If you don’t, please register for a trial account. You can sign up with an email address or an existing GitHub, Bitbucket, or Google account. If you choose one of these accounts, you can set a password for your Upsun account later.
  • You have the Upsun CLI installed locally.
  • You have the Git CLI installed locally.

For this tutorial, we will start with a basic HTML application. The main goal is to show how to install Pandoc in your project and quickly generate an llms.txt file from your HTML pages.

Prepare your local HTML project

To quickly showcase the strength of Pandoc, we will simulate a simple HTML application, similar to what you could obtain with a static site generator like Hugo. The proposed structure will be:

  • my-html-app/
    • .upsun/
      • config.yaml
    • public/
      • learn/
        • api.html
        • applications.html
      • index.html

To do so, execute the following commands in your terminal:

    Terminal
    mkdir my-html-app
    cd my-html-app
    mkdir public
    curl -L https://raw.githubusercontent.com/upsun/snippets/refs/heads/main/src/llms/html-app-example.tar.gz | tar -xvzf - -C public
    git init && git add . && git commit -m "init HTML app"
    🚨 Please note: the html-app-example.tar.gz archive contains all the HTML files (index.html and ./learn/*.html) from the llms folder of the upsun/snippets repository.
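If you would rather not download the archive, the following minimal sketch recreates a comparable public/ tree (index.html, learn/api.html, and learn/applications.html) with placeholder pages, assuming any small HTML pages are enough for the demo. The make_page and create_example_site helpers and the page content are hypothetical and do not reproduce the archive's content.

```shell
#!/usr/bin/env bash
# Minimal sketch: recreate the example public/ tree with placeholder pages
# instead of downloading html-app-example.tar.gz. Page content is hypothetical.
set -euo pipefail

# make_page FILE TITLE: write a tiny HTML page with a heading and a paragraph,
# creating parent directories as needed.
make_page() {
  local file="$1" title="$2"
  mkdir -p "$(dirname "$file")"
  cat > "$file" <<EOF
<!DOCTYPE html>
<html>
  <head><title>${title}</title></head>
  <body>
    <h1>${title}</h1>
    <p>Placeholder content for the ${title} page.</p>
  </body>
</html>
EOF
}

# create_example_site ROOT: build ROOT/index.html and ROOT/learn/*.html.
create_example_site() {
  local root="$1"
  make_page "$root/index.html" "Home"
  make_page "$root/learn/api.html" "API"
  make_page "$root/learn/applications.html" "Applications"
}
```

After defining the functions, run create_example_site public from the my-html-app folder, then commit as shown above.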

    Init your Upsun config

    The Upsun CLI provides a command to initialize a basic configuration for your local code. As this is a simple HTML app, we will generate a minimal configuration file using the following command:

    Terminal
    ➜  my-html-app git:(main) upsun project:init
    Welcome to Upsun!
    Let's get started with a few questions.
    
    We need to know a bit more about your project. This will only take a minute!
    
    What language is your project using? We support the following: [JavaScript/Node.js]
    
    Tell us your project's application name: [app]
    
    
                           (\_/)
    We’re almost done...  =(^.^)=
    
    Last but not least, unless you’re creating a static website, your project uses services. Let’s define them:
    
    Select all the services you are using: []
    
    You have not selected any service, would you like to proceed anyway? [Yes]
    
    ┌───────────────────────────────────────────────────┐
    │   CONGRATULATIONS!                                │
    │                                                   │
    │   We have created the following files for your:   │
    │     - .environment                                │
    │     - .upsun/config.yaml                          │
    │                                                   │
    │   We’re jumping for joy! ⍢                        │
    └───────────────────────────────────────────────────┘
             │ /
             │/
      (\ /)
      ( . .)
      o (_(“)(“)

    When prompted, select:

    • JavaScript/Node.js
    • application name: app
    • no service selected

    Your HTML application is almost ready to be deployed on Upsun; there is one more step to go.

    Update the web.locations section of the newly created .upsun/config.yaml file so that the router serves your public folder:

    .upsun/config.yaml
    applications:
      app:
        web:
          locations:
            "/":
              root: "public"
              index: ["index.html"]
              passthru: true

    Then commit your updates:

    Terminal
    git add .upsun/config.yaml && git commit -m "change locations.root to the public folder"

    Create an Upsun project

    You then need to create an Upsun project by running the following commands and following the prompts:

    Terminal
    upsun project:create
    upsun push

    Install Pandoc

    There are two ways to install Pandoc in your project:

    Using a shell script

    John MacFarlane publishes prebuilt Pandoc binaries as release assets in the jgm/pandoc GitHub repository, which makes installation quick and easy.

    We’ve prepared a shell script (install-github-asset.sh, in the upsun/snippets repository) that installs the latest version of Pandoc from these release assets. Update your .upsun/config.yaml file and add this curl call to your applications.app.hooks.build step:

    .upsun/config.yaml
    applications:
      app:
        type: "nodejs:20"
        #...
        hooks:
          build: |
            set -x -e
            #...
            curl -fsS https://raw.githubusercontent.com/upsun/snippets/refs/heads/main/src/install-github-asset.sh | bash /dev/stdin "jgm/pandoc" 
            pandoc -v        

    The install-github-asset.sh script fetches the latest release asset of the given GitHub repository (here, jgm/pandoc) and installs the pandoc binary in the /app/.global/bin folder of your application container.
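To give an idea of what installing a GitHub release asset involves, here is a minimal sketch. It is not the content of install-github-asset.sh: the function names are hypothetical, and it assumes a Linux x86_64 container with curl and tar, an unauthenticated call to the GitHub API releases/latest endpoint, and Pandoc's release naming (pandoc-VERSION-linux-amd64.tar.gz, unpacking into a pandoc-VERSION/ directory).

```shell
#!/usr/bin/env bash
# Minimal sketch of installing Pandoc from its GitHub releases. This is NOT the
# content of install-github-asset.sh; function names are hypothetical.
# Assumes Linux, curl and tar, and that GitHub API rate limits are not hit.
set -euo pipefail

# parse_tag_name: read a GitHub "release" JSON document on stdin and print its
# tag_name (Pandoc tags are bare versions such as 3.6.3). sed reads all input,
# so the upstream curl never gets a broken pipe.
parse_tag_name() {
  sed -nE 's/.*"tag_name"[[:space:]]*:[[:space:]]*"([^"]+)".*/\1/p'
}

# latest_pandoc_version: ask the GitHub API for the latest jgm/pandoc release.
latest_pandoc_version() {
  curl -fsSL https://api.github.com/repos/jgm/pandoc/releases/latest | parse_tag_name
}

# pandoc_asset_url VERSION [ARCH]: URL of the Linux tarball for VERSION,
# ARCH being "amd64" (default) or "arm64".
pandoc_asset_url() {
  local version="$1" arch="${2:-amd64}"
  printf 'https://github.com/jgm/pandoc/releases/download/%s/pandoc-%s-linux-%s.tar.gz\n' \
    "$version" "$version" "$arch"
}

# install_pandoc BIN_DIR: download the latest tarball and copy bin/pandoc
# into BIN_DIR.
install_pandoc() {
  local bin_dir="$1" version tmp
  version="$(latest_pandoc_version)"
  if [ -z "$version" ]; then
    echo "could not determine the latest Pandoc version" >&2
    return 1
  fi
  tmp="$(mktemp -d)"
  curl -fsSL "$(pandoc_asset_url "$version")" | tar -xz -C "$tmp"
  mkdir -p "$bin_dir"
  cp "$tmp/pandoc-$version/bin/pandoc" "$bin_dir/pandoc"
  rm -rf "$tmp"
}
```

Running such a script with bash in a build hook and calling install_pandoc /app/.global/bin would place the binary in the same folder as the snippet script. Keeping URL construction and tag parsing in small functions makes the network-free parts easy to check.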

    Using Composable image

    The Upsun Composable image provides enhanced flexibility when defining your app. It allows you to install several runtimes and tools in your application container, in a “one image to rule them all” approach.

    The composable image is built on Nix, and the good news is that a Pandoc package is available.

    Update your .upsun/config.yaml by commenting out the default type parameter and adding the following lines:

    .upsun/config.yaml
    applications:
      app:
        #type: "nodejs:20"
        stack: 
          - pandoc
        #...
        hooks:
          build: |
            set -x -e
            pandoc -v        

    And then, deploy your updates:

    Terminal
    git add .upsun/config.yaml .environment && git commit -m "install Pandoc"
    upsun push

    Test it

    You can now use pandoc in your project to generate a public/llms.txt file that concatenates all the HTML pages, converted to Markdown. Update your .upsun/config.yaml by adding the following lines:

    .upsun/config.yaml
    applications:
      app:        
        #...
        hooks:
          build: |
            set -x -e
            #...
            pandoc $(find ./public -iname "*.html" -type f | sort -d) -f html -s -o "./public/llms.txt" -t markdown        

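    This pandoc $(find ...) command converts every .html file located in the public folder to Markdown and concatenates the results into a single ./public/llms.txt file. Sorting the find output keeps the page order stable from one build to the next, and -s produces a standalone document.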
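The command above relies on word splitting of the find output, which breaks on file names containing spaces, and with no matching file pandoc would fall back to reading standard input. Here is a slightly more defensive variant as a minimal sketch. It assumes Bash (for arrays and process substitution), find and sort with NUL-separated output (-print0, -z), and pandoc on the PATH. The collect_html_files and generate_llms_txt names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a more defensive variant of the build-hook command above.
# Assumes Bash, find/sort supporting -print0 / -z, and pandoc on PATH.
set -euo pipefail

# collect_html_files DIR: fill the global array HTML_FILES with every *.html
# file under DIR (case-insensitive), sorted, safe for names with spaces.
collect_html_files() {
  HTML_FILES=()
  local f
  while IFS= read -r -d '' f; do
    HTML_FILES+=("$f")
  done < <(find "$1" -type f -iname '*.html' -print0 | sort -z -d)
}

# generate_llms_txt DIR [OUT]: convert all pages under DIR into one Markdown
# file (default DIR/llms.txt). Returns 1 when no page is found, instead of
# letting pandoc read standard input.
generate_llms_txt() {
  local src_dir="$1" out="${2:-$1/llms.txt}"
  collect_html_files "$src_dir"
  if [ "${#HTML_FILES[@]}" -eq 0 ]; then
    echo "no HTML files found under $src_dir" >&2
    return 1
  fi
  pandoc "${HTML_FILES[@]}" -f html -t markdown -s -o "$out"
}
```

Because build hooks are not guaranteed to run under Bash, you would save this in a script (for example, a hypothetical scripts/llms.sh ending with generate_llms_txt ./public) and call it with bash scripts/llms.sh from the build hook.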

    And then, deploy your updates:

    Terminal
    git add .upsun/config.yaml && git commit -m "Use Pandoc to generate a public/llms.txt file"
    upsun push

    Check that it works by appending /llms.txt to your environment URL, which you can display with:

    Terminal
    upsun env:url --primary
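To check the deployed file from your terminal rather than a browser, here is a minimal sketch. It assumes the Upsun CLI is logged in and run from the project directory, and that the env:url --pipe option prints the URL instead of opening it. The llms_txt_url and check_llms_txt helpers are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fetch the deployed llms.txt from the primary environment URL.
# Assumes a logged-in Upsun CLI run from the project directory.
set -euo pipefail

# llms_txt_url BASE_URL: append /llms.txt, avoiding a doubled slash when
# BASE_URL already ends with "/".
llms_txt_url() {
  printf '%s/llms.txt\n' "${1%/}"
}

# check_llms_txt: download the file and print its first 20 lines.
# curl -f fails on HTTP errors such as 404; sed (unlike head) reads all
# input, so curl never hits a broken pipe under pipefail.
check_llms_txt() {
  local base
  base="$(upsun env:url --primary --pipe)"
  curl -fsS "$(llms_txt_url "$base")" | sed -n '1,20p'
}
```

Running check_llms_txt prints the beginning of the generated file. A missing llms.txt makes the command fail instead of printing an error page.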

    Conclusion

    Et voilà, we saw how to use pandoc to convert all existing HTML pages into a single Markdown file, public/llms.txt. Now, perhaps the next step would be to feed an AI assistant with this llms.txt file.

    Stay tuned.

    Discover how to deploy a personal Chainlit AI assistant on Upsun by reading this great blog post: Experiment with Chainlit AI interface with RAG on Upsun
