Hugo's Blog

html
css
jinja
python

Making a blog website

A very hearty welcome to my first blog post ever! I created this blog for two main reasons: 1. To force myself to think deeply about programming decisions (after all, they must somehow become a coherent narrative). And 2. as a way of putting myself out there as a software and computer science enthusiast, something that will hopefully lead to meeting new people. So, what better way to start off a programming blog than by writing a piece on how I created this very website? Do note that I do not plan to keep editing and updating this post going forward, so if you are from the future, the website may well be a little different from what is described here. Still, I think it can serve as an interesting write-up of what such an undertaking is like.

What do we want

Let's kick it off with defining our requirements:

The old way is the gold way: static html

Initially I started building a setup in Node.js with Nunjucks (a Jinja-inspired templating engine for JavaScript). Whilst I rarely use JavaScript, I've always been attracted to the functional syntax and asynchronous nature of Node.js. However, in a discussion with a former fellow student who has some years of serious DevOps experience, I learned something new: since data changes are rare and do not depend on the request, a static site would be more than adequate. Of course we can still use templates, but the difference is that we 'build' (state -> html) only when we change or add something. This idea also ties in nicely with our low-maintenance requirement, since we only need to keep a bare-bones nginx server alive, which I know to be quite trivial.

Incidentally, I also played a small role in making a static photo gallery for my study association, which was to be placed behind a login to abide by GDPR guidelines. That project contained a few templates along with a Python application to build and manage the website and its state. I decided to borrow heavily from this setup. That project is also open source, by the way (github).

Our plan

  1. Define the state
  2. Create Jinja2 templates to render the state
  3. Create a python based cli to manage the state and build the site
  4. Create an nginx configuration to serve files (without extensions in the URI)

1. Defining the state

Since this is a very simple blog, it seems adequate that our state is merely a list of posts. Let's define our types using the marvelous dataclasses introduced in Python 3.7:

from __future__ import annotations
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class Post:
    number: int
    title: str
    postedAt: datetime

@dataclass
class State:
    posts: List[Post]

Also, be warned: I use type annotations in Python. They have no effect at runtime, but they let a static analysis tool enforce a type system. You can read more about it here: mypy.

Also, I left out the code for json parsing here since it is quite straightforward. The state file might look something like this:

{
    "posts": 
    [
        {
            "number": 0,
            "title": "test_post",
            "postedAt": "2019-11-01T16:54:42"
        },
        {
            "number": 1,
            "title": "Making a blog website",
            "postedAt": "2019-11-01T17:22:20"
        }
    ]
}
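Since the parsing code is left out above, here is a minimal sketch of what it could look like. The function names parse_state and dump_state are made up for illustration and are not taken from the actual source; the only real assumption about the data is that postedAt is always an ISO 8601 timestamp like the ones in the example file.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Post:
    number: int
    title: str
    postedAt: datetime


@dataclass
class State:
    posts: List[Post]


def parse_state(raw: str) -> State:
    """Turn the JSON text of state.json into typed dataclasses (hypothetical helper)."""
    data: Dict[str, Any] = json.loads(raw)
    posts = [
        Post(
            number=int(item["number"]),
            title=str(item["title"]),
            # "2019-11-01T16:54:42" is ISO 8601, which fromisoformat accepts.
            postedAt=datetime.fromisoformat(item["postedAt"]),
        )
        for item in data["posts"]
    ]
    return State(posts)


def dump_state(state: State) -> str:
    """Serialise the state back to JSON, writing datetimes as ISO 8601 strings."""
    payload = {
        "posts": [
            {
                "number": p.number,
                "title": p.title,
                "postedAt": p.postedAt.isoformat(),
            }
            for p in state.posts
        ]
    }
    return json.dumps(payload, indent=4)
```

Converting the timestamp explicitly in both directions is needed because the json module does not know how to handle datetime objects on its own.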

2. Creating templates

The blog posts I love are those that contain clear, copyable code snippets. It goes without saying, then, that I should adhere to the same standard. The aforementioned friend who advised me to build a static site also recommended pygments to convert plain-text code into syntax-highlighted html. The great thing about Jinja is the flexible way in which you can provide your templates with custom functions. This means that we can write our highlight function the way we normally would in Python and then simply inject it into our Jinja environment.

import jinja2
import pygments
import pygments.lexers
from pygments.formatters import HtmlFormatter

def highlight(lang: str, code: str) -> str:
    formatter = HtmlFormatter()
    lex = pygments.lexers.get_lexer_by_name(lang)
    return str(pygments.highlight(code, lex, formatter))

def build() -> None:
    # Create a Jinja environment with design/pages/ as root
    env = jinja2.Environment(loader=jinja2.FileSystemLoader("design/pages/"))
    env.globals["highlight"] = highlight
    # Then simply render all templates to eponymous files
    # in the build directory

Now in our template we can easily call that function. However, because our code is generally a multi-line string that itself contains all kinds of different strings, it would be a strenuous task to somehow pass all that text as an argument without messing up indentation, special characters, etc. Not to mention the appalling syntax we would have to swallow when writing blog posts! Instead I found this beautiful way to call a macro in Jinja:

<!-- imported somewhere in the base template -->
{% macro code(language) -%}
    {{ highlight(language, caller()) }}
{%- endmacro %}

<!-- while writing a blog post -->
{% call macros.code("python") %}
  def main():
    print("my source code is not a nice string")
    return 0;
{% endcall %}

Effectively, you can use the caller() function to load in the literal string that was typed inside the call block. This circumvents the need to awkwardly pass code containing all kinds of weird characters as an argument to the macro directly.

At this point I ran into another problem, however: code segments are placed between 'pre' (preformatted) tags. That means that the indentation I so nicely added to my templates to keep things readable was now also showing up in the final result. Evidently I could not simply remove all whitespace from the start of every line, because I want to be able to present indented code snippets in a frivolous attempt to give the impression that I write tidy code. Fixing this problem requires us to consider all the lines of the snippet together. What we want is to remove the leading indent of the snippet, or in other words, remove the indentation that is shared by all the lines. Let's fix our highlight function to satisfy this new constraint:

from typing import Any, Callable, List

def highlight(lang: str, code: str) -> str:
    formatter = HtmlFormatter()

    # Special all() that returns False for empty lists; otherwise
    # we would never terminate when, for example, the code block
    # contains only whitespace.
    alln: Callable[[List[Any]], bool] = lambda l: all(l) and len(l) > 0

    # Remove all shared leading whitespace, one character at a time
    lines = code.split("\n")
    while alln([x[0].isspace() for x in lines if len(x) > 0]):
        for i in range(len(lines)):
            if len(lines[i]) > 0:
                lines[i] = lines[i][1:]
    code = "\n".join(lines)
    lex = pygments.lexers.get_lexer_by_name(lang)
    return str(pygments.highlight(code, lex, formatter))
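As an aside, Python's standard library has textwrap.dedent, which removes the whitespace prefix common to all lines. The sketch below is an alternative to the hand-written loop, not the code this site uses, and strip_shared_indent is a name invented for the example. One difference to keep in mind: dedent compares the indentation text itself, so a snippet that mixes tabs and spaces has a smaller common prefix than the character-by-character loop above would strip.

```python
import textwrap


def strip_shared_indent(code: str) -> str:
    """Remove the leading whitespace that all non-blank lines have in common."""
    return textwrap.dedent(code)


# A snippet as it might arrive from a nicely indented template.
snippet = "\n    def main():\n        print('hi')\n"
print(strip_shared_indent(snippet))
```

The relative indentation inside the snippet survives, which is exactly what the 'pre' tags need.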

3. Management CLI

This will be the gatekeeper to our state. It is this cli we will use for building the site as well as creating new posts.

Over the years I've created loads of small cli programs. But until recently they always sucked. I've had dotnet projects with 30 different attributes, all doing weird type reflection logic to slowly consume a command and eventually select a method to execute.

But all my troubles seem to have been solved by the wonderful click package. It allowed me to effortlessly create commands and subgroups. Furthermore, an argument can be declared to be a path, which lets your favorite terminal auto-complete it. Eventually I arrived at the following set of commands:

blog init          # create an empty state.json file
blog build         # copy and render source files to build location
blog preview       # serve build directory in browser
blog post create   # copy post template to new directory in the source and append to the state
blog post edit     # open a post's directory in vim 
blog post delete   # remove a post from the state
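To make the shape of that command tree concrete, here is a rough sketch of it. Note that it deliberately uses argparse from the standard library instead of click, purely so that it runs without extra dependencies; it is not the actual cli. The title and number arguments are assumptions, since the list above does not say what each command takes.

```python
import argparse
from typing import List, Optional


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="blog")
    commands = parser.add_subparsers(dest="command", required=True)

    commands.add_parser("init", help="create an empty state.json file")
    commands.add_parser("build", help="copy and render source files to the build location")
    commands.add_parser("preview", help="serve the build directory in a browser")

    # "post" is a group with its own subcommands, like a click subgroup.
    post = commands.add_parser("post", help="manage posts")
    actions = post.add_subparsers(dest="action", required=True)

    # Assumption: creating a post takes a title.
    create = actions.add_parser("create", help="add a new post to the state")
    create.add_argument("title")

    # Assumption: existing posts are addressed by their number.
    for name in ("edit", "delete"):
        sub = actions.add_parser(name)
        sub.add_argument("number", type=int)

    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv when None) into a namespace."""
    return make_parser().parse_args(argv)
```

With this, parse(["post", "delete", "1"]) yields a namespace with command "post", action "delete" and number 1, which a dispatcher could then map to the function that edits the state.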

4. Nginx configuration

Now that we have everything set up, we just need a way to serve the files. For this we only need a short nginx configuration that looks for the file specified in the URI. To avoid ugly URIs that end in .html, we leverage the fact that nginx looks for index.html when the URI points at a directory rather than a file. So you, my dear reader, only see /posts/1 for example, which in actuality is interpreted as /posts/1/index.html. Note that in reality, of course, we use a slightly longer configuration to enable SSL and redirect HTTP to HTTPS, but that is outside the scope of this post.

http {
  include /etc/nginx/conf/mime.types;
  default_type application/octet-stream;

  server {
    listen 80;

    root /var/www/blog;
    index index.html index.htm index.php;

    server_name blog;

    location / {
      try_files $uri $uri/ =404;
    }
  }
}
events { }
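To illustrate the lookup order described above, here is a rough Python model of what the try_files and index lines amount to. It is only a simplification for explanation (it is not how nginx is implemented, and it ignores details such as what nginx answers for a directory without an index file); resolve is a made-up name. Something along these lines could also be handy for a local preview server.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve(root: Path, uri: str,
            index: Sequence[str] = ("index.html", "index.htm")) -> Optional[Path]:
    """Approximate `try_files $uri $uri/ =404` combined with the `index` directive."""
    target = root / uri.lstrip("/")
    # $uri: an exact file match is served as-is.
    if target.is_file():
        return target
    # $uri/: a directory falls through to the first index file that exists.
    if target.is_dir():
        for name in index:
            candidate = target / name
            if candidate.is_file():
                return candidate
    # =404: nothing matched.
    return None
```

So with /var/www/blog as root, a request for /posts/1 ends up at /var/www/blog/posts/1/index.html, provided that file exists.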

We want more, but mmm

During any coding project I always find myself arriving at 'that' moment: you want to add a feature, but it just doesn't play nicely with the existing codebase, causing you to edit way more source files than intended. In this case it was the feature of showing the last modified time at the bottom of a post's page. Getting the information is quite straightforward using pathlib, but there is no slot for it in our state type. If you are familiar with Python you might think 'why not set the property at runtime? Isn't that what makes Python so nice?'. You would be quite right, this is an acceptable solution in the Python world, but it does mean that you break your predefined type, and we committed to having a sound type system. Therefore, we should not mutate our existing state, but rather create a new, extended type. Previously we could think of our pipeline as follows:

build :: "state.json" -> State -> html 

But we now want to fetch extra data at runtime and add this to the state. To keep our typesystem as an ally we therefore have to adjust our plan to this:

build :: "state.json" -> State -> PreparedState -> html

We then need to create another set of types to represent this new state. While we're at it, I also decided to use this newfound step to load the abstract of a post from a separate file during that same transformation. This way I never have to edit the state.json file directly, the file itself stays clean, and all information relevant to a post stays bundled together.

# When inheriting from dataclasses, the constructor 
# parameters are a union of the parent's and the child's.
@dataclass
class PreparedPost(Post):
    modifiedAt: datetime
    abstract: str

@dataclass
class PreparedState:
    posts: List[PreparedPost]
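To make the comment about inherited constructor parameters concrete, the small check below lists the fields of the child class. The parent's fields come first, followed by the child's, and that is also the positional order of the generated constructor. The example values are made up.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass
class Post:
    number: int
    title: str
    postedAt: datetime


@dataclass
class PreparedPost(Post):
    modifiedAt: datetime
    abstract: str


# Field order drives the positional order of the generated __init__.
field_names = [f.name for f in fields(PreparedPost)]
print(field_names)

# Made-up values, passed positionally in that same order.
now = datetime(2019, 11, 1, 17, 22, 20)
prepared = PreparedPost(1, "Making a blog website", now, now, "A short summary")
```

One consequence worth knowing: if a parent field has a default value, every field the child adds needs a default too, because fields without defaults may not follow fields with defaults.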

And finally, a function for our conversion step:

def prepareState(state: State, root: Path) -> PreparedState:
    posts_dir = root / "posts"
    prepared_posts = []
    for post in state.posts:
        post_path = posts_dir / str(post.number)

        # Recursively find the files under that path and
        # take the most recent modification date.
        modifiedAt = lastModified(post_path)

        # load the abstract
        abstract = "abstract not available"
        abstract_path = post_path / "abstract.html"
        if abstract_path.exists():
            with open(abstract_path, "r") as f:
                abstract = f.read()

        prepared_posts.append(
            PreparedPost(
                post.number,
                post.title,
                post.postedAt,
                modifiedAt,
                abstract,
            )
        )

    return PreparedState(prepared_posts)
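The lastModified helper is not shown above. A minimal sketch of how it might be written with pathlib follows; this is my guess at a reasonable implementation rather than the code from the repository, and the fallback for a directory without files is an assumption.

```python
from datetime import datetime
from pathlib import Path


def lastModified(path: Path) -> datetime:
    """Return the newest modification time among the files under `path`."""
    # rglob("*") walks the directory recursively; keep regular files only.
    mtimes = [p.stat().st_mtime for p in path.rglob("*") if p.is_file()]
    if not mtimes:
        # Assumption: fall back to the directory's own mtime when it has no files.
        mtimes = [path.stat().st_mtime]
    return datetime.fromtimestamp(max(mtimes))
```

Because it looks at every file under the post's directory, editing only abstract.html also counts as modifying the post.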

And that's it! We now have access to two more properties when rendering posts in the template. Although it might seem a little tedious to have created a new type, I do think that the redundancy there is a good trade off for making the fetching of extra data a very explicit and maintainable step. In general, it seems to be beneficial to think (and code) in terms of transformations or mappings instead of mutation.

Looking back

Let's take a look at what we have:

.
├── design
│   │
│   ├── index.html
│   ├── posts
│   │   ├── 0
│   │   │   ├── index.html
│   │   │   └── abstract.html
│   │   └── 1
│   │       ├── index.html
│   │       └── abstract.html
│   ├── templates
│   │   ├── base.html
│   │   ├── footer.html
│   │   ├── macros.html
│   │   ├── nav.html
│   │   └── post.html
│   └── public
│       ├── pygments.css
│       └── style.css
├── ignore
│   └── build
│       ├── index.html
│       ├── posts
│       │   ├── 0
│       │   │   └── index.html
│       │   └── 1
│       │       └── index.html
│       └── public
│           ├── pygments.css
│           └── style.css
├── main.py
├── Pipfile
├── Pipfile.lock
├── README.md
├── src
│   ├── cli.py
│   ├── generate.py
│   ├── state.py
│   └── __init__.py
└── state.json

We have our design directory, which contains all of our templates and any other files that I might need in future posts, conveniently boxed in their own subdirectories. In our build folder we now have the exact same tree structure, except that the templates are now rendered with the data provided by the state.json file (I also deleted the templates directory). I am particularly pleased by the simplicity of it all. Now we have all the time in the world to fiddle with the layout and conjure up a custom 404 page with images of cute animals.

And this concludes the project. If you want to take a look at the source, you can find it over here on GitHub. If you enjoyed it, hated it, have any suggestions, or just want to say hi, feel free to email me!


last modified: May 29, 2023 at 13:07