Demo World! Markdown to HTML

April 14, 2020 — Josh

I've just redone my blog.

One starry night I decided that I wanted to be able to write posts in Markdown, so I went on a quest looking for a utility to help me do that. It turned out that people really like having multiple dependencies for their tools! My poor computer only has ~4GB of disk space left at all times, so that dissuaded me quite efficiently. Then I got an idea...


An idea

An Idea


Beware The Ideas of March

Alright, that's enough idea-ing, I think you get the point. It dawned on me that I could maybe just write myself a tool to do this, and that way I wouldn't have to download anything. I know, ridiculous. The first thing that came to mind was how easy it would be to parse headings into the corresponding HTML tags. I mean, consider this:

This is the syntax for headings in Markdown

# H1
## H2
### H3
<!-- HTML corresponding to above MD -->

<h1> H1 </h1>
<h2> H2 </h2>
<h3> H3 </h3>

It's essentially a for loop and an if statement, not so bad. So, I did that, and it actually worked. I spent that night further spaghetti-ing my code, turning it from a smol angel hair, to a spaghetti, and eventually to a full blown lasagna (a single sheet or the entire dish, your choice).
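To make the "for loop and an if statement" idea concrete, here's a rough sketch of heading conversion. It's in JavaScript rather than the C++ that PHD is actually written in, and `convertHeadings` is a made-up name for illustration, not a function from PHD. It only handles headings and ignores everything else in Markdown.

```javascript
// Sketch only: PHD itself is written in C++, and `convertHeadings`
// is a hypothetical name, not something taken from PHD.
function convertHeadings(markdown) {
	let lines = markdown.split("\n");

	for (let i = 0; i < lines.length; i++) {
		// Count the leading '#' characters on this line.
		let level = 0;
		while (level < lines[i].length && lines[i][level] === "#")
			level++;

		// Treat the line as a heading only if it has 1-6 hashes
		// followed by a space (HTML stops at <h6>).
		if (level >= 1 && level <= 6 && lines[i][level] === " ") {
			let text = lines[i].substring(level + 1).trim();
			lines[i] = "<h" + level + "> " + text + " </h" + level + ">";
		}
	}

	return lines.join("\n");
}

console.log(convertHeadings("# H1\n## H2\n### H3"));
```

Lines that don't start with hashes (or that have no space after them, like `#nope`) pass through unchanged.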

Long story short, I ended up writing PHD, to which this post is an ode, coz I'm using it right now. There are still a few bugs, and my focus wasn't being fully compatible with the Markdown spec (there are too many Markdowns and none of them are consistent), though I ended up getting most of the features down despite not setting out to initially. I also didn't think there'd be any way in hell my code would scale well cause I overused C++ STL string functions that are secretly computationally complex, but lo and behold it parsed a 3000-line Markdown file to HTML without any noticeable wait time. Sick!


Say hello to our logo, Gangsta Euripides

That's enough talk, I'll show you the feature list and demonstrate each point.


Features


Quick demo

Damn, I actually found 2 serious bugs while typing that out, one of which is now resolved. Argh, I say.

I'll end the PHD section with a snippet of a program, as code blocks are really the most exciting thing about this to me.

#include <iostream>
#include <stack>
#include <string>

using namespace std;

int main()
{
        stack<string> pancakes;

        pancakes.push("Blueberry");
        pancakes.push("Strawberry");
        pancakes.push("Chocolate Chip");

        while(!pancakes.empty())
        {
                cout << pancakes.top() << endl;
                pancakes.pop();
        }

        return 0;
}

Very nice, if I say so myself. I'm in a chocolate chip kind of mood rn.

Next I'll briefly discuss how I restructured the blog, and then you can go home.


W3Schools Actually Does Something Cool For Once

You heard it here first folks.

Imagine this: you write a blog post in Markdown, convert it to HTML, and then simply "include" the new post in your main page. No copy pasting, no monkey business. That's what I was imagining. Unfortunately that wasn't as easy to do as it sounds, as including HTML isn't really a thing, but thankfully W3Schools pulled through with a nice little JavaScript function that does just that. The function works by sending an XMLHttpRequest to your server asking for the requested file, and then replacing an HTML placeholder you create on the main page with the contents of the file.

It looks a bit like this:

<!-- header and stuff -->

<body>
<!-- other stuff -->
<div w3-include-html="february.html"></div>
<!-- other stuff -->

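<!-- a script tag with src ignores its inline contents, so the call gets its own tag -->
<script src="scripts.js"></script>
<script>includeHTML();</script>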
</body>
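For the curious, the JavaScript side of this works roughly like the sketch below. This is not the W3Schools code verbatim, just a minimal version of the same idea: find every element carrying the `w3-include-html` attribute, request the named file, and drop the response into the element. The name `includeFragments` and the `createRequest` parameter are assumptions made for this example (the parameter is there so the function can be tried out without a browser).

```javascript
// Sketch only: not the W3Schools source. `includeFragments` and
// `createRequest` are hypothetical names chosen for this example.
function includeFragments(doc, createRequest) {
	// Fall back to the browser's XMLHttpRequest if no factory is given.
	createRequest = createRequest || function () { return new XMLHttpRequest(); };

	let placeholders = doc.querySelectorAll("[w3-include-html]");

	for (let i = 0; i < placeholders.length; i++) {
		let element = placeholders[i];
		let file = element.getAttribute("w3-include-html");
		let request = createRequest();

		request.onreadystatechange = function () {
			if (request.readyState !== 4) return; // 4 means the request has finished

			if (request.status === 200)
				element.innerHTML = request.responseText;
			else
				element.innerHTML = "Could not load " + file;

			// The placeholder has been filled, so drop the marker attribute.
			element.removeAttribute("w3-include-html");
		};

		request.open("GET", file, true);
		request.send();
	}
}
```

In a browser you'd call `includeFragments(document)` once the page has loaded.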

This solves the problem of having to copy-paste posts into index.html, but it also leaves 2.5 new problems in its wake, since I wanted the posts to also be viewable individually on their own pages.

Problems

  1. Converting Markdown to HTML yields a file that only contains the post content: there are no containers and there's no page structure, just a page with text on it.

  2. HTML files have a head section with metadata. Will including one of them in a file with its own metadata cause any issues?

  3. If the post HTML files are in a different location than the index.html file they're being included in, relative paths to files can't be used.

Solutions

Problem 1: We can write a function that inserts the containers/page structure into the HTML

Note: I just copied the containers from my index.html page

function inject_blog_structure()
{
	let body = document.getElementsByTagName("BODY")[0];

	let above = "<div id=\"blog\">\r\n\r\n\t\t<div id=\"header\">\r\n\t\t\t<h1 id=\"tit\">Corduroy\'s Meditations\
	<\/h1>\r\n\t\t\t<p id=\"desc\">A cozy spot to sit and think :-)<\/p>\r\n\t\t<\/div>\r\n\r\n<div id=\"content\">"

	let below = "<!-- FOOTER -->\r\n<p id=\"foot\"><a onclick=\"window.scrollTo(0, 0);\">\u2191 Back up top! \u2191<\/a><\/p>\r\n\r\n<div id=\"footer\">\r\n\t\
	With love, from <a href=\"https:\/\/github.com\/joshnatis\">Josh<\/a> &mdash; <a href=\"mailto:josh&#64;josh8&#46;com\">josh&#64;josh8&#46;com\
	<\/a>\r\n\t<br><br>\r\n\tPosts generated with <a href=\"https:\/\/github.com\/joshnatis\/phd\">phd<\/a>, a markdown to html parser I wrote :)\
	<\/div>\r\n<\/div>\r\n<!-- FOOTER -->\r\n\r\n\r\n\<\/div> <!-- END CONTENT -->\r\n"

	body.innerHTML = above + body.innerHTML + below;
}

Then simply call the script at the bottom of each post: <script>inject_blog_structure();</script>

Problem 2: Decapitate them

borat, very nice

//removes all html not within body tag
function decapitate(html_content)
{
	let content = html_content.split("<body>");

	if(content.length >= 2) //in the case that <body> exists
	{
		content = content[1].trim();

		//remove title from post because we have the title in index.html already
		if(content.substring(0, 2) === "<h")
			content = content.substring(content.indexOf("\n") + 1);

		content = content.split("</body>");
	}

	return content[0];
}

//NOTE: i call this function in the includeHTML function on the response text from the server

It's not much, in fact it sucks, but it's honest work

Problem 3: Easy! Just use absolute paths and everything will be fine. Wait, what'd you say? For some reason this will work on localhost but not on GitHub Pages? But that doesn't make any sense! Ok fuck it, just put all of the posts and the index.html in the root directory so the paths are the same when posts are included and when they're viewed individually.
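Here's a small sketch of why relative paths misbehave when a post lives in a different directory than index.html, and why moving everything to the root sidesteps it. A relative path is resolved against the page currently being displayed, so the same link points to different places depending on where the markup ends up. The URLs and the `resolveFrom` helper are hypothetical, picked only for illustration (example.com is a placeholder).

```javascript
// Hypothetical URLs for illustration only.
const indexPage = "https://example.com/index.html";
const postPage = "https://example.com/posts/february.html";

// Resolve a relative path the way a browser would for a given page.
function resolveFrom(pageUrl, relativePath) {
	return new URL(relativePath, pageUrl).href;
}

// Viewing the post on its own page:
console.log(resolveFrom(postPage, "images/cat.png"));
// https://example.com/posts/images/cat.png

// The same markup after being included into index.html:
console.log(resolveFrom(indexPage, "images/cat.png"));
// https://example.com/images/cat.png

// With the post in the root directory, both views agree.
const postInRoot = "https://example.com/february.html";
console.log(resolveFrom(postInRoot, "images/cat.png") === resolveFrom(indexPage, "images/cat.png"));
// true
```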


That is all

I hope you enjoyed, that's the end. My workflow to make a new post is as follows:

  1. Write post in Markdown

  2. Call ./phd post.md post.html

  3. Make any slight manual adjustments to HTML if necessary

  4. Insert <div w3-include-html="hello-world.html"></div> into index.html

  5. git add ., git commit -m "Imperative sentence for no good reason", git push (and don't forget to do git pull first because you definitely updated your README.md and forgot, so you'll get a merge conflict).

Here are some resources if you wanted to create a similar setup:


Notes

* bugs found while making this post: 4 (add 1 if you count the one crawling on my wall)

* if you're interested in trying this out, be 'ware that XMLHttpRequest GET requests won't work locally (like, if you just open ~/blog/index.html in your browser). To test locally you can start a server on localhost. Here's how I do it:

Make a script, name it server or something
#!/bin/bash

DIRECTORY="$1"
if [ -z "$DIRECTORY" ]; then
        echo "Usage: server <directory_path>"
        exit 1
fi

# bail out instead of serving the wrong directory if cd fails
cd "$DIRECTORY" || exit 1

python3 -m http.server 8080

Call it: server /path/to/blog/directory, then open localhost:8080 in your browser.