Importing From Tumblr

I discovered I can export all my old posts from Tumblr (https://assorted.idiocy.org) so I figured I’d have a go at importing them into this new Hugo blog.

Alas the output from Tumblr is… Weird. I’ve used Emacs keyboard macros to remove some of the HTML boilerplate and add the Hugo frontmatter, but there’s a lot more to do.

A couple of things I’ve struggled with are sorting out the timestamps and cleaning up the tags listed at the end of each page. The timestamps were a bit of a pain and I’ve had to write some custom Emacs Lisp to fix them.

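(defun fix-tumblr-dates ()
  "Set the frontmatter date from the timestamp stored in the title."
  (goto-char (point-min))
  ;; Groups: 1 month name, 2 day (ordinal suffix skipped), 3 year,
  ;; 4 hour, 5 minute, 6 am/pm.
  (re-search-forward "title=\"\\([[:alpha:]]+\\) \\([[:digit:]]+\\).., \\(....\\) \\([[:digit:]]+\\):\\(..\\)\\(..\\)\"")

  (let* ((year (match-string 3))
         (month (match-string 1))
         (day (match-string 2))
         ;; 12-hour to 24-hour: the mod turns 12am into 0, and 12pm
         ;; becomes 0 + 12.
         (h (mod (string-to-number (match-string 4)) 12))
         (hour (if (string-equal (match-string 6) "pm")
                   (+ 12 h)
                 h))
         (minute (match-string 5)))
    (search-forward "date=")
    ;; Replace the rest of the line without touching the kill ring.
    (delete-region (point) (line-end-position))
    (insert (format "\"%s\""
                    (format-time-string
                     "%FT%R:00%z"
                     (date-to-time
                      (format "%s %s %s %d:%s"
                              year month day hour minute)))))))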
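
The fiddly part is the 12-hour clock, so here is a rough sketch of just that step on a plain string. The helper name `my/tumblr-title-to-24h` is made up for illustration, and the sample timestamps are only a guess at the format, inferred from the regexp above.

```elisp
;; Sketch only: `my/tumblr-title-to-24h' is a hypothetical helper, and the
;; sample timestamp format is inferred from the regexp in the post.
(defun my/tumblr-title-to-24h (title)
  "Return (MONTH DAY YEAR HOUR MINUTE) parsed from TITLE, or nil.
HOUR is converted to the 24-hour clock."
  (when (string-match
         (concat "\\([[:alpha:]]+\\) \\([[:digit:]]+\\).., \\(....\\) "
                 "\\([[:digit:]]+\\):\\(..\\)\\(..\\)")
         title)
    ;; Taking the hour mod 12 maps 12 to 0, so 12am is 0 and 12pm is 0 + 12.
    (let* ((h (mod (string-to-number (match-string 4 title)) 12))
           (hour (if (string-equal (match-string 6 title) "pm")
                     (+ 12 h)
                   h)))
      (list (match-string 1 title)
            (string-to-number (match-string 2 title))
            (string-to-number (match-string 3 title))
            hour
            (string-to-number (match-string 5 title))))))

(my/tumblr-title-to-24h "March 5th, 2012 12:07am")  ; => ("March" 5 2012 0 7)
(my/tumblr-title-to-24h "March 5th, 2012 12:07pm")  ; => ("March" 5 2012 12 7)
(my/tumblr-title-to-24h "March 5th, 2012 9:30pm")   ; => ("March" 5 2012 21 30)
```

Midnight and noon are the two cases worth checking by hand, since they are the ones a naive "add 12 if pm" gets wrong.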

I’d already set the title to match the timestamp, since Tumblr posts mostly don’t have titles.

The tags are listed in a div with a bunch of spans, and they don’t look very good displayed that way in Hugo, so I’m converting them to a list.

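(defun fix-tumblr-tags ()
  "Turn the Tumblr tag footer into an HTML list and re-indent the post body."
  (goto-char (point-min))
  (when (search-forward "<div id=\"footer\">" nil t)
    ;; Drop the footer's opening line and the line after it.
    (kill-whole-line 2)
    (insert "<hr>\n<ul>\n")
    ;; Rename every <span ...> and </span> from here on to <li> and </li>.
    (while (re-search-forward "span[^>]*" nil t)
      (replace-match "li" t t))
    (search-forward "</div>")
    (kill-whole-line)
    (insert "</ul>"))
  ;; Re-indent everything after the closing frontmatter delimiter.
  (goto-char (point-max))
  (search-backward "+++")
  (forward-line)
  (indent-region (point) (point-max)))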
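
To see what the span replacement does in isolation, here is a small sketch that applies the same regexp to a single string. The helper name `my/spans-to-list-items` is hypothetical, and the sample markup is an assumption about what a tag looks like in the export, not copied from a real page.

```elisp
;; Sketch only: `my/spans-to-list-items' is a hypothetical helper, and the
;; sample markup is a guess at what Tumblr's export looks like.
(defun my/spans-to-list-items (html)
  "Return HTML with each <span ...> and </span> tag renamed to li."
  ;; "span[^>]*" swallows the tag name plus any attributes up to the ">".
  (replace-regexp-in-string "span[^>]*" "li" html t t))

(my/spans-to-list-items "<span class=\"tag\">emacs</span>")
;; => "<li>emacs</li>"
```

The regexp matches the letters "span" anywhere, not only inside angle brackets, so a tag whose text contains that word would get mangled too. A stricter pattern would probably be safer if that ever comes up.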

This isn’t ideal either; I should probably add a tags heading or something. I’ve also decided not to convert them all to Hugo tags as some of them are… Weird. I’ll probably have to come back to this again in the future.

There’s quite a lot of HTML code in the posts that needs cleaning up too, but I may have to look at it one page (of almost 800 pages) at a time.

Update:

I think I’ve got it all sorted. If you spot anything clearly wrong or weird character quoting or whatever, let me know.

Except broken links. There are a lot of broken links…