Friday, 13 January 2012

TeX - an odd markup language

Why do I say odd?

Over the last couple of iterations at $work, I have wanted to produce some automated PDF reports from our database, with the option to pull in additional images should they be available.

Now, I'm using the Perl ClearPress MVC framework for the application, and it is nice and friendly in that I can template my TeX and push the result through pdflatex to get the PDF.

No problems so far. The data is therefore obtained and pushed into the template in exactly the same way as with HTML.

A nice simple bit of TeX markup and I get a nicely formatted PDF report, yes?


Not quite. Unlike scripting languages and HTML, it doesn't seem to do everything top to bottom. Some of my tables end up above section headings, and images do too.


Now, prior to this I had no experience with TeX in any flavour.
I start by looking into the main uses of TeX. For those not in the know, it is a typesetting markup designed with mathematical equation rendering in mind, and it is popular for journal submissions because, with only simple markup supplied with the paper, the journal can change the formatting to suit its own style.

So I start searching Google for how to do formatting. The answers to my queries always seem to take me to forums, as opposed to any sort of manual. The first answer took me to the info in the paragraph above, basically saying 'we want to do your formatting, don't do it yourself'. And there was an interesting discussion in a forum, with one person giving a way of telling your table/image 'Put yourself there!' and another asking 'Why on Earth would you want to limit your table/image to the one position?'.

Another problem was escaping special characters. Eventually, I came to the following link, The Comprehensive LaTeX Symbol List - The UK TeX Archive (pdf download), but not before being pushed around forum answers again.

It goes to show that forums are great places for information (although a question to Google often brings up forum threads in favour of the manual :) ).

But again, why do I say odd?

Well, it seems that TeX doesn't do markup in quite the logical way I would do markup (which is the way HTML does it). HTML says: start at the top and display the marked-up content from top to bottom, unless the markup (combined with CSS/JavaScript) tells me to do otherwise. TeX seems to say: start at the top, but float tables and images to wherever I think is best, unless the markup specifies something different. This seems to keep items of the same type in order relative to each other (tables stay in table order, images in image order), but doesn't necessarily keep each item where it was written.
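
As a rough illustration of that floating behaviour, here is a minimal sketch of the usual ways to influence where a table lands. It is not the report template described above: the table contents are invented, and it assumes the `float` and `placeins` packages are available in the TeX installation.

```latex
\documentclass{article}
\usepackage{float}     % assumption: provides the strict [H] placement specifier
\usepackage{placeins}  % assumption: provides \FloatBarrier

\begin{document}

\section{Results}

% Default behaviour: LaTeX may move this table to wherever it judges best,
% for example the top of the page (above the heading) or a later page.
\begin{table}
  \centering
  \begin{tabular}{lr}
    Sample & Count \\
    A      & 12    \\
    B      & 7     \\
  \end{tabular}
  \caption{A floating table (placeholder data).}
\end{table}

% [H] from the float package: put the table exactly here, with no floating.
\begin{table}[H]
  \centering
  \begin{tabular}{lr}
    Sample & Count \\
    C      & 3     \\
  \end{tabular}
  \caption{A table pinned in place (placeholder data).}
\end{table}

% Make sure any floats still waiting are output before the next section.
\FloatBarrier

\section{Discussion}
Text that should follow all of the tables above.

\end{document}
```

Pinning every float with `[H]` gives the HTML-like top-to-bottom order, at the cost of possibly leaving large gaps at page breaks; `\FloatBarrier` is the gentler option, since it only stops floats from drifting past a chosen point.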

Also, special characters are not always represented by a named text command, or by a simple backslash-escaped version. For example:

  \ needs to be \textbackslash, not \\ (as that actually means newline)
  # needs to be \#, there isn't a \texthash, and \hash doesn't always work
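
To put the two cases above in context, here is a small sketch covering the ten characters LaTeX treats specially in ordinary text. The path and ticket number are made-up values, chosen only to show the escapes in use.

```latex
\documentclass{article}
\begin{document}

% Seven special characters that take a simple backslash escape.
\# \quad \$ \quad \% \quad \& \quad \_ \quad \{ \quad \}

% Three that do not: \\ is a line break, and \~ and \^ are accent commands,
% so these characters need a named command instead.
\textbackslash{} \quad \textasciitilde{} \quad \textasciicircum{}

% Made-up example values showing the escapes inside running text.
C:\textbackslash{}reports\textbackslash{}run\_42 \quad Ticket \#1234

\end{document}
```

In a templated report, any value pulled from the database would presumably need this treatment before it reaches pdflatex, since an unescaped `#`, `%` or `_` in the data is enough to break or silently truncate the output.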

I think that, had I known this at the start, I'd have had a better idea of how to query Google for info. TeX is clearly a useful markup and, by the looks of it, very extensible (I pulled in 'grffile' in order for it to deal with spaces in filenames and paths), but it isn't as quick to pick up as HTML.
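
For anyone wanting to try the same thing, this is a minimal sketch of loading grffile alongside graphicx. The image path is hypothetical, and the file would have to exist for the document to compile.

```latex
\documentclass{article}
\usepackage{graphicx}  % provides \includegraphics
\usepackage{grffile}   % extends graphicx's handling of file names

\begin{document}

% Hypothetical path containing spaces.
\includegraphics[width=0.5\textwidth]{images/run 42 summary.png}

\end{document}
```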