Friday, 13 January 2012

TeX - an odd markup language

Why do I say odd?

Over the last couple of iterations at $work, I have wanted to produce some automated PDF reports from our database, with the option to pull in some additional images should they be available.

Now, I'm using the Perl ClearPress MVC framework for the application, and it is nice and friendly in that I can template my TeX and push the result through pdflatex to get the PDF.

No problems so far. The data is therefore obtained and pushed into the template in exactly the same way as with HTML.

A nice simple bit of TeX markup and I get a nicely formatted PDF report, yes?

No.

Unlike scripting languages and HTML, it doesn't seem to be doing everything top to bottom. Some of my tables end up above Section headings, images too.

What?

Now, prior to this I had no experience with TeX in any flavour.
I start by looking into the main uses of TeX. For those not in the know, it is a typesetting markup known especially for rendering mathematical equations, and it is popular with journals because, given a paper written in simple markup, they can change the formatting to suit the journal.

So I start querying Google on how to do formatting. The answers to my queries always seem to take me to forums, as opposed to any sort of manual. The first answer took me to the info in the paragraph above, basically saying: we want to do your formatting, so don't do it yourself. And there was an interesting discussion in a forum, with one person giving a way of telling your table/image 'Put yourself there!' and another asking 'Why on Earth would you want to limit your table/image to the one position?'.

Another issue was escaping special characters. Eventually, I came to the following link, The Comprehensive LaTeX Symbol List - The UK TeX Archive (pdf download), but not before having been pushed around forum answers again.

It goes to show that forums are great places for information (although a question to Google often brings those up in favour of the manual :) ).

But again, why do I say odd?

Well, it seems that TeX doesn't do markup in quite the logical way I would do markup (which is the way HTML does it). HTML says: start at the top and display the marked-up content from top to bottom, unless the markup (combined with CSS/JavaScript) tells me to do otherwise. TeX seems to say: start at the top, but float everything to where I think it is best, unless the markup specifies something different. This seems to keep items of the same type in order relative to each other, but doesn't necessarily keep them where they were written.
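Here is a minimal LaTeX sketch of that floating behaviour as I understand it, not a definitive recipe. It assumes the standard article class plus the graphicx and float packages. The section title and table contents are made up, and example-image is a placeholder picture that ships with the mwe package in most TeX distributions (swap in your own file).

```latex
\documentclass{article}
\usepackage{graphicx} % provides \includegraphics
\usepackage{float}    % provides the strict [H] placement specifier

\begin{document}

\section{Results}

% A floating table: LaTeX picks the position (top, bottom or a float page),
% so it can land at the top of the page, above the heading it was written under.
\begin{table}[tbp]
  \centering
  \begin{tabular}{lr}
    Sample & Count \\
    A      & 12    \\
  \end{tabular}
  \caption{A table that is allowed to float}
\end{table}

% [H] (from the float package) says 'Put yourself there!':
% the table stays exactly at this point in the text.
\begin{table}[H]
  \centering
  \begin{tabular}{lr}
    Sample & Count \\
    B      & 7     \\
  \end{tabular}
  \caption{A table pinned in place}
\end{table}

% A bare \includegraphics does not float at all;
% only a surrounding figure environment makes it a float.
\includegraphics[width=0.5\textwidth]{example-image}

\end{document}
```

The trade-off with [H] is that a pinned table or image which does not fit on the current page is pushed to the next one, leaving a gap, which is presumably why the forum answers argue against fixing positions.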

Also, special characters are not always represented by a textual markup version, or by an actual escaped version. For example:

  \ needs to be \textbackslash, not \\ (as that actually means newline)
  # needs to be \#; there isn't a \texthash, and \hash doesn't always work
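As a small sketch of the escapes I mean, assuming plain LaTeX with the article class and no extra packages, the common special characters split into two groups:

```latex
\documentclass{article}
\begin{document}

% Characters that are escaped with a leading backslash:
\# \$ \% \& \_ \{ \}

% Characters that need a named command instead:
\textbackslash{} \textasciitilde{} \textasciicircum{}

% \\ is not a literal backslash; it ends the current line:
first line \\ second line

\end{document}
```

When the text comes from a database and goes through a template, these substitutions have to be applied to the data before it reaches pdflatex, with the backslash handled first so that the escapes themselves are not escaped again.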

I think that, had I known this at the start, I'd have had a better idea of how to query Google for info. TeX is clearly a useful markup and, by the looks of it, very extensible (I pulled in 'grffile' so that it could deal with spaces in filenames and paths), but it isn't as quick to pick up as HTML.
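For illustration, pulling in grffile looks roughly like this. The file name "my report image.png" is hypothetical and would need to exist next to the .tex file for this to compile.

```latex
\documentclass{article}
\usepackage{graphicx} % provides \includegraphics
\usepackage{grffile}  % extends graphicx file-name handling (e.g. spaces)

\begin{document}

% A hypothetical image whose file name contains spaces.
\includegraphics[width=\textwidth]{my report image.png}

\end{document}
```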

5 comments:

Jakub Narebski said...

You can generate PDF in other ways (though some of them go through pdfTeX / pdfLaTeX as well).

Anyway, I recommend using higher-level LaTeX rather than plain TeX. This is what I know... and there, if you use \includegraphics, it does not float - that is what the 'figure' environment is for (\begin{figure} ... \end{figure}).

Jakub Narebski said...

Nowadays TeX / LaTeX StackExchange should be a good place to ask questions, besides the comp.dtp.tex newsgroup and the like.

brian d foy said...

HTML is not a presentation language as you suggest. It doesn't necessarily start at the top and work its way to the bottom either even if many browsers do just that.

TeX is a layout engine. It puts things where it thinks they most likely belong. Tables and figures shouldn't break across printed pages and so on. When it encounters those objects, it looks for a place to put them where they should be safe. TeX is actually a two-step process. The first pass creates the DVI (device independent) file and the second pass presents it for the particular device.
