\usemodule[mag-01]
\usemodule[abr-02]

\setuptyping[option=XML,palet=colorpretty]
\setuptyping[option=LUA,palet=colorpretty]

\definetype[typeXML][option=XML,style=type]
\definetype[typeTEX][option=tex,style=type]
\definetype[typeLUA][option=lua,style=type]

\useattachment[sample][sample.xml]
\useattachment[environment][ourenvironment.tex]

%\setupinteraction[state=start]

\setvariables
  [magazine]
  [title={Getting Web Content and pdf-Output from One Source},
   author={Thomas A. Schmitz},
   date={Feb. 25, 2010},
  ]

\startbuffer[abstract]
\CONTEXT's capabilities of typesetting \XML\ allow you to use the same
source document for producing both a web page and typeset output. This
tutorial will explain the basics of how to use a \CONTEXT\ environment
that will process your \XML\ file.
\stopbuffer

\setupheadertexts[section]

\starttext

\setups [titlepage]
\setups [title]

Sometimes, documents that you create will have to \quotation{live} in different formats. One common requirement will be that you want to publish their content on the web and have a beautifully typeset version for printing and easier reference. \CONTEXT\ can handle \XML\ files, and with the advent of \MKIV, it has sophisticated features to filter and manipulate \XML\ documents.\footnote{If you are interested in the details, chapters XVII and XXIII of the mk manual (\hyphenatedurl{http://www.pragma-ade.nl/general/manuals/mk.pdf}) contain lots of fascinating background information. A manual for \XML\ in \MKIV\ can be found at \hyphenatedurl{http://www.pragma-ade.com/general/manuals/xml-mkiv.pdf}.}

In this MyWay, I will describe the process of setting up a relatively simple \XHTML\ document so that it can be typeset by \CONTEXT. This article is the by-product of something I had to set up at my university department: we wanted to publish a document with reading assignments and bibliographical information for our students.
This document will be published on our department's website,\footnote{Where it will be part of our university's CMS, but this is irrelevant here.} but I also wanted a pdf-version that students could print out for easier reference. Maintaining and syncing two different source files (one in \HTML\ for the website, one in \TeX\ for typesetting) is terribly inefficient and error-prone, so I decided that I wanted to set up a process to typeset the \XHTML\ file with \CONTEXT. The document itself is rather simple: it contains text, a few tables, and a few images. It is given as an example that should allow and motivate you to delve further into this subject.

Our source document is coded in \quotation{strict} \XHTML\ since the specs for this format (esp. that all elements be properly nested and closed) make it easier to process documents with \CONTEXT\ than \quotation{pure} \HTML. We will look at the structure of this \XHTML\ document step by step. After the \typeXML{DOCTYPE} declaration and the required \typeXML{<html>} and \typeXML{<head>} elements, our document begins with a heading and an introduction which contains just text:

\startXML
<html>
<head><title>Our Document</title></head>
<body>

<h1>Important Advice</h1>

<h2>Introduction</h2>

<p>The first paragraph. It contains <q>quoted text</q>, <em>emphasized
text which should be rendered in italics</em>, and <b>bold text</b>.</p>

</body>
</html>
\stopXML

If we want to process this file, we will need to tell \CONTEXT\ what to do with the different elements and with the document as a whole. For this, we need to write an environment file. If you have ever written something in \HTML, you can think of this file as the equivalent of an external \CSS\ file.

As you may have heard, \CONTEXT\ makes use of \LUATEX, which will completely replace \PDFTEX\ in time. Many parts of \CONTEXT\ now exist in an older version (for good old \PDFTEX), which is called \MKII, and a newer version (for \LUATEX), which is called \MKIV. In most areas, there are no big differences in the user interface, but since \LUATEX\ is far superior in this area, Hans Hagen has rewritten the entire \XML\ handling mechanism from scratch. The new code allows more control over what to do with different \XML\ elements, and it is much faster for complex documents. For the time being, there is not enough documentation for beginners; hence this MyWay. It will describe how to code an environment for use with \MKIV.

Our environment will basically contain two parts:

\startitemize[n]
\item Setups for handling the different \XHTML\ elements, tags, and attributes,
\item and the setup for typesetting our document, i.e., the information which is normally contained in the preamble of your \CONTEXT\ documents.
\stopitemize

With that in mind, let us begin by looking at the different elements of our environment file. I will explain what they do as we go.

\startTEX
\startxmlsetups xml:oursetups
  \xmlsetsetup{\xmldocument}{*}{-}
  \xmlsetsetup{\xmldocument}{html|body|h1|h2|p|em|q|b}{xml:*}
\stopxmlsetups

\xmlregistersetup{xml:oursetups}
\stopTEX

We begin by defining our setups. The line \typeTEX{\startxmlsetups xml:oursetups} defines an environment for our set and names it {\tt oursetups}.
The two lines within this environment tell \CONTEXT\ what to do with the different elements: the line \typeTEX{\xmlsetsetup{\xmldocument}{*}{-}} tells it to disregard everything; this way, only elements that we name explicitly will be typeset. In our case, this is useful because we do not want the \quotation{title} element of the header to be typeset. The next line lists all the elements which we {\em do} want to be processed and typeset. As we define further elements, we will have to remember to add them to this line, or they will be silently disregarded! Finally, we \quotation{register} our setup under its name.

Next, we will tell \CONTEXT\ what it should do with the different elements:

\startTEX
\startxmlsetups xml:html
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:body
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:h1
  \chapter{\xmlflush{#1}}
\stopxmlsetups

\startxmlsetups xml:h2
  \section{\xmlflush{#1}}
\stopxmlsetups

\startxmlsetups xml:p
  \xmlflush{#1}\par
\stopxmlsetups

\startxmlsetups xml:em
  \dontleavehmode{\em \xmlflush{#1}}
\stopxmlsetups

\startxmlsetups xml:q
  \quotation{\xmlflush{#1}}
\stopxmlsetups

\startxmlsetups xml:b
  \dontleavehmode{\bf\xmlflush{#1}}
\stopxmlsetups
\stopTEX

These different setup elements are the most important part of our environment file. They tell \CONTEXT\ how to translate \XHTML\ tags into \CONTEXT\ commands. If you look at these definitions, you will see that they are not difficult to understand: for every element you want processed, you need a setup command. Every setup name starts with the {\tt xml:} prefix; the name of the element follows. The first two commands tell \CONTEXT\ to simply \quotation{flush,} i.e., transmit the content of the \typeXML{<html>} and \typeXML{<body>} elements to the typesetting engine. Things become more interesting with the different headers: here, we want headings at the level of \typeXML{<h1>} to be typeset as chapter headings in \CONTEXT. That's what the line \typeTEX{\chapter{\xmlflush{#1}}} does: it takes the content between the \typeXML{<h1>} tags and \quotation{flushes} it as the argument of the \typeTEX{\chapter} command. \typeXML{<p>} elements are paragraphs; hence, they are flushed and a \typeTEX{\par} is added at the end. You'll see easily what the other setups do. Since switches such as \typeTEX{\em} and \typeTEX{\bf} need to be inside groups, we add an extra pair of braces; since they might cause problems if they start a paragraph, we have to be cautious and add a \typeTEX{\dontleavehmode} at the beginning.

After these \XML\ setups, the second part of our environment file contains just the normal setup for typesetting. If you are a little bit familiar with \CONTEXT, this should be easy to understand, and I won't go into the details here:

\startTEX
\usetypescript[termes]
\setupbodyfont[termes,11pt]
\setupbodyfontenvironment[default][em=italic]

\setuphead[chapter]
  [page=yes,
   header=empty,
   align=middle,
   after={\blank[line]}]

\setuphead[section]
  [page=no,
   align=middle,
   number=no,
   before={\blank[2*line]},
   after={\blank[line]}]

\setupindenting[medium,yes]
\stopTEX

This, then, is all you need if you want to process normal text in paragraphs and headings. We can now typeset our file (which we name {\tt sample.xml}) with this environment (which we call {\tt ourenvironment.tex}) with this command:

{\tt context --environment=ourenvironment sample.xml}

The output will be saved as {\tt sample.pdf}, and it should show all the elements we have defined.

Things become a bit more complex when we want to build tables. In the document I was writing, there were two types of table, one with two columns, one with three columns. In \HTML, this is not problematic since the browser will reflow the text according to the width of the window. In a printed version, however, we want more control over the relative width of the single table columns. In order to achieve this, we need to distinguish between the two types of tables. We assign them two different values of the {\tt class} attribute in our \XHTML\ code:

\startXML
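<table class="threecol">
<tr><th>Heading One</th><th>Heading Two</th><th>Heading Three</th></tr>
<tr><td>A Paragraph</td><td>A Title</td><td>An Explanation</td></tr>
</table>

<table class="twocol">
<tr><td>A</td><td>A lengthy paragraph, with <q>text in quotation marks</q> and all sorts of other stuff.</td></tr>
<tr><td>B</td><td>And yet another paragraph.</td></tr>
</table>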
\stopXML

The first thing we will have to remember is to add these elements to the top of our environment so they will get processed:

\startTEX
\xmlsetsetup{\xmldocument}{html|body|h1|h2|p|em|q|b|table|tr|th|td}{xml:*}
\stopTEX

This is important for all elements that we will use; I will assume that you remember this from now on.

But how can we typeset these tables? \CONTEXT\ offers \quotation{Natural Tables.}\footnote{For details, see the Natural Tables manual at \hyphenatedurl{http://pragma-ade.com/general/manuals/enattab.pdf}.} They are quite similar in their setup to \HTML\ tables, so it is relatively easy to map this code to \CONTEXT\ code. We will use the \quotation{class} attributes to define two different setups:

\startTEX
\startxmlsetups xml:table
  \doifelse {\xmlatt{#1}{class}} {threecol}
    {
      \setupTABLE[c][1][align=right,width=.2\textwidth]
      \setupTABLE[c][2,3][align=right,width=.4\textwidth]
      \bTABLE[frame=on,split=yes]
        \xmlflush{#1}
      \eTABLE
    }
    {
      \setupTABLE[c][1][align=right,width=.05\textwidth]
      \setupTABLE[c][2][align=right,width=.95\textwidth]
      \bTABLE[frame=on,split=yes]
        \xmlflush{#1}
      \eTABLE
    }
\stopxmlsetups

\startxmlsetups xml:tr
  \bTR
    \xmlflush{#1}
  \eTR
\stopxmlsetups

\startxmlsetups xml:th
  \bTD [align=middle,style=bold]
    \xmlflush{#1}
  \eTD
\stopxmlsetups

\startxmlsetups xml:td
  \bTD
    \xmlflush{#1}
  \eTD
\stopxmlsetups
\stopTEX

Let us look at this code in detail: first, we tell \CONTEXT\ that we want to process \typeXML{<table>} elements:

\startTEX
\startxmlsetups xml:table
\stopTEX

Then, we use a condition to process this element.
The syntax for this conditional in \CONTEXT\ is \typeTEX{\doifelse {string1} {string2} {then ...} {else ...}}:\footnote{There is an excellent article by Taco Hoekwater on system macros at \hyphenatedurl{http://tex.aanhet.net/context/syst-gen-doc.pdf}; the same material is available on the \CONTEXT\ wiki (\hyphenatedurl{http://wiki.contextgarden.net/System_Macros}) as well.} we compare \quotation{string1} to \quotation{string2.} If they are identical, the \quotation{then} branch is executed; if they are different, the \quotation{else} branch is executed. The command

\startTEX
\doifelse {\xmlatt{#1}{class}} {threecol}
\stopTEX

thus compares the value of the attribute {\tt class} of the current element (that's what \typeTEX{\xmlatt{#1}{class}} expands to) with the string \quotation{threecol.} So: if the \quotation{class} attribute is set to \quotation{threecol,} we set up a table in which the first column occupies 20\,\% of the textwidth and columns two and three 40\,\% each. If it is set to any other value, we set up a table in which the first column holds 5\,\% of the textwidth and the second column the remaining 95\,\%. (If we needed more types of tables, we would have to nest such \typeTEX{\doifelse} macros.)

The rest is straightforward: \typeXML{<th>} elements are wrapped in \typeTEX{\bTD \eTD} pairs and are formatted as bold, centered text; \typeXML{<tr>} and \typeXML{<td>} elements are wrapped in the corresponding natural table commands for table rows (\typeTEX{\bTR \eTR}) and table cells (\typeTEX{\bTD \eTD}).

Let us look at one further point: in my tables, I wanted some cells to span several rows. How is this done? In \XHTML, there is the {\tt rowspan} attribute:

\startXML
<tr><td>A</td><td>1</td><td rowspan="3">Three rows</td></tr>
<tr><td>B</td><td>2</td></tr>
<tr><td>C</td><td>3</td></tr>
\stopXML

A similar effect can be achieved in a natural table in \CONTEXT. The syntax here is \typeTEX{\bTD [nr=3]}. So all we have to do is extract the value of the {\tt rowspan} attribute and \quotation{feed} it to the {\tt nr} argument in our \CONTEXT\ table. But there is one further problem: if a \typeXML{<td>} element does not have a {\tt rowspan} attribute, its value does not exist, of course. We must make sure that such a non-existent value is not transmitted to the {\tt nr} argument, or \CONTEXT\ will complain about a \quotation{missing number.} We modify our definition of the \typeXML{<td>} element: first, we test whether {\tt rowspan} does have a numerical value; if it does, we feed this number to \CONTEXT. Again, we use one of the nifty system conditionals that \CONTEXT\ provides:

\startTEX
\startxmlsetups xml:td
  \doifnumberelse {\xmlatt{#1}{rowspan}}
    {\bTD [nr=\xmlatt{#1}{rowspan},align=lohi]
       \xmlflush{#1}
     \eTD}
    {\bTD
       \xmlflush{#1}
     \eTD}
\stopxmlsetups
\stopTEX

You have probably understood what this code does: the command \typeTEX{\doifnumberelse} takes three arguments. It checks whether the first argument is a number; here this first argument is the attribute {\tt rowspan} of the current element. If this is a number, it will use this number as the value of the {\tt nr} key in \CONTEXT's table and flush the content of the element between the table commands \typeTEX{\bTD} and \typeTEX{\eTD}. If it isn't a number (because the attribute doesn't exist), it builds a \quotation{normal} table cell without any additional arguments.

So much for tables. Let us now take a look at another interesting aspect of \HTML: embedding images. Here's a typical way an image is embedded in \HTML:

\startXML

<img src="..." width="..." alt="hacker" />

\stopXML

As you see, the \typeXML{<img>} element takes attributes which define the image to be included, its width, and an alternative text which should appear in case the image does not load. We can use this text for our image caption, and it is clear that we will need the image name as well. However, there is a problem with the {\tt width} parameter: in \XHTML, it can be given either in pixels, in which case it will be given as a number only, or in percent of the containing element. These cases need a special treatment: if the width is given in pixels, we can easily use this number to give the size in points, but we will have to add the unit {\tt pt}. If it is given in percent, we will have to get rid of the \% sign (which would confuse the \TeX\ engine) and convert it to a format that \CONTEXT\ uses, which is usually in the form \typeTEX{0.x\textwidth}.

This conversion could be done in \TeX, but since we are using \LUATEX, we have the convenience of the \LUA\ language, which we will use here. First, we write a \LUA\ function that converts the value of the {\tt width} attribute:\footnote{I'm grateful to Taco Hoekwater who provided help with the \LUA\ code.}

\startLUA
function getmeas(s)
  -- anything besides digits means the value is a percentage
  if string.find(s, "[^0-9]") then
    s = s:sub(1,-2)          -- drop the trailing percent sign
    s = s / 100              -- e.g. "25" becomes 0.25
    s = s.."\\textwidth"
    tex.sprint(tex.ctxcatcodes, s)
  else
    s = s.."pt"              -- plain pixel count, used as points
    tex.sprint(s)
  end
end
\stopLUA

Providing an introduction to the \LUA\ language is beyond the scope of this MyWay; I give just a few short explanations: Since the \XHTML\ attribute {\tt width} can either be a number or a number with a percent sign, we know that any value which contains more than just digits must be a percentage. The function {\tt getmeas} takes a string {\tt s}. It then tests whether this string contains anything but digits (that's what the line \typeLUA{if string.find(s, "[^0-9]")} does).
If it contains anything but digits (i.e., digits and a percent sign), the \typeLUA{then} branch is executed: first, we extract a substring from our string {\tt s} which extends from the first character to the last but one character with the code \typeLUA{s = s:sub(1,-2)}. This will thus give us the number, without the \% sign. We then divide this number by 100 (\typeLUA{s = s / 100}) and append the \TeX\ string \typeTEX{\textwidth} to it. Finally, we pass this new string (which now has the form \typeTEX{0.25\textwidth}) to the \LUATEX\ engine. If our string {\tt s} contains only digits, we simply append the unit {\tt pt} to it and pass it to \LUATEX; it now has the form {\tt 25pt}. We wrap this \LUA\ function in a pair of \type{\startluacode \stopluacode} delimiters.

We can now finally write the setup for the {\tt img} element:

\startTEX
\startxmlsetups xml:img
  \placefigure
    [here]
    [\xmlatt{#1}{src}]
    {\xmlatt{#1}{alt}}
    {\externalfigure
       [\xmlatt{#1}{src}]
       [width=\ctxlua{getmeas("\xmlatt{#1}{width}")}]}
\stopxmlsetups
\stopTEX

So: when \TeX\ finds an {\tt img} element, it will issue a \typeTEX{\placefigure} command. It will use the name of the image (which is given in the {\tt src} attribute) as the identifier of this figure and the content of the {\tt alt} attribute for the caption. Finally, it will place the image itself as an \typeTEX{\externalfigure}, again using the content of the {\tt src} attribute and the content of the {\tt width} attribute to calculate the width.

One last word about images: as you know, \HTML\ can include both local images and images retrieved from the web via URIs. You will be relieved to know that the same is possible with \CONTEXT: both \typeTEX{\externalfigure[nameoflocalfigure]} and \typeTEX{\externalfigure[http://www.someplace/someimage.jpg]} will work.

As you see, \CONTEXT\ \MKIV\ offers rich possibilities of processing and manipulating \XML\ content.
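
As one more illustration of these possibilities, here is a minimal sketch of how unordered lists might be mapped to \CONTEXT's itemize environment. Note that the elements {\tt ul} and {\tt li} do not occur in {\tt sample.xml}; they are assumed here purely for illustration, and the setups simply follow the same pattern as the ones shown above:

\startTEX
% Hypothetical extension: <ul> and <li> are not part of sample.xml.
% Inside xml:oursetups, the new elements would have to be registered:
%   \xmlsetsetup{\xmldocument}{ul|li}{xml:*}

% The list as a whole becomes an itemize environment.
\startxmlsetups xml:ul
  \startitemize
    \xmlflush{#1}
  \stopitemize
\stopxmlsetups

% Every list entry becomes one \item.
\startxmlsetups xml:li
  \item \xmlflush{#1}
\stopxmlsetups
\stopTEX

As with all other elements, such a list would be silently disregarded unless {\tt ul} and {\tt li} are also registered with \typeTEX{\xmlsetsetup}.
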
It is even possible to filter the content of the \XML\ data and only typeset content which matches certain criteria. Here's an example:

\startTEX
\startxmlsetups xml:p
  \xmltext{#1}{q}
\stopxmlsetups
\stopTEX

This setup looks at the element \typeXML{<p>} and then only typesets subelements of type \typeXML{<q>} within this element. This may come in handy if you want to select only certain elements from your file. A command that is even more powerful is \typeTEX{\xmlfilter}; it can filter your \XML\ data and only process it if it meets certain conditions (only elements which have a certain attribute, or whose text contains a certain string).

This MyWay was meant to whet your appetite. \CONTEXT\ \MKIV\ offers many sophisticated options to filter, manipulate, and typeset \XML\ files. This brief tutorial was meant to give beginners a starting point for exploring these opportunities. If writing, editing, and maintaining documents which will end up on the web and which should also be typeset is part of your workflow, you should definitely have a look at these possibilities.

\page

To make it easier for you to experiment, I have included the \XML\ file and the environment here. First the file {\tt sample.xml}: \attachment[sample]

\typefile[XML][]{sample.xml}

And here the environment {\tt ourenvironment.tex}: \attachment[environment]

\typefile[TEX][]{ourenvironment.tex}

%\setups [listing]
%If you want the source listing of the module to be printed

\setups [lastpage]

\stoptext

%%% Local Variables:
%%% mode: context
%%% TeX-master: t
%%% End: