Semantics : SgmlStatement

SgmlStatement  ::=  do Expr ( in Expr )? ( < IDENTIFIER > )? as ( sgml ( case )? | html )
( with IDENTIFIER ;
| ( extends IDENTIFIER )? { ( SgmlEvent : ( Statement )+ )+ }
)

This statement is for SGML/HTML scraping and processing. It takes a file name or URL as the source and lets handler code be attached to each tag, including text. The source can also be a file within a zip archive. The text encoding can be specified between < and >. The keywords sgml and html are synonyms here.

Tags and text pieces in an SGML document are parsed into "events" that handler code can process. There are also special events such as BEFORE and AFTER. See SgmlEvent.

In the tag handler code, $_ is the built-in variable for the current tag. Unless it is an end tag or text, its attributes are accessed as data members of $_. See SgmlTag, SgmlTextTag or SgmlSpecialTag for its properties and methods. Tag names and attribute names are case insensitive for HTML and SGML by default; for SGML, the case decorator forces case sensitivity. Multiple tags can share the same handler. A handler's code ends where the next tag begins. The code runs in its own context (or scope) of the block.
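The event model described above can be pictured with a rough analogue in Python. This is not the statement's own syntax or implementation; it is only a sketch using the standard library's html.parser, whose parser also lowercases tag and attribute names, which mirrors the default case-insensitive matching. The names EventCollector and parse_events are hypothetical, and the assumption that BEFORE and AFTER bracket the whole document is made here for illustration only.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Collects tags and text as a flat list of (kind, name_or_text, attrs)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser delivers tag and attribute names already lowercased.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        # End tags carry no attributes.
        self.events.append(("end", tag, {}))

    def handle_data(self, data):
        # Text pieces are events too; whitespace-only runs are skipped here.
        text = data.strip()
        if text:
            self.events.append(("text", text, {}))


def parse_events(markup):
    """Return the event list, bracketed by BEFORE and AFTER (an assumption)."""
    collector = EventCollector()
    collector.events.append(("BEFORE", "", {}))
    collector.feed(markup)
    collector.close()
    collector.events.append(("AFTER", "", {}))
    return collector.events


if __name__ == "__main__":
    for event in parse_events('<A HREF="x.html">Link</A>'):
        print(event)
```

In this sketch the attribute dictionary on a start event plays the role that the data members of $_ play in a tag handler, and end-tag and text events carry no attributes.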

An SGML handler statement can also be declared first; see SgmlHandlerDeclaration. A declared SGML handler can be invoked via the with clause. Other SGML handler statements can also inherit its handlers via the extends clause.

The following rules apply to inherited SGML handler statements. The BEFORE and AFTER handlers are never inherited. For a tag, if the current handler statement provides no specific handler, processing goes to the parent; if none of the handler statements in the chain has a specific handler for it, the any-tag handler (< >) is tried in the same way. In a handler, the resume statement can be used at the end to continue processing with the parent's handler.
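The lookup order in these rules can be modeled in a few lines of Python. This is an illustrative sketch, not the actual implementation: HandlerSet, dispatch, run_special and ANY_TAG are hypothetical names, resume is modeled as a handler returning True, and it is assumed that resume continues with the nearest ancestor that defines a handler for the same tag.

```python
from typing import Callable, Dict, Iterator, List, Optional

# A handler receives the tag name and a log; returning True models "resume".
Handler = Callable[[str, List[str]], bool]

ANY_TAG = "< >"  # hypothetical key standing in for the any-tag handler


class HandlerSet:
    """Hypothetical model of one SGML handler statement and its parent."""

    def __init__(self, tags: Dict[str, Handler],
                 special: Optional[Dict[str, Handler]] = None,
                 parent: Optional["HandlerSet"] = None) -> None:
        # Tag names are matched case-insensitively, as in the default mode.
        self.tags = {name.lower(): h for name, h in tags.items()}
        self.special = special or {}
        self.parent = parent

    def chain(self) -> Iterator["HandlerSet"]:
        """Yield this handler set, then its parent, grandparent, and so on."""
        node: Optional[HandlerSet] = self
        while node is not None:
            yield node
            node = node.parent


def find_handlers(start: HandlerSet, key: str) -> List[Handler]:
    """Handlers registered for key along the chain, nearest first."""
    return [hs.tags[key] for hs in start.chain() if key in hs.tags]


def dispatch(start: HandlerSet, tag: str, log: List[str]) -> None:
    """Try specific handlers up the chain first, then any-tag handlers."""
    candidates = find_handlers(start, tag.lower()) or find_handlers(start, ANY_TAG)
    for handler in candidates:
        if not handler(tag, log):  # stop unless the handler "resumes"
            break


def run_special(start: HandlerSet, name: str, log: List[str]) -> None:
    """BEFORE and AFTER are never inherited: only the current set is checked."""
    handler = start.special.get(name)
    if handler is not None:
        handler(name, log)


def make(label: str, resume: bool = False) -> Handler:
    """Build a handler that records its label and optionally resumes."""
    def handler(tag: str, log: List[str]) -> bool:
        log.append(f"{label}:{tag}")
        return resume
    return handler


def demo() -> List[str]:
    base = HandlerSet(
        {"a": make("base-a"), "img": make("base-img"), ANY_TAG: make("base-any")},
        special={"BEFORE": make("base-before")},
    )
    child = HandlerSet({"a": make("child-a", resume=True)}, parent=base)
    log: List[str] = []
    run_special(child, "BEFORE", log)  # not inherited, so nothing is logged
    dispatch(child, "A", log)          # child handler, then resumes to base
    dispatch(child, "img", log)        # falls through to the parent's handler
    dispatch(child, "p", log)          # no specific handler: any-tag handler
    return log


if __name__ == "__main__":
    print(demo())
```

In the demo, the child defines only a handler for a, so img is handled by the parent, p falls back to the parent's any-tag handler, and the parent's BEFORE handler is skipped because special handlers are not inherited.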