   PHP Manual

CXI. XML parser functions

Introduction

   XML (eXtensible Markup Language) is a data format for structured
   document interchange on the Web. It is a standard defined by the World
   Wide Web Consortium (W3C). Information about XML and related
   technologies can be found at http://www.w3.org/XML/.

   This extension implements support for James Clark's expat in PHP.
   This toolkit lets you parse, but not validate, XML documents. It
   supports three source character encodings, all of which are also
   provided by PHP: US-ASCII, ISO-8859-1 and UTF-8. UTF-16 is not
   supported.

   This extension lets you create XML parsers and then define handlers
   for different XML events. Each XML parser also has a few parameters
   you can adjust.

Requirements

   This extension uses expat, which can be found at
   http://www.jclark.com/xml/expat.html. The Makefile that comes with
   expat does not build a library by default. You can use this make rule
   to build one:
libexpat.a: $(OBJS)
    ar -rc $@ $(OBJS)
    ranlib $@

   A source RPM package of expat can be found at
   http://sourceforge.net/projects/expat/.

Installation

   These functions are enabled by default, using the bundled expat
   library. You can disable XML support with --disable-xml. If you
   compile PHP as a module for Apache 1.3.9 or later, PHP will
   automatically use the bundled expat library from Apache. If you do
   not want to use the bundled expat library, configure PHP with
   --with-expat-dir=DIR, where DIR should point to the base installation
   directory of expat.

   The Windows version of PHP has built-in support for this extension.
   You do not need to load any additional extension in order to use these
   functions.

Runtime Configuration

   This extension has no configuration directives defined in php.ini.

Resource Types

xml

   The xml resource returned by xml_parser_create() and
   xml_parser_create_ns() references an XML parser instance to be used
   with the functions provided by this extension.

Predefined Constants

   The constants below are defined by this extension, and will only be
   available when the extension has either been compiled into PHP or
   dynamically loaded at runtime.

   XML_ERROR_NONE (integer)

   XML_ERROR_NO_MEMORY (integer)

   XML_ERROR_SYNTAX (integer)

   XML_ERROR_NO_ELEMENTS (integer)

   XML_ERROR_INVALID_TOKEN (integer)

   XML_ERROR_UNCLOSED_TOKEN (integer)

   XML_ERROR_PARTIAL_CHAR (integer)

   XML_ERROR_TAG_MISMATCH (integer)

   XML_ERROR_DUPLICATE_ATTRIBUTE (integer)

   XML_ERROR_JUNK_AFTER_DOC_ELEMENT (integer)

   XML_ERROR_PARAM_ENTITY_REF (integer)

   XML_ERROR_UNDEFINED_ENTITY (integer)

   XML_ERROR_RECURSIVE_ENTITY_REF (integer)

   XML_ERROR_ASYNC_ENTITY (integer)

   XML_ERROR_BAD_CHAR_REF (integer)

   XML_ERROR_BINARY_ENTITY_REF (integer)

   XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF (integer)

   XML_ERROR_MISPLACED_XML_PI (integer)

   XML_ERROR_UNKNOWN_ENCODING (integer)

   XML_ERROR_INCORRECT_ENCODING (integer)

   XML_ERROR_UNCLOSED_CDATA_SECTION (integer)

   XML_ERROR_EXTERNAL_ENTITY_HANDLING (integer)

   XML_OPTION_CASE_FOLDING (integer)

   XML_OPTION_TARGET_ENCODING (integer)

   XML_OPTION_SKIP_TAGSTART (integer)

   XML_OPTION_SKIP_WHITE (integer)
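
   Because these constants and functions exist only when the extension
   is available, a script can check for them before parsing. The
   following is a minimal sketch; xmlSupportAvailable() is a
   hypothetical helper name chosen for this illustration, not part of
   PHP.

```php
<?php
// Sketch: confirm the XML parser extension is usable before relying on it.
// xmlSupportAvailable() is a hypothetical helper, not a PHP built-in.
function xmlSupportAvailable()
{
    return extension_loaded('xml')
        && function_exists('xml_parser_create')
        && defined('XML_ERROR_NONE');
}

if (xmlSupportAvailable()) {
    echo "XML parser functions and constants are available\n";
} else {
    echo "XML support is missing (PHP may have been built with --disable-xml)\n";
}
```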

Event Handlers

   The XML event handlers defined are:

   Table 1. Supported XML handlers

   PHP function to set handler
          Event description

   xml_set_element_handler()
          Element events are issued whenever the XML parser encounters
          start or end tags. There are separate handlers for start tags
          and end tags.

   xml_set_character_data_handler()
          Character data is roughly all the non-markup contents of XML
          documents, including whitespace between tags. Note that the XML
          parser does not add or remove any whitespace; it is up to the
          application (you) to decide whether whitespace is significant.

   xml_set_processing_instruction_handler()
          PHP programmers should be familiar with processing instructions
          (PIs) already. <?php ?> is a processing instruction, where php
          is called the "PI target". The handling of these is
          application-specific, except that all PI targets starting with
          "XML" are reserved.

   xml_set_default_handler()
          Whatever is not passed to another handler goes to the default
          handler. You will get things like the XML and document type
          declarations in the default handler.

   xml_set_unparsed_entity_decl_handler()
          This handler is called for the declaration of an unparsed
          (NDATA) entity.

   xml_set_notation_decl_handler()
          This handler is called for the declaration of a notation.

   xml_set_external_entity_ref_handler()
          This handler is called when the XML parser finds a reference to
          an external parsed general entity. This can be a reference to a
          file or URL, for example. See the external entity example for a
          demonstration.
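
   As a rough illustration of how these handlers fire, the sketch below
   records element and processing instruction events for a small
   in-memory document. The sample XML, the "app" PI target and the
   $events array are assumptions made up for this example, and the
   handlers are written as closures, which requires a PHP version that
   supports them.

```php
<?php
// Sketch: record element and processing-instruction events.
// The document and the $events array are illustrative only.
$xml = '<doc><?app reload cache?><item id="1"/></doc>';
$events = array();

$parser = xml_parser_create();

// Separate callbacks for start tags and end tags.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start:$name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end:$name";
    }
);

// $target is the PI target ("app" here); $data is the rest of the PI.
xml_set_processing_instruction_handler(
    $parser,
    function ($parser, $target, $data) use (&$events) {
        $events[] = "pi:$target:$data";
    }
);

// The third argument marks this as the final (and only) chunk of input.
$ok = xml_parse($parser, $xml, true);
unset($parser); // release the parser once parsing is finished

echo $ok ? implode("\n", $events) . "\n" : "parse failed\n";
```

   Element names arrive uppercased here because case folding is on by
   default, while the PI target is passed through unchanged.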

Case Folding

   The element handler functions may get their element names case-folded.
   Case-folding is defined by the XML standard as "a process applied to a
   sequence of characters, in which those identified as non-uppercase are
   replaced by their uppercase equivalents". In other words, when it
   comes to XML, case-folding simply means uppercasing.

   By default, all the element names that are passed to the handler
   functions are case-folded. This behaviour can be queried and
   controlled per XML parser with the xml_parser_get_option() and
   xml_parser_set_option() functions, respectively.
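
   The effect of the option can be sketched as follows.
   collectElementNames() is a hypothetical helper written for this
   illustration; it parses a string and returns the element names as
   the start element handler received them.

```php
<?php
// Sketch: collect element names with case folding on (the default) and off.
// collectElementNames() is a hypothetical helper, not part of the extension.
function collectElementNames($xml, $caseFolding)
{
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // End tags are not needed for this demonstration.
        }
    );

    xml_parse($parser, $xml, true);
    return $names;
}

$parser = xml_parser_create();
// Cast to bool: the option may be reported as an integer or a boolean
// depending on the PHP version.
var_dump((bool) xml_parser_get_option($parser, XML_OPTION_CASE_FOLDING));

print_r(collectElementNames('<root><Item/></root>', true));
print_r(collectElementNames('<root><Item/></root>', false));
```

   With folding enabled the handler sees ROOT and ITEM; with folding
   disabled it sees root and Item exactly as written in the document.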

Error Codes

   The following constants are defined for XML error codes (as returned
   by xml_get_error_code()):

   XML_ERROR_NONE
   XML_ERROR_NO_MEMORY
   XML_ERROR_SYNTAX
   XML_ERROR_NO_ELEMENTS
   XML_ERROR_INVALID_TOKEN
   XML_ERROR_UNCLOSED_TOKEN
   XML_ERROR_PARTIAL_CHAR
   XML_ERROR_TAG_MISMATCH
   XML_ERROR_DUPLICATE_ATTRIBUTE
   XML_ERROR_JUNK_AFTER_DOC_ELEMENT
   XML_ERROR_PARAM_ENTITY_REF
   XML_ERROR_UNDEFINED_ENTITY
   XML_ERROR_RECURSIVE_ENTITY_REF
   XML_ERROR_ASYNC_ENTITY
   XML_ERROR_BAD_CHAR_REF
   XML_ERROR_BINARY_ENTITY_REF
   XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF
   XML_ERROR_MISPLACED_XML_PI
   XML_ERROR_UNKNOWN_ENCODING
   XML_ERROR_INCORRECT_ENCODING
   XML_ERROR_UNCLOSED_CDATA_SECTION
   XML_ERROR_EXTERNAL_ENTITY_HANDLING
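
   A minimal sketch of reading an error after a failed parse is shown
   below. describeXmlError() is a hypothetical helper introduced for
   this example. The numeric code reported for a given fault may depend
   on the underlying parser library, so the sketch only compares against
   XML_ERROR_NONE and relies on xml_error_string() for the description.

```php
<?php
// Sketch: parse a deliberately malformed document and report the error.
// describeXmlError() is a hypothetical helper, not part of the extension.
function describeXmlError($xml)
{
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true)) {
        return null; // well-formed: nothing to report
    }
    $code = xml_get_error_code($parser);
    return array(
        'code'    => $code,
        'message' => xml_error_string($code),
        'line'    => xml_get_current_line_number($parser),
    );
}

// <b> is never closed, so the end tag </a> does not match.
$error = describeXmlError("<a>\n<b></a>");
if ($error !== null) {
    printf("XML error %d (%s) at line %d\n",
           $error['code'], $error['message'], $error['line']);
}
```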

Character Encoding

   PHP's XML extension supports the Unicode character set through
   different character encodings. There are two types of character
   encodings, source encoding and target encoding. PHP's internal
   representation of the document is always encoded with UTF-8.

   Source encoding is done when an XML document is parsed. Upon creating
   an XML parser, a source encoding can be specified (this encoding
   cannot be changed later in the XML parser's lifetime). The supported
   source encodings are ISO-8859-1, US-ASCII and UTF-8. The former two
   are single-byte encodings, which means that each character is
   represented by a single byte. UTF-8 can encode characters composed of
   a variable number of bits (up to 21) in one to four bytes. The default
   source encoding used by PHP is ISO-8859-1.

   Target encoding is done when PHP passes data to XML handler functions.
   When an XML parser is created, the target encoding is set to the same
   as the source encoding, but this may be changed at any point. The
   target encoding will affect character data as well as tag names and
   processing instruction targets.

   If the XML parser encounters characters outside the range that its
   source encoding is capable of representing, it will return an error.

   If PHP encounters characters in the parsed XML document that cannot
   be represented in the chosen target encoding, the problem characters
   will be "demoted". Currently, this means that such characters are
   replaced by a question mark.
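
   The interaction between source and target encoding can be sketched
   like this. decodeToLatin1() is a hypothetical helper and the sample
   bytes are chosen for illustration: the document is parsed as UTF-8
   and delivered to the handler as ISO-8859-1, so a character that
   ISO-8859-1 can represent arrives as a single byte, while one it
   cannot represent is demoted to a question mark.

```php
<?php
// Sketch: UTF-8 source encoding, ISO-8859-1 target encoding.
// decodeToLatin1() is a hypothetical helper, not part of the extension.
function decodeToLatin1($utf8Xml)
{
    $text = '';
    $parser = xml_parser_create('UTF-8'); // source encoding, fixed at creation
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');

    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$text) {
            // May be called several times for one text node, so append.
            $text .= $data;
        }
    );

    $ok = xml_parse($parser, $utf8Xml, true);
    return $ok ? $text : false;
}

// "\xC3\xA9" is U+00E9 (e with acute), which ISO-8859-1 can represent.
// "\xE2\x82\xAC" is U+20AC (euro sign), which it cannot.
$result = decodeToLatin1("<a>\xC3\xA9\xE2\x82\xAC</a>");
echo bin2hex($result), "\n";
```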

Examples

   Here are some example PHP scripts parsing XML documents.

XML Element Structure Example

   This first example displays the structure of the start elements in a
   document with indentation.

   Example 1. Show XML Element Structure
   <?php
   $file = "data.xml";
   // current nesting level; a single counter suffices for one parser
   $depth = 0;
   function startElement($parser, $name, $attrs) {
       global $depth;
       // indent by two spaces per nesting level
       for ($i = 0; $i < $depth; $i++) {
           echo "  ";
       }
       echo "$name\n";
       $depth++;
   }
   function endElement($parser, $name) {
       global $depth;
       $depth--;
   }
   $xml_parser = xml_parser_create();
   xml_set_element_handler($xml_parser, "startElement", "endElement");
   if (!($fp = fopen($file, "r"))) {
       die("could not open XML input");
   }
   while ($data = fread($fp, 4096)) {
       if (!xml_parse($xml_parser, $data, feof($fp))) {
           die(sprintf("XML error: %s at line %d",
                       xml_error_string(xml_get_error_code($xml_parser)),
                       xml_get_current_line_number($xml_parser)));
       }
   }
   xml_parser_free($xml_parser);
   ?>

XML Tag Mapping Example

   Example 2. Map XML to HTML

   This example maps tags in an XML document directly to HTML tags.
   Elements not found in the "map array" are ignored. Of course, this
   example will only work with a specific XML document type.

   <?php
   $file = "data.xml";
   $map_array = array(
       "BOLD"     => "B",
       "EMPHASIS" => "I",
       "LITERAL"  => "TT"
   );
   function startElement($parser, $name, $attrs) {
       global $map_array;
       // emit the mapped HTML tag; elements not in the map are ignored
       if (isset($map_array[$name])) {
           $htmltag = $map_array[$name];
           echo "<$htmltag>";
       }
   }
   function endElement($parser, $name) {
       global $map_array;
       if (isset($map_array[$name])) {
           $htmltag = $map_array[$name];
           echo "</$htmltag>";
       }
   }
   function characterData($parser, $data) {
       echo $data;
   }
   $xml_parser = xml_parser_create();
   // use case-folding so we are sure to find the tag in $map_array
   xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
   xml_set_element_handler($xml_parser, "startElement", "endElement");
   xml_set_character_data_handler($xml_parser, "characterData");
   if (!($fp = fopen($file, "r"))) {
       die("could not open XML input");
   }
   while ($data = fread($fp, 4096)) {
       if (!xml_parse($xml_parser, $data, feof($fp))) {
           die(sprintf("XML error: %s at line %d",
                       xml_error_string(xml_get_error_code($xml_parser)),
                       xml_get_current_line_number($xml_parser)));
       }
   }
   xml_parser_free($xml_parser);
   ?>

XML External Entity Example

   This example highlights XML code. It illustrates how to use an
   external entity reference handler to include and parse other
   documents, as well as how PIs can be processed, and a way of
   determining "trust" for PIs containing code.

   XML documents that can be used for this example are found below the
   example (xmltest.xml and xmltest2.xml).

   Example 3. External Entity Example
   <?php
   $file = "xmltest.xml";
   function trustedFile($file) {
       // only trust local files owned by ourselves; a leading URL
       // scheme such as "http://" marks a remote file
       if (!preg_match('#^[a-z]+://#i', $file)
           && fileowner($file) == getmyuid()) {
           return true;
       }
       return false;
   }
   // array key for a parser, which is a resource in older PHP versions
   // and an object in newer ones
   function parserKey($parser) {
       return is_object($parser) ? spl_object_id($parser) : (int) $parser;
   }
   function startElement($parser, $name, $attribs) {
       echo "&lt;<font color=\"#0000cc\">$name</font>";
       foreach ($attribs as $k => $v) {
           echo " <font color=\"#009900\">$k</font>="
              . "\"<font color=\"#990000\">$v</font>\"";
       }
       echo "&gt;";
   }
   function endElement($parser, $name) {
       echo "&lt;/<font color=\"#0000cc\">$name</font>&gt;";
   }
   function characterData($parser, $data) {
       echo "<b>$data</b>";
   }
   function PIHandler($parser, $target, $data) {
       switch (strtolower($target)) {
           case "php":
               global $parser_file;
               // If the parsed document is "trusted", we say it is safe
               // to execute PHP code inside it.  If not, display the
               // code instead.
               if (trustedFile($parser_file[parserKey($parser)])) {
                   eval($data);
               } else {
                   printf("Untrusted PHP code: <i>%s</i>",
                           htmlspecialchars($data));
               }
               break;
       }
   }
   function defaultHandler($parser, $data) {
       // entity references are highlighted differently from other data
       if (substr($data, 0, 1) == "&" && substr($data, -1, 1) == ";") {
           printf('<font color="#aa00aa">%s</font>',
                   htmlspecialchars($data));
       } else {
           printf('<font size="-1">%s</font>',
                   htmlspecialchars($data));
       }
   }
   function externalEntityRefHandler($parser, $openEntityNames, $base,
                                     $systemId, $publicId) {
       if ($systemId) {
           // parse the entity with its own parser, kept in
           // $entityParser so it does not shadow the calling parser
           if (!($result = new_xml_parser($systemId))) {
               printf("Could not open entity %s at %s\n",
                      $openEntityNames, $systemId);
               return false;
           }
           list($entityParser, $fp) = $result;
           while ($data = fread($fp, 4096)) {
               if (!xml_parse($entityParser, $data, feof($fp))) {
                   printf("XML error: %s at line %d while parsing entity %s\n",
                          xml_error_string(xml_get_error_code($entityParser)),
                          xml_get_current_line_number($entityParser),
                          $openEntityNames);
                   xml_parser_free($entityParser);
                   return false;
               }
           }
           xml_parser_free($entityParser);
           return true;
       }
       return false;
   }
   function new_xml_parser($file) {
       global $parser_file;
       $xml_parser = xml_parser_create();
       xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, 1);
       xml_set_element_handler($xml_parser, "startElement", "endElement");
       xml_set_character_data_handler($xml_parser, "characterData");
       xml_set_processing_instruction_handler($xml_parser, "PIHandler");
       xml_set_default_handler($xml_parser, "defaultHandler");
       xml_set_external_entity_ref_handler($xml_parser,
                                           "externalEntityRefHandler");

       if (!($fp = @fopen($file, "r"))) {
           return false;
       }
       if (!is_array($parser_file)) {
           settype($parser_file, "array");
       }
       // remember which file each parser is reading
       $parser_file[parserKey($xml_parser)] = $file;
       return array($xml_parser, $fp);
   }
   if (!(list($xml_parser, $fp) = new_xml_parser($file))) {
       die("could not open XML input");
   }
   echo "<pre>";
   while ($data = fread($fp, 4096)) {
       if (!xml_parse($xml_parser, $data, feof($fp))) {
           die(sprintf("XML error: %s at line %d\n",
                       xml_error_string(xml_get_error_code($xml_parser)),
                       xml_get_current_line_number($xml_parser)));
       }
   }
   echo "</pre>";
   echo "parse complete\n";
   xml_parser_free($xml_parser);
   ?>

   Example 4. xmltest.xml
<?xml version='1.0'?>
<!DOCTYPE chapter SYSTEM "/just/a/test.dtd" [
<!ENTITY plainEntity "FOO entity">
<!ENTITY systemEntity SYSTEM "xmltest2.xml">
]>
<chapter>
 <TITLE>Title &plainEntity;</TITLE>
 <para>
  <informaltable>
   <tgroup cols="3">
    <tbody>
     <row><entry>a1</entry><entry morerows="1">b1</entry><entry>c1</entry></row>
     <row><entry>a2</entry><entry>c2</entry></row>
     <row><entry>a3</entry><entry>b3</entry><entry>c3</entry></row>
    </tbody>
   </tgroup>
  </informaltable>
 </para>
 &systemEntity;
 <section id="about">
  <title>About this Document</title>
  <para>
   <!-- this is a comment -->
   <?php echo 'Hi!  This is PHP version ' . phpversion(); ?>
  </para>
 </section>
</chapter>

   This file is included from xmltest.xml:

   Example 5. xmltest2.xml
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY testEnt "test entity">
]>
<foo>
   <element attrib="value"/>
   &testEnt;
   <?php echo "This is some more PHP code being executed."; ?>
</foo>

   Table of Contents
   utf8_decode -- Converts a string with ISO-8859-1 characters encoded
          with UTF-8 to single-byte ISO-8859-1
   utf8_encode -- Encodes an ISO-8859-1 string to UTF-8
   xml_error_string -- Get XML parser error string
   xml_get_current_byte_index -- Get current byte index for an XML parser
   xml_get_current_column_number -- Get current column number for an XML
          parser
   xml_get_current_line_number -- Get current line number for an XML
          parser
   xml_get_error_code -- Get XML parser error code
   xml_parse_into_struct -- Parse XML data into an array structure
   xml_parse -- Start parsing an XML document
   xml_parser_create_ns -- Create an XML parser with namespace support
   xml_parser_create -- Create an XML parser
   xml_parser_free -- Free an XML parser
   xml_parser_get_option -- Get options from an XML parser
   xml_parser_set_option -- Set options in an XML parser
   xml_set_character_data_handler -- Set up character data handler
   xml_set_default_handler -- Set up default handler
   xml_set_element_handler -- Set up start and end element handlers
   xml_set_end_namespace_decl_handler -- Set up end namespace declaration
          handler
   xml_set_external_entity_ref_handler -- Set up external entity
          reference handler
   xml_set_notation_decl_handler -- Set up notation declaration handler
   xml_set_object -- Use XML parser within an object
   xml_set_processing_instruction_handler -- Set up processing
          instruction (PI) handler
   xml_set_start_namespace_decl_handler -- Set up start namespace
          declaration handler
   xml_set_unparsed_entity_decl_handler -- Set up unparsed entity
          declaration handler
