Entity Management


One of the big issues in SGML/XML is entity management. This refers to resolving entities in general, but usually boils down to finding DTD files. Most DTDs are available via HTTP, but many editors, parsers, and the like are not network enabled. Using SYSTEM identifiers that point to local files works, but isn't very portable. In theory, one would like to use well-known public identifiers, such as

-//Some Organization//DTD Something 1.0//EN
    

But resolving those into actual files is a black art. What follows is an attempt to make some sense of the problem.


SGML Entity Management Catalogs

OASIS Technical Resolution 9401 defines a catalog format that many applications support. In the simplest case, one can map public identifiers into relative paths. This example, CATALOG, maps names like

-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN

into files (located in the same directory) such as web-app_2_2.dtd. As an example, I've collected a few DTDs, but you are better off going back to the original sources for authoritative copies.
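To make the mapping concrete, here is a minimal Python sketch of what a catalog-aware tool does with PUBLIC entries. It is an illustration under simplifying assumptions, not how any particular application implements it: it reads only PUBLIC entries with double-quoted literals and ignores every other catalog keyword. The sample catalog text reuses identifiers and file names from this page, and the names parse_catalog and resolve are hypothetical helpers.

```python
import posixpath
import re

# Sample catalog text. Both entries reuse public identifiers and DTD file
# names mentioned on this page; a real CATALOG file may hold other entry types.
SAMPLE_CATALOG = """\
PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN" "web-app_2_2.dtd"
PUBLIC "-//Apache Software Foundation//DTD Struts Configuration 1.0//EN" "struts-config_1_0.dtd"
"""

# Simplification: only PUBLIC "public identifier" "system identifier" entries
# with double-quoted literals are recognized.
PUBLIC_ENTRY = re.compile(r'\bPUBLIC\s+"([^"]*)"\s+"([^"]*)"', re.IGNORECASE)


def normalize(public_id):
    """Trim the identifier and collapse runs of whitespace to one space."""
    return " ".join(public_id.split())


def parse_catalog(text, catalog_dir):
    """Return a dict mapping public identifiers to file paths.

    Relative system identifiers are resolved against catalog_dir, the
    directory that holds the catalog file.
    """
    mapping = {}
    for public_id, system_id in PUBLIC_ENTRY.findall(text):
        # setdefault keeps the first entry if an identifier appears twice.
        mapping.setdefault(normalize(public_id),
                           posixpath.join(catalog_dir, system_id))
    return mapping


def resolve(mapping, public_id):
    """Look up a public identifier; None means the catalog has no entry."""
    return mapping.get(normalize(public_id))


catalog = parse_catalog(SAMPLE_CATALOG, "/usr/local/dtds")
print(resolve(catalog, "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"))
```

The lookup is a plain dictionary access keyed on the normalized public identifier, which is why the same document can be moved between machines as long as each machine has its own catalog pointing at its own copies of the DTDs.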


emacs PSGML mode

PSGML is a major mode for Emacs or XEmacs that supports SGML, XML, and HTML. While PSGML supports SYSTEM identifiers, it will not make HTTP requests. However, PSGML tries PUBLIC identifiers first, unless you set sgml-system-identifiers-are-preferred to non-nil, and it maps PUBLIC identifiers to files using SGML entity catalogs. Several variables control where it looks for catalogs. For details, see the "Entity Management" section of the PSGML info documentation.

In general, I recommend setting sgml-local-catalogs, which is a list of locally installed catalogs. For example, if the catalogs and DTDs in the example above are in /usr/local/dtds, putting this in your .emacs

(setq sgml-local-catalogs '("/usr/local/dtds/CATALOG"))
    
will do the trick. Those who don't like to edit their .emacs can use Custom to set the variable instead:
M-x customize-variable RET sgml-local-catalogs
    

You can also compile the DTDs to speed up psgml and list them in a separate catalog. See the info files for psgml for the full details.

For a good article on setting up emacs with psgml-mode, see Using Emacs for XML Documents by Brian Gillian, on IBM DeveloperWorks.


Ant

Ant is a build tool we often use for Java. It includes an XMLValidate task, which is handy as a packaging prerequisite for things like XML deployment descriptors. It does not support catalogs; however, it does support an embedded dtd element that maps PUBLIC identifiers to local files. For example:

  <target name="validate" description="validate xml config files">
    <xmlvalidate>
      <fileset dir="${etc.dir}" includes="*.xml">
      </fileset>
      <dtd location="${etc.dir}/web-app_2_2.dtd"
           publicid="-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN" />
      <dtd location="${etc.dir}/struts-config_1_0.dtd"
           publicid="-//Apache Software Foundation//DTD Struts Configuration 1.0//EN"/>
    </xmlvalidate>
  </target>
    

Drew Sudell
Last modified: Fri May 24 17:31:31 EDT 2002