head	1.3;
access;
symbols
	REL7_2_8:1.2
	REL7_2_7:1.2
	REL7_2_6:1.2
	REL7_2_5:1.2
	REL7_2_4:1.2
	REL7_2_3:1.2
	REL7_3_STABLE:1.2.0.4
	REL7_2_STABLE:1.2.0.2
	REL7_2:1.2
	REL7_2_RC2:1.2
	REL7_2_RC1:1.2
	REL7_2_BETA5:1.2
	REL7_2_BETA4:1.2
	REL7_2_BETA3:1.2
	REL7_2_BETA2:1.2
	REL7_2_BETA1:1.2;
locks; strict;
comment	@# @;


1.3
date	2002.10.22.20.03.09;	author petere;	state dead;
branches;
next	1.2;

1.2
date	2001.08.21.00.39.20;	author momjian;	state Exp;
branches
	1.2.4.1;
next	1.1;

1.1
date	2001.07.30.14.59.02;	author momjian;	state Exp;
branches;
next	;

1.2.4.1
date	2002.11.04.21.24.29;	author tgl;	state dead;
branches;
next	;


desc
@@


1.3
log
@Update build system.
@
text
@This package contains some simple routines for manipulating XML
documents stored in PostgreSQL. This is a work-in-progress and
somewhat basic at the moment (see the file TODO for some outline of
what remains to be done).

At present, two modules (based on different XML handling libraries)
are provided.

Prerequisite:

pgxml.c:
expat parser 1.95.0 or newer (http://expat.sourceforge.net)

or

pgxml_dom.c:
libxml2 (http://xmlsoft.org)

The libxml2 version provides more complete XPath functionality, and
seems like a good way to go. I've left the old versions in there for
comparison.

Compiling and loading:
----------------------

The Makefile only builds the libxml2 version.

To compile, just type make.

Then you can use psql to load the two function definitions: 
\i pgxml_dom.sql


Function documentation and usage:
---------------------------------

pgxml_parse(text) returns bool
  parses the provided text and returns true if it is well-formed
and false if it is not. It returns NULL if the parser could not be
created for any reason.

pgxml_xpath (XPath query functions) - differs between the versions:

pgxml.c (expat version) has:

pgxml_xpath(text doc, text xpath, int n) returns text
  parses doc and returns the cdata of the nth occurrence of
the "simple path" entry. 

However, the remainder of this document will cover the pgxml_dom.c version.

pgxml_xpath(text doc, text xpath, text toptag, text septag) returns text
  evaluates xpath on doc, and returns the result wrapped in
<toptag>...</toptag> and each result node wrapped in
<septag></septag>. toptag and septag may be empty strings, in which
case the respective tag will be omitted.

Example:

Given a table docstore:

 Attribute |  Type   | Modifier 
-----------+---------+----------
 docid     | integer | 
 document  | text    | 

containing documents such as (these are archaeological site
descriptions, in case anyone is wondering):

<?xml version="1.0"?>
<site provider="Foundations" sitecode="ak97" version="1">
   <name>Church Farm, Ashton Keynes</name>
   <invtype>watching brief</invtype>
   <location scheme="osgb">SU04209424</location>
</site>

one can type:

select docid, 
pgxml_xpath(document,'//site/name/text()','','') as sitename,
pgxml_xpath(document,'//site/location/text()','','') as location
 from docstore;
 
and get as output:

 docid |               sitename               |  location  
-------+--------------------------------------+------------
     1 | Church Farm, Ashton Keynes           | SU04209424
     2 | Glebe Farm, Long Itchington          | SP41506500
     3 | The Bungalow, Thames Lane, Cricklade | SU10229362
(3 rows)

or, to illustrate the use of the extra tags:

select docid as id,
pgxml_xpath(document,'//find/type/text()','set','findtype') 
from docstore;

 id |                               pgxml_xpath                               
----+-------------------------------------------------------------------------
  1 | <set></set>
  2 | <set><findtype>Urn</findtype></set>
  3 | <set><findtype>Pottery</findtype><findtype>Animal bone</findtype></set>
(3 rows)

Each result is a new, well-formed document. Note that document 1 had
no matching instances, so the set returned contains no elements.
Document 2 has one matching element and document 3 has two.

This is just scratching the surface because XPath allows all sorts of
operations.

Note: I've only implemented the return of nodeset and string values so
far. This covers (I think) many types of queries, however.

John Gray <jgray@@azuli.co.uk>  16 August 2001


@


1.2
log
@1. I've now produced an updated version (and called it 0.2) of my XML
parser interface code. It now uses libxml2 instead of expat (though I've
left the old code in the tarball). This means *proper* XPath support, and
the provided function allows you to wrap your result set in XML tags to
produce a new XML document.

John Gray
@
text
@@


1.2.4.1
log
@Back-patch recent file removals into REL7_3_STABLE branch.
@
text
@@


1.1
log
@XML conversion utility, requires expat library.

John Gray
@
text
@d1 7
a7 8
This package contains a couple of simple routines for hooking the
expat XML parser up to PostgreSQL. This is a work-in-progress and all
very basic at the moment (see the file TODO for some outline of what
remains to be done).

At present, two functions are defined, one which checks
well-formedness, and the other which performs very simple XPath-type
queries.
d11 1
d14 19
a32 2
I used a shared library version -I'm sure you could use a static
library if you wished though. I had no problems compiling from source.
d42 4
d48 3
a50 1
the "XPath" listed. See below for details on the syntax.
d52 5
d80 2
a81 2
pgxml_xpath(document,'/site/name',1) as sitename,
pgxml_xpath(document,'/site/location',1) as location
d86 23
a108 12
 docid |          sitename           |  location  
-------+-----------------------------+------------
     1 | Church Farm, Ashton Keynes  | SU04209424
     2 | Glebe Farm, Long Itchington | SP41506500
(2 rows)


"XPath" syntax supported
------------------------

At present it only supports paths of the form:
'tag1/tag2' or '/tag1/tag2'
d110 2
a111 2
The first case will find any <tag2> within a <tag1>, the second will
find any <tag2> within a <tag1> at the top level of the document.
d113 2
a114 1
The real XPath is much more complex (see TODO file).
d116 1
a117 1
John Gray <jgray@@azuli.co.uk>  26 July 2001
@
