PostgreSQL XML Data Type

Summary: in this tutorial, you will learn how to use the PostgreSQL XML data type to store XML documents in the database.

Introduction to the PostgreSQL XML data type

PostgreSQL supports a built-in XML data type that allows you to store XML documents directly within the database.

Here’s the syntax for declaring a column with the XML type:

column_name XML

The XML data type offers the following benefits:

  • Type Safety: PostgreSQL checks that XML data is well-formed when you insert or update it and rejects malformed input. It does not validate documents against a DTD or an XML schema.
  • Built-in XML functions and operators: PostgreSQL supports many XML functions and operators to manipulate XML data effectively.
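As a small illustration of the well-formedness check, the sketch below uses the built-in xml_is_well_formed_document() function and XMLPARSE on literal strings. The sample strings are made up for this example, and the statements assume a PostgreSQL build with libxml support.

```sql
-- Returns true: the string is a well-formed XML document.
SELECT xml_is_well_formed_document('<person><name>John Doe</name></person>');

-- Returns false: the <name> element is never closed.
SELECT xml_is_well_formed_document('<person><name>John Doe</person>');

-- Raises an error instead of producing an XML value,
-- because the input is not well-formed.
SELECT XMLPARSE(DOCUMENT '<person><name>John Doe</person>');
```

The same check applies when a value is inserted into an XML column, so malformed documents never reach the table.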

PostgreSQL XML data type example

First, create a table called person:

CREATE TABLE person(
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    info XML
);

In this person table:

  • id is an identity column that serves as the primary key column of the table.
  • info is a column with the type XML that will store the XML data.

Second, insert a row into the person table:

INSERT INTO person (info)
VALUES (
    XMLPARSE(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?>
    <person>
        <name>John Doe</name>
        <age>35</age>
        <city>San Francisco</city>
    </person>')
);

In this statement:

  • DOCUMENT indicates that the input string is a complete XML document that starts with the XML declaration <?xml version="1.0" encoding="UTF-8"?> and has the root element <person>.
  • The XMLPARSE function converts the string into an XML document.
  • The INSERT statement inserts the new XML document into the info column of the person table.
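XMLPARSE also accepts the CONTENT keyword for XML fragments that are not complete documents. The following sketch contrasts it with DOCUMENT and shows a plain cast to xml; the literal strings are illustrative only.

```sql
-- CONTENT accepts a fragment: text mixed with several top-level elements.
SELECT XMLPARSE(CONTENT 'abc<foo>bar</foo><bar>foo</bar>');

-- A plain cast also produces an xml value. Whether the string is parsed
-- as DOCUMENT or CONTENT follows the session's xmloption setting.
SELECT '<person><name>John Doe</name></person>'::xml;
```

Using DOCUMENT, as in the INSERT statement above, is the stricter choice when every row should hold exactly one root element.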

Third, insert multiple rows into the person table:

INSERT INTO person (info)
VALUES
(
    XMLPARSE(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?>
    <person>
        <name>Jane Doe</name>
        <age>30</age>
        <city>San Francisco</city>
    </person>')
),
(
    XMLPARSE(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?>
    <person>
        <name>John Smith</name>
        <age>40</age>
        <city>New York</city>
    </person>')
),
(
    XMLPARSE(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?>
    <person>
        <name>Alice Johnson</name>
        <age>30</age>
        <city>Los Angeles</city>
    </person>')
);

Fourth, retrieve the names of persons from the XML documents using the xpath() function:

SELECT xpath('/person/name/text()', info) AS name
FROM person;

Output:

name
-------------------
 {"John Doe"}
 {"Jane Doe"}
 {"John Smith"}
 {"Alice Johnson"}
(4 rows)

Each row in the result set is an array of XML values representing person names. Since each person has one name, the result array has only one element.
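To see an array with more than one element, you can run xpath() against a document in which the expression matches several nodes. The <people> document below is a hypothetical example and is not stored in the person table.

```sql
-- The XPath expression matches two name nodes, so the array has two elements.
SELECT xpath(
    '/people/person/name/text()',
    '<people>
        <person><name>John Doe</name></person>
        <person><name>Jane Doe</name></person>
     </people>'::xml
) AS names;
-- names: {"John Doe","Jane Doe"}
```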

To retrieve the person names as text rather than as arrays of XML values, take the first array element and cast it to text:

SELECT (xpath('/person/name/text()', info))[1]::text AS name
FROM person;

Output:

name
---------------
 John Doe
 Jane Doe
 John Smith
 Alice Johnson
(4 rows)

How it works:

  • First, the xpath() function evaluates the XPath expression '/person/name/text()' against the XML document and returns an array of XML values that contains the text of every matching name node.
  • Second, the [1] subscript returns the first element of the array.
  • Third, the ::text cast converts the XML value to text.

Fifth, retrieve the ages of persons:

SELECT (xpath('/person/age/text()', info))[1]::text::integer AS age
FROM person;

Output:

age
-----
  35
  30
  40
  30
(4 rows)

In this query:

  • The XPath expression /person/age/text() returns the text of the age nodes as an array of XML values.
  • The [1] subscript returns the first element of the array.
  • The ::text cast converts the element to text.
  • The ::integer cast converts the text to an integer.

In this example, we cast an XML value to text and text to an integer because we cannot cast an XML value directly to an integer.
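Once the age is an integer, you can use it in comparisons like any other numeric value. As a sketch, the following query reuses the same cast chain in a WHERE clause to list the names of persons older than 30:

```sql
-- Cast the age to an integer so it can be compared numerically.
SELECT (xpath('/person/name/text()', info))[1]::text AS name
FROM person
WHERE (xpath('/person/age/text()', info))[1]::text::integer > 30;
```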

Sixth, retrieve the name, age, and city from the XML document:

SELECT
    (xpath('/person/name/text()', info))[1]::text AS name,
    (xpath('/person/age/text()', info))[1]::text::integer AS age,
    (xpath('/person/city/text()', info))[1]::text AS city
FROM
    person;

Output:

     name      | age |     city
---------------+-----+---------------
 John Doe      |  35 | San Francisco
 Jane Doe      |  30 | San Francisco
 John Smith    |  40 | New York
 Alice Johnson |  30 | Los Angeles
(4 rows)
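When you need several values from each document, the XMLTABLE function (available in PostgreSQL 10 and later) is an alternative to repeating xpath() for every column. The sketch below should return the same three columns; the alias x is an arbitrary name chosen for this example.

```sql
-- Turn each <person> element into a row with typed columns.
SELECT x.name, x.age, x.city
FROM person,
     XMLTABLE(
         '/person' PASSING info
         COLUMNS
             name text    PATH 'name',
             age  integer PATH 'age',
             city text    PATH 'city'
     ) AS x;
```

XMLTABLE converts each value to the declared column type, so the explicit ::text::integer cast chain is not needed here.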

Seventh, find the person with the name “Jane Doe”:

SELECT *
FROM person
WHERE (xpath('/person/name/text()', info))[1]::text = 'Jane Doe';

Output:

 id |                info
----+------------------------------------
  2 |     <person>                      +
    |         <name>Jane Doe</name>     +
    |         <age>30</age>             +
    |         <city>San Francisco</city>+
    |     </person>
(1 row)
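Another way to express this filter is to put the condition inside the XPath expression and test it with the built-in xpath_exists() function, which returns true when the expression matches at least one node. This is a sketch of an equivalent query, not a replacement for the one above:

```sql
-- The predicate [name="Jane Doe"] keeps only <person> elements
-- whose <name> child has the given text.
SELECT *
FROM person
WHERE xpath_exists('/person[name="Jane Doe"]', info);
```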

Creating indexes for XML data

If the person table has many rows, finding the person by name will be slow. You can create an expression index for the XML documents to improve the query performance.

First, create an expression index that extracts the name element of a person as an array of text:

CREATE INDEX person_name
ON person USING BTREE
    (cast(xpath('/person/name', info) as text[])) ;

Second, create a function that inserts 1000 rows into the person table for testing purposes:

CREATE OR REPLACE FUNCTION generate_persons()
RETURNS void AS
$$
BEGIN
    INSERT INTO person (info)
    SELECT
        XMLPARSE(DOCUMENT '<?xml version="1.0" encoding="UTF-8"?>
        <person>
            <name>' || 'Person' || generate_series || '</name>
            <age>' || (generate_series % 80 + 18) || '</age>
            <city>' || CASE WHEN generate_series % 3 = 0 THEN 'New York'
                            WHEN generate_series % 3 = 1 THEN 'Los Angeles'
                            ELSE 'San Francisco' END || '</city>
        </person>')
    FROM generate_series(1, 1000);
END;
$$ LANGUAGE plpgsql;

Third, call the generate_persons() function to insert 1000 rows into the person table:

SELECT generate_persons();

Fourth, use EXPLAIN ANALYZE to check how PostgreSQL finds a person with the name Jane Doe:

EXPLAIN ANALYZE
SELECT *
FROM person
WHERE cast(xpath('/person/name', info) as text[]) = '{<name>Jane Doe</name>}';

Output:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on person  (cost=4.31..17.81 rows=5 width=178) (actual time=0.039..0.040 rows=0 loops=1)
   Recheck Cond: ((xpath('/person/name'::text, info, '{}'::text[]))::text[] = '{"<name>Jane Doe</name>"}'::text[])
   ->  Bitmap Index Scan on person_name  (cost=0.00..4.31 rows=5 width=0) (actual time=0.036..0.037 rows=0 loops=1)
         Index Cond: ((xpath('/person/name'::text, info, '{}'::text[]))::text[] = '{"<name>Jane Doe</name>"}'::text[])
 Planning Time: 0.144 ms
 Execution Time: 0.069 ms
(6 rows)

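The Bitmap Index Scan on person_name in the plan shows that the query uses the expression index instead of scanning the whole table. For the planner to consider the index, the expression in the WHERE clause must match the indexed expression.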
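If you prefer to compare against a plain string rather than an array literal, one possible variation is to index the first name value as text. The index name person_name_text is hypothetical, and whether the planner picks the index depends on the table size and statistics, so treat this as a sketch rather than a guaranteed plan.

```sql
-- Index the first matching name as text.
-- The extra parentheses are required around an index expression.
CREATE INDEX person_name_text
ON person (((xpath('/person/name/text()', info))[1]::text));

-- The WHERE clause repeats the indexed expression exactly.
EXPLAIN
SELECT *
FROM person
WHERE (xpath('/person/name/text()', info))[1]::text = 'Jane Doe';
```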

Summary

  • Use the XML data type to store XML documents in the database.
  • Use the xpath() function to retrieve a value from XML documents.
