This specification defines an HTTP client interface for XPath-based languages. The interface consists of a single extension function that performs HTTP requests, together with error codes that identify client error states.

It has been designed to be compatible, via [[!XPATH20]], with [[!XQUERY]] and [[!XSLT20]]. It should also be suitable for any other language that hosts XPath 2.0, such as [[!XPROC]].

Introduction

Namespace conventions

The module specified by this document defines one function in the namespace http://expath.org/ns/http-client. In this document, the http prefix, when used, is bound to this namespace URI.

Error codes are defined in the namespace http://expath.org/ns/error. In this document, the err prefix, when used, is bound to this namespace URI.

Error management

Error conditions are identified by a code (a QName). When such an error condition is reached during the execution of the function, a dynamic error is thrown, with the corresponding error code (as if the standard XPath function fn:error had been called).

There are many cases where the HTTP protocol layer may raise an error. In each case, if the error condition is not mentioned explicitly in the spec, the implementation MUST raise an error with the error code err:HC001.
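As an illustration only, a caller might trap the generic error as sketched below. The sketch assumes a host language with XQuery 3.0 try/catch (XPath 2.0 itself has no such construct) and an implementation that exposes the module through the import shown. The hc prefix and the example.org URL are hypothetical choices. A separate prefix is used for the error namespace because many XQuery 3.0 processors already bind err to the W3C error namespace.

```xquery
xquery version "3.0";

(: Assumption: the processor makes the module available via this import. :)
import module namespace http = "http://expath.org/ns/http-client";

(: Hypothetical prefix bound to the error namespace of this specification. :)
declare namespace hc = "http://expath.org/ns/error";

try {
  (: First item of the result is the http:response element. :)
  http:send-request(
    <http:request method="get" href="http://www.example.org/"/>
  )[1]/@status/string()
} catch hc:HC001 {
  (: Reached for HTTP-layer errors not covered by a more specific code. :)
  "HTTP-level failure"
}
```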

The http:send-request function

This module defines an XPath extension function that sends an HTTP request and returns the corresponding response. It also supports HTTP multi-part messages. Here is the signature of this function:

http:send-request($request as element(http:request)?,
                  $href as xs:string?,
                  $bodies as item()*) as item()+

Besides the three-argument signature above, two shorter signatures are provided as convenient shortcuts. Each is equivalent to the full version with the omitted parameters set to the empty sequence:

http:send-request($request as element(http:request)) as item()+
http:send-request($request as element(http:request)?, $href as xs:string?) as item()+

Sending a request

The function defined in this module transmits a request to an HTTP server and receives the corresponding response. The request is represented by the parameters to the function, which define how to generate the actual HTTP request to transmit.

The Request Element

The http:request element represents all the information needed to send the HTTP request.

Some of the values defined for the http:request element can instead be set through a parameter to the function. For instance, some signatures define the parameter $href. If the value of this parameter is not the empty sequence, it will override the value of the attribute href on the http:request element.
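The override rule can be sketched as follows. Both URLs are hypothetical, and the module import is an assumption about how an implementation exposes the function.

```xquery
(: Assumption: the processor makes the module available via this import. :)
import module namespace http = "http://expath.org/ns/http-client";

(: The element names one URI, but the non-empty $href argument overrides it,
   so the request is expected to go to the second URI. :)
let $request :=
  <http:request method="get" href="http://www.example.org/ignored"/>
return
  http:send-request($request, "http://www.example.org/used")
```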

<http:request method = ncname
              href? = uri
              http-version? = string
              status-only? = boolean
              username? = string
              password? = string
              auth-method? = string
              send-authorization? = boolean
              override-media-type? = string
              follow-redirect? = boolean
              timeout? = integer>
   <!-- Content: (http:header*, (http:body|http:multipart)?) -->
</http:request>

The http:header element represents an HTTP header, either in a request or a response:

<http:header name = string
             value = string>
   <!-- Content: empty -->
</http:header>

The http:body element represents the body of either an HTTP request or an HTTP response (in multipart requests and responses, it represents the body of a single part):

<http:body media-type = string
           src? = uri
           method? = "xml" | "html" | "xhtml" | "text" | "binary"
             | qname-but-not-ncname
           byte-order-mark? = "yes" | "no"
           cdata-section-elements? = qnames
           doctype-public? = string
           doctype-system? = string
           encoding? = string
           escape-uri-attributes? = "yes" | "no"
           indent? = "yes" | "no"
           normalization-form? = "NFC" | "NFD" | "NFKC" | "NFKD"
             | "fully-normalized" | "none" | nmtoken
           omit-xml-declaration? = "yes" | "no"
           standalone? = "yes" | "no" | "omit"
           suppress-indentation? = qnames
           undeclare-prefixes? = "yes" | "no"
           version? = nmtoken>
   <!-- Content: any* -->
</http:body>

The media-type attribute is the media type of the body part. It is mandatory. In a request, it is provided by the user and is the default value of the Content-Type header if that header is not set explicitly. In a response, it is provided by the implementation from the Content-Type header returned by the server. The src attribute can be used in a request to set the body content to the content of the linked resource, instead of using the children of the http:body element. When src is used, media-type is the only other attribute allowed and the http:body element must have no content; otherwise the error err:HC004 MUST be raised.
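A request body taken from a linked resource might look like the following sketch. The file URI, the target URL, and the module import are hypothetical.

```xquery
(: Assumption: the processor makes the module available via this import. :)
import module namespace http = "http://expath.org/ns/http-client";

(: Apart from src, only media-type is present and the element is empty,
   so err:HC004 should not be raised. :)
http:send-request(
  <http:request method="put" href="http://www.example.org/reports/latest">
    <http:body media-type="application/pdf" src="file:///tmp/report.pdf"/>
  </http:request>
)
```

Adding any further attribute (for instance indent) or any child content to this http:body would be an err:HC004 condition.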

All the attributes, except src, set the corresponding serialization parameters defined in [[!xslt-xquery-serialization]]. The user can provide them on a request to control the way a part body is serialized. In a response, the implementation can, but is not required to, provide some of them if it has the corresponding information. Some of them make no sense in a response (for instance version) and are therefore never supplied on the response element.
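As a sketch of how serialization attributes might be used on a request, the following posts a small XML document with one explicit header. The URL, the order element, and the module import are hypothetical.

```xquery
(: Assumption: the processor makes the module available via this import. :)
import module namespace http = "http://expath.org/ns/http-client";

let $request :=
  <http:request method="post" href="http://www.example.org/orders">
    <http:header name="Accept" value="application/xml"/>
    <!-- indent and omit-xml-declaration are serialization parameters -->
    <http:body media-type="application/xml"
               indent="yes"
               omit-xml-declaration="yes">
      <order id="42"><item>widget</item></order>
    </http:body>
  </http:request>
return
  http:send-request($request)
```

Because no Content-Type header is given explicitly, the media-type value application/xml serves as its default.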

The http:multipart element represents an HTTP Multipart Type request or response:

<http:multipart media-type = string
                boundary? = string>
   <!-- (http:header*, http:body)+ -->
</http:multipart>

The media-type attribute is the media type of the whole request or response, and has to be a multipart media type (that is, its main type must be multipart). The boundary attribute is the boundary marker used to separate the several parts in the message (the value of the attribute is prefixed with "--" to form the actual boundary marker in the request; conversely, this prefix is removed from the boundary marker in the response to set the value of the attribute).
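A two-part request could be written along these lines. The boundary value, the part contents, the URL, and the module import are all illustrative assumptions.

```xquery
(: Assumption: the processor makes the module available via this import. :)
import module namespace http = "http://expath.org/ns/http-client";

(: With boundary="part-sep-1234", the marker on the wire is
   "--part-sep-1234", following the prefixing rule described above. :)
http:send-request(
  <http:request method="post" href="http://www.example.org/upload">
    <http:multipart media-type="multipart/mixed" boundary="part-sep-1234">
      <http:header name="Content-Disposition" value="inline"/>
      <http:body media-type="text/plain">A short note.</http:body>
      <http:body media-type="application/xml"><doc/></http:body>
    </http:multipart>
  </http:request>
)
```

Each part is an optional run of http:header elements followed by exactly one http:body, matching the content model (http:header*, http:body)+.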

Serializing the Request

If the request entity body has content (one body or several body parts), it can be specified by the http:multipart element, the http:body element, and/or the parameter $bodies. For each body, the content of the HTTP body is generated as follows.

Except when its attribute src is present, an http:body element can have several attributes representing serialization parameters, as defined in [[!xslt-xquery-serialization]]. This spec defines in addition the method binary; in this case the body content must be either an xs:hexBinary or an xs:base64Binary item, and no other serialization parameter can be set besides media-type.

The default value of the serialization method depends on the media-type: it is xml if it is an XML media type, html if it is an HTML media type, xhtml if it is application/xhtml+xml, text if it is a textual media type, and binary for any other case.

When a body element has no content (i.e. no child nodes), its content is given by the parameter $bodies. In a single-part request, this parameter must have at most one item, and if the body is empty it cannot be the empty sequence. In a multipart request, the number of items in $bodies must be equal to the number of empty body elements: the content of the first empty body element is $bodies[1], the content of the second is $bodies[2], and so on.
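The single-part case can be sketched with the three-argument signature. The URL, the payload, and the module import are hypothetical.

```xquery
(: Assumption: the processor makes the module available via this import. :)
import module namespace http = "http://expath.org/ns/http-client";

(: The http:body element has no children, so its content comes from
   $bodies, which here holds exactly one item. :)
let $request :=
  <http:request method="post">
    <http:body media-type="text/plain"/>
  </http:request>
return
  http:send-request($request,
                    "http://www.example.org/notes",
                    "Remember the milk")
```

In a multipart request with two empty http:body elements, the third argument would instead need to be a sequence of exactly two items.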

Authentication

HTTP authentication when sending a request is controlled by the attributes username, password, auth-method and send-authorization on the http:request element. If username has a value, password and auth-method must have a value too. Conversely, if any one of the three other attributes has been set, username must be set too.

The attribute auth-method can be either Basic or Digest, but other values can also be used, in an implementation-defined way. The handling of those attributes must be done in conformance with [[!rfc2617]]. If send-authorization is true (the default value is false) and the authentication method supports generating the Authorization header without a challenge, the request contains this header. By default, the request is first sent without credentials; only if the response is an authentication challenge are the credentials sent in a second request.
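A request carrying credentials might be written as in the sketch below. The credentials, the URL, and the module import are hypothetical.

```xquery
(: Assumption: the processor makes the module available via this import. :)
import module namespace http = "http://expath.org/ns/http-client";

(: username, password and auth-method are set together, as required.
   send-authorization="true" asks for the Authorization header to be sent
   with the first request instead of waiting for a challenge. :)
http:send-request(
  <http:request method="get"
                href="http://www.example.org/private"
                username="alice"
                password="s3cret"
                auth-method="Basic"
                send-authorization="true"/>
)
```

Omitting send-authorization falls back to the default behaviour, in which credentials are sent only after an authentication challenge.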

Handling the Response

After having sent the request to the HTTP server, the function waits for the response. The HTTP client parses the raw response and the function returns a representation of the response as a sequence. The sequence has an http:response element as the first item, which is followed by an additional item for each body or body part in the response.

The Response Element

<http:response status = integer
               message = string>
   <!-- Content: (http:header*, (http:body|http:multipart)?) -->
</http:response>

The http:response element is the first item in the sequence returned by the function. The status attribute is the HTTP Status Code returned by the server, and message is the Reason Phrase coming with the Status-Line. The http:header elements are as defined for the request, but represent the response headers instead. The http:body and http:multipart elements are also as defined for the request, except that http:body elements must be empty.
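Taking the returned sequence apart could look like the following sketch. The summary element, the URL, and the module import are hypothetical and not part of this specification.

```xquery
(: Assumption: the processor makes the module available via this import. :)
import module namespace http = "http://expath.org/ns/http-client";

let $result   := http:send-request(
                   <http:request method="get" href="http://www.example.org/"/>)
let $response := $result[1]               (: the http:response element :)
let $bodies   := subsequence($result, 2)  (: one item per body or body part :)
return
  <summary status="{ $response/@status }"
           message="{ $response/@message }"
           content-type="{ $response/http:header
                             [lower-case(@name) eq 'content-type']/@value }"
           body-count="{ count($bodies) }"/>
```

The header lookup lower-cases the name because HTTP header names are case-insensitive.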

The Response Entity Body

Instead of being inserted within the http:response element, the content of each body is returned as a single item in the returned sequence. The items appear after the http:response element, in the same order as the http:body elements. For each body, this item is built from the HTTP response as follows.

If the status-only attribute has the value true (the default is false), the returned sequence contains only the http:response element. This element still carries the headers and the empty http:body or http:multipart elements, as if status-only were false, but the following items, which represent the body contents, are not generated from the HTTP response.
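The effect of status-only can be sketched as follows. The URL and the module import are hypothetical.

```xquery
(: Assumption: the processor makes the module available via this import. :)
import module namespace http = "http://expath.org/ns/http-client";

let $result := http:send-request(
  <http:request method="get"
                href="http://www.example.org/large-file"
                status-only="true"/>
)
return
  (: Under the rule above, count($result) should be 1: only the
     http:response element is returned, with no body items after it. :)
  (count($result), $result[1]/@status/string())
```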

For each body that has to be parsed, the following rules apply in order to build the corresponding XDM item. If the body media type is a text media type, the item is an xs:string containing the body content. If the media type is an XML media type, the content is parsed and the item is the resulting XDM document-node. If the media type is an HTML media type, the content is tidied up and parsed (this process is both implementation-dependent and implementation-defined) and the item is the resulting XDM document-node. If it is a binary media type, the content is returned as an xs:base64Binary item. From the previous rules, a result item can therefore be either a document-node (from XML or HTML), an xs:string, or an xs:base64Binary.
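Since a body item can take one of three types, a caller may want to branch on it. The following is a minimal sketch; the URL, the returned labels, and the module import are hypothetical.

```xquery
(: Assumption: the processor makes the module available via this import. :)
import module namespace http = "http://expath.org/ns/http-client";

let $result := http:send-request(
                 <http:request method="get" href="http://www.example.org/"/>)
(: Skip the http:response element and inspect each body item. :)
for $item in subsequence($result, 2)
return
  typeswitch ($item)
    case document-node() return "XML or HTML document"
    case xs:string       return "text"
    case xs:base64Binary return "binary"
    default              return "unexpected item type"
```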

When the type of a part is either XML or HTML, its body has to be parsed into a document node. If an error occurs whilst parsing the content, the error err:HC002 MUST be raised.

If the attribute override-media-type is set on the request, its value is used instead of the Content-Type header returned by the HTTP server. If the Content-Type header of the response indicates a multipart type, the value of override-media-type can only be a multipart type, or application/octet-stream (to get the raw entity as a binary item). If it is not, the error err:HC003 MUST be raised.
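As a sketch of the override, the request below asks for an XML resource to be treated as text. The URL and the module import are hypothetical.

```xquery
(: Assumption: the processor makes the module available via this import. :)
import module namespace http = "http://expath.org/ns/http-client";

(: Whatever Content-Type the server returns, the body is handled as
   text/plain, so the second item is expected to be an xs:string
   rather than a document node. :)
http:send-request(
  <http:request method="get"
                href="http://www.example.org/data.xml"
                override-media-type="text/plain"/>
)[2]
```

If the server answered with a multipart type, this text/plain override would be an err:HC003 condition; application/octet-stream would be accepted and yield the raw entity as a binary item.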

Processing Media Types

In both requests and responses, Media Type strings are used to choose the way the entity content has to be serialized or parsed.

We define four different classes of Media Type (XML, HTML, text, and binary), which are used for sending requests and receiving responses. The intent is to provide guidance on handling the entity content with respect to its content type, but an implementation is permitted to deviate from those rules if it is obvious that a particular type should be treated in a specific way. Typically, this can be useful for binary types such as [[EXI]].

Error Codes

err:HC001
An HTTP error occurred.
err:HC002
Error parsing the entity content as XML or HTML.
err:HC003
A multipart response was received, but the override-media-type was not a multipart media type or application/octet-stream.
err:HC004
The src attribute on the body element is mutually exclusive with all other attributes (except media-type).
err:HC005
The http:request element is invalid.
err:HC006
A timeout occurred waiting for the response.
err:HC007
The specified HTTP version is not supported by this implementation.