Guile provides a standard data type for Uniform Resource Identifiers (URIs), as defined in RFC 3986.
The generic URI syntax is as follows:
URI-reference := [scheme ":"] ["//" [userinfo "@"] host [":" port]] path \
                 [ "?" query ] [ "#" fragment ]
For example, in the URI ‘http://www.gnu.org/help/’, the scheme is http, the host is www.gnu.org, the path is /help/, and there is no userinfo, port, query, or fragment.
Userinfo is something of an abstraction, as some legacy URI schemes allowed userinfo of the form username:passwd. But since passwords do not belong in URIs, the RFC does not want to condone this practice, so it calls anything before the @ sign userinfo.
(use-modules (web uri))
The following procedures can be found in the (web uri) module. Load it into your Guile, using a form like the above, to have access to them.
The most common way to build a URI from Scheme is with the build-uri function.
Scheme Procedure: build-uri scheme [#:userinfo=#f] [#:host=#f] [#:port=#f] [#:path=""] [#:query=#f] [#:fragment=#f] [#:validate?=#t]
Construct a URI. scheme should be a symbol, port either a positive, exact integer or #f, and the rest of the fields are either strings or #f. If validate? is true, also run some consistency checks to make sure that the constructed URI is valid.
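As a rough illustration of the keyword arguments, the sketch below builds the example URI from the start of this section, plus a second one with a port and a query. The example.org host, port 8080, and the query string are placeholder values chosen only for illustration.

```scheme
(use-modules (web uri))

;; The scheme is a symbol; the remaining fields are strings (or #f).
(define help-uri
  (build-uri 'http #:host "www.gnu.org" #:path "/help/"))

;; The port is an exact integer, not a string.
(define dev-uri
  (build-uri 'http #:host "example.org" #:port 8080
             #:path "/search" #:query "q=guile"))

(display (uri->string help-uri)) (newline)
;; prints http://www.gnu.org/help/
(display (uri->string dev-uri)) (newline)
;; prints http://example.org:8080/search?q=guile
```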
Scheme Procedure: uri? obj
Return #t if obj is a URI.
In Guile, URIs are represented as URI records, with a number of associated accessors.
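Scheme Procedure: uri-scheme uri
Scheme Procedure: uri-userinfo uri
Scheme Procedure: uri-host uri
Scheme Procedure: uri-port uri
Scheme Procedure: uri-path uri
Scheme Procedure: uri-query uri
Scheme Procedure: uri-fragment uri
Field accessors for the URI record type. The URI scheme will be a symbol, or #f if the object is a relative-ref (see below). The port will be either a positive, exact integer or #f, and the rest of the fields will be either strings or #f if not present.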
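The field types described above can be checked with a short sketch like the following. Here uri-fields is a hypothetical helper written for this example, not part of (web uri), and the user name, host, and port are made-up values.

```scheme
(use-modules (web uri))

;; example.org, "alice", and port 8443 are illustrative values.
(define u
  (build-uri 'https #:userinfo "alice" #:host "example.org" #:port 8443
             #:path "/a/b" #:query "x=1" #:fragment "top"))

;; Hypothetical helper: collect every field into a list for inspection.
(define (uri-fields uri)
  (list (uri-scheme uri) (uri-userinfo uri) (uri-host uri) (uri-port uri)
        (uri-path uri) (uri-query uri) (uri-fragment uri)))

(write (uri-fields u)) (newline)
;; prints (https "alice" "example.org" 8443 "/a/b" "x=1" "top")

;; Fields that were never supplied read back as #f.
(write (uri-fields (build-uri 'http #:host "www.gnu.org" #:path "/help/")))
(newline)
;; prints (http #f "www.gnu.org" #f "/help/" #f #f)
```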
Scheme Procedure: string->uri string
Parse string into a URI object. Return #f if the string could not be parsed.
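Because string->uri signals failure by returning #f rather than raising an error, callers usually test the result before using the accessors. A minimal sketch, assuming the module behaves as described in this section; host-of is a hypothetical helper, not part of (web uri).

```scheme
(use-modules (web uri))

;; Hypothetical helper: return the host of STR, or #f when STR does
;; not parse as a URI (or parses but has no host).
(define (host-of str)
  (let ((uri (string->uri str)))
    (and uri (uri-host uri))))

(write (host-of "http://www.gnu.org/help/")) (newline)
;; prints "www.gnu.org"

;; A bare path has no scheme, so string->uri does not accept it.
(write (host-of "/foo.css")) (newline)
;; prints #f
```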
Scheme Procedure: uri->string uri [#:include-fragment?=#t]
Serialize uri to a string. If the URI has a port that is the default port for its scheme, the port is not included in the serialization. If include-fragment? is given as false, the resulting string will omit the fragment (if any).
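The two serialization rules can be seen in a small sketch. It relies on port 80 being the default port for the http scheme; the host and fragment name are placeholders.

```scheme
(use-modules (web uri))

(define with-default-port
  (build-uri 'http #:host "example.org" #:port 80 #:path "/"))

(define with-fragment
  (string->uri "http://example.org/doc#section-2"))

;; Port 80 is the default for http, so it is left out of the string.
(display (uri->string with-default-port)) (newline)
;; prints http://example.org/

;; The fragment is kept unless include-fragment? is false.
(display (uri->string with-fragment)) (newline)
;; prints http://example.org/doc#section-2
(display (uri->string with-fragment #:include-fragment? #f)) (newline)
;; prints http://example.org/doc
```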
Scheme Procedure: declare-default-port! scheme port
Declare a default port for the given URI scheme.
Scheme Procedure: uri-decode str [#:encoding="utf-8"] [#:decode-plus-to-space?=#t]
Percent-decode the given str, according to encoding, which should be the name of a character encoding.
Note that this function should not generally be applied to a full URI string. For paths, use split-and-decode-uri-path instead. For query strings, split the query on & and = boundaries, and decode the components separately.
Note also that percent-encoded strings encode bytes, not characters. There is no guarantee that a given byte sequence is a valid string encoding. Therefore this routine may signal an error if the decoded bytes are not valid for the given encoding. Pass #f for encoding if you want decoded bytes as a bytevector directly. See set-port-encoding!, for more information on character encodings.
If decode-plus-to-space? is true, which is the default, also replace instances of the plus character ‘+’ with a space character. This is needed when parsing application/x-www-form-urlencoded data.
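The advice about query strings can be sketched as follows: split on & and = first, then decode each piece on its own, so that an encoded & or = inside a value is not mistaken for a delimiter. decode-query is a hypothetical helper written for this example, not part of (web uri), and the query string is made up.

```scheme
(use-modules (web uri))

;; Hypothetical helper: turn "k1=v1&k2=v2" into an association list.
;; A component with no "=" is given an empty string as its value.
(define (decode-query query)
  (map (lambda (pair)
         (let ((i (string-index pair #\=)))
           (if i
               (cons (uri-decode (substring pair 0 i))
                     (uri-decode (substring pair (+ i 1))))
               (cons (uri-decode pair) ""))))
       (string-split query #\&)))

(write (decode-query "q=scrambled+eggs&side=biscuits%26gravy"))
(newline)
;; prints (("q" . "scrambled eggs") ("side" . "biscuits&gravy"))

;; With plus decoding disabled, "+" is left untouched.
(write (uri-decode "1+1" #:decode-plus-to-space? #f)) (newline)
;; prints "1+1"

;; Passing #f as the encoding yields the raw bytes.
(write (uri-decode "%41%42" #:encoding #f)) (newline)
;; prints #vu8(65 66)
```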
Returns a string of the decoded characters, or a bytevector if encoding was #f.
Scheme Procedure: uri-encode str [#:encoding="utf-8"] [#:unescaped-chars]
Percent-encode any character not in the character set, unescaped-chars.
The default character set includes alphanumerics from ASCII, as well as the special characters ‘-’, ‘.’, ‘_’, and ‘~’. Any other character will be percent-encoded, by writing out the character to a bytevector within the given encoding, then encoding each byte as %HH, where HH is the hexadecimal representation of the byte.
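A brief sketch of both the default behaviour and a custom unescaped-chars set. The set unreserved-plus-at is built by hand for this example from the characters listed above plus ‘@’; it is an assumption of the sketch, not a name exported by (web uri).

```scheme
(use-modules (web uri))

(write (uri-encode "scrambled eggs")) (newline)
;; prints "scrambled%20eggs"

;; By default "@" is not in the unescaped set, so it is encoded.
(write (uri-encode "alice@example.org")) (newline)
;; prints "alice%40example.org"

;; Hand-built set for illustration: ASCII alphanumerics, the four
;; special characters named in the text, and additionally "@".
(define unreserved-plus-at
  (char-set-union
   (char-set-intersection char-set:ascii char-set:letter+digit)
   (string->char-set "-._~@")))

(write (uri-encode "alice@example.org home"
                   #:unescaped-chars unreserved-plus-at))
(newline)
;; prints "alice@example.org%20home"
```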
Scheme Procedure: split-and-decode-uri-path path
Split path into its components, and decode each component, removing empty components.
For example, "/foo/bar%20baz/" decodes to the two-element list, ("foo" "bar baz").
Scheme Procedure: encode-and-join-uri-path parts
URI-encode each element of parts, which should be a list of strings, and join the parts together with / as a delimiter.
For example, the list ("scrambled eggs" "biscuits&gravy") encodes as "scrambled%20eggs/biscuits%26gravy".
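The two path procedures are roughly inverses of each other, which the following sketch illustrates. Note that the joined result has no leading slash, so the sketch prepends one before handing the path to build-uri; example.org is a placeholder host.

```scheme
(use-modules (web uri))

(define parts (split-and-decode-uri-path "/foo/bar%20baz/"))
(write parts) (newline)
;; prints ("foo" "bar baz")

;; Empty components were dropped, so the trailing slash is not restored.
(define path (string-append "/" (encode-and-join-uri-path parts)))
(write path) (newline)
;; prints "/foo/bar%20baz"

(display (uri->string (build-uri 'http #:host "example.org" #:path path)))
(newline)
;; prints http://example.org/foo/bar%20baz
```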
As we noted above, not all URI objects have a scheme. You might have noted in the “generic URI syntax” example that the left-hand side of that grammar definition was URI-reference, not URI. A URI-reference is a generalization of a URI where the scheme is optional. If no scheme is specified, it is taken to be relative to some other related URI. A common use of URI references is when you want to be vague regarding the choice of HTTP or HTTPS – serving a web page referring to /foo.css will use HTTPS if loaded over HTTPS, or HTTP otherwise.
Scheme Procedure: build-uri-reference [#:scheme=#f] [#:userinfo=#f] [#:host=#f] [#:port=#f] [#:path=""] [#:query=#f] [#:fragment=#f] [#:validate?=#t]
Like build-uri, but with an optional scheme.
Scheme Procedure: uri-reference? obj
Return #t if obj is a URI-reference. This is the most general URI predicate, as it includes not only full URIs that have schemes (those that match uri?) but also URIs without schemes.
It’s also possible to build a relative-ref: a URI-reference that explicitly lacks a scheme.
Scheme Procedure: build-relative-ref [#:userinfo=#f] [#:host=#f] [#:port=#f] [#:path=""] [#:query=#f] [#:fragment=#f] [#:validate?=#t]
Like build-uri, but with no scheme.
Scheme Procedure: relative-ref? obj
Return #t if obj is a “relative-ref”: a URI-reference that has no scheme. Every URI-reference will either match uri? or relative-ref? (but not both).
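How the three predicates partition objects can be sketched with a small classifier. classify is a hypothetical helper written for this example, and the host and path are placeholders.

```scheme
(use-modules (web uri))

(define stylesheet (build-relative-ref #:path "/foo.css"))
(define full
  (build-uri-reference #:scheme 'https #:host "example.org"
                       #:path "/foo.css"))
;; Omitting #:scheme from build-uri-reference also gives a relative-ref.
(define also-relative (build-uri-reference #:path "/foo.css"))

;; Hypothetical helper: name the most specific predicate that matches.
(define (classify obj)
  (cond ((uri? obj) 'uri)
        ((relative-ref? obj) 'relative-ref)
        (else 'not-a-uri-reference)))

(write (map classify (list stylesheet full also-relative "/foo.css")))
(newline)
;; prints (relative-ref uri relative-ref not-a-uri-reference)

;; Both kinds of object satisfy the general predicate.
(write (map uri-reference? (list stylesheet full))) (newline)
;; prints (#t #t)
```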
In case it’s not clear from the above, the most general of these URI types is the URI-reference, with build-uri-reference as the most general constructor. build-uri and build-relative-ref enforce specific restrictions on the URI-reference. The most generic URI parser is then string->uri-reference, and there is also a parser for when you know that you want a relative-ref.
Note that uri? will only return #t for URI objects that have schemes; that is, it rejects relative-refs.
Scheme Procedure: string->uri-reference string
Parse string into a URI object, while not requiring a scheme. Return #f if the string could not be parsed.
Scheme Procedure: string->relative-ref string
Parse string into a URI object, while asserting that no scheme is present. Return #f if the string could not be parsed.
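The difference between the two parsers can be sketched as follows, assuming they behave as described above. describe is a hypothetical helper written for this example; the URLs are placeholders.

```scheme
(use-modules (web uri))

;; Hypothetical helper: parse STR with the most general parser and
;; report what kind of object came back.
(define (describe str)
  (let ((ref (string->uri-reference str)))
    (cond ((not ref) 'unparsable)
          ((uri? ref) 'uri)
          (else 'relative-ref))))

(write (describe "https://example.org/foo.css")) (newline)
;; prints uri
(write (describe "/foo.css")) (newline)
;; prints relative-ref

;; string->relative-ref rejects input that carries a scheme.
(write (string->relative-ref "https://example.org/foo.css")) (newline)
;; prints #f

;; A scheme-less reference still has its other fields parsed.
(define ref (string->relative-ref "/foo.css?v=2"))
(write (list (uri-path ref) (uri-query ref))) (newline)
;; prints ("/foo.css" "v=2")
```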