Extractor Reference
A feature is created by applying a feature expression, which consists of a mix of literal strings and extractors. A single extractor is by far the most common case, so for convenience an unquoted string is treated as a single extractor. Consider the extractor ua-req-host. Presuming the host is “example.one”, it can be used in the following feature expressions.
Feature String |
Extracted Feature |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
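Some extractors accept or require an argument that affects what is extracted. The argument is written in angle brackets after the extractor name, for example ua-req-field<Host>. This means a parameterized extractor can be used inside a feature expression like any other extractor.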
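The "name<argument>" notation can be illustrated with a short Python sketch that splits an extractor reference into its name and optional argument. This only models the notation used in this reference. The pattern and the function split_extractor are hypothetical and are not TxnBox's actual parser.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for "name" or "name<argument>". Names are assumed to use
# letters, digits, '-', '_' and '.', which covers every extractor in this
# reference (including "..."). This is an assumption, not TxnBox's grammar.
_EXTRACTOR = re.compile(r"(?P<name>[A-Za-z0-9_.-]+)(?:<(?P<arg>[^<>]*)>)?")


def split_extractor(text: str) -> Tuple[str, Optional[str]]:
    """Split an extractor reference into (name, argument); argument is None if absent."""
    m = _EXTRACTOR.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not an extractor reference: {text!r}")
    return m.group("name"), m.group("arg")


print(split_extractor("ua-req-host"))         # ('ua-req-host', None)
print(split_extractor("ua-req-field<Host>"))  # ('ua-req-field', 'Host')
print(split_extractor("random<1,100>"))       # ('random', '1,100')
```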
Extractors
HTTP Messages
There is a lot of information in the HTTP messages handled by Traffic Server, and many extractors to access it. These are divided into families, each based on one of the basic messages:
- User Agent Request: the request sent by the user agent to Traffic Server.
- Proxy Request: the request sent by Traffic Server (the proxy) to the upstream.
- Upstream Response: the response sent by the upstream to Traffic Server in response to the Proxy Request.
- Proxy Response: the response sent by Traffic Server to the user agent.
There is also the “pre-remap” or “pristine” user agent URL. This is a URL only, not a request, and is a copy of the user agent URL just before URL rewriting.
In addition, for the remap hook, two other URLs (not requests) are available: the “target” and “replacement” URLs, which are the values literally specified in the URL rewrite rule. Note these can be problematic for regular expression remap rules, because they are frequently not valid URLs.
Host and Port Handling
The host and port for a request or URL require special handling due to vagaries in the HTTP specification. The most important complication is that they can appear in two different places: in the URL itself, in the Host field of the request, or in both. This can make modifying them in a specific way challenging. The complexity described here exists to make it possible to do exactly what is needed.
In particular, the HTTP specification says that if the host and port are in both the URL and the Host header they must be the same. In practice, however, many proxies are configured to make them different before sending the request upstream. Generally the Host field is kept as the user agent sent it, and the URL is changed to be where the proxy routes the request. Therefore there are extractors that work on the request as a whole, considering both the URL and the Host field, and others that always use the URL.
Beyond this, the port is optional, and this presents some problems. One is a result of the ATS plugin API, which makes it impossible to distinguish between “http://delain.nl” and “http://delain.nl:80”. Both port 80 and port 443 are treated specially: the former as the default for scheme “HTTP” and the latter for “HTTPS”. I intend to add a way to make this distinction at some point, but currently it cannot be done. As a result, it is difficult, if not impossible, to set these values properly in a configuration language such as the one TxnBox has, and even where it is possible it would be rather painful to do repeatedly. Therefore TxnBox has the concept of “location”, which corresponds to the host and port (the HTTP specification calls this the authority, but everyone thought using that term was a terrible idea). This makes it easy to access the host, the port, or both, which matters most when interacting with directives that set those values. The following chart illustrates the three terms.
url | host | port | loc |
---|---|---|---|
| evil-kow.ex | 80 | evil-kow.ex |
| evil-kow.ex | 80 | evil-kow.ex |
| evil-kow.ex | 443 | evil-kow.ex |
| evil-kow.ex | 4443 | evil-kow.ex:4443 |
| evil-kow.ex | 4443 | evil-kow.ex:4443 |
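The relationship between the host, port, and loc columns can be modeled with a small Python function. This is a sketch that assumes port 80 is the default for "http" and port 443 for "https", as described above. The function location and the DEFAULT_PORTS table are hypothetical. The chart does not show a non-default pairing such as port 80 with "https"; in that case this sketch keeps the port, which is a choice made for illustration and not documented behavior.

```python
# Assumed scheme defaults, taken from the description of ports 80 and 443 above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def location(scheme: str, host: str, port: int) -> str:
    """Return the host alone when the port is the scheme default, else "host:port"."""
    if DEFAULT_PORTS.get(scheme.lower()) == port:
        return host
    return f"{host}:{port}"


print(location("http", "evil-kow.ex", 80))     # evil-kow.ex
print(location("https", "evil-kow.ex", 443))   # evil-kow.ex
print(location("https", "evil-kow.ex", 4443))  # evil-kow.ex:4443
```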
Paths
Unfortunately, due to how the plugin API works, paths are a bit odd. One result is:
Important
Paths do not have a leading slash.
Given the URL “http://delain.nl/pix”, the path is “pix”, not “/pix”. The existence of the slash is implied by the existence of the path. There is, unfortunately, no way to distinguish a missing path from an empty one. E.g. “http://delain.nl” and “http://delain.nl/” cannot be distinguished by looking at the value from a path extractor; both yield an empty string. This matters less than it appears, because both ATS and the upstream treat them identically. Note this applies only to the slash separating the “authority” / “location” from the path. The paths for the URLs “http://delain.nl/pix/charlotte” and “http://delain.nl/pix/charlotte/” are distinguishable.
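To show which values a path extractor yields for the URLs above, here is a Python sketch that drops the single leading slash from the URL path. TxnBox reads the path through the ATS plugin API, not urllib. The helper path_without_slash is hypothetical and only reproduces the resulting values.

```python
from urllib.parse import urlsplit


def path_without_slash(url: str) -> str:
    """Hypothetical model of a path extractor: the URL path minus its leading slash."""
    path = urlsplit(url).path
    # Only the single slash separating the location from the path is dropped,
    # so a missing path ("") and an empty path ("/") both become "".
    return path[1:] if path.startswith("/") else path


for url in ("http://delain.nl/pix", "http://delain.nl", "http://delain.nl/",
            "http://delain.nl/pix/charlotte", "http://delain.nl/pix/charlotte/"):
    print(repr(url), "->", repr(path_without_slash(url)))
```

A trailing slash is kept, which is why the last two URLs remain distinguishable.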
User Agent Request
ua-req-method
- Result
- string
The user agent request method.
ua-req-url
- Result
- string
The URL in the user agent request.
ua-req-scheme
- Result
- string
The URL scheme in the user agent request.
ua-req-loc
- Result
- string
The location for the request, consisting of the host and the optional port. This is retrieved from the URL if present, otherwise from the Host field.
ua-req-host
- Result
- string
Host for the user agent request. This is retrieved from the URL if present, otherwise from the Host field. This does not include the port.
ua-req-port
- Result
- integer
The port for the user agent request. This is retrieved from the URL if present, otherwise from the Host field. If not specified, the canonical default based on the scheme is used.
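The fallback rule shared by ua-req-host and ua-req-port (use the URL if it has a host, otherwise the Host field, and supply the scheme default for a missing port) can be sketched in Python. Everything here is a hypothetical model: request_host_port, its parameters, and the default port table are assumptions for illustration. Note that urllib lowercases host names, which TxnBox may not do.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}  # assumed scheme defaults


def request_host_port(url: str, host_field: Optional[str],
                      scheme: str = "http") -> Tuple[str, int]:
    """Model of ua-req-host / ua-req-port: prefer the URL, fall back to the Host field."""
    parts = urlsplit(url)
    if parts.hostname:
        # Absolute URL: host and port come from the URL itself.
        host, port = parts.hostname, parts.port
        scheme = parts.scheme or scheme
    elif host_field:
        # Origin-form request ("/pix"): parse the Host field as a network location.
        field = urlsplit("//" + host_field)
        host, port = field.hostname, field.port
    else:
        raise ValueError("no host in the URL or the Host field")
    if port is None:
        port = DEFAULT_PORTS[scheme.lower()]
    return host, port


print(request_host_port("/pix", "example.one"))                       # ('example.one', 80)
print(request_host_port("http://delain.nl:8080/pix", "example.one"))  # ('delain.nl', 8080)
```

The second call shows why the URL-only extractors exist: when the URL and Host field disagree, the request-level value follows the URL.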
ua-req-path
- Result
- string
The path of the URL in the user agent request. This does not include a leading slash.
ua-req-query
- Result
- string
The query string for the user agent request if present, an empty string if not.
ua-req-query-value
- Result
- string
- Argument
- Query parameter key.
The value for a specific query parameter, identified by key. This assumes the standard format for a query string: key / value pairs (joined by ‘=’) separated by ‘&’ or ‘;’. The key comparison is case insensitive. NIL is returned if the key is not found.
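The lookup rule described for ua-req-query-value can be sketched in Python, with None standing in for NIL. The function query_value is hypothetical. Two details are choices made for this sketch and are not stated above: the first matching key wins, and a key without ‘=’ yields an empty string.

```python
import re
from typing import Optional


def query_value(query: str, key: str) -> Optional[str]:
    """Model of ua-req-query-value<key>; returns None (NIL) if the key is absent."""
    # Pairs are separated by '&' or ';'; each pair is "key=value".
    for pair in re.split(r"[&;]", query):
        name, _, value = pair.partition("=")
        if name.lower() == key.lower():  # case insensitive key comparison
            return value
    return None


print(query_value("a=1&B=2;c=3", "b"))  # 2
print(query_value("a=1", "z"))          # None
```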
ua-req-fragment
- Result
- string
The fragment of the URL in the user agent request if present, an empty string if not.
ua-req-url-host
- Result
- string
Host for the user agent request URL.
ua-req-url-port
- Result
- integer
The port for the user agent request URL.
ua-req-url-loc
- Result
- string
The location for the user agent request URL, consisting of the host and the optional port.
ua-req-field
- Result
- NULL, string, string list
- Argument
- name
The value of a field in the user agent request. This requires a field name as an argument. To get the value of the “Host” field the extractor would be “ua-req-field<Host>”. The field name is case insensitive.
If the field is not present, the NULL value is returned. Note this is distinct from the empty string, which is returned if the field is present but has no value. If there are duplicate fields, a string list is returned, each element of which corresponds to a field.
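The three possible results of a field extractor (NULL, a string, or a string list) can be modeled in Python, with None standing in for NULL. The representation of a message as a list of (name, value) pairs and the function field_value are hypothetical. The same rules are described for proxy-req-field, upstream-rsp-field, and proxy-rsp-field.

```python
from typing import List, Sequence, Tuple, Union

FieldValue = Union[None, str, List[str]]


def field_value(fields: Sequence[Tuple[str, str]], name: str) -> FieldValue:
    """Model of ua-req-field<name>; fields are (name, value) pairs in message order."""
    values = [v for n, v in fields if n.lower() == name.lower()]  # case insensitive
    if not values:
        return None       # field absent: NULL
    if len(values) == 1:
        return values[0]  # single field, possibly the empty string
    return values         # duplicates: one element per field


request = [("Host", "example.one"), ("X-Empty", ""), ("Via", "a"), ("via", "b")]
print(field_value(request, "host"))     # example.one
print(field_value(request, "X-Empty"))  # '' is printed as an empty line
print(field_value(request, "Via"))      # ['a', 'b']
print(field_value(request, "Cookie"))   # None
```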
Pre-Remap
The following extractors extract data from the user agent request URL as it was before URL rewriting (“remapping”). Only the URL is preserved, not any of the fields or the method. These are referred to elsewhere as “pristine”, but that is a misnomer. If the user agent request is altered before URL rewriting, that will be reflected in the data from these extractors, so they do not necessarily return the URL as it was received by ATS from the user agent. All of these have an alias with “pristine” instead of “pre-remap” for old school operations staff. There are no directives to modify these values; they are read only.
pre-remap-scheme
- Result
- string
The URL scheme in the pre-remap user agent request URL.
pre-remap-url
- Result
- string
The full URL of the pre-remap user agent request.
pre-remap-path
- Result
- string
The URL path in the pre-remap user agent request URL. This does not include a leading slash.
pre-remap-host
- Result
- string
The host in the pre-remap user agent request URL. This does not include the port.
pre-remap-port
- Result
- integer
The port in the pre-remap user agent request URL. If not specified, the canonical default based on the scheme is used.
pre-remap-query
- Result
- string
The query string for the pre-remap user agent request URL.
pre-remap-query-value
- Result
- string
- Argument
- Query parameter key.
The value for a specific query parameter, identified by key. This assumes the standard format for a query string: key / value pairs (joined by ‘=’) separated by ‘&’ or ‘;’. The key comparison is case insensitive. NIL is returned if the key is not found.
pre-remap-fragment
- Result
- string
The fragment of the URL in the pre-remap user agent request if present, an empty string if not.
Rewrite Rule URLs
During URL rewriting there are two additional URLs available: the “target” and the “replacement” URL. These are fixed values from the rule itself, not from the user agent. For this reason there are extractors to get data from these URLs but no directives to modify them. These values are available only for the “remap” hook, that is, for directives invoked from a rule in “remap.config”. Query values are not permitted in these URLs, so no extractor for them is provided.
remap-target-url
- Result
- string
The full target URL.
remap-target-scheme
- Result
- string
The scheme in the target URL.
remap-target-loc
- Result
- string
The network location of the target URL.
remap-target-host
- Result
- string
The host in the target URL. This does not include the port, if any.
remap-target-port
- Result
- integer
The port in the target URL. If not specified, the default based on the scheme is extracted.
remap-target-path
- Result
- string
The path in the target URL.
remap-replacement-url
- Result
- string
The full replacement URL.
remap-replacement-scheme
- Result
- string
The scheme in the replacement URL.
remap-replacement-loc
- Result
- string
The network location in the replacement URL.
remap-replacement-host
- Result
- string
The host in the replacement URL. This does not include the port, if any.
remap-replacement-port
- Result
- integer
The port in the replacement URL. If not specified, the default based on the scheme is extracted.
remap-replacement-path
- Result
- string
The path in the replacement URL.
Proxy Request
proxy-req-method
- Result
- string
The proxy request method.
proxy-req-url
- Result
- string
The URL in the proxy request.
proxy-req-scheme
- Result
- string
The URL scheme in the proxy request.
proxy-req-loc
- Result
- string
The network location in the proxy request. This is retrieved from the URL if present, otherwise from the Host field.
proxy-req-host
- Result
- string
Host for the proxy request. This is retrieved from the URL if present, otherwise from the Host field. This does not include the port.
proxy-req-path
- Result
- string
The path of the URL in the proxy request. This does not include a leading slash.
proxy-req-port
- Result
- integer
The port for the proxy request. This is retrieved from the URL if present, otherwise from the Host field.
proxy-req-query
- Result
- string
The query string in the proxy request.
proxy-req-query-value
- Result
- string
- Argument
- Query parameter key.
The value for a specific query parameter, identified by key. This assumes the standard format for a query string: key / value pairs (joined by ‘=’) separated by ‘&’ or ‘;’. The key comparison is case insensitive. NIL is returned if the key is not found.
proxy-req-fragment
- Result
- string
The fragment of the URL in the proxy request if present, an empty string if not.
proxy-req-url-host
- Result
- string
The host in the proxy request URL.
proxy-req-url-port
- Result
- integer
The port in the proxy request URL.
proxy-req-url-loc
- Result
- string
The location in the proxy request URL if present, an empty string if not.
proxy-req-field
- Result
- NULL, string, string list
- Argument
- name
The value of a field in the proxy request. This requires a field name as an argument. To get the value of the “Host” field the extractor would be “proxy-req-field<Host>”. The field name is case insensitive.
If the field is not present, the NULL value is returned. Note this is distinct from the empty string, which is returned if the field is present but has no value. If there are duplicate fields, a string list is returned, each element of which corresponds to a field.
Upstream Response
upstream-rsp-status
- Result
- integer
The status code of the response.
upstream-rsp-status-reason
- Result
- string
The reason phrase of the response status.
upstream-rsp-field
- Result
- NULL, string, string list
- Argument
- name
The value of a field in the upstream response. This requires a field name as an argument. The field name is case insensitive.
If the field is not present, the NULL value is returned. Note this is distinct from the empty string, which is returned if the field is present but has no value. If there are duplicate fields, a string list is returned, each element of which corresponds to a field.
Proxy Response
proxy-rsp-status
- Result
- integer
The status code of the response.
proxy-rsp-status-reason
- Result
- string
The reason phrase of the response status.
proxy-rsp-field
- Result
- NULL, string, string list
- Argument
- name
The value of a field in the proxy response. This requires a field name as an argument. The field name is case insensitive.
If the field is not present, the NULL value is returned. Note this is distinct from the empty string, which is returned if the field is present but has no value. If there are duplicate fields, a string list is returned, each element of which corresponds to a field.
Transaction
is-internal
- Result
- boolean
This returns a boolean value: true if the request is an internal request, and false if not.
Session
inbound-txn-count
- Result
- integer
The number of transactions, including the current one, that have occurred on the inbound session.
inbound-addr-remote
- Result
- IP address
The remote address for the inbound connection. This is also known as the “client address”, the address from which the connection originates.
inbound-addr-local
- Result
- IP address
The local address for the inbound connection, which is the address used to accept the inbound session.
inbound-sni
- Result
- string
The SNI name sent on the inbound session.
has-inbound-protocol-prefix
- Result
- boolean
- Argument
- protocol tag prefix
For the inbound session there is a list of protocol tags that describe the network protocols used for that network connection. This extractor checks the inbound session list to see if it contains a tag that has a specific prefix. The most common use is to determine if the inbound session is TLS:
with: has-inbound-protocol-prefix<tls>
select:
- is-true:
  do: # TLS only stuff.
Note
Checking a request for the scheme “https” is not identical to checking for TLS. Nothing prevents a user agent from sending a scheme at variance with the network protocol stack. This extractor checks the network protocol, not the request.
Checking for IPv6 can be done in a similar way:
with: has-inbound-protocol-prefix<ipv6>
select:
- is-true:
  do: # IPv6 special handling.
inbound-protocol
- Result
- string
- Argument
- protocol tag prefix
For the inbound session there is a list of protocol tags that describe the network protocols used for that network connection. This extractor searches the inbound session list and, if there is a prefix match, returns the matched protocol tag. This can be used to check for different versions of TLS:
with: inbound-protocol<tls>
select:
- match: "tls/1.3"
  do: # TLS 1.3 only stuff.
- prefix: "tls"
  do: # Older TLS stuff.
- otherwise:
  do: # Non-TLS stuff.
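The prefix matching performed by has-inbound-protocol-prefix and inbound-protocol, together with the ordered cases of the select example above, can be modeled in Python. The example stack of tags is an assumption about what ATS reports for a connection. All functions are hypothetical. What inbound-protocol returns when nothing matches is not stated above, so this sketch returns None.

```python
from typing import Optional, Sequence

# Assumed example stack for an IPv4 TLS 1.3 HTTP/1.1 connection.
STACK = ("ipv4", "tcp", "tls/1.3", "http/1.1")


def has_protocol_prefix(stack: Sequence[str], prefix: str) -> bool:
    """Model of has-inbound-protocol-prefix<prefix>."""
    return any(tag.startswith(prefix) for tag in stack)


def protocol(stack: Sequence[str], prefix: str) -> Optional[str]:
    """Model of inbound-protocol<prefix>: the first tag with that prefix, else None."""
    return next((tag for tag in stack if tag.startswith(prefix)), None)


def classify(stack: Sequence[str]) -> str:
    """Mirror of the select example: cases are tried in order, first match wins."""
    tag = protocol(stack, "tls")
    if tag == "tls/1.3":
        return "TLS 1.3"
    if tag is not None and tag.startswith("tls"):
        return "older TLS"
    return "non-TLS"


print(has_protocol_prefix(STACK, "tls"))  # True
print(classify(STACK))                    # TLS 1.3
```

The "match" case is listed before the "prefix" case because "tls/1.3" also satisfies the "tls" prefix.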
inbound-protocol-stack
- Result
- tuple of strings
This extracts the entire stack of tags for the network protocols of the inbound connection as a tuple. This could be used to check for an IPv4 connection:
with: inbound-protocol-stack
select:
- for-any:
    match: "ipv4"
  do:
    # IPv4 only things.
In general, though, has-inbound-protocol-prefix is usually a better choice for such checking unless the full stack or a full tag is needed.
inbound-cert-verify-result
- Result
- integer
The result of verifying the inbound remote (client) certificate. Due to issues in the OpenSSL library this can be a bit odd. If the inbound session is not TLS, the result will be X509_V_ERR_INVALID_CALL, which as of this writing has the value 69. Otherwise, if no client certificate was provided and none was required, the result is X509_V_OK, which has the value 0. This missing certificate can be detected indirectly because all of the remote (client) certificate extractors return empty strings.
inbound-cert-local-issuer-field
- Result
- string
- Argument
- Entry name.
Extract the value for an entry in the local (server) certificate issuer for an inbound session. This accepts either a short or a long name as the argument. Note these names are case sensitive.
inbound-cert-local-subject-field
- Result
- string
- Argument
- Entry name.
Extract the value for an entry in the local (server) certificate subject for an inbound session. This accepts either a short or a long name as the argument. Note these names are case sensitive.
inbound-cert-remote-issuer-field
- Result
- string
- Argument
- Entry name.
Extract the value for an entry in the remote (client) certificate issuer for an inbound session. This accepts either a short or a long name as the argument. Note these names are case sensitive.
If a client certificate wasn’t provided or failed validation, this will yield an empty string.
inbound-cert-remote-subject-field
- Result
- string
- Argument
- Entry name.
Extract the value for an entry in the remote (client) certificate subject for an inbound session. This accepts either a short or a long name as the argument. Note these names are case sensitive.
If a client certificate wasn’t provided or failed validation, this will yield an empty string.
outbound-txn-count
- Result
- integer
The number of transactions between the Traffic Server proxy and the origin server on a single session. Any value greater than zero indicates connection reuse.
with: outbound-txn-count
select:
- gt: 10
  do:
  - proxy-rsp-field<Connection>: "close"
Warning
For ATS versions before 10, this will return 0, and the value should not be used to determine connection reuse.
Duration
A “duration” is a span of time. It is specified by one of the following extractors.
milliseconds
- Result
- duration
- Argument
- count
A duration of count milliseconds.
seconds
- Result
- duration
- Argument
- count
A duration of count seconds.
minutes
- Result
- duration
- Argument
- count
A duration of count minutes.
hours
- Result
- duration
- Argument
- count
A duration of count hours.
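Each duration extractor takes a count and yields a value of the same duration type, so durations written in different units can be compared directly. The Python sketch below uses datetime.timedelta as a stand-in for TxnBox's duration type, which is an assumption for illustration. The duration helper is hypothetical.

```python
from datetime import timedelta

# Map each duration extractor name to a constructor for its unit.
DURATION_UNITS = {
    "milliseconds": lambda n: timedelta(milliseconds=n),
    "seconds": lambda n: timedelta(seconds=n),
    "minutes": lambda n: timedelta(minutes=n),
    "hours": lambda n: timedelta(hours=n),
}


def duration(unit: str, count: int) -> timedelta:
    """Model of e.g. seconds<count>: a span of count units."""
    return DURATION_UNITS[unit](count)


print(duration("seconds", 90) == duration("milliseconds", 90_000))                 # True
print(duration("minutes", 90) == duration("hours", 1) + duration("minutes", 30))  # True
```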
Utility
This is an eclectic collection of extractors that do not depend on transaction or session data.
...
- Result
- any
The feature for the most recent with.
random
- Result
- integer
Generate a random integer in a uniform distribution. The default range is 0..99 because the most common use is for a percentage. This can be changed by adding arguments: a single number argument changes the upper bound, and two arguments change the range. E.g.
random<199>
generates integers in the range 0..199.
random<1,100>
generates integers in the range 1..100. The usual style for using this in a percentage form is
with: random
select:
- lt: 5 # match 5% of the time
  do: # ...
- lt: 25 # match 20% of the time - 25% less the previous 5%
  do: # ...
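The arithmetic behind the percentage example can be checked with a Python simulation. The model assumes inclusive bounds, as the ranges above state. It also assumes first-match evaluation of the lt cases, so the second case only sees values from 5 to 24. Both random_feature and pick_bucket are hypothetical helpers, and Python's random module stands in for whatever generator TxnBox uses.

```python
import random
from typing import Optional, Sequence, Tuple


def random_feature(rng: random.Random, *args: int) -> int:
    """Model of random, random<hi>, and random<lo,hi>; both bounds are inclusive."""
    if len(args) == 0:
        lo, hi = 0, 99
    elif len(args) == 1:
        lo, hi = 0, args[0]
    elif len(args) == 2:
        lo, hi = args
    else:
        raise ValueError("random takes at most two arguments")
    return rng.randint(lo, hi)  # randint includes both end points


def pick_bucket(value: int, cases: Sequence[Tuple[int, str]]) -> Optional[str]:
    """First-match semantics of select with lt cases."""
    for limit, label in cases:
        if value < limit:
            return label
    return None


CASES = [(5, "5% case"), (25, "20% case")]
rng = random.Random(42)  # fixed seed so the run is repeatable
counts = {}
for _ in range(100_000):
    label = pick_bucket(random_feature(rng), CASES)
    counts[label] = counts.get(label, 0) + 1
print(counts)  # roughly 5% / 20% / 75% (None) of the samples
```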
text-block
- Result
- string
- Argument
- name
Extract the content of the text block (defined by a text-block-define) for name.
ip-col
- Argument
- Column name or index
This must be used in the context of the modifier ip-space, which creates the row context needed to extract the column value for that row. The argument can be the name of the column, if it has a name, or the index. Note index 0 is the IP address range, and data columns start at index 1.
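The column addressing used by ip-col (index 0 is the address range, and data columns start at index 1 or are addressed by name) can be modeled with Python's ipaddress module. The table contents, the column names "pod" and "tier", and the helpers find_row and ip_col are all hypothetical. find_row only stands in for the row lookup that the ip-space modifier performs.

```python
import ipaddress
from typing import Dict, List, Optional, Sequence, Union

Row = Sequence[object]

# Hypothetical IP space: column 0 is the address range, data columns follow.
ROWS: List[Row] = [
    (ipaddress.ip_network("10.0.0.0/8"), "east", "gold"),
    (ipaddress.ip_network("192.168.0.0/16"), "west", "silver"),
]
COLUMN_NAMES: Dict[str, int] = {"pod": 1, "tier": 2}


def find_row(addr: str) -> Optional[Row]:
    """Stand-in for ip-space: find the row whose range contains addr."""
    ip = ipaddress.ip_address(addr)
    return next((row for row in ROWS if ip in row[0]), None)


def ip_col(row: Row, column: Union[int, str]) -> object:
    """Model of ip-col<column>: look up by column name or by index (0 is the range)."""
    index = COLUMN_NAMES[column] if isinstance(column, str) else column
    return row[index]


row = find_row("10.1.2.3")
print(ip_col(row, "pod"), ip_col(row, 2), ip_col(row, 0))  # east gold 10.0.0.0/8
```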
stat
- Result
- integer
- Argument
- Plugin statistic name.
This extracts the value of a plugin statistic, which Traffic Server currently limits to integers. Note that statistic values are eventually consistent; there can be a delay of several seconds between incrementing a statistic with stat-update and the value changing.
env
- Result
- string
- Argument
- Variable name
Extract the value of the named variable from the process environment.
inbound-tcp-info
- Result
- integer
- Argument
- Field name
Extracts a field value from the tcp_info data available on some operating systems. If it is not available, NULL is returned.
The currently supported fields are:
- rtt
Round trip time.
- rto
Retransmission timeout.
- retrans
Retransmits.
- snd-cwnd
Outbound congestion window.