Extractor Development
Extractors are referenced by feature expressions. This means every extractor must be able to output to a string, and may optionally provide typed data.
Unlike other elements, use of an extractor involves referencing a global instance, rather than instantiating an instance per use. This is because extractors are used far more frequently.
Most extractors do not require any local storage or state.
All extractors are implemented by a class. This must be a subclass of Extractor. By
convention the name of the class should be “Ex_” followed by the extractor name. For example the
class Ex_ua_req_url
is the implementation of the “ua-req-url” extractor.
By convention, a TextView
named NAME
is declared to define the name of the
extractor. This isn’t required, because the name is defined by the registration call, but it’s convenient.
Several methods are needed for an extractor to be fully functional. Several of them take an Extractor::Spec parameter. For any specific use of an extractor there is a single instance of this class, which is passed to all methods of the extractor. In some sense, this represents the per use instance data. This class is a subclass of the BufferWriter specifier that provides additional members. These are
_exf
A pointer to the extractor instance. This is used to call the extractor during feature extraction.
_name
The name of the extractor used in the feature expression.
_data
A memory span which is by default empty. It can be used to store per instance data if needed as described below in the examples.
Required Methods
swoc::Rv<ActiveType> validate(Config & cfg, Spec & spec, swoc::TextView const& arg);
This is called during configuration loading when the extractor is parsed. It is expected to do two things -
Validate the argument if any.
Indicate the return type.
If the extractor can only return a string and has no argument, the base implementation can be used,
which will always return the types STRING
and NIL
and no errors.
- cfg
The configuration object, representing the configuration being loaded.
- spec
The parsed specifier for the extractor. This can also be used to store instance data if needed.
- arg
The argument to the extractor, if any. Arguments are specified by adding text enclosed in angle brackets after the extractor. For instance the proxy response field extractor
proxy-rsp-field
requires an argument that is the field name, for example proxy-rsp-field<Best-Band>
to get the field with the name “Best-Band”. If an argument is required, the validate
method must parse the argument and validate it, returning an error if it is invalid.
An extractor that returns any type other than a string must override this method.
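As a rough illustration of the two jobs validate has (check the argument, report the type), here is a minimal self-contained sketch. Everything in it is a hypothetical stand-in: ValueType, ValidateResult and validate_field_arg are not part of the plugin, and the real method returns swoc::Rv<ActiveType> and receives the Config and Spec as well. The character checks are illustrative only and are not a claim about what proxy-rsp-field actually accepts.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for the active type reported by validate().
enum class ValueType { NIL, STRING, INTEGER };

// Hypothetical stand-in for swoc::Rv<ActiveType>: a type on success, a message on failure.
struct ValidateResult {
  std::optional<ValueType> type; // set only when validation succeeded
  std::string error;             // non-empty only when validation failed
  bool ok() const { return type.has_value(); }
};

// Mirrors the job of validate() for an extractor that requires a field name argument.
ValidateResult validate_field_arg(std::string_view arg) {
  if (arg.empty()) {
    return {std::nullopt, "extractor requires a field name argument"};
  }
  // Illustrative check only: reject whitespace and the field separator.
  for (char c : arg) {
    if (c == ' ' || c == '\t' || c == ':') {
      return {std::nullopt, "invalid character in field name"};
    }
  }
  // Argument is acceptable; a field value is extracted as a string.
  return {ValueType::STRING, {}};
}

// Usage: what configuration loading would observe.
inline void demo_validate() {
  assert(validate_field_arg("Best-Band").ok());
  assert(!validate_field_arg("").ok());
}
```

The point of doing this work in validate is that a bad argument is reported once, at configuration load, rather than on every transaction.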
Feature extract(Context & ctx, Spec const& spec);
This method must be overridden. This is called when the value for the extractor is needed and should perform the extraction, returning the result.
- ctx
The context for the transaction.
- spec
The parsed specifier. This is the same instance passed to
validate
.
swoc::BufferWriter & format(swoc::BufferWriter& w, Spec const& spec, Context & ctx);
This method is called when the value for the extractor is needed in a string. The method must output the extracted value to the buffer as a string.
- w
The output buffer.
- spec
The parsed specifier. This is the same instance passed to
validate
.
- ctx
The context instance.
The extract
and format
methods are closely related and generally one will invoke the
other, most frequently format
calling extract
and passing the result to
bwformat
to generate the string output. Therefore there is a default implementation of this
method.
return bwformat(w, spec, this->extract(ctx, spec));
If this suffices, then format does not need to be overridden. There are cases where an override is necessary, which is why the methods are separate.
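The relationship between the two methods can be sketched with simplified stand-ins. In this sketch Feature is just an integer, std::ostream takes the place of swoc::BufferWriter, and ExtractorSketch, TxnCountSketch and format_to_string are hypothetical names used only for illustration. A derived class overrides extract alone and inherits a format that delegates to it.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for the plugin's Feature, Context and Extractor::Spec.
using Feature = std::intmax_t;
struct Context { unsigned txn_count = 0; };
struct Spec {};

class ExtractorSketch {
public:
  virtual ~ExtractorSketch() = default;

  // Must be provided by every extractor.
  virtual Feature extract(Context& ctx, Spec const& spec) = 0;

  // Default: extract the value and write it as text.
  virtual std::ostream& format(std::ostream& w, Spec const& spec, Context& ctx) {
    return w << this->extract(ctx, spec);
  }
};

// Only extract() is overridden; format() comes from the base class.
class TxnCountSketch : public ExtractorSketch {
public:
  Feature extract(Context& ctx, Spec const&) override { return ctx.txn_count; }
};

// Helper that drives format() and returns the produced text.
std::string format_to_string(ExtractorSketch& ex, Context& ctx) {
  std::ostringstream w;
  Spec spec;
  ex.format(w, spec, ctx);
  return w.str();
}

// Usage: the string form follows from the extracted value.
inline void demo_format() {
  Context ctx;
  ctx.txn_count = 3;
  TxnCountSketch ex;
  assert(format_to_string(ex, ctx) == "3");
}
```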
In some cases an extractor needs to store instance related information. This should be allocated
from configuration memory. The specifier has a member Extractor::Spec::_data which holds a
MemSpan<void>
. Because the same specifier instance is passed to validate
and
extract
a configuration allocated span can be stored there for later retrieval. While any
span can be assigned to a void span, the MemSpan::rebind<T>
method must be used to retrieve the actual
type.
String Extractor
For performance reasons string extractors are required to extract into transient context memory. If the
output size isn’t reasonably bounded at extraction time then it may be necessary to attempt the
extraction, detect that the transient memory is too small, and try again. To simplify this
there is a class, StringExtractor, to help with the implementation. This requires the extractor
to implement the format
method and uses that to implement the extract
method.
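The "attempt, detect overflow, retry" pattern described above can be sketched independently of the plugin. This is not the StringExtractor implementation. FixedWriterSketch and extract_via_format are hypothetical, and a std::vector stands in for transient context memory. The key idea is a writer that keeps counting characters after it runs out of room, so a failed attempt reports how much space the retry needs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical fixed capacity writer. It never writes past the buffer but
// keeps counting, so the required size is known after an overflow.
class FixedWriterSketch {
  char* _buf;
  std::size_t _cap;
  std::size_t _extent = 0; // total characters requested so far
public:
  FixedWriterSketch(char* buf, std::size_t cap) : _buf(buf), _cap(cap) {}
  void write(std::string_view s) {
    for (char c : s) {
      if (_extent < _cap) { _buf[_extent] = c; }
      ++_extent;
    }
  }
  std::size_t extent() const { return _extent; }
  bool overflowed() const { return _extent > _cap; }
};

using FormatFn = std::function<void(FixedWriterSketch&)>;

// Implements "extract" in terms of "format": try, and on overflow grow to the
// size the failed attempt reported, then try again.
std::string extract_via_format(FormatFn const& format, std::size_t initial_capacity,
                               int* attempts = nullptr) {
  assert(format); // a callable is required
  std::vector<char> transient(initial_capacity); // stand-in for transient memory
  int n = 0;
  for (;;) {
    ++n;
    FixedWriterSketch w(transient.data(), transient.size());
    format(w);
    if (!w.overflowed()) {
      if (attempts) { *attempts = n; }
      return std::string(transient.data(), w.extent());
    }
    transient.resize(w.extent()); // now large enough for the same output
  }
}
```

If the format function is deterministic, the loop finishes in at most two attempts, because the first failure reports the exact size needed.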
Example
Consider an extractor for the inbound transaction count. The code is in plugin/src/Ex_Ssn.cc.
The implementation is done in two parts.
First, specifically for this extractor, the Traffic Server plugin API support must be extended to call
TSHttpSsnTransactionCount
to perform the actual extraction. This is straightforward. A
method is added to the HTTP session support class ts::HttpSsn
in
plugin/include/txn_box/ts_util.h.
unsigned HttpSsn::txn_count() const { return TSHttpSsnTransactionCount(_ssn); };
Given access to the data to be extracted, the next step is to define the extractor class.
class Ex_inbound_txn_count : public Extractor {
public:
static constexpr TextView NAME { "inbound-txn-count" };
Rv<ActiveType> validate(Config&, Extractor::Spec&, TextView const&) override;
Feature extract(Context & ctx, Spec const& spec) override;
};
This is a minimal implementation. The method implementations are straightforward.
Rv<ActiveType> Ex_inbound_txn_count::validate(Config&, Extractor::Spec&, TextView const&) {
return ActiveType{ INTEGER }; // never a problem, just return the type.
}
Feature Ex_inbound_txn_count::extract(Context &ctx, Spec const&) {
return feature_type_for<INTEGER>(ctx.inbound_ssn().txn_count());
}
The validate
method doesn’t check for any errors (as there is no argument) and returns an
active type of “INTEGER” which is the type of value extracted. The extract
method retrieves
the inbound session from the context instance and then gets the transaction count from there. The
method is required to return a Feature instance. This type can be constructed from any of the
valid feature types. The meta-function feature_type_for is used to retrieve the feature type
used for INTEGER values. The method casts the transaction count to that type and
returns it, which in turn constructs a feature with that value and type.
This provides the implementation but the extractor must be declared and registered to be used. This is done in a static initializer in the source file.
namespace {
Ex_inbound_txn_count inbound_txn_count;
[[maybe_unused]] bool INITIALIZED = [] () -> bool {
Extractor::define(Ex_inbound_txn_count::NAME, &inbound_txn_count);
return true;
} ();
} // namespace
This declares a file scope instance of the extractor class and a static bool
variable
“INITIALIZED”. The value is set to the result of a lambda that takes no arguments. The point of this
is to force the invocation of the lambda which in turn calls Extractor::define to define the
“inbound-txn-count” extractor, passing the extractor name and implementation class instance. The
enclosing anonymous namespace
helps avoid name collisions by preventing any external
linkage.
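The self-registration idiom can be tried in isolation with a toy registry. In this sketch ExtractorSketch, registry, define_extractor and lookup are hypothetical stand-ins. The real registration call is Extractor::define, whose internals are not shown here. The table lives in a function-local static so that it exists whenever a static initializer in another source file calls it. That is an assumption of this sketch, not a statement about how the plugin stores its extractors.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>

// Hypothetical stand-in for the Extractor base class.
struct ExtractorSketch {
  virtual ~ExtractorSketch() = default;
};

// Name -> instance table, created on first use.
std::map<std::string, ExtractorSketch*>& registry() {
  static std::map<std::string, ExtractorSketch*> table;
  return table;
}

// Hypothetical counterpart of Extractor::define. Returns false on a duplicate name.
bool define_extractor(std::string_view name, ExtractorSketch* ex) {
  assert(ex != nullptr);
  return registry().emplace(std::string(name), ex).second;
}

// Look up an extractor by name, nullptr if it was never defined.
ExtractorSketch* lookup(std::string_view name) {
  auto spot = registry().find(std::string(name));
  return spot == registry().end() ? nullptr : spot->second;
}

namespace {
struct Ex_demo : ExtractorSketch {};
Ex_demo demo_instance; // the single global instance

// Initializing this variable forces the lambda to run before main().
[[maybe_unused]] bool INITIALIZED = []() -> bool {
  define_extractor("demo", &demo_instance);
  return true;
}();
} // namespace
```

By the time main runs, lookup("demo") returns the instance without any code having called a registration function explicitly.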
As an example of instance storage, the random extractor (Ex_random) must store two integers
which are the limits of the generated value. The argument for this is parsed in validate
and
stored using the code
auto values = cfg.alloc_span<feature_type_for<INTEGER>>(2);
spec._data = values; // remember where the storage is.
values gets a configuration allocated span the size of two integers. This is then cached in
the specifier and other code parses the arguments and sets the values in the span. During invocation
in extract
the values are retrieved.
auto values = spec._data.rebind<feature_type_for<INTEGER>>();
As before, values is a MemSpan<feature_type_for<INTEGER>>
of size 2 and therefore the
values can be accessed as values[0]
and values[1]
.
More commonly a nested class will be defined and used as the configuration type, allocating a span of size 1, but the mechanism is the same.
Note this memory is uninitialized. If a class instance is to be stored it must be completely
assigned by the code (as is the case for Ex_random
) or placement new
should be used
to construct it in a known state. It is usually the case that all of the members are set (because if
the member isn’t set during configuration load, why is it there?) but sometimes more complex
initialization is required.
For the random extractor this could have been done with
using I = feature_type_for<INTEGER>;
auto values = cfg.alloc_span<I>(2);
values.apply([](I& i) { new (&i) I; });
spec._data = values; // remember where the storage is.
While clearly not really useful for an integral type, the technique is identical for a class, only the type is the class instead of the feature integer value type.
Or, if zero initialized memory suffices
auto values = cfg.alloc_span<feature_type_for<INTEGER>>(2);
memset(values, 0);
spec._data = values; // remember where the storage is.
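For the more common case of a nested class holding the instance data, the same steps can be sketched with stand-ins. ArenaSketch here is a hypothetical bump allocator playing the role of configuration memory, SpecSketch keeps a plain pointer instead of a MemSpan<void>, and RandomLimits with its default values is invented for the example. Placement new runs the default member initializers, so the object starts in a known state even though the memory itself is uninitialized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for configuration memory: hands out raw, uninitialized bytes.
struct ArenaSketch {
  alignas(std::max_align_t) unsigned char bytes[256];
  std::size_t used = 0;

  void* alloc(std::size_t n, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0); // power of two
    std::size_t start = (used + align - 1) / align * align; // round up
    if (start + n > sizeof(bytes)) { return nullptr; }
    used = start + n;
    return bytes + start;
  }
};

// Hypothetical per use configuration data, as a nested class would hold it.
struct RandomLimits {
  std::intmax_t min = 0;
  std::intmax_t max = 99;
};

// Hypothetical stand-in for Extractor::Spec; only the storage slot is modeled.
struct SpecSketch {
  void* _data = nullptr;
};

// Done at configuration load: allocate, construct in place, remember the location.
bool store_limits(ArenaSketch& arena, SpecSketch& spec) {
  void* raw = arena.alloc(sizeof(RandomLimits), alignof(RandomLimits));
  if (raw == nullptr) { return false; }
  spec._data = new (raw) RandomLimits; // default member initializers run here
  return true;
}

// Done at extraction: recover the typed data from the specifier.
RandomLimits const& limits_of(SpecSketch const& spec) {
  return *static_cast<RandomLimits const*>(spec._data);
}
```

The sketch uses a trivially destructible type, so nothing needs to be destroyed when the memory is released. A class that owns resources would need more care.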
Note
This configuration allocated memory is per configuration. That means it can be accessed from multiple threads in different transactions simultaneously.