Active content filtering

IBM® WebSphere® sMash provides support for removing and handling untrusted active content in requests and responses.

The zero.acf package contains an active content filter (ACF) that supports the following scenarios:

Typical Web applications using Dojo (this is the default)
  • Remove active content (such as JavaScript, Applet, and ActiveX objects) from all inbound request parameters sent to any URI. These parameters are treated as HTML fragments.
  • Remove active content from all String values in an inbound JSON object sent to any URIs. In this case, all content types are targeted.
  • Remove active content from all String values in an outbound JSON object sent by any URIs. In this case, all content types except for "text/html" are targeted.
Fine-grained control for Web applications using HTML
  • Validate whether an inbound request parameter sent to a specified URI includes active content.
  • Remove active content from all inbound request parameters sent to a specified URI.
  • Remove active content from outbound HTML sent by a specified URI.
Fine-grained control for Web applications using JSON
  • Validate whether an inbound request parameter sent to a specified URI includes active content.
  • Remove active content from all inbound request parameters sent to a specified URI.
  • Validate whether a String value in an inbound JSON object sent to a specified URI includes active content.
  • Remove active content from all String values in an inbound JSON object sent to a specified URI.
  • Remove active content from all String values in an outbound JSON object.

The programmatic API for ACF is considered an advanced usage scenario. The API documentation for the programmatic APIs is in the references section of the Developer's Guide. The main classes of interest are zero.acf.ACFFactory and zero.acf.ACFProcessor.

Programmatic API for filtering inbound request parameters

To apply ACF to a request parameter in your application, invoke the APIs that the zero.acf package provides.

Examples using APIs

Use the following steps to check whether a request parameter includes active content:

  1. Obtain a zero.acf.ACFProcessor instance using the zero.acf.ACFFactory.
  2. Invoke the validate() method of the ACFProcessor to check whether ACF detects any active content in the request parameter.
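// Obtain the request parameter value from the request (elided).
String parameter = ...;

// "text/html" selects a processor that treats the parameter as an HTML fragment.
zero.acf.ACFProcessor processor = zero.acf.ACFFactory.getActiveContentProcessor("text/html");
// A non-null result means that ACF detected active content in the parameter.
String detected = processor.validate(parameter);
if (detected != null) {
    // Either throw an exception to tell the client that active content was detected,
    // or encode the value (for example, with an XML encoder API) so that the active content cannot run.
}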
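The comment in the example mentions encoding the value instead of rejecting it. As a rough illustration of that option, the following sketch escapes the five XML special characters using only the Java standard library. The MarkupEscaper class and its escape() method are hypothetical names introduced here; this is not the XML encoder API that the comment refers to, and it is not part of zero.acf.

```java
public class MarkupEscaper {
    /** Escapes the XML special characters so that the value is rendered as text, not markup. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: &lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;
        System.out.println(escape("<script>alert('x')</script>"));
    }
}
```

Escaping of this kind makes a value safe to place in HTML element content or in a quoted attribute value; it is not sufficient for values that are inserted into scripts, style sheets, or URLs.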

Use the following steps to remove active content from a request parameter:

  1. Obtain a zero.acf.ACFProcessor instance using the zero.acf.ACFFactory.
  2. Invoke the process() method of the ACFProcessor to remove any active content from the request parameter.
// Obtain the request parameter value from the request (elided).
String parameter = ...;

zero.acf.ACFProcessor processor = zero.acf.ACFFactory.getActiveContentProcessor("text/html");
// process() returns the parameter with any active content removed.
String processedString = processor.process(parameter);
// Use processedString from here on instead of the raw parameter.
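The two examples correspond to two policies: reject a request that contains active content, or strip the active content and continue. The following sketch puts both policies behind a single method. The ContentChecker interface is a hypothetical stand-in that mirrors the validate() and process() calls shown above, so that the sketch compiles without zero.acf; in an application, you would delegate to an ACFProcessor instead. The Mode enum and apply() method are likewise hypothetical.

```java
public class ParameterPolicy {
    /** Hypothetical stand-in for the two ACFProcessor calls used in the examples. */
    interface ContentChecker {
        String validate(String input); // non-null when active content is detected
        String process(String input);  // input with active content removed
    }

    enum Mode { REJECT, SANITIZE }

    static String apply(ContentChecker checker, Mode mode, String parameter) {
        if (mode == Mode.SANITIZE) {
            // Strip active content and let the request continue.
            return checker.process(parameter);
        }
        // Reject the request outright if any active content is found.
        if (checker.validate(parameter) != null) {
            throw new IllegalArgumentException("request parameter contains active content");
        }
        return parameter;
    }

    public static void main(String[] args) {
        // Toy checker for demonstration only: treats "<script>" as the only active content.
        ContentChecker toy = new ContentChecker() {
            public String validate(String s) { return s.contains("<script>") ? "<script>" : null; }
            public String process(String s) { return s.replace("<script>", ""); }
        };
        System.out.println(apply(toy, Mode.SANITIZE, "hi<script>")); // hi
        System.out.println(apply(toy, Mode.REJECT, "hello"));        // hello
    }
}
```

Rejecting gives the client clear feedback, while sanitizing keeps the request working at the cost of silently changing the input; which one fits depends on the application.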

Programmatic API for filtering values in an inbound JSON object

To apply ACF to an inbound JSON object in your application, invoke the APIs that the zero.acf package provides.

Examples using APIs

Use the following steps to check whether a value of an inbound JSON object includes active content:

  1. Obtain a zero.acf.ACFProcessor instance using the zero.acf.ACFFactory.
  2. Invoke the validate() method of the ACFProcessor to check whether ACF detects any active content in the value.
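// Obtain the String value from the inbound JSON object (elided).
String jsonValue = ...;

// "application/json" selects a processor for String values in a JSON object.
zero.acf.ACFProcessor processor = zero.acf.ACFFactory.getActiveContentProcessor("application/json");
// A non-null result means that ACF detected active content in the value.
String detected = processor.validate(jsonValue);
if (detected != null) {
    // Either throw an exception to tell the client that active content was detected,
    // or encode the value (for example, with an XML encoder API) so that the active content cannot run.
}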

Use the following steps to remove active content from a value in an inbound JSON object:

  1. Obtain a zero.acf.ACFProcessor instance using the zero.acf.ACFFactory.
  2. Invoke the process() method of the ACFProcessor to remove any active content from the value.
// Obtain the String value from the inbound JSON object (elided).
String jsonValue = ...;

zero.acf.ACFProcessor processor = zero.acf.ACFFactory.getActiveContentProcessor("application/json");
// process() returns the value with any active content removed.
String processedString = processor.process(jsonValue);
// Use processedString from here on instead of the raw value.
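The programmatic API filters one String value at a time. To apply it to every String value in a decoded JSON object, an application can walk the structure recursively. The following sketch assumes that the JSON has already been decoded into java.util.Map, java.util.List, String, Number, Boolean, and null values; the JsonValueFilter class is hypothetical. In an application, the filter argument could be processor::process, assuming process() takes and returns a String as in the example; the main() method uses a trivial stand-in filter so that the sketch runs without zero.acf.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class JsonValueFilter {
    /**
     * Returns a copy of a decoded JSON value in which every String value has been
     * passed through the filter. Object keys, numbers, booleans, and null are kept as-is.
     */
    static Object filterStrings(Object node, UnaryOperator<String> filter) {
        if (node instanceof String) {
            return filter.apply((String) node);
        }
        if (node instanceof Map) {
            Map<Object, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) node).entrySet()) {
                copy.put(entry.getKey(), filterStrings(entry.getValue(), filter));
            }
            return copy;
        }
        if (node instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) node) {
                copy.add(filterStrings(item, filter));
            }
            return copy;
        }
        return node;
    }

    public static void main(String[] args) {
        Map<String, Object> json = new LinkedHashMap<>();
        json.put("name", "Alice<x>");
        json.put("age", 30);
        json.put("tags", List.of("<x>a", "b"));

        // Trivial stand-in filter for demonstration only; in an application,
        // pass processor::process (an ACFProcessor) instead.
        UnaryOperator<String> standIn = s -> s.replace("<x>", "");
        System.out.println(filterStrings(json, standIn)); // {name=Alice, age=30, tags=[a, b]}
    }
}
```

The sketch returns a filtered copy rather than modifying the input in place, so the original structure remains available if the application needs to log or compare it.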

Configuring ACF

When you add a dependency on zero.acf, ACF provides the following protection by default:

  1. Removes active content from all HTTP request parameters sent to any URI. These parameters are treated as HTML fragments.
  2. Removes active content from all String values in an inbound JSON object sent to any URIs when the content type of the request is "application/json" or "text/json".
  3. Removes active content from all String values in an outbound JSON object sent by any URIs when the content type of the response is "application/json" or "text/json".
  4. Enables protection against Cross-Site Request Forgery (CSRF).

If the default protection is sufficient, you do not need to configure anything in your zero.config file. If you need finer-grained filtering, set enableByDefault to false in your zero.config file. The following example also shows how to configure CSRF protection: replace <csrfMode> with "" to disable CSRF protection, or with "REQUEST" to keep it. For an overview of CSRF protection, see the CSRF documentation.

/config/acf/enableByDefault=false
/config/security/token/enableCsrfProtection=<csrfMode>

ACF configuration options

To remove active content from a specified input, such as inbound request parameters or outbound HTML, ACF needs several configuration properties.

When you use ACF, configure the following information in your zero.config file:

conditions (mandatory)
The URI pattern, matched against /request/path, that identifies the protected resources. These are the same regular expression patterns that the condition operators in WebSphere sMash support.
contentType (optional)
A list of the content types of the input. The default value is ["application/json", "text/json"]. When the content type in the request or response header includes one of the content types defined here, ACF is invoked. To remove active content from an HTML document, set this property to ["text/html"].
target (optional)
The target of ACF processing. The default value is "RESPONSE". To remove active content from inbound request parameters (for HTML) or an inbound JSON object (for JSON), set this property to "REQUEST". To remove active content from both inbound and outbound data, set this property to "REQUEST_RESPONSE".
filterJsonDuringEncode (optional)
A flag that indicates whether ACF removes active content (if any is found) from JSON during serialization and deserialization. The default value is true. If it is set to false, the application developer can filter the JSON data structure by using the programmatic APIs.
filterRuleFile (optional)
The name of the file that contains your custom filter rules. For HTML, ACF works based on filter rules. When this property is not set, ACF filters inbound request parameters and outbound HTML by using the default filter rules bundled with ACF; when it is set, ACF uses your custom filter rules instead. ACF expects this file to be in the config directory of your application.
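The contentType property decides whether ACF is invoked for a given request or response. The following sketch shows one plausible way to perform that check. It is an assumption that "includes" means the media type of the header (ignoring parameters such as charset, and ignoring case) equals one of the configured values; ACF itself may match differently. The ContentTypeMatch class is hypothetical.

```java
import java.util.List;
import java.util.Locale;

public class ContentTypeMatch {
    // Default value of the contentType property.
    static final List<String> DEFAULT_CONTENT_TYPES = List.of("application/json", "text/json");

    /** Extracts the media type from a header value such as "application/json; charset=UTF-8". */
    static String mediaType(String headerValue) {
        int semicolon = headerValue.indexOf(';');
        String type = semicolon >= 0 ? headerValue.substring(0, semicolon) : headerValue;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** Assumed matching rule: the media type equals one of the configured types, ignoring case. */
    static boolean shouldFilter(String headerValue, List<String> configured) {
        if (headerValue == null) {
            return false;
        }
        String type = mediaType(headerValue);
        for (String candidate : configured) {
            if (type.equals(candidate.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldFilter("application/json; charset=UTF-8", DEFAULT_CONTENT_TYPES)); // true
        System.out.println(shouldFilter("text/html", DEFAULT_CONTENT_TYPES));                       // false
        System.out.println(shouldFilter("text/html", List.of("text/html")));                        // true
    }
}
```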

The following examples show a few variations on how to configure ACF.

This configuration defines an active content filter on all outbound HTML resources that match the URI pattern "/app(/.*)?". In this example, if the content type of the response is "text/html", the HTML response is filtered based on the rule sets defined in the "acf-custom.xml" file.

@include "${/config/dependencies/zero.acf}/config/acf.config"{
	"conditions" : "/request/path =~ /app(/.*)?",
	"contentType" : ["text/html"],
	"filterRuleFile" : "acf-custom.xml"
}

This configuration defines an active content filter on all inbound JSON resources that match the URI pattern "/app(/.*)?" and all HTTP request parameters. In this example, if the content type of the request is "application/json" or "text/json", all String values in the inbound JSON object are filtered.

@include "${/config/dependencies/zero.acf}/config/acf.config"{
	"conditions" : "/request/path =~ /app(/.*)?",
	"target" : "REQUEST"
}

This configuration defines an active content filter on all inbound and outbound resources that match the URI pattern "/app(/.*)?". In this example, if the content type of the request is "application/json" or "text/json", all String values in the outbound JSON object are filtered when the JSON object is written to the outbound output stream. All String values in the inbound JSON object are also filtered when the JSON object is read from the inbound input stream.

@include "${/config/dependencies/zero.acf}/config/acf.config"{
	"conditions" : "/request/path =~ /app(/.*)?",
	"target" : "RESPONSE_REQUEST",
	"filterJsonDuringEncode" : false
}
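The conditions property in these examples uses the regular expression /app(/.*)?. The following sketch uses java.util.regex to show which request paths that expression accepts. It assumes that the condition must match the entire request path; if the =~ operator instead searches for the pattern anywhere in the path, paths such as /other/app would also match. The ConditionPatternSketch class is hypothetical.

```java
import java.util.regex.Pattern;

public class ConditionPatternSketch {
    // Pattern from the configuration examples: "/app" itself or anything below "/app/".
    private static final Pattern APP = Pattern.compile("/app(/.*)?");

    // Assumption: the condition must match the whole request path (Matcher.matches()).
    static boolean isProtected(String requestPath) {
        return APP.matcher(requestPath).matches();
    }

    public static void main(String[] args) {
        String[] paths = {"/app", "/app/", "/app/users/1", "/application", "/other/app"};
        for (String path : paths) {
            // Prints true for the first three paths and false for the last two.
            System.out.println(path + " -> " + isProtected(path));
        }
    }
}
```

The optional group (/.*)? is what keeps /application out: without it, a pattern such as /app.* would also protect every path that merely begins with the letters "app".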

Using ACF configuration file for HTML filtering

ACF filter rules are described in an XML file. You can use the default configuration file bundled in the zero.acf package or you can prepare your own configuration file.

The DTD structure of the ACF configuration file

The following example shows the DTD of the ACF configuration file:

<!ELEMENT config (filter-chain, filter-rule*) >

<!ELEMENT filter-chain (filter+) >

<!ELEMENT filter EMPTY >
<!ATTLIST filter name ID #REQUIRED
class CDATA #REQUIRED
verbose-output (true|false) "false" >

<!ELEMENT filter-rule (target*) >
<!ATTLIST filter-rule id IDREF #REQUIRED >

<!ELEMENT target (rule+, disable-rule?) >
<!ATTLIST target scope CDATA "default" >

<!ELEMENT rule EMPTY >

<!ELEMENT disable-rule (disable-target+) >
<!ELEMENT disable-target EMPTY >
<!ATTLIST disable-target scope CDATA #REQUIRED >

The DTD declares the following elements and attributes:

config
The root element of ACF configuration file. It can include only one filter chain.
filter-chain
The chain of filters. Each filter, described as a child element of this element, is applied in order of appearance.
filter
This element contains the basic information for each filter. The <filter-chain> element must contain one or more <filter> elements.
filter/@name
The name of the filter. You can specify an arbitrary name for the filter.
filter/@class
The name of the class that extends the com.ibm.trl.acf.api.Filter abstract class.
filter/@verbose-output (optional)
The default value is false. If it is set to true, comments are inserted wherever the active content is removed.
filter-rule
The actual filtering rule for each filter declared in the <filter> elements in the <filter-chain>. Each filter chain is required to have at least one filter rule.
filter-rule/@id
The associated ID of the filter rule that is specified in the <filter> as filter/@name.
target
This element specifies the filtering scope. You can create this element one or more times.
target/@scope
The scope, expressed in a limited subset of XPath. When the value is "default", the rules are applied to the whole input. A valid expression can contain only the following axes and expressions: "/", "//", "[]" and "@=". For example, "/html/body" is valid but "//*[count(*)=3]" is not.
rule
Contains the parameters of the filter as custom attributes. The attributes that are defined depend upon the filter being configured.
disable-rule
Contains <disable-target> child elements that specify the scopes where the filter is not applied. Only the elements that those scopes select are excluded from filtering.
disable-target
Specifies, through its scope attribute, one scope where the filter is not applied.
disable-target/@scope
The scope where the filter is not applied. To exclude several scopes, use several <disable-target> elements.
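Because filter-rule/@id is an IDREF that must name a filter declared in the filter chain, a misspelled id is an easy mistake in a custom filterRuleFile. The following sketch checks a rule file with the JDK's DOM parser before you deploy it. It performs only a few of the structural checks implied by the DTD (a <config> root, exactly one <filter-chain>, at least one <filter> with a class attribute, and filter-rule ids that match filter names); the RuleFileCheck class is hypothetical and is not part of zero.acf.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RuleFileCheck {
    /** Returns the problems found in an ACF rule file; an empty list means every check passed. */
    static List<String> check(String xml) {
        List<String> problems = new ArrayList<>();
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            problems.add("cannot parse rule file: " + e.getMessage());
            return problems;
        }
        if (!"config".equals(doc.getDocumentElement().getTagName())) {
            problems.add("root element must be <config>");
        }
        if (doc.getElementsByTagName("filter-chain").getLength() != 1) {
            problems.add("expected exactly one <filter-chain>");
        }
        // Collect declared filter names; every filter needs a class attribute.
        Set<String> names = new HashSet<>();
        NodeList filters = doc.getElementsByTagName("filter");
        if (filters.getLength() == 0) {
            problems.add("the filter chain must declare at least one <filter>");
        }
        for (int i = 0; i < filters.getLength(); i++) {
            Element filter = (Element) filters.item(i);
            names.add(filter.getAttribute("name"));
            if (filter.getAttribute("class").isEmpty()) {
                problems.add("filter '" + filter.getAttribute("name") + "' has no class attribute");
            }
        }
        // Every filter-rule id must refer to a declared filter name.
        NodeList rules = doc.getElementsByTagName("filter-rule");
        for (int i = 0; i < rules.getLength(); i++) {
            String id = ((Element) rules.item(i)).getAttribute("id");
            if (!names.contains(id)) {
                problems.add("filter-rule id '" + id + "' does not match any filter name");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String xml = "<config><filter-chain>"
                + "<filter name=\"base\" class=\"com.ibm.trl.acf.impl.html.basefilter.BaseFilter\"/>"
                + "</filter-chain>"
                + "<filter-rule id=\"bsae\"><target scope=\"\">"
                + "<rule tag=\"script\" action=\"remove-tag\"/></target></filter-rule>"
                + "</config>";
        System.out.println(check(xml)); // reports the misspelled id "bsae"
    }
}
```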

Default filter rules

The zero.acf package provides the default filter rules shown in the following example:

	<?xml version="1.0"?>
	<config>
		<filter-chain>
			<filter name="base" class="com.ibm.trl.acf.impl.html.basefilter.BaseFilter" verbose-output="false" />
		</filter-chain>

		<filter-rule id="base">
			<target scope="">
				<rule c14n="true" all="true" />

				<rule attribute="on" attribute-criterion="starts-with" action="remove-attribute-value" />
				<rule attribute="${" attribute-criterion="starts-with" action="remove-attribute-value" />
				<rule attribute="href" value="javascript" value-criterion="starts-with" action="remove-attribute-value" />
				<rule attribute="src" value="javascript" value-criterion="starts-with" action="remove-attribute-value" />
				<rule attribute="dynsrc" value="javascript" value-criterion="starts-with" action="remove-attribute-value" />
				<rule attribute="style" value="expression" value-criterion="contains" action="remove-attribute-value" />

				<rule tag="iFrame" action="remove-tag"/>
				<rule tag="applet" action="remove-tag" />
				<rule tag="embed" action="remove-tag" />
				<rule tag="object" action="remove-tag" />
				<rule tag="script" action="remove-tag" />
				<rule tag="link" attribute="rel" value="stylesheet" value-criterion="contains" action="remove-tag" />
				<rule tag="style" action="remove-tag" />
			</target>
		</filter-rule>
	</config>

If you do not specify the filterRuleFile argument when you use the API for a request parameter, or if you do not set the filterRuleFile property in your zero.config file for the response message, ACF uses the default rules.

With the default filter rules, ACF canonicalizes each attribute value before filtering it, by performing the following steps:

  • Decodes the value based on the data URL scheme (RFC 2397)
  • Resolves entity references
  • Decodes URL encoding
  • Removes whitespace characters (tab, carriage return, and line feed)
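Canonicalization matters because the same attribute value can be written in several encoded forms, and a check on the raw text would miss the encoded variants. The following sketch applies three of the listed steps (entity resolution, URL decoding, and whitespace removal) and then performs the same "starts with javascript" test as the default href, src, and dynsrc rules. It is a simplified, hypothetical illustration, not ACF's implementation: it skips data URL decoding, recognizes only numeric character references and a handful of named entities, and lowercases the value before comparing, which is an assumption about how ACF compares values. The CanonicalizeSketch class is hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CanonicalizeSketch {
    // Hex (&#x6A;) and decimal (&#106;) character references, plus a few named entities.
    private static final Pattern ENTITY = Pattern.compile(
            "&#[xX]([0-9a-fA-F]{1,6});?|&#([0-9]{1,7});?|&(amp|lt|gt|quot|apos|colon|Tab|NewLine);");
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"",
            "apos", "'", "colon", ":", "Tab", "\t", "NewLine", "\n");

    /** Resolves entity references in a single pass, so "&amp;lt;" becomes "&lt;", not "<". */
    static String resolveEntities(String s) {
        return ENTITY.matcher(s).replaceAll(m -> {
            String text;
            if (m.group(3) != null) {
                text = NAMED.get(m.group(3));
            } else {
                int cp = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
                // Leave out-of-range references untouched.
                text = Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : m.group();
            }
            return Matcher.quoteReplacement(text);
        });
    }

    /** Decodes %XX escapes; note that URLDecoder also turns '+' into a space. */
    static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return s; // malformed %-escape: keep the value unchanged
        }
    }

    /** Removes tab, carriage return, and line feed characters. */
    static String removeWhitespace(String s) {
        return s.replaceAll("[\\t\\r\\n]", "");
    }

    static String canonicalize(String value) {
        return removeWhitespace(urlDecode(resolveEntities(value)));
    }

    /** Mirrors the default href/src/dynsrc rules: does the canonical value start with "javascript"? */
    static boolean isJavaScriptUrl(String attributeValue) {
        return canonicalize(attributeValue).toLowerCase(Locale.ROOT).startsWith("javascript");
    }

    public static void main(String[] args) {
        String[] samples = {"java&#x09;script:alert(1)", "jav%61script:alert(1)", "https://example.com/"};
        for (String s : samples) {
            System.out.println(s + " -> " + isJavaScriptUrl(s)); // true, true, false
        }
    }
}
```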

It removes the following tags:

  • "iFrame"
  • "applet"
  • "embed"
  • "object"
  • "script"
  • "style"

It also removes the following:

  • "link" tags whose "rel" attribute value contains "stylesheet"
  • the values of attributes whose names start with "on" or "${"
  • the values of "href", "src", and "dynsrc" attributes that start with "javascript"
  • the values of "style" attributes that contain "expression"

ACF assumes that the response message is UTF-8 encoded if the Content-Type response header does not specify a charset.
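The following sketch shows how an application-side component might apply the same defaulting rule when it needs to decode a response body: read the charset parameter from the Content-Type header and fall back to UTF-8 when none is given. Falling back to UTF-8 for an unknown or illegal charset name is an additional assumption of this sketch, and the ResponseCharset class is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ResponseCharset {
    /** Returns the charset named in a Content-Type header value, or UTF-8 if none is usable. */
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            // Parameters follow the media type, separated by ';', e.g. "text/html; charset=ISO-8859-1".
            String[] parts = contentType.split(";");
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = param.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        if (Charset.isSupported(name)) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalArgumentException e) {
                        // Illegal charset name: fall through to the default.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8; // default when no usable charset is specified
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html"));                     // UTF-8
        System.out.println(charsetOf("text/html; charset=ISO-8859-1")); // ISO-8859-1
    }
}
```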

Version 1.0.0.3.25591