Security research, news and guidance

Output Validation using the OWASP ESAPI

October 29, 2009  |  Written by Security Ninja  |   Application Security   |   4 Comments

Hi everyone,

It is time for me to publish the second post from our Principles of Secure Development to OWASP ESAPI mapping series. We will be looking at Output Validation and how the OWASP ESAPI can be used to implement this principle. I have taken on board the feedback from the previous post and given more detailed examples of how you can implement the ESAPI within your environment.

In the previous post we explained why it was important to validate all input into your application and how you could do this using the ESAPI. If you are familiar with the Principles of Secure Development you will know that data validation is a two-way process: your application should validate the data it receives through its input points and any data it returns through its output points.

Some attacks such as Cross Site Scripting can take advantage of poorly validated output to attack unsuspecting end users through your application. There are three main issues associated with output validation that you should always aim to address in your application; they are data encoding, data format and data length.

The data encoding process is slightly different depending on where your output is going to end up. We need to understand what data our application will return and precisely where it is going to end up. The approach taken to identify the data and its destination is similar to the one we took when we were implementing Input Validation; we need to define the data types our application will output as well as its destination.

We have included a mockup of our example application below which we will use to help us define our data types and output validation requirements.

NinjaSearch

The Ninja Search page is straightforward with regard to the data it will receive; there is only one input point, as you can see in the mockup above.

When a search is performed we will have two output points to validate:

NinjaSearchResults

We have two data types on the search results page which we need to validate correctly. The two data types have different destinations and have different output validation requirements.

Before we detail the issues we face we should first define the data types we need to output. We have included them in the table below:

Data Type      Content                                Max Length   Destination
Search Query   a-zA-Z0-9!£$%^&*()-_=+<>,./?\|@#~:;    200          URL Parameter
Search Query   a-zA-Z0-9!£$%^&*()-_=+<>,./?\|@#~:;    200          HTML of Results Page

You can see that the Search Query data type is allowed to use potentially dangerous characters such as < and > meaning that we cannot simply remove or replace these characters – we must validate the output and apply the appropriate encoding to prevent attacks such as Cross Site Scripting.

We need help with validating output and it sure would be nice to return a nicely sanitised error message to the user if potentially malicious data is entered into our application. The ESAPI can help us with all of these issues.

The ESAPI modules are shown in the image below. We have highlighted the modules we will use for our output validation requirements.

InputVALESAPI

So now that we know the modules we will be using, let's dive into the details of each one and see how we can use it to secure our application.

ESAPI Encoder

The ESAPI Encoder module provides you with a set of methods which allow you to correctly decode data being received by your application and encode the data your application will output. This module will allow us to address the main issue we outlined earlier – validation of output. To correctly validate data that is returned by Ninja Search we need to apply the appropriate encoding, which is determined by the destination field in the data type definition table.

Before we get into the details of the methods we can use within the ESAPI we should explore the two destinations more closely. The destination field for our data types tells us that we will be writing the search query into a URL parameter and into the HTML of the search results page. This means we have to apply two different types of encoding to correctly validate the output from Ninja Search. Both are explained below.

URL Encoding

URL Encoding (otherwise known as Percent Encoding) is used to encode data that will be contained within a URL. RFC 3986 defines the characters that are allowed in URLs as either Reserved or Unreserved. Reserved characters are those which can have a special meaning in a URL; the RFC states that, amongst other things, they are used as delimiters.

You will often see Reserved characters such as = ? and & used as delimiters in URLs.

We obviously need to be able to use these characters in some circumstances without them being treated in this special way. This is where URL encoding comes in: if we want to use any of the Reserved characters as data in our URL, we must URL encode them.

If you visit Google and search for 1=1/ which contains two Reserved characters you will see that they are URL encoded in the URL:

Google URL Encoding

The Unreserved character set is much easier to understand. It contains upper and lower case A-Z, 0-9 and the characters - _ . ~ and none of these require any type of encoding.

Other characters such as < > { | and ^ are neither Reserved nor Unreserved. RFC 3986 does not allow them to appear literally in a URL, so they must always be percent-encoded.
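As a small illustration of percent-encoding, the sketch below runs the Google example through the JDK's java.net.URLEncoder. Bear in mind that URLEncoder implements HTML form encoding (application/x-www-form-urlencoded) rather than RFC 3986 exactly, so it turns spaces into + and also encodes ~. The class name UrlEncodingDemo is just a name chosen for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncodingDemo {

    // Percent-encodes a value so it can be placed in a URL query parameter.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // '=' and '/' are Reserved characters, so both are percent-encoded.
        System.out.println(encode("1=1/"));        // 1%3D1%2F
        // Letters, digits, '-', '_' and '.' pass through unchanged.
        System.out.println(encode("Ninja-1_a.b")); // Ninja-1_a.b
        // URLEncoder also percent-encodes '<' and '>'.
        System.out.println(encode("<ninja>"));     // %3Cninja%3E
    }
}
```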

HTML Encoding

In my opinion HTML Encoding is much simpler than URL Encoding. We use HTML Encoding to represent characters which may have a special meaning as literal characters, replacing each potentially dangerous character with a character entity reference, which is a symbolic name for that character. The most common character entity references are shown below (HTML 4 defines around 250 of them):

"&lt;" represents the < sign.

"&gt;" represents the > sign.

"&amp;" represents the & sign.

"&quot;" represents the " mark.

You can see that a Google search for <> uses HTML Encoding to display the < and > characters to the user:

Google HTML Encoding
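To make the substitution concrete, here is a minimal sketch that replaces only the four characters listed above with their entity references. HtmlEntityDemo and escapeHtml are names made up for this example and are not part of the ESAPI.

```java
final class HtmlEntityDemo {

    // Replaces <, >, & and " with their character entity references.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '<': out.append("&lt;");   break;
                case '>': out.append("&gt;");   break;
                case '&': out.append("&amp;");  break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("<>")); // &lt;&gt;
        // Prints: &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
        System.out.println(escapeHtml("<script>alert(\"x\")</script>"));
    }
}
```

A four-entry table like this is only meant to show the idea. It leaves the single quote and many other characters untouched, which is one reason to rely on a maintained encoder rather than a hand-rolled one.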

So we now know our data types, the destination for each data type and the two types of encoding we will use – let's see how the OWASP ESAPI Encoder can help us!

If a Ninja Search user enters some potentially malicious input we need to ensure that we correctly encode it because we know this will end up in our URL and results page HTML.

To correctly encode the output that will end up in our URL parameter we will use the ESAPI encodeForURL method. I have included the reference implementation of the method below:

Code Example1

This method takes a string and encodes it for use in a URL. I have to admit that I found the encoder part of the ESAPI more difficult to understand than the Validator that we focused on in the last blog post. I did some digging and I will explain my understanding of it below:

Code Example2

The encodeForURL method will take a string and encode it to be used in a URL.

Code Example3

We will use the Sun URLEncoder to perform the encoding for us here. We need to retrieve a value from the ESAPI Security Configuration, specifically the character encoding type we will be using (taken from SecurityConfiguration.java):

Code Example4

If we run into any exceptions we will handle them with the two EncodingExceptions as seen below:

Code Example5

The method will either return a URL-encoded string or throw an exception.
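The steps described above can be sketched as a small self-contained class. This is an illustration of the flow rather than the ESAPI source: UrlEncoderSketch, its nested EncodingException and the CHARACTER_ENCODING constant are stand-ins invented for this example, with the constant playing the role of the value read from the ESAPI Security Configuration. In an application that uses the ESAPI the equivalent call would be ESAPI.encoder().encodeForURL(input).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncoderSketch {

    // Stand-in for ESAPI's EncodingException (hypothetical, for this sketch only).
    static final class EncodingException extends Exception {
        EncodingException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Stand-in for the character encoding held in the security configuration.
    static final String CHARACTER_ENCODING = "UTF-8";

    static String encodeForURL(String input) throws EncodingException {
        return encodeForURL(input, CHARACTER_ENCODING);
    }

    // Returns the input URL encoded, or throws EncodingException on failure.
    static String encodeForURL(String input, String characterEncoding)
            throws EncodingException {
        if (input == null) {
            return null;
        }
        try {
            return URLEncoder.encode(input, characterEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new EncodingException("Character encoding not supported", e);
        } catch (RuntimeException e) {
            throw new EncodingException("Problem URL encoding input", e);
        }
    }
}
```

Wrapping both failure cases in a single exception type means the caller has only one thing to catch, whatever went wrong inside the encoder.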

By performing URL Encoding we have addressed the risk of malicious characters being entered into the URL parameter.

We still have to perform HTML Encoding for the second output point in our application though. We will use a different method for the HTML Encoding and a slightly different approach which can be seen in the reference implementation below:

Code Example6

I think the first four lines are easy to understand so I will jump in at line 5 of this example:

Code Example7

We are encoding characters one at a time with the HTML Encoder instead of a whole string at once. In the code above c is the specific character we are encoding. If the character is \t (tab), \n (newline) or \r (return) we won’t encode it.

Code Example8

In the next section of the code we check whether the input is in the standard ASCII printable characters range. The printable range is 0x20 (space) through to 0x7E (~). If the characters are outside of that range we will log an alert.

Code Example9

The final piece of the code performs the HTML encoding for us. You can see that we are using something called the htmlCodec, which I haven’t explained yet. The Encoder module of the ESAPI has 9 codecs that can be used for data encoding, and we are going to use the htmlCodec to encode our output. I have included some code from the reference implementation that shows how we make use of the codecs:

Code Example10

Code Example11

In short, the htmlCodec contains the character entity references we discussed earlier. You can see a few of them below:

Code Example12

The other two items you can see after htmlCodec, CHAR_ALPHANUMERICS and IMMUNE_HTML, refer to characters which shouldn’t be encoded. CHAR_ALPHANUMERICS is a-zA-Z0-9 and the IMMUNE_HTML character set is shown below:

Code Example13
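Putting the walkthrough together, the following is a rough, self-contained sketch of a character-by-character HTML encoder. It is not the ESAPI reference implementation: the class name HtmlEncoderSketch, the four-entry entity table, the contents of IMMUNE_HTML, the choice to substitute a space for characters outside the printable range and the numeric fallback are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

final class HtmlEncoderSketch {

    private static final Logger LOGGER = Logger.getLogger("HtmlEncoderSketch");

    // Assumed set of punctuation that is safe to leave unencoded in HTML.
    private static final String IMMUNE_HTML = ",.-_ ";

    // A few character entity references; a real codec holds many more.
    private static final Map<Character, String> ENTITIES = new HashMap<Character, String>();
    static {
        ENTITIES.put('<', "&lt;");
        ENTITIES.put('>', "&gt;");
        ENTITIES.put('&', "&amp;");
        ENTITIES.put('"', "&quot;");
    }

    static String encodeForHTML(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\t' || c == '\n' || c == '\r') {
                sb.append(c); // tab, newline and return are left alone
            } else if (c < 0x20 || c > 0x7E) {
                // Outside the printable ASCII range: log it and substitute a space.
                LOGGER.warning("Non-printable character in HTML output: " + (int) c);
                sb.append(' ');
            } else {
                sb.append(encodeCharacter(c));
            }
        }
        return sb.toString();
    }

    private static String encodeCharacter(char c) {
        if (Character.isLetterOrDigit(c) || IMMUNE_HTML.indexOf(c) >= 0) {
            return String.valueOf(c); // alphanumeric or immune: no encoding
        }
        String entity = ENTITIES.get(c);
        // Fall back to a decimal numeric reference when no named entity is known.
        return entity != null ? entity : "&#" + (int) c + ";";
    }
}
```

With this sketch, the input <b>Ninja & Co.</b> comes out as &lt;b&gt;Ninja &amp; Co.&lt;&#47;b&gt; because the forward slash has no entry in the small entity table and falls back to a numeric reference.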

ESAPI Validator

We must also ensure that we validate the content and length of the output by using the ESAPI Validator module. We covered the Validator in depth in the first blog post in this series so we won’t cover it again here.
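As a rough idea of what checking the content and length of the Search Query involves, the sketch below applies a whitelist pattern and the 200 character limit from the data type table using plain java.util.regex. It is not the ESAPI Validator: SearchQueryCheck is a name made up for this example, and its pattern is a hand transcription of the Content column, so treat it as an assumption.

```java
import java.util.regex.Pattern;

final class SearchQueryCheck {

    static final int MAX_LENGTH = 200;

    // Whitelist transcribed from the Content column; \u00A3 is the pound sign.
    private static final Pattern ALLOWED = Pattern.compile(
            "[a-zA-Z0-9!\u00A3$%^&*()\\-_=+<>,./?\\\\|@#~:;]+");

    // True when the value is non-empty, within the length limit and whitelisted.
    static boolean isValid(String value) {
        return value != null
                && value.length() <= MAX_LENGTH
                && ALLOWED.matcher(value).matches();
    }
}
```

Note that a value such as <script> passes this check because < and > are allowed characters for the Search Query, which is exactly why the encoding step is still needed on output.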

ESAPI Exception Handling and Logger

The ESAPI Logger module provides you with a set of methods which will allow you to easily log any exceptions that occur in your application.

We won’t be covering the exception handling and logging capabilities of the ESAPI here because they will be covered in the Error Handling principle/ESAPI mapping post. We have seen in our examples, though, that there are some potential exceptions to deal with.

I would love to go into more details about this module here but I would have nothing left to say in the Error Handling post if I did!

Error Handling

The Error Handling principle will be mapped to the ESAPI in an upcoming blog post so stay tuned to Security Ninja through our RSS feed and Twitter.

References

RFC 3986 Uniform Resource Identifier (URI): Generic Syntax

HTML Document Representation

Character entity references in HTML 4

Character encodings in HTML

I hope you have found this blog post useful, please do contact me with any feedback you have!

SN

This entry was posted on October 29, 2009 at 10:21 am and is filed under Application Security.

4 comments

  1. Neil Matatall says:

    I like the cartoonish prototypes :) but not as much as I like esapi

  2. Pingback: Security Ninja’s Output Validation Post « Supply Chain Technology

  3. Pingback: Interesting Information Security Bits for 11/02/2009 | Infosec Ramblings

  4. JurJar says:

    I did not understand the URL Encoding part. Is it not the case that the browser handles this when you submit a form? The only case is when you construct the GET URL, but is that relevant to the given example?
