Input Validation

This principle is certainly not a silver bullet, but if you ensure that all of the data received and processed by your application is sufficiently validated, you can go a long way towards preventing many of the common vulnerabilities being actively exploited by malicious users. It is important to understand what data your application should accept, what its syntax should be, and what its minimum and maximum lengths are. This information will allow you to define a set of “known good” values for every entry point through which externally supplied data could enter.

There are two main approaches to input validation: whitelisting and blacklisting. It would be wrong to suggest that either approach is always the right answer, but it is widely accepted that validating inputs against whitelists is the more secure option. A whitelist defines what data your application should accept at a given input point; in short, you define a set of “known good inputs”. A blacklist does the opposite by defining a set of “known bad inputs”, which requires the developer to understand a wide range of potentially malicious inputs.

A simple regular expression used for whitelisting a credit card number input is shown below:

^\d{12,16}$

This ensures that any data received at this input point consists only of digits (\d = 0-9), with a minimum length of 12 and a maximum of 16 ({12,16}). In a Java string literal the backslash must be doubled, so the same expression is written as "^\\d{12,16}$". Although this is a simple example, it demonstrates the power of whitelist validation: because only digits are accepted, the input point rejects the many common attack payloads that rely on other characters.
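A minimal sketch of how this expression might be applied in Java. The class name `CardNumberValidator` and the method `isValid` are hypothetical and chosen only for illustration. The sketch assumes that a `null` input should be treated as invalid, and it uses `Matcher.matches()`, which only succeeds when the whole input matches the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from any library.
final class CardNumberValidator {
    // Whitelist: 12 to 16 ASCII digits and nothing else.
    // The backslash is doubled because this is a Java string literal.
    private static final Pattern CARD_NUMBER = Pattern.compile("^\\d{12,16}$");

    private CardNumberValidator() {}

    // Returns true only when the entire input is 12 to 16 digits.
    // Assumption: null is treated as invalid rather than throwing.
    static boolean isValid(String input) {
        return input != null && CARD_NUMBER.matcher(input).matches();
    }
}
```

With this check in place, a value such as `4111' OR '1'='1` is rejected outright because it contains characters outside the whitelist, so no attempt is made to repair it.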

The blacklisting approach tries to identify potentially malicious inputs and then replace or remove them. The example below searches the data received through an input point and replaces every single quote with a double quote.

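// Requires java.util.regex.Pattern and java.util.regex.Matcher.
// Strings are immutable, so the result must be assigned back to s.
s = s.replaceAll(Pattern.quote("'"),
        Matcher.quoteReplacement("\""));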

The blacklisting approach is often avoided where possible because it only protects against threats the developer could think of at the time of its creation. This means the blacklist might miss new attack vectors and have higher maintenance costs when compared to a whitelist.
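The limitation can be illustrated with a short sketch that places the quote-replacing blacklist next to the digit-only whitelist. The class `BlacklistVsWhitelist` and both method names are hypothetical, and the sample payload is only an assumed example of an input that contains no single quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class; the names are illustrative only.
final class BlacklistVsWhitelist {
    // Whitelist for the credit card example: 12 to 16 digits.
    private static final Pattern DIGITS_ONLY = Pattern.compile("\\d{12,16}");

    private BlacklistVsWhitelist() {}

    // Blacklist: replace every single quote with a double quote.
    // Anything the author did not anticipate passes through unchanged.
    static String blacklistFilter(String s) {
        return s.replaceAll(Pattern.quote("'"), Matcher.quoteReplacement("\""));
    }

    // Whitelist: accept only input that matches the known good format.
    static boolean whitelistAccepts(String s) {
        return s != null && DIGITS_ONLY.matcher(s).matches();
    }
}
```

Given the input `1 OR 1=1`, `blacklistFilter` returns the string unchanged because it contains no single quote, while `whitelistAccepts` returns `false` because the input is not 12 to 16 digits.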

Input Validation best practices:

  • Apply whitelists (known good values) where possible.
  • Canonicalise all inputs. This means reducing the data received to its simplest form before validating it. If the validation function only searches for UTF-8 input, an attacker could use another encoding method, such as UTF-16, to encode the malicious characters and bypass the validation function.
  • Check the content (e.g. 0-9), the minimum and maximum lengths, and the correct syntax of all inputs.
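The practices above can be combined in a short sketch that canonicalises first and validates second. The class `CanonicalisingValidator` and its methods are hypothetical. The sketch uses Unicode normalisation (form NFKC, via the standard `java.text.Normalizer`) as one concrete instance of canonicalisation; it does not handle the byte-level case of UTF-8 versus UTF-16, which is assumed to have been dealt with earlier by decoding the raw bytes with an explicit character set.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class CanonicalisingValidator {
    // Whitelist applied after canonicalisation: 12 to 16 ASCII digits.
    private static final Pattern DIGITS = Pattern.compile("\\d{12,16}");

    private CanonicalisingValidator() {}

    // Reduce the input to a single canonical form. NFKC folds
    // compatibility characters (for example fullwidth digits)
    // into their plain equivalents.
    static String canonicalise(String raw) {
        return Normalizer.normalize(raw, Normalizer.Form.NFKC);
    }

    // Canonicalise first, then check content and length in one step.
    // Assumption: null is treated as invalid.
    static boolean isValid(String raw) {
        if (raw == null) {
            return false;
        }
        return DIGITS.matcher(canonicalise(raw)).matches();
    }
}
```

The order matters: validation runs against the canonical form, so two inputs that differ only in representation are judged identically. Whether alternative representations should be accepted after folding or rejected outright is a design decision for the application.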
