Content Filtering Technologies Overview

Network and Internet Content Filtering (ICF) is a technology tied to controversy. The technology itself is extremely interesting and evolving rapidly, but it is the debate around how this technology should be implemented that evokes passionate argument. Be it teenagers at home or employees in the workplace, the scenarios and ethical implications are still to be unravelled.

Enex TestLab has, for many years, been involved in testing filtering technology in order to provide our many and various clients with independent technical insight. Enex takes an impartial position with regard to the non-technical discussion, divorcing itself from the ethical debate and emotion of this topic. Our energy is explicitly dedicated to testing the underlying technical claims of both critics and product vendors alike.

With the debate recently re-ignited by Internet service providers’ election to voluntarily filter child abuse material, we thought it timely to cover the various technologies they may be considering.

This review looks at the various technologies that filter vendors incorporate into their solutions.

The two most common technical claims made about Internet filters relate to their performance and their accuracy/effectiveness.

  • Performance impact on the network - critics claim significant performance impact while vendors claim low to no impact.

In truth, both can be correct. If one procures the wrong technology it will have a significant performance impact. On the other hand, certain solutions can have impressively low performance impact.

  • Accuracy/effectiveness - critics typically claim that significant over-blocking occurs when filters are introduced. Critics also claim that filters have limited effectiveness (under-blocking) in addressing content according to the administrator’s policy. Vendors, conversely, claim low-to-no over-blocking and very high levels of effectiveness in capturing content set by the administrator’s rules.

Over-blocking occurs when legitimate content is blocked; under-blocking occurs when illegitimate content is not blocked. Again, the truth of these claims depends on the requirements and expectations of the administrator, as well as their choice of technology. Typically, by tweaking and fine-tuning the configuration and policy/rule-set, most filters can achieve acceptable levels of accuracy and effectiveness. Nevertheless, most filters can be circumvented by those with the intent to do so.
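
To make these measures concrete, here is a minimal Python sketch that computes over-blocking and under-blocking rates from a labelled test set. The URLs, labels and filter verdicts are invented for illustration.

    # Each entry: (URL, true label, filter verdict). Invented test data.
    test_set = [
        ("http://example.org/news",      "legitimate", "blocked"),
        ("http://example.org/sports",    "legitimate", "allowed"),
        ("http://badsite.example/warez", "unwanted",   "blocked"),
        ("http://badsite.example/more",  "unwanted",   "allowed"),
    ]

    legitimate = [r for r in test_set if r[1] == "legitimate"]
    unwanted = [r for r in test_set if r[1] == "unwanted"]

    # Over-blocking: legitimate content the filter wrongly blocked.
    over_rate = sum(r[2] == "blocked" for r in legitimate) / len(legitimate)
    # Under-blocking: unwanted content the filter failed to block.
    under_rate = sum(r[2] == "allowed" for r in unwanted) / len(unwanted)

    print(f"over-blocking: {over_rate:.0%}, under-blocking: {under_rate:.0%}")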

Cost, as with any technology, is an influential consideration which should never be left out of any evaluation. Content filters generally fall within the scope of the enterprise’s information systems security department, but they are often delivered as a component of other products, for example unified threat management (UTM) devices.

It is difficult to undertake a typical risk analysis to determine cost versus loss for this technology, as it involves calculating metrics such as lost productivity (e.g. employees spending work time on social networking sites); inadvertent access to undesirable material; loss of employees through access to online employment sites (yes, some employers are paranoid); and even rumoured legal action by employees whose personal details were lost during internet banking transactions because company computer systems were compromised (we did say some employers were paranoid!). So, like any security technology, the cost can only be justified as a means to an end for the individual application.

The technology itself is also inherently divisive. There are two common places to locate internet filters: at the end-point on the client system, commonly known as PC-based filtering; or further upstream, either on the local network connection (gateway) or at the point of internet provision. Filters can come as client-server software solutions; as stand-alone software applications ready for installation; or as part of an appliance (either a dedicated filtering appliance or a component of a multifunction security appliance; some even come on network switch/router/gateway devices).

The decision about location comes down to a few factors. For example, a parent wishing to secure one or two systems at home may select PC filtering because it provides local and granular control over the categories they wish to filter. An enterprise security architect/administrator may find that individual filters create too much administrative overhead and therefore elect to implement filtering policies/rule-sets at the network level. ISPs often add value for their subscribers by offering content filtering in much the same way they offer e-mail filtering. In summary, each location/structure has advantages and disadvantages.

Like any technology, there are many vendors, each purporting to be number one in its class. Most of these vendors also try to be everything to everyone, blurring the lines between suitability for any one intended purpose. This is what procurement agents need to sort out. At the end of the day, content filtering boils down to a few techniques. The key technologies will be covered in this review: Request and Response Filtering, Pass-through, Pass-by, Proxy, Deep Packet Inspection, and Hybrids.

Filtering broadly falls into two categories: list-based filtering and smart filtering (using Bayesian/heuristic techniques). List-based filtering, as its name suggests, takes a pre-defined list and sorts the information passing through according to that list. The two forms of list filtering are black and white: black listing blocks everything on the list, while white listing blocks everything not on the list. The other category, smart filtering, attempts to identify content as it passes through the filter and then, according to rules established by the administrator, passes or blocks access to that content.
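
A minimal Python sketch of list-based filtering is below; the hostnames on both lists are invented for illustration.

    from urllib.parse import urlparse

    BLACK_LIST = {"badsite.example", "worse.example"}  # block everything on the list
    WHITE_LIST = {"intranet.example", "docs.example"}  # block everything NOT on the list

    def allowed_by_black_list(url: str) -> bool:
        return urlparse(url).hostname not in BLACK_LIST

    def allowed_by_white_list(url: str) -> bool:
        return urlparse(url).hostname in WHITE_LIST

    print(allowed_by_black_list("http://badsite.example/page"))  # False: listed, so blocked
    print(allowed_by_white_list("http://docs.example/manual"))   # True: listed, so allowed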

Some filter vendors also now claim to be looking at content delivery vectors outside the traditional web (HTTP), such as peer-to-peer file transfers (P2P), File Transfer Protocol (FTP) and so on. Nevertheless, others, such as the deep web and darknets, still seem to be outside their bounds. Emerging technologies will also be needed to cover vectors such as IPv6 and mobile/smartphones.

Network architects need to be aware of a few potential pitfalls when planning, evaluating and procuring content filtering systems. While deliberate human circumvention is one way to bypass filters, other techniques, such as encryption/tunnelling/encapsulation of the data at the content provider’s end, are generally outside the scope of internet content filters unless specific rule-sets are created. Engineers also need to be aware of the peak loads/traffic likely to pass across the network that will need filtering, not to mention how any changes to the network topology (logical structure) may affect other systems in place.

Testing?
We didn’t – not for this review, anyway. What makes Enex TestLab an expert on network filtering is that we have been independently testing vendors’ claims across all technologies for over twenty-two years. This review brings you the insight of that experience.

In addition, for over six years Enex TestLab has been specifically testing both PC-based and server-based filters. These projects include public reviews such as this one; the NetAlert work and report in 2005/2006; the Australian Communications and Media Authority (ACMA) closed-environment trial in 2007/2008; the Department of Communications, Information Technology and the Arts (DCITA) Protecting Australian Families Online (PAFO) PC-based filter program in 2007/2008; the recent Department of Broadband, Communications and the Digital Economy (DBCDE) live filter trial in 2008/2009; and ongoing work over the last six years with the Internet Industry Association (IIA) and their Family Friendly Filter program.

Our colleagues at Nullabor Consulting, Lateral Plains and the University of Ballarat kindly contributed to Enex TestLab’s ICF work by researching and documenting the common types of filtering technologies available.

Filtering Technologies

This review provides a basic overview of the main elements of common filtering technologies. Most filtering systems are combinations or variants of these elements.

In this review, client refers to a person making a request to a remote server to obtain some information. Unwanted material is material that a filter should prevent the client from obtaining. Material is blocked if the filter prevents the client obtaining the material. Over-blocking occurs when a filter blocks material that should not have been blocked. Under-blocking occurs when a filter fails to block material that should have been blocked.

Request filtering

Request filtering occurs when requests coming from a client are inspected, and action is taken if the client appears to be requesting material that the filter should block.

Relatively few requests may generate very large amounts of data in response: in general there are far fewer request packets to inspect than response packets, so request filtering requires less computing power.

The disadvantage of request filtering lies in the assumption that a particular request will always result in the same response. It may be that a previously “innocent” request now receives unwanted material in response. It can also happen that material which would have been returned in response to a filtered request no longer warrants filtering.
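
Because requests are couched in well-defined protocols, they are cheap to parse. As a simple illustration, the Python sketch below applies request filtering to the Host header of a raw HTTP request; the blocked hostname is invented.

    BLOCKED_HOSTS = {"badsite.example"}  # invented policy list

    def block_request(raw_request: bytes) -> bool:
        """Return True if the request should be blocked."""
        for line in raw_request.split(b"\r\n")[1:]:
            if line.lower().startswith(b"host:"):
                # "Host: example.com:8080" -> "example.com"
                host = line.split(b":", 1)[1].strip().decode().split(":")[0]
                return host in BLOCKED_HOSTS
        return False  # no Host header seen; let the request pass

    request = b"GET /page HTTP/1.1\r\nHost: badsite.example\r\n\r\n"
    print(block_request(request))  # True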

positive

  • Requests are always couched in well-defined protocols and are very well suited to automated inspection.
  • A good basis for many “smart” filters.

negative

  • Request filtering relies on the assumption that a particular request will always result in the same response.

overall

  • If one needs to be thorough in their filtering, a request-based filter should be considered.

Response filtering

Response filtering occurs when the material being sent back to a client is filtered. As noted above, response data is generally far greater in volume, and therein lies the main disadvantage - it requires far greater computing power. Responses may also be video, images, music, speech, text or combinations of all of these, and so are not always amenable to automated inspection.

The advantage of response filtering is that the filter receives the actual material being sent to the client, regardless of where the data is coming from or what request was made. This advantage is somewhat illusory, however, because with current technology it is essentially impossible to determine through automated inspection of the data itself whether data is unwanted or not. For this reason, response filtering is relatively rarely employed.
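
The simplest form of response filtering is a keyword scan of the response body, sketched in Python below. It also shows why under- and over-blocking are hard to avoid: a term can appear in wanted and unwanted pages alike. The terms are invented for illustration.

    BLOCKED_TERMS = [b"forbidden-term", b"another-term"]  # invented policy list

    def block_response(body: bytes) -> bool:
        """Return True if the response body should be blocked."""
        lowered = body.lower()
        return any(term in lowered for term in BLOCKED_TERMS)

    print(block_response(b"A perfectly innocuous page"))  # False
    print(block_response(b"Page with a FORBIDDEN-TERM"))  # True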

positive

  • Inspects all content received.

negative

  • Requires significant computing power.
  • Can be subject to significant levels of under- and over-blocking.

overall

  • One of the methods used by “smart” filter vendors, but it is generally hit-and-miss and can lead to significant performance issues.

Pass-through

A pass-through filter is one which is positioned “in line” - that is, all data flowing between the client and the network flows through the filtering device.

The advantage of a pass-through filter is simplicity. The disadvantage is speed: because every single packet must be processed and inspected at wire speed, pass-through filtering requires a lot of computing power.
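
A minimal in-line relay illustrates the idea: every byte between client and server crosses the filter process, which is the source of both its control and its cost. The Python sketch below handles one plain-HTTP request at a time; the upstream address and blacklist are invented, and a real device would do this in optimised hardware or an event loop.

    import socket

    LISTEN = ("0.0.0.0", 8080)           # where clients connect
    UPSTREAM = ("upstream.example", 80)  # invented upstream server
    BLOCKED_HOSTS = {"badsite.example"}  # invented policy list

    def is_blocked(data: bytes) -> bool:
        return any(host.encode() in data for host in BLOCKED_HOSTS)

    with socket.create_server(LISTEN) as server:
        while True:
            client, _ = server.accept()
            with client:
                request = client.recv(65536)
                if is_blocked(request):
                    client.sendall(b"HTTP/1.1 403 Forbidden\r\n\r\n")
                    continue
                with socket.create_connection(UPSTREAM) as upstream:
                    upstream.sendall(request)             # forward the request
                    client.sendall(upstream.recv(65536))  # relay the response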

positive

  • Simple to manage
  • Provides good levels of control

negative

  • Can be a single point of failure in a network unless architected correctly.
  • Can be a performance bottleneck

overall

  • Good solution for an enterprise which needs higher levels of control over their network content. Be wary of the potential performance issues and costs.

Pass-by

A pass-by filter is one where the filtering device is not “in line”, but instead is placed “beside” the main data flow.

There are two main types of pass-by filter - mirroring/monitoring and hybrid (more about Hybrid Pass-By later).

A mirroring/monitoring pass-by filter receives a duplicate of some or all packets passing through a router or switch, and monitors those packets. If it detects unwanted material, it performs some action to block or otherwise disrupt the particular request. As with a pass-through filter, a mirroring/monitoring pass-by filter must be able to inspect all packets at wire speed.

The advantage of a pass-by filter is that the tasks of traffic forwarding and traffic filtering can be split, with each device optimised to do one task well. The potential for filter failure to impact on traffic flow is also minimised. The disadvantage is in added complexity.
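
The Python sketch below illustrates the mirroring/monitoring approach using the scapy packet library (real deployments use dedicated hardware; this is purely illustrative). Run as root on a machine attached to a switch mirror/SPAN port, it watches copies of port-80 packets and forges a TCP reset at the client when it sees a blacklisted hostname. The hostname is invented.

    # pip install scapy
    from scapy.all import IP, TCP, Raw, send, sniff

    BLOCKED = b"badsite.example"  # invented policy entry

    def inspect(pkt):
        if pkt.haslayer(Raw) and BLOCKED in pkt[Raw].load:
            # Disrupt the flow by forging a reset back to the client.
            reset = IP(src=pkt[IP].dst, dst=pkt[IP].src) / TCP(
                sport=pkt[TCP].dport, dport=pkt[TCP].sport,
                flags="R", seq=pkt[TCP].ack)
            send(reset, verbose=False)

    # Monitors mirrored copies of the traffic; never sits in the data path.
    sniff(filter="tcp port 80", prn=inspect, store=False)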

positive

  • Overcomes some performance limitations of other technologies.
  • High levels of load balancing and redundancy

negative

  • Quite complex to configure
  • Needs to be deployed correctly to allow for maximum load, or it could become a bottleneck in high-volume networks.

overall

  • Good solution for network owners who need to maintain performance.

Hybrid pass-by

In a hybrid pass-by filter one system makes a swift but crude determination about a data flow, either sending it on unfiltered or diverting it to a second stage. The second stage then inspects the data stream more closely and takes disruptive action if an unwanted data stream is seen. The first stage may be a simple header inspection, diverting data streams that involve particular IP addresses, for example.

The advantage of a hybrid system lies in the fact that in most situations the vast majority of traffic is unremarkable and does not need to be blocked. Only a small proportion needs to be forwarded to the second stage, and the second stage does not need to be able to handle wire speed. A hybrid system is less effective where very large data flows must be diverted, or in situations where there is no swift and simple way to distinguish between data flows that need filtering and those that do not.

The first stage can inspect something other than the data stream itself. For example, in a technique known as DNS poisoning, DNS lookups are inspected. If the client was seeking a possibly unwanted web page, the request would receive a faked response that diverted its connection attempt to a proxy server. The proxy server would perform a more detailed check, and only fetch web pages that were not on a blacklist of unwanted pages.
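
The two-stage logic can be sketched in a few lines of Python. Stage one is a crude, fast test (here a suspect-IP set, standing in for header inspection or DNS diversion); only diverted flows pay for the precise but slower stage-two check. All addresses and URLs are invented for illustration.

    SUSPECT_IPS = {"192.0.2.10", "192.0.2.11"}        # stage one: crude and fast
    URL_BLACKLIST = {"http://badsite.example/warez"}  # stage two: precise but slow

    def stage_one(dst_ip: str) -> bool:
        """Fast path: divert only flows towards suspect addresses."""
        return dst_ip in SUSPECT_IPS

    def stage_two(url: str) -> bool:
        """Slow path: a detailed check, e.g. an exact URL blacklist lookup."""
        return url in URL_BLACKLIST

    def filter_flow(dst_ip: str, url: str) -> str:
        if not stage_one(dst_ip):
            return "pass"  # the vast majority of traffic takes this path
        return "block" if stage_two(url) else "pass"

    print(filter_flow("198.51.100.1", "http://example.org/"))         # pass
    print(filter_flow("192.0.2.10", "http://badsite.example/warez"))  # block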

Hybrid pass-by has a disadvantage in that it produces an externally visible topology change: if using a proxy, for example, all diverted requests appear to come from the proxy. That can be a bad thing - witness what happened to Wikipedia in the UK when BT started sending all Wikipedia requests via a small number of proxies (and therefore a small IP pool). Wikipedia thought it was under attack because so many update/change requests were coming from what seemed a limited number of systems, and its automated safeguards kicked in, blocking access from those IP addresses. While not a direct fault of the filter itself, the implications of network topology changes effected by filtering need to be considered.

Because proxy servers are typically able to filter only one protocol, they are almost always used as part of a hybrid system - a simple determination is made in the first stage, diverting all (or possibly only suspect) requests using a given protocol to an associated proxy server. All other protocols are passed directly.

positive

  • Low performance impact.

negative

  • Can be quite complex to set up and configure correctly.

overall

  • Good solution for network owners who need to maintain performance, providing it is provisioned correctly.

Proxy

A proxy is a system that mediates between a client and a server. Instead of connecting directly to the server, the client connects to the proxy. The proxy then makes a connection to the server on behalf of the client. This means that all communication between the client and the server passes through the proxy and can be inspected.

A proxy can typically handle only one or two protocols, and then only at the level of a particular application (typically, OSI Layer 7). Proxies are typically implemented in software rather than in hardware, so they are, in general, relatively slow. The best-known proxy in common use is the web proxy, which was originally developed to cache responses and thus save time and bandwidth.
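
The Python sketch below is a minimal filtering web proxy built on the standard library. It handles plain-HTTP GETs only (no CONNECT/TLS, no caching), refuses blacklisted hosts and fetches everything else on the client’s behalf; the port and blacklist entry are invented for illustration.

    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
    from urllib.parse import urlparse
    from urllib.request import urlopen

    BLOCKED_HOSTS = {"badsite.example"}  # invented policy list

    class FilteringProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            # A proxy receives the full URL in the request line.
            if urlparse(self.path).hostname in BLOCKED_HOSTS:
                self.send_error(403, "Blocked by policy")
                return
            with urlopen(self.path) as upstream:  # fetch on the client's behalf
                self.send_response(upstream.status)
                self.send_header("Content-Type",
                                 upstream.headers.get("Content-Type", "text/html"))
                self.end_headers()
                self.wfile.write(upstream.read())

    # Point a browser's proxy setting at port 8888 to try it.
    ThreadingHTTPServer(("", 8888), FilteringProxy).serve_forever()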

positive

  • Proxies are a technology well known to network engineers.

negative

  • Limited in protocols handled.

overall

  • Solid networking knowledge and experience is required to set up and configure a proxy server as a filter.

Deep packet inspection (DPI)

A packet being transmitted across the internet is composed of header information and a payload. The header is used to direct the packet from its source to the correct destination, analogously to the information written on the outside of an envelope. The payload is the actual content of the packet.

Inspecting the headers of packets can be done very rapidly, because all headers are the same (for a given IP protocol) and the format of the headers is well known. With deep packet inspection (DPI) the content of the packets is inspected, either as well as, or instead of, the headers. This is considerably more difficult, as relevant information may be spread out across many packets, and the format of the payload differs depending on which application is sending the information. The kind of information sought, and the tests to be done on it, are generally more complex than the simple tests done on a packet header, so DPI requires much more computing power than simple header inspection. As with pass-through, accuracy often comes down to how well the vendor has written their code to parse the information and detect content that should be blocked.
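
Much of the difficulty is that a signature may straddle packet boundaries, so the inspector must keep per-flow state. The Python sketch below shows this in miniature; the signature is invented for illustration.

    SIGNATURE = b"forbidden-content"  # invented signature

    class FlowInspector:
        def __init__(self):
            self.tail = b""  # carry-over so split signatures are still found

        def feed(self, payload: bytes) -> bool:
            """Return True if the signature appears in the reassembled stream."""
            data = self.tail + payload
            hit = SIGNATURE in data
            self.tail = data[-(len(SIGNATURE) - 1):]  # keep just enough overlap
            return hit

    flow = FlowInspector()
    print(flow.feed(b"...forbidden-"))  # False: signature not complete yet
    print(flow.feed(b"content..."))     # True: completed across the boundary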

positive

  • Very thorough.

negative

  • Usually leads to significant performance impact.

overall

  • DPI is fragile and costly - filters for each protocol to be inspected must be independently developed, and must keep pace with changes in that protocol.

Last words

Filtering content is not necessarily the straightforward technical task many would suggest. There is a minefield of techniques and deployment possibilities, each of which has unique benefits and tradeoffs (and this is before the filter vendors themselves get involved in the equation).

If you are tasked with evaluating and procuring a filtering solution for your enterprise, narrow the field according to the business’s requirements for the filter. Work with your network architects and engineers to gain an idea of current traffic flows, and then consider the options available. Bear in mind the key considerations of accuracy/effectiveness and performance. Above all, ensure that your enterprise has a comprehensive internet/network acceptable usage policy and that it is regularly read, understood and accepted by all staff. One simple way to ensure this is a popup stating the policy that requires a check box to be ticked every time an employee logs on to their terminal.
