WebCleaner is featured in Linux Magazine, issue 43. The article is available for download as a PDF.
Download the latest packages from the WebCleaner download section. MD5 checksums for these files are also available.
Requirements and installation instructions are in the install documentation. To see what has changed between releases, look at the ChangeLog.
Screenshots: proxy configuration, filter configuration.
The first feature that sets WebCleaner apart from other proxies is exact HTML filtering, which removes a lot of advertising. The filter does not just replace strings; the proxy parses all HTML data. The parser is fast (written in C) and copes with broken HTML pages: if it does not recognize an HTML structure, it passes the data through to the proxy unchanged until it recognizes a tag again. No valid HTML data is ever discarded or dropped.
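The difference between string replacement and parser-based filtering can be illustrated with a small sketch. It uses Python's standard html.parser module, not WebCleaner's own C parser, and the AD_PATTERN rule and the filter_html helper are hypothetical names chosen for this example only.

```python
import re
from html.parser import HTMLParser

# Hypothetical ad rule for illustration; WebCleaner's real rules live in
# its XML filter configuration.
AD_PATTERN = re.compile(r"/ads?/|banner", re.IGNORECASE)


class AdImageFilter(HTMLParser):
    """Re-emit parsed HTML, dropping <img> tags whose src looks like an ad."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src") or ""
        if tag == "img" and AD_PATTERN.search(src):
            return  # drop the whole tag, not just a substring
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> get the same treatment.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)  # text is passed through untouched

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def filter_html(html):
    parser = AdImageFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = '<p>Hi <img src="/ads/x.gif"> there <img src="logo.png"></p>'
print(filter_html(page))  # <p>Hi  there <img src="logo.png"></p>
```

Because the rule is applied to a parsed tag and its attributes, ordinary text that happens to contain the word "banner" is left alone, which a plain string replacement could not guarantee.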
Another feature is JavaScript filtering: JavaScript code is executed in the integrated SpiderMonkey JavaScript engine, which is also used by the Mozilla browser suite. This eliminates JavaScript obfuscation, popups, and content injected through document.write(), while other JavaScript functions still work as usual.
Exact HTML filtering has another useful side effect: it makes it possible to detect and prevent known security flaws in HTML processors. Several known buffer overflow exploits and Denial of Service attacks are detected and neutralized by the HtmlSecurity class.
Furthermore, WebCleaner can filter SSL traffic used in https:// URLs. See the SSL gateway documentation for more info.
Assuming your proxy runs on port 8080, point your browser to http://localhost:8080/ to configure the proxy. The underlying configuration format is a custom XML format which is explained in config/filter.dtd and config/webcleaner.dtd.
Please note that the web configuration interface needs write permissions in the configuration directory.
On Unix-like systems, the proxy is supervised and automatically (re-)started by the runit package. See the runit homepage for more information.
On Windows, the proxy is a normal NT service and can be started and stopped from the "Administrative Tasks" entry in the system configuration.
To allow hosts other than the one the proxy is running on to use it, you have to edit the allowed host list in the configuration interface.
For example, to allow access from your local network at 192.168.1.*, you would add 192.168.1.0/24 to the allowed host list.
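In standard CIDR notation, the range 192.168.1.* is written 192.168.1.0/24, whereas a /8 prefix covers every address starting with 192. The following sketch shows how such an allowed host list could be checked with Python's standard ipaddress module. It is an illustration under that assumption, not WebCleaner's own matching code, and the is_allowed helper is a hypothetical name.

```python
import ipaddress


def is_allowed(client_ip, allowed):
    """Return True if client_ip matches any entry of the allowed list.

    Entries may be single addresses ("127.0.0.1") or CIDR networks
    ("192.168.1.0/24").
    """
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(entry, strict=False) for entry in allowed
    )


allowed_hosts = ["127.0.0.1", "192.168.1.0/24"]
print(is_allowed("192.168.1.42", allowed_hosts))  # True
print(is_allowed("192.168.2.42", allowed_hosts))  # False

# A /8 prefix keeps only the first octet, so it is far wider than one LAN.
print(ipaddress.ip_network("192.168.1.1/8", strict=False))  # 192.0.0.0/8
```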
If you do allow access from hosts other than your own, please do not remove the password protection. Otherwise you will be running an open proxy, which is a security risk.
For help and bug reports you can join the webcleaner-users@lists.sourceforge.net mailing list at the subscription page or read the list archives.
WebCleaner is not an HTTP-compliant proxy because it modifies requests, headers and data. Modifications aside, the proxy tries to fulfill the HTTP/1.1 specification found in RFC 2616.
Browsing performance will decrease, especially with the Rewriter and the Replacer modules enabled. It will decrease further with JavaScript parsing enabled, since the proxy downloads and parses the scripts referenced by <script src=""> tags in the background.
The Rewriter module parses the HTML. It optimizes HTML by making tag and attribute names lowercase and removing some (but not all) ignorable whitespace.
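The lowercasing step can be sketched with Python's standard html.parser module, which already reports tag and attribute names in lowercase. This is only an approximation of the idea, not the Rewriter module itself: the LowercaseRewriter class and the rewrite helper are hypothetical names, whitespace removal is omitted, and attribute values are always re-quoted.

```python
from html import escape
from html.parser import HTMLParser


class LowercaseRewriter(HTMLParser):
    """Re-emit HTML with lowercase tag and attribute names."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_start(self, tag, attrs, close=""):
        # html.parser hands over names already lowercased and values unescaped,
        # so values are escaped again before being written out.
        parts = [tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)  # attribute without a value
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + close + ">")

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, close=" /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def rewrite(html):
    parser = LowercaseRewriter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(rewrite('<A HREF="Index.HTML" Class=nav>Home</A>'))
# <a href="Index.HTML" class="nav">Home</a>
```

Attribute values such as the URL in href keep their case, since only names are case-insensitive in HTML.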
The warning "unsupported content encoding" means that HTML pages may end up corrupted. WebCleaner tries to filter content even when its encoding is unknown, to prevent Denial of Service attacks (e.g. web servers that always send an unknown content encoding). Currently, this affects only the "compress" and "x-compress" encodings, because the LZW algorithm needed to uncompress such content is patented and therefore not included in WebCleaner. See http://www.burnallgifs.org.
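As a rough illustration of where such a warning comes from, the sketch below decodes a response body according to its Content-Encoding header using Python's standard gzip and zlib modules. It is not WebCleaner's code: decode_body is a hypothetical helper, and what a proxy does with a body it could not decode is left to the caller.

```python
import gzip
import warnings
import zlib


def decode_body(body, content_encoding):
    """Return (data, decoded) for a response body.

    decoded is False when the encoding is not supported, in which case
    the body is returned as received and a warning is issued.
    """
    enc = (content_encoding or "identity").strip().lower()
    if enc == "identity":
        return body, True
    if enc in ("gzip", "x-gzip"):
        return gzip.decompress(body), True
    if enc == "deflate":
        try:
            return zlib.decompress(body), True  # zlib-wrapped deflate
        except zlib.error:
            # Some servers send raw deflate data without the zlib header.
            return zlib.decompress(body, -zlib.MAX_WBITS), True
    # "compress" and "x-compress" (LZW) end up here.
    warnings.warn(f"unsupported content encoding: {enc}")
    return body, False


raw = b"<html>hello</html>"
print(decode_body(gzip.compress(raw), "gzip"))  # (b'<html>hello</html>', True)
```

Filtering a body for which decoded is False means treating still-compressed bytes as HTML, which is how a page can end up corrupted.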