Eric Radman : a Journal

Content Filtering with a Web Proxy

Why would I want to filter the content on web pages? I dedicated another site to explain why. As far as I can tell, web filtering is done in only two places: on a router or on the individual's desktop. I use an OpenBSD router running Polipo and a custom redirector that I wrote.

Polipo Web Proxy

I think Squid is awkward, and it doesn't speak IPv6 natively, so I use Polipo. These are the changes I made to .polipo:

proxyAddress = "::0"
diskCacheRoot = "~/.polipo-cache/"
logFile = /home/eradman/log/polipo
redirector = /home/eradman/bin/media-sanitizer

Cron then starts it under my username. I also HUP the redirector nightly to erase the access log and reload the rules.

@reboot /usr/local/bin/polipo daemonise=yes
30 1 * * * /bin/kill -HUP `pgrep media-sanitizer`
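
How the redirector reacts to the signal is not shown here. As a rough sketch only, a long-running shell filter could trap HUP along these lines. LOG, reloads, and on_hup are hypothetical names, and the rule reload is reduced to a counter:

```shell
#!/bin/sh
# Hypothetical sketch of SIGHUP handling in a long-running filter.
# LOG and the reload step are placeholders, not the real redirector's code.
LOG=${LOG:-/tmp/media-sanitizer.log}
reloads=0

on_hup() {
    : > "$LOG"                 # truncate the access log
    reloads=$((reloads + 1))   # stand-in for re-reading the rules file
}

# Run on_hup whenever the process receives SIGHUP (e.g. from cron's kill -HUP).
trap on_hup HUP
```

The shell runs the trap between commands, so a filter built this way would pick up new rules before handling its next request.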

My blacklist file includes these blocks, with a few additional sites that I allow all media from:

# annoying media
.jpg
.jpeg
.mp4
.swf
.gif
.mpeg
.mpg
.wmv
.asf
.divx
.mov
.avi
thumbnail
images
galleries/
.yimg.com/image/
.video.search.
imgad?
# exceptions
en.wikipedia.org
openbsd.org
...
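
The matching logic of the redirector is not listed here, so the following is only a guess at how such a file could be applied. It assumes every entry is a plain substring of the URL and that entries after the "# exceptions" comment name sites that are always allowed. The function name is_blocked and its arguments are hypothetical:

```shell
#!/bin/sh
# Hypothetical sketch: return 0 (true) if the URL should be blocked.
# Assumptions: entries are literal substrings, and entries that follow
# the "# exceptions" comment are sites from which everything is allowed.
is_blocked() {
    url=$1
    rules_file=$2
    section=block
    verdict=1
    while IFS= read -r line; do
        case $line in
            "# exceptions"*) section=allow; continue ;;
            "#"*|"") continue ;;          # skip other comments and blanks
        esac
        case $url in
            *"$line"*)                    # quoted, so "?" and "." are literal
                if [ "$section" = allow ]; then
                    return 1              # an exception always wins
                fi
                verdict=0
                ;;
        esac
    done < "$rules_file"
    return $verdict
}
```

Under these assumptions, is_blocked http://example.com/photo.jpg blacklist succeeds, while the same image served from en.wikipedia.org is let through.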

MAC Filtering at the Router

I prevent all outbound access through port 80, but in order to keep my wife happy and to abide by the author's wishes I tag the MAC address of her computer on the wireless bridge so that I can apply rules to it with PF. This is bridgename.bridge0:

add wi0
up
rule pass in on wi0 src 00:14:51:7a:7a:08 tag LAURA

Now I can add a rule that allows standard web traffic from this NIC

pass in on $int_if proto tcp tagged LAURA

but not from others

block in on $int_if proto tcp from any to any port 80

Now in pf.conf:

block in on $ext_if proto tcp from any to any port 3128
pass in on $ext_if proto tcp from $proxy_access to any port 3128

Lastly, allow traffic from users authenticated by authpf:

pass in on $ext_if proto tcp from <authpf_users> to any port $proxy_ports

Client Configuration

On my desktop I modified .profile so that all shells have the proxy set by default:

http_proxy=http://proxy.eradman.com:8123/
https_proxy=http://proxy.eradman.com:8123/
export http_proxy https_proxy

Web browsers such as Firefox, Konqueror, and Opera have to be manually configured, but nearly all command-line utilities read http_proxy from the environment.
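
Hosts that should be reached directly can usually be excluded in the same file. This is a sketch rather than part of the setup described above: many command-line tools, curl and wget among them, honor no_proxy, and the host list here is a hypothetical placeholder:

```shell
# Hypothetical list of hosts and domains to contact without the proxy.
no_proxy=localhost,127.0.0.1,.eradman.com
export no_proxy
```

Support for no_proxy varies between tools, so it is worth checking each utility's manual before relying on it.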

Proxy Authentication

OpenBSD's authpf provides an effective means of enabling access from the outside. It works by invoking /usr/sbin/authpf as the user's shell. I put the following in /etc/authpf/authpf.message:

<< web proxy authenticated >>

Now PF rules can be added that allow the TCP keep-alive of one SSH connection to maintain open ports for other services.

nat-anchor "authpf/*"
rdr-anchor "authpf/*"

binat-anchor "authpf/*"
anchor "authpf/*"

pass in on $ext_if proto tcp from <authpf_users> to any port $proxy_ports

To set up authentication for my proxy user, generate a key without a passphrase, and then copy ~/.ssh/id_dsa.pub to the remote machine:

$ ssh-keygen -t dsa
$ scp ~/.ssh/id_dsa.pub proxy.eradman.com:~/.ssh/authorized_keys

Shortcomings

  1. The most bothersome side effect of using an HTTP proxy is that it bypasses local routing rules, so that a list of networks and hosts has to be manually entered in each web browser.
  2. If the proxy is running IPv6, both the server and the client may hang if either one of them has limited or no IPv6 connectivity.
  3. Downloading large files is a very poor use of bandwidth when accessing the proxy remotely.

Friendly HTTP blocking

prdr is a little utility that gives users general instructions for setting their PC to use the proxy that you have configured. It works by rewriting the destination of packets bound for TCP port 80 on any host so that they arrive at a public interface instead. Setup instructions are listed in GUIDE, so I'll just add here the extra rules I used to add IPv6 support:

# inetd.conf

www stream tcp6    nowait  webaccess /home/webaccess/redirector redirector \
    /home/webaccess/notice

# pf.conf

six_if = "gif0"

# Exempt direct connections to home.eradman.com
no rdr on $int_if inet6 proto tcp from any to ($six_if) port = www

# Redirect direct access for www to proxy setup instructions
rdr on $int_if inet6 proto tcp from 2001:4978:10a:7::/64 to any port www -> \
    ($six_if) port www

Notice that wrapping an interface name with parentheses results in the rule applying to the current IP addresses assigned to it.

Benefits

  1. Doesn't suffer from congestion because of blocked ICMP messages.
  2. Bypasses user-level authentication on networks that limit access to TCP port 80.
  3. Personal filtering preferences are maintained wherever you travel.

$ Tue Jan 27 08:06:48 -0500 2009 $