Content Filtering with a Web Proxy
Why would I want to filter the content on web pages? I dedicated another site to explaining why. As far as I can tell, web filtering can be done in only two places: on a router or on the individual's desktop. I use an OpenBSD router running Polipo and a custom redirector that I wrote.
Polipo Web Proxy
I think Squid is awkward, and it doesn't speak IPv6 natively, so I use Polipo. These are the changes I made to .polipo:
proxyAddress = "::0"
diskCacheRoot = "~/.polipo-cache/"
logFile = /home/eradman/log/polipo
redirector = /home/eradman/bin/media-sanitizer
Cron starts Polipo under my username at boot. Every night at 1:30 it also sends SIGHUP to the redirector, which erases the access log and reloads the rules:
@reboot /usr/local/bin/polipo daemonise=yes
30 1 * * * /bin/kill -HUP `pgrep media-sanitizer`
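The source of media-sanitizer isn't shown on this page, so the following is only a minimal Python sketch of the HUP behavior described above. It assumes the rules live in a plain-text file with one pattern per line and that the access log is a separate file the redirector appends to; the class RedirectorState and all paths are hypothetical. It is Unix only, since Windows has no SIGHUP.

```python
import os
import signal
import tempfile


class RedirectorState:
    """Hypothetical holder for a redirector's rule list and access log."""

    def __init__(self, rules_path, log_path):
        self.rules_path = rules_path
        self.log_path = log_path
        self.rules = []
        self.reload()

    def reload(self):
        # Re-read the rules file, one pattern per line, skipping blank lines.
        with open(self.rules_path) as f:
            self.rules = [line.strip() for line in f if line.strip()]
        # Erase the access log by truncating it to zero length.
        open(self.log_path, "w").close()

    def log(self, url):
        # Append one requested URL per line.
        with open(self.log_path, "a") as f:
            f.write(url + "\n")

    def install_hup_handler(self):
        # After this, `kill -HUP <pid>` reloads instead of terminating.
        signal.signal(signal.SIGHUP, lambda signum, frame: self.reload())


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    rules_file = os.path.join(workdir, "blacklist")
    with open(rules_file, "w") as f:
        f.write(".jpg\n.gif\n")
    state = RedirectorState(rules_file, os.path.join(workdir, "access.log"))
    state.install_hup_handler()
    state.log("http://example.com/cat.jpg")
    with open(rules_file, "a") as f:
        f.write(".swf\n")
    os.kill(os.getpid(), signal.SIGHUP)  # what the cron entry does nightly
    print(state.rules)  # ['.jpg', '.gif', '.swf']
```

Handling the signal in-process means the redirector never has to be restarted, so Polipo keeps its pipe to the same process.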
My blacklist file includes these blocks; the exceptions section also lists a few additional sites from which I allow all media:
# annoying media
.jpg
.jpeg
.mp4
.swf
.gif
.mpeg
.mpg
.wmv
.asf
.divx
.mov
.avi
thumbnail
images
galleries/
.yimg.com/image/
.video.search.
imgad?

# exceptions
en.wikipedia.org
openbsd.org
...
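As a rough illustration of how a redirector could apply a file like this, here is a minimal Python sketch. It is not media-sanitizer itself. It assumes one entry per line, case-insensitive substring matching for the blocked patterns, host-suffix matching for the exceptions, and a Squid-style redirector protocol (one request per line with the URL as the first field; an empty reply line allows the request, a URL rewrites it). BLOCKED_URL and the function names are hypothetical.

```python
import io
import sys
from urllib.parse import urlsplit

# Hypothetical URL handed back for blocked requests.
BLOCKED_URL = "http://localhost/blank.gif"


def parse_blacklist(text):
    """Return (patterns, exceptions) from the blacklist file's text.

    Assumes one entry per line; a '# exceptions' comment switches to the
    allow-list, and every other comment line is ignored.
    """
    patterns, exceptions = [], []
    current = patterns
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line[1:].strip().lower() == "exceptions":
                current = exceptions
            continue
        current.append(line.lower())
    return patterns, exceptions


def is_blocked(url, patterns, exceptions):
    host = (urlsplit(url).hostname or "").lower()
    # Exempt the listed hosts and any of their subdomains.
    for exempt in exceptions:
        if host == exempt or host.endswith("." + exempt):
            return False
    # Otherwise block if any pattern occurs anywhere in the URL.
    lowered = url.lower()
    return any(p in lowered for p in patterns)


def serve(patterns, exceptions, stdin=None, stdout=None):
    # Assumed Squid-style protocol: one request per line with the URL first;
    # reply with an empty line to allow it or a URL to rewrite it.
    if stdin is None:
        stdin = sys.stdin
    if stdout is None:
        stdout = sys.stdout
    for line in stdin:
        fields = line.split()
        url = fields[0] if fields else ""
        blocked = is_blocked(url, patterns, exceptions)
        stdout.write((BLOCKED_URL if blocked else "") + "\n")
        stdout.flush()  # the proxy waits for each reply


if __name__ == "__main__":
    patterns, exceptions = parse_blacklist(
        "# annoying media\n.jpg\nthumbnail\n# exceptions\nen.wikipedia.org\n"
    )
    requests = io.StringIO(
        "http://example.com/cat.jpg 192.0.2.5/- - GET\n"
        "http://en.wikipedia.org/wiki/Puffy.jpg 192.0.2.5/- - GET\n"
    )
    serve(patterns, exceptions, requests)
```

Checking exceptions first is what lets a whole site such as en.wikipedia.org keep its images even though .jpg is on the block list.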
MAC Filtering at the Router
I block all outbound access to port 80, but to keep my wife happy and to abide by the author's wishes, I tag the MAC address of her computer on the wireless bridge so that I can apply rules to it with PF. This is /etc/bridgename.bridge0:
add wi0 up
rule pass in on wi0 src 00:14:51:7a:7a:08 tag LAURA
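Now I can add a rule that allows standard web traffic from this NIC. The quick keyword makes it take effect immediately, since PF otherwise applies the last matching rule:
pass in quick on $int_if proto tcp tagged LAURA
but not from other hosts:
block in on $int_if proto tcp from any to any port 80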
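Rule order matters here because PF applies the last matching rule unless a matching rule is marked quick. The toy Python model below is a simulation, not PF itself: it models only destination port and tag matching, and the Rule and Packet classes are hypothetical. It shows that with the pass rule listed before the block rule and no quick, port 80 traffic tagged LAURA is blocked too.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    action: str                   # "pass" or "block"
    port: Optional[int] = None    # None matches any destination port
    tagged: Optional[str] = None  # None ignores the packet's tag
    quick: bool = False


@dataclass
class Packet:
    port: int
    tag: Optional[str] = None


def evaluate(rules, packet, default="pass"):
    """Return PF's verdict: the last matching rule wins, except that a
    matching quick rule stops evaluation at once. PF passes packets that
    match no rule, hence the default."""
    verdict = default
    for rule in rules:
        if rule.port is not None and rule.port != packet.port:
            continue
        if rule.tagged is not None and rule.tagged != packet.tag:
            continue
        verdict = rule.action
        if rule.quick:
            break
    return verdict


if __name__ == "__main__":
    laura = Packet(port=80, tag="LAURA")
    other = Packet(port=80)
    as_listed = [Rule("pass", tagged="LAURA"), Rule("block", port=80)]
    with_quick = [Rule("pass", tagged="LAURA", quick=True), Rule("block", port=80)]
    print(evaluate(as_listed, laura))   # block: the later rule wins
    print(evaluate(with_quick, laura))  # pass: quick ends evaluation
    print(evaluate(with_quick, other))  # block
```

The alternative to quick is to place the block rule before the pass rule, which gives the same result under last-match semantics.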
Next, in pf.conf, only hosts in $proxy_access may connect to port 3128 from the outside:
block in on $ext_if proto tcp from any to any port 3128
pass in on $ext_if proto tcp from $proxy_access to any port 3128
Lastly, allow traffic from users authenticated by authpf:
pass in on $ext_if proto tcp from <authpf_users> to any port $proxy_ports
Client Configuration
On my desktop I modified .profile so that all shells have the proxy set by default:
http_proxy=http://proxy.eradman.com:8123/
https_proxy=http://proxy.eradman.com:8123/
export http_proxy https_proxy
Web browsers such as Firefox, Konqueror, and Opera have to be manually configured, but nearly all command-line utilities read http_proxy from the environment.
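Python's standard library is one example of a client that honors these variables: urllib.request.getproxies() reads the *_proxy variables from the environment. This sketch sets them in-process to mimic the .profile lines above; the proxy URL is the one from this page.

```python
import os
import urllib.request

# Mimic the exported variables from .profile for this process.
os.environ["http_proxy"] = "http://proxy.eradman.com:8123/"
os.environ["https_proxy"] = "http://proxy.eradman.com:8123/"

# getproxies() reads the *_proxy variables from the environment.
proxies = urllib.request.getproxies()
print(proxies["http"])
print(proxies["https"])

# ProxyHandler() without arguments calls getproxies(), so requests made
# through this opener would go via the proxy.
opener = urllib.request.build_opener(urllib.request.ProxyHandler())
```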
Proxy Authentication
OpenBSD's authpf provides an effective way to allow access from the outside. It works by making /usr/sbin/authpf the user's login shell. I put the following in /etc/authpf/authpf.message:
<< web proxy authenticated >>
Now PF rules can be added so that, as long as one SSH connection stays alive, ports for other services remain open to that client:
nat-anchor "authpf/*"
rdr-anchor "authpf/*"
binat-anchor "authpf/*"
anchor "authpf/*"
pass in on $ext_if proto tcp from <authpf_users> to any port $proxy_ports
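The effect of the last rule can be modeled in a few lines of Python: authpf adds the client's address to the <authpf_users> table when the user logs in and removes it when the SSH session ends. This is only a simulation of that lifecycle, not an interface to PF. The set standing in for the table, the helper names, and the port 8123 for $proxy_ports (Polipo's default, as used in the client configuration here) are assumptions.

```python
import ipaddress
from contextlib import contextmanager

# Stands in for PF's <authpf_users> table.
authpf_users = set()
# Hypothetical contents of the $proxy_ports macro.
PROXY_PORTS = {8123}


@contextmanager
def authpf_session(client_ip):
    """Model authpf: the client's address is in the table only while
    its SSH login session lasts."""
    addr = ipaddress.ip_address(client_ip)
    authpf_users.add(addr)
    try:
        yield
    finally:
        authpf_users.discard(addr)


def passes(src_ip, dst_port):
    """Model of: pass in on $ext_if proto tcp from <authpf_users> to any port $proxy_ports"""
    return ipaddress.ip_address(src_ip) in authpf_users and dst_port in PROXY_PORTS


if __name__ == "__main__":
    client = "198.51.100.7"  # documentation address, used as an example
    print(passes(client, 8123))      # False: not logged in
    with authpf_session(client):
        print(passes(client, 8123))  # True: SSH session open
    print(passes(client, 8123))      # False: session closed
```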
To authenticate my proxy user without a password, generate a key pair with an empty passphrase, and then copy ~/.ssh/id_dsa.pub to the remote machine:
$ ssh-keygen -t dsa
$ scp ~/.ssh/id_dsa.pub proxy.eradman.com:~/.ssh/authorized_keys
Shortcomings
- The most bothersome side effect of using an HTTP proxy is that it bypasses local routing rules, so a list of networks and hosts that should be reached directly has to be entered manually in each web browser.
- If the proxy is running IPv6, both the server and the client may hang when either of them has limited or no IPv6 connectivity.
- Downloading large files is a very poor use of bandwidth when accessing the proxy remotely.
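Browsers typically take the list from the first point as a "no proxy for" setting or a proxy auto-config file. The Python sketch below only illustrates the decision such a list encodes; the networks and names are hypothetical examples (192.168.0.0/16 is a private range, 2001:db8::/32 a documentation range), and each browser has its own matching rules.

```python
import ipaddress

# Hypothetical destinations that should bypass the proxy.
DIRECT_NETWORKS = [ipaddress.ip_network(n) for n in ("192.168.0.0/16", "2001:db8::/32")]
DIRECT_NAMES = ("localhost", ".home.example")


def name_matches(name, entry):
    # ".home.example" covers the domain and its subdomains;
    # other entries must match exactly.
    if entry.startswith("."):
        return name == entry[1:] or name.endswith(entry)
    return name == entry


def use_proxy(host):
    """Return False when the host should be contacted directly."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        name = host.lower().rstrip(".")
        return not any(name_matches(name, entry) for entry in DIRECT_NAMES)
    return not any(addr in net for net in DIRECT_NETWORKS)


if __name__ == "__main__":
    for host in ("192.168.1.10", "printer.home.example", "www.openbsd.org"):
        print(host, use_proxy(host))
```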
Friendly HTTP blocking
prdr is a small utility that shows users general instructions for configuring their PC to use the proxy you have set up. It works by rewriting the destination of packets bound for TCP port 80 on any host to a public interface. Setup instructions are listed in GUIDE, so here I'll just add the extra rules I used for IPv6 support:
# inetd.conf
www stream tcp6 nowait webaccess /home/webaccess/redirector redirector \
/home/webaccess/notice
# pf.conf
six_if = "gif0"
# Exempt direct connections to home.eradman.com
no rdr on $int_if inet6 proto tcp from any to ($six_if) port = www
# Redirect direct access for www to proxy setup instructions
rdr on $int_if inet6 proto tcp from 2001:4978:10a:7::/64 to any port www -> \
($six_if) port www
Wrapping an interface name in parentheses makes the rule apply to whatever addresses are currently assigned to that interface, and PF updates the rule automatically when they change.
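The redirector program that inetd starts comes with prdr and isn't shown here. As a rough illustration of the inetd side only, here is a minimal Python sketch of a nowait stream service: inetd hands each connection to the program on stdin and stdout, and the argument after the program name in inetd.conf (/home/webaccess/notice) arrives as argv[1]. Returning the notice as a plain-text 200 response is an assumption; prdr's redirector may behave differently, for example by sending an HTTP redirect.

```python
import sys


def read_request_head(stream):
    """Consume the request line and headers, stopping at the blank line."""
    head = []
    for line in stream:
        if line in (b"\r\n", b"\n"):
            break
        head.append(line)
    return head


def build_response(notice_text):
    """Return a complete HTTP/1.0 response whose body is the notice."""
    body = notice_text.encode("utf-8")
    header = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return header + body


# inetd connects the client socket to stdin/stdout and passes the notice
# path as the first argument; only run the handler when it is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        notice = f.read()
    read_request_head(sys.stdin.buffer)
    sys.stdout.buffer.write(build_response(notice))
    sys.stdout.buffer.flush()
```

Because inetd forks one process per connection (nowait), the handler needs no socket code or concurrency of its own.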
Benefits
- Doesn't suffer from congestion because of blocked ICMP messages.
- Bypass user-level authentication on networks that limit access to TCP port 80.
- Personal filtering preferences are maintained wherever you travel.