At work, we needed to implement some HTML scrubbing and XSS protection across a Perl Catalyst-powered API, so went looking for existing solutions. We found Catalyst::Plugin::HTML::Scrubber which did some of what we needed, but did not scrub within encoded PUT/POST bodies e.g. POSTed JSON.
I implemented some improvements to provide this, but sadly the original author could not be reached – it seems he hasn’t been active in the Perl community for quite some time. With a little help from the CPAN admins (thanks!) I obtained maintainership of it, and have since got a couple of releases out which add the features we needed:
- scrubbing HTML/XSS attempts within both normal parameters (querystring / POSTed forms) and also recursively within PUTted/POSTed JSON etc
- the ability to whitelist certain parameters by name or regex to exclude them from scrubbing – we have some admin-only areas where staff can enter “message of the day” content which is allowed to contain HTML
- a “no encode HTML entities” option to undo HTML::Scrubber‘s automatic HTML entity encoding of e.g. angle brackets – whilst content destined for the browser wants to be HTML-encoded, inbound parameters don’t want that, we just want HTML /XSS attempts stripped, a parameter value like >= 5 should be left alone
The amended version can be found on CPAN – Catalyst::Plugin::HTML::Scrubber.
(Aside: yes, it has been, er, quite some time since I posted anything on this blog.)