Easy file finding with File::Find::Rule

Recently I found File::Find::Rule on the CPAN, and I’m impressed by how easy it makes it to get a list of files to work on.

A fairly common way to do this in Perl would be something like:

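use DirHandle;

my $dirh = DirHandle->new($somedir) or die "Cannot open $somedir: $!";
while (defined(my $entry = $dirh->read)) {
    # Skip hidden entries and anything that isn't a plain file
    # (the file test needs the directory prefix, not just the bare name):
    next if ($entry =~ /^\./ || !-f "$somedir/$entry");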

    # Skip if it doesn't match the name we want:
    next if ($entry !~ /\.txt$/);

    print "Found: $somedir/$entry\n";
}

File::Find::Rule makes things rather easier:

my @files = File::Find::Rule->file()->name('*.txt')->in($somedir);

Various conditions can be chained together to find exactly what you want.

Another example, showing combining rules with ->any() to find files matching any of those conditions:

# find avis, movs, things over 200M and empty files
my @files = File::Find::Rule->any(
    File::Find::Rule->name( '*.avi', '*.mov' ),
    File::Find::Rule->size( '>200M' ),
    File::Find::Rule->file->empty,
)->in('/home');

There are plenty of other ways to do this, but I think File::Find::Rule gives a way to clearly and concisely state what you want and get the job done.

Favourite new Perl features

I’ve been starting to make use of the new features introduced in perl 5.10 recently (after being constrained by my main dev environments still running perl 5.8.8, and not having the time to upgrade).

My favourite features so far are:

The smart match operator

The new smart-match operator, ~~, is a great example of DWIM.

A few examples:

if ('foo' ~~ @a)  # array contains at least one item equalling 'foo'
if (@a ~~ /foo/)  # array contains at least one item matching /foo/
if (@a ~~ @b)     # arrays contain the same values, in the same order

That’s just a brief overview; there’s plenty more in the documentation.

say

Not a big change: the new say keyword acts just like print, except that it adds an implicit newline to the end – so say 'Hello'; is just the same as print "Hello\n"; (you need use feature 'say'; or use 5.010; to enable it).

It’s more useful in cases where you would have had to add parentheses to get correct precedence – something like: print join(';', @foo) . "\n"; can now be written more concisely as just say join ';', @foo;.
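A quick side-by-side of the two spellings, as a minimal sketch; the @foo contents are made up for illustration:

```perl
use strict;
use warnings;
use feature 'say';   # say is only available once the feature is enabled

my @foo = qw(alpha beta gamma);   # sample data, purely illustrative

# Old style: parentheses keep the "\n" out of join's argument list
my $old = join(';', @foo) . "\n";
print $old;

# New style: say appends the newline for us
say join ';', @foo;
```

Both statements print the same line.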

Switch (given) statement

given ($foo) {
    when (/^abc/) { abc(); }
    when (/^def/) { def(); }
    when (/^xyz/) { xyz(); }
    default { die "Unrecognised foo"; }
}

Defined-or

// is now the defined-or operator.

It’s pretty common to use conditional assignments like: $a ||= $b to assign to $a unless $a already has a value. Now you can use $a //= $b to test for definedness rather than truthiness.

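Likewise, $hash{foo} // $hash{bar} evaluates to $hash{foo} if that is defined (even if it holds a false value such as 0), and to $hash{bar} otherwise. Bear in mind that if (...) around it still tests the truth of whichever value comes back, so a defined-but-false $hash{foo} makes the condition false.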
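To make the difference between || and // concrete, here is a small sketch; the hash keys and values are invented for the example:

```perl
use strict;
use warnings;
use feature 'say';

my %hash = (foo => 0, bar => 42);   # foo is defined but false

my $with_or  = $hash{foo} || $hash{bar};   # 0 is false, so || falls through to bar
my $with_dor = $hash{foo} // $hash{bar};   # 0 is defined, so // keeps foo

say "|| gives $with_or";
say "// gives $with_dor";

# //= only assigns when the left-hand side is undefined
my %opts = (retries => 0);
$opts{retries} //= 3;    # already defined (as 0), left alone
$opts{timeout} //= 30;   # missing, so it gets the default
say "retries=$opts{retries} timeout=$opts{timeout}";
```

This is where // earns its keep: a legitimate 0 or empty string is not clobbered by the default.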

Named regex captures

Parenthesised sub-expressions in regular expressions can now be given a name, and accessed via the special %+ hash:

if ($foo =~ m{ (?<year> \d{4} ) - (?<month> \d{2}) - (?<day> \d{2}) }xms) {
    say "Year: $+{year}";
}
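Here is a self-contained version you can run, as a sketch only: the sample date is invented, and the capture names month and day are my own choice for illustration.

```perl
use strict;
use warnings;
use feature 'say';

my $date = '2009-01-31';   # sample input, purely illustrative
my %parts;

if ($date =~ m{ \A (?<year> \d{4} ) - (?<month> \d{2} ) - (?<day> \d{2} ) \z }xms) {
    # %+ is reset by the next successful match, so copy it if you need it later
    %parts = %+;
    say "Year: $+{year}, month: $+{month}, day: $+{day}";
}
```

Copying %+ into an ordinary hash keeps the values around after later matches.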

The features above are my own personal favourites, in no particular order. The full (large) set of changes can be found in the perldelta for 5.10.0.

Monitoring Twitter via RSS search result feeds

Want to make better use of Twitter, getting a better signal to noise ratio, spotting tweets about things you’re interested in without having to wade through things you’re not?

Use the Twitter search, and enter terms you’re interested in, separated by OR (e.g. badgers OR mushrooms OR snakes), then hit search – as you’d expect, you’ll find any tweets mentioning any of the terms you entered.

In the top right of the results page is a “Feed for this query” link – this gives you an RSS feed for this search which you can subscribe to in your RSS reader of choice. You now have a feed you can monitor for anything that’s of interest to you, whether that be your personal interests, or your company’s brand and terms related to your target market (e.g. hosting OR domains OR ecommerce etc).

Of course, you could just hand-craft the feed URL, if you’d prefer – it will look like:

http://search.twitter.com/search.atom?lang=en&q=badgers+OR+mushrooms+OR+snakes

It’s easy enough to change that to whatever you want :)
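If you’d rather generate that URL than type it, something like the following would do. It’s a minimal sketch: feed_url is a hypothetical helper of my own, it uses only the URL format shown above, and it assumes the terms are plain ASCII strings.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: build the search feed URL for a list of terms.
sub feed_url {
    my @terms = @_;
    my @encoded;
    for my $term (@terms) {
        # Percent-encode anything outside the URL-safe set (assumes ASCII input)
        (my $enc = $term) =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
        push @encoded, $enc;
    }
    return 'http://search.twitter.com/search.atom?lang=en&q='
         . join('+OR+', @encoded);
}

say feed_url(qw(badgers mushrooms snakes));
```

For the three example terms this produces the same URL as the one shown above.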

Pastebin Firefox extension

My friend James Ronan has just released a Pastebin Firefox extension, making it even easier to paste code etc to pastebin.com.

As the code by Paul Dixon which powers pastebin.com is Open Source and can be installed on your own server, the extension allows you to provide the URL of your own private pastebin install if you have one – this is ideal for me, as we have a private pastebin setup at work which is often used.

Using the extension is as simple as right-clicking and choosing “pastebin my clipboard”, which submits the contents of your clipboard (or highlight buffer) to pastebin, and copies the resulting URL to the clipboard, ready for you to paste on IRC / IM / whatever.

Quick Fibonacci calculations are nothing new

Just read this post by Ben Newman (found via Reddit).

Now, the use of C++ templates to calculate the value at compile time rather than runtime is mildly clever and amusing (if also impractical and convoluted) but the fact that it can calculate a Fibonacci number quickly is nothing new; it’s solely down to remembering the values you’ve already calculated, and not calculating them again needlessly.
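To show what I mean by remembering values, here is a minimal memoised Fibonacci in Perl. It’s my own sketch rather than anything from the linked post, and the %cache name is just for illustration; the core Memoize module can do the same job for you.

```perl
use strict;
use warnings;
use feature 'say';

my %cache = (0 => 0, 1 => 1);   # seed values: fib(0) = 0, fib(1) = 1

sub fib {
    my ($n) = @_;
    # //= computes and stores the value only if it isn't cached already
    return $cache{$n} //= fib($n - 1) + fib($n - 2);
}

say fib(50);   # 12586269025
```

Each value is worked out once, so the number of calls grows linearly with n instead of exponentially.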

Preventing SSH brute-force attacks with iptables

I wanted to do this tonight and couldn’t remember the exact iptables incantation, and I know I’ll want it again, so I’m sharing it here for me and for anyone else it may be useful to.

If you need SSH to be world-accessible, but don’t want to be plagued by SSH brute-force login attempts, the following ought to do the trick:


iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m limit --limit 3/minute -j ACCEPT

That will allow inbound SSH connections, but only 3 per minute (averaged) – this should be more than a user would normally need, but isn’t sufficient for a brute-force login attack. If someone tries a brute-force attack against you, after a few connections they’ll be ignored.

This is assuming that your default INPUT policy is to drop or reject packets, as it should be. For it to work, it also assumes that you have a rule to allow inbound connections which are part of an established connection – something like:


iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

Of course, if you’re going to open SSH to the world, even with the above protection, you’ll still want to ensure passwords are secure (or disable password-based authentication totally, using SSH keys instead), and I’d recommend disabling root logins via SSH (the PermitRootLogin setting in /etc/ssh/sshd_config).

Happy New Year!

Just a quick post to say Happy New Year everyone! I hope 2009 will be a good year for you.

And, in case you’re wondering, no, I’m not sad enough to be sitting in front of my computer on NYE; this post was scheduled several days ago ;)

David Precious – professional Perl developer, motorcyclist and beer drinker