Perlesque – The Knowin' Samoan

Scope Creep

(initially published 2012-05-16)

I have a love-hate relationship with Perl’s scoping model. It has quite obviously grown over time so it is kind of twisted and knotted like a tree. However, all those twists and knots make it easy to climb if you know what you’re doing. So lets review the different types of scope:

Package Scope – Package scope is actually really just global scope with the caveat that unqualified identifiers refer to the current package setting. This is the default scope, though naked package-scoped variables are fraught with peril, so I will repeat the admonition to use strict so you don’t end up hanging yourself from the Perl tree with a noose made of your own code. Subroutine declarations (at least “nonymous” ones) are almost always package/globally scoped. Perl’s magic variables (such as $_) are generally package-scoped as well.
Dynamic Scope – Using the local keyword, we can restrict the declarations of variables, typeglobs, etc. to only be valid within the confines of our current block and everything we call from within that block. The declarations essentially have global/package scope until we exit the current block. This means we can override global declarations for the duration of the block, or we can materialize our own pseudo-global subroutines and make them disappear just as quickly when we’re done with them. People try to tell me Perl’s local is deprecated. It’s not. It’s just abused and misunderstood.
Lexical Scope – my and our modify declarations so that they are only meaningful within the block within which the declaration occurs. This includes sub-blocks, but it does not include calls to code that is defined outside of the lexical block. The primary distinction between my and our is that my declarations disappear completely when the block is done, but our declarations can be re-accessed in an entirely different scope (or the next time we re-enter this scope) by another use of our. This creates “hidden” global variables that can be explicitly revealed within any given lexical scope. There are more peculiarities of our that the Perl docs can certainly elucidate more than I have patience to.

Here’s some examples that hopefully will make some of the differences clear:

$foo = 0; # AKA $main::foo

sub a { print "$foo\n"; }
a(); # prints "0"
package bar;
$foo = 1; # AKA $bar::foo
main::a(); # still prints "0"
sub a { print "$foo\n"; }
a(); # prints "1"
{
    local $foo = 2;
    a(); # prints "2"
    local $main::foo = 3;
    main::a(); # prints "3"
}
a(); # prints "1" again
main::a(); # prints "0" again

my $foo = 4; # lexically scoped, not the same as $bar::foo
a(); # prints "1"

package baz;
print "$foo\n"; # prints "4"

{
    my $foo = 5;
    sub a {
        print "$foo\n";
    }
    a(); # prints "5"
}
print "$foo\n"; # prints "4" again
a(); # prints "5" again. Aha! A closure!

{
    our $foo = 6;
    package moo;
    print "$foo\n"; # still prints "6" b/c "our" is lexical
    our $foo = 7; # Lexical but associated with the moo package
}

{
    package moo;
    print "$foo\n"; # prints "4" still
}

{
    our $foo;
    print "$foo\n"; # prints "6"
    package moo;
    our $foo;
    print "$foo\n"; # prints "7"
}

Please note, the above won’t work in a use strict environment, because of the use of unqualified global variables. It’s a little like driving without a seatbelt. You really can if you need to, but in almost all circumstances there’s just no need to take the risk.
Globals and locals are forever bound to a package scope. If you switch packages, the same names refer to entirely different entities.
Lexical identifiers, declared with my and our are good throughout the entire block in which they were declared. our identifiers however are also associated with a package, and you can get back to the same entity in a different scope by using our again.
Lexical variables can be “closed” as in the last sub a above. In this case, the subroutine is actually defined in the global package scope despite the enclosing anonymous block, and maintains a reference to the lexically defined $foo even after the scope in which that $foo was originally defined is gone. We’ll get back to closures more in another post.

Now, you may ask yourself, why is any of this a good thing? This post is unfortunately short on some of the sexy techniques I have been advertising, but scoping is foundational to some truly mindblowing stunts which are soon to come. So study up. This will be on the final!

I’m the map!

(originally posted 2009-11-24)

If there’s a place you wanna go, I’m the one you need to know…

My boss tells me I have “an unhealthy fascination with map“. That may be true, but it’s only because map is a perfect example of what makes Perl Perl. Let’s take a quick stroll through some of the ins and outs of this wondrous little function, shall we?

The Shortest Distance Between Two Points

Here’s the setup. You have a list. You want to make another list with just as many elements, where every element of the new list is created by some sort of operation on the elements of the first list. Huh, you say? Okay, for a concrete example, let’s say you have a list of words, and you want a list of the lengths of those words. Your first instinct may be a loop:

my @lengths;
foreach my $word (@words) {
    push @lengths, length($word);
}

You may even think a fancy statement modifier would do the trick nicely:

push @lengths, length($_) foreach @words;

But what you really want is map:

my @lengths = map {length($_)} @words;

or alternatively:

my @lengths = map(length($_), @words);

It stands head and shoulders above the loop implementation for several reasons:

Its purpose is very well-defined. You know instantly from looking at it that you are trying to create a list with exactly the same number of elements as @words. The loop is a more general-purpose mechanism, so its purpose takes more work to discern.
@lengths gets its value by way of assignment rather than by side effects. This is a clear win for the language purists, but for the normal folks it’s still a Good Thing (TM) because it’s clear where the value came from. If you’re pushing here, it is certainly reasonable to suspect that you may be pushing more items on somewhere else. If you populate it all at once and never change it again, it’s very clean and clear to the reader.
It’s more efficient because Perl can optimize performance for the specific task of list assembly, which it cannot do for a loop, even if the loop is simply trying to do the same thing.

Building on this, we can use map to build hashes:

my %word_length = map { $_ => length($_) } @words;

Note how in the case above, map returns two list elements for each element of @words (remember that the “fat comma” operator => is still basically just a comma, so the expression evaluates to a two-element list). This sort of approach can also be used to create longer lists, e.g.:

my @repeated_words = map { ($_) x length($_) } @words;

You can embed one map inside another, but remember that $_ in the innermost expression will refer to something different from $_ in the outer expression:

@matrix = map {my $x = $_; map {[$x, $_]} @y_values} @x_values;

You can’t get there from here

So when may it be desirable to opt for a loop when map could be used to accomplish the same task? Here are a few general guidelines:

If the code block or expression is too complicated, using map will only be detrimental to the purpose of clarity and concision. A loop may be more verbose, but it can win big in terms of making the code comprehensible and therefore maintainable. And branching is more straightforward in a flow-control structure than in a declarative-style function like map.
If you want to perform an operation on each element of a list, but don’t need to create a list of results of these operations, map is a counter-intuitive choice.
If you’re worried about action-at-a-distance (see below) a foreach loop gives you an easy way to specify a loop variable so you don’t get caught in the trap of using the default variable $_.

Here be Sea Serpents!

map, its kid brother grep, and under certain circumstances for/foreach, have an interesting–er–feature, if you will. You see, the $_ that you use in you expression or code block is actually an alias for the element of the list that is being operated on. So if you change the value of $_, you are operating on a reference (just like operating on @_ within a subroutine) and can therefore create all manner of side effects, intended or–more likely–otherwise. This is what is referred to in the Perl world as “action-at-a-distance”. I recently had my first run-in without action-at-a-distance bugs as I was building an object-oriented system (a la Moose) which at some point in the hierarchy wraps some older code that I was much less familiar with. Now it seems perfectly reasonable to have a collection of objects and want to do something like get the sum of the values of a certain attribute. Something like:

sum map { $_->get_population } @cities

Now if you like being lazy as any great programmer should, you might not actually construct those population attributes until they’re needed, and Moose makes such deferred evaluation a snap. But what if somewhere in the process of constructing the attribute some code along the way decides to use $_ for its own nefarious purposes. This is exactly what I ran into. Somewhere someone decided to assign a value to $_ (which is not necessarily a bad thing, as there are occasions where this is quite convenient) but $_ being a global variable, it impacted the map statement I had written, through many layers of code. My array, which originally contained blessed object references, all of a sudden had some stray strings, and sometimes undefs, in it. A most unwelcome turn of events.

What to do? Well, there are a few precautions that can and should be taken to prevent such things from happening. First of all, any time $_ is used, it should be localized or even lexicalized in scope. Constructs which generate $_ for you typically automatically limit the scope to the construct itself (in the case of map, the map statement itself) as a byproduct of aliasing, i.e.:

$_ = 'a';
@b = 1..3;
for (@b) {++$_}

In the outer scope, $_ still has the value “a”, but incrementing $_ in the loop’s scope has incremented all the values of the array @b so now @b is 2..4. Nutty, huh? But that value of “a” could be problematic to some other scope, so to be a good citizen, I should preface that first assignment with local or (if you have Perl 5.10) my. If the code I had been calling had taken such precautions, I wouldn’t have had so much trouble.

However, there are precautions that can be taken at the other end as well. And in the end, even though I went back and sanitized the underlying code to have good scoping hygiene, I still applied the following so that my code was more robust even if someone went back and did the same thing over again at the lower level. Well how can we prevent the value of the alias from changing? If we are sure that our expression/code-block doesn’t change the value of $_ within its lexical scope, we could (again, if you have Perl 5.10) lexicalize $_. However, if you have no such guarantee, or you don’t have 5.10, then you probably need to localize $_, like so:

@a = 100..200;
@b = map { local $_ = $_; s/^/1/ } @a;

This leaves @a intact but leaves @b equivalent to 1100..1200. If you want a simple way to do this more often, you can roll your own “safe” version of map:

sub safe_map (&@) {
my $code = shift;
map {local $_ = $_; $code->()} @_
}

Now you can just drop that in where you usually use map (except if you are using the EXPR form, which won’t work with the prototype) and you can ensure your mapped lists won’t get corrupted. There are ways the output list could still get corrupted by poorly-scoped subroutines, but that’s a much easier problem to recognize and deal with.

Conclusion

Hopefully this is enough to make you want to use map but want to do so carefully. For with great power comes great responsibility. It is a beautiful tool, but in the wrong hands it can unleash a force so terrible, few have lived to tell the tale. So forge on, intrepid adventurers. Excelsior!

A slice of hash

(originally posted 2009-10-20)

I used to like Perl. The past few months using it as my main language, I grew to really like it a lot. What had me fall head over heels in love, what inspired me to start this new blog series: hash slices. Anyone who’s been around Perl at all knows hashes (i.e. associative arrays, to use the behavioral description rather than the implementation description). They are a core feature that set it apart fairly early on as a force to be reckoned with, and has been emulated numerous times due to it’s simplicity and power. But addressing multiple elements of an associative array requires some sort of loop, right? Not so in Perl. List assignments are a fairly well-known aspect of Perl. But with a hash slice, you can treat a set of hash elements exactly as you would a list.

You’ve probably come across array slices before, though you may not have recognized them as such:

@foo[1..5]
@bar[2,4,6]

These can be used pretty much anywhere a regular list could, even as an lvalue. A hash slice produces much the same animal, though it comes of different parentage. The key syntax for referring to a hash slice is analogous:

@hash{list_of_keys}

@list_of_keys could also be replaced by a literal list, a function returning a list, a de-referenced array reference, etc. In other words, you tell the parser that you want a list of hash elements and just supply a list of keys. That’s interesting, but it gets insanely beautiful when you do things like assigning to a hash slice:

# Assign sequential values to a bunch of hash elements
@my_hash{qw(a b c d)} = 1..4;
# Copy all elements of hash_b whose key has a 'g' in it into hash_a, leaving other elements of hash_a intact
@hash_a{@tmp} = @hash_b{@tmp = sort grep {/g/} keys %hash_b};
# Assign the value 1 to all elements of the hash referenced by $hashref, whose keys are in the array true_keys
@{$hashref}{@true_keys} = (1) x scalar(@true_keys);

To accomplish these same things you would have had to do things like:

@my_arr = qw(a b c d);
foreach (0..$#my_arr) {
$my_hash{$my_arr[$_]} = $_ + 1;
}
foreach (grep {/g/} sort keys %hash_b) {
$hash_a{$_} = $hash_b{$_};
}
foreach (@true_keys) {
$hashref->{$_} = 1;
}

Or even more unwieldy:

$my_hash{a} = 1;
$my_hash{b} = 2;
$my_hash{c} = 3;
$my_hash{d} = 4;
...

The minute I discovered hash slices, I was able to refactor away dozens of lines of code, and make things in most circumstances easier to understand and harder to break. Once you become familiar with the syntax, it can be much easier to discern the intention of a single line of code than a loop, or god forbid a series of assignments.

So let’s get slicin’ and dicin’ people!