What if I'm Really Careful?

Subject:      Re: Is $$variable allowed like in PHP ?
From:         mjd@op.net (Mark-Jason Dominus)
Date:         Wed, 17 Nov 1999 23:26:01 GMT
Message-ID:   <80vdh5$f79$1@monet.op.net>
Newsgroups:   comp.lang.perl.misc

In article <FDn*U3ado@news.chiark.greenend.org.uk>, Ben Evans <bene@chiark.greenend.org.uk> wrote:

>So the basic problems are:
>* that they're global variables
>
>* that if you went to sleep and forgot what you're
>  doing, you might accidentally do something stupid, ie
>  blow away something you really wanted or call the wrong
>  subroutine or get a run-time error.
>
>So for quick and dirty scripts of up to a few hundred lines
>where I pretty much know exactly what I want to be doing,
>and I'm not depending on user input in any but the most
>restricted way, there's no real problem with soft references?

Yeah, well, there's no problem with smoking in bed, either, as long as you don't fall asleep and burn the house down.

I think you missed the point here. Global variables have problems, right, you got that. When you use symbolic references, you're using global variables in a totally unconstrained way; you seem not to have picked up on that. Here's an example: Earlier in this thread, we heard from a guy who has a CGI program that does this:

	ReadParse();
	while (($name, $val) = each %in) { $$name = $val }

He `knows' that this is safe because he `knows' that the names in the form will not conflict with his other variables. That is akin to someone who says it is OK to smoke in bed because they `know' they aren't going to fall asleep.

The reality, of course, is that anyone editing the form has to cross-check against the program to make sure that the names they want are not already reserved, and vice versa.
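
To make the collision concrete, here is a minimal sketch, not the program from the thread. The %in hash stands in for whatever ReadParse() would fill in, and the field names and the $admin global are invented for illustration.

```perl
use strict;
use warnings;

our $admin = 0;                          # hypothetical global the program relies on
my %in = (user => 'fred', admin => 1);   # stand-in for what ReadParse() would fill in

# The loop from the thread: every field name becomes a global variable.
{
    no strict 'refs';
    while (my ($name, $val) = each %in) { $$name = $val }
}
my $after_symbolic = $admin;             # 1: the form field overwrote the global

# One alternative: leave the data in the hash and look fields up by name.
$admin = 0;
my $user = $in{user};                    # 'fred'; no program variable is touched
print "after symbolic refs: admin=$after_symbolic\n";
print "after hash lookup:   admin=$admin user=$user\n";
```

With the hash, a form field can be called anything at all and nobody has to cross-check it against the program.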

The entire fifty-year history of programming languages has been nothing more than one giant trend away from this kind of cross-checking. Everything: high level languages, subroutines, structured programming, modules, OOP, and all the other stuff is devoted to inventing new and better ways to make it safe to put together two components, such as a form and the program to process it, so that you don't have to cross-check.

It's not possible to enumerate all the things that could go wrong here. Someone could forget to check. Someone could decide not to check because they `know' their name is safe. Someone could make a mistake while checking. Someone could check correctly, and then make a typo when typing the name. Someone could accidentally use a Perl variable name that is meaningful but which does not appear explicitly in the program, like Exporter::VERBOSE, and so evades the check. Someone could use a name like Exporter::VERBOSE, only to have a new problem transpire months later because the program did not originally use that module. Someone could maliciously post bogus data, including the fake form fields named '0', '/', '\', '"', and '*', in a calculated attempt to get the program to do something it shouldn't have been doing. These are only a few of the many possible disasters.
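
The fake-field case can be sketched in a few lines. The hostile field below is invented for illustration; the point is only that a name like '/' is resolved as Perl's special variable $/, the input record separator, which the program never mentions by name.

```perl
use strict;
use warnings;

# Hypothetical hostile form data: the field is named "/".
my %in = ('/' => 'X');

my ($before, $during);
{
    local $/ = "\n";        # keep the damage inside this block for the demonstration
    $before = $/;
    no strict 'refs';
    while (my ($name, $val) = each %in) { $$name = $val }
    $during = $/;           # now 'X': every later <FH> read would split on 'X'
}

printf "before: %s, during: %s\n",
    ($before eq "\n" ? 'newline' : $before), $during;
```

Without the local, the change would persist for the rest of the run and show up wherever the program next reads a line from a file.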

You can't simply decide to be careful, because the problem space is just too big to envision all the possible things that could go wrong. That's why we've spent fifty years trying to avoid having to try. You don't get around a fifty-year research problem just by being really careful; you have to be methodical also. Overlooking one of the twelve thousand possible disasters isn't `doing something stupid', any more than setting the house on fire when you're smoking in bed is; the stupidity is in smoking in bed in the first place.

Something else I think you missed is that a lot of these disasters are going to be impossible to diagnose. I don't know about you, but I don't put bugs into my programs on purpose. They're in there by accident. Even though I'm being careful. If you want to go ahead and postulate a bug-free program that uses symbolic references, then sure, obviously there are no problems with the symbolic references in that case, but it's a circular argument---and a fallacious one, too. Smoking in bed is dangerous, regardless of whether or not you manage to stay awake. Sure, if you're a perfect programmer, you don't need to avoid symbolic references; you don't need to avoid self-modifying assembly language code either. And you might as well program in assembly language, because it'll be a lot more efficient than using Perl. But some of us mere mortals do have to consider the impact of bugs.

The other side of the fifty-year trend away from global variables is that we divide programs into separate components that communicate in circumscribed ways. Why do we do this? It's so that when (not if, but when) we screw up, we can figure out where the problem is. When your CGI program aborts with an error, you don't have to worry that maybe the problem is in one of the 1500 other programs in /usr/local/bin, because those are totally unrelated pieces of software that have nothing to do with the program you are trying to debug. When subroutine X fails, you can start by looking at subroutine X, and expand your search from there; you don't immediately have to suspect every one of the other 700 lines of code in your program. At least, this is the normal presumption. When you use symbolic references, all bets, and I really do mean all bets, are suddenly off. When you write my $x = 1, the effect is confined to the current block. When you write $main::x = 1, you can figure out the effect by looking through the program to see who looks at $main::x. The instant you write $$name = 1, you don't know any more what the effect will be or where it will crop up; it could be anywhere. You have just turned every 25-line debugging problem into a potential 700-line debugging problem.
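
Here is a small sketch of that difference. The set_by_name subroutine is a made-up helper; in a real program the name might arrive from a form, a file, or some other part of the code.

```perl
use strict;
use warnings;

our $x = 0;    # package global, that is, $main::x
my  $y = 0;    # lexical; symbolic references cannot see it

sub set_by_name {           # hypothetical helper, invented for this example
    my ($name) = @_;
    no strict 'refs';
    $$name = 1;             # which variable? it depends entirely on $name
}

set_by_name('x');   # changes $main::x, though nothing at the call site mentions $x
set_by_name('y');   # creates a new global $main::y; the lexical $y is untouched

print "x=$x y=$y\n";
```

Searching the program for $x will not turn up the line that changed it, which is exactly the debugging problem described above.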

So here's the short summary:

1. One of the biggest problems in all of computer programming is namespace management and data hiding. When you use a symbolic reference you are throwing away forty years of expensive lessons from the School of Hard Knocks.
2. Being careful does not mean trying to understand the possible consequences while behaving dangerously; it means avoiding dangerous behavior in the first place. Don't say ``I know it's dangerous, so I'll be really careful.'' Say ``I know it's dangerous, so I won't do it.''
3. Plan for failure, not for success, because success takes care of itself. You don't need a plan for when you don't screw up, so adopt programming practices that will assist your debugging efforts, not confound them.

Sorry this is so long, but I realized that a lot of this stuff was not obvious and that maybe nobody had ever said it explicitly before. My two web pages were gropings in this direction, but I don't think I was clear.

Hope this helps.

mjd-perl-misc@plover.com