Tuesday, May 15, 2007

The key to learning (and teaching) Programming

The subject's come up many a time in my life about how people program computers. A lot of the time it's seen as a mystical art and some lay-people even confuse "programming" with "operating a computer", or even just "changing some settings".

The question always comes up about how to "learn programming". Besides the obvious problem with the wording of that phrase, it seems that people still haven't found an effective way to teach children how to program computers properly... and more and more teachers are skipping every bit of the subject that they can. My interest in the subject was caught again recently by a story on BBC News, and by the subsequent appearance of the same story on Slashdot, concerning a graphical programming language specifically aimed at younger children.

Programming is quite a strange discipline. It is the laying down of logical, unbreakable rules and actions to arrive at a predictable and reliable answer. However, it also requires, more often than not, ingenuity and creativity in order to achieve those aims. In some respects, it is similar to the cutting-edge of mathematical research (and, in fact, mathematics is probably the most complementary subject to computer programming, with the possible exception of electronic circuit design).

The problem with teaching programming is not that children cannot be brought to understand the subject, or even that the tools aren't available for them... most children of an age where they can comprehend what programming is are capable of actually programming. The problem is, in fact, the established complexity of the systems that they use each day and the over-simplification of their given tools. Additionally, the fast change of consensus on a "suitable" language stifles the teaching of programming, when in fact every programming language proposed has problems whether they be from an over-simplification of programming or a steep learning curve.

Also, the program which dumbs down programming to the point where children can "understand it" is probably one of the more complex programs to write. The abstraction required to allow a child to just drag-and-drop items into a graphical interface and thereby create a set of instructions for the computer to follow relies on decades of other people's code in order to display itself, allow the child to manipulate the mouse, and do so in a high-performance manner on a modern machine.

This is, in some way, similar to mathematics again... we all get told that it's pi-r-squared but finding out WHY takes more advanced mathematics to discover. Unfortunately, the modern consensus is that we have to assume that "it just is" for several years before we can get to the stage where we can work out exactly why. This doesn't present a problem in teaching, however, as almost all teaching comes from the concept of "good enough for now" or "lies for kids".
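For the curious, the "why" can be sketched in a line once integration is available. This is only one of several standard derivations, and it takes the circumference formula as given: slice the circle into thin concentric rings of radius $\rho$ and thickness $d\rho$, each with an area of roughly $2\pi\rho\,d\rho$, and add them all up.

```latex
A = \int_0^r 2\pi\rho \, d\rho = 2\pi \left[ \frac{\rho^2}{2} \right]_0^r = \pi r^2
```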

Teachers are also drawn into the same trap that stops children from being able to grasp programming: You can teach children for-loops and variables to your heart's content... the fact is that they will not be able to effectively link those concepts with the word-processor that loads in a fraction of a second with a thousand different features or their favourite 3D game that simulates realistic physics without a complete understanding of some quite complex mathematical principles. Most teachers, if pressed, will happily explain that "it all gets turned into 1's and 0's" (thereby introducing even more questions about how 1's and 0's can look like Mario running across a platform in time to their controller movements) but few can explain to a child's satisfaction just how it does that.

And when the children attempt to reproduce that 3D game in BASIC or Logo, they will be sadly disappointed that the best they can manage is a few hundred lines of code to make some circles appear at random locations on the screen (but it would only be pseudo-random, which is based on polynomial iteration, which requires advanced mathematics... and the circles would be only grid-based representations of the formula for the graph of a circle, which requires trigonometry to grasp completely... the traps are always there). Yes, there are such things as Dark BASIC, but again we're just abstracting away absolutely everything into a black box that "just works", rather than letting the children find out for themselves in due course.
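To give a flavour of what hides behind "random locations", here is a minimal sketch in Python of a linear congruential generator, one common textbook way of producing pseudo-random numbers by iterating a simple formula. The multiplier and increment are the widely published Numerical Recipes constants; the function names and the 320x256 screen size are my own assumptions, purely for illustration.

```python
def lcg(seed):
    """Yield an endless stream of pseudo-random integers in [0, 2**32)."""
    state = seed
    while True:
        # The whole "randomness" is this one line: multiply, add, wrap around.
        state = (1664525 * state + 1013904223) % 2**32
        yield state


def random_circles(seed, count, width=320, height=256):
    """Return `count` (x, y) circle centres on a hypothetical screen."""
    numbers = lcg(seed)
    centres = []
    for _ in range(count):
        # Use the higher bits; the low bits of this kind of generator
        # repeat in short cycles.
        x = (next(numbers) >> 16) % width
        y = (next(numbers) >> 16) % height
        centres.append((x, y))
    return centres
```

The same seed always gives the same "random" circles, which is exactly the point: nothing here is random at all, and a child who asks why is already asking a mathematics question.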

That's not to say that there isn't some effort being put into teaching children programming, or even that that effort is failing. There are some very effective tools to demonstrate programming in a context where even the youngest child can interact and play with a system that they can understand. The best example would probably be the Lego Mindstorms kits - some building blocks and motors can be turned into anything once a child has learned how to make the control blocks do what he wants them to do. A common first foray into the programming world for many primary school children today is a Lego Mindstorms kit that they first build and then program to achieve an aim - whether it be controlling the traffic lights at a busy "junction" or raising and lowering a gate on a railway crossing.

Unfortunately, some of the Lego software is abysmal to understand at first glance, even for a seasoned programmer. It has clunky interfaces, unintuitive icons and "settings" and an extremely limited instruction set, all of which are supposed to help the child understand.

When all you have to work with is "Output X on", "Output X off", "Wait X seconds", "Wait for input X to reach Y" arranged in blocks where the syntax is horribly restrictive and forces you (from a logical and interface point of view) to do things such as "wait for input NONE to reach NONE" on dozens of instructions where it would just be more intuitive to introduce some concept of our old friend IF instead, programming starts to lose its fun.

Rather than forcing the logical equivalent of null IF statements onto every line of code, a much better idea would be to merely teach the children how to program the old-fashioned way - groups of English-like instructions, executed strictly in order. Unfortunately, the languages of today all have syntaxes which require pixel-perfect placement of semicolons, parentheses and quotation marks in order to write even a simple Hello World program. And although there are ways and means around that (such as seriously relaxing the syntax), people seem to have the idea that having to use WORDS is somehow dirty, when it comes to programming.
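As a rough illustration of what "seriously relaxing the syntax" could look like, here is a minimal sketch of an interpreter for a made-up, English-like toy language. The three commands (`say`, `set ... to ...`, `show`) are entirely hypothetical; the point is only that capitalisation, stray spaces and a trailing full stop or semicolon need not be errors.

```python
def run(program):
    """Run a list of English-like instructions, strictly in order."""
    output = []
    variables = {}
    for line in program:
        # Forgive leading/trailing spaces and a trailing "." or ";".
        words = line.strip().rstrip(".;").split()
        if not words:
            continue  # blank lines are simply skipped
        command = words[0].lower()  # SAY, Say and say all mean the same
        if command == "say":
            output.append(" ".join(words[1:]))
        elif command == "set" and len(words) == 4 and words[2].lower() == "to":
            variables[words[1].lower()] = int(words[3])
        elif command == "show" and len(words) == 2:
            output.append(str(variables[words[1].lower()]))
        else:
            raise ValueError(f"I don't understand: {line!r}")
    return output
```

Calling `run(["SAY Hello World;", "set Score to 10", "  show score."])` collects the two lines of output in order, without the child ever having had to care where the semicolon went.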

As a youngster, I was only ever officially taught (in this order) Logo, BBC BASIC and Java. Bearing in mind that the first was in infant school (early 80's) and the last on my degree course (early 00's), that's a huge and slow gap to a programming language of practical use (i.e. one that I would have available at home, one that I could make useful programs in that I could use each day, one that I would be able to swap, create and share programs with other people). It was only by my own experimentation that I was able to learn other languages myself and I was programming competently in the last two of those languages before my teachers even mentioned them. For my A-levels, I actually taught my own classmates for several lessons because the teacher believed I could do a better job than he, having been programming in the course language longer than he had been programming.

Logo provided graphical (or physical) response, but all code was textual. BBC BASIC *could* be graphical but was mainly textual and all code was textual. Java, again, *could* be graphical but all code was, again, textual. However, a lot of modern-concept educational programming revolves around getting away from actual words (the Lego Mindstorms "language" being a perfect example). Textual code is not something to be scared of. In fact, in other subjects, we treat textual lists of instructions as vital. And isn't that, when you get down to it, all a computer program is? We need to bring back the old-fashioned text languages, without the historical baggage of text-mode editors, strange delimiters and line terminators.

To clarify, I'm not seriously suggesting that learning C++ at a young age would have helped me or any child. That's a ludicrous assumption, similar to saying that being taught quantum mechanics as a child would help me understand Newtonian physics - of course it would but the complexities of teaching at (and learning at) that sort of level would far outweigh the benefits of such an education. My own teachers struggled with every language I had to be taught, until I hit university. There's no way on Earth that we could expect teachers to teach at the levels required in order to allow that to happen, at least not in the foreseeable future.

What's needed is not a *primarily* graphical system. What's needed is a way to easily construct lists of plain-English instructions without needing to worry about perfect spelling, excellent grammar (the absence of which is, in programming, a vital learning experience) and lots of typing. Drag-and-drop keywords are a good start.

The language itself, however, needs to be less strict or, at a minimum, not allow beginners to make most of the classic mistakes. You can't misspell a dragged-and-dropped keyword and with proper graphical interfaces you are able to show scope, loops etc. effectively without having to worry how many invisible tabs you have inserted on the left of the screen.

Data-types, although important, do not need to be set in stone. Children SHOULD learn the difference between a string and an integer variable as it is of critical importance at each stage of programming. Visual Basic's "Variant" does an excellent job of masquerading as multiple types simultaneously but introduces new problems in itself because certain distinctions are not made. However, a simple indicator of Constant vs Variable, String vs Number, should always be included.
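To show why the String vs Number distinction matters at every stage, here is a small sketch using Python as a stand-in for the hypothetical teaching language. The `kind_of` helper and its "Number"/"String" labels are my own invention, meant to mimic the simple indicator described above.

```python
def kind_of(value):
    """Return the simple label a beginner-friendly editor might display."""
    if isinstance(value, str):
        return "String"
    if isinstance(value, (int, float)):
        return "Number"
    return "Something else"


score_as_number = 7
score_as_text = "7"

# Same-looking values, very different behaviour:
added = score_as_number + 3    # arithmetic on a Number
joined = score_as_text + "3"   # joining two Strings end to end
```

Here `added` is the number 10 while `joined` is the text "73", and a child who has seen the two labels side by side has a fighting chance of working out why.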

The programmer should also be responsible for creating the variables that they are going to use. Variable scope is vital knowledge. If this means clicking a new button and selecting a type, be it "Number" or "Animated 3D Penguin Character", the kids should be the ones who create him, the ones that drag him in each time he's needed and the ones who (critically) should remove him from the project if he's no longer necessary. This means not only in whatever GUI the programming language is set in but into the actual program itself. They should insert the Animated 3D Penguin not only into the project but also into the correct parts of the program to "create" and "destroy" him at the right times.

[[ On a side note: They should also be able to name him whatever they like first... they are kids after all. In fact, he shouldn't be able to be created without being given an assigned name by the programmer. No "Animated 3D Penguin Character #1" etc. ]]

Ideally, his birth and death should be graphical each time the program is run. This will teach correct construction/destruction techniques (because Pingu just popping into the screen half-way through the game isn't good but creating him at the start and just keeping him off-screen until needed is a good way to learn - similarly this will also show the brighter kids that you don't need to create everything at the start and slow the whole system down, if you manage the entrance and exit of data effectively).
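A minimal sketch of the create/use/destroy idea, again in Python and again under my own assumptions: the `Penguin` class, its messages and the `play_game` function are all hypothetical. The penguin cannot exist without a name, and his arrival and departure are both visible events in the running program.

```python
class Penguin:
    """A hypothetical stand-in for the 'Animated 3D Penguin Character'."""

    def __init__(self, name, log):
        if not name:
            # No anonymous "Penguin #1": the programmer must name him.
            raise ValueError("A penguin must be given a name")
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append(f"{self.name} hatches")       # visible "birth"
        return self

    def __exit__(self, exc_type, exc, traceback):
        self.log.append(f"{self.name} waddles off")   # visible "death"
        return False  # do not hide any errors raised while he was alive

    def giggle(self):
        self.log.append(f"{self.name} giggles")


def play_game():
    """Create the penguin only for the part of the game that needs him."""
    log = ["game starts"]
    with Penguin("Bob", log) as bob:
        bob.giggle()
    log.append("game ends")
    return log
```

The `with` block is the programmer's explicit statement of where Bob lives: he is created on entry, removed on exit, and the log shows both moments in order.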

Abstraction of complex functions (3D libraries, for example) also helps the programming become interesting much more quickly (I can remember the wow-factor of my first coloured circle on a black background but children today would hardly be impressed with that). However, many of these can, in fact, be expressed in the language itself (albeit complexly).

Most importantly, a language should be complete and self-hosting... that is that each part of it should be capable of being written in itself. The beginners will wonder how this is possible but this is true for almost all programming languages. Except, strangely, the educational ones. By hiding the complexity within the language itself, you strip away the hard work into black boxes.

For instance, the "Create a new penguin called Bob" instruction should not be the end of the matter. You should be able to drill down. That instruction should have the capability to be broken into several hundred parts, each of more complex and primitive instructions, but all in the same "language". The hidden properties of him should be there somewhere... accessible to the more advanced children who want to do something that the basic levels of the languages don't allow.

A multi-tiered GUI is the perfect way of doing this. On startup, you have your "Penguin Bob" buttons but if you "break" them, they split into the code that creates Penguin Bob, in a simple-enough set of further instructions. And you can break them multiple times, into ever-more-complex code until you hit some equivalent of assembly language... most probably some sort of sandboxed, interpreted code similar to Java. Why? Because this is how ALL code works... the BASIC interpreter is probably written in C. The C code would be executed as machine code. Everything is based on a more complex underlying level. And as an extra added bonus, this means that the language that kids learn in Year 5 to make Penguin Bob run across the screen and giggle is still the same language when they hit Year 11, 12 or 13 and have to submit an independently produced program of a highly complex nature (for instance, their own word-processor).
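The "break" operation can be sketched very simply. In this hypothetical Python example, every instruction is either a primitive or is defined in terms of other instructions in the same vocabulary; the instruction names and the `DEFINITIONS` table are made up for illustration, and a real system would obviously go many levels deeper.

```python
# Each high-level instruction is defined using lower-level instructions
# from the same "language". Anything not listed here is a primitive.
DEFINITIONS = {
    "create penguin Bob": ["load penguin model", "name it Bob", "place it on screen"],
    "load penguin model": ["read model file", "build mesh"],
    "place it on screen": ["set position", "draw"],
}


def break_down(instructions, times=1):
    """Expand each instruction into its definition, `times` levels deep."""
    for _ in range(times):
        expanded = []
        for instruction in instructions:
            # Primitives have no definition, so they expand to themselves.
            expanded.extend(DEFINITIONS.get(instruction, [instruction]))
        instructions = expanded
    return instructions
```

Breaking `["create penguin Bob"]` once gives the three steps beneath it; breaking it twice gives five more primitive steps, with "name it Bob" left untouched because there is nothing below it in this toy table.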

It doesn't have to be fast... in fact it SHOULDN'T be, for the basic levels. This teaches the users quickly that unnecessarily looping ten thousand times over a hundred lines of code is bad practice. And when they grow up and are able to break the code down to its components they will not only see WHY but be able to generate much more efficient code that is functionally equivalent.
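A deliberately tiny illustration of "functionally equivalent but much more efficient", in Python. The task (adding up the whole numbers from 1 to n) and both function names are my own choice of example, not anything from a particular teaching language.

```python
def total_by_looping(n):
    """Add 1 + 2 + ... + n the obvious way: one step per number."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def total_by_formula(n):
    """Get the same answer in one step, using n * (n + 1) / 2."""
    return n * (n + 1) // 2
```

Both give the same result for any non-negative whole number n, but the first takes ten thousand trips round the loop when n is 10000 and the second takes none. On a slow interpreter, that difference is something a child can actually feel.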

With widespread use, it would also encourage the greatest principle in programming that has been present since the first line of BASIC hit our eyes... collaboration. Libraries of code. Open source. Those BASIC listings in ancient computer magazines. All of them were about collaboration and sharing of code. Imagine if your GCSE project in Year 11 was to create a documented, stable, fast set of replacement functions that the kids in Year 7 would use to make Penguin Bob do a somersault.

There also needs to be real, physical feedback. A coloured square on a screen means nothing anymore - especially when it takes £2000 of computer to do it, from a child's perspective. Logo (and some of its later clones, such as RoamerWorld) knew this... many a British child grew up by slamming a glass-domed turtle into the teacher's ankles. Lego Mindstorms spotted this. To incite enthusiasm for programming, there needs to be a response... not just from a computer screen. Things such as PIC-chips and simple robotic circuits make this a breeze.

And, with something along the lines of RoboWar, where students could compete in a school-wide championship to generate the "best" robot by programming him, you could involve the whole school. In one fell swoop, you could have programming on multiple levels (from all levels of AI to sandboxing), electronics, control, communication and design (by applying the "Robot Wars" principle to a full-scale, physical version of each robot), competition and publicity.

Each educational language seems to have one or two brilliant ideas and then to fall flat on its face in all other aspects. Considering that computers are ever-more complex, ever-more prevalent and are not going to be going anywhere in the foreseeable future, our children should have an understanding of exactly the principles that underlie them.

They will learn WHY computers cannot be trusted to run the world without an awful lot of testing (because of the absence of "intelligent" computers so often seen in Hollywood), and they will no longer have to rely on multinational companies to write a simple letter. ICT is already compulsory in many aspects in almost every school lesson, and it creeps into our lives more and more each day. Unfortunately, nobody is being taught how to make all this technology do our bidding, instead relying on a few good people to do everything for us.