Friday, April 18, 2014

The difference between code and data is Turing completeness

In principle there seems to be no difference between code and data, because the same sequence of bits can be treated as a sequence of numbers or a piece of code. It is more accurate to say that the same sequence of bits can be treated as code or data by different programs.

For example, we could have a program A that treats its input as a sequence of numbers and computes the sum of these numbers, and program B that treats the input as a piece of code and executes it. The important difference between programs A and B is that we can make program B carry out any possible computation by giving it a suitable input, while program A can only compute sums.

On the spectrum between these two extremes, we can find programs whose behavior is more influenced by their input than program A, but easier to analyze than program B. These programs correspond to models of computation that are stronger than finite state machines, but weaker than Turing machines.

For example, if we run a syntax highlighting program on a piece of code, we probably can't make it carry out an arbitrary computation. In this way, a syntax highlighting program treats the input as data. If we give the same piece of code to a compiler, is is treated less like data and more like code, especially if the programming language it's written in allows some compile-time computation. And when we finally execute the program, it is treated as code by the interpreter or the operating system.

Now you can tell the difference between code and data in any software system. If the data can strongly influence the program used to process it, and make it carry out an arbitrary computation, then it's code.

No comments:

Post a Comment