A programming language is an artificial language used mainly for giving instructions to computers; a secondary use is for the precise specification of algorithms. A group of instructions for a particular purpose is known as a computer program.
Compared with human languages (e.g. English, French, or Spanish), programming languages are much smaller, much simpler, and more precise. Most programming languages consist of a mixture of English words and mathematical notation; very few programming languages have a base vocabulary exceeding 60 words, although they may also have utility libraries with hundreds of entries which provide optional extra functionality.
Many thousands of different programming languages have been defined, though only a few dozen are in widespread use. The first publicly available programming language was developed in 1951 by Grace Hopper. In 1948 Zuse published a paper (in "Archiv der Mathematik") about an early programming language, Plankalkül, though this was not actually implemented until decades later (a first compiler in 1975, with further implementations in 1998 and 2000).
Describing a programming language
While human languages such as English may have rules of grammar, there are almost always exceptions to these rules. To reduce ambiguity, most programming languages have their grammar or syntax rigidly defined by a relatively small number of rules, with no exceptions. In many cases some variation of a context-free grammar is used, often Backus-Naur Form, or some extension thereof. Note that context-free grammars are not powerful enough to describe human languages such as English.
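As an illustration of how a handful of rules can define a syntax completely, the sketch below gives a tiny grammar for arithmetic expressions in a BNF-like notation (in the comments) together with a recursive-descent recogniser for it. The grammar, the token pattern, and all function names are invented for this example and are not taken from any real language definition.

```python
import re

# Grammar (BNF-like, written for this example only):
#   <expr>   ::= <term>   { ("+" | "-") <term> }
#   <term>   ::= <factor> { ("*" | "/") <factor> }
#   <factor> ::= NUMBER | "(" <expr> ")"

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into numbers and single-character symbols."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_expr(tokens, i=0):
    """<expr>: one term, then any number of (+|-) term. Returns the next index."""
    i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in ("+", "-"):
        i = parse_term(tokens, i + 1)
    return i

def parse_term(tokens, i):
    """<term>: one factor, then any number of (*|/) factor."""
    i = parse_factor(tokens, i)
    while i < len(tokens) and tokens[i] in ("*", "/"):
        i = parse_factor(tokens, i + 1)
    return i

def parse_factor(tokens, i):
    """<factor>: a number, or a parenthesised expression."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[i].isdigit():
        return i + 1
    if tokens[i] == "(":
        i = parse_expr(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise SyntaxError("expected ')'")
        return i + 1
    raise SyntaxError(f"unexpected token {tokens[i]!r}")

def is_valid(text):
    """True if the whole input is derivable from <expr>."""
    try:
        tokens = tokenize(text)
        return parse_expr(tokens) == len(tokens)
    except SyntaxError:
        return False

print(is_valid("2 * (3 + 4)"))  # True
print(is_valid("2 + * 3"))      # False
```

Each grammar rule corresponds to one function, so the three rules account for every input the recogniser accepts or rejects.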
The meaning or semantics of any particular construct in a programming language is most commonly described in fairly precise English. Formal techniques for describing semantics have been developed (for example operational, denotational, and axiomatic semantics), but they are seldom used in the official definitions of widely used languages.
The more important programming languages have an internationally agreed official standard (ISO is the International Organization for Standardization; ANSI is the standards body in the United States). In theory, programs which conform to the relevant standard should be usable on most types of computer; this works in practice most of the time, provided that the programmers have been careful to avoid machine dependencies. Such agreed standards are usually revised every 5 to 15 years, often with new features being added, and sometimes with old features being either removed or marked as obsolete.
Very few programmers learn a programming language directly from the formal definition, which exists primarily as a reference which can be consulted in cases of doubt. There are usually plenty of books and training courses for the more common programming languages.
The art of programming
The act of writing a computer program using a programming language is called programming. While there are many guidelines, there are no rules which are guaranteed to produce a correct and usable program, so programming is very much an art or skill rather than a science. Furthermore, a programmer must pay meticulous attention to detail: a moderately large program of, say, 150,000 instructions must be completely free of punctuation errors, grammatical errors, and spelling mistakes, and must also be largely free of logical errors. As a size comparison, a 400-page paperback book contains about 16,000 lines, roughly one-tenth the size.
Various studies have found that programmers with a comparable amount of experience may differ in skill levels by a factor of 10 or more.
Classifying programming languages
The initial distinction is between low-level languages and high-level languages.
Low-level languages are generally referred to as assembly languages. Most instructions in such languages equate directly to a single machine instruction. Each machine architecture has its own assembly language, so programs written in such languages are generally very machine-specific and are not portable. The instructions in such languages are usually fairly cryptic, often using abbreviations such as ADD, JNE, LR, LI, etc. In the early days of computing (up to about the late 1950's) most programming was done in assembly language, partly due to the limited power and memory capacity of early computers.
High-level languages generally use a mixture of English words and simple mathematical notation, and are often available with little or no change on many different types of computers. These are often referred to as 3rd-generation languages (the 1st generation was raw binary machine code, the 2nd was assembly language). A single high-level statement may be equivalent to using 3 to 10 assembly language statements, so programming in a high-level language can be much more productive. Example statements include:
X = 5 + Y*3
if result < 3 then stop
print "Hello world"
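To give a rough feel for how one high-level statement expands into several lower-level steps, the sketch below asks Python to list the instructions it generates for a statement like the first example above. This is only an analogy: Python produces bytecode for its own virtual machine rather than assembly language for a real processor, and the exact instructions and their number differ between Python versions.

```python
import dis

# One high-level statement, modelled on the first example above.
source = "x = 5 + y * 3"

# Compile the statement and list the lower-level instructions it becomes.
instructions = list(dis.get_instructions(compile(source, "<example>", "exec")))
for ins in instructions:
    print(ins.opname, ins.argrepr)

print(len(instructions), "instructions for one statement")
```

The listing typically shows separate steps for loading each operand, multiplying, adding, and storing the result.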
The two main types of high-level languages are imperative and declarative, with the first type being far more common.
Declarative languages define relationships and facts, ask a question, and leave it to the underlying program implementation and its theorem prover to work out how to derive an answer. An example of such a language is Prolog (Programming in Logic). These languages can be effective in areas such as database analysis, symbolic mathematics, and language parsing.
When using an imperative language, the programmer is responsible for giving precise instructions as to exactly what has to be done and in what order. Note that some programs may have a random element included so that the results of a program are not necessarily predictable. One of the key reasons for the success of high-level languages is that they provide a level of abstraction (sometimes several such levels), which means that a programmer can concentrate more on the problem to be solved and less on the detail of how to solve it.
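The difference between the two styles can be sketched with a small example that finds grandparents from a set of parent facts. Python is itself an imperative language, so the second function only imitates the declarative style by stating the relationship and leaving the iteration order to the implementation; the names and the data are invented for this illustration.

```python
# Facts: (parent, child) pairs. Hypothetical data for the example.
parents = {("alice", "bob"), ("bob", "carol"), ("bob", "dave")}

def grandparents_imperative(pairs):
    """Spell out each step and its order: loop, compare, collect."""
    result = []
    for (a, b) in sorted(pairs):
        for (c, d) in sorted(pairs):
            if b == c:
                result.append((a, d))
    return result

def grandparents_declarative(pairs):
    """State the relationship: a is a grandparent of d if a is a parent
    of some b and that b is a parent of d."""
    return {(a, d) for (a, b) in pairs for (c, d) in pairs if b == c}

print(grandparents_imperative(parents))
print(sorted(grandparents_declarative(parents)))
```

Both functions find the same two pairs; in a language such as Prolog only the relationship would be written down, and the search would be left entirely to the language implementation.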
Early high-level programming languages were often intended for use in a particular problem domain (e.g. COBOL for business purposes, Fortran and Algol 60 for scientific calculations). Other programming languages were originally intended for teaching purposes (e.g. BASIC and Pascal). The language C was originally intended for systems programming (writing operating systems and compilers).
Successful programming languages evolve over time to become more general-purpose, or form the basis for other languages (e.g. C -> C++). The evolution often involves incorporating good features from other languages. The general trend is for languages to become more general-purpose and provide more levels of abstraction. The concepts of structured programming were put on a sound theoretical basis in the early 1970's and these have also influenced the evolution of some programming languages (e.g. Fortran 66 -> Fortran 77). The idea of objects (combining data with the applicable operations) was available in Simula in the mid 1960's, but did not become popular until the advent of C++ in the early 1980's and Java in the mid 1990's. Confusingly, Java and C++ have a lot of syntax in common, but the meaning (semantics) of code that looks the same when written down may differ drastically.
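The idea of an object, data combined with the operations that apply to it, can be sketched as follows. The class and its rules are hypothetical and chosen only to keep the example short.

```python
class BankAccount:
    """A hypothetical account: the balance (data) travels with its operations."""

    def __init__(self, balance=0):
        self._balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def withdraw(self, amount):
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount

    @property
    def balance(self):
        # Read-only view of the data; changes go through the methods above.
        return self._balance

account = BankAccount(100)
account.deposit(50)
account.withdraw(30)
print(account.balance)  # 120
```

Code outside the class changes the balance only through deposit and withdraw, so the checks in those methods cannot be bypassed by accident.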
There are a variety of different programming paradigms (styles of writing programs). Some programming languages really only support a single paradigm while others may support several. Declarative programming has already been mentioned above. Other paradigms include: procedural, functional, event-driven, object-oriented and automata-based.
Implementation of programming languages
A program as written by a programmer in some programming language is said to be in the form of source code. Such source code cannot be used as it stands, but must first be processed by another program before it can be used on a computer.
In the case of assembly language the processing program is called an assembler; this has a relatively simple task since the instructions coded by the user correspond almost exactly to the actual machine code instructions to be used. The entire program is assembled prior to use.
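A minimal sketch of an assembler shows why the task is relatively simple: each line is looked up in a table and turned into the corresponding bytes. The machine here is entirely hypothetical (every instruction is one opcode byte followed by one operand byte, and the opcode values are invented), so this shows the principle rather than any real assembler.

```python
# Hypothetical machine: each instruction is an opcode byte plus an operand byte.
OPCODES = {"LI": 0x01, "ADD": 0x02, "JNE": 0x03, "HALT": 0xFF}

def assemble(source):
    """Translate assembly text into machine code, one line at a time."""
    machine_code = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue                        # skip blank lines
        mnemonic, *operands = line.split()
        operand = int(operands[0], 0) if operands else 0
        machine_code += bytes([OPCODES[mnemonic.upper()], operand & 0xFF])
    return bytes(machine_code)

program = """
LI 5        ; load the value 5
ADD 3       ; add 3 to it
HALT
"""
print(assemble(program).hex(" "))  # 01 05 02 03 ff 00
```

Real assemblers also handle labels, several operand formats, and variable-length instructions, but the one-to-one mapping from mnemonic to opcode is the core of the job.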
There are three types of program for dealing with high-level programming languages:
- compiler, which converts the entire program to machine code prior to use.
- interpreter, which examines the source program line by line, works out what has to be done and does it.
- translation to some intermediate code, which can then either be interpreted or converted to machine code on the fly.
Programs which have been completely converted to machine code prior to use may easily run 10 times faster than programs which are interpreted.
There are no fixed rules as to which programming languages are processed in which way, and indeed most languages can be handled in any of these three ways. Many languages have been implemented in two or more of these ways. Traditionally compilers have been used for languages where performance is important (e.g. Fortran, C), interpreters have been used for languages where convenience of use is important (e.g. most scripting and command languages), and translation to intermediate code has been used when machine-independence is considered to be important (e.g. Java).
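The third approach listed above, translation to an intermediate code which is then interpreted, can be sketched for simple arithmetic expressions. This example borrows Python's own parser (the standard ast module) as the front end; the stack-machine instruction names PUSH, ADD, SUB, and MUL and the two functions are invented for the illustration.

```python
import ast

def translate(source):
    """Translate an arithmetic expression into stack-machine instructions."""
    ops = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in ops:
            emit(node.left)                       # operands first...
            emit(node.right)
            code.append((ops[type(node.op)], None))  # ...then the operator
        else:
            raise ValueError("unsupported construct")

    emit(ast.parse(source, mode="eval").body)
    return code

def run(code):
    """Interpret the intermediate code on a simple stack machine."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            if op == "ADD":
                stack.append(left + right)
            elif op == "SUB":
                stack.append(left - right)
            else:  # "MUL"
                stack.append(left * right)
    return stack.pop()

code = translate("5 + 2 * 3")
print(code)
print(run(code))  # 11
```

The intermediate code does not depend on any particular processor, which is why this approach suits machine-independence: only the small run function has to be provided for each type of computer.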
Most of the program code in use today is written in C, C++, COBOL, and Java. Other popular general-purpose languages include BASIC, Pascal, and C#. There are thousands of programming languages, although many of them are simply variants of others. Variants are called dialects. Thus, Visual Basic is a dialect of BASIC.
The following is a sample list of programming languages.