A programming language is an artificial language used mainly for giving instructions to computers; a secondary use is the precise specification of algorithms. A group of instructions for a particular purpose is known as a computer program.
When compared with human languages (e.g. English, French, or Spanish), programming languages are very much smaller, much simpler, and more precise. Most programming languages consist of a mixture of English words and mathematical notation; very few programming languages have a base vocabulary exceeding 60 words, although they may also have utility libraries with hundreds of entries which provide optional extra functionality.
Many thousands of different programming languages have been defined, though only a few dozen are in widespread use. The first publicly available programming language, A-0, was developed in 1951 by Grace Hopper. In 1948 Konrad Zuse published a paper (in "Archiv der Mathematik") about an early programming language, Plankalkül, though this was not actually implemented until 1998.
Describing a programming language
While human languages such as English may have rules of grammar, there are almost always exceptions to these rules. To reduce ambiguity, most programming languages have their grammar or syntax rigidly defined by a relatively small number of rules, with no exceptions. In many cases some variation of a context-free grammar is used, often Backus-Naur Form, or some extension thereof. Note that context-free grammars are not powerful enough to describe human languages such as English.
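As a rough illustration of how a handful of rules can pin down a syntax with no exceptions, the sketch below states a hypothetical three-rule grammar for arithmetic expressions in a BNF-like notation (in the comments) and checks input against it with a small hand-written recursive-descent parser. The grammar and the names `tokenize`, `parse_expr`, `parse_term`, `parse_factor`, and `is_valid` are assumptions made for this example; they are not taken from any language standard.

```python
import re

# Hypothetical grammar in a BNF-like notation (illustration only):
#   <expr>   ::= <term>   { ("+" | "-") <term> }
#   <term>   ::= <factor> { ("*" | "/") <factor> }
#   <factor> ::= NUMBER | "(" <expr> ")"

def tokenize(text):
    """Split the input into numbers, operators, and parentheses."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    # Reject input that contains characters the grammar does not know.
    if "".join(tokens) != text.replace(" ", ""):
        raise SyntaxError("unexpected character")
    return tokens

def parse_expr(tokens, pos):
    """<expr>: a term followed by any number of +/- terms."""
    pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "+-":
        pos = parse_term(tokens, pos + 1)
    return pos

def parse_term(tokens, pos):
    """<term>: a factor followed by any number of * or / factors."""
    pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "*/":
        pos = parse_factor(tokens, pos + 1)
    return pos

def parse_factor(tokens, pos):
    """<factor>: a number, or a parenthesised expression."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok.isdigit():
        return pos + 1
    if tok == "(":
        pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

def is_valid(text):
    """Return True if the whole input matches <expr>, otherwise False."""
    try:
        tokens = tokenize(text)
        return parse_expr(tokens, 0) == len(tokens)
    except SyntaxError:
        return False
```

Each grammar rule maps onto one function, which is why the parser either accepts an input completely or rejects it; there is no room for the "usually correct" cases found in human grammar.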
The meaning or semantics of any particular construct in a programming language is most commonly described in fairly precise English. Formal methods of describing semantics exist (for example operational, denotational, and axiomatic semantics), but few widely used languages are defined this way.
The more important programming languages have an internationally agreed official standard (ISO is the International Organization for Standardization; ANSI is the standards body in the United States). In theory, programs which conform to the relevant standard should be usable on most types of computer; this works in practice most of the time, provided that the programmers have been careful to avoid machine dependencies. Such agreed standards are usually revised every 5 to 15 years, often with new features being added, and sometimes with old features being either removed or marked as obsolete.
Very few programmers learn a programming language directly from the formal definition, which exists primarily as a reference which can be consulted in cases of doubt. There are usually plenty of books and training courses for the more common programming languages.
The art of programming
The act of writing a computer program using a programming language is called programming. While there are many guidelines, there are no rules which are guaranteed to produce a correct and usable program, so programming is very much an art or skill rather than a science. Furthermore, a programmer must pay meticulous attention to detail: a moderately large program of, say, 150,000 instructions must be completely free of punctuation errors, grammatical errors, and spelling mistakes, and must also be largely free of logical errors (as a size comparison, a 400-page paperback book contains about 16,000 lines, roughly 1/10th the size).
Various studies have found that programmers with a comparable amount of experience may differ in skill levels by a factor of 10 or more.
Classifying programming languages
Some programming languages are called scripting languages, or command languages, when they are interpreted by an operating system shell or used as a way of programming other applications. However, the distinction between languages described with these different terms is largely arbitrary. Although scripting and command languages are usually interpreted and have smaller grammars than other programming languages, there is nothing inherent about them that prevents them from being compiled instead of interpreted. Nor do they always act as an adjunct to other software, as some of them can be used to write stand-alone programs. Further, some languages not labelled as "scripting languages", such as BASIC, are interpreted, and versions of BASIC and Pascal have been used as scripting adjuncts to other software. In other words, the distinction is primarily one of usage and nothing inherent to the language itself. The most common languages used primarily for scripting are JavaScript, PHP, and Lua.
Low-level vs High-level
Programming languages can be categorized by whether they are low-level or high-level. They can also be referred to by "generation". 1st-generation languages are the machine code of the CPU. 2nd-generation languages are the assembly languages. 3rd-generation languages are what most programmers use in the modern world and include such things as BASIC, C++, Pascal, and PHP. 4th-generation languages are domain-specific and offer productivity gains for programmers in those domains, but their narrow focus makes them unsuitable for solving problems outside of their specific domain. 1st- and 2nd-generation languages are categorized as low-level, while 3rd- and 4th-generation are categorized as high-level. Each generation abstracts the programming concepts further away from the hardware that runs the program. However, concepts and capabilities often cross these generational boundaries over time.
Here's an example of some 3rd-generation (Pascal) code and what the equivalent code looks like in 2nd-generation (assembly):
In Pascal:
P := Pos( HT, S ) + 1 ;
while( S[ P ] <> HT ) do inc( P ) ;
In assembly:
lea eax,[ebp-$918]
mov edx,[ebp-$10]
call @UStrFromLStr
mov edx,[ebp-$918]
mov ecx,1
mov eax,$62f890
call Pos
inc eax
mov [ebp-$8],eax
jmp A
loop: inc dword ptr [ebp-8]
A: mov eax,[ebp-$10]
mov edx,[ebp-8]
cmp byte ptr [eax+edx-1],9
jnz loop
Low-level languages
Low-level languages are generally referred to as assembly languages. Most instructions in such languages equate directly to a single machine instruction. Each machine architecture has its own assembly language, so programs written in such languages are generally very machine-specific and are not portable to different hardware. The instructions in such languages are usually fairly cryptic, often using abbreviations such as ADD, JNE, LR, LI, etc. In the early days of computing (up to about the late 1950s) most programming was done in assembly language, partly due to the limited power and memory capacity of early computers.
High-level languages
High-level languages generally use a mixture of English words and simple mathematical notation, and are often available with little or no change on many different types of computers, depending on what compilers are available on which computers. A single high-level statement may be equivalent to 3 to 10 assembly language statements, so programming in a high-level language can be much more productive. One of the key reasons for the success of high-level languages is that they provide a level of abstraction (sometimes several such levels), which means that a programmer can concentrate more on the problem to be solved and less on the detail of how to solve it.
There are a variety of different programming paradigms (styles of writing programs). Some programming languages really only support a single paradigm while others may support several. Possible paradigms include: imperative, declarative, procedural, functional, event-driven, object-oriented, list processing, and automata-based.
Languages associated with computers can also be classified as to whether or not they are Turing-complete, i.e. whether or not they are capable of expressing all possible computations (assuming enough time and memory). Examples of languages which are not Turing-complete include markup languages such as HTML and XML, and SQL in its basic form (without recursive queries or procedural extensions).
Example statements in a typical high-level language include:
X = 5 + Y*3
if result < 3 then stop
print "Hello world"
Imperative programming
Imperative programming, which includes procedural programming, is the most common style of programming, and many of the other paradigms are derived from or build on it. When using an imperative language, the programmer is responsible for giving precise instructions as to exactly what has to be done and in what order. Note that some programs, such as games, may have a random element included so that the results of a program are not necessarily predictable.
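A minimal sketch of the imperative style, written in Python: every step and its order is spelled out by the programmer. The function name `sum_of_even_squares` and the task itself are made up for illustration.

```python
def sum_of_even_squares(numbers):
    """Add up the squares of the even numbers, one explicit step at a time."""
    total = 0                   # 1. start with an empty running total
    for n in numbers:           # 2. visit each number in order
        if n % 2 == 0:          # 3. decide whether this number counts
            total += n * n      # 4. update the running total
    return total                # 5. hand back the final result
```

For example, `sum_of_even_squares([1, 2, 3, 4])` adds 4 and 16. Changing the order of the steps (say, returning before the loop) would change the result, which is the defining trait of this style.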
Early high-level programming languages were often intended for use in a particular problem domain:
- COBOL for business purposes,
- Fortran and Algol 60 for scientific calculations,
- BASIC and Pascal for teaching purposes,
- C for systems programming (writing operating systems and compilers),
- LISP for list processing and artificial intelligence.
Successful programming languages evolve over time to become more general-purpose, or form the basis for other languages (e.g. C -> C++). The evolution often involves incorporating good features from other languages. The general trend is for languages to become more general-purpose and provide more levels of abstraction. The concepts of structured programming were put on a sound theoretical basis in the early 1970s and these have also influenced the evolution of some programming languages (e.g. Fortran 66 -> Fortran 77). The idea of objects (combining data with the applicable operations) was available in Simula in the mid 1960s, but did not become popular until the advent of C++ in the early 1980s and Java in the mid 1990s; confusingly, Java and C++ have a lot of syntax in common, but the meaning (semantics) may differ drastically for what looks the same when written down.
Declarative programming
Declarative languages define relationships and facts. The programmer asks a question and leaves it to the underlying language implementation and its theorem prover to work out how to derive an answer. An example of such a language is Prolog (Programming in Logic). These languages can be effective in areas such as database analysis, symbolic mathematics, and language parsing.
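Python is not a logic language and has no theorem prover, so the following is only a loose analogy for the flavour of facts, rules, and queries. The facts, the names in them, and the function `grandparents_of` are invented for this sketch; a real Prolog system would search for answers itself rather than rely on a hand-written set comprehension.

```python
# Facts: (parent, child) pairs. All names are made up.
PARENT = {
    ("alice", "bob"),
    ("bob", "carol"),
    ("bob", "dave"),
}

def grandparents_of(person):
    """Rule: X is a grandparent of Z if X is a parent of some Y
    and that same Y is a parent of Z."""
    return {
        x
        for (x, y1) in PARENT
        for (y2, z) in PARENT
        if y1 == y2 and z == person
    }
```

The query `grandparents_of("carol")` states what is wanted (everyone related to carol through two parent links) rather than listing the steps for finding it.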
Implementation of programming languages
A program as written by a programmer in some programming language is said to be in the form of source code. Such source code cannot be used as is, but must first be processed by another program before it can be used on a computer.
In the case of assembly language the processing program is called an assembler; this has a relatively simple task since the instructions coded by the user correspond almost exactly to the actual machine code instructions to be used. The entire program is assembled prior to use.
There are three types of program for dealing with high-level programming languages:
- compiler, which converts the entire program to machine code prior to use.
- interpreter, which examines the source program line by line, works out what has to be done and does it.
- translation to some intermediate code, which can then either be interpreted or converted to machine code on the fly.
Programs which have been completely converted to machine code prior to use may easily run 10 times faster than programs which are interpreted.
There are no fixed rules as to which programming languages are processed in which way, and indeed most languages can be handled in any of these three ways. Many languages have been implemented in two or more of them. Traditionally compilers have been used for languages where performance is important (e.g. Fortran, C), interpreters have been used for languages where convenience of use is important (e.g. most scripting and command languages), and translation to intermediate code has been used when machine-independence is considered to be important (e.g. Java, C#).
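The third approach, translation to intermediate code that is then interpreted, can be sketched in a few lines. The example below is a toy under stated assumptions: it borrows Python's own `ast` module to do the parsing, handles only numbers with `+`, `-`, and `*`, and uses an instruction set (`PUSH`, `ADD`, `SUB`, `MUL`) invented for this illustration. The names `compile_expr` and `run` are likewise hypothetical.

```python
import ast

def compile_expr(source):
    """Translate an arithmetic expression into stack-machine instructions."""
    names = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in names:
            emit(node.left)       # operands first ...
            emit(node.right)
            code.append((names[type(node.op)], None))  # ... then the operator
        else:
            raise ValueError("unsupported construct")

    emit(ast.parse(source, mode="eval").body)
    return code

def run(code):
    """Interpret the intermediate code on a simple stack machine."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
            continue
        right = stack.pop()
        left = stack.pop()
        if op == "ADD":
            stack.append(left + right)
        elif op == "SUB":
            stack.append(left - right)
        elif op == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack.pop()
```

The instruction list returned by `compile_expr("5 + 2 * 3")` does not depend on the machine it was produced on, and any machine with a matching `run` can execute it; that separation is the reason this approach is chosen when machine-independence matters.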
Assorted Languages
Information about some 2,500 programming languages can be found in 'The Language List'.[1]
LangPop [2] published a ranking of the most popular programming languages as of 2012.
A similar list, which ranked Java as the most popular programming language, was published for 2011.[3]
According to Professor Alfred V. Aho [4], the most influential high-level programming languages have been (in approximate order of development):
- FORTRAN showed that a useful high-level language was possible.
- COBOL provided a language especially for business use.
- LISP provided a language suitable for symbol processing.
- Algol 60 was the first block-structured language and set a new standard in language definition (Backus-Naur form).
- Algol 60 was described by C. A. R. Hoare as an improvement not only on its predecessors but also on nearly all its successors.
- BASIC was the first language used interactively with multiple simultaneous users.
- Simula 67 was the first object-oriented language.
- C showed that operating systems and compilers could be written in a high-level language.
Basic Programming Concepts
Certain concepts and constructs are common to most, if not all, programming languages. The exact syntax and semantics will vary from language to language, but the basic concepts remain the same. Most languages have reserved keywords which define the vocabulary of the language. Identifiers are user-specified words that can be used to name variables, functions, constants, and more.
Comments
Comments are a construct used in programming languages to support documentation. Comments are ignored by the compiler/interpreter, and are solely for the use of humans reading the source code.
Variables
Variables are named storage places in memory that can hold data. The kind of data stored in variables depends upon the data type of the variable. Naming conventions differ between languages, but typically variable names start with an alpha character (A-Z or a-z), and can contain alphanumeric and underscore characters.
Data Types
Data types indicate both how much data is stored and how that data can be processed and represented. Some languages (such as Python) are dynamically typed: every value has a type, but a variable can hold a value of any type. Some (such as PHP) are loosely typed, which means that variables can be given a defined type, but this is not always strictly enforced. Other languages (such as C++) are statically typed, meaning that each variable can only store data of its declared type.
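In Python, for instance, a variable can be rebound to a value of any type, while the values themselves still carry a type that is checked when an operation is attempted. The sketch below shows both halves; the names `value`, `kind_before`, `kind_after`, and `try_add` are made up for this example.

```python
value = 42                           # the name currently refers to an int
kind_before = type(value).__name__

value = "forty-two"                  # the same name may later refer to a str
kind_after = type(value).__name__

def try_add(a, b):
    """Return a + b, or None if the two types cannot be added."""
    try:
        return a + b
    except TypeError:
        # Raised at run time, e.g. when adding an int to a str.
        return None
```

A statically typed language would typically reject the second assignment to `value` before the program ever ran.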
Literals
Literals are values that are used within programs. Examples include numeric literals such as 200 or 3.14159, and string literals such as 'Text' or "Cow". Typically character and string literals are delimited with single quotes (') or double quotes ("), with some languages permitting both.
Constants
Most languages support named literal values so that the use of constants in the code is self-documenting.
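The following short Python fragment puts comments, variables, literals, and a named constant side by side. All names are invented for the example, and note one assumption: Python has no enforced constants, so the upper-case name is only a convention.

```python
# This line is a comment: it is ignored when the program runs.

SECONDS_PER_MINUTE = 60      # named constant (by convention only in Python)

minutes = 3                  # variable holding the numeric literal 3
pi_approx = 3.14159          # floating-point literal
single = 'Text'              # string literal in single quotes
double = "Cow"               # string literal in double quotes

# Using the named constant documents the intent better than a bare 60.
total_seconds = minutes * SECONDS_PER_MINUTE
```

Reading `minutes * SECONDS_PER_MINUTE` tells a maintainer why the multiplication is there, which a bare `minutes * 60` does not.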
Statements
Statements are instructions which indicate what action to perform.
Flow control
Flow control statements include loop constructs, goto statements, function calls, switch/case statements, conditional statements (if-then-else), exit statements, and others. A goto statement is provided in many languages, but its use is almost universally frowned upon, since it makes program maintenance more difficult.
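A few of these constructs are sketched below in Python (which, as it happens, has no goto). The function names `classify`, `first_negative`, and `countdown` are hypothetical and chosen only to show a conditional, a loop with an early exit, and a condition-controlled loop.

```python
def classify(n):
    """Conditional statement: exactly one branch is taken."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

def first_negative(numbers):
    """Loop with an early exit: stop at the first negative number."""
    for index, n in enumerate(numbers):
        if n < 0:
            return index      # leaves the loop and the function at once
    return None               # reached only if no negative number was found

def countdown(start):
    """Loop that repeats while a condition holds."""
    steps = []
    while start > 0:
        steps.append(start)
        start -= 1
    return steps
```

Each call such as `classify(5)` is itself a flow control operation: execution jumps into the function and returns to the caller when it finishes.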
Assignment
Assignment statements allow the assignment of values to variables, and copying of data.
I/O
I/O statements concern reading input from the user, or files, or devices, and writing output to them.
Declaration
Declaration statements allow the programmer to specify variable names, labels, array dimensions, and data types including structure layout and class definitions.
Other
Some statements perform functions that do not fit under the above types. These vary widely by language. For instance, the BASIC programming language has the DATA statement which allows raw data to be included in the source.
Operators
Operators are reserved symbols used for various operations. Unary operators operate on a single value. Binary operators operate on two values. Ternary operators operate on three values. There are several different classes of operators typically available.
Arithmetic
These operators support arithmetic operations such as add, subtract, divide, multiply, and so forth.
String
String operators perform operations on strings, such as concatenating strings together.
Logical
Logical operators, such as and, or, and not, operate on boolean values.
Comparison
Comparison operators compare values and include such operations as equal-to, not-equal-to, less-than, greater-than, less-than-or-equal, and greater-than-or-equal.
Bitwise
Bitwise operators are used to manipulate individual bits in integer values.
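The operator classes described in this section look like this in Python; other languages use different symbols for some of them (for example `&&` instead of `and`). The variable names and the dictionary `results` are assumptions made for the sake of a compact example.

```python
a, b = 7, 2

results = {
    # Arithmetic
    "sum": a + b,                        # 9
    "quotient": a / b,                   # 3.5
    "remainder": a % b,                  # 1
    "negated": -a,                       # unary minus: -7
    # String
    "joined": "Hello, " + "world",       # concatenation
    # Logical
    "both": (a > b) and (b > 0),         # True
    "not_less": not (a < b),             # True
    # Comparison
    "equal": a == b,                     # False
    "at_most": b <= a,                   # True
    # Bitwise (7 is 0b111, 2 is 0b010)
    "bit_and": a & b,                    # 0b010 = 2
    "bit_xor": a ^ b,                    # 0b101 = 5
    "shifted": a << 1,                   # 0b1110 = 14
    # Ternary (conditional expression): three operands
    "larger": a if a > b else b,         # 7
}
```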
Functions
Functions are named blocks of code that take zero, one, or several parameters and return a value based upon the parameters. Functions can return the current system time, perform arithmetical operations, perform string operations, and so forth.
Most languages allow users to define their own functions.
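As a small sketch of user-defined functions in Python, the three functions below take two, one or two, and zero parameters respectively. Their names are hypothetical; `time.time()` is the standard-library call for the current system time in seconds.

```python
import time

def area_of_rectangle(width, height):
    """Two parameters, arithmetic result."""
    return width * height

def shout(text, times=1):
    """One required and one optional parameter, string result."""
    return (text.upper() + "!") * times

def current_timestamp():
    """No parameters: returns the current system time in seconds."""
    return time.time()
```

Once defined, such functions are called exactly like built-in ones, for example `shout("hi", 2)`.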
Objects
Object-oriented programming, in which data is combined with the operations that apply to it, is supported in most modern programming languages.
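A minimal Python sketch of an object that bundles data with the operations applicable to it follows. The class `Counter` and its methods are invented for this illustration and do not come from any library.

```python
class Counter:
    """Bundles a piece of data (value) with the operations allowed on it."""

    def __init__(self, start=0):
        self.value = start          # the data held by each object

    def increment(self, step=1):
        """Operation that changes the object's own data."""
        self.value += step
        return self.value

    def reset(self):
        """Operation that restores the initial state."""
        self.value = 0
```

Two objects created from the same class, say `a = Counter()` and `b = Counter(10)`, each keep their own `value`, so incrementing one leaves the other untouched.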
Pragmas/Directives
Pragmas (aka directives) provide a way of changing options associated with a compiler. For instance, a pragma may alter the memory alignment of data. Conditional compilation is also provided by directives that enable or disable the inclusion of code depending upon external conditions (such as the platform that the program is being compiled upon).
Advanced Programming Concepts
Synchronous vs. Asynchronous
Most programs operate in a synchronous manner: each statement finishes executing before the next statement starts. However, some languages have asynchronous abilities built in (such as JavaScript, as used in Node.js), and there are optional libraries that can provide asynchronous behavior for other languages. An asynchronous program is one that can have more than one task in progress at the same time. The mechanism that provides this ability is sometimes co-routines or, more often, threads. A synchronous program is typically "single-threaded", while an asynchronous program is often "multi-threaded", although asynchronous code can also run on a single thread by switching between tasks. Multiple threads can execute in parallel, which can provide a performance benefit if used correctly. For instance, while one thread is accepting input from the user, another thread can be writing data to a file, and yet another thread can be calculating the display of graphics. Such an approach cannot be accomplished with synchronous code, since each of these tasks must complete before the next one can start.
Multi-threading can provide benefits, but it requires far more thought in program design and the use of exclusivity mechanisms (such as mutexes, spinlocks, and/or critical sections) to protect data from being updated at the same time by multiple threads. Multi-threaded code can also be exceedingly difficult to maintain and debug.
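The role of an exclusivity mechanism can be sketched with Python's standard `threading` module. In this hedged example several threads update one shared counter, and a `threading.Lock` (a mutex) ensures that only one of them performs the update at any moment. The function name `count_with_threads` and its parameters are assumptions made for illustration.

```python
import threading

def count_with_threads(num_threads=4, increments=10_000):
    """Have several threads add to one shared total, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            # Only one thread at a time may run the guarded update.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()          # wait until every thread has finished
    return total
```

With the lock in place the result is always `num_threads * increments`. Without it, two threads could read the same old value of `total` and one of the updates could be lost, which is exactly the kind of intermittent fault that makes multi-threaded code hard to debug.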
References
- [1] The Language List. Retrieved on 2012-03-16.
- [2] Programming Language Popularity. Retrieved on 2012-03-13.
- [3] http://www.eweek.com/c/a/Application-Development/Java-C-C-Top-18-Programming-Languages-for-2011-480790/?kc=EWKNLBOE12242010FEA1
- [4] Trends in Programming Language Design. Retrieved on 2012-03-14.