HTTP
HTTP (HyperText Transfer Protocol) is a text-based, request-and-response data transfer protocol in which each transaction consists of a single request and its response. It follows the client-server model: a client is software that makes requests, a typical example being a web browser. Servers accept connections and requests from clients and return the results, often with intermediate processing that determines or modifies the content to return. Software such as MediaWiki, used by Conservapedia, runs on the server and performs this intermediate processing. Software on the server may also act as a client, as in a reverse proxy used for load balancing, or when it obtains content from other servers, as is commonly done in the fight against internet spam.
Requests
Clients make requests by sending a request line followed by a series of headers after connecting to a server. These headers describe the request in detail and allow the client to specify exactly what it needs from the server. The first line of the request always includes the type of request, the location on the server, and the HTTP version requested. Servers are free to downgrade to HTTP/1.0 when 1.1 is requested, or upgrade to HTTP/1.1 if 1.0 is requested. All other headers may appear in any order. Requests and responses consist of two parts: the header and the body.
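As a minimal sketch of the layout described above, the following Python assembles the raw bytes of a request: the request line first, then the headers, then a blank line separating the header from the body. The helper name `build_request` and its parameters are assumptions made for illustration and do not belong to any library.

```python
def build_request(method, target, headers, body=b"", version="HTTP/1.1"):
    """Assemble a raw HTTP/1.x request (hypothetical helper for illustration)."""
    # First line: type of request, location on the server, HTTP version.
    lines = [f"{method} {target} {version}"]
    # The remaining headers may appear in any order.
    for name, value in headers.items():
        lines.append(f"{name}: {value}")
    # Lines end with CRLF; an empty line separates the header from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


raw = build_request("GET", "/Conservapedia", {"Host": "www.conservapedia.com"})
print(raw.decode("ascii"))
```

A real client would also have to validate header names and values; this sketch only shows the ordering of the parts.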
Request Headers
- First Line - Special header in that it states the nature of the request (GET, POST, PUT, and HEAD are the most common), a resource on the server, and the HTTP version. All other headers must appear after this first line.
- Host - If the server provides content for multiple domain names, this header specifies which domain name the request is made against. Clients cannot know in advance whether the server needs it, so they always send it, and HTTP/1.1 requires it in every request.
- Accept - List of content types the client is willing to accept.
- User-Agent - A simple string that identifies the client. Web browsers usually use it to report a surprisingly detailed description of the browser, its version, and the operating system, which is only sometimes useful to the server.
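- Referer - If the client software is a web browser, this header contains the complete URL of the previous page. It is only provided if the user clicked a link. The header name is spelled Referer, a misspelling of "referrer" that is fixed in the HTTP specification.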
- Cookie - Cookies are small bits of text that are meant to carry information across requests. The most common uses of this data include authentication information (when you log into a website) and tracking information (such as a unique client ID used by advertisers for demographics collection).
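To illustrate how a server might read the Cookie header described above, here is a short sketch using `SimpleCookie` from Python's standard `http.cookies` module. The cookie names and values are invented for the example.

```python
from http.cookies import SimpleCookie

# Hypothetical Cookie header value; the names and values are made up.
cookie_header = "session_id=abc123; theme=dark"

jar = SimpleCookie()
jar.load(cookie_header)  # parses the "name=value; name=value" pairs

# Each entry is a Morsel object; .value holds the cookie's text.
cookies = {name: morsel.value for name, morsel in jar.items()}
print(cookies)
```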
Response Headers
- First Line - Similar to the first line in the request header, this one provides the HTTP version used by the server, the status code and the status text.
- Cache-Control - Instructions on how the response body may be stored for later use to reduce the number of requests made against the server in question.
- Content-Encoding - Only used when the content has been encoded, such as when the response body has been compressed.
- Content-Type - Specifies the type of data encapsulated in the response body. This may be text/html for web pages, or image/png for Portable Network Graphics images.
- Content-Language - When the requested content is available in multiple languages, this header may be used to specify the language the content is written in. It refers to human language, not a programming or scripting language, and RFC-2616 allows it on any content type, not only text/ types.
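- Date - Tells the client what the server's clock says the current date and time are. RFC-2616 requires servers with a reasonably accurate clock to send it in nearly all responses, and clients and caches rely on it for time-sensitive calculations such as cache expiration.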
- Expires - This is used by the server to set a specific expiration date for the content provided. This may be set sometime in the past to prevent caching, or sometime in the future to encourage caching.
- Last-Modified - Used to inform the client when the content was last updated, according to the server's clock.
- Server - Tells clients about the server software in question. Additionally, a header named X-Powered-By tells the client what technology was used in processing the request. These are sometimes disabled by administrators for security purposes.
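The response header can be taken apart in the same way it is described above: a first line carrying version, status code, and status text, followed by name and value pairs. The function `parse_response_head` below is a hypothetical helper, and the sketch assumes well-formed input with CRLF line endings.

```python
def parse_response_head(head):
    """Split a response header block into its parts (illustrative sketch)."""
    lines = head.split("\r\n")
    # First line: HTTP version, status code, status text.
    parts = lines[0].split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        if not line:  # the empty line ends the header
            break
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        # Note: a repeated header overwrites the earlier value here.
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers


sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=UTF-8\r\n"
    "Server: Apache/1.3.41 (Unix)\r\n"
    "\r\n"
)
version, code, reason, headers = parse_response_head(sample)
print(version, code, reason, headers["content-type"])
```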
Example Request and Response
Request:
GET /Conservapedia HTTP/1.1
Host: www.conservapedia.com
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Referer: http://www.conservapedia.com/Main_Page
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4
Here we can see a request for www.conservapedia.com/Conservapedia, made after the user clicked a link to this resource on the front page of Conservapedia. We can also see that the user is on an Intel-based Mac running Mac OS X 10.6.4, that the browser is a WebKit-based browser called Chrome, and that it is willing to accept any type of content, although it ranks some types above others.
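The Accept header in the request above lists content types with optional `q` weights, where a missing weight counts as 1. The sketch below orders the types by that weight. `parse_accept` is a hypothetical helper, and it assumes a well-formed header rather than handling every case the specification allows.

```python
def parse_accept(value):
    """Return (media_type, q) pairs, highest preference first (sketch)."""
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                q = float(val)
        prefs.append((media_type, q))
    # sorted() is stable, so equal weights keep their original order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


accept = ("application/xml,application/xhtml+xml,text/html;q=0.9,"
          "text/plain;q=0.8,image/png,*/*;q=0.5")
prefs = parse_accept(accept)
for media_type, q in prefs:
    print(media_type, q)
```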
Response:
HTTP/1.1 200 OK
Cache-Control: private, must-revalidate, max-age=0
Connection: close
Content-Encoding: gzip
Content-Type: text/html; charset=UTF-8
Content-language: en
Date: Mon, 02 Aug 2010 19:38:27 GMT
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Last-Modified: Mon, 02 Aug 2010 19:10:52 GMT
Server: Apache/1.3.41 (Unix)
Transfer-Encoding: chunked
Vary: Accept-Encoding,Cookie
X-Powered-By: PHP/5.2.5
X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=cpwiki_mediaToken;string-contains=cpwiki_mediaLoggedOut;string-contains=cpwiki_media_session
The server accepted the request and reports that everything is good to go. We can see that the server forbids shared (public) caches from storing the response, and that even a local cache must revalidate with the server before reusing it. The server will close the connection after the content is sent, and the content is compressed with gzip. The content is marked as having expired a long time ago, to further discourage clients from caching it, although it was recently modified. The server also tells us it is Apache, with a request processor called PHP.
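The caching conclusions drawn above come from the Cache-Control header, which is a comma-separated list of directives. A rough sketch of reading it follows. `parse_cache_control` is a hypothetical helper, and it does not handle quoted values that themselves contain commas.

```python
def parse_cache_control(value):
    """Map each directive to its argument, or True if it has none (sketch)."""
    directives = {}
    for item in value.split(","):
        name, sep, arg = item.strip().partition("=")
        if not name:
            continue
        directives[name.lower()] = arg.strip('"') if sep else True
    return directives


cc = parse_cache_control("private, must-revalidate, max-age=0")

# "private" (or "no-store") rules out storage by a shared cache.
shared_cache_allowed = "private" not in cc and "no-store" not in cc
# max-age is the freshness lifetime in seconds, when present.
max_age = int(cc["max-age"]) if "max-age" in cc else None

print(cc, shared_cache_allowed, max_age)
```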
Response Status Codes
Multiple response codes are defined in RFC-1945[1] and RFC-2616.[2] These status codes can tell the client about a variety of conditions including success, failure, or something else entirely.
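The first digit of a status code identifies its series, so a client can react sensibly even to a code it does not recognize. The sketch below shows that grouping, using the class names from RFC-2616. `status_class` is a hypothetical helper, while `HTTPStatus` comes from Python's standard `http` module.

```python
from http import HTTPStatus


def status_class(code):
    """Name the series a status code belongs to (illustrative sketch)."""
    classes = {
        1: "informational",
        2: "successful",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


for code in (100, 200, 301, 404, 503):
    # HTTPStatus supplies the standard status text for known codes.
    print(code, HTTPStatus(code).phrase, "-", status_class(code))
```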
100 Series
- 100 Continue - For requests with long request bodies, this is used by the server to inform the client that it should continue sending the request body.
- 101 Switching Protocols - Used by the server to inform the client that it is switching protocols. This may be done to change the HTTP version used for the request, or to switch to an extension of HTTP, such as WebDAV.[3]
200 Series
- 200 OK - Possibly the most commonly used, tells the client the server has processed the request as desired.
- 201 Created - Tells the client the specific resource has been created, but may or may not provide content.
- 202 Accepted - Request has been accepted as in 200 OK, but the request will be processed later.
- 203 Non-Authoritative Information - The header information in the response did not come from the origin server but was gathered from a local or third-party copy.
- 204 No Content - Just like 200 OK, but there is no content to be returned.
- 205 Reset Content - The request has been fulfilled, and the client should reset the document view that caused the request to be sent, for example by clearing a form. There should not be any response body with this status code.
- 206 Partial Content - If the client sent a Range header, then the response body is the requested segment of the requested content, described by a Content-Range response header.
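For 206 Partial Content, the client asks for a byte range with the Range request header, and the server describes the segment it returns with the Content-Range response header. The sketch below formats one and parses the other. Both function names are assumptions, and only the simple single-range `bytes` form is covered.

```python
import re


def range_header(first, last):
    """Value of a Range request header for bytes first..last inclusive."""
    return f"bytes={first}-{last}"


def parse_content_range(value):
    """Return (first, last, total) from a Content-Range value, or None.

    total is None when the server reports the full length as "*".
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        return None
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total


print(range_header(0, 499))
print(parse_content_range("bytes 0-499/1234"))
```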
300 Series
- 300 Multiple Choices - Based on the request given, there are multiple resources that may be provided. The server should provide a listing of these other resources.
- 301 Moved Permanently - The requested resource has been moved somewhere else. A Location response header should provide the new location of the resource.
- 302 Found - The requested resource has been temporarily moved to a new location. There should be a Location header as in 301.
- 303 See Other - The response is at a different location. This may be used by form processing scripts.
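- 304 Not Modified - The client made a conditional request, such as one with an If-Modified-Since header, and the resource has not been modified since the condition set by the client. No response body is sent, and the client may use its cached copy.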
- 305 Use Proxy - Used to tell clients that it should use a specific proxy. A Location header should give the location of the proxy.
- 306 - Unused in RFC-2616
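- 307 Temporary Redirect - Used almost exactly like 302, but the client must repeat the request at the new location with the same method, and should not automatically follow the redirect for requests other than GET or HEAD without confirmation from the user.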
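A client's handling of the most common redirect codes above can be sketched as a small decision function. This is a simplified reading of RFC-2616, not a complete implementation: `next_request` is a hypothetical helper, it ignores 300, 304, and 305, and it returns None wherever a real client would have to ask the user before continuing.

```python
from urllib.parse import urljoin


def next_request(method, url, status, location):
    """Return the (method, url) to try next, or None to stop (sketch)."""
    if status not in (301, 302, 303, 307) or not location:
        return None
    # The Location header may be relative to the original URL.
    target = urljoin(url, location)
    if status == 303:
        # 303 See Other: fetch the other location with GET.
        return ("GET", target)
    if method in ("GET", "HEAD"):
        return (method, target)
    # Other methods should not be redirected without user confirmation.
    return None


start = "http://www.conservapedia.com/Main_Page"
print(next_request("GET", start, 301, "/Conservapedia"))
print(next_request("POST", start, 303, "/done"))
print(next_request("POST", start, 307, "/elsewhere"))
```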
400 Series
- 400 Bad Request - Server was unable to understand the request.
- 401 Unauthorized - Request requires authentication, but none is given.
- 402 Payment Required - Reserved.
- 403 Forbidden - Server refuses to fulfill the request, even with authentication.
- 404 Not Found - There are no resources matching the specified location.
- 405 Method Not Allowed - The request line contains a method that isn't permissible with the location given.
- 406 Not Acceptable - The client made a request with content parameters, such as those in the Accept header, that the server cannot fulfill.
- 407 Proxy Authentication Required - Similar to 401, but the client must first authenticate itself with the proxy server.
- 408 Request Timeout - The client was unable to send the request body in its entirety within the time allowed by the server.
- 409 Conflict - The request could not be completed because it conflicts with the current state of the resource.
- 410 Gone - The resource is no longer available on the server, and no forwarding address is known.
- 411 Length Required - Server refuses to accept the request without a Content-Length header stating the length of the request body.
- 412 Precondition Failed - A precondition given by the client in the request headers, such as If-Match or If-Unmodified-Since, evaluated to false on the server.
- 413 Request Entity Too Large - The request body is just too big for the server to accept.
- 414 Request-URI Too Long - The server is unwilling to process the request location due to excessive length. This may be used by web developers to help secure a service.
- 415 Unsupported Media Type - Refusal by server because the request body is in a format not supported by the requested resource.
- 416 Requested Range Not Satisfiable - The range given in the Range request header lies beyond the length of the resource.
- 417 Expectation Failed - The expectation given in the client's Expect request header could not be satisfied by the server.
500 Series
- 500 Internal Server Error - An unexpected error within the server prevented the request from being fulfilled.
- 501 Not Implemented - Client requests functionality not present in the server.
- 502 Bad Gateway - If the server is a gateway or proxy to another server, the upstream server provided an invalid response.
- 503 Service Unavailable - Request could not be processed because the server may be overloaded, or an administrator has placed it in maintenance mode.
- 504 Gateway Timeout - The upstream server did not respond in a timely manner.
- 505 HTTP Version Not Supported - Server does not support the client's HTTP version.
History
HTTP was developed by Tim Berners-Lee in 1990.[4] This original version, now known as HTTP/0.9, was updated and extended many times by many people, yet was not formally published until RFC-1945[1] described it as HTTP/1.0. HTTP was designed to supplement and transfer HTML, also developed by Tim Berners-Lee. Modern servers and clients typically use HTTP/1.1, which was defined as an update to HTTP/1.0 in RFC-2616.[2]