HTTP: The Definitive Guide
Preface
Running Example: Joe's Hardware Store
Chapter-by-Chapter Guide
Typographic Conventions
Comments and Questions
Acknowledgments
Part I: HTTP: The Web's Foundation
Chapter 1. Overview of HTTP
1.1 HTTP: The Internet's Multimedia Courier
1.2 Web Clients and Servers
Figure 1-1. Web clients and servers
1.3 Resources
Figure 1-2. A web resource is anything that provides web content
1.3.1 Media Types
Figure 1-3. MIME types are sent back with the data content
1.3.2 URIs
Figure 1-4. URLs specify protocol, server, and local resource
1.3.3 URLs
Table 1-1. Example URLs
1.3.4 URNs
1.4 Transactions
Figure 1-5. HTTP transactions consist of request and response messages
1.4.1 Methods
Table 1-2. Some common HTTP methods
1.4.2 Status Codes
Table 1-3. Some common HTTP status codes
1.4.3 Web Pages Can Consist of Multiple Objects
Figure 1-6. Composite web pages require separate HTTP transactions for each embedded resource
1.5 Messages
Figure 1-7. HTTP messages have a simple, line-oriented text structure
1.5.1 Simple Message Example
Figure 1-8. Example GET transaction for http://www.joes-hardware.com/tools.html
1.6 Connections
1.6.1 TCP/IP
Figure 1-9. HTTP network protocol stack
1.6.2 Connections, IP Addresses, and Port Numbers
Figure 1-10. Basic browser connection process
1.6.3 A Real Example Using Telnet
Example 1-1. An HTTP transaction using telnet
1.7 Protocol Versions
1.8 Architectural Components of the Web
1.8.1 Proxies
Figure 1-11. Proxies relay traffic between client and server
1.8.2 Caches
Figure 1-12. Caching proxies keep local copies of popular documents to improve performance
1.8.3 Gateways
Figure 1-13. HTTP/FTP gateway
1.8.4 Tunnels
Figure 1-14. Tunnels forward data across non-HTTP networks (HTTP/SSL tunnel shown)
1.8.5 Agents
Figure 1-15. Automated search engine "spiders" are agents, fetching web pages around the world
1.9 The End of the Beginning
1.10 For More Information
1.10.1 HTTP Protocol Information
1.10.2 Historical Perspective
1.10.3 Other World Wide Web Information
Chapter 2. URLs and Resources
2.1 Navigating the Internet's Resources
Figure 2-1. How URLs relate to browser, machine, server, and location on the server's filesystem
2.1.1 The Dark Days Before URLs
2.2 URL Syntax
Table 2-1. General URL components
2.2.1 Schemes: What Protocol to Use
2.2.2 Hosts and Ports
2.2.3 Usernames and Passwords
2.2.4 Paths
2.2.5 Parameters
2.2.6 Query Strings
Figure 2-2. The URL query component is sent along to the gateway application
2.2.7 Fragments
Figure 2-3. The URL fragment is used only by the client, because the server deals with entire objects
2.3 URL Shortcuts
2.3.1 Relative URLs
Example 2-1. HTML snippet with relative URLs
Figure 2-4. Using a base URL
2.3.1.1 Base URLs
2.3.1.2 Resolving relative references
Figure 2-5. Converting relative to absolute URLs
2.3.2 Expandomatic URLs
2.4 Shady Characters
2.4.1 The URL Character Set
2.4.2 Encoding Mechanisms
Table 2-2. Some encoded character examples
2.4.3 Character Restrictions
Table 2-3. Reserved and restricted characters
2.4.4 A Bit More
2.5 A Sea of Schemes
Table 2-4. Common scheme formats
2.6 The Future
Figure 2-6. PURLs use a resource locator server to name the current location of a resource
2.6.1 If Not Now, When?
2.7 For More Information
Chapter 3. HTTP Messages
3.1 The Flow of Messages
3.1.1 Messages Commute Inbound to the Origin Server
Figure 3-1. Messages travel inbound to the origin server and outbound back to the client
3.1.2 Messages Flow Downstream
Figure 3-2. All messages flow downstream
3.2 The Parts of a Message
Figure 3-3. Three parts of an HTTP message
3.2.1 Message Syntax
Figure 3-4. An HTTP transaction has request and response messages
Figure 3-5. Example request and response messages
3.2.2 Start Lines
3.2.2.1 Request line
3.2.2.2 Response line
3.2.2.3 Methods
Table 3-1. Common HTTP methods
3.2.2.4 Status codes
Table 3-2. Status code classes
Table 3-3. Common status codes
3.2.2.5 Reason phrases
3.2.2.6 Version numbers
3.2.3 Headers
3.2.3.1 Header classifications
Table 3-4. Common header examples
3.2.3.2 Header continuation lines
3.2.4 Entity Bodies
3.2.5 Version 0.9 Messages
Figure 3-6. HTTP/0.9 transaction
3.3 Methods
3.3.1 Safe Methods
3.3.2 GET
Figure 3-7. GET example
3.3.3 HEAD
Figure 3-8. HEAD example
3.3.4 PUT
Figure 3-9. PUT example
3.3.5 POST
Figure 3-10. POST example
3.3.6 TRACE
Figure 3-11. TRACE example
3.3.7 OPTIONS
Figure 3-12. OPTIONS example
3.3.8 DELETE
Figure 3-13. DELETE example
3.3.9 Extension Methods
Table 3-5. Example web publishing extension methods
3.4 Status Codes
3.4.1 100-199: Informational Status Codes
Table 3-6. Informational status codes and reason phrases
3.4.1.1 Clients and 100 Continue
3.4.1.2 Servers and 100 Continue
3.4.1.3 Proxies and 100 Continue
3.4.2 200-299: Success Status Codes
Table 3-7. Success status codes and reason phrases
3.4.3 300-399: Redirection Status Codes
Figure 3-14. Redirected request to new location
Figure 3-15. Request redirected to use local copy
Table 3-8. Redirection status codes and reason phrases
3.4.4 400-499: Client Error Status Codes
Table 3-9. Client error status codes and reason phrases
3.4.5 500-599: Server Error Status Codes
Table 3-10. Server error status codes and reason phrases
3.5 Headers
3.5.1 General Headers
Table 3-11. General informational headers
3.5.1.1 General caching headers
Table 3-12. General caching headers
3.5.2 Request Headers
Table 3-13. Request informational headers
3.5.2.1 Accept headers
Table 3-14. Accept headers
3.5.2.2 Conditional request headers
Table 3-15. Conditional request headers
3.5.2.3 Request security headers
Table 3-16. Request security headers
3.5.2.4 Proxy request headers
Table 3-17. Proxy request headers
3.5.3 Response Headers
Table 3-18. Response informational headers
3.5.3.1 Negotiation headers
Table 3-19. Negotiation headers
3.5.3.2 Response security headers
Table 3-20. Response security headers
3.5.4 Entity Headers
Table 3-21. Entity informational headers
3.5.4.1 Content headers
Table 3-22. Content headers
3.5.4.2 Entity caching headers
Table 3-23. Entity caching headers
3.6 For More Information
Chapter 4. Connection Management
4.1 TCP Connections
Figure 4-1. Web browsers talk to web servers over TCP connections
4.1.1 TCP Reliable Data Pipes
Figure 4-2. TCP carries HTTP data in order, and without corruption
4.1.2 TCP Streams Are Segmented and Shipped by IP Packets
Figure 4-3. HTTP and HTTPS network protocol stacks
Figure 4-4. IP packets carry TCP segments, which carry chunks of the TCP data stream
4.1.3 Keeping TCP Connections Straight
Table 4-1. TCP connection values
Figure 4-5. Four distinct TCP connections
4.1.4 Programming with TCP Sockets
Table 4-2. Common socket interface functions for programming TCP connections
Figure 4-6. How TCP clients and servers communicate using the TCP sockets interface
4.2 TCP Performance Considerations
4.2.1 HTTP Transaction Delays
Figure 4-7. Timeline of a serial HTTP transaction
4.2.2 Performance Focus Areas
4.2.3 TCP Connection Handshake Delays
Figure 4-8. TCP requires two packet transfers to set up the connection before it can send data
4.2.4 Delayed Acknowledgments
4.2.5 TCP Slow Start
4.2.6 Nagle's Algorithm and TCP_NODELAY
4.2.7 TIME_WAIT Accumulation and Port Exhaustion
4.3 HTTP Connection Handling
4.3.1 The Oft-Misunderstood Connection Header
Figure 4-9. The Connection header allows the sender to specify connection-specific options
4.3.2 Serial Transaction Delays
Figure 4-10. Four transactions (serial)
4.4 Parallel Connections
Figure 4-11. Each component of a page involves a separate HTTP transaction
4.4.1 Parallel Connections May Make Pages Load Faster
Figure 4-12. Four transactions (parallel)
4.4.2 Parallel Connections Are Not Always Faster
4.4.3 Parallel Connections May "Feel" Faster
4.5 Persistent Connections
4.5.1 Persistent Versus Parallel Connections
4.5.2 HTTP/1.0+ Keep-Alive Connections
Figure 4-13. Four transactions (serial versus persistent)
4.5.3 Keep-Alive Operation
Figure 4-14. HTTP/1.0 keep-alive transaction header handshake
4.5.4 Keep-Alive Options
4.5.5 Keep-Alive Connection Restrictions and Rules
4.5.6 Keep-Alive and Dumb Proxies
4.5.6.1 The Connection header and blind relays
Figure 4-15. Keep-alive doesn't interoperate with proxies that don't support Connection headers
4.5.6.2 Proxies and hop-by-hop headers
4.5.7 The Proxy-Connection Hack
Figure 4-16. Proxy-Connection header fixes single blind relay
Figure 4-17. Proxy-Connection still fails for deeper hierarchies of proxies
4.5.8 HTTP/1.1 Persistent Connections
4.5.9 Persistent Connection Restrictions and Rules
4.6 Pipelined Connections
Figure 4-18. Four transactions (pipelined connections)
4.7 The Mysteries of Connection Close
4.7.1 "At Will" Disconnection
4.7.2 Content-Length and Truncation
4.7.3 Connection Close Tolerance, Retries, and Idempotency
4.7.4 Graceful Connection Close
Figure 4-19. TCP connections are bidirectional
4.7.4.1 Full and half closes
Figure 4-20. Full and half close
4.7.4.2 TCP close and reset errors
Figure 4-21. Data arriving at closed connection generates "connection reset by peer" error
4.7.4.3 Graceful close
4.8 For More Information
4.8.1 HTTP Connections
4.8.2 HTTP Performance Issues
4.8.3 TCP/IP
Part II: HTTP Architecture
Chapter 5. Web Servers
5.1 Web Servers Come in All Shapes and Sizes
5.1.1 Web Server Implementations
5.1.2 General-Purpose Software Web Servers
Figure 5-1. Web server market share as estimated by Netcraft's automated survey
5.1.3 Web Server Appliances
5.1.4 Embedded Web Servers
5.2 A Minimal Perl Web Server
Example 5-1. type-o-serve: a minimal Perl web server
Figure 5-2. The type-o-serve utility lets you type in server responses to send back to clients
5.3 What Real Web Servers Do
Figure 5-3. Steps of a basic web server request
5.4 Step 1: Accepting Client Connections
5.4.1 Handling New Connections
5.4.2 Client Hostname Identification
Example 5-2. Configuring Apache to look up hostnames for HTML and CGI resources
5.4.3 Determining the Client User Through ident
Figure 5-4. Using the ident protocol to determine HTTP client username
5.5 Step 2: Receiving Request Messages
Figure 5-5. Reading a request message from a connection
5.5.1 Internal Representations of Messages
Figure 5-6. Parsing a request message into a convenient internal representation
5.5.2 Connection Input/Output Processing Architectures
Figure 5-7. Web server input/output architectures
5.6 Step 3: Processing Requests
5.7 Step 4: Mapping and Accessing Resources
5.7.1 Docroots
Figure 5-8. Mapping request URI to local web server resource
5.7.1.1 Virtually hosted docroots
Figure 5-9. Different docroots for virtually hosted requests
Example 5-3. Apache web server virtual host docroot configuration
5.7.1.2 User home directory docroots
Figure 5-10. Different docroots for different users
5.7.2 Directory Listings
5.7.3 Dynamic Content Resource Mapping
Figure 5-11. A web server can serve static resources as well as dynamic resources
5.7.4 Server-Side Includes (SSI)
5.7.5 Access Controls
5.8 Step 5: Building Responses
5.8.1 Response Entities
5.8.2 MIME Typing
Figure 5-12. A web server uses MIME types file to set outgoing Content-Type of resources
5.8.3 Redirection
5.9 Step 6: Sending Responses
5.10 Step 7: Logging
5.11 For More Information
Chapter 6. Proxies
6.1 Web Intermediaries
Figure 6-1. A proxy must be both a server and a client
6.1.1 Private and Shared Proxies
6.1.2 Proxies Versus Gateways
Figure 6-2. Proxies speak the same protocol; gateways tie together different protocols
6.2 Why Use Proxies?
Figure 6-3. Proxy application example: child-safe Internet filter
Figure 6-4. Proxy application example: centralized document access control
Figure 6-5. Proxy application example: security firewall
Figure 6-6. Proxy application example: web cache
Figure 6-7. Proxy application example: surrogate (in a server accelerator deployment)
Figure 6-8. Proxy application example: content routing
Figure 6-9. Proxy application example: content transcoder
Figure 6-10. Proxy application example: anonymizer
6.3 Where Do Proxies Go?
6.3.1 Proxy Server Deployment
Figure 6-11. Proxies can be deployed many ways, depending on their intended use
6.3.2 Proxy Hierarchies
Figure 6-12. Three-level proxy hierarchy
6.3.2.1 Proxy hierarchy content routing
Figure 6-13. Proxy hierarchies can be dynamic, changing for each request
6.3.3 How Proxies Get Traffic
Figure 6-14. There are many techniques to direct web requests to proxies
6.4 Client Proxy Settings
6.4.1 Client Proxy Configuration: Manual
6.4.2 Client Proxy Configuration: PAC Files
Table 6-1. Proxy auto-configuration script return values
Example 6-1. Example proxy auto-configuration file
6.4.3 Client Proxy Configuration: WPAD
6.5 Tricky Things About Proxy Requests
6.5.1 Proxy URIs Differ from Server URIs
Figure 6-15. Intercepting proxies will get server requests
6.5.2 The Same Problem with Virtual Hosting
6.5.3 Intercepting Proxies Get Partial URIs
6.5.4 Proxies Can Handle Both Proxy and Server Requests
6.5.5 In-Flight URI Modification
6.5.6 URI Client Auto-Expansion and Hostname Resolution
6.5.7 URI Resolution Without a Proxy
Figure 6-16. Browser auto-expands partial hostnames when no explicit proxy is present
6.5.8 URI Resolution with an Explicit Proxy
Figure 6-17. Browser does not auto-expand partial hostnames when there is an explicit proxy
6.5.9 URI Resolution with an Intercepting Proxy
Figure 6-18. Browser doesn't detect dead server IP addresses when using intercepting proxies
6.6 Tracing Messages
Figure 6-19. Access proxies and CDN proxies create two-level proxy hierarchies
6.6.1 The Via Header
Figure 6-20. Via header example
6.6.1.1 Via syntax
6.6.1.2 Via request and response paths
Figure 6-21. The response Via is usually the reverse of the request Via
6.6.1.3 Via and gateways
Figure 6-22. HTTP/FTP gateway generates Via headers, logging the received protocol (FTP)
6.6.1.4 The Server and Via headers
6.6.1.5 Privacy and security implications of Via
6.6.2 The TRACE Method
Figure 6-23. TRACE response reflects back the received request message
6.6.2.1 Max-Forwards
Figure 6-24. You can limit the forwarding hop count with the Max-Forwards header field
6.7 Proxy Authentication
Figure 6-25. Proxies can implement authentication to control access to content
6.8 Proxy Interoperation
6.8.1 Handling Unsupported Headers and Methods
6.8.2 OPTIONS: Discovering Optional Feature Support
Figure 6-26. Using OPTIONS to find a server's supported methods
6.8.3 The Allow Header
6.9 For More Information
Chapter 7. Caching
7.1 Redundant Data Transfers
7.2 Bandwidth Bottlenecks
Figure 7-1. Limited wide area bandwidth creates a bottleneck that caches can improve
Table 7-1. Bandwidth-imposed transfer time delays, idealized (time in seconds)
7.3 Flash Crowds
Figure 7-2. Flash crowds can overload web servers
7.4 Distance Delays
Figure 7-3. Speed of light can cause significant delays, even with parallel, keep-alive connections
7.5 Hits and Misses
Figure 7-4. Cache hits, misses, and revalidations
7.5.1 Revalidations
Figure 7-5. Successful revalidations are faster than cache misses; failed revalidations are nearly identical to misses
Figure 7-6. HTTP uses If-Modified-Since header for revalidation
7.5.2 Hit Rate
7.5.3 Byte Hit Rate
7.5.4 Distinguishing Hits and Misses
7.6 Cache Topologies
Figure 7-7. Public and private caches
7.6.1 Private Caches
7.6.2 Public Proxy Caches
Figure 7-8. Shared, public caches can decrease network traffic
7.6.3 Proxy Cache Hierarchies
Figure 7-9. Accessing documents in a two-level cache hierarchy
7.6.4 Cache Meshes, Content Routing, and Peering
Figure 7-10. Sibling caches
7.7 Cache Processing Steps
Figure 7-11. Processing a fresh cache hit
7.7.1 Step 1: Receiving
7.7.2 Step 2: Parsing
7.7.3 Step 3: Lookup
7.7.4 Step 4: Freshness Check
7.7.5 Step 5: Response Creation
7.7.6 Step 6: Sending
7.7.7 Step 7: Logging
7.7.8 Cache Processing Flowchart
Figure 7-12. Cache GET request flowchart
7.8 Keeping Copies Fresh
7.8.1 Document Expiration
Figure 7-13. Expires and Cache-Control headers
7.8.2 Expiration Dates and Ages
Table 7-2. Expiration response headers
7.8.3 Server Revalidation
7.8.4 Revalidation with Conditional Methods
Table 7-3. Two conditional headers used in cache revalidation
7.8.5 If-Modified-Since: Date Revalidation
Figure 7-14. If-Modified-Since revalidations return 304 if unchanged or 200 with new body if changed
7.8.6 If-None-Match: Entity Tag Revalidation
Figure 7-15. If-None-Match revalidates because entity tag still matches
7.8.7 Weak and Strong Validators
7.8.8 When to Use Entity Tags and Last-Modified Dates
7.9 Controlling Cacheability
7.9.1 No-Cache and No-Store Headers
7.9.2 Max-Age Response Headers
7.9.3 Expires Response Headers
7.9.4 Must-Revalidate Response Headers
7.9.5 Heuristic Expiration
Figure 7-16. Computing a freshness period using the LM-Factor algorithm
7.9.6 Client Freshness Constraints
Table 7-4. Cache-Control request directives
7.9.7 Cautions
7.10 Setting Cache Controls
7.10.1 Controlling HTTP Headers with Apache
7.10.2 Controlling HTML Caching Through HTTP-EQUIV
Figure 7-17. HTTP-EQUIV tags cause problems, because most software ignores them
7.11 Detailed Algorithms
7.11.1 Age and Freshness Lifetime
7.11.2 Age Computation
Example 7-1. HTTP/1.1 age-calculation algorithm calculates the overall age of a cached document
7.11.2.1 Apparent age is based on the Date header
7.11.2.2 Hop-by-hop age calculations
7.11.2.3 Compensating for network delays
7.11.3 Complete Age-Calculation Algorithm
Figure 7-18. The age of a cached document includes resident time in the network and cache
7.11.4 Freshness Lifetime Computation
7.11.5 Complete Server-Freshness Algorithm
Example 7-2. Server freshness constraint calculation
Example 7-3. Client freshness constraint calculation
7.12 Caches and Advertising
7.12.1 The Advertiser's Dilemma
7.12.2 The Publisher's Response
7.12.3 Log Migration
7.12.4 Hit Metering and Usage Limiting
7.13 For More Information
Chapter 8. Integration Points: Gateways, Tunnels, and Relays
8.1 Gateways
Figure 8-1. Gateway magic
Figure 8-2. Three web gateway examples
8.1.1 Client-Side and Server-Side Gateways
8.2 Protocol Gateways
Figure 8-3. Configuring an HTTP/FTP gateway
Figure 8-4. Browsers can configure particular protocols to use particular gateways
8.2.1 HTTP/*: Server-Side Web Gateways
Figure 8-5. The HTTP/FTP gateway translates HTTP request into FTP requests
8.2.2 HTTP/HTTPS: Server-Side Security Gateways
Figure 8-6. Inbound HTTP/HTTPS security gateway
8.2.3 HTTPS/HTTP: Client-Side Security Accelerator Gateways
Figure 8-7. HTTPS/HTTP security accelerator gateway
8.3 Resource Gateways
Figure 8-8. An application server connects HTTP clients to arbitrary backend applications
Figure 8-9. Server gateway application mechanics
8.3.1 Common Gateway Interface (CGI)
8.3.2 Server Extension APIs
8.4 Application Interfaces and Web Services
8.5 Tunnels
8.5.1 Establishing HTTP Tunnels with CONNECT
Figure 8-10. Using CONNECT to establish an SSL tunnel
8.5.1.1 CONNECT requests
8.5.1.2 CONNECT responses
8.5.2 Data Tunneling, Timing, and Connection Management
8.5.3 SSL Tunneling
Figure 8-11. Tunnels let non-HTTP traffic flow through HTTP connections
Figure 8-12. Direct SSL connection vs. tunnelled SSL connection
8.5.4 SSL Tunneling Versus HTTP/HTTPS Gateways
8.5.5 Tunnel Authentication
Figure 8-13. Gateways can proxy-authenticate a client before it's allowed to use a tunnel
8.5.6 Tunnel Security Considerations
8.6 Relays
Figure 8-14. Simple blind relays can hang if they are single-tasking and don't support the Connection header
8.7 For More Information
Chapter 9. Web Robots
9.1 Crawlers and Crawling
9.1.1 Where to Start: The "Root Set"
Figure 9-1. A root set is needed to reach all pages
9.1.2 Extracting Links and Normalizing Relative Links
9.1.3 Cycle Avoidance
Figure 9-2. Crawling over a web of hyperlinks
9.1.4 Loops and Dups
9.1.5 Trails of Breadcrumbs
9.1.6 Aliases and Robot Cycles
Table 9-1. Different URLs that alias to the same documents
9.1.7 Canonicalizing URLs
9.1.8 Filesystem Link Cycles
Figure 9-3. Symbolic link cycles
9.1.9 Dynamic Virtual Web Spaces
Figure 9-4. Malicious dynamic web space example
9.1.10 Avoiding Loops and Dups
9.2 Robotic HTTP
9.2.1 Identifying Request Headers
9.2.2 Virtual Hosting
Figure 9-5. Example of virtual docroots causing trouble if no Host header is sent with the request
9.2.3 Conditional Requests
9.2.4 Response Handling
9.2.4.1 Status codes
9.2.4.2 Entities
9.2.5 User-Agent Targeting
9.3 Misbehaving Robots
9.4 Excluding Robots
Figure 9-6. Fetching robots.txt and verifying accessibility before crawling the target file
9.4.1 The Robots Exclusion Standard
Table 9-2. Robots Exclusion Standard versions
9.4.2 Web Sites and robots.txt Files
9.4.2.1 Fetching robots.txt
9.4.2.2 Response codes
9.4.3 robots.txt File Format
9.4.3.1 The User-Agent line
9.4.3.2 The Disallow and Allow lines
9.4.3.3 Disallow/Allow prefix matching
Table 9-3. Robots.txt path matching examples
9.4.4 Other robots.txt Wisdom
9.4.5 Caching and Expiration of robots.txt
9.4.6 Robot Exclusion Perl Code
Table 9-4. Robot accessibility to the Mary's Antiques web site
9.4.7 HTML Robot-Control META Tags
9.4.7.1 Robot META directives
9.4.7.2 Search engine META tags
Table 9-5. Additional META tag directives
9.5 Robot Etiquette
Table 9-6. Guidelines for web robot operators
9.6 Search Engines
9.6.1 Think Big
9.6.2 Modern Search Engine Architecture
Figure 9-7. A production search engine contains cooperating crawlers and query gateways
9.6.3 Full-Text Index
Figure 9-8. Three documents and a full-text index
9.6.4 Posting the Query
Figure 9-9. Example search query request
9.6.5 Sorting and Presenting the Results
9.6.6 Spoofing
9.7 For More Information
Chapter 10. HTTP-NG
10.1 HTTP's Growing Pains
10.2 HTTP-NG Activity
10.3 Modularize and Enhance
Figure 10-1. HTTP-NG separates functions into layers
10.4 Distributed Objects
10.5 Layer 1: Messaging
10.6 Layer 2: Remote Invocation
10.7 Layer 3: Web Application
10.8 WebMUX
Figure 10-2. WebMUX can multiplex multiple messages over a single connection
10.9 Binary Wire Protocol
10.10 Current Status
10.11 For More Information
Part III: Identification, Authorization, and Security
Chapter 11. Client Identification and Cookies
11.1 The Personal Touch
11.2 HTTP Headers
Table 11-1. HTTP headers carry clues about users
11.3 Client IP Address
Figure 11-1. Proxies can add extension headers to pass along the original client IP address
11.4 User Login
Figure 11-2. Registering username using HTTP authentication headers
11.5 Fat URLs
11.6 Cookies
11.6.1 Types of Cookies
11.6.2 How Cookies Work
Figure 11-3. Slapping a cookie onto a user
11.6.3 Cookie Jar: Client-Side State
11.6.3.1 Netscape Navigator cookies
11.6.3.2 Microsoft Internet Explorer cookies
Figure 11-4. Internet Explorer cookies are stored in individual text files in the cache directory
11.6.4 Different Cookies for Different Sites
11.6.4.1 Cookie Domain attribute
11.6.4.2 Cookie Path attribute
11.6.5 Cookie Ingredients
Table 11-2. Cookie specifications
11.6.6 Version 0 (Netscape) Cookies
11.6.6.1 Version 0 Set-Cookie header
Table 11-3. Version 0 (Netscape) Set-Cookie attributes
11.6.6.2 Version 0 Cookie header
11.6.7 Version 1 (RFC 2965) Cookies
11.6.7.1 Version 1 Set-Cookie2 header
Table 11-4. Version 1 (RFC 2965) Set-Cookie2 attributes
11.6.7.2 Version 1 Cookie header
11.6.7.3 Version 1 Cookie2 header and version negotiation
11.6.8 Cookies and Session Tracking
Figure 11-5. The Amazon.com web site uses session cookies to track users
11.6.9 Cookies and Caching
11.6.10 Cookies, Security, and Privacy
11.7 For More Information
Chapter 12. Basic Authentication
12.1 Authentication
12.1.1 HTTP's Challenge/Response Authentication Framework
Figure 12-1. Simplified challenge/response authentication
12.1.2 Authentication Protocols and Headers
Table 12-1. Four phases of authentication
Figure 12-2. Basic authentication example
12.1.3 Security Realms
Figure 12-3. Security realms in a web server
12.2 Basic Authentication
12.2.1 Basic Authentication Example
Table 12-2. Basic authentication headers
12.2.2 Base-64 Username/Password Encoding
Figure 12-4. Generating a basic Authorization header from username and password
12.2.3 Proxy Authentication
Table 12-3. Web server versus proxy authentication
12.3 The Security Flaws of Basic Authentication
12.4 For More Information
Chapter 13. Digest Authentication
13.1 The Improvements of Digest Authentication
13.1.1 Using Digests to Keep Passwords Secret
Figure 13-1. Using digests for password-obscured authentication
13.1.2 One-Way Digests
Table 13-1. MD5 digest examples
13.1.3 Using Nonces to Prevent Replays
13.1.4 The Digest Authentication Handshake
Figure 13-2. Digest authentication handshake
Figure 13-3. Basic versus digest authentication syntax
13.2 Digest Calculations
13.2.1 Digest Algorithm Input Data
13.2.2 The Algorithms H(d) and KD(s,d)
13.2.3 The Security-Related Data (A1)
Table 13-2. Definitions for A1 by algorithm
13.2.4 The Message-Related Data (A2)
Table 13-3. Definitions for A2 by algorithm (request digests)
13.2.5 Overall Digest Algorithm
Table 13-4. Old and new digest algorithms
Table 13-5. Unfolded digest algorithm cheat sheet
13.2.6 Digest Authentication Session
13.2.7 Preemptive Authorization
Figure 13-4. Preemptive authorization reduces message count
13.2.7.1 Next nonce pregeneration
13.2.7.2 Limited nonce reuse
13.2.7.3 Synchronized nonce generation
13.2.8 Nonce Selection
13.2.9 Symmetric Authentication
Table 13-6. Definitions for A2 by algorithm (request digests)
Table 13-7. Definitions for A2 by algorithm (response digests)
13.3 Quality of Protection Enhancements
13.3.1 Message Integrity Protection
13.3.2 Digest Authentication Headers
Table 13-8. HTTP authentication headers
13.4 Practical Considerations
13.4.1 Multiple Challenges
13.4.2 Error Handling
13.4.3 Protection Spaces
13.4.4 Rewriting URIs
13.4.5 Caches
13.5 Security Considerations
13.5.1 Header Tampering
13.5.2 Replay Attacks
13.5.3 Multiple Authentication Mechanisms
13.5.4 Dictionary Attacks
13.5.5 Hostile Proxies and Man-in-the-Middle Attacks
13.5.6 Chosen Plaintext Attacks
13.5.7 Storing Passwords
13.6 For More Information
Chapter 14. Secure HTTP
14.1 Making HTTP Safe
14.1.1 HTTPS
Figure 14-1. Browsing secure web sites
Figure 14-2. HTTPS is HTTP layered over a security layer, layered over TCP
14.2 Digital Cryptography
14.2.1 The Art and Science of Secret Coding
14.2.2 Ciphers
Figure 14-3. Plaintext and ciphertext
Figure 14-4. Rotate-by-3 cipher example
14.2.3 Cipher Machines
14.2.4 Keyed Ciphers
Figure 14-5. The rotate-by-N cipher, using different keys
14.2.5 Digital Ciphers
Figure 14-6. Plaintext is encoded with encoding key e, and decoded using decoding key d
14.3 Symmetric-Key Cryptography
Figure 14-7. Symmetric-key cryptography algorithms use the same key for encoding and decoding
14.3.1 Key Length and Enumeration Attacks
Table 14-1. Longer keys take more effort to crack (1995 data, from "Applied Cryptography")
14.3.2 Establishing Shared Keys
14.4 Public-Key Cryptography
Figure 14-8. Public-key cryptography is asymmetric, using different keys for encoding and decoding
Figure 14-9. Public-key cryptography assigns a single, public encoding key to each host
14.4.1 RSA
14.4.2 Hybrid Cryptosystems and Session Keys
14.5 Digital Signatures
14.5.1 Signatures Are Cryptographic Checksums
Figure 14-10. Unencrypted digital signature
14.6 Digital Certificates
14.6.1 The Guts of a Certificate
Figure 14-11. Typical digital signature format
14.6.2 X.509 v3 Certificates
Table 14-2. X.509 certificate fields
14.6.3 Using Certificates to Authenticate Servers
Figure 14-12. Verifying that a signature is real
14.7 HTTPS: The Details
14.7.1 HTTPS Overview
Figure 14-13. HTTP transport-level security
14.7.2 HTTPS Schemes
Figure 14-14. HTTP and HTTPS port numbers
14.7.3 Secure Transport Setup
Figure 14-15. HTTP and HTTPS transactions
14.7.4 SSL Handshake
Figure 14-16. SSL handshake (simplified)
14.7.5 Server Certificates
Figure 14-17. HTTPS certificates are X.509 certificates with site information
14.7.6 Site Certificate Validation
14.7.7 Virtual Hosting and Certificates
Figure 14-18. Certificate name mismatches bring up certificate error dialog boxes
14.8 A Real HTTPS Client
14.8.1 OpenSSL
14.8.2 A Simple HTTPS Client
14.8.3 Executing Our Simple OpenSSL Client
14.9 Tunneling Secure Traffic Through Proxies
Figure 14-19. Corporate firewall proxy
Figure 14-20. Proxy can't proxy an encrypted request
14.10 For More Information
14.10.1 HTTP Security
14.10.2 SSL and TLS
14.10.3 Public-Key Infrastructure
14.10.4 Digital Cryptography
Part IV: Entities, Encodings, and Internationalization
Chapter 15. Entities and Encodings
15.1 Messages Are Crates, Entities Are Cargo
Figure 15-1. Message entity is made up of entity headers and entity body
15.1.1 Entity Bodies
Figure 15-2. Hex dumps of real message content (raw message content follows blank CRLF)
15.2 Content-Length: The Entity's Size
15.2.1 Detecting Truncation
15.2.2 Incorrect Content-Length
15.2.3 Content-Length and Persistent Connections
15.2.4 Content Encoding
15.2.5 Rules for Determining Entity Body Length
15.3 Entity Digests
15.4 Media Type and Charset
Table 15-1. Common media types
15.4.1 Character Encodings for Text Media
15.4.2 Multipart Media Types
15.4.3 Multipart Form Submissions
15.4.4 Multipart Range Responses
15.5 Content Encoding
15.5.1 The Content-Encoding Process
Figure 15-3. Content-encoding example
15.5.2 Content-Encoding Types
Table 15-2. Content-encoding tokens
15.5.3 Accept-Encoding Headers
Figure 15-4. Content encoding
15.6 Transfer Encoding and Chunked Encoding
Figure 15-5. Content encodings versus transfer encodings
15.6.1 Safe Transport
15.6.2 Transfer-Encoding Headers
15.6.3 Chunked Encoding
15.6.3.1 Chunking and persistent connections
Figure 15-6. Anatomy of a chunked message
15.6.3.2 Trailers in chunked messages
15.6.4 Combining Content and Transfer Encodings
Figure 15-7. Combining content encoding with transfer encoding
15.6.5 Transfer-Encoding Rules
15.7 Time-Varying Instances
Figure 15-8. Instances are "snapshots" of a resource in time
15.8 Validators and Freshness
15.8.1 Freshness
Table 15-3. Cache-Control header directives
15.8.2 Conditionals and Validators
Table 15-4. Conditional request types
15.9 Range Requests
Figure 15-9. Entity range request example
15.10 Delta Encoding
Figure 15-10. Mechanics of delta-encoding
Table 15-5. Delta-encoding headers
15.10.1 Instance Manipulations, Delta Generators, and Delta Appliers
Table 15-6. IANA registered types of instance manipulations
15.11 For More Information
Chapter 16. Internationalization
16.1 HTTP Support for International Content
16.2 Character Sets and HTTP
16.3 Multilingual Character Encoding Primer
16.4 Language Tags and HTTP
16.5 Internationalized URIs
16.6 Other Considerations
16.7 For More Information
Chapter 17. Content Negotiation and Transcoding
17.1 Content-Negotiation Techniques
17.2 Client-Driven Negotiation
17.3 Server-Driven Negotiation
17.4 Transparent Negotiation
17.5 Transcoding
17.6 Next Steps
17.7 For More Information
Part V: Content Publishing and Distribution
Chapter 18. Web Hosting
18.1 Hosting Services
18.1.1 A Simple Example: Dedicated Hosting
Figure 18-1. Outsourced dedicated hosting
18.2 Virtual Hosting
Figure 18-2. Outsourced virtual hosting
18.2.1 Virtual Server Request Lacks Host Information
Figure 18-3. HTTP/1.0 server requests don't contain hostname information
18.2.2 Making Virtual Hosting Work
18.2.2.1 Virtual hosting by URL path
18.2.2.2 Virtual hosting by port number
18.2.2.3 Virtual hosting by IP address
Figure 18-4. Virtual IP hosting
18.2.2.4 Virtual hosting by Host header
Figure 18-5. Host headers distinguish virtual host requests
18.2.3 HTTP/1.1 Host Headers
18.2.3.1 Syntax and usage
18.2.3.2 Missing Host headers
18.2.3.3 Interpreting Host headers
18.2.3.4 Host headers and proxies
18.3 Making Web Sites Reliable
18.3.1 Mirrored Server Farms
Figure 18-6. Mirrored server farm
Figure 18-7. Dispersed mirrored servers
18.3.2 Content Distribution Networks
18.3.3 Surrogate Caches in CDNs
18.3.4 Proxy Caches in CDNs
Figure 18-8. Client requests intercepted by a switch and sent to a proxy
18.4 Making Web Sites Fast
18.5 For More Information
Chapter 19. Publishing Systems
19.1 FrontPage Server Extensions for Publishing Support
19.1.1 FrontPage Server Extensions
Figure 19-1. FrontPage publishing architecture
19.1.2 FrontPage Vocabulary
19.1.3 The FrontPage RPC Protocol
Figure 19-2. Initial request
19.1.3.1 Request
19.1.3.2 Response
19.1.4 FrontPage Security Model
19.2 WebDAV and Collaborative Authoring
19.2.1 WebDAV Methods
19.2.2 WebDAV and XML
19.2.3 WebDAV Headers
19.2.4 WebDAV Locking and Overwrite Prevention
Figure 19-3. Lost update problem
19.2.5 The LOCK Method
19.2.5.1 The opaquelocktoken scheme
19.2.5.2 The XML element
19.2.5.3 Lock refreshes and the Timeout header
19.2.6 The UNLOCK Method
Table 19-1. Status codes for LOCK and UNLOCK methods
19.2.7 Properties and META Data
19.2.8 The PROPFIND Method
19.2.9 The PROPPATCH Method
Table 19-2. Status codes for PROPFIND and PROPPATCH methods
19.2.10 Collections and Namespace Management
19.2.11 The MKCOL Method
19.2.12 The DELETE Method
19.2.13 The COPY and MOVE Methods
19.2.13.1 Overwrite header effect
19.2.13.2 COPY/MOVE of properties
19.2.13.3 Locked resources and COPY/MOVE
Table 19-3. Status codes for the MKCOL, DELETE, COPY, and MOVE methods
19.2.14 Enhanced HTTP/1.1 Methods
19.2.14.1 The PUT method
19.2.14.2 The OPTIONS method
19.2.15 Version Management in WebDAV
19.2.16 Future of WebDAV
19.3 For More Information
Chapter 20. Redirection and Load Balancing
20.1 Why Redirect?
20.2 Where to Redirect
20.3 Overview of Redirection Protocols
Table 20-1. General redirection methods
Table 20-2. Proxy and cache redirection techniques
20.4 General Redirection Methods
20.4.1 HTTP Redirection
Figure 20-1. HTTP redirection
20.4.2 DNS Redirection
Figure 20-2. DNS-based redirection
20.4.2.1 DNS round robin
Example 20-1. IP addresses for www.cnn.com
20.4.2.2 Multiple addresses and round-robin address rotation
Example 20-2. Rotating DNS address lists
20.4.2.3 DNS round robin for load balancing
Figure 20-3. DNS round robin load balances across servers in a server farm
20.4.2.4 The impact of DNS caching
20.4.2.5 Other DNS-based redirection algorithms
Figure 20-4. DNS request involving authoritative server
20.4.3 Anycast Addressing
Figure 20-5. Distributed anycast addressing
20.4.4 IP MAC Forwarding
Figure 20-6. Layer-2 switch sending client requests to a gateway
Figure 20-7. MAC forwarding using a layer-4 switch
20.4.5 IP Address Forwarding
Figure 20-8. A switch doing IP forwarding to a caching proxy or mirrored web server
Figure 20-9. Full NAT of a TCP/IP datagram
20.4.6 Network Element Control Protocol
20.4.6.1 Messages
Table 20-3. NECP messages
20.5 Proxy Redirection Methods
20.5.1 Explicit Browser Configuration
20.5.2 Proxy Auto-configuration
Figure 20-10. Proxy auto-configuration
20.5.3 Web Proxy Autodiscovery Protocol
20.5.3.1 PAC file autodiscovery
Figure 20-11. WPAD determines the PAC URL, which determines the proxy server
20.5.3.2 WPAD algorithm
20.5.3.3 CURL discovery using DHCP
20.5.3.4 DNS A record lookup
20.5.3.5 Retrieving the PAC file
20.5.3.6 When to execute WPAD
20.5.3.7 WPAD spoofing
20.5.3.8 Timeouts
20.5.3.9 Administrator considerations
20.6 Cache Redirection Methods
20.6.1 WCCP Redirection
20.6.1.1 How WCCP redirection works
20.6.1.2 WCCP2 messages
Table 20-4. WCCP2 messages
20.6.1.3 Message components
Table 20-5. WCCP2 message components
20.6.1.4 Service groups
20.6.1.5 GRE packet encapsulation
Figure 20-12. How a WCCP router changes an HTTP packet's destination IP address
20.6.1.6 WCCP load balancing
20.7 Internet Cache Protocol
20.8 Cache Array Routing Protocol
Figure 20-13. ICP queries
Figure 20-14. CARP redirection
20.9 Hyper Text Caching Protocol
Figure 20-15. HTCP message format
Table 20-6. HTCP data components
Table 20-7. HTCP opcodes
20.9.1 HTCP Authentication
Table 20-8. HTCP authentication components
20.9.2 Setting Caching Policies
Table 20-9. List of Cache headers for modifying caching policies
20.10 For More Information
Chapter 21. Logging and Usage Tracking
21.1 What to Log?
21.2 Log Formats
21.2.1 Common Log Format
Table 21-1. Common Log Format fields
Example 21-1. Common Log Format
21.2.2 Combined Log Format
Table 21-2. Additional Combined Log Format fields
Example 21-2. Combined Log Format
21.2.3 Netscape Extended Log Format
Table 21-3. Additional Netscape Extended Log Format fields
Example 21-3. Netscape Extended Log Format
21.2.4 Netscape Extended 2 Log Format
Table 21-4. Additional Netscape Extended 2 Log Format fields
Example 21-4. Netscape Extended 2 Log Format
Table 21-5. Netscape route codes
Table 21-6. Netscape finish status codes
Table 21-7. Netscape cache codes
21.2.5 Squid Proxy Log Format
Table 21-8. Squid Log Format fields
Example 21-5. Squid Log Format
Table 21-9. Squid result codes
21.3 Hit Metering
21.3.1 Overview
21.3.2 The Meter Header
Table 21-10. Hit Metering directives
Figure 21-1. Hit Metering example
21.4 A Word on Privacy
21.5 For More Information
Part VI: Appendixes
Appendix A. URI Schemes
Table A-1. URI schemes from the W3C registry
Appendix B. HTTP Status Codes
B.1 Status Code Classifications
Table B-1. Status code classifications
B.2 Status Codes
Table B-2. Status codes
Appendix C. HTTP Header Reference
Appendix D. MIME Types
D.1 Background
D.2 MIME Type Structure
D.2.1 Discrete Types
D.2.2 Composite Types
D.2.3 Multipart Types
D.2.4 Syntax
Table D-1. Common primary MIME types
D.3 MIME Type IANA Registration
D.3.1 Registration Trees
Table D-2. Four MIME media type registration trees
D.3.2 Registration Process
D.3.3 Registration Rules
D.3.4 Registration Template
Example D-1. IANA MIME registration email template
D.3.5 MIME Media Type Registry
D.4 MIME Type Tables
D.4.1 application/*
Table D-3. "Application" MIME types
D.4.2 audio/*
Table D-4. "Audio" MIME types
D.4.3 chemical/*
Table D-5. "Chemical" MIME types
D.4.4 image/*
Table D-6. "Image" MIME types
D.4.5 message/*
Table D-7. "Message" MIME types
D.4.6 model/*
Table D-8. "Model" MIME types
D.4.7 multipart/*
Table D-9. "Multipart" MIME types
D.4.8 text/*
Table D-10. "Text" MIME types
D.4.9 video/*
Table D-11. "Video" MIME types
D.4.10 Experimental Types
Table D-12. Extension MIME types
Appendix E. Base-64 Encoding
E.1 Base-64 Encoding Makes Binary Data Safe
E.2 Eight Bits to Six Bits
Table E-1. Base-64 alphabet
Figure E-1. Base-64 encoding example
E.3 Base-64 Padding
Table E-2. Base-64 padding examples
E.4 Perl Implementation
E.5 For More Information
Appendix F. Digest Authentication
F.1 Digest WWW-Authenticate Directives
Table F-1. Digest WWW-Authenticate header directives (from RFC 2617)
F.2 Digest Authorization Directives
Table F-2. Digest Authorization header directives (from RFC 2617)
F.3 Digest Authentication-Info Directives
Table F-3. Digest Authentication-Info header directives (from RFC 2617)
F.4 Reference Code
F.4.1 File "digcalc.h"
F.4.2 File "digcalc.c"
F.4.3 File "digtest.c"
Appendix G. Language Tags
G.1 First Subtag Rules
G.2 Second Subtag Rules
G.3 IANA-Registered Language Tags
Table G-1. Language tags
G.4 ISO 639 Language Codes
Table G-2. ISO 639 and 639-2 language codes
G.5 ISO 3166 Country Codes
Table G-3. ISO 3166 country codes
G.6 Language Administrative Organizations
Appendix H. MIME Charset Registry
H.1 MIME Charset Registry
H.2 Preferred MIME Names
H.3 Registered Charsets
Table H-1. IANA MIME charset tags