Contents at a Glance
Foreword
About the Authors
About the Technical Reviewer
Acknowledgments
Chapter 1: Introduction to HTML5 WebSocket
Chapter 2: The WebSocket API
Chapter 3: The WebSocket Protocol
Chapter 4: Building Instant Messaging and Chat over WebSocket with XMPP
Chapter 5: Using Messaging over WebSocket with STOMP
Chapter 6: VNC with the Remote Framebuffer Protocol
Chapter 7: WebSocket Security
Chapter 8: Deployment Considerations
Appendix A: Inspecting WebSocket Traffic
Appendix B: WebSocket Resources
Index
Chapter 1
Introduction to HTML5 WebSocket
This book is for anyone who wants to learn how to build real-time web applications.
You might say to yourself, “I already do that!” or ask, “What does that really mean?” Let’s
clarify: this book will show you how to build truly real-time web applications using
WebSocket, a new, widely supported open industry standard that enables full-duplex,
bidirectional communication between your client application and remote servers over
the Web, without plugins!
Still confused? So were we a few years ago, before we started working with HTML5
WebSocket. In this guide, we’ll explain what you need to know about WebSocket and
why you should be thinking about using it today. We will show you how to
implement a WebSocket client in your web application, create your own WebSocket
server, use WebSocket with higher-level protocols like XMPP and STOMP, secure traffic
between your client and server, and deploy your WebSocket-based applications.
What is HTML5?
First, let’s examine the “HTML5” part of “HTML5 WebSocket.” If you’re already an expert
with HTML5, having read, say, Pro HTML5 Programming, and are already developing
wonderfully modern and responsive web applications, then feel free to skip this section
and read on. But, if you’re new to HTML5, here’s a quick introduction.
HTML was originally designed for static, text-based document sharing on the
Internet. Over time, as web users and designers wanted more interactivity in their HTML
documents, they began enhancing these documents by adding form functionality and
early portal-type capabilities. Today, these static document collections, or web sites,
are more like web applications, built on the principles of rich client/server desktop
applications. These web applications are used on almost any device: laptops,
smartphones, tablets, the whole gamut.
HTML5 is designed to make the development of these rich web applications easier,
more natural, and more logical, so that developers can design and build once and deploy
anywhere. HTML5 also makes web applications more usable, because it removes the need
for plugins. With HTML5, you now use semantic markup language like instead of .
Multimedia is also much easier to code, by using tags like
CHAPTER 1 ■ INTRODUCTION TO HTML5 WEBSOCKET
browser. Cross-Document Messaging provides asynchronous message passing between
JavaScript contexts.
The HTML5 specification for Cross-Document Messaging also clarifies and refines
domain security by introducing the concept of origin, which is defined by a scheme, host,
and port. Two URIs have the same origin if and only if they share the same scheme, host,
and port; the path is not part of the origin.
The following pairs show mismatched schemes, hosts, and ports, respectively (and
therefore different origins):
• https://www.example.com and http://www.example.com
• http://www.example.com and http://example.com
• http://example.com:8080 and http://example.com:8081
The following URLs, by contrast, share the same origin, because only their paths differ:
http://www.example.com/page1.html and http://www.example.com/page2.html
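The origin rule can be checked mechanically. The following Python sketch uses the standard library’s urllib.parse.urlsplit to reduce a URL to its (scheme, host, port) triple and then compares the triples. One assumption goes beyond the text: when a URL omits the port, the scheme’s default port (80 for http, 443 for https) is filled in, so http://example.com and http://example.com:80 count as the same origin. The helper names origin and same_origin are illustrative.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlsplit

# Assumed default ports for schemes whose URLs omit an explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def origin(url: str) -> tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that defines a URL's origin.

    The path, query string, and fragment are deliberately ignored.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin if and only if scheme, host, and port all match."""
    return origin(a) == origin(b)
```

Applied to the examples above, same_origin returns False for each mismatched pair and True for the page1/page2 pair, because the path plays no part in the comparison.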
Cross-Document Messaging overcomes the same-origin limitation by allowing
messages to be exchanged between different origins. When you send a message, you
specify the receiver’s origin, and when you receive a message, the sender’s origin is
included as part of the message. The origin of the message is provided by the browser
and cannot be spoofed. On the receiver’s side, you can decide which messages to process
and which to ignore. You can also keep a “white list” and process only messages from
documents with trusted origins.
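In a browser, the receiving side makes this decision inside its message event handler by comparing the event’s origin against a trusted list. The Python sketch below models only that decision; the Message class and the TRUSTED_ORIGINS entries are hypothetical stand-ins, not part of any browser API. Origins are compared as exact strings of the form scheme://host[:port], with no path and no trailing slash.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for a received cross-document message."""

    origin: str  # filled in by the browser, not by the sender's script
    data: str


# Hypothetical white list of sender origins this document trusts.
TRUSTED_ORIGINS = frozenset({"https://portal.example.com", "https://www.example.com"})


def accept(message: Message, trusted: frozenset[str] = TRUSTED_ORIGINS) -> bool:
    """Process a message only if its origin is on the white list."""
    return message.origin in trusted


def handle_messages(messages: list[Message]) -> list[str]:
    """Return the payloads of trusted messages and silently drop the rest."""
    return [m.data for m in messages if accept(m)]
```

Note that a message from http://www.example.com is rejected even though https://www.example.com is trusted: a different scheme means a different origin.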
Cross-Document Messaging is a great example of how the HTML5 specification
simplifies communication between web applications with a very powerful API. However,
its focus is limited to communicating across windows, tabs, and iframes. It does not
address the complexity of protocol communication between clients and remote servers,
which has become overwhelming, and that brings us to WebSocket.
Ian Hickson, the editor of the HTML5 specification, added what we now call
WebSocket to the Communication section of the HTML5 specification. Originally called
TCPConnection, WebSocket has since evolved into its own independent specification.
Although WebSocket now lives outside the realm of HTML5, it is key to achieving real-time
connectivity in modern (HTML5-based) web applications, and it is often discussed as part
of the Connectivity area of HTML5. So why is WebSocket meaningful on today’s Web? To
answer that, let’s first look at older HTTP architectures, where the cost of protocol
communication is significant.
Overview of Older HTTP Architectures
To understand the significance of WebSocket, let’s first take a look at older architectures,
specifically those that use HTTP.
HTTP 101 (or rather, HTTP/1.0 and HTTP/1.1)
In older architectures, connectivity was handled by HTTP/1.0 and HTTP/1.1. HTTP is
a protocol for request-response in a client/server model, where the client (typically a
web browser) submits an HTTP request to the server, and the server responds with the
requested resources, such as an HTML page, as well as additional information about the
page. HTTP was also designed for fetching documents; HTTP/1.0 sufficed for a single
document request from a server. However, as the Web grew beyond simple document
sharing and began to include more interactivity, connectivity needed to be refined to
enable quicker response time between the browser request and the server response.
In HTTP/1.0, a separate connection was made for every request to the server, which,
to say the least, did not scale well. The next revision of HTTP, HTTP/1.1, added reusable
connections. With the introduction of reusable connections, browsers could initialize a
connection to a web server to retrieve the HTML page, then reuse the same connection
to retrieve resources like images, scripts, and so on. HTTP/1.1 reduced latency between
requests by reducing the number of connections that had to be made from clients to servers.
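To see HTTP/1.1 connection reuse in action, the following Python sketch starts a throwaway local server with the standard library’s http.server and fetches three resources through a single http.client.HTTPConnection. The server counts how many TCP connections it accepts and how many requests it serves. The resource paths and the helper name fetch_over_one_connection are illustrative assumptions.

```python
from __future__ import annotations

import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_over_one_connection(paths: list[str]) -> tuple[int, int]:
    """Fetch several resources over one HTTP/1.1 connection.

    Returns (connections accepted by the server, requests it served).
    """
    stats = {"connections": 0, "requests": 0}

    class Handler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # enables persistent (keep-alive) connections

        def setup(self) -> None:
            stats["connections"] += 1  # called once per accepted TCP connection
            super().setup()

        def do_GET(self) -> None:
            stats["requests"] += 1
            body = f"resource {self.path}".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            # Content-Length lets the client find the end of the body
            # without the server having to close the connection.
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args: object) -> None:
            pass  # keep the demo quiet

    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: let the OS pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        for path in paths:
            conn.request("GET", path)  # reuses the already-open socket
            response = conn.getresponse()
            response.read()  # the body must be fully read before the next request
    finally:
        conn.close()  # lets the server's per-connection request loop finish
        server.shutdown()
        server.server_close()
    return stats["connections"], stats["requests"]
```

Called with ["/index.html", "/logo.png", "/app.js"], this should report one connection serving three requests; with HTTP/1.0-style behavior each request would instead open its own connection.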
HTTP is stateless, which means it treats each request as unique and independent.
There are advantages to a stateless protocol: for example, the server doesn’t need to keep
information about the session and thus doesn’t require storage of that data. However, this
also means that redundant information about the request is sent for every HTTP request
and response.
Let’s take a look at an example HTTP/1.1 request from a client to a server. Listing 1-1
shows a complete HTTP request containing several HTTP headers.
Listing 1-1. HTTP/1.1 Request Headers from the Client to the Server
GET /PollingStock/PollingStock HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost:8080/PollingStock/
Cookie: showInheritedConstant=false; showInheritedProtectedConstant=false; showInheritedProperty=false; showInheritedProtectedProperty=false; showInheritedMethod=false; showInheritedProtectedMethod=false; showInheritedEvent=false; showInheritedStyle=false; showInheritedEffect=false;
Listing 1-2 shows an example HTTP/1.1 response from a server to a client.
Listing 1-2. HTTP/1.1 Response Headers from the Server to the Client
HTTP/1.1 200 OK
X-Powered-By: Servlet/2.5
Server: Sun Java System Application Server 9.1_02
Content-Type: text/html;charset=UTF-8
Content-Length: 321
Date: Wed, 06 Dec 2012 00:32:46 GMT
In Listings 1-1 and 1-2, the total overhead is 871 bytes of header information alone
(that is, no actual data). These two examples show only the header information that
goes over the wire in each direction, from the client to the server and from the server
to the client, regardless of whether the server has any actual data to deliver to the
client.
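Because every exchange repeats these headers, the overhead grows with each round trip. In the small Python sketch below, header_bytes counts the wire size of a header block, assuming ISO-8859-1 encoding and CRLF line endings with a blank line terminating the block, as HTTP/1.1 specifies. overhead_bits_per_second scales a per-exchange figure such as the 871 bytes above to many clients. Both function names, and the client count and polling rate used below, are illustrative assumptions; since the listings are reformatted for print, measuring them with header_bytes may not reproduce 871 exactly.

```python
from __future__ import annotations


def header_bytes(header_lines: list[str]) -> int:
    """Bytes a header block occupies on the wire.

    Each line (the start line and every header) ends with CRLF, and an
    empty CRLF line terminates the block.
    """
    return sum(len(line.encode("iso-8859-1")) + 2 for line in header_lines) + 2


def overhead_bits_per_second(
    bytes_per_exchange: int, clients: int, polls_per_second: float
) -> float:
    """Header-only traffic when every client completes this many exchanges per second."""
    return bytes_per_exchange * 8 * clients * polls_per_second
```

For example, 1,000 clients each polling once per second with 871 bytes of headers per exchange generate 871 × 8 × 1,000 = 6,968,000 bits per second (about 6.97 Mbps) of header traffic, before a single byte of actual data is sent.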
With HTTP/1.0 and HTTP/1.1, the main inefficiencies stem from the following:
• HTTP was designed for document sharing, not for the rich, interactive applications
we’ve become accustomed to on our desktops and now on the Web
• The amount of information that the HTTP protocol requires to communicate between
the client and server adds up quickly the more interaction you have between them
By nature, HTTP is also half duplex, meaning that traffic flows in only one direction at
a time: the client sends a request to the server (one direction), and the server then
responds to the request (the other direction). Being half duplex is simply inefficient.
Imagine a phone conversation where every time you want to communicate, you must press
a button, state your message, and press another button to complete it. Meanwhile, your
conversation partner must patiently wait for you to finish, press the button, and then
finally respond in kind. Sound familiar? We used this form of communication as kids on a
small scale, and the military uses it all the time: it’s a walkie-talkie. While walkie-talkies
definitely have their benefits and great uses, they are not always the most efficient form
of communication.
Engineers have been working around this issue for years with a variety of well-known
methods: polling, long polling, and HTTP streaming.
The Long Way Around: HTTP Polling, Long Polling, and Streaming
Normally, when a browser visits a web page, an HTTP request is sent to the server that
hosts that page. The web server acknowledges the request and sends the response back
to the web browser. In many cases, the information being returned, such as stock prices,
news, traffic patterns, medical device readings, and weather information, can be stale by
the time the browser renders the page. If your users need the most up-to-date real-time
information, they can keep refreshing the page manually, but that is obviously an
impractical and inelegant solution.
Current attempts to provide real-time web applications largely revolve around
polling and other techniques that simulate server-side push. The most popular of these
is Comet, which delays the completion of an HTTP response in order to deliver messages
to the client.
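The core of the Comet idea, delaying completion of a response until there is something to send, can be sketched in-process without any HTTP. In the Python sketch below, UpdateChannel is a hypothetical per-client mailbox: hold_response stands in for a server handler that keeps a request open, and publish stands in for new data arriving on the server. It uses threading.Condition.wait_for to block until a message is pending or a timeout expires.

```python
from __future__ import annotations

import threading


class UpdateChannel:
    """Hypothetical server-side mailbox for one client (no real HTTP involved).

    hold_response() plays the role of a Comet-style request handler: instead
    of answering immediately, it delays completing the "response" until a
    message is published or the timeout expires.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._pending: list[str] = []

    def publish(self, message: str) -> None:
        """Called on the server side when new data becomes available."""
        with self._cond:
            self._pending.append(message)
            self._cond.notify_all()  # wake any response currently being held

    def hold_response(self, timeout_s: float) -> str | None:
        """Block until a message is pending; return None if the timeout expires."""
        with self._cond:
            if self._cond.wait_for(lambda: self._pending, timeout=timeout_s):
                return self._pending.pop(0)
            return None  # empty response; a long-polling client would re-request
```

If another thread calls channel.publish("AAPL 101.50") shortly after hold_response(timeout_s=2.0) starts waiting, the held call returns that message as soon as it is published rather than at the next polling interval; if nothing is published in time, it returns None.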
Polling is a regularly timed synchronous call where the client makes a request to the
server to see if there’s any information available for it. The requests are made at regular
intervals; the client receives a response, regardless of whether there’s information.
Specifically, if there’s information available, the server sends it. If no information is
available, the server returns a negative response and the client closes the connection.
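Plain polling, by contrast, asks on a fixed schedule whether or not anything has changed. The Python sketch below shows the client-side loop; fetch is a hypothetical callable standing in for one complete HTTP request/response cycle, returning the server’s data or None for a negative response. make_fake_server is likewise hypothetical and exists only so the loop can run without a network.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


def poll(
    fetch: Callable[[], Optional[str]], interval_s: float, rounds: int
) -> tuple[list[str], int]:
    """Issue `rounds` regularly timed requests through fetch().

    Returns the messages received and the number of negative (empty)
    responses, i.e. round trips that carried no data.
    """
    received: list[str] = []
    empty = 0
    for i in range(rounds):
        result = fetch()  # one full request/response cycle
        if result is None:
            empty += 1  # the server had nothing new for us
        else:
            received.append(result)
        if i < rounds - 1:
            time.sleep(interval_s)  # wait until the next scheduled poll
    return received, empty


def make_fake_server(updates: dict[int, str]) -> Callable[[], Optional[str]]:
    """Hypothetical server that has data only on the request numbers in `updates`."""
    calls = 0

    def fetch() -> Optional[str]:
        nonlocal calls
        calls += 1
        return updates.get(calls)

    return fetch
```

With a fake server that has data only on requests 3 and 7, poll(make_fake_server({3: "AAPL 101.50", 7: "AAPL 101.72"}), interval_s=1.0, rounds=10) returns both quotes and a count of 8 empty responses: eight of the ten round trips carry no data, yet each still pays the full header overhead.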