A Data-Oriented (and Beyond) Network Architecture
∗†
Teemu Koponen
tkoponen@iki.fi
‡
Andrey Ermolinskiy
andreye@cs.berkeley.edu
mchawla@cs.berkeley.edu
‡
Mohit Chawla
‡
Kye Hyun Kim
kyekim@cs.berkeley.edu
‡
Ion Stoica
istoica@cs.berkeley.edu
‡
∗‡
Byung-Gon Chun
bgchun@cs.berkeley.edu
Scott Shenker
shenker@icsi.berkeley.edu
ABSTRACT
The Internet has evolved greatly from its original incarnation. For
instance, the vast majority of current Internet usage is data retrieval
and service access, whereas the architecture was designed around
host-to-host applications such as telnet and ftp. Moreover, the
original Internet was a purely transparent carrier of packets, but
now the various network stakeholders use middleboxes to improve
security and accelerate applications. To adapt to these changes, we
propose the Data-Oriented Network Architecture (DONA), which
involves a clean-slate redesign of Internet naming and name
resolution.
Categories and Subject Descriptors
C.2.5 [Computer-Communication Networks]: Local and Wide-
Area Networks—Internet; C.2.1 [Computer-Communication Net-
works]: Network Architecture and Design
General Terms
Design
Keywords
Naming, Internet architecture, name resolution, data, middleboxes
1
The DNS name resolution system is a fundamental part of today’s
Internet, underlying almost all Internet usage. However, the DNS
was developed rather late in the Internet’s evolution, after many basic
pieces of the architecture were in place. For instance, TCP sessions
were already bound to IP addresses and the Berkeley Socket API
referred to addresses, not names; frozen design decisions, such as
these, limited the extent to which DNS names (or any other naming
∗
International Computer Science Institute (ICSI)
†
Helsinki Institute for Information Technology (HIIT)
‡
UC Berkeley, Computer Science Division
Introduction
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SIGCOMM’07, August 27–31, 2007, Kyoto, Japan.
Copyright 2007 ACM 978-1-59593-713-1/07/0008 ...$5.00.
system) could permeate the architecture. As a result, the current role
of naming in the architecture is more an accident of history than the
result of principled architectural design. In this proposal, we take a
“clean-slate” look at naming and name resolution.
The test of any architecture is whether it gracefully accommodates
a wide spectrum of potential uses (and can withstand potential
abuses), both those we encounter in the present and those we
anticipate for the future. However, to motivate our design, we
first focus more narrowly on one particular issue; the shift in usage
from host-centric to data-centric applications.
The first Internet applications, such as file transfer and remote
login, focused strictly on host-to-host communication: The user
explicitly directed the source to communicate with another host,
and the network’s only role was to carry packets to the destination
address in the packet header. The Internet architecture was built
around this host-to-host model and, as a result, the architecture is
well-suited for communication between pairs of stationary hosts.
Today, however, the vast majority of Internet usage is data
retrieval and service access, where the user cares about content
and is oblivious to location. That is, the user knows that she wants
headlines from CNN, or videos from YouTube, or access to her
bank account, but does not know or care on which machine the
desired data or service resides. The current architecture can support
data/service access, as is obvious from its prevalence on the Internet,
but it does not fit comfortably within the host-to-host model. For
instance, consider the following three user-relevant issues:
• Persistence: once given a name for some data or service,
the user would like that name to remain valid as long as the
underlying data or service is available. There should be no
equivalent of today’s “broken links” when data is moved to
another site. Today, HTTP redirect and dynamic DNS are
used to minimize this problem, but they are not sufficient
answers. For instance, neither works if the data changes
administrative domains, unless the operator of the previous
domain provides perpetual support.
• Availability: data and services should have high availability,
in terms of both reliability and low-latency. Availability
is usually provided by replication at endpoints, and the
network’s role is to allow user requests to find nearby
copies. The first large-scale solution to this was deployed
by Akamai, using intelligent DNS servers and URL rewriting.
More recently, P2P mechanisms like BitTorrent have become
prevalent. While the success of these mechanisms is
undeniable, it is not at all clear that such a fundamental
requirement, availability, should rely on a set of ad hoc and
application-specific mechanisms.
• Authenticity: users would like to know that the data came
from the appropriate source, rather than from some spoofing
adversary. Today this requires a PKI to provide users with
the public key of the provider. Moreover, authenticity today
is typically achieved by securing the channel to the source,
rather than explicitly authenticating the data.
Thus, several of the most natural features one would want for
service access and data retrieval — persistence, availability, and
authentication — are made unnecessarily hard by the current host-
to-host model of the Internet, often requiring awkward or expensive
work-arounds. Given this discordance between historical design
(host-oriented) and current usage (data-oriented), we ask: what
would the architecture look like if we built it around service and
data access?
Somewhat surprisingly, our research suggests that most of the
necessary changes reside in how Internet names are structured and
resolved. We propose replacing DNS names with flat, self-certifying
names, and replacing DNS name resolution with a name-based
anycast primitive that lives above the IP layer. We call the resulting
design the Data-Oriented Network Architecture (DONA).
DONA improves data retrieval and service access by providing
stronger and more architecturally coherent support for persistence,
availability, and authentication. It can also be extended to provide
support for caching and RSS-like updates. However, DONA’s
impact is not limited to data and service access; we use these
applications as motivating examples because they force us to think
differently about some fundamental issues, but most of these issues
are not particular to data/service access. As a result, as we describe
below, DONA’s overall design has architectural implications that
range far beyond data/service access.
DONA’s name-based anycast primitive is useful for many kinds
it can provide the basic
of resource discovery; for instance,
primitives underlying SIP, support host mobility and multihoming,
and establish forwarding state for interdomain multicast. Placing
anycast at the naming layer, rather than at the IP layer, allows us
to design for functionality rather than be limited by concerns about
scalability, since the mechanisms need not operate at link speed.
There is another issue where historical design is at odds with
current usage. The original Internet architecture, following the
end-to-end principle, intended the network to be a purely trans-
parent carrier of packets. Today, however, the various network
stakeholders (such as enterprises) use middleboxes to improve
security (e.g., firewalls, proxies) and accelerate applications (e.g.,
caches) [3]. Because DONA’s anycast name resolution process
follows essentially the same administrative path as the ensuing
data packets (we will address the subtleties in this statement later),
DONA can treat the stakeholders along the path as relevant Internet
actors. This allows DONA to provide clean support for network-
imposed middleboxes. This isn’t a repudiation of the end-to-end
principle, in that functionality is still provided at the ends; it is
merely a recognition that operators should have, at their disposal,
architecturally coherent mechanisms to control how and what traffic
traverses their network.
More recently, there has been much hand-wringing about the
scalability of routing in the current addressing paradigm [27].
DONA’s anycast primitive provides a discovery mechanism that
lives above the IP layer; as we will later describe, this enables the
use of path-labels rather than global addresses, an approach that
results in tiny interdomain routing tables.
At a more speculative level, DONA represents a partial shift away
from sender-based primitives to a more receiver-based approach.
One of our future research tasks is to explore how far we can go in
this direction, and what this might mean for a future Internet.
These architectural implications encouraged us that DONA
is not merely restricted to data and service access (which, by
itself, is significant as it is by far the dominant usage on the
current Internet), but rather facilitates improvements along many
dimensions. However, there are many other issues, not discussed
here, that demand attention: the Internet still needs better security
(particularly against DoS and malicious/misconfigured routers),
better manageability, better usability, and many other properties.
We aren’t proposing DONA as a solution to these problems; in fact,
we think DONA is largely orthogonal to them. We hope to eventually
incorporate work on these problems within a larger framework that
also includes DONA.
The next section presents DONA’s basic design and Section 3
describes how this design supports such tasks as server selection,
mobility, multihoming, session initiation, and interdomain multicast
state establishment. Section 4 discusses how DONA’s infrastructure
could be extended to support more advanced functionality, such
as content delivery, delay-tolerant networking, and a variety of
administrative access policies (including middlebox insertion).
Section 5 discusses our prototype implementation and Section 6
addresses the crucial question of DONA’s feasibility. The name-
based anycast primitive will require routing on a very large
namespace, but it need only be done at name resolution speeds,
not line speeds; we present various estimates indicating that DONA
is within reach of today’s technology.
We delay our discussion of related work until Section 7, in order
to have enough context to make the necessary connections. For now,
we merely note that almost every aspect of our design is (proudly)
stolen from elsewhere, most notably from TRIAD [19], HIP [28],
and SFS [25]. It is the synthesis of these various ideas into a coherent
architecture that we claim as our contribution.
The paper ends, in Section 8, with some speculations on DONA’s
broader architectural implications. In particular, we discuss the
possibility of basing the interface offered to applications on DONA’s
name-based anycast primitive.
2 Basic Design
DONA involves a major redesign of Internet naming. In this section
we first motivate these changes and then present DONA’s naming
structure and name resolution mechanism. This is followed by a
brief discussion of security and addressing issues. For lack of space,
many details are omitted.
2.1 Motivation
We start with the problem of service and data access and ask how
we might easily achieve persistence, availability, and authentication,
basic tasks which today are (sometimes badly) handled by external
ad hoc mechanisms.
In DONA, we propose a strict separation
of concerns between naming and name resolution in handling
these tasks: names handle persistence and authenticity, while name
resolution handles availability.
To provide persistence and authenticity, we use flat, self-certifying
names [25,28]. This form of naming is, by now, a standard technique.
As we review shortly in Section 2.2, such names will remain
invariant and enable easy authentication. The use of flat names
makes informal identification harder (since you can’t remember
your friend’s 128-bit identifier), but it makes formal authentication
easier.
High availability means that when a user requests data by name,
she receives the data quickly and reliably. To provide availability,
the name resolution process should (a) guide requests to nearby
copies of the data, and (b) avoid failed or overloaded servers. The
question, then, is how to build such a name resolution process.
There are two main resolution paradigms in the literature. The
first is what we use today: lookup-by-name in a distributed database,
which returns the location (IP address) of a nearby copy. This
database must maintain locations of all the copies, identify the
location of the requester, and then find a reasonably good match
between the two. Akamai has pioneered the development of
techniques to accomplish this, but it clearly requires significant
mechanism to achieve.
The other possibility, most notably used in TRIAD [19], is to
route-by-name to the closest copy. Routing protocols are designed
to find shortest paths and route around failures, exactly the two
tasks (a) and (b) we’ve assigned to name resolution. This led us
to conclude that route-by-name, rather than look-up, was the most
natural approach. We discuss our design for this in Section 2.3.
We note that there is one other issue — in addition to persistence,
authenticity, and availability — that the user cares about, trustwor-
thiness: users would like to know whether they are getting their
information from a reliable source. We believe that this, like several
other issues we discuss below, is best handled by mechanisms
that are external to the architecture. Trust is so idiosyncratic and
subjective that we don’t believe any network architecture should
mandate the mechanisms by which trust is established. Moreover,
by remaining outside of the core architectural structures, external
trust mechanisms can evolve along with changes in society and
institutions in a way that a fixed architecture can’t. Today, a
variety of external mechanisms, ranging from Google to personal
recommendations, are used to establish trust. We expect that new
trust mechanisms, such as reputation systems and enhanced “webs-
of-trust”, will be developed in the future, but they will (and should)
lie outside the confines of the architecture.
2.2 Naming
DONA names are organized around principals. Each principal is
associated with a public-private key pair, and each datum or service
or any other named entity (host, domain, etc.) is associated with a
principal. Names are of the form P:L where P is the cryptographic
hash of the principal’s public key and L is a label chosen by the
principal, who ensures that these names are unique. The granularity
of naming is left up to principals; a principal might choose to just
name her web site, or name her web site and each page within it, or
name at a finer granularity (such as naming each individual photo or
publication).
Principals are considered to own their data, in the sense that only
hosts authorized by the principal P can offer to serve (i.e., provide
access to) entities with names of the form P:L. Each datum comes
with metadata including the principal’s public key and the principal’s
signature of the data; thus, when we speak of a client retrieving data
we mean it has received the triplet
(along with perhaps other metadata).1 In such a scheme, requesting
clients rely on the principal’s signature to ensure the data’s integrity.
These names are application-independent and globally unique
(and can refer to anything, not just data or services). They are also
self-certifying in the following sense [25,28]: When a client asks for
a piece of data with name P:L and receives the triplet , it can immediately verify that the data did indeed
come from the principal by checking that the public key hashes to P,
1The signature might be signed with a secondary key, but
accompanied by a certificate from the principal’s key authorizing
the secondary key.
and that this key also generated the signature. This satisfies the need
for authentication; persistence follows from the fact that the names
don’t refer to location, and thus the data can be hosted anywhere.
With a slight alteration, these basic ideas can be naturally applied
to immutable data: here, the label L is the cryptographic hash
of the contents of the data and the principal P is the purveyor of
the data, not the owner; for instance, the purveyor could be the
hosting CDN. Since the client need not rely on a principal to ensure
the integrity of the data (the hash over the contents ensures this),
the only role of the principal is to ensure data delivery. Since
different CDNs may have different degrees of reliability or coverage,
clients seeking immutable data might specifically request it from a
particular purveyor.
Note that there is a difference between the administrative structure
of the hosting machines, and the nature of the principal. A person’s
web page would be associated with their own public key. The web
page might initially be hosted at their university or company or paid
hosting service, but this is not reflected in the name; instead, the
owner (as we discuss below) would authorize (with some reasonable
TTL) this entity to host their web page (and this authorization could
be applied to only a specific datum, or to a portion of the principal’s
data, or to all the principal’s data). If the person decided to move
the page (for instance, if they changed employers), then they would
authorize a new entity to host their page (and let the old authorization
expire). The name of the data would not change, even though the
entity hosting the data (or service) did change.
The biggest challenge to making such flat names work is making
sure they resolve to the appropriate locations, which is what we
discuss below. But there are usage questions that must also be
addressed (though we are not the first to propose names of this form,
and several of these issues have been discussed elsewhere, such as
in [37]).
For instance, one usage question is: how will users learn these
flat, long, and user-unfriendly names? We expect that users will
learn these flat names through a variety of external mechanisms that
the user trusts (to varying degrees), such as search engines, private
communication, recommender services, and the like. Users won’t,
of course, remember the flat names directly, but will have their own
private namespace of human-readable names, which map onto these
global and flat names (as in [14]). While such flat names are harder
to use than today’s DNS names, they offer the advantage that the
mappings between private human-readable names and flat names
will be free to reflect evolving social structures rather than being
tied, as DNS names are, to a fixed administrative structure.
Also, DONA does not provide a reverse-lookup mechanism for
names, like DNS, where one can map an IP address to a name. The
basic use of reverse-lookup is to tie a user-unreadable label (in this
case an IP address) to an user-meaningful identity (in this case, a
DNS name). We think the analogous service in DONA would allow
a user to map a principal’s key (which is human-unreadable) to a
trust-chain between the principal’s key and his own that would help
the user understand the identity of the principal.
2.3 Name Resolution
We now turn from discussing the nature of DONA’s names to
presenting how DONA translates these names into locations. Our
goal in this design is to achieve high availability, by finding close-by
copies and avoiding failures.
As discussed earlier, DONA uses the route-by-name paradigm
for name resolution. Rather than use DNS servers, DONA will
rely on a new class of network entities called resolution handlers
(RHs). Name resolution is accomplished through the use of two
basic primitives: FIND(P:L) and REGISTER(P:L). A client issues a
Tier-1
RH
RH
RH
RH
RH
Copy
RH
Copy
RH
Client
Figure 1: Registration state (solid arrows) in RHs after copies
have registered themselves. RHs route client-issued FIND
(dashed arrow) to a nearby copy.
register : the received REGISTER message, if any.
regs
pref reg: preferred REGISTER to disseminate to peers/parents,
: all REGISTER messages for the name (P:L).
: the name (P:L) event concerns.
if any.
name
regs ←load(name);
if register received then
if duplicate or invalid signature then
return;
end
set timer for expiration(register);
else
// A REGISTER expired...
end
foreach out in provider and peer links do
pref reg ←decision process(out,regs);
if pref reg changed for out then
msg ←new message(pref reg);
add intra cost(out,msg);
sign message(private key,msg);
queue(out,msg);
end
end
store(name, all changes);
FIND(P:L) packet to locate the object named P:L, and RHs route
this request towards a nearby copy. REGISTER messages set up the
state necessary for the RHs to route FINDs effectively.
Each domain or administrative entity will have one logical RH
(but perhaps many physical incarnations); we will denote the RH
associated with an administrative entity X by RHX. RHX is
the provider/customer/peer (or, alternatively, parent/child/peer) of
RHY if X is the provider/customer/peer of Y in terms of AS-level
relationships. This RH structure can extend to finer granularity
than ASes to reflect other organizational and social structures; for
instance, there could be departmental RHs at universities and firms
and, going even further, users could have their own local RHs
which peer with those of their neighbors and friends. RHs use
local policy (consistent with their domain’s peering agreements)
when processing REGISTERs and FINDs.
Each client knows the location of its local RH through some local
configuration (much like they know about their local DNS server).
Any machine authorized to serve a datum or service with name P:L
sends a REGISTER(P:L) command to its local RH. Registrations
can also take the form REGISTER(P:*) if the host is serving all
data associated with the principal (or will forward incoming FIND
packets to a local copy).
Each RH maintains a registration table that maps a name to both
a next-hop RH and the distance to the copy (in terms of the number
of RH hops, or some other metric). There is a separate entry for
P:*, in addition to individual entries for the various P:L. RHs use
longest-prefix matching; if a FIND for P:L arrives and there is an
entry for P:* but not P:L, the RH uses the entry for P:*; when entries
for both P:* and P:L exist, the RH uses the one for P:L. Only when
the RH has neither P:* or P:L entries do we say that P:L does not
have an entry in the registration table.
When a FIND(P:L) arrives, the forwarding rule is straightforward:
if there is an entry in the registration table, the FIND is sent to the
next-hop RH (and if there is more than one, the choice is based
on the local policy and which entry is closest); otherwise, the RH
forwards the FIND towards its parent (i.e., its provider) using its
local policy to choose among them if the RH is multi-homed. Thus,
registration table misses are forwarded up the AS hierarchy in the
hope of finding an entry (see Figure 1). In the case of immutable
data, a FIND command can take the normal form FIND(P:L), or the
special form FIND(*:L) which indicates that the client is willing to
receive the (self-certified) data from any purveyor.
If RHX receives a REGISTER from a child (i.e., customer), it
does not forward it onward unless no such record exists or the new
REGISTER comes from a copy closer than the previous copy. If
so, RHX forwards the REGISTER to its parents and peers (after
updating its registration table). If the REGISTER comes from a peer,
the entry can be forwarded or not based on local policy (depending,
for example, on whether the AS is willing to serve as a transit AS for
content). By letting the forwarding of FINDs and REGISTERs be
driven by the local policies, DONA can faithfully respect the basic
interdomain policies as reflected in BGP. In addition, the forwarding
of a REGISTER can be terminated at any point if dictated by some
administrative policy (such as a corporate firewall).
REGISTER commands must be authenticated. The local RH
issues a challenge with a nonce, which the client must sign with P’s
private key, or sign with some other key and provide a certificate
from P empowering this other key to register this piece of data.
When forwarding REGISTERs, the RH signs it so that the receiving
RHs know that the data came from a trusted RH. These signatures
are hop-by-hop and accumulated in a REGISTER along the path.
In a similar manner, the RHs accumulate the distances;
they
append their distance/cost to the previous-hop RH before sending
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Figure 2: Pseudo code for processing received and expired
REGISTERs.
the REGISTER to next RH. REGISTER commands have a TTL
and must be refreshed periodically. DONA also provides an
UNREGISTER command so that clients can indicate that they are
no longer serving some datum. Figure 2 shows the pseudo code for
processing received and expired REGISTERs.
The FIND packet does not just resolve the name, it also initiates
the transport exchange. The FIND packet takes the form as shown
in Figure 3, where the DONA-related content is essentially inserted
as a shim layer between the IP and transport headers. The name-
based routing provided by DONA ensures that the packet reaches
an appropriate destination. If the FIND request reaches a Tier-1 AS
and doesn’t find a record associated with that principal, then the
Tier-1 RH returns an error message to the source of the FIND. If the
FIND does locate a record, the responding server returns a standard
transport-level response (the same as if the transport header had been
received on a normal data packet, not on a FIND packet). To make
this work, transport protocols should bind to names, not addresses,
but otherwise do not need to change. Similarly, application protocols
need only be modified to use names, not addresses, when calling
Type
IP header
Name (P:L, 40 bytes)
Next header type
Transport protocol header
Figure 3: Protocol headers of a FIND packet. Type is to
separate FINDs from their responses.
transport.
In fact, many applications could be simplified when
implemented on top of DONA. Using HTTP as an example, we
note that the only essential information in an HTTP initiation is the
URL and header information (such as language, etc.); the URL is
not needed, since the data is already named in a lower layer, and
if each variation of the data (such as language) is given a separate
name then the header information is also superfluous.
The packet exchanges that occur after a FIND has been received
are not handled by RHs (except, as we note in Section 4, when they
serve as caches or other middleboxes), but instead are routed to the
appropriate destination using standard IP routing and forwarding.
To this extent, DONA does not require modifications of the IP
infrastructure.
2.4 Security Issues
There are a variety of security issues that must be addressed, some
by DONA itself and some by underlying or external mechanisms.
For bandwidth denial-of-service attacks, we assume that there are
IP-level mechanisms that can restrain unwanted packet streams that
are overwhelming an RH, server, or client. For resource exhaustion
attacks against RHs, DONA relies on contractual limits providers
place on customers for the number of FINDs and REGISTERs they
can submit per time period. RHs may additionally impose other
rate-limiting techniques such as cryptographic puzzles.
We assume that as part of establishing customer/provider/ peering
relationships, peering RHs have securely exchanged their public
keys, so RHs can always ensure that they are receiving packets from
the appropriate RH. However, a malicious RH can still cause damage.
For instance, a malicious RH can refuse to forward REGISTERs
and FINDs; this is a failure of the AS, in much the same way an AS
could fail to forward packets, and presumably commercial pressures
would reduce this form of misbehavior. More subtly, a malicious
RH could forward REGISTERs overheard from other RHs. To
minimize this risk, when RHs forward a REGISTER they include
the next-hop’s public key (or its cryptographic hash).
The worst a malicious RH can do is deny a client service (since
cryptographic measures allow the client to authenticate the data). In
Section 3 we discuss ways in which clients can request access to
other copies (i.e., not just the closest one); this will allow a client
to avoid misbehaving RHs, unless the misbehaving RH lies on the
path to all copies of that particular item.
In all cases, though, RHs are commercially related to the clients
they are serving; they are paid (perhaps recursively) by either the
client or the server. Thus, DONA is not relying on the cooperation
of arbitrary entities, but is relying on the nature of a commercially
provided service. Thus, while the design should be reasonably
secure against misconfigured or subverted RHs, we anticipate that
any such problems would be detected and corrected by the provider.
In such a key-centric design, the greatest fear is key compromise.
There is no remedy except for providing effective means for key
revocation. DONA does not, itself, provide a key revocation
mechanism. However, there are a variety of mechanisms one could
use for this, such as third-parties (e.g., Google) providing databases
of revocation lists. Each revocation is cryptographically proven, so
they cannot be faked; thus the database need merely provide access
to the data, not vouch for its correctness.
DONA could itself provide a useful substrate for revocation lists
and online key status query protocols. That is, key revocations
related to a principal P (both of P’s private key, and of any secondary
keys P has used) could be stored under P:L for some special reserved
name L, and if an RH finds any entry corresponding to that name it
immediately returns notification of a key compromise.2 Thus, if a
client wanted to check a key related to P it could issue a FIND(P:L)
for this special value of L. As we describe in Section 4, DONA
also supports update functionality, so a client could subscribe to be
notified of any such revocation.
Internet Addressing
Finally, principals and replicas may ensure that their key is not
already in use by doing a FIND(P:*) on a freshly generated key P
(using DONA’s name resolution).
2.5
The DONA design, as just described, could function over the current
IP layer, with its present form of addressing and routing. However,
many think the current Internet addressing scheme is facing a loom-
ing crisis, as the increasing demand for multihoming threatens to
explode routing tables [27]. Even aside from this speculative threat,
the current addressing paradigm requires a delicate balance between
scalability (e.g., aggregation) and flexibility (e.g., multihoming,
policy routing) that isn’t always easy to achieve.
DONA’s name-based anycast primitive can remove much of the
pressure on the lower-level addressing structure by providing a
separate mechanism for path discovery. In particular, DONA could
enable IP to use path-labels (as in [22]) rather than globally routable
addresses. In what follows, we refer to the client as the source of
the FIND and the server as the node that responds to the FIND
(presumably a node that generated the REGISTER, or a caching
RH as discussed later in Section 4). Moreover, each host has a
domain-specific address; that is, for each domain within which it
is homed, that domain associates an address to that host, and that
address has no meaning outside of that domain.
In this approach, when a client sends a FIND, its source address
is originally just its domain-specific address. As the FIND is
forwarded from client to server, next-hop domain path instructions
are appended to this source address. Each such instruction has
purely local meaning; for instance, as the FIND passes from domain
A to domain B, an annotation is added to the path instruction that
tells A that the next-hop domain is B and, vice-versa, tells B that, in
the reverse direction, the next hop is A. This instruction need only be
understood by the two connected domains A and B. When the FIND
arrives at the server, the server appends its domain-specific address
to the path description. It can then reverse these path instructions
and use them for its response to the client (since reversing the order
just gives the path in the opposite direction). Similarly, when the
server’s packets arrive at the client, the client can reverse the path in
order to send packets to the server.
Because these per-hop path instructions only need to distinguish
between the various next-hop domains, they can be quite short (say,
on the order of a few bytes), so the total path instruction would
be quite short. More importantly, the interdomain routing tables
would be extremely small (and quite static); merely enough to
translate these per-hop instructions into a next-hop AS. Note that
these path-instructions would not have global meaning, since if a
source in a different domain used this path, the domain-specific
next-hop instructions would not necessarily lead to the desired
2Note that once any key revocation entry has been registered for
that principal, there is nothing the key compromiser can do to cause
its removal from the registration tables.
destination. Thus, in this design, there would be no globally
meaningful addresses and the DONA FIND/REGISTER primitives
would be required to establish end-to-end connectivity.
This approach would require the endpoints to detect AS-level path
failures and to resend a FIND in that case. This is a substantial extra
burden on connections, but it is the tradeoff for doing path-discovery
above the IP layer. Also, while this design might superficially
resemble other connection-oriented designs, with the FIND playing
the role of a connection-establishment, an important distinction is
that in DONA there is no per-flow state in the network.
This approach would produce symmetric AS-level paths, because
the path of the FINDs lay down path instructions which guide the
reverse path. There are other possibilities for how policy routing
could be handled in this case, ranging from allowing asymmetric
routes but requiring a FIND sent in the opposite direction (like many
capability mechanisms [41,43]), or giving policy control going from
the client to the core (and from the core to the client) to the providers
near the client, and giving policy control going from the core to the
server (and from the server to the core) to the providers near the
server; this would preserve AS-level path symmetry but distribute
the control differently among the ASes. Lastly, one could adopt a
NIRA-like [42] approach to providing endpoint-control. We do not
explore these options here.
3 Using DONA’s Basic Functionality
DONA’s name resolution process is essentially a name-based
anycast service: a FIND(P:L) request is routed (by RHs) to a nearby
RH at which the name P:L is registered. We now discuss a few
ways in which this basic primitive might be used (and later, in
Section 4, we discuss how RHs could be extended to provide greater
functionality). We present only a very high-level description of the
examples below, omitting (due to space) many protocol details.
Server Selection: The most basic use of DONA’s anycast primitive
is to select among several possible servers (e.g., for content distri-
bution, or replicated service access). Each server (or datacenter)
authorized by a principal P to host a service or datum named P:L
simply registers P:L at their local RH. DONA routes any FIND(P:L)
to the closest such server (closest according to the DONA routing
metric). Note that these servers could be from one or many different
CDN networks; the name P:L remains the same no matter which
server is providing the data.
P2P applications could also employ this primitive, and would
probably use immutable names where the principal would identify
the particular P2P infrastructure (these may have different degrees
of reliability and coverage). Systems like BitTorrent that break files
into chunks are also supported; the name of the file (immutable
data) is mapped to an index of the chunk names (also immutable),
and the client can then separately request each chunk.
If it so
desires, the client can then offer to serve the content, registering
both the chunks and the index. We will see later in Section 4 how
increased functionality in the RHs can further improve the CDN-like
properties of DONA.
Mobility and Multihoming: A roaming host can first unregister
from one location and then re-register at its new location. Sub-
sequent FINDs will be routed to the new location as soon as the
new registrations have installed the necessary state.3 Well-known
techniques below the transport layer [28], at the transport layer [17],
or at the session-layer [32] can be used to mask mobility from higher
layers. Multihoming is similarly straightforward: a multihomed host
3A host that suddenly loses its connectivity may have to wait until
its previous registration’s TTL expires before it can be sure that
subsequent FINDs will find it.
registers with each local RH and a multihomed domain forwards
its REGISTERs to each provider. This allows FINDs, and thus the
resulting data connections, to make use of multiple paths. While the
above discussion implicitly assumed current IP addressing, and no
need for symmetric AS-level paths, similar multihoming could be
achieved with the addressing described in Section 2.5.
Session Initiation: Rendezvous mechanisms are the core of appli-
cation-layer session initiation and presence protocols. Consider,
for instance, the Session Initiation Protocol (SIP) [31]. A SIP
user agent begins by sending a SIP INVITE message. The
SIP proxy infrastructure then routes the INVITE message to the
current location of the remote agent, which responds to begin the
session negotiation. To keep their location up-to-date in the proxy
infrastructure, SIP user agents register their current location with
registrars (often co-located with SIP proxies). This process maps
directly onto DONA’s basic primitives. SIP INVITE messages
translate to FINDs and the SIP and DONA REGISTER messages
play the same role. Thus, SIP’s rendezvous functionality can be
provided by DONA and SIP’s negotiation functionality could be
implemented directly on top of DONA’s REGISTER and FIND.
In Section 4, we show how DONA enables the network to impose
middleboxes; networks could impose (stateful and stateless) SIP
proxies to enrich SIP services, control service access, and to protect
themselves and their customers just as they do today (e.g., SIP
providers often deploy session border control boxes to rewrite SIP
messages to hide their internal network topologies). Of course, after
DONA enables the discovery, the data transfer between the two
endpoints occurs over IP.
Multicast State Establishment: It has been a long struggle to
define a simple and scalable interdomain multicast protocol. We
now show how DONA could be used to establish this interdomain
multicast state in a straightforward manner (similar to how this is
done in [29]). In this design, DONA’s anycast primitive provides
the tree discovery function, allowing a domain’s border router
that has local members in a multicast group G to discover and
establish connectivity with other domains that have members in
the group. We assume that each domain runs some intradomain
multicast protocol. Each multicast group has a name of the form P:G,
where the principal is the originator of the group. Such a structure
makes it easy to keep group names unique. When a new node in
a particular domain joins P:G, the domain’s border router Rnew
issues a FIND(P:G) packet which DONA routes to the nearest router
R that also has local members in P:G, if one exists. Upon receiving
the FIND(P:G) packet, R attaches Rnew to the overlay topology for
P:G as its neighbor and the two routers add appropriate entries to
their neighbor tables for P:G.4 To complete the join operation, Rnew
sends a REGISTER(P:G) command, announcing its membership
and willingness to serve as an attachment point for other incoming
group members. This construction ensures that the resulting overlay
topology remains acyclic at all times, thereby simplifying packet
forwarding.
To transmit a multicast packet destined for a particular group
P:G, the sender’s border router R similarly issues a FIND packet
to locate a nearby domain that belongs to the group and forwards
the packet to that domain’s border router, which in turn initiates the
packet’s dissemination. Note that if the sender’s domain has one
or more members in P:G then R is itself a member of the overlay
topology for P:G and has the forwarding state necessary to initiate
the dissemination.
4By overlay we mean that the two routers tunnel packets to each
other, so the intervening routers need not have routing state for this
multicast group.
find: the FIND packet received.
if unknown transport or application protocol then
forward to next hop(find);
else
else
if content cached and still valid then
terminate protocols(find);
send content();
// Follow the caching policy
if should be cached then
terminate protocols(find);
new find ←clone(find);
change src address(new find);
initiate protocols(new find);
send to next hop(new find);
store and send content();
forward to next hop(find);
else
end
end
end
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Figure 4: Pseudo code for caching logic.
Improving Content Delivery
4 Extending DONA
As we’ve just discussed, DONA’s name-based anycast primitive
provides support for a range of discovery-like tasks. Even restricted
to this capability, we believe DONA would be a better naming
foundation for the Internet than the current DNS system. In this
section we discuss extensions to DONA that broaden its impact.
These extensions involve adding functionality to the RHs, so they
take a more active role in the architecture than just forwarding
FINDs and REGISTERs .
4.1
The basic DONA design provides support for server selection,
which is useful when accessing widely replicated data or services.
However, DONA can provide better support for content delivery in
three ways.
Caching: RHs can be extended to provide a universal (i.e., general-
purpose and always on-path) caching infrastructure. An RH that
implements caching must first populate its cache. It can do so by
changing the source IP address of an incoming FIND packet to be its
own (and also source port number, if necessary) before forwarding
the FIND to the next-hop RH. This ensures that the response to the
FIND will traverse this RH, so the RH will receive the returning
data and can install it in its cache. As per current practice, cacheable
data items will be labeled with a TTL or other invalidation metadata
that determines when data becomes stale. When a FIND arrives
and there is a cache hit, the RH responds to the FIND’s source IP
address, returning appropriate transport responses which then will
proceed into a standard application-level exchange. If the RH does
not understand the transport or application-level protocol (as defined
by the port in the transport header) for a particular FIND, it does not
provide caching for that request. Figure 4 shows the pseudo code
for the above logic and Figure 5 depicts the packet exchanges.
Subscriptions: Often clients would like to subscribe for updates, as
in RSS. This can be easily accomplished by adding TTLs to FINDs.
When the server responds to such a TTL’ed FIND, it notes whether
and how long it will provide updates to the FIND. When a server
Cache hit in RH 1:
Client
RH 1
Cache miss in RH 1, RH 1 skips the data:
Client
RH 1
Cache miss in RH 1, RH 1 caches the data:
Client
RH 1
RH 2
Server
RH 2
Server
RH 2
Server
Figure 5: Three cases of processing a FIND packet when
caching is supported:
serve from the cache, skip (merely
forward the FIND), and cache. RH 2 is a middlebox for all
traffic.
updates a piece of content that has a pending TTL’ed FIND, it sends
the update to the source of the FIND. RHs can assist in this process
by caching TTL’ed FINDs, using the previously described technique
of inserting their own address before forwarding the FIND. If a
collection of RHs have cached the TTL’ed FINDs for a particular
item, they form a distribution tree that provides scalable updates
to that item. There is no explicit tree construction algorithm; it is
formed via the normal routing of FINDs.
Avoiding Misbehaving and Overloaded Servers: In general,
RHs will route FINDs to nearby copies of the data. However, some
of these servers may be misbehaving (due to failures or malicious
intent) in a way that is not visible to the RHs but which deprives
the client of a valid copy of the data. To make sure that clients
can still access the data even when their closest replica server is
misbehaving, we allow the client to ask that its FIND be routed to a
different server. In particular, we amend the REGISTER commands
to keep track of the number of servers below a particular RH (this
number is incremented as when a REGISTER is sent upwards in the
RH graph), and we amend the FIND command to allow the client to
request access to the k’th closest server, rather than the closest one.
This may be particularly useful when accessing data from relatively
unreliable P2P servers.
We can also augment DONA to allow overloaded servers to
indicate they are overloaded, so the RHs can then direct excess
load to other, less-loaded, servers. When an RH sends REGISTERs
to a peer, it includes the distance to and load on the closest server and
also, if the closest server is overloaded, the distance to the closest
underloaded server; thus, at any time, an RH knows the next-hop
towards the closest server with spare capacity. We have simulated
the algorithm and find that it comes close to matching the optimal
load distribution in several settings.
With these three changes, DONA could provide a combined
caching, updating, and load-balancing infrastructure for content
delivery. High-end commercial content providers may still require
more advanced features and more careful monitoring than DONA
can provide, but we expect even these high-end CDN services to
augment DONA’s content-delivery infrastructure rather than replace
it entirely.
4.2 Delay-Tolerant Networking
There has been recent interest in developing architectures for
networks where end-to-end connectivity may rarely exist, but
sporadic hop-by-hop connectivity is often present (see e.g., [6,
12]). These connectivity-challenged networks occur in many
settings, ranging from large (interplanetary communication), to
small (sensornets), to deep (oceanic communications). The key
concept in designing a delay-tolerant network (DTN) architecture is
message-level custody transfer; rather than the communication being
end-to-end, there are intermediate elements that take custody of the
message and then are responsible for making sure it is forwarded
onward. In DONA, the RHs can act as these custody agents (or can
deputize other hosts to act as custody agents). If an RH knows that
connectivity to its neighboring RH is intermittent, it can choose to
accept custody of the subsequent transfer in much the same way
as it places itself on the return path for caching. Previous work on
DTNs [12] has also embraced name-based routing. Though these
two uses of name-based routing are realized somewhat differently,
the similarity in approaches suggests that DONA might provide a
reasonable substrate for DTN architectures.
4.3 Access Rules and Middleboxes
The original Internet architecture was built around the end-to-end
principle, with intermediate routers playing no role other than
forwarding packets to their destination. However, commercial and
security pressures have created an Internet where layer-violating
access policies and on-path middleboxes are common. There is no
support for access rules and middleboxes in the current architecture;
some maintain that this is a feature, not a bug, but we suggest that the
presence of so many access rules and middleboxes is an indication
of important unmet needs in the current architecture, and that any
viable future design must address those needs.
One can view FINDs and REGISTERs as a general signaling
mechanism; these are addressed at the IP layer to each RH along the
path, and as such the RHs are not violating layering by inspecting
them. Corporate networks may choose to not forward REGISTERs
originating at internal RHs to external RHs if they don’t want
internal content available to the public. Similarly, such networks
might also choose to not forward FINDs originating at external
RHs to internal servers, or perhaps make the decision based on the
application-port found in the FIND’s transport header.
More generally, there are several natural policy decisions an RH
can make upon receiving a FIND. The RH could deny the FIND,
either failing silently or returning an error code. The RH could
also ask the source of the FIND for authentication; for instance,
it could ask for credentials proving that the user was indeed an
employee. DONA itself would not standardize the format and nature
of these credentials; DONA would merely provide a channel for
authentication protocols to exchange these credentials.
An RH could also impose a middlebox, such as an application-
specific proxy or a firewall, which is otherwise off-path. There
has been previous work on incorporating endpoint-imposed middle-
boxes (see [2, 38]), where either the sender or receiver desires the
packets to traverse a middlebox such as a firewall or transcoder. The
middleboxes we consider here are network-imposed; the decision
is being made by one or more of the networks carrying the packets,
not the endpoints. We expect future networks to have a mixture of
network-imposed and endpoint-imposed middleboxes, so we see the
two approaches (DONA and [2, 38]) as complementary.
An example of an authentication request and a middlebox
insertion is shown in Figure 6. Here, the FIND goes from client to
RH1 to RH2, which forwards it to the firewall FW. The firewall asks
the client for authentication, and the client responds; the firewall
then forwards the client’s FIND to the server, inserting its own IP
address in the FIND. The server responds to the firewall, which then
forwards the response to the client. Finally, the data flows between
the client and server via the explicitly addressed firewall.
FW requires authentication:
RH 1
Client
RH 2
FW
Server
Figure 6: Server-side RH imposes a firewall middlebox which
requires authentication (dashed arrow) before becoming a
proxy for client-server communication.
When RHs play the role of caches, both in responding to FINDs
and inserting themselves on the return path to receive the response,
they are acting as middleboxes (as shown in Figure 5). RHs can
similarly insert themselves as a middlebox for reasons other than
caching, and, as just shown in the example in Figure 6, can also
forward the FIND to some other box (e.g., a firewall) that inserts
itself on the path. This is done by inserting the IP address of the
middlebox in the source address field in the FIND packet (and
changing the transport source port, as well as updating checksums
and TTL fields) before it is forwarded onwards, so all returning
packets pass through the middlebox. This method of inserting
middleboxes does not violate layering, as all packets intercepted by
these middleboxes would have their IP address in the destination
field.
Lastly, an RH could also provide the capabilities required to
traverse the domain. We won’t delve into the details here, but
mechanisms similar to SIFF [41] and TVA [43] could be used (but
on a per-domain basis, not per-router basis), with the capabilities
inserted in the FIND as it traversed domains. There are two
subtleties here. First, if more than one intervening domain wants
to require capabilities, there must be a way of either concatenating
the capabilities, or somehow sharing a fixed length capabilities field.
The second subtlety is that using FINDs to gather the capabilities
works if the FIND packet and the data packets being sent in both
directions take the same AS-level (or administrative-level) path.
This symmetry is required only for the networks that use capabilities.
Today, restrictive access policies are typically only exerted at the
edges, where the paths are likely to be symmetric. However, if one
expects that capabilities might be used more generally, the path-
labels discussed in Section 2.5 could be used, because they produce
symmetric AS-level paths. In contrast to capabilities, middlebox
insertion does not require symmetric AS-level paths, because the
middlebox is explicitly addressed.
5 Implementation
To gain some experience with our design, we implemented a
Linux-based prototype of an RH and deployed it on Planetlab.
Implementing DONA helped us uncover many details that we had
neglected, but it did not raise any fundamental issues. We think the
far more interesting issue is large-scale feasibility, which we could
not address in our small-scale prototype deployment; we address
these scaling issues in Section 6.
The RH prototype is a stand-alone user-level Java daemon that
uses a TUN/TAP device to interpose itself as an overlay between
the transport and IP layer, and uses the Socket API to invoke the
kernel transport protocol implementations. Most of the core features
described earlier have been implemented, including dissemination
of REGISTERs, name-based routing of FINDs, content caching,
and protocol support for a few applications.
The implementation is comprised of the following modules: