HEED: A Hybrid, Energy-Efficient, Distributed
1
Clustering Approach for Ad-hoc Sensor Networks
Ossama Younis and Sonia Fahmy
Department of Computer Sciences, Purdue University
250 N. University Street, West Lafayette, IN 47907–2066, USA
E-mail: {oyounis,fahmy}@cs.purdue.edu
Abstract
Topology control in a sensor network balances load on sensor nodes, and increases network scalability and
lifetime. Clustering sensor nodes is an effective topology control approach. In this paper, we propose a novel
distributed clustering approach for long-lived ad-hoc sensor networks. Our proposed approach does not make any
assumptions about the presence of infrastructure or about node capabilities, other than the availability of multiple
power levels in sensor nodes. We present a protocol, HEED (Hybrid Energy-Efficient Distributed clustering), that
periodically selects cluster heads according to a hybrid of the node residual energy and a secondary parameter,
such as node proximity to its neighbors or node degree. HEED terminates in O(1) iterations, incurs low message
overhead, and achieves fairly uniform cluster head distribution across the network. We prove that, with appropriate
bounds on node density and intra-cluster and inter-cluster transmission ranges, HEED can asymptotically almost
surely guarantee connectivity of clustered networks. Simulation results demonstrate that our proposed approach is
effective in prolonging the network lifetime and supporting scalable data aggregation.
sensor networks, clustering, network lifetime, energy efficiency, fault tolerance
Index Terms
– This research has been sponsored in part by NSF grant ANI-0238294 (CAREER) and the Schlumberger Foundation.
I. INTRODUCTION
2
Sensor networks have recently emerged as a platform for several important surveillance and control
applications [1], [2]. Sensor nodes are typically less mobile, more limited in capabilities, and more
densely deployed than mobile ad-hoc networks (MANETs). This necessitates devising novel energy-
efficient solutions to some of the conventional wireless networking problems, such as medium access
control, routing, self-organization, bandwidth allocation, and security. Exploiting the tradeoffs among
energy, accuracy, and latency, and using hierarchical (tiered) architectures are important techniques for
prolonging the network lifetime.
Network lifetime can be defined as the time elapsed until the first node (or the last node) in the network
depletes its energy (dies). For example, in a military field where sensors are monitoring chemical activity,
the lifetime of a sensor is critical for maximum field coverage. Energy consumption in a sensor node
can be attributed to either “useful” or “wasteful” sources. Useful energy consumption can be due to (i)
transmitting/receiving data, (ii) processing query requests, and (iii) forwarding queries/data to neighboring
nodes. Wasteful energy consumption can be due to (i) idle listening to the media, (ii) retransmitting due
to packet collisions, (iii) overhearing, and (iv) generating/handling control packets.
A number of protocols have been proposed to reduce useful energy consumption. These protocols can
be classified into three classes. Protocols in the first class control the transmission power level at each
node to increase network capacity while keeping the network connected [3], [4]. Protocols in the second
class make routing decisions based on power optimization goals, e.g., [5], [6], [7], [8]. Protocols in the
third class control the network topology by determining which nodes should participate in the network
operation (be awake) and which should not (remain asleep) [9], [10], [11]. Nodes in this case, however,
require knowledge of their locations via GPS-capable antennae or via message exchange.
Hierarchical (clustering) techniques can aid in reducing useful energy consumption [8]. Clustering is
particularly useful for applications that require scalability to hundreds or thousands of nodes. Scalability
in this context implies the need for load balancing, efficient resource utilization, and data aggregation.
3
Routing protocols can also employ clustering [12], [13]. Clustering can be extremely effective in one-to-
many, many-to-one, one-to-any, or one-to-all (broadcast) communication.
Although many protocols proposed in the literature minimize energy consumption on forwarding paths
to increase energy efficiency, such protocols do not necessarily prolong network lifetime when certain
nodes are “popular,” i.e., present on most forwarding paths in the network. Even if dynamic routing (in
which data is forwarded to nodes with the highest residual energy) is used, it may cause problems such as
unbounded delay and routing loops. With clustering, a popular node is guaranteed to “lose its popularity”
as new clusters (and forwarding paths) are constructed. Of course, node popularity due to interest in the
data it provides can only be reduced by deploying several redundant nodes, and rotating among them
(e.g., [9]).
The essential operation in sensor node clustering is to select a set of cluster heads from the set
of nodes in the network, and then cluster the remaining nodes with these heads. Cluster heads are
responsible for coordination among the nodes within their clusters and aggregation of their data (intra-
cluster coordination), and communication with each other and/or with external observers on behalf of their
clusters (inter-cluster communication). Fig. 1 depicts an application where sensors periodically transmit
information to a remote observer (e.g., a base station). The figure illustrates that clustering can reduce the
communication overhead for both single-hop and multi-hop networks. Periodic re-clustering can select
nodes with higher residual energy to act as cluster heads. Network lifetime is prolonged through (i)
reducing the number of nodes contending for channel access, (ii) summarizing information and updates at
the cluster heads, and (iii) routing through an overlay among cluster heads, which has a relatively small
network diameter.
Clustering protocols have been investigated in the context of routing protocols [3], [14], [12], [15], [8],
or independent of routing [16], [17], [13], [18], [19], [20]. In this work, we present a general distributed
clustering approach that considers a hybrid of energy and communication cost. Based on this approach,
we present the HEED (Hybrid, Energy-Efficient, Distributed) clustering protocol. HEED has four primary
objectives [21]: (i) prolonging network lifetime by distributing energy consumption, (ii) terminating the
4
(a) Single hop without clustering
(b) Multi-hop without clustering
(c) Single hop with clustering
(d) Multi-hop with clustering
Fig. 1. Sensor information forwarding with and without clustering and aggregation
clustering process within a constant number of iterations, (iii) minimizing control overhead (to be linear
in the number of nodes), and (iv) producing well-distributed cluster heads. Our clustering approach does
not make assumptions about the distribution of nodes, or about node capabilities, e.g., location-awareness.
The approach only assumes that sensor nodes can control their transmission power level.
The problem that we address in this work has unique requirements that distinguish it from the classical
load-balancing problem in distributed systems. In classical distributed systems, a node can either be a
server or a source, but not both. A fixed number of servers is known to every source in the system, and
a server is always available for processing (see [22] for more details). In our model, every node can act
as both a source and a server (cluster head), which motivates the need for efficient algorithms to select
servers according to the system goals outlined below. A node only knows about the servers that are within
its reachable range, which implies that achieving global goals cannot always be guaranteed but can be
approximated through intelligent local decisions. Finally, a node may fail if its energy resource is depleted,
which motivates the need for rotating the server role among all nodes for load balancing.
5
The remainder of this paper is organized as follows. Section II describes the network model and states
the problem that we address in this work. Section III presents the HEED protocol and argues that it
satisfies its objectives. Section IV shows HEED effectiveness via simulations, and compares it to other
clustering techniques. Section V discusses applications that can use HEED, and compares HEED with a
generalized energy-efficient version of LEACH [8]. Section VI discusses some of the HEED design issues
and possible extensions. Section VII briefly surveys related work. Finally, Section VIII gives concluding
remarks and directions for future work.
II. PROBLEM STATEMENT
We first describe the network model and then give our objectives.
A. Network Model
Consider a set of sensors dispersed in a field. We assume the following properties about the sensor
network:
• The sensor nodes are quasi-stationary. This is typical for sensor network applications.
• Links are symmetric, i.e., two nodes v1 and v2 can communicate using the same transmission power
level.
• The network serves multiple mobile/stationary observers, which implies that energy consumption is
not uniform for all nodes.
• Nodes are location-unaware, i.e. not equipped with GPS-capable antennae. This justifies why some
techniques, such as [10], [23] are inapplicable.
• All nodes have similar capabilities (processing/communication), and equal significance. This motivates
the need for extending the lifetime of every sensor.
• Nodes are left unattended after deployment. Therefore, battery re-charge is not possible. Efficient,
energy-aware sensor network protocols are thus required for energy conservation.
• Each node has a fixed number of transmission power levels. An example of such sensor nodes are
6
Berkeley Motes [24]. It is typically straightforward to set the transmission power level via the standard
ioctl() system call.
Let the clustering process duration, TCP , be the time interval taken by the clustering protocol to cluster
the network. Let the network operation interval, TN O, be the time between the end of a TCP interval and
the start of the subsequent TCP interval. We must ensure that TN O TCP to reduce overhead. (Section V
further discusses how to set TN O.) Although we assume that nodes are not mobile, clustering can still
be performed if nodes that announce their willingness to be cluster heads are quasi-stationary during the
TCP interval in which they are selected, and the ensuing TN O interval. Nodes that travel rapidly in the
network may degrade the cluster quality, because they alter the node distribution in their cluster.
We currently assume that node failures are primarily caused by energy depletion. In Section VI-C, we
discuss measures to withstand unexpected node failures in hostile environments, such as volcanic areas
or military fields.
It is important to note that in our model, no assumptions are made about (1) homogeneity of node
dispersion in the field, (2) network density or diameter, (3) distribution of energy consumption among
sensor nodes, (4) proximity of querying observers, or (5) node synchronization. In Section III-C and
Section IV-D, we show that unsynchronized nodes can still execute HEED independently, but cluster
quality may be affected. For time sensitive applications, the network can be synchronized using techniques,
such as RBS [25].
B. The Clustering Problem
Assume that n nodes are dispersed in a field and the above assumptions hold. Our goal is to identify
a set of cluster heads which cover the entire field. Each node vi, where 1 ≤ i ≤ n, must be mapped to
exactly one cluster cj, where 1 ≤ j ≤ nc, and nc is the number of clusters (nc ≤ n). A node must be
able to directly communicate with its cluster head (via a single hop). Cluster heads can use a routing
protocol to compute inter-cluster paths for multi-hop communication to the observer(s), as discussed in
Section VI. The following requirements must be met:
7
1) Clustering is completely distributed. Each node independently makes its decisions based only on
local information.
2) Clustering terminates within a fixed number of iterations (regardless of network diameter).
3) At the end of each TCP , each node is either a cluster head, or not a cluster head (which we refer
to as a regular node) that belongs to exactly one cluster.
4) Clustering should be efficient in terms of processing complexity and message exchange.
5) Cluster heads are well-distributed over the sensor field, and have relatively high average residual
energy compared to regular nodes.
III. THE HEED PROTOCOL
In this section, we describe the HEED protocol. First, we define the parameters used in the clustering
process. Second, we present the protocol design and pseudo-code. Finally, we prove that the protocol
meets its requirements.
A. Clustering Parameters
The overarching goal of our approach is to prolong network lifetime. For this reason, cluster head
selection is primarily based on the residual energy of each node. Measuring this residual energy is not
necessary, since the energy consumed per bit for sensing, processing, and communication is typically
known, and hence residual energy can be estimated. To increase energy efficiency and further prolong
network lifetime, we also consider intra-cluster “communication cost” as a secondary clustering parameter.
For example, cost can be a function of neighbor proximity or cluster density.
We use the primary clustering parameter to probabilistically select an initial set of cluster heads, and
the secondary parameter to “break ties” among them. A tie in this context means that a node falls within
the “range” of more than one cluster head. To understand what “range” denotes in this case, observe that
a node typically has a number (e.g., 6) of discrete transmission power levels. Thus, the cluster range
or radius is determined by the transmission power level used for intra-cluster announcements and during
8
clustering. We refer to this level as the cluster power level. The cluster power level should be set to one of
the lower power levels of a node, to increase spatial reuse, and reserve higher power levels for inter-cluster
communication. These higher power levels should cover at least two or more cluster diameters to guarantee
that the resulting inter-cluster overlay will be connected. If this condition cannot be satisfied, then our
approach for clustering in conjunction with power level selection is inapplicable. We analyze inter-cluster
connectivity conditions in Section III-D. The cluster power level dictates the number of clusters in our
network. It is non-trivial to determine an optimal cluster power level, because network topology changes
due to node failures and energy depletion.
The secondary clustering parameter, intra-cluster communication cost, is a function of (i) cluster
properties, such as cluster size, and (ii) whether or not variable power levels are permissible for intra-
cluster communication. If the power level used for intra-cluster communication is fixed for all nodes,
then the cost can be proportional to (i) node degree, if the requirement is to distribute load among cluster
heads, or (ii)
1
node degree, if the requirement is to create dense clusters. This means that a node joins the
cluster head with minimum degree to distribute cluster head load (possibly at the expense of increased
interference and reduced spatial reuse), or joins the one with maximum degree to create dense clusters.
We use the terms minimum degree cost and maximum degree cost to denote these cost types. Observe that
inter-cluster communication is not incorporated in the cost function since local information is insufficient
in this case.
Now consider the case when variable power levels are allowed for intra-cluster communication. Let
MinP wri denote the minimum power level required by a node vi, 1 ≤ i ≤ M, to communicate with a
cluster head u, where M is the number of nodes within the cluster range. We define the average minimum
reachability power (AMRP) as the mean of the minimum power levels required by all M nodes within the
P
cluster range to reach u, i.e., AMRP =
M
i=1 M inP wri
M
. If each node is allowed to select the appropriate
power level to reach its cluster head, then AMRP provides a good estimate of the communication cost.
The AMRP of a node is a measure of the expected intra-cluster communication energy consumption if
this node becomes a cluster head. Using AMRP as cost in selecting cluster heads is superior to just