﻿ABSTRACT
On the Effectiveness of Automatic Patching
We study the effectiveness of automatic patching and quantify
the speed of patch dissemination required for worm containment.
We focus on random scanning as this is representative
of current generation worms, though smarter strategies
exist. We find that even such “dumb” worms require
very fast patching. Our primary focus is on how delays due
to worm detection and patch generation and dissemination
affect worm spread. Motivated by scalability and trust issues,
we consider a hierarchical system where network hosts
are partitioned into subnets, each containing a patch server
(termed superhost). Patches are disseminated to superhosts
through an overlay connecting them and, after verification,
to end hosts within subnets. When patch dissemination delay
on the overlay is negligible, we find that the number of
hosts infected is exponential in the ratio of worm infection
rate to patch rate. This implies strong constraints on the
time to disseminate, verify and install patches in order for
it to be effective. We also provide bounds that account for
alert or patch dissemination delay. Finally, we evaluate the
use of filtering in combination with patching and show that
it can substantially improve worm containment. The results
accommodate a variety of overlays by a novel abstraction of
minimum broadcast curve. They demonstrate that effective
automatic patching is feasible if combined with mechanisms
to bound worm scan rate and with careful engineering of the
patch dissemination. The results are obtained analytically
and verified by simulations.
Categories and Subject Descriptors
C.0 [Computer Systems Organization]: General
General Terms
Performance, Security
Keywords
Patching, Software Updates, Automatic Updates, Epidemic,
Worm, Virus, Minimum Broadcast Curve
Permission to make digital or hard copies of all or part of this work
for personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial advantage
and that copies bear this notice and the full citation on the first page.
To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
WORM’05, November 11, 2005, Fairfax, Virginia, USA.
Copyright 2005 ACM 1-59593-229-1/05/0011 ...$5.00.
Milan Vojnovi ′
c and Ayalvadi Ganesh
Microsoft Research
{milanv,ajg}@microsoft.com
1. INTRODUCTION
1.1 Problem statement
Most Internet worms observed to date have been reverse
engineered from patches released to address some vulnerability.
Thetimebetweenpatchreleaseandwormappearance
has been shrinking; e.g., the Witty worm [5] was observed
little more than a day after patch release. It may soon happen
that a worm appears before the patch is available (aka
zero-day worms). How can we limit the spread of such a
worm? The doubling time of the number of infected hosts
for some worms is in the order of tens of seconds, which
means Internet scale infection is possible within minutes.
Clearly, human response is too slow and a worm containment
system must be automatic. An automatic patching
system requires (i) worm detection, (ii) patch generation,
and (iii) patch dissemination, verification and installation.
Our work is primarily concerned with item (iii). The first
two items have been considered in work by others, discussed
later. Incorporating the effect of delays due to items (i)
and (ii) through appropriate choice of the initial conditions
(number of infected hosts), we consider the system from
the time instant at which the worm has been detected and
a patch generated. The effect of worm detection delay on
the number of ultimately infected hosts is accommodated
through appropriate choice of the initial number of infectives.
The two processes of patch dissemination from the
server at which it was generated, and worm spread from the
population of infected hosts, are in a race. Our objective is
to patch vulnerable hosts before they can be infected. How
fast does patch dissemination have to be in order to win the
race, i.e., to limit worm spread to a specified multiple of the
number of initially infected hosts? Note that we do not propose
a new automatic patching system, but rather provide
analytical results that apply to a broad set of patching systems.
We complement and verify our results by simulations.
1.2 Why is the problem non-trivial?
If vulnerable hosts could be patched many orders of magnitude
faster than the worm can spread, then the problem
would be rendered of marginal interest. But this is not easy
in practice. Suppose a centralized infrastructure is used for
patch dissemination, e.g. the present-day Microsoft Automatic
Update. Then the time to disseminate patches to
250M+ Windows PCs would be of the order of hours, which
ismuchlongerthanthetimeforwormspread. Thisdifficulty
could be ameliorated by using a content distribution service;
e.g. Akamai has about 15000 servers, each of which would
need to serve patches to about 15000 client machines in the
above example (assuming perfectly balanced servers). But
worms can also be made faster by using smarter strategies
such as subnet-biased scanning, so the arms race between
worms and countermeasures is likely to continue for some
time into the future. Hence, a quantitative understanding
of how the relative speeds of worm and patch influence the
outcome of the epidemic is important.
1.3 Related work
Much work has been done on studying worms and their
containment [1, 6, 9, 13, 15]. We do not aim at addressing
all related work, but only that which to our knowledge
is closely related to automatic patching. There are several
schemes for worm detection, e.g., (i) honeypots: these monitor
unused segments of IP address space. The presumption
is that scans to these addresses are either due to malicious
attempts or misconfigured protocols (ii) detecting anomalous
scanning behavior either at end hosts or in the network
(iii) detecting worm signatures either by looking for common
patterns in network traffic or by analysing data and
control flow of computer program executions. Automatic
patch generation is addressed in Vigilante, which was proposed
by Costa et al [4], and motivates the work in this
paper. Vigilante is an end-host based system for automatic
worm containment. Its main feature is that when a host
detects a worm, it generates a self-certifying alert (SCA).
Such alerts identify a vulnerability and provide a machinegenerated
proof of it which can be verified by any recipient.
Vigilante uses a structured overlay to propagate alerts to all
hosts in the system, which then use some mechanism to generate
a filter which essentially corresponds to a patch. The
self-certifying property of alerts is important as it solves the
problem of trust and the concomitant one of attacks using
the automatic response system. A similar system was also
proposed by Sidiroglou and Keromytis [14]. There has also
beenworkonanalysingthecompetingprocessesofpatching,
filtering and worm spread [13, 15, 18]. These works typically
consider a ‘flat’ network for both the worm and patch processes,
whereas we study a hierarchical model motivated by
considerations of scalability and trust.
In practice, Microsoft Update is a planet-scale automatic
system to disseminate patches as well as other software updates.
At present, the system is not designed to contain
wormsatthetimescaleofwormspread, butratherforproactive
patching at a timescale of days.
2. SYSTEM MODEL AND RESULTS
2.1 Worm model
We consider worms that scan the IP address space uniformly
at random. This is the method used by many, but
notall, wormsobservedsofar. Smarterstrategiesincludebiasing
the scan in favour of the local subnet (routing worms),
and exploiting the topology of some application-layer overlay
(e.g. instant messenger buddy lists, ssh address lists
etc.) While the techniques described here can be extended
to routing worms, topological worms appear to be a much
harder problem. Random scanning worms are characterized
by the worm infection rate β, which is the rate at which
per-host scans hit vulnerable IP addresses. For concrete-
ness, β = 0.00045 per second for CodeRed 1 and β = 0.117
per second for Slammer [8]. In our discussions, we use the
infection rate of Slammer as a recurring example, but the
reader should bear in mind that this worm exhibits global
spread behavior that is somewhat different from that of a
random scanning worm; see [7] for an analysis of the underlying
causes.
2.2 Hosts, Superhosts, Subnets
The use a centralized server to distribute patches to all
clients is not scalable. Costa et al. [4] consider a fully decentralized
peer-to-peer scheme where hosts are organized into
a structured overlay over which alerts/patches are spread.
It is not clear whether such a system will be universally
deployed. In this work, we consider an intermediate setting
where hosts are partitioned into subnets and there is
a patch server in each subnet, which we call a superhost.
Superhosts are connected by an overlay, which is used to
propagate patches between them. A superhost that has received
a patch is said to be alerted, as is the subnet to
which it belongs. Once a superhost is alerted, it propagates
the patch to other superhosts as well as to hosts within its
own subnet. We think of subnets as corresponding to organizational
domains such as a university or corporation;
one could also imagine ISPs maintaining a distributed set of
superhosts. The superhost could be thought of as either enforcing
a security policy by pushing patches on to clients, or
as sending them Vigilante-style self-certifying alerts, which
the clients use to generate filters. The two are equivalent
from a modelling perspective.
2.3 Patch dissemination
The overlay connecting superhosts can be arbitrary. For
example, it could be a balanced multicast tree or any one
of several standard structured overlays (e.g. Pastry [12],
Chord [16] or CAN [11]) or an unstructured overlay (e.g.
Gnutella). Our framework only requires knowing a lower
bound on the number of alerted superhosts at any time after
the start of patch dissemination. We introduce the new
concept of minimum broadcast curve as an abstraction of
this information. Given an overlay, its minimum broadcast
curvecanbecomputedfairlyeasily. Forinstance, forak-ary
tree of J superhosts with unit per-hop delay, the broadcast
time is at least ⌈log k J⌉ and the number of hosts reached
up to time n is k n . However, trees are not robust to node
or link failures, so we might prefer to use one of the other
overlays described above. Most structured overlays have
diameter O(log(J)), so flooding on such overlays would provide
comparable broadcast time to trees but with greater
resilience to failures. The same order for broadcast time
holds in probability for dissemination on overlays by random
gossiping [10]. In our simulations, we used flooding on
the structured overlay Pastry [12], but we reiterate that our
results hold more generally.
Once a superhost is alerted, it starts sending the patch
to hosts within its own subnet in addition to propagating
the alert to other superhosts. Since subnets may be large,
we need to take into account the time needed to patch all
hosts within a subnet. For analytical tractability, we model
the time to patch hosts as independent, exponentially distributed
random variables with mean 1/µ. Intuitively, the
1 see www.caida.org/analysis/security/code-red
model corresponds to a pull system where all hosts poll a
server (superhost) to check for patches. We can either think
of µ as the per host polling rate, or of µN as the capacity of
the server per unit time, where N is the number of hosts in
the relevant subnet. A served request results in a patched
host if the request comes from a susceptible host. Thus the
patching rate is proportional to the fraction of hosts which
are susceptible. This setting conforms to polling systems
such as Microsoft Automatic Update. In practice, the superhost
will typically not resend patches to the same host –
to that extent, the pull model is somewhat pessimistic and
overestimates the number infected hosts. While we focus
on the pull model due to its prevalence in deployed systems
like Windows Update, we also briefly discuss a push system
wherein a superhost is assumed to have an inventory
list of hosts in the subnet, and patches them sequentially
in some order (see Section 4). The assumption of the pull
model renders the system Markovian and greatly simplifies
the analysis. Similar assumptions also lie behind the epidemic
models used in previous work.
Trust The requirements on trust relationships among hosts
and superhosts are out of the scope of this work. Note,
however, that this problem is largely eliminated in systems
such as Vigilante [4], as alerts are self-certifying.
2.4 Summary of results
2.4.1 Estimate of frequency of updates
Simple analysis shows that in the absence of countermeasures,
the characteristic timescale of epidemic spread is 1/β
andmosthostsareinfectedwithinsomefairlysmallmultiple
of this time; for example, the time to go from 100 infected
hosts to 1 million is about 10/β. This suggests that any reactive
countermeasure needs to have response time smaller
than the worm characteristic time. Note that the time 1/β
is about 40 minutes for Code Red and smaller than 10 seconds
for Slammer. In the case of a worm like Slammer,
this suggests that we either need to patch most hosts within
a timescale of about 1 minute or deploy additional mechanisms
like rate capping [17] in order to slow down the worm.
Inanidealizedscenario, whenalertdisseminationbetween
superhosts is instantaneous, we obtain an exact expression
relating the initial and final number of infected hosts, for a
given worm infection rate β and patching rate µ; see Corollary
1. The result implies that the ratio of final to initial
infectives is at most exp(β/µ). Thus, for instance, the final
number infected can be limited to no more than 100 times
the initial by taking µ > β/5. This is the sort of operating
regime in which we are interested. In particular, we find
that we need 1/µ to be about 3 hours for Code Red and 50
seconds for Slammer. If a subnet contains 1000 hosts, this
means that we need to patch at a rate of 5 hosts per minute
for Code Red and 20 hosts per second for Slammer.
2.4.2 Minimum broadcast curve
More realistically, we are interested in the situation when
alert dissemination to superhosts is not instantaneous. The
time it takes to alert superhosts depends on the broadcast
mechanism and the shape of the overlay used. We abstract
these details by introducing the concept of a minimum
broadcast curve: a function g(t), t ≥ 0, is said to be
a minimum broadcast curve for an overlay of superhosts if,
for any broadcast of alerts initiated at time 0, the fraction
of alerted superhosts of the overlay at time t is at least g(t).
We can explicitly compute the minimum broadcast curve
for simple topologies. It is given by (i) exponential function
(truncated at 1) for a tree, and (ii) logistic function
for hypercubes and for gossip-based dissemination. We also
verify that when flooding alerts on a Pastry [12] overlay, the
fraction of alerted superhosts over time is well approximated
by a logistic function. We can use the minimum broadcast
curve to obtain an upper bound on the fraction of hosts
that eventually become infected; the results are presented
in Theorem 1, for the specific case of a logistic minimum
broadcast curve. Theorem 3 extends the results to the case
when µ and β are both small, i.e., alert dissemination is
fast compared to both worm spread and patching. Simulation
results presented later show that this formula is indeed
accurate in realistic parameter regimes.
2.4.3 Patching and filtering
Weinvestigatetheeffectivenessofpatchingcombinedwith
filters that block worm scans in and out of alerted subnets,
and obtain a closed-form solution for the fraction of infected
and susceptible hosts at any time within non-alerted subnets.
The eventual fraction of infected hosts within a subnet
is now entirely determined by the fraction of infected hosts
in the subnet at the time instant when it became alerted.
2.5 Structure of the paper
In Section 3, we describe a model consisting of a large
number of subnets, and present results on patching using a
pull-based in Section 3.1. Section 4 briefly addresses pushbasedpatchdissemination.
InSection5, westudythepatchingsystemcomplementedwithfilters.Wepresentsomesimulation
results in Section 6 before concluding.
3. PATCHING
In this section we consider a hierarchical patch distribution
where, in the first-layer, the patch is disseminated
among superhosts and in the second-layer, superhosts push
the patch on to hosts within their subnet. We have a population
of N hosts stratified into J disjoint subnets, with
the jth subnet of size Nj. Associated with each subnet is
a superhost, denoted by the same index j. A host is either
susceptible, infected or patched. We conservatively assume
that, once infected, a host cannot be patched (on the
timescale of our automatic patching system), and also that
a patched host cannot be infected by the worm. An infected
host attempts to infect other hosts by scanning the address
space of size Ω uniformly at random at a fixed rate η. We
say that a superhost is alerted if it has received a patch and
dormant otherwise; in a system like Vigilante, we could say
that a superhost is alerted when it has received and verified
an alert, and generated a filter for that vulnerability.
Dormant superhosts do nothing. Active superhosts make
thepatchavailableforhostswithintheirsubnettopull; each
host in the subnet becomes patched after a random time
which is exponentially distributed with mean 1/µ. In addition,
activesuperhostsdisseminatealertsamongthemselves.
We shall make some specific assumptions on this dissemination
process and show how the framework accommodates
a variety of alert dissemination systems. We analyse these
models by considering an asymptotic regime (large population
limit) in which Ω, N and J increase to infinity, while
the ratios Ω/N and J/N are kept fixed.
Time scale. Without loss of generality, we scale time by
κ, the rate of alert dissemination, and henceforth take the
rate of alert dissemination to be unity. All our statements
can be interpreted without difficulty by merely replacing the
rates β and µ and times t with β/κ, µ/κ and κt respectively.
3.1 Dynamics and limit points
Denote by i(t) and s(t) the fraction of hosts which are infected
and susceptible respectively, at time t. Similarly, let
i 1 (t) and s 1 (t) be the fraction of all hosts which are infected
and susceptible respectively, and reside under alerted superhosts.
Denote by g(t) the fraction of superhosts which are
alerted. We assume that superhosts cannot be infected by
the worm, e.g., because they do not run services exhibiting
the vulnerability. The system dynamics is as follows:
g ′ (t) = g(t)(1 − g(t)) (1)
i ′ (t) = βi(t)s(t) (2)
s ′ (t) = −βi(t)s(t) − µs 1 (t) (3)
i 1′ (t) = βi(t)s 1 (t) + g(t)(i(t) − i 1 (t)) (4)
s 1′ (t) = −(βi(t) + µ)s 1 (t) + g(t)(s(t) − s 1 (t)) (5)
The last term in the derivative of i 1 (t) comes from a subnet
whose superhost undergoes transition from dormant to
alertedstate. Thishappensatrate Jg(t)(1−g(t))and, when
it does, the number of infectives belonging to subnets with
alerted superhosts jumps up by the number of infectives in
that subnet, which is N[i(t) − i 1 (t)]/[J(1 − g(t))] on average.
The last term in the derivative for s 1 (t) is explained
similarly. We can solve for g(t) explicitly:
g(t) =
g(0)
. (6)
g(0) + (1 − g(0))e−t Any alert dissemination mechanism that results in the same
time evolution of g(t) as in (6), namely a logistic function,
will yield the same solution for the dynamics (1)–(5). In
particular, the results apply equally to alert dissemination
by gossip or over a hypercube, or commonly used structured
overlays such as Pastry. We recast the system of equations
(1)–(5) into an equivalent form, by defining the auxiliary
process w(t) = s 1 (t)/s(t). Note that w(t) is the fraction of
susceptible hosts under alerted subnets in total number of
susceptible hosts at time t. We have
d
i(t) = β i(t)s(t),
dt
(7)
d
s(t) = −β i(t)s(t) − µ w(t)s(t),
dt
(8)
d
dt w(t) = µ w(t)2 − (µ + g(t)) w(t) + g(t). (9)
The last differential equation is known as Ricatti’s differential
equation and it can be solved explicitly. The interested
reader is referred to [2] for the solution and limit
points of w(t) and discussion.
Comment. A somewhat similar host immunization process
to (7)–(8) was considered by Wong et al [18]. They
do not consider patch dissemination on an overlay, hence
w(t) ≡ 1. The major distinction is that Wong et al (Section
6.1 [18]) assume a host can be patched in both infected and
susceptible state. This amounts to the following system:
d
i(t)
dt
= βi(t)s(t) − µi(t),
d
s(t)
dt
= −βi(t)s(t) − µs(t).
In contrast, we assume that an infected host cannot be
patched on the timescale of the automatic patching system.
This may be a more plausible assumption as a smart worm
may prevent patch installation on an infected host.
Wenowpresentourresults; see[2]fordetailedderivations.
Theorem 1. For the system of differential equations (1)–
(5), it holds that
i(+∞) + µ
β
� +∞
0
w(u)d log i(u) = i(0) + s(0). (10)
Corollary 1. Assume g(0) = 1, i.e. all superhosts are
initially immunized. Then, w(·) ≡ 1 and it follows that
i(+∞) + µ i(+∞)
log = i(0) + s(0). (11)
β i(0)
Indeed, in this special case, the network behaves like a single
large subnet. The result may be regarded as an approximation
for the limiting case in which patch dissemination on
the overlay connecting superhosts is much faster than either
worm spread or patch spread within subnets.
Comment. Corollary 1 has the following implication:
i(+∞) ≤ i(0) exp
�
(i(0) + s(0)) β
�
. (12)
µ
In words, the number of hosts which ever become infected
is a multiple of the number initially infected; this multiple
is at most exponential in the ratio of worm infection rate β
to patching rate µ. We see in Section 6 that this bound is a
good approximation for ranges of the value of β/µ that are
of practical interest.
For further intuition on this bound, note that the fraction
of infected hosts i(t) for a random scanning worm with
infection rate β, and with no patching, satisfies the inequality
i(t) ≤ i(0) exp(βt), for any t ≥ 0. Suppose now that
the automatic patching system ensures that all vulnerable
hosts are patched no later than time T, if not already infected
by the worm. Then it follows from the above that
i(+∞) ≤ exp(βT ). Now (12) is of the same form, but
with the deterministic upper bound T replaced by the mean
patching time 1/µ.
3.2 Effect of Alert Broadcast Time
InCorollary1, weconsideredthesituationwhenallsuperhosts
are alerted instantaneously. This yields a lower bound
onthefractionofeventuallyinfectedhostsinapracticalsystem
with a non-zero alert time. Suppose now that T > 0 is
a deterministic bound on the time to alert all superhosts; we
call it an alert broadcast time. An upper bound on the fraction
of hosts eventually infected can be obtained by making
the worst-case assumption that no superhost is alerted prior
to time T. Thus, we get:
Theorem 2. Suppose the patching rate is µ in each subnet,
and that the alert dissemination system guarantees that
the alert broadcast time is at most T. Then, the fraction of
ultimately infected hosts i(+∞) satisfies:
i(+∞) + µ
β log
�
i(+∞) i(0) + s(0)e
i(0)
−β(i(0)+s(0))T �
i(0) + s(0)
≤ i(0) + s(0).
The inequality of the theorem implies
i(+∞) + µ i(+∞)
log ≤ (i(0) + s(0))(1 + µT ).
β i(0)
As in the comment after Corollary 1, this implies
i(+∞) ≤ i(0) exp
�
(i(0) + s(0))β
�
1
µ + T
��
. (13)
Comparing the last inequality with (12), which holds when
all superhosts are alerted at the start, we note that the alert
broadcast time enters by effectively increasing the mean
patching time from 1/µ to (1/µ) + T. This is a simple and
intuitive result.
The inequality in the statement of Theorem 2 becomes
tight as the worm infection rate β and patching rate µ both
tend to zero. This limit regime corresponds to a separation
of timescales whereby alert dissemination runs on the fastest
timescale and patching on the slowest, with the timescale
of worm spread lying in between. This particular regime
is of practical interest. We formalize these statements in
the following theorem, which applies to alert dissemination
mechanisms characterized by a logistic function g(t).
Theorem 3. The patching system described by (1)–(5)
satisfies
i(+∞) + µ
β log
� �
i(+∞)
i(0)
� �
s(0)(1 − w(0))
1
∼ i(0) + s(0) + µ log
,
1 − g(0)
g(0)
where we write ∼ to mean that the ratio of the two sides
tends to 1 as µ and β tend to zero with their ratio fixed.
3.3 Minimum Broadcast Curve
The concept of minimum broadcast curve is an abstractionofthemechanismofalertbroadcastonarbitraryoverlay
topologies. It is attractive as it allows us to abstract diverse
broadcast networks by a single curve. We say that a function
m(t) is a minimum broadcast curve for a given alert
dissemination system, if, defining t = 0 as the time of first
alert, the fraction of alerted hosts on any interval [0, t] is at
least m(t). Now note that the patching system (7)–(9) holds
more generally than for g(t) a logistic function (corresponding
to alert dissemination by random gossip). It is intuitive
to expect that if we replace w(t) in (7) and (8) with some
function that is a lower bound for all t, then the resulting
solution yields an upper bound on the fraction of infected
hosts for all t. This is the content of:
Proposition 1. Consider functions f1 and f2 satisfying
0 ≤ m ≤ f1(t), f2(t) ≤ M < +∞, and the following two
systems of equations:
d
dt ij(t) = βij(t)sj(t)
d
dt sj(t) = −βsj(t)ij(t) − fj(t)sj(t)
j = 1, 2.
Fraction of alerted superhosts
1
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0 0.5 1 1.5
time (sec)
2 2.5 3
Figure 1: Empirical fraction of alerted superhosts
achieved by flooding on Pastry overlay of 100 Pastry
nodes and delayed-logistic minimum broadcast
curve. See Figure 4 in [2] for additional plots. The
figure suggests delayed-logistic curve a natural candidate
for minimum broadcast curve of standard
overlays.
Assume that f1(t) ≥ f2(t) for all t ≥ 0, and that on any
finite interval [s, t], there exists a finite partition s ≤ t1 ≤
· · · ≤ tn−1 ≤ tn = t, such that
inf f1(u) ≥ sup f2(u), all k = 1, 2, . . . , n − 1.
u∈[tk,tk+1] u∈[tk,tk+1] Then, i1(t) ≤ i2(t) and s1(t) ≥ s2(t), for all t ≥ 0, whenever
i1(0) ≤ i2(0) and s1(0) ≥ s2(0).
The conditions of the proposition are satisfied if the functions
f1 and f2 are continuous, and f1(t) > f2(t) for all t.
Its relevance is in that the solution of (7)– (9), using any
minimum broadcast curve m(t) in place of g(t), yields an
upper bound on the fraction of infected hosts in the actual
system. In Figure 1, we compare empirical broadcast curve
obtained by flooding in Pastry and a minimum broadcast
curve taken to be a logistic function; see [2] for details.
4. PUSH-BASED PATCH DISSEMINATION
We have so far discussed a pull mechanism for patch dissemination,
motivated by currently deployed systems. We
now explore push schemes for comparison. We consider two
approaches: a hierarchical scheme analogous to that studied
for pull, and a peer-to-peer scheme, either system-wide or
deployed within each subnet.
In the hierarchical scheme, each alerted superhost pushes
patches on to the N nodes in its subnet in some order, at
rate Nµ. Thus, the server capacity is taken to be the same
as in the pull system considered earlier. For a single subnet,
the system dynamics is:
d
i(t) = βi(t)s(t)
dt
(14)
d
s(t)
dt
1
= −βs(t)i(t) − µ
s(t)
1 − µt
(15)
for 0 ≤ t < 1/µ. Comparing with system (7)–(8), with
w(t) ≡ 1, we note that (14)–(15) differs for the additional
i(+∞)/i(0)
10 2
10 1
0.15
β = 0.05
10
0 1 2 3 4 5 6 7 8 9 10
0
1/µ (sec)
Figure 2: The ultimate number of infected hosts
with worm-like patch dissemination (19)–(21) versus
mean patch time 1/µ. The initial values are
i(0) = 1/1000, a(0) = 1/10000, p(0) = 0. The alert
dissemination rate is κ = 4 sec −1 .
term 1/(1 − µt). Intuitively, for the same initial fractions
of infectives and susceptibles and identical patch rate µ, the
push system will result in smaller number of infectives than
the pull system. This intuition is indeed true in view of
Proposition 1 below. The numerical values in Figure 3 illustrate
the superiority of push, for the same server load.
4.1 Worm-like patch dissemination
We noted above that a superhost can improve the effectiveness
of patching by pushing patches to hosts rather than
waiting for them to be pulled. Even better performance
is achievable using peer-to-peer dissemination. Specifically,
we consider gossip or epidemic-style protocols to disseminate
the patch within subnets since such a scheme is fast,
scalable, and robust. (The process of alert dissemination
to superhosts remains the same as before.) We make the
reasonable assumption that hosts have some side knowledge
about addresses of other hosts in the same subnet and don’t
rely on random scanning to locate them. Then, the gossip
scheme used for patch dissemination can proceed much
faster than worm infection. This is modelled by taking the
patch scan rate to be larger than the worm scan rate.
We distinguish two cases. In the first, we assume that
as soon as a host receives a patch (or alert), it is quarantined
and cannot be infected by the worm. This quarantine
state lasts until the host verifies and installs the patch. In
the second case, the host continues to be vulnerable in the
interim. In both cases, we consider a single subnet. Equivalently,
the analysis would apply if, instead of employing a
hierarchical scheme, the peer-to-peer scheme were deployed
system-wide, as in Vigilante.
4.1.1 With perfect quarantine
We deal with two epidemics both spreading in the same
manner, but the worm has smaller infection rate than the
patch. Denote the worm infection rate by β and the patch
infection rate by µ. Let p(t) denote the fraction of patched
hosts at time t. The race of worm and patch is specified by
0.2
0.1
i(+∞) / i(0)
8
7
6
5
4
3
2
PULL
PUSH
1
0 0.2 0.4 0.6 0.8 1
β / µ
1.2 1.4 1.6 1.8 2
Figure 3: The ratio of the ultimately infected hosts
to initially infected hosts in a single subnet with pull
and push patch dissemination. The initial fraction
of infectives is 10 −5 .
thefollowingdifferentialequationsdescribingtwocompeting
epidemics:
d
i(t) = βi(t)(1 − i(t) − p(t))
dt
(16)
d
p(t) = µp(t)(1 − i(t) − p(t)).
dt
(17)
The limit point of the system (16)–(17) follows readily.
Proposition 2. Thefractionofhostsultimatelyinfected,
i(+∞), is the solution of the following equation
�
1 − i(+∞)
i(+∞) = i(0)
p(0)
� β
µ
.
The proposition implies the following simple bound on the
final fraction of infected hosts:
i(+∞) ≤ i(0)e β
µ log(1/p(0)) . (18)
The last inequality fleshes out the appeal of worm-like patch
dissemination as β/µ would typically be close to 0; hence,
if the fraction of initially patched hosts p(0) is not inordinately
small, then (β/µ) log(1/p(0)) would be only slightly
larger than 0. In other words, the fraction of hosts eventually
infected exhibits only a small increase over the fraction
initially infected at the time of worm detection.
4.1.2 With imperfect quarantine
Suppose now that after a host receives a patch or alert, it
is not patched instantaneously. Instead, there is some nonzero
time required to verify the patch and install it. Such
an assumption is required if trust cannot be assumed. During
the time between receiving and installing a patch, the
host continues to be vulnerable and will become infected if
scanned by the worm. This may be a reasonable assumption
if the worm can bypass quarantine. To facilitate further
analysis, we assume that the time between receiving a
patch and installing it is an exponentially distributed random
variable with mean 1/µ at each host, and independent
across hosts. Denote by a(t) the fraction of hosts that are
alerted but not yet patched. We now have three epidemics
in place that specify the race of worm, alert, and patch:
d
i(t) = βi(t)(1 − i(t) − p(t)) (19)
dt
d
a(t) = κp(t)(1 − i(t) − p(t) − a(t))
dt
−βi(t)a(t) − µa(t) (20)
d
p(t) = µa(t). (21)
dt
Note that we assume that a host participates in disseminationofalertsonlyafteritispatched;
thisisreflectedinthe
factor p(t) in (20). The reason for this assumption is that
the time between alerting and patching includes the time
for verifying the alert; the assumption is needed to prevent
attacks based on enticing hosts to flood the network with
bogus alerts.
Thesystemofdifferentialequations(19)–(21)canbesolved
numerically. Figure 2 shows the ultimate number of infected
hosts for a range of values of β and 1/µ.The figure demonstrates
that patch verification and installation time does significantly
affect the number of ultimately infected hosts.
5. FILTERING AND PATCHING
We have until now studied a model with an overlay network
of superhosts that acted as distribution servers for
patches. In this section, we extend the model by letting
superhosts also act as filters (or firewalls). Whenever a superhost
j is alerted, a worm scan originating from or destined
to the subnet of superhost j fails with probability 1.
In other words, a worm scan at time t from subnet k to subnet
m succeeds only if neither superhost k nor superhost m
are alerted. As earlier, each host under an alerted superhost
installs a patch with rate µ.
TheraceofwormandpatchcanbedescribedbyaMarkov
process (see [2]). Here we directly proceed to the limit population
dynamics under the many hosts and many superhosts
assumptions. We first consider the subpopulation of hosts in
non-alerted subnets. Denote by i 0 (t) and s 0 (t) the fraction
of infected and susceptible hosts in non-alerted subnets. We
have the following dynamics:
d
dt i0 (t) = βi 0 (t)s 0 (t) − g(t)i 0 (t) (22)
d
dt s0 (t) = −βs 0 (t)i 0 (t) − g(t)s 0 (t) (23)
d
g(t) = g(t)(1 − g(t)). (24)
dt
These differential equations are the analogue of (1)–(5)
that we obtained in the patching only case. The equation
for g(t) is the same, and hence has the same solution, given
by (6). We obtain closed form solutions for i 0 and s 0 in this
case.
Theorem 4. The system of equations (22)–(23) has the
solution:
i 0 (t) = i 0 u
(0)
β′
s0 (0)
i0 (0)+s0 (0) +
i0 (0)
i0 (0)+s0 (0) u(t)β′
s 0 (t) = � 1 − i 0 (u(t)) � 1−g(0)u(t)
1−g(0) ,
1−g(0)u(t)
1−g(0)
(25)
with u(t) := g(t)/g(0) and β ′ := (i 0 (0) + s 0 (0))/(1 − g(0)).
Fraction of infected hosts
x 10−6
1.5
1
0.5
0.1
0.05
β=0.01
0
0 1 2 3 4 5
time (sec)
6 7 8 9 10
Figure 4: The fraction of infected hosts in nonalerted
subnets versus time, as given by (25). See [2]
for the same plot, but with respect to the fraction
of alerted superhosts.
The fraction of infected hosts i 0 (t) increases with t on
some initial interval [1, t0), attains its maximum at some
t0 > 0andthendecreasesto 0as tgoesto +∞. Thedecrease
is due to the fact that superhosts become alerted and so
the fraction of non-alerted subnets decreases over time. See
Figure 4 for numerical plots of the function i 0 (t). This is
also validated in Section 6.
The reason for being interested in the evolution of i 0 (t)
and s 0 (t) is that they can be used to obtain the number of
infectives and susceptibles in a subnet j, at the time Tj at
which superhost j becomes alerted. After this time, subnet
j is isolated and its evolution decouples from that of the rest
of the network. On [0, Tj], we have
d
dt ij(t) = βi 0 (t)sj(t), (26)
d
dt sj(t) = −βi 0 (t)sj(t). (27)
�
Defining φ(t) := exp −β � t
0 i0 �
(u)du , it follows that, in
particular, sj(Tj) = sj(0)φ(Tj) and
0.2
ij(Tj) = (1 − φ(Tj))nj + φ(Tj)ij(0). (28)
The function φ(t) can be obtained in closed-form by integrating
the solution (25):
φ(t) =
Now, for t > Tj, we have
i 0 (0) + s 0 (0)
s 0 (0) + i 0 (0)u(t) β′ .
d
dt ij(t) = βij(t)sj(t)
d
dt sj(t) = −βsj(t)ij(t) − µsj(t).
But this is a familiar dynamics that we already encountered
in Section 3.1. We have an explicit relation between the
initial and limit point of this system given in Corollary 1.
Specializing to the current context, the identity of Corol-
Fraction of infected hosts
10 0
10 −1
10 −2
N=1000
10
0 100 200 300
−3
1/µ (sec)
Fraction of infected hosts
10 0
10 −1
10 −2
N=10000
10
0 100 200 300
−3
1/µ (sec)
Fraction of infected hosts
10 0
10 −1
10 −2
N=100000
10
0 100 200 300
−3
1/µ (sec)
Figure 5: Patching all alerted subnets with the number of hosts (left) N = 1000, (middle) N = 10000, and
(right) N = 100000. Each graph shows simulation results of two settings: (lighter) i(0) = 1/100 and (darker)
i(0) = 1/1000. For each of the setting, there are 5 simulation outcomes obtained with distinct random seed.
The worm infection rate is β = 0.1. The dark solid curves are that obtained by Corollary 1. The light solid
lines are from (12).
lary 1, for a subnet j, reads
ij(+∞) + µ ij(+∞)
log
β ij(Tj) = nj. (29)
Inviewofthelastidentity, theultimatefractionofinfectives
in a subnet is fully specified given ij(Tj), i.e. given the
function φ(t) and the alert times Tj (see Equation (28)).
5.1 The effectiveness of filtering
We first evaluate filtering in the limit when alert propagation
time is negligible, so that we can assume that Tj = 0
for all j. The benefit of filtering can be discerned from
(29). To see this, suppose that nj is of the order of 1/J (or
exactly equal to 1/J, assuming equal-sized subnets); then
(29) implies ij(+∞) ≤ ij(0) exp((β/J)/µ). Recall that this
inequality is a good estimate for ij(+∞) in the limit case
where the first element in the left-hand side of (29) is much
smaller than the second element. Summing over j, we have
i(+∞) ≤ i(0) exp((β/J)/µ), which is precisely the inequality
that one would obtain for a patching system with worm
infection rate β/J and patching rate µ. Thus, for the same
initial conditions, the final number of infectives in a system
with patching and filtering and worm infection rate β
is bounded by the final number of infectives in the patching
only system with worm infection rate β/J. In other words,
filtering effectively reduces the speed of worm spread by a
factor J equal to the number of subnets, which is large in
practice. This demonstrates the potential effectiveness of
filtering in reducing worm infection rate.
Next, we take into account the effect of non-zero alert
propagation time. We consider a limit system of a large
number of subnets, each of a fixed size. Recall that once
a subnet is alerted, no further infections can come in from
outside. Furthermore, infected hosts inside the subnet scan
within the subnet with negligible probability, in the limit
considered. Hence,allnodeswithinthesubnetwillbepatched
before there are any more infections. The system dynamics
is described by (22)–(23). The fraction of infected hosts
i 1 (t) in alerted subnets evolves as:
d
dt i1 (t) = g(t)i 0 (t).
Inviewof(25), weknow i 0 (t), soitonlyremainstointegrate
g(t)i 0 (t) in order to obtain i 1 (t). We have
i 1 (t) = i1(0) + g(0)i 0 � u(t)
(0)
[i
0
0 (0) + s 0 (0)u −β′
] −1 du.
The last is a binomial integral, which we can evaluate by
expanding the integrand in a binomial series. Instead, we
use the trivial bound 1/[i 0 (0) + s 0 (0)u −β′
] ≤ (1/s 0 (0))u β′
,
so that we obtain
i 1 (t) ≤ i 1 (0) + g(0) i0 (0)
s0 1
(u(t)1+β′ − 1).
(0) 1 + β ′
This would be a good approximation in the cases of interest
where i 0 (0) and β are small. It follows
i 1 (+∞) ≤ i 1 (0) + i 0 1
(0)
s0 (0)(1 + β ′ ) eβ′ � �
log 1
g(0) .
The inequality fleshes out dependence on the "diameter" of
the alert dissemination overlay log(1/g(0)). We compare the
patching with filtering with patching only system, which we
studied in Section 3, in the limit case of fast subnets. The
dynamics of patching only boils down to
d
dt i0 (t) = βi(t)s 0 (t) − g(t)i 0 (t)
d
dt s0 (t) = −βs 0 (t)i(t) − g(t)s 0 (t)
d
dt i1 (t) = g(t)i 0 (t).
Note that this system differs from (22)–(23) in that the infection
rate contains i(t) in place of i 0 (t). This is because,
under patching only, no scan is blocked. We provide a numerical
example in Figure 6. The figure demonstrates that
the gain with filtering increases as the fraction of infectives
in initially alerted subnets increases.
i(+∞) / i(0)
10 5
10 4
10 3
10 2
10 1
no filtering
i 1 (0) = (1/2) i(0)
i 1 (0) = g(0) i(0)
i 1 (0) = (1 − g(0)) i(0)
10
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
0
β / κ
Figure 6: The ratio of ultimately to initially infected
hosts in patching system with and with no filtering
versus the infection to alert rate β/κ. It is assumed
that patching in subnets is fast. The fraction of
initial infectives in alerted subnets is set to i 1 (0) =
ρi(0), for ρ indicated in the graph. The initial point
is i(0) = 10 −5 , s 0 (0) = 1 − g(0) − (1 − ρ)i(0), g(0) = 10 −3 .
Note the initial point is such that i 1 (0) + p(0) = g(0),
where p(t) is the fraction of patched hosts.
We end this section by noting that, as in patching-only
systems (Section 3.1), we can use the abstraction of a minimum
broadcast curve in patching and filtering systems to
obtain an upper bound on the number of infected hosts.
This is shown in [2, Proposition 3].
6. SIMULATION VALIDATION
We verify some of our results by packet-level discreteevent
simulations in SimPastry [3]. In our simulations, superhosts
are nodes in a Pastry overlay [12]. The Pastry
nodes are attached to J nodes chosen uniformly at random
from a network topology that is input to the simulator.
We used a transit-stub topology generated by the Georgia
Tech topology generator [19]. The topology has slightly
more than 5000 routers arranged into two hierarchical levels:
(top) 10 transit domains with approximately 5 routers
in each transit domain; (bottom) 10 stub domains attached
to each transit router with approximately 10 routers in each
stub domain. The delays between routers are provided by
the topology generator. The Pastry parameters are set as
b = 1 and ℓ = 32. The alert broadcast system is flooding
performed as follows. Suppose a Pastry node becomes
alerted at some time instant. It then broadcasts an alert to
all other nodes that are in its routing table. Subsequently,
each Pastry node receiving an alert, if not already alerted,
broadcasts the alert to all the nodes in its routing table;
otherwise, it discards the alert. This process continues until
all Pastry nodes are alerted. We do not consider faults in
this work. The alert broadcast time depends on the number
of routing hops between any two nodes (diameter), and
on the delays between nodes. In the presence of a worm, it
may well happen that the network becomes saturated due to
worm traffic and some servers slow down. To capture this,
we vary as input parameter a fixed delay at each routing
Fraction of infected hosts i 0 (t)
1
0.8
0.6
0.4
0.2
x 10 −3
Alert times
0
0 0.5 1 1.5
Time t (sec)
2 2.5 3
Figure 7: Patching with filters and Pastry overlay:
Fractionofinfectedhostsinnon-alertedsubnetsversus
time. The worm scan rate is β = 0.1. and the
initial fraction of infected hosts i(0) = 1/1000. The
number of superhosts, J, is equal to 100. There are
N = 100000 with 1000 hosts in each subnet. The
graph shows sample paths of 5 simulation runs with
random seeds. The thick lines are from Theorem 5,
obtained for g(t) a delayed-logistic minimum broadcast
curve. The simulation results and analytical
predictions match well. See also [2–Figure 8].
hop in the Pastry overlay. This lets us examine the performance
under overload without making specific assumptions
on the workload causing the delays. We implemented
a packet-level worm propagation model. The model is that
of a birth-and-death Markov process for infected and susceptible
hosts found in [2] (Section 3 and Section 5 therein).
The difference is that in the simulator, the alerts disseminate
by flooding on Pastry. We next present our simulation
results:
Patching with instantly alerted superhosts. The
goal is to validate the result of Corollary 1 and the bound
(12) on the ultimate fraction of infected hosts. We separate
the effects of the alert broadcasting delays by configuring
zero-valued delays at each link of the input network topology.
See Figure 5 for the results; a detailed description of
the simulation parameters can be found in the caption of the
figure. The simulation results conform well to the analytical
estimate of Corollary 1. Indeed, as expected, the larger the
number of hosts N, the smaller the variability in the results.
The figure also shows that the bound (12) is a good estimate
as long as the number of ultimately infected hosts is sufficiently
small. All the observations confirm predictions of the
analysis. Patching with alert delays. We also examined
how our analysis predicts the fraction of ultimately infected
hosts when alerts are propagated on overlay with some delay.
We varied alert delays by adding a fixed per-hop delay
in Pastry routing, ranging from 0 to 3 sec. The figures are
not presented here due to lack of space, but the simulations
results are in good agreement with the analytical prediction
of Theorem 3. Patching with filters. We consider the
same simulation setup as with patching described earlier,
but now, in addition, we have filters in place as described in
Section 5. We first validate Theorem 4. See the caption of
Figure7foradescriptionofthesimulationsetup. Thefigure
provides a comparison of analytical and simulation results,
and shows that there is a good agreement.
We also performed simulations to validate the identity
(29). To that end, we used the same setup as in Figure 7,
but varied a fixed per-hop-delay of alerts at each superhost.
The results are not shown due to lack of space, but they
confirm that: (i) patching with filters significantly outperforms
a system with no filters; (ii) alert broadcast time is a
significant factor.
7. CONCLUSION & DISCUSSION
Our analysis suggests the following claims. (i) Patching as
a countermeasure initiated at the time of worm onset can be
effective if the ratio of the worm infection rate to patching
rate is not too large. This may require the use of mechanisms
such as per-host capping of scan rates in order to limit
worm infection rates. (ii) In networks partitioned into subnets,
with each subnet having a dedicated server for patch
distribution (superhost), and superhosts connected by an
overlay, the alert broadcast time on the overlay is a significant
factor in determining the number of ultimately infected
hosts. It is important to employ overlays with small diameter
to ensure fast dissemination. (iii) We introduce the
concept of minimum broadcast curve, which proves useful
in abstracting a variety of overlay protocols. This abstraction
yields a unified analysis for evaluating the performance
of worm containment countermeasures in the presence of
alert delays. (iv) Worm-like dissemination of patches is an
appealing solution for rapid dissemination of patches. However,
its effectiveness can significantly deteriorate if the time
to verify and install patches at end-hosts is large.
Our results suggest that a large-scale system needs to be
carefully designed in order to ensure that patch spread is
sufficiently faster than worm spread. There are indications
that automatic patch verification and filter generation for
some worms can be realized on rather fast timescales (in the
order of tens of milliseconds), with a notably larger time to
generate a filter for CodeRed worm (about 4 seconds) [4].
It remains a challenge to deploy and operate a real-world,
planet-scale automatic patching system for heterogeneous
hosts that will be sufficiently rapid to guarantee effective
worm containment.
There are several open problems. (i) The implications of
systemheterogeneitywithrespecttosubnetsizesandpatching
rates over subnets. (ii) Extension of the framework to
routing or local preference worms. (iii) The models provide
a framework to study adversarial strategies for worms to
maximize the ultimate number of infected hosts.
Acknowledgments
The work described here was inspired by conversations with
Don Towsley. We thank Manuel Costa for his comments
and simulator code, Miguel Castro and Stuart Staniford for
their valuable comments on an earlier draft of this paper.
8. REFERENCES
[1] W. A. Arbaugh. A Patch in Nine Saves Time? IEEE
Computer, pages 82–83, June 2004.
[2] M. Vojnović and A. Ganesh. On the Race of Worms,
Alerts, and Patches. Technical Report TR-2005-13,
Microsoft Research, February 2005.
[3] M. Castro, P. Druschel, M. Jones, A.-M. Kermarrec,
A. Rowstron, and M. Theimer. SimPastry version 1.1.
http://www.research.microsoft.com
/∼antr/pastry/download.htm, 2002.
[4] M. Costa, J. Crowcroft, M. Castro, A. Rowstron,
L. Zhou, L. Zhang, and P. Barham. Vigilante:
End-to-End Containment of Internet Worms. In Proc.
SOSP 2005, Brighton, United Kingdom, October 2005.
[5] http://www.caida.org/analysis/security/witty, 2005.
[6] A. D. Keromytis. "Patch on Demand" Saves Even
More Time? IEEE Computer, pages 94–96, Aug 2004.
[7] G. Kesidis, I. Hamadeh, and S. Jiwasurat. Coupled
Kermack-McKendrick Model for Randomly Scanning
Worms and Bandwidth-staturating Internet Worms.
In Proc. QoS-IP, Sicily, Italy, February 2005.
[8] D. Moore, V. Paxson, S. Savage, C. Shannon,
S. Staniford, and N. Weaver. Inside the Slammer
Worm. IEEE Security & Privacy, 1(4):33–39, 2003.
[9] D. Moore, C. Shannon, G. M. Voelker, and S. Savage.
Internet quarantine: Requirements for containing
self-propagating code. In Proc. IEEE Infocom 2003,
San Francisco, CA, March 2003.
[10] B. Pittel. On Spreading a Rumor. SIAM J. Appl.
Math., 47(1):213–223, February 1987.
[11] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and
S. Shenker. A scalable content addressable network. In
ACM Sigcomm 2001, 2001.
[12] A. Rowstron and P. Druschel. Pastry: Scalable,
distributed object location and routing for large-scale
peer-to-peer systems. In IFIP/ACM Int’l Conf. on
Distributed Systems Platforms (Middleware), pages
329–350, Heildelberg, Germany, November 2001.
[13] N. Weaver S. Staniford, V. Paxson. How to Own
Internet in your Spare Time. In IEEE Security &
Privacy, 2004.
[14] S. Sidiroglou and A. D. Keromytis. Countering
network worms through automatic patch generation.
In IEEE Security & Privacy, 2005.
[15] S. Staniford. Containment of scanning worms in
enterprise networks. Journal of Computer Security (To
Appear), 2004.
[16] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger,
M. F. Kaashoek, F. Dabek, and H. Balakrisnan.
Chord: A scalable peer-to-peer lookup protocol for
internet applications. IEEE/ACM Trans. on
Networking, 11(1), February 2003.
[17] M. M. Williamson. Throttling viruses: Restricting
propagation to defeat malicious mobile code. In
ACSAC, 2002.
[18] C. Wong, C. Wang, D. Song, S. Bielski, and G. R.
Ganger. Dynamic quarantine of internet worms. In
Proc. of the International Conference on Dependable
Systems and Networks (DSN-2004), Florence, Italy,
June 2004.
[19] E. Zegura and S. Bhattacharjee. How to model an
internetwork. In Proc. of the INFOCOM’96, San
Francisco, California, 1996.
