Massachusetts Institute of Technology
6.042J/18.062J, Fall ’07: Mathematics for Computer Science
Prof. Albert R. Meyer, Dr. Clifford Smyth
Course Notes, Week 8
October 22
revised October 23, 2007, 1283 minutes
Introduction to Number Theory
Number theory is the study of the integers. Why anyone would want to study the integers is not
immediately obvious. First of all, what’s to know? There’s 0, there’s 1, 2, 3, and so on, along
with their negatives. Which one don’t you understand? After all, the mathematician G. H. Hardy
wrote:
[Number theorists] may be justified in rejoicing that there is one science, at any rate,
and that their own, whose very remoteness from ordinary human activities should
keep it gentle and clean.
What most concerned Hardy was that number theory not be used in warfare; he was a pacifist.
Good for him, but if number theory is remote from all human activity, then why study it? We’ll
come back to that question later on, but ironically, we’ll see that poor Hardy must be turning in
his grave.
1 Divisibility
We’ll be examining integer properties in these notes, so we’ll adopt the convention that variables
range over integers.
The nature of number theory emerges as soon as we consider the divides relation
a divides b
iff ak = b for some k.
The notation, a | b, is an abbreviation for “a divides b.” If a | b, then we also say that b is a multiple
of a. As we’ve seen, a consequence of this definition is that every number divides zero.
This seems simple enough, but let’s play with this definition. The Pythagoreans, an ancient sect
of mathematical mystics, said that a number is perfect if it equals the sum of its positive integral
divisors, excluding itself. For example, 6 = 1 + 2 + 3 and 28 = 1 + 2 + 4 + 7 + 14 are perfect
numbers. On the other hand, 10 is not perfect because 1 + 2 + 5 = 8, and 12 is not perfect because
1+2+3+4+6 = 16. Euclid characterized all the even perfect numbers around 300 BC. But is there
an odd perfect number? More than two thousand years later, we still don’t know! All numbers up
to about 10300 have been ruled out, but no one has proved that there isn’t an odd perfect number
waiting just over the horizon.
So a half-page into number theory, we’ve strayed past the outer limits of human knowledge! This
is pretty typical; number theory is full of questions that are easy to pose, but incredibly difficult
to answer. Interestingly, computer scientists have found ways to turn these difficulties to their
advantage. Every time you buy a book from Amazon, check your grades on WebSIS, or use a
PayPal account, you are relying on number theoretic algorithms.
DON’T PANIC— we’re going to stick to some relatively benign parts of number theory. We won’t
put any of these super-hard unsolved problems on exams!
Copyright © 2007, Prof. Albert R. Meyer. All rights reserved.
2
Course Notes, Week 8: Introduction to Number Theory
1.1 Facts About Divisibility
The lemma below states some basic facts about divisibility that are not difficult to prove:
Lemma 1.1. The following statements about divisibility hold.
1. If a | b, then a | bc for all c.
2. If a | b and b | c, then a | c.
3. If a | b and a | c, then a | sb + tc for all s and t.
4. For all c 6= 0, a | b if and only if ca | cb.
Proof. We’ll prove only part 2.; the other proofs are similar.
Proof of 2.: Since a | b, there exists an integer k1 such that ak1 = b. Since b | c, there exists an
integer k2 such that bk2 = c. Substituting ak1 for b in the second equation gives ak1k2 = c, which
implies that a | c.
A number p > 1 with no positive divisors other than 1 and itself is called a prime. Every other
number greater than 1 is called composite. For example, 2, 3, 5, 7, 11, and 13 are all prime, but 4,
6, 8, and 9 are composite. The number 1 is considered neither prime nor composite. This is just
a matter of definition, but reflects the fact that 1 does not behave like a prime in many contexts,
such as the Fundamental Theorem of Arithmetic, which we’ll come to shortly.
1.2 When Divisibility Goes Bad
As you learned in elementary school, if one number does not evenly divide another, then there
is a “remainder” left over. More precisely, if you divide n by d, then you get a quotient q and a
remainder r:
Theorem 1.2 (Division Theorem). Let n and d be integers such that d > 0. Then there exists a unique
pair of integers q and r such that n = qd + r and 0 ≤ r < d.1
As an example, suppose that a = 10 and b = 2716. Then the quotient is q = 271 and the remainder
is r = 6, since 2716 = 271 · 10 + 6.
The remainder r in the Division Theorem is denoted rem(n, d). In other words, rem(n, d) is the
remainder when n is divided by d. For example, rem(32, 5) is the remainder when 32 is divided
by 5, which is 2. Similarly, rem(−11, 7) = 3, since −11 = (−2)· 7 + 3. There is a remainder operator
built into many programming languages. For example, the expression “32 % 5” evaluates to 2 in
Java, C, and C++. However, all these languages treat negative numbers strangely.
We’ll take this familiar Division Theorem for granted without proof.
1This theorem is often called the “Division Algorithm,” even though it is not an algorithm in the modern sense.
Course Notes, Week 8: Introduction to Number Theory
3
Famous Problems in Number Theory
Fermat’s Last Theorem Do there exist positive integers x, y, and z such that
xn + yn = zn
for some integer n > 2? In a book he was reading around 1630, Fermat claimed to
have a proof, but not enough space in the margin to write it down. Wiles finally gave
a proof of the theorem in 1994, after seven years of working in secrecy and isolation
in his attic. His proof did not fit in any margin.
Goldbach Conjecture Is every even integer greater than or equal to 4 the sum of two
primes? For example, 4 = 2 + 2, 6 = 3 + 3, 8 = 3 + 5, etc. The conjecture holds for
all numbers up to 1016. In 1939 Schnirelman proved that every even number can be
written as the sum of not more than 300,000 primes, which was a start. Today, we
know that every even number is the sum of at most 6 primes.
Twin Prime Conjecture Are there infinitely many primes p such that p+2 is also a prime?
In 1966 Chen showed that there are infinitely many primes p such that p + 2 is the
product of at most two primes. So the conjecture is known to be almost true!
Primality Testing Is there an efficient way to determine whether n is prime? A naive
search for factors of n takes a number of steps exponential in log n, which is the size
of n in bits. All known procedures for prime checking blew up like this on vari-
ous inputs. Finally in 2002, an amazingly simple, new method was discovered by
Agrawal, Kayal, and Saxena, which showed that prime testing only required a poly-
nomial number of steps. Their paper began with a quote from Gauss emphasizing
the importance and antiquity of the problem even in his time— two centuries ago.
So prime testing is definitely not in the category of infeasible problems requiring an
exponentially growing number of steps in bad cases.
Factoring Given the product of two large primes n = pq, is there an efficient way to re-
cover the primes p and q? The best known algorithm is the “number field sieve”,
which runs in time proportional to:
This is infeasible when n has 300 digits or more.
e1.9(ln n)1/3(ln ln n)2/3
4
2 Die Hard
Course Notes, Week 8: Introduction to Number Theory
We’ve previously looked at the Die Hard water jug problem with jugs of sizes 3 and 5, and 3 and
9. It would be nice if we could solve all these silly water jug questions at once. In particular, how
can one form g gallons using jugs with capacities a and b? Here’s where number theory comes in
handy.
2.1 Finding an Invariant Property
Suppose that we have water jugs with capacities a and b. The state of the system is described
below with a pair of numbers (x, y), where x is the amount of water in the jug with capacity a
and y is the amount in the jug with capacity b. Let’s carry out sample operations and see what
happens, assuming the b-jug is big enough:
(0, 0) → (a, 0)
→ (0, a)
→ (a, a)
→ (2a − b, b)
→ (2a − b, 0)
→ (0, 2a − b)
→ (a, 2a − b)
→ (3a − 2b, b)
fill first jug
pour first into second
fill first jug
pour first into second (assuming 2a ≥ b)
empty second jug
pour first into second
fill first
pour first into second (assuming 3a ≥ 2b)
What leaps out is that at every step, the amount of water in each jug is of the form
s · a + t · b
(1)
for some integers s and t. An expression of the form (1) is called an integer linear combination of
a and b, but in these notes we’ll just call it a linear combination, since we’re only talking integers.
So we’re suggesting:
Lemma 2.1. Suppose that we have water jugs with capacities a and b. Then the amount of water in each
jug is always a linear combination of a and b.
Lemma 2.1 is easy to prove by induction on the number of pourings.
But Lemma 2.1 isn’t very satisfying. We’ve just managed to recast a pretty understandable ques-
tion about water jugs into a complicated question about linear combinations. This might not seem
like progress. Fortunately, linear combinations are closely related to something more familiar and
that will help us solve the water jug problem.
3 The Greatest Common Divisor
We’ve already examined the Euclidean Algorithm for computing gcd(a, b), the greatest common
divisor of a and b. This quantity turns out to be a very valuable piece of information about the
relationship between a and b. We’ll be making arguments about greatest common divisors all the
time.
Course Notes, Week 8: Introduction to Number Theory
5
3.1 Linear Combinations and the GCD
The theorem below relates the greatest common divisor to linear combinations. This theorem is
very useful; take the time to understand it and then remember it!
Theorem 3.1. The greatest common divisor of a and b is equal to the smallest positive linear combination
of a and b.
For example, the greatest common divisor of 52 and 44 is 4. And, sure enough, 4 is a linear
combination of 52 and 44:
6 · 52 + (−7) · 44 = 4
Furthermore, no linear combination of 52 and 44 is equal to a smaller positive integer.
Proof. By the well-ordering principle, there is a smallest positive linear combination of a and b;
call it m. We’ll prove that m = gcd(a, b) by showing both gcd(a, b) ≤ m and m ≤ gcd(a, b).
First, we show that gcd(a, b) ≤ m. Now any common divisor of a and b, that is, any c such that
c | a and c | b will divide both sa and tb, and therefore also divides sa + tb. The gcd(a, b) is by
definition a common divisor of a and b, so
gcd(a, b) | sa + tb
every s and t. In particular, gcd(a, b) | m, which implies that gcd(a, b) ≤ m.
Now, we show that m ≤ gcd(a, b). We do this by showing that m | a. A symmetric argument
shows that m | b, which means that m is a common divisor of a and b. Thus, m must be less than
or equal to the greatest common divisor of a and b.
All that remains is to show that m | a. By the Division Algorithm, there exists a quotient q and
remainder r such that:
a = q · m + r
(where 0 ≤ r < m)
Recall that m = sa + tb for some integers s and t. Substituting in for m and rearranging terms
gives:
a = q · (sa + tb) + r
r = (1 − qs)a + (−qt)b
We’ve just expressed r as a linear combination of a and b. However, m is the smallest positive linear
combination and 0 ≤ r < m. The only possibility is that the remainder r is not positive; that is,
r = 0. This implies m | a.
The proof notes that every linear combination of a and b is a multiple of gcd(a, b). Conversely, since
gcd(a, b) is a linear combination of a and b, every multiple of gcd(a, b) is as well. This establishes a
corollary:
Corollary 3.2. Every linear combination of a and b is a multiple of gcd(a, b) and vice versa.
Now we can restate the water jugs lemma in terms of the greatest common divisor:
Corollary 3.3. Suppose that we have water jugs with capacities a and b. Then the amount of water in each
jug is always a multiple of gcd(a, b).
For example, there is no way to form 2 gallons using 1247 and 899 gallon jugs, because 2 is not a
multiple of gcd(1247, 899) = 29.
6
Course Notes, Week 8: Introduction to Number Theory
3.2 Properties of the Greatest Common Divisor
We’ll often make use of some basic gcd facts:
Lemma 3.4. The following statements about the greatest common divisor hold:
1. Every common divisor of a and b divides gcd(a, b).
2. gcd(ka, kb) = k · gcd(a, b) for all k > 0.
3. If gcd(a, b) = 1 and gcd(a, c) = 1, then gcd(a, bc) = 1.
4. If a | bc and gcd(a, b) = 1, then a | c.
5. gcd(a, b) = gcd(b, rem(a, b)).
Here’s the trick to proving these statements: translate the gcd world to the linear combination
world using Theorem 3.1, argue about linear combinations, and then translate back using Theo-
rem 3.1 again.
Proof. We prove only parts 3. and 4.
Proof of 3.: The assumptions together with Theorem 3.1 imply that there exist integers s, t, u, and
v such that:
sa + tb = 1
ua + vc = 1
Multiplying these two equations gives:
(sa + tb)(ua + vc) = 1
The left side can be rewritten as a · (asu + btu + csv) + b(ctv). This is a linear combination of a and
bc that is equal to 1, so gcd(a, bc) = 1 by Theorem 3.1.
Proof of 4.: Theorem 3.1 says that gcd(ac, bc) is equal to a linear combination of ac and bc. Now
a | ac trivially and a | bc by assumption. Therefore, a divides every linear combination of ac and
bc. In particular, a divides gcd(ac, bc) = c · gcd(a, b) = c · 1 = c. The first equality uses part 2. of
this lemma, and the second uses the assumption that gcd(a, b) = 1.
Lemma 3.4, part 5. is the fact we assumed when we proved correctness of the Euclidean Algorithm.
Now let’s see if it’s possible to make 3 gallons using 21 and 26-gallon jugs. Using Euclid’s algo-
rithm:
gcd(26, 21) = gcd(21, 5) = gcd(5, 1) = 1.
Now 3 is a multiple of 1, so we can’t rule out the possibility that 3 gallons can be formed. On the
other hand, we don’t know it can be done.
Course Notes, Week 8: Introduction to Number Theory
7
3.3 One Solution for All Water Jug Problems
Can Bruce form 3 gallons using 21 and 26-gallon jugs? This question is not so easy to answer
without some number theory.
Corollary 3.2 says that 3 can be written as a linear combination of 21 and 26, since 3 is a multiple
of gcd(21, 26) = 1. In other words, there exist integers s and t such that:
3 = s · 21 + t · 26
We don’t know what the coefficients s and t are, but we do know that they exist.
Now the coefficient s could be either positive or negative. However, we can readily transform this
linear combination into an equivalent linear combination
3 = s0 · 21 + t0 · 26
where the coefficient s0 is positive. The trick is to notice that if we increase s by 26 in the original
equation and decrease t by 21, then the value of the expression s · 21 + t · 26 is unchanged overall.
Thus, by repeatedly increasing the value of s (by 26 at a time) and decreasing the value of t (by 21
at a time), we get a linear combination s0 · 21 + t0 · 26 = 3 where the coefficient s0 is positive. Notice
that t0 must be negative; otherwise, this expression would be much greater than 3.
Now here’s how to form 3 gallons using jugs with capacities 21 and 26:
Repeat s0 times:
1. Fill the 21-gallon jug.
2. Pour all the water in the 21-gallon jug into the 26-gallon jug. Whenever the 26-gallon jug
becomes full, empty it out.
At the end of this process, there must be exactly 3 gallons in the 26-gallon jug! Here’s why: we’ve
taken s0 · 21 gallons of water from the fountain, we’ve poured out some multiple of 26 gallons, and
in the end the 26-gallon jug holds somewhere between 0 and 26 gallons. Furthermore, we know:
s0 · 21 + t0 · 26 = 3
Thus, we must have emptied the 26-gallon jug exactly −t0 times; if we had emptied it fewer times,
then there would be more than 26 gallons left. And we did not withdraw enough water from the
fountain to empty the 26-gallon jug more than −t0 times. Thus, by the equation above, there must
be exactly 3 gallons left.
Remarkably, we don’t even need to know the coefficients s0 and t0 in order to use this strategy!
Instead of repeating the outer loop s0 times, we could just repeat until we obtain 3 gallons, since that
must happen eventually. Of course, we have to keep track of the amounts in the two jugs so we
8
Course Notes, Week 8: Introduction to Number Theory
know when we’re done. Here’s the solution that approach gives:
(0, 0)
fill 21−−−→ (21, 0)
fill 21−−−→ (21, 21)
fill 21−−−→ (21, 16)
fill 21−−−→ (21, 11)
fill 21−−−→ (21, 6)
fill 21−−−→ (21, 1)
fill 21−−−→ (21, 22)
fill 21−−−→ (21, 17)
fill 21−−−→ (21, 12)
fill 21−−−→ (21, 7)
fill 21−−−→ (21, 2)
fill 21−−−→ (21, 23)
fill 21−−−→ (21, 18)
fill 21−−−→ (21, 13)
fill 21−−−→ (21, 8)
−−−−−−−−−→ (0, 21)
pour 21 into 26
−−−−−−−−−→ (16, 26)
pour 21 into 26
−−−−−−−−−→ (11, 26)
pour 21 into 26
−−−−−−−−−→ (6, 26)
pour 21 into 26
−−−−−−−−−→ (1, 26)
pour 21 into 26
−−−−−−−−−→ (0, 22)
pour 21 into 26
−−−−−−−−−→ (17, 26)
pour 21 into 26
−−−−−−−−−→ (12, 26)
pour 21 into 26
−−−−−−−−−→ (7, 26)
pour 21 into 26
−−−−−−−−−→ (2, 26)
pour 21 into 26
−−−−−−−−−→ (0, 23)
pour 21 into 26
−−−−−−−−−→ (18, 26)
pour 21 into 26
−−−−−−−−−→ (13, 26)
pour 21 into 26
−−−−−−−−−→ (8, 26)
pour 21 into 26
−−−−−−−−−→ (3, 26)
pour 21 into 26
−−−−−→ (16, 0)
empty 26
−−−−−→ (11, 0)
empty 26
−−−−−→ (6, 0)
empty 26
−−−−−→ (1, 0)
empty 26
−−−−−−−−−→ (0, 16)
pour 21 into 26
−−−−−−−−−→ (0, 11)
pour 21 into 26
−−−−−−−−−→ (0, 6)
pour 21 into 26
−−−−−−−−−→ (0, 1)
pour 21 into 26
−−−−−→ (17, 0)
empty 26
−−−−−→ (12, 0)
empty 26
−−−−−→ (7, 0)
empty 26
−−−−−→ (2, 0)
empty 26
−−−−−−−−−→ (0, 17)
pour 21 into 26
−−−−−−−−−→ (0, 12)
pour 21 into 26
−−−−−−−−−→ (0, 7)
pour 21 into 26
−−−−−−−−−→ (0, 2)
pour 21 into 26
−−−−−→ (18, 0)
empty 26
−−−−−→ (13, 0)
empty 26
−−−−−→ (8, 0)
empty 26
−−−−−→ (3, 0)
empty 26
−−−−−−−−−→ (0, 18)
pour 21 into 26
−−−−−−−−−→ (0, 13)
pour 21 into 26
−−−−−−−−−→ (0, 8)
pour 21 into 26
−−−−−−−−−→ (0, 3)
pour 21 into 26
The same approach works regardless of the jug capacities and even regardless the amount we’re
trying to produce! Simply repeat these two steps until the desired amount of water is obtained:
1. Fill the smaller jug.
2. Pour all the water in the smaller jug into the larger jug. Whenever the larger jug becomes
full, empty it out.
By the same reasoning as before, this method eventually generates every multiple of the greatest
common divisor of the jug capacities —all the quantities we can possibly produce. No ingenuity
is needed at all!
3.4 The Pulverizer
We saw that no matter which pair of integers a and b we are given, there is always a pair of integer
coefficients s and t such that
gcd(a, b) = sa + tb.
The previous subsection gives a roundabout and not very efficient method of finding such coeffi-
cients s and t. In the Notes on State Machines we defined and verified the “Extended Euclidean
GCD algorithm” which is a much more efficient way to find these coefficients. In this section
we give a more straightforward description of this procedure for finding s and t that dates to
sixth-century India, where it was called kuttak, which means “The Pulverizer.”