DATABASE MANAGEMENT
SYSTEMS
SOLUTIONS MANUAL
THIRD EDITION
Raghu Ramakrishnan
University of Wisconsin
Madison, WI, USA
Johannes Gehrke
Cornell University
Ithaca, NY, USA
Jeff Derstadt, Scott Selikoff, and Lin Zhu
Cornell University
Ithaca, NY, USA
CONTENTS
PREFACE
1
2
INTRODUCTION TO DATABASE SYSTEMS
INTRODUCTION TO DATABASE DESIGN
3 THE RELATIONAL MODEL
4 RELATIONAL ALGEBRA AND CALCULUS
5
SQL: QUERIES, CONSTRAINTS, TRIGGERS
6 DATABASE APPLICATION DEVELOPMENT
7
INTERNET APPLICATIONS
8 OVERVIEW OF STORAGE AND INDEXING
9
STORING DATA: DISKS AND FILES
10 TREE-STRUCTURED INDEXING
11 HASH-BASED INDEXING
12 OVERVIEW OF QUERY EVALUATION
13 EXTERNAL SORTING
14 EVALUATION OF RELATIONAL OPERATORS
i
iii
1
6
16
28
45
63
66
73
81
88
100
119
126
131
iiDatabase Management Systems Solutions Manual Third Edition
15 A TYPICAL QUERY OPTIMIZER
16 OVERVIEW OF TRANSACTION MANAGEMENT
17 CONCURRENCY CONTROL
18 CRASH RECOVERY
19 SCHEMA REFINEMENT AND NORMAL FORMS
20 PHYSICAL DATABASE DESIGN AND TUNING
21 SECURITY
144
159
167
179
189
204
215
PREFACE
It is not every question that deserves an answer.
Publius Syrus, 42 B.C.
I hope that most of the questions in this book deserve an answer. The set of questions
is unusually extensive, and is designed to reinforce and deepen students’ understanding
of the concepts covered in each chapter. There is a strong emphasis on quantitative
and problem-solving type exercises.
While I wrote some of the solutions myself, most were written originally by students
in the database classes at Wisconsin. I’d like to thank the many students who helped
in developing and checking the solutions to the exercises; this manual would not be
available without their contributions.
In alphabetical order: X. Bao, S. Biao, M.
Chakrabarti, C. Chan, W. Chen, N. Cheung, D. Colwell, J. Derstadt, C. Fritz, V.
Ganti, J. Gehrke, G. Glass, V. Gopalakrishnan, M. Higgins, T. Jasmin, M. Krish-
naprasad, Y. Lin, C. Liu, M. Lusignan, H. Modi, S. Narayanan, D. Randolph, A.
Ranganathan, J. Reminga, A. Therber, M. Thomas, Q. Wang, R. Wang, Z. Wang and
J. Yuan. In addition, James Harrington and Martin Reames at Wisconsin and Nina
Tang at Berkeley provided especially detailed feedback.
Several students contributed to each chapter’s solutions, and answers were subse-
quently checked by me and by other students. This manual has been in use for several
semesters. I hope that it is now mostly accurate, but I’m sure it still contains er-
rors and omissions. If you are a student and you do not understand a particular
solution, contact your instructor; it may be that you are missing something, but it
may also be that the solution is incorrect! If you discover a bug, please send me mail
(raghu@cs.wisc.edu) and I will update the manual promptly.
The latest version of this solutions manual is distributed freely through the Web; go
to the home page mentioned below to obtain a copy.
For More Information
The home page for this book is at URL:
iii
Database Management Systems Solutions Manual Third Edition
http://www.cs.wisc.edu/~ dbbook
This page is frequently updated and contains information about the book, past and
current users, and the software. This page also contains a link to all known errors in
the book, the accompanying slides, and the software. Since the solutions manual is
distributed electronically, all known errors are immediately fixed and no list of errors is
maintained. Instructors are advised to visit this site periodically; they can also register
at this site to be notified of important changes by email.
1
INTRODUCTION TO DATABASE
SYSTEMS
Exercise 1.1 Why would you choose a database system instead of simply storing data
in operating system files? When would it make sense not to use a database system?
Answer 1.1 A database is an integrated collection of data, usually so large that it
has to be stored on secondary storage devices such as disks or tapes. This data can
be maintained as a collection of operating system files, or stored in a DBMS (database
management system). The advantages of using a DBMS are:
Data independence and efficient access. Database application programs are in-
dependent of the details of data representation and storage. The conceptual and
external schemas provide independence from physical storage decisions and logical
design decisions respectively. In addition, a DBMS provides efficient storage and
retrieval mechanisms, including support for very large files, index structures and
query optimization.
Reduced application development time. Since the DBMS provides several impor-
tant functions required by applications, such as concurrency control and crash
recovery, high level query facilities, etc., only application-specific code needs to
be written. Even this is facilitated by suites of application development tools
available from vendors for many database management systems.
Data integrity and security. The view mechanism and the authorization facilities
of a DBMS provide a powerful access control mechanism. Further, updates to the
data that violate the semantics of the data can be detected and rejected by the
DBMS if users specify the appropriate integrity constraints.
Data administration. By providing a common umbrella for a large collection of
data that is shared by several users, a DBMS facilitates maintenance and data
administration tasks. A good DBA can effectively shield end-users from the chores
of fine-tuning the data representation, periodic back-ups etc.
1
2
Chapter 1
Concurrent access and crash recovery. A DBMS supports the notion of a trans-
action, which is conceptually a single user’s sequential program. Users can write
transactions as if their programs were running in isolation against the database.
The DBMS executes the actions of transactions in an interleaved fashion to obtain
good performance, but schedules them in such a way as to ensure that conflicting
operations are not permitted to proceed concurrently. Further, the DBMS main-
tains a continuous log of the changes to the data, and if there is a system crash,
it can restore the database to a transaction-consistent state. That is, the actions
of incomplete transactions are undone, so that the database state reflects only the
actions of completed transactions. Thus, if each complete transaction, executing
alone, maintains the consistency criteria, then the database state after recovery
from a crash is consistent.
If these advantages are not important for the application at hand, using a collection of
files may be a better solution because of the increased cost and overhead of purchasing
and maintaining a DBMS.
Exercise 1.2 What is logical data independence and why is it important?
Answer 1.2 Answer omitted.
Exercise 1.3 Explain the difference between logical and physical data independence.
Answer 1.3 Logical data independence means that users are shielded from changes
in the logical structure of the data, while physical data independence insulates users
from changes in the physical storage of the data. We saw an example of logical data
independence in the answer to Exercise 1.2. Consider the Students relation from that
example (and now assume that it is not replaced by the two smaller relations). We
could choose to store Students tuples in a heap file, with a clustered index on the
sname field. Alternatively, we could choose to store it with an index on the gpa field,
or to create indexes on both fields, or to store it as a file sorted by gpa. These storage
alternatives are not visible to users, except in terms of improved performance, since
they simply see a relation as a set of tuples. This is what is meant by physical data
independence.
Exercise 1.4 Explain the difference between external, internal, and conceptual sche-
mas. How are these different schema layers related to the concepts of logical and
physical data independence?
Answer 1.4 Answer omitted.
Exercise 1.5 What are the responsibilities of a DBA? If we assume that the DBA
is never interested in running his or her own queries, does the DBA still need to
understand query optimization? Why?
Introduction to Database Systems
3
Answer 1.5 The DBA is responsible for:
Designing the logical and physical schemas, as well as widely-used portions of the
external schema.
Security and authorization.
Data availability and recovery from failures.
Database tuning: The DBA is responsible for evolving the database, in particular
the conceptual and physical schemas, to ensure adequate performance as user
requirements change.
A DBA needs to understand query optimization even if s/he is not interested in run-
ning his or her own queries because some of these responsibilities (database design
and tuning) are related to query optimization. Unless the DBA understands the per-
formance needs of widely used queries, and how the DBMS will optimize and execute
these queries, good design and tuning decisions cannot be made.
Exercise 1.6 Scrooge McNugget wants to store information (names, addresses, de-
scriptions of embarrassing moments, etc.) about the many ducks on his payroll. Not
surprisingly, the volume of data compels him to buy a database system. To save
money, he wants to buy one with the fewest possible features, and he plans to run it as
a stand-alone application on his PC clone. Of course, Scrooge does not plan to share
his list with anyone. Indicate which of the following DBMS features Scrooge should
pay for; in each case, also indicate why Scrooge should (or should not) pay for that
feature in the system he buys.
1. A security facility.
2. Concurrency control.
3. Crash recovery.
4. A view mechanism.
5. A query language.
Answer 1.6 Answer omitted.
Exercise 1.7 Which of the following plays an important role in representing informa-
tion about the real world in a database? Explain briefly.
1. The data definition language.