From:椰林
DBS概念和组成
A database-management system (DBMS) is a collection of interrelated data and
a set of programs to access those data. The collection of data, usually referred to
as the database, contains information relevant to an enterprise.
Data Abstraction
Physical level. how the data are actually stored
Logical level what information to keep in the database
View level only a part of the database
The user of the logical level does not need to be aware of this complexity. This
is referred to as physical data independence.
Instances and Schemas//数据库实例和模式
The collection of information stored in the database at a particular moment is
called an instance of the database.
The overall design of the database is called the database schema.
The physical schema describes the database design at the physical level, while
the logical schema describes the database design at the logical level.
A database may also have several schemas at the view level, sometimes called
subschemas, that describe different views of the database.
Data Models//数据模型
a collection of conceptual tools for describing data, data relationships, data
semantics, and consistency constraints.
Data-Manipulation Language
DML is a language that enables users to access
or manipulate data as organized by the appropriate data model.
Data-Definition Language
We specify a database schema by a set of definitions expressed by a special
language called a data-definition language (DDL).
Data dictionary(写在后面)
Database design
conceptual-design 概念设计
specification of functional requirements 功能需求说明
logical-design phase 逻辑设计
physical-design phase 物理设计
Relational data structure
This attribute is called a foreign key from r1, referencing r2.
The relation r1 is also called the referencing relation of the foreign key dependency,
and r2 is called the referenced relation of the foreign key.
Relational algebra
select, project ,union, set-difference, cartesian-product, rename
SQL语句与关系代数表达式、查询执行计划
SQL相关
query
Select
Data manipulation
Insert, Delete, Update
Data definition
Data control
Create, Drop, Alter
Grant, Revoke
各类语句
做题
SQL语句和关系代数的转换
第4章 高级SQL
Integrity constraint 完整性约束
Integrity constraints ensure that changes made to the database by authorized
users do not result in a loss of data consistency
security mechanism
prevent users from accessing certain parts of the database
Entity/domain integrity
referential Integrity 参照完整性
let r1 and r2 be relations whose set of attributes are R1 and R2, respectively, with primary
keys K1 and K2. We say that a subset _ of R2 is a foreign key referencing K1 in relation r1 if
it is required that, for every tuple t2 in r2, there must be a tuple t1 in r1 such that t1.K1 = t2.
K2.Requirements of this form are called referential-integrity constraints, or
subset dependencies
Integrity schemes
create in SQL, check
assertions//断言
An assertion is a predicate expressing a condition that we wish the database
always to satisfy.
Triggers
A trigger is a statement that the system executes automatically as a side effect
of a modification to the database.
Security and Authorization
grant语句//权限授予// privilege list有select, insert, update, and delete
grant
on
to ;
e.g.
grant select on department to Amit, Satoshi
revoke语句//权限收回
revoke
on <表名>
from ;
e.g.
revoke select on department from Amit, Satoshi;
revoke update (budget) on department from Amit, Satoshi
E-R model
weak entity sets
没有主码的实体集
Participation Constraints
Partial/ total 部分参与和全部参与
做题
ER图与关系表的转换
为什么需要规范化???
函数依赖,1NF, 2NF, 3NF, BCNF的基本概念
•
Functional dependency− is a set of constraints between two attributes in a relation.
•
Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial
FD. Trivial FDs always hold.//平凡的
•
•
1NF - 数据表中的每一列(每个字段)必须是不可拆分的最小单元,也就是确保每一列的原子性
2NF - 满足1NF后,要求表中的所有列,都必须依赖于主键,而不能有任何一列与主键没有关系,也就是说一个表只描述
一件事情
•
3NF - 必须先满足第二范式(2NF),要求:表中的每一列只与主键直接相关而不是间接相关,(表中的每一列只能依赖
于主键)
做题
判断关系模式所属范式(1NF, 2NF, 3NF, BCNF)
从文字描述中抽象出函数依赖
候选键、Fc、3NF分解;
无损分解、函数依赖保持分解
File Organization//文件组织
Fixed-Length Records和variable-length records
定长记录,变长记录
Organization of Records in Files//文件中记录的组织
Heap File Organization,堆文件 Clustering File Organization簇文件
Sequential File Organization,顺序文件 Hashing File Organization,散列文件
Data-Dictionary //数据字典
Relational schemas and other metadata(元数据)about relations are stored in a
structure called the data dictionary
Content内容(基本):
Names of the relations.关系的名字
Names of the attributes of each relation.关系中的属性名
Domains and lengths of attributes.属性的域和长度
Names of views defined on the database, and definitions of those views.
视图名和视图定义
Integrity constraints (for example, key constraints).完整性约束
聚集/非聚集索引,稠密/稀疏索引
Dense 稠密,每个搜索码值都有索引项
sparse indices 稀疏,部分搜索码值有索引项,只有聚集索引才能稀疏
primary=clustering 聚集索引
包含记录的文件按照某个搜索码的顺序排列
secondary indices=non-clustering indices 非聚集索引
搜索码中的指定顺序与文件记录的顺序不同
An attribute or set of attributes used to look up records in a file is called a search key.
做题
能够正确设计、区分不同索引类型;
数据文件组织/单层索引
多层聚集索引/多层非聚集索引结构
Query Processing 查询处理
三步:
parsing and translation,语法分析与翻译
optimizing,优化
evaluation 执行
Measures of query cost 查询代价的度量
factors in query costs//3个影响查询的因素:
disk accesses 磁盘存取
CPU times to execute a query 执行一个查询所用CPU时间
communication costs 通信代价
Query Optimization
两种执行策略/方法:
Cost-Based Optimization 基于代价
Heuristics in Optimization 启发式查询优化
关系代数等价变换规则(感觉是基于做题吧)
做题
SQL语句初始查询树优化查询树
(注意利用选择下移、投影下移,简化查询)
Transactions 事务
定义:A transaction is a unit of program execution that accesses and possibly
updates various data items.
组成:A transaction is delimited by statements (or function calls) of the form begin
transaction and end transaction. //用这两个语句界定
生命周期:The transaction consists of all operations executed between the begin
transaction and end transaction.
事务由高级数据操纵语言(SQL)或编程语言通过JDBC或ODBC嵌入式数据库访问书
写的用户程序的执行所引起。
ACID特性:p357
Atomicity consistency durability isolation
原子性,一致性,持久性,隔离性
ACID特性的保障机制
recovery system(恢复系统)保障Atomicity,durability原子性和持久性
concurrency-control system(并发控制系统)保障isolation隔离性
States of transactions
active, partially committed, failed, aborted, committed.
partially committed
部分提交
Committed
提交
Active
活动
Failed
失败
Aborted
中止
各状态下事务所做工作 p360
事务的串行、并行执行(都是理解性概念)
冲突操作
写读,读写,写写
可串行化调度(事务调度可串行化)
That is, the schedule should, in some sense, be equivalent to a serial schedule. Such
schedules are called serializable schedules.
冲突等价
If a schedule S can be transformed into a schedule S1 by a series of swaps of
nonconflicting instructions, we say that S and S1 are conflict equivalent.
冲突可串行化
We say that a schedule S is conflict serializable if it is conflict equivalent to a serial
schedule.
做题
Precedence graph 前驱图判断是否可串行化,
recoverable schedule 可恢复调度,cascadeless schedule 无级联调度的判断
Concurrency Control
The Two-Phase Locking Protocol
Growing phase. A transaction may obtain locks, but may not release any
lock. Shrinking phase. A transaction may release locks, but may not obtain any
new locks.
一旦进入缩减阶段,不能再申请新锁。
strict two-phase locking protocol 严格
This protocol requires not only that locking be two phase, but also that all
exclusive-mode locks taken by a transaction be held until that transaction commits.
事务持有的排他锁在事务提交后方可释放。
rigorous two-phase locking protocol 强制
all locks be held until the transaction commits.
事务提交之前不得释放任何锁
构造2PL
multiple-granularity locking protocol多粒度锁协议:
在不同粒度的数据对象上(DB, table, tuple, attribute etc.)
施加S(共享)、X(排他)、IS(共享型意向)、IX(排他性意向)、
SIX(共享排他型意向锁)
Recovery System
Three types of failures may occur in DBS:
Transaction failure 事务故障
logical errors: transaction cannot complete due to some internal error condition,
比如:输入错误,找不到数据,溢出,超出资源限制.
system errors: system has entered an undesirable state (for example,deadlock)
比如:死锁
System crash 系统崩溃
硬件或数据库软件,操作系统的漏洞,使易失性存储器内容丢失,非易失性存储器
完好无损。
Disk (storage medium failure) 介质故障
数据传输过程中内容丢失。
Data Access p405
Input(B)传送物理块至主存
Output(B)传送缓冲块至磁盘,并替换磁盘上的物理块
Write(X) 若X所在块B不在主存中,需要发指令Input(B)
Read(X) 若X所在块B不在主存中,需要发指令Input(B)
Output 用于主存空间不足,或者希望将变化反应在磁盘上的时候
Log-Based recovery
Update log record:
事务Ti,数据项Xj,V1数据项写前值,V2数据项写后值
开始,提交,中止
日志放在稳定存储器中 stable storage
在数据库修改前建立该次写操作的日志记录并加到日志中。
做题
根据调度写日志
根据日志进行恢复(判断数据项的变化)
讲义例题