1
International University, VNU-HCMC
Lecture 2: Relational Model and Algebra
School of Computer Science and Engineering
Instructor: Nguyen Thi Thuy Loan
nttloan@hcmiu.edu.vn nthithuyloan@gmail.com,
https://nttloan.wordpress.com/
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
2
Duke CS, Fall 2021 2Duke CS, Fall 2021
Course Website
•Blackboard IU
•Please check frequently for updates!
2
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
3
Duke CS, Fall 2021 3Duke CS, Fall 2021
Outline
•Relati a delon l mo
•Relati ebraonal alg
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
4
Duke CS, Fall 2021 4Duke CS, Fall 2021
Acknowledgement
•The following slides are referenced from Dr. Sudeepa
Roy, Duke University.
3
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
5
Duke CS, Fall 2021 5Duke CS, Fall 2021
•Pilot in the Royal Air Forcein WW2
•Inventor of the relati a modon l el
an al b wh IBMd ge ra ileat
•Turing Award, 1981
RDBMS = Relati a D Son l BM
http /en.w ped .org Edgar_F_Co .jpg:/ iki ia /w /Fiki ile: dd
Edgar F. Codd (1923-2 3)00
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
6
Duke CS, Fall 2021 6Duke CS, Fall 2021
B sar
Each has an address
Drinkers
Each has an address
Beers
Each has a brewer
Drinkers Frequent B sar
“X” times a w kee
BarsServe Beers
At price “Y”
Drinkers Likes Beers
(Later in ER diagram – how to
design a rel ona a se)ati l d taba
The famous “Beers” database
4
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
7
S, Fall 2021
S ba fee online data se or more tuples
Bar
Beer
Drinker
Likes
Serves
bar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
S f onatis acti Budweiser 2.25
Name brewer
Budweiser Anheuser-Bu Insch c.
Corona Grupo Modelo
Dixie Dix ewingie Br
name address
Amy 100 W. Main Street
Ben 101 W. Main Street
Dan 300 N. Duke Street
n ame address
The Edge 108 Morris Street
S f oatis acti n 905 ain W. M Street
dri ernk bar times_a_week
Ben Satis actif on 2
Dan The Edge 1
Dan Satis actif on 2
dri ernk beer
Amy Corona
Dan Budweiser
Dan Corona
Ben Budweiser
Fr uentseq
“Beers” as a Relational Database
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
8
Duke CS, Fall 2021 8Duke CS, Fall 2021
•A database is a collection of rela tablestions(or )
•Ea rech lation has a set of attributes (o )r columns
•Ea ach ttribute has a n (oame and adomain r type)
•Set- tvalued attribu es are no owedt all
•Ea re ach lation cont ins a set” of tuple ss (or row )
•Each tuple has a value for each attribut elatione of the r
•Duplicat te uples are not allowed (Two tuples are duplicat tes if hey
agre tes)e on all attribu
•Or esdering of rows do n’t matter (even though output is
always in some order)
•Howeve o r, SQL supp rts bag
or duplicate tuple (w ?)s hy
Simpl viicity is a rtue!
Serves
bar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
S f onatis acti Budweiser 2.25
Relational data model
5
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
9
Duke CS, Fall 2021 9Duke CS, Fall 2021
•Ordering of rows doesn’t matter (even though output is
always in some order)
Beer
Serves
bar b eer p rice
The Edge Budweiser 2.50
The Edge Corona 3.00
S f onatis acti Budweiser 2.25
Name brewer
Budweiser Anheuser-Bu Insch c.
Corona Grupo Modelo
Dixie Dix ewingie Br
dri ernk bar times_a_week
Ben Satis actif on 2
Dan The Edge 1
Dan Satis actif on 2
Fr uentseq
Schem s st cea v . in an
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
10
Duke CS, Fall 2021 10Duke CS, Fall 2021
•Schema (metadata)
•Specifies the logical structure of data
•Is defined at setup time
•Rarely changes
•Instance
•Represents the data content
•Changes rapidly, but always conforms to the schema
Compare to types vs. collections of objects of these types in a
programming language
Schem s st cea v . in an
6
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
11
Duke CS, Fall 2021 11Duke CS, Fall 2021
Example
•Schema (metadata)
•Beer(name str str g)ing, brewer in
•Serves(bar str str ic float)ing, beer ing, pr e
•Frequents (drinker str str t)ing, bar ing, times in_a_week
•Instance
•Beer {<Budweiser, Anhe er-Bu In Modelo>,…}us sch c.>, <corona, Grupo
•Serves{<the Edge, Budweiser, 2.50>,<The Edge, Corona, 3.0>,..}
•Frequents {<Ben, Satisfaction,2>,<Dan, The Edge, 1>,…}
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
12
Duke CS, Fall 2021 12Duke CS, Fall 2021
Alanguage for querying relati tonal da a
based on “operators”
RelOp
RelOp
•Core operat s:or
•Sele oje osction, pr ction, cr s product, union, , difference
and renaming
•Additi alon , derived operators:
•Jo ain, n tural join, interse octi n, etc.
•Compose operators to make comp rlex que ies
Relational algebra
7
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
13
Duke CS, Fall 2021 13Duke CS, Fall 2021
•Input: a table 𝑅
•Nota oti n: 𝜎P𝑅
𝑝is c eall d aselection condition (orpr icateed )
•P ordurpose: filter rows acc ing to some criteria
•O sutput: ame colum t tns as 𝑅,but only rows of 𝑅 ha
satisfy 𝑝 (𝑠𝑒𝑡!)
Selection
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
14
Duke CS, Fall 2021 14Duke CS, Fall 2021
Serves 𝝈𝒑𝒓𝒊𝒄𝒆 < 𝟐.𝟕𝟓 Serves
Find 5beers with price < 2.7
bar beer price
The Edge Budw eiser 2.50
The Edge Corona 3.00
S f oatis acti n Budweiser 2.25
bar beer price
The Edge Budweiser 2.50
S f onatis acti Budweiser 2.25
Selection example
8
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
15
Duke CS, Fall 2021 15Duke CS, Fall 2021
•Selection c on can includeonditi any column of 𝑅, constants,
c paom rison ( lean co=,≤, etc.) and Boo nnectives (: and, :
or, ¬: not)
•Example: Serves tuples for “Th e” or pric 5e Edg e >= 2.7
𝜎bar=&‘The&Edge’&&price&³2.75𝑆𝑒𝑟𝑣𝑒𝑠
•You must be able to evaluate the condition over each single
rowof the input table!
•E mos sixample: the t expen ve beer at any bar
𝜎price&³every&price&in&Servers 𝑈𝑠𝑒𝑟
WRONG!
Serves
bar b
eer
p
rice
The Edge Budweiser 2.50
The Edge Corona 3.00
S f onatis acti Budweiser 2.25
Mo sre on election
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
16
Duke CS, Fall 2021 16Duke CS, Fall 2021
•Input: a table 𝑅
•N atiot on: 𝜋L𝑅
𝐿is a li lust of co mns in 𝑅
•P po uur se: outp t cho cosen lumns
•Output: same rows, but onlythecolumnsin 𝐿 (𝑠𝑒𝑡!)
Projection
9
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
17
Duke CS, Fall 2021 17Duke CS, Fall 2021
Serves 𝝅𝒃𝒆𝒆𝒓,𝒑𝒓𝒊𝒄𝒆 Serves
Example: Find all the prices for each rbee
Output of 𝜋beerServes?
bar beer price
The Edge Budweiser 2.50
The Edge Corona 3.00
S f onatis acti Budweiser 2.25
beer p rice
Budweiser 2.50
Corona 3.00
Budweiser 2.25
Projection
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
18
Duke CS, Fall 2021 18Duke CS, Fall 2021
More on Projection
•Duplicate output rows are removed (by definition)
•Example: beer on servers
Serves
bar beer p rice
The Edge Budweiser 2.50
The Edge Corona 3.00
S f onatis acti Budweiser 2.25
𝝅𝒃𝒆𝒆𝒓 Serves
beer
Budweiser
Corona
Budweiser
10
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
19
Duke CS, Fall 2021 19Duke CS, Fall 2021
•Input: two tables 𝑅and 𝑆
•Natation: 𝑅×𝑆
•Purpose: pairs rows from two tables
•Output: for ea roch w 𝑟in 𝑅and each 𝑠in 𝑆, output
a row 𝑟𝑠 (concatenation of 𝑟and )𝑠
Cross product
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
20
Duke CS, Fall 2021 20Duke CS, Fall 2021
Bar Frequents
name address
The Edge 108 Morris Street
Satis actionf 905 ain W. M Street
drinker bar times_a_week
Ben Satisf action 2
Dan The Edge 1
Dan Satisf action 2
n ame address drink er bar t imes_a_
w eek
The Edge 10 Mo Str8 rris eet Ben Satis actif o n 2
The Edge 10 Mo Str8 rris eet Dan The Edge 1
The Edge 10 Mo Str8 rris eet Dan Satis actif o n 2
Satis actif on 905 W. Main Street Ben Satis actif o n 2
Satis actif on 905 W. Main Street Dan The Edge 1
Satis actif on 905 W. Main Street Dan Satis actif o n 2
Bar x Fr sequent
Cross product
11
International University, VNU-HCMC
Assoc. Pr hD
21
Duke CS, Fall 2021 21Duke CS, Fall 2021
Cross product
•Ord ier ng of columns is unimportant as far as
contents are concerned.
So cross product is commutative, i.e., for any R and
S, R X S = S X R (up to the ordering of columns)
dr ink er bar
time
s_a_
w
eek
n ame address
Ben Satis actif o n 2 The Edge
10 Mo Str8 rris eet
Dan The Edge 1 The Edge
10 Mo Str8 rris eet
Dan Satis actif o n 2 The Edge
10 Mo Str8 rris eet
Ben Satis actif o n 2 Satis actif o n 905 W. Main Str eet
Dan The Edge 1 Satis actif o n 905 W. Main Str eet
Dan Satis actif o n 2 Satis actif o n 905 W. Main Str eet
n ame address dr ink
er
bar
t im
es_
a_w
eek
Th e Edge
10 Mo Str8 rris eet
Ben Satis actif on 2
Th e Edge
10 Mo Str8 rris eet
Dan Th e Edge 1
Th e Edge
10 Mo Str8 rris eet
Dan S atis actif on 2
Satis actif on 90 5 W. Main Str eet Ben S atis actif on 2
Satis actif on 90 5 W. Main Str eet Dan Th e Edge 1
Satis actif on 90 5 W. Main Str eet Dan S atis actif on 2
=
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
22
Duke CS, Fall 2021 22Duke CS, Fall 2021
(Also known as th tae -join”: most general joins)
•Input: two tables 𝑅and𝑆
•Notation:𝑅P𝑆
𝑝is called a join condition (or p ic ered at )
•Pu la orrpose: re te rows from two tables acc ding
t crit iao some er
•Ou u fo a ro a eatp t: r e ch w 𝑟in 𝑅 nd ch row 𝑠in ,𝑆
output a row 𝑟𝑠 if 𝑟a tnd 𝑠sa isfy 𝑝
•Shorthand for 𝜎P(R×𝑆)
•Predicate p only has equality (A = 5 B = 7) : equijoin
One of themost important
opera s!tion
De vri ed operator:join
12
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
23
•Exten Frequed nts relation with addre barssses of the
𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠 bar%=%name 𝐵𝑎𝑟
Bar Fr sequent
Ambiguous attribute? Prefix a column reference with table name and “.” to
disambiguate identically named columns from different tables. Ex. Use Bar. amn e
name address
The Edge
108 Morris Street
Satisfaction
90 5 W. Main Street
dr inker bar times_a_week
Ben Satis actif on 2
Dan The Edge 1
Dan Satis actif on 2
name address dri ernk bar t imes_a_week
The Edge 108 Morris Street Ben
Satisfaction
2
The Edge 108 Morris Street Dan The Edge 1
The Edge 108 Morris Street Dan Satis actif on 2
Satis actif on 905 W. Main Street Ben Satis actif on 2
Satis actif on 905 W. Main Street Dan The Edge 1
Satis actif on 905 W. Main Street Dan Satis actif on 2
Join example
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
24
Duke CS, Fall 2021 24Duke CS, Fall 2021
•Theta Join
•Equi-Join
•Natural Join
•Lat ( t/er, lef right) outer join, semi-join
Jo Ty sin pe
13
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
25
Duke CS, Fall 2021 25Duke CS, Fall 2021
•Input: two tables 𝑅and 𝑆
•N atiot on: 𝑅𝑆(i.e bsc ). no su ript
•P pour se: relate rows from two tabl des, an
•Enfor equ be a n umce ality tween identic lly amed col ns
•Eliminate one copy of identica n umlly amed col ns
•Shorthand for 𝜋 (L R P𝑆) , where
𝑝equates each pair of columns commonto 𝑅and 𝑆
𝐿is theunion of column names from 𝑅and 𝑆(with
duplicate columns re vemo d)
De vri ed operator:natural join
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
26
Duke CS, Fall 2021 26Duke CS, Fall 2021
Serves 𝐿𝑖𝑘𝑒𝑠
=𝜋?(𝑆𝑒𝑟𝑣𝑒𝑠 ? 𝐿𝑖𝑘𝑒𝑠)
=𝜋bar, beer, price, drinker (𝑆𝑒𝑟𝑣𝑒𝑠 Serves.beer = Likes.beer 𝐿𝑖𝑘 𝑒𝑠)
Serves Likes
Natu Onl onral Join is on beer. y e
column forbeer in th oute put
Wha at h ppens if t tablhe es
h twave o or more common columns?
bar bee r price
The Edge Budweiser 2.50
The Edge Corona 3.00
Satisfaction Budweiser 2.25
drinker beer
Amy Corona
Dan Budweiser
Dan Corona
Ben Budweiser
bar bee r price dr rinke
The Edge Budweiser 2.5 0 Dan
The Edge Budweiser 2.5 0 Ben
The Edge Corona 3 .00 Amy
The Edge Corona 3 .00 Dan
... …. ..
Serves 𝐿𝑖𝑘𝑒𝑠
Natural join example
14
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
27
Duke CS, Fall 2021 27Duke CS, Fall 2021
•Input: two tables 𝑅and 𝑆
•N atiot on: 𝑅𝑆
𝑅and 𝑆must have identical schema
•Ou utp t:
•Hasthesame schema as 𝑅and 𝑆
•Conta al r atins l ows in 𝑅and all rows in 𝑆(with duplic e
rows removed)
Example on board
Important for set o tiopera ns:
U pat litynion Com ibi
Union
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
28
Duke CS, Fall 2021 28Duke CS, Fall 2021
•Input: two tables 𝑅and 𝑆
•N atiot on: 𝑅−:𝑆
𝑅and 𝑆must have identical schema
•Ou utp t:
•Hasthesame schema as 𝑅and 𝑆
•Conta al r tins l ows in 𝑅 hat are notin 𝑆
Example on board
Important for set o tiopera ns:
U pat litynion Com ibi
Diffe ceren
15
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
29
Duke CS, Fall 2021 29Duke CS, Fall 2021
•Input: two tables 𝑅and 𝑆
•N atiot on: 𝑅𝑆
𝑅and 𝑆must have identical schema
•Ou utp t:
•Hasthesame schema as 𝑅and 𝑆
•Conta al r boins l ows that are in th 𝑅and 𝑆
•How can you write it using other ope to ?ra rs
•Shorthand for 𝑅− ( 𝑅 𝑆)
•Also eq vaui lent t )o 𝑆− ( 𝑆 𝑅
•And to 𝑅 𝑆
Important for set o tiopera ns:
U pat litynion Com ibi
De vri ed operator:intersect nio
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
30
Duke CS, Fall 2021 30Duke CS, Fall 2021
•I tabnput: a le 𝑅
•Nota oti n: 𝜌S𝑅, 𝜌(A1,'A2 ,…)𝑅, or 𝜌S(A1,A 2 ,…)
R
•P and columurpose: “rename” a table /or its ns
•O tabutput: a le with the same rows as 𝑅,but called
differe lynt
•Used to
•Avoid nfusion caused co by identical column names
•Creat tice iden al column na jomes for natural ins
•As with all other relational operators, it doesn’t modify the
database
•Think of the r e enamed table as a copy of th original
Renam gin
16
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
31
Duke CS, Fall 2021 31Duke CS, Fall 2021
Fr ntseque
•Find drinke ors wh
freque o hnt b th “T e
Edge” and “Satis faction
𝜋drinker 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠 Bar=%‘The%Edge%
Bar%=%‘Satisfaction%
drinker%=%drinker
𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠
WRONG!
𝜋uid1
𝜌d1,%b1,%t1 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠
b1%=%‘The%Edge’b2%=%Satisfaction’%%d1=d2
𝜌d2,%b2,%t2%%𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠
Rename!
dri ernk bar times_a_week
Ben Satis actif on 2
Dan The Edge 1
Dan Satis actif on 2
Renaming example
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
32
Duke CS, Fall 2021
𝐵𝑎𝑟 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠
𝜋address
bar=name
𝜎drinker&=&‘Dan’
•Find addresses of all bars that
‘Dan’ frequents
Fr tsequen
𝜋address(𝐵𝑎𝑟 bar&=&name&(𝜎drinker&=‘Dan𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠))
Equivalent to
What if you move 𝜎to the top?
Still correct?
More or less efficient?
name address
The Edge 108 Morris Street
S f onatis acti 905 W. Main Street
drinker bar t imes_a_week
Ben Satis actif on 2
Dan The Edge 1
Dan Satis actif on 2
Bar Also called logical Plan tree
Expression tree notation
17
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
33
Duke CS, Fall 2021 33Duke CS, Fall 2021
•Sele ict on:𝜎
P
𝑅
•Projection:𝜋
L
𝑅
•Cro pross duct:𝑅×𝑆
•Union: 𝑅 𝑆
•Difference: 𝑅−:𝑆
•Renaming: 𝜌S A1
, A2,… 𝑅
•Does not really add “processing” power
Su cmmary of ore operators
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
34
Duke CS, Fall 2021 34Duke CS, Fall 2021
•Join: 𝑅
P
𝑆
•Natural join: 𝑅 𝑆
•Inte ctirse on:𝑅 𝑆
•Many more
•Semijoin, anti-semijoin, quotient, …
Su demmary of rived operators
18
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
35
Duke CS, Fall 2021
•Bars that drinkers N. Duke Strin address “300 eet”
do notfrequent
Frequents(drinker, bar, times_o kf_wee )
Bar(na a ress)me, dd
Drink (naer me, address)
Exercise
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
36
Duke CS, Fall 2021
•Fo ea ba f ke freque mer ch r, indthe drin rs who nt it max no. ti s a
w kee
41
Frequents(drinker, bar, times_o kf_wee )
Bar(na a ress)me, dd
Drink (naer me, address)
A trickier Exercise
19
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
37
Duke CS, Fall 2021 37Duke CS, Fall 2021
Expressions in a Single Assignment
•Example: the theta-join R3 = R1 CR2 can be
written: R3 := σC(R1 R2)Χ
•Precedence of relational operators:
1. [σ, , ] (highest)π ρ
2. [Χ, ]
3.
4. [ —],
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
38
Duke CS, Fall 2021
Find names of sailors who’ve reserved boat #103
Sailors(sid, sname, rating, age)
Boats(bid, bname, color)
Reserves(sid, bid, day)
20
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
39
Duke CS, Fall 2021 39Duke CS, Fall 2021
Find sailors who’ve reserved red a or a green
boat
•Can identify all red or green boats, then find sailors
who’ve reserved one of these boats:
Can also define Te m pb o at s using union.Tr y the “AND”
version yourself
Use of rename operation
Sailors(sid, sname, rating, age)
Boats(bid, bname, color)
Reserves(sid, bid, day)
International University, VNU-HCMC
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
40
Duke CS, Fall 2021 40Duke CS, Fall 2021
•Not supported as aprimitive operator, but useful for
expressing queries like:
Find sailors who have reserved
all
boats.
•Let A have have2fields, xand y;B only field y:
i.e., A/B contains all x tuples (sailors) such that for
every y tuple (boat) in B, there is an xy tuple in A.
Or:If the set of yvalues (boats) associated with an xvalue
(sailor) in Acontains all yvalues in B,the xvalue is in A/B.
A B/ = p
x(A)-
p
x ((
p
x ( )A ´B)-A)
Division

Preview text:

International University, VNU-HCMC
School of Computer Science and Engineering
Lecture 2: Relational Model and Algebra
Instructor: Nguyen Thi Thuy Loan
nttloan@hcmiu.edu.vn nthithuyloan@gmail , .com https://nttloan.wordpress.com/
International University, VNU-HCMC Course Website •Blackboard IU
•Please check frequently for updates!
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 2 2 1
International University, VNU-HCMC Outline •Relational model •Relational algebra
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 3 3
International University, VNU-HCMC Acknowledgement
•The following slides are referenced from Dr. Sudeepa Roy, Duke University.
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 4 4 2
International University, VNU-HCMC Edgar F. Codd (1923-2 3) 00
•Pilot in the Royal Air Forcein WW2 •Inventor of the relati a on l model an al d gebra whileat IBM •Turing Award, 1981 RDBMS = Relati a on l DBMS
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
http://en.wikipedia.org/wiki/File:Edgar_F_Co .jpg dd Duke CS, Fall 2021 5 5
International University, VNU-HCMC
The famous “Beers” database Bars Drinkers Frequent Bars Each has an address “X” times a week BarsServe Beers At price “Y” Drinkers Likes Beers Drinkers Beers
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Each has an address
(Later in ER diagram – how to Each has a brewer design a relational database) Duke CS, Fall 2021 6 6 3
International University, VNU-HCMC
See online database for more tuples
“Beers” as a Relational Database Bar Serves bar beer price n ame address The Edge Budweiser 2.50 The Edge 108 Morris Street The Edge Corona 3.00 S f atis o acti n 905 W. Main Street Satisfaction Budweiser 2.25 Beer dri er nk bar times_a_week Name brewer Ben Satisfaction 2 Budweiser Anheuser-Busch Inc. Dan The Edge 1 Corona Grupo Modelo Dan Satisfaction 2 Dixie Dixie ewing Br Frequents Drinker dri er nk beer name address Amy Corona Amy 100 W. Main Street Dan Budweiser Ben 101 W. Main Street
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Dan Corona Likes Dan 300 N. Duke Street Ben Budweiser S, Fall 2021 7
International University, VNU-HCMC Relational data model
•A database is a collection of relations(or tables)
•Each relation has a set of attributes (or columns)
•Each attribute has a name and adomain (or type)
•Set-valued attributes are not allowed
•Each relation contains a “set” of tuples (or rows)
•Each tuple has a value for each attribute of the relation
•Duplicate tuples are not allowed (Two tuples are duplicates if they agree on all attributes) •Ordering of rows es
do n’t matter (even though output is always in some order) Serves bar beer price •However, SQL s o upp rts “bag” The Edge Budweiser 2.50 or duplicate tuples (w ?) hy The Edge Corona 3.00 Simplicity is a virtue!
Assoc. Prof. Nguyen Thi Thuy Loan, PhD S f atis on acti Budweiser 2.25 Duke CS, Fall 2021 8 8 4
International University, VNU-HCMC Schema vs. instance Serves bar b eer p rice The Edge Budweiser 2.50 The Edge Corona 3.00 Satisfaction Budweiser 2.25 Beer Frequents Name brewer dri er nk bar times_a_week Budweiser Anheuser-Busch Inc. Ben Satisfaction 2 Corona Grupo Modelo Dan The Edge 1 Dixie Dixie ewing Br Dan Satisfaction 2
•Ordering of rows doesn’t matter (even though output is
Assoc. Prof. Nguyen Thi Thuy Loan, PhD always in some order) Duke CS, Fall 2021 9 9
International University, VNU-HCMC Schema vs. instance •Schema (metadata)
•Specifies the logical structure of data •Is defined at setup time •Rarely changes •Instance •Represents the data content
•Changes rapidly, but always conforms to the schema
Compare to types vs. collections of objects of these types in a programming language
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 10 10 5
International University, VNU-HCMC Example •Schema (metadata)
•Beer(name string, brewerstring)
•Serves(bar string, beerstring, price float)
•Frequents (drinker string, bar string, times_a_weekint) •Instance •Beer {, ,…} •Serves{,,..} •Frequents {,,…}
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 1 1 1 1
International University, VNU-HCMC Relational algebra
Alanguage for querying relational data based on “operators” RelOp RelOp •Core operators:
•Selection, projection, cross product, union, difference, and renaming •Additi al on , derived operators:
•Join, natural join, intersection, etc.
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
•Compose operators to make complex queries Duke CS, Fall 2021 12 12 6
International University, VNU-HCMC Selection •Input: a table 𝑅 •Notation: 𝜎P𝑅
•𝑝is called aselection condition (orpredicate)
•Purpose: filter rows according to some criteria
•Output: same columns as 𝑅,but only rows of 𝑅that satisfy 𝑝 (𝑠𝑒𝑡!)
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 13 13
International University, VNU-HCMC Selection example Find beers with price < 5 2.7
Serves 𝝈𝒑𝒓𝒊𝒄𝒆 < 𝟐.𝟕𝟓 Serves bar beer price bar beer price The Edge Budweiser 2.50 The Edge Budweiser 2.50 The Edge Corona 3.00 Satisfactio n Budweiser 2.25 S f atis on acti Budweiser 2.25
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 14 14 7
International University, VNU-HCMC More on selection
•Selection condition can include any column of 𝑅, constants,
comparison (=,≤, etc.) and B lean oo connectives (∧: and, : ∨ or, ¬: not)
•Example: Serves tuples for “The Edge” or price >= 5 2.7
𝜎bar=&‘The&Edge’&∨&price&³2.75𝑆𝑒𝑟𝑣𝑒𝑠
•You must be able to evaluate the condition over each single rowof the input table!
•Example: the most expensive beer at any bar
𝜎price&³every&price&in&Servers 𝑈𝑠𝑒𝑟 WRONG! Serves bar beer p rice The Edge Budweiser 2.50 The Edge Corona 3.00
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Satisfaction Budweiser 2.25 Duke CS, Fall 2021 15 15
International University, VNU-HCMC Projection •Input: a table 𝑅 •Notation: 𝜋L𝑅
•𝐿is a list of columns in 𝑅
•Purpose: output chosen columns
•Output: same rows, but onlythecolumnsin 𝐿 (𝑠𝑒𝑡!)
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 16 16 8
International University, VNU-HCMC Projection
Example: Find all the prices for each beer
Serves 𝝅𝒃𝒆𝒆𝒓,𝒑𝒓𝒊𝒄𝒆 Serves bar beer price The Edge Budweiser 2.50 beer p rice The Edge Corona 3.00 Budweiser 2.50 Satisfaction Budweiser 2.25 Corona 3.00 Budweiser 2.25 Output of 𝜋beerServes?
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 17 17
International University, VNU-HCMC More on Projection
•Duplicate output rows are removed (by definition) •Example: beer on servers 𝝅 Serves 𝒃𝒆𝒆𝒓 Serves bar beer p rice beer The Edge Budweiser 2.50 Budweiser The Edge Corona 3.00 Corona S f atis on acti Budweiser 2.25 Budweiser
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 18 18 9
International University, VNU-HCMC Cross product
•Input: two tables 𝑅and 𝑆 •Natation: 𝑅×𝑆
•Purpose: pairs rows from two tables
•Output: for each row 𝑟in 𝑅and each 𝑠in 𝑆, output
a row 𝑟𝑠 (concatenation of 𝑟and 𝑠)
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 19 19
International University, VNU-HCMC Cross product Bar Frequents name address drinker bar times_a_week The Edge 108 Morris Street Ben Satisf action 2 Dan The Edge 1 Satisfaction 905 W. Main Street Dan Satisf action 2 Bar x Frequents n ame address drink er bar t imes_a_ w eek The Edge 108 Morris Street Ben Satisfactio n 2 The Edge 108 Morris Street Dan The Edge 1 The Edge 108 Morris Street Dan Satisfactio n 2 Satisfaction 905 W. Main Street Ben Satisfactio n 2 Satisfaction 905 W. Main Street Dan The Edge 1 Satisfaction 905 W. Main Street Dan Satisfactio n 2
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 20 20 10
International University, VNU-HCMC Cross product
•Ordering of columns is unimportant as far as contents are concerned. n ame address dr ink bar t im dr inker bar time n ame address er es_ s_a_ a_w w eek eek Th e Edge 10 Mo 8 rris Street Ben S atisfaction 2 Ben Satisfactio n 2 The Edge 10 Mo 8 rris Street Th e Edge 10 Mo 8 rris Street Dan The Edge 1 Dan The Edge 1 The Edge 10 Mo 8 rris Street Th e Edge 10 Mo 8 rris Street Dan S atisfaction 2 = Dan Satisfactio n 2 The Edge 10 Mo 8 rris Street Satisfaction 90 5 W. Main Street Ben S atisfaction 2 Ben Satisfactio n 2
Satisfactio n 905 W. Main Str eet Satisfaction 90 5 W. Main Street Dan The Edge 1 Dan The Edge 1
Satisfactio n 905 W. Main Str eet Satisfaction 90 5 W. Main Street Dan S atisfaction 2 Dan Satisfactio n 2
Satisfactio n 905 W. Main Str eet
• So cross product is commutative, i.e., for any R and Assoc. Pr hD
S, R X S = S X R (up to the ordering of columns) Duke CS, Fall 2021 21 21
International University, VNU-HCMC Derived operator:join
(Also known as “theta-join”: most general joins) One of themost important
•Input: two tables 𝑅and𝑆 operati s! on •Notation:𝑅⋈P𝑆
•𝑝is called a join condition (or predicate)
•Purpose: relate rows from two tables according to some criteria
•Output: for each row 𝑟in 𝑅and each row 𝑠in 𝑆,
output a row 𝑟𝑠 if 𝑟and 𝑠satisfy 𝑝
•Shorthand for 𝜎P(R×𝑆) Assoc. Prof. Nguyen •Predicate Thi p Thuy Loan, PhD
only has equality (A = 5 ∧B = 7) : equijoin Duke CS, Fall 2021 22 22 11
International University, VNU-HCMC Join example •Exten Freque d
nts relation with addresses of the bars
𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠 ⋈bar%=%name 𝐵𝑎𝑟
Ambiguous attribute? Prefix a column reference with table name and “.” to
disambiguate identically named columns from different tables. Ex. Use Bar. am n e Bar Frequents name address dr inker bar times_a_week The Edge 108 Morris Street Ben Satisfaction 2
Satisfaction 90 5 W. Main Street Dan The Edge 1 Dan Satisfaction 2 name address dri er nk bar times_a_week The Edge 108 Morris Street Ben Satisfaction 2 The Edge 108 Morris Street Dan The Edge 1 The Edge 108 Morris Street Dan Satisfaction 2 Satisfaction 905 W. Main Street Ben Satisfaction 2 Assoc. Prof. Nguyen Satisfacti Thi on Thuy Loan, PhD 905 W. Main Street Dan The Edge 1 Satisfaction 905 W. Main Street Dan Satisfaction 2 23
International University, VNU-HCMC Join Types •Theta Join •Equi-Join •Natural Join •Later, ( t/
lef right) outer join, semi-join
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 24 24 12
International University, VNU-HCMC Derived operator:natural join
•Input: two tables 𝑅and 𝑆
•Notation: 𝑅⋈𝑆(i.e. no subscript)
•Purpose: relate rows from two tables, and
•Enforce equality between identically named columns
•Eliminate one copy of identically named columns
•Shorthand for 𝜋L( R ⋈P𝑆) , where
•𝑝equates each pair of columns commonto 𝑅and 𝑆
•𝐿is theunion of column names from 𝑅and 𝑆(with duplicate columns removed)
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 25 25
International University, VNU-HCMC Natural join example
Serves ⋈ 𝐿𝑖𝑘𝑒𝑠
=𝜋?(𝑆𝑒𝑟𝑣𝑒𝑠 ⋈? 𝐿𝑖𝑘𝑒𝑠)
=𝜋bar, beer, price, drinker (𝑆𝑒𝑟𝑣𝑒𝑠 ⋈Serves.beer = Likes.beer 𝐿𝑖𝑘𝑒𝑠) Serves Likes bar bee r price drinker beer The Edge Budweiser 2.50 Amy Corona The Edge Corona 3.00 Dan Budweiser Satisfaction Budweiser 2.25 Dan Corona Ben Budweiser
Serves ⋈ 𝐿𝑖𝑘𝑒𝑠 bar bee r price drinker The Edge Budweiser 2.5 0 Dan Natural Join is on beer Onl . on y e The Edge Budweiser 2.5 0 Ben column forbeer in the output The Edge Corona 3 .00 Amy What happens if the tables The Edge Corona 3 .00 Dan
have two or more common columns? ... …. …..
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 26 26 13
International University, VNU-HCMC Union
•Input: two tables 𝑅and 𝑆 Important for set operations: •Notation: Union Compatibility 𝑅∪𝑆
•𝑅and 𝑆must have identical schema •Output:
•Hasthesame schema as 𝑅and 𝑆
•Contains all rows in 𝑅and all rows in 𝑆(with duplicate rows removed) Example on board
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 27 27
International University, VNU-HCMC Difference
•Input: two tables 𝑅and 𝑆 Important for set operations: •Notation: 𝑅−:𝑆 Union Compatibility
•𝑅and 𝑆must have identical schema •Output:
•Hasthesame schema as 𝑅and 𝑆
•Contains all rows in 𝑅that are notin 𝑆 Example on board
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 28 28 14
International University, VNU-HCMC Derived operator:intersection
•Input: two tables 𝑅and 𝑆 Important for set operations: •Notation: 𝑅∩𝑆 Union Compatibility
•𝑅and 𝑆must have identical schema •Output:
•Hasthesame schema as 𝑅and 𝑆
•Contains all rows that are in both 𝑅and 𝑆
•How can you write it using other operators?
•Shorthand for 𝑅− (𝑅− 𝑆)
•Also equivalent to 𝑆− (𝑆− 𝑅) •And to 𝑅 ⋈ 𝑆
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 29 29
International University, VNU-HCMC Renaming •Input: a table 𝑅
•Notation: 𝜌S𝑅, 𝜌(A1,'A2 ,…)𝑅, or 𝜌S(A1,A2 ,…) R
•Purpose: “rename” a table and/or its columns
•Output: a table with the same rows as 𝑅,but called differently •Used to
•Avoid confusion caused by identical column names
•Create identical column names for natural joins
•As with all other relational operators, it doesn’t modify the database
•Think of the renamed table as a copy of the original
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 30 30 15
International University, VNU-HCMC Renaming example Frequents •Find drinkers who dri e nk r bar times_a_week frequent both “The Ben Satisfaction 2 Edge” and “Satisfacti ” on Dan The Edge 1 Dan Satisfaction 2 WRONG!
𝜋drinker 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠 ⋈ Bar=%‘The%Edge%∧
𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠 Bar%=%‘Satisfaction’∧% drinker%=%drinker Rename!
𝜌d1,%b1,%t1 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠 𝜋uid1
⋈b1%=%‘The%Edge’∧b2%=%Satisfaction’%∧%d1=d2
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
𝜌d2,%b2,%t2%%𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠 Duke CS, Fall 2021 31 31
International University, VNU-HCMC Expression tree notation
•Find addresses of all bars that
What if you move 𝜎to the top? Still correct? ‘Dan’ frequents More or less efficient?
Bar Also called logical Plan tree name address The Edge 108 Morris Street 𝜋address Satisfaction 905 W. Main Street ⋈bar=name Frequents drinker bar t imes_a_week
𝜎drinker&=&‘Dan’ Ben Satisfaction 2 Dan The Edge 1
𝐵𝑎𝑟 𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠 Dan Satisfaction 2 Equivalent to
Assoc. Prof. Nguyen Thi Thuy Loan, PhD
𝜋address(𝐵𝑎𝑟 ⋈bar&=&name&(𝜎drinker&=‘Dan’𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑡𝑠)) Duke CS, Fall 2021 32 16
International University, VNU-HCMC Summary of core operators •Selection:𝜎 P𝑅 •Projection:𝜋 L𝑅 •Cross product:𝑅×𝑆 •Union: 𝑅 ∪ 𝑆 •Difference: 𝑅−:𝑆
•Renaming: 𝜌S A1 , A2,… 𝑅
•Does not really add “processing” power
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 33 33
International University, VNU-HCMC Summary of derived operators •Join: 𝑅 ⋈ P 𝑆 •Natural join: 𝑅 ⋈ 𝑆 •Intersection:𝑅 ∩ 𝑆 •Many more
•Semijoin, anti-semijoin, quotient, …
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 34 34 17
International University, VNU-HCMC
Frequents(drinker, bar, times_of_week) Bar(name, address) Drinker(name, address) Exercise
•Bars that drinkers in address “30 N. Duk 0 e Street” do notfrequent
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 35
International University, VNU-HCMC 41
Frequents(drinker, bar, times_of_week) Bar(name, address) Drinker(name, address) A trickier Exercise
•For each bar, findthe drinkers who frequent it max no. times a week
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 36 18
International University, VNU-HCMC
Expressions in a Single Assignment
•Example: the theta-join R3 = R1 ⋈CR2 can be written: R3 := σC(R1 Χ R2)
•Precedence of relational operators: 1. [σ, π, ρ] (highest) 2. [Χ, ⋈] 3. ∩ 4. [∪ —] ,
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 37 37
International University, VNU-HCMC
Find names of sailors who’ve reserved boat #103
Sailors(sid, sname, rating, age) Boats(bid, bname, color) Reserves(sid, bid, day)
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 38 19
International University, VNU-HCMC
Find sailors who’ve reserved a red or a green boat
Sailors(sid, sname, rating, age) Boats(bid, bname, color) Use of rename operation Reserves(sid, bid, day)
•Can identify all red or green boats, then find sailors
who’ve reserved one of these boats:
Can also define Te m p b o a t s using union.Tr y the “AND” version yourself
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 39 39
International University, VNU-HCMC Division
•Not supported as aprimitive operator, but useful for expressing queries like:
Find sailors who have reserved all boats.
•Let A have 2fields, xand y;Bhave only field y:
A/B = p x(A)- p x ((p x (A)´B)-A) –i.e.,
A/B contains all x tuples (sailors) such that for
every y tuple (boat) in B, there is an xy tuple in A.
–Or:If the set of yvalues (boats) associated with an xvalue
(sailor) in Acontains all yvalues in B,the xvalue is in A/B.
Assoc. Prof. Nguyen Thi Thuy Loan, PhD Duke CS, Fall 2021 40 40 20