BT thiết kế và quản trị cơ sở dữ liệu_Assignment| Môn thiết kế và quản trị cơ sở dữ liệu| Trường Đại học Bách Khoa Hà Nội

BT thiết kế và quản trị cơ sở dữ liệu_Assignment| Môn thiết kế và quản trị cơ sở dữ liệu| Trường Đại học Bách Khoa Hà Nội. Tài liệu gồm 9 trang giúp bạn đọc ôn tập và đạt kết quả cao trong kỳ thi sắp tới. Mời bạn đọc đón xem.

Assignment 1
Uploading Data to the Database
Database Management and Tuning
Start date: March 5, 2012
Due date: March 19, 2012, 16:00
Grading: 1 point
Note: This assignment involves reading the documentation of Java, the JDBC
driver, and PostgreSQL. Finding the relevant sources of information is part of the
challenge.
1. Download http://www.inf.unibz.it/dis/teaching/DMT/dblp.zip. This
archive contains two tab separated files (publ.tsv and auth.tsv) that store
authors and their publications as found in the DBLP
1
bibliography. The
imported tables have the following schemas:
Auth(name(49),pubID(129))
Publ(pubID(129),type(13),title(700),booktitle(132),
year(4),publisher(196))
You can assume that all attribute values are strings; the maximum string
length is shown in brackets.
2. The straightforward algorithm to load the data from the TSV file to a table
issues an SQL INSERT query for each line in the TSV file.
Task 1: Implement the straightforward approach to load auth.tsv to the
database (PostgreSQL, Java).
2
Task 2: The straightforward approach is slow. There are other approaches
that are significantly faster. Figure out how the efficient approaches work
and implement two of them.
Report: Describe the two efficient approaches. Give the runtime for loading
auth.tsv with the straightforward and the efficient approaches. Why are
the efficient approaches faster? Which tuning principle did you apply?
Access parameters for PostgreSQL:
host: alcor.inf.unibz.it, port: 5432
user/password: your university login
Please indicate the time that you spent solving this assignment in your report.
The time that you indicate will have no impact on your grade.
1
http://www.informatik.uni-trier.de/~ley/db/
2
This might be very slow. Instead of loading all the date with this approach, you can also load
part of the data and assume that runtime scales linearly. Mention this in the report.
Assignment 2
Query Tuning
Database Management and Tuning
Start date: March 19, 2013
Due date: April 2, 2013, 16:00
Grading: 1 point
In this assignment you will gain hands-on experience in rewriting slow queries and
in experimentally evaluating the rewritten queries.
Task 1: Create a database with the following database schema:
Employee(ssnum,name,manager,dept,salary,numfriends)
unique index on ssnum
unique index on name
index on dept
Student(ssnum,name,course,grade)
unique indexes on ssnum
unique indexes on name
Techdept(dept,manager,location)
unique index on dept
a manager may manage multiple departments
a location may contain multiple departments
Task 2: Fill the database with 100k employees, 100k students, and 10 technical
departments. Only about 10% of the employees are in a technical department.
The types of the attributes should make sense (e.g., ssnum should be an integer),
but the values need not be meaningful (e.g., names can be random strings).
Task 3: Choose two types of queries that might be hard for your database to
optimize. Taking queries from the lecture notes is OK.
NOTE: For at least one of your queries rewriting should make a difference.
Task 4: Rewrite the queries and consult the execution plans of the original and
the rewritten query.
Task 5: Run the original and the rewritten query and measure the runtime.
Report:
Describe your instance (data types, how did you fill the tables?).
Give the original and the rewritten queries.
Show and explain the execution plans.
Report and briefly discuss the runtime results from your experiment.
Please indicate the time that you spent solving this assignment in your report.
The time that you indicate will have no impact on your grade.
Assignment 3
Index Tuning
Database Management and Tuning
Start date: April 12, 2012
Due date: April 26, 2012, 16:00
Grading: 1 points
In this assignment you will study the indexing capabilities of a database management
systems of your choice.
Choose one of the following database management systems:
PostgreSQL 9
Oracle 11g
SQL Server 2008
IBM DB2 UDB V9
Consider the table Employee(ssnum,name,dept,salary), where ssnum is a key. For
the system of your choice answer the following questions.
1. Which index data structures (e.g., B
+
-tree index) are supported?
2. Discuss how the system supports clustered indexes, in particular:
(a) How do you create a clustered index on ssnum? Show the query.
1
(b) Are clustered indexes on non-key attributes supported, e.g., on name?
Show the query.
(c) Is the clustered index dense or sparse?
(d) How does the system deal with overflows in clustered indexes? How is the
fill factor controlled?
(e) Discuss any further characteristics of the system related to clustered in-
dexes that are relevant to a database tuner?
3. Discuss how the system supports non-clustered indexes, in particular:
(a) How do you create a non-clustered index on (dept,salary)? Show the
query.
1
(b) Can the system take advantage of covering indexes? What if the index
covers the query, but the condition is not a prefix of the attribute sequence
(dept,salary)?
1
Give the queries for creating a hash index and a B
+
-tree index if both of them are supported.
(c) Discuss any further characteristics of the system related to non-clustered
indexes that are relevant to a database tuner?
4. If your system supports B
+
-trees, what kind of key compression (if any) does
it support? How large is the default disk page? Can it be changed?
Important: Reference your information sources.
Please indicate the time that you spent solving this assignment in your report. The
time that you indicate will have no impact on your grade.
Assignment 4
Index Tuning
Database Management and Tuning
Start date: April 16, 2013
Due date: April 30, 2013, 16:00
Grading: 1 point
In this assignment you will experiment with indexes using PostgreSQL 8.
1. Download dblp.zip. This archive contains two tab separated files (publ.tsv
and auth.tsv) that store authors and their publications as found in the DBLP
1
bibliography. The imported tables have the following schemas:
Auth(name(49),pubID(129))
Publ(pubID(129),type(13),title(700),booktitle(132),
year(4),publisher(196))
You can assume that all attribute values are strings; the maximum string length
is shown in brackets. Publ.pubID is a key.
2. Compare clustered B
+
-tree, non-clustered B
+
-tree, non-clustered hash index,
and table scan (no index) for the following queries and measure the throughput:
SELECT * FROM Publ WHERE pubID = ...
SELECT * FROM Publ WHERE booktitle = ...
SELECT * FROM Publ WHERE year = ...
(a) Explain your experimental setup, in particular, the conditions that you
used in your queries and the computation of the throughput.
(b) Discuss your observations. Are the results expected? Why (not)?
Notes:
Do not specify primary key, foreign key, or uniqueness constraints when
you create the tables. PostgreSQL automatically creates an index to en-
sure uniqueness, which you want to avoid.
When you measure the throughput and repeat a query, do not to use
the same condition in the WHERE clauses of the repeated queries since the
database might buffer the results.
1
http://www.informatik.uni-trier.de/~ley/db/
To test the non-clustered indexes, cluster the table according to an at-
tribute that is independent of the indexed attribute, e.g., cluster the table
according to title for the condition on year.
3. Study index nested loop join, merge join, and hash join for the following queries:
SELECT name,title
FROM Auth, Publ
WHERE Auth.pubID=Publ.pubID;
SELECT title
FROM Auth, Publ
WHERE Auth.pubID=Publ.pubID AND Auth.name=’Divesh Srivastava’
(a) What join strategies does the system propose if you do not use an index,
with a unique non-clustering index on Publ.pubID, with two clustering
indexes on pubID?
(b) Test the index nested loop join with an index on Publ.pubID, on Auth.pubID,
and both Publ.pubID and Auth.pubID. Give the response times and dis-
cuss the query plans.
(c) Test the merge join without index, with two non-clustering indexes, and
with two clustering indexes. Give response times and discuss the query
plans.
(d) Test the hash join without index and give the response time.
(e) Are the results (query plan and throughput) expected? Why (not)?
Note: You can stop queries that run for more than 10 minutes on alcor.inf.unibz.it.
Check the query plan to avoid queries with excessive runtime.
Notes about PostgreSQL
Clustering indexes: You first create an index, then you use the index to cluster
the table (i.e., physically sort the table by the index attribute):
CREATE INDEX year idx ON publ(year);
ALTER TABLE publ CLUSTER ON year idx;
Query plan: The command EXPLAIN shows the query plan without executing
the query. The command EXPLAIN ANALYSE also executes the query. Example:
EXPLAIN ANALYZE SELECT * FROM publ WHERE year=’2006’;
Join strategy: You can influence the optimizer choice with the switches enable hashjoin,
enable mergejoin, and enable nestloop. Example:
SET enable hashjoin TO true;
SHOW enable hashjoin;
Please indicate the time that you spent solving this assignment in your report. The
time that you indicate will have no impact on your grade.
Assignment 5
Concurrency Control
Database Management and Tuning
Start date: April 30, 2013
Due date: May 14, 2013, 16:00
Grading: 1 point
In this assignment you will explore the concurrency control features of PostgreSQL 8.
A company with 100 employees pays the salaries at the end of the month. The
account of the company (account number 0, initial balance 100) and the accounts of
all employees (account numbers 1 to 100, initial balance 0) are with the same bank.
The payment transactions should run concurrently. Here are two approaches to solve
the problem:
(a) For each employee 1 i 100 run the following transaction:
e SELECT balance FROM Accounts WHERE account=i
UPDATE Accounts SET balance=e + 1 WHERE account=i
c SELECT balance FROM Accounts WHERE account=0
UPDATE Accounts SET balance=c 1 WHERE account=0
(b) For each employee 1 i 100 run the following transaction:
UPDATE Accounts SET balance=balance+1 WHERE account=i
UPDATE Accounts SET balance=balance-1 WHERE account=0
Task 1: Run solution (a) with isolation level READ COMMITTED. Compare throughput
and correctness for different numbers of concurrent transactions, ranging from 1 to 5.
The correctness is defined as (c
1
c
2
)/100, where c
1
and c
2
are the balances of account
0 before and after running all transactions, respectively. Repeat the experiment with
isolation level SERIALIZABLE.
Note: If a query is rolled back, restart it until it commits.
Task 2: As Task 1, but for solution (b).
Task 3: Discuss the outcome and explain the difference between the isolation levels
in PostgreSQL with respect to your experiment. The following information sources
might be useful:
Lecture notes:
http://www.inf.unibz.it/dis/teaching/DMT/dmt09-handout-1x1.pdf
PostgreSQL documentation:
http://www.postgresql.org/docs/8.4/static/transaction-iso.html
Please indicate the time that you spent solving this assignment in your report. The
time that you indicate will have no impact on your grade.
Assignment 6
Transaction Chopping
Database Management and Tuning
Start date: May 14, 2013
Due date: May 28, 2013, 16:00
Grading: 1 point
In this assignment you will tune concurrent transactions by chopping them without
trading in serializability.
1. A bank has two tables, Account(accountID,branch,a balance) which stores
accounts with their branch and their balance, and
Branch(branch,b balance) which stores the balance of each branch.
The following types of transactions run concurrently:
T
1
: Add money to an account and update the corresponding branch bal-
ance. No two transactions add money to the same account.
T
2
: Read an account balance.
T
3
: Compare the balance of each branch with the sum of the account
balances in that branch.
(a) Give the SQL queries (including pseudo code if necessary) for each trans-
action.
(b) Model all transactions with read/write operations.
(c) Show the chopping graph and give the finest possible correct chopping.
(d) How does the chopping change if two concurrent transactions of type T
1
can update the same account? Explain.
(e) The order of the atomic operations in T
3
has an impact on the chopping.
Show two semantically equivalent implementations of T
3
, one which favors
chopping, the other which does not favor chopping. Explain.
2. Given the following transactions:
T
1
: R(a), R(b), W (b), R(e)
T
2
: R(b), R(e)
T
3
: R(a), W (a), R(e)
T
4
: R(a), W (c)
T
5
: R(c)
T
6
: R(c), W (d), W (c), R (b)
1
Find the finest chopping for the concurrent execution of the following transac-
tions and show the respective chopping graphs.
(a) all transactions (i.e., T
1
, T
2
, T
3
, T
4
, T
5
, T
6
)
(b) all transactions except T
4
(i.e., T
1
, T
2
, T
3
, T
5
, T
6
)
Please indicate the time that you spent solving this assignment in your report. The
time that you indicate will have no impact on your grade.
2
| 1/9

Preview text:

Assignment 1 Uploading Data to the Database Database Management and Tuning Start date: March 5, 2012
Due date: March 19, 2012, 16:00 Grading: 1 point
Note: This assignment involves reading the documentation of Java, the JDBC
driver, and PostgreSQL. Finding the relevant sources of information is part of the challenge.
1. Download http://www.inf.unibz.it/dis/teaching/DMT/dblp.zip. This
archive contains two tab separated files (publ.tsv and auth.tsv) that store
authors and their publications as found in the DBLP1 bibliography. The
imported tables have the following schemas: • Auth(name(49),pubID(129))
• Publ(pubID(129),type(13),title(700),booktitle(132), year(4),publisher(196))
You can assume that all attribute values are strings; the maximum string length is shown in brackets.
2. The straightforward algorithm to load the data from the TSV file to a table
issues an SQL INSERT query for each line in the TSV file.
Task 1: Implement the straightforward approach to load auth.tsv to the database (PostgreSQL, Java).2
Task 2: The straightforward approach is slow. There are other approaches
that are significantly faster. Figure out how the efficient approaches work and implement two of them.
Report: Describe the two efficient approaches. Give the runtime for loading
auth.tsv with the straightforward and the efficient approaches. Why are
the efficient approaches faster? Which tuning principle did you apply?
Access parameters for PostgreSQL:
host: alcor.inf.unibz.it, port: 5432
user/password: your university login
Please indicate the time that you spent solving this assignment in your report.
The time that you indicate will have no impact on your grade.
1http://www.informatik.uni-trier.de/~ley/db/
2This might be very slow. Instead of loading all the date with this approach, you can also load
part of the data and assume that runtime scales linearly. Mention this in the report. Assignment 2 Query Tuning Database Management and Tuning Start date: March 19, 2013 Due date: April 2, 2013, 16:00 Grading: 1 point
In this assignment you will gain hands-on experience in rewriting slow queries and
in experimentally evaluating the rewritten queries.
Task 1: Create a database with the following database schema:
• Employee(ssnum,name,manager,dept,salary,numfriends) – unique index on ssnum – unique index on name – index on dept
• Student(ssnum,name,course,grade) – unique indexes on ssnum – unique indexes on name
• Techdept(dept,manager,location) – unique index on dept
– a manager may manage multiple departments
– a location may contain multiple departments
Task 2: Fill the database with 100k employees, 100k students, and 10 technical
departments. Only about 10% of the employees are in a technical department.
The types of the attributes should make sense (e.g., ssnum should be an integer),
but the values need not be meaningful (e.g., names can be random strings).
Task 3: Choose two types of queries that might be hard for your database to
optimize. Taking queries from the lecture notes is OK.
NOTE: For at least one of your queries rewriting should make a difference.
Task 4: Rewrite the queries and consult the execution plans of the original and the rewritten query.
Task 5: Run the original and the rewritten query and measure the runtime. Report:
• Describe your instance (data types, how did you fill the tables?).
• Give the original and the rewritten queries.
• Show and explain the execution plans.
• Report and briefly discuss the runtime results from your experiment.
Please indicate the time that you spent solving this assignment in your report.
The time that you indicate will have no impact on your grade. Assignment 3 Index Tuning Database Management and Tuning Start date: April 12, 2012
Due date: April 26, 2012, 16:00 Grading: 1 points
In this assignment you will study the indexing capabilities of a database management systems of your choice.
Choose one of the following database management systems: • PostgreSQL 9 • Oracle 11g • SQL Server 2008 • IBM DB2 UDB V9
Consider the table Employee(ssnum,name,dept,salary), where ssnum is a key. For
the system of your choice answer the following questions.
1. Which index data structures (e.g., B+-tree index) are supported?
2. Discuss how the system supports clustered indexes, in particular:
(a) How do you create a clustered index on ssnum? Show the query.1
(b) Are clustered indexes on non-key attributes supported, e.g., on name? Show the query.
(c) Is the clustered index dense or sparse?
(d) How does the system deal with overflows in clustered indexes? How is the fill factor controlled?
(e) Discuss any further characteristics of the system related to clustered in-
dexes that are relevant to a database tuner?
3. Discuss how the system supports non-clustered indexes, in particular:
(a) How do you create a non-clustered index on (dept,salary)? Show the query.1
(b) Can the system take advantage of covering indexes? What if the index
covers the query, but the condition is not a prefix of the attribute sequence (dept,salary)?
1Give the queries for creating a hash index and a B+-tree index if both of them are supported.
(c) Discuss any further characteristics of the system related to non-clustered
indexes that are relevant to a database tuner?
4. If your system supports B+-trees, what kind of key compression (if any) does
it support? How large is the default disk page? Can it be changed?
Important: Reference your information sources.
Please indicate the time that you spent solving this assignment in your report. The
time that you indicate will have no impact on your grade. Assignment 4 Index Tuning Database Management and Tuning Start date: April 16, 2013
Due date: April 30, 2013, 16:00 Grading: 1 point
In this assignment you will experiment with indexes using PostgreSQL 8.
1. Download dblp.zip. This archive contains two tab separated files (publ.tsv
and auth.tsv) that store authors and their publications as found in the DBLP1
bibliography. The imported tables have the following schemas: • Auth(name(49),pubID(129))
• Publ(pubID(129),type(13),title(700),booktitle(132), year(4),publisher(196))
You can assume that all attribute values are strings; the maximum string length
is shown in brackets. Publ.pubID is a key.
2. Compare clustered B+-tree, non-clustered B+-tree, non-clustered hash index,
and table scan (no index) for the following queries and measure the throughput:
SELECT * FROM Publ WHERE pubID = ...
SELECT * FROM Publ WHERE booktitle = ...
SELECT * FROM Publ WHERE year = ...
(a) Explain your experimental setup, in particular, the conditions that you
used in your queries and the computation of the throughput.
(b) Discuss your observations. Are the results expected? Why (not)? Notes:
• Do not specify primary key, foreign key, or uniqueness constraints when
you create the tables. PostgreSQL automatically creates an index to en-
sure uniqueness, which you want to avoid.
• When you measure the throughput and repeat a query, do not to use
the same condition in the WHERE clauses of the repeated queries since the
database might buffer the results.
1http://www.informatik.uni-trier.de/~ley/db/
• To test the non-clustered indexes, cluster the table according to an at-
tribute that is independent of the indexed attribute, e.g., cluster the table
according to title for the condition on year.
3. Study index nested loop join, merge join, and hash join for the following queries: SELECT name,title FROM Auth, Publ WHERE Auth.pubID=Publ.pubID; SELECT title FROM Auth, Publ
WHERE Auth.pubID=Publ.pubID AND Auth.name=’Divesh Srivastava’
(a) What join strategies does the system propose if you do not use an index,
with a unique non-clustering index on Publ.pubID, with two clustering indexes on pubID?
(b) Test the index nested loop join with an index on Publ.pubID, on Auth.pubID,
and both Publ.pubID and Auth.pubID. Give the response times and dis- cuss the query plans.
(c) Test the merge join without index, with two non-clustering indexes, and
with two clustering indexes. Give response times and discuss the query plans.
(d) Test the hash join without index and give the response time.
(e) Are the results (query plan and throughput) expected? Why (not)?
Note: You can stop queries that run for more than 10 minutes on alcor.inf.unibz.it.
Check the query plan to avoid queries with excessive runtime. Notes about PostgreSQL
• Clustering indexes: You first create an index, then you use the index to cluster
the table (i.e., physically sort the table by the index attribute):
CREATE INDEX year idx ON publ(year);
ALTER TABLE publ CLUSTER ON year idx;
• Query plan: The command EXPLAIN shows the query plan without executing
the query. The command EXPLAIN ANALYSE also executes the query. Example:
EXPLAIN ANALYZE SELECT * FROM publ WHERE year=’2006’;
• Join strategy: You can influence the optimizer choice with the switches enable hashjoin,
enable mergejoin, and enable nestloop. Example: SET enable hashjoin TO true; SHOW enable hashjoin;
Please indicate the time that you spent solving this assignment in your report. The
time that you indicate will have no impact on your grade. Assignment 5 Concurrency Control Database Management and Tuning Start date: April 30, 2013 Due date: May 14, 2013, 16:00 Grading: 1 point
In this assignment you will explore the concurrency control features of PostgreSQL 8.
A company with 100 employees pays the salaries at the end of the month. The
account of the company (account number 0, initial balance 100) and the accounts of
all employees (account numbers 1 to 100, initial balance 0) are with the same bank.
The payment transactions should run concurrently. Here are two approaches to solve the problem:
(a) For each employee 1 ≤ i ≤ 100 run the following transaction:
e ←SELECT balance FROM Accounts WHERE account=i
UPDATE Accounts SET balance=e + 1 WHERE account=i
c ←SELECT balance FROM Accounts WHERE account=0
UPDATE Accounts SET balance=c − 1 WHERE account=0
(b) For each employee 1 ≤ i ≤ 100 run the following transaction:
UPDATE Accounts SET balance=balance+1 WHERE account=i
UPDATE Accounts SET balance=balance-1 WHERE account=0
Task 1: Run solution (a) with isolation level READ COMMITTED. Compare throughput
and correctness for different numbers of concurrent transactions, ranging from 1 to 5.
The correctness is defined as (c1−c2)/100, where c1 and c2 are the balances of account
0 before and after running all transactions, respectively. Repeat the experiment with isolation level SERIALIZABLE.
Note: If a query is rolled back, restart it until it commits.
Task 2: As Task 1, but for solution (b).
Task 3: Discuss the outcome and explain the difference between the isolation levels
in PostgreSQL with respect to your experiment. The following information sources might be useful: • Lecture notes:
http://www.inf.unibz.it/dis/teaching/DMT/dmt09-handout-1x1.pdf • PostgreSQL documentation:
http://www.postgresql.org/docs/8.4/static/transaction-iso.html
Please indicate the time that you spent solving this assignment in your report. The
time that you indicate will have no impact on your grade. Assignment 6 Transaction Chopping Database Management and Tuning Start date: May 14, 2013 Due date: May 28, 2013, 16:00 Grading: 1 point
In this assignment you will tune concurrent transactions by chopping them without trading in serializability.
1. A bank has two tables, Account(accountID,branch,a balance) which stores
accounts with their branch and their balance, and
Branch(branch,b balance) which stores the balance of each branch.
The following types of transactions run concurrently:
• T1: Add money to an account and update the corresponding branch bal-
ance. No two transactions add money to the same account.
• T2: Read an account balance.
• T3: Compare the balance of each branch with the sum of the account balances in that branch.
(a) Give the SQL queries (including pseudo code if necessary) for each trans- action.
(b) Model all transactions with read/write operations.
(c) Show the chopping graph and give the finest possible correct chopping.
(d) How does the chopping change if two concurrent transactions of type T1
can update the same account? Explain.
(e) The order of the atomic operations in T3 has an impact on the chopping.
Show two semantically equivalent implementations of T3, one which favors
chopping, the other which does not favor chopping. Explain.
2. Given the following transactions:
• T1: R(a), R(b), W (b), R(e) • T2: R(b), R(e) • T3: R(a), W (a), R(e) • T4: R(a), W (c) • T5: R(c)
• T6: R(c), W (d), W (c), R(b) 1
Find the finest chopping for the concurrent execution of the following transac-
tions and show the respective chopping graphs.
(a) all transactions (i.e., T1, T2, T3, T4, T5, T6)
(b) all transactions except T4 (i.e., T1, T2, T3, T5, T6)
Please indicate the time that you spent solving this assignment in your report. The
time that you indicate will have no impact on your grade. 2