Chapter 12 - Course: Markets and Financial Institutions - University of Economics - University of Da Nang

12-1 The Evolution of Distributed Database Management Systems
A distributed database management system (DDBMS) governs the storage and
processing of logically related data over interconnected computer systems in which
both data and processing are distributed among several sites. To understand how and
why the DDBMS is different from the DBMS, it is useful to briefly examine the changes
in the business environment that set the stage for the development of the DDBMS.
During the 1970s, corporations implemented centralized database management
systems to meet their structured information needs. The use of a centralized database
required that corporate data be stored in a single central site, usually a mainframe
computer. Data access was provided through dumb terminals. The centralized approach,
illustrated in Figure 12.1, worked well to fill the structured information needs of
corporations, but it fell short when quickly moving events required faster response
times and equally quick access to information. The slow progression from information
request to approval to specialist to user simply did not serve decision makers well in a
dynamic environment. What was needed was quick, unstructured access to databases,
using ad hoc queries to generate on-the-spot information.
The last two decades gave birth to a series of crucial social and technological changes
that affected the nature of the systems and the data they use:
- Business operations became global; with this change, competition expanded from the shop on the next corner to the web store in cyberspace.
- Customer demands and market needs favored an on-demand transaction style, mostly based on web-based services.
- Rapid social and technological changes fueled by low-cost, smart mobile devices increased the demand for complex and fast networks to interconnect them. As a consequence, corporations have increasingly adopted advanced network technologies as the platform for their computerized solutions. See Chapter 15, Database Connectivity and Web Technologies, for a discussion of cloud-based services.
- Data realms are converging in the digital world more frequently. As a result, applications must manage multiple types of data, such as voice, video, music, and images. Such data tends to be geographically distributed and remotely accessed from diverse locations via location-aware mobile devices.
- The advent of social media as a way to reach new customers and open new markets has fueled the need to store large amounts of digital data and created a revolution in the way data is managed and mined for knowledge. Businesses are looking for new ways to gain business intelligence through the analysis of vast stores of structured and unstructured data.
These factors created a dynamic business environment in which companies had to
respond quickly to competitive and technological pressures. As large business units
restructured to form leaner, quickly reacting, dispersed operations, two database
requirements became obvious:
- Rapid ad hoc data access became crucial in the quick-response decision-making environment.
- Distributed data access was needed to support geographically dispersed business units.
During recent years, these factors became even more firmly entrenched. However, the
way they were addressed was strongly influenced by the following factors:
- The growing acceptance of the Internet as the platform for data access and distribution. The web is effectively the repository for distributed data.
- The mobile wireless revolution. Mobile wireless digital devices, such as smartphones and tablets, are now in widespread use. These devices have created high demand for data access. They access data from geographically dispersed locations and require varied data exchanges in multiple formats, such as data, voice, video, music, and pictures. Although distributed data access does not necessarily imply distributed databases, performance and failure tolerance requirements often lead to the use of data replication techniques similar to those in distributed databases.
- The accelerated growth of companies using “applications as a service.” This new type of service provides remote applications to companies that want to outsource their application development, maintenance, and operations. The company data is generally stored on central servers and is not necessarily distributed. Just as with mobile data access, this type of service may not require fully distributed data functionality; however, other factors such as performance and failure tolerance often require the use of data replication techniques similar to those in distributed databases.
- The increased focus on mobile business intelligence. More and more companies are embracing mobile technologies within their business plans. As companies use social networks to get closer to customers, the need for on-the-spot decision making increases. Although a data warehouse is not usually a distributed database, it does rely on techniques such as data replication and distributed queries that facilitate data extraction and integration. (You will learn more about this topic in Chapter 13, Business Intelligence and Data Warehouses.)
- Emphasis on Big Data analytics. The era of mobile communications unleashed an avalanche of data from many sources and of many types. Today’s customers have significant influence on the spending habits of communities, and organizations are investing in ways to harvest such data to “discover” new ways to reach customers effectively and efficiently.
distributed database management system (DDBMS): A DBMS that supports a database distributed across several different sites; a DDBMS governs the storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites.
At this point, the long-term impact of the Internet and the mobile revolution on
distributed database design and management is just starting to be felt. Perhaps the
success of the Internet and mobile technologies will foster the use of distributed
databases as bandwidth becomes a less troublesome bottleneck. Perhaps the resolution
of bandwidth problems will simply confirm the centralized database standard. In any case,
distributed database concepts and components are likely to find a place in future database development,
particularly for specialized mobile and location-aware applications.
The distributed database is especially desirable because centralized database management is subject to problems such as:
- Performance degradation because of a growing number of remote locations over greater distances.
- High costs associated with maintaining and operating large central (mainframe) database systems and physical infrastructure.
- Reliability problems created by dependence on a central site (single point of failure syndrome) and the need for data replication.
- Scalability problems associated with the physical limits imposed by a single location, such as physical space, temperature conditioning, and power consumption.
- Organizational rigidity imposed by the database, which means it might not support the flexibility and agility required by modern global organizations.
The dynamic business environment and the centralized database’s shortcomings
spawned a demand for applications based on accessing data from different sources at
multiple locations. Such a multiple-source/multiple-location database environment is
best managed by a DDBMS.
12-2 DDBMS Advantages and Disadvantages
Distributed database management systems deliver several advantages over traditional
systems. At the same time, they are subject to some problems. Table 12.1 summarizes
the advantages and disadvantages associated with a DDBMS.
Distributed databases are being used successfully in many web staples such as
Google and Amazon, but they still have a long way to go before they yield the full
flexibility and power they theoretically possess.
The remainder of this chapter explores the basic components and concepts of the
distributed database. Because the distributed database is usually based on the
relational database model, relational terminology is used to explain the basic concepts
and components. Even though some of the most widely used distributed databases are
part of the NoSQL movement (see Chapter 2, Data Models), the basic concepts and
fundamentals of distributed data still apply to them.
12-3 Distributed Processing and Distributed Databases
In distributed processing, a database’s logical processing is shared among two or more physically
independent sites that are connected through a network. For example, the data
input/output (I/O), data selection, and data validation might be performed on one
computer, and a report based on that data might be created on another computer.
A basic distributed processing environment is illustrated in Figure 12.2, which shows
that a distributed processing system shares the database processing chores among three
sites connected through a communications network. Although the database resides at only
one site (Miami), each site can access the data and update the database. The database is
located on Computer A, a network computer known as the database server.
A distributed database, on the other hand, stores a logically related database over two
or more physically independent sites. The sites are connected via a computer
network. In contrast, the distributed processing system uses only a single-site database but shares the
processing chores among several sites. In a distributed database system, a database is composed of
several parts known as database fragments. The database fragments are located at different sites and can be
replicated among various sites. Each database fragment is, in turn, managed by its local database process.
An example of a distributed database environment is shown in Figure 12.3.
The database in Figure 12.3 is divided into three database fragments (E1, E2, and E3)
located at different sites. The computers are connected through a network system. In a fully distributed
database, the users Alan, Betty, and Hernando do not need to know the name or location of each database
fragment in order to access the database. Also, the users might be at sites other than Miami, New York, or Atlanta and still be able to
access the database as a single logical unit.
[Table 12.1 Distributed DBMS advantages and disadvantages (partial). Advantages include processor independence. Disadvantages include security and increased storage and infrastructure requirements.]
distributed processing: Sharing the logical processing of a database over two or more sites connected by a network.
distributed database: A logically related database that is stored in two or more physically independent sites.
database fragment: A subset of a distributed database. Although the fragments may be stored at different sites within a computer network, the set of all fragments is treated as a single database. See also horizontal fragmentation and vertical fragmentation.
As you examine Figures 12.2 and 12.3, keep the following points in mind:
- Distributed processing does not require a distributed database, but a distributed database requires distributed processing. (Each database fragment is managed by its own local database process.)
- Distributed processing may be based on a single database located on a single computer. For the management of distributed data to occur, copies or parts of the database processing functions must be distributed to all data storage sites.
- Both distributed processing and distributed databases require a network of interconnected components.
12-4 Characteristics of Distributed Database Management Systems
A DDBMS governs the storage and processing of logically related data over
interconnected computer systems in which both data and processing functions are
distributed among several sites. A DBMS must have at least the following functions to
be classified as distributed:
- Application interface to interact with the end user, application programs, and other DBMSs within the distributed database
- Validation to analyze data requests for syntax correctness
- Transformation to decompose complex requests into atomic data request components
- Query optimization to find the best access strategy (which database fragments must be accessed by the query, and how must data updates, if any, be synchronized?)
- Mapping to determine the data location of local and remote fragments
- I/O interface to read or write data from or to permanent local storage
- Formatting to prepare the data for presentation to the end user or to an application program
- Security to provide data privacy at both local and remote databases
- Backup and recovery to ensure the availability and recoverability of the database in case of a failure
- DB administration features for the database administrator
- Concurrency control to manage simultaneous data access and to ensure data consistency across database fragments in the DDBMS
- Transaction management to ensure that the data moves from one consistent state to another; this activity includes the synchronization of local and remote transactions as well as transactions across multiple distributed segments
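The query optimization and mapping functions can be illustrated with a small sketch. It is a minimal illustration under stated assumptions. The EMPLOYEE table is hypothetically split horizontally into fragments E1, E2, and E3 stored at New York, Atlanta, and Miami. A hypothetical REGION partitioning attribute identifies each fragment. The Fragment class and the fragments_to_access and map_to_sites functions are illustrative names, not part of any real DDBMS. When a request filters on the partitioning attribute, only the matching fragment needs to be accessed. Otherwise, every fragment must be read.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Fragment:
    name: str    # fragment name, e.g. "E1"
    site: str    # site (node) that stores the fragment
    region: str  # hypothetical partitioning value held by this fragment


# Hypothetical catalog entries for a horizontally fragmented EMPLOYEE table.
EMPLOYEE_FRAGMENTS = [
    Fragment("E1", "New York", "NY"),
    Fragment("E2", "Atlanta", "ATL"),
    Fragment("E3", "Miami", "MIA"),
]


def fragments_to_access(region: Optional[str] = None) -> List[Fragment]:
    """Return the fragments a request must touch (a simple access strategy).

    A filter on the partitioning attribute lets the DDBMS skip every fragment
    that cannot contain matching rows; without such a filter, all fragments
    must be read.
    """
    if region is None:
        return list(EMPLOYEE_FRAGMENTS)
    return [f for f in EMPLOYEE_FRAGMENTS if f.region == region]


def map_to_sites(fragments: List[Fragment]) -> Dict[str, str]:
    """Mapping step: fragment name -> site where that fragment is stored."""
    return {f.name: f.site for f in fragments}


if __name__ == "__main__":
    print(map_to_sites(fragments_to_access()))       # all three fragments
    print(map_to_sites(fragments_to_access("ATL")))  # only E2, at Atlanta
```

A real optimizer would also weigh replicas, network costs, and update synchronization. This sketch ignores all three.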
A fully distributed database management system must perform all of the functions of
a centralized DBMS, as follows:
1. Receive the request of an application or end user.
2. Validate, analyze, and decompose the request. The request might include mathematical and logical
operations such as the following: Select all customers with a balance greater than $1,000. The request
might require data from only a single table, or it might require access to several tables.
3. Map the request’s logical-to-physical data components.
4. Decompose the request into several disk I/O operations.
5. Search for, locate, read, and validate the data.
6. Ensure database consistency, security, and integrity.
7. Validate the data for the conditions, if any, specified by the request.
8. Present the selected data in the required format.
In addition, a distributed DBMS must handle all necessary functions imposed by the
distribution of data and processing, and it must perform those additional functions
transparently to the end user. The DDBMS’s transparent data access features are
illustrated in Figure 12.4.
The single logical database in Figure 12.4 consists of two database fragments, A1 and A2, located at
Sites 1 and 2, respectively. Mary can query the database as if it were a local database; so
can Tom. Both users “see” only one logical database and do not need to know the names
of the fragments. In fact, the end users do not even need to know that the database is
divided into fragments, nor do they need to know where the fragments are located.
To better understand the different types of distributed database scenarios, first consider
the components of the distributed database system.
12-5 DDBMS Components
The DDBMS must include at least the following components:
- Computer workstations or remote devices (sites or nodes) that form the network system. The distributed database system must be independent of the computer system hardware.
- Network hardware and software components that reside in each workstation or device. The network components allow all sites to interact and exchange data. Because the components (computers, operating systems, network hardware, and so on) are likely to be supplied by different vendors, it is best to ensure that distributed database functions can be run on multiple platforms.
- Communications media that carry the data from one node to another. The DDBMS must be communications media-independent; that is, it must be able to support several types of communications media.
- The transaction processor (TP) is the software component found in each computer or device that requests data. The transaction processor receives and processes the application’s remote and local data requests. The TP is also known as the application processor (AP) or the transaction manager (TM).
- The data processor (DP) is the software component residing on each computer or device that stores and retrieves data located at the site. The DP is also known as the data manager (DM). A data processor may even be a centralized DBMS.
Figure 12.5 illustrates the placement of the components and the
interaction among them. The communication among TPs and DPs is made
possible through a specific set of rules, or protocols, used by the DDBMS.
transaction processor (TP): In a DDBMS, the software component on each computer that requests data. The TP is responsible for the execution and coordination of all database requests issued by a local application that accesses data on any DP. Also called transaction manager (TM) or application processor (AP).
application processor (AP): See transaction processor (TP).
transaction manager (TM): See transaction processor (TP).
data processor (DP): The resident software component that stores and retrieves data through a DDBMS. The DP is responsible for managing the local data in the computer and coordinating access to that data. Also known as data manager (DM).
data manager (DM): See data processor (DP).
The protocols determine how the distributed database system will:
- Interface with the network to transport data and commands between DPs and TPs.
- Synchronize all data received from DPs (TP side) and route retrieved data to the appropriate TPs (DP side).
- Ensure common database functions in a distributed system. Such functions include data security, transaction management and concurrency control, data partitioning and synchronization, and data backup and recovery.
DPs and TPs should be added to the system transparently without affecting its
operation. A TP and a DP can reside on the same computer, allowing the end user to
access both local and remote data transparently. In theory, a DP can be an
independent centralized DBMS with proper interfaces to support remote access from
other independent DBMSs in the network.
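The division of labor between TPs and DPs can be sketched in a few lines of Python. This is a minimal, in-memory illustration under stated assumptions. The DataProcessor and TransactionProcessor classes are hypothetical, and direct method calls stand in for the network protocols. The sample rows are invented. Each DP answers requests against the data stored at its own site. The TP forwards an application's request to every DP it knows about and merges the partial results, so the application sees one logical answer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class DataProcessor:
    """DP: stores and retrieves the data located at its own site."""
    site: str
    rows: List[Row] = field(default_factory=list)

    def select(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Evaluate the request locally; only matching rows are returned.
        return [row for row in self.rows if predicate(row)]


@dataclass
class TransactionProcessor:
    """TP: receives an application's request and coordinates it across DPs."""
    site: str
    dps: List[DataProcessor]

    def select(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Forward the request to every DP (a method call stands in for the
        # network protocol) and merge the partial results into one answer.
        results: List[Row] = []
        for dp in self.dps:
            results.extend(dp.select(predicate))
        return results


if __name__ == "__main__":
    dp1 = DataProcessor("Site 1", [{"id": 1, "balance": 500}, {"id": 2, "balance": 1500}])
    dp2 = DataProcessor("Site 2", [{"id": 3, "balance": 2500}])
    tp = TransactionProcessor("Site 3", [dp1, dp2])
    print(tp.select(lambda r: r["balance"] > 1000))
    # [{'id': 2, 'balance': 1500}, {'id': 3, 'balance': 2500}]
```

The TP code does not care what a DP holds internally. The same coordination therefore works whether a DP wraps a single fragment or an entire centralized database.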
12-6 Levels of Data and Process Distribution
Current database systems can be classified on the basis of how process distribution and data distribution
are supported. For example, a DBMS may store data in a single site (using a centralized DB) or in multiple
sites (using a distributed DB), and it may support data processing at one or more sites.
Table 12.2 uses a simple matrix to classify database systems according to data and process
distribution. These types of processes are discussed in the sections that follow.
[Table 12.2 Database systems: levels of data and process distribution. Columns: single-site data, multiple-site data; rows: single-site and multiple-site processing.]
In the single-site processing, single-site data (SPSD) scenario, all processing is done on a single host computer, and all data is stored on the host computer’s local disk system.
Using Figure 12.6 as an example, you can see that the functions of the TP and DP
are embedded within the DBMS on the host computer. The DBMS usually runs under a
time-sharing, multitasking operating system, which allows several processes to run
concurrently on a host computer accessing a single DP. All data storage and data
processing are handled by a single host computer.
single-site processing, single-site data (SPSD): A scenario in which all processing is done on a single host computer and all data is stored on the host computer’s local disk.
Under the multiple-site processing, single-site data (MPSD) scenario, multiple processes run on different
computers that share a single data repository. Typically, the MPSD scenario requires a network file server
running conventional applications that are accessed through a network. Many multiuser accounting
applications running under a personal computer network fit such a description (see Figure 12.7). As you
examine Figure 12.7, note that:
- The TP on each workstation acts only as a redirector to route all network data requests to the file server.
- The end user sees the file server as just another hard disk. Because only the data storage input/output (I/O) is handled by the file server’s computer, the MPSD offers limited capabilities for distributed processing.
- Data selection takes place at the workstation, so entire tables must travel through the network to be processed there; this requirement increases network traffic and communication costs.
The inefficiency of the last condition can be illustrated easily. For example, suppose that
the file server computer stores a CUSTOMER table containing 100,000 data rows, 50 of
which have balances greater than $1,000. Suppose that Site A issues the following SQL
query:
SELECT *
FROM CUSTOMER
WHERE CUS_BALANCE > 1000;
All 100,000 CUSTOMER rows must travel
through the network to be evaluated at Site A. A variation of the multiple-site processing,
single-site data approach is known as client/server architecture. Client/server architecture
is similar to that of the network file server except that all database processing is done at the
server site, thus reducing network traffic. Although both the network file server and the
client/server systems perform multiple-site processing, the client/server system’s
processing is distributed. Note that the network file server approach requires the database
to be located at a single site. In contrast, the client/server architecture is capable of
supporting data at multiple sites.
The multiple-site processing, multiple-site data (MPMD) scenario describes a fully distributed
DBMS with support for multiple data processors and transaction processors at multiple
sites. Depending on the level of support for various types of databases, DDBMSs are
classified as either homogeneous or heterogeneous.
Homogeneous DDBMSs integrate multiple instances of the same DBMS over a network (for example,
multiple instances of Oracle 11g running on different platforms). In contrast, heterogeneous DDBMSs
integrate different types of DBMSs over a network, but all support the same data model. For example,
Table 12.3 lists several relational database systems that could be integrated within a DDBMS. A fully
heterogeneous DDBMS will support different DBMSs, each one supporting a different data model, running
under different computer systems.
[Table 12.3 Relational database systems that could be integrated within a heterogeneous DDBMS. Columns: platform, DBMS, operating system, network communications protocol.]
multiple-site processing, single-site data (MPSD): A scenario in which multiple processes run on different computers sharing a single data repository.
client/server architecture: A hardware and software system composed of clients, servers, and middleware. Features a user of resources (client) and a provider of resources (server).
multiple-site processing, multiple-site data (MPMD): A scenario describing a fully distributed database management system with support for multiple data processors and transaction processors at multiple sites.
homogeneous DDBMS: A system that integrates only one type of centralized database management system over a network.
heterogeneous DDBMS: A system that integrates different types of centralized database management systems over a network.
fully heterogeneous distributed database system (fully heterogeneous DDBMS): A system that integrates different types of database management systems (hierarchical, network, and relational) over a network. It supports different database management systems that may even support different data models running under different computer systems.
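The CUSTOMER example in the MPSD discussion can be turned into a simple count of the rows that cross the network. The sketch below is an illustration under stated assumptions. It builds a hypothetical in-memory CUSTOMER table with 100,000 rows, 50 of them with balances over $1,000. The functions file_server_query and client_server_query are invented names that model where the WHERE clause is evaluated. Under the file server model, every row is shipped to Site A before filtering. Under the client/server model, the server filters first and ships only the matching rows.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def make_customers(n_rows: int, n_high: int, threshold: float = 1000.0) -> List[Row]:
    """Hypothetical CUSTOMER table with exactly n_high balances above threshold."""
    rows = []
    for code in range(n_rows):
        balance = threshold + 500.0 if code < n_high else threshold - 500.0
        rows.append({"CUS_CODE": code, "CUS_BALANCE": balance})
    return rows


def file_server_query(table: List[Row], threshold: float) -> Tuple[List[Row], int]:
    # File server: the server only does storage I/O, so the whole table is
    # transferred and the workstation (Site A) evaluates the WHERE clause.
    shipped = list(table)
    result = [r for r in shipped if r["CUS_BALANCE"] > threshold]
    return result, len(shipped)


def client_server_query(table: List[Row], threshold: float) -> Tuple[List[Row], int]:
    # Client/server: the server evaluates the WHERE clause and ships only
    # the rows that satisfy it.
    result = [r for r in table if r["CUS_BALANCE"] > threshold]
    return result, len(result)


if __name__ == "__main__":
    customers = make_customers(100_000, 50)
    fs_rows, fs_shipped = file_server_query(customers, 1000.0)
    cs_rows, cs_shipped = client_server_query(customers, 1000.0)
    assert fs_rows == cs_rows  # same answer either way
    print(f"file server ships {fs_shipped} rows; client/server ships {cs_shipped}")
```

Both approaches return the same 50 rows. Only the number of rows transferred differs: 100,000 versus 50 in this sketch.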
Distributed database implementations are better understood as an abstraction
layer on top of a DBMS. This abstraction layer provides additional functionality that
enables support for distributed database features, including straightforward data
links, replication, advanced data fragmentation, synchronization, and integration. In
fact, most database vendors provide for increasing levels of data fragmentation,
replication, and integration. Therefore, the support for distributed databases can be
better seen as a continuous spectrum that goes from homogeneous to fully
heterogeneous distributed data management. Consequently, at any point on this
spectrum, a DDBMS is subject to certain restrictions. For example:
- Remote access is provided on a read-only basis and does not support write privileges.
- Restrictions are placed on the number of remote tables that may be accessed in a single transaction.
- Restrictions are placed on the number of distinct databases that may be accessed.
- Restrictions are placed on the database model that may be accessed. Thus, access may be provided to relational databases but not to network or hierarchical databases.
The preceding list of restrictions is by no means exhaustive. The DDBMS technology
continues to change rapidly, and new features are added frequently. Managing data at
multiple sites leads to a number of issues that must be addressed and understood. The
next section examines several key features of distributed database management
systems.
12-7 Distributed Database Transparency Features
A distributed database system should provide some desirable transparency features
that hide all of the system’s complexities from the end user. In other words, the
end user should have the sense of working with a centralized DBMS. For this reason,
the minimum desirable DDBMS transparency features are:
- Distribution transparency allows a distributed database to be treated as a single logical database. If a DDBMS exhibits distribution transparency, the user does not need to know that:
  - The data is partitioned, meaning the table’s rows and columns are split vertically or horizontally and stored among multiple sites.
  - The data is geographically dispersed among multiple sites.
  - The data is replicated among multiple sites.
- Transaction transparency allows a transaction to update data at more than one network site. Transaction transparency ensures that the transaction will be either entirely completed or aborted, thus maintaining database integrity.
- Failure transparency ensures that the system will continue to operate in the event of a node or network failure. Functions that were lost because of the failure will be picked up by another network node. This is a very important feature, particularly in organizations that depend on web presence as the backbone for maintaining trust in their business.
- Performance transparency allows the system to perform as if it were a centralized DBMS. The system will not suffer any performance degradation due to its use on a network or because of the network’s platform differences. Performance transparency also ensures that the system will find the most cost-effective path to access remote data. The system should be able to “scale out” in a transparent manner, that is, to increase performance capacity by adding more transaction or data-processing nodes, without affecting the overall performance of the system.
- Heterogeneity transparency allows the integration of several different local DBMSs (relational, network, and hierarchical) under a common, or global, schema. The DDBMS is responsible for translating the data requests from the global schema to the local DBMS schema.
The following sections discuss each of these transparency features in greater detail.
distribution transparency: A DDBMS feature that allows a distributed database to look like a single logical database to an end user.
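Failure transparency and the "most cost-effective path" part of performance transparency can be combined in one small sketch. The sketch makes three hypothetical assumptions. The same data is replicated at several nodes, each replica carries an estimated access cost, and the DDBMS knows which nodes are currently reachable. The Replica class and choose_replica function are illustrative names only. Each request is routed to the cheapest reachable copy. If that node fails, the next cheapest one picks up the work, and the user does not have to change anything.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    node: str
    cost: float      # hypothetical estimated access cost (e.g. latency in ms)
    up: bool = True  # whether the node is currently reachable


class NoReplicaAvailable(Exception):
    """Raised when every node holding the data is down."""


def choose_replica(replicas: List[Replica]) -> Replica:
    # Failure transparency: ignore nodes that are down.
    live = [r for r in replicas if r.up]
    if not live:
        raise NoReplicaAvailable("no node holding this data is reachable")
    # Performance transparency: pick the cheapest remaining path.
    return min(live, key=lambda r: r.cost)


if __name__ == "__main__":
    copies = [Replica("NY", 40.0), Replica("ATL", 15.0), Replica("MIA", 25.0)]
    print(choose_replica(copies).node)  # ATL
    copies[1].up = False                # the Atlanta node fails
    print(choose_replica(copies).node)  # MIA
```

Real systems also have to detect failures, for example with timeouts or heartbeats, and keep the replicas consistent. This sketch leaves both out.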
12-8 Distribution Transparency
Distribution transparency allows a physically dispersed database to be managed as though
it were a centralized database. The level of transparency supported by the DDBMS varies
from system to system. Three levels of distribution transparency are recognized:
- Fragmentation transparency is the highest level of distribution transparency. The end user or programmer does not need to know that a database is partitioned. Therefore, neither fragment names nor fragment locations are specified prior to data access.
- Location transparency exists when the end user or programmer must specify the database fragment names but does not need to specify where those fragments are located.
- Local mapping transparency exists when the end user or programmer must specify both the fragment names and their locations.
Transparency features are summarized in Table 12.4.
Table 12.4 Summary of Transparency Features
IF THE SQL STATEMENT REQUIRES:
FRAGMENT NAME?   LOCATION NAME?   THEN THE DBMS SUPPORTS         LEVEL OF DISTRIBUTION TRANSPARENCY
Yes              Yes              Local mapping transparency     Low
Yes              No               Location transparency          Medium
No               No               Fragmentation transparency     High
transaction transparency: A DDBMS property that ensures database transactions will maintain the distributed database’s integrity and consistency, and that a transaction will be completed only when all database sites involved complete their part of the transaction.
failure transparency: A feature that allows continuous operation of a DDBMS, even if a network node fails.
performance transparency: A DDBMS feature that allows a system to perform as though it were a centralized DBMS.
heterogeneity transparency: A feature that allows a system to integrate several centralized DBMSs into one logical DDBMS.
fragmentation transparency: A DDBMS feature that allows a system to treat a distributed database as a single database even though it is divided into two or more fragments.
location transparency: A property of a DDBMS in which database access requires the user to know only the name of the database fragments. (Fragment locations need not be known.)
To illustrate the use of various transparency levels, suppose you have an EMPLOYEE
table that contains the attributes EMP_NAME, EMP_DOB, EMP_ADDRESS, EMP_DEPARTMENT,
and EMP_SALARY. The EMPLOYEE data is distributed over three different
locations: New York, Atlanta, and Miami. The table is divided by location; that is, New
York employee data is stored in fragment E1, Atlanta employee data is stored in
fragment E2, and Miami employee data is stored in fragment E3 (see Figure 12.8).
Now suppose that the end user wants to list all employees born before January 1,
1960. To focus on the transparency issues, also suppose that the EMPLOYEE table is
fragmented and each fragment is unique. The unique fragment condition indicates that
each row is unique, regardless of the fragment in which it is located. Finally, assume
that no portion of the database is replicated at any other site on the network.
Depending on the level of distribution transparency support, you may examine
three query cases.
e query conforms to a nondistributed database query format; that is, it does not s pecify
fragment names or locations. e query reads:
SELECT *
FROM EMPLOYEE
WHERE EMP_DOB < '01-JAN-1979';
Fragment names must
be speci ed in the query, but the fragment’s location is not speci ed. e query reads:
SELECT *
FROM E1
WHERE UNION
EMP_DOB < '01-JAN-1979'
SELECT
*
FROM
E2
local mapping
transparency
A property of a DDBMS
in which database
access requires the user
to know both the name
and location of the
fragments.
unique fragment
In a DDBMS, a condition
in which each row is
unique, regardless of
which fragment it is
located in.
lOMoARcPSD|50032646
WHERE UNION
EMP_DOB < '01-JAN-1979'
SELECT
*
FROM
E3
WHERE
EMP_DOB < '01-JAN-1979'
Both the fragment name and its location must be speci ed in the query. Using pseudo-SQL:
SELECT *
FROM E1 NODE NY
WHERE EMP_DOB < '01-JAN-1979'
UNION
SELECT *
FROM E2 NODE ATL
WHERE EMP_DOB < '01-JAN-1979'
UNION
SELECT *
FROM E3 NODE MIA
WHERE EMP_DOB < '01-JAN-1979';
As you examine the preceding query formats, you can see how distribution transparency affects the way end users and programmers interact with the database.
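The three cases can be reproduced with Python's built-in sqlite3 module, with one caveat: SQLite is not a DDBMS. In this sketch each attached in-memory database merely stands in for a site (the schema names ny, atl, and mia are hypothetical). A TEMP view plays the role of the fragmentation-transparent EMPLOYEE table. UNION ALL is enough inside the view because the fragments are unique. A schema-qualified name such as ny.E1 plays the role of the pseudo-SQL NODE clause. Dates are stored as ISO strings (YYYY-MM-DD) so that string comparison orders them correctly, unlike the '01-JAN-1979' literals above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema names; each attached in-memory database stands in for a site.
SITES = {"ny": "E1", "atl": "E2", "mia": "E3"}
for site in SITES:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {site}")
for site, frag in SITES.items():
    conn.execute(
        f"CREATE TABLE {site}.{frag} (EMP_NAME TEXT, EMP_DOB TEXT, "
        "EMP_ADDRESS TEXT, EMP_DEPARTMENT TEXT, EMP_SALARY REAL)"
    )

# Invented rows; ISO dates compare correctly as strings.
conn.executemany("INSERT INTO ny.E1 VALUES (?, ?, ?, ?, ?)", [
    ("Ann", "1958-04-12", "New York", "SALES", 52000),
    ("Dev", "1985-07-19", "New York", "IT", 57000),
])
conn.execute("INSERT INTO atl.E2 VALUES ('Bob', '1971-09-30', 'Atlanta', 'ACCOUNTING', 48000)")
conn.execute("INSERT INTO mia.E3 VALUES ('Carla', '1990-01-05', 'Miami', 'IT', 61000)")

# Case 1: fragmentation transparency. A TEMP view hides fragment names and sites.
conn.execute(
    "CREATE TEMP VIEW EMPLOYEE AS "
    "SELECT * FROM ny.E1 UNION ALL SELECT * FROM atl.E2 UNION ALL SELECT * FROM mia.E3"
)
case1 = conn.execute(
    "SELECT EMP_NAME FROM EMPLOYEE WHERE EMP_DOB < '1979-01-01' ORDER BY EMP_NAME"
).fetchall()

# Case 2: location transparency. Fragment names are given, but SQLite finds
# each unqualified table in whichever attached database holds it.
case2 = conn.execute(
    "SELECT EMP_NAME FROM E1 WHERE EMP_DOB < '1979-01-01' "
    "UNION SELECT EMP_NAME FROM E2 WHERE EMP_DOB < '1979-01-01' "
    "UNION SELECT EMP_NAME FROM E3 WHERE EMP_DOB < '1979-01-01' "
    "ORDER BY EMP_NAME"
).fetchall()

# Case 3: local mapping transparency. Fragment name and "site" are both given.
case3 = conn.execute(
    "SELECT EMP_NAME FROM ny.E1 WHERE EMP_DOB < '1979-01-01' "
    "UNION SELECT EMP_NAME FROM atl.E2 WHERE EMP_DOB < '1979-01-01' "
    "UNION SELECT EMP_NAME FROM mia.E3 WHERE EMP_DOB < '1979-01-01' "
    "ORDER BY EMP_NAME"
).fetchall()

print(case1)                    # [('Ann',), ('Bob',)]
print(case1 == case2 == case3)  # True
```

All three formulations return the same rows. What changes is how much of the distribution the query writer must know.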
Distribution transparency is supported by a distributed data dictionary (DDD) or a
distributed data catalog (DDC). The DDC contains the description of the entire database as
seen by the database administrator. The database description, known as the distributed
global schema, is the common database schema used by local TPs to translate user
requests into subqueries (remote requests) that will be processed by different DPs. The
DDC is itself distributed, and it is replicated at the network nodes. Therefore, the DDC must
maintain consistency through updating at all sites.
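The following Python sketch gives a rough illustration of the two DDC jobs described above. It keeps a hypothetical catalog entry that maps the global EMPLOYEE table to its fragments and nodes. It uses that entry the way a TP would, turning one global request into per-node subqueries. It also applies a catalog change to every replica. The node names, the move_fragment operation, and the string-built SQL are assumptions for illustration only. A real DDC also stores schema details and must coordinate concurrent catalog updates.

```python
import copy

# Hypothetical DDC entry: global table -> (fragment, node) pairs.
DDC = {"EMPLOYEE": [("E1", "NY"), ("E2", "ATL"), ("E3", "MIA")]}


def decompose(ddc, table, predicate):
    """Translate a global request into one subquery per fragment, tagged with its node."""
    return [(node, f"SELECT * FROM {fragment} WHERE {predicate}")
            for fragment, node in ddc[table]]


# The DDC is replicated at every node, so each node holds its own copy.
replicas = {node: copy.deepcopy(DDC) for node in ("NY", "ATL", "MIA")}


def move_fragment(replicas, table, fragment, new_node):
    """Record a fragment's new location in every replica so the copies stay consistent."""
    for ddc in replicas.values():
        ddc[table] = [(f, new_node if f == fragment else n) for f, n in ddc[table]]


for node, sql in decompose(replicas["NY"], "EMPLOYEE", "EMP_DOB < '01-JAN-1979'"):
    print(node, "<-", sql)

move_fragment(replicas, "EMPLOYEE", "E3", "ATL")
print(all(r == replicas["NY"] for r in replicas.values()))  # True
```

Updating only one replica would leave the nodes disagreeing about where E3 lives, and their TPs would route subqueries differently.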
Keep in mind that some of the current DDBMS implementations impose limitations on
the level of transparency support. For instance, you might be able to distribute a database,
but not a table, across multiple sites. Such a condition indicates that the DDBMS supports
location transparency but not fragmentation transparency.
12-9 Transaction Transparency
Transaction transparency is a DDBMS property that ensures database
transactions will maintain the distributed database’s integrity and
consistency. Remember that a DDBMS database transaction can update
data stored in many different computers connected in a network.
Transaction transparency ensures that the transaction will be
completed only when all database sites involved in the transaction complete their part
of the transaction.
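The all-or-nothing rule can be sketched with two independent SQLite databases standing in for two sites. The coordinator below is a hypothetical helper, not a DDBMS API. It commits at every site only if every site finished its part; otherwise it rolls back everywhere. The sketch is deliberately naive: a failure between the individual COMMITs could still leave the sites inconsistent. Atomic commit protocols such as two-phase commit exist to close that gap.

```python
import sqlite3


def open_site(*statements):
    """Hypothetical site: an in-memory SQLite database under manual transaction control."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    for stmt in statements:
        conn.execute(stmt)
    return conn


def run_everywhere_or_nowhere(parts):
    """parts: list of (connection, [sql, ...]). Returns True if committed at all sites."""
    for conn, _ in parts:
        conn.execute("BEGIN")
    try:
        for conn, statements in parts:
            for stmt in statements:
                conn.execute(stmt)
    except sqlite3.Error:
        for conn, _ in parts:
            conn.execute("ROLLBACK")  # undo the work already done at every site
        return False
    for conn, _ in parts:
        conn.execute("COMMIT")  # naive: not atomic if a site fails right here
    return True


site_a = open_site(
    "CREATE TABLE PRODUCT (PROD_NUM TEXT PRIMARY KEY, "
    "PROD_QTY INTEGER CHECK (PROD_QTY >= 0))",
    "INSERT INTO PRODUCT VALUES ('P1', 1)",
)
site_b = open_site("CREATE TABLE INVOICE (INV_NUM INTEGER PRIMARY KEY, PROD_NUM TEXT)")

# Site B records the invoice first, then Site A decrements stock.
sale = [
    (site_b, ["INSERT INTO INVOICE (PROD_NUM) VALUES ('P1')"]),
    (site_a, ["UPDATE PRODUCT SET PROD_QTY = PROD_QTY - 1 WHERE PROD_NUM = 'P1'"]),
]

print(run_everywhere_or_nowhere(sale))  # True: both sites completed their part
print(run_everywhere_or_nowhere(sale))  # False: stock would go negative at Site A
print(site_b.execute("SELECT COUNT(*) FROM INVOICE").fetchone()[0])  # 1
```

On the second attempt Site B has already inserted an invoice when Site A fails. The rollback removes that invoice, so only the first sale is recorded.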
Distributed database systems require complex mechanisms to manage transactions
and ensure the database’s consistency and integrity. To understand how the
transactions are managed, you should know the basic concepts governing remote
requests, remote transactions, distributed transactions, and distributed requests.
distributed data dictionary (DDD)
See distributed data catalog.
distributed data catalog (DDC)
A data dictionary that contains the description (fragment names and locations) of a distributed database.
distributed global schema
The database schema description of a distributed database as seen by the database administrator.
Whether or not a transaction is distributed, it is formed by one or more database requests. The basic difference between a nondistributed transaction and a distributed transaction is that the distributed transaction can update or request data from several different remote sites on a network. To better understand distributed transactions, begin by learning the difference between remote and distributed transactions, using the BEGIN WORK and COMMIT WORK transaction format. Assume the existence of location transparency to avoid having to specify the data location.
A remote request, illustrated in Figure 12.9, lets a single SQL statement access the
data that are to be processed by a single remote database processor. In other words,
the SQL statement (or request) can reference data at only one remote site.
Similarly, a remote transaction, composed of several requests, accesses data at a single remote site. A remote transaction is illustrated in Figure 12.10. As you examine Figure 12.10, note the following remote transaction features:
The transaction updates the PRODUCT and INVOICE tables (located at Site B).
The remote transaction is sent to the remote Site B and executed there.
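A remote transaction is simpler than the distributed case. Because every request references Site B, the TP can ship the whole unit of work to Site B, whose DP runs it as one local transaction. The sketch below models that with a hypothetical Site class wrapped around an SQLite database. The table columns and values are invented, and SQLite's BEGIN and COMMIT stand in for the BEGIN WORK and COMMIT WORK format used in the text.

```python
import sqlite3


class Site:
    """Hypothetical DP at one site; executes a whole transaction it receives."""

    def __init__(self, *ddl):
        self.conn = sqlite3.connect(":memory:", isolation_level=None)
        for stmt in ddl:
            self.conn.execute(stmt)

    def execute_transaction(self, statements):
        self.conn.execute("BEGIN")  # plays the role of BEGIN WORK
        try:
            for stmt in statements:
                self.conn.execute(stmt)
        except sqlite3.Error:
            self.conn.execute("ROLLBACK")
            raise
        self.conn.execute("COMMIT")  # plays the role of COMMIT WORK


site_b = Site(
    "CREATE TABLE PRODUCT (PROD_NUM TEXT PRIMARY KEY, PROD_QTY INTEGER)",
    "CREATE TABLE INVOICE (INV_NUM INTEGER PRIMARY KEY, CUS_NUM TEXT, INV_TOTAL REAL)",
    "INSERT INTO PRODUCT VALUES ('P100', 10)",
)

# Both requests reference Site B only, so the transaction is shipped there as a unit.
remote_transaction = [
    "UPDATE PRODUCT SET PROD_QTY = PROD_QTY - 1 WHERE PROD_NUM = 'P100'",
    "INSERT INTO INVOICE (CUS_NUM, INV_TOTAL) VALUES ('C10', 120.00)",
]
site_b.execute_transaction(remote_transaction)
print(site_b.conn.execute("SELECT PROD_QTY FROM PRODUCT").fetchone()[0])  # 9
```

Since only one DP is involved, that site's ordinary local transaction is enough to make the update and the insert succeed or fail together.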
1 The details of distributed requests and transactions were originally described by David McGoveran and Colin White, “Clarifying client/server,” DBMS 3(12), November 1990, pp. 78-89.
remote request
A DDBMS feature that allows a single SQL statement to access data in a single remote DP.
remote transaction
A DDBMS feature that allows a transaction (formed by several requests) to access data in a single remote DP.
A distributed transaction can reference several different local or remote DP sites. Although each single request can reference only one local or remote DP site, the transaction as a whole can reference multiple DP sites because each request can reference a different site. The distributed transaction process is illustrated in Figure 12.11.
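The request and transaction types introduced in this section can be told apart by which DP sites each request references. The hypothetical classify function below encodes that rule as a sketch. It represents each request only by the set of sites it touches. It treats a transaction that contains a single statement spanning several sites as involving a distributed request, following the remote/distributed taxonomy cited above. It also ignores the local/remote distinction.

```python
def classify(transaction):
    """Classify a transaction given as a list of requests, each a set of DP sites.

    Hypothetical helper: follows the remote/distributed taxonomy in the text.
    """
    if not transaction:
        raise ValueError("a transaction needs at least one request")
    if any(len(sites) > 1 for sites in transaction):
        return "distributed request"  # one statement spans several sites
    if len(set().union(*transaction)) > 1:
        return "distributed transaction"  # each request hits one site, but sites differ
    if len(transaction) == 1:
        return "remote request"  # a single statement at a single site
    return "remote transaction"  # several statements, all at the same site


print(classify([{"B"}]))                # remote request
print(classify([{"B"}, {"B"}]))         # remote transaction
print(classify([{"A"}, {"B"}, {"C"}]))  # distributed transaction
print(classify([{"B", "C"}]))           # distributed request
```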
Note the following features in Figure 12.11:
lOMoARcPSD| 50032646 Downloaded by Huyen Thu (hth11@gmail.com) MoARcPSD| 50032646
12-1 T he Evolution of Distributed Database Management Systems
A distributed database management system (DDBMS) governs the storage and
processing of logically related data over interconnected computer systems in which
both data and processing are distributed among several sites. To understand how and
why the DDBMS is di erent from the DBMS, it is useful to brie y examine the changes
in the business environment that set the stage for the development of the DDBMS.
During the 1970s, corporations implemented centralized database management
systems to meet their structured information needs. e use of a centralized database
required that corporate data be stored in a single central site, usually a mainframe c
omputer. Data access was provided through dumb terminals. e centralized approach,
illustrated in Figure 12.1, worked well to ll the structured information needs of
corporations, but it fell short when quickly moving events required faster response
times and equally quick access to information. e slow progression from information
request to approval to specialist to user simply did not serve decision makers well in a
dynamic environment. What was needed was quick, unstructured access to databases,
using ad hoc queries to generate on-the-spot information.
e last two decades gave birth to a series of crucial social and technological changes
that a ected the nature of the systems and the data they use: lOMoARcPSD| 50032646
Business operations became global; with this change, competition expanded from the distributed database
shop on the next corner to the web store in cyberspace. management system (DDBMS)
Customer demands and market needs favored an on-demand transaction style, A DBMS that supports
mostly based on web-based services. a database distributed across several di erent
Rapid social and technological changes fueled by low-cost, smart mobile devices sites; a DDBMS
increased the demand for complex and fast networks to interconnect them. As a governs the storage
consequence, corporations have increasingly adopted advanced network and processing of
technologies as the platform for their computerized solutions. See Chapter 15, logically related data
Database Connectivity and Web Technologies, for a discussion of cloud-based over interconnected services. computer systems in which both data and
Data realms are converging in the digital world more frequently. As a result, processing functions
applications must manage multiple types of data, such as voice, video, music, and are distributed among
images. Such data tends to be geographically distributed and remotely accessed from several sites.
diverse locations via location-aware mobile devices.
e advent of social media as a way to reach new customers and
open new markets has fueled the need to store large amounts of
digital data and created a revolution in the way data is managed
and mined for knowledge. Businesses are looking for new ways to gain business
intelligence through the analysis of vast stores of structured and unstructured data.
ese factors created a dynamic business environment in which companies had to
respond quickly to competitive and technological pressures. As large business units
restructured to form leaner, quickly reacting, dispersed operations, two database requirements became obvious:
Rapid ad hoc data access became crucial in the quick-response decision-making environment.
Distributed data access was needed to support geographically dispersed business units.
During recent years, these factors became even more rmly entrenched. However, the
way they were addressed was strongly in uenced by the following factors:
e growing acceptance of the Internet as the platform for data access and
distribution. e web is e ectively the repository for distributed data.
e mobile wireless revolution. e widespread use of mobile wireless digital devices
includes smartphones and tablets. ese devices have created high demand for data
access. ey access data from geographically dispersed locations and require varied data
exchanges in multiple formats, such as data, voice, video, music, and pictures.
Although distributed data access does not necessarily imply distributed databases,
performance and failure tolerance requirements o en lead to the use of data
replication techniques similar to those in distributed databases.
e accelerated growth of companies using “applications as a service.” is new type of
service provides remote applications to companies that want to outsource their
application development, maintenance, and operations. e company data is
generally stored on central servers and is not necessarily distributed. Just as with
mobile data access, this type of service may not require fully distributed data
functionality; however, other factors such as performance and failure tolerance o
en require the use of data replication techniques similar to those in distributed databases.
e increased focus on mobile business intelligence. More and more companies are
embracing mobile technologies within their business plans. As companies use social
networks to get closer to customers, the need for on-the-spot decision making
increases. Although a data warehouse is not usually a distributed database, it does MoARcPSD| 50032646
rely on techniques such as data replication and distributed queries that facilitate
data extraction and integration. (You will learn more about this topic in Chapter 13,
Business Intelligence and Data Warehouses.)
Emphasis on Big Data analytics. e era of mobile communications unraveled an
avalanche of data from many sources and of many types. Today’s customers have
signi cant in uence on the spending habits of communities, and organizations are
investing in ways to harvest such data to “discover” new ways to e ectively and e ciently reach customers.
At this point, the long-term impact of the Internet and the mobile revolution on
distributed database design and management is just starting to be felt. Perhaps the
success of the Internet and mobile technologies will foster the use of distributed
databases as bandwidth becomes a less troublesome bottleneck. Perhaps the resolution
of bandwidth problems will simply con rm the centralized database standard. In any case,
distributed database concepts and components are likely to nd a place in future database development,
particularly for specialized mobile and location-aware applications.
e distributed database is especially desirable because centralized database manage-
ment is subject to problems such as:
Performance degradation because of a growing number of remote locations over greater distances.
High costs associated with maintaining and operating large central (mainframe)
database systems and physical infrastructure.
Reliability problems created by dependence on a central site (single point of failure
syndrome) and the need for data replication.
Scalability problems associated with the physical limits imposed by a single location,
such as physical space, temperature conditioning, and power consumption.
Organizational rigidity imposed by the database, which means it might not support
the exibility and agility required by modern global organizations.
e dynamic business environment and the centralized database’s shortcomings
spawned a demand for applications based on accessing data from di erent sources at
multiple locations. Such a multiple-source/multiple-location database environment is best managed by a DDBMS.
12-2 DDBMS Advantages and Disadvantages
Distributed database management systems deliver several advantages over traditional
systems. At the same time, they are subject to some problems. Table 12.1 summarizes
the advantages and disadvantages associated with a DDBMS.
Distributed databases are being used successfully in many web staples such as
Google and Amazon, but they still have a long way to go before they yield the full
exibility and power they theoretically possess.
e remainder of this chapter explores the basic components and concepts of the
distributed database. Because the distributed database is usually based on the
relational database model, relational terminology is used to explain the basic concepts
and components. Even though some of the most widely used distributed databases are
part of the NoSQL movement (see Chapter 2, Data Models), the basic concepts and
fundamentals of distributed data still apply to them. lOMoARcPSD| 50032646
12-3 Distributed Processing and Distributed Databases
In distributed processing, a database’s logical processing is shared among two or more physically
independent sites that are connected through a network. For example, the data
distributed processing input/output (I/O), data selection, and data validation might be performed on one Sharing the logical
processing of a database computer, and a report based on that data might be created on another computer. over two or more sites
A basic distributed processing environment is illustrated in Figure 12.2, which shows
connected by a network. that a distributed processing system shares the database processing chores among three distributed database
sites connected through a communications network. Although the database resides at only A logically
related one site (Miami), each site can access the data and update the database. e database is database that is stored in
two or more physically located on Computer A, a network computer known as the database server. independent sites.
A distributed database, on the other hand, stores a logically related database over two
or more physically independent sites. e sites are connected via a computer
DISTRIBUTED DBMS ADVANTAGES AND DISADVANTAGES ADVANTAGES DISADVANTAGES Security.
Increased storage and infrastructure requirements. MoARcPSD| 50032646 database fragment A subset of a distributed database. Although the fragments may be stored at di erent sites within a computer network, the set of all fragments is treated as a single database. See also horizontal fragmentation and vertical fragmentation. Processor independence.
network. In contrast, the distributed processing system uses only a single-site database but shares the
processing chores among several sites. In a distributed database system, a database is composed of
several parts known as database fragments. e database fragments are located at di erent sites and can be
replicated among various sites. Each database fragment is, in turn, managed by its local database process.
An example of a distributed database environment is shown in Figure 12.3.
e database in Figure 12.3 is divided into three database fragments (E1, E2, and E3)
located at di erent sites. e computers are connected through a network system. In a fully distributed
database, the users Alan, Betty, and Hernando do not need to know the name or location of each database
fragment in order to access the database. Also, the lOMoARcPSD| 50032646
users might be at sites other than Miami, New York, or Atlanta and still be able to
access the database as a single logical unit.
As you examine Figures 12.2 and 12.3, keep the following points in mind:
Distributed processing does not require a distributed database, but a distributed
database requires distributed processing. (Each database fragment is managed by
its own local database process.)
Distributed processing may be based on a single database located on a single
computer. For the management of distributed data to occur, copies or parts of the
database processing functions must be distributed to all data storage sites.
Both distributed processing and distributed databases require a network of interconnected components.
12-4 Characteristics of Distributed Database Management Systems
A DDBMS governs the storage and processing of logically related data over
interconnected computer systems in which both data and processing functions are
distributed among several sites. A DBMS must have at least the following functions to be classi ed as distributed:
Application interface to interact with the end user, application programs, and other
DBMSs within the distributed database
Validation to analyze data requests for syntax correctness
Transformation to decompose complex requests into atomic data request components
Query optimization to nd the best access strategy (which database fragments must be
accessed by the query, and how must data updates, if any, be synchronized?)
Mapping to determine the data location of local and remote fragments
I/O interface to read or write data from or to permanent local storage
Formatting to prepare the data for presentation to the end user or to an application program
Security to provide data privacy at both local and remote databases
Backup and recovery to ensure the availability and recoverability of the database in case of a failure
DB administration features for the database administrator
Concurrency control to manage simultaneous data access and to ensure data consistency
across database fragments in the DDBMS
Transaction management to ensure that the data moves from one consistent state
to another; this activity includes the synchronization of local and remote
transactions as well as transactions across multiple distributed segments
A fully distributed database management system must perform all of the functions of
a centralized DBMS, as follows:
1. Receive the request of an application or end user.
2. Validate, analyze, and decompose the request. e request might include m athematical and logical
operations such as the following: Select all customers with a balance greater than $1,000. e request
might require data from only a single table, or it might require access to several tables.
3. Map the request’s logical-to-physical data components.
4. Decompose the request into several disk I/O operations.
5. Search for, locate, read, and validate the data.
6. Ensure database consistency, security, and integrity.
7. Validate the data for the conditions, if any, speci ed by the request.
8. Present the selected data in the required format.
In addition, a distributed DBMS must handle all necessary functions imposed by the
distribution of data and processing, and it must perform those additional functions
transparently to the end user. e DDBMS’s transparent data access features are illustrated in Figure 12.4. lOMoARcPSD| 50032646
e single logical database in Figure 12.4 consists of two database fragments, A1 and A2, located at
Sites 1 and 2, respectively. Mary can query the database as if it were a local database; so
transaction processor can Tom. Both users “see” only one logical database and do not need to know the names (TP)
of the fragments. In fact, the end users do not even need to know that the d atabase is In a DDBMS, the
divided into fragments, nor do they need to know where the fragments are located. software component on each computer that
To better understand the di erent types of distributed database scenarios, rst consider requests data. The TP is
the components of the distributed database system. responsible for the execution and coordination of all
database requests issued 12-5 DDBMS Components by a local application
e DDBMS must include at least the following components: that accesses data on any DP. Also called
Computer workstations or remote devices (sites or nodes) that form the network system. transaction manager
e distributed database system must be independent of the computer system hardware. (TM) or application processor (AP).
Network hardware and so ware components that reside in each workstation or device. e
application processor network components allow all sites to interact and exchange data. Because the (AP) See transaction
components—computers, operating systems, network hardware, and so on—are likely to processor (TP).
be supplied by di erent vendors, it is best to ensure that distributed database functions transaction manager
can be run on multiple platforms. (TM)
Communications media that carry the data from one node to another. e DDBMS must be See transaction processor (TP).
communications media-independent; that is, it must be able to support several types of data processor (DP) communications media. The resident software
e transaction processor (TP) is the so ware component found in each computer or device component that stores and retrieves data
that requests data. e transaction processor receives and processes the application’s through a DDBMS. The
remote and local data requests. e TP is also known as the application processor (AP) or the DP is responsible for transaction manager (TM). managing the local data in the computer and
e data processor (DP) is the so ware component residing on each computer or device that coordinating access to
stores and retrieves data located at the site. e DP is also known as the data manager (DM). that data. Also known as data manager (DM).
A data processor may even be a centralized DBMS. data manager (DM)
Figure 12.5 illustrates the placement of the components and the
See data processor (DP). interaction among them. e communication among TPs and DPs is made
possible through a speci c set of rules, or protocols, used by the DDBMS. lOMoARcPSD| 50032646
e protocols determine how the distributed database system will:
Interface with the network to transport data and commands between DPs and TPs.
Synchronize all data received from DPs (TP side) and route retrieved data to the appropriate TPs (DP side).
Ensure common database functions in a distributed system. Such functions include
data security, transaction management and concurrency control, data partitioning and
synchronization, and data backup and recovery.
DPs and TPs should be added to the system transparently without a ecting its
operation. A TP and a DP can reside on the same computer, allowing the end user to
access both local and remote data transparently. In theory, a DP can be an
independent centralized DBMS with proper interfaces to support remote access from
other independent DBMSs in the network.
12-6 Levels of Data and Process Distribution
Current database systems can be classi ed on the basis of how process distribution and data distribution
are supported. For example, a DBMS may store data in a single site (using a centralized DB) or in multiple
sites (using a distributed DB), and it may support data processing at one or more sites.
Table 12.2 uses a simple matrix to classify database systems according to data and process single-site processing,
distribution. ese types of processes are discussed in the sections that follow. s ingle-site data (SPSD)
In the single-site A scenario in which all processing is done on a
processing, single- single host computer
site data (SPSD) scenario, all processing is done on a single host computer, and all data is and all data is stored
stored on the host computer’s local disk system. on the host computer’s local disk.
DATABASE SYSTEMS: LEVELS OF DATA AND PROCESS DISTRIBUTION ADVANTAGES SINGLE SITE DATA MULTIPLE SITE DATA e DBMS is on the host
Using Figure 12.6 as an example, you can see that the functions of the TP and DP
are embedded within the DBMS on the host computer. e DBMS usually runs under a
time-sharing, multitasking operating system, which allows several processes to run
concurrently on a host computer accessing a single DP. All data storage and data
processing are handled by a single host computer.
Under the multiple-site processing, single-site data (MPSD) scenario, multiple processes run on di erent
computers that share a single data repository. Typically, the MPSD scenario requires a network le server
running conventional applications that are accessed through a network. Many multiuser accounting
applications running under a personal computer network t such a description (see Figure 12.7). As you
examine Figure 12.7, note that: multiple-site processing, single-
e TP on each workstation acts only as a redirector to route all network data requests site data (MPSD) to the le server. A scenario in which
e end user sees the le server as just another hard disk. Because only the data multiple processes run
storage input/output (I/O) is handled by the le server’s computer, the MPSD o ers on di erent computers sharing a single data
limited capabilities for distributed processing. repository. lOMoARcPSD| 50032646 client/server architecture A hardware and software system composed of clients, servers, and middleware. Features a user of resources communication costs. (client) and a provider of resources (server).
e ine ciency of the last condition can be illustrated easily. For example, suppose that multiple-site p
the le server computer stores a CUSTOMER table containing 100,000 data rows, 50 of rocessing, multiple-
which have balances greater than $1,000. Suppose that Site A issues the following SQL site data (MPMD) query: A scenario describing a fully distributed SELECT * database management FROM CUSTOMER
All 100,000 CUSTOMER rows must travel system WHERE CUS_BALANCE > 1000; with support for multiple data processors and transaction processors at multiple sites. homogeneous DDBMS A system that integrates only one type of centralized database management system over a network. heterogeneous DDBMS A system that integrates di erent types of centralized database management systems over a network. fully h eterogeneous distributed d atabase system (fully h eterogeneous DDBMS)
through the network to be evaluated at Site A. A variation of the multiple-site processing, A system that integrates di erent types of
single-site data approach is known as client/server architecture. Client/server architecture database
is similar to that of the network le server except that all database processing is done at the management systems
server site, thus reducing network tra c. Although both the network le server and the (hierarchical,
client/server systems perform multiple-site processing, the client/server system’s network, and relational) over a
processing is distributed. Note that the network le server approach requires the database network. It supports di
to be located at a single site. In contrast, the client/server architecture is capable of erent database
supporting data at multiple sites. management systems that may even support
e multiple-site processing, multiple-site data (MPMD) scenario describes a fully distributed di erent data models
DBMS with support for multiple data processors and transaction processors at multiple running under di erent computer systems.
sites. Depending on the level of support for various types of databases, DDBMSs are classi
ed as either homogeneous or heterogeneous.
Homogeneous DDBMSs integrate multiple instances of the same DBMS over a network—for example,
multiple instances of Oracle 11g running on di erent platforms. In contrast, heterogeneous DDBMSs
integrate di erent types of DBMSs over a network, but all support the same data model. For example,
Table 12.3 lists several relational database systems that could be integrated within a DDBMS. A fully
heterogeneous DDBMS will support di erent DBMSs, each one supporting a di erent data model, r unning
under di erent computer systems.
DATABASE SYSTEMS: LEVELS OF DATA AND PROCESS DISTRIBUTION PLATFORM DBMS OPERATING SYSTEM
NETWORK COMMUNICATIONS PROTOCOL
Distributed database implementations are better understood as an abstraction
layer on top of a DBMS. is abstraction layer provides additional functionality that
enables support for distributed database features, including straightforward data
links, replication, advanced data fragmentation, synchronization, and integration. In
fact, most database vendors provide for increasing levels of data fragmentation,
replication, and integration. erefore, the support for distributed databases can be
better seen as a continuous spectrum that goes from homogeneous to fully
heterogeneous distributed data management. Consequently, at any point on this
spectrum, a DDBMS is subject to certain restrictions. For example:
Remote access is provided on a read-only basis and does not support write privileges.
Restrictions are placed on the number of remote tables that may be accessed in a single transaction.
Restrictions are placed on the number of distinct databases that may be accessed.
Restrictions are placed on the database model that may be accessed. us, access
may be provided to relational databases but not to network or hierarchical databases.
e preceding list of restrictions is by no means exhaustive. e DDBMS t echnology
continues to change rapidly, and new features are added frequently. Managing data at
multiple sites leads to a number of issues that must be addressed and understood. e
next section examines several key features of distributed database management systems.
12-7 Distributed Database Transparency Features
A distributed database system should provide some desirable transparency features
that make all the system’s complexities hidden to the end user. In other words, the
end user should have the sense of working with a centralized DBMS. For this reason,
the minimum desirable DDBMS transparency features are:
Distribution transparency allows a distributed database to be treated as a single logical database. If a
DDBMS exhibits distribution transparency, the user does not need to know: distribution
– e data is partitioned—meaning the table’s rows and columns are split vertically or transparency
horizontally and stored among multiple sites. A DDBMS feature that allows a distributed
– e data is geographically dispersed among multiple sites. database to look like a
– e data is replicated among multiple sites. single logical database to an end user. lOMoARcPSD| 50032646
Transaction transparency allows a transaction to update data at more than one network site. Transaction transparency ensures that the transaction will be either entirely completed or aborted, thus maintaining database integrity.

Failure transparency ensures that the system will continue to operate in the event of a node or network failure. Functions that were lost because of the failure will be picked up by another network node. This is a very important feature, particularly in organizations that depend on web presence as the backbone for maintaining trust in their business.

Performance transparency allows the system to perform as if it were a centralized DBMS. The system will not suffer any performance degradation due to its use on a network or because of the network's platform differences. Performance transparency also ensures that the system will find the most cost-effective path to access remote data. The system should be able to "scale out" transparently. In other words, it should increase its performance capacity by adding more transaction or data-processing nodes without affecting the overall performance of the system.

Heterogeneity transparency allows the integration of several different local DBMSs (relational, network, and hierarchical) under a common, or global, schema. The DDBMS is responsible for translating the data requests from the global schema to the local DBMS schema. The following sections discuss each of these transparency features in greater detail.

transaction transparency: A DDBMS property that ensures database transactions will maintain the distributed database's integrity and consistency, and that a transaction will be completed only when all database sites involved complete their part of the transaction.
failure transparency: A feature that allows continuous operation of a DDBMS, even if a network node fails.
performance transparency: A DDBMS feature that allows a system to perform as though it were a centralized DBMS.
heterogeneity transparency: A feature that allows a system to integrate several centralized DBMSs into one logical DDBMS.

12-8 Distribution Transparency
Distribution transparency allows a physically dispersed database to be managed as though it were a centralized database. The level of transparency supported by the DDBMS varies from system to system. Three levels of distribution transparency are recognized:
Fragmentation transparency is the highest level of distribution transparency. The end user or programmer does not need to know that a database is partitioned. Therefore, neither fragment names nor fragment locations are specified prior to data access.

Location transparency exists when the end user or programmer must specify the database fragment names but does not need to specify where those fragments are located.

Local mapping transparency exists when the end user or programmer must specify both the fragment names and their locations.

fragmentation transparency: A DDBMS feature that allows a system to treat a distributed database as a single database even though it is divided into two or more fragments.
location transparency: A property of a DDBMS in which database access requires the user to know only the name of the database fragments. (Fragment locations need not be known.)

Transparency features are summarized in Table 12.4.
TABLE 12.4 SUMMARY OF TRANSPARENCY FEATURES

IF THE SQL STATEMENT REQUIRES:
FRAGMENT NAME?  LOCATION NAME?  THEN THE DBMS SUPPORTS      LEVEL OF DISTRIBUTION TRANSPARENCY
Yes             Yes             Local mapping               Low
Yes             No              Location transparency       Medium
No              No              Fragmentation transparency  High
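Table 12.4 amounts to a simple decision rule. The Python sketch below encodes that rule. The `Transparency` enum and the `transparency_level` function are hypothetical names introduced for illustration, and the High/Medium/Low values mirror the table's last column.

```python
from enum import Enum


class Transparency(Enum):
    """Levels of distribution transparency, as summarized in Table 12.4."""
    FRAGMENTATION = "High"
    LOCATION = "Medium"
    LOCAL_MAPPING = "Low"


def transparency_level(needs_fragment_name: bool, needs_location_name: bool) -> Transparency:
    """Return the level implied by what an SQL statement must name."""
    if needs_location_name and not needs_fragment_name:
        # Naming a location without naming the fragment is not a row of Table 12.4.
        raise ValueError("a location cannot be named without naming the fragment")
    if needs_location_name:
        return Transparency.LOCAL_MAPPING  # fragment name and location both required
    if needs_fragment_name:
        return Transparency.LOCATION       # fragment name only
    return Transparency.FRAGMENTATION      # neither is required


print(transparency_level(False, False).value)  # High
```

The enum keeps the three levels in one place. The function rejects the one combination the table does not list, so it does not silently guess.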
To illustrate the use of various transparency levels, suppose you have an EMPLOYEE table that contains the attributes EMP_NAME, EMP_DOB, EMP_ADDRESS, EMP_DEPARTMENT, and EMP_SALARY. The EMPLOYEE data is distributed over three different locations: New York, Atlanta, and Miami. The table is divided by location; that is, New York employee data is stored in fragment E1, Atlanta employee data is stored in fragment E2, and Miami employee data is stored in fragment E3 (see Figure 12.8).
Now suppose that the end user wants to list all employees born before January 1, 1979. To focus on the transparency issues, also suppose that the EMPLOYEE table is fragmented and each fragment is unique. The unique fragment condition indicates that each row is unique, regardless of the fragment in which it is located. Finally, assume that no portion of the database is replicated at any other site on the network. Depending on the level of distribution transparency support, you may examine three query cases.
Case 1: The Database Supports Fragmentation Transparency

The query conforms to a nondistributed database query format; that is, it does not specify fragment names or locations. The query reads:

    SELECT *
    FROM   EMPLOYEE
    WHERE  EMP_DOB < '01-JAN-1979';

Case 2: The Database Supports Location Transparency

Fragment names must be specified in the query, but the fragment's location is not specified. The query reads:

    SELECT *
    FROM   E1
    WHERE  EMP_DOB < '01-JAN-1979'
    UNION
    SELECT *
    FROM   E2
    WHERE  EMP_DOB < '01-JAN-1979'
    UNION
    SELECT *
    FROM   E3
    WHERE  EMP_DOB < '01-JAN-1979';

unique fragment: In a DDBMS, a condition in which each row is unique, regardless of which fragment it is located in.
local mapping transparency: A property of a DDBMS in which database access requires the user to know both the name and location of the fragments.
Case 3: The Database Supports Local Mapping Transparency

Both the fragment name and its location must be specified in the query. Using pseudo-SQL:

    SELECT *
    FROM   E1 NODE NY
    WHERE  EMP_DOB < '01-JAN-1979'
    UNION
    SELECT *
    FROM   E2 NODE ATL
    WHERE  EMP_DOB < '01-JAN-1979'
    UNION
    SELECT *
    FROM   E3 NODE MIA
    WHERE  EMP_DOB < '01-JAN-1979';

As you examine the preceding query formats, you can see how distribution transparency affects the way end users and programmers interact with the database.

distributed data dictionary (DDD): See distributed data catalog.
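The three query cases can be reproduced on a single machine. The sketch below is a simulation, not a real DDBMS, and it rests on several assumptions:

- It uses SQLite through Python's standard `sqlite3` module.
- Each site is an attached in-memory database with the hypothetical schema name `ny`, `atl`, or `mia`.
- Only the EMP_NAME and EMP_DOB columns are kept, filled with made-up rows.

Each case maps onto a SQLite feature:

- A TEMP view named EMPLOYEE stands in for fragmentation transparency.
- Unqualified fragment names, which SQLite resolves by searching the attached databases, stand in for location transparency.
- Schema-qualified names such as `ny.E1` stand in for the pseudo-SQL NODE clause, which is not SQLite syntax.

Dates are stored as ISO strings (YYYY-MM-DD) so that text comparison orders them correctly. The '01-JAN-1979' format would not compare correctly as text.

```python
import sqlite3

CUTOFF = "1979-01-01"  # ISO dates compare correctly as text

# Hypothetical sample rows (EMP_NAME, EMP_DOB) for each site's fragment.
FRAGMENTS = {
    "ny": ("E1", [("Ann", "1975-03-02"), ("Bob", "1985-07-19")]),
    "atl": ("E2", [("Cal", "1970-11-30")]),
    "mia": ("E3", [("Dee", "1990-01-05"), ("Eve", "1978-12-31")]),
}


def build_sites() -> sqlite3.Connection:
    """Simulate three sites as attached in-memory databases, one fragment each."""
    # Autocommit mode: ATTACH is not allowed inside an open transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    for site, (fragment, rows) in FRAGMENTS.items():
        conn.execute(f"ATTACH DATABASE ':memory:' AS {site}")
        conn.execute(f"CREATE TABLE {site}.{fragment} (EMP_NAME TEXT, EMP_DOB TEXT)")
        conn.executemany(f"INSERT INTO {site}.{fragment} VALUES (?, ?)", rows)
    # Case 1 stand-in: a global EMPLOYEE view hides fragment names and sites.
    # (SQLite lets only TEMP views reference tables in attached databases.)
    conn.execute(
        "CREATE TEMP VIEW EMPLOYEE AS "
        "SELECT * FROM ny.E1 UNION ALL SELECT * FROM atl.E2 UNION ALL SELECT * FROM mia.E3"
    )
    return conn


CASE_1 = "SELECT * FROM EMPLOYEE WHERE EMP_DOB < ?"
# Case 2 stand-in: fragment names only; SQLite finds each fragment's database.
CASE_2 = " UNION ".join(f"SELECT * FROM {f} WHERE EMP_DOB < ?" for f in ("E1", "E2", "E3"))
# Case 3 stand-in: schema-qualified names play the role of "NODE NY", etc.
CASE_3 = " UNION ".join(
    f"SELECT * FROM {s}.{f} WHERE EMP_DOB < ?"
    for s, f in (("ny", "E1"), ("atl", "E2"), ("mia", "E3"))
)


def run(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Execute a query with the cutoff bound to every placeholder; return sorted names."""
    params = (CUTOFF,) * sql.count("?")
    return sorted(name for name, _ in conn.execute(sql, params))


conn = build_sites()
for label, sql in (("Case 1", CASE_1), ("Case 2", CASE_2), ("Case 3", CASE_3)):
    print(label, run(conn, sql))  # each case prints ['Ann', 'Cal', 'Eve']
```

All three cases return the same employees. What changes is how much fragment and location detail the query writer must supply.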
Distribution transparency is supported by a distributed data dictionary (DDD) or a distributed data catalog (DDC). The DDC contains the description of the entire database as seen by the database administrator. The database description, known as the distributed global schema, is the common database schema used by local TPs to translate user requests into subqueries (remote requests) that will be processed by different DPs. The DDC is itself distributed, and it is replicated at the network nodes. Therefore, the DDC must maintain consistency through updating at all sites.

distributed data catalog (DDC): A data dictionary that contains the description (fragment names and locations) of a distributed database.
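To make the DDC's translation role concrete, here is a minimal sketch of a catalog lookup. It turns a request against the global EMPLOYEE table into one subquery per fragment, each tagged with the site whose DP would run it. The `Fragment` class, the `DDC` dictionary, and `to_subqueries` are hypothetical. A real catalog stores much more, and this sketch ignores replication entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str  # fragment name, e.g. "E1"
    site: str  # node whose DP stores the fragment, e.g. "NY"


# Hypothetical catalog content for the EMPLOYEE example: global table -> fragments.
DDC: dict[str, list[Fragment]] = {
    "EMPLOYEE": [Fragment("E1", "NY"), Fragment("E2", "ATL"), Fragment("E3", "MIA")],
}


def to_subqueries(table: str, condition: str) -> list[tuple[str, str]]:
    """Translate a global request into (site, remote request) pairs."""
    if table not in DDC:
        raise KeyError(f"{table} is not described in the catalog")
    return [(frag.site, f"SELECT * FROM {frag.name} WHERE {condition}")
            for frag in DDC[table]]


for site, sql in to_subqueries("EMPLOYEE", "EMP_DOB < '01-JAN-1979'"):
    print(site, sql)
```

Combining the partial results returned by each DP into one answer is left out here. In the sketch's terms, that step would be the local TP's job after the subqueries come back.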
Keep in mind that some of the current DDBMS implementations impose limitations on the level of transparency support. For instance, you might be able to distribute a database, but not a table, across multiple sites. Such a condition indicates that the DDBMS supports location transparency but not fragmentation transparency.

distributed global schema: The database schema description of a distributed database as seen by the database administrator.

12-9 Transaction Transparency
Transaction transparency is a DDBMS property that ensures database
transactions will maintain the distributed database’s integrity and
consistency. Remember that a DDBMS database transaction can update
data stored in many different computers connected in a network.
Transaction transparency ensures that the transaction will be
completed only when all database sites involved in the transaction complete their part of the transaction.
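The all-or-nothing guarantee can be illustrated with a simplified prepare/commit sketch. It follows the spirit of the two-phase commit protocol, a standard technique for this problem, but it is a toy with several assumptions:

- The `Site` class and the `run_transaction` function are hypothetical.
- Sites are in-process objects.
- Failures after a site votes yes, logging, and timeouts are all ignored.

```python
class Site:
    """Hypothetical DP that stages its part of a transaction until told to commit."""

    def __init__(self, name: str, fail_on_prepare: bool = False):
        self.name = name
        self.data: dict[str, int] = {}     # committed values
        self.pending: dict[str, int] = {}  # staged values, not yet visible
        self.fail_on_prepare = fail_on_prepare  # simulates a site that cannot commit

    def stage(self, key: str, value: int) -> None:
        self.pending[key] = value

    def prepare(self) -> bool:
        # Vote: can this site guarantee it will apply its part?
        return not self.fail_on_prepare

    def commit(self) -> None:
        self.data.update(self.pending)
        self.pending.clear()

    def abort(self) -> None:
        self.pending.clear()


def run_transaction(work: dict[Site, list[tuple[str, int]]]) -> bool:
    """Apply every site's part of the transaction, or none of them."""
    for site, updates in work.items():
        for key, value in updates:
            site.stage(key, value)
    # Phase 1 collects votes; phase 2 commits only if every site voted yes.
    if all(site.prepare() for site in work):
        for site in work:
            site.commit()
        return True
    for site in work:
        site.abort()  # any "no" vote discards the staged changes everywhere
    return False


a, b, c = Site("A"), Site("B"), Site("C", fail_on_prepare=True)
print(run_transaction({a: [("QOH", 10)], b: [("BALANCE", 50)]}))  # True
print(run_transaction({a: [("QOH", 99)], c: [("BALANCE", 1)]}))   # False
print(a.data, c.data)  # {'QOH': 10} {}
```

In the second call, Site A could have applied its update, but Site C's refusal causes both sites to discard their staged changes.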
Distributed database systems require complex mechanisms to manage transactions
and ensure the database’s consistency and integrity. To understand how the
transactions are managed, you should know the basic concepts governing remote
requests, remote transactions, distributed transactions, and distributed requests.
Whether or not a transaction is distributed, it is formed by one or more database requests. The basic difference between a nondistributed transaction and a distributed transaction is that the distributed transaction can update or request data from several different remote sites on a network. To better understand distributed transactions, begin by learning the difference between remote and distributed transactions, using the BEGIN WORK and COMMIT WORK transaction format. Assume the existence of location transparency to avoid having to specify the data location.
A remote request, illustrated in Figure 12.9, lets a single SQL statement access the data that are to be processed by a single remote database processor. In other words, the SQL statement (or request) can reference data at only one remote site.

Similarly, a remote transaction, composed of several requests, accesses data at a single remote site. A remote transaction is illustrated in Figure 12.10. As you examine Figure 12.10, note the following remote transaction features:
- The transaction updates the PRODUCT and INVOICE tables (located at Site B).
- The remote transaction is sent to the remote Site B and executed there.

remote request: A DDBMS feature that allows a single SQL statement to access data in a single remote DP.
remote transaction: A DDBMS feature that allows a transaction (formed by several requests) to access data in a single remote DP.

1 The details of distributed requests and transactions were originally described by David McGoveran and Colin White, "Clarifying client/server," DBMS 3(12), November 1990, pp. 78–89.
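A remote transaction can be sketched by treating an in-memory SQLite database as the DP at Site B. The sketch makes these assumptions:

- The column names (PROD_NUM, PROD_QOH, CUS_NUM, INV_TOTAL), the product number, and the CHECK constraint are hypothetical, chosen for illustration.
- SQLite spells the transaction boundaries BEGIN and COMMIT rather than BEGIN WORK and COMMIT WORK.

Both requests run at the same site inside one transaction, so a failure in the second request also undoes the first.

```python
import sqlite3


def make_site_b() -> sqlite3.Connection:
    """Hypothetical Site B DP holding the PRODUCT and INVOICE tables."""
    # isolation_level=None: transaction boundaries are issued explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE PRODUCT (PROD_NUM TEXT PRIMARY KEY, PROD_QOH INTEGER);
        CREATE TABLE INVOICE (INV_NUM INTEGER PRIMARY KEY,
                              CUS_NUM TEXT,
                              INV_TOTAL REAL CHECK (INV_TOTAL > 0));
        INSERT INTO PRODUCT VALUES ('231785', 5);
    """)
    return conn


def remote_transaction(conn: sqlite3.Connection, prod_num: str, qty: int,
                       cus_num: str, total: float) -> bool:
    """Run several requests at one site as a single unit of work."""
    try:
        conn.execute("BEGIN")
        conn.execute("UPDATE PRODUCT SET PROD_QOH = PROD_QOH - ? WHERE PROD_NUM = ?",
                     (qty, prod_num))
        conn.execute("INSERT INTO INVOICE (CUS_NUM, INV_TOTAL) VALUES (?, ?)",
                     (cus_num, total))
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")  # also undoes the earlier UPDATE
        return False


site_b = make_site_b()
remote_transaction(site_b, "231785", 2, "100", 120.0)  # both requests commit
remote_transaction(site_b, "231785", 1, "100", -5.0)   # INSERT fails; UPDATE rolled back
print(site_b.execute("SELECT PROD_QOH FROM PRODUCT").fetchone()[0])  # 3
print(site_b.execute("SELECT COUNT(*) FROM INVOICE").fetchone()[0])  # 1
```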
A distributed transaction can reference several different local or remote DP sites. Although each single request can reference only one local or remote DP site, the transaction as a whole can reference multiple DP sites because each request can reference a different site. The distributed transaction process is illustrated in Figure 12.11.
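The distinctions drawn so far depend only on two things: how many requests a unit of work contains, and which DP sites each request references. The hypothetical `classify` function below encodes them, representing each request as the set of site names it touches. A single request that spans several sites falls outside the three cases described above; the chapter calls that case a distributed request.

```python
def classify(requests: list[set[str]]) -> str:
    """Classify a unit of work from the DP sites each SQL request references."""
    if not requests or any(not sites for sites in requests):
        raise ValueError("every request must reference at least one site")
    if any(len(sites) > 1 for sites in requests):
        return "distributed request"  # one statement spans several sites
    all_sites = set().union(*requests)
    if len(requests) == 1:
        return "remote request"  # one statement, one site
    if len(all_sites) == 1:
        return "remote transaction"  # several statements, all at the same site
    return "distributed transaction"  # each statement at one site, sites differ


print(classify([{"B"}, {"B"}]))       # remote transaction
print(classify([{"A"}, {"B"}, {"C"}]))  # distributed transaction
```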
Note the following features in Figure 12.11: