Module 10: Databases | Information Security course materials, Ho Chi Minh City University of Technology and Education

Full name: Lê Thái Hưng
Student ID: 22110154
Module 10: Databases
Relational database
A collection of datasets organized as records and columns in tables. In a relational database system,
relationships are defined between the database tables. Think of a relational database as a set of data with
1-to-1 and 1-to-many relationships. For example, in a database of customers, each customer record is unique
(uniqueness is one of the concepts behind these relationships). Developers use Structured Query Language
(SQL) to interact with the database.
Amazon Relational Database Service (Amazon RDS)
Amazon RDS lets developers create and manage relational databases in the cloud. Amazon RDS lets
developers track large amounts of data and organize and search through it efficiently.
Amazon DynamoDB
The AWS nonrelational database service. Data is stored in key-value pairs.
Nonrelational database
Also called a "NoSQL" or "Not only SQL" database. Each entry is stored in a key-value pair in which
each key is attached to values. Each entry can have different values attached to a key.
Amazon Redshift
The AWS data-warehousing service that can store massive amounts of data in a way that makes it fast to
query for business intelligence (BI) purposes.
Online transaction processing (OLTP)
A category of data processing that is focused on transaction-oriented tasks. OLTP typically involves
inserting, updating, or deleting small amounts of data in a database. In short, OLTP is about updating
the database.
Transaction-oriented tasks are tasks that involve the processing of individual transactions in a
system. A transaction is a unit of work that involves operations like inserting, updating, or deleting data
in a database, and it must be completed fully or not at all (this is known as the "all-or-nothing" principle).
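The all-or-nothing principle can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `inventory` and `purchase` tables and the `place_order` function are hypothetical names, not part of the course material.

```python
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, stock INTEGER)")
conn.execute("CREATE TABLE purchase (product_id INTEGER, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES (1, 5)")
conn.commit()


def place_order(conn, product_id, quantity):
    """Record a purchase and reduce stock as a single transaction."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "INSERT INTO purchase (product_id, quantity) VALUES (?, ?)",
                (product_id, quantity),
            )
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - ? "
                "WHERE product_id = ? AND stock >= ?",
                (quantity, product_id, quantity),
            )
            if cur.rowcount == 0:
                # Not enough stock: raising here also undoes the INSERT above.
                raise ValueError("not enough stock")
        return True
    except ValueError:
        return False
```

If the stock update cannot be applied, the purchase row inserted a moment earlier is rolled back too, so the database never shows a purchase without the matching inventory change.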
Online analytic processing (OLAP)
A computing method that lets users efficiently and selectively extract and query data to
analyze it from different points of view. In short, OLAP is about extracting data and computing on it.
Amazon Aurora
A relational database engine compatible with MySQL and PostgreSQL, built for the cloud, combining
the performance and availability of traditional enterprise databases with the simplicity and
cost-effectiveness of open-source databases. In other words, it aims to offer the strengths of both
enterprise and open-source databases.
MySQL
An open-source relational database management system.
OLTP and OLAP
Many different types of databases are available. To decide which type of database you need, it is
important to know how the data will be processed. There are two types of data processing: online
transaction processing (OLTP) and online analytic processing (OLAP).
OLAP operations are primarily read-only; that means they read the data and perform various types of
aggregation such as sum, group, and sort. Relational database management systems have built-in functions
for performing these types of operations, so OLAP on a relational database tends to be fast and
efficient. In a nonrelational database, the values must first be extracted from the key-value pairs,
which can be a time-intensive process.
Aggregation: processing a set of data records and returning a calculated result.
Built-in functions: functions that the database system already provides.
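As a rough illustration of the difference, the sketch below computes the same totals twice: once with SQL's built-in SUM, GROUP BY, and ORDER BY through Python's sqlite3 module, and once by walking key-value items by hand. The `sale` table, the sample rows, and both function names are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("pen", 2.0), ("book", 10.0), ("pen", 3.0), ("book", 5.0), ("bag", 20.0)],
)


def totals_by_product(conn):
    """Let the database engine sum, group, and sort."""
    return conn.execute(
        "SELECT product, SUM(amount) AS total FROM sale "
        "GROUP BY product ORDER BY total DESC"
    ).fetchall()


def totals_from_items(items):
    """Same result from key-value items: every value must be pulled out first."""
    totals = {}
    for item in items.values():
        totals[item["product"]] = totals.get(item["product"], 0.0) + item["amount"]
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
```

In the relational version the aggregation is a single query, while the key-value version has to visit each item in application code.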
OLAP systems are often used where the system is required to process a lot of related
data, perhaps to generate business reports. Companies often need to analyze a lot of data points that have
occurred over a long period of time to determine trends and predict behaviors.
This type of system doesn’t necessarily need to be a real-time system—it can run as a background
process. For example, in an ecommerce system, an OLAP system could run in the background without
impacting the user experience. Today, it’s more common to see relational databases (especially
large-scale, columnar data stores) rather than nonrelational databases being used for OLAP.
Real-time system: a system that must respond within strict time limits.
OLTP operations, however, need to update the database in addition to reading it. Updating
can involve adding, changing, or deleting values. Updating can become complex because many of the
tables that users see in a relational database are virtual. That is, the tables need to be combined in
real time from nonvirtual tables, so a virtual table reflects changes as soon as the underlying
nonvirtual tables are updated.
Virtual table: a table produced by a JOIN of other, nonvirtual tables.
Example of why tables need to be combined in real time from nonvirtual tables:
A department store database has tables that contain information about customers and products. The
customer table has data relating only to customers such as name and address. The product table has data
relating only to products such as name and price. To record information about purchases, a purchase
table must be created that has a combined primary key that includes both customer-ID and product-ID,
showing how much of a certain product a particular customer purchased.
To display a complete readout of the purchase, the customer table and the product table must be
combined in real time with the purchase table to show things such as customer name, product name, how
much of the product was purchased, and the cost of the sale. The type of operation that combines tables
in real time is called a JOIN. The result of a JOIN is a virtual table and, in most cases, it cannot be
updated directly.
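The department store example can be sketched with Python's built-in sqlite3 module. The table and column names, the sample rows, and the `purchase_readout` function are assumptions chosen to match the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
    -- The purchase table uses a combined primary key of customer and product.
    CREATE TABLE purchase (
        customer_id INTEGER REFERENCES customer(customer_id),
        product_id INTEGER REFERENCES product(product_id),
        quantity INTEGER,
        PRIMARY KEY (customer_id, product_id)
    );
    INSERT INTO customer VALUES (1, 'An', '12 Main St');
    INSERT INTO product VALUES (10, 'Notebook', 2.5);
    INSERT INTO purchase VALUES (1, 10, 4);
    """
)


def purchase_readout(conn):
    """Combine the three tables at query time into one virtual result."""
    return conn.execute(
        """
        SELECT c.name, p.name, pu.quantity, pu.quantity * p.price AS cost
        FROM purchase AS pu
        JOIN customer AS c ON c.customer_id = pu.customer_id
        JOIN product AS p ON p.product_id = pu.product_id
        """
    ).fetchall()
```

The rows returned by `purchase_readout` exist only as the result of the query; to change a customer name or a price, the underlying customer or product table is updated instead.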
OLTP systems are often used where the system is required to handle large volumes of transactions at a
high rate. Many ecommerce systems, such as shopping carts, sell a large number of items during the
checkout process while simultaneously removing the items from the inventory table. When the integrity
of the entire transaction is critical, and when processing needs to happen in near-real time, companies
should consider OLTP systems.
OLTP systems are not exclusively relational databases, even though there are relationships in the data. It’s
becoming more common for nonrelational databases to enforce constraints and enable
transactions, so that these databases can be used as OLTP systems. In other words, OLTP is not reserved
for relational databases; nonrelational databases can support it too.
Finally, integrity considerations must be handled in a relational database. In the example, if
a product needs to be deleted from the product table, there must be rules to make sure that references to
the product are also handled. These types of rules are known as integrity and consistency rules.
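One common form of such a rule is a foreign key constraint. The sketch below uses Python's sqlite3 module; the tables and the `try_delete_product` function are hypothetical, and SQLite is only a stand-in for whichever relational engine is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE purchase ("
    "purchase_id INTEGER PRIMARY KEY, "
    "product_id INTEGER NOT NULL REFERENCES product(product_id), "
    "quantity INTEGER)"
)
conn.execute("INSERT INTO product VALUES (10, 'Notebook')")
conn.execute("INSERT INTO purchase VALUES (1, 10, 4)")
conn.commit()


def try_delete_product(conn, product_id):
    """Return True if the product was deleted, False if a reference blocked it."""
    try:
        with conn:
            conn.execute("DELETE FROM product WHERE product_id = ?", (product_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

While a purchase still refers to the product, the delete is rejected; once the referring rows are handled, the same delete goes through.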
Applications of OLTP (carrying out transactions that update tables and the database correctly, with
integrity, and in near-real time):
Entering orders online
Processing purchases
Storing customer details
Applications of OLAP (computing over data to make recommendations and predictions):
Analyzing shopping patterns to make recommendations
Tracking purchasing trends for targeted advertisement
Analyzing seasonal buying trends to make sure items are in stock
Relational databases are a good fit for both OLTP and OLAP.
Nonrelational databases are a better fit for OLTP only: because the data has no fixed structure, extracting
large amounts of it takes time.
AWS database services
Amazon RDS is the classic relational database service. It runs SQL-based engines such as Oracle, Aurora, and
other similar database systems. Amazon RDS is useful for companies that are storing a moderate amount of data
that is uniform in structure, meaning each unique ID (such as a student name) is attached to the same number of
data points (grades).
Amazon RDS is primarily used for OLTP because it has better methods for maintaining the integrity and
consistency of the database when processing data.
DynamoDB is a nonrelational database, meaning that you don't work with it through traditional relational
systems such as SQL engines or Aurora. Each item in the database is stored as a key-value pair or a JavaScript
Object Notation (JSON) document. This means that each row can have a different number of columns. The entries do
not all have to follow the same structure. This flexibility works well for blogging, gaming, and
advertising.
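The idea that each item can carry different attributes can be shown with a plain Python dictionary. This is not the DynamoDB API; the keys, attribute names, and helper functions are invented for illustration.

```python
# A plain dict standing in for a key-value table (hypothetical data).
table = {
    "post#1": {"type": "blog_post", "title": "Hello", "tags": ["intro"]},
    "player#7": {"type": "player", "score": 4200},
    "ad#3": {"type": "ad", "campaign": "spring", "clicks": 15, "budget": 100.0},
}


def get_item(table, key):
    """Look up one item by its key; returns None when the key is absent."""
    return table.get(key)


def attribute_names(table):
    """Show which attributes each item carries; they do not have to match."""
    return {key: sorted(item) for key, item in table.items()}
```

Each key leads straight to its item, and no item is forced to have the same set of attributes as the others.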
Aurora is a relational database engine, compatible with MySQL and PostgreSQL, that is specifically made to
work with the AWS Cloud. According to AWS, Aurora is up to five times faster than standard MySQL databases and
three times faster than standard PostgreSQL databases. It is designed to provide the security, availability,
and reliability of commercial databases at one-tenth the cost.
Aurora is fully managed by Amazon RDS, which automates time-consuming administrative tasks such as
hardware provisioning, database setup, patching, and backups.
Amazon Redshift is a fast, fully managed data warehouse that makes it efficient and cost effective to
analyze all your data using standard SQL and your existing BI tools.
Questions
1. This module covers different types of databases or tables that store data entries. What are
some real-world uses of databases? Why are they useful? When have you used or seen a
database in your own life?
Social media platforms, ecommerce websites, email services, hotel booking systems, and so on.
https://www.redswitches.com/blog/examples-of-databases/
Databases help manage data more effectively than traditional files, and they keep modifications and updates
more consistent.
I used a database to store the data of my Hotel Booking System WinForms project.
2. NoSQL databases like the ones used in DynamoDB store a set of values with a key in what is
called a key-value pair. A key-value pair is a set of two linked data items: a key, which is an
identifier for the item of data, and the value, which is the identity or location of the data.
Can you think of anything else that is generally found in a key-value pair? Why is the key-
value pairing a useful way to organize ideas or data points? If you were creating key-value
pairs to sort your music, picture, or video libraries, what would be some of the values you
would want to store?
Besides the key and the value, metadata (data about the data) is generally found in a key-value pair. For
example, for a music library:
{
  "SongA": {
    "artist": "Artist1",
    "album": "Album1",
    "year": 2020,
    "genre": "Pop",
    "metadata": {
      "dateAdded": "2023-09-15",
      "fileSize": 5120
    }
  }
}
Key-value pairing is a useful way to organize ideas or data points because a nonrelational database suits
large data sets that keep growing. In a relational database, by contrast, a change or extension to one
table affects other tables, because the tables are tightly bound by references and constraints. This makes
the database hard to upgrade and extend.
Changing or updating the attributes of a table is also difficult.
When one table has a problem, other tables are affected.
A nonrelational database is less complex.
https://uk.indeed.com/career-advice/career-development/key-value-pair
3. Amazon Redshift is a data warehousing service. A data warehouse is a central repository of
information that can be analyzed to make better-informed decisions. It is a database specially
designed for data analytics, which involves reading large amounts of data to understand
relationships and trends across the data. A database is used to capture and store data, such as
recording details of a transaction. What types of businesses do you think would benefit from a data
warehousing service and how would they use data warehousing to improve their business decisions?
Businesses such as retailers would benefit from a data warehousing service.
They could use data warehousing to improve their business decisions in these ways:
A data warehouse can be used to identify which products are selling best, for example to avoid stock-outs.
Customer data can be used to create personalized shopping experiences and marketing and PR
strategies.
https://www.existbi.com/blog/see-top-13-benefits/
Module 11: Load Balancers and Caching
Amazon ElastiCache
A web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. The
service improves the performance of web applications by letting you retrieve information from fast,
managed, in-memory caches, instead of relying on slower disk-based databases. Data is kept in cache
memory, so the database is queried when the data is first loaded into the cache rather than on every request.
Cache
In computing, a cache is a high-speed data storage layer that stores a subset of data, typically transient in
nature, so that future requests for that data are served up faster than is possible by accessing the data’s
primary storage location.
Subset of data: a portion of data taken from a larger data set.
Transient in nature: temporary.
Data caching
Storing data in a cache lets you efficiently reuse previously retrieved or computed data. The data in a
cache is generally stored in fast-access hardware such as random access memory (RAM) and can also be
used with a software component.
The software component manages and optimizes the use of the cache for performance and efficiency. It decides:
What data should go into the cache.
When data should be removed or refreshed.
How to interact with the cache (for example, fetching data from the cache or invalidating it).
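These three decisions can be sketched as a small in-process cache with a time-to-live. The `TTLCache` class, its method names, and the expiry policy are assumptions made for illustration; a managed service such as ElastiCache works differently under the hood.

```python
import time


class TTLCache:
    """Keep loaded values for a fixed number of seconds, then reload them."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be simulated
        self._store = {}  # key -> (value, expiry time)

    def get(self, key, load):
        """Return the cached value, calling load(key) only on a miss or expiry."""
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: primary storage is not touched
        value = load(key)  # cache miss: fetch from the slower source
        self._store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Drop an entry, for example after the underlying data changed."""
        self._store.pop(key, None)
```

Here `get` decides what goes into the cache (whatever was requested), the time-to-live decides when an entry is refreshed, and `invalidate` lets the application remove an entry explicitly.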
Elastic Load Balancing
Elastic Load Balancing automatically distributes incoming application traffic across multiple targets, such
as Amazon Elastic Compute Cloud (Amazon EC2) instances, containers, IP addresses, and AWS Lambda
functions. If traffic to a website suddenly spikes, that traffic can be routed to other EC2 instances (or other
types of targets such as Lambda functions) that have been established in advance for this purpose, as
standby capacity for spikes. This load balancing avoids a single server being overloaded because
of increased traffic routed to it.
Spike: a sudden surge, here a sudden increase in traffic.
Random access memory (RAM)
Volatile, temporary memory storage. This is the data that is held temporarily while a machine is in use;
however, once the machine is powered off or the task is completed, this data goes away. Virtual memory
is stored on disk, not in RAM, and acts as a supplement to RAM when there
is not enough temporary memory available.
There are many ways to handle data on a computer. One of the most common involves read-only data that needs to
be presented quickly to a large number of users, such as videos and music streamed around the world. This
type of data is rarely updated or deleted, but there is a large volume of it, and
the demand for it can fluctuate dramatically: requests are sometimes few and sometimes many (for example,
when a video or song goes viral). Because the need for this type of access has become so common, AWS
provides tools for handling it. The tools can retrieve data rapidly and distribute it across
multiple servers in response to peaks and valleys of demand (like a traffic graph that rises and falls), and
they do it in a cost-effective way that only charges for usage.
Applications and websites often provide a range of data and services to users. Within this wide range of
data, there is often a smaller subset of data that is requested and accessed more often. This might be the
data on the front page that is shown to every visitor (think Amazon’s top 10 products of the day) or it
might be a recently released piece of media that is having a spike in popularity (a new song released on
Spotify).
Other applications run processes that are extremely memory intensive that might suffer from performance
problems on a slower storage drive.
For this type of heavily requested or memory-intensive data, a data caching service such as ElastiCache
can help to ensure that the data can be accessed and processed extremely quickly. It works by storing the
data in extremely fast but temporary memory that is faster than disk-based storage. The trade-off is that
the fast memory has less storage space and does not store the data permanently.
Many companies use ElastiCache to build real-time apps, speed up ecommerce, and cache their websites.
Internet-scale applications: Real-time apps in gaming, ride hailing, media streaming, dating, and social
media need fast data access.
Amazon ElastiCache: Blazing fast in-memory data store for use as a database, cache, message broker, and
queue. Store ephemeral data in-memory for sub-millisecond response.
Use cases: real-time transactions, chat, BI and analytics, session store, gaming leaderboards, and cache.
Blazing: extremely fast.
Message Broker: Manages and routes messages between systems.
Queue: Temporarily stores messages for processing in the correct order.
Heavy traffic can shut down apps and websites (downtime) if the server cannot handle the load. This is
why AWS offers Elastic Load Balancing (ELB), which can detect that there are too many requests and
automatically divert traffic to another server to maintain speed and stability. There are three types of
load balancer in ELB:
Application Load Balancer: Application Load Balancer is best suited for load balancing of Hypertext Transfer
Protocol (HTTP) and Hypertext Transfer Protocol Secure (HTTPS) traffic and provides advanced request routing
targeted at the delivery of modern application architectures, including microservices and containers. Advanced
request routing means that the load balancer uses the content of each HTTP or HTTPS request to decide which
server or resource the request should go to. Operating at the individual request level (Layer 7), Application
Load Balancer routes traffic to targets within Amazon Virtual Private Cloud (Amazon VPC) based on the
content of the request. For example, the balancing can be based on the content of the uniform resource
locator (URL).
For example, if the URL ends in /main, the request will be routed to one instance; if the URL ends in
/blog, it will be routed to a different instance, provided the load balancer has been configured in
advance to make this happen.
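The URL-based routing described above can be sketched in a few lines. The rule list, the target names, and the `route` function are hypothetical; real Application Load Balancer rules are configured in AWS rather than written like this.

```python
from urllib.parse import urlsplit

# Hypothetical routing rules set up in advance: path ending -> target instance.
RULES = [("/main", "instance-main"), ("/blog", "instance-blog")]
DEFAULT_TARGET = "instance-default"


def route(url, rules=RULES, default=DEFAULT_TARGET):
    """Pick a target by inspecting the request URL (a Layer 7 decision)."""
    path = urlsplit(url).path.rstrip("/")
    for ending, target in rules:
        if path.endswith(ending):
            return target
    return default
```

The decision depends on the content of the request, which is what separates this kind of routing from connection-level balancing.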
Network Load Balancer: Network Load Balancer is best suited for load balancing of Transmission
Control Protocol (TCP), User Datagram Protocol (UDP), and Transport Layer Security (TLS) traffic
where extreme performance is required. Operating at the connection level (Layer 4), Network Load
Balancer routes traffic to targets within Amazon VPC and is capable of handling millions of requests per
second while maintaining ultra-low latencies.
Network Load Balancer is also optimized to handle sudden and volatile traffic patterns.
Because of the increased speed that can be achieved at the connection layer, the Network Load Balancer
type of load balancing is more desirable when you need to avoid delays under high volumes of network traffic.
For example, to avoid delay when interest in a website goes viral, you would choose to use Network Load
Balancer.
Classic Load Balancer: Classic Load Balancer provides basic load balancing across multiple EC2 instances and
operates at both the request and connection levels. Classic Load Balancer is intended for applications that
were built within the EC2-Classic network.
Three computers are accessing content in the AWS Cloud. A load balancer splits this access between
Availability Zone A and Availability Zone B. Each zone has three EC2 instances, but in Zone A one instance is
not functioning.
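The scenario above, with six instances across two zones and one of them down, can be sketched as a round-robin balancer that skips unhealthy targets. Round robin is an assumption made for this sketch, not a statement of how ELB chooses targets, and the class and instance names are invented.

```python
import itertools


class RoundRobinBalancer:
    """Hand out targets in turn, skipping any that are marked unhealthy."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)
        self._cycle = itertools.cycle(self.targets)

    def mark_unhealthy(self, target):
        self.healthy.discard(target)

    def next_target(self):
        if not self.healthy:
            raise RuntimeError("no healthy targets")
        # At least one target is healthy, so this loop always returns.
        for target in self._cycle:
            if target in self.healthy:
                return target
```

For example, with targets `a1`, `a2`, `a3` in Zone A and `b1`, `b2`, `b3` in Zone B, marking `a2` as unhealthy means requests are spread over the remaining five instances.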
1. Is there anything you have done so often that it has become automatic for you or you can do
it without thinking? What actions fit this category? How do you think this relates to data
caching?
Typing on a keyboard without needing to look at the keys.
Because of this, I can work faster and more effectively, handle more work, and cut out unnecessary steps such
as looking at the keyboard.
2. This module is about load balancing. What strategies or tools do you use to balance your
responsibilities and life? Why is it important to have a way to maintain balance?
My strategy is to maintain good habits.
Keeping balance in life is important because it helps you avoid stress, feel happier, and do better in
everything you do. Just like caching saves time by reusing data, a balanced life helps you use your time
and energy wisely, so you can recharge and handle your tasks better. This balance also supports personal
growth and better relationships.
3. Data caching is crucial for parts of websites and apps that need to be processed or retrieved
very quickly. Remember that because the cache is a snapshot of the data on a server, it is
not updated immediately when the data changes. What are some examples of data in
websites or apps that you think should be cached? Why?
Read-only data that needs to be presented quickly to a large number of users, such as
videos, music, API responses, and search results that are served around the world.
Caching this data improves performance and user satisfaction, which makes it essential for websites and apps
with large audiences.
Module 12: Elastic Beanstalk and CloudFormation
AWS Elastic Beanstalk
Elastic Beanstalk automatically handles the deployment details of capacity provisioning, load balancing,
automatic scaling, and application health monitoring of an application. In many ways, using Elastic
Beanstalk is like running a macro or a batch file that places a wrapper around an existing application so
that it runs smoothly in the Amazon Web Services (AWS) Cloud.
A macro is an action or a set of actions that you can use to automate tasks.
A batch file is a file that stores script commands to be executed in serial order.
AWS CloudFormation
This service gives developers and businesses an easy way to create a collection of related AWS resources
and provision them in an orderly and predictable fashion. CloudFormation provides a means for
combining a stack of AWS services, similar to writing macros or batch files in Linux or Microsoft
Windows.
CloudFormation is quite similar to Elastic Beanstalk, but Elastic Beanstalk is typically driven through a UI to
provision and configure resources, whereas with CloudFormation you write code and AWS sets everything up based on
that code.
What "orderly and predictable fashion" means:
CloudFormation handles the dependencies between services, ensuring that resources are provisioned in
the correct order. For example, it might ensure that an EC2 instance is created only after a security group
or VPC is set up.
This ensures a predictable outcome, so that each time you create the same stack, the resources are
provisioned in the same order and state.
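Dependency-ordered provisioning can be illustrated with a topological sort, using Python's standard graphlib module (Python 3.9 or later). The resource names and the dependency map are assumptions based on the example above; this is not how CloudFormation is implemented internally.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on (hypothetical stack).
dependencies = {
    "VPC": set(),
    "SecurityGroup": {"VPC"},
    "EC2Instance": {"SecurityGroup", "VPC"},
}


def provisioning_order(dependencies):
    """Return an order in which every resource comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())
```

Because the order is derived from the declared dependencies, the same input gives the same valid order each time, which is the sense of "predictable" used here.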
Stack
A collection of AWS resources that you can manage as a single unit. You can create, update, or delete a
collection of resources by creating, updating, or deleting stacks.
Elastic Beanstalk
Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services
developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as
Apache, Nginx, Passenger, and IIS.
You upload your code and Elastic Beanstalk automatically handles the deployment, from capacity
provisioning, load balancing, and automatic scaling to application health monitoring. At the same time,
you retain full control over the AWS resources powering your application and can access the underlying
resources at any time.
Benefits of Elastic Beanstalk:
1. Fast and simple to begin
Elastic Beanstalk is the fastest and simplest way to deploy your application on AWS.
2. Developer productivity
Elastic Beanstalk provisions and operates the infrastructure and manages the application stack (platform)
for you, so you don't have to spend the time or develop the expertise.
3. Impossible to outgrow
Elastic Beanstalk automatically scales your application up and down based on your application's specific
need using easily adjustable automatic scaling settings.
4. Complete resource control
You have the freedom to select the AWS resources, such as Amazon Elastic Compute Cloud (Amazon
EC2) instance type, that are optimal for your application.
CloudFormation
CloudFormation provides a common language for you to describe and provision all the infrastructure
resources in your cloud environment. CloudFormation lets you use programming languages or a simple
text file to model and provision, in an automated and secure manner, all the resources needed for your
applications across all AWS Regions and accounts.
Benefits of CloudFormation
1. Model it all.
CloudFormation lets you model your entire infrastructure with a text file or programming languages.
2. Automate and deploy.
CloudFormation provisions your resources in a safe, repeatable manner, letting you build and rebuild your
infrastructure and applications, without having to perform manual actions or write custom scripts.
3. It’s code.
Codifying your infrastructure lets you treat your infrastructure as code.
How Elastic Beanstalk differs from CloudFormation
These services are designed to complement each other. Elastic Beanstalk provides an environment to
easily deploy and run applications in the cloud. It is integrated with developer tools and provides a one-
stop experience for you to manage the life cycle of your applications.
CloudFormation is a convenient provisioning mechanism for a broad range of AWS resources. It supports
the infrastructure needs of many different types of applications such as existing enterprise applications,
legacy applications, and applications built using a variety of AWS resources and container-based solutions
(including those built using Elastic Beanstalk).
Container-based solutions refer to a method of deploying and running applications inside containers.
A container isolates an application and its dependencies from the underlying infrastructure (OS,
hardware), ensuring it runs the same in any environment.
Containers are lightweight compared to virtual machines (VMs) because they share the host operating
system’s kernel but run their own applications in isolated processes.
To be clear, Elastic Beanstalk is like running a .bat file and CloudFormation is like writing a .bat file.
Elastic Beanstalk lets developers upload and run their code; it then does all the behind-the-scenes cloud
setup such as launching EC2 instances and attaching Elastic Block Store volumes. With CloudFormation, you are
basically setting up a template for all of the cloud resources you want to run so that it can all be done at
once and in a repeatable way.
A repeatable way refers to a method or process that can be executed multiple times with consistent and
predictable results.
CloudFormation supports Elastic Beanstalk application environments as one of the AWS resource types.
This lets you, for example, create and manage an application hosted by Elastic Beanstalk, along with an
Amazon Relational Database Service (Amazon RDS) database to store the application data. In addition to
RDS DB instances, any other supported AWS resources can be added to the group as well.
CloudFormation allows you to define your Elastic Beanstalk environment (application, compute
resources, scaling policies, etc.) in a template.
Imagine you want to deploy a web application that runs on Elastic Beanstalk and requires a Relational
Database Service (RDS) database for storing application data. Instead of manually setting up each service, you
can define everything in a CloudFormation template.
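A template of that kind might look roughly like the sketch below, built here as a Python dictionary and printed as JSON. The logical names (`WebApp`, `WebAppEnvironment`, `AppDatabase`), the parameter names, and all property values are illustrative assumptions, and the template has not been validated against AWS.

```python
import json


def build_template():
    """Build a minimal, illustrative CloudFormation template as a dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Supplied at stack creation time; no platform name is hard-coded here.
            "SolutionStackName": {"Type": "String"},
            "DBPassword": {"Type": "String", "NoEcho": "true"},
        },
        "Resources": {
            "WebApp": {
                "Type": "AWS::ElasticBeanstalk::Application",
                "Properties": {"Description": "Example web application"},
            },
            "WebAppEnvironment": {
                "Type": "AWS::ElasticBeanstalk::Environment",
                "Properties": {
                    "ApplicationName": {"Ref": "WebApp"},
                    "SolutionStackName": {"Ref": "SolutionStackName"},
                },
            },
            "AppDatabase": {
                "Type": "AWS::RDS::DBInstance",
                "Properties": {
                    "Engine": "mysql",
                    "DBInstanceClass": "db.t3.micro",
                    "AllocatedStorage": "20",
                    "MasterUsername": "admin",
                    "MasterUserPassword": {"Ref": "DBPassword"},
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Keeping the application, its environment, and the database in one template means the whole group can be created, updated, or deleted together as a single stack.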
Questions
1. Elastic Beanstalk is a service that lets developers upload their applications and
automatically provision all of the needed resources for the application to run smoothly and
efficiently. How do you think this process differs from traditional application deployment
(without the cloud)? Why is this style of deployment beneficial?
Traditionally, the development team has to write a manual so that the deployment team can
set up everything on the on-premises server. Sometimes the deployment team does something wrong
or misunderstands a step and has to ask the development team.
With the cloud, development and deployment are now handled by the same team, and planning and provisioning
resources in the cloud is easier because both sides follow the same standard.
https://www.youtube.com/watch?v=3c-iBn73dDE
2. What things do you picture or think of when you hear the name Elastic Beanstalk? Why do
you think the AWS Cloud service that provides the necessary resources for an uploaded
application is called Elastic Beanstalk?
The name "Elastic beanstalk" is a reference to the beanstalk that grew all the way up to the clouds in the
fairy tale Jack and the Beanstalk.
Elastic: flexible, stretchy. It refers to scalability: scale out (add instances) when traffic is high, and
remove instances when traffic drops.
Beanstalk : You can quickly "plant" your application (upload it), and AWS Elastic Beanstalk takes care of
the heavy lifting (provisioning resources, managing scaling, health monitoring, etc.)—making it easy for
developers to grow their applications without dealing with complex infrastructure setup.
https://en.wikipedia.org/wiki/AWS_Elastic_Beanstalk
3. CloudFormation is a service that lets you create a template to deploy any number of cloud
resources at any time. What are some other industries or processes that use a template to
build or create something quickly? Why is this process beneficial?
One process that uses a template to build or create something quickly is a Dockerfile or Docker image. You
don’t have to set up every environment by hand. You just write a text file, as with CloudFormation.
| 1/17

Preview text:

Họ tên : Lê Thái Hưng MSSV : 22110154 Module 10: Databases Relational database
A collection of datasets organized as records and columns in tables. In a relational database system,
relationships are defined between the database tables. Think of a relational database as a set of data with
1-to-1 and 1-to-many relationships. For example, a database of customers has unique customers (unique
is one of concept relationships). Developers use structured query language (SQL) to interact with the database.
Amazon Relational Database Service (Amazon RDS)
Amazon RDS lets developers create and manage relational databases in the cloud. Amazon RDS lets
developers track large amounts of data and organize and search through it efficiently. Amazon DynamoDB
The AWS nonrelational database service. Data is stored in key-value pairs. Nonrelational database
Also called a "NoSQL" or "Not only SQL" database. Each entry is stored in a key-value pair in which
each key is attached to values. Each entry can have different values attached to a key. Amazon Redshift
The AWS data-warehousing service that can store massive amounts of data in a way that makes it fast to
query for business intelligence (BI) purposes.
Online transaction processing (OLTP)
A category of data processing that is focused on transaction-oriented(hướng về transaction) tasks. OLTP
typically involves inserting, updating, or deleting small amounts of data in a database.-> liên quan đến cập nhật database
Transaction-oriented tasks refer to tasks that involve the processing of individual transactions in a
system. A transaction is a unit of work that involves operations like inserting, updating, or deleting data
in a database, and it must be completed fully or not at all (this is known as the "all-or-nothing" principle).
Online analytic processing (OLAP)
A computing method that lets users efficiently and selectively extract and query data to
analyze it from different points of view. -> OLAP is about extracting data and computing on it.
Amazon Aurora
A relational database engine (database management system) compatible with MySQL and
PostgreSQL, built for the cloud, combining the performance and availability of traditional enterprise
databases with the simplicity and cost-effectiveness of open-source databases. -> enterprise database + open-source database
MySQL
An open-source relational database management system.
OLTP and OLAP
Many different types of databases are available. To decide which type of database you need, it is
important to know how the data will be processed. There are two types of data processing: online
transaction processing (OLTP) and online analytic processing (OLAP).
OLAP operations are primarily read-only; that means they read the data and perform various types of
aggregation such as sum, group, and sort. Relational database management systems have built-in functions for
performing these types of operations. -> OLAP on a relational database runs quickly and effectively.
In a nonrelational database, the values must be extracted from the key-value pairs, which can be a
time-intensive process (processing takes longer because each value has to be pulled out from its key).
Aggregation: processing a set of data records and returning a calculated result.
Built-in functions: functions that the system already provides, so you do not have to write them.
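As a small illustration of aggregation with built-in functions, the sketch below uses Python's sqlite3 module and the SQL functions SUM, GROUP BY, and ORDER BY. The `sales` table and its rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120), ("south", 80), ("north", 30), ("south", 200), ("east", 50)],
)

# Read-only OLAP-style query: group rows by region, sum each group, sort by total.
totals = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
```

The database engine performs the grouping, summing, and sorting; the application only reads the calculated result.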
OLAP systems are often used where the system is required to process a lot of related
data, perhaps to generate business reports. Companies often need to analyze a lot of data points that have
occurred over a long period of time to determine trends and predict behaviors.
This type of system doesn’t necessarily need to be a real-time system—it can run as a background
process. For example, in an ecommerce system, an OLAP system could run in the background without
impacting the user experience. Today, it’s more common to see relational databases (especially large-
scale, columnar data stores) rather than nonrelational databases being used for OLAP.
Real-time system: a system that must respond within strict time limits.
OLTP operations, however, need to update the database in addition to reading it. Updating
can involve adding, changing, or deleting values. Updating can become complex because many of the
tables in a relational database are virtual. That is, the tables need to be combined in real time from
nonvirtual tables. -> Virtual tables have to be recombined as soon as the underlying nonvirtual tables are updated.
Virtual tables: a virtual table is produced by a JOIN over other, nonvirtual tables.
Example of why the tables need to be combined in real time from nonvirtual tables:
A department store database has tables that contain information about customers and products. The
customer table has data relating only to customers such as name and address. The product table has data
relating only to products such as name and price. To record information about purchases, a purchase
table must be created that has a combined primary key that includes both customer-ID and product-ID,
showing how much of a certain product a particular customer purchased.

To display a complete readout of the purchase, the customer table and the product table must be
combined in real time with the purchase table to show things such as customer name, product name, how
much of the product was purchased, and the cost of the sale. The type of operation that combines tables
in real time is called a JOIN. The result of a JOIN is a virtual table and, in most cases, it cannot be updated directly.
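The department store example can be sketched with Python's sqlite3 module. The column names and sample rows are assumptions chosen for illustration; only the three tables and the combined primary key come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE purchase (
    customer_id INTEGER,
    product_id  INTEGER,
    quantity    INTEGER,
    PRIMARY KEY (customer_id, product_id)  -- combined primary key
);
INSERT INTO customer VALUES (1, 'An', '1 Main St');
INSERT INTO product  VALUES (10, 'Lamp', 20.0), (11, 'Desk', 150.0);
INSERT INTO purchase VALUES (1, 10, 3), (1, 11, 1);
""")

# The JOIN builds the readout at query time; the result is not a stored table.
readout = conn.execute("""
    SELECT c.name, p.name, pu.quantity, pu.quantity * p.price AS cost
    FROM purchase AS pu
    JOIN customer AS c ON c.customer_id = pu.customer_id
    JOIN product  AS p ON p.product_id  = pu.product_id
    ORDER BY p.product_id
""").fetchall()
```

Each row of `readout` combines the customer name, product name, quantity, and cost, even though no single stored table holds all four.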

OLTP systems are often used where the system is required to handle large volumes of transactions at a
high rate. Many ecommerce systems, such as shopping carts, sell a large number of items during the
checkout process while simultaneously removing the items from the inventory table. When the integrity
of the entire transaction is critical, and when processing needs to happen in near-real time, companies should consider OLTP systems.
OLTP systems are not exclusively relational databases, even though there are relationships in the data. It's
becoming more common for nonrelational databases to enforce constraints and enable
transactions, so that these databases can be used as OLTP systems. -> OLTP is not limited to
relational databases; nonrelational databases can support it too.
Finally, integrity considerations must be handled in a relational database. In the example, if
a product needs to be deleted from the product table, there must be rules to make sure that references to
the product are also handled. These types of rules are known as integrity and consistency rules.
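One common form of integrity rule is a foreign key constraint. The sketch below, using Python's sqlite3 module, assumes a simplified `purchase` table that references `product`; with the rule in place, the database refuses to delete a product that purchases still refer to. Other rules (for example, cascading the delete) are possible and are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        product_id  INTEGER NOT NULL REFERENCES product(product_id)
    )
""")
conn.execute("INSERT INTO product VALUES (10, 'Lamp')")
conn.execute("INSERT INTO purchase VALUES (1, 10)")
conn.commit()

try:
    # A purchase still references product 10, so this delete violates the rule.
    conn.execute("DELETE FROM product WHERE product_id = 10")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

remaining_products = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```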
Applications of OLTP -> carrying out transactions (updating tables, databases, …) with integrity, correctly, and in real time
Entering orders online
Processing purchases
Storing customer details
Applications of OLAP -> computing over data to produce recommendations and predictions
Analyzing shopping patterns to make recommendations
Tracking purchasing trends for targeted advertisement
Analyzing seasonal buying trends to make sure items are in stock
Relational databases are suited to both OLTP and OLAP.
Nonrelational databases are best used for OLTP only: they have no fixed structure, so extracting large amounts of data takes time.
AWS database services
Amazon RDS is the classic relational database that uses SQL, Oracle, Aurora, or other similar database
systems. Amazon RDS is useful for companies that are storing a moderate amount of data that is uniform
in structure, meaning each unique ID (such as student name) is attached to the same number of data points (grades).
Amazon RDS is primarily used for OLTP because it has better methods for maintaining the integrity and
consistency of the database when processing data.
DynamoDB is a nonrelational database, meaning that you can’t use traditional systems such as SQL or
Aurora. Each item in the database is stored as a key-value pair or a JavaScript Object Notation (JSON)
file. This means that each row can have a different number of columns. The entries do not all have to be
matched in the same way. This permits flexibility in processing that works well for blogging, gaming, and advertising.
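The idea that each item can carry a different set of attributes can be sketched with plain Python dictionaries. This is only a stand-in for a key-value table, not the DynamoDB API; the `put_item` helper, the keys, and the attribute names are hypothetical.

```python
# A plain dict standing in for a key-value table (not the DynamoDB API).
table = {}

def put_item(key, attributes):
    """Store any set of attributes under a key; no fixed columns are required."""
    table[key] = dict(attributes)

put_item("player#1", {"name": "Linh", "score": 4200})
put_item("player#2", {"name": "Minh", "score": 3100, "guild": "Red", "last_login": "2024-01-05"})

attribute_counts = {key: len(item) for key, item in table.items()}
guild = table["player#2"].get("guild")
missing = table["player#1"].get("guild")  # this item simply has no such attribute
```

In a relational table, adding the `guild` column would change every row; here only the items that need the attribute carry it.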
Aurora is a relational database engine, compatible with MySQL and PostgreSQL,
that is specifically made to work with the AWS Cloud. Aurora is up to five times faster than standard
MySQL databases and three times faster than standard PostgreSQL databases. It is designed to provide
the security, availability, and reliability of commercial databases at one-tenth the cost.
Aurora is fully managed by Amazon RDS, which automates time-consuming administrative tasks such as
hardware provisioning, database setup, patching, and backups.
Amazon Redshift is a fast, fully managed data warehouse that makes it efficient and cost effective to
analyze all your data using standard SQL and your existing BI tools.
Questions
1. This module covers different types of databases or tables that store data entries. What are
some real-world uses of databases? Why are they useful? When have you used or seen a database in your own life?
Social media platforms, ecommerce websites, email services, hotel booking systems, …
https://www.redswitches.com/blog/examples-of-databases/
Databases help manage data more effectively than traditional files and keep modifications and updates more consistent.
I used a database to store the data for my Hotel Booking System WinForms project.
2. NoSQL databases like the ones used in DynamoDB store a set of values with a key in what is
called a key-value pair. A key-value pair is a set of two linked data items: a key, which is an
identifier for the item of data, and the value, which is the identity or location of the data.
Can you think of anything else that is generally found in a key-value pair? Why is the key-
value pairing a useful way to organize ideas or data points? If you were creating key-value
pairs to sort your music, picture, or video libraries, what would be some of the values you would want to store?

A key-value pair is a set of two linked data items: a key, which is an identifier for the item of data, and the
value, which is the identity or location of the data. Metadata (information about information) is also generally
found in a key-value pair:
{
  "SongA": {
    "artist": "Artist1",
    "album": "Album1",
    "year": 2020,
    "genre": "Pop",
    "metadata": { "dateAdded": "2023-09-15", "fileSize": 5120 }
  }
}
Key-value pairing is a useful way to organize ideas or data points because a nonrelational database is the
better fit when you need to store large amounts of data that keep growing. In a relational database, any
change or extension affects other tables, because the tables are tied together by strict references and
constraints, so upgrading and extending the database is very difficult.
Changing or updating an attribute in a table is also difficult.
When one table has a problem, other tables are affected.
A nonrelational database is less complex.
https://uk.indeed.com/career-advice/career-development/key-value-pair
3. Amazon Redshift is a data warehousing service. A data warehouse is a central repository of
information that can be analyzed to make better-informed decisions. It is a database specially
designed for data analytics, which involves reading large amounts of data to understand
relationships and trends across the data. A database is used to capture and store data, such as
recording details of a transaction. What types of businesses do you think would benefit from a data
warehousing service and how would they use data warehousing to improve their business decisions?

Businesses that I think would benefit from a data warehousing service include retailers, …
They could use data warehousing to improve their business decisions as follows:
Data warehouse systems can be used to identify which products are selling best, to avoid stock-outs, etc.
Customer data can be used to create personalized shopping experiences and marketing and PR strategies.
https://www.existbi.com/blog/see-top-13-benefits/
Module 11: Load Balancers and Caching
Amazon ElastiCache
A web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. The
service improves the performance of web applications by letting you retrieve information from fast,
managed, in-memory caches, instead of relying on slower disk-based databases. -> Data is kept in cache
memory, so the database is queried only once to load the data into the cache.
Cache
In computing, a cache is a high-speed data storage layer that stores a subset of data, typically transient in
nature, so that future requests for that data are served up faster than is possible by accessing the data’s primary storage location.
Subset of data: a portion of the data taken from a larger data set.
Transient in nature: temporary.
Data caching
Storing data in a cache lets you efficiently reuse previously retrieved or computed data. The data in a
cache is generally stored in fast-access hardware such as random access memory (RAM) and can also be
used with a software component.
The software component manages and optimizes the use of the cache for performance and efficiency. It decides:
What data should go into the cache.
When data should be removed or refreshed.
How to interact with the cache (e.g., fetching data from cache, invalidating cache).
Elastic Load Balancing
Elastic Load Balancing automatically distributes incoming application traffic across multiple targets, such
as Amazon Elastic Compute Cloud (Amazon EC2) instances, containers, IP addresses, and AWS Lambda
functions. If traffic to a website suddenly spikes, that traffic can be routed to other EC2 instances (or other
types of targets such as Lambda functions) that have been established in advance for this purpose.
This load balancing avoids a single server being overloaded because
of increased traffic routed to it.
Spikes: sudden surges in traffic.
Random access memory (RAM)
Volatile, temporary memory storage. This is the data that is held temporarily while a machine is in use;
however, once the machine is powered off or the task is completed, this data goes away. Virtual memory
is space on a storage drive that the operating system uses as a supplement to RAM when there
is not enough temporary memory available.
There are many ways to handle data on a computer. One of the most common involves read-only data that
needs to be presented quickly to a large number of users, such as videos and music streamed around the
world. This type of data is rarely updated or deleted, but there is a large volume of it, and demand for it
can fluctuate dramatically: requests are sometimes few and sometimes many (for example, when a video or
song goes viral). Because the need for this type of access is becoming so common, AWS provides tools for
handling it. The tools can retrieve data rapidly and distribute it across multiple servers in response to
peaks and valleys of demand (like a traffic graph that is sometimes high and sometimes low), and they do
it in a cost-effective way that charges only for usage.
Applications and websites often provide a range of data and services to users. Within this wide range of
data, there is often a smaller subset of data that is requested and accessed more often. This might be the
data on the front page that is shown to every visitor (think Amazon’s top 10 products of the day) or it
might be a recently released piece of media that is having a spike in popularity (a new song released on Spotify).
Other applications run processes that are extremely memory intensive that might suffer from performance
problems on a slower storage drive.
For this type of heavily requested or memory-intensive data, a data caching service such as ElastiCache
can help to ensure that the data can be accessed and processed extremely quickly. It works by storing the
data in extremely fast but temporary memory that is faster than disk-based storage. The trade-off is that
the fast memory has less storage space and does not store the data permanently.
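The trade-off described above can be sketched as a small in-memory cache with expiry placed in front of a slow data source. This is a toy illustration in plain Python, not the ElastiCache API; the `TTLCache` class, the 60-second lifetime, and `slow_database_read` are all hypothetical.

```python
import time

class TTLCache:
    """Keeps values in memory for a limited time, then reloads them."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expiry time)

    def get(self, key, load):
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]                      # cache hit: no trip to slow storage
        value = load(key)                        # cache miss or expired entry
        self._store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        self._store.pop(key, None)               # force a reload on the next get

db_reads = []

def slow_database_read(key):
    """Stand-in for a disk-based database query."""
    db_reads.append(key)
    return f"value-for-{key}"

fake_now = [0.0]  # controllable clock so the example does not need to sleep
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get("top10", slow_database_read)   # miss: reads the database
second = cache.get("top10", slow_database_read)  # hit: served from memory
fake_now[0] = 61.0
third = cache.get("top10", slow_database_read)   # entry expired: reads again
```

Three requests cause only two database reads; the expiry is what keeps the temporary copy from going stale forever.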
Many companies use ElastiCache to build real-time apps, speed up ecommerce, and cache their websites.
Internet-scale applications: Real-time apps in gaming, ride hailing, media streaming, dating, and social media need fast data access.
Amazon ElastiCache: Blazing fast in-memory data store for use as a database, cache, message broker, and
queue. Store ephemeral data in-memory for sub-millisecond response.
Use cases: real-time transactions, chat, BI and analytics, session store, gaming leaderboards, and cache.
Blazing: extremely fast.
Message Broker: Manages and routes messages between systems.
Queue: Temporarily stores messages for processing in the correct order.
Heavy traffic can shut down apps and websites (downtime) if the server cannot handle the load. This is
why AWS has Elastic Load Balancing (ELB), which can detect that there are too many requests and
automatically divert traffic to another server to maintain speed and stability. There are three types of ELB:
Application Load Balancer: best suited for load balancing of Hypertext Transfer Protocol (HTTP) and Secure
HTTP (HTTPS) traffic. It provides advanced request routing targeted at the delivery of modern application
architectures, including microservices and containers. (Targeted request routing means determining which
server or resource an HTTP or HTTPS request is sent to, based on the content of the request.)
Operating at the individual request level (Layer 7), Application Load Balancer routes traffic to targets
within Amazon Virtual Private Cloud (Amazon VPC) based on the content of the request. For example,
balancing can be done based on the content of the uniform resource locator (URL).
If the URL ends in /main, the request will be routed to one instance; if the URL ends in
/blog, it will be routed to a different instance. This works only if the routing rules of the Application
Load Balancer have been set up in advance.
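Content-based routing of this kind can be sketched in a few lines of Python. The rule table, the target group names, and the prefix-matching policy are assumptions for illustration; a real Application Load Balancer is configured with listener rules rather than code like this.

```python
from urllib.parse import urlsplit

# Hypothetical routing rules set up in advance: path prefix -> target group.
RULES = [("/main", "main-instances"), ("/blog", "blog-instances")]
DEFAULT_TARGET = "default-instances"

def route(url):
    """Pick a target group by looking at the path of the requested URL."""
    path = urlsplit(url).path
    for prefix, target in RULES:
        if path == prefix or path.startswith(prefix + "/"):
            return target
    return DEFAULT_TARGET

main_target = route("https://example.com/main")
blog_target = route("https://example.com/blog/post-1")
other_target = route("https://example.com/about")
```

The decision uses the content of the request (its path), which is why this style of balancing belongs at Layer 7.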
Network Load Balancer: Network Load Balancer is best suited for load balancing of Transmission
Control Protocol (TCP), User Datagram Protocol (UDP), and Transport Layer Security (TLS) traffic
where extreme performance is required. Operating at the connection level (Layer 4), Network Load
Balancer routes traffic to targets within Amazon VPC and is capable of handling millions of requests per
second while maintaining ultra-low latencies.
Network Load Balancer is also optimized to handle sudden and volatile traffic patterns.
Because of the increased speed that can be achieved at the connection layer, the Network Load Balancer
type of load balancing is more desirable when higher volumes of network traffic must be handled without
delay. For example, to avoid delay when interest in a website goes viral, you would choose Network Load Balancer.
Classic Load Balancer: provides basic load balancing across multiple EC2 instances and operates at both the
request and connection levels. Classic Load Balancer is intended for applications that were built within the EC2-Classic network.
Three computers are accessing content in the AWS Cloud. A load balancer splits this access between AZ
A and AZ B. Each zone has three EC2 instances, but in Zone A one instance is not functioning.
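The scenario above (two zones, three instances each, one instance down) can be sketched as round-robin balancing that skips unhealthy targets. The instance names and the round-robin policy are assumptions for illustration; Elastic Load Balancing uses its own health checks and routing algorithms.

```python
from itertools import cycle

# Hypothetical instances and their health status (True = functioning).
instances = {
    "az-a-1": True, "az-a-2": True, "az-a-3": False,  # one instance in Zone A is down
    "az-b-1": True, "az-b-2": True, "az-b-3": True,
}

def make_balancer(instances):
    """Return a function that hands out healthy instances in rotation."""
    order = cycle(sorted(instances))

    def next_target():
        for _ in range(len(instances)):  # look at each instance at most once
            name = next(order)
            if instances[name]:
                return name
        raise RuntimeError("no healthy instances")

    return next_target

next_target = make_balancer(instances)
served = [next_target() for _ in range(10)]
```

Ten requests are spread over the five healthy instances, and the failed instance never receives traffic.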
1. Is there anything you have done so often that it has become automatic for you or you can do
it without thinking? What actions fit this category? How do you think this relates to data caching?
Touch typing: typing on a keyboard without needing to look at the keys.
I can work faster and more effectively, get more done, and avoid unnecessary work such as looking at the keyboard.
2. This module is about load balancing. What strategies or tools do you use to balance your
responsibilities and life? Why is it important to have a way to maintain balance?
My strategy is to maintain good habits.
Keeping balance in life is important because it helps you avoid stress, feel happier, and do better in
everything you do. Just like caching saves time by reusing data, a balanced life helps you use your time
and energy wisely, so you can recharge and handle your tasks better. This balance also supports personal
growth and better relationships.
3. Data caching is crucial for parts of websites and apps that need to be processed or retrieved
very quickly. Remember that because the cache is a snapshot of the data on a server, it is
not updated immediately when the data changes. What are some examples of data in
websites or apps that you think should be cached? Why?

Read-only data that needs to be presented quickly to a large number of users, such as
videos, music, API responses, search results, … that are streamed to the world.
Caching this data improves performance and user satisfaction, which makes it essential for websites and apps with large audiences.
Module 12: Elastic Beanstalk and CloudFormation
AWS Elastic Beanstalk
Elastic Beanstalk automatically handles the deployment details of capacity provisioning, load balancing,
automatic scaling, and application health monitoring of an application. In many ways, using Elastic
Beanstalk is like running a macro or a batch file that places a wrapper around an existing application so
that it runs smoothly in the Amazon Web Services (AWS) Cloud.
A macro is an action or a set of actions that you can use to automate tasks.
A batch file is a script file that stores commands to be executed in a serial order.
AWS CloudFormation
This service gives developers and businesses an easy way to create a collection of related AWS resources
and provision them in an orderly and predictable fashion. CloudFormation provides a means for
combining a stack of AWS services, similar to writing macros or batch files in Linux or Microsoft Windows.
CloudFormation is quite similar to AWS Elastic Beanstalk, but Elastic Beanstalk uses a UI to provision,
configure, …, whereas with CloudFormation you write code and AWS sets everything up based on that code.
What does "orderly and predictable fashion" mean?
CloudFormation handles the dependencies between services, ensuring that resources are provisioned in
the correct order. For example, it might ensure that an EC2 instance is created only after a security group or VPC is set up.
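Provisioning in dependency order is essentially a topological sort. The sketch below uses Python's standard graphlib module on a made-up dependency map; the resource names are hypothetical, and this shows only the ordering idea, not how CloudFormation is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical stack: each resource maps to the resources it depends on.
depends_on = {
    "Vpc": set(),
    "SecurityGroup": {"Vpc"},
    "Ec2Instance": {"SecurityGroup"},
    "Database": {"SecurityGroup"},
}

# static_order() yields every resource after all of its dependencies.
creation_order = list(TopologicalSorter(depends_on).static_order())
```

Whatever order the map is written in, the VPC comes out before the security group, and the security group before the instance and the database.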
Handling dependencies this way ensures a predictable outcome: each time you create the same stack, the
resources are provisioned in the same order and state.
Stack
A collection of AWS resources that you can manage as a single unit. You can create, update, or delete a
collection of resources by creating, updating, or deleting stacks.
Elastic Beanstalk
Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services
developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as
Apache, Nginx, Passenger, and IIS.
You upload your code and Elastic Beanstalk automatically handles the deployment, from capacity
provisioning, load balancing, and automatic scaling to application health monitoring. At the same time,
you retain full control over the AWS resources powering your application and can access the underlying resources at any time.
Benefits of Elastic Beanstalk:
1. Fast and simple to begin
Elastic Beanstalk is the fastest and simplest way to deploy your application on AWS.
2. Developer productivity
Elastic Beanstalk provisions and operates the infrastructure and manages the application stack (platform)
for you, so you don't have to spend the time or develop the expertise.
3. Impossible to outgrow
Elastic Beanstalk automatically scales your application up and down based on your application's specific
need using easily adjustable automatic scaling settings.
4. Complete resource control
You have the freedom to select the AWS resources, such as Amazon Elastic Compute Cloud (Amazon
EC2) instance type, that are optimal for your application.
CloudFormation
CloudFormation provides a common language for you to describe and provision all the infrastructure
resources in your cloud environment. CloudFormation lets you use programming languages or a simple
text file to model and provision, in an automated and secure manner, all the resources needed for your
applications across all AWS Regions and accounts.
Benefits of CloudFormation
1. Model it all.
CloudFormation lets you model your entire infrastructure with a text file or programming languages.
2. Automate and deploy.
CloudFormation provisions your resources in a safe, repeatable manner, letting you build and rebuild your
infrastructure and applications, without having to perform manual actions or write custom scripts.
3. It's code.
Codifying your infrastructure lets you treat your infrastructure as code.
How Elastic Beanstalk differs from CloudFormation
These services are designed to complement each other. Elastic Beanstalk provides an environment to
easily deploy and run applications in the cloud. It is integrated with developer tools and provides a one-
stop experience for you to manage the life cycle of your applications.
CloudFormation is a convenient provisioning mechanism for a broad range of AWS resources. It supports
the infrastructure needs of many different types of applications such as existing enterprise applications,
legacy applications, and applications built using a variety of AWS resources and container-based solutions
(including those built using Elastic Beanstalk).
Container-based solutions refer to a method of deploying and running applications inside containers.
A container isolates an application and its dependencies from the underlying infrastructure (OS,
hardware), ensuring it runs the same in any environment.

Containers are lightweight compared to virtual machines (VMs) because they share the host operating
system’s kernel but run their own applications in isolated processes.

To be clear, Elastic Beanstalk is like running a .bat file and CloudFormation is like writing a .bat file.
Elastic Beanstalk lets developers upload and run their code; it then does all the behind-the-scenes cloud
setup such as launching EC2 instances and attaching elastic block storage. With CloudFormation, you are
basically setting up a template for all of the cloud resources you want to run so that it can all be done at once and in a repeatable way.
A repeatable way refers to a method or process that can be executed multiple times with consistent and predictable results.
CloudFormation supports Elastic Beanstalk application environments as one of the AWS resource types.
This lets you, for example, create and manage an application hosted by Elastic Beanstalk, along with an
Amazon Relational Database Service (Amazon RDS) database to store the application data. In addition to
RDS DB instances, any other supported AWS resources can be added to the group as well.
CloudFormation allows you to define your Elastic Beanstalk environment (application, compute
resources, scaling policies, etc.) in a template.
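To make the idea of a template concrete, the sketch below builds a small CloudFormation-style template as a Python dictionary and serializes it to JSON. The logical names (`WebApp`, `WebEnvironment`, `AppDatabase`, `DbPassword`) and all property values are placeholders chosen for illustration, and the template has not been validated against CloudFormation; treat it as a rough outline of the shape of a template, not a deployable one.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        # Supplied at stack creation so the password is not written in the file.
        "DbPassword": {"Type": "String", "NoEcho": True},
    },
    "Resources": {
        "WebApp": {
            "Type": "AWS::ElasticBeanstalk::Application",
            "Properties": {"Description": "Example web application"},
        },
        "WebEnvironment": {
            "Type": "AWS::ElasticBeanstalk::Environment",
            "Properties": {
                "ApplicationName": {"Ref": "WebApp"},
                # Placeholder: must be replaced with a real platform name.
                "SolutionStackName": "<a platform name supported in your Region>",
            },
        },
        "AppDatabase": {
            "Type": "AWS::RDS::DBInstance",
            "Properties": {
                "Engine": "mysql",
                "DBInstanceClass": "db.t3.micro",
                "AllocatedStorage": "20",
                "MasterUsername": "appuser",
                "MasterUserPassword": {"Ref": "DbPassword"},
            },
        },
    },
}

# The text file that would be handed to CloudFormation.
template_body = json.dumps(template, indent=2)
resource_types = sorted(r["Type"] for r in template["Resources"].values())
```

Because the whole stack lives in one text file, the application, its environment, and its database can be created, updated, or deleted together.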
Imagine you want to deploy a web application that runs on Elastic Beanstalk and requires a Relational
Database Service (RDS) database for storing application data. Instead of manually setting up each service, you
can define everything in a CloudFormation template.
Questions
1. Elastic Beanstalk is a service that lets developers upload their applications and
automatically provision all of the needed resources for the application to run smoothly and
efficiently. How do you think this process differs from traditional application deployment
(without the cloud)? Why is this style of deployment beneficial?

In traditional deployment, the development team first has to write a manual so that the deployment team
can set up everything on the on-premises server. Sometimes the deployment team does something wrong
or gets confused and has to ask the development team.
With the cloud, development and deployment are handled by the same team, and planning and provisioning
resources in the cloud is easier because both sides have a standard to follow.
https://www.youtube.com/watch?v=3c-iBn73dDE
2. What things do you picture or think of when you hear the name Elastic Beanstalk? Why do
you think the AWS Cloud service that provides the necessary resources for an uploaded
application is called Elastic Beanstalk?
The name "Elastic Beanstalk" is a reference to the beanstalk that grew all the way up to the clouds in the
fairy tale Jack and the Beanstalk.
Elastic: flexible and stretchable. It refers to scalability: scale out (add instances) when traffic is heavy, and
remove instances when traffic drops.
Beanstalk : You can quickly "plant" your application (upload it), and AWS Elastic Beanstalk takes care of
the heavy lifting (provisioning resources, managing scaling, health monitoring, etc.)—making it easy for
developers to grow their applications without dealing with complex infrastructure setup.
https://en.wikipedia.org/wiki/AWS_Elastic_Beanstalk
3. CloudFormation is a service that lets you create a template to deploy any number of cloud
resources at any time. What are some other industries or processes that use a template to
build or create something quickly? Why is this process beneficial?

One process that uses a template to build something quickly is a Dockerfile or Docker image. You do not
have to set up every environment by hand; you just write a text file, as with CloudFormation.