Collected lectures of Nguyễn Hữu Đức | Lecture notes for the Data Management and Visualization course | Hanoi University of Science and Technology (Trường Đại học Bách Khoa Hà Nội)

The document has 674 pages and is intended to help readers review for the upcoming exam.

Chapter 1
Introduction to data management and visualization
How big is big data?
Data science: The 4th paradigm for scientific discovery
Big data in 2008
Big data sources
• E-commerce
• Social networks
• Internet of things
• Data-intensive experiments (bioinformatics, quantum physics, etc.)
Data is the new oil
Big data: the 5 V's
Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them (Wikipedia).
Big data, big value
Source: wipro.com
Introduction to data management
What is data management?
Data management is the development and execution of architectures, policies, practices and procedures in order to manage the information lifecycle needs of an enterprise effectively.
Poor Data Management
• 94% of companies suffering from a catastrophic data loss do not survive: 43% never reopen and 51% close within two years. (University of Texas)
• 7 out of 10 small firms that experience a major data loss go out of business within a year. (DTI/PricewaterhouseCoopers)
• 50% of all tape backups fail to restore. (Gartner)
• 25% of all PC users suffer from data loss each year. (Gartner)
Why Data Management: Foundation to Advance Science
• Data is a valuable asset: it is expensive and time-consuming to collect.
• Data should be managed to:
o maximize the effective use and value of data and information assets
o continually improve quality, including data accuracy, integrity, integration, timeliness of data capture and presentation, relevance and usefulness
o ensure appropriate use of data and information
o facilitate data sharing
o ensure sustainability and accessibility in the long term for re-use in science
A new image processing technique reveals something not before seen in this Hubble Space Telescope image taken 11 years ago: a faint planet (arrows), the outermost of three discovered with ground-based telescopes last year around the young star HR 8799. (Image: D. Lafrenière et al., Astrophysical Journal Letters)
“The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director of the Space Telescope Science Institute in Baltimore, which operates Hubble.
“The second thing it tells you is having a well-calibrated archive is necessary but not sufficient to make breakthroughs; it also takes a very innovative group of people to develop very smart extraction routines that can get rid of all the artifacts to reveal the planet hidden under all that telescope and detector structure.”
Source: “Planet hidden in Hubble archives”, Science News (Feb. 27, 2009)
Data Management Facilitates Sharing and Re-use
Where a majority of data end up now…
Imagine if data were more accessible…
Data Life Cycle
Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
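The cycle above can be sketched as an ordered enumeration. This is a minimal Python sketch, assuming the stages run in the order listed on the slide and that Analyze wraps around to Plan; the names `Stage` and `next_stage` are hypothetical and do not come from the lecture.

```python
from enum import Enum


class Stage(Enum):
    """Data life cycle stages, in the order listed on the slide."""
    PLAN = "Plan"
    COLLECT = "Collect"
    ASSURE = "Assure"
    DESCRIBE = "Describe"
    PRESERVE = "Preserve"
    DISCOVER = "Discover"
    INTEGRATE = "Integrate"
    ANALYZE = "Analyze"


def next_stage(stage: Stage) -> Stage:
    """Return the stage that follows `stage`.

    Assumption: the life cycle is circular, so Analyze leads back to Plan.
    """
    stages = list(Stage)  # Enum iteration preserves definition order
    return stages[(stages.index(stage) + 1) % len(stages)]
```

Modelling the stages as an enumeration keeps the order explicit, so a project can record which stage a data set is in and what should come next.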
Planning
• Consider data management before you collect data
• What kind of data will be collected?
• Which methods will be used (sensors, samples, etc.)?
• What data formats/standards are appropriate?
• How will the data be used?
• How will you share the data?
• Will your methods satisfy:
o Funding requirements
o Policies for access, sharing, reuse
• Budget: most of the time this is overlooked!
• Output
o Formal document
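The planning questions above can be captured as a simple checklist record. This is only a sketch, assuming one free-text answer per question; `DataManagementPlan`, its field names, and `unanswered` are hypothetical, and the sample answers are made up for illustration. It shows one way to spot gaps in a plan before data collection starts.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataManagementPlan:
    """One answer per planning question; an empty string means unanswered."""
    data_kinds: str = ""             # What kind of data will be collected?
    collection_methods: str = ""     # Which methods will be used?
    formats_and_standards: str = ""  # What data formats/standards are appropriate?
    intended_use: str = ""           # How will the data be used?
    sharing: str = ""                # How will you share the data?
    funder_requirements: str = ""    # Do the methods satisfy funding requirements?
    access_policies: str = ""        # Policies for access, sharing, reuse
    budget: str = ""                 # Often overlooked, according to the slide

    def unanswered(self) -> List[str]:
        """Return the names of the fields that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Illustrative, made-up answers for a partially completed plan.
plan = DataManagementPlan(
    data_kinds="hourly temperature readings",
    collection_methods="field sensors",
    formats_and_standards="CSV",
)
print(plan.unanswered())
```

A filled-in record of this kind could then feed the formal document named as the output of the planning stage.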