DATA SCIENCE &
DATA VISUALIZATION -
Final
Nguyen Quang Dieu
BStudy
01 THEORY
Crical points of W07 -
Interacon
1. Stac content: infographics, books + dynamic content:
animated in auto-play, interacve content
2. Need to interact because exploring data that is
big/complex -> amplies cognion
3. Direct manipulaon: interact directly with objects + indirect interact
4. Types: single view: change over time, navigation, semantic zooming, filtering, focus + context; multiple views: selection, linking, brushing, adapting representation
5. Change over me: use slides to see views at dierent mes, show dierences explicitly -> doesn t
have to be literal me
6. Transion: change orders, track what s going on -> animated transions: smooth interpolaon
between states/techniques
7. Animaon caveats: changes hard to track, and eyes over memory
8. Navigaon: pan, zoom, rotate
9. Scrolltelling: an interacve story, interacng by scrolling but unexpected behavior10. Semanc
zooming: update content on zooming, more details and readable at any resoluons
11. Focus + Context: pick what to show, hint at what is not shown -> visual encoding and interaction (aggregation, reduction, layering)
12. Elision: focus item shown in detail
13. Degree of interest (DOI): represent objects in the neighbourhood in detail and only major landmarks far away; DOI(x) = I(x) - D(x,y)
14. DOI tree: interactive tree with animated transitions that fits within a bounded region of space; layout depends on the user's estimated DOI
15. Superimpose: focus layer limited to a local region of view
16. Magic lenses: detailed data is shown when moving a lens over a scene -> labeling
17. Distortion: use geometric distortion of the contextual regions to make room for the details in the focus region(s)
18. Distortion kinds: perspective wall, fisheye, hyperbolic geometry -> unsuitable for relative spatial judgments
19. Overview and detail: one view shows an overview + the other shows detail
20. Filter & dynamic querying: mantra "overview first, zoom & filter, details on demand"
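
The DOI formula in point 13 can be tried out on a small tree. The sketch below is only an illustration under stated assumptions: the tree `PARENT` is a made-up example, the a priori interest I(x) is assumed to be minus the depth of x (nodes near the root matter more), and D(x,y) is assumed to be the number of edges between x and the focus y. None of these choices come from the notes themselves.

```python
# Hypothetical toy tree, stored as child -> parent (the root has parent None).
PARENT = {
    "root": None,
    "a": "root", "b": "root",
    "a1": "a", "a2": "a",
    "b1": "b",
}

def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while PARENT[node] is not None:
        node = PARENT[node]
        path.append(node)
    return path

def depth(node):
    """Number of edges between `node` and the root."""
    return len(path_to_root(node)) - 1

def tree_distance(x, y):
    """D(x, y): number of edges on the path between x and y."""
    steps_from_y = {n: i for i, n in enumerate(path_to_root(y))}
    for steps_from_x, n in enumerate(path_to_root(x)):
        if n in steps_from_y:  # first common ancestor
            return steps_from_x + steps_from_y[n]
    raise ValueError("nodes are not in the same tree")

def doi(x, focus):
    """DOI(x) = I(x) - D(x, focus), with the assumed I(x) = -depth(x)."""
    return -depth(x) - tree_distance(x, focus)

def visible_nodes(focus, threshold):
    """Keep only nodes whose DOI reaches the threshold (the rest are elided)."""
    return sorted(n for n in PARENT if doi(n, focus) >= threshold)

print(visible_nodes("a1", -2))
```

With the focus on "a1" and a threshold of -2, only the focus and its ancestors stay visible, which matches the idea of showing the neighbourhood in detail and only major landmarks far away.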
Crical points of W08 - Views
1. Mulple views: eyes beat memory, no single visual encoding is opmal, and too many to be
shown in one view
2. Linked views: Mulple views are simultaneously visible and linked together
(highlighng + navigaon + encoding + dataset)
3. Mulform: Dierent visual encodings are used between views, supporng dierent tasks
4. Stack zooming, Mizbee, Stratomex -> Small mulples: same visual encoding, but shows a dierent
subset
5. Partitioning: action on the dataset that separates the categorical data into groups (divide data + splits + views)
6. Trellis plots: panel variable (encoded in individual views), partitioning variables (assigned to columns and rows), main-effects ordering
7. Recursive subdivision: flexibly transform data attributes into a hierarchy using treemaps as space-filling rectangular layouts
8. Layering/Overlay: combine multiple views on top of one another -> composite view
9. SUPERIMPOSED (best for tasks carried out within a local visual span) VS JUXTAPOSED (best for global tasks)
10. Dual axis, Combined chart, Layers/Dynamic layers
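
Partitioning (point 5) and the row/column assignment of trellis plots (point 6) come down to grouping records by categorical attributes. The sketch below shows that grouping step only, without any drawing. The record fields (`region`, `year`, `sales`) and the function names are hypothetical, chosen just for illustration.

```python
from collections import defaultdict

def partition(records, key):
    """Split records into groups, one per value of a categorical attribute.

    Each group would feed one view of a small-multiples display.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)

def trellis_cells(records, row_key, col_key):
    """Assign records to (row, column) cells using two partitioning variables."""
    cells = defaultdict(list)
    for record in records:
        cells[(record[row_key], record[col_key])].append(record)
    return dict(cells)

# Hypothetical data: every record has two categorical and one numeric attribute.
data = [
    {"region": "north", "year": 2022, "sales": 10},
    {"region": "north", "year": 2023, "sales": 12},
    {"region": "south", "year": 2022, "sales": 7},
    {"region": "south", "year": 2023, "sales": 9},
    {"region": "south", "year": 2023, "sales": 4},
]

by_region = partition(data, "region")
grid = trellis_cells(data, row_key="region", col_key="year")
print(sorted(by_region))             # one small multiple per region
print(len(grid[("south", 2023)]))    # records that land in one trellis cell
```

Each group keeps the same visual encoding and only the subset changes, which is the defining property of small multiples in point 4.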
Crical points of W09 - Table
1. Table: scale (1000+ need analysis), records (10 000 need analysis), homogeneity (same types/scale)
2. Analytic component: scatter plot, parallel coordinates -> heat map -> multidimensional scaling
3. Techniques: magnitude (size comparison), distribution (aggregating large data), ranking (magnitude ranking, bump charts, temporal, table lens, lineup), part to whole, deviation (reference point), correlation, change over time (line chart, stacked area, sparklines, clipped graphs, heatmap)
4. Bar chart and isotype visualization -> Part of the whole: show how a single entity can be broken down into its component elements, stacked bar chart, pie and donut chart, treemap, stacked area
5. Histogram -> good choice of bins = sqrt(n) or log2(n) + 1, density plot, box-and-whisker plot, notched box plot (with confidence interval), dot plots, violin plot = box plot + probability density function
6. Mulple aributes: combiner funcon (weighted sum for serial, maximum for parallel, and
product/nesng for complex)
Critical points of W10 - Storytelling
1. Good stories = facts + data + context + engage + educate
2. Underscore your arguments with data/facts and leverage the power of visualization
3. Components: introduction, context, main story, annotation of key points
4. Genre: magazine, annotated chart, partitioned poster, flow chart, comic strip, slide show, film/video/animation
5. Author-driven (linear ordering, heavy messaging, no interactivity) <-> Reader-driven (no ordering, no messaging, free interactivity)
6. Martini glass (author-driven, then open to explore), interactive slideshow (multiple scenes, interaction midway), drill-down story (decide path, annotated)
7. Layout: descriptive titles, subtitles, annotation, saturation
8. Interactivity: navigation, details on demand, relevant to the reader -> ask for opinions/prior knowledge
9. Design: fewer colors, average for context, better scale, richer annotations
10. Engagement: know target audience (opinionated <-> high information density) -> public media, expert panel, education, group meeting, board meeting
Critical points of W11 - Evaluation
1. Problem-driven: top-down approach; identify a problem encountered by users, design a solution to help users work more effectively; sometimes called a design study
2. Technique-driven: bottom-up approach; invent new visualization techniques or algorithms, classify or compare against other idioms and algorithms
3. Nested model approach
4. Design process: domain problem -> map to task + data type & factors -> identify & implement suitable technique
5. Domain characterization: details domain, grouped users, target domain, questions, data
6. Domain problem: infinite domain tasks, broken down into abstract tasks -> solutions probably exist
7. Task abstraction: what-why, generalized terms, task that user wants to do, data types and model, transform data -> specific task requirements
8. Encoding & interactions: design of visualization techniques, manipulation of visual representations, decisions of separated/intertwined, drive decisions
9. Task: analyze, search, query
10. High-level Analyze: consume -> discover, present, enjoy & produce -> annotate, record, derive
11. Mid-level Search: target, location -> lookup, browse, locate, explore
12. Mid-level Query: one, some, all -> identify, compare, summarize
13. Low-level Target: all data (trends, outliers, features), attributes (one/many), network data (topology), spatial data (shape)
14. Design: creating something new to solve a problem; design is used in many fields
15. Function can constrain possible forms -> form depends on tasks that must be achieved
16. When designing: wicked problems (no clear definition, no good solution) / non-wicked problems (math, chess, puzzles)
17. Why does it matter? Ineffective visualization combinations, unique problems & data, tasks, design space
18. Evaluaon methods: controlled experiments, interviews/quesonnaires, eld/lab observaons,
log analysis, algorithmic performance measurement, heuriscs evaluaon, usability tesng,
Wizard of Oz, eye-tracker, expert, insights-bases, case studies
19. Quantave: metrics, measurements, number/stats for data vs Qualitave method: subjecve
metrics, descripons, understandings
20. Internal validity: can you trust your experiment (high when in lab condions, aected by test
condions) vs External validity: representave of real-world usage (high when tested in the
elds, valid in the world) -> trade-o: The more akin to realworld situaons, the more
experiment is suscepble to uncontrolled sources of variaon
21. Scope of evaluaon
22. Predesign -> user work environment and workow, design -> visual encoding + interacon
design, prototype -> see if it achieves design goals and compares with convenonal soluons,
deployment -> see if it aects workow/work process and eecveness in the elds, re-design ->
improve current design by idenfying usability problems
02 CODE
(open in VSCode)
Thank you for coming
