
Multimedia data compression & coding: NFW final report, Multimedia and Entertainment Applications course | Hanoi University of Science and Technology



Author: My Lữ


HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY


ASSIGNMENT REPORT

TOPIC: NON-LINEAR FREQUENCY WARPING

Instructor: Assoc. Prof. Dr. Phạm Văn Tiến

Members:

Vũ Minh Hiển, 20224311

Lê Hoàng Anh, 20224297

Đào Hữu Mão, 20233865

Phạm Thị Thanh Trúc, 20224293

Subject: Multimedia data compression and coding

Class: 157320

Group: 10

 
Table of Contents
I. Introduction
II. Details
2.1. Record the audio
2.2. Spectrum show and analysis
2.2.1. Matlab coding
2.2.2. Spectrum analysis
2.3. NFW Compression and Decompression
2.3.1. Audio Compression Using NFW
2.3.2. Audio Decompression Using NFW
Performance and Evaluation
2.4. PSNR calculate and compare between NFW and MP3
2.4.1. About PSNR
2.4.2. Convert .wav (original) file to .mp3 file
2.4.3. Compare PSNR value of NFW codec and mp3 compression with original signal
2.5. Generate MIDI music and perform a Jazz song
III. CONCLUSION

I. Introduction

- Students who contributed to the project:

Vũ Minh Hiển (20224311): created the Jazz music mix; audio decompression.

Lê Hoàng Anh (20224297): implemented the code for audio compression, decompression, and PSNR comparison.

Đào Hữu Mão (20233865): converted the original .wav file to .mp3; calculated and compared PSNR values.

Phạm Thị Thanh Trúc (20224293): recording code, spectrum analysis, audio compression, report.

- Brief overview of the project:

This report provides a comprehensive overview of the process and outcomes of our advanced audio coding project implemented using MATLAB. It outlines each stage of the development step by step, detailing the methods, algorithms, and tools used throughout the project. The report also includes code snippets, visual illustrations, and explanations to support and demonstrate our implementation. By presenting both the theoretical background and the practical execution, we aim to give readers a complete understanding of how the project was carried out by our group.

II. Details

2.1. Record the audio

− Each of us recorded a part on a phone; we then mixed the recordings in an app and converted the result from MP3 to WAV.

− The 'Part1_recording.m' file can also be used to record.

2.2. Spectrum show and analysis

2.2.1. Matlab coding

− After recording, our next task was to use MATLAB to display the spectrum of the recorded audio signal and then comment on its energy distribution over the frequency axis.

− This is the code we used:

%% 1. Load Audio File
[y, fs] = audioread('recorded.wav');
y = y(:,1);                      % Keep the first channel if stereo

%% 2. Spectrum Analysis
% Calculate FFT
N = length(y);
Y = fft(y);
f = (0:N-1)' * fs / N;           % Frequency of each FFT bin (Hz)
half = floor(N/2);               % Number of bins below the Nyquist frequency

% Create figure with subplots
figure('Name', 'Audio Spectrum Analysis', 'Position', [100 100 1200 800]);

% Plot 1: Time Domain Signal
subplot(2,1,1);
t = (0:N-1)/fs;
plot(t, y);
xlabel('Time (s)'); ylabel('Amplitude');
title('Original audio signal');
grid on;

% Plot 2: Frequency Spectrum
subplot(2,1,2);
plot(f(1:half), abs(Y(1:half)));
xlabel('Frequency (Hz)'); ylabel('Amplitude');
title('Frequency spectrum of the audio signal');
grid on;

%% 3. Frequency Band Analysis
% Define frequency bands
bands = [0 100; 100 500; 500 2000; 2000 8000; 8000 fs/2];
band_names = {'0-100 Hz', '100-500 Hz', '500-2000 Hz', '2000-8000 Hz', '8000+ Hz'};
num_bands = size(bands, 1);
energy_bands = zeros(num_bands, 1);

% Calculate energy in each band
for i = 1:num_bands
    band_indices = f >= bands(i,1) & f <= bands(i,2);
    energy_bands(i) = sum(abs(Y(band_indices)).^2);
end

% Normalize energy to percentages
energy_bands = energy_bands / sum(energy_bands) * 100;

% Display energy distribution
disp('=== Energy distribution by frequency band ===');
for i = 1:num_bands
    fprintf('%s: %.2f%%\n', band_names{i}, energy_bands(i));
end

− It gives the following result:

2.2.2. Spectrum analysis

a. Frequency Band Analysis

− 0–100 Hz (2.37%): This very low-frequency range mostly contains background noise, such as wind or unwanted ambient sounds.

− 100–500 Hz (70.50%): This range holds the majority of the signal’s energy. It likely represents core sound components such as human speech fundamentals or main musical tones, making it a crucial frequency band for natural audio.

− 500–2000 Hz (20.71%): This band contains clearer, more detailed audio components, often speech formants or instrumental harmonics that enhance clarity.

− 2000–8000 Hz (4.53%): Higher-frequency components contributing to brightness or timbral nuance, usually from vocals or high-pitched instruments. These are perceptually important but lower in energy.

− 8000+ Hz (1.88%): This high-frequency range is still audible (human hearing extends to roughly 20 kHz), but it contains very little energy here and likely holds faint harmonics or minor noise that matter less for the perceived quality of this recording.
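The percentages above are each band's share of the total spectral energy. As a minimal sketch of that calculation, the Python snippet below uses a hypothetical two-tone test signal rather than the recording, a naive DFT instead of an FFT, and half-open band edges so that no bin is counted in two bands. The function name `band_energy_percent` is illustrative and does not come from the report.

```python
import cmath
import math


def band_energy_percent(signal, fs, band_edges):
    """Return each band's share (in %) of the energy in DFT bins 0..N/2."""
    n = len(signal)
    half = n // 2
    # Naive DFT: O(n^2), acceptable for a short illustrative signal.
    power = []
    for k in range(half + 1):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    energies = []
    for lo, hi in band_edges:
        # Bin k sits at k * fs / n Hz; each band is the half-open range [lo, hi).
        energies.append(sum(p for k, p in enumerate(power) if lo <= k * fs / n < hi))
    total = sum(energies)
    return [100.0 * e / total for e in energies]


# Hypothetical test signal: a 200 Hz tone plus a 1000 Hz tone at half the amplitude.
fs = 8000
n = 400
signal = [
    math.sin(2 * math.pi * 200 * t / fs) + 0.5 * math.sin(2 * math.pi * 1000 * t / fs)
    for t in range(n)
]
bands = [(0, 100), (100, 500), (500, 2000), (2000, 4001)]
shares = band_energy_percent(signal, fs, bands)
for (lo, hi), share in zip(bands, shares):
    print(f"{lo}-{hi} Hz: {share:.2f}%")
```

Because energy scales with the square of the amplitude, the half-amplitude tone carries a quarter of the energy of the full-amplitude one.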

b. Observations

− The energy distribution reflects the nature and quality of the recorded audio. For speech, most energy resides in the low to mid-frequency bands, which is consistent with the analysis.

− The dominance of the 100–500 Hz band (70.50%) indicates that this is the primary region carrying the signal's energy.

− Frequencies above 8000 Hz contribute only 1.88% of the overall energy, so they are candidates for coarser coding or removal. Some loss of brightness is possible, but the impact on a speech-dominated recording is likely to be small.

c. Recommendation for Compression

Based on this analysis, we recommend prioritizing the preservation of the low- and mid-frequency bands during compression, while applying more aggressive reduction or removal to high-frequency components. This strategy helps maintain perceptual quality while reducing file size.

2.3. NFW Compression and Decompression

2.3.1. Audio Compression Using NFW

Objective: The first script compresses an audio file using NFW, producing a .mat file containing quantized magnitude and phase components.

Steps Involved:

1. Reading the Audio File:
a. The audio is read from a WAV file using audioread. A single (mono) channel is selected for processing.

2. Short-Time Fourier Transform (STFT):
a. The audio signal is divided into overlapping windowed frames (2048-sample frame length, 1024-sample hop length).
b. An FFT is applied to each frame to generate the STFT matrix, which contains both magnitude and phase information for each frequency bin.

3. Non-linear Frequency Warping (Mel Scaling):
a. The frequency bins are warped using Mel scaling to mimic the human auditory system's frequency perception.
b. The warped frequency indices are used to re-assign magnitudes from the original STFT to a Mel-scaled matrix.

4. Quantization:
a. Both magnitude and phase matrices are quantized: magnitudes with 6 bits and phase with 4 bits. This reduces the bit depth and therefore the storage requirement.

5. Saving the Compressed Data:
a. The quantized magnitude and phase are stored in a .mat file, alongside the parameters necessary for decompression (sampling frequency, frame length, hop length, etc.).
b. The compression ratio is calculated by comparing the original and compressed sizes in bits.
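The compression script itself is not reproduced in this report, so the following Python sketch only illustrates steps 3 and 4 under stated assumptions: the common Mel formula mel = 2595 * log10(1 + f / 700), a warped axis with the same number of bins as the linear one, and a plain uniform quantizer. All function names here are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Common Mel-scale formula (an assumption; the report does not state which one it uses)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def warped_bin_indices(num_bins, fs):
    """Map each linear bin to its index on an equally spaced Mel axis of the same size."""
    nyquist = fs / 2.0
    max_mel = hz_to_mel(nyquist)
    indices = []
    for k in range(num_bins):
        f = k * nyquist / (num_bins - 1)          # centre frequency of linear bin k
        indices.append(round(hz_to_mel(f) / max_mel * (num_bins - 1)))
    return indices


def quantize(value, lo, hi, bits):
    """Uniform quantizer: map a value in [lo, hi] to an integer level in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * levels)


idx = warped_bin_indices(1025, 44100)             # 1025 bins for a 2048-sample frame
print(idx[:5], idx[-5:])
print(quantize(0.3, 0.0, 1.0, 6))                 # 6-bit magnitude level
print(quantize(1.0, -math.pi, math.pi, 4))        # 4-bit phase level
```

On such a Mel axis, low-frequency bins are spread apart while several high-frequency bins share one warped index, which is where the coarser treatment of high frequencies comes from.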

Output:

• A .mat file containing the compressed magnitude and phase data.

Compression Information: The script prints the compression ratio, original file size, and compressed file size for transparency.

2.3.2. Audio Decompression Using NFW

Objective: The second script reconstructs the audio from the compressed .mat file, reversing the NFW process and producing a reconstructed audio file in .wav format.

Steps Involved:

1. Loading the Compressed Data:
a. The compressed file is loaded to extract the quantized magnitude and phase, as well as the original audio parameters (frame length, hop length, etc.).

2. Dequantization:
a. The magnitude and phase values are dequantized, reversing the quantization process by mapping the levels back to the original range.

3. Inverse Frequency Warping (De-warping):
a. The Mel-scaled magnitude and phase are mapped back to the linear frequency scale. This involves interpolating between the warped indices to reconstruct the original frequency bins.

4. Reconstruction of the Full Spectrum:
a. The full spectrum (both positive and negative frequencies) is reconstructed by applying conjugate symmetry for real signals.

5. Inverse Short-Time Fourier Transform (ISTFT):
a. The inverse STFT is applied using the Overlap-Add method. The frames are synthesized from the magnitude and phase information and combined to produce the time-domain signal.

6. Saving the Decompressed Audio:
a. The decompressed audio is normalized to prevent clipping, saved as a .wav file, and played back to compare with the original audio.
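As a rough illustration of steps 2 and 4, the Python sketch below dequantizes a level with the inverse of a uniform quantizer and rebuilds a full spectrum from bins 0..N/2 using conjugate symmetry. It is not the group's decompression script: the function names are hypothetical, the frame is only 8 samples long, and a naive inverse DFT stands in for an inverse FFT.

```python
import cmath
import math


def dequantize(level, lo, hi, bits):
    """Map an integer level in [0, 2**bits - 1] back to a value in [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + (level / levels) * (hi - lo)


def full_spectrum(half):
    """Extend bins 0..N/2 of a real signal's DFT to all N bins (N even)."""
    # For a real signal, X[N - k] = conj(X[k]) for k = 1 .. N/2 - 1.
    # The DC and Nyquist bins must be real for the output to be real.
    mirrored = [z.conjugate() for z in reversed(half[1:-1])]
    return list(half) + mirrored


def inverse_dft(spectrum):
    """Naive inverse DFT, used here only to check the reconstruction."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# Hypothetical 8-sample frame described by magnitude and phase for bins 0..4.
mags = [0.0, 4.0, 0.0, 0.0, 0.0]
phases = [0.0, -math.pi / 2, 0.0, 0.0, 0.0]
half = [m * cmath.exp(1j * p) for m, p in zip(mags, phases)]
frame = inverse_dft(full_spectrum(half))
print([round(z.real, 3) for z in frame])
```

In this example the imaginary parts of `frame` vanish up to rounding error, and the real parts form one period of a sine wave, which is what conjugate symmetry guarantees for real signals.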

Output:

• A .wav file containing the decompressed audio.

Decompression Information:

• The script prints details such as the input and output file names, sampling frequency, and audio duration.

Performance and Evaluation

Compression Ratio:

• The compression ratio is an important metric. The original audio file is typically in 16-bit PCM format, while the compressed version stores each spectral value as a 6-bit magnitude and a 4-bit phase, which reduces the storage size.
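A back-of-the-envelope estimate can show what these bit depths imply. The Python sketch below assumes that every one of the 1025 non-redundant bins of a 2048-sample frame is stored for each 1024-sample hop, and that the values are packed with no overhead. Neither assumption is confirmed by the report, and a .mat file adds its own container overhead, so the ratio printed by the actual script may differ. The function name is hypothetical.

```python
def estimated_ratio(frame_len=2048, hop=1024, pcm_bits=16, mag_bits=6, phase_bits=4):
    """Idealized ratio of PCM bits to NFW bits per hop, ignoring headers and overhead."""
    num_bins = frame_len // 2 + 1                  # non-redundant bins of a real frame
    pcm_bits_per_hop = hop * pcm_bits              # new input samples covered by each frame
    nfw_bits_per_frame = num_bins * (mag_bits + phase_bits)
    return pcm_bits_per_hop / nfw_bits_per_frame


print(f"{estimated_ratio():.2f}:1")
```

Under these assumptions the estimate is about 1.6:1, because the 50% frame overlap means that roughly as many spectral values are stored as there are input samples. A larger ratio would require storing fewer warped bins per frame or using fewer bits per value.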

Quality Assessment:

• The decompressed audio is played back and compared with the original to evaluate the perceptual quality. Because this is a lossy compression method, some degradation in quality is expected; the method is worthwhile when the reduction in file size matters more than that loss, as in storage-limited applications.

2.4. PSNR calculate and compare between NFW and MP3

2.4.1. About PSNR

- PSNR (Peak Signal-to-Noise Ratio) measures the reconstruction quality of a signal after processing (such as compression and decompression) compared to the original signal. In the context of this project:

• Original signal: recorded.wav (the original audio).

• Reconstructed signal: compressed_mel.wav (the audio after compression and decompression).

- PSNR is computed from the error between the two signals in the same domain (either time or frequency), typically the time domain (i.e., WAV audio samples).
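For reference, PSNR in decibels is 10 * log10(peak^2 / MSE), where MSE is the mean squared error between the two signals. The Python sketch below assumes samples normalized to [-1, 1], so the peak is 1, and compares only the overlapping part of the two signals. The function name `psnr` is illustrative.

```python
import math


def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB over the overlapping part of two signals."""
    n = min(len(reference), len(test))
    if n == 0:
        raise ValueError("both signals must contain at least one sample")
    mse = sum((reference[i] - test[i]) ** 2 for i in range(n)) / n
    if mse == 0:
        return math.inf                            # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


original = [0.0, 0.5, -0.5, 0.25]
degraded = [0.1, 0.6, -0.4, 0.35]                  # constant error of 0.1 per sample
print(f"{psnr(original, degraded):.2f} dB")
```

A higher PSNR means a smaller sample-by-sample error, which does not always track perceived audio quality.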

2.4.2. Convert .wav (original) file to .mp3 file

1. Install FFmpeg for Windows

1. Go to: https://ffmpeg.org/download.html

2. Choose Windows → follow the link to gyan.dev.

3. Download ffmpeg-master-latest-win64-gpl.zi.

4. Extract it and add the path of the bin folder to your Windows PATH.

2. Use FFmpeg to convert the .wav file to an .mp3 file

1. Open a terminal.

2. Check the installation with ffmpeg -version.

3. Copy the .wav file to C:\Users\Dell (your path).

4. In the terminal, enter:

ffmpeg -i recorded.wav -codec:a libmp3lame -b:a 192k recorded.mp3

The final screen should be:

Then check C:\Users\Dell (your path) again to get the .mp3 file.
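The same conversion can be scripted. The Python sketch below builds exactly the command shown above and runs it only if an ffmpeg executable is found on PATH. The helper names `build_mp3_command` and `convert_to_mp3` are hypothetical wrappers, not part of FFmpeg.

```python
import shutil
import subprocess


def build_mp3_command(src, dst, bitrate="192k"):
    """Return the ffmpeg argument list used in this section."""
    return ["ffmpeg", "-i", src, "-codec:a", "libmp3lame", "-b:a", bitrate, dst]


def convert_to_mp3(src, dst, bitrate="192k"):
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_mp3_command(src, dst, bitrate), check=True)


print(" ".join(build_mp3_command("recorded.wav", "recorded.mp3")))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.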

2.4.3. Compare PSNR value of NFW codec and mp3 compression with original signal

- In the next part of the assignment, we calculated the PSNR using this code:

input_file = 'recorded.wav';
compare_compression_quality(input_file);

function compare_compression_quality(input_file)
try
    %% 1. Load Original Audio
    [original, fs] = audioread(input_file);
    original = original(:,1);   % Keep the first channel if stereo

    %% 2. Load NFW Compressed Audio
    % Load decompressed NFW audio
    decompressed_nfw_file = 'decompressed.wav';
    if ~exist(decompressed_nfw_file, 'file')
        error('The decompressed NFW file does not exist. Please run decompress_NFW first.');
    end
    [nfw_audio, ~] = audioread(decompressed_nfw_file);
    nfw_audio = nfw_audio(:,1); % Keep the first channel if stereo

    %% 3. Load MP3 Audio
    mp3_file = 'recorded.mp3';
    [mp3_audio, ~] = audioread(mp3_file);
    mp3_audio = mp3_audio(:,1); % Keep the first channel if stereo

    %% 4. Ensure All Audio Signals Have Same Length
    min_length = min([length(original), length(nfw_audio), length(mp3_audio)]);
    original = original(1:min_length);
    nfw_audio = nfw_audio(1:min_length);
    mp3_audio = mp3_audio(1:min_length);

    %% 5. Calculate PSNR (peak amplitude taken as 1 for normalized audio)
    % Calculate PSNR for NFW
    nfw_mse = mean((original - nfw_audio).^2);
    if nfw_mse == 0
        nfw_psnr = Inf;
    else
        nfw_psnr = 10 * log10(1 / nfw_mse);
    end

    % Calculate PSNR for MP3
    mp3_mse = mean((original - mp3_audio).^2);
    if mp3_mse == 0
        mp3_psnr = Inf;
    else
        mp3_psnr = 10 * log10(1 / mp3_mse);
    end

    %% 6. Calculate Compression Ratios
    % Original file size
    original_info = dir(input_file);
    original_size = original_info.bytes * 8; % Convert to bits

    % NFW compressed size
    compressed_nfw_file = 'compressed.mat';
    if ~exist(compressed_nfw_file, 'file')
        error('The compressed NFW file does not exist. Please run compress_NFW first.');
    end
    nfw_info = dir(compressed_nfw_file);
    nfw_size = nfw_info.bytes * 8; % Convert to bits
    nfw_ratio = original_size / nfw_size;

    % MP3 compressed size
    mp3_info = dir(mp3_file);
    mp3_size = mp3_info.bytes * 8; % Convert to bits
    mp3_ratio = original_size / mp3_size;

    %% 7. Display Results
    fprintf('=== Compression Quality Comparison ===\n');
    fprintf('Original file: %s\n', input_file);
    fprintf('Original size: %d bits\n', original_size);
    fprintf('\n');

    fprintf('NFW Compression:\n');
    fprintf(' Compressed size: %d bits\n', nfw_size);
    fprintf(' Compression ratio: %.2f:1\n', nfw_ratio);
    fprintf(' PSNR: %.2f dB\n', nfw_psnr);
    fprintf('\n');

    fprintf('MP3 Compression:\n');
    fprintf(' Compressed size: %d bits\n', mp3_size);
    fprintf(' Compression ratio: %.2f:1\n', mp3_ratio);
    fprintf(' PSNR: %.2f dB\n', mp3_psnr);

    % Compare the results
    fprintf('\n=== Comparison Summary ===\n');
    if nfw_psnr > mp3_psnr
        fprintf('NFW has better quality (PSNR: %.2f dB > %.2f dB)\n', nfw_psnr, mp3_psnr);
    elseif mp3_psnr > nfw_psnr
        fprintf('MP3 has better quality (PSNR: %.2f dB > %.2f dB)\n', mp3_psnr, nfw_psnr);
    else
        fprintf('Both methods have the same quality (PSNR: %.2f dB)\n', nfw_psnr);
    end

    if nfw_ratio > mp3_ratio
        fprintf('NFW has better compression ratio (%.2f:1 > %.2f:1)\n', nfw_ratio, mp3_ratio);
    elseif mp3_ratio > nfw_ratio
        fprintf('MP3 has better compression ratio (%.2f:1 > %.2f:1)\n', mp3_ratio, nfw_ratio);
    else
        fprintf('Both methods have the same compression ratio (%.2f:1)\n', nfw_ratio);
    end

    %% 8. Visual Comparison
    % Create time-domain plots
    figure;
    subplot(3,1,1);
    plot(original);
    title('Original Audio');
    xlabel('Sample');
    ylabel('Amplitude');

    subplot(3,1,2);
    plot(nfw_audio);
    title('NFW Compressed Audio');
    xlabel('Sample');
    ylabel('Amplitude');

    subplot(3,1,3);
    plot(mp3_audio);
    title('MP3 Compressed Audio');
    xlabel('Sample');
    ylabel('Amplitude');

    % Create frequency-domain plots
    figure;
    subplot(3,1,1);
    plot_spectrum(original, fs);
    title('Original Audio Spectrum');

    subplot(3,1,2);
    plot_spectrum(nfw_audio, fs);
    title('NFW Compressed Audio Spectrum');

    subplot(3,1,3);
    plot_spectrum(mp3_audio, fs);
    title('MP3 Compressed Audio Spectrum');

    % Create error plots
    figure;
    subplot(2,1,1);
    plot(original - nfw_audio);
    title('Error: Original - NFW');
    xlabel('Sample');
    ylabel('Amplitude');

    subplot(2,1,2);
    plot(original - mp3_audio);
    title('Error: Original - MP3');
    xlabel('Sample');
    ylabel('Amplitude');

    %% 9. Play Audio for Comparison
    % sound() returns immediately, so pause for the clip duration plus 1 s
    disp('Playing original audio...');
    sound(original, fs);
    pause(length(original)/fs + 1);

    disp('Playing NFW compressed audio...');
    sound(nfw_audio, fs);
    pause(length(nfw_audio)/fs + 1);

    disp('Playing MP3 compressed audio...');
    sound(mp3_audio, fs);

catch ME
    fprintf('Error in comparison: %s\n', ME.message);
    fprintf('Stack trace:\n');
    disp(ME.stack);
    rethrow(ME);
end
end

function plot_spectrum(x, fs)
% Plot the single-sided magnitude spectrum of x
N = length(x);
X = fft(x);
half = floor(N/2) + 1;          % Bins from DC up to the Nyquist frequency
f = (0:half-1) * fs / N;        % Frequency of each plotted bin (Hz)
plot(f, 2*abs(X(1:half))/N);
xlabel('Frequency (Hz)');
ylabel('Magnitude');
xlim([0 min(20000, fs/2)]);
end
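Sample-by-sample PSNR is sensitive to time offsets, and MP3 encoding and decoding commonly add a short delay or padding at the start of the signal. Whether that affects the files used here is not stated in the report. The Python sketch below therefore shows only one possible precaution: estimating the offset by brute-force cross-correlation and trimming both signals before the PSNR is computed. The function names and the search range are assumptions.

```python
import math


def best_lag(reference, test, max_lag):
    """Return the shift d for which test[i + d] best matches reference[i]."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(reference)):
            j = i + lag
            if 0 <= j < len(test):
                score += reference[i] * test[j]    # cross-correlation at this lag
        if score > best_score:
            best, best_score = lag, score
    return best


def align(reference, test, lag):
    """Trim both signals so that sample i of each refers to the same instant."""
    if lag >= 0:
        test = test[lag:]
    else:
        reference = reference[-lag:]
    n = min(len(reference), len(test))
    return reference[:n], test[:n]


# Hypothetical example: the "decoded" signal is the reference delayed by 3 samples.
reference = [math.sin(0.37 * i * i) for i in range(50)]
delayed = [0.0, 0.0, 0.0] + reference
lag = best_lag(reference, delayed, max_lag=5)
ref_aligned, test_aligned = align(reference, delayed, lag)
print(lag, len(ref_aligned))
```

The brute-force search costs O(N * max_lag), so for long recordings it is usually run on a short excerpt or replaced by an FFT-based correlation.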

- Running the comparison script gives the following results:

2.5. Generate MIDI music and perform a Jazz song

− Here is the Python code that generates the MIDI background music:

from mido import Message, MidiFile, MidiTrack
import random

# Desired total duration: 5 minutes = 300 seconds
# With the default tempo: 500000 us/beat = 0.5 s/beat
# 480 ticks per beat (default)
# => 1 second = 960 ticks
# => 5 minutes = 300 seconds = 288000 ticks
total_ticks = 288000
ticks_per_note = 600
num_notes = total_ticks // ticks_per_note  # => 480 notes

mid = MidiFile()
track = MidiTrack()
mid.tracks.append(track)

track.append(Message('program_change', program=0, time=0))

for i in range(num_notes):
    note = random.choice([60, 62, 65, 67, 69, 72])
    velocity = random.randint(60, 90)
    # Wait 200 ticks, then sound the note for 400 ticks (600 ticks per note)
    track.append(Message('note_on', note=note, velocity=velocity, time=200))
    track.append(Message('note_off', note=note, velocity=0, time=400))

mid.save('midi_output.mid')
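The tick arithmetic in the comments above can be checked with a small helper. This Python sketch uses only the standard library and takes the tempo (500000 microseconds per beat) and the resolution (480 ticks per beat) stated in those comments as its defaults. The helper names are hypothetical.

```python
def seconds_to_ticks(seconds, tempo_us_per_beat=500000, ticks_per_beat=480):
    """Convert a duration in seconds to MIDI ticks for a fixed tempo."""
    beats = seconds * 1_000_000 / tempo_us_per_beat
    return round(beats * ticks_per_beat)


def note_count(total_seconds, ticks_per_note, tempo_us_per_beat=500000, ticks_per_beat=480):
    """Number of whole notes of ticks_per_note ticks that fit into total_seconds."""
    total = seconds_to_ticks(total_seconds, tempo_us_per_beat, ticks_per_beat)
    return total // ticks_per_note


print(seconds_to_ticks(1))       # ticks in one second
print(seconds_to_ticks(300))     # ticks in five minutes
print(note_count(300, 600))      # notes needed for five minutes
```

Each generated note occupies 600 ticks (a 200-tick wait plus a 400-tick note), which is why 288000 ticks correspond to 480 notes.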

- The last step combines the recorded audio with the background audio .wav file (converted from the MIDI output generated by the Python script):

[y1, fs1] = audioread('recorded.wav');
[y2, fs2] = audioread('midi_output.wav');

% Both files must share one sample rate, because the mix is written at fs1
if fs1 ~= fs2
    error('Sample rates differ (%d Hz vs %d Hz); resample one file first.', fs1, fs2);
end

% Convert to mono if needed
if size(y1,2) > 1
    y1 = mean(y1, 2);
end
if size(y2,2) > 1
    y2 = mean(y2, 2);
end

% Trim to the shorter length
min_len = min(length(y1), length(y2));
y1 = y1(1:min_len);
y2 = y2(1:min_len);

% Mix with the background at a gain of 0.5
mixed = y1 + 0.5 * y2;
mixed = mixed / max(abs(mixed)); % Normalize the peak to 1 to avoid clipping

audiowrite('jazz_mix.wav', mixed, fs1);
fprintf('Successfully created jazz_mix.wav\n');
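The mixing step comes down to a weighted sum followed by peak normalization. The Python sketch below shows that logic on plain lists of samples, with an added guard for an all-silent mix. It is an illustration rather than a port of the MATLAB script, and the function name is hypothetical.

```python
def mix_and_normalize(voice, background, background_gain=0.5):
    """Mix two mono signals and scale the result so that its peak magnitude is 1."""
    n = min(len(voice), len(background))           # trim to the shorter signal
    mixed = [voice[i] + background_gain * background[i] for i in range(n)]
    peak = max((abs(s) for s in mixed), default=0.0)
    if peak == 0.0:
        return mixed                               # silence: nothing to normalize
    return [s / peak for s in mixed]


voice = [1.0, -0.5, 0.0]
background = [1.0, 1.0, 1.0, 0.3]
print(mix_and_normalize(voice, background))
```

Normalizing after the sum guarantees that the written file cannot clip, whatever gain is chosen for the background.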

III. CONCLUSION

This is our first major assignment in the Multimedia Data Compression and Coding subject. Although we are still inexperienced with many of the problems involved, it was an essential opportunity for us, as students of HUST and members of the ET-E16 major, to gain experience in both technical knowledge and teamwork, in preparation for further study in multimedia.

In this project, we applied Non-linear Frequency Warping (NFW) combined with quantization to compress audio signals. The analysis of frequency energy distribution helped us focus on preserving the essential low- and mid-frequency components while discarding high-frequency details that contribute less to perceived quality.

Additionally, we generated a MIDI jazz sequence from randomly chosen notes of the C major scale. This showed how programmatic music creation can produce background audio with a minimal file size, suitable for embedded or multimedia systems.

Overall, the compression technique and the MIDI synthesis demonstrated practical strategies for reducing audio size and for generating audio while limiting the loss of perceptual quality.

Bấm Tải xuống để xem toàn bộ.

Preview text:

HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
---□&□--- ASSIGNMENT REPORT
TOPIC: NON-LINEAR FREQUENCY WARPING
Instructor
Assoc. Prof. Dr Phạm Văn Tiến Member Vũ Minh Hiển 20224311 Lê Hoàng Anh 20224297 Đào Hữu Mão 20233865 Phạm Thị Thanh Trúc 20224293 Subject
Multimedia data compression and coding Class 157320 Group 10 Table of Contents
I. Introduction ........................................................................................................... 3
II. Details .................................................................................................................. 4
2.1. Record the audio ........................................................................................... 4
2.2. Spectrum show and analysis ......................................................................... 4
2.2.1. Matlab coding .......................................................................................... 4
2.2.2. Spectrum analysis ................................................................................... 5
2.3. NFW Compression and Decompression ....................................................... 6
2.3.1. Audio Compression Using NFW .............................................................. 6
2.3.2. Audio Decompression Using NFW .......................................................... 7
Performance and Evaluation ............................................................................. 7
2.4. PSNR calculate and compare between NFW and MP3 ................................ 9
2.4.1. About PSNR ............................................................................................ 9
2.4.2. Convert .wav (original) file to .mp3 file .................................................... 9
2.4.3. Compare PNSR value of NFW codec and mp3 compression with original
.......................................................................................................................... 9
signal ................................................................................................................. 9
2.5. Generate MIDI music and perform a Jazz song .......................................... 12
III. CONCLUSION ..................................................................................................... 13
I. Introduction
- Students who have been contributing for the project: Vũ Minh Hiển
20224311 Creates a Jazz music mix, audio decompress.
Implemented the code for audio compress, decompress, Lê Hoàng Anh 20224297 PSNR comparation.
Convert .wav (original) file to .mp3 file, Calculate and Đào Hữu Mão 20233865 compare PSNR value.
Recording code, Spectrum Analysis, audio compress, Phạm Thị Thanh Trúc 20224293 report. - Brief view of the project:
This report provides a comprehensive overview of the process and outcomes of our
advanced audio coding project implemented using MATLAB. It outlines each stage
of the development in a clear, step-by-step manner, detailing the methods, algorithms,
and tools used throughout the project. The report also includes code snippets, visual
illustrations, and explanations to support and demonstrate our implementation. By
presenting both the theoretical background and practical execution, we aim to give
readers a complete understanding of how the project was carried out by our group.
II. Details 2.1. Record the audio
− Each of us recorded using phone, mix all with app and converted from mp3 to wav.
− We can also use the 'Part1_recording.m ' file to record
2.2. Spectrum show and analysis 2.2.1. Matlab coding
− After recording, our next task is using Matlab to show spectrum of the recorded audio
signal, then comments on its energy distribution over the frequency axis.
− This is the code we used and result: %% 1. Load Audio File
[y, fs] = audioread('recorded.wav'); y =
y(:,1); % Convert to mono if stereo %% 2. Spectrum Analysis % Calculate FFT N = length(y); Y = fft(y); f = linspace(0, fs, N); % Create figure with subplots
figure('Name', 'Audio Spectrum Analysis', 'Position', [100 100 1200 800]); % Plot 1: Time Domain Signal
subplot(2,1,1); t = (0:N-1)/fs;
plot(t, y); xlabel('Thời gian (s)'); ylabel('Biên độ');
title('Tín hiệu âm thanh gốc'); grid on; % Plot 2: Frequency Spectrum subplot(2,1,2);
plot(f(1:N/2), abs(Y(1:N/2)));
xlabel('Tần số (Hz)'); ylabel('Biên độ');
title('Phổ tần số của tín hiệu âm thanh'); grid on;
%% 3. Frequency Band Analysis % Define frequency bands
bands = [0 100; 100 500; 500 2000; 2000 8000; 8000 fs/2];
band_names = {'0-100 Hz', '100-500 Hz', '500-2000 Hz', '2000-8000 Hz', '8000+ Hz'};
energy_bands = zeros(length(bands), 1); % Calculate energy in each band
for i = 1:length(bands) band_indices = f >=
bands(i,1) & f <= bands(i,2); energy_bands(i) =
sum(abs(Y(band_indices)).^2); end % Normalize energy
energy_bands = energy_bands / sum(energy_bands) * 100; % Display energy distribution
disp('=== Phân bố năng lượng theo dải tần ==='); for i = 1:length(bands)
fprintf('%s: %.2f%%\n', band_names{i}, energy_bands(i)); end
− And it gives out the following result:
2.2.2. Spectrum analysis a. Frequency Band Analysis
− 0–100 Hz (2.37%): This very low-frequency range mostly contains background
noise, such as wind or unwanted ambient sounds.
− 100–500 Hz (70.50%): This range holds the majority of the signal’s energy. It likely
represents core sound components such as human speech fundamentals or main
musical tones, making it a crucial frequency band for natural audio.
− 500–2000 Hz (20.71%): This band contains clearer, more detailed audio
components — often speech formants or instrumental harmonics that enhance clarity.
− 2000–8000 Hz (4.53%): Higher-frequency components contributing to brightness or
timbral nuance, usually from vocals or high-pitched instruments. These are
perceptually important but lower in energy.
− 8000+ Hz (1.88%): This ultrasonic range contains very little energy and likely holds
inaudible frequencies or minor noise, often non-essential for perceptual audio quality. b. Observations
− The energy distribution reflects the nature and quality of the recorded audio. In the
case of speech, most energy resides in low to mid-frequency bands, which is consistent with the analysis.
− The dominance of the 100–500 Hz band indicates this is the primary region carrying meaningful audio content.
− Frequencies above 8000 Hz contribute little to the overall energy and may be safely
discarded or heavily compressed without noticeable quality loss.
c. Recommendation for Compression
Based on this analysis, we recommend prioritizing the preservation of low and
midfrequency bands during compression, while applying more aggressive reduction or
removal to high-frequency components. This strategy helps maintain perceptual quality
while significantly reducing file size.
2.3. NFW Compression and Decompression
2.3.1. Audio Compression Using NFW
Objective: The first script aims to compress an audio file using NFW, producing a .mat file
containing quantized magnitude and phase components.
Steps Involved:
1. Reading the Audio File:
a. The audio is read from a WAV file using audioread. The mono channel is selected for processing.
2. Short-Time Fourier Transform (STFT):
a. The audio signal is divided into frames using a windowed STFT process
(with a 2048-sample frame length and 1024-sample hop length).
b. FFT is applied to each frame to generate the STFT matrix, which contains
both magnitude and phase information for each frequency bin.
3. Non-linear Frequency Warping (Mel Scaling):
a. The frequency bins are warped using Mel scaling to mimic the human
auditory system's frequency perception.
b. The warped frequency indices are used to re-assign magnitudes from the
original STFT to a Mel-scaled matrix. 4. Quantization:
a. Both magnitude and phase matrices are quantized. Magnitudes are quantized
with 6 bits, and phase information is quantized with 4 bits. This reduces the
bit-depth, lowering the storage requirement.
5. Saving the Compressed Data:
a. The quantized magnitude and phase are stored in a .mat file, alongside the
parameters necessary for decompression (sampling frequency, frame length, hop length, etc.).
b. The compression ratio is calculated, comparing the original and compressed bit sizes. Output:
• A .mat file containing the compressed magnitude and phase data.
Compression Information: The script prints the compression ratio, original file size, and
compressed file size for transparency.
2.3.2. Audio Decompression Using NFW
Objective: The second script reconstructs the audio from the compressed .mat file,
reversing the NFW process and producing a reconstructed audio file in .wav format.
Steps Involved:
1. Loading the Compressed Data:
a. The compressed file is loaded to extract the quantized magnitude and phase,
as well as the original audio parameters (frame length, hop length, etc.). 2. Dequantization:
a. The magnitude and phase values are dequantized, reversing the quantization
process by mapping back the levels to the original range.
3. Inverse Frequency Warping (De-warping):
a. The Mel-scaled magnitude and phase are mapped back to the linear
frequency scale. This involves interpolating between the warped indices to
reconstruct the original frequency bins.
4. Reconstruction of the Full Spectrum:
a. The full spectrum (both positive and negative frequencies) is reconstructed
by applying conjugate symmetry for real signals.
5. Inverse Short-Time Fourier Transform (ISTFT):
a. The inverse STFT is applied using the Overlap-Add method. The frames are
synthesized from the magnitude and phase information and combined to
produce the time-domain signal.
6. Saving the Decompressed Audio:
a. The decompressed audio is saved as a .wav file, normalized to prevent
clipping, and played back to compare with the original audio. Output:
• A .wav file containing the decompressed audio.
Decompression Information:
• The script prints details such as the input and output file names, sampling frequency, and audio duration.
Performance and Evaluation Compression Ratio:
• The compression ratio is an important metric. The original audio file is typically in
16-bit PCM format, while the compressed version uses 6-bit magnitude and 4-bit
phase, significantly reducing the storage size. Quality Assessment:
• The decompressed audio is played back and compared with the original to evaluate
the perceptual quality. Since this is a lossy compression method, there might be
some degradation in quality, but the compression ratio and file size reduction justify
its use in storage-limited applications.
2.4. PSNR calculate and compare between NFW and MP3 2.4.1. About PSNR
- PSNR (Peak Signal-to-Noise Ratio) measures the reconstruction quality of a signal
after processing (such as compression and decompression) compared to the original
signal. In the context of audio
• Original signal: recorded.wav (the original audio).
• Reconstructed signal: compressed_mel.wav (the audio after compression and decompression).
- PSNR calculates the error between the two signals in the same domain (either time
or frequency), typically in the time domain (i.e., WAV audio samples).
2.4.2. Convert .wav (original) file to .mp3 file
1. Install FFmpeg for Windows
   1. Go to: https://ffmpeg.org/download.html
   2. Choose Windows → follow the link to gyan.dev.
   3. Download ffmpeg-master-latest-win64-gpl.zi.
   4. Extract it and add the path of the bin folder to your Windows PATH.
2. Use FFmpeg to convert the .wav file to an .mp3 file
   1. Open a terminal.
   2. Check the installation with ffmpeg -version.
   3. Copy the .wav file to C:\Users\Dell (your path).
   4. In the terminal, enter:
      ffmpeg -i recorded.wav -codec:a libmp3lame -b:a 192k recorded.mp3
   The final screen should be:
   [Figure: terminal output after the conversion]
   Then check C:\Users\Dell (your path) again to get the .mp3 file.
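The same conversion can also be scripted. The Python sketch below only wraps the command shown above; the helper names build_mp3_command and convert_to_mp3 are hypothetical, and it assumes ffmpeg is already on the PATH.

```python
import subprocess


def build_mp3_command(src, dst, bitrate="192k"):
    """Build the FFmpeg argument list used in the step above."""
    return ["ffmpeg", "-i", src, "-codec:a", "libmp3lame", "-b:a", bitrate, dst]


def convert_to_mp3(src, dst, bitrate="192k"):
    """Run FFmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_mp3_command(src, dst, bitrate), check=True)


# Show the command without running it
print(" ".join(build_mp3_command("recorded.wav", "recorded.mp3")))
```

Calling convert_to_mp3("recorded.wav", "recorded.mp3") from the folder that contains the recording would then produce the .mp3 file.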
2.4.3. Compare PSNR values of the NFW codec and MP3 compression against the original signal
- In the next part of the assignment, we calculated the PSNR using this code:
input_file = 'recorded.wav';
compare_compression_quality(input_file);

function compare_compression_quality(input_file)
    try
        %% 1. Load Original Audio
        [original, fs] = audioread(input_file);
        original = original(:,1); % Keep the first channel if stereo

        %% 2. Load NFW Compressed Audio
        % Load decompressed NFW audio
        decompressed_nfw_file = 'decompressed.wav';
        if ~exist(decompressed_nfw_file, 'file')
            error('The decompressed NFW file does not exist. Please run decompress_NFW first.');
        end
        [nfw_audio, ~] = audioread(decompressed_nfw_file);
        nfw_audio = nfw_audio(:,1); % Keep the first channel if stereo

        %% 3. Load MP3 Audio
        mp3_file = 'recorded.mp3';
        [mp3_audio, ~] = audioread(mp3_file);
        mp3_audio = mp3_audio(:,1); % Keep the first channel if stereo

        %% 4. Ensure All Audio Signals Have Same Length
        min_length = min([length(original), length(nfw_audio), length(mp3_audio)]);
        original = original(1:min_length);
        nfw_audio = nfw_audio(1:min_length);
        mp3_audio = mp3_audio(1:min_length);

        %% 5. Calculate PSNR
        % The peak value is 1 because audioread returns samples normalized to [-1, 1]
        % Calculate PSNR for NFW
        nfw_mse = mean((original - nfw_audio).^2);
        if nfw_mse == 0
            nfw_psnr = Inf;
        else
            nfw_psnr = 10 * log10(1 / nfw_mse);
        end
        % Calculate PSNR for MP3
        mp3_mse = mean((original - mp3_audio).^2);
        if mp3_mse == 0
            mp3_psnr = Inf;
        else
            mp3_psnr = 10 * log10(1 / mp3_mse);
        end

        %% 6. Calculate Compression Ratios
        % Original file size
        original_info = dir(input_file);
        original_size = original_info.bytes * 8; % Convert to bits
        % NFW compressed size
        compressed_nfw_file = 'compressed.mat';
        if ~exist(compressed_nfw_file, 'file')
            error('The compressed NFW file does not exist. Please run compress_NFW first.');
        end
        nfw_info = dir(compressed_nfw_file);
        nfw_size = nfw_info.bytes * 8; % Convert to bits
        nfw_ratio = original_size / nfw_size;
        % MP3 compressed size
        mp3_info = dir(mp3_file);
        mp3_size = mp3_info.bytes * 8; % Convert to bits
        mp3_ratio = original_size / mp3_size;

        %% 7. Display Results
        fprintf('=== Compression Quality Comparison ===\n');
        fprintf('Original file: %s\n', input_file);
        fprintf('Original size: %d bits\n', original_size);
        fprintf('\n');
        fprintf('NFW Compression:\n');
        fprintf(' Compressed size: %d bits\n', nfw_size);
        fprintf(' Compression ratio: %.2f:1\n', nfw_ratio);
        fprintf(' PSNR: %.2f dB\n', nfw_psnr);
        fprintf('\n');
        fprintf('MP3 Compression:\n');
        fprintf(' Compressed size: %d bits\n', mp3_size);
        fprintf(' Compression ratio: %.2f:1\n', mp3_ratio);
        fprintf(' PSNR: %.2f dB\n', mp3_psnr);

        % Compare the results
        fprintf('\n=== Comparison Summary ===\n');
        if nfw_psnr > mp3_psnr
            fprintf('NFW has better quality (PSNR: %.2f dB > %.2f dB)\n', nfw_psnr, mp3_psnr);
        elseif mp3_psnr > nfw_psnr
            fprintf('MP3 has better quality (PSNR: %.2f dB > %.2f dB)\n', mp3_psnr, nfw_psnr);
        else
            fprintf('Both methods have the same quality (PSNR: %.2f dB)\n', nfw_psnr);
        end
        if nfw_ratio > mp3_ratio
            fprintf('NFW has better compression ratio (%.2f:1 > %.2f:1)\n', nfw_ratio, mp3_ratio);
        elseif mp3_ratio > nfw_ratio
            fprintf('MP3 has better compression ratio (%.2f:1 > %.2f:1)\n', mp3_ratio, nfw_ratio);
        else
            fprintf('Both methods have the same compression ratio (%.2f:1)\n', nfw_ratio);
        end

        %% 8. Visual Comparison
        % Create time-domain plots
        figure;
        subplot(3,1,1); plot(original);
        title('Original Audio'); xlabel('Sample'); ylabel('Amplitude');
        subplot(3,1,2); plot(nfw_audio);
        title('NFW Compressed Audio'); xlabel('Sample'); ylabel('Amplitude');
        subplot(3,1,3); plot(mp3_audio);
        title('MP3 Compressed Audio'); xlabel('Sample'); ylabel('Amplitude');
        % Create frequency-domain plots
        figure;
        subplot(3,1,1); plot_spectrum(original, fs);
        title('Original Audio Spectrum');
        subplot(3,1,2); plot_spectrum(nfw_audio, fs);
        title('NFW Compressed Audio Spectrum');
        subplot(3,1,3); plot_spectrum(mp3_audio, fs);
        title('MP3 Compressed Audio Spectrum');
        % Create error plots
        figure;
        subplot(2,1,1); plot(original - nfw_audio);
        title('Error: Original - NFW'); xlabel('Sample'); ylabel('Amplitude');
        subplot(2,1,2); plot(original - mp3_audio);
        title('Error: Original - MP3'); xlabel('Sample'); ylabel('Amplitude');

        %% 9. Play Audio for Comparison
        disp('Playing original audio...');
        sound(original, fs);
        pause(length(original)/fs + 1);
        disp('Playing NFW compressed audio...');
        sound(nfw_audio, fs);
        pause(length(nfw_audio)/fs + 1);
        disp('Playing MP3 compressed audio...');
        sound(mp3_audio, fs);
    catch ME
        fprintf('Error in comparison: %s\n', ME.message);
        fprintf('Stack trace:\n');
        disp(ME.stack);
        rethrow(ME);
    end
end

function plot_spectrum(x, fs)
    % Calculate the single-sided magnitude spectrum
    N = length(x);
    X = fft(x);
    half = floor(N/2) + 1;    % Bins from 0 Hz up to Nyquist (also valid for odd N)
    f = (0:half-1) * fs / N;  % Frequency of each bin in Hz
    % Plot magnitude spectrum
    plot(f, 2*abs(X(1:half))/N);
    xlabel('Frequency (Hz)');
    ylabel('Magnitude');
    xlim([0 min(20000, fs/2)]);
end

- The results show that:
2.5. Generate MIDI music and perform a Jazz song
− Here is the code to generate MIDI background music in Python:
from mido import Message, MidiFile, MidiTrack
import random

# Desired total duration: 5 minutes = 300 seconds
# With the default tempo: 500000 us/beat = 0.5 s/beat
# 480 ticks per beat (default)
# => 1 second = 960 ticks
# => 5 minutes = 300 seconds = 288000 ticks
total_ticks = 288000
ticks_per_note = 600
num_notes = total_ticks // ticks_per_note  # => 480 notes

mid = MidiFile()
track = MidiTrack()
mid.tracks.append(track)
track.append(Message('program_change', program=0, time=0))

for i in range(num_notes):
    note = random.choice([60, 62, 65, 67, 69, 72])
    velocity = random.randint(60, 90)
    # Each note starts 200 ticks after the previous event and lasts 400 ticks
    track.append(Message('note_on', note=note, velocity=velocity, time=200))
    track.append(Message('note_off', note=note, velocity=0, time=400))

mid.save('midi_output.mid')
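The tick arithmetic in the comments can be checked with a small standalone Python sketch. The helper seconds_to_ticks is a hypothetical name introduced here for illustration; it assumes the default tempo of 500000 microseconds per beat and 480 ticks per beat used above, and does not depend on mido.

```python
def seconds_to_ticks(seconds, tempo_us=500_000, ticks_per_beat=480):
    """Convert a duration in seconds to MIDI ticks."""
    beats = seconds * 1_000_000 / tempo_us
    return round(beats * ticks_per_beat)


total_ticks = seconds_to_ticks(300)  # 5 minutes
ticks_per_note = 200 + 400           # note_on delay + note_off delay
print(total_ticks, total_ticks // ticks_per_note)  # prints 288000 480
```

Changing the tempo or the per-note delays changes how many notes fit into the same five minutes, so these values have to be adjusted together.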
- The last step is to combine the recorded audio with the background audio, a .wav file converted from the MIDI output generated by the Python script:
[y1, fs1] = audioread('recorded.wav');
[y2, fs2] = audioread('midi_output.wav');

% Convert to mono if needed
if size(y1,2) > 1
    y1 = mean(y1, 2);
end
if size(y2,2) > 1
    y2 = mean(y2, 2);
end

% Match sampling rates before mixing (resample requires the Signal Processing Toolbox)
if fs2 ~= fs1
    y2 = resample(y2, fs1, fs2);
end

% Trim to the shorter length
min_len = min(length(y1), length(y2));
y1 = y1(1:min_len);
y2 = y2(1:min_len);

% Mix the background at 0.5 gain to reduce the risk of clipping
mixed = y1 + 0.5 * y2;
mixed = mixed / max(abs(mixed)); % Normalize
audiowrite('jazz_mix.wav', mixed, fs1);
fprintf("✅ Successfully created jazz_mix.wav\n");
III. CONCLUSION
This is our first major assignment in the Multimedia Data Compression and Coding subject. Although our work on these problems is still at an amateur level, the assignment was an essential opportunity for us, both as HUST students and as members of the ET-E16 major, to gain experience in technical knowledge and teamwork, in preparation for further study in multimedia.
In this project, we successfully applied Non-linear Frequency Warping (NFW) combined with quantization to compress audio signals efficiently. The analysis of frequency energy distribution helped us focus on preserving essential low- and mid-frequency components while discarding high-frequency details that contribute less to perceived quality.
Additionally, we generated a MIDI jazz sequence by randomly choosing notes from a C major jazz scale. This showcased how programmatic music creation can be used to simulate realistic audio with minimal file size, suitable for embedded or multimedia systems.
Overall, the compression technique and the MIDI synthesis both demonstrated effective strategies for audio size reduction and generation while keeping perceptual quality acceptable.
