Conference Paper · July 2007
DOI: 10.1109/SIU.2007.4298838 · Source: IEEE Xplore
Improved time spread echo hiding method for robust and transparent audio watermarking
Yousof Erfani, Mehdi Parviz, Shirin Ghanbari
Multimedia Group, IT Department, Iran Telecomm Research Center, Tehran, Iran
Emails: {erfani, mparviz, sghanb}@itrc.ac.ir

Abstract

In this paper we propose an accurate, content-based algorithm for embedding and detecting watermarks in audio signals. The algorithm is based on a new time spread method with minimum signal distortion. First, a novel time spread method is introduced in which detection does not rely on the presence of a peak at a particular delay but instead depends on a correlation quantity. Next, we adaptively select the coefficients of the embedded echoes in each segment based on a correlation quantity between the cepstral content of the audio signal and a pseudo random noise. In this method the effect of the original signal at the detector (the prime source of error in no-attack environments) is removed and shifted to the encoder stage, similar to improved spread spectrum watermarking. This makes the system highly accurate and robust against signal processing attacks in comparison with conventional time spread methods. Good results were obtained for watermark inaudibility through a Mean Opinion Score (MOS) test and SNR comparisons.

1. INTRODUCTION

Digital watermarking is an important technique for protecting digital media content through the insertion of a hidden copyright message. Such a message is a group of bits describing information pertaining to the content or its authors. In the case of audio content, several algorithms have been proposed, such as echo hiding [1, 2], spread spectrum modulation [3], quantization index modulation (QIM) [4], pitch scaling [5] and others [6]. Due to their simplicity and the slight distortion they introduce to the original signal, echo hiding systems are advantageous for watermarking applications.

In conventional echo hiding systems a single or double echo is inserted into the audio signal, and the watermarked bit is selected based on the echo delays. The detector of such systems uses cepstrum analysis to detect the embedded echo delay(s) and, correspondingly, the watermarked bit [1]. In spite of the many benefits of conventional echo hiding systems, they also have a concerning problem: security. Conventional echo hiding simply uses the watermarking bit and does not utilize any symmetric or public key. Its decoder is lenient, and any unauthorized receiver (i.e. a pirate) can detect the watermarked bit; hence these systems are not appropriate for fully robust watermarking [2].

Time spread (TS) echo hiding is a strong approach compared to conventional echo hiding systems. It uses many small-amplitude echoes spread in time for each segment, as in a real room (which includes echoes). The system uses a key to generate a pseudo random noise that serves as the echo coefficients. To insert one bit into a segment, a whole sequence of echoes is embedded into that segment, but in return the system solves the security problem of conventional echo hiding while preserving good audio quality.

Although TS echo hiding systems have clear benefits, they still present two essential problems. First, the watermark does not permeate the entire signal; it is inserted after a particular delay in each segment, which is an essential weakness for watermarking security [7]. The second problem is erroneous detection of the watermarked bit even under no-attack conditions, as in conventional echo hiding. This problem is due to the effect of the original audio signal in the detection process.

Here, we first propose a new TS echo hiding method that differs from conventional TS echo watermarking systems and solves the first problem. In this system the watermarked bit is not tied to a delay but is embedded as a sign bit $b \in \{+1, -1\}$ to be detected. The receiver does not recover the embedded bit by locating a maximum peak in the correlation signal between the cepstrum and the pseudo random noise; instead it detects the bit through the degree of correlation at the receiver. To solve the second problem, we change the encoder and decoder of the proposed system by eliminating the effect of the original audio in the detector and conveying it to the encoder stage, as in improved spread spectrum (ISS) watermarking [8]. Afterwards the system detector will detect the watermarked bit with no fault in no-attack environments and the robustness to the signal
processing attacks will be increased as a result, while preserving audio quality. This can be seen in the experimental results. The imperceptibility of the proposed method is investigated via a listening test and SNR values.

In section 2, we discuss the basics of TS echo hiding watermarking. In section 3, we present a new design for this methodology, and in section 4 we improve the proposed system into an accurate content-based system. Experimental results are discussed in section 5, and we conclude the paper in section 6.

2. TS ECHO HIDING WATERMARKING

In echo hiding systems, the original audio signal is convolved with a kernel signal to produce the watermarked signal. A kernel signal is composed of discrete impulses in the time domain that are characterized by their delays and amplitudes. Usually the watermark to be embedded is distinguished by the delays or by a key used to generate these delays.

After the original audio signal is segmented, the echoed signal $y(n)$ is the convolution of the original audio $x(n)$ and the kernel $h(n)$. The kernel for TS is as follows:

$$h(n) = \delta(n) + \alpha\, p(n-d) \tag{1}$$

$p(n)$ is a pseudo random noise whose amplitude is $\pm 1$, $\delta(n)$ is the Dirac delta function, $\alpha$ is a small value that scales the echo coefficients, and $d$ is a delay selected between two values corresponding to embedding a one or a zero bit. With this kernel the watermarked signal resembles a faint real-room echo of the original signal, which is more pleasant to the ear. Because $p(n)$ is generated from a key by means of a linear shift register, the algorithm is key dependent and secure.

In the embedding stage, the watermarked signal is the original signal segment plus attenuated and delayed copies of it:

$$y(n) = x(n) + \alpha \sum_{i=0}^{N} p(i)\, x(n-d-i), \qquad 0 < \alpha \ll 1 \tag{2}$$

$N$ is the length of $p(n)$ and is smaller than the segment size.

The cepstrum transform $c_y(n) = F^{-1}(\ln(F(y[n])))$ is used in the decoder:

$$c_y(n) = c_x(n) + \alpha\, p(n-d) \tag{3}$$

After the authorized receiver generates the correct $p(n)$, the final step is to take the cross-correlation between $c_y(n)$ and $p(n)$:

$$c_c(n) = c_y(n) \otimes p(n) = n(n) + \alpha\, p(n) \otimes p(n-d) \tag{4}$$

$c_c(n)$ presumably has a peak at $d$, and the receiver decides the embedded bit based on the delay that he or she finds at this peak.

A major problem here is the first term of (4), which may make the detection process erroneous. A further problem is that the watermark is not embedded into the whole signal. To solve these two problems, we first introduce a novel TS echo hiding method in the next section and then improve its decoder in the section after that.

3. PROPOSED TS ECHO HIDING WATERMARKING

The encoder of (2) is changed to the relation below:

$$h'(n) = \delta(n) + \alpha\, b\, p(n)$$

$$y(n) = x(n) + \alpha\, b \sum_{i=0}^{N} p(i)\, x(n-i), \qquad 0 < \alpha \ll 1 \tag{5}$$

Unlike conventional TS echo hiding, the $p(n)$ sequence is embedded into the entire original audio signal, from its beginning to its end. $b \in \{+1, -1\}$ is the bit to be embedded, which is decoded in the decoder by means of the pseudo random sequence $p(n)$, and in the above relation $N$ is the audio signal length.

In the decoder stage, after applying the cepstrum transform to the watermarked signal, we have

$$c_y(n) = c_x(n) + \alpha\, b\, p(n) \tag{6}$$

We define a normalized correlation amount as

$$C = \frac{1}{N} \sum_{n=1}^{N} x(n)\, y(n) \tag{7}$$

After computing the normalized correlation amount between $c_y(n)$ and $p(n)$, we have

$$C = \frac{1}{N} \sum_{n=1}^{N} p(n)\, c_y(n) = \frac{1}{N} \sum_{n=1}^{N} \big( c_x(n)\, p(n) + \alpha\, b\, p(n)\, p(n) \big) = \frac{1}{N} \sum_{n=1}^{N} c_x(n)\, p(n) + \alpha\, b \tag{8}$$

The correlation amount in (8) has two terms: the left term, a noise component due to the effect of the original signal in the detector that is considered the source of error
in the detection process, and the right term, which carries the watermark bit $b$.

The detector distinguishes the watermark bit based on the sign of the correlation amount. The larger the parameter $\alpha$ is, the more robust the watermark will be and the less inaudible it will become.

This system is considerably different from conventional TS echo hiding. In the encoder the watermark is spread over the whole signal, and the watermark bit is a sign bit, not a particular delay. The decoder relies on a correlation amount instead of a peak. The encoder and decoder of the proposed system are shown in Fig. 1 and Fig. 2. In these figures, PNG (pseudo random number generator) is a system that uses some bits as a key to generate a pseudo random stream.

Fig. 1. Proposed encoder for TS. [Block diagram; recoverable labels: x(n), Convolution, α, b, ×, Key, PNG, y(n)]

Fig. 2. Proposed decoder for TS. [Block diagram; recoverable labels: y(n), Correlation, Comparator, b, Key, PNG]

4. PROPOSED METHOD FOR ACCURATE CONTENT BASED TS ECHO HIDING

The left term of the decoding equation (8) is the main source of error in the detection process, even in no-attack environments. Here, we remove it from the decoder and move it to the encoder by using the real cepstrum instead of the complex cepstrum and changing the encoder stage to the equation below:

$$y(n) = x(n) + (\alpha\, b - \lambda) \sum_{i=0}^{N} p(i)\, x(n-i), \qquad 0 < \alpha \ll 1$$

$$\lambda = \frac{1}{N} \sum_{n=1}^{N} c_x(n)\, p(n) \tag{9}$$

If we rewrite the decoder equations (6), (7) and (8) for this system, the correlation amount becomes

$$C = \alpha\, b \tag{10}$$

It is clear from (10) that the noise source has been removed from the correlation amount in no-attack environments. Depending on $b$, the correlation amount will be positive or negative, and hence the decoder distinguishes the embedded bit exactly. Because the decoder is blind, it does not have the original audio signal and hence cannot remove the original signal effect itself; the encoder, however, has the original audio signal and can remove its effect on the decoder in advance.

5. PROPOSED METHOD ASSESSMENT AND EXPERIMENTAL RESULTS

By removing the original audio signal effect from the decoding stage, we obtain an accurate detector for TS echo hiding watermarking. This algorithm is more robust against signal processing attacks because the original signal effect in the decoder, as a source of misdetection, is much larger than the effect of signal processing attacks. We can make the system more robust against signal processing attacks by increasing the value of $\alpha$.

With regard to audibility, we incorporate $\lambda$, a value related to the cepstral content of the original signal, into the coefficients of the echoes in the embedding stage. The value of $\lambda$ is approximately smaller than 0.01; however, the value of $\alpha$ is in this range too. The cepstrum of the original signal is a decreasing function. An increase in the segment size $N$ causes a slight change in the cepstrum of the original audio signal and a decrease in the value of $\lambda$. Therefore, with large segment sizes the audio quality is improved, at the cost of a small growth in computational load.

Here we use 5 audio clips for our experiments: a speech clip with long silences, a clip containing only Persian singing with no instruments, a clip containing only a discrete instrument (Tar, a Persian lute), a clip containing a continuous instrument (violin) and a clip containing an orchestra (many instruments). Ten seconds of each clip are used. The clips are sampled at 44.1 kHz with 16 bit quantization. After segmentation and Hanning windowing to reduce artifacts between neighboring segments, we apply the proposed watermarking scheme to each segment (1 second). The result is an average over all segments and all 5 audio clips. We compare conventional TS and our proposed method in these experiments. We use 100 and 110 bits for zero and one
bit embedding in conventional TS echo hiding. We use $\alpha = 0.1$ for both systems.

The experimental results for robustness and audibility of the proposed method, in comparison to conventional TS echo hiding, are shown in Table 1.

Table 1. Robustness, subjective test and SNR comparison

OPTION                     Conventional TS    Proposed TS
No attacks BER             14.5%              0%
Mp3 attack BER             45%                47%
Quantization attack BER    17.5%              5.5%
Re-sampling BER            21%                15%
Noise attack BER           19.5%              12.5%
SNR (dB)                   22.5               17.5
MOS                        4.8                4.5

Our experiments were done under the following conditions:
No attacks: closed loop (decoding immediately after encoding)
Mp3 attack: compressing the watermarked signal with MP3 (MPEG-1 Audio Layer III) and converting it back to the original wave file
Re-sampling: sampling the watermarked signal with a 16 kHz sampling rate
Re-quantization: quantizing the watermarked signal with 8 bits
Noise attack: adding noise with zero mean and a Gaussian probability density function to the watermarked signal.

The BER was calculated by the following equation:

$$\mathrm{BER} = \frac{\text{Number of erroneously decoded bits}}{\text{Number of embedded bits for the clip}} \tag{11}$$

We use the ABX test project [9] for the MOS test evaluation, and we assign the MOS grade '5' to the original audio clips that we use.

As we can see from Table 1, our system is error free in no-attack environments. In addition to the good quality of the proposed system, its robustness against signal processing attacks is far better than conventional TS echo hiding.

6. CONCLUSION AND FUTURE WORK

The conventional time spread (TS) echo hiding has two security problems. The first is that the watermark is not inserted into the whole of the original signal; the second is erroneous watermark bit detection even in no-attack environments. In this paper, we first proposed a new TS echo hiding watermarking system that solved the first problem through essential changes in the encoder and decoder of TS echo hiding. Afterwards we proposed a content-based echo hiding system, built on the first proposed method, that solved the second problem of TS echo hiding. In this system we removed the original audio signal effect in the blind decoder and shifted it to the encoding stage, so that the receiver could detect the watermarked bit without error. Good experimental results were obtained for robustness against attacks and audio signal quality. As a trade-off, the watermarked signal quality was reduced slightly. The authors are currently working on improving the audio quality of the proposed algorithm. This is to be achieved through the analysis-by-synthesis approach described in [10].

7. REFERENCES

[1] D. Gruhl and W. Bender, "Echo hiding", in Proc. Information Hiding Workshop, Cambridge, U.K., pp. 295-315, 1996.
[2] B.-S. Ko, R. Nishimura, and Y. Suzuki, "Time-Spread Echo Method for Digital Audio Watermarking", IEEE Trans. on Multimedia, vol. 7, no. 2, April 2005.
[3] D. Kirovski and H. Malvar, "Robust spread spectrum audio watermarking", IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, pp. 1345-1348, 2001.
[4] B. Chen and G. W. Wornell, "Quantization index modulation: A class of provably good methods for digital watermarking and information embedding", IEEE Trans. on Information Theory, vol. 47, no. 4, pp. 1423-1443, May 2001.
[5] S. Shin, O. Kim, J. Kim, and J. Choi, "A robust audio watermarking algorithm using pitch scaling", IEEE International Conference on Digital Signal Processing, pp. 701-704, 2002.
[6] N. Cvejic, "Algorithms for Audio Watermarking and Steganography", PhD thesis, Oulu University, 2004.
[7] I. Cox, M. Miller, and J. Bloom, "Digital Watermarking", Academic Press, 2002.
[8] H. S. Malvar and D. Florencio, "Improved spread spectrum: A new modulation technique for robust watermarking", IEEE Trans. Signal Processing, vol. 52, no. 4, pp. 898-905, 2003.
[9] ITU-R Rec. BS.1116, "Methods for the Subjective Assessment of Small Impairments in Audio Systems Including Multichannel Sound Systems", International Telecommunication Union, Geneva, Switzerland, 1994.
[10] Wen-Chih Wu and O. T.-C. Chen, "An Analysis-by-Synthesis Echo Watermarking Method", Proc. of IEEE Int. Conf. on Multimedia and Expo, June 2004.
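
The step from the decoder correlation in (8) to the error-free result in (10) can be checked numerically. The following is a minimal sketch in Python with NumPy, not the authors' implementation. It works purely in the cepstral domain and assumes that the additive model of (6) holds exactly, so it says nothing about attacks or about the figures in Table 1. The host cepstrum `c_x` is a random stand-in, `pn_sequence` uses NumPy's generator instead of the linear shift register mentioned in section 2, and all function names and parameter values are hypothetical.

```python
import numpy as np


def pn_sequence(key: int, length: int) -> np.ndarray:
    """Key-dependent +/-1 sequence p(n).

    Assumption: NumPy's seeded generator stands in for the paper's
    linear shift register.
    """
    rng = np.random.default_rng(key)
    return rng.choice(np.array([-1.0, 1.0]), size=length)


def correlate(c: np.ndarray, p: np.ndarray) -> float:
    """Normalized correlation of eq. (8): (1/N) * sum_n c(n) * p(n)."""
    return float(np.dot(c, p) / len(p))


def embed_cepstral(c_x, p, bit, alpha, compensate):
    """Cepstral-domain model of the watermarked signal.

    compensate=False: c_y = c_x + alpha*b*p             (eq. 6)
    compensate=True:  c_y = c_x + (alpha*b - lambda)*p  (model implied by eq. 9)
    """
    lam = correlate(c_x, p) if compensate else 0.0  # lambda of eq. (9)
    return c_x + (alpha * bit - lam) * p


def detect(c_y: np.ndarray, p: np.ndarray) -> int:
    """Decide the bit from the sign of the correlation amount."""
    return 1 if correlate(c_y, p) >= 0 else -1


def bit_error_rate(sent, received) -> float:
    """Eq. (11): erroneously decoded bits / embedded bits."""
    sent = np.asarray(sent)
    received = np.asarray(received)
    return float(np.mean(sent != received))


def run_demo(seed: int = 0, n: int = 1024, alpha: float = 0.01, n_bits: int = 200):
    """Embed n_bits random bits, one per simulated segment, with and without lambda."""
    rng = np.random.default_rng(seed)
    p = pn_sequence(key=1234, length=n)
    bits = rng.choice(np.array([-1, 1]), size=n_bits)
    plain, compensated = [], []
    for b in bits:
        c_x = rng.normal(scale=0.5, size=n)  # hypothetical host cepstrum
        plain.append(detect(embed_cepstral(c_x, p, b, alpha, False), p))
        compensated.append(detect(embed_cepstral(c_x, p, b, alpha, True), p))
    return bit_error_rate(bits, plain), bit_error_rate(bits, compensated)


plain_ber, compensated_ber = run_demo()
print(f"BER with plain embedding, eq. (6):       {plain_ber:.3f}")
print(f"BER with compensated embedding, eq. (9): {compensated_ber:.3f}")
```

Because $p(n)^2 = 1$, the compensated correlation equals $\lambda + (\alpha b - \lambda) = \alpha b$ up to rounding, so the compensated detector recovers every bit in this idealized model. The plain detector can fail whenever the host term $\frac{1}{N}\sum c_x(n)\,p(n)$ outweighs $\alpha b$.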