DOI: 10.3724/SP.J.1146.2006.01021

Journal of Electronics & Information Technology (电子与信息学报) 2008/30:2 PP.362-366

Endpoint Detection of Whispers Based on the Fitting Characteristic of EMD

Whispered speech is a special mode of human speech production. Compared with normal speech, whispers have a lower Signal-to-Noise Ratio (SNR) and no obvious pitch waveform, which makes them harder to process. Endpoint detection is the first key step in whispered speech signal processing. This paper applies the Empirical Mode Decomposition (EMD) of the Hilbert-Huang Transform (HHT) to this problem and proposes a novel endpoint detection algorithm for whispered speech based on the fitting characteristic of EMD. The energy of the Intrinsic Mode Functions (IMFs) obtained by EMD is normalized, and the fitting parameters of this energy are used as the feature from which the endpoints of whispers are located. Experiments show that the method is effective for whisper endpoint detection: the accuracy is 98.25% on 1200 samples (SNR = 2 to 10 dB).
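The feature described in the abstract can be illustrated with a minimal sketch. The abstract does not state the form of the fit, so the straight-line least-squares fit over the IMF index used here is an assumption, as are the function names `normalized_imf_energy`, `fit_line`, and `frame_feature`. The IMFs are taken as given, for example from any EMD routine applied to one speech frame; the sketch does not perform EMD or the final endpoint decision.

```python
from typing import List, Sequence, Tuple


def normalized_imf_energy(imfs: Sequence[Sequence[float]]) -> List[float]:
    """Return the energy of each IMF divided by the total energy of all IMFs."""
    energies = [sum(x * x for x in imf) for imf in imfs]
    total = sum(energies)
    if total == 0.0:
        # A silent frame has no energy to distribute.
        return [0.0 for _ in energies]
    return [e / total for e in energies]


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of ys against the index 1..n.

    Assumption: a first-order fit; the paper may use a different model.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two IMFs to fit a line")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def frame_feature(imfs: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Fitting parameters of the normalized IMF energies for one frame."""
    return fit_line(normalized_imf_energy(imfs))
```

In such a scheme, the fitted parameters would be computed frame by frame and compared against a threshold or a noise-only reference to separate whisper frames from background; that decision rule is not given in the abstract and is left out here.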

Key words: Hilbert-Huang Transform (HHT), Empirical Mode Decomposition (EMD), Intrinsic Mode Function (IMF), Fitting characteristic of normalized energy

ReleaseDate:2014-07-21 15:37:56
