Signal Compensation

 
Speech Enhancement
System Overview
An important problem in speech processing is detecting the presence of speech in a noisy environment, where word boundaries are hard to locate exactly. This is often referred to as the robust endpoint location problem. Inaccurate word boundary detection degrades both speech recognition and speech enhancement. In many applications the problem is further complicated by nonstationary backgrounds, where concurrent noises may arise from moving desks, door slams, and similar events.

These background noises can be broadly classified into three classes: impulse noise, fixed-level noise, and variable-level noise. Impulse noise can be handled with a time-duration parameter. For fixed-level noise, we propose a new word boundary detection algorithm that uses a neural fuzzy network (the ATF-based SONFIN algorithm) to identify islands of word signals. For the case where the background noise level varies during recording, we further propose a new RTF-based RSONFIN algorithm.

The adaptive time-frequency (ATF) and refined time-frequency (RTF) parameters extend the TF parameter from single-band to multiband spectrum analysis, which makes speech and noise signals easier to distinguish. Both parameters extract useful frequency information by adaptively choosing suitable bands of the mel-scale frequency bank. Because SONFIN and RSONFIN are self-learning, the proposed algorithms avoid the need for empirically determined thresholds and ambiguous rules. By processing temporal relations, the RTF-based RSONFIN algorithm can also track variation in the background noise level and detect correct word boundaries when that level varies.
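The time-duration idea for impulse noise can be illustrated with a minimal sketch. It assumes a per-frame speech/non-speech decision is already available as a list of booleans, and it discards any active run shorter than a minimum number of frames. The function names `find_segments` and `drop_short_segments`, the parameter `min_frames`, and the boolean-flag representation are all hypothetical. The source does not state the actual duration rule, and this sketch is not the SONFIN or RSONFIN detector.

```python
def find_segments(flags):
    """Return (start, end) frame index pairs, end exclusive, for each run of True."""
    result = []
    start = None
    # A trailing False closes a run that reaches the end of the signal.
    for i, active in enumerate(list(flags) + [False]):
        if active and start is None:
            start = i
        elif not active and start is not None:
            result.append((start, i))
            start = None
    return result


def drop_short_segments(flags, min_frames):
    """Clear active runs shorter than min_frames (assumed to be impulse noise)."""
    out = [False] * len(flags)
    for start, end in find_segments(flags):
        if end - start >= min_frames:
            for j in range(start, end):
                out[j] = True
    return out


# Hypothetical per-frame decisions: a 1-frame click, a 4-frame word, a 2-frame slam.
flags = [False, True, False, False, True, True, True, True, False, True, True, False]
cleaned = drop_short_segments(flags, min_frames=3)
print(find_segments(cleaned))
```

With `min_frames=3`, only the four-frame run survives in this example. In practice the threshold would depend on the frame rate and on the shortest word to be kept, neither of which is given here.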

Structure of the Recurrent Self-Organizing Neural Fuzzy Inference Network (RSONFIN)

The RTF-based RSONFIN algorithm for automatic word boundary detection

 

Experimental Results

Original clean speech signal.

Speech signal with additive increasing-level white noise (SNR = 10 dB).
The word boundaries detected by the RTF-based RSONFIN algorithm are shown by solid lines.

Enhanced speech signal without noise estimation during speech segments.

Enhanced speech signal with noise estimation during speech segments.