Innovative Method for Unsupervised Voice Activity Detection and Classification of Audio Segments

Zulfiqar Ali; Muhammad Talha

doi:10.1109/ACCESS.2018.2805845

Research output: Contribution to journal › Article › peer-review

Abstract

An accurate and noise-robust voice activity detection (VAD) system can be widely used for emerging speech technologies in the fields of audio forensics, wireless communication, and speech recognition. However, in real-life applications, a sufficient amount of data, or of human-annotated data, to train such a system may not be available, so a supervised VAD system cannot be used in such situations. In this paper, an unsupervised method for VAD is proposed to label the segments of speech presence and speech absence in an audio recording. To make the proposed method computationally efficient, it is implemented using long-term features computed with the Katz algorithm for fractal dimension estimation. Two databases in different languages are used to evaluate the performance of the proposed method: the Texas Instruments/Massachusetts Institute of Technology (TIMIT) database, which is in English, and the King Saud University (KSU) Arabic speech database. TIMIT is recorded in only one environment, whereas the KSU speech database is recorded in distinct environments using various recording systems that contain sound cards of different qualities and models. The evaluation suggests that the proposed method labels voiced and unvoiced segments reliably in both clean and noisy audio.
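
For readers unfamiliar with the Katz estimator named in the abstract, the following is a minimal sketch of the commonly cited formula, FD = log10(n) / (log10(n) + log10(d / L)), where L is the total curve length, d is the largest distance from the first sample, and n is the number of steps. It is not taken from the paper: the function name `katz_fd` is hypothetical, and the sketch assumes the common waveform simplification that measures distances along the amplitude axis only. How the authors frame the audio and derive long-term features from these values is not stated in the abstract and is not shown here.

```python
import math


def katz_fd(samples):
    """Estimate the Katz fractal dimension of a 1-D sequence.

    Assumption (not from the paper): distances are measured on the
    amplitude axis only, a common simplification for waveforms.
    """
    n = len(samples) - 1  # number of steps between successive samples
    if n < 1:
        raise ValueError("need at least two samples")

    # L: total curve length, the sum of successive amplitude differences
    length = sum(abs(samples[i + 1] - samples[i]) for i in range(n))
    # d: largest distance between the first sample and any later sample
    extent = max(abs(s - samples[0]) for s in samples[1:])

    if length == 0:
        # Flat signal: treated here as a plain line (an assumption)
        return 1.0

    # Signals with d / L == 1 / n make the denominator zero; this
    # sketch does not handle that degenerate case.
    return math.log10(n) / (math.log10(n) + math.log10(extent / length))


if __name__ == "__main__":
    print(katz_fd([0, 1, 2, 3, 4]))  # monotonic ramp
    print(katz_fd([0, 2, 1, 2, 1]))  # more irregular sequence
```

A monotonic ramp has d equal to L, so the estimate is 1; more irregular sequences have a longer path relative to their extent and yield larger values, which is the kind of contrast a fractal-dimension feature is meant to capture.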

Original language	English
Pages (from-to)	15494-15504
Number of pages	11
Journal	IEEE Access
Volume	6
DOIs	https://doi.org/10.1109/ACCESS.2018.2805845
Publication status	Published (in print/issue) - 13 Feb 2018

Keywords

fractal dimension
Katz algorithm
KSU speech database
TIMIT database
Voiced and unvoiced segmentation

Cite this

@article{ab98a602e6d7476b9158b3ea9948a17a,

title = "Innovative Method for Unsupervised Voice Activity Detection and Classification of Audio Segments",

abstract = "An accurate and noise-robust voice activity detection (VAD) system can be widely used for emerging speech technologies in the fields of audio forensics, wireless communication, and speech recognition. However, in real-life application, the sufficient amount of data or human-annotated data to train such a system may not be available. Therefore, a supervised system for VAD cannot be used in such situations. In this paper, an unsupervised method for VAD is proposed to label the segments of speech-presence and speech-absence in an audio. To make the proposed method efficient and computationally fast, it is implemented by using long-term features that are computed by using the Katz algorithm of fractal dimension estimation. Two databases of different languages are used to evaluate the performance of the proposed method. The first is Texas Instruments Massachusetts Institute of Technology (TIMIT) database, and the second is the King Saud University (KSU) Arabic speech database. The language of TIMIT is English, while the language of the KSU speech database is Arabic. TIMIT is recorded in only one environment, whereas the KSU speech database is recorded in distinct environments using various recording systems that contain sound cards of different qualities and models. The evaluation of the proposed method suggested that it labels voiced and unvoiced segments reliably in both clean and noisy audio.",

keywords = "fractal dimension, Katz algorithm, KSU speech database, TIMIT database, Voiced and unvoiced segmentation",

author = "Zulfiqar Ali and Muhammad Talha",

year = "2018",

month = feb,

day = "13",

doi = "10.1109/ACCESS.2018.2805845",

language = "English",

volume = "6",

pages = "15494--15504",

journal = "IEEE Access",

publisher = "IEEE",

}

TY - JOUR

T1 - Innovative Method for Unsupervised Voice Activity Detection and Classification of Audio Segments

AU - Ali, Zulfiqar

AU - Talha, Muhammad

PY - 2018/2/13

Y1 - 2018/2/13

N2 - An accurate and noise-robust voice activity detection (VAD) system can be widely used for emerging speech technologies in the fields of audio forensics, wireless communication, and speech recognition. However, in real-life application, the sufficient amount of data or human-annotated data to train such a system may not be available. Therefore, a supervised system for VAD cannot be used in such situations. In this paper, an unsupervised method for VAD is proposed to label the segments of speech-presence and speech-absence in an audio. To make the proposed method efficient and computationally fast, it is implemented by using long-term features that are computed by using the Katz algorithm of fractal dimension estimation. Two databases of different languages are used to evaluate the performance of the proposed method. The first is Texas Instruments Massachusetts Institute of Technology (TIMIT) database, and the second is the King Saud University (KSU) Arabic speech database. The language of TIMIT is English, while the language of the KSU speech database is Arabic. TIMIT is recorded in only one environment, whereas the KSU speech database is recorded in distinct environments using various recording systems that contain sound cards of different qualities and models. The evaluation of the proposed method suggested that it labels voiced and unvoiced segments reliably in both clean and noisy audio.

KW - fractal dimension

KW - Katz algorithm

KW - KSU speech database

KW - TIMIT database

KW - Voiced and unvoiced segmentation

UR - http://www.scopus.com/inward/record.url?scp=85042131123&partnerID=8YFLogxK

U2 - 10.1109/ACCESS.2018.2805845

DO - 10.1109/ACCESS.2018.2805845

M3 - Article

AN - SCOPUS:85042131123

VL - 6

SP - 15494

EP - 15504

JO - IEEE Access

JF - IEEE Access

ER -
