

Speech is the most common form of communication, and speech processing has been
one of the most exciting areas of signal processing. Speech recognition
technology has made it possible for computers to follow human speech input and
understand human languages. The main goal of speech recognition is to develop
techniques and systems that allow speech input to be processed by a machine.
Automatic speech recognition systems have been implemented in real-time
applications that require a human-machine interface, such as automatic call
processing in telephone networks and query-based information systems that
provide updated travel information.


The speech recognition system takes input from various source languages and
processes it. A subspace Gaussian mixture model is used to store the input
parameters taken from multiple languages. In the existing system, K-means
clustering is used for multilingual speech recognition. The proposed system is
implemented using Fuzzy C-means clustering, an approach that is used frequently
in pattern recognition problems.

A fuzzy-based approach is proposed for the multilingual speech recognition
system to obtain better results. The Mel Frequency Cepstral Coefficient (MFCC)
method is used for feature extraction. The word error rate is reduced in the
proposed system. Classification is done using the Support Vector Machine
technique. The performance of K-means and Fuzzy C-means clustering is analyzed
using a confusion matrix. The two clustering techniques are compared and the
results provided. The accuracy rate of the predicted results is calculated, and
the performance of the clustering techniques is also analyzed.


Speech recognition is the process of identifying speech by a machine. It takes
human speech as input and returns the output as a string of words, phrases or
continuous speech in the form of text. Vocabulary size, speaking style, speaker
mode, channel type and transducer type play a vital role in speech recognition.
Multilingual speech recognition means that speech from different languages is
taken as input, and the system recognizes and processes it. The proposed system
reduces the error rate, memory footprint and computational bandwidth
requirements of a grammar-based, medium-vocabulary speech recognition system
intended for deployment on a portable or otherwise low-resource device. Fuzzy
C-means clustering is used in the proposed system to achieve better
performance. Feature extraction, clustering and classification are each carried
out to obtain the results of the multilingual speech recognition system.


The standard method for feature extraction in speech is Mel Frequency
Cepstral Coefficients. The use of about 20 MFCC coefficients is common in ASR,
although 10-12 coefficients are often considered to be sufficient for coding




Log Mel Spectrum






In the pre-processing stage, each signal is first de-noised by
soft-thresholding the wavelet coefficients, and
since the silent parts of the signals do not carry any useful information,

Framing is the process of segmenting the speech samples obtained from
analog-to-digital conversion (ADC) into small frames with a length in the
range of 20 to 40 ms.
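
The framing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `frame_signal`, the 25 ms frame length (chosen from the stated 20 to 40 ms range) and the 10 ms hop between frames are all assumptions, since the text does not say whether frames overlap.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a sequence of samples into fixed-length frames.

    frame_ms and hop_ms are illustrative assumptions; the text only
    states that frames are 20 to 40 ms long.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop lengths must be positive")

    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop_len
    return frames
```

For example, one second of audio sampled at 16 kHz gives frames of 400 samples each, with consecutive frames starting 160 samples apart.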

The FFT is used to obtain the magnitude frequency response of each frame. When
the FFT is performed on a frame, it is assumed that the signal within the frame
is periodic and continuous when wrapping around. Each frame has to be
multiplied by a Hamming window in order to keep the continuity of the first and
the last points in the frame.
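
A minimal sketch of windowing followed by the FFT is given here. It assumes the common Hamming coefficients 0.54 and 0.46 and a frame length that is a power of two; the recursive radix-2 FFT is written out only to keep the example self-contained, and the function names are illustrative.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (common 0.54 / 0.46 form)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(frame):
    """Apply a Hamming window, then return |FFT| for the non-negative bins."""
    window = hamming(len(frame))
    windowed = [s * w for s, w in zip(frame, window)]
    spectrum = fft(windowed)
    return [abs(c) for c in spectrum[: len(frame) // 2 + 1]]
```

The window tapers both ends of the frame toward small values, which reduces the discontinuity that the periodicity assumption would otherwise introduce at the frame boundary.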

The Mel filter bank consists of overlapping triangular filters with the cutoff
frequencies determined by the center frequencies of the two adjacent filters.
The filters have linearly spaced center frequencies and fixed bandwidth on the
Mel scale. The logarithm has the effect of changing multiplication into
addition. The DCT is applied to the 20 log energies E_k obtained from the
triangular band-pass filters to obtain L Mel-scale cepstral coefficients.
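
The filter bank, logarithm and DCT steps can be sketched as below. Several details are assumptions not given in the text: the Hz-to-Mel conversion 2595 log10(1 + f/700), a 16 kHz sample rate in the usage, L = 12 output coefficients, and the DCT form c_m = sum over k = 1..N of cos(m (k - 0.5) pi / N) E_k, which is a commonly used definition. The function names are illustrative.

```python
import math


def hz_to_mel(f):
    # A widely used Hz-to-Mel conversion (assumed; not stated in the text).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, num_bins, sample_rate):
    """Triangular filters whose centers are evenly spaced on the Mel scale.

    Each filter starts at the previous center and ends at the next one.
    """
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_hz = (sample_rate / 2.0) / (num_bins - 1)

    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = hz_points[j - 1], hz_points[j], hz_points[j + 1]
        row = []
        for b in range(num_bins):
            f = b * bin_hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_filter_energies(magnitudes, bank, floor=1e-10):
    """Log of the filter-weighted power in each band (the E_k values)."""
    energies = []
    for row in bank:
        energy = sum(w * m * m for w, m in zip(row, magnitudes))
        energies.append(math.log(max(energy, floor)))
    return energies


def dct_cepstra(log_energies, num_coeffs):
    """c_m = sum_{k=1..N} cos(m (k - 0.5) pi / N) * E_k for m = 1..L."""
    n = len(log_energies)
    return [
        sum(math.cos(m * (k - 0.5) * math.pi / n) * e
            for k, e in enumerate(log_energies, start=1))
        for m in range(1, num_coeffs + 1)
    ]
```

With N = 20 filters as in the text, calling `dct_cepstra(log_filter_energies(mags, mel_filterbank(20, len(mags), 16000)), 12)` on one frame's magnitude spectrum `mags` would give 12 cepstral coefficients for that frame.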

Clustering and classification:

Clustering groups similar data from the extracted values: a center point is
calculated for each cluster, and clustering is performed based on the distance
between the data points and the centers. K-means clustering is the process of
partitioning a group of data points into a small number of clusters. The FCM
algorithm attempts to partition a finite collection of n elements
X = {x_1, x_2, …, x_n} into a collection of c fuzzy clusters with respect to
some given criterion.
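
To make the difference from K-means concrete, here is a minimal sketch of the standard Fuzzy C-means iteration, in which every point receives a degree of membership in every cluster rather than a single hard label. The fuzzifier m = 2, the stopping tolerance, the explicit initial centers and the function names are assumptions for illustration; the text does not specify them.

```python
import math


def memberships(points, centers, m=2.0):
    """u[i][j]: degree to which point i belongs to cluster j (rows sum to 1)."""
    u = []
    for p in points:
        # Clamp distances so a point that coincides with a center is handled.
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        inv = [dj ** (-2.0 / (m - 1.0)) for dj in d]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u


def fuzzy_c_means(points, init_centers, m=2.0, iters=100, tol=1e-6):
    """Alternate membership and center updates until the centers settle."""
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            # Centers are means weighted by membership raised to the power m.
            w = [row[j] ** m for row in u]
            wsum = sum(w)
            new_centers.append(
                [sum(wi * p[k] for wi, p in zip(w, points)) / wsum
                 for k in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)
```

Taking the largest membership in each row, for example `[row.index(max(row)) for row in u]`, yields a hard assignment comparable to a K-means labeling.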


Implementation is the realization of an application, or the execution of a
plan, idea, model, design, specification, standard, algorithm, or policy. The
proposed method is implemented using the following modules:

Data Acquisition Module

Feature Extraction Module

Clustering module

Classification Module

Decision module



The proposed system with the Fuzzy C-means algorithm takes input from 160 audio
files and processes them. The audio files are stored with the .wav extension.
Isolated words taken from various languages are used as input data. The same
word is pronounced in four languages: the input dataset is taken from Tamil,
Hindi, Malayalam and English. The audio is recorded using a sound recorder with
closed microphones in a silent room.
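
As a rough illustration of the acquisition step, a .wav recording could be loaded with Python's standard `wave` module as sketched here. The 16-bit PCM format, the choice to keep only the first channel, and the function name `read_wav_mono16` are assumptions; the text does not describe the recording format.

```python
import struct
import wave


def read_wav_mono16(path):
    """Read a 16-bit PCM .wav file; return (samples in [-1, 1), sample rate).

    Assumes 16-bit samples; only the first channel is kept if there are more.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # Samples are little-endian signed 16-bit integers, interleaved by channel.
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    samples = [v / 32768.0 for v in values[::channels]]
    return samples, rate
```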


After acquiring the input data, feature extraction is done using the Mel
Frequency Cepstral Coefficient method. Pre-processing, framing, windowing, Mel
filter bank processing and frequency warping are applied to the input audio
files, and the logarithm values are taken. The discrete cosine transform of the
logarithm values is then calculated, and the resulting values are passed to the
next step.



In the existing system, the K-means algorithm is used. In the proposed system,
Fuzzy C-means clustering is used and the results are obtained. The fuzzy-based
approach produced better results.

Fig 3: System Flow



In the proposed work, classification is done using a Support Vector Machine
(SVM). Classification involves two processes: training and testing. In the
training phase, all the training datasets are trained and placed in the
template database. In the testing phase, the test dataset available in the test
database is processed and compared with the template database for the decision
to be made accordingly.



In this module, the decision is made based on the match scores generated by the
classifier. After classification, the K-means and Fuzzy C-means clustering
results are produced. The results predicted using the two clustering techniques
are compared with the actual results. The performance analysis is done using a
confusion matrix. The accuracy rate is calculated and analyzed.
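
The confusion-matrix comparison can be sketched as follows. The function names are illustrative, and the labels in the usage note are made-up placeholders rather than results from the system described here.

```python
def confusion_matrix(actual, predicted, labels):
    """matrix[i][j] counts items whose actual label is labels[i]
    and whose predicted label is labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of items on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

For instance, with labels `["Tamil", "Hindi", "Malayalam", "English"]`, building one matrix from the K-means predictions and another from the Fuzzy C-means predictions, then calling `accuracy` on each, gives the two accuracy rates to compare.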

Post Author: admin

