Speech Coding

Excellent speech quality implies that coded speech is indistinguishable from the original and without perceptible noise. On the other hand, bad (unacceptable) quality implies the presence of extremely annoying noise and artefacts in the coded speech.

Signal-to-Noise Ratio
Speech quality may be gauged from the signal-to-noise ratio (SNR), the ratio in decibels of the energy of the original speech to the energy of the coding error, where:

s(n) = original speech data
sc(n) = coded speech data
N = total number of samples
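
With these symbols, the conventional long-term SNR in decibels can be written as below. This is the standard textbook form, offered as a sketch consistent with the definitions above; the summation range n = 0 to N-1 and the subscript notation s_c(n) for sc(n) are assumptions.

```latex
\mathrm{SNR} = 10 \log_{10} \left(
  \frac{\sum_{n=0}^{N-1} s^2(n)}
       {\sum_{n=0}^{N-1} \bigl( s(n) - s_c(n) \bigr)^2}
\right) \ \text{dB}
```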

This is a long-term measure of reconstruction accuracy taken over the whole signal, so it does not reveal reconstruction noise that is localized in time.
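
A minimal Python sketch of the long-term SNR, assuming the original and coded signals are equal-length sequences of samples. The function name `snr_db` and the handling of the zero-energy cases are illustrative choices, not part of the definition.

```python
import math


def snr_db(original, coded):
    """Long-term SNR in dB, computed over all samples at once."""
    if len(original) != len(coded):
        raise ValueError("original and coded must have the same length")

    # Energy of the original speech s(n).
    signal_energy = sum(x * x for x in original)
    # Energy of the coding error s(n) - sc(n).
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, coded))

    # Assumed conventions for the degenerate cases.
    if noise_energy == 0:
        return math.inf    # perfect reconstruction
    if signal_energy == 0:
        return -math.inf   # silent original with a nonzero error

    return 10.0 * math.log10(signal_energy / noise_energy)
```

For example, coding every sample of a unit-amplitude signal at 0.9 of its value gives an energy ratio of 100, that is, about 20 dB.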

Segmented Signal-to-Noise Ratio
An alternative is the short-time or segmented SNR, which computes the SNR over short frames of the signal and averages the per-frame values in decibels.
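
A minimal Python sketch of the segmented SNR in its usual form: the signal is cut into non-overlapping frames, the SNR is computed in dB for each frame, and the frame values are averaged. The name `segmental_snr_db`, the default frame length of 160 samples (20 ms at an assumed 8 kHz sampling rate), and the decision to skip frames with zero signal or zero error energy are all assumptions made for illustration.

```python
import math


def segmental_snr_db(original, coded, frame_len=160):
    """Average of per-frame SNR values in dB over non-overlapping frames."""
    if len(original) != len(coded):
        raise ValueError("original and coded must have the same length")
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")

    frame_snrs = []
    # Only complete frames are used; a trailing partial frame is dropped.
    for start in range(0, len(original) - frame_len + 1, frame_len):
        s = original[start:start + frame_len]
        c = coded[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        noise_energy = sum((x - y) ** 2 for x, y in zip(s, c))
        # Assumption: frames whose log ratio is undefined are skipped.
        if signal_energy == 0 or noise_energy == 0:
            continue
        frame_snrs.append(10.0 * math.log10(signal_energy / noise_energy))

    if not frame_snrs:
        raise ValueError("no frame with nonzero signal and error energy")
    return sum(frame_snrs) / len(frame_snrs)


# A loud, well-coded frame followed by a quiet frame coded as silence.
original = [1.0] * 4 + [0.1] * 4
coded = [0.9] * 4 + [0.0] * 4
seg = segmental_snr_db(original, coded, frame_len=4)
```

In this example the first frame scores about 20 dB and the second 0 dB, so the segmented SNR is about 10 dB. The poorly coded quiet frame pulls the average down, whereas a single ratio over the whole signal would be dominated by the loud frame.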

Diagnostic Rhyme Test
The DRT is an intelligibility measure where the subject's task is to recognize one of two possible words in a set of rhyming pairs (e.g., meat - heat).

Diagnostic Acceptability Measure
DAM scores come from test methods that evaluate the quality of a communication system by the acceptability of its speech as perceived by a trained normative listener.

Mean Opinion Score
The MOS test usually involves 12 to 24 listeners who are instructed to rate phonetically balanced records on a 5-level quality scale.

MOS Scale    Speech Quality
    1        Bad
    2        Poor
    3        Fair
    4        Good
    5        Excellent

In MOS tests, listeners are "calibrated" in the sense that they are familiarized with the listening conditions and the range of speech quality they will encounter. Ratings are obtained by averaging numerical scores over several hundred speech records. The MOS range relates to speech quality as follows: a MOS of 4 to 4.5 implies network quality, scores between 3.5 and 4 imply communications quality, and a MOS between 2.5 and 3.5 implies synthetic quality. MOS ratings may differ significantly from test to test, so they are not absolute measures for comparing different coders.
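
The averaging and the quality ranges above can be sketched in Python as follows. The function names are hypothetical, and the treatment of the shared boundary values (3.5 counted as communications quality, 4.0 as network quality) and of scores outside 2.5 to 4.5 is an assumption, since the ranges quoted above overlap at their endpoints and do not cover the whole scale.

```python
def mean_opinion_score(ratings):
    """Average a collection of listener ratings on the 1-to-5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if r not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating {r!r} is not on the 5-level scale")
    return sum(ratings) / len(ratings)


def quality_category(mos):
    """Map a MOS value to the quality label used in the text."""
    if 4.0 <= mos <= 4.5:
        return "network"
    if 3.5 <= mos < 4.0:
        return "communications"
    if 2.5 <= mos < 3.5:
        return "synthetic"
    # Assumption: the text assigns no label outside 2.5 to 4.5.
    return "unclassified"


# Five hypothetical ratings for one coder; real tests average many more.
mos = mean_opinion_score([4, 4, 5, 3, 4])
label = quality_category(mos)
```

Here the five ratings average to 4.0, which this sketch labels network quality. Because ratings vary between tests, such a label is only meaningful for comparing coders evaluated in the same test.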

