Most fraud prevention technologies, such as PINs, call pattern analysis, authentication, and radio frequency (RF) fingerprinting, are only partially effective. Rather than verifying the caller, they merely authenticate a piece of information (PIN, ESN/MIN), a piece of equipment (RF fingerprinting, authentication), or the subscriber’s call patterns (call pattern analysis).
Voice verification systems rely on the uniqueness of each person’s voice and on technology that can reliably distinguish one voice from another by comparing a digitized sample of a person’s voice with a stored model, or “voice print.” One of the most advanced voice verification systems comes from T-NETIX, Inc. The company combines decision-tree and neural network technologies to implement what it calls a “neural tree network.”
The neural tree consists of nodes, or neurons, that are discriminantly trained through multiple repeated utterances of a subscriber-selected password or a small sample of speech. Discriminant training contrasts the acoustic features of the speaker being enrolled with those of the speakers already enrolled in the service.
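As a rough illustration of discriminant training for a single neuron, the sketch below trains a simple perceptron to separate the enrolling speaker's feature vectors from those of other speakers. The perceptron rule, the two-dimensional feature vectors, and the name `train_neuron` are assumptions made for illustration; the source does not describe the training algorithm that T-NETIX actually uses.

```python
from typing import List, Sequence, Tuple


def train_neuron(
    target: Sequence[Sequence[float]],
    others: Sequence[Sequence[float]],
    epochs: int = 20,
    lr: float = 0.1,
) -> Tuple[List[float], float]:
    """Train one linear neuron with the perceptron rule (an assumed stand-in).

    target: feature vectors from the speaker being enrolled (label +1)
    others: feature vectors from already enrolled speakers (label -1)
    """
    dim = len(target[0])
    weights = [0.0] * dim
    bias = 0.0
    data = [(x, 1) for x in target] + [(x, -1) for x in others]
    for _ in range(epochs):
        for x, y in data:
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified: nudge the boundary toward x
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def neuron_score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Positive means 'more like the enrolling speaker', negative 'more like others'."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Made-up feature vectors, chosen to be linearly separable.
enrollee = [(2.0, 2.0), (3.0, 2.5)]
population = [(-1.0, -1.0), (-2.0, -0.5)]
w, b = train_neuron(enrollee, population)
```

A real system would train on acoustic features extracted from the repeated password utterances rather than on hand-picked points.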
During the verification process, each neuron must decide whether the acoustic features of the spoken input are more like those of the person whose identity is claimed or more like those of other speakers in the system. The neural tree network technology permits this complex decision-making, or discriminant, process to be completed relatively quickly compared with other technologies.
In effect, a yes/no decision is reached at each neuron of the neural tree, and a conclusion is reached after moving through five or six branches of the tree. The relative simplicity of the neural network decision-path design facilitates rapid analysis of spoken input with no upper limit on the number of enrollees. The technology is also robust in its ability to determine and isolate channel environmental conditions.
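The yes/no walk through the tree can be sketched as follows. This is a minimal illustration, assuming each internal node holds a linear discriminant and each leaf holds a label; the `Node` structure, the `classify` function, and the tiny two-level tree are hypothetical and are not taken from the T-NETIX design.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    # Internal nodes carry a linear discriminant; leaves carry only a label.
    weights: Sequence[float] = ()
    bias: float = 0.0
    yes: Optional["Node"] = None   # branch taken when the score is positive
    no: Optional["Node"] = None    # branch taken otherwise
    label: Optional[str] = None    # "claimant" or "other" at a leaf


def classify(node: Node, features: Sequence[float]) -> Tuple[str, int]:
    """Follow yes/no decisions to a leaf; return its label and the path length."""
    decisions = 0
    while node.label is None:
        score = sum(w * x for w, x in zip(node.weights, features)) + node.bias
        node = node.yes if score > 0 else node.no
        decisions += 1
    return node.label, decisions


# A hypothetical two-level tree; the text describes paths of five or six branches.
claimant = Node(label="claimant")
other = Node(label="other")
tree = Node(
    weights=(1.0, -1.0),
    yes=Node(weights=(0.0, 1.0), bias=-0.5, yes=claimant, no=other),
    no=other,
)

print(classify(tree, (2.0, 1.0)))  # ('claimant', 2)
print(classify(tree, (0.0, 1.0)))  # ('other', 1)
```

Because each input follows only one root-to-leaf path, the work per verification depends on the depth of the tree rather than on the total number of nodes, which is consistent with the rapid analysis described above.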
The front-end analysis recognizes and normalizes conditions such as background noise, channel differences, and microphone variances.
The mobile service subscriber goes through an enrollment process consisting of the following steps:
- To access the enrollment system, the subscriber inputs his or her identity using a PIN.
- The voice response unit prompts the subscriber to speak the password a few times (typically three or four). The speaker verification technology averages the voice samples to obtain a more robust voice model for the subscriber.
- The technology then analyzes the characteristics of the subscriber’s utterance of the password and characterizes its tonal aspects. The process also results in characterization and isolation of the channel environment (i.e., line type, handset type).
- The system segments the voice utterance into its subword units in order to examine the utterance in greater detail.
- Models for the voice segments are created and compared with other samples stored in the database to train the system to distinguish between individuals with similar voice characteristics.
- Finally, the system loads the subscriber’s voice model into the voice identification database, indexing it to the subscriber’s numeric identifier.
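The averaging and storage steps in the list above can be sketched as follows. This is a simplified illustration: it assumes each spoken sample has already been reduced to a fixed-length feature vector, and it omits channel characterization, subword segmentation, and discriminant training. The names `average_samples` and `enroll`, the in-memory dictionary used as the database, and the identifier `"5551234"` are all hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence

VoiceModel = List[float]


def average_samples(samples: Sequence[Sequence[float]]) -> VoiceModel:
    """Average repeated utterances feature by feature to get a more robust model."""
    if not samples:
        raise ValueError("at least one voice sample is required")
    length = len(samples[0])
    if any(len(s) != length for s in samples):
        raise ValueError("all samples must have the same number of features")
    return [fmean(column) for column in zip(*samples)]


def enroll(
    database: Dict[str, VoiceModel],
    subscriber_id: str,
    samples: Sequence[Sequence[float]],
) -> VoiceModel:
    """Store the averaged model, indexed by the subscriber's numeric identifier."""
    model = average_samples(samples)
    database[subscriber_id] = model
    return model


# Three repetitions of the password, as made-up two-feature vectors.
database: Dict[str, VoiceModel] = {}
model = enroll(database, "5551234", [[1.0, 2.0], [3.0, 4.0], [2.0, 6.0]])
print(model)  # [2.0, 4.0]
```

Keying the stored model on the numeric identifier lets the system retrieve the voice print for the identity a caller claims, so that verification compares the caller with one model rather than searching the whole database.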
The voice verification system can reside on a public or private network as an intelligent peripheral or can be placed as an adjunct serving a Private Branch Exchange (PBX) or Automatic Call Distributor (ACD). In a mobile environment, the system can be an adjunct to a Mobile Switching Center (MSC).