-
AudioVisual Recognition
(Embedded)
(Server Based)
(Combination of Speaker Recognition, Speech Recognition, Face Recognition, and Object Detection and Recognition under a single interface)
Platform:
This is a multi-modal system that combines face and speech in order
to recognize a candidate and to perform full diarization of the
media. The result is returned as a JSON, XML, text, or HTML response
containing timestamps that mark each point where a change of speaker
is detected, together with the identity of the speaker in each
segment, as determined from the speaker's vocal and facial
characteristics. A fusion of the audio and visual results is
provided, as well as the separate results from each engine. The
transcription of each segment is also included. For the fused
identity, as well as for the speaker and face recognition results,
more than one candidate is returned, each with its score and
confidence, sorted by score. In summary, the engine performs
verification and identification based on both speech and face, as
well as face detection, speaker segmentation, and speech
transcription. It is a marriage of our award-winning speaker
recognition engine (voice biometrics engine) with our face
recognition engine. We provide a C++ API as well as web, Android,
iOS, and command-line interfaces.
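To make the shape of such a diarization result concrete, the following is a minimal sketch in Python. It assumes a hypothetical JSON layout (the field names `segments`, `start`, `end`, `transcript`, `audio`, `visual`, `fused`, `identity`, and `score` are illustrative, not the product's actual schema) and a hypothetical linear late fusion, in which each identity's fused score is a weighted sum of its speaker and face scores. The engine's real fusion method is not described here; the weighted sum simply shows how per-engine n-best lists can be merged into one list sorted by score.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Candidate:
    identity: str  # hypothetical enrolled identity label
    score: float   # higher means a better match


@dataclass
class Segment:
    start: float             # segment start time in seconds
    end: float               # segment end time in seconds
    transcript: str          # speech transcription for this segment
    audio: List[Candidate]   # n-best list from the speaker recognition engine
    visual: List[Candidate]  # n-best list from the face recognition engine


def fuse(audio: List[Candidate], visual: List[Candidate],
         w_audio: float = 0.5) -> List[Candidate]:
    """Hypothetical linear late fusion of two n-best lists.

    An identity missing from one modality contributes 0 from that modality.
    The result is sorted by descending score, ties broken by identity name.
    """
    a = {c.identity: c.score for c in audio}
    v = {c.identity: c.score for c in visual}
    fused = [
        Candidate(i, w_audio * a.get(i, 0.0) + (1.0 - w_audio) * v.get(i, 0.0))
        for i in set(a) | set(v)
    ]
    return sorted(fused, key=lambda c: (-c.score, c.identity))


def to_json(segments: List[Segment], w_audio: float = 0.5) -> str:
    """Serialize segments with per-engine and fused n-best lists."""
    out = []
    for s in segments:
        d = asdict(s)  # recursively converts the nested Candidate lists
        d["fused"] = [asdict(c) for c in fuse(s.audio, s.visual, w_audio)]
        out.append(d)
    return json.dumps({"segments": out}, indent=2)
```

For example, a segment whose speaker scores are `alice: 0.9, bob: 0.4` and whose face scores are `alice: 0.7` yields a fused list headed by `alice` (0.5 × 0.9 + 0.5 × 0.7 = 0.8). Late fusion of this kind keeps each engine's own result available alongside the combined one, which matches the description above of returning both fused and per-engine results.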
-
Large-Vocabulary Speech Recognition
(Embedded)
(Server Based)
Initially available for English, Spanish, Mandarin, Arabic, and German; now available for 100+ languages
Also includes multilingual support and code-switching
(Full transcription with customizable domains; 300,000+ word vocabulary)
-
Speaker Recognition
(Embedded)
(Server Based)
(Language- and Text-Independent, aka: Speaker Biometrics, Voice Biometrics, or SIV)
Recipient: Frost & Sullivan Award 2011
-
Face Recognition
(Embedded)
(Server Based)
(Face detection and recognition)
-
Object Recognition
(Embedded)
(Server Based)
(Object detection and recognition)