WO2015096429A1 - Method and apparatus for call voice recognition - Google Patents

Method and apparatus for call voice recognition

Info

Publication number
WO2015096429A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
call
model library
sample
sound model
Prior art date
Application number
PCT/CN2014/080661
Other languages
English (en)
Chinese (zh)
Inventor
雷杨
华国栋
王勿英
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2015096429A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/08 Use of distortion metrics or a particular distance between probe pattern and reference templates

Definitions

  • The present invention relates to the field of mobile applications, and in particular to a method and apparatus for recognizing the voice in a call.
  • BACKGROUND OF THE INVENTION: Communication technology has developed rapidly, and criminal activity that exploits it, telephone fraud in particular, has become increasingly rampant. A common tactic in telephone fraud is for the criminal to call the victim while impersonating one of the victim's acquaintances. In many cases the victim cannot immediately identify the caller by voice, or is too embarrassed to challenge the caller's identity, and is defrauded as a result.
  • A call voice recognition method is provided, including: acquiring a sound sample of the call object (the party at the other end of the call); comparing the sound sample with the sounds in a sound model library; and recognizing the call voice according to the comparison result.
  • The method further includes: sampling and saving the voices of the contacts in the address book of the mobile terminal to establish the sound model library, where the sound model library is stored on a remote server and/or in the mobile terminal.
  • Sampling and saving the voice of a contact in the address book of the mobile terminal includes: performing sound feature extraction on the sampled sound, converting the result into a digital vector, and saving the digital vector.
  • Comparing the sound sample with the sounds in the sound model library includes: acquiring the counterpart number of the call; looking up a sound in the sound model library according to the counterpart number; and comparing the sound sample with the sound found.
  • The method further includes: when the lookup by counterpart number fails, comparing the sound sample with all the sounds in the sound model library. Recognizing the call voice according to the comparison result includes: when the similarity between the sound sample and a sound found in the sound model library is greater than or equal to a threshold, identifying the call object as the user corresponding to that sound model in the sound model library; when the similarity is less than the threshold, confirming that the call object is a stranger. The method further includes: notifying the mobile terminal of the recognition result for the call object.
  • A call voice recognition apparatus is also provided, including: an acquisition module, configured to acquire a sound sample of the call object in a call on the mobile terminal; a comparison module, configured to compare the sound sample with the sounds in the sound model library; and an identification module, configured to recognize the call voice according to the comparison result.
  • The apparatus further includes: a saving module, configured to sample and save the voices of the contacts in the address book of the mobile terminal to establish the sound model library, where the sound model library is stored on a remote server and/or in the mobile terminal.
  • The saving module includes: an extracting unit, configured to perform sound feature extraction on the sampled sound and convert the result into a digital vector; and a saving unit, configured to save the digital vector.
  • The comparison module includes: an obtaining unit, configured to acquire the counterpart number of the call; and a comparing unit, configured to look up a sound in the sound model library according to the counterpart number and compare the sound sample with the sound found.
  • The comparison module is further configured to compare the sound sample with all the sounds in the sound model library when the lookup by counterpart number fails.
  • The comparison module and the identification module are located in the mobile terminal or in a server on the network side.
  • The identification module is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, and to confirm that the call object is a stranger when the similarity is less than the threshold.
  • The apparatus further includes: a notification module, configured to notify the mobile terminal of the recognition result for the call object. According to the present invention, a sound sample of the call object in a call is acquired, the sound sample is compared with the sounds in the sound model library, and the call voice is recognized according to the comparison result. This solves the problem in the related art that fraud can easily occur because the terminal cannot distinguish the identity of the person at the opposite end through the call voice, and it improves security.
  • FIG. 1 is a flowchart of a call voice recognition method according to an embodiment of the present invention;
  • FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention;
  • FIG. 3 is an optional block diagram of a call voice recognition apparatus according to an embodiment of the present invention;
  • FIG. 4 is an optional block diagram 2 of a call voice recognition apparatus according to an embodiment of the present invention;
  • FIG. 5 is an optional block diagram 3 of a call voice recognition apparatus according to an embodiment of the present invention;
  • FIG. 6 is an optional block diagram of a call voice recognition apparatus, including a notification module, according to an embodiment of the present invention;
  • FIG. 7 is a block diagram of a call voice recognition system module according to an embodiment of the present invention;
  • FIG. 8 is a flowchart of a call voice recognition function according to an embodiment of the present invention.
  • Step S102: acquire a sound sample of the call object in the call.
  • Step S104: compare the sound sample with the sounds in the sound model library.
  • Step S106: recognize the call voice according to the comparison result.
  • Through the above steps, the acquired sound sample of the call object is compared with sounds stored in advance in the sound model library, and the call voice is recognized according to the comparison result.
  • This solves the problem in the related art that the terminal cannot distinguish the identity of the person at the opposite end through the call voice: the terminal can recognize the voice at the opposite end of the call and thereby identify the person there, so the mobile terminal user can determine whether the opposite end is a stranger. Preferably, the user can then choose whether to continue the call or adjust what is said, and can also choose to raise an alarm, which effectively reduces mobile phone fraud and improves security.
  • The sound model library may be established before the sound sample is compared with the sounds in it. The library can be established in various ways. This embodiment provides a preferred way: the sound model library is established by sampling and saving the voices of the contacts in the address book of the mobile terminal, where the sound model library is stored on a remote server and/or in the mobile terminal.
  • The sampling may consist of choosing to record each time a call from the contact is received, thereby obtaining a sound sample of that contact.
  • Because the user knows the contact's voice, a more accurate sound sample can be obtained in this way.
  • The sound model library may be kept per user; for example, user A and user B each have their own sound model library.
  • The sound model library can also be shared by multiple users or a group of users. For example, all users of a company or a group can share one sound model library, formed by pooling the sound samples that each user records individually.
  • An operator can also combine the sound samples obtained from all users into one large sound model library, which can provide users with more comprehensive voice recognition.
  • The sampling and saving of a contact's voice may be implemented in various ways.
  • In a preferred implementation, sound features are extracted from the sampled sound and converted into a digital vector, and the digital vector is saved. This completes the sampling and saving of the voice of the contact in the address book of the mobile terminal.
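The description does not say which sound features are extracted. As a minimal sketch under that caveat, the following Python turns a list of audio samples into a fixed-length digital vector using two simple per-frame features (log energy and zero-crossing rate), averaged over the recording. The function name `extract_feature_vector`, the frame size, and the choice of features are assumptions made for illustration only; a production system would more likely use spectral features.

```python
import math


def extract_feature_vector(samples, frame_size=160):
    """Convert raw audio samples into a small fixed-length feature vector.

    Hypothetical illustration (not the patent's algorithm): returns
    [mean log-energy, mean zero-crossing rate] over non-overlapping frames.
    """
    if len(samples) < frame_size:
        raise ValueError("need at least one full frame of audio")

    energies, zero_crossing_rates = [], []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        # Mean energy of the frame, on a log scale (small offset avoids log(0)).
        energy = sum(x * x for x in frame) / frame_size
        energies.append(math.log(energy + 1e-12))
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        zero_crossing_rates.append(crossings / (frame_size - 1))

    return [
        sum(energies) / len(energies),
        sum(zero_crossing_rates) / len(zero_crossing_rates),
    ]
```

The resulting list is what would be stored in the sound model library for a contact; keeping it fixed-length makes later comparison between a sample and a stored model straightforward.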
  • There are many ways to compare the sound sample with the sounds in the sound model library. A relatively straightforward way is to obtain the counterpart number of the call, look up the sound in the sound model library according to that number, and compare the sound sample with the sound found.
  • When the counterpart number exists in the address book of the mobile terminal, and the sound model library was built by sampling and saving the voices of the contacts in the address book, the sound corresponding to the counterpart number is looked up directly in the sound model library and the sound sample is compared with the sound found. When the counterpart number is not in the address book of the mobile terminal, the sound model library is checked for a sound corresponding to the counterpart number; if one exists, the sound sample is compared with it.
  • When the lookup by counterpart number fails, the sound sample can be compared with all the sounds in the sound model library.
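The lookup-then-fallback strategy described above can be sketched as follows. This is an illustration only: the library layout (a dict mapping a phone number to a `(user, vector)` pair) and the function name `select_candidates` are assumptions, not something the description specifies.

```python
def select_candidates(model_library, counterpart_number):
    """Return the (user, model) pairs the sound sample should be compared with.

    model_library is assumed to map phone number -> (user name, feature vector).
    If the counterpart number has a stored model, only that model is returned;
    otherwise the lookup has failed and every model in the library is returned.
    """
    if counterpart_number in model_library:
        return [model_library[counterpart_number]]
    return list(model_library.values())
```

Comparing against a single model first keeps the common case cheap; the full scan only runs for numbers the library has never seen.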
  • The sound may be recognized by a similarity determination.
  • When the similarity between the sound sample and a sound found in the sound model library is greater than or equal to the threshold, the call object is identified as the user corresponding to that sound model in the sound model library; when the similarity is less than the threshold, the call object is confirmed to be a stranger.
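The threshold rule can be sketched in a few lines. The description does not name a similarity measure or a threshold value, so cosine similarity, the default of 0.8, and the names `cosine_similarity` and `identify_caller` are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_caller(sample, candidates, threshold=0.8):
    """Return the best-matching user, or None if the caller is a stranger.

    candidates is a list of (user, feature vector) pairs. The caller is
    identified as a user only when the similarity is >= threshold.
    """
    best_user, best_score = None, -1.0
    for user, model in candidates:
        score = cosine_similarity(sample, model)
        if score > best_score:
            best_user, best_score = user, score
    if best_user is not None and best_score >= threshold:
        return best_user
    return None  # below threshold for every model: treat as a stranger
```

Returning `None` for a stranger, rather than a sentinel string, avoids a clash with any real user name in the library.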
  • The recognition result for the call object may also be notified to the mobile terminal.
  • This embodiment also provides a call voice recognition apparatus, which is used to implement the foregoing method. What has already been described for the method is not repeated here.
  • The names of the modules in the apparatus should not be understood as limiting the modules. For example, "an acquisition module, configured to acquire a sound sample of the call object in a call" may equally be expressed as "a module for acquiring a sound sample of the call object in a call".
  • The functions of the modules described below can be implemented by a processor.
  • FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes: an acquisition module 22, a comparison module 24, and an identification module 26.
  • The acquisition module 22 is configured to acquire a sound sample of the call object in the call; the comparison module 24 is configured to compare the sound sample with the sounds in the sound model library; and the identification module 26 is configured to recognize the call voice according to the comparison result.
  • The comparison module 24 and the identification module 26 may be located in the mobile terminal or in a server on the network side.
  • FIG. 3 is an optional block diagram of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus further includes: a saving module 32, configured to sample and save the voices of the contacts in the address book of the mobile terminal to establish the sound model library, where the sound model library is stored on a remote server and/or in the mobile terminal.
  • FIG. 4 is an optional block diagram 2 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 4, the saving module 32 includes: an extracting unit 42, configured to perform sound feature extraction on the sampled sound and convert the result into a digital vector; and a saving unit 44, configured to save the digital vector.
  • FIG. 5 is an optional block diagram 3 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 5, the comparison module 24 includes: an acquisition unit 52, configured to acquire the counterpart number of the call; and a comparison unit 54, configured to look up a sound in the sound model library according to the counterpart number and compare the sound sample with the sound found.
  • The comparison module 24 is further configured to compare the sound sample with all the sounds in the sound model library when the lookup by counterpart number fails.
  • The identification module 26 is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to the threshold, and to confirm that the call object is a stranger when the similarity is less than the threshold.
  • FIG. 6 is an optional block diagram of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 6, the apparatus further includes: a notification module 62, configured to notify the mobile terminal of the recognition result for the call object.
  • The apparatus in this optional embodiment includes two subsystems: a front-end subsystem and a back-end subsystem.
  • The front-end subsystem can include four modules: 1. User interface module; 2. Sound sampling module; 3. Sound feature extraction module; 4. Communication interface module.
  • The back-end subsystem includes five modules: 1. User configuration management module; 2. Sound feature extraction module; 3. Sound model creation module; 4. Voice recognition module; 5. Communication interface module.
  • The voice recognition module implements the functions of the comparison module 24 and the identification module 26 described above. These modules are described below.
  • Sound sampling module: responsible for capturing the voice of the speaker at the other end during the call and handing it over to the sound feature extraction module of the front-end subsystem.
  • Sound feature extraction module: responsible for extracting features from the acquired sound and converting them into digital vectors.
  • Sound model creation module: responsible for building a sound model from the digital vector produced by feature extraction.
  • Voice recognition module: used to identify the caller based on the voice.
  • FIG. 7 is a block diagram of a call voice recognition system module according to an embodiment of the present invention.
  • The front-end subsystem includes: a user interface module, a sound sampling module, a sound feature extraction module, and a communication interface module.
  • The back-end subsystem includes: a user configuration management module, a sound feature extraction module, a voice recognition module, a sound model creation module, and a communication interface module.
  • The front-end subsystem of the apparatus can be deployed on the user's smartphone, and the back-end subsystem can be deployed either on the user's smartphone or on a back-end server. If the back-end subsystem is deployed on the smartphone, the two subsystems communicate through the internal communication mechanism of the phone's operating system. If the back-end subsystem is deployed on a back-end server, the two subsystems communicate over Wi-Fi or a 3G network.
  • The back-end subsystem is responsible for creating and storing, for the mobile phone user, the voice models of the contacts in the address book. The front-end subsystem is responsible for sampling the voice of the opposite speaker during a call and then uploading the sampled, feature-extracted sound sample to the back-end subsystem.
  • FIG. 8 is a flowchart of a call voice recognition function according to an embodiment of the present invention. As shown in FIG. 8, the process includes the following steps:
  • Step S801: the phone receives an incoming call.
  • Step S802: the front-end subsystem of the apparatus checks the phone's address book to confirm whether the caller number is an existing number in the address book. If it is, go to S803; if it is not, go to S804.
  • Step S803: the front-end subsystem of the apparatus checks whether the address-book number already has a sound model in the sound model library. If it does, go to S804; otherwise, go to S807.
  • Step S804: the front-end subsystem samples the voice of the opposite speaker during the call, its sound feature extraction module performs feature extraction, and the flow proceeds to S805.
  • Step S805: the front-end subsystem passes the sound features extracted in S804 as input to the voice recognition module of the back-end subsystem, and the voice recognition module identifies the opposite speaker according to the sound models in the sound model library. Then go to S806.
  • Step S806: the user interface module notifies the mobile phone user of the identity of the opposite speaker.
  • Step S807: the sound sampling module of the front-end subsystem uploads the sampled sound to the back-end subsystem through the communication module, and the sound feature extraction module of the back-end subsystem performs feature extraction on this sound sample. Then go to S808.
  • Step S808: the sound model creation module of the back-end subsystem builds a sound model from the feature-extracted sound sample and stores the sound model in the sound model library.
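The branching in the flow above (address-book check, then model-library check, then either recognition or enrolment) can be condensed into one decision function. This is a sketch: the function name `route_incoming_call`, the return values, and the use of sets for the address book and model library are hypothetical.

```python
def route_incoming_call(caller_number, address_book, model_library):
    """Decide what to do with an incoming call.

    Returns "recognize" when the caller's voice should be sampled and matched
    against the sound model library (the path that starts at S804), or
    "enroll" when a new sound model should be built for a known contact
    (the path that starts at S807).
    """
    if caller_number in address_book:
        if caller_number in model_library:
            return "recognize"  # known contact with a stored model
        return "enroll"         # known contact without a model yet
    return "recognize"          # unknown number: match against the library


# Example: Bob is a contact without a model, so his call is used for enrolment.
address_book = {"123", "456"}
model_library = {"123"}
action = route_incoming_call("456", address_book, model_library)
```

Note that, as the flow is written, a call from a number outside the address book is always routed to recognition, never to enrolment.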
  • Unlike the earlier approach of relying on human judgment, the method or apparatus of this optional embodiment identifies the voice on a mobile phone call automatically, which can effectively prevent the mobile phone user from being deceived by telephone fraud.
  • The above modules or steps of the present invention can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices.
  • Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; or they may be separately fabricated into individual integrated circuit modules; or multiple modules or steps among them may be fabricated into a single integrated circuit module.
  • The invention is thus not limited to any specific combination of hardware and software.
  • The above is only an optional embodiment of the present invention and is not intended to limit the present invention; various modifications and changes can be made to the present invention. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and scope of the present invention are intended to be included within the scope of the present invention.
  • The present invention relates to the field of mobile applications. It acquires a sound sample of the call object in a call, compares the sound sample with the sounds in the sound model library, and recognizes the call voice according to the comparison result. This solves the problem in the related art that fraud can easily occur because the terminal cannot identify the opposite party through the call voice; the terminal can now identify the opposite party by the call voice, which improves security.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Disclosed are a method and apparatus for call voice recognition. The method includes: acquiring a sound sample of the call object in a call; comparing the sound sample with the sounds in a sound model library; and recognizing the call voice according to the comparison result. The invention solves the problem in the related art that fraud can easily occur because a terminal cannot distinguish the identity of the person at the opposite end through the call voice; it enables the terminal to distinguish that identity through the call voice and improves security.
PCT/CN2014/080661 2013-12-25 2014-06-24 Method and apparatus for call voice recognition WO2015096429A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310728622.1 2013-12-25
CN201310728622.1A CN104751848A (zh) 2013-12-25 2013-12-25 Call voice recognition method and apparatus

Publications (1)

Publication Number Publication Date
WO2015096429A1 (fr) 2015-07-02

Family

ID=53477465

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/080661 WO2015096429A1 (fr) 2013-12-25 2014-06-24 Method and apparatus for call voice recognition

Country Status (2)

Country Link
CN (1) CN104751848A (fr)
WO (1) WO2015096429A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113225327A (zh) * 2021-04-29 2021-08-06 心动网络股份有限公司 Method, apparatus, device, and medium for supervising logged-in customers based on voice recognition

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
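CN106790949A (zh) * 2015-11-20 2017-05-31 北京奇虎科技有限公司 Method and apparatus for configuring a voice feature library of malicious calls
CN105590632B (zh) * 2015-12-16 2019-01-29 广东德诚科教有限公司 S-T teaching process analysis method based on speech similarity recognition
WO2018170816A1 (fr) * 2017-03-23 2018-09-27 李卓希 Call control processing method and mobile terminal
CN108122555B (zh) * 2017-12-18 2021-07-23 北京百度网讯科技有限公司 Communication method, speech recognition device, and terminal device
CN107846493B (zh) * 2017-12-21 2019-10-25 Oppo广东移动通信有限公司 Call contact control method and apparatus, storage medium, and mobile terminal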

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1852560A (zh) * 2005-07-22 2006-10-25 华为技术有限公司 User identity recognition method, and call control method and system
US20080159488A1 (en) * 2006-12-27 2008-07-03 Chander Raja Voice based caller identification and screening
CN102576530A (zh) * 2009-10-15 2012-07-11 索尼爱立信移动通讯有限公司 Voice pattern tagged contacts
CN102780819A (zh) * 2012-07-27 2012-11-14 广东欧珀移动通信有限公司 Method for recognizing contacts by voice on a mobile terminal
CN103377652A (zh) * 2012-04-25 2013-10-30 上海智臻网络科技有限公司 Method, apparatus, and device for performing speech recognition

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
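CN101442579A (zh) * 2007-11-23 2009-05-27 中兴通讯股份有限公司 Mobile terminal with voice recognition of calling user information
JP2011119953A (ja) * 2009-12-03 2011-06-16 Hitachi Ltd Call recording system using call control and call recording functions
CN202142288U (zh) * 2011-07-07 2012-02-08 龙旗科技(上海)有限公司 Secure voice communication device for a portable terminal
CN103281425A (zh) * 2013-04-25 2013-09-04 广东欧珀移动通信有限公司 Method and apparatus for analyzing contacts through call voice
CN103313249B (zh) * 2013-05-07 2017-05-10 百度在线网络技术(北京)有限公司 Reminder method, system, and server for a terminal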

Also Published As

Publication number Publication date
CN104751848A (zh) 2015-07-01

Similar Documents

Publication Publication Date Title
WO2015096429A1 (fr) Method and apparatus for call voice recognition
US9607621B2 (en) Customer identification through voice biometrics
CN105306657B (zh) Identity recognition method and apparatus, and communication terminal
KR101881058B1 (ko) Voice verification method, apparatus, and system
US10062268B2 (en) Terminal alarm method and apparatus
CN104537746A (zh) Intelligent electronic door control method, system, and device
CN113794805A (zh) Detection method and detection system for GoIP fraud calls
WO2017201874A1 (fr) Method and apparatus for indicating loss of a terminal
US20180013869A1 (en) Integration of voip phone services with intelligent cloud voice recognition
CN204990444U (zh) Intelligent security control device
US8483672B2 (en) System and method for selective monitoring of mobile communication terminals based on speech key-phrases
CN106657625B (zh) Terminal calling method and system
CN109039509A (zh) Method for voice control of broadcasting equipment, and broadcasting equipment
WO2017059679A1 (fr) Account processing method and apparatus
CN104933791A (zh) Intelligent security control method and device
CN107995381B (zh) Alarm terminal, cloud, alarm processing method thereof, and storage medium
CN112333709B (zh) Cross-network fraud-related association analysis method and system, and computer storage medium
CN110929244A (zh) Digital identity recognition method, apparatus, device, and storage medium
US20160330315A1 (en) System and method for user-privacy-aware communication monitoring and analysis
WO2016124008A1 (fr) Voice control method, apparatus, and system
JP2016149636A (ja) Authentication device, telephone terminal, authentication method, and authentication program
US20180343342A1 (en) Controlled environment communication system for detecting unauthorized employee communications
JP2016071068A (ja) Call analysis device, call analysis method, and call analysis program
CN111988426B (zh) Voiceprint recognition-based communication method and apparatus, intelligent terminal, and storage medium
WO2016058540A1 (fr) Identity authentication method and apparatus, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14873352

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14873352

Country of ref document: EP

Kind code of ref document: A1