WO2008061463A1 - The method and system for authenticating the voice of the speaker, the mrcf and mrpf - Google Patents

The method and system for authenticating the voice of the speaker, the mrcf and mrpf

Info

Publication number
WO2008061463A1
Authority
WO
WIPO (PCT)
Prior art keywords
verification
media resource
speaker
voice
entity
Prior art date
Application number
PCT/CN2007/070805
Other languages
French (fr)
Chinese (zh)
Inventor
Zhiyong Xu
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2008061463A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30: Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31: User authentication
    • G06F21/32: User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints

Definitions

  • The present invention relates to the field of communication technologies, and in particular to a method for speaker verification, a system for speaker verification, a media resource control entity, and a media resource processing entity.
  • Speaker verification is a technique that analyzes a speaker's speech characteristics, matches them against previously sampled speech, and verifies identity based on the matching result. It verifies the speaker by analyzing characteristics unique to the speech sample, such as the frequency of pronunciation. With speaker verification, a person's voice can be used to control access to restricted areas and to identity-sensitive data, such as telephone banking, database services, and voicemail.
  • The speech sample, called a voiceprint, is the speech sampling used as the verification reference; it is usually the user's voice recorded directly.
  • SI/SV Speaker Identification/Speaker Verification
  • AS Application Server
  • The lower layer uses a media processing unit to support media processing functions and media interaction with clients.
  • An additional SI/SV server receives the speaker verification command from the application server, receives the voice sent by the media resource processing entity, performs speaker verification, and then reports the verification result to the AS.
  • The control signaling for speaker identification/verification between the SI/SV server and the AS and the media processing unit is exchanged through the specially defined SPEECHSC protocol.
  • In a system in which media control and bearer are separated, a media resource control entity and a media resource processing entity usually interact with the AS to implement media resource services.
  • If speaker verification is to be implemented in a system in which media control and bearer are separated, then under the speaker verification network architecture described above the SI/SV server must be deployed in that system and the existing protocol structure must also be changed. This requires substantial changes to the existing network and makes the network upgrade costly.
  Summary of the invention
  • The embodiments of the present invention provide a method and a system for speaker verification, a media resource control entity, and a media resource processing entity, which can implement speaker verification without changing the network architecture of a system in which media control and bearer are separated.
  • the media resource control entity instructs the media resource processing entity to perform the speaker verification process; the media resource processing entity receives the voice input of the speaker and performs verification, and reports the verification result to the media resource control entity.
  • a media resource control entity configured to instruct the media resource processing entity to perform speaker verification processing, and receive a verification result from the media resource processing entity
  • the media resource processing entity is configured to receive and verify the voice input of the speaker, and report the verification result to the media resource control entity.
  • the media resource control entity provided by the embodiment of the present invention includes: a speaker verification and service information interaction module, a control module, and a media control interaction module;
  • a speaker verification and service information interaction module configured to receive a speaker verification command, and transmit the verification command to the control module, and return the verification result from the control module to the device that sends the speaker verification command;
  • The control module is configured to generate a speaker verification request according to the speaker verification command from the speaker verification and service information interaction module, and transmit the verification request to the media control interaction module; and to receive the speaker verification result from the media control interaction module and pass the verification result to the speaker verification and service information interaction module;
  • The media control interaction module is configured to receive a speaker verification request from the control module, convert the request into a format supported by the media control protocol, and send it to the media resource processing entity; and to receive the verification result in the format supported by the media control protocol from the media resource processing entity, perform protocol conversion to turn it into information that the control module can recognize, and send that information to the control module.
  • the media resource processing entity provided by the embodiment of the present invention includes: a media control interaction module and a speaker verification module;
  • A media control interaction module, configured to receive a speaker verification request in media control protocol format from the media resource control entity, convert the speaker verification request into information that the speaker verification module can recognize, and send the information to the speaker verification module; and to receive the verification result from the speaker verification module, convert the verification result into media control protocol format, and send it to the media resource control entity;
  • A speaker verification module, configured to acquire the corresponding voiceprint according to the speaker verification request from the media control interaction module, receive the user's voice input, verify the received voice input against the acquired voiceprint to determine the verification result, and send the verification result to the media control interaction module.
  • The media resource control entity instructs the media resource processing entity to perform speaker verification processing; the media resource processing entity receives the speaker's voice input according to that instruction, performs verification, and reports the verification result to the media resource control entity. Speaker verification is thus implemented without changing the network architecture or protocol structure of the existing system in which bearer and control are separated, which reduces the cost of the network upgrade.
  • FIG. 1 is a schematic diagram of a network architecture for implementing a speaker verification technology in the prior art
  • FIG. 2 is a schematic diagram of a system composition for implementing a speaker verification technology according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of internal components of a media resource control entity and a media resource processing entity according to an embodiment of the present invention
  • FIG. 4 is a schematic structural diagram of a network according to an example of a specific embodiment of the present invention.
  • FIG. 5 is a general flowchart of a method for verifying a speaker in a specific embodiment of the present invention
  • FIG. 6 is a specific flowchart of an example of a method for verifying a speaker in a specific embodiment of the present invention.
  Mode for carrying out the invention
  • The media resource control entity instructs the media resource processing entity to perform speaker verification processing; the media resource processing entity receives the speaker's voice input, performs verification, and reports the verification result to the media resource control entity, thereby implementing speaker verification.
  • The media resource control entity may send a verification request including the speaker's voiceprint information to the media resource processing entity according to the verification command; the media resource processing entity acquires the voiceprint corresponding to the voiceprint information, uses it to verify the user voice input received according to the verification request, and returns the verification result to the media resource control entity.
  • The voiceprint information may include a voiceprint path and a voiceprint name.
  • The media resource processing entity can obtain the corresponding voiceprint according to the voiceprint information.
  • The voiceprint path can be a local path or the path of an accessible network server; the voiceprint name can be a character string. Under a given voiceprint path, the voiceprint name corresponds uniquely to one voiceprint file.
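The path-plus-name addressing described above can be illustrated with a minimal Python sketch. The class name `VoiceprintInfo`, its methods, and the rule of joining path and name with "/" are assumptions made for illustration only; the source states only that the path is local or on a network server and that the name is a string unique under that path.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class VoiceprintInfo:
    """Voiceprint information: a storage path plus a name (hypothetical type)."""

    path: str  # local directory or network server URL
    name: str  # character string, unique under `path`

    def is_network_path(self) -> bool:
        # Assumption: a path with both a URL scheme and a host is a network path.
        parsed = urlparse(self.path)
        return bool(parsed.scheme and parsed.netloc)

    def location(self) -> str:
        # Assumption: path and name are joined with "/" to address one file.
        if not self.name or "/" in self.name:
            raise ValueError("voiceprint name must be a non-empty plain string")
        return self.path.rstrip("/") + "/" + self.name
```

Because the name is unique under the path, `location()` always addresses at most one voiceprint file, for either a local or a network path.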
  • the media resource control entity may be a Multimedia Resource Function Controller (MRFC) or a Media Gateway Controller (MGC).
  • MRFC Multimedia Resource Function Controller
  • MGC Media Gateway Controller
  • the media resource processing entity is a Multimedia Resource Function Processor (MRFP) or a Media Gateway (MG).
  • A specific implementation of the system for implementing speaker verification of the present invention is shown in FIG. 2.
  • the system includes: a media resource control entity 21 and a media resource processing entity 22.
  • The media resource control entity 21 is configured to instruct the media resource processing entity 22 to perform speaker verification processing. Specifically, the media resource control entity 21 may receive a speaker verification command, which includes the speaker's voiceprint information, from the AS or from another device that sends media resource service requests, and, according to the verification command, send the media resource processing entity 22 a speaker verification request that includes the speaker's voiceprint information.
  • The media resource processing entity 22 is configured to receive the speaker's voice input according to the instruction of the media resource control entity 21, perform voice verification processing, and report the verification result to the media resource control entity 21. Specifically, the media resource processing entity 22 receives the speaker verification request from the media resource control entity 21 and acquires the corresponding voiceprint based on the speaker's voiceprint information in the request.
  • the media resource control entity 21 specifically includes a speaker verification and service information interaction module 211, a control module 212, and a media control interaction module 213.
  • The speaker verification and service information interaction module 211 is configured to receive service information, that is, a speaker verification command including the speaker's voiceprint information, from an application server or from another device that sends media resource service requests, such as a Serving-Call Session Control Function (S-CSCF); to transmit the verification command to the control module 212; and to return the verification result information from the control module 212 to the application server or other device that sent the media resource service request.
  • S-CSCF Serving-Call Session Control Function
  • The speaker verification and service information interaction module 211 is also used for the exchange of other service information between the control module 212 and the application server or other devices.
  • The control module 212 is configured to control the process interaction and the state machine of the call in the service process. Specifically, in this embodiment, it generates the speaker verification request according to the speaker verification command from the speaker verification and service information interaction module 211 and transmits the verification request to the media control interaction module 213; and it receives the speaker verification result information from the media control interaction module 213 and transmits the verification result information to the speaker verification and service information interaction module 211.
  • The media control interaction module 213 is configured to receive information from the control module 212, convert it into a media control protocol message, such as an H.248 protocol message, and send it to the media resource processing entity 22; and to receive information from the media resource processing entity 22, convert it into information that the control module 212 can recognize, and transmit it to the control module 212. Specifically, in this embodiment, it receives the speaker verification request from the control module 212, converts the request into a format supported by the media control protocol, such as H.248, and sends it to the media control interaction module 221 in the media resource processing entity 22.
  • the media resource processing entity 22 specifically includes: a speaker verification module 220 and a media control interaction module 221.
  • The media control interaction module 221 is configured to receive information in H.248 protocol format from the media resource control entity, convert it into information that the speaker verification module 220 can recognize, and transmit it to the speaker verification module 220; and to convert information from the speaker verification module 220 into H.248 protocol format and send it to the media resource control entity.
  • Specifically, in this embodiment, it receives the speaker verification request in H.248 protocol format from the media control interaction module 213 in the media resource control entity 21, converts it into information that the speaker verification module 220 can recognize, and transmits it to the speaker verification module 220; and it receives the verification result information from the speaker verification module 220, converts it into a format supported by the H.248 protocol, and sends it to the media control interaction module 213 in the media resource control entity 21.
  • The speaker verification module 220 is configured to acquire the corresponding voiceprint and receive the user's voice input according to the speaker verification request information received from the media control interaction module, verify the received voice input against the acquired voiceprint to determine the verification result, and send the verification result to the media control interaction module 221.
  • the speaker verification module 220 may include: a control module 222, a speaker verification processing engine 223, a voiceprint acquisition module 224, and a speaker voice receiving module 225.
  • The control module 222 is used to control the process interaction and state machine of the call during the service process.
  • Specifically, in this embodiment, according to the speaker verification request from the media control interaction module 221, it controls the voiceprint acquisition module 224 to acquire the voiceprint and transmit it to the speaker verification processing engine 223; controls the speaker voice receiving module 225 to receive the voice input from the user and transmit it to the speaker verification processing engine 223; controls the speaker verification processing engine 223 to perform speaker verification based on the received voiceprint and user voice input; and receives the verification result information returned by the speaker verification processing engine 223 and transmits it to the media control interaction module 221.
  • The speaker verification processing engine 223 is configured to receive the voiceprint from the voiceprint acquisition module 224 and the user voice input, that is, the speaker's voice data, from the speaker voice receiving module 225; to compare the acquired voiceprint with the received user voice input according to acoustic characteristics and generate verification result information; and to send the verification result information to the control module 222.
  • The voiceprint acquisition module 224 is configured to acquire the corresponding voiceprint under the control of the control module 222 and transmit the acquired voiceprint to the speaker verification processing engine 223.
  • The speaker verification request includes the voiceprint information from the corresponding speaker verification command. The control module 222 sends the voiceprint acquisition module 224 a voiceprint acquisition command that includes this voiceprint information; the voiceprint acquisition module 224 goes to the address given by the voiceprint path and voiceprint name in the voiceprint information, obtains the corresponding voiceprint, and then transmits the acquired voiceprint to the speaker verification processing engine 223.
  • the speaker voice receiving module 225 is configured to receive the voice input of the user according to the command of the control module 222, and transmit the received voice to the speaker verification processing engine 223.
  • the media resource control entity in this embodiment may be an MGC, and the corresponding media resource processing entity is an MG.
  • the media resource control entity may also be an MRFC, and the corresponding media resource processing entity is an MRFP.
  • the network architecture on which the embodiment is based will be described below by way of an example.
  • The network architecture, applied in an IP Multimedia Subsystem (IMS) network, includes an AS, an S-CSCF, an MRFC, and an MRFP. The MRFC receives the speaker verification command of the AS through the S-CSCF and, according to the command, instructs the MRFP to acquire the corresponding voiceprint and verify the user voice input against it, and returns the verification result information to the S-CSCF.
  • IMS IP Multimedia Subsystem
  • The overall process of a specific embodiment of the method for implementing speaker verification according to the present invention is shown in FIG. 5, which mainly includes the following steps:
  • Step 501 The media resource control entity instructs the media resource processing entity to perform speaker verification processing
  • Step 502 The media resource processing entity receives the voice input of the speaker according to the indication of the media resource control entity, and performs verification, and reports the verification result to the media resource control entity.
  • the foregoing media resource control entity and media resource processing entity may be an MGC and an MG, or an MRFC and an MRFP.
  • The following describes the embodiment in detail, taking application to the IMS network architecture as an example, in which the media resource control entity and the media resource processing entity are the MRFC and the MRFP. As shown in FIG. 6, when the embodiment is applied to an IMS network architecture, the following steps are specifically included:
  • Step 601 After the bearer channel between the MRFC and the MRFP is established, the MRFC receives a speaker verification command from the S-CSCF, where the command includes the speaker's voiceprint information.
  • The voiceprint information here is the path and name corresponding to the user's voiceprint.
  • The speaker verification command sent by the S-CSCF to the MRFC usually comes from the AS.
  • The AS can also send the speaker verification command directly to the MRFC.
  • The MRFC may specifically include a speaker verification and service information interaction module, a control module, and a media control interaction module.
  • Specifically, this step includes: the speaker verification and service information interaction module receives the service information, that is, the speaker verification command including the speaker's voiceprint information, from the AS directly or through the S-CSCF, and transmits the speaker verification command to the control module.
  • Step 602 The MRFC converts the received speaker verification command into a speaker verification request, converts the request into H.248 message format, and sends it to the MRFP through the Mp interface between the MRFC and the MRFP.
  • In this step, the control module in the MRFC may generate a speaker verification request according to the received speaker verification command and transmit the verification request to the media control interaction module, and the media control interaction module converts the verification request into a format supported by the H.248 protocol and sends it to the MRFP.
  • The speaker verification request in this step may use a Mod.req message of the H.248 protocol, which includes information on the service session endpoint T1 and in which the value of the signal is speaker verification.
  • The request may specifically include one or more of the following items of information:
  • The voiceprint identifier, that is, the voiceprint information, including the path and name of the voiceprint; the path where the voiceprint is stored may be a local server path or a network server path; the voiceprint name is a character string and must be unique under the specified path;
  • The score threshold, with a value from 0 to 100, which the MRFP uses to decide whether speaker verification succeeds: verification succeeds if the matching result score is greater than or equal to this value;
  • The voice input end detection button: the user can end the voice input by pressing this button, and the initial prompt tone can tell the user which button it is.
  • Apart from the voiceprint identifier, which comes from the speaker verification command, the information in the speaker verification request described above is generated from the corresponding settings in the MRFC.
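The request contents listed above can be sketched as a small Python data type. This is an illustrative sketch only: the names `SpeakerVerificationRequest` and `build_request`, the dictionary keys, the default threshold of 50, and the restriction of the end button to a single telephone keypad character are all assumptions, not part of the source or of H.248.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeakerVerificationRequest:
    """Fields of a speaker verification request (hypothetical representation)."""

    voiceprint_path: str
    voiceprint_name: str
    score_threshold: int = 50         # assumed default; the source gives only the range
    end_button: Optional[str] = None  # button that ends the voice input, if any

    def __post_init__(self) -> None:
        # The source states that the score threshold ranges from 0 to 100.
        if not 0 <= self.score_threshold <= 100:
            raise ValueError("score threshold must be between 0 and 100")
        # Assumption: the end button is one telephone keypad character.
        if self.end_button is not None and (
            len(self.end_button) != 1 or self.end_button not in "0123456789*#"
        ):
            raise ValueError("end button must be a single keypad character")


def build_request(command: dict, mrfc_settings: dict) -> SpeakerVerificationRequest:
    # The voiceprint identifier comes from the speaker verification command;
    # every other field comes from settings held in the MRFC.
    return SpeakerVerificationRequest(
        voiceprint_path=command["voiceprint_path"],
        voiceprint_name=command["voiceprint_name"],
        score_threshold=mrfc_settings.get("score_threshold", 50),
        end_button=mrfc_settings.get("end_button"),
    )
```

Validating the threshold when the request is built means an out-of-range value is rejected in the MRFC before anything is sent to the MRFP.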
  • Step 603 After receiving the speaker verification request, the MRFP returns a Mod.resp message to the MRFC including the information of the endpoint T1.
  • The MRFP may include a media control interaction module, a control module, a speaker verification processing engine, a voiceprint acquisition module, and a speaker voice receiving module.
  • The media control interaction module in the MRFP receives the speaker verification request in H.248 protocol format, converts the request into information that the control module in the MRFP can recognize, and then transmits the information to the control module;
  • The control module generates a Mod.resp message in response to the received request and sends it to the MRFC through the media control interaction module of the MRFP.
  • Step 604 The MRFP performs verification processing. Specifically, it acquires the corresponding voiceprint according to the voiceprint path and voiceprint name in the voiceprint identifier of the speaker verification request. If the voiceprint cannot be obtained, verification failure information or error information is returned to the MRFC through the Not.req message in step 605. If the voiceprint is obtained, the initial prompt tone is played to the user and the MRFP waits for user input; after receiving the user's voice input, the MRFP matches the voice input against the acquired voiceprint sample to obtain a matching result score.
  • If the matching result score is greater than or equal to the score threshold, a verification success prompt tone is sent to the user, and in step 605 the verification success information is returned to the MRFC through the Not.req message; if the matching result score is less than the score threshold, a verification failure prompt tone is sent to the user, and in step 605 the verification failure information is returned to the MRFC through the Not.req message.
  • In this step, according to the speaker verification request from the media control interaction module, the control module controls the voiceprint acquisition module to acquire the voiceprint and transmit it to the speaker verification processing engine, and controls the speaker voice receiving module to play the initial prompt tone to the user and receive the voice input from the user. Before the timer that waits for the speaker's voice input reaches its maximum duration, the speaker voice receiving module receives the user's voice input and, after receiving the voice input end detection button or when the preset input duration is reached, transmits the voice input to the speaker verification processing engine.
  • If no voice input has been received from the user when the timer that waits for the speaker's voice input reaches its maximum duration, the speaker verification processing engine is notified to determine that the verification fails. The control module also controls the speaker verification processing engine to perform speaker verification according to the received voiceprint and user voice input.
  • The speaker verification processing engine compares the acquired voiceprint with the received user voice input according to acoustic characteristics, generates a matching result score, and compares the score with the score threshold in the speaker verification request.
  • If the matching result score is greater than or equal to the score threshold, the verification is determined to be successful and the verification success prompt tone is played to the user through the speaker voice receiving module; otherwise, the verification is determined to have failed and the verification failure prompt tone is played to the user through the speaker voice receiving module. The speaker verification processing engine sends the control module the verification result information, which comprises the success or failure result and may further include other information such as the matching result score, the duration of the user's voice input, and the voiceprint identifier; the control module transmits this information to the media control interaction module. Each module involved in this step is a module in the MRFP.
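The decision rule of step 604 can be sketched as follows. The function and type names are hypothetical, and a score of `None` is an assumed way to represent the case in which the timer expires before any voice input arrives; the comparison itself (success when the score is greater than or equal to the threshold) is taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerificationResult:
    """Outcome reported by the verification engine (hypothetical type)."""

    success: bool
    score: Optional[int]  # matching result score, or None if no voice was received


def decide(score: Optional[int], threshold: int) -> VerificationResult:
    # No voice input before the wait timer expired: verification fails.
    if score is None:
        return VerificationResult(success=False, score=None)
    # Verification succeeds when the score is greater than or equal to the threshold.
    return VerificationResult(success=score >= threshold, score=score)
```

A score exactly equal to the threshold counts as a success, so `decide(70, 70)` succeeds while `decide(69, 70)` fails.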
  • Step 605 The MRFP returns the verification result information to the MRFC; the verification result information may be carried by the Not.req message.
  • The verification result information here includes at least information on whether the verification is successful, and may further include one or more of the following items of information:
  • The matching value, that is, the matching result score; the score can be between 0 and 100, where 100 is the best match and 0 is the worst;
  • The media control interaction module in the MRFP converts the received verification result information into a format supported by the H.248 protocol and sends it to the MRFC.
  • Step 606 After receiving the verification result information, the MRFC returns a Not.resp message to the MRFP to respond.
  • Specifically, this step includes: the media control interaction module in the MRFC receives the verification result information from the MRFP, performs protocol conversion to turn it into information that the control module in the MRFC can recognize, and sends it to the control module; after receiving the information, the control module returns a Not.resp message in H.248 protocol format to the MRFP through the media control interaction module as a response.
  • Step 607 The MRFC converts the received verification result information into a message supported by the Mr interface between the MRFC and the S-CSCF and sends the message to the S-CSCF, and the S-CSCF sends the verification result information to the AS.
  • the MRFC can also send the verification result information directly to the AS without passing through the S-CSCF.
  • The control module in the MRFC transmits the verification result information received from the media control interaction module to the speaker verification and service information interaction module, which returns the verification result information to the AS directly or through the S-CSCF.
  • As can be seen from the solution described in the above specific embodiments, the media resource control entity instructs the media resource processing entity to perform speaker verification processing, and the media resource processing entity receives the speaker's voice input according to that instruction, performs verification, and reports the verification result to the media resource control entity. This implements speaker verification without changing the existing network architecture and protocol structure of bearer and control separation, and reduces the cost of the network upgrade.
  • By providing a media resource control entity that includes a speaker verification and service information interaction module, a control module, and a media control interaction module, the embodiments of the present invention enable the media resource control entity in the existing network architecture to support speaker verification processing.
  • The media resource processing entity in the existing network architecture can support speaker verification processing.
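The message exchange of steps 601 to 607 in the IMS example can be summarized with a short Python sketch that lists each message as a (sender, receiver, message) tuple. The function name and the tuple representation are assumptions for illustration; the message names Mod.req, Mod.resp, Not.req, and Not.resp are those used in the text, and step 604 does not appear because it is processing inside the MRFP rather than a message.

```python
from typing import List, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, message)


def ims_verification_flow(via_s_cscf: bool = True) -> List[Message]:
    """List the messages of steps 601-607, in order (illustrative only)."""
    command = "speaker verification command"
    result = "verification result information"
    flow: List[Message] = []
    # Step 601: the command reaches the MRFC from the AS, directly or via the S-CSCF.
    if via_s_cscf:
        flow += [("AS", "S-CSCF", command), ("S-CSCF", "MRFC", command)]
    else:
        flow.append(("AS", "MRFC", command))
    flow.append(("MRFC", "MRFP", "Mod.req"))   # step 602: verification request
    flow.append(("MRFP", "MRFC", "Mod.resp"))  # step 603: response to the request
    flow.append(("MRFP", "MRFC", "Not.req"))   # step 605: verification result
    flow.append(("MRFC", "MRFP", "Not.resp"))  # step 606: response to the result
    # Step 607: the result goes back to the AS, directly or via the S-CSCF.
    if via_s_cscf:
        flow += [("MRFC", "S-CSCF", result), ("S-CSCF", "AS", result)]
    else:
        flow.append(("MRFC", "AS", result))
    return flow
```

Calling `ims_verification_flow(via_s_cscf=False)` gives the variant in which the AS exchanges the command and the result with the MRFC directly.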

Abstract

The method for authenticating the voice of the speaker includes the following: the MRCF instructs the MRPF to authenticate the voice of the speaker; the MRPF authenticates the received voice and sends the result to the MRCF. A system for authenticating the voice of the speaker, an MRCF, and an MRPF are also provided. The speaker's voice can thus be authenticated without changing the current network architecture.

Description

说话人验证方法和系统及媒体资源控制实体和处理实体 技术领域  Speaker verification method and system and media resource control entity and processing entity
本发明涉及通信技术领域, 特别是涉及一种说话人验证的方法、 一 种说话人验证的系统、 一种媒体资源控制实体和一种媒体资源处理实 体。 发明背景  The present invention relates to the field of communication technologies, and in particular, to a method for speaker verification, a system for speaker verification, a media resource control entity, and a media resource processing entity. Background of the invention
Speaker verification is a technique that analyzes a speaker's voice features, matches them against previously sampled voice samples, and verifies the speaker's identity based on the matching result. It identifies the person speaking by analyzing characteristics unique to a voice sample, such as the frequency of pronunciation. Speaker verification lets people use their speaking voice to control entry to restricted areas and access to identity-sensitive data, for example in telephone banking, database services, and voicemail. The voice sample is called a voiceprint; it is the voice sample used as the verification reference and is usually a direct recording of the user's voice.
At present, the SPEECHSC (Speech Services Control) working group of the Internet Engineering Task Force (IETF) standards organization has defined an application network architecture for Speaker Identification/Speaker Verification (SI/SV). In RFC 4313 the IETF describes the application scenarios and a proposed architecture for this technology, shown in Figure 1. In that architecture an Application Server (AS) controls the service, and a media processing unit in the lower layer supports media processing functions and media interaction with the client. An additional SI/SV server receives the speaker verification command from the application server, receives the voice sent by the media resource processing entity, performs speaker verification, and then reports the verification result to the AS. The SI/SV server exchanges control signaling for speaker identification/verification with the AS and the media processing unit through the specially defined SPEECHSC protocol.
In a system in which media control and bearer are separated, media resource services are usually implemented by a media resource control entity and a media resource processing entity interacting with the AS. To implement speaker verification in such a system according to the network architecture above, the SI/SV server would have to be deployed in the system and the existing protocol structure would have to be changed. This means substantial changes to the existing network and a high network upgrade cost.

Summary of the Invention

In view of this, embodiments of the present invention provide a speaker verification method and system, as well as a media resource control entity and a media resource processing entity, which can implement speaker verification without changing the network architecture of a system in which media control and bearer are separated.
The speaker verification method provided by an embodiment of the present invention includes:
the media resource control entity instructing the media resource processing entity to perform speaker verification processing; and
the media resource processing entity receiving the speaker's voice input, verifying it, and reporting the verification result to the media resource control entity.
The speaker verification system provided by an embodiment of the present invention includes:
a media resource control entity, configured to instruct the media resource processing entity to perform speaker verification processing and to receive the verification result from the media resource processing entity; and
a media resource processing entity, configured to receive the speaker's voice input, verify it, and report the verification result to the media resource control entity.
The media resource control entity provided by an embodiment of the present invention includes a speaker verification and service information interaction module, a control module, and a media control interaction module, wherein:
the speaker verification and service information interaction module is configured to receive a speaker verification command, pass the verification command to the control module, and return the verification result from the control module to the device that sent the speaker verification command;
the control module is configured to generate a speaker verification request according to the speaker verification command from the speaker verification and service information interaction module and pass the verification request to the media control interaction module, and to receive the speaker verification result from the media control interaction module and pass it to the speaker verification and service information interaction module; and
the media control interaction module is configured to receive the speaker verification request from the control module, convert it into a format supported by the media control protocol, and send it to the media resource processing entity, and to receive the verification result in the media control protocol format from the media resource processing entity, convert it into information that the control module can recognize, and send it to the control module.
The media resource processing entity provided by an embodiment of the present invention includes a media control interaction module and a speaker verification module, wherein:
the media control interaction module is configured to receive a speaker verification request in media control protocol form from the media resource control entity, convert it into information that the speaker verification module can recognize, and send it to the speaker verification module, and to receive the verification result from the speaker verification module, convert it into the media control protocol format, and send it to the media resource control entity; and
the speaker verification module is configured to obtain the corresponding voiceprint and receive the user's voice input according to the speaker verification request from the media control interaction module, verify the received voice input against the obtained voiceprint to determine the verification result, and send the verification result to the media control interaction module.
As can be seen from the above solution, in the embodiments of the present invention the media resource control entity instructs the media resource processing entity to perform speaker verification processing; the media resource processing entity receives the speaker's voice input as instructed by the media resource control entity, verifies it, and reports the verification result to the media resource control entity. Speaker verification is thus implemented without changing the existing network architecture and protocol structure in which bearer and control are separated, which reduces the cost of network upgrade.

Brief Description of the Drawings

Figure 1 is a schematic diagram of a network architecture for implementing speaker verification in the prior art;
Figure 2 is a schematic diagram of the composition of a system for implementing speaker verification in a specific embodiment of the present invention;
Figure 3 is a schematic diagram of the internal composition of the media resource control entity and the media resource processing entity in a specific embodiment of the present invention;
Figure 4 is a schematic diagram of the network composition of an example in a specific embodiment of the present invention;
Figure 5 is an overall flowchart of the speaker verification method in a specific embodiment of the present invention;
Figure 6 is a detailed flowchart of an example of the speaker verification method in a specific embodiment of the present invention.

Mode for Carrying Out the Invention

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
In the embodiments of the present invention, the media resource control entity instructs the media resource processing entity to perform speaker verification processing; the media resource processing entity receives the speaker's voice input, verifies it, and reports the verification result to the media resource control entity, thereby implementing speaker verification.
After receiving a speaker verification command that includes the speaker's voiceprint information, the media resource control entity may, according to that command, send the media resource processing entity a verification request that includes the voiceprint information. The media resource processing entity obtains the voiceprint corresponding to the voiceprint information, uses it to verify the user voice input received in accordance with the verification request, and returns the verification result to the media resource control entity.
The voiceprint information may include a voiceprint path and a voiceprint name, from which the media resource processing entity can obtain the corresponding voiceprint. The voiceprint path may be a local path or the path of an accessible network server; the voiceprint name may be a character string. Under a given voiceprint path, a voiceprint name corresponds to exactly one voiceprint file.
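The path-plus-name lookup described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `VoiceprintInfo` type, its method names, and the rule that a network server path is one with a URL scheme and host are all assumptions made for the example.

```python
from dataclasses import dataclass
from urllib.parse import urlparse
import posixpath


@dataclass(frozen=True)
class VoiceprintInfo:
    """Voiceprint information carried in a verification command (hypothetical type)."""
    path: str  # local directory or network server location
    name: str  # string that must be unique under `path`

    def is_remote(self) -> bool:
        # Assumption: a network server path carries a URL scheme and a host.
        parsed = urlparse(self.path)
        return bool(parsed.scheme and parsed.netloc)

    def location(self) -> str:
        # Path plus name identifies exactly one voiceprint file.
        return posixpath.join(self.path, self.name)
```

Keeping path and name as separate fields mirrors the text: uniqueness is only required of the name within one path, so the same name may exist under different paths.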
The media resource control entity may be a Multimedia Resource Function Controller (MRFC) or a Media Gateway Controller (MGC). Correspondingly, the media resource processing entity is a Multimedia Resource Function Processor (MRFP) or a Media Gateway (MG).
The present invention is described in detail below through specific embodiments.
A specific embodiment of the system for implementing speaker verification according to the present invention is shown in Figure 2. The system includes a media resource control entity 21 and a media resource processing entity 22.
The media resource control entity 21 is configured to instruct the media resource processing entity 22 to perform speaker verification processing. Specifically, the media resource control entity 21 may receive a speaker verification command from the AS, or from another device that sends media resource service requests; the command includes the speaker's voiceprint information. According to the command, the media resource control entity 21 sends the media resource processing entity 22 a speaker verification request that includes that voiceprint information.
The media resource processing entity 22 is configured to receive the speaker's voice input as instructed by the media resource control entity 21, perform voice verification processing, and report the verification result to the media resource control entity 21. Specifically, the media resource processing entity 22 receives the speaker verification request from the media resource control entity 21 and obtains the corresponding voiceprint according to the speaker voiceprint information in the request.
As shown in Figure 3, the media resource control entity 21 includes a speaker verification and service information interaction module 211, a control module 212, and a media control interaction module 213.
The speaker verification and service information interaction module 211 is configured to receive service information, that is, a speaker verification command including the speaker's voiceprint information, from the application server or from another device that sends media resource service requests, such as a Serving-Call Session Control Function (S-CSCF); to pass the verification command to the control module 212; and to return the verification result information from the control module 212 to the application server or the other device. Module 211 also carries the exchange of other service information between the control module 212 and the application server or the other devices.
The control module 212 is configured to control the call flow interaction and state machine during service processing. In this embodiment, it generates a speaker verification request according to the speaker verification command from module 211 and passes the request to the media control interaction module 213; it also receives the speaker verification result information from module 213 and passes it to module 211.
The media control interaction module 213 is configured to receive information from the control module 212, convert it into a media control protocol message, for example an H.248 protocol message, and send it to the media resource processing entity 22; and to receive information from the media resource processing entity 22, convert it into information that the control module 212 can recognize, and pass it to the control module 212. In this embodiment, module 213 receives the speaker verification request from the control module 212, converts it into a format supported by the media control protocol, such as H.248, and sends it to the media control interaction module 221 in the media resource processing entity 22; it also receives verification result information in the media control protocol format from the media control interaction module 221, converts it into information that the control module 212 can recognize, and sends it to the control module 212. The following description takes H.248 as the media control protocol.
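The division of labour among the three modules of the control entity can be sketched as a chain of small objects. This is only an illustration under stated assumptions: the class names, the dictionary-based request, and the placeholder `"media-control"` envelope are invented for the example and do not represent H.248 encoding.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VerificationCommand:
    """Command from the AS or S-CSCF (hypothetical shape)."""
    voiceprint_path: str
    voiceprint_name: str


class MediaControlInteraction:
    """Stands in for module 213: converts to and from the protocol format."""

    def __init__(self, send: Callable[[dict], dict]):
        self._send = send  # transport towards the processing entity

    def forward(self, request: dict) -> dict:
        message = {"protocol": "media-control", "body": request}  # placeholder encoding
        reply = self._send(message)
        return reply["body"]  # decoded back into a plain result


class ControlModule:
    """Stands in for module 212: turns a command into a request."""

    def __init__(self, media_control: MediaControlInteraction):
        self._media_control = media_control

    def handle(self, command: VerificationCommand) -> dict:
        request = {
            "action": "verify-speaker",
            "voiceprint_path": command.voiceprint_path,
            "voiceprint_name": command.voiceprint_name,
        }
        return self._media_control.forward(request)


class ServiceInteraction:
    """Stands in for module 211: entry point for the AS or S-CSCF."""

    def __init__(self, control: ControlModule):
        self._control = control

    def receive(self, command: VerificationCommand) -> dict:
        return self._control.handle(command)  # result goes back to the sender
```

Only the media control interaction object knows about the wire format, which is the property the text relies on: the control module and the service-facing module stay unchanged whichever media control protocol is used.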
As shown in Figure 3, the media resource processing entity 22 includes a speaker verification module 220 and a media control interaction module 221.
The media control interaction module 221 is configured to receive H.248-format information from the media resource control entity, convert it into information that the speaker verification module 220 can recognize, and pass it to module 220; and to convert information from module 220 into the H.248 format and send it to the media resource control entity. In this embodiment, module 221 receives the H.248-format speaker verification request from the media control interaction module 213 of the media resource control entity 21, converts it into information that module 220 can recognize, and passes it to module 220; it also receives the verification result information from module 220, converts it into a format supported by H.248, and sends it to the media control interaction module 213 in the media resource control entity 21.
The speaker verification module 220 is configured to obtain the corresponding voiceprint and receive the user's voice input according to the speaker verification request information received from the media control interaction module, verify the received voice input against the obtained voiceprint to determine the verification result, and send the verification result to the media control interaction module 221.
Specifically, the speaker verification module 220 may include a control module 222, a speaker verification processing engine 223, a voiceprint acquisition module 224, and a speaker voice receiving module 225.
The control module 222 is configured to control the call flow interaction and state machine during service processing. In this embodiment, according to the speaker verification request from the media control interaction module 221, it controls the voiceprint acquisition module 224 to obtain the corresponding voiceprint and pass it to the speaker verification processing engine 223; controls the speaker voice receiving module 225 to receive the user's voice input and pass it to the engine 223; controls the engine 223 to perform speaker verification on the voiceprint and voice input it has received; and receives the verification result information returned by the engine 223 and passes it to the media control interaction module 221.
The speaker verification processing engine 223 is configured to receive the voiceprint from the voiceprint acquisition module 224 and the user's voice input, that is, the speaker's voice data, from the speaker voice receiving module 225; to compare the obtained voiceprint with the received voice input on the basis of acoustic features, thereby generating verification result information; and to send that information to the control module 222.
The voiceprint acquisition module 224 is configured to obtain the corresponding voiceprint under the control of the control module 222 and pass it to the speaker verification processing engine 223. Each speaker verification request described above includes the voiceprint information from the corresponding speaker verification command. The control module 222 therefore sends module 224 a voiceprint acquisition command that includes the voiceprint information; module 224 uses the voiceprint path and voiceprint name in that information to fetch the corresponding voiceprint from the corresponding address and then passes it to the engine 223.
The speaker voice receiving module 225 is configured to receive the user's voice input on command from the control module 222 and pass the received voice to the speaker verification processing engine 223.
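The way control module 222 coordinates modules 223, 224, and 225 can be sketched with injected callables. The class name, the callable signatures, and the use of `bytes` for voiceprint and voice data are assumptions for illustration; the real comparison on acoustic features is outside the scope of this sketch.

```python
from typing import Callable


class SpeakerVerificationControl:
    """Plays the role of control module 222 inside speaker verification module 220."""

    def __init__(
        self,
        acquire_voiceprint: Callable[[str, str], bytes],  # role of module 224
        receive_voice: Callable[[], bytes],               # role of module 225
        compare: Callable[[bytes, bytes], dict],          # role of engine 223
    ):
        self._acquire_voiceprint = acquire_voiceprint
        self._receive_voice = receive_voice
        self._compare = compare

    def handle(self, voiceprint_path: str, voiceprint_name: str) -> dict:
        # 1. Have the acquisition module fetch the reference voiceprint.
        voiceprint = self._acquire_voiceprint(voiceprint_path, voiceprint_name)
        # 2. Have the receiving module capture the user's voice input.
        voice = self._receive_voice()
        # 3. Hand both to the engine and return its result information.
        return self._compare(voiceprint, voice)
```

Passing the three collaborators in as callables keeps the orchestration independent of how voiceprints are stored or how audio is captured.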
The media resource control entity in this embodiment may be an MGC, in which case the corresponding media resource processing entity is an MG; it may also be an MRFC, in which case the corresponding media resource processing entity is an MRFP.
The network architecture on which this embodiment is based is illustrated below with an example. Figure 4 shows the network architecture when this embodiment is applied in an IP Multimedia Subsystem (IMS) network, which includes an AS, an S-CSCF, an MRFC, and an MRFP. The MRFC receives the speaker verification command from the AS through the S-CSCF, instructs the MRFP according to that command to obtain the corresponding voiceprint and the user's voice input and to perform verification, and returns the verification result information to the S-CSCF. The IMS network also contains other entities, which are not shown here because they have little bearing on the embodiments of the present invention.
The overall flow of a specific embodiment of the speaker verification method of the present invention is shown in Figure 5 and mainly includes the following steps:
Step 501: The media resource control entity instructs the media resource processing entity to perform speaker verification processing.
Step 502: The media resource processing entity receives the speaker's voice input as instructed by the media resource control entity, verifies it, and reports the verification result to the media resource control entity.
The media resource control entity and the media resource processing entity may be an MGC and an MG, or an MRFC and an MRFP. This embodiment is described in detail below using its application in an IMS network architecture, where the media resource control entity and the media resource processing entity are an MRFC and an MRFP, as an example. As shown in Figure 6, when applied to an IMS network architecture, this embodiment includes the following steps:
Step 601: After the bearer channel between the MRFC and the MRFP is established, the MRFC receives a speaker verification command from the S-CSCF. The command includes the speaker's voiceprint information, which here is the path and name of the user's voiceprint. The speaker verification command that the S-CSCF sends to the MRFC usually originates from the AS; alternatively, the AS may send the speaker verification command to the MRFC directly.
The MRFC may include a speaker verification and service information interaction module, a control module, and a media control interaction module. In that case, this step specifically includes: the speaker verification and service information interaction module receives the service information from the AS, directly or through the S-CSCF, that is, the speaker verification command including the speaker's voiceprint information, and passes the command to the control module.
Step 602: The MRFC converts the received speaker verification command into a speaker verification request, converts the request into the H.248 message format, and sends it to the MRFP over the Mp interface between the MRFC and the MRFP.
Specifically, in this step the control module in the MRFC may generate the speaker verification request according to the received speaker verification command and pass it to the media control interaction module, which converts it into a format supported by the H.248 protocol and sends it to the MRFP.
The speaker verification request in this step may use the Mod.req message of the H.248 protocol, which includes the information of the service session endpoint T1, with the signal value set to speaker verification. In addition, the request may include one or more of the following items:
1) Voiceprint identifier: the voiceprint information, comprising the path and name of the voiceprint. The storage path may be a local server path or a network server path. The voiceprint name is a character string and must be unique under the specified path.
2) Score threshold: a value in the range 0 to 100, used by the MRFP to decide whether speaker verification succeeded. Verification is considered successful when the matching result score is greater than or equal to this value.
3) Initial prompt tone: the prompt played before user verification begins; the user provides voice input in response to it.
4) Verification success prompt tone: played when the matching result score is greater than or equal to the score threshold.
5) Verification failure prompt tone: played when the matching result score is less than the score threshold.
6) Maximum number of prompts on no input: the maximum number of times the initial prompt tone is played to prompt the user for voice input when no user voice has been received.
7) Maximum duration of the timer that waits for the speaker's voice input: the maximum time to wait for voice input; a timeout is handled as a verification failure.
8) Voice input end key: the user can end voice input by pressing a key; the specific key can be announced to the user in the initial prompt tone.
Of the information included in the speaker verification request, only the voiceprint identifier comes from the speaker verification command; all other items are generated from the corresponding information preset in the MRFC.
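The split between command-supplied and preset items can be sketched as below. Everything concrete here is an assumption for illustration: the field names, the default values (threshold 60, three prompts, ten seconds, `#` as the end key, the prompt file names), and the plain dictionary used in place of an H.248 Mod.req message. Only the 0 to 100 range of the score threshold comes from the text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PresetParameters:
    """Values configured in advance on the MRFC (all defaults are illustrative)."""
    score_threshold: int = 60
    initial_prompt: str = "initial.wav"
    success_prompt: str = "success.wav"
    failure_prompt: str = "failure.wav"
    max_no_input_prompts: int = 3
    max_wait_seconds: int = 10
    end_key: str = "#"

    def __post_init__(self) -> None:
        # The text limits the score threshold to the range 0 to 100.
        if not 0 <= self.score_threshold <= 100:
            raise ValueError("score_threshold must be between 0 and 100")


def build_request(endpoint: str, voiceprint_path: str, voiceprint_name: str,
                  preset: PresetParameters) -> dict:
    """Combine the voiceprint identifier from the command with the preset items."""
    request = {
        "endpoint": endpoint,
        "signal": "speaker-verification",
        "voiceprint_path": voiceprint_path,
        "voiceprint_name": voiceprint_name,
    }
    request.update(asdict(preset))
    return request
```

For example, `build_request("T1", "/var/voiceprints", "alice.vp", PresetParameters())` yields a request in which only the two voiceprint fields vary per command, while the remaining items come from configuration.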
Step 603: After receiving the speaker verification request, the MRFP returns a Mod.resp message to the MRFC, which includes the information of endpoint T1.
The MRFP may include a media control interaction module, a control module, a speaker verification processing engine, a voiceprint acquisition module, and a speaker voice receiving module. In that case, in this step the media control interaction module in the MRFP receives the H.248-format speaker verification request, converts it into information that the control module in the MRFP can recognize, and passes it to that control module; the control module of the MRFP generates a Mod.resp message in response to the received request and sends it to the MRFC through the media control interaction module of the MRFP.
Step 604: The MRFP performs the verification processing. Specifically, it obtains the corresponding voiceprint according to the voiceprint path and voiceprint name in the voiceprint identifier carried in the speaker verification request. If no voiceprint is obtained, then in step 605 it returns verification-failure information or error information to the MRFC in a Not.req message. If the voiceprint is obtained, the MRFP plays the initial prompt tone to the user and waits for user input. After receiving the user's voice input, the MRFP compares the input speech with the obtained voiceprint sample to produce a verification result score. If the score is greater than or equal to the score threshold, the MRFP plays the verification-success prompt tone to the user and, in step 605, returns verification-success information to the MRFC in a Not.req message. If the score is less than the score threshold, the MRFP plays the verification-failure prompt tone to the user and, in step 605, returns verification-failure information to the MRFC in a Not.req message.
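The decision logic of step 604 can be sketched as follows. This is an illustrative Python sketch only: the function name, the returned dictionary keys, and the tone labels are hypothetical, and a missing voiceprint is reported here as an error, which is one of the two options the text allows.

```python
from typing import Optional


def decide(voiceprint: Optional[bytes], score: Optional[float], threshold: float) -> dict:
    """Map the outcome of step 604 to a result and the prompt tone to play."""
    if voiceprint is None:
        # No voiceprint found for the given path and name: no comparison is possible.
        return {"outcome": "error", "prompt": None}
    if score is not None and score >= threshold:
        # A score equal to the threshold counts as success.
        return {"outcome": "success", "prompt": "success_tone"}
    return {"outcome": "failure", "prompt": "failure_tone"}
```

Note that the comparison is inclusive, matching the "greater than or equal to" wording of the step.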
In this step, the control module, according to the speaker verification request from the media control interaction module, notifies the voiceprint acquisition module to obtain the corresponding voiceprint and pass it to the speaker verification processing engine, and controls the speaker voice receiving module to play the initial prompt tone to the user and to receive the user's voice input. If the speaker voice receiving module receives the user's voice input before the timer waiting for the speaker's voice input reaches its maximum duration, then, once the voice-input end key is received or the preset input duration elapses, it passes the voice input to the speaker verification processing engine. If no voice input has been received when the timer reaches its maximum duration, it notifies the speaker verification processing engine, and the verification is determined to have failed. The control module also controls the speaker verification processing engine to perform speaker verification on the voiceprint and the user voice input it has received. The speaker verification processing engine compares the obtained voiceprint with the received user voice input on the basis of acoustic features, generates a match result score, and compares this score with the score threshold in the speaker verification request. If the match result score is greater than or equal to the score threshold, the verification is determined to have succeeded, and the verification-success prompt tone is played to the user through the speaker voice receiving module; otherwise, the verification is determined to have failed, and the verification-failure prompt tone is played to the user through the speaker voice receiving module. The speaker verification processing engine sends the verification result (success or failure), optionally together with further verification result information such as the match result score, the duration of the user's input speech, and the voiceprint identifier, to the control module, which passes this information to the media control interaction module. The modules involved in this step are all modules in the MRFP.
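The input-collection behaviour described in this step can be sketched in Python under stated assumptions. The text does not say how the maximum number of no-input prompts interacts with the input timer; this sketch assumes that each prompt starts a fresh timer and that verification fails once the allowed prompts are exhausted. Speech is modelled as a plain string, and the function and parameter names are hypothetical.

```python
from typing import Iterable, Optional


def collect_voice_input(
    attempts: Iterable[Optional[str]],
    max_no_input_prompts: int,
    end_key: str = "#",
) -> Optional[str]:
    """Return the captured input, or None if every allowed prompt timed out.

    Each item in `attempts` models what arrived before the input timer
    expired for one prompt: None for silence, otherwise the received input.
    """
    prompts_played = 0
    for received in attempts:
        if prompts_played >= max_no_input_prompts:
            break  # no prompts left: treated as verification failure
        prompts_played += 1
        if received is None:
            continue  # timer expired with no input, prompt again
        # Input ends at the end key if the user pressed it.
        return received.split(end_key, 1)[0]
    return None
```

A return value of None corresponds to the case in which the speaker verification processing engine is notified and the verification is determined to have failed.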
Step 605: The MRFP returns the verification result information to the MRFC; specifically, this information may be carried in a Not.req message. The verification result information includes at least whether the verification succeeded, and may additionally include one or more of the following:
1) Match value, i.e. the match result score; the score may range from 0 to 100, where 100 is the best match and 0 the worst;
2) Duration of the input speech;
3) Voiceprint identifier.
The media control interaction module in the MRFP converts the verification result information it receives into a format supported by the H.248 protocol and sends it to the MRFC.
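The verification result information of step 605 can be modelled with a short Python sketch: one mandatory success flag plus three optional items. The class, field names, and `to_fields` helper are hypothetical, and the sketch does not attempt to reproduce the actual H.248 encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerificationResult:
    success: bool                             # mandatory
    match_score: Optional[int] = None         # 0 (worst) to 100 (best)
    input_duration_s: Optional[float] = None  # duration of the input speech
    voiceprint_id: Optional[str] = None

    def __post_init__(self) -> None:
        if self.match_score is not None and not 0 <= self.match_score <= 100:
            raise ValueError("match_score must be between 0 and 100")

    def to_fields(self) -> dict:
        """Return only the items that are present, ready for protocol encoding."""
        fields = {"success": self.success}
        for name in ("match_score", "input_duration_s", "voiceprint_id"):
            value = getattr(self, name)
            if value is not None:
                fields[name] = value
        return fields
```

Omitting absent items keeps the mandatory flag distinct from the optional ones, as in the list above.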
Step 606: After receiving the verification result information, the MRFC responds by returning a Not.resp message to the MRFP.
Specifically, in this step the media control interaction module in the MRFC receives the verification result information from the MRFP, converts it by protocol conversion into information that the control module in the MRFC can recognize, and sends it to the control module. After receiving this information, the control module responds by returning a Not.resp message in H.248 protocol format to the MRFP through the media control interaction module.
Step 607: The MRFC converts the received verification result information into a message supported by the Mr interface between itself and the S-CSCF and sends it to the S-CSCF, which then sends the verification result information to the AS. Alternatively, the MRFC may send the verification result information directly to the AS without passing through the S-CSCF.
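The reporting path of steps 605 to 607 can be summarised as a list of hops. The following Python sketch is illustrative only: the function name and the tuple layout are hypothetical, and the message labels on the MRFC-to-AS legs are placeholders rather than names from the specification.

```python
def result_route(via_scscf: bool = True) -> list:
    """Return (sender, receiver, message) hops for reporting a verification result."""
    hops = [
        ("MRFP", "MRFC", "Not.req"),   # step 605: result carried in Not.req
        ("MRFC", "MRFP", "Not.resp"),  # step 606: acknowledgement
    ]
    if via_scscf:
        # Step 607, default path: over the Mr interface, then on to the AS.
        hops.append(("MRFC", "S-CSCF", "verification result"))
        hops.append(("S-CSCF", "AS", "verification result"))
    else:
        # Step 607, alternative path: directly to the AS.
        hops.append(("MRFC", "AS", "verification result"))
    return hops
```

Calling `result_route(via_scscf=False)` yields the shorter path in which the S-CSCF is bypassed.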
The control module in the MRFC passes the verification result information received from the media control interaction module to the speaker verification and service information interaction module, which returns the verification result information to the AS, either directly or through the S-CSCF.
As the solutions described in the above embodiments show, in the embodiments of the present invention the media resource control entity instructs the media resource processing entity to perform speaker verification processing, and the media resource processing entity, as instructed by the media resource control entity, receives the speaker's voice input, verifies it, and reports the verification result to the media resource control entity. Speaker verification is thus implemented without changing the existing network architecture and protocol structure in which bearer and control are separated, which reduces the cost of network upgrades.
In addition, by providing a media resource control entity that includes a speaker verification and service information interaction module, a control module, and a media control interaction module, the embodiments of the present invention enable the media resource control entity in the existing network architecture to support speaker verification processing.
Likewise, by providing a media resource processing entity that includes a media control interaction module and a speaker verification module, the embodiments of the present invention enable the media resource processing entity in the existing network architecture to support speaker verification processing.
The above describes specific embodiments of the present invention. In a specific implementation, the method of the present invention may be suitably modified to meet the needs of a particular situation. It is therefore to be understood that the specific embodiments described here are merely exemplary and are not intended to limit the scope of protection of the present invention.

Claims

1. A speaker verification method, characterized in that the method comprises:
a media resource control entity instructing a media resource processing entity to perform speaker verification processing; and
the media resource processing entity receiving the speaker's voice input, verifying it, and reporting the verification result to the media resource control entity.
2. The method according to claim 1, characterized in that the media resource control entity instructing the media resource processing entity to perform speaker verification processing comprises:
the media resource control entity, according to a received speaker verification command that includes the speaker's voiceprint information, sending to the media resource processing entity a verification request that includes said voiceprint information.
3. The method according to claim 2, characterized in that the media resource processing entity receiving the speaker's voice input, verifying it, and reporting the verification result to the media resource control entity comprises:
the media resource processing entity obtaining, according to the verification request, the voiceprint corresponding to the voiceprint information, using that voiceprint to verify the user voice input received according to the verification request, and returning the verification result to the media resource control entity.
4. The method according to claim 2 or 3, characterized in that the voiceprint information includes a voiceprint path and a voiceprint name.
5. The method according to claim 3, characterized in that the verification request further includes one of the following parameters: a threshold used by the media resource processing entity to determine whether speaker verification succeeds, an initial prompt tone, a verification-success prompt tone, a verification-failure prompt tone, a maximum number of no-input prompts, a maximum duration of the timer waiting for the speaker's voice input, or a voice-input end key; or further includes any combination of the above parameters;
correspondingly, the media resource processing entity determines whether speaker verification succeeds according to the threshold and the result of matching the user's voice input against the voiceprint; or, according to the initial prompt tone parameter, prompts the user before verification to input the verification speech; or, according to the verification-success prompt tone parameter, plays the verification-success prompt after verification passes; or, according to the verification-failure prompt tone, plays the verification-failure prompt after verification fails; or, according to the maximum number of no-input prompts, plays the prompt repeatedly after detecting that the user has provided no voice input; or, according to the maximum duration of the timer waiting for the speaker's voice input, waits for the user's voice input and determines that verification has failed if no voice input is received within that maximum duration; or, upon receiving the voice-input end key entered by the user, determines that the user's voice input has ended; or performs any combination of the above steps.
6. The method according to claim 3, characterized in that the verification result includes: whether the verification succeeded.
7. The method according to claim 6, characterized in that the verification result further includes: a match value, the duration of the input speech, or the voiceprint information, or any combination of the above.
8. The method according to claim 3, characterized in that if the media resource processing entity does not obtain, according to the verification request, the voiceprint corresponding to the voiceprint information, the media resource processing entity reports error information to the media resource control entity and then ends the procedure.
9. A speaker verification system, characterized in that the system comprises:
a media resource control entity, configured to instruct a media resource processing entity to perform speaker verification processing and to receive a verification result from the media resource processing entity; and
the media resource processing entity, configured to receive the speaker's voice input, verify it, and report the verification result to the media resource control entity.
10. The system according to claim 9, characterized in that the system further comprises: a service server, configured to send a speaker verification command to the media resource control entity and to receive the verification result returned by the media resource control entity;
wherein the media resource control entity receives the speaker verification command, generates a corresponding speaker verification request according to the command, sends the request to the media resource processing entity, and further reports the verification result received from the media resource processing entity to the service server.
11. The system according to claim 10, characterized in that the system further comprises: a service call session control function entity, connected between the application server and the media resource control entity, configured to receive the speaker verification command from the application server and send it to the media resource control entity, and to receive the verification result from the media resource control entity and send it to the application server.
12. The system according to any one of claims 9 to 11, characterized in that the media resource control entity is a media resource controller and the media resource processing entity is a media resource processor; or the media resource control entity is a media gateway controller and the media resource processing entity is a media gateway.
13. A media resource control entity, characterized in that the media resource control entity comprises:
a speaker verification and service information interaction module, configured to receive a speaker verification command and pass it to a control module, and to return the verification result from the control module to the device that sent the speaker verification command;
the control module, configured to generate a speaker verification request according to the speaker verification command from the speaker verification and service information interaction module and pass the request to a media control interaction module, and to receive the speaker verification result from the media control interaction module and pass it to the speaker verification and service information interaction module; and
the media control interaction module, configured to receive the speaker verification request from the control module, convert it into a format supported by the media control protocol, and send it to the media resource processing entity, and to receive from the media resource processing entity the verification result in a format supported by the media control protocol, convert it by protocol conversion into information that the control module can recognize, and send it to the control module.
14. The media resource control entity according to claim 13, characterized in that the media resource control entity is a media resource controller or a media gateway controller.
15. A media resource processing entity, characterized in that the media resource processing entity comprises:
a media control interaction module, configured to receive from the media resource control entity a speaker verification request in media control protocol form, convert it into information that a speaker verification module can recognize, and send it to the speaker verification module, and to receive the verification result from the speaker verification module, convert it into a media control protocol message, and send it to the media resource control entity; and
the speaker verification module, configured to obtain the corresponding voiceprint and receive the user's voice input according to the speaker verification request from the media control interaction module, verify the received user speech against the obtained voiceprint to determine the verification result, and send the verification result to the media control interaction module.
16. The media resource processing entity according to claim 15, characterized in that the speaker verification module comprises:
a control module, configured to notify a voiceprint acquisition module, according to the speaker verification request from the media control interaction module, to obtain the corresponding voiceprint and pass it to a speaker verification processing engine; to control a speaker voice receiving module to receive the user's voice input and pass it to the speaker verification processing engine; to control the speaker verification processing engine to perform speaker verification on the voiceprint and user voice input it has received; and to receive the verification result returned by the speaker verification processing engine and pass it to the media control interaction module;
the speaker verification processing engine, configured to receive the voiceprint from the voiceprint acquisition module and the user voice input from the speaker voice receiving module, compare the obtained voiceprint with the received user voice input on the basis of acoustic features, generate a verification result, and send the verification result to the control module;
the voiceprint acquisition module, configured to obtain the corresponding voiceprint under the control of the control module and pass the obtained voiceprint to the speaker verification processing engine; and
the speaker voice receiving module, configured to receive the user's voice input according to a command from the control module and pass the received speech to the speaker verification processing engine.
17. The media resource processing entity according to claim 15 or 16, characterized in that the media resource processing entity is a media resource processor or a media gateway.
PCT/CN2007/070805 2006-11-20 2007-09-27 The method and system for authenticating the voice of the speaker, the mrcf and mrpf WO2008061463A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNA2006101403081A CN101192925A (en) 2006-11-20 2006-11-20 Speaker validation method and system and media resource control entity and processing entity
CN200610140308.1 2006-11-20

Publications (1)

Publication Number Publication Date
WO2008061463A1 true WO2008061463A1 (en) 2008-05-29

Family

ID=39429395

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2007/070805 WO2008061463A1 (en) 2006-11-20 2007-09-27 The method and system for authenticating the voice of the speaker, the mrcf and mrpf

Country Status (2)

Country Link
CN (1) CN101192925A (en)
WO (1) WO2008061463A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951930A (en) * 2015-04-27 2015-09-30 上海交通大学 Electronic cipher ticket method and system based on bio-information identity verification

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923853B (en) * 2009-06-12 2013-01-23 华为技术有限公司 Speaker recognition method, equipment and system
CN101923856B (en) * 2009-06-12 2012-06-06 华为技术有限公司 Audio identification training processing and controlling method and device
CN102111278A (en) * 2010-12-28 2011-06-29 华为技术有限公司 Conference media quality monitoring method, device and system
CN105225664B (en) * 2015-09-24 2019-12-06 百度在线网络技术(北京)有限公司 Information verification method and device and sound sample generation method and device
CN105719650A (en) * 2016-01-30 2016-06-29 深圳市尔木科技有限责任公司 Speech recognition method and system
WO2019065733A1 (en) * 2017-09-28 2019-04-04 京セラ株式会社 Voice command system and voice command method
CN112885360A (en) * 2019-11-29 2021-06-01 安徽智恒信科技股份有限公司 Cabinet door opening control method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4993068A (en) * 1989-11-27 1991-02-12 Motorola, Inc. Unforgeable personal identification system
US20040255168A1 (en) * 2003-06-16 2004-12-16 Fujitsu Limited Biometric authentication system
CN1808567A (en) * 2006-01-26 2006-07-26 覃文华 Voice-print authentication device and method of authenticating people presence
CN1815484A (en) * 2006-03-06 2006-08-09 覃文华 Digitalized authentication system and its method
US20060229879A1 (en) * 2005-04-06 2006-10-12 Top Digital Co., Ltd. Voiceprint identification system for e-commerce

Also Published As

Publication number Publication date
CN101192925A (en) 2008-06-04

Similar Documents

Publication Publication Date Title
WO2008061463A1 (en) The method and system for authenticating the voice of the speaker, the mrcf and mrpf
US8102838B2 (en) Mechanism for authentication of caller and callee using otoacoustic emissions
US9455983B2 (en) Digital signatures for communications using text-independent speaker verification
JP4272429B2 (en) System and method for providing authentication and identification services in an extended media gateway
US20120084087A1 (en) Method, device, and system for speaker recognition
KR101126775B1 (en) Centralized biometric authentication
US7920680B2 (en) VoIP caller authentication by voice signature continuity
WO2018184433A1 (en) Internet-of-things authentication system and internet-of-things authentication method
US7992196B2 (en) Apparatus and method for performing hosted and secure identity authentication using biometric voice verification over a digital network medium
WO2017036365A1 (en) Voice communication processing method, electronic device, system and storage medium
JP2010109618A (en) Authentication device, authentication method, and program
JP4853646B2 (en) Time authentication system
US11924370B2 (en) Method for controlling a real-time conversation and real-time communication and collaboration platform
CN107172620A (en) A kind of wireless local area network (WLAN) verification method and apparatus
US20030123619A1 (en) Voice authenticated terminal registration
JP2008234398A (en) Voice authentication system and authentication station server
WO2006076347A2 (en) System and method for recording network based voice and video media
WO2015196823A1 (en) Method and device for achieving cyclic playing from text to voice service, and server
WO2009071021A1 (en) Method, system, mscg and server for limiting voip terminal roaming
US8873544B2 (en) Digital telecommunications system, program product for, and method of managing such a system
WO2014187217A1 (en) Voice message implementation method and voice message server
CN111246021B (en) Method for enabling remote access to a personal voice assistant
WO2015172437A1 (en) Multimedia conference access notification method, device and server
JPH10243105A (en) Access authentication system for voice information service
CN115766511A (en) Scheduling communication system performance test method and system based on BICC signaling

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07816996

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07816996

Country of ref document: EP

Kind code of ref document: A1