WO2008061463A1 - The method and system for authenticating the voice of the speaker, the MRCF and MRPF

Info

Publication number: WO2008061463A1
Application number: PCT/CN2007/070805
Authority: WO
Inventors: Zhiyong Xu
Original assignee: Huawei Technologies Co., Ltd.
Priority date: 2006-11-20
Filing date: 2007-09-27
Publication date: 2008-05-29
Also published as: CN101192925A

Abstract

In the method for authenticating a speaker's voice, the MRCF instructs the MRPF to authenticate the speaker's voice; the MRPF authenticates the received voice and sends the result to the MRCF. A system for authenticating a speaker's voice, an MRCF and an MRPF are also provided. Speaker voice authentication can thereby be achieved without changing the current network architecture.

Description

Speaker verification method and system and media resource control entity and processing entity

The present invention relates to the field of communication technologies, and in particular to a method for speaker verification, a system for speaker verification, a media resource control entity, and a media resource processing entity.

Background of the invention

Speaker verification is a technique that analyzes a speaker's speech characteristics, matches those features against previously sampled speech, and verifies the speaker's identity based on the matching result. It relies on characteristics unique to the speech sample, such as the frequency of the pronunciation. Speaker verification allows people to control access to restricted services, such as telephone banking, database services and voicemail, by voice. The reference speech sample used as the verification standard is called a voiceprint; it is usually a directly recorded sample of the user's voice.

Currently, the Speech Services Control (SPEECHSC) working group of the Internet Engineering Task Force (IETF) defines an application network architecture for Speaker Identification/Speaker Verification (SI/SV). The IETF describes the application scenarios and the recommended architecture of the technology in RFC 4313. As shown in FIG. 1, an Application Server (AS) controls the service, and a media processing unit at the lower layer supports media processing functions and media interaction with clients. An additional SI/SV server receives the speaker verification command from the application server, receives the voice sent by the media resource processing entity, performs speaker verification, and then reports the verification result to the AS. The control signaling for speaker identification/verification between the SI/SV server and the AS, and between the SI/SV server and the media processing unit, is carried by the specially defined SPEECHSC protocol.

In a system in which media control and bearer are separated, a media resource control entity and a media resource processing entity interact with an AS to implement media resource services. If speaker verification is to be implemented in such a system according to the network architecture described above, the SI/SV server has to be added to the system and the existing protocol structure has to be changed. The changes to the existing network are therefore large, and the cost of the network upgrade is high.

Summary of the invention

In view of this, the embodiments of the present invention provide a method and a system for speaker verification, a media resource control entity, and a media resource processing entity, which can implement speaker verification without changing the network architecture of a system in which media control and bearer are separated.

The speaker verification method provided by the embodiment of the invention includes:

The media resource control entity instructs the media resource processing entity to perform the speaker verification process; the media resource processing entity receives the voice input of the speaker and performs verification, and reports the verification result to the media resource control entity.

The speaker verification system provided by the embodiment of the invention includes:

a media resource control entity, configured to instruct the media resource processing entity to perform speaker verification processing, and receive a verification result from the media resource processing entity;

The media resource processing entity is configured to receive and verify the voice input of the speaker, and report the verification result to the media resource control entity.
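The division of labour between the two entities can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the class names, the `receive_voice` and `verify` callables, and the string result are hypothetical stand-ins for the bearer input, the verification engine, and the reported result.

```python
from typing import Callable


class MediaResourceProcessingEntity:
    """Receives the speaker's voice input, verifies it, and reports the result."""

    def __init__(self, receive_voice: Callable[[], bytes], verify: Callable[[bytes], bool]):
        # Both callables are hypothetical stand-ins (bearer input and engine).
        self._receive_voice = receive_voice
        self._verify = verify

    def perform_speaker_verification(self) -> str:
        voice_input = self._receive_voice()
        return "success" if self._verify(voice_input) else "failure"


class MediaResourceControlEntity:
    """Instructs the processing entity and receives the verification result."""

    def __init__(self, processing_entity: MediaResourceProcessingEntity):
        self._processing_entity = processing_entity

    def instruct_verification(self) -> str:
        # The voice itself never passes through the control entity.
        return self._processing_entity.perform_speaker_verification()


processing = MediaResourceProcessingEntity(
    receive_voice=lambda: b"alice-voice",
    verify=lambda voice: voice == b"alice-voice",  # toy check, not real matching
)
control = MediaResourceControlEntity(processing)
result = control.instruct_verification()
print(result)
```

The control entity only issues the instruction and collects the result, which mirrors the separation of control and bearer described above.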

The media resource control entity provided by the embodiment of the present invention includes: a speaker verification and service information interaction module, a control module, and a media control interaction module;

a speaker verification and service information interaction module, configured to receive a speaker verification command, transmit the verification command to the control module, and return the verification result from the control module to the device that sent the speaker verification command;

a control module, configured to generate a speaker verification request according to the speaker verification command from the speaker verification and service information interaction module and transmit the verification request to the media control interaction module, and to receive the speaker verification result from the media control interaction module and pass the verification result to the speaker verification and service information interaction module;

a media control interaction module, configured to receive the speaker verification request from the control module, convert the request into a format supported by the media control protocol, and send it to the media resource processing entity, and to receive the verification result in the format supported by the media control protocol from the media resource processing entity, convert it into information that the control module can recognize, and send that information to the control module.

The media resource processing entity provided by the embodiment of the present invention includes: a media control interaction module and a speaker verification module;

a media control interaction module, configured to receive a speaker verification request in media control protocol format from the media resource control entity, convert the request into information that the speaker verification module can recognize, and send it to the speaker verification module, and to receive the verification result from the speaker verification module, convert it into media control protocol format, and send it to the media resource control entity;

a speaker verification module, configured to acquire the corresponding voiceprint and receive the user's voice input according to the speaker verification request from the media control interaction module, verify the received voice input against the acquired voiceprint to determine the verification result, and send the verification result to the media control interaction module.

It can be seen that, in the embodiments of the present invention, the media resource control entity instructs the media resource processing entity to perform speaker verification processing, and the media resource processing entity receives the voice input of the speaker according to that instruction, performs verification, and reports the verification result to the media resource control entity. Speaker verification is thus implemented without changing the network architecture or protocol structure of the existing system in which bearer and control are separated, which reduces the cost of the network upgrade.

Brief description of the drawings

FIG. 1 is a schematic diagram of a network architecture for implementing speaker verification in the prior art;

FIG. 2 is a schematic diagram of the composition of a system for implementing speaker verification according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the internal components of a media resource control entity and a media resource processing entity according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a network architecture according to an example of a specific embodiment of the present invention;

FIG. 5 is a general flowchart of a method for verifying a speaker in a specific embodiment of the present invention;

FIG. 6 is a detailed flowchart of an example of a method for verifying a speaker in a specific embodiment of the present invention.

Mode for carrying out the invention

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings.

In the embodiments of the present invention, the media resource control entity instructs the media resource processing entity to perform speaker verification processing; the media resource processing entity receives the voice input of the speaker, performs verification, and reports the verification result to the media resource control entity, thereby implementing speaker verification.

After receiving a speaker verification command that includes the speaker's voiceprint information, the media resource control entity may send, according to the command, a verification request including that voiceprint information to the media resource processing entity. The media resource processing entity acquires the voiceprint corresponding to the voiceprint information, uses it to verify the user voice input received according to the verification request, and returns the verification result to the media resource control entity.

The voiceprint information may include a voiceprint path and a voiceprint name, from which the media resource processing entity can obtain the corresponding voiceprint. The voiceprint path can be a local path or the path of an accessible network server; the voiceprint name can be a character string. Under a given voiceprint path, the voiceprint name uniquely corresponds to one voiceprint file.
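As a rough illustration, voiceprint information made of a path and a name could be modelled as below. The class name, the field names, the example locations, and the rule that a URL scheme marks a network server path are all assumptions made for this sketch; the text above does not prescribe a format.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class VoiceprintInfo:
    path: str  # local path or accessible network server path
    name: str  # character string, unique under `path`

    def is_network_path(self) -> bool:
        # Assumption: a URL scheme such as http:// marks a network server path.
        return urlparse(self.path).scheme in {"http", "https", "ftp"}

    def location(self) -> str:
        # Path plus name identify exactly one voiceprint file.
        return self.path.rstrip("/") + "/" + self.name


local = VoiceprintInfo("/var/voiceprints", "alice.vp")
remote = VoiceprintInfo("http://vp.example.com/store/", "alice.vp")
print(local.location(), local.is_network_path())
print(remote.location(), remote.is_network_path())
```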

The media resource control entity may be a Multimedia Resource Function Controller (MRFC) or a Media Gateway Controller (MGC). Correspondingly, the media resource processing entity is a Multimedia Resource Function Processor (MRFP) or a Media Gateway (MG).

The invention will now be described in detail by way of specific examples.

A specific implementation of the system for implementing speaker verification of the present invention is shown in FIG. 2. The system includes: a media resource control entity 21 and a media resource processing entity 22.

The media resource control entity 21 is configured to instruct the media resource processing entity 22 to perform speaker verification processing. Specifically, the media resource control entity 21 may receive a speaker verification command, which includes the speaker's voiceprint information, from the AS or from another device that sends media resource service requests, and, according to the verification command, send to the media resource processing entity 22 a speaker verification request including that voiceprint information.

The media resource processing entity 22 is configured to receive the voice input of the speaker according to the instruction of the media resource control entity 21, perform voice verification processing, and report the verification result to the media resource control entity 21. Specifically, the media resource processing entity 22 receives the speaker verification request from the media resource control entity 21 and acquires the corresponding voiceprint based on the speaker's voiceprint information in the request.

As shown in FIG. 3, the media resource control entity 21 specifically includes a speaker verification and service information interaction module 211, a control module 212, and a media control interaction module 213.

The speaker verification and service information interaction module 211 is configured to receive service information, that is, a speaker verification command including the speaker's voiceprint information, from an application server or from another device that sends media resource service requests, such as a Serving Call Session Control Function (S-CSCF); to transmit the verification command to the control module 212; and to return the verification result information from the control module 212 to the application server or other device that sent the media resource service request. The speaker verification and service information interaction module 211 is also used for the exchange of other service information between the control module 212 and the application server or other devices.

The control module 212 is configured to control the process interaction and the call state machine during service processing. In this embodiment it specifically generates the speaker verification request according to the speaker verification command from the speaker verification and service information interaction module 211 and transmits the request to the media control interaction module 213; it also receives the speaker verification result information from the media control interaction module 213 and transmits it to the speaker verification and service information interaction module 211.

The media control interaction module 213 is configured to receive information from the control module 212, convert it into a media control protocol message, such as an H.248 protocol message, and send it to the media resource processing entity 22; and to receive information from the media resource processing entity 22, convert it into information that the control module 212 can recognize, and transmit it to the control module 212. Specifically, in this embodiment it receives the speaker verification request from the control module 212, converts the request into a format supported by the media control protocol, such as H.248, and sends it to the media control interaction module 221 in the media resource processing entity 22; it also receives the verification result information, in the format supported by the media control protocol, from the media control interaction module 221 in the media resource processing entity 22, converts it into information that the control module 212 can recognize, and sends it to the control module 212. The following description uses H.248 as the example media control protocol.
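The chain formed by modules 211, 212 and 213 might be sketched as follows. Everything here is illustrative: the class and method names are hypothetical, plain dictionaries stand in for the internal information, and the `{"protocol": ..., "body": ...}` envelope is a placeholder for protocol conversion, not H.248 syntax.

```python
from typing import Callable


class MediaControlInteractionModule:
    """Module 213: converts between internal requests and protocol messages."""

    def __init__(self, send_to_processing_entity: Callable[[dict], dict]):
        self._send = send_to_processing_entity

    def exchange(self, request: dict) -> dict:
        # Placeholder conversion; a real module would build an H.248 message.
        reply = self._send({"protocol": "media-control", "body": request})
        return reply["body"]  # back to a form the control module recognizes


class ControlModule:
    """Module 212: turns a verification command into a verification request."""

    def __init__(self, media_control: MediaControlInteractionModule):
        self._media_control = media_control

    def handle_command(self, command: dict) -> dict:
        request = {"action": "verify-speaker", "voiceprint": command["voiceprint"]}
        return self._media_control.exchange(request)


class ServiceInteractionModule:
    """Module 211: receives the command and returns the result to its sender."""

    def __init__(self, control: ControlModule):
        self._control = control

    def on_command(self, command: dict) -> dict:
        return self._control.handle_command(command)


def fake_processing_entity(message: dict) -> dict:
    # Hypothetical peer that "verifies" only the voiceprint named alice.vp.
    verified = message["body"]["voiceprint"]["name"] == "alice.vp"
    return {"protocol": "media-control", "body": {"verified": verified}}


module_211 = ServiceInteractionModule(
    ControlModule(MediaControlInteractionModule(fake_processing_entity))
)
result = module_211.on_command(
    {"voiceprint": {"path": "/var/voiceprints", "name": "alice.vp"}}
)
print(result)
```

Keeping the conversion in one module means the control module never needs to know which media control protocol is in use.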

As shown in FIG. 3, the media resource processing entity 22 specifically includes: a speaker verification module 220 and a media control interaction module 221.

The media control interaction module 221 is configured to receive information in the H.248 protocol format from the media resource control entity, convert it into information that the speaker verification module 220 can recognize, and send it to the speaker verification module 220; and to convert information from the speaker verification module 220 into the H.248 protocol format and send it to the media resource control entity. In this embodiment, it receives the speaker verification request in the H.248 protocol format from the media control interaction module 213 of the media resource control entity 21, converts the request into information that the speaker verification module 220 can recognize, and transmits it to the speaker verification module 220; it also receives the verification result information from the speaker verification module 220, converts it into a format supported by the H.248 protocol, and sends it to the media control interaction module 213 in the media resource control entity 21.

The speaker verification module 220 is configured to acquire the corresponding voiceprint and receive the user's voice input according to the speaker verification request information received from the media control interaction module 221, verify the received voice input against the acquired voiceprint to determine the verification result, and send the verification result to the media control interaction module 221.

Specifically, the speaker verification module 220 may include: a control module 222, a speaker verification processing engine 223, a voiceprint acquisition module 224, and a speaker voice receiving module 225.

The control module 222 is configured to control the process interaction and the call state machine during service processing. In this embodiment, according to the speaker verification request from the media control interaction module 221, it controls the voiceprint acquisition module 224 to acquire the voiceprint and pass it to the speaker verification processing engine 223, controls the speaker voice receiving module 225 to receive the voice input from the user and pass it to the speaker verification processing engine 223, and controls the speaker verification processing engine 223 to perform speaker verification based on the received voiceprint and user voice input. It then receives the verification result information returned by the speaker verification processing engine 223 and transmits it to the media control interaction module 221.

The speaker verification processing engine 223 is configured to receive the voiceprint from the voiceprint acquisition module 224 and the user voice input, that is, the voice data of the speaker, from the speaker voice receiving module 225, compare the acquired voiceprint with the received user voice input according to acoustic characteristics to generate verification result information, and send the verification result information to the control module 222.

The voiceprint acquisition module 224 is configured to acquire the corresponding voiceprint under the control of the control module 222 and transmit the acquired voiceprint to the speaker verification processing engine 223. The speaker verification request contains the voiceprint information from the corresponding speaker verification command; the control module 222 sends a voiceprint acquisition command including the voiceprint information to the voiceprint acquisition module 224, which fetches the voiceprint from the address given by the voiceprint path and voiceprint name in the voiceprint information and then transmits it to the speaker verification processing engine 223.

The speaker voice receiving module 225 is configured to receive the voice input of the user according to the command of the control module 222, and transmit the received voice to the speaker verification processing engine 223.
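The description does not say how the speaker verification processing engine 223 compares acoustic characteristics. Purely as an illustration of turning a comparison into a matching score, the sketch below uses cosine similarity between two feature vectors scaled to 0-100; the function name, the vector representation, and the choice of cosine similarity are all assumptions and not the method of this document.

```python
import math
from typing import Sequence


def match_score(voiceprint: Sequence[float], voice_features: Sequence[float]) -> float:
    """Toy comparison: cosine similarity of two feature vectors, scaled to 0-100."""
    if len(voiceprint) != len(voice_features):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(voiceprint, voice_features))
    norm = math.sqrt(sum(a * a for a in voiceprint)) * math.sqrt(
        sum(b * b for b in voice_features)
    )
    if norm == 0.0:
        return 0.0  # an all-zero vector carries no information
    # Clamp negative similarity to 0 so the score stays within 0-100.
    return max(0.0, dot / norm) * 100.0


print(round(match_score([0.9, 0.1, 0.4], [0.8, 0.2, 0.5])))
```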

The media resource control entity in this embodiment may be an MGC, and the corresponding media resource processing entity is an MG. The media resource control entity may also be an MRFC, and the corresponding media resource processing entity is an MRFP.

The network architecture on which the embodiment is based is described below by way of an example. As shown in FIG. 4, the embodiment is applied in an IP Multimedia Subsystem (IMS) network that includes an AS, an S-CSCF, an MRFC, and an MRFP. The MRFC receives the speaker verification command of the AS through the S-CSCF, instructs the MRFP, according to the command, to acquire the corresponding voiceprint and verify the user voice input, and returns the verification result information to the S-CSCF. The IMS network also includes other entities, but since they are not related to the embodiment of the present invention, they are not shown here.

The overall process of a specific embodiment of the method for implementing speaker verification according to the present invention is shown in FIG. 5, which mainly includes the following steps:

Step 501: The media resource control entity instructs the media resource processing entity to perform speaker verification processing;

Step 502: The media resource processing entity receives the voice input of the speaker according to the instruction of the media resource control entity, performs verification, and reports the verification result to the media resource control entity.

The media resource control entity and media resource processing entity may be an MGC and an MG, or an MRFC and an MRFP. The embodiment is described in detail below using an IMS network architecture as the example, in which the media resource control entity and the media resource processing entity are an MRFC and an MRFP. As shown in FIG. 6, when the embodiment is applied to an IMS network architecture, it specifically includes the following steps:

Step 601: After the bearer channel between the MRFC and the MRFP is established, the MRFC receives a speaker verification command from the S-CSCF; the command includes the speaker's voiceprint information, which here is the path and name corresponding to the user's voiceprint. The speaker verification command sent by the S-CSCF to the MRFC usually originates from the AS. Alternatively, the AS can send the speaker verification command directly to the MRFC.

The MRFC may specifically include a speaker verification and service information interaction module, a control module, and a media control interaction module. In this step, the speaker verification and service information interaction module receives the service information, that is, the speaker verification command including the speaker's voiceprint information, from the AS directly or through the S-CSCF, and transmits the speaker verification command to the control module.

Step 602: The MRFC converts the received speaker verification command into a speaker verification request, converts the request into the H.248 message format, and sends it to the MRFP through the Mp interface between the MRFC and the MRFP.

Specifically, in this step, the control module in the MRFC may generate a speaker verification request according to the received speaker verification command and transmit it to the media control interaction module, which converts the verification request into a format supported by the H.248 protocol and sends it to the MRFP.

The speaker verification request in this step may adopt a Mod.req message in the H.248 protocol, including information of the service session endpoint T1, and the value of the signal is speaker verification. In addition, the request may specifically include one or more of the following information:

1) the voiceprint identifier, that is, the voiceprint information, including the path and name of the voiceprint; the path where the voiceprint is stored may be a local server path or a network server path, and the voiceprint name is a character string that must be unique under the specified path;

2) the score threshold; its value ranges from 0 to 100 and is used by the MRFP to decide whether speaker verification succeeded: verification is successful if the matching result score is greater than or equal to this value;

3) the initial prompt tone; the prompt tone played before user verification starts, according to which the user performs voice input;

4) the verification success prompt tone; played when the matching result score is greater than or equal to the score threshold;

5) the verification failure prompt tone; played when the matching result score is less than the score threshold;

6) the maximum number of input prompts allowed; the maximum number of times the initial prompt tone is played to prompt the user for voice input when the user's voice is not received;

7) the maximum duration of the timer that waits for the speaker's voice input; this parameter indicates the maximum time to wait for voice input, and a timeout is treated as a verification failure;

8) the voice input end detection button; the user can end the voice input by pressing this button, and the initial prompt tone can tell the user which button to press.

Apart from the voiceprint identifier, which comes from the speaker verification command, the information included in the speaker verification request is generated from the corresponding settings configured in the MRFC.
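The request contents listed above could be held in a structure such as the following. This is a sketch under assumptions: the field names, the use of strings for prompt tones, and the `build_request` helper are hypothetical, and the structure says nothing about how the values are encoded in the H.248 Mod.req message. Every item is optional because the request may carry one or more of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakerVerificationRequest:
    endpoint: str                              # service session endpoint, e.g. "T1"
    voiceprint_path: Optional[str] = None      # 1) voiceprint identifier: path
    voiceprint_name: Optional[str] = None      # 1) voiceprint identifier: name
    score_threshold: Optional[int] = None      # 2) 0-100
    initial_prompt: Optional[str] = None       # 3)
    success_prompt: Optional[str] = None       # 4)
    failure_prompt: Optional[str] = None       # 5)
    max_prompts: Optional[int] = None          # 6)
    max_wait_seconds: Optional[float] = None   # 7)
    end_key: Optional[str] = None              # 8) voice input end detection button

    def __post_init__(self) -> None:
        if self.score_threshold is not None and not 0 <= self.score_threshold <= 100:
            raise ValueError("score_threshold must be within 0-100")


def build_request(endpoint: str, voiceprint_path: str, voiceprint_name: str,
                  mrfc_settings: dict) -> SpeakerVerificationRequest:
    # The voiceprint identifier comes from the command; the rest from MRFC settings.
    return SpeakerVerificationRequest(
        endpoint=endpoint,
        voiceprint_path=voiceprint_path,
        voiceprint_name=voiceprint_name,
        **mrfc_settings,
    )


settings = {"score_threshold": 70, "max_prompts": 3, "max_wait_seconds": 10.0, "end_key": "#"}
request = build_request("T1", "/var/voiceprints", "alice.vp", settings)
print(request)
```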

Step 603: After receiving the speaker verification request, the MRFP returns a Mod.resp message to the MRFC including the information of the endpoint T1.

Specifically, the MRFP may include a media control interaction module, a control module, a speaker verification processing engine, a voiceprint acquisition module, and a speaker voice receiving module. In this step, the media control interaction module in the MRFP receives the speaker verification request in the H.248 protocol format, converts the request into information that the control module in the MRFP can recognize, and transmits it to the control module. The control module generates a Mod.resp message in response to the received request and sends it to the MRFC through the media control interaction module of the MRFP.

Step 604: The MRFP performs verification processing. Specifically, it acquires the corresponding voiceprint according to the voiceprint path and voiceprint name in the voiceprint identifier of the speaker verification request. If the voiceprint cannot be obtained, the MRFP returns verification failure information or error information to the MRFC through the Not.req message in step 605. If the voiceprint is obtained, the MRFP plays the initial prompt tone to the user and waits for user input. After receiving the user voice input, the MRFP verifies the voice input against the acquired voiceprint sample and obtains a verification result score. If the score is greater than or equal to the score threshold, the verification success prompt tone is played to the user and, in step 605, verification success information is returned to the MRFC through the Not.req message; if the score is less than the score threshold, the verification failure prompt tone is played to the user and, in step 605, verification failure information is returned to the MRFC through the Not.req message.
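The decision flow of step 604 can be sketched as below. The function and parameter names are hypothetical, the four callables stand in for the voiceprint acquisition, voice receiving, matching and tone playing functions of the MRFP, and the returned dictionary is only a placeholder for the information later carried in the Not.req message.

```python
from typing import Callable, Optional


def run_verification(
    load_voiceprint: Callable[[], Optional[bytes]],
    receive_voice: Callable[[], bytes],
    score: Callable[[bytes, bytes], int],
    play_tone: Callable[[str], None],
    threshold: int,
) -> dict:
    voiceprint = load_voiceprint()
    if voiceprint is None:
        # Voiceprint not obtained: report failure without prompting the user.
        return {"success": False, "score": None, "error": "voiceprint not found"}
    play_tone("initial")
    voice = receive_voice()
    result_score = score(voiceprint, voice)
    # Success when the score is greater than or equal to the threshold.
    success = result_score >= threshold
    play_tone("success" if success else "failure")
    return {"success": success, "score": result_score, "error": None}


tones = []
outcome = run_verification(
    load_voiceprint=lambda: b"reference",
    receive_voice=lambda: b"spoken",
    score=lambda voiceprint, voice: 82,  # fixed toy score
    play_tone=tones.append,
    threshold=70,
)
print(outcome, tones)
```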

In this step, according to the speaker verification request from the media control interaction module, the control module controls the speaker voice receiving module to play the initial prompt tone to the user and receive the voice input from the user. If the speaker voice receiving module receives the user's voice input before the timer that waits for the speaker's voice input reaches its maximum duration, it transmits the voice input to the speaker verification processing engine once the voice input end detection button is received or the preset input duration has elapsed. If no voice input has been received when the timer reaches its maximum duration, the speaker verification processing engine is notified to determine that the verification has failed. The control module also controls the speaker verification processing engine to perform speaker verification based on the received voiceprint and user voice input: the engine compares the acquired voiceprint with the received user voice input according to acoustic characteristics to generate a matching result score, and compares the score with the score threshold in the speaker verification request. If the matching result score is greater than or equal to the score threshold, the verification is determined to be successful and the verification success prompt tone is played to the user through the speaker voice receiving module; otherwise, the verification is determined to have failed and the verification failure prompt tone is played to the user through the speaker voice receiving module. The speaker verification processing engine sends the verification result information, comprising the result of verification success or failure and optionally further information such as the matching result score, the duration of the user's voice input and the voiceprint identifier, to the control module, which transmits it to the media control interaction module. Each module involved in this step is a module in the MRFP.
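How the speaker voice receiving module might gather input can be sketched with simulated events instead of a real timer. The event tuples, the function name and the decision to ignore an end button pressed before any voice arrives are assumptions of this sketch; the preset input duration and repeated prompting are left out for brevity.

```python
from typing import Iterable, Optional, Tuple

# (seconds since the prompt ended, "audio" or "key", payload)
Event = Tuple[float, str, object]


def collect_voice_input(events: Iterable[Event], end_key: str,
                        max_wait_seconds: float) -> Optional[bytes]:
    """Return the collected voice, or None if none arrived before the timer expired."""
    audio = bytearray()
    for timestamp, kind, payload in events:
        if not audio and timestamp > max_wait_seconds:
            return None  # timeout: handled as a verification failure
        if kind == "audio":
            audio.extend(payload)
        elif kind == "key" and payload == end_key and audio:
            break  # the end button closes the voice input
    return bytes(audio) if audio else None


events = [
    (1.0, "audio", b"ab"),
    (2.0, "audio", b"cd"),
    (2.5, "key", "#"),
    (3.0, "audio", b"ef"),  # arrives after the end button, so it is ignored
]
collected = collect_voice_input(events, end_key="#", max_wait_seconds=10.0)
print(collected)
```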

Step 605: The MRFP returns the verification result information to the MRFC; the verification result information may be carried by the Not.req message. It includes at least information on whether the verification succeeded, and may further include one or more of the following:

1) the matching value, that is, the matching result score; the score can range from 0 to 100, where 100 is the best match and 0 the worst;

2) the duration of the voice input;

3) the voiceprint identifier.

The media control interaction module in the MRFP converts the received verification result information into a format supported by the H.248 protocol and sends it to the MRFC.
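The verification result information could be represented as a small record in which only the success flag is mandatory. The class and field names are hypothetical, and `to_report` merely drops the optional items that were not supplied; it does not reproduce the H.248 encoding.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VerificationResult:
    success: bool                                   # always present
    match_score: Optional[int] = None               # 0-100, 100 is the best match
    input_duration_seconds: Optional[float] = None  # duration of the voice input
    voiceprint_id: Optional[str] = None             # voiceprint identifier

    def to_report(self) -> dict:
        # Keep the mandatory flag and only those optional items that were set.
        return {key: value for key, value in asdict(self).items() if value is not None}


report = VerificationResult(success=True, match_score=87).to_report()
print(report)
```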

Step 606: After receiving the verification result information, the MRFC returns a Not.resp message to the MRFP to respond.

Specifically, in this step the media control interaction module in the MRFC receives the verification result information from the MRFP, converts it into information that the control module in the MRFC can recognize, and sends it to the control module. After receiving the information, the control module returns a Not.resp message in the H.248 protocol format to the MRFP through the media control interaction module as the response.

Step 607: The MRFC converts the received verification result information into a message supported by the Mr interface between the MRFC and the S-CSCF and sends the message to the S-CSCF, and the S-CSCF sends the verification result information to the AS. The MRFC can also send the verification result information directly to the AS without passing through the S-CSCF.

The control module in the MRFC transmits the verification result information received from the media control interaction module to the speaker verification and service information interaction module, which returns the verification result information to the AS directly or through the S-CSCF.

It can be seen from the solution described in the above specific embodiment that the media resource control entity instructs the media resource processing entity to perform speaker verification processing, and the media resource processing entity receives the voice input of the speaker according to that instruction, performs verification, and reports the verification result to the media resource control entity. Speaker verification is thus realized without changing the existing network architecture and protocol structure in which bearer and control are separated, and the cost of the network upgrade is reduced.
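The message exchange of steps 601 to 607 can be summarised as a simple sequence. The function name and tuple layout are hypothetical; the message names are those used in the steps above, and step 604 is omitted because it is internal processing in the MRFP rather than a message.

```python
from typing import List, Tuple


def verification_message_flow(via_s_cscf: bool = True) -> List[Tuple[int, str, str, str]]:
    """Return (step, sender, receiver, message) tuples for the IMS example."""
    # The command may reach the MRFC through the S-CSCF or directly from the AS.
    upstream = "S-CSCF" if via_s_cscf else "AS"
    return [
        (601, upstream, "MRFC", "speaker verification command"),
        (602, "MRFC", "MRFP", "Mod.req"),
        (603, "MRFP", "MRFC", "Mod.resp"),
        (605, "MRFP", "MRFC", "Not.req"),
        (606, "MRFC", "MRFP", "Not.resp"),
        (607, "MRFC", upstream, "verification result information"),
    ]


flow = verification_message_flow()
for step, sender, receiver, message in flow:
    print(f"Step {step}: {sender} -> {receiver}: {message}")
```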

In addition, by providing a media resource control entity including a speaker verification and service information interaction module, a control module, and a media control interaction module, the embodiment of the present invention enables the media resource control entity in the existing network architecture to support speaker verification processing.

Likewise, by providing a media resource processing entity including a media control interaction module and a speaker verification module, the media resource processing entity in the existing network architecture can support speaker verification processing.

The above is a description of specific embodiments of the present invention, and the method of the present invention can be appropriately modified in a specific implementation process to suit the specific needs of a specific situation. Therefore, it is to be understood that the specific embodiments of the present invention are merely exemplary and are not intended to limit the scope of the invention.

Claims

1. A speaker verification method, the method comprising:

2. The method according to claim 1, wherein the media resource control entity instructs the media resource processing entity to perform a speaker verification process, including:

The media resource control entity transmits a verification request including the speaker voiceprint information to the media resource processing entity based on a received speaker verification command including the speaker voiceprint information.

3. The method according to claim 2, wherein the media resource processing entity receives the voice input of the speaker and performs verification, and reports the verification result to the media resource control entity, including:

The media resource processing entity acquires the voiceprint corresponding to the voiceprint information according to the verification request, uses the voiceprint to verify the user voice input received according to the verification request, and returns the verification result to the media resource control entity.

4. The method according to claim 2 or 3, wherein the voiceprint information comprises: a voiceprint path and a voiceprint name.

5. The method according to claim 3, wherein the verification request further includes one of the following parameters: a threshold for the media resource processing entity to determine whether the speaker verification is successful, an initial prompt tone, a verification success prompt tone, a verification failure prompt tone, the maximum number of no-input prompts allowed, the maximum duration of the timer waiting for the speaker's voice input, or a voice input end detection button; or further includes any combination of the above parameters;

Correspondingly, the media resource processing entity determines whether the speaker verification is successful according to the threshold and the result of matching the user voice input against the voiceprint; or, according to the initial prompt tone parameter, prompts the user before verification to provide the voice input for verification; or, according to the verification success prompt tone parameter, plays the verification success prompt after the verification passes; or, according to the verification failure prompt tone parameter, plays the verification failure prompt after the verification fails; or, according to the maximum number of no-input prompts allowed, repeats the prompt playback after detecting that the user has provided no voice input; or waits for the user's voice input according to the maximum duration of the timer waiting for the speaker's voice input, and determines that the verification has failed if no voice input is received from the user within the maximum duration; or determines the end of the user's voice input according to the received voice input end detection button; or performs any combination of the above steps.
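As a rough illustration of how the parameters of the verification request could drive the media resource processing entity, consider the Python sketch below. The field names, the default values, and the callbacks `wait_for_voice`, `match`, and `play` are all hypothetical, and the way the no-input prompt count and the timer are combined into one loop is an assumption rather than something the claim prescribes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VerificationRequest:
    """Hypothetical container for the parameters listed above."""
    voiceprint_path: str
    voiceprint_name: str
    threshold: float = 0.8               # minimum matching value for success
    initial_prompt: Optional[str] = None
    success_prompt: Optional[str] = None
    failure_prompt: Optional[str] = None
    max_no_input_prompts: int = 1        # prompts allowed when no input arrives
    input_timeout_s: float = 5.0         # timer waiting for voice input
    end_button: Optional[str] = None     # button that marks end of voice input


def run_verification(
    req: VerificationRequest,
    wait_for_voice: Callable[[float, Optional[str]], Optional[bytes]],
    match: Callable[[bytes], float],
    play: Callable[[str], None],
) -> bool:
    """Return True if the speaker is verified, False otherwise."""
    for _ in range(max(1, req.max_no_input_prompts)):
        if req.initial_prompt:
            play(req.initial_prompt)
        # Returns None when the timer expires without any voice input.
        voice = wait_for_voice(req.input_timeout_s, req.end_button)
        if voice is None:
            continue  # prompt again if another attempt is allowed
        ok = match(voice) >= req.threshold
        prompt = req.success_prompt if ok else req.failure_prompt
        if prompt:
            play(prompt)
        return ok
    # No voice input within the timer on any attempt: verification fails.
    if req.failure_prompt:
        play(req.failure_prompt)
    return False
```

The callbacks keep the sketch independent of any particular media stack: `match` stands in for the comparison of the voice input with the voiceprint, and `play` for prompt playback toward the user.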

6. The method according to claim 3, wherein the verification result comprises: an indication of whether the verification is successful.

7. The method according to claim 6, wherein the verification result further comprises: a matching value, a duration of the input voice, or voiceprint information, or any combination of the above.

8. The method according to claim 3, wherein if the media resource processing entity fails to acquire the voiceprint corresponding to the voiceprint information according to the verification request, the media resource processing entity reports an error message to the media resource control entity and ends the process.
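The verification result and the error case can be pictured with the following Python sketch. It assumes, purely for illustration, that voiceprints are held in an in-memory dictionary keyed by path and name; the type names, the `score_fn` callback, and the error text are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Optional, Union


@dataclass
class VerificationResult:
    success: bool
    match_value: Optional[float] = None
    input_duration_s: Optional[float] = None
    voiceprint_info: Optional[str] = None


@dataclass
class ErrorReport:
    reason: str


def load_and_verify(
    store: dict,
    path: str,
    name: str,
    score_fn: Callable[[object, object], float],
    voice: object,
    duration_s: float,
    threshold: float,
) -> Union[VerificationResult, ErrorReport]:
    # Locate the voiceprint from its path and name (hypothetical store).
    key = str(PurePosixPath(path) / name)
    voiceprint = store.get(key)
    if voiceprint is None:
        # Voiceprint not acquired: report an error and end the process.
        return ErrorReport(reason=f"voiceprint not found: {key}")
    score = score_fn(voiceprint, voice)
    return VerificationResult(score >= threshold, score, duration_s, key)
```

Returning a distinct `ErrorReport` type keeps a missing voiceprint separate from an ordinary failed verification, which is reported as a `VerificationResult` with `success` set to false.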

9. A speaker verification system, characterized in that the system comprises:

The media resource processing entity is configured to receive a voice input of the speaker and perform verification, and report the verification result to the media resource control entity.

10. The system according to claim 9, wherein the system further comprises: a service server, configured to send a speaker verification command to the media resource control entity, and receive a verification result returned by the media resource control entity;

The media resource control entity receives the speaker verification command, generates a corresponding speaker verification request according to the verification command, sends the verification request to the media resource processing entity, and further reports the verification result received from the media resource processing entity to the service server.

11. The system according to claim 10, wherein the system further comprises: a service call session control function entity, connected between the application server and the media resource control entity, for receiving a speaker verification command from the application server and sending the command to the media resource control entity, and for receiving the verification result from the media resource control entity and sending the verification result to the application server.

12. The system according to any one of claims 9 to 11, wherein the media resource control entity is a media resource controller and the media resource processing entity is a media resource processor; or the media resource control entity is a media gateway controller and the media resource processing entity is a media gateway.

13. A media resource control entity, wherein the media resource control entity includes:

a speaker verification and service information interaction module, configured to receive a speaker verification command, and transmit the verification command to the control module, and return the verification result from the control module to the device that sends the speaker verification command;

a control module, configured to generate a speaker verification request according to a speaker verification command from the speaker verification and service information interaction module, and transmit the verification request to the media control interaction module; and to receive a speaker verification result from the media control interaction module and pass the verification result to the speaker verification and service information interaction module;

a media control interaction module, configured to receive a speaker verification request from the control module, convert the request into a format supported by the media control protocol, and send the request to the media resource processing entity; and to receive the verification result in a format supported by the media control protocol from the media resource processing entity, convert it by protocol conversion into information that can be recognized by the control module, and send the information to the control module.

14. The media resource control entity according to claim 13, wherein the media resource control entity is a media resource controller or a media gateway controller.

15. A media resource processing entity, wherein the media resource processing entity comprises:

a media control interaction module, configured to receive a speaker verification request in media control protocol format from the media resource control entity, convert the speaker verification request into information that can be recognized by the speaker verification module, and send the information to the speaker verification module; and to receive the verification result from the speaker verification module, convert the verification result into a media control protocol message, and send the message to the media resource control entity;

16. The media resource processing entity according to claim 15, wherein the speaker verification module comprises:

a control module, configured to notify the voiceprint acquisition module to acquire the corresponding voiceprint according to the speaker verification request from the media control interaction module and transmit the acquired voiceprint to the speaker verification processing engine, to control the speaker voice receiving module to receive the user's voice input and transmit the voice input to the speaker verification processing engine, to control the speaker verification processing engine to perform speaker verification according to the received voiceprint and user voice input, and to receive the verification result returned by the speaker verification processing engine and transmit it to the media control interaction module;

a speaker verification processing engine, configured to receive the voiceprint from the voiceprint acquisition module and the user voice input from the speaker voice receiving module, compare the acquired voiceprint with the received user voice input according to their acoustic characteristics, generate a verification result, and transmit the verification result to the control module;

a voiceprint acquisition module, configured to acquire the corresponding voiceprint under the control of the control module, and transmit the acquired voiceprint to the speaker verification processing engine;

a speaker voice receiving module, configured to receive the voice input of the user according to the command of the control module, and transmit the received voice to the speaker verification processing engine.
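The cooperation of these four modules can be pictured with the Python sketch below. It is a minimal illustration under stated assumptions, not the claimed implementation: voiceprints and voice input are represented as plain feature vectors, and cosine similarity is substituted as a stand-in for the acoustic comparison, which the text does not specify. All names are hypothetical.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for the engine's acoustic comparison (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpeakerVerificationModule:
    """Plays the role of the control module coordinating the other three."""

    def __init__(
        self,
        acquire_voiceprint: Callable[[str], Sequence[float]],  # voiceprint acquisition module
        receive_voice: Callable[[], Sequence[float]],          # speaker voice receiving module
        threshold: float = 0.8,
    ) -> None:
        self.acquire_voiceprint = acquire_voiceprint
        self.receive_voice = receive_voice
        self.threshold = threshold

    def handle_request(self, voiceprint_info: str) -> dict:
        # 1. Have the voiceprint acquired for the requested identity.
        voiceprint = self.acquire_voiceprint(voiceprint_info)
        # 2. Have the user's voice input received.
        voice = self.receive_voice()
        # 3. Let the engine compare the two and produce a result, which
        #    would then go back to the media control interaction module.
        score = cosine_similarity(voiceprint, voice)
        return {"success": score >= self.threshold, "match_value": score}
```

Passing the acquisition and receiving functions in as callables mirrors the claim's separation of modules while keeping the sketch self-contained.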

17. The media resource processing entity according to claim 15 or 16, wherein the media resource processing entity is: a media resource processor or a media gateway.