CN101192925A

CN101192925A - Speaker validation method and system and media resource control entity and processing entity

Info

Publication number: CN101192925A
Application number: CNA2006101403081A
Authority: CN
Inventors: 许志勇
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2006-11-20
Filing date: 2006-11-20
Publication date: 2008-06-04
Also published as: WO2008061463A1

Abstract

The invention discloses a speaker verification method comprising: A. a media resource control entity instructs a media resource processing entity to perform speaker verification processing; and B. the media resource processing entity receives the voice input of the speaker, verifies it, and reports the verification result to the media resource control entity. The invention further discloses a speaker verification system, a media resource control entity and a media resource processing entity. With this scheme, speaker verification is implemented without changing the existing network architecture in which bearer and control are separated.

Description

Speaker verification method and system, media resource control entity and processing entity

Technical Field

The invention relates to the technical field of communication, in particular to a method for verifying a speaker, a system for verifying the speaker, a media resource control entity and a media resource processing entity.

Background

Speaker verification is a technology that analyzes the voice features of a speaker, matches them against a previously sampled voice sample, and verifies the speaker's identity according to the matching result. The technique verifies the person speaking by analyzing a unique characteristic of the speech sample, such as the frequency of the utterance. Speaker verification allows a person to use their voice to gain access to restricted areas or to identity-sensitive services such as telephone banking, database services and voice mail. The voice sample, called a voiceprint, is the sample of voice used as the verification criterion, and is typically a directly recorded user voice.

The Speech Services Control (SPEECHSC) working group of the Internet Engineering Task Force (IETF) has defined an application network architecture for speaker verification (Speaker Identification/Speaker Verification, SI/SV). The IETF describes the application scenarios and a proposed architecture for the technology in RFC 4313, as shown in Fig. 1. In this architecture, an Application Server (AS) controls services, and a media processing unit at a lower layer supports media processing functions and media interaction with the client. An additional SI/SV server receives speaker verification commands from the AS, receives the voice sent by the media processing unit, performs speaker verification, and then reports the verification result to the AS. The SI/SV server exchanges the control signaling related to speaker verification with the AS and the media processing unit through the specially defined SPEECHSC protocol.

In a system in which media control and bearer are separated, a media resource control entity and a media resource processing entity generally interact with an AS to implement a media resource service. To implement speaker verification in such a system under the network architecture described above, an SI/SV server would have to be deployed in the system and the existing protocol structure would have to be changed. This substantially alters the existing network and makes the network upgrade costly.

Disclosure of Invention

In view of the above, the main objective of the present invention is to provide a method and a system for speaker verification, a media resource control entity and a media resource processing entity, which can implement speaker verification without changing the network architecture of the media control and bearer separation system.

In order to achieve the above object, the present invention provides a speaker verification method, comprising:

A. the media resource control entity instructs the media resource processing entity to carry out speaker verification processing;

B. the media resource processing entity receives the voice input of the speaker and verifies the voice input, and reports the verification result to the media resource control entity.

The invention also provides a speaker verification system, which comprises a media resource control entity and a media resource processing entity;

the media resource control entity is used for instructing the media resource processing entity to carry out speaker verification processing and receiving a verification result from the media resource processing entity;

the media resource processing entity is used for receiving the voice input of the speaker, verifying the voice input and reporting the verification result to the media resource control entity.

The invention also provides a media resource control entity, which comprises: the speaker verification and service information interaction module, the control module and the media control interaction module; wherein,

the speaker verification and service information interaction module is used for receiving a speaker verification command, transmitting the verification command to the control module and returning a verification result from the control module to the equipment for sending the speaker verification command;

the control module is used for generating a speaker verification request according to a speaker verification command from the speaker verification and service information interaction module and transmitting the verification request to the media control interaction module; receiving a speaker verification result from the media control interaction module, and transmitting the verification result to the speaker verification and service information interaction module;

the media control interaction module is used for receiving a speaker verification request from the control module, converting the request into a format supported by a media control protocol and sending the format to the media resource processing entity; and receiving a verification result of a media control protocol support format from the media resource processing entity, performing protocol conversion processing to obtain information which can be identified by the control module, and then sending the information to the control module.

The invention also provides a media resource processing entity, which comprises: the media control interaction module and the speaker verification module; wherein,

the media control interaction module is used for receiving a speaker verification request in a media control protocol form from the media resource control entity, converting the speaker verification request into information which can be identified by the speaker verification module and then sending the information to the speaker verification module, receiving a verification result from the speaker verification module, converting the verification result into a media control protocol and sending the media control protocol to the media resource control entity;

the speaker verification module is used for acquiring a corresponding voiceprint according to a speaker verification request from the media control interaction module, receiving the voice input of a user, verifying the received voice input by the user through the acquired voiceprint to determine a verification result, and sending the verification result to the media control interaction module.

According to the scheme, the media resource control entity instructs the media resource processing entity to perform speaker verification processing, the media resource processing entity receives and verifies the voice input of the speaker according to that instruction, and the verification result is reported to the media resource control entity. Speaker verification is thereby implemented without changing the existing network architecture and protocol structure in which bearer and control are separated, and the cost of network upgrading is reduced.

Drawings

FIG. 1 is a diagram of a prior art network architecture for implementing speaker verification techniques;

FIG. 2 is a schematic diagram of a system for implementing speaker verification techniques in accordance with an embodiment of the present invention;

FIG. 3 is a schematic diagram of the internal components of a media resource control entity and a media resource processing entity in an embodiment of the present invention;

FIG. 4 is a schematic diagram of an exemplary network configuration in an embodiment of the present invention;

FIG. 5 is a general flow chart of a speaker verification method in accordance with an embodiment of the present invention;

FIG. 6 is a flowchart illustrating an exemplary speaker verification method according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings.

The main idea of the invention is that the media resource control entity instructs the media resource processing entity to perform speaker verification processing; the media resource processing entity receives the voice input of the speaker, verifies it, and reports the verification result to the media resource control entity, thereby implementing speaker verification.

After receiving a speaker verification command that includes the voiceprint information of the speaker, the media resource control entity can send a verification request including that voiceprint information to the media resource processing entity. The media resource processing entity obtains the voiceprint corresponding to the voiceprint information, uses the voiceprint to verify the user voice input received according to the verification request, and returns the verification result to the media resource control entity.

The voiceprint information may include a voiceprint path and a voiceprint name, and the media resource processing entity may obtain the corresponding voiceprint according to the information. The voiceprint path can be a local path or can be an accessible web server path. The voiceprint name can be in the form of a string, which uniquely corresponds to a voiceprint file under the voiceprint path described above.
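As an illustration of how a voiceprint path and a voiceprint name together identify one voiceprint file, the following minimal sketch combines the two into a single location. The class name, the method names, the example paths and the rule that an `http://` or `https://` prefix marks a web server path are all assumptions made for illustration; the text above does not prescribe any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceprintInfo:
    """Voiceprint path plus voiceprint name (hypothetical representation)."""
    path: str  # local path or accessible web server path
    name: str  # string that is unique under `path`

    def is_remote(self) -> bool:
        # Assumption: web server paths are written as http(s) URLs.
        return self.path.startswith(("http://", "https://"))

    def location(self) -> str:
        # Join path and name with exactly one "/" between them.
        return self.path.rstrip("/") + "/" + self.name


# Hypothetical example values.
local = VoiceprintInfo("/var/voiceprints", "alice.vp")
remote = VoiceprintInfo("http://vp.example.com/store/", "alice.vp")
print(local.location())
print(remote.location())
```

Because the name is unique under the path, the combined location is enough for the processing entity to fetch exactly one voiceprint, whether it is stored locally or on a web server.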

The media resource control entity may be a Multimedia Resource Function Controller (MRFC) or a Media Gateway Controller (MGC); correspondingly, the media resource processing entity is a Multimedia Resource Function Processor (MRFP) or a Media Gateway (MG).

The present invention will be described in detail below with reference to specific examples.

An embodiment of a system for implementing speaker verification according to the present invention is shown in fig. 2 and includes a media resource control entity 21 and a media resource processing entity 22.

The media resource control entity 21 is configured to instruct the media resource processing entity 22 to perform speaker verification processing. Specifically, media resource control entity 21 may receive a speaker verification command including speaker voiceprint information from an AS, or from another device for sending a media resource service request, and send a speaker verification request including the speaker voiceprint information to media resource processing entity 22 according to the verification command.

The media resource processing entity 22 is configured to receive a voice input of a speaker according to an instruction of the media resource control entity 21, perform voice verification processing, and report a verification result to the media resource control entity 21. Specifically, media resource processing entity 22 receives the speaker verification request from media resource control entity 21, and obtains the corresponding voiceprint according to the voiceprint information of the speaker in the request.

As shown in fig. 3, the media resource control entity 21 further includes a speaker verification and service information interaction module 211, a control module 212, and a media control interaction module 213.

The speaker verification and service information interaction module 211 is configured to receive service information, i.e., a speaker verification command including the voiceprint information of a speaker, from an application server or from another device that sends media resource service requests, such as a Serving Call Session Control Function (S-CSCF); to transmit the verification command to the control module 212; and to return the verification result information from the control module 212 to the application server or other device that sent the request. Of course, the speaker verification and service information interaction module 211 is also used by the control module 212 to exchange other service information with the application server or other devices mentioned above.

The control module 212 is used for controlling the call flow interaction and state machine in the service processing process, and in this embodiment, specifically, generates a speaker verification request according to the speaker verification command from the speaker verification and service information interaction module 211, and transmits the verification request to the media control interaction module 213; and receives the speaker verification result information from the media control interaction module 213 and transfers the verification result information to the speaker verification and service information interaction module 211.

The media control interaction module 213 is configured to receive information from the control module 212, convert it into a media control protocol message, such as an H.248 protocol message, and send the message to the media resource processing entity 22; and to receive information from the media resource processing entity 22, convert it into information that the control module 212 can identify, and transmit it to the control module 212. In this embodiment, the module specifically receives a speaker verification request from the control module 212, converts the request into a format supported by the media control protocol, for example H.248, and sends it to the media control interaction module 221 in the media resource processing entity 22; it also receives verification result information in the format supported by the media control protocol from the media control interaction module 221 in the media resource processing entity 22, performs protocol conversion to obtain information that the control module 212 can identify, and sends that information to the control module 212. The following description takes the case in which the media control protocol is the H.248 protocol as an example.

As shown in fig. 3, the media resource processing entity 22 further includes a speaker verification module 220 and a media control interaction module 221.

The media control interaction module 221 is configured to receive information in the H.248 protocol format from the media resource control entity, convert it into information that the speaker verification module 220 can identify, and transmit it to the speaker verification module 220; and to convert information from the speaker verification module 220 into the H.248 protocol format and send it to the media resource control entity. In this embodiment, the module specifically receives a speaker verification request in the H.248 protocol format from the media control interaction module 213 of the media resource control entity 21, converts the request into information that the speaker verification module 220 can identify, and transmits it to the speaker verification module 220; it also receives the verification result information from the speaker verification module 220, converts it into a format supported by the H.248 protocol, and sends it to the media control interaction module 213 in the media resource control entity 21.

The speaker verification module 220 is configured to obtain a corresponding voiceprint and receive a voice input of a user according to the received speaker verification request information from the media control interaction module, verify the received voice input by the user through the obtained voiceprint to determine a verification result, and send the verification result to the media control interaction module 221.

Specifically, the speaker verification module 220 may include: a control module 222, a speaker verification processing engine 223, a voiceprint acquisition module 224, and a speaker voice reception module 225.

The control module 222 is used for controlling the flow interaction and state machine of the call in the service processing process. In this embodiment, the voiceprint acquisition module 224 is specifically notified to acquire a corresponding voiceprint and transmit the acquired voiceprint to the speaker verification processing engine 223 according to a speaker verification request from the media control interaction module 221, the speaker voice receiving module 225 is controlled to receive a voice input from a user and transmit the voice input to the speaker verification processing engine 223, and the speaker verification processing engine 223 is controlled to perform speaker verification according to the received voiceprint and the voice input of the user, receive verification result information returned after verification by the speaker verification processing engine 223, and transmit the result information to the media control interaction module 221.

The speaker verification processing engine 223 is configured to receive the voiceprint from the voiceprint acquisition module 224 and the user voice input, i.e., the voice data of the speaker, from the speaker voice receiving module 225, compare the acquired voiceprint with the received user voice input according to the acoustic characteristics, thereby generating verification result information, and send the verification result information to the control module 222.

The voiceprint acquisition module 224 is configured to acquire the corresponding voiceprint under the control of the control module 222 and transmit the acquired voiceprint to the speaker verification processing engine 223. If the speaker verification request carries the voiceprint information from the corresponding speaker verification command, the control module 222 sends a voiceprint acquisition command including the voiceprint information to the voiceprint acquisition module 224. The voiceprint acquisition module 224 then fetches the voiceprint from the address given by the voiceprint path and the voiceprint name in the voiceprint information, and transmits the acquired voiceprint to the speaker verification processing engine 223.

A speaker voice receiving module 225 for receiving a voice input of the user according to a command of the control module 222 and transmitting the received voice to the speaker verification processing engine 223.

The media resource control entity in this embodiment may be an MGC, and the corresponding media resource processing entity is an MG; the media resource control entity may also be MRFC, and the corresponding media resource processing entity is MRFP.

The following describes, by way of example, a network architecture based on this embodiment. Fig. 4 shows the network architecture when this embodiment is applied to an IP Multimedia Subsystem (IMS) network. The architecture includes an AS, an S-CSCF, an MRFC and an MRFP. The MRFC receives a speaker verification command from the AS through the S-CSCF, instructs the MRFP according to the command to obtain the corresponding voiceprint and the user's voice input and to perform verification, and returns the verification result information to the S-CSCF. Of course, the IMS network may include other entities, which are not shown here because they are not relevant to the embodiments of the present invention.

The general flow of the specific embodiment of the method for realizing speaker verification provided by the invention is shown in fig. 5, and mainly comprises the following steps:

step 501, the media resource control entity instructs the media resource processing entity to perform speaker verification processing;

step 502, the media resource processing entity receives the voice input of the speaker according to the indication of the media resource control entity and verifies the voice input, and reports the verification result to the media resource control entity.
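Steps 501 and 502 can be sketched as two cooperating objects: one issues the instruction, the other receives the voice input, verifies it, and reports back. This is only a minimal sketch. The class and method names are hypothetical, and the two callables stand in for voice reception and for the verification engine, neither of which is specified at this level of the description.

```python
from typing import Callable, List


class ProcessingEntity:
    """Stands in for the media resource processing entity (MRFP or MG)."""

    def __init__(self, receive_voice: Callable[[], bytes],
                 verify: Callable[[bytes], bool]) -> None:
        self._receive_voice = receive_voice  # hypothetical voice source
        self._verify = verify                # hypothetical verification backend

    def handle_instruction(self, report: Callable[[bool], None]) -> None:
        # Step 502: receive the speaker's voice input, verify it, report the result.
        voice = self._receive_voice()
        report(self._verify(voice))


class ControlEntity:
    """Stands in for the media resource control entity (MRFC or MGC)."""

    def __init__(self, processor: ProcessingEntity) -> None:
        self._processor = processor
        self.results: List[bool] = []

    def start_verification(self) -> None:
        # Step 501: instruct the processing entity; the result arrives via callback.
        self._processor.handle_instruction(self.results.append)


control = ControlEntity(ProcessingEntity(lambda: b"hello", lambda v: v == b"hello"))
control.start_verification()
print(control.results)
```

Note that the voice input never passes through the control entity: it is received and checked entirely on the processing side, and only the result travels back.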

The media resource control entity and the media resource processing entity may be MGC and MG, or MRFC and MRFP. The following describes the present embodiment in detail by taking an example that the present embodiment is applied to an IMS network architecture, that is, the media resource control entity and the media resource processing entity are MRFC and MRFP. As shown in fig. 6, when the present embodiment is applied to an IMS network architecture, the method specifically includes the following steps:

Step 601, after the bearer channel between the MRFC and the MRFP is established, the MRFC receives a speaker verification command from the S-CSCF; the command includes the voiceprint information of the speaker. The voiceprint information here is the path and name of the voiceprint corresponding to the user. The speaker verification command sent by the S-CSCF to the MRFC usually originates from the AS; alternatively, the AS may send the speaker verification command directly to the MRFC.

The MRFC may specifically include a speaker verification and service information interaction module, a control module, and a media control interaction module. In this step, the speaker verification and service information interaction module receives the service information, namely a speaker verification command including the speaker's voiceprint information, from the AS directly or through the S-CSCF, and transmits the speaker verification command to the control module.

Step 602, the MRFC converts the received speaker verification command into a speaker verification request, converts the request into the H.248 message format, and sends it to the MRFP through the Mp interface between the MRFC and the MRFP.

Specifically, in this step, the control module in the MRFC may generate a speaker verification request according to the received speaker verification command, and transmit the verification request to the media control interaction module, and the media control interaction module converts the verification request into a format supported by the h.248 protocol and transmits the format to the MRFP.

The speaker verification request in this step may adopt a mod.req message in the h.248 protocol, which includes information of the service session endpoint T1, and the value of the signal is speaker verification. In addition, the request may specifically include one or more of the following information:

1) a voiceprint identification; this voiceprint information comprises the path and the name of the voiceprint, where the path under which the voiceprint is stored can be a local server path or a network server path, and the voiceprint name is a character string that must be unique under the specified path;

2) a score threshold; the value range is 0-100, and the MRFP uses the threshold to decide whether the speaker verification is successful: when the matching result score is greater than or equal to this value, the verification is considered successful;

3) an initial prompt tone; the prompt tone played before user verification starts, according to which the user performs voice input;

4) a verification success prompt tone; played when the matching result score is greater than or equal to the score threshold;

5) a verification failure prompt tone; played when the matching result score is smaller than the score threshold;

6) the maximum number of prompts allowed when there is no input; the maximum number of times the user is prompted through the initial prompt tone to perform voice input when no voice is received from the user;

7) the maximum duration of the timer waiting for the speaker's voice input; this parameter indicates the maximum time to wait for voice input, and if the timer expires, the case is handled as a verification failure;

8) a voice input end detection key; the user can end the voice input by pressing a key, and can be told which key to press by the initial prompt tone.

Except for the voiceprint identifier, which comes from the speaker verification command, the information in the speaker verification request is generated from corresponding information preset in the MRFC.
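The eight parameters listed above can be collected into one request structure. The sketch below is one possible representation, not the H.248 encoding: all field names, the default values, and the `PRESET` object are hypothetical. It also illustrates the stated rule that only the voiceprint identifier comes from the verification command, while the remaining values come from presets held by the control entity.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SpeakerVerificationRequest:
    """Parameters 1) to 8) of the verification request (hypothetical field names)."""
    voiceprint_path: str = ""
    voiceprint_name: str = ""
    score_threshold: int = 60          # 0-100; success when score >= threshold
    initial_prompt: Optional[str] = None
    success_prompt: Optional[str] = None
    failure_prompt: Optional[str] = None
    max_no_input_prompts: int = 3
    max_wait_seconds: float = 10.0
    end_key: Optional[str] = "#"

    def __post_init__(self) -> None:
        if not 0 <= self.score_threshold <= 100:
            raise ValueError("score_threshold must be within 0-100")
        if self.max_no_input_prompts < 0:
            raise ValueError("max_no_input_prompts must not be negative")
        if self.max_wait_seconds <= 0:
            raise ValueError("max_wait_seconds must be positive")


# Values preset in the control entity (illustrative only).
PRESET = SpeakerVerificationRequest(score_threshold=70,
                                    initial_prompt="prompts/start.wav")


def build_request(voiceprint_path: str, voiceprint_name: str) -> SpeakerVerificationRequest:
    # Only the voiceprint identifier is taken from the verification command;
    # every other field is copied from the preset.
    return replace(PRESET, voiceprint_path=voiceprint_path,
                   voiceprint_name=voiceprint_name)


request = build_request("/var/voiceprints", "alice.vp")
print(request)
```

Validating the ranges when the request is built means an out-of-range threshold is rejected at the control entity rather than discovered during verification.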

Step 603, after receiving the speaker verification request, MRFP returns mod.resp message including the information of the endpoint T1 to MRFC.

The MRFP may specifically include a media control interaction module, a control module, a speaker verification processing engine, a voiceprint acquisition module, and a speaker voice reception module. In this step, the media control interaction module in the MRFP receives the speaker verification request in the h.248 protocol format, converts the request into information that can be identified by the control module in the MRFP, and transmits the information to the control module; and the control module of the MRFP generates a Mod.resp message as a response according to the received request and sends the Mod.resp message to the MRFC through the media control interaction module of the MRFP.

Step 604, the MRFP performs verification processing. Specifically, it obtains the corresponding voiceprint according to the voiceprint path and voiceprint name in the voiceprint identifier of the speaker verification request. If the voiceprint cannot be obtained, verification failure information or error information is returned to the MRFC through a not. If the voiceprint is obtained, an initial prompt tone is played to the user and the MRFP waits for user input. After receiving the user's voice input, the MRFP compares it with the obtained voiceprint sample to obtain a verification result score. If the score is greater than or equal to the score threshold, a verification success prompt tone is played to the user and, in step 605, information of successful verification is returned to the MRFC through not. If the score is less than the score threshold, a verification failure prompt tone is played to the user and information of verification failure is returned to the MRFC through a not.

In this step, according to the speaker verification request from the media control interaction module, the control module notifies the voiceprint acquisition module to acquire the corresponding voiceprint and transmit it to the speaker verification processing engine, and controls the speaker voice receiving module to play the initial prompt tone to the user and receive the user's voice input. If the speaker voice receiving module receives the user's voice input before the timer waiting for the speaker's voice input reaches its maximum duration, it transmits the voice input to the speaker verification processing engine once the voice input end detection key is received or the preset input duration elapses. If no voice input has been received when the timer reaches its maximum duration, the speaker verification processing engine is notified to determine that the verification has failed. The control module also controls the speaker verification processing engine to perform speaker verification on the received voiceprint and user voice input: the engine compares the voiceprint with the voice input according to their acoustic characteristics to generate a matching result score, and compares the score with the score threshold in the speaker verification request. If the matching result score is greater than or equal to the score threshold, the engine determines that the verification has succeeded, and the verification success prompt tone is played to the user through the speaker voice receiving module; otherwise, it determines that the verification has failed, and the verification failure prompt tone is played to the user through the speaker voice receiving module. The speaker verification processing engine sends the verification result (success or failure), together with any further verification result information such as the matching result score, the duration of the user's voice input and the voiceprint identification, to the control module, which passes the information to the media control interaction module. All the modules involved in this step are modules in the MRFP.
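The decision logic of step 604 can be sketched as a single function. This is a minimal sketch under several assumptions: the three callables are hypothetical stand-ins for voiceprint acquisition, voice reception (returning `None` when the wait timer expires) and the matching engine; a missing voiceprint is reported as an error, which is one of the two options the description allows; and the way re-prompting is combined with the timer is one possible reading, not something the text fixes. Prompt tone playback is omitted.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    ERROR = "error"  # voiceprint could not be obtained


def run_verification(
    load_voiceprint: Callable[[], Optional[bytes]],
    wait_for_voice: Callable[[], Optional[bytes]],  # None when the timer expires
    match_score: Callable[[bytes, bytes], int],     # 0-100, hypothetical engine
    score_threshold: int,
    max_no_input_prompts: int,
) -> Tuple[Outcome, Optional[int]]:
    voiceprint = load_voiceprint()
    if voiceprint is None:
        return Outcome.ERROR, None

    voice = None
    # Prompt and wait; repeat the prompt while no voice input arrives.
    # Assumption: at least one attempt is always made.
    for _ in range(max(1, max_no_input_prompts)):
        voice = wait_for_voice()
        if voice is not None:
            break
    if voice is None:
        return Outcome.FAILURE, None  # timeout is handled as verification failure

    score = match_score(voiceprint, voice)
    outcome = Outcome.SUCCESS if score >= score_threshold else Outcome.FAILURE
    return outcome, score


# The first wait times out, the second one delivers voice input.
attempts = iter([None, b"voice"])
result = run_verification(lambda: b"vp", lambda: next(attempts),
                          lambda vp, v: 82, 70, 3)
print(result)
```

The comparison uses `>=`, so a score exactly equal to the threshold counts as a successful verification, as the description requires.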

Step 605, the MRFP returns the verification result information to the MRFC, and specifically, the verification result information may be carried by the not. The verification result information at least includes information on whether the verification is successful, and may further include one or more of the following information:

1) the matching value, namely the matching result score, in the range 0-100, where 100 is the best match and 0 is the worst;

2) the duration of the voice input;

3) the voiceprint identification.

And the media control interaction module in the MRFP converts the received verification result information into a format supported by the H.248 protocol and sends the format to the MRFC.
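The verification result information of step 605 has one mandatory item and three optional ones. The sketch below shows one way to model that before protocol conversion. The field names and the flat dictionary are assumptions for illustration and do not represent the H.248 encoding.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class VerificationResult:
    """Result reported by the processing entity (hypothetical field names)."""
    success: bool                            # mandatory
    score: Optional[int] = None              # matching result score, 0-100
    input_duration_ms: Optional[int] = None  # duration of the voice input
    voiceprint_id: Optional[str] = None

    def to_params(self) -> Dict[str, Any]:
        # Always include the success flag; include optional items only when set.
        params: Dict[str, Any] = {"success": self.success}
        for key in ("score", "input_duration_ms", "voiceprint_id"):
            value = getattr(self, key)
            if value is not None:
                params[key] = value
        return params


full = VerificationResult(True, score=82, input_duration_ms=2400,
                          voiceprint_id="alice.vp")
minimal = VerificationResult(False)
print(full.to_params())
print(minimal.to_params())
```

Leaving unset optional items out of the parameter set keeps a minimal report down to the single success flag.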

Step 606, after receiving the verification result information, the MRFC returns a not.

The step specifically includes that a media control interaction module in the MRFC receives the verification result information from the MRFP, performs protocol conversion processing to obtain information that can be identified by a control module in the MRFC, and sends the information to the control module, and after receiving the information, the control module returns a not.resp message in an h.248 protocol format to the MRFP through the media control interaction module to respond.

Step 607, the MRFC converts the received verification result information into a message supported by the Mr interface between itself and the S-CSCF and sends it to the S-CSCF, which then sends the verification result information to the AS. Of course, the MRFC may also send the verification result information directly to the AS without passing through the S-CSCF.

The control module in MRFC transfers the received verification result information from the media control interaction module to the speaker verification and service information interaction module, and the speaker verification and service information interaction module returns the verification result information to AS directly or through S-CSCF according to the verification result information.
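The message exchange of steps 601 to 607 can be summarized as an ordered trace. The sketch below lists only the messages that cross an interface; step 604 is internal processing in the MRFP and therefore produces no entry. The message descriptions are paraphrases rather than protocol message names, and the type and function names are hypothetical. The trace covers the variant in which the command and the result pass through the S-CSCF.

```python
from typing import List, NamedTuple


class Message(NamedTuple):
    step: int
    sender: str
    receiver: str
    content: str


def verification_flow(success: bool) -> List[Message]:
    result = "verification succeeded" if success else "verification failed"
    return [
        Message(601, "S-CSCF", "MRFC", "speaker verification command"),
        Message(602, "MRFC", "MRFP", "speaker verification request"),
        Message(603, "MRFP", "MRFC", "response to the request"),
        Message(605, "MRFP", "MRFC", result),
        Message(606, "MRFC", "MRFP", "response to the result report"),
        Message(607, "MRFC", "S-CSCF", result),
    ]


for message in verification_flow(True):
    print(f"{message.step}: {message.sender} -> {message.receiver}: {message.content}")
```

Every message involves the MRFC on one side, which reflects its role as the single control point between the service layer and the MRFP.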

It can be seen from the solutions described in the embodiments above that, in the present invention, the media resource control entity instructs the media resource processing entity to perform speaker verification processing, and the media resource processing entity receives and verifies the voice input of the speaker according to that instruction and reports the verification result to the media resource control entity. Speaker verification is thereby implemented without changing the existing network architecture and protocol structure in which bearer and control are separated, and the cost of network upgrade is reduced.

In addition, by providing a media resource control entity comprising the speaker verification and service information interaction module, the control module and the media control interaction module, the invention enables the media resource control entity in the existing network architecture to support speaker verification processing.

Similarly, by providing a media resource processing entity comprising a media control interaction module and a speaker verification module, the invention enables the media resource processing entity in the existing network architecture to support speaker verification processing.
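The role of the speaker verification module in the media resource processing entity can be sketched as below. This is a minimal sketch under stated assumptions: the scoring function is supplied by the caller and stands in for the acoustic comparison, the "match value greater than or equal to threshold" rule is an assumed reading of how a threshold would be applied, and the names `VerificationRequest`, `verify_speaker`, `voiceprint_name` and the result keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Features = Sequence[float]


@dataclass
class VerificationRequest:
    """Internal request form after conversion from the media control protocol."""
    voiceprint_name: str
    threshold: float


def verify_speaker(
    request: VerificationRequest,
    voiceprints: Dict[str, Features],
    voice_input: Features,
    score: Callable[[Features, Features], float],
) -> dict:
    """Fetch the voiceprint, compare it with the voice input, apply the threshold."""
    voiceprint = voiceprints.get(request.voiceprint_name)
    if voiceprint is None:
        # No matching voiceprint: report an error instead of a result.
        return {"status": "error", "reason": "voiceprint not found"}
    match_value = score(voiceprint, voice_input)
    # Assumed rule: verification succeeds when the match value reaches the threshold.
    return {
        "status": "ok",
        "success": match_value >= request.threshold,
        "match_value": match_value,
    }
```

Passing the scorer in as a parameter keeps the sketch independent of any particular acoustic matching technique, which the text does not specify.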

The foregoing describes specific embodiments of the invention. The method of the invention may be modified as appropriate during implementation to suit the needs of a particular situation. It is therefore to be understood that the embodiments described are illustrative only and are not intended to limit the scope of the invention.

Claims

1. A method for speaker verification, the method comprising:

2. The method of claim 1, wherein step A comprises:

the media resource control entity sends a verification request comprising the voiceprint information of the speaker to the media resource processing entity according to a received speaker verification command comprising the voiceprint information of the speaker.

3. The method of claim 2, wherein step B comprises:

the media resource processing entity obtains the voiceprint corresponding to the voiceprint information according to the verification request, verifies the user voice input received according to the verification request through the voiceprint, and returns the verification result to the media resource control entity.

4. A method according to any of claims 2 or 3, wherein the voiceprint information comprises a voiceprint path and a voiceprint name.

5. The method of claim 3, wherein the verification request further comprises one of the following parameters: a threshold value used by the media resource processing entity to determine whether the speaker verification is successful, an initial prompt tone, a verification success prompt tone, a verification failure prompt tone, a maximum number of prompts allowed when there is no input, a maximum duration of a timer for waiting for the speaker's voice input, or a voice input end detection key; or further comprises any combination of the above parameters;

correspondingly, in the step B,

the media resource processing entity determines whether the speaker verification is successful according to the threshold and the result of matching the user voice input against the voiceprint; or prompts the user to provide voice input for verification according to the initial prompt tone parameter; or plays the verification success prompt after the verification passes according to the verification success prompt tone parameter; or plays the verification failure prompt after the verification fails according to the verification failure prompt tone parameter; or repeats the prompt after detecting that the user has provided no voice input, up to the maximum number of prompts allowed when there is no input; or waits for the user's voice input for the maximum duration of the timer and determines that the verification has failed if no voice input is received within that duration; or determines that the user's voice input has ended upon receiving the voice input end detection key entered by the user; or performs any combination of the above steps.

6. The method of claim 3, wherein the verification result in step B comprises: an indication of whether the verification is successful.

7. The method of claim 6, wherein the verification result further comprises: a match value, a duration of an input voice, or voiceprint information, or any combination thereof.

8. The method according to claim 3, wherein if the media resource processing entity does not obtain the voiceprint corresponding to the voiceprint information according to the verification request, the media resource processing entity reports error information to a media resource control entity, and then ends the procedure.

9. A speaker verification system, the system comprising a media resource control entity and a media resource processing entity;

10. The system of claim 9, further comprising:

the service server is used for sending a speaker verification command to the media resource control entity and receiving a verification result returned by the media resource control entity;

the media resource control entity receives the speaker verification command, generates a corresponding speaker verification request according to the verification command, sends the speaker verification request to the media resource processing entity, and further reports the received verification result from the media resource processing entity to the service server.

11. The system of claim 10, further comprising: a serving call session control function entity, connected between the application server and the media resource control entity, and used for receiving the speaker verification command from the application server and sending the command to the media resource control entity, and for receiving the verification result from the media resource control entity and sending the verification result to the application server.

12. The system according to any one of claims 9 to 11, wherein said media resource control entity is a media resource controller, and said media resource processing entity is a media resource processor; or the media resource control entity is a media gateway controller, and the media resource processing entity is a media gateway.

13. A media resource control entity, comprising: the speaker verification and service information interaction module, the control module and the media control interaction module; wherein,

14. The media resource control entity of claim 13, wherein the media resource control entity is a media resource controller or a media gateway controller.

15. A media resource processing entity, comprising: the media control interaction module and the speaker verification module; wherein,

the media control interaction module is used for receiving a speaker verification request in a media control protocol form from the media resource control entity, converting the speaker verification request into information which can be identified by the speaker verification module and then sending the information to the speaker verification module, receiving a verification result from the speaker verification module, converting the verification result into a media control protocol message and sending the media control protocol message to the media resource control entity;

16. The media resource processing entity of claim 15, wherein the speaker verification module comprises: a control module, a speaker verification processing engine, a voiceprint acquisition module and a speaker voice receiving module; wherein,

the control module is used for informing the voiceprint acquisition module to acquire a corresponding voiceprint according to a speaker verification request from the media control interaction module, transmitting the acquired voiceprint to the speaker verification processing engine, controlling the speaker voice receiving module to receive voice input from a user, transmitting the voice input to the speaker verification processing engine, controlling the speaker verification processing engine to perform speaker verification according to the received voiceprint and the voice input of the user, receiving a verification result returned after the verification of the speaker verification processing engine, and transmitting the verification result to the media control interaction module;

the speaker verification processing engine is used for receiving voiceprints from the voiceprint acquisition module and user voice input from the speaker voice receiving module, comparing the acquired voiceprints with the received user voice input according to acoustic characteristics to generate a verification result, and sending the verification result to the control module;

the voiceprint acquisition module is used for acquiring a corresponding voiceprint according to the control of the control module and transmitting the acquired voiceprint to the speaker verification processing engine;

the speaker voice receiving module is used for receiving the voice input of the user according to the command of the control module and transmitting the received voice to the speaker verification processing engine.

17. The media resource processing entity of claim 15 or 16, wherein the media resource processing entity is: a media resource processor or a media gateway.