CN103151041A - Method and system for achieving automatic speech recognition business and media server - Google Patents

Publication number
CN103151041A
Authority
CN
China
Prior art keywords
server
asr
media
transcoding
audio coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013100321347A
Other languages
Chinese (zh)
Other versions
CN103151041B (en)
Inventor
张伟
程佳佳
崔飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201310032134.7A (CN103151041B)
Publication of CN103151041A
Priority to PCT/CN2013/082219 (WO2013189430A2)
Application granted
Publication of CN103151041B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/173 Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4936 Speech interaction details

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Telephonic Communication Services (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for implementing an automatic speech recognition (ASR) service. In the method, a media server receives an access request from an application (APP) server and determines the set of audio codec types that the media server itself supports; after receiving an ASR service request sent by the APP server, the media server applies to an ASR server for ASR service resources according to the ASR service type; and the media server negotiates with the ASR server on the basis of that set of audio codec types, transcodes the media service data packets according to the negotiated audio codec type, and sends the transcoded packets to the ASR server. The invention further discloses a system for implementing the ASR service and a media server. The invention solves the problem that the ASR server cannot access the data in the media service data packets when the audio codec capability negotiated between the media server and the terminal does not satisfy the ASR server, and thereby ensures that the ASR service can be carried out.

Description

Method, system, and media server for implementing an automatic speech recognition service
Technical field
The present invention relates to automatic speech recognition (ASR) technology in the communications field, and in particular to a method, a system, and a media server for implementing an ASR service.
Background
A media server (Media Server, MS) is a standalone device that provides dedicated media resource functions in a softswitch system, and it is also an important device in packet networks. It provides the media processing functions for basic and enhanced services and handles all media processing operations related to audio and video, including conversion between audio/video Real-time Transport Protocol (RTP) data and audio/video files. The media server also receives the dual-tone multi-frequency (DTMF) input that a user enters through a terminal, plays service guidance prompts, and displays dynamic guidance screens. Its Session Initiation Protocol (SIP) and MSML/MOML protocol capabilities allow the media server to carry out an entire session under the control of an application server (APP server) and to interact with the user.
The media management module (MSCU) is an important module in the media server. It is mainly used to perform capability negotiation with other entities, to manage and maintain its own resources, and to control the other service resource modules in carrying out complex services.
The media storage and audio transmission module (MSTU) is a service resource module in the media server. It stores large volumes of audio data and implements playback of audio files. The MSTU has an external network interface through which it can send and receive audio data directly.
In the prior art, media servers are widely used; their main functions can be summarized as audio and video playback, digit collection, and conferencing.
ASR recognizes input audio information, converts it into text, and reports the text to the user. At present, in the telecommunications field, ASR applications are usually implemented by a specially configured ASR server, and the designated ASR server sends the recognized text to the user side by signaling; delivering the text to the user's terminal, for example, completes one ASR service.
Fig. 1 is a schematic diagram of the system architecture for implementing the ASR service in the prior art. As shown in Fig. 1, the system comprises a terminal, an APP server, a media server, and an ASR server. The method flow based on the system of Fig. 1 comprises the following steps:
Step 101: The terminal initiates a call, which triggers the APP server to activate the APP service;
Step 102: The APP server requests the ASR service from the media server through SIP signaling;
Step 103: The media server requests ASR resources from the ASR server through SIP signaling, and controls the ASR server to perform the corresponding service through the Media Resource Control Protocol (MRCP);
Step 104: The terminal sends media service data packets to the ASR server, and the ASR server reports the recognized text to the media server.
The above is the typical networking structure and service flow for ASR at present. The ASR server is a device external to the media server. When requesting the ASR service, the APP server sends its request only to the media server. The media server determines the current service type and, when it is an ASR application, sends a request to the ASR server to apply for resources and controls the behavior of the ASR server. After receiving the signaling, the ASR server waits for media input, automatically recognizes the media information as text, and sends the text to the media server through MRCP.
However, as service applications expand, this existing method shows a defect: if the audio capability set of the ASR server does not match that of the terminal, the ASR service fails. When the APP server carries out Session Description Protocol (SDP) negotiation with the media server, the media server does not yet know whether the current service type is ASR, so it negotiates audio parameters with the terminal according to its own capabilities. Only when the APP server sends an INFO instruction to the media server can the media server identify the ASR service type; at that point the media server applies to the ASR server for resources using the terminal's SDP information. If the audio codec capability of the ASR server differs from the result the media server negotiated with the terminal, for example when the media server and the terminal have agreed on the AMR format but the ASR server supports only the G711 audio format, the ASR server fails to access the data in the media service data packets, and the ASR service ultimately fails.
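The failure mode described above can be illustrated with a minimal sketch. The function name and the codec lists are hypothetical and chosen only to mirror the AMR/G711 example in the text; the sketch models the prior-art behaviour in which the codec agreed with the terminal is reused toward the ASR server without transcoding.

```python
def prior_art_asr_setup(terminal_codecs, media_server_codecs, asr_codecs):
    """Hypothetical model of the prior-art flow (no transcoding step)."""
    # SDP negotiation with the terminal happens before the media server
    # knows that the service type is ASR.
    agreed = next((c for c in media_server_codecs if c in terminal_codecs), None)
    if agreed is None:
        return None, "no common codec with terminal"
    # The terminal's SDP information is reused toward the ASR server,
    # so the ASR server must itself support the agreed codec.
    if agreed not in asr_codecs:
        return agreed, "ASR failure: ASR server cannot decode " + agreed
    return agreed, "ok"


# Terminal and media server agree on AMR, but the ASR server supports only G711.
print(prior_art_asr_setup({"AMR"}, ["AMR", "G711"], {"G711"}))
```

In this sketch the call succeeds only when the codec agreed with the terminal happens to be one the ASR server also supports, which is the dependency the invention sets out to remove.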
Summary of the invention
In view of this, the main purpose of the present invention is to provide a method, a system, and a media server for implementing an ASR service that solve the problem of the ASR server being unable to access the data in media service data packets when the audio codec capability negotiated between the media server and the terminal does not satisfy the ASR server, and thereby ensure that the ASR service can be carried out.
To achieve the above purpose, the technical solution of the present invention is realized as follows:
The invention provides a method for implementing an automatic speech recognition (ASR) service, the method comprising:
after receiving an access request from the APP server, the media server determines the set of audio codec types that it supports;
after receiving an ASR service request sent by the APP server, the media server applies to the ASR server for ASR service resources according to the ASR service type;
the media server negotiates with the ASR server on the basis of said set of audio codec types, transcodes the media service data packets using the audio codec type obtained through the negotiation, and sends the transcoded media service data packets to the ASR server.
Wherein the media server negotiating with the ASR server, transcoding the media service data packets using the negotiated audio codec type, and sending the transcoded media service data packets to the ASR server comprises:
the media management module (MSCU) in the media server sends Session Initiation Protocol (SIP) signaling to the ASR server to negotiate, and the audio codec type matched between the media server and the ASR server is designated;
the voice center interaction module (MRU) in the media server receives the media service data packets sent by the terminal, transcodes them according to the negotiated audio codec type, and sends the transcoded packets to the media storage and audio transmission module (MSTU) in the media server;
the MSCU controls the MSTU to send the transcoded media service data packets to the ASR server.
Wherein the media server negotiating with the ASR server on the basis of said set of audio codec types to obtain the audio codec type comprises:
the media server sends SIP signaling to the ASR server; after receiving the SIP signaling, the ASR server judges whether any audio codec type it supports is present in the audio codec capability set supported by the media server; if a matching audio codec type exists, the ASR server notifies the media server, and both sides designate the matching type as the audio codec type to be used for subsequently transcoding the media service data packets; if no matching audio codec type exists, the current ASR service flow ends.
In the above solution, after the media server receives the access request from the APP server, the method further comprises:
the terminal sends a media service data packet request to the APP server; the APP server sends access-request signaling to the media server according to that request, after which the media server designates the address at which it interacts with the terminal.
Wherein the media server transcoding the media service data packets and sending the transcoded packets to the ASR server comprises:
the MSCU in the media server notifies the MSTU to open a NAT channel;
the MSCU in the media server issues a transcoding command to the MRU;
the MSCU in the media server establishes a link with the ASR server and notifies the ASR server to wait for audio input and perform audio recognition;
the MRU in the media server transcodes the data in the media service data packets sent by the terminal and sends the transcoded packets through the MRU's internal port to the receiving port of the MSTU;
the MSTU in the media server performs NAT on the transcoded media service data packets and sends them to the ASR server.
The present invention also provides a system for implementing an ASR service, the system comprising a media server, an APP server, and an ASR server, wherein:
the media server is configured to determine the set of audio codec types that it supports after receiving an access request from the APP server; to apply to the ASR server for ASR service resources according to the ASR service type after receiving an ASR service request sent by the APP server; and to negotiate with the ASR server on the basis of said set of audio codec types, transcode the media service data packets using the audio codec type obtained through the negotiation, and send the transcoded media service data packets to the ASR server;
the APP server is configured to send the access request and the ASR service request to the media server;
the ASR server is configured to negotiate with the media server and to receive the transcoded media service data packets sent by the media server.
Further, the system also comprises a terminal, configured to send a media service data packet request to the APP server after the media server receives the access request from the APP server; correspondingly,
the APP server is further configured to send access-request signaling to the media server according to the media service data packet request;
the media server is further configured to designate the address at which it interacts with the terminal after receiving the access-request signaling.
Wherein the media server negotiating with the ASR server on the basis of said set of audio codec types, transcoding the media service data packets using the negotiated audio codec type, and sending the transcoded media service data packets to the ASR server comprises:
the MSCU in the media server sends SIP signaling to the ASR server to negotiate, and the audio codec type matched between the media server and the ASR server is designated;
the MRU in the media server receives the media service data packets sent by the terminal, transcodes them according to the negotiated audio codec type, and sends the transcoded packets to the MSTU in the media server;
the MSCU controls the MSTU to send the transcoded media service data packets to the ASR server.
Further, the media server comprises an MSCU, an MRU, and an MSTU, wherein:
the MSCU is configured to send SIP signaling to the ASR server to negotiate, to designate the audio codec type matched between the media server and the ASR server, and to control the MSTU to send the transcoded media service data packets;
the MRU is configured to receive the media service data packets sent by the terminal, transcode them according to the negotiated audio codec type, and send the transcoded packets to the MSTU in the media server;
the MSTU is configured to send the transcoded media service data packets to the ASR server under the control of the MSCU.
The present invention also provides a media server, configured to determine the set of audio codec types that it supports after receiving an access request from the APP server; to apply to the ASR server for ASR service resources according to the ASR service type after receiving an ASR service request sent by the APP server; and to negotiate with the ASR server on the basis of said set of audio codec types, transcode the media service data packets using the audio codec type obtained through the negotiation, and send the transcoded media service data packets to the ASR server.
With the method, system, and media server provided by the invention, the media server determines the set of audio codec types that it supports after receiving an access request from the APP server; after receiving an ASR service request sent by the APP server, it applies to the ASR server for ASR service resources according to the ASR service type; it then negotiates with the ASR server on the basis of said set of audio codec types, transcodes the media service data packets using the audio codec type obtained through the negotiation, and sends the transcoded packets to the ASR server. Through this negotiation the invention determines an audio codec type that both the media server and the ASR server support, and the media service data packets are sent to the ASR server after being encoded with that type. In the negotiation, the media server does not use the audio codec types supported by the terminal as the capability set on which the negotiation is based; instead it uses all of the audio codec types that the media server itself supports. The invention therefore solves the problem of the ASR server failing to access media service data packets when the audio codec capability negotiated between the media server and the terminal cannot satisfy the ASR server, improves the success rate with which the ASR server accesses the media service data packets, and ensures that the ASR service can be carried out.
Description of drawings
Fig. 1 is a schematic diagram of the system architecture for implementing the ASR service in the prior art;
Fig. 2 is a schematic flowchart of an embodiment of the ASR service implementation method of the present invention;
Fig. 3 is a schematic flowchart of an embodiment of the method in which the media server negotiates with the ASR server, transcodes the media service data packets using the negotiated audio codec type, and sends the transcoded media service data packets to the ASR server;
Fig. 4 is a schematic diagram of the system architecture with which the present invention implements the ASR service;
Fig. 5 is a schematic structural diagram of an embodiment of the media server of the present invention.
Detailed description of the embodiments
The basic idea of the present invention is as follows: after receiving an access request from the APP server, the media server determines the set of audio codec types that it supports; after receiving an ASR service request sent by the APP server, the media server applies to the ASR server for ASR service resources according to the ASR service type; the media server negotiates with the ASR server on the basis of said set of audio codec types, transcodes the media service data packets using the audio codec type obtained through the negotiation, and sends the transcoded media service data packets to the ASR server.
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 2 is a schematic flowchart of an embodiment of the ASR service implementation method of the present invention. As shown in Fig. 2, the method comprises the following steps:
Step 201: After receiving an access request from the APP server, the media server determines the set of audio codec types that it supports;
Specifically, the APP server sends INVITE signaling to the media server to carry out media negotiation, and the media server selects, from the audio codec capability set it supports, the audio codec types it has in common with the terminal, for effective transmission of media service data packets with the terminal. This step can be implemented with existing techniques and is not described in detail here.
Further, after the media server receives the access request from the APP server in this step, the method also comprises: the terminal sends a media service data packet request to the APP server; the APP server sends access-request signaling to the media server according to that request, after which the media server designates the address at which it interacts with the terminal. The interaction address is the external port address of the MSTU.
Step 202: After receiving an ASR service request sent by the APP server, the media server applies to the ASR server for ASR service resources according to the ASR service type;
Specifically, the APP server sends an INFO instruction to the media server; from the INFO instruction the media server determines that the service type the APP server is requesting is ASR, and then applies to the ASR server for ASR service resources according to the ASR service type.
Step 203: The media server negotiates with the ASR server on the basis of said set of audio codec types, transcodes the media service data packets using the audio codec type obtained through the negotiation, and sends the transcoded media service data packets to the ASR server;
Specifically, the MSCU in the media server sends SIP signaling to the ASR server to negotiate, and the audio codec type matched between the media server and the ASR server is designated; the MRU in the media server receives the media service data packets sent by the terminal, transcodes them according to the negotiated audio codec type, and sends the transcoded packets to the MSTU in the media server; the MSCU controls the MSTU to send the transcoded media service data packets to the ASR server.
In actual operation, as shown in Fig. 3, the method of step 203 can comprise the following steps:
Step 301: The MSCU in the media server sends SIP signaling to the ASR server to negotiate the audio codec type with the ASR server;
Here, the SIP signaling carries the audio codec capability set supported by the media server, that is, all the audio codec types supported by the voice center interaction module (MRU) in the media server. After receiving the SIP signaling, the ASR server judges whether any audio codec type it supports is present in the audio codec capability set supported by the media server. If a matching audio codec type exists, the ASR server notifies the media server, and both sides designate the matching type as the audio codec type to be used for subsequently transcoding the media service data packets; if two or more matching audio codec types exist, one of them is selected for that purpose. If no matching audio codec type exists, the current ASR service flow ends.
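As an illustration of how a capability set might be carried in the SDP body of the SIP signaling, the sketch below builds an audio media description that lists every codec type. The patent does not give the SDP contents, so the function name, port number, and codec list are assumptions; the static payload type numbers for PCMU (0), PCMA (8), and G729 (18) follow RFC 3551, where PCMU and PCMA are the RTP names for G.711 mu-law and A-law, and AMR is given a dynamic payload type.

```python
# Static RTP payload types from RFC 3551; other codecs get dynamic numbers.
STATIC_PAYLOAD_TYPES = {"PCMU": 0, "PCMA": 8, "G729": 18}


def build_audio_offer(codecs, rtp_port, first_dynamic_pt=96):
    """Build the m=audio line and rtpmap attributes for a list of codec names.

    Hypothetical helper: every codec here is assumed to use an 8000 Hz clock.
    """
    payload_types, rtpmaps = [], []
    next_dynamic = first_dynamic_pt
    for name in codecs:
        pt = STATIC_PAYLOAD_TYPES.get(name)
        if pt is None:
            # Codec without a static assignment: allocate a dynamic payload type.
            pt = next_dynamic
            next_dynamic += 1
        payload_types.append(str(pt))
        rtpmaps.append(f"a=rtpmap:{pt} {name}/8000")
    m_line = f"m=audio {rtp_port} RTP/AVP " + " ".join(payload_types)
    return "\r\n".join([m_line] + rtpmaps)


# Offer the media server's whole (assumed) capability set, not just one codec.
print(build_audio_offer(["PCMA", "PCMU", "AMR"], 40000))
```

Listing every supported codec in the offer is what lets the ASR server pick one it can actually decode.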
In this embodiment of the invention, the media server does not use the audio codec types supported by the terminal as the capability set on which the negotiation is based; it uses all of the audio codec types that the media server itself supports.
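The negotiation rule of step 301 can be sketched as a simple membership check. The function name and codec lists are hypothetical, and choosing the first match in the media server's order is an assumption; the text only says that one of the matching types is selected.

```python
def negotiate_with_asr(media_server_capability_set, asr_supported):
    """Return a codec both sides support, or None if the ASR flow must end.

    media_server_capability_set: every codec the media server (its MRU) supports,
    in an assumed preference order. The terminal's agreed codec plays no role.
    """
    for codec in media_server_capability_set:
        if codec in asr_supported:
            return codec  # first match wins (assumed tie-break rule)
    return None  # no common codec: the current ASR service flow ends


# The terminal leg may be using AMR, yet the ASR leg still settles on G711.
print(negotiate_with_asr(["AMR", "G711", "G729"], {"G711"}))
print(negotiate_with_asr(["AMR"], {"G711"}))
```

Because the negotiation draws on the media server's full capability set, the result is independent of whatever codec was agreed with the terminal; the media server bridges the two legs by transcoding.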
Step 302: The MSCU in the media server notifies the MSTU to open a network address translation (NAT) channel;
Here, the MSCU issues a command to the MSTU to open the NAT channel.
Step 303: The MSCU in the media server issues a transcoding command to the MRU;
Specifically, the MSCU in the media server notifies the MRU to receive the media service data packets sent by the terminal, specifies that the audio codec type of the port connecting the MRU to the ASR server is the type negotiated in step 301, and specifies that the audio codec type on which the MRU bases its transcoding is likewise the type negotiated in step 301.
Step 304: The MSCU in the media server establishes a link with the ASR server and notifies the ASR server to wait for audio input and perform audio recognition;
Here, the MSCU establishes a TCP/IP link with the ASR server and sends MRCP instructions over it to notify the ASR server to wait for audio input and perform audio recognition.
Step 305: The MRU in the media server transcodes the data in the media service data packets sent by the terminal and sends the transcoded packets, that is, the audio media service data, through the MRU's internal port to the receiving port of the MSTU;
Step 306: After receiving the transcoded media service data packets sent by the MRU, the MSTU in the media server performs NAT on them and sends them to the ASR server.
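The ordering of steps 302 to 306 can be sketched with a few toy classes. Everything here is illustrative: the class and method names are hypothetical, the transcoder is a placeholder that only relabels the packet, and the MRCP link of step 304 is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    codec: str
    payload: bytes


@dataclass
class MSTU:
    nat_open: bool = False
    sent_to_asr: list = field(default_factory=list)

    def open_nat_channel(self):  # step 302
        self.nat_open = True

    def forward(self, packet):  # step 306
        if not self.nat_open:
            raise RuntimeError("NAT channel not open")
        self.sent_to_asr.append(packet)


@dataclass
class MRU:
    target_codec: str = ""

    def set_transcoding(self, codec):  # step 303
        self.target_codec = codec

    def transcode(self, packet):  # step 305
        # Placeholder: a real MRU would decode and re-encode the audio payload.
        return Packet(self.target_codec, packet.payload)


def run_asr_media_path(negotiated_codec, terminal_packets):
    """Play the MSCU's role: configure MSTU and MRU, then relay packets."""
    mstu, mru = MSTU(), MRU()
    mstu.open_nat_channel()                # step 302
    mru.set_transcoding(negotiated_codec)  # step 303
    # Step 304 (MRCP link to the ASR server) is not modelled in this sketch.
    for pkt in terminal_packets:
        mstu.forward(mru.transcode(pkt))   # steps 305 and 306
    return mstu.sent_to_asr


out = run_asr_media_path("G711", [Packet("AMR", b"\x01"), Packet("AMR", b"\x02")])
print([p.codec for p in out])
```

The point of the sketch is the division of labour: the MSCU only issues control commands, the MRU changes the codec, and the MSTU is the sole module that sends media toward the ASR server.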
After step 203, the method further comprises: the ASR server converts the received media service data packets into text and sends the text to the media server through MRCP; the media server reports the INFO execution result to the APP server, and at the same time the APP server sends BYE signaling to the media server to release resources; the media server requests the ASR server to release resources and afterwards returns the result to the APP server, and the ASR service ends.
The present invention also provides a system for implementing an ASR service. As shown in Fig. 4, the system comprises a media server, an APP server, and an ASR server, wherein:
the media server is configured to determine the set of audio codec types that it supports after receiving an access request from the APP server; to apply to the ASR server for ASR service resources according to the ASR service type after receiving an ASR service request sent by the APP server; and to negotiate with the ASR server on the basis of said set of audio codec types, transcode the media service data packets using the audio codec type obtained through the negotiation, and send the transcoded media service data packets to the ASR server;
the APP server is configured to send the access request and the ASR service request to the media server;
the ASR server is configured to negotiate with the media server and to receive the transcoded media service data packets sent by the media server.
Further, the system also comprises a terminal, configured to send a media service data packet request to the APP server after the media server receives the access request from the APP server; correspondingly,
the APP server is further configured to send access-request signaling to the media server according to the media service data packet request;
the media server is further configured to designate the address at which it interacts with the terminal after receiving the access-request signaling.
Wherein the media server negotiating with the ASR server on the basis of said set of audio codec types, transcoding the media service data packets using the negotiated audio codec type, and sending the transcoded media service data packets to the ASR server comprises:
the MSCU in the media server sends SIP signaling to the ASR server to negotiate, and the audio codec type matched between the media server and the ASR server is designated;
the MRU in the media server receives the media service data packets sent by the terminal, transcodes them according to the negotiated audio codec type, and sends the transcoded packets to the MSTU in the media server;
the MSCU controls the MSTU to send the transcoded media service data packets to the ASR server.
Correspondingly, as shown in Fig. 5, the media server comprises an MSCU, an MRU, and an MSTU, wherein:
the MSCU is configured to send SIP signaling to the ASR server to negotiate, to designate the audio codec type matched between the media server and the ASR server, and to control the MSTU to send the transcoded media service data packets;
the MRU is configured to receive the media service data packets sent by the terminal, transcode them according to the negotiated audio codec type, and send the transcoded packets to the MSTU in the media server;
the MSTU is configured to send the transcoded media service data packets to the ASR server under the control of the MSCU.
The present invention also provides a media server, configured to determine the set of audio codec types that it supports after receiving an access request from the APP server; to apply to the ASR server for ASR service resources according to the ASR service type after receiving an ASR service request sent by the APP server; and to negotiate with the ASR server on the basis of said set of audio codec types, transcode the media service data packets using the audio codec type obtained through the negotiation, and send the transcoded media service data packets to the ASR server.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection.

Claims (10)

1. A method for implementing an automatic speech recognition (ASR) service, characterized in that the method comprises:
After receiving an access request from the APP server, the media server determines the set of audio codec types that it supports;
After receiving an ASR service request sent by the APP server, the media server applies to the ASR server for ASR service resources according to the ASR service type;
The media server negotiates with the ASR server according to the set of audio codec types, transcodes media service packets according to the audio codec type obtained through the negotiation, and sends the transcoded media service packets to the ASR server.
2. The method for implementing an ASR service according to claim 1, characterized in that the media server negotiating with the ASR server, transcoding the media service packets according to the audio codec type obtained through the negotiation, and sending the transcoded media service packets to the ASR server comprises:
The media management module (MSCU) in the media server sends Session Initiation Protocol (SIP) signaling to the ASR server to negotiate, and designates the audio codec type on which the media server and the ASR server match;
The voice center interaction module (MRU) in the media server receives the media service packets sent by the terminal, transcodes them according to the negotiated audio codec type, and sends the transcoded media service packets to the media storage and audio transmission module (MSTU) in the media server;
The MSCU controls the MSTU to send the transcoded media service packets to the ASR server.
3. The method for implementing an ASR service according to claim 1, characterized in that the media server negotiating with the ASR server according to the set of audio codec types to obtain the audio codec type comprises:
The media server sends SIP signaling to the ASR server; after receiving the SIP signaling, the ASR server determines whether an audio codec type that it supports is present in the audio codec capability set supported by the media server; if a matching audio codec type exists, the ASR server notifies the media server, and both sides designate the matching audio codec type as the audio codec type subsequently used to transcode the media service packets; if no matching audio codec type exists, the current ASR service flow ends.
4. The method for implementing an ASR service according to claim 1, 2 or 3, characterized in that, after the media server receives the access request from the APP server, the method further comprises:
The terminal sends a media service packet request to the APP server; the APP server sends access request signaling to the media server according to the media service packet request, after which the media server designates the address at which it interacts with the terminal.
5. The method for implementing an ASR service according to claim 2, characterized in that the media server transcoding the media service packets and sending the transcoded media service packets to the ASR server comprises:
The MSCU in the media server notifies the MSTU to open a NAT channel;
The MSCU in the media server issues a transcoding command to the MRU;
The MSCU in the media server establishes a link with the ASR server and notifies the ASR server to wait for audio input and perform audio recognition;
The MRU in the media server transcodes the data in the media service packets sent by the terminal, and sends the transcoded media service packets to the receiving port of the MSTU through the internal port of the MRU;
The MSTU in the media server performs NAT on the transcoded media service packets and sends them to the ASR server.
6. A system for implementing an ASR service, characterized in that the system comprises a media server, an APP server, and an ASR server, wherein:
The media server is configured to: determine the set of audio codec types that it supports after receiving an access request from the APP server; apply to the ASR server for ASR service resources according to the ASR service type after receiving an ASR service request sent by the APP server; and negotiate with the ASR server according to the set of audio codec types, transcode media service packets according to the audio codec type obtained through the negotiation, and send the transcoded media service packets to the ASR server;
The APP server is configured to send the access request and the ASR service request to the media server;
The ASR server is configured to negotiate with the media server and to receive the transcoded media service packets sent by the media server.
7. The system for implementing an ASR service according to claim 6, characterized in that the system further comprises a terminal configured to send a media service packet request to the APP server after the media server receives the access request from the APP server; accordingly,
The APP server is further configured to send access request signaling to the media server according to the media service packet request;
The media server is further configured to designate, after receiving the access request signaling, the address at which it interacts with the terminal.
8. The system for implementing an ASR service according to claim 6 or 7, characterized in that the media server negotiating with the ASR server according to the set of audio codec types, transcoding the media service packets according to the audio codec type obtained through the negotiation, and sending the transcoded media service packets to the ASR server comprises:
The MSCU in the media server sends SIP signaling to the ASR server to negotiate, and designates the audio codec type on which the media server and the ASR server match;
The MRU in the media server receives the media service packets sent by the terminal, transcodes them according to the negotiated audio codec type, and sends the transcoded media service packets to the MSTU in the media server;
The MSCU controls the MSTU to send the transcoded media service packets to the ASR server.
9. The system for implementing an ASR service according to claim 8, characterized in that the media server further comprises an MSCU, an MRU, and an MSTU, wherein:
The MSCU is configured to send SIP signaling to the ASR server to negotiate, to designate the audio codec type on which the media server and the ASR server match, and to control the MSTU to send the transcoded media service packets;
The MRU is configured to receive the media service packets sent by the terminal, transcode them according to the negotiated audio codec type, and send the transcoded media service packets to the MSTU in the media server;
The MSTU is configured to send the transcoded media service packets to the ASR server under the control of the MSCU.
10. A media server, characterized in that the media server is configured to: determine the set of audio codec types that it supports after receiving an access request from the APP server; apply to the ASR server for ASR service resources according to the ASR service type after receiving an ASR service request sent by the APP server; and negotiate with the ASR server according to the set of audio codec types, transcode media service packets according to the audio codec type obtained through the negotiation, and send the transcoded media service packets to the ASR server.
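The final forwarding step recited in claim 5, in which the MSTU performs NAT on transcoded packets received from the MRU and sends them to the ASR server, can be illustrated with the following Python sketch. The claims do not specify how the translation is done, so this is only one possible reading: the names `Datagram` and `NatChannel` are hypothetical, the addresses come from documentation and private ranges, and rewriting both source and destination addresses is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Address = Tuple[str, int]  # (IP address, port)


@dataclass(frozen=True)
class Datagram:
    src: Address
    dst: Address
    payload: bytes


@dataclass(frozen=True)
class NatChannel:
    """Hypothetical NAT channel opened on the MSTU at the MSCU's request."""
    external_addr: Address  # MSTU address visible to the ASR server (assumed)
    asr_addr: Address       # ASR server address learned during negotiation (assumed)

    def translate(self, datagram: Datagram) -> Datagram:
        # Rewrite the internal MRU-to-MSTU addressing so the packet appears
        # to come from the MSTU's external address and targets the ASR server.
        return replace(datagram, src=self.external_addr, dst=self.asr_addr)
```

The payload is left untouched here because transcoding has already been done by the MRU; only the addressing changes.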
CN201310032134.7A 2013-01-28 2013-01-28 Method, system, and media server for implementing automatic speech recognition service Active CN103151041B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310032134.7A CN103151041B (en) 2013-01-28 2013-01-28 A kind of implementation method of automatic speech recognition business, system and media server
PCT/CN2013/082219 WO2013189430A2 (en) 2013-01-28 2013-08-23 Method, system, and media server for implementing automatic speech recognition service

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310032134.7A CN103151041B (en) 2013-01-28 2013-01-28 Method, system, and media server for implementing automatic speech recognition service

Publications (2)

Publication Number Publication Date
CN103151041A true CN103151041A (en) 2013-06-12
CN103151041B CN103151041B (en) 2016-02-10

Family

ID=48549063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310032134.7A Active CN103151041B (en) 2013-01-28 2013-01-28 Method, system, and media server for implementing automatic speech recognition service

Country Status (2)

Country Link
CN (1) CN103151041B (en)
WO (1) WO2013189430A2 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1606772A (en) * 2002-04-10 2005-04-13 三菱电机株式会社 Method for distributed automatic speech recognition and distributed automatic speech recognition system
CN1633129A (en) * 2005-01-12 2005-06-29 北京邮电大学 A media server based on soft switch
CN1764190A (en) * 2004-10-22 2006-04-26 微软公司 Distributed speech service
CN1801322A (en) * 2004-11-19 2006-07-12 国际商业机器公司 Method and system for transcribing speech on demand using a transcription portlet
CN1984201A (en) * 2005-12-13 2007-06-20 国际商业机器公司 Voice services system and method
CN101437047A (en) * 2008-12-09 2009-05-20 中兴通讯股份有限公司 Method, system and media server for playback/ sound-recording for user terminal
CN102231734A (en) * 2011-06-22 2011-11-02 中兴通讯股份有限公司 Method, device and system for realizing audio transcoding of TTS (Text To Speech)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103151041B (en) * 2013-01-28 2016-02-10 中兴通讯股份有限公司 Method, system, and media server for implementing automatic speech recognition service

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013189430A2 (en) * 2013-01-28 2013-12-27 中兴通讯股份有限公司 Method, system, and media server for implementing automatic speech recognition service
WO2013189430A3 (en) * 2013-01-28 2014-02-20 中兴通讯股份有限公司 Method, system, and media server for implementing automatic speech recognition service
CN105206273A (en) * 2015-09-06 2015-12-30 上海智臻智能网络科技股份有限公司 Voice transmission control method and system
CN105206273B (en) * 2015-09-06 2019-05-10 上海智臻智能网络科技股份有限公司 Voice transfer control method and system
CN107659415A (en) * 2016-07-25 2018-02-02 中兴通讯股份有限公司 A kind of managing medium resource method and device of cloud meeting
CN109429068A (en) * 2017-09-01 2019-03-05 成都鼎桥通信技术有限公司 Coding and decoding video method for processing business and equipment
CN109429068B (en) * 2017-09-01 2020-09-29 成都鼎桥通信技术有限公司 Video coding and decoding service processing method and device
CN107820324A (en) * 2017-10-30 2018-03-20 铱方科技(深圳)有限公司 Mobile terminal receives method, system and its binding method of landline telephone call, system

Also Published As

Publication number Publication date
CN103151041B (en) 2016-02-10
WO2013189430A3 (en) 2014-02-20
WO2013189430A2 (en) 2013-12-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant