CN111917726A - Adaptation layer, voice communication system and control method thereof - Google Patents

Info

Publication number
CN111917726A
Authority
CN
China
Prior art keywords
message
intelligent voice
adaptation layer
media resource
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010618280.8A
Other languages
Chinese (zh)
Other versions
CN111917726B (en)
Inventor
艾巍
邵小亮
谢隆飞
张�浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd
Priority to CN202010618280.8A
Publication of CN111917726A
Application granted
Publication of CN111917726B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066 Session management
    • H04L65/1101 Session protocols
    • H04L65/1104 Session initiation protocol [SIP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/65 Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application provides an adaptation layer, a voice communication system and a control method thereof. The adaptation layer comprises a routing adapter, a tone adapter and a resource monitor. The media resource server cluster and the intelligent voice server cluster establish an N:N construction relationship through the adaptation layer, where N is a positive integer greater than 1. In the handshake stage and the service request stage, the media resource servers in the media resource server cluster exchange information with the corresponding intelligent voice servers in the intelligent voice server cluster through the adaptation layer. The voice communication system constructs an adaptation layer in the middle; without modifying the large platforms of two third-party companies, namely the media resource server and the intelligent voice product, the adaptation layer can be adjusted flexibly and at low cost to meet the complex, personalized application requirements of multiple tenants. Because the adaptation layer controls the underlying communication of voice interaction, it can be iterated on quickly as requirements change, so that more applications can be built flexibly.

Description

Adaptation layer, voice communication system and control method thereof
Technical Field
The present application relates to the field of voice transmission technologies, and in particular, to an adaptation layer, a voice communication system, and a control method thereof.
Background
With the development of artificial intelligence technology, intelligent outbound calling based on ASR and TTS is widely applied. In the mainstream industry architecture, a media resource server receives the user's voice and plays the speech synthesized by an intelligent voice service; the intelligent voice service recognizes the user's voice, forwards the recognition result to the natural language processing module, and synthesizes speech from the text returned by that module. The media resource server and the intelligent voice service communicate through the SIP, SDP, MRCP and RTP protocols. In the handshake stage, SIP and SDP are used to negotiate service capabilities, TCP and UDP communication ports, communication channels and the like; in the interaction stage, MRCP controls the interaction of media resources (text, voice and the like) and RTP transmits the media resource stream. There are several key configurations on the media resource server side: 1. the tone, which specifies the tone used for speech synthesized by the TTS of the intelligent voice service; 2. the MRCP server address and port of the intelligent voice service, to which the media resource server sends messages and data; 3. the number of ASR and TTS lines, which specifies the number of channels established with the intelligent voice service, where one channel supports one TTS service or one ASR service.
As described above, the media resource server and the intelligent voice service directly establish a one-to-one communication channel, and for a large-scale intelligent outbound call system, when there are many tenants and there are many tenant individual requirements, this construction mode has the following disadvantages:
1. Multiple intelligent voice products cannot be supported. Different intelligent voice products have their own strengths and weaknesses, and their service prices differ. Different tenants need to select different intelligent voice products according to their own application scenarios and budgets. At present there are two ways to use multiple intelligent voice products. One is to deploy several sets of media resource servers and intelligent voice services, each in a 1:1 construction relationship; this is costly and complex to manage. The other is to modify the media resource server so that it connects to several intelligent voice services; this depends on the third-party company that provides the media resource server, so it is costly, carries no independent property rights, and cannot be controlled. Moreover, migrating many business elements onto the media resource server platform is risky: a small change affects the whole system, and the result is neither lightweight nor flexible.
2. Multiple timbres cannot be supported simultaneously. Different tenants require different timbres, and some require custom timbres. The tone parameter is currently set and transmitted by the media resource server platform, so multiple tones cannot be configured and different tones cannot be set for different tenants, which limits functionality.
3. The resource occupation of ASR and TTS cannot be effectively monitored. Of the products currently known, the ZTE (Zhongxing) media resource server can monitor the number of ASR and TTS lines in use at a given point in time, and Ali supports monitoring only the number of concurrent ASR sessions. The drawback of these monitoring methods is that they count only concurrency. The key data transmitted in the intelligent voice service is the voice stream, and a real-time voice stream lasts several seconds or even tens of seconds, occupying ASR and TTS resources throughout. Service duration should therefore be counted in addition to concurrency, so that resource consumption and cost can be calculated accurately.
Disclosure of Invention
The present application provides an adaptation layer, a voice communication system and a control method thereof, aiming at the disadvantages of the existing methods, so as to solve the above problems and provide a basis for future application function expansion.
In a first aspect, an embodiment of the present application provides an adaptation layer for a multi-intelligent voice product, including a routing adapter, a tone adapter, and a resource monitor;
the media resource server cluster and the intelligent voice server cluster establish an N:N construction relationship through the adaptation layer, where N is a positive integer greater than 1;
in the handshaking stage and the service requesting stage, the media resource servers in the media resource server cluster perform information interaction with the intelligent voice servers in the corresponding intelligent voice server cluster through the adaptation layer.
In one embodiment of the application, in the handshake phase, the adaptation layer forwards the information of the media resource server and the intelligent voice server by using the SIP protocol and the SDP protocol;
in the service requesting stage, the adaptation layer forwards the information of the media resource server and the intelligent voice server by using the MRCP protocol and the RTP protocol.
In one embodiment of the present application, messages of the SIP protocol, the SDP protocol, and the MRCP protocol are transmitted using TCP; the messages of the RTP protocol are transmitted using UDP.
In one embodiment of the application, in the request service stage the tone adapter selects the corresponding tone number according to the tenant identity in the outbound task parameters, modifies the request message accordingly, and forwards the communication.
In one embodiment of the application, the resource monitor is configured to monitor the number of lines and the session start-stop time of the session during the service request phase.
In a second aspect, an embodiment of the present application provides a voice communication system, which includes a media resource server cluster, an intelligent voice server cluster, and an adaptation layer provided in the embodiment of the present application.
In a third aspect, an embodiment of the present application provides a method for controlling a voice communication system, where the method is applied to the voice communication system provided in the embodiment of the present application, and includes: in the handshake phase and the service request phase, the media resource server performs information interaction with the corresponding intelligent voice server through the adaptation layer.
In one embodiment of the present application, the specific steps of the handshake phase include:
the media resource server sends an INVITE message, the adaptation layer analyzes the message and forwards the INVITE message to an intelligent voice product in the intelligent voice server;
the intelligent voice product in the intelligent voice server returns the Trying message and the OK message, and the adaptation layer analyzes the message and forwards the Trying message and the OK message to the media resource server;
and the media resource server sends the ACK message, and the adaptation layer analyzes the message and forwards the ACK message to an intelligent voice product in the intelligent voice server.
In an embodiment of the present application, the request service phase includes a TTS service phase, and the specific steps of the TTS service phase include:
the media resource server sends a SPEAK message, and the route adapter forwards the SPEAK message to an intelligent voice product in the intelligent voice server;
the intelligent voice product in the intelligent voice server returns the IN-PROGRESS message and the SPEAK-COMPLETE message, and the routing adapter forwards the IN-PROGRESS message and the SPEAK-COMPLETE message to the media resource server.
In one embodiment of the present application, the request service stage includes an ASR service stage, and the specific steps of the ASR service stage include:
the media resource server sends a RECOGNIZE message, and the adaptation layer forwards the RECOGNIZE message to an intelligent voice product in the intelligent voice server;
the intelligent voice product in the intelligent voice server returns a PROGRESS message, and the adaptation layer forwards the PROGRESS message to the media resource server;
the media resource server sends a RECOGNIZE message, and the adaptation layer forwards the RECOGNIZE message to an intelligent voice product in the intelligent voice server;
the intelligent voice product in the intelligent voice server returns an IN-PROGRESS message and a START-OF-INPUT message, and the adaptation layer forwards the IN-PROGRESS message and the START-OF-INPUT message to the media resource server;
the media resource server sends a START-INPUT-TIMERS message, and the adaptation layer forwards the START-INPUT-TIMERS message to an intelligent voice product in the intelligent voice server;
the intelligent voice product in the intelligent voice server returns a COMPLETE message and a record-COMPLETE message, and the adaptation layer forwards the COMPLETE message and the record-COMPLETE message to the media resource server.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
Instead of the one-to-one construction relationship between media resource server and intelligent voice server, the embodiments of the application construct an adaptation layer between the media resource server cluster and the intelligent voice server cluster. Without modifying the large platforms of two third-party companies, namely the media resource server and the intelligent voice product, the adaptation layer can be adjusted flexibly and at low cost to meet the complex, personalized application requirements of multiple tenants. Because the adaptation layer controls the underlying communication of voice interaction, it can be iterated on quickly as requirements change, so that more applications can be built flexibly. The specific beneficial effects of the application are as follows:
First, a lightweight adaptation layer is added that is independent of the media resource server and the intelligent voice server. The adaptation layer can interface with multiple intelligent voice products, expanding the 1:1 construction relationship between media resource server and intelligent voice server that is usual in the industry into N:N (N is an integer greater than 1). The adaptation layer can integrate various service elements and allocate communication channels according to the different requirements of different tenants and different intelligent voice products, improving the flexibility and extensibility of the intelligent outbound system. The adaptation layer can also implement its own load balancing strategy, compensating for the insufficient multimedia communication support of load balancing products on the market.
Second, because the adaptation layer controls the underlying communication of voice interaction, the tone adapter of the adaptation layer can capture the MRCP message and modify the tone according to the tenant's requirements. This solves the problem that media resource servers in the industry can set only a single tone and cannot meet the different tone requirements of multiple tenants. The system can select different timbres of the same intelligent voice product as well as timbres of different intelligent voice products.
Third, the number of lines of the media resource server and the number of licenses of the intelligent voice product are important components of the system construction cost. Because all communication messages are captured by the adaptation layer, the resource monitor of the adaptation layer works at a lower level and goes beyond the line-count monitoring that is currently common in the industry: it adds finer-grained, more accurate monitoring along the time dimension and the tenant dimension, providing more reliable data for cost accounting and system resource usage monitoring.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic block diagram of an adaptation layer provided in an embodiment of the present application;
fig. 2 is a schematic architecture diagram of a voice communication system according to an embodiment of the present application;
fig. 3 is a timing diagram of a control method of a voice communication system according to the present application.
Detailed Description
Reference will now be made in detail to the present application, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar parts or parts having the same or similar functions throughout. In addition, if a detailed description of the known art is not necessary for illustrating the features of the present application, it is omitted. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
First, the terms used in the embodiments of the present application and in the drawings are briefly explained:
MRCP: Media Resource Control Protocol, an application-layer network communication protocol through which a voice server provides various voice services (such as speech recognition, speech synthesis and voiceprint recognition) to a client.
Media resource server: a device that, under the control of control devices (softswitch devices, application servers), provides the media resource functions required to implement various services on an IP network, including service tone provision, conferencing, interactive voice response (IVR), announcements, advanced voice services, and the like.
ASR: Automatic Speech Recognition, a technology for converting human speech into text.
TTS: Text To Speech, a speech synthesis technology for converting text into speech.
IVR: Interactive Voice Response.
Tenant: a company or individual leasing the voice communication system.
Outbound task parameters: a series of parameters that must be set when an outbound task is created, such as the outbound time period, telephone number, name and gender. One of these parameters is the tenant id, which can be transmitted through the Vendor-Specific-Parameters header of the MRCP message; the adaptation layer sets the intelligent voice product type, tone and the like for the corresponding tenant according to this parameter. In this document, "outbound task parameter" refers specifically to the tenant id.
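As an illustration of how a tenant id might be read from Vendor-Specific-Parameters, the following is a minimal sketch rather than the method of the application. It assumes the header carries semicolon-separated name=value pairs; the pair name tenant-id, the value tenantA and the helper parse_vendor_params are hypothetical.

```python
def parse_vendor_params(mrcp_message):
    """Return the name=value pairs from the Vendor-Specific-Parameters header."""
    for line in mrcp_message.splitlines():
        if not line:
            break  # a blank line ends the header section
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "vendor-specific-parameters":
            pairs = {}
            for item in value.split(";"):
                key, eq, val = item.partition("=")
                if eq:
                    pairs[key.strip()] = val.strip().strip('"')
            return pairs
    return {}


# Illustrative SPEAK request. The length field (0) is a placeholder, not a
# real byte count, and "tenant-id" is an assumed parameter name.
SPEAK = (
    "MRCP/2.0 0 SPEAK 1\r\n"
    "Channel-Identifier: 32AECB23433802@speechsynth\r\n"
    "Vendor-Specific-Parameters: tenant-id=tenantA;campaign=demo\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "Hello"
)

tenant_id = parse_vendor_params(SPEAK).get("tenant-id")
```

The parser stops at the first blank line so that text in the message body is never mistaken for a header.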
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments.
The embodiment of the present application provides an adaptation layer 100 of a multi-intelligent voice product, and as shown in fig. 1, the adaptation layer 100 includes a routing adapter, a tone adapter, and a resource monitor.
As shown in fig. 2, the media resource server cluster and the intelligent voice server cluster establish an N:N construction relationship through the adaptation layer 100, where N is a positive integer greater than 1. The media resource server cluster includes a plurality of media resource servers 200, the intelligent voice server cluster includes a plurality of intelligent voice servers 300, and each intelligent voice server 300 runs an intelligent voice product. The number of media resource servers 200 and intelligent voice servers 300 in the voice communication system can be increased or decreased according to the actual situation, and the media resource servers 200 and the intelligent voice servers 300 are in an N:N correspondence (N greater than 1). Each media resource server 200 in the media resource server cluster is in bidirectional communication with the adaptation layer 100, and each intelligent voice server 300 in the intelligent voice server cluster is in bidirectional communication with the adaptation layer 100.
In the handshaking phase and the service requesting phase, the media resource servers in the media resource server cluster perform information interaction with the intelligent voice server 300 in the corresponding intelligent voice server cluster through the adaptation layer 100.
In one embodiment of the present application, during the handshake phase, the adaptation layer 100 forwards information of the media resource server 200 and the intelligent voice server 300 using the SIP protocol and the SDP protocol. In the request service phase, adaptation layer 100 forwards the information of media resource server 200 and intelligent voice server 300 using MRCP protocol and RTP protocol.
In one embodiment of the application, messages of the SIP protocol, the SDP protocol and the MRCP protocol are transmitted using TCP, and messages of the RTP protocol are transmitted using UDP.
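The protocol-to-transport assignment above can be written down as a small lookup. This is only a sketch; the names TRANSPORT, SOCKET_TYPE and socket_type_for are hypothetical and do not come from the application.

```python
import socket

# Transport used for each protocol, as stated in this embodiment.
TRANSPORT = {"SIP": "TCP", "SDP": "TCP", "MRCP": "TCP", "RTP": "UDP"}

# Standard-library socket types for each transport.
SOCKET_TYPE = {"TCP": socket.SOCK_STREAM, "UDP": socket.SOCK_DGRAM}


def socket_type_for(protocol):
    """Return the socket type an adaptation layer would open for a protocol."""
    return SOCKET_TYPE[TRANSPORT[protocol.upper()]]
```

For example, socket_type_for("RTP") selects a datagram socket, while the three signalling and control protocols select stream sockets.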
In one embodiment of the present application, the adaptation layer 100 includes a routing adapter. The route adapter is used for selecting the idle MRCP session channel (channel) of the corresponding intelligent voice product to forward communication according to the tenant identity in the outbound task parameter.
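A minimal sketch of this selection step follows, assuming the tenant-to-product configuration and the per-product channel pools are held in memory. The class RoutingAdapter, its method names and the sample tenant, product and channel identifiers are all hypothetical.

```python
class RoutingAdapter:
    """Pick an idle MRCP session channel of the product configured for a tenant."""

    def __init__(self, tenant_to_product, idle_channels):
        self.tenant_to_product = dict(tenant_to_product)
        # product -> list of channels that are currently idle
        self.idle = {p: list(chs) for p, chs in idle_channels.items()}
        self.in_use = set()

    def acquire(self, tenant_id):
        product = self.tenant_to_product[tenant_id]
        if not self.idle[product]:
            raise RuntimeError(f"no idle channel for product {product}")
        channel = self.idle[product].pop(0)
        self.in_use.add(channel)  # mark as used so parallel requests skip it
        return product, channel

    def release(self, product, channel):
        self.in_use.discard(channel)
        self.idle[product].append(channel)


router = RoutingAdapter(
    tenant_to_product={"tenantA": "productX"},
    idle_channels={"productX": ["ch1", "ch2"]},
)
first = router.acquire("tenantA")
```

Keeping idle channels per product means a tenant can never be handed a channel that belongs to a product it is not configured for.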
In an embodiment of the application, in the request service stage the tone adapter selects the corresponding tone number (Voice-Name) according to the tenant identity (ID) in the outbound task parameters, modifies the request message accordingly, and forwards the communication.
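The Voice-Name rewrite could look roughly like the sketch below. It assumes the MRCP message is available as text with CRLF line endings; the table TENANT_VOICES and the functions set_voice_name and apply_tone are hypothetical. The sketch does not recompute the message-length field in the MRCP start line, which a real implementation would have to do after changing a header.

```python
# Hypothetical per-tenant tone configuration.
TENANT_VOICES = {"tenantA": "voiceA", "tenantB": "voiceB"}


def set_voice_name(mrcp_message, voice):
    """Replace (or add) the Voice-Name header, leaving the body untouched."""
    head, sep, body = mrcp_message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    replaced = False
    for i, line in enumerate(lines):
        if line.lower().startswith("voice-name:"):
            lines[i] = f"Voice-Name: {voice}"
            replaced = True
    if not replaced:
        lines.append(f"Voice-Name: {voice}")
    return "\r\n".join(lines) + sep + body


def apply_tone(mrcp_message, tenant_id):
    """Rewrite the message with the tone configured for the tenant."""
    return set_voice_name(mrcp_message, TENANT_VOICES[tenant_id])
```

Because the rewrite happens in the adaptation layer, the media resource server can keep sending its single configured tone while each tenant still receives its own.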
In one embodiment of the application, the resource monitor is configured to monitor the number of lines and the session start and stop times during the service request phase. Optionally, the resource monitor monitors the number of lines and the session start and stop times in both the ASR service stage and the TTS service stage, which makes it possible to compile accurate statistics later on each tenant's use of ASR and TTS resources.
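One way to track both line counts and service duration per tenant is sketched below. The class ResourceMonitor, its fields and the session identifiers are hypothetical; the clock is injectable only so the example can be checked deterministically.

```python
import time
from collections import defaultdict


class ResourceMonitor:
    """Track lines in use and accumulated service time per (tenant, service)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.active = defaultdict(int)           # (tenant, service) -> lines in use
        self.total_seconds = defaultdict(float)  # (tenant, service) -> seconds used
        self._started = {}                       # session id -> (tenant, service, start)

    def start(self, session_id, tenant, service):
        self.active[(tenant, service)] += 1
        self._started[session_id] = (tenant, service, self.clock())

    def stop(self, session_id):
        tenant, service, began = self._started.pop(session_id)
        self.active[(tenant, service)] -= 1
        elapsed = self.clock() - began
        self.total_seconds[(tenant, service)] += elapsed
        return elapsed
```

A caller would invoke start when a SPEAK or RECOGNIZE request is forwarded and stop when the matching completion event comes back, which yields duration as well as concurrency.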
Instead of the one-to-one construction relationship between media resource server and intelligent voice server, the embodiments of the application construct the adaptation layer 100 between the media resource server cluster and the intelligent voice server cluster. Without modifying the large platforms of two third-party companies, namely the media resource server 200 and the intelligent voice product, the adaptation layer 100 can be adjusted flexibly and at low cost to meet the complex, personalized application requirements of multiple tenants. Because the adaptation layer 100 controls the underlying communication of voice interaction, it can be iterated on quickly as requirements change, so that more applications can be built flexibly. The specific beneficial effects of the application are as follows:
First, a lightweight adaptation layer 100 is added that is independent of the media resource server 200 and the intelligent voice server 300. The adaptation layer 100 can interface with multiple intelligent voice products, expanding the 1:1 construction relationship between the media resource server 200 and the intelligent voice server 300 that is usual in the industry into N:N (N is an integer greater than 1). The adaptation layer 100 can integrate various service elements and allocate communication channels according to the different requirements of different tenants and different intelligent voice products, improving the flexibility and extensibility of the intelligent outbound system. The adaptation layer 100 can also implement its own load balancing strategy, compensating for the insufficient multimedia communication support of load balancing products on the market.
Second, because the adaptation layer 100 controls the underlying communication of voice interaction, the tone adapter of the adaptation layer 100 can capture the MRCP message and modify the tone according to the tenant's requirements. This solves the problem that the media resource server 200 in the industry can set only a single tone and cannot meet the different tone requirements of multiple tenants. The system can select different timbres of the same intelligent voice product as well as timbres of different intelligent voice products.
Third, the number of lines of the media resource server 200 and the number of licenses of the intelligent voice product are important components of the system construction cost. Because all communication messages are captured by the adaptation layer 100, the resource monitor of the adaptation layer 100 works at a lower level and goes beyond the line-count monitoring that is currently common in the industry: it adds finer-grained, more accurate monitoring along the time dimension and the tenant dimension, providing more reliable data for cost accounting and system resource usage monitoring.
Based on the same inventive concept, an embodiment of the present application further provides a voice communication system, as shown in fig. 2, the voice communication system includes a media resource server cluster, an intelligent voice server cluster, and an adaptation layer 100 provided in the embodiment of the present application.
Based on the same inventive concept, the embodiment of the present application further provides a control method of a voice communication system, including: in the handshake phase and the request service phase, the media resource server 200 performs information interaction with the corresponding intelligent voice server 300 through the adaptation layer 100.
In an embodiment of the present application, the request service phase may include a TTS service phase and an ASR service phase.
In one embodiment of the present application, as shown in fig. 3, the specific steps of the handshake phase include: the media resource server 200 sends an INVITE message, and the adaptation layer 100 parses the message and forwards the INVITE message to the intelligent voice product in the intelligent voice server 300; the intelligent voice product in the intelligent voice server 300 returns a Trying message and an OK message, and the adaptation layer 100 parses the packet and forwards the Trying message and the OK message to the media resource server 200; media resource server 200 sends an ACK message, adaptation layer 100 parses the message and forwards the ACK message to the intelligent voice product in intelligent voice server 300.
When the media resource server 200 sends the INVITE message, the adaptation layer 100 parses the message to obtain the following fields: Call-ID; the IP address of the media resource server 200, ivrIP; the service type, resource (speech recognition or speech synthesis, speechsync); and the UDP communication port of the media resource server 200, ztePort. The adaptation layer 100 plans the number of sessions to establish with each intelligent voice product according to the number of lines of the media resource server 200 and the number of licenses of the intelligent voice product. If the number of sessions planned for intelligent voice product x is Nx, Nx INVITE messages are forwarded to intelligent voice product x in sequence. A parameter type is constructed (intelligent voice product type + resource), so that the parsed message fields are divided into different sets according to the intelligent voice product type. At the same time, the mappings <Call-ID, type>, <Call-ID, route> (route representing the intelligent voice product type), <Call-ID, ivrIP> and <Call-ID, ztePort> are established.
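The INVITE parsing and the four Call-ID mappings can be sketched as follows. The application does not say where in the message each field is found; this sketch assumes the IP address comes from the SDP c= line, the UDP port from the m=audio line, and the service type from an a=resource attribute, in the style of MRCPv2 session descriptions. The sample message, the class HandshakeTables and all field names are hypothetical.

```python
def parse_invite(invite):
    """Extract Call-ID, media server IP, service type and UDP port from an INVITE."""
    fields = {}
    for line in invite.splitlines():
        if line.lower().startswith("call-id:"):
            fields["call_id"] = line.split(":", 1)[1].strip()
        elif line.startswith("c=IN IP4 "):
            fields["ivr_ip"] = line.split()[2]          # assumed source of ivrIP
        elif line.startswith("a=resource:"):
            fields["resource"] = line.split(":", 1)[1].strip()
        elif line.startswith("m=audio "):
            fields["zte_port"] = int(line.split()[1])   # assumed source of ztePort
    return fields


class HandshakeTables:
    """The four mappings keyed by Call-ID that are built on INVITE."""

    def __init__(self):
        self.call_type = {}
        self.call_route = {}
        self.call_ivr_ip = {}
        self.call_zte_port = {}

    def register(self, fields, product):
        call_id = fields["call_id"]
        self.call_type[call_id] = f"{product}+{fields['resource']}"
        self.call_route[call_id] = product
        self.call_ivr_ip[call_id] = fields["ivr_ip"]
        self.call_zte_port[call_id] = fields["zte_port"]


INVITE = (
    "INVITE sip:mresources@example.com SIP/2.0\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "m=application 9 TCP/MRCPv2 1\r\n"
    "a=resource:speechsynth\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)

fields = parse_invite(INVITE)
tables = HandshakeTables()
tables.register(fields, "productX")
```

Keying every table by Call-ID is what later lets the adaptation layer route Trying, OK and ACK messages without re-parsing the session description.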
When the intelligent voice product returns the Trying message, the adaptation layer 100 parses the message, obtains the Call-ID, looks up the IP address of the media resource server 200 to which the message should be returned in the mapping <Call-ID, ivrIP>, and returns the message. When the intelligent voice product returns the OK message, the adaptation layer 100 parses the message to obtain the following fields: Call-ID, the MRCP session channel (channel), and the UDP communication port of the intelligent voice product, mrcpPort. The mappings <ztePort, mrcpPort>, <channel, route>, <ztePort, channel>, <mrcpPort, ivrIP>, <Call-ID, channel> and <type, channel map (channel set)> are established. The adaptation layer 100 creates a receiving socket that listens on UDP port mrcpPort for requests from the media resource server 200; this UDP channel is referred to here as outrtpctx. It also creates a sending socket, bound to UDP port ztePort, that forwards the requests of the media resource server 200; this UDP channel is referred to as InnerRtpCx.
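The pair of UDP sockets described above amounts to a relay. The sketch below shows one relay step on the loopback interface, assuming OS-assigned ports instead of the negotiated mrcpPort and ztePort; the helpers open_udp and relay_once are hypothetical and the payload is a stand-in for an RTP packet.

```python
import socket


def open_udp(host="127.0.0.1", port=0, timeout=2.0):
    """Create a bound UDP socket; port 0 lets the OS choose a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.settimeout(timeout)
    return sock


def relay_once(receiver, sender, destination, bufsize=2048):
    """Receive one datagram on `receiver` and forward it unchanged via `sender`."""
    payload, _source = receiver.recvfrom(bufsize)
    sender.sendto(payload, destination)
    return len(payload)


# Loopback demonstration: media server -> adaptation layer -> voice product.
receiver = open_udp()   # plays the role of the receiving socket
sender = open_udp()     # plays the role of the sending socket
product = open_udp()    # stands in for the intelligent voice product
media = open_udp()      # stands in for the media resource server

media.sendto(b"rtp-bytes", receiver.getsockname())
forwarded = relay_once(receiver, sender, product.getsockname())
data, addr = product.recvfrom(2048)
from_sender = addr == sender.getsockname()

for s in (receiver, sender, product, media):
    s.close()
```

A production relay would run this step in a loop per session and look up the destination from the port mappings built during the handshake.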
The media resource server 200 sends an ACK message. The adaptation layer 100 parses the message to obtain the Call-ID field, determines the corresponding intelligent voice product type according to the mapping <Call-ID, route>, modifies the source IP address and the destination IP address of the message, and then forwards it. After the intelligent voice product receives the ACK message, the handshake phase is complete.
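The Call-ID lookup used throughout the handshake can be illustrated as follows. The sketch relies on two SIP rules from RFC 3261: header names are case-insensitive, and `Call-ID` may appear in its compact form `i`. The function `extract_call_id`, the table contents, and the sample message are hypothetical; the rewriting of the source and destination addresses is not shown because the text does not say which fields are modified.

```python
from typing import Optional


def extract_call_id(sip_message: str) -> Optional[str]:
    """Return the Call-ID of a SIP message, or None if the header is absent."""
    for line in sip_message.split("\r\n")[1:]:  # skip the start line
        if line == "":                          # a blank line ends the header section
            break
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in ("call-id", "i"):
            return value.strip()
    return None


ACK = (
    "ACK sip:mrcp@192.0.2.20:5060 SIP/2.0\r\n"
    "Via: SIP/2.0/TCP 192.0.2.10:5060;branch=z9hG4bK74bf9\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 1 ACK\r\n"
    "\r\n"
)

route_by_call = {"a84b4c76e66710@192.0.2.10": "vendorA"}  # <Call-ID, route>
call_id = extract_call_id(ACK)
target_product = route_by_call.get(call_id)
```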
In an embodiment of the present application, as shown in fig. 3, the request service phase includes a TTS service phase, and the specific steps of the TTS service phase include: the media resource server 200 sends a SPEAK message, and the routing adapter forwards the SPEAK message to the intelligent voice product in the intelligent voice server 300; the intelligent voice product in the intelligent voice server 300 returns an IN-PROGRESS message and a SPEAK-COMPLETE message, and the routing adapter forwards both messages to the media resource server 200.
The media resource server 200 sends a SPEAK message, and the routing adapter of the adaptation layer 100 parses the tenant id field in the Vendor-Specific-Parameters header. According to the configuration, the intelligent voice product type corresponding to the tenant id is determined, and an idle channel' is selected from the <type, channel map (channel set)> mapping to serve as the session channel in place of the original session channel. The status flag of channel' is then updated to "in use" to prevent conflicts caused by other parallel TTS requests using channel'. Meanwhile, a mapping <channel', channel> is established so that, when a response comes back, channel' can be replaced with the original channel before the message is returned to the media resource server 200. The tone adapter determines the tone corresponding to the tenant id according to the configuration and replaces the Voice-Name value in the SPEAK message. The resource monitor increments the resource counter of the tenant by 1 and records the start time. Finally, the message is forwarded.
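A minimal sketch of the channel selection and Voice-Name replacement described here is given below. It works on already parsed headers rather than on raw MRCP text, and it takes the tenant id as an argument because the text does not name the vendor-specific parameter that carries it. The class `SpeakRouter`, its configuration dictionaries, and all channel, tenant and voice names are assumptions made for the example. A lock stands in for whatever mechanism the adaptation layer uses to keep parallel requests from taking the same channel.

```python
import threading
from typing import Dict


class SpeakRouter:
    """Routes a SPEAK request to an idle channel of the tenant's voice product."""

    def __init__(self, tenant_type: Dict[str, str], tenant_voice: Dict[str, str],
                 channel_map: Dict[str, Dict[str, bool]]):
        self.tenant_type = tenant_type      # tenant id -> type key (configuration)
        self.tenant_voice = tenant_voice    # tenant id -> Voice-Name value (configuration)
        self.channel_map = channel_map      # type -> {channel': in-use flag}
        self.original_channel: Dict[str, str] = {}  # <channel', channel>
        self._lock = threading.Lock()

    def on_speak(self, headers: Dict[str, str], tenant_id: str) -> Dict[str, str]:
        """Return a copy of the headers rewritten for the selected channel'."""
        pool = self.channel_map[self.tenant_type[tenant_id]]
        with self._lock:  # selection and marking must be one atomic step
            idle = next((ch for ch, in_use in pool.items() if not in_use), None)
            if idle is None:
                raise RuntimeError("no idle channel for tenant " + tenant_id)
            pool[idle] = True  # status flag becomes "in use"
            self.original_channel[idle] = headers["Channel-Identifier"]
        rewritten = dict(headers)
        rewritten["Channel-Identifier"] = idle
        rewritten["Voice-Name"] = self.tenant_voice[tenant_id]
        return rewritten


router = SpeakRouter(
    tenant_type={"tenant-1": "vendorA+speechsynth"},
    tenant_voice={"tenant-1": "voice-7"},
    channel_map={"vendorA+speechsynth": {"chA1@speechsynth": True,
                                         "chA2@speechsynth": False}},
)
out = router.on_speak(
    {"Channel-Identifier": "orig@speechsynth", "Voice-Name": "default"}, "tenant-1"
)
```

When a raw MRCPv2 message is rewritten instead of a header dictionary, the message length carried in the start line would also have to be recomputed; that step is left out of the sketch.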
The intelligent voice product returns an IN-PROGRESS message. The routing adapter of the adaptation layer 100 parses the Channel-Identifier field to obtain channel', obtains the source channel according to the mapping <channel', channel>, substitutes it for channel', and forwards the message. At this time, the adaptation layer 100 listens on ztePort for the TTS synthesized voice stream returned by the intelligent voice product and forwards it unchanged to the corresponding media resource server 200.
The intelligent voice product returns a SPEAK-COMPLETE message. The routing adapter of the adaptation layer 100 parses the Channel-Identifier field to obtain channel', obtains the source channel according to the mapping <channel', channel>, substitutes it for channel', and forwards the message. At this time, the resource monitor records the service end time and decrements the resource counter of the tenant by 1, which facilitates subsequent statistics on each tenant's resource usage.
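The counting and timing performed by the resource monitor can be sketched as follows. The class `ResourceMonitor`, its method names, and the accumulated `usage_seconds` statistic are assumptions made for illustration; the text only states that a per-tenant counter is incremented and decremented and that the start and end times are recorded. The clock is injectable so that the example does not depend on real time.

```python
import time
from collections import defaultdict


class ResourceMonitor:
    """Tracks per-tenant sessions in service and their start and end times."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.in_use = defaultdict(int)           # tenant id -> sessions in service
        self.started = {}                        # channel' -> (tenant id, start time)
        self.usage_seconds = defaultdict(float)  # tenant id -> accumulated service time

    def on_speak(self, tenant_id, channel):
        """Called when a SPEAK request is forwarded on channel'."""
        self.in_use[tenant_id] += 1
        self.started[channel] = (tenant_id, self.clock())

    def on_speak_complete(self, channel):
        """Called when SPEAK-COMPLETE arrives; returns the service duration."""
        tenant_id, start = self.started.pop(channel)
        self.in_use[tenant_id] -= 1
        duration = self.clock() - start
        self.usage_seconds[tenant_id] += duration
        return duration


# A fake clock makes the example deterministic: start at 100.0, end at 102.5.
ticks = iter([100.0, 102.5])
monitor = ResourceMonitor(clock=lambda: next(ticks))
monitor.on_speak("tenant-1", "chA2@speechsynth")
busy = monitor.in_use["tenant-1"]
duration = monitor.on_speak_complete("chA2@speechsynth")
```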
In one embodiment of the present application, as shown in fig. 3, the request service phase includes an ASR service phase, and the specific steps of the ASR service phase include: the media resource server 200 sends a RECOGNIZE message, and the adaptation layer 100 forwards the RECOGNIZE message to the intelligent voice product in the intelligent voice server 300; the intelligent voice product in the intelligent voice server 300 returns a PROGRESS message, and the adaptation layer 100 forwards the PROGRESS message to the media resource server 200; the media resource server 200 sends a RECOGNIZE message, and the adaptation layer 100 forwards the RECOGNIZE message to the intelligent voice product in the intelligent voice server 300; the intelligent voice product in the intelligent voice server 300 returns an IN-PROGRESS message and a START-OF-INPUT message, and the adaptation layer 100 forwards the IN-PROGRESS message and the START-OF-INPUT message to the media resource server 200; the media resource server 200 sends a START-INPUT-TIMERS message, and the adaptation layer 100 forwards the START-INPUT-TIMERS message to the intelligent voice product in the intelligent voice server 300; the intelligent voice product in the intelligent voice server 300 returns a COMPLETE message and a RECOGNITION-COMPLETE message, and the adaptation layer 100 forwards both messages to the media resource server 200.
The principle of the ASR service phase is similar to that of the TTS service phase. In addition, the timing diagram in fig. 3 shows only a typical service flow; when TTS is interrupted, the sequence differs slightly, but the adaptation layer 100 of the present invention can handle that case as well, and the details are not repeated here.
Those of skill in the art will appreciate that the various operations, methods, steps in the processes, acts, or solutions discussed in this application can be interchanged, modified, combined, or eliminated. Further, other steps, measures, or schemes in various operations, methods, or flows that have been discussed in this application can be alternated, altered, rearranged, broken down, combined, or deleted. Further, steps, measures, schemes in the prior art having various operations, methods, procedures disclosed in the present application may also be alternated, modified, rearranged, decomposed, combined, or deleted.
In the description of the present application, it is to be understood that the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "a plurality" means two or more unless otherwise specified.
It should be understood that, although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not restricted to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also be regarded as falling within the protection scope of the present application.

Claims (10)

1. An adaptation layer for multiple intelligent voice products, characterized by comprising a routing adapter, a tone adapter and a resource monitor;
the media resource server cluster establishes N:N communication with the intelligent voice server cluster through the adaptation layer, where N is a positive integer greater than 1;
in the handshaking phase and the service requesting phase, the media resource server in the media resource server cluster performs information interaction with the corresponding intelligent voice server in the intelligent voice server cluster through the adaptation layer.
2. The adaptation layer of claim 1,
in the handshake phase, the adaptation layer forwards information of the media resource server and the intelligent voice server by using an SIP protocol and an SDP protocol;
in the service request phase, the adaptation layer forwards the information of the media resource server and the intelligent voice server by using MRCP protocol and RTP protocol.
3. The adaptation layer of claim 1,
the messages of the SIP protocol, the SDP protocol and the MRCP protocol are transmitted by using TCP;
the messages of the RTP protocol are transmitted using UDP.
4. The adaptation layer according to any of claims 1 to 3, wherein the tone adapter is configured to, during the request service phase, select the corresponding tone number according to the tenant identity in the outbound task parameters, modify the request message accordingly, and forward the message.
5. An adaptation layer according to any of claims 1 to 3, wherein the resource monitor is configured to monitor the number of lines and the session start and stop times of each session during the request service phase.
6. A voice communication system comprising a media resource server cluster, an intelligent voice server cluster and an adaptation layer according to any of claims 1-5.
7. A control method of a voice communication system applied to the voice communication system according to claim 6, comprising:
in the handshake phase and the service request phase, the media resource server performs information interaction with the corresponding intelligent voice server through the adaptation layer.
8. The method of claim 7, wherein the specific steps of the handshake phase comprise:
the media resource server sends an INVITE message, and the adaptation layer analyzes the message and forwards the INVITE message to an intelligent voice product in the intelligent voice server;
the intelligent voice product in the intelligent voice server returns a Trying message and an OK message, and the adaptation layer analyzes the message and forwards the Trying message and the OK message to the media resource server;
and the media resource server sends an ACK message, and the adaptation layer analyzes the message and forwards the ACK message to an intelligent voice product in the intelligent voice server.
9. The method of claim 7, wherein the request service phase comprises a TTS service phase, and the steps of the TTS service phase include:
the media resource server sends a SPEAK message, and the routing adapter forwards the SPEAK message to an intelligent voice product in the intelligent voice server;
and the intelligent voice product in the intelligent voice server returns an IN-PROGRESS message and a SPEAK-COMPLETE message, and the routing adapter forwards the IN-PROGRESS message and the SPEAK-COMPLETE message to the media resource server.
10. The method for controlling a speech communication system according to claim 7, wherein the request service stage comprises an ASR service stage, and the specific steps of the ASR service stage include:
the media resource server sends a RECOGNIZE message, and the adaptation layer forwards the RECOGNIZE message to an intelligent voice product in the intelligent voice server;
the intelligent voice product in the intelligent voice server returns a PROGRESS message, and the adaptation layer forwards the PROGRESS message to the media resource server;
the media resource server sends a RECOGNIZE message, and the adaptation layer forwards the RECOGNIZE message to an intelligent voice product in the intelligent voice server;
the intelligent voice product in the intelligent voice server returns an IN-PROGRESS message and a START-OF-INPUT message, and the adaptation layer forwards the IN-PROGRESS message and the START-OF-INPUT message to the media resource server;
the media resource server sends a START-INPUT-TIMERS message, and the adaptation layer forwards the START-INPUT-TIMERS message to an intelligent voice product in the intelligent voice server;
the intelligent voice product in the intelligent voice server returns a COMPLETE message and a RECOGNITION-COMPLETE message, and the adaptation layer forwards the COMPLETE message and the RECOGNITION-COMPLETE message to the media resource server.
CN202010618280.8A 2020-07-01 2020-07-01 Adaptation layer, voice communication system and control method thereof Active CN111917726B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010618280.8A CN111917726B (en) 2020-07-01 2020-07-01 Adaptation layer, voice communication system and control method thereof

Publications (2)

Publication Number Publication Date
CN111917726A true CN111917726A (en) 2020-11-10
CN111917726B CN111917726B (en) 2022-03-15

Family

ID=73227102

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114143401A (en) * 2021-11-24 2022-03-04 建信金融科技有限责任公司 Telephone customer service response adaptation method and device
CN114143401B (en) * 2021-11-24 2024-04-12 建信金融科技有限责任公司 Telephone customer service response adapting method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220914

Address after: 12 / F, 15 / F, 99 Yincheng Road, Pudong New Area pilot Free Trade Zone, Shanghai, 200120

Patentee after: Jianxin Financial Science and Technology Co.,Ltd.

Address before: 25 Financial Street, Xicheng District, Beijing 100033

Patentee before: CHINA CONSTRUCTION BANK Corp.

Patentee before: Jianxin Financial Science and Technology Co.,Ltd.