CN112217934A

CN112217934A - Voice recognition processing method and system in real-time communication system

Info

Publication number: CN112217934A
Application number: CN201910613421.4A
Authority: CN
Inventors: 范俊海
Original assignee: Hangzhou Qiaoka Technology Co ltd
Current assignee: Hangzhou Qiaoka Technology Co ltd
Priority date: 2019-07-09
Filing date: 2019-07-09
Publication date: 2021-01-12

Abstract

The invention provides a speech recognition processing method and system in a real-time communication system, relating to the field of computer networks and the Internet. The method is applied to a called party client and comprises the following steps. S1: the called party client establishes a real-time communication connection with the calling party client, performs number judgment, and determines from the result whether the call is answered manually or by AI. S2: if the call is to be answered manually, the client continues to wait for manual answering and forms an answer record after the call has been answered. S3: if the call is to be answered by AI, AI answering is carried out directly, an AI voice dialogue is conducted during the call, an answer record is formed after the call ends, and the AI voice dialogue is converted into text. Without relying on a third-party server platform, the invention conducts automatic voice calls for all incoming calls directly through an application program on the called party client, records the call, forms a text record after the call ends, and thereby solves the problem of voice answering by the called party in real-time communication.

Description

Voice recognition processing method and system in real-time communication system

Technical Field

The invention relates to the field of computer networks and the Internet, and in particular to a speech recognition processing method and system in a real-time communication system.

Background

In real-time communication, when a call cannot be answered because the called party is busy or no one picks up, a common solution is to forward the call to a third-party server platform by call line transfer. A program on the server platform performs the logic processing, and the calling party chooses whether to be transferred according to the voice prompts of the platform. Two examples follow.

Patent application No. CN20160067505.9, entitled "A method for extracting voice messages in a voice mailbox through an instant messaging tool", integrates instant messaging technology with the existing voice message function. The calling party's voice message is pushed promptly to an instant messaging account designated by the called party, so that communication and information exchange can take place in time. According to that application, users leave and retrieve voice messages more actively, and the calling party receives timely feedback on whether the message has been heard, which increases the calling party's willingness to leave messages.

Patent application No. CN200410061582.0 describes a real-time information call center server that delivers call center services through a real-time communication call center server platform.

In both of the above patent applications, the call is transferred to a third-party server platform by call line transfer, and the logic processing is performed by a program on that platform. In this mode, the service flow that the user can configure depends on the third-party server platform. Not every incoming call is connected and not all information is recorded: some calling parties hang up, and the called party never learns why they called or what they wanted to say.

Disclosure of Invention

In view of the above disadvantages of the prior art, an object of the present invention is to provide a speech recognition processing method and system in a real-time communication system which, without depending on a third-party server platform, conducts automatic voice calls for all incoming calls directly through an application program on the called party client, records the call, forms a text record after the call ends, and thereby solves the problem of voice answering by the called party in real-time communication.

The invention provides a speech recognition processing method in a real-time communication system, which is applied to a called party client and comprises the following steps:

S1: the called party client establishes a real-time communication connection with the calling party client, performs number judgment, and determines from the result whether the call is answered manually or by AI;

S2: if the call is to be answered manually, the client continues to wait for manual answering and forms an answer record after the call has been answered;

S3: if the call is to be answered by AI, AI answering is carried out directly, an AI voice dialogue is conducted during the call, an answer record is formed after the call ends, and the AI voice dialogue is converted into text.

Further, after manual answering and AI answering are completed, the called party client periodically generates an answering analysis report from the answer records.

Further, when the called party client performs number judgment, if the number is a pre-marked number, the client determines that the call is to be answered manually; if the number is not a pre-marked number, the client determines that the call is to be answered by AI.

Further, when waiting for manual answering times out, the call is switched from manual answering to AI answering.

Furthermore, the pre-marked numbers and the AI voice dialogue templates are both stored in the cloud and are retrieved by the called party client in real time.
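
The number judgment in step S1 can be sketched as a simple lookup. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names `AnswerMode` and `judge_number` and the sample numbers are hypothetical, and the pre-marked numbers are assumed to be already available on the client as an in-memory set.

```python
from enum import Enum


class AnswerMode(Enum):
    MANUAL = "manual"
    AI = "ai"


def judge_number(caller_number: str, pre_marked: set[str]) -> AnswerMode:
    """Step S1: pre-marked numbers are answered manually, all others by AI."""
    return AnswerMode.MANUAL if caller_number in pre_marked else AnswerMode.AI


# Hypothetical sample data, for illustration only.
pre_marked_numbers = {"13800000001", "13800000002"}
mode_known = judge_number("13800000001", pre_marked_numbers)
mode_unknown = judge_number("02188887777", pre_marked_numbers)
```

Here the pre-marked number is routed to manual answering and the unmarked number to AI answering, which matches the rule stated above.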

A speech recognition processing system in a real-time communication system comprises a client, wherein a call authority setting module, an AI setting module, a voice recognition module and a report generation module are arranged on the client;

the call authority setting module is used for performing number judgment and determining from the result whether the call is answered manually or by AI;

the AI setting module is used for conducting the AI voice dialogue during AI answering;

the voice recognition module is used for converting the AI voice dialogue into text after AI answering is finished;

the report generation module is used for periodically generating an answering analysis report from the answer records after manual answering and AI answering are finished.

Further, the system comprises a cloud connected with the client through a network, wherein the cloud is used for storing the pre-marked numbers and the AI voice dialogue templates, which are retrieved by the client in real time.

Furthermore, the call authority setting module, the AI setting module, the voice recognition module and the report generation module are installed on the client in the form of an APP or a plug-in function module.

Furthermore, the client is an intelligent device with a voice call function.

As described above, the speech recognition processing method and system in a real-time communication system according to the present invention have the following advantages: without depending on a third-party server platform, the invention conducts automatic voice calls for all incoming calls directly through the application program on the called party client, records the call, forms a text record after the call ends, and thereby solves the problem of voice answering by the called party in real-time communication.

Drawings

Fig. 1 is a flowchart illustrating a speech recognition processing method in a real-time communication system according to an embodiment of the present invention.

Fig. 2 is a block diagram showing a structure of a speech recognition processing system in the real-time communication system disclosed in the embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.

As shown in Fig. 1, the present invention provides a speech recognition processing method in a real-time communication system, wherein the method is applied to a called party client and comprises the following steps:

S1: the called party client establishes a real-time communication connection with the calling party client, performs number judgment, and determines from the result whether the call is answered manually or by AI;

specifically, when the called party client performs number judgment, if the number is a pre-marked number, the client determines that the call is to be answered manually; if the number is not a pre-marked number, the client determines that the call is to be answered by AI.

S2: if the call is to be answered manually, the client continues to wait for manual answering and forms an answer record after the manual answering is finished;

specifically, if waiting for manual answering times out, the call is switched from manual answering to AI answering.

S3: if the call is to be answered by AI, AI answering is carried out directly, the AI conducts a voice dialogue during the call, an answer record is formed after the AI answering ends, and the AI voice dialogue is converted into text.
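
The timeout rule attached to step S2 can be sketched with a standard-library event. This is a minimal sketch under stated assumptions: the function name `answer_with_fallback`, the `picked_up` event and the `ai_answer` callback are hypothetical stand-ins, and the source does not specify how long the client waits.

```python
import threading
from typing import Callable


def answer_with_fallback(
    picked_up: threading.Event,
    timeout_seconds: float,
    ai_answer: Callable[[], str],
) -> str:
    """Wait for the user to pick up; hand the call to AI answering on timeout."""
    # Event.wait returns True if the event was set before the timeout expired.
    if picked_up.wait(timeout=timeout_seconds):
        return "manual"
    return ai_answer()


# The user never picks up here, so the call falls through to AI answering.
never_picked_up = threading.Event()
result = answer_with_fallback(never_picked_up, 0.01, lambda: "ai")
```

In a real client the event would be set by the user interface when the user accepts the call, and `ai_answer` would start the AI voice dialogue described in step S3.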

After manual answering and AI answering are completed, the called party client periodically generates an answering analysis report from the answer records. The pre-marked numbers and the AI voice dialogue templates are both stored in the cloud and are retrieved by the called party client in real time.
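
Periodic report generation can be illustrated as an aggregation over the answer records of one reporting period. The source does not say what the analysis report contains, so the fields below (`total_calls`, `manual_calls`, `ai_calls`, `total_duration_seconds`) and the names `AnswerRecord` and `build_report` are assumptions made only for this sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AnswerRecord:
    caller_number: str
    mode: str              # "manual" or "ai"
    duration_seconds: int


def build_report(records: list[AnswerRecord]) -> dict:
    """Summarise the answer records collected during one reporting period."""
    by_mode = Counter(r.mode for r in records)
    total_duration = sum(r.duration_seconds for r in records)
    return {
        "total_calls": len(records),
        "manual_calls": by_mode["manual"],
        "ai_calls": by_mode["ai"],
        "total_duration_seconds": total_duration,
    }


# Hypothetical sample records, for illustration only.
records = [
    AnswerRecord("13800000001", "manual", 120),
    AnswerRecord("02188887777", "ai", 45),
    AnswerRecord("02166665555", "ai", 30),
]
report = build_report(records)
```

A scheduler on the client could call `build_report` once per period, for example daily or weekly, over the records formed in steps S2 and S3.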

As shown in Fig. 2, the present invention provides a speech recognition processing system in a real-time communication system, which includes a client and a cloud connected to the client via a network, wherein the client is provided with a call authority setting module, an AI setting module, a voice recognition module and a report generation module, and the cloud is provided with a public number library, a public voice library, a report analysis module and a client upgrading module;

the call authority setting module is used for performing number judgment and determining from the result whether the call is answered manually or by AI;

the AI setting module is used for conducting the AI voice dialogue during AI answering;

the voice recognition module is used for converting the AI voice dialogue into text after AI answering is finished;

the report generation module is used for periodically generating an answering analysis report from the answer records after manual answering and AI answering are finished;

the public number library is used for storing the numbers pre-marked for manual answering, so that number judgment can be performed according to the marks;

the public voice library is used for storing the AI answering templates;

the report analysis module is used for intelligently analyzing the manual answering and AI answering records, and the analysis results are retrieved by the client in real time;

the client upgrading module is used for providing version upgrades of the client application program.
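
The way the client retrieves the pre-marked numbers and AI answering templates from the cloud can be sketched as a local cache that is refreshed from a fetch function. This is a hedged sketch only: the source does not describe the network protocol, so `fetch` stands in for an unspecified network call, and `CloudSnapshot`, `ClientCache`, `fake_cloud` and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CloudSnapshot:
    pre_marked_numbers: set[str]
    ai_templates: dict[str, str]


@dataclass
class ClientCache:
    fetch: Callable[[], CloudSnapshot]   # stands in for the network call
    numbers: set[str] = field(default_factory=set)
    templates: dict[str, str] = field(default_factory=dict)

    def refresh(self) -> None:
        """Replace the local copies with the latest data from the cloud."""
        snapshot = self.fetch()
        self.numbers = set(snapshot.pre_marked_numbers)
        self.templates = dict(snapshot.ai_templates)


def fake_cloud() -> CloudSnapshot:
    # Hypothetical sample data; a real client would query the cloud service.
    return CloudSnapshot({"13800000001"}, {"general": "Hello, who is calling?"})


cache = ClientCache(fetch=fake_cloud)
cache.refresh()
```

Injecting `fetch` keeps the sketch independent of any particular transport, and lets the number judgment run against `cache.numbers` even while the network is briefly unavailable.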

Specifically, the call authority setting module, the AI setting module, the voice recognition module and the report generation module are installed on the client in the form of an APP or a plug-in function module.

The client is an intelligent device with a voice call function, such as a smart phone, a smart watch and the like.

The first embodiment is as follows:

when a client receives a call from a food delivery or courier service, the call is often displayed as a harassing call. When such a call is handled by the speech recognition processing system of the invention, the client receives the unknown call, judges that it comes from an unmarked number, and carries out AI answering;

the AI setting module on the client calls the general template to conduct the voice call, and the AI voice call is converted into text after the call ends. Through the report checking module, the user can view the converted text and the voice call record, including information such as the call time and the call duration.
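
The text record that the user reviews in this embodiment can be sketched as follows. The sketch assumes that speech-to-text conversion has already produced the dialogue turns; the names `Turn` and `render_record`, the layout of the record, and the sample dialogue are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Turn:
    speaker: str   # "caller" or "ai"
    text: str


def render_record(started_at: datetime, duration_seconds: int, turns: list[Turn]) -> str:
    """Format a finished AI call as the text record the user reviews."""
    header = f"Call time: {started_at:%Y-%m-%d %H:%M} | Duration: {duration_seconds}s"
    lines = [f"{t.speaker}: {t.text}" for t in turns]
    return "\n".join([header, *lines])


# Hypothetical dialogue from a courier call answered by AI.
turns = [
    Turn("ai", "Hello, the owner is unavailable. How can I help?"),
    Turn("caller", "Your parcel is at the front desk."),
]
text_record = render_record(datetime(2019, 7, 9, 12, 30), 18, turns)
```

The first line of the record carries the call time and duration mentioned above, and each following line is one turn of the converted dialogue.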

In conclusion, the invention does not require the called party to screen incoming calls. Calls are answered directly, which avoids the situation in which the called party misses information from calls wrongly judged to be harassing calls. After the AI voice call, the voice content is converted into text, which is convenient to review. The answering dialogue is conducted by AI voice, adapts automatically to what the calling party says, and has deep learning ability. The invention therefore effectively overcomes various defects in the prior art and has high industrial utilization value.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims

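1. A speech recognition processing method in a real-time communication system, wherein the method is applied to a called party client and comprises the following steps:

S1: the called party client establishes a real-time communication connection with the calling party client, performs number judgment, and determines from the result whether the call is answered manually or by AI;

S2: if the call is to be answered manually, the client continues to wait for manual answering and forms an answer record after the call has been answered;

S3: if the call is to be answered by AI, AI answering is carried out directly, an AI voice dialogue is conducted during the call, an answer record is formed after the call ends, and the AI voice dialogue is converted into text.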

2. The speech recognition processing method in a real-time communication system according to claim 1, wherein after manual answering and AI answering are completed, the called party client periodically generates an answering analysis report from the answer records.

3. The speech recognition processing method in a real-time communication system according to claim 1, wherein when the called party client performs number judgment, the call is determined to be answered manually if the number is a pre-marked number, and is determined to be answered by AI if the number is not a pre-marked number.

4. The speech recognition processing method in a real-time communication system according to claim 1, wherein when waiting for manual answering times out, the call is switched from manual answering to AI answering.

5. The speech recognition processing method in a real-time communication system according to claim 3, wherein the pre-marked numbers and the AI voice dialogue templates are both stored in the cloud and are retrieved by the called party client in real time.

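6. A speech recognition processing system in a real-time communication system, comprising a client, wherein a call authority setting module, an AI setting module, a voice recognition module and a report generation module are arranged on the client;

the call authority setting module is used for performing number judgment and determining from the result whether the call is answered manually or by AI;

the AI setting module is used for conducting the AI voice dialogue during AI answering;

the voice recognition module is used for converting the AI voice dialogue into text after AI answering is finished;

the report generation module is used for periodically generating an answering analysis report from the answer records after manual answering and AI answering are finished.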

7. The speech recognition processing system in a real-time communication system according to claim 6, further comprising a cloud connected with the client through a network, wherein the cloud is used for storing the pre-marked numbers and the AI voice dialogue templates, which are retrieved by the client in real time.

8. The speech recognition processing system in a real-time communication system according to claim 6, wherein the call authority setting module, the AI setting module, the voice recognition module and the report generation module are installed on the client in the form of an APP or a plug-in function module.

9. The speech recognition processing system in a real-time communication system according to claim 6, wherein the client is an intelligent device with a voice call function.