CN111128159B - Method and system for realizing multi-channel message distribution of intelligent loudspeaker box - Google Patents

Publication number
CN111128159B
Authority
CN
China
Prior art keywords
text
server
sound box
result
reply result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911312805.9A
Other languages
Chinese (zh)
Other versions
CN111128159A (en
Inventor
魏志斌
杨谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhikan Technology Co ltd
Original Assignee
Shanghai Zhikan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhikan Technology Co ltd
Priority to CN201911312805.9A
Publication of CN111128159A
Application granted
Publication of CN111128159B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/40 Security arrangements using identity modules
    • H04W12/48 Security arrangements using identity modules using secure binding, e.g. securely binding identity modules to devices, services or applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/12 Messaging; Mailboxes; Announcements
    • H04W4/14 Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention discloses a method for realizing multi-channel message issuing of an intelligent sound box. First, the cloud access part of the intelligent sound box voice interaction system is connected with an ASR server and a TTS server, and an information distribution server is set in the voice interaction system. When the intelligent sound box terminal receives a voice request from a user, the ASR server converts the voice request into a text request and sends it to a service logic server in the voice interaction system; the service logic server generates a corresponding text reply result and sends it to the information distribution server. The information distribution server then identifies the text reply result, determines an issuing channel according to the identification result, and issues the text reply result. The method and the device enable the reply results of the intelligent sound box to be issued through multiple channels, better protect user privacy, and make the reply result information easier to retain.

Description

Method and system for realizing multi-channel message distribution of intelligent loudspeaker box
Technical Field
The invention relates to an information issuing method in a voice interaction system, in particular to a method and a system for realizing multi-channel message issuing of an intelligent sound box.
Background
Over the past two or three years, the smart sound box market has grown rapidly, and the smart sound box has gradually become a high-traffic entry point to the smart home. Its functions are continuously enhanced, providing users with richer and more varied service content, and more and more families own one.
Existing intelligent sound box equipment generally adopts a voice interaction mode, in which the user asks and the sound box answers. After the user activates the sound box and connects it to the internet (generally via Wi-Fi), the user can make a request to the sound box by speaking normally. After receiving the voice request, the microphone module of the sound box transmits the voice data to the cloud, where it is converted into text data through Automatic Speech Recognition (ASR). The text data is then transmitted to the back-end service logic server, which combines other technologies (such as natural language processing) to obtain a reply result for the corresponding question. At this stage the reply result exists as text.
After the text reply result is generated, the processing logic of the intelligent sound box generally generates a voice file through speech synthesis (Text-To-Speech, TTS) and plays it on the sound box, so that the user obtains the required response, completing the whole voice interaction flow.
In this process, the user input and output are in speech form, which, although in most cases brings convenience to the user, also causes problems, such as:
1) Some information relates to the user's personal privacy and is not suitable for voice broadcasting. Because the playback of speech cannot be fully controlled, it is heard by everyone within a certain physical space, which leads to privacy leakage.
2) Some information needs to be retained by the user for further processing, such as the detailed terms of an insurance product. The user may need to download and print the information before reading it in detail, or forward it to a trusted friend for opinions and suggestions. Voice broadcasting alone cannot meet these needs.
3) Some messages contain long digit strings, such as the user's policy number, which the user must know exactly in order to query the details of the related policy. If such messages are only broadcast by voice, the user may need to prepare pen and paper in advance and write the information down accurately as it is played.
4) In addition, a user may want certain information, such as some types of business information, to be sent as text to a mobile phone APP or by short message, and this requirement is difficult to meet with the existing voice interaction mode of the sound box.
In summary, the existing voice-only interaction mode of the smart sound box cannot give the user a complete experience, and a technical solution is needed to improve the functions of the sound box so that the above problems can be better solved.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a method and a system for realizing multi-channel message issuing of an intelligent sound box. They address the problems that arise when the intelligent sound box interacts with a user by voice, namely that user privacy is easily leaked, information is inconvenient to obtain, and the way information is retained is not ideal, so as to better protect user privacy, make obtaining and retaining issued information smoother, and improve the user experience.
In order to achieve the above object, in a first aspect, an embodiment of the present invention provides a method for implementing multi-channel message issuing of an intelligent sound box, including the following steps:
step S1: connecting a cloud access part of the intelligent sound box voice interaction system with an automatic voice recognition (ASR) server and a text-to-speech (TTS) server, and setting an information distribution server in the intelligent sound box voice interaction system;
step S2: when the intelligent sound box terminal receives a voice request sent by a user, an Automatic Speech Recognition (ASR) server converts the voice request into a text request, sends the text request to a service logic server in the intelligent sound box voice interaction system, and generates a corresponding text reply result by the service logic server and sends the text reply result to an information distribution server;
step S3: and the information distribution server identifies the text reply result, determines a sending channel according to the obtained identification result and sends the text reply result.
Preferably, in the step S3:
when the determined issuing channel is a voice channel, sending the text reply result to a text-to-speech (TTS) server, converting the text reply result into a voice reply result by the text-to-speech (TTS) server, and sending the voice reply result to the intelligent sound box terminal for voice broadcasting;
when the determined issuing channel is a short message channel, sending a text reply result to a short message gateway server, and sending the text reply result to a mobile phone terminal associated with the intelligent sound box terminal in a short message form by the short message gateway server;
and when the determined issuing channel is the APP channel, sending the text reply result to an APP access server, and sending the text reply result to a mobile phone APP associated with the intelligent sound box terminal in an APP message form by the APP access server.
Preferably, the step S3 further includes:
step S31: after the information distribution server obtains the identification result, a text confirmation request is generated according to the identification result;
step S32: the text confirmation request is sent to a text-to-speech (TTS) server, the text-to-speech (TTS) server converts the text confirmation request into a speech confirmation request and sends the speech confirmation request to the intelligent sound box terminal for voice broadcasting;
step S33: after receiving the response result, the intelligent sound box terminal sends the response result to the information distribution server;
step S34: and the information distribution server determines an issuing channel according to the identification result and the response result and issues the text reply result.
Preferably, in step S3, the process of recognizing the text reply result by the information distribution server includes:
step A1: judging whether the text reply result contains the user privacy content, if so, skipping to the step A5, otherwise, carrying out the next step;
step A2: judging whether the text reply result contains long text content, if so, jumping to the step A5, otherwise, performing the next step;
step A3: judging whether the text reply result contains the long digital sequence content, if so, jumping to the step A5, otherwise, performing the next step;
step A4: selecting a voice channel as a sending channel, sending the text reply result to a text-to-speech (TTS) server, converting the text reply result into a voice reply result by the text-to-speech (TTS) server, and sending the voice reply result to the intelligent sound box terminal for voice broadcasting;
step A5: judging whether the text reply result is suitable for the APP issuing, if so, sending the text reply result to an APP access server, and sending the text reply result to a mobile phone APP associated with the intelligent sound box terminal in an APP message form by the APP access server, otherwise, carrying out the next step;
step A6: and sending the text reply result to a short message gateway server, and sending the text reply result to a mobile phone terminal associated with the intelligent sound box terminal in a short message form by the short message gateway server.
In a second aspect, an embodiment of the present invention provides a system for issuing a message on an intelligent speaker in multiple channels, including a speaker access server, a service logic server, and an information distribution server;
the sound box access server is used for interacting voice files with the intelligent sound box terminal, is connected with an automatic voice recognition (ASR) server and a text-to-speech (TTS) server, and can send the voice files received by the intelligent sound box terminal to the automatic voice recognition (ASR) server for text processing and receive and broadcast the voice files sent by the text-to-speech (TTS) server;
the service logic server is used for obtaining a text request according to the conversion of the Automatic Speech Recognition (ASR) server, generating a corresponding text reply result and sending the text reply result to the information distribution server;
the information distribution server is used for identifying the text reply result, determining a sending channel according to the obtained identification result and sending the text reply result.
Preferably, the information distribution server is in communication connection with the sound box access server through a text-to-speech (TTS) server.
Preferably, the information distribution server is in communication connection with the loudspeaker box access server through the APP access server.
Preferably, the information distribution server is in communication connection with the sound box access server through a short message gateway server.
Preferably, the service logic server generates a text reply result corresponding to the text request by using natural language processing (NLP) technology.
Preferably, the information distribution server is further configured to generate a text confirmation request according to the identification result; after the intelligent sound box terminal receives the response result, the response result is sent to the information distribution server; and the information distribution server determines an issuing channel according to the identification result and the response result and issues the text reply result.
With the above system and implementation method, and without requiring a large investment of resources, the text reply result is identified using the intelligent judgment rules preset in the information distribution server together with the specific rules set by the user, and is sent in a suitable form to the intelligent sound box terminal, or to the short message client or APP client of the mobile phone terminal. This effectively addresses privacy disclosure, inconvenient information acquisition and unsatisfactory information retention: user privacy is better protected and the issued text reply result is easier to obtain and retain. The user experience is thereby improved, the market competitiveness of the intelligent sound box terminal is enhanced, and its popularization is facilitated.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a frame structure diagram of a multi-channel message issuing system applying the method of the present invention;
FIG. 2 is a schematic diagram illustrating a binding between a user ID and a smart speaker terminal ID;
FIG. 3 is a schematic flow chart of the information distribution server issuing a text reply result through different channels.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are obviously some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without inventive work fall within the scope of protection of the present invention.
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As used in this disclosure, "module," "device," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.
According to the invention, an information distribution server is added to the intelligent sound box voice interaction system. With it, the voice interaction system identifies the text reply result and, according to the identification result, issues the text reply result through different information channels, so that messages are issued through multiple channels. Fig. 1 schematically shows the frame structure of a multi-channel message issuing system applying the method of the present invention. As shown in fig. 1, the multi-channel message issuing system includes an intelligent sound box terminal, a sound box access server, a service logic server and an information distribution server.
The intelligent sound box terminal is used for receiving a voice request of a user and broadcasting a voice file sent by the back end. As shown in fig. 2, before the user uses the terminal, the user ID needs to be bound to the device ID of the intelligent sound box terminal, and the binding is completed through the mobile APP client on the mobile phone terminal. When using the terminal for the first time, the user identifies the intelligent sound box terminal in the APP client; then, in a WiFi environment, the APP client configures the corresponding parameters for the terminal so that it connects to the WiFi network used by the mobile phone terminal. Once the intelligent sound box terminal has joined the WiFi network, the device ID (namely the unique serial number of the device) and the user ID registered at the mobile phone APP client are automatically transmitted to the back end, where they are bound to form a mapping table of user IDs and device IDs that supports subsequent service logic processing and channel-specific issuing.
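The binding described above amounts to a lookup table kept at the back end. The following is a minimal sketch of such a table, assuming an in-memory dictionary; the class name `BindingTable`, its methods, and the sample IDs are hypothetical and are not taken from the patent, which does not specify how the mapping is stored.

```python
from typing import Dict, Optional


class BindingTable:
    """In-memory stand-in for the back-end user ID / device ID mapping table."""

    def __init__(self) -> None:
        # Maps the device's unique serial number to the bound user ID.
        self._device_to_user: Dict[str, str] = {}

    def bind(self, device_id: str, user_id: str) -> None:
        # Called once the sound box has joined the network and both IDs
        # have been reported to the back end.
        self._device_to_user[device_id] = user_id

    def user_for_device(self, device_id: str) -> Optional[str]:
        # Returns None when the device has never been bound.
        return self._device_to_user.get(device_id)


table = BindingTable()
table.bind("SN-0001", "user-42")  # hypothetical sample IDs
```

With this table, a reply generated for a request from device `SN-0001` can be addressed to the phone or APP account of `user-42` when a non-voice channel is chosen.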
The sound box access server is used for exchanging voice files with the intelligent sound box terminal. It is in communication connection with the service logic server through an automatic speech recognition (ASR) server (hereinafter, ASR server): after the sound box access server receives a user's voice request from the intelligent sound box terminal at the front end, the ASR server converts the voice request into a text request and sends it to the service logic server at the back end for logic processing. The sound box access server is also in communication connection with the information distribution server through a text-to-speech (TTS) server (hereinafter, TTS server): the TTS server converts a text reply result received from the information distribution server at the back end into a voice reply result and sends it to the sound box access server, which sends it to the intelligent sound box terminal at the front end for voice broadcast.
The service logic server performs natural language processing (NLP) on the received text request, carries out service processing in combination with data in the database of the intelligent sound box voice interaction system, and then generates the text reply result to be issued. In this embodiment, the service processing logic of the service logic server is the same as that of existing smart sound boxes and is not described in detail here.
The information distribution server is used for identifying the text reply result, determining an issuing channel according to the identification result, and issuing the text reply result. Channel issuing rules are preset in the information distribution server, which uses them to identify the text reply result and determine the issuing channel. Specifically, the channel issuing rules include:
a) Whether there are rules specifically set by the user. The user can make special settings through the APP, for example choosing to have all information of a certain category issued as an APP message or short message.
b) Whether the information contains private content of the user. If the text reply result to be issued includes the name, address or other information of the user, or of a person who has an interest in or is related to the user, it is issued as an APP message or short message.
For example, when the user queries a policy and the text reply result contains the insured person's name, address or similar information, the information is sent to the mobile phone as text, which effectively prevents the user's privacy from being revealed.
c) Whether the information is long text content. If the word count of the text reply result to be issued exceeds the set threshold of 500 words, it is issued as an APP message or short message.
In insurance products, one or more insurance clauses inevitably form long texts. Such clauses need to be considered carefully by the user, or even forwarded to others for discussion, and are better suited to being issued as text.
d) Whether the information contains a long digit sequence. If the text reply result to be issued contains a long digit sequence, it is issued as an APP message or short message.
For example, when the user queries a policy and the text reply result contains the policy number, which is usually a long digit sequence, broadcasting it by voice would leave the user with no record of the number.
e) Other rules that require issuing in a non-voice manner.
It should be noted that the above rules are illustrative only and do not exhaustively list the issuing rules; a system service provider or a user may add, delete or modify issuing rules according to actual needs.
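Rules a) to d) above can be sketched as a single check that returns the first rule a reply matches. This is only an illustration under stated assumptions: the 500-word threshold comes from rule c), but the minimum digit-run length of 10, the keyword-based privacy markers, the whitespace word count (which would not suit unsegmented Chinese text), and the names `UserSettings` and `matched_rule` are all hypothetical and not defined in the patent.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set

LONG_TEXT_WORDS = 500  # threshold stated in rule c)
LONG_DIGIT_RUN = 10    # assumption: the source does not define "long"
PRIVACY_MARKERS = ("name:", "address:")  # assumption: toy privacy detector


@dataclass
class UserSettings:
    # Categories the user has asked to receive as text only (rule a).
    non_voice_categories: Set[str] = field(default_factory=set)


def matched_rule(text: str, category: str, settings: UserSettings) -> Optional[str]:
    """Return the label of the first matching rule, or None if voice is acceptable."""
    if category in settings.non_voice_categories:      # rule a)
        return "user_rule"
    lowered = text.lower()
    if any(marker in lowered for marker in PRIVACY_MARKERS):  # rule b)
        return "privacy"
    if len(text.split()) > LONG_TEXT_WORDS:             # rule c)
        return "long_text"
    if re.search(r"\d{%d,}" % LONG_DIGIT_RUN, text):    # rule d)
        return "long_digits"
    return None
```

A production detector for private content would need far more than keyword matching; the point here is only that each rule is an independent predicate, so a provider or user can add or remove rules without touching the others.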
The information distribution server is in communication connection with the APP client of the mobile phone terminal through the APP access server. The APP access server is used for interacting various types of data with the APP client of the mobile phone terminal, a user submits a request (at least comprising a text type request) at the APP client, the APP access server sends the request to the rear end for processing, and the APP access server can also receive a text reply result sent by the rear end and sends the text reply result to the mobile phone APP client in the form of an APP message.
The information distribution server is in communication connection with a short message client of the mobile phone terminal through a short message gateway server. The short message gateway server is used for transmitting the text reply result to the short message gateway in a short message mode and sending the result to the short message client of the mobile phone terminal, and the short message client is only used for receiving the text reply result, so that unidirectional downlink communication is formed between the short message gateway server and the mobile phone short message client.
The information distribution server is also used for generating a text confirmation request according to the identification result, and after the intelligent sound box terminal receives the response result, the response result is sent to the information distribution server; and the information distribution server determines a issuing channel according to the identification result and the response result and issues a text reply result.
In the identification process, if the text reply result matches one of the rules, the information distribution server generates a text confirmation request according to the identification result and sends it to the TTS server. The TTS server converts the text confirmation request into a voice confirmation request and sends it to the intelligent sound box terminal at the front end for broadcasting, so that the user can respond and confirm at the front end. After the intelligent sound box terminal receives the response result, the ASR server converts it into a text response result and sends it to the information distribution server through the service logic server. According to the text response result, the information distribution server chooses whether to issue the text reply result as an APP message or as a short message.
Based on the system, the method for realizing the multi-channel message distribution of the intelligent sound box comprises the following steps:
Step 1: when the intelligent sound box terminal receives a voice request sent by a user, the ASR server converts the voice request into a text request, sends the text request to the service logic server, and the service logic server generates a corresponding text reply result and sends the text reply result to the information distribution server.
Step 2: the information distribution server identifies the text reply result, determines an issuing channel according to the obtained identification result and issues the text reply result.
Specifically, in step 2:
when the determined issuing channel is a voice channel, the text reply result is sent to a TTS server, and the TTS server converts the text reply result into a voice reply result and sends the voice reply result to the intelligent sound box terminal for voice broadcast;
when the determined issuing channel is a short message channel, sending a text reply result to a short message gateway server, sending the text reply result to a mobile phone terminal associated with the intelligent sound box terminal in a short message form by the short message gateway server, and displaying the text reply result on a mobile phone short message client;
and when the determined issuing channel is the APP channel, sending the text reply result to an APP access server, sending the text reply result to a mobile phone APP associated with the intelligent sound box terminal in an APP message form by the APP access server, and displaying the text reply result at a mobile phone APP client.
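The three cases above map each issuing channel to the server that handles it. A minimal sketch of that dispatch follows, assuming the servers are represented by local functions that record what they were asked to send; the names `Channel`, `outbox`, `HANDLERS` and `issue` are hypothetical, and a real system would make network calls instead.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Channel(Enum):
    VOICE = "voice"
    SMS = "sms"
    APP = "app"


# Demo outbox: records (destination server, text) instead of making network calls.
outbox: List[Tuple[str, str]] = []


def send_to_tts_server(text: str) -> None:
    outbox.append(("tts_server", text))


def send_to_sms_gateway(text: str) -> None:
    outbox.append(("sms_gateway_server", text))


def send_to_app_access_server(text: str) -> None:
    outbox.append(("app_access_server", text))


HANDLERS: Dict[Channel, Callable[[str], None]] = {
    Channel.VOICE: send_to_tts_server,
    Channel.SMS: send_to_sms_gateway,
    Channel.APP: send_to_app_access_server,
}


def issue(channel: Channel, text_reply: str) -> None:
    """Forward the text reply result to the server that owns the chosen channel."""
    HANDLERS[channel](text_reply)


issue(Channel.SMS, "Your policy has been renewed.")
```

Keeping the mapping in a table rather than in branching code means a further channel could be added by registering one more handler.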
Preferably, step 2 further comprises:
Step 21: after the information distribution server obtains the identification result, a text confirmation request is generated according to the identification result;
Step 22: the text confirmation request is sent to the text-to-speech (TTS) server, which converts it into a voice confirmation request and sends it to the intelligent sound box terminal for voice broadcasting, so that the user can answer and confirm;
Step 23: after receiving the response result of the user, the intelligent sound box terminal sends the response result to the information distribution server;
Step 24: the information distribution server determines an issuing channel according to the identification result and the response result and issues the text reply result.
The issuing channel is confirmed by combining the response result of the user, so that the issuing of the text reply result is more suitable for the requirement of the user.
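The confirmation steps above can be sketched as two small functions: one that builds the text confirmation request from the identification result, and one that combines the identification result with the user's (already transcribed) response to pick a channel. The wording of the prompt, the keyword matching on the response, and the fallback to the APP channel when the response is missing or unclear are all assumptions made for illustration; the patent does not specify them.

```python
from typing import Optional


def confirmation_text(rule: str) -> str:
    """Build the text confirmation request that the TTS server would read out."""
    return (
        f"This reply matched the '{rule}' rule. "
        "Should it be sent by APP message or by short message?"
    )


def choose_channel(rule: Optional[str], response_text: Optional[str],
                   default: str = "app") -> str:
    """Combine the identification result and the user's response into a channel."""
    if rule is None:
        # No rule matched, so the reply can simply be broadcast by voice.
        return "voice"
    reply = (response_text or "").lower()
    if "sms" in reply or "short message" in reply:
        return "sms"
    if "app" in reply:
        return "app"
    # Assumption: fall back to a default non-voice channel when the
    # response is missing or cannot be interpreted.
    return default
```

For example, `choose_channel("privacy", "Send it by short message")` selects the short message channel, while an unanswered confirmation falls back to the default.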
Specifically, as shown in FIG. 3, in step 2, the process of identifying the text reply result by the information distribution server includes:
step A1: judging whether the text reply result contains the user privacy content, if so, jumping to the step A5, otherwise, performing the next step;
step A2: judging whether the text reply result contains long text content, if so, jumping to the step A5, otherwise, performing the next step;
step A3: judging whether the text reply result contains the long digital sequence content, if so, jumping to the step A5, otherwise, performing the next step;
step A4: selecting a voice channel as the issuing channel, sending the text reply result to a text-to-speech (TTS) server, converting the text reply result into a voice reply result by the text-to-speech (TTS) server, and sending the voice reply result to the intelligent sound box terminal for voice broadcasting;
step A5: judging whether the text reply result is suitable for the APP issuing, if so, sending the text reply result to an APP access server, and sending the text reply result to a mobile phone APP associated with the intelligent sound box terminal in an APP message form by the APP access server, otherwise, carrying out the next step;
step A6: and sending the text reply result to a short message gateway server, and sending the text reply result to a mobile phone terminal associated with the intelligent sound box terminal in a short message form by the short message gateway server.
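The decision flow of steps A1 to A6 can be sketched in Python as below. The text does not say how privacy content, long text or long digit sequences are detected, nor what makes a reply suitable for APP issuing, so the keyword list, both thresholds and the `app_suitable` flag are assumptions made purely for illustration.

```python
import re

# All three values below are hypothetical; the source gives no concrete rules.
PRIVACY_KEYWORDS = ("insured", "id number", "address")
LONG_TEXT_THRESHOLD = 200      # characters
LONG_DIGIT_RUN = 8             # consecutive digits

_DIGIT_RUN = re.compile(rf"\d{{{LONG_DIGIT_RUN},}}")


def contains_privacy(text: str) -> bool:
    lowered = text.lower()
    return any(keyword in lowered for keyword in PRIVACY_KEYWORDS)


def is_long_text(text: str) -> bool:
    return len(text) > LONG_TEXT_THRESHOLD


def has_long_digit_sequence(text: str) -> bool:
    return _DIGIT_RUN.search(text) is not None


def select_channel(text: str, app_suitable: bool) -> str:
    """Return "voice", "app" or "sms" for one text reply result."""
    # A1 to A3: any positive check skips voice and jumps to step A5.
    unsuitable_for_voice = (
        contains_privacy(text)
        or is_long_text(text)
        or has_long_digit_sequence(text)
    )
    if not unsuitable_for_voice:
        return "voice"          # A4: broadcast through the TTS server
    if app_suitable:
        return "app"            # A5: issue through the APP access server
    return "sms"                # A6: fall back to the short message gateway
```

Because A1, A2 and A3 all jump to the same step, they collapse into a single boolean test here; the order of the checks only matters if one of them is expensive to evaluate.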
To aid understanding of the above method, an application of the intelligent sound box terminal in the insurance industry is described as an example.
It should be noted that, when the user is an insurance customer, given the particular nature of insurance services, the user's identity information is bound at the same time that the user binds the user ID to the device ID of the intelligent sound box terminal in the mobile phone APP client. When the user then interacts by voice with the intelligent sound box terminal, the back-end service logic server is authorized to retrieve the insurance product information matching the user ID, so that voice interaction can proceed normally.
For example, the user asks the intelligent sound box terminal by voice: "How many insurance policies do I have?"
After receiving the voice question, the intelligent sound box terminal has it converted to text by the ASR server, and the service logic server performs service processing and generates a text reply result. At the information distribution server, no preset issuing rule is triggered, so the text reply result is sent directly to the TTS server for speech synthesis, and the intelligent sound box terminal answers the user by voice: "You have two insurance policies, one car insurance policy and one life insurance policy."
As another example, the user asks the intelligent sound box terminal: "Who is the insured under this policy?"
A text reply result is generated through the ASR server and the service logic server. When the information distribution server identifies the text reply result, rule b is matched. The information distribution server then generates a text confirmation request, which is sent through the TTS server to the intelligent sound box terminal for voice broadcast: "This information involves your privacy, so it will not be broadcast by voice and will be sent to your mobile phone APP in text form." At the same time, the information distribution server sends the text reply result to the user's mobile phone APP client, where it is displayed.
Suppose that, after viewing the text reply result in the APP, the user goes on to ask the intelligent sound box terminal by voice: "I want to see the detailed terms of this life insurance policy."
After the intelligent sound box terminal receives the question, the service logic server generates another text reply result, and rule c is triggered when the information distribution server identifies it. A text confirmation request is then generated and sent through the TTS server to the intelligent sound box terminal for voice broadcast: "The detailed terms exceed six thousand words and would take 30 minutes to broadcast by voice, so it is recommended to send them to your mobile phone APP in text form. Do you agree?"
If the user answers "Yes", the detailed terms are sent to the user's mobile phone APP client.
Suppose the user then asks the intelligent sound box terminal: "I want to see the policy number of this life insurance policy."
The processing is the same as above. After rule d is triggered, the intelligent sound box terminal broadcasts by voice: "You may need to write down your policy number, so it will not be broadcast by voice. Would you like it sent to your mobile phone APP or by short message?"
If the user answers "Send a short message", the information distribution server forwards the information, together with the user's mobile phone number, to the short message gateway server, which sends it to the user's mobile phone short message client in the form of a short message.
If the user answers "Send to the mobile phone APP", the information distribution server sends the information, in the form of an APP message, to the user's mobile phone APP client through the APP access server.
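The long-text prompt in the example above pairs a length of six thousand with a broadcast time of 30 minutes, which implies a speaking rate of 6000 / 30 = 200 units per minute. The Python sketch below shows how such an estimate might be computed and turned into a confirmation prompt. The function names, the treatment of the rate as a fixed constant, and the exact prompt wording are assumptions rather than anything the text specifies.

```python
import math

# Rate implied by the example figures (6000 units in 30 minutes); assumed fixed.
UNITS_PER_MINUTE = 200


def estimate_broadcast_minutes(length: int,
                               units_per_minute: int = UNITS_PER_MINUTE) -> int:
    """Estimate the voice broadcast time, rounded up to whole minutes."""
    return math.ceil(length / units_per_minute)


def long_text_prompt(length: int) -> str:
    """Build a hypothetical text confirmation request for a long reply."""
    minutes = estimate_broadcast_minutes(length)
    return (
        f"The detailed terms have a length of {length} and would take about "
        f"{minutes} minutes to broadcast by voice, so it is recommended to "
        f"send them to your mobile phone APP in text form. Do you agree?"
    )
```

Rounding up keeps the estimate conservative, so the prompt never understates how long a voice broadcast would take.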
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (7)

1. A method for realizing message transmission by multiple channels of an intelligent sound box is characterized by comprising the following steps:
step S1: connecting a cloud access part of the intelligent sound box voice interaction system with an automatic speech recognition (ASR) server and a text-to-speech (TTS) server, and setting an information distribution server in the intelligent sound box voice interaction system;
step S2: when the intelligent sound box terminal receives a voice request sent by a user, an Automatic Speech Recognition (ASR) server converts the voice request into a text request, sends the text request to a service logic server in the intelligent sound box voice interaction system, and generates a corresponding text reply result by the service logic server and sends the text reply result to an information distribution server;
step S3: the information distribution server identifies the text reply result, determines an issuing channel according to the obtained identification result, and issues the text reply result;
in the step S3:
when the determined issuing channel is a voice channel, sending the text reply result to a text-to-speech (TTS) server, converting the text reply result into a voice reply result by the text-to-speech (TTS) server, and sending the voice reply result to the intelligent sound box terminal for voice broadcasting;
when the determined issuing channel is a short message channel, sending a text reply result to a short message gateway server, and sending the text reply result to a mobile phone terminal associated with the intelligent sound box terminal in a short message form by the short message gateway server;
and when the determined issuing channel is the APP channel, sending the text reply result to an APP access server, and sending the text reply result to a mobile phone APP associated with the intelligent sound box terminal in an APP message form by the APP access server.
2. The method for implementing multi-channel message distribution of smart speakers according to claim 1, wherein the step S3 further includes:
step S31: after the information distribution server obtains the identification result, a text confirmation request is generated according to the identification result;
step S32: the text confirmation request is sent to a text-to-speech (TTS) server, the text-to-speech (TTS) server converts the text confirmation request into a speech confirmation request and sends the speech confirmation request to the intelligent sound box terminal for voice broadcasting;
step S33: after receiving the response result, the intelligent sound box terminal sends the response result to the information distribution server;
step S34: and the information distribution server determines an issuing channel according to the identification result and the response result and issues a text reply result.
3. The method for implementing multi-channel message delivery of an intelligent sound box according to claim 1, wherein in step S3, the process of recognizing the text reply result by the information distribution server includes:
step A1: judging whether the text reply result contains the user privacy content, if so, jumping to the step A5, otherwise, performing the next step;
step A2: judging whether the text reply result contains long text content, if so, jumping to the step A5, otherwise, performing the next step;
step A3: judging whether the text reply result contains the long digital sequence content, if so, jumping to the step A5, otherwise, performing the next step;
step A4: selecting a voice channel as a sending channel, sending the text reply result to a text-to-speech (TTS) server, converting the text reply result into a voice reply result by the text-to-speech (TTS) server, and sending the voice reply result to the intelligent sound box terminal for voice broadcast;
step A5: judging whether the text reply result is suitable for the APP issuing, if so, sending the text reply result to an APP access server, and sending the text reply result to a mobile phone APP associated with the intelligent sound box terminal in an APP message form by the APP access server, otherwise, carrying out the next step;
step A6: and sending the text reply result to a short message gateway server, and sending the text reply result to a mobile phone terminal associated with the intelligent sound box terminal in a short message form by the short message gateway server.
4. A system for issuing messages by an intelligent sound box through multiple channels is an intelligent sound box voice interaction system and is characterized by comprising a sound box access server, a business logic server and an information distribution server;
the sound box access server is used for exchanging voice files with the intelligent sound box terminal, is connected with an automatic speech recognition (ASR) server and a text-to-speech (TTS) server, and can send the voice files received by the intelligent sound box terminal to the automatic speech recognition (ASR) server for conversion to text, and receive the voice files sent by the text-to-speech (TTS) server for broadcast;
the service logic server is used for obtaining a text request according to the conversion of the Automatic Speech Recognition (ASR) server, generating a corresponding text reply result and sending the text reply result to the information distribution server;
the information distribution server is used for identifying the text reply result, determining a sending channel according to the obtained identification result and sending the text reply result;
the information distribution server is in communication connection with the sound box access server through a text-to-speech (TTS) server;
the information distribution server is in communication connection with an APP client of the mobile phone terminal through an APP access server.
5. The system for multi-channel message delivery of smart speakers according to claim 4, wherein the message distribution server is communicatively connected to the short message client of the mobile phone terminal through a short message gateway server.
6. The system for multi-channel messaging by a smart speaker as in claim 4, wherein the service logic server generates the text reply result corresponding to the text request by using a natural language processing (NLP) technique.
7. The system for multi-channel message distribution of smart speakers according to any one of claims 4-6, wherein the information distribution server is further configured to generate a text confirmation request according to the identification result; after receiving the response result, the intelligent sound box terminal sends the response result to the information distribution server; and the information distribution server determines an issuing channel according to the identification result and the response result and issues a text reply result.
CN201911312805.9A 2019-12-18 2019-12-18 Method and system for realizing multi-channel message distribution of intelligent loudspeaker box Active CN111128159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911312805.9A CN111128159B (en) 2019-12-18 2019-12-18 Method and system for realizing multi-channel message distribution of intelligent loudspeaker box

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911312805.9A CN111128159B (en) 2019-12-18 2019-12-18 Method and system for realizing multi-channel message distribution of intelligent loudspeaker box

Publications (2)

Publication Number Publication Date
CN111128159A CN111128159A (en) 2020-05-08
CN111128159B true CN111128159B (en) 2022-05-31

Family

ID=70498347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911312805.9A Active CN111128159B (en) 2019-12-18 2019-12-18 Method and system for realizing multi-channel message distribution of intelligent loudspeaker box

Country Status (1)

Country Link
CN (1) CN111128159B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111369970A (en) * 2020-06-01 2020-07-03 浙江百应科技有限公司 Method for intelligent routing of TTS (text to speech) channel with high availability
CN114566163B (en) * 2022-02-23 2023-05-02 成都智元汇信息技术股份有限公司 Public transport voice processing method, device, system, electronic equipment and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10142271B2 (en) * 2015-03-06 2018-11-27 Unify Gmbh & Co. Kg Method, device, and system for providing privacy for communications
CN110334500B (en) * 2019-06-28 2022-04-12 百度在线网络技术(北京)有限公司 Authority control method and device of intelligent sound box, intelligent sound box and storage medium

Also Published As

Publication number Publication date
CN111128159A (en) 2020-05-08

Similar Documents

Publication Publication Date Title
CN106791233B (en) It is a kind of for providing the method and IVR system of IVR service procedure
US20080126491A1 (en) Method for Transmitting Messages from a Sender to a Recipient, a Messaging System and Message Converting Means
JP2018501721A (en) Method and apparatus for processing audio information
JP2008219903A (en) Communication server for handling sound and data connection in parallel and method for using the same
CN105206272A (en) Voice transmission control method and system
CN109005190B (en) Method for realizing full duplex voice conversation and page control on webpage
CN111128159B (en) Method and system for realizing multi-channel message distribution of intelligent loudspeaker box
CN111447397B (en) Video conference based translation method, video conference system and translation device
US20040218737A1 (en) Telephone system and method
US20180139158A1 (en) System and method for multipurpose and multiformat instant messaging
CN106847256A (en) A kind of voice converts chat method
CN102655538A (en) Method and system for intelligently creating schedules
US9110888B2 (en) Service server apparatus, service providing method, and service providing program for providing a service other than a telephone call during the telephone call on a telephone
KR20170073417A (en) System for response correspond to mobile message
CN102497443A (en) Vehicle-mounted station based on Internet, system and communication method thereof
US20210249007A1 (en) Conversation assistance device, conversation assistance method, and program
CN111554280A (en) Real-time interpretation service system for mixing interpretation contents using artificial intelligence and interpretation contents of interpretation experts
CN110995577B (en) Multi-channel adaptation method and device for message and storage medium
US9277051B2 (en) Service server apparatus, service providing method, and service providing program
CN111246127A (en) Cross-platform-based real-time subtitle display method and management system
JP2005151553A (en) Voice portal
CN107241200A (en) A kind of Web conference method and device
CN110990553A (en) Coupling method and system of intelligent sound box voice interaction system and insurance agent
US20200322293A1 (en) Information processing system and method
US8331541B1 (en) Systems and methods for providing instant messaging to TDD/TTY users

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant