CN108831434A

CN108831434A - Voice interaction system and method

Info

Publication number: CN108831434A
Application number: CN201810529253.6A
Authority: CN
Inventors: 尹绍华
Original assignee: Individual
Current assignee: Individual
Priority date: 2018-05-29
Filing date: 2018-05-29
Publication date: 2018-11-16

Abstract

This application relates to a voice interaction system and method. The system includes multiple simple clients and an intelligent agent, where the intelligent agent is connected to each simple client. Each simple client receives and/or plays voice and exchanges voice with the intelligent agent. The intelligent agent performs the processing related to intelligent voice interaction with a cloud server, obtains the relevant information, and converts that information into a voice signal. Because an intelligent agent is provided, a simple client only needs to handle voice input and output and complete audio communication with the intelligent agent for the user to carry out intelligent voice interaction. The user terminal can therefore be simplified, which broadens the application scenarios and reduces production cost.

Description

Voice interaction system and method

Technical field

This application relates to the technical field of voice interaction, and more particularly to a voice interaction system and method.

Background art

Artificial intelligence is developing rapidly and is now applied more and more widely in many settings. Among these applications, artificial intelligence products based on voice interaction are seeing more uses and more new products.

These products are typically implemented on an interaction architecture of a smart client and a cloud server. Complex computation is concentrated in the cloud server, which simplifies the client and lowers its cost.

In the related art, although the client is simplified, it is still required to have a certain degree of intelligence. For example, if the client is to interact with the cloud server, it must implement basic interaction logic, and implementing this logic places certain requirements on the device, such as the ability to run a third-party intelligent voice processing terminal library and the ability to obtain related resources over the network. The cost of implementing such a terminal is not low. Meanwhile, some traditional audio devices, such as analog intercom terminals, cannot be used at all, or can be used only after a retrofit that raises cost.

Summary of the invention

To overcome, at least to some extent, the problems in the related art, this application provides a voice interaction system and method.

According to a first aspect of the embodiments of this application, a voice interaction system is provided, including: multiple simple clients; and an intelligent agent. The intelligent agent is connected to each simple client. Each simple client is used to receive and/or play voice and to exchange voice with the intelligent agent. The intelligent agent is used to perform the processing related to intelligent voice interaction with a cloud server, obtain relevant information, and convert the relevant information into a voice signal.

Optionally, the intelligent agent and each simple client are connected by an analog communication link or by a TCP/IP link.

Optionally, each simple client is deployed locally at the user's site and handles the interaction with the user.

Optionally, the intelligent agent is deployed locally at the user's site, where it manages multiple simple clients; alternatively, it is deployed in the cloud.

Optionally, the simple client includes: a voice acquisition module for receiving the user's voice; a voice playing module for playing voice to the user; and a transmission module for transmitting the voice received by the voice acquisition module to the intelligent agent, receiving the voice sent by the intelligent agent, and passing the received voice to the voice playing module.

Optionally, the simple client further includes one or more of the following: a voice coding module for encoding the voice received by the voice acquisition module and sending the encoded voice to the transmission module; a voice decoding module for receiving the encoded voice sent by the intelligent agent, decoding it, and sending the decoded voice to the voice playing module; and an external control interface for receiving a control instruction converted from voice and forwarding it to a third party.

Optionally, the capabilities that the intelligent agent uses for intelligent voice interaction processing and for obtaining the resources the user needs include one or more of the following: speech recognition, speech synthesis, semantic understanding, resource acquisition, speech detection, and user-defined processing logic.

According to a second aspect of the embodiments of this application, a voice interaction method is provided, including: exchanging voice between a simple client and an intelligent agent, where the simple client, on receiving voice uttered by the user, sends the received voice to the intelligent agent, or the simple client receives voice sent by the intelligent agent and plays it to the user; and the intelligent agent performing the processing related to intelligent voice interaction with a cloud server, obtaining relevant information, and converting the relevant information into a voice signal; where each intelligent agent is connected to multiple simple clients.

Optionally, the related processing further includes: the intelligent agent processing the speech recognition result into a control instruction and returning it to the simple client.

The technical solution provided by this application can include the following beneficial effects:

Because an intelligent agent is provided, a simple client only needs to handle voice input and output and complete audio communication with the intelligent agent for the user to carry out intelligent voice interaction. The user terminal can therefore be simplified, which broadens the application scenarios and reduces production cost.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit this application.

Brief description of the drawings

The accompanying drawings are incorporated into and form part of this specification. They show embodiments consistent with this application and, together with the specification, serve to explain its principles.

Fig. 1 is a structural schematic diagram of the voice interaction system provided by one embodiment of this application;

Fig. 2 is a structural schematic diagram of the simple client of the voice interaction system provided by another embodiment of this application;

Fig. 3 is a flowchart of the voice interaction method provided by another embodiment of this application;

Fig. 4 is a flowchart of the voice interaction method provided by another embodiment of this application;

Fig. 5 is a flowchart of the voice interaction method provided by another embodiment of this application;

Fig. 6 is a flowchart of the voice interaction method provided by another embodiment of this application.

Specific embodiments

Example embodiments are described in detail here, with examples illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of devices and methods consistent with some aspects of this application as detailed in the appended claims.

Fig. 1 is a structural schematic diagram of the voice interaction system provided by one embodiment of this application. Fig. 1 takes two simple clients as an example. The voice interaction system includes multiple simple clients 1 and an intelligent agent 2. The intelligent agent 2 is connected to each simple client 1. Each simple client 1 is used to receive and/or play voice and to exchange voice with the intelligent agent 2. The intelligent agent 2 is used to perform the processing related to intelligent voice interaction with a cloud server 3, obtain the information the user needs, and convert that information into a voice signal.

The simple client can have audio input and output functions and a transmission function. It is deployed locally to interact with the user. Further, the simple client can also have audio encoding and decoding functions.

The intelligent agent can connect to multiple simple clients, can collect and send audio information, and can also carry out the interaction with the cloud server, for example speech recognition, speech synthesis, semantic recognition, and resource acquisition. Because multiple simple clients connect to the same intelligent agent, one intelligent agent can serve all of them, and there is no need for one intelligent agent per simple client. This keeps the implementation simple and saves cost.

The related processing may include speech recognition, speech synthesis, semantic recognition, resource acquisition, and so on; the relevant information may include a voice reply, a control instruction, and so on.

In this embodiment, because an intelligent agent is provided, a simple client only needs to handle voice input and output and complete audio communication with the intelligent agent for the user to carry out intelligent voice interaction. The user terminal can therefore be simplified, which broadens the application scenarios and reduces production cost.
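The arrangement of Fig. 1 can be illustrated with a minimal Python sketch. It is not taken from the application: the class names, the `on_voice` method, and the `fake_cloud` function are hypothetical and only show one intelligent agent serving several simple clients while all intelligent processing stays on the agent side.

```python
from typing import Callable, Dict, List


class SimpleClient:
    """Hypothetical simple client: it only captures and plays audio."""

    def __init__(self, client_id: str) -> None:
        self.client_id = client_id
        self.played: List[bytes] = []  # stands in for a loudspeaker

    def play(self, audio: bytes) -> None:
        self.played.append(audio)


class IntelligentAgent:
    """Hypothetical agent: one instance serves many simple clients."""

    def __init__(self, cloud: Callable[[bytes], bytes]) -> None:
        self._cloud = cloud  # stand-in for the cloud server interaction
        self._clients: Dict[str, SimpleClient] = {}

    def connect(self, client: SimpleClient) -> None:
        self._clients[client.client_id] = client

    def on_voice(self, client_id: str, audio: bytes) -> None:
        # All intelligent processing happens here, not in the client.
        reply_audio = self._cloud(audio)
        self._clients[client_id].play(reply_audio)


def fake_cloud(audio: bytes) -> bytes:
    # Placeholder for recognition, understanding, and synthesis.
    return b"reply-to:" + audio


agent = IntelligentAgent(fake_cloud)
kitchen, bedroom = SimpleClient("kitchen"), SimpleClient("bedroom")
agent.connect(kitchen)
agent.connect(bedroom)
agent.on_voice("kitchen", b"hello")
```

In this sketch the reply is routed back only to the client that sent the voice, so the second client plays nothing.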

In some embodiments, the intelligent agent 2 and each simple client 1 are connected by an analog communication link or by a Transmission Control Protocol/Internet Protocol (TCP/IP) link.

It should be noted that analog communication uses variations in the amplitude, frequency, or phase of a sine wave, or variations in the amplitude, width, or position of a pulse, to represent the original signal and thereby achieve communication, hence the name. A TCP/IP link connection transmits encoded digital audio signals over TCP/IP. Both connection modes use existing technology and are not described in detail here.

TCP/IP is the most basic protocol suite of the Internet; the foundation of the Internet consists of the IP protocol at the network layer and the TCP protocol at the transport layer.

When a TCP/IP link is used, some simple interaction instructions can be added between the simple client and the intelligent agent, for example an instruction indicating that voice transmission is complete or an instruction to play a specified prompt tone, in order to simplify voice input/output control.

In this embodiment, connecting the intelligent agent and the simple clients by an analog communication link or a TCP/IP link makes it convenient to transfer information and exchange voice between the two ends. To save bandwidth, the simple client can also add audio encoding or decoding functions.

It should be noted that each simple client is deployed locally at the user's site and handles the interaction with the user.

In some embodiments, the intelligent agent 2 is deployed locally at the user's site, where it manages multiple simple clients; alternatively, it is deployed in the cloud.

In this embodiment, the intelligent agent can be deployed either locally at the user's site or in the cloud, and can manage and serve multiple simple clients, which reduces the implementation cost of the whole system.

As shown in Fig. 2, in some embodiments the simple client 1 includes: a voice acquisition module 11 for receiving the user's voice; a voice playing module 12 for playing voice to the user; and a transmission module 13 for transmitting the voice received by the voice acquisition module 11 to the intelligent agent 2, receiving the voice sent by the intelligent agent 2, and passing the received voice to the voice playing module 12.

It should be noted that the voice acquisition module can be a microphone, and the voice playing module can be a loudspeaker.

In this embodiment, the voice acquisition module, voice playing module, and transmission module meet the basic functional needs of the simple client: collecting voice, playing voice, and communicating with the intelligent agent.

As shown in Fig. 2, in some embodiments the simple client further includes one or more of the following:

a voice coding module 14 for encoding the voice received by the voice acquisition module 11 and sending the encoded voice to the transmission module 13;

a voice decoding module 15 for receiving the encoded voice sent by the intelligent agent 2, decoding it, and sending the decoded voice to the voice playing module 12;

an external control interface for receiving a control instruction converted from voice and forwarding it to a third party.

It should be noted that voice is encoded in order to digitize it and to reduce the coding rate by exploiting the redundancy present in human speech production and the characteristics of human hearing.

Further, the voice acquisition module can be a microphone, and the voice playing module can be a loudspeaker.

It can be understood that voice coding and voice decoding can be implemented using the related art, which is not detailed here.

In this embodiment, the voice coding module and voice decoding module compress the transmission bandwidth of the voice signal and increase the transmission efficiency of the channel. With the external control interface, the simple client can receive and forward control instructions converted from voice, which realizes voice-controlled human-machine interaction with third-party devices.
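The application leaves the choice of codec to the related art. Purely as an illustration of how coding can reduce bandwidth, the sketch below applies mu-law companding, which stores each 16-bit sample in 8 bits. It uses the continuous mu-law formula rather than the bit-exact segment tables of the G.711 standard, and the function names are hypothetical.

```python
import math

MU = 255.0  # companding parameter commonly used for mu-law


def mulaw_encode(sample: int) -> int:
    """Compress one signed 16-bit PCM sample into one unsigned byte."""
    x = max(-1.0, min(1.0, sample / 32768.0))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255.0))


def mulaw_decode(code: int) -> int:
    """Expand one byte back into an approximate signed 16-bit sample."""
    y = code / 255.0 * 2.0 - 1.0
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(round(x * 32767.0))


samples = [-32768, -1000, 0, 1000, 32767]
encoded = bytes(mulaw_encode(s) for s in samples)    # 1 byte per sample
decoded = [mulaw_decode(code) for code in encoded]   # lossy reconstruction
```

The encoded form needs half the bytes of 16-bit PCM, at the price of a small reconstruction error that grows with amplitude.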

In some embodiments, the capabilities that the intelligent agent 2 uses for intelligent voice interaction processing and for obtaining the resources the user needs include one or more of the following: speech recognition, speech synthesis, semantic understanding, resource acquisition, speech detection, and user-defined processing logic.

It should be noted that the intelligent agent can interact with multiple cloud resource servers, such as a speech recognition server, a speech synthesis server, a semantic understanding and answering server, and a resource server.

In this embodiment, the intelligent agent integrates multiple functions in one place, which reduces the functions required of the simple client and lowers its cost.
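Of the capabilities listed above, speech detection is what lets the intelligent agent decide that the user has finished speaking. The application does not say how this is done. One simple possibility, shown here only as an assumption, is an energy threshold with a run of trailing silent frames; the threshold value, frame size, and function names are hypothetical.

```python
from typing import Iterable, List


def frame_energy(frame: List[int]) -> float:
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def speech_finished(frames: Iterable[List[int]],
                    threshold: float = 1_000_000.0,
                    trailing_silent_frames: int = 3) -> bool:
    """Return True once speech has been heard and is then followed by
    `trailing_silent_frames` consecutive low-energy frames."""
    heard_speech = False
    silent_run = 0
    for frame in frames:
        if frame_energy(frame) >= threshold:
            heard_speech = True
            silent_run = 0          # speech resumed, restart the count
        elif heard_speech:
            silent_run += 1
            if silent_run >= trailing_silent_frames:
                return True
    return False


loud = [2000] * 160   # energy 4,000,000, above the threshold
quiet = [10] * 160    # energy 100, below the threshold
finished = speech_finished([quiet, loud, loud, quiet, quiet, quiet])
still_talking = speech_finished([loud, quiet, loud, quiet])
```

A production detector would usually be more robust to noise; this sketch only shows where such logic sits in the agent.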

As shown in Fig. 3, this application also provides a voice interaction method, including the following steps:

S31: On receiving voice uttered by the user, the simple client sends the received voice to the intelligent agent;

S32: The intelligent agent performs the processing related to intelligent voice interaction with the cloud server, obtains relevant information, and converts the relevant information into a voice signal;

where each intelligent agent is connected to multiple simple clients;

S33: The simple client receives the voice sent by the intelligent agent and plays it to the user.

In this embodiment, the intelligent agent performs speech recognition, speech synthesis, semantic recognition, and so on, while the client only needs to handle voice input and output, communicate with the intelligent agent, and encode or decode audio. This simplifies the client and reduces production cost.

As shown in Fig. 4, an embodiment of this application also provides a voice response flow, including the following steps:

S41: The simple client initiates a connection to the intelligent agent after the user triggers speaking;

S42: The simple client continuously sends the collected voice input, "what is your name", to the intelligent agent;

S43: The intelligent agent collects the voice input and, on detecting that speaking is complete, sends the voice input to the speech recognition service and waits for the recognition result;

S44: The intelligent agent receives the speech recognition result, sends the recognized text to the semantic understanding and answering service, and waits for a reply;

S45: After receiving the reply, if it is a text reply, the intelligent agent sends the text reply to the speech synthesis service and waits for the synthesis result;

S46: The intelligent agent receives the speech synthesis result and sends it back to the simple client;

S47: The simple client plays the speech synthesis result to the user, here the answer "I am Red Bean".

In this embodiment, through the interaction between the intelligent agent and the cloud server, the series of operations of speech recognition, semantic recognition, and speech synthesis is carried out to complete the voice response flow. This lowers the requirements on the client and reduces its manufacturing cost.
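The agent-side part of this flow (steps S43 to S46) amounts to chaining three cloud services. The sketch below shows that chaining under stated assumptions: the three service functions are local stand-ins with hypothetical names, whereas in the described system they would be requests to cloud servers.

```python
from typing import Callable


def handle_utterance(audio: bytes,
                     recognize: Callable[[bytes], str],
                     understand: Callable[[str], str],
                     synthesize: Callable[[str], bytes]) -> bytes:
    """Run steps S43 to S46 for one completed utterance."""
    text = recognize(audio)          # S43/S44: speech recognition result
    reply_text = understand(text)    # S44/S45: semantic understanding and answer
    return synthesize(reply_text)    # S45/S46: speech synthesis result


# Stand-ins for the cloud services; real services would be remote calls.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_understand(text: str) -> str:
    return "I am Red Bean" if text == "what is your name" else "Sorry?"


def fake_synthesize(text: str) -> bytes:
    return ("<audio:" + text + ">").encode("utf-8")


reply_audio = handle_utterance(b"what is your name",
                               fake_recognize, fake_understand, fake_synthesize)
```

Passing the services in as parameters keeps the agent logic independent of which cloud provider supplies each capability.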

As shown in Fig. 5, an embodiment of this application also provides a music playing flow, including the following steps:

S51: The simple client initiates a connection to the intelligent agent after the user triggers speaking;

S52: The simple client continuously sends the collected voice input, "I want to listen to 'lustily water'", to the intelligent agent;

S53: The intelligent agent collects the voice input and, on detecting that speaking is complete, sends the voice input to the speech recognition service and waits for the recognition result;

S54: The intelligent agent receives the speech recognition result, sends the recognized text to the semantic understanding and answering service, and waits for a reply;

S55: After receiving the reply, if the answer is to play a music file, the intelligent agent sends a request to the music server for the MP3 music;

S56: After receiving the MP3 stream, the intelligent agent determines whether the simple client is able to decode MP3. If not, the intelligent agent decodes the MP3 stream first; if so, no decoding step is needed;

S57: The intelligent agent sends the MP3 stream, decoded in S56 where required, to the simple client;

S58: The simple client plays the music.

In this embodiment, through the interaction between the intelligent agent and the cloud server, the series of operations such as speech recognition, semantic recognition, and speech synthesis is carried out to complete the music control flow. This lowers the requirements on the client and reduces its manufacturing cost.
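The decision in step S56 can be sketched as follows. How the intelligent agent learns a client's capabilities is not described in the application, so the `ClientCapabilities` record and the `fake_decode` placeholder are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ClientCapabilities:
    can_decode_mp3: bool  # hypothetical flag known to the agent


def prepare_music(mp3_stream: bytes,
                  caps: ClientCapabilities,
                  decode_mp3: Callable[[bytes], bytes]) -> bytes:
    """Step S56: decode on the agent only if the client cannot."""
    if caps.can_decode_mp3:
        return mp3_stream          # forward the MP3 stream unchanged
    return decode_mp3(mp3_stream)  # agent decodes before forwarding


def fake_decode(mp3: bytes) -> bytes:
    # Placeholder decoder; a real agent would call an MP3 library here.
    return b"PCM:" + mp3


for_plain_client = prepare_music(b"mp3-data", ClientCapabilities(False), fake_decode)
for_capable_client = prepare_music(b"mp3-data", ClientCapabilities(True), fake_decode)
```

Keeping the decoder on the agent lets even a client with no MP3 support play the music, at the cost of more bandwidth on the link.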

As shown in Fig. 6, an embodiment of this application also provides a voice control flow, including the following steps:

S61: The simple client initiates a connection to the intelligent agent after the user triggers speaking;

S62: After the user speaks, the simple client continuously sends the collected sound to the intelligent agent; here the user says "turn on the TV";

S63: The intelligent agent collects the voice input and, on detecting that speaking is complete, sends the voice to the speech recognition service and waits for a reply;

S64: After receiving the reply, if it is a control instruction, the intelligent agent sends the control instruction to the simple client;

S65: The simple client sends the control instruction to a third party; here it controls an infrared transmitter through a serial port to turn on the television.

In this embodiment, through the interaction between the intelligent agent and the cloud server, speech recognition and semantic recognition are carried out to complete voice control of the television. This lowers the requirements on the client and reduces the manufacturing cost of the remote control.
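Steps S64 and S65 imply that the simple client must tell a control instruction apart from ordinary voice and pass the former to its external control interface. A minimal sketch follows; the `VoiceReply` and `ControlInstruction` types, the class name, and the `fake_serial_ir` callback are hypothetical, and the real serial and infrared handling is replaced by a list that records what would be sent.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class VoiceReply:
    audio: bytes


@dataclass
class ControlInstruction:
    target: str   # e.g. "tv"
    action: str   # e.g. "power_on"


Reply = Union[VoiceReply, ControlInstruction]


class ControllableClient:
    """Hypothetical simple client with an external control interface."""

    def __init__(self, external_control: Callable[[ControlInstruction], None]) -> None:
        self._external_control = external_control
        self.played: List[bytes] = []  # stands in for a loudspeaker

    def on_reply(self, reply: Reply) -> None:
        if isinstance(reply, ControlInstruction):
            self._external_control(reply)    # S65: forward to the third party
        else:
            self.played.append(reply.audio)  # ordinary voice reply


sent: List[str] = []


def fake_serial_ir(instruction: ControlInstruction) -> None:
    # Placeholder for writing to a serial port that drives an IR transmitter.
    sent.append(f"{instruction.target}:{instruction.action}")


client = ControllableClient(fake_serial_ir)
client.on_reply(ControlInstruction("tv", "power_on"))
```

The client only forwards the instruction; interpreting the user's speech remains entirely on the intelligent agent and cloud side.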

Regarding the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method and is not explained in detail here.

It can be understood that the same or similar parts of the above embodiments can refer to each other, and content not detailed in some embodiments can be found in the same or similar content of other embodiments.

It should be noted that, in the description of this application, the terms "first", "second", and so on are used for descriptive purposes only and should not be understood as indicating or implying relative importance. In addition, in the description of this application, "multiple" means at least two unless otherwise stated.

Any process or method description in a flowchart or otherwise described herein can be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing a specific logical function or step of the process. The scope of the preferred embodiments of this application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of this application belong.

Those of ordinary skill in the art can understand that all or part of the steps of the above embodiment methods can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.

In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that the specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described can be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of this application have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting this application. Those of ordinary skill in the art can change, modify, replace, and vary the above embodiments within the scope of this application.

Claims

1. A voice interaction system, characterized by including:

multiple simple clients;

an intelligent agent;

wherein the intelligent agent is connected to each simple client;

each simple client is used to receive and/or play voice and to exchange voice with the intelligent agent;

the intelligent agent is used to perform the processing related to intelligent voice interaction with a cloud server, obtain relevant information, and convert the relevant information into a voice signal.

2. The system according to claim 1, characterized in that the intelligent agent and each simple client are connected by an analog communication link or by a TCP/IP link.

3. The system according to claim 1, characterized in that each simple client is deployed locally at the user's site and handles the interaction with the user.

4. The system according to claim 3, characterized in that the intelligent agent is deployed locally at the user's site, where it manages multiple simple clients; alternatively, it is deployed in the cloud.

5. The system according to claim 1, characterized in that the simple client includes:

a voice acquisition module for receiving the user's voice;

a voice playing module for playing voice to the user;

a transmission module for transmitting the voice received by the voice acquisition module to the intelligent agent, receiving the voice sent by the intelligent agent, and passing the received voice to the voice playing module.

6. The system according to claim 5, characterized in that the simple client further includes one or more of the following:

a voice coding module for encoding the voice received by the voice acquisition module and sending the encoded voice to the transmission module;

a voice decoding module for receiving the encoded voice sent by the intelligent agent, decoding it, and sending the decoded voice to the voice playing module;

7. The system according to claim 1, characterized in that the capabilities that the intelligent agent uses for intelligent voice interaction processing and for obtaining the resources the user needs include one or more of the following: speech recognition, speech synthesis, semantic understanding, resource acquisition, speech detection, and user-defined processing logic.

8. A voice interaction method, characterized by including:

exchanging voice between a simple client and an intelligent agent, wherein the simple client, on receiving voice uttered by the user, sends the received voice to the intelligent agent, or the simple client receives voice sent by the intelligent agent and plays it to the user;

the intelligent agent performing the processing related to intelligent voice interaction with a cloud server, obtaining relevant information, and converting the relevant information into a voice signal;

wherein each intelligent agent is connected to multiple simple clients.

9. The method according to claim 8, characterized in that the related processing further includes: the intelligent agent processing the speech recognition result into a control instruction and returning it to the simple client.