CN109658931A - Voice interactive method, device, computer equipment and storage medium - Google Patents

Voice interactive method, device, computer equipment and storage medium Download PDF

Info

Publication number
CN109658931A
CN109658931A (application CN201811554298.5A)
Authority
CN
China
Prior art keywords
text
voice
terminal
language
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811554298.5A
Other languages
Chinese (zh)
Inventor
黄泽浩
章锦涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811554298.5A priority Critical patent/CN109658931A/en
Publication of CN109658931A publication Critical patent/CN109658931A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Abstract

Embodiments of the present invention provide a voice interaction method, device, computer equipment, and storage medium. The method is applied to the field of voice interaction and includes: obtaining an input voice through a first terminal, and performing speech recognition on the input voice to obtain an input text; traversing all text combinations in a preset text library to obtain the text combination that matches the input text; obtaining the language type corresponding to a second terminal, and generating an output text according to that language type; and generating an output voice according to the output text and the language type corresponding to the second terminal, and sending the output voice to the second terminal. Implementing the embodiments of the present invention can resolve language barriers in voice interaction and make voice interaction more engaging.

Description

Voice interactive method, device, computer equipment and storage medium
Technical field
The present invention relates to the field of computer data processing, and more particularly to a voice interaction method, device, computer equipment, and computer-readable storage medium.
Background technique
With the development of Internet technology, online games have become increasingly popular. In an online game, a player usually needs to interact with other players, for example by entering text in a text input box for written communication, or by real-time voice communication for spoken exchange. However, in some massively multiplayer online games, players may be distributed across countries and regions around the world. Because of language barriers, players from different countries or regions may be unable to understand each other's text messages or voice input, rendering the communication features of the game application practically useless.
Summary of the invention
Embodiments of the present invention provide a voice interaction method, device, computer equipment, and storage medium, intended to solve the problem of cross-language communication in voice interaction.
In a first aspect, an embodiment of the present invention provides a voice interaction method, comprising: obtaining an input voice through a first terminal, and performing speech recognition on the input voice to obtain an input text; traversing all text combinations in a preset text library to obtain the text combination that matches the input text; obtaining the language type corresponding to a second terminal, and generating an output text according to the language type corresponding to the second terminal; and generating an output voice according to the output text and the language type corresponding to the second terminal, and sending the output voice to the second terminal.
In a second aspect, an embodiment of the present invention provides a voice interaction device, comprising:
a first acquisition unit, configured to obtain an input voice through a first terminal and perform speech recognition on the input voice to obtain an input text;
a second acquisition unit, configured to traverse all text combinations in a preset text library and obtain the text combination that matches the input text;
a third acquisition unit, configured to obtain the language type corresponding to a second terminal and generate an output text according to the language type corresponding to the second terminal; and
a first generation unit, configured to generate an output voice according to the output text and the language type corresponding to the second terminal, and send the output voice to the second terminal.
In a third aspect, an embodiment of the present invention provides computer equipment comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the above voice interaction method when executing the program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the above voice interaction method.
Embodiments of the present invention provide a voice interaction method, device, computer equipment, and computer-readable storage medium. The method includes: obtaining an input voice through a first terminal, and performing speech recognition on the input voice to obtain an input text; traversing all text combinations in a preset text library to obtain the text combination that matches the input text; obtaining the language type corresponding to a second terminal, and generating an output text according to that language type; and generating an output voice according to the output text and the language type, and sending the output voice to the second terminal. Implementing the embodiments of the present invention can resolve language barriers in voice interaction and make voice interaction more engaging.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a voice interaction method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of an application scenario of a voice interaction method provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a voice interaction method provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a voice interaction method provided by an embodiment of the present invention;
Fig. 5 is a schematic flowchart of a voice interaction method provided by an embodiment of the present invention;
Fig. 6 is a schematic flowchart of a voice interaction method provided by an embodiment of the present invention;
Fig. 7 is a schematic block diagram of a voice interaction device provided by an embodiment of the present invention;
Fig. 8 is another schematic block diagram of a voice interaction device provided by an embodiment of the present invention;
Fig. 9 is another schematic block diagram of a voice interaction device provided by an embodiment of the present invention;
Fig. 10 is another schematic block diagram of a voice interaction device provided by an embodiment of the present invention;
Fig. 11 is another schematic block diagram of a voice interaction device provided by an embodiment of the present invention;
Fig. 12 is a schematic block diagram of computer equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be understood that, when used in this specification and the appended claims, the terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.
It should also be understood that the terminology used in this description of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Please refer to Fig. 1 and Fig. 2, which are a schematic flowchart and an application scenario diagram of a voice interaction method provided by an embodiment of the present invention. As shown in Fig. 2, the voice interaction method is applied to a server end 20, which may be an independent server or a server cluster composed of multiple servers. The server end 20 can establish communication connections with terminals 10 over a network to exchange data. There may be multiple terminals 10, for example a first terminal and a second terminal, and the first terminal and the second terminal can communicate with each other through the server end. A terminal 10 may be an electronic device with a communication function, such as a smartphone, tablet computer, laptop, desktop computer, personal digital assistant, or wearable device.
The voice interaction method includes, but is not limited to, steps S110-S140.
S110: obtain an input voice through the first terminal, and perform speech recognition on the input voice to obtain an input text.
Performing speech recognition on the input voice to obtain the input text may specifically be implemented by the server end invoking a speech recognition tool, including but not limited to speech recognition tools based on HMM and N-gram models: CMU Sphinx, Kaldi, HTK, Julius, and ISIP.
In some embodiments, as shown in Fig. 3, step S110 includes steps S111-S112.
S111: judge whether the duration of the input voice is greater than a preset time threshold.
Specifically, the voice interaction may take place during a game. To reduce unnecessary operations by the user during the game, the collection of input voice can be set to automatic: without any user operation, the voice collecting apparatus of the first terminal automatically collects the voice produced during the game as input voice. However, not all voice produced by the user during the game is interactive input voice; for example, the user may be talking with other people in the real world while playing. To reduce the amount of input voice the server end must process, a time threshold is therefore set to filter the received input voice. The preset time threshold can be set according to the processing pressure of the server end; the shorter the threshold, the lower the processing pressure. For example, the preset time threshold is 3 seconds.
S112: if the duration of the input voice is not greater than the preset time threshold, perform speech recognition on the input voice to obtain the input text.
Specifically, if the duration of the input voice is not greater than the preset time threshold, the input voice likely belongs to interactive input voice, so speech recognition is performed on it to obtain the input text.
If the duration of the input voice is greater than the preset time threshold, the input voice is too long and may not be interactive input voice. The input voice is then deleted without performing speech recognition on it, which reduces the processing pressure of the server end and helps keep the game fluent.
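Steps S111-S112 can be sketched as a simple duration filter in front of the recognizer. This is a minimal illustration, not the patent's implementation: the clip format, the `recognize_speech` placeholder, and the sample data are assumptions, and the patent leaves the concrete ASR tool open (CMU Sphinx, Kaldi, HTK, Julius, ISIP, etc.).

```python
# Preset time threshold from the example in the text (3 seconds).
TIME_THRESHOLD_S = 3.0

def recognize_speech(clip):
    """Placeholder for a real ASR call; here it just returns a canned transcript."""
    return clip["transcript"]

def filter_and_recognize(clips):
    """Recognize only clips whose duration does not exceed the threshold;
    longer clips are dropped as likely non-interactive chatter (S111)."""
    texts = []
    for clip in clips:
        if clip["duration_s"] <= TIME_THRESHOLD_S:  # S112: short enough, recognize it
            texts.append(recognize_speech(clip))
        # else: the clip is deleted without recognition
    return texts

clips = [
    {"duration_s": 1.8, "transcript": "rescue me"},
    {"duration_s": 12.5, "transcript": "long real-world conversation"},
]
print(filter_and_recognize(clips))  # only the short clip is recognized
```

A shorter threshold trades recall of long interactive utterances for lower server load, which matches the trade-off described above.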
S120: traverse all text combinations in a preset text library, and obtain the text combination that matches the input text.
Specifically, the preset text library stores text combinations; there may be one or more text combinations. Each text combination stores multiple interaction keywords that share the same meaning but belong to different interaction languages, and each interaction keyword corresponds to a unique interaction language.
As shown in Table 1, the preset text library may include multiple text combinations, such as a first text combination and a second text combination. Each text combination may include multiple interaction keywords, such as "retreat" and "rescue me". Each interaction keyword corresponds to a unique interaction language, such as Chinese or English.
Interaction language    | Chinese | English | Japanese   | ……
First text combination  | 撤退    | retreat | リトリート | ……
Second text combination | 救我    | help me | 私を救う   | ……
……                      | ……      | ……      | ……         | ……
Table 1
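The text library of Table 1 can be sketched as a list of per-combination dictionaries keyed by language. The storage format is not specified in the patent, so the dict layout and the language codes below are illustrative assumptions:

```python
# Each "text combination" maps interaction languages to semantically
# equivalent interaction keywords, mirroring the rows of Table 1.
TEXT_LIBRARY = [
    {"zh": "撤退", "en": "retreat", "ja": "リトリート"},  # first text combination
    {"zh": "救我", "en": "help me", "ja": "私を救う"},    # second text combination
]

def find_combination(keyword):
    """Traverse all combinations (S120) and return the first one that
    contains `keyword` in any language, or None if there is no match."""
    for combination in TEXT_LIBRARY:
        if keyword in combination.values():
            return combination
    return None

combo = find_combination("救我")
print(combo["en"])  # -> help me
```

Because every row stores one keyword per language, a match found via any language immediately yields the equivalents in all the others, which is what makes the cross-language lookup in later steps cheap.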
In some embodiments, as shown in Fig. 4, step S120 includes steps S121-S123.
S121: perform word segmentation on the input text to generate text keywords.
Specifically, word segmentation of the input text can be implemented by calling a tool such as jieba. Suppose the input text is "快来救我" ("come quickly and rescue me"); segmenting it yields the text keywords "快来" ("come quickly") and "救我" ("rescue me").
S122: traverse all text combinations in the preset text library, and obtain the interaction keyword identical to a text keyword.
Specifically, each text keyword is compared character by character with the interaction keywords in the text combinations to obtain the interaction keyword identical to the text keyword.
S123: determine the text combination where the interaction keyword identical to the text keyword is located as the text combination matching the input text.
Specifically, assuming the text keyword is "救我" ("rescue me"), traversing all text combinations in the preset text library determines that the text combination containing an interaction keyword identical to the text keyword is the second text combination, which is therefore determined as the text combination matching the input text.
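The whole S121-S123 pipeline can be sketched end to end. The patent names jieba as the segmenter; to keep this example self-contained, a toy greedy splitter over the library's own vocabulary stands in for it, and the library contents are the assumed Table 1 data:

```python
TEXT_LIBRARY = [
    {"zh": "撤退", "en": "retreat", "ja": "リトリート"},
    {"zh": "救我", "en": "help me", "ja": "私を救う"},
]

def segment(text, vocabulary):
    """Toy stand-in for jieba: greedily peel known words off the front,
    skipping characters that belong to no known word."""
    words, i = [], 0
    while i < len(text):
        for word in sorted(vocabulary, key=len, reverse=True):
            if text.startswith(word, i):
                words.append(word)
                i += len(word)
                break
        else:
            i += 1  # character not part of any known keyword
    return words

def match_combination(input_text):
    vocab = {kw for combo in TEXT_LIBRARY for kw in combo.values()}
    for keyword in segment(input_text, vocab):   # S121: segment into keywords
        for combo in TEXT_LIBRARY:               # S122: traverse the library
            if keyword in combo.values():        # S123: identical keyword found
                return combo
    return None

print(match_combination("快来救我"))  # matches the second text combination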
S130: obtain the language type corresponding to the second terminal, and generate an output text according to the language type corresponding to the second terminal.
In some embodiments, as shown in Fig. 5, step S130 includes steps S131-S133.
S131: obtain the system language type currently used by the second terminal, and use the system language currently used by the second terminal as the language type corresponding to the second terminal.
Specifically, the system language currently used by the second terminal can be obtained by sending a system-language acquisition instruction to the second terminal and reading the output result returned for that instruction, from which the system language type currently used by the second terminal is determined. The system-language acquisition instruction mainly includes getSystemLanguageList(); if the returned output result is "EN", it is determined that the system language currently used by the second terminal is English.
S132: obtain the interaction language in the preset text library that is identical to the language type of the second terminal.
Specifically, each text combination may include multiple interaction keywords with the same meaning but different interaction languages, with a one-to-one correspondence between interaction languages and interaction keywords. For example, the second text combination includes three interaction keywords with the same meaning, "rescue me", each corresponding to one interaction language: Chinese, English, and Japanese. By comparing the obtained language type corresponding to the second terminal with the interaction languages in the text combination matching the input text, the interaction language identical to the language type of the second terminal can be obtained.
S133: generate the output text according to the interaction language identical to the language type of the second terminal and the preset text library.
Specifically, if the obtained interaction language identical to the language type of the second terminal is English, the interaction keyword whose interaction language is English is obtained from the text combination matching the input text and used as the output text.
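Steps S131-S133 reduce to selecting one column of the matched text combination. The "EN" value mirrors the getSystemLanguageList() example above; the code mapping and the combination data are illustrative assumptions:

```python
# Assumed mapping from the terminal's reported system language to the
# language keys used in the text library.
LANGUAGE_CODES = {"EN": "en", "ZH": "zh", "JA": "ja"}

def generate_output_text(matched_combination, system_language):
    """S133: pick the interaction keyword in the second terminal's language
    from the text combination matched in S120."""
    lang = LANGUAGE_CODES[system_language]
    return matched_combination[lang]

# The second text combination from Table 1, already matched against the input.
second_combination = {"zh": "救我", "en": "help me", "ja": "私を救う"}
print(generate_output_text(second_combination, "EN"))  # -> help me
```

The effect is that a player who shouts "快来救我" has it delivered as "help me" to an English-configured terminal, without any general-purpose machine translation.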
S140: generate an output voice according to the output text and the language type corresponding to the second terminal, and send the output voice to the second terminal.
Specifically, the output voice can be generated by TTS technology. TTS is the abbreviation of Text To Speech, meaning "from text to speech", and is used to realize speech synthesis. By sending the output voice to the second terminal, cross-language voice interaction is realized.
In some embodiments, as shown in Fig. 6, step S140 includes steps S141-S143.
S141: obtain the fundamental frequency of the input voice, and judge whether the fundamental frequency is greater than a preset frequency threshold.
Specifically, the fundamental frequency arises when a person produces voiced sound: the airflow through the glottis causes the vocal cords to vibrate in a relaxation oscillation, producing a quasi-periodic train of excitation pulses. The fundamental frequency is inseparably related to the length, thickness, and tension of the vocal cords; by analyzing the fundamental frequency of the input voice, the gender of the user corresponding to the input voice can be obtained.
S142: if the fundamental frequency is greater than the preset frequency threshold, determine that the input voice is a female-voice input, generate a female output voice according to the output text and the language type corresponding to the second terminal, and send the female output voice to the second terminal.
Specifically, the fundamental frequency of a male voice is lower, generally 50-200 Hz, while that of a female voice is higher, generally 180-500 Hz. Based on this difference, the preset frequency threshold may be set within the range 180-200 Hz, for example 190 Hz. If the fundamental frequency is greater than 190 Hz, the input voice is determined to be a female-voice input, and a female output voice is generated.
S143: if the fundamental frequency is not greater than the preset frequency threshold, determine that the input voice is a male-voice input, generate a male output voice according to the output text and the language type corresponding to the second terminal, and send the male output voice to the second terminal.
Specifically, if the fundamental frequency is not greater than 190 Hz, the input voice is determined to be a male-voice input, and a male output voice is generated.
By implementing this embodiment of the present invention, the input voice is analyzed to obtain the gender of the corresponding user, which then determines the gender of the output voice. This makes the voice interaction more authentic and more engaging.
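The gender decision in S141-S143 is a single threshold comparison once the fundamental frequency is known. This sketch assumes f0 has already been estimated (e.g. by autocorrelation, which is left out to keep the example self-contained); the 190 Hz threshold is the one given in the text:

```python
# Preset frequency threshold, chosen from the 180-200 Hz range in the text.
FREQUENCY_THRESHOLD_HZ = 190.0

def select_output_voice(f0_hz):
    """Return which synthetic voice to use for TTS, per S142/S143:
    above the threshold is classified as female, otherwise male."""
    return "female" if f0_hz > FREQUENCY_THRESHOLD_HZ else "male"

print(select_output_voice(230.0))  # typical female f0
print(select_output_voice(120.0))  # typical male f0
```

Since the male (50-200 Hz) and female (180-500 Hz) ranges overlap between 180 and 200 Hz, any single threshold in that band will misclassify some speakers; the patent accepts this as the trade-off of a one-parameter classifier.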
Fig. 7 is a schematic block diagram of a voice interaction device 100 provided by an embodiment of the present invention. As shown in Fig. 7, corresponding to the above voice interaction method, the present invention also provides a voice interaction device 100. The voice interaction device 100 includes units for performing the above voice interaction method and may be configured in a server end; the server end may be an independent server or a server cluster composed of multiple servers.
Specifically, referring to Fig. 7, the voice interaction device 100 includes a first acquisition unit 110, a second acquisition unit 120, a third acquisition unit 130, and a first generation unit 140.
The first acquisition unit 110 is configured to obtain an input voice through a first terminal and perform speech recognition on the input voice to obtain an input text.
In some embodiments, as shown in Fig. 8, the first acquisition unit 110 includes a first judging unit 111 and a first processing unit 112.
The first judging unit 111 is configured to judge whether the duration of the input voice is greater than a preset time threshold.
The first processing unit 112 is configured to, if the duration of the input voice is not greater than the preset time threshold, perform speech recognition on the input voice to obtain the input text.
The second acquisition unit 120 is configured to traverse all text combinations in a preset text library and obtain the text combination matching the input text.
In some embodiments, as shown in Fig. 9, the second acquisition unit 120 includes a second generation unit 121, a fourth acquisition unit 122, and a first determination unit 123.
The second generation unit 121 is configured to perform word segmentation on the input text to generate text keywords.
The fourth acquisition unit 122 is configured to traverse all text combinations in the preset text library and obtain the interaction keyword identical to a text keyword.
The first determination unit 123 is configured to determine the text combination where the interaction keyword identical to the text keyword is located as the text combination matching the input text.
The third acquisition unit 130 is configured to obtain the language type corresponding to a second terminal and generate an output text according to the language type corresponding to the second terminal.
In some embodiments, as shown in Fig. 10, the third acquisition unit 130 includes a fifth acquisition unit 131, a sixth acquisition unit 132, and a third generation unit 133.
The fifth acquisition unit 131 is configured to obtain the system language type currently used by the second terminal and use the system language currently used by the second terminal as the language type corresponding to the second terminal.
The sixth acquisition unit 132 is configured to obtain the interaction language in the preset text library that is identical to the language type of the second terminal.
The third generation unit 133 is configured to generate the output text according to the interaction language identical to the language type of the second terminal and the preset text library.
The first generation unit 140 is configured to generate an output voice according to the output text and the language type corresponding to the second terminal, and send the output voice to the second terminal.
In some embodiments, as shown in Fig. 11, the first generation unit 140 includes a seventh acquisition unit 141, a second processing unit 142, and a third processing unit 143.
The seventh acquisition unit 141 is configured to obtain the fundamental frequency of the input voice and judge whether the fundamental frequency is greater than a preset frequency threshold.
The second processing unit 142 is configured to, if the fundamental frequency is greater than the preset frequency threshold, determine that the input voice is a female-voice input, generate a female output voice according to the output text and the language type corresponding to the second terminal, and send the female output voice to the second terminal.
The third processing unit 143 is configured to, if the fundamental frequency is not greater than the preset frequency threshold, determine that the input voice is a male-voice input, generate a male output voice according to the output text and the language type corresponding to the second terminal, and send the male output voice to the second terminal.
It should be noted that, as is clear to those skilled in the art, the specific implementation of the above voice interaction device 100 and its units can refer to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
The above device 100 can be implemented in the form of a computer program that runs on computer equipment as shown in Fig. 12.
Please refer to Fig. 12, which is a schematic block diagram of computer equipment provided by an embodiment of the present invention. The computer equipment 500 may be a terminal, which can be an electronic device with a communication function such as a smartphone, tablet computer, laptop, desktop computer, personal digital assistant, or wearable device.
The computer equipment 500 includes a processor 520, a memory, and a network interface 550 connected through a system bus 510, where the memory may include a non-volatile storage medium 530 and an internal memory 540.
The non-volatile storage medium 530 can store an operating system 531 and a computer program 532. When the computer program 532 is executed, the processor 520 can be caused to perform a voice interaction method.
The processor 520 provides computing and control capability to support the operation of the entire computer equipment 500.
The internal memory 540 provides an environment for running the computer program in the non-volatile storage medium; when the computer program is executed by the processor 520, the processor 520 can be caused to perform a voice interaction method.
The network interface 550 is used for network communication with other devices. Those skilled in the art will understand that the schematic block diagram of the computer equipment is only a block diagram of the part of the structure relevant to the solution of the present invention and does not limit the computer equipment 500 to which the solution is applied; specific computer equipment 500 may include more or fewer components than shown, combine certain components, or have a different component arrangement.
The processor 520 is configured to run the program code stored in the memory to implement the following functions: obtaining an input voice through a first terminal, and performing speech recognition on the input voice to obtain an input text; traversing all text combinations in a preset text library to obtain the text combination matching the input text; obtaining the language type corresponding to a second terminal, and generating an output text according to the language type corresponding to the second terminal; and generating an output voice according to the output text and the language type corresponding to the second terminal, and sending the output voice to the second terminal.
In an embodiment, when performing the step of obtaining an input voice through a first terminal and performing speech recognition on the input voice to obtain an input text, the processor 520 specifically performs the following steps: judging whether the duration of the input voice is greater than a preset time threshold; and if the duration of the input voice is not greater than the preset time threshold, performing speech recognition on the input voice to obtain the input text.
In an embodiment, when performing the step of traversing all text combinations in the preset text library to obtain the text combination matching the input text, the processor 520 specifically performs the following steps: performing word segmentation on the input text to generate text keywords; traversing all text combinations in the preset text library to obtain the interaction keyword identical to a text keyword; and determining the text combination where the interaction keyword identical to the text keyword is located as the text combination matching the input text.
In an embodiment, when performing the step of obtaining the language type corresponding to the second terminal and generating the output text according to the language type corresponding to the second terminal, the processor 520 specifically performs the following steps: obtaining the system language type currently used by the second terminal, and using the system language currently used by the second terminal as the language type corresponding to the second terminal; obtaining the interaction language in the preset text library that is identical to the language type of the second terminal; and generating the output text according to the interaction language identical to the language type of the second terminal and the preset text library.
In an embodiment, when performing the step of generating the output voice according to the output text and the language type corresponding to the second terminal and sending the output voice to the second terminal, the processor 520 specifically performs the following steps: obtaining the fundamental frequency of the input voice, and judging whether the fundamental frequency is greater than a preset frequency threshold; if the fundamental frequency is greater than the preset frequency threshold, determining that the input voice is a female-voice input, generating a female output voice according to the output text and the language type corresponding to the second terminal, and sending the female output voice to the second terminal; and if the fundamental frequency is not greater than the preset frequency threshold, determining that the input voice is a male-voice input, generating a male output voice according to the output text and the language type corresponding to the second terminal, and sending the male output voice to the second terminal.
It should be appreciated that, in embodiments of the present invention, the processor 520 may be a central processing unit (Central Processing Unit, CPU), and may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It will be understood by those skilled in the art that the schematic block diagram of the computer device 500 does not constitute a limitation on the computer device 500, which may include more or fewer components than illustrated, combine certain components, or have a different arrangement of components.
In another embodiment of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program, where the computer program includes program instructions. When executed by a processor, the program instructions implement the following steps: obtaining an input voice through a first terminal, and performing voice recognition processing on the input voice to obtain an input text; traversing all text combinations in a preset text library to obtain the text combination that matches the input text; obtaining the language type corresponding to a second terminal, and generating an output text according to the language type corresponding to the second terminal; and generating an output voice according to the output text and the language type corresponding to the second terminal, and sending the output voice to the second terminal.
In one embodiment, when the program instructions are executed by the processor to implement the step of obtaining the input voice through the first terminal and performing voice recognition processing on the input voice to obtain the input text, the following steps are specifically implemented: determining whether the duration of the input voice is greater than a preset time threshold; and if the duration of the input voice is not greater than the preset time threshold, performing voice recognition processing on the input voice to obtain the input text.
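This duration gate can be sketched in a few lines; the threshold value and all names below are assumptions for illustration, with the recognizer injected as a callable:

```python
PRESET_TIME_THRESHOLD_SECONDS = 60.0  # assumed maximum utterance length

def recognize_if_short_enough(voice_duration_seconds, recognize):
    """Run voice recognition only when the input voice's duration does
    not exceed the preset time threshold; otherwise skip it."""
    if voice_duration_seconds > PRESET_TIME_THRESHOLD_SECONDS:
        return None  # input voice too long: do not recognize
    return recognize()
```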
In one embodiment, when the program instructions are executed by the processor to implement the step of traversing all text combinations in the preset text library to obtain the text combination that matches the input text, the following steps are specifically implemented: performing word-segmentation processing on the input text to generate text keywords; traversing all text combinations in the preset text library to obtain interactive keywords identical to the text keywords; and determining the text combination in which the interactive keyword identical to the text keyword is located as the text combination that matches the input text.
In one embodiment, when the program instructions are executed by the processor to implement the step of obtaining the language type corresponding to the second terminal and generating the output text according to the language type corresponding to the second terminal, the following steps are specifically implemented: obtaining the system language type currently used by the second terminal, and taking the system language currently used by the second terminal as the language type corresponding to the second terminal; obtaining, from the preset text library, the interactive language identical to the language type of the second terminal; and generating the output text according to the interactive language identical to the language type of the second terminal and the preset text library.
In one embodiment, when the program instructions are executed by the processor to implement the step of generating the output voice according to the output text and the language type corresponding to the second terminal and sending the output voice to the second terminal, the following steps are specifically implemented: obtaining the fundamental frequency of the input voice, and determining whether the fundamental frequency is greater than a preset frequency threshold; if the fundamental frequency is greater than the preset frequency threshold, determining that the input voice is a female input voice, generating a female output voice according to the output text and the language type corresponding to the second terminal, and sending the female output voice to the second terminal; and if the fundamental frequency is not greater than the preset frequency threshold, determining that the input voice is a male input voice, generating a male output voice according to the output text and the language type corresponding to the second terminal, and sending the male output voice to the second terminal.
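Taken together, the steps implemented by the program instructions form a single pipeline: recognize, match, select language, synthesize, send. The sketch below wires these stages together with each stage injected as a callable; every name here is a hypothetical stand-in rather than the actual implementation.

```python
def voice_interaction(input_voice, second_terminal, recognize,
                      match_combination, build_output_text, synthesize):
    """End-to-end flow of the described method, with each stage passed
    in as a callable (all names are illustrative assumptions)."""
    input_text = recognize(input_voice)            # step 1: speech -> text
    combination = match_combination(input_text)    # step 2: library match
    language = second_terminal["system_language"]  # step 3: language type
    output_text = build_output_text(combination, language)
    output_voice = synthesize(output_text, language)  # step 4: synthesis
    second_terminal["received"] = output_voice        # "send" to terminal
    return output_voice
```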
The computer-readable storage medium may be a USB flash disk, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a magnetic disk, an optical disk, or any other medium that can store program code.
Those of ordinary skill in the art may appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention. It will be apparent to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the devices and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated herein.
In the several embodiments provided by the present invention, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division into units is merely a logical functional division; there may be other division manners in actual implementation. For example, more than one unit or component may be combined or integrated into another system, or some features may be omitted or not performed.
The steps in the embodiments of the present invention may be reordered, combined, and deleted according to actual needs. The units in the devices of the embodiments of the present invention may be combined, divided, and deleted according to actual needs.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention.
The above are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and these modifications or substitutions shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A voice interaction method, characterized in that the method comprises:
obtaining an input voice through a first terminal, and performing voice recognition processing on the input voice to obtain an input text;
traversing all text combinations in a preset text library to obtain a text combination that matches the input text;
obtaining a language type corresponding to a second terminal, and generating an output text according to the language type corresponding to the second terminal; and
generating an output voice according to the output text and the language type corresponding to the second terminal, and sending the output voice to the second terminal.
2. The method of claim 1, characterized in that the obtaining an input voice through a first terminal and performing voice recognition processing on the input voice to obtain an input text comprises:
determining whether the duration of the input voice is greater than a preset time threshold; and
if the duration of the input voice is not greater than the preset time threshold, performing voice recognition processing on the input voice to obtain the input text.
3. The method of claim 1, characterized in that the traversing all text combinations in a preset text library to obtain a text combination that matches the input text comprises:
performing word-segmentation processing on the input text to generate text keywords;
traversing all text combinations in the preset text library to obtain interactive keywords identical to the text keywords; and
determining the text combination in which the interactive keyword identical to the text keyword is located as the text combination that matches the input text.
4. The method of claim 1, characterized in that the obtaining a language type corresponding to a second terminal and generating an output text according to the language type corresponding to the second terminal comprises:
obtaining the system language type currently used by the second terminal, and taking the system language currently used by the second terminal as the language type corresponding to the second terminal;
obtaining, from the preset text library, an interactive language identical to the language type of the second terminal; and
generating the output text according to the interactive language identical to the language type of the second terminal and the preset text library.
5. The method of claim 1, characterized in that the generating an output voice according to the output text and the language type corresponding to the second terminal and sending the output voice to the second terminal comprises:
obtaining the fundamental frequency of the input voice, and determining whether the fundamental frequency is greater than a preset frequency threshold;
if the fundamental frequency is greater than the preset frequency threshold, determining that the input voice is a female input voice, generating a female output voice according to the output text and the language type corresponding to the second terminal, and sending the female output voice to the second terminal; and
if the fundamental frequency is not greater than the preset frequency threshold, determining that the input voice is a male input voice, generating a male output voice according to the output text and the language type corresponding to the second terminal, and sending the male output voice to the second terminal.
6. A voice interaction device, characterized in that the device comprises:
a first acquisition unit, configured to obtain an input voice through a first terminal, and perform voice recognition processing on the input voice to obtain an input text;
a second acquisition unit, configured to traverse all text combinations in a preset text library to obtain a text combination that matches the input text;
a third acquisition unit, configured to obtain a language type corresponding to a second terminal, and generate an output text according to the language type corresponding to the second terminal; and
a first generation unit, configured to generate an output voice according to the output text and the language type corresponding to the second terminal, and send the output voice to the second terminal.
7. The device of claim 6, characterized in that the first acquisition unit comprises:
a first judging unit, configured to determine whether the duration of the input voice is greater than a preset time threshold; and
a first processing unit, configured to, if the duration of the input voice is not greater than the preset time threshold, perform voice recognition processing on the input voice to obtain the input text.
8. The device of claim 6, characterized in that the second acquisition unit comprises:
a second generation unit, configured to perform word-segmentation processing on the input text to generate text keywords;
a fourth acquisition unit, configured to traverse all text combinations in the preset text library to obtain interactive keywords identical to the text keywords; and
a first determination unit, configured to determine the text combination in which the interactive keyword identical to the text keyword is located as the text combination that matches the input text.
9. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the voice interaction method of any one of claims 1 to 5.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to perform the voice interaction method of any one of claims 1 to 5.
CN201811554298.5A 2018-12-19 2018-12-19 Voice interactive method, device, computer equipment and storage medium Pending CN109658931A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811554298.5A CN109658931A (en) 2018-12-19 2018-12-19 Voice interactive method, device, computer equipment and storage medium


Publications (1)

Publication Number Publication Date
CN109658931A true CN109658931A (en) 2019-04-19

Family

ID=66114910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811554298.5A Pending CN109658931A (en) 2018-12-19 2018-12-19 Voice interactive method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109658931A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140242955A1 (en) * 2013-02-22 2014-08-28 Samsung Electronics Co., Ltd Method and system for supporting a translation-based communication service and terminal supporting the service
CN104394265A (en) * 2014-10-31 2015-03-04 小米科技有限责任公司 Automatic session method and device based on mobile intelligent terminal
CN104536978A (en) * 2014-12-05 2015-04-22 奇瑞汽车股份有限公司 Voice data identifying method and device
CN105511857A (en) * 2015-11-27 2016-04-20 小米科技有限责任公司 System language setting method and device
CN107610706A (en) * 2017-09-13 2018-01-19 百度在线网络技术(北京)有限公司 The processing method and processing unit of phonetic search result
US20180189264A1 (en) * 2017-01-02 2018-07-05 International Business Machines Corporation Enhancing QA System Cognition With Improved Lexical Simplification Using Multilingual Resources


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502631A (en) * 2019-07-17 2019-11-26 招联消费金融有限公司 A kind of input information response method, apparatus, computer equipment and storage medium
CN110502631B (en) * 2019-07-17 2022-11-04 招联消费金融有限公司 Input information response method and device, computer equipment and storage medium
CN110414014A (en) * 2019-08-05 2019-11-05 珠海格力电器股份有限公司 A kind of speech ciphering equipment control method, device, storage medium and speech ciphering equipment
WO2021087665A1 (en) * 2019-11-04 2021-05-14 深圳市欢太科技有限公司 Data processing method and apparatus, server, and storage medium
CN112767920A (en) * 2020-12-31 2021-05-07 深圳市珍爱捷云信息技术有限公司 Method, device, equipment and storage medium for recognizing call voice

Similar Documents

Publication Publication Date Title
US10936664B2 (en) Dialogue system and computer program therefor
JP6799574B2 (en) Method and device for determining satisfaction with voice dialogue
US20200395008A1 (en) Personality-Based Conversational Agents and Pragmatic Model, and Related Interfaces and Commercial Models
Althoff et al. Large-scale analysis of counseling conversations: An application of natural language processing to mental health
CN109658931A (en) Voice interactive method, device, computer equipment and storage medium
CN108491514B (en) Method and device for questioning in dialog system, electronic equipment and computer readable medium
CN107797984A (en) Intelligent interactive method, equipment and storage medium
US11531693B2 (en) Information processing apparatus, method and non-transitory computer readable medium
CN115309877B (en) Dialogue generation method, dialogue model training method and device
CN109461459A (en) Speech assessment method, apparatus, computer equipment and storage medium
US9653078B2 (en) Response generation method, response generation apparatus, and response generation program
Chiba et al. An analysis of the effect of emotional speech synthesis on non-task-oriented dialogue system
Dsouza et al. Chat with bots intelligently: A critical review & analysis
CN110335608A (en) Voice print verification method, apparatus, equipment and storage medium
CN107657949A (en) The acquisition methods and device of game data
CN109859747A (en) Voice interactive method, equipment and storage medium
CN112307754A (en) Statement acquisition method and device
Perez et al. Mind the gap: On the value of silence representations to lexical-based speech emotion recognition.
JP2018180459A (en) Speech synthesis system, speech synthesis method, and speech synthesis program
CN111324710B (en) Online investigation method and device based on virtual person and terminal equipment
US20230140480A1 (en) Utterance generation apparatus, utterance generation method, and program
JP7212888B2 (en) Automatic dialogue device, automatic dialogue method, and program
CN112349294A (en) Voice processing method and device, computer readable medium and electronic equipment
CN111553173B (en) Natural language generation training method and device
Rallabandi et al. Submission from CMU for blizzard challenge 2019

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination