CN109658931A - Voice interactive method, device, computer equipment and storage medium - Google Patents
Voice interactive method, device, computer equipment and storage medium Download PDFInfo
- Publication number
- CN109658931A CN109658931A CN201811554298.5A CN201811554298A CN109658931A CN 109658931 A CN109658931 A CN 109658931A CN 201811554298 A CN201811554298 A CN 201811554298A CN 109658931 A CN109658931 A CN 109658931A
- Authority
- CN
- China
- Prior art keywords
- text
- voice
- terminal
- language
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The embodiment of the invention provides a kind of voice interactive method, device, computer equipment and storage mediums.This method is applied to interactive voice field, and this method includes input voice being obtained by first terminal, and carry out voice recognition processing to the input voice to obtain input text;Text combination all in preset text library is traversed, the text combination to match with the input text is obtained;The corresponding category of language of second terminal is obtained, and output text is generated according to the corresponding category of language of the second terminal;Output voice is generated according to the output text and the corresponding category of language of the second terminal, and the output voice is sent to the second terminal.Implement the embodiment of the present invention, can solve the problems, such as aphasis present in interactive voice, enrich the interest of interactive voice.
Description
Technical field
The present invention relates to field of computer data processing more particularly to a kind of voice interactive method, device, computer equipments
And computer readable storage medium.
Background technique
With the development of internet technology, online game also becomes more and more popular.In online game, game player is usually needed
Interaction is carried out with other players, such as text is inputted by Text Entry and carries out written communication, or by real-time
Voice communication carries out speech exchange.But in certain massively multiplayer games, player may be distributed in world's every country or
Area.Country variant is perhaps since language obstacle can not understand the text information or input language of other side between regional player
Sound makes the communication function in game application perform practically no function.
Summary of the invention
The embodiment of the invention provides a kind of voice interactive method, device, computer equipment and storage mediums, it is intended to solve
Across languages communicating questions in interactive voice.
In a first aspect, the embodiment of the invention provides a kind of voice interactive methods comprising: it is obtained by first terminal defeated
Enter voice, and voice recognition processing is carried out to obtain input text to the input voice;It traverses in preset text library and owns
Text combination, obtain and the text combination that matches of input text;Obtain the corresponding category of language of second terminal, and root
Output text is generated according to the corresponding category of language of the second terminal;It is corresponding according to the output text and the second terminal
Category of language generate output voice, and the output voice is sent to the second terminal.
Second aspect, the embodiment of the invention provides a kind of voice interaction devices comprising:
First acquisition unit for obtaining input voice by first terminal, and carries out voice knowledge to the input voice
It manages to obtain input text in other places;
Second acquisition unit obtains and the input text for traversing text combination all in preset text library
The text combination to match;
Third acquiring unit, for obtaining the corresponding category of language of second terminal, and it is corresponding according to the second terminal
Category of language generates output text;
First generation unit, it is defeated for being generated according to the output text and the corresponding category of language of the second terminal
Voice out, and the output voice is sent to the second terminal.
The third aspect, the embodiment of the present invention provide a kind of computer equipment again comprising memory, processor and storage
On the memory and the computer program that can run on the processor, wherein the processor executes described program
The above-mentioned voice interactive method of Shi Shixian.
Fourth aspect, the embodiment of the invention also provides a kind of computer readable storage mediums, wherein the computer can
It reads storage medium and is stored with computer program, the computer program includes program instruction, and described program instruction is when by processor
The processor is set to execute above-mentioned voice interactive method when execution.
The embodiment of the present invention provides a kind of voice interactive method, device, computer equipment and computer readable storage medium.
This method includes input voice being obtained by first terminal, and carry out voice recognition processing to the input voice to obtain input
Text;Text combination all in preset text library is traversed, the text combination to match with the input text is obtained;It obtains
The corresponding category of language of second terminal, and output text is generated according to the corresponding category of language of the second terminal;According to described
It exports text and the corresponding category of language of the second terminal generates output voice, and described in the output voice is sent to
Second terminal.Implement the embodiment of the present invention, can solve the problems, such as aphasis present in interactive voice, enrich interactive voice
Interest.
Detailed description of the invention
Technical solution in order to illustrate the embodiments of the present invention more clearly, below will be to needed in embodiment description
Attached drawing is briefly described, it should be apparent that, drawings in the following description are some embodiments of the invention, general for this field
For logical technical staff, without creative efforts, it is also possible to obtain other drawings based on these drawings.
Fig. 1 is a kind of flow diagram for voice interactive method that one embodiment of the invention provides;
Fig. 2 is a kind of application scenarios schematic diagram for voice interactive method that one embodiment of the invention provides;
Fig. 3 is a kind of flow diagram for voice interactive method that one embodiment of the invention provides;
Fig. 4 is a kind of flow diagram for voice interactive method that one embodiment of the invention provides;
Fig. 5 is a kind of flow diagram for voice interactive method that one embodiment of the invention provides;
Fig. 6 is a kind of flow diagram for voice interactive method that one embodiment of the invention provides;
Fig. 7 is a kind of schematic block diagram for voice interaction device that one embodiment of the invention provides;
Fig. 8 is a kind of another schematic block diagram for voice interaction device that one embodiment of the invention provides;
Fig. 9 is a kind of another schematic block diagram for voice interaction device that one embodiment of the invention provides;
Figure 10 is a kind of another schematic block diagram for voice interaction device that one embodiment of the invention provides;
Figure 11 is a kind of another schematic block diagram for voice interaction device that one embodiment of the invention provides;
Figure 12 is a kind of schematic block diagram for computer equipment that one embodiment of the invention provides.
Specific embodiment
Following will be combined with the drawings in the embodiments of the present invention, and technical solution in the embodiment of the present invention carries out clear, complete
Site preparation description, it is clear that described embodiments are some of the embodiments of the present invention, instead of all the embodiments.Based on this hair
Embodiment in bright, every other implementation obtained by those of ordinary skill in the art without making creative efforts
Example, shall fall within the protection scope of the present invention.
It should be appreciated that ought use in this specification and in the appended claims, term " includes " and "comprising" instruction
Described feature, entirety, step, operation, the presence of element and/or component, but one or more other spies are not precluded
Sign, entirety, step, operation, the presence or addition of element, component and/or its set.
It is also understood that mesh of the term used in this description of the invention merely for the sake of description specific embodiment
And be not intended to limit the present invention.As description of the invention and it is used in the attached claims, unless on
Other situations are hereafter clearly indicated, otherwise " one " of singular, "one" and "the" are intended to include plural form.
Fig. 1 and Fig. 2 is please referred to, for a kind of flow diagram for voice interactive method that one embodiment of the invention provides
And application scenarios schematic diagram.As shown in Fig. 2, the voice interactive method is applied to server end 20, server end 20 can be
It can be independent server, be also possible to the server cluster of multiple server compositions.The server end 20 can pass through network
Communication carries out communication connection with terminal 10, to realize data interaction.Wherein, the quantity of terminal 10 can be multiple, such as terminal
10 include first terminal and second terminal etc., and first terminal and second terminal can carry out communication connection by server end.Eventually
End 10 can be the tool such as smart phone, tablet computer, laptop, desktop computer, personal digital assistant and wearable device
There is the electronic equipment of communication function.
Wherein, which includes but is not limited to step S110-S150.
S110 obtains input voice by first terminal, and carries out voice recognition processing to the input voice to obtain
Input text.
Wherein, voice recognition processing is carried out to the input voice to show that input text specifically can be by server end
Speech recognition tools are called to realize, which includes but is not limited to the speech recognition based on HMM and N-gram model
Tool: CMU Sphinx, Kaldi, HTK, Julius and ISIP.
In some embodiments, as shown in figure 3, step S110 includes step S111-S112.
S111, judges whether the input voice duration length is greater than preset time threshold.
Specifically, which can be the interactive voice in game process.To reduce, user need not in game process
The operational motion wanted, acquisition input voice can set automatic collection, i.e., be operated the then language by first terminal without user
Voice messaging during its game of sound acquisition device automatic collection is as input voice.And it is defeated caused by user in game process
Entering voice is not to belong to the input voice of interactive class, as user may during game with other people progress in real world
Dialogue.In order to reduce server end to the treating capacity of input voice, and then time threshold is set to received input voice
It is screened.The preset time threshold can be set according to the processing pressure of server end, and preset time threshold is shorter,
The processing pressure of server end is smaller.For example, the preset time threshold is 3 seconds.
S112, if the input voice duration length is not more than preset time threshold, to the input language
Sound carries out voice recognition processing to obtain input text.
Specifically, if the input voice duration length is not more than preset time threshold, show the input
Voice may belong to the input voice of interactive class, and then carry out voice recognition processing to the input voice to obtain input text
This.
If the input voice duration length is greater than preset time threshold, show that the input voice is too long,
The input voice may not be the input voice of interactive class, and then delete the input voice, and not to the input voice into
Row voice recognition processing is conducive to the fluency for promoting game to reduce the processing pressure of server end.
S120 traverses text combination all in preset text library, obtains the text to match with the input text
Combination.
Specifically, preset text library is combined for stored text, and the quantity of text combination can be with one or more.Its
In, text combination is used to store the interaction keyword that multiple semantemes are identical but interaction languages are different, and each interactive keyword is corresponding
Unique interaction languages.
As shown in table 1, preset text library may include multiple text combinations, such as the first text combination, the second text combination
Etc..Each text combination may each comprise multiple interactive keywords, such as " withdrawing ", " rescuing me ".Each interactive keyword
Corresponding unique interaction languages, such as Chinese, English.
Interaction languages | Chinese | English | Japanese | …… |
First text combination | It withdraws | retreat | リトリート | …… |
Second text combination | Rescue me | helpme | Private The rescues う | …… |
…… | …… | …… | …… | …… |
Table 1
In some embodiments, as shown in figure 4, step S120 includes step S121-S123.
S121 carries out word segmentation processing to the input text, to generate text key word.
Specifically, carrying out word segmentation processing to first text information can be by calling the tools such as jieba to realize.Assuming that defeated
Entering text is " fastly to rescue me ", is " fast come ", " rescues by carrying out obtained text key word after word segmentation processing to input text
I ".
S122 traverses text combination all in preset text library, obtains interaction identical with the text key word
Keyword.
Specifically, it is compared by the way that the text key word is carried out character with the interaction keyword in text combination, to obtain
It takes and identical with the text key word interacts keyword.
S123, by it is identical with the text key word interact keyword where text combination be determined as and the input
The text combination that text matches.
Specifically, it is assumed that text key word is that " rescuing me " can by traversing text combination all in preset text library
Determine that the text combination interacted where keyword identical with the text key word is the second text combination, so by this second
Text combination is determined as the text combination to match with the input text.
S130 obtains the corresponding category of language of second terminal, and is generated according to the corresponding category of language of the second terminal
Export text.
In some embodiments, as shown in figure 5, step S130 includes step S131-S133.
S131, obtains the currently used system language type of second terminal, and by the currently used system of the second terminal
Language is as the corresponding category of language of the second terminal.
Specifically, obtaining the currently used system language of second terminal can be obtained by sending system language to second terminal
Instruction, and output that the system language acquisition instruction is returned is obtained as a result, determining second terminal by the output result in turn
Currently used system language type.System language acquisition instruction specifically includes that getSystemLanguageList (), if institute
The output result of return is " EN ", it is determined that the currently used system language of the second terminal is English.
S132, acquisition is identical with the second terminal category of language in preset text library interacts languages.
Specifically, each text combination may each comprise that multiple semantemes are identical but the different interaction keyword of interaction languages with
And languages are interacted correspondingly with keyword is interacted.Such as second include that the identical interaction of three semantemes is crucial in text combination
Word " rescues me ", these three semantic identical interactive keywords correspond respectively to an interactive languages, such as " rescuing me " respectively corresponds
In " Chinese ", " English ", " Japanese " these three interaction languages.By by the corresponding category of language of second terminal obtained and institute
The interaction languages stated in the text combination that input text matches are compared, and can get and the second terminal category of language
Identical interactive languages.
S133 is generated according to languages and the preset text library of interacting identical with the second terminal category of language
Export text.
Specifically, if it is accessed it is identical with the second terminal category of language interact languages for English, with
Obtain the interaction keyword that interaction languages are English in the text combination that matches of input text, and by the interaction keyword
As output text.
S140 generates output voice according to the output text and the corresponding category of language of the second terminal, and will
The output voice is sent to the second terminal.
Specifically, the generation for exporting voice can realize that TTS is the abbreviation of Text To Speech by TTS technology, mean
" from Text To Speech ", for realizing speech synthesis.It is sent to the second terminal by the way that voice will be exported, to realize across languages
Interactive voice.
In some embodiments, as shown in fig. 6, step S140 includes step S141-S143.
S141, obtains the fundamental frequency of the input voice, and judges whether the fundamental frequency is greater than preset frequency
Threshold value.
Specifically, fundamental frequency refers to people when sending out voiced sound, because air-flow and glottis make vocal cords generate the vibration of relaxation vibrating type
It is dynamic, and then generate quasi-periodic driving pulse string.There are inseparable passes with the length of vocal cords, thin and thick, toughness for fundamental frequency
System is analyzed by the fundamental frequency to input voice, can obtain the corresponding user's gender of input voice.
S142 determines that the input voice is that female voice inputs voice if the fundamental frequency is greater than preset frequency threshold,
Female voice is generated according to the output text and the corresponding category of language of the second terminal and exports voice, and the female voice is defeated
Voice is sent to the second terminal out.
Specifically, the fundamental frequency of male voice is lower, and the fundamental frequency of male voice is generally 50-200Hz;The fundamental frequency of female voice
Higher, the fundamental frequency of female voice is generally 180-500Hz.Difference based on male voice Yu female voice fundamental frequency, preset frequency threshold
The range of value may be set to 180-200Hz.For example, preset frequency threshold may be set to 190Hz.If the fundamental frequency is greater than
190Hz determines that the input voice is that female voice inputs voice, and then generates schoolgirl and export voice.
S143 determines that the input voice is that male voice inputs language if the fundamental frequency is not more than preset frequency threshold
Sound generates male voice according to the output text and the corresponding category of language of the second terminal and exports voice, and by the male
Voice output voice is sent to the second terminal.
Specifically, if the fundamental frequency is not more than 190Hz, determine that the input voice is that male voice inputs voice, in turn
It generates boy student and exports voice.
Implement the embodiment of the present invention, by being analyzed input voice to obtain the corresponding user's gender of input voice,
And then determine the sex types of output voice, interactive voice can be made with more authenticity, improve the interest of interactive voice.
Fig. 7 is a kind of schematic block diagram of voice interaction device 100 provided in an embodiment of the present invention.As shown in fig. 7, corresponding
In the above voice interactive method, the present invention also provides a kind of voice interaction devices 100.The voice interaction device 100 includes being used for
The unit of above-mentioned voice interactive method is executed, which can be configured in server end, and server end can be independence
Server, be also possible to the server cluster of multiple servers composition.
Specifically, referring to Fig. 7, the voice interaction device 100 includes first acquisition unit 110, second acquisition unit
120, third acquiring unit 130 and the first generation unit 140.
First acquisition unit 110 for obtaining input voice by first terminal, and carries out voice to the input voice
Identifying processing is to obtain input text.
In some embodiments, as shown in figure 8, the first acquisition unit 110 includes the first judging unit 111 and the
One processing unit 112.
First judging unit 111, when for judging whether the input voice duration length is greater than preset
Between threshold value.
First processing units 112, if being not more than preset time threshold for the input voice duration length
Value carries out voice recognition processing to the input voice to obtain input text.
Second acquisition unit 120 obtains and the input text for traversing text combination all in preset text library
Originally the text combination to match.
In some embodiments, as shown in figure 9, the second acquisition unit 120 includes the second generation unit the 121, the 4th
Acquiring unit 122 and the first determination unit 123.
Second generation unit 121, for carrying out word segmentation processing to the input text, to generate text key word.
4th acquiring unit 122 is obtained and is closed with the text for traversing text combination all in preset text library
The identical interactive keyword of keyword.
First determination unit 123, for by the text combination interacted where keyword identical with the text key word
It is determined as the text combination to match with the input text.
Third acquiring unit 130, for obtaining the corresponding category of language of second terminal, and it is corresponding according to the second terminal
Category of language generate output text.
In some embodiments, as shown in Figure 10, the third acquiring unit 130 includes the 5th acquiring unit the 131, the 6th
Acquiring unit 132 and third generation unit 133.
5th acquiring unit 131, for obtaining the currently used system language type of second terminal, and eventually by described second
Hold currently used system language as the corresponding category of language of the second terminal.
6th acquiring unit 132, it is identical with the second terminal category of language for being obtained in preset text library
Interaction languages.
Third generation unit 133, for according to it is described it is identical with the second terminal category of language interact languages and
Preset text library generates output text.
First generation unit 140, for raw according to the output text and the corresponding category of language of the second terminal
At output voice, and the output voice is sent to the second terminal.
In some embodiments, as shown in figure 11, first generation unit 140 includes the 7th acquiring unit 141, second
Processing unit 142 and third processing unit 143.
Whether 7th acquiring unit 141 for obtaining the fundamental frequency of the input voice, and judges the fundamental frequency
Greater than preset frequency threshold.
The second processing unit 142 determines the input voice if being greater than preset frequency threshold for the fundamental frequency
Voice is inputted for female voice, female voice is generated according to the output text and the corresponding category of language of the second terminal and exports language
Sound, and female voice output voice is sent to the second terminal.
Third processing unit 143 determines the input language if being not more than preset frequency threshold for the fundamental frequency
Sound is that male voice inputs voice, generates male voice according to the output text and the corresponding category of language of the second terminal and exports language
Sound, and male voice output voice is sent to the second terminal.
It should be noted that it is apparent to those skilled in the art that, above-mentioned 100 He of voice interaction device
The specific implementation process of each unit can refer to the corresponding description in preceding method embodiment, for convenience of description and succinctly,
Details are not described herein.
Above-mentioned apparatus 100 can be implemented as a kind of form of computer program, and computer program can be as shown in figure 12
Computer equipment on run.
Figure 12 is please referred to, Figure 12 is a kind of schematic block diagram of computer equipment provided in an embodiment of the present invention.The calculating
Machine equipment 500 can be terminal.The terminal can be smart phone, tablet computer, laptop, desktop computer, a number
Word assistant and wearable device etc. have the electronic equipment of communication function.
The computer equipment 500 includes processor 520, memory and the network interface connected by system bus 510
550, wherein memory may include non-volatile memory medium 530 and built-in storage 540.
The non-volatile memory medium 530 can storage program area 531 and computer program 532.The computer program 532
It is performed, processor 520 may make to execute a kind of voice interactive method.
The processor 520 supports the operation of entire computer equipment 500 for providing calculating and control ability.
The built-in storage 540 provides environment for the operation of the computer program in non-volatile memory medium, the computer
When program is executed by processor 520, processor 520 may make to execute a kind of voice interactive method.
The network interface 550 is used to carry out network communication with other equipment.It will be understood by those skilled in the art that the calculating
The schematic block diagram of machine equipment is only the block diagram of part-structure relevant to the present invention program, is not constituted to the present invention program
The restriction for the computer equipment 500 being applied thereon, specific computer equipment 500 may include than as shown in the figure more or
Less component perhaps combines certain components or with different component layouts.
Wherein, the processor 520 is for running program code stored in memory, to implement function such as: logical
It crosses first terminal and obtains input voice, and voice recognition processing is carried out to obtain input text to the input voice;Traversal is pre-
If text library in all text combination, obtain the text combination to match with the input text;Obtain second terminal pair
The category of language answered, and output text is generated according to the corresponding category of language of the second terminal;According to the output text with
And the corresponding category of language of the second terminal generates output voice, and the output voice is sent to the second terminal.
In one embodiment, processor 520 inputs voice in described obtain by first terminal of execution, and to the input
When the step that voice carries out voice recognition processing to obtain input text, specifically executes following steps: judging the input voice
Whether duration length is greater than preset time threshold;If the input voice duration length is no more than pre-
If time threshold, to the input voice carry out voice recognition processing to obtain input text.
In one embodiment, processor 520 is executing text combination all in the preset text library of traversal, obtains
When the step of the text combination to match with the input text, specifically executes following steps: the input text is divided
Word processing, to generate text key word;Text combination all in preset text library is traversed, is obtained and the text key word
Identical interactive keyword;By with the text key word is identical interact keyword where text combination be determined as with it is described
The text combination that input text matches.
In one embodiment, processor 520 is executing the corresponding category of language of the acquisition second terminal, and according to described
When the corresponding category of language of second terminal generates the step of output text, following steps are specifically executed: it is current to obtain second terminal
The system language type used, and using the currently used system language of the second terminal as the corresponding language of the second terminal
Type;Acquisition is identical with the second terminal category of language in preset text library interacts languages;According to it is described with it is described
The identical interactive languages of second terminal category of language and preset text library generate output text.
In one embodiment, processor 520 is described corresponding according to the output text and the second terminal in execution
Category of language generate output voice, and when the output voice is sent to the step of the second terminal, specific execution is such as
Lower step: the fundamental frequency of the input voice is obtained, and judges whether the fundamental frequency is greater than preset frequency threshold;If
The fundamental frequency is greater than preset frequency threshold, determines that the input voice is that female voice inputs voice, according to the output text
This and the corresponding category of language of the second terminal generate female voice and export voice, and female voice output voice is sent to institute
State second terminal;If the fundamental frequency is not more than preset frequency threshold, determine that the input voice is that male voice inputs voice,
Male voice is generated according to the output text and the corresponding category of language of the second terminal and exports voice, and the male voice is defeated
Voice is sent to the second terminal out.
It should be appreciated that in embodiments of the present invention, processor 520 can be central processing unit (Central
Processing Unit, CPU), which can also be other general processors, digital signal processor (Digital
Signal Processor, DSP), specific integrated circuit (Application Specific Integrated Circuit,
ASIC), ready-made programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic
Device, discrete gate or transistor logic, discrete hardware components etc..Wherein, general processor can be microprocessor or
Person's processor is also possible to any conventional processor etc..
It will be understood by those skilled in the art that the schematic block diagram of the computer equipment 500 is not constituted and is set to computer
Standby 500 restriction may include perhaps combining certain components or different component cloth than illustrating more or fewer components
It sets.
A kind of computer readable storage medium is provided in another embodiment of the invention, the computer readable storage medium
It is stored with computer program, wherein computer program includes program instruction.Described program instruction is realized such as when being executed by processor
Lower step: obtaining input voice by first terminal, and carries out voice recognition processing to the input voice to obtain input text
This;Text combination all in preset text library is traversed, the text combination to match with the input text is obtained;Obtain the
The corresponding category of language of two terminals, and output text is generated according to the corresponding category of language of the second terminal;According to described defeated
Text and the corresponding category of language of the second terminal generate output voice out, and the output voice is sent to described the
Two terminals.
In one embodiment, described program instruction is executed by processor to realize that described obtain by first terminal inputs language
Sound, and to it is described input voice carry out voice recognition processing with obtain input text step when, be implemented as follows step: sentencing
Whether the input voice duration length of breaking is greater than preset time threshold;If it is described input voice it is lasting when
Between length be not more than preset time threshold, to the input voice progress voice recognition processing to obtain input text.
In one embodiment, described program instruction is executed by processor to realize in the preset text library of traversal and own
Text combination be implemented as follows step: to described when obtaining the step with the text combination that matches of input text
It inputs text and carries out word segmentation processing, to generate text key word;Traverse text combination all in preset text library, obtain with
The identical interactive keyword of the text key word;By the group of text interacted where keyword identical with the text key word
Close the text combination for being determined as matching with the input text.
In one embodiment, described program instruction is executed by processor to realize the corresponding language of the acquisition second terminal
Type, and when generating the step of output text according to the corresponding category of language of the second terminal, it is implemented as follows step: obtaining
The system language type that second terminal is currently used is taken, and using the currently used system language of the second terminal as described second
The corresponding category of language of terminal;Acquisition is identical with the second terminal category of language in preset text library interacts languages;
Output text is generated according to languages and the preset text library of interacting identical with the second terminal category of language.
In one embodiment, described program instruction is executed by processor described according to the output text and institute to realize
It states the corresponding category of language of second terminal and generates output voice, and the step of output voice is sent to the second terminal
When, it is implemented as follows step: obtaining the fundamental frequency of the input voice, and it is default to judge whether the fundamental frequency is greater than
Frequency threshold;If the fundamental frequency is greater than preset frequency threshold, determine that the input voice is that female voice inputs voice, root
Female voice is generated according to the output text and the corresponding category of language of the second terminal and exports voice, and the female voice is exported
Voice is sent to the second terminal;If the fundamental frequency is not more than preset frequency threshold, determine that the input voice is
Male voice inputs voice, generates male voice according to the output text and the corresponding category of language of the second terminal and exports voice,
And male voice output voice is sent to the second terminal.
The computer readable storage medium can be USB flash disk, mobile hard disk, read-only memory (ROM, Read-
OnlyMemory), the various media that can store program code such as magnetic or disk.
Those of ordinary skill in the art may be aware that list described in conjunction with the examples disclosed in the embodiments of the present disclosure
Member and algorithm steps, can be realized with electronic hardware, computer software, or a combination of the two, in order to clearly demonstrate hardware
With the interchangeability of software, each exemplary composition and step are generally described according to function in the above description.This
A little functions are implemented in hardware or software actually, the specific application and design constraint depending on technical solution.Specially
Industry technical staff can use different methods to achieve the described function each specific application, but this realization is not
It is considered as beyond the scope of this invention.It is apparent to those skilled in the art that for convenience of description and simple
Clean, the device of foregoing description and the specific work process of unit can refer to corresponding processes in the foregoing method embodiment, herein
It repeats no more.
In several embodiments provided by the present invention, it should be understood that disclosed device and method can pass through it
Its mode is realized.For example, the apparatus embodiments described above are merely exemplary.For example, the division of each unit, only
Only a kind of logical function partition, there may be another division manner in actual implementation.Such as more than one unit or assembly can
To combine or be desirably integrated into another system, or some features can be ignored or not executed.
The steps in the embodiment of the present invention can be sequentially adjusted, merged and deleted according to actual needs.This hair
Unit in bright embodiment device can be combined, divided and deleted according to actual needs.
It, can also be in addition, the functional units in various embodiments of the present invention may be integrated into one processing unit
It is that each unit physically exists alone, is also possible to two or more units and is integrated in one unit.It is above-mentioned integrated
Unit both can take the form of hardware realization, can also realize in the form of software functional units.
If the integrated unit is realized in the form of SFU software functional unit and when sold or used as an independent product,
It can store in a computer readable storage medium.Based on this understanding, technical solution of the present invention substantially or
Person says that all or part of the part that contributes to existing technology or the technical solution can body in the form of software products
Reveal and, which is stored in a storage medium, including some instructions are with so that a computer is set
Standby (can be personal computer, terminal or the network equipment etc.) execute each embodiment the method for the present invention whole or
Part steps.
The above description is merely a specific embodiment, but scope of protection of the present invention is not limited thereto, any
Those familiar with the art in the technical scope disclosed by the present invention, can readily occur in various equivalent modifications or replace
It changes, these modifications or substitutions should be covered by the protection scope of the present invention.Therefore, protection scope of the present invention should be with right
It is required that protection scope subject to.
Claims (10)
1. a kind of voice interactive method, which is characterized in that the described method includes:
Input voice is obtained by first terminal, and voice recognition processing is carried out to obtain input text to the input voice;
Text combination all in preset text library is traversed, the text combination to match with the input text is obtained;
The corresponding category of language of second terminal is obtained, and output text is generated according to the corresponding category of language of the second terminal;
Output voice is generated according to the output text and the corresponding category of language of the second terminal, and by the output language
Sound is sent to the second terminal.
2. the method as described in claim 1, which is characterized in that it is described that input voice is obtained by first terminal, and to described
It inputs voice and carries out voice recognition processing to obtain input text, comprising:
Judge whether the input voice duration length is greater than preset time threshold;
If the input voice duration length is not more than preset time threshold, voice is carried out to the input voice
Identifying processing is to obtain input text.
3. the method as described in claim 1, which is characterized in that all text combinations in the preset text library of traversal,
Obtain the text combination to match with the input text, comprising:
Word segmentation processing is carried out to the input text, to generate text key word;
Text combination all in preset text library is traversed, acquisition is identical with the text key word to interact keyword;
By with the text key word is identical interact keyword where text combination be determined as and the input text phase
The text combination matched.
4. the method as described in claim 1, which is characterized in that the corresponding category of language of the acquisition second terminal, and according to
The corresponding category of language of the second terminal generates output text, comprising:
The currently used system language type of second terminal is obtained, and using the currently used system language of the second terminal as institute
State the corresponding category of language of second terminal;
Acquisition is identical with the second terminal category of language in preset text library interacts languages;
Output text is generated according to languages and the preset text library of interacting identical with the second terminal category of language.
5. the method as described in claim 1, which is characterized in that described according to the output text and the second terminal pair
The category of language answered generates output voice, and the output voice is sent to the second terminal, comprising:
The fundamental frequency of the input voice is obtained, and judges whether the fundamental frequency is greater than preset frequency threshold;
If the fundamental frequency is greater than preset frequency threshold, determine that the input voice is that female voice inputs voice, according to described
It exports text and the corresponding category of language of the second terminal generates female voice and exports voice, and female voice output voice is sent out
It send to the second terminal;
If the fundamental frequency is not more than preset frequency threshold, determine that the input voice is that male voice inputs voice, according to institute
It states output text and the corresponding category of language of the second terminal generates male voice output voice, and the male voice is exported into voice
It is sent to the second terminal.
6. a kind of voice interaction device, which is characterized in that described device includes:
First acquisition unit for obtaining input voice by first terminal, and carries out at speech recognition the input voice
Reason is to obtain input text;
Second acquisition unit obtains and the input text phase for traversing text combination all in preset text library
The text combination matched;
Third acquiring unit, for obtaining the corresponding category of language of second terminal, and according to the corresponding language of the second terminal
Type generates output text;
First generation unit, for generating output language according to the output text and the corresponding category of language of the second terminal
Sound, and the output voice is sent to the second terminal.
7. device as claimed in claim 6, which is characterized in that the first acquisition unit includes:
First judging unit, for judging whether the input voice duration length is greater than preset time threshold;
First processing units, if being not more than preset time threshold for the input voice duration length, to institute
It states input voice and carries out voice recognition processing to obtain input text.
8. device as claimed in claim 6, which is characterized in that the second acquisition unit includes:
Second generation unit, for carrying out word segmentation processing to the input text, to generate text key word;
4th acquiring unit obtains and the text key word phase for traversing text combination all in preset text library
Same interaction keyword;
First determination unit, for by it is identical with the text key word interact keyword where text combination be determined as with
The text combination that the input text matches.
9. a kind of computer equipment, including memory, processor and it is stored on the memory and can be on the processor
The computer program of operation, which is characterized in that the processor is realized when executing described program as any in claim 1 to 5
The voice interactive method of item.
10. a kind of computer readable storage medium, which is characterized in that the computer-readable recording medium storage has computer journey
Sequence, the computer program make the processor execute language as described in any one in claim 1-5 when being executed by a processor
Sound exchange method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811554298.5A CN109658931A (en) | 2018-12-19 | 2018-12-19 | Voice interactive method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811554298.5A CN109658931A (en) | 2018-12-19 | 2018-12-19 | Voice interactive method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109658931A true CN109658931A (en) | 2019-04-19 |
Family
ID=66114910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811554298.5A Pending CN109658931A (en) | 2018-12-19 | 2018-12-19 | Voice interactive method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109658931A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110414014A (en) * | 2019-08-05 | 2019-11-05 | 珠海格力电器股份有限公司 | A kind of speech ciphering equipment control method, device, storage medium and speech ciphering equipment |
CN110502631A (en) * | 2019-07-17 | 2019-11-26 | 招联消费金融有限公司 | A kind of input information response method, apparatus, computer equipment and storage medium |
CN112767920A (en) * | 2020-12-31 | 2021-05-07 | 深圳市珍爱捷云信息技术有限公司 | Method, device, equipment and storage medium for recognizing call voice |
WO2021087665A1 (en) * | 2019-11-04 | 2021-05-14 | 深圳市欢太科技有限公司 | Data processing method and apparatus, server, and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140242955A1 (en) * | 2013-02-22 | 2014-08-28 | Samsung Electronics Co., Ltd | Method and system for supporting a translation-based communication service and terminal supporting the service |
CN104394265A (en) * | 2014-10-31 | 2015-03-04 | 小米科技有限责任公司 | Automatic session method and device based on mobile intelligent terminal |
CN104536978A (en) * | 2014-12-05 | 2015-04-22 | 奇瑞汽车股份有限公司 | Voice data identifying method and device |
CN105511857A (en) * | 2015-11-27 | 2016-04-20 | 小米科技有限责任公司 | System language setting method and device |
CN107610706A (en) * | 2017-09-13 | 2018-01-19 | 百度在线网络技术(北京)有限公司 | The processing method and processing unit of phonetic search result |
US20180189264A1 (en) * | 2017-01-02 | 2018-07-05 | International Business Machines Corporation | Enhancing QA System Cognition With Improved Lexical Simplification Using Multilingual Resources |
-
2018
- 2018-12-19 CN CN201811554298.5A patent/CN109658931A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140242955A1 (en) * | 2013-02-22 | 2014-08-28 | Samsung Electronics Co., Ltd | Method and system for supporting a translation-based communication service and terminal supporting the service |
CN104394265A (en) * | 2014-10-31 | 2015-03-04 | 小米科技有限责任公司 | Automatic session method and device based on mobile intelligent terminal |
CN104536978A (en) * | 2014-12-05 | 2015-04-22 | 奇瑞汽车股份有限公司 | Voice data identifying method and device |
CN105511857A (en) * | 2015-11-27 | 2016-04-20 | 小米科技有限责任公司 | System language setting method and device |
US20180189264A1 (en) * | 2017-01-02 | 2018-07-05 | International Business Machines Corporation | Enhancing QA System Cognition With Improved Lexical Simplification Using Multilingual Resources |
CN107610706A (en) * | 2017-09-13 | 2018-01-19 | 百度在线网络技术(北京)有限公司 | The processing method and processing unit of phonetic search result |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110502631A (en) * | 2019-07-17 | 2019-11-26 | 招联消费金融有限公司 | A kind of input information response method, apparatus, computer equipment and storage medium |
CN110502631B (en) * | 2019-07-17 | 2022-11-04 | 招联消费金融有限公司 | Input information response method and device, computer equipment and storage medium |
CN110414014A (en) * | 2019-08-05 | 2019-11-05 | 珠海格力电器股份有限公司 | A kind of speech ciphering equipment control method, device, storage medium and speech ciphering equipment |
WO2021087665A1 (en) * | 2019-11-04 | 2021-05-14 | 深圳市欢太科技有限公司 | Data processing method and apparatus, server, and storage medium |
CN112767920A (en) * | 2020-12-31 | 2021-05-07 | 深圳市珍爱捷云信息技术有限公司 | Method, device, equipment and storage medium for recognizing call voice |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10936664B2 (en) | Dialogue system and computer program therefor | |
JP6799574B2 (en) | Method and device for determining satisfaction with voice dialogue | |
US20200395008A1 (en) | Personality-Based Conversational Agents and Pragmatic Model, and Related Interfaces and Commercial Models | |
Althoff et al. | Large-scale analysis of counseling conversations: An application of natural language processing to mental health | |
CN109658931A (en) | Voice interactive method, device, computer equipment and storage medium | |
CN108491514B (en) | Method and device for questioning in dialog system, electronic equipment and computer readable medium | |
CN107797984A (en) | Intelligent interactive method, equipment and storage medium | |
US11531693B2 (en) | Information processing apparatus, method and non-transitory computer readable medium | |
CN115309877B (en) | Dialogue generation method, dialogue model training method and device | |
CN109461459A (en) | Speech assessment method, apparatus, computer equipment and storage medium | |
US9653078B2 (en) | Response generation method, response generation apparatus, and response generation program | |
Chiba et al. | An analysis of the effect of emotional speech synthesis on non-task-oriented dialogue system | |
Dsouza et al. | Chat with bots intelligently: A critical review & analysis | |
CN110335608A (en) | Voice print verification method, apparatus, equipment and storage medium | |
CN107657949A (en) | The acquisition methods and device of game data | |
CN109859747A (en) | Voice interactive method, equipment and storage medium | |
CN112307754A (en) | Statement acquisition method and device | |
Perez et al. | Mind the gap: On the value of silence representations to lexical-based speech emotion recognition. | |
JP2018180459A (en) | Speech synthesis system, speech synthesis method, and speech synthesis program | |
CN111324710B (en) | Online investigation method and device based on virtual person and terminal equipment | |
US20230140480A1 (en) | Utterance generation apparatus, utterance generation method, and program | |
JP7212888B2 (en) | Automatic dialogue device, automatic dialogue method, and program | |
CN112349294A (en) | Voice processing method and device, computer readable medium and electronic equipment | |
CN111553173B (en) | Natural language generation training method and device | |
Rallabandi et al. | Submission from CMU for blizzard challenge 2019 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |