CN110310632A - Method of speech processing and device and electronic equipment - Google Patents
- Publication number
- CN110310632A CN110310632A CN201910583851.6A CN201910583851A CN110310632A CN 110310632 A CN110310632 A CN 110310632A CN 201910583851 A CN201910583851 A CN 201910583851A CN 110310632 A CN110310632 A CN 110310632A
- Authority
- CN
- China
- Prior art keywords
- voice
- redundancy
- voice messaging
- information
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 60
- 238000012545 processing Methods 0.000 title claims abstract description 53
- 238000001514 detection method Methods 0.000 claims description 65
- 230000004044 response Effects 0.000 claims description 10
- 238000007689 inspection Methods 0.000 claims description 9
- 230000008859 change Effects 0.000 claims description 5
- 238000005516 engineering process Methods 0.000 description 19
- 239000012634 fragment Substances 0.000 description 14
- 210000005036 nerve Anatomy 0.000 description 14
- 238000004590 computer program Methods 0.000 description 13
- 238000010586 diagram Methods 0.000 description 13
- 230000006870 function Effects 0.000 description 10
- 238000004458 analytical method Methods 0.000 description 5
- 230000006399 behavior Effects 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 238000012544 monitoring process Methods 0.000 description 2
- 230000002123 temporal effect Effects 0.000 description 2
- 238000012549 training Methods 0.000 description 2
- 230000008901 benefit Effects 0.000 description 1
- 238000005266 casting Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000003111 delayed effect Effects 0.000 description 1
- 238000000151 deposition Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000005611 electricity Effects 0.000 description 1
- 238000005538 encapsulation Methods 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 230000026781 habituation Effects 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 230000036651 mood Effects 0.000 description 1
- 239000002245 particle Substances 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 239000000758 substrate Substances 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Telephonic Communication Services (AREA)
- Machine Translation (AREA)
Abstract
The present disclosure provides a speech processing method. The method comprises: acquiring voice information; determining whether redundant information is present in the voice information; in a case where redundant information is present in the voice information, removing the redundant information to obtain information to be processed; and determining, according to the information to be processed, intent information for the voice information. The present disclosure further provides a speech processing apparatus and an electronic device.
Description
Technical field
The present disclosure relates to a speech processing method, a speech processing apparatus, and an electronic device.
Background art
With the rapid development of electronic devices, intelligent human-computer interaction technologies such as speech recognition have emerged to improve the user experience. Speech recognition technology listens for a user's voice input, then recognizes and analyzes what is heard to determine the user's voice instruction, so that the electronic device can perform a corresponding operation according to that instruction, realizing intelligent human-computer interaction.
Existing speech recognition technology typically listens continuously and sends the entire voice input heard to the back end of the electronic device for recognition processing in order to determine the user's voice instruction. Because redundant voice input is also recognized under this approach, it can interfere with identification of the correct instruction to some extent. To avoid such interference, existing speech recognition technology may instead stop listening when no user voice input is heard within a predetermined time, or when a redundant voice input from the user (for example "uh", "um", "this" and/or "that") is heard, and then send the voice input heard so far to the back end for recognition processing. However, redundant voice input is often merely a habitual expression of the user and does not mark the end of the voice input, so a scheme that stops listening upon hearing a redundant input inevitably causes valid voice input to be missed, which in turn impairs identification of the correct instruction.
Summary of the invention
One aspect of the present disclosure provides a speech processing method for improving the user experience. The method comprises: acquiring voice information; determining whether redundant information is present in the voice information; in a case where redundant information is present in the voice information, removing the redundant information to obtain information to be processed; and determining, according to the information to be processed, intent information for the voice information.
Optionally, acquiring the voice information comprises: detecting a starting point and a terminating point of a voice input using an endpoint detection model; and collecting the voice information according to the starting point and the terminating point of the voice input.
Optionally, detecting the terminating point of the voice input comprises: in response to detecting the starting point of the voice input, determining whether the detected voice input is redundant voice; in a case where the detected voice input is determined to be redundant voice, changing a parameter of the endpoint detection model to obtain an updated endpoint detection model; and detecting the terminating point of the voice input according to the updated endpoint detection model.
Optionally, the parameter of the endpoint detection model comprises a waiting time for the terminating point, and changing the parameter of the endpoint detection model comprises increasing the waiting time for the terminating point.
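The optional behavior just described (extending the terminating-point waiting time once the detected input is classified as redundant voice) might be sketched as follows. This is an illustrative sketch only; the class name and the millisecond values are assumptions, not values from the disclosure.

```python
# Hedged sketch of an endpoint detector whose terminating-point waiting
# time is increased after a redundant-voice segment is heard.
class EndpointDetector:
    def __init__(self, wait_ms=500, extension_ms=700):
        self.wait_ms = wait_ms            # silence needed to declare a terminating point
        self.extension_ms = extension_ms  # how much to extend after a filler

    def on_segment(self, is_redundant):
        # If the speech heard so far is only a filler ("uh", "um", ...),
        # wait longer so the user can finish the real instruction.
        if is_redundant:
            self.wait_ms += self.extension_ms
        return self.wait_ms

det = EndpointDetector()
print(det.on_segment(is_redundant=True))   # -> 1200
print(det.on_segment(is_redundant=False))  # -> 1200 (unchanged)
```

Extending rather than resetting the wait means a user who says "um... play some music" is not cut off after the filler.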
Optionally, determining whether redundant information is present in the voice information comprises: recognizing the voice information using a first speech recognition model to determine whether redundant voice information is present in the voice information. Determining the intent information matching the voice information comprises: recognizing the information to be processed using a second speech recognition model to obtain text to be processed that matches the information to be processed; and determining, according to the text to be processed, the intent information matching the voice information using a semantic understanding model. Here, the redundant information comprises the redundant voice information.
Optionally, removing the redundant information to obtain the information to be processed comprises: removing the redundant information from the voice information according to a starting point and a terminating point of the redundant information.
Optionally, determining whether redundant information is present in the voice information comprises: recognizing the voice information using the second speech recognition model to obtain speech text matching the voice information; determining whether redundancy text is present in the speech text; and in a case where redundancy text is present in the speech text, determining that redundant information is present in the voice information. Determining the intent information for the voice information comprises: determining, according to the information to be processed, the intent information matching the voice information using the semantic understanding model.
Optionally, removing the redundant information to obtain the information to be processed comprises: removing the redundancy text from the speech text to obtain text to be processed, wherein the information to be processed comprises the text to be processed.
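As a rough illustration of the text-level variant just described, the snippet below scans a recognized transcript for an assumed redundancy vocabulary and strips it to produce the text to be processed. The filler vocabulary and function names are invented for the example and are not part of the disclosure.

```python
import re

# Assumed redundancy vocabulary (English stand-ins for the fillers the
# description mentions, such as "uh" and "um").
FILLERS = ["uh", "um", "er", "that is to say"]
_PATTERN = r"\b(" + "|".join(map(re.escape, FILLERS)) + r")\b"

def find_redundancy_spans(text):
    """Locate (start, end) character spans of redundancy text in the transcript."""
    return [(m.start(), m.end()) for m in re.finditer(_PATTERN, text, re.IGNORECASE)]

def strip_redundancy(text):
    """Delete the redundancy text and re-splice the remainder into continuous text."""
    out = re.sub(_PATTERN, " ", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", out).strip()

print(find_redundancy_spans("um hi"))              # -> [(0, 2)]
print(strip_redundancy("um play uh some jazz"))    # -> "play some jazz"
```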
Another aspect of the present disclosure provides a speech processing apparatus. The apparatus comprises: an acquisition module for acquiring voice information; a redundant information determining module for determining whether redundant information is present in the voice information; a redundant information removing module for removing the redundant information to obtain information to be processed in a case where redundant information is present in the voice information; and an intent information determining module for determining, according to the information to be processed, intent information for the voice information.
Optionally, the acquisition module comprises: a detection submodule for detecting a starting point and a terminating point of a voice input using an endpoint detection model; and an acquisition submodule for collecting the voice information according to the starting point and the terminating point of the voice input.
Optionally, the detection submodule comprises: a voice determining unit for determining, in response to detecting the starting point of the voice input, whether the detected voice input is redundant voice; a parameter changing unit for changing a parameter of the endpoint detection model to obtain an updated endpoint detection model in a case where the detected voice input is determined to be redundant voice; and a detection unit for detecting the terminating point of the voice input according to the updated endpoint detection model.
Optionally, the parameter of the endpoint detection model comprises a waiting time for the terminating point, and the parameter changing unit is configured to increase the waiting time for the terminating point.
Optionally, the redundant information determining module is configured to recognize the voice information using a first speech recognition model to determine whether redundant voice information is present in the voice information. The intent information determining module comprises: a first recognition submodule for recognizing the information to be processed using a second speech recognition model to obtain text to be processed that matches the information to be processed; and an intent determining submodule for determining, according to the text to be processed, the intent information matching the voice information using a semantic understanding model. Here, the redundant information comprises the redundant voice information.
Optionally, the redundant information removing module is configured to remove the redundant information from the voice information according to a starting point and a terminating point of the redundant information.
Optionally, the redundant information determining module comprises: a second recognition submodule for recognizing the voice information using the second speech recognition model to obtain speech text matching the voice information; a text determining submodule for determining whether redundancy text is present in the speech text; and a redundancy determining submodule for determining that redundant information is present in the voice information in a case where redundancy text is present in the speech text. The intent information determining module is configured to determine, according to the information to be processed, the intent information matching the voice information using the semantic understanding model.
Optionally, the redundant information removing module is configured to remove the redundancy text in the speech text to obtain text to be processed, wherein the information to be processed comprises the text to be processed.
Another aspect of the present disclosure provides an electronic device comprising one or more processors and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the speech processing method described above.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, cause the processor to perform the speech processing method described above.
Another aspect of the present disclosure provides a computer program comprising computer-executable instructions which, when executed, implement the method described above.
Brief description of the drawings
For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 schematically illustrates an application scenario of the speech processing method and apparatus and the electronic device according to an embodiment of the present disclosure;
Fig. 2 schematically illustrates a flowchart of the speech processing method according to a first exemplary embodiment of the present disclosure;
Fig. 3A schematically illustrates a flowchart of acquiring voice information according to an exemplary embodiment of the present disclosure;
Fig. 3B schematically illustrates a flowchart of detecting the terminating point of a voice input according to an exemplary embodiment of the present disclosure;
Fig. 4 schematically illustrates a flowchart of the speech processing method according to a second exemplary embodiment of the present disclosure;
Fig. 5 schematically illustrates a flowchart of the speech processing method according to a third exemplary embodiment of the present disclosure;
Fig. 6 schematically illustrates an example flowchart of the speech processing method according to an exemplary embodiment of the present disclosure;
Fig. 7 schematically illustrates a structural block diagram of the speech processing apparatus according to an exemplary embodiment of the present disclosure; and
Fig. 8 schematically illustrates a structural block diagram of an electronic device adapted to implement the speech processing method according to an embodiment of the present disclosure.
Detailed description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood, however, that these descriptions are merely exemplary and are not intended to limit the scope of the present disclosure. In the following detailed description, numerous specific details are set forth for ease of explanation in order to provide a thorough understanding of the embodiments of the present disclosure. It will be apparent, however, that one or more embodiments may be practiced without these specific details. In addition, descriptions of well-known structures and technologies are omitted in the following description so as not to unnecessarily obscure the concepts of the present disclosure.
The terms used herein are for the purpose of describing specific embodiments only and are not intended to limit the present disclosure. The terms "include", "comprise" and the like used herein indicate the presence of the stated features, steps, operations and/or components, but do not preclude the presence or addition of one or more other features, steps, operations or components.
All terms used herein (including technical and scientific terms) have the meanings commonly understood by those skilled in the art, unless otherwise defined. It should be noted that the terms used herein should be interpreted with meanings consistent with the context of this specification, and should not be interpreted in an idealized or overly rigid manner.
Where an expression similar to "at least one of A, B and C, etc." is used, it should in general be interpreted according to the meaning of the expression as commonly understood by those skilled in the art (for example, "a system having at least one of A, B and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B and C, etc.). Where an expression similar to "at least one of A, B or C, etc." is used, it should likewise be interpreted according to the meaning of the expression as commonly understood by those skilled in the art (for example, "a system having at least one of A, B or C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B, A and C, B and C, and/or A, B and C, etc.).
Some block diagrams and/or flowcharts are shown in the drawings. It should be understood that some blocks of the block diagrams and/or flowcharts, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the instructions, when executed by the processor, create means for implementing the functions/operations illustrated in the block diagrams and/or flowcharts. The technology of the present disclosure may be implemented in the form of hardware and/or software (including firmware, microcode, etc.). In addition, the technology of the present disclosure may take the form of a computer program product on a computer-readable storage medium storing instructions, the computer program product being for use by, or in connection with, an instruction execution system.
Embodiments of the present disclosure provide a speech processing method for improving the user experience, as well as a corresponding apparatus and electronic device. The speech processing method comprises: acquiring voice information; determining whether redundant information is present in the voice information; in a case where redundant information is present in the voice information, removing the redundant information to obtain information to be processed; and determining, according to the information to be processed, intent information for the voice information.
According to the speech processing method of the present disclosure, the redundant information in the voice information can be removed before the intent information for the voice information is determined. When the intent information is determined, the intent information corresponding to the user instruction is determined from the voice from which the redundant information has been removed. This avoids the defect in the prior art that voice instructions are recognized inaccurately because of interference from redundant information, thereby improving the accuracy of recognizing the user's voice instructions and improving the user experience.
Fig. 1 schematically illustrates an application scenario of the speech processing method and apparatus and the electronic device according to an embodiment of the present disclosure. It should be noted that Fig. 1 shows only an example of a scenario to which the embodiments of the present disclosure may be applied, in order to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments of the present disclosure cannot be used in other devices, systems, environments or scenarios.
As shown in Fig. 1, the application scenario 100 of the embodiment of the present disclosure includes terminal devices 111, 112 and 113.
The terminal devices 111, 112 and 113 have a voice detection function for detecting voice information in the environment in which they are located and, upon detecting a voice instruction from the user, performing an operation matching the voice instruction in response to it. The terminal devices 111, 112 and 113 may include, but are not limited to, smart home appliances, smart video devices, smart wearable devices, smartphones, tablet computers, laptop portable computers, desktop computers and the like.
According to an embodiment of the present disclosure, the terminal devices 111, 112 and 113 may also have a processing function, for recognizing and processing the detected voice information, determining intent information corresponding to the user's voice information, and responding to the user's voice instruction according to the intent information.
According to an embodiment of the present disclosure, various client applications may be installed on the terminal devices 111, 112 and 113, for example an intelligent voice assistant, music applications, shopping applications, search applications, instant messaging tools, social platform software and the like (by way of example only). When running these client applications, the terminal devices 111, 112 and 113 can determine, in response to voice information input by the user through a client application, the intent information corresponding to that voice information.
As shown in Fig. 1, the application scenario 100 of the embodiment of the present disclosure may further include a network 120 and a server 130. The network 120 provides a medium for communication links between the terminal devices 111, 112, 113 and the server 130, and may include various connection types such as wired links, wireless communication links or fiber-optic cables.
The server 130 may be a server providing various services, for example a back-end management server (by way of example only) that supports the client applications run by users on the terminal devices 111, 112 and 113. The server 130 may also be, for example, a virtual server equipped with a cloud platform. The server can recognize and otherwise process the voice information obtained from the terminal devices 111, 112 and 113, and feed the processing result for the voice information (which may include, for example, intent information) back to the terminal devices 111, 112 and 113, so that the terminal devices 111, 112 and 113 can respond to the user's voice instruction.
It should be noted that the speech processing method provided by the embodiments of the present disclosure can generally be performed by the terminal devices 111, 112, 113 or by the server 130. Correspondingly, the speech processing apparatus provided by the embodiments of the present disclosure can generally be arranged in the terminal devices 111, 112, 113 or in the server 130.
It should be understood that the number and types of terminal devices, networks and servers in Fig. 1 are merely illustrative. There may be any number and type of terminal devices, networks and servers as required by the implementation.
Fig. 2 schematically illustrates a flowchart of the speech processing method according to the first exemplary embodiment of the present disclosure.
As shown in Fig. 2, the speech processing method of the embodiment of the present disclosure includes operations S210 to S240. The speech processing method can be performed by the terminal devices 111, 112, 113 or by the server 130.
In operation S210, voice information is acquired.
According to an embodiment of the present disclosure, the voice information may, for example, be collected in real time by the terminal devices 111, 112 and 113. Alternatively, the voice information may be collected in real time by a voice collection apparatus and sent in real time to the terminal devices 111, 112, 113 or to the server 130.
According to an embodiment of the present disclosure, the voice information may include, for example, voice information corresponding to a voice instruction of the user, so that the terminal devices 111, 112 and 113 can perform an operation matching the voice instruction in response to the voice information, thereby realizing human-computer voice interaction. The voice information may be, for example, an acoustic signal collected in real time, and the present disclosure is not limited in this respect.
According to an embodiment of the present disclosure, a method of acquiring the voice information is detailed in the operation flows described with reference to Figs. 3A and 3B, and is not repeated here.
In operation S220, whether redundant information is present in the voice information is determined.
According to an embodiment of the present disclosure, operation S220 may, for example, determine whether redundant information is present in the voice information by performing recognition processing on the voice information. The redundant information may include, for example, information that does not help determine the user's intent, such as modal or auxiliary particles like "um", "this" or "that", or pet phrases the user habitually uses (such as "it seems" or "like that").
According to an embodiment of the present disclosure, operation S220 may include, for example: first converting the voice information into speech text using speech recognition technology, and then judging whether redundancy text is present in the speech text. This method is detailed in the description of operations S521 to S523 in Fig. 5 and is not elaborated here.
According to an embodiment of the present disclosure, operation S220 may alternatively include, for example: first performing recognition processing on the voice information (for example an acoustic signal) to determine whether the acoustic signal contains a portion matching the pronunciation of the redundancy vocabulary described above. When an acoustic signal matching the pronunciation of the redundancy vocabulary is present, it is determined that redundant information is present in the voice information. This method is detailed in the description of operation S420 in Fig. 4 and is not elaborated here.
In operation S230, in a case where redundant information is present in the voice information, the redundant information is removed to obtain information to be processed.
According to an embodiment of the present disclosure, operation S230 may include, for example: locating, according to the determined redundant information, the position of the redundant information in the voice information or the speech text, and deleting the redundant information from the voice information or the speech text according to that position.
According to an embodiment of the present disclosure, after the redundant information is removed, the voice information or speech text on either side of the redundant information may also be spliced together so that the remaining voice information or speech text is continuous, thereby obtaining the final information to be processed.
It should be noted that, in a case where what is removed in operation S230 is an acoustic signal of the redundant information, the information to be processed is the spliced acoustic signal; in a case where what is removed is redundancy text, the information to be processed is the spliced speech text.
In operation S240, intent information for the voice information is determined according to the information to be processed.
According to an embodiment of the present disclosure, the intent information may include, for example, text information capable of characterizing the user's intent, or a machine language that the terminal device can recognize (such as binary code or a character string). For example, the intent information may include text information or machine language corresponding to user demands such as "play music" or "broadcast the weather forecast".
According to an embodiment of the present disclosure, in a case where the information to be processed is the acoustic signal remaining after the acoustic signal of the redundant information is removed, operation S240 may include: first performing recognition processing on the remaining acoustic signal to obtain speech text matching it, and then performing processing such as feature extraction, syntactic analysis and text clustering on the speech text to obtain the intent information for the voice information.
According to an embodiment of the present disclosure, in a case where the information to be processed is the speech text remaining after the redundant voice text is removed, operation S240 may include: performing processing such as feature extraction, syntactic analysis and text clustering on the remaining speech text to obtain the intent information for the voice information.
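As a toy stand-in for the feature extraction and matching described above, the sketch below compares a bag-of-words representation of the processed text against intent templates using cosine similarity. The templates, intent labels and helper names are invented for illustration and are not the disclosed semantic understanding model.

```python
from collections import Counter
import math

# Invented intent templates for the example.
INTENTS = {
    "PLAY_MUSIC": "play some music",
    "REPORT_WEATHER": "report the weather forecast",
}

def bow(text):
    """Simple feature extraction: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def match_intent(text):
    """Pick the template most similar to the processed text."""
    scores = {name: cosine(bow(text), bow(tpl)) for name, tpl in INTENTS.items()}
    return max(scores, key=scores.get)

print(match_intent("please play music"))  # -> PLAY_MUSIC
```

Because the fillers were already stripped in operation S230, they contribute nothing to the similarity scores here.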
In summary, because the speech processing method of the embodiment of the present disclosure removes the redundant information before determining the intent information for the voice information, the defect of determining the intent information inaccurately because of the presence of redundant information is avoided. The speech processing method of the embodiment of the present disclosure can therefore improve the accuracy of the determined intent information, respond precisely to the user's voice instructions, and improve the user experience.
According to an embodiment of the present disclosure, obtaining the voice information in operation S210 of Fig. 2 may include, for example, determining the starting point and the terminating point of the voice information collected in real time by using an endpoint detection model while the voice information is being collected.
Fig. 3A schematically illustrates a flowchart of obtaining voice information according to an exemplary embodiment of the present disclosure.
As shown in Fig. 3A, the method of obtaining voice information using the endpoint detection model may include the following operations S311 to S312. In operation S311, the starting point of the voice input and the terminating point of the voice input are detected using the endpoint detection model.
According to an embodiment of the present disclosure, the endpoint detection model may include, for example, a model built on voice activity detection (VAD) technology, so as to accurately locate the starting point and the terminating point of speech in the audio. Here, the voice input is the voice stream collected in real time.
According to an embodiment of the present disclosure, the endpoint detection model is constructed, for example, mainly by combining time-domain features and frequency-domain features of the voice input. The time-domain features may include, for example, time-domain energy and energy gradient, and the frequency-domain features may include, for example, fundamental frequency and frequency-domain sub-bands.
According to an embodiment of the present disclosure, detecting the starting point of the voice input may include, for example: determining that the starting point of the voice input is detected when the endpoint detection model detects audio information. Detecting the terminating point of the voice input may include, for example: after the starting point of the voice input has been detected, determining that the terminating point of the voice input is detected if no audio information is detected within a predetermined period of time (for example, within 100 ms); alternatively, determining that the terminating point of the voice input is detected if the detected voice input is redundant voice information.
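The endpointing rule just described (start on the first audible frame, terminate after a fixed silence window) can be sketched as follows. This is an illustrative energy-threshold stand-in for the disclosed time/frequency-domain model; the frame length, threshold and silence window are assumptions, not values from the disclosure.

```python
def detect_endpoints(frames, energy_threshold=0.1, silence_frames=10):
    """Return (start_index, end_index) of the voice segment, or None.

    A frame counts as speech when its mean energy exceeds the threshold;
    the terminating point is declared after `silence_frames` consecutive
    non-speech frames (e.g. 10 frames of 10 ms each, roughly the 100 ms
    window mentioned above).
    """
    start = None
    silent_run = 0
    for i, frame in enumerate(frames):
        energy = sum(s * s for s in frame) / len(frame)
        if energy > energy_threshold:
            if start is None:
                start = i          # starting point: first audible frame
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= silence_frames:
                return start, i - silence_frames  # terminating point
    return (start, len(frames) - 1) if start is not None else None
```

A production system would replace the energy test with the model's combined time-domain and frequency-domain decision, but the start/stop bookkeeping is the same.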
In operation S312, the voice information is collected according to the starting point of the voice input and the terminating point of the voice input.
According to an embodiment of the present disclosure, operation S312 may include, for example: when the starting point of the voice input is detected, continuously storing the acquired voice stream to the terminal device 111, 112, 113 or the server 130, and stopping storing the voice stream when the terminating point of the voice input is detected. The voice stream finally stored in the terminal device 111, 112, 113 or the server 130 is thus the voice information used for the redundancy judgment and the determination of the intent information.
According to an embodiment of the present disclosure, in order to avoid storing incomplete voice information because the terminating point is determined directly upon detecting redundant voice when the endpoint detection model detects the terminating point of the voice input, the parameters of the endpoint detection model may also be adjusted according to the voice information obtained in real time while the terminating point is being detected. In this way, when the adjusted endpoint detection model detects voice information that meets a termination condition, such as redundancy or silence, it can provide a longer waiting time before deciding whether the terminating point has been detected, thereby guaranteeing the integrity of the obtained voice information.
Fig. 3B schematically illustrates a flowchart of detecting the terminating point of the voice input according to an exemplary embodiment of the present disclosure.
As shown in Fig. 3B, the method of detecting the terminating point of the voice input of the embodiment of the present disclosure may include the following operations S3111 to S3113. In operation S3111, in response to detecting the starting point of the voice input, it is determined whether the detected voice input is redundant voice. In operation S3112, in the case where the detected voice input is determined to be redundant voice, the parameters of the endpoint detection model are changed to obtain an updated endpoint detection model. In operation S3113, the terminating point of the voice input is detected according to the updated endpoint detection model.
According to an embodiment of the present disclosure, operation S3111 may, for example, use a pre-trained first neural network model. After the starting point of the voice input is detected, the voice information obtained in real time is input into the first neural network model, and whether redundant voice exists in the voice information is determined from the output of the first neural network model. The first neural network model may be, for example, a binary classification model, which is not limited in the present disclosure.
According to an embodiment of the present disclosure, in order to prevent a long processing time of the first neural network model from delaying the update of the endpoint detection model, a smaller threshold or a smaller number of layers may be set for the first neural network model to improve its processing efficiency.
According to an embodiment of the present disclosure, the parameters of the endpoint detection model may include, for example, the waiting time for the terminating point (the EOS waiting time), that is, the time to wait before determining the terminating point after silence or redundancy is detected. In order to avoid missing valid voice input that may still be collected after the redundant information, operation S3112 may include: when the voice input is determined to be redundant information, increasing the waiting time for the terminating point, so that a longer waiting time is established for the determination of the terminating point.
Operation S3113 then determines the terminating point of the voice information after the waiting time for the terminating point of the endpoint detection model has been increased, which ensures that the acquisition of the voice information is not terminated prematurely because redundant voice was detected, thereby guaranteeing the integrity of the obtained voice information.
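The EOS-waiting-time adjustment of operations S3111 to S3112 can be sketched as a small state holder. The `is_redundant` check below is a trivial filler-word test standing in for the first neural network model, and the millisecond values are illustrative assumptions.

```python
class EndpointDetector:
    """Sketch of the EOS-waiting-time adjustment described above."""

    def __init__(self, base_eos_wait_ms=100, redundant_eos_wait_ms=400):
        self.base_eos_wait_ms = base_eos_wait_ms
        self.redundant_eos_wait_ms = redundant_eos_wait_ms
        self.eos_wait_ms = base_eos_wait_ms

    def is_redundant(self, transcript_so_far):
        # Stand-in for the first neural network model (a binary classifier).
        fillers = {"um", "uh", "er"}
        words = transcript_so_far.lower().split()
        return bool(words) and words[-1] in fillers

    def on_partial_result(self, transcript_so_far):
        # Operation S3112: if the input heard so far ends in redundant voice,
        # lengthen the wait before the terminating point is declared.
        if self.is_redundant(transcript_so_far):
            self.eos_wait_ms = self.redundant_eos_wait_ms
        else:
            self.eos_wait_ms = self.base_eos_wait_ms
        return self.eos_wait_ms
```

The longer wait after a filler gives the user time to finish the real request ("um... play music") before the stream is cut off.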
Fig. 4 schematically illustrates a flowchart of a speech processing method according to a second exemplary embodiment of the present disclosure.
As shown in Fig. 4, the speech processing method of the embodiment of the present disclosure includes, in addition to operation S210, operations S420 to S430, where operation S420 and operation S430 are specific embodiments of operation S220 and operation S230, respectively. In operation S420, the voice information is recognized using a first speech recognition model to determine whether redundant voice information exists in the voice information. In the case where redundant information exists, operation S430 is executed to remove the redundant information from the voice information according to the starting point of the redundant information and the terminating point of the redundant information. Here, the redundant information is redundant voice information.
According to an embodiment of the present disclosure, the first speech recognition model may include, for example, a second neural network model. Operation S420 may include: inputting the voice information into the second neural network model, which after processing outputs either the probability that each of a plurality of voice segments in the voice information belongs to redundant voice information, or directly outputs, for each of the plurality of voice segments, a result indicating whether or not it is redundant voice information. In the case where the probabilities that the plurality of voice segments belong to redundant voice information are output, if there is a voice segment whose probability of belonging to redundant voice information is greater than a first predetermined probability (for example, 80%), it is determined that redundant voice information exists in the voice information, and the voice segment whose probability is greater than the first predetermined probability is determined to be redundant voice information.
According to an embodiment of the present disclosure, in order to locate the position of each voice segment in the entire voice information, the result corresponding to each of the plurality of voice segments in the output of the second neural network model may further include the start position and the end position of that voice segment in the entire voice information. Operation S430 may then remove, according to the start position and the end position of each voice segment belonging to redundant voice information, that voice segment from the entire voice information, so as to obtain the information to be processed.
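The removal step can be sketched as a splice over the model's per-segment output. The `(start, end, p_redundant)` triples below are a hypothetical shape for the second neural network's output, and the 80% threshold is the first predetermined probability named above.

```python
FIRST_PREDETERMINED_PROBABILITY = 0.8  # the 80% example from the text

def remove_redundant_segments(samples, segments):
    """Splice out voice segments classified as redundant.

    `segments` is assumed to be the model output: (start, end, p_redundant)
    triples indexing into `samples`, with `end` exclusive.
    """
    keep = []
    cursor = 0
    for start, end, p_redundant in sorted(segments):
        if p_redundant > FIRST_PREDETERMINED_PROBABILITY:
            keep.extend(samples[cursor:start])  # audio before the redundant span
            cursor = max(cursor, end)           # skip the redundant span
    keep.extend(samples[cursor:])               # remaining audio
    return keep
```

The result is the spliced acoustic signal used as the information to be processed.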
According to an embodiment of the present disclosure, the second neural network model may also be used, for example, to perform all the operations described for operations S420 to S430, in which case what is output after the processing of the second neural network model is already the information to be processed with the redundant information removed.
It should be noted that the second neural network model differs from the first neural network model mentioned in the foregoing description of Fig. 3B in that the first neural network model can only recognize the voice information, but cannot perform segmentation and/or removal operations on the voice information.
According to an embodiment of the present disclosure, since the information to be processed obtained via operation S430 is still voice information, it first needs to be converted into text information before the user's intent information can be determined. As shown in Fig. 4, operation S240 may include the following operations S441 to S442. In operation S441, the information to be processed is recognized using a second speech recognition model to obtain a text to be processed that matches the information to be processed. In operation S442, the intent information matching the voice information is determined from the text to be processed using a semantic understanding model.
According to an embodiment of the present disclosure, the second speech recognition model may include, for example, a model built on automatic speech recognition (ASR) technology. Operation S441 may specifically include: converting the information to be processed obtained in operation S430 into a speech text using the second speech recognition model.
According to an embodiment of the present disclosure, the semantic understanding model may include, for example, a model built on natural language understanding (NLU) technology. Operation S442 may specifically include: using the semantic understanding model to perform processing such as sentence detection, word segmentation, part-of-speech tagging, syntactic analysis and text classification/clustering on the text to be processed obtained in operation S441, so as to obtain the user's intent information.
In summary, the speech processing method of the embodiment of the present disclosure removes the redundant voice in the voice information before converting the voice information into text using ASR technology, which can improve the accuracy of the speech text recognized by the ASR technology and thus further improve the user experience.
Fig. 5 schematically illustrates a flowchart of a speech processing method according to a third exemplary embodiment of the present disclosure.
As shown in Fig. 5, the speech processing method of the embodiment of the present disclosure includes, in addition to operation S210, operations S521 to S523, where operations S521 to S523 are a specific embodiment of operation S220. In operation S521, the voice information is recognized using the second speech recognition model to obtain a speech text matching the voice information.
According to an embodiment of the present disclosure, operation S521 is similar to operation S441 in Fig. 4, the only difference being that the voice information in operation S521 is voice information from which the redundant information has not been removed, whereas the voice information in operation S441 is voice information from which the redundant information has been removed.
In operation S522, it is judged whether redundant text exists in the speech text.
According to an embodiment of the present disclosure, operation S522 may include: comparing the speech text with a pre-stored redundancy dictionary, and judging whether the speech text includes a redundant word from the redundancy dictionary. If the speech text includes a redundant word, redundant text exists in the speech text; if the speech text does not include any redundant word, no redundant text exists in the speech text.
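The dictionary comparison can be sketched in a few lines. The dictionary entries below are hypothetical English fillers standing in for whatever redundant vocabulary the pre-stored dictionary actually holds.

```python
# Hypothetical contents of the pre-stored redundancy dictionary.
REDUNDANCY_DICTIONARY = {"um", "uh", "er", "like"}

def find_redundant_words(speech_text):
    """Operation S522, dictionary variant: list the redundant words that
    appear in the speech text, in order of appearance."""
    words = speech_text.lower().split()
    return [w for w in words if w in REDUNDANCY_DICTIONARY]

def has_redundant_text(speech_text):
    return bool(find_redundant_words(speech_text))
```

A simple set lookup per word keeps this check cheap enough to run on every recognized utterance.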
According to an embodiment of the present disclosure, whether redundant text exists in the speech text may also be determined, for example, by using a pre-trained third neural network model. Operation S522 may include: inputting the speech text into the third neural network model, which after processing outputs the probability that each of the plurality of words composing the speech text belongs to the redundant vocabulary. When there is a word among the plurality of words whose probability of belonging to the redundant vocabulary is greater than a second predetermined probability (for example, 70%), it can be determined that redundant text exists in the speech text.
In the case where redundant text exists in the speech text, operation S523 can be executed to determine that redundant information exists in the voice information. In order to improve the accuracy of the determined user intent information, the redundant text in the speech text needs to be removed before the user's intent information is determined. Removing the redundant information in the voice information can therefore be realized by operation S530. In operation S530, the redundant text in the speech text is removed to obtain a text to be processed.
According to an embodiment of the present disclosure, operation S530 may include removing the redundant words in the speech text to obtain the text to be processed. According to an embodiment of the present disclosure, in order to locate the position of each text fragment in the entire speech text, the output obtained by the third neural network model may include not only the probability that each word belongs to the redundant vocabulary, but also the byte position of each word in the entire speech text. Operation S530 may then remove from the entire speech text, according to the byte positions of the words whose probability of belonging to the redundant vocabulary is greater than the second predetermined probability, those words, so as to obtain the text to be processed. In this case, the aforementioned information to be processed is the text to be processed.
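The position-based removal can be sketched as a splice over the per-word output. The `(start, end, p_redundant)` triples are a hypothetical shape for the third neural network's output (character positions, end exclusive), and the 70% threshold is the second predetermined probability named above.

```python
SECOND_PREDETERMINED_PROBABILITY = 0.7  # the 70% example from the text

def remove_redundant_text(speech_text, word_results):
    """Operation S530 sketch: remove words whose probability of belonging
    to the redundant vocabulary exceeds the threshold, using their
    positions in the full speech text."""
    kept = []
    cursor = 0
    for start, end, p_redundant in sorted(word_results):
        if p_redundant > SECOND_PREDETERMINED_PROBABILITY:
            kept.append(speech_text[cursor:start])  # text before the word
            cursor = end                            # skip the redundant word
    kept.append(speech_text[cursor:])
    # Collapse any doubled whitespace left behind by the removal.
    return " ".join("".join(kept).split())
```

The normalized remainder is the text to be processed handed to the semantic understanding model.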
After the redundant text is removed, the intent information can be determined. Operation S540 is therefore executed to determine, according to the information to be processed, the intent information matching the voice information using the semantic understanding model. Operation S540 is identical to operation S442 in Fig. 4, and details are not repeated here.
In the case where no redundant text exists in the speech text, it can be determined that no redundant information exists in the voice information, so that the intent information matching the voice information can be determined directly from the speech text. That is, in the case where the judgment result of operation S522 is that no redundant text exists, operation S540 is directly executed to determine the user's intent information. According to an embodiment of the present disclosure, the third neural network differs from the second neural network in that the second neural network is used to process voice information, whereas the third neural network is used to process speech text.
Fig. 6 schematically illustrates an exemplary flowchart of a speech processing method according to an exemplary embodiment of the present disclosure.
As shown in Fig. 6, the speech processing method of the embodiment of the present disclosure first needs to collect the original speech (operation S610). For example, the collected original speech may be "Um, play this Wang Qing Shui of Liu Dehua's". Operation S610 is similar to operation S210, and details are not repeated here.
In order to avoid inaccurate speech recognition caused by the presence of redundant voice in the original speech, the redundant voice needs to be removed. The removal can be performed before the original speech is converted into text, or after the original speech has been converted into text.
When the redundant voice is removed before the original speech is converted into text, operation S620 is executed to remove the redundant information, obtaining voice information corresponding to "play Liu Dehua's Wang Qing Shui". After the redundant voice is removed, operation S630 can be executed to convert the voice information into text using automatic speech recognition technology, obtaining the final recognized text 601, for example the text "play Liu Dehua's Wang Qing Shui". After the final recognized text 601 is obtained, operation S650 can be executed to perform semantic understanding using natural language understanding technology and obtain the user's intent information. Operation S620 can specifically be executed by operations S420 to S430 in Fig. 4, operation S630 can be executed by operation S441 in Fig. 4, and operation S650 can be executed by operation S442 in Fig. 4; details are not repeated here.
When the redundant voice is removed after the original speech has been converted into text, the original speech first needs to be converted into text: operation S630 is executed to convert the original speech into text using automatic speech recognition technology. The redundant text is then removed from the text: operation S640 is executed to remove the redundant text, obtaining the final recognized text 601, for example the text "play Liu Dehua's Wang Qing Shui". After the final recognized text 601 is obtained, operation S650 can be executed to perform semantic understanding using natural language understanding technology and obtain the user's intent information. In this case, operation S630 can specifically be executed by operation S521 in Fig. 5, operation S640 can be executed by operations S522 to S523 and operation S530 in Fig. 5, and operation S650 can be executed by operation S540 in Fig. 5; details are not repeated here.
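The two orderings of Fig. 6 (remove redundancy before ASR versus after ASR) can be sketched side by side. Everything here is an illustrative stand-in: the "audio" is a token list, the ASR step is a join, and the filler set ("um", "this") approximates the redundancy removed in the example utterance above.

```python
FILLERS = {"um", "this"}  # hypothetical redundant vocabulary

def remove_redundant_audio(raw_speech_tokens):
    # Path 1 (operation S620): drop redundant "audio" before ASR.
    return [t for t in raw_speech_tokens if t not in FILLERS]

def asr(tokens):
    # Stand-in for operation S630: token list -> recognized text.
    return " ".join(tokens)

def remove_redundant_text(text):
    # Path 2 (operation S640): drop redundant words after ASR.
    return " ".join(w for w in text.split() if w not in FILLERS)

raw = ["um", "play", "Liu", "Dehua's", "this", "Wang", "Qing", "Shui"]
path1 = asr(remove_redundant_audio(raw))   # remove before conversion to text
path2 = remove_redundant_text(asr(raw))    # remove after conversion to text
```

Under these simplified stand-ins both orderings yield the same final recognized text 601; in practice the choice decides whether the second neural network (audio) or the third neural network (text) does the removal.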
Fig. 7 schematically illustrates a structural block diagram of a speech processing apparatus according to an exemplary embodiment of the present disclosure.
As shown in Fig. 7, the speech processing apparatus 700 of the embodiment of the present disclosure includes an obtaining module 710, a redundant information determining module 720, a redundant information removing module 730 and an intent information determining module 740.
The obtaining module 710 is used to obtain voice information (operation S210).
According to an embodiment of the present disclosure, as shown in Fig. 7, the obtaining module 710 may include a detection submodule 711 and an obtaining submodule 712. The detection submodule is used to detect the starting point of the voice input and the terminating point of the voice input using the endpoint detection model (operation S311). The obtaining submodule is used to collect the voice information according to the starting point of the voice input and the terminating point of the voice input (operation S312).
According to an embodiment of the present disclosure, as shown in Fig. 7, the detection submodule 711 may include a voice determining unit 7111, a parameter changing unit 7112 and a detection unit 7113. The voice determining unit 7111 is used to determine, in response to detecting the starting point of the voice input, whether the detected voice input is redundant voice (operation S3111). The parameter changing unit 7112 is used to change the parameters of the endpoint detection model in the case where the detected voice input is determined to be redundant voice, so as to obtain an updated endpoint detection model (operation S3112). The detection unit 7113 is used to detect the terminating point of the voice input according to the updated endpoint detection model (operation S3113).
According to an embodiment of the present disclosure, the parameters of the endpoint detection model include the waiting time for the terminating point, and the parameter changing unit 7112 is used to increase the waiting time for the terminating point.
The redundant information determining module 720 is used to determine whether redundant information exists in the voice information (operation S220). The redundant information removing module 730 is used to remove the redundant information in the case where redundant information exists in the voice information, so as to obtain the information to be processed (operation S230). The intent information determining module 740 is used to determine the intent information for the voice information according to the information to be processed (operation S240).
According to an embodiment of the present disclosure, the redundant information determining module 720 is used to recognize the voice information using the first speech recognition model and determine whether redundant voice information exists in the voice information (operation S420). The redundant information removing module 730 is used to remove the redundant information from the voice information according to the starting point of the redundant information and the terminating point of the redundant information (operation S430). As shown in Fig. 7, the intent information determining module 740 may include a first recognition submodule 741 and an intent determining submodule 742. The first recognition submodule is used to recognize the information to be processed using the second speech recognition model and obtain a text to be processed matching the information to be processed (operation S441). The intent determining submodule 742 is used to determine the intent information matching the voice information from the text to be processed using the semantic understanding model (operation S442). Here, the redundant information includes redundant voice information.
According to an embodiment of the present disclosure, as shown in Fig. 7, the redundant information determining module 720 may include a second recognition submodule 721, a text determining submodule 722 and a redundancy determining submodule 723. The second recognition submodule 721 is used to recognize the voice information using the second speech recognition model and obtain a speech text matching the voice information (operation S521). The text determining submodule 722 is used to determine whether redundant text exists in the speech text (operation S522). The redundancy determining submodule 723 is used to determine that redundant information exists in the voice information in the case where redundant text exists in the speech text (operation S523). The redundant information removing module 730 is used to remove the redundant text in the speech text and obtain a text to be processed, where the information to be processed includes the text to be processed (operation S530). The intent information determining module 740 is used to determine the intent information matching the voice information according to the information to be processed using the semantic understanding model (operation S540).
Any number of the modules, submodules, units and subunits according to the embodiments of the present disclosure, or at least part of the functions of any number of them, may be implemented in one module. Any one or more of the modules, submodules, units and subunits according to the embodiments of the present disclosure may be split into multiple modules for implementation. Any one or more of the modules, submodules, units and subunits according to the embodiments of the present disclosure may be at least partially implemented as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on a substrate, a system in a package, or an application-specific integrated circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented in any one of the three implementation manners of software, hardware and firmware, or in an appropriate combination of any of them. Alternatively, one or more of the modules, submodules, units and subunits according to the embodiments of the present disclosure may be at least partially implemented as a computer program module which, when run, can perform the corresponding function.
Fig. 8 schematically illustrates a structural block diagram of an electronic device adapted to implement the speech processing method according to an embodiment of the present disclosure.
As shown in Fig. 8, the electronic device 800 includes a processor 810 and a computer-readable storage medium 820. The electronic device 800 can execute the method according to the embodiment of the present disclosure.
Specifically, the processor 810 may include, for example, a general-purpose microprocessor, an instruction set processor and/or a related chipset and/or a special-purpose microprocessor (for example, an application-specific integrated circuit (ASIC)), and so on. The processor 810 may also include an onboard memory for caching purposes. The processor 810 may be a single processing unit or multiple processing units for executing the different actions of the method flow according to the embodiment of the present disclosure.
The computer-readable storage medium 820 may be, for example, a non-volatile computer-readable storage medium. Specific examples include, but are not limited to: magnetic storage devices, such as magnetic tapes or hard disks (HDDs); optical storage devices, such as compact discs (CD-ROMs); memories, such as random access memories (RAMs) or flash memories; and so on.
The computer-readable storage medium 820 may include a computer program 821, which may include code/computer-executable instructions that, when executed by the processor 810, cause the processor 810 to execute the method according to the embodiment of the present disclosure or any variant thereof.
The computer program 821 may be configured to have, for example, computer program code including computer program modules. For example, in an exemplary embodiment, the code in the computer program 821 may include one or more program modules, for example including module 821A, module 821B, and so on. It should be noted that the division manner and number of the modules are not fixed; those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation. When these combinations of program modules are executed by the processor 810, the processor 810 executes the method according to the embodiment of the present disclosure or any variant thereof.
According to an embodiment of the present disclosure, at least one of the modules, submodules and units described with reference to Fig. 7 may be implemented as a computer program module described with reference to Fig. 8 which, when executed by the processor 810, can implement the corresponding operations described above.
The present disclosure also provides a computer-readable storage medium, which may be included in the device/apparatus/system described in the above embodiments, or may exist alone without being assembled into the device/apparatus/system. The above computer-readable storage medium carries one or more programs which, when executed, implement the method according to the embodiment of the present disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, but is not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program which can be used by or in connection with an instruction execution system, apparatus or device.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of the systems, methods and computer program products according to the various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a part of code, and the above module, program segment or part of code includes one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Those skilled in the art will understand that the features recorded in the various embodiments and/or claims of the present disclosure can be combined in multiple ways, even if such combinations are not explicitly recorded in the present disclosure. In particular, without departing from the spirit or teaching of the present disclosure, the features recorded in the various embodiments and/or claims of the present disclosure can be combined in multiple ways. All such combinations fall within the scope of the present disclosure.
Although the present disclosure has been shown and described with reference to certain exemplary embodiments thereof, those skilled in the art should understand that various changes in form and detail can be made to the present disclosure without departing from the spirit and scope of the present disclosure as defined by the following claims and their equivalents. Therefore, the scope of the present disclosure should not be limited to the above embodiments, but should be determined not only by the appended claims but also by the equivalents of the appended claims.
Claims (10)
1. A speech processing method, comprising:
acquiring voice information;
determining whether redundant information exists in the voice information;
in a case where redundant information exists in the voice information, removing the redundant information to obtain information to be processed; and
determining, according to the information to be processed, intent information for the voice information.
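The overall flow of claim 1 can be sketched in a few lines. This is a toy illustration only: the function names and the token-list representation of the voice information are assumptions, not anything the claim prescribes.

```python
def process_speech(voice_info, find_redundancy, extract_intent):
    """Sketch of claim 1: remove redundancy, then determine intent.

    `find_redundancy` and `extract_intent` are hypothetical callables
    standing in for the detection and semantic-understanding models
    that the claim leaves unspecified.
    """
    span = find_redundancy(voice_info)  # None when no redundancy exists
    if span is not None:
        start, end = span
        pending = voice_info[:start] + voice_info[end:]  # strip redundant span
    else:
        pending = voice_info
    return extract_intent(pending)


# Toy usage: treat the voice information as a token list and flag
# filler tokens ("um") as the redundant span.
def find_um(tokens):
    idxs = [i for i, t in enumerate(tokens) if t == "um"]
    return (idxs[0], idxs[-1] + 1) if idxs else None

intent = process_speech(["play", "um", "um", "music"], find_um, " ".join)
print(intent)  # -> play music
```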
2. The method according to claim 1, wherein acquiring the voice information comprises:
detecting a starting point of a voice input and a terminating point of the voice input by using an endpoint detection model; and
acquiring the voice information according to the starting point of the voice input and the terminating point of the voice input.
3. The method according to claim 2, wherein detecting the terminating point of the voice input comprises:
in response to detecting the starting point of the voice input, determining whether the detected voice input is redundant voice;
in a case where it is determined that the detected voice input is redundant voice, changing a parameter of the endpoint detection model to obtain an updated endpoint detection model; and
detecting the terminating point of the voice input according to the updated endpoint detection model.
4. The method according to claim 3, wherein:
the parameter of the endpoint detection model comprises a waiting time for the terminating point; and
changing the parameter of the endpoint detection model comprises increasing the waiting time for the terminating point.
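Claims 2–4 describe an endpoint detector whose waiting time for the terminating point grows once the input looks like redundant voice, so the detector does not cut the user off mid-filler. A simplified frame-based sketch follows; the frame counts, the boolean voiced/silent frames, and the `is_redundant_frame` predicate are illustrative assumptions, not details from the claims.

```python
def detect_end_point(frames, is_redundant_frame, base_wait=10, extended_wait=30):
    """Return the frame index of the terminating point.

    frames: sequence of booleans, True = voiced frame, False = silence.
    is_redundant_frame: hypothetical predicate flagging filler speech.
    Per claim 4, detecting redundant voice increases the waiting time,
    i.e. the number of consecutive silent frames needed to declare the end.
    """
    wait = base_wait
    silent_run = 0
    for i, voiced in enumerate(frames):
        if voiced:
            silent_run = 0
            if is_redundant_frame(i):
                wait = extended_wait  # claim 4: lengthen the endpoint wait
        else:
            silent_run += 1
            if silent_run >= wait:
                return i              # terminating point found
    return len(frames) - 1            # end of stream reached first

speech = [True] * 5 + [False] * 15
print(detect_end_point(speech, lambda i: False))  # -> 14 (ends after 10 silent frames)
print(detect_end_point(speech, lambda i: True))   # -> 19 (wait extended past the pause)
```

With the extended wait, a pause after a filler word is no longer long enough to trigger the terminating point, so the user can finish the utterance.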
5. The method according to claim 1, wherein:
determining whether redundant information exists in the voice information comprises: recognizing the voice information by using a first speech recognition model, to determine whether redundant voice information exists in the voice information;
determining the intent information matching the voice information comprises:
recognizing the information to be processed by using a second speech recognition model, to obtain a text to be processed matching the information to be processed; and
determining, according to the text to be processed, the intent information matching the voice information by using a semantic understanding model,
wherein the redundant information comprises the redundant voice information.
6. The method according to claim 5, wherein removing the redundant information to obtain the information to be processed comprises:
removing the redundant information from the voice information according to a starting point of the redundant information and a terminating point of the redundant information.
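On sampled audio, the removal of claim 6 amounts to slicing out the sample range between the redundancy's starting and terminating points. The sample rate and the list representation of the audio below are assumptions for illustration.

```python
def remove_segment(samples, start_s, end_s, sample_rate=16000):
    """Drop the samples between start_s and end_s (in seconds), per claim 6."""
    start = int(start_s * sample_rate)
    end = int(end_s * sample_rate)
    return samples[:start] + samples[end:]


# 1 second of dummy audio; cut out the span from 0.25 s to 0.5 s.
audio = list(range(16000))
trimmed = remove_segment(audio, 0.25, 0.5)
print(len(trimmed))  # -> 12000
```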
7. The method according to claim 1, wherein:
determining whether redundant information exists in the voice information comprises:
recognizing the voice information by using a second speech recognition model, to obtain a speech text matching the voice information;
determining whether a redundant text exists in the speech text; and
in a case where the redundant text exists in the speech text, determining that redundant information exists in the voice information; and
determining the intent information for the voice information comprises: determining, according to the information to be processed, the intent information matching the voice information by using a semantic understanding model.
8. The method according to claim 7, wherein removing the redundant information to obtain the information to be processed comprises:
removing the redundant text from the speech text to obtain a text to be processed,
wherein the information to be processed comprises the text to be processed.
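In the claim 7/8 variant, redundancy detection runs on the recognized transcript and removal is a text edit. A sketch with a hypothetical filler-word vocabulary (the claims do not specify how redundant text is identified):

```python
import re

FILLERS = {"um", "uh", "er"}  # hypothetical redundant-text vocabulary

def remove_redundant_text(speech_text):
    """Claims 7-8 sketch: flag and strip redundant tokens from the transcript."""
    tokens = re.findall(r"\w+", speech_text.lower())
    kept = [t for t in tokens if t not in FILLERS]
    has_redundancy = len(kept) != len(tokens)
    return " ".join(kept), has_redundancy

print(remove_redundant_text("Um, play uh some music"))  # -> ('play some music', True)
```

The cleaned text is then the "text to be processed" handed to the semantic understanding model.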
9. A speech processing apparatus, comprising:
an acquisition module configured to acquire voice information;
a redundant information determining module configured to determine whether redundant information exists in the voice information;
a redundant information removing module configured to, in a case where redundant information exists in the voice information, remove the redundant information to obtain information to be processed; and
an intent information determining module configured to determine, according to the information to be processed, intent information for the voice information.
10. An electronic device, comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910583851.6A CN110310632A (en) | 2019-06-28 | 2019-06-28 | Method of speech processing and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110310632A true CN110310632A (en) | 2019-10-08 |
Family
ID=68079695
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910583851.6A Pending CN110310632A (en) | 2019-06-28 | 2019-06-28 | Method of speech processing and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110310632A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090063150A1 (en) * | 2007-08-27 | 2009-03-05 | International Business Machines Corporation | Method for automatically identifying sentence boundaries in noisy conversational data |
CN102567290A (en) * | 2010-12-30 | 2012-07-11 | 百度在线网络技术(北京)有限公司 | Method, device and equipment for expanding short text to be processed |
CN103425744A (en) * | 2013-07-17 | 2013-12-04 | 百度在线网络技术(北京)有限公司 | Method and device used for identifying addressing request in inquiry sequence of user |
CN105118502A (en) * | 2015-07-14 | 2015-12-02 | 百度在线网络技术(北京)有限公司 | End point detection method and system of voice identification system |
CN105427870A (en) * | 2015-12-23 | 2016-03-23 | 北京奇虎科技有限公司 | Voice recognition method and device aiming at pauses |
CN107195303A (en) * | 2017-06-16 | 2017-09-22 | 北京云知声信息技术有限公司 | Method of speech processing and device |
CN108257616A (en) * | 2017-12-05 | 2018-07-06 | 苏州车萝卜汽车电子科技有限公司 | Interactive detection method and device |
CN109377998A (en) * | 2018-12-11 | 2019-02-22 | 科大讯飞股份有限公司 | A kind of voice interactive method and device |
US20190163691A1 (en) * | 2017-11-30 | 2019-05-30 | CrowdCare Corporation | Intent Based Dynamic Generation of Personalized Content from Dynamic Sources |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111292729A (en) * | 2020-02-06 | 2020-06-16 | 北京声智科技有限公司 | Method and device for processing audio data stream |
WO2021114840A1 (en) * | 2020-05-28 | 2021-06-17 | 平安科技(深圳)有限公司 | Scoring method and apparatus based on semantic analysis, terminal device, and storage medium |
CN113539295A (en) * | 2021-06-10 | 2021-10-22 | 联想(北京)有限公司 | Voice processing method and device |
CN113539295B (en) * | 2021-06-10 | 2024-04-23 | 联想(北京)有限公司 | Voice processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107423363B (en) | Artificial intelligence based word generation method, device, equipment and storage medium | |
CN110047481B (en) | Method and apparatus for speech recognition | |
CN107305541A (en) | Speech recognition text segmentation method and device | |
US10535352B2 (en) | Automated cognitive recording and organization of speech as structured text | |
CN110310632A (en) | Method of speech processing and device and electronic equipment | |
US20180047387A1 (en) | System and method for generating accurate speech transcription from natural speech audio signals | |
CN106649253B (en) | Auxiliary control method and system based on rear verifying | |
US20230068897A1 (en) | On-device personalization of speech synthesis for training of speech recognition model(s) | |
US20210151039A1 (en) | Method and apparatus for speech interaction, and computer storage medium | |
US11783808B2 (en) | Audio content recognition method and apparatus, and device and computer-readable medium | |
CN111916088B (en) | Voice corpus generation method and device and computer readable storage medium | |
CN111462741B (en) | Voice data processing method, device and storage medium | |
CN111951779A (en) | Front-end processing method for speech synthesis and related equipment | |
CN112151015A (en) | Keyword detection method and device, electronic equipment and storage medium | |
CN108877779B (en) | Method and device for detecting voice tail point | |
CN109377990A (en) | A kind of information processing method and electronic equipment | |
CN111428011B (en) | Word recommendation method, device, equipment and storage medium | |
CN110889008B (en) | Music recommendation method and device, computing device and storage medium | |
CN112466287B (en) | Voice segmentation method, device and computer readable storage medium | |
CN113658586A (en) | Training method of voice recognition model, voice interaction method and device | |
JP6322125B2 (en) | Speech recognition apparatus, speech recognition method, and speech recognition program | |
CN112397053B (en) | Voice recognition method and device, electronic equipment and readable storage medium | |
CN112201225B (en) | Corpus acquisition method and device, readable storage medium and electronic equipment | |
CN112669833A (en) | Voice interaction error correction method and device | |
US11929070B1 (en) | Machine learning label generation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191008 |