CN110379413A - Speech processing method, apparatus, device and storage medium - Google Patents
Speech processing method, apparatus, device and storage medium Download PDF Info
- Publication number
- CN110379413A CN110379413A CN201910580572.4A CN201910580572A CN110379413A CN 110379413 A CN110379413 A CN 110379413A CN 201910580572 A CN201910580572 A CN 201910580572A CN 110379413 A CN110379413 A CN 110379413A
- Authority
- CN
- China
- Prior art keywords
- voice
- sound
- sound bite
- phoneme
- speech processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/04—Real-time or near real-time messaging, e.g. instant messaging [IM]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/52—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail for supporting social networking services
Abstract
Embodiments of the present application disclose a speech processing method, the method comprising: dividing a voice into at least two sound bites; determining the semantic segment corresponding to each sound bite; and correspondingly displaying the sound bites and the semantic segments. Embodiments of the present application also disclose a voice processing apparatus, a device, and a storage medium.
Description
Technical field
Embodiments of the present application relate to the field of computer technology, and relate to, but are not limited to, a speech processing method, apparatus, device, and storage medium.
Background technique
In chat software, a user sometimes receives a long piece of voice sent by the other party. If the user wants to listen to a certain part of the voice again, the user has to listen from the beginning, which is rather troublesome. In the related art, the current solution is to convert the voice into text; when the user touches the position of the corresponding text with a finger, the voice is played from the corresponding position. However, when this solution is operated on a mobile phone, the text displayed by the mobile phone is small, and accidental touches easily occur.
Summary of the invention
Embodiments of the present application provide a speech processing method, apparatus, device, and storage medium.
The technical solutions of the embodiments of the present application are implemented as follows:
In a first aspect, an embodiment of the present application provides a speech processing method, the method comprising:
dividing a voice into at least two sound bites;
determining the semantic segment corresponding to each sound bite; and
correspondingly displaying the sound bites and the semantic segments.
In a second aspect, an embodiment of the present application provides a voice processing apparatus, the apparatus comprising: a segmentation module, a determination module, and a display module; wherein
the segmentation module is configured to divide a voice into at least two sound bites;
the determination module is configured to determine the semantic segment corresponding to each sound bite; and
the display module is configured to correspondingly display the sound bites and the semantic segments.
In a third aspect, an embodiment of the present application further provides a speech processing device, comprising: a processor and a memory for storing a computer program capable of running on the processor; wherein, when running the computer program, the processor is configured to execute the steps of the speech processing method of any one of the above solutions.
In a fourth aspect, an embodiment of the present application further provides a storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the speech processing method of any one of the above solutions.
In the embodiments of the present application, a voice is divided into at least two sound bites; the semantic segment corresponding to each sound bite is determined; and the sound bites and the semantic segments are correspondingly displayed. In this way, the user can directly select, according to the displayed semantic segments, the sound bite to listen to again, which is more convenient and improves the user experience.
Detailed description of the invention
In the accompanying drawings (which are not necessarily drawn to scale), like reference numerals may describe similar components in different views. Like reference numerals with different letter suffixes may represent different instances of similar components. The drawings generally show, by way of example and not limitation, the various embodiments discussed herein.
Figure 1A is a first schematic flowchart of the speech processing method provided by embodiments of the present application;
Figure 1B is a first schematic diagram of the effect of the speech processing method provided by embodiments of the present application;
Fig. 2 is a second schematic flowchart of the speech processing method provided by embodiments of the present application;
Fig. 3 is a third schematic flowchart of the speech processing method provided by embodiments of the present application;
Fig. 4 is a fourth schematic flowchart of the speech processing method provided by embodiments of the present application;
Fig. 5 is a fifth schematic flowchart of the speech processing method provided by embodiments of the present application;
Fig. 6 is a sixth schematic flowchart of the speech processing method provided by embodiments of the present application;
Fig. 7 is a second schematic diagram of the effect of the speech processing method provided by embodiments of the present application;
Fig. 8 is a third schematic diagram of the effect of the speech processing method provided by embodiments of the present application;
Fig. 9 is a seventh schematic flowchart of the speech processing method provided by embodiments of the present application;
Figure 10 is a schematic diagram of the composition structure of the voice processing apparatus provided by embodiments of the present application;
Figure 11 is a schematic diagram of the hardware structure of the speech processing device provided by embodiments of the present application.
Specific embodiment
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the specific technical solutions of the application are described in further detail below with reference to the accompanying drawings of the embodiments. The following embodiments are intended to illustrate the application, not to limit its scope.
When the embodiments of the present application are described in detail, for ease of illustration, sectional views showing device structures may be partially enlarged out of general scale; the schematic diagrams are only examples and should not limit the scope of protection of the application. In addition, the three dimensions of length, width, and depth should be included in actual fabrication.
The speech processing method provided by the embodiments of the present application can be applied to a voice processing apparatus, and the voice processing apparatus can be implemented on a speech processing device. The voice processing apparatus divides the received voice into at least two sound bites, determines the semantic segments corresponding to the at least two sound bites, and correspondingly displays the at least two sound bites and the semantic segments. According to the displayed sound bites and their corresponding semantic segments, the user can select the sound bite that needs to be listened to again.
An embodiment of the present application provides a speech processing method, applied to a speech processing device that implements the method. Each functional module in the speech processing device can be implemented cooperatively by the hardware resources of the speech processing device (such as a terminal device or a server), for example computing resources such as a processor, detection resources such as sensors, and communication resources.
The speech processing device may be any electronic device with information processing capability. In one embodiment, the electronic device may be a smart terminal, for example a mobile terminal with wireless communication capability such as a notebook computer, or an AR/VR device. In another embodiment, the electronic device may also be a terminal device with computing capability that is inconvenient to move, such as a desktop computer or a server.
Of course, the embodiments of the present application are not limited to being provided as a method and hardware; there may be various implementations, for example being provided as a storage medium (storing instructions for executing the speech processing method provided by the embodiments of the present application).
Figure 1A is a first schematic flowchart of the speech processing method in the embodiments of the present application. As shown in Figure 1A, the method includes the following steps:
Step 101: Divide the voice into at least two sound bites.
Here, the speech processing device receives a piece of voice and divides the received voice into at least two sound bites. The voice received by the speech processing device may be sent by another device, or may be sent by a server.
After the speech processing device receives the voice, it splits the received voice according to a specified rule. Here, the specified rule may be to divide the voice evenly according to its duration, or to split the voice according to the intervals in the voice.
If the specified rule is to divide evenly according to the duration of the voice, then after a piece of voice is received, the duration of the voice is determined, the duration is divided evenly to obtain an average duration, and the received voice is divided evenly according to the average duration. For example: a voice with a duration of 18 seconds is received, and the voice is evenly divided into sound bites each 9 seconds long.
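The even-split rule can be sketched as follows. This is an illustrative sketch, not the patent's implementation; it works on time ranges in seconds rather than on actual audio data, and the function name is an assumption:

```python
def split_evenly(duration_s, num_parts):
    """Divide a voice of the given duration into equal-length time ranges.

    Returns a list of (start, end) times in seconds, one per sound bite.
    """
    part = duration_s / num_parts
    return [(i * part, (i + 1) * part) for i in range(num_parts)]

# The 18-second voice from the example, evenly divided into two
# 9-second sound bites.
print(split_evenly(18, 2))  # [(0.0, 9.0), (9.0, 18.0)]
```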
If the specified rule is to split the voice according to the intervals in it, then after a piece of voice is received, the intervals in the voice are determined, and the received voice is split according to the determined intervals. For example: the received piece of voice is "A1 ... A2 ... A3"; according to the intervals "..." determined in the voice, the voice is divided into the sound bites "A1", "A2", and "A3".
When the voice is split, it is divided into at least two sound bites. For example: a voice A is divided into two sound bites A1 and A2; for another example: a voice A is divided into five sound bites A1, A2, A3, A4, and A5.
Here, before the voice is split, its duration may first be judged; when the duration of the voice is long, the voice is split.
Step 102: Determine the semantic segment corresponding to each sound bite.
Here, when determining the semantic segment corresponding to a sound bite, the semantic segment corresponding to the sound bite can be determined according to a semantic recognition model. The semantic recognition model needs to be obtained by training with multiple voice samples and semantic samples.
Here, when the voice has been divided into at least two sound bites, the semantic segment corresponding to each sound bite is determined. For example: the split sound bites are "A1", "A2", and "A3"; it is determined that the semantic segment corresponding to sound bite "A1" is "this problem", the semantic segment corresponding to sound bite "A2" is "too difficult", and the semantic segment corresponding to sound bite "A3" is "I cannot do it". For another example: the split sound bites are "A1", "A2", "A3", and "A4"; it is determined that the semantic segment corresponding to sound bite "A1" is "this trip", the semantic segment corresponding to sound bite "A2" is "must be a very good experience", the semantic segment corresponding to sound bite "A3" is "excellent", and the semantic segment corresponding to sound bite "A4" is "really looking forward to it".
Here, when determining the semantic segments corresponding to the sound bites, the semantic segment corresponding to each of the at least two sound bites obtained by splitting may be determined separately; alternatively, the semantics corresponding to the whole voice may be determined first, and the piece of semantics may then be divided into multiple semantic segments.
Step 103: Correspondingly display the sound bites and the semantic segments.
Here, after the at least two sound bites and the semantic segments corresponding to the sound bites are obtained, the speech processing device displays the at least two sound bites and the semantic segments corresponding to the sound bites.
The displayed sound bites and their corresponding semantic segments can be arranged in sequence on the display screen of the speech processing device according to the order in which the voice was received.
The sound bites and their corresponding semantic segments can be displayed on the display screen of the speech processing device. For example, as shown in Figure 1B, the display screen of the speech processing device respectively shows sound bite "A1" and the semantic segment 11 corresponding to "A1", "this trip"; sound bite "A2" and the semantic segment 12 corresponding to "A2", "must be a very good experience"; sound bite "A3" and the semantic segment 13 corresponding to "A3", "excellent"; and sound bite "A4" and the semantic segment 14 corresponding to "A4", "really looking forward to it".
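The corresponding display can be sketched as a simple pairing of each sound bite with its semantic segment, in the order the voice was received. The labels and texts below mirror the Figure 1B example; the rendering itself (positions, styling) is device-specific and not shown:

```python
# Sound bites and their semantic segments from the Figure 1B example.
bites = ["A1", "A2", "A3", "A4"]
segments = ["this trip", "must be a very good experience",
            "excellent", "really looking forward to it"]

# Correspondingly display: arrange each sound bite next to its semantic
# segment, in the order in which the voice was received.
display_rows = list(zip(bites, segments))
for bite, segment in display_rows:
    print(f"[{bite}] {segment}")
```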
In the speech processing method provided by the embodiments of the present application, the voice is divided into at least two sound bites; the semantic segment corresponding to each sound bite is determined; and the sound bites and the semantic segments are correspondingly displayed. In this way, the user can directly select, according to the displayed semantic segments, the sound bite to listen to again, which is more convenient and improves the user experience.
An embodiment of the present application provides a speech processing method. As shown in Fig. 2, the method includes the following steps:
Step 201: Determine the duration corresponding to the voice.
Here, after the speech processing device receives a piece of voice, the duration corresponding to the received voice is determined. For example: the speech processing device receives a piece of voice A; according to the start time and end time of receiving the voice, it is determined that the duration corresponding to the received voice A is 30 seconds.
Step 202: In a case where the duration is greater than a preset specified duration, divide the voice into at least two sound bites.
Here, the specified duration is preset. In a case where the duration corresponding to a received piece of voice is greater than the preset specified duration, the received piece of voice is considered a long voice, and the long voice is divided into at least two sound bites.
For example: the preset specified duration is 20 seconds, and the duration corresponding to a received piece of voice is 30 seconds; the duration corresponding to the received piece of voice is greater than the preset specified duration, and the 30-second long voice is divided into at least two sound bites.
In a case where the duration corresponding to a received piece of voice is less than the preset specified duration, the received piece of voice is considered a short voice and does not need to be split.
It should be noted that the preset specified duration can be set according to the actual situation, and the embodiments of the present application do not limit it.
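The long-voice check in step 202 can be sketched as a simple gate before splitting. The 20-second threshold comes from the example above; the function name and the even-split fallback are assumptions for illustration:

```python
DEFAULT_SPECIFIED_DURATION_S = 20  # the 20-second threshold from the example

def maybe_split(duration_s, num_parts=2, threshold_s=DEFAULT_SPECIFIED_DURATION_S):
    """Split only a long voice (duration above the preset specified
    duration); a short voice is kept whole. Returns (start, end) time
    ranges in seconds."""
    if duration_s <= threshold_s:
        return [(0.0, duration_s)]  # short voice: no splitting needed
    part = duration_s / num_parts
    return [(i * part, (i + 1) * part) for i in range(num_parts)]

print(maybe_split(30))  # long voice: split into two ranges
print(maybe_split(10))  # short voice: kept as one range
```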
Step 203: Determine the semantic segment corresponding to each sound bite.
Step 204: Correspondingly display the sound bites and the semantic segments.
For step 203 and step 204, refer respectively to step 102 and step 103 in the above embodiment.
In the speech processing method provided by the embodiments of the present application, the duration corresponding to the voice is determined; in a case where the duration is greater than the preset specified duration, the voice is divided into at least two sound bites; the semantic segment corresponding to each sound bite is determined; and the sound bites and the semantic segments are correspondingly displayed. In this way, when it is determined that the voice is a long voice, the voice can be split to obtain at least two sound bites, improving the user experience.
An embodiment of the present application provides a speech processing method. As shown in Fig. 3, the method includes the following steps:
Step 301: Display an operation interface based on a received first operation for the voice.
Here, after the speech processing device receives a piece of voice, the received voice is displayed on the display screen of the speech processing device. The user operates on the voice on the display screen, and the speech processing device displays an operation interface based on the received first operation for the voice; the operation interface includes a voice segmentation key.
For example: the speech processing device receives a piece of voice with a duration of 30 seconds and displays the voice on the display screen; the user performs a right-click operation on the voice; the speech processing device receives the right-click operation for the voice and displays the operation interface, which includes the voice segmentation key.
It should be noted that the first operation may be a touch operation such as a click or a touch, and the embodiments of the present application do not limit it.
Step 302: Generate a voice split command based on a received second operation for the voice segmentation key.
Here, after the operation interface is displayed on the display screen of the speech processing device, the user operates the voice segmentation key on the operation interface, and the speech processing device generates a voice split command based on the received second operation for the voice segmentation key.
For example: the user clicks the voice segmentation key on the operation interface; the speech processing device receives the click operation for the voice segmentation key and generates the voice split command.
It should be noted that the second operation may be a touch operation such as a click or a touch, and the embodiments of the present application do not limit it.
Step 303: Divide the voice into at least two sound bites based on the voice split command.
Here, after the voice split command is generated based on the voice segmentation key, the voice is divided into at least two sound bites based on the voice split command.
The voice split command may also carry a voice segmentation rule, and the voice is split according to the voice split command and the voice segmentation rule.
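A command that carries its segmentation rule can be sketched as follows. The dictionary layout, key names, and the two rule values are assumptions for illustration; the patent does not specify the command format:

```python
def handle_split_command(command, voice_duration_s):
    """Act on a voice split command that may carry a segmentation rule,
    returning (start, end) time ranges in seconds for the sound bites."""
    rule = command.get("rule", "even")
    if rule == "even":
        parts = command.get("parts", 2)
        part = voice_duration_s / parts
        return [(i * part, (i + 1) * part) for i in range(parts)]
    if rule == "interval":
        # Boundaries (in seconds) found from the pauses in the voice.
        cuts = [0.0] + list(command["boundaries"]) + [voice_duration_s]
        return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]
    raise ValueError(f"unknown segmentation rule: {rule!r}")

print(handle_split_command({"rule": "even", "parts": 3}, 30))
print(handle_split_command({"rule": "interval", "boundaries": [12.0]}, 30))
```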
Step 304: Determine the semantic segment corresponding to each sound bite.
Step 305: Correspondingly display the sound bites and the semantic segments.
For step 304 and step 305, refer respectively to step 102 and step 103 in the above embodiment.
In the speech processing method provided by the embodiments of the present application, an operation interface is displayed based on a received first operation for the voice, the operation interface including a voice segmentation key; a voice split command is generated based on a received second operation for the voice segmentation key; the voice is divided into at least two sound bites based on the voice split command; the semantic segment corresponding to each sound bite is determined; and the sound bites and the semantic segments are correspondingly displayed. In this way, the voice split command can be generated based on the received second operation for the voice segmentation key and the voice can be split, improving the user experience.
An embodiment of the present application provides a speech processing method. As shown in Fig. 4, the method includes the following steps:
Step 401: Determine a first partitioning boundary of the voice according to the intervals in the voice.
Here, when the voice is divided into at least two sound bites, the first partitioning boundary of the voice is determined according to the intervals in the voice. The intervals in the voice may be pauses in the voice.
For example: the received voice is "A1 ... A2 ... A3"; the pauses "..." in the voice are determined as the first partitioning boundaries of the voice.
Step 402: Split the voice according to the first partitioning boundary to obtain at least two sound bites.
Here, the voice is split according to the determined first partitioning boundary to obtain at least two sound bites. For example: the received voice is "A1 ... A2 ... A3"; the voice is split according to the pauses "..." in the voice, obtaining three sound bites: sound bite "A1", sound bite "A2", and sound bite "A3".
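One common way to find pause boundaries in audio, sketched here under assumptions the patent does not specify (frame-energy thresholding, toy frame sizes; a real system would use frames of around 10 ms and a tuned threshold, or a dedicated voice activity detector):

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def find_pause_boundaries(samples, frame_len=4, silence_thresh=0.01,
                          min_silent_frames=2):
    """Return sample indices in the middle of each sufficiently long
    low-energy run (a pause), for use as first partitioning boundaries."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    boundaries, run_start = [], None
    for idx, frame in enumerate(frames):
        if frame_energy(frame) < silence_thresh:
            if run_start is None:
                run_start = idx
        else:
            if run_start is not None and idx - run_start >= min_silent_frames:
                boundaries.append((run_start + idx) // 2 * frame_len)
            run_start = None
    return boundaries

def split_at(samples, boundaries):
    """Cut the samples at each boundary, yielding the sound bites."""
    cuts = [0] + boundaries + [len(samples)]
    return [samples[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]
```

With a toy signal of loud samples, a pause, then loud samples again, this yields one boundary in the middle of the pause and therefore two sound bites.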
Step 403: Determine the phonemes corresponding to each sound bite.
Here, when determining the semantic segment corresponding to a sound bite, after the at least two sound bites are obtained, the semantic segment corresponding to each of the at least two sound bites is determined separately.
When determining the semantic segments corresponding to the at least two sound bites, the phonemes corresponding to each sound bite are first determined. For example: the at least two sound bites are "A1" and "A2"; it is determined that the phonemes corresponding to sound bite "A1" are "zhedaoti", and the phonemes corresponding to sound bite "A2" are "tainanle".
Here, the phonemes corresponding to a sound bite can be determined according to the semantic recognition model, in which the phonemes corresponding to the pronunciation of each text in the sound bite are stored in advance.
Step 404: Match the phonemes with set phonemes.
Here, the determined phonemes corresponding to the sound bite are matched with the set phonemes. For example: it is determined that the phonemes corresponding to sound bite "A1" are "zhedaoti", and the phonemes are matched with the phonemes corresponding to all texts.
Here, the set phonemes may be the phonemes corresponding to all texts, stored in the speech processing device in advance.
Step 405: If the phonemes match the set phonemes, determine the semantic information corresponding to the set phonemes as the semantic segment corresponding to the sound bite.
Here, in the process of matching the phonemes with the set phonemes, if the phonemes corresponding to the sound bite match the set phonemes, the semantic information corresponding to the set phonemes is determined as the semantic segment corresponding to the sound bite.
For example: the phonemes corresponding to sound bite "A1" are "zhedaoti", which match the set phonemes "zhe", "dao", and "ti"; the semantic information corresponding to the set phonemes "zhe", "dao", and "ti" is "this", "road", and "topic" respectively; and the semantic information "this", "road", "topic" is determined as the semantic segment "this problem" corresponding to sound bite "A1".
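The phoneme-matching step can be sketched as a table lookup. The table below is a hypothetical stand-in for the stored set phonemes; its entries mirror the "zhedaoti" example, whose pinyin corresponds to the Mandarin characters of "this problem" (这道题):

```python
# Hypothetical set-phoneme table: each set phoneme maps to its semantic
# information. A real device would store the phonemes of all texts.
SET_PHONEMES = {"zhe": "这", "dao": "道", "ti": "题",
                "tai": "太", "nan": "难", "le": "了"}

def match_semantic_segment(phonemes):
    """Match the phonemes of a sound bite against the set phonemes and
    join the matched semantic information into the semantic segment.
    Unmatched phonemes are simply skipped in this sketch."""
    return "".join(SET_PHONEMES[p] for p in phonemes if p in SET_PHONEMES)

print(match_semantic_segment(["zhe", "dao", "ti"]))   # 这道题 ("this problem")
print(match_semantic_segment(["tai", "nan", "le"]))   # 太难了 ("too difficult")
```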
Step 406: Correspondingly display the sound bites and the semantic segments.
For step 406, refer to step 103 in the above embodiment.
In the speech processing method provided by the embodiments of the present application, the first partitioning boundary of the voice is determined according to the intervals in the voice; the voice is split according to the first partitioning boundary to obtain at least two sound bites; the phonemes corresponding to each sound bite are determined; the phonemes are matched with the set phonemes; if the phonemes match the set phonemes, the semantic information corresponding to the set phonemes is determined as the semantic segment corresponding to the sound bite; and the sound bites and the semantic segments are correspondingly displayed. In this way, the voice is split according to the partitioning boundary, and the semantic segment corresponding to each sound bite is determined according to the phonemes, improving the user experience.
An embodiment of the present application provides a speech processing method. As shown in Fig. 5, the method includes the following steps:
Step 501: Divide the voice into at least two sound bites.
For step 501, refer to step 101 in the above embodiment.
Step 502: Determine the semantic information corresponding to the voice.
Here, when determining the semantic information, the received piece of voice may be taken as a whole, and the semantic information corresponding to the piece of voice is determined.
When determining the semantic information corresponding to a piece of voice, the phonemes corresponding to the voice are first determined, and the determined phonemes corresponding to the voice are matched with the set phonemes; if the phonemes corresponding to the voice match the set phonemes, the semantic information corresponding to the set phonemes is determined as the semantic information corresponding to the voice.
For example: a piece of voice is "A1 ... A2"; it is determined that the phonemes corresponding to the voice "A1 ... A2" are "zhedaoti ... tainanle"; the phonemes are matched with the phonemes corresponding to all texts; the phonemes "zhedaoti ... tainanle" corresponding to the voice "A1 ... A2" match the set phonemes "zhe", "dao", "ti", "tai", "nan", and "le"; the semantic information corresponding to the set phonemes is "this", "road", "topic", "too", "difficult", and "了 (le)" respectively; and this semantic information is determined as the semantic information "this problem is too difficult" corresponding to the voice "A1 ... A2".
Here, the phonemes corresponding to the voice can be determined according to the semantic recognition model, in which the phonemes corresponding to the pronunciation of each text are stored in advance. The set phonemes may be the phonemes corresponding to all texts, stored in the speech processing device in advance.
Step 503: Add marks to the semantic information according to the intervals in the voice.
Here, after the semantic information is determined, marks are added to the semantic information according to the intervals in the voice. The intervals in the voice may be pauses in the voice, and adding marks to the semantic information may be adding punctuation marks to the semantic information.
For example: the received voice is "A1 ... A2", and the corresponding semantic information is "this problem is too difficult"; according to the pause "..." in the voice, punctuation is added to the semantic information "this problem is too difficult", obtaining "this problem, too difficult".
Step 504: Determine a second partitioning boundary of the semantic information according to the marks.
Here, the marks added to the semantic information are taken as the second partitioning boundary of the semantic information. For example: the semantic information is "this problem, too difficult"; the punctuation mark "," in the semantic information is determined as the second partitioning boundary of the semantic information.
Step 505: Split the semantic information according to the second partitioning boundary to obtain at least two semantic segments.
Here, the semantic information is split according to the determined second partitioning boundary to obtain at least two semantic segments. For example: the semantic information is "this problem, too difficult"; according to the punctuation mark "," in the semantic information, the semantic information is split into two semantic segments: the semantic segment "this problem" and the semantic segment "too difficult".
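Steps 503 to 505 can be sketched together: punctuation is inserted at the pause positions, and the marked semantic information is then split at the punctuation. The function names and the character-position representation of pauses are assumptions for illustration:

```python
def mark_semantic_info(semantic_info, pause_positions):
    """Add punctuation marks to the semantic information at the character
    positions where the voice paused; the marks become the second
    partitioning boundary."""
    pieces, last = [], 0
    for pos in pause_positions:
        pieces.append(semantic_info[last:pos])
        last = pos
    pieces.append(semantic_info[last:])
    return ", ".join(p.strip() for p in pieces if p.strip())

def split_semantic_info(marked):
    """Split the marked semantic information at its punctuation marks,
    yielding at least two semantic segments."""
    return [seg.strip() for seg in marked.split(",") if seg.strip()]

marked = mark_semantic_info("this problem is too difficult", [12])
print(marked)                     # this problem, is too difficult
print(split_semantic_info(marked))
```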
Step 506: Correspondingly display the sound bites and the semantic segments.
For step 506, refer to step 103 in the above embodiment.
In the speech processing method provided by the embodiments of the present application, the voice is divided into at least two sound bites; the semantic information corresponding to the voice is determined; marks are added to the semantic information according to the intervals in the voice; the second partitioning boundary of the semantic information is determined according to the marks; the semantic information is split according to the second partitioning boundary to obtain at least two semantic segments; and the sound bites and the semantic segments are correspondingly displayed. In this way, the semantic information is split according to the second partitioning boundary to obtain at least two semantic segments, improving the user experience.
An embodiment of the present application provides a speech processing method. As shown in Fig. 6, the method includes the following steps:
Step 601: Divide the voice into at least two sound bites.
Step 602: Determine the semantic segment corresponding to each sound bite.
Step 603: Correspondingly display the sound bites and the semantic segments.
For step 601 to step 603, refer respectively to step 101 to step 103 in the above embodiment.
Step 604: Receive a third operation for a sound bite.
Here, after the sound bites and their corresponding semantic segments are displayed, a third operation for a sound bite is received. For example: the display screen of the speech processing device respectively shows sound bite "A1" and its corresponding semantic segment "this trip", sound bite "A2" and its corresponding semantic segment "must be a very good experience", sound bite "A3" and its corresponding semantic segment "excellent", and sound bite "A4" and its corresponding semantic segment "really looking forward to it"; the user clicks sound bite "A1", and the speech processing device receives the click operation for sound bite "A1".
It should be noted that the third operation may be a touch operation such as a click or a touch, and the embodiments of the present application do not limit it.
Step 605: play the speech segment based on the third operation.
Here, the speech recognition device plays the speech segment in response to the received third operation. For example: the user taps speech segment "A1", the speech recognition device receives the tap operation directed at speech segment "A1", and speech segment "A1" is played.
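The embodiment does not constrain how a tap on a displayed segment is mapped back to the audio to replay. A minimal Python sketch of that bookkeeping follows; the class, field names, texts, and millisecond offsets are all hypothetical, not from the patent:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    audio_id: str   # identifier of the speech fragment, e.g. "A1"
    text: str       # semantic segment displayed alongside it
    start_ms: int   # offset of the fragment inside the original voice message
    end_ms: int

def on_segment_tapped(segments, audio_id):
    """Resolve a tap (the 'third operation') to the audio span to replay."""
    for seg in segments:
        if seg.audio_id == audio_id:
            return (seg.start_ms, seg.end_ms)
    raise KeyError(audio_id)

segments = [
    Segment("A1", "the trip this time", 0, 1800),
    Segment("A2", "was really a great experience", 1800, 3500),
]
span = on_segment_tapped(segments, "A1")   # hand this span to an audio player
```

The returned span would then be passed to whatever playback API the device uses.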
With the speech processing method provided by the embodiments of the present application, the speech is split into at least two speech segments; the semantic segment corresponding to each speech segment is determined; the speech segments and the semantic segments are displayed correspondingly; a third operation directed at a speech segment is received; and the speech segment is played based on the third operation. In this way, the user can directly select, according to the displayed semantic segments, the speech segment to listen to again, which is more convenient and improves the user experience.
The speech processing method provided by the embodiments of the present application is illustrated below with a concrete scenario.
In this scenario, after the user receives a long voice message from the other party, the user selects the "long voice segmentation" key; the long voice message and its recognized text are split into fragments at the pauses in the voice and presented to the user.
In one example, as shown in Fig. 7, when the speech processing device receives a long voice message 71, a function selection box 72 is displayed on the display screen. The function selection box 72 includes a handset-mode key 73, a collection key 74, and a long voice segmentation key 75. Based on the user's operation on the function selection box 72, the long voice segmentation key 75 is selected. The speech processing device executes the segmentation function corresponding to the long voice segmentation key 75, splits the received long voice message 71 into three speech segments with their corresponding texts, and displays them correspondingly, as shown in Fig. 8: speech segment A1 with corresponding text 74 "the trip this time was really a great experience", speech segment A2 with corresponding text 75 "well done", and speech segment A3 with corresponding text 76 "get things ready quickly".
Fig. 9 is a schematic diagram of the implementation flow of the speech processing method of this embodiment:
Step 901: recognize the long voice message into text and punctuate it.
Step 902: split the long voice message and the text into speech segments and corresponding text fragments according to the punctuation.
Step 903: present the speech segments and their text fragments to the user.
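Step 902's cut of the recognized text at the punctuation inserted in step 901 can be sketched as follows; the punctuation set and the sample sentence are illustrative assumptions, not from the patent:

```python
import re

# Punctuation marks (ASCII and full-width) treated as fragment boundaries.
_BOUNDARY = r"[,.!?;，。！？；]"

def split_by_punctuation(text):
    """Cut the recognized, punctuated text into non-empty text fragments."""
    return [p.strip() for p in re.split(_BOUNDARY, text) if p.strip()]

fragments = split_by_punctuation("The trip was great, well done, get ready quickly.")
# Each text fragment pairs with the audio fragment cut at the same pause.
```

The audio would be cut at the time offsets of the same pauses, so fragments and speech segments stay in one-to-one correspondence.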
The technical effect of this embodiment is that the user can directly select, according to the recognized text, the speech segment to listen to again, which is more convenient and improves the user experience.
An embodiment of the present application further provides a speech processing apparatus. The modules included in the apparatus, and the units included in each module, may be implemented by a processor of the apparatus, or by specific logic circuits. In practice, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), or the like.
As shown in Fig. 10, the speech processing apparatus 100 includes:
a segmentation module 1001, configured to split speech into at least two speech segments;
a determination module 1002, configured to determine the semantic segment corresponding to each speech segment; and
a display module 1003, configured to display the speech segments and the semantic segments correspondingly.
In some embodiments, the segmentation module 1001 includes a first determination unit and a splitting unit, where:
the first determination unit is configured to determine the duration of the speech; and
the splitting unit is configured to split the speech into at least two speech segments when the duration exceeds a preset duration.
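The embodiment leaves the preset duration open. A sketch of the splitting unit's gating check, with a hypothetical 10-second threshold and sample-count bookkeeping assumed for illustration:

```python
def needs_segmentation(num_samples, sample_rate_hz, threshold_s=10.0):
    """First determination unit plus gate: split only when the voice's
    duration exceeds the preset duration (threshold_s is a hypothetical
    value; the patent does not fix one)."""
    duration_s = num_samples / sample_rate_hz
    return duration_s > threshold_s

long_msg = needs_segmentation(480_000, 16_000)   # 30 s at 16 kHz
short_msg = needs_segmentation(32_000, 16_000)   # 2 s at 16 kHz
```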
In some embodiments, the speech processing apparatus 100 further includes a display module and a generation module, where:
the display module is configured to display an operation interface based on a received first operation directed at the speech, the operation interface including a voice segmentation key; and
the generation module is configured to generate a voice split instruction based on a received second operation directed at the voice segmentation key.
Correspondingly, the segmentation module 1001 is configured to split the speech into at least two speech segments based on the voice split instruction.
In some embodiments, the segmentation module 1001 further includes a second determination unit and a third determination unit, where:
the second determination unit is configured to determine a first partitioning boundary of the speech according to the pauses in the speech; and
the third determination unit is configured to split the speech according to the first partitioning boundary to obtain the at least two speech segments.
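The embodiment derives the first partitioning boundary from pauses in the voice but does not specify a detector. One common approach is a frame-energy silence detector; the frame length, energy threshold, and minimum pause length below are illustrative assumptions, not values from the patent:

```python
def find_pause_boundaries(energies, frame_ms=20, silence_thresh=0.01,
                          min_pause_frames=15):
    """Place a first partitioning boundary (in ms) at the middle of every
    sufficiently long run of low-energy frames."""
    boundaries, run = [], 0
    for i, e in enumerate(energies):
        if e < silence_thresh:
            run += 1
            continue
        if run >= min_pause_frames:
            boundaries.append((i - run // 2) * frame_ms)
        run = 0
    return boundaries

def split_at(total_ms, boundaries):
    """Cut [0, total_ms) at each boundary to get the fragments' spans."""
    edges = [0] + boundaries + [total_ms]
    return list(zip(edges[:-1], edges[1:]))

# 20 loud frames, a 400 ms pause, then 20 more loud frames:
energies = [0.5] * 20 + [0.0] * 20 + [0.5] * 20
boundaries = find_pause_boundaries(energies)
spans = split_at(len(energies) * 20, boundaries)
```

A production detector would also handle trailing silence and adapt the threshold to the noise floor; the sketch keeps only the boundary-placement idea.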
In some embodiments, the determination module 1002 further includes a fourth determination unit, a matching unit, and a fifth determination unit, where:
the fourth determination unit is configured to determine the phonemes corresponding to a speech segment;
the matching unit is configured to match the phonemes against set phonemes; and
the fifth determination unit is configured to, if the phonemes match a set phoneme, determine the semantic information corresponding to that set phoneme as the semantic segment corresponding to the speech segment.
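The patent only states that a speech segment's phonemes are matched against set phonemes and, on a match, the set phoneme's semantic information becomes the semantic segment. A table-lookup sketch with an entirely hypothetical phoneme inventory:

```python
# Hypothetical inventory mapping set phonemes to their semantic information;
# the embodiment specifies only the match-then-lookup behaviour.
SET_PHONEMES = {
    ("l", "v", "y", "o", "u"): "travel",
    ("t", "i", "y", "a", "n"): "experience",
}

def semantic_segment_for(phonemes):
    """Match the fragment's phonemes against the set phonemes; on a match,
    return the set phoneme's semantic information, otherwise None."""
    return SET_PHONEMES.get(tuple(phonemes))

match = semantic_segment_for(["l", "v", "y", "o", "u"])
miss = semantic_segment_for(["z", "z"])
```

A real recognizer would score approximate matches rather than require exact equality; exact lookup is used here only to keep the control flow visible.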
In some embodiments, the determination module 1002 further includes a sixth determination unit, a marking unit, a seventh determination unit, and an eighth determination unit, where:
the sixth determination unit is configured to determine the semantic information corresponding to the speech;
the marking unit is configured to add marks to the semantic information according to the pauses in the speech;
the seventh determination unit is configured to determine a second partitioning boundary of the semantic information according to the marks; and
the eighth determination unit is configured to split the semantic information according to the second partitioning boundary to obtain at least two semantic segments.
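The marking-and-splitting path can be sketched as two small steps: add a mark to the semantic information wherever the voice paused, then cut at the marks, which serve as the second partitioning boundary. Word-level marking and the given pause positions are assumptions for illustration:

```python
def mark_by_pauses(words, pause_after):
    """Add a mark ('|') to the semantic information after every word index
    in pause_after; deriving those indices from the audio intervals is the
    pause-detection step, elided here."""
    marked = []
    for i, w in enumerate(words):
        marked.append(w)
        if i in pause_after:
            marked.append("|")   # each mark is a second partitioning boundary
    return marked

def split_at_marks(marked):
    """Cut the marked semantic information into semantic segments."""
    segments, current = [], []
    for token in marked:
        if token == "|":
            segments.append(" ".join(current))
            current = []
        else:
            current.append(token)
    if current:
        segments.append(" ".join(current))
    return segments

marked = mark_by_pauses(["what", "a", "trip", "well", "done"], {2})
semantic_segments = split_at_marks(marked)
```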
In some embodiments, the speech processing apparatus 100 further includes a receiving module and a playing module, where:
the receiving module is configured to receive a third operation directed at a speech segment; and
the playing module is configured to play the speech segment based on the third operation.
It should be noted that the division into the above program modules is merely an example of how the speech processing apparatus provided by the foregoing embodiments performs speech processing. In practical applications, the processing may be distributed among different program modules as needed; that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the speech processing apparatus of the foregoing embodiments belongs to the same concept as the speech processing method embodiments; its specific implementation is detailed in the method embodiments and is not repeated here.
The speech processing device 110 shown in Fig. 11 includes: at least one processor 1110, a memory 1140, at least one network interface 1120, and a user interface 1130. The components of the speech processing device 110 are coupled together by a bus system 1150. It can be understood that the bus system 1150 implements the connections and communication between these components. Besides a data bus, the bus system 1150 also includes a power bus, a control bus, and a status signal bus. For clarity of explanation, however, all buses are labeled as the bus system 1150 in Fig. 11.
The user interface 1130 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch pad, a touch screen, or the like.
The memory 1140 may be a volatile memory, a non-volatile memory, or both. The non-volatile memory may be a read-only memory (ROM, Read Only Memory); the volatile memory may be a random access memory (RAM, Random Access Memory). The memory 1140 described in the embodiments of the present invention is intended to include memory of any suitable type.
The memory 1140 in the embodiments of the present invention can store data to support the operation of the speech processing device 110. Examples of such data include any computer program for running on the speech processing device 110, such as an operating system and application programs. The operating system includes various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and handling hardware-based tasks. The application programs may include various applications.
The processor 1110 is configured to run the computer program so as to implement the steps of the speech processing method provided in the foregoing embodiments.
As an example of implementing the method provided by the embodiments of the present invention with a combination of software and hardware, the method can be directly embodied as a combination of software modules executed by the processor 1110. For example, the software modules of the speech processing apparatus provided by the embodiments of the present invention may be stored in the memory 1140; the processor 1110 reads the executable instructions contained in the software modules in the memory 1140 and, in combination with the necessary hardware (e.g., the processor 1110 and the other components connected to the bus 1150), completes the speech processing method provided by the embodiments of the present invention.
As an example, the processor 1110 may be an integrated circuit chip with signal processing capability, for example a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like, where the general-purpose processor may be a microprocessor or any conventional processor.
It should be noted that the above description of the speech processing device embodiment is similar to the description of the method above and has the same beneficial effects as the method embodiments, so it is not repeated. For technical details not disclosed in the speech processing device embodiment of the present application, those skilled in the art should refer to the description of the method embodiments of the present application; to save space, they are not repeated here.
In an exemplary embodiment, the embodiments of the present application further provide a storage medium, which may be a computer-readable storage medium, for example a memory storing a computer program, where the computer program can be executed by a processor to complete the steps of the foregoing method. The computer-readable storage medium may be a memory such as an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a flash memory, a magnetic surface memory, an optical disc, or a CD-ROM.
The embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the speech processing method provided in the foregoing embodiments are implemented.
It should be noted that the above description of the computer-readable storage medium embodiment is similar to the description of the method above and has the same beneficial effects as the method embodiments, so it is not repeated. For technical details not disclosed in the storage medium embodiment of the present application, those skilled in the art should refer to the description of the method embodiments of the present application; to save space, they are not repeated here.
The methods disclosed in the foregoing embodiments of the present application may be applied in, or implemented by, the processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above methods can be completed by an integrated hardware logic circuit in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor may implement or execute the methods, steps, and logic diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium; the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the foregoing methods in combination with its hardware.
It can be understood that the memory in the embodiments of the present application may be a volatile memory, a non-volatile memory, or both. The non-volatile memory may be a ROM, a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a ferromagnetic random access memory (FRAM, ferromagnetic random access memory), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a random access memory (RAM, Random Access Memory), which serves as an external cache. By way of illustrative but non-restrictive description, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDRSDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), synclink dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and direct rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The memory described in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
Those skilled in the art should understand that the other constitutions and uses of the speech processing method of the embodiments of the present application are all known to those skilled in the art; to reduce redundancy, they are not repeated in the embodiments of the present application.
In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" mean that the particular features, structures, materials, or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described, those skilled in the art will understand that various changes, modifications, replacements, and variations can be made to these embodiments without departing from the principles and purpose of the present application; the scope of the present application is defined by the claims and their equivalents.
Claims (10)
1. A speech processing method, comprising:
splitting speech into at least two speech segments;
determining a semantic segment corresponding to each speech segment; and
displaying the speech segments and the semantic segments correspondingly.
2. The method according to claim 1, wherein before splitting the speech into at least two speech segments, the method comprises:
determining a duration of the speech; and
splitting the speech into the at least two speech segments when the duration exceeds a preset duration.
3. The method according to claim 1, further comprising:
displaying an operation interface based on a received first operation directed at the speech, the operation interface comprising a voice segmentation key; and
generating a voice split instruction based on a received second operation directed at the voice segmentation key;
wherein splitting the speech into at least two speech segments comprises:
splitting the speech into the at least two speech segments based on the voice split instruction.
4. The method according to claim 1, wherein splitting the speech into at least two speech segments comprises:
determining a first partitioning boundary of the speech according to pauses in the speech; and
splitting the speech according to the first partitioning boundary to obtain the at least two speech segments.
5. The method according to claim 1, wherein determining the semantic segment corresponding to each speech segment comprises:
determining phonemes corresponding to the speech segment;
matching the phonemes against set phonemes; and
if the phonemes match a set phoneme, determining semantic information corresponding to the set phoneme as the semantic segment corresponding to the speech segment.
6. The method according to claim 1, wherein determining the semantic segment corresponding to each speech segment comprises:
determining semantic information corresponding to the speech;
adding marks to the semantic information according to pauses in the speech;
determining a second partitioning boundary of the semantic information according to the marks; and
splitting the semantic information according to the second partitioning boundary to obtain at least two semantic segments.
7. The method according to claim 1, further comprising:
receiving a third operation directed at a speech segment; and
playing the speech segment based on the third operation.
8. A speech processing apparatus, comprising a segmentation module, a determination module, and a display module, wherein:
the segmentation module is configured to split speech into at least two speech segments;
the determination module is configured to determine a semantic segment corresponding to each speech segment; and
the display module is configured to display the speech segments and the semantic segments correspondingly.
9. A speech processing device, comprising a processor and a memory for storing a computer program runnable on the processor, wherein the processor is configured to, when running the computer program, perform the steps of the speech processing method according to any one of claims 1 to 7.
10. A storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the steps of the speech processing method according to any one of claims 1 to 7 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910580572.4A CN110379413B (en) | 2019-06-28 | 2019-06-28 | Voice processing method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110379413A true CN110379413A (en) | 2019-10-25 |
CN110379413B CN110379413B (en) | 2022-04-19 |
Family
ID=68251304
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910580572.4A Active CN110379413B (en) | 2019-06-28 | 2019-06-28 | Voice processing method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110379413B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103187061A (en) * | 2011-12-28 | 2013-07-03 | 上海博泰悦臻电子设备制造有限公司 | Speech conversational system in vehicle |
CN104616652A (en) * | 2015-01-13 | 2015-05-13 | 小米科技有限责任公司 | Voice transmission method and device |
CN106559541A (en) * | 2015-09-30 | 2017-04-05 | 北京奇虎科技有限公司 | voice data processing method and device |
CN106559540A (en) * | 2015-09-30 | 2017-04-05 | 北京奇虎科技有限公司 | voice data processing method and device |
CN107305541A (en) * | 2016-04-20 | 2017-10-31 | 科大讯飞股份有限公司 | Speech recognition text segmentation method and device |
CN108141498A (en) * | 2015-11-25 | 2018-06-08 | 华为技术有限公司 | A kind of interpretation method and terminal |
CN108874904A (en) * | 2018-05-24 | 2018-11-23 | 平安科技(深圳)有限公司 | Speech message searching method, device, computer equipment and storage medium |
CN109379641A (en) * | 2018-11-14 | 2019-02-22 | 腾讯科技(深圳)有限公司 | A kind of method for generating captions and device |
US20190066683A1 (en) * | 2017-08-31 | 2019-02-28 | Interdigital Ce Patent Holdings | Apparatus and method for residential speaker recognition |
CN109473104A (en) * | 2018-11-07 | 2019-03-15 | 苏州思必驰信息科技有限公司 | Speech recognition network delay optimization method and device |
Also Published As
Publication number | Publication date |
---|---|
CN110379413B (en) | 2022-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11321535B2 (en) | Hierarchical annotation of dialog acts | |
EP3095113B1 (en) | Digital personal assistant interaction with impersonations and rich multimedia in responses | |
CN105068987B (en) | The words grade correcting method and system of voice input | |
CN110288980A (en) | Audio recognition method, the training method of model, device, equipment and storage medium | |
CN107924679A (en) | Delayed binding during inputting understanding processing in response selects | |
CN107230475A (en) | A kind of voice keyword recognition method, device, terminal and server | |
CN106897155B (en) | A kind of method for showing interface and device | |
CN108630193A (en) | Audio recognition method and device | |
CN107507615A (en) | Interface intelligent interaction control method, device, system and storage medium | |
US20140013192A1 (en) | Techniques for touch-based digital document audio and user interface enhancement | |
CN110459222A (en) | Sound control method, phonetic controller and terminal device | |
CN109634501B (en) | Electronic book annotation adding method, electronic equipment and computer storage medium | |
CN106598535A (en) | Volume adjustment method and apparatus | |
CN104123114A (en) | Method and device for playing voice | |
CN108885869A (en) | The playback of audio data of the control comprising voice | |
CN107591150A (en) | Audio recognition method and device, computer installation and computer-readable recording medium | |
CN105893351B (en) | Audio recognition method and device | |
CN103218555A (en) | Logging-in method and device for application program | |
CN108388597A (en) | Conference summary generation method and device | |
US20120053937A1 (en) | Generalizing text content summary from speech content | |
WO2015106646A1 (en) | Method and computer system for performing audio search on social networking platform | |
KR20140014510A (en) | Editing method of text generatied by a speech recognition and terminal thereof | |
CN111128254B (en) | Audio playing method, electronic equipment and storage medium | |
CN112309449A (en) | Audio recording method and device | |
CN110379413A (en) | A kind of method of speech processing, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||