CN110379413A - Speech processing method, apparatus, device and storage medium - Google Patents

Speech processing method, apparatus, device and storage medium

Info

Publication number
CN110379413A
CN110379413A (application CN201910580572.4A; granted as CN110379413B)
Authority
CN
China
Prior art keywords
voice
sound
speech segment
phoneme
speech processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910580572.4A
Other languages
Chinese (zh)
Other versions
CN110379413B (en)
Inventor
赵泽清
汪俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN201910580572.4A priority Critical patent/CN110379413B/en
Publication of CN110379413A publication Critical patent/CN110379413A/en
Application granted granted Critical
Publication of CN110379413B publication Critical patent/CN110379413B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L 51/04 Real-time or near real-time messaging, e.g. instant messaging [IM]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L 51/52 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail for supporting social networking services

Abstract

An embodiment of the present application discloses a speech processing method, the method comprising: dividing a voice message into at least two speech segments; determining the semantic segment corresponding to each speech segment; and displaying the speech segments and the corresponding semantic segments together. Embodiments of the present application also disclose a speech processing apparatus, device and storage medium.

Description

Speech processing method, apparatus, device and storage medium
Technical field
The embodiments of the present application relate to the field of computer technology, and in particular, but not exclusively, to a speech processing method, apparatus, device and storage medium.
Background art
In chat software, a user sometimes receives a long voice message sent by the other party. If the user wants to listen to a certain part of the voice again, it must be replayed from the beginning, which is inconvenient. In the related art, the current solution is to convert the voice into text: when the user touches the position of the corresponding text with a finger, the voice is played from the corresponding position. However, when this solution is used on a mobile phone, the text displayed on the phone is small, and accidental touches easily occur.
Summary of the invention
Embodiments of the present application provide a speech processing method, apparatus, device and storage medium.
The technical solutions of the embodiments of the present application are implemented as follows:
In a first aspect, an embodiment of the present application provides a speech processing method, the method comprising:
dividing a voice message into at least two speech segments;
determining the semantic segment corresponding to each speech segment;
displaying the speech segments and the corresponding semantic segments together.
In a second aspect, an embodiment of the present application provides a speech processing apparatus, the apparatus comprising a segmentation module, a determination module and a display module, wherein:
the segmentation module is configured to divide a voice message into at least two speech segments;
the determination module is configured to determine the semantic segment corresponding to each speech segment;
the display module is configured to display the speech segments and the corresponding semantic segments together.
In a third aspect, an embodiment of the present application further provides a speech processing device, comprising a processor and a memory for storing a computer program runnable on the processor, wherein the processor, when running the computer program, executes the steps of the speech processing method of any of the above solutions.
In a fourth aspect, an embodiment of the present application further provides a storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the speech processing method of any of the above solutions.
In the embodiments of the present application, a voice message is divided into at least two speech segments; the semantic segment corresponding to each speech segment is determined; and the speech segments and the corresponding semantic segments are displayed together. In this way, the user can directly select, from the displayed semantic segments, the speech segment to be replayed, which is more convenient and improves the user experience.
Brief description of the drawings
In the accompanying drawings (which are not necessarily drawn to scale), like reference numerals may describe similar components in different views. Like reference numerals with different letter suffixes may represent different instances of similar components. The drawings illustrate, generally by way of example and not limitation, the embodiments discussed herein.
Figure 1A is a first schematic flowchart of the speech processing method provided by the embodiments of the present application;
Figure 1B is a first effect diagram of the speech processing method provided by the embodiments of the present application;
Fig. 2 is a second schematic flowchart of the speech processing method provided by the embodiments of the present application;
Fig. 3 is a third schematic flowchart of the speech processing method provided by the embodiments of the present application;
Fig. 4 is a fourth schematic flowchart of the speech processing method provided by the embodiments of the present application;
Fig. 5 is a fifth schematic flowchart of the speech processing method provided by the embodiments of the present application;
Fig. 6 is a sixth schematic flowchart of the speech processing method provided by the embodiments of the present application;
Fig. 7 is a second effect diagram of the speech processing method provided by the embodiments of the present application;
Fig. 8 is a third effect diagram of the speech processing method provided by the embodiments of the present application;
Fig. 9 is a seventh schematic flowchart of the speech processing method provided by the embodiments of the present application;
Figure 10 is a schematic diagram of the composition of the speech processing apparatus provided by the embodiments of the present application;
Figure 11 is a schematic diagram of the hardware structure of the speech processing device provided by the embodiments of the present application.
Detailed description of the embodiments
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the specific technical solutions of the application are described in further detail below with reference to the accompanying drawings of the embodiments. The following embodiments are used to illustrate the application, but not to limit its scope.
When the embodiments of the present application are described in detail, for ease of illustration, sectional views representing device structures may be partially enlarged out of general proportion; the schematic diagrams are merely examples and should not limit the scope of protection of the application. In addition, the three dimensions of length, width and depth should be taken into account in actual fabrication.
The speech processing method provided by the embodiments of the present application can be applied to a speech processing apparatus, which can be implemented on a speech processing device. The speech processing apparatus divides a received voice message into at least two speech segments, determines the semantic segments corresponding to the at least two speech segments, and displays the at least two speech segments together with the semantic segments; according to the displayed speech segments and their corresponding semantic segments, the user can select the speech segment to be replayed.
An embodiment of the present application provides a speech processing method applied to a speech processing device that implements the method. The functional modules of the speech processing device can be cooperatively realized by the hardware resources of the device (such as a terminal device or a server), for example computing resources such as a processor, detection resources such as sensors, and communication resources.
The speech processing device may be any electronic device with information processing capability. In one embodiment, the electronic device may be an intelligent terminal, for example a mobile terminal with wireless communication capability such as a notebook computer, or an AR/VR device. In another embodiment, the electronic device may be a terminal device with computing functions that is not convenient to move, such as a desktop computer or a server.
Of course, the embodiments of the present application are not limited to being provided as a method and hardware; there can be various implementations, for example a storage medium (storing instructions for executing the speech processing method provided by the embodiments of the present application).
Figure 1A is a first schematic flowchart of the speech processing method in the embodiments of the present application. As shown in Figure 1A, the method includes the following steps:
Step 101: dividing a voice message into at least two speech segments.
Here, the speech processing device receives a voice message and divides it into at least two speech segments. The voice received by the speech processing device may be sent by another device or by a server.
After the speech processing device receives a voice message, it splits the received voice according to a specified rule. The specified rule may be to divide the voice evenly according to its duration, or to split the voice according to the intervals within it.
If the specified rule is to divide evenly by duration, then after a voice message is received, its duration is determined and divided evenly to obtain an average duration, and the received voice is split into equal parts of that average duration. For example: an 18-second voice message is received and divided evenly into two 9-second speech segments.
If the specified rule is to split according to the intervals in the voice, then after a voice message is received, the intervals within it are determined and the voice is split at those intervals. For example: a received voice message "A1 ... A2 ... A3" is divided into speech segments "A1", "A2" and "A3" according to the intervals "..." in the voice.
When the voice is split, it is divided into at least two speech segments. For example: voice A is divided into two speech segments A1 and A2; as another example, voice A is divided into five speech segments A1, A2, A3, A4 and A5.
Here, the duration of the voice may first be judged before splitting, and the voice is split only when its duration is sufficiently long.
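The even-split rule described above can be sketched as follows. This is a minimal illustration; the function name and the 9-second default are assumptions for the example rather than values from the patent:

```python
import math

def split_by_average_duration(total_ms, target_ms=9000):
    # Split a voice of total_ms milliseconds into equal-length segments of
    # roughly target_ms each, always producing at least two segments.
    n = max(2, math.ceil(total_ms / target_ms))
    length = total_ms / n
    # Return (start_ms, end_ms) pairs covering the whole message.
    return [(round(i * length), round((i + 1) * length)) for i in range(n)]
```

For an 18-second message this yields two 9-second segments, matching the example above.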
Step 102: determining the semantic segment corresponding to each speech segment.
Here, the semantic segment corresponding to a speech segment can be determined according to a semantic recognition model. The semantic recognition model used needs to be trained in advance on multiple voice samples and semantic samples.
Here, when the voice has been divided into at least two speech segments, the semantic segment corresponding to each speech segment is determined. For example: the split speech segments are "A1", "A2" and "A3"; the semantic segment corresponding to "A1" is determined to be "this problem", that corresponding to "A2" is "too difficult", and that corresponding to "A3" is "I cannot do it". As another example: the split speech segments are "A1", "A2", "A3" and "A4"; the semantic segment corresponding to "A1" is "this trip", that corresponding to "A2" is "must be a very good experience", that corresponding to "A3" is "excellent", and that corresponding to "A4" is "really looking forward to it".
Here, when determining the semantic segments, the at least two speech segments obtained by splitting may each be recognized separately; alternatively, the semantics of the whole voice may be determined first and then divided into multiple semantic segments.
Step 103: displaying the speech segments and the corresponding semantic segments together.
Here, after the at least two speech segments and their corresponding semantic segments are obtained, the speech processing device displays the at least two speech segments together with the corresponding semantic segments.
The displayed speech segments and their corresponding semantic segments can be arranged in sequence on the display screen of the speech processing device according to the order in which the voice was received.
The speech segments and corresponding semantic segments are displayed on the display screen of the speech processing device. For example, as shown in Figure 1B: the display screen shows speech segment "A1" with its corresponding semantic segment 11 "this trip", speech segment "A2" with its corresponding semantic segment 12 "must be a very good experience", speech segment "A3" with its corresponding semantic segment 13 "excellent", and speech segment "A4" with its corresponding semantic segment 14 "really looking forward to it".
In the speech processing method provided by the embodiments of the present application, a voice message is divided into at least two speech segments; the semantic segment corresponding to each speech segment is determined; and the speech segments and the corresponding semantic segments are displayed together. In this way, the user can directly select, from the displayed semantic segments, the speech segment to be replayed, which is more convenient and improves the user experience.
An embodiment of the present application provides a speech processing method. As shown in Fig. 2, the method includes the following steps:
Step 201: determining the duration corresponding to the voice.
Here, after the speech processing device receives a voice message, the duration of the received voice is determined. For example: the speech processing device receives a voice message A and, according to the start time and end time of reception, determines that the duration corresponding to A is 30 seconds.
Step 202: dividing the voice into at least two speech segments when the duration is greater than a preset specified duration.
Here, a specified duration is preset. When the duration corresponding to a received voice message is greater than the preset specified duration, the message is considered a long voice and is divided into at least two speech segments.
For example: the preset specified duration is 20 seconds and the received voice message has a duration of 30 seconds; since this duration is greater than the preset specified duration, the 30-second long voice is divided into at least two speech segments.
When the duration corresponding to a received voice message is less than the preset specified duration, the message is considered a short voice and does not need to be split.
It should be noted that the preset specified duration can be set according to the actual situation, and the embodiments of the present application do not limit it.
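The long-voice check in step 202 amounts to a single threshold comparison. The sketch below assumes a 20-second threshold purely to mirror the example, since the patent leaves the specified duration configurable:

```python
DEFAULT_SPECIFIED_DURATION_S = 20  # illustrative value; configurable in practice

def is_long_voice(duration_s, specified_s=DEFAULT_SPECIFIED_DURATION_S):
    # A voice strictly longer than the preset specified duration is treated
    # as a long voice to be split; otherwise it is a short voice left whole.
    return duration_s > specified_s
```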
Step 203: determining the semantic segment corresponding to each speech segment.
Step 204: displaying the speech segments and the corresponding semantic segments together.
For steps 203 and 204, refer respectively to steps 102 and 103 in the above embodiment.
In the speech processing method provided by the embodiments of the present application, the duration corresponding to the voice is determined; when the duration is greater than the preset specified duration, the voice is divided into at least two speech segments; the semantic segment corresponding to each speech segment is determined; and the speech segments and semantic segments are displayed together. In this way, when the voice is determined to be a long voice, it can be split to obtain at least two speech segments, improving the user experience.
An embodiment of the present application provides a speech processing method. As shown in Fig. 3, the method includes the following steps:
Step 301: displaying an operation interface based on a received first operation for the voice.
Here, after the speech processing device receives a voice message, the received voice is shown on the display screen of the device. The user operates on the voice on the display screen, and the speech processing device displays an operation interface based on the received first operation for the voice; the operation interface includes a voice split key.
For example: the speech processing device receives a 30-second voice message and shows it on the display screen; the user performs a right-click operation on the voice, and the speech processing device receives the right-click operation for the voice and displays the operation interface, which includes a voice split key.
It should be noted that the first operation may be a touch operation such as a click or a tap, and the embodiments of the present application do not limit it.
Step 302: generating a voice split command based on a received second operation for the voice split key.
Here, after the operation interface is displayed on the display screen of the speech processing device, the user operates the voice split key on the operation interface, and the speech processing device generates a voice split command based on the received second operation for the voice split key.
For example: the user clicks the voice split key on the operation interface; the speech processing device receives the click operation for the voice split key and generates a voice split command.
It should be noted that the second operation may be a touch operation such as a click or a tap, and the embodiments of the present application do not limit it.
Step 303: dividing the voice into at least two speech segments based on the voice split command.
Here, after the voice split command is generated from the operation on the voice split key, the voice is divided into at least two speech segments based on the command.
The voice split command may also carry a voice splitting rule, in which case the voice is split according to the voice split command and the rule it carries.
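One way to model a key press producing a command that carries a splitting rule is a small command object. The class and field names here are illustrative assumptions, not identifiers from the patent:

```python
from dataclasses import dataclass

@dataclass
class VoiceSplitCommand:
    message_id: str
    rule: str  # e.g. "by_interval" or "by_average_duration"

def on_split_key_pressed(message_id, rule="by_interval"):
    # Second operation (tap on the voice split key) generates a split
    # command; the command optionally carries the splitting rule.
    return VoiceSplitCommand(message_id=message_id, rule=rule)
```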
Step 304: determining the semantic segment corresponding to each speech segment.
Step 305: displaying the speech segments and the corresponding semantic segments together.
For steps 304 and 305, refer respectively to steps 102 and 103 in the above embodiment.
In the speech processing method provided by the embodiments of the present application, an operation interface including a voice split key is displayed based on a received first operation for the voice; a voice split command is generated based on a received second operation for the voice split key; the voice is divided into at least two speech segments based on the voice split command; the semantic segment corresponding to each speech segment is determined; and the speech segments and semantic segments are displayed together. In this way, a voice split command can be generated from the second operation on the voice split key and the voice split accordingly, improving the user experience.
An embodiment of the present application provides a speech processing method. As shown in Fig. 4, the method includes the following steps:
Step 401: determining a first splitting boundary of the voice according to the intervals in the voice.
Here, when the voice is divided into at least two speech segments, the first splitting boundary of the voice is determined according to the intervals in the voice, where an interval may be a pause in the voice.
For example: the received voice is "A1 ... A2 ... A3", and the pauses "..." in the voice are determined as the first splitting boundaries of the voice.
Step 402: splitting the voice according to the first splitting boundary to obtain at least two speech segments.
Here, the voice is split according to the determined first splitting boundary to obtain at least two speech segments. For example: the received voice is "A1 ... A2 ... A3"; splitting the voice at the pauses "..." yields three speech segments: "A1", "A2" and "A3".
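Pause detection of the kind described in steps 401 and 402 is often implemented with a simple frame-energy rule: frames whose energy stays below a floor for long enough are treated as an interval, and the centre of the silent run becomes a splitting boundary. The following is an energy-threshold sketch under assumed parameter values, not the patent's exact method:

```python
def find_pause_boundaries(samples, frame=160, energy_floor=1e-4, min_silent_frames=3):
    # Scan fixed-size frames of a mono sample sequence; a run of at least
    # min_silent_frames low-energy frames counts as a pause, and the centre
    # sample index of that run is reported as a splitting boundary.
    boundaries = []
    run = 0
    for i in range(0, len(samples) - frame + 1, frame):
        energy = sum(s * s for s in samples[i:i + frame]) / frame
        if energy < energy_floor:
            run += 1
        else:
            if run >= min_silent_frames:
                start = i - run * frame
                boundaries.append(start + (run * frame) // 2)
            run = 0
    return boundaries
```

Slicing the samples at the returned boundaries then yields the speech segments.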
Step 403: determining the phonemes corresponding to each speech segment.
Here, when determining the semantic segment corresponding to a speech segment, after the at least two speech segments are obtained, the semantic segment corresponding to each of the at least two speech segments is determined separately.
When determining the semantic segments corresponding to the at least two speech segments, the phonemes corresponding to each speech segment are determined first. For example: the at least two speech segments are "A1" and "A2"; the phonemes corresponding to speech segment "A1" are determined to be "zhedaoti", and the phonemes corresponding to speech segment "A2" to be "tainanle".
Here, the phonemes corresponding to a speech segment can be determined according to the semantic recognition model, in which the phonemes corresponding to the pronunciation of each character are stored in advance.
Step 404: matching the phonemes against the set phonemes.
Here, the determined phonemes of the speech segment are matched against the set phonemes. For example: the phonemes corresponding to speech segment "A1" are "zhedaoti", and these phonemes are matched against the phonemes corresponding to all characters.
Here, the set phonemes may be the phonemes corresponding to all characters, stored in advance in the speech processing device.
Step 405: if the phonemes match the set phonemes, determining the semantic information corresponding to the set phonemes as the semantic segment corresponding to the speech segment.
Here, during the matching of the phonemes against the set phonemes, if the phonemes corresponding to the speech segment match the set phonemes, the semantic information corresponding to the set phonemes is determined as the semantic segment corresponding to the speech segment.
For example: the phonemes corresponding to speech segment "A1" are "zhedaoti", which match the set phonemes "zhe", "dao" and "ti"; the semantic information corresponding to these set phonemes is the characters 这, 道 and 题 respectively, and this semantic information is combined into the semantic segment "this problem" (这道题) corresponding to speech segment "A1".
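The phoneme-matching step can be pictured as a lookup from phonemes to stored semantic information. The table below is a toy stand-in for the set phonemes; a real system would store the phonemes of every character's pronunciation:

```python
# Toy subset of the "set phonemes" and their semantic information.
SET_PHONEMES = {"zhe": "这", "dao": "道", "ti": "题", "tai": "太", "nan": "难", "le": "了"}

def phonemes_to_semantic_segment(phonemes):
    # Match each phoneme of a speech segment against the set phonemes and
    # join the matched semantic information into one semantic segment.
    return "".join(SET_PHONEMES[p] for p in phonemes if p in SET_PHONEMES)
```

With this table, the phonemes "zhe", "dao", "ti" produce the semantic segment 这道题 ("this problem"), as in the example above.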
Step 406: displaying the speech segments and the corresponding semantic segments together.
For step 406, refer to step 103 in the above embodiment.
In the speech processing method provided by the embodiments of the present application, the first splitting boundary of the voice is determined according to the intervals in the voice; the voice is split according to the first splitting boundary to obtain at least two speech segments; the phonemes corresponding to each speech segment are determined and matched against the set phonemes; if the phonemes match the set phonemes, the semantic information corresponding to the set phonemes is determined as the semantic segment corresponding to the speech segment; and the speech segments and semantic segments are displayed together. In this way, the voice is split at the splitting boundaries and the semantic segments are determined from the phonemes, improving the user experience.
An embodiment of the present application provides a speech processing method. As shown in Fig. 5, the method includes the following steps:
Step 501: dividing a voice message into at least two speech segments.
For step 501, refer to step 101 in the above embodiment.
Step 502: determining the semantic information corresponding to the voice.
Here, when determining the semantic information, the received voice message may be taken as a whole and its corresponding semantic information determined.
When determining the semantic information corresponding to a voice message, the phonemes corresponding to the voice are determined first and matched against the set phonemes; if the phonemes corresponding to the voice match the set phonemes, the semantic information corresponding to the set phonemes is determined as the semantic information corresponding to the voice.
For example: a voice message is "A1 ... A2", and its phonemes are determined to be "zhedaoti ... tainanle"; these phonemes are matched against the phonemes corresponding to all characters and match the set phonemes "zhe", "dao", "ti", "tai", "nan" and "le", whose corresponding semantic information is the characters 这, 道, 题, 太, 难 and 了 respectively; this semantic information is combined into the semantic information "this problem is too difficult" (这道题太难了) corresponding to the voice "A1 ... A2".
Here, the phonemes corresponding to the voice can be determined according to the semantic recognition model, in which the phonemes corresponding to the pronunciation of each character are stored in advance. The set phonemes may be the phonemes corresponding to all characters, stored in advance in the speech processing device.
Step 503: adding marks to the semantic information according to the intervals in the voice.
Here, after the semantic information has been determined, marks are added to it according to the intervals in the voice; an interval may be a pause in the voice, and adding marks to the semantic information may mean adding punctuation marks to it.
For example: the received voice is "A1 ... A2" and the corresponding semantic information is "this problem is too difficult"; according to the pause "..." in the voice, punctuation is added to the semantic information, yielding "this problem, too difficult".
Step 504: determining a second splitting boundary of the semantic information according to the marks.
Here, the marks added to the semantic information serve as its second splitting boundary. For example: the semantic information is "this problem, too difficult", and the punctuation mark "," in it is determined as the second splitting boundary of the semantic information.
Step 505: splitting the semantic information according to the second splitting boundary to obtain at least two semantic segments.
Here, the semantic information is split according to the determined second splitting boundary to obtain at least two semantic segments. For example: the semantic information is "this problem, too difficult"; splitting it at the punctuation mark "," yields two semantic segments: "this problem" and "too difficult".
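Splitting the punctuated semantic information at the second splitting boundary is a plain text split on the added marks. This sketch treats common Chinese and Latin clause punctuation as boundaries, which is an assumption, since the patent only names punctuation marks in general:

```python
import re

def split_semantic_info(text):
    # Split semantic information into semantic segments at the punctuation
    # marks added according to the pauses in the voice.
    parts = re.split(r"[，,。.！!？?；;]", text)
    return [p.strip() for p in parts if p.strip()]
```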
Step 506: displaying the speech segments and the corresponding semantic segments together.
For step 506, refer to step 103 in the above embodiment.
In the speech processing method provided by the embodiments of the present application, a voice message is divided into at least two speech segments; the semantic information corresponding to the voice is determined; marks are added to the semantic information according to the intervals in the voice; the second splitting boundary of the semantic information is determined according to the marks; the semantic information is split according to the second splitting boundary to obtain at least two semantic segments; and the speech segments and semantic segments are displayed together. In this way, the semantic information is split at the second splitting boundary into at least two semantic segments, improving the user experience.
A kind of method of speech processing is provided in the embodiment of the present application, as shown in fig. 6, method includes the following steps:
Step 601: voice is divided at least two sound bites;
Step 602: determining the corresponding semantic segment of the sound bite;
Step 603: corresponding to show the sound bite and the semantic segment;
Wherein, step 601 is to step 603 respectively referring to the step 101 in above-described embodiment to step 103.
Step 604: receiving a third operation directed at a speech segment.
Here, after the speech segments and their corresponding semantic segments are displayed, a third operation directed at a speech segment is received. For example: the display screen of the speech recognition device shows speech segment "A1" with its corresponding semantic segment "this trip", speech segment "A2" with its corresponding semantic segment "must be a very good experience", speech segment "A3" with its corresponding semantic segment "excellent", and speech segment "A4" with its corresponding semantic segment "really looking forward to it". The user clicks speech segment "A1", and the speech recognition device receives the click operation directed at speech segment "A1".
It should be noted that the third operation may be a touch operation such as a click or a touch; the embodiment of the present application does not limit this.
Step 605: playing the speech segment based on the third operation.
Here, the speech segment is played based on the third operation received by the speech recognition device. For example: the user clicks speech segment "A1", the speech recognition device receives the click operation directed at speech segment "A1", and speech segment "A1" is played.
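Steps 604 and 605 can be sketched as follows; `SegmentPlayer`, its method names, and the recorded-playback stand-in are all hypothetical, since the patent does not specify an implementation:

```python
class SegmentPlayer:
    """Sketch of steps 604-605: each displayed speech segment keeps a
    reference to its audio clip, and a click (the 'third operation')
    triggers playback of just that segment."""

    def __init__(self, segments):
        # segments: mapping of segment id -> (audio_clip, semantic_segment)
        self.segments = segments
        self.played = []  # record of playback requests, for illustration

    def on_click(self, segment_id):
        """Handle the third operation on one segment."""
        audio, text = self.segments[segment_id]
        self.play(audio)
        return text

    def play(self, audio):
        # A real device would hand `audio` to an audio output API;
        # here we only record the request.
        self.played.append(audio)

player = SegmentPlayer(
    {"A1": ("<audio A1>", "travelling is indeed a very good experience")}
)
player.on_click("A1")
```

The key design point is that the click targets one segment, so only that segment's audio is replayed rather than the whole long voice message.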
In the speech processing method provided by the embodiments of the present application, the voice is divided into at least two speech segments; the semantic segment corresponding to each speech segment is determined; the speech segments and the semantic segments are displayed correspondingly; a third operation directed at a speech segment is received; and the speech segment is played based on the third operation. In this way, the user can directly select the speech segment to listen to again according to the displayed semantic segments, which is more convenient and improves the user experience.
The speech processing method provided by the embodiments of the present application is illustrated below through a concrete scenario.
In this embodiment, after receiving a long voice message sent by the other party, the user selects the "long voice segmentation" key; the long voice message is divided into segments according to the pauses in the voice, and the segments together with their recognized text are presented to the user.
In one example, as shown in Figure 7, when the speech processing device receives a long voice message 71, a function selection box 72 is displayed on the display screen. The function selection box 72 includes a handset-mode key 73, a favorites key 74 and a long-voice-segmentation key 75. Based on the user's operation on the function selection box 72, the long-voice-segmentation key 75 is selected. The speech processing device executes the segmentation function corresponding to the long-voice-segmentation key 75 and splits the received long voice message 71 into three speech segments with their corresponding texts, which are displayed correspondingly, as shown in Figure 8: speech segment A1 with corresponding text 74 "travelling is indeed a very good experience", speech segment A2 with corresponding text 75 "well done", and speech segment A3 with corresponding text 76 "hurry up and prepare".
Figure 9 is a schematic diagram of the implementation flow of the speech processing method of this embodiment:
Step 901: recognizing the long voice message into text and punctuating it.
Step 902: splitting the long voice message and the text into speech segments and their corresponding text fragments according to the punctuation.
Step 903: presenting the speech segments and their text fragments to the user.
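The three steps of Figure 9 can be sketched as the following pipeline; the `recognize`, `punctuate` and `cut` callables are hypothetical stand-ins for the real ASR, punctuation and audio-cutting components, which the patent does not specify:

```python
def process_long_voice(audio, recognize, punctuate, cut):
    """Sketch of Figure 9: step 901 recognizes and punctuates the long
    voice, step 902 splits text and audio at the punctuation, and step
    903 pairs each speech segment with its text fragment for display."""
    text = punctuate(recognize(audio))                             # step 901
    fragments = [t.strip() for t in text.split(",") if t.strip()]  # step 902 (text)
    clips = cut(audio, len(fragments))                             # step 902 (audio)
    return list(zip(clips, fragments))                             # step 903

# Dummy stand-ins for the real components (illustrative only):
pairs = process_long_voice(
    "raw-audio",
    recognize=lambda a: "travelling is great well done hurry up",
    punctuate=lambda t: "travelling is great, well done, hurry up",
    cut=lambda a, n: ["clip%d" % i for i in range(n)],
)
```

The return value is a list of (speech segment, text fragment) pairs, matching the correspondence shown on the display in Figure 8.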
The technical effect of this embodiment is as follows: the user can directly select the speech segment to listen to again according to the recognized text, which is more convenient and improves the user experience.
An embodiment of the present application further provides a speech processing apparatus. Each module included in the apparatus, and each unit included in each module, can be implemented by a processor of the speech processing apparatus; they can of course also be implemented by specific logic circuits. In the course of implementation, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), or the like.
As shown in Figure 10, the speech processing apparatus 100 includes:
a segmentation module 1001, configured to divide the voice into at least two speech segments;
a determining module 1002, configured to determine the semantic segment corresponding to each speech segment; and
a display module 1003, configured to correspondingly display the speech segments and the semantic segments.
In some embodiments, the segmentation module 1001 includes a first determination unit and a cutting unit, wherein:
the first determination unit is configured to determine the duration of the voice; and
the cutting unit is configured to divide the voice into at least two speech segments when the duration is greater than a default specified duration.
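The duration check performed by these two units can be sketched as follows; the threshold value is an assumption, since the patent only speaks of a "default specified duration":

```python
# Assumed threshold -- the patent does not name a concrete value.
DEFAULT_SPECIFIED_DURATION_S = 10.0

def maybe_split(voice_duration_s, split_fn,
                threshold_s=DEFAULT_SPECIFIED_DURATION_S):
    """Split the voice only when its duration exceeds the specified
    duration; short voices are left whole (returned as None here)."""
    if voice_duration_s > threshold_s:
        return split_fn()
    return None
```

A short clip thus bypasses segmentation entirely, which avoids presenting a one-sentence voice message as multiple fragments.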
In some embodiments, the speech processing apparatus 100 further includes a display module and a generation module, wherein:
the display module is configured to display an operation interface based on a received first operation directed at the voice, the operation interface including a voice segmentation key; and
the generation module is configured to generate a voice split instruction based on a received second operation directed at the voice segmentation key.
Correspondingly, the segmentation module 1001 is configured to divide the voice into at least two speech segments based on the voice split instruction.
In some embodiments, the segmentation module 1001 further includes a second determination unit and a third determination unit, wherein:
the second determination unit is configured to determine the first partitioning boundary of the voice according to the pauses in the voice; and
the third determination unit is configured to split the voice according to the first partitioning boundary to obtain at least two speech segments.
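A minimal sketch of these two units, operating on a list of amplitude samples: a run of near-silent samples is treated as a pause, and the end of each pause becomes a first partitioning boundary. The silence level and minimum pause length are illustrative assumptions:

```python
def find_pause_boundaries(samples, silence_level=0.01, min_pause=3):
    """Second determination unit: return the index after each run of at
    least `min_pause` near-silent samples -- the first partitioning
    boundaries of the voice."""
    boundaries, run = [], 0
    for i, s in enumerate(samples):
        if abs(s) < silence_level:
            run += 1
        else:
            if run >= min_pause:
                boundaries.append(i)
            run = 0
    return boundaries

def split_at(samples, boundaries):
    """Third determination unit: cut the voice at the boundaries to
    obtain at least two speech segments."""
    pieces, start = [], 0
    for b in boundaries:
        pieces.append(samples[start:b])
        start = b
    pieces.append(samples[start:])
    return pieces
```

Real devices would work on PCM frames with an energy or voice-activity-detection criterion; the structure, however, is the same: find pauses, then cut.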
In some embodiments, the determining module 1002 further includes a fourth determination unit, a matching unit and a fifth determination unit, wherein:
the fourth determination unit is configured to determine the phonemes corresponding to the speech segment;
the matching unit is configured to match the phonemes with set phonemes; and
the fifth determination unit is configured to, if the phonemes match the set phonemes, determine the semantic information corresponding to the set phonemes as the semantic segment corresponding to the speech segment.
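The phoneme-matching path can be sketched as a lookup against a table of set phonemes; the table contents and function name are hypothetical examples, not from the patent:

```python
# Hypothetical table of set phonemes -> semantic information.
SET_PHONEMES = {
    ("n", "i", "h", "ao"): "hello",
    ("z", "ai", "j", "ian"): "goodbye",
}

def segment_to_semantics(segment_phonemes, table=SET_PHONEMES):
    """Matching unit + fifth determination unit: if the segment's
    phonemes match a set phoneme sequence, the semantic information of
    that sequence becomes the segment's semantic segment."""
    return table.get(tuple(segment_phonemes))
```

A production system would use a pronunciation lexicon and approximate matching rather than exact tuple lookup; this sketch only shows the match-then-assign logic.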
In some embodiments, the determining module 1002 further includes a sixth determination unit, a marking unit, a seventh determination unit and an eighth determination unit, wherein:
the sixth determination unit is configured to determine the semantic information corresponding to the voice;
the marking unit is configured to add a label to the semantic information according to the pauses in the voice;
the seventh determination unit is configured to determine the second partitioning boundary of the semantic information according to the label; and
the eighth determination unit is configured to split the semantic information according to the second partitioning boundary to obtain at least two semantic segments.
In some embodiments, the speech processing apparatus 100 further includes a receiving module and a playing module, wherein:
the receiving module is configured to receive a third operation directed at a speech segment; and
the playing module is configured to play the speech segment based on the third operation.
It should be understood that when the speech processing apparatus provided in the above embodiment performs speech processing, the division into the above program modules is used only as an example; in practical applications, the above processing may be assigned to different program modules as needed, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the speech processing apparatus provided in the above embodiment and the speech processing method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
The speech processing device 110 shown in Figure 11 includes: at least one processor 1110, a memory 1140, at least one network interface 1120 and a user interface 1130. The various components in the speech processing device 110 are coupled together by a bus system 1150. It can be understood that the bus system 1150 is used to realize the connection and communication between these components. In addition to a data bus, the bus system 1150 also includes a power bus, a control bus and a status signal bus. However, for the sake of clarity, the various buses are all designated as the bus system 1150 in Figure 11.
The user interface 1130 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch-sensitive pad, a touch screen, or the like.
The memory 1140 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM, Read Only Memory). The volatile memory may be a random access memory (RAM, Random Access Memory). The memory 1140 described in the embodiments of the present invention is intended to include any suitable type of memory.
The memory 1140 in the embodiments of the present invention can store data to support the operation of the speech processing device 110. Examples of such data include any computer program for operating on the speech processing device 110, such as an operating system and application programs. The operating system includes various system programs, such as a framework layer, a core library layer and a driver layer, for realizing various basic services and processing hardware-based tasks. The application programs may include various applications.
The processor 1110 is configured to, when running the computer program, realize the steps in the speech processing method provided in the above embodiments.
As an example of implementing the method provided by the embodiments of the present invention with a combination of software and hardware, the method provided by the embodiments of the present invention can be directly embodied as a combination of software modules executed by the processor 1110. For example, the software modules of the speech processing apparatus provided by the embodiments of the present invention can be stored in the memory 1140; the processor 1110 reads the executable instructions included in the software modules in the memory 1140 and, in conjunction with the necessary hardware (for example, including the processor 1110 and other components connected to the bus 1150), completes the speech processing method provided by the embodiments of the present invention.
As an example, the processor 1110 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like, where the general-purpose processor may be a microprocessor or any conventional processor.
It should be noted that the above description of the speech processing device embodiment is similar to the above method description and has the same beneficial effects as the method embodiments, so it is not repeated here. For technical details not disclosed in the speech processing device embodiment of the present application, those skilled in the art may refer to the description of the method embodiments of the present application; to save space, they are not repeated here.
In an exemplary embodiment, the embodiments of the present application further provide a storage medium, which may be a computer-readable storage medium, for example, a memory storing a computer program; the computer program can be processed by a processor to complete the steps of the aforementioned method. The computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disc or CD-ROM.
The embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when the computer program is processed by a processor, the steps in the speech processing method provided in the above embodiments are realized.
It should be noted that the above description of the computer medium embodiment is similar to the above method description and has the same beneficial effects as the method embodiments, so it is not repeated here. For technical details not disclosed in the storage medium embodiment of the present application, those skilled in the art may refer to the description of the method embodiments of the present application; to save space, they are not repeated here.
The method disclosed in the above embodiments of the present application can be applied in the processor or realized by the processor. The processor may be an integrated circuit chip with signal processing capability. In the course of implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The above processor may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. The processor may implement or execute the methods, steps and logic diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module can be located in a storage medium, the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the aforementioned method in conjunction with its hardware.
It can be understood that the memory of the embodiments of the present application may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memories. The non-volatile memory may be a ROM, a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a ferromagnetic random access memory (FRAM, Ferromagnetic Random Access Memory), a flash memory (Flash Memory), a magnetic surface memory, an optical disc or a CD-ROM (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory may be a disk memory or a magnetic tape memory. The volatile memory may be a random access memory (RAM, Random Access Memory), which is used as an external cache. By way of exemplary but non-restrictive illustration, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDRSDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), synclink dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory) and direct rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The memory described in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
It should be understood by those skilled in the art that the other constitutions and functions of the speech processing method of the embodiments of the present application are known to those skilled in the art; in order to reduce redundancy, they are not repeated in the embodiments of the present application.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that the particular features, structures, materials or characteristics described in conjunction with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, the schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described, it will be understood by those skilled in the art that various changes, modifications, replacements and variations can be made to these embodiments without departing from the principle and purpose of the present application; the scope of the present application is defined by the claims and their equivalents.

Claims (10)

1. A speech processing method, the method comprising:
dividing a voice into at least two speech segments;
determining a semantic segment corresponding to each speech segment; and
correspondingly displaying the speech segments and the semantic segments.
2. The method according to claim 1, before dividing the voice into at least two speech segments, comprising:
determining a duration of the voice; and
dividing the voice into at least two speech segments when the duration is greater than a default specified duration.
3. The method according to claim 1, further comprising:
displaying an operation interface based on a received first operation directed at the voice, the operation interface comprising a voice segmentation key; and
generating a voice split instruction based on a received second operation directed at the voice segmentation key;
correspondingly, the dividing a voice into at least two speech segments comprises:
dividing the voice into at least two speech segments based on the voice split instruction.
4. The method according to claim 1, wherein the dividing a voice into at least two speech segments comprises:
determining a first partitioning boundary of the voice according to pauses in the voice; and
splitting the voice according to the first partitioning boundary to obtain the at least two speech segments.
5. The method according to claim 1, wherein the determining a semantic segment corresponding to each speech segment comprises:
determining phonemes corresponding to the speech segment;
matching the phonemes with set phonemes; and
if the phonemes match the set phonemes, determining semantic information corresponding to the set phonemes as the semantic segment corresponding to the speech segment.
6. The method according to claim 1, wherein the determining a semantic segment corresponding to each speech segment comprises:
determining semantic information corresponding to the voice;
adding a label to the semantic information according to pauses in the voice;
determining a second partitioning boundary of the semantic information according to the label; and
splitting the semantic information according to the second partitioning boundary to obtain at least two semantic segments.
7. The method according to claim 1, further comprising:
receiving a third operation directed at a speech segment; and
playing the speech segment based on the third operation.
8. A speech processing apparatus, the apparatus comprising a segmentation module, a determining module and a display module, wherein:
the segmentation module is configured to divide a voice into at least two speech segments;
the determining module is configured to determine a semantic segment corresponding to each speech segment; and
the display module is configured to correspondingly display the speech segments and the semantic segments.
9. A speech processing device, comprising a processor and a memory for storing a computer program executable on the processor, wherein the processor is configured to, when running the computer program, perform the steps in the speech processing method according to any one of claims 1 to 7.
10. A storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the steps in the speech processing method according to any one of claims 1 to 7 are realized.
CN201910580572.4A 2019-06-28 2019-06-28 Voice processing method, device, equipment and storage medium Active CN110379413B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910580572.4A CN110379413B (en) 2019-06-28 2019-06-28 Voice processing method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110379413A true CN110379413A (en) 2019-10-25
CN110379413B CN110379413B (en) 2022-04-19

Family

ID=68251304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910580572.4A Active CN110379413B (en) 2019-06-28 2019-06-28 Voice processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110379413B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103187061A (en) * 2011-12-28 2013-07-03 上海博泰悦臻电子设备制造有限公司 Speech conversational system in vehicle
CN104616652A (en) * 2015-01-13 2015-05-13 小米科技有限责任公司 Voice transmission method and device
CN106559541A (en) * 2015-09-30 2017-04-05 北京奇虎科技有限公司 voice data processing method and device
CN106559540A (en) * 2015-09-30 2017-04-05 北京奇虎科技有限公司 voice data processing method and device
CN107305541A (en) * 2016-04-20 2017-10-31 科大讯飞股份有限公司 Speech recognition text segmentation method and device
CN108141498A (en) * 2015-11-25 2018-06-08 华为技术有限公司 A kind of interpretation method and terminal
CN108874904A (en) * 2018-05-24 2018-11-23 平安科技(深圳)有限公司 Speech message searching method, device, computer equipment and storage medium
CN109379641A (en) * 2018-11-14 2019-02-22 腾讯科技(深圳)有限公司 A kind of method for generating captions and device
US20190066683A1 (en) * 2017-08-31 2019-02-28 Interdigital Ce Patent Holdings Apparatus and method for residential speaker recognition
CN109473104A (en) * 2018-11-07 2019-03-15 苏州思必驰信息科技有限公司 Speech recognition network delay optimization method and device


Also Published As

Publication number Publication date
CN110379413B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
US11321535B2 (en) Hierarchical annotation of dialog acts
EP3095113B1 (en) Digital personal assistant interaction with impersonations and rich multimedia in responses
CN105068987B (en) The words grade correcting method and system of voice input
CN110288980A (en) Audio recognition method, the training method of model, device, equipment and storage medium
CN107924679A (en) Delayed binding during inputting understanding processing in response selects
CN107230475A (en) A kind of voice keyword recognition method, device, terminal and server
CN106897155B (en) A kind of method for showing interface and device
CN108630193A (en) Audio recognition method and device
CN107507615A (en) Interface intelligent interaction control method, device, system and storage medium
US20140013192A1 (en) Techniques for touch-based digital document audio and user interface enhancement
CN110459222A (en) Sound control method, phonetic controller and terminal device
CN109634501B (en) Electronic book annotation adding method, electronic equipment and computer storage medium
CN106598535A (en) Volume adjustment method and apparatus
CN104123114A (en) Method and device for playing voice
CN108885869A (en) The playback of audio data of the control comprising voice
CN107591150A (en) Audio recognition method and device, computer installation and computer-readable recording medium
CN105893351B (en) Audio recognition method and device
CN103218555A (en) Logging-in method and device for application program
CN108388597A (en) Conference summary generation method and device
US20120053937A1 (en) Generalizing text content summary from speech content
WO2015106646A1 (en) Method and computer system for performing audio search on social networking platform
KR20140014510A (en) Editing method of text generatied by a speech recognition and terminal thereof
CN111128254B (en) Audio playing method, electronic equipment and storage medium
CN112309449A (en) Audio recording method and device
CN110379413A (en) A kind of method of speech processing, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant