CN109215661A

CN109215661A - Speech-to-text method, apparatus, device and storage medium

Info

Publication number: CN109215661A
Application number: CN201811006413.5A
Authority: CN
Inventors: 王文斌; 周围; 李封翔
Original assignee: Shanghai Wind Communication Technologies Co Ltd
Current assignee: Kunshan Pinyuan Intellectual Property Operating Technology Co., Ltd.
Priority date: 2018-08-30
Filing date: 2018-08-30
Publication date: 2019-01-15

Abstract

An embodiment of the invention discloses a speech-to-text method, apparatus, device and storage medium. The method comprises: obtaining voice information and converting the voice information into a text paragraph; segmenting the text paragraph into multiple phrases; receiving a target phrase selected by a user; receiving a replacement phrase, input by the user, for replacing the target phrase; and replacing the target phrase with the replacement phrase to update the text paragraph. This addresses the technical problem in the prior art that a text paragraph converted directly from speech is inefficient to modify, and thereby improves the modification efficiency of text paragraphs converted from voice information.

Description

Speech-to-text method, apparatus, device and storage medium

Technical field

Embodiments of the present invention relate to the field of data processing, and in particular to a speech-to-text method, apparatus, device and storage medium.

Background art

When speech is converted directly into a text paragraph, the result usually contains many errors. For a recipient or reader to determine the content of the speech from the converted text, the converted text paragraph has to be corrected. The existing way to correct it is to move the cursor to the position of the transcription error, delete the erroneous content, and then manually type the correct content, which is slow and inefficient.

Summary of the invention

Embodiments of the present invention provide a speech-to-text method, apparatus, device and storage medium, which address the technical problem in the prior art that a text paragraph converted directly from speech is inefficient to modify.

In a first aspect, an embodiment of the invention provides a speech-to-text method, comprising:

obtaining voice information, and converting the voice information into a text paragraph;

segmenting the text paragraph into multiple phrases;

receiving a target phrase selected by a user;

receiving a replacement phrase, input by the user, for replacing the target phrase;

replacing the target phrase with the replacement phrase to update the text paragraph.

Further, after receiving the target phrase selected by the user, the method further comprises:

popping up an input box in which the user inputs the replacement phrase;

and receiving the replacement phrase, input by the user, for replacing the target phrase comprises:

receiving the replacement phrase for replacing the target phrase that the user inputs in the input box.

Further, receiving the target phrase selected by the user comprises:

receiving one or more phrases selected by the user as needing modification, and taking the phrases selected by the user as the target phrase.

Further, after receiving the replacement phrase, input by the user, for replacing the target phrase, the method further comprises:

determining the target voice phrase in the voice information that corresponds to the target phrase;

establishing, according to the target voice phrase corresponding to the target phrase and the replacement phrase, a conversion correspondence between the target voice phrase and the replacement phrase, so as to replace the conversion correspondence between the target voice phrase and the target phrase.

Further, before or after receiving the target phrase selected by the user, the method further comprises:

obtaining a preset text modification range for synchronous modification, and taking the target phrase, together with every phrase within the preset text modification range that is identical to the target phrase, as a target phrase;

marking all target phrases within the preset text modification range.

Further, replacing the target phrase with the replacement phrase to update the text paragraph comprises:

determining whether the replacement phrase, once it replaces the target phrase, is semantically correct in its context;

if it is correct, replacing the target phrase with the replacement phrase;

if it is incorrect, outputting a suggestion prompt, and if the user confirms the modification again, replacing the target phrase with the replacement phrase.

In a second aspect, an embodiment of the invention further provides a speech-to-text apparatus, comprising:

a conversion module, for obtaining voice information and converting the voice information into a text paragraph;

a segmentation module, for segmenting the text paragraph into multiple phrases;

a target phrase receiving module, for receiving a target phrase selected by a user;

a replacement phrase receiving module, for receiving a replacement phrase, input by the user, for replacing the target phrase;

a replacement module, for replacing the target phrase with the replacement phrase to update the text paragraph.

In a third aspect, an embodiment of the invention further provides a device, the device comprising:

one or more processors;

a storage apparatus, for storing one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the speech-to-text method described in the first aspect.

In a fourth aspect, an embodiment of the invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute the speech-to-text method described in the first aspect.

The technical solution of the speech-to-text method provided by the embodiments of the present invention comprises: obtaining voice information and converting the voice information into a text paragraph; segmenting the text paragraph into multiple phrases; receiving a target phrase selected by a user; receiving a replacement phrase, input by the user, for replacing the target phrase; and replacing the target phrase with the replacement phrase to update the text paragraph. Because the target phrase is replaced directly by the replacement phrase, the original target phrase does not have to be deleted first. This removes the deletion step and reduces the number of cursor movements, which can greatly improve the modification efficiency of the text paragraph and shorten the time needed to modify it.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of the speech-to-text method provided by Embodiment One of the present invention;

Fig. 2 is a structural block diagram of the speech-to-text apparatus provided by Embodiment Two of the present invention;

Fig. 3 is a structural block diagram of the device provided by Embodiment Three of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below through embodiments, with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Embodiment one

Fig. 1 is a flowchart of the speech-to-text method provided by Embodiment One of the present invention. The technical solution of this embodiment is applicable to modifying a text paragraph converted from voice information. The method may be executed by the speech-to-text apparatus provided by the embodiments of the present invention; the apparatus may be implemented in software and/or hardware and configured for use in a processor. The method specifically comprises the following steps:

S101: obtain voice information, and convert the voice information into a text paragraph.

On occasions where it is inconvenient to listen to speech, or where a text backup of the voice information is needed, the voice information usually has to be converted into a correct text paragraph. To obtain an accurate text paragraph from the voice information, the voice information usually first has to be converted into a text paragraph.

S102: segment the text paragraph into multiple phrases.

After the voice information has been converted into a text paragraph, the text paragraph is modified. To improve the modification efficiency, this embodiment first segments the text paragraph into multiple phrases. For example, "Autumn has arrived, a flock of wild geese fly south" is segmented into: autumn / has arrived / , / a flock / wild geese / fly south /. In other words, this embodiment modifies the text paragraph with the phrase as the basic unit.

It should be noted that the text paragraph is segmented into multiple phrases using existing techniques; this embodiment does not limit the specific segmentation method.
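As an illustration of splitting a text paragraph into phrases, the following is a minimal sketch only. The embodiment leaves the segmentation method open, so the function name `split_into_phrases` and the regular-expression rule are assumptions made here for an English stand-in sentence; Chinese text would need a proper word segmenter, and the resulting units would be coarser than single words.

```python
import re


def split_into_phrases(paragraph: str) -> list[str]:
    """Split a paragraph into units: each word and each punctuation mark.

    Hypothetical stand-in for the segmentation step; not the patent's method.
    """
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", paragraph)


phrases = split_into_phrases("Autumn has come, a flock of wild geese flies south.")
```

Each element of `phrases` can then be presented to the user as a separately selectable unit.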

S103: receive the target phrase selected by the user.

After the text paragraph has been segmented, the user selects the phrase that needs to be modified as the target phrase. For example, in "autumn / has arrived / , / a flock / wild geese / fly toward blue", the phrase "fly toward blue" should be "fly south" and therefore needs to be modified. To modify it, the user first selects "fly toward blue" as the target phrase.

It can be understood that, when determining the target phrase, the user may select a single phrase as the target phrase; if several phrases are adjacent, they may also be selected together as the target phrase.
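The selection rule above (one phrase, or several adjacent phrases) can be sketched as follows. This is an assumption-laden illustration: the function name `select_target` and the use of list indices to represent the user's selection are hypothetical, since the source does not say how a selection is represented.

```python
def select_target(phrases: list[str], indices: list[int]) -> tuple[int, int]:
    """Return (start, end) slice bounds of the selected target phrase.

    Accepts a single index or several adjacent indices; rejects anything else.
    """
    if not indices:
        raise ValueError("no phrase selected")
    ordered = sorted(set(indices))
    if ordered[0] < 0 or ordered[-1] >= len(phrases):
        raise IndexError("selection outside the paragraph")
    # Adjacent means the sorted indices form one unbroken run.
    if ordered[-1] - ordered[0] + 1 != len(ordered):
        raise ValueError("selected phrases must be adjacent")
    return ordered[0], ordered[-1] + 1


phrases = ["autumn", "has arrived", ",", "a flock", "wild geese", "fly toward blue"]
start, end = select_target(phrases, [5])
target = phrases[start:end]
```

Returning slice bounds rather than the phrase text keeps the position of the target, which matters when the same phrase occurs more than once.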

S104: receive the replacement phrase, input by the user, for replacing the target phrase.

After the target phrase has been determined, it needs to be modified, which means a replacement phrase for the target phrase has to be obtained. This embodiment receives the replacement phrase through an input box, so in the example above the user types "fly south" into the input box.

It can be understood that the input box may be displayed at all times, so that once the target phrase has been determined the user types the replacement phrase directly into it. The input box may also switch between displayed and hidden states: while it is hidden, an input-box trigger icon may be provided on the page, and when the user clicks or touches the trigger icon the input box pops up and the user can type the replacement phrase into it. Alternatively, the input box may be hidden without any trigger icon on the page: when the user presses the target phrase for longer than a preset time, the input box pops up automatically, the user types the replacement phrase into it, and after the target phrase has been replaced the input box is hidden again automatically.

If the voice information contains a repeated phrase and that phrase is transcribed incorrectly, the converted text paragraph contains multiple identical target phrases. In this case, to improve modification efficiency, this embodiment modifies all phrases within a preset text modification range that are identical to the target phrase at the same time. Specifically, before or after this step, a preset text modification range for synchronous modification is obtained, and the target phrase, together with every phrase within the preset text modification range that is identical to the target phrase, is taken as a target phrase. The preset text modification range may be a preset number of characters, a preset number of lines, a preset number of natural paragraphs, and so on, and may be set or selected according to the actual situation.

For example, assume the preset text modification range is one natural paragraph. If that paragraph contains several occurrences of "fly toward blue" and the user selects one of them, all occurrences of "fly toward blue" in the paragraph become target phrases at the same time.

To make it easier for the user to confirm or review all target phrases, once they have all been determined this embodiment marks them with a color. This presents the target phrases more intuitively and lets the user locate all of them at a glance.
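A minimal sketch of the synchronous-modification idea follows, assuming the preset text modification range is expressed as a half-open index range over the phrase list. The name `expand_targets` and that representation are hypothetical; the source only says the range may be a number of characters, lines or natural paragraphs.

```python
def expand_targets(
    phrases: list[str], selected: int, scope_start: int, scope_end: int
) -> list[int]:
    """Return the indices of every phrase that becomes a target phrase.

    That is the selected phrase itself plus every identical phrase inside
    the half-open range [scope_start, scope_end).
    """
    target = phrases[selected]
    lo = max(scope_start, 0)
    hi = min(scope_end, len(phrases))
    matches = {i for i in range(lo, hi) if phrases[i] == target}
    matches.add(selected)  # the user's own selection is always a target
    return sorted(matches)


phrases = [
    "wild geese", "fly toward blue", ",", "ducks", "fly toward blue", ".",
    "swans", "fly toward blue",
]
# Assume the first six phrases form one natural paragraph.
targets = expand_targets(phrases, selected=1, scope_start=0, scope_end=6)
```

Here the occurrence at index 7 lies outside the assumed paragraph and is left untouched.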

S105: replace the target phrase with the replacement phrase to update the text paragraph.

Replacing the target phrase with the replacement phrase completes the modification of the target phrase. Speech conversion is usually based on standard Mandarin or on a dialect such as Southern Min (Hokkien). In real life, however, many people's pronunciation is not standard, and an adult's pronunciation in particular is hard to change, so the same errors tend to recur in speech conversion. To improve the accuracy with which voice information is converted into text paragraphs and to reduce the amount of modification needed, this embodiment first determines the target voice phrase in the voice information that corresponds to the target phrase. Then, according to the target voice phrase corresponding to the target phrase and the replacement phrase, it establishes a conversion correspondence between the target voice phrase and the replacement phrase, which replaces the conversion correspondence between the target voice phrase and the target phrase. The next time voice information is converted into a text paragraph, the target voice phrase can be converted directly into the replacement phrase.

To improve the accuracy of the modification, when replacing the target phrase this embodiment first checks whether the replacement phrase is semantically consistent with its context. If the semantics are correct, the target phrase is replaced with the replacement phrase directly. If the semantics are incorrect, a suggestion prompt is output to alert the user; if the user insists on continuing with the modification, the target phrase is replaced with the replacement phrase.
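The idea of replacing a conversion correspondence can be sketched with a small lookup table. This is only an illustration under stated assumptions: the source does not specify how a target voice phrase is represented, so the string key `voice_key` and the class name `ConversionTable` are hypothetical.

```python
class ConversionTable:
    """Maps a voice-phrase key to the text it should be converted into."""

    def __init__(self) -> None:
        self._table: dict[str, str] = {}

    def convert(self, voice_key: str, recognized_text: str) -> str:
        # Prefer a correspondence learned from an earlier user correction;
        # otherwise fall back to what the recognizer produced.
        return self._table.get(voice_key, recognized_text)

    def learn(self, voice_key: str, replacement: str) -> None:
        # Overwrites any earlier correspondence for this voice phrase.
        self._table[voice_key] = replacement


table = ConversionTable()
before = table.convert("utterance-42", "fly toward blue")
table.learn("utterance-42", "fly south")  # the user's correction
after = table.convert("utterance-42", "fly toward blue")
```

After `learn` is called, the same voice phrase yields the replacement phrase without further user input.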
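The check-then-replace flow described above might look like the following sketch. The semantic check and the user prompt are passed in as callables because the source does not say how either is implemented; `replace_with_check`, `is_semantically_ok` and `confirm` are hypothetical names. Leaving the paragraph unchanged when the user declines is also an assumption, since the source only covers the case where the user insists.

```python
from typing import Callable


def replace_with_check(
    phrases: list[str],
    start: int,
    end: int,
    replacement: str,
    is_semantically_ok: Callable[[list[str]], bool],
    confirm: Callable[[str], bool],
) -> list[str]:
    """Replace phrases[start:end] with the replacement phrase.

    If the semantic check fails, the user is asked to confirm first.
    """
    candidate = phrases[:start] + [replacement] + phrases[end:]
    if is_semantically_ok(candidate):
        return candidate
    if confirm(f"'{replacement}' may not fit the context. Replace anyway?"):
        return candidate
    # Assumption: a declined prompt leaves the paragraph as it was.
    return list(phrases)


phrases = ["a flock", "wild geese", "fly toward blue"]
updated = replace_with_check(
    phrases, 2, 3, "fly south", lambda p: True, lambda msg: False
)
```

Because the replacement is written over the slice in one step, no separate deletion of the original target phrase is needed.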

The technical solution of the speech-to-text method provided by this embodiment comprises: obtaining voice information and converting the voice information into a text paragraph; segmenting the text paragraph into multiple phrases and determining the phrase that needs modification as the target phrase; obtaining the replacement phrase for the target phrase through an input box; and replacing the target phrase with the replacement phrase to update the text paragraph. Because the target phrase is replaced directly by the replacement phrase, the original target phrase does not have to be deleted first. This removes the deletion step and reduces the number of cursor movements, which can greatly improve the modification efficiency of the text paragraph and shorten the time needed to modify it.

Embodiment two

Fig. 2 is a structural block diagram of the speech-to-text apparatus provided by Embodiment Two of the present invention. The apparatus is used to execute the speech-to-text method provided by any of the above embodiments and may be implemented in software or hardware. The apparatus comprises:

a conversion module 11, for obtaining voice information and converting the voice information into a text paragraph;

a segmentation module 12, for segmenting the text paragraph into multiple phrases;

a target phrase receiving module 13, for receiving a target phrase selected by a user;

a replacement phrase receiving module 14, for receiving a replacement phrase, input by the user, for replacing the target phrase;

a replacement module 15, for replacing the target phrase with the replacement phrase to update the text paragraph.

In the technical solution of the speech-to-text apparatus provided by this embodiment, the conversion module obtains voice information and converts it into a text paragraph; the segmentation module segments the text paragraph into multiple phrases; the target phrase receiving module receives the target phrase selected by the user; the replacement phrase receiving module receives the replacement phrase, input by the user, for replacing the target phrase; and the replacement module replaces the target phrase with the replacement phrase to update the text paragraph. Because the target phrase is replaced directly by the replacement phrase, the original target phrase does not have to be deleted first. This removes the deletion step and reduces the number of cursor movements, which can greatly improve the modification efficiency of the text paragraph and shorten the time needed to modify it.

The speech-to-text apparatus provided by this embodiment can execute the speech-to-text method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
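One way to picture how the five modules cooperate is the sketch below, which wires them together as plain callables. It is not the patented implementation: the class name `SpeechToTextApparatus`, the field names and the stub functions in the usage lines are all hypothetical, and the speech recognizer is replaced by a stub that simply decodes bytes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechToTextApparatus:
    convert: Callable[[bytes], str]              # conversion module 11
    segment: Callable[[str], list[str]]          # segmentation module 12
    receive_target: Callable[[list[str]], int]   # target phrase receiving module 13
    receive_replacement: Callable[[str], str]    # replacement phrase receiving module 14

    def run(self, voice: bytes) -> str:
        phrases = self.segment(self.convert(voice))
        index = self.receive_target(phrases)
        replacement = self.receive_replacement(phrases[index])
        # Replacement module 15: overwrite the target phrase in place.
        phrases[index] = replacement
        return " ".join(phrases)


# Stubs stand in for the recognizer and for the user's two inputs.
apparatus = SpeechToTextApparatus(
    convert=lambda voice: voice.decode("utf-8"),
    segment=str.split,
    receive_target=lambda phrases: phrases.index("blue"),
    receive_replacement=lambda target: "south",
)
result = apparatus.run(b"geese fly blue")
```

Keeping each module behind its own callable mirrors the statement that the modules are divided by functional logic and may be realized in software or hardware.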

Embodiment three

Fig. 3 is a structural schematic diagram of the device provided by Embodiment Three of the present invention. As shown in Fig. 3, the device comprises a processor 201, a memory 202, an input apparatus 203 and an output apparatus 204. The device may have one or more processors 201; Fig. 3 takes one processor 201 as an example. The processor 201, memory 202, input apparatus 203 and output apparatus 204 in the device may be connected by a bus or in other ways; Fig. 3 takes connection by a bus as an example.

The memory 202, as a computer-readable storage medium, can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the speech-to-text method in the embodiments of the present invention (for example, the conversion module 11, segmentation module 12, target phrase receiving module 13, replacement phrase receiving module 14 and replacement module 15). By running the software programs, instructions and modules stored in the memory 202, the processor 201 executes the various functional applications and data processing of the device, that is, implements the speech-to-text method described above.

The memory 202 may mainly comprise a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function; the data storage area can store data created through use of the terminal, and so on. In addition, the memory 202 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some examples, the memory 202 may further include memory located remotely from the processor 201, and such remote memory may be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.

The input apparatus 203 can be used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control of the device.

The output apparatus 204 may include a display device such as a display screen, for example the display screen of a user terminal.

Embodiment Four

Embodiment Four of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute a speech-to-text method, the method comprising:

segmenting the text paragraph into multiple phrases;

receiving a target phrase selected by a user;

Of course, in the storage medium containing computer-executable instructions provided by this embodiment, the computer-executable instructions are not limited to the method operations described above and can also execute related operations in the speech-to-text method provided by any embodiment of the present invention.

From the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software together with the necessary general-purpose hardware, and of course may also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the speech-to-text method described in the embodiments of the present invention.

It is worth noting that, in the above embodiment of the speech-to-text apparatus, the units and modules included are divided only according to functional logic; the division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them; without departing from the inventive concept it may include further equivalent embodiments, and its scope is determined by the scope of the appended claims.

Claims

1. A speech-to-text method, characterized by comprising:

segmenting the text paragraph into multiple phrases;

receiving a target phrase selected by a user;

2. The method according to claim 1, characterized in that, after receiving the target phrase selected by the user, the method further comprises:

popping up an input box in which the user inputs the replacement phrase;

3. The method according to claim 2, characterized in that receiving the target phrase selected by the user comprises:

receiving one or more phrases selected by the user as needing modification, and taking the phrases selected by the user as the target phrase.

4. The method according to claim 3, characterized in that, after receiving the replacement phrase, input by the user, for replacing the target phrase, the method further comprises:

establishing, according to the target voice phrase corresponding to the target phrase and the replacement phrase, a conversion correspondence between the target voice phrase and the replacement phrase, so as to replace the conversion correspondence between the target voice phrase and the target phrase.

5. The method according to claim 1, characterized in that, before or after receiving the target phrase selected by the user, the method further comprises:

obtaining a preset text modification range for synchronous modification, and taking the target phrase, together with every phrase within the preset text modification range that is identical to the target phrase, as a target phrase.

6. The method according to claim 5, characterized in that, after receiving the target phrase selected by the user, the method further comprises:

marking all target phrases within the preset text modification range.

7. The method according to claim 1, characterized in that replacing the target phrase with the replacement phrase to update the text paragraph comprises:

if it is correct, replacing the target phrase with the replacement phrase;

if it is incorrect, outputting a suggestion prompt, and if the user confirms the modification again, replacing the target phrase with the replacement phrase.

8. A speech-to-text apparatus, characterized by comprising:

9. A device, characterized in that the device comprises:

one or more processors;

a storage apparatus, for storing one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the speech-to-text method according to any one of claims 1-7.

10. A storage medium containing computer-executable instructions, characterized in that the computer-executable instructions, when executed by a computer processor, are used to execute the speech-to-text method according to any one of claims 1-7.