CN109817210A - Voice writing method, device, terminal and storage medium - Google Patents
- Publication number: CN109817210A
- Application number: CN201910111502.4A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The embodiments of the present invention disclose a voice writing method, device, terminal and storage medium. The method comprises: during the user's voice writing process, converting the acquired user voice information into user text information; matching the user text information against candidate behavior intentions, and taking the matched candidate behavior intention as the target behavior intention; and editing the text content of the editing area according to the target behavior intention. By recognizing the behavior intention of the user's speech, the embodiments spare the user terminal the complex process of learning from user speech and perform the matching of voice writing instructions on the user terminal. This improves the efficiency and accuracy with which the user terminal responds to the user's writing instructions, meets the user's needs during writing, and provides a good voice writing experience.
Description
Technical field
The embodiments of the present invention relate to the field of voice processing technology, and in particular to a voice writing method, device, terminal and storage medium.
Background
With the rapid development of voice processing technology, more and more mobile terminals and devices use it to receive, process and analyze speech, for example to perform editing operations such as text input by voice.
At present, a mobile terminal can itself use strict matching: the character string converted from the speech is compared with the character string associated with a piece of logic, and the corresponding logic is executed only when the two are exactly equal. Alternatively, the mobile terminal can use fuzzy matching: the character string converted from the speech is split into multiple word blocks, and once the degree of matching with the character string associated with the logic reaches a certain proportion, the match is considered successful and the corresponding logic is executed. In addition, the mobile terminal can rely on the processing power of a server: the character string converted from the speech is sent to the server, the server matches it to an instruction through complex processing such as semantic analysis or deep learning, and the matching result is returned to the mobile terminal for execution.
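As an illustration only, the two on-device matching modes described above can be sketched as follows. The function names, the whitespace tokenization and the 0.6 threshold are assumptions made for this sketch; the background description does not specify how word blocks are formed or what proportion counts as a match.

```python
def strict_match(recognized: str, command: str) -> bool:
    """Strict matching: succeed only when the two strings are exactly equal."""
    return recognized == command


def fuzzy_match(recognized: str, command: str, threshold: float = 0.6) -> bool:
    """Fuzzy matching: succeed when enough of the command's word blocks
    also occur among the word blocks of the recognized string."""
    # Assumption: word blocks are whitespace-separated tokens. Chinese text
    # would need a word segmenter instead of str.split().
    recognized_blocks = set(recognized.lower().split())
    command_blocks = command.lower().split()
    if not command_blocks:
        return False
    hits = sum(1 for block in command_blocks if block in recognized_blocks)
    return hits / len(command_blocks) >= threshold


if __name__ == "__main__":
    print(strict_match("please insert a picture", "insert a picture"))  # False
    print(fuzzy_match("please insert a picture", "insert a picture"))   # True
```

The sketch shows the trade-off the background points to: strict matching rejects any extra word, while fuzzy matching accepts looser input at the cost of possible false matches.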
However, in voice writing scenarios, the existing on-device speech processing of the mobile terminal places strict requirements on the format of the user's spoken input, and its matching accuracy is low. The existing server-side matching, in turn, demands high network quality and struggles to keep up with the user's speaking speed when long passages are being written. Existing voice text editing approaches therefore fit the user's writing process poorly, have a narrow range of application, and degrade the user experience.
Summary of the invention
The embodiments of the present invention provide a voice writing method, device, terminal and storage medium that can improve the efficiency and accuracy with which a mobile terminal responds to the user's writing instructions.
In a first aspect, an embodiment of the present invention provides a voice writing method, comprising:
during the user's voice writing process, converting the acquired user voice information into user text information;
matching the user text information against candidate behavior intentions, and taking the matched candidate behavior intention as the target behavior intention;
editing the text content of the editing area according to the target behavior intention.
In a second aspect, an embodiment of the present invention provides a voice writing device, comprising:
a voice conversion module, configured to convert the acquired user voice information into user text information during the user's voice writing process;
an intention recognition module, configured to match the user text information against candidate behavior intentions and take the matched candidate behavior intention as the target behavior intention;
a text editing module, configured to edit the text content of the editing area according to the target behavior intention.
In a third aspect, an embodiment of the present invention provides a terminal, comprising:
one or more processors;
a memory, configured to store one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the voice writing method described in any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the voice writing method described in any embodiment of the present invention.
In the embodiments of the present invention, during the user's voice writing process, the acquired user voice information is converted into user text information; the user text information is matched against candidate behavior intentions, the matched candidate behavior intention is taken as the target behavior intention, and the text content of the editing area is edited according to the target behavior intention. By recognizing the behavior intention of the user's speech, the embodiments spare the user terminal the complex process of learning from user speech and perform the matching of voice writing instructions on the user terminal. This improves the efficiency and accuracy with which the user terminal responds to the user's writing instructions, meets the user's needs during writing, and provides a good voice writing experience.
Brief description of the drawings
Fig. 1 is a flowchart of a voice writing method provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a voice writing method provided by Embodiment 2 of the present invention;
Fig. 3 is an example diagram of inserting a picture by voice control during the user's voice writing process, provided by Embodiment 2 of the present invention;
Fig. 4 is another example diagram of inserting a picture by voice control during the user's voice writing process, provided by Embodiment 2 of the present invention;
Fig. 5 is a flowchart of voice writing provided by Embodiment 2 of the present invention;
Fig. 6 is a schematic structural diagram of a voice writing device provided by Embodiment 3 of the present invention;
Fig. 7 is a schematic structural diagram of a terminal provided by Embodiment 4 of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the embodiments of the present invention and do not limit the invention. It should also be noted that, for ease of description, the drawings show only the parts related to the embodiments of the present invention rather than the entire structure.
Embodiment 1
Fig. 1 is a flowchart of a voice writing method provided by Embodiment 1 of the present invention. This embodiment is applicable to the case where a user performs voice writing and edits text content by voice control. The method can be executed by a voice writing device, which can be implemented in software and/or hardware and is preferably configured in a mobile terminal. The method specifically includes the following steps:
S110: during the user's voice writing process, convert the acquired user voice information into user text information.
In specific embodiments of the present invention, voice writing means that the mobile terminal receives speech input by the user and processes it, for example by analysis and recognition, so that writing-related operations, including at least text editing, picture insertion, text content editing, format editing and online search, can be completed without manual operation by the user.
The user voice information can be any writing-related voice content input by the user in spoken form. It can be voice information containing the text the user wants to insert, or a voice control instruction that controls a writing operation. Correspondingly, the user voice information can also be voice wake-up information that triggers text insertion, or voice wake-up information that triggers control of a writing operation, so that when voice wake-up information is recognized, the relevant voice writing operation is executed according to the user voice information that follows it.
Specifically, during the user's voice writing process, the mobile terminal acquires the user voice information and converts the user's speech into text characters locally, obtaining user text information presented in character form. This embodiment does not limit how the speech is converted; any method capable of converting speech to text can be used in this embodiment.
S120: match the user text information against candidate behavior intentions, and take the matched candidate behavior intention as the target behavior intention.
In specific embodiments of the present invention, a candidate behavior intention is a predetermined operation behavior that the user may perform during writing. The candidate behavior intentions may include at least one of an insert-picture intention, a text content modification intention and a text format modification intention. The text content modification intention includes a content deletion sub-intention and/or a content replacement sub-intention; the text format modification intention includes at least one of a punctuation sub-intention, a line break sub-intention, a bold sub-intention, a separator sub-intention and a quotation sub-intention.
After the user text information converted from the user voice information is determined, this embodiment can match the user text information against the candidate behavior intentions and take the matched candidate behavior intention as the target behavior intention. Specifically, behavior matching words associated with each candidate behavior intention can be set in advance. For example, the matching word preset for the insert-picture intention can be "insert a picture of target A"; the matching word preset for the text content modification intention can be "change target text A to target text B"; and the matching word preset for the text format modification intention can be "make target text A bold". The candidate behavior intentions can be learned by the server from user text information, based on algorithms such as semantic analysis or deep learning. They can also be re-determined, by verification, updating, addition and the like, based on the feedback results for each piece of user text information. Thus, each time the mobile terminal performs behavior intention recognition on user voice information, it can obtain the determined candidate behavior intentions from the server and recognize the target behavior intention based on them.
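The matching of user text information against preset matching words can be sketched as below. This is a minimal sketch under stated assumptions: the matching words are expressed as regular expressions with named groups, and the intent names, patterns and the `match_intent` function are hypothetical, chosen to mirror the three example matching words above. In the method itself, the candidate behavior intentions would be obtained from the server rather than hard-coded.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CandidateIntent:
    name: str      # hypothetical intent identifier
    pattern: str   # matching word expressed as a regular expression


# Hypothetical candidate intents mirroring the example matching words.
CANDIDATE_INTENTS: List[CandidateIntent] = [
    CandidateIntent("insert_picture", r"insert a picture of (?P<target>.+)"),
    CandidateIntent("modify_text", r"change (?P<old>.+) to (?P<new>.+)"),
    CandidateIntent("bold_text", r"make (?P<target>.+) bold"),
]


def match_intent(
    user_text: str, candidates: List[CandidateIntent] = CANDIDATE_INTENTS
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent name, extracted slots) for the first matching candidate,
    or None when no candidate behavior intention matches."""
    for candidate in candidates:
        found = re.fullmatch(candidate.pattern, user_text.strip(), flags=re.IGNORECASE)
        if found:
            return candidate.name, found.groupdict()
    return None


if __name__ == "__main__":
    print(match_intent("Insert a picture of the Great Wall"))
    print(match_intent("It was a sunny day"))  # None: treated as plain text
```

Keeping the candidates as data rather than code is what lets the terminal refresh them from the server without changing the matching logic.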
In addition, in this embodiment, the trigger condition for matching the user text information against the candidate behavior intentions can be that the previous piece of user text information is a wake-up text. Specifically, after the user text information converted from the user voice information is determined, it can be matched against the voice wake-up information to identify the writing operation the user is about to perform, and the corresponding writing operation is then executed according to the subsequent user voice information. Alternatively, this embodiment can recognize the user text information directly, to identify whether the user's input is text content or instruction content. If it is text content, the user text information converted from the user voice information is entered into the text input area; if it is instruction content, logic matching is performed on the instruction content so as to execute the corresponding interaction. This embodiment can also process the user text information and, according to the processing result, determine at least one of the sentence-break punctuation marks, book title marks and emotion punctuation marks contained in the user text information, so that the text content is automatically segmented into sentences and punctuated.
For example, when text content is inserted, the user can input free-form voice information, and long sentences are supported. The mobile terminal converts the acquired voice information to text and adds it to the writing input area as text content. Automatic sentence segmentation and punctuation can be applied to the text content at the same time. For example, based on title recognition, book title marks can be added automatically to book titles contained in the voice information. When instruction content is input, the wake-up word, for example "Xiaodu Xiaodu", is matched, and the target behavior intention can then be matched and recognized based on the triggering of the wake-up word.
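The trigger condition above, where the previous piece of user text information being the wake-up text arms intention matching for the next one, can be sketched as a small stateful router. The class name, the return labels and the romanized wake-up text are assumptions for illustration; the embodiment does not prescribe this structure.

```python
from typing import Tuple

WAKE_TEXT = "xiaodu xiaodu"  # assumed romanized form of the example wake-up word


class UtteranceRouter:
    """Label each recognized utterance as 'wake', 'command' or 'content'."""

    def __init__(self, wake_text: str = WAKE_TEXT) -> None:
        self.wake_text = wake_text
        self.armed = False  # True when the previous utterance was the wake-up text

    def route(self, user_text: str) -> Tuple[str, str]:
        normalized = user_text.strip().lower()
        if normalized == self.wake_text:
            self.armed = True
            return ("wake", "")
        if self.armed:
            self.armed = False  # the wake-up text only arms the next utterance
            return ("command", user_text)
        return ("content", user_text)


if __name__ == "__main__":
    router = UtteranceRouter()
    for text in ["Today was sunny", "Xiaodu Xiaodu", "insert a picture of the sea"]:
        print(router.route(text))
```

Utterances labeled "command" would go on to intention matching, while those labeled "content" would be entered into the input area as text.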
S130: edit the text content of the editing area according to the target behavior intention.
In specific embodiments of the present invention, the editing area is the designated display area of the mobile terminal in which voice writing is performed; writing operations such as text content editing are carried out in this display area. The editing area can be the display and editing area of an application with a voice writing function, or the display and editing area of a specified text document. The text content can include the various forms of content involved in writing, such as text, pictures and punctuation marks. Text content editing can include edit operations such as adding, deleting, modifying and searching, applied to the text content itself and to its display form. Text content in forms such as pictures can be obtained locally from the mobile terminal, for example by calling the local photo album to insert a specified picture; it can also be obtained by online search under the user's voice control, for example by inserting a picture found by the search.
In this embodiment, the target edit object and the target edit action are determined according to the target behavior intention, and the target edit action is then performed on the target edit object in the editing area. Specifically, if the target behavior intention is the insert-picture intention, a picture processing component is called, and the picture source and picture screening conditions are determined according to the user text information; the target picture to be inserted is obtained from the picture source according to the picture screening conditions; and the target picture is inserted into the text content of the editing area. If the target behavior intention is the content deletion sub-intention, the start position and end position of the content to be deleted are determined according to the user text information, and a delete operation is performed on the text content of the editing area according to that start position and end position.
In the technical solution of this embodiment, during the user's voice writing process, the acquired user voice information is converted into user text information; the user text information is matched against candidate behavior intentions, the matched candidate behavior intention is taken as the target behavior intention, and the text content of the editing area is edited according to the target behavior intention. By recognizing the behavior intention of the user's speech, the embodiment spares the user terminal the complex process of learning from user speech and performs the matching of voice writing instructions on the user terminal. This improves the efficiency and accuracy with which the user terminal responds to the user's writing instructions, meets the user's needs during writing, and provides a good voice writing experience.
Embodiment 2
On the basis of Embodiment 1 above, this embodiment provides a preferred implementation of the voice writing method, in which the learning of behavior intentions and the feedback-based determination process can be integrated in the server, and the matching of voice writing instructions is performed on the user terminal using the candidate behavior intentions determined by the server. Fig. 2 is a flowchart of a voice writing method provided by Embodiment 2 of the present invention. As shown in Fig. 2, the method specifically includes the following steps:
S210: during the user's voice writing process, convert the acquired user voice information into user text information.
S220: process the user text information and, according to the processing result, determine at least one of the sentence-break punctuation marks, book title marks and emotion punctuation marks contained in the user text information.
In specific embodiments of the present invention, processing the user text information can include automatic processing such as recognition and/or semantic analysis of the text information, which determines the pauses or intervals contained in the user text information, the emotion conveyed, proper nouns and so on, so that sentence breaks and punctuation marks are added to the user text information automatically. Sentence-break punctuation marks, such as the enumeration comma, comma, semicolon or period, can be added automatically at the determined sentence breaks. Emotion punctuation marks, such as the exclamation mark or question mark, can be added automatically according to recognized emotion information, for example emotion words. When book title marks are added, the user text information can be matched against candidate titles; if the match succeeds, the text information is determined to contain the matched candidate title, and book title marks are added automatically to the book title contained in the text information. During writing, the user can also specify a title by voice; for example, if the user text information input by voice is "the title is so-and-so", then "so-and-so" is generated in the editing area. In addition, the user can call up a quick punctuation selection mode by voice command and actively add punctuation marks to the text information.
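Two of the automatic punctuation steps above, adding book title marks around matched candidate titles and choosing an emotion punctuation mark from cue words, can be sketched as follows. The function names and the cue-word lists are assumptions for illustration, and the sketch assumes raw recognized text that does not yet contain any title marks; an actual implementation would rely on the recognition and semantic analysis the embodiment mentions.

```python
import re
from typing import Iterable

# Assumed cue words, for illustration only.
QUESTION_WORDS = {"why", "how", "what", "when", "where", "who"}
EXCLAMATION_WORDS = {"wow", "amazing", "wonderful"}


def add_title_marks(text: str, candidate_titles: Iterable[str]) -> str:
    """Wrap every candidate title found in the text in book title marks."""
    # Longest titles come first so that the regex alternation prefers them
    # over shorter titles that start at the same position.
    titles = sorted({t for t in candidate_titles if t}, key=len, reverse=True)
    if not titles:
        return text
    pattern = re.compile("|".join(re.escape(t) for t in titles))
    return pattern.sub(lambda m: "《" + m.group(0) + "》", text)


def closing_mark(sentence: str) -> str:
    """Choose the closing punctuation mark for one sentence."""
    words = sentence.lower().split()
    if words and words[0] in QUESTION_WORDS:
        return "?"
    if any(word in EXCLAMATION_WORDS for word in words):
        return "!"
    return "."


if __name__ == "__main__":
    sentence = "I just finished reading War and Peace"
    marked = add_title_marks(sentence, ["War and Peace"])
    print(marked + closing_mark(sentence))
```

Because the substitution runs in a single left-to-right pass, text already consumed by a longer title is not matched again by a shorter one inside it.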
S230: obtain the user's candidate behavior intentions from the server.
In specific embodiments of the present invention, the candidate behavior intentions can be learned by the server from user text information, based on algorithms such as semantic analysis or deep learning. They can also be re-determined, by verification, updating, addition and the like, based on the feedback results for each piece of user text information. Each time the mobile terminal performs behavior intention recognition on user voice information, it can obtain the determined candidate behavior intentions from the server and recognize the target behavior intention based on them. The complex process of determining behavior intentions is thus executed in the server, and the mobile terminal only needs to match against the candidate behavior intentions it has obtained. This not only improves the accuracy with which candidate behavior intentions are determined, but also further improves the efficiency and accuracy with which the user terminal matches the target behavior intention.
Optionally, the trigger condition for matching the user text information against the candidate behavior intentions is that the previous piece of user text information is a wake-up text.
In this embodiment, after the user text information converted from the user voice information is determined, it can be matched against the voice wake-up information to identify the writing operation the user is about to perform, and the corresponding writing operation is then executed according to the subsequent user voice information. For example, when instruction content is input, the wake-up word, for example "Xiaodu Xiaodu", is matched, and the target behavior intention can then be matched and recognized based on the triggering of the wake-up word.
S240: match the user text information against the candidate behavior intentions, and take the matched candidate behavior intention as the target behavior intention.
S250: edit the text content of the editing area according to the target behavior intention.
In specific embodiments of the present invention, the target edit object and the target edit action are determined according to the target behavior intention, and the target edit action is then performed on the target edit object in the editing area.
Optionally, if the target behavior intention is the insert-picture intention, a picture processing component is called, and the picture source and picture screening conditions are determined according to the user text information; the target picture to be inserted is obtained from the picture source according to the picture screening conditions; and the target picture is inserted into the text content of the editing area.
In this embodiment, the insert-picture intention refers to inserting the target picture specified by the user into the editing area. The user text information can define the picture source and the picture screening conditions. The picture source can include local pictures and pictures found by web search; the picture screening conditions can include restrictions such as the people, time, place and event associated with a picture, so that pictures satisfying the picture screening conditions are filtered out of the picture source as the target pictures to be inserted. The user text information can also specify the insertion position of the picture in the editing area, so that the target picture is inserted at the specified insertion position.
For example, if the user voice information is "insert the photo from 7 o'clock last night", "insert the photo taken at Building A" or "insert the group photo of me and B", then according to the converted user text information the picture source can be determined to be the local photo album, and the picture screening conditions are, respectively, the shooting time of the picture, the shooting location of the picture, and the content of the shot; target pictures satisfying these picture screening conditions are then filtered out of the local photo album and inserted. For example, Fig. 3 is an example diagram of inserting a picture by voice control during the user's voice writing process. The left part of Fig. 3 is an example of voice control in which the user wants to insert a picture from local storage: triggered by the wake-up word, the user's behavior intention is recognized as the insert-picture intention, and pictures satisfying the picture screening conditions are looked up locally and shown to the user to choose from, as in the example of local picture search results in the right part of Fig. 3. Correspondingly, the user can also select, by voice control, from at least two pictures that satisfy the picture screening conditions, and the finally selected picture is inserted as the target picture. At least two pictures can also be inserted at the same time.
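Screening a local photo album by time, place and people can be sketched as below. The `Photo` and `ScreeningConditions` types and their fields are hypothetical; the embodiment does not say how picture metadata is stored or how the conditions are extracted from the user text information, so the sketch starts from conditions that have already been parsed.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set


@dataclass
class Photo:
    path: str
    taken_at: datetime
    place: str = ""
    people: Set[str] = field(default_factory=set)


@dataclass
class ScreeningConditions:
    start: Optional[datetime] = None   # earliest shooting time, if any
    end: Optional[datetime] = None     # latest shooting time, if any
    place: Optional[str] = None        # required shooting location, if any
    people: Set[str] = field(default_factory=set)  # people who must appear


def screen_photos(album: List[Photo], cond: ScreeningConditions) -> List[Photo]:
    """Return the photos that satisfy every condition that is set."""
    results = []
    for photo in album:
        if cond.start is not None and photo.taken_at < cond.start:
            continue
        if cond.end is not None and photo.taken_at > cond.end:
            continue
        if cond.place is not None and photo.place != cond.place:
            continue
        if not cond.people <= photo.people:  # every required person must appear
            continue
        results.append(photo)
    return results


if __name__ == "__main__":
    album = [
        Photo("a.jpg", datetime(2019, 2, 1, 19, 5), "Building A", {"me", "B"}),
        Photo("b.jpg", datetime(2019, 2, 1, 8, 0), "Park", {"me"}),
    ]
    group = screen_photos(album, ScreeningConditions(people={"me", "B"}))
    print([p.path for p in group])  # ['a.jpg']
```

When more than one photo passes the screening, the results would be shown to the user for selection, as in the right part of Fig. 3.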
As another example, if the user voice information is "insert the poster of film A", "insert a street snap of star B" or "insert a photo of star C at the awards ceremony", then according to the converted user text information the picture source can be determined to be web search, and the picture screening conditions restrict the picture content; target pictures satisfying these picture screening conditions are then searched for on the network and inserted. For example, Fig. 4 is another example diagram of inserting a picture by voice control during the user's voice writing process. As shown in Fig. 4, the pictures found on the network can be offered to the user to select and insert.
Optionally, if the target behavior intention is the content deletion sub-intention, the start position and end position of the content to be deleted are determined according to the user text information, and a delete operation is performed on the text content of the editing area according to that start position and end position.
In this embodiment, the text content modification intention refers to modifying the text itself within the text content of the editing area. It can include a content deletion sub-intention and/or a content replacement sub-intention: the content deletion sub-intention refers to deleting specified text, and the content replacement sub-intention refers to replacing specified text. Correspondingly, when the text content is modified, the text to be modified and/or its position in the editing area can be determined according to the user text information, so that the content at the specified position is modified.
For example, if the user voice information is "delete from position A to position B", then according to the converted user text information the start position of the content to be deleted can be determined to be position A and the end position to be position B, and the text content between the start position and the end position in the editing area is deleted.
As another example, if the user voice information is "replace content A with content B", then according to the converted user text information the content to be replaced can be determined to be content A, and content A is replaced with content B. The replaced content can be a character, word, sentence, paragraph and so on.
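The deletion and replacement sub-intentions can be sketched as two small string operations. One assumption is made loudly here: positions A and B are treated as anchor phrases that occur in the text, and the deletion is inclusive of both anchors. The embodiment does not say how positions are expressed, so character offsets or cursor positions would be equally valid readings. The function names are hypothetical.

```python
def delete_between(content: str, start_anchor: str, end_anchor: str) -> str:
    """Delete from the start anchor through the end anchor, inclusive."""
    start = content.find(start_anchor)
    if start == -1:
        return content  # start position not found: leave the text unchanged
    end_anchor_at = content.find(end_anchor, start + len(start_anchor))
    if end_anchor_at == -1:
        return content  # end position not found: leave the text unchanged
    end = end_anchor_at + len(end_anchor)
    return content[:start] + content[end:]


def replace_content(content: str, old: str, new: str) -> str:
    """Replace every occurrence of content A (old) with content B (new)."""
    if not old:
        return content  # nothing to look for
    return content.replace(old, new)


if __name__ == "__main__":
    text = "The quick brown fox jumps"
    print(delete_between(text, "quick", "fox "))    # The jumps
    print(replace_content(text, "fox", "cat"))      # The quick brown cat jumps
```

Returning the text unchanged when an anchor is missing keeps a misrecognized command from damaging what the user has already written.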
In this embodiment, the candidate behavior intention can also be a text format modification intention, which refers to modifying the form, such as layout and format, of specified text content in the editing area. The text format modification intention can include at least one of a punctuation sub-intention, a line break sub-intention, a bold sub-intention, a separator sub-intention and a quotation sub-intention. Because some text modifications are complex or rarely used, and users may not know how to perform them, modifying the text format according to the user's speech avoids complex manual operation and improves the efficiency and accuracy of format modification.
S260: send the user text information associated with the target behavior intention to the server, so that the server determines the user's candidate behavior intentions based on the received user text information.
In specific embodiments of the present invention, the user text information associated with the target behavior intention is the user text information on which the determination of the target behavior intention was based. After one round of voice-controlled writing has been executed, the user text information associated with the determined target behavior intention can be fed back to the server, and the writing operation actually performed can also be fed back to the server, so that the server re-determines the target behavior intention, by verification, updating, addition and the like, according to the received user text information. For example, if verification of the target behavior intention passes, the target behavior intention is retained as a candidate behavior intention. If verification fails, the target behavior intention can be corrected according to its historical recognition accuracy, and the corrected target behavior intention is used as a candidate behavior intention, which updates the candidate behavior intentions. New behavior intentions can also be generated as candidate behavior intentions, to enrich the voice writing controls available to the user and meet user needs. This embodiment therefore integrates the complex process of determining candidate behavior intentions, based on semantic analysis or deep learning, in the server, which improves the accuracy of the candidate behavior intentions and further improves the efficiency and accuracy with which the mobile terminal recognizes behavior intentions.
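The embodiment does not specify how the server verifies an intention or uses its historical recognition accuracy. Purely as one possible reading, the sketch below keeps per-intention counts of confirmed and rejected feedback and flags intentions whose accuracy falls below a threshold for correction. The class names, the 0.8 threshold and the notion of a boolean "verified" flag are all assumptions, not the server's actual procedure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IntentStats:
    confirmed: int = 0  # feedback in which verification of the intention passed
    rejected: int = 0   # feedback in which verification failed

    @property
    def accuracy(self) -> float:
        total = self.confirmed + self.rejected
        return self.confirmed / total if total else 1.0


class CandidateIntentRegistry:
    """Hypothetical server-side record of feedback per candidate intention."""

    def __init__(self, min_accuracy: float = 0.8) -> None:
        self.min_accuracy = min_accuracy
        self.stats: Dict[str, IntentStats] = {}

    def record_feedback(self, intent: str, verified: bool) -> None:
        entry = self.stats.setdefault(intent, IntentStats())
        if verified:
            entry.confirmed += 1
        else:
            entry.rejected += 1

    def needs_correction(self) -> List[str]:
        """Intentions whose historical accuracy is below the threshold."""
        return sorted(
            name for name, entry in self.stats.items()
            if entry.accuracy < self.min_accuracy
        )


if __name__ == "__main__":
    registry = CandidateIntentRegistry()
    registry.record_feedback("bold_text", True)
    registry.record_feedback("bold_text", False)
    print(registry.needs_correction())  # ['bold_text']
```

Intentions that stay above the threshold would simply be retained as candidates, matching the "verification passes" branch described above.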
For example, Fig. 5 is a flowchart of voice writing. As shown in Fig. 5, after the application used for voice writing starts, the configuration information of the candidate behavior intentions can be pulled from the remote server; the configuration information can also be pulled before the user text information is matched against the candidate behavior intentions. User voice information is received and converted to user text information. It is then determined whether the user text information includes the wake-up word that triggers text content editing. If not, the text content is entered directly into the editing area; if so, the user's behavior intention is recognized from the user text information input after the wake-up word. If a candidate behavior intention included in the configuration information is recognized, the associated text editing operation is executed. Otherwise, the text content is entered directly into the editing area, and the user text information is fed back to the remote server, which re-determines the candidate behavior intentions, by verification, updating, addition and the like, according to the fed-back user text information, for the user terminal to call in the next voice writing session.
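The branches of the flow in Fig. 5 can be sketched in one function. The editor is modeled as a list of text pieces and the feedback channel as a list; both, along with the handler table keyed by lowercase command phrases and the romanized wake-up word, are assumptions for illustration. In the fallback branch the sketch enters the whole utterance as text, which is one reading of "the text content is entered directly".

```python
import re
from typing import Callable, Dict, List

WAKE_WORD = re.compile(r"xiaodu xiaodu", re.IGNORECASE)  # assumed romanization

Handler = Callable[[List[str], str], None]


def handle_utterance(user_text: str, handlers: Dict[str, Handler],
                     editor: List[str], feedback: List[str]) -> str:
    """Process one recognized utterance and report which branch was taken."""
    wake = WAKE_WORD.search(user_text)
    if wake is None:
        editor.append(user_text)      # no wake-up word: plain text content
        return "content"
    command = user_text[wake.end():].strip(" ,.")
    for phrase, handler in handlers.items():  # phrases are assumed lowercase
        if command.lower().startswith(phrase):
            handler(editor, command[len(phrase):].strip())
            return "edited"
    editor.append(user_text)          # unrecognized: keep it as text
    feedback.append(user_text)        # and queue it for the server
    return "fallback"


def delete_last(editor: List[str], _argument: str) -> None:
    """Example handler: remove the most recently entered piece of text."""
    if editor:
        editor.pop()


if __name__ == "__main__":
    editor: List[str] = []
    feedback: List[str] = []
    handlers = {"delete the last sentence": delete_last}
    print(handle_utterance("It rained all day", handlers, editor, feedback))
    print(handle_utterance("Xiaodu Xiaodu, delete the last sentence", handlers, editor, feedback))
    print(editor)  # []
```

The handler table plays the role of the configuration information pulled from the server: refreshing it changes which commands are recognized without touching the flow itself.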
In the technical solution of this embodiment, during the user's voice writing process, the acquired user voice information is converted into user text information; the user text information is matched against candidate behavior intentions, the matched candidate behavior intention is taken as the target behavior intention, and the text content of the editing area is edited according to the target behavior intention. By recognizing the behavior intention of the user's speech and integrating the learning of behavior intentions and the feedback-based determination process in the server, the embodiment improves the accuracy with which candidate behavior intentions are determined and spares the user terminal the complex process of learning from user speech. By obtaining the candidate behavior intentions determined by the server, the matching of voice writing instructions is performed on the user terminal, which improves the efficiency and accuracy with which the user terminal responds to the user's writing instructions. Searching for and loading pictures based on voice control instructions also enriches the text content in writing and makes adding pictures more efficient, which meets the user's needs during writing and provides a good voice writing experience.
Embodiment three
Fig. 6 is a schematic structural diagram of a voice writing device provided by Embodiment Three of the present invention. This embodiment is applicable to the case where a user performs voice writing and text content editing through voice control, and the device can implement the voice writing method described in any embodiment of the present invention. The device specifically includes:
A voice conversion module 610, configured to convert acquired user voice information into user text information during the user's voice writing process;
An intent recognition module 620, configured to match the user text information against candidate behavior intents, and to take the matched candidate behavior intent as the target behavior intent;
A text editing module 630, configured to edit the text content of the editing area according to the target behavior intent.
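As a rough illustration of how the three modules could cooperate, the following sketch composes them in one class. The class name `VoiceWritingDevice`, the injected `speech_to_text` callable, and the exact-phrase lookup are assumptions made for the example; the patent does not prescribe a speech recognizer or a matching rule.

```python
from typing import Callable, Dict, Optional

EditAction = Callable[[str], str]  # takes the current content, returns the new content


class VoiceWritingDevice:
    """Hypothetical composition of modules 610, 620 and 630."""

    def __init__(self,
                 speech_to_text: Callable[[bytes], str],
                 candidate_intents: Dict[str, EditAction]) -> None:
        self._speech_to_text = speech_to_text        # assumed external recognizer
        self._candidate_intents = candidate_intents  # phrase -> editing action
        self.content = ""                            # text content of the editing area

    def convert(self, audio: bytes) -> str:
        """Voice conversion module 610: voice information -> text information."""
        return self._speech_to_text(audio)

    def recognize(self, user_text: str) -> Optional[EditAction]:
        """Intent recognition module 620: assumed exact-phrase lookup."""
        return self._candidate_intents.get(user_text.strip().lower())

    def edit(self, action: EditAction) -> None:
        """Text editing module 630: apply the target intent's action."""
        self.content = action(self.content)

    def process(self, audio: bytes) -> None:
        user_text = self.convert(audio)
        action = self.recognize(user_text)
        if action is None:
            self.content += user_text  # no intent matched: plain dictation
        else:
            self.edit(action)
```

Injecting the recognizer as a callable keeps the sketch independent of any particular speech recognition library.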
Optionally, the candidate behavior intents include at least one of a picture insertion intent, a text content modification intent, and a text format modification intent;
The text content modification intent includes a content deletion sub-intent and/or a content replacement sub-intent;
The text format modification intent includes at least one of a sentence-break sub-intent, a line-break sub-intent, a bold sub-intent, a separator sub-intent, and a quotation sub-intent.
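The intent hierarchy listed above can be written down as a small data structure. This is only a sketch: the enum `Intent`, the `CATEGORY` table, and the string labels are hypothetical, and the patent does not say how intents are represented.

```python
from enum import Enum
from typing import Dict, List


class Intent(Enum):
    """Leaf intents named in the text; identifiers are illustrative."""
    INSERT_PICTURE = "insert_picture"
    CONTENT_DELETE = "content_delete"
    CONTENT_REPLACE = "content_replace"
    SENTENCE_BREAK = "sentence_break"
    LINE_BREAK = "line_break"
    BOLD = "bold"
    SEPARATOR = "separator"
    QUOTATION = "quotation"


# Category of each leaf intent, following the three top-level intents.
CATEGORY: Dict[Intent, str] = {
    Intent.INSERT_PICTURE: "picture_insertion",
    Intent.CONTENT_DELETE: "text_content_modification",
    Intent.CONTENT_REPLACE: "text_content_modification",
    Intent.SENTENCE_BREAK: "text_format_modification",
    Intent.LINE_BREAK: "text_format_modification",
    Intent.BOLD: "text_format_modification",
    Intent.SEPARATOR: "text_format_modification",
    Intent.QUOTATION: "text_format_modification",
}


def sub_intents(category: str) -> List[Intent]:
    """Return the leaf intents that belong to one top-level category."""
    return [intent for intent, cat in CATEGORY.items() if cat == category]
```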
Optionally, the text editing module 630 is specifically configured to:
If the target behavior intent is the picture insertion intent, call a picture processing component, and determine a picture source and picture filtering conditions according to the user text information;
Obtain the target picture to be inserted from the picture source according to the picture filtering conditions;
Insert the target picture into the text content of the editing area.
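The three picture insertion steps can be sketched as follows. Everything concrete here is an assumption: the command grammar ("insert a picture of ... from ..."), the in-memory `SOURCES` table with tag sets as filtering conditions, the default source `album`, and the `[image: ...]` placeholder used to represent an inserted picture.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical picture sources: source name -> list of (file name, tags).
SOURCES: Dict[str, List[Tuple[str, Set[str]]]] = {
    "album": [("beach.jpg", {"beach", "sunset"}), ("cat.jpg", {"cat"})],
    "web": [("city.png", {"city", "night"})],
}

# Assumed command grammar; the patent does not define one.
PATTERN = re.compile(
    r"insert (?:a )?picture of (?P<tag>\w+)(?: from (?:the )?(?P<source>\w+))?"
)


def parse_picture_command(user_text: str) -> Optional[Tuple[str, str]]:
    """Step 1: derive (picture source, filtering condition) from the text."""
    match = PATTERN.search(user_text.lower())
    if match is None:
        return None
    return (match.group("source") or "album", match.group("tag"))


def find_picture(source: str, tag: str) -> Optional[str]:
    """Step 2: pick the first picture in the source that satisfies the filter."""
    for name, tags in SOURCES.get(source, []):
        if tag in tags:
            return name
    return None


def insert_picture(content: str, user_text: str) -> str:
    """Step 3: append a placeholder for the target picture to the content."""
    parsed = parse_picture_command(user_text)
    if parsed is None:
        return content
    picture = find_picture(*parsed)
    if picture is None:
        return content
    return content + f"[image: {picture}]"
```

When no picture satisfies the filter, the sketch leaves the content unchanged; a real terminal might instead prompt the user.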
Optionally, the text editing module 630 is specifically configured to:
If the target behavior intent is the content deletion sub-intent, determine the start position and end position of the content to be deleted according to the user text information;
Perform a delete operation on the text content of the editing area according to the start position and end position of the content to be deleted.
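A minimal sketch of the deletion steps is given below. It assumes a command of the form "delete from X to Y", where X and Y are phrases that appear in the editing area; this grammar and the function names `locate_span` and `delete_span` are hypothetical, since the patent only states that the start and end positions are determined from the user text information.

```python
import re
from typing import Optional, Tuple

# Assumed command grammar for the content deletion sub-intent.
DELETE_PATTERN = re.compile(r"delete from (?P<start>.+?) to (?P<end>.+)")


def locate_span(content: str, user_text: str) -> Optional[Tuple[int, int]]:
    """Determine the start and end offsets of the content to be deleted."""
    match = DELETE_PATTERN.search(user_text)
    if match is None:
        return None
    start = content.find(match.group("start"))
    if start == -1:
        return None
    end_phrase = match.group("end")
    end = content.find(end_phrase, start)  # end phrase must follow the start
    if end == -1:
        return None
    return start, end + len(end_phrase)


def delete_span(content: str, user_text: str) -> str:
    """Delete the located span; leave the content unchanged if not found."""
    span = locate_span(content, user_text)
    if span is None:
        return content
    start, end = span
    return content[:start] + content[end:]
```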
Further, the device also includes a punctuation mark adding module 640, which is specifically configured to:
After the acquired user voice information is converted into user text information, process the user text information;
According to the processing result, determine at least one of the sentence-break punctuation marks, book-title marks, and emotion punctuation marks included in the user text information.
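The patent does not say how the user text information is processed to decide on punctuation. Purely as an illustration, the sketch below uses fixed keyword lists; the word lists, the function names, and the use of the marks 《 and 》 for book titles are assumptions, and a real system would more likely rely on a trained model.

```python
from typing import Iterable

# Hypothetical cue words; not taken from the patent.
QUESTION_WORDS = ("what", "why", "how", "who", "where", "when")
EXCLAIM_WORDS = ("wow", "great", "amazing")


def add_end_mark(sentence: str) -> str:
    """Choose a sentence-break or emotion mark for one sentence."""
    words = sentence.lower().split()
    if not words:
        return sentence
    if words[0] in QUESTION_WORDS:
        return sentence + "?"
    if any(word in EXCLAIM_WORDS for word in words):
        return sentence + "!"  # emotion punctuation mark
    return sentence + "."      # plain sentence-break mark


def mark_titles(sentence: str, known_titles: Iterable[str]) -> str:
    """Wrap known work titles in book-title marks."""
    for title in known_titles:
        sentence = sentence.replace(title, "《" + title + "》")
    return sentence
```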
Optionally, the trigger condition for matching the user text information against the candidate behavior intents is that the piece of user text information immediately preceding the user text information is a wake-up text.
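This trigger condition amounts to a one-step memory of the previous utterance. A minimal sketch, assuming the wake-up text is compared by exact match after trimming and lowercasing (the class name `WakeGate` and the comparison rule are hypothetical):

```python
class WakeGate:
    """Remembers whether the previous piece of user text was the wake-up text."""

    def __init__(self, wake_text: str) -> None:
        self._wake_text = wake_text.strip().lower()
        self._armed = False  # True when the previous utterance was the wake-up text

    def should_match(self, user_text: str) -> bool:
        """Return True if this text should be matched against candidate intents."""
        armed = self._armed
        # Arm the gate for the next utterance if this one is the wake-up text.
        self._armed = user_text.strip().lower() == self._wake_text
        return armed
```

For example, with the wake-up text "editor", the sequence "hello", "editor", "new line", "more text" triggers matching only for "new line".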
Further, the device also includes a candidate behavior intent obtaining module 650, which is specifically configured to:
Before the user text information is matched against the candidate behavior intents, obtain the user's candidate behavior intents from the server;
Correspondingly, the device also includes a user text information feedback module 660, which is specifically configured to:
After the matched candidate behavior intent is taken as the target behavior intent, send the user text information associated with the target behavior intent to the server, so that the server determines the user's candidate behavior intents based on the received user text information.
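The exchange between the terminal and the server can be sketched with an in-memory stand-in for the server. The class `FakeIntentServer`, its method names, and in particular the rule that a phrase becomes a candidate intent after it has been reported a fixed number of times are assumptions for illustration; the patent leaves the server-side determination logic unspecified.

```python
from collections import Counter
from typing import Iterable, Set


class FakeIntentServer:
    """In-memory stand-in for the server that determines candidate intents."""

    def __init__(self, intents: Iterable[str], promote_after: int = 2) -> None:
        self._intents: Set[str] = set(intents)
        self._reports: Counter = Counter()
        self._promote_after = promote_after  # assumed promotion threshold

    def get_candidate_intents(self) -> Set[str]:
        """What module 650 would pull before matching (returned as a copy)."""
        return set(self._intents)

    def report(self, user_text: str) -> None:
        """What module 660 would send; the update rule here is an assumption."""
        self._reports[user_text] += 1
        if self._reports[user_text] >= self._promote_after:
            self._intents.add(user_text)
```

Returning a copy from `get_candidate_intents` keeps the terminal's local set stable during a writing session, so server-side updates take effect on the next pull.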
In the technical solution of this embodiment, through the cooperation of the functional modules, operations such as obtaining candidate behavior intents, converting the user's speech, recognizing the converted text, waking up editing operations, adding punctuation marks, modifying the text format, inserting pictures, and feeding back user text information are implemented. Correspondingly, the server determines the candidate behavior intents and, according to the fed-back user text information, performs re-determination actions such as verifying, updating, or adding candidate behavior intents. By recognizing the behavior intent in the user's speech, the embodiment of the present invention integrates the learning of behavior intents and the feedback-based determination process in the server, which improves the accuracy with which candidate behavior intents are determined and spares the user terminal itself a complex process of learning the user's speech. By obtaining the candidate behavior intents determined by the server, the user terminal matches voice writing instructions locally, which improves the efficiency and accuracy with which the user terminal responds to the user's writing instructions. In addition, picture searching and loading based on voice control instructions enriches the text content and makes adding pictures during writing more efficient, meets the user's various needs during writing, and brings the user a good voice writing experience.
Embodiment four
Fig. 7 is a schematic structural diagram of a terminal provided by Embodiment Four of the present invention. Fig. 7 shows a block diagram of an exemplary terminal suitable for implementing the embodiments of the present invention. The terminal shown in Fig. 7 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 7, the terminal 12 takes the form of a general-purpose computing device. The components of the terminal 12 may include, but are not limited to, one or more processors 16, a system memory 28, and a bus 18 that connects the different system components (including the system memory 28 and the processors 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The terminal 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the terminal 12, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The terminal 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 7, commonly referred to as a "hard disk drive"). Although not shown in Fig. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The system memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.
The terminal 12 may also communicate with one or more external terminals 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the terminal 12, and/or with any device (such as a network card, a modem, etc.) that enables the terminal 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the terminal 12 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the terminal 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the terminal 12, including but not limited to microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processor 16 executes various functional applications and data processing by running the programs stored in the system memory 28, for example, implementing the voice writing method provided by the embodiments of the present invention.
Embodiment five
Embodiment Five of the present invention also provides a computer-readable storage medium on which a computer program (also referred to as computer-executable instructions) is stored. When executed by a processor, the program performs a voice writing method, the method comprising:
During the user's voice writing process, converting acquired user voice information into user text information;
Matching the user text information against candidate behavior intents, and taking the matched candidate behavior intent as the target behavior intent;
Editing the text content of the editing area according to the target behavior intent.
The computer storage medium of the embodiment of the present invention may adopt any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of the embodiments of the present invention may be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal. In situations involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the embodiments of the present invention have been described in further detail through the above embodiments, the embodiments of the present invention are not limited to the above embodiments and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A voice writing method, characterized by comprising:
During a user's voice writing process, converting acquired user voice information into user text information;
Matching the user text information against candidate behavior intents, and taking the matched candidate behavior intent as a target behavior intent;
Editing the text content of an editing area according to the target behavior intent.
2. The method according to claim 1, characterized in that the candidate behavior intents include at least one of a picture insertion intent, a text content modification intent, and a text format modification intent;
The text content modification intent includes a content deletion sub-intent and/or a content replacement sub-intent;
The text format modification intent includes at least one of a sentence-break sub-intent, a line-break sub-intent, a bold sub-intent, a separator sub-intent, and a quotation sub-intent.
3. The method according to claim 2, characterized in that, if the target behavior intent is the picture insertion intent, editing the text content of the editing area according to the target behavior intent comprises:
Calling a picture processing component, and determining a picture source and picture filtering conditions according to the user text information;
Obtaining the target picture to be inserted from the picture source according to the picture filtering conditions;
Inserting the target picture into the text content of the editing area.
4. The method according to claim 2, characterized in that, if the target behavior intent is the content deletion sub-intent, editing the text content of the editing area according to the target behavior intent comprises:
Determining the start position and end position of the content to be deleted according to the user text information;
Performing a delete operation on the text content of the editing area according to the start position and end position of the content to be deleted.
5. The method according to claim 1, characterized in that, after the acquired user voice information is converted into user text information, the method further comprises:
Processing the user text information;
According to the processing result, determining at least one of the sentence-break punctuation marks, book-title marks, and emotion punctuation marks included in the user text information.
6. The method according to claim 1, characterized in that the trigger condition for matching the user text information against the candidate behavior intents is that the piece of user text information immediately preceding the user text information is a wake-up text.
7. The method according to claim 1, characterized in that, before the user text information is matched against the candidate behavior intents, the method further comprises: obtaining the user's candidate behavior intents from a server;
Correspondingly, after the matched candidate behavior intent is taken as the target behavior intent, the method further comprises: sending the user text information associated with the target behavior intent to the server, so that the server determines the user's candidate behavior intents based on the received user text information.
8. A voice writing device, characterized by comprising:
A voice conversion module, configured to convert acquired user voice information into user text information during a user's voice writing process;
An intent recognition module, configured to match the user text information against candidate behavior intents, and to take the matched candidate behavior intent as a target behavior intent;
A text editing module, configured to edit the text content of an editing area according to the target behavior intent.
9. A terminal, characterized by comprising:
One or more processors;
A memory, configured to store one or more programs;
When the one or more programs are executed by the one or more processors, the one or more processors implement the voice writing method according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the voice writing method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910111502.4A CN109817210B (en) | 2019-02-12 | 2019-02-12 | Voice writing method, device, terminal and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910111502.4A CN109817210B (en) | 2019-02-12 | 2019-02-12 | Voice writing method, device, terminal and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109817210A (en) | 2019-05-28 |
CN109817210B (en) | 2021-08-17 |
Family
ID=66606492
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910111502.4A Active CN109817210B (en) | 2019-02-12 | 2019-02-12 | Voice writing method, device, terminal and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109817210B (en) |
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5566272A (en) * | 1993-10-27 | 1996-10-15 | Lucent Technologies Inc. | Automatic speech recognition (ASR) processing using confidence measures |
US20070299671A1 (en) * | 2004-03-31 | 2007-12-27 | Ruchika Kapur | Method and apparatus for analysing sound- converting sound into information |
KR101756836B1 (en) * | 2010-11-12 | 2017-07-11 | 에스프린팅솔루션 주식회사 | Method and system for generating document using speech data, and Image forming apparatus having it |
KR101631754B1 (en) * | 2014-07-23 | 2016-06-17 | 엘지전자 주식회사 | Mobile terminal |
CN107146606A (en) * | 2016-03-01 | 2017-09-08 | 谷歌公司 | Developer's speech action system |
CN107516511A (en) * | 2016-06-13 | 2017-12-26 | 微软技术许可有限责任公司 | The Text To Speech learning system of intention assessment and mood |
CN108446280A (en) * | 2017-02-06 | 2018-08-24 | 北京嘀嘀无限科技发展有限公司 | Data-updating method and device |
CN107291690A (en) * | 2017-05-26 | 2017-10-24 | 北京搜狗科技发展有限公司 | Punctuate adding method and device, the device added for punctuate |
CN109215641A (en) * | 2017-07-03 | 2019-01-15 | 九阳股份有限公司 | Home appliance voice control method and system based on cloud |
CN107529640A (en) * | 2017-08-16 | 2018-01-02 | 杭州上手科技有限公司 | A kind of acoustic control composing system |
CN107450746A (en) * | 2017-08-18 | 2017-12-08 | 联想(北京)有限公司 | A kind of insertion method of emoticon, device and electronic equipment |
CN107507615A (en) * | 2017-08-29 | 2017-12-22 | 百度在线网络技术(北京)有限公司 | Interface intelligent interaction control method, device, system and storage medium |
CN108255917A (en) * | 2017-09-15 | 2018-07-06 | 广州市动景计算机科技有限公司 | Image management method, equipment and electronic equipment |
CN107885416A (en) * | 2017-10-30 | 2018-04-06 | 努比亚技术有限公司 | A kind of text clone method, terminal and computer-readable recording medium |
CN108509393A (en) * | 2018-03-20 | 2018-09-07 | 联想(北京)有限公司 | A kind of method and apparatus of edit text message |
CN108831479A (en) * | 2018-06-27 | 2018-11-16 | 努比亚技术有限公司 | A kind of audio recognition method, terminal and computer readable storage medium |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112115686A (en) * | 2019-06-21 | 2020-12-22 | 珠海金山办公软件有限公司 | Document editing method and device, computer storage medium and terminal |
CN112115686B (en) * | 2019-06-21 | 2024-05-07 | 珠海金山办公软件有限公司 | Method and device for editing document, computer storage medium and terminal |
CN112307073A (en) * | 2019-08-30 | 2021-02-02 | 北京字节跳动网络技术有限公司 | Information query method, device, equipment and storage medium |
CN110827825A (en) * | 2019-11-11 | 2020-02-21 | 广州国音智能科技有限公司 | Punctuation prediction method, system, terminal and storage medium for speech recognition text |
CN111261155A (en) * | 2019-12-27 | 2020-06-09 | 北京得意音通技术有限责任公司 | Speech processing method, computer-readable storage medium, computer program, and electronic device |
CN111292721A (en) * | 2020-02-20 | 2020-06-16 | 深圳壹账通智能科技有限公司 | Code compiling method and device and computer equipment |
CN113571061A (en) * | 2020-04-28 | 2021-10-29 | 阿里巴巴集团控股有限公司 | System, method, device and equipment for editing voice transcription text |
CN111883136A (en) * | 2020-07-30 | 2020-11-03 | 潘忠鸿 | Rapid writing method and device based on artificial intelligence |
CN113139368A (en) * | 2021-05-18 | 2021-07-20 | 清华大学 | Text editing method and system |
Also Published As
Publication number | Publication date |
---|---|
CN109817210B (en) | 2021-08-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||