CN109817210A

CN109817210A - Voice writing method, device, terminal and storage medium

Info

Publication number: CN109817210A
Application number: CN201910111502.4A
Authority: CN
Inventors: 赵洪飞 (Zhao Hongfei)
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd; Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-02-12
Filing date: 2019-02-12
Publication date: 2019-05-28
Anticipated expiration: 2039-02-12
Also published as: CN109817210B

Abstract

The embodiments of the invention disclose a voice writing method, device, terminal and storage medium. The method comprises: during user voice writing, converting acquired user speech information into user text information; matching the user text information against candidate behavior intents, and taking the matched candidate behavior intent as the target behavior intent; and editing the text content of an editing area according to the target behavior intent. By recognizing the behavior intent behind the user's speech, the embodiments spare the user terminal itself a complex process of learning the user's speech, perform the matching of voice writing instructions on the user terminal, and improve the efficiency and accuracy with which the user terminal responds to the user's writing instructions. This meets the user's needs during writing and gives the user a good voice writing experience.

Description

Voice writing method, device, terminal and storage medium

Technical field

The embodiments of the present invention relate to the field of voice processing technology, and in particular to a voice writing method, device, terminal and storage medium.

Background art

With the rapid development of voice processing technology, more and more mobile terminals and devices use it to receive, process and analyze speech, for example to perform editing operations such as text input by voice.

Currently, a mobile terminal may itself use strong matching: the character string converted from speech is compared with the character string corresponding to a piece of logic, and the corresponding logic is executed only when the two match exactly. Alternatively, the mobile terminal may use fuzzy matching: the character string converted from speech is split into multiple word blocks, and the match is considered successful once the degree of matching with the string corresponding to the logic reaches a certain proportion, at which point the corresponding logic is executed. In addition, the mobile terminal may rely on the processing power of a server: the string converted from speech is sent to the server, which matches the instruction through complex processing such as semantic analysis or deep learning and returns the matching result to the mobile terminal for execution.
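For illustration only, the two on-device strategies can be sketched in Python. This is a minimal sketch assuming whitespace-separated word blocks and a hypothetical acceptance threshold of 0.6; the source does not say how the recognized string is split or what proportion counts as a match.

```python
def strong_match(spoken: str, command: str) -> bool:
    """Strong matching: the recognized string must equal the command string exactly."""
    return spoken.strip() == command.strip()


def fuzzy_match(spoken: str, command: str, threshold: float = 0.6) -> bool:
    """Fuzzy matching: split the recognized string into word blocks and accept
    when the share of command words covered by those blocks reaches `threshold`."""
    spoken_blocks = set(spoken.lower().split())
    command_words = set(command.lower().split())
    if not command_words:
        return False
    covered = len(spoken_blocks & command_words)
    return covered / len(command_words) >= threshold
```

With these definitions, "please delete the last sentence" fails strong matching against "delete the last sentence" but passes fuzzy matching, which illustrates why strong matching places strict demands on the user's phrasing.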

However, in voice writing scenarios, the speech processing performed by existing mobile terminals themselves places strict demands on the format of the user's spoken input and has low matching accuracy, while existing server-side matching depends heavily on network quality and struggles to keep pace with the user's speaking speed when long content is being composed. Existing voice-based text editing therefore cannot adapt to the user's writing process, has a narrow scope of application, and degrades the user experience.

Summary of the invention

Embodiments of the present invention provide a voice writing method, device, terminal and storage medium that can improve the efficiency and accuracy with which a mobile terminal responds to the user's writing instructions.

In a first aspect, an embodiment of the present invention provides a voice writing method, comprising:

during user voice writing, converting acquired user speech information into user text information;

matching the user text information against candidate behavior intents, and taking the matched candidate behavior intent as the target behavior intent; and

editing the text content of an editing area according to the target behavior intent.

In a second aspect, an embodiment of the present invention provides a voice writing device, comprising:

a voice conversion module, configured to convert acquired user speech information into user text information during user voice writing;

an intent recognition module, configured to match the user text information against candidate behavior intents and take the matched candidate behavior intent as the target behavior intent; and

a text editing module, configured to edit the text content of an editing area according to the target behavior intent.
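The three steps of the method and the three modules of the device can be sketched as a small pipeline. This is a hypothetical illustration: the class names mirror the module names in the text, the speech recognizer is replaced by a stub that treats the input bytes as UTF-8 text, and intents are matched by a simple prefix on an assumed trigger phrase.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CandidateIntent:
    name: str
    trigger: str                      # hypothetical phrase that signals the intent
    apply: Callable[[str, str], str]  # (editing-area text, argument) -> new text


class VoiceConversionModule:
    def convert(self, audio: bytes) -> str:
        # Stub for speech recognition: the text leaves the method open,
        # so this assumes the "audio" already holds UTF-8 text.
        return audio.decode("utf-8").strip()


class IntentRecognitionModule:
    def __init__(self, candidates: List[CandidateIntent]):
        self.candidates = candidates

    def match(self, text: str) -> Optional[CandidateIntent]:
        lowered = text.lower()
        for intent in self.candidates:
            if lowered.startswith(intent.trigger):
                return intent
        return None


class TextEditingModule:
    def edit(self, area: str, text: str, intent: Optional[CandidateIntent]) -> str:
        if intent is None:
            return area + text                   # no intent matched: treat as content
        argument = text[len(intent.trigger):].strip()
        return intent.apply(area, argument)


def voice_write(audio: bytes, area: str, candidates: List[CandidateIntent]) -> str:
    text = VoiceConversionModule().convert(audio)              # step 1: speech -> text
    intent = IntentRecognitionModule(candidates).match(text)   # step 2: match intent
    return TextEditingModule().edit(area, text, intent)        # step 3: edit the area
```

For example, with a candidate whose trigger is "new line" and whose action appends "\n", the utterance "new line" edits the area, while "Hello" is simply appended as content.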

In a third aspect, an embodiment of the present invention provides a terminal, comprising:

one or more processors; and

a memory, configured to store one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the voice writing method according to any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the voice writing method according to any embodiment of the present invention.

In the embodiments of the present invention, during user voice writing, the acquired user speech information is converted into user text information; the user text information is matched against candidate behavior intents, the matched candidate behavior intent is taken as the target behavior intent, and the text content of the editing area is edited according to the target behavior intent. By recognizing the behavior intent behind the user's speech, the embodiments spare the user terminal itself a complex process of learning the user's speech, perform the matching of voice writing instructions on the user terminal, and improve the efficiency and accuracy with which the user terminal responds to the user's writing instructions, meeting the user's needs during writing and giving the user a good voice writing experience.

Brief description of the drawings

Fig. 1 is a flowchart of a voice writing method provided by Embodiment 1 of the present invention;

Fig. 2 is a flowchart of a voice writing method provided by Embodiment 2 of the present invention;

Fig. 3 is an example diagram of inserting a picture by voice control during user voice writing, provided by Embodiment 2 of the present invention;

Fig. 4 is another example diagram of inserting a picture by voice control during user voice writing, provided by Embodiment 2 of the present invention;

Fig. 5 is a flowchart of voice writing provided by Embodiment 2 of the present invention;

Fig. 6 is a schematic structural diagram of a voice writing device provided by Embodiment 3 of the present invention;

Fig. 7 is a schematic structural diagram of a terminal provided by Embodiment 4 of the present invention.

Detailed description of the embodiments

The embodiments of the present invention are described in further detail below with reference to the drawings and examples. It should be understood that the specific embodiments described here only explain the embodiments of the present invention and do not limit the invention. It should also be noted that, for ease of description, the drawings show only the parts related to the embodiments of the present invention rather than the entire structure.

Embodiment one

Fig. 1 is a flowchart of a voice writing method provided by Embodiment 1 of the present invention. This embodiment is applicable to cases where a user writes by voice and edits text content through voice control. The method may be performed by a voice writing device, which may be implemented in software and/or hardware and is preferably arranged in a mobile terminal. The method specifically includes the following steps:

S110: during user voice writing, convert the acquired user speech information into user text information.

In the specific embodiments of the present invention, voice writing means that the mobile terminal receives speech input by the user and analyzes and recognizes it, so that, without manual operation by the user, it can at least complete writing-related operations such as text editing, picture insertion, text content editing, format editing and online search.

The user speech information may be any writing-related content the user inputs by voice: speech containing the text the user wants to insert, or a voice control command that controls a writing operation. Correspondingly, the user speech information may also be voice wake-up information that triggers text insertion or a writing-control operation; when the wake-up information is recognized, the relevant voice writing operation is performed according to the user speech information that follows it.

Specifically, during user voice writing, the mobile terminal acquires the user speech information and converts it locally into characters, obtaining user text information in character form. This embodiment does not restrict the speech conversion method; any method capable of converting speech into text may be used.

S120: match the user text information against candidate behavior intents, and take the matched candidate behavior intent as the target behavior intent.

In the specific embodiments of the present invention, a candidate behavior intent is a predetermined operation the user may perform during writing. The candidate behavior intents may include at least one of a picture insertion intent, a text content modification intent and a text format modification intent. The text content modification intent includes a content deletion sub-intent and/or a content replacement sub-intent; the text format modification intent includes at least one of a punctuation sub-intent, a line-break sub-intent, a bold sub-intent, a separator sub-intent and a quotation sub-intent.
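One possible way to encode this intent taxonomy is an enumeration plus a category table. The identifiers below are hypothetical; only the categories and sub-intents come from the text.

```python
from enum import Enum


class BehaviorIntent(Enum):
    INSERT_PICTURE = "insert_picture"
    DELETE_CONTENT = "delete_content"    # text content modification sub-intents
    REPLACE_CONTENT = "replace_content"
    PUNCTUATION = "punctuation"          # text format modification sub-intents
    LINE_BREAK = "line_break"
    BOLD = "bold"
    SEPARATOR = "separator"
    QUOTATION = "quotation"


CATEGORIES = {
    "picture_insertion": {BehaviorIntent.INSERT_PICTURE},
    "text_content_modification": {BehaviorIntent.DELETE_CONTENT, BehaviorIntent.REPLACE_CONTENT},
    "text_format_modification": {
        BehaviorIntent.PUNCTUATION, BehaviorIntent.LINE_BREAK, BehaviorIntent.BOLD,
        BehaviorIntent.SEPARATOR, BehaviorIntent.QUOTATION,
    },
}


def category_of(intent: BehaviorIntent) -> str:
    """Return the top-level intent category that contains `intent`."""
    for category, members in CATEGORIES.items():
        if intent in members:
            return category
    raise ValueError(f"uncategorized intent: {intent}")
```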

In this embodiment, once the user text information converted from the user speech information is determined, it can be matched against the candidate behavior intents, and the matched candidate behavior intent is taken as the target behavior intent. Specifically, behavior matching phrases associated with each candidate behavior intent can be set in advance: for example, the matching phrase for the picture insertion intent may be "insert a picture of target A", the phrase for the text content modification intent may be "change the target A text to the target B text", and the phrase for the text format modification intent may be "bold the target A text". The candidate behavior intents may be learned by the server from user text information using algorithms such as semantic analysis or deep learning, and may be redetermined (verified, updated, added and so on) based on the feedback for each piece of user text information. Thus, each time the mobile terminal performs behavior intent recognition on user speech information, it obtains the candidate behavior intents determined by the server and recognizes the target behavior intent based on them.
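A minimal sketch of matching user text against such preset phrases, assuming each phrase is written as a template with named slots ({A}, {B}) that are turned into regular-expression groups. The template wording, slot syntax and function names are illustrative assumptions, not the patent's format; in the described system the candidate set would come from the server rather than a hard-coded table.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical matching phrases modeled on the examples; {A}/{B} are slots.
TEMPLATES = {
    "insert_picture": "insert a picture of {A}",
    "replace_content": "change {A} to {B}",
    "bold": "bold {A}",
}


def compile_template(template: str) -> re.Pattern:
    """Turn 'change {A} to {B}' into a regex with named groups A and B."""
    parts = re.split(r"\{(\w+)\}", template)   # literal, slot, literal, slot, ...
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{regex}$", re.IGNORECASE)


PATTERNS = {name: compile_template(t) for name, t in TEMPLATES.items()}


def match_intent(text: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent name, slot values) for the first template that matches."""
    for name, pattern in PATTERNS.items():
        m = pattern.match(text.strip())
        if m:
            return name, m.groupdict()
    return None
```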

In addition, in this embodiment, the trigger condition for matching the user text information against the candidate behavior intents may be that the user text information immediately preceding it is wake-up text. Specifically, after the user text information converted from the user speech information is determined, it can be matched against the voice wake-up information to recognize that the user is about to perform a writing operation, and the corresponding writing operation is then performed according to the subsequent user speech information. Alternatively, this embodiment may directly recognize whether the user text information is text content or command content. If it is text content, the user text information converted from the user speech information is input into the text input area; if it is command content, logical matching is performed on it to carry out the corresponding interaction. This embodiment may also process the user text information and, according to the processing result, determine at least one of the sentence-break punctuation marks, title marks (《》) and emotion punctuation marks contained in it, so as to automatically break the text content into sentences and add punctuation.
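The wake-word trigger condition (the previous utterance must be the wake text) amounts to a small state machine. The sketch below is an illustration under stated assumptions: it uses the romanized wake word "xiaodu xiaodu" from the example in the text, and it returns a routing label rather than performing any edit; the class and method names are hypothetical.

```python
class WakeWordGate:
    """Routes each transcribed utterance based on the one before it."""

    def __init__(self, wake_word: str = "xiaodu xiaodu"):
        self.wake_word = wake_word.lower()
        self.armed = False  # True right after the wake word was heard

    def route(self, text: str) -> str:
        # Normalize case, commas and spacing before comparing with the wake word.
        normalized = " ".join(text.lower().replace(",", " ").split())
        if normalized == self.wake_word:
            self.armed = True
            return "wake"
        if self.armed:
            self.armed = False
            return "command"   # match this utterance against candidate intents
        return "content"       # insert this utterance into the editing area as text
```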

For example, when inserting text content, the user can speak freely and at length, and long spoken sentences are supported. The mobile terminal converts the acquired speech into text and adds the text content to the writing input area, automatically breaking sentences and adding punctuation. For instance, based on title recognition, title marks can be added automatically around book titles contained in the speech. When inputting command content, a wake word such as "Xiaodu Xiaodu" is matched, and the target behavior intent is then matched and recognized as triggered by that wake word.

S130: edit the text content of the editing area according to the target behavior intent.

In the specific embodiments of the present invention, the editing area is the designated display area of the mobile terminal in which voice writing takes place, and writing operations such as text content editing are performed there. The editing area may be the display and editing area of an application with a voice writing function, or that of a specified text document. The text content may include content of various forms involved in writing, such as text, pictures and punctuation marks. Text content editing may include operations such as adding, deleting, modifying and querying the text content itself and its display form. Content such as pictures may be obtained locally on the mobile terminal, for example by calling the local photo album to insert a specified picture, or obtained by online search under voice control, for example by inserting a picture found in a search.

In this embodiment, a target edit object and a target edit action are determined according to the target behavior intent, and the target edit action is performed on the target edit object in the editing area. Specifically, if the target behavior intent is the picture insertion intent, a picture processing component is called, and the picture source and picture filter conditions are determined from the user text information; the target picture to be inserted is obtained from the picture source according to the picture filter conditions; and the target picture is inserted into the text content of the editing area. If the target behavior intent is the content deletion sub-intent, the start position and end position of the content to be deleted are determined from the user text information, and a deletion is performed on the text content of the editing area according to those positions.
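The mapping from a target behavior intent to a target edit action can be sketched as a dispatch table. The handler names and argument keys are hypothetical, and the picture handler only inserts a textual placeholder instead of calling a real picture processing component.

```python
from typing import Callable, Dict

# Each handler takes the editing-area text and the parsed arguments.
Handler = Callable[[str, dict], str]


def insert_picture(area: str, args: dict) -> str:
    # Stand-in for calling a picture component: insert a textual placeholder.
    return area + f"[picture: {args['query']}]"


def delete_range(area: str, args: dict) -> str:
    # Delete the half-open character range [start, end).
    return area[: args["start"]] + area[args["end"]:]


HANDLERS: Dict[str, Handler] = {
    "insert_picture": insert_picture,
    "delete_content": delete_range,
}


def execute(intent: str, area: str, args: dict) -> str:
    """Resolve the target edit action for an intent and apply it to the area."""
    handler = HANDLERS.get(intent)
    if handler is None:
        raise ValueError(f"no edit action registered for intent {intent!r}")
    return handler(area, args)
```

Keeping the table separate from the matching step means new intents pulled from the server only need a registered handler, not changes to the matcher.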

In the technical solution of this embodiment, during user voice writing, the acquired user speech information is converted into user text information; the user text information is matched against candidate behavior intents, the matched candidate behavior intent is taken as the target behavior intent, and the text content of the editing area is edited according to it. By recognizing the behavior intent behind the user's speech, the embodiment spares the user terminal itself a complex process of learning the user's speech, performs the matching of voice writing instructions on the user terminal, and improves the efficiency and accuracy with which the user terminal responds to the user's writing instructions, meeting the user's needs during writing and giving the user a good voice writing experience.

Embodiment two

Building on Embodiment 1, this embodiment provides a preferred implementation of the voice writing method, in which the learning of behavior intents and their feedback-based determination are integrated on the server; the user terminal obtains the candidate behavior intents determined by the server and matches voice writing instructions locally. Fig. 2 is a flowchart of a voice writing method provided by Embodiment 2 of the present invention. As shown in Fig. 2, the method specifically includes the following steps:

S210: during user voice writing, convert the acquired user speech information into user text information.

S220: process the user text information; according to the processing result, determine at least one of the sentence-break punctuation marks, title marks and emotion punctuation marks contained in the user text information.

In the specific embodiments of the present invention, processing the user text information may include automatic processes such as recognition and/or semantic analysis of the text, to determine the pauses or intervals it contains, the emotion it conveys, proper nouns and so on, so that sentence breaks and punctuation marks are added to the user text information automatically. Sentence-break punctuation marks, such as enumeration commas, commas, semicolons or full stops, can be added automatically at the determined sentence breaks; emotion punctuation marks, such as exclamation marks or question marks, can be added automatically based on emotion information, for example by recognizing emotion words. When adding title marks, the user text information can be matched against candidate titles; if the match succeeds, the text is determined to contain the matched candidate title, and title marks are added automatically around the title contained in the text. The user may also specify a title by voice during writing: for example, if the spoken user text information is "title: so-and-so", then "《so-and-so》" is generated in the editing area. In addition, the user can invoke a quick punctuation selection mode through a voice command to add punctuation marks to the text actively.
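A simplified sketch of the automatic punctuation step, assuming the transcript has already been split into segments at detected pauses and using small hypothetical word lists in place of the semantic analysis the text describes. Title marks 《》 are added around any matched candidate title.

```python
from typing import List

# Hypothetical word lists; a real system would use semantic analysis instead.
CANDIDATE_TITLES = ["Journey to the West", "Dream of the Red Chamber"]
QUESTION_WORDS = {"why", "how", "what", "who", "where", "when"}
EXCLAMATION_WORDS = {"wow", "amazing", "incredible"}


def add_title_marks(text: str, titles: List[str] = CANDIDATE_TITLES) -> str:
    """Wrap every matched candidate title in Chinese title marks 《》."""
    for title in titles:
        text = text.replace(title, f"《{title}》")
    return text


def end_mark(segment: str) -> str:
    """Choose a sentence-final mark from simple emotion cues."""
    words = [w.strip(",.") for w in segment.lower().split()]
    if words and words[0] in QUESTION_WORDS:
        return "?"
    if any(w in EXCLAMATION_WORDS for w in words):
        return "!"
    return "."


def punctuate(segments: List[str]) -> str:
    """`segments` are transcribed chunks already split at detected pauses."""
    sentences = []
    for segment in segments:
        segment = add_title_marks(segment.strip())
        if segment:
            sentences.append(segment + end_mark(segment))
    return " ".join(sentences)
```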

S230: obtain the user's candidate behavior intents from the server.

In the specific embodiments of the present invention, the candidate behavior intents may be learned by the server from user text information using algorithms such as semantic analysis or deep learning, and may also be redetermined (verified, updated, added and so on) based on the feedback for each piece of user text information. Each time the mobile terminal performs behavior intent recognition on user speech information, it can obtain the candidate behavior intents determined by the server and recognize the target behavior intent based on them. The complex intent-determination process is thus executed on the server, and the mobile terminal only needs to match against the obtained candidate behavior intents; this improves the accuracy of the candidate behavior intents and further improves the efficiency and accuracy with which the user terminal matches the target behavior intent.
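On the client, obtaining the server-determined candidate intents can be sketched as a cache around a fetch function. The JSON payload shape, the time-to-live and the fallback to the last known configuration when the network fails are assumptions added for illustration; the text only says the candidates are obtained from the server.

```python
import json
import time
from typing import Callable, Optional


class IntentConfigCache:
    """Keeps the candidate-intent configuration pulled from the server."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 300.0):
        self._fetch = fetch            # returns the server's JSON payload as a string
        self._ttl = ttl_seconds
        self._config: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        fresh = self._config is not None and time.monotonic() - self._fetched_at < self._ttl
        if fresh:
            return self._config
        try:
            payload = self._fetch()
        except OSError:
            if self._config is not None:
                return self._config    # offline: keep matching with the last known config
            raise
        self._config = json.loads(payload)
        self._fetched_at = time.monotonic()
        return self._config
```

Keeping the last known configuration when a fetch raises `OSError` is a design choice in this sketch, in line with the stated goal of not depending on network quality during writing.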

Optionally, the trigger condition for matching the user text information against the candidate behavior intents is that the user text information immediately preceding it is wake-up text.

In this embodiment, after the user text information converted from the user speech information is determined, it can be matched against the voice wake-up information to recognize that the user is about to perform a writing operation, and the corresponding writing operation is then performed according to the subsequent user speech information. For example, when inputting command content, a wake word such as "Xiaodu Xiaodu" is matched, and the target behavior intent is then matched and recognized as triggered by that wake word.

S240: match the user text information against the candidate behavior intents, and take the matched candidate behavior intent as the target behavior intent.

S250: edit the text content of the editing area according to the target behavior intent.

In the specific embodiments of the present invention, a target edit object and a target edit action are determined according to the target behavior intent, and the target edit action is performed on the target edit object in the editing area.

Optionally, if the target behavior intent is the picture insertion intent, a picture processing component is called, and the picture source and picture filter conditions are determined from the user text information; the target picture to be inserted is obtained from the picture source according to the picture filter conditions; and the target picture is inserted into the text content of the editing area.

In this embodiment, the picture insertion intent means inserting a picture specified by the user into the editing area. The user text information can specify the picture source and the picture filter conditions: the picture source may be local pictures or pictures found by web search, and the filter conditions may include restrictions such as the people, time, place and event associated with the picture, so that pictures satisfying the filter conditions are selected from the picture source as the target picture to be inserted. The user text information can also specify where in the editing area the picture should go, in which case the target picture is inserted at that position.

For example, if the user speech information is "insert the photo from 7 o'clock last night", "insert the photo taken at Mansion A" or "insert the group photo of me and B", then from the converted user text information the picture source can be determined to be the local photo album, and the picture filter conditions to be, respectively, the shooting time, the shooting location and the content of the photo; target photos satisfying those conditions are then selected from the local album and inserted. Fig. 3 is an example diagram of inserting a picture by voice control during user voice writing. As shown in Fig. 3, the left side shows the user asking by voice to insert a local picture: triggered by the wake word, the user's behavior intent is recognized as the picture insertion intent, and pictures satisfying the filter conditions are looked up locally and presented for the user to choose from; the right side of Fig. 3 shows an example of the local picture search results. Correspondingly, the user can also choose by voice among two or more pictures that satisfy the filter conditions, and the finally chosen picture is inserted as the target picture. Two or more pictures may also be inserted at once.
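Filtering a local album by the conditions in these examples (time, place, people) could look like the sketch below. It assumes each photo already carries a timestamp, a location string and a set of recognized people; parsing phrases such as "7 o'clock last night" into a time window is left out, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Photo:
    path: str
    taken_at: datetime
    location: str
    people: FrozenSet[str] = frozenset()


def filter_photos(
    album: Iterable[Photo],
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    location: Optional[str] = None,
    people: Iterable[str] = (),
) -> List[Photo]:
    """Return the photos that satisfy every condition that was given."""
    wanted = frozenset(people)
    result = []
    for photo in album:
        if start is not None and photo.taken_at < start:
            continue
        if end is not None and photo.taken_at >= end:
            continue
        if location is not None and location.lower() not in photo.location.lower():
            continue
        if not wanted <= photo.people:   # every named person must appear
            continue
        result.append(photo)
    return result
```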

As another example, if the user speech information is "insert the poster of film A", "insert the street snap of star B" or "insert the award ceremony photo of star C", then from the converted user text information the picture source can be determined to be a web search, with the picture filter conditions restricting the picture content; target pictures satisfying those conditions are then searched for online and inserted. Fig. 4 is another example diagram of inserting a picture by voice control during user voice writing. As shown in Fig. 4, pictures found on the web can be offered to the user for selection and insertion.
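How the picture source is chosen is not specified in the text. One crude, hypothetical heuristic is to treat requests that mention public-media cue words as web searches and everything else as the local album; the cue list and function name are assumptions.

```python
import re

# Hypothetical cue words suggesting public content that has to be searched online.
WEB_CUES = {"poster", "film", "movie", "star", "celebrity", "award", "awards"}


def choose_picture_source(request: str) -> str:
    """Return 'web' when the request mentions a web cue word, else 'local'."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    return "web" if words & WEB_CUES else "local"
```

A production system would presumably rely on the server-side intent learning described in this embodiment rather than a fixed keyword list.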

Optionally, if the target behavior intent is the content deletion sub-intent, the start position and end position of the content to be deleted are determined from the user text information, and a deletion is performed on the text content of the editing area according to those positions.

In this embodiment, the text content modification intent means modifying the text itself within the text content of the editing area. It may include a content deletion sub-intent and/or a content replacement sub-intent: the content deletion sub-intent deletes specified text, and the content replacement sub-intent replaces specified text. Correspondingly, when modifying text content, the text to be modified and/or its position in the editing area can be determined from the user text information, and the modification is applied to the content at that position.

For example, if the user speech information is "delete from position A to position B", then from the converted user text information the start position of the content to be deleted can be determined to be position A and the end position to be position B, and the text content in the editing area between the start and end positions is deleted.
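A minimal sketch of the deletion example, assuming that "position A" and "position B" are spoken as text anchors that can be located in the editing area and that both anchors are themselves deleted. The command wording and function names are hypothetical.

```python
import re
from typing import Optional, Tuple


def parse_delete_command(text: str) -> Optional[Tuple[str, str]]:
    """Parse 'delete from <A> to <B>' into (A, B); None if the text does not fit."""
    m = re.fullmatch(r"delete from (.+?) to (.+)", text.strip(), flags=re.IGNORECASE)
    return (m.group(1), m.group(2)) if m else None


def delete_between(area: str, start_anchor: str, end_anchor: str) -> str:
    """Delete from the first occurrence of `start_anchor` through the end of the
    next occurrence of `end_anchor` (both anchors are removed)."""
    start = area.find(start_anchor)
    if start == -1:
        raise ValueError(f"start position {start_anchor!r} not found")
    end_hit = area.find(end_anchor, start)
    if end_hit == -1:
        raise ValueError(f"end position {end_anchor!r} not found after the start")
    return area[:start] + area[end_hit + len(end_anchor):]
```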

As another example, if the user speech information is "replace content A with content B", then from the converted user text information the content to be replaced can be determined to be content A, and content A is replaced with content B. The replaced content may be a character, word, sentence, paragraph and so on.
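Likewise, the replacement example can be sketched as parsing a "replace A with B" command and substituting the text. The command wording and function names are assumptions, and this sketch replaces every occurrence of content A; replacing only the first or a user-selected occurrence would be an equally valid reading.

```python
import re
from typing import Optional, Tuple


def parse_replace_command(text: str) -> Optional[Tuple[str, str]]:
    """Parse 'replace <old> with <new>' into (old, new); None if it does not fit."""
    m = re.fullmatch(r"replace (.+?) with (.+)", text.strip(), flags=re.IGNORECASE)
    return (m.group(1), m.group(2)) if m else None


def replace_content(area: str, old: str, new: str) -> str:
    """Replace every occurrence of `old` (a character, word, sentence or paragraph)."""
    if not old or old not in area:
        raise ValueError(f"{old!r} not found in the editing area")
    return area.replace(old, new)
```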

In this embodiment, the candidate behavior intent may also be the text format modification intent, which means making formal changes, such as layout and formatting, to specified text content in the editing area. It may include at least one of a punctuation sub-intent, a line-break sub-intent, a bold sub-intent, a separator sub-intent and a quotation sub-intent. Because some text modifications are complex or rarely used, users may not know how to perform them; modifying text format by voice spares the user complex manual operations and improves the efficiency and accuracy of format changes.
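The format sub-intents can be sketched as one branch per edit, using Markdown-style markup as an assumed representation of bold and separators and curly quotation marks for the quotation sub-intent. The intent names and the markup are illustrative choices, not taken from the source.

```python
def apply_format(area: str, intent: str, target: str = "") -> str:
    """Apply one text-format sub-intent to the editing-area text."""
    if intent in ("bold", "quote") and (not target or target not in area):
        raise ValueError(f"{target!r} not found in the editing area")
    if intent == "bold":
        return area.replace(target, f"**{target}**", 1)   # first occurrence only
    if intent == "quote":
        return area.replace(target, f"“{target}”", 1)
    if intent == "line_break":
        return area + "\n"
    if intent == "separator":
        return area + "\n\n---\n\n"
    if intent == "punctuate":
        return area + target     # `target` holds the punctuation mark to append
    raise ValueError(f"unknown format intent {intent!r}")
```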

S260: send the user text information associated with the target behavior intent to the server, which determines the user's candidate behavior intents based on the received user text information.

In the specific embodiments of the present invention, the user text information associated with the target behavior intent is the user text information on which the determination of the target behavior intent was based. After one voice-controlled writing operation completes, this user text information can be fed back to the server, and the actual writing operation can be fed back as well, so that the server verifies, updates, adds to or otherwise redetermines the target behavior intent according to the received user text information. For example, if the target behavior intent is verified, it is retained as a candidate behavior intent; if verification fails, the target behavior intent can be corrected according to its historical recognition accuracy and the corrected intent retained as a candidate behavior intent, thereby updating the candidate behavior intents. New behavior intents can also be generated as candidate behavior intents, enriching voice control of writing and meeting user needs. This embodiment therefore integrates the complex determination of candidate behavior intents, based on semantic analysis or deep learning, on the server, which improves the accuracy of the candidate behavior intents and further improves the efficiency and accuracy of behavior intent recognition on the mobile terminal.
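The server-side verify/update/add cycle could be approximated by simple counters per phrase. In this hypothetical sketch, "correction by historical accuracy" is simplified to retiring a phrase whose confirmed-recognition rate falls below a threshold, and a phrase that repeatedly matches nothing is promoted to a new candidate; mapping a new phrase to an actual edit action is out of scope, and all thresholds are assumptions.

```python
from collections import defaultdict
from typing import Set


class CandidateIntentStore:
    """Server-side bookkeeping that keeps, retires or adds candidate phrases."""

    def __init__(self, min_accuracy: float = 0.6, min_samples: int = 5, promote_after: int = 3):
        self.min_accuracy = min_accuracy    # accuracy below this retires a phrase
        self.min_samples = min_samples      # feedback needed before judging accuracy
        self.promote_after = promote_after  # unmatched repeats needed to add a phrase
        self.candidates: Set[str] = set()
        self.confirmed = defaultdict(int)   # phrase -> recognitions the user kept
        self.recognized = defaultdict(int)  # phrase -> recognitions reported
        self.unmatched = defaultdict(int)   # phrase -> times it matched no intent

    def report_match(self, phrase: str, confirmed: bool) -> None:
        """Feedback for a phrase that was recognized as an intent."""
        self.recognized[phrase] += 1
        if confirmed:
            self.confirmed[phrase] += 1
        if self.recognized[phrase] >= self.min_samples:
            accuracy = self.confirmed[phrase] / self.recognized[phrase]
            if accuracy >= self.min_accuracy:
                self.candidates.add(phrase)       # verified: keep it
            else:
                self.candidates.discard(phrase)   # unreliable: retire it

    def report_unmatched(self, phrase: str) -> None:
        """Feedback for a phrase spoken after the wake word that matched nothing."""
        self.unmatched[phrase] += 1
        if self.unmatched[phrase] >= self.promote_after:
            self.candidates.add(phrase)           # recurring request: add it
```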

For example, Fig. 5 is a flowchart of voice writing. As shown in Fig. 5, after the voice writing application starts, the configuration of the candidate behavior intents can be pulled from the remote server; alternatively, it can be pulled just before the user text information is matched against the candidate behavior intents. The user speech information is received and converted into user text information. It is then determined whether the user text information contains a wake word that triggers text content editing. If it does not, the text content is input directly into the editing area; if it does, the user behavior intent is recognized from the user text information input after the wake word. If a candidate behavior intent included in the configuration is recognized, the associated text editing operation is performed; otherwise, the text content is input directly into the editing area, and the user text information is fed back to the remote server, which verifies, updates, adds or otherwise redetermines the candidate behavior intents according to the feedback, for the user terminal to use in the next voice writing session.
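The Fig. 5 loop can be sketched for a single, already transcribed utterance. Assumptions: the wake word is matched as a lowercase prefix, the configuration maps command phrases to edit functions, and unrecognized commands are both inserted as content and appended to a feedback list standing in for the upload to the server. The function name is hypothetical.

```python
from typing import Callable, Dict, List

Action = Callable[[str], str]  # editing-area text -> new editing-area text


def process_utterance(text: str, area: str, config: Dict[str, Action],
                      wake_word: str, feedback: List[str]) -> str:
    """One pass of the Fig. 5 loop for an utterance that is already transcribed."""
    spoken = text.strip()
    if not spoken.lower().startswith(wake_word):
        return area + spoken                  # no wake word: input as content
    command = spoken[len(wake_word):].strip(" ,")
    action = config.get(command.lower())
    if action is not None:
        return action(area)                   # candidate intent found: execute it
    feedback.append(command)                  # unknown: queue it for the server
    return area + command                     # and input the text as content
```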

In the technical solution of this embodiment, during user voice writing, the acquired user speech information is converted into user text information; the user text information is matched against candidate behavior intents, the matched candidate behavior intent is taken as the target behavior intent, and the text content of the editing area is edited according to it. By recognizing the behavior intent behind the user's speech and integrating the learning and feedback-based determination of behavior intents on the server, the embodiment improves the accuracy of the candidate behavior intents and spares the user terminal itself a complex process of learning the user's speech; by obtaining the candidate behavior intents determined by the server, it performs the matching of voice writing instructions on the user terminal and improves the efficiency and accuracy with which the user terminal responds to the user's writing instructions. In addition, searching for and loading pictures based on voice control commands enriches the text content and speeds up adding pictures during writing, meeting the user's needs during writing and giving the user a good voice writing experience.

Embodiment three

Fig. 6 is a schematic structural diagram of a voice writing device provided by Embodiment Three of the present invention. This embodiment is applicable to cases where a user performs voice writing and edits text content by voice control. The device can implement the voice writing method described in any embodiment of the present invention, and specifically includes:

Voice conversion module 610, configured to convert the acquired user voice information into user text information during the user's voice writing;

Intent recognition module 620, configured to match the user text information against candidate behavior intents and take the matched candidate behavior intent as the target behavior intent;

Text editing module 630, configured to edit the text content in the editing area according to the target behavior intent.

Optionally, the candidate behavior intents include at least one of a picture insertion intent, a word content modification intent, and a text format modification intent;

The word content modification intent includes a content deletion sub-intent and/or a content replacement sub-intent;

The text format modification intent includes at least one of a sentence-break sub-intent, a line-break sub-intent, a bold sub-intent, a separator sub-intent, and a quotation sub-intent.
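The intent taxonomy above, together with the matching step of intent recognition module 620, might look like the sketch below. It is an assumption-laden illustration: the `Intent` enum values, the English trigger phrases in `TRIGGERS`, and the prefix-matching rule in `match_intent` are all hypothetical; in the described scheme the phrases would come from configuration obtained from the server side.

```python
from enum import Enum
from typing import Optional


class Intent(Enum):
    INSERT_PICTURE = "insert_picture"
    DELETE_CONTENT = "delete_content"      # word content modification
    REPLACE_CONTENT = "replace_content"    # word content modification
    SENTENCE_BREAK = "sentence_break"      # text format modification
    LINE_BREAK = "line_break"
    BOLD = "bold"
    SEPARATOR = "separator"
    QUOTE = "quote"


# Hypothetical trigger phrases; checked in insertion order.
TRIGGERS = {
    Intent.INSERT_PICTURE: ("insert picture", "insert a picture"),
    Intent.DELETE_CONTENT: ("delete",),
    Intent.REPLACE_CONTENT: ("replace",),
    Intent.SENTENCE_BREAK: ("full stop", "period"),
    Intent.LINE_BREAK: ("new line", "line break"),
    Intent.BOLD: ("bold",),
    Intent.SEPARATOR: ("separator",),
    Intent.QUOTE: ("quote",),
}


def match_intent(user_text: str) -> Optional[Intent]:
    """Return the first candidate intent whose trigger phrase starts the text."""
    text = user_text.lower().strip()
    for intent, phrases in TRIGGERS.items():
        if any(text.startswith(p) for p in phrases):
            return intent
    return None  # no candidate matched: treat as ordinary dictation
```

Returning `None` rather than raising lets the caller fall back to inserting the text as dictation, mirroring the fallback branch of Fig. 5.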

Optionally, the text editing module 630 is specifically configured to:

If the target behavior intent is the picture insertion intent, call a picture processing component and determine the picture source and the picture screening conditions according to the user text information;

Obtain, from the picture source, the target picture to be inserted according to the picture screening conditions;

Insert the target picture into the text content in the editing area.
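A rough sketch of these three steps follows, assuming a command of the hypothetical form "insert a picture of <keywords> from <source>". The in-memory `LIBRARY`, the `Picture` record, the default source `gallery`, and the `[img:...]` placeholder are all invented for illustration; a real picture processing component would query a photo gallery or a search service instead.

```python
import re
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Picture:
    source: str
    tags: frozenset
    uri: str


# Hypothetical picture sources standing in for a gallery or web search.
LIBRARY = [
    Picture("gallery", frozenset({"cat"}), "gallery/cat1.jpg"),
    Picture("web", frozenset({"cat", "snow"}), "web/cat_snow.jpg"),
    Picture("web", frozenset({"dog"}), "web/dog.jpg"),
]


def parse_picture_request(user_text: str) -> Tuple[Optional[str], Set[str]]:
    """Step 1: derive the picture source and screening keywords from the text."""
    m = re.search(r"picture of (.+?)(?: from (\w+))?$", user_text.lower())
    if not m:
        return None, set()
    return (m.group(2) or "gallery"), set(m.group(1).split())


def insert_picture(content: str, cursor: int, user_text: str) -> str:
    source, keywords = parse_picture_request(user_text)
    for pic in LIBRARY:
        # Step 2: the picture must come from the source and carry every keyword.
        if pic.source == source and keywords <= pic.tags:
            # Step 3: insert a placeholder for the picture at the cursor.
            return content[:cursor] + f"[img:{pic.uri}]" + content[cursor:]
    return content  # nothing matched: leave the text unchanged
```

For instance, `insert_picture("ab", 1, "insert a picture of cat snow from web")` yields `"a[img:web/cat_snow.jpg]b"`.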

Optionally, the text editing module 630 is specifically configured to:

If the target behavior intent is the content deletion sub-intent, determine the start position and end position of the content to be deleted according to the user text information;

Perform a deletion operation on the text content in the editing area according to the start position and end position of the content to be deleted.
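The deletion sub-intent reduces to finding a character span and cutting it out. The sketch below assumes a hypothetical spoken form "delete from <first words> to <last words>". The regular expression and function names are illustrative only. Because the first group is greedy, a phrase containing " to " is split at its last occurrence.

```python
import re
from typing import Optional, Tuple


def deletion_span(content: str, user_text: str) -> Optional[Tuple[int, int]]:
    """Map 'delete from <start> to <end>' to (start, end) offsets, end exclusive."""
    m = re.fullmatch(r"delete from (.+) to (.+)", user_text.strip())
    if not m:
        return None
    first, last = m.group(1), m.group(2)
    start = content.find(first)            # start position of the content
    if start < 0:
        return None
    last_pos = content.find(last, start)   # end phrase must follow the start
    if last_pos < 0:
        return None
    return start, last_pos + len(last)


def delete_content(content: str, user_text: str) -> str:
    span = deletion_span(content, user_text)
    if span is None:
        return content                     # positions not found: no edit
    start, end = span
    return content[:start] + content[end:]
```

Searching for the end phrase only after the start position keeps the span well ordered even when the end phrase also occurs earlier in the text.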

Further, the device also includes a punctuation mark adding module 640, which is specifically configured to:

After the acquired user voice information is converted into user text information, process the user text information;

Determine, according to the processing result, at least one of the sentence-break punctuation marks, book-title marks, and emotion punctuation marks contained in the user text information.
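The patent does not specify how the user text information is processed, and a trained punctuation model is the more likely real-world choice. The rule-based sketch below only illustrates the three categories of marks. The title list and the sentence-final particle tables are hypothetical, and the Chinese examples reflect the language the method targets.

```python
# Hypothetical rule tables for illustration only.
BOOK_TITLES = ("红楼梦", "三国演义")
QUESTION_ENDINGS = ("吗", "呢")
EXCLAMATION_ENDINGS = ("啊", "呀")


def add_punctuation(text: str) -> str:
    """Add book-title, emotion, and sentence-break marks to one recognized sentence."""
    for title in BOOK_TITLES:
        text = text.replace(title, f"《{title}》")  # book-title marks
    if text.endswith(QUESTION_ENDINGS):
        return text + "？"                         # emotion mark: question
    if text.endswith(EXCLAMATION_ENDINGS):
        return text + "！"                         # emotion mark: exclamation
    return text + "。"                             # default sentence-break mark
```

For example, `add_punctuation("你读过红楼梦吗")` gives `"你读过《红楼梦》吗？"`.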

Optionally, the trigger condition for matching the user text information against the candidate behavior intents is that the piece of user text information immediately preceding it is the wake text.
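This trigger condition needs only one utterance of state. The sketch below is a minimal illustration. The class name `WakeGatedMatcher`, its return tuples, and the wake text used in the example are hypothetical. Any matcher, such as a keyword table, can be passed in as `match_fn`.

```python
class WakeGatedMatcher:
    """Runs intent matching only when the previous utterance was the wake text."""

    def __init__(self, wake_text, match_fn):
        self.wake_text = wake_text    # hypothetical wake phrase
        self.match_fn = match_fn      # returns an intent or None
        self._prev = None             # previous piece of user text information

    def feed(self, user_text):
        """Return ('wake', None), ('intent', result) or ('dictation', user_text)."""
        prev, self._prev = self._prev, user_text
        if user_text == self.wake_text:
            return "wake", None
        if prev == self.wake_text:
            return "intent", self.match_fn(user_text)
        return "dictation", user_text
```

Gating on the previous utterance keeps ordinary dictation such as "new line" from being misread as a command unless the user has just spoken the wake text.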

Further, the device also includes a candidate behavior intent acquisition module 650, which is specifically configured to:

Obtain the user's candidate behavior intents from the server side before the user text information is matched against the candidate behavior intents;

Correspondingly, the device also includes a user text information feedback module 660, which is specifically configured to:

After the matched candidate behavior intent is taken as the target behavior intent, send the user text information associated with the target behavior intent to the server side, so that the server side determines the user's candidate behavior intents based on the received user text information.
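The pull-then-feedback loop between modules 650 and 660 and the server side could look roughly like the following. This is one possible interpretation under stated assumptions: the in-memory `IntentServer` and `Terminal` classes are hypothetical. "Determining the user's candidate intents" is modeled simply as reordering trigger phrases by that user's usage counts, and the feedback of unmatched text described for Fig. 5 is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class IntentServer:
    """In-memory stand-in for the server side (hypothetical)."""

    def __init__(self, phrases: Dict[str, str]):
        self.phrases = dict(phrases)         # trigger phrase -> intent name
        self.usage = defaultdict(Counter)    # user_id -> intent usage counts
        self.log = defaultdict(list)         # user_id -> reported (text, intent)

    def candidate_intents(self, user_id: str) -> List[Tuple[str, str]]:
        """Per-user candidates: phrases ordered by how often this user used each intent."""
        counts = self.usage[user_id]
        # sorted() is stable, so unused intents keep their configured order.
        return sorted(self.phrases.items(), key=lambda kv: -counts[kv[1]])

    def report(self, user_id: str, user_text: str, intent: str) -> None:
        """Feedback from the terminal about text matched to a target intent."""
        self.log[user_id].append((user_text, intent))
        self.usage[user_id][intent] += 1


class Terminal:
    def __init__(self, server: IntentServer, user_id: str):
        self.server = server
        self.user_id = user_id

    def recognize(self, user_text: str) -> Optional[str]:
        # Pull the user's candidate intents before matching (module 650).
        for phrase, intent in self.server.candidate_intents(self.user_id):
            if phrase in user_text:
                # Send the associated text back to the server side (module 660).
                self.server.report(self.user_id, user_text, intent)
                return intent
        return None
```

Keeping the learning on the server side means the terminal only has to fetch a list and do a cheap local match, in line with the division of labor the embodiment describes.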

In the technical solution of this embodiment, the functional modules cooperate to acquire candidate behavior intents, convert user speech, recognize the converted text, wake up editing operations, add punctuation marks, modify text formatting, insert pictures, and feed back user text information. Correspondingly, the server side determines the candidate behavior intents and, based on the fed-back user text information, redetermines them by verifying, updating, or adding candidate behavior intents. By recognizing the user's spoken behavior intents and concentrating the learning and feedback-based determination of behavior intents on the server side, embodiments of the present invention improve the accuracy of candidate behavior intent determination and spare the user terminal from learning the complexities of the user's speech. By obtaining the candidate behavior intents determined by the server side, the user terminal matches voice writing instructions locally, improving the speed and accuracy of its responses to the user's writing instructions. Searching for and loading pictures in response to voice control commands also enriches the text content and makes adding pictures during writing more efficient, meeting the user's various writing needs and providing a good voice writing experience.

Embodiment four

Fig. 7 is a schematic structural diagram of a terminal provided by Embodiment Four of the present invention, showing a block diagram of an exemplary terminal suitable for implementing embodiments of the present invention. The terminal shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of embodiments of the present invention.


As shown in Fig. 7, terminal 12 takes the form of a general-purpose computing device. Components of terminal 12 may include, but are not limited to, one or more processors 16, a system memory 28, and a bus 18 connecting the different system components (including system memory 28 and processor 16).

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Terminal 12 typically includes a variety of computer-system-readable media. These may be any available media accessible by terminal 12, including volatile and non-volatile media and removable and non-removable media.

System memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Terminal 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 7, commonly referred to as a "hard disk drive"). Although not shown in Fig. 7, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 through one or more data media interfaces. System memory 28 may include at least one program product having a set of (e.g., at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.

Terminal 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, or a display 24), with one or more devices that enable a user to interact with terminal 12, and/or with any device (such as a network card or modem) that enables terminal 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Terminal 12 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 20. As shown, network adapter 20 communicates with the other modules of terminal 12 through bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with terminal 12, including but not limited to microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

Processor 16 executes various functional applications and data processing by running programs stored in system memory 28, for example implementing the voice writing method provided by embodiments of the present invention.

Embodiment five

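Embodiment Five of the present invention further provides a computer-readable storage medium storing a computer program (or computer-executable instructions) which, when executed by a processor, performs a voice writing method comprising: converting the acquired user voice information into user text information during the user's voice writing; matching the user text information against candidate behavior intents and taking the matched candidate behavior intent as the target behavior intent; and editing the text content in the editing area according to the target behavior intent.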

The computer storage medium of embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.

Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.

Computer program code for performing the operations of embodiments of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although embodiments of the present invention have been described in some detail through the above embodiments, they are not limited to those embodiments and may include other equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims

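1. A voice writing method, characterized by comprising:

Converting the acquired user voice information into user text information during the user's voice writing;

Matching the user text information against candidate behavior intents, and taking the matched candidate behavior intent as the target behavior intent;

Editing the text content in the editing area according to the target behavior intent.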

2. The method according to claim 1, wherein the candidate behavior intents include at least one of a picture insertion intent, a word content modification intent, and a text format modification intent;

The word content modification intent includes a content deletion sub-intent and/or a content replacement sub-intent;

The text format modification intent includes at least one of a sentence-break sub-intent, a line-break sub-intent, a bold sub-intent, a separator sub-intent, and a quotation sub-intent.

3. The method according to claim 2, wherein, if the target behavior intent is the picture insertion intent, editing the text content in the editing area according to the target behavior intent comprises:

Calling a picture processing component, and determining a picture source and picture screening conditions according to the user text information;

Obtaining, from the picture source, the target picture to be inserted according to the picture screening conditions;

Inserting the target picture into the text content in the editing area.

4. The method according to claim 2, wherein, if the target behavior intent is the content deletion sub-intent, editing the text content in the editing area according to the target behavior intent comprises:

Determining the start position and end position of the content to be deleted according to the user text information;

Performing a deletion operation on the text content in the editing area according to the start position and end position of the content to be deleted.

5. The method according to claim 1, wherein, after the acquired user voice information is converted into user text information, the method further comprises:

Processing the user text information;

Determining, according to the processing result, at least one of sentence-break punctuation marks, book-title marks, and emotion punctuation marks contained in the user text information.

6. The method according to claim 1, wherein the trigger condition for matching the user text information against the candidate behavior intents is that the piece of user text information immediately preceding it is the wake text.

7. The method according to claim 1, wherein, before the user text information is matched against the candidate behavior intents, the method further comprises: obtaining the user's candidate behavior intents from a server side;

Correspondingly, after the matched candidate behavior intent is taken as the target behavior intent, the method further comprises: sending the user text information associated with the target behavior intent to the server side, so that the server side determines the user's candidate behavior intents based on the received user text information.

8. A voice writing device, characterized by comprising:

A voice conversion module, configured to convert the acquired user voice information into user text information during the user's voice writing;

An intent recognition module, configured to match the user text information against candidate behavior intents and take the matched candidate behavior intent as the target behavior intent;

A text editing module, configured to edit the text content in the editing area according to the target behavior intent.

9. A terminal, characterized by comprising:

One or more processors;

A memory, configured to store one or more programs;

Wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the voice writing method according to any one of claims 1-7.

10. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the voice writing method according to any one of claims 1-7.