CN106570106A - Method and device for converting voice information into expression in input process
- Publication number
- CN106570106A CN106570106A CN201610935621.8A CN201610935621A CN106570106A CN 106570106 A CN106570106 A CN 106570106A CN 201610935621 A CN201610935621 A CN 201610935621A CN 106570106 A CN106570106 A CN 106570106A
- Authority
- CN
- China
- Prior art keywords
- voice
- expression
- keyword
- voice information
- user equipment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/5866—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The invention provides a method and a device for converting voice information into an expression during an input process. The method comprises the following steps: obtaining voice information input via user equipment; performing speech analysis on the voice information and extracting a voice keyword from it; matching the voice keyword to obtain at least one corresponding expression; and providing the at least one expression to the user equipment. The method and device enable expression association and translation of voice information, so that voice information can be converted into an expression that appropriately conveys it. This makes the voice input process richer and more varied, increases the interest and intelligence of application input on the user equipment, and thereby improves the user experience.
Description
Technical field
The present invention relates to the field of computer technology, and more particularly to a technique for converting voice information into an expression during an input process.
Background art

In existing input products, the voice input process is relatively monotonous. The prior art simply records the voice information uttered by the user and transmits it directly, or recognizes the voice information as a text message before sending it, without performing any further association. As a result, the user has few choices when performing voice input, the process is not very engaging, and the user easily becomes bored. Moreover, when the user finds it inconvenient to select an expression and can only input by sending voice, the difficulty of operation increases, causing considerable inconvenience during input.

Therefore, how to provide a technique for converting voice information into an expression during an input process, so as to make voice input more interesting and the voice input process richer and more varied, has become one of the problems that those skilled in the art urgently need to solve.
Summary of the invention

An object of the present invention is to provide a method and a device for converting voice information into an expression during an input process.
According to one aspect of the present invention, there is provided a method for converting voice information into an expression during an input process, wherein the method comprises:

a. obtaining voice information input via user equipment;

b. performing speech analysis on the voice information and extracting a voice keyword from the voice information;

c. matching, according to the voice keyword, at least one expression corresponding to the voice keyword;

d. providing the at least one expression to the user equipment.
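Steps a–d above can be sketched in a few lines. This is a minimal illustration only: the function names and the toy expression library are assumptions for the demo, and a stand-in transcript replaces real voice input; the patent itself does not prescribe any particular data structure.

```python
# Toy keyword -> expression library (illustrative, not from the patent).
EXPRESSION_LIBRARY = {
    "happy": ["😀", "😄"],
    "sad": ["😢"],
}

def extract_voice_keywords(voice_text):
    """Step b stand-in: the voice information is assumed already
    transcribed; keywords are the words found in the library."""
    return [w for w in voice_text.lower().split() if w in EXPRESSION_LIBRARY]

def match_expressions(keywords):
    """Step c: match each voice keyword to at least one expression."""
    matches = []
    for kw in keywords:
        matches.extend(EXPRESSION_LIBRARY[kw])
    return matches

def convert_voice_to_expressions(voice_text):
    """Steps a-d end to end: obtain input, extract, match, provide."""
    keywords = extract_voice_keywords(voice_text)  # step b
    return match_expressions(keywords)             # steps c and d

print(convert_voice_to_expressions("I am so happy today"))
```

Running this yields both library entries for "happy", corresponding to the "at least one expression" of step c.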
Preferably, step d comprises: obtaining at least one expression selected by the user from the at least one expression; and providing the at least one expression selected by the user to the user equipment.

Preferably, step d comprises: providing the at least one expression together with the voice information to the user equipment.

Preferably, step b comprises: extracting, based on the speech analysis of the voice information, voice fields in the voice information as the voice keyword.

Preferably, step c comprises: matching, according to the voice keyword, at least one expression corresponding to the voice keyword in an expression library, wherein the expression library stores mapping relations between pre-stored voice fields and corresponding expressions.
Preferably, step c comprises: matching the phonetic features of the voice keyword against the phonetic features of each pre-stored voice field in the expression library to obtain a corresponding matching degree; and, if the matching degree reaches a predetermined threshold, taking the expression corresponding to that pre-stored voice field as the expression corresponding to the voice keyword.

Preferably, the phonetic features include at least any one of the following:

semantic features;
speech-rate features;
intonation features.
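The matching-degree step above can be sketched as follows, under the assumption that each phonetic feature (semantics, speech rate, intonation) has already been reduced to a number in [0, 1]. The feature vectors, the similarity formula and the 0.8 threshold are illustrative choices, not values given in the patent.

```python
# Pre-stored voice fields: each maps a feature vector
# (semantic, speech-rate, intonation) to an expression.
PRE_STORED_FIELDS = {
    "cheer": ((0.9, 0.6, 0.8), "😄"),
    "gloom": ((0.1, 0.3, 0.2), "😢"),
}

def matching_degree(features_a, features_b):
    """Similarity of two feature vectors: 1 minus the mean
    absolute difference of their components."""
    diffs = [abs(a - b) for a, b in zip(features_a, features_b)]
    return 1.0 - sum(diffs) / len(diffs)

def match_by_features(keyword_features, threshold=0.8):
    """Return the expression of every pre-stored voice field whose
    matching degree with the keyword reaches the threshold."""
    return [expr for vec, expr in PRE_STORED_FIELDS.values()
            if matching_degree(keyword_features, vec) >= threshold]

print(match_by_features((0.85, 0.55, 0.75)))  # close to "cheer"
```

Only the "cheer" field clears the threshold for this input, so a single expression is returned, consistent with the "reaches a predetermined threshold" condition.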
Preferably, step b comprises: converting the voice information into a corresponding text message based on speech recognition of the voice information; and extracting, based on semantic analysis of the text message, a text keyword from the text message as the voice keyword.

Preferably, step d comprises: providing the at least one expression together with the text message to the user equipment.
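The speech-recognition route of step b can be sketched like this. A real speech-to-text engine is assumed to exist elsewhere; here `fake_recognize` simply stands in for it, and the small stop-word list used by the semantic-analysis step is an illustrative assumption.

```python
# Illustrative stop words removed during semantic analysis.
STOP_WORDS = {"i", "am", "so", "the", "a", "today"}

def fake_recognize(audio):
    """Stand-in recognizer: the 'audio' is already a transcript here;
    a production system would call a speech-to-text engine."""
    return audio

def extract_text_keywords(voice_information):
    """Convert voice information to text, then keep the content
    words as text keywords (used as the voice keywords)."""
    text = fake_recognize(voice_information)
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(extract_text_keywords("I am so happy today"))  # ['happy']
```

The surviving content words are what the matching step would then look up in a text-to-expression mapping database.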
According to another aspect of the present invention, there is also provided a conversion device for converting voice information into an expression during an input process, wherein the conversion device comprises:

an acquisition device, for obtaining voice information input via user equipment;

an extraction device, for performing speech analysis on the voice information and extracting a voice keyword from the voice information;

a matching device, for matching, according to the voice keyword, at least one expression corresponding to the voice keyword;

a providing device, for providing the at least one expression to the user equipment.
Preferably, the providing device is used for: obtaining at least one expression selected by the user from the at least one expression; and providing the at least one expression selected by the user to the user equipment.

Preferably, the providing device is used for: providing the at least one expression together with the voice information to the user equipment.

Preferably, the extraction device is used for: extracting, based on the speech analysis of the voice information, voice fields in the voice information as the voice keyword.

Preferably, the matching device is used for: matching, according to the voice keyword, at least one expression corresponding to the voice keyword in an expression library, wherein the expression library stores mapping relations between pre-stored voice fields and corresponding expressions.
Preferably, the matching device is used for: matching the phonetic features of the voice keyword against the phonetic features of each pre-stored voice field in the expression library to obtain a corresponding matching degree; and, if the matching degree reaches a predetermined threshold, taking the expression corresponding to that pre-stored voice field as the expression corresponding to the voice keyword.

Preferably, the phonetic features include at least any one of the following:

semantic features;
speech-rate features;
intonation features.
Preferably, the extraction device comprises:

a recognition unit, for converting the voice information into a corresponding text message based on speech recognition of the voice information;

an extraction unit, for extracting, based on semantic analysis of the text message, a text keyword from the text message as the voice keyword.

Preferably, the providing device is used for: providing the at least one expression together with the text message to the user equipment.
Compared with the prior art, the present invention has the following advantages:

The present invention performs speech analysis on the obtained voice information to extract a corresponding voice keyword, matches the voice keyword to obtain at least one corresponding expression, and provides the expression to the user equipment for the user to select, put on screen and send. It thereby realizes expression association and translation of voice information, converting it into an expression that appropriately conveys the voice information. This makes the voice input process richer and more varied, enhances the interest and intelligence of application input on the user equipment, and improves the user experience.

Further, the present invention extracts voice fields from the voice information as the voice keyword and matches them in the expression library according to a predetermined matching rule, obtaining at least one expression corresponding to the voice keyword. By directly associating voice fields with corresponding expressions, the efficiency and accuracy of expression matching are improved, as is the user experience.

Further, the present invention can also convert the voice information into a text message, extract a text keyword, and match a corresponding expression according to the text keyword in a corresponding text-to-expression mapping database. This enriches the ways of matching expressions to voice information: the user can match expressions for the voice information through different means in different databases, which extends the range and manner of matching and improves matching accuracy.
Description of the drawings

Other features, objects and advantages of the present invention will become more apparent by reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:

Fig. 1 illustrates a structural schematic diagram of a conversion device for converting voice information into an expression during an input process according to one aspect of the present invention;

Fig. 2 illustrates a schematic flow chart of a method for converting voice information into an expression during an input process according to another aspect of the present invention.

The same or similar reference signs in the drawings represent the same or similar parts.
Specific embodiments

Before the exemplary embodiments are discussed in greater detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flow charts. Although a flow chart describes the operations as a sequential process, many of the operations may be implemented in parallel, concurrently or simultaneously. In addition, the order of the operations may be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not included in the drawing. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.

The term "computer equipment" (also referred to as "computer") as used in this context refers to an intelligent electronic device that can execute predetermined processes, such as numerical computation and/or logical computation, by running preset programs or instructions. It may comprise a processor and a memory, with the processor executing instructions pre-stored in the memory to carry out the predetermined process, or carrying it out with hardware such as an ASIC, an FPGA or a DSP, or with a combination of the two. Computer equipment includes, but is not limited to, a server, a personal computer, a notebook computer, a tablet computer, a smart phone, and the like.
The computer equipment comprises user equipment and network equipment. The user equipment includes, but is not limited to, a computer, a smart phone, a PDA, and the like; the network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing (Cloud Computing), wherein cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The computer equipment may run alone to realize the present invention, or it may access a network and realize the present invention through interaction with other computer equipment in the network. The network in which the computer equipment resides includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.

It should be noted that the user equipment, network equipment, network and so on are merely examples; other existing or future computer equipment or networks, where applicable to the present invention, should also be included within the scope of protection of the present invention and are incorporated herein by reference.
The methods discussed below (some of which are illustrated by flow charts) may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments for carrying out the necessary tasks may be stored in a machine-readable or computer-readable medium (such as a storage medium). One or more processors may carry out the necessary tasks.

The specific structures and functional details disclosed here are merely representative and serve the purpose of describing exemplary embodiments of the present invention. The present invention may, however, be embodied in many alternative forms and should not be construed as being limited only to the embodiments set forth herein.

It should be understood that although the terms "first", "second", etc. may be used here to describe individual units, these units should not be limited by these terms. These terms are used only to distinguish one unit from another. For example, without departing from the scope of the exemplary embodiments, a first unit could be termed a second unit, and similarly a second unit could be termed a first unit. The term "and/or" as used here includes any and all combinations of one or more of the associated listed items.

It should be understood that when a unit is referred to as being "connected" or "coupled" to another unit, it can be directly connected or coupled to the other unit, or intervening units may be present. In contrast, when a unit is referred to as being "directly connected" or "directly coupled" to another unit, there are no intervening units. Other words used to describe relationships between units should be interpreted in a like fashion (e.g. "between" versus "directly between", "adjacent" versus "directly adjacent", etc.).

The terminology used here is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments. Unless the context clearly indicates otherwise, the singular forms "a" and "an" as used here are intended to include the plural as well. It should also be understood that the terms "including" and/or "comprising" as used here specify the presence of stated features, integers, steps, operations, units and/or components, and do not preclude the presence or addition of one or more other features, integers, steps, operations, units, components and/or combinations thereof.

It should further be mentioned that in some alternative implementations, the functions/actions mentioned may occur in an order different from that indicated in the drawings. For example, depending on the functions/actions involved, two figures shown in succession may in fact be carried out substantially simultaneously, or sometimes in the reverse order.
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 illustrates a structural schematic diagram of a conversion device for converting voice information into an expression during an input process according to one aspect of the present invention. The conversion device 1 comprises: an acquisition device 101, an extraction device 102, a matching device 103 and a providing device 104.

Here, the conversion device 1 may, for example, be located in computer equipment, which comprises user equipment and network equipment. When the conversion device 1 is located in the network equipment, it communicates with the user equipment via a network: it obtains the voice information input by the user via the user equipment, performs speech analysis on the voice information, extracts the voice keyword in the voice information, matches the voice keyword to obtain at least one corresponding expression, and provides the at least one expression to the user equipment. The user equipment includes, but is not limited to, a personal computer, a portable computer, a tablet computer, a smart phone or a PDA. Those skilled in the art will understand that the above user equipment is merely an example; other existing or future user equipment, where applicable to the present invention, should also be included within the scope of protection of the present invention and is incorporated herein by reference.
The following detailed description takes the case in which the conversion device 1 is located in the user equipment as an example.
The acquisition device 101 obtains the voice information input via the user equipment. Specifically, the acquisition device 101 obtains the voice information by interacting with the user equipment, for example by calling, once or repeatedly, an application programming interface (API) provided by the input application. For example, the user inputs voice information in real time through a voice input tool on the user equipment, such as a microphone; when the user equipment is a mobile terminal such as a mobile phone, the user can utter voice information in real time and the acquisition device 101 obtains it as it is input. For instance, the user starts the voice input program by long-pressing the voice input button of the input application and then utters the voice information; when the input is finished, the user releases the button and the acquisition device 101 stops obtaining voice information. Preferably, the acquisition device 101 may also obtain voice information that the user input in advance, or that is pre-stored on the user equipment, by calling the voice information stored in the user equipment, for example an audio file downloaded over the network, or a stored file of voice information the user once input.

Those skilled in the art will understand that the above ways of obtaining voice information are merely examples; other existing or future ways of obtaining voice information, where applicable to the present invention, should also be included within the scope of protection of the present invention and are incorporated herein by reference.
The extraction device 102 performs speech analysis on the voice information and extracts the voice keyword in the voice information. Specifically, the extraction device 102 performs speech analysis by means of voice segmentation, speech recognition, or a combination of both, and extracts the voice keyword from the voice information, wherein the keywords in the voice keyword may all be in voice form, all in text form, or a combination of the two forms. For example, the extraction device 102 performs voice segmentation on the voice information and extracts a voice keyword in voice form, and the conversion device 1 matches a corresponding expression through the voice-form keyword; or the extraction device 102 performs speech recognition on the voice information, converts it into speech text, and extracts a voice keyword in text form, and the conversion device 1 matches a corresponding expression through the text-form keyword; or the extraction device 102 performs both voice segmentation and speech recognition on the voice information and extracts corresponding voice keywords in both voice form and text form. Further, negation words, or words with a negating sense, in the voice information are identified and handled during voice segmentation or speech recognition, so as to avoid extracting a voice keyword whose meaning is the opposite of that of the voice information.
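The negation handling just described can be illustrated as follows. The small negation list and the "not-" tagging convention are assumptions for the demo; in the described device this would happen inside voice segmentation or speech recognition rather than on plain text.

```python
# Illustrative negation words (an assumed, minimal list).
NEGATIONS = {"not", "no", "never"}

def extract_with_negation(text):
    """Tag a word as negated when a negation word directly precedes
    it, so 'not happy' does not yield the bare keyword 'happy'."""
    words = text.lower().split()
    keywords = []
    for i, w in enumerate(words):
        if w in NEGATIONS:
            continue  # the negation itself is folded into the next word
        if i > 0 and words[i - 1] in NEGATIONS:
            keywords.append("not-" + w)
        else:
            keywords.append(w)
    return keywords

print(extract_with_negation("I am not happy"))  # ['i', 'am', 'not-happy']
```

With the negation folded into the keyword, a later matching step would not mistakenly select a "happy" expression for an unhappy utterance.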
Preferably, the extraction device 102 matches accurate voice keywords for the voice information in a fuzzy-matching manner, using a relevant database stored over the network or on the user equipment. For example, when the voice information obtained by the acquisition device 101 is the (deliberately mispronounced) phrase "lan shou, xiang gu" (literally "blue thin, mushroom"), the voice keywords extracted by the extraction device 102 through speech analysis include the voices "lanshou" and "xianggu" (when the voice keyword is in voice form), or the texts "blue thin" and "mushroom" (when the voice keyword is in text form). By fuzzy matching against a pre-set speech database, the voices "lanshou" and "xianggu" can be matched to the voices "nanshou" and "xiangku" respectively; or, according to a pre-set text database, the texts "blue thin" and "mushroom" can be matched to the texts "feeling bad" and "want to cry". "Feeling bad" and "want to cry" are then added to the voice keyword as an extension, or replace "blue thin" and "mushroom" in the voice keyword, improving the accuracy of the voice keyword. The speech database stores voice fields of conventional keywords, and the text database stores conventional text keywords; both may be located in the user equipment or in a third-party device connected to the user equipment via the network, and both may be updated or re-ordered according to the user's usage records and usage frequency.
Preferably, the extraction device 102 understands the emotion and meaning implied in the voice information through cognitive computing, improving the fit between the extracted voice keyword and the voice information. For example, suppose the voice information obtained by the acquisition device 101 is: "After such a thing happened, could I be happy?" Speech analysis of this voice information readily yields the keyword "happy" among the voice keywords, but what the voice information actually expresses is "unhappy". Through cognitive computing, the extraction device 102 can derive the keyword "unhappy" from the tone and intonation of the voice information, or from previously obtained voice information, thereby improving the accuracy with which the extraction device 102 extracts the voice keyword.
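A toy version of this sentiment override might look like the sketch below. The rhetorical-question trigger, the opposites table and the example sentence are all illustrative assumptions; the device as described would rely on cognitive computing over tone, intonation and earlier voice information rather than a text pattern.

```python
# Assumed table of keywords and their sentiment opposites.
OPPOSITES = {"happy": "unhappy", "calm": "upset"}

def infer_keyword(sentence):
    """Flip a positive keyword when the sentence reads as a
    rhetorical question ('can/could I be ... ?')."""
    text = sentence.lower().rstrip()
    rhetorical = text.endswith("?") and (
        "can i be" in text or "could i be" in text)
    for word, opposite in OPPOSITES.items():
        if word in text:
            return opposite if rhetorical else word
    return None

print(infer_keyword("After such a thing happened, could I be happy?"))
```

Here the literal keyword "happy" is overridden to "unhappy" because of the rhetorical framing, mirroring the example in the text.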
Those skilled in the art will understand that the above ways of performing speech analysis on the voice information are merely examples; other existing or future ways of speech analysis, where applicable to the present invention, should also be included within the scope of protection of the present invention and are incorporated herein by reference.
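The fuzzy-matching correction described above (e.g. "lanshou" matched to "nanshou") can be sketched with a standard string-similarity routine. The small dictionary and the 0.6 cutoff are assumptions for the demo; `difflib`'s ratio merely stands in for whatever matching rule the pre-set database actually applies.

```python
import difflib

# Assumed dictionary of conventional keywords (pinyin transcriptions).
SPEECH_DICTIONARY = ["nanshou", "xiangku", "kaixin"]

def fuzzy_correct(keyword, cutoff=0.6):
    """Replace a mis-heard voice keyword with the closest
    conventional keyword, if any candidate is similar enough."""
    close = difflib.get_close_matches(keyword, SPEECH_DICTIONARY,
                                      n=1, cutoff=cutoff)
    return close[0] if close else keyword

print(fuzzy_correct("lanshou"))  # 'nanshou'
print(fuzzy_correct("xianggu"))  # 'xiangku'
```

An unrecognized keyword falls below the cutoff and is returned unchanged, so the correction never discards input it cannot place.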
Here, the extraction device 102 performs speech analysis on the voice information and extracts voice keywords of different formats, improving the accuracy and flexibility with which the conversion device 1 obtains the corresponding expression through the voice keyword. Preferably, the extraction device 102 combines fuzzy matching and cognitive computing to match voice keywords for the voice information, improving the accuracy of the voice keyword.
The matching device 103 matches, according to the voice keyword, at least one expression corresponding to the voice keyword. Specifically, the matching device 103 matches the voice keyword extracted by the extraction device 102 in a corresponding database, for example an expression library, to obtain at least one corresponding expression. When the voice keyword is in voice form, the matching device 103 obtains at least one corresponding expression in a voice-field-to-expression mapping database, for example by fuzzy matching; when the voice keyword is in text form, the matching device 103 obtains at least one corresponding expression in a text-to-expression mapping database, for example by association matching. Further, the matching device 103 may match at least one expression corresponding to the voice keyword in a history-matching database, which stores the conversion device 1's matching history of voice keywords and corresponding expressions; the matching device 103 can match a corresponding expression for the voice keyword from the stored history of voice keywords and their matched expressions. If no expression corresponding to the voice keyword exists in the history-matching database, the matching device 103 again matches in the corresponding database, for example the expression library, to obtain at least one expression corresponding to the voice keyword. The expressions include, but are not limited to, forms such as emoji, kaomoji (face characters) and GIF images.
Those skilled in the art will understand that the above matching methods are merely examples; other existing or future matching methods, where usable for the present invention, should also be included within the scope of protection of the present invention and are incorporated herein by reference.
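The history-first lookup order described above can be sketched as follows, assuming both stores are plain dictionaries. `HISTORY` caches earlier keyword-to-expression matches and the general library is consulted only when history has no entry; the names and contents are illustrative.

```python
# General expression library and the user's matching history
# (both assumed; the patent does not fix their representation).
EXPRESSION_LIBRARY = {"happy": ["😀", "😄"], "sad": ["😢"]}
HISTORY = {"happy": ["😁"]}  # an expression this user picked before

def match_expression(keyword):
    """Prefer the history-matching database, then fall back to the
    expression library; record new matches back into history."""
    if keyword in HISTORY:
        return HISTORY[keyword]
    result = EXPRESSION_LIBRARY.get(keyword, [])
    if result:
        HISTORY[keyword] = result  # remember for next time
    return result

print(match_expression("happy"))  # from history: ['😁']
print(match_expression("sad"))    # from library: ['😢']
```

Because matches are written back into history, a keyword that once fell through to the library is answered from history on later lookups.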
The providing device 104 provides the at least one expression to the user equipment. Specifically, after the matching device 103 matches at least one expression corresponding to the voice keyword, the providing device 104, for example by calling once or repeatedly a dynamic page technology such as JSP, ASP or PHP, displays the at least one expression in a display frame on the screen of the user equipment, arranged horizontally or vertically, for the user to select. The at least one expression may be displayed on the screen of the user equipment in order of the user's historical usage frequency, and the user may select one expression by clicking on it, or select several expressions at once. Alternatively, the providing device 104 may provide the at least one expression directly to the user equipment, to be put on screen as a message or sent directly. Preferably, the providing device 104 also displays the voice information, or the text message corresponding to the voice information, in the display frame, so that the user can choose what to put on screen or send according to his or her own needs.
Those skilled in the art will understand that the above ways of providing are merely examples; other existing or future ways of providing, where applicable to the present invention, should also be included within the scope of protection of the present invention and are incorporated herein by reference.
Here, the conversion device 1 performs speech analysis on the obtained voice information to extract the corresponding voice keyword, matches the voice keyword to obtain at least one corresponding expression, and provides it to the user equipment for the user to select, put on screen and send. It thereby realizes expression association and translation of voice information, converting it into an expression that appropriately conveys the voice information; this makes the voice input process richer and more varied, enhances the interest and intelligence of application input on the user equipment, and improves the user experience.
Preferably, the providing device 104 obtains at least one expression selected by the user from the at least one expression, and provides the expression(s) selected by the user to the user equipment. Specifically, after the providing device 104 presents the at least one expression to the user, the user may click directly on any one of the expressions to select it; alternatively, the providing device 104 places a check box beside each expression, and the user may tick one expression or several at once, and, after finishing the selection, click the on-screen or send button, whereupon the selection result is supplied to the providing device 104. The providing device 104 then obtains the one or more expressions selected by the user and provides them to the user equipment, or sends them through the user equipment. Here, if the user wishes to input only the expression corresponding to the voice information in the application input on the user equipment, then only the one or more expressions are provided to the user equipment, or sent through it, and no other type of information, such as voice or text, is provided or sent.
Preferably, the providing device 104 provides the at least one expression together with the voice information to the user equipment. Specifically, if the user wants to send the at least one expression matched by the matching device 103 while sending the voice information, then after the providing device 104 obtains the at least one expression selected by the user, it directly provides the selected expression(s) together with the voice information to the user equipment; alternatively, it provides the user with a choice box, and the user may choose, for example by clicking, whether to provide the voice information to the user equipment together with the at least one expression.
Preferably, the extraction element 102 extracts, based on the speech analysis of the voice information, voice fields in the voice information as the voice keywords. Specifically, the extraction element 102 performs automatic speech segmentation on the voice information and disassembles it into a plurality of voice fields, which serve as the voice keywords; alternatively, it analyzes the plurality of voice fields and weeds out those voice fields that clearly carry no practical meaning or have no corresponding expression, such as filler or modal fields, taking the remaining voice field(s) as the voice keywords. Further, the extraction element 102 analyzes and understands the voice information through cognitive computing, intelligently judging the implied meaning of the voice information from its tone, intonation, and speech rate, calls from an extended voice library at least one extended voice field whose meaning matches the voice information, and adds the extended voice field(s) to the voice keywords as an extension. The extended voice library stores voice fields corresponding to commonly used expressions; it may be located in the user equipment or in a third-party device connected to the user equipment via a network, and may be updated or re-sorted according to the user's usage records and usage frequency.
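The extraction just described can be sketched as follows; this is a minimal illustration, not the patent's actual implementation, and the filler list and extended-library contents are hypothetical placeholders.

```python
# Sketch of voice-field extraction: drop meaningless fields, then extend the
# remainder from an extended voice library of same-meaning fields.

FILLER_FIELDS = {"uh", "um", "ah"}  # fields with no practical meaning (illustrative)

# Hypothetical extended voice library: field -> fields with the same meaning.
EXTENDED_LIBRARY = {
    "happy": ["glad", "joyful"],
}

def extract_voice_keywords(voice_fields):
    """Weed out filler fields, then add extended fields as keywords."""
    keywords = [f for f in voice_fields if f not in FILLER_FIELDS]
    extended = []
    for field in keywords:
        extended.extend(EXTENDED_LIBRARY.get(field, []))
    return keywords + extended

print(extract_voice_keywords(["um", "happy", "birthday"]))
# -> ['happy', 'birthday', 'glad', 'joyful']
```

In practice the extended library would be keyed on recognized meaning rather than the literal field, as the text's cognitive-computing discussion suggests.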
Preferably, the matching device 103 matches in an expression library, according to the voice keywords, to obtain at least one expression corresponding to the voice keywords, wherein the expression library stores mapping relations between pre-stored voice fields and corresponding expressions. Specifically, by counting the correspondences between a large number of voice fields and expressions, the mapping relations between voice fields and expressions are stored in the expression library, the voice fields in the library serving as pre-stored voice fields. Each expression in the expression library can correspond to at least one pre-stored voice field; for example, a single expression may correspond to the voice fields "happy", "happiness", "heartily", "makes me laugh", "in high spirits", and so on. Conversely, one voice field can correspond to at least one expression; for example, the voice field "sad" may correspond to several expressions. The matching device 103 can thus match one or more expressions from the expression library according to the voice keywords and provide them to the user equipment for the user to select. Preferably, the expression library stores not only expressions conveying moods as above, but also expressions representing actions or articles. Preferably, the matching device 103, according to at least one voice field among the voice keywords, matches in the expression library pre-stored voice fields whose voice content is similar or close to that of the at least one voice field, and provides the one or more expressions corresponding to those similar or close pre-stored voice fields to the user equipment.
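The many-to-many mapping described above can be sketched as a simple lookup; plain-text labels stand in for the expression images, and all names and contents here are illustrative assumptions rather than the patent's data.

```python
# Sketch of the expression library: each expression lists its pre-stored voice
# fields; one field may belong to several expressions.

EXPRESSION_LIBRARY = {
    "smile_emoji": ["happy", "joyful", "laughing"],   # one expression, several fields
    "cry_emoji":   ["sad", "want to cry"],
    "laugh_emoji": ["laughing", "hilarious"],         # "laughing" maps to two expressions
}

def match_expressions(voice_keyword):
    """Return every expression whose pre-stored fields contain the keyword."""
    return [expr for expr, fields in EXPRESSION_LIBRARY.items()
            if voice_keyword in fields]

print(match_expressions("laughing"))  # -> ['smile_emoji', 'laugh_emoji']
```

A production library would invert this mapping into an index from field to expressions so lookup does not scan every expression.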
The matching device 103 matches the phonetic features of the voice keywords respectively against the phonetic features of the pre-stored voice fields in the expression library to obtain corresponding matching degrees; if a matching degree reaches a predetermined threshold, the expression corresponding to that pre-stored voice field is taken as the expression corresponding to the voice keyword. Specifically, when the voice keyword is a voice field in phonetic format and the pre-stored voice fields in the expression library are also in phonetic format, the matching device 103 judges, according to a certain matching rule, the matching degree between the phonetic features of the at least one voice field and those of the pre-stored voice fields in the expression library, and judges from the size of the matching degree whether an expression corresponding to the voice field exists in the expression library. If the matching degree between the phonetic features of a certain pre-stored voice field A in the expression library and those of a certain voice field corresponding to the voice information reaches a predetermined threshold, such as 90%, then an expression corresponding to that voice field exists in the expression library, and the at least one expression corresponding to pre-stored voice field A is matched as the at least one expression corresponding to the voice information. The phonetic features include, but are not limited to: semantic features, speech-rate features, and intonation features. Specifically, the matching device 103 may match in the expression library according to the semantic features of the voice keywords; if the semantic matching degree between a certain voice field corresponding to the voice information and a certain pre-stored voice field B in the expression library reaches the predetermined threshold, then the at least one expression corresponding to the voice information is successfully matched. Preferably, the matching rule may be determined by a certain training model after extensive sampling, calculation, and repeated testing, and may be updated according to the user's usage. For example, although the voice fields "happiness" and "happy" have different pronunciations, their meanings are close; therefore, when these two voice fields are matched, the matching degree can reach the predetermined threshold, and the at least one expression corresponding to "happy" in the expression library can be matched as the expression corresponding to "happiness". As a further example, "have a meal", "have breakfast", and "have lunch" do not mean exactly the same thing, yet they can match the same expression, so the matching degrees among these three under the matching rule are comparatively high. Furthermore, because a user's speech rate and intonation may differ when happy or unhappy, and some users may have regional accents, it is necessary not only to match the semantic feature but also to take "intonation" or "speech rate" as reference features, analyzing the user's emotion when uttering the voice information through cognitive computing, so as to match relatively accurate expressions for the voice information.
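Threshold-based matching of this kind can be sketched with plain string similarity standing in for the phonetic-feature comparison; the 0.6 cutoff and the library contents are illustrative assumptions (the text itself mentions thresholds such as 90%).

```python
# Sketch of matching-degree computation against pre-stored voice fields:
# accept a candidate expression only when the degree reaches the threshold.
from difflib import SequenceMatcher

PRESTORED_FIELDS = {"happy": "smile_emoji", "sad": "cry_emoji"}  # field -> expression

def match_by_degree(keyword, threshold=0.6):
    """Return (expression, matching degree) pairs whose degree meets the threshold."""
    matches = []
    for field, expression in PRESTORED_FIELDS.items():
        degree = SequenceMatcher(None, keyword, field).ratio()  # matching degree
        if degree >= threshold:
            matches.append((expression, round(degree, 2)))
    return matches

print(match_by_degree("happy"))   # exact field -> degree 1.0
print(match_by_degree("happyy"))  # near match still above the threshold
```

A trained model, as the text describes, would replace the string ratio with a learned similarity over semantic, speech-rate, and intonation features.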
Here, the extraction element 102 extracts the voice fields in the voice information as the voice keywords, and the matching device 103 matches in the expression library according to the voice fields and a predetermined matching rule to obtain at least one expression corresponding to the voice keywords. By directly associating voice fields with corresponding expressions, the efficiency and accuracy of matching corresponding expressions are improved, and the user experience is improved.
Preferably, the extraction element 102 includes a recognition unit 1021 (not shown) and an extraction unit 1022 (not shown).
The recognition unit 1021 converts the voice information into corresponding text information based on speech recognition of the voice information. Specifically, the recognition unit 1021 performs feature extraction on the voice information through speech recognition technology (including framing the speech), then carries out acoustic-model modeling, performs speech recognition on the unknown sequence of speech frames based on the acoustic model, and converts the voice information into corresponding text information.
The extraction unit 1022 extracts text keywords in the text information as the voice keywords, based on semantic analysis of the text information. Specifically, the extraction unit 1022 performs word segmentation on the text information using a full-segmentation technique and extracts the text keywords in the text information as the voice keywords. The matching device 103 matches in a text-expression mapping database according to the voice keywords to obtain at least one expression corresponding to the voice keywords, for example by performing semantic analysis on the text keyword and the text keywords in the text-expression mapping library, judging whether the two are semantically similar or close, and obtaining a similarity; if the similarity reaches a predetermined threshold, such as 95%, the match is successful, and the text keyword can be matched in the text-expression database to obtain the corresponding expression. The providing device 104 provides the at least one expression together with the text information to the user equipment. Specifically, if the user intends to provide both the at least one expression and the text information to the user equipment for sending, the providing device 104 directly provides them to the user equipment by displaying them on screen; if the user changes his or her mind midway, the content to be sent can be selected in the input box by operations such as deletion. Alternatively, the providing device 104 provides the at least one expression and the text information to the user equipment in the form of choice boxes, and the user selects the content to be sent as needed.
Here, the conversion equipment 1 can also convert the voice information into text information, extract text keywords, and match corresponding expressions in a corresponding text-expression mapping database according to the text keywords. This enriches the ways of matching corresponding expressions for the voice information: the user can match corresponding expressions for the voice information in different databases and in different ways, improving the accuracy of matching.
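The text path above (speech recognition, word segmentation, then a text-expression lookup) can be sketched end to end; the recognizer stub, the whitespace-split stand-in for full segmentation, and the mapping contents are all hypothetical, not the patent's actual models.

```python
# End-to-end sketch of the text path: ASR (stubbed) -> keywords -> expressions.

TEXT_EXPRESSION_MAP = {"happy": "smile_emoji", "cry": "cry_emoji"}

def recognize(voice_information):
    """Stub for the acoustic-model speech recognition step."""
    return {"<audio-1>": "so happy today"}.get(voice_information, "")

def text_keywords(text):
    """Stand-in for full-segmentation word cutting: simple whitespace split."""
    return text.split()

def voice_to_expressions(voice_information):
    """Convert voice information to the expressions matched by its keywords."""
    words = text_keywords(recognize(voice_information))
    return [TEXT_EXPRESSION_MAP[w] for w in words if w in TEXT_EXPRESSION_MAP]

print(voice_to_expressions("<audio-1>"))  # -> ['smile_emoji']
```

Exact dictionary membership replaces the semantic-similarity threshold here purely for brevity.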
Fig. 2 illustrates a schematic flow diagram of a method for converting voice information into an expression during input according to a further aspect of the present invention.
In step S201, the conversion equipment 1 obtains voice information input via the user equipment. Specifically, in step S201, the conversion equipment 1 interacts with the user equipment, for example by calling, one or more times, the application programming interface (API) provided by the input application, to obtain the voice information input via the user equipment. For example, through interaction with the user equipment, the user inputs voice information in real time through a voice input instrument thereon, such as a microphone. When the user equipment is a mobile terminal such as a mobile phone, the user can send voice information in real time, and in step S201 the conversion equipment 1 obtains the voice information input by the user in real time via the user equipment. For example, the user starts the voice input program by long-pressing the voice input button of the input application and then utters the voice information; when input is finished, the user releases the button, and in step S201 the conversion equipment 1 stops obtaining voice information. Preferably, in step S201, the conversion equipment 1 can obtain voice information that the user pre-entered via the user equipment by calling voice information pre-stored in the user equipment, or obtain voice information pre-stored via the user equipment, for example an audio file downloaded over the network or a stored file of voice information the user once input.
Those skilled in the art will understand that the above manners of obtaining voice information are merely examples; other manners of obtaining voice information, existing or hereafter arising, as applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S202, the conversion equipment 1 performs speech analysis on the voice information and extracts voice keywords in the voice information. Specifically, in step S202, the conversion equipment 1 performs speech analysis on the voice information by voice segmentation, speech recognition, or a combination of both, and extracts the voice keywords in the voice information, wherein the keywords among the voice keywords can be entirely in phonetic format or in text format, or a combination of the two formats. For example, in step S202, the conversion equipment 1 performs voice segmentation on the voice information and extracts voice keywords in phonetic format, and then matches corresponding expressions via the phonetic-format voice keywords; or, in step S202, the conversion equipment 1 performs speech recognition on the voice information, converts it into speech text, and extracts voice keywords in text format, and then matches corresponding expressions via the text-format voice keywords; or, in step S202, the conversion equipment 1 performs voice segmentation and speech recognition on the voice information respectively, and extracts corresponding voice keywords including both phonetic and text formats. Further, negative words in the voice information, or words carrying a negating sense, are identified and processed during voice segmentation or speech recognition, so as to avoid extracting voice keywords whose meaning is contrary to that of the voice information.
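The negation handling mentioned above can be sketched as follows: when a keyword is preceded by a negating word, the extracted keyword is flipped to its opposite so that the matched expression does not contradict the voice information. The word lists are illustrative assumptions.

```python
# Sketch of negation-aware keyword extraction over segmented voice fields.

NEGATIONS = {"not", "never"}
OPPOSITES = {"happy": "unhappy", "unhappy": "happy"}

def extract_with_negation(fields):
    """Flip the field that follows a negating word to its opposite."""
    keywords, negate = [], False
    for field in fields:
        if field in NEGATIONS:
            negate = True                 # remember the pending negation
        elif negate:
            keywords.append(OPPOSITES.get(field, field))
            negate = False
        else:
            keywords.append(field)
    return keywords

print(extract_with_negation(["i", "am", "not", "happy"]))
# -> ['i', 'am', 'unhappy']
```

Real negation scope is of course more subtle (e.g. rhetorical questions, double negation), which is why the text also invokes tone and intonation.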
Preferably, in step S202, the conversion equipment 1 matches accurate voice keywords for the voice information by means of fuzzy matching against relevant databases stored in the network or in the user equipment. For example, when the voice information obtained by the conversion equipment 1 in step S201 is the accented slang phrase "lan shou, xiang gu" (literally "blue thin, mushroom"), the voice keywords extracted by the conversion equipment 1 through speech analysis in step S202 include: the voices "lanshou" and "xianggu" (when the voice keywords are in phonetic format) or the texts "lan shou" and "xiang gu" (when the voice keywords are in text format). By means of fuzzy matching, the voices "lanshou" and "xianggu" can be matched to the voices "nanshou" and "xiangku" according to a pre-arranged speech database, or the texts "lan shou" and "xiang gu" can be matched to the texts "feeling bad" and "wanting to cry" according to a pre-arranged text database; "feeling bad" and "wanting to cry" are then added to the voice keywords as an extension, or replace "lan shou" and "xiang gu" in the voice keywords, improving the accuracy of the voice keywords. The speech database stores voice fields of commonly used keywords, and the text database stores commonly used text keywords; the speech database and the text database may be located in the user equipment or in a third-party device connected to the user equipment via a network, and may be updated or re-sorted according to the user's usage records and usage frequency.
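The accent correction in this example ("lanshou" to "nanshou", "xianggu" to "xiangku") can be sketched with a closest-match lookup over the pre-arranged speech database; `difflib` similarity stands in for the real fuzzy-matching rule, and the database contents are illustrative.

```python
# Sketch of fuzzy matching an accented keyword to its standard form.
import difflib

SPEECH_DATABASE = ["nanshou", "xiangku", "happy"]  # commonly used keywords (illustrative)

def fuzzy_correct(keyword, cutoff=0.6):
    """Replace the keyword with its closest database entry, if one is close enough."""
    close = difflib.get_close_matches(keyword, SPEECH_DATABASE, n=1, cutoff=cutoff)
    return close[0] if close else keyword  # keep the original when nothing is close

print(fuzzy_correct("lanshou"))  # -> 'nanshou'
print(fuzzy_correct("xianggu"))  # -> 'xiangku'
```

A phonetically informed distance (over pinyin initials/finals) would correct accents more reliably than raw character similarity.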
Preferably, in step S202, the conversion equipment 1 understands the emotion and meaning implied in the voice information through cognitive computing, improving the fit between the extracted voice keywords and the voice information. For example, the voice information obtained by the conversion equipment 1 in step S201 is: "After you did such a thing, how could I be happy?" Performing speech analysis on this voice information readily yields the keyword "happy" among the corresponding voice keywords, but the voice information actually expresses being "unhappy". In step S202, the conversion equipment 1 can, through cognitive computing, infer the keyword "unhappy" from the tone and intonation of the voice information or from previously obtained speech information, thereby improving the accuracy with which the conversion equipment 1 extracts the voice keywords in step S202.
Those skilled in the art will understand that the above manners of performing speech analysis on the voice information are merely examples; manners of speech analysis existing or hereafter arising, as applicable to the present invention, shall all fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, in step S202, the conversion equipment 1 performs speech analysis on the voice information and extracts voice keywords of different formats, improving the accuracy and flexibility with which the conversion equipment 1 obtains corresponding expressions via the voice keywords. Preferably, in step S202, the conversion equipment 1 matches voice keywords for the voice information by comprehensively employing fuzzy matching and cognitive computing, improving the accuracy of the voice keywords.
In step S203, the conversion equipment 1 matches, according to the voice keywords, to obtain at least one expression corresponding to the voice keywords. Specifically, in step S203, the conversion equipment 1 matches, according to the voice keywords extracted in step S202, in a corresponding database, for example an expression library, to obtain at least one expression corresponding to the voice keywords. When the voice keywords are in phonetic format, in step S203 the conversion equipment 1 obtains, according to the voice keywords, at least one corresponding expression in a voice-field-expression mapping database by, for example, fuzzy matching; when the voice keywords are in text format, in step S203 the conversion equipment 1 obtains, according to the voice keywords, at least one corresponding expression in a text-expression mapping database by, for example, association matching. Further, in step S203, the conversion equipment 1 matches at least one expression corresponding to the voice keywords in a history match database, wherein the history match database stores the matching history of voice keywords and corresponding expressions from previous executions of step S203; the conversion equipment 1 can match corresponding expressions for the voice keywords based on the matching history of voice keywords and corresponding expressions stored in the history match database. If no expression corresponding to the voice keywords exists in the history match database, in step S203 the conversion equipment 1 again matches expressions corresponding to the voice keywords in a corresponding database, for example the expression library, to obtain at least one expression corresponding to the voice keywords. The expressions include, but are not limited to, forms such as emoji, kaomoji (face characters), and gif images.
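The lookup order described for step S203 (history match database first, then the expression library as a fallback) can be sketched as follows; the dictionary contents are illustrative placeholders.

```python
# Sketch of step S203's lookup order: history match database, then the library.

HISTORY_MATCHES = {"happy": ["smile_emoji"]}                       # past keyword->expression records
EXPRESSION_LIBRARY = {"sad": ["cry_emoji"], "happy": ["grin_emoji"]}

def match_with_history(keyword):
    """Prefer a historical match; otherwise fall back to the expression library."""
    if keyword in HISTORY_MATCHES:
        return HISTORY_MATCHES[keyword]
    return EXPRESSION_LIBRARY.get(keyword, [])

print(match_with_history("happy"))  # -> ['smile_emoji']  (from history)
print(match_with_history("sad"))    # -> ['cry_emoji']    (library fallback)
```

A fuller version would also write each new library match back into the history database so later queries hit the faster path.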
Those skilled in the art will understand that the above matching manners are merely examples; matching manners existing or hereafter arising, if usable with the present invention, shall all fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S204, the conversion equipment 1 provides the at least one expression to the user equipment. Specifically, after the conversion equipment 1 matches and obtains at least one expression corresponding to the voice keywords in step S203, in step S204 the conversion equipment 1, for example by calling, one or more times, a dynamic page technology such as JSP, ASP, or PHP, displays the at least one expression on the screen of the user equipment in a display frame, arranged horizontally or vertically, for the user to select, wherein the at least one expression can be displayed on the screen of the user equipment according to the user's historical usage frequency, and the user can check one expression or several expressions at once. Alternatively, in step S204, the conversion equipment 1 can directly provide the at least one expression to the user equipment for on-screen display as information or for direct sending. Preferably, in step S204, the conversion equipment 1 also displays the voice information, or the text information corresponding to the voice information, in the display frame for the user, and the user selects the content to be displayed on screen or sent according to his or her own needs.
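Ordering the candidate expressions by the user's historical usage frequency, as the display step describes, can be sketched as a simple sort; the frequency counts are hypothetical.

```python
# Sketch of ordering candidate expressions for display by historical usage.

USAGE_FREQUENCY = {"smile_emoji": 12, "grin_emoji": 3, "cry_emoji": 7}  # illustrative counts

def order_for_display(candidates):
    """Most-used expressions first; unseen expressions (count 0) go last."""
    return sorted(candidates, key=lambda e: USAGE_FREQUENCY.get(e, 0), reverse=True)

print(order_for_display(["grin_emoji", "cry_emoji", "smile_emoji"]))
# -> ['smile_emoji', 'cry_emoji', 'grin_emoji']
```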
Those skilled in the art will understand that the above presentation manners are merely examples; other presentation manners, existing or hereafter arising, as applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the conversion equipment 1 performs speech analysis on the acquired voice information to extract corresponding voice keywords, obtains at least one expression corresponding to the voice keywords by matching, and provides the expression(s) to the user equipment for the user to select, display on screen, and send. This realizes the association and translation of voice information into expressions, converting it into expressions that appropriately represent the voice information, making the voice input process richer and more varied, enhancing the interest of using input on the user equipment, increasing the intelligence of input, and improving the user experience.
Preferably, in step S204, the conversion equipment 1 obtains at least one expression selected by the user from the at least one expression, and provides the user-selected expression(s) to the user equipment. Specifically, after the conversion equipment 1 supplies the at least one expression to the user, the user can directly click any one of the expressions to select it; alternatively, in step S204, the conversion equipment 1 arranges a check box beside each expression, and the user can check one or several expressions at once and, after selection is complete, click an on-screen or send button, whereupon the selection result is provided to the conversion equipment 1. The conversion equipment 1 obtains the one or more expressions selected by the user and provides them to the user equipment, or sends them via the user equipment. Here, if the user wishes to input only the expression corresponding to the voice information in the input application of the user equipment, then only the one or more expressions are provided to, or sent via, the user equipment, and no other types of information, such as voice or text, are provided or sent.
Preferably, in step S204, the conversion equipment 1 provides the at least one expression together with the voice information to the user equipment. Specifically, if the user wants to send the matched at least one expression while sending the voice information, then in step S204, after the conversion equipment 1 obtains the at least one expression selected by the user, it directly provides the selected expression(s) together with the voice information to the user equipment; alternatively, it provides the user with a choice box, and the user can choose, for example by clicking, whether to provide the voice information to the user equipment together with the at least one expression.
Preferably, in step S202, the conversion equipment 1 extracts, based on the speech analysis of the voice information, voice fields in the voice information as the voice keywords. Specifically, in step S202, the conversion equipment 1 performs automatic speech segmentation on the voice information and disassembles it into a plurality of voice fields, which serve as the voice keywords; alternatively, it analyzes the plurality of voice fields and weeds out those that clearly carry no practical meaning or have no corresponding expression, such as filler or modal fields, taking the remaining voice field(s) as the voice keywords. Further, in step S202, the conversion equipment 1 analyzes and understands the voice information through cognitive computing, intelligently judging the implied meaning of the voice information from its tone, intonation, and speech rate, calls from an extended voice library at least one extended voice field whose meaning matches the voice information, and adds the extended voice field(s) to the voice keywords as an extension. The extended voice library stores voice fields corresponding to commonly used expressions; it may be located in the user equipment or in a third-party device connected to the user equipment via a network, and may be updated or re-sorted according to the user's usage records and usage frequency.
Preferably, in step S203, the conversion equipment 1 matches in an expression library, according to the voice keywords, to obtain at least one expression corresponding to the voice keywords, wherein the expression library stores mapping relations between pre-stored voice fields and corresponding expressions. Specifically, by counting the correspondences between a large number of voice fields and expressions, the mapping relations between voice fields and expressions are stored in the expression library, the voice fields in the library serving as pre-stored voice fields. Each expression in the expression library can correspond to at least one pre-stored voice field; for example, a single expression may correspond to the voice fields "happy", "happiness", "heartily", "makes me laugh", "in high spirits", and so on. Conversely, one voice field can correspond to at least one expression; for example, the voice field "sad" may correspond to several expressions. In step S203, the conversion equipment 1 can match one or more expressions from the expression library according to the voice keywords and provide them to the user equipment for the user to select. Preferably, the expression library stores not only expressions conveying moods as above, but also expressions representing actions or articles. Preferably, in step S203, the conversion equipment 1, according to at least one voice field among the voice keywords, matches in the expression library pre-stored voice fields whose voice content is similar or close to that of the at least one voice field among the voice keywords, and provides the one or more expressions corresponding to those similar or close pre-stored voice fields to the user equipment.
In step S203, conversion equipment 1 by the phonetic feature of the voice key word respectively with the expression storehouse in
The phonetic feature of pre-stored voice field is matched, and obtains corresponding matching degree;If the matching degree reaches predetermined threshold, by institute
The corresponding expression of pre-stored voice field is stated as the corresponding expression of the voice key word.Specifically, when the voice key word
For the voice fields of phonetic matrix, the pre-stored voice field expressed one's feelings in storehouse for phonetic matrix pre-stored voice field when, in step
In S203, conversion equipment 1 judges phonetic feature and the expression storehouse of voice fields described at least one according to certain matched rule
In matching degree between the phonetic feature of voice fields that prestores, judged according to the size of the matching degree be in the expression storehouse
It is no to there is expression corresponding with the voice fields.If expression storehouse in certain predetermined voice field A phonetic feature with it is described
Matching degree between the phonetic feature of corresponding certain voice fields of voice messaging reaches predetermined threshold, and such as 90%, then express one's feelings
The expression library stores, for each pre-stored voice field A, at least one corresponding expression, so that the voice information can be matched to at least one expression. Here, the phonetic features include, but are not limited to: semantic features; speech-rate features; intonation features. Specifically, in step S203, conversion device 1 may perform matching in the expression library according to the semantic features of the voice keyword; if the semantic matching degree between a certain voice field corresponding to the voice information and a certain pre-stored voice field B in the expression library reaches a predetermined threshold, at least one expression corresponding to the voice information is matched successfully. Preferably, the matching rules are determined by a training model after extensive sampling and calculation, repeated testing, and strategy updates based on the usage habits of users. For example, "gaoxing" and "kaixin" are two voice fields with different pronunciations but both meaning "happy"; when these two voice fields are matched, the matching degree can reach the predetermined threshold, so the at least one expression corresponding to "kaixin" in the expression library may be matched as the expression corresponding to "happy". As a further example, "have a meal", "have breakfast", and "have lunch" differ in wording, yet all three may match the same expression; under the matching rules, the matching degrees of all three are comparatively high. Further, a user's speech rate and intonation may differ when the user is happy or unhappy, and some users have regional accents. In such cases it is not sufficient to match only the semantic phonetic feature: "intonation" or "speech rate" may also serve as reference features, and cognitive computing may be used to analyze the user's mood when the voice information was sent, so that a more accurate expression is matched for the voice information.
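The threshold-based matching described above can be sketched as follows. This is an illustrative example, not the patent's implementation: the expression library, the synonym table standing in for a trained semantic model, and the threshold value are all invented for demonstration.

```python
# Hypothetical expression library: pre-stored voice field -> expression.
EXPRESSION_LIBRARY = {
    "happy": "😊",
    "eat": "🍚",
}

# Toy synonym groups standing in for a trained semantic-similarity model.
SYNONYMS = {
    "happy": {"happy", "glad", "joyful", "kaixin", "gaoxing"},
    "eat": {"eat", "breakfast", "lunch", "dinner"},
}

def semantic_match_degree(keyword: str, stored_field: str) -> float:
    """Return a matching degree in [0, 1]. A real system would compute
    this with embeddings or a trained model rather than a lookup table."""
    if keyword == stored_field:
        return 1.0
    if keyword in SYNONYMS.get(stored_field, set()):
        return 0.9
    return 0.0

def match_expression(keyword: str, threshold: float = 0.8):
    """Return the expression whose pre-stored voice field best matches
    the keyword, provided the matching degree reaches the threshold."""
    best, best_degree = None, 0.0
    for field, expr in EXPRESSION_LIBRARY.items():
        degree = semantic_match_degree(keyword, field)
        if degree >= threshold and degree > best_degree:
            best, best_degree = expr, degree
    return best

print(match_expression("glad"))   # synonym of "happy" -> 😊
print(match_expression("lunch"))  # synonym of "eat" -> 🍚
```

Words that pronounce differently but mean the same thing ("glad", "kaixin") clear the threshold and map to the same expression, mirroring the "gaoxing"/"kaixin" example in the description.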
Here, in step S202, conversion device 1 extracts the voice fields in the voice information as the voice keywords; in step S203, conversion device 1 performs matching in the expression library according to the voice fields and the predetermined matching rules, and obtains at least one expression corresponding to the voice keyword. By directly associating voice fields with their corresponding expressions, the efficiency and accuracy of expression matching are improved, and the user experience is enhanced.
Preferably, step S202 includes sub-step S2021 (not shown) and sub-step S2022 (not shown).
In sub-step S2021, conversion device 1 converts the voice information into corresponding text information based on speech recognition of the voice information. Specifically, in sub-step S2021, conversion device 1 performs feature extraction on the voice information through speech-recognition technology (including framing the speech), then builds an acoustic model, performs speech recognition on the unknown sequence of voice frames based on the acoustic model, and thereby converts the voice information into corresponding text information.
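The framing step mentioned above can be sketched as below. This is a generic illustration of splitting a waveform into overlapping frames before feature extraction; the frame and hop sizes are common conventions (25 ms / 10 ms at 16 kHz), not values specified by the patent.

```python
def frame_signal(samples, frame_len=400, hop=160):
    """Split a 1-D list of audio samples into overlapping frames.
    With 16 kHz audio, 400 samples = 25 ms and 160 samples = 10 ms."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

signal = [0.0] * 1600  # 100 ms of silence at a hypothetical 16 kHz rate
frames = frame_signal(signal)
print(len(frames))  # 8 frames of 400 samples each
```

Each frame would then be converted to acoustic features (e.g. MFCCs) and scored against the acoustic model to recognize the voice-frame sequence.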
In sub-step S2022, conversion device 1 extracts the text keywords in the text information as the voice keywords, based on semantic analysis of the text information. Specifically, in sub-step S2022, conversion device 1 segments the text information using a full-segmentation word-segmentation technique, and extracts the text keywords in the text information as the voice keywords. In step S203, conversion device 1 matches in a text-expression mapping database according to the voice keyword to obtain at least one expression corresponding to the voice keyword; for example, it performs semantic analysis between the text keyword and the text keywords in the text-expression mapping library, judges whether the two are semantically similar or close, and obtains a similarity degree. If the similarity reaches a predetermined threshold, for example 95%, the match succeeds, and the text keyword obtains its corresponding expression from the text-expression database. In step S204, conversion device 1 provides the at least one expression together with the text information to the user equipment. Specifically, if the user intends to send both the at least one expression and the text information, conversion device 1 directly provides them to the user equipment by placing them on screen; if the user changes his or her mind midway, the content to be sent can be selected through operations such as deleting in the input box. Alternatively, conversion device 1 provides the at least one expression and the text information to the user equipment in the form of a choice box, from which the user selects the content to be sent as needed.
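The text-keyword matching path can be sketched as follows. This is an invented illustration: the mapping database contents are hypothetical, a trivial whitespace split stands in for full-segmentation word segmentation, and `difflib` string similarity stands in for the semantic-similarity analysis; only the 95% threshold follows the example in the description.

```python
import difflib

# Hypothetical text-to-expression mapping database.
TEXT_EXPRESSION_DB = {"happy": "😊", "sad": "😢"}

def segment(text):
    """Placeholder for full-segmentation word segmentation; a real input
    method would use a proper segmentation library here."""
    return text.lower().split()

def match_expressions(text, threshold=0.95):
    """Compare each segmented keyword against the database keys and
    collect expressions whose similarity reaches the threshold."""
    results = []
    for word in segment(text):
        for key, expr in TEXT_EXPRESSION_DB.items():
            sim = difflib.SequenceMatcher(None, word, key).ratio()
            if sim >= threshold:
                results.append(expr)
    return results

print(match_expressions("I am so happy today"))  # ['😊']
```

The matched expressions would then be offered alongside the recognized text, either placed on screen directly or presented in a choice box as the description states.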
Here, conversion device 1 can also convert the voice information into text information, extract text keywords, and match corresponding expressions in a corresponding text-expression mapping database according to the text keywords. This enriches the ways in which expressions are matched for voice information: the user can obtain matching expressions for the voice information through different methods and in different databases, which improves the accuracy of matching.
The present invention is applicable to all input-method products, and to all products with voice-related functions such as voice input, voice search, and voice association.
Preferably, the present invention also provides a computer device comprising one or more processors and a memory, the memory being used to store one or more computer programs; when the one or more computer programs are executed by the one or more processors, the one or more processors are caused to perform the operations of any one of steps S201 to S204.
It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware; for example, the devices of the present invention may be realized using an application-specific integrated circuit (ASIC) or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to realize the steps or functions described above. Likewise, the software program of the present invention (including related data structures) may be stored in a computer-readable recording medium, for example a RAM memory, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of the present invention may be implemented in hardware, for example as circuits that cooperate with a processor to perform each step or function.
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential attributes. Therefore, from whichever point of view, the embodiments should be regarded as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and it is intended that all changes falling within the meaning and scope of equivalency of the claims be included in the present invention. No reference sign in a claim should be construed as limiting the claim involved. In addition, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in a system claim may also be realized by one unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any specific order.
Claims (19)
1. A method for converting voice information into an expression during an input process, wherein the method comprises:
a. obtaining voice information input via a user equipment;
b. performing speech analysis on the voice information, and extracting a voice keyword from the voice information;
c. matching, according to the voice keyword, to obtain at least one expression corresponding to the voice keyword;
d. providing the at least one expression to the user equipment.
2. The method according to claim 1, wherein step d comprises:
obtaining at least one expression selected by the user from the at least one expression;
providing the at least one expression selected by the user to the user equipment.
3. The method according to claim 1 or 2, wherein step d comprises:
providing the at least one expression together with the voice information to the user equipment.
4. The method according to any one of claims 1 to 3, wherein step b comprises:
extracting, based on the speech analysis of the voice information, the voice fields in the voice information as the voice keyword.
5. The method according to claim 4, wherein step c comprises:
matching, according to the voice keyword, in an expression library to obtain at least one expression corresponding to the voice keyword,
wherein the expression library stores mapping relations between pre-stored voice fields and corresponding expressions.
6. The method according to claim 5, wherein step c comprises:
matching the phonetic features of the voice keyword respectively against the phonetic features of the pre-stored voice fields in the expression library, to obtain corresponding matching degrees;
if a matching degree reaches a predetermined threshold, taking the expression corresponding to that pre-stored voice field as the expression corresponding to the voice keyword.
7. The method according to claim 6, wherein the phonetic features comprise at least any one of the following:
semantic features;
speech-rate features;
intonation features.
8. The method according to claim 1 or 2, wherein step b comprises:
converting, based on speech recognition of the voice information, the voice information into corresponding text information;
extracting, based on semantic analysis of the text information, the text keywords in the text information as the voice keyword.
9. The method according to claim 8, wherein step d comprises:
providing the at least one expression together with the text information to the user equipment.
10. A conversion device for converting voice information into an expression during an input process, wherein the conversion device comprises:
an obtaining device, for obtaining voice information input via a user equipment;
an extracting device, for performing speech analysis on the voice information and extracting a voice keyword from the voice information;
a matching device, for matching, according to the voice keyword, to obtain at least one expression corresponding to the voice keyword;
a providing device, for providing the at least one expression to the user equipment.
11. The conversion device according to claim 10, wherein the providing device is configured to:
obtain at least one expression selected by the user from the at least one expression;
provide the at least one expression selected by the user to the user equipment.
12. The conversion device according to claim 10 or 11, wherein the providing device is configured to:
provide the at least one expression together with the voice information to the user equipment.
13. The conversion device according to any one of claims 10 to 12, wherein the extracting device is configured to:
extract, based on the speech analysis of the voice information, the voice fields in the voice information as the voice keyword.
14. The conversion device according to claim 13, wherein the matching device is configured to:
match, according to the voice keyword, in an expression library to obtain at least one expression corresponding to the voice keyword,
wherein the expression library stores mapping relations between pre-stored voice fields and corresponding expressions.
15. The conversion device according to claim 14, wherein the matching device is configured to:
match the phonetic features of the voice keyword respectively against the phonetic features of the pre-stored voice fields in the expression library, to obtain corresponding matching degrees;
if a matching degree reaches a predetermined threshold, take the expression corresponding to that pre-stored voice field as the expression corresponding to the voice keyword.
16. The conversion device according to claim 15, wherein the phonetic features comprise at least any one of the following:
semantic features;
speech-rate features;
intonation features.
17. The conversion device according to claim 10 or 11, wherein the extracting device comprises:
a recognition unit, for converting, based on speech recognition of the voice information, the voice information into corresponding text information;
an extraction unit, for extracting, based on semantic analysis of the text information, the text keywords in the text information as the voice keyword.
18. The conversion device according to claim 17, wherein the providing device is configured to:
provide the at least one expression together with the text information to the user equipment.
19. A computer device, comprising:
one or more processors; and
a memory, for storing one or more computer programs;
wherein, when the one or more computer programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610935621.8A CN106570106A (en) | 2016-11-01 | 2016-11-01 | Method and device for converting voice information into expression in input process |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106570106A true CN106570106A (en) | 2017-04-19 |
Family
ID=58533532
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610935621.8A Pending CN106570106A (en) | 2016-11-01 | 2016-11-01 | Method and device for converting voice information into expression in input process |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106570106A (en) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107147936A (en) * | 2017-04-21 | 2017-09-08 | 合网络技术(北京)有限公司 | The display control method and device of barrage |
CN107153496A (en) * | 2017-07-04 | 2017-09-12 | 北京百度网讯科技有限公司 | Method and apparatus for inputting emotion icons |
CN107360085A (en) * | 2017-08-04 | 2017-11-17 | 珠海格力电器股份有限公司 | A kind of method for message transmission, apparatus and system |
CN107450746A (en) * | 2017-08-18 | 2017-12-08 | 联想(北京)有限公司 | A kind of insertion method of emoticon, device and electronic equipment |
CN107479723A (en) * | 2017-08-18 | 2017-12-15 | 联想(北京)有限公司 | A kind of insertion method of emoticon, device and electronic equipment |
CN108766416A (en) * | 2018-04-26 | 2018-11-06 | Oppo广东移动通信有限公司 | Audio recognition method and Related product |
CN109254669A (en) * | 2017-07-12 | 2019-01-22 | 腾讯科技(深圳)有限公司 | A kind of expression picture input method, device, electronic equipment and system |
CN109347721A (en) * | 2018-09-28 | 2019-02-15 | 维沃移动通信有限公司 | A kind of method for sending information and terminal device |
CN109739462A (en) * | 2018-03-15 | 2019-05-10 | 北京字节跳动网络技术有限公司 | A kind of method and device of content input |
CN109783677A (en) * | 2019-01-21 | 2019-05-21 | 三角兽(北京)科技有限公司 | Answering method, return mechanism, electronic equipment and computer readable storage medium |
CN110019885A (en) * | 2017-08-01 | 2019-07-16 | 北京搜狗科技发展有限公司 | A kind of expression data recommended method and device |
CN110096701A (en) * | 2019-04-16 | 2019-08-06 | 珠海格力电器股份有限公司 | Message conversion process method, device, storage medium and electronic equipment |
CN110377842A (en) * | 2019-06-14 | 2019-10-25 | 北京字节跳动网络技术有限公司 | Voice remark display methods, system, medium and electronic equipment |
WO2020037921A1 (en) * | 2018-08-23 | 2020-02-27 | 平安科技(深圳)有限公司 | Expression picture prompting method and apparatus, computer device, and storage medium |
CN110910898A (en) * | 2018-09-15 | 2020-03-24 | 华为技术有限公司 | Voice information processing method and device |
CN111046210A (en) * | 2018-10-11 | 2020-04-21 | 北京搜狗科技发展有限公司 | Information recommendation method and device and electronic equipment |
CN111106995A (en) * | 2019-12-26 | 2020-05-05 | 腾讯科技(深圳)有限公司 | Message display method, device, terminal and computer readable storage medium |
CN111240497A (en) * | 2020-01-15 | 2020-06-05 | 北京搜狗科技发展有限公司 | Method and device for inputting through input method and electronic equipment |
CN111724799A (en) * | 2019-03-21 | 2020-09-29 | 阿里巴巴集团控股有限公司 | Application method, device and equipment of sound expression and readable storage medium |
CN112243061A (en) * | 2020-11-03 | 2021-01-19 | 珠海格力电器股份有限公司 | Communication method of mobile terminal and mobile terminal |
CN112331209A (en) * | 2020-11-03 | 2021-02-05 | 建信金融科技有限责任公司 | Method and device for converting voice into text, electronic equipment and readable storage medium |
CN112365893A (en) * | 2020-10-30 | 2021-02-12 | 上海中通吉网络技术有限公司 | Voice conversion method, device and equipment |
WO2021115351A1 (en) * | 2019-12-10 | 2021-06-17 | 华为技术有限公司 | Method and device for making emoji |
WO2022041177A1 (en) * | 2020-08-29 | 2022-03-03 | 深圳市永兴元科技股份有限公司 | Communication message processing method, device, and instant messaging client |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104407834A (en) * | 2014-11-13 | 2015-03-11 | 腾讯科技(成都)有限公司 | Message input method and device |
CN104866275A (en) * | 2015-03-25 | 2015-08-26 | 百度在线网络技术(北京)有限公司 | Image information acquisition method and device |
CN104933113A (en) * | 2014-06-06 | 2015-09-23 | 北京搜狗科技发展有限公司 | Expression input method and device based on semantic understanding |
CN106024014A (en) * | 2016-05-24 | 2016-10-12 | 努比亚技术有限公司 | Voice conversion method and device and mobile terminal |
CN106020504A (en) * | 2016-05-17 | 2016-10-12 | 百度在线网络技术(北京)有限公司 | Information output method and device |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107147936A (en) * | 2017-04-21 | 2017-09-08 | 合网络技术(北京)有限公司 | The display control method and device of barrage |
CN107153496B (en) * | 2017-07-04 | 2020-04-28 | 北京百度网讯科技有限公司 | Method and device for inputting emoticons |
CN107153496A (en) * | 2017-07-04 | 2017-09-12 | 北京百度网讯科技有限公司 | Method and apparatus for inputting emotion icons |
US10984226B2 (en) | 2017-07-04 | 2021-04-20 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for inputting emoticon |
CN109254669B (en) * | 2017-07-12 | 2022-05-10 | 腾讯科技(深圳)有限公司 | Expression picture input method and device, electronic equipment and system |
CN109254669A (en) * | 2017-07-12 | 2019-01-22 | 腾讯科技(深圳)有限公司 | A kind of expression picture input method, device, electronic equipment and system |
CN110019885B (en) * | 2017-08-01 | 2021-10-15 | 北京搜狗科技发展有限公司 | Expression data recommendation method and device |
CN110019885A (en) * | 2017-08-01 | 2019-07-16 | 北京搜狗科技发展有限公司 | A kind of expression data recommended method and device |
CN107360085A (en) * | 2017-08-04 | 2017-11-17 | 珠海格力电器股份有限公司 | A kind of method for message transmission, apparatus and system |
CN107450746A (en) * | 2017-08-18 | 2017-12-08 | 联想(北京)有限公司 | A kind of insertion method of emoticon, device and electronic equipment |
CN107479723A (en) * | 2017-08-18 | 2017-12-15 | 联想(北京)有限公司 | A kind of insertion method of emoticon, device and electronic equipment |
CN109739462A (en) * | 2018-03-15 | 2019-05-10 | 北京字节跳动网络技术有限公司 | A kind of method and device of content input |
CN108766416A (en) * | 2018-04-26 | 2018-11-06 | Oppo广东移动通信有限公司 | Audio recognition method and Related product |
CN108766416B (en) * | 2018-04-26 | 2021-06-25 | Oppo广东移动通信有限公司 | Speech recognition method and related product |
WO2020037921A1 (en) * | 2018-08-23 | 2020-02-27 | 平安科技(深圳)有限公司 | Expression picture prompting method and apparatus, computer device, and storage medium |
CN110910898A (en) * | 2018-09-15 | 2020-03-24 | 华为技术有限公司 | Voice information processing method and device |
CN109347721A (en) * | 2018-09-28 | 2019-02-15 | 维沃移动通信有限公司 | A kind of method for sending information and terminal device |
CN111046210A (en) * | 2018-10-11 | 2020-04-21 | 北京搜狗科技发展有限公司 | Information recommendation method and device and electronic equipment |
CN109783677A (en) * | 2019-01-21 | 2019-05-21 | 三角兽(北京)科技有限公司 | Answering method, return mechanism, electronic equipment and computer readable storage medium |
CN111724799A (en) * | 2019-03-21 | 2020-09-29 | 阿里巴巴集团控股有限公司 | Application method, device and equipment of sound expression and readable storage medium |
CN110096701A (en) * | 2019-04-16 | 2019-08-06 | 珠海格力电器股份有限公司 | Message conversion process method, device, storage medium and electronic equipment |
CN110377842A (en) * | 2019-06-14 | 2019-10-25 | 北京字节跳动网络技术有限公司 | Voice remark display methods, system, medium and electronic equipment |
US11941323B2 (en) | 2019-12-10 | 2024-03-26 | Huawei Technologies Co., Ltd. | Meme creation method and apparatus |
CN113051427A (en) * | 2019-12-10 | 2021-06-29 | 华为技术有限公司 | Expression making method and device |
WO2021115351A1 (en) * | 2019-12-10 | 2021-06-17 | 华为技术有限公司 | Method and device for making emoji |
CN111106995A (en) * | 2019-12-26 | 2020-05-05 | 腾讯科技(深圳)有限公司 | Message display method, device, terminal and computer readable storage medium |
CN111106995B (en) * | 2019-12-26 | 2022-06-24 | 腾讯科技(深圳)有限公司 | Message display method, device, terminal and computer readable storage medium |
CN111240497A (en) * | 2020-01-15 | 2020-06-05 | 北京搜狗科技发展有限公司 | Method and device for inputting through input method and electronic equipment |
WO2022041177A1 (en) * | 2020-08-29 | 2022-03-03 | 深圳市永兴元科技股份有限公司 | Communication message processing method, device, and instant messaging client |
CN112365893A (en) * | 2020-10-30 | 2021-02-12 | 上海中通吉网络技术有限公司 | Voice conversion method, device and equipment |
CN112331209A (en) * | 2020-11-03 | 2021-02-05 | 建信金融科技有限责任公司 | Method and device for converting voice into text, electronic equipment and readable storage medium |
CN112243061A (en) * | 2020-11-03 | 2021-01-19 | 珠海格力电器股份有限公司 | Communication method of mobile terminal and mobile terminal |
CN112331209B (en) * | 2020-11-03 | 2023-08-08 | 建信金融科技有限责任公司 | Method and device for converting voice into text, electronic equipment and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106570106A (en) | Method and device for converting voice information into expression in input process | |
CN104461525B (en) | A kind of intelligent consulting platform generation system that can customize | |
CN107832286B (en) | Intelligent interaction method, equipment and storage medium | |
US10649990B2 (en) | Linking ontologies to expand supported language | |
CN104836720B (en) | Method and device for information recommendation in interactive communication | |
CN110647636A (en) | Interaction method, interaction device, terminal equipment and storage medium | |
CN107818781A (en) | Intelligent interactive method, equipment and storage medium | |
US8972265B1 (en) | Multiple voices in audio content | |
CN107609101A (en) | Intelligent interactive method, equipment and storage medium | |
CN107945786A (en) | Phoneme synthesizing method and device | |
CN108121800A (en) | Information generating method and device based on artificial intelligence | |
EP2757510A1 (en) | Method and system for linking data sources for processing composite concepts | |
CN107636648A (en) | Response is constructed based on mood mark | |
CN110462676A (en) | Electronic device, its control method and non-transient computer readable medium recording program performing | |
GB2517212A (en) | A Computer Generated Emulation of a subject | |
CN106527752B (en) | It is a kind of for provide input candidate item method and apparatus | |
CN102200973A (en) | Equipment and method for generating viewpoint pair with emotional-guidance-based influence relationship | |
CN110073349A (en) | Consider the word order suggestion of frequency and formatted message | |
CN112765971B (en) | Text-to-speech conversion method and device, electronic equipment and storage medium | |
CN112182252A (en) | Intelligent medication question-answering method and device based on medicine knowledge graph | |
CN112309365A (en) | Training method and device of speech synthesis model, storage medium and electronic equipment | |
CN108345612A (en) | A kind of question processing method and device, a kind of device for issue handling | |
CN110325987A (en) | Context voice driven depth bookmark | |
US11943181B2 (en) | Personality reply for digital content | |
CN109032731A (en) | A kind of voice interface method and system based on semantic understanding of oriented manipulation system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170419 |