CN108073294B

CN108073294B - Intelligent word formation method and device, and device for intelligent word formation

Info

Publication number: CN108073294B
Application number: CN201611004532.8A
Authority: CN
Inventors: 费腾
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2016-11-11
Filing date: 2016-11-11
Publication date: 2021-11-02
Anticipated expiration: 2036-11-11
Also published as: CN108073294A

Abstract

The embodiments of the invention provide an intelligent word formation method, an intelligent word formation device, and a device for intelligent word formation. The method comprises: receiving input content of a user; parsing the input content based on a part-of-speech template to obtain template strings to be composed that match the part-of-speech template, together with words to be composed; performing word formation on the template strings to be composed, and/or on the template strings to be composed together with adjacent words to be composed, by using multivariate relation data, to obtain a corresponding word formation result, wherein the multivariate relation data records multivariate relations among template strings or between template strings and words; and replacing the template strings to be composed in the word formation result with the corresponding words to be composed. The embodiments of the invention can improve the coverage of multivariate relations and the success rate of word formation while saving storage space.

Description

Intelligent word formation method and device, and device for intelligent word formation

Technical Field

The invention relates to the technical field of computer information input, and in particular to an intelligent word formation method and device, and a device for intelligent word formation.

Background

At present, interactive devices generally require users to express their operation intentions through an input method system. For example, the user may enter an input string or speech. The input method system recognizes it according to preset standard mapping rules, converts the input content into candidates in the corresponding language, and displays them. It then commits the candidate selected by the user to the screen.

When the lexicon contains no entry that directly matches the input string, the input method system can trigger the intelligent word formation function. The existing intelligent word formation scheme searches for binary relations in a binary library. It computes the path probability of the word string in each word formation scheme according to which binary relations are hit, and returns the scheme with the maximum path probability to the user as the preferred candidate. A binary relation is a collocation relation between two words. For example, "weather-good", "weather-hot", "I-know", "like-you" and "hundred thousand-eight thousand" can each form a binary relation. The intelligent word formation function is important: the quality of its results directly determines the quality of the input method system and directly affects the user experience.
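The Python sketch below makes the existing scheme concrete. It scores candidate word formation schemes against a binary library and returns the one with the maximum path probability. It assumes the binary library is a dictionary mapping word pairs to connection probabilities and that the candidate schemes have already been enumerated. The names `path_log_prob` and `best_scheme`, and the probabilities used, are hypothetical. Log probabilities are summed instead of multiplying raw probabilities. Any scheme with a missed pair is simply discarded, which simplifies what real systems do.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical binary library: (previous word, next word) -> connection probability.
BinaryLibrary = Dict[Tuple[str, str], float]


def path_log_prob(words: List[str], library: BinaryLibrary) -> Optional[float]:
    """Sum log connection probabilities along one word formation scheme.

    Returns None if any adjacent pair is missing from the library
    (a simplification: no smoothing or back-off is applied).
    """
    total = 0.0
    for prev, nxt in zip(words, words[1:]):
        prob = library.get((prev, nxt))
        if prob is None:
            return None
        total += math.log(prob)
    return total


def best_scheme(schemes: List[List[str]], library: BinaryLibrary) -> Optional[List[str]]:
    """Return the scheme with the maximum path probability, or None if none can be scored."""
    scored = []
    for scheme in schemes:
        lp = path_log_prob(scheme, library)
        if lp is not None:
            scored.append((lp, scheme))
    if not scored:
        return None  # no scheme hits the library: intelligent word formation fails
    return max(scored, key=lambda item: item[0])[1]


if __name__ == "__main__":
    library = {("weather", "good"): 0.6, ("weather", "hot"): 0.1,
               ("ninety thousand", "one thousand"): 0.2}
    print(best_scheme([["weather", "good"], ["weather", "hot"]], library))  # ['weather', 'good']
    print(best_scheme([["ninety thousand", "eight thousand"]], library))    # None
```

In the second call, "ninety thousand-eight thousand" cannot be scored because only "ninety thousand-one thousand" is stored. This shows the coverage problem of word-level binary libraries.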

However, while implementing the embodiments of the present invention, the inventor found that intelligent word formation involving number words, quantifiers or adverbs often requires a very large number of binary relations. This places high demands on the size and storage space of the binary library. Taking intelligent word formation of number words as an example, a large number of binary relations must be stored, such as "ten thousand-one thousand", "twenty thousand-one thousand", "thirty thousand-one thousand", …, "ninety thousand-one thousand", "twenty thousand-two thousand", …, "ninety thousand-nine thousand", "one thousand-one hundred", … and "nine thousand-nine hundred". As a result, the binary library occupies a large amount of storage space.

In addition, in practical applications the binary relations stored in the binary library are usually obtained by statistical learning. It is difficult to ensure that they cover all cases, so intelligent word formation fails in some cases. For example, suppose "ninety thousand-eight thousand" is not stored in the binary library. The words "ninety thousand" and "eight thousand" corresponding to the input string "jiuwanbaqian" will then hit no binary relation in the library, and intelligent word formation fails.

Disclosure of Invention

In view of the above problems, embodiments of the present invention provide an intelligent word formation method, an intelligent word formation device, and a device for intelligent word formation that overcome the above problems or at least partially solve them.

To solve the above problems, the invention discloses an intelligent word formation method, comprising the following steps:

receiving input content of a user;

parsing the input content based on a part-of-speech template to obtain template strings to be composed that match the part-of-speech template, together with words to be composed;

performing word formation on the template strings to be composed, and/or on the template strings to be composed together with adjacent words to be composed, by using multivariate relation data, to obtain a corresponding word formation result, wherein the multivariate relation data records multivariate relations among template strings or between template strings and words;

and replacing the template strings to be composed in the word formation result with the corresponding words to be composed.

Optionally, the step of performing word formation on the template strings to be composed, and/or on the template strings to be composed together with adjacent words to be composed, by using the multivariate relation data comprises:

searching the multivariate relation data according to the template strings to be composed, and/or the template strings to be composed together with the adjacent words to be composed;

and, if the search hits, obtaining the corresponding word formation result according to the multivariate relations recorded in the multivariate relation data.

Optionally, the input content includes an input string, and the method further comprises:

segmenting the input string to obtain corresponding segmentation results;

and searching a lexicon for words matching the segmentation results, to serve as the words to be composed corresponding to the input string.

Optionally, the method further comprises:

setting a corresponding priority for each segmentation result according to how well the words to be composed corresponding to that segmentation result match the part-of-speech template.

Optionally, the input content further includes the context corresponding to the input string. In this case, the words to be composed corresponding to the input content include those corresponding to both the input string and the context.

Optionally, the multivariate relation data is obtained as follows:

acquiring multivariate relations that conform to the part-of-speech template, and storing them as multivariate relation data.

Optionally, the multivariate relation data is obtained as follows:

acquiring a plurality of adjacent words from a preset corpus, the plurality of words including preset part-of-speech words;

parsing the preset part-of-speech words contained in the plurality of words into corresponding template strings according to the part-of-speech template;

and, for the plurality of words, storing the multivariate relations among the corresponding template strings, or between the template strings and words, as multivariate relation data.

Optionally, the step of parsing the input content based on the part-of-speech template comprises:

extracting preset part-of-speech words from the words to be composed corresponding to the input content;

and parsing the preset part-of-speech words into corresponding template strings to be composed according to the part-of-speech templates corresponding to the preset part-of-speech words.

Optionally, the preset part-of-speech words include first preset part-of-speech words and/or second preset part-of-speech words.

Optionally, the part-of-speech template is constructed as follows:

taking the modification relation between preset part-of-speech words and other words, or the modification relation between preset part-of-speech words themselves, as a part-of-speech template.

In another aspect, the invention discloses an intelligent word formation device, comprising:

a content receiving module, configured to receive input content of a user;

a parsing module, configured to parse the input content based on a part-of-speech template to obtain template strings to be composed that match the part-of-speech template, together with words to be composed;

a word formation module, configured to perform word formation on the template strings to be composed, and/or on the template strings to be composed together with adjacent words to be composed, by using multivariate relation data, to obtain a corresponding word formation result, wherein the multivariate relation data records multivariate relations among template strings or between template strings and words; and

a replacing module, configured to replace the template strings to be composed in the word formation result with the corresponding words to be composed.

Optionally, the word formation module comprises:

a searching submodule, configured to search the multivariate relation data according to the template strings to be composed, and/or the template strings to be composed together with the adjacent words to be composed;

and a word formation submodule, configured to obtain, when the search hits, the corresponding word formation result according to the multivariate relations recorded in the multivariate relation data.

Optionally, the input content includes an input string, and the device further comprises:

a segmentation module, configured to segment the input string to obtain corresponding segmentation results;

and a lexicon searching module, configured to search a lexicon for words matching the segmentation results, to serve as the words to be composed corresponding to the input string.

Optionally, the apparatus further comprises:

a priority setting module, configured to set a corresponding priority for each segmentation result according to how well the words to be composed corresponding to that segmentation result match the part-of-speech template.

Optionally, the apparatus further comprises:

a first storage module, configured to acquire multivariate relations that conform to the part-of-speech template and store them as multivariate relation data.

Optionally, the apparatus further comprises:

an adjacent word acquisition module, configured to acquire a plurality of adjacent words from a preset corpus, the plurality of words including preset part-of-speech words;

a word parsing module, configured to parse the preset part-of-speech words contained in the plurality of words into corresponding template strings according to the part-of-speech template;

and a second storage module, configured to store, for the plurality of words, the multivariate relations among the corresponding template strings, or between the template strings and words, as multivariate relation data.

Optionally, the parsing module includes:

an extraction submodule, configured to extract preset part-of-speech words from the words to be composed corresponding to the input content;

and a parsing submodule, configured to parse the preset part-of-speech words into corresponding template strings to be composed according to the part-of-speech templates corresponding to the preset part-of-speech words.

Optionally, the apparatus further comprises:

a part-of-speech template construction module, configured to take the modification relation between preset part-of-speech words and other words, or the modification relation between preset part-of-speech words themselves, as a part-of-speech template.

In yet another aspect, a device for intelligent word formation is disclosed. It comprises a memory and one or more programs. The one or more programs are stored in the memory and configured to be executed by one or more processors, and they include instructions for:

receiving input content of a user;

The embodiments of the invention have the following advantages:

The embodiments of the invention use template strings to describe multivariate relations between words. Word formation on the words to be composed for the input content is then performed with multivariate relation data that contains template strings. Each template string corresponds to a part-of-speech template, and a part-of-speech template represents a generic modification attribute associated with a part of speech. A template string therefore applies to all modification scenarios of the related word. For example, the template string "NUM_ten thousand" applies to all modification scenarios of "ten thousand", "NUM_kg" to all modification scenarios of "kg", and "ADV_like" to all modification scenarios of "like". The template string "NUM_MEA" covers all modification scenarios between number words and quantifiers. "NUM" can stand for any number word, so word formation succeeds whatever number word appears among the words to be composed ("one", "two", …, "ten", "one hundred", and so on). "MEA" can stand for any quantifier, so word formation likewise succeeds whatever quantifier appears ("kg", …, "km", "newton", and so on). The embodiments can therefore improve the coverage of multivariate relations and the success rate of word formation.

Moreover, the existing scheme must store a large number of binary relations such as "ten thousand-one thousand", "twenty thousand-one thousand", "thirty thousand-one thousand", …, "ninety thousand-one thousand", "twenty thousand-two thousand", …, "ninety thousand-nine thousand", "one thousand-one hundred", … and "nine thousand-nine hundred". By contrast, the embodiments of the invention can complete word formation by storing a single one-to-many relation containing "NUM_ten thousand". This saves the storage space that the many-to-many relations would require.
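A quick count makes the storage comparison concrete. This sketch makes two assumptions. First, the word-level library must hold every pair from the ranges listed above: ten thousand to ninety thousand followed by one thousand to nine thousand, and one thousand to nine thousand followed by one hundred to nine hundred. Second, two template relations, "NUM_ten thousand-NUM_thousand" and "NUM_thousand-NUM_hundred", replace them. Other number words are ignored, so the figures are illustrative only.

```python
from itertools import product

DIGITS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TENS = ["ten", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

ten_thousands = [f"{t} thousand" for t in TENS]  # "ten thousand" .. "ninety thousand"
thousands = [f"{d} thousand" for d in DIGITS]    # "one thousand" .. "nine thousand"
hundreds = [f"{d} hundred" for d in DIGITS]      # "one hundred" .. "nine hundred"

# Word-level binary relations the existing scheme would have to store.
word_level = {f"{a}-{b}" for a, b in product(ten_thousands, thousands)}
word_level |= {f"{a}-{b}" for a, b in product(thousands, hundreds)}

# Template-level relations assumed to cover the same cases.
template_level = {"NUM_ten thousand-NUM_thousand", "NUM_thousand-NUM_hundred"}

print(len(word_level), len(template_level))  # 162 2
```

Under these assumptions, 9 x 9 + 9 x 9 = 162 word-level relations collapse into 2 template relations.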

Drawings

FIG. 1 is a flowchart of the steps of a first embodiment of the intelligent word formation method of the present invention;

FIG. 2 is a flowchart of the steps of a second embodiment of the intelligent word formation method of the present invention;

FIG. 3 is a block diagram of an embodiment of the intelligent word formation device of the present invention;

FIG. 4 is a block diagram of a device 900 for intelligent word formation according to the present invention; and

FIG. 5 is a schematic diagram of a server in some embodiments of the present invention.

Detailed Description

To make the above objects, features and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.

Method embodiment one

Referring to FIG. 1, which shows a flowchart of the steps of a first embodiment of the intelligent word formation method of the present invention, the method may specifically include the following steps:

Step 101: receiving input content of a user;

Step 102: parsing the input content based on a part-of-speech template to obtain template strings to be composed that match the part-of-speech template, together with words to be composed;

Step 103: performing word formation on the template strings to be composed, and/or on the template strings to be composed together with adjacent words to be composed, by using multivariate relation data, to obtain a corresponding word formation result, wherein the multivariate relation data records multivariate relations among template strings or between template strings and words;

Step 104: replacing the template strings to be composed in the word formation result with the corresponding words to be composed.
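Steps 101 to 104 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation:

- Step 101 is reduced to receiving the words to be composed directly, and lexicon lookup is omitted.
- The number word template is a small hypothetical dictionary.
- The multivariate relation data is a hypothetical set of binary relations between template strings.
- Words are joined with spaces only because English glosses are used.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical number word template: word to be composed -> template string.
NUMBER_TEMPLATE = {
    "ninety thousand": "NUM_ten thousand",
    "eighty thousand": "NUM_ten thousand",
    "eight thousand": "NUM_thousand",
    "two thousand": "NUM_thousand",
    "three hundred": "NUM_hundred",
}

# Hypothetical multivariate relation data: binary relations between template strings.
RELATIONS: Set[Tuple[str, str]] = {
    ("NUM_ten thousand", "NUM_thousand"),
    ("NUM_thousand", "NUM_hundred"),
}


def parse(words: List[str]) -> List[str]:
    """Step 102: replace words that match the template with template strings."""
    return [NUMBER_TEMPLATE.get(w, w) for w in words]


def compose(tokens: List[str]) -> bool:
    """Step 103: succeed only if every adjacent pair hits the relation data."""
    return all(pair in RELATIONS for pair in zip(tokens, tokens[1:]))


def restore(tokens: List[str], words: List[str]) -> str:
    """Step 104: put the words to be composed back in place of their template strings."""
    templates = set(NUMBER_TEMPLATE.values())
    return " ".join(w if t in templates else t for t, w in zip(tokens, words))


def intelligent_word_formation(words: List[str]) -> Optional[str]:
    """Steps 102-104 for words already received (step 101)."""
    tokens = parse(words)
    if not compose(tokens):
        return None  # no relation hit: word formation fails
    return restore(tokens, words)


if __name__ == "__main__":
    print(intelligent_word_formation(["ninety thousand", "eight thousand"]))
    print(intelligent_word_formation(["eighty thousand", "two thousand", "three hundred"]))
```

No word-level relation for "ninety thousand-eight thousand" exists in `RELATIONS`. The single template relation "NUM_ten thousand-NUM_thousand" is enough for the first call to succeed.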

The embodiments of the invention can be applied to input method systems with various input modes, such as keyboard symbols, handwriting and voice input. That is, the user can enter on-screen content through coded character strings, handwriting features and the like. Taking voice input as an example, the input method system can collect the voice signal entered by the user and convert it into text. It then segments the text into words to be composed and performs word formation on them. The following description mainly uses input through coded character strings (hereinafter, input strings) as an example; the other input modes can be handled analogously.

In input method systems, whether for Chinese, Japanese, Korean or other languages, the user's input string is converted into candidates in the corresponding language. The user then selects the content to be output to an application program. Content output to the application program through this commit operation is the on-screen content. When converting the user's input string into candidates, the system can search the lexicon directly for the entry corresponding to the input string, and any entry found can serve as a candidate. For example, the entry for the input string "nihao", or the entry "good weather" for the input string "tianqihenhao", can be found directly in the lexicon. Optionally, the lexicon of the embodiments of the present invention may include a system lexicon, a user lexicon, a cell lexicon, a cloud lexicon and the like; the embodiments of the present invention do not limit the specific lexicon.

However, in practical applications, the lexicon may contain no entry that directly matches the input string for many reasons. For example, the user may want to input many words at once (such as a phrase or a long sentence) or content that has not been input before. In such cases the input method system may trigger the intelligent word formation function. Consider a user who wants to input "eighty-two thousand three hundred and forty" through the input string "bawanliangqiansanbaisishi", "ninety-eight thousand" through "jiuwanbaqian", "put down gently" through "qingqingdifangxia", or "better understand the present invention" through "genghaodilijiebenfaming". The lexicon may contain no entry that these input strings directly hit.

The existing intelligent word formation scheme uses the binary relations (collocation relations between words) in a binary library to perform word formation for input strings. However, intelligent word formation involving number words, quantifiers or adverbs often requires a very large number of binary relations. This places high demands on the size and storage space of the binary library, and word formation often fails because the binary relations do not cover enough cases. Taking number words as an example, the collocation relations among all number words must be stored in the binary library, and if the stored relations do not cover enough cases, intelligent word formation fails. Suppose a large number of binary relations are stored, such as "ten thousand-one thousand", "twenty thousand-one thousand", "thirty thousand-one thousand", …, "ninety thousand-one thousand", "twenty thousand-two thousand", …, "ninety thousand-nine thousand", "one thousand-one hundred", … and "nine thousand-nine hundred". If "ninety thousand-eight thousand" is missing from the library, intelligent word formation may still fail for the input string "jiuwanbaqian".

To address these problems in intelligent word formation involving number words, quantifiers or adverbs, the embodiments of the invention introduce part-of-speech templates. A corresponding template string is preset for each part-of-speech template, and template strings are used to describe the multivariate relations between words. A part-of-speech template represents a generic modification attribute associated with a part of speech.

Optionally, the modification relations between preset part-of-speech words and other words, or between preset part-of-speech words themselves, may be used as part-of-speech templates. For example, the part-of-speech templates may include a number word template, a quantity word template, a number-quantifier template, an adverb template and so on.

The number word template constrains the attributes of number words. For example, the number words "ten thousand", …, "hundred thousand" may correspond to the same number word template, whose template string may be "NUM_ten thousand". The number words "one thousand", …, "nine thousand" may correspond to another number word template, whose template string may be "NUM_thousand".

The quantity word template constrains the attributes of quantity words. For example, the quantity words "one", …, "ten" may correspond to the same quantity word template, whose template string may be "NUM_number". The quantity words "one kilogram", …, "ten kilograms" may correspond to another quantity word template, whose template string may be "NUM_kilogram".

The number-quantifier template constrains all modification scenarios between number words and quantifiers. Its template string may be "NUM_MEA", where "NUM" stands for any number word and "MEA" for any quantifier.

The adverb template constrains the attributes of an adverb and the verb or adjective it modifies. Examples are adverb + verb combinations such as "put down gently", "like very much", "understand better", "dislike" and "go right away", whose template strings may be "ADV_put down", "ADV_like", "ADV_go" and so on.
The template string of a part-of-speech template therefore applies to all modification scenarios of the related word. For example, "NUM_ten thousand" applies to all modification scenarios of "ten thousand", "NUM_kg" to all modification scenarios of "kg", and "ADV_like" to all modification scenarios of "like". "NUM_MEA" applies to the modification scenarios between any number word and any quantifier, where "MEA" stands for any quantifier such as "kg", …, "km" or "newton".
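The number word templates above group number words by their unit. The sketch below derives the template string from that unit. It assumes each number word is available as an integer of the form "digits + one unit", using the Chinese units for ten thousand, thousand, hundred and ten. The function names and the digit limits are assumptions for illustration. The quantifier and adverb helpers only show how the specific and generic template strings ("NUM_kg" versus "NUM_MEA", "ADV_like" versus "ADV_VERB") are formed.

```python
from typing import Optional

# (unit value, unit name, largest digit part allowed before that unit).
# Assumption: up to 9999 ten-thousands, since the next Chinese unit is 100 million.
UNITS = [
    (10_000, "ten thousand", 9999),
    (1_000, "thousand", 9),
    (100, "hundred", 9),
    (10, "ten", 9),
]


def number_template(value: int) -> Optional[str]:
    """Map a 'digits + one unit' number word to its number word template string.

    Compound numbers such as 12000 (ten thousand + two thousand) return None,
    because they consist of several number words, each with its own template.
    """
    for unit, name, max_digits in UNITS:
        digits, remainder = divmod(value, unit)
        if remainder == 0 and 1 <= digits <= max_digits:
            return f"NUM_{name}"
    return None


def quantifier_template(quantifier: str, generic: bool = False) -> str:
    """Number + quantifier: 'NUM_kg' for one quantifier, or 'NUM_MEA' for any quantifier."""
    return "NUM_MEA" if generic else f"NUM_{quantifier}"


def adverb_template(verb: str, generic: bool = False) -> str:
    """Adverb + verb: 'ADV_like' for one verb, or 'ADV_VERB' for any verb."""
    return "ADV_VERB" if generic else f"ADV_{verb}"


if __name__ == "__main__":
    print(number_template(90_000), number_template(8_000), number_template(12_000))
```

Both "ninety thousand" (90000) and "hundred thousand" (100000) map to "NUM_ten thousand", which is what lets one template relation stand in for many word pairs.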

The number word, quantity word, number-quantifier and adverb templates above are only optional embodiments of the present invention. In practice, those skilled in the art can define whatever part-of-speech templates the application requires. One example is an adverb-verb template "ADV_VERB", where "VERB" stands for any verb such as "like", "love", "hate", "angry" or "surprised".

In addition, the part-of-speech templates have mainly been described above using Chinese as an example. According to actual application requirements, those skilled in the art can set corresponding part-of-speech templates for the parts of speech defined in languages other than Chinese, such as Japanese (katakana and hiragana) or French. Modification relations between any parts of speech in any language fall within the protection scope of the part-of-speech templates of the embodiments of the present invention.

In the embodiments of the present invention, the multivariate relation data may record multivariate relations among template strings or between template strings and words. In other words, the multivariate relation data of the embodiments may contain template strings.

The embodiments of the invention can provide various technical solutions for acquiring the multivariate relation data:

Technical solution 1

Technical solution 1 transforms an existing multivariate library to obtain the corresponding multivariate relation data. Specifically, the multivariate relations that conform to a part-of-speech template are acquired and stored as multivariate relation data. Each relation is stored in terms of the template strings corresponding to the part-of-speech template.

In practical applications, the multivariate library may include a system multivariate library, a user multivariate library and the like. The multivariate relations may include binary relations or relations of order higher than two. The embodiments of the present invention are mainly described using binary relations as an example; higher-order relations are handled analogously. In the embodiments, a binary relation mainly reflects the probability that two elements are used adjacently (hereinafter, the connection probability). In the existing scheme both elements of a binary relation are words. In the embodiments of the present invention, the elements of a binary relation may include template strings corresponding to words. For example, suppose the binary relation "hundred thousand-eight thousand" is recorded in the multivariate library and conforms to the number word template. The two words "hundred thousand" and "eight thousand" can each be processed according to the number word template, giving the binary relation data "NUM_ten thousand-NUM_thousand". Similarly, if the binary relation "very-like" is recorded in the multivariate library, it can be stored as multivariate relation data according to the adverb template or the adverb-verb template, giving "ADV_like" or "ADV_VERB" respectively.
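Technical solution 1 can be sketched as a pass over an existing binary library. The pass keeps only relations that conform to a template and merges their counts. The word-to-template dictionary, the adverb list, the counts and the function name `rebuild` are hypothetical. Relations that match no template (such as "weather-good") are simply skipped here, although a real system would presumably keep them in its ordinary library.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical number word template: word -> template string.
NUMBER_TEMPLATE = {
    "hundred thousand": "NUM_ten thousand",
    "ninety thousand": "NUM_ten thousand",
    "eight thousand": "NUM_thousand",
    "one thousand": "NUM_thousand",
}
# Hypothetical adverbs covered by the adverb template.
ADVERBS = {"very", "gently"}


def rebuild(binary_library: Dict[Tuple[str, str], int]) -> Counter:
    """Convert word-level binary relations that conform to a template into template relations.

    Counts of relations that collapse onto the same template relation are merged.
    """
    data: Counter = Counter()
    for (left, right), count in binary_library.items():
        if left in NUMBER_TEMPLATE and right in NUMBER_TEMPLATE:
            data[f"{NUMBER_TEMPLATE[left]}-{NUMBER_TEMPLATE[right]}"] += count
        elif left in ADVERBS:
            data[f"ADV_{right}"] += count  # e.g. "very-like" -> "ADV_like"
        # other relations match no template and are skipped in this sketch
    return data


if __name__ == "__main__":
    library = {
        ("hundred thousand", "eight thousand"): 5,
        ("ninety thousand", "one thousand"): 3,
        ("very", "like"): 7,
        ("weather", "good"): 9,
    }
    print(dict(rebuild(library)))  # {'NUM_ten thousand-NUM_thousand': 8, 'ADV_like': 7}
```

The two number relations collapse into one template relation whose count is their sum. This merging is where the storage saving comes from.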

Technical solution 2

Technical solution 2 analyzes a preset corpus to obtain the corresponding multivariate relation data. Specifically, a plurality of adjacent words may be acquired from the preset corpus, the plurality of words including preset part-of-speech words. The preset part-of-speech words contained in the plurality of words are parsed into corresponding template strings according to the part-of-speech template. Then, for the plurality of words, the multivariate relations among the corresponding template strings, or between the template strings and words, are stored as multivariate relation data.

In practical applications, the preset corpus may include an Internet corpus obtained with web crawler technology and a corpus accumulated by a cloud-computing input method. The Internet corpus may be an Internet blog corpus, an Internet news corpus and/or an Internet forum corpus. The corpus accumulated by the cloud-computing input method may be derived from the historical input behavior data of users across the network. The embodiments of the present invention do not limit the specific preset corpus.

In the embodiments of the present invention, a preset part-of-speech word is a word whose part of speech is a preset part of speech. Optionally, the preset part-of-speech words may include a first preset part-of-speech word, such as the number words "one hundred thousand" and "eight thousand". In practical applications, first preset part-of-speech words may be searched for among the plurality of words and then parsed into corresponding template strings according to the part-of-speech template. For example, if "one hundred thousand" and "eight thousand" appear adjacently in the preset corpus, both words can be parsed according to the number word template to obtain the binary relation "NUM_ten thousand-NUM_thousand". As another example, if "one hundred thousand", "one" and "cold joke" appear adjacently, "one hundred thousand" can be parsed according to the number word template, finally yielding the ternary relation "NUM_ten thousand-one-cold joke".

Optionally, the preset part-of-speech words may include a first preset part-of-speech word and an adjacent second preset part-of-speech word, such as number word + quantifier, adverb + verb, or adverb + adjective. In practical applications, a first preset part-of-speech word may be searched for among the plurality of words. The system then determines whether the word adjacent to it is a second preset part-of-speech word. If so, the first and second preset part-of-speech words are parsed into a corresponding template string according to the part-of-speech template. Optionally, the first preset part-of-speech word may be a number word, an adverb or the like, and the corresponding second preset part-of-speech word may be a quantifier, a verb, an adjective or the like.

For example, suppose the number word "fifty", the quantifier "jin" (a Chinese unit of weight) and the noun "rice" appear adjacently in the preset corpus. Then "fifty jin" can be parsed into "NUM_jin" according to the part-of-speech template, and a binary relation between "NUM_jin" and "rice" is established. As another example, if the adverb "gently", the verb "put down" and the word "you" appear adjacently, "gently put down" can be parsed into "ADV_put down", and a binary relation between "ADV_put down" and "you" is established. Likewise, if a degree adverb meaning "very", the verb "like" and the word "you" appear adjacently, the adverb + verb pair can be parsed into "ADV_like" whichever such adverb is used. A binary relation between "ADV_like" and "you" is then established.
It can be understood that a person skilled in the art may choose the first and second preset part-of-speech words according to actual application requirements; the embodiment of the present invention does not limit which specific first and second preset part-of-speech words are used.
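The adjacent-pair case can be sketched in the same spirit. In this hedged Python example, the input is a list of (word, part-of-speech tag) pairs; the tag names (`NUM`, `MEA`, `ADV`, `VERB`, `ADJ`, `NOUN`, `PRON`) and the rule table `PAIR_RULES` are illustrative assumptions. When a first preset part-of-speech word is immediately followed by an allowed second preset part-of-speech word, the pair collapses into one template string built from the first word's tag and the second word's surface form.

```python
# Hypothetical pairing rules: first preset POS tag -> allowed second preset POS tags.
PAIR_RULES = {"NUM": {"MEA"}, "ADV": {"VERB", "ADJ"}}


def parse_pairs(tagged: list[tuple[str, str]]) -> list[str]:
    """Collapse each adjacent (first POS, second POS) pair into one template string."""
    units: list[str] = []
    i = 0
    while i < len(tagged):
        word, pos = tagged[i]
        nxt = tagged[i + 1] if i + 1 < len(tagged) else None
        if nxt is not None and nxt[1] in PAIR_RULES.get(pos, set()):
            # Keep the first word's category and the second word's surface form.
            units.append(f"{pos}_{nxt[0]}")
            i += 2
        else:
            units.append(word)
            i += 1
    return units


def binary_relations(units: list[str]) -> list[tuple[str, str]]:
    """Pair each unit with its right neighbour to form binary relations."""
    return list(zip(units, units[1:]))


print(binary_relations(parse_pairs([("fifty", "NUM"), ("jin", "MEA"), ("rice", "NOUN")])))
# [('NUM_jin', 'rice')]
print(parse_pairs([("extremely", "ADV"), ("like", "VERB"), ("you", "PRON")]))
# ['ADV_like', 'you']
```

Because only the tag of the first word is kept, "very like" and "extremely like" produce the same template string and therefore share one stored relation.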

In addition, the multivariate relationship data of the embodiment of the present invention has been described above mainly using binary relationships that contain template character strings as examples. In practice, the multivariate relationship data may also contain relationships of higher order, such as the ternary relationship "NUM_ten_thousand-NUM_thousand-NUM_hundred" or the quaternary relationship "NUM_ten_thousand-NUM_thousand-NUM_hundred-NUM_ten".

In addition, technical scheme 1 (modifying an existing multivariate library) and technical scheme 2 (analyzing a preset corpus) are only optional ways of acquiring the multivariate relationship data of the embodiment of the present invention. In practice, a person skilled in the art may adopt other ways according to actual application requirements. For example, for commonly used preset part-of-speech words, the modified words adjacent to them may be collected, the preset part-of-speech words and their modified words may be parsed into corresponding template character strings according to the part-of-speech template, and multivariate relations may then be established.

In the embodiment of the present invention, optionally, the input content may include an input string, and the words to be grouped corresponding to the input string may be obtained by searching the lexicon. For example, if the input string is "bawanliangqiansanbaisishi", the corresponding words to be grouped may include "eighty thousand", "two thousand", "three hundred", "forty", and so on.

In another optional embodiment of the present invention, in addition to the input string, the input content may further include the context corresponding to the input string. The preceding context applies to a scenario in which the user enters coherent content over several inputs. For example, if the user wants to input "eighty-two thousand three hundred and forty", first inputs and commits "eighty thousand", and then types "liangqian", the words corresponding to "eighty thousand" and to "liangqian" may all be used as words to be grouped. The following context applies to a scenario in which the user edits content that is already on the screen. For example, if the user first inputs "today is sunny", then moves the cursor to the position before "sunny" and types the input string "feech", the embodiment of the present invention may group the word corresponding to "feech" together with the following context "sunny". It can be understood that the embodiment of the present invention does not limit the specific word formation scenarios to which the context applies.

In the embodiment of the present invention, some or all of the words to be grouped may conform to the part-of-speech template, in which case those words may be parsed into the corresponding template character strings to be grouped. In an optional embodiment of the present invention, step 102 of parsing the input content based on the part-of-speech template to obtain the template character strings to be grouped and the words to be grouped that match the part-of-speech template may specifically include: extracting preset part-of-speech words from the words to be grouped corresponding to the input content; and parsing each preset part-of-speech word into its corresponding template character string according to the part-of-speech template corresponding to that preset part-of-speech word. That is, the embodiment of the present invention may parse only the preset part-of-speech words, or the modifiers corresponding to the preset part-of-speech words, so as to implement intelligent word formation involving preset part-of-speech words.

Optionally, the preset part-of-speech words may include first preset part-of-speech words and/or second preset part-of-speech words; that is, the preset part-of-speech words may consist of first preset part-of-speech words alone, or of a first preset part-of-speech word and a second preset part-of-speech word that occur adjacently.

In another optional embodiment of the present invention, the step of using the multivariate relationship data to perform word formation on the template character strings to be grouped, and/or on a template character string to be grouped together with its adjacent words to be grouped, may specifically include: searching the multivariate relationship data according to the template character strings to be grouped, and/or according to a template character string to be grouped and its adjacent words to be grouped; and, if the search hits, obtaining the corresponding word formation result from the multivariate relation recorded in the multivariate relationship data. Optionally, several template character strings to be grouped may be matched against the elements of each multivariate relation, and a successful match indicates a search hit; alternatively, a template character string to be grouped and its adjacent words to be grouped may be matched against the elements of each multivariate relation, and a successful match likewise indicates a search hit. For example, if the template character string to be grouped is "ADV_like" and the adjacent word to be grouped is "you", "ADV_like" and "you" can be matched against the elements of each multivariate relation respectively. Or, if the adjacent template character strings to be grouped are "NUM_ten_thousand" and "NUM_thousand", these two strings may be matched against the elements of each multivariate relation respectively.
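A search hit of this kind can be sketched as a set-membership test over windows of adjacent units. In this hypothetical Python example, the multivariate relationship data is modelled as a set of tuples whose elements are template strings and/or ordinary words; `RELATIONS` and `find_hits` are illustrative names, not part of the patent, and a real input method would likely use a more compact store.

```python
# Hypothetical multivariate relationship data: tuples whose elements are
# template character strings and/or ordinary words.
RELATIONS = {
    ("NUM_ten_thousand", "NUM_thousand"),
    ("NUM_thousand", "NUM_hundred"),
    ("ADV_like", "you"),
}


def find_hits(units: list[str], n: int = 2) -> list[tuple[str, ...]]:
    """Return every run of n adjacent units whose elements all match a stored relation."""
    windows = (tuple(units[i:i + n]) for i in range(len(units) - n + 1))
    return [w for w in windows if w in RELATIONS]


print(find_hits(["ADV_like", "you"]))  # [('ADV_like', 'you')]
print(find_hits(["NUM_ten_thousand", "NUM_thousand", "NUM_hundred"]))
# [('NUM_ten_thousand', 'NUM_thousand'), ('NUM_thousand', 'NUM_hundred')]
print(find_hits(["ADV_like", "him"]))  # []
```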

In yet another optional embodiment of the present invention, the words to be grouped, or the word formation results corresponding to them, may be sorted according to the matching information between the words to be grouped and the part-of-speech template. For example, the words to be grouped corresponding to the input string "liangwanyiqian" may include "twenty thousand", "two bowls", "two nights", "air-dried", "one thousand", "before", "signed", and so on; since "twenty thousand" and "two bowls" both hit the numeral + measure word template "NUM_MEA", they can be given a higher priority.
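This priority rule can be approximated with a stable sort. The Python sketch below assumes a hypothetical predicate `hits_num_mea` standing in for the "NUM_MEA" template check (here just a lookup in a hand-written set); candidates that hit the template move to the front, and Python's stable `sorted` keeps the original lexicon order within each group.

```python
# Hypothetical stand-in for the NUM_MEA template check on candidates of "liangwan".
NUM_MEA_MATCHES = {"twenty thousand", "two bowls"}


def hits_num_mea(word: str) -> bool:
    """Return True if the candidate matches the numeral + measure word template."""
    return word in NUM_MEA_MATCHES


def rank_by_template(candidates: list[str]) -> list[str]:
    """Put template-matching candidates first; sorted() is stable, so ties keep their order."""
    return sorted(candidates, key=lambda w: 0 if hits_num_mea(w) else 1)


print(rank_by_template(["two nights", "twenty thousand", "air-dried", "two bowls"]))
# ['twenty thousand', 'two bowls', 'two nights', 'air-dried']
```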

In summary, the intelligent word formation method of the embodiment of the present invention uses template character strings to describe multivariate relations between words, and performs word formation on the words to be grouped corresponding to the input content by using multivariate relationship data that contains template character strings. Because each template character string corresponds to a part-of-speech template that represents a general, part-of-speech-related modification attribute, the template character string is applicable to all modification scenarios of the related words. For example, the template string "NUM_ten_thousand" applies to every scenario in which a number modifies "ten thousand", the template string "NUM_kg" applies to every scenario in which a number modifies "kg", and the template string "ADV_like" applies to every scenario in which an adverb modifies "like". Since "NUM" can represent any number, word formation can be completed successfully whenever the words to be grouped contain any number such as "one", "two", … "ten" or "hundred". The embodiment of the present invention can therefore improve the coverage rate of multivariate relations and the success rate of word formation.

Method embodiment two

Referring to fig. 2, a flowchart of the steps of a second embodiment of the intelligent word formation method of the present invention is shown; the method may specifically include the following steps:

step 201, receiving input content of a user; the input content may include: an input string, or the input string and its corresponding context;

step 202, segmenting the input string to obtain a corresponding segmentation result;

step 203, searching in a word bank to obtain a vocabulary matched with the segmentation result, and using the vocabulary as a vocabulary to be grouped corresponding to the input string;

step 204, parsing, based on a part-of-speech template, the words to be grouped corresponding to the input string (or to the input string and its context), to obtain the template character strings to be grouped and the words to be grouped that match the part-of-speech template;

step 205, using multivariate relationship data to perform word formation on the template character strings to be grouped, and/or on a template character string to be grouped together with its adjacent words to be grouped, to obtain the corresponding word formation result; the multivariate relationship data records multivariate relations among template character strings, or between template character strings and words;

step 206, replacing the template character strings in the word formation result with the corresponding words to be grouped.

In practical application, the input string may be segmented according to its own rules; for example, a pinyin string may be segmented according to syllable rules. An input string may have one or more segmentation schemes, and the segmentation result of each scheme may include one or more substrings. For example, the input string "bawanliangqian" may be segmented into "ba'wan'liang'qian", and the input string "fangan" may be segmented into either "fang'an" or "fan'gan".
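Enumerating every segmentation scheme is a small dynamic-programming problem. The sketch below is one way to do it in Python, not necessarily the method used by the patent's input method; `SYLLABLES` is a hypothetical subset of the pinyin syllable table, just large enough for the two examples above, and 6 is used as the maximum pinyin syllable length (as in "zhuang").

```python
from functools import lru_cache

# Hypothetical subset of valid pinyin syllables; a real system uses the full table.
SYLLABLES = {"ba", "wan", "liang", "qian", "fang", "an", "fan", "gan"}
MAX_SYLLABLE_LEN = 6  # the longest pinyin syllables, such as "zhuang", have 6 letters


def segment(s: str) -> list[list[str]]:
    """Return every way to split s entirely into syllables from SYLLABLES."""

    @lru_cache(maxsize=None)
    def split_from(i: int) -> tuple[tuple[str, ...], ...]:
        if i == len(s):
            return ((),)  # exactly one way to segment the empty suffix
        schemes = []
        for j in range(i + 1, min(len(s), i + MAX_SYLLABLE_LEN) + 1):
            head = s[i:j]
            if head in SYLLABLES:
                schemes.extend((head,) + rest for rest in split_from(j))
        return tuple(schemes)

    return [list(scheme) for scheme in split_from(0)]


print(segment("bawanliangqian"))  # [['ba', 'wan', 'liang', 'qian']]
print(segment("fangan"))          # [['fan', 'gan'], ['fang', 'an']]
```

Memoising `split_from` means each suffix is segmented once, so ambiguous strings such as "fangan" do not trigger repeated work.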

In practical application, the words to be grouped corresponding to each substring can be obtained by searching the system lexicon and the user lexicon. For example, the words to be grouped corresponding to "ba'wan" may include "eighty thousand" and "pulled out", and the words to be grouped corresponding to "liang'qian" may include "two thousand" and the personal name "Liang Qian".

In an optional embodiment of the present invention, a priority may further be set for each segmentation result according to the matching information between the part-of-speech template and the words to be grouped corresponding to that segmentation result. When they match, that is, when the words to be grouped corresponding to the segmentation result match the part-of-speech template, a higher priority can be set for the segmentation result; when they do not match, a lower priority can be set. The priority of a segmentation result can be used to judge the quality of that segmentation result for the words to be grouped, for example, the higher the priority, the higher the quality; alternatively, it can be used to determine the path score of the word formation paths built from that segmentation result, for example, the higher the priority, the higher the path score.

After the words to be grouped are parsed into the corresponding template character strings, the embodiment of the present invention may combine the template character strings and/or the words to be grouped pairwise to obtain a plurality of corresponding word formation paths. For example, the word formation paths corresponding to "bawanliangqian" may include "NUM_ten_thousand + NUM_thousand", "NUM_ten_thousand + Liang Qian", "pulled out + NUM_thousand", "pulled out + Liang Qian", and so on.
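Taking one candidate per substring and combining them in order gives the word formation paths; in code this is a Cartesian product. In this hypothetical Python sketch, `CANDIDATES` lists the words to be grouped for each substring, with numerals already parsed into template strings, and `word_formation_paths` is an illustrative name.

```python
from itertools import product

# Hypothetical words to be grouped per substring; numerals are already templated.
CANDIDATES = {
    "ba'wan": ["NUM_ten_thousand", "pulled out"],
    "liang'qian": ["NUM_thousand", "Liang Qian"],
}


def word_formation_paths(substrings: list[str]) -> list[list[str]]:
    """Combine one candidate per substring, in order, into every possible path."""
    return [list(path) for path in product(*(CANDIDATES[s] for s in substrings))]


for path in word_formation_paths(["ba'wan", "liang'qian"]):
    print(" + ".join(path))
# NUM_ten_thousand + NUM_thousand
# NUM_ten_thousand + Liang Qian
# pulled out + NUM_thousand
# pulled out + Liang Qian
```

The number of paths grows as the product of the candidate counts, which is one reason the priorities described above are useful for pruning.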

For each word formation path, the multivariate relationship data of the embodiment of the present invention is searched according to the template character strings and/or the words to be grouped on that path. If the search hits, the hit multivariate relation may be taken directly as the corresponding word formation result; alternatively, the path probability of each whole word formation path may be calculated from the connection probabilities of the multivariate relations, and the path with the maximum path probability is taken as the word formation result.
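The path-probability alternative can be sketched as a product of bigram connection probabilities, computed in log space to avoid underflow. The values in `CONNECTION_PROB` and the back-off value `UNSEEN_PROB` for pairs with no stored relation are made-up assumptions for illustration; the text does not specify how unseen pairs are scored.

```python
import math

# Hypothetical connection probabilities attached to stored binary relations.
CONNECTION_PROB = {
    ("NUM_ten_thousand", "NUM_thousand"): 0.6,
    ("pulled out", "Liang Qian"): 0.01,
}
UNSEEN_PROB = 1e-4  # assumed back-off probability for pairs with no stored relation


def path_log_prob(path: list[str]) -> float:
    """Sum log connection probabilities over adjacent pairs (a product in log space)."""
    return sum(
        math.log(CONNECTION_PROB.get(pair, UNSEEN_PROB))
        for pair in zip(path, path[1:])
    )


def best_path(paths: list[list[str]]) -> list[str]:
    """Return the word formation path with the maximum path probability."""
    return max(paths, key=path_log_prob)


paths = [
    ["NUM_ten_thousand", "NUM_thousand"],
    ["NUM_ten_thousand", "Liang Qian"],
    ["pulled out", "NUM_thousand"],
    ["pulled out", "Liang Qian"],
]
print(best_path(paths))  # ['NUM_ten_thousand', 'NUM_thousand']
```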

It should be noted that, because the embodiment of the present invention parses the words to be grouped into template character strings before word formation, the template character strings in the word formation result need to be replaced with the original words to be grouped after word formation.

In addition, it should be noted that the embodiment of the present invention may output the candidate item corresponding to the input string according to the replaced word formation result. If the input content includes only the input string, the replaced word formation result can be output directly as a candidate item; if the input content includes the input string and its context, the context is removed from the replaced word formation result before the corresponding candidate item is output.
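Replacement and context removal can be sketched as follows. The sketch assumes, hypothetically, that the parsing step remembered which original word produced each template string (the `originals` mapping from position to word), and that the context occupies a known number of units before and/or after the input string; `restore` and `to_candidate` are illustrative names.

```python
def restore(result: list[str], originals: dict[int, str]) -> list[str]:
    """Replace each template string with the original word recorded at its position."""
    return [originals.get(i, unit) for i, unit in enumerate(result)]


def to_candidate(result: list[str], originals: dict[int, str],
                 before: int = 0, after: int = 0) -> list[str]:
    """Restore the words, then drop `before` leading and `after` trailing context units."""
    words = restore(result, originals)
    return words[before:len(words) - after]


result = ["NUM_ten_thousand", "NUM_thousand"]
originals = {0: "eighty thousand", 1: "two thousand"}
print(restore(result, originals))                 # ['eighty thousand', 'two thousand']
# "eighty thousand" was already committed as preceding context, so only the new part is offered.
print(to_candidate(result, originals, before=1))  # ['two thousand']
```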

In order to make the embodiment of the present invention better understood, an example of an intelligent word-forming method of the present invention is provided herein, which may specifically include the following steps:

step S1, receiving the input string "bawanliangqiansanbaisishi";

step S2, segmenting the input string to obtain the segmentation result "ba'wan'liang'qian'san'bai'si'shi";

step S3, searching the lexicon to obtain the words to be grouped corresponding to the segmentation result: "eighty thousand", "two thousand", "three hundred" and "forty";

step S4, parsing the words to be grouped into the corresponding template character strings "NUM_ten_thousand", "NUM_thousand", "NUM_hundred" and "NUM_ten";

step S5, performing word formation on the template character strings by using the multivariate relationship data of the embodiment of the present invention, to obtain the word formation result "NUM_ten_thousand + NUM_thousand + NUM_hundred + NUM_ten";

step S6, replacing the template character strings in the word formation result with the original words to be grouped, to obtain the final word formation result "eighty-two thousand three hundred and forty" (82,340).
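Steps S1 to S6 can be strung together into a small end-to-end sketch. Everything here is a simplifying assumption for illustration: `SYLLABLES`, `LEXICON` and `RELATIONS` are hypothetical toy tables, segmentation uses greedy longest match instead of enumerating every scheme, every word is assumed to span exactly two syllables, and step S5 simply requires every adjacent pair of template strings to hit a stored binary relation.

```python
from typing import Optional

SYLLABLES = {"ba", "wan", "liang", "qian", "san", "bai", "si", "shi", "jiu", "er"}
# Hypothetical lexicon: two-syllable key -> (word, place-value unit used by the numeral template).
LEXICON = {
    ("ba", "wan"): ("eighty thousand", "ten_thousand"),
    ("liang", "qian"): ("two thousand", "thousand"),
    ("san", "bai"): ("three hundred", "hundred"),
    ("si", "shi"): ("forty", "ten"),
    ("jiu", "wan"): ("ninety thousand", "ten_thousand"),
    ("san", "qian"): ("three thousand", "thousand"),
    ("er", "bai"): ("two hundred", "hundred"),
}
# Hypothetical template-level binary relations.
RELATIONS = {
    ("NUM_ten_thousand", "NUM_thousand"),
    ("NUM_thousand", "NUM_hundred"),
    ("NUM_hundred", "NUM_ten"),
}


def segment_greedy(s: str) -> Optional[list[str]]:
    """S2: split s by greedy longest match against SYLLABLES; None if it gets stuck."""
    out, i = [], 0
    while i < len(s):
        for j in range(min(len(s), i + 6), i, -1):
            if s[i:j] in SYLLABLES:
                out.append(s[i:j])
                i = j
                break
        else:  # no syllable starts at position i
            return None
    return out


def form_words(input_string: str) -> Optional[list[str]]:
    """Run steps S1 to S6 on a pinyin input string; return None if word formation fails."""
    syllables = segment_greedy(input_string)                             # S1, S2
    if syllables is None or len(syllables) % 2:
        return None
    keys = [tuple(syllables[i:i + 2]) for i in range(0, len(syllables), 2)]
    if any(k not in LEXICON for k in keys):
        return None
    words = [LEXICON[k] for k in keys]                                   # S3
    templates = [f"NUM_{unit}" for _, unit in words]                     # S4
    if not all(p in RELATIONS for p in zip(templates, templates[1:])):   # S5
        return None
    return [word for word, _ in words]                                   # S6


print(form_words("bawanliangqiansanbaisishi"))
# ['eighty thousand', 'two thousand', 'three hundred', 'forty']
print(form_words("jiuwansanqianerbai"))
# ['ninety thousand', 'three thousand', 'two hundred']
```

The second call succeeds without adding any relation, because 93,200 parses into the same template strings as 82,340; this is the coverage property the text attributes to template strings.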

In practical application, the multivariate relationship data of the embodiment of the present invention may record a binary relationship between "NUM_ten_thousand" and "NUM_thousand" and a binary relationship between "NUM_thousand" and "NUM_hundred". Then, whether the words to be grouped are "ninety-three thousand two hundred" (93,200), "eighty-four thousand three hundred" (84,300), or any other words following the pattern "x ten-thousand, x thousand, x hundred", the embodiment of the present invention can successfully complete word formation while storing only these two binary relationships, which greatly saves storage space compared with the existing scheme.
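The storage argument can be made concrete by counting. Under the simplifying assumption that each place in the pattern "x ten-thousand, x thousand, x hundred" holds a single nonzero digit, a word-level library needs one bigram per digit pair at each link, while the template-level library needs one relation per link. The helper names below are hypothetical.

```python
from itertools import product

PLACES = ["ten_thousand", "thousand", "hundred"]  # the "x ten-thousand x thousand x hundred" pattern
DIGITS = range(1, 10)  # assumption: each place holds one nonzero digit


def concrete_bigrams(places: list[str]) -> set[tuple[str, str]]:
    """Word-level bigrams needed to chain every digit combination across adjacent places."""
    return {
        (f"{a} {left}", f"{b} {right}")
        for left, right in zip(places, places[1:])
        for a, b in product(DIGITS, DIGITS)
    }


def template_bigrams(places: list[str]) -> set[tuple[str, str]]:
    """Template-level relations covering the same pattern."""
    return {(f"NUM_{left}", f"NUM_{right}") for left, right in zip(places, places[1:])}


print(len(concrete_bigrams(PLACES)), len(template_bigrams(PLACES)))  # 162 2
```

That is 9 × 9 = 81 concrete bigrams for each of the two links, against one template relation per link.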

It should be noted that, for simplicity of description, the method embodiments are described as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Further, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the present invention.

Device embodiment

Referring to fig. 3, a block diagram of an embodiment of an input device according to the present invention is shown, which may specifically include: a content receiving module 301, a parsing module 302, a word-composing module 303, and a replacing module 304.

The content receiving module 301 is configured to receive input content of a user;

the parsing module 302 is configured to parse the input content based on a part-of-speech template to obtain the template character strings to be grouped and the words to be grouped that match the part-of-speech template;

the word formation module 303 is configured to use multivariate relationship data to perform word formation on the template character strings to be grouped, and/or on a template character string to be grouped together with its adjacent words to be grouped, to obtain the corresponding word formation result; the multivariate relationship data records multivariate relations among template character strings, or between template character strings and words; and

the replacing module 304 is configured to replace the template character strings in the word formation result with the corresponding words to be grouped.

Optionally, the word forming module 303 may include:

Optionally, the input content may include an input string, and the apparatus may further include:

Optionally, the apparatus may further include:

Optionally, the input content may further include the context corresponding to the input string, and the words to be grouped corresponding to the input content may include the words to be grouped corresponding to the input string and to the context.

Optionally, the apparatus may further include:

the adjacent word acquisition module is configured to acquire a plurality of adjacent words from a preset corpus, where the plurality of words may include preset part-of-speech words;

Optionally, the parsing module 302 may include:

Optionally, the preset part-of-speech words may include: the first preset part-of-speech words and/or the second preset part-of-speech words.

Optionally, the apparatus may further include:

For the device embodiment, since it is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 4 is a block diagram illustrating an apparatus 900 for intelligent word formation, according to an example embodiment. For example, the apparatus 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 4, apparatus 900 may include one or more of the following components: processing component 902, memory 904, power component 906, multimedia component 908, audio component 910, input/output (I/O) interface 912, sensor component 914, and communication component 916.

The processing component 902 generally controls overall operation of the device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing element 902 may include one or more processors 920 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 902 can include one or more modules that facilitate interaction between processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.

The memory 904 is configured to store various types of data to support operation at the device 900. Examples of such data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power supply component 906 provides power to the various components of the device 900. The power components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 900.

The multimedia component 908 comprises a screen providing an output interface between the device 900 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide motion action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 900 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 910 is configured to output and/or input audio signals. For example, audio component 910 includes a Microphone (MIC) configured to receive external audio signals when apparatus 900 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 also includes a speaker for outputting audio signals.

I/O interface 912 provides an interface between processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 914 includes one or more sensors for providing status assessment of various aspects of the apparatus 900. For example, the sensor assembly 914 may detect an open/closed state of the device 900, the relative positioning of the components, such as a display and keypad of the apparatus 900, the sensor assembly 914 may also detect a change in the position of the apparatus 900 or a component of the apparatus 900, the presence or absence of user contact with the apparatus 900, orientation or acceleration/deceleration of the apparatus 900, and a change in the temperature of the apparatus 900. The sensor assembly 914 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 916 is configured to facilitate communications between the apparatus 900 and other devices in a wired or wireless manner. The apparatus 900 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 916 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 904 comprising instructions, executable by the processor 920 of the apparatus 900 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

A non-transitory computer-readable storage medium is also provided, wherein, when the instructions in the storage medium are executed by a processor of an intelligent terminal, the intelligent terminal is enabled to perform an intelligent word formation method, the method comprising: receiving input content of a user; parsing the input content based on a part-of-speech template to obtain the template character strings to be grouped and the words to be grouped that match the part-of-speech template; using multivariate relationship data to perform word formation on the template character strings to be grouped, and/or on a template character string to be grouped together with its adjacent words to be grouped, to obtain the corresponding word formation result, where the multivariate relationship data records multivariate relations among template character strings, or between template character strings and words; and replacing the template character strings in the word formation result with the corresponding words to be grouped.

Fig. 5 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary widely by configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.

The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is only limited by the appended claims.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

The foregoing describes in detail an intelligent word organizing method, an intelligent word organizing device, and a device for intelligent word organizing provided by the present invention, and specific examples are applied herein to explain the principle and the implementation of the present invention, and the description of the above examples is only used to help understand the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. An intelligent word-composing method, comprising:

receiving input content of a user;

replacing the template character strings to be grouped in the word formation result with the corresponding words to be grouped;

wherein parsing the input content based on the part-of-speech template comprises:

2. The method according to claim 1, wherein using the multivariate relational data to perform word formation on the template character strings to be grouped and/or on a template character string to be grouped and its adjacent words to be grouped comprises:

3. The method of claim 1 or 2, wherein the input content comprises an input string, the method further comprising:

segmenting the input string to obtain a corresponding segmentation result;

4. The method of claim 3, further comprising:

5. The method of claim 3, wherein the input content further comprises the context corresponding to the input string, and the words to be grouped corresponding to the input content comprise the words to be grouped corresponding to the input string and to the context.

6. The method according to claim 1 or 2, wherein the multivariate relational data is obtained by:

7. The method according to claim 1 or 2, wherein the multivariate relational data is obtained by:

obtaining a plurality of adjacent words from a preset corpus, wherein the plurality of words comprise preset part-of-speech words;

8. The method of claim 1, wherein the preset part-of-speech words comprise first preset part-of-speech words and/or second preset part-of-speech words.

9. The method according to claim 1 or 2, wherein the part-of-speech template is constructed by:

10. An intelligent word forming device, comprising:

the content receiving module is used for receiving input content of a user;

the replacing module is used for replacing the to-be-grouped template character string in the word formation result with the corresponding to-be-grouped word;

wherein the parsing module comprises:

11. The apparatus of claim 10, wherein the word formation module comprises:

12. The apparatus of claim 10, wherein the input content comprises an input string, and the apparatus further comprises:

13. The apparatus of claim 12, further comprising:

14. The apparatus of claim 12, wherein the input content further comprises a context corresponding to the input string, and the to-be-grouped words corresponding to the input content comprise the to-be-grouped words corresponding to the input string and to the context.

15. The apparatus of claim 10, further comprising:

16. The apparatus of claim 10 or 11, further comprising:

17. The apparatus of claim 10, wherein the preset part-of-speech words comprise first preset part-of-speech words and/or second preset part-of-speech words.

18. The apparatus of claim 10 or 11, further comprising:

19. An apparatus for intelligent word formation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs comprise instructions for:

receiving input content of a user;

using multivariate relational data to perform word formation on the to-be-grouped template character string, and/or on the to-be-grouped template character string together with its adjacent to-be-grouped word, to obtain a corresponding word formation result, wherein the multivariate relational data records multivariate relations among template character strings, or between template character strings and words, and the template character strings correspond to part-of-speech templates;

20. The apparatus according to claim 19, wherein using the multivariate relational data to perform word formation on the to-be-grouped template character string, and/or on the to-be-grouped template character string together with its adjacent to-be-grouped word, comprises:

21. The apparatus of claim 19 or 20, wherein the input content comprises an input string, and the apparatus is further configured such that the one or more programs executed by the one or more processors include instructions for:

segmenting the input string to obtain a corresponding segmentation result;

22. The apparatus of claim 21, wherein the apparatus is further configured such that the one or more programs executed by the one or more processors include instructions for:

23. The apparatus of claim 21, wherein the input content further comprises a context corresponding to the input string, and the to-be-grouped words corresponding to the input content comprise the to-be-grouped words corresponding to the input string and to the context.

24. The apparatus of claim 19 or 20, wherein the apparatus is further configured such that the one or more programs executed by the one or more processors include instructions for:

25. The apparatus of claim 19 or 20, wherein the apparatus is further configured such that the one or more programs executed by the one or more processors include instructions for:

26. The apparatus of claim 19, wherein the preset part-of-speech words comprise first preset part-of-speech words and/or second preset part-of-speech words.

27. The apparatus of claim 19 or 20, wherein the apparatus is further configured such that the one or more programs executed by the one or more processors include instructions for:

28. One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the method of one or more of claims 1-9.
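As a reading aid, the flow recited in claims 1, 10 and 19 (receive the input, map part-of-speech words to template character strings, perform word formation using relations recorded among template strings and words, then substitute the original words back into the result) can be illustrated with a minimal Python sketch. This is not the patented implementation. The template string `<CITY>`, the toy lexicon `POS_LEXICON`, the bigram table `RELATIONS` standing in for the multivariate relational data, the greedy adjacent-grouping rule, and all function names are hypothetical assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical part-of-speech lexicon: word -> template character string.
POS_LEXICON: Dict[str, str] = {"Paris": "<CITY>", "London": "<CITY>"}

# Hypothetical stand-in for the multivariate relational data: bigram
# relations between template strings and ordinary words, with counts.
RELATIONS: Dict[Tuple[str, str], int] = {
    ("fly", "to"): 9,
    ("to", "<CITY>"): 7,
    ("<CITY>", "today"): 4,
}


def parse(tokens: List[str]) -> Tuple[List[str], Dict[int, str]]:
    """Swap lexicon words for their template strings; remember originals by position."""
    units: List[str] = []
    pending: Dict[int, str] = {}  # position -> to-be-grouped word
    for i, tok in enumerate(tokens):
        if tok in POS_LEXICON:
            units.append(POS_LEXICON[tok])
            pending[i] = tok
        else:
            units.append(tok)
    return units, pending


def form_words(units: List[str], min_count: int = 1) -> List[List[int]]:
    """Greedily join adjacent units into one group when a recorded relation links them."""
    groups: List[List[int]] = [[0]] if units else []
    for i in range(1, len(units)):
        if RELATIONS.get((units[i - 1], units[i]), 0) >= min_count:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def replace_back(units: List[str], groups: List[List[int]],
                 pending: Dict[int, str]) -> List[str]:
    """Replace each template string in the result with its original word."""
    return [" ".join(pending.get(i, units[i]) for i in g) for g in groups]


def compose(tokens: List[str]) -> List[str]:
    units, pending = parse(tokens)
    return replace_back(units, form_words(units), pending)


if __name__ == "__main__":
    print(compose(["fly", "to", "Paris", "today"]))
    print(compose(["to", "London"]))
```

Because the relation is stored once as `("to", "<CITY>")` rather than once per city name, both `compose(["to", "London"])` and `compose(["to", "Paris"])` group successfully, which mirrors the abstract's point about improving relation coverage while saving storage space. A word absent from the lexicon, such as `"Rome"`, gains no such coverage and stays ungrouped in this sketch.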