CN108073292B - Intelligent word forming method and device for intelligent word forming - Google Patents

Intelligent word forming method and device for intelligent word forming

Info

Publication number
CN108073292B
Authority
CN
China
Prior art keywords
word
speech
collocation
path
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610996202.5A
Other languages
Chinese (zh)
Other versions
CN108073292A (en
Inventor
费腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN201610996202.5A priority Critical patent/CN108073292B/en
Publication of CN108073292A publication Critical patent/CN108073292A/en
Application granted granted Critical
Publication of CN108073292B publication Critical patent/CN108073292B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233Character input methods
    • G06F3/0237Character input methods using prediction or retrieval techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention provides an intelligent word forming method and device and a device for intelligent word forming, wherein the method specifically comprises the following steps: acquiring input content of a user; acquiring vocabularies to be grouped corresponding to the input content and the parts of speech of the vocabularies to be grouped; determining part-of-speech collocation scores between adjacent words in a word-grouping path corresponding to the words to be grouped according to preset part-of-speech collocation rules and the parts of speech of the words to be grouped; the preset part-of-speech collocation rules are used for describing collocation relations among parts of speech; determining a path score of the word grouping path according to a part of speech collocation score between adjacent words contained in the word grouping path; and acquiring word forming candidates from the word forming path according to the path score. The embodiment of the invention can improve the reasonability and quality of the word forming candidates, so that the reasonable word forming candidates can be provided even under the condition of intelligent word forming failure, and the input efficiency of a user is further improved.

Description

Intelligent word forming method and device for intelligent word forming
Technical Field
The invention relates to the technical field of computer information input, in particular to an intelligent word forming method and device and a device for intelligently forming words.
Background
At present, interactive devices generally rely on an input method system to recognize the user's operation intention during interaction. For example, the user may enter an input string; the input method system converts the input string into candidate items of the corresponding language according to a preset standard mapping rule and displays them, and the candidate item selected by the user is then output to the screen.
When the word stock contains no entry directly hit by the input string, the input method system can trigger the intelligent word-forming function. The existing intelligent word-forming scheme searches a binary library for binary relations, calculates the path probability of the word string in each word-forming scheme according to which binary relations are hit, and returns the word-forming scheme with the maximum path probability to the user as the preferred candidate. A binary relation is a collocation relation between words; for example, "weather-good and hot", "I-know", "like-you", and "hundred thousand-eight thousand" may each be a binary relation. The intelligent word-forming function is very important: the quality of its results directly determines the quality of the input method system and directly affects the user experience.
In practical applications, intelligent word formation involving numerals, quantifiers, or adverbs often requires a very large number of binary relations. However, on the one hand, storage space is limited, so the number of stored binary relations is limited; on the other hand, the binary relations stored in the binary library are usually obtained by statistical learning, and it is difficult to guarantee that they cover all cases. Therefore, if no binary relation in the binary library is hit during intelligent word formation, the intelligent word formation fails. For example, if "ninety thousand-two thousand" and "two thousand-yuan" are not stored in the binary library, the words "ninety thousand", "two thousand", and "yuan" corresponding to the input string "jiuwanliangqianyuan" hit no binary relation in the binary library, and intelligent word formation fails. When intelligent word formation fails, the existing scheme usually combines the words with the highest word frequency to obtain the word-forming candidate; for example, the candidate obtained for the input string "jiuwanliangqianyuan" is "qianjianqian on playing", which is obviously a low-quality and unreasonable candidate with a low probability of matching the user's input intention.
Disclosure of Invention
In view of the above problems, embodiments of the present invention provide an intelligent word organizing method, an intelligent word organizing apparatus, and an apparatus for intelligent word organizing that overcome or at least partially solve the above problems, and the embodiments of the present invention can improve the rationality and quality of word organizing candidates, so that even in the case of failure of intelligent word organizing, more reasonable word organizing candidates can be provided, and further, the input efficiency of the user can be improved.
In order to solve the problems, the invention discloses an intelligent word forming method, which comprises the following steps:
acquiring input content of a user;
acquiring vocabularies to be grouped corresponding to the input content and the parts of speech of the vocabularies to be grouped;
determining part-of-speech collocation scores between adjacent words in a word-grouping path corresponding to the words to be grouped according to preset part-of-speech collocation rules and the parts of speech of the words to be grouped; the preset part-of-speech collocation rules are used for describing collocation relations among parts of speech;
determining a path score of the word grouping path according to a part of speech collocation score between adjacent words contained in the word grouping path;
and acquiring word forming candidates from the word forming path according to the path score.
Optionally, the step of determining a part-of-speech collocation score between adjacent words in a word grouping path corresponding to the to-be-grouped word includes:
determining the part of speech of adjacent words in the word-forming path corresponding to the word-forming vocabularies to be formed according to the part of speech of each word-forming vocabularies to be formed;
and when the part-of-speech collocation of the adjacent words accords with a preset part-of-speech collocation rule, taking the score corresponding to the preset part-of-speech collocation rule as the part-of-speech collocation score between the adjacent words.
Optionally, the score corresponding to the preset part-of-speech collocation rule is obtained through the following steps:
acquiring part-of-speech collocation contents which accord with the preset part-of-speech collocation rule from a preset corpus;
counting collocation probabilities between adjacent words in the part-of-speech collocation contents;
and determining a score corresponding to the preset part-of-speech collocation rule according to the collocation probability between adjacent words in all part-of-speech collocation contents.
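A minimal sketch of how such a score might be derived from a tagged corpus is given below. The corpus format (sentences as lists of (word, part-of-speech) pairs), the function name `rule_score`, and the choice to average the per-pair collocation probabilities are assumptions made for illustration; the steps above do not fix a particular formula.

```python
from collections import Counter

def rule_score(tagged_sentences, rule):
    """Estimate a score for one part-of-speech rule (hypothetical helper).

    tagged_sentences: list of sentences, each a list of (word, pos) pairs.
    rule: (left_pos, right_pos) describing the collocation rule.
    """
    left_pos, right_pos = rule
    pair_counts = Counter()   # how often (w1, w2) occurs and matches the rule
    left_counts = Counter()   # how often w1 occurs as the left word of any pair
    for sentence in tagged_sentences:
        for (w1, p1), (w2, p2) in zip(sentence, sentence[1:]):
            left_counts[w1] += 1
            if (p1, p2) == (left_pos, right_pos):
                pair_counts[(w1, w2)] += 1
    if not pair_counts:
        return 0.0
    # Collocation probability of each matching pair: count(w1, w2) / count(w1).
    probs = [c / left_counts[w1] for (w1, _w2), c in pair_counts.items()]
    # Assumption: the rule score is the mean of these probabilities.
    return sum(probs) / len(probs)

corpus = [
    [("very", "adverb"), ("good", "adjective")],
    [("very", "adverb"), ("good", "adjective")],
    [("very", "adverb"), ("run", "verb")],
]
adv_adj = rule_score(corpus, ("adverb", "adjective"))
```

Other aggregations (for example a maximum, or a mapping of the probability onto discrete score levels) would fit the same steps equally well.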
Optionally, the input content includes: an input string, and the method further comprises:
segmenting the input string to obtain a corresponding segmentation result;
and searching in a word bank to obtain words matched with the segmentation result, wherein the words are used as words to be grouped corresponding to the input string.
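The segmentation and word-bank lookup can be sketched as follows. This is only an illustration: the toy `LEXICON`, its part-of-speech tags, and the helper names `segmentations` and `words_to_group` are hypothetical, and a real input method would use a syllable-aware segmenter rather than this exhaustive split.

```python
# Hypothetical word bank: pinyin key -> list of (word, part of speech).
# The tags are assumptions for illustration only.
LEXICON = {
    "jiuwan": [("ninety thousand", "numeral")],
    "liangqian": [("two thousand", "numeral")],
    "yuan": [("yuan", "quantifier"), ("hospital", "noun")],
}

def segmentations(s, lexicon):
    """Return every way to split s into consecutive keys of lexicon."""
    if not s:
        return [[]]
    results = []
    for end in range(1, len(s) + 1):
        head = s[:end]
        if head in lexicon:
            for rest in segmentations(s[end:], lexicon):
                results.append([head] + rest)
    return results

def words_to_group(s, lexicon):
    """Collect the words (with parts of speech) matched by any segmentation."""
    words = []
    for segmentation in segmentations(s, lexicon):
        for key in segmentation:
            for entry in lexicon[key]:
                if entry not in words:
                    words.append(entry)
    return words

segs = segmentations("jiuwanliangqianyuan", LEXICON)
words = words_to_group("jiuwanliangqianyuan", LEXICON)
```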
Optionally, the input content further comprises: the context corresponding to the input string; in this case, the words to be grouped corresponding to the input content comprise: the words to be grouped corresponding to the input string and to the context.
Optionally, the step of determining the path score of the word grouping path according to the part-of-speech collocation score between adjacent words included in the word grouping path includes:
obtaining a path score of the word grouping path according to the part of speech collocation scores between all adjacent words contained in the word grouping path; or
And obtaining a path score of the word forming path according to the part of speech collocation scores of all adjacent words contained in the word forming path and the binary relation score hit by the word forming path.
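The two alternatives above can be sketched with a single function. How the scores are combined is not specified in the text; the plain sum and the `bigram_weight` parameter below are assumptions chosen only to make the example concrete.

```python
def path_score(pos_scores, bigram_scores=None, bigram_weight=1.0):
    """Combine per-pair scores into one path score (hypothetical formula).

    pos_scores: part-of-speech collocation scores of all adjacent pairs.
    bigram_scores: scores of the binary relations hit by the path, if any.
    """
    total = sum(pos_scores)
    if bigram_scores:
        # Second alternative: also take the hit binary relations into account.
        total += bigram_weight * sum(bigram_scores)
    return total

pos_only = path_score([1.0, 1.0])
combined = path_score([1.0, 0.4], bigram_scores=[0.5])
```

A product of scores, or a sum of log-probabilities, would be an equally plausible combination when the scores are interpreted as probabilities.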
Optionally, before the step of determining a part-of-speech matching score between adjacent words in a word-grouping path corresponding to the word-grouping object according to a preset part-of-speech matching rule and the part-of-speech of each word-grouping object, the method further includes:
searching in a binary library according to adjacent words in the word forming path corresponding to the words to be formed to obtain a binary relation matched with the adjacent words;
and when the search of the binary library misses, executing the step of determining the part-of-speech collocation score between adjacent words in the word-forming path corresponding to the words to be grouped according to a preset part-of-speech collocation rule and the part of speech of each word to be grouped.
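The lookup-then-fallback order described above can be sketched as follows. The dictionary-based binary library, the rule table, the returned source label, and the default of 0.0 when no rule matches are all assumptions for illustration.

```python
def adjacent_score(left, right, bigram_library, pos_rules):
    """Score one adjacent pair: binary library first, POS rule on a miss.

    left, right: (word, part of speech) pairs.
    Returns (score, source), where source names the table that answered.
    """
    (w1, p1), (w2, p2) = left, right
    hit = bigram_library.get((w1, w2))
    if hit is not None:
        return hit, "bigram"
    # Binary library missed: fall back to the part-of-speech collocation rule.
    # Assumption: pairs matching no rule score 0.0.
    return pos_rules.get((p1, p2), 0.0), "pos_rule"

# Hypothetical data; only one numeral pair is stored in the binary library.
bigram_library = {("ninety thousand", "one thousand"): 0.9}
pos_rules = {("numeral", "numeral"): 1.0}

stored = adjacent_score(("ninety thousand", "numeral"),
                        ("one thousand", "numeral"),
                        bigram_library, pos_rules)
missing = adjacent_score(("ninety thousand", "numeral"),
                         ("two thousand", "numeral"),
                         bigram_library, pos_rules)
```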
Optionally, the step of obtaining word grouping candidates from the word grouping path according to the path score includes:
ranking the path scores;
and selecting the word forming paths ranked at the top N from the word forming paths as word forming candidates according to the ranking result of the path scores.
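The ranking and top-N selection can be sketched as follows; the (path, score) tuple layout and the example scores are assumptions for illustration.

```python
import heapq

def top_n_candidates(scored_paths, n):
    """Return the n highest-scoring word-forming paths, best first."""
    ranked = heapq.nlargest(n, scored_paths, key=lambda item: item[1])
    return [path for path, _score in ranked]

# Hypothetical scored paths: (tuple of words, path score).
scored_paths = [
    (("ninety thousand", "two thousand", "yuan"), 2.0),
    (("play", "qian", "hospital"), 0.7),
    (("ninety thousand", "two thousand", "hospital"), 1.0),
]
best_two = top_n_candidates(scored_paths, 2)
```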
Optionally, the preset part-of-speech collocation rule includes: at least one of collocation rules between the numerals, collocation rules between adverbs and verbs, collocation rules between adverbs and adjectives, collocation rules between verbs and nouns, collocation rules between adjectives and nouns, and collocation rules between quantifiers and nouns.
On the other hand, the invention discloses an intelligent word-composing device, comprising:
the content receiving module is used for acquiring input content of a user;
the word part of speech acquisition module is used for acquiring the words to be grouped corresponding to the input content and the parts of speech of the words to be grouped;
a collocation score determining module, configured to determine a part-of-speech collocation score between adjacent words in a word-grouping path corresponding to the word-grouping object according to a preset part-of-speech collocation rule and the part-of-speech of each word-grouping object; the preset part-of-speech collocation rules are used for describing collocation relations among parts of speech;
a path score determining module, configured to determine a path score of the word grouping path according to a part-of-speech collocation score between adjacent words included in the word grouping path; and
and the word group candidate acquisition module is used for acquiring word group candidates from the word group path according to the path score.
Optionally, the collocation score determination module includes:
the part of speech determining submodule is used for determining the part of speech of adjacent words in the word forming path corresponding to the word to be formed according to the part of speech of each word to be formed; and
and the score determining submodule is used for taking the score corresponding to the preset part-of-speech collocation rule as the part-of-speech collocation score between the adjacent words when the part-of-speech collocation of the adjacent words accords with the preset part-of-speech collocation rule.
Optionally, the apparatus further comprises: a score obtaining module for obtaining a score corresponding to the preset part of speech collocation rule;
the score acquisition module comprises:
the part-of-speech matching content submodule is used for acquiring part-of-speech matching contents which accord with the preset part-of-speech matching rule from a preset corpus;
the collocation probability statistic submodule is used for counting collocation probabilities between adjacent words in the part-of-speech collocation contents; and
and the score determining submodule is used for determining a score corresponding to the preset part-of-speech collocation rule according to collocation probabilities between adjacent words in all part-of-speech collocation contents.
Optionally, the input content includes: an input string, and the apparatus further comprises:
the segmentation module is used for segmenting the input string to obtain a corresponding segmentation result;
and the word stock searching module is used for searching in a word stock to obtain words matched with the segmentation result and used as the words to be grouped corresponding to the input string.
Optionally, the input content further comprises: the context corresponding to the input string; in this case, the words to be grouped corresponding to the input content comprise: the words to be grouped corresponding to the input string and to the context.
Optionally, the path score determining module comprises:
a first path score determining sub-module, configured to obtain a path score of the word grouping path according to a part-of-speech collocation score between all adjacent words included in the word grouping path; or
And the second path score determining submodule is used for obtaining the path score of the word forming path according to the part of speech collocation scores of all adjacent words contained in the word forming path and the binary relation score hit by the word forming path.
Optionally, the apparatus further comprises:
and the binary library searching module is used for, before the collocation score determining module determines the part-of-speech collocation score between adjacent words in the word-forming path corresponding to the words to be grouped according to a preset part-of-speech collocation rule and the part of speech of each word to be grouped, searching the binary library according to the adjacent words in the word-forming path to obtain a binary relation matched with the adjacent words, and for triggering the collocation score determining module when the search of the binary library misses.
Optionally, the word group candidate obtaining module includes:
a sorting submodule for sorting the path scores;
and the selection sub-module is used for selecting the word forming paths ranked at the top N from the word forming paths as word forming candidates according to the ranking result of the path scores.
Optionally, the preset part-of-speech collocation rule includes: at least one of collocation rules between the numerals, collocation rules between adverbs and verbs, collocation rules between adverbs and adjectives, collocation rules between verbs and nouns, collocation rules between adjectives and nouns, and collocation rules between quantifiers and nouns.
In yet another aspect, an apparatus for intelligent word formation is disclosed that includes a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the one or more processors to include instructions for:
acquiring input content of a user;
acquiring vocabularies to be grouped corresponding to the input content and the parts of speech of the vocabularies to be grouped;
determining part-of-speech collocation scores between adjacent words in a word-grouping path corresponding to the words to be grouped according to preset part-of-speech collocation rules and the parts of speech of the words to be grouped; the preset part-of-speech collocation rules are used for describing collocation relations among parts of speech;
determining a path score of the word grouping path according to a part of speech collocation score between adjacent words contained in the word grouping path;
and acquiring word forming candidates from the word forming path according to the path score.
The embodiment of the invention has the following advantages:
In the embodiment of the invention, during intelligent word formation, preset part-of-speech collocation rules are used to determine the part-of-speech collocation scores between adjacent words in the word-forming paths corresponding to the words to be grouped. The preset part-of-speech collocation rules describe collocation relations between parts of speech: usually, the stronger the collocation relation between parts of speech, the higher the corresponding part-of-speech collocation score, and the weaker the relation, the lower the score. Because the part-of-speech collocation score serves as the basis of the path score, a word-forming path whose parts of speech collocate strongly receives a higher path score than one whose parts of speech collocate weakly, and is therefore more likely to be selected as a word-forming candidate. This improves the rationality and quality of the word-forming candidates, so that more reasonable candidates can be provided even when intelligent word formation fails, which in turn improves the user's input efficiency.
Drawings
FIG. 1 is a flowchart illustrating the steps of a first embodiment of an intelligent word-composing method of the present invention;
FIG. 2 is a flowchart illustrating the steps of a second embodiment of the intelligent word organizing method of the present invention;
FIG. 3 is a block diagram of an embodiment of an intelligent word forming apparatus according to the present invention;
FIG. 4 is a block diagram of an apparatus 900 for intelligent word formation in accordance with the present invention; and
FIG. 5 is a schematic diagram of a server in some embodiments of the invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Method embodiment one
Referring to FIG. 1, a flowchart illustrating the steps of a first embodiment of an intelligent word-forming method according to the present invention is shown, which may specifically include the following steps:
Step 101, acquiring input content of a user;
Step 102, acquiring the words to be grouped corresponding to the input content and the parts of speech of the words to be grouped;
Step 103, determining part-of-speech collocation scores between adjacent words in a word-forming path corresponding to the words to be grouped according to preset part-of-speech collocation rules and the parts of speech of the words to be grouped; the preset part-of-speech collocation rules are used for describing collocation relations among parts of speech;
Step 104, determining a path score of the word-forming path according to the part-of-speech collocation scores between adjacent words contained in the word-forming path;
Step 105, acquiring word-forming candidates from the word-forming path according to the path score.
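Steps 103 to 105 can be sketched end to end as follows, assuming steps 101 and 102 have already produced, for each segment of the input, a list of candidate words with their parts of speech. The rule table values, the part-of-speech tags, the summation of scores, and the function name `form_candidates` are assumptions for illustration and not the claimed implementation.

```python
from itertools import product

# Hypothetical rule table: (left POS, right POS) -> collocation score.
POS_RULES = {
    ("numeral", "numeral"): 1.0,
    ("numeral", "quantifier"): 1.0,
    ("verb", "noun"): 0.7,
    ("quantifier", "noun"): 0.4,
}

def form_candidates(slots, rules, n=1):
    """Enumerate word-forming paths, score them, and return the top n."""
    scored = []
    for path in product(*slots):  # one word per segment
        # Step 103: score each adjacent pair by its part-of-speech rule.
        pair_scores = [rules.get((a[1], b[1]), 0.0)
                       for a, b in zip(path, path[1:])]
        # Step 104: path score (assumption: plain sum of the pair scores).
        scored.append((sum(pair_scores), [word for word, _pos in path]))
    # Step 105: rank by path score and keep the best n paths.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [words for _score, words in scored[:n]]

# Candidate words per segment of "jiuwan" + "liangqian" + "yuan";
# the English glosses follow the translated text, the tags are assumed.
slots = [
    [("ninety thousand", "numeral"), ("play", "verb")],
    [("two thousand", "numeral"), ("qian", "noun")],
    [("yuan", "quantifier"), ("hospital", "noun")],
]
best = form_candidates(slots, POS_RULES, n=1)
```

In this toy setting the numeral path wins without any word-level binary relation being stored, which is the situation the method is aimed at.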
The embodiment of the invention can be applied to input method systems with various input modes, for example keyboard symbols, handwriting, and voice input; that is, a user can enter on-screen content through coded character strings, handwriting features, and the like. Taking voice input as an example, the input method system can collect a voice signal input by the user, convert the voice signal into text information, and segment the text information to obtain the words to be grouped. The following description mainly takes the input of a code string (hereinafter referred to as an input string) as an example; the other input modes can be handled by analogy.
In the field of input method systems, whether for Chinese, Japanese, Korean, or other languages, the user's input string is converted into candidate items of the corresponding language, and the user then selects the content to be output to an application program; the content output to the application program through the on-screen operation is the on-screen content. In converting the user's input string into candidate items, the entry corresponding to the input string can be searched for directly in the word stock, and a found entry can be used as a candidate item; for example, the entries corresponding to input strings such as "nihao" or "tianqihenhao" ("the weather is very good") can be found directly in the word stock. Optionally, the word stock of the embodiment of the present invention may specifically include: a system thesaurus, a user thesaurus, a cell thesaurus, a cloud thesaurus, and the like; the embodiment of the present invention does not limit the specific thesaurus.
However, in practical applications, the word stock may for many reasons contain no entry directly hit by an input string. Optionally, this can happen when the user wants to input a long piece of content (such as a phrase or a long sentence) or content that has never been input before; in such cases the input method system can trigger the intelligent word-forming function. For example, a user may want to enter "ninety-two thousand yuan" through the input string "jiuwanliangqianyuan", "ninety-two thousand" through "jiuwanliangqian", "put down gently" through "qingqingdifangxia", or "better understand the present invention" through "genghaodilijiebenfaming", and the word stock may contain no entry that these input strings hit directly.
The existing intelligent word-forming scheme uses the binary relations (collocation relations between words) in the binary library to form words for an input string. However, intelligent word formation involving numerals, quantifiers, or adverbs often requires a very large number of binary relations, which places high demands on the size and storage space of the binary library and often causes intelligent word formation to fail because the coverage of the binary relations is insufficient. Taking intelligent word formation of numerals as an example, the collocation relations among all numerals would need to be stored in the binary library, and if the stored coverage is insufficient, intelligent word formation fails. Even if a large number of binary relations such as "ten thousand-one thousand", "twenty thousand-one thousand", "thirty thousand-one thousand", …, "ninety thousand-one thousand", "twenty thousand-two thousand", …, "ninety thousand-nine thousand", "one thousand-one hundred", …, "nine thousand-nine hundred" are stored in the binary library, if "ninety thousand-eight thousand" and "eighty thousand-two hundred" are not stored, intelligent word formation may fail when the input string is "jiuwanlilalgianwa".
To address the problems of intelligent word formation involving numerals, quantifiers, or adverbs, the embodiment of the invention provides preset part-of-speech collocation rules and uses them, during intelligent word formation, to determine the part-of-speech collocation scores between adjacent words in the word-forming paths corresponding to the words to be grouped. The preset part-of-speech collocation rules describe collocation relations between parts of speech: usually, the stronger the collocation relation between parts of speech, the higher the corresponding part-of-speech collocation score, and the weaker the relation, the lower the score. Because the part-of-speech collocation score serves as the basis of the path score, a word-forming path whose parts of speech collocate strongly receives a higher path score than one whose parts of speech collocate weakly, and is therefore more likely to be selected as a word-forming candidate. This improves the rationality and quality of the word-forming candidates, so that more reasonable candidates can be provided even when intelligent word formation fails, which in turn improves the user's input efficiency.
In this embodiment of the present invention, optionally, the input content may include an input string, and the embodiment of the invention can search the word stock to obtain the words to be grouped corresponding to the input string. For example, if the input string is "jiuwanliangqianyuan", the corresponding words to be grouped may include: "ninety thousand", "two thousand", "yuan", or "play", "qian", "hospital", etc.
In another optional embodiment of the present invention, the input content may include, in addition to the input string, the context corresponding to the input string. The preceding context applies to a scenario in which the user enters coherent content in several steps. For example, if the user wants to input "eighty thousand, three hundred and forty", first inputs "eighty thousand" and commits it to the screen, and then inputs "liangqian", the words corresponding to "eighty thousand" and "liangqian" may be used together as the words to be grouped. The following context applies to a scenario in which the user edits content that is already on screen. For example, if the user first inputs "today is sunny", then moves the cursor to the position before "sunny" and types the input string "feichang", an embodiment of the present invention may group the words corresponding to "feichang" with the following text "sunny". It is understood that the embodiment of the present invention does not impose any limitation on the specific word-forming scenarios corresponding to the context.
In the embodiment of the present invention, the preset part-of-speech collocation rule may be used to describe a collocation relationship between arbitrary parts of speech, such as the same part of speech or different parts of speech. Moreover, the preset part-of-speech collocation rule can relate to collocation relationship between two or more than two parts of speech. Optionally, the preset part-of-speech collocation rule may specifically include: at least one of collocation rules between the numerals, collocation rules between adverbs and verbs, collocation rules between adverbs and adjectives, collocation rules between verbs and nouns, collocation rules between adjectives and nouns, and collocation rules between quantifiers and nouns. It can be understood that those skilled in the art can determine the required preset part-of-speech collocation rules according to the actual application requirements, and any collocation relationship between parts of speech is within the protection scope of the preset part-of-speech collocation rules of the embodiments of the present invention.
The embodiment of the invention can carry out word formation on the words to be grouped so as to obtain the corresponding word-forming paths. For example, each word-forming path may include n words to be grouped, respectively denoted as V_1, V_2, …, V_i, …, V_n. In the word formation process of the words to be grouped, the embodiment of the present invention may determine the part-of-speech collocation score between adjacent words in the word-forming path corresponding to the words to be grouped according to a preset part-of-speech collocation rule and the part of speech of each word to be grouped. Optionally, the part-of-speech collocation score between adjacent words may be expressed as the collocation score between V_{i-1} and V_i, or as the collocation score among V_{i-1}, V_i, and V_{i+1}.
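The pairs and triples of adjacent words mentioned above can be enumerated with a sliding window, as in the short sketch below; the helper name `adjacent_windows` is hypothetical.

```python
def adjacent_windows(path, size=2):
    """Return every run of `size` consecutive words in a word-forming path."""
    return [tuple(path[i:i + size]) for i in range(len(path) - size + 1)]

path = ["V1", "V2", "V3", "V4"]
pairs = adjacent_windows(path, size=2)    # (V_{i-1}, V_i)
triples = adjacent_windows(path, size=3)  # (V_{i-1}, V_i, V_{i+1})
```

A collocation rule over two parts of speech would be applied to each pair, and a rule over three parts of speech to each triple.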
In an optional embodiment of the present invention, step 103 of determining the part-of-speech collocation score between adjacent words in the word-forming path corresponding to the words to be grouped may specifically include: determining the parts of speech of adjacent words in the word-forming path according to the part of speech of each word to be grouped; and, when the part-of-speech collocation of the adjacent words conforms to a preset part-of-speech collocation rule, taking the score corresponding to that rule as the part-of-speech collocation score between the adjacent words. Assuming that the number of words to be grouped corresponding to the input content is P and that each word-forming path includes n words to be grouped (usually P is greater than n), the parts of speech of the adjacent words in each word-forming path can be determined from the parts of speech of the P words to be grouped. For example, from the parts of speech of all the words to be grouped corresponding to the input string "jiuwanliangqianyuan", the parts of speech of the adjacent words in each word-forming path can be obtained, such as those in word-forming path 1 "ninety thousand + two thousand + yuan" or those in word-forming path 2 "play + qian lian + hospital".
In the embodiment of the present invention, optionally, the score corresponding to the preset part-of-speech matching rule may be obtained by presetting, for example, the input method system may preset the score corresponding to the preset part-of-speech matching rule based on experience, or the user may preset the score corresponding to the preset part-of-speech matching rule based on the user's own needs, and the like.
In an optional embodiment of the present invention, the scores corresponding to the preset part-of-speech collocation rules may be divided into a plurality of score levels, where different score levels indicate the strength of the collocation relationship between parts of speech. For example, the number of score levels may be 3. Table 1 shows an example of preset part-of-speech collocation rules and their corresponding scores, where A > B > C. For example, "ninety thousand" and "two thousand" are both numerals, and the collocation relationship between them is very strong, so the corresponding score may be A; the collocation relationship between a quantifier and a noun, for example between the quantifier "station" and the noun "TV", is weaker, so the corresponding score may be C. Optionally, A is equal to 1, B is equal to 0.7, and C is equal to 0.4. It can be understood that a person skilled in the art may determine the values of A, B, and C according to actual application requirements, and the embodiment of the present invention does not limit the specific score values corresponding to the preset part-of-speech collocation rules.
TABLE 1
Preset part-of-speech collocation rule | Score
Collocation rule between numerals | A
Collocation rule between numerals and quantifiers | A
Collocation rule between verbs and nouns | B
Collocation rule between adjectives and nouns | B
Collocation rule between adverbs and verbs | B
Collocation rule between adverbs and adjectives | B
Collocation rule between quantifiers and nouns | C
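The rules of Table 1 can be held in a small lookup table. The following is a minimal sketch, not the patented implementation: the names `POS_RULES` and `collocation_score` are hypothetical, the values A = 1, B = 0.7, C = 0.4 are the optional values mentioned above, and treating each rule as an ordered (left, right) pair and scoring an unmatched pair as 0 are assumptions made for illustration.

```python
# Optional score levels from the description; only A > B > C is required.
A, B, C = 1.0, 0.7, 0.4

# Table 1 rules keyed by (part of speech of left word, part of speech of right word).
# Assumption: rules are ordered pairs; the source does not state this explicitly.
POS_RULES = {
    ("numeral", "numeral"): A,
    ("numeral", "quantifier"): A,
    ("verb", "noun"): B,
    ("adjective", "noun"): B,
    ("adverb", "verb"): B,
    ("adverb", "adjective"): B,
    ("quantifier", "noun"): C,
}


def collocation_score(left_pos, right_pos):
    """Return the score of the matching rule, or 0.0 when no rule matches."""
    return POS_RULES.get((left_pos, right_pos), 0.0)
```

For instance, `collocation_score("numeral", "quantifier")` yields the level-A score, while a noun followed by a noun matches no rule in this sketch and scores 0.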
In an optional embodiment of the present invention, the score corresponding to the preset part-of-speech collocation rule may be obtained based on statistics of the preset corpus, and accordingly, the process of obtaining the score corresponding to the preset part-of-speech collocation rule may include: acquiring part-of-speech collocation contents which accord with the preset part-of-speech collocation rule from a preset corpus; counting collocation probabilities between adjacent words in the part-of-speech collocation contents; and determining a score corresponding to the preset part-of-speech collocation rule according to the collocation probability between adjacent words in all part-of-speech collocation contents.
In practical applications, the preset corpus may be derived from an existing corpus; for example, for fast input of Chinese, the existing corpus may include a Chinese corpus. Alternatively, the preset corpus may be derived from well-known books, Internet corpora, historical input records kept by an input method program, or the like. It is understood that any corpus is within the scope of the preset corpus of the embodiment of the present invention.
In the embodiment of the present invention, part-of-speech collocation contents that conform to the preset part-of-speech collocation rule may be obtained from the preset corpus. For example, for the collocation rule between numerals, part-of-speech collocation contents such as "ten thousand-one thousand", "twenty thousand-one thousand", "thirty thousand-one thousand", "ninety thousand-one thousand", and "twenty thousand-two thousand" may be obtained from the preset corpus. Further, the collocation probability between adjacent words in each part-of-speech collocation content may be obtained statistically. Optionally, the collocation probability may be obtained from the adjacent co-occurrence probability of the adjacent words: for example, if segmenting the preset corpus yields Q sentences or word strings, and a given part-of-speech collocation content occurs M times in those Q sentences or word strings, the corresponding adjacent co-occurrence probability is M/Q.
When determining the score corresponding to the preset part-of-speech collocation rule from the collocation probabilities between adjacent words in all part-of-speech collocation contents, the collocation probabilities may be averaged and the average used as the score, or they may be weighted and averaged and the weighted average used as the score. In an application example of the present invention, for the collocation rule between numerals, the collocation probability between adjacent words is high in all part-of-speech collocation contents, so the corresponding score is also high. For the collocation rule between quantifiers and nouns, the collocation probability is high in some part-of-speech collocation contents (such as the quantifier "table" with the noun "TV", or the quantifier "person" with the noun "apple") and low in others (such as the quantifier "table" with the noun "man", or the quantifier "bar" with the noun "man"), so the corresponding score is lower.
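The corpus-based derivation of a rule score can be sketched as follows. This is an illustrative sketch under stated assumptions: sentences are represented as already-segmented lists of words, M counts every adjacent occurrence of the pair across the Q sentences, and the function names `adjacent_cooccurrence_probability` and `rule_score` are hypothetical.

```python
def adjacent_cooccurrence_probability(pair, sentences):
    """Return M/Q: M adjacent occurrences of `pair` over Q segmented sentences."""
    q = len(sentences)
    if q == 0:
        return 0.0
    m = sum(
        1
        for sentence in sentences
        for adjacent in zip(sentence, sentence[1:])
        if adjacent == pair
    )
    return m / q


def rule_score(pairs, sentences, weights=None):
    """Average (or weighted-average) the probabilities of all pairs under one rule."""
    probabilities = [adjacent_cooccurrence_probability(p, sentences) for p in pairs]
    if not probabilities:
        return 0.0
    if weights is None:
        return sum(probabilities) / len(probabilities)
    return sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)
```

A rule whose matching pairs all co-occur often (such as numeral with numeral) ends up with a high average, whereas a rule that mixes frequent and rare pairs ends up lower, which mirrors the reasoning given above.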
In another optional embodiment of the present invention, the score corresponding to the preset part-of-speech collocation rule may be obtained based on statistics of binary relations recorded in the binary library, and specifically, multiple binary relations conforming to the preset part-of-speech collocation rule may be obtained from the binary library, and the collocation probabilities between two words corresponding to the multiple binary relations are averaged to obtain the score corresponding to the preset part-of-speech collocation rule. Taking the collocation rule between the digital words as an example, all binary relations conforming to the collocation rule between the digital words can be obtained from the binary library, such as "ten thousand-one thousand", "twenty thousand-one thousand", "thirty thousand-one thousand", "nine thousand-one thousand", "twenty thousand-two thousand", and the like, and the collocation probabilities between two words corresponding to the various binary relations are averaged. The embodiment of the present invention does not limit the specific process of obtaining the score corresponding to the preset part-of-speech collocation rule based on the statistics of the binary relation recorded in the binary library.
The preset part-of-speech collocation rules above are described mainly by taking Chinese as an example. It can be understood that a person skilled in the art may set applicable preset part-of-speech collocation rules for languages other than Chinese according to actual application requirements, for example rules based on the parts of speech of English, Japanese, or French. The collocation relationship between any parts of speech of any language is within the protection scope of the preset part-of-speech collocation rules of the embodiment of the present invention.
In an optional embodiment of the present invention, step 103 may have a corresponding trigger condition, and specifically, before step 103, the method may further include: searching in a binary library according to adjacent words in the word forming path corresponding to the words to be formed to obtain a binary relation matched with the adjacent words; and when the search of the binary library is not hit, executing the step 103 of determining part of speech collocation scores between adjacent words in the word-group path corresponding to the word-group to be formed according to preset part of speech collocation rules and the part of speech of each word-group to be formed. Of course, step 103 may be executed without any trigger condition, or step 103 may be executed when the search of the binary library hits, in which case, the path score of the word group path may be obtained according to the part of speech collocation scores of all adjacent words included in the word group path and the binary relation score hit by the word group path at the same time. It is understood that the embodiment of the present invention does not impose any limitation on the specific trigger condition of step 103.
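One of the trigger conditions described above, in which the part-of-speech rules are consulted only when the binary library is not hit, can be sketched as below. The function name `score_adjacent_pair`, the dictionary-based binary library, and all numeric values are hypothetical and serve only to illustrate the control flow.

```python
def score_adjacent_pair(left, right, binary_library, pos_of, pos_rules):
    """Score one adjacent word pair, using POS rules only on a binary-library miss.

    Returns a (source, score) tuple so the caller can see which branch was taken.
    """
    if (left, right) in binary_library:
        return ("binary", binary_library[(left, right)])
    rule_key = (pos_of.get(left), pos_of.get(right))
    return ("pos_rule", pos_rules.get(rule_key, 0.0))


# Illustrative data only; none of these values come from the source.
binary_library = {("ten thousand", "one thousand"): 0.9}
pos_of = {
    "ten thousand": "numeral",
    "one thousand": "numeral",
    "ninety thousand": "numeral",
    "two thousand": "numeral",
}
pos_rules = {("numeral", "numeral"): 1.0}

hit = score_adjacent_pair("ten thousand", "one thousand", binary_library, pos_of, pos_rules)
miss = score_adjacent_pair("ninety thousand", "two thousand", binary_library, pos_of, pos_rules)
```

Here `hit` comes from the binary library, while `miss` falls back to the numeral-numeral rule because "ninety thousand-two thousand" is not recorded in the illustrative library.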
Step 104 may determine a path score of the word grouping path according to the part-of-speech collocation score between adjacent words included in the word grouping path output in step 103. In an alternative embodiment of the present invention, step 104 may comprise:
obtaining a path score of the word grouping path according to the part of speech collocation scores between all adjacent words contained in the word grouping path; or
And obtaining a path score of the word forming path according to the part of speech collocation scores of all adjacent words contained in the word forming path and the binary relation score hit by the word forming path.
In practical applications, the path score may be based on the part-of-speech collocation score alone, or on a combination of the part-of-speech collocation score and other scores. The other scores may optionally include: a binary relation score (i.e., the score when a binary relation recorded in the binary library is hit), the word frequency of each word to be grouped in the word-grouping path, the word bank to which each word belongs (where the score of the user word bank is greater than that of a non-user word bank), and the like. When the combination is adopted, the part-of-speech collocation score and the other scores may be weighted and averaged; that is, the part-of-speech collocation score, the binary relation score, the word frequency, the word bank, and the like may each have a corresponding weight. It can be understood that a person skilled in the art may determine the weights according to actual application requirements, for example 0.3, 0.4, 0.15, and the like; the embodiment of the present invention does not limit the specific weights corresponding to the part-of-speech collocation score, the binary relation score, the word frequency, and the word bank.
In an optional embodiment of the present invention, in order to ensure the priority of the binary relationship, the weight of the part of speech collocation score does not exceed the weight of the binary relationship score, and of course, the embodiment of the present invention does not limit the specific weights of the part of speech collocation score and the binary relationship score.
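The weighted combination can be sketched as follows. This is a sketch under assumptions, not the source's formula: the component names (`pos`, `binary`, `frequency`, `word_bank`), the function name `combined_path_score`, and the weights in the usage lines are hypothetical, and the optional priority constraint is modeled as a flag.

```python
def combined_path_score(components, weights, enforce_binary_priority=True):
    """Weighted average of score components for one word-grouping path.

    `components` and `weights` are dicts sharing the same keys.
    """
    if enforce_binary_priority and weights["pos"] > weights["binary"]:
        raise ValueError("POS collocation weight must not exceed the binary relation weight")
    total_weight = sum(weights.values())
    return sum(weights[name] * components[name] for name in weights) / total_weight


# Hypothetical weights and component scores chosen only for illustration.
weights = {"pos": 0.25, "binary": 0.5, "frequency": 0.125, "word_bank": 0.125}
components = {"pos": 2.0, "binary": 0.0, "frequency": 0.5, "word_bank": 1.0}
score = combined_path_score(components, weights)
```

Keeping the priority check behind a flag reflects that the constraint belongs to one optional embodiment rather than to the method as a whole.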
Step 105 may obtain word-grouping candidates from the word-grouping paths according to the path scores output in step 104. For example, the word-grouping path with the highest path score may be selected as the word-grouping candidate; or the word-grouping paths whose path scores are greater than a score threshold may be selected; or several word-grouping paths with the highest path scores may be selected. Specifically, the path scores may be sorted, and the top N word-grouping paths may be selected as word-grouping candidates according to the sorting result, where N is a natural number.
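The selection strategies of step 105 can be sketched in a few lines. The function name `select_candidates` and its parameters are hypothetical; the sketch assumes each path arrives as a (path, score) pair.

```python
def select_candidates(scored_paths, top_n=None, threshold=None):
    """Pick word-grouping candidates from (path, score) pairs.

    Paths are ranked by descending score; `threshold` keeps only scores strictly
    above it, and `top_n` keeps at most the first N ranked paths.
    """
    ranked = sorted(scored_paths, key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [item for item in ranked if item[1] > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [path for path, _ in ranked]
```

Calling it with `top_n=1` corresponds to choosing the single highest-scoring path, while passing only `threshold` corresponds to the score-threshold variant.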
In summary, the intelligent word-grouping method of the embodiment of the present invention uses preset part-of-speech collocation rules during intelligent word grouping to determine the part-of-speech collocation scores between adjacent words in the word-grouping path corresponding to the words to be grouped. The preset part-of-speech collocation rules describe collocation relationships between parts of speech: usually, the stronger the collocation relationship between parts of speech, the higher the corresponding part-of-speech collocation score, and the weaker the relationship, the lower the score. Because the embodiment of the present invention uses the part-of-speech collocation score as a basis for the path score of the word-grouping path, a word-grouping path with a stronger collocation relationship between parts of speech receives a higher path score than one with a weaker relationship, which raises the probability that the former is selected as a word-grouping candidate. The embodiment of the present invention can therefore improve the rationality and quality of the word-grouping candidates, so that more reasonable candidates can be provided even in the case of intelligent word-grouping failure, thereby improving the input efficiency of the user.
Method embodiment two
Referring to fig. 2, a flowchart illustrating steps of a second embodiment of the intelligent word organizing method of the present invention is shown, which may specifically include the following steps:
step 201, acquiring input content of a user; the input content may include: an input string, or the input string and its corresponding context;
step 202, segmenting the input string to obtain a corresponding segmentation result;
step 203, searching in a word bank to obtain a vocabulary matched with the segmentation result, and using the vocabulary as a vocabulary to be grouped corresponding to the input string;
step 204, acquiring the part of speech of each vocabulary to be grouped;
step 205, determining a part-of-speech collocation score between adjacent words in a word-grouping path corresponding to the word-to-be-grouped according to a preset part-of-speech collocation rule and the part-of-speech of each word-to-be-grouped; the preset part-of-speech collocation rules are used for describing collocation relations among parts of speech;
step 206, determining a path score of the word grouping path according to a part-of-speech collocation score between adjacent words contained in the word grouping path;
step 207, obtaining word forming candidates from the word forming path according to the path score.
In practical application, the input string may be segmented according to the rules of the input string. If the input string is a pinyin string, segmentation can be performed according to syllable rules. An input string may have one or more segmentation schemes, each of which may include one or more substrings. For example, the input string "jiuwanliangqianyuan" may be segmented into "jiu'wan'liang'qian'yuan", and the input string "fangan" may be segmented into "fang'an" or "fan'gan".
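Syllable-based segmentation with multiple possible schemes can be sketched with a small recursive enumeration. The syllable set below is a tiny hypothetical inventory covering only the two examples in the text; a real pinyin inventory is much larger and would produce additional schemes.

```python
# Hypothetical, deliberately tiny syllable inventory for illustration only.
SYLLABLES = {"jiu", "wan", "liang", "qian", "yuan", "fang", "fan", "an", "gan"}


def segment(input_string, syllables=SYLLABLES):
    """Return every way to split `input_string` into known syllables."""
    if not input_string:
        return [[]]
    schemes = []
    for end in range(1, len(input_string) + 1):
        head = input_string[:end]
        if head in syllables:
            # Extend each segmentation of the remainder with this head syllable.
            for rest in segment(input_string[end:], syllables):
                schemes.append([head] + rest)
    return schemes


fangan_schemes = ["'".join(scheme) for scheme in segment("fangan")]
long_schemes = ["'".join(scheme) for scheme in segment("jiuwanliangqianyuan")]
```

With this inventory, "fangan" yields both "fan'gan" and "fang'an", and "jiuwanliangqianyuan" yields the single scheme "jiu'wan'liang'qian'yuan".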
In practical application, the words to be grouped corresponding to each substring can be obtained by searching a system word bank and a user word bank. For example, the words to be grouped corresponding to "jiu'wan" may include "ninety thousand", "just play", and so on; those corresponding to "liang'qian" may include "two thousand", "Liang Qian", and so on; and those corresponding to "yuan" may include "yuan", "hospital", and so on. The parts of speech of "ninety thousand", "just play", "two thousand", "Liang Qian", "yuan", and "hospital" are numeral, verb, numeral, noun, quantifier, and noun, respectively.
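The word bank lookup and the enumeration of word-grouping paths can be sketched as below. The `WORD_BANK` dictionary and the function `word_grouping_paths` are hypothetical; the entries reuse the words and parts of speech from the example, and the assumption that every combination of one word per substring forms a path is made for illustration.

```python
from itertools import product

# Hypothetical word bank: substring -> list of (word, part of speech).
WORD_BANK = {
    "jiu'wan": [("ninety thousand", "numeral"), ("just play", "verb")],
    "liang'qian": [("two thousand", "numeral"), ("Liang Qian", "noun")],
    "yuan": [("yuan", "quantifier"), ("hospital", "noun")],
}


def word_grouping_paths(substrings, word_bank=WORD_BANK):
    """Enumerate paths by choosing one word for each substring, in order."""
    options = [word_bank.get(substring, []) for substring in substrings]
    return [list(path) for path in product(*options)]


paths = word_grouping_paths(["jiu'wan", "liang'qian", "yuan"])
```

With two words per substring this produces eight paths, among them "ninety thousand + two thousand + yuan" and "just play + Liang Qian + hospital"; a substring with no match in the word bank yields no path at all in this sketch.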
During word formation on the words to be grouped, the part-of-speech collocation score between adjacent words in the word-grouping path corresponding to the words to be grouped is determined according to the preset part-of-speech collocation rules and the part of speech of each word to be grouped.
In order to make the embodiment of the present invention better understood, an example of an intelligent word-forming method of the present invention is provided herein, which may specifically include the following steps:
step S1, receiving an input string "jiuwanliangqianyuan";
step S2, segmenting the input string to obtain the segmentation result "jiu'wan'liang'qian'yuan";
Step S3, searching in a word stock to obtain a vocabulary to be grouped corresponding to the segmentation result;
step S4, performing word formation on the words to be grouped to obtain the corresponding word-grouping paths; assume word-grouping path 1: "ninety thousand + two thousand + yuan", and word-grouping path 2: "just play + Liang Qian + hospital";
step S5, determining part-of-speech collocation scores between adjacent words in the word-forming path corresponding to the words to be formed according to preset part-of-speech collocation rules and the parts of speech of each word to be formed;
In practical application, the preset part-of-speech collocation rules can be used to score the two word-grouping paths, namely word-grouping path 1 and word-grouping path 2. For "ninety thousand + two thousand + yuan", "ninety thousand + two thousand" conforms to the collocation rule between numerals, so score A is obtained, and "two thousand + yuan" conforms to the collocation rule between a numeral and a quantifier, so score A is obtained; the part-of-speech collocation score of "ninety thousand + two thousand + yuan" is therefore 2A. For "just play + Liang Qian + hospital", "just play + Liang Qian" conforms to the collocation rule between a verb and a noun, so score B is obtained, while "Liang Qian + hospital" conforms to no preset part-of-speech collocation rule and obtains no score; the part-of-speech collocation score of "just play + Liang Qian + hospital" is therefore B.
Step S6, determining a path score of the word grouping path according to the part-of-speech collocation score between adjacent words included in the word grouping path, and obtaining word grouping candidates from the word grouping path according to the path score.
Assuming that neither word-grouping path 1 nor word-grouping path 2 hits a binary relation, the corresponding path scores are 2A and B, respectively. Since 2A is greater than B, the candidate "ninety thousand two thousand yuan" corresponding to word-grouping path 1 "ninety thousand + two thousand + yuan" can be used as the word-grouping candidate.
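The arithmetic of this example can be checked with a short sketch. It assumes the optional values A = 1 and B = 0.7, sums the scores of adjacent pairs (as the 2A result implies), and includes only the three rules the example needs; the names `RULES` and `pos_path_score` are hypothetical.

```python
A, B = 1.0, 0.7  # optional values from the description

# Subset of the preset rules sufficient for this example.
RULES = {
    ("numeral", "numeral"): A,
    ("numeral", "quantifier"): A,
    ("verb", "noun"): B,
}


def pos_path_score(path):
    """Sum the rule scores of all adjacent part-of-speech pairs in a path."""
    tags = [pos for _, pos in path]
    return sum(RULES.get(pair, 0.0) for pair in zip(tags, tags[1:]))


path_1 = [("ninety thousand", "numeral"), ("two thousand", "numeral"), ("yuan", "quantifier")]
path_2 = [("just play", "verb"), ("Liang Qian", "noun"), ("hospital", "noun")]

score_1 = pos_path_score(path_1)  # A + A
score_2 = pos_path_score(path_2)  # B + nothing for noun followed by noun
best = max([path_1, path_2], key=pos_path_score)
candidate = " ".join(word for word, _ in best)
```

Under these assumptions path 1 scores 2A and path 2 scores B, so the candidate built from path 1 is selected.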
The embodiment of the present invention takes the part-of-speech collocation score as a basis for the path score of the word-grouping path, so that the path score of a word-grouping path with a strong collocation relationship between parts of speech is higher than that of a word-grouping path with a weak collocation relationship between parts of speech, which further raises the probability that the word-grouping path with the strong collocation relationship is used as the word-grouping candidate.
It should be noted that, for simplicity of description, the method embodiments are described as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because some steps may be performed in other orders or simultaneously according to the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments and that the actions involved are not necessarily required by the embodiments of the invention.
Device embodiment
Referring to fig. 3, a block diagram of an embodiment of an input device according to the present invention is shown, which may specifically include: a content receiving module 301, a vocabulary part of speech obtaining module 302, a collocation score determining module 303, a path score determining module 304 and a word group candidate obtaining module 305.
The content receiving module 301 is configured to obtain input content of a user;
a vocabulary part-of-speech obtaining module 302, configured to obtain vocabularies to be grouped corresponding to the input content and parts-of-speech of each vocabulary to be grouped;
a collocation score determining module 303, configured to determine a part-of-speech collocation score between adjacent words in a word grouping path corresponding to the words to be grouped according to a preset part-of-speech collocation rule and the part of speech of each word to be grouped; the preset part-of-speech collocation rules are used for describing collocation relations among parts of speech;
a path score determining module 304, configured to determine a path score of the word grouping path according to a part-of-speech collocation score between adjacent words included in the word grouping path;
a word group candidate obtaining module 305, configured to obtain a word group candidate from the word group path according to the path score.
Optionally, the collocation score determining module 303 may include:
the part of speech determining submodule is used for respectively determining the part of speech of adjacent words in the word forming path corresponding to the words to be formed according to the part of speech of the words to be formed; and
and the score determining submodule is used for taking the score corresponding to the preset part-of-speech collocation rule as the part-of-speech collocation score between the adjacent words when the part-of-speech collocation of the adjacent words accords with the preset part-of-speech collocation rule.
Optionally, the apparatus may further include: a score obtaining module for obtaining the score corresponding to the preset part of speech collocation rule;
the score acquisition module may include:
the part-of-speech matching content submodule is used for acquiring part-of-speech matching contents which accord with the preset part-of-speech matching rule from a preset corpus;
the collocation probability statistic submodule is used for counting collocation probabilities between adjacent words in the part-of-speech collocation contents; and
and the score determining submodule is used for determining a score corresponding to the preset part-of-speech collocation rule according to collocation probabilities between adjacent words in all part-of-speech collocation contents.
Optionally, the input content may include an input string, and the apparatus may further include:
the segmentation module is used for segmenting the input string to obtain a corresponding segmentation result;
and the word stock searching module is used for searching in a word stock to obtain words matched with the segmentation result and used as the words to be grouped corresponding to the input string.
Optionally, the input content may further include: the context corresponding to the input string, the vocabulary to be composed corresponding to the input content may include: and the vocabulary to be grouped corresponding to the input string and the context.
Optionally, the path score determining module 304 may include:
a first path score determining sub-module, configured to obtain a path score of the word grouping path according to a part-of-speech collocation score between all adjacent words included in the word grouping path; or
And the second path score determining submodule is used for obtaining the path score of the word forming path according to the part of speech collocation scores of all adjacent words contained in the word forming path and the binary relation score hit by the word forming path.
Optionally, the apparatus may further include:
a binary library searching module, configured to search in a binary library according to adjacent words in the word-grouping path corresponding to the words to be grouped, before the collocation score determining module 303 determines a part-of-speech collocation score between those adjacent words according to a preset part-of-speech collocation rule and the part of speech of each word to be grouped, so as to obtain a binary relation matching the adjacent words, and to trigger the collocation score determining module 303 when the search in the binary library does not hit.
Optionally, the word group candidate obtaining module 305 may include:
a sorting submodule for sorting the path scores;
and the selection sub-module is used for selecting the word forming paths ranked at the top N from the word forming paths as word forming candidates according to the ranking result of the path scores.
Optionally, the preset part of speech collocation rule may include: at least one of collocation rules between the numerals, collocation rules between adverbs and verbs, collocation rules between adverbs and adjectives, collocation rules between verbs and nouns, collocation rules between adjectives and nouns, and collocation rules between quantifiers and nouns.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 4 is a block diagram illustrating an apparatus 900 for intelligent word formation, according to an example embodiment. For example, the apparatus 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 4, apparatus 900 may include one or more of the following components: processing component 902, memory 904, power component 906, multimedia component 908, audio component 910, input/output (I/O) interface 912, sensor component 914, and communication component 916.
The processing component 902 generally controls overall operation of the device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 902 may include one or more processors 920 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 902 can include one or more modules that facilitate interaction between the processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation at the device 900. Examples of such data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 906 provides power to the various components of the device 900. The power components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 900.
The multimedia component 908 comprises a screen providing an output interface between the device 900 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide motion action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 900 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, audio component 910 includes a Microphone (MIC) configured to receive external audio signals when apparatus 900 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 also includes a speaker for outputting audio signals.
I/O interface 912 provides an interface between processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessment of various aspects of the apparatus 900. For example, the sensor assembly 914 may detect an open/closed state of the device 900, the relative positioning of the components, such as a display and keypad of the apparatus 900, the sensor assembly 914 may also detect a change in the position of the apparatus 900 or a component of the apparatus 900, the presence or absence of user contact with the apparatus 900, orientation or acceleration/deceleration of the apparatus 900, and a change in the temperature of the apparatus 900. The sensor assembly 914 may include a proximity sensor configured to detect the presence of a nearby object in the absence of any physical contact. The sensor assembly 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate communications between the apparatus 900 and other devices in a wired or wireless manner. The apparatus 900 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 916 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 904 comprising instructions, where the instructions are executable by the processor 920 of the apparatus 900 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium has instructions stored therein which, when executed by a processor of a smart terminal, enable the smart terminal to perform an intelligent word-composing method, the method comprising: acquiring input content of a user; acquiring words to be grouped corresponding to the input content and the part of speech of each word to be grouped; determining part-of-speech collocation scores between adjacent words in a word-grouping path corresponding to the words to be grouped, according to preset part-of-speech collocation rules and the parts of speech of the words to be grouped, the preset part-of-speech collocation rules describing collocation relations among parts of speech; determining a path score of the word-grouping path according to the part-of-speech collocation scores between adjacent words contained in the word-grouping path; and acquiring word-grouping candidates from the word-grouping path according to the path score.
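By way of illustration only, the steps recited above can be sketched in Python as follows. The sketch is not the claimed implementation: the rule table, its score values, the default score for unmatched part-of-speech pairs, the summation used to aggregate pair scores into a path score, and the sample words are all assumptions introduced here for the example.

```python
from itertools import product

# Hypothetical rule table: (left part of speech, right part of speech) -> score.
# The score values are invented for illustration only.
POS_RULES = {
    ("numeral", "quantifier"): 0.9,
    ("quantifier", "noun"): 0.8,
    ("adjective", "noun"): 0.7,
    ("adverb", "verb"): 0.6,
}
DEFAULT_SCORE = 0.1  # assumed score when no preset rule matches the pair


def collocation_score(left_pos, right_pos):
    """Score one pair of adjacent words from their parts of speech."""
    return POS_RULES.get((left_pos, right_pos), DEFAULT_SCORE)


def path_score(path):
    """Aggregate pair scores along a path of (word, pos) tuples.

    Summation is an assumption; the text only says the path score is
    determined 'according to' the pair scores.
    """
    return sum(collocation_score(a[1], b[1]) for a, b in zip(path, path[1:]))


def top_candidates(slots, n=1):
    """Enumerate every path through the slots and return the top-n strings."""
    paths = [list(p) for p in product(*slots)]
    ranked = sorted(paths, key=path_score, reverse=True)
    return ["".join(word for word, _ in p) for p in ranked[:n]]


# Hypothetical words to be grouped for three input segments, with their parts of speech.
slots = [
    [("一", "numeral")],
    [("只", "quantifier"), ("直", "adverb")],
    [("猫", "noun")],
]

print(top_candidates(slots, n=1))
```

In this sketch the numeral-quantifier and quantifier-noun rules both fire for the first path, so it outranks the path whose adjacent pairs match no rule. Enumerating every path is used only for brevity; a practical system would more likely search the lattice incrementally.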
Fig. 5 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary widely by configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
The foregoing describes in detail the intelligent word-forming method, the intelligent word-forming device, and the device for intelligent word forming provided by the present invention. Specific examples are used herein to explain the principle and the implementation of the present invention, and the description of the above examples is intended only to help in understanding the method and the core idea of the present invention. Meanwhile, a person skilled in the art may, according to the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of the present specification should not be construed as limiting the present invention.

Claims (28)

1. An intelligent word-composing method, comprising:
acquiring input content of a user;
acquiring vocabularies to be grouped corresponding to the input content and the parts of speech of the vocabularies to be grouped;
if no binary relation matching the adjacent words exists in the binary library, determining part-of-speech collocation scores between the adjacent words in the word-forming path corresponding to the words to be formed according to a preset part-of-speech collocation rule and the part of speech of each word to be formed; the preset part-of-speech collocation rules are used for describing collocation relations among parts of speech; the adjacent words are specifically the adjacent words in the word-forming path corresponding to the words to be formed; the preset part-of-speech collocation rules comprise: collocation rules between numerals and quantifiers;
determining a path score of the word grouping path according to a part of speech collocation score between adjacent words contained in the word grouping path;
and acquiring word forming candidates from the word forming path according to the path score.
2. The method according to claim 1, wherein the step of determining the part-of-speech collocation score between adjacent words in the word-composing path corresponding to the word-to-be-composed includes:
determining the parts of speech of adjacent words in the word-forming path corresponding to the words to be formed according to the part of speech of each word to be formed;
and when the part-of-speech collocation of the adjacent words accords with a preset part-of-speech collocation rule, taking the score corresponding to the preset part-of-speech collocation rule as the part-of-speech collocation score between the adjacent words.
3. The method according to claim 1 or 2, wherein the score corresponding to the preset part of speech collocation rule is obtained by the following steps:
acquiring part-of-speech collocation contents which accord with the preset part-of-speech collocation rule from a preset corpus;
counting collocation probabilities between adjacent words in the part-of-speech collocation contents;
and determining a score corresponding to the preset part-of-speech collocation rule according to the collocation probability between adjacent words in all part-of-speech collocation contents.
4. The method of claim 1 or 2, wherein the input content comprises an input string, the method further comprising:
segmenting the input string to obtain a corresponding segmentation result;
and searching in a word bank to obtain words matched with the segmentation result, wherein the words are used as words to be grouped corresponding to the input string.
5. The method of claim 4, wherein the input content further comprises a context corresponding to the input string, and the words to be grouped corresponding to the input content comprise: the words to be grouped corresponding to the input string and to the context.
6. The method according to claim 1 or 2, wherein the step of determining the path score of the word grouping path according to the part of speech collocation score between adjacent words included in the word grouping path comprises:
obtaining a path score of the word grouping path according to the part of speech collocation scores between all adjacent words contained in the word grouping path; or
obtaining a path score of the word forming path according to the part of speech collocation scores of all adjacent words contained in the word forming path and the binary relation score hit by the word forming path.
7. The method according to claim 1 or 2, wherein before the step of determining the part-of-speech collocation score between adjacent words in the word-grouping path corresponding to the word-grouping object according to the preset part-of-speech collocation rule and the part-of-speech of each word-grouping object, the method further comprises:
searching in a binary library according to adjacent words in the word forming path corresponding to the words to be formed to obtain a binary relation matched with the adjacent words;
and when the search of the binary library is not hit, executing the step of determining the part of speech collocation score between adjacent words in the word-group path corresponding to the word-group to be formed according to a preset part of speech collocation rule and the part of speech of each word-group to be formed.
8. The method according to claim 1 or 2, wherein the step of obtaining the word grouping candidate from the word grouping path according to the path score comprises:
ranking the path scores;
and selecting the word forming paths ranked at the top N from the word forming paths as word forming candidates according to the ranking result of the path scores.
9. The method according to claim 1 or 2, wherein the preset part of speech collocation rules further comprise: at least one of a collocation rule between adverbs and verbs, a collocation rule between adverbs and adjectives, a collocation rule between adjectives and nouns, and a collocation rule between quantifiers and nouns.
10. An intelligent word-composing device, comprising:
the content receiving module is used for acquiring input content of a user;
the word part of speech acquisition module is used for acquiring the words to be grouped corresponding to the input content and the parts of speech of the words to be grouped;
a collocation score determining module, configured to determine a part-of-speech collocation score between adjacent words in a word-forming path corresponding to the words to be formed according to a preset part-of-speech collocation rule and the part of speech of each word to be formed, if no binary relation matching the adjacent words exists in the binary library; the preset part-of-speech collocation rules are used for describing collocation relations among parts of speech; the adjacent words are specifically the adjacent words in the word-forming path corresponding to the words to be formed; the preset part-of-speech collocation rules comprise: collocation rules between numerals and quantifiers;
a path score determining module, configured to determine a path score of the word grouping path according to a part-of-speech collocation score between adjacent words included in the word grouping path; and
and the word group candidate acquisition module is used for acquiring word group candidates from the word group path according to the path score.
11. The apparatus of claim 10, wherein the collocation score determination module comprises:
the part of speech determining submodule is used for determining the part of speech of adjacent words in the word forming path corresponding to the word to be formed according to the part of speech of each word to be formed; and
and the score determining submodule is used for taking the score corresponding to the preset part-of-speech collocation rule as the part-of-speech collocation score between the adjacent words when the part-of-speech collocation of the adjacent words accords with the preset part-of-speech collocation rule.
12. The apparatus of claim 10 or 11, further comprising: a score obtaining module for obtaining a score corresponding to the preset part of speech collocation rule;
the score acquisition module comprises:
the part-of-speech matching content submodule is used for acquiring part-of-speech matching contents which accord with the preset part-of-speech matching rule from a preset corpus;
the collocation probability statistic submodule is used for counting collocation probabilities between adjacent words in the part-of-speech collocation contents; and
and the score determining submodule is used for determining a score corresponding to the preset part-of-speech collocation rule according to collocation probabilities between adjacent words in all part-of-speech collocation contents.
13. The apparatus of claim 10 or 11, wherein the input content comprises an input string, the apparatus further comprising:
the segmentation module is used for segmenting the input string to obtain a corresponding segmentation result;
and the word stock searching module is used for searching in a word stock to obtain words matched with the segmentation result and used as the words to be grouped corresponding to the input string.
14. The apparatus of claim 13, wherein the input content further comprises a context corresponding to the input string, and the words to be grouped corresponding to the input content comprise: the words to be grouped corresponding to the input string and to the context.
15. The apparatus of claim 10 or 11, wherein the path score determination module comprises:
a first path score determining sub-module, configured to obtain a path score of the word grouping path according to a part-of-speech collocation score between all adjacent words included in the word grouping path; or
the second path score determining submodule is used for obtaining the path score of the word forming path according to the part of speech collocation scores of all adjacent words contained in the word forming path and the binary relation score hit by the word forming path.
16. The apparatus of claim 10 or 11, further comprising:
and the binary library searching module is used for, before the collocation score determining module determines the part-of-speech collocation score between adjacent words in the word-forming path corresponding to the words to be formed according to the preset part-of-speech collocation rule and the part of speech of each word to be formed, searching the binary library according to the adjacent words in the word-forming path to obtain a binary relation matched with the adjacent words, and triggering the collocation score determining module when the search of the binary library does not hit.
17. The apparatus according to claim 10 or 11, wherein the word group candidate obtaining module comprises:
a sorting submodule for sorting the path scores;
and the selection sub-module is used for selecting the word forming paths ranked at the top N from the word forming paths as word forming candidates according to the ranking result of the path scores.
18. The apparatus according to claim 10 or 11, wherein the preset part of speech collocation rules comprise: at least one of collocation rules between the numerals, collocation rules between adverbs and verbs, collocation rules between adverbs and adjectives, collocation rules between verbs and nouns, collocation rules between adjectives and nouns, and collocation rules between quantifiers and nouns.
19. An apparatus for intelligent word formation, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by one or more processors, the one or more programs comprising instructions for:
acquiring input content of a user;
acquiring vocabularies to be grouped corresponding to the input content and the parts of speech of the vocabularies to be grouped;
if no binary relation matching the adjacent words exists in the binary library, determining part-of-speech collocation scores between the adjacent words in the word-forming path corresponding to the words to be formed according to a preset part-of-speech collocation rule and the part of speech of each word to be formed; the preset part-of-speech collocation rules are used for describing collocation relations among parts of speech; the adjacent words are specifically the adjacent words in the word-forming path corresponding to the words to be formed; the preset part-of-speech collocation rules comprise: collocation rules between numerals and quantifiers;
determining a path score of the word grouping path according to a part of speech collocation score between adjacent words contained in the word grouping path;
and acquiring word forming candidates from the word forming path according to the path score.
20. The apparatus of claim 19, wherein the determining the part-of-speech collocation score between neighboring words in the word-grouping path corresponding to the word to be grouped comprises:
determining the parts of speech of adjacent words in the word-forming path corresponding to the words to be formed according to the part of speech of each word to be formed;
and when the part-of-speech collocation of the adjacent words accords with a preset part-of-speech collocation rule, taking the score corresponding to the preset part-of-speech collocation rule as the part-of-speech collocation score between the adjacent words.
21. The apparatus of claim 19 or 20, wherein the one or more programs, configured for execution by the one or more processors, further comprise instructions for:
acquiring part-of-speech collocation contents which accord with the preset part-of-speech collocation rule from a preset corpus;
counting collocation probabilities between adjacent words in the part-of-speech collocation contents;
and determining a score corresponding to the preset part-of-speech collocation rule according to the collocation probability between adjacent words in all part-of-speech collocation contents.
22. The apparatus of claim 19 or 20, wherein the input content comprises an input string, and the one or more programs, configured for execution by the one or more processors, further comprise instructions for:
segmenting the input string to obtain a corresponding segmentation result;
and searching in a word bank to obtain words matched with the segmentation result, wherein the words are used as words to be grouped corresponding to the input string.
23. The apparatus of claim 22, wherein the input content further comprises a context corresponding to the input string, and the words to be grouped corresponding to the input content comprise: the words to be grouped corresponding to the input string and to the context.
24. The apparatus according to claim 19 or 20, wherein said determining the path score of the word grouping path according to the part of speech collocation score between adjacent words included in the word grouping path comprises:
obtaining a path score of the word grouping path according to the part of speech collocation scores between all adjacent words contained in the word grouping path; or
obtaining a path score of the word forming path according to the part of speech collocation scores of all adjacent words contained in the word forming path and the binary relation score hit by the word forming path.
25. The apparatus of claim 19 or 20, wherein the one or more programs, configured for execution by the one or more processors, further comprise instructions for:
before determining the part-of-speech collocation score between adjacent words in the word-forming path corresponding to the word-forming group according to the preset part-of-speech collocation rule and the part-of-speech of each word-forming group, searching in a binary library according to the adjacent words in the word-forming path corresponding to the word-forming group to obtain a binary relation matched with the adjacent words;
and when the search of the binary library is not hit, determining part-of-speech collocation scores between adjacent words in a word-grouping path corresponding to the word-grouping unit to be grouped according to preset part-of-speech collocation rules and the part of speech of each word-grouping unit to be grouped.
26. The apparatus according to claim 19 or 20, wherein said obtaining a word group candidate from the word group path according to the path score comprises:
ranking the path scores;
and selecting the word forming paths ranked at the top N from the word forming paths as word forming candidates according to the ranking result of the path scores.
27. The apparatus according to claim 19 or 20, wherein the preset part of speech collocation rules comprise: at least one of collocation rules between the numerals, collocation rules between adverbs and verbs, collocation rules between adverbs and adjectives, collocation rules between verbs and nouns, collocation rules between adjectives and nouns, and collocation rules between quantifiers and nouns.
28. One or more machine readable media having instructions stored thereon that, when executed by one or more processors, cause an apparatus to perform the method of one or more of claims 1-9.
CN201610996202.5A 2016-11-11 2016-11-11 Intelligent word forming method and device for intelligent word forming Active CN108073292B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610996202.5A CN108073292B (en) 2016-11-11 2016-11-11 Intelligent word forming method and device for intelligent word forming

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610996202.5A CN108073292B (en) 2016-11-11 2016-11-11 Intelligent word forming method and device for intelligent word forming

Publications (2)

Publication Number Publication Date
CN108073292A CN108073292A (en) 2018-05-25
CN108073292B true CN108073292B (en) 2021-10-15

Family

ID=62153729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610996202.5A Active CN108073292B (en) 2016-11-11 2016-11-11 Intelligent word forming method and device for intelligent word forming

Country Status (1)

Country Link
CN (1) CN108073292B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807321A (en) * 2018-07-20 2020-02-18 北京搜狗科技发展有限公司 Word combination method and device, electronic equipment and readable storage medium
CN108664143A (en) * 2018-09-06 2018-10-16 上海二三四五网络科技有限公司 A kind of control method and control device handling context association input in input method system
CN110908523A (en) * 2018-09-14 2020-03-24 北京搜狗科技发展有限公司 Input method and device
CN110209765B (en) * 2019-05-23 2021-03-30 武汉绿色网络信息服务有限责任公司 Method and device for searching keywords according to meanings
CN110309513B (en) * 2019-07-09 2023-07-25 北京金山数字娱乐科技有限公司 Text dependency analysis method and device
CN110781288A (en) * 2019-10-30 2020-02-11 安阳师范学院 Method and device for composing words by Chinese characters
CN112987941B (en) * 2019-12-17 2024-02-13 北京搜狗科技发展有限公司 Method and device for generating candidate words

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100458795C (en) * 2007-02-13 2009-02-04 北京搜狗科技发展有限公司 Intelligent word input method and input method system and updating method thereof
WO2012159249A1 (en) * 2011-05-20 2012-11-29 Microsoft Corporation Advaced prediction
CN104182059A (en) * 2013-05-23 2014-12-03 华为技术有限公司 Generation method and system of natural language
CN104423623B (en) * 2013-09-02 2018-10-12 联想(北京)有限公司 It is a kind of to select word treatment method and electronic equipment
CN104850241A (en) * 2015-05-28 2015-08-19 北京奇点机智信息技术有限公司 Mobile terminal and text input method thereof

Also Published As

Publication number Publication date
CN108073292A (en) 2018-05-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant