Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
In the embodiments of the invention, a name corpus is extracted from the user lexicon of an input method and screened to generate a name lexicon. The name lexicon is then trained and optimized, and the optimized name lexicon is delivered as an update to users of the input method, improving the plausibility and accuracy of name word formation.
Fig. 1 shows the flow of the name lexicon generation method provided by an embodiment of the invention, detailed as follows:
In step S101, a user lexicon is obtained;
When a user enters text, the input method records the strings the user enters on the user's local machine, forming the user lexicon. In the embodiments of the invention, the input method may report the user lexicon to a background server automatically, or the user may actively report the local user lexicon. The background server obtains the user lexicon reported by the input method or by the user and extracts the name corpus from it.
In step S102, a name corpus is extracted from the obtained user lexicon;
In the embodiments of the invention, forming name words first requires collecting surnames, so that it can be determined whether a word or phrase (2 to 4 characters) is a name. Because the set of Chinese surnames is limited, the surnames can readily be collected by hand on the basis of the Hundred Family Surnames, and a surname dictionary is generated from them.
After obtaining the user lexicon, the embodiment of the invention extracts the name corpus from it according to surname.
Surnames are either single-character or compound (two-character), and given names are generally either one or two characters long. With a single-character surname, a full name therefore generally has at most three characters and at least two; with a compound surname, at most four and at least three. When extracting the name corpus, the embodiment of the invention can apply these naming rules: each surname in the surname dictionary is used as a keyword to search the user lexicon, and the one or two characters that follow a match are extracted as name corpus entries that may exist in the user lexicon. For example, if the user lexicon contains the string "Zhang San has gone to school", the surname dictionary identifies "Zhang" as a surname, and the following "San" and "San qu" ("San" plus the next character, meaning "go") are extracted as name corpus entries.
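The extraction rule described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name, the two small surname sets, and the Chinese example string (an assumed reconstruction of the translated "Zhang San has gone to school" example) are all hypothetical.

```python
# Hypothetical surname dictionary; a real one would hold the full surname list.
SINGLE_SURNAMES = {"张", "李", "王"}
COMPOUND_SURNAMES = {"欧阳", "司马"}


def extract_name_candidates(text):
    """Return (surname, given-name candidate) pairs found in one user-lexicon string."""
    candidates = []
    i = 0
    while i < len(text):
        # Prefer a compound (two-character) surname over a single-character one.
        if text[i:i + 2] in COMPOUND_SURNAMES:
            surname = text[i:i + 2]
        elif text[i] in SINGLE_SURNAMES:
            surname = text[i]
        else:
            i += 1
            continue
        rest = text[i + len(surname):]
        # A given name has one or two characters, so take both lengths if available.
        for length in (1, 2):
            if len(rest) >= length:
                candidates.append((surname, rest[:length]))
        i += len(surname)
    return candidates


# Assumed Chinese form of the example string in the text.
print(extract_name_candidates("张三去上学了"))
```

For the example string, the sketch yields the surname "张" (Zhang) with the candidates "三" (San) and "三去" (San qu); the second candidate is not a real given name and is left for the screening step to reject.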
In step S103, name words are screened from the name corpus to generate the name lexicon;
In the embodiments of the invention, because the name corpus is simply cut from the user lexicon on the assumption of a one- or two-character given name, it may contain entries that are not name words, such as the "San qu" extracted from "Zhang San has gone to school". The name corpus therefore needs to be screened to filter out entries that are not name words.
First, single-character given names are treated directly as name words and written to the name lexicon.
For a single-character entry, such as "San" in "Zhang San", the character can still form part of a given name even if it is not a given name by itself. Single-character entries can therefore be regarded as name words and written directly to the name lexicon.
Some names combine the surnames of both parents and then add a one-character given name, for example "Yan Yangtian". Entries of this kind are treated as single-character name words and are likewise written directly to the name lexicon.
Second, when a name that the user needs to enter often is not in the user lexicon, the user typically spells it out in full once and saves it to the user lexicon for later use. Name words that the user has created in the user lexicon can therefore be written directly to the name lexicon.
In addition, meaningful words can generally serve as two-character name words. In the embodiments of the invention, the two-character entries in the name corpus are compared against the core lexicon of the input method, or against another, more accurate dictionary, to separate meaningful words from possibly meaningless ones; meaningful words are in most cases accepted as name words. For example, "San qu" is not a meaningful word and cannot be found in the core lexicon of the input method, so it is held back for further screening. If a two-character entry is "longevity", which can be found in a standard dictionary, it is accepted as a name word and written to the name lexicon.
Because the standard dictionary is limited in size and accuracy, many of the entries that this screening does not accept may nevertheless be name words. In the embodiments of the invention, these entries can be screened manually, and meaningless words that can serve as name words are written to the name lexicon, to ensure the precision of the name words in the lexicon. For example, "Sanfeng" in "Zhang Sanfeng" is not a meaningful word; after manual screening it is accepted as a name word and written to the name lexicon.
Derogatory words or words with negative meanings, such as "bad" or "wretch", generally do not occur as name words but may appear in the collected name corpus. They can be removed through manual screening or by collecting a database of such words.
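The screening rules of step S103 might be combined as in the sketch below. The function name, its parameters, and the sample word sets are hypothetical, and the Chinese words are assumed reconstructions of the translated examples. Checking the blocklist before the other rules is a choice made for this sketch; the text does not fix an order.

```python
def screen_candidates(candidates, user_made, core_lexicon, blocklist, approved):
    """Split given-name candidates into accepted name words and a manual-review queue.

    candidates   -- given-name candidates extracted in step S102
    user_made    -- words the user created in the user lexicon
    core_lexicon -- meaningful words from the input method's core lexicon
    blocklist    -- derogatory or negative words to discard
    approved     -- meaningless words already accepted by manual screening
    """
    accepted, review = set(), set()
    for word in candidates:
        if word in blocklist:
            continue  # derogatory words are dropped outright
        if len(word) == 1 or word in user_made:
            accepted.add(word)  # single characters and user-made words pass directly
        elif word in core_lexicon or word in approved:
            accepted.add(word)  # meaningful or manually approved two-character words
        else:
            review.add(word)  # not found anywhere: keep for manual screening
    return accepted, review


accepted, review = screen_candidates(
    candidates=["三", "三去", "长寿", "三丰", "坏蛋"],
    user_made=set(),
    core_lexicon={"长寿", "坏蛋"},
    blocklist={"坏蛋"},
    approved={"三丰"},
)
print(accepted, review)
```

With these sample inputs, "三去" (San qu) is the only entry left in the review queue, which matches the "held back for further screening" case in the text.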
In step S104, the word frequency of the name words in the name lexicon is adjusted.
In the embodiments of the invention, a relatively comprehensive name lexicon can be obtained in the manner described above. The probability with which each name word occurs is then counted and used to adjust the word frequency of the name words in the name lexicon, to ensure their quality.
In a specific implementation, the background server segments the collected user lexicons into surnames and given names and counts the probability with which each name word occurs across different users' lexicons. This probability serves as the basis for the name word's frequency in the input method and is used to adjust its word frequency in the name lexicon. For example, a name word that occurs in many user lexicons is a high-frequency name word that many users care about; it is placed near the front of the name lexicon and is offered to the user first as a candidate during input.
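One way to realize the statistic described for step S104 is to count, for each name word, the fraction of user lexicons in which it occurs and to sort the name lexicon by that fraction. The sketch below assumes the user lexicons have already been segmented into given-name words; the function name and the sample data are hypothetical.

```python
from collections import Counter


def rank_name_words(name_lexicon, user_lexicons):
    """Order name words by the fraction of user lexicons that contain them.

    name_lexicon  -- set of name words produced by the screening step
    user_lexicons -- one iterable of segmented given-name words per user
    """
    counts = Counter()
    for lexicon in user_lexicons:
        # Count each word once per user, so one heavy user cannot dominate.
        for word in set(lexicon) & name_lexicon:
            counts[word] += 1
    total = len(user_lexicons) or 1  # avoid division by zero for empty input
    freq = {word: counts[word] / total for word in name_lexicon}
    # Most widely used words first; ties are broken by code point for stability.
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_name_words(
    {"三", "三丰", "长寿"},
    [["三", "三丰"], ["三"], ["三", "长寿"], []],
)
print(ranking)
```

Counting per user rather than per occurrence follows the text's emphasis on words that "occur in many user lexicons"; a different weighting would also be compatible with the description.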
In addition, naming customs can be taken into account. Names are generally chosen to be easy to say aloud; for example, characters that all share the same tone are seldom used together as a name. Data of this kind serves only as a reference: when such a word occurs as a name, its frequency in the name lexicon can be lowered appropriately.
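The tone-based adjustment could look like the following sketch. The tone table, the function name, and the penalty factor of 0.5 are assumptions made for illustration; the text says only that the frequency "can be lowered appropriately", and it does not say whether the surname's tone is also considered.

```python
# Hypothetical tone table (Mandarin tone number per character); a real system
# would need a full pronunciation dictionary.
TONES = {"天": 1, "心": 1, "明": 2, "华": 2}


def adjust_for_tone(word, frequency, penalty=0.5):
    """Scale down the frequency of a multi-character word whose tones are all equal."""
    tones = [TONES.get(ch) for ch in word]
    # Only penalize when every character's tone is known and all tones match.
    if len(word) > 1 and None not in tones and len(set(tones)) == 1:
        return frequency * penalty
    return frequency


print(adjust_for_tone("天心", 0.4))  # both characters are first tone: lowered
print(adjust_for_tone("天明", 0.4))  # first and second tone: unchanged
```

Words containing a character missing from the tone table are left unchanged, so incomplete pronunciation data never lowers a frequency by accident.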
In the embodiments of the invention, once the word frequency adjustment of the name lexicon is complete, the adjusted name lexicon can be delivered to users as an update.
The name lexicon can be provided as a separate file for users to download as an update. In a specific implementation, a version number can be written into the name lexicon according to some rule: for example, in major.minor form, as a single number that keeps incrementing, or as the date on which the name lexicon was generated.
After the input method starts, it calls its automatic update program, which communicates with the background server. The background server checks the version number of the name lexicon and, where necessary, whether other update conditions are met; for example, some versions of the input method may not need the update or may be unable to update the name lexicon.
When the check between the automatic update program and the background server finds that an update is needed, the name lexicon is downloaded from the background server to the local machine and overwrites the name lexicon file from the original local installation package.
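The version check in the update procedure can be illustrated with the sketch below. The function names and the `client_supported` flag, which stands in for the "other update conditions" mentioned in the text, are hypothetical. The comparison works for all three version schemes described, since an incrementing number or a date such as 20100312 parses as a one-element tuple.

```python
def parse_version(version):
    """Turn a 'major.minor' (or single-number) version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_update(local_version, server_version, client_supported=True):
    """Return True when the server holds a newer name lexicon the client can use."""
    if not client_supported:
        # Some input method versions do not need, or cannot apply, the update.
        return False
    return parse_version(server_version) > parse_version(local_version)


print(needs_update("1.9", "1.10"))           # numeric comparison, not string comparison
print(needs_update("20100301", "20100312"))  # date-based version numbers
```

Comparing parsed integers rather than raw strings matters for the major.minor scheme, because "1.10" sorts before "1.9" as text even though it is the newer version.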
Fig. 2 shows the structure of the name lexicon generation device provided by an embodiment of the invention; for ease of description, only the parts relevant to the embodiment are shown.
The name lexicon generation device may run on the background server of various character input systems. It extracts a name corpus from the local user lexicon of the input method, screens it to generate a name lexicon, trains and optimizes the name lexicon, and can deliver the optimized name lexicon as an update to users of the input method, improving the plausibility and accuracy of name word formation.
The user lexicon acquiring unit 21 obtains the user lexicon, which may be reported automatically by the input method or actively by the user.
The name corpus extraction unit 22 extracts the name corpus from the obtained user lexicon.
In one embodiment of the invention, the surname dictionary 221 stores surname information, and the name corpus search and extraction module 222 searches the user lexicon and extracts the name corpus according to the surnames in the surname dictionary 221 and the naming rules. The specific implementation is as described above and is not repeated here.
The name lexicon generation unit 23 screens name words from the name corpus extracted by the name corpus extraction unit 22 and generates the name lexicon 24. The specific implementation is as described above and is not repeated here.
The name lexicon 24 stores the name words screened by the name lexicon generation unit 23.
In one embodiment of the invention, the name words in the name lexicon 24 include single-character given names, single-character given names that follow two combined surnames, name words created by users, meaningful words, and meaningless words that have passed manual screening. The specific implementation is as described above and is not repeated here.
The word frequency adjustment unit 25 adjusts the word frequency of the name words in the name lexicon 24.
When the word frequency adjustment unit 25 adjusts the word frequency of the name words in the name lexicon 24, in one embodiment of the invention, the name segmentation module 251 segments surnames and given names according to the collected user lexicons and the surname dictionary 221.
The occurrence probability statistics module 252 counts the probability with which each name word occurs across different users' lexicons.
The word frequency adjusting module 253 adjusts the word frequency of the name words in the name lexicon 24 according to the probability with which the name words occur across different users' lexicons.
In one embodiment of the invention, once the word frequency adjustment of the name lexicon is complete, the adjusted name lexicon can be delivered to users as an update.
The name lexicon updating unit 26 delivers the frequency-adjusted name lexicon to users as an update. The update process is as described above and is not repeated here.
The embodiments of the invention extract a name corpus from the user lexicon and screen name words to build the name lexicon, so that the name corpus is unrestricted, a better name lexicon can be generated, and the plausibility and accuracy of the input method's name output are improved. In addition, delivering the name lexicon to users as an update makes it convenient to use.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.