CN104423621A - Pinyin string processing method and device - Google Patents

Info

Publication number
CN104423621A
Authority
CN
China
Prior art keywords
information
user
character string
syllable
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310370370.XA
Other languages
Chinese (zh)
Inventor
张雷
张霓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201310370370.XA
Publication of CN104423621A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods
    • G06F3/0236 Character input methods using selection techniques to select from displayed items

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a pinyin string processing method and device. The method comprises: receiving the input of a pinyin string and obtaining current environment information, where the current environment information is information about the current application program and/or input box that receives the pinyin string; obtaining, according to preset configuration information, personalized information for the input of the pinyin string, where the personalized information comprises user habit information and historical environment information, and the historical environment information comprises information about the environment in which the user previously input Chinese characters; and screening at least one syllable combination corresponding to the pinyin string according to the personalized information and the current environment information, so as to determine the position of the segmentation mark in the pinyin string. The method reduces the interactions required when the user inputs Chinese characters and improves input efficiency.

Description

Pinyin string processing method and device
Technical field
The present invention relates to the field of input method technology, and in particular to a pinyin string processing method and device.
Background
An input method converts a pinyin string entered by the user into Chinese character output. To do so, it must first segment the pinyin string into a combination of legal syllables and then convert that syllable combination into Chinese characters.
As shown in Fig. 1, one existing pinyin string processing method comprises the following steps. Step S102: segment the received pinyin string, taking each initial and each final in the string as a substring, to obtain a sequence of substrings. Step S104: expand the substrings in the sequence and generate a set of expanded substring sequences from the expansion results. Step S106: extract syllables from each expanded substring sequence in the set according to syllable composition features, obtaining the corresponding syllable sequences. Step S108: check the legality of the syllable combinations in each syllable sequence and, based on the result, delete the syllable sequences that contain illegal syllables. Step S110: output the syllable combinations that have passed the legality check.
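The end result of such a pipeline, a list of legal syllable combinations for one pinyin string, can be illustrated with a much simpler stand-in. The sketch below is not the S102 to S110 procedure itself: it enumerates splits recursively against a deliberately tiny, hypothetical syllable table (`LEGAL_SYLLABLES`), whereas a real input method would use the full set of legal pinyin syllables.

```python
from typing import List

# Hypothetical, deliberately tiny syllable table (an assumption for this sketch).
LEGAL_SYLLABLES = {"a", "o", "an", "ao", "di", "dia", "diao", "xi", "xian", "yu"}


def segmentations(pinyin: str) -> List[List[str]]:
    """Return every way to split `pinyin` into syllables from the table."""
    if not pinyin:
        return [[]]  # one way to split the empty string: no syllables
    results = []
    for end in range(1, len(pinyin) + 1):
        head = pinyin[:end]
        if head in LEGAL_SYLLABLES:
            # Keep this prefix and split the remainder in every legal way.
            for tail in segmentations(pinyin[end:]):
                results.append([head] + tail)
    return results


if __name__ == "__main__":
    for combo in segmentations("diaoyu"):
        print("'".join(combo))
```

With this table, "diaoyu" yields several combinations, including diao'yu and di'ao'yu, which is exactly the ambiguity the rest of the document sets out to rank.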
When Chinese characters are input with the above method, the system uses simple matching to display the syllable combinations for the user to choose from. For example, it segments the pinyin string according to fixed rules and displays the syllable combinations in a fixed order; that is, the system sorts the syllable combinations corresponding to the pinyin string too crudely before presenting them to the user. The problem is most pronounced on a traditional nine-key keypad: because each key is shared by several letters, every input string corresponds to many syllable combinations. The traditional processing method cannot anticipate what the user intends to input. It can only present the same ordering of syllable combinations to every user, so the user needs more interactions to find the intended candidate, which reduces input efficiency.
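The key multiplexing problem can be made concrete with a short sketch. It uses the standard letter layout of a phone keypad and a small, hypothetical syllable table (`LEGAL_SYLLABLES`); the function name `syllables_for_keys` is likewise an assumption made for illustration.

```python
from itertools import product
from typing import List

# Standard letter assignment on a nine-key phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Hypothetical, deliberately tiny syllable table (an assumption for this sketch).
LEGAL_SYLLABLES = {"xi", "yi", "zi", "xian", "xiao", "zhan", "zhao"}


def syllables_for_keys(digits: str) -> List[str]:
    """Return the table syllables that a sequence of key presses could spell."""
    letter_choices = [KEYPAD[d] for d in digits]
    spelled = ("".join(letters) for letters in product(*letter_choices))
    return sorted(s for s in spelled if s in LEGAL_SYLLABLES)


if __name__ == "__main__":
    print(syllables_for_keys("9426"))
```

Even with this small table, the four key presses 9-4-2-6 match "xian", "xiao", "zhan" and "zhao", before any further splitting such as xi'an is considered.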
Summary of the invention
The invention provides a pinyin string processing method and device to solve the following problem in existing pinyin string processing: only the same ordering of syllable combinations can be presented to every user, so the user needs more interactions to find the intended candidate when selecting the required syllable combination, personalized input needs cannot be met, and input efficiency is low.
To solve the above problem, the invention discloses a pinyin string processing method, comprising:
receiving the input of a pinyin string and obtaining current environment information, where the current environment information is information about the current application program and/or input box that receives the pinyin string;
obtaining, according to preset configuration information, personalized information for the input of the pinyin string, where the personalized information comprises user habit information and historical environment information, and the historical environment information comprises information about the environment in which the user previously input Chinese characters;
screening at least one syllable combination corresponding to the pinyin string according to the personalized information and the current environment information, thereby determining the position of the segmentation mark in the pinyin string.
Preferably, the personalized information further comprises time information and/or location information, where the time information is the time at which the user previously performed input and/or network access, and the location information is the geographic location at which the user previously performed input and/or network access;
when the at least one syllable combination corresponding to the pinyin string is screened according to the personalized information and the current environment information, the screening is performed according to the user habit information and the current environment information, together with the current time information and/or the current location information.
Preferably, the step of screening the at least one syllable combination corresponding to the pinyin string according to the user habit information and the current environment information, thereby determining the position of the segmentation mark in the pinyin string, comprises:
obtaining a habit weight for each syllable combination according to the result of matching each syllable combination corresponding to the pinyin string against the user habit information;
obtaining an environment weight for each syllable combination according to the result of matching the current environment information against the historical environment information;
obtaining a time weight and/or a location weight for each syllable combination according to the result of matching the current time information and/or current location information against the corresponding time information and/or location information;
summing the weights of each syllable combination to obtain its total feature weight, sorting the syllable combinations by total feature weight, and determining the position of the segmentation mark in the pinyin string according to the top-ranked syllable combination.
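The summation and ranking step can be sketched as follows. The `Weights` class, the `rank` function and all numeric values are hypothetical and chosen only for illustration; the source does not say how the individual weights are scaled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Weights:
    """Per-combination weights; time and location are optional in the source."""
    habit: float
    environment: float
    time: float = 0.0
    location: float = 0.0

    def total(self) -> float:
        # Total feature weight: plain sum of the individual weights.
        return self.habit + self.environment + self.time + self.location


def rank(combos: Dict[str, Weights]) -> List[str]:
    """Sort syllable combinations by total feature weight, highest first."""
    return sorted(combos, key=lambda c: combos[c].total(), reverse=True)


# Illustrative values only (assumptions, not data from the source).
candidates = {
    "di'ao'yu": Weights(habit=0.1, environment=0.2, location=0.1),
    "diao'yu": Weights(habit=0.6, environment=0.3, time=0.1),
}
ranked = rank(candidates)

if __name__ == "__main__":
    print(ranked[0])  # the combination that fixes the segmentation mark
```

The top-ranked entry supplies the segmentation mark position; the remaining entries stay available as lower-priority alternatives.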
Preferably, the personalized information is obtained from the server end and is generated as follows:
obtaining historical behavior information from the user's network access, where the historical behavior information comprises the content, time, geographic location and input environment of the user's previous input, and/or the content, time and geographic location of the user's previous network access;
analyzing and processing the historical behavior information according to a preset interest model, environment model, time model and location model respectively;
generating the personalized information at the server end according to the result of analyzing and processing the historical behavior information.
Preferably, the interest model comprises an individual interest model and a group interest model;
when the habit weight of each syllable combination is obtained according to the result of matching each syllable combination corresponding to the pinyin string against the user habit information, each syllable combination is first matched against the user habit information generated by the individual interest model;
if the matching is unsuccessful, each syllable combination is matched against the user habit information generated by the group interest model corresponding to the individual interest model;
the habit weight of each syllable combination is then obtained according to the result of matching that combination against the group interest model.
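A minimal sketch of this fallback, assuming that each model's habit information can be represented as a mapping from syllable combination to habit weight, and assuming that the fallback is applied per combination (the source does not say whether it is per combination or for the whole set). The function name `match_habit` and the sample values are hypothetical.

```python
from typing import Dict, Optional


def match_habit(
    combo: str,
    individual: Dict[str, float],
    group: Dict[str, float],
) -> Optional[float]:
    """Look up a habit weight, preferring the individual model over the group model."""
    if combo in individual:
        return individual[combo]
    # No individual match: fall back to the corresponding group model.
    return group.get(combo)


# Illustrative values only (assumptions, not data from the source).
individual_habits = {"diao'yu": 0.8}
group_habits = {"diao'yu": 0.5, "di'ao'yu": 0.2}

if __name__ == "__main__":
    for combo in ("diao'yu", "di'ao'yu", "dia'o'yu"):
        print(combo, match_habit(combo, individual_habits, group_habits))
```

A result of `None` means neither model has seen the combination; a caller would then have to choose a default weight, which the source leaves open.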
Preferably, the habit weight is obtained as follows: number of times the target syllable combination was selected ÷ total number of times the syllable combinations corresponding to the pinyin string were input × N, where N is a weight coefficient and N is a natural number greater than 0.
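The formula can be written as a small function. The sketch reads the expression left to right, that is, as (selected ÷ total) × N; returning 0.0 when the string has never been input is an added assumption, since the source does not cover that case.

```python
def habit_weight(target_selected: int, total_inputs: int, n: int = 1) -> float:
    """Habit weight = target_selected / total_inputs * n, evaluated left to right."""
    if n <= 0:
        raise ValueError("N must be a natural number greater than 0")
    if total_inputs == 0:
        # Assumption: with no input history there is no habit signal yet.
        return 0.0
    return target_selected / total_inputs * n


if __name__ == "__main__":
    # A combination selected 5 times out of 10 inputs, with N = 2.
    print(habit_weight(5, 10, 2))
```

A larger N simply scales the habit weight relative to the environment, time and location weights before they are summed.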
Preferably, after the at least one syllable combination corresponding to the pinyin string is screened, the method further comprises:
obtaining the candidate words corresponding to each syllable combination obtained by the screening;
obtaining an additional weight for each candidate word, where the additional weight is obtained by analyzing and compiling statistics on all users' selections of each group of Chinese characters;
sorting the candidate words by additional weight and outputting them.
Preferably, in addition to being obtained by analyzing and compiling statistics on all users' selections of each group of Chinese characters, the additional weight is also obtained as follows:
analyzing and compiling statistics on the selected groups of Chinese characters according to the personalized information and the current environment information.
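The candidate ordering can be sketched as below. How the all-user statistic and the personalized statistic are combined is not stated in the source; adding them is an assumption of this sketch, as are the function name `rank_candidates` and every numeric value.

```python
from typing import Dict, List, Optional


def rank_candidates(
    candidates: List[str],
    global_weight: Dict[str, float],
    personal_weight: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Sort candidate words by additional weight, highest first."""
    personal = personal_weight or {}

    def score(word: str) -> float:
        # Assumption: the two statistics are simply added together.
        return global_weight.get(word, 0.0) + personal.get(word, 0.0)

    return sorted(candidates, key=score, reverse=True)


# Candidates for the key sequence "xian": 先 and 现 (xian), 西安 (xi'an).
words = ["先", "西安", "现"]
all_users = {"先": 0.5, "现": 0.3, "西安": 0.2}  # illustrative values
this_user = {"西安": 0.6}                        # illustrative values

if __name__ == "__main__":
    print(rank_candidates(words, all_users))
    print(rank_candidates(words, all_users, this_user))
```

With only the all-user statistic, 先 ranks first; once this user's own selections are taken into account, 西安 moves to the front.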
Preferably, the user habit information is obtained according to the number of times the current user has used each syllable combination and the time the user last used each syllable combination.
To solve the above problem, the invention also discloses a pinyin string processing device, comprising:
a first acquisition module, configured to receive the input of a pinyin string and obtain current environment information, where the current environment information is information about the current application program and/or input box that receives the pinyin string;
a second acquisition module, configured to obtain, according to preset configuration information, personalized information for the input of the pinyin string, where the personalized information comprises user habit information and historical environment information, and the historical environment information comprises information about the environment in which the user previously input Chinese characters;
a third acquisition module, configured to screen at least one syllable combination corresponding to the pinyin string according to the personalized information and the current environment information, thereby determining the position of the segmentation mark in the pinyin string.
Preferably, the personalized information further comprises time information and/or location information, where the time information is the time at which the user previously performed input and/or network access, and the location information is the geographic location at which the user previously performed input and/or network access;
when screening the at least one syllable combination corresponding to the pinyin string according to the personalized information and the current environment information, the third acquisition module performs the screening according to the user habit information and the current environment information, together with the current time information and/or the current location information.
Compared with the prior art, the present invention has the following advantages:
In the pinyin string processing scheme of the present invention, when the user inputs a pinyin string, at least one syllable combination corresponding to the string is screened according to both the user's personalized information and the current environment information, thereby determining the position of the segmentation mark in the string. The current environment information identifies the application program and/or input box currently receiving the pinyin string, while the historical environment information in the personalized information indicates the user's habits or tendencies when inputting pinyin strings in different input environments. Therefore, when the syllable combinations are screened, matching the current environment information against the historical environment information in the user's personalized information yields syllable combinations that better fit the user's habits and are more targeted. This solves the problem that existing pinyin string processing can only present the same ordering of syllable combinations to every user. In particular, when the user inputs with a nine-key keypad, the syllable combinations closest to the user's needs can be screened out efficiently, which avoids the repeated key presses otherwise needed to select a syllable combination because of key multiplexing, reduces the user's interactions when inputting Chinese characters, and improves input efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of a pinyin string processing method in the prior art;
Fig. 2 is a flowchart of the steps of a pinyin string processing method according to Embodiment one of the present invention;
Fig. 3 is a flowchart of the steps of a pinyin string processing method according to Embodiment two of the present invention;
Fig. 4 is a flowchart of the steps of a pinyin string processing method according to Embodiment three of the present invention;
Fig. 5 is a structural block diagram of a pinyin string processing device according to Embodiment four of the present invention.
Detailed description
To make the above objects, features and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment one
Referring to Fig. 2, a flowchart of the steps of a pinyin string processing method according to Embodiment one of the present application is shown.
The pinyin string processing method of this embodiment comprises the following steps:
Step S202: receive the input of a pinyin string and obtain current environment information.
The current environment information is the information corresponding to the application program and/or input box currently receiving the pinyin string. It includes, for example, the industry category of the current application program and the function of the input box, and can be obtained from the installation description of the current application program or by capturing the prompt text of the input box.
The user can input a pinyin string in several ways: with a nine-key keypad on a conventional keypad phone, with a standard 26-key QWERTY keyboard on a smartphone, or by other means such as a computer keyboard. The input method can automatically obtain the environment in which the pinyin string is input, that is, the current input environment information. For example, when the user types in an SMS input box, the input method automatically identifies the current environment as an SMS input box; when the user types in "google map", it identifies the current environment as map software.
Step S204: obtain, according to preset configuration information, the personalized information for the input of the pinyin string.
The personalized information comprises user habit information and historical environment information, and the historical environment information comprises information about the environment in which the user previously input Chinese characters.
In the personalized information, the user habit information indicates the tendencies of the user's previous Chinese character input and/or previous network access, and the historical environment information indicates the tendencies of the input environments in which the user previously input Chinese characters. The preset configuration information may indicate whether the user's personalized information is obtained locally or from the server end.
The user habit information characterizes the tendencies of the user's previous Chinese character input and/or previous network access. For example, if the server observes that the user has often input content such as "fishing", "fishing gear" and "fishing friends" on the terminal, analysis of that content indicates that the user is interested in fishing; this is the user's habit information. The information then guides the segmentation of pinyin strings in subsequent input. When this user inputs "diaoyu", the preferred syllable combination should be "diao" and "yu", so the segmentation mark is placed as "diao'yu"; the other possible syllable combinations, with segmentation mark positions such as "di'ao'yu", are arranged after it. As another example, even if the user has never input fishing-related content but often browses fishing websites, the server's analysis of the user's previous network access data likewise indicates an interest in fishing, and the same habit information is collected. When this user inputs "diaoyu", the preferred syllable combination is again "diao" and "yu", and the segmentation mark is placed as "diao'yu".
In addition, the historical environment information comprises information about the environment in which the user previously input Chinese characters and characterizes the tendencies of those input environments. For the pinyin string "diaoyu", for example, the user may have tended to obtain the syllable combination "diao" and "yu" when typing in a search engine, but the syllable combination "di", "ao" and "yu" when typing in a map. Combined with the user habit information above, when the user inputs "diaoyu" in a search engine, the segmentation mark is preferentially placed as "diao'yu"; when "diaoyu" is input in another application or input box, the syllable combinations corresponding to "diaoyu" are determined according to the user habit information and the current environment information, for example by placing the segmentation mark as "di'ao'yu" or "diao'y'u".
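The environment-dependent preference in this example can be sketched as a lookup keyed by input environment and pinyin string. The storage layout, the environment labels and the function names `record` and `preferred` are assumptions made for illustration; the source does not specify how the historical environment information is stored.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Tuple

# (environment, pinyin string) -> how often each segmentation was chosen there.
history: DefaultDict[Tuple[str, str], Counter] = defaultdict(Counter)


def record(env: str, pinyin: str, segmentation: str) -> None:
    """Remember that `segmentation` was chosen for `pinyin` in `env`."""
    history[(env, pinyin)][segmentation] += 1


def preferred(env: str, pinyin: str, default: str) -> str:
    """Return the segmentation chosen most often in `env`, else `default`."""
    counts = history.get((env, pinyin))
    if not counts:
        return default
    return counts.most_common(1)[0][0]


# Illustrative history (assumed): search engine favours diao'yu, map favours di'ao'yu.
for _ in range(3):
    record("search_engine", "diaoyu", "diao'yu")
for _ in range(2):
    record("map", "diaoyu", "di'ao'yu")

if __name__ == "__main__":
    for env in ("search_engine", "map", "sms_box"):
        print(env, preferred(env, "diaoyu", default="diao'yu"))
```

In an environment with no history, such as the SMS box here, the sketch falls back to a default, which in the scheme described would come from the user habit information.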
Step S206: screen at least one syllable combination corresponding to the pinyin string according to the personalized information and the current environment information, thereby determining the position of the segmentation mark in the pinyin string.
Each syllable combination of the pinyin string corresponds to a different syllable division, and the different positions of the segmentation mark in the pinyin string are determined from these divisions.
Further, when the at least one syllable combination corresponding to the pinyin string is screened according to the user habit information and the current environment information, a weight can be computed for each syllable combination from the personalized information and the current environment information. The syllable combinations are sorted by weight, the top-ranked combination is selected, and the matching Chinese characters for that combination are then obtained. In other embodiments, the display order of the syllable combinations may instead be determined according to the personalized information and the current environment information, after which each group of Chinese characters is assigned a number and the pinyin combinations are sorted in number order.
With the pinyin string processing scheme of this embodiment, when the user inputs a pinyin string, at least one syllable combination corresponding to the string is screened according to both the user's personalized information and the current environment information, thereby determining the position of the segmentation mark in the string. The current environment information identifies the application program and/or input box receiving the current pinyin string input, while the historical environment information in the personalized information indicates the user's habits or tendencies when inputting Chinese characters in different input environments. Therefore, when the pinyin string is segmented and the syllable combinations are screened, the method not only refers to the user habit information but also matches the current environment information against the historical environment information in the personalized information, and determines from the matching result the syllable combinations that fit the user's habits. The resulting syllable combinations better match the user's habits, which improves the user's input efficiency.
The scheme of this embodiment solves the problem that existing pinyin string processing can only present the same ordering of syllable combinations to every user, so that the user needs more interactions to obtain the required syllable combination, input cost is high, and personalized input needs cannot be met. In particular, when the user inputs with a nine-key keypad, one input string can correspond to multiple syllable combinations because of key multiplexing, and the user would otherwise have to press keys repeatedly to select the required combination. Screening the syllable combinations in the manner described brings the combinations closest to the user's needs to the front, which reduces the user's interactions when inputting Chinese characters and improves input efficiency.
Embodiment two
Referring to Fig. 3, a flowchart of the steps of a pinyin string processing method according to Embodiment two of the present application is shown.
The pinyin string processing method of this embodiment comprises the following steps:
Step S302: generate a local personalized information database.
In addition to the user habit information and historical environment information, the local personalized information database may also store the user's time information and/or location information.
The local personalized information database can be generated as follows: obtain the user's historical behavior information locally, where the historical behavior information comprises the content, time, geographic location and input environment of each previous Chinese character input by the user, and/or the content, time and geographic location of each previous network access by the user; analyze and compile statistics on the historical behavior information; and generate the local personalized information database from the analysis and statistics. Of course, if the database does not store the user's time information and/or location information, the corresponding items of historical behavior information, such as the time and geographic location of each previous Chinese character input and/or of each previous network access, need not be obtained. Preferably, the user habit information is obtained according to the number of times the current user has used each syllable combination and the time the user last used each syllable combination.
The personalized information stored in the local personalized information database can be updated in real time or at set intervals. For example, after the user selects a suitable group of Chinese characters for the current input, that group of Chinese characters, together with the time, location and input environment of the input, can be recorded in the local database as one item of historical behavior information, to be used the next time the user's personalized information is updated. Specifically, suppose the user inputs the pinyin string "xian" in an SMS content input box and selects "Xi'an" as the candidate word from the groups of Chinese characters offered, the time is 3 p.m., and GPS positioning of the mobile terminal gives the current location as Xi'an. The local side then saves the pinyin string "xian" input by the user, its segmentation mark position "xi'an", the corresponding candidate word "Xi'an", the time (15:00), the input environment (SMS input box) and the geographic location (Xi'an). When the user's personalized information is subsequently updated, this information is stored in the local database as part of the historical behavior information and becomes a basis for the update. Suppose that before the update this user had selected "Xi'an" for the pinyin string "xian" 10 times, of which 2 were in the 15:00 time period, 5 were while located in the city of Xi'an, and 5 were in an SMS input box. After the update, the total number of selections of "Xi'an" is 11, and the counts for the 15:00 time period, for the city of Xi'an and for the SMS input box are each increased by 1 as well.
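The counter update in this example can be sketched with a small record type. The class name `SelectionStats`, its field names and the environment label "sms_box" are hypothetical; only the counts come from the example above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SelectionStats:
    """Selection counts for one candidate word of one pinyin string."""
    total: int = 0
    by_hour: Counter = field(default_factory=Counter)
    by_city: Counter = field(default_factory=Counter)
    by_env: Counter = field(default_factory=Counter)

    def record(self, hour: int, city: str, env: str) -> None:
        """Add one selection made at `hour`, in `city`, in environment `env`."""
        self.total += 1
        self.by_hour[hour] += 1
        self.by_city[city] += 1
        self.by_env[env] += 1


# State before the update, as in the example: 10 selections of "Xi'an" for "xian".
stats = SelectionStats(
    total=10,
    by_hour=Counter({15: 2}),
    by_city=Counter({"Xi'an": 5}),
    by_env=Counter({"sms_box": 5}),
)
stats.record(hour=15, city="Xi'an", env="sms_box")

if __name__ == "__main__":
    print(stats.total, stats.by_hour[15], stats.by_city["Xi'an"], stats.by_env["sms_box"])
```

After the call, the counts are 11, 3, 6 and 6, matching the updated figures in the example.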
Step S304: generate a personalized information database at the server end.
Like the local personalized information database, the personalized information database at the server end may store, in addition to the user habit information and historical environment information, the user's time information and/or location information. The time information is the time at which the user previously performed input and/or network access; the location information is the geographic location at which the user previously performed input and/or network access.
The personalized information database at the server end can be generated as follows: obtain historical behavior information from the network access of multiple users, where the historical behavior information comprises the content, time, geographic location and input environment of a user's previous input, and/or the content, time and geographic location of the user's previous network access; analyze and process each user's historical behavior information using a preset interest model, environment model, time model and location model respectively; and generate the personalized information at the server end according to the result of the analysis and processing, from which the server-end personalized information database is generated. As with the local personalized information database, in other embodiments, if the server-end database does not store the user's time information and/or location information, the corresponding items of historical behavior information need not be obtained, and the corresponding models need not be used for analysis and processing.
In above-mentioned various analytical model, interest model can adopt relevant maturity model, obtains by carrying out training to the content of the previous input of user and the reading aspects data of user.Wherein, the reading aspects of user comprises the Internet resources such as webpage, microblogging that user browses, and the literal resource such as mail, instant messaging.The topic be concerned about due to user and content and input content have positive correlation characteristic, therefore, can be analyzed the interest tendency of user by the previous input content of user and the reading aspects of user, as physical culture, and amusement, news, or finance and economics, social activity etc.Obtain in the previous reading content of content and the user of the previous input of user, when the content that user browses is internet content, the URL web page address that user accesses is uploaded onto the server, the text message in webpage needed for server captures.If text message is the content that can not capture, then, after the information needed for client obtains, records and upload onto the server.Text classification is carried out by capturing the content obtained, and according to attribute marking, thus finally determine the classification belonging to content, that is category of interest.At present, widespread use model-naive Bayesian, or most adjacent node algorithm KNN or vector space model carry out text classification, dynamically affect follow-up pinyin character string manipulation according to classification results.
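As a rough illustration of the text classification step, the sketch below trains a tiny multinomial naive Bayes classifier with add-one smoothing on whitespace-tokenized text. The training snippets, category names and function names are invented for illustration and do not come from the application; a real system would need Chinese word segmentation and far more data.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (text, category). Returns the counts needed for prediction."""
    word_counts = defaultdict(Counter)   # category -> word -> count
    doc_counts = Counter()               # category -> number of documents
    vocab = set()
    for text, category in samples:
        words = text.split()
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the category with the highest log posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)
        denom = sum(word_counts[category].values()) + len(vocab)
        for word in text.split():
            score += math.log((word_counts[category][word] + 1) / denom)
        if score > best_score:
            best, best_score = category, score
    return best

# Hypothetical training data: (browsed or typed text, interest category)
model = train([
    ("match goal team score", "sports"),
    ("team wins league match", "sports"),
    ("stock market price fund", "finance"),
    ("bank fund interest price", "finance"),
])
```

The predicted category would then play the role of the interest category that influences later pinyin string processing.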
Preferably, the interest model may comprise individual interest models and group interest models, with each individual interest model corresponding to at least one group interest model. An individual interest model analyzes the content of each previous Chinese character input and/or network access by a single user and generates that user's habit information from the result; a group interest model analyzes the content of each previous Chinese character input and/or network access by multiple users and generates the habit information of those users from the result. That is, users are classified by offline cluster analysis of a large amount of user data, so that people with the same interest are grouped together. The same user may belong to several categories, and the unit of grouping is a population with similar browsing and input behavior. The group interest model can use existing mature text classification models and algorithms to determine users' points of interest by classification, which is not described further here.
The environment model can likewise use a relevant mature model. It is obtained by continuous learning and training on the syllable combinations that users across the whole network input in different input environments, together with the environment information at the time of each input. The environment model is influenced mainly by two factors: the program in which the input occurs, i.e. the industry category of the application, and the attribute of the edit box in which the input occurs. Learning and training of the model are based on users' browsing and input, which are clustered by features such as region, time and interest to obtain feature dictionaries for the different characteristics. By querying the clustered data, the weight of an input syllable combination can then be obtained. When the user inputs, the client sends the current environment information, such as the application and the edit box type in which the input takes place, to the server, and the server scores it. Edit box types can be distinguished by the function of the button paired with the edit box, for example search box, contact search box or SMS input box; applications can be divided by the industry category of the software into instant messaging, security, map navigation, audio/video and so on. Depending on the industry category of the application and on the input box, the ranking of the different syllable combinations in that input environment is given dynamically, and a weight for each syllable combination can be given as well.
The time model can also use a relevant mature model: the syllable combinations input by a large number of users in different time periods are collected, input time is taken as the feature, the syllable combinations input by all users in each time period are counted, and a feature dictionary classified by time period is generated.
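A minimal sketch of such a time-period feature dictionary is shown below, assuming the input log is available as (hour, syllable combination) pairs. The function name and the sample log are hypothetical.

```python
from collections import Counter, defaultdict

def build_time_dictionary(records):
    """records: iterable of (hour, syllable_combination) pairs from all users.
    Returns {hour: Counter({combination: count})}."""
    by_hour = defaultdict(Counter)
    for hour, combination in records:
        by_hour[hour][combination] += 1
    return by_hour

# Hypothetical input log: (hour of day, chosen syllable combination)
log = [(16, "si"), (16, "si"), (16, "qi"), (9, "qi"), (9, "qi")]
time_dict = build_time_dictionary(log)
```

A dictionary keyed by city instead of hour could be built the same way for a location-based model.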
The location model can also use a relevant mature model: the syllable combinations input by a large number of users in different regions are collected, the input region is taken as the feature, the syllable combinations input by all users in each region are counted, and a feature dictionary classified by regional feature is generated.
It should be noted that steps S302 and S304 need not be executed in any particular order and may also be executed in parallel. Moreover, only one of the local personalized information database and the server-side personalized information database may be built; it is not necessary to build both.
In addition, preferably, the user habit information in the local and server-side personalized information may also be obtained from the number of times the current user has used each syllable combination and the time at which the user last used each syllable combination. In this embodiment the personalized information is stored in a database, but those skilled in the art should understand that in practice any other suitable storage form, such as text, is equally applicable.
Step S306: receive the input of a pinyin string.
The pinyin string can be input in several forms. For example, when the user enters the pinyin string "xian", the nine-key input is "9426" and the full-keyboard input is "xian".
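The relation between the two input forms can be sketched as follows, assuming the usual telephone keypad letter layout (2 = abc through 9 = wxyz), which is consistent with the "xian" to "9426" example. The names `KEYPAD` and `to_nine_key` are illustrative only.

```python
# Letter layout of a nine-key phone keypad (assumed standard layout).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}

def to_nine_key(pinyin):
    """Convert a lowercase pinyin string to its nine-key digit sequence."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in pinyin)

print(to_nine_key("xian"))  # 9426
```

Because several letters share each key, different pinyin strings can map to the same digit sequence, which is why the syllable combinations have to be screened.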
Step S308: obtain the current environment information of the pinyin string input.
Preferably, along with the current environment information, current time information or current location information may be obtained, or both.
The environment information is information about the application and/or input box receiving the pinyin string. For example, when the user enters a pinyin string in an SMS input box, the environment information is the SMS input box; when the user enters a pinyin string in map software, the environment information is the map software.
The current location information of the pinyin string input can be obtained by means such as the urban area the user is in at input time, the IP address, Wi-Fi network positioning or the device's GPS; the user's current location information is accurate to city level.
The current time information can be obtained automatically, through software settings, from the device the user is using, such as a mobile phone, a computer or another input tool. The current time information is in Beijing time and accurate to the hour. For example, if the user enters a pinyin string at 7 p.m., the current time information is recorded as 19:00.
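Truncating the device time to the hour in Beijing time (UTC+8) might look like the sketch below. The function name and the sample date are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # Beijing time is UTC+8

def current_hour_label(now=None):
    """Return the time truncated to the hour, formatted as HH:00 in Beijing time."""
    now = now or datetime.now(BEIJING)
    return now.astimezone(BEIJING).strftime("%H:00")

# An arbitrary evening timestamp, chosen only for the example
print(current_hour_label(datetime(2013, 8, 22, 19, 25, tzinfo=BEIJING)))  # 19:00
```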
Step S310: according to preset configuration information, obtain the personalized information of the user entering the pinyin string from the server side.
The configuration information indicates whether the personalized information is obtained locally or from the server side. When it indicates local acquisition, the input method obtains the personalized information of the user entering the pinyin string from the local personalized information database; when it indicates server-side acquisition, the input method obtains it from the server-side personalized information database. This embodiment is described using the case of obtaining the user's personalized information from the server side.
Step S312: screen the at least one syllable combination corresponding to the pinyin string according to the personalized information and the current environment information, thereby determining the position of the separator in the pinyin string.
One feasible approach is to match each syllable combination corresponding to the pinyin string against the user habit information in the personalized information, and to match the current environment information against the historical environment information in the personalized information, so as to screen out the syllable combinations that better meet the user's needs and determine the separator positions in the pinyin string from the matching results. The matching results can of course be processed further, for example by taking a weighted sum of the matching results for each kind of information and screening the syllable combinations according to the weighted sum, thereby determining the separator positions in the pinyin string.
Preferably, when current time information and/or current location information is obtained at input time, the at least one syllable combination corresponding to the pinyin string can also be screened according to the user's personalized information and the current environment information together with the current time information and/or current location information, thereby determining the separator positions in the pinyin string.
A specific implementation is as follows. According to a set rule, a habit weight is obtained for each syllable combination from the result of matching that combination against the user habit information; an environment weight is obtained for each syllable combination from the result of matching the current environment information against the historical environment information; a time weight and/or location weight is obtained for each syllable combination from the result of matching the current time information and/or current location information against the corresponding user time information and/or user location information. The weights of each syllable combination are summed to give its total feature weight, the at least one syllable combination is sorted by total feature weight, and the separator positions in the pinyin string are determined from the top-ranked syllable combination. The weights can be obtained in any order, or in parallel.
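The summing and ranking just described can be sketched as below. The weight values are made up for illustration, and writing a syllable combination as a string with `'` between syllables is an assumption of this sketch rather than a format defined in the application.

```python
def rank_combinations(weights_by_combination):
    """weights_by_combination: {combination: {"habit": w, "environment": w, ...}}.
    Returns (combinations sorted by total feature weight, highest first; totals)."""
    totals = {c: sum(w.values()) for c, w in weights_by_combination.items()}
    return sorted(totals, key=totals.get, reverse=True), totals

def separator_positions(combination):
    """Offsets in the unsegmented pinyin string after which a separator falls."""
    positions, offset = [], 0
    for syllable in combination.split("'")[:-1]:
        offset += len(syllable)
        positions.append(offset)
    return positions

# Hypothetical weights for two readings of the pinyin string "xian"
ranked, totals = rank_combinations({
    "xi'an": {"habit": 128, "environment": 400, "time": 150, "location": 460},
    "xian":  {"habit": 384, "environment": 0,   "time": 85,  "location": 17},
})
best = ranked[0]                    # top-ranked syllable combination
print(best, separator_positions(best))
```

Omitting the time or location entries from the inner dictionaries corresponds to the variants in which only some of the weights are considered.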
In addition, preferably, when the personalized information is obtained from the server side and the server-side interest model comprises individual interest models and group interest models, the habit weight of each syllable combination is obtained from the matching results as follows: each syllable combination is matched against the user habit information generated by the individual interest model; if the match fails, each syllable combination is matched against the user habit information generated by the group interest model corresponding to that individual interest model; the habit weight of each syllable combination is then obtained from the result of matching against the group interest model. The habit weight is computed as: (number of times the target syllable combination was selected ÷ total number of inputs of the multiple syllable combinations corresponding to the pinyin string) × N, where N is a weight coefficient and a natural number greater than 0.
The reason is that in some cases missing historical behavior information leaves the user habit information incomplete, so that matching against the user information generated by the individual interest model fails. At the same time, people with the same interest are likely to share user habit features, so the habit features of other users in the group can serve as a reference for the habit features of a single user. In that case the habit weight of each syllable combination can be obtained from the result of matching that combination against the group interest model. For example, user A belongs to group X, and analysis of the data of all users in group X shows that their common interest is shopping. When user A enters the pinyin string "baisheng", if the personalized information generated for user A by the individual interest model contains no corresponding user habit information, that is, user A has never entered this pinyin string before, then "baisheng" is matched against the personalized information of the group's users generated by the group interest model. The best-matching syllable combination obtained for "baisheng" is then "bai'sheng", rather than combinations such as "bai'she'ni" that arise from key sharing on mobile platforms.
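The fallback from individual to group habit data, together with the habit weight formula, could be sketched like this. The counts, the choice N = 512 and the function names are assumptions for illustration only.

```python
def habit_weight(selected, total, n=512):
    """Habit weight = selected / total * N, with N a positive weight coefficient."""
    return selected / total * n if total else 0.0

def habit_weights(combinations, individual, group, n=512):
    """Use the individual habit data when it knows any of the combinations,
    otherwise fall back to the group habit data.
    individual/group: {combination: times selected}."""
    source = individual if any(c in individual for c in combinations) else group
    total = sum(source.get(c, 0) for c in combinations)
    return {c: habit_weight(source.get(c, 0), total, n) for c in combinations}

# Hypothetical data: user A has never typed "baisheng"; group X (shopping) has.
individual_a = {}
group_x = {"bai'sheng": 45, "bai'she'ni": 5}
weights = habit_weights(["bai'sheng", "bai'she'ni"], individual_a, group_x)
```

With these made-up counts the group data ranks "bai'sheng" well ahead of "bai'she'ni", mirroring the example in the text.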
Another preferred ranking scheme is as follows. After the multiple syllable combinations corresponding to the pinyin string are obtained, the intrinsic weight of each syllable combination is obtained, where the intrinsic weight is derived from statistical analysis of the selections that users across the whole network make among the syllable combinations. The intrinsic weight and the total feature weight of each syllable combination are summed to give its total weight, and the separator positions in the pinyin string are determined from the syllables of the syllable combination ranked first by total weight. For example, the range of the intrinsic weight is set to 0-2048 and the range of the total feature weight is also 0-2048; within the total feature weight, the time weight, location weight, environment weight and habit weight each range over 0-512. The total weight of a syllable combination therefore lies between 0 and 4096, and a larger value means a higher priority.
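The ranges in this example fit together as the following small sketch shows. The function name and the range checks are illustrative additions, not part of the described method.

```python
def total_weight(intrinsic, habit, environment, time, location):
    """Total weight = intrinsic weight + total feature weight."""
    if not 0 <= intrinsic <= 2048:
        raise ValueError("intrinsic weight must be in 0-2048")
    features = (habit, environment, time, location)
    if any(not 0 <= w <= 512 for w in features):
        raise ValueError("each feature weight must be in 0-512")
    return intrinsic + sum(features)

print(total_weight(2048, 512, 512, 512, 512))  # 4096
```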
When calculating the total feature weight, one option is to use the following formula:
Weight(total feature) = Weight(habit) + Weight(environment) + Weight(time) + Weight(location)
Here Weight(total feature) is the total feature weight of a syllable combination, i.e. the sum of the weights obtained from the personalized information; Weight(habit) is the habit weight of the syllable combination, i.e. the weight of the result of matching the combination against the user habit information; Weight(environment) is the environment weight at the time the user enters the pinyin string, i.e. the weight of the result of matching the current input environment against the historical environment information; Weight(time) is the time weight at the time the user enters the pinyin string, i.e. the weight of the result of matching the current input time against the user time information; Weight(location) is the location weight at the time the user enters the pinyin string, i.e. the weight of the result of matching the current input location against the user location information. Of course, the total feature weight may be calculated from the habit and environment weights alone, or the weight of the current time, the current location, or both may be considered in addition to the habit and environment weights.
In the formula above:
Weight(time) = (number of times this syllable combination was input in this time period ÷ total number of times this syllable combination was input) × 512;
Weight(location) = (number of times this syllable combination was input at this location ÷ total number of times this syllable combination was input) × 512;
Weight(habit) = (number of times this syllable combination was selected ÷ total number of inputs of all syllable combinations corresponding to the pinyin string) × 512;
Weight(environment) = (number of times this syllable combination was input in this application ÷ total number of times this syllable combination was input) × 512;
The personalized information here may be local or server-side personalized information. The numbers 512, 2048 and 4096 merely represent the weight scale of each feature; they may be set to any suitable natural number according to the desired weighting.
Step S314: obtain the candidate words corresponding to the syllable combination.
For example, when the syllable combination selected from the screened combinations is "xi'an", the candidate words matching this syllable combination obtained from the dictionary are "Xi'an", "west bank" and "Sion".
Step S316: obtain the additional weight of each candidate word.
The additional weight is obtained by statistical analysis of the proportion in which all users select each candidate word of the chosen syllable combination.
For example, if statistics on the selections of all users for the syllable combination "xi'an" show that the candidate word "Xi'an" was selected 50 times and the candidate word "west bank" 30 times, then the additional weight of "Xi'an" is higher than the additional weight of "west bank".
Preferably, the additional weight can also take into account statistical analysis of the selected candidate words based on the personalized information and the current environment information; that is, it is obtained both from statistical analysis of all users' selections of each candidate word and from statistical analysis based on the personalized information and the current environment information.
For example, again with the candidate words "Xi'an" and "west bank": when the user enters the string "xian" and the syllable combination "xi'an" is determined to be the required combination, the candidate words obtained for it include "Xi'an" and "west bank". The number of times each of these two candidate words was selected in this user's personalized information under the current input environment is then considered, as is the number of times each was selected by all users; both factors are taken into account in assigning an additional weight to each candidate word.
Step S318: sort the candidate words by additional weight and output them.
For example, the additional weight of each candidate word can be calculated and the candidate words sorted by additional weight. Alternatively, after the display order of the candidate words has been determined, a number can be assigned to each candidate word and the candidate words sorted by number. The candidate words are then displayed in the candidate area of the mobile terminal in the sorted order.
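Steps S316 and S318 can be sketched as follows, using the selection counts from the "Xi'an" / "west bank" example. Treating the additional weight as the plain share of selections is an assumption; the text only says it is derived from the selection proportion.

```python
def additional_weights(selection_counts):
    """Additional weight of each candidate = its share of all selections."""
    total = sum(selection_counts.values())
    return {word: count / total for word, count in selection_counts.items()}

counts = {"west bank": 30, "Xi'an": 50}   # selection counts from the example
weights = additional_weights(counts)
# Candidates in display order, highest additional weight first
ordered = sorted(weights, key=weights.get, reverse=True)
```

A personalized variant could blend these global shares with the user's own selection counts in the current input environment before sorting.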
Preferably, after the user selects among the displayed candidate words, the candidate word the user selected can also be obtained and the information about this input saved, for example the finally selected candidate word, the corresponding syllable combination, and the environment, time and location at input time, to provide a basis for subsequent updates of the user's personalized information.
With the pinyin string processing method of this embodiment, when the user enters a pinyin string, the total feature weight of each syllable combination is calculated from the user's personalized information and the current environment, time and location, and the separator positions in the pinyin string are adjusted according to the total feature weights. This solves the problem that current pinyin string processing can only present the same uniform ranking of syllable combinations to everyone, so that users need more interactions to reach the syllable combination they want, input efficiency is low, and users' personalized input needs cannot be met. In addition, the candidate words corresponding to the chosen syllable combination are sorted by additional weight so that the Chinese character combinations that better match the user's needs are displayed first; the candidate words are thus ranked according to the user's personalized information and the current environment information, and the candidate words the user expects are output first. When the user types on a nine-key keypad, the pinyin combinations that meet the user's needs can be screened out, avoiding the repeated key presses otherwise needed to select a syllable combination because of key sharing. This reduces the user's interactions during input, improves input efficiency and meets the user's individual needs.
Embodiment three
Referring to Fig. 4, a flowchart of the steps of a pinyin string processing method according to Embodiment three of the present application is shown.
The pinyin string processing method of this embodiment comprises the following steps:
Step S402: the user enters a pinyin string.
The user enters a key sequence. For example, when the user wants to input "Xi'an", the corresponding nine-key input on the mobile terminal's keypad is "9426" and the full-keyboard input is "xian".
Step S404: obtain the current location, current time and current environment information of the pinyin string input.
Step S406: obtain the personalized information and, in combination with the current location, current time and current environment information of the pinyin string input, screen the at least one syllable combination corresponding to the pinyin string; determine the separator positions in the pinyin string from the top-ranked syllable combination.
The top-ranked syllable combination may be the single highest-ranked combination, or the first several combinations up to a set rank.
In this embodiment, after the user's personalized information is obtained, the at least one syllable combination corresponding to the pinyin string is screened according to the personalized information in combination with the current location, current time and current environment of the client performing the input, thereby determining the separator positions in the pinyin string.
The user's personalized information represents the user's individual characteristics and comprises two parts, user features and environment features. The user feature information comprises time information, location information and user habit information, which represent the user's time features, location features and habits and interests, respectively.
Time features of the user: for example, on a nine-key keypad "si" and "qi" share the same key code. During the 16:00-17:00 period the user is more inclined to input the syllable combination "si", i.e. entries related to the candidate word "four o'clock", while at other times the user is more inclined to input the syllable combination "qi", i.e. entries related to the candidate word "seven o'clock". Likewise, "evening" and "morning" share the same key code on a nine-key keypad, and each person's habits differ.
Location features of the user: the content entered in different places also differs, for example at the workplace and while traveling. For the same pinyin string, the user may expect different candidates. For example, when the user enters "9426" on the nine-key keypad of a mobile device, the corresponding pinyin string is "xian"; at home the user may be more inclined to input the syllable combination "xian", whereas when traveling, or when the user is in Xi'an, the user is more likely to need the syllable combination "xi'an" ranked first.
User habits and interests: a user's input habits are formed over a long time and do not change easily. A user who habitually uses abbreviated input, for example, will abbreviate the last character wherever possible when typing on a nine-key keypad; ranking optimized for that habit then feels especially considerate and further improves the user's input efficiency. For example, the pinyin string "ban" is segmented into the syllable combination "ba'n" and "ba'n" is ranked first and recommended to the user; likewise the pinyin string "beng" is segmented into the syllable combination "ben'g", and there are many similar examples. The content a user reads is also positively correlated with the content the user inputs, so the user's interests can be inferred from a large amount of browsed content and input content, and the input habits of the population sharing those interests can then be used as a weighting that influences the separator positions in the current pinyin string.
The information corresponding to the environment features at the time the user enters a pinyin string is the other part of the user's individual characteristics. The same pinyin string behaves differently in different input environments. Take the pinyin string "yuan": if the user has used the syllable combination "yu'an" (contingency plan) in Word documents and the syllable combination "yuan" (garden) in a map application, then, according to the user's usage history, when both are entries the user has used and input environment information is available, the input environment determines how the syllable combinations corresponding to the pinyin string are screened, and thereby adjusts the separator positions in the pinyin string.
The personalized information can be obtained locally or from the server side. When it is obtained locally, statistics are accumulated continuously from the user's historical input data to form the additional weights that affect the pinyin string currently being entered. For example, a binary search is performed in the configuration file for every possible pinyin string, and during the lookup the user-feature information in the personalized information, such as time and location, is converted into corresponding comparable weights. When it is obtained from the server side, the pinyin string entered by the user is sent to the server, which analyzes it with the interest model, environment model, time model and location model to obtain the corresponding habit weight, environment weight, time weight and location weight.
In this embodiment, obtaining the personalized information locally is taken as the example. Suppose the syllable combination "xi'an" has been used 10 times in total on the phone: 8 times in a map application and 2 times in Word documents. The times of use were: 14:00, 2 times; 15:00, 3 times; 16:00, 3 times; 18:00, 2 times. It was used 9 times in Xi'an and 1 time in Beijing. The syllable combination "xian" has been used 30 times: 1 time in Xi'an and 29 times in Beijing. The times of use were: 14:00, 5 times; 15:00, 5 times; 16:00, 5 times; 17:00, 5 times; 18:00, 5 times; 19:00, 5 times. It was never used in a map application and was used 30 times in Word. The last time "xi'an" was used was 15:01.
To sum up, assume the current time is 15:30, the user's current location obtained by GPS is Xi'an, and the pinyin string "xian" is entered in a map application. The total weight obtained for the syllable combination "xi'an" is then: (3/10) × 512 + (9/10) × 512 + (10/35) × 512 + (8/10) × 512 = 1170.29; and the total weight obtained for the syllable combination "xian" is: (1/30) × 512 + (5/30) × 512 + [30/(10+30)] × 512 + (0/30) × 512 = 486.4. Sorting the syllable combinations by total weight places the syllable combination "xi'an" ahead of "xian".
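The arithmetic of this worked example can be checked with a few lines of Python. The ratios are copied exactly as they appear in the two expressions above; the helper name `total_weight` is illustrative.

```python
SCALE = 512

def total_weight(terms, scale=SCALE):
    """terms: (count, total) pairs, one per feature; each adds count/total*scale."""
    return sum(count / total * scale for count, total in terms)

# "xi'an": the four ratios in the order they appear in the expression above
xi_an = total_weight([(3, 10), (9, 10), (10, 35), (8, 10)])
# "xian": likewise, in the order of its expression
xian = total_weight([(1, 30), (5, 30), (30, 10 + 30), (0, 30)])

print(round(xi_an, 2), round(xian, 2))  # 1170.29 486.4
```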
And if subsidiary for pinyin character string " xian " the current information such as time, place, input environment is sent to server end, mark will be provided by various characteristic model.If often browse the scenic spots and historical sites information in Xi'an before user, so, interest model analyzes active user and likes tourism, according to personal interest and other input features having the user of common interest corresponding with active user of active user, weights are accustomed to accordingly to active user, equally, also through environmental model, time model and position model analysis, corresponding environment weights, time weight and place weights can be provided.Finally, the total weight value of each syllable combination that pinyin character string " xian " is corresponding, is jointly determined by each characteristic model of server end, is obtained the sequence of each syllable combination by total weight value, or, corresponding weights are added to the sequence intrinsic weights obtaining the combination of each syllable.
Step S408: user chooses the syllable combination of needs, the candidate word of the syllable combination correspondence that upper screen is selected.
Step S410: the syllable that selected by recording user is final, candidate word is corresponding is incorporated into corresponding configuration file, and stores corresponding customized information simultaneously, for user's input next time provides personalized weighting foundation.
When the syllable combination of Chinese character user finally selected and the customized information of correspondence are stored in local configuration file, local profile can be index with four dimensions respectively, store data respectively to four files, namely carry out structured storage by user habit, time, place, input environment.For the data structure that the time is corresponding, containing 24 KEY in this form, be 24 hours respectively, i.e. 0-23.After each period, corresponding corresponding syllable combination and this syllable are combined in the input word frequency of this time period.Data structure corresponding to place is also that same recording mode stores.The recording mode of environmental characteristic and user habit does not then distinguish when and where, record the input environment that the combination of this syllable is corresponding, namely the type that the software object that inputs of pinyin character string is corresponding (is such as at word, or in map application, or the input carried out in audiovisual applications), with the use habit of active user, whether inputted the number of times of identical pinyin character string and input and the time etc. of last input.Wherein, the use habit of active user by arranging the input of user, the trigger recording of local each application carries out analysis and obtains.
Continuing the example in which the user selects the syllable combination "xi'an", the local configuration file is updated after this input as follows: the syllable combination "xi'an" has been used 11 times in total, 9 times in a map application and 2 times in a Word document. The times of use are: 14:00, 2 times; 15:00, 4 times; 16:00, 3 times; 18:00, 2 times. The geographic locations of use are: Xi'an, 10 times; Beijing, 1 time. This provides a weighting basis, drawn from local personalized information, for the user's next input. Meanwhile, the above information can also be uploaded to the server, where the various feature models cluster the related content to provide a server-side weighting basis, drawn from personalized information, for the user's next input.
The pinyin string processing scheme of this embodiment combines user habits, input environment, time, location, and other features that are closely related to user input yet differ from user to user. Letting these features influence the input result reduces the number of interactions the user needs to make a selection, lowers the time cost of input, and improves input efficiency.
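The counts in this example can be written out and checked for consistency as below. The dictionary layout and the `share` helper are hypothetical; using a relative frequency as the weighting basis is an assumption made for illustration, since the source does not state how the counts are turned into weights.

```python
# Usage record for the syllable combination "xi'an" after the update,
# with the counts taken from the example in the text.
usage = {
    "total": 11,
    "environment": {"map": 9, "word_document": 2},
    "time": {14: 2, 15: 4, 16: 3, 18: 2},
    "place": {"Xi'an": 10, "Beijing": 1},
}


def share(dimension, key):
    """Fraction of all recorded uses that fall under `key` in `dimension`."""
    return usage[dimension].get(key, 0) / usage["total"]


# Each dimension accounts for all 11 uses.
for dimension in ("environment", "time", "place"):
    assert sum(usage[dimension].values()) == usage["total"]
```

Under this assumption, an input made in a map application would match 9 of the 11 recorded uses, while an input made at 17:00 would match none of them.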
Embodiment four
Referring to Fig. 5, a structural block diagram of a pinyin string processing device according to Embodiment 4 of the present application is shown.
As shown in Fig. 5, the pinyin string processing device of this embodiment includes: a first acquisition module 502, configured to receive the input of a pinyin string and obtain current environment information, where the current environment information is information about the current application program and/or input box receiving the pinyin string; a second acquisition module 504, configured to obtain, according to preset configuration information, the personalized information for the input pinyin string, where the personalized information includes user habit information and history environment information, and the history environment information includes the environment information at the time of the user's previous Chinese character input; and a third acquisition module 506, configured to screen at least one syllable combination corresponding to the pinyin string according to the personalized information and the current environment information, thereby determining the word segmentation position in the pinyin string.
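To make the notion of "syllable combinations corresponding to a pinyin string" concrete, the sketch below enumerates every way to split a string into known syllables. The source does not say how the combinations are generated; the recursive enumeration, the function name `segmentations`, and the tiny syllable table are all assumptions for illustration.

```python
def segmentations(pinyin, syllables):
    """Return every way to split `pinyin` into a sequence of known syllables."""
    if not pinyin:
        return [[]]  # exactly one way to split the empty string
    results = []
    for end in range(1, len(pinyin) + 1):
        head = pinyin[:end]
        if head in syllables:
            # Keep this prefix and split the remainder recursively.
            for tail in segmentations(pinyin[end:], syllables):
                results.append([head] + tail)
    return results


# Hypothetical, deliberately tiny syllable table; a real one covers all of pinyin.
SYLLABLES = {"xi", "an", "xian"}

combos = ["'".join(parts) for parts in segmentations("xian", SYLLABLES)]
print(combos)  # ["xi'an", 'xian']
```

The screening performed by the third acquisition module then amounts to choosing among such combinations, and the chosen combination fixes where the string is segmented.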
Preferably, the personalized information further includes time information and/or location information, where the time information is the time at which the user previously made inputs and/or previously accessed the network, and the location information is the geographic location at which the user previously made inputs and/or previously accessed the network.
When screening the at least one syllable combination corresponding to the pinyin string according to the personalized information and the current environment information, the third acquisition module 506 screens the at least one syllable combination according to the user habit information and the current environment information, together with current time information and/or current location information.
Preferably, the third acquisition module 506 includes: a weight obtaining submodule 5062, configured to obtain the habit weight of each syllable combination according to the result of matching each syllable combination corresponding to the pinyin string against the user habit information; to obtain the environment weight of each syllable combination according to the result of matching the current environment information against the history environment information; and to obtain the time weight and/or place weight of each syllable combination according to the result of matching the current time information and/or current location information against the corresponding time information and/or location information; a summation submodule 5064, configured to sum the weights of each syllable combination to obtain its total feature weight; and a determination submodule 5066, configured to sort the at least one syllable combination by total feature weight and to determine the word segmentation position in the pinyin string according to the top-ranked syllable combination.
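The summation and sorting performed by submodules 5064 and 5066 can be sketched as follows. The function name `rank_combinations` and all weight values are hypothetical; only the "sum the per-feature weights, then sort by the total" logic comes from the text.

```python
def rank_combinations(weights_by_combo):
    """Sum the per-feature weights of each syllable combination and
    return (combination, total) pairs sorted from highest to lowest total."""
    totals = {combo: sum(weights.values())
              for combo, weights in weights_by_combo.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights for the two combinations of the pinyin string "xian".
weights = {
    "xian":  {"habit": 2, "environment": 1, "time": 1, "place": 0},
    "xi'an": {"habit": 8, "environment": 6, "time": 3, "place": 9},
}

ranking = rank_combinations(weights)
best_combo = ranking[0][0]  # the top-ranked combination decides the segmentation
```

With these illustrative numbers, "xi'an" totals 26 and "xian" totals 4, so the string would be segmented as xi | an.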
Preferably, when obtaining the personalized information for the input pinyin string according to the preset configuration information, the second acquisition module 504 determines, according to the preset configuration information, that the personalized information is to be obtained from the server. The user's personalized information on the server is generated as follows: historical behavior information from the user's network access is acquired, where the historical behavior information includes the content, time, geographic location, and input environment of the user's previous inputs, and/or the content, time, and geographic location of the user's previous network access; the historical behavior information is analyzed separately according to the preset interest model, environment model, time model, and location model; and the personalized information corresponding to the server is generated from the results of that analysis.
Preferably, the interest model includes an individual interest model and a group interest model. When obtaining the habit weight of each syllable combination according to the result of matching each syllable combination corresponding to the pinyin string against the user habit information, the weight obtaining submodule 5062: matches each syllable combination against the user habit information generated by the individual interest model; if the match fails, matches each syllable combination against the user habit information generated by the group interest model corresponding to the individual interest model; and obtains the habit weight of each syllable combination from the result of matching against the group interest model.
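The individual-then-group fallback can be sketched as below. Representing each model's output as a dictionary from syllable combination to habit weight, applying the fallback per combination, and returning 0.0 when neither model matches are all assumptions; the source only states the order in which the two models are consulted.

```python
def lookup_habit_weight(combo, individual_habits, group_habits):
    """Return the habit weight of `combo`, preferring the individual model
    and falling back to the group model when the individual match fails."""
    if combo in individual_habits:
        return individual_habits[combo]
    # Individual match failed: use habits of users with similar interests.
    return group_habits.get(combo, 0.0)


# Hypothetical model outputs: this user has no history for "xi'an" yet,
# but users with similar interests select it often.
individual = {"xian": 0.2}
group = {"xi'an": 0.7, "xian": 0.1}
```

The fallback lets a new user, or a user typing a string for the first time, still benefit from the habits of the group the individual interest model places them in.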
Preferably, the habit weight obtained by the weight obtaining submodule 5062 is calculated as follows: habit weight = (number of times the target syllable combination was selected ÷ total number of inputs of the multiple syllable combinations corresponding to the pinyin string) × N, where N is a weight coefficient and N is a natural number greater than 0.
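The habit weight formula translates directly into code. The function name, the default `n=1`, and the handling of a zero total are assumptions; the counts in the usage line are hypothetical.

```python
def habit_weight_from_counts(selected_count, total_count, n=1):
    """Habit weight = (times the target combination was selected
    / total inputs of all combinations for the pinyin string) * N."""
    if not isinstance(n, int) or n < 1:
        raise ValueError("N must be a natural number greater than 0")
    if total_count <= 0:
        return 0.0  # no input history yet; this fallback is an assumption
    return selected_count / total_count * n


# Hypothetical counts: the target combination was chosen 9 times
# out of 12 inputs of the pinyin string, with weight coefficient N = 4.
weight = habit_weight_from_counts(9, 12, n=4)  # 3.0
```

The coefficient N scales the habit feature relative to the environment, time, and place weights before the weights are summed.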
Preferably, the pinyin string processing device of this embodiment further includes: a fourth acquisition module 508, configured to obtain, after the third acquisition module 506 screens the at least one syllable combination corresponding to the pinyin string, the candidate words corresponding to each syllable combination obtained by the screening; a fifth acquisition module 510, configured to obtain the additional weight corresponding to each candidate word, where the additional weights are obtained from statistical analysis of all users' selections of each group of Chinese characters; and an output module 512, configured to sort the candidate words according to the additional weights and output them.
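The final ordering of candidate words by additional weight can be sketched as follows. The function name, the candidate words, and the weight values are hypothetical; treating a word without a recorded weight as having weight 0 is also an assumption.

```python
def rank_candidates(candidates, additional_weights):
    """Sort candidate words from highest to lowest additional weight.
    Words without a recorded weight are treated as having weight 0."""
    return sorted(candidates,
                  key=lambda word: additional_weights.get(word, 0),
                  reverse=True)


# Hypothetical candidates for the syllable combination "xi'an" and
# hypothetical weights derived from all users' selections.
candidates = ["西岸", "西安", "吸氨"]
additional_weights = {"西安": 950, "西岸": 40}

ordered = rank_candidates(candidates, additional_weights)
```

Because Python's sort is stable, candidates with equal weights keep their original relative order.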
Preferably, the additional weights that the output module 512 uses to sort the candidate words are obtained not only from statistical analysis of all users' selections of each group of Chinese characters but also from statistical analysis of the selected groups of Chinese characters according to the personalized information and the current environment information.
Preferably, the user habit information obtained by the second acquisition module 504 is derived from the number of times the current user has used each syllable combination and the time at which the user last used each syllable combination.
The pinyin string processing device of this embodiment is used to implement the corresponding pinyin string processing methods in the foregoing method embodiments and has the beneficial effects of the corresponding method embodiments, which are not repeated here.
The present invention provides a pinyin string processing scheme that can be widely applied to any device on which content can be entered through an input method, such as mobile phones and PCs. When the user inputs a pinyin string, the scheme screens the multiple syllable combinations corresponding to the pinyin string according to the user's personalized information, current environment information, current time information, and current location information, and preferentially offers the syllable combinations closest to what the user intends to input. This reduces the user's interactions during input, lowers the time cost of input, and meets the user's individual needs.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar the embodiments may be referred to one another. Because the device embodiment is basically similar to the method embodiments, its description is relatively brief; for related details, see the description of the method embodiments.
The pinyin string processing method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the invention.

Claims (11)

1. A pinyin string processing method, characterized by comprising:
receiving the input of a pinyin string and obtaining current environment information, wherein the current environment information is information about the current application program and/or input box receiving the pinyin string;
obtaining, according to preset configuration information, personalized information for the input pinyin string, wherein the personalized information comprises user habit information and history environment information, and the history environment information comprises the environment information at the time of the user's previous Chinese character input;
screening at least one syllable combination corresponding to the pinyin string according to the personalized information and the current environment information, thereby determining the word segmentation position in the pinyin string.
2. The method according to claim 1, characterized in that the personalized information further comprises time information and/or location information, wherein the time information is the time at which the user previously made inputs and/or previously accessed the network, and the location information is the geographic location at which the user previously made inputs and/or previously accessed the network;
when screening the at least one syllable combination corresponding to the pinyin string according to the personalized information and the current environment information, the at least one syllable combination corresponding to the pinyin string is screened according to the user habit information and the current environment information, together with current time information and/or current location information.
3. The method according to claim 2, characterized in that the step of screening the at least one syllable combination corresponding to the pinyin string according to the user habit information and the current environment information, thereby determining the word segmentation position in the pinyin string, comprises:
obtaining the habit weight of each syllable combination according to the result of matching each syllable combination corresponding to the pinyin string against the user habit information;
obtaining the environment weight of each syllable combination according to the result of matching the current environment information against the history environment information;
obtaining the time weight and/or place weight of each syllable combination according to the result of matching the current time information and/or the current location information against the corresponding time information and/or location information;
summing the weights of each syllable combination to obtain the total feature weight of each syllable combination, sorting the syllable combinations according to the total feature weights, and determining the word segmentation position in the pinyin string according to the top-ranked syllable combination.
4. The method according to claim 2 or 3, characterized in that the personalized information is obtained from a server and is generated as follows:
acquiring historical behavior information from the user's network access, wherein the historical behavior information comprises the content, time, geographic location, and input environment of the user's previous inputs, and/or the content, time, and geographic location of the user's previous network access;
analyzing the historical behavior information separately according to a preset interest model, environment model, time model, and location model;
generating the personalized information corresponding to the server according to the results of analyzing the historical behavior information.
5. The method according to claim 4, characterized in that the interest model comprises an individual interest model and a group interest model;
when obtaining the habit weight of each syllable combination according to the result of matching each syllable combination corresponding to the pinyin string against the user habit information: matching each syllable combination against the user habit information generated by the individual interest model;
if the match fails, matching each syllable combination against the user habit information generated by the group interest model corresponding to the individual interest model;
obtaining the habit weight of each syllable combination according to the result of matching each syllable combination against the group interest model.
6. The method according to claim 3 or 5, characterized in that the habit weight is obtained as follows:
(number of times the target syllable combination was selected ÷ total number of inputs of the multiple syllable combinations corresponding to the pinyin string) × N, wherein N is a weight coefficient and N is a natural number greater than 0.
7. The method according to claim 1, characterized in that, after screening the at least one syllable combination corresponding to the pinyin string, the method further comprises:
obtaining the candidate words corresponding to each syllable combination obtained by the screening;
obtaining the additional weight corresponding to each candidate word, wherein the additional weights are obtained from statistical analysis of all users' selections of each group of Chinese characters;
sorting the candidate words according to the additional weights and outputting them.
8. The method according to claim 7, characterized in that the additional weights, in addition to being obtained from statistical analysis of all users' selections of each group of Chinese characters, are also:
obtained from statistical analysis of the selected groups of Chinese characters according to the personalized information and the current environment information.
9. The method according to any one of claims 1, 2, 3, or 5, characterized in that the user habit information is derived from the number of times the current user has used each syllable combination and the time at which the user last used each syllable combination.
10. A pinyin string processing device, characterized by comprising:
a first acquisition module, configured to receive the input of a pinyin string and obtain current environment information, wherein the current environment information is information about the current application program and/or input box receiving the pinyin string;
a second acquisition module, configured to obtain, according to preset configuration information, personalized information for the input pinyin string, wherein the personalized information comprises user habit information and history environment information, and the history environment information comprises the environment information at the time of the user's previous Chinese character input;
a third acquisition module, configured to screen at least one syllable combination corresponding to the pinyin string according to the personalized information and the current environment information, thereby determining the word segmentation position in the pinyin string.
11. The device according to claim 10, characterized in that:
the personalized information further comprises time information and/or location information, wherein the time information is the time at which the user previously made inputs and/or previously accessed the network, and the location information is the geographic location at which the user previously made inputs and/or previously accessed the network;
when screening the at least one syllable combination corresponding to the pinyin string according to the personalized information and the current environment information, the third acquisition module screens the at least one syllable combination corresponding to the pinyin string according to the user habit information and the current environment information, together with current time information and/or current location information.
CN201310370370.XA 2013-08-22 2013-08-22 Pinyin string processing method and device Pending CN104423621A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310370370.XA CN104423621A (en) 2013-08-22 2013-08-22 Pinyin string processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310370370.XA CN104423621A (en) 2013-08-22 2013-08-22 Pinyin string processing method and device

Publications (1)

Publication Number Publication Date
CN104423621A true CN104423621A (en) 2015-03-18

Family

ID=52972879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310370370.XA Pending CN104423621A (en) 2013-08-22 2013-08-22 Pinyin string processing method and device

Country Status (1)

Country Link
CN (1) CN104423621A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101051323A (en) * 2007-05-22 2007-10-10 北京搜狗科技发展有限公司 Character input method, input method system and method for updating word stock
CN101770328A (en) * 2009-01-04 2010-07-07 英业达股份有限公司 Multiple-segmentation Chinese pinyin input system and method
CN101493812A (en) * 2009-03-06 2009-07-29 中国科学院软件研究所 Tone-character conversion method
CN102722483A (en) * 2011-03-29 2012-10-10 百度在线网络技术(北京)有限公司 Method, apparatus and equipment for determining candidate-item sequence of input method
CN103226393A (en) * 2013-04-12 2013-07-31 百度在线网络技术(北京)有限公司 Input method and equipment

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951099A (en) * 2015-06-16 2015-09-30 北京奇虎科技有限公司 Method and device for showing candidate items based on input method
CN104951099B (en) * 2015-06-16 2017-12-19 北京奇虎科技有限公司 A kind of method and apparatus of the displaying candidate item based on input method
JP2017027143A (en) * 2015-07-16 2017-02-02 富士ゼロックス株式会社 Display control device and program
CN105045412A (en) * 2015-08-28 2015-11-11 百度在线网络技术(北京)有限公司 Method and system for generating candidate item of input method
CN105549756A (en) * 2015-10-30 2016-05-04 东莞酷派软件技术有限公司 Input method based on position information, and user terminal
CN106774969A (en) * 2015-11-20 2017-05-31 北京搜狗科技发展有限公司 A kind of input method and device
CN106774969B (en) * 2015-11-20 2021-12-14 北京搜狗科技发展有限公司 Input method and device
US11106709B2 (en) 2015-12-02 2021-08-31 Beijing Sogou Technology Development Co., Ltd. Recommendation method and device, a device for formulating recommendations
CN106708282A (en) * 2015-12-02 2017-05-24 北京搜狗科技发展有限公司 Recommending method and device and device for recommending
WO2017092198A1 (en) * 2015-12-02 2017-06-08 北京搜狗科技发展有限公司 Recommendation method and device, and device for recommendation
CN106708282B (en) * 2015-12-02 2019-03-19 北京搜狗科技发展有限公司 A kind of recommended method and device, a kind of device for recommendation
CN106371624B (en) * 2016-09-23 2019-03-19 百度在线网络技术(北京)有限公司 It is a kind of for provide input candidate item method, apparatus and input equipment
CN106371624A (en) * 2016-09-23 2017-02-01 百度在线网络技术(北京)有限公司 Method and device for providing input candidate item
CN108073293A (en) * 2016-11-11 2018-05-25 北京搜狗科技发展有限公司 A kind of definite method and apparatus of target phrase
CN108073293B (en) * 2016-11-11 2022-01-14 北京搜狗科技发展有限公司 Method and device for determining target phrase
CN109901725A (en) * 2017-12-07 2019-06-18 北京搜狗科技发展有限公司 A kind of pinyin string cutting method and device
CN109901725B (en) * 2017-12-07 2022-05-06 北京搜狗科技发展有限公司 Pinyin string segmentation method and device
CN108629174A (en) * 2018-05-08 2018-10-09 阿里巴巴集团控股有限公司 The method and device of character string verification
CN109032375A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 Candidate text sort method, device, equipment and storage medium
CN109377980A (en) * 2018-08-31 2019-02-22 众安信息技术服务有限公司 A kind of syllable splitting method and apparatus
CN109377980B (en) * 2018-08-31 2022-06-07 众安信息技术服务有限公司 Syllable segmentation method and device
CN109376358A (en) * 2018-10-25 2019-02-22 陈逸天 A kind of word learning method, device and electronic equipment for borrowing history and combining experience into syllables
CN109376358B (en) * 2018-10-25 2021-07-16 陈逸天 Word learning method and device based on historical spelling experience and electronic equipment
CN110244857A (en) * 2019-05-24 2019-09-17 阿里巴巴集团控股有限公司 A kind of Chinese character string statistical method and system based on block chain
CN113707144A (en) * 2021-08-24 2021-11-26 深圳市衡泰信科技有限公司 Control method and system of golf simulator
CN113707144B (en) * 2021-08-24 2023-12-19 深圳市衡泰信科技有限公司 Control method and system of golf simulator

Similar Documents

Publication Publication Date Title
CN104423621A (en) Pinyin string processing method and device
CN110309427B (en) Object recommendation method and device and storage medium
CN110825957B (en) Deep learning-based information recommendation method, device, equipment and storage medium
CN102054003B (en) Methods and systems for recommending network information and creating network resource index
CN1764916B (en) Method and apparatus for frequency count
US20150278359A1 (en) Method and apparatus for generating a recommendation page
CN102999586B (en) A kind of method and apparatus of recommendation of websites
CN113536793A (en) Entity identification method, device, equipment and storage medium
CN112052387B (en) Content recommendation method, device and computer readable storage medium
CN102073699A (en) Method, device and equipment for improving search result based on user behaviors
CN111125528B (en) Information recommendation method and device
CN107729578B (en) Music recommendation method and device
JP5469046B2 (en) Information search apparatus, information search method, and information search program
CN108319628B (en) User interest determination method and device
CN111177559A (en) Text travel service recommendation method and device, electronic equipment and storage medium
CN102004772A (en) Method and equipment for sequencing search results according to terms
CN111723256A (en) Government affair user portrait construction method and system based on information resource library
CN104503988A (en) Searching method and device
CN111159563A (en) Method, device and equipment for determining user interest point information and storage medium
CN111552884A (en) Method and apparatus for content recommendation
CN103544150A (en) Method and system for providing recommendation information for mobile terminal browser
JP7200069B2 (en) Information processing device, vector generation method and program
CN106933380B (en) A kind of update method and device of dictionary
CN103955480A (en) Method and equipment for determining target object information corresponding to user
KR101122737B1 (en) Apparatus and method for establishing search database for knowledge node coupling structure

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150318