CN109901725B - Pinyin string segmentation method and device - Google Patents



Publication number
CN109901725B
Authority
CN
China
Prior art keywords
segmentation result
segmentation
input
condition
syllable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711284974.7A
Other languages
Chinese (zh)
Other versions
CN109901725A (en)
Inventor
姚波怀
张扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201711284974.7A
Publication of CN109901725A
Application granted
Publication of CN109901725B
Active
Anticipated expiration


Abstract

The embodiments of the application disclose a pinyin string segmentation method and device. When multiple segmentation results can be obtained from an input pinyin string, whether each segmentation result meets a rationality condition is judged according to the input interval between adjacent syllable segments in that result. A segmentation result that meets the rationality condition is not only segmented by syllable but also matches the characteristics of the input intervals. Candidate items are then determined directly from the segmentation results that meet the rationality condition, which reduces the number of displayed candidate items that are meaningless or unnecessary with respect to the user's input requirement, shortens the time the user spends selecting a candidate item, and improves the user's input experience.

Description

Pinyin string segmentation method and device
Technical Field
The application relates to the field of input methods, in particular to a pinyin string segmentation method and device.
Background
The input method refers to an encoding method adopted for inputting various symbols into a computer or other equipment (such as a mobile phone), and a user can conveniently input required characters into the electronic equipment by using the input method. For example, in a Chinese character input method, Chinese characters may be input into an electronic device by inputting a pinyin string.
To determine which characters correspond to a pinyin string input by the user, the input method needs to segment the pinyin string. Each segment generally corresponds to one syllable, and adjacent segments are separated by a separator. For example, if the input pinyin string is "wom", one segmentation result may be "wo'm", in which the syllables "wo" and "m" are separated by the separator "'".
However, the conventional method segments a pinyin string using syllables alone, and the same pinyin string can usually be segmented in several ways. When the pinyin string input by the user contains errors or is long, segmentation may yield many results, most of which are meaningless with respect to the user's input requirement. The candidate items displayed for those results crowd out the candidate items corresponding to valid segmentation results, which makes it harder for the user to select a candidate item, lengthens the time needed to select the required candidate item, and degrades the user's input experience.
Disclosure of Invention
To solve this technical problem, the application provides a pinyin string segmentation method that reduces the number of candidate items displayed for an input pinyin string that are meaningless or unnecessary with respect to the user's input requirement, thereby reducing the time the user spends selecting a candidate item.
The embodiments of the application disclose the following technical solutions:
In a first aspect, an embodiment of the present application provides a pinyin string segmentation method, where the method includes:
segmenting the obtained pinyin string to obtain a plurality of segmentation results, wherein each segmentation result comprises a plurality of syllable segments;
judging whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result;
and determining candidate items for the pinyin string according to the segmentation result that meets the rationality condition.
Optionally, the judging whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result includes:
judging whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result and the number of syllable segments.
Optionally, the judging whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result includes:
acquiring historical input interval data of the user who inputs the pinyin string;
and judging whether the segmentation result meets the rationality condition according to the historical input interval data and the input interval between adjacent syllable segments in the segmentation result.
Optionally, the judging whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result and the number of syllable segments includes:
acquiring historical input syllable data of the user who inputs the pinyin string;
and judging whether the segmentation result meets the rationality condition according to the number of historically input syllables, the input interval between adjacent syllable segments in the segmentation result, and the number of syllable segments.
Optionally, before the determining candidate items for the pinyin string according to the segmentation result that meets the rationality condition, the method further includes:
performing error correction on the syllable segments in the segmentation result that meets the rationality condition.
Optionally, if the segmentation results that meet the rationality condition include a first segmentation result and a second segmentation result, the determining candidate items for the pinyin string according to the segmentation result that meets the rationality condition includes:
sorting the candidate items for the first segmentation result according to the degree to which the first segmentation result satisfies the rationality condition;
sorting the candidate items for the second segmentation result according to the degree to which the second segmentation result satisfies the rationality condition;
and determining the candidate items for the pinyin string and their display order according to the sorting result for the first segmentation result and the sorting result for the second segmentation result.
In a second aspect, an embodiment of the present application provides a pinyin string segmentation device, where the device includes:
a segmentation module, configured to segment the obtained pinyin string to obtain a plurality of segmentation results, wherein each segmentation result comprises a plurality of syllable segments;
a judging module, configured to judge whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result;
and a determining module, configured to determine candidate items for the pinyin string according to the segmentation result that meets the rationality condition.
Optionally, the judging module includes:
a first judging unit, configured to judge whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result and the number of syllable segments.
Optionally, the judging module includes:
a historical input interval data acquisition unit, configured to acquire historical input interval data of the user who inputs the pinyin string;
and a second judging unit, configured to judge whether the segmentation result meets the rationality condition according to the historical input interval data and the input interval between adjacent syllable segments in the segmentation result.
Optionally, the first judging unit includes:
a historical input syllable data acquisition subunit, configured to acquire historical input syllable data of the user who inputs the pinyin string;
and a first judging subunit, configured to judge whether the segmentation result meets the rationality condition according to the number of historically input syllables, the input interval between adjacent syllable segments in the segmentation result, and the number of syllable segments.
Optionally, the device further includes:
an error correction module, configured to perform error correction on the syllable segments in the segmentation result that meets the rationality condition.
Optionally, if the segmentation results that meet the rationality condition include a first segmentation result and a second segmentation result, the determining module includes:
a first sorting module, configured to sort the candidate items for the first segmentation result according to the degree to which the first segmentation result satisfies the rationality condition;
a second sorting module, configured to sort the candidate items for the second segmentation result according to the degree to which the second segmentation result satisfies the rationality condition;
and a candidate item determining module, configured to determine the candidate items for the pinyin string and their display order according to the sorting result for the first segmentation result and the sorting result for the second segmentation result.
In a third aspect, an embodiment of the present application provides a processing apparatus for pinyin string segmentation, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for:
segmenting the obtained pinyin string to obtain a plurality of segmentation results, wherein each segmentation result comprises a plurality of syllable segments;
judging whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result;
and determining candidate items for the pinyin string according to the segmentation result that meets the rationality condition.
In a fourth aspect, embodiments of the present application provide a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the pinyin string segmentation method described in one or more implementations of the first aspect.
As can be seen from the above technical solutions, when multiple segmentation results can be obtained from an input pinyin string, whether each segmentation result meets the rationality condition can be judged according to the input interval between adjacent syllable segments in that result. A segmentation result that meets the rationality condition is not only segmented by syllable but also matches the characteristics of the input intervals, so segmentation results that are segmented by syllable but whose input intervals between syllable segments are too small are eliminated. When candidate items are determined, the eliminated segmentation results need not be considered, and candidate items are determined only from the segmentation results that meet the rationality condition. This reduces the number of displayed candidate items that are meaningless or unnecessary with respect to the user's input requirement, which reduces the time the user spends selecting a candidate item and improves the user's input experience.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a pinyin string segmentation method provided in an embodiment of the present application;
Fig. 2 is a flowchart of a method for determining pinyin string candidate items and a display order according to an embodiment of the present application;
Fig. 3 is a block diagram of a pinyin string segmentation device according to an embodiment of the present application;
Fig. 4 is a block diagram of an apparatus for pinyin string segmentation according to an embodiment of the present application;
Fig. 5 is a block diagram of a server for pinyin string segmentation according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
When using an input method, a user usually inputs Chinese characters into the electronic device by inputting a pinyin string. To determine the Chinese characters corresponding to the pinyin string, the input method generally segments the pinyin string in units of syllables. For example, if the user inputs the pinyin string "women", the input method segments it into the two syllables "wo" and "men" and then displays the Chinese characters corresponding to the syllables as candidate items for the user.
However, when the pinyin string input by the user contains an error or is long, segmenting it by syllable alone yields multiple segmentation results, most of which may be meaningless to the user. The candidate items corresponding to these results crowd out the candidate items corresponding to valid segmentation results, so the user cannot quickly find the option that meets the input requirement when selecting a candidate item.
For example, the user wants to input the word for "us", whose pinyin is "women", but mistypes it as "womwn". The input method first segments "womwn" by syllable into "wo", "m", "w", and "n", and displays the Chinese characters corresponding to these syllables as candidate items at the front of the candidate area. It also segments "womwn" into "wo" and "mwn", corrects "mwn" to "men", and displays the Chinese characters corresponding to the syllables "wo" and "men" as candidate items toward the back of the candidate area. As a result, the user cannot quickly find the option that meets the input requirement in the candidate area, and the user experience is poor.
In order to solve the problems in the prior art, the application provides a pinyin string segmentation method, wherein when the pinyin string is segmented, whether each segmentation result is reasonable or not is judged according to the input interval between adjacent syllable segments in the segmentation result, and then candidate items are determined only according to the reasonable segmentation result.
Specifically, the obtained pinyin string is segmented to obtain segmentation results containing a plurality of syllable segments, whether each segmentation result meets the rationality condition or not is judged according to the input interval between adjacent syllable segments in each segmentation result, segmentation results which do not meet the rationality condition are eliminated, and candidate items of the obtained pinyin string are determined according to the segmentation results which meet the rationality condition.
The pinyin string segmentation method provided by the application judges whether each segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in that result. A segmentation result that meets the rationality condition is not only segmented by syllable but also matches the characteristics of the input intervals, and segmentation results that are segmented by syllable but whose input intervals between syllable segments are too small are eliminated. When candidate items are determined, the eliminated segmentation results need not be considered, and candidate items are determined only from the segmentation results that meet the rationality condition. Accordingly, the number of displayed candidate items that are meaningless or unnecessary with respect to the user's input requirement is reduced, which reduces the time the user spends selecting a candidate item and improves the user's input experience.
Example one
Referring to fig. 1, a flowchart of a pinyin string segmentation method provided in this embodiment is shown, where the method includes:
Step 101: segment the obtained pinyin string to obtain a plurality of segmentation results, wherein each segmentation result comprises a plurality of syllable segments.
The pinyin string consists of all the pinyin letters the user types in one input. The obtained pinyin string is segmented on the basis of syllables to obtain a plurality of segmentation results, each of which comprises a plurality of syllable segments.
For example, a user inputs the pinyin string "women" in one input. Based on syllables, the pinyin string can be segmented into a plurality of segmentation results, one of which includes the syllable segments "wo" and "men", and another of which includes the syllable segments "wo", "me", and "n".
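The enumeration in step 101 can be sketched as follows. This is a minimal illustration and not the patented implementation: the `SYLLABLES` table and the `segment` function are hypothetical names, and the table is deliberately reduced to the few segments needed for the "women" example.

```python
from typing import List

# Hypothetical, deliberately tiny syllable table. A real input method would
# use the full set of valid pinyin syllables and permitted abbreviations.
SYLLABLES = {"wo", "me", "men", "n"}


def segment(pinyin: str) -> List[List[str]]:
    """Return every way to split `pinyin` into segments found in SYLLABLES."""
    if not pinyin:
        return [[]]  # the empty string has exactly one segmentation: no segments
    results = []
    for end in range(1, len(pinyin) + 1):
        head = pinyin[:end]
        if head in SYLLABLES:
            # Keep this head and segment the remainder recursively.
            for tail in segment(pinyin[end:]):
                results.append([head] + tail)
    return results


segmentations = segment("women")
```

With this toy table, `segmentations` contains the two results named in the example: "wo" + "men" and "wo" + "me" + "n".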
Step 102: judge whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result.
Step 103: determine candidate items for the pinyin string according to the segmentation result that meets the rationality condition.
When a user types, the input interval between adjacent syllables may differ from the input interval between adjacent letters within a syllable, owing to habits of thought.
Therefore, the input interval between each pair of adjacent syllable segments in each segmentation result is obtained, and it is judged whether these intervals match the interval the user leaves between adjacent syllables. If they match, the syllable segments in the segmentation result may be the syllables the user input. If the input interval between syllable segments is shorter than the interval the user leaves between adjacent syllables, the segmentation has split two adjacent letters of the same syllable, and the resulting syllable segments are not the syllables the user input.
The rationality condition serves as the basis for judging, from the input intervals between syllable segments in each segmentation result, whether those syllable segments may be the syllables the user input. If a segmentation result meets the rationality condition, its syllable segments may be the syllables input by the user; if it does not, its syllable segments are unlikely to be the syllables input by the user.
In a specific implementation, the average of the input intervals between adjacent syllable segments in a segmentation result can be computed, and it is judged whether this average meets the rationality condition, where the rationality condition is a condition on the average input interval between adjacent syllable segments, set according to actual circumstances.
For example, the rationality condition is set such that the average of the input intervals between adjacent syllable segments in a segmentation result must be greater than or equal to 0.5 s. The pinyin string "women" is segmented into two segmentation results: the first is "wo" and "men", and the second is "wo", "me", and "n". In the first segmentation result, the input interval between the syllable segments "wo" and "men" is 0.55 s, so the average input interval is also 0.55 s, which meets the rationality condition; the syllable segments in the first segmentation result may therefore be the syllables input by the user. In the second segmentation result, the input interval between "wo" and "me" is 0.55 s and the input interval between "me" and "n" is 0.2 s, so the average input interval is (0.55 s + 0.2 s) / 2 = 0.375 s, which does not meet the rationality condition; the syllable segments in the second segmentation result are therefore unlikely to be the syllables input by the user.
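The average-interval check in this example can be sketched as below. The per-letter timestamps in `key_times`, the function names, and the treatment of a single-segment result are assumptions made for illustration; only the 0.5 s threshold and the 0.55 s and 0.2 s intervals come from the example above.

```python
from typing import List


def boundary_intervals(segments: List[str], key_times: List[float]) -> List[float]:
    """Input interval at each boundary between adjacent syllable segments.

    key_times[i] is the (hypothetical) time in seconds at which the i-th
    letter of the pinyin string was typed.
    """
    intervals = []
    pos = 0
    for seg in segments[:-1]:
        pos += len(seg)  # index of the first letter of the next segment
        intervals.append(key_times[pos] - key_times[pos - 1])
    return intervals


def meets_rationality(segments: List[str], key_times: List[float],
                      min_avg: float = 0.5) -> bool:
    """True if the average boundary interval is at least `min_avg` seconds."""
    gaps = boundary_intervals(segments, key_times)
    if not gaps:
        return True  # assumption: a single segment has no boundary to check
    return sum(gaps) / len(gaps) >= min_avg


# Timestamps for w, o, m, e, n chosen so that the gap before "m" is 0.55 s
# and the gap before "n" is 0.2 s, as in the example.
times = [0.0, 0.15, 0.70, 0.85, 1.05]
first_ok = meets_rationality(["wo", "men"], times)
second_ok = meets_rationality(["wo", "me", "n"], times)
```

Here the first segmentation result passes the check and the second does not, matching the averages of 0.55 s and 0.375 s worked out above.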
Of course, the input intervals between adjacent syllable segments in each segmentation result may also be processed in other ways, and it is then judged whether the processing result for each segmentation result meets the rationality condition; in that case the rationality condition is a condition set for the corresponding processing mode. No limitation is imposed here.
Candidate items for the obtained pinyin string are determined from the segmentation results that meet the rationality condition, without considering the segmentation results that do not. Specifically, the Chinese characters corresponding to each syllable segment in a segmentation result can be determined, it can be judged whether those characters form words, and the character combinations that form words can be displayed as candidate items. Of course, candidate items may also be determined from the segmentation results that meet the rationality condition in other ways, and no limitation is imposed here.
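One way to picture the word-forming check is a lookup from a syllable sequence to known words. The sketch below is only an assumption about how such a lookup might be organized: `LEXICON` and `candidates_for` are hypothetical, and the lexicon holds just two entries.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon mapping a syllable sequence to words it can spell.
LEXICON: Dict[Tuple[str, ...], List[str]] = {
    ("wo", "men"): ["我们"],
    ("wo",): ["我"],
}


def candidates_for(segmentations: List[List[str]]) -> List[str]:
    """Collect words for segmentation results that passed the rationality check.

    Segmentations whose syllable sequence forms no known word yield nothing.
    """
    found: List[str] = []
    for segments in segmentations:
        for word in LEXICON.get(tuple(segments), []):
            if word not in found:  # avoid showing the same candidate twice
                found.append(word)
    return found


shown = candidates_for([["wo", "men"]])
```

Because only results that meet the rationality condition are passed in, a sequence such as "wo", "me", "n" never reaches the lookup and contributes no candidate items.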
This pinyin string segmentation method judges whether each segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in that result. A segmentation result that meets the rationality condition is not only segmented by syllable but also matches the characteristics of the input intervals, and segmentation results that are segmented by syllable but whose input intervals between syllable segments are too small are eliminated. When candidate items are determined, the eliminated segmentation results need not be considered, and candidate items are determined only from the segmentation results that meet the rationality condition. Accordingly, the number of displayed candidate items that are meaningless or unnecessary with respect to the user's input requirement is reduced, which reduces the time the user spends selecting a candidate item and improves the user's input experience.
For step 102, when judging whether a segmentation result meets the rationality condition according to the input interval between adjacent syllable segments, the number of syllable segments can also be taken into account to select qualifying segmentation results more accurately. That is, whether the segmentation result meets the rationality condition is judged according to both the input interval between adjacent syllable segments and the number of syllable segments in the segmentation result.
When a pinyin string is segmented by syllable, a plurality of segmentation results is obtained, each comprising a plurality of syllable segments. When judging whether each segmentation result meets the rationality condition, the input interval between adjacent syllable segments must be considered, and the number of syllable segments in the result can be considered as well. When the number of syllable segments in a segmentation result falls outside the range of syllable counts normally input, the segmentation result may not be the one the user needs, and its syllable segments may not be the syllables input by the user.
For example, a segmentation result obtained contains 50 syllable segments, and a user generally does not input 50 syllables in one input, so that the segmentation result containing 50 syllable segments is considered not to be the segmentation result required by the user.
Therefore, the input interval between adjacent syllable segments in the segmentation result is combined with the number of syllable segments in the segmentation result, whether each segmentation result meets the rationality condition is judged, and the segmentation result required by a user can be further screened out.
An optional method provided in this embodiment is described below, and the method may determine whether the segmentation result satisfies the rationality condition according to the input interval between each adjacent syllable segment in the segmentation result and the number of syllable segments:
A first function is set whose variable is the average of the input intervals between adjacent syllable segments in the segmentation result. The closer this average is to the interval the user leaves between adjacent syllables when typing, the larger the first function value for the segmentation result; the further the average is from that interval, the smaller the first function value.
A second function is set whose variable is the number of syllable segments in the segmentation result. If the number of syllable segments is within the range of syllable counts normally input, the second function value for the segmentation result is larger; if the number is outside that range, or far from it, the second function value is smaller.
The rationality condition is set as follows: the sum of the first function value and the second function value for the segmentation result is greater than or equal to a preset value. Accordingly, the first and second function values for each segmentation result are added to obtain the sum for that result, and it is judged whether the sum is greater than or equal to the preset value. If so, the segmentation result meets the rationality condition; otherwise, it does not.
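The two-function scheme above can be sketched as follows. The text does not specify the shape of either function, so the linear forms, the typical interval of 0.5 s, the normal range of 1 to 10 syllables, and the preset value of 1.5 are all assumptions chosen only to make the sketch concrete.

```python
def interval_score(avg_interval: float, typical_interval: float = 0.5) -> float:
    """First function: larger when the average interval is near the typical one.

    The linear fall-off is an assumed shape, not taken from the text.
    """
    return max(0.0, 1.0 - abs(avg_interval - typical_interval) / typical_interval)


def count_score(n_segments: int, normal_range=(1, 10)) -> float:
    """Second function: 1.0 inside the normal syllable-count range, lower outside.

    The range (1, 10) and the linear penalty are assumptions for illustration.
    """
    low, high = normal_range
    if low <= n_segments <= high:
        return 1.0
    distance = low - n_segments if n_segments < low else n_segments - high
    return max(0.0, 1.0 - distance / high)


def meets_rationality(avg_interval: float, n_segments: int,
                      preset: float = 1.5) -> bool:
    """Rationality condition: the sum of both function values reaches `preset`."""
    return interval_score(avg_interval) + count_score(n_segments) >= preset


plausible = meets_rationality(0.55, 2)      # typical interval, normal count
too_many = meets_rationality(0.55, 50)      # normal interval, 50 syllable segments
too_fast = meets_rationality(0.1, 3)        # normal count, very short intervals
```

Under these assumed settings, a result with a typical interval and two segments qualifies, while a 50-segment result or one with very short boundary intervals does not.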
Of course, other ways may also be adopted to combine the input interval between each adjacent syllable segment in the segmentation result with the number of syllable segments in the segmentation result, and determine whether the segmentation result meets the rationality condition, which is not limited herein.
Judging whether each segmentation result meets the rationality condition from both the input interval between adjacent syllable segments and the number of syllable segments further improves the accuracy of screening. Segmentation results whose input intervals satisfy the condition but whose number of syllable segments is unreasonable are eliminated as well, which further reduces the candidate items that are meaningless to the user.
Because input interval habits differ from user to user, the user's own input interval habits can also be taken into account when judging whether each segmentation result meets the rationality condition.
Specifically, historical input interval data of the user who inputs the pinyin string may be obtained first. The historical input interval data refers to the input interval between adjacent syllables when the user types with the input method. For example, if a user typically leaves an interval of 0.3 s between two adjacent syllables when typing, the historical input interval data of that user is 0.3 s. The historical input interval data may differ between users.
Specifically, when historical input interval data of a user is acquired, an identifier of an input device or a current login account of an input method can be acquired, accordingly, the user inputting the pinyin string is determined according to the identifier of the input device or the current login account of the input method, and then the historical input interval data corresponding to the user is acquired. Of course, other methods may be used to obtain the historical input interval data of the user, and are not limited herein.
Whether the segmentation result meets the rationality condition is then judged according to the historical input interval data and the input interval between the adjacent syllable segments in the segmentation result.
When different users input pinyin strings, their input interval habits between adjacent syllable segments may differ. If the same rationality condition is set for all users and used as the basis for judging every segmentation result, the selected segmentation results may be inaccurate because of these differing habits, and accordingly the candidate items determined from the selected segmentation results may not be the ones the user needs.
For example, suppose the historical input interval data obtained for a certain user is 0.3 s, while the system sets the same rationality condition for all users, namely that the average value of the input intervals between adjacent syllable segments in the segmentation result is greater than or equal to 0.5 s. According to this user's input habit, the input interval between adjacent syllable segments in normal input is 0.3 s, which is shorter than the 0.5 s in the rationality condition. If 0.5 s is used as the judgment basis, the segmentation result containing the syllables the user actually input is judged not to satisfy the rationality condition because its input intervals are too short. No candidate items are then determined from that segmentation result, so the candidate item the user needs cannot be obtained.
In order to prevent the above phenomenon, the obtained historical input interval data of the user and the input interval between each adjacent syllable segment in the segmentation result may be combined, and then whether each segmentation result satisfies the rationality condition may be determined.
It should be noted that, in some cases, the reference value of the collected user input interval data is not high, and the input interval habit of the user may not be accurately reflected according to the historical input interval data determined according to the user input interval data in the case. For example, when a user uses an input method to input data while walking, input interval data may be long, or when the user slows down or interrupts inputting due to external influences in the input process, the obtained input interval data is long, and the historical input interval data determined by using the input interval data in these cases cannot accurately reflect the input interval habit of the user. Therefore, when the historical input interval data of the user is obtained, the historical input interval data of the user needs to be screened, unreasonable input interval data in the historical input interval data are filtered, and the historical input interval data of the user is determined according to the remaining reasonable input interval data.
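The embodiment does not tie this screening step to a particular filter. A minimal sketch in Python, assuming (hypothetically) that a sample counts as unreasonable when it exceeds a fixed multiple of the median of the collected samples, and that the user's historical input interval is the mean of the remaining samples. The function name, the `max_ratio` parameter and the sample values are illustrative and are not taken from the embodiment:

```python
from statistics import mean, median


def historical_interval(samples, max_ratio=3.0):
    """Estimate a user's habitual inter-syllable interval in seconds.

    Samples longer than max_ratio times the median are assumed to come from
    interrupted or distracted input and are dropped. This cutoff rule is an
    illustrative assumption, not the rule used by the embodiment.
    """
    if not samples:
        raise ValueError("no interval samples collected")
    cutoff = max_ratio * median(samples)
    kept = [s for s in samples if s <= cutoff]
    return mean(kept)


# Mostly about 0.3 s, plus two long pauses (for example, typing while walking).
samples = [0.28, 0.31, 0.30, 0.29, 0.32, 2.5, 4.0]
habit = historical_interval(samples)
```

With these sample values the two long pauses are discarded and the estimate stays close to 0.3 s. A median-based cutoff is used here only because it is not dragged upward by the very pauses it is meant to remove.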
Two optional methods provided by this embodiment to determine whether the segmentation result satisfies the rationality condition are described below:
In the first method, the historical input interval data of the user inputting the pinyin string is obtained, and a rationality condition is set for the user according to that data, or a preset rationality condition is adjusted, so as to obtain a rationality condition that conforms to the input habit of the user. This rationality condition is then used as the judgment standard: the input interval between adjacent syllable segments in each segmentation result is evaluated against it to judge whether each segmentation result meets the rationality condition related to the user's input habit. Candidate items are then determined according to the segmentation results meeting the rationality condition.
For example, the historical input interval data of a certain user is obtained to be 0.3 s, that is, when the user inputs a pinyin string, the input interval between adjacent syllable segments is generally 0.3 s. Accordingly, a rationality condition may be set for the user such that the average value of the input intervals between adjacent syllable segments in the segmentation result is greater than or equal to 0.3 s. The average value of the input intervals between adjacent syllable segments is calculated for each segmentation result, the segmentation results whose average input interval is less than 0.3 s are eliminated, and the candidate items are determined only according to the segmentation results whose average input interval is greater than or equal to 0.3 s.
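The first method can be sketched as follows. This is an illustration under stated assumptions rather than the embodiment's implementation: the input interval at a segment boundary is assumed to be the time gap between the last keystroke of one syllable segment and the first keystroke of the next, and the keystroke timings, the pinyin string and the function names are hypothetical.

```python
def boundary_intervals(segments, key_gaps):
    """Return the key gaps that fall on syllable-segment boundaries.

    key_gaps[i] is the time in seconds between keystroke i and keystroke i + 1.
    Assumes the segmentation result has at least two syllable segments.
    """
    intervals = []
    pos = 0
    for seg in segments[:-1]:
        pos += len(seg)
        intervals.append(key_gaps[pos - 1])
    return intervals


def meets_user_condition(segments, key_gaps, user_interval):
    """First method: the threshold is the user's own historical interval."""
    intervals = boundary_intervals(segments, key_gaps)
    return sum(intervals) / len(intervals) >= user_interval


# Hypothetical timing for the keystrokes s-o-u-g-o-u.
key_gaps = [0.08, 0.07, 0.35, 0.06, 0.09]
user_interval = 0.3  # historical input interval data of this user, in seconds

results = [("sou", "gou"), ("s", "ou", "g", "ou")]
kept = [r for r in results if meets_user_condition(r, key_gaps, user_interval)]
```

Here only the two-segment result is kept: its single boundary interval is 0.35 s, while the four-segment result averages well below 0.3 s.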
In the second method, after historical input interval data of a user inputting a pinyin string is acquired, the input interval between adjacent syllable segments is adjusted according to the difference between the historical input interval data and a preset rationality condition, and whether the input interval between the adjacent syllable segments in the adjusted segmentation result meets the rationality condition is further judged.
For example, the historical input interval data of a certain user is obtained as 0.3 s, and the preset rationality condition is that the average value of the input intervals between adjacent syllable segments in the segmentation result is greater than or equal to 0.5 s. According to the input habit of the user, the input interval between adjacent syllable segments during normal input is 0.3 s, which is shorter than the 0.5 s in the rationality condition. The input intervals in the obtained segmentation results can therefore be adjusted according to the difference between the user's historical input interval data and the value in the rationality condition: the average value of the input intervals between adjacent syllable segments in each obtained segmentation result is increased by 0.2 s, and whether each segmentation result meets the rationality condition is judged according to the adjusted average value.
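A minimal sketch of the second method, in which the preset rationality condition stays fixed and the measured average is shifted by the difference between the preset value and the user's historical input interval. The function name and the interval values are hypothetical; only the 0.5 s preset and the 0.3 s habit come from the example.

```python
PRESET_MIN_MEAN = 0.5  # preset rationality condition from the example, in seconds


def meets_preset_after_adjustment(intervals, user_interval, preset=PRESET_MIN_MEAN):
    """Second method: shift the measured average, then test the preset condition."""
    offset = preset - user_interval  # 0.5 - 0.3 = 0.2 in the example
    adjusted_mean = sum(intervals) / len(intervals) + offset
    return adjusted_mean >= preset


user_interval = 0.3
# Hypothetical boundary intervals for two segmentation results of one pinyin string.
plausible = meets_preset_after_adjustment([0.35], user_interval)
too_fast = meets_preset_after_adjustment([0.08, 0.35, 0.06], user_interval)
```

For a uniform shift like this one, the outcome is the same as comparing the unadjusted average with the user's own 0.3 s, so the two methods agree in this simple case; they can differ if the adjustment is not a plain offset.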
In addition, when judging whether the segmentation result meets the rationality condition, the number of syllable segments in the segmentation result can be further considered, namely, whether the segmentation result meets the rationality condition is judged according to the input interval between every two adjacent syllable segments in the segmentation result, the historical input interval data of the user and the number of syllable segments in the segmentation result, and the accuracy of the screened segmentation result meeting the rationality condition is further improved.
By taking the input interval habit of the user into account, the syllable segments contained in the screened segmentation results meeting the rationality condition are more likely to be the syllables the user actually input, and accordingly the candidate items determined from those segmentation results are more likely to be the candidate items the user needs.
Further, in some cases the reference value of the user's input interval data may not be high. For example, when the user is doing something else while using the input method, the input interval may differ from the user's ordinary input interval: a user who inputs while walking, or whose input is slowed or interrupted by outside influences, produces intervals that differ from those of normal input. In such cases, if the segmentation results satisfying the rationality condition are screened only according to the input interval between the syllable segments, the candidate items determined from them may not be the ones the user wants. Therefore, the user's habit regarding the number of input syllables can also be obtained, and whether each segmentation result meets the rationality condition can be judged in combination with that habit.
Specifically, historical input syllable data of the user who inputs the pinyin string may be obtained first. The historical input syllable data refers to the number of syllables the user frequently inputs in one input. For example, if a user frequently inputs 2 syllables in one input, the historical input syllable data of that user is 2. The historical input syllable data may differ between users.
Specifically, when historical input syllable data of a user is acquired, an identifier of an input device or a current login account of an input method can be acquired, accordingly, the user inputting the pinyin string is determined according to the identifier of the input device or the current login account of the input method, and the historical input syllable data corresponding to the user is acquired. Of course, other methods may be used to obtain the user's historical input syllable data, and is not limited herein.
Whether the segmentation result meets the rationality condition is then judged according to the historical input syllable data, the input interval between adjacent syllable segments in the segmentation result, and the number of syllable segments.
Specifically, the number of syllable segments contained in each segmentation result is obtained, and the degree to which that number matches the user's habitual number of input syllables is judged according to the user's historical input syllable data. The degree to which the input intervals between adjacent syllable segments in the segmentation result match the intervals between syllables in normal input is also judged. The two degrees of match are then combined to judge whether the segmentation result meets the rationality condition.
An optional method for determining whether the segmentation result meets the rationality condition provided by this embodiment is described as follows:
A third function is set in combination with the historical input syllable data of the user. The third function takes the number of syllable segments in a segmentation result as its variable, and different numbers of syllable segments correspond to different third function values: the closer the number of syllable segments contained in the segmentation result is to the historical input syllable data of the user, the larger the third function value corresponding to the segmentation result; conversely, the more the number of syllable segments differs from the user's historical number of input syllables, the smaller the third function value.
A fourth function is set, which takes the input interval between adjacent syllable segments in the segmentation result as its variable. The input intervals between adjacent syllable segments in the segmentation result are substituted into the function: the more closely they conform to the input interval between adjacent syllables in normal input, the larger the fourth function value corresponding to the segmentation result; conversely, the more they differ from the input interval between adjacent syllables in normal input, the smaller the fourth function value.
The rationality condition is set such that the sum of the third function value and the fourth function value is greater than or equal to a preset rationality condition value. If the sum of the third function value and the fourth function value corresponding to a segmentation result is greater than or equal to the preset rationality condition value, the segmentation result meets the rationality condition; otherwise, it does not.
Of course, other ways may also be adopted, and whether the segmentation result meets the rationality condition is determined according to the number of the historical input syllables, the input interval between each adjacent syllable segment in the segmentation result, and the number of syllable segments, which is not limited herein.
For ease of understanding, the above-described method is exemplified below:
The historical input syllable data of a certain user is obtained to be 2, that is, the user generally inputs two syllables in one input. A pinyin string "sougou" input by the user is segmented to obtain two segmentation results: the first contains two syllable segments, "sou" and "gou", and the second contains four syllable segments, "s", "ou", "g" and "ou". A third function g(x) is set, where x represents the number of syllable segments in a segmentation result. Because the number of syllable segments in the first segmentation result is the same as the user's historical number of input syllables, the corresponding value g(x1) is relatively large, 200; because the number of syllable segments in the second segmentation result differs considerably from the user's historical number of input syllables, the corresponding value g(x2) is relatively small, 50.
A fourth function f(y) is set, taking the input interval between adjacent syllable segments in the segmentation result as its variable, where y represents the average value of the input intervals between adjacent syllable segments in the segmentation result. Substituting the average input interval of the first segmentation result into f(y) gives f(y1) = 150, and substituting the average input interval of the second segmentation result into f(y) gives f(y2) = 70.
The preset rationality condition is that the sum of the two function values is greater than or equal to 300. For the first segmentation result the sum is 200 + 150 = 350, so the rationality condition is satisfied; for the second segmentation result the sum is only 50 + 70 = 120, so the rationality condition is not satisfied.
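The example gives only the function values, not the shapes of g and f. The following sketch assumes simple piecewise-linear shapes chosen so that they reproduce the values 200, 50, 150 and 70; the shapes, the coefficients and the average intervals of 0.3 s and 0.1 s are hypothetical and serve only to make the threshold test concrete.

```python
THRESHOLD = 300  # preset rationality condition value from the example


def g(segment_count, habitual_count=2):
    """Third function (hypothetical shape): largest when the count matches the habit."""
    return max(0.0, 200.0 - 75.0 * abs(segment_count - habitual_count))


def f(mean_interval, normal_interval=0.3):
    """Fourth function (hypothetical shape): largest at the normal interval, in seconds."""
    return max(0.0, 150.0 - 400.0 * abs(mean_interval - normal_interval))


def meets_condition(segment_count, mean_interval):
    """Rationality condition: sum of the two function values reaches the threshold."""
    return g(segment_count) + f(mean_interval) >= THRESHOLD


first = meets_condition(2, 0.3)   # "sou" / "gou": about 200 + 150
second = meets_condition(4, 0.1)  # "s" / "ou" / "g" / "ou": about 50 + 70
```

Any pair of functions that decreases as its argument moves away from the user's habit would fit the description; the linear form is used here only for brevity.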
This method combines the user's habit regarding the number of input syllables with the input interval between adjacent syllable segments in the segmentation result to judge whether each segmentation result meets the rationality condition, which further improves the accuracy of segmentation result screening. Segmentation results that do not conform to the user's input habits are eliminated, or the candidate items corresponding to them are placed toward the back of the display area, so that they do not act as interference items when the user selects the required candidate item.
In addition, before determining candidate items aiming at the pinyin strings according to the segmentation results meeting the reasonableness conditions, error correction processing can be carried out on syllable segments in the segmentation results meeting the reasonableness conditions.
Specifically, if wrongly input pinyin exists in a segmentation result meeting the rationality condition, the syllable segments in that segmentation result can be error-corrected. For example, the segmentation result of the pinyin string "womwn" that meets the rationality condition includes the syllable segments "wo" and "mwn"; "mwn" is corrected to the valid syllable "men", and candidate items are then determined according to the syllable segments "wo" and "men" in the corrected segmentation result.
During error correction, the segmentation result which does not meet the rationality condition does not need to be considered, and only the segmentation result which meets the rationality condition is corrected, namely, the meaningless segmentation result does not need to be corrected, so that the error correction workload of the system is reduced, and the error correction efficiency is improved.
Generally, after the segmentation results of a pinyin string have been judged against the rationality condition, several segmentation results may meet the rationality condition. In this case, the candidate items corresponding to these segmentation results can be sorted so that the user can quickly select the required candidate item.
Example two
Referring to fig. 2, a flowchart of the method for determining a pinyin string candidate item and a display order provided in this embodiment is provided, and this embodiment introduces the method by taking the existence of two segmentation results meeting the reasonableness condition as an example.
Step 201: sort the candidate items for the first segmentation result according to the degree to which the first segmentation result satisfies the rationality condition.
Step 202: sort the candidate items for the second segmentation result according to the degree to which the second segmentation result satisfies the rationality condition.
The degree to which each segmentation result satisfies the rationality condition is obtained. A higher degree of satisfaction indicates that the syllables in the segmentation result are more likely to be the syllables input by the user; conversely, a lower degree of satisfaction indicates that this is relatively less likely.
When determining the candidate items according to the segmentation result meeting the rationality condition, a plurality of candidate items may be determined for the same segmentation result, and at this time, the candidate items corresponding to the same segmentation result may be sorted according to other functions in the input method. Specifically, the candidate items can be sorted according to the word forming habits of the user, and the candidate items which are more in line with the word forming habits of the user are arranged in front.
It should be noted that step 201 and step 202 are two parallel steps, and the execution sequence is not sequential, and step 201 may be executed first, and then step 202 is executed, or step 202 may be executed first, and then step 201 may be executed, or step 201 and step 202 may be executed simultaneously, which is not limited herein.
Step 203: determine the candidate items and the display order for the pinyin string according to the sorting result for the first segmentation result and the sorting result for the second segmentation result.
The sorting results for the candidate items corresponding to each segmentation result are integrated to determine the candidate items and the display order for the pinyin string input by the user. In a specific implementation, the degree to which each segmentation result satisfies the rationality condition and the influence of other functions of the input method on the candidate items corresponding to each segmentation result are considered together, and the candidate items to be displayed in the display area, as well as their display order, are selected from the candidate items corresponding to the segmentation results meeting the rationality condition.
An optional specific implementation for determining the pinyin string candidate items and their display order provided by this embodiment is described below:
According to the degree to which each segmentation result satisfies the rationality condition, a first score, Score1, is assigned to the candidate items corresponding to that segmentation result; different candidate items of the same segmentation result have the same Score1. In combination with other functions in the input method, a second score, Score2, is assigned to each candidate item of the same segmentation result. Weights w1 and w2 are set for Score1 and Score2 respectively, and the two scores of each candidate item are combined by linear weighting to obtain the total score of the candidate item, namely Score = w1 × Score1 + w2 × Score2. The candidate items for the same segmentation result are then sorted according to their total scores. The weights w1 and w2 can be set according to historical experience.
The total Score of the candidate items corresponding to the segmentation result satisfying the rationality condition is obtained, and then the display order is set for all the candidate items according to the total Score of all the candidate items, specifically, the display order is set to be earlier for the candidate item with higher total Score, and the candidate item with lower total Score may be eliminated or set to be later.
For ease of understanding, the above-described method is exemplified below:
the pinyin string input by a certain user is obtained as 'fangan', and two segmentation results in the segmentation results aiming at the pinyin string both meet the rationality condition. The first segmentation result contains syllable segments "fang" and "an", and the second segmentation result contains syllable segments "fan" and "gan".
The candidate items for the first segmentation result include "scheme" and "room darkness". Because the first segmentation result satisfies the rationality condition to a high degree, the first score of each candidate item corresponding to the first segmentation result is 450. The candidate items for the first segmentation result are given a second score in combination with the word-forming habits of the user in the input method; because "scheme" conforms to the user's word-forming habits better than "room darkness", the second score of "scheme" is 200 and the second score of "room darkness" is 10. Different weights are assigned to the two scores: 0.9 for the first score and 0.1 for the second score. By linear weighting, the total score of "scheme" is 0.9 × 450 + 0.1 × 200 = 425 and the total score of "room darkness" is 0.9 × 450 + 0.1 × 10 = 406. Accordingly, for the first segmentation result, "scheme" ranks ahead of "room darkness".
The candidate items for the second segmentation result include "dislike" and "vexation". Because the second segmentation result satisfies the rationality condition to a lower degree, the first score of each candidate item corresponding to the second segmentation result is 440. The candidate items for the second segmentation result are given a second score in combination with the word-forming habits of the user in the input method; because "dislike" conforms to the user's word-forming habits better than "vexation", the second score of "dislike" is 200 and the second score of "vexation" is 50. With the same weights, the total score of "dislike" is 0.9 × 440 + 0.1 × 200 = 416 and the total score of "vexation" is 0.9 × 440 + 0.1 × 50 = 401. Accordingly, for the second segmentation result, "dislike" ranks ahead of "vexation".
The sorting results for the two segmentation results are then merged: all candidate items of both segmentation results are sorted from high to low by total score to obtain the display order, which from front to back is "scheme" (425), "dislike" (416), "room darkness" (406) and "vexation" (401).
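The linear weighting and the merged ordering in this example can be reproduced with a short Python sketch. The data layout and the function name are illustrative assumptions; the scores and weights are the ones given in the example.

```python
W1, W2 = 0.9, 0.1  # weights of the first and second scores in the example


def total_score(score1, score2, w1=W1, w2=W2):
    """Score = w1 * Score1 + w2 * Score2 (linear weighting)."""
    return w1 * score1 + w2 * score2


# Score1 is shared by all candidates of one segmentation result;
# Score2 is per candidate (here, the word-forming-habit score).
segmentation_results = [
    {"segments": ("fang", "an"), "score1": 450,
     "candidates": {"scheme": 200, "room darkness": 10}},
    {"segments": ("fan", "gan"), "score1": 440,
     "candidates": {"dislike": 200, "vexation": 50}},
]

# Merge the candidates of both segmentation results and sort by total score.
ranked = sorted(
    ((total_score(result["score1"], score2), name)
     for result in segmentation_results
     for name, score2 in result["candidates"].items()),
    key=lambda pair: pair[0],
    reverse=True,
)
display_order = [name for _, name in ranked]
```

Because the weight of the first score is large, a candidate from the lower-scoring segmentation result can still overtake one from the higher-scoring result only when its second score is much higher, as "dislike" does relative to "room darkness" here.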
In the method provided by the embodiment, in each segmentation result meeting the reasonableness condition, the display sequence of the candidate items corresponding to each segmentation result is further determined according to the satisfaction degree of each segmentation result and the reasonableness condition, so that a user can quickly find the required candidate items when selecting the candidate items.
Based on the pinyin string segmentation method provided in the foregoing embodiments, this embodiment provides a pinyin string segmentation device. Fig. 3 shows a structural block diagram of the device, which includes:
the segmentation module 301 is configured to segment a plurality of segmentation results according to the obtained pinyin string, where any one of the segmentation results includes a plurality of syllable segments;
the judging module 302 is configured to judge whether the segmentation result meets a rationality condition according to an input interval between adjacent syllable segments in the segmentation result;
the determining module 303 is configured to determine a candidate item for the pinyin string according to the segmentation result that meets the reasonableness condition.
Optionally, the judging module includes:
a first judging unit, configured to judge whether the segmentation result meets the rationality condition according to the input interval between the adjacent syllable segments in the segmentation result and the number of the syllable segments.
Optionally, the judging module includes:
a historical input interval data acquisition unit, configured to acquire historical input interval data of a user who inputs the pinyin string;
a second judging unit, configured to judge whether the segmentation result meets the rationality condition according to the historical input interval data and the input interval between the adjacent syllable segments in the segmentation result.
Optionally, the first judging unit includes:
a historical input syllable data acquisition subunit, configured to acquire historical input syllable data of a user who inputs the pinyin string;
a first judging subunit, configured to judge whether the segmentation result meets the rationality condition according to the historical input syllable data, the input interval between adjacent syllable segments in the segmentation result and the number of the syllable segments.
Optionally, the apparatus further comprises:
and the error correction module is used for correcting the syllable segments in the segmentation result meeting the reasonableness condition.
Optionally, if the segmentation results that meet the rationality condition include a first segmentation result and a second segmentation result, the determining module includes:
a first sorting module, configured to sort the candidate items for the first segmentation result according to the degree to which the first segmentation result satisfies the rationality condition;
a second sorting module, configured to sort the candidate items for the second segmentation result according to the degree to which the second segmentation result satisfies the rationality condition;
a candidate item determining module, configured to determine the candidate items and the display order for the pinyin string according to the sorting result for the first segmentation result and the sorting result for the second segmentation result.
The pinyin string segmentation device judges whether each segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in each segmentation result. The segmentation results that meet the rationality condition are not only segmented according to syllables but also conform to the characteristics of the input interval, while segmentation results that are segmented according to syllables but have an excessively small input interval between syllable segments are eliminated. Therefore, when the candidate items are determined, the eliminated segmentation results need not be considered, and the candidate items are determined only according to the segmentation results meeting the rationality condition. Accordingly, the number of displayed candidate items that are meaningless or unnecessary relative to the user's input requirement is reduced, which shortens the time the user spends selecting a candidate item and improves the user's input experience.
Fig. 4 is a block diagram illustrating an apparatus 400 for pinyin string segmentation, according to an example embodiment. For example, the apparatus 400 may be a robot, a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 4, the apparatus 400 may include one or more of the following components: processing components 402, memory 404, power components 406, multimedia components 408, audio components 410, input/output (I/O) interfaces 412, sensor components 414, and communication components 416.
The processing component 402 generally controls overall operation of the apparatus 400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 402 can include one or more modules that facilitate interaction between the processing component 402 and other components. For example, the processing component 402 can include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.
The memory 404 is configured to store various types of data to support operations at the apparatus 400. Examples of such data include instructions for any application or method operating on the device 400, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 404 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 406 provides power to the various components of the apparatus 400. The power component 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 400.
The multimedia component 408 includes a screen that provides an output interface between the device 400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 408 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 400 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 410 is configured to output and/or input audio signals. For example, audio component 410 includes a Microphone (MIC) configured to receive external audio signals when apparatus 400 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 404 or transmitted via the communication component 416. In some embodiments, audio component 410 also includes a speaker for outputting audio signals.
The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 414 includes one or more sensors for providing status assessments of various aspects of the apparatus 400. For example, the sensor component 414 may detect an open/closed state of the apparatus 400 and the relative positioning of components, such as the display and keypad of the apparatus 400. The sensor component 414 may also detect a change in the position of the apparatus 400 or of a component of the apparatus 400, the presence or absence of user contact with the apparatus 400, the orientation or acceleration/deceleration of the apparatus 400, and a change in the temperature of the apparatus 400. The sensor component 414 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 416 is configured to facilitate wired or wireless communication between the apparatus 400 and other devices. The apparatus 400 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 404 comprising instructions, executable by the processor 420 of the apparatus 400 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium having instructions stored therein which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a pinyin string segmentation method, the method comprising:
segmenting an obtained pinyin string to obtain a plurality of segmentation results, wherein each segmentation result comprises a plurality of syllable segments;
judging whether each segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result;
and determining candidate items for the pinyin string according to the segmentation results that meet the rationality condition.
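As an illustration of the first step only (obtaining several segmentation results from one pinyin string), the following Python sketch enumerates every way of splitting a string into valid syllables. The syllable inventory is a tiny hypothetical subset chosen for the example; a real input method would load the full pinyin syllable table, and the recursive enumeration is one possible approach rather than the one used in this application.

```python
from typing import FrozenSet, List

# Tiny illustrative inventory (assumption); not the full pinyin syllable table.
SYLLABLES: FrozenSet[str] = frozenset(
    {"a", "an", "fan", "fang", "gan", "xi", "xian"}
)
MAX_SYLLABLE_LEN = 6  # the longest standard pinyin syllables, e.g. "zhuang", have six letters


def segment(pinyin: str, syllables: FrozenSet[str] = SYLLABLES) -> List[List[str]]:
    """Return every split of `pinyin` into a sequence of valid syllables."""
    if not pinyin:
        return [[]]  # one way to segment the empty string: no syllables
    results: List[List[str]] = []
    for end in range(1, min(len(pinyin), MAX_SYLLABLE_LEN) + 1):
        head = pinyin[:end]
        if head in syllables:
            # Extend each segmentation of the remainder with the current syllable.
            for rest in segment(pinyin[end:], syllables):
                results.append([head] + rest)
    return results


print(segment("fangan"))  # [['fan', 'gan'], ['fang', 'an']]
```

Both results are valid syllable splits of the same string, which is why a further check on the input intervals is needed to decide which of them the user is likely to have intended.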
Fig. 5 is a schematic structural diagram of a server in an embodiment of the present invention. The server 500 may vary widely in configuration or performance and may include one or more Central Processing Units (CPUs) 522 (e.g., one or more processors) and memory 532, one or more storage media 530 (e.g., one or more mass storage devices) storing applications 542 or data 544. Memory 532 and storage media 530 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 522 may be configured to communicate with the storage medium 530, and execute a series of instruction operations in the storage medium 530 on the server 500.
The server 500 may also include one or more power supplies 524, one or more wired or wireless network interfaces 550, one or more input-output interfaces 558, one or more keyboards 554, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The storage medium may be any medium that can store program code, such as a read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
It should be noted that, in the present specification, all the embodiments are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A pinyin string segmentation method, characterized in that the method comprises:
segmenting an obtained pinyin string to obtain a plurality of segmentation results, wherein each segmentation result comprises a plurality of syllable segments;
judging whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result, wherein the judging comprises: judging whether the segmentation result meets the rationality condition according to the input interval between each pair of adjacent syllable segments in the segmentation result and the number of syllable segments;
and determining candidate items for the pinyin string according to the segmentation result that meets the rationality condition.
2. The method according to claim 1, wherein the judging whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result comprises:
acquiring historical input interval data of a user inputting the pinyin string;
and judging whether the segmentation result meets the rationality condition or not according to the historical input interval data and the input interval between the adjacent syllable segments in the segmentation result.
3. The method according to claim 1, wherein the judging whether the segmentation result meets the rationality condition according to the input interval between each adjacent syllable segment in the segmentation result and the number of syllable segments comprises:
acquiring the historical input syllable number of a user inputting the pinyin string;
and judging whether the segmentation result meets the rationality condition or not according to the number of the historical input syllables, the input interval between each adjacent syllable segment in the segmentation result and the number of the syllable segments.
4. The method according to claim 1, further comprising, before the determining candidate items for the pinyin string according to the segmentation result that meets the rationality condition:
correcting errors in the syllable segments of the segmentation result that meets the rationality condition.
5. The method according to claim 1, wherein, if the segmentation results that meet the rationality condition include a first segmentation result and a second segmentation result, the determining candidate items for the pinyin string according to the segmentation result that meets the rationality condition comprises:
sorting the candidate items for the first segmentation result according to the degree to which the first segmentation result satisfies the rationality condition;
sorting the candidate items for the second segmentation result according to the degree to which the second segmentation result satisfies the rationality condition;
and determining the candidate items for the pinyin string and their display order according to the sorting result for the first segmentation result and the sorting result for the second segmentation result.
6. A pinyin string segmentation device, characterized in that the device comprises:
a segmentation module, configured to segment an obtained pinyin string to obtain a plurality of segmentation results, wherein each segmentation result comprises a plurality of syllable segments;
a judging module, configured to judge whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result;
a determining module, configured to determine candidate items for the pinyin string according to the segmentation result that meets the rationality condition;
wherein the judging module comprises:
a first judging unit, configured to judge whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result and the number of syllable segments.
7. The apparatus of claim 6, wherein the judging module comprises:
a historical input interval data acquisition unit, configured to acquire historical input interval data of a user who inputs the pinyin string;
and the second judgment unit is used for judging whether the segmentation result meets the rationality condition or not according to the historical input interval data and the input interval between the adjacent syllable segments in the segmentation result.
8. The apparatus according to claim 6, wherein the first judging unit includes:
a historical input syllable data acquisition subunit, configured to acquire the number of historical input syllables of the user who inputs the pinyin string;
and the first judgment subunit is used for judging whether the segmentation result meets the rationality condition or not according to the number of the historical input syllables, the input interval between each two adjacent syllable segments in the segmentation result and the number of the syllable segments.
9. The apparatus of claim 6, further comprising:
an error correction module, configured to correct errors in the syllable segments of the segmentation result that meets the rationality condition.
10. The apparatus according to claim 6, wherein, if the segmentation results that meet the rationality condition include a first segmentation result and a second segmentation result, the apparatus, for determining candidate items for the pinyin string according to the segmentation result that meets the rationality condition, comprises:
a first sorting module, configured to sort the candidate items for the first segmentation result according to the degree to which the first segmentation result satisfies the rationality condition;
a second sorting module, configured to sort the candidate items for the second segmentation result according to the degree to which the second segmentation result satisfies the rationality condition;
and a candidate item determining module, configured to determine the candidate items for the pinyin string and their display order according to the sorting result for the first segmentation result and the sorting result for the second segmentation result.
11. A processing apparatus for pinyin string segmentation, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
segmenting an obtained pinyin string to obtain a plurality of segmentation results, wherein each segmentation result comprises a plurality of syllable segments;
judging whether the segmentation result meets the rationality condition according to the input interval between adjacent syllable segments in the segmentation result, wherein the judging comprises: judging whether the segmentation result meets the rationality condition according to the input interval between each pair of adjacent syllable segments in the segmentation result and the number of syllable segments;
and determining candidate items for the pinyin string according to the segmentation result that meets the rationality condition.
12. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the pinyin string segmentation method according to one or more of claims 1 to 5.
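The ordering of candidate items recited in claims 5 and 10 can be pictured with a short Python sketch. The claims do not define how the degree of satisfaction is computed, so the score used here (the smallest boundary interval divided by an assumed 150 ms threshold), the `lexicon` mapping, and the function names are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Segmentation = Tuple[str, ...]


def satisfaction_degree(boundary_gaps_ms: Sequence[int], min_interval_ms: int = 150) -> float:
    """Assumed score: how far the smallest boundary interval exceeds the threshold.

    Expects at least one boundary, i.e. a segmentation with two or more syllable segments.
    """
    return min(boundary_gaps_ms) / min_interval_ms


def rank_candidates(results: List[Tuple[Segmentation, Sequence[int]]],
                    lexicon: Dict[Segmentation, List[str]]) -> List[str]:
    """Merge the candidate items of several segmentation results into one display order.

    Candidate items from a segmentation with a higher satisfaction degree come first;
    within one segmentation the lexicon order is preserved.
    """
    scored = []
    for segments, gaps in results:
        degree = satisfaction_degree(gaps)
        for rank, word in enumerate(lexicon.get(segments, [])):
            scored.append((-degree, rank, word))
    return [word for _, _, word in sorted(scored)]


# Hypothetical lexicon and boundary intervals for two rational segmentation results.
lexicon = {("fang", "an"): ["方案"], ("fan", "gan"): ["反感", "饭干"]}
results = [(("fan", "gan"), [200]), (("fang", "an"), [360])]
order = rank_candidates(results, lexicon)
```

With these inputs the candidate item for "fang'an" is displayed first because its boundary interval exceeds the threshold by the larger margin, followed by the candidate items for "fan'gan" in their lexicon order.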
CN201711284974.7A 2017-12-07 2017-12-07 Pinyin string segmentation method and device Active CN109901725B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711284974.7A CN109901725B (en) 2017-12-07 2017-12-07 Pinyin string segmentation method and device

Publications (2)

Publication Number Publication Date
CN109901725A CN109901725A (en) 2019-06-18
CN109901725B true CN109901725B (en) 2022-05-06

Family

ID=66939205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711284974.7A Active CN109901725B (en) 2017-12-07 2017-12-07 Pinyin string segmentation method and device

Country Status (1)

Country Link
CN (1) CN109901725B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101075262A (en) * 2007-06-12 2007-11-21 腾讯科技(深圳)有限公司 Method and system for inputting Chinese character by computer
CN102200839A (en) * 2010-03-25 2011-09-28 阿里巴巴集团控股有限公司 Method and system for processing pinyin string in process of inputting Chinese characters
CN102566775A (en) * 2010-12-31 2012-07-11 上海量明科技发展有限公司 Input method and system for generating character interval
CN102866782A (en) * 2011-07-06 2013-01-09 哈尔滨工业大学 Input method and input method system for improving sentence generating efficiency
CN104345896A (en) * 2013-07-31 2015-02-11 淘宝(中国)软件有限公司 Alphabetic writing word group inputting method and alphabetic writing word group inputting system
CN104423621A (en) * 2013-08-22 2015-03-18 北京搜狗科技发展有限公司 Pinyin string processing method and device
CN104516522A (en) * 2013-09-29 2015-04-15 北京三星通信技术研究有限公司 Input method and device of nine-rectangle-grid keyboard
CN105335415A (en) * 2014-08-04 2016-02-17 北京搜狗科技发展有限公司 Search method based on input prediction, and input method system
CN105843414A (en) * 2015-01-13 2016-08-10 北京搜狗科技发展有限公司 Input correction method for input method and input method device
CN106484131A (en) * 2015-09-02 2017-03-08 北京搜狗科技发展有限公司 A kind of input error correction method and input subtraction unit
CN106484132A (en) * 2015-09-02 2017-03-08 北京搜狗科技发展有限公司 A kind of input error correction method and input subtraction unit

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101644961A (en) * 2009-08-14 2010-02-10 北京搜狗科技发展有限公司 Encoded string sequencing method, device and character input method and device
JP5067680B2 (en) * 2010-09-14 2012-11-07 靖彦 佐竹 Chinese electronic device input method
CN102866783B (en) * 2011-07-06 2015-07-15 哈尔滨工业大学 Syncopation method of Chinese phonetic string and system thereof
CN102955770B (en) * 2011-08-17 2017-07-11 深圳市世纪光速信息技术有限公司 A kind of phonetic automatic identifying method and system
CN104252484B (en) * 2013-06-28 2018-10-19 重庆新媒农信科技有限公司 A kind of phonetic error correction method and system
JP2015022590A (en) * 2013-07-19 2015-02-02 株式会社東芝 Character input apparatus, character input method, and character input program
CN103885608A (en) * 2014-03-19 2014-06-25 百度在线网络技术(北京)有限公司 Input method and system

Also Published As

Publication number Publication date
CN109901725A (en) 2019-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant