CN111665956B - Candidate character string processing method and device, electronic equipment and storage medium - Google Patents

Candidate character string processing method and device, electronic equipment and storage medium

Info

Publication number
CN111665956B
Authority
CN
China
Prior art keywords
character
error correction
character string
candidate
candidate character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010304923.1A
Other languages
Chinese (zh)
Other versions
CN111665956A (en)
Inventor
王鑫
孙明明
李平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010304923.1A
Publication of CN111665956A
Application granted
Publication of CN111665956B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods
    • G06F3/0236 Character input methods using selection techniques to select from displayed items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a method and a device for processing candidate character strings, an electronic device, and a storage medium, and relates to the field of information recommendation. The specific implementation scheme is as follows: an input original character string is acquired based on the coordinates of a plurality of input points input by a user; and character error correction is performed on the original character string according to a preset character error correction strategy to obtain a plurality of candidate character strings. With the technology of the application, error correction at character granularity can be applied to the character string, so that the corrected candidate character strings better match the user's expectation, the accuracy of the obtained candidate character strings is effectively improved, and the input accuracy and input efficiency of the input method are improved in turn.

Description

Candidate character string processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for processing candidate character strings, an electronic device, and a storage medium.
Background
Mobile devices (e.g., smartphones and tablet computers) play a very important role in everyday life, and more and more internet activities are performed on them. In many of these activities, the main means of communication is text entered through the input method of the mobile device. Because the size of the device limits the display screen, the character areas on the on-screen soft keyboard are small, so the user easily touches outside the intended character area during input; this produces input errors that the user has to delete and re-enter.
For example, to improve input efficiency, an existing input method may acquire, according to the user's input information, words whose spelling or meaning is similar to the input information, and recommend them to the user as candidate words.
However, candidate words obtained in this manner rarely reflect the user's true intention, and the accuracy of the predicted candidate words is poor.
Disclosure of Invention
In order to solve the technical problems, the application provides a candidate character string processing method, a candidate character string processing device, an electronic device and a storage medium.
According to a first aspect, there is provided a method for processing a candidate character string in an input method, including:
acquiring an input original character string based on coordinates of a plurality of input points input by a user;
and performing character error correction on the original character string according to a preset character error correction strategy to obtain a plurality of candidate character strings.
According to a second aspect, there is provided a processing apparatus for candidate character strings in an input method, including:
the character string acquisition module is used for acquiring an input original character string based on coordinates of a plurality of input points input by a user;
and the character error correction module is used for carrying out character error correction according to a preset character error correction strategy based on the original character string to obtain a plurality of candidate character strings.
According to a third aspect, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method as described above.
According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
According to the technology of the present application, the technical problem of poor accuracy of predicted candidate words in the prior art is solved: error correction at character granularity can be applied to the character string, so that the corrected candidate character strings better match the user's expectation, the accuracy of the obtained candidate character strings is effectively improved, and the input accuracy and input efficiency of the input method are improved in turn. The technical solution of the application can therefore also improve the user's experience with the input method and increase user stickiness to the input method.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic diagram according to a third embodiment of the present application;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present application;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present application;
FIG. 6 is a schematic diagram according to a sixth embodiment of the present application;
FIG. 7 is a block diagram of an electronic device for implementing a method for processing candidate character strings in an input method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a schematic diagram according to a first embodiment of the present application; as shown in fig. 1, the present application provides a method for processing candidate character strings in an input method, which specifically includes the following steps:
S101, acquiring an input original character string based on coordinates of a plurality of input points input by a user;
S102, performing character error correction on the original character string according to a preset character error correction strategy to obtain a plurality of candidate character strings.
The execution subject of the method for processing candidate character strings in the input method of this embodiment is a device for processing candidate character strings in the input method. The device may be set in the input method and is used for recommending a plurality of candidate character strings based on the original character string input in the input method.
Specifically, to input an original character string with the input method, the user typically opens the soft keyboard of the input method and clicks the position of each character shown on the soft keyboard to input that character. That is, on the user side, the user clicks a location. On the input method side, a coordinate system is established and the character corresponding to the clicked coordinate position in the soft keyboard is detected, which determines the character the user wants to input. Accordingly, the coordinates of each input point entered in sequence can be detected, and each character input by the user can be determined. When the user stops inputting and the pause is detected to be longer than a preset duration threshold, the input can be considered finished and the end of the original character string is determined; the original character string can then be acquired in the order of the user's input. Alternatively, the user may input an end symbol when the input ends, which marks the end of the original character string. In practical applications, the original character string input by the user may include two or more characters.
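The two end-of-input conditions described above (a pause longer than a preset threshold, or an explicit end symbol) can be illustrated with a minimal Python sketch. The function name `split_inputs`, the `(timestamp, char)` event format, the 1.0 second threshold, and the use of a newline as the end symbol are all assumptions made for illustration; the application does not specify them.

```python
def split_inputs(events, idle_threshold=1.0, terminator="\n"):
    """Group timestamped characters into original character strings.

    events: iterable of (timestamp_seconds, char) in input order (assumed format).
    A string ends when the pause before the next character exceeds
    idle_threshold, or when the assumed end symbol `terminator` is input.
    """
    strings, current, last_time = [], [], None
    for timestamp, char in events:
        # Pause longer than the threshold: the previous string is finished.
        if current and last_time is not None and timestamp - last_time > idle_threshold:
            strings.append("".join(current))
            current = []
        if char == terminator:
            # Explicit end symbol: close the current string, if any.
            if current:
                strings.append("".join(current))
                current = []
        else:
            current.append(char)
        last_time = timestamp
    # For simplicity, whatever remains at the end is treated as a finished string.
    if current:
        strings.append("".join(current))
    return strings


EVENTS = [(0.0, "h"), (0.2, "i"), (2.0, "y"), (2.1, "o"), (2.2, "\n"), (2.3, "o"), (2.4, "k")]
print(split_inputs(EVENTS))
```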
In this embodiment, a plurality of candidate character strings may be obtained according to a preset character error correction policy based on the original character string. For example, character error correction may be directly performed on the original character string to obtain a part of candidate character strings, or character error correction may be performed again based on the candidate character string that has undergone the character error correction processing, to obtain a corresponding candidate character string. In practical application, one, two or more character error correction strategies can be adopted to perform character error correction processing on an original character string or a candidate character string subjected to character error correction processing according to practical requirements, so as to obtain a plurality of candidate character strings.
The candidate character strings in this embodiment are obtained by applying error correction at character granularity to the original character string, so the accuracy of the candidate character strings can be effectively improved. For example, error correction at character granularity can correct a character that the user enters wrongly because of a spelling or pronunciation error; it can also correct character errors, such as clicking an adjacent character, caused by the click deviating from the area of the intended character.
According to the method for processing candidate character strings in the input method of this embodiment, the input original character string is acquired based on the coordinates of a plurality of input points input by a user, and character error correction is performed on the original character string according to a preset character error correction strategy to obtain a plurality of candidate character strings. This solves the technical problem of poor accuracy of predicted candidate words in the prior art: error correction at character granularity is applied to the character string, so that the corrected candidate character strings better match the user's expectation, the accuracy of the obtained candidate character strings is effectively improved, and the input accuracy and input efficiency of the input method are improved in turn. The technical solution of this embodiment can therefore also improve the user's experience with the input method and increase user stickiness to the input method.
FIG. 2 is a schematic diagram of a second embodiment of the present application; as shown in fig. 2, in this embodiment, the technical solution of the present application is described by taking three character error correction modes including character replacement, character order adjustment and character completion in sequence as an example. As shown in fig. 2, the method for processing candidate character strings in the input method of the present embodiment may specifically include the following steps:
S201, collecting a coordinate sequence input by a user, wherein the coordinate sequence comprises coordinates of a plurality of input points which are sequentially input;
S202, acquiring an input original character string according to the mapping relation between the coordinate areas and the characters in the soft keyboard and the coordinates of each input point;
Each character in the soft keyboard of the mobile device corresponds to a certain visible area on the screen. To input each character of the original character string, the user clicks the visible area corresponding to that character. Correspondingly, the device for processing candidate words of the input method establishes a mapping relation between each character and the coordinate area that the character's visible area occupies in the coordinate system of the screen. When the coordinates of an input point are detected, the device determines which character's coordinate area the coordinates fall in, and that character is taken as the one the user is inputting.
When the user inputs the original character string, the device for processing candidate words of the input method can detect the coordinate sequence input by the user, wherein the coordinate sequence comprises coordinates of a plurality of input points which are sequentially input. The input original character string can then be obtained according to the mapping relation between the coordinate areas and the characters in the soft keyboard and the coordinates of each input point.
When the user clicks to input, the click point may fall within the visible area of a key of the soft keyboard, that is, of a character, or it may fall outside every visible area. In the latter case, the input character has to be decided by some policy, for example by choosing the character whose visible area is closest to the click point. In this way the character corresponding to each input point is obtained, and the original character string is obtained.
Steps S201 and S202 are one implementation of step S101 in the embodiment shown in fig. 1. Optionally, in practical application, the corresponding character may be obtained as soon as the coordinates of each input point are collected, and after the end symbol is collected, the collected characters are arranged in the order of collection to obtain the original character string. Alternatively, the input original character string may be acquired in other manners, which are not described in detail here.
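As an illustration of steps S201 and S202, the following Python sketch maps each input point to the character whose coordinate area contains it and, for points outside every area, falls back to the nearest area. The `KeyArea` class, the rectangular key shape, the Euclidean distance measure, and the three-key layout are assumptions made for this sketch; the application only requires "a certain policy" for points outside the visible areas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyArea:
    """Rectangular coordinate area of one soft-keyboard character (assumed shape)."""
    char: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom

    def distance_to(self, x, y):
        # Euclidean distance from the point to the rectangle (0 if inside).
        dx = max(self.left - x, 0.0, x - self.right)
        dy = max(self.top - y, 0.0, y - self.bottom)
        return (dx * dx + dy * dy) ** 0.5


def char_for_point(keys, x, y):
    """Return the character for one input point."""
    for key in keys:
        if key.contains(x, y):
            return key.char
    # Point is outside every area: pick the closest area (one possible policy).
    return min(keys, key=lambda k: k.distance_to(x, y)).char


def original_string(keys, points):
    """Map a coordinate sequence to the original character string, in input order."""
    return "".join(char_for_point(keys, x, y) for x, y in points)


# Hypothetical three-key row with small gaps between the keys.
KEYS = [
    KeyArea("a", 0, 0, 10, 10),
    KeyArea("s", 12, 0, 22, 10),
    KeyArea("d", 24, 0, 34, 10),
]
print(original_string(KEYS, [(5, 5), (11.8, 5), (30, 12)]))
```

In this layout the second point lies in the gap between "a" and "s" and the third lies just below "d"; both are resolved by the nearest-area fallback.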
S203, replacing characters in the original character string according to a character replacement table counted in advance to obtain corresponding candidate character strings;
S204, adjusting, according to a character order table counted in advance, the order of characters in the original character string and in the candidate character strings after character replacement to obtain corresponding candidate character strings;
S205, completing, according to a character completion table counted in advance, the original character string, the candidate character strings after character replacement, and the candidate character strings after character order adjustment to obtain corresponding candidate character strings.
Step S203 to step S205 of the present embodiment are one implementation manner of step S102 of the embodiment shown in fig. 1.
Alternatively, in practical application, in step S204, only the characters in the candidate character string after the character replacement may be sequenced, and in step S205, only the candidate character string after the character sequencing may be complemented as the final candidate character string.
Further optionally, step S102 of the embodiment shown in fig. 1, performing character error correction on the original character string according to a preset character error correction strategy to obtain a plurality of candidate character strings, may specifically include: performing character error correction on the original character string in at least one of the modes of character replacement, character order adjustment, and character completion to obtain a plurality of candidate character strings.
That is, the character error correction strategy of the present application may include: any one of the character replacement, character order adjustment, and character completion modes; or a combination of any two of them, namely character replacement with character order adjustment, character replacement with character completion, or character order adjustment with character completion; or all three modes together. In this embodiment, when two or more error correction modes are included, the character replacement mode, if present, is preferably arranged first, the character order adjustment mode next, and the character completion mode, if present, always last.
The three character error correction modes are described in detail below.
First error correction mode, character replacement: performing character error correction on the original character string by character replacement may include: replacing characters in the original character string according to a character replacement table counted in advance to obtain corresponding candidate character strings.
For example, the character replacement table may be obtained from statistics of the historical input of all users of the input method. For example, i->o, i->u, g->h, g->f, etc. may be recorded in the character replacement table, where the character before the symbol "->" is the character to be replaced and the character after the symbol "->" is the replacement character. In these entries the character to be replaced and the replacement character are adjacent on the full keyboard, where the user easily mis-clicks and thereby inputs the wrong character. In addition, the character replacement table can include other characters that are easily confused in spelling, such as i->e, e->i, f->h, or h->f. In short, the character replacement table may include all character mappings for which replacement is possible, so that the characters in the original character string can be replaced according to the table to obtain candidate character strings. In this embodiment, if the original character string contains N replaceable characters, where N is greater than 1, any single one of them may be replaced to obtain a candidate character string, and any number of them from 2 to N may also be replaced to obtain candidate character strings. Moreover, if a character has M possible replacement characters, there are correspondingly M different candidate character strings. In short, all possible character replacements may be applied to the original character string according to the character replacement table to obtain all possible candidate character strings.
For example, if the original string is "hallo", and a->e, a->i, o->e, o->u exist in the character replacement table, the corresponding candidate strings may include hello, hellu, helle, hillo, hillu, hille, halle, and hallu.
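The exhaustive replacement described above (replace any one, several, or all replaceable characters, with every available replacement) can be sketched in Python as a Cartesian product over per-position options. The function name `replacement_candidates`, the dictionary representation of the replacement table, and the input string "hallo" are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical replacement table: character to be replaced -> replacement characters.
REPLACEMENTS = {"a": ["e", "i"], "o": ["e", "u"]}


def replacement_candidates(original, table):
    """Return every string reachable by replacing one or more characters.

    Each position keeps its own character or takes any replacement listed
    for it; the unchanged original is excluded from the result.
    """
    options = [[ch] + table.get(ch, []) for ch in original]
    candidates = ("".join(combo) for combo in product(*options))
    # dict.fromkeys removes duplicates while preserving order.
    return [c for c in dict.fromkeys(candidates) if c != original]


print(replacement_candidates("hallo", REPLACEMENTS))
```

With two replaceable positions and two replacements each, the product yields 3 × 3 = 9 strings, of which 8 differ from the original. The number of candidates grows multiplicatively with the number of replaceable characters, so a practical system would probably need to bound or prune this expansion.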
Second error correction mode, character order adjustment: performing character error correction on the original character string by character order adjustment may include: adjusting, according to a character order table counted in advance, the order of characters in the original character string and/or in the candidate character strings after character replacement to obtain corresponding candidate character strings.
For example, when a user types on the touch screen with both hands, the characters are often entered in the wrong order, such as the user inputting "scroll" with two hands, but since the right hand is faster than the left hand, o is input immediately after p is input, resulting in a final result of "scroll". To correct this situation, the order of the user's input needs to be adjusted so as to expand the possible inputs.
Similarly, the character order table can be obtained from statistics of the historical input of all users of the input method. For example, gi, go, gu, ig, og, ug, ig, ih, if, fi, hi, etc. may be included in the character order table. If the character error correction strategy of this embodiment does not include character replacement before character order adjustment, the order adjustment needs to detect, according to the character order table, whether the original character string contains an adjacent character pair whose order can be adjusted; if one exists, its order is adjusted directly to obtain a candidate character string. If a plurality of such character pairs exist, any one, two, several, or even all of them can be adjusted, yielding all possible candidate character strings after character order adjustment. If the character error correction strategy of this embodiment also includes character replacement, then, in addition to adjusting the character order of the original character string as described above, the same character order adjustment is applied to all candidate character strings produced by character replacement, so as to obtain all possible candidate character strings.
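A minimal Python sketch of the order adjustment follows. It assumes that an entry "xy" in the character order table means that an adjacent pair "xy" in the input may be swapped to "yx", and it only swaps pairs that do not overlap each other; both points are interpretations made for this sketch, as are the function name `order_candidates` and the example input "ngiht".

```python
# Pairs taken from the example table in the text; reading an entry "xy" as
# "adjacent xy may be swapped to yx" is an assumption of this sketch.
SWAPPABLE = {"gi", "go", "gu", "ig", "og", "ug", "ih", "if", "fi", "hi"}


def order_candidates(text, swappable):
    """Return all strings obtained by swapping one or more listed adjacent pairs.

    Only non-overlapping pairs are swapped together (a simplification).
    """
    results = []

    def walk(i, prefix, changed):
        if i >= len(text):
            if changed:
                results.append(prefix)
            return
        # Option 1: keep the character at position i where it is.
        walk(i + 1, prefix + text[i], changed)
        # Option 2: swap the pair at positions i and i+1 if the table lists it.
        if text[i:i + 2] in swappable:
            walk(i + 2, prefix + text[i + 1] + text[i], True)

    walk(0, "", False)
    return list(dict.fromkeys(results))


print(order_candidates("ngiht", SWAPPABLE))
```

For "ngiht" the pairs "gi" and "ih" are both listed but overlap, so each is swapped on its own, giving "night" and "nghit".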
Third error correction mode, character completion: performing character error correction on the original character string by character completion may include: completing, according to a character completion table counted in advance, the original character string, the candidate character strings after character replacement, and/or the candidate character strings after character order adjustment to obtain corresponding candidate character strings.
For example, when inputting a word, a user often relies on the automatic completion function of the input method to complete the word from its prefix. For instance, when the user inputs "sa", the input method is expected to list possible candidates such as "same", "sad", and "satisfactory". The input method therefore needs to complete words based on word prefixes. To meet this requirement, a character completion table can also be obtained from statistics of the historical input of all users of the input method; the table records each character string that can be completed and all possible character strings after completion.
If the character error correction strategy in this embodiment does not include the error correction modes of character replacement and character sequence adjustment before character completion, at this time, the original character string is completed directly according to the character completion table, and the candidate character string is obtained.
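The completion table and its lookup can be sketched in Python as follows. Building the table from a list of historically input words, the minimum prefix length of 2, and the names `build_completion_table` and `completion_candidates` are assumptions for illustration; the text only states that the table records each completable string and its possible completions.

```python
from collections import defaultdict


def build_completion_table(words, min_prefix=2):
    """Map every proper prefix (length >= min_prefix) to the words it can complete to."""
    table = defaultdict(list)
    for word in words:
        for end in range(min_prefix, len(word)):
            completions = table[word[:end]]
            if word not in completions:
                completions.append(word)
    return dict(table)


def completion_candidates(text, table):
    """Return the completed strings recorded for `text`, or an empty list."""
    return list(table.get(text, []))


# Hypothetical words standing in for the statistics over historical input.
HISTORY = ["same", "sad", "sample", "salt"]
TABLE = build_completion_table(HISTORY)

print(completion_candidates("sa", TABLE))
print(completion_candidates("sam", TABLE))
```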
If the character error correction policy of the present embodiment further includes an error correction method of character replacement before character completion, at this time, in addition to performing character completion processing on the original character string to obtain a plurality of candidate character strings according to the above manner, the same character completion processing needs to be performed on all candidate character strings after the character replacement processing in the same manner, so as to obtain all possible candidate character strings.
If the character error correction policy of the present embodiment further includes an error correction method of character sequence adjustment before character completion, at this time, in addition to performing character completion processing on the original character string to obtain a plurality of candidate character strings according to the above method, the same character completion processing needs to be performed on all candidate character strings after the character sequence adjustment processing in the same manner, so as to obtain all possible candidate character strings.
If the character error correction strategy of this embodiment includes both character replacement and character order adjustment before character completion, then, in addition to completing the original character string as described above to obtain a plurality of candidate character strings, the same character completion needs to be applied to all candidate character strings produced by character replacement and to all candidate character strings produced by character order adjustment, so as to obtain all possible candidate character strings. This corresponds to the implementation of steps S203 to S205 of this embodiment.
Based on the description of the above embodiment, based on the original character string, character error correction is performed according to the preset character error correction policy of the present embodiment, so that a plurality of candidate character strings can be obtained.
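The chaining of the three modes in steps S203 to S205 can be sketched in Python as follows: replacement is applied to the original string, order adjustment to the original string plus the replaced strings, and completion to all strings produced so far. The helper functions are deliberately simplified (for example, `swap_pairs` swaps only one listed pair at a time), and all names and table contents are hypothetical.

```python
from itertools import product


def unique(items):
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def replace_chars(text, table):
    # Every combination of keeping or replacing each character, minus the input itself.
    options = [[ch] + table.get(ch, []) for ch in text]
    return [c for c in ("".join(p) for p in product(*options)) if c != text]


def swap_pairs(text, pairs):
    # Simplification: swap one listed adjacent pair at a time.
    return [text[:i] + text[i + 1] + text[i] + text[i + 2:]
            for i in range(len(text) - 1) if text[i:i + 2] in pairs]


def complete(text, table):
    return list(table.get(text, []))


def correct(original, replacements, pairs, completions):
    """Run replacement, order adjustment, and completion in that order."""
    replaced = unique(replace_chars(original, replacements))            # S203
    stage2_inputs = [original] + replaced
    reordered = unique(s for src in stage2_inputs
                       for s in swap_pairs(src, pairs))                 # S204
    stage3_inputs = unique(stage2_inputs + reordered)
    completed = unique(s for src in stage3_inputs
                       for s in complete(src, completions))             # S205
    return [c for c in unique(replaced + reordered + completed) if c != original]


# Hypothetical tables for a two-character input.
REPLACEMENTS = {"a": ["o"]}
PAIRS = {"sa"}
COMPLETIONS = {"sa": ["sad"], "so": ["some"], "as": ["ask"]}

print(correct("sa", REPLACEMENTS, PAIRS, COMPLETIONS))
```

Passing an empty table for a mode effectively disables it, which gives the single-mode and two-mode strategies described earlier.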
In the method for processing candidate character strings in the input method of this embodiment, a coordinate sequence input by a user is collected, wherein the coordinate sequence comprises coordinates of a plurality of input points which are sequentially input, and the input original character string is obtained according to the mapping relation between the coordinate areas and the characters in the soft keyboard and the coordinates of each input point. This helps ensure the accuracy of the collected original character string.
Furthermore, in this embodiment, character error correction can be implemented by character replacement, character order adjustment, and/or character completion, which expands the original character string into a plurality of candidate character strings and enriches the candidates. Because these three error correction modes reflect real input scenarios, the corrected candidate character strings better match the user's expectation, and the accuracy of the obtained candidate character strings can be effectively improved, thereby improving the input accuracy and input efficiency of the input method.
FIG. 3 is a schematic view of a third embodiment of the present application; as shown in fig. 3, on the basis of the technical solution of the embodiment shown in fig. 1 or fig. 2, the method for processing candidate character strings in the input method of this embodiment may further include, after performing character error correction according to a preset character error correction policy based on the original character string in step S102 of the embodiment shown in fig. 1, to obtain a plurality of candidate character strings, the following steps:
S301, obtaining occurrence probability of each candidate character string;
s302, sequencing a plurality of candidate character strings according to the sequence from the big probability to the small probability;
s303, recommending a plurality of candidate character strings to a user according to the sorting order.
The technical solution of this embodiment implements the recommendation of the plurality of candidate character strings obtained in the embodiments shown in FIG. 1 and FIG. 2. To improve recommendation efficiency, the recommendation may be made based on the occurrence probability of each candidate character string. The higher the occurrence probability, the better the corresponding candidate character string matches the user's expectation, so it can be recommended preferentially, that is, placed earlier in the recommended order.
In this embodiment, the occurrence probability of each candidate character string may be obtained with reference to the error correction probability of the error correction applied to it. A higher error correction probability indicates that the corrected candidate character string was used more often in the users' history with the input method, and accordingly the occurrence probability of the candidate character string is higher. Therefore, for each candidate character string, the occurrence probability may be generated from the error correction probability of the error correction mode used to produce it. The error correction probability of each error correction mode can be obtained from statistics over historical data.
For example, optionally, step S301 of obtaining the occurrence probability of each candidate character string may specifically include the following steps:
(1) Obtaining the corresponding error correction probability of each candidate character string when character error correction processing is carried out;
For example, when character replacement error correction is performed, the error correction probability of each candidate character string may be obtained from the pre-counted replacement probabilities of the replaced characters.
Specifically, when the character substitution table is compiled from the historical input data of all users of the input method, the probability of each character substitution can be counted as well. For example, suppose character i was input 10000 times in total, of which it was replaced with o 800 times and with u 400 times; the error correction probability of i -> o is then 0.08 and that of i -> u is 0.04. If i was not replaced with any other character in the historical statistics, the remaining probability, 0.88, is the probability that i stays i. If several characters of a candidate character string are replaced at the same time, the replacement probabilities of the replaced characters are multiplied to give the error correction probability of the candidate character string.
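The counting described above can be illustrated with a minimal Python sketch. The counts are those of the example; the names `typed_total`, `corrected_counts`, `substitution_probs` and `candidate_substitution_prob` are hypothetical and introduced only for illustration, and the sketch assumes that every correction of a character is recorded in the counts.

```python
from math import prod

# Counts taken from the example above: 'i' was typed 10000 times in total,
# corrected to 'o' 800 times and to 'u' 400 times. Names are hypothetical.
typed_total = {"i": 10000}
corrected_counts = {("i", "o"): 800, ("i", "u"): 400}

def substitution_probs(typed_total, corrected_counts):
    """Return {(typed, intended): probability}, including the unchanged case."""
    probs = {}
    changed = {ch: 0 for ch in typed_total}
    for (typed, intended), count in corrected_counts.items():
        probs[(typed, intended)] = count / typed_total[typed]
        changed[typed] += count
    # Whatever was not corrected is treated as "the character stays itself".
    for typed, total in typed_total.items():
        probs[(typed, typed)] = (total - changed[typed]) / total
    return probs

def candidate_substitution_prob(pairs, probs):
    """Multiply the probabilities of every (typed, intended) pair that was replaced."""
    return prod(probs[pair] for pair in pairs)

probs = substitution_probs(typed_total, corrected_counts)
```

Here `probs[("i", "o")]` is 0.08, and a candidate in which two characters were replaced receives the product of the two replacement probabilities.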
For another example, when character order adjustment error correction is performed, the error correction probability of each candidate character string can be obtained from the pre-counted order adjustment probabilities of the adjusted characters.
Similarly, when the character order table is compiled from the historical input data of all users of the input method, the probability of each order adjustment can also be counted. For example, if the historical data show that gi was input 100 times in total and was adjusted to ig 60 times, the probability of gi being adjusted to ig is 0.6, and the probability of gi remaining gi is 0.4. Similarly, if several groups of characters in a candidate character string are order-adjusted, the order adjustment probabilities of the groups are multiplied to give the error correction probability of the candidate character string.
For another example, when character completion error correction is performed, the error correction probability of each candidate character string can be obtained from the pre-counted character completion probabilities.
Similarly, when the character completion table is compiled from the historical input data of all users of the input method, the probability of each character completion can be counted. For example, if the historical data show that users input "sa" 1000 times in total, selecting "same" 400 times, "sad" 300 times, "satisfactor" 100 times, and so on, then the completion probability of "same" is 0.4, that of "sad" is 0.3, and that of "satisfactor" is 0.1. For each candidate character string, the completion probability is its corresponding error correction probability.
(2) For each candidate character string, generating the occurrence probability of the corresponding candidate character string according to the error correction probability of the character error correction processing.
In practical applications, if a candidate character string has undergone two or more of the above types of error correction, the error correction probabilities of all the applied types are combined to generate the occurrence probability of the candidate character string.
For example, for each candidate character string, the error correction probabilities of the various error correction processes applied to it may be multiplied, and the product used as the occurrence probability of the candidate character string. Alternatively, the logarithms of these error correction probabilities may be weighted and summed to obtain the final occurrence probability of the candidate character string, where the weight of each error correction process can be set according to actual requirements.
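The two ways of combining error correction probabilities can be sketched as follows. The function names and the example values are hypothetical; the sketch assumes all probabilities are strictly positive so that the logarithm is defined.

```python
import math

def combine_product(error_probs):
    """Occurrence probability as the plain product of the error correction probabilities."""
    return math.prod(error_probs)

def combine_log_linear(error_probs, weights):
    """Weighted sum of log-probabilities; the result is a log-domain score."""
    return sum(w * math.log(p) for p, w in zip(error_probs, weights))

# Hypothetical candidate corrected by substitution (0.08) and then completion (0.4).
probs = [0.08, 0.4]
product_score = combine_product(probs)
log_score = combine_log_linear(probs, weights=[1.0, 1.0])
```

With all weights equal to 1, exponentiating `log_score` recovers `product_score`, so the multiplication model is a special case of the weighted log sum.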
Finally, the plurality of candidate character strings are sorted in descending order of occurrence probability and recommended to the user in the sorted order. In practical applications, only the top N ranked candidate character strings may be recommended.
In the method of this embodiment, the occurrence probability of each candidate character string is obtained, the candidate character strings are sorted in descending order of occurrence probability, and the candidates are recommended to the user in the sorted order. Candidates with a higher occurrence probability, which better match the user's expectation, are thus recommended first, further improving the input accuracy and the input efficiency of the input method.
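As an illustration of the sorting and top-N recommendation, the following sketch uses the standard-library `heapq.nlargest`. The candidate strings and their probabilities are invented for the example, and `top_n_candidates` is a hypothetical helper name.

```python
import heapq

def top_n_candidates(occurrence_probs, n):
    """Return the n candidates with the highest occurrence probability, best first."""
    return heapq.nlargest(n, occurrence_probs.items(), key=lambda item: item[1])

# Hypothetical occurrence probabilities for four candidates.
occurrence_probs = {"hi": 0.32, "hu": 0.40, "him": 0.05, "hug": 0.12}
best = top_n_candidates(occurrence_probs, n=3)
```

Selecting only the N largest entries avoids sorting the full candidate set when N is small relative to the number of candidates.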
FIG. 4 is a schematic view of a fourth embodiment of the present application; the processing method of candidate character strings in the input method of the present embodiment further describes the technical solution of the present application in more detail on the basis of the technical solution of the embodiment shown in fig. 1 to 3. As shown in fig. 4, the method for processing candidate character strings in the input method of the present embodiment may specifically include the following steps:
S401, collecting a coordinate sequence input by a user, wherein the coordinate sequence comprises coordinates of a plurality of input points which are sequentially input;
S402, acquiring an input original character string according to the mapping relation between the coordinate area and characters in the soft keyboard and the coordinates of each input point;
In this embodiment, the original character string may be expressed as $G = \langle g_{1:t} \rangle$, indicating that the original character string G comprises the 1st to t-th characters.
Reference may be made in detail to step S201 and step S202 in the embodiment shown in fig. 2, which are not described in detail herein.
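Steps S401 and S402 can be pictured with a small sketch that maps touch coordinates to characters. The rectangular key areas, their pixel values, and the names `KEY_AREAS`, `char_at` and `original_string` are assumptions made for illustration; the source does not specify the keyboard geometry or how points outside every key are handled (they are simply dropped here).

```python
from typing import List, Optional, Tuple

# Hypothetical key layout: character -> (x_min, y_min, x_max, y_max) in pixels.
KEY_AREAS = {
    "h": (0, 0, 40, 60),
    "u": (40, 0, 80, 60),
    "i": (80, 0, 120, 60),
}

def char_at(point: Tuple[float, float]) -> Optional[str]:
    """Return the character whose key area contains the point, or None."""
    x, y = point
    for ch, (x0, y0, x1, y1) in KEY_AREAS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return ch
    return None

def original_string(coords: List[Tuple[float, float]]) -> str:
    """Map a sequence of touch coordinates to the original character string G."""
    return "".join(ch for ch in map(char_at, coords) if ch is not None)

G = original_string([(10, 30), (75, 20)])
```

In this toy layout the second touch at x = 75 lands near the right edge of the "u" key, next to "i", which is the kind of input the later replacement step is meant to correct.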
S403, replacing characters in the original character strings according to a character replacement table counted in advance to obtain corresponding candidate character strings, and calculating error correction probability when each candidate character string is replaced by the characters;
In this embodiment, the pre-counted character substitution table records not only the character pairs that can be substituted but also, for each pair, the probability that the replaced character is substituted by the replacement character.
For example, for the original character string G, a possible intended character $m_k$ ($1 \le k \le t$) is used to replace the character $g_k$ in G, giving the candidate character string $M = \langle m_{1:t} \rangle$. For example, the user inputs "hu", whose last character is u; since i is a neighboring character of u, the substitution of u by i is recorded in the character substitution table, and "hi" is generated as a candidate character string on this basis. In this embodiment, the probability of i replacing u may also be recorded in the table, and likewise the probability of the replaced character being substituted by the replacement character may be recorded for every substitution pair.
In addition, the error correction probability when a candidate character string is obtained by character substitution may be expressed as $P(M \mid G)$, specifically by the following formula:
$$P(M \mid G) = \prod_{n=1}^{N} P(m_n \mid g_n)$$
The above formula gives the error correction probability of the candidate character string M with respect to the original character string G, where n indexes the characters replaced in going from G to M and N is the total number of replaced characters; that is, $P(M \mid G)$ equals the product of the character replacement probabilities of the replaced characters from G to M.
In the character replacement scenario, it is assumed that word misreading and homophonic misuse are independent of contextual input.
In practical applications, this step may be implemented in a separate unit, such as a character replacer.
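A character replacer of this kind might be sketched as below. The table `SUBSTITUTIONS`, its probabilities, and the function name `substitute` are hypothetical. Following the description that only replaced characters contribute to the product, a character that is kept contributes a factor of 1; this is an assumption of the sketch.

```python
from itertools import product as cartesian

# Hypothetical substitution table: typed char -> {intended char: probability}.
# Only actual substitutions are listed; a kept character contributes factor 1.
SUBSTITUTIONS = {"u": {"i": 0.1}, "h": {"g": 0.05}}

def substitute(original):
    """Yield (candidate M, P(M|G)) for every combination of substitutions."""
    options = []
    for ch in original:
        choices = [(ch, 1.0)]  # keep the typed character
        choices += list(SUBSTITUTIONS.get(ch, {}).items())
        options.append(choices)
    for combo in cartesian(*options):
        candidate = "".join(ch for ch, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p  # product over the per-character factors
        yield candidate, prob

candidates = dict(substitute("hu"))
```

For the input "hu" this yields "hu", "hi", "gu" and "gi", with "gi" scored as the product of both replacement probabilities. An implementation could instead weight kept characters by the remaining probability mass, as the earlier statistics example for character i suggests.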
S404, according to a character sequence list counted in advance, carrying out sequence adjustment on characters in the candidate character strings after character replacement to obtain corresponding candidate character strings, and calculating error correction probability of each candidate character string when carrying out character sequence adjustment;
Similarly, in this embodiment, the pre-counted character order table records not only the character pairs whose order can be adjusted but also the probability that each pair is adjusted.
For example, each adjacent character pair $m_k$ and $m_{k+1}$ ($1 \le k \le t-1$) in the candidate character string M may be order-adjusted to obtain the reordered candidate character string R. Under the independence assumption, $P(R \mid M)$ can be expressed as:
$$P(R \mid M) = \prod_{n=1}^{N-1} P(r_n r_{n+1} \mid m_n m_{n+1})$$
The above formula gives the error correction probability of the order-adjusted candidate character string R relative to the candidate character string M before order adjustment, where $m_n$ and $m_{n+1}$ are adjacent characters in the candidate character string before order adjustment (the pair $m_k$, $m_{k+1}$ above), $r_n$ and $r_{n+1}$ are the corresponding adjacent characters after order adjustment, n indexes the order-adjusted character pairs, and N-1 is the number of character pairs in M that need order adjustment. That is, $P(R \mid M)$ equals the product of the order adjustment probabilities of all order-adjusted character pairs from M to R.
Depending on actual application requirements, the method of this step can also be applied directly to the original character string to obtain further candidate character strings.
In practical applications, this step may be implemented in a separate unit, such as a character sequencer.
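The order adjustment step can be sketched as a recursive walk that either keeps a character or swaps an adjacent pair listed in the table. The table `SWAPS` and the function `reorder` are hypothetical names; the 0.6 value is the gi -> ig probability from the earlier example. The sketch assumes swaps do not overlap and that a pair left unchanged contributes a factor of 1.

```python
# Hypothetical order-adjustment table: typed adjacent pair -> (intended pair, probability).
SWAPS = {"gi": ("ig", 0.6)}

def reorder(candidate):
    """Return {reordered string R: P(R|M)} using non-overlapping adjacent swaps."""
    results = {}

    def walk(pos, prefix, prob):
        if pos >= len(candidate):
            # Keep the best score if two swap choices produce the same string.
            results[prefix] = max(prob, results.get(prefix, 0.0))
            return
        # Option 1: leave the character at `pos` in place.
        walk(pos + 1, prefix + candidate[pos], prob)
        # Option 2: swap the adjacent pair starting at `pos`, if the table lists it.
        pair = candidate[pos:pos + 2]
        if pair in SWAPS:
            new_pair, p = SWAPS[pair]
            walk(pos + 2, prefix + new_pair, prob * p)

    walk(0, "", 1.0)
    return results

R = reorder("rgiht")
```

For the input "rgiht" the walk produces the unchanged string and "right", the latter with probability 0.6.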
S405, according to a character completion table counted in advance, completing the original character string, the candidate character string after character replacement and the characters in the candidate character string after character sequence adjustment to obtain corresponding candidate character strings; calculating error correction probability of each candidate character string when character complement is carried out;
Similarly, in this embodiment, the pre-counted character completion table records not only the character strings before and after completion but also the completion probability of each completed character string.
During completion, this embodiment may search for high-frequency words that have the candidate character string R as a prefix to obtain additional candidates, finally yielding a candidate character string set, which can be expressed as:
Based on a word frequency dictionary, the error correction probability P(F|R) of character completion can be calculated by maximum likelihood estimation and recorded in the character completion table, where F represents a candidate character string after character completion and R represents a candidate character string before character completion, namely a candidate character string after character order adjustment.
Similarly, according to the actual application requirement, the step can also complement the original character string and/or the candidate character string after character replacement to obtain some candidate character strings.
In practical applications, this step may be implemented in a separate unit, such as a character completer.
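A prefix search with a maximum likelihood estimate of P(F|R) could look like the following sketch. The dictionary `WORD_FREQ`, its counts, and the function `complete` are hypothetical; the estimate is taken as the relative frequency of a word among all dictionary words sharing the prefix, which is one plausible reading of the maximum likelihood estimation mentioned above. A linear scan is used for brevity; a trie would be the usual choice for a large dictionary.

```python
# Hypothetical word-frequency dictionary (word -> count in a corpus).
WORD_FREQ = {"same": 400, "sad": 300, "satisfactor": 100, "sat": 200, "hi": 50}

def complete(prefix, top_k=3):
    """Return up to top_k (word F, P(F|R)) pairs for words starting with prefix R."""
    matches = {w: c for w, c in WORD_FREQ.items() if w.startswith(prefix)}
    total = sum(matches.values())
    if total == 0:
        return []
    ranked = sorted(matches.items(), key=lambda item: item[1], reverse=True)
    # Maximum likelihood estimate: relative frequency among words sharing the prefix.
    return [(word, count / total) for word, count in ranked[:top_k]]

completions = complete("sa")
```

Restricting the result to the `top_k` most frequent words corresponds to searching only high-frequency completions.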
S406, for each candidate character string in all the obtained candidate character strings, generating the occurrence probability of the candidate character string according to the error correction probability of the candidate character string during character replacement, the error correction probability of the character sequence and the error correction probability of the character complement;
In this embodiment, the occurrence probability of a candidate character string may be calculated by scoring the candidate with a probability multiplication model or a log-linear model. For example, when generating the occurrence probability of each candidate character string, the error correction probability of character replacement obtained in step S403, that of character order adjustment obtained in step S404, and that of character completion obtained in step S405 may be directly multiplied to give the occurrence probability of the candidate character string; the weight coefficient of each sub-term is then 1. That is, the probability of each candidate character string F in the finally generated candidate character string set may be expressed as:
$$P(F) = P(F \mid R)\, P(R \mid M)\, P(M \mid G)$$
where P(F) represents the occurrence probability of the candidate character string.
Alternatively, a weighted sum of the logarithms of the sub-terms may be taken, specifically expressed by the following formula:
log(P(F))=a*log(P(F|R))+b*log(P(R|M))+c*log(P(M|G))
where a, b and c are weights of each item, respectively.
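Exponentiating the weighted log sum makes its relationship to the multiplication model explicit. The short derivation below uses only the symbols already defined; note that for weights other than 1 the result is best read as a ranking score rather than a normalized probability.

```latex
\begin{aligned}
\log P(F) &= a\,\log P(F \mid R) + b\,\log P(R \mid M) + c\,\log P(M \mid G) \\
\Rightarrow\quad P(F) &= P(F \mid R)^{a}\; P(R \mid M)^{b}\; P(M \mid G)^{c} \\
a = b = c = 1:\quad P(F) &= P(F \mid R)\, P(R \mid M)\, P(M \mid G)
\end{aligned}
```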
In practical applications, the occurrence probability of a candidate character string may also be computed from the error correction probabilities of the three sub-terms in other ways, which are not described in detail here.
This embodiment takes error correction by character replacement, character order adjustment and character completion together as an example, so the error correction probabilities of all three error correction modes are present. In practical applications, depending on the preset character error correction policy, only one or two error correction modes may be used; in that case the error correction probabilities of the unused modes are removed from the formula, and the calculation is otherwise similar and is not repeated here.
In practice, this step may be implemented in a separate unit, such as a candidate scorer.
S407, sequencing the plurality of candidate character strings according to the sequence from the big probability to the small probability;
Optionally, in practical applications, this step may be implemented in a separate unit, such as a sequencer.
S408, recommending a plurality of candidate character strings to the user according to the sorting order.
Specifically, when a plurality of candidate strings are recommended to the user in the ranking order, if the number of candidate strings is large, only the top N candidate strings may be recommended to the user.
With the above technical solution, the method of this embodiment can sequentially apply character replacement, character order adjustment and character completion to the original character string to obtain a plurality of candidate character strings, so that the character string the user intends to input is predicted from multiple angles. The occurrence probability of each candidate character string is then calculated, and the candidates are recommended to the user in descending order of occurrence probability, which helps ensure that the top-ranked candidates are those most likely to match the user's expectation. The technical solution of this embodiment can therefore screen candidate character strings effectively and recommend those that match the user's expectation, improving the input accuracy and the input efficiency of the input method.
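The overall flow of steps S403 to S408 can be sketched end to end as follows. All tables are hypothetical toy data, and each stage is deliberately simplified (at most one substitution and one swap per string) to keep the sketch short; it is an illustration under these assumptions rather than the implementation of the embodiment. Unmodified strings carry a factor of 1 at each stage, so the original input always ranks first in this toy version.

```python
import math

# All tables below are hypothetical toy data, not values from the source.
SUBSTITUTIONS = {"u": {"i": 0.1}}               # typed char -> {intended: P}
SWAPS = {"ih": ("hi", 0.6)}                      # typed pair -> (intended pair, P)
COMPLETIONS = {"hi": {"him": 0.5, "hit": 0.3}}   # prefix -> {word: P}

def replace_stage(g):
    """Candidates M with P(M|G); single-character substitutions only, for brevity."""
    out = {g: 1.0}
    for i, ch in enumerate(g):
        for new, p in SUBSTITUTIONS.get(ch, {}).items():
            out[g[:i] + new + g[i + 1:]] = p
    return out

def reorder_stage(m):
    """Candidates R with P(R|M); at most one adjacent swap, for brevity."""
    out = {m: 1.0}
    for i in range(len(m) - 1):
        if m[i:i + 2] in SWAPS:
            new, p = SWAPS[m[i:i + 2]]
            out[m[:i] + new + m[i + 2:]] = p
    return out

def complete_stage(r):
    """Candidates F with P(F|R); the uncompleted string is kept with factor 1."""
    return {r: 1.0, **COMPLETIONS.get(r, {})}

def recommend(g, n=3, weights=(1.0, 1.0, 1.0)):
    """Run the three stages, score with the log-linear model, return the top n."""
    a, b, c = weights
    scores = {}
    for m, p_m in replace_stage(g).items():
        for r, p_r in reorder_stage(m).items():
            for f, p_f in complete_stage(r).items():
                score = a * math.log(p_f) + b * math.log(p_r) + c * math.log(p_m)
                scores[f] = max(score, scores.get(f, -math.inf))
    return sorted(scores, key=scores.get, reverse=True)[:n]

top = recommend("hu")
```

For the input "hu" the sketch recommends "hu", "hi" and "him" in that order; changing `weights` shifts the relative influence of the three error correction modes.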
FIG. 5 is a schematic view of a fifth embodiment of the present application; as shown in fig. 5, the processing apparatus 500 of candidate character strings in the input method of the present embodiment includes:
A character string obtaining module 501, configured to obtain an input original character string based on coordinates of a plurality of input points input by a user;
The character error correction module 502 is configured to perform character error correction according to a preset character error correction policy based on the original character string, so as to obtain a plurality of candidate character strings.
The implementation principle and the technical effect of the processing device 500 for the candidate character strings in the input method by adopting the above modules are the same as those of the related method embodiments, and detailed description of the related method embodiments may be referred to and will not be repeated here.
FIG. 6 is a schematic view of a sixth embodiment of the present application; as shown in fig. 6, the processing device 500 for candidate character strings in the input method according to the present embodiment further describes the technical scheme of the present invention in more detail on the basis of the technical scheme of the embodiment shown in fig. 5.
Specifically, the character error correction module 502 is specifically configured to perform character error correction according to at least one of character replacement, character order adjustment, and character completion based on the original character string, so as to obtain a plurality of candidate character strings.
For example, in this embodiment, the character error correction module 502 may include at least one of the following:
The character replacer 5021 is configured to replace characters in the original character string according to a pre-counted character replacement table, so as to obtain a corresponding candidate character string;
The character sequencer 5022 is configured to adjust the order of the characters in the original character string and/or the candidate character string after character replacement according to a pre-counted character order table, so as to obtain a corresponding candidate character string; and/or
The character completer 5023 is configured to complete the characters in the original character string, the candidate character string after character replacement and/or the candidate character string after character order adjustment according to a pre-counted character completion table, so as to obtain a corresponding candidate character string.
Further alternatively, as shown in fig. 6, the processing apparatus 500 of the candidate character string in the input method of the present embodiment further includes:
a probability obtaining module 503, configured to obtain occurrence probabilities of candidate character strings;
a ranking module 504, configured to sort the plurality of candidate character strings in descending order of occurrence probability;
a recommending module 505, configured to recommend a plurality of candidate character strings to the user according to the sorting order.
Further alternatively, the probability obtaining module 503 includes:
an error correction probability acquiring unit 5031 configured to acquire an error correction probability corresponding to each candidate character string when character error correction processing is performed;
The occurrence probability generation unit 5032 is configured to generate, for each candidate character string, an occurrence probability of the corresponding candidate character string according to an error correction probability of the character error correction process.
Further alternatively, the error correction probability acquiring unit 5031 is configured to:
when character replacement error correction processing is carried out, acquiring error correction probability corresponding to character replacement error correction processing of each candidate character string according to replacement probability of replacement characters counted in advance;
when character sequence adjustment and error correction processing is carried out, according to the sequence adjustment probability of the sequence adjustment characters counted in advance, the error correction probability corresponding to the character sequence adjustment and error correction processing of each candidate character string is obtained; and/or
When character completion error correction processing is performed, the error correction probability corresponding to the character completion error correction processing of each candidate character string is obtained according to the pre-counted character completion probability.
Further alternatively, the character string obtaining module 501 includes:
the acquisition unit 5011 is used for acquiring a coordinate sequence input by a user, wherein the coordinate sequence comprises coordinates of a plurality of input points which are sequentially input;
the character generator 5012 is configured to obtain an input original character string according to a mapping relation between the coordinate area and the characters in the soft keyboard and coordinates of each input point.
The implementation principle and the technical effect of the processing device 500 for the candidate character strings in the input method by adopting the above modules are the same as those of the related method embodiments, and detailed description of the related method embodiments may be referred to and will not be repeated here.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
FIG. 7 is a block diagram of an electronic device implementing a method for processing candidate character strings in an input method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in FIG. 7, the electronic device includes: one or more processors 701, memory 702, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories and multiple types of memory. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 701 is illustrated in FIG. 7.
Memory 702 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method for processing candidate character strings in the input method provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute a method of processing candidate character strings in an input method provided by the present application.
The memory 702 is used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., related modules shown in fig. 5 and 6) corresponding to a method for processing candidate character strings in an input method according to an embodiment of the present application. The processor 701 executes various functional applications of the server and data processing, that is, implements the processing method of the candidate character string in the input method in the above-described method embodiment, by running the non-transitory software programs, instructions, and modules stored in the memory 702.
Memory 702 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by use of an electronic device implementing a processing method of candidate character strings in an input method, and the like. In addition, the memory 702 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 702 optionally includes memory remotely located with respect to processor 701, which may be connected via a network to an electronic device implementing the processing method of candidate strings in an input method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for implementing the processing method of the candidate character string in the input method may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or otherwise; connection by a bus is taken as an example in FIG. 7.
The input device 703, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, or a joystick, may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device implementing the method of processing candidate character strings in an input method. The output device 704 may include a display apparatus, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, an input original character string is obtained based on the coordinates of a plurality of input points entered by a user, and character error correction is performed on the original character string according to a preset character error correction strategy to obtain a plurality of candidate character strings. This addresses the technical problem in the prior art that predicted candidate words have poor accuracy: because error correction is performed on the character string at character granularity, the corrected candidate character strings better match the user's expectations, which can effectively improve the accuracy of the obtained candidate character strings and, in turn, the input accuracy and input efficiency of the input method. The technical solution of the present application can therefore also effectively improve the user's experience of the input method and strengthen user stickiness to the input method.
According to the technical solution of the embodiments of the present application, a coordinate sequence input by the user is collected, the coordinate sequence comprising the coordinates of a plurality of input points entered in sequence; the input original character string is then obtained according to the mapping relationship between coordinate areas and characters in the soft keyboard and the coordinates of each input point, which can ensure the accuracy of the collected original character string.
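The mapping from a coordinate sequence to an original character string can be illustrated with a minimal Python sketch. The key rectangles in `KEY_AREAS`, the function names `char_at` and `original_string`, and the choice to skip points that fall outside every key are assumptions made for illustration only; the application does not specify the keyboard geometry or how such points are handled.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical soft-keyboard layout: character -> (x_min, y_min, x_max, y_max).
# The values are illustrative and do not come from the application.
Rect = Tuple[float, float, float, float]
KEY_AREAS: Dict[str, Rect] = {
    "q": (0, 0, 10, 10),
    "w": (10, 0, 20, 10),
    "e": (20, 0, 30, 10),
    "r": (30, 0, 40, 10),
}


def char_at(x: float, y: float, key_areas: Dict[str, Rect] = KEY_AREAS) -> Optional[str]:
    """Return the character whose coordinate area contains (x, y), if any."""
    for ch, (x0, y0, x1, y1) in key_areas.items():
        # Half-open intervals so that a point on a shared edge maps to one key only.
        if x0 <= x < x1 and y0 <= y < y1:
            return ch
    return None


def original_string(points: Iterable[Tuple[float, float]],
                    key_areas: Dict[str, Rect] = KEY_AREAS) -> str:
    """Map a sequentially entered coordinate sequence to the original string."""
    chars = (char_at(x, y, key_areas) for x, y in points)
    # Assumption: points outside every key area are skipped.
    return "".join(ch for ch in chars if ch is not None)
```

For example, `original_string([(5, 5), (15, 5), (25, 5)])` yields `"qwe"` under this hypothetical layout.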
According to the technical solution of the embodiments of the present application, character error correction can be implemented by character replacement, character order adjustment, and/or character completion, expanding the original character string into a plurality of candidate character strings and thereby enriching the candidates. Because the character replacement, character order adjustment, and character completion modes all take actual usage scenarios into account, the corrected candidate character strings better match the user's expectations, which can effectively improve the accuracy of the obtained candidate character strings and, in turn, the input accuracy and input efficiency of the input method.
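The three error correction modes can be sketched in Python as follows. The table contents (`REPLACEMENTS`, `SWAPS`, `COMPLETIONS`) and all function names are hypothetical; in the application the tables are compiled from statistics in advance, and their actual format is not given here. The chaining follows the claims: order adjustment is applied to the original and replaced strings, and completion is applied to the original, replaced, and reordered strings.

```python
from typing import Dict, List, Set

# Hypothetical correction tables; real tables would be derived from statistics.
REPLACEMENTS: Dict[str, List[str]] = {"w": ["e"], "m": ["n"]}  # likely mistaps
SWAPS: Set[str] = {"ie", "ei"}                                 # pairs often transposed
COMPLETIONS: Dict[str, List[str]] = {"helo": ["hello"]}        # strings missing characters


def replace_chars(s: str) -> List[str]:
    """Replace one character at a time using the replacement table."""
    return [s[:i] + alt + s[i + 1:]
            for i, ch in enumerate(s)
            for alt in REPLACEMENTS.get(ch, [])]


def adjust_order(s: str) -> List[str]:
    """Swap adjacent character pairs that are listed in the swap table."""
    return [s[:i] + s[i + 1] + s[i] + s[i + 2:]
            for i in range(len(s) - 1)
            if s[i:i + 2] in SWAPS]


def complete(s: str) -> List[str]:
    """Look up completions for the whole string."""
    return list(COMPLETIONS.get(s, []))


def candidates(original: str) -> List[str]:
    """Expand the original string into de-duplicated candidate strings."""
    replaced = replace_chars(original)
    reordered = [c for s in [original] + replaced for c in adjust_order(s)]
    completed = [c for s in [original] + replaced + reordered for c in complete(s)]
    result: List[str] = []
    for c in replaced + reordered + completed:
        if c != original and c not in result:
            result.append(c)
    return result
```

With these illustrative tables, `candidates("hwlo")` returns `["helo", "hello"]` (replacement followed by completion) and `candidates("thier")` returns `["their"]` (order adjustment).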
According to the technical solution of the embodiments of the present application, the occurrence probability of each candidate character string can be obtained, the plurality of candidate character strings can be sorted in descending order of occurrence probability, and the candidate character strings can be recommended to the user in the sorted order. Candidate character strings with a higher occurrence probability, which better match the user's expectations, are thus recommended first, further improving the input accuracy and input efficiency of the input method.
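As a hedged illustration of the ranking step, the Python sketch below combines per-step error correction probabilities in the two ways recited in claim 1 (a product, or a weighted sum of logarithms) and sorts candidates in descending order. The function names, the example probabilities, and the use of equal weights are assumptions for illustration. Note that the weighted log sum is a score rather than a probability in the strict sense; with all weights equal to 1 it equals the logarithm of the product, so both variants produce the same ordering.

```python
import math
from typing import Dict, List, Sequence, Tuple


def product_probability(error_probs: Sequence[float]) -> float:
    """Multiply the error correction probabilities of all correction steps."""
    return math.prod(error_probs)


def weighted_log_score(error_probs: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Weighted sum of log error correction probabilities (higher is better)."""
    if len(error_probs) != len(weights):
        raise ValueError("error_probs and weights must have the same length")
    return sum(w * math.log(p) for p, w in zip(error_probs, weights))


def rank(candidate_probs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Sort candidates by product-based occurrence probability, highest first."""
    scored = [(cand, product_probability(probs))
              for cand, probs in candidate_probs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For instance, given hypothetical inputs `{"hello": [0.8, 0.5], "help": [0.9], "hells": [0.2, 0.5]}`, `rank` orders the candidates as `help`, `hello`, `hells`.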
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (16)

1. A method for processing candidate character strings in an input method, characterized by comprising:
acquiring an input original character string based on coordinates of a plurality of input points input by a user;
performing character error correction on the original character string according to a preset character error correction strategy to obtain a plurality of candidate character strings;
wherein the method further comprises:
acquiring the occurrence probability of each candidate character string;
wherein acquiring the occurrence probability of each candidate character string comprises:
for each candidate character string, generating the occurrence probability of the corresponding candidate character string according to the error correction probability of the character error correction processing;
wherein, for each candidate character string, generating the occurrence probability of the corresponding candidate character string according to the error correction probability of the character error correction processing comprises:
for each candidate character string, multiplying the error correction probabilities of the various error correction processes performed on the candidate character string, and taking the product as the occurrence probability of the candidate character string; or
taking the logarithms of the error correction probabilities of the various error correction processes performed on the candidate character string, and performing weighted summation to obtain the occurrence probability of the candidate character string.
2. The method of claim 1, wherein performing character error correction on the original character string according to a preset character error correction strategy to obtain a plurality of candidate character strings comprises:
performing character error correction on the original character string in at least one of the following modes: character replacement, character order adjustment, and character completion, to obtain a plurality of candidate character strings.
3. The method of claim 2, wherein performing character error correction on the original character string by character replacement comprises:
replacing characters in the original character string according to a character replacement table compiled from statistics in advance to obtain a corresponding candidate character string;
performing character error correction on the original character string by character order adjustment comprises:
adjusting the order of characters in the original character string and/or the candidate character string after character replacement according to a character order adjustment table compiled from statistics in advance to obtain a corresponding candidate character string; and/or
performing character error correction on the original character string by character completion comprises:
completing characters in the original character string, the candidate character string after character replacement, and/or the candidate character string after character order adjustment according to a character completion table compiled from statistics in advance to obtain a corresponding candidate character string.
4. A method according to any one of claims 1-3, wherein after obtaining the occurrence probability of each of the candidate character strings, the method further comprises:
sorting the plurality of candidate character strings according to the order of the occurrence probability from high to low;
recommending the candidate character strings to the user according to the sorting order.
5. The method of claim 4, wherein, for each candidate character string, before generating the occurrence probability of the corresponding candidate character string according to the error correction probability of the character error correction processing, the method further comprises:
acquiring the error correction probability corresponding to each candidate character string when character error correction processing is performed.
6. The method of claim 5, wherein acquiring the error correction probability corresponding to each candidate character string when character error correction processing is performed comprises:
when character replacement error correction processing is performed, acquiring the error correction probability corresponding to the character replacement error correction processing of each candidate character string according to replacement probabilities of replacement characters compiled from statistics in advance;
when character order adjustment error correction processing is performed, acquiring the error correction probability corresponding to the character order adjustment error correction processing of each candidate character string according to order adjustment probabilities of order-adjusted characters compiled from statistics in advance; and/or
when character completion error correction processing is performed, acquiring the error correction probability corresponding to the character completion error correction processing of each candidate character string according to character completion probabilities compiled from statistics in advance.
7. The method of any one of claims 1-3 and 5-6, wherein acquiring the input original character string based on coordinates of a plurality of input points input by the user comprises:
collecting a coordinate sequence input by the user, wherein the coordinate sequence comprises coordinates of a plurality of input points which are sequentially input;
acquiring the input original character string according to the mapping relationship between coordinate areas and characters in the soft keyboard and the coordinates of each input point.
8. An apparatus for processing candidate character strings in an input method, comprising:
a character string acquisition module, configured to acquire an input original character string based on coordinates of a plurality of input points input by a user;
a character error correction module, configured to perform character error correction on the original character string according to a preset character error correction strategy to obtain a plurality of candidate character strings;
wherein the apparatus further comprises:
a probability acquisition module, configured to acquire the occurrence probability of each candidate character string;
wherein the probability acquisition module comprises:
an occurrence probability generation unit, configured to generate, for each candidate character string, the occurrence probability of the corresponding candidate character string according to the error correction probability of the character error correction processing;
wherein the occurrence probability generation unit is configured to:
for each candidate character string, multiply the error correction probabilities of the various error correction processes performed on the candidate character string, and take the product as the occurrence probability of the candidate character string; or
take the logarithms of the error correction probabilities of the various error correction processes performed on the candidate character string, and perform weighted summation to obtain the occurrence probability of the candidate character string.
9. The apparatus of claim 8, wherein the character error correction module is configured to:
perform character error correction on the original character string in at least one of the following modes: character replacement, character order adjustment, and character completion, to obtain a plurality of candidate character strings.
10. The apparatus of claim 9, wherein the character error correction module comprises at least one of:
a character replacer, configured to replace characters in the original character string according to a character replacement table compiled from statistics in advance to obtain a corresponding candidate character string;
a character order adjuster, configured to adjust the order of characters in the original character string and/or the candidate character string after character replacement according to a character order adjustment table compiled from statistics in advance to obtain a corresponding candidate character string; and/or
a character completer, configured to complete characters in the original character string, the candidate character string after character replacement, and/or the candidate character string after character order adjustment according to a character completion table compiled from statistics in advance to obtain a corresponding candidate character string.
11. The apparatus according to any one of claims 8-10, wherein the apparatus further comprises:
a sorting module, configured to sort the plurality of candidate character strings in descending order of occurrence probability;
a recommending module, configured to recommend the plurality of candidate character strings to the user in the sorted order.
12. The apparatus of claim 11, wherein the probability acquisition module further comprises:
an error correction probability acquisition unit, configured to acquire the error correction probability corresponding to each candidate character string when character error correction processing is performed.
13. The apparatus according to claim 12, wherein the error correction probability acquisition unit is configured to:
when character replacement error correction processing is performed, acquire the error correction probability corresponding to the character replacement error correction processing of each candidate character string according to replacement probabilities of replacement characters compiled from statistics in advance;
when character order adjustment error correction processing is performed, acquire the error correction probability corresponding to the character order adjustment error correction processing of each candidate character string according to order adjustment probabilities of order-adjusted characters compiled from statistics in advance; and/or
when character completion error correction processing is performed, acquire the error correction probability corresponding to the character completion error correction processing of each candidate character string according to character completion probabilities compiled from statistics in advance.
14. The apparatus according to any one of claims 8-10 and 12-13, wherein the character string acquisition module comprises:
an acquisition unit, configured to collect a coordinate sequence input by the user, wherein the coordinate sequence comprises coordinates of a plurality of input points which are sequentially input;
a character generator, configured to acquire the input original character string according to the mapping relationship between coordinate areas and characters in the soft keyboard and the coordinates of each input point.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method of any one of claims 1-7.
CN202010304923.1A 2020-04-17 2020-04-17 Candidate character string processing method and device, electronic equipment and storage medium Active CN111665956B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010304923.1A CN111665956B (en) 2020-04-17 2020-04-17 Candidate character string processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111665956A CN111665956A (en) 2020-09-15
CN111665956B true CN111665956B (en) 2023-07-25

Family

ID=72382809

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010304923.1A Active CN111665956B (en) 2020-04-17 2020-04-17 Candidate character string processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111665956B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112684910A (en) * 2020-12-29 2021-04-20 维沃移动通信有限公司 Input method candidate word display method and device and electronic equipment
CN113190160A (en) * 2021-02-26 2021-07-30 清华大学 Input error correction method, computing device and medium for analyzing hand tremor false touch
CN113849071A (en) * 2021-09-10 2021-12-28 维沃移动通信有限公司 Character string processing method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103365573A (en) * 2012-03-27 2013-10-23 北京搜狗科技发展有限公司 Method and device for identifying multi-key input characters
CN106774970A (en) * 2015-11-24 2017-05-31 北京搜狗科技发展有限公司 The method and apparatus being ranked up to the candidate item of input method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0359761A (en) * 1989-07-27 1991-03-14 Matsushita Electric Ind Co Ltd Device for correcting spelling error of english word
JP4066507B2 (en) * 1998-05-11 2008-03-26 日本電信電話株式会社 Japanese character recognition error correction method and apparatus, and recording medium on which error correction program is recorded
CN101727271B (en) * 2008-10-22 2012-11-14 北京搜狗科技发展有限公司 Method and device for providing error correcting prompt and input method system
KR101263332B1 (en) * 2009-09-11 2013-05-20 한국전자통신연구원 Automatic translation apparatus by using user interaction in mobile device and its method
KR101381101B1 (en) * 2013-11-13 2014-04-02 주식회사 큐키 Error revising method through correlation decision between character strings
CN107102746B (en) * 2016-02-19 2023-03-24 北京搜狗科技发展有限公司 Candidate word generation method and device and candidate word generation device
CN110083819B (en) * 2018-01-26 2024-02-09 北京京东尚科信息技术有限公司 Spelling error correction method, device, medium and electronic equipment
CN110488990A (en) * 2019-08-12 2019-11-22 腾讯科技(深圳)有限公司 Input error correction method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A class of m-ary asymmetric symbol error correcting codes constructed by graph coloring;H. Kaneko;《Proceedings. 2001 IEEE International Symposium on Information Theory》;第225页 *
一种字符串纠错方法及其应用研究;李德银;《计算机应用与软件》;第41-45页 *

Also Published As

Publication number Publication date
CN111665956A (en) 2020-09-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant