CN109992120B - Input error correction method and device - Google Patents

Input error correction method and device

Publication number
CN109992120B
Authority
CN
China
Prior art keywords
input
character string
character
error
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711484183.9A
Other languages
Chinese (zh)
Other versions
CN109992120A (en)
Inventor
陈小帅
臧娇娇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201711484183.9A
Publication of CN109992120A
Application granted
Publication of CN109992120B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the application discloses an input error correction method and device. The method comprises: acquiring an input character string that a user has entered but not yet submitted to an interactive session; determining, from the character strings already displayed on screen in the interactive session, a target character string corresponding to the input character string; determining, according to the target character string, whether the input character string has an error character string; and, if so, determining an error correction candidate for the error character string according to the target character string. Because input errors are identified from the interactive context, an input error that is unrelated to that context can still be found, and error correction candidates provided, even when the input character string as a whole is semantically smooth. This widens the application range of input error correction and improves the user's input experience.

Description

Input error correction method and device
Technical Field
The present application relates to the field of input methods, and in particular, to an input error correction method and apparatus.
Background
When a user enters characters through an input method, the input method can correct errors in them. During error correction, whether a character has been entered incorrectly can be judged from the semantic relations, degrees of association, and the like among the characters in the sentence being entered, so that error correction candidates can be offered for characters that may have been entered incorrectly, helping the user correct them.
Although this conventional error correction method can make the input sentence smooth, it is not applicable to all input scenarios. One common input scenario is an interactive type of input scenario, such as online chat, forum postings, messages, and so on. In this interactive input scenario, the user may communicate, discuss, ask and answer with other users, that is, the content input by the user may be related to the content already appearing in the input scenario.
In this kind of input scenario, content entered by the user may have no semantic problem in itself and yet still be an input error. For example, in the social software chat interface shown in FIG. 1, user a and user b are interacting. User a has sent "do you want me?" and user b has entered "I do not look like you". Under the conventional error correction method described above, the content entered by user b has no problem, so no error correction candidate is prompted to user b. Given the preceding message "do you want me?", however, user b evidently meant to enter "I do not want you" as an interactive reply, not "I do not look like you", which has nothing to do with the preceding message. User b therefore mistakenly entered "like" in place of "want", but the conventional error correction method cannot detect this mistake.
Disclosure of Invention
In order to solve the above technical problem, the application provides an input error correction method and device that can still find input errors unrelated to the interactive context, and provide error correction candidates, even when the overall semantics of the input character string are smooth, thereby widening the application range of input error correction and improving the user's input experience.
The embodiment of the application discloses the following technical scheme:
In a first aspect, an embodiment of the present application provides an input error correction method, where the method includes:
acquiring an input character string which is input by a user and is not submitted to an interactive session;
determining a target character string corresponding to the input character string from the character string displayed on the screen in the interactive session;
determining whether the input character string has an error character string according to the target character string;
if yes, determining an error correction candidate item aiming at the error character string according to the target character string.
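The four steps above can be sketched as a small pipeline. This is only an illustrative skeleton under stated assumptions: the names `correct_input`, `Correction`, `latest`, `confusable_word`, `from_table`, and the `CONFUSABLE` table are all hypothetical and do not come from the application, and the toy stand-ins are far simpler than anything a real input method would use.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Correction:
    error_string: str      # part of the draft judged inconsistent with the target
    candidates: List[str]  # suggested replacements for that part


def correct_input(
    draft: str,
    on_screen: List[str],
    pick_target: Callable[[str, List[str]], Optional[str]],
    find_error: Callable[[str, str], Optional[str]],
    propose: Callable[[str, str, str], List[str]],
) -> Optional[Correction]:
    """Run the steps: unsubmitted draft -> target -> error check -> candidates."""
    # Step 2: choose the on-screen string the draft most likely responds to.
    target = pick_target(draft, on_screen)
    if target is None:
        return None
    # Step 3: look for an error using the target, not only the draft itself.
    error = find_error(draft, target)
    if error is None:
        return None
    # Step 4: derive error correction candidates from the target.
    return Correction(error, propose(draft, target, error))


# Toy stand-ins (hypothetical): the latest message is the target, and a word is
# flagged when a fixed confusion table maps it to a word found in the target.
CONFUSABLE = {"like": "want"}


def latest(draft: str, on_screen: List[str]) -> Optional[str]:
    return on_screen[-1] if on_screen else None


def confusable_word(draft: str, target: str) -> Optional[str]:
    target_words = target.rstrip("?").split()
    for word in draft.split():
        if CONFUSABLE.get(word) in target_words:
            return word
    return None


def from_table(draft: str, target: str, error: str) -> List[str]:
    return [CONFUSABLE[error]]


result = correct_input(
    "i do not look like you",
    ["what are you doing?", "nothing", "do you want me?"],
    latest, confusable_word, from_table,
)
```

Passing the three stages in as callables keeps the skeleton independent of how each stage is realised, which matches the way the optional features that follow refine individual steps.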
Optionally, the target character string is any one or a combination of more than one of the following:
one of the on-screen character strings having an on-screen time closer to an input time of the input character string in the interactive session;
an on-screen string in the interactive session that has not been replied to;
an on-screen character string in the interactive session that is semantically smooth.
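One possible way to combine the first two criteria is sketched below. The `OnScreenString` record, the `replied` flag, and the 120-second `window` are assumptions made for illustration; the application does not specify how closeness in time is measured, and the semantic-smoothness criterion is not modelled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OnScreenString:
    text: str
    shown_at: float        # on-screen time, in seconds
    replied: bool = False  # whether a reply has already been submitted


def select_targets(history: List[OnScreenString], input_time: float,
                   window: float = 120.0) -> List[OnScreenString]:
    """Return likely target strings, most recently displayed first."""
    # Criterion 1: on-screen time close to the input time (assumed window).
    recent = [s for s in history if 0.0 <= input_time - s.shown_at <= window]
    # Criterion 2: prefer strings that have not been replied to yet.
    unreplied = [s for s in recent if not s.replied]
    pool = unreplied if unreplied else recent
    return sorted(pool, key=lambda s: s.shown_at, reverse=True)


history = [
    OnScreenString("what are you doing?", shown_at=0.0, replied=True),
    OnScreenString("nothing", shown_at=5.0, replied=True),
    OnScreenString("do you want me?", shown_at=100.0),
]
targets = select_targets(history, input_time=110.0)
```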
Optionally, the determining whether the input character string has an error character string according to the target character string includes:
judging semantic correlation between the input character string and the target character string;
and if the semantic relevance is lower than a preset condition, determining that the input character string has an error character string.
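A minimal sketch of such a relevance check follows. It swaps in a bag-of-words cosine similarity for whatever semantic relevance measure an implementation would actually use, and the `threshold` of 0.5 is an arbitrary stand-in for the preset condition; both are assumptions, as are the function names.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two whitespace-tokenised strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def may_contain_error(draft: str, target: str, threshold: float = 0.5) -> bool:
    # Relevance below the preset condition suggests the draft has an error string.
    return cosine(draft, target) < threshold


suspect = may_contain_error("i do not look like you", "do you want me")
fine = may_contain_error("i do not want you", "do you want me")
```

With these toy inputs the reply containing "look like" scores below the threshold and the reply containing "want" scores above it, but a word-overlap measure is only a crude proxy for semantic relevance.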
Optionally, after determining that the input string has an error string, the method further includes:
if the sub-input codes in the input codes of the input character strings and the sub-codes in the codes corresponding to the target character strings meet the code similarity condition, determining the characters corresponding to the sub-input codes as the error character strings; or,
and if the sub-input characters of the input character string and the sub-characters of the target character string accord with the similar composition condition, determining the sub-input characters as the error character string.
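The code-similarity branch can be illustrated with a Chinese pinyin input method, where the input code of a character is its pinyin. The sketch below assumes, purely for illustration, the homophone pair 想 ("want") and 像 ("look like"), which share the toneless code `xiang`; the `PINYIN` table, the `min_ratio` threshold, and the function names are hypothetical, and the similar-composition (glyph shape) branch is not covered.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# Hypothetical input-code table (toneless pinyin) for the characters used here.
PINYIN = {"我": "wo", "不": "bu", "像": "xiang", "你": "ni", "想": "xiang", "吗": "ma"}


def similar(code_a: str, code_b: str, min_ratio: float = 0.8) -> bool:
    """Assumed code similarity condition: high character-level overlap."""
    return SequenceMatcher(None, code_a, code_b).ratio() >= min_ratio


def locate_errors(draft: str, target: str) -> List[Tuple[str, str]]:
    """Pair each suspicious draft character with the target character it resembles."""
    pairs = []
    for ch in draft:
        if ch in target:
            continue  # characters shared verbatim with the target are not suspicious
        for tch in target:
            if similar(PINYIN.get(ch, ch), PINYIN.get(tch, tch)):
                pairs.append((ch, tch))
    return pairs


pairs = locate_errors("我不像你", "你想我吗")
```

Each returned pair holds the error character and the target character whose code it matches, so the second element can serve directly as a character to be evaluated as an error correction candidate.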
Optionally, the determining a candidate for error correction for the error character string according to the target character string includes:
determining a correction candidate item corresponding to the error character string according to the character corresponding to the subcode; or,
and determining an error correction candidate item corresponding to the error character string according to the sub-character.
Optionally, the determining, according to the character corresponding to the sub-code, the error correction candidate corresponding to the error character string includes:
determining a language model score of an input character string which replaces the error character string with a character to be determined, wherein the character to be determined is a character corresponding to the sub-code;
determining the correlation probability of the undetermined character and the target character string;
if the language model score and the correlation probability determine that the undetermined character meets an error correction condition, determining the undetermined character as the error correction candidate item;
or, the determining the error correction candidate item corresponding to the error character string according to the sub-character includes:
determining a language model score of an input character string which replaces the error character string with a character to be determined, wherein the character to be determined is the sub-character;
determining the correlation probability of the undetermined character and the target character string;
and if the language model score and the correlation probability determine that the undetermined character meets an error correction condition, determining the undetermined character as the error correction candidate item.
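The candidate check described above might look like the following sketch. The application does not say how the language model score and the correlation probability are combined, so requiring both to clear separate thresholds is an assumption, as are the toy bigram table, the `relevance` heuristic, and every name and number used.

```python
from typing import Dict, List, Tuple

# Hypothetical bigram log-probabilities standing in for a real language model.
BIGRAM_LOGP: Dict[Tuple[str, str], float] = {
    ("i", "do"): -1.0,
    ("do", "not"): -1.0,
    ("not", "want"): -2.0,
    ("want", "you"): -2.0,
}


def lm_score(tokens: List[str], unseen: float = -10.0) -> float:
    """Sum of bigram log-probabilities; unseen bigrams get a fixed penalty."""
    return sum(BIGRAM_LOGP.get(pair, unseen) for pair in zip(tokens, tokens[1:]))


def relevance(candidate: str, target_tokens: List[str]) -> float:
    """Toy correlation probability: high if the candidate occurs in the target."""
    return 1.0 if candidate in target_tokens else 0.1


def accept(draft: List[str], error_index: int, candidate: str,
           target_tokens: List[str], min_lm: float = -8.0,
           min_rel: float = 0.5) -> bool:
    # Replace the error string with the character to be determined, then score.
    replaced = draft[:error_index] + [candidate] + draft[error_index + 1:]
    return (lm_score(replaced) >= min_lm
            and relevance(candidate, target_tokens) >= min_rel)


draft = ["i", "do", "not", "like", "you"]
target = ["do", "you", "want", "me"]
ok = accept(draft, 3, "want", target)
rejected = accept(draft, 3, "wand", target)
```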
Optionally, the determining whether the input character string has an error character string according to the target character string includes:
determining intelligent reply content corresponding to the target character string;
and if the similarity of the input character string and the intelligent reply content accords with a reply similarity condition, determining that the input character string has an error character string.
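A rough sketch of the intelligent-reply check follows. The reply similarity condition is not defined in the text, so the version here assumes it means "close to the suggested reply but not identical to it", measured with a token-level Jaccard index; that reading, the `low` bound, and the function names are all hypothetical, and the intelligent reply is simply given as a string rather than generated by a model.

```python
from typing import Set


def tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings, in the range 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def flagged_by_reply(draft: str, smart_reply: str, low: float = 0.5) -> bool:
    # Assumed condition: the draft resembles the suggested reply without
    # matching it exactly, which hints at a small slip somewhere in the draft.
    return low <= jaccard(draft, smart_reply) < 1.0


near_miss = flagged_by_reply("i do not look like you", "i do not want you")
exact = flagged_by_reply("i do not want you", "i do not want you")
unrelated = flagged_by_reply("see you tomorrow", "i do not want you")
```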
Optionally, after determining that the input string has an error string, the method further includes:
if the sub-input codes in the input codes of the input character strings and the sub-codes in the codes corresponding to the intelligent reply content meet the code similarity condition, determining the characters corresponding to the sub-input codes as the error character strings; or,
and if the sub-input characters of the input character string and the sub-characters of the intelligent reply content accord with the similar composition condition, determining the sub-input characters as the error character string.
Optionally, the determining, according to the target character string, an error correction candidate for the error character string includes:
determining an error correction candidate item corresponding to the error character string according to the character corresponding to the subcode; or,
and determining the error correction candidate item corresponding to the error character string according to the sub-character.
In a second aspect, an embodiment of the present application provides an input error correction apparatus, where the apparatus includes:
an input character string acquisition unit, configured to acquire an input character string that is input by a user and that has not been submitted to an interactive session;
the target character string determining unit is used for determining a target character string corresponding to the input character string from the character string which is displayed on the screen in the interactive session;
an error string determining unit configured to determine whether the input string has an error string based on the target string;
and the error correction candidate determining unit is used for determining the error correction candidate aiming at the error character string according to the target character string when the input character string has the error character string.
Optionally, the target character string is any one or a combination of more than one of the following:
one of the displayed strings having a closer on-screen time to an input time of the input string in the interactive session;
an on-screen string in the interactive session that has not been replied to;
and the semantically smooth character strings on the screen in the interactive session.
Optionally, the error string determining unit includes:
a semantic correlation judging subunit, configured to judge semantic correlation between the input character string and the target character string;
and the first error character string determining subunit is used for determining that the input character string has an error character string if the semantic relevance is lower than a preset condition.
Optionally, the apparatus further comprises:
a first error character string determining unit, configured to determine a character corresponding to a sub-input code as the error character string if it is obtained that the sub-input code in the input code of the input character string and the sub-code in the code corresponding to the target character string conform to a code similarity condition; or,
and the second error character string determining unit is used for determining the sub-input characters of the input character string as the error character string if the sub-input characters of the input character string and the sub-characters of the target character string accord with a similar composition condition.
Optionally, the error correction candidate determining unit includes:
a first error correction candidate determining subunit, configured to determine an error correction candidate corresponding to the error character string according to a character corresponding to the subcode; or,
and the second error correction candidate determining subunit is used for determining the error correction candidate corresponding to the error character string according to the sub-character.
Optionally, the first error correction candidate determining subunit includes:
a first language model score determining submodule, configured to determine a language model score of an input character string in which the incorrect character string is replaced with a character to be determined, where the character to be determined is a character corresponding to the subcode;
a first correlation probability determination submodule, configured to determine a correlation probability between the undetermined character and the target character string;
a first error correction candidate determining sub-module, configured to determine the undetermined character as the error correction candidate if the language model score and the correlation probability determine that the undetermined character satisfies an error correction condition;
or, the second error correction candidate determining subunit includes:
a second language model score determining submodule for determining a language model score of an input character string in which the wrong character string is replaced with a character to be determined, the character to be determined being the sub-character;
the second correlation probability determination submodule is used for determining the correlation probability of the undetermined character and the target character string;
and the second error correction candidate determining sub-module is used for determining the undetermined character as the error correction candidate if the language model score and the correlation probability determine that the undetermined character meets the error correction condition.
Optionally, the error string determining unit includes:
the intelligent reply determining subunit is used for determining the intelligent reply content corresponding to the target character string;
and the second error character string determining subunit is used for determining that the input character string has an error character string if the similarity between the input character string and the intelligent reply content meets the reply similarity condition.
Optionally, the apparatus further comprises:
a third error character string determining unit, configured to determine, if it is obtained that a sub-input code in an input code of the input character string and a sub-code in a code corresponding to the intelligent reply content conform to a code similarity condition, a character corresponding to the sub-input code as the error character string; or,
and the fourth error character string determining unit is used for determining the sub-input characters of the input character string as the error character string if the sub-input characters of the input character string and the sub-characters of the intelligent reply content accord with the similar composition condition.
Optionally, the error correction candidate determining unit includes:
a third error correction candidate determining subunit, configured to determine an error correction candidate corresponding to the error character string according to the character corresponding to the subcode; or,
and the fourth error correction candidate determining subunit is used for determining the error correction candidate corresponding to the error character string according to the sub-character.
In a third aspect, an apparatus for input error correction is provided, the apparatus comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring an input character string which is input by a user and is not submitted to an interactive session;
determining a target character string corresponding to the input character string from the character string which is displayed on the screen in the interactive session;
determining whether the input character string has an error character string according to the target character string;
if yes, determining an error correction candidate item aiming at the error character string according to the target character string.
In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium, wherein instructions of the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform one or more of the methods for error correction of inputs.
According to the above technical solution, in an interactive session, for an input character string that a user has entered but not yet submitted to the interactive session, a target character string that may be related to the input character string can be determined from the character strings already displayed on screen in the interactive session. Because the target character string is the interactive context preceding the input character string, and the input character string may be the continuation of the interaction that follows the target character string, the content expressed by the input character string may be related to the content expressed by the target character string. Whether the input character string has an error character string can therefore be determined according to the target character string and, if it does, an error correction candidate for the error character string can also be determined according to the target character string. Because input errors are identified from the interactive context, an input error that is unrelated to that context can still be found, and error correction candidates provided, even when the input character string as a whole is semantically smooth. This widens the application range of input error correction and improves the user's input experience.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an exemplary scenario application provided in an embodiment of the present application;
FIG. 2a is a schematic diagram of another exemplary scenario application provided by an embodiment of the present application;
FIG. 2b is a schematic diagram of another exemplary scenario application provided by an embodiment of the present application;
FIG. 3 is a flowchart of an input error correction method according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating locations of error correction candidates according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for determining error correction candidates corresponding to an error string according to characters corresponding to sub-codes according to an embodiment of the present application;
FIG. 6 is a flowchart of a method for determining error correction candidates corresponding to an error string according to sub-characters according to an embodiment of the present application;
FIG. 7 is a flowchart of yet another input error correction method provided by an embodiment of the present application;
FIG. 8 is a block diagram of an input error correction apparatus according to an embodiment of the present application;
FIG. 9 is a block diagram of an input error correction apparatus according to an embodiment of the present application;
FIG. 10 is a block diagram of a server for error correction of an input according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
The inventor finds in research that the conventional error correction method can make the input sentence smooth, but is not suitable for all input scenarios. One common input scenario is an interactive type of input scenario, such as online chat, forum postings, messages, and so on. In this interactive input scenario, the user may communicate, discuss, ask and answer with other users, that is, the content input by the user may be related to the content already appearing in the input scenario.
In such input scenarios, content entered by the user may have no semantic problem and yet still be an input error, and such an error cannot be found by the conventional error correction method. That is, the conventional error correction method is limited to the semantics of the input character string itself and cannot correct errors according to the interactive context, so its range of use is limited and, in some scenarios, it cannot deliver a good user experience.
Therefore, the embodiment of the present application provides a solution to the above problem, which can still find out an input error that may be irrelevant to the interactive context and provide a candidate for error correction under the condition that the whole semantics of the input character string is smooth, thereby improving the application range of input error correction and the input experience of the user.
In the input error correction method provided by this embodiment, in an interactive session, for an acquired input character string that a user has entered but not yet submitted to the interactive session, a target character string that may be related to the input character string can be determined from the character strings already displayed on screen in the interactive session. Whether the input character string has an error character string can then be determined according to the target character string and, if it does, an error correction candidate for the error character string can also be determined according to the target character string. Because input errors are identified from the interactive context, an input error that is unrelated to that context can still be found, and error correction candidates provided, even when the input character string as a whole is semantically smooth. This widens the application range of input error correction and improves the user's input experience.
Taking the interactive session shown in FIG. 1 as an example, in the embodiment of the present application the input character string "I do not look like you", which the user has entered but not yet submitted to the interactive session, is obtained first. Then, from the character string "do you want me" already displayed on screen in the interactive session, the target character string "do you want me" corresponding to the input character string "I do not look like you" is determined, and whether the input character string "I do not look like you" has an error character string is determined according to the target character string "do you want me". Since the input character string is an interactive reply to the target character string, the user should actually have entered "I do not want you" rather than "I do not look like you", which has nothing to do with the preceding text. It can therefore be determined that the input character string has an error character string, and an error correction candidate can be determined according to the target character string "do you want me"; that is, the error correction candidate "want" is provided for "like" in the input character string. In this way, even when the overall semantics of the input character string are smooth, input errors that are unrelated to the interactive context can still be found and error correction candidates provided, which widens the application range of input error correction and improves the user's input experience.
The interactive session mentioned in this embodiment may be an interactive session in a scenario where a user may communicate, discuss, ask for a question, and the like with other users, such as online chat, forum post, and message. The above mentioned interaction in this embodiment is the interaction content already existing in the interaction session. It should be noted that the above interaction includes not only the interaction content displayed in the current interaction session, but also the historical interaction content of both interaction parties that are not displayed in the current interaction session. For example, a user a and a user b use an instant messaging tool to chat, during the chat, the user a clears the chat record of the user a and the user b, the interaction record retained in the current chat interface of the user a and the user b is as shown in fig. 2a, and the above interaction mentioned in this embodiment includes not only the character string "what do you are in the area 201? "," nothing "and" do you want me ", also includes the historical interactive content of user a and user b, i.e. the part of the interactive content that user a empties.
The on-screen character string in this embodiment refers to a character string that both interaction parties have submitted to the interaction session. The character string is submitted to the interactive session, namely, the user triggers a submitting operation on the character string in the screen area of the input method, so that the character string is displayed in the interactive session. The on-screen character string in this embodiment may be a character string that any user in the interactive session has submitted into the interactive session. See, for example, fig. 2a, where the string in area 201 "what do you are? "," nothing to do "and" do you want me "are all the on-screen character strings mentioned in this embodiment.
The input character string mentioned in this embodiment refers to a character string that is currently edited by the user and is already displayed in the screen area of the input method, but is not submitted to the interactive session. For example, referring to fig. 2a, in which the area 202 is the upper screen area of the input method, the character string "i do not look like you" in the area 202 is the input character string mentioned in this embodiment, and since the input character string is not submitted to the interactive session, the user who inputs the input character string can perform editing operations such as modifying, deleting, etc. on the character string.
It should be noted that although the input character string shown in fig. 2a is a chinese kanji, fig. 2a is only an exemplary illustration. The input character string mentioned in this embodiment may be a character input by a user using various input methods. When the input method is a Chinese input method, the input character string is a Chinese character, when the input method is a Korean input method, the input character string is a Korean character, and when the input method is a Japanese input method, the input character string is a Japanese character.
The target character string mentioned in the present embodiment is included in the character string that has been already screened into the interactive session. The target string may be all of the strings that have been screened into the interactive session or may be a partial string that has been screened into the interactive session. It should be noted that the number of the character strings that have been displayed in the interactive session may be relatively large, and the target character string in this embodiment refers to a character string that may have a semantic relation with the input character string among the character strings that have been displayed in the interactive session. For example, referring to fig. 2a, in the interactive session shown in fig. 2a, the target character string may be one or more of the character strings "what you are doing", "what do not do" and "do you want to do me" that have been screened into the interactive session, and the character strings "do you want me" and the input character string "do i not like you" may have semantic relation, so the character string "do you want me" that has been screened into the interactive session is the target character string mentioned in the present embodiment. As another example, as can be seen in fig. 2b, in the interactive session shown in fig. 2b, the target string may be one or more of the strings "do you want me", "why you do not want me" that have been uploaded into the interactive session, whereas the strings "do you want me" and "why you do not want me" that have been uploaded into the interactive session may each have a semantic connection with the input string "i do not look like you", and thus, the strings "do you want me" and "why you do not want me" that have been uploaded into the interactive session are the target strings mentioned in this embodiment.
The error character string in this embodiment refers to all or part of the input character string, and it contains an erroneous character that is irrelevant to the interaction context. When the input character string is English, the error character string may be one or more English letters or one or more English words; when the input character string is Chinese, the error character string may be one or more Chinese characters or one or more Chinese words; when the input character string is Korean, the error character string may be one or more Korean characters or one or more Korean words; when the input character string is Japanese, the error character string may be one or more Japanese characters or one or more Japanese words. Taking the interactive session shown in fig. 2a as an example, the error character string may be the character "like" in the input character string "i do not look like you", which is irrelevant to the interaction context; it may also be the characters "do not look like" in that input character string; and it may also be the characters "like you" in that input character string.
The error correction candidate mentioned in this embodiment refers to the correct input characters corresponding to an error character string in the input character string. Generally, the number of characters in an error correction candidate is the same as the number of characters in the corresponding error character string. Taking the interactive session shown in fig. 2a as an example, the error correction candidate is "want", the correct input character corresponding to the error character string "like" in the input character string "i do not look like you".
For convenience of description, the following embodiments are mainly described using the example in which both the input character string and the interaction context consist of Chinese characters.
Referring to fig. 3, fig. 3 is a flowchart of an input error correction method provided in this embodiment.
The input error correction method provided by the embodiment comprises the following steps:
S301: Obtaining an input character string that has been input by the user but not yet submitted to the interactive session.
It should be noted that the method provided by this embodiment is an input error correction method for an interactive session. Therefore, before error correction is performed on the input character string in the interactive session, the input character string in the interactive session needs to be acquired first, so as to further determine whether error correction is required on the input character string.
This embodiment does not specifically limit how the input character string is acquired. As an example, the input character string input by the user in the interactive session may be acquired through the input method system.
S302: Determining a target character string corresponding to the input character string from the character strings that have been displayed on screen in the interactive session.
It should be noted that the user may communicate, discuss, ask questions and so on with other users in the interactive session; that is, the content of the input character string input by the user may be related to the content of the character strings already displayed on screen in the interactive session.
Thus, a target character string corresponding to the input character string may be determined from the character strings already on screen in the interactive session. In this embodiment, the character strings already displayed on screen in the interactive session may be obtained through the input method system.
It should be noted that, on the one hand, the interactive content in an interactive session is time-sensitive in practical applications. That is, the on-screen character strings in the interactive session have different on-screen times, and their content correlation with the input character string differs accordingly. In general, the longer the interval between the on-screen time of an on-screen character string and the input time of the input character string, the lower their content correlation; the shorter that interval, the higher their content correlation. For example, a character string displayed in the interactive session two days ago may have no connection with the content of the input character string, whereas the input character string currently being entered may be a targeted reply to a character string displayed within the last two minutes, so the content of that recent character string is highly correlated with the content of the input character string.
On the other hand, in an interactive input scenario the user may communicate, discuss, and exchange questions and answers with other users. The input character string currently entered by the user may therefore be a targeted reply to an on-screen character string that poses a question, especially one that has not yet been replied to. Thus, the input character string currently entered by the user may be highly correlated with an on-screen character string in the interactive session that has not been replied to. For example, in the comment area of a forum, a user raises a question that no other user has answered. Although the question was posted long before the current time, the current user is quite likely to reply to it when browsing the comment area. That is, the content of the input character string currently entered by the user may be a targeted reply to the unreplied on-screen character string, so the correlation between the two may be high.
In yet another aspect, in practical applications some of the character strings displayed in the interactive session may not be semantically fluent, and such character strings are quite likely to contain wrong characters themselves. If a semantically non-fluent character string is taken as the target character string, an incorrect error correction candidate may be offered even though the input character string was entered correctly. For example, the interactive session includes the displayed character string "weather forecast says peaceful day is good" and the input character string currently entered by the user is "the day can go to step on". The displayed character string is not semantically fluent and includes the error character string "peaceful day", while the input character string is entered correctly, so this displayed character string should not be used as the target character string. Therefore, in order to improve the accuracy of input error correction, in this embodiment the target character string may be an on-screen character string whose on-screen time is close to the input time of the input character string; it may be an on-screen character string that has not been replied to in the interactive session; it may be an on-screen character string that is semantically fluent; or it may satisfy any one or a combination of these three conditions.
The on-screen time of an on-screen character string mentioned in this embodiment refers to the time at which the user who wrote it submitted it to the interactive session. Taking the interactive session shown in fig. 2a as an example, the user c submits the on-screen character string "what you are doing" to the interactive session at 11.
The input time of the input character string in this embodiment is the time at which the user places the currently input character string in the on-screen area of the input method. Taking the interactive session shown in fig. 2a as an example, the user d at 11.
In this embodiment, when acquiring an on-screen character string, the input method system may acquire its on-screen time at the same time, and may also acquire the input time of the input character string. By comparing the input time of the input character string with the on-screen time of each on-screen character string, the system selects the on-screen times that are close to the input time and uses the corresponding on-screen character strings as the target character string. Taking the interactive session shown in fig. 2a as an example, the on-screen time of the on-screen character string "what you are doing" is 11.
The present embodiment does not specifically limit the time interval between the screen-up time of the target character string and the input time of the input character string, and the time interval may be specifically set according to an application scenario of the interactive session. For example, when the application scenario is online chat, the time interval may be set to be short, for example, 30 minutes, and when the time interval between the screen-up time of the on-screen character string and the input time of the input character string is within 30 minutes, the on-screen character string may be considered as an on-screen character string closer to the input time of the input character string; when the application scene is forum posting and leaving a message, the time interval may be set to be longer, for example, 2 days, and when the time interval between the screen-on time of the screened character string and the input time of the input character string is within 2 days, the screened character string may be considered as the screened character string closer to the input time of the input character string.
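The selection rules described for S302 can be illustrated with a minimal sketch. It assumes that each on-screen character string carries its on-screen time and a flag saying whether it has been replied to; the names `OnScreenString`, `replied` and `select_target_strings` are hypothetical and do not come from this embodiment, and the semantic-fluency check is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class OnScreenString:
    text: str
    on_screen_time: datetime
    replied: bool = True  # hypothetical flag: has anyone replied to this string?


def select_target_strings(history: List[OnScreenString],
                          input_time: datetime,
                          max_interval: timedelta,
                          include_unreplied: bool = True) -> List[OnScreenString]:
    """Keep strings that are recent enough, or (optionally) still unreplied."""
    targets = []
    for item in history:
        age = input_time - item.on_screen_time
        recent = timedelta(0) <= age <= max_interval
        unreplied = include_unreplied and not item.replied
        if recent or unreplied:
            targets.append(item)
    return targets
```

For an online chat scenario, `max_interval` might be `timedelta(minutes=30)`; for forum posts it might be `timedelta(days=2)`, following the examples given above.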
S303: Determining, according to the target character string, whether the input character string contains an error character string.
The user may communicate, discuss, ask questions and so on with other users in the interactive session; that is, the content of the input character string input by the user may be related to the content of the character strings already on screen in the interactive session. Therefore, when determining whether the input character string needs error correction, it is possible to first determine from the target character string whether the input character string contains an error character string.
S304: If so, determining an error correction candidate for the error character string according to the target character string.
In this embodiment, after it is determined that the input character string contains an error character string, a corresponding error correction candidate may be provided for that error character string according to the target character string, so that the user can replace the error character string in the input character string with the error correction candidate. For example, after it is determined that the input character string "i do not look like you" contains an error character string, the corresponding error correction candidate "want" may be provided for the error character string "like" according to the target character string "do you want me".
In this embodiment, the error correction candidate may be displayed at any position of the input method interface, and this embodiment does not specifically limit its display position. As an example, the display position may be as shown in fig. 4; that is, the error correction candidate for an error character string is displayed above that error character string, and the user may replace the error character string in the input character string by clicking the error correction candidate.
The input error correction method provided by this embodiment determines input errors according to the interaction context. Even when the input character string is semantically fluent as a whole, an input error that is irrelevant to the interaction context can still be found and an error correction candidate can be provided. This widens the range of application of input error correction and improves the user's input experience.
In an example, when S303 is specifically implemented, semantic relevance between the input character string and the target character string may be determined, and if the semantic relevance is lower than a preset condition, it is determined that an error character string exists in the input character string.
The semantic correlation between the input character string and the target character string mentioned in this embodiment refers to the degree of semantic association between the two. When the semantic correlation is relatively high, the degree of association between the input character string and the target character string can be considered relatively high, and the content of the input character string may be a targeted response to the content of the target character string; when the semantic correlation is relatively low, the degree of association can be considered relatively low, and the input character string is likely to contain erroneously input characters.
The preset condition mentioned in this embodiment may be used as a boundary for determining the semantic correlation between the input character string and the target character string, and when the semantic correlation between the input character string and the target character string is not lower than the preset condition, it may be considered that the degree of association between the input character string and the target character string is relatively high; when the semantic correlation between the input character string and the target character string is lower than a preset condition, the semantic correlation between the input character string and the target character string may be considered to be relatively low.
The preset condition is not specifically limited in this embodiment, and the preset condition may be set according to a specific scene of the interactive session.
The embodiment does not specifically limit a specific implementation manner for obtaining the semantic correlation between the input character string and the target character string. As an example, a correlation probability between the input character string and the target character string may be calculated by a machine learning model or a deep learning model, and the semantic correlation between the input character string and the target character string is characterized by the correlation probability. Wherein the machine learning model and the deep learning model are obtained by training based on a large number of historical input character strings and historical target character strings.
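The threshold test described for S303 can be sketched as follows. This is only an illustration: the trained model is replaced by a caller-supplied scoring function, `toy_relevance` is a crude word-overlap stand-in for it, and taking the maximum over several target character strings is an assumption of this sketch rather than something stated in the embodiment.

```python
from typing import Callable, Iterable


def toy_relevance(a: str, b: str) -> float:
    """Stand-in for a trained model: Jaccard overlap of whitespace-separated tokens."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def has_error_string(input_string: str,
                     target_strings: Iterable[str],
                     relevance: Callable[[str, str], float],
                     threshold: float) -> bool:
    """Flag the input when its best relevance to any target is below the threshold."""
    scores = [relevance(input_string, t) for t in target_strings]
    if not scores:
        return False  # no target string, so nothing to judge against
    return max(scores) < threshold
```

In practice `relevance` would be the correlation probability produced by the machine learning or deep learning model mentioned above, and `threshold` would play the role of the preset condition.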
In this embodiment, after it is determined from the semantic correlation between the input character string and the target character string that the input character string contains an error character string, the error character string itself may be located in either of the following two ways.
The first way: if a sub-input code in the input code of the input character string and a sub-code in the code corresponding to the target character string meet the code similarity condition, the character corresponding to that sub-input code is determined as the error character string.
It can be understood that, when a user inputs a character string by using the input method system, the user first enters the pinyin letters corresponding to the character string through an input tool provided by the input method system, and then selects the character string to be placed on screen from the candidate words that the input method system offers for those pinyin letters. The pinyin letters corresponding to the input character string are the input code mentioned in this embodiment. The input code may be the complete pinyin of the input character string or an incomplete pinyin. For example, if the input character string is "i do not look like you", the input code may be "wobuxiangni", "wobuxiangn", or "wbxn".
The sub-input codes are the phonetic alphabets corresponding to the characters in the input character string, and the sub-input codes are subsets of the input codes.
In one implementation of this embodiment, a sub-input code in the input code of the input character string and a sub-code in the code corresponding to the target character string meet the code similarity condition when the full spelling or the short spelling of the sub-input code is the same as that of the sub-code. For example, if the full spelling of the sub-input code is "baoyou" and the full spelling of the sub-code is also "baoyou", the sub-input code and the sub-code can be considered to meet the code similarity condition. For another example, if the short spelling of the sub-input code is "by" and the short spelling of the sub-code is also "by", the sub-input code and the sub-code can be considered to meet the code similarity condition.
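The "same full spelling or same short spelling" test can be sketched in a few lines. The sketch assumes that a sub-code is available as a list of pinyin syllables and that the short spelling is simply the first letter of each syllable; both the function names and that definition of short spelling are assumptions made for illustration.

```python
from typing import List


def full_spelling(syllables: List[str]) -> str:
    """Concatenate syllables, e.g. ["bao", "you"] -> "baoyou"."""
    return "".join(syllables)


def short_spelling(syllables: List[str]) -> str:
    """First letter of each syllable, e.g. ["bao", "you"] -> "by"."""
    return "".join(s[0] for s in syllables if s)


def same_code(sub_input_code: str, sub_code_syllables: List[str]) -> bool:
    """True when the typed code equals the sub-code's full or short spelling."""
    return sub_input_code in (full_spelling(sub_code_syllables),
                              short_spelling(sub_code_syllables))
```

With the example from the text, both `same_code("baoyou", ["bao", "you"])` and `same_code("by", ["bao", "you"])` hold.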
In another implementation of this embodiment, a sub-input code in the input code of the input character string and a sub-code in the code corresponding to the target character string meet the code similarity condition when the full spelling or the short spelling of the sub-input code is similar to that of the sub-code. For example, if the full spelling of the sub-input code is "ningtian" and the full spelling of the sub-code is "mingtian", that is, the two are identical except for the first letter, the sub-input code and the sub-code can be considered to meet the code similarity condition. For another example, if the short spelling of the sub-input code is "ningt" and the short spelling of the sub-code is "mingt", that is, the two are again identical except for the first letter, the sub-input code and the sub-code can be considered to meet the code similarity condition. In this embodiment, when judging whether a sub-input code and a sub-code in the code corresponding to the target character string meet the code similarity condition, the code corresponding to the target character string may first be obtained from the target character string and divided into a plurality of sub-codes according to a certain rule, and the input code may likewise be divided into a plurality of sub-input codes according to a certain rule. The sub-codes are then compared with the sub-input codes one by one, so as to obtain the sub-input codes that meet the code similarity condition with a sub-code in the code corresponding to the target character string.
For example, for the target character string "do you want me", the corresponding code "nixiangwoma" is first obtained and divided into four sub-codes "ni", "xiang", "wo" and "ma". The input code "wobuxiangni" of the input character string "i do not look like you" is divided into four sub-input codes "wo", "bu", "xiang" and "ni". Each of the sub-codes "ni", "xiang", "wo" and "ma" is then compared in turn with each of the sub-input codes "wo", "bu", "xiang" and "ni", so as to obtain the sub-input codes that meet the code similarity condition with a sub-code in the code corresponding to the target character string, for example the sub-input code "xiang".
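The pairwise comparison in this example can be sketched as below. The sketch makes several assumptions that are not spelled out in the text: each character is represented by an English gloss paired with its pinyin code, "similar" is taken to mean identical or identical except for the first letter, and pairs whose characters are already the same are skipped because they cannot be errors. The names `codes_similar` and `find_error_pairs` are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (character gloss, pinyin code); glosses stand in for the characters


def codes_similar(sub_input_code: str, sub_code: str) -> bool:
    """Identical codes, or same length and identical except for the first letter."""
    if sub_input_code == sub_code:
        return True
    return (len(sub_input_code) == len(sub_code)
            and len(sub_code) > 1
            and sub_input_code[1:] == sub_code[1:])


def find_error_pairs(input_pairs: List[Pair],
                     target_pairs: List[Pair]) -> List[Tuple[str, str]]:
    """Return (suspected error character, pending character) pairs."""
    found = []
    for in_char, in_code in input_pairs:
        for tg_char, tg_code in target_pairs:
            # Assumption: an identical character is not an error, so skip it.
            if in_char != tg_char and codes_similar(in_code, tg_code):
                found.append((in_char, tg_char))
    return found


input_pairs = [("I", "wo"), ("not", "bu"), ("like", "xiang"), ("you", "ni")]
target_pairs = [("you", "ni"), ("want", "xiang"), ("I", "wo"), ("ma", "ma")]
print(find_error_pairs(input_pairs, target_pairs))  # [('like', 'want')]
```

The same `codes_similar` test also accepts the "ningtian"/"mingtian" and "ningt"/"mingt" pairs used as examples of similar spellings.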
When a sub-input code and a sub-code in the code corresponding to the target character string meet the code similarity condition, and it has been determined from the semantic correlation that the input character string contains an error character string, the character corresponding to that sub-input code is likely to have been entered wrongly, either because the user selected the wrong candidate word from those offered by the input method or because the user typed a wrong code when entering the sub-input code. Therefore, in this embodiment, the character corresponding to the sub-input code is determined as the error character string. For example, the code of the input character string "i do not look like you" includes the sub-input code "xiang", the code corresponding to the target character string "do you want me" includes the sub-code "xiang", and the semantic correlation between the input character string and the target character string is relatively low. The user is therefore likely to have selected the wrong candidate word for the sub-input code "xiang", and the character "like" corresponding to the sub-input code "xiang" can be determined to be the error character string.
For another example, the code of the input character string "ning tian xiao" includes the sub-input code "ningtian", the code corresponding to the target character string "how to play in the tomorrow" includes the sub-code "mingtian", and the semantic correlation between the input character string and the target character string is relatively low. The user is therefore likely to have typed a wrong code when entering the sub-input code. It can thus be determined that the characters "ning tian" corresponding to the sub-input code "ningtian" are the error character string, or that the character "ning" corresponding to the sub-input code "ning" is the error character string.
In this embodiment, after determining the error character string in the input character string by using the first manner, when determining the error correction candidate for the error character string according to the target character string, the error correction candidate corresponding to the error character string may be determined according to the character corresponding to the subcode.
It can be understood that, when a sub-input code and a sub-code in the code corresponding to the target character string meet the code similarity condition, the character corresponding to the sub-input code is likely to have been entered wrongly because the user selected the wrong candidate word from those provided by the input method. The character that the user actually wants to input is then the character corresponding to the sub-code that meets the code similarity condition with the sub-input code, so the error correction candidate for the error character string can be determined from the character corresponding to that sub-code. For example, the sub-code corresponding to the character "want" in the target character string and the sub-input code corresponding to the character "like" in the input character string meet the code similarity condition; therefore, the error correction candidate for the error character string "like" can be determined from the character "want" corresponding to the sub-code.
In this embodiment, determining the error correction candidate for the error character string according to the character corresponding to the sub-code may be implemented through S501 to S503.
S501: Determining a language model score of the input character string in which the error character string has been replaced with a pending character, where the pending character is the character corresponding to the sub-code.
As described above, the character corresponding to the sub-input code is likely to have been entered wrongly, either because the user selected the wrong candidate word from those provided by the input method or because the user typed a wrong code when entering the sub-input code. The character that the user actually wants to input is therefore likely to be the character corresponding to the sub-code that meets the code similarity condition with the sub-input code. That character can be used as the error correction candidate for the error character string, and the error character string can be replaced with it. For example, the sub-code corresponding to the character "want" in the target character string and the sub-input code corresponding to the character "like" in the input character string meet the code similarity condition; therefore, the character "want" can be used as the error correction candidate for the error character string "like", and "like" can be replaced with "want". For another example, the sub-code corresponding to the characters "tomorrow" in the target character string and the sub-input code corresponding to the characters "ning tian" in the input character string meet the code similarity condition; therefore, the characters "tomorrow" can be used as the error correction candidate for the error character string "ning tian", and "ning tian" can be replaced with "tomorrow".
It should be noted that, in practical applications, one error character string may correspond to several pending characters; for example, the error character string "like" corresponds to the two pending characters "want" and "correspond". In order to ensure the accuracy of the error correction candidate that is offered, that is, to ensure that the input character string obtained after replacing the error character string with the pending character is semantically fluent, this embodiment may further use the language model score to judge the semantics of the input character string obtained after the replacement.
S502: Determining the correlation probability between the pending character and the target character string.
S503: If it is determined, according to the language model score and the correlation probability, that the pending character meets an error correction condition, determining the pending character as the error correction candidate.
Regarding S502 and S503, it should be noted that, since the error correction method provided in this embodiment is based on the interaction context, it is necessary to determine not only whether the input character string obtained after replacing the error character string with the pending character is semantically fluent, but also the degree of association between the pending character and the target character string. In this embodiment, that degree of association may be determined by the correlation probability.
After the language model score and the correlation probability between the pending character and the target character string have been determined, whether the pending character meets the error correction condition may be determined according to the two values. Specifically, in this embodiment, a score of the pending character may be calculated from the language model score and the correlation probability, and when the score of the pending character is greater than a preset threshold, the pending character may be considered to meet the error correction condition.
The score of the pending character is calculated as: score = x1 × log(LM) + x2 × log(P).
Here x1 and x2 are coefficients, LM represents the language model score of the input character string obtained by replacing the error character string with the pending character, and P represents the correlation probability between the pending character and the target character string.
When the pending character meets the error correction condition, the input character string obtained by replacing the error character string with the pending character is semantically fluent and has a high degree of association with the target character string, so the pending character can be used as an error correction candidate for the error character string and prompted to the user. In this way, the probability that the error correction candidate is accepted by the user is increased, which further improves the user experience.
When the pending character does not meet the error correction condition, the pending character is unlikely to be the character that the user wants to input, and it is not prompted to the user as an error correction candidate. This avoids offering poor error correction candidates that would harm the user experience.
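The scoring and thresholding in S501 to S503 can be sketched as follows. The natural logarithm, the default coefficients of 1.0, and the numeric inputs in the usage note are all assumptions chosen for illustration; the embodiment does not fix the log base, the coefficients, or the threshold. The sketch also assumes that LM and P are strictly positive, since the logarithm is undefined otherwise.

```python
import math
from typing import Dict, List, Tuple


def pending_score(lm_score: float, corr_prob: float,
                  x1: float = 1.0, x2: float = 1.0) -> float:
    """score = x1 * log(LM) + x2 * log(P); both inputs must be > 0."""
    return x1 * math.log(lm_score) + x2 * math.log(corr_prob)


def pick_candidates(pending: Dict[str, Tuple[float, float]],
                    threshold: float,
                    x1: float = 1.0, x2: float = 1.0) -> List[str]:
    """Return pending characters whose score exceeds the threshold, best first."""
    scored = {ch: pending_score(lm, p, x1, x2) for ch, (lm, p) in pending.items()}
    kept = [ch for ch, s in scored.items() if s > threshold]
    return sorted(kept, key=lambda ch: scored[ch], reverse=True)
```

As a usage illustration with made-up numbers, `pick_candidates({"want": (0.02, 0.6), "correspond": (0.0001, 0.05)}, threshold=-6.0)` keeps only "want", because the second pending character scores far below the threshold.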
The second way: if a sub-input character of the input character string and a sub-character of the target character string meet the composition similarity condition, the sub-input character is determined as the error character string.
The composition similarity condition in this embodiment means that the glyphs of two characters are different but very similar. For example, two different characters that both mean "account" and differ only in their radicals can be considered to have similar glyphs.
In this embodiment, when determining whether a sub-input character of the input character string and a sub-character of the target character string meet the composition similarity condition, the target character string may be divided into a plurality of sub-characters according to a certain rule, the input character string may be divided into a plurality of sub-input characters according to a certain rule, and the glyph of each sub-character may then be compared with the glyph of each sub-input character, so as to determine the sub-input characters that meet the composition similarity condition with a sub-character. For example, the target character string "how many your account is" is segmented into 7 sub-characters: "you", "of", "account", "number", "is", "more", "less"; the input character string "my account is 123" is segmented into 6 sub-input characters: "I", "of", "account", "number", "is", "123". The glyph of each sub-character is then compared in turn with the glyph of each sub-input character, so as to obtain the sub-input character whose glyph meets the composition similarity condition with that of a sub-character.
It is understood that, when a sub-input character of the input character string and a sub-character of the target character string meet the composition similarity condition, and the semantic correlation judgment has determined that an error character string exists in the input character string, that sub-input character is highly likely to be an error character input by the user. Therefore, in the present embodiment, the sub-input character may be determined as the error character string. For example, the target character string is "how many your account is" and the input character string is "my account is 123"; since the sub-input character "account" in the input character string meets the composition similarity condition with the sub-character "account" in the target character string, the sub-input character "account" is determined to be the error character string.
In this embodiment, after determining the error character string in the input character string in the second manner, correspondingly, when determining an error correction candidate for the error character string according to the target character string, in a concrete implementation, the error correction candidate corresponding to the error character string may be determined according to the sub-character.
It is understood that, when a sub-input character of the input character string and a sub-character of the target character string meet the composition similarity condition, and the semantic correlation judgment has determined that an error character string exists in the input character string, the sub-input character is likely to be an error character input by the user, and the character the user actually intended to input is the sub-character that meets the composition similarity condition with it. Therefore, the error correction candidate corresponding to the error character string can be determined according to that sub-character. For example, the sub-character "account" in the target character string and the sub-input character "account" in the input character string meet the composition similarity condition, and therefore, the error correction candidate corresponding to the error character string "account" can be determined according to the sub-character "account" in the target character string.
Determining the error correction candidate corresponding to the error character string according to the sub-character may be implemented through S601 to S603.
S601: Determine a language model score of the input character string in which the error character string is replaced with an undetermined character, where the undetermined character is the sub-character.
As described above, since the character that the user actually wants to input is a sub-character that meets the composition similarity condition with the sub-input character, the sub-character that meets the composition similarity condition with the sub-input character can be used as the error correction candidate corresponding to the error character string. For example, the sub-character "account" in the target character string and the sub-input character "account" in the input character string meet the composition similarity condition, and therefore, the sub-character "account" in the target character string can be used as the error correction candidate corresponding to the error character string "account".
It should be noted that, in practical applications, a given error character string may correspond to multiple undetermined characters; for example, the error character string "account" corresponds to the two undetermined characters "account" and "sheet". To ensure the accuracy of the error correction candidate provided, that is, to ensure that the input character string obtained after replacing the error character string with the undetermined character is semantically smooth, this embodiment may further use the language model score to evaluate the semantics of the input character string obtained after the replacement.
S602: Determine the correlation probability between the undetermined character and the target character string.
S603: If it is determined from the language model score and the correlation probability that the undetermined character meets an error correction condition, determine the undetermined character as the error correction candidate.
S602 and S603 are similar to S502 and S503, respectively, except that the undetermined characters in S502 and S503 are the characters corresponding to the sub-code, whereas the undetermined characters in S602 and S603 are the sub-characters. For the specific implementation of S602 and S603, reference may be made to the description of S502 and S503, which is not repeated here.
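Steps S601 to S603 can be sketched roughly as below. The scoring functions `lm_score` and `relevance` are placeholders supplied by the caller, and the error correction condition is assumed here to be "both scores reach a threshold"; the embodiment does not fix a particular model or condition, so these are illustrative assumptions only.

```python
def select_candidates(input_tokens, error_index, undetermined_chars,
                      lm_score, relevance, lm_min, rel_min):
    """Return the undetermined characters that meet the (assumed)
    error correction condition.

    input_tokens       -- segmented input character string
    error_index        -- position of the error character string
    undetermined_chars -- possible replacements for the error string
    lm_score           -- callable(list of tokens) -> float (hypothetical)
    relevance          -- callable(token) -> float, correlation with the
                          target character string (hypothetical)
    """
    accepted = []
    for char in undetermined_chars:
        # S601: score the input string with the error string replaced.
        replaced = (input_tokens[:error_index] + [char]
                    + input_tokens[error_index + 1:])
        score = lm_score(replaced)
        # S602: correlation probability with the target character string.
        prob = relevance(char)
        # S603: assumed condition, both values must reach their threshold.
        if score >= lm_min and prob >= rel_min:
            accepted.append(char)
    return accepted
```

Passing the two scorers in as callables keeps the sketch independent of any particular language model or relevance estimator.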
In another example, when S303 is implemented, the intelligent reply content corresponding to the target character string may be determined, and if the similarity between the input character string and the intelligent reply content meets a reply similarity condition, it is determined that an error character string exists in the input character string.
It should be noted that, the current input method system provides an intelligent reply function, that is, for a displayed character string in an interactive session, the input method system provides possible reply contents for the displayed character string. In consideration of practical application, the user may not select the reply option in the intelligent reply, but selects to input the character string by himself. Therefore, in the embodiment, the input character string input by the user may be compared with the content of the smart reply, and when the content of the input character string input by the user is similar to but not identical to the content of the smart reply, it may be determined that an error character string may exist in the input character string input by the user.
When the content of the input character string input by the user is similar to, but not identical to, the content of the smart reply as mentioned above, it can be considered that the similarity of the input character string and the content of the smart reply meets the reply similarity condition.
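The "similar to, but not identical to" test above might be sketched as follows. The use of `difflib.SequenceMatcher` and the 0.8 threshold are assumptions chosen for illustration; the embodiment does not prescribe a similarity measure or a threshold.

```python
from difflib import SequenceMatcher


def meets_reply_similarity(user_input, smart_reply, threshold=0.8):
    """Return True when the user's input is close to the smart reply
    content without being identical to it."""
    if user_input == smart_reply:
        # Identical content: nothing to correct.
        return False
    ratio = SequenceMatcher(None, user_input, smart_reply).ratio()
    return ratio >= threshold
```

An input that exactly matches the smart reply is excluded on purpose, since in that case there is no discrepancy that could point to an error character string.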
In this embodiment, after determining that the input character string has an error character string by judging that the similarity between the input character string and the intelligent reply content meets the reply similarity condition, the error character string in the input character string may also be determined in the following two ways.
Mode 1: If a sub-input code in the input code of the input character string and a sub-code in the code corresponding to the intelligent reply content meet the code similarity condition, the character corresponding to the sub-input code is determined as the error character string.
Correspondingly, when the error correction candidate item aiming at the error character string is determined according to the target character string in specific implementation, the error correction candidate item corresponding to the error character string can be determined according to the character corresponding to the subcode.
It should be noted that mode 1 is similar to the first manner described above under "determining an error character string in the input character string by determining the semantic correlation between the input character string and the target character string". The only difference is the object of comparison: the first manner judges whether a sub-input code in the input code of the input character string and a sub-code in the code corresponding to the target character string meet the code similarity condition, and determines the character corresponding to a sub-input code that meets the condition with a sub-code of the target character string as the error character string; mode 1 instead judges whether a sub-input code in the input code of the input character string and a sub-code in the code corresponding to the intelligent reply content meet the code similarity condition, and determines the character corresponding to a sub-input code that meets the condition with a sub-code of the intelligent reply content as the error character string.
For a specific implementation of the mode 1, reference may be made to the description part of the first implementation, and details are not described here.
Mode 2: If a sub-input character of the input character string and a sub-character of the intelligent reply content meet the composition similarity condition, the sub-input character is determined as the error character string.
Correspondingly, when the error correction candidate item for the error character string is determined according to the target character string in a specific implementation, the error correction candidate item corresponding to the error character string can be determined according to the sub-character.
It should be noted that mode 2 is similar to the second manner described above under "determining an error character string in the input character string by determining the semantic correlation between the input character string and the target character string". The only difference is the object of comparison: the second manner judges whether a sub-input character of the input character string and a sub-character of the target character string meet the composition similarity condition, and determines a sub-input character that meets the condition with a sub-character of the target character string as the error character string; mode 2 instead judges whether a sub-input character of the input character string and a sub-character of the intelligent reply content meet the composition similarity condition, and determines a sub-input character that meets the condition with a sub-character of the intelligent reply content as the error character string.
For a specific implementation of the mode 2, reference may be made to the description part of the second implementation, and details are not described here again.
Based on the input error correction method provided in the above embodiment, the method will be described below with reference to specific scenarios.
In this scenario, user a chats with user b using instant messaging software. The content input by user a is "package mail"; user b types the code "baoyou" through a pinyin input method and selects the candidate word "blessing" provided by the input method.
Referring to fig. 7, which is a flowchart of an input error correction method provided in this embodiment, the method provided in this embodiment includes the following steps:
S701: Obtain the input character string "blessing" that has been input by the user and not yet submitted to the interactive session.
S702: Determine the target character string "package mail" corresponding to the input character string "blessing" from the character strings already displayed on the screen in the interactive session.
S703: Judge, according to the semantic correlation between "package mail" and "blessing", that the input character string "blessing" has an error character string.
S704: Judge whether the sub-input codes 'bao' and 'you' of the input code 'baoyou' of the input character string meet the code similarity condition with the sub-codes 'zhe', 'kuan', 'bao', 'you' and 'ma' of the code 'zhekuanbaoyouma' corresponding to the target character string.
S705: Determine that the sub-input code 'bao' and the sub-code 'bao' meet the code similarity condition, and that the sub-input code 'you' and the sub-code 'you' meet the code similarity condition.
S706: Determine the characters 'guaranty' and 'you' corresponding to the sub-input code 'bao' and the sub-input code 'you' as the error character string.
S707: Replace "blessing" in the input character string with the undetermined character "package mail", and determine that the undetermined character "package mail" meets the error correction condition according to the language model score of the input character string "package mail" after the replacement and the correlation probability between the undetermined character "package mail" and the target character string "package mail".
S708: Take "package mail" as the error correction candidate and display it to user b.
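Steps S704 to S706 of this scenario can be sketched as below. Two things are assumptions made for the sake of a runnable example: the code similarity condition is taken to be "edit distance of at most 1 between pinyin syllables", and the Chinese characters paired with each syllable are reconstructed for illustration from the pinyin given in the scenario.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def code_similar(sub_input_code, sub_code, max_dist=1):
    """Assumed code similarity condition: syllables within max_dist edits."""
    return edit_distance(sub_input_code, sub_code) <= max_dist


def find_error_characters(input_pairs, target_pairs):
    """input_pairs / target_pairs are lists of (code, character).
    Return (input character, target character) pairs whose codes are
    similar but whose characters differ."""
    found = []
    for in_code, in_char in input_pairs:
        for tg_code, tg_char in target_pairs:
            if in_char != tg_char and code_similar(in_code, tg_code):
                found.append((in_char, tg_char))
    return found


# Illustrative data: characters are an assumption, codes come from the scenario.
input_pairs = [("bao", "保"), ("you", "佑")]
target_pairs = [("zhe", "这"), ("kuan", "款"), ("bao", "包"),
                ("you", "邮"), ("ma", "吗")]
pairs = find_error_characters(input_pairs, target_pairs)
```

Each returned pair gives an error character together with the character from the target character string that could replace it; the replacements would then be checked against the error correction condition as in S707.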
The input error correction method provided by this embodiment can determine input errors according to the interactive context, so that even when the overall semantics of the input character string are smooth, an input error that is irrelevant to the interactive context can still be found and error correction candidates can be provided. This expands the application range of input error correction and improves the input experience of the user.
Based on an input error correction method provided in the foregoing embodiment, the present embodiment provides an input error correction apparatus, and fig. 8 shows a block diagram of the structure of an input error correction apparatus including an input character string acquisition unit 801, a target character string determination unit 802, an error character string determination unit 803, and an error correction candidate determination unit 804.
An input character string acquisition unit 801 for acquiring an input character string input by a user and not yet submitted to an interactive session;
a target character string determining unit 802, configured to determine a target character string corresponding to the input character string from the character strings already displayed in the interactive session;
an error string determination unit 803 for determining whether the input string has an error string from the target string;
an error correction candidate determining unit 804, configured to determine, when an error character string exists in the input character string, an error correction candidate for the error character string according to the target character string.
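The cooperation of units 801 to 804 can be pictured with the following sketch. The class name `InputErrorCorrector` and the three pluggable callables are hypothetical names introduced for illustration; they only mirror the division of responsibilities described above and are not part of the described apparatus.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class InputErrorCorrector:
    # Role of unit 802: choose the target string among on-screen strings.
    pick_target: Callable[[List[str], str], Optional[str]]
    # Role of unit 803: return the error string, or None if there is none.
    find_error: Callable[[str, str], Optional[str]]
    # Role of unit 804: propose error correction candidates.
    propose: Callable[[str, str], List[str]]

    def correct(self, pending_input: str, on_screen: List[str]) -> List[str]:
        # Role of unit 801: pending_input is the string the user has typed
        # but not yet submitted to the interactive session.
        target = self.pick_target(on_screen, pending_input)
        if target is None:
            return []
        error = self.find_error(pending_input, target)
        if error is None:
            return []
        return self.propose(error, target)
```

Keeping each unit behind its own callable makes it possible to swap, for example, a semantic-correlation detector for a smart-reply-based one without touching the rest of the pipeline.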
Optionally, the target character string is any one or a combination of more than one of the following:
one of the on-screen character strings having an on-screen time closer to an input time of the input character string in the interactive session;
an on-screen string in the interactive session that has not been replied to;
an on-screen character string in the interactive session that is semantically smooth.
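A target character string selection combining the first two options above (closest on-screen time, not yet replied to) might look like the sketch below. The `Message` record and its fields are hypothetical, and the third option (semantic smoothness) is deliberately left out because it would need a language model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    text: str
    timestamp: float       # on-screen time
    from_peer: bool        # True if sent by the other party
    replied: bool = False  # True if the user has already replied to it


def pick_target(messages: List[Message], input_time: float) -> Optional[str]:
    """Pick the un-replied on-screen string from the other party whose
    on-screen time is closest to (and not after) the input time."""
    candidates = [
        m for m in messages
        if m.from_peer and not m.replied and m.timestamp <= input_time
    ]
    if not candidates:
        return None
    closest = min(candidates, key=lambda m: input_time - m.timestamp)
    return closest.text
```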
Optionally, the error string determining unit 803 includes:
a semantic correlation judging subunit, configured to judge semantic correlation between the input character string and the target character string;
and the first error character string determining subunit is used for determining that the input character string has an error character string if the semantic relevance is lower than a preset condition.
Optionally, the apparatus further comprises:
a first error character string determining unit, configured to determine a character corresponding to a sub-input code as the error character string if it is obtained that the sub-input code in the input code of the input character string and the sub-code in the code corresponding to the target character string conform to a code similarity condition; or,
and the second error character string determining unit is used for determining the sub-input characters of the input character string as the error character string if the sub-input characters of the input character string and the sub-characters of the target character string accord with the similar composition condition.
Optionally, the error correction candidate determining unit 804 includes:
a first error correction candidate determining subunit, configured to determine an error correction candidate corresponding to the error character string according to a character corresponding to the subcode; or,
and the second error correction candidate determining subunit is used for determining the error correction candidate corresponding to the error character string according to the sub-character.
Optionally, the first error correction candidate determining subunit includes:
the first language model score determining submodule is used for determining the language model score of the input character string which replaces the error character string with a character to be determined, and the character to be determined is a character corresponding to the subcode;
a first correlation probability determination submodule, configured to determine a correlation probability between the undetermined character and the target character string;
a first error correction candidate determining sub-module, configured to determine the undetermined character as the error correction candidate if the language model score and the correlation probability determine that the undetermined character satisfies an error correction condition;
or, the second error correction candidate determining subunit includes:
a second language model score determining submodule for determining a language model score of an input character string in which the wrong character string is replaced with a character to be determined, the character to be determined being the sub-character;
the second correlation probability determination submodule is used for determining the correlation probability of the undetermined character and the target character string;
and the second error correction candidate determining submodule is used for determining the undetermined character as the error correction candidate if the language model score and the correlation probability determine that the undetermined character meets the error correction condition.
Optionally, the error string determining unit 803 includes:
the intelligent reply determining subunit is used for determining the intelligent reply content corresponding to the target character string;
and the second error character string determining subunit is used for determining that an error character string exists in the input character string if the similarity between the input character string and the intelligent reply content meets the reply similarity condition.
Optionally, the apparatus further comprises:
a third error character string determining unit, configured to determine, if it is obtained that a sub-input code in an input code of the input character string and a sub-code in a code corresponding to the intelligent reply content conform to a code similarity condition, a character corresponding to the sub-input code as the error character string; or,
and the fourth error character string determining unit is used for determining the sub-input characters of the input character string as the error character string if the sub-input characters of the input character string and the sub-characters of the intelligent reply content accord with the similar composition condition.
Optionally, the error correction candidate determining unit 804 includes:
a third error correction candidate determining subunit, configured to determine an error correction candidate corresponding to the error character string according to the character corresponding to the subcode; or,
and the fourth error correction candidate determining subunit is used for determining the error correction candidate corresponding to the error character string according to the sub-character.
The input error correction device provided by this embodiment can determine input errors according to the interactive context, so that even when the overall semantics of the input character string are smooth, an input error that is irrelevant to the interactive context can still be found and error correction candidates can be provided. This expands the application range of input error correction and improves the input experience of the user.
Fig. 9 is a block diagram illustrating an input error correction apparatus 900 according to an example embodiment. For example, the apparatus 900 may be a robot, a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 9, apparatus 900 may include one or more of the following components: processing component 902, memory 904, power component 906, multimedia component 908, audio component 910, input/output (I/O) interface 912, sensor component 914, and communication component 916.
The processing component 902 generally controls overall operation of the device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing component 902 may include one or more processors 920 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 902 can include one or more modules that facilitate interaction between processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation at the apparatus 900. Examples of such data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 906 provides power to the various components of the device 900. The power component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 900.
The multimedia component 908 comprises a screen providing an output interface between the device 900 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 900 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, audio component 910 includes a Microphone (MIC) configured to receive external audio signals when apparatus 900 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 also includes a speaker for outputting audio signals.
I/O interface 912 provides an interface between processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessment of various aspects of the apparatus 900. For example, sensor component 914 may detect an open/closed state of device 900, the relative positioning of components, such as a display and keypad of device 900, the change in position of device 900 or a component of device 900, the presence or absence of user contact with device 900, the orientation or acceleration/deceleration of device 900, and the change in temperature of device 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate communications between the apparatus 900 and other devices in a wired or wireless manner. The apparatus 900 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as memory 904 comprising instructions executable by processor 920 of device 900 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a method of input error correction, the method comprising:
acquiring an input character string which is input by a user and is not submitted to an interactive session;
determining a target character string corresponding to the input character string from the character string which is displayed on the screen in the interactive session;
determining whether the input character string has an error character string according to the target character string;
if yes, determining a candidate item for error correction aiming at the error character string according to the target character string.
Fig. 10 is a schematic structural diagram of a server in the embodiment of the present invention. The server 1000, which may vary significantly due to configuration or performance, may include one or more Central Processing Units (CPUs) 1022 (e.g., one or more processors) and memory 1032, one or more storage media 1030 (e.g., one or more mass storage devices) that store applications 1042 or data 1044. Memory 1032 and storage medium 1030 may be, among other things, transient or persistent storage. The program stored on the storage medium 1030 may include one or more modules (not shown), each of which may include a sequence of instructions operating on a server. Still further, a central processor 1022 may be disposed in communication with the storage medium 1030, and configured to execute a series of instruction operations in the storage medium 1030 on the server 1000.
The server 1000 may also include one or more power supplies 1024, one or more wired or wireless network interfaces 1050, one or more input-output interfaces 1058, one or more keyboards 1054, and/or one or more operating systems 1041, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
Those of ordinary skill in the art will understand that: all or part of the steps of implementing the method embodiments may be implemented by hardware associated with program instructions, where the program may be stored in a computer-readable storage medium, and when executed, performs the steps including the method embodiments; and the aforementioned storage medium may be at least one of the following media: various media that can store program codes, such as a read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
It should be noted that, in this specification, each embodiment is described in a progressive manner, and the same and similar parts between the embodiments are referred to each other, and each embodiment focuses on differences from other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described embodiments of the apparatus and system are merely illustrative, and units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. An input error correction method, the method comprising:
acquiring an input character string which is input by a user and is not submitted to an interactive session;
determining a target character string corresponding to the input character string from the character string which is displayed on the screen in the interactive session;
determining whether the input character string has an error character string according to the target character string;
after determining that the input string has an error string, the method further comprises:
if a sub-input code in the input code of the input character string and a sub-code in the code corresponding to the target character string meet a code similarity condition, determining the character corresponding to the sub-input code as the error character string;
determining a language model score of the input character string in which the error character string is replaced with an undetermined character, wherein the undetermined character is the character corresponding to the sub-code;
determining a correlation probability between the undetermined character and the target character string;
if it is determined from the language model score and the correlation probability that the undetermined character meets an error correction condition, determining the undetermined character as an error correction candidate corresponding to the error character string;
or,
after determining that the input string has an error string, the method further comprises:
if a sub-input character of the input character string and a sub-character of the target character string meet a composition similarity condition, determining the sub-input character as the error character string;
determining a language model score of the input character string in which the error character string is replaced with an undetermined character, wherein the undetermined character is the sub-character;
determining a correlation probability between the undetermined character and the target character string;
and if it is determined from the language model score and the correlation probability that the undetermined character meets an error correction condition, determining the undetermined character as an error correction candidate corresponding to the error character string.
2. The method of claim 1, wherein the target string is any one or more of the following:
one of the on-screen character strings having an on-screen time closer to an input time of the input character string in the interactive session;
an on-screen string in the interactive session that has not been replied to;
and an on-screen character string in the interactive session that is semantically fluent.
3. The method of claim 1, wherein the determining whether the input string has an error string based on the target string comprises:
judging semantic correlation between the input character string and the target character string;
and if the semantic correlation is lower than a preset condition, determining that the input character string has an error character string.
4. The method of claim 1, wherein the determining whether the input string has an error string based on the target string comprises:
determining intelligent reply content corresponding to the target character string;
and if the similarity of the input character string and the intelligent reply content meets the reply similarity condition, determining that the input character string has an error character string.
5. The method of claim 4, wherein after the determining that the input string has an error string, the method further comprises:
if a sub-input code in the input code of the input character string and a sub-code in the code corresponding to the intelligent reply content meet a code similarity condition, determining the character corresponding to the sub-input code as the error character string; or,
if a sub-input character of the input character string and a sub-character of the intelligent reply content meet a composition similarity condition, determining the sub-input character as the error character string.
6. The method of claim 5, wherein the determining the error correction candidates for the error string according to the target string comprises:
determining an error correction candidate item corresponding to the error character string according to the character corresponding to the subcode; or,
and determining an error correction candidate item corresponding to the error character string according to the sub-character.
7. An input error correction apparatus, characterized in that the apparatus comprises:
an input character string obtaining unit, configured to obtain an input character string that is input by a user and that has not been submitted to an interactive session;
the target character string determining unit is used for determining a target character string corresponding to the input character string from the character string which is displayed on the screen in the interactive session;
an error string determining unit configured to determine whether the input string has an error string based on the target string;
a correction candidate determining unit configured to determine, when an error character string is included in the input character string, a correction candidate for the error character string based on the target character string;
the device further comprises:
a first error character string determining unit, configured to determine a character corresponding to a sub-input code as the error character string if it is obtained that the sub-input code in the input code of the input character string and the sub-code in the code corresponding to the target character string conform to a code similarity condition;
the error correction candidate determining unit includes:
a first error correction candidate determining subunit, configured to determine an error correction candidate corresponding to the error character string according to a character corresponding to the subcode;
the first error correction candidate determining subunit includes:
a first language model score determining submodule, configured to determine a language model score of the input character string in which the error character string is replaced with an undetermined character, where the undetermined character is the character corresponding to the sub-code;
a first correlation probability determination submodule, configured to determine a correlation probability between the undetermined character and the target character string;
a first error correction candidate determining sub-module, configured to determine the undetermined character as the error correction candidate if the language model score and the correlation probability determine that the undetermined character satisfies an error correction condition;
alternatively, the apparatus further comprises:
a second error string determining unit, configured to determine, if a sub-input character of the input string and a sub-character of the target string conform to a composition similarity condition, the sub-input character as the error string;
the error correction candidate determining unit includes:
a second error correction candidate determining subunit, configured to determine an error correction candidate corresponding to the error character string according to the sub-character;
the second error correction candidate determining subunit includes:
a second language model score determining submodule, configured to determine a language model score of the input character string in which the error character string is replaced with an undetermined character, where the undetermined character is the sub-character;
the second correlation probability determination submodule is used for determining the correlation probability of the undetermined character and the target character string;
and the second error correction candidate determining submodule is used for determining the undetermined character as the error correction candidate if the language model score and the correlation probability determine that the undetermined character meets the error correction condition.
8. The apparatus of claim 7, wherein the target string is any one or more of the following in combination:
one of the on-screen character strings having an on-screen time closer to an input time of the input character string in the interactive session;
an on-screen string in the interactive session that has not been replied to;
and an on-screen character string in the interactive session that is semantically fluent.
9. The apparatus of claim 7, wherein the error string determination unit comprises:
a semantic correlation judging subunit, configured to judge semantic correlation between the input character string and the target character string;
and the first error character string determining subunit is used for determining that the input character string has an error character string if the semantic correlation is lower than a preset condition.
10. The apparatus of claim 7, wherein the error string determination unit comprises:
the intelligent reply determining subunit is used for determining the intelligent reply content corresponding to the target character string;
and the second error character string determining subunit is used for determining that the input character string has an error character string if the similarity between the input character string and the intelligent reply content meets the reply similarity condition.
11. The apparatus of claim 10, further comprising:
a third error character string determining unit, configured to determine, if it is obtained that a sub-input code in an input code of the input character string and a sub-code in a code corresponding to the intelligent reply content conform to a code similarity condition, a character corresponding to the sub-input code as the error character string; or,
a fourth error character string determining unit, configured to determine a sub-input character of the input character string as the error character string if the sub-input character and a sub-character of the intelligent reply content meet a composition similarity condition.
12. The apparatus of claim 11, wherein the error correction candidate determination unit comprises:
a third error correction candidate determining subunit, configured to determine an error correction candidate corresponding to the error character string according to the character corresponding to the subcode; or,
and the fourth error correction candidate determining subunit is used for determining the error correction candidate corresponding to the error character string according to the sub-character.
13. An apparatus for input error correction, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring an input character string which is input by a user and is not submitted to an interactive session;
determining a target character string corresponding to the input character string from the character string which is displayed on the screen in the interactive session;
determining whether the input character string has an error character string according to the target character string;
after determining that the input string has an error string, the operations further comprise:
if a sub-input code in the input code of the input character string and a sub-code in the code corresponding to the target character string meet a code similarity condition, determining the character corresponding to the sub-input code as the error character string;
determining a language model score of the input character string in which the error character string is replaced with an undetermined character, wherein the undetermined character is the character corresponding to the sub-code;
determining a correlation probability between the undetermined character and the target character string;
if it is determined from the language model score and the correlation probability that the undetermined character meets an error correction condition, determining the undetermined character as an error correction candidate corresponding to the error character string;
or,
after determining that the input string has an error string, the operations further comprise:
if a sub-input character of the input character string and a sub-character of the target character string meet a composition similarity condition, determining the sub-input character as the error character string;
determining a language model score of the input character string in which the error character string is replaced with an undetermined character, wherein the undetermined character is the sub-character;
determining a correlation probability between the undetermined character and the target character string;
and if it is determined from the language model score and the correlation probability that the undetermined character meets an error correction condition, determining the undetermined character as an error correction candidate corresponding to the error character string.
14. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the input error correction method of any one of claims 1-6.
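
For illustration only, the first alternative recited in claim 1 (code similarity, then language model score and correlation probability) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the claims do not define the code similarity condition, the scoring models, or the error correction condition, so the edit-distance threshold, the acceptance rule, and every name here (`edit_distance`, `correction_candidates`, `toy_lm`, `toy_relevance`, `KNOWN_BIGRAMS`) are hypothetical. The input codes are assumed to be pinyin.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: each item is (character, input code), e.g. ("苹", "ping").
Coded = List[Tuple[str, str]]


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two code strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def correction_candidates(
    typed: Coded,
    target: Coded,
    lm_score: Callable[[str], float],
    relevance: Callable[[str, str], float],
    max_code_distance: int = 1,  # assumed code similarity condition
    min_relevance: float = 0.5,  # assumed part of the error correction condition
) -> List[Tuple[int, str, str]]:
    """Return (position, error character, candidate character) triples."""
    chars = [ch for ch, _ in typed]
    target_text = "".join(ch for ch, _ in target)
    baseline = lm_score("".join(chars))
    found = []
    for i, (ch, code) in enumerate(typed):
        for t_ch, t_code in target:
            # Code similarity: different characters whose codes are close.
            if t_ch == ch or edit_distance(code, t_code) > max_code_distance:
                continue
            # Replace the suspected error with the undetermined character.
            replaced = "".join(chars[:i] + [t_ch] + chars[i + 1:])
            # Assumed acceptance rule: the replacement must improve the
            # language model score and be sufficiently related to the target.
            if lm_score(replaced) > baseline and relevance(t_ch, target_text) >= min_relevance:
                found.append((i, ch, t_ch))
    return found


# Toy stand-ins for the language model and the correlation model (assumptions).
KNOWN_BIGRAMS = {"我喜", "喜欢", "欢吃", "吃苹", "苹果"}


def toy_lm(text: str) -> float:
    """Fraction of adjacent character pairs that appear in KNOWN_BIGRAMS."""
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    return sum(b in KNOWN_BIGRAMS for b in bigrams) / max(1, len(bigrams))


def toy_relevance(ch: str, target_text: str) -> float:
    """1.0 if the character occurs in the target string, else 0.0."""
    return 1.0 if ch in target_text else 0.0


# On-screen target string and the not-yet-submitted input string.
target = [("你", "ni"), ("喜", "xi"), ("欢", "huan"), ("吃", "chi"),
          ("苹", "ping"), ("果", "guo"), ("吗", "ma")]
typed = [("我", "wo"), ("喜", "xi"), ("欢", "huan"), ("吃", "chi"),
         ("平", "ping"), ("果", "guo")]

print(correction_candidates(typed, target, toy_lm, toy_relevance))
# prints [(4, '平', '苹')]
```

In this toy run two pairs pass the assumed code similarity check (喜/你 with codes "xi"/"ni", and 平/苹 with the identical code "ping"), but only the second replacement raises the toy language model score, so only 苹 is returned as a candidate. The second alternative of claim 1 could follow the same shape with a character-composition comparison in place of `edit_distance`.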
CN201711484183.9A 2017-12-29 2017-12-29 Input error correction method and device Active CN109992120B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711484183.9A CN109992120B (en) 2017-12-29 2017-12-29 Input error correction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711484183.9A CN109992120B (en) 2017-12-29 2017-12-29 Input error correction method and device

Publications (2)

Publication Number Publication Date
CN109992120A CN109992120A (en) 2019-07-09
CN109992120B true CN109992120B (en) 2022-10-04

Family

ID=67110300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711484183.9A Active CN109992120B (en) 2017-12-29 2017-12-29 Input error correction method and device

Country Status (1)

Country Link
CN (1) CN109992120B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110889267A (en) * 2019-11-29 2020-03-17 北京金山安全软件有限公司 Method and device for editing characters in picture, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102193993A (en) * 2011-04-20 2011-09-21 北京百度网讯科技有限公司 Method, device and facility for determining similarity information between character string information
CN104298672A (en) * 2013-07-16 2015-01-21 北京搜狗科技发展有限公司 Error correction method and device for input
CN104915264A (en) * 2015-05-29 2015-09-16 北京搜狗科技发展有限公司 Input error-correction method and device
CN106325537A (en) * 2015-06-23 2017-01-11 腾讯科技(深圳)有限公司 Information inputting method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8713433B1 (en) * 2012-10-16 2014-04-29 Google Inc. Feature-based autocorrection

Also Published As

Publication number Publication date
CN109992120A (en) 2019-07-09

Similar Documents

Publication Publication Date Title
US9977779B2 (en) Automatic supplementation of word correction dictionaries
CN107918496B (en) Input error correction method and device for input error correction
CN107688399B (en) Input method and device and input device
CN107564526B (en) Processing method, apparatus and machine-readable medium
CN108073292B (en) Intelligent word forming method and device for intelligent word forming
CN109582768B (en) Text input method and device
CN109002183B (en) Information input method and device
CN106886294B (en) Input method error correction method and device
CN107797676B (en) Single character input method and device
CN112631435A (en) Input method, device, equipment and storage medium
CN109992120B (en) Input error correction method and device
CN110633017A (en) Input method, input device and input device
CN110780749B (en) Character string error correction method and device
CN110795014B (en) Data processing method and device and data processing device
CN109144286B (en) Input method and device
CN108108356B (en) Character translation method, device and equipment
CN107688400B (en) Input error correction method and device for input error correction
CN114610163A (en) Recommendation method, apparatus and medium
CN109471538B (en) Input method, input device and input device
CN108983992B (en) Candidate item display method and device with punctuation marks
CN112905023A (en) Input error correction method and device for input error correction
CN112181163A (en) Input method, input device and input device
CN113407099A (en) Input method, device and machine readable medium
CN112612442A (en) Input method and device and electronic equipment
CN110389666B (en) Input error correction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant