CN110780749B

CN110780749B - Character string error correction method and device

Info

Publication number: CN110780749B
Application number: CN201810759149.6A
Authority: CN
Inventors: 费腾; 崔欣; 张扬
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2018-07-11
Filing date: 2018-07-11
Publication date: 2024-03-08
Anticipated expiration: 2038-07-11
Also published as: CN110780749A

Abstract

The embodiments of the application disclose a character string error correction method and device. An on-screen phrase is determined, where the on-screen phrase may include a plurality of on-screen words that the user has put on screen in sequence. The on-screen phrase is input into a deep learning model to obtain a result vector, which identifies the probabilities of different words appearing after the on-screen phrase. When a target character string input after the on-screen phrase needs to be corrected, a plurality of pending error correction results can be obtained from the target character string. Because the on-screen phrase formed by the most recent on-screen words reflects the user's current actual input requirement more accurately in terms of semantics, the error correction result of the target character string relative to the on-screen phrase can be determined from the plurality of pending error correction results according to the different words identified by the result vector and the probabilities corresponding to those words. The error correction result is therefore more likely to meet the user's actual input requirement, which improves the user's input experience.

Description

Character string error correction method and device

Technical Field

The present application relates to the field of input methods, and in particular, to a method and apparatus for correcting character strings.

Background

An input method is a common text input tool. For example, it can display candidates corresponding to the character string input by a user and put the candidate selected by the user on screen.

When a user inputs a character string, errors may occur, such as typing a wrong character or placing a character at the wrong position in the string. If such errors are not corrected automatically, the mistyped character string may cause wrong candidates, or no candidates at all, to be displayed to the user, resulting in a poor input experience.

In existing error correction approaches, the input method segments the character string input by the user, corrects segments that may have been mistyped according to character combination rules, and then displays the corresponding candidates according to the error correction result. However, this approach considers only the character string itself and not the user's actual input requirement, so the displayed candidates may not meet that requirement, which degrades the user's input experience.

Disclosure of Invention

To solve the above technical problem, the application provides a character string error correction method and device. According to the different words identified by the result vector corresponding to the on-screen phrase, and the probabilities corresponding to those words, an error correction result that is more likely to meet the user's expectations can be determined from a plurality of pending error correction results, which improves the user's input experience.

The embodiment of the application discloses the following technical scheme:

In a first aspect, an embodiment of the present application provides a character string error correction method, where the method includes:

determining an on-screen phrase, wherein the on-screen phrase comprises at least one on-screen word put on screen in sequence by a user;

obtaining a result vector according to the on-screen phrase and a deep learning model, wherein the result vector is used for identifying the probabilities of different words appearing after the on-screen phrase;

correcting a target character string input after the on-screen phrase to obtain a plurality of pending error correction results;

and determining an error correction result corresponding to the target character string from the plurality of pending error correction results according to the result vector.

Optionally, the error correction result corresponding to the target character string is a pending error correction result, of which the probability identified by the result vector is higher than a predetermined threshold, among the plurality of pending error correction results.

Optionally, the determining, according to the result vector, an error correction result corresponding to the target string from the plurality of pending error correction results includes:

determining at least one word with probability higher than a preset threshold according to the result vector;

determining character strings corresponding to the at least one word respectively;

and determining, as the error correction result corresponding to the target character string, the pending error correction result among the plurality of pending error correction results that is the same as one of the character strings.

Optionally, the determining, according to the result vector, an error correction result corresponding to the target string from the plurality of pending error correction results includes:

establishing a corresponding relation between word identifications of different words identified by the result vector and probabilities of the different words;

establishing a character string query tree according to character strings corresponding to different words identified by the result vector, wherein the character strings in the character string query tree have word identifications of the corresponding words;

inquiring the character string inquiry tree according to the plurality of undetermined error correction results;

if a target pending error correction result in the plurality of pending error correction results obtains a query result, determining a probability corresponding to the target pending error correction result according to a word identifier in the query result and the corresponding relation;

and if the probability corresponding to the target undetermined error correction result meets a preset condition, determining the target undetermined error correction result as an error correction result corresponding to the target character string.
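
The query-tree variant above can be illustrated with a minimal sketch. It assumes that the string query tree is a character trie and that the preset condition is a simple probability threshold; neither is specified by the text. The names `Node`, `build_query_tree`, `query`, and `choose`, as well as the probabilities and pinyin strings, are hypothetical.

```python
from typing import Dict, List


class Node:
    """One node of the string query tree (assumed here to be a trie)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.word_ids: List[int] = []  # identifiers of words whose string ends here


def build_query_tree(string_by_id: Dict[int, str]) -> Node:
    """Insert each word's character string into the tree, tagged with its word id."""
    root = Node()
    for word_id, text in string_by_id.items():
        node = root
        for ch in text:
            node = node.children.setdefault(ch, Node())
        node.word_ids.append(word_id)
    return root


def query(root: Node, text: str) -> List[int]:
    """Return the word ids stored for exactly this string, or [] if absent."""
    node = root
    for ch in text:
        if ch not in node.children:
            return []
        node = node.children[ch]
    return node.word_ids


def choose(pending: List[str], root: Node,
           prob_by_id: Dict[int, float], threshold: float) -> List[str]:
    """Keep pending results found in the tree whose probability exceeds threshold."""
    chosen = []
    for candidate in pending:
        ids = query(root, candidate)
        if ids and max(prob_by_id[i] for i in ids) > threshold:
            chosen.append(candidate)
    return chosen


# Hypothetical correspondence: word id -> probability, word id -> input string.
prob_by_id = {0: 0.41, 1: 0.22, 2: 0.003}
string_by_id = {0: "lun'wen", 1: "zuo'ye", 2: "liu'wen"}
tree = build_query_tree(string_by_id)
print(choose(["liu'wen", "lun'wen"], tree, prob_by_id, threshold=0.1))
```

Keeping the probabilities in a separate id-keyed table, as the text describes, means the tree only has to store small identifiers, and several words that share one input string can hang off the same node.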

Optionally, the method further comprises:

and displaying the corresponding candidate item according to the error correction result corresponding to the target character string.

Optionally, the number of on-screen words included in the on-screen phrase is smaller than a predetermined threshold, or the on-screen words included in the on-screen phrase are determined according to a spacer.

Optionally, the on-screen phrase is determined according to the position of the input focus in the on-screen editing area.

Optionally, the on-screen phrase includes at least the on-screen word immediately preceding the position of the input focus.

In a second aspect, an embodiment of the present application provides a character string error correction apparatus, where the apparatus includes a first determining unit, a learning unit, an error correction unit, and a second determining unit:

the first determining unit is used for determining an on-screen phrase, and the on-screen phrase comprises at least one on-screen word put on screen in sequence by a user;

the learning unit is used for obtaining a result vector according to the on-screen phrase and a deep learning model, and the result vector is used for identifying the probabilities of different words appearing after the on-screen phrase;

the error correction unit is used for correcting a target character string input after the on-screen phrase to obtain a plurality of pending error correction results;

and the second determining unit is used for determining the error correction result corresponding to the target character string from the plurality of pending error correction results according to the result vector.

Optionally, the second determining unit includes a first determining subunit, a second determining subunit, and a third determining subunit:

the first determining subunit is configured to determine, according to the result vector, at least one word with a probability higher than a predetermined threshold;

the second determining subunit is configured to determine character strings corresponding to the at least one word respectively;

and the third determining subunit is configured to determine, as an error correction result corresponding to the target string, a pending error correction result that is the same as the string in the plurality of pending error correction results.

Optionally, the second determining unit includes a first establishing subunit, a second establishing subunit, a querying subunit, a fourth determining subunit, and a fifth determining subunit:

the first establishing subunit is used for establishing a corresponding relation between word identifications of different words identified by the result vector and probabilities of the different words;

the second establishing subunit is configured to establish a string query tree according to strings corresponding to different words identified by the result vector, where strings in the string query tree have word identifiers of the corresponding words;

the querying subunit is used for querying the character string query tree according to the plurality of pending error correction results;

the fourth determining subunit is configured to determine, if a target pending error correction result in the plurality of pending error correction results obtains a query result, a probability corresponding to the target pending error correction result according to a word identifier in the query result and the correspondence;

and the fifth determining subunit is configured to determine the target pending error correction result as an error correction result corresponding to the target character string if the probability corresponding to the target pending error correction result meets a preset condition.

Optionally, the device further comprises a display unit:

and the display unit is used for displaying the corresponding candidate item according to the error correction result corresponding to the target character string.

In a third aspect, an embodiment of the present application provides a string error correction apparatus, including a memory, and one or more programs, where the one or more programs are stored in the memory, and configured to be executed by one or more processors, where the one or more programs include instructions for performing the string error correction method according to any one of the first aspects.

In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of an electronic device, cause the electronic device to perform the character string error correction method according to any one of the first aspects.

According to the above technical solutions, an on-screen phrase is determined, where the on-screen phrase may include a plurality of on-screen words put on screen by the user in sequence. The on-screen phrase is input into the deep learning model to obtain a result vector, which identifies the probabilities of different words appearing after the on-screen phrase; the higher the probability that the result vector identifies for a word, the more likely, semantically, that word is to appear after the on-screen phrase. When a target character string input after the on-screen phrase needs to be corrected, a plurality of pending error correction results can be obtained from the target character string. Because the on-screen phrase formed by the most recent on-screen words reflects the user's current actual input requirement more accurately in terms of semantics, the error correction result of the target character string relative to the on-screen phrase can be determined from the plurality of pending error correction results according to the different words identified by the result vector and the probabilities corresponding to those words. The error correction result is therefore more likely to meet the user's actual input requirement, the candidates displayed according to it are more likely to meet the user's expectations, and the user's input experience is improved.

Drawings

To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.

Fig. 1 is a schematic diagram of a character string error correction system according to an embodiment of the present application;

Fig. 2 is a flowchart of a character string error correction method according to an embodiment of the present application;

Fig. 3 is a structural diagram of a character string error correction device according to an embodiment of the present application;

Fig. 4 is a block diagram of an apparatus for a character string error correction method according to an embodiment of the present application;

Fig. 5 is a block diagram of a server according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described below with reference to the accompanying drawings.

Conventional error correction considers only the errors in the character string itself when producing an error correction result and does not consider the user's actual input requirement. Even if the resulting correction matches the typical mis-input characteristics of the character string, it may not fit semantically once combined with the context. As a result, the candidates displayed according to that error correction result may not meet the user's actual input requirement, which degrades the user's input experience.

To this end, the embodiments of the application provide a character string error correction method that can be applied to a terminal on which an input method is installed. The terminal may be a mobile phone, a notebook computer, a desktop computer, a tablet computer, an e-book reader, a Moving Picture Experts Group Audio Layer IV (MP4) player, a wearable device (such as a smart watch), a smart speaker, a laptop computer, or the like.

The user can input a character string using the input method installed on the terminal, and put text on screen by selecting a candidate displayed for that character string. The user may input the character string through the input method in different ways, for example through a virtual keyboard, a handwriting area, or voice input. The input character string may consist of characters in different languages and may include letters, numbers, symbols, and the like. For example, for Chinese, the character string may include pinyin. It should be noted that, in the embodiments of the present application, the input method may be a common Chinese input method (such as a Pinyin, Wubi, or Zhuyin input method) or an input method for another language (such as a Japanese hiragana input method or a Korean input method).

After the user selects a candidate corresponding to a character string and puts it on screen, in the embodiments of the application a result vector corresponding to the on-screen phrase can be determined from the deep learning model and the one or more on-screen words that the user has put on screen in sequence. The result vector then serves as the basis for correcting the target character string that the user inputs after the on-screen phrase. In the embodiments of the present application, a word may be a basic constituent unit of a language. Taking Chinese as an example, a word may include one or more Chinese characters; a word may also be a spacer, such as a punctuation mark. A word can provide complete semantic information and can be obtained by semantically segmenting a phrase. An on-screen word is a word that the user has selected and put on screen, and an on-screen phrase is a phrase formed by one or more such on-screen words. For example, when the on-screen phrase input by the user is "tomorrow writing", segmentation yields two on-screen words, "tomorrow" and "writing".

For example, in the terminal shown in fig. 1, a session window of social software or an editing window of a short message or mail is displayed.

Reference numeral 101 denotes an on-screen editing area, which displays text that the user has put on screen but not yet sent; the user can edit the text in this area.

Reference numeral 102 denotes a character string editing area, which displays the character string input by the user; the user can edit the input character string.

Reference numeral 103 denotes a candidate display area, in which candidates corresponding to the character string input by the user can be displayed.

It should be noted that fig. 1 illustrates only one possible input scenario of the embodiments of the present application and does not limit the embodiments to that scenario. For example, in an input scenario in which a user composes a document, the area of the document in which the user's on-screen text is displayed may also serve as the on-screen editing area, and the area in which the input character string is displayed at the input position serves as the character string editing area.

The result vector corresponding to the on-screen phrase may be determined from the on-screen phrase and the deep learning model 105 (which may be, for example, an LSTM model), and identifies the probabilities of different words appearing after the on-screen phrase. The higher the probability that the result vector identifies for a word, the more likely, semantically, that word is to appear next after the on-screen phrase. For example, when the on-screen phrase is "tomorrow writing", the words to which the result vector assigns higher probabilities may include "paper", "job", "complete", "what", and the like.
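
As a rough illustration of what such a result vector might look like, the sketch below turns per-word scores into probabilities with a softmax. This is an assumption: the text does not say how the model produces its probabilities, and the scores here are invented stand-ins for the output of a trained model rather than values from any real LSTM. The names `softmax` and `result_vector` are hypothetical.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: non-negative values that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def result_vector(vocab: List[str], scores: List[float]) -> Dict[str, float]:
    """Pair each vocabulary word with its next-word probability."""
    return dict(zip(vocab, softmax(scores)))


# Hypothetical scores for words following the on-screen phrase "tomorrow writing".
vocab = ["paper", "job", "complete", "what", "Liu Wen"]
scores = [3.1, 2.4, 1.9, 1.5, -2.0]

rv = result_vector(vocab, scores)
top3 = sorted(rv, key=rv.get, reverse=True)[:3]
print(top3)
```

With these made-up scores, "paper" ranks far above "Liu Wen", which is the kind of contrast the later selection step relies on.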

When the user continues to input a character string after the on-screen phrase, that character string is referred to in the embodiments of the application as the target character string. If the input method finds that the target character string contains an input error, it can correct the target character string and obtain at least one pending error correction result. For example, when the target character string is "lnu'wen" as shown in fig. 1, the corresponding pending error correction results may include "liu'wen" and "lun'wen". When a plurality of pending error correction results are obtained, the prior art determines the error correction result corresponding to the target character string directly from among them. In the embodiments of the present application, by contrast, the previously determined result vector is used as the basis, and the error correction result corresponding to the target character string is determined from the plurality of pending error correction results according to the result vector. For example, as shown in fig. 1, according to the result vector of the on-screen phrase "tomorrow writing", "lun'wen" is determined from the pending error correction results "liu'wen" and "lun'wen" as the error correction result corresponding to the target character string "lnu'wen", and the candidate "paper" corresponding to the error correction result "lun'wen" may be displayed in the candidate display area 103. In the prior art, the pending error correction result "liu'wen" might instead be determined as the error correction result corresponding to "lnu'wen", and the candidate "Liu Wen" corresponding to "liu'wen" would be displayed to the user.

It should be noted that the "'" in the foregoing character strings may be a segmentation symbol generated by the input method after segmenting the character string input by the user, and need not be input by the user.

It should be noted that, in the foregoing examples, the target character string is in full-spelling (full pinyin) form. The specific form of the target character string depends on the user's input habits: it may be in full-spelling form, partial-spelling form, abbreviated-spelling form, or the like. The corresponding pending error correction results may likewise be in partial-spelling form, abbreviated-spelling form, and so on.

Because the on-screen phrase formed by the most recent on-screen words reflects the user's current actual input requirement more accurately in terms of semantics, the error correction result of the target character string relative to the on-screen phrase can be determined from the plurality of pending error correction results according to the different words identified by the result vector and the probabilities corresponding to those words. The error correction result is therefore more likely to meet the user's actual input requirement, the candidates displayed according to it are more likely to meet the user's expectations, and the user's input experience is improved. In the foregoing input scenario for "lnu'wen", the embodiments of the present application and the prior art determine different error correction results. The result obtained by the embodiments of the present application takes the semantic association with the context into account and better matches the user's actual input requirement, so the displayed candidates are more likely to be selected by the user.

Next, a method for correcting character strings according to an embodiment of the present application will be described with reference to the accompanying drawings, as shown in fig. 2, where the method includes:

S201: Determine the on-screen phrase.

The on-screen phrase comprises at least one on-screen word put on screen in sequence by a user.

The embodiments of the application provide several ways of determining the on-screen phrase. In general, the on-screen phrase should not include too many on-screen words: the input requirement reflected by an on-screen phrase made up of too many on-screen words may deviate from the user's actual input requirement, and passing such a phrase through the deep learning model increases the amount of computation and the load on the system.

Therefore, the number of on-screen words in the on-screen phrase can be limited by a predetermined threshold. Alternatively, the on-screen phrase can be determined by the spacers in the on-screen content: because a spacer separates sentences, an on-screen phrase determined by spacers includes at most one sentence, which also limits the number of on-screen words in the phrase.
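
The two limiting rules can be sketched as follows. The function `limit_phrase`, the `SPACERS` set, and the parameter names are hypothetical, and the sketch assumes the on-screen content has already been segmented into a list of words in which spacers appear as separate items. The text presents the two rules as alternatives; the sketch lets either one be switched off.

```python
from typing import List, Optional

# Assumed set of spacers (sentence-separating punctuation); not taken from the text.
SPACERS = {"，", "。", "！", "？", "；", ",", ".", "!", "?", ";"}


def limit_phrase(words: List[str],
                 max_words: Optional[int] = None,
                 use_spacers: bool = True) -> List[str]:
    """Return the trailing on-screen words that form the on-screen phrase."""
    tail = list(words)
    if use_spacers:
        # Keep only the words after the last spacer, i.e. at most one sentence.
        for i in range(len(tail) - 1, -1, -1):
            if tail[i] in SPACERS:
                tail = tail[i + 1:]
                break
    if max_words is not None:
        # Cap the number of on-screen words by the predetermined threshold.
        tail = tail[-max_words:] if max_words > 0 else []
    return tail


print(limit_phrase(["hello", "。", "tomorrow", "writing"]))
print(limit_phrase(["a", "b", "c", "d"], max_words=2, use_spacers=False))
```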

It should be noted that the on-screen phrase includes at least one on-screen word put on screen in sequence by the user, and which on-screen word or words are used as the on-screen phrase may be determined according to the input focus.

The input focus mentioned in the embodiments of the present application is described first. The input focus is located in the on-screen editing area and identifies the position at which text will appear the next time something is put on screen. It is usually shown to the user as a blinking indicator, but it may also be invisible. In the input scenario shown in fig. 1, 104 may be the input focus, indicating that text will appear after "writing" when the user next puts text on screen. The user can adjust the position of the input focus within the on-screen text as needed. For example, after putting "tomorrow writing" on screen, a user who wants to insert "afternoon" after "tomorrow" can move the input focus to between "tomorrow" and "writing".

When on-screen text is displayed from left to right, the on-screen words put on screen in sequence by the user are those located before the input focus, that is, the on-screen words the user put on screen in sequence before the position of the input focus. When the input focus is at the position shown as 104 in fig. 1, the on-screen words are, in order, "tomorrow" and "writing", so the determined on-screen phrase may be "tomorrow writing" or "writing". When the input focus is moved to between "tomorrow" and "writing", the only on-screen word before it is "tomorrow", and the determined on-screen phrase may be "tomorrow".

It should be noted that the on-screen phrase serves as the basis for determining the error correction result corresponding to the target character string from the plurality of pending error correction results, and the on-screen word immediately preceding the input focus is generally the most likely to be semantically related to the target character string input by the user. Therefore, in one possible implementation, when the on-screen phrase is determined through the input focus, it includes at least the on-screen word immediately preceding the position of the input focus. For example, when the input focus is at the position shown as 104 in fig. 1, the on-screen word immediately preceding the input focus is "writing", and the determined on-screen phrase must include at least "writing".
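
A minimal sketch of the focus-based rule follows. It assumes the input focus is represented as a word index (the number of on-screen words located before it); a real input method would more likely track a character offset and map it to words. The function `phrase_before_focus` and its parameters are hypothetical.

```python
from typing import List


def phrase_before_focus(words: List[str], focus: int, max_words: int = 2) -> List[str]:
    """Return up to max_words on-screen words that immediately precede the focus.

    focus is the number of on-screen words located before the input focus.
    max_words must be at least 1 so that the word right before the focus
    is always part of the phrase.
    """
    if not 0 <= focus <= len(words):
        raise ValueError("focus must lie within the on-screen words")
    if max_words < 1:
        raise ValueError("the phrase must include the word before the focus")
    return words[:focus][-max_words:]


on_screen = ["tomorrow", "writing"]
print(phrase_before_focus(on_screen, focus=2))               # focus after "writing"
print(phrase_before_focus(on_screen, focus=2, max_words=1))  # only the previous word
print(phrase_before_focus(on_screen, focus=1))               # focus moved before "writing"
```

The three calls correspond to the three cases in the text: the phrase "tomorrow writing", the phrase "writing", and the phrase "tomorrow" after the focus is moved.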

S202: Obtain a result vector according to the on-screen phrase and the deep learning model.

Wherein the result vector is used to identify the probability of different words occurring after the on-screen phrase.

S203: Correct the target character string input after the on-screen phrase to obtain a plurality of pending error correction results.

First, the target character string in the embodiments of the present application is clarified. The target character string is a character string input after the on-screen phrase, and it can be identified through the input focus used to determine the on-screen phrase. The on-screen words before the current input focus form the basis of the on-screen phrase, and the input focus identifies the position at which the user currently intends to continue putting text on screen. Once the input focus is determined, the corresponding on-screen phrase can be determined; as long as the input focus does not change, the character string that the user goes on to input is the character string input after the on-screen phrase, that is, the target character string. If the user selects a candidate corresponding to the target character string and puts it on screen, it appears immediately after the on-screen phrase. For example, when the on-screen phrase is "tomorrow writing", the input focus is after "writing"; if the user then inputs "lnu'wen", "lnu'wen" is the target character string, and if the user selects a candidate of the error correction result corresponding to "lnu'wen", that candidate is put on screen immediately after "writing".

A pending error correction result is obtained by correcting the target character string when the target character string contains an input error. It is a corrected version of the target character string, that is, a character string without the input error.

In the embodiments of the application, a plurality of pending error correction results can be obtained by correcting the target character string. Taking the target character string "lnu'wen" as an example, it is difficult to find a candidate corresponding to "lnu" in terms of semantics or word formation, so "lnu" can be regarded as a mis-input. Based on the "wen" that follows it, at least two pending error correction results can be obtained, namely "liu'wen" and "lun'wen".
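
The text does not say how the pending error correction results are generated. One plausible approach, sketched below purely as an assumption, is to take each segment that is not a valid syllable and look for valid syllables that are one adjacent-character swap or one single-character substitution away. The syllable set is a tiny hypothetical stand-in for a full pinyin table, and the sketch only repairs one mistyped segment at a time.

```python
from typing import List, Set

# Tiny hypothetical syllable table; a real input method would use a full one.
VALID_SYLLABLES: Set[str] = {"lun", "liu", "wen"}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def syllable_edits(segment: str) -> Set[str]:
    """Valid syllables one adjacent swap or one substitution away from segment."""
    variants = set()
    for i in range(len(segment) - 1):
        # Swap neighbouring characters (wrong position in the string).
        variants.add(segment[:i] + segment[i + 1] + segment[i] + segment[i + 2:])
    for i in range(len(segment)):
        # Replace one character (wrong character typed).
        for ch in ALPHABET:
            variants.add(segment[:i] + ch + segment[i + 1:])
    variants.discard(segment)
    return variants & VALID_SYLLABLES


def pending_results(target: str) -> List[str]:
    """Build pending error correction results for an apostrophe-segmented string."""
    parts = target.split("'")
    results = []
    for i, part in enumerate(parts):
        if part in VALID_SYLLABLES:
            continue  # this segment looks fine, leave it alone
        for fixed in sorted(syllable_edits(part)):
            results.append("'".join(parts[:i] + [fixed] + parts[i + 1:]))
    return results


print(pending_results("lnu'wen"))
```

For "lnu'wen", the swap rule yields "lun" and the substitution rule yields "liu", giving the same two pending results as the example in the text.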

S204: Determine an error correction result corresponding to the target character string from the plurality of pending error correction results according to the result vector.

Because the on-screen phrase formed by the most recent on-screen words reflects the user's current actual input requirement more accurately in terms of semantics, the error correction result of the target character string relative to the on-screen phrase can be determined from the plurality of pending error correction results according to the different words identified by the result vector corresponding to the on-screen phrase and the probabilities corresponding to those words. The error correction result is therefore more likely to meet the user's actual input requirement, so the candidates displayed according to it are more likely to meet the user's needs, and the user's input experience is improved.

In one possible implementation manner, since the result vector can identify the probability that different words appear after the on-screen phrase, if any one of the pending error correction results is the same as the character string corresponding to the word identified by the result vector, the probability that the word appears in the result vector can be used as the probability that the pending error correction result appears after the on-screen phrase.

After the probabilities of all or some of the pending error correction results relative to the on-screen phrase have been determined, these probabilities can be used as a basis for adjustment to help determine the error correction result corresponding to the target character string from the pending error correction results, so that the candidates corresponding to the error correction result better meet the user's current input requirement.

So that the candidates corresponding to the error correction result determined according to the result vector meet the user's current input requirement, the error correction result corresponding to the target character string may be a pending error correction result, among the plurality of pending error correction results, whose probability as identified by the result vector is higher than a predetermined threshold. The purpose of the predetermined threshold is to select the pending error correction results that reflect requirements similar or identical to the user's current actual input requirement. The predetermined threshold may be set and adjusted according to different application scenarios or computing requirements.
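
The threshold-based selection can be sketched as below. The sketch assumes the result vector is available as a word-to-probability mapping and that each word's input string is known; the function `select_corrections`, the probabilities, the pinyin strings, and the threshold value are all hypothetical.

```python
from typing import Dict, List


def select_corrections(pending: List[str],
                       word_probs: Dict[str, float],
                       string_of: Dict[str, str],
                       threshold: float) -> List[str]:
    """Keep pending results that equal the input string of a sufficiently likely word."""
    # Step 1: words whose probability in the result vector exceeds the threshold.
    likely_words = [w for w, p in word_probs.items() if p > threshold]
    # Step 2: the character strings corresponding to those words.
    likely_strings = {string_of[w] for w in likely_words if w in string_of}
    # Step 3: pending results identical to one of those strings.
    return [candidate for candidate in pending if candidate in likely_strings]


# Hypothetical result vector for the on-screen phrase "tomorrow writing".
word_probs = {"paper": 0.41, "job": 0.22, "Liu Wen": 0.003}
string_of = {"paper": "lun'wen", "job": "zuo'ye", "Liu Wen": "liu'wen"}

print(select_corrections(["liu'wen", "lun'wen"], word_probs, string_of, threshold=0.1))
```

Lowering the threshold far enough would let both pending results through, which is consistent with the note that one or more error correction results may be determined.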

It should be noted that one or more error correction results may be determined for the target character string.

After determining the error correction result corresponding to the target character string from the plurality of undetermined error correction results, the input method can display the corresponding candidate item according to the error correction result. Because the error correction result is more likely to meet the actual input requirement of the user, the candidate displayed according to the error correction result is more likely to meet the user's expectations, and the input experience of the user is improved.

Under the condition that a plurality of determined error correction results exist, candidates corresponding to different error correction results can be displayed successively or in a staggered mode according to a preset rule.
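The application does not specify the preset rule. The sketch below shows two possible readings, given only as assumptions: "successively" is taken to mean all candidates of one error correction result followed by those of the next, and "staggered" is taken to mean a round-robin interleaving. The function names and the candidate labels are hypothetical.

```python
from itertools import chain, zip_longest
from typing import Dict, List


def display_successively(candidates: Dict[str, List[str]]) -> List[str]:
    """All candidates of the first error correction result, then the next, and so on."""
    return list(chain.from_iterable(candidates.values()))


def display_staggered(candidates: Dict[str, List[str]]) -> List[str]:
    """Round-robin: take one candidate from each error correction result in turn."""
    missing = object()  # sentinel marking exhausted candidate lists
    rounds = zip_longest(*candidates.values(), fillvalue=missing)
    return [c for one_round in rounds for c in one_round if c is not missing]


# Hypothetical candidates keyed by the error correction result that produced them.
candidates = {
    "lun'wen": ["candidate A1", "candidate A2", "candidate A3"],
    "shen'me": ["candidate B1"],
}
successive_order = display_successively(candidates)
staggered_order = display_staggered(candidates)
```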

As can be seen from the above embodiment, an on-screen phrase of the user is determined, where the on-screen phrase may include a plurality of on-screen words that the user has put on screen in sequence. The on-screen phrase is input into the deep learning model to obtain a result vector, which identifies the probabilities of different words appearing after the on-screen phrase; the higher the probability identified for a word, the more likely that word is, in terms of semantics, to appear after the on-screen phrase. When the target character string input after the on-screen phrase needs to be corrected, a plurality of pending error correction results can be obtained according to the target character string. Because the on-screen phrase formed by the most recent on-screen words more accurately reflects, in terms of semantics, the user's current actual input requirement, the error correction result of the target character string relative to the on-screen phrase can be determined from the plurality of pending error correction results according to the different words identified by the result vector and their corresponding probabilities. The error correction result is therefore more likely to meet the user's actual input requirement, the candidates displayed according to it are more likely to meet the user's expectations, and the user's input experience is improved.

The embodiment of the present application provides a plurality of specific determination manners for S204, and two of them will be described in detail below.

The first determination mode is as follows:

S301: Determine, according to the result vector, at least one word whose probability is higher than a preset threshold.

The purpose of this preset threshold is to select, from the words identified by the result vector, the words that reflect a requirement similar or identical to the user's current actual input requirement. The preset threshold may be set and adjusted according to different application scenarios or different computing requirements. In this way, S301 selects the words identified with a higher probability in the result vector.

S302: Determine the character strings respectively corresponding to the at least one word.

After determining at least one word according to S301, a character string corresponding to the at least one word may be further determined, so as to facilitate matching with a pending error correction result in a character string form.

For example, when the at least one word is "paper", "job", and "what", the character strings corresponding to these words may be "lun'wen", "zuo'ye", and "shen'me", respectively. It should be noted that the character strings corresponding to the at least one word may also take forms other than the full-spelling form, such as a partial full-spelling form, a simple spelling form, and the like. The specific form depends on the subsequent matching manner and the matching object, and is not described further here.
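To illustrate how one word may correspond to character strings in more than one form, the sketch below derives a simple spelling form from the syllable-separated strings in the example. Reading "simple spelling" as the first letter of each syllable is an assumption, as are the function name `simple_spelling` and the apostrophe separator; the application does not define these forms.

```python
def simple_spelling(full_spelling: str, separator: str = "'") -> str:
    """Reduce each syllable of a separator-delimited string to its first letter.

    Assumption: the simple spelling form keeps one letter per syllable.
    """
    syllables = [part.strip() for part in full_spelling.split(separator)]
    return separator.join(s[0] for s in syllables if s)


full_strings = ["lun'wen", "zuo'ye", "shen'me"]
simple_strings = {s: simple_spelling(s) for s in full_strings}
```

Whichever form is chosen, the pending error correction results would need to be expressed in the same form for the later string comparison to be meaningful.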

In one possible implementation, after the character strings corresponding to the at least one word determined in S301 have been determined, these character strings and the probabilities with which the result vector identifies the at least one word may be listed in correspondence for subsequent matching.

S303: Determine, as the error correction result corresponding to the target character string, a pending error correction result among the plurality of pending error correction results that is the same as one of the character strings.

Since the pending error correction results are in the form of character strings, they can be matched against the character strings determined in S302, and there are two possible cases: matching and not matching.

Because the words corresponding to the character strings determined in S302 are words whose probability is higher than the preset threshold, that is, words that are more likely to appear after the on-screen phrase, their semantic continuity with the on-screen phrase is good and they can meet the user's current input requirement. Therefore:

If a pending error correction result does not match any character string determined in S302, the semantic continuity between the pending error correction result and the on-screen phrase may be considered poor, and the pending error correction result is unlikely to meet the user's current input requirement. This pending error correction result should not be taken as the error correction result for the target character string.

If a pending error correction result matches the character string determined in S302, the semantic continuity between the pending error correction result and the on-screen phrase can be considered to be good, so that the current input requirement of the user can be met. This pending error correction result may be used as the error correction result for the target string.

In one possible implementation, S303 may be implemented by looking up the correspondence list that may be established in S302.
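Steps S301 to S303 can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the application: the function name `first_determination_mode`, the word-to-string lexicon `word_to_string`, the threshold of 0.72 and all sample values are hypothetical.

```python
from typing import Dict, List


def first_determination_mode(
    word_probabilities: Dict[str, float],  # word -> probability from the result vector
    word_to_string: Dict[str, str],        # word -> character string (hypothetical lexicon)
    pending_results: List[str],
    threshold: float,
) -> List[str]:
    # S301: words whose probability is higher than the preset threshold.
    likely_words = [w for w, p in word_probabilities.items() if p > threshold]
    # S302: character strings of those words, listed against their probabilities.
    correspondence = {word_to_string[w]: word_probabilities[w] for w in likely_words}
    # S303: keep the pending results that equal one of the listed character strings.
    return [r for r in pending_results if r in correspondence]


word_probabilities = {"paper": 0.80, "job": 0.70, "what": 0.75}
word_to_string = {"paper": "lun'wen", "job": "zuo'ye", "what": "shen'me"}
pending_results = ["lun'wen", "lun'wan", "zuo'ye"]
corrections = first_determination_mode(
    word_probabilities, word_to_string, pending_results, threshold=0.72
)
```

In this run "zuo'ye" is rejected because its word falls below the assumed threshold, and "lun'wan" is rejected because no identified word has that character string.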

The second determination mode:

S401: Establish a correspondence between the word identifiers of the different words identified by the result vector and the probabilities of those words.

Because the result vector can identify the probabilities of different words appearing after the on-screen phrase, once word identifiers have been allocated one by one to the different words identified by the result vector, the correspondence between the word identifiers and the probabilities of those words can be determined according to the result vector. For example, if the words identified by the result vector include "paper", "job", "what", and the like, word identifier id1 may be allocated to "paper", id2 to "job", and id3 to "what"; a correspondence may then be established between id1 and the 80% probability identified for "paper" in the result vector, between id2 and the 70% probability identified for "job", and between id3 and the 75% probability identified for "what".
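The allocation in this example can be sketched as below. The function name `build_correspondence` and the sequential `idN` naming scheme are assumptions made for illustration; the probabilities are the ones given in the example.

```python
from typing import Dict, List, Tuple


def build_correspondence(
    identified: List[Tuple[str, float]],
) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Allocate word identifiers one by one and map each identifier to the
    probability with which the result vector identifies its word."""
    word_ids: Dict[str, str] = {}
    id_to_probability: Dict[str, float] = {}
    for index, (word, probability) in enumerate(identified, start=1):
        word_id = f"id{index}"  # hypothetical identifier format
        word_ids[word] = word_id
        id_to_probability[word_id] = probability
    return word_ids, id_to_probability


word_ids, id_to_probability = build_correspondence(
    [("paper", 0.80), ("job", 0.70), ("what", 0.75)]
)
```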

S402: Establish a character string query tree according to the character strings corresponding to the different words identified by the result vector, where each character string in the character string query tree carries the word identifier of its corresponding word.

After the correspondence is determined according to S401, a query tree may be established in preparation for querying the pending error correction results. Because the pending error correction results are in the form of character strings, the query tree may, for ease of querying, be a character string query tree: the nodes in the tree are the character strings corresponding to the words identified by the result vector, and one node may correspond to the character string of one word. In the established character string query tree, each node carries the word identifier of the word corresponding to the character string included in that node.

By establishing a character string query tree, the query efficiency of candidate pending error correction results can be improved, and therefore the probability corresponding to each pending error correction result can be determined efficiently according to the result vector.

S403: Query the character string query tree according to the plurality of pending error correction results.

S404: If a target pending error correction result among the plurality of pending error correction results obtains a query result, determine the probability corresponding to the target pending error correction result according to the word identifier in the query result and the correspondence.

After the plurality of pending error correction results corresponding to the target character string are determined, the word identifiers that may correspond to them can be queried in the character string query tree established in S402. For a given pending error correction result, the query may or may not find a corresponding word identifier. When a word identifier corresponding to a pending error correction result is found, the probability in the result vector corresponding to that word identifier can be determined according to the correspondence established in S401.

It should be noted that, if a pending error correction result fails to query the word identifier according to the string query tree determined in S402, it may be considered that the semantic continuity between the pending error correction result and the on-screen phrase is poor, and it is difficult to meet the current input requirement of the user. This pending error correction result should not be taken as the error correction result for the target string.

S405: If the probability corresponding to the target pending error correction result meets a preset condition, determine the target pending error correction result as an error correction result corresponding to the target character string.

For those pending error correction results for which a word identifier was found, it still needs to be judged whether they can serve as the error correction result corresponding to the target character string. The judgment can be made according to the probabilities determined for these pending error correction results from the correspondence. If the probability corresponding to any one of them, for example the target pending error correction result, meets the preset condition, that target pending error correction result can be determined as the error correction result corresponding to the target character string.

The purpose of the preset condition is to select, from the words identified by the result vector, the words that reflect a requirement similar or identical to the user's current actual input requirement. A target pending error correction result whose probability meets the preset condition therefore reflects such a requirement. The preset condition can be set and adjusted according to different application scenarios or different computing requirements. In this way, S405 selects, from the pending error correction results, those identified with a higher probability.
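Steps S402 to S405 can be sketched as follows, taking the correspondence of S401 as an input. Several points are assumptions rather than details from the application: the tree is modelled as a character-level prefix tree in which the node ending a word's character string stores the word identifier, the preset condition is taken to be "probability higher than `min_probability`", and the names `StringQueryTree` and `second_determination_mode` are hypothetical.

```python
from typing import Dict, List, Optional


class StringQueryTree:
    """Character-level prefix tree; the node that ends a word's character
    string stores that word's identifier."""

    def __init__(self) -> None:
        self.children: Dict[str, "StringQueryTree"] = {}
        self.word_id: Optional[str] = None

    def insert(self, string: str, word_id: str) -> None:
        node = self
        for ch in string:
            node = node.children.setdefault(ch, StringQueryTree())
        node.word_id = word_id

    def query(self, string: str) -> Optional[str]:
        node: Optional["StringQueryTree"] = self
        for ch in string:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.word_id


def second_determination_mode(
    string_to_id: Dict[str, str],         # character string -> word identifier
    id_to_probability: Dict[str, float],  # correspondence from S401
    pending_results: List[str],
    min_probability: float,
) -> List[str]:
    # S402: build the character string query tree.
    tree = StringQueryTree()
    for string, word_id in string_to_id.items():
        tree.insert(string, word_id)
    selected = []
    for pending in pending_results:
        word_id = tree.query(pending)             # S403: query the tree
        if word_id is None:
            continue                              # no query result: not a correction
        probability = id_to_probability[word_id]  # S404: look up the probability
        if probability > min_probability:         # S405: assumed preset condition
            selected.append(pending)
    return selected


string_to_id = {"lun'wen": "id1", "zuo'ye": "id2", "shen'me": "id3"}
id_to_probability = {"id1": 0.80, "id2": 0.70, "id3": 0.75}
corrections = second_determination_mode(
    string_to_id, id_to_probability, ["lun'wen", "lun'wan", "zuo'ye"], 0.72
)
```

Unlike the first determination mode, the threshold is applied here after the lookup, so the tree is built once from all identified words and each pending result costs one traversal proportional to its length.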

Fig. 3 is a block diagram of an apparatus for correcting a character string according to an embodiment of the present application, where the apparatus includes a first determining unit 301, a learning unit 302, an error correcting unit 303, and a second determining unit 304:

the first determining unit 301 is configured to determine a screen phrase, where the screen phrase includes at least one screen word that a user screens in sequence;

the learning unit 302 is configured to obtain a result vector according to the on-screen phrase and the deep learning model, where the result vector is used to identify probabilities of different words appearing after the on-screen phrase;

the error correction unit 303 is configured to perform error correction on the target character string input after the on-screen phrase to obtain a plurality of pending error correction results;

the second determining unit 304 is configured to determine, according to the result vector, the error correction result corresponding to the target character string from the plurality of pending error correction results.
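The cooperation of the four units can be sketched as a simple composition. Everything below is a hypothetical illustration: the class name, the callable signatures and the stub behaviours (including the fixed mapping that stands in for the deep learning model) are assumptions and do not come from the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StringErrorCorrectionApparatus:
    first_determining_unit: Callable[[List[str]], List[str]]       # unit 301
    learning_unit: Callable[[List[str]], Dict[str, float]]         # unit 302
    error_correcting_unit: Callable[[str], List[str]]              # unit 303
    second_determining_unit: Callable[[List[str], Dict[str, float]], List[str]]  # unit 304

    def correct(self, on_screen_words: List[str], target_string: str) -> List[str]:
        phrase = self.first_determining_unit(on_screen_words)  # on-screen phrase
        result_vector = self.learning_unit(phrase)              # word string -> probability
        pending = self.error_correcting_unit(target_string)     # pending results
        return self.second_determining_unit(pending, result_vector)


# Stub units for illustration only.
apparatus = StringErrorCorrectionApparatus(
    first_determining_unit=lambda words: words[-2:],
    learning_unit=lambda phrase: {"lun'wen": 0.80, "zuo'ye": 0.70},
    error_correcting_unit=lambda target: [target, "lun'wen"],
    second_determining_unit=lambda pending, vector: [
        p for p in pending if vector.get(p, 0.0) > 0.72
    ],
)
result = apparatus.correct(["write", "my", "graduation"], "lun'wem")
```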

Optionally, the device further comprises a display unit:

It can be seen that an on-screen phrase of the user is determined, where the on-screen phrase may include a plurality of on-screen words that the user has put on screen in sequence. The on-screen phrase is input into the deep learning model to obtain a result vector, which identifies the probabilities of different words appearing after the on-screen phrase; the higher the probability identified for a word, the more likely that word is, in terms of semantics, to appear after the on-screen phrase. When the target character string input after the on-screen phrase needs to be corrected, a plurality of pending error correction results can be obtained according to the target character string. Because the on-screen phrase formed by the most recent on-screen words more accurately reflects, in terms of semantics, the user's current actual input requirement, the error correction result of the target character string relative to the on-screen phrase can be determined from the plurality of pending error correction results according to the different words identified by the result vector and their corresponding probabilities. The error correction result is therefore more likely to meet the user's actual input requirement, the candidates displayed according to it are more likely to meet the user's expectations, and the user's input experience is improved.

The specific manner in which the various modules perform the operations in the apparatus of the above embodiments has been described in detail in connection with the embodiments of the method, and will not be described in detail herein.

Fig. 4 is a block diagram illustrating a character string error correction apparatus 400 according to an exemplary embodiment. For example, apparatus 400 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.

Referring to Fig. 4, the apparatus 400 may include one or more of the following: a processing component 402, a memory 404, a power supply component 406, a multimedia component 408, an audio component 410, an input/output (I/O) interface 412, a sensor component 414, and a communication component 416.

The processing component 402 generally controls the overall operation of the apparatus 400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 402 can include one or more modules that facilitate interaction between the processing component 402 and other components. For example, the processing component 402 may include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.

Memory 404 is configured to store various types of data to support operations at device 400. Examples of such data include instructions for any application or method operating on the apparatus 400, contact data, phonebook data, messages, pictures, videos, and the like. The memory 404 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.

The power supply component 406 provides power to the various components of the apparatus 400. The power supply component 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 400.

The multimedia component 408 includes a screen that provides an output interface between the apparatus 400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 408 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 400 is in an operational mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

The audio component 410 is configured to output and/or input audio signals. For example, the audio component 410 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 400 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 404 or transmitted via the communication component 416. In some embodiments, audio component 410 further includes a speaker for outputting audio signals.

The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.

The sensor assembly 414 includes one or more sensors for providing status assessment of various aspects of the apparatus 400. For example, the sensor assembly 414 may detect the on/off state of the apparatus 400 and the relative positioning of components, such as the display and keypad of the apparatus 400. The sensor assembly 414 may also detect a change in position of the apparatus 400 or of one component of the apparatus 400, the presence or absence of user contact with the apparatus 400, the orientation or acceleration/deceleration of the apparatus 400, and a change in temperature of the apparatus 400. The sensor assembly 414 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor assembly 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 414 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 416 is configured to facilitate communication between the apparatus 400 and other devices in a wired or wireless manner. The apparatus 400 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 416 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.

Embodiments of the present application also provide a non-transitory computer-readable storage medium, such as memory 404, comprising instructions executable by processor 420 of apparatus 400 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be ROM, Random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, cause the electronic device to perform a method of character string error correction, the method comprising:

Fig. 5 is a schematic structural diagram of a server in an embodiment of the present application. The server 500 may vary considerably in configuration or performance and may include one or more central processing units (CPU) 522 (e.g., one or more processors), memory 532, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 542 or data 544. The memory 532 and the storage medium 530 may be transitory or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 522 may be configured to communicate with the storage medium 530 and to execute, on the server 500, the series of instruction operations in the storage medium 530.

The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input/output interfaces 555, one or more keyboards 556, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.

Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, where the above program may be stored in a computer readable storage medium, and when the program is executed, the program performs steps including the above method embodiments; and the aforementioned storage medium may be at least one of the following media: read-only memory (ROM), RAM, magnetic disk or optical disk, etc., which can store program codes.

It should be noted that, in the present specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment is mainly described in a different point from other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, with reference to the description of the method embodiments in part. The apparatus and system embodiments described above are merely illustrative, in which elements illustrated as separate elements may or may not be physically separate, and elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

The foregoing is merely one specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for error correction of a character string, the method comprising:

inputting the on-screen phrase into a deep learning model to obtain a result vector, wherein the result vector is used for identifying the probability of different words appearing after the on-screen phrase;

determining an error correction result corresponding to the target character string from the plurality of pending error correction results according to the result vector, which specifically comprises:

determining at least one word with probability higher than a preset threshold according to the result vector; determining character strings corresponding to the at least one word respectively; establishing a corresponding relation list based on the character strings respectively corresponding to the at least one word and the probability of the at least one word identified by the result vector; determining the pending error correction result which is the same as the character string in the plurality of pending error correction results as an error correction result corresponding to the target character string by searching the corresponding relation list;

Or,

establishing a corresponding relation between word identifications of different words identified by the result vector and probabilities of the different words; establishing a character string query tree according to character strings corresponding to different words identified by the result vector, wherein the character strings in the character string query tree have word identifications of the corresponding words; inquiring the character string inquiry tree according to the plurality of undetermined error correction results; if a target pending error correction result in the plurality of pending error correction results obtains a query result, determining a probability corresponding to the target pending error correction result according to a word identifier in the query result and the corresponding relation; and if the probability corresponding to the target undetermined error correction result meets a preset condition, determining the target undetermined error correction result as an error correction result corresponding to the target character string.

2. The method according to claim 1, wherein the method further comprises:

3. The method of claim 1, wherein the on-screen phrase includes a number of on-screen words that is less than a predetermined threshold, or wherein the on-screen phrase includes on-screen words that are determined based on a spacer.

4. The method of claim 1, wherein the on-screen phrase is determined based on a location of an input focus in an on-screen editing area.

5. The method of claim 4, wherein the on-screen phrase includes at least a previous on-screen phrase of the location of the input focus.

6. A character string error correction apparatus, comprising a first determination unit, a learning unit, an error correction unit, and a second determination unit:

the learning unit is used for inputting the on-screen phrase into a deep learning model to obtain a result vector, and the result vector is used for marking the probability of different words appearing after the on-screen phrase;

the second determining unit is configured to determine, according to the result vector, an error correction result corresponding to the target character string from the plurality of pending error correction results;

wherein the second determining unit includes a first determining subunit, a second determining subunit, and a third determining subunit:

the second determining subunit is configured to determine character strings corresponding to the at least one word respectively; establishing a corresponding relation list based on the character strings respectively corresponding to the at least one word and the probability of the at least one word identified by the result vector;

the third determining subunit is configured to determine, by searching the correspondence list, a pending error correction result that is the same as the character string in the plurality of pending error correction results as an error correction result corresponding to the target character string;

or,

the second determining unit comprises a first establishing subunit, a second establishing subunit, a querying subunit, a fourth determining subunit and a fifth determining subunit:

7. The device of claim 6, further comprising a display unit:

8. The apparatus of claim 6, wherein the on-screen phrase comprises a number of on-screen words that is less than a predetermined threshold, or wherein the on-screen phrase comprises an on-screen word that is determined based on a spacer.

9. The apparatus of claim 6, wherein the on-screen phrase is determined based on a location of an input focus in an on-screen editing area.

10. The apparatus of claim 9, wherein the on-screen phrase includes at least a previous on-screen phrase of the location of the input focus.

11. A character string error correction apparatus comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the character string error correction method of any of claims 1 to 5.

12. A non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the character string error correction method of any one of claims 1 to 5.