CN110780749A

CN110780749A - Character string error correction method and device

Info

Publication number: CN110780749A
Application number: CN201810759149.6A
Authority: CN
Inventors: 费腾; 崔欣; 张扬
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2018-07-11
Filing date: 2018-07-11
Publication date: 2020-02-11
Anticipated expiration: 2038-07-11
Also published as: CN110780749B

Abstract

The embodiments of the application disclose a character string error correction method and device. An on-screen phrase is determined, which may comprise several on-screen words that the user has committed to the screen in sequence. The on-screen phrase is input into a deep learning model to obtain a result vector, which identifies the probabilities of different words appearing after the on-screen phrase. When a target character string input after the on-screen phrase needs to be corrected, a plurality of pending error correction results can be obtained from the target character string. Because the on-screen phrase formed by the most recently committed words accurately reflects the user's current input intent at the semantic level, the error correction result for the target character string relative to the on-screen phrase can be selected from the pending error correction results according to the words identified by the result vector and their probabilities. The selected result is therefore more likely to match what the user actually intends to input, which improves the user's input experience.

Description

Character string error correction method and device

Technical Field

The present application relates to the field of input methods, and in particular, to a method and an apparatus for error correction of a character string.

Background

An input method is a common text input tool. For example, it can display candidates corresponding to a character string input by the user and commit the candidate selected by the user to the screen.

When typing a character string, the user may make input errors, such as typing a wrong character or placing a character in the wrong position within the string. If such erroneous strings are not automatically corrected, the input method may show wrong candidates or no candidates at all, resulting in a poor input experience.

In existing error correction approaches, the input method segments the character string input by the user, identifies segments that were likely mistyped according to character combination rules, corrects them, and then displays candidates according to the correction result. However, such approaches consider only the character string itself and not the user's actual input intent, so the displayed candidates may not match what the user actually wants to input, which degrades the input experience.

Disclosure of Invention

To solve this technical problem, the present application provides a character string error correction method and device. According to the words identified by the result vector corresponding to an on-screen phrase and their probabilities, an error correction result that is more likely to meet the user's expectation can be determined from a plurality of pending error correction results, improving the user's input experience.

The embodiment of the application discloses the following technical scheme:

In a first aspect, an embodiment of the present application provides a character string error correction method, the method including:

determining an on-screen phrase, wherein the on-screen phrase comprises at least one on-screen word that the user has committed to the screen in sequence;

obtaining a result vector according to the on-screen phrase and a deep learning model, wherein the result vector identifies the probabilities of different words appearing after the on-screen phrase;

performing error correction on a target character string input after the on-screen phrase to obtain a plurality of pending error correction results;

and determining, according to the result vector, an error correction result corresponding to the target character string from the plurality of pending error correction results.
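The four steps can be wired together as in the following minimal Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: the model, the correction generator, and the word-to-string (pinyin) mapping are hypothetical callables and dictionaries supplied by the caller, S204 applies the threshold rule of the first optional implementation, and the toy vocabulary and probabilities are made up.

```python
from typing import Callable, Dict, List

# Hypothetical types: the description does not define these interfaces.
ResultVector = Dict[str, float]  # word -> probability of appearing next


def correct_target_string(
    on_screen_phrase: List[str],
    target: str,
    model: Callable[[List[str]], ResultVector],
    generate_pending: Callable[[str], List[str]],
    word_to_string: Dict[str, str],
    threshold: float,
) -> List[str]:
    """S202-S204; S201 (choosing on_screen_phrase) is left to the caller."""
    # S202: map the on-screen phrase to a result vector.
    result_vector = model(on_screen_phrase)
    # S203: correct the target string into pending error correction results.
    pending = generate_pending(target)
    # Probability of a string = highest probability among words spelled that
    # way (taking the max over homophones is an assumption).
    string_prob: Dict[str, float] = {}
    for word, prob in result_vector.items():
        spelling = word_to_string.get(word)
        if spelling is not None:
            string_prob[spelling] = max(prob, string_prob.get(spelling, 0.0))
    # S204: keep pending results whose probability exceeds the threshold.
    return [s for s in pending if string_prob.get(s, 0.0) > threshold]


# Hypothetical toy data with made-up probabilities, echoing the Fig. 1 example.
toy_vector = {"论文": 0.40, "作业": 0.30, "刘文": 0.001}
toy_spelling = {"论文": "lun'wen", "作业": "zuo'ye", "刘文": "liu'wen"}
print(correct_target_string(
    ["明天", "写"], "lnu'wen",
    model=lambda phrase: toy_vector,
    generate_pending=lambda s: ["liu'wen", "lun'wen"],
    word_to_string=toy_spelling,
    threshold=0.05,
))  # ["lun'wen"]
```

Passing the model and the correction generator in as callables keeps the sketch independent of any particular network or correction algorithm, neither of which the claims fix.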

Optionally, the error correction result corresponding to the target character string is a pending error correction result, among the plurality of pending error correction results, whose probability as identified by the result vector is higher than a predetermined threshold.

Optionally, the determining, according to the result vector, an error correction result corresponding to the target character string from the plurality of pending error correction results includes:

determining at least one word with a probability higher than a predetermined threshold according to the result vector;

determining the character string corresponding to each of the at least one word;

and determining, as the error correction result corresponding to the target character string, a pending error correction result that is identical to one of those character strings.
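A minimal sketch of these three steps, assuming the result vector is a dictionary from word to probability and each word has exactly one segmented pinyin string. The names `select_via_likely_words` and `word_to_string` are hypothetical, and the probabilities are made up. Unlike a per-candidate lookup, this variant starts from the likely words and then matches the pending results against their strings.

```python
from typing import Dict, List


def select_via_likely_words(
    result_vector: Dict[str, float],
    word_to_string: Dict[str, str],
    pending: List[str],
    threshold: float,
) -> List[str]:
    # Step 1: words the result vector rates above the threshold.
    likely_words = [w for w, p in result_vector.items() if p > threshold]
    # Step 2: the character string (here, segmented pinyin) of each such word.
    likely_strings = {word_to_string[w] for w in likely_words if w in word_to_string}
    # Step 3: pending results identical to one of those strings are kept.
    return [s for s in pending if s in likely_strings]


print(select_via_likely_words(
    {"论文": 0.40, "作业": 0.30, "刘文": 0.001},          # made-up toy values
    {"论文": "lun'wen", "作业": "zuo'ye", "刘文": "liu'wen"},
    ["liu'wen", "lun'wen"],
    threshold=0.05,
))  # ["lun'wen"]
```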

Optionally, the determining, according to the result vector, an error correction result corresponding to the target character string from the plurality of pending error correction results includes:

establishing a correspondence between the word identifiers of the different words identified by the result vector and the probabilities of those words;

establishing a character string query tree according to the character strings corresponding to the different words identified by the result vector, wherein each character string in the query tree carries the word identifier of its corresponding word;

querying the character string query tree with the plurality of pending error correction results;

if the query for a target pending error correction result among the plurality of pending error correction results returns a query result, determining the probability corresponding to the target pending error correction result according to the word identifier in the query result and the correspondence;

and if the probability corresponding to the target pending error correction result meets a preset condition, determining the target pending error correction result as the error correction result corresponding to the target character string.
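The query-tree variant can be sketched with a character trie. This is one possible reading, not the applicant's code: word identifiers are assumed to be integers, the "preset condition" is assumed to be a probability threshold, and when several words share a string (homophones) the sketch takes their highest probability. All function names and values are hypothetical.

```python
from typing import Dict, List


class TrieNode:
    """One node of the character string query tree."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word_ids: List[int] = []  # ids of words whose string ends here


def build_query_tree(id_to_string: Dict[int, str]) -> TrieNode:
    root = TrieNode()
    for word_id, string in id_to_string.items():
        node = root
        for ch in string:
            node = node.children.setdefault(ch, TrieNode())
        node.word_ids.append(word_id)  # tag the string with its word id
    return root


def query_tree(root: TrieNode, string: str) -> List[int]:
    node = root
    for ch in string:
        nxt = node.children.get(ch)
        if nxt is None:
            return []  # no query result for this string
        node = nxt
    return node.word_ids


def select_with_tree(
    id_to_prob: Dict[int, float],   # correspondence: word id -> probability
    id_to_string: Dict[int, str],   # word id -> character string
    pending: List[str],
    threshold: float,
) -> List[str]:
    tree = build_query_tree(id_to_string)
    selected: List[str] = []
    for candidate in pending:
        ids = query_tree(tree, candidate)
        if not ids:
            continue
        prob = max(id_to_prob[i] for i in ids)
        # "Preset condition" is assumed here to be prob > threshold.
        if prob > threshold:
            selected.append(candidate)
    return selected


id_to_prob = {0: 0.40, 1: 0.30, 2: 0.001}  # made-up toy values
id_to_string = {0: "lun'wen", 1: "zuo'ye", 2: "liu'wen"}
print(select_with_tree(id_to_prob, id_to_string, ["liu'wen", "lun'wen", "lin'wen"], 0.05))
# ["lun'wen"]
```

A trie lets strings with a common prefix share nodes, so each lookup costs time proportional to the length of the pending result rather than to the number of words in the result vector.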

Optionally, the method further includes:

and displaying corresponding candidate items according to the error correction result corresponding to the target character string.

Optionally, the number of on-screen words included in the on-screen phrase is smaller than a predetermined threshold, or the on-screen words included in the on-screen phrase are determined according to a delimiter.

Optionally, the on-screen phrase is determined according to a position of an input focus in the on-screen editing area.

Optionally, the on-screen phrase includes at least the on-screen word immediately preceding the input focus.

In a second aspect, an embodiment of the present application provides a character string error correction apparatus, which includes a first determining unit, a learning unit, an error correction unit, and a second determining unit:

the first determining unit is configured to determine an on-screen phrase, the on-screen phrase comprising at least one on-screen word that the user has committed to the screen in sequence;

the learning unit is configured to obtain a result vector according to the on-screen phrase and a deep learning model, the result vector identifying the probabilities of different words appearing after the on-screen phrase;

the error correction unit is configured to perform error correction on a target character string input after the on-screen phrase to obtain a plurality of pending error correction results;

and the second determining unit is configured to determine, according to the result vector, an error correction result corresponding to the target character string from the plurality of pending error correction results.

Optionally, the second determining unit includes a first determining subunit, a second determining subunit, and a third determining subunit:

the first determining subunit is configured to determine, according to the result vector, at least one word with a probability higher than a predetermined threshold;

the second determining subunit is configured to determine the character string corresponding to each of the at least one word;

and the third determining subunit is configured to determine, as the error correction result corresponding to the target character string, a pending error correction result, among the plurality of pending error correction results, that is identical to one of those character strings.

Optionally, the second determining unit includes a first establishing subunit, a second establishing subunit, a querying subunit, a fourth determining subunit, and a fifth determining subunit:

the first establishing subunit is configured to establish a correspondence between the word identifiers of the different words identified by the result vector and the probabilities of those words;

the second establishing subunit is configured to establish a character string query tree according to the character strings corresponding to the different words identified by the result vector, wherein each character string in the query tree carries the word identifier of its corresponding word;

the querying subunit is configured to query the character string query tree with the plurality of pending error correction results;

the fourth determining subunit is configured to, if the query for a target pending error correction result among the plurality of pending error correction results returns a query result, determine the probability corresponding to the target pending error correction result according to the word identifier in the query result and the correspondence;

and the fifth determining subunit is configured to, if the probability corresponding to the target pending error correction result meets a preset condition, determine the target pending error correction result as the error correction result corresponding to the target character string.

Optionally, the apparatus further comprises a display unit:

and the display unit is used for displaying the corresponding candidate item according to the error correction result corresponding to the target character string.

In a third aspect, an embodiment of the present application provides a character string error correction apparatus including a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs are used to perform the character string error correction method according to any one of the first aspect.

In a fourth aspect, the present application provides a non-transitory computer-readable storage medium, where instructions of the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the character string error correction method according to any one of the first aspect.

According to the above technical solution, an on-screen phrase is determined, which may comprise several on-screen words that the user has committed to the screen in sequence. Inputting the on-screen phrase into the deep learning model yields a result vector that identifies the probabilities of different words appearing after the on-screen phrase: the higher the probability the result vector assigns to a word, the more likely, semantically, that word is to follow the on-screen phrase. When a target character string input after the on-screen phrase needs to be corrected, a plurality of pending error correction results can be obtained from it. Because the on-screen phrase formed by the most recently committed words accurately reflects the user's current input intent at the semantic level, the error correction result for the target character string can be determined from the pending error correction results according to the words identified by the result vector and their probabilities. The error correction result is therefore more likely to match the user's actual input intent, the candidates displayed according to it are more likely to meet the user's expectation, and the input experience is improved.

Drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of a string error correction system according to an embodiment of the present application;

Fig. 2 is a flowchart of a character string error correction method according to an embodiment of the present application;

Fig. 3 is a device structure diagram of a character string error correction device according to an embodiment of the present application;

Fig. 4 is a structural diagram of a character string error correction apparatus according to an embodiment of the present application;

Fig. 5 is a block diagram of a server according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described below with reference to the accompanying drawings.

Traditional error correction considers only errors in the character string itself when providing an error correction result and ignores the user's actual input intent. Thus, even when an error correction result fits the error characteristics of the input string, it may not fit the semantic requirements of the context, so the candidates displayed according to it may not match what the user actually wants to input, which degrades the input experience.

To this end, the present application provides a character string error correction method that can be applied to a terminal with an input method installed. The terminal may be, for example, a mobile phone, a notebook (laptop) computer, a desktop computer, a tablet computer, an e-book reader, a Moving Picture Experts Group Audio Layer IV (MP4) player, a wearable device (such as a smart watch), or a smart speaker.

The user can input a character string through the input method installed on the terminal and select a displayed candidate corresponding to that string to commit it to the screen. The user may input the character string in different ways, such as through a virtual keyboard, a handwriting area, or voice input. The input character string may be in different languages and may include characters, letters, numbers, symbols, and the like; for Chinese, for example, a character string may consist of pinyin. It should be noted that, in the embodiments of the present application, the input method may be a common Chinese input method (such as a pinyin, Wubi, or Zhuyin input method) or an input method for another language (such as a Japanese hiragana input method or a Korean input method).

After the user selects the candidate corresponding to a character string and commits it to the screen, the embodiments of the present application can use the deep learning model and the one or more on-screen words that the user has committed in sequence to determine a result vector corresponding to the on-screen phrase formed by those words, and use this result vector as the basis for correcting a target character string that the user subsequently inputs after the on-screen phrase. In the embodiments of the present application, a word is a basic unit of language. Taking Chinese as an example, a word may consist of one or more Chinese characters, and a delimiter such as a punctuation mark may also be treated as a word. A word carries complete semantic information and can be obtained by semantically segmenting a phrase. An on-screen word is a word that the user has selected and committed to the screen, and an on-screen phrase is a phrase formed by one or more such words. For example, when the on-screen text input by the user is "tomorrow writing", segmentation yields two on-screen words: "tomorrow" and "writing".

For example, the terminal shown in Fig. 1 may display a conversation window of social software or an editing window for a text message or an email.

Here, 101 is the on-screen editing area, which displays text that the user has committed to the screen but not yet sent; the user can edit the text in this area.

102 is a character string editing area for displaying the character string input by the user, and the user can edit the input character string.

103 is a candidate item display area, and a candidate item corresponding to the user input character string may be displayed.

It should be noted that Fig. 1 illustrates only one possible input scenario and does not limit the embodiments of the present application to that scenario. For example, when a user writes a document, the area of the document that displays committed text can also serve as the on-screen editing area, and the area that displays the input character string at the input position can serve as the character string editing area.

A result vector corresponding to the on-screen phrase may be determined from the on-screen phrase and a deep learning model 105 (for example, an LSTM model). The result vector identifies the probabilities of different words appearing after the on-screen phrase: the higher the probability the result vector assigns to a word, the more likely, semantically, that word is to follow the on-screen phrase. For example, when the on-screen phrase is "tomorrow", the words with higher probability in the result vector may include "paper", "job", "done", "what", and so on.
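The description does not say how the model turns its output into probabilities. A common design for neural language models such as LSTMs is to emit one raw score per vocabulary word and normalize the scores with a softmax; the sketch below assumes that design. The function name `to_result_vector` and the scores are hypothetical, and the words are the example words above.

```python
import math
from typing import Dict, List


def to_result_vector(vocab: List[str], scores: List[float]) -> Dict[str, float]:
    """Softmax: turn one raw score per vocabulary word into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for stability
    total = sum(exps)
    return {word: e / total for word, e in zip(vocab, exps)}


# Made-up scores for illustration only; a real model would produce them.
vocab = ["paper", "job", "done", "what", "Liu"]
vector = to_result_vector(vocab, [3.0, 2.5, 2.0, 1.5, -2.0])
for word, p in sorted(vector.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {p:.3f}")
```

Subtracting the largest score before exponentiating leaves the probabilities unchanged but prevents overflow when scores are large.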

When the user continues to input a character string after the on-screen phrase, that string is referred to as the target character string in the embodiments of the present application. If the input method detects that the target character string contains an input error, it can correct the string and obtain at least one pending error correction result. For example, when the target character string is "lnu'wen" as shown in Fig. 1, the corresponding pending error correction results may include "liu'wen" and "lun'wen". When a plurality of pending error correction results are obtained, the prior art directly determines the error correction result from among them; in contrast, the embodiments of the present application use the previously determined result vector as the basis for selecting the error correction result corresponding to the target character string from the pending results. For example, as shown in Fig. 1, according to the result vector of the on-screen phrase "tomorrow writing", "lun'wen" can be selected from the pending results "liu'wen" and "lun'wen" as the error correction result for the target character string "lnu'wen", and the candidate "paper" corresponding to "lun'wen" can be shown in the candidate display area 103. In the prior art, by contrast, "liu'wen" might be chosen as the error correction result for "lnu'wen", and the candidate "liu" corresponding to "liu'wen" would be shown to the user.

It should be noted that the "'" in a character string may be a segmentation separator inserted by the input method after segmenting the user's input, rather than a character typed by the user.

It should be noted that, in the foregoing example, the target character string is in full-pinyin form. The actual form of the target character string depends on the user's input habits: it may be full pinyin, mixed full and abbreviated pinyin, or abbreviated pinyin, and the corresponding pending error correction results may likewise be in mixed or abbreviated form.

Because the on-screen phrase formed by the most recently committed words accurately reflects the user's current input intent at the semantic level, selecting the error correction result according to the words identified by the result vector and their probabilities makes it more likely that the result, and hence the candidates displayed for it, match what the user actually wants to input. In the "lnu'wen" scenario above, the error correction result determined by the embodiments of the present application differs from that of the prior art: because it takes the contextual semantic relationship into account, it better meets the user's actual input intent, and the displayed candidates are more likely to be selected by the user.

Next, a character string error correction method provided by an embodiment of the present application is described with reference to the accompanying drawings. As shown in Fig. 2, the method includes:

S201: Determine the on-screen phrase.

The on-screen phrase comprises at least one on-screen word that the user has committed to the screen in sequence.

The embodiments of the present application provide several ways to determine the on-screen phrase. Generally, the on-screen phrase should not contain too many on-screen words: the input intent expressed by a phrase made up of too many words may deviate from the user's actual current input intent, and passing such a long phrase through the deep learning model increases the amount of computation and thus the system load.

Therefore, the number of on-screen words in the on-screen phrase can be limited by a predetermined threshold. Alternatively, the on-screen phrase can be determined by the delimiters in the committed text: because delimiters separate sentences, an on-screen phrase determined by delimiters contains at most one sentence, which also limits the number of on-screen words it contains.
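The two ways of bounding the on-screen phrase can be sketched as follows. Assumptions: the committed text is already segmented into words, the delimiters are a small hand-picked punctuation set, and both rules are applied together (the description allows either one alone). `bound_phrase`, `max_words`, and `DELIMITERS` are hypothetical names.

```python
from typing import List, Set

# Assumed delimiter set; the description only says delimiters separate sentences.
DELIMITERS: Set[str] = {"。", "，", "！", "？", ".", ",", "!", "?"}


def bound_phrase(words_before_focus: List[str], max_words: int) -> List[str]:
    """Take at most max_words words, stopping at the nearest delimiter."""
    phrase: List[str] = []
    # Walk backwards from the word nearest the input focus.
    for word in reversed(words_before_focus):
        if word in DELIMITERS or len(phrase) >= max_words:
            break
        phrase.append(word)
    phrase.reverse()  # restore on-screen order
    return phrase


committed = ["今天", "很", "忙", "，", "明天", "写"]
print(bound_phrase(committed, max_words=5))  # ['明天', '写']
print(bound_phrase(committed, max_words=1))  # ['写']
```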

It should be noted that which of the on-screen words committed in sequence are used as the on-screen phrase may be determined according to the input focus.

First, the input focus mentioned in the embodiments of the present application is described. The input focus is located in the on-screen editing area and marks the position where text will appear the next time the user commits text to the screen; it may be shown to the user as a blinking cursor or may be invisible. In the input scenario shown in Fig. 1, 104 may be the input focus, indicating that text the user commits next will appear after "writing". The user can move the input focus within the committed text as needed. For example, after the user has committed "tomorrow writing", if the user wants to add "afternoon" after "tomorrow", the input focus can be moved to between "day" and "writing", that is, after the last character of "tomorrow" (the character meaning "day") and before "writing".

When committed text is displayed from left to right, the on-screen words that the user has committed in sequence are located before the input focus. When the input focus is at position 104 in Fig. 1, the on-screen words committed by the user are, in order, "tomorrow" and "writing", and the on-screen phrase determined from them may be "tomorrow writing" or "writing". When the input focus is moved to between "day" and "writing", the only on-screen word before the focus is "tomorrow", so the on-screen phrase determined in this case may be "tomorrow".

It should be noted that the on-screen phrase serves as the basis for selecting the error correction result for the target character string from the pending error correction results, and the on-screen word immediately before the input focus is usually the one most likely to be semantically related to the target character string the user inputs. Therefore, in a possible implementation, when the on-screen phrase is determined based on the input focus, it includes at least the on-screen word immediately before the input focus. For example, when the input focus is at position 104 in Fig. 1, the on-screen word immediately before it is "writing", so the determined on-screen phrase must include at least "writing".
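A minimal sketch of choosing the on-screen phrase from the input focus, assuming the committed text is already segmented and the focus is a character offset into it; a word that the focus splits is treated as not yet before the focus. The function names and the parameter `n` (how many preceding words to keep) are hypothetical.

```python
from typing import List


def words_before_focus(segments: List[str], focus: int) -> List[str]:
    """On-screen words that end at or before character offset `focus`."""
    before: List[str] = []
    offset = 0
    for word in segments:
        offset += len(word)
        if offset > focus:
            break  # this word (and every later one) is not before the focus
        before.append(word)
    return before


def phrase_at_focus(segments: List[str], focus: int, n: int) -> List[str]:
    """Last n words before the focus; n is clamped to at least 1 so the
    word immediately before the focus is always included."""
    before = words_before_focus(segments, focus)
    return before[-max(1, n):]


text = ["明天", "写"]               # "tomorrow", "writing"
print(phrase_at_focus(text, 3, 2))  # ['明天', '写']  focus after "writing"
print(phrase_at_focus(text, 2, 2))  # ['明天']        focus between the two
```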

S202: Obtain a result vector according to the on-screen phrase and the deep learning model.

Wherein the result vector is used to identify the probability of different words appearing after the on-screen phrase.

S203: Perform error correction on the target character string input after the on-screen phrase to obtain a plurality of pending error correction results.

First, the target character string in the embodiments of the present application is explained. The target character string is a character string input after the on-screen phrase, and it can be identified through the input focus associated with the on-screen phrase. The input focus marks the position at which the user is currently continuing to input, and the on-screen words before the current input focus are what form the on-screen phrase. Once the input focus is determined, the corresponding on-screen phrase can be determined; as long as the input focus remains unchanged, the character string that the user continues to input is the string input after that on-screen phrase, that is, the target character string. If the user selects a candidate for the target character string and commits it, the committed text appears immediately after the on-screen phrase. For example, when "tomorrow writing" is the on-screen phrase, the input focus is after "writing"; if the user now inputs "lnu'wen", then "lnu'wen" is the target character string, and if the user commits a candidate for its error correction result, that candidate appears immediately after "writing".

A pending error correction result is obtained by correcting the target character string when the string contains an input error; it is a corrected version of the target character string that is free of input errors.

In the embodiments of the present application, error correction of the target character string can yield a plurality of pending error correction results. Taking the target character string "lnu'wen" as an example, "lnu" has no plausible corresponding candidate in terms of either meaning or construction, so it can be regarded as a wrong input; together with the following "wen", this yields at least two pending error correction results: "liu'wen" and "lun'wen".
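The description does not specify how pending results are generated. The sketch below swaps in a simple, plainly different technique for illustration: each syllable of the already segmented target string that is missing from a syllable table is replaced by every valid syllable one adjacent-letter swap or one letter substitution away. The syllable table is a tiny hypothetical subset, and all names are hypothetical. With it, "lnu'wen" yields the two pending results named above.

```python
from itertools import product
from typing import List, Set

# Tiny hypothetical syllable table; a real input method has a full one.
VALID_SYLLABLES: Set[str] = {"liu", "lun", "wen", "ming", "tian", "xie"}
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def one_edit_variants(syllable: str) -> Set[str]:
    """Strings one adjacent swap or one letter substitution away."""
    variants: Set[str] = set()
    for i in range(len(syllable) - 1):  # swap letters i and i + 1
        variants.add(syllable[:i] + syllable[i + 1] + syllable[i] + syllable[i + 2:])
    for i in range(len(syllable)):      # substitute letter i
        for c in LETTERS:
            variants.add(syllable[:i] + c + syllable[i + 1:])
    return variants


def pending_corrections(target: str) -> List[str]:
    """Repair each invalid syllable of an already segmented target string."""
    options: List[List[str]] = []
    for syllable in target.split("'"):
        if syllable in VALID_SYLLABLES:
            options.append([syllable])
            continue
        fixes = sorted(v for v in one_edit_variants(syllable) if v in VALID_SYLLABLES)
        if not fixes:
            return []  # no single-edit repair for this syllable
        options.append(fixes)
    return ["'".join(combo) for combo in product(*options)]


print(pending_corrections("lnu'wen"))  # ["liu'wen", "lun'wen"]
```

Here "lun" comes from swapping "n" and "u", and "liu" from substituting "i" for "n"; the sketch deliberately knows nothing about context, which is exactly why a later step is needed to choose between them.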

S204: Determine, according to the result vector, the error correction result corresponding to the target character string from the plurality of pending error correction results.

Because the on-screen phrase formed by the most recently committed words accurately reflects the user's current input intent at the semantic level, the error correction result for the target character string relative to that phrase can be determined from the pending error correction results according to the words identified by the phrase's result vector and their probabilities. The error correction result is then more likely to match the user's actual input intent, and so are the candidates displayed for it, which improves the user's input experience.

In a possible implementation, since the result vector identifies the probabilities of different words appearing after the on-screen phrase, if a pending error correction result is identical to the character string corresponding to a word identified by the result vector, the probability of that word in the result vector can be used as the probability of the pending error correction result appearing after the on-screen phrase.

After the probabilities of all or some of the pending error correction results relative to the on-screen phrase are determined, these probabilities can serve as a ranking basis to help select the error correction result for the target character string from the pending results, so that the candidates corresponding to the error correction result better meet the user's current input intent.

Because the candidates for the error correction result are determined according to the result vector so as to meet the user's current input intent, the error correction result corresponding to the target character string can be a pending error correction result whose probability, as identified by the result vector, is higher than a predetermined threshold. The threshold serves to pick out, from the pending error correction results, those that reflect an input intent similar or identical to the user's actual current intent. It can be set and adjusted for different application scenarios or computational requirements.

It should be noted that one or more error correction results may be determined for the target character string.

After the error correction result of the target character string is determined from the plurality of pending error correction results, the input method may display the corresponding candidate items according to it. Because the error correction result is more likely to meet the user's actual input requirement, the candidate items displayed according to it are more likely to meet the user's expectation, and the user's input experience is improved.

When a plurality of error correction results are determined, the candidate items corresponding to the different error correction results can be displayed one after another or alternately according to a preset rule.

In this embodiment, the on-screen phrase committed by the user is determined; it may include a plurality of on-screen words that the user has sequentially committed to the screen. The on-screen phrase is input into the deep learning model to obtain a result vector that identifies the probabilities of different words appearing after the on-screen phrase: the higher the probability the result vector assigns to a word, the more likely, semantically, that the word follows the on-screen phrase. When a target character string input after the on-screen phrase needs to be corrected, a plurality of pending error correction results can be obtained from the target character string. Because the on-screen phrase formed by the most recently committed words accurately reflects, at the semantic level, the user's current input requirement, the error correction result of the target character string can be selected from the pending error correction results according to the words identified by the result vector and their probabilities. The error correction result is therefore more likely to meet the user's actual input requirement, the candidate items displayed according to it are more likely to meet the user's expectation, and the input experience is improved.

The embodiments of the present application provide several specific ways of performing S204; two of them are described in detail below.

The first determination method:

S301: Determine, according to the result vector, at least one word whose probability is higher than a predetermined threshold.

This predetermined threshold serves to screen out, from the words identified by the result vector, those that reflect a requirement similar or identical to the user's current actual input. It can be set and adjusted for different application scenarios or computational requirements. S301 thus selects the words to which the result vector assigns higher probabilities.
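S301 can be sketched as a simple filter over the result vector. The sketch assumes the result vector is a list of probabilities aligned position by position with a vocabulary list; the function name and that alignment are assumptions made for illustration.

```python
from typing import List, Tuple


def words_above_threshold(
    result_vector: List[float], vocab: List[str], threshold: float
) -> List[Tuple[str, float]]:
    """S301 sketch: return (word, probability) pairs whose probability in the
    result vector is higher than the predetermined threshold."""
    if len(result_vector) != len(vocab):
        raise ValueError("result vector and vocabulary must have the same length")
    return [(word, prob) for word, prob in zip(vocab, result_vector) if prob > threshold]
```

For instance, with probabilities `[0.80, 0.70, 0.75, 0.05]` for 论文 (paper), 作业 (homework), 什么 (what), and 天气 (weather), a threshold of 0.6 keeps the first three words.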

S302: Determine the character string corresponding to each of the at least one word.

After the at least one word is determined in S301, the character string of each word can be determined, so that it can later be matched against the pending error correction results, which are also in the form of character strings.

For example, if the words are "paper" (论文), "homework" (作业), and "what" (什么), the corresponding character strings may be "lun'wen", "zuo'ye", and "shen'me". Besides this full pinyin form, the character strings may take other forms, such as mixed abbreviated and full pinyin, or abbreviated pinyin (initials only). Which form is used depends on the subsequent matching manner and matching object, and is not described further here.

In a possible implementation, after the character strings of the at least one word are determined in S302, a correspondence list can be established between these character strings and the probabilities that the result vector identifies for the words, for use in the subsequent matching.
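The correspondence list of S302 might be built as sketched below. `PINYIN` is a hypothetical word-to-string table standing in for an input method's own lexicon, and grouping homophones under one string is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical word -> full pinyin table; a real input method would use its own lexicon.
PINYIN: Dict[str, str] = {"论文": "lun'wen", "作业": "zuo'ye", "什么": "shen'me"}


def build_correspondence_list(
    selected: List[Tuple[str, float]], pinyin: Dict[str, str] = PINYIN
) -> Dict[str, List[Tuple[str, float]]]:
    """S302 sketch: map each selected word's character string to the
    (word, probability) pairs that share that string."""
    table: Dict[str, List[Tuple[str, float]]] = {}
    for word, prob in selected:
        string = pinyin.get(word)
        if string is None:
            continue  # no known character string for this word; skip it
        table.setdefault(string, []).append((word, prob))
    return table
```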

S303: Determine each pending error correction result that is identical to one of these character strings as an error correction result of the target character string.

Since the pending error correction results are character strings, each can be matched against the character strings determined in S302; the outcome is either a match or no match.

The words corresponding to the character strings determined in S302 all have probabilities higher than the predetermined threshold; that is, they are likely to appear after the on-screen phrase, they have good semantic continuity with it, and they better meet the user's current input requirement. Therefore:

if a pending error correction result does not match any character string determined in S302, its semantic continuity with the on-screen phrase is considered poor and it is unlikely to meet the user's current input requirement, so it should not be taken as an error correction result of the target character string.

If a pending error correction result matches a character string determined in S302, its semantic continuity with the on-screen phrase is considered good and it better meets the user's current input requirement, so it can be taken as an error correction result of the target character string.

In a possible implementation, S303 can be implemented by looking up the correspondence list that may have been established after S302.
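The matching rule of S303 reduces to a membership test against the correspondence list. The sketch below assumes the list is a dictionary keyed by character string; the function name is hypothetical, and the input order of the pending results is preserved.

```python
from typing import Dict, List, Tuple


def match_pending_results(
    pending: List[str], table: Dict[str, List[Tuple[str, float]]]
) -> List[str]:
    """S303 sketch: a pending error correction result is kept only if it equals
    a character string in the correspondence list of above-threshold words;
    unmatched results are treated as having poor semantic continuity."""
    return [result for result in pending if result in table]
```

With a list containing `"lun'wen"` and `"zuo'ye"`, the pending results `["zuo'ye", "zuo'ri", "lun'wen"]` would yield `["zuo'ye", "lun'wen"]`.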

The second determination method is as follows:

S401: Establish a correspondence between the word identifiers of the different words identified by the result vector and the probabilities of those words.

Because the result vector identifies the probabilities of different words appearing after the on-screen phrase, once a word identifier is assigned to each word identified by the result vector, the correspondence between these word identifiers and the words' probabilities can be determined from the result vector. For example, if the words identified by the result vector include "paper", "homework", and "what", the word identifiers id1, id2, and id3 may be assigned to them respectively, and the correspondences between id1 and 80% (the probability of "paper" in the result vector), between id2 and 70% (the probability of "homework"), and between id3 and 75% (the probability of "what") may be established.
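A minimal sketch of S401, assuming the result vector is a list aligned with a vocabulary of distinct words and that word identifiers are simply 1-based positions (mirroring id1, id2, id3 in the example). Both the identifier scheme and the function name are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def build_id_probability_map(
    result_vector: List[float], vocab: List[str]
) -> Tuple[Dict[str, int], Dict[int, float]]:
    """S401 sketch: assign a word identifier to each word identified by the
    result vector and record the identifier -> probability correspondence.
    Assumes vocab contains distinct words aligned with result_vector."""
    if len(result_vector) != len(vocab):
        raise ValueError("result vector and vocabulary must have the same length")
    word_ids = {word: i + 1 for i, word in enumerate(vocab)}
    id_prob = {i + 1: prob for i, prob in enumerate(result_vector)}
    return word_ids, id_prob
```

Applied to the example, `build_id_probability_map([0.80, 0.70, 0.75], ["论文", "作业", "什么"])` maps 论文, 作业, and 什么 to identifiers 1, 2, and 3, and those identifiers to 0.80, 0.70, and 0.75.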

S402: Build a character string query tree from the character strings of the different words identified by the result vector, where each character string in the tree carries the word identifier of its word.

After the correspondence is established in S401, a query tree can be built in preparation for querying the pending error correction results. Because the pending error correction results are character strings, the query tree can be a character string query tree for ease of querying: the nodes of the tree hold the character strings of the words identified by the result vector, and a node may correspond to the character string of one word. In the character string query tree, each node carries the word identifier of the word whose character string it holds.

Building the character string query tree makes querying the pending error correction results more efficient, so the probability of each pending error correction result can be determined efficiently from the result vector.
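One plausible realization of the character string query tree is a character trie, sketched below under that assumption (the text does not specify the tree type). Word identifiers are attached to the node where a word's string ends, so homophones share a node; `TrieNode`, `build_string_query_tree`, and `query` are hypothetical names.

```python
from typing import Dict, Optional, Set


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word_ids: Set[int] = set()  # ids of words whose string ends at this node


def build_string_query_tree(id_to_string: Dict[int, str]) -> TrieNode:
    """S402 sketch: insert each word's character string into a trie and attach
    its word identifier to the node where the string ends."""
    root = TrieNode()
    for word_id, string in id_to_string.items():
        node = root
        for ch in string:
            node = node.children.setdefault(ch, TrieNode())
        node.word_ids.add(word_id)
    return root


def query(root: TrieNode, string: str) -> Set[int]:
    """Return the word identifiers stored for an exact string match (empty if none)."""
    node: Optional[TrieNode] = root
    for ch in string:
        node = node.children.get(ch)
        if node is None:
            return set()
    return set(node.word_ids)
```

A prefix such as `"zuo"` reaches an existing node but carries no identifier, so only complete strings like `"zuo'ye"` produce a query result.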

S403: Query the character string query tree with each of the plurality of pending error correction results.

S404: If a target pending error correction result among the pending error correction results yields a query result, determine the probability of the target pending error correction result according to the word identifier in the query result and the correspondence.

After the plurality of pending error correction results of the target character string are determined, the word identifiers that may correspond to them can be looked up in the character string query tree established in S402. When a word identifier is found for a pending error correction result, the probability that the result vector assigns to that word identifier can be determined from the correspondence established in S401.

It should be noted that if no word identifier is found for a pending error correction result in the character string query tree established in S402, its semantic continuity with the on-screen phrase may be considered poor and it is unlikely to meet the user's current input requirement, so it should not be taken as an error correction result of the target character string.
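S403 and S404 can be sketched together: each pending result is walked through the tree, and a hit is converted to a probability through the identifier-to-probability correspondence. To stay self-contained, this sketch uses a compact nested-dictionary trie with a hypothetical `END` key for stored identifiers; taking the maximum probability when several identifiers share one string is an assumption.

```python
from typing import Dict, List, Optional

END = "$ids"  # hypothetical key under which a node stores word identifiers


def build_tree(id_to_string: Dict[int, str]) -> dict:
    """Nested-dict trie: one level per character, word ids kept under END."""
    root: dict = {}
    for word_id, string in id_to_string.items():
        node = root
        for ch in string:
            node = node.setdefault(ch, {})
        node.setdefault(END, []).append(word_id)
    return root


def lookup_probabilities(
    pending: List[str], tree: dict, id_prob: Dict[int, float]
) -> Dict[str, float]:
    """S403/S404 sketch: query the tree with each pending result and, on a hit,
    read the probability of the found identifier(s). Misses are discarded."""
    found: Dict[str, float] = {}
    for result in pending:
        node: Optional[dict] = tree
        for ch in result:
            node = node.get(ch)
            if node is None:
                break
        if node is None or END not in node:
            continue  # no word identifier found: treated as poor semantic continuity
        found[result] = max(id_prob[word_id] for word_id in node[END])
    return found
```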

S405: If the probability of the target pending error correction result satisfies a preset condition, determine the target pending error correction result as the error correction result of the target character string.

For the pending error correction results for which a word identifier was found, it is then judged, based on the probability determined for each from the correspondence, whether each can serve as an error correction result of the target character string. If the probability of any of them, for example the target pending error correction result, satisfies the preset condition, the target pending error correction result can be determined as an error correction result of the target character string.

The preset condition serves to screen out pending error correction results that reflect a requirement similar or identical to the user's current actual input; a target pending error correction result whose probability satisfies the condition therefore reflects such a requirement. The condition can be set and adjusted for different application scenarios or computational requirements. S405 thus selects the pending error correction results to which higher probabilities are assigned.

Fig. 3 is a structural diagram of a character string error correction apparatus according to an embodiment of the present application. The apparatus includes a first determining unit 301, a learning unit 302, an error correction unit 303, and a second determining unit 304:

the first determining unit 301 is configured to determine an on-screen phrase, where the on-screen phrase includes at least one on-screen word that the user has sequentially committed to the screen;

the learning unit 302 is configured to obtain a result vector according to the on-screen phrase and the deep learning model, where the result vector is used to identify probabilities of different words appearing after the on-screen phrase;

the error correction unit 303 is configured to perform error correction on a target character string input after the on-screen phrase, so as to obtain a plurality of pending error correction results;

the second determining unit 304 is configured to determine, according to the result vector, an error correction result corresponding to the target character string from the plurality of pending error correction results.
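The four units can be pictured as one object, as in the hypothetical sketch below. The deep learning model and the pending-result generator are injected as plain callables because the text does not specify them; the model is assumed to return a word-to-probability mapping, the on-screen phrase is assumed to be the last few committed words, and selection uses a fixed threshold. None of these names or defaults come from the original.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StringErrorCorrector:
    """Hypothetical sketch mirroring units 301-304 of Fig. 3."""
    model: Callable[[List[str]], Dict[str, float]]  # stand-in for the deep learning model
    corrector: Callable[[str], List[str]]           # stand-in pending-result generator
    pinyin: Dict[str, str]                          # word -> character string
    threshold: float = 0.5
    max_words: int = 3

    def determine_phrase(self, on_screen_words: List[str]) -> List[str]:
        # Unit 301: take the last few words the user committed to the screen.
        return on_screen_words[-self.max_words:]

    def correct(self, on_screen_words: List[str], target: str) -> List[str]:
        phrase = self.determine_phrase(on_screen_words)
        result_vector = self.model(phrase)  # unit 302: word -> probability
        pending = self.corrector(target)    # unit 303: pending results
        # Unit 304: give each pending result the probability of a word with the same string.
        probs: Dict[str, float] = {}
        for word, prob in result_vector.items():
            string = self.pinyin.get(word)
            if string is not None and string in pending:
                probs[string] = max(prob, probs.get(string, 0.0))
        return [r for r in pending if probs.get(r, 0.0) > self.threshold]
```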

Optionally, the apparatus further comprises a display unit:

It can be seen that the on-screen phrase committed by the user is determined; it may include a plurality of on-screen words that the user has sequentially committed to the screen. The on-screen phrase is input into the deep learning model to obtain a result vector that identifies the probabilities of different words appearing after the on-screen phrase: the higher the probability the result vector assigns to a word, the more likely, semantically, that the word follows the on-screen phrase. When a target character string input after the on-screen phrase needs to be corrected, a plurality of pending error correction results can be obtained from the target character string. Because the on-screen phrase formed by the most recently committed words accurately reflects, at the semantic level, the user's current input requirement, the error correction result of the target character string can be selected from the pending error correction results according to the words identified by the result vector and their probabilities. The error correction result is therefore more likely to meet the user's actual input requirement, the candidate items displayed according to it are more likely to meet the user's expectation, and the input experience is improved.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 4 is a block diagram illustrating a character string error correction apparatus 400 according to an example embodiment. For example, the apparatus 400 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to Fig. 4, the apparatus 400 may include one or more of the following components: a processing component 402, a memory 404, a power component 406, a multimedia component 408, an audio component 410, an input/output (I/O) interface 412, a sensor component 414, and a communication component 416.

The processing component 402 generally controls the overall operation of the device 400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 402 can include one or more modules that facilitate interaction between the processing component 402 and other components. For example, the processing component 402 can include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.

The memory 404 is configured to store various types of data to support operations at the device 400. Examples of such data include instructions for any application or method operating on the device 400, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 404 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

Power supply components 406 provide power to the various components of device 400. The power components 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 400.

The multimedia component 408 includes a screen that provides an output interface between the device 400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 408 includes a front-facing camera and/or a rear-facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 400 is in an operational mode, such as a shooting mode or a video mode. Each front-facing and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 410 is configured to output and/or input audio signals. For example, audio component 410 includes a Microphone (MIC) configured to receive external audio signals when apparatus 400 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 404 or transmitted via the communication component 416. In some embodiments, audio component 410 also includes a speaker for outputting audio signals.

The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 414 includes one or more sensors for providing status assessments of various aspects of the apparatus 400. For example, the sensor component 414 may detect the open/closed state of the device 400 and the relative positioning of components, such as the display and keypad of the apparatus 400; it may also detect a change in the position of the apparatus 400 or of one of its components, the presence or absence of user contact with the apparatus 400, the orientation or acceleration/deceleration of the apparatus 400, and a change in the temperature of the apparatus 400. The sensor component 414 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 416 is configured to facilitate wired or wireless communication between the apparatus 400 and other devices. The apparatus 400 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

Embodiments of the present application also provide a non-transitory computer-readable storage medium, such as the memory 404, comprising instructions executable by the processor 420 of the apparatus 400 to perform the above-described method. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

A non-transitory computer readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a string error correction method, the method comprising:

Fig. 5 is a schematic structural diagram of a server in an embodiment of the present application. The server 500 may vary widely in configuration or performance and may include one or more Central Processing Units (CPUs) 522 (e.g., one or more processors) and memory 532, one or more storage media 530 (e.g., one or more mass storage devices) storing applications 542 or data 544. Memory 532 and storage media 530 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 522 may be configured to communicate with the storage medium 530, and execute a series of instruction operations in the storage medium 530 on the server 500.

The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input/output interfaces 555, one or more keyboards 556, and/or one or more operating systems 541, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium may be at least one of the following media: various media that can store program codes, such as read-only memory (ROM), RAM, magnetic disk, or optical disk.

It should be noted that, in the present specification, all the embodiments are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The above description is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for string error correction, the method comprising:

2. The method of claim 1, wherein the error correction result corresponding to the target string is a pending error correction result of the plurality of pending error correction results that has a probability identified by the result vector that is higher than a predetermined threshold.

3. The method according to claim 1 or 2, wherein the determining an error correction result corresponding to the target character string from the plurality of results to be corrected according to the result vector comprises:

4. The method according to claim 1 or 2, wherein the determining an error correction result corresponding to the target character string from the plurality of results to be corrected according to the result vector comprises:

5. The method according to claim 1 or 2, characterized in that the method further comprises:

6. The method according to claim 1 or 2, wherein the number of on-screen words included in the on-screen phrase is less than a predetermined threshold, or the on-screen words included in the on-screen phrase are determined according to a separator.

7. The method of claim 1 or 2, wherein the on-screen phrase is determined according to a position of an input focus in an on-screen editing area.

8. The method according to claim 7, wherein the on-screen phrase includes at least one on-screen word before the position of the input focus.

9. A character string error correction apparatus characterized by comprising a first determination unit, a learning unit, an error correction unit, and a second determination unit:

10. A character string error correction apparatus comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs comprise instructions for performing the character string error correction method of any one of claims 1 to 8.

11. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the string error correction method of any one of claims 1 to 8.