CN117252193A - Word stock updating method and system based on text input habit

Info

Publication number: CN117252193A
Application number: CN202311301055.1A
Authority: CN
Inventors: 赵岳; 贺敏; 戴建武; 康丽丽
Original assignee: Beijing Thunisoft Information Technology Co ltd
Current assignee: Beijing Thunisoft Information Technology Co ltd
Priority date: 2023-10-09
Filing date: 2023-10-09
Publication date: 2023-12-19

Abstract

The application discloses a word stock updating method and system based on text input habits, which address the technical problem of reduced text input accuracy caused by those habits. In the disclosed scheme, an input habit error dictionary is transmitted to the input method word stock, so that erroneous input is reduced by correcting pinyin and raising the ranking of candidate words, which improves input accuracy at the source of input. The input habit error dictionary is also transmitted to the word stock used for word collation, so that this word stock can learn on its own, which improves the accuracy and domain specificity of the correction system.

Description

Word stock updating method and system based on text input habit

Technical Field

The application relates to the technical field of text collation, in particular to a word stock updating method and system based on text input habits.

Background

Text is an important medium for circulating information. When entering text, a word worker may choose the wrong words because of a lapse in concentration, non-standard everyday usage, and the like. In a weak internet environment, after finishing a manuscript the word worker usually runs word proofreading software to mark the errors in it, corrects the marked errors, and publishes the manuscript only after review by experts.

In implementing the prior art, the inventors found that:

Existing text proofreading systems rely on a large-scale word stock: errors are found by matching sentences against the words in the word stock, so the proofreading effect suffers if the word stock is not updated in time. Word collation is a general-purpose service within an organization, and input accuracy cannot be improved at the source either by installing a third-party word stock or by having dedicated staff compile a special word stock suited to the organization.

Therefore, it is necessary to provide a new word stock updating scheme based on text input habit, so as to solve the technical problem of lower text input accuracy caused by the text input habit.

Disclosure of Invention

The embodiment of the application provides a new word stock updating scheme based on text input habits, which is used for solving the technical problem of lower text input accuracy caused by the text input habits.

Specifically, a word stock updating method based on text input habit comprises the following steps:

acquiring a text to be checked;

identifying an error text in the text to be checked;

acquiring a correction text;

identifying correction text corresponding to the error text in the correction text;

determining the error reason of the error text according to the corrected text;

Establishing an association relation between the error text and the correction text and the error cause;

and according to the error reasons, transmitting the error text and the corrected text corresponding to the error text as an input habit error dictionary to an input method word stock and a correction system word stock.

Further, the error causes comprise at least one of sound similarity, shape similarity, misplacement, multiple words, missing words, repeated words, grammar semantics, traditional Chinese characters, variant word forms and sensitive words;

according to the error reasons, transmitting the error text and the corrected text corresponding to the error text as an input habit error dictionary to an input method word stock and a correction system word stock, wherein the method specifically comprises the following steps of:

when the error cause is sound similarity or multiple words, transmitting the corresponding error text and the corrected text corresponding to the error text as an input habit error dictionary to the input method word stock;

when the error cause is at least one of shape similarity, misplacement, multiple words, missing words, repeated words, grammar semantics, traditional Chinese characters, variant word forms and sensitive words, transmitting the corresponding error text and the corrected text corresponding to the error text as an input habit error dictionary to the correction system word stock.
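The routing rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the cause labels, the word stock names and the function `route_cause` are hypothetical and do not come from the application. Because the text lists multiple words under both rules, the sketch sends that cause to both word stocks.

```python
# Hypothetical labels for the error causes named in the text.
INPUT_METHOD_CAUSES = {"sound_similarity", "multiple_words"}
CORRECTION_SYSTEM_CAUSES = {
    "shape_similarity", "misplacement", "multiple_words", "missing_words",
    "repeated_words", "grammar_semantics", "traditional_characters",
    "variant_word_forms", "sensitive_words",
}


def route_cause(cause):
    """Return the names of the word stocks that should receive an entry."""
    targets = []
    if cause in INPUT_METHOD_CAUSES:
        targets.append("input_method_word_stock")
    if cause in CORRECTION_SYSTEM_CAUSES:
        targets.append("correction_system_word_stock")
    return targets


print(route_cause("sound_similarity"))
print(route_cause("multiple_words"))
```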

Further, the correction text records a correction user ID, the input method word stock has an association relationship with the correction user ID, and the correction system word stock has an association relationship with the correction user ID;

acquiring a correction user ID of the correction text record;

when the error cause is sound similarity or multiple words, determining, according to the correction user ID, the input method word stock corresponding to the correction user ID;

transmitting the error text whose error cause is sound similarity or multiple words, together with the corrected text corresponding to the error text, as an input habit error dictionary to the input method word stock corresponding to the correction user ID, through the uploading interface provided by that input method word stock;

when the error cause is at least one of shape similarity, misplacement, multiple words, missing words, repeated words, grammar semantics, traditional Chinese characters, variant word forms and sensitive words, determining, according to the correction user ID, the correction system word stock corresponding to the correction user ID;

and transmitting the error text whose error cause is at least one of shape similarity, misplacement, multiple words, missing words, repeated words, grammar semantics, traditional Chinese characters, variant word forms and sensitive words, together with the corrected text corresponding to the error text, as an input habit error dictionary to the correction system word stock corresponding to the correction user ID, through the uploading interface provided by that correction system word stock.

Further, the error text comprises a suspicious error text or an exact error text;

the method further comprises the steps of:

when the corrected text is the same as the suspicious error text, recording the number of times the corrected text has been the same as the suspicious error text;

when that number of times exceeds a preset checking threshold, adding the suspicious error text to a white list;

when the corrected text is different from the suspicious error text, marking the suspicious error text as an exact error text;

determining the error cause of the exact error text according to the corrected text;

establishing an association relation among the exact error text, the corrected text and the error cause;

and according to the error cause, transmitting the exact error text and the corrected text corresponding to the exact error text as an input habit error dictionary to the input method word stock and the correction system word stock.

Further, the determining, according to the corrected text, an error cause of the error text specifically includes:

determining a pinyin sequence corresponding to the error text and a pinyin sequence corresponding to the corrected text;

and when the similarity of the pinyin sequence corresponding to the error text and the pinyin sequence corresponding to the corrected text exceeds a first preset similarity threshold, determining that the error cause of the error text is similar in sound.

Determining a root sequence corresponding to the error text and a root sequence corresponding to the correction text;

and when the similarity of the root sequence corresponding to the error text and the root sequence corresponding to the corrected text exceeds a second preset similarity threshold, determining that the error cause of the error text is similar in shape.

determining a text sequence corresponding to the error text and a text sequence corresponding to the corrected text;

when the text sequence corresponding to the error text and the text sequence corresponding to the corrected text have different arrangements of entity texts, determining that the error reason of the error text is misplacement.

determining the number of text words corresponding to the error text and the number of text words corresponding to the corrected text;

when the number of text words corresponding to the error text is larger than the number of text words corresponding to the corrected text, determining that the error cause of the error text is multi-word or repeated word;

and when the number of text words corresponding to the error text is smaller than the number of text words corresponding to the corrected text, determining that the error reason of the error text is missing words.

And when the corrected text comprises the preset entity text, determining that the error reason of the error text is a sensitive word.

The embodiment of the application also provides a word stock updating system based on the text input habit.

Specifically, a lexicon updating system based on text input habit includes:

the input module is used for acquiring a text to be checked;

the proofing module is used for identifying an error text in the text to be checked;

the distribution module is used for acquiring the correction text; for identifying, in the correction text, the corrected text corresponding to the error text; for determining the error cause of the error text according to the corrected text; for establishing an association relation among the error text, the corrected text and the error cause; and for transmitting, according to the error cause, the error text and the corrected text corresponding to the error text as an input habit error dictionary to the input method word stock and the correction system word stock.

The technical scheme provided by the embodiment of the application has at least the following beneficial effects:

The input habit error dictionary is transmitted to the input method word stock, so that erroneous input is reduced by correcting pinyin and raising the ranking of candidate words, which improves input accuracy at the source of input. The input habit error dictionary is also transmitted to the word stock used for word collation, so that this word stock can learn on its own, which improves the accuracy and domain specificity of the correction system.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

fig. 1 is a flowchart of a word stock updating method based on text input habit provided in an embodiment of the present application;

fig. 2 is a schematic structural diagram of a word stock updating system based on text input habit according to an embodiment of the present application.

The reference numerals in the drawings are as follows:

100. Word stock updating system based on text input habit

11. Input module

12. Proofing module

13. Distribution module

Description of the embodiments

To make the purposes, technical solutions and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person of ordinary skill in the art can obtain from the present disclosure without creative effort fall within the scope of the present disclosure.

Referring to fig. 1, in order to solve the technical problem of lower text input accuracy caused by text input habit, the present application provides a word stock updating method based on text input habit, which includes the following steps:

S110: Acquire a text to be checked.

S120: Identify an error text in the text to be checked.

It will be appreciated that the text to be checked is the original text entered by the word worker. In general, under the influence of input habits such as a lapse in concentration or non-standard everyday usage, the text to be checked contains wrong word choices.

For example, when using a pinyin input method, a word worker may have input habits such as: habitually pressing one key too few; failing to distinguish flat-tongue sounds from retroflex sounds; making fuzzy-sound input errors; and entering characters whose pinyin is unknown through the U mode of the input method, which yields similar-looking characters.

These input habits easily produce error text in the text to be checked, with error types such as sound similarity, shape similarity, misplacement, multiple words, missing words, repeated words, grammar semantics, traditional Chinese characters, variant word forms and sensitive words.

There are a number of ways to identify the error text in the text to be checked. In a specific embodiment provided in the present application, word segmentation may be performed on the text to be checked to obtain a plurality of entity units. The entity units are then compared with a preset dictionary in the correction system word stock to determine the error text.
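A minimal Python sketch of this dictionary comparison is given below. It assumes that word segmentation has already produced a list of entity units, and that the preset dictionary maps a known error text to its correction; the dictionary contents and the function name `find_known_errors` are hypothetical.

```python
# Hypothetical preset dictionary: recorded error text -> suggested correction.
PRESET_ERROR_DICT = {"renming": "renmin", "langawang": "langwang"}


def find_known_errors(units):
    """Return (position, unit, suggestion) for every unit found in the dictionary."""
    return [
        (index, unit, PRESET_ERROR_DICT[unit])
        for index, unit in enumerate(units)
        if unit in PRESET_ERROR_DICT
    ]


# The units are assumed to come from an upstream word segmentation step.
print(find_known_errors(["the", "renming", "vote"]))
```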

In another specific embodiment provided in the present application, word segmentation may be performed on the text to be checked to obtain a plurality of entity units. A smoothness checking algorithm is then used to calculate the matching smoothness between the entity units. When the matching smoothness between entity units is smaller than a preset smoothness threshold, the corresponding entity units are judged to be error text. Typically, the error text is displayed with a mark to prompt the word worker to modify it.
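The application does not specify the smoothness checking algorithm. As one possible stand-in, the sketch below scores each pair of adjacent entity units with an add-one smoothed bigram probability and flags units whose score falls below a threshold. The counts, the threshold value and the function names are all hypothetical.

```python
# Hypothetical corpus statistics; a real system would estimate these from data.
BIGRAM_COUNTS = {("strictly", "forbidden"): 40, ("forbidden", "to"): 35, ("to", "smoke"): 20}
UNIGRAM_COUNTS = {"strictly": 50, "forbidden": 45, "to": 100, "smoke": 25, "rigorous": 30}


def smoothness(prev_unit, unit):
    """Add-one smoothed estimate of P(unit | prev_unit)."""
    vocabulary_size = len(UNIGRAM_COUNTS)
    pair_count = BIGRAM_COUNTS.get((prev_unit, unit), 0)
    return (pair_count + 1) / (UNIGRAM_COUNTS.get(prev_unit, 0) + vocabulary_size)


def flag_suspicious(units, threshold=0.05):
    """Return the units whose smoothness with the preceding unit is below threshold."""
    return [
        units[i]
        for i in range(1, len(units))
        if smoothness(units[i - 1], units[i]) < threshold
    ]


print(flag_suspicious(["rigorous", "forbidden", "to", "smoke"]))
```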

S130: Acquire a correction text.

S140: Identify, in the correction text, the corrected text corresponding to the error text.

It can be understood that the correction text is the text obtained after the word worker has corrected the error text in the text to be checked; by default, the word choices in the correction text are assumed to be free of errors.

Further, the corrected text corresponding to the error text can be located in the correction text by comparing the text to be checked with the correction text. The association relation between the corrected text and the error text is then established.
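One way to carry out this comparison is a sequence alignment. The sketch below uses `difflib.SequenceMatcher` from the Python standard library to pair each changed span of the original with its replacement; the choice of alignment method and the function name `extract_corrections` are assumptions, not details given in the application.

```python
from difflib import SequenceMatcher


def extract_corrections(original, corrected):
    """Pair each changed span of `original` with its replacement in `corrected`."""
    matcher = SequenceMatcher(None, original, corrected, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            pairs.append((original[i1:i2], corrected[j1:j2]))
    return pairs


before = ["smoking", "is", "rigorous", "here"]
after = ["smoking", "is", "forbidden", "here"]
print(extract_corrections(before, after))
```

The inputs may be lists of entity units, as here, or plain strings compared character by character.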

S150: Determine the error cause of the error text according to the corrected text.

It will be appreciated that the corrected text reflects the error cause of the error text, and therefore the error cause of the error text can be determined from the corrected text. Specifically, the error causes comprise at least one of sound similarity, shape similarity, misplacement, multiple words, missing words, repeated words, grammar semantics, traditional Chinese characters, variant word forms and sensitive words.

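Further, in a specific embodiment provided in the present application, determining the error cause of the error text according to the corrected text specifically includes:

determining a pinyin sequence corresponding to the error text and a pinyin sequence corresponding to the corrected text;

and when the similarity between the two pinyin sequences exceeds a first preset similarity threshold, determining that the error cause of the error text is sound similarity.

In specific application scenarios, sound-similarity errors appear as homophone errors, syllable errors, fuzzy-sound errors and the like.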

For example, when the error text is "strict" and the corrected text is "forbidden", both have the pinyin sequence yanjin, so the similarity of the two pinyin sequences is 1, which exceeds the first preset similarity threshold; the error cause of the error text is therefore determined to be a homophone error within the sound-similarity category.

When the error text is "people" (pinyin sequence renming) and the corrected text is "people" (pinyin sequence renmin), the similarity of the two pinyin sequences is 0.89, which exceeds the first preset similarity threshold; the error cause of the error text is therefore determined to be a syllable error within the sound-similarity category.

When the error text is "lang o wang" and the corrected text is "lang wang", the pinyin sequence of the error text is langawang and that of the corrected text is langwang. The similarity of the two pinyin sequences is 0.89, which exceeds the first preset similarity threshold, so the error cause of the error text is determined to be a fuzzy-sound error within the sound-similarity category. Such an error can arise when the word worker mistypes the pinyin sequence "wang" as "awng", which the input method turns into "o wang".
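The application does not state how the pinyin similarity is computed. A minimal sketch, assuming a normalized edit distance (one minus the Levenshtein distance divided by the longer length), is shown below. Under that assumption the langawang and langwang pair scores about 0.89, although the measure need not reproduce every figure quoted in the text. The threshold value 0.8 and all function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def pinyin_similarity(p1, p2):
    """Similarity in [0, 1]; 1.0 means the pinyin sequences are identical."""
    longest = max(len(p1), len(p2))
    return 1.0 if longest == 0 else 1 - edit_distance(p1, p2) / longest


def is_sound_similar(p1, p2, threshold=0.8):
    # `threshold` stands in for the first preset similarity threshold.
    return pinyin_similarity(p1, p2) > threshold


print(pinyin_similarity("yanjin", "yanjin"))
print(round(pinyin_similarity("langawang", "langwang"), 2))
```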

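Further, in another specific embodiment provided in the present application, determining the error cause of the error text according to the corrected text specifically includes:

determining a root sequence corresponding to the error text and a root sequence corresponding to the corrected text;

and when the similarity between the two root sequences exceeds a second preset similarity threshold, determining that the error cause of the error text is shape similarity.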

It will be appreciated that, in specific application scenarios, the root sequence is expressed as an ideographic description sequence (IDS) built with ideographic description characters (IDC).

The basis of ideographic description is that, in theory, every Chinese character can be split into smaller components, and those components are themselves ideographs. Therefore, by defining ideographic description characters (IDC) that describe how a character is structured, and combining them with characters that are already encoded, most unencoded characters outside the character set can be described.

That is, the error text and the corrected text can each be converted into an ideographic description sequence (IDS) using the ideographic description characters (IDC), and the similarity between the root sequence of the error text and the root sequence of the corrected text can then be calculated.

For example, when the error text is "greedy" and the corrected text is "lean", the similarity of the two root sequences is 0.857, which exceeds the second preset similarity threshold, so the error cause of the error text is determined to be shape similarity.

Of course, in specific application scenarios the root sequence may also be expressed as a stroke sequence. When the overlap ratio between the stroke sequence of the error text and the stroke sequence of the corrected text exceeds the second preset similarity threshold, the error cause of the error text can be determined to be shape similarity. For example, the error text is "in" and the corrected text is "na". Apart from the strokes of the "mouth" component, the stroke sequence of "na" is identical to the stroke sequence of "in", and the strokes of "in" account for more than half of all the strokes of "na", so the error cause of the error text can be judged to be shape similarity.
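The stroke-overlap check can be sketched as follows, assuming the overlap ratio is the length of the longest common subsequence of the two stroke sequences divided by the length of the longer one. That definition, the romanized stroke names and the stroke lists (four strokes for "in", three for the "mouth" component) are illustrative assumptions rather than data from the application.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def stroke_overlap(strokes_a, strokes_b):
    """Share of the longer stroke sequence that also appears, in order, in the other."""
    longest = max(len(strokes_a), len(strokes_b))
    return 0.0 if longest == 0 else lcs_length(strokes_a, strokes_b) / longest


# Hypothetical stroke sequences using romanized stroke names.
IN_STROKES = ["shu", "hengzhegou", "pie", "dian"]
MOUTH_STROKES = ["shu", "hengzhe", "heng"]
NA_STROKES = MOUTH_STROKES + IN_STROKES

overlap = stroke_overlap(IN_STROKES, NA_STROKES)
print(overlap > 0.5)  # four of seven strokes are shared
```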

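Further, in still another specific embodiment provided in the present application, determining the error cause of the error text according to the corrected text specifically includes:

determining a text sequence corresponding to the error text and a text sequence corresponding to the corrected text;

when the two text sequences contain the same entity text in different arrangements, determining that the error cause of the error text is misplacement.

For example, when the error text is "disinfectant" and the corrected text is "disinfectant", the two contain the same entity text in a different arrangement, so the error cause of the error text is misplacement.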
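A simple way to test for misplacement is to check that the two texts contain exactly the same units but in a different order. The sketch below makes that assumption; the function name `is_misplacement` is hypothetical.

```python
from collections import Counter


def is_misplacement(error_units, corrected_units):
    """True when both sequences hold the same units in a different order."""
    same_units = Counter(error_units) == Counter(corrected_units)
    different_order = list(error_units) != list(corrected_units)
    return same_units and different_order


print(is_misplacement("abc", "acb"))  # same characters, different arrangement
print(is_misplacement("abc", "abd"))  # different characters
```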

Determining the text semantics corresponding to the error text and the text semantics corresponding to the corrected text;

when the two have different arrangements of semantic text, determining that the error cause of the error text is grammar semantics.

For example, when the error text is "do you eat" and the corrected text is "do you eat", the semantic text is arranged differently, that is, the order of subject and object has been adjusted, so the error cause of the error text is grammar semantics.

When the error text is a traditional Chinese character and the corrected text is the simplified form of the same character, determining that the error cause of the error text is a mistaken switch to traditional Chinese characters.

For example, when the word worker accidentally switches the input method to traditional Chinese input, the error text will be a traditional Chinese character and the corrected text will be the simplified form of the same character.

When the text semantics of the error text are the same as the text semantics of the corrected text, and the pinyin sequence of the error text is the same as the pinyin sequence of the corrected text, determining that the error cause of the error text is a variant word form.

For example, if the text semantics of the error text "shutter" are the same as the text semantics of the corrected text "shutter", and the pinyin sequence of the error text is the same as the pinyin sequence of the corrected text, the error cause of the error text is a variant word form.

It will be appreciated that a multiple-words error can arise when the input method has its association function enabled and the word worker presses the space bar one time too many while entering text, so that the associated word is entered as well. For example, after "we" is entered, the input method automatically suggests "go out"; if the word worker then presses the space bar once more, the error text "we go out" is produced. Once the corrected text shows that the two characters of "go out" have been deleted, the error cause of the error text can be determined to be multiple words.

A repeated-words error can arise when a fault in keyboard response turns a single typed character into a repeated one. For example, a single typed character W becomes the repeated character WW because of a keyboard response fault, producing an error text in which "I" is duplicated. Once the corrected text shows that one duplicated "I" has been deleted, the error cause of the error text can be determined to be repeated words.

A missing-words error can arise when the word worker presses the delete key one time too many after finishing input, so that a character is dropped.
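The comparison of word counts can be sketched as below. The application states only that a longer error text indicates multiple words or repeated words and a shorter one indicates missing words; the rule used here to tell the first two apart (removing one of two adjacent identical characters yields the corrected text) is an assumption, as are the labels and the function name.

```python
def classify_by_length(error_text, corrected_text):
    """Classify an error by comparing the lengths of error and corrected text."""
    if len(error_text) < len(corrected_text):
        return "missing_words"
    if len(error_text) > len(corrected_text):
        for i in range(1, len(error_text)):
            is_adjacent_repeat = error_text[i] == error_text[i - 1]
            # Drop the repeated character and see whether the correction results.
            if is_adjacent_repeat and error_text[:i] + error_text[i + 1:] == corrected_text:
                return "repeated_words"
        return "multiple_words"
    return None  # equal length: decided by the other checks


print(classify_by_length("we go out", "we"))
print(classify_by_length("wwe", "we"))
print(classify_by_length("w", "we"))
```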

It will be appreciated that when the corrected text includes a preset entity text, for example "a facility" or "an institution", the word worker has replaced a sensitive expression in the original error text. Therefore, depending on whether a preset entity text appears, the error cause of the error text is recognized as a sensitive word, and the entry is subsequently added to the correction system word stock.
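A minimal sketch of this check, assuming the preset entity texts are held in a simple tuple and matched as substrings, is shown below. The list contents and the function name are hypothetical placeholders.

```python
# Hypothetical preset entity texts used as replacements for sensitive expressions.
PRESET_ENTITY_TEXTS = ("a facility", "an institution")


def is_sensitive_word_correction(corrected_text):
    """True when the corrected text contains any preset entity text."""
    return any(entity in corrected_text for entity in PRESET_ENTITY_TEXTS)


print(is_sensitive_word_correction("the meeting was held at a facility"))
```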

Further, in a specific embodiment provided in the present application, the error text includes a suspicious error text or an exact error text.

In a specific application scenario, the exact error text is text that has been determined to be wrong, whereas the suspicious error text may be wrong or may be correct. The suspicious error text and the exact error text are displayed with different marks to prompt the word worker to modify them. For example, the suspicious error text is underlined with a blue wavy line and the exact error text is underlined with a red wavy line.

Typically, the exact error text is error text that has already been recorded in the correction system word stock. It is determined by comparing the entity units with the preset dictionary in the correction system word stock.

The suspicious error text is identified by the smoothness checking algorithm, and may be wrong or correct. Specifically, word segmentation is performed on the text to be checked to obtain a plurality of entity units. The smoothness checking algorithm is then used to calculate the matching smoothness between the entity units. When the matching smoothness between entity units is smaller than a preset smoothness threshold, the corresponding entity units are judged to be suspicious error text.

After the suspicious error text is identified, the present application further refines the judgment on the suspicious error text according to the corrected text, as follows.

When the corrected text is the same as the suspicious error text, the word worker did not modify it, so the suspicious error text is probably correct. It cannot be ruled out, however, that the word worker left it unmodified for some other reason, so the present application also records the number of times the corrected text is the same as the suspicious error text, that is, the number of times the word worker left it unmodified. When that number exceeds a preset checking threshold, the suspicious error text is added to the white list and is no longer marked in later checks.

When the corrected text is different from the suspicious error text, the suspicious error text can be confirmed to contain an error, so its mark is changed to exact error text and the subsequent steps are carried out.
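The white-list logic can be sketched as a small counter. The class name, the return labels and the default threshold below are hypothetical; only the rule itself (count unmodified occurrences, white-list once the count exceeds the preset checking threshold, otherwise promote to exact error text when modified) follows the text.

```python
from collections import defaultdict


class SuspiciousTextTracker:
    def __init__(self, check_threshold=3):
        self.check_threshold = check_threshold  # preset checking threshold
        self.unchanged_counts = defaultdict(int)
        self.white_list = set()

    def review(self, suspicious_text, corrected_text):
        """Return 'exact_error', 'white_listed' or 'pending'."""
        if corrected_text != suspicious_text:
            return "exact_error"  # the word worker changed it
        self.unchanged_counts[suspicious_text] += 1
        if self.unchanged_counts[suspicious_text] > self.check_threshold:
            self.white_list.add(suspicious_text)
            return "white_listed"
        return "pending"


tracker = SuspiciousTextTracker(check_threshold=2)
print([tracker.review("term", "term") for _ in range(3)])
```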

S160: Establish an association relation among the error text, the corrected text and the error cause.

S170: According to the error cause, transmit the error text and the corrected text corresponding to the error text as an input habit error dictionary to the input method word stock and the correction system word stock.

The method establishes the association relation between the error text and the correction text and the error cause. Specifically, the present application records error text, corrected text, and error cause in the following format:

Field	Type	Description
Error_word	String	Error word
Correct_word	String	Correct word
Error_type	String	Error type
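As an illustration, one record in this format could be represented and serialized as follows. The three field names come from the format above; the class name `HabitErrorEntry`, the use of JSON as the transfer format and the sample values are assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HabitErrorEntry:
    Error_word: str    # the error text
    Correct_word: str  # the corrected text
    Error_type: str    # the error cause


def to_json(entries):
    """Serialize entries; ensure_ascii=False keeps Chinese characters readable."""
    return json.dumps([asdict(entry) for entry in entries], ensure_ascii=False)


print(to_json([HabitErrorEntry("renming", "renmin", "sound_similarity")]))
```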

And further, according to the error reasons, the error text and the corrected text corresponding to the error text are used as an input habit error dictionary and are transmitted to an input method word stock and a correction system word stock.

The input method word stock is a data set for providing words required by an input method system. In order to provide accurate predictions and suggestions, a vast word stock is required to store information of various words and common phrases in an input method system. The input method lexicon typically includes commonly used Chinese characters, words, phrases, and other text segments.

Transmitting the input habit error dictionary to the input method word stock reduces erroneous input in two ways: by correcting pinyin (fuzzy sounds and common spelling errors such as nag -> ang), and by raising the ranking of candidate words (for example, when an announcement is being written and "renm" is typed, candidates such as "people", "appointment" and "personal name" are expected to appear first). Only the candidate ranking of words commonly used in the field is raised, which improves input accuracy at the source of input.
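The two mechanisms can be sketched as below. How a real input method applies the dictionary is not described in the application, so both functions and their names are hypothetical. Note that the plain substring replacement is deliberately naive and could alter a valid pinyin sequence that happens to contain the same letters.

```python
def correct_pinyin(typed, pinyin_fixes):
    """Apply known spelling fixes (wrong -> right) to the typed pinyin."""
    for wrong, right in pinyin_fixes.items():
        typed = typed.replace(wrong, right)
    return typed


def rerank_candidates(candidates, boosted_words):
    """Move boosted words to the front; the sort is stable, so order is otherwise kept."""
    return sorted(candidates, key=lambda word: word not in boosted_words)


print(correct_pinyin("lnag", {"nag": "ang"}))
print(rerank_candidates(["ordinary", "appointment", "people"], {"people", "appointment"}))
```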

The correction system word stock, that is, the word stock used for word collation, is a collection of data used to correct spelling and punctuation problems. It typically contains common spelling errors, commonly misused words, punctuation errors and other common language errors. It can provide the correct vocabulary, phrases and grammar rules, so that accurate correction suggestions can be offered after the user has entered text. Transmitting the input habit error dictionary to this word stock allows it to learn on its own, which improves the accuracy and domain specificity of the correction system.

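Specifically, transmitting the error text and the corrected text corresponding to the error text as an input habit error dictionary to the input method word stock and the correction system word stock according to the error cause includes: when the error cause is sound similarity or multiple words, transmitting them to the input method word stock; and when the error cause is at least one of shape similarity, misplacement, multiple words, missing words, repeated words, grammar semantics, traditional Chinese characters, variant word forms and sensitive words, transmitting them to the correction system word stock.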

Furthermore, in the course of implementation the inventors found that, for the same word worker, the distribution of errors detected in each round of proofreading is essentially consistent; that is, the same error will very likely be entered again next time. Yet neither the input method system nor the word collation system can effectively load a personalized word stock according to the input characteristics of each word worker.

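In order to meet the personalized requirements of different users, in a specific implementation provided by the application, the correction text records a correction user ID, the input method word stock has an association relation with the correction user ID, and the correction system word stock has an association relation with the correction user ID;

the correction user ID recorded in the correction text is acquired; the input method word stock or the correction system word stock corresponding to that correction user ID is then determined according to the error cause, and the error text and the corrected text corresponding to the error text are transmitted to it as an input habit error dictionary through the uploading interface it provides.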

In this way, a personalized word stock can be effectively loaded according to the input characteristics of each word worker, which reduces erroneous input at the source, in the input method, and improves the proofreading capability of the correction system in a targeted way.
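The per-user association can be sketched with an in-memory stand-in for the two kinds of word stock, keyed by correction user ID. The class, its `upload` method and the cause labels are hypothetical, and the sketch is simplified so that each cause goes to a single word stock, whereas the text lists multiple words under both.

```python
from collections import defaultdict

# Hypothetical labels for the causes routed to the input method word stock.
INPUT_METHOD_CAUSES = {"sound_similarity", "multiple_words"}


class UserWordStocks:
    """Per-user word stocks keyed by correction user ID (in-memory stand-in)."""

    def __init__(self):
        self.input_method = defaultdict(dict)
        self.correction_system = defaultdict(dict)

    def upload(self, user_id, error_text, corrected_text, cause):
        # Pick the word stock for this user according to the error cause.
        if cause in INPUT_METHOD_CAUSES:
            self.input_method[user_id][error_text] = corrected_text
        else:
            self.correction_system[user_id][error_text] = corrected_text


stocks = UserWordStocks()
stocks.upload("user-1", "renming", "renmin", "sound_similarity")
print(dict(stocks.input_method))
```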

Referring to fig. 2, to support a word stock updating method based on text input habit, the present application further provides a word stock updating system 100 based on text input habit, including:

an input module 11, configured to obtain a text to be checked;

a proofing module 12, configured to identify an error text in the text to be checked;

a distribution module 13, configured to acquire a correction text; to identify, in the correction text, the corrected text corresponding to the error text; to determine the error cause of the error text according to the corrected text; to establish an association relation among the error text, the corrected text and the error cause; and to transmit, according to the error cause, the error text and the corrected text corresponding to the error text as an input habit error dictionary to the input method word stock and the correction system word stock.

The proofing module 12 may identify the error text in the text to be checked in a number of ways. In one embodiment provided in the present application, the proofing module 12 may perform word segmentation on the text to be checked to obtain a plurality of entity units. The proofing module 12 then compares the entity units with a preset dictionary in the correction system word stock to determine the error text.

In another embodiment provided in the present application, the proofing module 12 may perform word segmentation on the text to be checked to obtain a plurality of entity units. The proofing module 12 then uses a smoothness checking algorithm to calculate the matching smoothness between the entity units. When the matching smoothness between entity units is smaller than a preset smoothness threshold, the corresponding entity units are judged to be error text. Typically, the error text is displayed with a mark to prompt the word worker to modify it.

After the distributing module 13 acquires the corrected text, the corrected text corresponding to the error text in the corrected text is identified.

Further, the distribution module 13 may determine corrected text corresponding to the error text in the corrected text by comparing the text to be corrected with the corrected text. The distribution module 13 then establishes an association of the corrected text with the erroneous text.
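A minimal sketch of such a comparison, assuming the two texts are aligned unit by unit with Python's standard `difflib.SequenceMatcher`. The function name `align_corrections` and the whitespace segmentation are hypothetical; the section does not state which alignment method is used.

```python
from difflib import SequenceMatcher


def align_corrections(text_to_check: str, corrected_text: str) -> list[tuple[str, str]]:
    """Pair each changed span of the original with its replacement."""
    original_units = text_to_check.split()    # stand-in for word segmentation
    corrected_units = corrected_text.split()
    matcher = SequenceMatcher(None, original_units, corrected_units, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, delete or insert
            pairs.append(
                (" ".join(original_units[i1:i2]), " ".join(corrected_units[j1:j2]))
            )
    return pairs
```

Each returned pair is a candidate association between an erroneous text and its corrected text; an empty first element indicates an insertion and an empty second element a deletion.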

The distribution module 13 determines the error cause of the error text from the corrected text.

It will be appreciated that the corrected text reflects the error cause of the erroneous text, and thus the distribution module 13 may determine the error cause of the erroneous text from the corrected text. Specifically, the error cause includes at least one of phonetic similarity, shape similarity, misplaced characters, extra characters, missing characters, repeated characters, grammar and semantics, traditional Chinese characters, variant word forms, and sensitive words.

Further, in a specific embodiment provided in the present application, the determining, by the distribution module 13, an error cause of the error text according to the corrected text specifically includes:

Further, in another specific embodiment provided in the present application, the determining, by the distribution module 13, an error cause of the error text according to the corrected text specifically includes:

The distribution module 13 may further convert the erroneous text and the corrected text into ideographic description sequences (IDS), which are built with ideographic description characters (IDC), and calculate the similarity between the component (root) sequence of the erroneous text and that of the corrected text.
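The following sketch illustrates one way to compare component sequences. The lookup table `IDS_TABLE` is a tiny hypothetical sample, and the use of `difflib.SequenceMatcher.ratio()` as the similarity measure is an assumption; the section does not name the measure or the threshold that would classify a pair as shape similarity.

```python
from difflib import SequenceMatcher

# Hypothetical sample table: character -> ideographic description sequence.
IDS_TABLE = {
    "晴": "⿰日青",
    "睛": "⿰目青",
    "河": "⿰氵可",
}


def to_ids(text: str, table: dict[str, str] = IDS_TABLE) -> str:
    """Expand each character to its IDS; unknown characters are kept as-is."""
    return "".join(table.get(ch, ch) for ch in text)


def component_similarity(erroneous: str, corrected: str,
                         table: dict[str, str] = IDS_TABLE) -> float:
    """Similarity in [0, 1] between the two component sequences."""
    return SequenceMatcher(
        None, to_ids(erroneous, table), to_ids(corrected, table), autojunk=False
    ).ratio()
```

With this table, 晴 and 睛 share both the layout character and the right-hand component, so they score higher than 晴 and 河, which share only the layout character.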

Further, in still another specific embodiment provided in the present application, the determining, by the distribution module 13, an error cause of the error text according to the corrected text specifically includes:

It will be appreciated that when the corrected text includes a preset entity text, for example "a facility" or "an institution", this indicates that the word worker has replaced a sensitive expression in the erroneous text. The distribution module 13 can therefore identify the error cause of the erroneous text as a sensitive word according to whether a preset entity text appears, and add the sensitive word to the collation system word stock for use in subsequent checks.
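As a rough illustration only, the check can be sketched as a substring test against a preset list. The names `PRESET_ENTITY_TEXTS`, `is_sensitive_word_correction` and `register_sensitive_word` are hypothetical, and the word stock is modelled as a plain set.

```python
# Hypothetical preset entity texts, taken from the example in the description.
PRESET_ENTITY_TEXTS = {"a facility", "an institution"}


def is_sensitive_word_correction(corrected_text: str,
                                 preset_entities: set[str] = PRESET_ENTITY_TEXTS) -> bool:
    """True when the corrected text contains any preset entity text."""
    return any(entity in corrected_text for entity in preset_entities)


def register_sensitive_word(erroneous_text: str, corrected_text: str,
                            collation_word_stock: set[str]) -> bool:
    """Add the erroneous text to the word stock if the cause is a sensitive word."""
    if is_sensitive_word_correction(corrected_text):
        collation_word_stock.add(erroneous_text)
        return True
    return False
```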

Typically, definite erroneous text is erroneous text that has already been recorded in the collation system word stock. That is, the proofing module 12 determines the definite erroneous text by comparing the entity units with a preset dictionary in the collation system word stock.

Suspected erroneous text is identified by the proofing module 12 using the smoothness checking algorithm, and may in fact be either wrong or correct. Specifically, the proofing module 12 performs word segmentation on the text to be checked to obtain a plurality of entity units. The proofing module 12 then calculates the matching smoothness between the entity units using the smoothness checking algorithm. When the matching smoothness between entity units is smaller than a preset smoothness threshold, the corresponding entity units are judged to be suspected erroneous text.

After the proofing module 12 identifies the erroneous text, the distribution module 13 further refines the accuracy of the erroneous-text determination according to the corrected text. Specifically, the distribution module 13 is further configured to:

When the corrected text is the same as the erroneous text, the word worker did not modify the erroneous text, so the flagged text is likely to be correct. However, the word worker may have left the text unmodified for other reasons, so the distribution module 13 also records the number of times the corrected text is the same as the erroneous text, that is, the number of times the word worker left the erroneous text unmodified. When the number of times the corrected text is the same as a suspected erroneous text exceeds a preset check threshold, the suspected erroneous text is added to the whitelist. In later checks, that suspected erroneous text is no longer marked.
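The counting and whitelisting behaviour can be sketched as follows. The class name `SuspectWhitelist` and its methods are hypothetical; "exceeds" is read here as a strict greater-than comparison.

```python
from collections import Counter


class SuspectWhitelist:
    """Track how often a suspected erroneous text is left unmodified."""

    def __init__(self, check_threshold: int):
        self.check_threshold = check_threshold
        self.unmodified_counts = Counter()
        self.whitelist = set()

    def record(self, suspected_text: str, corrected_text: str) -> None:
        """Count an unmodified suspect and whitelist it once past the threshold."""
        if corrected_text == suspected_text:
            self.unmodified_counts[suspected_text] += 1
            if self.unmodified_counts[suspected_text] > self.check_threshold:
                self.whitelist.add(suspected_text)

    def should_mark(self, suspected_text: str) -> bool:
        """Whitelisted text is not marked in later checks."""
        return suspected_text not in self.whitelist
```

With a threshold of 2, a suspect is still marked after being left unmodified twice and stops being marked after the third time.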

The distribution module 13 establishes an association among the erroneous text, the corrected text and the error cause, and then transmits, according to the error cause, the erroneous text and its corresponding corrected text as an input habit error dictionary to the input method word stock and the collation system word stock.

The distribution module 13 will establish an association of the error text, the corrected text and the cause of the error. Specifically, the distribution module 13 records the error text, the corrected text and the error cause in the following format:

The distribution module 13 transmits the input habit error dictionary to the input method word stock so that input accuracy is improved at the input source. Mis-input is reduced in two ways: by correcting pinyin (fuzzy sounds and common spelling errors such as nag -> ang), and by raising candidate word ranking. For example, when a bulletin is being written, typing "renm" is expected to bring up candidates such as "people", "appointments" and "people names". Words that are common only within a particular field are thereby ranked higher among the candidates.
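Both mechanisms can be sketched in a few lines. The sketch is illustrative only: `PINYIN_FIXES`, `correct_pinyin` and `rank_candidates` are hypothetical names, the suffix rule is a deliberately naive stand-in for real pinyin correction, and the scores are made-up values rather than data from the present application.

```python
# Common spelling error from the description: "nag" typed instead of "ang".
PINYIN_FIXES = {"nag": "ang"}


def correct_pinyin(typed: str, fixes: dict[str, str] = PINYIN_FIXES) -> str:
    """Naive correction: rewrite a known wrong ending of the typed pinyin."""
    for wrong, right in fixes.items():
        if typed.endswith(wrong):
            return typed[: -len(wrong)] + right
    return typed


def rank_candidates(base_scores: dict[str, float],
                    habit_boosts: dict[str, float]) -> list[str]:
    """Order candidates by base score plus a boost from the error dictionary."""
    return sorted(
        base_scores,
        key=lambda word: base_scores[word] + habit_boosts.get(word, 0.0),
        reverse=True,
    )
```

For instance, `correct_pinyin("shnag")` yields `"shang"`, and a boost on a field-specific candidate moves it ahead of candidates with a higher base score.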

The word collation word stock is a collection of data used to correct spelling and punctuation errors. It typically contains common spelling errors, mispronounced words, punctuation errors, and other common language errors, together with the correct vocabulary, phrases and grammar rules, so that accurate correction suggestions can be given after the user enters text. By transmitting the input habit error dictionary to the word collation word stock, the distribution module 13 enables the word stock to learn by itself, which improves the accuracy and domain specialization of the collation system.

Specifically, the transmitting, by the distribution module 13 and according to the error cause, of the erroneous text and its corresponding corrected text as the input habit error dictionary to the input method word stock and the collation system word stock specifically includes:

acquiring the correcting user ID recorded with the corrected text;
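The remaining steps are not reproduced in this section. Purely as an illustration of word stocks that are associated with a correcting user ID, the sketch below keeps one pair of word stocks per user. The record layout, the names `UserWordStocks` and `distribute`, and the rule that only phonetic-similarity errors reach the input method word stock are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class UserWordStocks:
    """Word stocks associated with one correcting user ID."""
    input_method: dict = field(default_factory=dict)
    collation: dict = field(default_factory=dict)


# Assumption: causes that are relevant to the input method word stock.
INPUT_METHOD_CAUSES = {"phonetic similarity"}


def distribute(record: dict, stocks_by_user: dict) -> UserWordStocks:
    """Route one (error, corrected, cause, user_id) record to that user's stocks."""
    stocks = stocks_by_user.setdefault(record["user_id"], UserWordStocks())
    if record["cause"] in INPUT_METHOD_CAUSES:
        stocks.input_method[record["error"]] = record["corrected"]
    stocks.collation[record["error"]] = record["corrected"]
    return stocks
```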

It should be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the statement "comprises a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims

1. The word stock updating method based on the text input habit is characterized by comprising the following steps:

acquiring a text to be checked;

identifying an error text in the text to be checked;

acquiring a correction text;

determining the error reason of the error text according to the corrected text;

2. The word stock updating method based on text input habits of claim 1, wherein the error cause includes at least one of phonetic similarity, shape similarity, misplaced characters, extra characters, missing characters, repeated characters, grammar and semantics, traditional Chinese characters, variant word forms, and sensitive words;

3. The word stock updating method based on text input habits of claim 2, wherein the corrected text records a correcting user ID, the input method word stock is associated with the correcting user ID, and the collation system word stock is associated with the correcting user ID;

acquiring the correcting user ID recorded with the corrected text;

4. The word stock updating method based on text input habits of claim 1, wherein the erroneous text includes suspected erroneous text or definite erroneous text;

the method further comprises the steps of:

5. The method for updating a word stock based on text input habits according to claim 2, wherein the determining an error cause of the error text based on the corrected text, specifically comprises:

6. The method for updating a word stock based on text input habits according to claim 2, wherein the determining an error cause of the error text based on the corrected text, specifically comprises:

7. The method for updating a word stock based on text input habits according to claim 2, wherein the determining an error cause of the error text based on the corrected text, specifically comprises:

8. The method for updating a word stock based on text input habits according to claim 2, wherein the determining an error cause of the error text based on the corrected text, specifically comprises:

9. The method for updating a word stock based on text input habits according to claim 2, wherein the determining an error cause of the error text based on the corrected text, specifically comprises:

10. A word stock updating system based on text input habits, comprising:

the input module is used for acquiring a text to be checked;