CN105404903B - Information processing method and device and electronic equipment

Info

Publication number: CN105404903B
Application number: CN201410468559.7A
Authority: CN
Inventors: 贾沛; 孙林; 薛苏葵; 李众庆
Original assignee: Lenovo Beijing Ltd
Current assignee: Lenovo Beijing Ltd
Priority date: 2014-09-15
Filing date: 2014-09-15
Publication date: 2020-06-23
Anticipated expiration: 2034-09-15
Also published as: CN105404903A

Abstract

The embodiments of the invention disclose an information processing method, an information processing device, and electronic equipment. Keywords are extracted from electronic text content associated with the information that is to be recognized as electronic text, and a word stock formed by the extracted keywords, the synonyms of the keywords, and the words associated with the extracted keywords serves as the core of a training database for correcting recognized data. The acquired electronic text content is text that has already been manually corrected, so it can be regarded as 100% accurate, and it is associated with the information to be recognized. Words obtained through optical character recognition are therefore corrected by the information processing method provided by the embodiments of the application, which improves the accuracy of optical character recognition.

Description

Information processing method and device and electronic equipment

Technical Field

The present invention relates to the field of information processing technologies, and in particular, to an information processing method and apparatus, and an electronic device.

Background

Optical Character Recognition (OCR) is the process of scanning text material to obtain image files and analyzing those image files to acquire text and layout information; it is an important area of research and application in automatic recognition technology.

At present, optical character recognition achieves high accuracy on printed text but a low recognition rate on handwritten text, so improving the accuracy with which OCR recognizes handwritten text has become a problem to be solved urgently.

Disclosure of Invention

The invention aims to provide an information processing method, an information processing device and electronic equipment, which are used for improving the accuracy of recognition of handwritten texts through OCR recognition.

In order to achieve the purpose, the invention provides the following technical scheme:

An information processing method applied to electronic equipment, wherein electronic text content associated with information that is to be recognized as electronic text is acquired; keywords are extracted from the acquired electronic text content; synonyms of the extracted keywords and words associated with the extracted keywords are obtained; and the extracted keywords, the synonyms of the keywords, and the words associated with the extracted keywords constitute a word stock; the method comprises the following steps:

identifying first information in the information to be identified to obtain a first word;

searching whether a second word similar to the first word exists in the word stock;

and when a second word similar to the first word exists in the word stock, replacing the first word with the second word.

In the above method, preferably, the identifying the first word of the first information in the information to be identified includes:

performing optical character recognition on first information in the information to be recognized to obtain a first word;

said searching for the presence of a second word in said thesaurus that is similar to said first word comprises:

and searching whether a second word with a font similar to that of the first word exists in the word stock.

In the above method, preferably, the information to be recognized is speech; said searching for the presence of a second word in said thesaurus that is similar to said first word comprises:

and searching whether a second word with the pronunciation similar to that of the first word exists in the word bank.

The method preferably further includes, after replacing the first word with the second word:

recording the number of times the first word is replaced by the second word.

In the method, preferably, when the number of times that the first word is replaced by the second word is greater than a first preset threshold, the first information in the information to be recognized is recognized as the second word in the process of recognizing the information to be recognized.

The above method, preferably, further comprises:

when at least two second words similar to the first words exist in the word stock, displaying the at least two second words;

selecting a second word according to a selection instruction triggered by a user;

the replacing the first word with the second word comprises:

and replacing the first word with a second word selected according to a selection instruction triggered by the user.

The above method, preferably, further comprises:

when the word library does not have a word similar to the first word, judging whether the first word is a word selected according to a selection instruction triggered by a user;

when the first word is a word selected according to a selection instruction triggered by a user, recording the number of times that the first word is triggered and selected by the user;

and when the number of times of triggering and selecting the first word by the user is greater than a second preset threshold value, adding the first word into the word stock.

An information processing apparatus applied to an electronic device, the electronic device having access to a word stock, the word stock comprising: keywords extracted from acquired electronic text content, synonyms of the extracted keywords, and words associated with the extracted keywords; wherein the acquired electronic text content is electronic text content associated with information that is to be recognized as electronic text; the apparatus comprises:

the identification module is used for identifying first information in the information to be identified to obtain a first word;

the searching module is used for searching whether a second word similar to the first word exists in the word bank;

and the replacing module is used for replacing the first word with a second word when the second word similar to the first word exists in the word stock.

The above apparatus, preferably, the identification module includes:

the first recognition unit is used for carrying out optical character recognition on first information in the information to be recognized to obtain a first word;

the searching module comprises:

and the first searching unit is used for searching whether a second word with a character pattern similar to that of the first word exists in the word stock.

In the above apparatus, preferably, the information to be recognized is speech;

the identification module comprises:

the second recognition unit is used for carrying out voice recognition on first information in the information to be recognized to obtain a first word;

the searching module comprises:

and the second searching unit is used for searching whether a second word with the pronunciation similar to that of the first word exists in the word bank.

The above apparatus, preferably, further comprises:

the first recording module is used for recording the times of replacing the first word by the second word after the replacing module replaces the first word by the second word.

In the above apparatus, preferably, the recognition module is further configured to, when the number of times that the first word is replaced by the second word is greater than a first preset threshold, recognize the first information in the information to be recognized as the second word in the process of recognizing the information to be recognized.

The above apparatus, preferably, further comprises:

the display module is used for displaying at least two second words similar to the first word when the at least two second words exist in the word stock;

the selection module is used for selecting a second word according to a selection instruction triggered by a user;

the replacement module is specifically configured to replace the first word with the second word selected according to the selection instruction triggered by the user.

The above apparatus, preferably, further comprises:

the judging module is used for judging whether the first word is a word selected according to a selection instruction triggered by a user when the word similar to the first word does not exist in the word bank;

the second recording module is used for recording the times of triggering and selecting the first word by the user when the first word is the word selected according to the selection instruction triggered by the user;

and the adding module is used for adding the first word into the word stock when the number of times of triggering and selecting the first word by the user is greater than a second preset threshold value.

An electronic device comprising the information processing apparatus as described above.

According to the scheme, the information processing method is applied to the electronic equipment, and the electronic text content associated with the information to be identified, which is to be identified as the electronic text, is obtained; extracting keywords in the acquired electronic text content; obtaining synonyms of the extracted keywords and words associated with the extracted keywords; the extracted keywords, synonyms of the keywords, and words associated with the extracted keywords constitute a lexicon; the method comprises the following steps: identifying first information in the information to be identified to obtain a first word; searching whether a second word similar to the first word exists in the word stock; and when a second word similar to the first word exists in the word stock, replacing the first word with the second word.

In the embodiments of the application, keywords are extracted from the electronic text content associated with the information that is to be recognized as electronic text, and the word stock formed by the extracted keywords, the synonyms of the keywords, and the words associated with the extracted keywords serves as the core of a training database for correcting recognized data. The acquired electronic text content is text that has already been manually corrected, so it can be regarded as 100% accurate, and it is associated with the information to be recognized. Words obtained through optical character recognition are therefore corrected by the information processing method provided by the embodiments of the application, which improves the accuracy of optical character recognition.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.

Fig. 1 is a flowchart of an implementation of an information processing method according to an embodiment of the present application;

fig. 2 is a flowchart of another implementation of an information processing method according to an embodiment of the present application;

fig. 3 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application;

fig. 4 is a schematic structural diagram of an identification module according to an embodiment of the present application;

fig. 5 is a schematic structural diagram of a lookup module according to an embodiment of the present application;

fig. 6 is another schematic structural diagram of an identification module according to an embodiment of the present application;

fig. 7 is another schematic structural diagram of a lookup module according to an embodiment of the present application;

fig. 8 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application;

fig. 9 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application;

fig. 10 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application;

fig. 11 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be practiced otherwise than as specifically illustrated.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The information processing method and device provided by the embodiment of the application are applied to electronic equipment.

In the embodiment of the application, electronic text content associated with information to be identified as the electronic text is acquired in advance; extracting keywords in the acquired electronic text content; obtaining synonyms of the extracted keywords and words associated with the extracted keywords; the extracted keywords, synonyms of the keywords, and words associated with the extracted keywords constitute a thesaurus.

The information to be recognized can be characters to be subjected to optical character recognition, such as handwritten characters or print characters; the information to be recognized may also be speech information.

In different application scenarios, the information to be recognized and the electronic text content associated with it may differ. For example, a PPT electronic document is usually used at a meeting or in a school class, and besides the PPT electronic document the people present usually also write by hand or speak; the information to be recognized can therefore be handwritten characters or speech segments, and the electronic text content associated with the information to be recognized can be the electronic text content of the PPT electronic document.

Of course, in addition to PPT documents, commonly used electronic documents include: WORD documents, PDF documents, etc., and thus the electronic text content associated with the information to be identified may also be the electronic text content in the WORD documents, PDF documents.

In other words, the information to be recognized is information generated based on the electronic text content.

Wherein the words associated with the extracted keyword may include lower-level words of the extracted keyword or words having a high degree of correlation with the extracted keyword. If the extracted keyword is "traffic", then the words associated with "traffic" may include: buses, subways, taxis, motor vehicles, roads, lines, congestion, peaks, slowness, and the like.
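
As an illustration only, the composition of the word stock described above may be sketched as follows. The function name `build_word_stock` and the sample synonym and association tables are hypothetical; the embodiment does not prescribe how keywords, synonyms, or associated words are obtained.

```python
from typing import Dict, Iterable, Set


def build_word_stock(
    keywords: Iterable[str],
    synonyms: Dict[str, Set[str]],
    associated: Dict[str, Set[str]],
) -> Set[str]:
    """Return the union of the keywords, their synonyms, and their associated words."""
    word_stock: Set[str] = set()
    for keyword in keywords:
        word_stock.add(keyword)
        word_stock |= synonyms.get(keyword, set())
        word_stock |= associated.get(keyword, set())
    return word_stock


# Hypothetical sample data built around the "traffic" example.
word_stock = build_word_stock(
    ["traffic"],
    synonyms={"traffic": {"transport"}},
    associated={"traffic": {"bus", "subway", "congestion"}},
)
```

A set is used here because the later steps only ask whether a word is present in the word stock; any other container that supports membership tests would serve equally well.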

Which words are highly correlated with an extracted keyword can be determined statistically. For example, the probability that a certain word appears when the extracted keyword appears may be counted. Specifically, when the probability that a certain word occurs, given that the extracted keyword occurs, is greater than a third preset threshold, that word may be determined to be a word with a high degree of correlation with the extracted keyword.
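
One possible reading of this statistic is sketched below, assuming the probability is estimated per document: the share of documents containing the keyword that also contain the candidate word. The function name `associated_words`, the document-level counting, and the sample corpus are assumptions made for illustration and are not taken from the embodiment.

```python
from collections import Counter
from typing import Iterable, List, Set


def associated_words(
    keyword: str,
    documents: Iterable[List[str]],
    third_threshold: float,
) -> Set[str]:
    """Words whose estimated P(word appears | keyword appears) exceeds the threshold."""
    docs_with_keyword = 0
    cooccurrence: Counter = Counter()
    for tokens in documents:
        unique = set(tokens)
        if keyword not in unique:
            continue
        docs_with_keyword += 1
        cooccurrence.update(unique - {keyword})
    if docs_with_keyword == 0:
        return set()
    return {
        word
        for word, count in cooccurrence.items()
        if count / docs_with_keyword > third_threshold
    }


# Hypothetical tokenized documents.
docs = [
    ["traffic", "bus", "congestion"],
    ["traffic", "bus", "weather"],
    ["traffic", "subway", "congestion"],
    ["lunch", "bus"],
]
related = associated_words("traffic", docs, third_threshold=0.5)
```

In this sample, "bus" and "congestion" each appear in two of the three documents that contain "traffic", so only those two words pass a threshold of 0.5.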

Referring to fig. 1, fig. 1 is a flowchart of an implementation of an information processing method according to an embodiment of the present application, which may include:

step S11: identifying first information in the information to be identified to obtain a first word;

step S12: searching whether a second word similar to the first word exists in the word stock;

in the embodiment of the present application, the word stock is a word stock constructed according to electronic text content associated with information to be recognized as an electronic text.

Step S13: and when a second word similar to the first word exists in the word stock, replacing the first word with the second word.

And after a second word similar to the first word is found in the constructed word stock, replacing the first word with the second word, namely correcting the first word obtained by recognition into the second word.

If the second word is the same word as the first word, no substitution may be made.
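
Steps S11 to S13 can be sketched as follows, under the assumption that recognition has already produced the first word. The names `correct_word` and `differs_by_one_char` are hypothetical, and the similarity test used here (same length, at most one differing character) is only a stand-in for the glyph or pronunciation comparison of the embodiment.

```python
from typing import Callable, Sequence


def differs_by_one_char(a: str, b: str) -> bool:
    """Stand-in similarity test: same length and at most one differing position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


def correct_word(
    first_word: str,
    word_stock: Sequence[str],
    is_similar: Callable[[str, str], bool],
) -> str:
    """Return the word that should stand in place of the recognized first word."""
    if first_word in word_stock:
        # The similar word is the first word itself, so no substitution is made.
        return first_word
    for second_word in word_stock:
        if is_similar(first_word, second_word):
            return second_word  # step S13: replace the first word
    return first_word  # no similar word found; leave the first word unchanged


stock = ["traffic", "subway"]
corrected = correct_word("trafflc", stock, differs_by_one_char)
unchanged = correct_word("lunch", stock, differs_by_one_char)
```

Passing the similarity test as a parameter keeps the flow the same for optical character recognition and for speech recognition; only the predicate changes.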

According to the information processing method provided by the embodiments of the application, keywords are extracted from the electronic text content associated with the information that is to be recognized as electronic text, and the word stock formed by the extracted keywords, the synonyms of the keywords, and the words associated with the extracted keywords serves as the core of a training database for correcting recognized data. The acquired electronic text content is text that has already been manually corrected, so it can be regarded as 100% accurate, and it is associated with the information to be recognized. The recognized characters are therefore corrected by the information processing method provided by the embodiments of the application, which improves the recognition accuracy.

In the foregoing embodiment, preferably, the recognizing the first information in the information to be recognized to obtain the first word may include:

and carrying out optical character recognition on first information in the information to be recognized to obtain a first word.

In this embodiment of the application, the information to be recognized may be handwritten characters or print characters, and for the handwritten characters and the print characters, the information to be recognized may be recognized through Optical Character Recognition (OCR). For handwritten characters, the handwriting recognition engine can also recognize the handwritten characters by detecting tracks in the handwriting process.

Correspondingly, the searching whether a second word similar to the first word exists in the word stock may include:

searching whether a second word whose glyph shape is similar to that of the first word exists in the word stock. When such a second word exists, a second word similar to the first word is considered to exist in the word stock.

In the embodiment of the present application, the words whose glyph shape is similar to that of the first word may be determined in advance. Whether the glyph shape of a certain word (hereinafter referred to as a third word) is similar to that of the first word can be judged by checking whether the characters at corresponding positions in the first word and the third word have similar structures: when they do, the glyph shape of the first word is determined to be similar to that of the third word. Whether the glyphs of two characters at corresponding positions are similar can be judged by either of the following methods:

The four-corner features of the two characters can be compared; when the four-corner features of the two characters are the same, the glyphs of the two characters are determined to be similar; otherwise, the glyphs of the two characters are determined to be different.

Alternatively,

the first two strokes and the last two strokes used in writing the two characters can be compared; if both the first two strokes and the last two strokes of the two characters are the same, the glyphs of the two characters are determined to be similar; otherwise, the glyphs of the two characters are determined to be different.

All third words whose glyph shape is similar to that of the first word form the group of glyph-similar words corresponding to the first word. In the embodiment of the present application, when searching whether a second word whose glyph shape is similar to that of the first word exists in the word stock, if a second word in the constructed word stock belongs to the group of glyph-similar words corresponding to the first word, that second word is determined to be a word whose glyph shape is similar to that of the first word; that is, a second word whose glyph shape is similar to that of the first word exists in the word stock.

If several second words in the word stock have a glyph shape similar to that of the first word, the one whose glyph shape is most similar to that of the first word can be taken as the word whose glyph shape is similar to that of the first word.
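
The four-corner comparison can be sketched as below. The table `FOUR_CORNER` holds placeholder codes written by hand for this illustration; they have not been checked against a four-corner dictionary, and a real system would load codes from an authoritative source. The function names are likewise hypothetical. The sketch covers only the first of the two character-level methods and does not rank several matches by degree of similarity.

```python
from typing import Dict, List, Sequence

# Placeholder four-corner codes for illustration only (assumed values).
FOUR_CORNER: Dict[str, str] = {
    "未": "5090",
    "末": "5090",
    "天": "1043",
}


def chars_glyph_similar(a: str, b: str, codes: Dict[str, str]) -> bool:
    """Two characters are glyph-similar if identical or if their codes match."""
    if a == b:
        return True
    code_a, code_b = codes.get(a), codes.get(b)
    return code_a is not None and code_a == code_b


def words_glyph_similar(first: str, third: str, codes: Dict[str, str]) -> bool:
    """Words are glyph-similar when every pair of corresponding characters is."""
    return len(first) == len(third) and all(
        chars_glyph_similar(a, b, codes) for a, b in zip(first, third)
    )


def glyph_similar_words(
    first_word: str, word_stock: Sequence[str], codes: Dict[str, str]
) -> List[str]:
    """Second words in the word stock whose glyph shape is similar to the first word."""
    return [
        word
        for word in word_stock
        if word != first_word and words_glyph_similar(first_word, word, codes)
    ]


# "末来" stands for a misrecognized form of "未来".
matches = glyph_similar_words("末来", ["未来", "天气"], FOUR_CORNER)
```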

In the above embodiment, preferably, the information to be recognized may also be speech. The identifying the first information in the information to be identified to obtain the first word may include:

and performing voice recognition on first information in the information to be recognized to obtain a first word.

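Correspondingly, the searching whether a second word similar to the first word exists in the word stock may include searching whether a second word whose pronunciation is similar to that of the first word exists in the word stock. When such a second word exists, a second word similar to the first word is considered to exist in the word stock.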

In the embodiment of the present application, the words whose pronunciation is similar to that of the first word may be determined in advance. Whether the pronunciation of a certain word (hereinafter referred to as a third word) is similar to that of the first word can be judged by checking whether the characters at corresponding positions in the first word and the third word have similar pronunciations: when they do, the pronunciations of the first word and the third word are determined to be similar. Whether the pronunciations of two characters at corresponding positions are similar can be judged by either of the following methods:

The pinyin of the two characters can be compared directly; if the pinyin of the two characters is the same, the pronunciations of the two characters are determined to be similar.

Alternatively,

the voice templates corresponding to the two characters can be compared; when the similarity of the voice templates corresponding to the two characters is greater than a fourth preset threshold, the pronunciations of the two characters are determined to be similar.

All third words whose pronunciation is similar to that of the first word form the group of pronunciation-similar words corresponding to the first word. In the embodiment of the application, when searching whether a second word whose pronunciation is similar to that of the first word exists in the word stock, if a second word in the constructed word stock belongs to the group of pronunciation-similar words corresponding to the first word, that second word is determined to be a word whose pronunciation is similar to that of the first word; that is, a second word whose pronunciation is similar to that of the first word exists in the word stock.

If several second words in the word stock have a pronunciation similar to that of the first word, the one whose pronunciation is most similar to that of the first word can be taken as the word whose pronunciation is similar to that of the first word.
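
The pinyin comparison can be sketched as an index from pinyin sequences to words of the word stock. The table `PINYIN` is a small hand-written stand-in for a pronunciation dictionary and lists a single reading per character; the function names are hypothetical. The voice-template comparison is not covered by this sketch.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Optional, Tuple

# Hand-written stand-in for a pronunciation dictionary (one reading per character).
PINYIN: Dict[str, str] = {
    "会": "hui4", "汇": "hui4",
    "议": "yi4", "意": "yi4",
    "交": "jiao1", "通": "tong1",
}


def to_pinyin(word: str, table: Dict[str, str]) -> Optional[Tuple[str, ...]]:
    """Pinyin of each character in order, or None if any character is unknown."""
    if any(ch not in table for ch in word):
        return None
    return tuple(table[ch] for ch in word)


def build_pronunciation_index(
    word_stock: Iterable[str], table: Dict[str, str]
) -> Dict[Tuple[str, ...], List[str]]:
    """Group the words of the word stock by their pinyin sequence."""
    index: DefaultDict[Tuple[str, ...], List[str]] = defaultdict(list)
    for word in word_stock:
        key = to_pinyin(word, table)
        if key is not None:
            index[key].append(word)
    return dict(index)


def similar_sounding(
    first_word: str,
    index: Dict[Tuple[str, ...], List[str]],
    table: Dict[str, str],
) -> List[str]:
    """Second words whose characters share the pinyin of the first word's characters."""
    key = to_pinyin(first_word, table)
    if key is None:
        return []
    return [word for word in index.get(key, []) if word != first_word]


index = build_pronunciation_index(["会议", "交通"], PINYIN)
# "汇意" stands for a misrecognized form of "会议".
matches = similar_sounding("汇意", index, PINYIN)
```

Building the index once per word stock means each lookup compares whole pinyin sequences in a single dictionary access instead of scanning every word.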

In the foregoing embodiment, preferably, after replacing the first word with the second word, the method may further include:

recording the number of times the first word is replaced by the second word.

In the embodiment of the application, the times of replacing the first word are accumulated every time the first word is replaced by the second word.

Further, when the number of times of replacement of the first word by the second word is greater than a first preset threshold, in the process of identifying the information to be identified, identifying the first information in the information to be identified as the second word.

That is, once the number of times that the first word has been replaced by the second word exceeds the first preset threshold, the first information is recognized directly as the second word, instead of first being recognized as the first word and then having the first word replaced by the second word, which improves recognition efficiency and accuracy.
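
A minimal sketch of this counting step follows. The class name `ReplacementMemory` and its methods are hypothetical; how the recognition engine consumes the direct result is left open by the embodiment and is not modeled here.

```python
from collections import Counter
from typing import Optional


class ReplacementMemory:
    """Counts how often a first word has been replaced by a given second word."""

    def __init__(self, first_threshold: int) -> None:
        self.first_threshold = first_threshold
        self.counts: Counter = Counter()

    def record(self, first_word: str, second_word: str) -> None:
        """Accumulate one replacement of first_word by second_word."""
        self.counts[(first_word, second_word)] += 1

    def direct_result(self, first_word: str) -> Optional[str]:
        """Second word to output directly, once its count exceeds the threshold."""
        best: Optional[str] = None
        best_count = self.first_threshold
        for (first, second), count in self.counts.items():
            if first == first_word and count > best_count:
                best, best_count = second, count
        return best


memory = ReplacementMemory(first_threshold=2)
memory.record("trafflc", "traffic")
memory.record("trafflc", "traffic")
before = memory.direct_result("trafflc")  # count equals the threshold, not above it
memory.record("trafflc", "traffic")
after = memory.direct_result("trafflc")
```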

In the foregoing embodiment, preferably, another implementation flowchart of the information processing method provided in this embodiment is shown in fig. 2, and may include:

step S21: identifying first information in the information to be identified to obtain a first word;

step S22: searching whether a second word similar to the first word exists in the word stock; if exactly one second word similar to the first word exists in the word stock, executing step S25; if at least two such second words exist, executing step S23;

Step S23: displaying the at least two second words;

step S24: selecting a second word according to a selection instruction triggered by a user;

step S25: replacing the first word with the second word;

when at least two second words similar to the first word exist in the word stock, the first word is replaced by the second word selected according to a selection instruction triggered by the user.

In the foregoing embodiment, preferably, when at least two second words similar to the first word exist in the thesaurus, after selecting a second word according to a selection instruction triggered by a user, the method may further include:

counting the number of times that each second word is triggered and selected by the user;

determining a second word which is triggered and selected by the user for the maximum times;

and when the number of times that the determined second word is triggered and selected is larger than a fifth preset threshold value, in the process of identifying the information to be identified, identifying the first information in the information to be identified as the determined second word.

That is to say, when the number of times that the second word is triggered and selected by the user is the largest and the number of times that the second word is triggered and selected by the user is greater than a fifth preset threshold, in the process of identifying the information to be identified, the first information in the information to be identified is identified as the second word that is triggered and selected by the user for the largest number of times and that is greater than the fifth preset threshold.
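
The flow of fig. 2 together with the selection count can be sketched as follows. The names `resolve` and `SelectionStats`, and the callback `ask_user` that stands for the display and selection steps, are hypothetical and serve only to illustrate the branching.

```python
from collections import Counter, defaultdict
from typing import Callable, DefaultDict, List, Optional


class SelectionStats:
    """Counts, per first word, how often the user selected each second word."""

    def __init__(self, fifth_threshold: int) -> None:
        self.fifth_threshold = fifth_threshold
        self.counts: DefaultDict[str, Counter] = defaultdict(Counter)

    def record(self, first_word: str, chosen: str) -> None:
        self.counts[first_word][chosen] += 1

    def preferred(self, first_word: str) -> Optional[str]:
        """Most-selected second word, if its count exceeds the fifth threshold."""
        if first_word not in self.counts:
            return None
        word, count = self.counts[first_word].most_common(1)[0]
        return word if count > self.fifth_threshold else None


def resolve(
    first_word: str,
    candidates: List[str],
    ask_user: Callable[[List[str]], str],
    stats: SelectionStats,
) -> str:
    """Return the word to keep: unchanged, the single match, or the user's choice."""
    if not candidates:
        return first_word
    if len(candidates) == 1:
        return candidates[0]  # step S25 directly
    chosen = ask_user(candidates)  # steps S23 and S24
    stats.record(first_word, chosen)
    return chosen


stats = SelectionStats(fifth_threshold=1)
pick_first = lambda options: options[0]  # stands in for a user's selection
results = [resolve("bas", ["bus", "bar"], pick_first, stats) for _ in range(2)]
favorite = stats.preferred("bas")
```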

In the above embodiment, when there is no second word similar to the first word in the word stock, the first word is considered to be correctly recognized, and the first word may not be processed.

In the above embodiment, it is preferable that the method further includes:

Corresponding to the method embodiment, an embodiment of the present application further provides an information processing apparatus, and a schematic structural diagram of the information processing apparatus provided in the embodiment of the present application is shown in fig. 3, and the information processing apparatus may include:

an identification module 31, a search module 32 and a replacement module 33; wherein:

the identification module 31 is configured to identify first information in the information to be identified to obtain a first word;

the searching module 32 is configured to search the word bank for whether a second word similar to the first word exists;

The replacing module 33 is configured to replace the first word with a second word similar to the first word when the second word exists in the thesaurus.

According to the information processing device provided by the embodiments of the application, keywords are extracted from the electronic text content associated with the information that is to be recognized as electronic text, and the word stock formed by the extracted keywords, the synonyms of the keywords, and the words associated with the extracted keywords serves as the core of a training database for correcting recognized data. The acquired electronic text content is text that has already been manually corrected, so it can be regarded as 100% accurate, and it is associated with the information to be recognized. The recognized characters are therefore corrected by the information processing method provided by the embodiments of the application, which improves the recognition accuracy.

In the above embodiment, a schematic structural diagram of the identification module 31 is shown in fig. 4, and may include:

a first recognition unit 41, configured to perform optical character recognition on first information in the information to be recognized to obtain a first word;

Accordingly, a schematic structural diagram of the search module 32 is shown in fig. 5, and may include:

a first searching unit 51, configured to search the thesaurus for whether a second word with a font similar to that of the first word exists.

In the foregoing embodiment, preferably, the information to be recognized is speech.

Another schematic structural diagram of the identification module 31 is shown in fig. 6, and may include:

the second recognition unit 61 is configured to perform voice recognition on first information in the information to be recognized to obtain a first word;

correspondingly, another structural diagram of the search module 32 is shown in fig. 7, and may include:

a second searching unit 71, configured to search, in the thesaurus, whether a second word with a pronunciation similar to that of the first word exists.

On the basis of the embodiment shown in fig. 3, another schematic structural diagram of the information processing apparatus provided in the embodiment of the present application is shown in fig. 8, and may further include:

a first recording module 81, configured to record, after the replacing module replaces the first word with the second word, the number of times that the first word is replaced by the second word.

Further, the identification module 31 is further configured to identify, when the number of times that the first word is replaced by the second word is greater than a first preset threshold, the first information in the information to be identified as the second word in the process of identifying the information to be identified.

It should be noted that the recording module 81 can also be applied to the embodiments shown in fig. 3 to 4.

In the foregoing embodiment, preferably, on the basis of the embodiment shown in fig. 3, a schematic diagram of another structure of the information processing apparatus provided in the embodiment of the present application is shown in fig. 9, and may further include:

a display module 91 and a selection module 92, wherein:

the display module 91 is configured to display at least two second words similar to the first word when the at least two second words exist in the thesaurus;

the selection module 92 is configured to select a second word according to a selection instruction triggered by a user;

the replacing module 33 is further configured to replace the first word with the second word selected according to the selection instruction triggered by the user.

It should be noted that the display module 91 and the selection module 92 may also be applied to the embodiment shown in any one of fig. 4 to 8.
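A minimal sketch of how the display, selection, and replacing modules could cooperate is given below. The user interface is abstracted as a callback, which is an assumption; `choose_replacement` and `ask_user` are hypothetical names that do not appear in the description.

```python
from typing import Callable, Optional, Sequence


def choose_replacement(
    first_word: str,
    candidates: Sequence[str],
    ask_user: Callable[[str, Sequence[str]], int],
) -> Optional[str]:
    """Pick the second word that replaces first_word.

    candidates are the second words found in the thesaurus. ask_user stands
    in for the display module plus the user's selection instruction and must
    return the index of the chosen candidate.
    """
    if not candidates:
        return None  # nothing similar in the thesaurus
    if len(candidates) == 1:
        return candidates[0]  # a single match needs no user selection
    index = ask_user(first_word, candidates)  # display at least two words
    return candidates[index]
```

In an interactive device, `ask_user` would render the candidate list and wait for a tap or click; in the sketch it can be any function returning an index.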

In the foregoing embodiment, preferably, on the basis of the embodiment shown in fig. 9, a schematic diagram of another structure of the information processing apparatus provided in the embodiment of the present application is shown in fig. 10, and may further include:

a counting module 101, configured to count, when at least two second words similar to the first word exist in the thesaurus, the number of times each second word has been selected by the user, after the selection module 92 selects a second word according to a selection instruction triggered by the user;

the determining module 102 is configured to determine the second word that has been selected by the user the greatest number of times;

the identifying module 31 may be further configured to identify, when the number of times that the determined second word is triggered to be selected is greater than a fifth preset threshold, in the process of identifying the information to be identified, the first information in the information to be identified as the second word determined by the determining module 102.
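The counting module 101 and the determining module 102 can be sketched as per-word tallies of user selections. The class and method names below are hypothetical, and the threshold value is supplied by the caller because the description gives none.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional


class SelectionCounter:
    """Tallies, per first word, how often the user selected each second word."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # the preset threshold for direct recognition
        self.counts: DefaultDict[str, Counter] = defaultdict(Counter)

    def record_selection(self, first_word: str, selected: str) -> None:
        """Counting module: one more user selection of `selected`."""
        self.counts[first_word][selected] += 1

    def preferred(self, first_word: str) -> Optional[str]:
        """Determining module: the most-selected second word, returned only
        once its count is strictly greater than the threshold."""
        per_word = self.counts.get(first_word)
        if not per_word:
            return None
        word, times = per_word.most_common(1)[0]
        return word if times > self.threshold else None
```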

In the foregoing embodiment, preferably, on the basis of the embodiment shown in fig. 3, a schematic diagram of another structure of the information processing apparatus provided in the embodiment of the present application is shown in fig. 11, and may further include:

the judging module 111 is configured to judge, when no word similar to the first word exists in the thesaurus, whether the first word is a word selected according to a selection instruction triggered by a user;

the second recording module 112 is configured to record, when the first word is a word selected according to a selection instruction triggered by a user, the number of times the first word has been selected by the user;

and the adding module 113 is configured to add the first word to the thesaurus when the number of times the first word has been selected by the user is greater than a second preset threshold.
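The three modules above effectively let the thesaurus grow from user behaviour. A brief sketch, with the hypothetical name `ThesaurusGrower` and a caller-supplied threshold, might look like this:

```python
from collections import Counter
from typing import Set


class ThesaurusGrower:
    """Adds a word to the thesaurus once the user has selected it often enough."""

    def __init__(self, thesaurus: Set[str], second_threshold: int) -> None:
        self.thesaurus = thesaurus
        self.second_threshold = second_threshold  # the "second preset threshold"
        self.selected: Counter = Counter()

    def record_user_selected(self, first_word: str) -> bool:
        """Call when first_word has no similar thesaurus entry and was selected
        by the user. Returns True if the word is now in the thesaurus."""
        self.selected[first_word] += 1
        if self.selected[first_word] > self.second_threshold:
            self.thesaurus.add(first_word)
            return True
        return False
```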

An embodiment of the present application further provides an electronic device, which has the information processing apparatus described in any of the apparatus embodiments above.

The electronic device may take various forms, such as a mobile phone, a handheld computer, a tablet computer, or a PC.

A specific implementation of the embodiments of the present application is illustrated below.

Suppose a PPT electronic document is used in a meeting, and the text in the PPT electronic document is: "Beijing governs the 'urban disease' of traffic congestion: in recent years, Beijing has actively managed urban traffic congestion. It has adhered to the bus priority development strategy: in ten years, rail transit has grown from 4 lines totaling 114 kilometers to 17 lines totaling 465 kilometers, the bus trip proportion has risen from 28% to 46%, and rail transit ranks first among the major cities. It has implemented traffic demand-side management and restrained the overly fast growth in motor vehicles. It has promoted scientific and technological innovation and improved traffic operating efficiency."

Keywords are extracted from the electronic text, with the following result:

governing, traffic, congestion, city, public transit, trip, management, operation, route, strategy, development, propulsion.

The keywords are then expanded; the expanded words include synonyms and words with high relevancy. In this example, the extracted keywords are expanded as follows:

administering, treating, organizing, grooming, transporting, public transportation, subway, taxi, motor vehicle, road, street, road, line, congestion, peak, slow, city, urban, public transportation, trip, operation, line, planning, strategy, tactical, development, promotion, propulsion.

The expanded words form the thesaurus.
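The construction of the thesaurus from keywords and their expansions can be sketched as a simple set union. The description does not say how keywords are extracted or which expanded word belongs to which keyword, so the `expansions` table below, including its grouping, is a hypothetical illustration that merely reuses words from the lists above.

```python
from typing import Dict, Iterable, List, Set


def build_thesaurus(keywords: Iterable[str], expansions: Dict[str, List[str]]) -> Set[str]:
    """Union of the keywords and their synonyms or associated words."""
    thesaurus: Set[str] = set()
    for keyword in keywords:
        thesaurus.add(keyword)
        thesaurus.update(expansions.get(keyword, []))
    return thesaurus


keywords = ["governing", "traffic", "congestion"]

# Assumed grouping of expanded words under keywords (not given in the text).
expansions = {
    "governing": ["administering", "treating", "organizing"],
    "traffic": ["transporting", "public transportation", "subway", "taxi", "motor vehicle"],
    "congestion": ["peak", "slow"],
}

thesaurus = build_thesaurus(keywords, expansions)
```

A set is used here because the later steps only need membership tests and similarity scans; any container supporting both would do.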

Generally, the meeting summary needs to be compiled after the meeting. To improve efficiency, handwritten characters can be recognized through OCR and the speakers' voices through speech recognition; during recognition, the recognized words can be corrected against the thesaurus to improve recognition accuracy. For example,

for the OCR recognition results:

if the word obtained by OCR recognition is 'treatment', and the word 'treatment' also exists in the thesaurus, then 'treatment' is correct;

if the word obtained by OCR recognition is 'metallurgical theory', and the thesaurus does not contain 'metallurgical theory' but does contain 'treatment', whose character shape is similar to that of 'metallurgical theory', then 'metallurgical theory' is replaced by 'treatment';

similarly, if the word obtained by OCR recognition is 'subway', and the word 'subway' also exists in the thesaurus, then 'subway' is correct;

if the word obtained by OCR recognition is 'altar rank', and the thesaurus does not contain 'altar rank' but does contain 'subway', whose character shape is similar to that of 'altar rank', then 'altar rank' is replaced by 'subway'.

For the speech recognition result:

if the word obtained by speech recognition is 'treatment', and the word 'treatment' also exists in the thesaurus, then 'treatment' is correct;

if the word obtained by speech recognition is 'intelligence', and the thesaurus does not contain 'intelligence' but does contain 'treatment', whose pronunciation is similar to that of 'intelligence', then 'intelligence' is replaced by 'treatment';

similarly, if the word obtained by speech recognition is 'trip', and the word 'trip' also exists in the thesaurus, then 'trip' is correct;

if the word obtained by speech recognition is 'rudiment', and the thesaurus does not contain 'rudiment' but does contain 'trip', whose pronunciation is similar to that of 'rudiment', then 'rudiment' is replaced by 'trip';

further, if after OCR recognition 'metallurgical theory' has been replaced by 'treatment' more than 5 times, then in subsequent OCR recognition the first information that would be recognized as 'metallurgical theory' can be directly recognized as 'treatment'; similarly, if after speech recognition 'rudiment' has been replaced by 'trip' more than 5 times, then in subsequent speech recognition the speech segment that would be recognized as 'rudiment' can be directly recognized as 'trip', thereby improving recognition accuracy.
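The OCR and speech cases of this worked example follow the same three-way decision: keep the word if it is in the thesaurus, replace it if a similar thesaurus word exists, otherwise leave it unchanged. The sketch below treats the similar pairs as given lookup tables rather than computing similarity, which is an assumption made purely to keep the example self-contained; `correct`, `SHAPE_SIMILAR`, and `SOUND_SIMILAR` are hypothetical names.

```python
from typing import Dict, Set

THESAURUS: Set[str] = {"treatment", "subway", "trip"}

# Pairs the example treats as similar; they stand in for a real
# character-shape or pronunciation comparison.
SHAPE_SIMILAR: Dict[str, str] = {
    "metallurgical theory": "treatment",
    "altar rank": "subway",
}
SOUND_SIMILAR: Dict[str, str] = {
    "intelligence": "treatment",
    "rudiment": "trip",
}


def correct(first_word: str, similar: Dict[str, str]) -> str:
    """Apply the thesaurus correction to one recognized word."""
    if first_word in THESAURUS:
        return first_word  # recognized word is already correct
    second_word = similar.get(first_word)
    if second_word in THESAURUS:
        return second_word  # replace with the similar thesaurus word
    return first_word  # no similar word: leave unchanged


ocr_result = [correct(w, SHAPE_SIMILAR) for w in ["metallurgical theory", "subway", "altar rank"]]
speech_result = [correct(w, SOUND_SIMILAR) for w in ["intelligence", "trip", "rudiment"]]
```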

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one logical division, and other divisions are possible in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An information processing method, applied to an electronic device, characterized in that: electronic text content associated with information to be identified, which is to be identified as electronic text, is acquired; keywords are extracted from the acquired electronic text content; synonyms of the extracted keywords and words associated with the extracted keywords are obtained; the extracted keywords, the synonyms of the keywords, and the words associated with the extracted keywords constitute a thesaurus; the electronic text content is text that has been manually corrected; and the words associated with the extracted keywords include lower-order words of the extracted keywords or words having a high degree of correlation with the extracted keywords;

the method comprises the following steps:

2. The method according to claim 1, wherein the first word obtained by identifying the first information in the information to be identified comprises:

3. The method according to claim 1, wherein the information to be recognized is speech; said searching for the presence of a second word in said thesaurus that is similar to said first word comprises:

4. The method of any of claims 1-3, wherein after replacing the first word with the second word, further comprising:

recording the number of times the first word is replaced by the second word.

5. The method according to claim 4, wherein when the number of times of replacement of the first word by the second word is greater than a first preset threshold, in the process of identifying the information to be identified, the first information in the information to be identified is identified as the second word.

6. The method of any one of claims 1-3, further comprising:

the replacing the first word with the second word comprises:

7. The method of any one of claims 1-3, further comprising:

8. An information processing apparatus applied to an electronic device, wherein the electronic device has access to a thesaurus, and the thesaurus includes: keywords extracted from obtained electronic text content, synonyms of the extracted keywords, and words associated with the extracted keywords; wherein the obtained electronic text content is electronic text content associated with information to be identified as text; the electronic text content is text that has been manually corrected; the words associated with the extracted keywords include lower-order words of the extracted keywords or words having a high degree of correlation with the extracted keywords; the device comprises:

9. The apparatus of claim 8, wherein the identification module comprises:

the searching module comprises:

10. The apparatus of claim 8, wherein the information to be recognized is speech;

the identification module comprises:

the searching module comprises:

11. The apparatus of any one of claims 8-10, further comprising:

12. The apparatus according to claim 11, wherein the recognition module is further configured to, when the number of times that the first word is replaced by the second word is greater than a first preset threshold, recognize, as the second word, the first information in the information to be recognized in the process of recognizing the information to be recognized.

13. The apparatus of any one of claims 8-10, further comprising:

the replacing module is specifically configured to replace the first word with the second word selected according to the selection instruction triggered by the user.

14. The apparatus of any one of claims 8-10, further comprising:

15. An electronic device characterized by comprising the information processing apparatus according to any one of claims 8 to 14.