Disclosure of Invention
The embodiments of the invention aim to provide a method and a device for displaying character encoding recognition results, so that the number of user operations is reduced and the user can browse the recognition results obtained in different character encoding formats more conveniently.
In order to achieve the above purpose, the embodiments of the invention disclose a method and a device for displaying character encoding recognition results, with the following technical scheme:
A method for displaying character encoding recognition results comprises the following steps:
selecting a first character subset of all character contents of a target document;
recognizing the selected first character subset using a plurality of character encoding formats to obtain a plurality of recognition results, wherein each recognition result is obtained by recognition in one character encoding format;
and displaying each obtained recognition result in a first display area of the target document.
Preferably, the method for displaying the character encoding recognition result further includes:
receiving a selection instruction of a user for a first recognition result;
recognizing all character contents in the target document using the character encoding format corresponding to the first recognition result selected by the user, to obtain an overall recognition result;
and displaying the overall recognition result in a text display area of the target document.
Preferably, the selecting a first character subset of the whole character content of the target document includes:
selecting a first number of characters as the first character subset, starting with the first character in the entire content of the target document.
Preferably, the displaying the obtained recognition results in the first display area of the target document includes:
sorting and displaying the obtained recognition results in the first display area of the target document according to the order of the names of the character encoding formats corresponding to the recognition results; or,
sorting and displaying the obtained recognition results in the first display area of the target document according to the general degree of the character encoding formats corresponding to the recognition results.
Preferably, the displaying the obtained recognition results in the first display area of the target document includes:
and displaying each obtained recognition result and the character coding format name corresponding to each recognition result in a first display area of the target document.
A display device of character code recognition results, comprising:
the content subset selecting unit is used for selecting a first character subset of all character contents of the target document;
the recognition unit is used for recognizing the selected first character subset by adopting a plurality of character encoding formats to obtain a plurality of recognition results, wherein each recognition result is obtained by recognition in one character encoding format;
and the display unit is used for displaying each obtained recognition result in the first display area of the target document.
Preferably, the display device for character encoding recognition result further includes:
a user selection instruction receiving unit, configured to receive a selection instruction of a user for a first recognition result after the display unit displays each obtained recognition result in a first display area of the target document;
the whole content identification unit is used for recognizing all character contents in the target document by using the character encoding format corresponding to the first recognition result selected by the user, to obtain a whole recognition result;
and the whole content display unit is used for displaying the whole recognition result in a text display area of the target document.
Preferably, the content subset selecting unit is specifically configured to:
select a first number of characters as the first character subset, starting with the first character in the entire content of the target document.
Preferably, the display unit comprises a first display subunit and/or a second display subunit,
the first display subunit is configured to, according to the order of the names of the character encoding formats corresponding to the recognition results, perform sorting display on the obtained recognition results in the first display area of the target document;
and the second display subunit is used for sorting and displaying the obtained recognition results in the first display area of the target document according to the general degree of the character encoding formats corresponding to the recognition results.
Preferably, the display unit is specifically configured to:
and displaying each obtained recognition result and the character coding format name corresponding to each recognition result in a first display area of the target document.
The method and the device for displaying character encoding recognition results provided by the embodiments of the invention can, after obtaining the recognition results for a plurality of different character encoding formats, display all of them to the user at one time in a specific display area of the target document. Unlike the prior art, the invention does not require the user to repeat operations, so it makes it convenient for the user to check and compare the recognition results of different character encoding formats, reduces user operations and saves the user's time.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a method for displaying a character encoding recognition result according to an embodiment of the present invention, which may include the following steps:
S101, selecting a first character subset of all character contents of a target document;
To determine whether a character encoding format is the correct one, it is usually sufficient to recognize only a portion of the content of the target document. If the correct character encoding format is selected, the recognized content will not contain garbled characters; if the selected character encoding format is incorrect, the recognized result will consist essentially of garbled characters. Therefore, to reduce the amount of computation and improve efficiency, this step generally selects a part of the content of the target document, which is recognized in the subsequent step; only when the target document is very short may the entire content be selected as the first character subset.
One preferred specific implementation of this step is as follows: a first number of characters is selected as the first character subset, starting with the first character in the entire content of the target document.
A portion of the content of the document is typically selected by starting with the first character at the beginning of the document and taking a certain number of characters, such as 100. The number of characters selected should not be too large, for two reasons: a smaller selection reduces the amount of computation and improves efficiency, and, if too much content is selected, each recognition result becomes very long, so that when dozens of recognition results are displayed together the whole display becomes very long, which makes it hard for the user to browse and select.
Occasionally, several similar character encoding formats may be encountered. Because they contain substantially the same character sets, when one of them is able to recognize most or all of the characters in the first character subset, the other similar formats may be able to do so as well. To help the user handle this situation, the user may be provided with an instruction button that quickly increases the value of the first number, so that the user can make the best choice among the similar character encoding formats.
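The selection in S101 and the button that increases the first number can be sketched in Python as follows. This is only an illustration under stated assumptions: because the encoding is not yet known, the sketch counts bytes rather than characters, so the last character of the sample may be cut in the middle; the function names `select_first_subset` and `increase_first_number` and the step size of 100 are hypothetical.

```python
def select_first_subset(raw: bytes, first_number: int = 100) -> bytes:
    """Return the first `first_number` bytes of the document.

    Assumption: the document is handled as raw bytes, since its character
    encoding format has not been determined yet. A document shorter than
    `first_number` is returned in full.
    """
    if first_number <= 0:
        raise ValueError("first_number must be positive")
    return raw[:first_number]


def increase_first_number(first_number: int, step: int = 100) -> int:
    """Model the instruction button: enlarge the sample by a fixed step."""
    return first_number + step


# Example: a 400-byte document, sampled with the default size and then
# with the enlarged size after the button is pressed once.
document = "编码识别".encode("gbk") * 50
sample = select_first_subset(document)
larger_sample = select_first_subset(document, increase_first_number(100))
```

A fixed step is the simplest choice; doubling the sample size on each press would serve equally well.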
S102, recognizing the selected first character subset by adopting a plurality of character coding formats to obtain a plurality of recognition results, wherein each recognition result is obtained by recognizing the character in one character coding format;
After part of the content of the target document has been selected in S101, the selected content is recognized using each of the candidate character encoding formats. Because of the many languages and encoding methods in existence, hundreds of common encoding formats exist. One recognition result is obtained with each character encoding format, and the recognition results obtained with most character encoding formats are, of course, garbled. The recognition results are displayed to the user in the form of consecutive recognition result blocks, so that the user can quickly view and select among them.
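Step S102 can be illustrated with a minimal Python sketch. It assumes that recognizing a byte sequence with a character encoding format corresponds to decoding the bytes with that codec, and that bytes which cannot be decoded are shown as the replacement character U+FFFD. The function name `recognize_subset` and the short candidate list are hypothetical choices for illustration, not part of the embodiment.

```python
# Hypothetical candidate list; a real product would support many more formats.
CANDIDATE_ENCODINGS = ["gb18030", "big5", "shift_jis", "cp1256"]


def recognize_subset(subset: bytes, encodings=CANDIDATE_ENCODINGS) -> dict:
    """Decode the same bytes once per candidate encoding.

    Returns a mapping from encoding name to recognition result. With
    errors="replace", undecodable bytes become U+FFFD instead of raising,
    so every candidate yields exactly one result.
    """
    results = {}
    for name in encodings:
        results[name] = subset.decode(name, errors="replace")
    return results


# Example: bytes that were actually written in a Simplified Chinese encoding.
sample = "简体中文".encode("gb18030")
results = recognize_subset(sample)
```

Only the result stored under the encoding that was really used reproduces the original text; the other entries contain garbled characters.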
S103, displaying each obtained recognition result in a first display area of the target document.
One recognition result is obtained for each character encoding format, so many recognition results, possibly several tens, are obtained through step S102. These recognition results are displayed together, consecutively and in a mutually distinguishable manner, in an area different from the text display area of the target document, for the user to select from. The user can manually select the recognition result that contains no garbled characters or the fewest. The centralized display may take the form of a window with a scroll bar at its side, so that the user can conveniently browse the consecutively arranged recognition results by mouse or keyboard operation. Moreover, two consecutively displayed recognition results may use different display colors, with a gap and/or boundary line between them, so that the user can still clearly distinguish the different recognition result display blocks while scrolling.
Moreover, a specific implementation of S103 in the method embodiment may be: displaying each obtained recognition result and the character encoding format name corresponding to each recognition result in the first display area of the target document.
Further, this approach may specifically be: displaying the character encoding format name corresponding to each recognition result in bold above the display block of that recognition result. In this way, when browsing each recognition result, the user immediately knows which character encoding format produced it and, on finding an ideal recognition result (one containing no or essentially no garbled characters), immediately knows the correct encoding format of the target document.
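The labelled display blocks described above can be sketched as plain text. The sketch below is an assumption-laden stand-in for the window of the embodiment: a bracketed heading replaces the bold format name, and a line of dashes replaces the boundary line between blocks. The function name `render_results` is hypothetical.

```python
def render_results(results: dict) -> str:
    """Render one block per recognition result, headed by its format name."""
    separator = "\n" + "-" * 40 + "\n"  # stands in for the boundary line
    blocks = []
    for name, text in results.items():
        # The heading line stands in for the bold character encoding name.
        blocks.append(f"[{name}]\n{text}")
    return separator.join(blocks)


# Example with two already-decoded results.
print(render_results({"gb18030": "abc", "big5": "xyz"}))
```

A graphical implementation would map each block to a widget inside a scrollable container, but the ordering and labelling logic would stay the same.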
Once the user considers that an ideal or relatively ideal recognition result has been found among the plurality of recognition results, the method embodiment may, on the basis of the embodiment shown in fig. 1, further include:
receiving a selection instruction of a user for a first recognition result;
recognizing all character contents in the target document using the character encoding format corresponding to the first recognition result selected by the user, to obtain an overall recognition result;
and displaying the overall recognition result in a text display area of the target document.
In this implementation, the user can click a recognition result, or the character encoding format name displayed with it, while browsing the recognition results. The method embodiment receives the user's click instruction, recognizes the entire content of the target document using the character encoding format corresponding to the instruction, and displays the result as the overall recognition result in the text display area of the target document.
The user can then browse the overall recognition result of the target document. If the user is satisfied with it, the display window of the recognition results can be closed, and recognition of the target document is finished. If the user is still not satisfied with the overall recognition result, the user can continue to view the recognition results in the display window until the correct character encoding format is selected and an ideal overall recognition result is obtained.
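The selection step can be sketched as a small handler. The sketch assumes that the click is reported as an index into the list of displayed format names and that recognition again means decoding with the chosen codec; the name `on_result_selected` and this calling convention are hypothetical.

```python
def on_result_selected(raw: bytes, displayed_encodings: list, clicked_index: int) -> str:
    """Decode the whole document with the encoding of the clicked result.

    `displayed_encodings` lists the format names in their display order;
    `clicked_index` identifies the block the user clicked.
    """
    selected = displayed_encodings[clicked_index]
    # The returned text is what would be shown in the text display area.
    return raw.decode(selected, errors="replace")


# Example: the user clicks the second block, which is the correct format.
raw_document = "日本語のテキスト".encode("shift_jis")
overall = on_result_selected(raw_document, ["gb18030", "shift_jis"], 1)
```

If the overall result is unsatisfactory, calling the handler again with a different index models the user returning to the display window and choosing another result.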
Since the number of character encoding formats is large, the recognition results may be displayed in an appropriate order so that the display conforms to the habits of general users. For convenience, only four character encoding formats, namely "Simplified Chinese", "Arabic", "Japanese" and "Traditional Chinese", are used here to illustrate the arrangement and display of the recognition results in different character encoding formats.
In the embodiment shown in fig. 1, S103 may specifically include:
sorting and displaying the obtained recognition results in the first display area of the target document according to the order of the names of the character encoding formats corresponding to the recognition results.
Because the names of the four character encoding formats, ordered by Chinese Pinyin, are "Arabic", "Traditional Chinese", "Simplified Chinese" and "Japanese", the display order of the recognition results should be:
first the recognition result obtained with "Arabic"; then the recognition result obtained with "Traditional Chinese"; then the recognition result obtained with "Simplified Chinese"; and finally the recognition result obtained with "Japanese".
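The name-based ordering can be sketched as a sort with an explicit key. Python's default string ordering does not follow Pinyin, so this sketch assumes a small hand-written table of unaccented Pinyin spellings for the four example names; the table `PINYIN_KEY` and the function `sort_by_format_name` are hypothetical, and a real implementation would more likely rely on a locale-aware collation library.

```python
# Assumed Pinyin spellings of the Chinese format names (tones omitted).
PINYIN_KEY = {
    "Arabic": "alaboyu",
    "Traditional Chinese": "fantizhongwen",
    "Simplified Chinese": "jiantizhongwen",
    "Japanese": "riyu",
}


def sort_by_format_name(names: list) -> list:
    """Order format names by their Pinyin key.

    Names missing from the table fall back to their lowercase spelling.
    """
    return sorted(names, key=lambda name: PINYIN_KEY.get(name, name.lower()))


ordered = sort_by_format_name(
    ["Simplified Chinese", "Arabic", "Japanese", "Traditional Chinese"]
)
# ordered == ["Arabic", "Traditional Chinese", "Simplified Chinese", "Japanese"]
```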
Alternatively, to help the user view the results and make an ideal selection more efficiently, the method embodiment may display the recognition results in a preferred order. For example, S103 in the embodiment shown in fig. 1 may specifically include:
sorting and displaying the obtained recognition results in the first display area of the target document according to the general degree of the character encoding formats corresponding to the recognition results.
The general degree of a character encoding format, that is, how widely the format is used, may be determined by various methods, such as according to the application area of the product, according to the usage rate of the character encoding format in various products, or according to the recognition rate of the character encoding format observed during the execution of step S102. Since these determination methods are well known in the art, they will not be described in detail here.
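As one possible reading of the recognition-rate criterion, the sketch below scores a format by the share of decoded characters that are not the replacement character U+FFFD. This is an assumption made for illustration: the source does not define the recognition rate, and a wrong format can decode without any error while still producing the wrong characters, so the score is only a crude proxy. The function name `recognition_rate` is hypothetical.

```python
def recognition_rate(subset: bytes, encoding: str) -> float:
    """Fraction of decoded characters that are not U+FFFD (assumed metric)."""
    text = subset.decode(encoding, errors="replace")
    if not text:
        return 0.0
    undecodable = text.count("\ufffd")
    return 1.0 - undecodable / len(text)


sample = "简体中文".encode("gb18030")
rate_correct = recognition_rate(sample, "gb18030")  # every character decodes
rate_ascii = recognition_rate(sample, "ascii")      # every byte is replaced
```

Formats could then be displayed in descending order of this score, with ties broken by one of the other criteria.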
Taking determination according to the application area of the product as an example, if the application area of the method embodiment is mainland China, the display order of the recognition results may be:
first the recognition result obtained with "Simplified Chinese"; then the recognition result obtained with "Traditional Chinese"; and then the recognition results obtained with "Arabic" or "Japanese".
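The region-based ordering in this example can be sketched with a priority table. The table `REGION_PRIORITY`, the region key `"mainland_china"` and the function `sort_by_generality` are hypothetical; the sketch also assumes that formats without a listed priority keep their original relative order after the prioritized ones.

```python
# Assumed priority table: formats listed first are treated as more general
# for products used in that region.
REGION_PRIORITY = {
    "mainland_china": ["Simplified Chinese", "Traditional Chinese"],
}


def sort_by_generality(names: list, region: str) -> list:
    """Put the region's preferred formats first, in priority order."""
    priority = REGION_PRIORITY.get(region, [])
    rank = {name: position for position, name in enumerate(priority)}
    # sorted() is stable, so unlisted formats keep their incoming order.
    return sorted(names, key=lambda name: rank.get(name, len(priority)))


ordered = sort_by_generality(
    ["Arabic", "Traditional Chinese", "Japanese", "Simplified Chinese"],
    "mainland_china",
)
# ordered == ["Simplified Chinese", "Traditional Chinese", "Arabic", "Japanese"]
```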
Moreover, to make it easier for the user to compare several recognition results that contain few garbled characters and choose the best among them, in any of the above display modes the user may be given the ability to change the display position of any recognition result, for example by dragging.
For example, suppose that while browsing the user finds that the recognition results for "Simplified Chinese" and "Traditional Chinese" both contain few garbled characters and wants to compare them, but their default display positions are far apart. The user can then drag the "Traditional Chinese" recognition result to the position immediately after the "Simplified Chinese" result by mouse or keyboard operation. Combined with the previously described function of increasing the number of characters included in each recognition result, this function facilitates the user's viewing and selection.
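The effect of the drag operation on the display order can be sketched as a list manipulation. The function name `move_result_after` is hypothetical, and the sketch models only the resulting order, not the mouse or keyboard handling.

```python
def move_result_after(order: list, dragged: str, target: str) -> list:
    """Return a new display order with `dragged` placed right after `target`."""
    if dragged == target:
        return list(order)
    # Remove the dragged block, then reinsert it just behind the target.
    new_order = [name for name in order if name != dragged]
    new_order.insert(new_order.index(target) + 1, dragged)
    return new_order


default_order = ["Simplified Chinese", "Arabic", "Japanese", "Traditional Chinese"]
new_order = move_result_after(default_order, "Traditional Chinese", "Simplified Chinese")
# new_order == ["Simplified Chinese", "Traditional Chinese", "Arabic", "Japanese"]
```

Returning a new list rather than modifying the default order in place makes it easy to restore the original arrangement later.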
As can be seen from the embodiment shown in fig. 1, the embodiment of the present invention selects a part of the content of a document to be recognized, recognizes the selected part in all supported character encoding formats, and displays all recognition results consecutively, in a certain order, in a certain display area of the document to be recognized. Because the user does not need to perform multiple selection operations as in the existing approach, the embodiment of the invention makes it convenient for the user to check and compare the recognition results of different character encoding formats, reduces user operations and saves the user's time.
Corresponding to the above method embodiment, the invention also provides a display device of character code recognition result.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a display device for character encoding recognition results according to an embodiment of the present invention. The apparatus may include: a content subset selecting unit 201, a recognition unit 202 and a display unit 203.
The content subset selecting unit 201 is configured to select a first character subset of all character contents of the target document;
To determine whether a character encoding format is the correct one, it is usually sufficient to recognize only a portion of the content of the target document. Generally, part of the content of the target document is selected first and recognized in a subsequent step; only when the target document is very short is the entire content selected as the first character subset.
In a preferred embodiment, the content subset selecting unit 201 may be specifically configured to select a first number of characters as the first character subset, starting with the first character in the entire content of the target document.
A portion of the content of the document is typically selected by starting with the first character at the beginning of the document and taking a certain number of characters, such as 100. The number of characters selected should not be too large, for two reasons: a smaller selection reduces the amount of computation and improves efficiency, and, if too much content is selected, each recognition result becomes very long, so that when dozens of recognition results are displayed together the whole display becomes very long, which makes it hard for the user to browse and select.
Several similar character encoding formats may produce nearly identical recognition results for the first character subset. To help the user handle this situation, the user may be provided with an instruction button that quickly increases the length of the selected content, so that the user can make the best choice among the similar character encoding formats.
The recognition unit 202 is configured to recognize the selected first character subset by using a plurality of character encoding formats, and obtain a plurality of recognition results, where each recognition result is obtained by recognition using one of the character encoding formats;
The display unit 203 is configured to display each obtained recognition result in a first display area of the target document.
The display may take the form of a window with a scroll bar at its side, so that the user can conveniently browse the consecutively arranged recognition results by mouse or keyboard operation. Moreover, two consecutively displayed recognition results may use different display colors, with a gap and/or boundary line between them, so that the user can still clearly distinguish the different recognition result display blocks while scrolling.
The display unit 203 in the embodiment shown in fig. 2 may be specifically configured to display each obtained recognition result and the character encoding format name corresponding to each recognition result in the first display area of the target document.
One such implementation may be: displaying the character encoding format name corresponding to each recognition result in bold above the display block of that recognition result. In this way, when browsing each recognition result, the user immediately knows which character encoding format produced it and, on finding an ideal recognition result (one containing no or essentially no garbled characters), immediately knows the correct encoding format of the target document.
Once the user considers that an ideal or relatively ideal recognition result has been found among the recognition results, the embodiment may further include:
a user selection instruction receiving unit, configured to receive a selection instruction of a user for a first recognition result after the display unit 203 displays each obtained recognition result in the first display area of the target document;
the whole content identification unit is used for recognizing all character contents in the target document by using the character encoding format corresponding to the first recognition result selected by the user, to obtain a whole recognition result;
and the whole content display unit is used for displaying the whole recognition result in a text display area of the target document.
The user can click a recognition result, or the character encoding format name displayed with it, while browsing the recognition results. The embodiment shown in fig. 2 then receives the user's click instruction, recognizes the entire content of the target document using the character encoding format corresponding to the instruction, and displays the result as the whole recognition result in the text display area of the target document.
The user can then browse the whole recognition result of the target document. If the user is satisfied with it, the display window of the recognition results can be closed, and recognition of the target document is finished. If the user is still not satisfied with the whole recognition result, the user can continue to view the recognition results in the display window until the correct character encoding format is selected and an ideal whole recognition result is obtained.
Since the number of character encoding formats is large, the recognition results may be displayed in an appropriate order in order to make the display of the recognition results conform to the general user's habits.
The display unit 203 may include a first display subunit and/or a second display subunit,
the first display subunit is configured to, according to the order of the names of the character encoding formats corresponding to the recognition results, perform sorting display on the obtained recognition results in the first display area of the target document;
and the second display subunit is used for sorting and displaying the obtained recognition results in the first display area of the target document according to the general degree of the character encoding formats corresponding to the recognition results.
Moreover, to make it easier for the user to compare several recognition results that contain few garbled characters and choose the best among them, in any of the above display modes the user may be given the ability to change the display position of any recognition result, for example by dragging. Combined with the previously described function of increasing the number of characters included in each recognition result, this function facilitates the user's viewing and selection.
As can be seen from the embodiment shown in fig. 2, the embodiment of the present invention selects a part of the content of a document to be recognized, recognizes the selected part in all supported character encoding formats, and displays all recognition results consecutively, in a certain order, in a certain display area of the document to be recognized. Because the user does not need to perform multiple selection operations as in the existing approach, the embodiment of the invention makes it convenient for the user to check and compare the recognition results of different character encoding formats, reduces user operations and saves the user's time.
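The cooperation of the three units of fig. 2 can be sketched as three small classes wired together. This is a structural illustration only: the class and method names are hypothetical, the sample is taken in bytes because the encoding is not yet known, recognition is modelled as codec decoding, and the display unit returns labelled text lines instead of drawing a window.

```python
class ContentSubsetSelectingUnit:
    """Stands in for unit 201: picks the first bytes of the document."""

    def __init__(self, first_number: int = 100) -> None:
        self.first_number = first_number

    def select(self, raw: bytes) -> bytes:
        return raw[: self.first_number]


class RecognitionUnit:
    """Stands in for unit 202: decodes the sample once per format."""

    def __init__(self, encodings: list) -> None:
        self.encodings = encodings

    def recognize(self, subset: bytes) -> dict:
        return {e: subset.decode(e, errors="replace") for e in self.encodings}


class DisplayUnit:
    """Stands in for unit 203: produces one labelled line per result."""

    def display(self, results: dict) -> list:
        return [f"[{name}] {text}" for name, text in results.items()]


class DisplayDevice:
    """Wires the three units together in the order of the method steps."""

    def __init__(self, selector, recognizer, display_unit) -> None:
        self.selector = selector
        self.recognizer = recognizer
        self.display_unit = display_unit

    def run(self, raw: bytes) -> list:
        subset = self.selector.select(raw)
        results = self.recognizer.recognize(subset)
        return self.display_unit.display(results)


device = DisplayDevice(
    ContentSubsetSelectingUnit(4), RecognitionUnit(["gb18030", "ascii"]), DisplayUnit()
)
blocks = device.run("中文文本".encode("gb18030"))
```

Keeping the units separate mirrors the figure and lets a different display strategy, such as one of the sorted variants, be substituted without touching selection or recognition.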
It is clear to those skilled in the art that for convenience and brevity of description, the specific working procedures of the above-described apparatus and modules may be described with reference to the corresponding procedures in the foregoing method embodiments, so that in the description of the apparatus, the description is simplified, and only some important technical points are described.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those skilled in the art will appreciate that all or part of the steps in the above method embodiments may be implemented by a program to instruct relevant hardware to perform the steps, and the program may be stored in a computer-readable storage medium, which is referred to herein as a storage medium, such as: ROM/RAM, magnetic disk, optical disk, etc.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.