WO2015136692A1

WO2015136692A1 - Digital image document editing system

Info

Publication number: WO2015136692A1
Application number: PCT/JP2014/056927
Authority: WO
Inventors: 久雄間瀬; 義行小林; 新庄　広; 竜治嶺; 高橋　寿一
Original assignee: Hitachi, Ltd. (株式会社日立製作所)
Priority date: 2014-03-14
Filing date: 2014-03-14
Publication date: 2015-09-17
Also published as: JPWO2015136692A1

Abstract

Provided is a digital image document editing system, which: accepts an input of a digital image document; recognizes, within the inputted digital image document, a text character string formed from one or more text characters of a plurality of categories of text characters; and, if the recognized text character string satisfies a text character string determination rule, determines that the recognized text character string is a text character string to be edited. The text character string determination rules include at least one determination rule among: a first determination rule wherein the recognized text character string is formed from a number of text characters equal to or greater than a first threshold (which is an integer greater than one); a second determination rule wherein the recognized text character string includes a partial text character string having a number of text characters belonging to a first group of categories, which are a portion of the abovementioned plurality of categories, equal to or greater than a second threshold (which is an integer greater than one); a third determination rule wherein the recognized text character string includes a text character that belongs to a second group of categories, which are a portion of the plurality of categories; and a fourth determination rule wherein the recognized text character string includes a content word.

Description

Electronic image document editing system

The present invention relates to an electronic image document editing system.

In order for multiple people to collaborate and perform one task quickly and accurately, it is desirable to share an edited document containing information necessary for the task. Editing a document includes creating a new document, updating an existing document (addition, correction, deletion, etc.), proofreading character information in the document, and translating the character information.

The recent development of the information society has made it easy for anyone to edit an electronic document in which the character information is encoded. However, many documents still contain character information that is not encoded, such as electronic image documents obtained by scanning paper documents and electronic image documents obtained by saving electronic documents as image data.

For example, in the design and development of products with long life cycles, such as electric power and electrical products, the design documents of past products often survive only as electronic image documents. When part of such a design document is to be changed, or when the characters in it are to be translated so that design information can be shared with an overseas design department, character strings must first be recognized from the design document (an electronic image document), and the recognized character strings must then be edited.

As background art in this field, there is JP-A-9-223147 (Patent Document 1). According to its abstract, “when recognition processing is set, original image data read by the scanner unit 1 is input to the recognition processing unit 102 via the image processing unit 3, and character recognition is performed. For a predetermined number of lines, the number of recognized words for which a translated word was found (the number of hits) out of the total number of recognized words is stored. Drawing of the line of interest is stopped, thereby suppressing extra lines, when the number of hits of the line of interest and the preceding and following lines is equal to or less than a predetermined number, or when, after determining whether the character codes of each recognized word match a certain pattern such as non-character codes or the same code repeated a predetermined number of times, the ratio of pattern-matching words to the total number of words recognized in one line is equal to or greater than a predetermined value.”

JP-A-9-223147

In an electronic image document in which characters are written continuously in one place, such as the body text of a patent specification or a paper, the user can easily and accurately grasp where the characters are and how many there are. In a design drawing, however, character information is scattered across the document and mixed with figures (non-character information), so the locations of the character information are hard to identify visually. In such a case, it is difficult for the user to grasp easily and accurately where character information is described and how much of it there is.

In order for a system for editing an electronic image document (hereinafter referred to as an electronic image document editing system) to edit character information in the electronic image document, it is necessary to first perform character recognition processing. That is, the electronic image document editing system needs to specify a region in which character information is described in the electronic image document, and to perform processing for specifying the character content described in the region.

If character recognition were 100% accurate, the electronic image document editing system could accurately identify where character information is described in the electronic image document and how much of it there is. In practice, however, the character recognition process produces misrecognitions caused by the quality of the scanned paper, the resolution of the electronic image, the font type and size of the characters, and the like.

Misrecognition takes three forms: character information recognized as non-character information (omission), non-character information recognized as character information (noise), and character information recognized as character information but with its content recognized incorrectly (recognition error). When an electronic image document editing system performs character recognition on a document such as a design drawing, in which figures (non-character information) and character information are mixed, the recognition results often contain noise character strings produced by misrecognizing fragments of the drawing as characters.

The technique described in Patent Document 1 applies predetermined rules to a recognized character string obtained by character recognition processing and to its translation result (translated word search result), and when a rule matches, it does not output the recognized character string or its translation result. Patent Document 1 gives the following two rules. The first rule is: “the number of recognized words in a line for which a translated word was found (the number of hits) is stored for a predetermined number of lines, and when the number of hits of the line of interest and the preceding and following lines is equal to or less than a predetermined number, drawing of the line of interest is stopped.”

The second rule is: it is determined whether the character codes of each recognized word match a certain pattern, such as non-character codes or the same code repeated a predetermined number of times, and when the ratio of pattern-matching words to the total number of words recognized in a line is equal to or greater than a predetermined value, drawing of the line of interest is stopped.
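To make the contrast with the present invention concrete, the two rules of Patent Document 1 as paraphrased above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not Patent Document 1's implementation: the function names are hypothetical, the first rule sums the hit counts over the line of interest and its two neighbours (the quoted text does not say whether the counts are summed or compared individually), and a "non-character" word is assumed to be one with no alphanumeric characters.

```python
from typing import Sequence


def first_rule_suppress(hits: Sequence[int], line: int, max_hits: int) -> bool:
    """Rule 1 (sketch): hits[k] is the number of recognized words in line k
    for which a translated word was found. Suppress drawing of `line` when the
    total hits of that line and its preceding/following lines is <= max_hits.
    Summing over the window is an assumption."""
    window = hits[max(0, line - 1): line + 2]
    return sum(window) <= max_hits


def max_run(word: str) -> int:
    """Length of the longest run of one repeated character in `word`."""
    best = cur = 0
    prev = None
    for ch in word:
        cur = cur + 1 if ch == prev else 1
        prev = ch
        best = max(best, cur)
    return best


def matches_pattern(word: str, repeat: int) -> bool:
    """A word matches when it has no alphanumeric characters (assumed meaning
    of "non-character") or repeats one character `repeat` or more times."""
    return not any(ch.isalnum() for ch in word) or max_run(word) >= repeat


def second_rule_suppress(words: Sequence[str], repeat: int, ratio: float) -> bool:
    """Rule 2 (sketch): suppress the line when the share of pattern-matching
    words among all words recognized in the line is >= ratio."""
    if not words:
        return False
    matched = sum(matches_pattern(w, repeat) for w in words)
    return matched / len(words) >= ratio
```

A single misread kanji such as “六” counts as alphanumeric in Python and contains no repeated run, so the second rule in this sketch does not flag it; this is the kind of noise string the next paragraph is concerned with.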

A noise character string recognized when the electronic image document editing system performs character recognition on a design drawing is, for example, a string of one or more characters consisting of kanji, hiragana, katakana, digits, alphabetic letters, symbols, and the like. Because such strings do not necessarily match the patterns of the second rule, the technique described in Patent Document 1 cannot identify noise character strings with high accuracy using that rule, and as a result many noise character strings are still drawn.

When the first rule is used, the technique described in Patent Document 1 relies on translation results (translated word search results) to decide whether a recognized character string should be drawn. That is, under the first rule, translation processing must be performed even on character strings for which no translated word will be output, which increases the processing load. Moreover, if the first rule were applied to an electronic image document editing system used for editing work other than translation, the system would have to include a translation function that the editing work itself does not need, which increases cost.

Accordingly, an object of the present invention is to identify and remove a noise character string with high accuracy from a recognized character string in an electronic image document. Another object of the present invention is to specify a noise character string from a recognized character string without performing a character string editing process.

In order to solve the above problems, the present invention adopts, for example, the following configuration. An electronic image document editing system for editing a character string recognized from an electronic image document includes a processor and a storage device. The storage device holds one or more character string determination criteria, which are criteria for determining whether a character string made up of one or more characters is a character string to be edited. The processor accepts input of an electronic image document; recognizes, in the input electronic image document, a character string made up of one or more characters of a plurality of character types; and, when the recognized character string satisfies the character string determination criteria, determines that the recognized character string is an edit target character string. The character string determination criteria include at least one of the following: a first determination criterion that the recognized character string consists of a number of characters equal to or greater than a first threshold (an integer of 2 or more); a second determination criterion that the recognized character string includes a partial character string consisting of a number of characters of a first type group, which is a part of the plurality of types, equal to or greater than a second threshold (an integer of 2 or more); a third determination criterion that the recognized character string includes a character of a second type group, which is a part of the plurality of types; and a fourth determination criterion that the recognized character string includes a content word.
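The four determination criteria can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration: the character-type classification by Unicode block, the threshold values, which types form the first and second type groups, the substring test standing in for content-word detection, and the decision rule (a hypothetical `min_matches`: treat a string as an edit target when at least that many criteria hold). The configuration above only requires that at least one of the criteria be included.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


def char_type(ch: str) -> str:
    """Classify one character by Unicode block (simplified assumption)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    return "symbol"


@dataclass(frozen=True)
class DeterminationCriteria:
    first_threshold: int = 3    # criterion 1: minimum length (integer >= 2), hypothetical value
    second_threshold: int = 2   # criterion 2: minimum run length (integer >= 2), hypothetical value
    first_group: FrozenSet[str] = frozenset({"kanji", "hiragana", "katakana"})
    second_group: FrozenSet[str] = frozenset({"hiragana"})
    content_words: FrozenSet[str] = frozenset()
    min_matches: int = 2        # hypothetical decision rule


def longest_group_run(s: str, group: FrozenSet[str]) -> int:
    """Length of the longest partial string whose characters all belong to `group`."""
    best = cur = 0
    for ch in s:
        cur = cur + 1 if char_type(ch) in group else 0
        best = max(best, cur)
    return best


def satisfied_criteria(s: str, c: DeterminationCriteria) -> List[bool]:
    """One flag per criterion, in the order of the first to fourth criteria."""
    return [
        len(s) >= c.first_threshold,
        longest_group_run(s, c.first_group) >= c.second_threshold,
        any(char_type(ch) in c.second_group for ch in s),
        any(word in s for word in c.content_words),  # substring match stands in for word analysis
    ]


def is_edit_target(s: str, c: DeterminationCriteria) -> bool:
    return sum(satisfied_criteria(s, c)) >= c.min_matches
```

With these hypothetical settings and `content_words=frozenset({"抵抗"})`, a label such as “抵抗100Ω” (resistor 100Ω) satisfies three criteria, while a single misread kanji or a short run of symbols satisfies at most one and is treated as noise.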

According to one aspect of the present invention, a noise character string can be specified with high accuracy from a character string recognized by character recognition processing from an electronic image document. In addition, according to one aspect of the present invention, a noise character string can be specified from a recognized character string with high accuracy without performing editing processing on the recognized character string.

FIG. 1 shows an example of the system configuration of the electronic image document editing system. FIG. 2 shows an example of its hardware configuration. FIG. 3A shows an example of input electronic image document data. FIG. 3B shows an example of the electronic image document data after translation processing. FIG. 4A shows an example of the electronic image document data before character recognition processing. FIG. 4B shows an example of the electronic image document data after character recognition processing. The remaining figures show, in order: a first, a second, and a third example of the character string determination criteria; an example of the character string information table; an example flowchart of character string determination processing by the character string determination unit when the value of Judge is the number of matching items; an example flowchart of the same processing when the value of Judge is the weighted sum of matching items; an example of a screen listing the character strings determined to be translation target character strings; an example of a screen for changing the character string determination criteria; and an example of an output screen in which translation processing was re-executed after the character string determination criteria were changed.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be noted that this embodiment is merely an example for realizing the present invention, and does not limit the technical scope of the present invention. In each figure, the same reference numerals are given to common configurations.

The electronic image document editing system according to the present embodiment accepts an input of an electronic image document of a design drawing and performs an editing process on the input electronic image document. The electronic image document editing system according to the present embodiment supports the work of translating a Japanese character string described in an electronic image document into an English character string as an example of editing processing. The character string in this embodiment is composed of one or more characters.

Specifically, the electronic image document editing system recognizes characters in an electronic image document such as a design drawing and extracts candidate locations where character strings are written. Then, by the character string determination process described later, it identifies which of those candidate locations actually contain character strings.

The electronic image document editing system performs translation processing on the Japanese character string among the specified character strings, and presents translation candidates for the Japanese character string. The electronic image document editing system generates a translation object for the translation selected by the user, corrects the layout of the translation object, and pastes it at an appropriate position on the document.

In this embodiment, a design drawing is used as an example of the input electronic image document; however, a figure contained in, for example, the body text of a document or a paper that has been converted into an electronic image may also be used as the input electronic image document. Although this embodiment mainly describes recognizing Japanese character strings and translating them into English character strings, the source and target languages are not particularly limited. Furthermore, although this embodiment describes the work of translating a document, the present invention can also be applied to other editing work such as document updating and document proofreading.

FIG. 1 shows a configuration example of the electronic image document editing system of this embodiment. The electronic image document editing system includes an input processing unit 1, an output processing unit 2, a character recognition processing unit 4, a character string determination unit 7, a translation processing unit 10, a translated word object generating unit 13, a translated word object editing unit 14, and a character string information management unit 16. Each of these units is a program.

The electronic image document editing system also includes a translation target image document 3, a character recognition dictionary 5, a translation target image document 6 with character recognition results, a character string determination criterion 8, a character string information table 9, a translation dictionary 11, a translated word candidate table 12, an image document 15 with character recognition results/translation results, and a word/character dictionary 17.

The input processing unit 1 accepts data and operations that the user designates or instructs via input means such as a keyboard, a mouse, a touch panel, or a touch pen. Examples include selecting the electronic image document to be translated, instructing execution of character recognition, changing the content of the character string determination criteria, designating the character string to be translated, selecting or entering a translation, and editing a translated word object.

The output processing unit 2 outputs data and processing results to the user via output means such as a display. Examples include the image document to be translated, the image document to be translated with character recognition results, the character string determination criteria, a list of character string information for translation targets, translation candidates, and the image document with character recognition results and translation results.

When using the electronic image document editing system of this embodiment, the user first selects an electronic image document to be translated from the electronic image document input to the electronic image document editing system. The content of the selected electronic image document is displayed to the user via a display or the like and is stored in the translation target image document 3.

Next, the user instructs execution of character recognition. The character recognition processing unit 4 extracts the electronic image document data from the translation target image document 3 and performs character recognition by referring to the character recognition dictionary 5, which stores data on individual characters, character recognition rules, and the like.

The character recognition process consists of a process that identifies character string regions, a process that cuts out individual characters from each character string region, and a process that recognizes each cut-out character. Because many character recognition algorithms are already widely known, a description of the character recognition process is omitted; the character recognition processing unit 4 may use any character recognition algorithm.
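One widely used way to perform the second step, cutting characters out of a character string region, is a vertical projection profile: count the ink pixels in each column and split at empty columns. The sketch below assumes a binarized, horizontally written line image given as rows of 0/1 values, and the function name is hypothetical; it illustrates that general technique, not the algorithm of the character recognition processing unit 4, which the text leaves open.

```python
from typing import List, Optional, Sequence, Tuple


def cut_characters(line_image: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Split a binarized text-line image (rows of 0/1, 1 = ink) into character
    candidates using a vertical projection profile.

    Returns (start_column, end_column) pairs with end exclusive."""
    if not line_image:
        return []
    width = len(line_image[0])
    # Ink count per column.
    profile = [sum(row[x] for row in line_image) for x in range(width)]
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, ink in enumerate(profile):
        if ink and start is None:            # entering an ink segment
            start = x
        elif not ink and start is not None:  # leaving an ink segment
            segments.append((start, x))
            start = None
    if start is not None:                    # segment touches the right edge
        segments.append((start, width))
    return segments
```

Japanese text needs extra handling in practice: a character such as 川 has gaps between its strokes and would be split into several segments, so segments are typically merged using an expected character width.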

Each character string recognized by the character recognition processing unit 4 is stored in the character string information table 9 together with its description location (coordinate position in the document image). The recognized character strings are also stored in the translation target image document 6 with character recognition results, embedded at their description locations in the translation target image document 3.

The character string determination unit 7 analyzes each character string recognized by the character recognition processing unit 4 and determines whether it is a character string to be translated. For the analysis, it refers to the word/character dictionary 17, which stores a list of characters with their attributes and a list of word headwords with their attributes. It then refers to the determination criterion items stored in the character string determination criterion 8 to decide whether the recognized character string is a character string to be translated. Details of the processing by the character string determination unit 7 and of the character string determination criterion 8 are described later. The determination result of the character string determination unit 7 is stored in the character string information table 9.

Next, the user views the displayed translation target image document 6 with character recognition results, designates the description location of a character string with a mouse, a touch pen, or the like, and instructs execution of translation. Any method may be used to designate the description location, for example clicking on it, dragging over it, or selecting a rectangle that encloses it.
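Locating the character string at a designated description location amounts to hit-testing against the coordinates stored in the character string information table 9. A minimal sketch, assuming each entry stores an axis-aligned bounding box `(left, top, right, bottom)` in image pixels; the `RecognizedString` type and both function names are hypothetical. A click selects the boxes that contain the point, and a drag or rectangle selection selects the boxes that overlap the rectangle.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive, image pixels


@dataclass
class RecognizedString:
    text: str
    box: Box


def hit_by_click(table: List[RecognizedString], x: int, y: int) -> List[RecognizedString]:
    """Entries whose box contains the clicked point."""
    return [r for r in table
            if r.box[0] <= x <= r.box[2] and r.box[1] <= y <= r.box[3]]


def hit_by_rectangle(table: List[RecognizedString], sel: Box) -> List[RecognizedString]:
    """Entries whose box overlaps the dragged or selected rectangle."""
    left, top, right, bottom = sel
    return [r for r in table
            if r.box[0] <= right and left <= r.box[2]
            and r.box[1] <= bottom and top <= r.box[3]]
```

Whether a rectangle selection should require overlap or full containment is a design choice; overlap is more forgiving when the user's rectangle clips the edge of a label.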

The translation processing unit 10 extracts from the character string information table 9 the character string corresponding to the description location (coordinates) designated by the user. It refers to the translation dictionary 11, extracts translated word candidates for the character string, and presents them to the user. In this embodiment, the translation processing unit 10 searches for translated words by matching the whole character string against the translation dictionary 11; alternatively, it may divide the character string into words by morphological analysis, search the translation dictionary 11 for each word, and present the translations found. The translation processing unit 10 may also pass the character string to a machine translation system and present the machine translation result.
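The two lookup strategies described above can be sketched as follows: try the whole string first and, only if it is absent, split it into words and look up each word. Real morphological analysis requires a Japanese analyzer; as a clearly labeled stand-in, this sketch uses greedy longest-match segmentation against the dictionary itself. The dictionary format (a mapping from headword to a list of translations) and the function names are assumptions.

```python
from typing import Dict, List


def segment_longest_match(text: str, dictionary: Dict[str, List[str]]) -> List[str]:
    """Greedy longest-match segmentation (a stand-in for morphological analysis).
    Characters not covered by any headword become one-character segments."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest slice first
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])
            i += 1
    return words


def translation_candidates(text: str, dictionary: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Whole-string match first; otherwise per-word candidates."""
    if text in dictionary:
        return {text: list(dictionary[text])}
    return {w: list(dictionary[w])
            for w in segment_longest_match(text, dictionary) if w in dictionary}
```

For example, with a dictionary containing “豆電球” (miniature bulb) and “抵抗” (resistor), “豆電球” is found as a whole, while “抵抗100Ω” falls back to segmentation and yields candidates only for “抵抗”.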

Because many translation dictionary search algorithms and machine translation algorithms are widely known, a description of translation processing using them is omitted; the electronic image document editing system of this embodiment may use any translation dictionary search algorithm and machine translation algorithm. The translation result is stored in the translated word candidate table 12, which temporarily holds the correspondence between Japanese character strings and their translation candidates.

The translated word object generating unit 13 transmits the translated word candidates stored in the translated word candidate table 12 to the output processing unit 2, which presents them to the user. The user selects the correct translation from the presented candidates. If none of them is correct, the user enters the correct translation directly with the keyboard or the like. If the recognized character string contains an error, the user corrects it, instructs re-execution of translation, and then selects the correct translation from the newly presented candidates.

When the translated word is confirmed by the user entering or selecting the correct translation, the translated word object generating unit 13 generates a translated word object consisting of the translated text string and displays it on the translation target image document 3. The character string information management unit 16 stores the corrected character string and the confirmed translation result in the character string information table 9.

The translated word object editing unit 14 adjusts the object size of the displayed translated word object, the font size of its text, and the like, and performs editing processing that lets the user move the object to an appropriate position on the document and paste it there. For example, the translated word object editing unit 14 may automatically adjust the object size and the font size according to the lengths of the character string before and after translation. When the user instructs saving of the editing result, the electronic image document data at that point is stored in the image document 15 with character recognition results/translation results.
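The automatic adjustment mentioned above could, for example, first shrink the font so that the translated text fits the width of the original string's box and, if that would make the text too small, wrap it over several lines at a minimum size. This sketch assumes a fixed average character width of 0.6 × font size for Latin text, which is a rough approximation rather than real font metrics; the function and parameter names are hypothetical.

```python
import textwrap
from typing import List, Tuple


def fit_translation(text: str, box_width: float, font_size: float,
                    min_font_size: float = 6.0,
                    char_width_ratio: float = 0.6) -> Tuple[float, List[str]]:
    """Return (font_size, lines) so that `text` fits within `box_width`.

    Width is estimated as len(text) * font_size * char_width_ratio,
    a crude stand-in for real font metrics."""
    if not text or len(text) * font_size * char_width_ratio <= box_width:
        return font_size, [text]
    # Largest font size at which the text fits on one line.
    shrunk = box_width / (len(text) * char_width_ratio)
    if shrunk >= min_font_size:
        return shrunk, [text]
    # Too small: keep the minimum size and wrap at word boundaries.
    chars_per_line = max(1, int(box_width // (min_font_size * char_width_ratio)))
    return min_font_size, textwrap.wrap(text, width=chars_per_line)
```

Shrinking before wrapping keeps short labels such as “6V” on one line at their original size, and only long translations fall back to multiple lines.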

The character string information management unit 16 manages the character strings to be translated and their translation processing status. Specifically, it analyzes the character string information table 9, and calculates and holds the number of character strings to be translated in the electronic image document and the number of characters they contain. In cooperation with the translated word object generating unit 13 and the translated word object editing unit 14, it also manages the status of the editing work, such as whether each translation target character string has been translated.

When the character string information management unit 16 receives from the translated word object editing unit 14 a notification that a translated word object has been pasted at certain coordinates, it regards the translation of the translation target character string at those coordinates as completed and stores 1 in the translation work completion flag (described later) of the character string information table 9. Through this management of the translation work, the electronic image document editing system can track how much of the translation work has been completed at any point in time and present the translation work status to the user.
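The progress bookkeeping performed by the character string information management unit 16 can be sketched as below. The row layout is an assumption: only the string, its box, whether it was determined to be a translation target, and the completion flag (0 or 1) are modeled. The names `Row`, `mark_pasted`, and `progress` are hypothetical, and a pasted object is matched to its row by exact box equality, whereas a real system would more likely match by overlap.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Row:
    text: str
    box: Tuple[int, int, int, int]  # (left, top, right, bottom)
    is_target: bool                 # result of the character string determination
    done: int = 0                   # translation work completion flag (1 = completed)


def mark_pasted(table: List[Row], box: Tuple[int, int, int, int]) -> bool:
    """Set the completion flag of the target row at `box`; False if none matches."""
    for row in table:
        if row.is_target and row.box == box:
            row.done = 1
            return True
    return False


def progress(table: List[Row]) -> Dict[str, int]:
    """Counts of translation target strings and characters, total and completed."""
    targets = [r for r in table if r.is_target]
    done = [r for r in targets if r.done == 1]
    return {
        "targets": len(targets),
        "target_chars": sum(len(r.text) for r in targets),
        "done": len(done),
        "done_chars": sum(len(r.text) for r in done),
    }
```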

FIG. 2 shows a hardware configuration example of the electronic image document editing system of the present embodiment. The electronic image document editing system includes a processing device 50, an input device 30, an output device 40, and a storage device 60, and is connected to a network 90.

The processing device 50 includes a processor and/or a logic circuit that operates according to programs; it inputs and outputs data, reads and writes data, and executes each program shown in FIG. 1. A program performs predetermined processing when executed by the processor, using the storage device and a communication port (communication device). Therefore, in this and other embodiments, a description with a program as the subject may be read as a description with the processor as the subject. Likewise, processing executed by a program is processing performed by the computer or computer system on which the program runs.

The processor operates as a functional unit that realizes a predetermined function by operating according to a program. For example, the processor functions as the character recognition processing unit 4 by operating according to the character recognition processing program, and functions as the character string determination unit 7 by operating according to the character string determination program. The same applies to other programs. Furthermore, the processor also operates as a functional unit that realizes each of a plurality of processes executed by each program. A computer and a computer system are an apparatus and a system including these functional units.

The input device 30 is a device that accepts an operation content or data input from a user. The input device 30 includes a keyboard 31 and a mouse 32. The input device 30 may include a touch pen, a touch panel, or the like instead of or in addition to the keyboard 31 and the mouse 32.

The output device 40 is a device that outputs calculation results and the like to the user, and includes an output monitor 41. When exchanging input/output data with another computer, the electronic image document editing system transmits and receives the data via the network 90.

The storage device 60 stores the programs and data shown in FIG. 1. It includes a working area 61 that temporarily stores processing data generated by the processing device 50 when a program is executed.

As areas for storing the data shown in FIG. 1, the storage device 60 further includes a translation target image document storage area 62, a character recognition dictionary storage area 64, a storage area 65 for the translation target image document with character recognition results, a character string determination criterion storage area 67, a character string information table storage area 68, a translation dictionary storage area 70, a translated word candidate table storage area 71, a storage area 74 for the image document with character recognition results/translation results, and a word/character dictionary storage area 75.

As areas for storing the units shown in FIG. 1, the storage device 60 also includes a character recognition processing unit storage area 63, a character string determination unit storage area 66, a translation processing unit storage area 69, a translated word object generating unit storage area 72, and a translated word object editing unit storage area 73.

In FIG. 2, all data and processing of the electronic image document editing system are concentrated in one computer, but they may be distributed across multiple computers. For example, a character recognition server (another computer that stores the character recognition processing unit 4 and the character recognition dictionary 5) and a computer that provides the functions other than character recognition may exchange data via the network 90. Similarly, a translation server (another computer that stores the translation processing unit 10 and the translation dictionary 11) and a computer that provides the functions other than translation may exchange data via the network 90.

FIG. 3A shows an example of an input electronic image document before translation. For ease of explanation, a simple electric circuit diagram is used here; in practice, drawings containing a large amount of character information together with non-character drawing information are often input to the electronic image document editing system. The electric circuit diagram in the pre-translation electronic image document 301 shows a circuit containing a 6 V dry cell, a miniature bulb, a transistor, and a resistor. Near each symbol in the diagram, a character string describing what the symbol represents is written. In other words, character information and non-character information, such as circuit symbols and wiring, are mixed in the electric circuit diagram.

FIG. 3B shows an example of the electronic image document of FIG. 3A after translation by the electronic image document editing system. In the translated electronic image document 302, the figure (non-character) portions are not edited and appear exactly as in FIG. 3A; only the Japanese character strings are translated into English. Character strings that have the same notation and meaning in Japanese and English, such as “100Ω” and “6V”, are not translated and remain as in the pre-translation electronic image document 301. If the lengths of a Japanese string before translation and its English translation differ greatly, the user may, for example, adjust the font of the translated word object, insert line breaks to split it over multiple lines, or adjust its position.
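One simple way to decide that strings such as “100Ω” and “6V” need no translation is to check whether they contain any Japanese characters at all. The sketch below is a heuristic assumption, not a rule stated in the text, and its function names are hypothetical; it treats the Unicode Hiragana, Katakana, and CJK Unified Ideographs blocks as Japanese.

```python
from typing import List, Tuple


def contains_japanese(s: str) -> bool:
    """True if `s` has any hiragana, katakana, or CJK unified ideograph."""
    return any(0x3040 <= ord(ch) <= 0x30FF or 0x4E00 <= ord(ch) <= 0x9FFF
               for ch in s)


def split_for_translation(strings: List[str]) -> Tuple[List[str], List[str]]:
    """Split into (strings to translate, strings to keep as-is), preserving order."""
    translate = [s for s in strings if contains_japanese(s)]
    keep = [s for s in strings if not contains_japanese(s)]
    return translate, keep
```

A mixed string such as “抵抗100Ω” still goes to translation, and only its Japanese part would change in the output.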

FIG. 4A shows an example of an electronic image document before character recognition by the electronic image document editing system. The electronic image document 401 before character recognition is the same as the electronic image document 301 before translation described in FIG. 3A.

FIG. 4B shows an example of the character recognition result obtained by the electronic image document editing system for the electronic image document of FIG. 4A. Each character string in the character recognition result 402 is associated with the position (coordinates) at which it is written. In FIG. 4B, the character recognition result is overlaid on the document data for convenience of explanation; in practice, however, the character strings obtained as the character recognition result are placed behind the document data and are not visible to the user.

The character strings “resistance 100Ω”, “resistance 200Ω”, “miniature bulb”, and “dry battery 6V” in the character recognition result 402 are recognized correctly. However, the character string “NPN transistor” is misrecognized in exactly one character (“J” is read as “S”). In the character string “amplify changes in input”, the partial character string “input” is misrecognized as “entered”, the partial character string “change” as “weird”, and the partial character string “)” as “}”. Here, a partial character string of a character string of length n is a contiguous run from the i-th character to the j-th character of that string (1 ≤ i ≤ j ≤ n).

Furthermore, in the character recognition result 402, the miniature bulb circuit symbol is recognized as the character string “te W”, the NPN transistor circuit symbol as “six”, the resistor circuit symbol as “−VV-”, and the dry cell circuit symbol as “state”. All of these recognized character strings are noise character strings, in which non-character information has been erroneously recognized as character information.

FIG. 5A shows a first example of the character string determination criterion 8. The character string determination criterion 8 includes a plurality of determination criterion items. Each determination criterion item includes an ID 501 that identifies the item, a determination criterion item content 502 that describes the specific content of the item, a weight value 503 that represents the importance (reliability) of the item, and an application flag 504 that indicates with 1/0 whether the item is applied. The character string determination criterion 8 further includes a determination method 505 that defines how a determination is made using one or more determination criterion items.
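One possible in-memory form of the character string determination criterion 8 is sketched below. The class and field names are hypothetical, the item contents are paraphrases of Rule_1 to Rule_4, and the weights default to 1.0 because this description does not list the FIG. 5A values.

```python
from dataclasses import dataclass, field


@dataclass
class CriterionItem:
    item_id: str          # ID 501, e.g. "Rule_1"
    content: str          # determination criterion item content 502 (human-readable here)
    weight: float = 1.0   # weight value 503 (placeholder, not the FIG. 5A value)
    applied: int = 1      # application flag 504: 1 = apply, 0 = skip


@dataclass
class DeterminationCriterion:
    items: list[CriterionItem] = field(default_factory=list)
    judge: str = "Num_of_items"   # determination method 505: "Num_of_items" or "Sum_of_weights"
    threshold: float = 3          # Num_of_items_threshold or Sum_of_weights_threshold

    def applied_items(self) -> list[CriterionItem]:
        """Return only the items whose application flag 504 is 1."""
        return [item for item in self.items if item.applied == 1]


criterion = DeterminationCriterion(
    items=[
        CriterionItem("Rule_1", "S_length >= 2"),
        CriterionItem("Rule_2", "C_type_seq(kanji|hiragana|katakana) >= 2"),
        CriterionItem("Rule_3", "has a char outside {symbol, n_suffix, numeral, alphabet, non_j_kanji}"),
        CriterionItem("Rule_4", "C_word >= 1"),
    ],
    judge="Num_of_items",
    threshold=3,
)
```

Switching an item off is then just setting its `applied` field to 0, which mirrors how the user selects the items to apply.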

The determination criterion item content 502 is described using variables that the character string determination unit 7 can interpret. S_length represents the number of characters constituting the character string. C_type represents the type of the characters constituting the character string and takes one of the character type values recognized by the character recognition processing unit 4, for example kanji, hiragana, katakana, symbol, numeral suffix (n_suffix), number (numeral), alphabet, or non-Jōyō kanji (non_j_kanji, that is, kanji outside the Jōyō list of commonly used kanji). In this embodiment, the character recognition processing unit 4 recognizes the characters used in both the source language and the target language.

The number (numeral) type may denote only Arabic numerals, or may also include other numerals (for example, Roman or Greek numerals). The numeral suffix is an example of a classifier: specifically, it is a classifier that is attached as a suffix. Classifiers here include units of measurement.

Note that the character recognition processing unit 4 assigns exactly one type to each recognized character. For example, the letter “A” could be of the alphabet type, or of the numeral suffix type when it denotes the ampere, the unit of electric current; the character recognition processing unit 4 decides which single type applies to “A” from its relationship to the preceding and following characters. C_type_seq represents the number of consecutive characters of the type specified by C_type. C_word represents the number of independent words contained in the character string.
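C_type and C_type_seq could be computed along the lines of the sketch below. The Unicode block ranges for hiragana, katakana, and CJK Unified Ideographs are standard, but the unit-letter set, the rule that a unit letter directly after a digit is a numeral suffix (one simple way to resolve the “A” ambiguity from context), and the optional `joyo` set parameter are assumptions for illustration; the character recognition processing unit 4 may resolve types differently.

```python
from typing import Optional

# Hypothetical unit letters that count as numeral suffixes when they follow a digit.
UNIT_SUFFIXES = {"V", "A", "W", "\u03a9", "\u2126"}  # volt, ampere, watt, Greek omega, ohm sign


def char_type(ch: str, prev: Optional[str] = None, joyo: Optional[set] = None) -> str:
    """Assign exactly one C_type value to ch, using the preceding character as context."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        # Without a Jōyō kanji list, every ideograph is reported as plain kanji.
        if joyo is not None and ch not in joyo:
            return "non_j_kanji"
        return "kanji"
    if "0" <= ch <= "9":
        return "numeral"
    if ch in UNIT_SUFFIXES and prev is not None and "0" <= prev <= "9":
        return "n_suffix"  # e.g. the "A" in "100A" is read as ampere
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    return "symbol"


def c_types(s: str, joyo: Optional[set] = None) -> list[str]:
    """C_type of every character of s."""
    return [char_type(ch, s[i - 1] if i > 0 else None, joyo) for i, ch in enumerate(s)]


def c_type_seq(s: str, group: set, joyo: Optional[set] = None) -> int:
    """Length of the longest run of consecutive characters whose type is in group."""
    best = run = 0
    for t in c_types(s, joyo):
        run = run + 1 if t in group else 0
        best = max(best, run)
    return best
```

For instance, `c_types("100A")` ends in `"n_suffix"`, whereas the “A” in `"AB"` is reported as `"alphabet"`.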

Determination criterion item Rule_1 states that “the recognized character string consists of two or more characters”. The value 2 in Rule_1 may instead be an integer of 3 or more. Because a character string with very few characters is unlikely to form a word, such a string is likely to be a noise character string. By applying Rule_1 to each recognized character string, the character string determination unit 7 can exclude such noise character strings from the edit target character strings.

Further, the character string determination unit 7 can perform determination processing using Rule_1 by counting the number of characters in the character string without performing morphological analysis and without referring to various dictionaries. Therefore, the character string determination unit 7 can perform determination processing using Rule_1 at high speed.

Note that many noise character strings are single-character strings, whereas edit target character strings include many two-character strings. Therefore, by applying Rule_1 with the two-character threshold of the present embodiment, the character string determination unit 7 can remove a large number of noise character strings while keeping small the number of genuine edit target character strings that are wrongly excluded from editing. The character string determination unit 7 can also apply Rule_1 as it is when the source language is not Japanese.
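Because Rule_1 only compares S_length with a threshold, a sketch needs nothing beyond a character count. The function name and the `min_len` parameter (the configurable value 2) are hypothetical; Python's `len()` counts Unicode code points.

```python
def rule_1(s: str, min_len: int = 2) -> bool:
    """Rule_1: S_length >= min_len. Needs no morphological analysis or dictionary."""
    return len(s) >= min_len


print([rule_1(s) for s in ["抵抗100\u03a9", "x", "6V"]])  # [True, False, True]
print(rule_1("6V", min_len=3))                             # False
```

Raising `min_len` to 3 would also drop two-character strings such as “6V”, which is why the text prefers 2.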

Determination criterion item Rule_2 states that “the recognized character string includes a partial character string in which two or more kanji, hiragana, or katakana characters are consecutive”. The value 2 in Rule_2 may instead be an integer of 3 or more. A character string that contains no partial character string in which characters of predetermined types continue for a certain number of characters is likely to be a noise character string. In particular, when the predetermined types are kanji, hiragana, and katakana, a character string that does not satisfy Rule_2 is unlikely to contain a partial character string forming a Japanese (source-language) word, and is therefore especially likely to be a noise character string. By applying Rule_2 to each recognized character string, the character string determination unit 7 can exclude such noise character strings from the editing targets.

Furthermore, the character string determination unit 7 can perform the determination using Rule_2 merely by identifying the type of each character in the character string and counting characters, without morphological analysis and without referring to any dictionary. The determination using Rule_2 can therefore be performed at high speed.

Note that many noise character strings contain no partial character string in which two or more kanji, hiragana, or katakana characters are consecutive. Conversely, edit target character strings include many strings that contain a run of two consecutive kanji, hiragana, or katakana characters but no run of three or more. Therefore, by applying Rule_2 with the two-character threshold of the present embodiment, the character string determination unit 7 can remove a large number of noise character strings while keeping small the number of genuine edit target character strings that are wrongly excluded from editing.

Rule_2 may be generalized, for example, to “the recognized character string includes a partial character string in which two or more characters belonging to a first type group, which is a subset of the character types recognized by the character recognition processing unit 4, are consecutive”. In the present embodiment, a type group consists of one or more types. Thus, in Rule_2, the first type group could be, for example, hiragana or katakana. In this case as well, the value 2 in Rule_2 may be an integer of 3 or more.

When the source language is English, Rule_2 can be, for example, “the recognized character string includes a partial character string in which two or more alphabetic characters are consecutive”. When the source language is Chinese, Rule_2 can be, for example, “the recognized character string includes a partial character string in which two or more kanji are consecutive”.
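Rule_2 and its English and Chinese variants could be sketched as a run-length scan over a coarse script classification. The sketch reads “two or more kanji, hiragana, or katakana characters are consecutive” as a run of characters that each belong to the type group, with mixed types allowed, in line with the type-group generalization above; the function and constant names are hypothetical.

```python
def char_type(ch: str) -> str:
    """Coarse script classification used by this sketch (not the full C_type set)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    return "other"


JAPANESE = frozenset({"kanji", "hiragana", "katakana"})  # first type group for Japanese
ENGLISH = frozenset({"alphabet"})                        # variant for English source text
CHINESE = frozenset({"kanji"})                           # variant for Chinese source text


def rule_2(s: str, group: frozenset = JAPANESE, min_run: int = 2) -> bool:
    """Rule_2: s contains min_run or more consecutive characters whose type is in group."""
    run = 0
    for ch in s:
        run = run + 1 if char_type(ch) in group else 0
        if run >= min_run:
            return True
    return False
```

With these definitions, `rule_2("抵抗100\u03a9")` holds while `rule_2("100\u03a9")` does not, and `rule_2("6V", ENGLISH)` fails because the only letter stands alone.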

Determination criterion item Rule_3 states that “the recognized character string includes one or more characters other than symbols, numeral suffixes, numbers, alphabetic characters, and non-Jōyō kanji”. A recognized character string that contains no character outside such predetermined types is likely to be a noise character string. In particular, when the predetermined types are symbols, numeral suffixes, numbers, alphabetic characters, and non-Jōyō kanji, a character string that does not satisfy Rule_3 is unlikely to be a Japanese (source-language) word, and is therefore especially likely to be a noise character string. By applying Rule_3 to each recognized character string, the character string determination unit 7 can exclude such noise character strings from the editing targets.

Rule_3 may be generalized, for example, to “the recognized character string includes one or more characters belonging to a second type group, which is a subset of the character types recognized by the character recognition processing unit 4”. When the source language is English, Rule_3 can be, for example, “the recognized character string includes one or more characters other than symbols, classifiers, numbers, kanji, hiragana, katakana, and non-Jōyō kanji”. When the source language is Chinese, Rule_3 can be, for example, “the recognized character string includes one or more characters other than symbols, classifiers, numbers, hiragana, katakana, alphabetic characters, and non-Jōyō kanji”.
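A sketch of Rule_3 follows. It reuses a small character classifier in which the unit-letter set, the digit-context rule for numeral suffixes, and the optional `joyo` set are assumptions, and it approximates “classifiers” in the English and Chinese variants by the `n_suffix` type. The excluded-type sets are taken from the text above; the names are hypothetical.

```python
from typing import Optional

# Hypothetical unit letters read as numeral suffixes (classifiers) after a digit.
UNIT_SUFFIXES = {"V", "A", "W", "\u03a9", "\u2126"}

# Types that Rule_3 does not count, per source language.
EXCLUDED_JA = frozenset({"symbol", "n_suffix", "numeral", "alphabet", "non_j_kanji"})
EXCLUDED_EN = frozenset({"symbol", "n_suffix", "numeral", "kanji", "hiragana", "katakana", "non_j_kanji"})
EXCLUDED_ZH = frozenset({"symbol", "n_suffix", "numeral", "hiragana", "katakana", "alphabet", "non_j_kanji"})


def char_type(ch: str, prev: Optional[str], joyo: Optional[set]) -> str:
    """One C_type per character; kanji become non_j_kanji only if a Jōyō set is given."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "non_j_kanji" if joyo is not None and ch not in joyo else "kanji"
    if "0" <= ch <= "9":
        return "numeral"
    if ch in UNIT_SUFFIXES and prev is not None and "0" <= prev <= "9":
        return "n_suffix"
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    return "symbol"


def rule_3(s: str, excluded: frozenset = EXCLUDED_JA, joyo: Optional[set] = None) -> bool:
    """Rule_3: at least one character of s has a type outside `excluded`."""
    return any(
        char_type(ch, s[i - 1] if i > 0 else None, joyo) not in excluded
        for i, ch in enumerate(s)
    )
```

Under these assumptions the noise string “−VV-” fails Rule_3 (symbols and letters only), as does “100Ω” on its own, which fits the text's note that such strings are left untranslated.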

Determination criterion item Rule_4 states that “the recognized character string includes one or more independent words”. A character string that contains no independent word carries no lexical content and is therefore likely to be a noise character string. By applying Rule_4 to each recognized character string, the character string determination unit 7 can exclude such noise character strings from the editing targets.

Rule_4 may be generalized, for example, to “the recognized character string includes one or more content words”. A content word is a word, such as a noun, verb, or adjective, that carries specific meaning rather than merely serving a grammatical role. In Japanese, an independent word is an example of a content word.
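Evaluating Rule_4 requires finding content words, which calls for some form of dictionary-based word analysis. The sketch below substitutes a toy greedy longest-match segmenter over a tiny hypothetical dictionary; a real system would use a proper morphological analyzer and a full dictionary. The dictionary contents and all function names here are illustrative only.

```python
# Hypothetical miniature word dictionary: surface form -> part of speech.
WORD_DICT = {
    "抵抗": "noun", "豆電球": "noun", "乾電池": "noun",
    "入力": "noun", "変化": "noun", "増幅": "noun",
    "する": "verb", "の": "particle", "を": "particle",
}
CONTENT_POS = {"noun", "verb", "adjective"}


def segment(s: str, dictionary: dict = WORD_DICT) -> list[tuple[str, str]]:
    """Greedy longest-match segmentation; unknown characters become one-character tokens."""
    max_len = max(len(w) for w in dictionary)
    tokens, i = [], 0
    while i < len(s):
        for length in range(min(max_len, len(s) - i), 0, -1):
            piece = s[i:i + length]
            if piece in dictionary:
                tokens.append((piece, dictionary[piece]))
                i += length
                break
        else:  # no dictionary word starts at position i
            tokens.append((s[i], "unknown"))
            i += 1
    return tokens


def c_word(s: str) -> int:
    """C_word: number of content words found in s."""
    return sum(1 for _, pos in segment(s) if pos in CONTENT_POS)


def rule_4(s: str, min_words: int = 1) -> bool:
    """Rule_4: s contains at least min_words content words."""
    return c_word(s) >= min_words
```

For example, `segment("入力の変化を増幅する")` yields the nouns 入力, 変化, 増幅, the verb する, and two particles, so `c_word` returns 4.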

Because the determination using determination criterion items Rule_1 to Rule_4 uses no information about editing results, the electronic image document editing system according to the present embodiment can determine whether a recognized character string is an edit target character string without first performing the editing process on it.

The electronic image document editing system according to the present embodiment provides two types of determination method 505. In FIG. 5A, the determination method 505 specifies the method based on the number of matching determination criterion items (Num_of_items). The threshold (Num_of_items_threshold) for the number of matching determination criterion items in FIG. 5A is 3.

That is, under this determination method 505, when the recognized character string satisfies three or more of the four determination criterion items that are applied (application flag 504 is 1), the character string determination unit 7 determines that the recognized character string is a translation target character string. When the recognized character string satisfies two or fewer of those items, the character string determination unit 7 determines that it is not a translation target character string.

FIG. 5B shows a second example of the character string determination criterion 8. Here, the determination method 505 specifies a threshold (Num_of_items_threshold) of 2 for the number of matching determination criterion items. Since two determination criterion items are applied (application flag 504 is 1), the character string determination unit 7 determines that a recognized character string is a translation target character string only when it satisfies both of them; that is, this is an AND determination. If the threshold were instead 1, the determination would be an OR determination, in which the character string determination unit 7 determines that a recognized character string is a translation target character string when it satisfies either of the two applied items.
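The Num_of_items method, including the AND and OR cases just described, reduces to counting the satisfied items among the applied ones and comparing the count with the threshold. The function name is hypothetical, and which two items are applied in the FIG. 5B-style case is an arbitrary choice, since the text does not say.

```python
def judge_by_count(matched: dict, applied: dict, threshold: int) -> bool:
    """Num_of_items: count applied items (flag 504 == 1) that the string satisfies."""
    count = sum(1 for item_id, ok in matched.items() if ok and applied.get(item_id) == 1)
    return count >= threshold


# FIG. 5A style: four applied items, threshold 3.
all_applied = {"Rule_1": 1, "Rule_2": 1, "Rule_3": 1, "Rule_4": 1}
# FIG. 5B style: two applied items (which two is an arbitrary choice here).
two_applied = {"Rule_1": 1, "Rule_2": 1, "Rule_3": 0, "Rule_4": 0}

matched = {"Rule_1": True, "Rule_2": False, "Rule_3": True, "Rule_4": True}
print(judge_by_count(matched, all_applied, 3))  # True: 3 of 4 applied items satisfied
print(judge_by_count(matched, two_applied, 2))  # False: AND fails because Rule_2 fails
print(judge_by_count(matched, two_applied, 1))  # True: OR passes on Rule_1 alone
```

Setting the threshold equal to the number of applied items gives AND, and setting it to 1 gives OR, so one parameter covers both behaviours.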

FIG. 5C shows a third example of the character string determination criterion 8. Here, the determination method 505 specifies a method (Sum_of_weights) that determines whether the recognized character string is a translation target character string from the sum of the weight values 503 of the determination criterion items it satisfies.

The weight sum threshold (Sum_of_weights_threshold) in this determination method 505 is 3.0. That is, when the sum of the weight values of the items satisfied by the recognized character string, among the four determination criterion items that are applied (application flag 504 is 1), is 3.0 or more, the character string determination unit 7 determines that the recognized character string is a translation target character string. When the sum is less than 3.0, the character string determination unit 7 determines that it is not a translation target character string.

If all the weight values in FIG. 5C were 1.0, the determination using FIG. 5C would be the same as the determination based on the number of matching items shown in FIG. 5A. In other words, the determination based on the number of matching items is a special case of the determination based on weight values.
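A sketch of the Sum_of_weights method follows. The weight values are hypothetical (the text does not list those of FIG. 5C). The final loop checks the special-case claim above: with every weight equal to 1.0 and a threshold of 3.0, the weight-based result equals the count-based result with threshold 3 for all 16 combinations of four items.

```python
from itertools import product

ITEMS = ["Rule_1", "Rule_2", "Rule_3", "Rule_4"]


def weight_sum(matched: dict, weights: dict, applied: dict) -> float:
    """Weight sum 609: total weight 503 of applied items (flag 504 == 1) that are satisfied."""
    return sum(weights[i] for i in ITEMS if matched[i] and applied[i] == 1)


def judge_by_weights(matched: dict, weights: dict, applied: dict, threshold: float) -> bool:
    return weight_sum(matched, weights, applied) >= threshold


applied = dict.fromkeys(ITEMS, 1)
weights = {"Rule_1": 0.5, "Rule_2": 1.0, "Rule_3": 1.0, "Rule_4": 1.5}  # hypothetical values

# Three satisfied items may pass or fail depending on which three they are.
a = {"Rule_1": False, "Rule_2": True, "Rule_3": True, "Rule_4": True}
b = {"Rule_1": True, "Rule_2": True, "Rule_3": True, "Rule_4": False}
print(judge_by_weights(a, weights, applied, 3.0))  # True  (sum 3.5)
print(judge_by_weights(b, weights, applied, 3.0))  # False (sum 2.5)

# With unit weights, Sum_of_weights >= 3.0 coincides with Num_of_items >= 3.
unit = dict.fromkeys(ITEMS, 1.0)
for bits in product([False, True], repeat=len(ITEMS)):
    m = dict(zip(ITEMS, bits))
    assert judge_by_weights(m, unit, applied, 3.0) == (sum(bits) >= 3)
```

The two printed cases show what weighting adds: both strings satisfy three items, but only the one that satisfies the heavier items passes.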

Note that the user can change the contents of the character string determination criterion 8. That is, the user can select which determination criterion items are applied, and can add, delete, or change parameters, thresholds, and the like. Details of changing the contents of the character string determination criterion 8 are described later.

FIG. 6 shows an example of the character string information table 9. The character string information table 9 holds data on the determination results of the character string determination unit 7 and the translation results of the translation processing unit 10. The table shown holds the results of processing the input document of FIG. 4A with the character recognition processing unit 4 and the character string determination unit 7 of the electronic image document editing system.

The character string information table 9 includes a recognized character string 601, a description position 602, a determination criterion collation result 607, a translation target flag 610, a corrected character string 611, a translated character string 612, and a translation status flag 613.

The recognized character string 601 holds a character string recognized by the character recognition processing unit 4. The description position 602 holds information on the rectangular area in which the recognized character string 601 is written. The description position 602 consists of an upper left X coordinate 603 and an upper left Y coordinate 604, which give the upper left vertex of that rectangular area, and a lower right X coordinate 605 and a lower right Y coordinate 606, which give its lower right vertex.

The determination criterion collation result 607 holds the results of collation against the determination criteria performed by the character string determination unit 7. It includes a column for each determination criterion item holding the collation result for that item, a number of matches 608 holding the number of satisfied determination criterion items, and a weight sum 609 holding the sum of the weight values of the satisfied determination criterion items.

The translation target flag 610 identifies whether the recognized character string 601 is a translation target character string. When translation targets are determined from the number of matching items, as with the character string determination criterion 8 shown in FIG. 5A or 5B, the translation target flag 610 holds 1 if the number of matches 608 is equal to or greater than the threshold and 0 if it is less. When translation targets are determined from the sum of the weights of the matching items, as with the character string determination criterion 8 shown in FIG. 5C, the translation target flag 610 holds 1 if the weight sum 609 is equal to or greater than the threshold and 0 if it is less.

The corrected character string 611 holds the character string as corrected by the user when a recognized character string 601 whose translation target flag 610 is 1 contains an error. The translated character string 612 holds the translation result for the recognized character string 601 or the corrected character string 611. The translation status flag 613 identifies whether translation of a recognized character string 601 (or corrected character string 611) whose translation target flag 610 is 1 has been completed. The translation status flag 613 holds 1 when a translation result is held in the translated character string 612, 0 when none is held, and NULL when the recognized character string 601 is not a translation target.
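A row of the character string information table 9 could be modelled as the dataclass below. The class and attribute names are hypothetical, the per-item results are kept in a dict, the description position is a 4-tuple of the corner coordinates, and NULL is represented by Python's `None`. The translation status flag 613 is derived rather than stored, following the rule just stated.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CharStringRow:
    recognized: str                                   # recognized character string 601
    box: tuple[int, int, int, int]                    # description position 602: (x1, y1, x2, y2)
    item_results: dict = field(default_factory=dict)  # determination criterion collation result 607
    match_count: int = 0                              # number of matches 608
    weight_sum: float = 0.0                           # weight sum 609
    target_flag: int = 0                              # translation target flag 610
    corrected: Optional[str] = None                   # corrected character string 611
    translated: Optional[str] = None                  # translated character string 612

    @property
    def status_flag(self) -> Optional[int]:
        """Translation status flag 613: None (NULL) if not a target, else 1 if translated, else 0."""
        if self.target_flag != 1:
            return None
        return 1 if self.translated is not None else 0

    def source_text(self) -> str:
        """String to translate: the user's correction if present, otherwise the recognized string."""
        return self.corrected if self.corrected is not None else self.recognized
```

Deriving the status flag avoids the stored flag drifting out of step with the translated character string.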

In FIG. 6, the recognized character string “resistance 100Ω” is written in a rectangular area whose upper left vertex is at (160, 30) and whose lower right vertex is at (300, 50). Its number of matches 608 is 4, which is equal to or greater than the threshold (= 3) (or, under the weight-based method, its weight sum 609 is 4.0, which is equal to or greater than the threshold (= 3.0)). The character string determination unit 7 therefore determines that it is a translation target character string, and the translation target flag 610 for “resistance 100Ω” holds 1.

Since the character recognition result is also correct, the translation processing unit 10 translates the recognized character string “resistance 100Ω” into “Resistor 100Ω” and stores the translation result in the translated character string 612. Since the translation of the recognized character string “resistance 100Ω” by the translation processing unit 10 has been completed, the corresponding translation status flag 613 holds “1”.

The recognized character string “NPN transistor” is written in a rectangular area whose upper left vertex is at (250, 250) and whose lower right vertex is at (390, 270). Its number of matches 608 is 3, which is equal to or greater than the threshold (= 3) (or, under the weight-based method, its weight sum 609 is 3.5, which is equal to or greater than the threshold (= 3.0)). The character string determination unit 7 therefore determines that it is a translation target character string, and the translation target flag 610 for “NPN transistor” holds 1.

However, the character recognition result for “NPN transistor” contains an error. When the user enters a correction of the misrecognized character, the translation object generation unit 13 stores the corrected string “NPN transistor” in the corrected character string 611. The translation processing unit 10 translates the corrected character string and stores the translation result in the translated character string 612. Because the translation processing unit 10 has completed translating the corrected character string, the corresponding translation status flag 613 holds 1.

The recognized character string “te W” is written in a rectangular area whose upper left vertex is at (160, 240) and whose lower right vertex is at (200, 260). Its number of matches 608 is 2, which is less than the threshold (= 3) (or, under the weight-based method, its weight sum 609 is 2.5, which is less than the threshold (= 3.0)). The character string determination unit 7 therefore determines that it is not a translation target character string, and the corresponding translation target flag 610 holds 0. As a result, the translation processing unit 10 does not translate “te W”.

The recognized character string “dry battery 6V” is written in a rectangular area whose upper left vertex is at (335, 410) and whose lower right vertex is at (460, 430). Its number of matches 608 is 4, which is equal to or greater than the threshold (= 3) (or, under the weight-based method, its weight sum 609 is 4.0, which is equal to or greater than the threshold (= 3.0)). The character string determination unit 7 therefore determines that it is a translation target character string, and the corresponding translation target flag 610 holds 1. However, “dry battery 6V” has not been translated yet; that is, the translated character string 612 holds no translation result, so the corresponding translation status flag 613 holds 0.
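The four FIG. 6 rows described above can be checked against both determination methods in a few lines. The numbers are the number of matches 608 and weight sum 609 stated in the text, with the thresholds of FIG. 5A and FIG. 5C; the function name is hypothetical.

```python
# (recognized string, number of matches 608, weight sum 609) as stated for FIG. 6
FIG6_ROWS = [
    ("resistance 100\u03a9", 4, 4.0),
    ("NPN transistor", 3, 3.5),
    ("te W", 2, 2.5),
    ("dry battery 6V", 4, 4.0),
]
NUM_OF_ITEMS_THRESHOLD = 3       # FIG. 5A
SUM_OF_WEIGHTS_THRESHOLD = 3.0   # FIG. 5C


def translation_target_flags(rows, by_weight):
    """Translation target flag 610 for each row under one determination method."""
    flags = []
    for _text, matches, wsum in rows:
        ok = wsum >= SUM_OF_WEIGHTS_THRESHOLD if by_weight else matches >= NUM_OF_ITEMS_THRESHOLD
        flags.append(1 if ok else 0)
    return flags


print(translation_target_flags(FIG6_ROWS, by_weight=False))  # [1, 1, 0, 1]
print(translation_target_flags(FIG6_ROWS, by_weight=True))   # [1, 1, 0, 1]
```

For this example both methods agree; only the noise string “te W” is excluded.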

The translation target character string determination process performed by the character string determination unit 7 is described below. First, the character string determination unit 7 checks the determination method defined in the character string determination criterion 8; that is, it checks whether the value of the variable Judge in the determination method 505 is the number of matching items (Num_of_items) or the weight sum of matching items (Sum_of_weights). When the value of Judge is the number of matching items, the character string determination unit 7 performs the process shown in FIG. 7A; when it is the weight sum of matching items, it performs the process shown in FIG. 7B.

FIG. 7A shows an example of a translation target character string determination process performed by the character string determination unit 7 when the Judge value is the number of matching items. The character string determination unit 7 acquires the value of the threshold value S1 defined in the determination method 505 of the character string determination criterion 8 (step 702). That is, the character string determination unit 7 holds the value of the variable Num_of_items_threshold as the threshold value S1. Next, the character string determination unit 7 determines whether or not there is an undetermined recognized character string in the character string information table 9 (step 703). If there is no undetermined recognized character string (step 703: No), the character string determination unit 7 ends the process.

If there is an undetermined recognized character string (step 703: Yes), the character string determination unit 7 analyzes it (step 704). In the analysis, the character string determination unit 7 refers to the word/character dictionary 17 and extracts the information needed to evaluate the determination criterion item contents 502, such as the number of characters constituting the recognized character string, the types of those characters, and the independent words contained in it.

Next, the character string determination unit 7 determines whether there remains a determination criterion item content 502 that has not yet been applied to the recognized character string (step 705). If there is one (step 705: Yes), the character string determination unit 7 collates the recognized character string against that determination criterion item content 502 (step 706) and determines whether the recognized character string satisfies it (step 707).

If the recognized character string does not satisfy the determination criterion item content 502 (step 707: No), the character string determination unit 7 stores the value 0 in the column for that determination criterion item in the determination criterion collation result 607 of the character string information table 9 (step 708) and returns to step 705. If the recognized character string satisfies the determination criterion item (step 707: Yes), the character string determination unit 7 stores the value 1 in that column (step 709) and returns to step 705.

If there is no unapplied determination criterion item (step 705: No), the character string determination unit 7 sums the values for each determination criterion item stored in the determination criterion collation result 607 of the character string information table 9, and the total value Is stored in the match number 608 (step 710). Next, the character string determination unit 7 determines whether or not the total value stored in the number of matches 608 is equal to or greater than the threshold value S1 acquired in Step 702 (Step 711).

If the total value is less than the threshold value S1 (step 711: No), the character string determination unit 7 stores the value 0 in the translation target flag 610 of the recognized character string in the character string information table 9 (step 712) and returns to step 703. If the total value is equal to or greater than the threshold value S1 (step 711: Yes), the character string determination unit 7 stores the value 1 in the translation target flag 610 of the recognized character string (step 713) and returns to step 703.
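A minimal sketch of the FIG. 7A flow follows, assuming each determination criterion item is supplied as a Python callable that returns True when the recognized string satisfies it (the callables in the usage line are toy stand-ins, not Rule_1 to Rule_4) and each table row is a dict with hypothetical keys. In the flowchart, an “unapplied” item is one not yet checked for the current string; the sketch additionally skips items whose application flag 504 is 0.

```python
from typing import Callable


def determine_by_count(rows: list, rules: dict[str, Callable[[str], bool]],
                       applied: dict, s1: int) -> None:
    """Sketch of FIG. 7A: fill collation results, match count, and target flag in place."""
    for row in rows:
        if "target_flag" in row:            # step 703: only undetermined strings
            continue
        text = row["recognized"]             # step 704: analysis is folded into the callables
        results = {}
        for item_id, rule in rules.items():  # step 705: next unchecked item
            if applied.get(item_id) != 1:
                continue                     # application flag 504 is 0: item not used
            results[item_id] = 1 if rule(text) else 0        # steps 706-709
        row["results"] = results                             # collation result 607
        row["match_count"] = sum(results.values())           # step 710: number of matches 608
        row["target_flag"] = 1 if row["match_count"] >= s1 else 0  # steps 711-713


rules = {"len2": lambda s: len(s) >= 2, "has_digit": lambda s: any(c.isdigit() for c in s)}
rows = [{"recognized": "ab1"}, {"recognized": "x"}, {"recognized": "zz"}]
determine_by_count(rows, rules, {"len2": 1, "has_digit": 1}, s1=2)
print([r["target_flag"] for r in rows])  # [1, 0, 0]
```

Rows that already carry a flag are left untouched, which corresponds to processing only undetermined strings in step 703.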

FIG. 7B shows an example of translation target character string determination processing by the character string determination unit 7 when the Judge value is the weighted sum of matching items. The character string determination unit 7 acquires the value of the threshold value S2 defined in the determination method 505 of the character string determination criterion 8 (step 714). That is, the character string determination unit 7 holds the value of the variable Sum_of_weights_threshold as the threshold value S2. Next, the character string determination unit 7 determines whether or not there is an undetermined recognized character string in the character string information table 9 (step 715).

If there is no undetermined recognized character string (step 715: No), the character string determination unit 7 ends the process. If there is an undetermined recognized character string (step 715: Yes), the character string determining unit 7 analyzes the recognized character string (step 716). Since this analysis is the same as the analysis performed in step 704, description thereof is omitted.

Next, the character string determination unit 7 determines whether there remains a determination criterion item content 502 that has not yet been applied to the recognized character string (step 717). If there is one (step 717: Yes), the character string determination unit 7 collates the recognized character string against that determination criterion item content 502 (step 718) and determines whether the recognized character string satisfies it (step 719).

If the recognized character string does not satisfy the determination criterion item (step 719: No), the character string determination unit 7 stores the value 0 in the column for that item in the determination criterion collation result 607 of the character string information table 9 (step 720) and returns to step 717. If the recognized character string satisfies the determination criterion item (step 719: Yes), the character string determination unit 7 stores in that column the weight value 503 defined for the item in the character string determination criterion 8 (step 721) and returns to step 717.

In step 717, when there is no unapplied determination criterion item (step 717: No), the character string determination unit 7 sums the values for each determination criterion item stored in the determination criterion matching result 607 of the character string information table 9. The total value is stored in the weight sum 609 (step 722). Next, the character string determination unit 7 determines whether or not the total value stored in the weight sum 609 is equal to or greater than the threshold value S2 acquired in Step 714 (Step 723).

If the total value is less than the threshold value S2 (step 723: No), the character string determination unit 7 stores the value 0 in the translation target flag 610 of the recognized character string in the character string information table 9 (step 724) and returns to step 715. If the total value is equal to or greater than the threshold value S2 (step 723: Yes), the character string determination unit 7 stores the value 1 in the translation target flag 610 of the recognized character string (step 725) and returns to step 715.
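The dispatch on Judge, together with both flows, could share one loop as sketched below: a satisfied item stores 1 (step 709) under Num_of_items or its weight value 503 (step 721) under Sum_of_weights. The criterion dict layout and the function name are hypothetical, and the rule callables stand in for the analysis of steps 704 and 716.

```python
def determine(rows: list, rules: dict, criterion: dict) -> None:
    """Dispatch on Judge (FIG. 7A vs. FIG. 7B) and fill each undetermined row in place."""
    by_weight = criterion["Judge"] == "Sum_of_weights"
    threshold = (criterion["Sum_of_weights_threshold"] if by_weight   # S2
                 else criterion["Num_of_items_threshold"])            # S1
    for row in rows:
        if "target_flag" in row:                     # step 715 / 703
            continue
        text = row["recognized"]
        results = {}
        for item_id, rule in rules.items():          # step 717 / 705
            item = criterion["items"][item_id]
            if item["applied"] != 1:                 # application flag 504
                continue
            if rule(text):                           # steps 718-719 / 706-707
                results[item_id] = item["weight"] if by_weight else 1   # step 721 / 709
            else:
                results[item_id] = 0                                    # step 720 / 708
        row["results"] = results
        total = sum(results.values())
        row["weight_sum" if by_weight else "match_count"] = total        # step 722 / 710
        row["target_flag"] = 1 if total >= threshold else 0              # steps 723-725 / 711-713


rules = {"len2": lambda s: len(s) >= 2, "has_digit": lambda s: any(c.isdigit() for c in s)}
crit = {
    "Judge": "Sum_of_weights", "Sum_of_weights_threshold": 1.5, "Num_of_items_threshold": 2,
    "items": {"len2": {"weight": 0.5, "applied": 1}, "has_digit": {"weight": 1.0, "applied": 1}},
}
rows = [{"recognized": "ab1"}, {"recognized": "zz"}, {"recognized": "7"}]
determine(rows, rules, crit)
print([r["target_flag"] for r in rows])  # [1, 0, 0]
```

Keeping one loop makes the special-case relationship visible: with all weights 1.0 the two branches store identical values.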

FIG. 8 shows an example of a list output screen for the character strings determined to be translation target character strings. The list output screen in FIG. 8 is output and displayed based on the data in the character string information table 9 shown in FIG. 6. The character string list output screen 800 includes an output image sub-screen 801, which outputs the electronic image document together with the translation results of the character strings in it, and a translation status sub-screen 802, which outputs a list of the translation target character strings and their translation results. The translation status sub-screen 802 includes a status 803 showing the translation work status of each character string, a translation target character string 804 showing each recognized character string to be translated, and a translation result 805 showing the translation result for each translation target character string 804.

By selecting any of the item headings status 803, translation target character string 804, or translation result 805, the user can sort the list by the selected item in descending or ascending order. This lets the user, for example, easily find character strings that have not yet been translated, or check whether the same character string has been translated inconsistently.
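The sorting and consistency check described above might look like this. The row layout, the status labels, and the helper names `sort_rows` and `inconsistent_translations` are hypothetical; the sketch only illustrates how sorting brings identical strings together and how differing results for one string can be detected.

```python
from collections import defaultdict

# Hypothetical rows of the translation status sub-screen: (status 803, target string 804, result 805).
ROWS = [
    ("translated", "電源回路", "power supply circuit"),
    ("untranslated", "接地", ""),
    ("translated", "電源回路", "power circuit"),
]


def sort_rows(rows, column, descending=False):
    """Sort by one column (0 = status, 1 = target string, 2 = result); sorted() is stable."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def inconsistent_translations(rows):
    """Map each target string to its set of translation results when it has more than one."""
    results = defaultdict(set)
    for _status, source, result in rows:
        if result:  # skip strings that have no translation yet
            results[source].add(result)
    return {source: found for source, found in results.items() if len(found) > 1}
```

Because `sorted()` is stable, rows with equal keys keep their original relative order, so repeated occurrences of a string stay in display order after sorting.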

The translation target character string 804 is also linked to its description position on the output image sub-screen 801. When the user designates any character string in the translation target character string 804 column, the output image sub-screen 801 displays the part of the document where the designated character string is written. The character string information management unit 16 obtains this position by referring to the description position 602 in the character string information table 9. The user can thus consult the list of translation target character strings on the translation status sub-screen 802 in conjunction with the output image sub-screen 801, which reduces translation omissions for character strings in the electronic image document.
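The link between the list and the image can be reduced to a lookup on the description position 602. In this minimal sketch, the `Entry` class, the `(x, y, width, height)` pixel format for positions, and `positions_of` are assumptions; the source does not specify how positions are encoded. Returning a list allows for a string that is written at several places in the drawing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    recognized: str                      # cf. recognized character string 601
    position: tuple[int, int, int, int]  # cf. description position 602, assumed (x, y, width, height)
    target_flag: int                     # cf. translation target flag 610


def positions_of(table: list[Entry], selected: str) -> list[tuple[int, int, int, int]]:
    """Return every position where the selected translation target string is written."""
    return [e.position for e in table if e.target_flag == 1 and e.recognized == selected]


# Hypothetical table contents for illustration.
TABLE = [
    Entry("電源回路", (10, 20, 80, 16), 1),
    Entry("R1", (5, 5, 12, 10), 0),
    Entry("電源回路", (300, 40, 80, 16), 1),
]
```

The image view would then scroll to, or highlight, each returned rectangle.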

The translation status sub-screen 802 also displays, at the top, the number of character strings to be translated and their total number of characters. In the example of FIG. 8, the translation status sub-screen 802 shows that there are six character strings to be translated, totaling 40 characters. The character string information management unit 16 calculates these values from the recognized character strings 601 whose translation target flag is 1 in the character string information table 9. From this display, the user can grasp the amount of text to be translated in the electronic image document together with the list of character strings, and can thus estimate the translation workload.
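The two figures at the top of the translation status sub-screen 802 follow directly from the rows whose translation target flag is 1. A minimal sketch, assuming rows are dictionaries with hypothetical keys `recognized` and `target_flag`, and that characters are counted as Unicode code points (so a space would count as one character):

```python
def summarize(table):
    """Return (number of translation target strings, their total character count)."""
    targets = [row["recognized"] for row in table if row["target_flag"] == 1]
    return len(targets), sum(len(text) for text in targets)


# Hypothetical rows; the values in FIG. 8 (six strings, 40 characters) are not reproduced here.
ROWS = [
    {"recognized": "電源回路", "target_flag": 1},
    {"recognized": "R1", "target_flag": 0},
    {"recognized": "接地端子", "target_flag": 1},
]
```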

FIG. 9 shows an example of a screen for changing the character string determination criterion 8. When the user presses the determination criterion change button 806 shown in FIG. 8, a determination criterion change screen 900 for changing the character string determination criterion 8 is displayed. On this screen, the user can change the constituent elements and values (the values enclosed in [] on the determination criterion change screen 900) of each determination criterion item that makes up the character string determination criterion 8. FIG. 9 shows an example in which the threshold value (Num_of_items_threshold) 902 of the character string determination criterion 8 shown in FIG. 5A is changed from 3 to 4, so that a recognized character string is determined to be a translation target character string only when it satisfies all four kinds of criterion items.

When the user enters the desired changes and presses the update button 903, the content displayed on the determination criterion change screen 900 immediately before the button was pressed is reflected in the character string determination criterion 8. When the user presses the cancel button 904, the changes are discarded and the character string determination criterion 8 is not updated.
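The update and cancel behavior amounts to editing a draft copy of the criterion and committing it only on Update. The `CriterionEditor` class and its method names are hypothetical; only the `Num_of_items_threshold` key comes from the source.

```python
import copy


class CriterionEditor:
    """Hypothetical model of change screen 900: edits go to a draft until Update is pressed."""

    def __init__(self, criterion: dict):
        self.criterion = criterion             # live character string determination criterion 8
        self.draft = copy.deepcopy(criterion)  # values currently shown on the screen

    def set_value(self, key, value):
        self.draft[key] = value

    def update(self):
        """Update button 903: reflect the displayed values in criterion 8."""
        self.criterion.clear()
        self.criterion.update(copy.deepcopy(self.draft))

    def cancel(self):
        """Cancel button 904: discard the edits; criterion 8 stays unchanged."""
        self.draft = copy.deepcopy(self.criterion)
```

Updating the criterion dictionary in place, rather than rebinding it, means any other component holding a reference to criterion 8 sees the committed values.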

FIG. 10 shows an example of an output screen after the determination process has been re-executed with the changed character string determination criterion 8. When the user presses the redisplay button 807 after updating the character string determination criterion 8 with the contents illustrated in FIG. 9, the character string determination unit 7 re-executes the determination process using the updated character string determination criterion 8 and stores the result in the character string information table 9. The translation status sub-screen 802 is then redisplayed based on the updated information in the character string information table 9.

Comparing FIG. 10 with FIG. 8 shows that, under the updated determination criterion, the recognized character string "NPN transistor" is no longer regarded as a translation target character string and is excluded from the list displayed on the translation status sub-screen 802. Because "NPN transistor" is no longer a translation target, the number of translation target character strings and the number of translation target characters are reduced accordingly.
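The effect of raising Num_of_items_threshold from 3 to 4 can be illustrated with a count-based re-run. If the threshold counts satisfied items, then a string included at 3 and excluded at 4 must satisfy exactly three of the four items, so that is the pattern assumed for "NPN transistor" below; which three items it meets is not stated in the source. The second row, `rerun`, and the row layout are hypothetical.

```python
def rerun(table, num_of_items_threshold):
    """Re-run the determination with a count threshold and recompute the sub-screen summary."""
    for row in table:
        # sum() over booleans counts how many criterion items the string satisfies.
        row["target_flag"] = 1 if sum(row["satisfied"]) >= num_of_items_threshold else 0
    targets = [row["recognized"] for row in table if row["target_flag"] == 1]
    return targets, len(targets), sum(len(text) for text in targets)


# Satisfied-item patterns (four booleans each); the second row is invented for illustration.
TABLE = [
    {"recognized": "NPN transistor", "satisfied": [True, True, True, False]},
    {"recognized": "電源回路", "satisfied": [True, True, True, True]},
]
```

With a threshold of 3 both strings are targets; with 4 only the second remains, and both the string count and the character count drop.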

As described above, because the contents of the character string determination criterion 8 can be updated, the electronic image document editing system according to the present embodiment lets the user set a character string determination criterion 8 suited to the electronic image document being edited, so that the character strings to be edited can be extracted with high accuracy.

In summary, from among the character information recognized by character recognition processing in an electronic image document that mixes figures (non-character information) and character information, such as a design drawing, the electronic image document editing system of the present embodiment can identify the character information to be edited with high accuracy. As a result, the user can easily and accurately grasp where the character information to be edited is written and how much of it there is, which in turn improves the efficiency and quality of editing work.

The present invention is not limited to the embodiment described above and includes various modifications. For example, the above embodiment has been described in detail to make the present invention easy to understand, and the invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. It is also possible to add, delete, or replace other configurations for part of the configuration of each embodiment.

Each of the configurations, functions, processing units, processing means, and the like described above may be realized partly or wholly in hardware, for example by designing them as an integrated circuit. Each of the configurations, functions, and the like described above may also be realized in software, with a processor interpreting and executing a program that implements each function. Information such as the programs, tables, and files that implement each function can be stored in a recording device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.

Claims

An electronic image document editing system for editing a character string recognized from an electronic image document,
Including a processor and a storage device;
The storage device stores one or more character string determination criteria that are criteria for determining whether or not a character string including one or more characters is an edit target character string;
The processor:
Accepts input of an electronic image document;
Recognizes, in the input electronic image document, a character string composed of one or more characters of a plurality of types of characters; and
Determines that the recognized character string is an edit target character string when the recognized character string satisfies the character string determination criteria;
Wherein the character string determination criteria include at least one of:
A first determination criterion in which the recognized character string is composed of a number of characters equal to or greater than a first threshold value (the first threshold value is an integer of 2 or more);
A second determination criterion in which the recognized character string includes a partial character string composed of a number of characters of a first type group, which is a part of the plurality of types, equal to or greater than a second threshold value (the second threshold value is an integer of 2 or more);
A third determination criterion in which the recognized character string includes a character of a second type group, which is a part of the plurality of types; and
A fourth determination criterion in which the recognized character string includes a content word.
The electronic image document editing system according to claim 1,
The storage device further holds a weight value corresponding to each of the character string determination criteria,
The processor:
Calculates a sum of the weight values corresponding to the character string determination criteria satisfied by the recognized character string; and
Determines that the recognized character string is an edit target character string when the sum of the weight values is equal to or greater than a third threshold value.
The electronic image document editing system according to claim 1,
The character string determination criteria include the first determination criterion and the second determination criterion, and
The processor determines that the recognized character string is an edit target character string when the recognized character string satisfies both the first determination criterion and the second determination criterion.
The electronic image document editing system according to claim 1,
The electronic image document editing system, wherein the first threshold is 2.
The electronic image document editing system according to claim 1,
The electronic image document editing system, wherein the first type group consists of kanji, hiragana, and katakana.
The electronic image document editing system according to claim 1,
The electronic image document editing system, wherein the second threshold is 2.
The electronic image document editing system according to claim 1,
The electronic image document editing system, wherein the second type group consists of the character types other than symbols, numeral counters, numbers, alphabetic characters, and non-Jōyō kanji.
The electronic image document editing system according to claim 1,
The electronic image document editing system, wherein the content word is an independent word.
The electronic image document editing system according to claim 1,
The electronic image document editing system, wherein the processor outputs a list of the edit target character strings together with the input electronic image document.
The electronic image document editing system according to claim 1,
The electronic image document editing system, wherein the processor outputs a total number of the edit target character strings and a total number of characters of the edit target character strings.
The electronic image document editing system according to claim 1,
The processor:
Accepts an input for changing the character string determination criteria;
Stores the changed character string determination criteria in the storage device; and
Determines that the recognized character string is an edit target character string when the recognized character string satisfies the changed character string determination criteria.
A method of editing a character string recognized from an electronic image document, in which an electronic image document editing system determines whether a character string consisting of one or more characters is an edit target character string,
The electronic image document editing system holds one or more character string determination criteria that are criteria for determining whether or not a character string consisting of one or more characters is an edit target character string,
The method comprising:
Accepting input of an electronic image document;
Recognizing, in the input electronic image document, a character string composed of one or more characters of a plurality of types of characters; and
Determining that the recognized character string is an edit target character string when the recognized character string satisfies the character string determination criteria;
Wherein the character string determination criteria include at least one of:
A first determination criterion in which the recognized character string is composed of a number of characters equal to or greater than a first threshold value (the first threshold value is an integer of 2 or more);
A second determination criterion in which the recognized character string includes a partial character string composed of a number of characters of a first type group, which is a part of the plurality of types, equal to or greater than a second threshold value (the second threshold value is an integer of 2 or more);
A third determination criterion in which the recognized character string includes a character of a second type group, which is a part of the plurality of types; and
A fourth determination criterion in which the recognized character string includes a content word.
A program executed in an electronic image document editing system for editing a character string recognized from an electronic image document,
The electronic image document editing system includes a processor and a storage device,
The storage device stores one or more character string determination criteria that are criteria for determining whether or not a character string including one or more characters is an edit target character string;
The program causes the processor to execute:
A procedure for accepting input of an electronic image document;
A procedure for recognizing, in the input electronic image document, a character string composed of one or more characters of a plurality of types of characters; and
A procedure for determining that the recognized character string is an edit target character string when the recognized character string satisfies the character string determination criteria;
Wherein the character string determination criteria include at least one of:
A first determination criterion in which the recognized character string is composed of a number of characters equal to or greater than a first threshold value (the first threshold value is an integer of 2 or more);
A second determination criterion in which the recognized character string includes a partial character string composed of a number of characters of a first type group, which is a part of the plurality of types, equal to or greater than a second threshold value (the second threshold value is an integer of 2 or more);
A third determination criterion in which the recognized character string includes a character of a second type group, which is a part of the plurality of types; and
A fourth determination criterion in which the recognized character string includes a content word.
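The four determination criteria recited in the claims, with the threshold values of 2 and the type groups given in the dependent claims, can be sketched as follows. This is an illustrative approximation, not the claimed implementation: character types are assigned from approximate Unicode block ranges, the counter and non-Jōyō kanji sets are tiny hypothetical samples, and the content-word check uses a hypothetical word list where a real system would presumably use morphological analysis to find independent words.

```python
from __future__ import annotations

# First type group: kanji, hiragana, katakana. Counter and non-Jōyō kanji are still
# kanji, so they are counted here too (assumption).
FIRST_GROUP = {"kanji", "counter", "non_joyo_kanji", "hiragana", "katakana"}
# Second type group: every type except these.
EXCLUDED_FROM_SECOND = {"symbol", "counter", "number", "alphabet", "non_joyo_kanji"}

COUNTERS = set("個本枚台")        # hypothetical sample of numeral counters
NON_JOYO = set("鋲")              # hypothetical sample of non-Jōyō kanji
CONTENT_WORDS = {"電源", "回路"}  # hypothetical stand-in for morphological analysis


def char_type(ch: str) -> str:
    """Classify one character using approximate Unicode block ranges."""
    cp = ord(ch)
    if 0x3041 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs
        if ch in COUNTERS:
            return "counter"
        if ch in NON_JOYO:
            return "non_joyo_kanji"
        return "kanji"
    if ch.isdigit():
        return "number"
    if ch.isascii() and ch.isalpha():
        return "alphabet"
    return "symbol"


def longest_run(s: str, group: set[str]) -> int:
    """Length of the longest partial string whose characters all belong to `group`."""
    best = run = 0
    for ch in s:
        run = run + 1 if char_type(ch) in group else 0
        best = max(best, run)
    return best


def evaluate(s: str, t1: int = 2, t2: int = 2) -> dict[str, bool]:
    """Check the four criteria; t1 = t2 = 2 follows the dependent claims."""
    return {
        "first": len(s) >= t1,
        "second": longest_run(s, FIRST_GROUP) >= t2,
        "third": any(char_type(ch) not in EXCLUDED_FROM_SECOND for ch in s),
        "fourth": any(word in s for word in CONTENT_WORDS),
    }


def is_target_first_and_second(s: str) -> bool:
    """Combination in which both the first and the second criterion must hold."""
    result = evaluate(s)
    return result["first"] and result["second"]
```

A part label such as "R1" passes the length check but fails the other three, which is the kind of string the criteria are meant to filter out of the editing workload.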