CN110852142A - Document analysis device, document analysis method, document analysis program, and document analysis system

Info

Publication number
CN110852142A
Authority
CN
China
Prior art keywords
document
unit
characters
character
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910768003.2A
Other languages
Chinese (zh)
Inventor
藤泽正人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IB Research KK
Original Assignee
IB Research KK
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by IB Research KK
Publication of CN110852142A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/414: Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention makes it possible to arrange a drawing of a document in an orientation in which a user can read the reference numerals. A document analysis device (1) according to an embodiment of the present invention includes: a reference numeral extraction unit (112) that extracts characters contained in a document; an orientation determination unit (114) that determines the orientation in which a drawing included in the document is arranged, on the basis of the characters extracted by the reference numeral extraction unit (112); and an output unit (115) that outputs information for arranging the drawing in a predetermined orientation when the orientation determined by the orientation determination unit (114) differs from the predetermined orientation.

Description

Document analysis device, document analysis method, document analysis program, and document analysis system
Technical Field
The present invention relates to a document analysis device, a document analysis method, a document analysis program, and a document analysis system for analyzing documents.
Background
Documents such as patent publications include text and drawings that explain technical matters and methods. Since a user needs to read the text and the drawings while comparing them, it takes a lot of time and effort to understand the contents of the document. Patent document 1 describes an apparatus that assists document viewing by displaying the names of reference numerals extracted from the drawings in association with the reference numerals extracted from the text.
Patent document 1: Japanese Patent Laid-Open Publication No. 2013-92916
A document may contain drawings arranged in different orientations. That is, some drawings are arranged in an orientation in which a user can read the reference numerals as is, and others are arranged in an orientation in which the user cannot. When viewing a drawing arranged in an orientation in which the reference numerals cannot be read, the user needs to rotate the drawing manually, which is inconvenient. The device described in patent document 1 cannot automatically arrange drawings in an orientation in which a user can read the reference numerals.
Disclosure of Invention
The present invention has been made in view of the above problems, and an object of the present invention is to provide a document analysis device, a document analysis method, a document analysis program, and a document analysis system that can arrange a drawing of a document in an orientation in which a user can read the reference numerals.
The document analysis device according to aspect 1 of the present invention includes: an extraction unit that extracts characters contained in a document; a determination unit that determines a direction in which a drawing included in the document is arranged, based on the characters extracted by the extraction unit; and an output unit that outputs information for arranging the drawing in a predetermined direction when the direction determined by the determination unit is different from the predetermined direction.
The extraction unit may extract one or more 1st characters corresponding to a reference character, which indicates a character to be extracted, by comparing the reference character with pixels of the drawing while scanning a region of the drawing, a region of a structural diagram, or a page of the document including at least one of the drawing and the structural diagram along a 1st direction, and may extract one or more 2nd characters corresponding to the reference character by comparing the reference character with the pixels of the drawing while scanning the same region or page along a 2nd direction orthogonal to the 1st direction. The determination unit may then determine the direction by comparing the one or more 1st characters extracted by the scan in the 1st direction with the one or more 2nd characters extracted by the scan in the 2nd direction.
The extraction unit may extract one or more 1st characters corresponding to the character to be extracted and one or more 2nd characters corresponding to the character to be extracted after rotation by comparing reference characters, which indicate the character to be extracted and that character after rotation, with pixels of the drawing while scanning a region of the drawing, a region of a structural diagram, or a page of the document including at least one of the drawing and the structural diagram. The determination unit may then determine the direction by comparing the one or more 1st characters and the one or more 2nd characters extracted by the scan.
The document analysis device may further include a 2nd extraction unit that extracts one or more 3rd characters corresponding to a predetermined character from text included in the document, and the determination unit may determine the direction by comparing the degree of coincidence between the one or more 1st characters and the one or more 3rd characters with the degree of coincidence between the one or more 2nd characters and the one or more 3rd characters.
The 2nd extraction unit may extract a name associated with the 3rd character from the text, and the output unit may output information for arranging the drawing in the predetermined direction and information for displaying the name associated with the 3rd character on the drawing.
In the document analysis method according to aspect 2 of the present invention, a processor executes: a step of extracting characters contained in a document; a step of determining a direction in which a drawing included in the document is arranged, based on the characters extracted in the extracting step; and a step of outputting information for arranging the drawing in a predetermined direction when the direction determined in the determining step is different from the predetermined direction.
A document analysis program according to aspect 3 of the present invention causes a computer to execute: a step of extracting characters contained in a document; a step of determining a direction in which a drawing included in the document is arranged, based on the characters extracted in the extracting step; and a step of outputting information for arranging the drawing in a predetermined direction when the direction determined in the determining step is different from the predetermined direction.
A document analysis system according to aspect 4 of the present invention includes a document management device and a document analysis device. The document management device includes: a storage unit that stores documents; and a supply unit that supplies a document stored in the storage unit to the document analysis device. The document analysis device includes: an extraction unit that extracts characters included in the document supplied from the document management device; a determination unit that determines a direction in which a drawing included in the document is arranged, based on the characters extracted by the extraction unit; and an output unit that outputs information for arranging the drawing in a predetermined direction to the document management device when the direction determined by the determination unit is different from the predetermined direction.
The present invention has the effect that the drawings of a document can be arranged in an orientation in which the user can read the reference numerals.
Drawings
Fig. 1 is a schematic diagram of a document analysis system according to an embodiment.
Fig. 2 is a block diagram of a document analysis system according to an embodiment.
Fig. 3 is a diagram showing exemplary drawings included in document information.
Fig. 4 is a schematic diagram of the 1st reference numeral extraction method performed by the reference numeral extraction unit.
Fig. 5 is a schematic diagram of the 2nd reference numeral extraction method performed by the reference numeral extraction unit.
Fig. 6 is a schematic diagram of a method by which the reference numeral name extraction unit extracts a reference numeral and its name from text.
Fig. 7 is a schematic diagram of exemplary orientation information stored in the orientation information storage unit and exemplary reference numeral information stored in the reference numeral information storage unit.
Fig. 8 is a schematic diagram of a method of arranging a drawing in a predetermined orientation.
Fig. 9 is a schematic diagram of a drawing on which reference numeral labels are superimposed.
Fig. 10 is a timing chart of the document analysis method according to the embodiment.
Fig. 11 is a schematic diagram for explaining a method of highlighting a reference numeral or a name of a reference numeral.
Fig. 12 is a schematic diagram for explaining a method of highlighting a reference numeral or a name of a reference numeral.
Description of the reference numerals
S: document analysis system; 1: document analysis device; 11: control unit; 112: reference numeral extraction unit; 113: reference numeral name extraction unit; 114: orientation determination unit; 115: output unit; 2: document management device; 21: control unit; 212: document information providing unit; 22: storage unit; 221: document information storage unit.
Detailed Description
[ Overview of document analysis system S ]
Fig. 1 is a schematic diagram of a document analysis system S according to the present embodiment. The document analysis system S includes a document analysis device 1, a document management device 2, and a user terminal 3. The document analysis system S may be a server or a terminal.
The user terminal 3 is a computer having a display unit 31 and an operation unit 32. The user terminal 3 communicates with the document management apparatus 2 via a network N such as the internet or a local area network. The display unit 31 includes a display device such as a liquid crystal display for displaying documents such as patent publications. The operation unit 32 includes an operation device such as a keyboard and a mouse for receiving a user operation. The display unit 31 and the operation unit 32 may be integrally configured by using a touch panel capable of detecting a user contact position as the display unit 31.
The document management apparatus 2 is a computer that stores information of documents and provides information for displaying the documents to the user terminal 3. The document management apparatus 2 communicates with the document analysis apparatus 1 and the user terminal 3 via the network N. The document is, for example, a patent document such as a patent publication, and includes text and drawings. The drawings contain characters indicating reference numerals, and the text contains the names associated with the reference numerals of the drawings. In this way, the document associates the contents of the drawings with the contents of the text to explain technical matters and methods. The document is not limited to a patent document, and may be any other document in which the contents of the drawings and the contents of the text are associated with each other by reference numerals.
The document analysis device 1 is a computer that analyzes documents received from the document management device 2 and supplies the analysis result to the document management device 2. The document analysis device 1 communicates with the document management device 2 via the network N.
Next, an outline of the processing performed by the document analysis system S will be described. First, the user specifies a document to be displayed by using the operation unit 32 of the user terminal 3 (a). The document management apparatus 2 reads the document to be displayed, as designated by the user terminal 3, from the storage unit and transmits it to the document analysis apparatus 1 as document information (b).
The document analysis device 1 analyzes the document information received from the document management device 2 and transmits, as analysis information, information for arranging the drawings of the document in a predetermined orientation and information on reference numeral labels, each indicating the name corresponding to a reference numeral on a drawing, to the document management device 2 (c). The predetermined orientation of a drawing is the orientation in which a person (user) can read the reference numerals contained in the drawing as is.
Then, based on the analysis information received from the document analysis apparatus 1, the document management apparatus 2 generates display information for displaying the document with the reference numeral labels superimposed on the drawings arranged in the predetermined orientation, and transmits the display information to the user terminal 3 (d). The user terminal 3 displays the document on the display unit 31 based on the display information received from the document management apparatus 2.
In this way, the document analysis system S can arrange the drawings of a document in a predetermined orientation and can extract the reference numerals from the drawings. In addition, since the document analysis system S can superimpose and display a reference numeral label indicating the name corresponding to a reference numeral on a drawing arranged in the predetermined orientation, it becomes easy for a user to understand the meaning of the reference numerals in a drawing while comparing the text with the drawing.
[ Structure of document analysis System S ]
Fig. 2 is a block diagram of the document analysis system S according to the present embodiment. In fig. 2, arrows indicate the flow of main data, and data other than the data shown in fig. 2 may be used. In fig. 2, each block represents a functional unit, not a hardware (device) unit. Therefore, the blocks shown in fig. 2 may be implemented in a single device or distributed across a plurality of devices. Data transmission and reception between the blocks may be performed by any means, such as a data bus, a network, or a removable storage medium.
The document management apparatus 2 includes a control unit 21, a storage unit 22, and a communication unit 23. The control unit 21 includes a user input receiving unit 211, a document information providing unit 212, an analysis information receiving unit 213, and a display control unit 214. The storage unit 22 includes a document information storage unit 221, an orientation information storage unit 222, and a reference numeral information storage unit 223.
The communication unit 23 is a communication interface for performing communication between the document analysis device 1 and the user terminal 3. The communication unit 23 includes a processor, a connector, an antenna, and the like for performing communication. The communication unit 23 performs predetermined processing on a communication signal received from the document analysis device 1 or the user terminal 3 to acquire data. The communication unit 23 performs predetermined processing on data to be transmitted externally to generate a communication signal, and transmits the generated communication signal to the document analysis device 1 or the user terminal 3.
The storage unit 22 is a storage medium including a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk drive, and the like. The storage unit 22 stores a program executed by the control unit 21 in advance. The document information storage unit 221, the orientation information storage unit 222, and the reference numeral information storage unit 223 may be storage areas on the storage unit 22, or may be databases configured on the storage unit 22.
The document information storage unit 221 stores, in advance, document information representing a document such as a patent publication in association with identification information (for example, a document number) for identifying the document. A document contains one or more text pages and one or more drawing pages. A text page is a page containing text whose characters are arranged upright. A drawing page is a page on which a drawing is recorded. The drawing may be a structural diagram such as a chemical formula. A drawing page may contain only one drawing, may contain multiple drawings in the same orientation, or may mix multiple drawings whose characters are arranged in different orientations.
Identification information (for example, a drawing number) for identifying a drawing is assigned to each drawing of the document. The document information storage unit 221 may store the text and drawings of one document as a single file, or may store the text and the drawings as separate files. The orientation information storage unit 222 stores orientation information indicating the orientation of each drawing of the document. The reference numeral information storage unit 223 stores reference numeral information indicating the names and positions of the reference numerals included in the drawings.
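The three storage areas described above can be pictured as simple keyed records. The following is a minimal sketch only; the class and field names (`DocumentStore`, `doc_id`, `figure_id`, `position`, and so on) are hypothetical and do not appear in the source, and the coordinate convention for positions is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceNumeral:
    numeral: str     # e.g. "112"
    name: str        # e.g. "reference numeral extraction unit"
    position: tuple  # (x, y) on the drawing; convention is an assumption


@dataclass
class DocumentStore:
    # Document information storage unit 221: doc_id -> document data
    documents: dict = field(default_factory=dict)
    # Orientation information storage unit 222:
    # (doc_id, figure_id) -> rotation in degrees from the predetermined orientation
    orientations: dict = field(default_factory=dict)
    # Reference numeral information storage unit 223:
    # (doc_id, figure_id) -> list of ReferenceNumeral
    numerals: dict = field(default_factory=dict)

    def is_analyzed(self, doc_id):
        """True when both orientation and numeral info exist for the document."""
        has_orientation = any(key[0] == doc_id for key in self.orientations)
        has_numerals = any(key[0] == doc_id for key in self.numerals)
        return has_orientation and has_numerals
```

Keying the orientation and numeral records by both document and drawing identifiers reflects the statement that each drawing carries its own identification information.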
The control unit 21 is, for example, a processor such as a CPU (Central Processing Unit), and functions as a user input receiving unit 211, a document information providing unit 212, an analysis information receiving unit 213, and a display control unit 214 by executing a program stored in the storage unit 22. At least a part of the functions of the control unit 21 may be performed by a circuit. At least a part of the functions of the control unit 21 may be executed by a program executed via a network.
The document analysis device 1 includes a control unit 11, a storage unit 12, and a communication unit 13. The control unit 11 includes a document information acquisition unit 111, a reference numeral extraction unit 112, a reference numeral name extraction unit 113, an orientation determination unit 114, and an output unit 115.
The communication unit 13 is a communication interface for communicating with the document management apparatus 2. The communication unit 13 includes a processor, a connector, an antenna, and the like for performing communication. The communication unit 13 performs predetermined processing on the communication signal received from the document management apparatus 2 to acquire data. The communication unit 13 performs predetermined processing on data to be transmitted to the outside to generate a communication signal, and transmits the generated communication signal to the document management apparatus 2.
The storage unit 12 is a storage medium including a ROM, a RAM, a hard disk drive, and the like. The storage unit 12 stores a program executed by the control unit 11 in advance. The control unit 11 is, for example, a processor such as a CPU, and functions as a document information acquisition unit 111, a reference numeral extraction unit 112, a reference numeral name extraction unit 113, an orientation determination unit 114, and an output unit 115 by executing a program stored in the storage unit 12. At least a part of the functions of the control unit 11 may be executed by a circuit. At least a part of the functions of the control unit 11 may be executed by a program executed via a network.
The document analysis system S according to the present embodiment is not limited to the specific configuration shown in fig. 2. For example, the document analysis device 1 and the document management device 2 may be integrated into 1 device.
[ Description of document analysis method ]
In the following, a document analysis method performed by the document analysis system S will be described. First, the user designates a document to be displayed by using the operation unit 32 of the user terminal 3. The user terminal 3 transmits identification information of a document specified by a user as a display target to the document management apparatus 2. The identification information of the document is, for example, a unique number assigned to the document.
In the document management apparatus 2, the user input receiving unit 211 receives identification information of a document designated as a display target from the user terminal 3 via the communication unit 23. When the orientation information and the reference numeral information associated with the identification information of the document to be displayed are already stored in the orientation information storage unit 222 and the reference numeral information storage unit 223 (that is, when this is not the first display), the display control unit 214 executes a display control method described later.
When the orientation information and the reference numeral information associated with the identification information of the document to be displayed are not stored in the orientation information storage unit 222 and the reference numeral information storage unit 223 (that is, when this is the first display), the document information providing unit 212 acquires the document information associated with the identification information received by the user input receiving unit 211 from the document information storage unit 221. The document information providing unit 212 then provides the acquired document information to the document analysis device 1 via the communication unit 23 together with the identification information of the document. In the document analysis device 1, the document information acquisition unit 111 acquires the identification information and document information of the document designated as a display target from the document management device 2 via the communication unit 13.
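The branch between the first display and later displays can be sketched as a simple lookup before analysis. This is an illustrative sketch, not the patented implementation: `handle_display_request` and the `analyze` callable are hypothetical names, `analyze` stands in for the round trip to the document analysis device 1, and storing the results after the first analysis is an assumption inferred from the description of the storage units.

```python
def handle_display_request(doc_id, orientation_store, numeral_store, analyze):
    """Return (orientation_info, numeral_info), analyzing only on first display.

    `analyze` is a hypothetical callable standing in for the document
    analysis device 1; it returns (orientation_info, numeral_info).
    """
    if doc_id in orientation_store and doc_id in numeral_store:
        # Not the first display: reuse the stored analysis results.
        return orientation_store[doc_id], numeral_store[doc_id]
    # First display: have the document analyzed, then store the results
    # (storing at this point is an assumption).
    orientation_info, numeral_info = analyze(doc_id)
    orientation_store[doc_id] = orientation_info
    numeral_store[doc_id] = numeral_info
    return orientation_info, numeral_info
```

With this shape, a second request for the same document does not call `analyze` again.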
Next, the reference numeral extraction unit 112 extracts characters representing reference numerals from the drawings included in the document information acquired by the document information acquisition unit 111. When the document information acquired by the document information acquisition unit 111 includes a plurality of drawings, the reference numeral extraction unit 112 extracts the reference numerals from each drawing and associates the extracted reference numerals with the identification information of that drawing. Fig. 3 (a) and fig. 3 (b) are diagrams showing drawings on an exemplary drawing page F included in document information. The "a 1" and "a 2" shown in fig. 3 (a) and fig. 3 (b) are reference numerals appearing in the drawings.
As described above, the predetermined orientation of a drawing is the orientation in which a person (user) can read the reference numerals included in the drawing as is. Because the drawing shown in fig. 3 (a) is arranged in the predetermined orientation, the user can read the reference numerals "a 1" and "a 2" as is. On the other hand, because the drawing shown in fig. 3 (b) is arranged in an orientation rotated 90 degrees to the left from the predetermined orientation, the user cannot read the reference numerals "a 1" and "a 2" as is.
To determine whether a drawing is oriented as in fig. 3 (a) or fig. 3 (b) and to extract the reference numerals in the drawing with high accuracy, the reference numeral extraction unit 112 performs at least one of the 1st reference numeral extraction method shown in fig. 4 and the 2nd reference numeral extraction method shown in fig. 5.
Fig. 4 is a schematic diagram of the 1st reference numeral extraction method performed by the reference numeral extraction unit 112. The reference character 4 is stored in advance in the storage unit 12 of the document analysis device 1. The reference character 4 contains the set of characters to be extracted, such as numerals, letters, symbols, or a combination thereof.
As a 1st scan, the reference numeral extraction unit 112 scans the drawing along a 1st direction (for example, the horizontal direction of the region of the drawing, the region of the structural diagram, or the page including at least one of the drawing and the structural diagram) while comparing the upright reference character 4 with the pixels of the drawing, and extracts characters (1st characters) corresponding to the reference character 4 from the drawing. That is, the reference numeral extraction unit 112 extracts characters by scanning any one of the region of the drawing, the region of the structural diagram, or the page including the drawing or the structural diagram. Next, as a 2nd scan, the reference numeral extraction unit 112 scans the drawing along a 2nd direction (for example, the vertical direction of the region of the drawing, the region of the structural diagram, or the page including at least one of the drawing and the structural diagram) while comparing the reference character 4 with the pixels of the drawing, and extracts characters (2nd characters) corresponding to the reference character 4 from the drawing. In the 2nd scan, the reference numeral extraction unit 112 extracts from the drawing characters corresponding to the reference character 4 rotated 90 degrees to the left. Alternatively, in the 2nd scan, the reference numeral extraction unit 112 may scan the drawing rotated 90 degrees to the right and extract characters corresponding to the reference character 4 from it.
The reference numeral extraction unit 112 uses, for example, an OCR (Optical Character Recognition) technique to extract characters corresponding to the reference character 4 from the drawing. The reference numeral extraction unit 112 may use any other method capable of extracting characters corresponding to the reference character 4 from the drawing.
In this way, according to the 1st reference numeral extraction method, the reference numeral extraction unit 112 extracts characters by the 1st scan and the 2nd scan. Therefore, even when the drawing is arranged in an orientation in which the reference numerals cannot be read as is, the characters can be extracted simply by performing a conventional character extraction process such as OCR twice.
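The alternative form of the 2nd scan, in which the drawing itself is rotated 90 degrees to the right before recognition, can be sketched as follows. This is a minimal illustration under stated assumptions: the drawing is modeled as a plain 2D grid (a list of rows), and `recognize` is a hypothetical stand-in for an OCR engine that reads only upright text; neither is specified by the source.

```python
def rotate_right_90(image):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def extract_two_scans(image, recognize):
    """Return (1st characters, 2nd characters).

    `recognize` is a hypothetical callable that takes a grid and returns
    the list of strings it can read in the upright orientation.
    """
    first = recognize(image)                    # 1st scan: drawing as is
    second = recognize(rotate_right_90(image))  # 2nd scan: drawing rotated right
    return first, second
```

Rotating a drawing that was laid out 90 degrees to the left by 90 degrees to the right restores the upright orientation, so the same upright-only recognizer can serve both scans.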
Fig. 5 is a schematic diagram of the 2nd reference numeral extraction method performed by the reference numeral extraction unit 112. The reference character 5 is stored in advance in the storage unit 12 of the document analysis device 1. The reference character 5 includes the set of characters to be extracted in more than one orientation (for example, upright characters and sideways characters), and the characters are, for example, numerals, letters, symbols, or a combination thereof.
The reference numeral extraction unit 112 compares the reference character 5 with the pixels of the drawing while scanning the drawing along one direction (for example, the horizontal direction of the region of the drawing, the region of the structural diagram, or the page including at least one of the drawing and the structural diagram), and extracts from the drawing characters (1st characters) matching the upright characters in the reference character 5 and characters (2nd characters) matching the left-rotated characters in the reference character 5. That is, the reference numeral extraction unit 112 extracts characters by scanning the region of the drawing, the region of the structural diagram, or the page including the drawing or the structural diagram. The reference numeral extraction unit 112 uses, for example, an OCR technique to extract characters corresponding to the reference character 5 from the drawing. The reference numeral extraction unit 112 may use any other method capable of extracting characters corresponding to the reference character 5 from the drawing.
In this way, according to the 2nd reference numeral extraction method, the reference numeral extraction unit 112 extracts characters by scanning in a single direction. Therefore, even when the drawing is arranged in an orientation in which the reference numerals cannot be read as is, the characters can be extracted with only one scan.
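The idea of matching both upright and left-rotated reference characters against an image that is never rotated can be sketched with exact template matching on a tiny binary grid. This is an illustrative simplification: the source names OCR as one possible technique, whereas the exact-match search, the string-based glyph format, and the function names below are assumptions made for the example.

```python
def rotate_left_90(glyph):
    """Rotate a glyph (tuple of row strings) 90 degrees counterclockwise."""
    return tuple("".join(col) for col in list(zip(*glyph))[::-1])


def find_glyph(image, glyph):
    """Return True if `glyph` occurs anywhere in `image` (exact pixel match)."""
    gh, gw = len(glyph), len(glyph[0])
    for y in range(len(image) - gh + 1):
        for x in range(len(image[0]) - gw + 1):
            if all(image[y + dy][x:x + gw] == glyph[dy] for dy in range(gh)):
                return True
    return False


def extract_single_scan(image, templates):
    """Match upright and left-rotated templates against the unrotated image.

    `templates` maps a character to its upright glyph. Returns
    (1st characters, 2nd characters): upright hits and left-rotated hits.
    """
    upright, rotated = [], []
    for char, glyph in templates.items():
        if find_glyph(image, glyph):
            upright.append(char)
        if find_glyph(image, rotate_left_90(glyph)):
            rotated.append(char)
    return upright, rotated
```

Here the rotation is applied to the small templates rather than to the drawing, which corresponds to storing both orientations of each character in the reference character 5.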
In the 1st and 2nd reference numeral extraction methods described above, the drawings are illustrated as being arranged either in the predetermined orientation or in an orientation rotated 90 degrees to the left from the predetermined orientation, but the drawings may be arranged in other orientations. For example, a drawing may be arranged in an orientation rotated 180 degrees or 270 degrees to the left from the predetermined orientation. In addition, a drawing may be arranged in an orientation rotated from the predetermined orientation by any one of a plurality of predetermined angles.
The reference numeral name extraction unit 113 extracts reference numerals and their names from the text included in the document information acquired by the document information acquisition unit 111. The reference numeral name extraction unit 113 extracts the names of reference numerals from the text after, before, or in parallel with the process in which the reference numeral extraction unit 112 extracts reference numerals from the drawings.
Fig. 6 is a schematic diagram of a method by which the reference numeral name extraction unit 113 extracts a reference numeral and its name from text. First, the reference numeral name extraction unit 113 searches the text on the text page W for a reference numeral 61, that is, a character (3rd character) corresponding to a predetermined character. The predetermined characters comprise the set of characters that can be used as reference numerals, such as numerals, letters, symbols, or a combination thereof.
Then, the reference numeral name extraction unit 113 determines a word adjacent to the reference numeral 61 found by the search to be the name 62 of the reference numeral. For example, to divide Japanese text into words, the reference numeral name extraction unit 113 may recognize positions where kanji, hiragana, and symbols alternate as word boundaries. To divide English text into words, the reference numeral name extraction unit 113 may recognize the positions of articles as word boundaries. To divide Korean text into words, the reference numeral name extraction unit 113 may recognize the positions of spaces (blank characters) as word boundaries. The reference numeral name extraction unit 113 is not limited to the specific methods shown here, and may divide the text into words by a method suited to the language. The reference numeral name extraction unit 113 may also divide the text into words by a known morphological analysis method.
When a plurality of reference symbols 61 are found, the reference symbol name extraction unit 113 extracts the reference symbol names 62 for the respective reference symbols 61. When a plurality of names corresponding to 1 reference numeral 61 are extracted, the reference numeral name extraction unit 113 takes the most extracted name as the reference numeral name 62. The reference symbol name extracting unit 113 is not limited to the above-described specific method, and may extract the name of a reference symbol associated with a reference symbol by another method.
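The name extraction described above can be illustrated with a minimal Python sketch for English text. It is not the method of the reference symbol name extraction unit 113; the token pattern, the stop-word list, and the function name extract_reference_names are assumptions made only for this illustration. The sketch treats the run of words immediately preceding a numeral as the candidate name and keeps the most frequently extracted candidate for each numeral.

```python
import re
from collections import Counter, defaultdict

# Assumption: a reference symbol is a run of digits with an optional lowercase suffix.
TOKEN = re.compile(r"[A-Za-z]+|\d+[a-z]?")
# Assumption: these words end the run of words that forms a name.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "by", "from"}


def extract_reference_names(text):
    """Map each reference symbol to its most frequently extracted name."""
    tokens = TOKEN.findall(text)
    candidates = defaultdict(Counter)
    for i, token in enumerate(tokens):
        if not token[0].isdigit():
            continue
        # Walk backwards over the words adjacent to the reference symbol.
        words = []
        j = i - 1
        while j >= 0 and tokens[j][0].isalpha() and tokens[j].lower() not in STOP_WORDS:
            words.append(tokens[j].lower())
            j -= 1
        if words:
            candidates[token][" ".join(reversed(words))] += 1
    # When several names were extracted for one symbol, keep the most frequent.
    return {ref: names.most_common(1)[0][0] for ref, names in candidates.items()}


sample = (
    "The output unit 115 sends data. The output unit 115 and the "
    "storage unit 12 respond. An output section 115 exists."
)
names = extract_reference_names(sample)
```

In this sample, "output unit" is extracted twice and "output section" once for 115, so "output unit" is kept. A real implementation would need language-specific word segmentation, as the description notes.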
The orientation determination unit 114 determines the orientation of the drawing based on the characters extracted from the drawing by the reference symbol extraction unit 112 and the reference symbols extracted from the sentence by the reference symbol name extraction unit 113. By determining the orientation of the drawing, the orientation determination unit 114 also determines the direction of the drawing. Specifically, when the reference symbol extraction unit 112 executes the 1st reference symbol extraction method, the orientation determination unit 114 calculates the degree of coincidence (for example, the coincidence rate or the number of coincidences) between the characters extracted by the 1st scan and the reference symbols extracted from the sentence, and calculates the degree of coincidence between the characters extracted by the 2nd scan and the reference symbols extracted from the sentence. When the degree of coincidence for the 1st scan is equal to or greater than that for the 2nd scan, the orientation determination unit 114 determines that the drawing is arranged in the predetermined orientation. When the degree of coincidence for the 1st scan is smaller than that for the 2nd scan, the orientation determination unit 114 determines that the drawing is not arranged in the predetermined orientation but in an orientation rotated 90 degrees to the left of the predetermined orientation.
When the reference symbol extraction unit 112 executes the 2nd reference symbol extraction method, the orientation determination unit 114 calculates the degree of coincidence between the characters extracted based on the vertically oriented characters and the reference symbols extracted from the sentence, and also calculates the degree of coincidence between the characters extracted based on the left-oriented characters and the reference symbols extracted from the sentence. When the degree of coincidence for the vertically oriented characters is equal to or greater than that for the left-oriented characters, the orientation determination unit 114 determines that the drawing is arranged in the predetermined orientation. When the degree of coincidence for the vertically oriented characters is smaller than that for the left-oriented characters, the orientation determination unit 114 determines that the drawing is not arranged in the predetermined orientation but in an orientation rotated 90 degrees to the left.
The reference symbol extraction unit 112 determines, as the reference symbols, the characters whose orientation matches the orientation of the drawing determined by the orientation determination unit 114. When the reference symbol extraction unit 112 uses the 1st reference symbol extraction method, it determines, as the reference symbols, the characters extracted by whichever of the 1st scan and the 2nd scan yields the greater degree of coincidence with the reference symbols extracted from the sentence. When the reference symbol extraction unit 112 uses the 2nd reference symbol extraction method, it determines, as the reference symbols, whichever of the vertically oriented and left-oriented characters have the greater degree of coincidence with the reference symbols extracted from the sentence.
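The comparison described above can be sketched in Python as follows. The sketch uses the number of coincidences as the degree of coincidence and returns the orientation as a left-rotation angle in degrees; the function names and the sample values are hypothetical and chosen only for illustration.

```python
def coincidence_count(extracted_chars, sentence_symbols):
    """Number of extracted character strings that also appear in the sentence."""
    return sum(1 for chars in extracted_chars if chars in sentence_symbols)


def determine_orientation(upright_chars, left_chars, sentence_symbols):
    """Return 0 for the predetermined orientation, 90 for a 90-degree left rotation.

    upright_chars: strings extracted by the 1st scan (or from vertically
                   oriented characters in the 2nd method).
    left_chars:    strings extracted by the 2nd scan (or from left-oriented
                   characters in the 2nd method).
    """
    symbols = set(sentence_symbols)
    upright_score = coincidence_count(upright_chars, symbols)
    left_score = coincidence_count(left_chars, symbols)
    # A tie is resolved in favor of the predetermined orientation,
    # following the "equal to or greater than" condition in the text.
    return 0 if upright_score >= left_score else 90


sentence_symbols = ["11", "12", "13"]
upright = determine_orientation(["11", "12", "3l"], ["ll", "zl"], sentence_symbols)
rotated = determine_orientation(["ll", "zl"], ["11", "12", "13"], sentence_symbols)
```

A coincidence rate (matches divided by the number of extracted strings) could be substituted for the count without changing the structure of the comparison.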
The reference symbol extracting unit 112 determines the position of the reference symbol in the drawing. The position of the reference mark is, for example, the center of gravity of the position of each character included in the reference mark. The position of the reference numeral is represented by, for example, coordinates in the drawing (i.e., an x-coordinate and a y-coordinate in the case where the drawing is arranged in an x-y plane).
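As a small worked example of the center-of-gravity calculation, the following Python sketch averages the coordinates of the characters that make up one reference symbol. The function name and the input format (a list of (x, y) tuples, one per character) are assumptions for illustration.

```python
def reference_symbol_position(char_positions):
    """Centroid of the (x, y) positions of the characters in one reference symbol."""
    if not char_positions:
        raise ValueError("a reference symbol needs at least one character")
    count = len(char_positions)
    x = sum(p[0] for p in char_positions) / count
    y = sum(p[1] for p in char_positions) / count
    return (x, y)


# Three characters of a symbol such as "112", laid out on one horizontal line.
position = reference_symbol_position([(10, 20), (14, 20), (18, 20)])
```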
In this way, the document analysis device 1 can simultaneously extract characters from drawings and specify the orientation of drawings, and thus can reduce the time taken for the process of analyzing drawings.
The output unit 115 associates the document identification information, the drawing orientation specified by the orientation specifying unit 114, the reference symbol and the position of the reference symbol extracted from the drawing by the reference symbol extracting unit 112, and the name of the reference symbol extracted from the sentence by the reference symbol name extracting unit 113, and outputs the association to the document management apparatus 2 as analysis information.
In the document management apparatus 2, the analysis information reception unit 213 receives the analysis information transmitted from the document analysis apparatus 1. Based on the received analysis information, the analysis information reception unit 213 stores, in the orientation information storage unit 222, orientation information indicating the orientation of each drawing, and stores, in the reference numeral information storage unit 223, reference numeral information indicating the names and positions of the reference numerals included in each drawing.
Fig. 7 (a) is a schematic diagram of exemplary orientation information 7 stored in the orientation information storage unit 222. The direction information 7 is information in which identification information 71 (document number) of a document, identification information 72 (drawing number) of a drawing, and a direction 73 of the drawing are associated with each other.
The document identification information 71 is identification information of a document indicated by analysis information received from the document analysis device 1. The figure identification information 72 is the figure identification information indicated by the analysis information received from the document analysis device 1.
The orientation 73 of the drawing is the orientation of the drawing indicated by the analysis information received from the document analysis device 1, and is expressed, for example, as an angle of rotation to the left. Thus, by rotating the drawing in the direction opposite (e.g., 90 degrees to the right) to the orientation 73 of the drawing (e.g., 90 degrees to the left), the drawing is arranged in the predetermined orientation. The orientation 73 of the drawing may instead represent the angle of rotation required to arrange the drawing in the predetermined orientation.
Fig. 7 (b) is a schematic diagram of exemplary reference numeral information 8 stored by the reference numeral information storage section 223. The reference numeral information 8 is information associating document identification information 81 (document number), drawing identification information 82 (drawing number), reference numeral 83, reference numeral name 84, and reference numeral position 85.
The document identification information 81 is identification information of a document indicated by analysis information received from the document analysis device 1. The figure identification information 82 is the figure identification information indicated by the analysis information received from the document analysis device 1. Reference numeral 83 is a reference numeral indicated by analysis information received from the document analysis device 1. The name 84 of the reference numeral is the name of the reference numeral indicated in the analysis information received from the document analysis device 1. The position 85 of the reference numeral is the position of the reference numeral indicated in the analysis information received from the document analysis device 1.
In fig. 7 (a) and 7 (b), the orientation information 7 and the reference numeral information 8 are shown as tables of character strings for ease of recognition, but each item of data may be recorded in any form, such as character string data, numerical data, or binary data. The orientation information 7 and the reference numeral information 8 may be recorded as a database or as a list of enumerated data.
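One possible in-memory form of the two kinds of records is sketched below in Python. The class names, field names, and sample values are hypothetical; only the associated items (document number, drawing number, orientation, reference numeral, name, position) come from the description of Fig. 7.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OrientationInfo:
    """One row of the orientation information 7 (hypothetical field names)."""
    document_number: str
    drawing_number: str
    left_rotation_deg: int  # orientation 73, expressed as a left-rotation angle


@dataclass(frozen=True)
class ReferenceNumeralInfo:
    """One row of the reference numeral information 8 (hypothetical field names)."""
    document_number: str
    drawing_number: str
    numeral: str
    name: str
    position: Tuple[float, float]  # x and y coordinates in the drawing


# Sample rows with made-up identifiers, for illustration only.
orientation_row = OrientationInfo("DOC-001", "FIG-2", 90)
numeral_row = ReferenceNumeralInfo("DOC-001", "FIG-2", "115", "output unit", (120.0, 48.5))
```

The same rows could equally be stored in a relational table or a serialized list, as the description allows.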
In the following, a display control method executed by the display control unit 214 will be described. First, the display control unit 214 acquires document information associated with the identification information received by the user input reception unit 211 from the document information storage unit 221. Further, the display control unit 214 acquires, from the direction information storage unit 222, direction information associated with the identification information received by the user input reception unit 211. Further, the display control unit 214 acquires the reference numeral information associated with the identification information received by the user input reception unit 211 from the reference numeral information storage unit 223.
The display control unit 214 arranges each drawing included in the document information in a predetermined orientation by using the orientation of the drawing indicated by the orientation information. FIG. 8 is a schematic illustration of a method of arranging the figures in a predetermined orientation. Specifically, the display control unit 214 rotates the drawing to the direction opposite to the drawing direction indicated by the direction information. In this way, the display control unit 214 can arrange the drawing in a predetermined orientation by using the orientation determined based on the characters extracted from the drawing.
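When a drawing is rotated back to the predetermined orientation, positions recorded on the stored drawing may also need to be converted so that labels land on the right spot. The description does not state how or whether this conversion is done, so the following Python sketch is an assumption: it supposes pixel coordinates with the origin at the top-left and y increasing downward, positions recorded on the unrotated image, and orientations that are multiples of 90 degrees to the left.

```python
def rotate_point_right_90(x, y, width, height):
    """Map a pixel (x, y) of a width x height image to its place after
    the image is rotated 90 degrees to the right (clockwise)."""
    return height - 1 - y, x


def corrected_position(x, y, width, height, left_rotation_deg):
    """Position of (x, y) after undoing a left rotation of the given angle."""
    if left_rotation_deg % 90 != 0:
        raise ValueError("this sketch only handles multiples of 90 degrees")
    steps = (left_rotation_deg // 90) % 4
    for _ in range(steps):
        x, y = rotate_point_right_90(x, y, width, height)
        # Each quarter turn swaps the image dimensions.
        width, height = height, width
    return x, y


# A 4 x 3 image stored rotated 90 degrees to the left: its top-left pixel
# ends up in the top-right corner of the corrected 3 x 4 image.
top_left_after = corrected_position(0, 0, 4, 3, 90)
```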
Next, the display control unit 214 superimposes a reference mark label on the drawing disposed in the predetermined orientation. Fig. 9 (a) and 9(b) are schematic diagrams of drawings in which reference labels 9 are superimposed. The reference label 9 contains reference numerals shown in the reference numeral information and names of the reference numerals.
In the example of fig. 9 (a), the display control unit 214 displays the reference label 9 in the vicinity of the position of the reference indicated by the reference information. Thereby, the user can easily view the reference label corresponding to the reference in the drawing. In the example of fig. 9(b), the display control unit 214 displays the reference label 9 in an aligned manner to an arbitrary end of the drawing. This can prevent the reference label 9 from blocking a part of the drawing.
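The two label layouts of fig. 9 (a) and fig. 9 (b) can be sketched in Python as follows. The offset, the line height, and the choice to order edge-aligned labels by the vertical position of their reference numerals are assumptions for illustration and are not specified in the description.

```python
def place_near_symbols(symbol_positions, offset=(8, -8)):
    """Fig. 9 (a) style: put each label at a small offset from its reference numeral."""
    dx, dy = offset
    return {symbol: (x + dx, y + dy) for symbol, (x, y) in symbol_positions.items()}


def place_along_edge(symbol_positions, edge_x, top=0, line_height=16):
    """Fig. 9 (b) style: stack the labels in a column at one edge of the drawing."""
    # Assumption: labels are ordered by the vertical position of their numerals.
    ordered = sorted(symbol_positions, key=lambda s: symbol_positions[s][1])
    return {symbol: (edge_x, top + i * line_height) for i, symbol in enumerate(ordered)}


positions = {"12": (50, 200), "115": (120, 40)}
near = place_near_symbols(positions)
edge = place_along_edge(positions, edge_x=600)
```

The first layout keeps each label next to its numeral; the second keeps the drawing itself unobstructed at the cost of that proximity.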
In this way, when the user designates a document to be displayed, the display control unit 214 superimposes a reference label on the drawing based on the reference label information, and therefore, it is not necessary to change the drawing itself so as to include the reference label and store the reference label in the storage unit. Therefore, the capacity of the storage unit required for storing the drawing can be reduced.
Then, the display control unit 214 outputs, to the user terminal 3 via the communication unit 23, display information for displaying the sentences included in the document information acquired from the document information storage unit 221 together with the drawings included in the document information, each arranged in the predetermined orientation and with the reference labels superimposed.
The user terminal 3 displays a document including a figure arranged in a predetermined orientation and having a reference label superimposed thereon on the display unit 31 in accordance with the display information received from the document management apparatus 2. Thus, the user can see the drawing arranged in the predetermined orientation, and the labor required for manually rotating the drawing can be reduced. In addition, since the user can recognize the name of the reference symbol extracted from the article as a reference symbol label on the figure, it becomes easy to understand the meaning of the reference symbol of the figure by comparing the article and the figure.
[ Sequence of the document analysis method ]
Fig. 10 is a sequence diagram of the document analysis method according to the present embodiment. First, the user terminal 3 transmits identification information of a document specified by a user as a display target to the document management apparatus 2. In the document management apparatus 2, the user input receiving unit 211 receives identification information of a document designated as a display target from the user terminal 3.
When the orientation information storage unit 222 and the reference numeral information storage unit 223 have already stored the orientation information and the reference numeral information associated with the identification information of the document to be displayed (i.e., when the document is not being displayed for the first time) (no at S11), the document management apparatus 2 proceeds to step S17.
When the direction information and the reference mark information associated with the identification information of the document to be displayed are not stored in the direction information storage unit 222 and the reference mark information storage unit 223 (that is, in the case of the first display) (yes in S11), the document information providing unit 212 acquires the document information associated with the identification information received by the user input receiving unit 211 from the document information storage unit 221. Accordingly, the document information providing unit 212 provides the acquired document information to the document analysis device 1 together with the identification information of the document (S12).
In the document analysis device 1, the document information acquisition unit 111 acquires identification information and document information of a document specified as a display target from the document management device 2. The reference numeral extraction unit 112 extracts a character indicating a reference numeral from a figure included in the document information acquired by the document information acquisition unit 111 (S13).
The reference mark name extraction unit 113 extracts a reference mark corresponding to a predetermined character and a name of a reference mark associated with the reference mark from a sentence included in the document information acquired by the document information acquisition unit 111 (S14). The orientation determining section 114 determines the orientation of the drawing based on the characters extracted by the reference character extracting section 112 in step S13 and the reference characters extracted by the reference character name extracting section 113 in step S14 (S15). The specific method performed by the reference symbol extraction unit 112, the reference symbol name extraction unit 113, and the orientation specification unit 114 is as described above with reference to fig. 4 to 6.
The output unit 115 associates the document identification information, the drawing identification information, the reference symbol extracted from the drawing by the reference symbol extraction unit 112 in step S13 and the position of that reference symbol, the name of the reference symbol extracted from the sentence by the reference symbol name extraction unit 113 in step S14, and the orientation of the drawing specified by the orientation specification unit 114 in step S15, and outputs the association to the document management device 2 as analysis information (S16).
In the document management apparatus 2, the analysis information reception unit 213 receives the analysis information transmitted from the document analysis apparatus 1. Based on the received analysis information, the analysis information reception unit 213 causes the orientation information storage unit 222 to store the orientation information indicating the orientation of each drawing and causes the reference numeral information storage unit 223 to store the reference numeral information indicating the names and positions of the reference numerals included in each drawing.
The display control unit 214 acquires document information associated with the identification information received by the user input reception unit 211 from the document information storage unit 221. Further, the display control unit 214 acquires, from the direction information storage unit 222, direction information associated with the identification information received by the user input reception unit 211. Further, the display control unit 214 acquires the reference numeral information associated with the identification information received by the user input reception unit 211 from the reference numeral information storage unit 223.
The display control unit 214 arranges each drawing included in the document information in the predetermined orientation using the drawing orientation indicated by the orientation information (S17). After step S17, the display control unit 214 superimposes, on the drawing arranged in the predetermined orientation, a reference label containing the reference numeral indicated by the reference numeral information and the name of the reference numeral (S18). The method by which the display control unit 214 arranges the drawings in the predetermined orientation and superimposes the reference labels is as described above with reference to fig. 8 and 9.
The display control unit 214 outputs, to the user terminal 3, display information for displaying the sentences included in the document information acquired from the document information storage unit 221 together with the drawings included in the document information, each arranged in the predetermined orientation and with the reference labels superimposed (S19). In accordance with the display information received from the document management apparatus 2, the user terminal 3 displays on the display unit 31 the document, including the drawings arranged in the predetermined orientation with the reference labels superimposed.
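The branch at step S11 amounts to analyzing a document only on its first display and reusing the stored result afterwards. A minimal Python sketch of that control flow follows; the dictionaries standing in for the storage units, the analyze callback standing in for the document analysis device 1, and all names are hypothetical.

```python
def display_document(doc_id, documents, analysis_store, analyze):
    """Return display information for one document, analyzing it only once."""
    # S11: has analysis information already been stored for this document?
    if doc_id not in analysis_store:
        # S12 to S16: provide the document for analysis and store the result.
        analysis_store[doc_id] = analyze(doc_id, documents[doc_id])
    analysis = analysis_store[doc_id]
    # S17 to S19: the orientation and labels are applied at display time.
    return {
        "text": documents[doc_id]["text"],
        "left_rotation_deg": analysis["left_rotation_deg"],
        "labels": analysis["labels"],
    }


calls = []


def fake_analyze(doc_id, document):
    # Stand-in for the document analysis device 1; returns fixed sample data.
    calls.append(doc_id)
    return {"left_rotation_deg": 90, "labels": {"115": "output unit"}}


documents = {"DOC-001": {"text": "The output unit 115 ..."}}
store = {}
first = display_document("DOC-001", documents, store, fake_analyze)
second = display_document("DOC-001", documents, store, fake_analyze)
```

The second call reuses the stored analysis information, so the analysis callback runs only once.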
[ 1st modification ]
When the document analysis device 1 cannot extract the name of the reference symbol corresponding to the reference symbol from the sentence, the document management device 2 may receive an input of the name of the reference symbol corresponding to the reference symbol from the user terminal 3 when displaying the reference symbol label including the reference symbol.
In this case, when displaying a reference label that contains a reference symbol whose name could not be extracted, the document management apparatus 2 causes the display unit 31 of the user terminal 3 to display a screen for accepting input of the name corresponding to that reference symbol. The user inputs the name corresponding to the reference symbol by using the operation unit 32 of the user terminal 3. The user terminal 3 transmits the name of the reference symbol input by the user to the document management apparatus 2.
The user input receiving unit 211 of the document management apparatus 2 receives the name of the reference symbol input by the user from the user terminal 3. Accordingly, the user input reception unit 211 sets the name of the reference symbol included in the reference symbol information associated with the reference symbol to the name of the reference symbol input by the user in the reference symbol information storage unit 223. Thus, the document management apparatus 2 can display the name of the reference symbol input by the user for the reference symbol whose name is not extracted. Further, since the name of the reference symbol input by the user is stored in the reference symbol information storage unit 223, when the name of the reference symbol cannot be extracted from the sentence, the name of the reference symbol input by any one user is shared by other users.
Similarly, when the document analysis device 1 cannot extract at least a part of the reference numerals from the drawing, the document management device 2 may receive the input of the reference numerals from the user terminal 3 when displaying the drawing. In this case, when the drawing is displayed, the document management apparatus 2 displays a screen for accepting input of a reference numeral included in the drawing on the display unit 31 of the user terminal 3. The user specifies the position of the reference numeral in the drawing with the operation section 32 of the user terminal 3 and inputs the reference numeral. The user terminal 3 transmits the reference numeral input by the user and the position of the reference numeral to the document management apparatus 2.
The user input receiving unit 211 of the document management apparatus 2 receives the reference numerals and the positions of the reference numerals input by the user from the user terminal 3. Accordingly, the user input reception unit 211 causes the reference symbol information storage unit 223 to store the reference symbols input by the user and the reference symbol information including the positions of the reference symbols. Thereby, the document management apparatus 2 can display reference numerals not extracted from the drawings based on the input of the user. Further, since the reference numerals input by the user are stored in the reference numeral information storage unit 223, when the reference numerals cannot be extracted from the drawing, the reference numerals input by any one user can be shared with other users.
[ 2nd modification ]
A reference label displayed on a drawing may cover part of the drawing. Therefore, the document management apparatus 2 may receive an instruction to move the reference label from the user terminal 3.
In this case, when causing the display unit 31 of the user terminal 3 to display a drawing on which reference labels are superimposed, the document management apparatus 2 makes the reference labels movable. The user moves a reference label by using the operation unit 32 of the user terminal 3. The user terminal 3 transmits the position (for example, coordinates on the drawing) of the moved reference label to the document management apparatus 2.
The user input reception unit 211 of the document management apparatus 2 receives the position of the reference label moved by the user from the user terminal 3. Accordingly, the user input reception unit 211 sets the position of the reference symbol included in the reference symbol information corresponding to the reference symbol label moved by the user to the position after the user has moved, in the reference symbol information storage unit 223. In this way, the document management apparatus 2 can display the reference label at the position after the user moves. Further, since the moved position is stored in the reference mark information storage unit 223, the position of the reference mark moved by one user can be shared by other users when the reference mark covers a part of the drawing.
[ 3rd modification ]
In some cases, the document analysis device 1 can specify the orientation of the drawing without extracting all the characters included in the drawing. Therefore, the reference numeral extraction unit 112 may extract only a part of the characters included in the drawing rather than all of them. In this case, the reference numeral extraction unit 112 stops extracting characters from the drawing when the number of characters extracted from the drawing reaches or exceeds a predetermined threshold value. The orientation determination unit 114 determines the orientation of the drawing based on the part of the characters extracted from the drawing by the reference numeral extraction unit 112 and the reference numerals extracted from the sentence by the reference numeral name extraction unit 113. Thus, the document analysis device 1 can shorten the time taken to extract characters from the drawing and specify the orientation of the drawing.
If the document analysis device 1 stops character extraction before obtaining sufficient information to specify the orientation of the drawing, the orientation of the drawing may not be specified correctly. Therefore, the document analysis device 1 may use artificial intelligence (AI) to extract a part of the characters included in the drawing. For example, the document analysis device 1 performs machine learning on a plurality of drawings in advance by using artificial intelligence. At this time, the artificial intelligence learns the features of the characters included in drawings arranged in the respective orientations, and stores the learned data in the storage unit 12.
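The threshold-based stopping can be sketched in Python with a lazily evaluated stream of extracted characters, so that no work is done beyond the threshold. The function name and the generator standing in for the character extraction process are hypothetical.

```python
def extract_until_threshold(char_stream, threshold):
    """Collect extracted character strings and stop once the threshold is reached."""
    extracted = []
    if threshold <= 0:
        return extracted
    for chars in char_stream:
        extracted.append(chars)
        if len(extracted) >= threshold:
            break  # remaining characters in the drawing are never extracted
    return extracted


consumed = []


def fake_extraction():
    # Stand-in for character extraction from a drawing; records what was actually extracted.
    for chars in ["11", "12", "13", "14", "15"]:
        consumed.append(chars)
        yield chars


partial = extract_until_threshold(fake_extraction(), 3)
```

Only the first three strings are extracted here; the partial result can then be passed to the orientation determination in place of the full set.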
Accordingly, the reference numeral extracting unit 112 determines the orientation of the drawing based on the learning data stored in the storage unit 12 while extracting characters from the drawing using artificial intelligence. Therefore, when the orientation of the drawing can be determined by artificial intelligence, the reference numeral extraction unit 112 stops extracting characters from the drawing. Accordingly, the orientation specifying unit 114 specifies the drawing orientation determined by the artificial intelligence as the drawing orientation. Thus, the document analysis device 1 can stop extracting characters from the drawing at the stage of drawing orientation specification by artificial intelligence, and thus can specify the drawing orientation at an early stage and with high accuracy.
[ 4th modification ]
In the above description, the orientation determination section 114 determines the orientation of the drawing, but the orientation determination section 114 may instead determine the direction of the drawing (i.e., the pair of opposite orientations along one axis). The direction of the drawing is expressed, for example, as the up-down direction (longitudinal direction) or the left-right direction (lateral direction). For example, when the orientation of the drawings in a document is limited to either the upward orientation or the left orientation, the orientation determination section 114 may specify whether the drawing is in the up-down direction or the left-right direction, and the output unit 115 may transmit analysis information indicating the specified direction to the document management apparatus 2. Thus, the document management apparatus 2 can specify the drawing orientation as the upward orientation when the analysis information indicates the up-down direction, and as the left orientation when the analysis information indicates the left-right direction.
[ 5th modification ]
When the user selects a reference numeral or a name of a reference numeral from one of the drawing and the text, the document analysis device 1 may highlight the selected reference numeral or the name of the reference numeral on the other of the drawing and the text.
Fig. 11 and 12 are diagrams for explaining a method of highlighting a reference numeral or a name of a reference numeral. Fig. 11 shows an example of highlighting a reference number or the name of a reference number selected in a figure in an article. Fig. 12 shows an example in which a reference number or the name of a reference number selected in an article is highlighted in the drawing.
The document management apparatus 2 displays a drawing and sentences on the display unit 31 of the user terminal 3. As described above, reference labels containing the reference numerals and the names of the reference numerals are superimposed on the drawing. The user selects a reference numeral or the name of a reference numeral in the drawing or in the sentences by using the operation unit 32 of the user terminal 3. For example, as shown in fig. 11 and 12, the user places the cursor 91 on a reference numeral or the name of a reference numeral and presses it, thereby selecting the reference numeral or the name at the position of the cursor 91. Alternatively, the user may drag the cursor 91 over a range to select the one or more reference numerals or names of reference numerals included in the designated range. Alternatively, the user may select all the reference numerals and names of reference numerals included in the drawing or the sentences by performing a predetermined operation on the screen (for example, pressing a select-all button).
In the document management device 2, the user input reception unit 211 receives, from the user terminal 3 via the communication unit 23, the user's selection of one or more reference symbols or names of reference symbols. The document information providing unit 212 then provides the document analysis device 1 with selection information indicating the selected reference symbols or names.
In the document analysis device 1, the document information acquisition unit 111 acquires the selection information from the document management device 2 via the communication unit 13. When the selection information indicates that at least one of a reference symbol and its name has been selected in the drawing, the output unit 115 outputs, to the document management device 2, display control information for changing the display mode of the selected reference symbol or name in the text. In the text, changing the display mode means, for example, highlighting the selected reference symbol or name by changing its typeface, character color, frame, or background color.
When the selection information indicates that at least one of a reference symbol and its name has been selected in the text, the output unit 115 outputs, to the document management device 2, display control information for changing the display mode of the selected reference symbol or name in the drawing. In the drawing, the display mode is changed by, for example, changing the typeface, character color, size, or background color of the reference label that contains the selected reference symbol or name.
When the selection information indicates that a plurality of reference symbols or names have been selected, the output unit 115 preferably makes the display modes of the selected reference symbols or names differ from one another (for example, by using different background colors). This lets the user easily tell the different reference symbols and names apart.
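As an illustration of giving each selected reference symbol its own display mode, the following Python sketch assigns a distinct background color per symbol. The palette and the function name `assign_background_colors` are assumptions made for this example; the description only states that the display modes should differ, for example by background color.

```python
from itertools import cycle

# Hypothetical palette; the description only requires that the display modes
# differ from one another (for example, different background colors).
PALETTE = ["#fff2a8", "#b8e6ff", "#c9f2c7", "#ffd1dc", "#e0ccff"]


def assign_background_colors(selected_symbols):
    """Map each distinct selected reference symbol to a background color."""
    colors = {}
    palette = cycle(PALETTE)
    for symbol in selected_symbols:
        if symbol not in colors:  # a repeated selection keeps its first color
            colors[symbol] = next(palette)
    return colors


if __name__ == "__main__":
    print(assign_background_colors(["11", "12", "11", "115"]))
```

In this sketch colors repeat once more symbols are selected than the palette has entries, so a real display would need a larger palette or an additional cue such as a frame.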
In the document management device 2, the analysis information reception unit 213 receives the display control information transmitted by the document analysis device 1. The display control unit 214 then outputs, to the user terminal 3 via the communication unit 23, display information for displaying the text included in the document information acquired from the document information storage unit 221 together with a drawing included in that document information, arranged in the predetermined orientation and with reference labels superimposed on it. In accordance with the display control information, the display control unit 214 also outputs, to the user terminal 3 via the communication unit 23, display information for changing the display mode of at least one of the selected reference symbol and its name in the drawing or the text.
In accordance with the display information received from the document management device 2, the user terminal 3 displays on the display unit 31 a document that includes a drawing arranged in the predetermined orientation with reference labels superimposed on it, and in which the display mode of at least one of the selected reference symbol and its name is changed in the drawing or the text. In the example of fig. 11 and 12, a frame 92 with a background color different from that of the other portions is displayed for the selected reference symbol and its name.
With this configuration, when the user selects a reference symbol or its name in one of the drawing and the text, the document analysis system S highlights the selected reference symbol or name in the other. The user can thus easily see where a reference symbol or name of interest in one of the drawing and the text appears in the other.
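The cross-highlighting behavior can be sketched in Python as follows. This is a minimal illustration, not the device's actual data format: the `DisplayControl` record, the `build_display_control` function, and the use of character offsets are all assumptions introduced here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayControl:
    """Hypothetical display control information sent to the management device."""
    target: str   # "text" or "drawing": the side whose display mode changes
    symbol: str   # selected reference symbol, e.g. "111"
    spans: tuple  # (start, end) offsets in the text; empty when target is "drawing"


def build_display_control(selected_in, symbol, text):
    """Return control info for the side opposite to where the selection was made."""
    if selected_in == "drawing":
        # Match the symbol as a whole token so that "11" is not found inside "111".
        pattern = re.compile(r"(?<!\w)" + re.escape(symbol) + r"(?!\w)")
        spans = tuple(m.span() for m in pattern.finditer(text))
        return DisplayControl("text", symbol, spans)
    if selected_in == "text":
        # The drawing side is addressed by label, so no text offsets are needed.
        return DisplayControl("drawing", symbol, ())
    raise ValueError(f"unknown selection source: {selected_in!r}")


if __name__ == "__main__":
    sample = "The acquisition unit 111 passes data to the storage unit 11."
    print(build_display_control("drawing", "11", sample))
```

The whole-token match is a design choice for this sketch: reference symbols are often prefixes of one another, so a plain substring search would highlight the wrong occurrences.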
[Effects of the present embodiment]
The document analysis system S according to the present embodiment can arrange a drawing in an orientation in which its reference symbols can be read, based on the characters of the reference symbols extracted from the drawing. The user can therefore view the drawing already arranged in the predetermined orientation, which reduces the effort of rotating the drawing manually.
The document analysis system S can also display the drawing, after arranging it in the predetermined orientation, with reference labels indicating the names of the reference symbols superimposed on it. The user can thus see, on the drawing itself, the names of the reference symbols extracted from the text, which makes it easier to interpret the meaning of each reference symbol in the drawing by comparing the text with the drawing.
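One way to picture the orientation decision is the following Python sketch. It assumes a character recognizer has already produced candidate strings for each orientation in which the drawing was read, and it picks the orientation whose candidates agree best with the reference symbols found in the text, in the spirit of the degree-of-coincidence comparison recited in claim 4. The function names, the angle keys, and the scoring formula are assumptions for illustration only.

```python
def coincidence(candidates, text_symbols):
    """Fraction of candidate strings that also occur as reference symbols in the text."""
    if not candidates:
        return 0.0
    return sum(1 for c in candidates if c in text_symbols) / len(candidates)


def determine_rotation(candidates_by_rotation, text_symbols):
    """Pick the rotation (in degrees) whose extracted characters best match the text.

    candidates_by_rotation maps a hypothetical rotation angle to the strings
    recognized in the drawing when it is read in that orientation.
    """
    return max(
        candidates_by_rotation,
        key=lambda angle: coincidence(candidates_by_rotation[angle], text_symbols),
    )


def arrangement_info(current_rotation, predetermined=0):
    """Return rotation info only when the drawing is not in the predetermined orientation."""
    if current_rotation == predetermined:
        return None
    return {"rotate_by": (predetermined - current_rotation) % 360}


if __name__ == "__main__":
    text_symbols = {"11", "12", "211"}
    candidates = {0: ["ll", "7I"], 90: ["11", "12", "211"]}
    angle = determine_rotation(candidates, text_symbols)
    print(angle, arrangement_info(angle))
```

Returning `None` when the orientations already agree mirrors the condition that arrangement information is output only when the determined direction differs from the predetermined one.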
The present invention has been described above with reference to the embodiments, but the technical scope of the present invention is not limited to the scope described in those embodiments, and various modifications and changes can be made within the scope of the present invention. For example, the specific form in which the devices are distributed or integrated is not limited to the above embodiments, and all or part of them may be functionally or physically distributed or integrated in arbitrary units. New embodiments produced by arbitrarily combining a plurality of the embodiments are also included in the embodiments of the present invention. A new embodiment produced by such a combination has the effects of the original embodiments.
The processors of the document analysis device 1, the document management device 2, and the user terminal 3 are the entities that execute the respective steps included in the method shown in fig. 10. That is, these processors read a program for executing the method shown in fig. 10 from the storage unit and, by executing the program, control the respective parts of the document analysis system S so as to carry out the method. The method shown in fig. 10 may include only some of the steps, the order of the steps may be changed, and a plurality of steps may be executed in parallel.

Claims (10)

1. A document analysis device includes:
an extraction unit that extracts characters contained in a document;
a determination unit that determines a direction in which a drawing included in the document is arranged, based on the characters extracted by the extraction unit; and
an output unit that outputs information for arranging the drawing in a predetermined direction when the direction determined by the determination unit is different from the predetermined direction.
2. The document analysis device according to claim 1, wherein
the extraction unit extracts one or more 1st characters corresponding to a reference character, which represents a character to be extracted, by comparing the reference character with pixels of the drawing while scanning a region of the drawing, a region of a structural diagram, or a page including at least one of the drawing and the structural diagram in the document along a 1st direction, and extracts one or more 2nd characters corresponding to the reference character by comparing the reference character with the pixels of the drawing while scanning the region of the drawing, the region of the structural diagram, or the page including at least one of the drawing and the structural diagram in the document along a 2nd direction orthogonal to the 1st direction, and
the determination unit determines the direction by comparing the one or more 1st characters extracted by the scan along the 1st direction with the one or more 2nd characters extracted by the scan along the 2nd direction.
3. The document analysis device according to claim 1 or 2, wherein
the extraction unit extracts one or more 1st characters corresponding to the extraction target character and one or more 2nd characters corresponding to the rotated extraction target character by comparing a reference character representing the rotated extraction target character with pixels of the drawing while scanning a region of the drawing, a region of a structural diagram, or a page including at least one of the drawing and the structural diagram in the document, and
the determination unit determines the direction by comparing the one or more 1st characters extracted by the scanning with the one or more 2nd characters extracted by the scanning.
4. The document analysis device according to claim 2 or 3,
further comprising a 2nd extraction unit that extracts one or more 3rd characters corresponding to a predetermined character from a sentence contained in the document, wherein
the determination unit determines the direction by comparing the degree of coincidence between the one or more 1st characters and the one or more 3rd characters with the degree of coincidence between the one or more 2nd characters and the one or more 3rd characters.
5. The document analysis device according to claim 4, wherein
the 2nd extraction unit extracts a name associated with the 3rd character from the sentence, and
the output unit outputs information for arranging the drawing in the predetermined direction and information for displaying the name associated with the 3rd character on the drawing.
6. The document analysis device according to claim 5, wherein
when at least one of the 3rd character and the name is selected in the drawing, the output unit outputs information for changing a display mode of the selected at least one of the 3rd character and the name in the sentence.
7. The document analysis device according to claim 5 or 6, wherein
when at least one of the 3rd character and the name is selected in the sentence, the output unit outputs information for changing a display mode of the selected at least one of the 3rd character and the name in the drawing.
8. A document analysis method, wherein a processor performs:
a step of extracting characters contained in a document;
a step of determining a direction in which a drawing included in the document is arranged, based on the characters extracted in the extracting step; and
a step of outputting information for arranging the drawing in a predetermined direction, in a case where the direction determined in the determining step is different from the predetermined direction.
9. A document analysis program that causes a computer to execute:
a step of extracting characters contained in a document;
a step of determining a direction in which a drawing included in the document is arranged, based on the characters extracted in the extracting step; and
a step of outputting information for arranging the drawing in a predetermined direction, in a case where the direction determined in the determining step is different from the predetermined direction.
10. A document analysis system including a document management device and a document analysis device, wherein
the document management device includes:
a storage unit that stores documents; and
a providing unit that provides the document stored in the storage unit to the document analysis device, and
the document analysis device includes:
an extraction unit that extracts characters included in the document provided from the document management device;
a determination unit that determines a direction in which a drawing included in the document is arranged, based on the characters extracted by the extraction unit; and
an output unit that outputs information for arranging the drawing in a predetermined direction to the document management device when the direction determined by the determination unit is different from the predetermined direction.
CN201910768003.2A 2018-08-20 2019-08-20 Document analysis device, document analysis method, document analysis program, and document analysis system Pending CN110852142A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018154115 2018-08-20
JP2018-154115 2018-08-20

Publications (1)

Publication Number Publication Date
CN110852142A true CN110852142A (en) 2020-02-28

Family

ID=69595340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910768003.2A Pending CN110852142A (en) 2018-08-20 2019-08-20 Document analysis device, document analysis method, document analysis program, and document analysis system

Country Status (2)

Country Link
JP (1) JP2020030820A (en)
CN (1) CN110852142A (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02270081A (en) * 1989-04-12 1990-11-05 Fujitsu Ltd Direction detecting system for drawing
JPH10224599A (en) * 1997-02-10 1998-08-21 Minolta Co Ltd Image input device
JP2013092916A (en) * 2011-10-26 2013-05-16 Ib Research Kk Intellectual property management device
JP6194781B2 (en) * 2013-12-11 2017-09-13 富士ゼロックス株式会社 Image processing apparatus and program
JP2015210636A (en) * 2014-04-25 2015-11-24 アイビーリサーチ株式会社 Code reading method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8036493B1 (en) * 2006-03-27 2011-10-11 Neustel Michael S Method for correcting orientation of patent figures
US20090132899A1 (en) * 2007-11-15 2009-05-21 Milton Jr Harold W System for automatically inserting reference numerals in a patent application
CN102196130A (en) * 2010-03-16 2011-09-21 佳能株式会社 Image processing apparatus and image processing method
CN103295009A (en) * 2013-06-20 2013-09-11 电子科技大学 License plate character recognition method based on stroke decomposition
CN107077515A (en) * 2015-10-09 2017-08-18 Ib研究株式会社 Display control unit, display control method and display control program

Also Published As

Publication number Publication date
JP2020030820A (en) 2020-02-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination