WO2014064803A1

WO2014064803A1 - Document processing program, document processing device, document processing system, and document processing method

Info

Publication number: WO2014064803A1
Application number: PCT/JP2012/077614
Authority: WO
Inventors: 正和藤尾; 永崎　健; 淳一平山; 彰多田; 慶今沢
Original assignee: 株式会社日立製作所
Priority date: 2012-10-25
Filing date: 2012-10-25
Publication date: 2014-05-01

Abstract

The purpose of the present invention is to estimate the scope of impact of design changes by using change content in which change instructions instructing changes in data such as design data are notated. This document processing program subjects the following to machine learning: the correspondence relationship between change content instructed by document data; and the changed portions of data changed by the change content. On the basis of those results, the document processing program estimates portions for which data will be changed according to new change instructions (see figure 2).

Description

Document processing program, document processing apparatus, document processing system, and document processing method

The present invention relates to a technique for analyzing a document.

Product design includes new design, in which a product is designed from scratch, and design change, in which part of an existing design is modified. For example, for a product to be installed in an equipment plant, an investigation of the current state of the plant may reveal that the product cannot be placed at the originally assumed installation location because of interference between parts. In such cases, the design of the product often has to be changed.

When the design of a product is changed, the design of other functionally related parts may also have to be changed. Derivative effects, such as how the design of those related parts should be changed, must also be considered. A design change thus requires consideration of various direct and indirect effects. Quickly grasping the scope of influence (specifications, drawings, processes, and so on of other parts affected by the change) and its degree (increase or decrease in man-hours, cost, and so on) is therefore extremely important when judging whether or not to implement a design change.

The following Patent Document 1 describes “a change impact prediction apparatus having a change record database that is past history information related to a change (claim 1)” as a technology for predicting the impact of a design change. The purpose of this document is to “generate a pattern (summary) that can narrow down the range of influence with high precision by considering the original change contents”.

Patent Document 1: JP 2012-14308 A

When a design change is actually made, the change is instructed by document data (a change instruction sheet) that describes the content of the change. The technique described in Patent Document 1 uses a change record database, and it appears to assume that the database is searched using data items stored in it as keys. The data items that serve as search keys must therefore be extracted manually from the change instruction in advance, which imposes a corresponding workload and may introduce errors.

The present invention has been made in view of the above problems, and its purpose is to estimate the influence range of a design change by using the change content described in a change instruction that instructs a change to data such as design data.

The document processing program according to the present invention machine-learns the correspondence between the change content instructed by document data and the changed portion of the data changed by that change content and, based on the result, estimates the portion where the data will be changed by a new change instruction.

According to the document processing program of the present invention, the influence of a change can be estimated using the change content described in the change instruction. This reduces the workload and mistakes associated with manual influence prediction and allows the influence range to be predicted quickly.

Problems, configurations, and effects other than those described above will become apparent from the following description of embodiments.

FIG. 1 is a block diagram showing the configuration of the document processing apparatus 10 according to the first embodiment.
FIG. 2 is a processing flowchart showing the overall operation of the document processing apparatus 10.
FIG. 3 is a processing flowchart explaining the details of step S202.
FIG. 4 is a diagram showing a text example of the change instruction 191 and a processing image of step S2022.
FIG. 5 is a flowchart showing a processing example of step S2022.
FIG. 6 is a flowchart explaining the details of step S2023.
FIG. 7 is a diagram showing a text example of the change instruction 191 written in a special format and a processing image of step S2024.
FIG. 8 is a diagram showing a processing image of step S203 when steps S2025 to S2027 are assumed to be omitted.
FIG. 9 is a diagram showing a processing image of step S2026.
FIG. 10 is a diagram showing a processing image of step S2027.
FIG. 11 is a flowchart showing a processing example of steps S2026 to S2027.
FIG. 12 is a diagram showing another configuration example of the similarity vector 1541 used by the learning program 154.
FIG. 13 is a diagram showing another configuration example of the similarity vector 1541 used by the learning program 154.
FIG. 14 is a diagram showing another configuration example of the similarity vector 1541 used by the learning program 154.
FIG. 15 is a processing flowchart showing the operation in the answer book learning phase of the document processing apparatus according to the second embodiment.
FIG. 16 is a diagram showing a layout example of the change instruction 191 written in a table format.
FIG. 17 is a processing flowchart showing the details of step S202 in the third embodiment.
FIG. 18 is a block diagram of the document processing system 1000 according to the fourth embodiment.
FIG. 19 is a processing flowchart showing the overall operation of the document processing system 1000.
FIG. 20 is a diagram showing a processing image of step S1901.
FIG. 21 is a diagram showing a processing image of step S1902.

<Embodiment 1: Device configuration>
FIG. 1 is a block diagram showing the configuration of a document processing apparatus 10 according to the first embodiment of the present invention. The document processing apparatus 10 analyzes a change instruction that describes change content for data such as CAD (Computer Aided Design) data (hereinafter referred to as CAD data) and estimates the range affected by that change content. The document processing apparatus 10 can be configured using a general computer and includes, for example, an input device 11, a display device 12, a CPU (Central Processing Unit) 13, a printing device 14, a memory 15, and a storage unit 16.

The input device 11 is a device that accepts an input of an operation instruction or the like from a user, and can be configured using, for example, a keyboard, a mouse, or a touch panel. The display device 12 is a device that presents various information to the user, and can be configured using a screen display device such as a liquid crystal display. The printing device 14 prints various information provided to the user as necessary.

The CPU 13 is an arithmetic unit that realizes various functions by executing a program stored in the memory 15. In the following, for convenience of description, each program may be described as an operation subject, but it is added that the CPU 13 actually executes these programs.

The memory 15 is a storage device that stores programs executed by the CPU 13. It stores an OS (Operating System) 151, a communication program 152, a document analysis program 153, a learning program 154, and a change target estimation program 155. Details of each program will be described later. Each of these programs corresponds to a “document processing program” in the first embodiment. Equivalent functions can also be realized using hardware such as circuit devices. The memory 15 may further store other programs, data that is referred to when the CPU 13 executes these programs, and the results of processing executed by the CPU 13.

The storage unit 16 is a storage device that stores information referred to when the CPU 13 executes various processes according to each program. The storage unit 16 stores a change word dictionary 161, a general term dictionary 162, a template dictionary 163, and a learning model 164. The storage unit 16 may further store other information.

Typically, the memory 15 is a high-speed, volatile storage device such as a DRAM (Dynamic Random Access Memory), and the storage unit 16 is a large-capacity, nonvolatile storage device such as an HDD (Hard Disk Drive) or a flash memory, although other types of storage devices may be used. Each program may be stored in the storage unit 16 in advance and copied to the memory 15 when the CPU 13 executes it. At least part of the data stored in the storage unit 16 may also be copied temporarily to the memory 15 as needed.

The document processing apparatus 10 can be connected to the file server 19 via the communication network 18. The file server 19 is a computer connected to the communication network 18, and one or more file servers 19 exist.

FIG. 1 shows an example in which the document processing apparatus 10 is realized by a single computer, but a similar function can be realized by a plurality of computers. For example, each dictionary stored in the storage unit 16 may instead be stored in one of the file servers 19 and transmitted and received via the communication program 152 and the communication network 18. Alternatively, the CAD data to be changed or the change instruction may be stored in any one of the file servers 19 and transmitted and received via the communication program 152 and the communication network 18. In the following, for simplicity of explanation, it is assumed that each piece of document data is stored on the file server 19.

<Embodiment 1: Operation procedure>
FIG. 2 is a processing flowchart showing the overall operation of the document processing apparatus 10. The operation of the document processing apparatus 10 is divided into a learning phase and an estimation phase. The general operation of each phase will be described below.

(FIG. 2: Learning phase: Step S201)
The document processing apparatus 10 first performs the learning phase. In the learning phase, the document processing apparatus 10 acquires from the file server 19, for example by using the communication program 152, a change instruction 191 that instructed change content for CAD data in the past. The communication program 152 corresponds to the “document acquisition program” and “document acquisition unit” in the first embodiment.

(FIG. 2: Learning phase: Step S202)
The document analysis program 153 analyzes the change instruction 191 acquired in step S201 and extracts the dependency relationship between words and phrases. Details of this step will be described later with reference to FIG.

(FIG. 2: Learning phase: Step S203)
The learning program 154 calculates the feature amount of the change instruction 191 acquired in step S201 based on the result of the document analysis in step S202. Examples of the feature amount calculated in this step will be described later with reference to FIGS. 8 to 10 and FIGS. 12 to 14.

(FIG. 2: Learning phase: Step S204)
The document processing apparatus 10 acquires design data 192 (for example, CAD data) from the file server 19, for example by using the communication program 152. Using the feature amount of the change instruction 191 extracted in step S203, the learning program 154 learns the correspondence between the change content instructed by the change instruction 191 and the changed portion of the design data 192 changed by that change content. The learning result is stored in the learning model 164.

(FIG. 2: Estimation phase: steps S205 to S207)
The document processing apparatus 10 performs the same processing as steps S201 to S203 on the change instruction 191 describing the new change contents that have not been learned.

(FIG. 2: Estimation phase: Step S208)
The change target estimation program 155 uses the learning result accumulated as the learning model 164 to estimate the range in which the design data 192 is affected by the change content instructed by the new change instruction 191. For example, it identifies the learned feature quantity vector closest to the feature quantity vector of the new change instruction 191, and it can estimate that the design parameter specified by the past change instruction 191 associated with that feature quantity vector will also be changed by the new change instruction 191. The estimation result is output via an output unit such as the display device 12 or the printing device 14.
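
The nearest-vector estimation described for step S208 can be illustrated with a minimal sketch. It assumes that closeness is measured by cosine similarity and that the learning model 164 can be represented as a list of (feature vector, changed design parameters) pairs; neither assumption is stated in the description, and all function names, parameter names, and values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def estimate_changed_parameters(new_vector, learned_examples):
    """Return the design parameters linked to the most similar learned vector.

    learned_examples is a hypothetical stand-in for the learning model 164:
    a list of (feature_vector, changed_parameters) pairs.
    """
    _, best_parameters = max(
        learned_examples,
        key=lambda example: cosine_similarity(new_vector, example[0]),
    )
    return best_parameters

# Hypothetical learned data from past change instructions.
learned_examples = [
    ([1.0, 0.0, 0.5], ["pipe_diameter"]),
    ([0.0, 1.0, 0.0], ["valve_position", "support_height"]),
]
estimated = estimate_changed_parameters([0.9, 0.1, 0.4], learned_examples)
```

Here the new vector is far closer to the first learned vector, so the sketch reports the parameters of that past instruction as the estimated change targets.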

<Embodiment 1: Details of Operation Procedure>
FIG. 3 is a processing flowchart for explaining details of step S202. Hereinafter, each step of FIG. 3 will be described.

(FIG. 3: Steps S2021 to S2022)
The document analysis program 153 extracts the text (character string) portion from the change instruction 191 (S2021). The document analysis program 153 then sorts the text into the part related to the change content and the remaining part (for example, an introductory part explaining the background), following a procedure described later with reference to FIG. 4 (S2022).

(FIG. 3: Step S2023)
The document analysis program 153 performs language analysis described with reference to FIG. 6 described later on the text instructing the content of change extracted in step S2022.

(FIG. 3: Step S2024)
The document analysis program 153 uses the template dictionary 163 to extract words from the text that indicates the change contents extracted in step S2022. This step is for extracting words from text described in a format that is difficult to extract by the language analysis in step S2023. Details of this step will be described later with reference to FIG.

(FIG. 3: Step S2025)
The document analysis program 153 uses the change word dictionary 161 to extract, as a keyword, a word that is assumed to contain an instruction for causing a change location in the design data 192. For example, words such as “change” and “correct” are considered to have a high possibility of instructing a design change with respect to the design data 192. Therefore, in this step, these words are extracted as keywords. This step has a significance as preparation for determining the feature amount of each word, and also has a significance to determine a processing start location in the next step S2026.

(FIG. 3: Step S2026)
The document analysis program 153 searches the text forward or backward within a predetermined range using the keyword extracted in step S2025 as a starting point, and extracts a phrase that is assumed to be the target of the keyword. Details of this step will be described later with reference to FIG.

(FIG. 3: Step S2027)
Using the general term dictionary 162, the document analysis program 153 excludes words that are assumed not to affect the changed portion of the design data 192 from the text extracted in step S2022. Details of this step will be described later with reference to FIG.

(FIG. 3: Steps S2025 to S2027: Supplement)
Steps S2025 to S2027 are processes performed to increase the degree of correlation between the text extracted in step S2022 and the changed portion of the design data 192, and can be omitted. An example in which these steps are omitted will be described later with reference to FIG. The document analysis program 153 outputs the text extracted by the above steps to the next step S203.

FIG. 4 is a diagram showing a text example of the change instruction 191 and a processing image of step S2022. The change instruction 191 does not necessarily describe only the changes to the design data 192, but usually includes other text. In step S2022, the document analysis program 153 sorts the change instruction 191 into the change content portion 1912 and the other portion 1911. For example, the following method can be considered as the sorting standard.

(Fig. 4: Example of determination method 1)
The determination is based on whether the sentence being processed in step S2022 contains a phrase that typically appears in sentences describing change content, or in sentences describing other matters. For example, phrases such as “to no” and “to accompany” suggest a causal relationship in Japanese, so it is assumed that words related to the changed portion of the design data 192 follow such a phrase. The text after such a phrase, taken as a boundary, can therefore be presumed to describe the change content. A similar method is used in FIG. 9, described later.

(Fig. 4: Example of determination method 2)
A similarity determination, such as sentence pattern matching, is performed between the sentences describing change content or other matters in past change instructions 191 and the sentence being processed in step S2022. The content of the text is then determined based on the degree of similarity.

FIG. 5 is a flowchart showing an example of processing in step S2022. Here, determination method 1 described with reference to FIG. 4 is assumed. Steps S20221 to S20223 are the same as steps S20231 to S20233 in FIG. 6. In step S20224, the document analysis program 153 refers to a keyword dictionary in which phrases typically included in sentences describing change content or other matters are registered, and determines whether the phrases extracted in steps S20221 to S20223 correspond to them. In step S20225, the document analysis program 153 sorts the text according to the determination result of step S20224.
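
As a rough illustration of determination method 1, the sketch below splits a sentence at the first boundary phrase found and treats the text after it as the change content. The English boundary phrases and the sample sentence are hypothetical stand-ins for the Japanese phrases registered in the keyword dictionary, and the morphological analysis of steps S20221 to S20223 is replaced by a plain substring search for brevity.

```python
# Hypothetical keyword dictionary of phrases that mark a causal boundary.
BOUNDARY_PHRASES = ["therefore", "accordingly"]

def split_change_instruction(text, boundary_phrases=BOUNDARY_PHRASES):
    """Split text into (other_part, change_content_part) at the first boundary phrase.

    If no boundary phrase is found, the whole text is treated as change content.
    """
    lowered = text.lower()
    hits = [
        (lowered.find(phrase), phrase)
        for phrase in boundary_phrases
        if lowered.find(phrase) != -1
    ]
    if not hits:
        return "", text.strip()
    position, phrase = min(hits)  # earliest boundary phrase in the text
    other_part = text[:position].strip(" ,")
    change_part = text[position + len(phrase):].strip(" ,")
    return other_part, change_part

other, change = split_change_instruction(
    "The pump interferes with the existing pipe, therefore move the pump 300 mm east."
)
```

In this sketch `other` plays the role of the other portion 1911 and `change` the role of the change content portion 1912.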

FIG. 6 is a flowchart for explaining the details of step S2023. In step S20231, the document analysis program 153 performs morphological analysis of the input text using a morpheme dictionary. In step S20232, the document analysis program 153 performs dependency analysis on the input text using a dependency dictionary. In step S20233, the document analysis program 153 identifies the stop words in the input text using a stop word dictionary.

Through the above processing, the document analysis program 153 extracts words, phrases, and their dependency relationships from the input text, and outputs the results. Each dictionary can be stored in the storage unit 16 in advance, for example.

FIG. 7 is a diagram showing a text example of the change instruction 191 written in a special format and a processing image of step S2024. The change instruction 191 shown in FIG. 7 is described using parentheses “()”, a colon “:”, and an arrow “→”.

Words in text written using these special symbols are difficult to extract by language analysis, yet experience suggests that such text often describes matters related to changes. Therefore, the document analysis program 153 recognizes the special formats by using the template dictionary 163, in which these formats are defined in advance, and extracts the words they contain.

The template dictionary 163 shown in FIG. 7 illustrates two templates. Template 1 defines that the text should be divided into the word immediately before the colon and the text after it, and the latter should be divided into words described before and after the arrow. Template 2 defines that the text should be divided into the word immediately before the opening parenthesis and the text within the parenthesis, and the latter should be divided into words described before and after the arrow.

For example, assume that template 1 defines that the word immediately before the colon should be extracted, and that template 2 defines that the word immediately before the opening parenthesis and the word immediately before the arrow inside the parentheses should be extracted. In this case, each word can be extracted from the change instruction 191 shown in FIG. 7 as indicated by reference numeral 191' in FIG. 7.

Note that the template dictionary 163 shown in FIG. 7 is an example, and any template can be defined as long as it can be defined by text patterns such as characters and symbols.
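
One way to picture such a template dictionary is as a list of regular expressions, one per special format. The sketch below is an assumption for illustration only: the patterns, group names, and sample lines are hypothetical, and it extracts the target word together with the values before and after the arrow, which is slightly more than the extraction rule given as an example above.

```python
import re

# Hypothetical template dictionary: each template is a regular expression with
# named groups for the target word and the values before and after the arrow.
TEMPLATES = [
    # Template 1: "target: before → after"
    re.compile(
        r"(?P<target>[^:()\n]+?)\s*:\s*(?P<before>[^→\n]+?)\s*→\s*(?P<after>[^\n]+)"
    ),
    # Template 2: "target (before → after)"
    re.compile(
        r"(?P<target>[^:()\n]+?)\s*\(\s*(?P<before>[^→()\n]+?)\s*→\s*(?P<after>[^()\n]+?)\s*\)"
    ),
]

def extract_with_templates(line, templates=TEMPLATES):
    """Return the named parts of the first template that matches the whole line, else None."""
    for template in templates:
        match = template.fullmatch(line.strip())
        if match:
            return {name: value.strip() for name, value in match.groupdict().items()}
    return None

colon_result = extract_with_templates("Pipe diameter: 50 mm → 65 mm")
paren_result = extract_with_templates("Support height (1200 → 1350)")
```

New formats can be supported by appending further patterns to the list, which mirrors the remark that any template expressible as a text pattern can be defined.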

FIG. 8 is a diagram showing a processing image of step S203 when steps S2025 to S2027 are assumed to be omitted. As a result of step S202, the change instruction 191 is divided into individual words, as indicated by reference numeral 191'' in FIG. 8. The learning program 154 assigns an appropriate feature amount to each word and outputs the result as a feature quantity vector 1541 whose number of dimensions equals the number of words. For example, because the sentence illustrated in FIG. 8 consists of 21 words, its features can be expressed as a 21-dimensional feature quantity vector.

The specific numerical value of the feature amount of each word in the feature quantity vector 1541 may be, for example, a value that quantifies the degree to which the word causes the design data 192 to be changed, based on the degree of correlation between the word and the changed portion. This degree of correlation may be determined from empirical rules or from statistics on past results.
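
A minimal sketch of this per-word feature assignment follows. It assumes the correlation degrees are available as a lookup table; the table contents, the default value, and the sample words are hypothetical.

```python
# Hypothetical correlation degrees between words and changed portions,
# for example derived from empirical rules or past statistics.
WORD_CORRELATION = {"pump": 0.9, "position": 0.8, "change": 0.6}

def build_feature_vector(words, correlation=WORD_CORRELATION, default=0.0):
    """Return one feature amount per word, so the dimension equals the word count."""
    return [correlation.get(word, default) for word in words]

words = ["pump", "position", "to", "change"]
feature_vector = build_feature_vector(words)
```

The resulting vector has four elements, one per word, in the same way that the 21-word sentence of FIG. 8 yields a 21-dimensional vector.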

FIG. 9 is a diagram showing a processing image of step S2026. It is assumed that the phrases existing before and after the keyword registered in the change word dictionary 161 have a high correlation with the changed part of the design data 192. Therefore, the document analysis program 153 divides the change instruction 191 into each word, and then searches for the word forward or backward starting from the keyword. The learning program 154 gives a larger feature amount to words within the range than other words. As a result, the degree of correlation between the change location of the design data 192 and each word in the change instruction 191 can be expressed more accurately as the feature quantity vector 1541.

The change word dictionary 161 shown in FIG. 9 illustrates two types of keywords. Since the keyword 1 is a phrase that suggests a causal relationship in Japanese, it is assumed that there is a phrase related to the changed part of the design data 192 behind it. Since keyword 2 is a phrase that suggests a change in Japanese, it is assumed that there is a phrase related to the changed part in front of it. In this way, whether to search forward or backward of a keyword differs depending on the type of keyword, so it may be defined as a set with a keyword.
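
The pairing of each keyword with a search direction can be sketched as follows. The keyword entries, the window size, and the two weight values are hypothetical, and the sample word list is an English gloss arranged in Japanese word order, where the object precedes the verb.

```python
# Hypothetical change word dictionary: keyword -> direction in which related words are sought.
CHANGE_WORDS = {
    "therefore": "after",   # causal keyword: related words follow it
    "change": "before",     # change keyword: related words precede it
}

def weight_words(words, change_words=CHANGE_WORDS, window=2, high=1.0, low=0.1):
    """Give a larger feature amount to words within `window` positions of a keyword."""
    weights = [low] * len(words)
    for index, word in enumerate(words):
        direction = change_words.get(word)
        if direction == "after":
            targets = range(index + 1, min(len(words), index + 1 + window))
        elif direction == "before":
            targets = range(max(0, index - window), index)
        else:
            continue
        for target in targets:
            weights[target] = high
    return weights

# English gloss of a sentence in Japanese word order.
words = ["interference", "found", "therefore", "pump", "position", "change"]
weights = weight_words(words)
```

Both keywords point at "pump" and "position", so only those two words receive the larger feature amount.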

FIG. 10 is a diagram showing a processing image of step S2027. It is assumed that a general term (for example, a particle, an idiomatic term in the field) necessary for composing a sentence has a low correlation with a changed part of the design data 192. Even if the feature quantity vector 1541 is configured using such general terms, it is considered that the feature quantity vector 1541 cannot sufficiently express the correspondence between the change instruction 191 and the change location of the design data 192.

Therefore, the document analysis program 153 deletes the general terms registered in the general term dictionary 162 from the change instruction 191 in advance. The learning program 154 configures the feature quantity vector 1541 using the text after deleting the general terms. Thereby, the correlation degree between the feature-value vector 1541 and a change location can be raised, and both correspondence can be expressed more appropriately.
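
The deletion of general terms amounts to a dictionary-based filter, sketched below. The contents of the dictionary and the sample words are hypothetical English stand-ins.

```python
# Hypothetical general term dictionary (particles and field idioms in the original Japanese).
GENERAL_TERMS = {"the", "of", "to", "please"}

def remove_general_terms(words, general_terms=GENERAL_TERMS):
    """Drop words registered as general terms before the feature vector is built."""
    return [word for word in words if word.lower() not in general_terms]

filtered = remove_general_terms(["Please", "change", "the", "position", "of", "pump"])
```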

FIG. 11 is a flowchart showing a processing example of steps S2026 to S2027. Because steps S20261 to S20263 are the same as steps S20231 to S20233, they can be omitted by reusing the results of those steps. In steps S20264 to S20265, the document analysis program 153 performs the processing described with reference to FIGS. 9 and 10.

FIG. 12 is a diagram illustrating another configuration example of the similarity vector 1541 used by the learning program 154. The similarity vector 1541 is used to learn the correspondence between the change contents of the design data 192 indicated by the change instruction 191 and the changed portion of the design data 192 changed by the change contents. Therefore, as long as it has the same function, configurations other than those exemplified in FIGS. 8 to 10 can be adopted.

In the example shown in FIG. 12, for each design parameter (for example, a CAD parameter) whose change is instructed by the change instruction 191, the similarity between the instructing text and the text of each past change instruction that instructed a change to each design parameter is obtained, and these similarities are used as the vector elements of the similarity vector 1541.

The field 15411 holds the text in the change instruction 191 that instructs a change to the design data 192. The field 15412 holds the design parameter that the text in the field 15411 instructs to change. The field 15413 holds the similarity between the text in the field 15411 and the text in past change instructions 191 that instructed a change to each design parameter (design parameters 1 to n are illustrated).

According to the configuration example of the similarity vector 1541 illustrated in FIG. 12, the degree of correlation with other design parameters can be learned through the design parameter to be changed. The same applies to the configuration example in FIG. 13 described below.
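
The content of field 15413 can be pictured with the sketch below, which computes one similarity per design parameter. The description does not name a similarity measure, so word-level Jaccard similarity is assumed here, and when several past texts exist for a parameter the largest similarity is taken, which is also an assumption. All parameter names and texts are hypothetical.

```python
def jaccard(text_a, text_b):
    """Word-level Jaccard similarity between two texts (0.0 if either has no words)."""
    set_a, set_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def similarity_vector(text, past_texts_by_parameter):
    """Return one similarity per design parameter, ordered by parameter name."""
    return [
        max((jaccard(text, past) for past in past_texts), default=0.0)
        for _, past_texts in sorted(past_texts_by_parameter.items())
    ]

# Hypothetical past change instruction texts, grouped by the design parameter they changed.
past_texts_by_parameter = {
    "pipe_diameter": ["change pipe diameter"],
    "pump_position": ["move pump position east"],
}
vector = similarity_vector("change pipe diameter to 65", past_texts_by_parameter)
```

The new text shares three of five distinct words with the past text for the pipe diameter and none with the other, so the first element is large and the second is zero.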

FIG. 13 is a diagram illustrating another configuration example of the similarity vector 1541 used by the learning program 154. In the example shown in FIG. 12, a similarity is obtained individually between the change instruction 191 and each past sentence, including past sentences that instructed changes to design parameters not related to the change instruction 191. In the example shown in FIG. 13, the design parameters that are not related to the change instruction 191 are instead aggregated into a single similarity, for example the average of the similarities between each such past sentence and the change instruction 191.

The field 15414 holds the similarity between the text in the field 15411 and the text in past change instructions 191 that instructed a change to the same design parameter (that is, the design parameter in the field 15412). The field 15415 holds the similarity between the text in the field 15411 and the text in past change instructions 191 that instructed a change to other design parameters (that is, design parameters other than the one in the field 15412).
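
The aggregation into fields 15414 and 15415 can be sketched as follows, assuming that a similarity has already been computed for each design parameter and that the average is used for the other parameters. The parameter names and similarity values are hypothetical.

```python
from statistics import mean

def aggregate_similarities(similarity_by_parameter, target_parameter):
    """Return [similarity to the same parameter, mean similarity to all other parameters]."""
    same = similarity_by_parameter[target_parameter]
    others = [
        value
        for parameter, value in similarity_by_parameter.items()
        if parameter != target_parameter
    ]
    return [same, mean(others) if others else 0.0]

# Hypothetical per-parameter similarities for one change instruction text.
similarity_by_parameter = {"pipe_diameter": 0.8, "pump_position": 0.25, "valve_type": 0.75}
aggregated = aggregate_similarities(similarity_by_parameter, "pipe_diameter")
```

The vector stays two-dimensional regardless of how many design parameters exist, which is the practical difference from the layout of FIG. 12.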

FIG. 14 is a diagram illustrating another configuration example of the similarity vector 1541 used by the learning program 154. The field 15416 holds the similarity between the text in the field 15411 and the text in past change instructions 191 that instructed a change to the same design parameter (that is, the design parameter in the field 15412). In this example, however, a plurality of different methods are used to calculate the similarity, and the similarity obtained with each method is used as a vector element.
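
The idea of using several similarity measures side by side can be sketched as below. The description does not say which measures are used; word-level Jaccard similarity and character-bigram Dice similarity are chosen here purely as examples, and all names are hypothetical.

```python
def word_jaccard(text_a, text_b):
    """Word-level Jaccard similarity (0.0 if either text has no words)."""
    set_a, set_b = set(text_a.lower().split()), set(text_b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def char_bigram_dice(text_a, text_b):
    """Dice coefficient over character bigrams (0.0 if either text has no bigrams)."""
    def bigrams(text):
        lowered = text.lower()
        return {lowered[i:i + 2] for i in range(len(lowered) - 1)}
    grams_a, grams_b = bigrams(text_a), bigrams(text_b)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))

# Hypothetical list of similarity methods; each contributes one vector element.
SIMILARITY_METHODS = [word_jaccard, char_bigram_dice]

def multi_method_vector(text, past_text, methods=SIMILARITY_METHODS):
    """Return one similarity per method for the same pair of texts."""
    return [method(text, past_text) for method in methods]

identical = multi_method_vector("change pipe diameter", "change pipe diameter")
disjoint = multi_method_vector("abc", "xyz")
```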

<Embodiment 1: Summary>
As described above, the document processing apparatus 10 according to the first embodiment learns in advance the correspondence between the change content instructed by the change instruction 191 and the changed portion of the design data 192 changed by that change content, and estimates, based on the learning result, the portion where the design data 192 will be changed by a new change instruction 191. The range of influence of a change instruction can thereby be predicted using only the change instruction 191 written as sentences. This eliminates the need to manually extract design parameters from the change instruction 191 and reduces the workload and cost of predicting the influence range.

<Embodiment 2>
The first embodiment described an operation example of learning the range of influence on the design data 192 using past change instructions 191. In actual design work, however, the change content instructed by the change instruction 191 sometimes cannot be realized for some reason and is not reflected in the design data 192. Whether the design can be changed is generally described in an answer sheet that replies to the change instruction 191. Therefore, the second embodiment of the present invention describes an operation example that additionally learns the content of the answer sheet.

FIG. 15 is a processing flowchart showing the operation in the answer book learning phase of the document processing apparatus 10 according to the second embodiment. In the second embodiment, the document processing apparatus 10 additionally performs an answer book learning phase between the learning phase and the estimation phase. The answer book learning phase learns the influence that the answer sheet 194, which replies to the change instruction 191, has on the design data 192. It is similar to the learning phase but differs in that it learns from an answer sheet 194 replying that the design cannot be changed. The differences from the learning phase are mainly described below.

(FIG. 15: Answer book learning phase: step S1501)
The document processing apparatus 10 acquires from the file server 19, for example by using the communication program 152, an answer sheet 194 describing a reply to a past change instruction 191.

(FIG. 15: Answer book learning phase: Step S1502)
The document analysis program 153 analyzes the answer sheet 194 acquired in step S1501 and extracts the dependency relationships between words and phrases. This step is generally the same as step S202, except that it extracts the reply to the change instruction 191 rather than the change content for the design data 192. It is therefore necessary to extract words suggesting a reply to the change content, such as “unchangeable”, instead of words suggesting a change, such as “change”. The change word dictionary 161 stores these words in advance. Because the subject of the reply in the answer sheet 194 is considered to be a design parameter, words suggesting design parameters are extracted in the same way as in the first embodiment.

(FIG. 15: Answer book learning phase: step S1503)
The learning program 154 calculates the feature amount of the answer sheet 194 acquired in step S1501 based on the document analysis result in step S1502. The processing in this step is the same as that in step S203.

(FIG. 15: Answer book learning phase: step S1504)
Using the feature amount of the answer sheet 194 extracted in step S1503, the learning program 154 learns the correspondence between the reply content described in the answer sheet 194 and the changed portion of the design data 192 affected by that reply.

(FIG. 15: Answer book learning phase: Step S1504: Supplement)
If the reply is that the design change is permitted, the change content can be learned from the change instruction 191, and the answer sheet 194 adds nothing new to learn. If the reply is that the design change is rejected, the change content instructed by the change instruction 191 is not reflected in the design data 192, and the learning result based on that change instruction 191 needs to be cancelled. Therefore, the learning program 154 reflects the learning result based on the answer sheet 194 in the learning model 164 in the negative direction. For example, learning may be performed using the feature quantity vector 1541 with each vector element multiplied by -1 so that its sign is inverted.
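
The cancelling effect of the sign inversion can be illustrated with a deliberately simple model. The sketch assumes the learning model accumulates feature vectors per design parameter by addition; this additive model is an illustrative assumption and not the learning method of the description, and all names and values are hypothetical.

```python
def learn(model, parameter, feature_vector, sign=1.0):
    """Accumulate a feature vector for a design parameter; sign=-1.0 learns in the negative direction."""
    current = model.get(parameter, [0.0] * len(feature_vector))
    model[parameter] = [c + sign * f for c, f in zip(current, feature_vector)]

model = {}

# Learning phase: a past change instruction linked to a hypothetical parameter.
learn(model, "pump_position", [1.0, 0.5, 0.0])
after_instruction = list(model["pump_position"])

# Answer book learning phase: the change was rejected, so the same features
# are learned with every element multiplied by -1.
learn(model, "pump_position", [1.0, 0.5, 0.0], sign=-1.0)
after_rejection = list(model["pump_position"])
```

In this additive sketch the rejected instruction leaves no net contribution in the model, which corresponds to cancelling the earlier learning result.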

<Embodiment 2: Summary>
As described above, when the answer sheet 194 replies that the change content specified by the change instruction 191 is rejected, the document processing apparatus 10 according to the second embodiment performs machine learning in a direction that cancels the correspondence between the change content in the change instruction 191 and the changed portion of the design data 192. As a result, whether a design change is actually feasible is reflected in the learning result, and the influence range can be estimated more accurately.

<Embodiment 3>
In the first and second embodiments, the example in which the change instruction 191 is described by symbols and character strings has been described. However, the actual change instruction 191 may be described in various formats such as a table format. Therefore, in the third embodiment of the present invention, an example of operation for extracting a text portion by analyzing the format when analyzing the change instruction 191 will be described. Since the other configuration is the same as that of the first and second embodiments, the operation for analyzing the format of the change instruction 191 will be mainly described below.

FIG. 16 is a diagram showing a layout example of the change instruction 191 described in a table format. The text at the bottom of FIG. 16 is the same as that described in FIG. 4 of the first embodiment. However, since other cells precede it, it is necessary either to remove these cells or to specify the position of the text portion.

FIG. 17 is a processing flowchart for explaining details of step S202 in the third embodiment. Hereinafter, each step of FIG. 17 will be described.

(FIG. 17: Step S2028)
The document analysis program 153 executes the following steps S2028 and S2029 instead of step S2021. In step S2028, the document analysis program 153 converts the change instruction 191 into text with coordinates. The text with coordinates is document data in which coordinates, for example with the origin at the upper left of the document, are internally assigned to each text portion described in the change instruction 191. Since conversion to text with coordinates is a known technique, its details are omitted.

(FIG. 17: Step S2029)
The document analysis program 153 analyzes the layout of the change instruction 191 using the result of step S2028. Since the technique for analyzing the layout of a document using the text with coordinates is a known technique, the details thereof are omitted. The following steps are the same as those after step S2022 of the first embodiment.
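As a rough illustration of what text with coordinates might look like, the following sketch represents each text portion as a box with an upper-left origin and recovers the body text below a header region. The class `TextBox`, the functions, the `row_tolerance` value, and the rule of cutting at `header_bottom_y` are all assumptions for illustration; the embodiment leaves the actual layout analysis to known techniques.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    x: float   # distance from the left edge of the page
    y: float   # distance from the top edge (origin at the upper left)
    text: str


def reading_order(boxes, row_tolerance=5.0):
    """Sort boxes top to bottom, then left to right within the same row."""
    return sorted(boxes, key=lambda b: (round(b.y / row_tolerance), b.x))


def body_text(boxes, header_bottom_y):
    """Join the text located below the header cells (hypothetical rule)."""
    body = [b for b in boxes if b.y > header_bottom_y]
    return " ".join(b.text for b in reading_order(body))


boxes = [
    TextBox(200, 10, "No. 123"),
    TextBox(10, 12, "Subject"),
    TextBox(10, 100, "Change the pipe"),
    TextBox(120, 101, "diameter"),
]
text = body_text(boxes, header_bottom_y=50)
```

Here the header cells at the top of the table are skipped and only the text portion is passed on to the subsequent analysis.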

<Embodiment 4>
FIG. 18 is a configuration diagram of a document processing system 1000 according to the fourth embodiment of the present invention. The document processing system 1000 is a system having various additional functions in addition to the document processing apparatus 10 described in the first to third embodiments. The document processing system 1000 includes a storage 1100, an extraction processing device (hereinafter, ETL: Extract / Transform / Load) 1200, a content server 1300, a search server 1410, a metadata server 1420, the document processing apparatus 10 described in the first to third embodiments, and an application program 1500.

The storage 1100 is a storage device that stores various data, and functions as an alternative to the file server 19 in the first to third embodiments. For example, document data 1101, CAD data 1102, mail data 1103, and the like are stored. The change instruction 191 and the answer sheet 194 are also stored in the storage 1100.

The ETL 1200 is a device that extracts necessary items from each data stored in the storage 1100. The ETL 1200 stores the change location extraction program 1201, the association program 1202, and the association rule data 1203 in a storage device such as an HDD. These operations will be described later.

The content server 1300 is a server that provides various data to a host server group, and stores the metadata 1301 and the content data 1302 in a storage device such as an HDD. The metadata 1301 is data describing attribute information of the content data 1302.

The search server 1410 and the metadata server 1420 are servers that search the content data 1302 or the metadata 1301 in response to a request from the application program 1500. The search server 1410 searches the content data 1302 using the index 1411 extracted from the content data 1302. The metadata server 1420 searches the metadata 1301 using the database 1421 obtained by indexing the metadata 1301.

The application program 1500 is executed by an appropriate computer and provides various functions according to its use. For example, in addition to the search request, the document processing apparatus 10 can be instructed to estimate the range of influence by the change instruction 191 and the result can be received and presented to the user. A design application such as a CAD tool, a workflow application that transmits / receives a change instruction 191 or an answer 194, and the like can also be included.

The document processing apparatus 10 and other servers and apparatuses can be configured integrally. For example, each program installed in the ETL 1200 may be mounted on the document processing apparatus 10 and the document processing apparatus 10 may execute these programs.

<Embodiment 4: Function of ETL1200>
In the first to third embodiments, the learning of the correspondence between the change instruction 191 and the changed portion of the design data 192 has been described. However, when the design data 192 is described in a special format such as CAD data, for example, there is a possibility that a location where the design data 192 has been changed in the past cannot be easily specified. Therefore, in the fourth embodiment, the ETL 1200 extracts a changed portion of the design data 192 to reduce the burden on the operator.

In addition, as a premise for learning the correspondence between the change content in the change instruction 191 and the changed portion of the design data 192, it is necessary to specify which design data 192 the change instruction 191 instructed to change. However, when the change instruction 191 and the design data 192 are created through different systems and workflows, it may become difficult for another worker to grasp the correspondence between the two after the design process is completed. Therefore, in the fourth embodiment, the ETL 1200 supports the process of associating the change instruction 191 with the design data 192 in advance as a pre-process, thereby reducing the burden on the operator.

FIG. 19 is a processing flow diagram showing the overall operation of the document processing system 1000. The document processing system 1000 newly performs steps S1901 to S1902 in addition to the operation of the document processing apparatus 10 described in the first to third embodiments. Hereinafter, these steps will be described.

(FIG. 19: Learning phase: Step S1901)
The ETL 1200 executes the change location extraction program 1201 on its CPU and extracts the locations where the design data 192 was changed by past change instructions 191. Details of this step will be described later with reference to FIG. 20.

(FIG. 19: Learning phase: Step S1902)
The ETL 1200 executes the association program 1202 on its CPU and associates each past change instruction 191 with the design data 192 changed by it. Details of this step will be described later with reference to FIG. 21.

FIG. 20 is a diagram showing a processing image of step S1901. Here, the case where the design data 192 is CAD data is illustrated. Since CAD data is drawing data, it is necessary to convert the data into log data dumped in a text format in order to specify the changed portion. In this example, the log data is taken into the spreadsheet software.

In FIG. 20, it is assumed that the design data 192 has been changed by the change instruction 191 as indicated by reference numeral 192′. In general, the difference between the two could be taken in order to identify the changed portion, but for CAD data the changed portion cannot always be identified by simply taking the difference. This is because the part number 1921 may be changed according to the work process (for example, every month).

However, even if the part number 1921 is changed, the arrangement order normally stays the same as long as the parts are the same. The change location extraction program 1201 uses this property to extract the changed portion of the design data 192. Specifically, the design data 192 and 192′ before and after the change are each sorted by the part number 1921, and the rows with the highest degree of coincidence in the other columns (for example, the part names shown in the AE column in FIG. 20) are estimated to correspond to each other. The degree of coincidence may be obtained by a known method such as DP matching. To extract the changed portion more reliably, the user may visually check the result extracted by the change location extraction program 1201 and correct it as necessary.
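The matching described above can be sketched as follows, assuming the log data has already been loaded as rows with the hypothetical column names `part_no`, `name`, and `length`. Python's `difflib.SequenceMatcher` is used here as a stand-in for DP matching; it is not the method the embodiment specifies, and the function `extract_changes` is illustrative only.

```python
from difflib import SequenceMatcher


def extract_changes(before, after, key="part_no", match_col="name"):
    """Align rows by match_col after sorting by key, then report differing columns."""
    b = sorted(before, key=lambda row: row[key])
    a = sorted(after, key=lambda row: row[key])
    matcher = SequenceMatcher(
        None,
        [row[match_col] for row in b],
        [row[match_col] for row in a],
        autojunk=False,
    )
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Rows added, removed, or replaced between the two versions.
            changes.append({"kind": tag, "before": b[i1:i2], "after": a[j1:j2]})
            continue
        for row_b, row_a in zip(b[i1:i2], a[j1:j2]):
            # The part number is ignored because it may have been renumbered.
            cols = sorted(c for c in row_b if c != key and row_b[c] != row_a.get(c))
            if cols:
                changes.append(
                    {"kind": "modified", match_col: row_b[match_col], "columns": cols}
                )
    return changes


# Illustrative data: part numbers were renumbered, and one length was changed.
before = [
    {"part_no": "A-001", "name": "bracket", "length": 100},
    {"part_no": "A-002", "name": "pipe", "length": 500},
]
after = [
    {"part_no": "B-001", "name": "bracket", "length": 100},
    {"part_no": "B-002", "name": "pipe", "length": 650},
]
changes = extract_changes(before, after)
```

A plain difference of the two tables would flag every row because the part numbers differ, whereas aligning on the arrangement order isolates the row whose other columns actually changed.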

FIG. 21 is a diagram showing a processing image of step S1902. The change instruction 191 and the design data 192 are usually created through different workflows and applications, so their formats do not necessarily match, and associating them imposes a corresponding work load.

However, design information such as a part number included in the design data 192 should be described in some form in the change instruction 191. Since CAD data is drawing data, it is considered that CAD data is often specified as a drawing number in the change instruction 191. Therefore, the association program 1202 assumes that the design data 192 including the drawing number described in the change instruction 191 corresponds to the change instruction 191 and presents the fact to the user. The user can visually check the presentation and associate the two more accurately.

Since the drawing number is typically described in the subject or body text of the change instruction 191, the association program 1202 may preferentially check the correspondence between these parts and the design data 192. In addition, even if the drawing numbers are the same, the relationship between the two is considered weak when their dates and times are far apart. Therefore, the two may be associated with each other only when their dates and times (the date and time described in the document or the update date and time) are within a predetermined range (for example, within one week).

A rule as to which item in the change instruction 191 is associated with which item in the design data 192 can be defined in advance as the association rule data 1203.
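A minimal sketch of such an association is shown below. The drawing number pattern, the field names (`subject`, `body`, `date`, `drawing_no`, `updated`, `id`), and the function `candidates` are hypothetical; the pattern and the one-week window stand in for what the association rule data 1203 might define.

```python
import re
from datetime import date, timedelta

# Hypothetical drawing number format, e.g. "DW-1234".
DRAWING_NO = re.compile(r"[A-Z]{2}-\d{4}")


def candidates(instruction, design_files, window=timedelta(days=7)):
    """Return IDs of design data that the change instruction likely refers to."""
    # The subject and body text are the parts checked for drawing numbers.
    text = instruction["subject"] + "\n" + instruction["body"]
    numbers = set(DRAWING_NO.findall(text))
    return [
        d["id"]
        for d in design_files
        if d["drawing_no"] in numbers
        and abs(d["updated"] - instruction["date"]) <= window
    ]


instruction = {
    "subject": "Design change for DW-1234",
    "body": "Increase the pipe diameter.",
    "date": date(2012, 10, 1),
}
design_files = [
    {"id": "cad-1", "drawing_no": "DW-1234", "updated": date(2012, 10, 4)},
    {"id": "cad-2", "drawing_no": "DW-1234", "updated": date(2012, 12, 1)},
    {"id": "cad-3", "drawing_no": "DW-9999", "updated": date(2012, 10, 2)},
]
matched = candidates(instruction, design_files)
```

The result is only a list of candidates to present to the user, who can then confirm or correct the association.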

<Embodiment 4: Summary>
As described above, in the document processing system 1000 according to the fourth embodiment, the ETL 1200 extracts a portion where the design data 192 has been changed by the past change instruction 191. Thereby, even if the design data 192 is described in a special format, it is possible to efficiently learn the correspondence between the change instruction 191 and the changed portion.

In the document processing system 1000 according to the fourth embodiment, the ETL 1200 also associates each past change instruction 191 with the design data 192 changed by it. Thereby, even if the two are created separately, the work burden on an operator who associates them can be reduced.

The present invention is not limited to the above-described embodiment, and includes various modifications. The above embodiment has been described in detail for easy understanding of the present invention, and is not necessarily limited to the one having all the configurations described. A part of the configuration of one embodiment can be replaced with the configuration of another embodiment. The configuration of another embodiment can be added to the configuration of a certain embodiment. Further, with respect to a part of the configuration of each embodiment, another configuration can be added, deleted, or replaced.

For example, in the above embodiment, the example in which the change is instructed to the design data 192 such as the CAD data has been described, but the present invention can also be applied to the case where the change is instructed with a document for other data.

The above components, functions, processing units, processing means, etc. may be realized in hardware by designing some or all of them, for example, with an integrated circuit. Each of the above-described configurations, functions, and the like may be realized by software by interpreting and executing a program that realizes each function by the processor. Information such as programs, tables, and files for realizing each function can be stored in a recording device such as a memory, a hard disk, an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.

10: Document processing device, 11: Input device, 12: Display device, 13: CPU, 14: Printing device, 15: Memory, 151: OS, 152: Communication program, 153: Document analysis program, 154: Learning program, 155: Change target estimation program, 16: Storage unit, 161: Change word dictionary, 162: General term dictionary, 163: Template dictionary, 164: Learning model, 18: Communication network, 19: File server, 1000: Document processing system, 1100: Storage, 1200: Extraction processing device, 1201: Change location extraction program, 1202: Association program, 1203: Association rule data, 1300: Content server, 1410: Search server, 1420: Metadata server, 1500: Application program.

Claims

A document processing program for causing a computer to process a document, the program causing the computer to execute:
a document acquisition step of acquiring document data that instructs a change to data;
a learning step of machine-learning a correspondence relationship between the change content instructed by the document data and the changed portion of the data changed by the change content; and
an estimation step of estimating, based on the result of the machine learning, the portion of the data that will be changed by the change content instructed by new document data.
The document processing program according to claim 1, wherein
the document processing program further causes the computer to execute a document analysis step of extracting at least one of words described in the document data or dependency relationships between phrases, and
in the learning step, the computer machine-learns the correspondence relationship by obtaining a degree of correlation between at least one of the words or the phrase dependency relationships extracted in the document analysis step and the changed portion of the data changed by the document data.
The document processing program, wherein
the document processing program further causes the computer to execute an answer document acquisition step of acquiring an answer document describing an answer to the change content for the data instructed by the document data,
in the document analysis step, the computer extracts, from the text described in the answer document, the portion describing an answer that the change content is not permitted, and
in the learning step, the computer performs the machine learning in a direction that weakens the correspondence relationship when text describing an answer that does not permit the change content exists in the answer document.
The document processing program, wherein
the document processing program further causes the computer to execute a step of extracting a keyword included in the document data by referring to a word dictionary in which keywords defined as words that cause a changed portion of the data are registered, and
in the learning step, the computer machine-learns the correspondence relationship by obtaining a degree of correlation between the keyword extracted from the document data and the changed portion of the data changed by the document data.
The document processing program according to claim 2, wherein
the document processing program further causes the computer to execute a step of extracting a keyword included in the document data by applying, to the document data, a template describing a text pattern that appears around a keyword defined as a word that causes a changed portion of the data, and
in the learning step, the computer machine-learns the correspondence relationship by obtaining a degree of correlation between the keyword extracted from the document data using the template and the changed portion of the data changed by the document data.
The document processing program according to claim 2, wherein
the document processing program further causes the computer to execute a step of extracting, from the document data, a keyword defined as a word that causes a changed portion of the data, and
in the learning step, the computer machine-learns the correspondence relationship by obtaining a degree of correlation between a word included in the document data within a predetermined range from the keyword and the changed portion of the data changed by the document data.
The document processing program according to claim 2, wherein
the document processing program further causes the computer to execute a step of identifying general terms included in the document data by referring to a general term dictionary in which general terms defined as words that do not cause a changed portion of the data are registered, and
in the learning step, the computer machine-learns the correspondence relationship by obtaining a degree of correlation between the text described in the document data, excluding the general terms, and the changed portion of the data changed by the document data.
The document processing program according to claim 3, wherein
the document processing program further causes the computer to execute a layout analysis step of analyzing the layout of the document data and specifying, in the layout obtained by the analysis, the portion describing the change content, and
in the learning step, the computer machine-learns the correspondence relationship by obtaining a degree of correlation between the change content specified in the layout analysis step and the changed portion of the data changed by the document data.
The document processing program according to claim 2, wherein
the document processing program further causes the computer to execute a step of extracting, from the text included in the document data, a delimiter phrase located at the boundary between the portion describing the change content and the portion preceding it, and
in the learning step, the computer machine-learns the correspondence relationship by obtaining a degree of correlation between the text described in the document data, excluding the portion before the delimiter phrase, and the changed portion of the data changed by the document data.
The document processing program according to claim 1, wherein
in the learning step, the computer performs the machine learning using a feature vector that describes, for each of a plurality of pieces of the document data, the similarity between that document data and other document data whose change content instructs a change to the same changed portion.
The document processing program according to claim 1, wherein
in the document acquisition step, the computer acquires the document data instructing change content for CAD data,
in the learning step, the computer machine-learns the correspondence relationship between the change content and the changed portion of the CAD data changed by the change content, and
in the estimation step, the computer estimates, based on the result of the machine learning, the portion of the CAD data that will be changed by the change content for the CAD data instructed by the new document data.
A document processing apparatus for processing a document, comprising:
a document acquisition unit that acquires document data instructing a change to data;
a learning unit that machine-learns a correspondence relationship between the change content instructed by the document data and the changed portion of the data changed by the change content; and
an estimation unit that estimates, based on the result of the machine learning, the portion of the data that will be changed by the change content instructed by new document data.
A document processing system comprising:
the document processing apparatus according to claim 12; and
an extraction processing device that extracts the portion where the data has been changed by the change content instructed by the document data, wherein
the data is configured as CAD data describing design information of the parts that make up an article, and
the extraction processing device takes out, as change records, the part numbers and the arrangement order of the parts before and after the CAD data is changed by the document data, and,
if the part numbers do not match before and after the change, extracts the portion where the CAD data has been changed by identifying the records in the change records whose arrangement order of the parts is most similar.
The document processing system according to claim 13, wherein
the extraction processing device associates the document data with the CAD data based on the subject, the drawing number, and whether the update date and time of the CAD data and the update date and time of the document data are within a predetermined range of each other, and
the document processing apparatus learns the correspondence relationship using the document data and the CAD data associated with each other by the extraction processing device.
A document processing method for processing a document, comprising:
a document acquisition step of acquiring document data that instructs a change to data;
a learning step of machine-learning a correspondence relationship between the change content instructed by the document data and the changed portion of the data changed by the change content; and
an estimation step of estimating, based on the result of the machine learning, the portion of the data that will be changed by the change content instructed by new document data.