WO2014064803A1 - Document processing program, document processing device, document processing system, and document processing method

Publication number
WO2014064803A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
document
change
changed
computer
Prior art date
Application number
PCT/JP2012/077614
Other languages
French (fr)
Japanese (ja)
Inventor
正和 藤尾
永崎 健
淳一 平山
彰 多田
慶 今沢
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所
Priority to PCT/JP2012/077614
Publication of WO2014064803A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G06Q10/06375 Prediction of business process outcome or impact based on a proposed change
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Definitions

  • the present invention relates to a technique for analyzing a document.
  • Patent Document 1 describes “a change impact prediction apparatus having a change record database that is past history information related to a change (claim 1)” as a technology for predicting the impact of a design change.
  • the purpose of this document is to “generate a pattern (summary) that can narrow down the range of influence with high precision by considering the original change contents”.
  • The present invention has been made in view of the above-described problems. Its purpose is to estimate the influence range of a design change by using the change content described in a change instruction that instructs a change to data such as design data.
  • The document processing program according to the present invention machine-learns the correspondence between the change content indicated by document data and the portion of the data changed by that change content and, based on the result, estimates the portion of the data that will be changed by a new change instruction.
  • With this document processing program, the influence of a change can be estimated using the change content described in the change instruction. This reduces the workload and mistakes caused by manual influence prediction, and allows the influence range to be predicted quickly.
  • FIG. 1 is a block diagram illustrating a configuration of a document processing apparatus 10 according to a first embodiment.
  • FIG. 2 is a processing flowchart showing the overall operation of the document processing apparatus 10.
  • FIG. 3 is a processing flowchart explaining the details of step S202.
  • FIG. 4 is a diagram showing a text example of the change instruction 191 and a processing image of step S2022.
  • FIG. 5 is a flowchart showing a processing example of step S2022.
  • FIG. 6 is a flowchart explaining the details of step S2023.
  • FIG. 7 is a diagram showing a text example of the change instruction 191 written in a special format and a processing image of step S2024.
  • FIG. 8 is a diagram showing a processing image of step S203 when steps S2025 to S2027 are assumed to be omitted.
  • FIG. 15 is a process flow diagram illustrating the operation in the answer book learning phase of the document processing apparatus 10 according to the second embodiment.
  • FIG. 16 is a diagram showing a layout example of the change instruction 191 described in a table format.
  • FIG. 17 is a process flow diagram illustrating the details of step S202 in the third embodiment.
  • FIG. 18 is a block diagram of the document processing system 1000 according to the fourth embodiment.
  • FIG. 19 is a processing flowchart showing the overall operation of the document processing system 1000.
  • A diagram showing the processing image of step S1901.
  • A diagram showing the processing image of step S1902.
  • FIG. 1 is a block diagram showing a configuration of a document processing apparatus 10 according to the first embodiment of the present invention.
  • The document processing apparatus 10 is an apparatus that analyzes a change instruction indicating a change content for data such as CAD (Computer Aided Design) data (hereinafter referred to as CAD data), and estimates the range that the change content affects.
  • the document processing apparatus 10 can be configured using a general computer, and includes, for example, an input device 11, a display device 12, a CPU (Central Processing Unit) 13, a printing device 14, a memory 15, and a storage unit 16.
  • the input device 11 is a device that accepts an input of an operation instruction or the like from a user, and can be configured using, for example, a keyboard, a mouse, or a touch panel.
  • the display device 12 is a device that presents various information to the user, and can be configured using a screen display device such as a liquid crystal display.
  • the printing device 14 prints various information provided to the user as necessary.
  • the CPU 13 is an arithmetic unit that realizes various functions by executing a program stored in the memory 15.
  • In the following description, each program may be described as the subject of an operation, but it is actually the CPU 13 that executes these programs.
  • the memory 15 is a storage device that stores a program executed by the CPU 13, and stores an OS (Operating System) 151, a communication program 152, a document analysis program 153, a learning program 154, and a change target estimation program 155. Details of each program will be described later. Each of these programs corresponds to a “document processing program” in the first embodiment. Equivalent functions can also be realized using hardware such as circuit devices.
  • The memory 15 may further store other programs, data that is referred to when the CPU 13 executes these programs, or the results of processing executed by the CPU 13.
  • The storage unit 16 is a storage device that stores information referred to when the CPU 13 executes various processes according to the description of each program.
  • The storage unit 16 stores a change word dictionary 161, a general term dictionary 162, a template dictionary 163, and a learning model 164.
  • The storage unit 16 may further store other information.
  • The memory 15 is a high-speed, volatile storage device such as a DRAM (Dynamic Random Access Memory), and the storage unit 16 is a large-capacity, nonvolatile storage device such as an HDD (Hard Disk Drive) or a flash memory, although other types of storage devices may be used.
  • Each program may be stored in the storage unit 16 in advance and copied to the memory 15 when the CPU 13 executes the program. You may make it copy at least one part of the data which the memory
  • the document processing apparatus 10 can be connected to the file server 19 via the communication network 18.
  • the file server 19 is a computer connected to the communication network 18, and one or more file servers 19 exist.
  • FIG. 1 shows an example in which the document processing apparatus 10 is realized by a single computer, but a similar function can be realized by a plurality of computers.
  • Each dictionary stored in the storage unit 16 may instead be stored in one of the file servers 19 and transmitted and received via the communication program 152 and the communication network 18.
  • the CAD data to be changed or the change instruction may be stored in any one of the file servers 19 and transmitted / received via the communication program 152 and the communication network 18.
  • each document data is stored on the file server 19 for simplicity of explanation.
  • FIG. 2 is a processing flowchart showing the overall operation of the document processing apparatus 10.
  • the operation of the document processing apparatus 10 is divided into a learning phase and an estimation phase. The general operation of each phase will be described below.
  • the document processing apparatus 10 first performs a learning phase.
  • The document processing apparatus 10 acquires from the file server 19, using for example the communication program 152, a change instruction 191 that instructed change contents for the CAD data in the past (step S201).
  • the communication program 152 corresponds to the “document acquisition program” and “document acquisition unit” in the first embodiment.
  • Step S202 The document analysis program 153 analyzes the change instruction 191 acquired in step S201 and extracts the dependency relationship between words and phrases. Details of this step will be described later with reference to FIG.
  • Step S203 The learning program 154 calculates the feature amount of the change instruction 191 acquired in step S201 based on the result of the document analysis in step S202. Examples of the feature amount calculated in this step are described later with reference to FIGS. 8 to 10 and FIGS. 12 to 14.
  • The document processing apparatus 10 acquires design data 192 (for example, CAD data) from the file server 19 by using, for example, the communication program 152.
  • The learning program 154 uses the feature amount of the change instruction 191 extracted in step S203 to learn the correspondence between the change content instructed by the change instruction 191 and the portion of the design data 192 changed by that change content.
  • the learning result is stored in the learning model 164.
  • FIG. 2 Estimation phase: steps S205 to S207
  • the document processing apparatus 10 performs the same processing as steps S201 to S203 on the change instruction 191 describing the new change contents that have not been learned.
  • The change target estimation program 155 uses the learning result accumulated as the learning model 164 to estimate the range in which the design data 192 is affected by the change content indicated by the new change instruction 191. For example, the past feature quantity vector closest to the feature quantity vector of the new change instruction 191 is identified, and it can be estimated that the new change instruction 191 changes the design parameter designated by the past change instruction 191 associated with that vector.
  • the estimation result is output via an output unit such as the display device 12 or the printing device 14.
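The nearest-vector estimation described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how "closest" is measured, so cosine similarity is used here as one possible choice, and the function names, parameter names, and sample vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_changed_parameter(new_vector, learned):
    """Return the design parameter tied to the closest past feature vector.

    learned: list of (feature_vector, design_parameter) pairs, standing in
    for the contents of the learning model 164.
    """
    best = max(learned, key=lambda pair: cosine(new_vector, pair[0]))
    return best[1]


# Hypothetical learned pairs from past change instructions.
learned_model = [
    ([1.0, 0.0, 0.5], "shaft_diameter"),
    ([0.0, 1.0, 0.2], "housing_material"),
]
estimated = estimate_changed_parameter([0.9, 0.1, 0.4], learned_model)
```

A real learning model would likely hold many vectors per design parameter and could return several candidates ranked by similarity rather than a single one.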
  • FIG. 3 is a processing flowchart for explaining details of step S202. Hereinafter, each step of FIG. 3 will be described.
  • the document analysis program 153 extracts a text (character string) portion from the change instruction 191 (S2021).
  • the document analysis program 153 distributes the text into a part related to the changed content and a part other than that (for example, a front part explaining the progress) in accordance with a procedure described later with reference to FIG. 4 (S2022).
  • Step S2023 The document analysis program 153 performs language analysis described with reference to FIG. 6 described later on the text instructing the content of change extracted in step S2022.
  • Step S2024 The document analysis program 153 uses the template dictionary 163 to extract words from the text that indicates the change contents extracted in step S2022. This step is for extracting words from text described in a format that is difficult to extract by the language analysis in step S2023. Details of this step will be described later with reference to FIG.
  • the document analysis program 153 uses the change word dictionary 161 to extract, as a keyword, a word that is assumed to contain an instruction for causing a change location in the design data 192. For example, words such as “change” and “correct” are considered to have a high possibility of instructing a design change with respect to the design data 192. Therefore, in this step, these words are extracted as keywords.
  • This step has a significance as preparation for determining the feature amount of each word, and also has a significance to determine a processing start location in the next step S2026.
  • Step S2026 The document analysis program 153 searches the text forward or backward within a predetermined range using the keyword extracted in step S2025 as a starting point, and extracts a phrase that is assumed to be the target of the keyword. Details of this step will be described later with reference to FIG.
  • Step S2027 Using the general term dictionary 162, the document analysis program 153 excludes words that are assumed not to affect the changed portion of the design data 192 from the text extracted in step S2022. Details of this step will be described later with reference to FIG.
  • Steps S2025 to S2027 are processes performed to increase the degree of correlation between the text extracted in step S2022 and the changed portion of the design data 192, and can be omitted. An example in which these steps are omitted will be described later with reference to FIG.
  • the document analysis program 153 outputs the text extracted by the above steps to the next step S203.
  • FIG. 4 is a diagram showing a text example of the change instruction 191 and a processing image of step S2022.
  • the change instruction 191 does not necessarily describe only the changes to the design data 192, but usually includes other text.
  • the document analysis program 153 sorts the change instruction 191 into the change content portion 1912 and the other portion 1911. For example, the following method can be considered as the sorting standard.
  • Fig. 4 Example of determination method 1
  • Judgment is made based on whether or not a phrase typically included in the sentence describing the changed content or the other part is present in the sentence to be processed in step S2022 this time.
  • For example, phrases such as “to no” and “to accompany” suggest a causal relationship in Japanese, so it is assumed that phrases related to the changed portion of the design data 192 follow them. The text after such a phrase can therefore be presumed to describe the change content, with the phrase serving as the boundary.
  • a similar method is used in FIG. 9 described later.
  • Fig. 4 Example of determination method 2
  • Similarity determination such as sentence pattern matching, is performed between a sentence describing the change contents or other parts in the past change instruction 191 and a sentence to be processed in step S2022 this time.
  • FIG. 5 is a flowchart showing an example of processing in step S2022.
  • Steps S20221 to S20223 are the same as steps S20231 to S20233 in FIG.
  • The document analysis program 153 refers to a keyword dictionary in which words and phrases typically included in sentences describing the change content or the other parts are registered, and determines whether the words and phrases extracted in steps S20221 to S20223 correspond to them (step S20224).
  • In step S20225, the document analysis program 153 sorts the text according to the determination result of step S20224.
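Determination method 1 (sorting the text at a phrase that suggests a causal relationship) might look like the sketch below. The boundary phrases are hypothetical English stand-ins for the Japanese phrases in the keyword dictionary, and the behavior when no boundary phrase is found is an assumption, since the text does not specify it.

```python
# Hypothetical English stand-ins for the causal phrases in the keyword dictionary.
BOUNDARY_PHRASES = ("accordingly,", "therefore,")


def split_change_instruction(text):
    """Split text into (other_part, change_content_part) at the first boundary phrase.

    If no boundary phrase is present, the whole text is returned as the
    other part (an assumption made only for this sketch).
    """
    lowered = text.lower()
    for phrase in BOUNDARY_PHRASES:
        pos = lowered.find(phrase)
        if pos != -1:
            cut = pos + len(phrase)
            return text[:cut].strip(), text[cut:].strip()
    return text.strip(), ""


sample = ("The motor overheated during testing. "
          "Accordingly, change the fan diameter to 50 mm.")
other_part, change_part = split_change_instruction(sample)
```

Determination method 2 would replace the phrase lookup with a similarity comparison against sentences from past change instructions.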
  • FIG. 6 is a flowchart for explaining the details of step S2023.
  • the document analysis program 153 performs morpheme analysis of the input text using the morpheme dictionary.
  • The document analysis program 153 performs dependency analysis on the input text using the dependency dictionary.
  • The document analysis program 153 determines the stop words of the input text using the stop word dictionary.
  • the document analysis program 153 extracts words, phrases, and their dependency relationships from the input text, and outputs the results.
  • Each dictionary can be stored in the storage unit 16 in advance, for example.
  • FIG. 7 is a diagram showing a text example of the change instruction 191 written in a special format and a processing image of step S2024.
  • The change instruction 191 shown in FIG. 7 is described using parentheses “()”, a colon “:”, and an arrow “→”.
  • the document analysis program 153 recognizes the special format by using the template dictionary 163 in which these special formats are defined in advance, and extracts words included therein.
  • the template dictionary 163 shown in FIG. 7 illustrates two templates.
  • Template 1 defines that the text should be divided into the word immediately before the colon and the text after it, and the latter should be divided into words described before and after the arrow.
  • Template 2 defines that the text should be divided into the word immediately before the opening parenthesis and the text within the parenthesis, and the latter should be divided into words described before and after the arrow.
  • each word can be extracted from the change instruction 191 shown in FIG. 7 as indicated by reference numeral 191 'in FIG.
  • template dictionary 163 shown in FIG. 7 is an example, and any template can be defined as long as it can be defined by text patterns such as characters and symbols.
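The two templates can be approximated with regular expressions, as in the sketch below. It assumes the arrow is the character "→" and treats everything before the colon or the opening parenthesis as the target word; the pattern names, group names, and sample lines are hypothetical and are not taken from the template dictionary 163 itself.

```python
import re

# Template 1: "<target>: <before> → <after>"
# Template 2: "<target> (<before> → <after>)"
TEMPLATES = [
    re.compile(
        r"^\s*(?P<target>[^:()]+?)\s*:\s*(?P<before>[^→]+?)\s*→\s*(?P<after>.+?)\s*$"
    ),
    re.compile(
        r"^\s*(?P<target>[^:()]+?)\s*\(\s*(?P<before>[^→]+?)\s*→\s*(?P<after>[^)]+?)\s*\)\s*$"
    ),
]


def extract_with_templates(line):
    """Return (target, before, after) for the first matching template, else None."""
    for pattern in TEMPLATES:
        match = pattern.match(line)
        if match:
            return match.group("target"), match.group("before"), match.group("after")
    return None


colon_result = extract_with_templates("Bolt length: 20 mm → 25 mm")
paren_result = extract_with_templates("Material (SUS304 → SUS316)")
```

Keeping the templates in a list makes it easy to register further text patterns, in line with the remark that any template expressible as a pattern of characters and symbols can be defined.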
  • FIG. 8 is a diagram showing a processing image of step S203 when it is assumed that steps S2025 to S2027 are omitted.
  • the change instruction 191 is divided into each word as indicated by reference numeral 191 '' in FIG.
  • The learning program 154 gives an appropriate feature amount to each word, and outputs the result as a feature quantity vector 1541 whose number of dimensions equals the number of words. For example, since the sentence illustrated in FIG. 8 is composed of 21 words, its features can be expressed as a 21-dimensional feature quantity vector.
  • The specific numerical value of the feature amount of each word in the feature quantity vector 1541 can be, for example, a value that quantifies the degree to which the word causes the design data 192 to be changed, based on the degree of correlation between the word and the changed portion. This degree of correlation may be determined based on empirical rules, or based on past performance statistics or the like.
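As a minimal sketch of this step, the feature quantity vector can be built by looking up each word in a table of correlation degrees. The table values, the default of 0.0 for unknown words, and the function name are assumptions made for illustration only.

```python
def build_feature_vector(words, correlation, default=0.0):
    """Return one feature value per word, in word order."""
    return [correlation.get(word, default) for word in words]


# Hypothetical correlation degrees between words and changed portions.
correlation = {"diameter": 0.9, "fan": 0.7, "change": 0.4}
words = ["change", "the", "fan", "diameter"]
vector = build_feature_vector(words, correlation)
```

The vector has as many dimensions as the text has words, matching the 21-word, 21-dimension example given for FIG. 8.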
  • FIG. 9 is a diagram showing a processing image of step S2026. It is assumed that the phrases existing before and after the keyword registered in the change word dictionary 161 have a high correlation with the changed part of the design data 192. Therefore, the document analysis program 153 divides the change instruction 191 into each word, and then searches for the word forward or backward starting from the keyword. The learning program 154 gives a larger feature amount to words within the range than other words. As a result, the degree of correlation between the change location of the design data 192 and each word in the change instruction 191 can be expressed more accurately as the feature quantity vector 1541.
  • the change word dictionary 161 shown in FIG. 9 illustrates two types of keywords. Since the keyword 1 is a phrase that suggests a causal relationship in Japanese, it is assumed that there is a phrase related to the changed part of the design data 192 behind it. Since keyword 2 is a phrase that suggests a change in Japanese, it is assumed that there is a phrase related to the changed part in front of it. In this way, whether to search forward or backward of a keyword differs depending on the type of keyword, so it may be defined as a set with a keyword.
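The keyword-anchored search can be sketched as below. Each dictionary entry pairs a keyword with a search direction, as the text suggests; the English keywords, the window size of two words, and the weight values are hypothetical. The sample word order imitates the Japanese pattern in which the change verb follows its target.

```python
# Keyword -> where related phrases are expected relative to the keyword.
CHANGE_WORDS = {
    "accordingly": "after",   # causal cue: related phrases follow it
    "change": "before",       # change cue: related phrases precede it
}


def weight_words(words, change_words, window=2, base=1.0, boosted=2.0):
    """Give a larger feature amount to words within the window around each keyword."""
    weights = [base] * len(words)
    for i, word in enumerate(words):
        direction = change_words.get(word)
        if direction == "after":
            targets = range(i + 1, min(len(words), i + 1 + window))
        elif direction == "before":
            targets = range(max(0, i - window), i)
        else:
            continue
        for j in targets:
            weights[j] = boosted
    return weights


tokens = ["overheating", "accordingly", "fan", "diameter", "change"]
weights = weight_words(tokens, CHANGE_WORDS)
```

Here both keywords point at "fan" and "diameter", so those two words receive the larger feature amount.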
  • FIG. 10 is a diagram showing a processing image of step S2027. It is assumed that a general term (for example, a particle, an idiomatic term in the field) necessary for composing a sentence has a low correlation with a changed part of the design data 192. Even if the feature quantity vector 1541 is configured using such general terms, it is considered that the feature quantity vector 1541 cannot sufficiently express the correspondence between the change instruction 191 and the change location of the design data 192.
  • the document analysis program 153 deletes the general terms registered in the general term dictionary 162 from the change instruction 191 in advance.
  • the learning program 154 configures the feature quantity vector 1541 using the text after deleting the general terms. Thereby, the correlation degree between the feature-value vector 1541 and a change location can be raised, and both correspondence can be expressed more appropriately.
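Deleting general terms before building the vector amounts to a simple filter, sketched below. The English entries stand in for the contents of the general term dictionary 162 and are hypothetical.

```python
# Hypothetical stand-in for the general term dictionary 162.
GENERAL_TERMS = {"the", "of", "to", "please"}


def remove_general_terms(words, general_terms=GENERAL_TERMS):
    """Drop words registered as general terms, keeping the original order."""
    return [word for word in words if word.lower() not in general_terms]


filtered = remove_general_terms(
    ["Please", "change", "the", "diameter", "of", "the", "fan"]
)
```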
  • FIG. 11 is a flowchart showing a processing example of steps S2026 to S2027. Since steps S20261 to S20263 are the same as steps S20231 to S20233, the results of these steps can be used and omitted. In steps S20264 to S20265, the document analysis program 153 performs the processing described with reference to FIGS.
  • FIG. 12 is a diagram illustrating another configuration example of the similarity vector 1541 used by the learning program 154.
  • the similarity vector 1541 is used to learn the correspondence between the change contents of the design data 192 indicated by the change instruction 191 and the changed portion of the design data 192 changed by the change contents. Therefore, as long as it has the same function, configurations other than those exemplified in FIGS. 8 to 10 can be adopted.
  • the field 15411 holds a text instructing to change the design data 192 in the change instruction 191.
  • The field 15412 holds the design parameter that the text in the field 15411 instructs to change.
  • The field 15413 holds the similarity between the text in the field 15411 and the text in the past change instruction 191 that instructs a change to each design parameter (design parameters 1 to n are illustrated).
  • FIG. 13 is a diagram illustrating another configuration example of the similarity vector 1541 used by the learning program 154.
  • In FIG. 12, a similarity is obtained for each design parameter, including the similarity between the change instruction 191 and past sentences that instruct changes to design parameters not related to the change instruction 191.
  • In FIG. 13, the design parameters that are not related to the change instruction 191 are instead aggregated into a single similarity using, for example, the average of the similarities between each such past sentence and the change instruction 191.
  • the field 15414 holds the similarity between the text in the field 15411 and the text in the past change instruction 191 that instructs to change the same design parameter (that is, the design parameter in the field 15412).
  • The field 15415 holds the similarity between the text in the field 15411 and the text in the past change instruction 191 that instructs a change to other design parameters (that is, design parameters other than that in the field 15412).
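The two similarity fields of FIG. 13 can be sketched as follows. The text does not name a similarity measure, so word-level Jaccard similarity is assumed here, and averaging is used for both fields (the text mentions averaging only for the other design parameters). All names and sample texts are hypothetical.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two texts (assumed measure)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def similarity_features(text, parameter, past):
    """Return [similarity to same-parameter texts, similarity to other-parameter texts].

    past: dict mapping a design parameter to past instruction texts that changed it.
    """
    def mean_similarity(texts):
        return sum(jaccard(text, t) for t in texts) / len(texts) if texts else 0.0

    same = mean_similarity(past.get(parameter, []))
    others = [t for p, texts in past.items() if p != parameter for t in texts]
    return [same, mean_similarity(others)]


past_instructions = {
    "fan_diameter": ["change fan diameter"],
    "bolt_length": ["extend bolt length"],
}
features = similarity_features(
    "change fan diameter to 50", "fan_diameter", past_instructions
)
```

Aggregating the unrelated parameters keeps the vector at a fixed length of two regardless of how many design parameters exist.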
  • FIG. 14 is a diagram illustrating another configuration example of the similarity vector 1541 used by the learning program 154.
  • The field 15416 holds the similarity between the text in the field 15411 and the text in the past change instruction 191 that instructs a change to the same design parameter (that is, the design parameter in the field 15412).
  • a plurality of different methods are used for calculating the similarity, and the similarity obtained using each method is used as a vector element.
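Using several similarity methods as separate vector elements might look like the sketch below. The two methods chosen (word-level Jaccard similarity and the character-level ratio from Python's standard difflib module) are assumptions; the text does not say which methods are used.

```python
from difflib import SequenceMatcher


def word_jaccard(a, b):
    """Word-level Jaccard similarity."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def char_ratio(a, b):
    """Character-level similarity ratio from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


SIMILARITY_METHODS = (word_jaccard, char_ratio)


def multi_similarity(a, b):
    """Return one vector element per similarity method."""
    return [method(a, b) for method in SIMILARITY_METHODS]


elements = multi_similarity("change fan diameter", "change fan diameter")
```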
  • The document processing apparatus 10 learns in advance the correspondence between the change content indicated by the change instruction 191 and the portion of the design data 192 changed by that change content, and estimates, based on the learning result, the portion of the design data 192 that a new change instruction 191 will change. This makes it possible to predict the range of influence of a change instruction using only the change instruction 191 written as sentences. It also eliminates the need to extract design parameters from the change instruction 191 manually, and reduces the work burden and cost of predicting the influence range.
  • FIG. 15 is a process flow diagram showing the operation in the answer book learning phase of the document processing apparatus 10 according to the second embodiment.
  • the document processing apparatus 10 newly performs an answer book learning phase between the learning phase and the estimation phase.
  • The answer book learning phase is a phase for learning the influence of the answer 194 to the change instruction 191 on the design data 192. It is similar to the learning phase, but differs in that the answer 194 may reply that the design cannot be changed.
  • The following mainly describes the differences from the learning phase.
  • FIG. 15 Answer book learning phase: step S1501
  • The document processing apparatus 10 acquires from the file server 19, using for example the communication program 152, an answer 194 in which a reply to a past change instruction 191 is written.
  • Step S1502 The document analysis program 153 analyzes the answer book 194 acquired in step S1501 and extracts the dependency relationship between words and phrases. This step is generally the same as step S202, except that the response content for the change instruction 191 is extracted instead of extracting the change content for the design data 192. Therefore, it is necessary to extract a word suggesting an answer to the change content such as “unchangeable” instead of a word suggesting a change such as “change”.
  • The change word dictionary 161 stores these words in advance. Since the subject of the answer 194 is considered to be a design parameter, words suggesting the design parameter are extracted in the same way as in the first embodiment.
  • step S1503 The learning program 154 calculates the feature amount of the answer sheet 194 acquired in step S1501 based on the document analysis result in step S1502. The processing in this step is the same as that in step S203.
  • Step S1504 The learning program 154 uses the feature amount of the answer 194 extracted in step S1503 to learn the correspondence between the response content indicated by the answer 194 and the portion of the design data 192 changed by that response content.
  • Step S1504 Supplement
  • the learning program 154 reflects the learning result based on the answer sheet 194 in the negative direction with respect to the learning model 164. For example, learning may be performed using a vector element of the feature quantity vector 1541 that is multiplied by -1 and inverted in sign.
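Reflecting an answer in the negative direction can be sketched as a sign inversion applied to the feature quantity vector before it is passed to learning. The function name and the flag are hypothetical; only the multiplication by -1 comes from the text.

```python
def to_training_vector(feature_vector, from_answer):
    """Invert the sign of every element when the vector comes from an answer 194."""
    sign = -1.0 if from_answer else 1.0
    return [sign * value for value in feature_vector]


instruction_vector = to_training_vector([0.5, 2.0], from_answer=False)
answer_vector = to_training_vector([0.5, 2.0], from_answer=True)
```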
  • FIG. 16 is a diagram showing a layout example of the change instruction 191 described in a table format.
  • The text described at the bottom of FIG. 16 is the same as that described in FIG. 4 of the first embodiment. However, since other cells precede it, it is necessary to remove these cells or to identify the position of the text portion.
  • FIG. 17 is a processing flowchart for explaining details of step S202 in the third embodiment. Hereinafter, each step of FIG. 17 will be described.
  • The document analysis program 153 executes the following steps S2028 and S2029 instead of step S2021.
  • Step S2028 The document analysis program 153 converts the change instruction 191 into text with coordinates.
  • the text with coordinates is document data in which, for example, the coordinates with the origin at the upper left of the document are internally given to the text portion described in the change instruction 191. Since the text with coordinates is a known technique, its details are omitted.
  • Step S2029 The document analysis program 153 analyzes the layout of the change instruction 191 using the result of step S2028. Since the technique for analyzing the layout of a document using the text with coordinates is a known technique, the details thereof are omitted. The following steps are the same as those after step S2022 of the first embodiment.
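Text with coordinates can be represented as a list of positioned text blocks, as in the sketch below. The layout rule shown (take the lowest block that has at least a few words) is only a hypothetical heuristic for a table-style form whose body text sits at the bottom; the text leaves the actual layout analysis to known techniques, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    x: float   # horizontal position, origin at the upper left of the document
    y: float   # vertical position, increasing downward
    text: str


def body_text(blocks, min_words=5):
    """Return the text of the lowest block containing at least min_words words."""
    candidates = [b for b in blocks if len(b.text.split()) >= min_words]
    if not candidates:
        return ""
    return max(candidates, key=lambda b: b.y).text


form_blocks = [
    TextBlock(10, 10, "Change notice"),
    TextBlock(10, 40, "Drawing No. 123"),
    TextBlock(10, 200, "Change the fan diameter from 40 mm to 50 mm"),
]
selected = body_text(form_blocks)
```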
  • FIG. 18 is a configuration diagram of a document processing system 1000 according to the fourth embodiment of the present invention.
  • the document processing system 1000 is a system having various additional functions in addition to the document processing apparatus 10 described in the first to third embodiments.
  • The document processing system 1000 includes a storage 1100, an extraction processing device (hereinafter, ETL: Extract / Transform / Load) 1200, a content server 1300, a search server 1410, a metadata server 1420, the document processing apparatus 10 described in the first to third embodiments, and an application program 1500.
  • The storage 1100 is a storage device that stores various data, and serves as an alternative to the file server 19 in the first to third embodiments. For example, it stores document data 1101, CAD data 1102, mail data 1103, and the like. The change instruction 191 and the reply 194 are also stored on the storage 1100.
  • The ETL 1200 is a device that extracts necessary items from the data stored in the storage 1100.
  • The ETL 1200 stores the change location extraction program 1201, the association program 1202, and the association rule data 1203 in a storage device such as an HDD. The operation of these programs is described later.
  • The content server 1300 is a server that provides various data to a group of higher-level servers, and stores the metadata 1301 and the content data 1302 in a storage device such as an HDD.
  • The metadata 1301 is data describing attribute information of the content data 1302.
  • The search server 1410 and the metadata server 1420 are servers that search the content data 1302 or the metadata 1301 in response to a request from the application program 1500.
  • The search server 1410 searches the content data 1302 using the index 1411 extracted from the content data 1302.
  • The metadata server 1420 searches the metadata 1301 using the database 1421 obtained by indexing the metadata 1301.
  • The application program 1500 is executed by an appropriate computer and provides various functions according to its use.
  • The application program 1500 can, for example, instruct the document processing apparatus 10 to estimate the range affected by the change instruction 191, receive the result, and present it to the user.
  • A design application such as a CAD tool, a workflow application that sends and receives the change instruction 191 or the reply 194, and the like can also be included.
  • The document processing apparatus 10 and the other servers and devices can be configured integrally.
  • For example, each program installed in the ETL 1200 may be installed on the document processing apparatus 10, and the document processing apparatus 10 may execute these programs.
  • <Embodiment 4: Function of ETL 1200>
  • In the first to third embodiments, learning of the correspondence between the change instruction 191 and the changed portion of the design data 192 was described.
  • However, if the design data 192 is described in a special format such as CAD data, it may not be easy to identify where the design data 192 was changed in the past. Therefore, in the fourth embodiment, the ETL 1200 extracts the changed portion of the design data 192 to reduce the burden on the operator.
  • The ETL 1200 also supports, as preprocessing, the task of associating the change instruction 191 with the design data 192, which further reduces the burden on the operator.
  • FIG. 19 is a processing flow diagram showing the overall operation of the document processing system 1000.
  • The document processing system 1000 performs steps S1901 to S1902 in addition to the operation of the document processing apparatus 10 described in the first to third embodiments. These steps are described below.
  • Step S1901 The CPU of the ETL 1200 executes the change location extraction program 1201 and extracts the locations where the design data 192 was changed by past change instructions 191. Details of this step will be described later with reference to FIG. 20.
  • Step S1902 The CPU of the ETL 1200 executes the association program 1202 and associates each past change instruction 191 with the design data 192 that it changed. Details of this step will be described later with reference to FIG. 21.
  • FIG. 20 is a diagram showing a processing image of step S1901.
  • Here, the case where the design data 192 is CAD data is illustrated. Since CAD data is drawing data, it must be converted into log data dumped in a text format before the changed portion can be identified. In this example, the log data is loaded into spreadsheet software.
  • The design data 192 has been changed by the change instruction 191 as indicated by reference numeral 191 ′.
  • To identify the changed portion, one might simply take the difference between the two versions, but a simple difference does not always identify the changed portion of CAD data. This is because the part number 1921 may be changed according to the work process (for example, every month).
  • The change location extraction program 1201 takes this characteristic into account when extracting the changed portion of the design data 192. Specifically, the design data 191 and 191 ′ before and after the change are sorted by the part number 1921, and the rows with the highest degree of coincidence in other columns (for example, the part names shown in the AE column in FIG. 20) are estimated to correspond to each other. The degree of coincidence may be obtained by a known method such as DP matching. To extract the changed portion more reliably, the user may visually check the result extracted by the change location extraction program 1201 and correct it as necessary.
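The row-matching idea above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in the specification: rows are assumed to be dictionaries with hypothetical keys such as `part_no`, `name`, and `size`, and the degree of coincidence is approximated with `difflib.SequenceMatcher` from the Python standard library as a stand-in for DP matching.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Degree of coincidence between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()


def match_rows(before, after, key="name", threshold=0.6):
    """Pair each row of `before` with the most similar unused row of `after`.

    Part numbers are used only for ordering, not for pairing, because
    they may be renumbered between revisions.
    """
    pairs, used = [], set()
    for old in sorted(before, key=lambda r: r["part_no"]):
        best, best_score = None, threshold
        for i, new in enumerate(after):
            if i in used:
                continue
            score = similarity(old[key], new[key])
            if score > best_score:
                best, best_score = i, score
        if best is not None:
            used.add(best)
            pairs.append((old, after[best]))
        else:
            pairs.append((old, None))  # no counterpart found
    return pairs


def changed_rows(pairs, columns):
    """Return the pairs whose compared columns differ (or that lost their row)."""
    return [
        (old, new) for old, new in pairs
        if new is None or any(old[c] != new[c] for c in columns)
    ]
```

The threshold of 0.6 is an arbitrary illustrative value; in practice the extracted pairs would still be reviewed by the user, as the text notes.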
  • FIG. 21 is a diagram showing a processing image of step S1902.
  • The change instruction 191 and the design data 192 are usually created through different workflows and applications, so their formats do not necessarily match, and associating them requires a corresponding amount of work.
  • Nevertheless, design information such as a part number included in the design data 192 should be described in some form in the change instruction 191. Since CAD data is drawing data, it is likely to be identified by a drawing number in the change instruction 191. Therefore, the association program 1202 assumes that design data 192 containing the drawing number described in the change instruction 191 corresponds to that change instruction 191, and presents this to the user. The user can visually check the presented result and associate the two more accurately.
  • The association program 1202 may preferentially check the correspondence between these parts and the design data 192. In addition, even if the drawing numbers are the same, the two are considered to be only weakly related if their dates and times are far apart. They may therefore be associated only when their dates and times (the date and time described in the document, or the update date and time) are within a predetermined range of each other (for example, within one week).
  • A rule specifying which item in the change instruction 191 is associated with which item in the design data 192 can be defined in advance as the association rule data 1203.
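The association rule described above (same drawing number, dates within a predetermined range) might be sketched as follows. The record layout and the field names `id`, `text`, `date`, `drawing_no`, and `updated` are hypothetical and chosen only for illustration; the one-week window comes from the example in the text.

```python
from datetime import date, timedelta


def associate(instructions, designs, max_gap=timedelta(days=7)):
    """Propose (instruction id, design id) pairs for the user to confirm."""
    proposals = []
    for ins in instructions:
        for des in designs:
            # Rule 1: the drawing number appears in the change instruction text.
            same_drawing = des["drawing_no"] in ins["text"]
            # Rule 2: the two dates are within the allowed gap of each other.
            close_in_time = abs(ins["date"] - des["updated"]) <= max_gap
            if same_drawing and close_in_time:
                proposals.append((ins["id"], des["id"]))
    return proposals
```

The result is a list of candidates rather than a final decision, which matches the text's point that the user checks the presented associations.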
  • As described above, the ETL 1200 extracts the portions of the design data 192 that were changed by past change instructions 191. As a result, even if the design data 192 is described in a special format, the correspondence between the change instruction 191 and the changed portion can be learned efficiently.
  • The ETL 1200 also associates each past change instruction 191 with the design data 192 that it changed. As a result, even if the two were created separately, the work required for an operator to associate them can be reduced.
  • The present invention is not limited to the above-described embodiments, and includes various modifications.
  • The above embodiments have been described in detail for easy understanding of the present invention, and the invention is not necessarily limited to one having all the configurations described.
  • A part of the configuration of one embodiment can be replaced with the configuration of another embodiment.
  • The configuration of another embodiment can be added to the configuration of a given embodiment. Further, for a part of the configuration of each embodiment, another configuration can be added, deleted, or substituted.
  • The present invention can also be applied to cases where changes to other kinds of data are instructed by a document.
  • Some or all of the above components, functions, processing units, processing means, and the like may be realized in hardware, for example by designing them as an integrated circuit.
  • Each of the above-described configurations, functions, and the like may be realized in software by a processor interpreting and executing programs that realize the respective functions.
  • Information such as programs, tables, and files for realizing each function can be stored in a recording device such as a memory, a hard disk, an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.

Abstract

The purpose of the present invention is to estimate the scope of impact of design changes by using change content in which change instructions instructing changes in data such as design data are notated. This document processing program subjects the following to machine learning: the correspondence relationship between change content instructed by document data; and the changed portions of data changed by the change content. On the basis of those results, the document processing program estimates portions for which data will be changed according to new change instructions (see figure 2).

Description

Document processing program, document processing apparatus, document processing system, and document processing method
The present invention relates to a technique for analyzing documents.
Product design includes new design, in which a product is designed from scratch, and design change, in which partial changes are made to an existing design. For example, for a product to be installed in an equipment plant, a survey of the current state of the plant may reveal that the product cannot be placed at the originally planned installation location because of interference between parts. In such cases, a design change to the product is often unavoidable.
When the design of a product is changed, it may become necessary to additionally change the design of other functionally related parts. Derivative effects, such as how those related parts should be changed, must also be considered. Thus, a design change to a product requires consideration of various direct and indirect effects. Quickly grasping the range of influence of a design change to a given part (the specifications, drawings, processes, and so on of other affected parts) and its degree (increase or decrease in man-hours, cost, and so on) is therefore extremely important when judging whether the design change should be carried out.
Patent Document 1 below describes, as a technique for predicting the influence of a design change, "a change impact prediction apparatus having a change record database that is past history information related to changes" (claim 1). The document aims to "generate a pattern that can narrow down the range of influence with high accuracy by considering the original change contents" (abstract).
Patent Document 1: JP 2012-14308 A
When a design change is actually carried out, the change is instructed by document data (a change instruction) that specifies the change content. The technique described in Patent Document 1 appears to presuppose that, when the change record database is used, it is searched using the data items it stores as keys. The data items to be used as search keys must therefore be extracted from the change instruction manually in advance, which can impose a corresponding work burden and lead to mistakes.
The present invention has been made in view of the above problem, and aims to estimate the range of influence of a design change by using the change content described in a change instruction that instructs a change to data such as design data.
The document processing program according to the present invention machine-learns the correspondence between the change content instructed by document data and the portion of the data that was changed by that change content and, based on the result, estimates the portion of the data that will be changed by a new change instruction.
According to the document processing program of the present invention, the influence of a change can be estimated using the change content described in the change instruction. This reduces the work burden and mistakes that arise from predicting the influence manually, and allows the range of influence to be predicted quickly.
Problems, configurations, and effects other than those described above will become apparent from the following description of embodiments.
FIG. 1 is a block diagram showing the configuration of the document processing apparatus 10 according to the first embodiment.
FIG. 2 is a processing flow diagram showing the overall operation of the document processing apparatus 10.
FIG. 3 is a processing flow diagram explaining the details of step S202.
FIG. 4 is a diagram showing a text example of the change instruction 191 and a processing image of step S2022.
FIG. 5 is a flowchart showing a processing example of step S2022.
FIG. 6 is a flowchart explaining the details of step S2023.
FIG. 7 is a diagram showing a text example of the change instruction 191 written in a special format and a processing image of step S2024.
FIG. 8 is a diagram showing a processing image of step S203 on the assumption that steps S2025 to S2027 are omitted.
FIG. 9 is a diagram showing a processing image of step S2026.
FIG. 10 is a diagram showing a processing image of step S2027.
FIG. 11 is a flowchart showing a processing example of steps S2026 to S2027.
FIG. 12 is a diagram showing another configuration example of the similarity vector 1541 used by the learning program 154.
FIG. 13 is a diagram showing another configuration example of the similarity vector 1541 used by the learning program 154.
FIG. 14 is a diagram showing another configuration example of the similarity vector 1541 used by the learning program 154.
FIG. 15 is a processing flow diagram showing the operation of the document processing apparatus 10 according to the second embodiment in the reply learning phase.
FIG. 16 is a diagram showing a layout example of the change instruction 191 written in table format.
FIG. 17 is a processing flow diagram explaining the details of step S202 in the third embodiment.
FIG. 18 is a configuration diagram of the document processing system 1000 according to the fourth embodiment.
FIG. 19 is a processing flow diagram showing the overall operation of the document processing system 1000.
FIG. 20 is a diagram showing a processing image of step S1901.
FIG. 21 is a diagram showing a processing image of step S1902.
<Embodiment 1: Device configuration>
FIG. 1 is a block diagram showing the configuration of the document processing apparatus 10 according to the first embodiment of the present invention. The document processing apparatus 10 analyzes a change instruction that instructs changes to data such as CAD (Computer Aided Design) data (hereinafter assumed to be CAD data), and estimates the range affected by those changes. The document processing apparatus 10 can be configured using a general-purpose computer and includes, for example, an input device 11, a display device 12, a CPU (Central Processing Unit) 13, a printing device 14, a memory 15, and a storage unit 16.
The input device 11 accepts input such as operation instructions from the user, and can be configured using, for example, a keyboard, a mouse, or a touch panel. The display device 12 presents various information to the user, and can be configured using a screen display device such as a liquid crystal display. The printing device 14 prints various information provided to the user as necessary.
The CPU 13 is an arithmetic unit that realizes various functions by executing the programs stored in the memory 15. For convenience, the following description sometimes treats a program as the subject of an operation, but it is the CPU 13 that actually executes these programs.
The memory 15 is a storage device that stores the programs executed by the CPU 13: an OS (Operating System) 151, a communication program 152, a document analysis program 153, a learning program 154, and a change target estimation program 155. Details of each program will be described later. Each of these programs corresponds to the "document processing program" in the first embodiment. Equivalent functions can also be realized using hardware such as circuit devices. The memory 15 may also store other programs, data referred to when the CPU 13 executes these programs, and the results of processing executed by the CPU 13.
The storage unit 16 is a storage device that stores information referred to when the CPU 13 executes various processes according to each program. It stores a change word dictionary 161, a general term dictionary 162, a template dictionary 163, and a learning model 164. The storage unit 16 may also store other information.
Typically, the memory 15 is a high-speed, volatile storage device such as a DRAM (Dynamic Random Access Memory), and the storage unit 16 is a large-capacity, nonvolatile storage device such as an HDD (Hard Disk Drive) or flash memory, but other types of storage devices may be used. Each program may be stored in the storage unit 16 in advance and copied to the memory 15 when the CPU 13 executes it. At least part of the data stored in the storage unit 16 may be temporarily copied to the memory 15 as necessary.
The document processing apparatus 10 can be connected to the file server 19 via the communication network 18. The file server 19 is a computer connected to the communication network 18, and one or more file servers 19 exist.
FIG. 1 shows an example in which the document processing apparatus 10 is realized by a single computer, but the same functions can also be realized by a plurality of computers. For example, each dictionary stored in the storage unit 16 may instead be stored in one of the file servers 19 and exchanged via the communication program 152 and the communication network 18. Similarly, the CAD data to be changed and the change instructions may be stored in one of the file servers 19 and exchanged via the communication program 152 and the communication network 18. In the following, for simplicity, each piece of document data is assumed to be stored on the file server 19.
<Embodiment 1: Operation procedure>
FIG. 2 is a processing flow diagram showing the overall operation of the document processing apparatus 10. The operation of the document processing apparatus 10 is divided into a learning phase and an estimation phase. The general operation of each phase is described below.
(FIG. 2: Learning phase: Step S201)
The document processing apparatus 10 first performs the learning phase. In the learning phase, the document processing apparatus 10 acquires from the file server 19, for example by means of the communication program 152, change instructions 191 that instructed changes to CAD data in the past. The communication program 152 corresponds to the "document acquisition program" and the "document acquisition unit" in the first embodiment.
(FIG. 2: Learning phase: Step S202)
The document analysis program 153 analyzes the change instruction 191 acquired in step S201 and extracts the dependency relationships between words and phrases. Details of this step will be described later with reference to FIG. 3.
(FIG. 2: Learning phase: Step S203)
The learning program 154 calculates the feature amounts of the change instruction 191 acquired in step S201, based on the result of the document analysis in step S202. Examples of the feature amounts calculated in this step will be described later with reference to FIGS. 8 to 10 and FIGS. 12 to 14.
(FIG. 2: Learning phase: Step S204)
The document processing apparatus 10 acquires the design data 192 (for example, CAD data) from the file server 19, for example by means of the communication program 152. Using the feature amounts of the change instruction 191 obtained in step S203, the learning program 154 learns the correspondence between the change content instructed by the change instruction 191 and the portion of the design data 192 that was changed by that change content. The learning result is stored in the learning model 164.
(FIG. 2: Estimation phase: Steps S205 to S207)
The document processing apparatus 10 performs the same processing as steps S201 to S203 on a change instruction 191 that describes new change content that has not yet been learned.
(FIG. 2: Estimation phase: Step S208)
The change target estimation program 155 uses the learning results accumulated as the learning model 164 to estimate the range of the design data 192 that is affected by the change content instructed by the new change instruction 191. For example, it can identify the feature vector closest in distance to the feature vector of the new change instruction 191, and estimate that the design parameters changed by the past change instruction 191 associated with that feature vector will also be changed by the new change instruction 191. The estimation result is output via an output unit such as the display device 12 or the printing device 14.
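The nearest-feature-vector example in step S208 can be sketched as follows. This is a minimal illustration, assuming that the learning model is simply a list of (feature vector, changed design parameters) pairs and that distance means Euclidean distance; the text does not fix either choice, and the function names are hypothetical.

```python
import math


def distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def estimate_changed_parameters(model, new_vector):
    """Return the design parameters tied to the closest learned vector.

    `model` is a list of (feature_vector, changed_parameters) pairs
    accumulated during the learning phase.
    """
    _, params = min(model, key=lambda entry: distance(entry[0], new_vector))
    return params
```

Other distance measures or a trained classifier could be substituted without changing the overall flow of the estimation phase.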
<Embodiment 1: Details of operation procedure>
FIG. 3 is a processing flow diagram explaining the details of step S202. Each step of FIG. 3 is described below.
(FIG. 3: Steps S2021 to S2022)
The document analysis program 153 extracts the text (character string) portion from the change instruction 191 (S2021). Following the procedure described later with reference to FIG. 4, the document analysis program 153 then divides the text into the part related to the change content and the remaining part (for example, an introductory part explaining the background) (S2022).
(FIG. 3: Step S2023)
The document analysis program 153 performs the language analysis described later with reference to FIG. 6 on the text instructing the change content that was extracted in step S2022.
(FIG. 3: Step S2024)
The document analysis program 153 uses the template dictionary 163 to extract words from the text instructing the change content that was extracted in step S2022. This step extracts words from text written in a format from which the language analysis in step S2023 has difficulty extracting them. Details of this step will be described later with reference to FIG. 7.
(FIG. 3: Step S2025)
The document analysis program 153 uses the change word dictionary 161 to extract, as keywords, words that are assumed to express an instruction that causes a change in the design data 192. For example, words such as "change" and "correct" are highly likely to instruct a design change to the design data 192, so they are extracted as keywords in this step. This step serves both as preparation for determining the feature amount of each word and as a way of determining where processing starts in the next step S2026.
(FIG. 3: Step S2026)
The document analysis program 153 searches the text forward or backward within a predetermined range, starting from each keyword extracted in step S2025, and extracts the words and phrases assumed to be the target of the keyword. Details of this step will be described later with reference to FIG. 9.
(FIG. 3: Step S2027)
Using the general term dictionary 162, the document analysis program 153 excludes from the text extracted in step S2022 the words that are assumed not to affect the changed portion of the design data 192. Details of this step will be described later with reference to FIG. 10.
(FIG. 3: Steps S2025 to S2027: Supplement)
Steps S2025 to S2027 are performed to increase the degree of correlation between the text extracted in step S2022 and the changed portion of the design data 192, and can therefore be omitted. An example in which these steps are omitted will be described later with reference to FIG. 8. The document analysis program 153 outputs the text extracted by the above steps to the next step S203.
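Steps S2025 to S2027 can be illustrated with the following sketch. It assumes that the text has already been split into tokens, uses small English word sets as stand-ins for the change word dictionary 161 and the general term dictionary 162, and treats the "predetermined range" as a fixed window of three tokens on each side; all of these are assumptions made only for illustration.

```python
CHANGE_WORDS = {"change", "correct", "modify"}       # stand-in for dictionary 161
GENERAL_TERMS = {"the", "to", "of", "please", "is"}  # stand-in for dictionary 162


def extract_targets(tokens, window=3):
    """Collect words near each change keyword, minus general terms."""
    targets = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in CHANGE_WORDS:
            continue  # S2025: only change keywords start a search
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):  # S2026: search backward and forward
            word = tokens[j]
            if j == i or word.lower() in GENERAL_TERMS:
                continue  # S2027: skip the keyword itself and general terms
            if word not in targets:
                targets.append(word)
    return targets
```

For Japanese text, the tokens would come from the morphological analysis of step S2023 rather than from whitespace splitting.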
FIG. 4 is a diagram showing a text example of the change instruction 191 and a processing image of step S2022. The change instruction 191 does not necessarily describe only the changes to the design data 192; it usually contains other text as well. In step S2022, the document analysis program 153 divides the change instruction 191 into a change content portion 1912 and an other portion 1911. The following methods, for example, can be used as the criterion for this division.
(FIG. 4: Determination method example 1)
The determination is based on whether a phrase that typically appears in sentences describing change content, or in sentences describing other parts, is present in the sentence being processed in step S2022. For example, phrases such as "~ので" ("because ...") and "~に伴い" ("along with ...") suggest a causal relationship in Japanese, so words related to the changed portion of the design data 192 are assumed to follow them. The text after such a phrase can therefore be estimated to describe the change content, with the phrase serving as the boundary. A similar method is also used in FIG. 9 described later.
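Determination method example 1 might be sketched as below. The cue list contains only the two phrases quoted in the text, and the decision to treat a sentence without any cue as consisting entirely of change content is an assumption of this sketch, not something the specification states.

```python
CAUSAL_CUES = ("ので", "に伴い")  # illustrative cue phrases only


def split_at_cue(text, cues=CAUSAL_CUES):
    """Split text into (background, change_content) at the first cue phrase."""
    hits = [(text.find(c), c) for c in cues if c in text]
    if not hits:
        return "", text  # assumption: no cue means all change content
    pos, cue = min(hits)  # earliest cue in the text
    cut = pos + len(cue)
    return text[:cut], text[cut:].lstrip("、, ")
```

The cue phrase itself is kept with the background part, since it closes the clause that explains the reason for the change.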
(FIG. 4: Determination method example 2)
A similarity determination, such as sentence pattern matching, is performed between sentences that described change content or other parts in past change instructions 191 and the sentence being processed in step S2022, and the content of the text is determined based on the resulting degree of similarity.
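Determination method example 2 can be sketched as a nearest-example classifier. Here `difflib.SequenceMatcher` from the Python standard library stands in for the unspecified sentence pattern matching, and the labels "change" and "other" as well as the function name are hypothetical.

```python
from difflib import SequenceMatcher


def classify_sentence(sentence, labeled_examples):
    """Label a sentence like its most similar past example.

    `labeled_examples` is a list of (text, label) pairs, where label is
    "change" (change content) or "other" (any other part).
    """
    def score(example):
        return SequenceMatcher(None, sentence, example[0]).ratio()

    _, label = max(labeled_examples, key=score)
    return label
```

A practical system would likely also apply a minimum similarity threshold and leave low-scoring sentences for manual review.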
FIG. 5 is a flowchart showing a processing example of step S2022. Example 1 described with reference to FIG. 4 is assumed here. Steps S20221 to S20223 are the same as steps S20231 to S20233 in FIG. 6 described later. In step S20224, the document analysis program 153 refers to a keyword dictionary in which phrases that typically appear in sentences describing change content or other parts are registered, and determines whether the words and phrases extracted in steps S20221 to S20223 match any of them. In step S20225, the document analysis program 153 divides the text according to the determination result of step S20224.
FIG. 6 is a flowchart explaining the details of step S2023. In step S20231, the document analysis program 153 performs morphological analysis of the input text using a morpheme dictionary. In step S20232, the document analysis program 153 performs dependency analysis of the input text using a dependency dictionary. In step S20233, the document analysis program 153 identifies the stop words in the input text using a stop word dictionary.
Through the above processing, the document analysis program 153 extracts words, phrases, and their dependency relationships from the input text, and outputs the result. Each dictionary can be stored in advance in, for example, the storage unit 16.
FIG. 7 shows an example of the text of a change instruction 191 written in a special format, together with an illustration of the processing in step S2024. The change instruction 191 shown in FIG. 7 is written using parentheses "()", a colon ":", and an arrow "→".
Words and phrases in text written with these special symbols are difficult to extract by linguistic analysis, yet as a rule of thumb they often describe matters related to the change contents. The document analysis program 153 therefore recognizes these special formats using a template dictionary 163 in which they are defined in advance, and extracts the words contained in them.
The template dictionary 163 shown in FIG. 7 illustrates two templates. Template 1 defines that the text is to be divided into the word immediately before the colon and the text after it, and that the latter is to be further divided into the words written before and after the arrow. Template 2 defines that the text is to be divided into the word immediately before the opening parenthesis and the text inside the parentheses, and that the latter is to be further divided into the words written before and after the arrow.
Assume, for example, that template 1 specifies that the word immediately before the colon is to be extracted, and that template 2 specifies that the word immediately before the opening parenthesis and the word immediately before the arrow inside the parentheses are to be extracted. In this case, the words indicated by reference numeral 191' in FIG. 7 can be extracted from the change instruction 191 shown in the same figure.
Note that the template dictionary 163 shown in FIG. 7 is only an example. Any template can be defined as long as it can be expressed as a text pattern of characters, symbols, and the like.
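As one possible realization, the two templates could be expressed as regular expressions. The patterns below are an assumption for illustration; the actual contents of the template dictionary 163 are not given in the text, and the sample strings are hypothetical.

```python
import re

# Template 1 (assumed form): "<head>: <before> → <after>"
TEMPLATE_1 = re.compile(
    r"(?P<head>[^\s:：]+)\s*[:：]\s*(?P<before>[^→]+?)\s*→\s*(?P<after>.+)"
)
# Template 2 (assumed form): "<head>(<before> → <after>)"
TEMPLATE_2 = re.compile(
    r"(?P<head>[^\s(（]+)\s*[(（]\s*(?P<before>[^→()（）]+?)\s*→\s*"
    r"(?P<after>[^)）]+?)\s*[)）]"
)


def extract_words(text):
    """Extract the words that each template designates for extraction."""
    words = []
    # Template 1: only the word immediately before the colon.
    for match in TEMPLATE_1.finditer(text):
        words.append(match.group("head"))
    # Template 2: the word before the opening parenthesis and the word
    # immediately before the arrow inside the parentheses.
    for match in TEMPLATE_2.finditer(text):
        words.extend([match.group("head"), match.group("before")])
    return words
```

For example, `extract_words("bolt(M6 → M8)")` yields the head word and the pre-change value, mirroring the extraction rule assumed for template 2.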
FIG. 8 illustrates the processing in step S203 on the assumption that steps S2025 to S2027 are omitted. As a result of step S202, the change instruction 191 is divided into individual words, as indicated by reference numeral 191'' in FIG. 8. The learning program 154 assigns an appropriate feature value to each word and outputs a feature vector 1541 whose number of dimensions equals the number of words. For example, the sentence illustrated in FIG. 8 consists of 21 words, so its features can be expressed as a 21-dimensional feature vector.
As the specific feature value of each word in the feature vector 1541, it is possible to use, for example, the degree to which the design data 192 is changed by that word, quantified by the degree of correlation between the word and the changed portion. This degree of correlation may be determined based on empirical rules or on statistics of past results.
FIG. 9 illustrates the processing in step S2026. Words and phrases that appear before or after a keyword registered in the change word dictionary 161 are assumed to correlate strongly with the changed portion of the design data 192. The document analysis program 153 therefore divides the change instruction 191 into words and then searches for words forward or backward, starting from the keyword. The learning program 154 assigns larger feature values to the words within that range than to other words. As a result, the degree of correlation between the changed portion of the design data 192 and each word in the change instruction 191 can be expressed more accurately as the feature vector 1541.
The change word dictionary 161 shown in FIG. 9 illustrates two types of keywords. Keyword 1 is a phrase that suggests a causal relationship in Japanese, so words related to the changed portion of the design data 192 are assumed to follow it. Keyword 2 is a phrase that suggests a change in Japanese, so words related to the changed portion are assumed to precede it. Whether to search before or after a keyword thus depends on the type of keyword, so the search direction should be defined together with the keyword.
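The keyword-anchored weighting can be sketched as below. The dictionary entries, the window size, and the weight values are assumptions chosen for the example; the text only states that words near a keyword receive larger feature values and that the search direction is stored with the keyword.

```python
# Hypothetical change word dictionary: keyword -> direction to search.
# "after" follows the keyword (causal phrases); "before" precedes it
# (change phrases).
CHANGE_WORDS = {"ため": "after", "変更": "before"}


def weight_words(words, window=2, base=1.0, boost=2.0):
    """Give words within `window` positions of a keyword a larger weight."""
    weights = [base] * len(words)
    for i, word in enumerate(words):
        direction = CHANGE_WORDS.get(word)
        if direction == "after":
            targets = range(i + 1, min(len(words), i + 1 + window))
        elif direction == "before":
            targets = range(max(0, i - window), i)
        else:
            continue
        for j in targets:
            weights[j] = boost
    return weights
```

The returned list has one entry per word, so it can serve directly as a simple feature vector of the kind described for FIG. 8.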
FIG. 10 illustrates the processing in step S2027. General terms needed to form a sentence (for example, particles and terms customary in the field) are assumed to correlate only weakly with the changed portion of the design data 192. A feature vector 1541 built from such general terms is unlikely to express the correspondence between the change instruction 191 and the changed portion of the design data 192 adequately.
The document analysis program 153 therefore removes the general terms registered in the general term dictionary 162 from the change instruction 191 in advance. The learning program 154 builds the feature vector 1541 from the text that remains after the general terms are removed. This raises the degree of correlation between the feature vector 1541 and the changed portion, so that the correspondence between the two is expressed more appropriately.
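A minimal sketch of the general-term removal follows. The contents of the set standing in for the general term dictionary 162 are assumed for illustration.

```python
# Hypothetical stand-in for the general term dictionary 162.
GENERAL_TERMS = {"the", "of", "to", "is", "please"}


def remove_general_terms(words):
    """Drop words registered as general terms before feature extraction."""
    return [w for w in words if w.lower() not in GENERAL_TERMS]
```

The filtered word list would then be passed on to feature vector construction in place of the full word list.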
FIG. 11 is a flowchart showing an example of the processing in steps S2026 and S2027. Steps S20261 to S20263 are the same as steps S20231 to S20233, so they can be omitted by reusing the results of those steps. In steps S20264 and S20265, the document analysis program 153 performs the processing described with reference to FIGS. 9 and 10.
FIG. 12 shows another configuration example of the similarity vector 1541 used by the learning program 154. The similarity vector 1541 is used to learn the correspondence between the change contents that the change instruction 191 specifies for the design data 192 and the portion of the design data 192 changed as a result. Configurations other than those illustrated in FIGS. 8 to 10 can therefore be adopted as long as they serve the same function.
In the example shown in FIG. 12, the similarity between the sentence in the change instruction 191 and each past sentence in change instructions that instructed a change to each design parameter (for example, a CAD parameter) is obtained, and these similarities are used as the elements of the similarity vector 1541.
Field 15411 holds a sentence in the change instruction 191 that instructs a change to the design data 192. Field 15412 holds the design parameter that the sentence in field 15411 instructs to change. Field 15413 holds the similarity between the sentence in field 15411 and those sentences in past change instructions 191 that instruct a change to each design parameter (design parameters 1 to n are illustrated).
According to the configuration example of the similarity vector 1541 illustrated in FIG. 12, the degree of correlation with other design parameters can be learned through the design parameter to be changed. The same applies to the configuration example in FIG. 13 described below.
FIG. 13 shows another configuration example of the similarity vector 1541 used by the learning program 154. In the example shown in FIG. 12, the similarity to the change instruction 191 is obtained for each design parameter, even for past sentences that instruct changes to design parameters unrelated to the change instruction 191. In the example shown in FIG. 13, the design parameters unrelated to the change instruction 191 are aggregated into a single similarity, for example by using the average of the similarities between each past sentence and the change instruction 191.
Field 15414 holds the similarity between the sentence in field 15411 and those sentences in past change instructions 191 that instruct a change to the same design parameter (that is, the design parameter in field 15412). Field 15415 holds the similarity between the sentence in field 15411 and those sentences in past change instructions 191 that instruct a change to other design parameters (that is, design parameters other than that in field 15412).
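The two configurations can be contrasted with a short sketch. The similarity measure is not specified in the text, so Jaccard similarity over word sets is used here purely as a stand-in, and averaging over the past sentences of one parameter is likewise an assumption. `history` is a hypothetical mapping from a design parameter name to the tokenized past sentences that instructed a change to it.

```python
def jaccard(a, b):
    """Jaccard similarity of two word lists (stand-in similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def per_parameter_similarity(sentence, history):
    """FIG. 12 style: one similarity value per design parameter."""
    return {
        param: sum(jaccard(sentence, past) for past in sentences) / len(sentences)
        for param, sentences in history.items()
        if sentences
    }


def aggregated_similarity(sentence, target_param, history):
    """FIG. 13 style: [same-parameter similarity, mean over the others]."""
    sims = per_parameter_similarity(sentence, history)
    others = [value for param, value in sims.items() if param != target_param]
    other_mean = sum(others) / len(others) if others else 0.0
    return [sims.get(target_param, 0.0), other_mean]
```

The aggregated form keeps the vector at two elements regardless of how many design parameters exist, which is the practical difference between the two configurations.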
FIG. 14 shows another configuration example of the similarity vector 1541 used by the learning program 154. Field 15416 holds the similarity between the sentence in field 15411 and those sentences in past change instructions 191 that instruct a change to the same design parameter (that is, the design parameter in field 15412). In this example, however, several different methods are used to calculate the similarity, and the similarities obtained with each method are used as the vector elements.
<Embodiment 1: Summary>
As described above, the document processing apparatus 10 according to the first embodiment learns in advance, by machine learning, the correspondence between the change contents specified by a change instruction 191 and the portion of the design data 192 changed as a result. Based on the learning result, it estimates the portion of the design data 192 that a new change instruction 191 will change. This makes it possible to predict the range of influence of a change instruction using only the change instruction 191 written as text. Work such as manually extracting design parameters from the change instruction 191 therefore becomes unnecessary, reducing the workload and cost of predicting the range of influence.
<Embodiment 2>
The first embodiment described an operation example in which the range of influence on the design data 192 is learned from past change instructions 191. In actual design work, however, the change contents specified by a change instruction 191 are sometimes not reflected in the design data 192, for example because they cannot be realized. Whether a design change is accepted is generally written in an answer sheet responding to the change instruction 191. The second embodiment of the present invention therefore describes an operation example in which the contents of the answer sheet are additionally learned.
FIG. 15 is a processing flow diagram showing the operation of the document processing apparatus 10 according to the second embodiment in the answer sheet learning phase. In the second embodiment, the document processing apparatus 10 performs an additional answer sheet learning phase between the learning phase and the estimation phase. The answer sheet learning phase learns the influence that the answer sheet 194 responding to the change instruction 191 had on the design data 192. It is similar to the learning phase, but differs in that, when the answer sheet 194 replies that the design cannot be changed, learning proceeds in the direction opposite to that of the learning phase. The following description focuses on the differences from the learning phase.
(FIG. 15: Answer sheet learning phase: Step S1501)
The document processing apparatus 10 acquires, from the file server 19, for example by means of the communication program 152, answer sheets 194 in which replies to past change instructions 191 are written.
(FIG. 15: Answer sheet learning phase: Step S1502)
The document analysis program 153 analyzes the answer sheet 194 acquired in step S1501 and extracts the dependency relationships between words and phrases. This step is largely the same as step S202, but differs in that it extracts the reply to the change instruction 191 rather than the change contents for the design data 192. Accordingly, instead of words that suggest a change, such as "change", it is necessary to extract words that suggest a reply to the change contents, such as "cannot be changed". These words are stored in the change word dictionary 161 in advance. The subject of the reply in the answer sheet 194 is considered to be a design parameter, so extracting words that suggest a design parameter is the same as in the first embodiment.
(FIG. 15: Answer sheet learning phase: Step S1503)
The learning program 154 calculates the feature values of the answer sheet 194 acquired in step S1501, based on the result of the document analysis in step S1502. The processing in this step is the same as in step S203.
(FIG. 15: Answer sheet learning phase: Step S1504)
Using the feature values of the answer sheet 194 extracted in step S1503, the learning program 154 learns the correspondence between the reply given in the answer sheet 194 and the portion of the design data 192 changed as a result of that reply.
(FIG. 15: Answer sheet learning phase: Step S1504: Supplement)
If the reply permits the design change, the change contents can be learned from the change instruction 191, so there is nothing new to learn from the answer sheet 194. If the reply rejects the design change, the change contents specified by the change instruction 191 were not reflected in the design data 192, so the learning result based on the change instruction 191 must be canceled. The learning program 154 therefore reflects the learning result based on the answer sheet 194 in the learning model 164 in the negative direction. For example, learning may be performed using the feature vector 1541 with each element multiplied by -1 to invert its sign.
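The sign inversion can be illustrated with a deliberately simple additive update. The text does not specify the learning algorithm behind the learning model 164, so the linear weight update, the learning rate, and the function name below are assumptions.

```python
def update_weights(weights, features, accepted, learning_rate=0.1):
    """Additively update model weights from one document's feature vector.

    When the answer sheet rejects the change (accepted=False), the feature
    vector is multiplied by -1 so the update cancels earlier learning.
    """
    sign = 1.0 if accepted else -1.0
    return [w + learning_rate * sign * x for w, x in zip(weights, features)]
```

Applying the update once for a change instruction and once more with `accepted=False` for its rejecting answer sheet returns the weights to their starting values, which corresponds to canceling the learning result.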
<Embodiment 2: Summary>
As described above, when the answer sheet 194 replies that the change contents specified by the change instruction 191 are rejected, the document processing apparatus 10 according to the second embodiment performs machine learning in the direction that cancels the correspondence between the change contents in the change instruction 191 and the changed portion of the design data 192. This reflects in the learning result whether the design change was accepted, so the range of influence can be estimated more accurately.
<Embodiment 3>
The first and second embodiments described examples in which the change instruction 191 is written using symbols and character strings. An actual change instruction 191, however, may be written in various formats, such as a table. The third embodiment of the present invention therefore describes an operation example in which, when the change instruction 191 is analyzed, its format is analyzed and the text portion is extracted. The other configurations are the same as in the first and second embodiments, so the following description focuses on the operation of analyzing the format of the change instruction 191.
FIG. 16 shows a layout example of a change instruction 191 written in table format. The text in the bottom row of FIG. 16 is the same as that described with reference to FIG. 4 in the first embodiment, but other cells precede it, so it is necessary either to remove those cells or to locate the text portion.
FIG. 17 is a processing flow diagram explaining the details of step S202 in the third embodiment. Each step in FIG. 17 is described below.
(FIG. 17: Step S2028)
The document analysis program 153 performs the following steps S2028 and S2029 in place of step S2021. In step S2028, the document analysis program 153 converts the change instruction 191 into text with coordinates. Text with coordinates is document data in which coordinates, for example with the origin at the upper left of the document, are internally assigned to the text portions of the change instruction 191. Text with coordinates is a known technique, so its details are omitted.
(FIG. 17: Step S2029)
The document analysis program 153 analyzes the layout of the change instruction 191 using the result of step S2028. Analyzing the layout of a document using text with coordinates is a known technique, so its details are omitted. The subsequent steps are the same as step S2022 onward in the first embodiment.
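The text leaves the layout analysis to known techniques. Purely as an illustration of how text with coordinates might be used, the sketch below picks out the bottom row of a table-like layout such as the one in FIG. 16. The `(x, y, text)` tuple representation, the tolerance value, and the assumption that the body text sits in the bottom row are all hypothetical.

```python
def bottom_row_text(fragments, tolerance=5):
    """Join the text fragments located in the lowest row of the page.

    Each fragment is an (x, y, text) tuple with the origin at the upper
    left, so a larger y value means a position further down the page.
    """
    if not fragments:
        return ""
    max_y = max(y for _, y, _ in fragments)
    # Fragments whose y is within `tolerance` of the lowest one form a row;
    # sorting the tuples orders them left to right by x.
    row = sorted(f for f in fragments if max_y - f[1] <= tolerance)
    return " ".join(text for _, _, text in row)
```

A real layout analyzer would also need to handle multi-line cells and ruled lines, which this sketch ignores.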
<Embodiment 4>
FIG. 18 is a configuration diagram of a document processing system 1000 according to the fourth embodiment of the present invention. The document processing system 1000 adds various functions to the document processing apparatus 10 described in the first to third embodiments. The document processing system 1000 includes a storage 1100, an extraction processing device (hereinafter ETL: Extract/Transform/Load) 1200, a content server 1300, a search server 1410, a metadata server 1420, the document processing apparatus 10 described in the first to third embodiments, and an application program 1500.
The storage 1100 is a storage device that stores various data, and serves in place of the file server 19 of the first to third embodiments. It stores, for example, document data 1101, CAD data 1102, and mail data 1103. The change instructions 191 and answer sheets 194 are also stored in the storage 1100.
The ETL 1200 is a device that extracts the necessary items from the data stored in the storage 1100. The ETL 1200 stores a changed portion extraction program 1201, an association program 1202, and association rule data 1203 in a storage device such as an HDD. Their operation is described later.
The content server 1300 is a server that provides various data to the higher-level servers, and stores metadata 1301 and content data 1302 in a storage device such as an HDD. The metadata 1301 is data that describes attribute information of the content data 1302.
The search server 1410 and the metadata server 1420 are servers that search the content data 1302 or the metadata 1301 in response to requests from the application program 1500. The search server 1410 searches the content data 1302 using an index 1411 extracted from the content data 1302. The metadata server 1420 searches the metadata 1301 using a database 1421 in which the metadata 1301 is indexed.
The application program 1500 is executed by a suitable computer and provides various functions according to its purpose. For example, in addition to issuing the search requests described above, it can instruct the document processing apparatus 10 to estimate the range of influence of a change instruction 191, receive the result, and present it to the user. It may also include a design application such as a CAD tool, or a workflow application that sends and receives change instructions 191 or answer sheets 194.
The document processing apparatus 10 and the other servers and devices may also be configured as a single unit. For example, the programs installed on the ETL 1200 may be implemented on the document processing apparatus 10 and executed by it.
<Embodiment 4: Functions of the ETL 1200>
The first to third embodiments described learning the correspondence between the change instruction 191 and the changed portion of the design data 192. However, when the design data 192 is written in a special format, such as CAD data, it may not be easy to identify where the design data 192 was changed in the past. In the fourth embodiment, therefore, the ETL 1200 extracts the changed portions of the design data 192 to reduce the burden on the operator.
In addition, as a prerequisite for learning the correspondence between the change contents in the change instruction 191 and the changed portion of the design data 192, it is necessary to identify which design data 192 the change instruction 191 was directed at. However, when the change instruction 191 and the design data 192 are created through separate systems or workflows, it may be difficult for a different operator to work out how the two correspond once the design process has ended. In the fourth embodiment, therefore, the ETL 1200 supports, as preprocessing, the task of associating the change instruction 191 with the design data 192 in advance, reducing the burden on the operator.
FIG. 19 is a processing flow diagram showing the overall operation of the document processing system 1000. In addition to the operation of the document processing apparatus 10 described in the first to third embodiments, the document processing system 1000 performs new steps S1901 and S1902. These steps are described below.
(FIG. 19: Learning phase: Step S1901)
The ETL 1200 executes the changed portion extraction program 1201 on its CPU and extracts the portions of the design data 192 that were changed by past change instructions 191. The details of this step are described later with reference to FIG. 20.
(FIG. 19: Learning phase: Step S1902)
The ETL 1200 executes the association program 1202 on its CPU and associates each past change instruction 191 with the design data 192 that it changed. The details of this step are described later with reference to FIG. 21.
FIG. 20 illustrates the processing in step S1901, using the case where the design data 192 is CAD data as an example. Because CAD data is drawing data, it must first be converted into, for example, log data dumped in text format before its changed portions can be identified. The figure shows such log data imported into spreadsheet software.
In FIG. 20, assume that the design data 192 has been changed by the change instruction 191 as indicated by reference numeral 191'. In general, the changed portions can be identified by taking the difference between the two, but for CAD data a simple difference does not always identify them, because the part number 1921 may change according to the work process (for example, every month).
Even when the part number 1921 changes, however, the ordering normally stays the same for the same parts. The changed portion extraction program 1201 uses this property to extract the changed portions of the design data 192. Specifically, the design data 191 and 191' before and after the change are sorted by part number 1921, and the rows whose other columns (for example, the part name shown in column AE of FIG. 20) match most closely are presumed to correspond to each other. The degree of matching may be obtained by a known technique such as DP matching. To extract the changed portions more reliably, the user may visually check the result extracted by the changed portion extraction program 1201 and correct it as necessary.
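As a rough illustration of DP matching, the sketch below aligns two lists of part names with a longest-common-subsequence dynamic program and reports the rows left unmatched. It assumes both lists have already been sorted by part number and compares part names by exact equality; the actual matching criterion used by the changed portion extraction program 1201 is not given in the text.

```python
def align_rows(old, new):
    """Align two part-name lists; return matched (old_index, new_index) pairs."""
    n, m = len(old), len(new)
    # dp[i][j] = length of the longest common subsequence of old[i:], new[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace the table forward to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def changed_rows(old, new):
    """Return (removed old indices, added new indices) left unmatched."""
    pairs = align_rows(old, new)
    matched_old = {i for i, _ in pairs}
    matched_new = {j for _, j in pairs}
    removed = [i for i in range(len(old)) if i not in matched_old]
    added = [j for j in range(len(new)) if j not in matched_new]
    return removed, added
```

Because the alignment relies on row order rather than part numbers, it still pairs rows correctly when every part number has been reassigned.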
 図21は、ステップS1902の処理イメージを示す図である。一般に変更指示書191と設計データ192は異なるワークフローやアプリケーションを介して作成されるのが通常であるため、両者の書式は必ずしも合致せず、これらを対応付けるために相応の作業負担が生じる。 FIG. 21 is a diagram showing a processing image of step S1902. In general, the change instruction 191 and the design data 192 are usually created through different workflows and applications, so the formats of the two do not necessarily match, and a corresponding work load is generated for associating them.
 ただし、設計データ192に含まれている部品番号などの設計情報は、変更指示書191のなかに何らかの形式で記述されているはずである。CADデータは図面データであるため、変更指示書191のなかにおいては図面番号としてCADデータが指定されている場合が多いと考えられる。そこで対応付けプログラム1202は、変更指示書191のなかに記載されている図面番号が含まれる設計データ192が、当該変更指示書191と対応していると仮定し、その旨をユーザに提示する。ユーザはその提示を目視確認して両者をより正確に対応付けることができる。 However, design information such as a part number included in the design data 192 should be described in some form in the change instruction 191. Since CAD data is drawing data, it is considered that CAD data is often specified as a drawing number in the change instruction 191. Therefore, the association program 1202 assumes that the design data 192 including the drawing number described in the change instruction 191 corresponds to the change instruction 191 and presents the fact to the user. The user can visually check the presentation and associate the two more accurately.
The drawing number is typically written in the subject line or the body of the change instruction 191, so the association program 1202 may check the correspondence between these parts and the design data 192 preferentially. In addition, even when the drawing numbers are identical, the two are considered only weakly related if their dates are far apart. The program may therefore associate them only when their dates (either the dates written in the documents or their update timestamps) fall within a predetermined range of each other (for example, within one week).
Rules specifying which item in the change instruction 191 is to be associated with which item in the design data 192 can be defined in advance as the association rule data 1203.
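The association by drawing number and date can be sketched as follows. This is a hedged illustration only, not the association program 1202 itself: the `ChangeInstruction` and `DesignData` classes, the `DEFAULT_RULES` dictionary (a hypothetical stand-in for rule data of the kind described above), and the function names are all assumptions made for the example. The sketch searches the subject before the body and skips design data whose update time lies more than one week from that of the instruction.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeInstruction:
    subject: str
    body: str
    updated: datetime


@dataclass
class DesignData:
    drawing_number: str
    updated: datetime


# Hypothetical stand-in for association rules: which fields of the
# instruction to search (in priority order) and how far apart dates may be.
DEFAULT_RULES = {
    "search_fields": ("subject", "body"),
    "max_gap": timedelta(days=7),
}


def mentions(text: str, drawing_number: str) -> bool:
    """True if drawing_number appears in text as a whole token."""
    pattern = rf"(?<![\w-]){re.escape(drawing_number)}(?![\w-])"
    return re.search(pattern, text) is not None


def candidate_associations(instruction, designs, rules=DEFAULT_RULES):
    """Return (design, matched_field) candidates for the user to confirm."""
    candidates = []
    for design in designs:
        if abs(instruction.updated - design.updated) > rules["max_gap"]:
            continue  # dates too far apart: treat as weakly related
        for field in rules["search_fields"]:
            if mentions(getattr(instruction, field), design.drawing_number):
                candidates.append((design, field))
                break  # first matching field in priority order wins
    return candidates
```

The function returns candidates rather than final links, reflecting the text's point that the result is presented to the user for visual confirmation.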
<Embodiment 4: Summary>
As described above, in the document processing system 1000 according to the fourth embodiment, the ETL 1200 extracts the locations at which the design data 192 was changed by past change instructions 191. As a result, even when the design data 192 is written in a special format, the correspondence between a change instruction 191 and the locations it changed can be learned efficiently.
In addition, in the document processing system 1000 according to the fourth embodiment, the ETL 1200 associates each past change instruction 191 with the design data 192 that it changed. As a result, even when the two were created separately, the workload required for an operator to associate them can be reduced.
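How pairs of change instructions and changed locations might feed a learning step can be pictured with a toy model. The sketch below is an illustration under loud assumptions, not the learning method of the embodiments: it substitutes simple co-occurrence counting for whatever model is actually used, and the names `learn` and `estimate`, the word lists, and the part identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def learn(history):
    """Count how often each word co-occurs with each changed part.

    history: iterable of (words, changed_parts) pairs taken from past
    change instructions and the locations they changed.
    """
    word_count = Counter()
    pair_count = defaultdict(Counter)
    for words, parts in history:
        for word in set(words):
            word_count[word] += 1
            for part in set(parts):
                pair_count[word][part] += 1
    return word_count, pair_count


def estimate(words, model, top_n=3):
    """Rank parts likely to change for a new instruction's words."""
    word_count, pair_count = model
    scores = Counter()
    for word in set(words):
        for part, count in pair_count.get(word, {}).items():
            # Fraction of past instructions containing `word` that changed `part`.
            scores[part] += count / word_count[word]
    return [part for part, _ in scores.most_common(top_n)]
```

For instance, a history in which "bolt" always accompanied a change to one part would rank that part first for a new instruction mentioning "bolt"; words never seen before contribute nothing.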
The present invention is not limited to the embodiments described above and includes various modifications. The above embodiments have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of a given embodiment. For part of the configuration of each embodiment, other configurations can also be added, deleted, or substituted.
For example, the above embodiments describe instructing changes to design data 192 such as CAD data, but the present invention can also be applied when changes to other kinds of data are instructed by means of a document.
Some or all of the above configurations, functions, processing units, processing means, and the like may be implemented in hardware, for example by designing them as integrated circuits. They may also be implemented in software, with a processor interpreting and executing programs that realize the respective functions. Information such as the programs, tables, and files that realize each function can be stored in a recording device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.
10: document processing device, 11: input device, 12: display device, 13: CPU, 14: printing device, 15: memory, 151: OS, 152: communication program, 153: document analysis program, 154: learning program, 155: change target estimation program, 16: storage unit, 161: change word dictionary, 162: general term dictionary, 163: template dictionary, 164: learning model, 18: communication network, 19: file server, 1000: document processing system, 1100: storage, 1200: extraction processing device, 1201: change location extraction program, 1202: association program, 1203: association rule data, 1300: content server, 1410: search server, 1420: metadata server, 1500: application program.

Claims (15)

  1.  A document processing program for causing a computer to process a document, the program causing the computer to execute:
     a document acquisition step of acquiring document data that instructs a change to data;
     a learning step of machine-learning a correspondence between the change content instructed by the document data and the changed portion of the data changed by that change content; and
     an estimation step of estimating, based on the result of the machine learning, the portion of the data that will be changed by the change content to the data instructed by new document data.
  2.  The document processing program according to claim 1, further causing the computer to execute
     a document analysis step of extracting at least one of words and phrase dependency relations from the text written in the document data,
     wherein, in the learning step, the computer is caused to execute
     a step of machine-learning the correspondence by obtaining a degree of correlation between the at least one of the words and the phrase dependency relations extracted in the document analysis step and the changed portion of the data changed by the document data.
  3.  The document processing program according to claim 2, further causing the computer to execute
     a response document acquisition step of acquiring a response document that describes a response to the change content to the data instructed by the document data,
     wherein, in the document analysis step, the computer is caused to
     extract, from the text written in the response document, the portion describing a response to the effect that the change content is not permitted, and
     in the learning step, the computer is caused to
     perform the machine learning in a direction that weakens the correspondence when text describing the response to the effect that the change content is not permitted exists in the response document.
  4.  The document processing program according to claim 2, further causing the computer to execute
     a step of extracting keywords contained in the document data by referring to a word dictionary in which keywords defined as words that give rise to a changed portion of the data are registered,
     wherein, in the learning step, the computer is caused to
     machine-learn the correspondence by obtaining a degree of correlation between the keywords extracted from the document data and the changed portion of the data changed by the document data.
  5.  The document processing program according to claim 2, further causing the computer to execute
     a step of extracting keywords contained in the document data by applying to the document data a template that describes a text pattern appearing around a keyword defined as a word that gives rise to a changed portion of the data,
     wherein, in the learning step, the computer is caused to
     machine-learn the correspondence by obtaining a degree of correlation between the keywords extracted from the document data using the template and the changed portion of the data changed by the document data.
  6.  The document processing program according to claim 2, further causing the computer to execute
     a step of extracting from the document data a keyword defined as a word that gives rise to a changed portion of the data,
     wherein, in the learning step, the computer is caused to
     machine-learn the correspondence by obtaining a degree of correlation between words that lie within a predetermined range of the keyword in the text contained in the document data and the changed portion of the data changed by the document data.
  7.  The document processing program according to claim 2, further causing the computer to execute
     a step of identifying general terms contained in the document data by referring to a general term dictionary in which general terms defined as words that do not give rise to a changed portion of the data are registered,
     wherein, in the learning step, the computer is caused to
     machine-learn the correspondence by obtaining a degree of correlation between the text written in the document data with the general terms removed and the changed portion of the data changed by the document data.
  8.  The document processing program according to claim 2, further causing the computer to execute
     a layout analysis step of analyzing the layout of the document data and identifying, within the layout obtained by the analysis, the portion that describes the change content,
     wherein, in the learning step, the computer is caused to
     machine-learn the correspondence by obtaining a degree of correlation between the change content identified in the layout analysis step and the changed portion of the data changed by the document data.
  9.  The document processing program according to claim 2, further causing the computer to execute
     a step of extracting a delimiter phrase placed at the boundary between the portion of the text contained in the document data that describes the change content and the portion preceding it,
     wherein, in the learning step, the computer is caused to
     machine-learn the correspondence by obtaining a degree of correlation between the text written in the document data with the portion preceding the delimiter phrase removed and the changed portion of the data changed by the document data.
  10.  The document processing program according to claim 1, wherein, in the learning step, the computer is caused to
     perform the machine learning using a feature vector formed by describing, for each of a plurality of pieces of the document data, the degree of similarity between the changed portion instructed by the change content of the document data and other document data containing change content that instructs a change to the same changed portion.
  11.  The document processing program according to claim 1, wherein
     in the document acquisition step, the computer is caused to
     acquire the document data instructing a change to CAD data,
     in the learning step, the computer is caused to
     machine-learn the correspondence between the change content and the changed portion of the CAD data changed by that change content, and
     in the estimation step, the computer is caused to
     estimate, based on the result of the machine learning, the portion of the CAD data that will be changed by the change content to the CAD data instructed by new document data.
  12.  A document processing device for processing a document, comprising:
     a document acquisition unit that acquires document data instructing a change to data;
     a learning unit that machine-learns a correspondence between the change content instructed by the document data and the changed portion of the data changed by that change content; and
     an estimation unit that estimates, based on the result of the machine learning, the portion of the data that will be changed by the change content to the data instructed by new document data.
  13.  A document processing system comprising:
     the document processing device according to claim 12; and
     an extraction processing device that extracts the locations at which the data was changed by the change content instructed by the document data,
     wherein the data is
     configured as CAD data describing design information for the parts that make up an article, and
     the extraction processing device
     retrieves, as change records, the part numbers of the parts and their order before and after the CAD data was changed by the document data, and
     when the part numbers do not match before and after the change, extracts the locations at which the CAD data was changed by identifying, among the entries in the change records, those whose part order is most similar.
  14.  The document processing system according to claim 13, wherein
     the extraction processing device
     associates the document data with the CAD data based on the subject and the drawing number written in the document data and on whether the update date and time of the CAD data and the update date and time of the document data fall within a predetermined range, and
     the document processing device
     learns the correspondence using the document data and the CAD data associated by the extraction processing device.
  15.  A document processing method for processing a document, comprising:
     a document acquisition step of acquiring document data that instructs a change to data;
     a learning step of machine-learning a correspondence between the change content instructed by the document data and the changed portion of the data changed by that change content; and
     an estimation step of estimating, based on the result of the machine learning, the portion of the data that will be changed by the change content to the data instructed by new document data.
PCT/JP2012/077614 2012-10-25 2012-10-25 Document processing program, document processing device, document processing system, and document processing method WO2014064803A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2012/077614 WO2014064803A1 (en) 2012-10-25 2012-10-25 Document processing program, document processing device, document processing system, and document processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2012/077614 WO2014064803A1 (en) 2012-10-25 2012-10-25 Document processing program, document processing device, document processing system, and document processing method

Publications (1)

Publication Number Publication Date
WO2014064803A1 true WO2014064803A1 (en) 2014-05-01

Family

ID=50544201

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/077614 WO2014064803A1 (en) 2012-10-25 2012-10-25 Document processing program, document processing device, document processing system, and document processing method

Country Status (1)

Country Link
WO (1) WO2014064803A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9588952B2 (en) 2015-06-22 2017-03-07 International Business Machines Corporation Collaboratively reconstituting tables
US11157475B1 (en) 2019-04-26 2021-10-26 Bank Of America Corporation Generating machine learning models for understanding sentence context
US11423231B2 (en) 2019-08-27 2022-08-23 Bank Of America Corporation Removing outliers from training data for machine learning
US11449559B2 (en) 2019-08-27 2022-09-20 Bank Of America Corporation Identifying similar sentences for machine learning
US11526804B2 (en) 2019-08-27 2022-12-13 Bank Of America Corporation Machine learning model training for reviewing documents
US11556711B2 (en) 2019-08-27 2023-01-17 Bank Of America Corporation Analyzing documents using machine learning
US11783005B2 (en) 2019-04-26 2023-10-10 Bank Of America Corporation Classifying and mapping sentences using machine learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11184861A (en) * 1997-12-18 1999-07-09 Hitachi Ltd Similar parts retrieval method/device
JP2000047861A (en) * 1998-07-30 2000-02-18 Nec Corp Influence range detecting device and influence range detecting method
JP2006344200A (en) * 2005-05-12 2006-12-21 Hitachi Ltd Product design parameter decision method and support system for it
JP2010287026A (en) * 2009-06-11 2010-12-24 Exa Corp Project management system and project management program
JP2012014308A (en) * 2010-06-30 2012-01-19 Hitachi-Ge Nuclear Energy Ltd Method and device for predicting influence of change

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9588952B2 (en) 2015-06-22 2017-03-07 International Business Machines Corporation Collaboratively reconstituting tables
US11429897B1 (en) 2019-04-26 2022-08-30 Bank Of America Corporation Identifying relationships between sentences using machine learning
US11244112B1 (en) 2019-04-26 2022-02-08 Bank Of America Corporation Classifying and grouping sentences using machine learning
US11328025B1 (en) 2019-04-26 2022-05-10 Bank Of America Corporation Validating mappings between documents using machine learning
US11423220B1 (en) 2019-04-26 2022-08-23 Bank Of America Corporation Parsing documents using markup language tags
US11157475B1 (en) 2019-04-26 2021-10-26 Bank Of America Corporation Generating machine learning models for understanding sentence context
US11429896B1 (en) 2019-04-26 2022-08-30 Bank Of America Corporation Mapping documents using machine learning
US11694100B2 (en) 2019-04-26 2023-07-04 Bank Of America Corporation Classifying and grouping sentences using machine learning
US11783005B2 (en) 2019-04-26 2023-10-10 Bank Of America Corporation Classifying and mapping sentences using machine learning
US11423231B2 (en) 2019-08-27 2022-08-23 Bank Of America Corporation Removing outliers from training data for machine learning
US11449559B2 (en) 2019-08-27 2022-09-20 Bank Of America Corporation Identifying similar sentences for machine learning
US11526804B2 (en) 2019-08-27 2022-12-13 Bank Of America Corporation Machine learning model training for reviewing documents
US11556711B2 (en) 2019-08-27 2023-01-17 Bank Of America Corporation Analyzing documents using machine learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12887108

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12887108

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP