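WO2023162129A1 - Learning data generation device, risk detection device, learning data generation method, risk detection method, learning data generation program, and risk detection program - Google Patents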

Learning data generation device, risk detection device, learning data generation method, risk detection method, learning data generation program, and risk detection program

Info

Publication number
WO2023162129A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
text data
risk
sentence
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/007860
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
直生 吉永
淳 吉田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP2024502362A (JPWO2023162129A1)
Priority to PCT/JP2022/007860 (WO2023162129A1)
Publication of WO2023162129A1
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Definitions

  • the present invention relates to technology for detecting project risks.
  • Patent Literature 1 describes evaluating one document with respect to a plurality of independent rules and summing the product of the evaluation value and weight of each rule.
  • Patent Literature 2 describes extracting words from text data of a web page by morphological analysis, inputting the extracted words into a neural network, and calculating a risk-related score.
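  • One aspect of the present invention has been made in view of the above problems, and aims to provide a technique for generating learning data for detecting project risks more accurately without requiring complicated work such as rule creation.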
  • a learning data generation device includes: analysis means for analyzing the structure of a sentence represented by text data and generating structural data representing the structure of the sentence; tagging means for receiving a user operation specifying an expression corresponding to a risk included in the sentence and, based on the received user operation, tagging an element corresponding to the expression in the structural data; and output means for outputting learning data including the text data and the structural data tagged by the tagging means.
  • a risk detection device includes: acquisition means for acquiring text data; analysis means for analyzing the structure of a sentence represented by the text data acquired by the acquisition means and generating structural data representing the structure of the sentence; and extraction means for extracting an expression corresponding to a risk from the text data acquired by the acquisition means, using a model learned by referring to learning data that includes text data and structural data which represents the structure of the sentences in that text data and in which elements corresponding to expressions corresponding to risks are tagged.
  • in a learning data generation method, at least one processor analyzes the structure of a sentence represented by text data and generates structural data representing the structure of the sentence; receives a user operation specifying an expression corresponding to a risk included in the sentence and, based on the received user operation, tags an element corresponding to the expression in the structural data; and outputs learning data including the text data and the tagged structural data.
  • in a risk detection method, at least one processor acquires text data; analyzes the structure of a sentence represented by the acquired text data and generates structural data representing the structure of the sentence; and extracts an expression corresponding to a risk from the acquired text data, using a model learned by referring to learning data that includes text data and structural data which represents the structure of the sentences in that text data and in which elements corresponding to expressions corresponding to risks are tagged.
  • a learning data generation program causes a computer to execute: an analysis process of analyzing the structure of a sentence represented by text data and generating structural data representing the structure of the sentence; a tagging process of receiving a user operation specifying an expression corresponding to a risk included in the sentence and, based on the received user operation, tagging an element corresponding to the expression in the structural data; and an output process of outputting learning data including the text data and the structural data tagged in the tagging process.
  • a risk detection program causes a computer to execute: an acquisition process of acquiring text data; an analysis process of analyzing the structure of a sentence represented by the text data acquired in the acquisition process and generating structural data representing the structure of the sentence; and an extraction process of extracting an expression corresponding to a risk from the text data acquired in the acquisition process, using a model learned by referring to learning data that includes text data and structural data which represents the sentence structure of that text data and in which elements corresponding to expressions corresponding to risks are tagged.
  • FIG. 1 is a block diagram showing the configuration of a learning data generation device according to Exemplary Embodiment 1;
  • FIG. 2 is a flowchart showing the flow of a learning data generation method according to exemplary embodiment 1;
  • FIG. 3 is a block diagram showing the configuration of a risk detection device according to exemplary embodiment 1;
  • FIG. 4 is a flow diagram showing the flow of a risk detection method according to exemplary embodiment 1;
  • FIG. 5 is a block diagram showing the configuration of an information processing apparatus according to exemplary embodiment 2;
  • FIG. 6 is a flow diagram showing the flow of a learning phase execution method according to exemplary embodiment 2;
  • FIG. 7 is a diagram showing a screen display example according to exemplary embodiment 2;
  • FIG. 8 is a diagram showing a screen display example according to exemplary embodiment 2;
  • FIG. 9 is a flow diagram illustrating the flow of an estimation phase execution method according to exemplary embodiment 2;
  • FIG. 10 is a block diagram showing the configuration of a computer functioning as an information processing device according to each exemplary embodiment.
  • FIG. 1 is a block diagram showing the configuration of the learning data generation device 1.
  • As shown in FIG. 1, the learning data generation device 1 includes an analysis unit 11, a tagging unit 12, and an output unit 13.
  • the analysis unit 11 analyzes the structure of the sentence represented by the text data and generates structure data representing the structure of the sentence.
  • text data is data representing sentences, for example, data representing business documents (daily business reports, operation diaries, etc.) stored in a company.
  • the data format in which the text data is saved is not limited to a text file. It may be, for example, a PDF (Portable Document Format) file, an HTML (HyperText Markup Language) file, or a file created by other predetermined document creation software.
  • the analysis unit 11 analyzes the sentence structure by performing morphological analysis and syntactic analysis.
  • Morphological analysis is a process of dividing a sentence into morphemes and determining the part of speech of each morpheme. Parsing is the process of clarifying relationships between morphemes by, for example, schematizing them.
  • the method by which the analysis unit 11 analyzes the sentence structure is not limited to the example described above.
  • the analysis unit 11 may analyze the sentence structure by other methods.
  • Structural data is data that represents the structure of a sentence, for example, data that represents a syntax tree.
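  • As a rough illustration of what such structural data might look like, the sketch below stores each element of a sentence with its part of speech and the index of the element it depends on. The class names, field names, and the hand-written analysis result are assumptions made for illustration; the source does not specify a data layout or a particular analyzer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """One node of the syntax tree (hypothetical layout)."""
    index: int
    surface: str             # the morpheme or phrase as written
    pos: str                 # part of speech from morphological analysis
    head: Optional[int]      # index of the element this one depends on; None for the root
    tags: List[str] = field(default_factory=list)  # filled in later by tagging


@dataclass
class StructuralData:
    """Structure of one sentence: a flat list of elements linked by head indices."""
    elements: List[Element]

    def root(self) -> Element:
        return next(e for e in self.elements if e.head is None)

    def children(self, index: int) -> List[Element]:
        return [e for e in self.elements if e.head == index]


# Hand-written stand-in for the result of morphological and syntactic analysis.
structure = StructuralData([
    Element(0, "cost burden", "noun", 1),
    Element(1, "necessary", "adjective", 2),
    Element(2, "report", "verb", None),
])

print(structure.root().surface)
print([c.surface for c in structure.children(2)])
```

  • A flat list with head indices is only one possible encoding of a syntax tree; a nested structure would serve equally well.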
  • the tagging unit 12 receives a user operation specifying an expression corresponding to the risk contained in the sentence, and tags the element corresponding to the expression in the structure data based on the received user operation.
  • the expression corresponding to the risk is, for example, a phrase such as "cost sharing is required".
  • an expression corresponding to a risk is also referred to as a "risk expression". Which phrases are risk expressions varies depending on the attributes of the user, the situation of the user or the project, and the like.
  • a risk expression is, as an example, a combination of several morphemes.
  • a user operation is an action in which the user designates a risk expression, and includes, for example, operations on input devices such as a mouse, keyboard, touch panel, voice input device, and line-of-sight input device.
  • the tagging unit 12 displays a screen prompting the user to specify a risk expression on a display connected to the input/output interface.
  • the tagging unit 12 tags the elements included in the structure data based on the user's operation.
  • Elements corresponding to risk expressions in structural data are, for example, morphemes included in sentences.
  • the output unit 13 outputs learning data including the text data and the structural data tagged by the tagging unit 12 .
  • the learning data is used, for example, for learning a model that extracts risk expressions from text data.
  • Machine learning techniques for the model are not limited, but by way of example, decision tree-based, linear regression, or neural network techniques may be used, or two or more of these techniques may be used.
  • the output unit 13 may output the learning data by writing it to a storage device, or by transmitting it to another device via a communication interface. The output unit 13 may also output the learning data to an output device connected to the input/output interface.
  • the output device is, for example, a display, printer, projector, or speaker.
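  • One way the output of learning data might be sketched is shown below: the text and its tagged structural data are bundled into a record and written as one JSON object per line. The record schema, the function names, and the "risk" tag label are assumptions for illustration, not a format given in the source.

```python
import io
import json
from typing import Dict, List, TextIO


def make_learning_record(text: str, elements: List[Dict]) -> Dict:
    """Bundle the original text with its tagged structural data (hypothetical schema)."""
    return {"text": text, "structure": elements}


def write_learning_data(records: List[Dict], stream: TextIO) -> None:
    """Write one JSON object per line to any text stream (file, socket wrapper, stdout)."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


record = make_learning_record(
    "Cost sharing is required.",
    [
        {"index": 0, "surface": "cost sharing", "head": 1, "tags": ["risk"]},
        {"index": 1, "surface": "required", "head": None, "tags": ["risk"]},
    ],
)

# An in-memory buffer stands in for the storage device here.
buffer = io.StringIO()
write_learning_data([record], buffer)
print(buffer.getvalue(), end="")
```

  • Writing to a generic text stream keeps the same function usable for a storage device, a network connection, or a display.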
  • as described above, the learning data generation device 1 according to this exemplary embodiment adopts a configuration including: the analysis unit 11 that analyzes the structure of a sentence represented by text data and generates structural data representing the structure of the sentence; the tagging unit 12 that receives a user operation specifying an expression corresponding to a risk included in the sentence and, based on the received user operation, tags the element corresponding to the expression in the structural data; and the output unit 13 that outputs learning data including the text data and the structural data tagged by the tagging unit 12.
  • the learning data generation device 1 does not use the structural data representing the structure of the sentence as learning data as it is, but includes in the learning data the structural data tagged based on the user's operation. Therefore, according to the learning data generation device 1 according to this exemplary embodiment, learning data for more accurately detecting risks of projects and the like can be generated without requiring complicated work such as rule creation.
  • the functions of the learning data generation device 1 described above can also be realized by a program.
  • the learning data generation program according to this exemplary embodiment causes a computer to execute: an analysis process of analyzing the structure of a sentence represented by text data and generating structural data representing the structure of the sentence; a tagging process of receiving a user operation specifying an expression corresponding to a risk included in the sentence and, based on the received user operation, tagging an element corresponding to the expression in the structural data; and an output process of outputting learning data including the text data and the structural data tagged in the tagging process.
  • FIG. 2 is a flowchart showing the flow of the learning data generation method S1.
  • the execution entity of each step in the learning data generation method S1 may be a processor included in the learning data generation device 1 or a processor included in another device, and the steps may be executed by processors provided in different devices.
  • At step S11, at least one processor analyzes the structure of the sentence represented by the text data and generates structural data representing the structure of the sentence.
  • At step S12, at least one processor receives a user operation specifying an expression corresponding to a risk included in the sentence, and tags the element corresponding to the expression in the structural data based on the received user operation.
  • At step S13, at least one processor outputs learning data including the text data and the tagged structural data.
  • as described above, the learning data generation method S1 according to this exemplary embodiment adopts a configuration in which at least one processor analyzes the structure of a sentence represented by text data and generates structural data representing the structure of the sentence, receives a user operation specifying an expression corresponding to a risk included in the sentence, tags the element corresponding to the expression in the structural data based on the received user operation, and outputs learning data including the text data and the tagged structural data. Therefore, according to the learning data generation method S1 according to this exemplary embodiment, learning data for more accurately detecting the risk of a project or the like can be generated without requiring complicated work such as rule creation.
  • FIG. 3 is a block diagram showing the configuration of the risk detection device 2.
  • the risk detection device 2 includes an acquisition unit 21 , an analysis unit 22 and an extraction unit 23 .
  • Acquisition unit 21 acquires text data.
  • the acquisition unit 21 may acquire text data by reading it from a storage device, or by receiving it from another device connected via a communication interface. The acquisition unit 21 may also acquire text data input to an input device connected to the input/output interface.
  • the analysis unit 22 analyzes the structure of the sentence represented by the text data acquired by the acquisition unit 21, and generates structure data representing the structure of the sentence. For example, the analysis unit 22 analyzes the sentence structure by performing morphological analysis and syntactic analysis. However, the method by which the analysis unit 22 analyzes the sentence structure is not limited to the example described above. The analysis unit 22 may analyze the sentence structure by other methods.
  • the extraction unit 23 extracts an expression corresponding to a risk from the text data acquired by the acquisition unit 21, using a model learned by referring to learning data that includes text data and structural data which represents the sentence structure of that text data and in which elements corresponding to expressions corresponding to risks are tagged.
  • the model is a model that extracts risk expressions from text data.
  • Inputs for the model include, by way of example, textual data and structural data.
  • the output of the model includes, by way of example, data indicating risk expressions.
  • the data indicating a risk expression includes, for example, data indicating the element corresponding to the risk expression, or the confidence (score) that an element included in the structural data is a risk expression.
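  • If the model output takes the form of one confidence score per element, the elements of a risk expression could be picked out as in the sketch below. The threshold value, the function name, and the scores are made up for illustration; the source does not state how confidences are turned into extracted expressions.

```python
from typing import List, Tuple


def extract_risk_elements(
    elements: List[str], scores: List[float], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Keep elements whose confidence of belonging to a risk expression meets the threshold."""
    if len(elements) != len(scores):
        raise ValueError("one score is expected per element")
    return [(e, s) for e, s in zip(elements, scores) if s >= threshold]


elements = ["other party", "refuses", "cost burden", "necessary", "report"]
scores = [0.10, 0.35, 0.92, 0.88, 0.05]  # invented model output, for illustration only
print(extract_risk_elements(elements, scores))
```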
  • the model is generated by supervised machine learning using learning data.
  • the training data used for model training includes text data and structural data in which elements corresponding to risk expressions are tagged.
  • the learning data is, for example, learning data generated by the learning data generation device 1 .
  • Machine learning techniques for the model are not limited, but by way of example, decision tree-based, linear regression, or neural network techniques may be used, or two or more of these techniques may be used.
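  • To make the idea of supervised learning from tagged elements concrete, the sketch below trains a small logistic-regression classifier in plain Python. This technique is a stand-in chosen for brevity and is not one the source prescribes; the feature names and the four training examples are invented for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# (feature names of one element, 1 if the element was tagged as risk, else 0)
Example = Tuple[List[str], int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(examples: List[Example], epochs: int = 200, lr: float = 0.5) -> Dict[str, float]:
    """Fit logistic-regression weights over sparse binary features by stochastic gradient descent."""
    weights: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for features, label in examples:
            z = weights["bias"] + sum(weights[f] for f in features)
            error = sigmoid(z) - label   # gradient of the log loss with respect to z
            weights["bias"] -= lr * error
            for f in features:
                weights[f] -= lr * error
    return dict(weights)


def confidence(weights: Dict[str, float], features: List[str]) -> float:
    """Confidence that an element with these features is part of a risk expression."""
    z = weights.get("bias", 0.0) + sum(weights.get(f, 0.0) for f in features)
    return sigmoid(z)


# Toy learning data: features combine an element with its neighbor in the syntax tree.
examples: List[Example] = [
    (["surface=necessary", "child=cost burden"], 1),
    (["surface=cost burden", "head=necessary"], 1),
    (["surface=report", "child=necessary"], 0),
    (["surface=meeting", "head=report"], 0),
]
weights = train(examples)
print(round(confidence(weights, examples[0][0]), 2))
```

  • Features that mention the neighboring element are what let the structural data, rather than the words alone, influence the prediction.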
  • as described above, the risk detection device 2 according to this exemplary embodiment adopts a configuration including: the acquisition unit 21 that acquires text data; the analysis unit 22 that analyzes the structure of a sentence represented by the text data acquired by the acquisition unit 21 and generates structural data representing the structure of the sentence; and the extraction unit 23 that extracts an expression corresponding to a risk from the text data acquired by the acquisition unit 21, using a model learned by referring to learning data that includes text data and structural data which represents the sentence structure of that text data and in which elements corresponding to expressions corresponding to risks are tagged. Therefore, according to the risk detection device 2 according to this exemplary embodiment, the risk of a project or the like can be detected with higher accuracy without requiring complicated work such as rule creation.
  • the functions of the risk detection device 2 described above can also be realized by a program.
  • the risk detection program according to this exemplary embodiment causes a computer to execute: an acquisition process of acquiring text data; an analysis process of analyzing the structure of a sentence represented by the text data acquired in the acquisition process and generating structural data representing the structure of the sentence; and an extraction process of extracting an expression corresponding to a risk from the text data acquired in the acquisition process, using a model learned by referring to learning data that includes text data and structural data which represents the sentence structure of that text data and in which elements corresponding to expressions corresponding to risks are tagged.
  • FIG. 4 is a flow diagram showing the flow of the risk detection method S2.
  • the execution entity of each step in the risk detection method S2 may be a processor provided in the risk detection device 2 or a processor provided in another device, and the steps may be executed by processors provided in different devices.
  • At step S21, at least one processor acquires text data.
  • At step S22, at least one processor analyzes the structure of the sentence represented by the acquired text data and generates structural data representing the structure of the sentence.
  • At step S23, at least one processor extracts an expression corresponding to a risk from the acquired text data, using a model learned by referring to learning data that includes text data and structural data which represents the sentence structure of that text data and in which elements corresponding to expressions corresponding to risks are tagged.
  • as described above, the risk detection method S2 according to this exemplary embodiment adopts a configuration in which at least one processor acquires text data, analyzes the structure of the sentence represented by the acquired text data, generates structural data representing the structure of the sentence, and extracts an expression corresponding to a risk from the acquired text data using a model learned by referring to learning data that includes text data and structural data which represents the sentence structure of that text data and in which elements corresponding to expressions corresponding to risks are tagged. Therefore, according to the risk detection method S2 according to this exemplary embodiment, the risk of a project or the like can be detected with higher accuracy without requiring complicated work such as rule creation.
  • FIG. 5 is a block diagram showing the configuration of the risk detection device 1A according to this exemplary embodiment.
  • the risk detection device 1A has a function of detecting risks of projects and the like from stored documents.
  • the risk detection device 1A is an example of a learning data generation device and a risk detection device according to the present specification.
  • as shown in FIG. 5, the risk detection device 1A includes a control unit 10A, a storage unit 20A, a communication unit 30A, and an input/output unit 40A.
  • the communication unit 30A communicates with a device external to the risk detection device 1A via a communication line.
  • a communication line includes wireless LAN (Local Area Network), wired LAN, WAN (Wide Area Network), public line network, mobile data communication network, or a combination thereof.
  • the communication unit 30A transmits data supplied from the control unit 10A to other devices, and supplies data received from other devices to the control unit 10A.
  • input/output devices such as a keyboard, mouse, display, printer, and touch panel are connected to the input/output unit 40A.
  • the input/output unit 40A receives input of various kinds of information from the connected input device to the risk detection device 1A. Also, the input/output unit 40A outputs various kinds of information to the connected output device under the control of the control unit 10A.
  • an interface such as a USB (Universal Serial Bus) can be used as the input/output unit 40A.
  • the control unit 10A includes a learning phase execution unit 110A and an estimation phase execution unit 120A, as shown in FIG. 5.
  • the learning phase execution unit 110A includes an analysis unit 11, a tagging unit 12, an output unit 13, and a learning unit 14A.
  • the estimation phase execution unit 120A includes an acquisition unit 21, an analysis unit 22, and an extraction unit 23.
  • analysis unit 11 analyzes the structure of the sentence represented by the text data and generates structure data representing the structure of the sentence.
  • the analysis unit 11 performs morphological analysis and syntactic analysis of text data to generate data representing a syntax tree.
  • the tagging unit 12 accepts a user operation specifying a risk expression included in the sentence, and tags the element corresponding to the risk expression in the structural data based on the accepted user operation. As an example, based on the user operation, the tagging unit 12 attaches a tag indicating "intention" and a tag indicating "topic" to the elements of the structural data as tags indicating expressions corresponding to the risks.
  • the tags given by the tagging unit 12 are not limited to the two types of tags, the "intention" tag and the "topic" tag, and may include other types of tags. Also, the number of types of tags that the tagging unit 12 attaches is not limited to two, and may be more or fewer.
  • the output unit 13 outputs learning data TD including text data and structural data tagged by the tagging unit 12 .
  • the output unit 13 outputs the learning data TD by writing it into the storage unit 20A.
  • the learning unit 14A uses the learning data TD to train a model MA that takes as input text data and structural data representing the sentence structure of the text data, and that extracts expressions corresponding to risks from the text data.
  • the acquisition unit 21 acquires text data that is the target of risk detection.
  • the text data is, for example, text data stored in the business document database DB.
  • the text data may be data received from another device connected via the communication unit 30A, or may be data input to an input device connected to the input/output unit 40A.
  • the analysis unit 22 analyzes the structure of the sentence represented by the text data acquired by the acquisition unit 21, and generates structure data representing the structure of the sentence.
  • the analysis processing performed by the analysis unit 22 is the same as the analysis processing performed by the analysis unit 11 .
  • the analysis unit 22 may use a common library with the analysis unit 11 to perform morphological analysis and syntactic analysis.
  • the extraction unit 23 extracts an expression corresponding to risk from the text data acquired by the acquisition unit 21 using the model MA generated by the learning unit 14A.
  • the storage unit 20A stores a business document database DB, as well as learning data TD and a model MA. Storing the model MA means that the parameters defining the model MA are stored in the storage unit 20A.
  • the business document database DB is a database in which business documents are accumulated. Business documents are saved in various file formats such as text files, PDF files, and HTML files. In this exemplary embodiment, at least some of the files stored in the business document database DB are used for training the model MA.
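  • Because business documents may be stored in several file formats, the text has to be recovered before analysis. The sketch below handles text and HTML files using only the Python standard library; the function names and the dispatch on file extension are assumptions for illustration, and PDF is deliberately left to a dedicated reader that is not shown.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import List


class _TextExtractor(HTMLParser):
    """Collects character data, skipping the contents of script and style elements."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def load_document(path: Path) -> str:
    """Return plain text for .txt and .html files; other formats need their own reader."""
    suffix = path.suffix.lower()
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")
    if suffix in (".html", ".htm"):
        return html_to_text(path.read_text(encoding="utf-8"))
    raise ValueError(f"no reader registered for {suffix!r}")


print(html_to_text("<p>Cost sharing is required.</p><script>var x;</script>"))
```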
  • Model MA is a model for extracting risk expressions from text data, and is generated by supervised machine learning.
  • the input of model MA includes textual data and tagged structural data.
  • the output of model MA includes data indicating risk expressions.
  • the data indicating the risk expression includes, for example, data indicating a combination of elements corresponding to the risk expression, or confidence that each element included in the structural data is the risk expression.
  • model MA is a model generated by deep learning.
  • FIG. 6 is a flowchart showing the flow of the learning phase execution method S100A executed by the risk detection device 1A. Note that some of the steps included in the learning phase execution method S100A may be executed in parallel or in a different order. Also, the description of the already described contents will not be repeated.
  • in step S101, the analysis unit 11 acquires text data from the business document database DB, analyzes the structure of the sentence represented by the acquired text data, and generates structural data representing the structure of the sentence.
  • in step S102, the tagging unit 12 receives a user operation specifying an expression corresponding to a risk included in the sentence, and tags the element corresponding to the expression in the structural data based on the received user operation. Specifically, as an example, the tagging unit 12 displays a screen prompting the user to specify a risk expression on a touch panel (not shown) connected to the input/output unit 40A, and performs tagging based on the user's operation on the touch panel.
  • FIG. 7 is a diagram showing a specific example of screen display output by the tagging unit 12.
  • the screen displays text data 201 and a syntax tree 202 that is the result of parsing the text data 201 .
  • a syntax tree 202 is displayed that is the result of parsing the sentence "If the other party refuses, we will report that our company will be required to bear the costs."
  • the screen also displays a pointer 204 for the user to select an element, a button 206 for displaying the next text data, and a button 205 for displaying the previous text data.
  • a pointer 204 moves within the screen based on a user operation.
  • a message prompting a user operation, such as 'Select "intention" and "topic" in this order.', may also be displayed on the screen.
  • FIG. 8 is a diagram showing a specific example of the screen display after the user selects elements to be tagged on the screen of FIG. 7. When the user sequentially selects "necessary" and "cost burden" in the syntax tree 202 of FIG. 7, the tagging unit 12 tags "necessary" with the "intention" tag and "cost burden" with the "topic" tag. Note that the UI screen for the user to specify the risk expression is not limited to the examples of FIGS. 7 and 8, and may be another screen.
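  • The mapping from the order of the user's selections to the "intention" and "topic" tags might be sketched as follows. The function name, the use of element indices, and the error handling are assumptions for illustration; the actual screen logic is not described in the source.

```python
from typing import Dict, Sequence

TAG_ORDER = ("intention", "topic")  # order in which the screen asks for selections


def apply_selection(elements: Sequence[str], selected_indices: Sequence[int]) -> Dict[int, str]:
    """Map each selected element index to a tag, following the selection order."""
    if len(selected_indices) > len(TAG_ORDER):
        raise ValueError("more selections than tag types")
    tags: Dict[int, str] = {}
    for tag, index in zip(TAG_ORDER, selected_indices):
        if not 0 <= index < len(elements):
            raise IndexError(f"no element at index {index}")
        tags[index] = tag
    return tags


elements = ["other party", "refuses", "cost burden", "necessary", "report"]
# The user selects "necessary" first, then "cost burden".
tags = apply_selection(elements, [3, 2])
print({elements[i]: t for i, t in tags.items()})
```

  • An empty selection simply yields no tags, which matches the case where the user moves on without tagging anything.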
  • the tagging unit 12 assigns different tags to multiple elements corresponding to risk expressions in structural data.
  • the tagged structural data can also be said to be data representing relationships (dependency relationships, etc.) between multiple elements corresponding to risk expressions.
  • the tagging unit 12 generates data indicating relationships (dependency relationships, etc.) between multiple elements corresponding to risk expressions.
  • the user designates a plurality of elements corresponding to risk expressions in the text data, and the tagging unit 12 identifies the plurality of elements designated by the user and the relationships between the elements.
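  • How a relationship between two tagged elements could be read off the syntax tree is sketched below, with the tree given as a list of head indices. The representation, the function names, and the example heads are assumptions for illustration, and the input is assumed to be a well-formed tree without cycles.

```python
from typing import List, Optional


def path_to_root(heads: List[Optional[int]], index: int) -> List[int]:
    """Indices from the given element up to the root, following head links."""
    path = [index]
    while heads[path[-1]] is not None:
        path.append(heads[path[-1]])
    return path


def relation(heads: List[Optional[int]], a: int, b: int) -> str:
    """Describe how two tagged elements are related in the syntax tree."""
    if b in path_to_root(heads, a)[1:]:
        return "a depends on b"
    if a in path_to_root(heads, b)[1:]:
        return "b depends on a"
    return "no direct dependency chain"


# Elements: 0 "other party", 1 "refuses", 2 "cost burden", 3 "necessary", 4 "report" (root).
heads = [1, 4, 3, 4, None]
print(relation(heads, 2, 3))  # "cost burden" (topic) versus "necessary" (intention)
```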
  • when the user selects button 206 or button 205 without performing a tagging operation, the tagging unit 12 does not tag the structural data corresponding to the text data.
  • in step S103, the output unit 13 outputs learning data TD including text data and tagged structural data.
  • the output unit 13 stores the learning data TD in the storage unit 20A.
  • the risk detection device 1A executes the processes of steps S101 to S103 for one piece of text data.
  • learning data TD is generated that includes text data and structural data representing the sentence structure of the text data and tagged with risk expression elements.
  • in step S104, the learning unit 14A generates a model MA by supervised machine learning using the learning data TD.
  • the learning unit 14A generates the model MA by deep learning.
  • FIG. 9 is a flowchart showing the flow of the estimation phase execution method S200A executed by the risk detection device 1A. Some of the steps included in the estimation phase execution method S200A may be executed in parallel or in a different order. Also, the description of the already described contents will not be repeated.
  • in step S201, the acquisition unit 21 acquires text data that is the target of risk detection.
  • the text data acquired by the acquisition unit 21 is, for example, text data stored in the business document database DB and includes text data that is not used for learning the model MA.
  • the acquisition unit 21 may also receive text data from another device connected via the communication unit 30A.
  • in step S202, the analysis unit 22 analyzes the structure of the sentence represented by the text data acquired by the acquisition unit 21, and generates structural data representing the structure of the sentence.
  • in step S203, the extraction unit 23 extracts risk expressions from the text data acquired by the acquisition unit 21 using the model MA.
  • the extraction unit 23 extracts the risk expression from the text data based on the output of the model MA obtained by inputting the text data acquired by the acquisition unit 21 and the structural data generated by the analysis unit 22 into the model MA.
  • the extraction unit 23 outputs the extracted risk expression.
  • the extracting unit 23 outputs by writing data indicating the risk expression to the storage unit 20A.
  • the method by which the extraction unit 23 outputs the risk expression is not limited to the example described above, and the extraction unit 23 may output the risk expression by another method.
  • the extraction unit 23 may transmit data representing the risk expression to another device connected via the communication unit 30A, or may output data representing the risk expression to an output device connected to the input/output unit 40A.
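  • The estimation phase as a whole (analyze, run the model on the text and its structural data, then extract) can be sketched with pluggable components as below. The whitespace analyzer and keyword model are toy stand-ins, not real morphological analysis or a trained model MA, and the threshold is an assumption for illustration.

```python
from typing import Callable, List, Sequence

Analyzer = Callable[[str], List[str]]
Model = Callable[[str, Sequence[str]], List[float]]


def detect_risk(text: str, analyze: Analyzer, model: Model, threshold: float = 0.5) -> List[str]:
    """Analyze the text, score each element with the model, and keep confident elements."""
    elements = analyze(text)          # step S202: structural data (here just a token list)
    scores = model(text, elements)    # the model receives the text and the structural data
    return [e for e, s in zip(elements, scores) if s >= threshold]  # step S203


def whitespace_analyzer(text: str) -> List[str]:
    """Toy stand-in for morphological and syntactic analysis."""
    return text.rstrip(".").split()


def keyword_model(text: str, elements: Sequence[str]) -> List[float]:
    """Toy stand-in for a trained model: high confidence for a fixed keyword set."""
    risky = {"cost", "delay"}
    return [0.9 if e.lower() in risky else 0.1 for e in elements]


print(detect_risk("The cost may rise after the delay.", whitespace_analyzer, keyword_model))
```

  • Passing the analyzer and model in as arguments mirrors the point that the learning and estimation phases can share one analysis library while the model is swapped freely.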
  • structural data that represents the structure of a sentence is not used as learning data as it is; instead, structural data tagged based on user operations is included in the learning data.
  • by including, in the learning data, structural data indicating the multiple elements specified by the user and the relationships between those elements, it is possible to generate learning data for more accurately detecting project risks.
  • the tagged structural data reflects the user's intentions, such as which phrases the user considers to be risky. Which phrases are risky depends on the situation of the user or of the project. Therefore, it is possible to generate learning data for risk detection that better reflects the user's intention.
  • the risk detection device 1A according to the present exemplary embodiment employs a configuration including the learning unit 14A for learning the model MA using the learning data TD. Therefore, according to the risk detection device 1A according to the present exemplary embodiment, in addition to the effects of the learning data generation device 1 according to the first exemplary embodiment, complicated work such as rule creation is not required. It is possible to obtain the effect of being able to generate a model MA for detecting the risk of a project or the like with higher accuracy.
  • the tagging unit 12 converts a tag indicating an intention and a tag indicating a topic as a tag indicating a risk expression into elements of the structure data based on the above user operation. A configuration to give is adopted. Since the tagged elements are selected by the user, the structural data with these tags reflects the user's intention of what phrases the user considers to be risky.
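  • As a rough illustration of what tagged structural data might look like, the sketch below attaches intention and topic tags to user-selected elements and bundles the result with the text as one learning sample. The `Element` and `LearningSample` classes, the tag names as strings, and the example sentence with its segmentation are all hypothetical; the source does not specify a data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Tag names follow the description (intention / topic); the string
# representation is an assumption made for this sketch.
ALLOWED_TAGS = {"intention", "topic"}


@dataclass
class Element:
    text: str                  # surface form of the element
    head: int                  # index of the element it depends on (-1 = root)
    tag: Optional[str] = None  # None means the user did not select it


@dataclass
class LearningSample:
    text: str                # original text data
    elements: List[Element]  # tagged structural data


def tag_elements(elements: List[Element], selection: Dict[int, str]) -> List[Element]:
    """Return a tagged copy of the elements; `selection` maps element index to tag."""
    for index, tag in selection.items():
        if not 0 <= index < len(elements):
            raise IndexError(f"no element at index {index}")
        if tag not in ALLOWED_TAGS:
            raise ValueError(f"unknown tag: {tag}")
    return [
        Element(e.text, e.head, selection.get(i, e.tag))
        for i, e in enumerate(elements)
    ]


def build_learning_sample(text: str, elements: List[Element],
                          selection: Dict[int, str]) -> LearningSample:
    """Combine the text with the user-tagged structure into one sample."""
    return LearningSample(text=text, elements=tag_elements(elements, selection))


# Illustrative input: the user marks element 1 as the intention and
# element 2 as the topic. Segmentation and heads are made up.
text = "We want to postpone the release"
elements = [
    Element("We", 1),
    Element("want to postpone", -1),
    Element("the release", 1),
]
sample = build_learning_sample(text, elements, {1: "intention", 2: "topic"})
for element in sample.elements:
    print(element)
```

  • Because the dependency links (`head`) are kept next to the tags, a sample preserves the relationship between the tagged elements as well as the elements themselves.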
  • The learning unit 14A generates the model MA by deep learning. Therefore, the risk detection device 1A according to the present exemplary embodiment provides the effect that a model MA for detecting the risk of a project or the like more accurately can be generated without requiring complicated work such as rule creation.
  • The analysis unit 11 performs morphological analysis and syntactic analysis of the text data. Therefore, the risk detection device 1A according to the present exemplary embodiment provides the effect that learning data for detecting the risk of a project or the like more accurately can be generated without requiring complicated work such as rule creation.
  • The tagged structural data indicates, as an example, relationships (dependency relationships, etc.) between multiple elements corresponding to risk expressions.
  • Accordingly, a model MA with higher detection accuracy can be generated as the model MA for detecting risk expressions.
  • The model MA is a model generated by deep learning. Therefore, the risk detection device 1A according to the present exemplary embodiment provides the effect that the risk of a project or the like can be detected with higher accuracy without requiring complicated work such as rule creation.
  • The analysis unit 22 performs morphological analysis and syntactic analysis of the text data. Therefore, the risk detection device 1A according to the present exemplary embodiment provides the effect that the risk of a project or the like can be detected with higher accuracy without requiring complicated work such as rule creation.
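  • The two-stage analysis (morphological analysis followed by syntactic analysis) can be pictured with the toy pipeline below. Both stages are deliberately naive stand-ins: whitespace splitting replaces a real morphological analyzer, and a simple chain in which each token depends on the next replaces a real dependency parser. All function and class names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List

PUNCTUATION = ".,;:!?"


@dataclass
class StructuralData:
    """Hypothetical output format of the analysis step."""
    tokens: List[str]
    heads: List[int]  # heads[i] is the index token i depends on (-1 = root)


def toy_morphological_analysis(text: str) -> List[str]:
    """Stand-in for morphological analysis: split on whitespace, trim punctuation."""
    stripped = (piece.strip(PUNCTUATION) for piece in text.split())
    return [token for token in stripped if token]


def toy_syntactic_analysis(tokens: List[str]) -> List[int]:
    """Stand-in for dependency parsing: each token depends on the next one,
    and the last token is the root. A real parser would predict these links."""
    count = len(tokens)
    return [i + 1 if i + 1 < count else -1 for i in range(count)]


def analyze(text: str) -> StructuralData:
    """Run both stages and return structural data for the sentence."""
    tokens = toy_morphological_analysis(text)
    return StructuralData(tokens=tokens, heads=toy_syntactic_analysis(tokens))


print(analyze("Delivery may slip."))
```

  • In practice each stage would be replaced by a proper analyzer for the target language; only the shape of the result (elements plus the relationships between them) matters for the later tagging and extraction steps.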
  • The degree of risk (severity) may be selectable by the user.
  • The tagging unit 12 outputs to the touch panel a UI screen on which the user selects the degree of risk.
  • The user operates the touch panel to select the element to be tagged and the degree of risk, and the tagging unit 12 attaches a tag including the degree of risk to the selected element according to the user's operation.
  • Including structural data tagged with the degree of risk in the learning data can further improve the detection accuracy of the model MA.
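  • A tag that carries a degree of risk could be represented as in the sketch below. The three-level scale, the `RiskTag` class, and the `attach_risk_tag` helper are hypothetical; the source says only that the user selects an element and a degree of risk and that the tag includes that degree.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical scale; the source does not say which degrees are offered.
RISK_DEGREES = ("low", "medium", "high")


@dataclass(frozen=True)
class RiskTag:
    kind: str    # e.g. "intention" or "topic"
    degree: str  # one of RISK_DEGREES, as chosen by the user


def attach_risk_tag(tags: List[Optional[RiskTag]], index: int,
                    kind: str, degree: str) -> List[Optional[RiskTag]]:
    """Return a copy of `tags` with a degree-carrying tag at the selected element."""
    if degree not in RISK_DEGREES:
        raise ValueError(f"unknown degree of risk: {degree}")
    if not 0 <= index < len(tags):
        raise IndexError(f"no element at index {index}")
    updated = list(tags)
    updated[index] = RiskTag(kind=kind, degree=degree)
    return updated


# Three untagged elements; the user selects element 1 with degree "high".
untagged: List[Optional[RiskTag]] = [None, None, None]
tagged = attach_risk_tag(untagged, 1, "intention", "high")
print(tagged)
```

  • Returning a new list rather than mutating the input keeps the untagged structural data available, for example if the user changes the selection on the UI screen.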
  • Some or all of the functions of the learning data generation device 1, the risk detection device 2, and the risk detection device 1A may be realized by hardware such as integrated circuits (IC chips), or may be implemented by software.
  • In the latter case, the learning data generation device 1 and the like are implemented, for example, by a computer that executes the instructions of a program, that is, software that implements each function.
  • An example of such a computer (hereinafter referred to as computer C) is shown in FIG.
  • Computer C comprises at least one processor C1 and at least one memory C2.
  • A program P for operating the computer C as the learning data generation device 1 or the like is recorded in the memory C2.
  • The processor C1 reads the program P from the memory C2 and executes it, thereby realizing each function of the learning data generation device 1 and the like.
  • As the processor C1, for example, a CPU (Central Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), MPU (Micro Processing Unit), FPU (Floating point number Processing Unit), PPU (Physics Processing Unit), a microcontroller, or a combination thereof can be used.
  • As the memory C2, for example, a flash memory, HDD (Hard Disk Drive), SSD (Solid State Drive), or a combination thereof can be used.
  • The computer C may further include a RAM (Random Access Memory) into which the program P is loaded during execution and in which various data are temporarily stored.
  • Computer C may further include a communication interface for sending and receiving data to and from other devices.
  • Computer C may further include an input/output interface for connecting input/output devices such as a keyboard, mouse, display, and printer.
  • The program P can be recorded on a non-transitory tangible recording medium M that is readable by the computer C.
  • As the recording medium M, for example, a tape, disk, card, semiconductor memory, programmable logic circuit, or the like can be used.
  • The computer C can acquire the program P via such a recording medium M.
  • The program P can also be transmitted via a transmission medium.
  • As the transmission medium, for example, a communication network or broadcast waves can be used.
  • Computer C can also obtain program P via such a transmission medium.
  • a learning data generation device comprising:
  • The learning data generation device according to appendix 1 or 2, wherein the tagging means attaches, based on the user operation, a tag indicating an intention and a tag indicating a topic to the elements of the structural data as tags indicating an expression corresponding to the risk.
  • (Appendix 5) The learning data generation device according to any one of appendices 1 to 4, wherein the analysis means performs morphological analysis and syntactic analysis of the text data.
  • a risk detection device comprising a
  • The risk detection device according to appendix 6, wherein the model is a model generated by deep learning.
  • (Appendix 8) The risk detection device according to appendix 6 or 7, wherein the analysis means performs morphological analysis and syntactic analysis of the text data.
  • (Appendix 9) A learning data generation method in which at least one processor: analyzes the structure of a sentence represented by text data and generates structural data representing the structure of the sentence; accepts a user operation specifying an expression corresponding to a risk contained in the sentence and tags an element corresponding to the expression in the structural data based on the accepted user operation; and outputs learning data including the text data and the tagged structural data.
  • A risk detection method in which at least one processor: acquires text data; analyzes the structure of a sentence represented by the acquired text data and generates structural data representing the structure of the sentence; and extracts an expression corresponding to a risk from the acquired text data using a model trained by referring to learning data that includes text data and structural data which represents the structure of sentences in that text data and in which elements corresponding to expressions corresponding to risks are tagged.
  • (Appendix 13) A learning data generation system including: analysis means for analyzing the structure of a sentence represented by text data and generating structural data representing the structure of the sentence; tagging means for accepting a user operation specifying an expression corresponding to a risk included in the sentence and tagging an element corresponding to the expression in the structural data based on the accepted user operation; and output means for outputting learning data including the text data and the structural data tagged by the tagging means.
  • (Appendix 14) A risk detection system including: acquisition means for acquiring text data; analysis means for analyzing the structure of a sentence represented by the text data acquired by the acquisition means and generating structural data representing the structure of the sentence; and extraction means for extracting an expression corresponding to a risk from the text data acquired by the acquisition means, using a model trained by referring to learning data that includes text data and structural data which represents the structure of sentences in that text data and in which elements corresponding to expressions corresponding to risks are tagged.
  • A learning data generation device comprising at least one processor, wherein the processor executes: an analysis process of analyzing the structure of a sentence represented by text data and generating structural data representing the structure of the sentence; a tagging process of accepting a user operation specifying an expression corresponding to a risk included in the sentence and tagging an element corresponding to the expression in the structural data based on the accepted user operation; and an output process of outputting learning data including the text data and the structural data tagged in the tagging process.
  • The learning data generation device may further include a memory, and the memory may store a program for causing the processor to execute the analysis process, the tagging process, and the output process. This program may also be recorded on a computer-readable non-transitory tangible recording medium.
  • A risk detection device comprising at least one processor, wherein the processor executes: an acquisition process of acquiring text data; an analysis process of analyzing the structure of a sentence represented by the text data acquired in the acquisition process and generating structural data representing the structure of the sentence; and an extraction process of extracting an expression corresponding to a risk from the text data acquired in the acquisition process, using a model trained by referring to learning data that includes text data and structural data which represents the structure of sentences in that text data and in which elements corresponding to expressions corresponding to risks are tagged.
  • The risk detection device may further include a memory, and the memory may store a program for causing the processor to execute the acquisition process, the analysis process, and the extraction process. This program may also be recorded on a computer-readable non-transitory tangible recording medium.
  • Although the present invention has been described with reference to the above exemplary embodiments, the present invention is not limited to them. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within its scope. At least some of the functions of the learning data generation device 1 and the risk detection devices 1A and 2 described above may be executed by a plurality of different information processing devices installed anywhere on a network and connected to it; that is, they may be implemented by so-called cloud computing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
PCT/JP2022/007860 2022-02-25 2022-02-25 学習用データ生成装置、リスク検知装置、学習用データ生成方法、リスク検知方法、学習用データ生成プログラム及びリスク検知プログラム Ceased WO2023162129A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2024502362A JPWO2023162129A1 2022-02-25 2022-02-25
PCT/JP2022/007860 WO2023162129A1 (ja) 2022-02-25 2022-02-25 学習用データ生成装置、リスク検知装置、学習用データ生成方法、リスク検知方法、学習用データ生成プログラム及びリスク検知プログラム

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/007860 WO2023162129A1 (ja) 2022-02-25 2022-02-25 学習用データ生成装置、リスク検知装置、学習用データ生成方法、リスク検知方法、学習用データ生成プログラム及びリスク検知プログラム

Publications (1)

Publication Number Publication Date
WO2023162129A1 (ja) 2023-08-31

Family

ID=87765092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/007860 Ceased WO2023162129A1 (ja) 2022-02-25 2022-02-25 学習用データ生成装置、リスク検知装置、学習用データ生成方法、リスク検知方法、学習用データ生成プログラム及びリスク検知プログラム

Country Status (2)

Country Link
JP (1) JPWO2023162129A1
WO (1) WO2023162129A1

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016067396A1 (ja) * 2014-10-29 2016-05-06 株式会社日立製作所 文の並び替え方法および計算機
US20190156256A1 (en) * 2017-11-22 2019-05-23 International Business Machines Corporation Generating risk assessment software
US20210026835A1 (en) * 2019-07-22 2021-01-28 Kpmg Llp System and semi-supervised methodology for performing machine driven analysis and determination of integrity due diligence risk associated with third party entities and associated individuals and stakeholders

Also Published As

Publication number Publication date
JPWO2023162129A1 2023-08-31

Similar Documents

Publication Publication Date Title
US11308278B2 (en) Predicting style breaches within textual content
CN113110988B (zh) 利用定义的输入格式来测试应用
US20220414463A1 (en) Automated troubleshooter
CN117707922A (zh) 测试用例的生成方法、装置、终端设备和可读存储介质
US20150033116A1 (en) Systems, Methods, and Media for Generating Structured Documents
US11074595B2 (en) Predicting brand personality using textual content
JP2020126493A (ja) 対訳処理方法および対訳処理プログラム
US12008322B2 (en) Machine learning techniques for semantic processing of structured natural language documents to detect action items
CN107808011A (zh) 信息的分类抽取方法、装置、计算机设备和存储介质
US12032607B2 (en) Context-based recommendation system for feature search
EP3553696A1 (en) Generating a structured document based on a machine readable document and artificial intelligence-generated annotations
KR102280490B1 (ko) 상담 의도 분류용 인공지능 모델을 위한 훈련 데이터를 자동으로 생성하는 훈련 데이터 구축 방법
JP5381704B2 (ja) 情報提供システム
US11176311B1 (en) Enhanced section detection using a combination of object detection with heuristics
US8666987B2 (en) Apparatus and method for processing documents to extract expressions and descriptions
CN110008807A (zh) 一种合同内容识别模型的训练方法、装置及设备
CN119848545A (zh) 面向大模型场景的样本处理方法、装置、设备和存储介质
WO2023162129A1 (ja) 学習用データ生成装置、リスク検知装置、学習用データ生成方法、リスク検知方法、学習用データ生成プログラム及びリスク検知プログラム
KR102072708B1 (ko) 텍스트 콘텐츠의 장르를 추론하는 방법 및 컴퓨터 프로그램
CN113705206B (zh) 情感预测模型的训练方法、装置、设备及存储介质
US20240202643A1 (en) Information processing apparatus, business action extraction method, and storage medium
WO2019225007A1 (ja) 入力ミス検知装置、入力ミス検知方法および入力ミス検知プログラム
US20260087499A1 (en) Logfile recommender service
US20250328733A1 (en) Information processing apparatus, analysis method, and storage medium
CN113343636B (zh) 一种设置标注线宽度的方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22928656

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2024502362

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22928656

Country of ref document: EP

Kind code of ref document: A1