CN114830106A - Method and apparatus for NLP-based diagnostics - Google Patents

Method and apparatus for NLP-based diagnostics

Info

Publication number
CN114830106A
Authority
CN
China
Prior art keywords
text
fault signature
constituent
nlp
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080087457.4A
Other languages
Chinese (zh)
Inventor
生若谷
惠浩添
丹尼尔·施尼盖斯
车效音
王焦剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens Ltd China
Original Assignee
Siemens Ltd China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens Ltd China filed Critical Siemens Ltd China
Publication of CN114830106A publication Critical patent/CN114830106A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A method and an apparatus for NLP-based diagnostics are presented that provide a solution for fault signature tagging with high efficiency and accuracy. A method (300) for NLP-based diagnostics comprises: encoding (S301) a text (20) into a semantic representation (50); decoding (S302) the semantic representation (50) to extract at least one constituent in a predefined hierarchy, wherein the predefined hierarchy comprises at least two constituents belonging to different levels and each constituent indicates an entity related to an event described in the text (20); and outputting (S303) a fault signature (30) comprising the at least one constituent.

Description

Method and apparatus for NLP-based diagnostics
Technical Field
The present invention relates to the art of NLP, and more particularly, to a method, apparatus and computer-readable storage medium for NLP-based diagnostics.
Background
Natural Language Processing (NLP) based diagnostic solutions are widely accepted and used today. Semantically well-defined phrases are referred to as fault signatures. The fault signature tags can be used as an important feature of a diagnostic solution, such as providing clues to which device has caused which fault.
For fault signature, manual and semi-automatic solutions currently exist. For small data sets, a domain expert may manually flag fault signatures in the file. However, manual marking is time consuming and requires a lot of domain knowledge for correct marking.
Another approach to fault signature tagging is based on syntactic pattern matching. For a specific problem, such as high pressure, a model with a limited set of syntactic patterns can be constructed to match descriptions of high pressure. FIG. 1 shows such a grammar pattern model, in which patterns describing high pressure are defined. Each pattern is a combination of fixed words (e.g., high, limit), multi-match symbols (e.g., <pressure>, which matches any form of "pressure"), and wildcards (which match any sequence of symbols). If the input text matches any pattern in the model, the input text is labeled with the fault signature defined in the model.
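The pattern-matching approach described above can be illustrated with a minimal Python sketch. It is not the model of FIG. 1: the `*` wildcard character, the `SYMBOLS` expansion table, the `MODEL` contents, and the function names are all assumptions made for illustration, and each pattern is simply translated into a regular expression.

```python
import re

# Hypothetical expansions for multi-match symbols such as <pressure>.
SYMBOLS = {"pressure": r"pressur\w*"}

# Hypothetical model: fault signature -> list of grammar patterns.
MODEL = {"high_pressure": ["high * <pressure>", "<pressure> * limit"]}


def compile_pattern(pattern):
    """Translate a toy grammar pattern into a compiled regular expression."""
    parts = []
    for token in pattern.split():
        if token == "*":
            # Wildcard: any (possibly empty) sequence of symbols.
            parts.append(r".*?")
        elif token.startswith("<") and token.endswith(">"):
            # Multi-match symbol: any form listed in SYMBOLS.
            parts.append(r"\b" + SYMBOLS[token[1:-1]] + r"\b")
        else:
            # Fixed word: literal match on word boundaries.
            parts.append(r"\b" + re.escape(token) + r"\b")
    return re.compile(r"\s*".join(parts), re.IGNORECASE)


def label_text(text, model=MODEL):
    """Return the first fault signature whose pattern matches, else None."""
    for signature, patterns in model.items():
        if any(compile_pattern(p).search(text) for p in patterns):
            return signature
    return None
```

Every new fault type needs its own hand-written entry in `MODEL`, which is the per-model effort that the background section identifies as the remaining cost of this approach.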
The grammar pattern model shown in FIG. 1 is capable of labeling a large number of files and requires less domain knowledge when using the model. However, creating grammar pattern models one by one is still time consuming.
Disclosure of Invention
Embodiments of the present disclosure include methods and apparatus for NLP-based diagnostics that can provide a solution for fault signature tagging with high efficiency and accuracy. In technical areas such as industrial fault diagnosis, it is also important to track which part of which device caused the fault. Therefore, embodiments of the present disclosure also provide a solution that uses fault traceback information, such as the device and the part that caused the fault, when producing the fault signature.
According to a first aspect of the present disclosure, a method for NLP-based diagnostics is presented, the method comprising the steps of:
-encoding text into a semantic representation;
-decoding the semantic representation to extract at least one constituent part in a predefined hierarchy, wherein the predefined hierarchy comprises at least two constituent parts belonging to different levels and each constituent part indicates an entity related to an event described in the text;
-outputting a fault signature comprising the at least one constituent part.
According to a second aspect of the present disclosure, a device for NLP-based diagnostics is presented, the device comprising:
-an encoder configured to encode text into a semantic representation;
-a decoder configured to:
-decoding the semantic representation to extract at least one constituent part in a predefined hierarchy, wherein the predefined hierarchy contains at least two constituent parts belonging to different levels and each constituent part indicates an entity related to an event described in the text; and
-outputting a fault signature comprising said at least one constituent.
According to a third aspect of the present disclosure, a device for NLP-based diagnostics is presented, the device comprising:
-at least one processor;
-at least one memory, coupled to the at least one processor, configured to perform a method according to the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, a computer-readable medium is presented storing executable instructions that, when executed by a processor, enable the processor to perform a method according to the first aspect of the present disclosure.
In the present disclosure, with a predefined hierarchy of constituent parts, important information indicating entities related to events described in the text can be acquired and recorded in fault signatures, which can then serve as good features in other NLP tasks. Also, with the encoder-decoder architecture, an automatic tagging solution is provided, and the manpower needed for manual tagging or grammar model creation can be greatly reduced. Since the solution is data driven, less domain knowledge is required, and the effort of domain experts in model development can also be reduced.
In an embodiment of the present disclosure, a weight is calculated for each word in the semantic representation, and the fault signature is output based on the calculated weights. In the weight calculation, the predefined hierarchy may help find omitted device or part names, and recovering this missing information may improve the weight calculation and thus the labeling results. Further, fault signatures may be tagged with greater accuracy for long sentences, because relationships between words in a long sentence can be captured by the weights, with a higher weight indicating a stronger relationship.
In embodiments of the present disclosure, the text is from the industrial field, and the predefined hierarchy may include devices, parts, and components, where a device belongs to the highest level, a part belongs to the intermediate level, and a component belongs to the lowest level.
In an embodiment of the disclosure, the fault signature further comprises a type of event described in the text.
Drawings
The above-mentioned attributes and other features and advantages of the present technology, as well as the manner of attaining them, will become more apparent and the present technology itself will be better understood by reference to the following description of embodiments of the present technology taken in conjunction with the accompanying drawings, wherein:
FIG. 1 depicts a block diagram of a grammar pattern model for fault signature.
Fig. 2 depicts a block diagram of an apparatus according to one embodiment of the present disclosure.
Fig. 3 depicts a block diagram of an apparatus according to one embodiment of the present disclosure.
Fig. 4 depicts a flow diagram of a method according to one embodiment of the present disclosure.
Reference numerals:
10, device for NLP-based diagnosis
101, at least one memory
102, at least one processor
103, I/O port
104, monitor
105, encoder
106, decoder
107, weight calculator
20, text
30, fault signature
40, fault signature program
50, semantic representation
60, weight
300, method for NLP-based diagnosis
S301, encoding
S301', weight calculation
S302, decoding
S303, outputting the fault signature
Detailed Description
The above-described and other features of the present technology are described in detail below. Various embodiments are described with reference to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It should be noted that the illustrated embodiments are intended to illustrate, but not to limit the invention. It may be evident that such embodiment(s) may be practiced without these specific details.
When introducing elements of various embodiments of the present disclosure, the articles "a" and "the" mean that there are one or more of the elements. The terms "comprising," "including," and "having" are intended to be inclusive and mean that there may be additional elements other than the listed elements.
Now, the present disclosure will be described in detail below with reference to fig. 2 to 5.
Fig. 2 depicts a block diagram of an apparatus according to one embodiment of the present disclosure. The apparatus 10 for NLP-based diagnostics presented in this disclosure may be implemented as a network of computer processors to perform the methods for NLP-based diagnostics presented in this disclosure. The apparatus 10 may also be a single computer, as shown in FIG. 2, including at least one memory 101, including a computer-readable medium, such as Random Access Memory (RAM). The apparatus 10 also includes at least one processor 102 coupled with the at least one memory 101. Computer-executable instructions are stored in the at least one memory 101 and, when executed by the at least one processor 102, may cause the at least one processor 102 to perform the steps described herein. The at least one processor 102 may include a microprocessor, an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a state machine, and so forth. Embodiments of a computer-readable medium include, but are not limited to, a floppy disk, a CD-ROM, a magnetic disk, a memory chip, a ROM, a RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions. Also, various other forms of computer-readable media may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless. The instructions may include code from any computer programming language, including, for example, C, C++, C#, Visual Basic, Java, and JavaScript.
The at least one memory 101 shown in fig. 2 may contain a fault signature program 40 that, when executed by the at least one processor 102, causes the at least one processor 102 to perform the methods presented in this disclosure for NLP-based diagnostics.
Referring to fig. 2, text 20 may be processed by device 10 to have a marked fault signature 30. The text 20 may contain descriptions about faults in a particular field, such as an industrial field, including but not limited to: chemical industry, petrochemical and refinery, pulp and paper manufacturing, boiler control and power plant systems, power generation, nuclear power generation, water management, water treatment, food/beverage production, fertilizer production, metal/metal alloy manufacturing, metallurgical processing, automotive manufacturing, medical manufacturing, food refining, product manufacturing and processing, and the like. The text 20 may be pre-stored in the at least one memory 101 or input via the I/O port 103 of the device 10. The processed fault signature 30 may also be stored in the at least one memory 101 and/or output via the I/O port 103 and may even be displayed on a monitor 104 of the device 10.
In some fields, such as the industrial field, texts have their own characteristics. Taking the industrial field as an example, in industrial texts such as diagnostic reports, it is crucial to know not only what fault occurred but also which part of which device caused it, and sometimes even exactly which component of a part caused it. Therefore, special attention needs to be paid to the names of devices, parts, and even components in the text. In addition, the names of devices, parts, and components have a hierarchical structure. However, in currently implemented NLP-based diagnostic solutions, the names of devices, parts, and components, together with their hierarchy, are ignored during fault signature tagging.
An example text might be "induced fan trips unexpectedly during run time, motor bearing temperature is high and alarm is issued". In currently implemented solutions, the detected fault signatures would be "induced fan trip", "motor temperature high", and "bearing temperature high", which neglects the hierarchical structure. In the solution provided in the present disclosure, hierarchical structure information is defined and applied in the fault signature, which for the example above would be "induced fan trip" and "induced fan motor bearing temperature high", where the induced fan, motor, and bearing form a hierarchical structure.
Referring now to figs. 2 and 3, in one embodiment, the fault signature program 40 may include:
an encoder 105 configured to encode the text 20 into a semantic representation 50;
a decoder 106 configured to:
-decoding said semantic representation 50 to extract at least one constituent in a predefined hierarchy, wherein said predefined hierarchy contains at least two constituents belonging to different levels and each constituent indicates an entity related to an event described in said text 20; and
outputting a fault signature 30 comprising said at least one constituent.
In some fields, such as industry, a tagged fault signature is commonly expected to be bound to an entity, such as a device. In actual text, however, the part or component name may be used instead of the device name. Here is an example:
Device A stops during operation. The operator has performed a thorough inspection and found that Component A is leaking.
The expected fault signature here may be "Component A of Device A leaking". However, since Device A is omitted from the sentence that describes the leak, current solutions may tag the fault signature only as "Component A leakage", so the connection between Component A and Device A is lost.
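To make concrete what information is lost in the example above, the following Python sketch recovers the omitted ancestors of a component from an explicit parent table. This is only an illustration of the hierarchy itself and not the learned, data-driven approach of the present disclosure; the `PARENT` table, the entity "Part 1" as the intermediate level, and the function name `full_path` are assumptions.

```python
# Hypothetical hierarchy: child name -> parent name (None marks a device).
PARENT = {
    "Device A": None,
    "Part 1": "Device A",
    "Component A": "Part 1",
}


def full_path(entity, parent=PARENT):
    """Return the chain from the device down to the given entity."""
    path = []
    current = entity
    while current is not None:
        path.append(current)
        current = parent[current]
    # The chain was collected bottom-up, so reverse it.
    return path[::-1]
```

With such a table, a mention of "Component A" alone resolves to the chain Device A, Part 1, Component A, which is the binding that a flat tag such as "Component A leakage" lacks.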
With the solutions provided in the present disclosure, the hierarchy of an entity may be predefined, such as "device, part, and component," where the device belongs to the highest level, the part belongs to the intermediate level, and the component belongs to the lowest level. This hierarchy should be predefined prior to NLP modeling. The fault signature may have the following format:
Device name_Part name_Component name_Type of event.
Once the hierarchy is defined, the fault signature problem is converted into a sequence-to-sequence problem. The input sequence is the original text 20, which can be processed into a word sequence {20_1, 20_2, …, 20_n}. The output sequence consists of the key elements of the fault signature 30, namely at least one constituent part in the predefined hierarchy and optionally an event type, shown in fig. 3 as 30_1, 30_2, …, 30_n. Taking the previous industrial text as an example, the output sequence may have the following format: {device name, part name, component name, event type}.
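The signature format and the output sequence described above can be represented, for illustration, by a small Python data structure. The class name `FaultSignature`, its field names, and the choice to skip missing levels when joining are assumptions; the disclosure only specifies the underscore-separated ordering of device, part, component, and event type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaultSignature:
    """One fault signature: hierarchy constituents plus an optional event type."""
    device: Optional[str] = None
    part: Optional[str] = None
    component: Optional[str] = None
    event_type: Optional[str] = None

    def __post_init__(self):
        # The signature must contain at least one constituent of the hierarchy.
        if not any((self.device, self.part, self.component)):
            raise ValueError("at least one constituent is required")

    def as_sequence(self):
        """Output sequence, ordered from highest level to event type."""
        fields = (self.device, self.part, self.component, self.event_type)
        return [f for f in fields if f]

    def as_string(self):
        """Underscore-joined form of the output sequence."""
        return "_".join(self.as_sequence())
```

For example, `FaultSignature("Device A", "Part 1", "Component A", "leakage").as_string()` yields `Device A_Part 1_Component A_leakage`.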
Here, the encoder 105 is configured to convert the input sequence, i.e. the text 20 into the semantic representation 50. Next, the decoder 106 is configured to extract at least one constituent part, e.g. device a, part 1 and component a, in the predefined hierarchy.
In the encoder-decoder framework shown in fig. 3, different models may be selected for the encoder 105 and the decoder 106. To capture internal relationships within the word sequence, a recurrent neural network (RNN) or a long short-term memory (LSTM) network may be used for the encoder 105. For the decoder 106, an RNN or LSTM may also be used to generate the fault signature 30, including the sequence of constituent parts described above and optionally the event type.
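As a rough illustration of how a recurrent encoder carries information along a word sequence, the following plain-Python sketch runs a single-layer Elman RNN, h_t = tanh(W_in x_t + W_rec h_{t-1} + b), over a list of input vectors. It is a toy under stated assumptions: the function name `rnn_encode` and its weight layout are hypothetical, and a practical encoder 105 would use trained weights, word embeddings, and typically an LSTM from a deep-learning library.

```python
import math


def rnn_encode(inputs, w_in, w_rec, bias):
    """Run a single-layer Elman RNN and return every hidden state.

    inputs: list of input vectors, one per word.
    w_in:   hidden_size x input_size weight matrix.
    w_rec:  hidden_size x hidden_size recurrent weight matrix.
    bias:   vector of length hidden_size.
    """
    hidden = [0.0] * len(bias)
    states = []
    for x in inputs:
        # Each new state mixes the current input with the previous state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
        states.append(hidden)
    return states
```

Because each state depends on the previous one, the same words in a different order produce a different final state, which is the property that lets a recurrent encoder capture relationships within the word sequence.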
Referring now to FIG. 4, the fault signature program 40 may also include a weight calculator 107 configured to calculate a weight for each word in the semantic representation 50, and the decoder 106 may be further configured to output the fault signature 30 including the extracted constituents according to the calculated weights.
Optionally, the weight calculator 107 may be implemented by an attention model, which uses an attention mechanism to achieve better labeling results. In general, different words in the text 20 contribute differently to determining the final fault signature 30. If one or more words in the text 20 represent a device, part, or component name, those words should receive more attention. Likewise, words that are strongly related to a device, part, or component name and/or to the event type should be given higher weights. When generating the output sequence of the fault signature 30, the attention model may assign different weights 60 to the words in the text 20, and it may also learn which words are more important.
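One common way to obtain such per-word weights is dot-product attention followed by a softmax. The sketch below assumes that form, which the disclosure does not specify; the function names and the use of plain Python lists are likewise illustrative.

```python
import math


def attention_weights(query, keys):
    """Softmax over dot-product scores between a query and each key vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    # Subtract the maximum score for numerical stability.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(weights, values):
    """Weighted sum of the value vectors, one weight per word."""
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
```

In an encoder-decoder setting, the query would typically be the current decoder state and the keys and values the encoder states of the words in the text 20, so that words most relevant to the element being generated receive the largest weights 60.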
Although the encoder 105, decoder 106, and weight calculator 107 are described above as software modules of the fault signature program 40, they may also be implemented in hardware, e.g., as ASIC chips. They may be integrated into one chip or implemented separately and electrically connected.
It should be mentioned that the present disclosure may include devices having architectures that are different from those shown in fig. 2 to 4. The above architecture is merely exemplary and is used to explain the exemplary method 300 shown in fig. 5.
Various methods according to the present disclosure may be carried out. An exemplary method 300 according to the present disclosure comprises the steps of:
S301: encoding the text 20 into a semantic representation 50;
S302: decoding the semantic representation 50 to extract at least one constituent in a predefined hierarchy, wherein the predefined hierarchy comprises at least two constituents belonging to different levels and each constituent indicates an entity related to an event described in the text 20;
S303: outputting a fault signature 30 comprising the at least one constituent.
Optionally, the method 300 further comprises step S301': calculating a weight for each word in the semantic representation 50. In that case, step S303 further includes outputting the fault signature 30 containing the extracted constituents according to the calculated weights.
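The control flow of steps S301, S301', S302, and S303 can be sketched in Python as follows. Every function body here is a deliberately trivial stand-in for a trained model: the `LEVELS` vocabulary, the whitespace tokenizer, the 0/1 weights, and the function names are assumptions for illustration, not the encoder, weight calculator, or decoder of the disclosure.

```python
# Hypothetical vocabulary mapping entity words to hierarchy levels
# (0 = device, 1 = part, 2 = component); not taken from the disclosure.
LEVELS = {"fan": 0, "motor": 1, "bearing": 2}


def encode(text):
    """S301 stand-in: turn the text into a list of lower-case tokens."""
    return text.lower().split()


def calculate_weights(representation):
    """S301' stand-in: weight 1.0 for known entity words, 0.0 otherwise."""
    return [1.0 if token in LEVELS else 0.0 for token in representation]


def decode(representation, weights):
    """S302 stand-in: keep weighted tokens, ordered from device to component."""
    kept = [tok for tok, w in zip(representation, weights) if w > 0.0]
    return sorted(set(kept), key=LEVELS.get)


def diagnose(text):
    """Run the steps in order and output the signature string (S303)."""
    representation = encode(text)
    weights = calculate_weights(representation)
    constituents = decode(representation, weights)
    return "_".join(constituents)
```

The point of the sketch is the ordering of the steps and the data passed between them; replacing each stand-in with a learned component leaves `diagnose` unchanged.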
Optionally, the predefined hierarchy includes devices, parts, and components, where a device belongs to the highest level, a part belongs to the intermediate level, and a component belongs to the lowest level.
Optionally, the fault signature 30 further includes the type of event described in the text 20. The fault signature 30 may have the following format:
Device name_Part name_Component name_Type of the event.
Also provided in the present disclosure is a computer-readable medium storing computer-executable instructions that, when executed by a computer, enable the computer to perform any of the methods presented in the present disclosure.
Also provided is a computer program that, when executed by at least one processor, performs any of the methods presented in this disclosure.
Compared to manual tagging and to semi-automatic fault signature methods based on grammar pattern matching, the solution provided by the present disclosure has the following advantages:
The effort of manual tagging and grammar model creation can be saved. This saves time for both the customer and the model developer.
The solution provided in this disclosure is data driven and does not require much domain knowledge. This reduces the effort of the domain expert in model development.
With the solutions provided in the present disclosure, dependencies among different constituent parts, e.g., devices, parts, and components, can be captured. The fault signature generated using the hierarchy information can serve as a good feature in other NLP tasks.
Furthermore, with the solution provided in the present disclosure, fault signatures can be marked for long sentences with higher accuracy.
Although the present technology has been described in detail with reference to certain embodiments, it should be understood that the present technology is not limited to those precise embodiments. Indeed, in view of the present disclosure which describes exemplary modes for practicing the invention, many modifications and variations could be made by those skilled in the art without departing from the scope and spirit of the invention. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes, modifications and variations that fall within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (12)

1. A method (300) for NLP-based diagnosis, comprising:
-encoding (S301) a text (20) into a semantic representation (50);
-decoding (S302) the semantic representation (50) to extract at least one constituent in a predefined hierarchy, wherein the predefined hierarchy comprises at least two constituents belonging to different levels and each constituent indicates an entity related to an event described in the text (20);
-outputting (S303) a fault signature (30) comprising the at least one constituent part.
2. The method (300) of claim 1,
-the method (300) further comprises: calculating (S301') a weight for each word in the semantic representation (50);
-outputting (S303) the fault signature (30) comprising the extracted constituent parts comprises: outputting (S303) the fault signature (30) comprising the extracted constituents according to the calculated weights.
3. The method (300) of claim 1, wherein the predefined hierarchy comprises: a device, a part, and a component, wherein the device belongs to a highest level, the part belongs to an intermediate level, and the component belongs to a lowest level.
4. The method (300) of claim 1, wherein the fault signature (30) further comprises a type of the event described in the text (20).
5. The method (300) of claim 4, wherein the fault signature (30) has the following format:
Device name_Part name_Component name_Type of the event.
6. A device (10) for NLP-based diagnostics, comprising:
-an encoder (105) configured to encode a text (20) into a semantic representation (50);
-a decoder (106) configured to:
-decoding the semantic representation (50) to extract at least one constituent part in a predefined hierarchy, wherein the predefined hierarchy comprises at least two constituent parts belonging to different levels and each constituent part indicates an entity related to an event described in the text (20); and
-outputting a fault signature (30) comprising the at least one constituent part.
7. The device (10) of claim 6,
-the device (10) further comprising: a weight calculator (107) configured to calculate a weight for each word in the semantic representation (50);
-the decoder (106) is further configured to output the fault signature (30) comprising the extracted constituent parts according to the calculated weights.
8. The device (10) of claim 6, wherein the predefined hierarchy comprises: a device, a part, and a component, wherein the device belongs to a highest level, the part belongs to an intermediate level, and the component belongs to a lowest level.
9. The device (10) of claim 6, wherein the fault signature (30) further comprises a type of the event described in the text (20).
10. The device (10) according to claim 9, wherein the fault signature (30) has the following format:
Device name_Part name_Component name_Type of the event.
11. An apparatus (10) for NLP-based diagnostics, comprising:
-at least one processor (102);
-at least one memory (101), coupled to the at least one processor (102), configured to perform the method according to any one of claims 1 to 5.
12. A computer-readable medium storing computer-executable instructions for NLP-based diagnostics, wherein the computer-executable instructions, when executed, cause at least one processor to perform the method of any one of claims 1 to 5.
CN202080087457.4A 2020-01-14 2020-01-14 Method and apparatus for NLP-based diagnostics Pending CN114830106A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/072075 WO2021142630A1 (en) 2020-01-14 2020-01-14 Method and apparatus for nlp based diagnostics

Publications (1)

Publication Number Publication Date
CN114830106A (en) 2022-07-29

Family

ID=76863376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080087457.4A Pending CN114830106A (en) 2020-01-14 2020-01-14 Method and apparatus for NLP-based diagnostics

Country Status (2)

Country Link
CN (1) CN114830106A (en)
WO (1) WO2021142630A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10261995B1 (en) * 2016-09-28 2019-04-16 Amazon Technologies, Inc. Semantic and natural language processing for content categorization and routing
CN108447534A (en) * 2018-05-18 2018-08-24 灵玖中科软件(北京)有限公司 A kind of electronic health record data quality management method based on NLP
CN109446326B (en) * 2018-11-01 2021-04-20 大连理工大学 Biomedical event combined extraction method based on replication mechanism
CN109189946B (en) * 2018-11-06 2021-11-26 湖南云智迅联科技发展有限公司 Method for converting equipment fault statement description into knowledge graph expression
CN109740053B (en) * 2018-12-26 2021-03-05 广州灵聚信息科技有限公司 Sensitive word shielding method and device based on NLP technology

Also Published As

Publication number Publication date
WO2021142630A1 (en) 2021-07-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination