WO2021142630A1 - Method and apparatus for NLP based diagnostics

Publication number
WO2021142630A1
Authority
WO
WIPO (PCT)
Prior art keywords
symptom
constituent
text
hierarchical structure
component
Prior art date
Application number
PCT/CN2020/072075
Other languages
French (fr)
Inventor
Ruo Gu SHENG
Hao Tian HUI
Daniel Schneegass
Xiao Yin CHE
Jiao Jian WANG
Original Assignee
Siemens Ltd., China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens Ltd., China
Priority to PCT/CN2020/072075 (WO2021142630A1)
Priority to CN202080087457.4A (CN114830106A)
Publication of WO2021142630A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri

Definitions

  • A computer program is also provided which, when executed by at least one processor, performs any of the methods presented in this disclosure.
  • Correlation among different constituents can be captured.
  • The symptom generated with the hierarchical structure information can be used as a good feature in other NLP tasks.
  • Symptoms can be labeled for long sentences with higher accuracy.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A method (300) and an apparatus for NLP based diagnostics are proposed, to provide symptom labeling with high efficiency and accuracy. The method (300) for NLP based diagnostics includes: encoding (S301) a text (20) into a semantic presentation (50); decoding (S302) the semantic presentation (50) to extract at least one constituent in a pre-defined hierarchical structure, wherein the pre-defined hierarchical structure comprises at least two constituents belonging to different hierarchies, and each constituent indicates an entity related to an event described in the text (20); and outputting (S303) a symptom (30) comprising the at least one constituent.

Description

Method and apparatus for NLP based diagnostics
Technical Field
The present invention relates to techniques of NLP, and more particularly to a method, apparatus and computer-readable storage medium for NLP based diagnostics.
Background Art
Natural Language Processing (NLP) based diagnostic solutions are widely accepted and used nowadays. A phrase with clear semantic meaning is called a symptom. Symptom labels could be used as important features for diagnostic solutions, for example, giving clues to what kind of failure happens to which device.
For symptom labeling, currently there are manual and semi-automatic solutions. For a small data set, domain experts can manually label symptoms in documents. However, manual labeling is time consuming, and it requires a lot of domain knowledge for correct labeling.
Another method for symptom labeling is based on grammar pattern matching. For a specific issue, e.g. high pressure, a model with a finite set of grammar patterns can be built to match descriptions of high pressure. FIG. 1 shows what a grammar pattern model looks like. In a grammar pattern model, patterns describing high pressure are set up. These patterns are defined by a combination of fixed words (like high, limit), multiple-matching symbols (like <pressure>, which matches any form of pressure) and wildcards (like *, which matches any sequence of symbols). If input text matches any pattern in the model, it is labeled with the symptom defined in the model.
The grammar pattern model shown in FIG. 1 is capable of labeling a lot of documents, and less domain knowledge is required when using the model. However, it is still time consuming to create grammar pattern models one by one.
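The grammar pattern matching described above can be illustrated with a minimal sketch. The contents of FIG. 1 are not reproduced in this text, so the patterns, the lexicon `SYMBOLS` and the helper names `compile_pattern` and `label_text` are all hypothetical; the sketch only shows how fixed words, multiple-matching symbols and wildcards might be combined.

```python
import re

# Hypothetical lexicon: each <symbol> expands to the surface forms it may match.
SYMBOLS = {"pressure": ["pressure", "pressures", "press"]}

def compile_pattern(pattern: str) -> "re.Pattern":
    """Translate a space-separated grammar pattern into a regular expression."""
    regex = ""
    prev_wild = True  # no whitespace separator is needed before the first token
    for token in pattern.split():
        if token == "*":
            regex += r".*?"  # wildcard: any sequence of symbols
            prev_wild = True
            continue
        if token.startswith("<") and token.endswith(">"):
            forms = SYMBOLS[token[1:-1]]  # multiple-matching symbol
            piece = "(?:" + "|".join(re.escape(f) for f in forms) + ")"
        else:
            piece = re.escape(token)  # fixed word
        if not prev_wild:
            regex += r"\s+"  # adjacent words are separated by whitespace
        regex += r"\b" + piece + r"\b"
        prev_wild = False
    return re.compile(regex, re.IGNORECASE)

# Hypothetical model: one symptom, defined by a finite list of patterns.
MODEL = {
    "pressure high": [compile_pattern(p)
                      for p in ("<pressure> * high", "high <pressure>")],
}

def label_text(text: str) -> list:
    """Return every symptom for which at least one pattern matches the text."""
    return [symptom for symptom, patterns in MODEL.items()
            if any(p.search(text) for p in patterns)]
```

Each new symptom needs its own hand-written pattern list in `MODEL`, which is the manual effort the background section points out.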
Summary of the Invention
Embodiments of the present disclosure include methods and apparatuses for NLP based diagnostics, which can provide symptom labeling with high efficiency and accuracy. In technical domains such as industrial failure diagnostics, it is also important to trace which part of which device causes a failure. Embodiments of the present disclosure therefore also provide symptom labeling with failure trace-back information, such as the device and part causing the failure.
According to a first aspect of the present disclosure, a method for NLP based diagnostics is presented. It includes the following steps:
-encoding a text into a semantic presentation;
-decoding the semantic presentation, to extract at least one constituent in a pre-defined hierarchical structure, wherein the pre-defined hierarchical structure comprises at least two constituents belonging to different hierarchies, and each constituent indicates an entity related to an event described in the text;
-outputting a symptom comprising the at least one constituent.
According to a second aspect of the present disclosure, an apparatus for NLP based diagnostics is presented. It includes:
-an encoder, configured to encode a text into a semantic presentation;
-a decoder, configured to:
-decode the semantic presentation, to extract at least one constituent in a pre-defined hierarchical structure, wherein the pre-defined hierarchical structure includes at least two constituents belonging to different hierarchies, and each constituent indicates an entity related to an event described in the text; and
-output a symptom including the at least one constituent.
According to a third aspect of the present disclosure, an apparatus for NLP based diagnostics is presented. It includes:
-at least one processor;
-at least one memory, coupled to the at least one processor and storing instructions which, when executed by the at least one processor, cause the at least one processor to execute the method according to the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, a computer-readable medium is presented. It stores executable instructions which, upon execution by a processor, enable the processor to execute the method according to the first aspect of the present disclosure.
In the present disclosure, with a pre-defined hierarchical structure of constituents, important information indicating the entity related to an event described in a text can be acquired and recorded in the symptom, which can then serve as a good feature in other NLP tasks. In addition, the encoder-decoder structure provides an auto-labeling solution, so the human effort spent on manual labeling or grammar model creation can be sharply reduced. Because the solutions provided are data driven, little domain knowledge is required, and the effort of domain experts in model development can also be reduced.
In an embodiment of the present disclosure, a weight for each word in the semantic presentation is calculated, and the symptom is output according to the calculated weights. In weight calculation, the pre-defined hierarchical structure can help in finding omitted device/part names, and this recovered information can refine the weight calculation to give a better calculation result and a better labeling result. Furthermore, symptoms can be labeled for long sentences with higher accuracy, since the weights can capture relationships between words in long sentences, with higher weights indicating such relationships.
In an embodiment of the present disclosure, the text is from the industrial domain, and the pre-defined hierarchical structure can include: device, part and component, wherein device belongs to the highest hierarchy, part belongs to an intermediate hierarchy and component belongs to the lowest hierarchy.
In an embodiment of the present disclosure, the symptom further includes the type of the event described in the text.
Brief Description of the Drawings
The above mentioned attributes and other features and advantages of the present technique and the manner of attaining them will become more apparent and the present technique itself will be better understood by reference to the following description of embodiments of the present technique taken in conjunction with the accompanying drawings, wherein:
FIG. 1 depicts a block diagram of a grammar pattern model for symptom labeling.
FIG. 2 depicts a block diagram of an apparatus in accordance with one embodiment of the present disclosure.
FIG. 3 depicts a block diagram of an apparatus in accordance with one embodiment of the present disclosure.
FIG. 4 depicts a flow diagram of a method in accordance with one embodiment of the present disclosure.
Reference Numbers:
10, apparatus for NLP based diagnostics
101, at least one memory
102, at least one processor
103, I/O ports
104, monitor
105, encoder
106, decoder
107, weight calculator
20, text
30, symptom
40, symptom labeling program
50, semantic presentation
60, weight
300, method for NLP based diagnostics
S301, encoding
S301’, weight calculating
S302, decoding
S303, outputting symptom
Detailed Description of Example Embodiments
Hereinafter, above-mentioned and other features of the present technique are described in detail. Various embodiments are described with reference to the drawing, where like reference numerals are used to refer to like elements throughout. In the following description, for purpose of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be noted that the illustrated embodiments are intended to explain, and not to limit the invention. It may be evident that such embodiments may be practiced without these specific details.
When introducing elements of various embodiments of the present disclosure, the articles “a”, “an”, “the” and “said” are intended to mean that there are one or more of the elements. The terms “comprising”, “including” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements.
Now the present disclosure will be described hereinafter in detail by referring to FIG. 2 to FIG. 5.
FIG. 2 depicts a block diagram of an apparatus in accordance with one embodiment of the present disclosure. The apparatus 10 for NLP based diagnostics presented in the present disclosure can be implemented as a network of computer processors, to execute the method for NLP based diagnostics presented in the present disclosure. The apparatus 10 can also be a single computer, as shown in FIG. 2, including at least one memory 101, which includes a computer-readable medium, such as a random access memory (RAM). The apparatus 10 also includes at least one processor 102, coupled with the at least one memory 101. Computer-executable instructions are stored in the at least one memory 101 and, when executed by the at least one processor 102, can cause the at least one processor 102 to perform the steps described herein. The at least one processor 102 may include a microprocessor, an application specific integrated circuit (ASIC), a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), state machines, etc. Embodiments of computer-readable media include, but are not limited to, a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions. Also, various other forms of computer-readable media may transmit or carry instructions to a computer, including a router, private or public network, or other transmission device or channel, both wired and wireless. The instructions may include code from any computer-programming language, including, for example, C, C++, C#, Visual Basic, Java, and JavaScript.
The at least one memory 101 shown in FIG. 2 can contain a symptom labeling program 40 which, when executed by the at least one processor 102, causes the at least one processor 102 to execute the method for NLP based diagnostics presented in the present disclosure.
Referring to FIG. 2, text 20 can be processed by the apparatus 10 to have a symptom 30 labeled. The text 20 can include a description of a failure in a specific domain, such as the industrial domain, including but not limited to: chemical industry, petrochemical and refineries, pulp and paper manufacturing, boiler controlling and power plant systems, power generation, nuclear power generation, water management, water treatment, food/beverage production, fertilizer production, metal/metal alloys manufacturing, metallurgical processing, automobile manufacturing, pharmaceutical manufacturing, food refining, product manufacturing and processing, etc. The text 20 can be pre-stored in the at least one memory 101, or input via I/O ports 103 of the apparatus 10. The resulting symptom 30 can also be stored in the at least one memory 101 and/or output via I/O ports 103, and may even be displayed on monitor 104 of the apparatus 10.
In some domains, such as the industrial domain, texts have their own characteristics. Taking the industrial domain as an example, in an industrial text such as a diagnostic report, besides the words describing what kind of failure happened, it is also important to know which part of which device caused the failure; sometimes we even need to know exactly which component of a part caused it. Therefore, special attention must be paid to device, part and even component names in a text. Moreover, device, part and component names form a hierarchical structure. Unfortunately, in currently implemented NLP based diagnostics solutions, device, part and component names and their hierarchical structure are neglected during symptom labeling.
An example of a text might be “Induced fan tripped inexpertly during run time, bearing of motor temperature high and raise alarm”. In currently implemented solutions, the symptoms detected are “induced fan tripped”, “motor temperature high” and “bearing temperature high”, which ignore the hierarchical structure. In the solutions provided in the present disclosure, hierarchical structure information is defined and applied in symptom labeling. For the above example, the symptoms detected will be “induced fan tripped” and “induced fan motor bearing temperature high”, in which induced fan – motor – bearing forms a hierarchical structure.
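The target labels in the example above can be reproduced with a small lookup sketch. The disclosure obtains such labels with a trained encoder-decoder rather than a lookup table, so the `PARENT` mapping and the helpers `hierarchy_path` and `make_label` below are purely hypothetical; they only make concrete what "induced fan – motor – bearing" means as a chain.

```python
# Hypothetical pre-defined hierarchy: each entity maps to its parent entity,
# and None marks the top of the hierarchy.
PARENT = {
    "induced fan": None,      # device
    "motor": "induced fan",   # part
    "bearing": "motor",       # component
}

def hierarchy_path(entity: str) -> list:
    """Walk from an entity up to its device and return the path top-down."""
    path = []
    while entity is not None:
        path.append(entity)
        entity = PARENT[entity]
    return list(reversed(path))

def make_label(entity: str, event: str) -> str:
    """Prefix an event description with the full hierarchy of its entity."""
    return " ".join(hierarchy_path(entity) + [event])
```

With this mapping, a failure reported only against "bearing" is still traced back to the motor and the induced fan it belongs to.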
Now referring to FIG. 3 and FIG. 2, in one embodiment, the symptom labeling program 40 can include:
-an encoder 105, configured to encode a text 20 into a semantic presentation 50;
-a decoder 106, configured to:
-decode the semantic presentation 50, to extract at least one constituent in a pre-defined hierarchical structure, wherein the pre-defined hierarchical structure includes at least two constituents belonging to different hierarchies, and each  constituent indicates an entity related to an event described in the text 20; and
-output a symptom 30 including the at least one constituent.
In some domains like industry, a symptom to be labeled is usually expected to be bound to an entity, such as a device. However, in actual text, people may use a part or component name instead of a device name. Here is an example:
Device A stopped during operation. The operators did a thorough check and find that Component a is leaking.
So the expected symptom could be “Component a of Device A is leaking”. However, since Device A is omitted from the sentence describing the leak, current solutions can only label the symptom as “Component a is leaking”, and the connection between Component a and Device A is lost.
With the solutions provided in the present disclosure, a hierarchical structure of entities can be pre-defined, such as “device, part and component”, wherein device belongs to the highest hierarchy, part belongs to an intermediate hierarchy and component belongs to the lowest hierarchy. This hierarchical structure should be defined before NLP modeling. A symptom might have the following format:
Device name_Part name_Component name_type of event.
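The symptom format above can be modeled as a small record type. This is a sketch under assumptions: the class name `Symptom`, its field names, and the choice to skip levels that were not extracted are hypothetical, since the disclosure only fixes the order of the fields and the underscore separator.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Symptom:
    """One labeled symptom: hierarchy constituents plus an optional event type."""
    device: Optional[str] = None
    part: Optional[str] = None
    component: Optional[str] = None
    event_type: Optional[str] = None

    def to_label(self) -> str:
        # Assumption: levels that were not extracted are left out of the label.
        fields = [self.device, self.part, self.component, self.event_type]
        return "_".join(f for f in fields if f)

def from_sequence(tokens) -> Symptom:
    """Build a Symptom from four slots ordered as
    {Device name, Part name, Component name, type of event}."""
    if len(tokens) != 4:
        raise ValueError("expected exactly four slots (use None for gaps)")
    return Symptom(*tokens)
```

For the leaking example, `from_sequence(["Device A", None, "Component a", "leaking"]).to_label()` keeps the link between Component a and Device A even though no part was named.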
Once the hierarchical structure is defined, the symptom labeling problem is converted to a sequence-to-sequence problem. The input sequence is the original text 20, which could be processed as a sequence of words {20₁, 20₂, …, 20ₙ}. The output sequence is the key features of symptom 30, which includes at least one constituent in a pre-defined hierarchical structure and optionally the type of event, shown in FIG. 3 as 30₁, 30₂, …, 30ₙ. Taking the former industrial text as an example, the output sequence can have the following format: {Device name, Part name, Component name, type of event}.
Here, the encoder 105 is configured to convert the input sequence, that is, to convert the text 20 into a semantic presentation 50. Then the decoder 106 is configured to extract at least one constituent in the pre-defined hierarchical structure, for example device A, part 1 and component a.
In the encoder-decoder framework shown in FIG. 3, different models can be chosen for encoder 105 and decoder 106. To capture the internal relationships among the sequence of words, a recurrent neural network (RNN) or a long short-term memory (LSTM) network can be used for encoder 105. For decoder 106, an RNN or LSTM can also be used to generate the symptom 30, including a sequence of the above constituents and optionally the type of event.
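The data flow through such an encoder-decoder can be sketched in plain Python. This is an illustration under stated assumptions, not the disclosed implementation: it uses a simple tanh RNN cell rather than an LSTM, the weights are random and untrained, so the emitted tokens carry no meaning, and `OUT_VOCAB`, `embed`, `RNNCell`, `encode` and `decode` are hypothetical names.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained sketch is reproducible

HIDDEN = 8
# Hypothetical output vocabulary: constituents, an event type and an end marker.
OUT_VOCAB = ["<eos>", "device_A", "part_1", "component_a", "leaking"]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def embed(word):
    """Toy deterministic embedding, a stand-in for a learned embedding table."""
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]

class RNNCell:
    """Plain recurrent cell: h' = tanh(W_in x + W_h h)."""
    def __init__(self):
        self.w_in = rand_matrix(HIDDEN, HIDDEN)
        self.w_h = rand_matrix(HIDDEN, HIDDEN)

    def step(self, x, h):
        a, b = matvec(self.w_in, x), matvec(self.w_h, h)
        return [math.tanh(p + q) for p, q in zip(a, b)]

def encode(words, cell):
    """Encoder: fold the word sequence into one hidden state per word."""
    h, states = [0.0] * HIDDEN, []
    for word in words:
        h = cell.step(embed(word), h)
        states.append(h)
    return states

def decode(states, cell, w_out, max_len=4):
    """Decoder: greedily emit output tokens, starting from the last encoder state."""
    h, x, output = states[-1], [0.0] * HIDDEN, []
    for _ in range(max_len):
        h = cell.step(x, h)
        scores = matvec(w_out, h)  # one score per output token
        token = OUT_VOCAB[max(range(len(scores)), key=scores.__getitem__)]
        if token == "<eos>":
            break
        output.append(token)
        x = embed(token)  # feed the emitted token back in
    return output
```

In a trained system the weights would be learned from labeled text and symptom pairs; here the list returned by `encode` plays the role of the semantic presentation 50 and the list returned by `decode` the role of the output sequence.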
Now referring to FIG. 4, the symptom labeling program 40 can also include a weight calculator 107, configured to calculate a weight for each word in the semantic presentation 50, and the decoder 106 can be further configured to output the symptom 30 including the extracted constituent according to the calculated weights.
Optionally, the weight calculator 107 can be implemented by an attention model, which uses an attention mechanism to achieve a better labeling result. Usually, different words in a text 20 have different influence in determining the final symptom 30. If a word or words in the text 20 represent a device/part/component name, they should receive more attention. Likewise, words that are strongly correlated with device/part/component names and/or the type of event should be given higher weights. The attention model can assign different weights 60 to the words in the text 20 when generating the output sequence of symptom 30, and it can also learn which words are more important.
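The weighting idea can be sketched with dot-product attention. The disclosure does not specify which attention variant is used, so the scoring function below is an assumption, and the names `softmax` and `attend` are hypothetical.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Weight each encoder state (one per input word) for the current decoding step.

    Returns the per-word weights and the weighted sum (context vector).
    """
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context
```

A word whose encoder state aligns closely with the current decoder state, for instance a component name while the decoder is emitting the component slot, receives a larger weight and so contributes more to the context vector.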
Although the encoder 105, the decoder 106 and the weight calculator 107 are described above as software modules of the symptom labeling program 40, they can also be implemented via hardware, such as ASIC chips. They can be integrated into one chip, or separately implemented and electrically connected.
It should be mentioned that the present disclosure may include apparatuses having a different architecture than that shown in FIGs. 2 to 4. The architecture above is merely exemplary, and is used to explain the exemplary method 300 shown in FIG. 5.
Various methods in accordance with the present disclosure may be carried out. One exemplary method 300 according to the present disclosure includes the following steps:
S301: encoding a text 20 into a semantic presentation 50;
S302: decoding the semantic presentation 50, to extract at least one constituent in a pre-defined hierarchical structure, wherein the pre-defined hierarchical structure comprises at least two constituents belonging to different hierarchies, and each constituent indicates an entity related to an event described in the text 20;
S303: outputting a symptom 30 comprising the at least one constituent.
Optionally, the method 300 further includes: S301’: calculating a weight for each word in the semantic presentation 50. Then the step S303 further includes: outputting the symptom 30 including the extracted constituent according to calculated weights.
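The control flow of steps S301, S301’, S302 and S303 can be sketched as follows. The three `toy_*` functions and the `LEXICON` lookup are deliberately simple, hypothetical stand-ins and are not the encoder, weight calculator or decoder of the disclosure; only the order in which the steps are chained is meant to be illustrative.

```python
from typing import Callable, Dict, List

# Hypothetical lexicon mapping words to hierarchy levels; a trained decoder
# would replace this lookup.
LEXICON: Dict[str, str] = {
    "compressor": "device",
    "valve": "part",
    "seal": "component",
}


def toy_encode(text: str) -> List[str]:
    """S301 stand-in: the representation is simply the lowercased words."""
    return text.lower().split()


def toy_weigh(representation: List[str]) -> List[float]:
    """S301' stand-in: known entity words get a high weight, others a low one."""
    return [1.0 if word in LEXICON else 0.1 for word in representation]


def toy_decode(representation: List[str], weights: List[float]) -> Dict[str, str]:
    """S302 stand-in: keep highly weighted words as constituents."""
    return {LEXICON[word]: word
            for word, weight in zip(representation, weights) if weight > 0.5}


def run_method(text: str,
               encode: Callable[[str], List[str]],
               weigh: Callable[[List[str]], List[float]],
               decode: Callable[[List[str], List[float]], Dict[str, str]]
               ) -> Dict[str, str]:
    representation = encode(text)                    # S301
    weights = weigh(representation)                  # S301' (optional step)
    constituents = decode(representation, weights)   # S302
    return constituents                              # S303: output the symptom


symptom = run_method("Compressor inlet valve seal is leaking",
                     toy_encode, toy_weigh, toy_decode)
print(symptom)
```

Passing the three stages as callables keeps the step order fixed while allowing each stage to be swapped, for example for an RNN or LSTM based model.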
Optionally, the pre-defined hierarchical structure includes: device, part and component, wherein, device belongs to the highest hierarchy, part belongs to the higher hierarchy and component belongs to the lowest hierarchy.
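For illustration only, such a device/part/component structure could be held as a nested mapping and used to check that extracted constituents form a consistent path. The entries other than device A, part 1 and component a, as well as the names `HIERARCHY` and `is_valid_path`, are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical pre-defined hierarchy: device -> part -> list of components.
HIERARCHY: Dict[str, Dict[str, List[str]]] = {
    "Device A": {
        "Part 1": ["Component a", "Component b"],
        "Part 2": ["Component c"],
    },
    "Device B": {
        "Part 3": ["Component d"],
    },
}


def is_valid_path(device: str, part: Optional[str] = None,
                  component: Optional[str] = None) -> bool:
    """Check that the given constituents lie on one branch of the hierarchy.

    Lower levels may be omitted, but a component without a part is rejected.
    """
    if device not in HIERARCHY:
        return False
    if part is None:
        return component is None
    if part not in HIERARCHY[device]:
        return False
    return component is None or component in HIERARCHY[device][part]


print(is_valid_path("Device A", "Part 1", "Component a"))
print(is_valid_path("Device A", "Part 3"))
```

A check of this kind could be used to reject decoder outputs that mix constituents from different branches, although the disclosure does not require it.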
Optionally, the symptom 30 further comprises the type of the event described in the text 20. The symptom 30 may have the following format:
Device name_Part name_Component name_type of the event.
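A symptom label in this underscore-separated format could be built and read back as sketched below. The helper names `format_symptom` and `parse_symptom`, and the rule that individual names must not themselves contain an underscore, are assumptions introduced for this example.

```python
from typing import List, Optional

SEPARATOR = "_"


def format_symptom(device: str, part: str, component: str,
                   event_type: Optional[str] = None) -> str:
    """Join the constituents, and the optional event type, with underscores."""
    fields = [device, part, component]
    if event_type is not None:
        fields.append(event_type)
    for field in fields:
        # Assumption: names must not contain the separator, or parsing is ambiguous.
        if SEPARATOR in field:
            raise ValueError(f"field {field!r} contains {SEPARATOR!r}")
    return SEPARATOR.join(fields)


def parse_symptom(symptom: str) -> List[str]:
    """Split a label back into three or four fields."""
    fields = symptom.split(SEPARATOR)
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 fields, got {len(fields)}")
    return fields


label = format_symptom("Device A", "Part 1", "Component a", "high pressure")
print(label)
print(parse_symptom(label))
```

A flat string of this kind is convenient as a class label or feature, while the parsed fields keep the hierarchy information accessible.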
A computer-readable medium is also provided in the present disclosure, storing computer-executable instructions which, upon execution by a computer, enable the computer to execute any of the methods presented in this disclosure.
A computer program is also provided which, when executed by at least one processor, performs any of the methods presented in this disclosure.
Compared with manual and semi-automatic symptom labeling approaches based on grammar pattern matching, the solutions provided by the present disclosure have the following advantages:
- Human effort for manual labeling and grammar model creation can be saved. This saves time for both the customer and the model developer.
- The solutions provided in the present disclosure are data driven and do not require much domain knowledge. This reduces the effort of domain experts in model development.
- With the solutions provided in the present disclosure, correlation among different constituents, such as device, part and component, can be captured. The symptom generated with the hierarchical structure information can be used as a good feature in other NLP tasks.
- Furthermore, with the solutions provided in the present disclosure, symptoms can be labeled for long sentences with higher accuracy.
While the present technique has been described in detail with reference to certain embodiments, it should be appreciated that the present technique is not limited to those precise embodiments. Rather, in view of the present disclosure, which describes exemplary modes for practicing the invention, many modifications and variations would present themselves to those skilled in the art without departing from the scope and spirit of this invention. The scope of the invention is, therefore, indicated by the following claims rather than by the foregoing description. All changes, modifications, and variations coming within the meaning and range of equivalency of the claims are to be considered within their scope.

Claims (12)

  1. A method (300) for NLP based diagnostics, comprising:
    - encoding (S301) a text (20) into a semantic presentation (50);
    - decoding (S302) the semantic presentation (50), to extract at least one constituent in a pre-defined hierarchical structure, wherein the pre-defined hierarchical structure comprises at least two constituents belonging to different hierarchies, and each constituent indicates an entity related to an event described in the text (20);
    - outputting (S303) a symptom (30) comprising the at least one constituent.
  2. The method (300) according to claim 1, wherein,
    - the method (300) further comprises: calculating (S301’) a weight for each word in the semantic presentation (50);
    - outputting (S303) a symptom (30) comprising the extracted constituent comprises: outputting (S303) the symptom (30) including the extracted constituent according to calculated weights.
  3. The method (300) according to claim 1, wherein the pre-defined hierarchical structure comprises: device, part and component, wherein, device belongs to the highest hierarchy, part belongs to the higher hierarchy and component belongs to the lowest hierarchy.
  4. The method (300) according to claim 1, wherein the symptom (30) further comprises type of the event described in the text (20).
  5. The method (300) according to claim 4, wherein the symptom (30) has following format:
    Device name_Part name_Component name_type of the event.
  6. An apparatus (10) for NLP based diagnostics, comprising:
    - an encoder (105), configured to encode a text (20) into a semantic presentation (50);
    - a decoder (106), configured to:
    - decode the semantic presentation (50), to extract at least one constituent in a pre-defined hierarchical structure, wherein the pre-defined hierarchical structure comprises at least two constituents belonging to different hierarchies, and each constituent indicates an entity related to an event described in the text (20); and
    - output a symptom (30) comprising the at least one constituent.
  7. The apparatus (10) according to claim 6, wherein,
    - the apparatus (10) further comprises: a weight calculator (107), configured to calculate a weight for each word in the semantic presentation (50);
    - the decoder (106) is further configured to output the symptom (30) including the extracted constituent according to calculated weights.
  8. The apparatus (10) according to claim 6, wherein the pre-defined hierarchical structure comprises: device, part and component, wherein, device belongs to the highest hierarchy, part belongs to the higher hierarchy and component belongs to the lowest hierarchy.
  9. The apparatus (10) according to claim 6, wherein the symptom (30) further comprises type of the event described in the text (20).
  10. The apparatus (10) according to claim 9, wherein the symptom (30) has following format:
    Device name_Part name_Component name_type of the event.
  11. An apparatus (10) for NLP based diagnostics, comprising:
    - at least one processor (102);
    - at least one memory (101), coupled to the at least one processor (102), configured to execute the method according to any of claims 1 to 5.
  12. A computer-readable medium for NLP based diagnostics storing computer-executable instructions, wherein the computer-executable instructions when executed cause at least one processor to execute the method according to any of claims 1 to 5.
PCT/CN2020/072075 2020-01-14 2020-01-14 Method and apparatus for nlp based diagnostics WO2021142630A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/072075 WO2021142630A1 (en) 2020-01-14 2020-01-14 Method and apparatus for nlp based diagnostics
CN202080087457.4A CN114830106A (en) 2020-01-14 2020-01-14 Method and apparatus for NLP-based diagnostics

Publications (1)

Publication Number Publication Date
WO2021142630A1 (en) 2021-07-22

Family ID: 76863376

Country Status (2)

Country Link
CN (1) CN114830106A (en)
WO (1) WO2021142630A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108447534A (en) * 2018-05-18 2018-08-24 灵玖中科软件(北京)有限公司 A kind of electronic health record data quality management method based on NLP
CN109189946A (en) * 2018-11-06 2019-01-11 湖南云智迅联科技发展有限公司 A method of the description of equipment fault sentence is converted into knowledge mapping expression
CN109446326A (en) * 2018-11-01 2019-03-08 大连理工大学 Biomedical event based on replicanism combines abstracting method
US10261995B1 (en) * 2016-09-28 2019-04-16 Amazon Technologies, Inc. Semantic and natural language processing for content categorization and routing
CN109740053A (en) * 2018-12-26 2019-05-10 广州灵聚信息科技有限公司 Sensitive word screen method and device based on NLP technology

Also Published As

Publication number Publication date
CN114830106A (en) 2022-07-29

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20913585; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20913585; Country of ref document: EP; Kind code of ref document: A1)