CN109783801B - Electronic device, multi-label classification method and storage medium - Google Patents
- Publication number
- CN109783801B (Application CN201811529912.2A)
- Authority
- CN
- China
- Prior art keywords
- sentences
- zero
- sentence
- splitting
- antecedent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses an electronic device, a multi-label classification method and a storage medium, wherein the method comprises the following steps: a zero-pronoun identification and resolution step: identifying and resolving the zero pronouns of a sentence to be classified to obtain an expanded sentence; a sentence splitting step: performing syntactic analysis on the expanded sentence, extracting the parallel relation items in the expanded sentence, and splitting the expanded sentence by replacement to form a plurality of split sentences; or, designing corpus labels for this task, manually labeling the parallel relation items and other items in the resolved expanded sentence, training a Bi-LSTM-CRF model for sentence splitting, and using the trained Bi-LSTM-CRF model to tag and split the expanded sentence into a plurality of split sentences. The method can effectively split a complex multi-label sentence into a plurality of simple single-label sentences.
Description
Technical Field
The invention relates to the technical field of multi-label classification, in particular to an electronic device, a multi-label classification method and a storage medium.
Background
Existing deep-learning sentence multi-label classification techniques fall into two main directions: the first adopts multi-label classification metrics, such as Hamming loss, to predict the label set directly; the second converts the task into a plurality of single-label binary classification problems and predicts the probability of each label separately. These techniques have drawbacks: the former has a high degree of freedom in the label set, is difficult to train, needs a large number of dedicated training samples, and cannot share single-label training samples; the latter's predictions may be disturbed by information from labels other than the one currently being predicted, or may show predictable bias because the training samples of a single label do not match the distribution of multi-label test samples.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an electronic device, a multi-label classification method and a storage medium.
In order to achieve the above object, the present invention provides an electronic device, including a memory and a processor connected to the memory, where the memory stores a processing system that can be executed on the processor, and the processing system when executed by the processor implements the following steps:
A zero-pronoun identification and resolution step:
identifying and resolving the zero pronouns of the sentence to be classified to obtain an expanded sentence, wherein a zero pronoun is a gap left in the sentence to be classified by a recognizable word or phrase;
A sentence splitting step:
performing syntactic analysis on the expanded sentence and extracting the parallel relation items in the expanded sentence; splitting the expanded sentence by replacement to form a plurality of split sentences;
or, designing corpus labels for this task, manually labeling the parallel relation items and other items in the resolved expanded sentence, training a Bi-LSTM-CRF model for sentence splitting, and using the trained Bi-LSTM-CRF model to tag and split the expanded sentence into a plurality of split sentences; the other items include a shared item and a deleted item.
Further, when executed by the processor, the processing system of the electronic device also implements an intention recognition step: inputting each of the plurality of split sentences obtained in the sentence splitting step into a single-intention recognition model to obtain a plurality of intentions.
In the above electronic device, preferably, the zero-pronoun identification and resolution step specifically includes:
segmenting the sentence to be classified using jieba word segmentation in full mode to obtain a candidate antecedent set;
using a first recurrent neural network to perform feature learning on the zero-pronoun text to obtain a zero-pronoun text vector representation; computing the attention of each word in each candidate antecedent with a general attention model, and taking the attention-weighted average of the word vectors to obtain the candidate antecedent representation; concatenating the candidate antecedent representation with the zero-pronoun text vector representation, and using a first feedforward neural network to compute the probability that the candidate antecedent is the antecedent of the zero pronoun;
and using a second recurrent neural network to perform feature learning on the context of the zero pronoun to obtain a zero-pronoun context vector representation; likewise computing the attention of each word in each candidate antecedent with the general attention model, and taking the attention-weighted average of the word vectors to obtain the candidate antecedent representation; concatenating the candidate antecedent representation with the zero-pronoun context vector representation, and using a second feedforward neural network to compute the probability that the candidate antecedent is the antecedent of the zero pronoun.
In the steps implemented when the processing system is executed by the processor, the syntactic analysis of the expanded sentence uses the parsing function of the Stanford NLP toolkit: the expanded sentence obtained after zero-pronoun resolution is parsed to obtain a syntactic structure tree, and the parallel relation items in the expanded sentence are extracted from it.
Correspondingly, the invention also provides a multi-label classification method, which comprises the following steps:
A zero-pronoun identification and resolution step:
identifying and resolving the zero pronouns of the sentence to be classified to obtain an expanded sentence, wherein a zero pronoun is a gap left in the sentence to be classified by a recognizable word or phrase;
A sentence splitting step:
performing syntactic analysis on the expanded sentence and extracting the parallel relation items in the expanded sentence; splitting the expanded sentence by replacement to form a plurality of split sentences;
or, designing corpus labels for this task, manually labeling the parallel relation items and other items in the resolved expanded sentence, training a Bi-LSTM-CRF model for sentence splitting, and using the trained Bi-LSTM-CRF model to tag and split the expanded sentence into a plurality of split sentences; the other items include a shared item and a deleted item.
Further, the multi-label classification method also comprises
an intention recognition step: inputting each of the plurality of split sentences obtained in the sentence splitting step into a single-intention recognition model to obtain a plurality of intentions.
Further, optionally, the zero-pronoun identification and resolution step specifically includes:
segmenting the sentence to be classified using jieba word segmentation in full mode to obtain a candidate antecedent set;
using a first recurrent neural network to perform feature learning on the zero-pronoun text to obtain a zero-pronoun text vector representation; computing the attention of each word in each candidate antecedent with a general attention model, and taking the attention-weighted average of the word vectors to obtain the candidate antecedent representation; concatenating the candidate antecedent representation with the zero-pronoun text vector representation, and using a first feedforward neural network to compute the probability that the candidate antecedent is the antecedent of the zero pronoun;
and using a second recurrent neural network to perform feature learning on the context of the zero pronoun to obtain a zero-pronoun context vector representation; likewise computing the attention of each word in each candidate antecedent with the general attention model, and taking the attention-weighted average of the word vectors to obtain the candidate antecedent representation; concatenating the candidate antecedent representation with the zero-pronoun context vector representation, and using a second feedforward neural network to compute the probability that the candidate antecedent is the antecedent of the zero pronoun.
In the multi-label classification method, the syntactic analysis of the expanded sentence uses the parsing function of the Stanford NLP toolkit: the expanded sentence obtained after zero-pronoun resolution is parsed to obtain a syntactic structure tree, and the parallel relation items in the expanded sentence are extracted from it.
The invention also provides a computer readable storage medium having stored thereon a processing system which when executed by a processor implements the steps of the multi-label classification method described above.
The beneficial effects of the invention are as follows: by splitting a multi-label sentence sample to be classified into a valid set of single-label sentence samples, trained single-label classification models can be used effectively for multi-label prediction without harming prediction precision, and the problem of prediction samples being distributed differently from training samples is avoided. This helps save the development and training cost of dedicated multi-label classification algorithms in industrial applications, effectively integrates existing resources, and makes the fullest use of existing single-label training data and models. In addition, the invention is extensible and can meet the demand for quick response to fast-changing markets in industrial applications: when a new demand label appears in the market, only single-label data corresponding to that label need be collected for model training, and the resulting model can be added to the multi-label classification system without retraining a multi-label model. Excellent open-source classification models from others can likewise be conveniently and quickly transplanted: once such a model has been studied thoroughly, it can be grafted into the system.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a schematic diagram of an electronic device according to the present invention;
fig. 2 is a flow chart of the multi-label classification method according to the present invention.
FIG. 3 is a schematic diagram of the syntactic structure tree obtained, in an embodiment of the present invention, by syntactic analysis of the expanded sentence after zero-pronoun resolution;
FIG. 4 is a schematic diagram of classification splitting by Bi-LSTM-CRF model according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems to be solved, the technical solutions and the beneficial effects of the invention clearer, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It should be noted that the description of "first", "second", etc. in this disclosure is for descriptive purposes only and is not to be construed as indicating or implying a relative importance or implying an indication of the number of technical features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature.
The invention provides an electronic device, which is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. The electronic device may be an electronic computer, a single server, a server cluster composed of a plurality of servers, or a cloud server composed of a large number of hosts or servers based on cloud computing. As shown in fig. 1, in an embodiment of the present invention, the electronic device includes, but is not limited to, a memory 2 and a processor 1 connected to the memory 2, where the memory 2 stores a processing system that can run on the processor 1.
The memory 2 referred to in the present invention includes an internal memory and at least one type of readable storage medium. The internal memory provides a buffer for the operation of the electronic device, and the readable storage medium may be any of various media capable of storing program code, including but not limited to a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The processor 1 referred to in the present invention may be a central processing unit or another data processing chip. The processor 1 controls the overall operation of the electronic device and runs the program code stored in the memory 2 or processes data, for example by running the processing system.
The processing system, when executed by the processor 1, performs the steps of:
A zero-pronoun identification and resolution step:
identifying and resolving the zero pronouns of the sentence to be classified to obtain an expanded sentence, wherein a zero pronoun is a gap left in the sentence to be classified by a recognizable word or phrase;
A sentence splitting step:
performing syntactic analysis on the expanded sentence and extracting the parallel relation items in the expanded sentence; splitting the expanded sentence by replacement to form a plurality of split sentences;
or, designing corpus labels for this task, manually labeling the parallel relation items and other items in the resolved expanded sentence, training a Bi-LSTM-CRF model for sentence splitting, and using the trained Bi-LSTM-CRF model to tag and split the expanded sentence into a plurality of split sentences; the other items include a shared item and a deleted item.
Further, when executed by the processor 1, the processing system of the electronic device also implements an intention recognition step: inputting each of the plurality of split sentences obtained in the sentence splitting step into a single-intention recognition model to obtain a plurality of intentions.
In one embodiment, the zero-pronoun identification and resolution step preferably includes:
segmenting the sentence to be classified using jieba word segmentation in full mode to obtain a candidate antecedent set;
using a first recurrent neural network to perform feature learning on the zero-pronoun text to obtain a zero-pronoun text vector representation; computing the attention of each word in each candidate antecedent with a general attention model, and taking the attention-weighted average of the word vectors to obtain the candidate antecedent representation; concatenating the candidate antecedent representation with the zero-pronoun text vector representation, and using a first feedforward neural network to compute the probability that the candidate antecedent is the antecedent of the zero pronoun;
and using a second recurrent neural network to perform feature learning on the context of the zero pronoun to obtain a zero-pronoun context vector representation; likewise computing the attention of each word in each candidate antecedent with the general attention model, and taking the attention-weighted average of the word vectors to obtain the candidate antecedent representation; concatenating the candidate antecedent representation with the zero-pronoun context vector representation, and using a second feedforward neural network to compute the probability that the candidate antecedent is the antecedent of the zero pronoun; the candidate antecedent with the highest resolution probability is then filled into the gap of the corresponding zero pronoun in the original sentence to obtain the sentence after zero-pronoun resolution.
In the steps implemented when the processing system is executed by the processor, the syntactic analysis of the expanded sentence uses the parsing function of the Stanford NLP toolkit: the expanded sentence obtained after zero-pronoun resolution is parsed to obtain a syntactic structure tree, and the parallel relation items in the expanded sentence are extracted from it.
In addition, the invention also provides a multi-label classification method, as shown in fig. 2, comprising the following steps:
step S1, identifying and resolving zero pronouns:
recognizing and resolving zero pronouns of the sentences to be classified to obtain expanded sentences, wherein the zero pronouns are recognizable phrases or blank spaces of words in the sentences to be classified;
for example, a sentence to be classified: "I want to visit and stroll with girl friends to Beijing hometown museum. "segmentation obtains candidate antecedent sets: i want, and, girl friends, go, beijing palace museum, visiting, and strolling
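As an illustration of why full mode matters here, the following sketch enumerates every dictionary word found anywhere in the example sentence, mimicking the behavior of jieba's full mode. The tiny vocabulary is a hypothetical stand-in for a real segmentation dictionary:

```python
def full_mode_segment(sentence, vocab):
    """Collect every substring of the sentence that is a dictionary word,
    keeping overlapping segmentations so that no candidate-antecedent
    granularity is lost -- the essence of full-mode segmentation."""
    candidates = []
    n = len(sentence)
    for i in range(n):
        for j in range(i + 1, n + 1):
            if sentence[i:j] in vocab:
                candidates.append(sentence[i:j])
    return candidates

# Toy dictionary covering several granularities of "Beijing Palace Museum".
vocab = {"北京", "故宫", "博物院", "北京故宫", "故宫博物院"}
print(full_mode_segment("北京故宫博物院", vocab))
# Overlapping candidates such as "北京故宫" and "故宫博物院" are both kept.
```

A precise segmenter would return one best segmentation; full mode instead keeps every plausible word, which is exactly what a candidate antecedent set needs.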
Step S2, sentence splitting step:
performing syntactic analysis on the expanded sentence and extracting the parallel relation items in the expanded sentence; splitting the expanded sentence by replacement to form a plurality of split sentences;
It should be noted that a zero pronoun traditionally refers to a grammatical gap left by a recognizable noun phrase; in the present invention, however, to meet practical needs, a zero pronoun may stand not only for a noun phrase but also for words or phrases of various other parts of speech. Take the sentence to be classified: "May I ask what is the price for lip and underarm hair removal?" A zero pronoun follows "lip" in this sentence and refers back to the verbal expression "hair removal"; the expression referred to by a zero pronoun is the antecedent of that zero pronoun. It follows that the antecedent may appear after the zero pronoun.
Further, the multi-label classification method further comprises,
step S3, an intention recognition step: and respectively inputting a plurality of split sentences obtained in the sentence splitting step as a model for single intention recognition to obtain a plurality of intentions.
Further, optionally, the zero-pronoun identification and resolution step specifically includes:
segmenting the sentence to be classified using jieba word segmentation in full mode to obtain a candidate antecedent set;
using a first recurrent neural network to perform feature learning on the zero-pronoun text to obtain a zero-pronoun text vector representation; computing the attention of each word in each candidate antecedent with a general attention model, and taking the attention-weighted average of the word vectors to obtain the candidate antecedent representation; concatenating the candidate antecedent representation with the zero-pronoun text vector representation, and using a first feedforward neural network to compute the probability that the candidate antecedent is the antecedent of the zero pronoun;
and using a second recurrent neural network to perform feature learning on the context of the zero pronoun to obtain a zero-pronoun context vector representation; likewise computing the attention of each word in each candidate antecedent with the general attention model, and taking the attention-weighted average of the word vectors to obtain the candidate antecedent representation; concatenating the candidate antecedent representation with the zero-pronoun context vector representation, and using a second feedforward neural network to compute the probability that the candidate antecedent is the antecedent of the zero pronoun.
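A minimal numerical sketch of one scoring pass described above, using NumPy: a general (bilinear) attention between a zero-pronoun representation and the word vectors of one candidate antecedent, an attention-weighted average, concatenation, and a feedforward layer yielding an antecedent probability. All weights are random stand-ins for trained parameters, and the function names are illustrative:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def antecedent_probability(zp_vec, cand_word_vecs, W_att, ff_w, ff_b):
    """Score one candidate antecedent against a zero-pronoun representation.

    zp_vec:         (d,)   zero-pronoun text or context vector (from an RNN)
    cand_word_vecs: (n, d) word vectors of the candidate antecedent
    W_att:          (d, d) bilinear ("general") attention weights
    ff_w, ff_b:            feedforward layer parameters
    """
    scores = cand_word_vecs @ W_att @ zp_vec        # attention score per word
    attn = softmax(scores)                          # attention weights, sum to 1
    cand_repr = attn @ cand_word_vecs               # attention-weighted average
    features = np.concatenate([cand_repr, zp_vec])  # concatenated representation
    logit = features @ ff_w + ff_b                  # feedforward layer
    return 1.0 / (1.0 + np.exp(-logit))             # antecedent probability

rng = np.random.default_rng(0)
d = 8
prob = antecedent_probability(rng.normal(size=d), rng.normal(size=(3, d)),
                              rng.normal(size=(d, d)), rng.normal(size=2 * d),
                              rng.normal())
print(float(prob))
```

In the full method this score is computed for every candidate antecedent, and the candidate with the highest probability fills the zero-pronoun gap.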
A candidate antecedent in the invention is a word obtained by segmenting the sentence to be classified. Since the granularity of candidate antecedents cannot be determined in advance, the invention preferably adopts full-mode segmentation, which fully considers all segmentation granularities of the sentence to be classified and thereby covers as many possible candidate antecedents as possible.
In the multi-label classification method, the syntactic analysis of the expanded sentence uses the parsing function of the Stanford NLP toolkit: the expanded sentence obtained after zero-pronoun resolution is parsed to obtain a syntactic structure tree, and the parallel relation items in the expanded sentence are extracted from it.
For example, for the sentence to be classified "May I ask what is the price for lip and underarm hair removal?", the parsing function of the Stanford NLP toolkit is applied to the expanded sentence obtained after zero-pronoun resolution, yielding the syntactic structure tree shown in fig. 3.
The parallel relation indicator word in this sentence is "and", and the parallel relation items are "lip" and "underarm". Next, the span formed by the parallel relation indicator word and all the corresponding parallel relation items is replaced by each parallel relation item in turn, yielding split sentence 1: "May I ask what is the price for lip hair removal?" and split sentence 2: "May I ask what is the price for underarm hair removal?"
In another embodiment of the present invention, there is provided a multi-tag classification method, including:
A zero-pronoun identification and resolution step: identifying and resolving the zero pronouns of the sentence to be classified to obtain an expanded sentence, wherein a zero pronoun is a gap left in the sentence to be classified by a recognizable word or phrase;
A sentence splitting step: corpus labels are designed for this task; the parallel relation items and other items in the resolved expanded sentence are labeled manually; a Bi-LSTM-CRF model for sentence splitting is trained; and the trained Bi-LSTM-CRF model is used to tag and split the expanded sentence into a plurality of split sentences. The other items include shared items and deleted items: a shared item is a part of the original sentence retained in both split sentences, a deleted item is a part retained in neither split sentence, and the parallel relation items are parts each retained in one of the two split sentences. For the sentence to be classified "I want arm and lower-leg hair removal.", manual labeling marks the parallel relation items "arm" and "lower leg", the shared items "I", "want" and "hair removal", and the deleted item "and". The trained Bi-LSTM-CRF model then tags and splits the expanded sentence into split sentence 1: "I want arm hair removal." and split sentence 2: "I want lower-leg hair removal." The Bi-LSTM-CRF model is shown in fig. 4: word vectors (word embeddings) are fed into a bidirectional long short-term memory model (Bi-LSTM); li represents word i together with its left context, ri represents word i together with its right context, and the two representation vectors are concatenated to produce a vector ci representing word i and its context. From ci, a fully connected layer produces the unnormalized probability of mapping each word to each tag, and finally a CRF layer selects the tag sequence with the maximum probability for each sentence.
In addition, the invention also provides a computer-readable storage medium on which a processing system is stored; when executed by a processor, the processing system implements the steps of the multi-label classification method described above, which are not repeated here.
According to the method of the invention, a multi-label sentence sample to be classified is split into a valid set of single-label sentence samples, so that trained single-label classification models can be used effectively for multi-label prediction without harming prediction precision, and the problem of prediction samples being distributed differently from training samples is avoided. This helps save the development and training cost of dedicated multi-label classification algorithms in industrial applications, effectively integrates existing resources, and makes the fullest use of existing single-label training data and models. In addition, the invention is extensible and can meet the demand for quick response to fast-changing markets in industrial applications: when a new demand label appears in the market, only single-label data corresponding to that label need be collected for model training, and the resulting model can be added to the multi-label classification system without retraining a multi-label model. Excellent open-source classification models from others can likewise be conveniently and quickly transplanted: once such a model has been studied thoroughly, it can be grafted into the system.
The foregoing describes preferred embodiments of the present invention, but it should be understood that the invention is not limited to the embodiments described above, which should not be construed as excluding other embodiments. Numerous variations, changes, substitutions and alterations made by those skilled in the art in light of the above teachings and the prior art, without departing from the principles and spirit of the invention, also fall within its scope.
Claims (5)
1. An electronic device, which is characterized in that,
the electronic device comprises a memory and a processor connected with the memory, wherein a processing system capable of running on the processor is stored in the memory, and the processing system realizes the following steps when being executed by the processor:
a zero-pronoun identification and resolution step:
identifying and resolving the zero pronouns of a sentence to be classified to obtain an expanded sentence, wherein a zero pronoun is a gap left in the sentence to be classified by a recognizable word or phrase, and may refer to a noun phrase or to a word or phrase of any other part of speech;
the zero-pronoun identification and resolution step specifically comprises:
segmenting the sentence to be classified using jieba word segmentation in full mode to obtain a candidate antecedent set;
using a first recurrent neural network to perform feature learning on the zero-pronoun text to obtain a zero-pronoun text vector representation; computing the attention of each word in each candidate antecedent with a general attention model, and taking the attention-weighted average of the word vectors to obtain the candidate antecedent representation; concatenating the candidate antecedent representation with the zero-pronoun text vector representation, and using a first feedforward neural network to compute the probability that the candidate antecedent is the antecedent of the zero pronoun;
using a second recurrent neural network to perform feature learning on the context of the zero pronoun to obtain a zero-pronoun context vector representation; likewise computing the attention of each word in each candidate antecedent with the general attention model, and taking the attention-weighted average of the word vectors to obtain the candidate antecedent representation; concatenating the candidate antecedent representation with the zero-pronoun context vector representation, and using a second feedforward neural network to compute the probability that the candidate antecedent is the antecedent of the zero pronoun;
a sentence splitting step:
performing syntactic analysis on the expanded sentence and extracting the parallel relation items in the expanded sentence; splitting the expanded sentence by replacement to form a plurality of split sentences;
or, designing corpus labels for this task, manually labeling the parallel relation items and other items in the resolved expanded sentence, training a Bi-LSTM-CRF model for sentence splitting, and using the trained Bi-LSTM-CRF model to tag and split the expanded sentence into a plurality of split sentences; the other items comprise a shared item and a deleted item;
the processing system when executed by the processor also implements an intent recognition step,
the intention recognition step: and respectively inputting a plurality of split sentences obtained in the sentence splitting step as a model for single intention recognition to obtain a plurality of intentions.
2. The electronic device of claim 1, characterized in that,
the syntactic analysis of the expanded sentence uses the syntactic parsing function of the Stanford NLP toolkit to parse the expanded sentence obtained after zero-pronoun resolution into a syntactic structure tree, from which the parallel relation items in the expanded sentence are extracted.
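A sketch of extracting parallel relation items from a parse, using hand-written dependency triples in place of real Stanford NLP output (in Universal Dependencies, the `conj` relation marks coordination); the indices and relation names here are assumptions:

```python
def extract_parallel_items(words, deps):
    """deps: (head_index, dep_index, relation) triples such as a
    dependency parser might emit (0-based indices).  Word pairs linked
    by a coordination relation ('conj') are parallel relation items."""
    return [(words[head], words[dep])
            for head, dep, rel in deps if rel == 'conj']

# "check balance and bill": 'balance' is coordinated with 'bill'
words = ['check', 'balance', 'and', 'bill']
deps = [(0, 1, 'obj'), (1, 3, 'conj'), (3, 2, 'cc')]
print(extract_parallel_items(words, deps))  # [('balance', 'bill')]
```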
3. A multi-label classification method, comprising:
a zero-pronoun identification and resolution step:
identifying and resolving the zero pronouns in a sentence to be classified to obtain an expanded sentence, wherein a zero pronoun is a gap in the sentence to be classified where an identifiable phrase or word has been omitted, and may refer to a noun phrase or to words or phrases of various other parts of speech;
the zero-pronoun identification and resolution step specifically comprises:
segmenting the sentence to be classified with jieba word segmentation in full mode to obtain a set of candidate antecedents;
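Full-mode segmentation keeps every dictionary word found in the sentence, including overlapping ones, so all plausible candidate antecedents survive. A toy stand-in for jieba's full mode, with a hand-made vocabulary:

```python
def full_mode_segment(sentence, vocab):
    """Toy stand-in for jieba's full mode: emit every substring that is
    a dictionary word, so overlapping candidates all survive."""
    candidates = []
    for i in range(len(sentence)):
        for j in range(i + 1, len(sentence) + 1):
            if sentence[i:j] in vocab:
                candidates.append(sentence[i:j])
    return candidates

vocab = {'信用', '信用卡', '卡', '账单'}
print(full_mode_segment('信用卡账单', vocab))
# ['信用', '信用卡', '卡', '账单']
```

In practice this is `jieba.cut(sentence, cut_all=True)`, which uses jieba's built-in dictionary instead of a hand-made one.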
performing feature learning on the zero-pronoun text by using a first recurrent neural network to obtain a zero-pronoun text vector representation, calculating an attention weight for each word in each candidate antecedent by using a general attention model, performing a weighted average of the word vectors according to the attention weights to obtain a candidate antecedent representation, splicing the candidate antecedent representation and the zero-pronoun text vector representation together, and calculating the probability of whether the candidate antecedent is the antecedent of the zero pronoun by using a first feedforward neural network;
performing feature learning on the context of the zero pronoun by using a second recurrent neural network to obtain a zero-pronoun context vector representation, calculating an attention weight for each word in each candidate antecedent by using the general attention model, performing a weighted average of the word vectors according to the attention weights to obtain a candidate antecedent representation, splicing the candidate antecedent representation and the zero-pronoun context vector representation together, and calculating the probability of whether the candidate antecedent is the antecedent of the zero pronoun by using a second feedforward neural network;
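The claims compute two probabilities per candidate (one from the zero-pronoun text, one from its context) but do not state how they are combined; the sketch below assumes a plain average and picks the highest-scoring candidate:

```python
def pick_antecedent(candidates, p_text, p_context):
    """Average the text-side and context-side probabilities (the
    combination rule is an assumption) and return the best candidate."""
    scores = [(pt + pc) / 2 for pt, pc in zip(p_text, p_context)]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]

candidates = ['信用卡', '账单', '余额']
print(pick_antecedent(candidates, [0.5, 0.25, 0.25], [0.75, 0.25, 0.0]))
# ('信用卡', 0.625)
```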
sentence splitting:
performing syntactic analysis on the expanded sentence and extracting the parallel relation items in the expanded sentence; splitting the expanded sentence by replacement to form a plurality of split sentences;
or, designing corpus labels for this purpose, manually labeling the parallel relation items and other items in the resolved expanded sentence, training a Bi-LSTM-CRF sentence-splitting model, and using the trained Bi-LSTM-CRF model to classify and split the expanded sentence into a plurality of split sentences; the other items comprise shared items and deleted items;
the multi-label classification method further comprises
an intention recognition step: inputting each of the plurality of split sentences obtained in the sentence splitting step into a model for single-intention recognition, thereby obtaining a plurality of intentions.
4. The multi-label classification method according to claim 3, characterized in that,
the syntactic analysis of the expanded sentence uses the syntactic parsing function of the Stanford NLP toolkit to parse the expanded sentence obtained after zero-pronoun resolution into a syntactic structure tree, from which the parallel relation items in the expanded sentence are extracted.
5. A computer-readable storage medium comprising,
the computer-readable storage medium has stored thereon a processing system which, when executed by a processor, implements the steps of the multi-label classification method according to any one of claims 3 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811529912.2A CN109783801B (en) | 2018-12-14 | 2018-12-14 | Electronic device, multi-label classification method and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109783801A CN109783801A (en) | 2019-05-21 |
CN109783801B true CN109783801B (en) | 2023-08-25 |
Family
ID=66496196
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811529912.2A Active CN109783801B (en) | 2018-12-14 | 2018-12-14 | Electronic device, multi-label classification method and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109783801B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110674630B (en) * | 2019-09-24 | 2023-03-21 | 北京明略软件系统有限公司 | Reference resolution method and device, electronic equipment and storage medium |
CN111400438A (en) * | 2020-02-21 | 2020-07-10 | 镁佳(北京)科技有限公司 | Method and device for identifying multiple intentions of user, storage medium and vehicle |
CN112256868A (en) * | 2020-09-30 | 2021-01-22 | 华为技术有限公司 | Zero-reference resolution method, method for training zero-reference resolution model and electronic equipment |
CN112214992A (en) * | 2020-10-14 | 2021-01-12 | 哈尔滨福涛科技有限责任公司 | Deep learning and rule combination based narrative structure analysis method |
CN113392629B (en) * | 2021-06-29 | 2022-10-28 | 哈尔滨工业大学 | Human-term pronoun resolution method based on pre-training model |
CN113850078B (en) * | 2021-09-29 | 2024-06-18 | 平安科技(深圳)有限公司 | Multi-intention recognition method, equipment and readable storage medium based on machine learning |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005025659A (en) * | 2003-07-01 | 2005-01-27 | Nippon Telegr & Teleph Corp <Ntt> | Zero pronoun resolving method, device and program, and recording medium to which the program is recorded |
CN102880645A (en) * | 2012-08-24 | 2013-01-16 | 上海云叟网络科技有限公司 | Semantic intelligent search method |
CN103440252A (en) * | 2013-07-25 | 2013-12-11 | 北京师范大学 | Method and device for extracting parallel information in Chinese sentence |
JP2015049545A (en) * | 2013-08-29 | 2015-03-16 | 株式会社ジャストシステム | Promoted questionnaire program and questionnaire system |
CN105988990A (en) * | 2015-02-26 | 2016-10-05 | 索尼公司 | Device and method for resolving zero anaphora in Chinese language, as well as training method |
CN106294322A (en) * | 2016-08-04 | 2017-01-04 | 哈尔滨工业大学 | A kind of Chinese based on LSTM zero reference resolution method |
CN107885844A (en) * | 2017-11-10 | 2018-04-06 | 南京大学 | Automatic question-answering method and system based on systematic searching |
CN108563790A (en) * | 2018-04-28 | 2018-09-21 | 科大讯飞股份有限公司 | A kind of semantic understanding method and device, equipment, computer-readable medium |
- 2018-12-14 CN CN201811529912.2A patent/CN109783801B/en active Active
Non-Patent Citations (1)
Title |
---|
Chinese Zero-Pronoun Resolution Based on Semantic Structure Analysis; Cao Jun, Zhou Jingye, Xiao Chixin; Natural Science Journal of Xiangtan University (Issue 04); full text *
Also Published As
Publication number | Publication date |
---|---|
CN109783801A (en) | 2019-05-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109783801B (en) | Electronic device, multi-label classification method and storage medium | |
US10417350B1 (en) | Artificial intelligence system for automated adaptation of text-based classification models for multiple languages | |
CN109543181B (en) | Named entity model and system based on combination of active learning and deep learning | |
CN110827929B (en) | Disease classification code recognition method and device, computer equipment and storage medium | |
CN110298035B (en) | Word vector definition method, device, equipment and storage medium based on artificial intelligence | |
CN110688854B (en) | Named entity recognition method, device and computer readable storage medium | |
CN110705206B (en) | Text information processing method and related device | |
CN109190110A (en) | A kind of training method of Named Entity Extraction Model, system and electronic equipment | |
CN111428493A (en) | Entity relationship acquisition method, device, equipment and storage medium | |
CN112464662B (en) | Medical phrase matching method, device, equipment and storage medium | |
CN110569332B (en) | Sentence feature extraction processing method and device | |
CN111143571B (en) | Entity labeling model training method, entity labeling method and device | |
WO2022222300A1 (en) | Open relationship extraction method and apparatus, electronic device, and storage medium | |
CN112052684A (en) | Named entity identification method, device, equipment and storage medium for power metering | |
CN112188311B (en) | Method and apparatus for determining video material of news | |
CN103823857A (en) | Space information searching method based on natural language processing | |
CN111401065A (en) | Entity identification method, device, equipment and storage medium | |
CN113326380A (en) | Equipment measurement data processing method, system and terminal based on deep neural network | |
CN113723077B (en) | Sentence vector generation method and device based on bidirectional characterization model and computer equipment | |
CN109657052B (en) | Method and device for extracting fine-grained knowledge elements contained in paper abstract | |
CN112989043B (en) | Reference resolution method, reference resolution device, electronic equipment and readable storage medium | |
CN114416976A (en) | Text labeling method and device and electronic equipment | |
CN112199954A (en) | Disease entity matching method and device based on voice semantics and computer equipment | |
CN110851597A (en) | Method and device for sentence annotation based on similar entity replacement | |
CN114398482A (en) | Dictionary construction method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||