CN109753660B - LSTM-based winning bid web page named entity extraction method - Google Patents
- Publication number
- CN109753660B (application CN201910013185.2A)
- Authority
- CN
- China
- Prior art keywords
- word
- winning
- lstm
- text
- bid
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to a named entity recognition method for winning bid data, comprising the following steps: cleaning the text data of the winning bid web page to obtain the winning bid text; using a Lattice-LSTM as the coding layer to obtain semantic information features of the text data; using an LSTM as the decoding layer to tag each character and mark the entity information in the sentence sequence; performing rule correction and formatting; and finally outputting the recognized named entities of the winning bid web page. Based on the Lattice-LSTM-LSTM model, the invention can efficiently identify the named entities in the bid-winning item detail pages of bidding websites.
Description
Technical Field
The invention relates to the technical field of named entity recognition, in particular to a method for extracting named entities of a winning web page based on LSTM.
Background
Named entity recognition is one of the fundamental tasks of natural language processing. It aims to identify named entities such as person names, place names, and organization names in a corpus. Because the number of such named entities keeps growing, they can rarely be listed exhaustively in a dictionary, yet each type follows its own compositional regularities; their recognition is therefore usually handled separately from lexical morphology tasks (e.g., Chinese word segmentation) and is known as named entity recognition.
As a basic task of natural language processing, named entity recognition has attracted close attention from many experts and scholars, and a number of optimization algorithms and models have been proposed. Some researchers proposed a named entity recognition algorithm based on a cascaded (stacked) HMM model, which first recognizes person and place names and then uses them as features for higher-level organization name recognition. Others proposed a Chinese named entity recognition algorithm based on conditional random fields, which achieves good results using characters, boundaries, parts of speech, and entity dictionaries as features. A bootstrapping-based method has been proposed that expands a seed word list to alleviate the shortage of manually annotated data. A named entity recognition algorithm based on a BLSTM neural network architecture no longer depends directly on hand-crafted features and domain knowledge; instead it uses context-based word vectors and character-based word vectors, the former expressing the context of named entities and the latter expressing the prefix, suffix, and domain information that compose them. A named entity recognition algorithm based on the BLSTM-CRF model observes that, when labeling a sentence sequence, the labels of adjacent words are not independent; it considers the label information of the previous word together with the current word's information when tagging the current word, and replaces the softmax output layer with a CRF to produce the final prediction for each word. Finally, a deep neural network model based on stacked autoencoder classifiers solves the conversion from a Chinese text sequence to the model's input vectors, proposing vectorized forward- and back-propagation formulas convenient for engineering implementation.
Most current named entity recognition algorithms recognize person, place, and organization names without subdividing them further, and they perform poorly on long entities.
Disclosure of Invention
Therefore, the invention aims to provide an LSTM-based winning bid web page named entity extraction method that can rapidly and effectively identify named entities in the bid-winning item detail pages of bidding websites.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a method for extracting a winning bid web page named entity based on LSTM specifically comprises the following steps:
step A: cleaning text data of the bid-winning web page to be extracted to obtain a bid-winning text;
step B: taking a Lattice-LSTM model as the coding layer, and taking the winning bid text as the input of the coding layer to obtain the semantic information features of the winning bid text;
step C: taking the LSTM model as a decoding layer, and taking the semantic information characteristics of the obtained winning bid text as the input of the decoding layer, and marking each word in the winning bid text;
step D: performing rule correction and formatting on the obtained labeled winning bid text;
step E: and outputting the identified named entity.
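Steps A through E above can be sketched as a minimal pipeline. All function names here are hypothetical placeholders, not from the patent, and the cleaning step is a simplified assumption (tag stripping and whitespace collapsing).

```python
import re

def clean_webpage(html):
    """Step A (sketch): strip tags and collapse whitespace to obtain the bid text."""
    text = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)          # drop remaining tags
    return re.sub(r"\s+", " ", text).strip()      # collapse whitespace

def extract_entities(html, encode, decode, postprocess):
    """Steps A-E chained; `encode`, `decode` and `postprocess` stand in for
    the coding layer (step B), the decoding layer (step C), and the rule
    correction/formatting of step D."""
    text = clean_webpage(html)         # step A
    feats = encode(text)               # step B: Lattice-LSTM coding layer
    tags = decode(feats)               # step C: LSTM decoding layer
    return postprocess(text, tags)     # steps D-E
```

The three callables are injected so the sketch stays independent of any particular model implementation.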
Further, the step B specifically includes:
step B1: converting each character of the winning bid text into a character vector;
wherein the j-th character c_j of the winning bid text is converted into a character vector x_j^c, with the calculation formula:
x_j^c = e^c(c_j)
wherein e^c denotes the character vector mapping table.
Step B2: converting words in the winning bid text into word vectors;
step B3: and inputting the word vector into a Lattice-LSTM model, and obtaining semantic information characteristics of the winning bid text by using the Lattice-LSTM model.
Further, the step B2 specifically includes:
step B21: constructing a word list D using a Trie tree according to the large-scale corpus;
step B22: initializing an empty matching word set P of the winning bid text;
step B23: traversing the first word of the winning bid text as the current word, and executing step B24;
step B24: adding every word w_{b,e} in word list D that begins with the current character to the set P;
wherein b denotes the position of the word's first character in the sentence, and e denotes the position of its last character;
step B25, taking the next character of the current character as the current character, and iteratively executing the step B24 until the last character of the winning bid text is finished;
step B26: after the traversal is finished, converting each word w_{b,e} in the set P into a word vector x_{b,e}^w, with the calculation formula:
x_{b,e}^w = e^w(w_{b,e})
wherein e^w is the word vector mapping table.
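Steps B21 through B25 can be sketched with a minimal dictionary-based trie; the class layout and the zero-based (b, e) positions are conventions of this sketch (the patent counts positions within the sentence).

```python
class Trie:
    """Minimal trie for the word list D (step B21)."""
    def __init__(self, words):
        self.root = {}
        for w in words:
            node = self.root
            for ch in w:
                node = node.setdefault(ch, {})
            node["$"] = True  # end-of-word marker

    def prefixes(self, text, start):
        """All words of D that begin at position `start` of `text` (step B24)."""
        node, out = self.root, []
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break
            if "$" in node:
                out.append((start, end, text[start:end + 1]))  # (b, e, word), 0-based
        return out

def match_words(text, trie):
    """Steps B22-B25: scan every character position and collect the set P."""
    P = []
    for b in range(len(text)):
        P.extend(trie.prefixes(text, b))
    return P
```

Each matched tuple would then be mapped through the word vector table e^w as in step B26.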
Further, the step B3 specifically includes the following steps:
for each sentence in the text, sequentially inputting the word vector sequence obtained in the step B1And the word vector sequence obtained in the step B2 +.>In the Lattice-LSTM model, a vector representation sequence of semantic information of each word context is output, and a specific calculation formula is as follows:
is the word vector of the j-th word in the sentence,/>Is a word vector of words ending with the j-th word in the sentence,/for example>The output of the moment j; />Weight matrix for word level LSTM, +.> Bias terms for word level LSTM; />Is the forget gate of the word level LSTM at the moment j; />Is the input gate of the word level LSTM at the moment j; />Is a candidate memory vector of the word level LSTM at the moment j; />Is the memory vector of the word level LSTM at the moment j;weight matrix for character level LSTM, +.>Bias terms that are character level LSTM; />Is the input gate of the character level LSTM at the moment j; />Is a candidate memory vector of the word level LSTM at the moment j; />Is the memory vector of the word level LSTM at the moment j; />Is the output gate of the word level LSTM at the moment j; /> Is to calculate->Weight at that time.
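A simplified numpy sketch of one character position of the step-B3 cell may help. The fused weight layout (one matrix per cell producing all gates) and the fallback to a plain LSTM update when no lexicon word ends at a position are assumptions of this sketch, not claims of the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gates(x, h, W, b, H):
    """i, f, o, c~ of an LSTM cell computed from the concatenation [x; h]."""
    z = W @ np.concatenate([x, h]) + b
    return (sigmoid(z[:H]), sigmoid(z[H:2 * H]),
            sigmoid(z[2 * H:3 * H]), np.tanh(z[3 * H:]))

def lattice_step(x_c, h_prev, c_prev, words, Wc, bc, Ww, bw, Wl, bl, H):
    """One character position j of a Lattice-LSTM (simplified sketch).

    `words` is a list of (x_w, h_b, c_b): the vector of a lexicon word
    ending at j plus the character-level h and c at its first character b.
    """
    i, f, o, c_tilde = gates(x_c, h_prev, Wc, bc, H)
    if not words:                        # no lexicon word ends here:
        c = f * c_prev + i * c_tilde     # plain character-level LSTM update
        return o * np.tanh(c), c
    word_cells, link_gates = [], []
    for x_w, h_b, c_b in words:          # word-level cells c^w_{b,j}
        iw, fw, _, cw_tilde = gates(x_w, h_b, Ww, bw, H)
        word_cells.append(fw * c_b + iw * cw_tilde)
        # extra gate g_{b,j}: how much of this word cell flows into position j
        link_gates.append(sigmoid(Wl @ np.concatenate([x_c, word_cells[-1]]) + bl))
    # normalize the word gates and the character input gate into weights alpha
    g = np.stack(link_gates + [i])                 # shape (k + 1, H)
    alpha = np.exp(g) / np.exp(g).sum(axis=0)
    c = sum(a * cw for a, cw in zip(alpha[:-1], word_cells)) + alpha[-1] * c_tilde
    return o * np.tanh(c), c
```

Running this over a sentence left to right, with `words` supplied from the set P of step B2, yields the h_j^c sequence consumed by the decoding layer.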
Further, the step C specifically includes:
step C1: for the named entity recognition task of the winning bid web page, the characters in the data are divided into two classes;
wherein the first class contains characters unrelated to any entity, denoted by the label "O"; the second class contains characters associated with an entity, whose labels consist of three parts;
step C2: inputting the hidden state information h_1^c, ..., h_n^c obtained in step B, which represents the semantic information of the text, into the LSTM model of the decoding layer, and calculating the output state (label vector) y_j of each character under the influence of its context characters, following the standard LSTM recurrence;
step C3: inputting the label vector y_j into a Softmax classifier, performing a normalization operation, and calculating the probability that each character in the text is assigned each label, with the specific formula:
p(t | y_j) = exp(w_t · y_j + b_t) / Σ_{k=1}^{N_t} exp(w_k · y_j + b_k)
wherein W_y = [w_1; ...; w_{N_t}] is a weight matrix, b_y = [b_1; ...; b_{N_t}] is the bias term, and N_t is the number of label types;
step C4: using the log-likelihood function as the loss function, and updating the model parameters iteratively by back-propagation with stochastic gradient descent so as to minimize the loss function and train the model; the specific calculation formula is as follows:
J(Θ) = − Σ_{x ∈ D} Σ_{j=1}^{L_x} I(t_j) · log p(t_j | x; Θ)
wherein D represents the training set, L_x is the length of sentence x, p(t_j | x; Θ) is the normalized probability of the label t_j of the j-th character in sentence x, and Θ represents the model parameters; I(·) is a selection function that distinguishes the loss of the label 'O' from the loss of labels that indicate entities, a typical choice being:
I(t) = λ if t = 'O', and I(t) = 1 otherwise, where λ < 1 down-weights the dominant 'O' label.
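Steps C3 and C4 can be sketched as follows. The down-weighting constant `o_weight` standing in for the I(O) selection function is an assumption of this sketch, as the patent does not state its exact value.

```python
import numpy as np

def label_probs(h, W_y, b_y):
    """Step C3: softmax over the N_t label scores for one character."""
    scores = W_y @ h + b_y
    e = np.exp(scores - scores.max())   # subtract max for numerical stability
    return e / e.sum()

def sentence_loss(probs, gold, o_index, o_weight=0.3):
    """Step C4 (sketch): negative log-likelihood in which the 'O' label is
    down-weighted by I(O) = o_weight; o_weight is an assumed value."""
    loss = 0.0
    for p, t in zip(probs, gold):
        w = o_weight if t == o_index else 1.0
        loss -= w * np.log(p[t])
    return loss
```

In training, this loss would be minimized over the whole training set with stochastic gradient descent, as the patent describes.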
further, the named entity comprises a bid-winning organization, a region where the bid-winning organization is located, a bid-winning amount, a bid-winning organization contact person and a bid-winning project name, and a bid-winning time.
Further, the step D specifically includes:
step D1: c, carrying out regular correction processing on the marked data obtained in the step C;
step D2: and formatting the corrected data.
Further, the step D1 specifically includes:
Step D11: for the winning bid amount, using regular expressions to judge whether the entity contains Arabic numerals or Chinese capital numerals; if not, the entity is judged not to be a winning bid amount and is discarded.
Step D12: for the winning bid time, discarding entities that do not match a valid date composition pattern.
Step D13: for the project name, since a project name entity is generally a long string and almost never consists of only two or three characters, discarding recognized project name entities whose string length is less than 4.
Step D14: when the data of one label appears multiple times in the same category, retaining only the named entity with the longest string length.
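Steps D11 through D14 can be sketched with regular expressions. The label keys ("amount", "time", "project") and the exact patterns are assumptions for illustration; the patent does not specify them.

```python
import re

# D11: Arabic digits (half/full width) or Chinese numerals, incl. capital forms
AMOUNT_NUM = re.compile(r"[0-9０-９]|[零一二三四五六七八九十壹贰叁肆伍陆柒捌玖拾佰仟]")
# D12: an assumed date composition pattern, e.g. 2019-01-07 or 2019年1月7日
DATE_PAT = re.compile(r"^\d{4}[-年/.]\d{1,2}[-月/.]\d{1,2}日?$")

def correct(entities):
    """Steps D11-D14 (sketch): `entities` maps a label to candidate strings."""
    out = {}
    for label, values in entities.items():
        kept = []
        for v in values:
            if label == "amount" and not AMOUNT_NUM.search(v):
                continue                  # D11: no Arabic/Chinese numeral
            if label == "time" and not DATE_PAT.match(v):
                continue                  # D12: not a valid date pattern
            if label == "project" and len(v) < 4:
                continue                  # D13: project names are long strings
            kept.append(v)
        # D14: keep only the longest surviving string per label
        out[label] = max(kept, key=len) if kept else None
    return out
```
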
Further, in the step D2, the formatting process is performed on the named entity, which specifically includes the following steps:
step D21: judging whether the entity contains Chinese magnitude units (bai/hundred, qian/thousand, wan/ten thousand, yi/hundred million) or foreign currency units such as U.S. dollars or Japanese yen, and if so, converting the units;
step D22: for the winning time, the conversion is performed in the form of date format YYYY-MM-DD.
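Steps D21 and D22 can be sketched as below. Currency conversion is omitted and the unit table covers only the Chinese magnitude units named in step D21; both simplifications are assumptions of this sketch.

```python
import re

UNITS = {"百": 100, "千": 1000, "仟": 1000, "万": 10**4, "亿": 10**8}

def normalize_amount(text):
    """Step D21 (sketch): convert '123.5万元'-style amounts to a plain number."""
    m = re.match(r"([0-9]+(?:\.[0-9]+)?)([百千仟万亿]?)", text)
    if not m:
        return None
    return float(m.group(1)) * UNITS.get(m.group(2), 1)

def normalize_date(text):
    """Step D22: rewrite a recognized date into the YYYY-MM-DD format."""
    m = re.search(r"(\d{4})[-年/.](\d{1,2})[-月/.](\d{1,2})", text)
    if not m:
        return None
    y, mo, d = m.groups()
    return f"{y}-{int(mo):02d}-{int(d):02d}"
```
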
Compared with the prior art, the invention has the following beneficial effects:
the invention is based on the Lattice-LSTM-LSTM model, can efficiently identify the named entity in the detail page of the bid-winning item of the bid-winning website, and can well identify the long entity.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
Referring to fig. 1, the invention provides a method for extracting named entities of a winning bid web page based on LSTM, which specifically includes the following steps:
step A: cleaning text data of the bid-winning web page to be extracted to obtain a bid-winning text;
step B: taking a Lattice-LSTM model as the coding layer, and taking the winning bid text as the input of the coding layer to obtain the semantic information features of the winning bid text;
step B1: converting each character of the winning bid text into a character vector;
wherein the j-th character c_j of the winning bid text is converted into a character vector x_j^c, with the calculation formula:
x_j^c = e^c(c_j)
wherein e^c denotes the character vector mapping table.
Step B2: converting words in the winning bid text into word vectors;
step B21: constructing a word list D using a Trie tree according to the large-scale corpus;
step B22: initializing an empty matching word set P of the winning bid text;
step B23: traversing the first word of the winning bid text as the current word, and executing step B24;
step B24: adding every word w_{b,e} in word list D that begins with the current character to the set P;
wherein b denotes the position of the word's first character in the sentence, and e denotes the position of its last character;
step B25, taking the next character of the current character as the current character, and iteratively executing the step B24 until the last character of the winning bid text is finished;
step B26: after the traversal is finished, converting each word w_{b,e} in the set P into a word vector x_{b,e}^w, with the calculation formula:
x_{b,e}^w = e^w(w_{b,e})
wherein e^w is the word vector mapping table.
Step B3: and inputting the word vector into a Lattice-LSTM model, and obtaining semantic information characteristics of the winning bid text by using the Lattice-LSTM model.
For each sentence in the text, the character vector sequence obtained in step B1 and the word vector set obtained in step B2 are input in order into the Lattice-LSTM model, which outputs a vector representation sequence of the contextual semantic information of each character; the specific calculation formulas are as follows:

for each matched word w_{b,e} (word-level cell):
i_{b,e}^w = σ(W_i^w [x_{b,e}^w ; h_b^c] + b_i^w)
f_{b,e}^w = σ(W_f^w [x_{b,e}^w ; h_b^c] + b_f^w)
c̃_{b,e}^w = tanh(W_c^w [x_{b,e}^w ; h_b^c] + b_c^w)
c_{b,e}^w = f_{b,e}^w ⊙ c_b^c + i_{b,e}^w ⊙ c̃_{b,e}^w

for the j-th character (character-level cell):
i_j^c = σ(W_i^c [x_j^c ; h_{j-1}^c] + b_i^c)
o_j^c = σ(W_o^c [x_j^c ; h_{j-1}^c] + b_o^c)
c̃_j^c = tanh(W_c^c [x_j^c ; h_{j-1}^c] + b_c^c)
g_{b,j} = σ(W^l [x_j^c ; c_{b,j}^w] + b^l), for every word w_{b,j} ending at position j
α_{b,j} = exp(g_{b,j}) / (exp(i_j^c) + Σ_{b'} exp(g_{b',j}))
α_j = exp(i_j^c) / (exp(i_j^c) + Σ_{b'} exp(g_{b',j}))
c_j^c = Σ_b α_{b,j} ⊙ c_{b,j}^w + α_j ⊙ c̃_j^c
h_j^c = o_j^c ⊙ tanh(c_j^c)

wherein x_j^c is the character vector of the j-th character in the sentence; x_{b,e}^w is the word vector of a word ending at the j-th character; h_j^c is the output at position j; W_i^w, W_f^w, W_c^w and b_i^w, b_f^w, b_c^w are the weight matrices and bias terms of the word-level LSTM; f_{b,e}^w, i_{b,e}^w, c̃_{b,e}^w and c_{b,e}^w are the forget gate, input gate, candidate memory vector and memory vector of the word-level LSTM; W_i^c, W_o^c, W_c^c, W^l and b_i^c, b_o^c, b_c^c, b^l are the weight matrices and bias terms of the character-level LSTM; i_j^c, c̃_j^c, c_j^c and o_j^c are its input gate, candidate memory vector, memory vector and output gate; σ is the sigmoid function, ⊙ denotes element-wise multiplication, [· ; ·] denotes concatenation, and α_{b,j}, α_j are the weights used when computing c_j^c.
Step C: taking the LSTM model as a decoding layer, and taking the semantic information characteristics of the obtained winning bid text as the input of the decoding layer, and marking each word in the winning bid text;
step C1: for the named entity recognition task of the winning bid web page, the characters in the data are divided into two classes;
wherein the first class contains characters unrelated to any entity, denoted by the label "O"; the second class contains characters associated with an entity, whose labels consist of three parts;
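The labeling scheme of step C1 can be illustrated with a character-level tagger. The patent states that entity labels consist of three parts without enumerating them here; the BIO-style layout below (position prefix, separator, entity type) is an assumed interpretation, and the entity-type name "ORG" is a hypothetical example.

```python
def bio_tags(text, spans):
    """Sketch of a character-level tagging scheme (step C1); the
    'B-'/'I-' prefixes plus an entity-type suffix are an assumed
    three-part label layout, not taken verbatim from the patent."""
    tags = ["O"] * len(text)              # first class: not entity-related
    for start, end, etype in spans:       # `end` inclusive, 0-based
        tags[start] = f"B-{etype}"        # entity-opening character
        for k in range(start + 1, end + 1):
            tags[k] = f"I-{etype}"        # entity-continuing characters
    return tags
```
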
step C2: inputting the hidden state information h_1^c, ..., h_n^c obtained in step B, which represents the semantic information of the text, into the LSTM model of the decoding layer, and calculating the output state (label vector) y_j of each character under the influence of its context characters, following the standard LSTM recurrence;
step C3: inputting the label vector y_j into a Softmax classifier, performing a normalization operation, and calculating the probability that each character in the text is assigned each label, with the specific formula:
p(t | y_j) = exp(w_t · y_j + b_t) / Σ_{k=1}^{N_t} exp(w_k · y_j + b_k)
wherein W_y = [w_1; ...; w_{N_t}] is a weight matrix, b_y = [b_1; ...; b_{N_t}] is the bias term, and N_t is the number of label types;
step C4: using the log-likelihood function as the loss function, and updating the model parameters iteratively by back-propagation with stochastic gradient descent so as to minimize the loss function and train the model; the specific calculation formula is as follows:
J(Θ) = − Σ_{x ∈ D} Σ_{j=1}^{L_x} I(t_j) · log p(t_j | x; Θ)
wherein D represents the training set, L_x is the length of sentence x, p(t_j | x; Θ) is the normalized probability of the label t_j of the j-th character in sentence x, and Θ represents the model parameters; I(·) is a selection function that distinguishes the loss of the label 'O' from the loss of labels that indicate entities, a typical choice being:
I(t) = λ if t = 'O', and I(t) = 1 otherwise, where λ < 1 down-weights the dominant 'O' label.
step D: performing rule correction and formatting on the obtained labeled winning bid text;
step D1: c, carrying out regular correction processing on the marked data obtained in the step C;
Step D11: for the winning bid amount, using regular expressions to judge whether the entity contains Arabic numerals or Chinese capital numerals; if not, the entity is judged not to be a winning bid amount and is discarded.
Step D12: for the winning bid time, discarding entities that do not match a valid date composition pattern.
Step D13: for the project name, since a project name entity is generally a long string and almost never consists of only two or three characters, discarding recognized project name entities whose string length is less than 4.
Step D14: when the data of one label appears multiple times in the same category, retaining only the named entity with the longest string length.
Step D2: and formatting the corrected data.
Step D21: judging whether the entity contains Chinese magnitude units (bai/hundred, qian/thousand, wan/ten thousand, yi/hundred million) or foreign currency units such as U.S. dollars or Japanese yen, and if so, converting the units;
step D22: for the winning time, the conversion is performed in the form of date format YYYY-MM-DD.
Step E: and outputting the identified bidding institutions, the regions where the bidding institutions are located, the bid amount, the bidding institution contacts, the bidding project names and the named entities of the bid time.
The foregoing description is only of the preferred embodiments of the invention, and all changes and modifications that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Claims (6)
1. The method for extracting the named entity of the winning bid web page based on the LSTM is characterized by comprising the following steps of:
step A: cleaning text data of the bid-winning web page to be extracted to obtain a bid-winning text;
step B: taking a Lattice-LSTM model as the coding layer, and taking the winning bid text as the input of the coding layer to obtain the semantic information features of the winning bid text;
step C: taking the LSTM model as a decoding layer, and taking the semantic information characteristics of the obtained winning bid text as the input of the decoding layer, and marking each word in the winning bid text;
step D: performing rule correction and formatting on the obtained labeled winning bid text;
step E: outputting the identified named entity;
the step B specifically comprises the following steps:
step B1: converting each character of the winning bid text into a character vector;
wherein the j-th character c_j of the winning bid text is converted into a character vector x_j^c, with the calculation formula:
x_j^c = e^c(c_j)
wherein e^c denotes the character vector mapping table;
step B2: converting words in the winning bid text into word vectors;
step B3: inputting the word vector into a Lattice-LSTM model, and obtaining semantic information characteristics of the winning bid text by using the Lattice-LSTM model;
the step B3 specifically comprises the following steps:
for each sentence in the text, the character vector sequence obtained in step B1 and the word vector set obtained in step B2 are input in order into the Lattice-LSTM model, which outputs a vector representation sequence of the contextual semantic information of each character; the specific calculation formulas are as follows:

for each matched word w_{b,e} (word-level cell):
i_{b,e}^w = σ(W_i^w [x_{b,e}^w ; h_b^c] + b_i^w)
f_{b,e}^w = σ(W_f^w [x_{b,e}^w ; h_b^c] + b_f^w)
c̃_{b,e}^w = tanh(W_c^w [x_{b,e}^w ; h_b^c] + b_c^w)
c_{b,e}^w = f_{b,e}^w ⊙ c_b^c + i_{b,e}^w ⊙ c̃_{b,e}^w

for the j-th character (character-level cell):
i_j^c = σ(W_i^c [x_j^c ; h_{j-1}^c] + b_i^c)
o_j^c = σ(W_o^c [x_j^c ; h_{j-1}^c] + b_o^c)
c̃_j^c = tanh(W_c^c [x_j^c ; h_{j-1}^c] + b_c^c)
g_{b,j} = σ(W^l [x_j^c ; c_{b,j}^w] + b^l), for every word w_{b,j} ending at position j
α_{b,j} = exp(g_{b,j}) / (exp(i_j^c) + Σ_{b'} exp(g_{b',j}))
α_j = exp(i_j^c) / (exp(i_j^c) + Σ_{b'} exp(g_{b',j}))
c_j^c = Σ_b α_{b,j} ⊙ c_{b,j}^w + α_j ⊙ c̃_j^c
h_j^c = o_j^c ⊙ tanh(c_j^c)

wherein x_j^c is the character vector of the j-th character in the sentence; x_{b,e}^w is the word vector of a word ending at the j-th character; h_j^c is the output at position j; W_i^w, W_f^w, W_c^w and b_i^w, b_f^w, b_c^w are the weight matrices and bias terms of the word-level LSTM; f_{b,e}^w, i_{b,e}^w, c̃_{b,e}^w and c_{b,e}^w are the forget gate, input gate, candidate memory vector and memory vector of the word-level LSTM; W_i^c, W_o^c, W_c^c, W^l and b_i^c, b_o^c, b_c^c, b^l are the weight matrices and bias terms of the character-level LSTM; i_j^c, c̃_j^c, c_j^c and o_j^c are its input gate, candidate memory vector, memory vector and output gate; σ is the sigmoid function, ⊙ denotes element-wise multiplication, [· ; ·] denotes concatenation, and α_{b,j}, α_j are the weights used when computing c_j^c;
the step C specifically comprises the following steps:
step C1: for the named entity recognition task of the winning bid web page, the characters in the data are divided into two classes;
wherein the first class contains characters unrelated to any entity, denoted by the label "O"; the second class contains characters associated with an entity, whose labels consist of three parts;
step C2: inputting the hidden state information h_1^c, ..., h_n^c obtained in step B, which represents the semantic information of the text, into the LSTM model of the decoding layer, and calculating the output state (label vector) y_j of each character under the influence of its context characters, following the standard LSTM recurrence;
step C3: inputting the label vector y_j into a Softmax classifier, performing a normalization operation, and calculating the probability that each character in the text is assigned each label, with the specific formula:
p(t | y_j) = exp(w_t · y_j + b_t) / Σ_{k=1}^{N_t} exp(w_k · y_j + b_k)
wherein W_y = [w_1; ...; w_{N_t}] is a weight matrix, b_y = [b_1; ...; b_{N_t}] is the bias term, and N_t is the number of label types;
step C4: using the log-likelihood function as the loss function, and updating the model parameters iteratively by back-propagation with stochastic gradient descent so as to minimize the loss function and train the model; the specific calculation formula is as follows:
J(Θ) = − Σ_{x ∈ D} Σ_{j=1}^{L_x} I(t_j) · log p(t_j | x; Θ)
wherein D represents the training set, L_x is the length of sentence x, p(t_j | x; Θ) is the normalized probability of the label t_j of the j-th character in sentence x, and Θ represents the model parameters; I(·) is a selection function that distinguishes the loss of the label 'O' from the loss of labels that indicate entities, a typical choice being:
I(t) = λ if t = 'O', and I(t) = 1 otherwise, where λ < 1 down-weights the dominant 'O' label.
2. the method for extracting the named entity of the winning web page based on the LSTM as recited in claim 1, wherein the step B2 specifically includes:
step B21: constructing a word list D using a Trie tree according to the large-scale corpus;
step B22: initializing an empty matching word set P of the winning bid text;
step B23: traversing the first word of the winning bid text as the current word, and executing step B24;
step B24: adding every word w_{b,e} in word list D that begins with the current character to the set P;
wherein b denotes the position of the word's first character in the sentence, and e denotes the position of its last character;
step B25, taking the next character of the current character as the current character, and iteratively executing the step B24 until the last character of the winning bid text is finished;
step B26: after the traversal is finished, converting each word w_{b,e} in the set P into a word vector x_{b,e}^w, with the calculation formula:
x_{b,e}^w = e^w(w_{b,e})
wherein e^w is the word vector mapping table.
3. The LSTM-based bid-winning web page named entity extraction method of claim 1, wherein: the named entities comprise a bid-winning institution, a region where the bid-winning institution is located, a bid-winning amount, a bid-winning institution contact person, a bid-winning project name and a bid-winning time.
4. The method for extracting named entities from a winning web page based on LSTM as recited in claim 3, wherein step D is specifically:
step D1: c, carrying out regular correction processing on the marked data obtained in the step C;
step D2: and formatting the corrected data.
5. The LSTM-based extraction method of named entities of a winning web page according to claim 4, wherein the step D1 specifically includes:
step D11: for the winning bid amount, using regular expressions to judge whether the entity contains Arabic numerals or Chinese capital numerals; if not, the entity is judged not to be a winning bid amount and is discarded;
step D12: for the winning bid time, discarding entities that do not match a valid date composition pattern;
step D13: for the project name, since a project name entity is generally a long string and almost never consists of only two or three characters, discarding recognized project name entities whose string length is less than 4;
step D14: when the data of one label appears multiple times in the same category, retaining only the named entity with the longest string length.
6. The method for extracting named entities from a winning web page based on LSTM in claim 4, wherein in step D2, the named entities are formatted, specifically comprising the following steps:
step D21: judging whether the entity contains Chinese magnitude units (bai/hundred, qian/thousand, wan/ten thousand, yi/hundred million) or foreign currency units such as U.S. dollars or Japanese yen, and if so, converting the units;
step D22: for the winning time, the conversion is performed in the form of date format YYYY-MM-DD.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910013185.2A CN109753660B (en) | 2019-01-07 | 2019-01-07 | LSTM-based winning bid web page named entity extraction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910013185.2A CN109753660B (en) | 2019-01-07 | 2019-01-07 | LSTM-based winning bid web page named entity extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109753660A CN109753660A (en) | 2019-05-14 |
CN109753660B true CN109753660B (en) | 2023-06-13 |
Family
ID=66404567
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910013185.2A Active CN109753660B (en) | 2019-01-07 | 2019-01-07 | LSTM-based winning bid web page named entity extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109753660B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110334300A (en) * | 2019-07-10 | 2019-10-15 | 哈尔滨工业大学 | Text aid reading method towards the analysis of public opinion |
CN110738182A (en) * | 2019-10-21 | 2020-01-31 | 四川隧唐科技股份有限公司 | LSTM model unit training method and device for high-precision identification of bid amount |
CN112017016A (en) * | 2019-10-29 | 2020-12-01 | 河南拓普计算机网络工程有限公司 | Method for cleaning bid amount of bid-attracting bulletin |
CN110738319A (en) * | 2019-11-11 | 2020-01-31 | 四川隧唐科技股份有限公司 | LSTM model unit training method and device for recognizing bid-winning units based on CRF |
CN111078978B (en) * | 2019-11-29 | 2024-02-27 | 上海观安信息技术股份有限公司 | Network credit website entity identification method and system based on website text content |
CN111241832B (en) * | 2020-01-15 | 2023-08-15 | 北京百度网讯科技有限公司 | Core entity labeling method and device and electronic equipment |
CN111738002A (en) * | 2020-05-26 | 2020-10-02 | 北京信息科技大学 | Ancient text field named entity identification method and system based on Lattice LSTM |
CN111737969B (en) * | 2020-07-27 | 2020-12-08 | 北森云计算有限公司 | Resume parsing method and system based on deep learning |
CN112990845A (en) * | 2021-01-04 | 2021-06-18 | 江苏省测绘地理信息局信息中心 | Intelligent acquisition method for mapping market project |
CN112989807B (en) * | 2021-03-11 | 2021-11-23 | 重庆理工大学 | Long digital entity extraction method based on continuous digital compression coding |
CN112948588B (en) * | 2021-05-11 | 2021-07-30 | 中国人民解放军国防科技大学 | Chinese text classification method for quick information editing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN107832400A (en) * | 2017-11-01 | 2018-03-23 | 山东大学 | A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification |
CN108416058A (en) * | 2018-03-22 | 2018-08-17 | 北京理工大学 | A kind of Relation extraction method based on the enhancing of Bi-LSTM input informations |
CN108509423A (en) * | 2018-04-04 | 2018-09-07 | 福州大学 | A kind of acceptance of the bid webpage name entity abstracting method based on second order HMM |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8370128B2 (en) * | 2008-09-30 | 2013-02-05 | Xerox Corporation | Semantically-driven extraction of relations between named entities |
-
2019
- 2019-01-07 CN CN201910013185.2A patent/CN109753660B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN107832400A (en) * | 2017-11-01 | 2018-03-23 | 山东大学 | A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification |
CN108416058A (en) * | 2018-03-22 | 2018-08-17 | 北京理工大学 | A kind of Relation extraction method based on the enhancing of Bi-LSTM input informations |
CN108509423A (en) * | 2018-04-04 | 2018-09-07 | 福州大学 | A kind of acceptance of the bid webpage name entity abstracting method based on second order HMM |
Non-Patent Citations (1)
Title |
---|
Tang Min. Research on Chinese Entity Relation Extraction Methods Based on Deep Learning. Wanfang Data Dissertation Database, 2018. *
Also Published As
Publication number | Publication date |
---|---|
CN109753660A (en) | 2019-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109753660B (en) | LSTM-based winning bid web page named entity extraction method | |
CN110427623B (en) | Semi-structured document knowledge extraction method and device, electronic equipment and storage medium | |
CN108984526B (en) | Document theme vector extraction method based on deep learning | |
CN107729309B (en) | Deep learning-based Chinese semantic analysis method and device | |
CN109062893B (en) | Commodity name identification method based on full-text attention mechanism | |
CN104834747B (en) | Short text classification method based on convolutional neural networks | |
CN106980609A (en) | A kind of name entity recognition method of the condition random field of word-based vector representation | |
CN109359291A (en) | A kind of name entity recognition method | |
CN113392209B (en) | Text clustering method based on artificial intelligence, related equipment and storage medium | |
CN113591483A (en) | Document-level event argument extraction method based on sequence labeling | |
CN111783394A (en) | Training method of event extraction model, event extraction method, system and equipment | |
CN110555084A (en) | remote supervision relation classification method based on PCNN and multi-layer attention | |
CN112069312B (en) | Text classification method based on entity recognition and electronic device | |
CN112966525B (en) | Law field event extraction method based on pre-training model and convolutional neural network algorithm | |
CN106611041A (en) | New text similarity solution method | |
CN110750646B (en) | Attribute description extracting method for hotel comment text | |
CN112800184B (en) | Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction | |
CN110134934A (en) | Text emotion analysis method and device | |
CN113255320A (en) | Entity relation extraction method and device based on syntax tree and graph attention machine mechanism | |
CN108763192B (en) | Entity relation extraction method and device for text processing | |
CN111444704B (en) | Network safety keyword extraction method based on deep neural network | |
CN111191031A (en) | Entity relation classification method of unstructured text based on WordNet and IDF | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN115510864A (en) | Chinese crop disease and pest named entity recognition method fused with domain dictionary | |
CN115269834A (en) | High-precision text classification method and device based on BERT |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||