CN109753660A - LSTM-based winning bid web page named entity extraction method - Google Patents
LSTM-based winning bid web page named entity extraction method
- Publication number
- CN109753660A (application CN201910013185.2A)
- Authority
- CN
- China
- Prior art keywords
- bid
- word
- acceptance
- lstm
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Machine Translation (AREA)
Abstract
The present invention relates to a named entity recognition method for winning-bid data, comprising the following steps: cleaning the text data of a winning-bid web page to obtain the winning-bid text; using a Lattice-LSTM as the encoding layer to obtain the semantic features of the text; using an LSTM as the decoding layer to assign an entity label to each character, thereby marking the entity information in the sentence sequence; applying rule-based correction and formatting; and finally outputting the named entities recognized in the winning-bid web page. Based on a Lattice-LSTM-LSTM model, the invention can effectively identify the named entities in the winning-bid project detail pages of bidding websites.
Description
Technical field
The present invention relates to the technical field of named entity recognition, and in particular to an LSTM-based method for extracting named entities from winning-bid web pages.
Background technique
Named entity recognition is a fundamental task in natural language processing. Its purpose is to identify named entities such as person names, place names, and organization names in a corpus. Because the number of such entities keeps growing, they usually cannot be listed exhaustively in a dictionary, and each type has its own formation patterns. Their identification is therefore usually handled as a separate step, independent of lexical-analysis tasks such as Chinese word segmentation, and is called named entity recognition.
As a fundamental task in natural language processing, named entity recognition has attracted close attention from many experts and scholars, who have proposed a number of algorithms and models:
A named entity recognition algorithm based on a stacked HMM model, which first recognizes person names and place names and then uses them as features for higher-level organization name recognition.
A Chinese named entity recognition algorithm based on conditional random fields, which achieves good results using lexical, boundary, part-of-speech, and entity-dictionary features.
A bootstrapping-based method, which expands a seed vocabulary to address the shortage of manually labeled data.
A named entity recognition algorithm based on a BLSTM neural network, which no longer depends directly on hand-crafted features and domain knowledge but uses context-based word vectors and character-based word vectors; the former express the contextual information of a named entity, and the latter express the prefix, suffix, and domain information that make up the entity.
A named entity recognition algorithm based on a BLSTM-CRF model. When a sentence is labeled as a sequence, the labels of its words are not independent, so the label information of the preceding words is taken into account when labeling the current word; a CRF layer replaces the softmax output layer to produce the final prediction for each word.
A deep neural network model based on a stacked auto-encoder classifier, which solves the conversion from Chinese text sequences to model input vectors and gives vectorized forward and backward propagation formulas that are convenient to implement.
At present, most named entity recognition algorithms identify only person names, place names, and organization names without dividing them further, and they perform poorly on long entities.
Summary of the invention
In view of this, the object of the present invention is to provide an LSTM-based method for extracting named entities from winning-bid web pages that can quickly and effectively identify the named entities in the winning-bid project detail pages of bidding websites.
To achieve the above object, the present invention adopts the following technical solution:
An LSTM-based method for extracting named entities from winning-bid web pages, comprising the following steps:
Step A: clean the text data of the winning-bid web page to be processed to obtain the winning-bid text;
Step B: use a Lattice-LSTM model as the encoding layer, with the winning-bid text as its input, to obtain the semantic features of the winning-bid text;
Step C: use an LSTM model as the decoding layer, with the semantic features of the winning-bid text as its input, to label each character in the winning-bid text;
Step D: apply rule-based correction and formatting to the labeled winning-bid text;
Step E: output the recognized named entities.
Further, step B specifically comprises:
Step B1: convert the characters in the winning-bid text into character vectors;
for the j-th character $c_j$ in the winning-bid text, the character vector $x_j^c$ is computed as $x_j^c = e^c(c_j)$,
where $e^c$ denotes the character vector mapping table.
Step B2: convert the words in the winning-bid text into word vectors;
Step B3: input the word vectors, together with the character vectors from step B1, into the Lattice-LSTM model to obtain the semantic features of the winning-bid text.
Further, step B2 specifically comprises:
Step B21: build a vocabulary D from a large-scale corpus using a trie;
Step B22: initialize an empty set P of matched words for the winning-bid text;
Step B23: start traversing the winning-bid text with its first character as the current character, and execute step B24;
Step B24: add to P every word $w_{b,e}$ in vocabulary D that matches the text starting at the current character;
where b is the position in the sentence of the word's first character and e is the position of its last character;
Step B25: take the next character as the current character and repeat step B24 until the last character of the winning-bid text has been processed;
Step B26: after the traversal, convert each $w_{b,e}$ in P into a word vector $x_{b,e}^w = e^w(w_{b,e})$,
where $e^w$ is the word vector mapping table.
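The matching loop of steps B21 to B25 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Trie` class, the `match_words` function, the 0-based positions, the `min_len=2` default (which skips single-character words), and the sample vocabulary are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


class Trie:
    """Minimal character trie holding the vocabulary D (step B21)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Trie"] = {}
        self.is_word = False

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True


def match_words(sentence: str, trie: Trie, min_len: int = 2) -> List[Tuple[int, int, str]]:
    """Return (b, e, word) for every vocabulary word found in the sentence.

    b and e are 0-based positions of the word's first and last character.
    min_len is a hypothetical parameter, not something the text specifies.
    """
    matches: List[Tuple[int, int, str]] = []      # the set P (step B22)
    for b in range(len(sentence)):                # steps B23/B25: each character in turn
        node = trie
        for e in range(b, len(sentence)):         # step B24: extend from the current character
            node = node.children.get(sentence[e])
            if node is None:
                break                             # no vocabulary word continues this way
            if node.is_word and e - b + 1 >= min_len:
                matches.append((b, e, sentence[b:e + 1]))
    return matches


# Hypothetical three-word vocabulary.
vocab = Trie()
for w in ["中标", "公告", "中标公告"]:
    vocab.insert(w)

P = match_words("中标公告", vocab)
```

Because the trie is walked once per start position, every word beginning at the current character is found in a single pass, including overlapping and nested words, which is what a lattice input needs.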
Further, step B3 is as follows:
For each sentence in the text, the character vector sequence obtained in step B1 and the word vector sequence obtained in step B2 are fed in order into the Lattice-LSTM model, which outputs for each character a vector representing its semantic information in context. The quantities used in the calculation are:
the character vector of the j-th character in the sentence, the word vector of a word in the sentence that ends at the j-th character, and the output at time j;
the weight matrices and bias term of the word-level LSTM;
the forget gate, input gate, candidate memory vector, and memory vector of the word-level LSTM at time j;
the weight matrices and bias term of the character-level LSTM;
the input gate of the character-level LSTM at time j;
a further candidate memory vector, memory vector, and output gate at time j, which the text also attributes to the word-level LSTM;
and the weights used in the calculation.
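For orientation, the quantities listed above line up with the Lattice LSTM formulation published by Zhang and Yang (2018, "Chinese NER Using Lattice LSTM"). The block below restates that published formulation as a reference sketch. It is an assumption that the patent follows it; the patent's own formulas and symbols may differ in detail. Here $h_j^c$ is the output for character j, $\odot$ is element-wise multiplication, and $\sigma$ is the sigmoid function.

```latex
% Character-level cell
\begin{aligned}
\begin{bmatrix} i_j^c \\ o_j^c \\ f_j^c \\ \tilde{c}_j^c \end{bmatrix}
  &= \begin{bmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{bmatrix}
     \left( W^{c\top} \begin{bmatrix} x_j^c \\ h_{j-1}^c \end{bmatrix} + b^c \right),
\qquad h_j^c = o_j^c \odot \tanh(c_j^c) \\[4pt]
% Word-level cell for a matched word spanning positions b..e
\begin{bmatrix} i_{b,e}^w \\ f_{b,e}^w \\ \tilde{c}_{b,e}^w \end{bmatrix}
  &= \begin{bmatrix} \sigma \\ \sigma \\ \tanh \end{bmatrix}
     \left( W^{w\top} \begin{bmatrix} x_{b,e}^w \\ h_b^c \end{bmatrix} + b^w \right),
\qquad c_{b,e}^w = f_{b,e}^w \odot c_b^c + i_{b,e}^w \odot \tilde{c}_{b,e}^w \\[4pt]
% Extra gate and weighted merge of word cells ending at character j
i_{b,e}^c &= \sigma\!\left( W^{l\top} \begin{bmatrix} x_e^c \\ c_{b,e}^w \end{bmatrix} + b^l \right) \\
c_j^c &= \sum_{b} \alpha_{b,j}^c \odot c_{b,j}^w + \alpha_j^c \odot \tilde{c}_j^c \\
\alpha_{b,j}^c &= \frac{\exp(i_{b,j}^c)}{\exp(i_j^c) + \sum_{b'} \exp(i_{b',j}^c)},
\qquad
\alpha_j^c = \frac{\exp(i_j^c)}{\exp(i_j^c) + \sum_{b'} \exp(i_{b',j}^c)}
\end{aligned}
```

The sums run over the start positions b of vocabulary words that end at character j. When no word ends at j, the published model falls back to the ordinary character-level update $c_j^c = f_j^c \odot c_{j-1}^c + i_j^c \odot \tilde{c}_j^c$.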
Further, step C specifically comprises:
Step C1: for the named entity recognition task on winning-bid web pages, divide the characters in the data into two classes;
the first class covers characters unrelated to any entity and is given the label "O"; the second class covers characters related to an entity, and the label of such a character consists of three parts;
Step C2: input the hidden state obtained in step B, which represents the semantic information of the text, into the LSTM model of the decoding layer, and compute the output state of each character under the influence of its context; this output is the label vector;
Step C3: input the label vector into a Softmax classifier, which normalizes it to give the probability that each character in the text carries each kind of label;
where $W_y$ is a weight matrix, $b_y$ is a bias term, and $N_t$ is the number of label kinds;
Step C4: use the log-likelihood function as the loss function and update the model parameters iteratively by back-propagation with stochastic gradient descent, training the model by minimizing the loss;
where D denotes the size of the training set, $L_j$ is the length of sentence $x_j$, the label term is the label of character t in sentence $x_j$, the probability term is the normalized probability, Θ denotes the model parameters, and I(O) is a selection function that distinguishes the loss of the label "O" from the loss of labels that indicate entities.
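The normalization in step C3 can be illustrated with a small sketch. It assumes the usual softmax form, $p^i = \exp(y^i) / \sum_{k=1}^{N_t} \exp(y^k)$ with $y = W_y T_j + b_y$. The symbol `T_j` for the label vector, the function names, and the toy numbers are assumptions made for this example, since the formula itself is not reproduced in the text.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Normalize raw label scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]


def label_probabilities(W_y: List[List[float]], b_y: List[float], T_j: List[float]) -> List[float]:
    """Compute softmax(W_y . T_j + b_y) for one character (assumed form of step C3)."""
    scores = [sum(w * t for w, t in zip(row, T_j)) + b for row, b in zip(W_y, b_y)]
    return softmax(scores)


# Toy numbers: N_t = 3 label kinds, label vector of size 2 (both hypothetical).
W_y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_y = [0.0, 0.0, 0.0]
probs = label_probabilities(W_y, b_y, [0.5, -0.5])
```

The predicted label for the character is then the index with the highest probability, for example `max(range(len(probs)), key=probs.__getitem__)`.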
Further, the named entities comprise the tendering organization, the winning organization, the location of the tendering organization, the winning bid amount, the contact person of the tendering organization, the tender project name, and the winning bid time.
Further, step D specifically comprises:
Step D1: apply rule-based correction to the labeled data obtained in step C;
Step D2: format the corrected data.
Further, step D1 specifically comprises:
Step D11: for the winning bid amount, use a regular expression to check whether the entity contains Arabic or Chinese numerals; if it contains neither, it is not treated as a winning bid amount and is discarded.
Step D12: for the winning bid time, discard any entity that is not in a date format.
Step D13: for the project name, because project-name entities are usually long and almost never consist of only two or three characters, discard any recognized project-name entity whose string length is less than 4.
Step D14: when several entities of the same category are recognized in one winning-bid record, keep the one with the longest string.
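Rules D11 to D14 can be sketched as a small filter. This is a hedged illustration only: the category keys (`amount`, `time`, `project_name`, `winner`), the two regular expressions, and the sample entities are hypothetical choices, not patterns given in the patent.

```python
import re
from typing import Dict, List

# D11: Arabic digits or common Chinese numeral characters (assumed character set).
HAS_NUMERAL = re.compile(r"[0-9零一二三四五六七八九十百千万亿壹贰叁肆伍陆柒捌玖拾佰仟萬億]")
# D12: a loose year-month-day pattern (assumed notion of "date format").
LOOKS_LIKE_DATE = re.compile(r"\d{4}\s*[-/.年]\s*\d{1,2}\s*[-/.月]\s*\d{1,2}\s*日?")


def correct(entities: Dict[str, List[str]]) -> Dict[str, str]:
    """Apply rules D11-D14 to the candidate entities of one winning-bid record."""
    kept: Dict[str, str] = {}
    for label, candidates in entities.items():
        if label == "amount":                      # D11: must contain a numeral
            candidates = [c for c in candidates if HAS_NUMERAL.search(c)]
        elif label == "time":                      # D12: must look like a date
            candidates = [c for c in candidates if LOOKS_LIKE_DATE.search(c)]
        elif label == "project_name":              # D13: at least 4 characters
            candidates = [c for c in candidates if len(c) >= 4]
        if candidates:
            kept[label] = max(candidates, key=len)  # D14: keep the longest string
    return kept


# Hypothetical model output for one record.
record = {
    "amount": ["待定", "120万元"],
    "time": ["近日", "2019年1月7日"],
    "project_name": ["工程", "某道路改造工程"],
    "winner": ["某公司", "某建设有限公司"],
}
cleaned = correct(record)
```

Categories whose candidates are all rejected simply drop out of the result, so downstream formatting only sees entities that passed the rules.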
Further, in step D2 the named entities are formatted as follows:
Step D21: for the winning bid amount, check whether the entity contains a unit such as "hundred", "thousand", "ten thousand", "hundred million", "dollar", or "yen", and if so perform unit conversion;
Step D22: for the winning bid time, convert it to the date format YYYY-MM-DD.
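The two formatting steps can be sketched as below. Several things here are assumptions rather than details from the patent: the Chinese unit characters in `UNIT_FACTORS` (the translated text only names the units in English), the restriction to amounts written with Arabic digits, and the function names. Currency conversion for "dollar" and "yen" needs an exchange rate and is left out.

```python
import re
from typing import Optional

# Assumed unit characters and their multipliers (ordinary and variant forms).
UNIT_FACTORS = {
    "百": 100, "佰": 100,
    "千": 1_000, "仟": 1_000,
    "万": 10_000, "萬": 10_000,
    "亿": 100_000_000, "億": 100_000_000,
}


def normalize_amount(text: str) -> Optional[float]:
    """D21: convert e.g. '120.5万元' to a plain number (Arabic digits only)."""
    m = re.search(r"(\d+(?:\.\d+)?)\s*([百佰千仟万萬亿億]?)", text.replace(",", ""))
    if m is None:
        return None
    return float(m.group(1)) * UNIT_FACTORS.get(m.group(2), 1)


def normalize_date(text: str) -> Optional[str]:
    """D22: convert a year-month-day string to YYYY-MM-DD."""
    m = re.search(r"(\d{4})\D+(\d{1,2})\D+(\d{1,2})", text)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    return f"{year:04d}-{month:02d}-{day:02d}"


amount = normalize_amount("120.5万元")
date = normalize_date("2019年1月7日")
```

A production version would likely also validate the month and day ranges and parse amounts written entirely in Chinese numerals, neither of which this sketch attempts.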
Compared with the prior art, the present invention has the following beneficial effects:
Based on a Lattice-LSTM-LSTM model, the present invention can effectively identify the named entities in the winning-bid project detail pages of bidding websites, and it recognizes long entities well.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawing and an embodiment.
Referring to Fig. 1, the present invention provides an LSTM-based method for extracting named entities from winning-bid web pages, comprising the following steps:
Step A: clean the text data of the winning-bid web page to be processed to obtain the winning-bid text;
Step B: use a Lattice-LSTM model as the encoding layer, with the winning-bid text as its input, to obtain the semantic features of the winning-bid text;
Step B1: convert the characters in the winning-bid text into character vectors;
for the j-th character $c_j$ in the winning-bid text, the character vector $x_j^c$ is computed as $x_j^c = e^c(c_j)$,
where $e^c$ denotes the character vector mapping table.
Step B2: convert the words in the winning-bid text into word vectors;
Step B21: build a vocabulary D from a large-scale corpus using a trie;
Step B22: initialize an empty set P of matched words for the winning-bid text;
Step B23: start traversing the winning-bid text with its first character as the current character, and execute step B24;
Step B24: add to P every word $w_{b,e}$ in vocabulary D that matches the text starting at the current character;
where b is the position in the sentence of the word's first character and e is the position of its last character;
Step B25: take the next character as the current character and repeat step B24 until the last character of the winning-bid text has been processed;
Step B26: after the traversal, convert each $w_{b,e}$ in P into a word vector $x_{b,e}^w = e^w(w_{b,e})$,
where $e^w$ is the word vector mapping table.
Step B3: input the word vectors, together with the character vectors from step B1, into the Lattice-LSTM model to obtain the semantic features of the winning-bid text.
For each sentence in the text, the character vector sequence obtained in step B1 and the word vector sequence obtained in step B2 are fed in order into the Lattice-LSTM model, which outputs for each character a vector representing its semantic information in context. The quantities used in the calculation are:
the character vector of the j-th character in the sentence, the word vector of a word in the sentence that ends at the j-th character, and the output at time j;
the weight matrices and bias term of the word-level LSTM;
the forget gate, input gate, candidate memory vector, and memory vector of the word-level LSTM at time j;
the weight matrices and bias term of the character-level LSTM;
the input gate of the character-level LSTM at time j;
a further candidate memory vector, memory vector, and output gate at time j, which the text also attributes to the word-level LSTM;
and the weights used in the calculation.
Step C: use an LSTM model as the decoding layer, with the semantic features of the winning-bid text as its input, to label each character in the winning-bid text;
Step C1: for the named entity recognition task on winning-bid web pages, divide the characters in the data into two classes;
the first class covers characters unrelated to any entity and is given the label "O"; the second class covers characters related to an entity, and the label of such a character consists of three parts;
Step C2: input the hidden state obtained in step B, which represents the semantic information of the text, into the LSTM model of the decoding layer, and compute the output state of each character under the influence of its context; this output is the label vector;
Step C3: input the label vector into a Softmax classifier, which normalizes it to give the probability that each character in the text carries each kind of label;
where $W_y$ is a weight matrix, $b_y$ is a bias term, and $N_t$ is the number of label kinds;
Step C4: use the log-likelihood function as the loss function and update the model parameters iteratively by back-propagation with stochastic gradient descent, training the model by minimizing the loss;
where D denotes the size of the training set, $L_j$ is the length of sentence $x_j$, the label term is the label of character t in sentence $x_j$, the probability term is the normalized probability, Θ denotes the model parameters, and I(O) is a selection function that distinguishes the loss of the label "O" from the loss of labels that indicate entities.
Step D: apply rule-based correction and formatting to the labeled winning-bid text;
Step D1: apply rule-based correction to the labeled data obtained in step C;
Step D11: for the winning bid amount, use a regular expression to check whether the entity contains Arabic or Chinese numerals; if it contains neither, it is not treated as a winning bid amount and is discarded.
Step D12: for the winning bid time, discard any entity that is not in a date format.
Step D13: for the project name, because project-name entities are usually long and almost never consist of only two or three characters, discard any recognized project-name entity whose string length is less than 4.
Step D14: when several entities of the same category are recognized in one winning-bid record, keep the one with the longest string.
Step D2: format the corrected data.
Step D21: for the winning bid amount, check whether the entity contains a unit such as "hundred", "thousand", "ten thousand", "hundred million", "dollar", or "yen", and if so perform unit conversion;
Step D22: for the winning bid time, convert it to the date format YYYY-MM-DD.
Step E: output the recognized named entities: the tendering organization, the winning organization, the location of the tendering organization, the winning bid amount, the contact person of the tendering organization, the tender project name, and the winning bid time.
The above is only a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.
Claims (9)
1. An LSTM-based method for extracting named entities from winning-bid web pages, characterized by comprising the following steps:
Step A: clean the text data of the winning-bid web page to be processed to obtain the winning-bid text;
Step B: use a Lattice-LSTM model as the encoding layer, with the winning-bid text as its input, to obtain the semantic features of the winning-bid text;
Step C: use an LSTM model as the decoding layer, with the semantic features of the winning-bid text as its input, to label each character in the winning-bid text;
Step D: apply rule-based correction and formatting to the labeled winning-bid text;
Step E: output the recognized named entities.
2. The LSTM-based method for extracting named entities from winning-bid web pages according to claim 1, characterized in that step B specifically comprises:
Step B1: convert the characters in the winning-bid text into character vectors;
for the j-th character $c_j$ in the winning-bid text, the character vector $x_j^c$ is computed as $x_j^c = e^c(c_j)$,
where $e^c$ denotes the character vector mapping table;
Step B2: convert the words in the winning-bid text into word vectors;
Step B3: input the word vectors, together with the character vectors from step B1, into the Lattice-LSTM model to obtain the semantic features of the winning-bid text.
3. The LSTM-based method for extracting named entities from winning-bid web pages according to claim 2, characterized in that step B2 specifically comprises:
Step B21: build a vocabulary D from a large-scale corpus using a trie;
Step B22: initialize an empty set P of matched words for the winning-bid text;
Step B23: start traversing the winning-bid text with its first character as the current character, and execute step B24;
Step B24: add to P every word $w_{b,e}$ in vocabulary D that matches the text starting at the current character;
where b is the position in the sentence of the word's first character and e is the position of its last character;
Step B25: take the next character as the current character and repeat step B24 until the last character of the winning-bid text has been processed;
Step B26: after the traversal, convert each $w_{b,e}$ in P into a word vector $x_{b,e}^w = e^w(w_{b,e})$,
where $e^w$ is the word vector mapping table.
4. The LSTM-based method for extracting named entities from winning-bid web pages according to claim 2, characterized in that step B3 is as follows:
For each sentence in the text, the character vector sequence obtained in step B1 and the word vector sequence obtained in step B2 are fed in order into the Lattice-LSTM model, which outputs for each character a vector representing its semantic information in context. The quantities used in the calculation are:
the character vector of the j-th character in the sentence, the word vector of a word in the sentence that ends at the j-th character, and the output at time j;
the weight matrices and bias term of the word-level LSTM;
the forget gate, input gate, candidate memory vector, and memory vector of the word-level LSTM at time j;
the weight matrices and bias term of the character-level LSTM;
the input gate of the character-level LSTM at time j;
a further candidate memory vector, memory vector, and output gate at time j, which the text also attributes to the word-level LSTM;
and the weights used in the calculation.
5. The LSTM-based method for extracting named entities from winning-bid web pages according to claim 4, characterized in that step C specifically comprises:
Step C1: for the named entity recognition task on winning-bid web pages, divide the characters in the data into two classes;
the first class covers characters unrelated to any entity and is given the label "O"; the second class covers characters related to an entity, and the label of such a character consists of three parts;
Step C2: input the hidden state obtained in step B, which represents the semantic information of the text, into the LSTM model of the decoding layer, and compute the output state of each character under the influence of its context; this output is the label vector;
Step C3: input the label vector into a Softmax classifier, which normalizes it to give the probability that each character in the text carries each kind of label;
where $W_y$ is a weight matrix, $b_y$ is a bias term, and $N_t$ is the number of label kinds;
Step C4: use the log-likelihood function as the loss function and update the model parameters iteratively by back-propagation with stochastic gradient descent, training the model by minimizing the loss;
where D denotes the size of the training set, $L_j$ is the length of sentence $x_j$, the label term is the label of character t in sentence $x_j$, the probability term is the normalized probability, Θ denotes the model parameters, and I(O) is a selection function that distinguishes the loss of the label "O" from the loss of labels that indicate entities.
6. The LSTM-based method for extracting named entities from winning-bid web pages according to claim 1, characterized in that the named entities comprise the tendering organization, the winning organization, the location of the tendering organization, the winning bid amount, the contact person of the tendering organization, the tender project name, and the winning bid time.
7. The LSTM-based method for extracting named entities from winning-bid web pages according to claim 6, characterized in that step D specifically comprises:
Step D1: apply rule-based correction to the labeled data obtained in step C;
Step D2: format the corrected data.
8. The LSTM-based method for extracting named entities from winning-bid web pages according to claim 7, characterized in that step D1 specifically comprises:
Step D11: for the winning bid amount, use a regular expression to check whether the entity contains Arabic or Chinese numerals; if it contains neither, it is not treated as a winning bid amount and is discarded.
Step D12: for the winning bid time, discard any entity that is not in a date format.
Step D13: for the project name, because project-name entities are usually long and almost never consist of only two or three characters, discard any recognized project-name entity whose string length is less than 4.
Step D14: when several entities of the same category are recognized in one winning-bid record, keep the one with the longest string.
9. The LSTM-based method for extracting named entities from winning-bid web pages according to claim 1, characterized in that in step D2 the named entities are formatted by the following steps:
Step D21: for the winning bid amount, check whether the entity contains a unit such as "hundred", "thousand", "ten thousand", "hundred million", "dollar", or "yen", and if so perform unit conversion;
Step D22: for the winning bid time, convert it to the date format YYYY-MM-DD.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910013185.2A CN109753660B (en) | 2019-01-07 | 2019-01-07 | LSTM-based winning bid web page named entity extraction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910013185.2A CN109753660B (en) | 2019-01-07 | 2019-01-07 | LSTM-based winning bid web page named entity extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109753660A true CN109753660A (en) | 2019-05-14 |
CN109753660B CN109753660B (en) | 2023-06-13 |
Family
ID=66404567
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910013185.2A Active CN109753660B (en) | 2019-01-07 | 2019-01-07 | LSTM-based winning bid web page named entity extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109753660B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110334300A (en) * | 2019-07-10 | 2019-10-15 | 哈尔滨工业大学 | Text aid reading method towards the analysis of public opinion |
CN110738182A (en) * | 2019-10-21 | 2020-01-31 | 四川隧唐科技股份有限公司 | LSTM model unit training method and device for high-precision identification of bid amount |
CN110738319A (en) * | 2019-11-11 | 2020-01-31 | 四川隧唐科技股份有限公司 | LSTM model unit training method and device for recognizing bid-winning units based on CRF |
CN111078978A (en) * | 2019-11-29 | 2020-04-28 | 上海观安信息技术股份有限公司 | Web credit website entity identification method and system based on website text content |
CN111737969A (en) * | 2020-07-27 | 2020-10-02 | 北森云计算有限公司 | Resume parsing method and system based on deep learning |
CN111738002A (en) * | 2020-05-26 | 2020-10-02 | 北京信息科技大学 | Ancient text field named entity identification method and system based on Lattice LSTM |
CN112017016A (en) * | 2019-10-29 | 2020-12-01 | 河南拓普计算机网络工程有限公司 | Method for cleaning bid amount of bid-attracting bulletin |
CN112948588A (en) * | 2021-05-11 | 2021-06-11 | 中国人民解放军国防科技大学 | Chinese text classification method for quick information editing |
CN112989807A (en) * | 2021-03-11 | 2021-06-18 | 重庆理工大学 | Long digital entity extraction method based on continuous digital compression coding |
CN112990845A (en) * | 2021-01-04 | 2021-06-18 | 江苏省测绘地理信息局信息中心 | Intelligent acquisition method for mapping market project |
JP2021111416A (en) * | 2020-01-15 | 2021-08-02 | ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド | Method and apparatus for labeling core entity, electronic device, storage medium, and computer program |
CN114048750A (en) * | 2021-12-10 | 2022-02-15 | 广东工业大学 | Named entity identification method integrating information advanced features |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100082331A1 (en) * | 2008-09-30 | 2010-04-01 | Xerox Corporation | Semantically-driven extraction of relations between named entities |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN107832400A (en) * | 2017-11-01 | 2018-03-23 | 山东大学 | A kind of method that location-based LSTM and CNN conjunctive models carry out relation classification |
CN108416058A (en) * | 2018-03-22 | 2018-08-17 | 北京理工大学 | A kind of Relation extraction method based on the enhancing of Bi-LSTM input informations |
CN108509423A (en) * | 2018-04-04 | 2018-09-07 | 福州大学 | A kind of acceptance of the bid webpage name entity abstracting method based on second order HMM |
2019
- 2019-01-07 CN CN201910013185.2A patent/CN109753660B/en active Active
Non-Patent Citations (1)
Title |
---|
Tang Min: "Research on Chinese Entity Relation Extraction Methods Based on Deep Learning", Wanfang Data Dissertation Database, 19 December 2018 (2018-12-19), pages 1 - 75 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110334300A (en) * | 2019-07-10 | 2019-10-15 | 哈尔滨工业大学 | Text-assisted reading method for public opinion analysis |
CN110738182A (en) * | 2019-10-21 | 2020-01-31 | 四川隧唐科技股份有限公司 | LSTM model unit training method and device for high-precision identification of bid amount |
CN112017016A (en) * | 2019-10-29 | 2020-12-01 | 河南拓普计算机网络工程有限公司 | Method for cleaning bid amount of bid-attracting bulletin |
CN110738319A (en) * | 2019-11-11 | 2020-01-31 | 四川隧唐科技股份有限公司 | LSTM model unit training method and device for recognizing bid-winning units based on CRF |
CN111078978A (en) * | 2019-11-29 | 2020-04-28 | 上海观安信息技术股份有限公司 | Web credit website entity identification method and system based on website text content |
CN111078978B (en) * | 2019-11-29 | 2024-02-27 | 上海观安信息技术股份有限公司 | Network credit website entity identification method and system based on website text content |
JP2021111416A (en) * | 2020-01-15 | 2021-08-02 | ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド | Method and apparatus for labeling core entity, electronic device, storage medium, and computer program |
JP7110416B2 (en) | 2020-01-15 | 2022-08-01 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | Core entity tagging method, core entity tagging device, electronic device, storage medium and computer program |
CN111738002A (en) * | 2020-05-26 | 2020-10-02 | 北京信息科技大学 | Ancient text field named entity identification method and system based on Lattice LSTM |
CN111737969A (en) * | 2020-07-27 | 2020-10-02 | 北森云计算有限公司 | Resume parsing method and system based on deep learning |
CN112990845A (en) * | 2021-01-04 | 2021-06-18 | 江苏省测绘地理信息局信息中心 | Intelligent acquisition method for mapping market project |
CN112989807A (en) * | 2021-03-11 | 2021-06-18 | 重庆理工大学 | Long digital entity extraction method based on continuous digital compression coding |
CN112989807B (en) * | 2021-03-11 | 2021-11-23 | 重庆理工大学 | Long digital entity extraction method based on continuous digital compression coding |
CN112948588A (en) * | 2021-05-11 | 2021-06-11 | 中国人民解放军国防科技大学 | Chinese text classification method for quick information editing |
CN114048750A (en) * | 2021-12-10 | 2022-02-15 | 广东工业大学 | Named entity identification method integrating information advanced features |
Also Published As
Publication number | Publication date |
---|---|
CN109753660B (en) | 2023-06-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109753660A (en) | | Bid-winning webpage named entity extraction method based on LSTM |
CN108984526B (en) | | Document theme vector extraction method based on deep learning |
CN110083831B (en) | | Chinese named entity identification method based on BERT-BiGRU-CRF |
CN104834747B (en) | | Short text classification method based on convolutional neural networks |
CN110555084B (en) | | Remote supervision relation classification method based on PCNN and multi-layer attention |
WO2018028077A1 (en) | | Deep learning based method and device for Chinese semantics analysis |
CN106599032B (en) | | Text event extraction method combining sparse coding and structured perceptron |
CN110083700A (en) | | Enterprise public opinion sentiment classification method and system based on convolutional neural networks |
CN109117472A (en) | | Uyghur named entity recognition method based on deep learning |
CN113591483A (en) | | Document-level event argument extraction method based on sequence labeling |
CN109902177A (en) | | Text sentiment analysis method based on dual-channel convolutional memory neural networks |
CN113220876B (en) | | Multi-label classification method and system for English text |
WO2022198750A1 (en) | | Semantic recognition method |
CN110297889B (en) | | Enterprise emotional tendency analysis method based on feature fusion |
CN110188175A (en) | | Question-answer pair extraction method, system and storage medium based on BiLSTM-CRF model |
CN110750646B (en) | | Attribute description extracting method for hotel comment text |
CN111966825A (en) | | Power grid equipment defect text classification method based on machine learning |
CN110851593B (en) | | Complex value word vector construction method based on position and semantics |
CN111177383A (en) | | Text entity relation automatic classification method fusing text syntactic structure and semantic information |
CN109840328A (en) | | Deep learning method for sentiment tendency analysis of product review text |
CN108932229A (en) | | Tendency analysis method for financial articles |
CN111666752A (en) | | Circuit teaching material entity relation extraction method based on keyword attention mechanism |
CN115114926A (en) | | Chinese agricultural named entity identification method |
CN110134950A (en) | | Automatic text proofreading method combining characters and words |
CN113488196A (en) | | Drug specification text named entity recognition modeling method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |