CN109508459A - A method of extracting theme and key message from news - Google Patents
- Publication number
- CN109508459A (application number CN201811313654.4A)
- Authority
- CN
- China
- Prior art keywords
- theme
- label
- matrix
- news
- serializing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/258—Heading extraction; Automatic titling; Numbering
- G06F40/279—Recognition of textual entities
Abstract
The invention discloses a method for extracting the theme and key information from news, comprising the following steps: performing HTML tag processing on the news content; annotating the processed news content with a theme annotation and a sequence (serializing) annotation, which yields the theme label of the news content and a sequence label for each word in it; creating a theme and key information extraction model that comprises a seq2seq network and a fully connected network, where the input of the fully connected network is the state output of the encoding stage of the seq2seq network, and training the model to obtain its optimal parameters; and, after HTML tag processing, feeding unlabeled news content into the extraction model to obtain the optimal theme label and sequence labels. The news category is derived from the theme label, and the slot values of the news content are derived from the sequence labels. The method uses seq2seq + attention + CRF, strengthens the dependency between the classification model and the slot-filling model, reduces the complexity of text annotation, and reduces project development complexity.
Description
Technical field
The present invention relates to the fields of text classification and information extraction, and in particular to a method for extracting the theme and key information from news.
Background art
News theme extraction falls within the scope of text classification, and slot filling in key information extraction falls within the scope of information extraction; both are major components of natural language processing. Research on text classification dates back to the 1950s, when classification relied on expert rules (patterns). By the early 1980s this had developed into expert systems built with knowledge engineering. The advantage of that approach was that it solved pressing problems quickly, but its ceiling was obviously low: it was time-consuming and laborious, and both its coverage and its accuracy were very limited. Later, with the development of statistical learning methods, and especially with the growth of online text after the rise of the internet in the 1990s and the emergence of machine learning as a discipline, a classic approach to large-scale text classification gradually took shape. The main pattern at this stage was manual feature engineering plus a shallow classification model, so the whole text classification problem was split into two parts: feature engineering and the classifier.
Feature engineering is often the most time-consuming and labor-intensive part of machine learning, but it is extremely important. Abstractly, a machine learning problem is a process of converting data into information and then refining that information into knowledge. Features are the "data -> information" step and determine the upper bound of the result, while the classifier is the "information -> knowledge" step and approaches that upper bound. Unlike classifier models, however, feature engineering is not very general and usually requires an understanding of the specific task. Text classification belongs to the natural language domain, which naturally has its own feature-processing logic, and most of the work in traditional classification tasks is done here. Text feature engineering consists of three parts: text preprocessing, feature extraction, and text representation. Its ultimate goal is to convert text into a format that a computer can understand while encapsulating enough information for classification, that is, strong feature expressiveness. Classifiers are essentially statistical classification methods, and most machine learning methods have been applied to text classification, such as the naive Bayes classifier, KNN, SVM, maximum entropy, and neural networks.
Information extraction converts unstructured data, such as natural-language sentences, into structured data that can then be queried with powerful tools such as SQL. An information extraction system searches large amounts of unstructured text for specific types of entities and relations and uses them to populate an organized database. Such databases can then be used to answer specific questions. Information extraction is mainly divided into named entity recognition and relation extraction.
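The idea of filling a database with extracted relations and then querying it can be sketched as follows. This is only an illustration using Python's built-in `sqlite3` module; the table name `relations`, its columns, and the sample triples are hypothetical and do not come from the patent.

```python
import sqlite3

# Hypothetical (subject, relation, object) triples, as an information
# extraction system might produce them from unstructured text.
triples = [
    ("CompanyA", "invested_in", "StartupX"),
    ("CompanyB", "invested_in", "StartupX"),
    ("CompanyA", "located_in", "Hangzhou"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relations (subject TEXT, relation TEXT, object TEXT)")
conn.executemany("INSERT INTO relations VALUES (?, ?, ?)", triples)


def investors_of(company):
    """Answer 'who invested in <company>?' with a plain SQL query."""
    rows = conn.execute(
        "SELECT subject FROM relations "
        "WHERE relation = 'invested_in' AND object = ? ORDER BY subject",
        (company,),
    )
    return [row[0] for row in rows]


print(investors_of("StartupX"))
```

Once the text has been reduced to rows like these, answering a specific question is an ordinary database query rather than a text-understanding problem.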
Named entity recognition (NER) is a classic problem in natural language processing with very wide application, for example identifying person and place names in a sentence, product names in e-commerce search queries, or drug names. The traditional algorithm that is generally regarded as working well is the conditional random field (CRF), a discriminative probabilistic model and a type of random field that is commonly used to label or analyze sequence data such as natural-language text or biological sequences. Put simply, applying it to NER means predicting the label of each word from a given set of features.
Relation extraction mainly identifies the semantic relations between entities. Mainstream relation extraction techniques fall into three kinds: supervised learning methods, semi-supervised learning methods, and unsupervised learning methods:
1. Supervised learning methods treat relation extraction as a classification problem: effective features are designed from the training data in order to learn various classification models, and the trained classifier is then used to predict relations. The problem with this approach is that it needs a large manually annotated training corpus, and corpus annotation is usually very time-consuming and laborious.
2. Semi-supervised learning methods mainly use bootstrapping for relation extraction. For the relation to be extracted, several seed instances are first set by hand, and then relation templates for that relation and further instances are extracted iteratively from the data.
3. Unsupervised learning methods assume that entity pairs with the same semantic relation have similar context. The context of each entity pair can therefore be used to represent the semantic relation of that pair, and the semantic relations of all entity pairs are then clustered.
Compared with the other two kinds, supervised learning methods can extract more effective features, and their precision and recall are both higher. Supervised learning methods have therefore received attention from more and more researchers.
In most applications today, named entity recognition and relation extraction are executed as separate tasks, let alone combined with text classification. The commonly used approach to entity and relation extraction is a pipeline: given an input sentence, named entity recognition is performed first, the recognized entities are then combined in pairs, relation classification is performed on each pair, and finally the triples in which an entity relation exists are taken as the result. The pipeline approach has the following drawbacks: 1) error propagation: errors in the entity recognition module affect the performance of the subsequent relation classification; 2) the relationship between the two subtasks is ignored; 3) unnecessary redundant information is produced: because the recognized entities are paired two by two before relation classification, pairs of unrelated entities introduce redundant information and raise the error rate.
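The third drawback can be made concrete with a short sketch of the pairing step. The entity list below is a hypothetical NER output, and `candidate_pairs` is an illustrative helper name, not part of the patent.

```python
from itertools import combinations


def candidate_pairs(entities):
    """Pair every recognized entity with every other one (unordered)."""
    return list(combinations(entities, 2))


# Hypothetical entities recognized in one sentence.
entities = ["CompanyA", "StartupX", "Hangzhou", "2018"]
pairs = candidate_pairs(entities)

# n entities give n * (n - 1) / 2 candidate pairs; every one of them is
# sent to the relation classifier, even when the two entities are unrelated.
print(len(pairs), pairs[:2])
```

Because the number of candidate pairs grows quadratically with the number of entities, a sentence with only a few true relations can still produce many unrelated pairs for the classifier to reject.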
Existing text classification and slot filling are each trained as a separate model, which not only ignores the dependency between the tasks but also lengthens the development cycle of the whole project and increases the annotation workload. Text classification and information extraction are both usually implemented with supervised learning, and supervised learning requires enough sample data. Annotating samples is time-consuming and laborious, and annotation quality varies from person to person. In this situation, the more tasks there are, the more complex the annotation becomes and the harder its quality is to guarantee. Deep learning is currently the most popular way to solve natural language processing problems, but deep learning models generally take a long time to train. This is especially pronounced when there are many tasks, and it seriously constrains project iteration.
Summary of the invention
Slot filling is an important task in natural language understanding; it extracts the role information and attribute information related to an event. News classification and slot filling are usually trained as two independent, unrelated models. From a business perspective, however, slot filling depends on the news category: different categories require different slot types to be filled. The technical solution provided by the invention trains news classification and slot filling as a single model, merging multiple tasks into one and fully accounting for the correlation between them. This largely avoids mismatches between news category and slot type, shortens the development cycle, and improves the accuracy of the results.
The method of the invention mainly uses a seq2seq + attention + CRF scheme and comprises the following steps:
(1) perform HTML tag processing on news content crawled from web pages;
(2) annotate the processed news content with a theme annotation and a sequence (serializing) annotation, obtaining the theme label of the news content and the sequence label of each word in it; the theme annotation marks the category the news belongs to; the sequence annotation is applied once the theme has been annotated and identifies the role or attribute information related to the theme;
(3) create the theme and key information extraction model, which comprises a seq2seq network and a fully connected network; the input of the fully connected network comes from the state output of the encoding stage of the seq2seq network;
(4) feed the news data annotated in step (2) into the seq2seq network of the extraction model and encode the words of the news content as follows: first apply embedding vectorization to each word in the news content to obtain a vectorized matrix, then feed the vectorized matrix into the encoding BiLSTM (bidirectional recurrent neural network) to obtain the outputs output matrix and the finalState final-state matrix;
(5) for the theme label, feed the finalState matrix into the fully connected network of the extraction model to obtain the logic intermediate-result matrix, then apply cross-entropy to the logic matrix and the actual theme label to obtain the loss value category_loss;
(6) for the sequence labels, apply an attention mechanism transformation to the outputs matrix to obtain the attention matrix;
(7) feed the attention matrix together with the outputs matrix into the decoding BiLSTM (bidirectional recurrent neural network) of the seq2seq network to obtain the decode_outputs decoded output matrix, and use the CRF loss function to compute the loss value solt_loss between the decode_outputs matrix and the sequence labels;
(8) add category_loss and solt_loss to obtain the overall loss value loss of the extraction network, then back-propagate loss using gradient descent to obtain the optimal parameters of the extraction model;
(9) after HTML tag processing, feed unlabeled news content into the theme and key information extraction model to obtain the optimal theme label and sequence labels; the news category is obtained from the theme label, and the slot values of the news content, i.e. role or attribute information, are obtained from the sequence labels.
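The way steps (5) and (8) combine the two losses can be sketched in plain Python. The sketch assumes that the cross-entropy is taken over a softmax of the logic values (the text only says "cross entropy"), and it treats solt_loss as an already computed number, since the CRF loss itself is not reproduced here. The function names and sample numbers are hypothetical.

```python
import math


def softmax_cross_entropy(logits, true_index):
    """Cross-entropy between softmax(logits) and a one-hot true label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[true_index]


def joint_loss(theme_logits, theme_label, solt_loss):
    """Overall loss of step (8): loss = category_loss + solt_loss."""
    category_loss = softmax_cross_entropy(theme_logits, theme_label)
    return category_loss + solt_loss


# Hypothetical values: two theme classes, true class 1, and a CRF loss of 0.5.
print(joint_loss([0.2, 1.3], 1, 0.5))
```

Because a single scalar is back-propagated, the encoder parameters receive gradients from both the classification branch and the slot-filling branch, which is what couples the two tasks during training.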
Further, in step (4), the embedding vectorization of each word in the news content is specifically as follows: using transfer learning, pre-trained embedding word vectors are fed directly into the seq2seq network, and the parameters of the embedding word vectors do not need to be updated during training.
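A minimal sketch of this idea, assuming the pre-trained vectors are available as a simple word-to-vector table. The table contents, the `<unk>` fallback entry, and the toy scalar parameters in `sgd_step` are all hypothetical and serve only to show a lookup followed by an update that leaves the embedding untouched.

```python
# Hypothetical pre-trained vectors; real ones would be loaded from a file.
PRETRAINED = {
    "news": [0.1, 0.3],
    "fund": [0.7, -0.2],
    "<unk>": [0.0, 0.0],  # fallback for words outside the vocabulary
}


def embed(words, table=PRETRAINED):
    """Look up one vector per word, giving the vectorized matrix."""
    return [list(table.get(w, table["<unk>"])) for w in words]


def sgd_step(params, grads, lr, frozen=("embedding",)):
    """Apply one gradient step, skipping parameter groups marked as frozen."""
    return {
        name: value if name in frozen else value - lr * grads[name]
        for name, value in params.items()
    }


matrix = embed(["news", "fund", "rocket"])
updated = sgd_step(
    {"embedding": 1.0, "dense": 1.0},
    {"embedding": 0.5, "dense": 0.5},
    lr=0.1,
)
print(matrix, updated)
```

Freezing the embedding removes those parameters from the set that has to be trained, while the rest of the network is still updated.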
Further, in step (6), the attention mechanism transformation of the outputs matrix into the attention matrix uses self-attention and multi-head attention, which overcomes the inability of traditional attention models to be parallelized and improves both effectiveness and performance.
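The text does not spell out the attention formula, so the following is only a sketch under an assumption: scaled dot-product self-attention in which queries, keys, and values are all the encoder outputs themselves (no learned projections), with the feature dimension split evenly across heads. All function names and sample values are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(rows):
    """Scaled dot-product self-attention with queries = keys = values = rows."""
    scale = math.sqrt(len(rows[0]))
    out = []
    for q in rows:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in rows])
        out.append([sum(w * v[j] for w, v in zip(weights, rows))
                    for j in range(len(q))])
    return out


def multi_head_self_attention(outputs, num_heads):
    """Split the feature dimension into heads, attend per head, concatenate."""
    dim = len(outputs[0])
    assert dim % num_heads == 0, "feature size must divide evenly into heads"
    size = dim // num_heads
    heads = [self_attention([row[h * size:(h + 1) * size] for row in outputs])
             for h in range(num_heads)]
    return [[x for head in heads for x in head[t]] for t in range(len(outputs))]


# Hypothetical encoder outputs: 3 time steps, 4 features each.
outputs = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 1.0]]
attention = multi_head_self_attention(outputs, num_heads=2)
print(len(attention), len(attention[0]))
```

Each output row depends only on the full input matrix and not on the previously computed row, which is why this form of attention can be evaluated for all positions in parallel.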
Further, in step (9), the theme and key information extraction model outputs a theme label matrix and a sequence label matrix. For the theme label matrix, softmax is used as the activation function and the theme label with the highest probability is taken as the optimal theme label. For the sequence label matrix, the decode_outputs decoded output matrix is decoded with a conditional random field (CRF) to obtain the optimal sequence labels.
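Both selection rules of step (9) can be sketched in plain Python. The sketch assumes a linear-chain CRF decoded with the Viterbi algorithm, which is the usual way CRF decoding is done but is not named in the text; the score matrices and function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def best_theme(theme_logits):
    """Index of the theme label with the highest softmax probability."""
    probs = softmax(theme_logits)
    return max(range(len(probs)), key=probs.__getitem__)


def viterbi(emissions, transitions):
    """Highest-scoring label path for a linear-chain CRF.

    emissions[t][j]: score of label j at position t.
    transitions[i][j]: score of moving from label i to label j.
    """
    num_labels = len(emissions[0])
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        prev_best, new_scores = [], []
        for j in range(num_labels):
            i = max(range(num_labels), key=lambda i: scores[i] + transitions[i][j])
            prev_best.append(i)
            new_scores.append(scores[i] + transitions[i][j] + emit[j])
        backpointers.append(prev_best)
        scores = new_scores
    # Follow the back-pointers from the best final label to the start.
    path = [max(range(num_labels), key=scores.__getitem__)]
    for prev_best in reversed(backpointers):
        path.append(prev_best[path[-1]])
    return path[::-1]


# Hypothetical scores: 3 positions, 2 labels, switching labels is penalized.
emissions = [[3.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
transitions = [[0.0, -5.0], [-5.0, 0.0]]
print(best_theme([0.1, 2.0]), viterbi(emissions, transitions))
```

The transition scores are what distinguish CRF decoding from choosing the best label independently at each position: a label sequence that is locally attractive can still lose to one with more plausible transitions.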
The beneficial effects of the invention are as follows: the invention proposes a method that performs news theme extraction and key information extraction in a single pass. It uses a seq2seq + attention + CRF scheme, strengthens the dependency between the classification model and the slot-filling model, reduces the complexity of text annotation, and can reduce project development complexity.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of one embodiment of the invention.
Detailed description of embodiments
The invention is described in further detail below with reference to the drawing and specific embodiments.
The method for extracting the theme and key information from news provided by the invention comprises the following steps:
(1) perform HTML tag processing on news content crawled from web pages;
(2) annotate the processed news content with a theme annotation and a sequence (serializing) annotation, obtaining the theme label of the news content and the sequence label of each word in it;
The theme annotation mainly marks the category the news belongs to. For a financial institution, for example, news related to investment solicitation information is labeled 1 and all other news is labeled 0;
The sequence annotation is applied once the theme has been annotated and identifies the role or attribute information related to the theme. For a financing event related to investment solicitation information, for example, the corresponding roles are the investor, the investee, and so on, and the corresponding attributes are the financing amount, the financing round, and so on. These roles and attributes are the slots;
(3) create the theme and key information extraction model, which comprises a seq2seq network and a fully connected network; the input of the fully connected network comes from the state output of the encoding stage of the seq2seq network;
(4) feed the news data annotated in step (2) into the seq2seq network of the extraction model and encode the words of the news content as follows: first apply embedding vectorization to each word in the news content to obtain a vectorized matrix, then feed the vectorized matrix into the encoding BiLSTM (bidirectional recurrent neural network) to obtain the outputs output matrix and the finalState final-state matrix;
During the embedding vectorization of each word in the news content, transfer learning is used: pre-trained embedding word vectors are fed directly into the seq2seq network, and the parameters of the embedding word vectors do not need to be updated during training.
(5) for the theme label, feed the finalState matrix into the fully connected network of the extraction model to obtain the logic intermediate-result matrix, then apply cross-entropy to the logic matrix and the actual theme label to obtain the loss value category_loss;
(6) for the sequence labels, apply an attention mechanism transformation to the outputs matrix to obtain the attention matrix; this step uses self-attention and multi-head attention, which overcomes the inability of traditional attention models to be parallelized and improves both effectiveness and performance;
(7) feed the attention matrix together with the outputs matrix into the decoding BiLSTM (bidirectional recurrent neural network) of the seq2seq network to obtain the decode_outputs decoded output matrix, and use the CRF loss function to compute the loss value solt_loss between the decode_outputs matrix and the sequence labels;
(8) add category_loss and solt_loss to obtain the overall loss value loss of the extraction network, then back-propagate loss using gradient descent to obtain the optimal parameters of the extraction model;
(9) after HTML tag processing, feed unlabeled news content into the theme and key information extraction model to obtain the optimal theme label and sequence labels; the news category is obtained from the theme label, and the slot values of the news content, i.e. role or attribute information, are obtained from the sequence labels.
The theme and key information extraction model outputs a theme label matrix and a sequence label matrix. For the theme label matrix, softmax is used as the activation function and the theme label with the highest probability is taken as the optimal theme label. For the sequence label matrix, the decode_outputs decoded output matrix is decoded with a conditional random field (CRF) to obtain the optimal sequence labels.
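Steps (1) and (9) both begin with HTML tag processing, which the text does not define further. The sketch below assumes it means reducing a crawled page to its visible text, and uses Python's standard `html.parser` module; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by the removed markup.
        return " ".join(" ".join(self._chunks).split())


def strip_html(raw):
    """Return the visible text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return parser.text()


print(strip_html("<div><p>Fund <b>raised</b> money.</p><script>x=1</script></div>"))
```

Text chunks are joined with a space, so a tag placed in the middle of a word would split that word; a production cleaner would need to treat inline and block elements differently.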
For example, the method of the invention is applied to the following news item:
"Mindray has released the Nuwa Resona 8, a high-end intelligent ultrasound system dedicated to obstetrics and gynecology and based on big-data algorithms. It includes multiple intelligent applications, such as automatic fetal cranial volume navigation, automatic fetal face navigation, automatic fetal heart volume navigation, and intelligent pelvic floor ultrasound, and will bring attentive care to women's prenatal diagnosis, postpartum recovery, and healthy childbearing";
As shown in Fig. 1, the news item is fed into the theme and key information extraction model. In the encoding stage, the model takes the word as its basic unit and applies embedding, f-lstm (forward LSTM), and b-lstm (backward LSTM) in turn to obtain the outputs output matrix and the finalState final-state matrix. The finalState final-state matrix is passed through the fully connected network to obtain the final theme label. In the decoding stage, outputs and its corresponding attention are fed together into the decoding network, where lstm and CRF decode processing produce the final sequence labels, which are finally converted into the corresponding slot values.
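The last conversion, from sequence labels to slot values, can be sketched as follows. The text does not specify a tagging scheme, so the B-/I-/O convention used here is an assumption, and the slot names and tokens are hypothetical examples modeled on the financing roles and attributes mentioned above.

```python
def labels_to_slots(tokens, labels, joiner=" "):
    """Group B-/I- tagged tokens into {slot_name: [values]}.

    A stray I- tag that does not continue an open span is ignored.
    """
    slots = {}
    current_name, current_tokens = None, []
    # A trailing "O" sentinel flushes the last open span.
    for token, label in zip(list(tokens) + [""], list(labels) + ["O"]):
        continues = label.startswith("I-") and label[2:] == current_name
        if not continues:
            if current_name is not None:
                slots.setdefault(current_name, []).append(joiner.join(current_tokens))
            current_name, current_tokens = None, []
            if label.startswith("B-"):
                current_name = label[2:]
        if current_name is not None:
            current_tokens.append(token)
    return slots


tokens = ["CompanyA", "invested", "10", "million", "in", "StartupX"]
labels = ["B-investor", "O", "B-amount", "I-amount", "O", "B-investee"]
print(labels_to_slots(tokens, labels))
```

For text that is labeled character by character, passing `joiner=""` concatenates the characters of a span without inserting spaces.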
The foregoing is merely a preferred embodiment of the invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (5)
1. A method for extracting the theme and key information from news, characterized by comprising the following steps:
(1) performing HTML tag processing on news content crawled from web pages;
(2) annotating the processed news content with a theme annotation and a sequence (serializing) annotation, obtaining the theme label of the news content and the sequence label of each word in it; the theme annotation marks the category the news belongs to; the sequence annotation is applied once the theme has been annotated and identifies the role or attribute information related to the theme;
(3) creating a theme and key information extraction model that comprises a seq2seq network and a fully connected network, the input of the fully connected network coming from the state output of the encoding stage of the seq2seq network;
(4) feeding the news data annotated in step (2) into the seq2seq network of the extraction model and encoding the words of the news content as follows: first applying embedding vectorization to each word in the news content to obtain a vectorized matrix, then feeding the vectorized matrix into the encoding BiLSTM (bidirectional recurrent neural network) to obtain the outputs output matrix and the finalState final-state matrix;
(5) for the theme label, feeding the finalState matrix into the fully connected network of the extraction model to obtain the logic intermediate-result matrix, then applying cross-entropy to the logic matrix and the actual theme label to obtain the loss value category_loss;
(6) for the sequence labels, applying an attention mechanism transformation to the outputs matrix to obtain the attention matrix;
(7) feeding the attention matrix together with the outputs matrix into the decoding BiLSTM (bidirectional recurrent neural network) of the seq2seq network to obtain the decode_outputs decoded output matrix, and using the CRF loss function to compute the loss value solt_loss between the decode_outputs matrix and the sequence labels;
(8) adding category_loss and solt_loss to obtain the overall loss value loss of the extraction network, then back-propagating loss using gradient descent to obtain the optimal parameters of the extraction model;
(9) after HTML tag processing, feeding unlabeled news content into the theme and key information extraction model to obtain the optimal theme label and sequence labels, the news category being obtained from the theme label and the slot values of the news content, i.e. role or attribute information, being obtained from the sequence labels.
2. The method for extracting the theme and key information from news according to claim 1, characterized in that in step (4) the embedding vectorization of each word in the news content is specifically: using transfer learning, pre-trained embedding word vectors are fed directly into the seq2seq network, and the parameters of the embedding word vectors do not need to be updated during training.
3. The method for extracting the theme and key information from news according to claim 1, characterized in that in step (6) the attention mechanism transformation of the outputs matrix into the attention matrix uses self-attention and multi-head attention, which overcomes the inability of traditional attention models to be parallelized and improves both effectiveness and performance.
4. The method for extracting the theme and key information from news according to claim 1, characterized in that in step (9) the theme and key information extraction model outputs a theme label matrix and a sequence label matrix; for the theme label matrix, softmax is used as the activation function and the theme label with the highest probability is taken as the optimal theme label; for the sequence label matrix, the decode_outputs decoded output matrix is decoded with a conditional random field (CRF) to obtain the optimal sequence labels.
5. The method for extracting the theme and key information from news according to claim 1, characterized in that the method uses seq2seq + attention + CRF, strengthens the dependency between the classification model and the slot-filling model, reduces the complexity of text annotation, and reduces project development complexity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811313654.4A CN109508459B (en) | 2018-11-06 | 2018-11-06 | Method for extracting theme and key information from news |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811313654.4A CN109508459B (en) | 2018-11-06 | 2018-11-06 | Method for extracting theme and key information from news |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109508459A true CN109508459A (en) | 2019-03-22 |
CN109508459B CN109508459B (en) | 2022-11-29 |
Family ID=65747642
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811313654.4A Active CN109508459B (en) | 2018-11-06 | 2018-11-06 | Method for extracting theme and key information from news |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109508459B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135493A (en) * | 2019-05-15 | 2019-08-16 | 北京信息科技大学 | A kind of news topic tracking |
CN110362823A (en) * | 2019-06-21 | 2019-10-22 | 北京百度网讯科技有限公司 | The training method and device of text generation model are described |
CN110415815A (en) * | 2019-07-19 | 2019-11-05 | 银丰基因科技有限公司 | The hereditary disease assistant diagnosis system of deep learning and face biological information |
CN110532452A (en) * | 2019-07-12 | 2019-12-03 | 西安交通大学 | A kind of general crawler design method of news website based on GRU neural network |
CN110597970A (en) * | 2019-08-19 | 2019-12-20 | 华东理工大学 | Multi-granularity medical entity joint identification method and device |
CN111062217A (en) * | 2019-12-19 | 2020-04-24 | 江苏满运软件科技有限公司 | Language information processing method and device, storage medium and electronic equipment |
CN111143514A (en) * | 2019-12-27 | 2020-05-12 | 北京百度网讯科技有限公司 | Method and apparatus for generating information |
CN111950199A (en) * | 2020-08-11 | 2020-11-17 | 杭州叙简科技股份有限公司 | Earthquake data structured automation method based on earthquake news event |
CN112765363A (en) * | 2021-01-19 | 2021-05-07 | 昆明理工大学 | Demand map construction method for scientific and technological service demand |
CN112818687A (en) * | 2021-03-25 | 2021-05-18 | 杭州数澜科技有限公司 | Method, device, electronic equipment and storage medium for constructing title recognition model |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304911A (en) * | 2018-01-09 | 2018-07-20 | 中国科学院自动化研究所 | Knowledge Extraction Method and system based on Memory Neural Networks and equipment |
CN108491372A (en) * | 2018-01-31 | 2018-09-04 | 华南理工大学 | A kind of Chinese word cutting method based on seq2seq models |
CN108595704A (en) * | 2018-05-10 | 2018-09-28 | 成都信息工程大学 | A kind of the emotion of news and classifying importance method based on soft disaggregated model |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135493A (en) * | 2019-05-15 | 2019-08-16 | 北京信息科技大学 | A kind of news topic tracking |
CN110362823A (en) * | 2019-06-21 | 2019-10-22 | 北京百度网讯科技有限公司 | The training method and device of text generation model are described |
CN110532452B (en) * | 2019-07-12 | 2022-04-22 | 西安交通大学 | News website universal crawler design method based on GRU neural network |
CN110532452A (en) * | 2019-07-12 | 2019-12-03 | 西安交通大学 | A kind of general crawler design method of news website based on GRU neural network |
CN110415815A (en) * | 2019-07-19 | 2019-11-05 | 银丰基因科技有限公司 | The hereditary disease assistant diagnosis system of deep learning and face biological information |
CN110597970A (en) * | 2019-08-19 | 2019-12-20 | 华东理工大学 | Multi-granularity medical entity joint identification method and device |
CN110597970B (en) * | 2019-08-19 | 2023-04-07 | 华东理工大学 | Multi-granularity medical entity joint identification method and device |
CN111062217A (en) * | 2019-12-19 | 2020-04-24 | 江苏满运软件科技有限公司 | Language information processing method and device, storage medium and electronic equipment |
CN111062217B (en) * | 2019-12-19 | 2024-02-06 | 江苏满运软件科技有限公司 | Language information processing method and device, storage medium and electronic equipment |
CN111143514A (en) * | 2019-12-27 | 2020-05-12 | 北京百度网讯科技有限公司 | Method and apparatus for generating information |
CN111950199A (en) * | 2020-08-11 | 2020-11-17 | 杭州叙简科技股份有限公司 | Earthquake data structured automation method based on earthquake news event |
CN112765363A (en) * | 2021-01-19 | 2021-05-07 | 昆明理工大学 | Demand map construction method for scientific and technological service demand |
CN112818687A (en) * | 2021-03-25 | 2021-05-18 | 杭州数澜科技有限公司 | Method, device, electronic equipment and storage medium for constructing title recognition model |
Also Published As
Publication number | Publication date |
---|---|
CN109508459B (en) | 2022-11-29 |
Similar Documents
Publication | Title |
---|---|
CN109508459A (en) | A method of extracting theme and key message from news |
WO2020211275A1 (en) | Pre-trained model and fine-tuning technology-based medical text relationship extraction method |
CN109408812A (en) | Joint entity relation extraction method using attention-based sequence labeling |
CN106980608A (en) | Chinese electronic health record word segmentation and named entity recognition method and system |
CN110232439B (en) | Intention identification method based on deep learning network |
CN112487820B (en) | Chinese medical named entity recognition method |
CN108717574A (en) | Natural language inference method based on conjunction labels and reinforcement learning |
CN109284361A (en) | Entity extraction method and system based on deep learning |
WO2021017025A1 (en) | Method for automatically generating Python code from natural language |
CN111243699A (en) | Chinese electronic medical record entity extraction method based on word information fusion |
CN107526798A (en) | Joint entity recognition and normalization method and model based on neural networks |
CN113221571B (en) | Entity relation joint extraction method based on entity correlation attention mechanism |
CN112269868A (en) | Method of using a machine reading comprehension model based on multi-task joint training |
CN109598002A (en) | Neural machine translation method and system based on bidirectional recurrent neural network |
CN111832293A (en) | Entity and relation joint extraction method based on head entity prediction |
CN112364132A (en) | Similarity calculation model and system based on dependency syntax and method for building system |
CN113869055A (en) | Power grid project characteristic attribute identification method based on deep learning |
CN109446523A (en) | Entity attribute extraction model based on BiLSTM and conditional random field |
CN114564953A (en) | Emotion target extraction model based on multiple word embedding fusion and attention mechanism |
CN114969269A (en) | False news detection method and system based on entity identification and relation extraction |
CN116910272B (en) | Academic knowledge graph completion method based on pre-training model T5 |
CN117349311A (en) | Database natural language query method based on improved RetNet |
CN116680407A (en) | Knowledge graph construction method and device |
CN114548090B (en) | Fast relation extraction method based on convolutional neural network and improved cascade labeling |
Ma et al. | Joint pre-trained Chinese named entity recognition based on bi-directional language model |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP01 | Change in the name or title of a patent holder | Address after: 7 / F, building B, 482 Qianmo Road, Xixing street, Binjiang District, Hangzhou City, Zhejiang Province 310000. Patentee after: Huoshi Creation Technology Co.,Ltd. Address before: 7 / F, building B, 482 Qianmo Road, Xixing street, Binjiang District, Hangzhou City, Zhejiang Province 310000. Patentee before: HANGZHOU FIRESTONE TECHNOLOGY Co.,Ltd. |