CN110110330A - Text based keyword extracting method and computer equipment - Google Patents
- Publication number
- CN110110330A (application CN201910360872.1A)
- Authority
- CN
- China
- Prior art keywords
- keyword
- text
- vector
- analyzed
- term vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
This application discloses a text-based keyword extraction method and a computer device, belonging to the field of artificial intelligence, for efficiently mining the keywords in a text. The method employs a Seq2seq network structure comprising an encoder, a decoder, and a neural network module with an attention mechanism that adjusts the encoder's output. The entire text is taken as input, so the neural network can understand the text's contextual information. Because no feature vectors need to be extracted, the method avoids the trouble of hand-crafting features from text as in TextRank. Since no subjective feature abstraction is required, the implementation is relatively simple, keyword extraction applies to both long and short texts, and the results are more stable. In addition, the method outputs vectors rather than keywords directly, giving it good generalization ability. Furthermore, the attention mechanism makes keyword mining more accurate.
Description
Technical field
This application relates to the field of artificial intelligence, and in particular to a text-based keyword extraction method and a computer device.
Background technique
To facilitate understanding and retrieval, the meaning of a text is usually expressed by a few keywords. Because different words differ in their ability to express semantics, they also differ in how well they reflect the text's gist. How to extract keywords that express the gist of a text is an important topic in the field of natural language processing. Keyword extraction is also widely used in fields such as content recommendation and semantic search.
In the related art, indicators for measuring word importance include TF-IDF (term frequency-inverse document frequency), TextRank (an automatic summarization algorithm), and classification-based methods. TF-IDF counts a word's importance to a text via document-frequency weighting; TextRank computes word importance from the contextual relations between words; classification-based methods convert keyword mining into a classification problem, dividing the words of a text into keywords and non-keywords through feature extraction, neural network training, and neural network prediction. However, each of these methods has drawbacks and performs unsatisfactorily in practice.
Summary of the invention
Embodiments of the present application provide a text-based keyword extraction method and a computer device for intelligently and relatively accurately extracting keywords.
In one aspect, a text-based keyword extraction method is provided, the method comprising:
constructing a matrix of the text to be analyzed, the matrix including word vectors of the segmented words arranged in sequence, wherein the order is the order of the word vectors in the text to be analyzed;
inputting the matrix of the text to be analyzed into a pre-trained Seq2seq (sequence-to-sequence) neural network to obtain an output matrix including at least one output vector; wherein the Seq2seq neural network is trained on a corpus annotated with keywords, and during training, when the input of the Seq2seq neural network is the matrix of a training text, the output is the matrix formed by the training text's keywords, each vector in the keyword matrix corresponding to a keyword;
determining the keywords of the text to be analyzed according to the correspondence between output vectors and keywords.
Optionally, the Seq2seq neural network includes an encoder, a decoder, and a neural network module with an attention mechanism; the encoder and decoder are recurrent neural networks, and the neural network module with the attention mechanism is used to adjust the encoder's encoding result for each word vector.
Optionally, inputting the matrix of the text to be analyzed into the pre-trained Seq2seq neural network to obtain the output matrix includes:
inputting the word vectors in the matrix of the text to be analyzed into the encoder one by one, in their order in the text, to obtain the state of each input word vector;
inputting the encoder's currently input word vector and the state of the word vector preceding it into the neural network module with the attention mechanism, to obtain a weight parameter for the preceding word vector;
multiplying the weight parameter of the preceding word vector by its state to obtain the adjusted state of the preceding word vector;
inputting the adjusted state of each word vector into the decoder in sequence, to obtain the output matrix.
Optionally, the neural network module with the attention mechanism includes, connected in series, a fully connected layer, a random dropout layer, and a softmax normalization layer;
the fully connected layer processes the encoder's currently input word vector and the state of the preceding word vector;
the random dropout layer processes the output of the fully connected layer;
the softmax layer normalizes the output of the dropout layer to obtain the weight parameter of the preceding word vector.
Optionally, constructing the matrix of the text to be analyzed comprises:
performing word segmentation on the text to be analyzed to obtain the segmented words;
converting each segmented word into a word vector;
constructing the matrix from the word vectors in the order the segmented words appear in the text to be analyzed.
Optionally, determining the keywords of the text to be analyzed according to the correspondence between output vectors and keywords comprises:
searching the keyword vector set for the vector closest to the output vector;
determining the keyword corresponding to the found vector as a keyword of the text to be analyzed.
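A minimal nearest-vector lookup over a hypothetical keyword vector set (the keyword names, embedding values, and the choice of Euclidean distance are invented for illustration) might look like:

```python
import numpy as np

# Hypothetical keyword vector set: keyword -> embedding.
keyword_vectors = {
    "neural network": np.array([0.9, 0.1]),
    "keyword":        np.array([0.1, 0.9]),
    "text":           np.array([0.5, 0.5]),
}

def nearest_keyword(output_vec):
    """Return the keyword whose vector is closest to output_vec."""
    return min(keyword_vectors,
               key=lambda k: np.linalg.norm(keyword_vectors[k] - output_vec))

print(nearest_keyword(np.array([0.8, 0.2])))  # -> neural network
```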
Optionally, determining the keyword corresponding to the found vector as a keyword of the text to be analyzed comprises:
for each keyword corresponding to a vector found in the keyword vector set, if the keyword appears in the text to be analyzed, determining it as a keyword of the text to be analyzed; if it does not appear in the text to be analyzed, discarding it.
Optionally, the method further comprises:
if the number of keywords of the text to be analyzed is greater than a preset quantity, removing some of them so that the remaining number equals the preset quantity.
Optionally, the method further comprises:
if the number of keywords of the text to be analyzed is less than the preset quantity, searching the keyword vector set for keywords similar to the keywords of the text to be analyzed;
determining the similar keywords found as additional keywords of the text to be analyzed.
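Both branches (trimming surplus keywords and padding with similar ones from the keyword vector set) can be sketched as follows; the trimming policy and the cosine-similarity criterion are assumptions, since the text does not specify them:

```python
import numpy as np

def adjust_keyword_count(keywords, preset, keyword_vectors):
    """Trim to `preset` keywords, or pad with the most similar entries
    from the keyword vector set (cosine similarity; a sketch only)."""
    if len(keywords) >= preset:
        return keywords[:preset]          # drop surplus (ranking policy assumed)
    extras = []
    for cand, vec in keyword_vectors.items():
        if cand in keywords:
            continue
        # similarity of the candidate to the closest existing keyword
        sims = [float(vec @ keyword_vectors[k]) /
                (np.linalg.norm(vec) * np.linalg.norm(keyword_vectors[k]))
                for k in keywords if k in keyword_vectors]
        extras.append((max(sims, default=0.0), cand))
    extras.sort(reverse=True)
    return keywords + [c for _, c in extras[: preset - len(keywords)]]

kv = {"text": np.array([1.0, 0.0]),
      "document": np.array([0.9, 0.1]),
      "banana": np.array([0.0, 1.0])}
print(adjust_keyword_count(["text"], 2, kv))  # -> ['text', 'document']
```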
In a second aspect, an embodiment of the present application further provides a text-based keyword extraction apparatus, the apparatus comprising:
a text matrix construction unit, for constructing the matrix of the text to be analyzed, the matrix including word vectors of the segmented words arranged in sequence, wherein the order is the order of the word vectors in the text to be analyzed;
an output matrix determination unit, for inputting the matrix of the text to be analyzed into a pre-trained Seq2seq neural network to obtain an output matrix including at least one output vector; wherein the Seq2seq neural network is trained on a corpus annotated with keywords, and during training, when the input of the Seq2seq neural network is the matrix of a training text, the output is the matrix formed by the training text's keywords, each vector in the keyword matrix corresponding to a keyword;
a keyword determination unit, for determining the keywords of the text to be analyzed according to the correspondence between output vectors and keywords.
Optionally, the Seq2seq neural network includes an encoder, a decoder, and a neural network module with an attention mechanism; the encoder and decoder are recurrent neural networks, and the neural network module with the attention mechanism is used to adjust the encoder's encoding result for each word vector.
Optionally, the output matrix determination unit is used to:
input the word vectors in the matrix of the text to be analyzed into the encoder one by one, in their order in the text, to obtain the state of each input word vector;
input the encoder's currently input word vector and the state of the word vector preceding it into the neural network module with the attention mechanism, to obtain a weight parameter for the preceding word vector;
multiply the weight parameter of the preceding word vector by its state to obtain the adjusted state of the preceding word vector;
input the adjusted state of each word vector into the decoder in sequence, to obtain the output matrix.
Optionally, the neural network module with the attention mechanism includes, connected in series, a fully connected layer, a random dropout layer, and a softmax normalization layer;
the fully connected layer processes the encoder's currently input word vector and the state of the preceding word vector;
the random dropout layer processes the output of the fully connected layer;
the softmax layer normalizes the output of the dropout layer to obtain the weight parameter of the preceding word vector.
Optionally, the text matrix construction unit is used to:
perform word segmentation on the text to be analyzed to obtain the segmented words;
convert each segmented word into a word vector;
construct the matrix from the word vectors in the order the segmented words appear in the text to be analyzed.
Optionally, the keyword determination unit is used to:
search the keyword vector set for the vector closest to the output vector;
determine the keyword corresponding to the found vector as a keyword of the text to be analyzed.
Optionally, the keyword determination unit is used to:
for each keyword corresponding to a vector found in the keyword vector set, if the keyword appears in the text to be analyzed, determine it as a keyword of the text to be analyzed; if it does not appear in the text to be analyzed, discard it.
Optionally, the apparatus further includes:
a filter unit, used to remove some keywords of the text to be analyzed if their number is greater than a preset quantity, so that the remaining number equals the preset quantity.
Optionally, the apparatus further includes:
an expansion unit, used to search the keyword vector set for keywords similar to the keywords of the text to be analyzed if their number is less than the preset quantity, and to determine the similar keywords found as additional keywords of the text to be analyzed.
In a third aspect, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the method steps of the above aspect when executing the computer program.
In a fourth aspect, a computer-readable storage medium is provided, storing computer instructions that, when run on a computer, enable the computer to execute the method described in the above aspect.
Embodiments of the present application provide a keyword extraction method that uses a Seq2seq network structure comprising an encoder and a decoder. The entire text is taken as the input of the Seq2seq neural network, so the network can understand the text's contextual information. In addition, the method requires no feature-vector extraction, avoiding the trouble of hand-crafting features from text as in TextRank. Since no subjective feature abstraction is needed, the implementation is relatively simple, keyword extraction applies to both long and short texts, and the results are more stable. Moreover, the method outputs vectors rather than keywords directly, giving it good generalization ability.
Detailed description of the invention
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first schematic structural diagram of a Seq2seq neural network provided by an embodiment of the present application;
Fig. 2 is a second schematic structural diagram of the Seq2seq neural network provided by an embodiment of the present application;
Fig. 3 is a third schematic structural diagram of the Seq2seq neural network provided by an embodiment of the present application;
Fig. 4 is a general flowchart of the keyword extraction processing algorithm provided by an embodiment of the present application;
Fig. 5 is a schematic flowchart of training the Seq2seq neural network provided by an embodiment of the present application;
Fig. 6 is a schematic flowchart of the text-based keyword extraction method provided by an embodiment of the present application;
Fig. 7 is another schematic flowchart of the text-based keyword extraction method provided by an embodiment of the present application;
Fig. 8 is a fourth schematic structural diagram of the Seq2seq neural network provided by an embodiment of the present application;
Figs. 9-11 are effect display diagrams of the text-based keyword extraction method provided by an embodiment of the present application;
Fig. 12 is a schematic structural diagram of the text-based keyword extraction apparatus provided by an embodiment of the present application;
Fig. 13 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
Specific embodiment
To make the purposes, technical solutions, and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on these embodiments, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application. Where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another in any way. Moreover, although a logical order is shown in the flowcharts, in some cases the steps may be performed in an order different from that shown or described herein.
To facilitate understanding of the technical solutions provided by the embodiments of the present application, some key terms used in the embodiments are first explained:
Text: the written form of language; from a literary perspective, usually a combination of one or more sentences with complete, systematic meaning. A text can be a sentence, a paragraph, or a discourse.
Keyword extraction: the technique of automatically extracting a text's keywords by computer.
APP: the abbreviation of "application"; here, an application program installed on a smart device.
Attention mechanism: derived from research on human vision. In cognitive science, because of bottlenecks in information processing, humans selectively attend to part of the available information while ignoring the rest; this is commonly known as the attention mechanism. Different positions of the human retina have different degrees of information processing capability, i.e., acuity; only the fovea has the highest acuity. To make rational use of limited visual processing resources, humans select a specific part of the visual field and then focus on it. For example, when reading, usually only a small number of the words to be read are attended to and processed. In summary, the attention mechanism has two main aspects: deciding which part of the input needs attention, and allocating limited processing resources to the important parts. In cognitive neuroscience, attention is a complex cognitive function indispensable to humans, referring to the ability to focus on some information while ignoring other information. In daily life, people receive large amounts of sensory input through vision, hearing, touch, and other modalities, yet the brain works without confusion under this bombardment of external information because it can, consciously or unconsciously, select a small fraction of useful information from the massive input for focused processing while ignoring the rest. This ability is called attention. Attention can be directed at external stimuli (auditory, visual, gustatory, etc.) or at internal consciousness (thinking, recollection, etc.).
In addition, the term "and/or" herein merely describes an association between related objects and indicates that three relationships may exist; for example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" herein, unless otherwise specified, generally indicates an "or" relationship between the objects before and after it.
In the related art, the TF-IDF method measures whether a word can serve as a text's keyword only from the angle of word frequency; it fails to incorporate the text's contextual information, so the extracted keywords have a limited scope of application. Classification methods are relatively difficult to implement when performing feature abstraction on text, and their keyword extraction likewise fails to consider contextual information. Although TextRank does combine the text's contextual information, its feature abstraction process is complex to implement and requires subjective human involvement, so it performs poorly and unstably on short texts and small corpora.
In view of this, an embodiment of the present application provides a keyword extraction method that uses a Seq2seq (sequence-to-sequence) network structure comprising an encoder and a decoder. The entire text is taken as the input of the Seq2seq neural network, so the network can understand the text's contextual information. In addition, the method requires no feature-vector extraction, avoiding the trouble of hand-crafting features from text as in TextRank. Since no subjective feature abstraction is needed, the implementation is relatively simple, keyword extraction applies to both long and short texts, and the results are more stable. Moreover, the method outputs vectors rather than keywords directly, giving it good generalization ability.
Having introduced the design concept of the embodiments of the present application, their implementation is further explained below.
One, Seq2seq neural network training
This section mainly introduces the composition of the Seq2seq neural network in the embodiments of the present application, and how to train this Seq2seq neural network to perform keyword mining.
Fig. 1 is a schematic structural diagram of the Seq2seq neural network, which includes an encoder 11 and a decoder 12. The encoder encodes the input data; the decoder processes the encoder's output and produces output vectors, each of which corresponds to a keyword.
During training, texts annotated with keywords are first obtained as the corpus. The selected corpus may include texts of different lengths. For each training text in the corpus, its matrix is constructed. Concretely, the training text is first segmented into words; each word is then converted into a word vector; finally, the word vectors are assembled into a matrix in the order the words appear in the training text. That is, the matrix includes the word vectors of the segmented words arranged in sequence, where the order is the order of the corresponding words in the training text. Correspondingly, the matrix of the text's keywords is constructed. In the text matrix, one vector corresponds to one segmented word; in the keyword matrix, one vector corresponds to one keyword.
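Under the assumption of a whitespace tokenizer and random placeholder embeddings (a real pipeline would use a proper segmenter and trained word2vec vectors), constructing the text matrix and keyword matrix for one training pair might look like:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy embedding table standing in for word2vec vectors (random placeholders).
vocab = ["deep", "learning", "extracts", "keywords", "from", "text"]
embeddings = {w: rng.normal(size=3) for w in vocab}

def text_to_matrix(text):
    """Split into tokens (whitespace here; a real system would use a
    segmenter) and stack the tokens' vectors in their original order."""
    tokens = text.lower().split()
    return np.stack([embeddings[t] for t in tokens])

X = text_to_matrix("Deep learning extracts keywords from text")  # input matrix
Y = text_to_matrix("keywords")                                   # keyword matrix
print(X.shape, Y.shape)  # one row per word / per keyword
```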
The text matrix is then used as the input of the Seq2seq neural network and the corresponding keyword matrix as its output, and the Seq2seq neural network is trained.
Further, to strengthen the segmented words that can serve as keywords and weaken those that cannot, the embodiments of the present application also introduce an attention mechanism into the Seq2seq neural network.
Fig. 2 shows another schematic structural diagram of the Seq2seq neural network, including the encoder 11 and decoder 12 as well as a neural network module 13 with an attention mechanism. This second neural network, the one with the attention mechanism, mainly serves to adjust the encoder's output so that important words in the text are strengthened and unimportant ones weakened. In this way, when the encoder's adjusted encoding result is fed to the decoder, important keywords can be mined more accurately.
In a specific implementation, as shown in Fig. 3, the aforementioned neural network module with the attention mechanism includes, connected in series, a fully connected layer 31, a random dropout layer 32, and a softmax normalization layer 33, wherein:
the fully connected layer processes the encoder's currently input word vector and the state of the preceding word vector;
the random dropout layer processes the output of the fully connected layer;
the softmax layer normalizes the output of the dropout layer to obtain the weight parameter of the preceding word vector.
In brief, the processing flow in the embodiments of the present application may include the four stages shown in Fig. 4:
Data preprocessing: segment the text and obtain the word vector of each segmented word.
Seq2seq neural network training: train the Seq2seq neural network on texts annotated with keywords to obtain a network that can extract keywords.
Seq2seq neural network prediction: use the trained Seq2seq neural network to mine the word vectors of the candidate keywords of the text to be analyzed (discussed in detail below).
Result post-processing: determine the keywords of the text to be analyzed from the vectors predicted by the Seq2seq neural network.
For example, as shown in Fig. 5, keywords are manually annotated for a batch of texts, which serve as the training corpus. Each text in the corpus is segmented to obtain a sequence of words, which is converted into word vectors to obtain the text sequence (labeled A); the keywords of each article are likewise converted into word vectors to obtain the keyword sequence (labeled B). A is then fed to the Seq2seq neural network for training, so that the Seq2seq neural network learns to output the text's keyword sequence B.
Two, Seq2seq neural network prediction
This part mainly introduces how to extract keywords with the Seq2seq neural network trained above. As shown in Fig. 6, a schematic flowchart of the method, it may include the following steps:
Step 601: construct the matrix of the text to be analyzed, the matrix including word vectors of the segmented words arranged in sequence, wherein the order is the order of the word vectors in the text to be analyzed.
In one embodiment, the text to be analyzed may be segmented to obtain the segmented words; each segmented word is then converted into a word vector; afterwards, the word vectors are assembled into the matrix in the order the segmented words appear in the text to be analyzed.
In one embodiment, each segmented word may be converted into a word vector by word2vec (a model for generating word vectors). In a specific implementation, stop words may also be removed to reduce the data volume of the matrix of the text to be analyzed.
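A minimal sketch of this preprocessing, with an invented stop-word list and toy embeddings (the patent names word2vec but fixes no vocabulary or stop-word set):

```python
import numpy as np

STOP_WORDS = {"the", "a", "of", "is"}   # illustrative stop-word list

def build_matrix(text, embeddings):
    """Segment, drop stop words to shrink the matrix, and stack the
    remaining words' vectors in their original order."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return tokens, np.stack([embeddings[t] for t in tokens])

emb = {"cat": np.array([1.0, 0.0]), "sat": np.array([0.0, 1.0]),
       "mat": np.array([1.0, 1.0])}
tokens, M = build_matrix("The cat sat of the mat", emb)
print(tokens, M.shape)  # stop words removed -> ['cat', 'sat', 'mat'] (3, 2)
```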
Step 602: input the matrix of the text to be analyzed into the pre-trained Seq2seq neural network to obtain an output matrix including at least one output vector; wherein the Seq2seq neural network is trained on a corpus annotated with keywords, and during training, when the input of the Seq2seq neural network is the matrix of a training text, the output is the matrix formed by the training text's keywords, each vector in the keyword matrix corresponding to a keyword.
Step 603: determine the keywords of the text to be analyzed according to the correspondence between output vectors and keywords.
In one embodiment, to mine keywords better, the attention mechanism is introduced as described above. That is, the Seq2seq neural network includes an encoder, a decoder, and a neural network module with an attention mechanism; the encoder and decoder are recurrent neural networks, and the module with the attention mechanism is used to adjust the encoder's encoding result for each word vector. In this way, important information is strengthened and unimportant information weakened, making keyword mining more accurate.
In one embodiment, when the neural network module with the attention mechanism is used, as shown in FIG. 7, inputting the matrix of the text to be analyzed into the pre-trained Seq2seq neural network to obtain the output matrix may include the following steps:
Step 701: input the word vectors in the matrix of the text to be analyzed into the encoder one by one, in their order in the text to be analyzed, and obtain the state of each input word vector;
Step 702: input the word vector currently being fed to the encoder, together with the state of the previous word vector, into the neural network module with the attention mechanism, and obtain a weight parameter for the previous word vector;
Step 703: multiply the weight parameter of the previous word vector by the state of the previous word vector to obtain the adjusted state of the previous word vector;
Step 704: input the adjusted states of the word vectors into the decoder in sequence to obtain the output matrix.
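Steps 701-704 can be sketched in numpy as follows. This is an assumed minimal model, not the patented network: the LSTM encoder, the FC+dropout+softmax attention module and the decoder are replaced by toy operations of matching shape (the last state is left unadjusted, since it has no following word vector in this simplification).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy stand-ins for the patent's encoder and attention module.
def encoder_step(x, prev_state):               # step 701: state of each word vector
    return np.tanh(x + 0.5 * prev_state)

def attention_weight(current_x, prev_state):   # step 702: weight of previous vector
    return softmax(current_x + prev_state)

def run_attention_seq2seq(matrix):
    dim = matrix.shape[1]
    states, state = [], np.zeros(dim)
    for x in matrix:                           # step 701: encode in text order
        state = encoder_step(x, state)
        states.append(state)
    adjusted = [states[0]]
    for i in range(1, len(states)):
        alpha = attention_weight(matrix[i], states[i - 1])  # step 702
        adjusted[i - 1] = alpha * states[i - 1]             # step 703: reweight
        adjusted.append(states[i])
    # step 704: the adjusted states would be fed to the decoder (identity stub here)
    return np.stack(adjusted)

out = run_attention_seq2seq(np.ones((3, 4)))
```

Because the weight parameter has the same dimension as the word vector, the multiplication in step 703 is elementwise, consistent with the description of FIG. 8 below.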
For example, suppose the matrix of a text includes the word vectors of a plurality of words. The first vector is input to the encoder, and the encoder obtains the state of that vector. When the second vector is processed, the second vector and the state of the first vector are input to the neural network module with the attention mechanism to obtain the weight parameter of the first vector. The weight parameter of the first vector is multiplied by the state of the first vector to obtain the vector that is input to the decoder. Each vector is processed in this way, so that every vector input to the decoder incorporates contextual information. Moreover, when the encoder is a recurrent neural network, the state of each vector also incorporates the state of the previous vector, so that contextual information is taken into account even further.
In one embodiment, after the predicted matrix is obtained, for each output vector in the matrix, the vector closest to the output vector may be looked up in a keyword vector set, and the keyword corresponding to the found vector is determined as a keyword of the text to be analyzed.
Of course, in specific implementation, the distance between an output vector and the vectors in the keyword vector set may be computed, and a corresponding vector is considered found in the set only when this distance is smaller than a specified distance. In this way, it can be ensured that an accurate vector is found.
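The lookup described above can be sketched as a nearest-neighbor search with a distance threshold. The keyword vector set, the Euclidean distance metric, and the threshold value below are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Toy keyword vector set standing in for the trained one.
KEYWORD_VECTORS = {
    "children": np.array([1.0, 0.0]),
    "game":     np.array([0.0, 1.0]),
}

def nearest_keyword(output_vector, max_distance=0.5):
    """Return the keyword whose vector is closest, or None beyond the threshold."""
    best_word, best_dist = None, float("inf")
    for word, vec in KEYWORD_VECTORS.items():
        dist = np.linalg.norm(output_vector - vec)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist < max_distance else None

match = nearest_keyword(np.array([0.9, 0.1]))   # close to "children"
miss = nearest_keyword(np.array([5.0, 5.0]))    # beyond the specified distance
```

Returning None for far-away output vectors realizes the "specified distance" check, which prevents an inaccurate vector from being matched.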
Further, in general, an extracted keyword should be contained in the text to be analyzed. Therefore, in the embodiment of the present application, for each keyword corresponding to a vector found in the keyword vector set: if the keyword is contained in the text to be analyzed, the keyword is determined as a keyword of the text to be analyzed; if the keyword is not contained in the text to be analyzed, the keyword is discarded. That is, if an extracted keyword does not appear in the text to be analyzed, it is unsuitable as a final keyword of that text and is filtered out against the original text. As a result, the extracted keywords are more accurate.
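The filtering step amounts to keeping only those candidate keywords that occur in the original text; a minimal sketch (names and data are illustrative):

```python
# Keep only candidate keywords that actually occur in the text to be analyzed.
def filter_keywords(candidates, text):
    return [kw for kw in candidates if kw in text]

kept = filter_keywords(["children", "game", "puzzle"],
                       "a game suitable for children")
```

Here "puzzle" is dropped because it does not appear in the text, mirroring the robustness filter described above.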
In one embodiment, the number of keywords can be set according to actual needs. When the decoder extracts too many keywords, some of them can be weeded out; when the decoder extracts too few, the keywords can be extended. This may be implemented in the following two respects:
1. Rejecting surplus keywords
If the number of keywords of the text to be analyzed is greater than a preset number, some keywords are removed from the keywords of the text to be analyzed so that the number of remaining keywords equals the preset number.
In one embodiment, which keywords to remove may be determined according to the distance between each output vector of the output matrix and the vectors in the keyword vector set; for example, the keywords with the larger distances are removed.
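The distance-based pruning above can be sketched as sorting candidates by their output-vector distance and keeping the preset number of closest ones; the candidate list and distances below are illustrative.

```python
# Rank candidate keywords by the distance between their output vector and the
# matched keyword vector; keep only the preset number of closest candidates.
def prune_keywords(candidates_with_distance, preset_quantity):
    ranked = sorted(candidates_with_distance, key=lambda item: item[1])
    return [word for word, _ in ranked[:preset_quantity]]

kept = prune_keywords([("game", 0.1), ("puzzle", 0.9), ("children", 0.2)], 2)
```

The farthest candidate ("puzzle", distance 0.9) is rejected, consistent with removing keywords with larger distances.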
2. Extending with similar keywords
In one embodiment, if the number of keywords of the text to be analyzed is less than the preset number, keywords similar to the keywords of the text to be analyzed are looked up in the keyword vector set, and the similar keywords found are determined as additional keywords of the text to be analyzed.
For example, suppose 3 keywords are actually needed, the decoder produces one output vector, and one keyword is found in the keyword vector set from that output vector. To extend the keywords, the vectors closest to the obtained keyword may be looked up in the keyword vector set, and the keywords corresponding to those closest vectors are used as the extended keywords.
Of course, in one embodiment, keywords semantically similar to the determined keywords may also be used as extended keywords. For example, "cute" and "adorable" are semantically similar to a certain extent, so the keyword "adorable" may be used as an extension of "cute".
The following describes in detail how keywords are mined using the attention mechanism. In the embodiment of the present application, the neural network module with the attention mechanism includes a fully connected layer, a random deactivation (dropout) layer and a softmax layer. FIG. 8 is a structural schematic diagram of the Seq2seq neural network provided by the embodiment of the present application. Both the encoder (Encoder) and the decoder (Decoder) may be recurrent neural networks, for example LSTM (Long Short-Term Memory) networks. The internal structure of the neural network with the attention mechanism (Attention), expanded as shown on the right side of FIG. 8, includes: a fully connected layer, a random deactivation layer and a normalization layer. Here, Input denotes the input word vector sequence, in1...inn denote the current word vectors, h1...hn denote the states of the word vectors preceding the current word vectors, and α1...αn denote the weight parameters of the preceding word vectors. For any word vector, the dimension of its weight parameter is the same as the dimension of the word vector.
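The attention module just described (fully connected layer, then random deactivation, then softmax) can be sketched in numpy as follows. The weights are random placeholders rather than trained parameters, and the layer sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

DIM = 4
W = rng.standard_normal((DIM, 2 * DIM))  # placeholder FC weights
b = np.zeros(DIM)

def attention_module(current_vec, prev_state, train=False, drop_p=0.5):
    # Fully connected layer over [current word vector; previous state].
    hidden = W @ np.concatenate([current_vec, prev_state]) + b
    if train:  # random deactivation (dropout), active only during training
        hidden = hidden * (rng.random(DIM) >= drop_p) / (1 - drop_p)
    # Normalization layer: the weight parameter sums to 1 and has the same
    # dimension as the word vector.
    return softmax(hidden)

alpha = attention_module(np.ones(DIM), np.zeros(DIM))
```

At inference time dropout is skipped, so the module reduces to a fully connected layer followed by softmax, as in the forward pass described next.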
When keywords of a text to be analyzed are mined, the matrix formed by the word vectors of the text is input to the encoder, and the encoder processes the word vectors one by one to obtain the state of each word vector. The current vector and the state of the previous word vector are input to the neural network module with the attention mechanism; after being processed by the fully connected layer of that module, the result is passed to the random deactivation layer, and finally to the normalization layer, which yields the weight parameter of the previous word vector with respect to the current word vector. The state of the previous word vector is then multiplied (Multi) by its weight parameter and input to the decoder for processing.
The decoder decodes the input vectors to obtain output vectors; vectors matching the respective output vectors are then found in the keyword vector set, and the keywords corresponding to the matched vectors are determined as the keywords of the text to be analyzed.
When the Seq2seq neural network provided by the embodiment of the present application is used to extract keywords, since the output consists of keyword vectors rather than specific keywords, the Seq2seq neural network has better generalization ability. In addition, filtering out keywords that do not appear in the text to be analyzed means that the output is filtered against the original text, which improves the robustness of keyword extraction. Furthermore, since contextual information is fully considered in the extraction process, ambiguity can be suppressed, effectively improving the accuracy of keyword extraction.
The results of the keyword extraction method provided by the embodiment of the present application are illustrated below with reference to three measured results.
1) FIG. 9 shows the output of the attention module (the neural network with the attention mechanism) when keywords are mined from the description text of the App "Honor of Kings". In FIG. 9, a lighter color indicates a higher weight, and it can be seen that the weights at the positions where the annotated keywords (i.e., the underlined keywords in FIG. 9) appear are strengthened. Therefore, in the Seq2seq neural network provided by the embodiment of the present application, the attention module plays its role in mining keywords well.
2) FIG. 10 is an effect diagram of the keywords obtained after keyword extraction is performed on a text describing a game. It can be seen that, for this game ("Great Wilderness Records"), the extracted keywords include the game's title, "strategy", "rpg", "nurturing" and "immortal cultivation", and these keywords accurately describe the content of the corresponding text.
For the game "Hero Kill" the effect is similar, and it is not described again here.
3) For a text to be analyzed that is a search term, the amount of text is usually small. The keyword extraction scheme provided by the embodiment of the present application also works well on such short texts, and the extracted keywords can then be used for information retrieval.
For example, as shown in FIG. 11, suppose the input search term is "games suitable for children to play". The keywords extracted from this term by the scheme provided by the embodiment of the present application include "children" and "games"; then, assuming that 4 keywords are needed, the keywords "jigsaw puzzle" and "intelligence development" are added by extension. As a result, when an App search is performed, a children's jigsaw-puzzle game can be accurately located as a recommendation.
In the embodiment of the present application, how many keywords are ultimately needed can be determined according to actual needs. When one keyword is needed, the hit rate of the keywords can reach 96%; when multiple keywords are needed, the hit rate can reach 84%. Therefore, the Seq2seq neural network provided by the embodiment of the present application extracts keywords well.
Referring to FIG. 12, based on the same inventive concept, the embodiment of the present application further provides a text-based keyword extraction apparatus, including:
a text matrix construction unit 1201, configured to construct a matrix of a text to be analyzed, the matrix including sequentially arranged word vectors of words, where the arrangement order is the order of the word vectors in the text to be analyzed;
an output matrix determination unit 1202, configured to input the matrix of the text to be analyzed into a pre-trained Seq2seq neural network to obtain an output matrix including at least one output vector, where the Seq2seq neural network is obtained by training on a corpus annotated with keywords, and during training the input of the Seq2seq neural network is the matrix of a training text and the output is a matrix composed of the keywords corresponding to the training text, each vector in the keyword matrix corresponding to one keyword; and
a keyword determination unit 1203, configured to determine the keywords of the text to be analyzed according to the correspondence between output vectors and keywords.
Optionally, the Seq2seq neural network includes an encoder, a decoder and a neural network module with an attention mechanism; the encoder and the decoder are recurrent neural networks, and the neural network module with the attention mechanism is configured to adjust the encoding result of the encoder for each word vector.
Optionally, the output matrix determination unit is configured to:
input the word vectors in the matrix of the text to be analyzed into the encoder one by one, in their order in the text to be analyzed, and obtain the state of each input word vector;
input the word vector currently being fed to the encoder, together with the state of the previous word vector, into the neural network module with the attention mechanism, and obtain the weight parameter of the previous word vector;
multiply the weight parameter of the previous word vector by the state of the previous word vector to obtain the adjusted state of the previous word vector; and
input the adjusted states of the word vectors into the decoder in sequence to obtain the output matrix.
Optionally, the neural network module with the attention mechanism includes a fully connected layer, a random deactivation layer and a normalization (softmax) layer connected in series:
the fully connected layer is configured to process the input current word vector of the encoder and the state of the previous word vector of the current word vector;
the random deactivation layer is configured to process the processing result of the fully connected layer; and
the softmax layer is configured to normalize the processing result of the random deactivation layer to obtain the weight parameter of the previous word vector.
Optionally, the text matrix construction unit is configured to:
perform word segmentation on the text to be analyzed to obtain individual words;
convert each word into a word vector; and
construct the matrix from the word vectors of the words in the order in which the words appear in the text to be analyzed.
Optionally, the keyword determination unit is configured to:
look up, in a keyword vector set, the vector closest to an output vector; and
determine the keyword corresponding to the found vector as a keyword of the text to be analyzed.
Optionally, the keyword determination unit is configured to:
for each keyword corresponding to a vector found in the keyword vector set, determine the keyword as a keyword of the text to be analyzed if the keyword is contained in the text to be analyzed, and discard the keyword if it is not contained in the text to be analyzed.
Optionally, the apparatus further includes:
a filtering unit, configured to, if the number of keywords of the text to be analyzed is greater than a preset number, remove some keywords from the keywords of the text to be analyzed so that the number of remaining keywords equals the preset number.
Optionally, the apparatus further includes:
an extension unit, configured to, if the number of keywords of the text to be analyzed is less than the preset number, look up, in the keyword vector set, keywords similar to the keywords of the text to be analyzed, and determine the similar keywords found as additional keywords of the text to be analyzed.
Referring to FIG. 13, based on the same technical concept, the embodiment of the present application further provides a computer device 130, which may include a memory 1301 and a processor 1302.
The memory 1301 is configured to store a computer program executed by the processor 1302. The memory 1301 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, programs required by at least one function, and the like, and the data storage area may store data created according to the use of the computer device, and the like. The processor 1302 may be a central processing unit (CPU), a digital processing unit, or the like. The specific connection medium between the memory 1301 and the processor 1302 is not limited in the embodiment of the present application. In FIG. 13, the memory 1301 and the processor 1302 are connected by a bus 1303, the bus 1303 being indicated by a thick line; the connection manner between other components is only schematically illustrated and is not limiting. The bus 1303 may be divided into an address bus, a data bus, a control bus and the like. For ease of representation, only one thick line is drawn in FIG. 13, but this does not mean that there is only one bus or only one type of bus.
The memory 1301 may be a volatile memory, such as a random-access memory (RAM); the memory 1301 may also be a non-volatile memory, such as a read-only memory, a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or the memory 1301 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1301 may also be a combination of the above memories.
The processor 1302 is configured to execute, when calling the computer program stored in the memory 1301, the method performed by the device in the embodiments shown in FIGS. 6-7.
In some possible embodiments, various aspects of the method provided by the present application may also be implemented in the form of a program product including program code; when the program product runs on a computer device, the program code causes the computer device to execute the steps of the method according to the various exemplary embodiments of the application described above in this specification. For example, the computer device may execute the method performed by the device in the embodiments shown in FIGS. 6-7.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
Although preferred embodiments of the application have been described, a person skilled in the art, once aware of the basic creative concept, can make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications falling within the scope of the application.
Obviously, those skilled in the art can make various modifications and variations to the application without departing from the spirit and scope of the application. Thus, if these modifications and variations of the application fall within the scope of the claims of the application and their technical equivalents, the application is also intended to include them.
Claims (10)
1. A text-based keyword extraction method, characterized in that the method includes:
constructing a matrix of a text to be analyzed, the matrix including sequentially arranged word vectors of words, where the arrangement order is the order of the word vectors in the text to be analyzed;
inputting the matrix of the text to be analyzed into a pre-trained Seq2seq neural network to obtain an output matrix including at least one output vector, where the Seq2seq neural network is obtained by training on a corpus annotated with keywords, and during training the input of the Seq2seq neural network is the matrix of a training text and the output is a matrix composed of the keywords corresponding to the training text, each vector in the keyword matrix corresponding to one keyword; and
determining the keywords of the text to be analyzed according to the correspondence between output vectors and keywords.
2. The method according to claim 1, characterized in that the Seq2seq neural network includes an encoder, a decoder and a neural network module with an attention mechanism, the encoder and the decoder being recurrent neural networks, and the neural network module with the attention mechanism being configured to adjust the encoding result of the encoder for each word vector.
3. The method according to claim 2, characterized in that inputting the matrix of the text to be analyzed into the pre-trained Seq2seq neural network to obtain the output matrix includes:
inputting the word vectors in the matrix of the text to be analyzed into the encoder one by one, in their order in the text to be analyzed, and obtaining the state of each input word vector;
inputting the word vector currently being fed to the encoder, together with the state of the previous word vector, into the neural network module with the attention mechanism, and obtaining the weight parameter of the previous word vector;
multiplying the weight parameter of the previous word vector by the state of the previous word vector to obtain the adjusted state of the previous word vector; and
inputting the adjusted states of the word vectors into the decoder in sequence to obtain the output matrix.
4. The method according to claim 3, characterized in that the neural network module with the attention mechanism includes a fully connected layer, a random deactivation layer and a normalization (softmax) layer connected in series:
the fully connected layer being configured to process the input current word vector of the encoder and the state of the previous word vector of the current word vector;
the random deactivation layer being configured to process the processing result of the fully connected layer; and
the softmax layer being configured to normalize the processing result of the random deactivation layer to obtain the weight parameter of the previous word vector.
5. The method according to claim 1, characterized in that constructing the matrix of the text to be analyzed includes:
performing word segmentation on the text to be analyzed to obtain individual words;
converting each word into a word vector; and
constructing the matrix from the word vectors of the words in the order in which the words appear in the text to be analyzed.
6. The method according to claim 1, characterized in that determining the keywords of the text to be analyzed according to the correspondence between output vectors and keywords includes:
looking up, in a keyword vector set, the vector closest to an output vector; and
determining the keyword corresponding to the found vector as a keyword of the text to be analyzed.
7. The method according to claim 5, characterized in that determining the keyword corresponding to the found vector as a keyword of the text to be analyzed includes:
for each keyword corresponding to a vector found in the keyword vector set, determining the keyword as a keyword of the text to be analyzed if the keyword is contained in the text to be analyzed, and discarding the keyword if it is not contained in the text to be analyzed.
8. The method according to claim 1, characterized in that the method further includes:
if the number of keywords of the text to be analyzed is greater than a preset number, removing some keywords from the keywords of the text to be analyzed so that the number of remaining keywords equals the preset number.
9. The method according to claim 1, characterized in that the method further includes:
if the number of keywords of the text to be analyzed is less than a preset number, looking up, in the keyword vector set, keywords similar to the keywords of the text to be analyzed; and
determining the similar keywords found as additional keywords of the text to be analyzed.
10. A computer device including a memory, a processor and a computer program stored on the memory and runnable on the processor, characterized in that:
the processor, when executing the computer program, implements the method steps of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910360872.1A CN110110330B (en) | 2019-04-30 | 2019-04-30 | Keyword extraction method based on text and computer equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910360872.1A CN110110330B (en) | 2019-04-30 | 2019-04-30 | Keyword extraction method based on text and computer equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110110330A true CN110110330A (en) | 2019-08-09 |
CN110110330B CN110110330B (en) | 2023-08-11 |
Family
ID=67487802
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910360872.1A Active CN110110330B (en) | 2019-04-30 | 2019-04-30 | Keyword extraction method based on text and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110110330B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110598213A (en) * | 2019-09-06 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Keyword extraction method, device, equipment and storage medium |
CN110610003A (en) * | 2019-08-15 | 2019-12-24 | 阿里巴巴集团控股有限公司 | Method and system for assisting text annotation |
CN110705268A (en) * | 2019-09-02 | 2020-01-17 | 平安科技(深圳)有限公司 | Article subject extraction method and device based on artificial intelligence and computer-readable storage medium |
CN110796160A (en) * | 2019-09-16 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Text classification method, device and storage medium |
CN110866393A (en) * | 2019-11-19 | 2020-03-06 | 北京网聘咨询有限公司 | Resume information extraction method and system based on domain knowledge base |
CN110991175A (en) * | 2019-12-10 | 2020-04-10 | 爱驰汽车有限公司 | Text generation method, system, device and storage medium under multiple modes |
CN111178041A (en) * | 2019-12-31 | 2020-05-19 | 北京妙笔智能科技有限公司 | Intelligent text repeat system and method |
CN111274815A (en) * | 2020-01-15 | 2020-06-12 | 北京百度网讯科技有限公司 | Method and device for mining entity attention points in text |
CN111667192A (en) * | 2020-06-12 | 2020-09-15 | 北京卓越讯通科技有限公司 | Safety production risk assessment method based on NLP big data |
WO2021042516A1 (en) * | 2019-09-02 | 2021-03-11 | 平安科技(深圳)有限公司 | Named-entity recognition method and device, and computer readable storage medium |
WO2021147363A1 (en) * | 2020-01-20 | 2021-07-29 | 中国电子科技集团公司电子科学研究院 | Text-based major depressive disorder recognition method |
CN113360639A (en) * | 2020-03-06 | 2021-09-07 | 上海卓繁信息技术股份有限公司 | Short text emotion classification method and device and storage device |
CN114048742A (en) * | 2021-10-26 | 2022-02-15 | 北京师范大学 | Knowledge entity and relation extraction method of text information and text quality evaluation method |
WO2022134759A1 (en) * | 2020-12-21 | 2022-06-30 | 深圳壹账通智能科技有限公司 | Keyword generation method and apparatus, and electronic device and computer storage medium |
CN115114913A (en) * | 2021-03-18 | 2022-09-27 | 马上消费金融股份有限公司 | Labeling method, device, equipment and readable storage medium |
WO2023060795A1 (en) * | 2021-10-12 | 2023-04-20 | 平安科技(深圳)有限公司 | Automatic keyword extraction method and apparatus, and device and storage medium |
CN114048742B (en) * | 2021-10-26 | 2024-09-06 | 北京师范大学 | Knowledge entity and relation extraction method of text information and text quality assessment method |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018086470A1 (en) * | 2016-11-10 | 2018-05-17 | 腾讯科技(深圳)有限公司 | Keyword extraction method and device, and server |
CN108304364A (en) * | 2017-02-23 | 2018-07-20 | 腾讯科技(深圳)有限公司 | keyword extracting method and device |
CN108376131A (en) * | 2018-03-14 | 2018-08-07 | 中山大学 | Keyword abstraction method based on seq2seq deep neural network models |
WO2018153265A1 (en) * | 2017-02-23 | 2018-08-30 | 腾讯科技(深圳)有限公司 | Keyword extraction method, computer device, and storage medium |
CN108536678A (en) * | 2018-04-12 | 2018-09-14 | 腾讯科技(深圳)有限公司 | Text key message extracting method, device, computer equipment and storage medium |
CN108959396A (en) * | 2018-06-04 | 2018-12-07 | 众安信息技术服务有限公司 | Machine reading model training method and device, answering method and device |
CN109299268A (en) * | 2018-10-24 | 2019-02-01 | 河南理工大学 | A kind of text emotion analysis method based on dual channel model |
CN109446328A (en) * | 2018-11-02 | 2019-03-08 | 成都四方伟业软件股份有限公司 | A kind of text recognition method, device and its storage medium |
CN109446519A (en) * | 2018-10-10 | 2019-03-08 | 西安交通大学 | A kind of text feature of fused data classification information |
CN109471933A (en) * | 2018-10-11 | 2019-03-15 | 平安科技(深圳)有限公司 | A kind of generation method of text snippet, storage medium and server |
CN109472024A (en) * | 2018-10-25 | 2019-03-15 | 安徽工业大学 | A kind of file classification method based on bidirectional circulating attention neural network |
CN109597884A (en) * | 2018-12-28 | 2019-04-09 | 北京百度网讯科技有限公司 | Talk with method, apparatus, storage medium and the terminal device generated |
CN109635284A (en) * | 2018-11-26 | 2019-04-16 | 北京邮电大学 | Text snippet method and system based on deep learning associate cumulation attention mechanism |
2019-04-30: CN CN201910360872.1A patent/CN110110330B/en, status: Active
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018086470A1 (en) * | 2016-11-10 | 2018-05-17 | 腾讯科技(深圳)有限公司 | Keyword extraction method and device, and server |
CN108304364A (en) * | 2017-02-23 | 2018-07-20 | 腾讯科技(深圳)有限公司 | keyword extracting method and device |
WO2018153265A1 (en) * | 2017-02-23 | 2018-08-30 | 腾讯科技(深圳)有限公司 | Keyword extraction method, computer device, and storage medium |
CN108376131A (en) * | 2018-03-14 | 2018-08-07 | 中山大学 | Keyword abstraction method based on seq2seq deep neural network models |
CN108536678A (en) * | 2018-04-12 | 2018-09-14 | 腾讯科技(深圳)有限公司 | Text key message extracting method, device, computer equipment and storage medium |
CN108959396A (en) * | 2018-06-04 | 2018-12-07 | 众安信息技术服务有限公司 | Machine reading model training method and device, answering method and device |
CN109446519A (en) * | 2018-10-10 | 2019-03-08 | 西安交通大学 | A kind of text feature of fused data classification information |
CN109471933A (en) * | 2018-10-11 | 2019-03-15 | 平安科技(深圳)有限公司 | A kind of generation method of text snippet, storage medium and server |
CN109299268A (en) * | 2018-10-24 | 2019-02-01 | 河南理工大学 | A kind of text emotion analysis method based on dual channel model |
CN109472024A (en) * | 2018-10-25 | 2019-03-15 | 安徽工业大学 | A kind of file classification method based on bidirectional circulating attention neural network |
CN109446328A (en) * | 2018-11-02 | 2019-03-08 | 成都四方伟业软件股份有限公司 | A kind of text recognition method, device and its storage medium |
CN109635284A (en) * | 2018-11-26 | 2019-04-16 | 北京邮电大学 | Text snippet method and system based on deep learning associate cumulation attention mechanism |
CN109597884A (en) * | 2018-12-28 | 2019-04-09 | 北京百度网讯科技有限公司 | Talk with method, apparatus, storage medium and the terminal device generated |
Non-Patent Citations (3)
Title |
---|
XIAOYU LIU: "Generating Keyword Queries for Natural Language Queries to Alleviate Lexical Chasm Problem", CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management *
HE Hongye; ZHENG Jin; ZHANG Zuping: "Text sentiment analysis combining part-of-speech features and convolutional neural networks", Computer Engineering, no. 11 *
WANG Shengyu; ZENG Biqing; SHANG Qi; HAN Xuli: "Sentiment analysis based on a word-attention convolutional neural network model", Journal of Chinese Information Processing, no. 09 *
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110610003B (en) * | 2019-08-15 | 2023-09-15 | 创新先进技术有限公司 | Method and system for assisting text annotation |
CN110610003A (en) * | 2019-08-15 | 2019-12-24 | 阿里巴巴集团控股有限公司 | Method and system for assisting text annotation |
CN110705268A (en) * | 2019-09-02 | 2020-01-17 | 平安科技(深圳)有限公司 | Article subject matter extraction method and device based on artificial intelligence and computer-readable storage medium |
CN110705268B (en) * | 2019-09-02 | 2024-06-25 | 平安科技(深圳)有限公司 | Article subject matter extraction method and device based on artificial intelligence and computer-readable storage medium |
WO2021042517A1 (en) * | 2019-09-02 | 2021-03-11 | 平安科技(深圳)有限公司 | Artificial intelligence-based article gist extraction method and device, and storage medium |
WO2021042516A1 (en) * | 2019-09-02 | 2021-03-11 | 平安科技(深圳)有限公司 | Named-entity recognition method and device, and computer readable storage medium |
CN110598213A (en) * | 2019-09-06 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Keyword extraction method, device, equipment and storage medium |
CN110796160A (en) * | 2019-09-16 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Text classification method, device and storage medium |
CN110866393A (en) * | 2019-11-19 | 2020-03-06 | 北京网聘咨询有限公司 | Resume information extraction method and system based on domain knowledge base |
CN110991175A (en) * | 2019-12-10 | 2020-04-10 | 爱驰汽车有限公司 | Multi-modal text generation method, system, device and storage medium |
CN110991175B (en) * | 2019-12-10 | 2024-04-09 | 爱驰汽车有限公司 | Multi-modal text generation method, system, device and storage medium |
CN111178041A (en) * | 2019-12-31 | 2020-05-19 | 北京妙笔智能科技有限公司 | Intelligent text paraphrasing system and method |
CN111178041B (en) * | 2019-12-31 | 2023-04-07 | 北京妙笔智能科技有限公司 | Intelligent text paraphrasing system and method |
CN111274815A (en) * | 2020-01-15 | 2020-06-12 | 北京百度网讯科技有限公司 | Method and device for mining entity focus in text |
CN111274815B (en) * | 2020-01-15 | 2024-04-12 | 北京百度网讯科技有限公司 | Method and device for mining entity focus in text |
US11775761B2 (en) | 2020-01-15 | 2023-10-03 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for mining entity focus in text |
WO2021147363A1 (en) * | 2020-01-20 | 2021-07-29 | 中国电子科技集团公司电子科学研究院 | Text-based major depressive disorder recognition method |
CN113360639A (en) * | 2020-03-06 | 2021-09-07 | 上海卓繁信息技术股份有限公司 | Short text emotion classification method and device and storage device |
CN111667192A (en) * | 2020-06-12 | 2020-09-15 | 北京卓越讯通科技有限公司 | Safety production risk assessment method based on NLP big data |
WO2022134759A1 (en) * | 2020-12-21 | 2022-06-30 | 深圳壹账通智能科技有限公司 | Keyword generation method and apparatus, and electronic device and computer storage medium |
CN115114913B (en) * | 2021-03-18 | 2024-02-06 | 马上消费金融股份有限公司 | Labeling method, labeling device, labeling equipment and readable storage medium |
CN115114913A (en) * | 2021-03-18 | 2022-09-27 | 马上消费金融股份有限公司 | Labeling method, device, equipment and readable storage medium |
WO2023060795A1 (en) * | 2021-10-12 | 2023-04-20 | 平安科技(深圳)有限公司 | Automatic keyword extraction method and apparatus, and device and storage medium |
CN114048742A (en) * | 2021-10-26 | 2022-02-15 | 北京师范大学 | Knowledge entity and relation extraction method of text information and text quality evaluation method |
CN114048742B (en) * | 2021-10-26 | 2024-09-06 | 北京师范大学 | Knowledge entity and relation extraction method of text information and text quality assessment method |
Also Published As
Publication number | Publication date |
---|---|
CN110110330B (en) | 2023-08-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110110330A (en) | Text based keyword extracting method and computer equipment | |
Bamman et al. | An annotated dataset of coreference in English literature | |
Norris | Models of visual word recognition | |
CN108829719A (en) | Non-factoid question answer selection method and system |
WO2015093541A1 (en) | Scenario generation device and computer program therefor | |
CN110083705A (en) | Multi-hop attention deep model, method, storage medium and terminal for targeted sentiment classification |
CN110263324A (en) | Text processing method, model training method and device |
CN104484411B (en) | Construction method of a dictionary-based semantic knowledge base |
CN112650840A (en) | Intelligent medical question-answering processing method and system based on knowledge graph reasoning | |
JP6403382B2 (en) | Phrase pair collection device and computer program therefor | |
CN110188272A (en) | Tag recommendation method for community question answering websites based on user context |
JP5907393B2 (en) | Complex predicate template collection device and computer program therefor | |
CN108549658A (en) | Deep learning video question answering method and system based on an attention mechanism over syntactic parse trees |
Barhoom et al. | Sarcasm detection in headline news using machine and deep learning algorithms | |
CN114201683A (en) | Interest activation news recommendation method and system based on multi-level matching | |
CN114386410A (en) | Training method and text processing method of pre-training model | |
CN108920446A (en) | Processing method for engineering documents |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN116257698A (en) | Social network sensitivity and graceful language detection method based on supervised learning | |
Kim et al. | Enhancing Korean named entity recognition with linguistic tokenization strategies | |
CN117312514A (en) | Consultation reply method, consultation reply device and computer readable storage medium | |
Peng et al. | Encoding Text Information By Pre-trained Model For Authorship Verification. | |
Xu et al. | Research on depression tendency detection based on image and text fusion | |
CN106528764A (en) | Retrieval method and device for question-type search terms |
CN110245230A (en) | Book grading method, system, storage medium and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||