CN110110330A - Text based keyword extracting method and computer equipment - Google Patents

Text based keyword extracting method and computer equipment

Info

Publication number
CN110110330A
CN110110330A
Authority
CN
China
Prior art keywords
keyword
text
vector
analyzed
term vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910360872.1A
Other languages
Chinese (zh)
Other versions
CN110110330B (en)
Inventor
李钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910360872.1A priority Critical patent/CN110110330B/en
Publication of CN110110330A publication Critical patent/CN110110330A/en
Application granted granted Critical
Publication of CN110110330B publication Critical patent/CN110110330B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

This application discloses a text-based keyword extraction method and a computer device, in the field of artificial intelligence, for efficiently mining the keywords in a text. The method adopts a Seq2seq network structure that includes an encoder, a decoder, and a neural network module with an attention mechanism that adjusts the output of the encoder. The entire text is taken as the input, so the neural network can understand the contextual information of the text. Because no feature vectors need to be extracted, the trouble of abstracting features from the text, as in TextRank, is avoided. Since no subjective feature abstraction is required, the implementation is relatively simple, the keyword extraction applies to both long and short texts, and the effect is more stable. In addition, the method outputs vectors rather than keywords, which gives it good generalization ability. Furthermore, the introduction of the attention mechanism makes keyword mining more accurate.

Description

Text-based keyword extraction method and computer device
Technical field
This application relates to the field of artificial intelligence, and in particular to a text-based keyword extraction method and a computer device.
Background technique
To facilitate understanding and retrieval, the meaning of a text is usually expressed with a few keywords. Since different words differ in their ability to express semantics, different words also reflect the gist of a text to different degrees. How to extract the keywords that express the gist of a text is an important topic in the field of natural language processing. Keyword extraction is also widely used in fields such as content recommendation and semantic search.
Related-art indices for measuring word importance include TF-IDF (term frequency-inverse document frequency), TextRank (an automatic summarization algorithm), and classification methods. TF-IDF weights word frequency against document frequency to measure the importance of a word to a text; TextRank measures word importance from the contextual relations between words; classification methods convert keyword mining into a classification problem, dividing the words of a text into keywords and non-keywords through feature extraction, neural network training, and neural network prediction. However, each of the above methods has its own shortcomings and performs unsatisfactorily in practice.
Summary of the invention
The embodiments of the present application provide a text-based keyword extraction method and a computer device for extracting keywords intelligently and relatively accurately.
In a first aspect, a text-based keyword extraction method is provided, the method comprising:
constructing a matrix of the text to be analyzed, the matrix comprising word vectors of the segmented words arranged in order, wherein the order is the order of the corresponding words in the text to be analyzed;
inputting the matrix of the text to be analyzed into a pre-trained Seq2seq (sequence to sequence) neural network to obtain an output matrix comprising at least one output vector; wherein the Seq2seq neural network is trained on a corpus labeled with keywords, and during training, the input of the Seq2seq neural network is the matrix of a training text and the output is the matrix formed by the keywords of that training text, each vector in the keyword matrix corresponding to a keyword;
determining the keywords of the text to be analyzed according to the correspondence between output vectors and keywords.
Optionally, the Seq2seq neural network comprises an encoder, a decoder, and a neural network module with an attention mechanism; the encoder and the decoder are recurrent neural networks, and the neural network module with the attention mechanism is used to adjust the encoder's encoding result for each word vector.
Optionally, inputting the matrix of the text to be analyzed into the pre-trained Seq2seq neural network to obtain the output matrix comprises:
inputting the word vectors in the matrix of the text to be analyzed into the encoder one by one, in their order in the text to be analyzed, to obtain the state of each input word vector;
inputting the current input word vector of the encoder and the state of the word vector preceding the current input word vector into the neural network module with the attention mechanism, to obtain a weight parameter for the preceding word vector;
multiplying the weight parameter of the preceding word vector by the state of the preceding word vector to obtain the adjusted state of the preceding word vector;
inputting the adjusted states of the word vectors into the decoder in order, to obtain the output matrix.
Optionally, the neural network module with the attention mechanism comprises, connected in series, a fully connected layer, a random deactivation (dropout) layer, and a normalization (softmax) layer;
the fully connected layer is used to process the current input word vector of the encoder and the state of the word vector preceding the current input word vector;
the random deactivation layer is used to process the output of the fully connected layer;
the softmax layer is used to normalize the output of the random deactivation layer to obtain the weight parameter of the preceding word vector.
Optionally, constructing the matrix of the text to be analyzed comprises:
performing word segmentation on the text to be analyzed to obtain the segmented words;
converting each segmented word into a word vector;
constructing the matrix from the word vectors in the order in which the segmented words occur in the text to be analyzed.
Optionally, determining the keywords of the text to be analyzed according to the correspondence between output vectors and keywords comprises:
searching a keyword vector set for the vector closest in distance to the output vector;
determining the keyword corresponding to the found vector as a keyword of the text to be analyzed.
Optionally, determining the keyword corresponding to the found vector as a keyword of the text to be analyzed comprises:
for each keyword corresponding to a vector found in the keyword vector set: if the keyword is contained in the text to be analyzed, determining it as a keyword of the text to be analyzed; if the keyword is not contained in the text to be analyzed, discarding it.
Optionally, the method further comprises:
if the number of keywords of the text to be analyzed is greater than a preset number, removing some keywords from the keywords of the text to be analyzed so that the number of remaining keywords equals the preset number.
Optionally, the method further comprises:
if the number of keywords of the text to be analyzed is less than the preset number, searching the keyword vector set for keywords similar to the keywords of the text to be analyzed;
determining the similar keywords found as additional keywords of the text to be analyzed.
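As an illustration only, the nearest-vector lookup, containment check, and count trimming described above might be sketched as follows. The function and variable names are hypothetical (not from the patent), and the expansion step with similar keywords is omitted for brevity:

```python
import math

def nearest_keyword(vec, keyword_vectors):
    # Pick the keyword whose vector is closest (Euclidean distance) to vec.
    return min(keyword_vectors, key=lambda k: math.dist(vec, keyword_vectors[k]))

def postprocess(output_vectors, keyword_vectors, text, preset_count):
    # Map each output vector of the network to its nearest keyword.
    candidates = [nearest_keyword(v, keyword_vectors) for v in output_vectors]
    # Keep a candidate only if it actually occurs in the analyzed text.
    keywords = [k for k in candidates if k in text]
    # If more keywords than the preset number remain, trim the surplus.
    return keywords[:preset_count]
```

A real implementation would likely use cosine similarity over a large embedding table; Euclidean distance is used here only to keep the sketch self-contained.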
In a second aspect, an embodiment of the present application further provides a text-based keyword extraction apparatus, the apparatus comprising:
a text matrix construction unit, configured to construct a matrix of the text to be analyzed, the matrix comprising word vectors of the segmented words arranged in order, wherein the order is the order of the corresponding words in the text to be analyzed;
an output matrix determination unit, configured to input the matrix of the text to be analyzed into a pre-trained Seq2seq neural network to obtain an output matrix comprising at least one output vector; wherein the Seq2seq neural network is trained on a corpus labeled with keywords, and during training, the input of the Seq2seq neural network is the matrix of a training text and the output is the matrix formed by the keywords of that training text, each vector in the keyword matrix corresponding to a keyword;
a keyword determination unit, configured to determine the keywords of the text to be analyzed according to the correspondence between output vectors and keywords.
Optionally, the Seq2seq neural network comprises an encoder, a decoder, and a neural network module with an attention mechanism; the encoder and the decoder are recurrent neural networks, and the neural network module with the attention mechanism is used to adjust the encoder's encoding result for each word vector.
Optionally, the output matrix determination unit is configured to:
input the word vectors in the matrix of the text to be analyzed into the encoder one by one, in their order in the text to be analyzed, to obtain the state of each input word vector;
input the current input word vector of the encoder and the state of the word vector preceding the current input word vector into the neural network module with the attention mechanism, to obtain a weight parameter for the preceding word vector;
multiply the weight parameter of the preceding word vector by the state of the preceding word vector to obtain the adjusted state of the preceding word vector;
input the adjusted states of the word vectors into the decoder in order, to obtain the output matrix.
Optionally, the neural network module with the attention mechanism comprises, connected in series, a fully connected layer, a random deactivation (dropout) layer, and a normalization (softmax) layer;
the fully connected layer is used to process the current input word vector of the encoder and the state of the word vector preceding the current input word vector;
the random deactivation layer is used to process the output of the fully connected layer;
the softmax layer is used to normalize the output of the random deactivation layer to obtain the weight parameter of the preceding word vector.
Optionally, the text matrix construction unit is configured to:
perform word segmentation on the text to be analyzed to obtain the segmented words;
convert each segmented word into a word vector;
construct the matrix from the word vectors in the order in which the segmented words occur in the text to be analyzed.
Optionally, the keyword determination unit is configured to:
search a keyword vector set for the vector closest in distance to the output vector;
determine the keyword corresponding to the found vector as a keyword of the text to be analyzed.
Optionally, the keyword determination unit is configured to:
for each keyword corresponding to a vector found in the keyword vector set: if the keyword is contained in the text to be analyzed, determine it as a keyword of the text to be analyzed; if the keyword is not contained in the text to be analyzed, discard it.
Optionally, the apparatus further comprises:
a filtering unit, configured to, if the number of keywords of the text to be analyzed is greater than a preset number, remove some keywords from the keywords of the text to be analyzed so that the number of remaining keywords equals the preset number.
Optionally, the apparatus further comprises:
an expansion unit, configured to, if the number of keywords of the text to be analyzed is less than the preset number, search the keyword vector set for keywords similar to the keywords of the text to be analyzed;
and determine the similar keywords found as additional keywords of the text to be analyzed.
In a third aspect, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor,
wherein the processor, when executing the computer program, implements the method steps of the above aspects.
In a fourth aspect, a computer-readable storage medium is provided,
wherein the computer-readable storage medium stores computer instructions which, when run on a computer, enable the computer to execute the method of the above aspects.
The embodiments of the present application provide a keyword extraction method that uses a Seq2seq network structure comprising an encoder and a decoder. The entire text is taken as the input of the Seq2seq neural network, so the neural network can understand the contextual information of the text. In addition, no feature vectors need to be extracted, which avoids the trouble of abstracting features from the text as in TextRank. Since no subjective feature abstraction is required, the implementation is relatively simple, the keyword extraction applies to both long and short texts, and the effect is more stable. Moreover, the method outputs vectors rather than keywords, which gives it good generalization ability.
Detailed description of the invention
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first schematic structural diagram of the Seq2seq neural network provided by the embodiments of the present application;
Fig. 2 is a second schematic structural diagram of the Seq2seq neural network provided by the embodiments of the present application;
Fig. 3 is a third schematic structural diagram of the Seq2seq neural network provided by the embodiments of the present application;
Fig. 4 is a general flowchart of the keyword extraction processing algorithm provided by the embodiments of the present application;
Fig. 5 is a schematic flowchart of training the Seq2seq neural network provided by the embodiments of the present application;
Fig. 6 is a schematic flowchart of the text-based keyword extraction method provided by the embodiments of the present application;
Fig. 7 is another schematic flowchart of the text-based keyword extraction method provided by the embodiments of the present application;
Fig. 8 is a fourth schematic structural diagram of the Seq2seq neural network provided by the embodiments of the present application;
Figs. 9-11 are effect display diagrams of the text-based keyword extraction method provided by the embodiments of the present application;
Fig. 12 is a schematic structural diagram of the text-based keyword extraction apparatus provided by the embodiments of the present application;
Fig. 13 is a schematic structural diagram of a computer device provided by the embodiments of the present application.
Specific embodiment
To make the purposes, technical solutions, and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application. Where there is no conflict, the embodiments of the present application and the features therein may be combined with one another arbitrarily. Also, although a logical order is shown in the flowcharts, in some cases the steps may be executed in an order different from the one shown or described here.
To facilitate understanding of the technical solutions provided by the embodiments of the present application, some key terms used in the embodiments are first explained:
Text: the written form of language; from a literary point of view, usually a combination of one or more sentences with complete, systematic meaning. A text can be a sentence, a paragraph, or a discourse.
Keyword extraction: the technique of automatically extracting the keywords of a text by computer.
APP: the abbreviation of application; here, an application program installed on a smart device.
Attention mechanism: derived from research on human vision. In cognitive science, because of bottlenecks in information processing, humans selectively attend to a part of all available information while ignoring the rest; this mechanism is commonly called attention. Different positions of the human retina have different degrees of information-processing capability, i.e., acuity, and only the fovea has the strongest acuity. To make rational use of limited visual processing resources, humans select a specific part of the visual field and then focus attention on it. For example, when reading, people usually attend to and process only a small number of the words to be read. In summary, the attention mechanism has two main aspects: deciding which part of the input needs attention, and allocating limited processing resources to the important part. In cognitive neuroscience, attention is a complex cognitive function indispensable to humans; it refers to the ability to selectively ignore other information while attending to some information. In daily life, people receive a large amount of sensory input through vision, hearing, touch, and other modalities, yet the human brain can work without confusion under this bombardment of external information, because it can, intentionally or unintentionally, select a small fraction of useful information from the large volume of input for focused processing while ignoring the rest. This ability is called attention. Attention can be directed at external stimuli (hearing, vision, taste, etc.) or at internal consciousness (thinking, recollection, etc.).
In addition, the term "and/or" herein merely describes an association relation between associated objects and indicates that three relations may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" herein, unless otherwise stated, generally indicates an "or" relation between the objects before and after it.
In the related art, the TF-IDF method measures whether a word can serve as a keyword of a text only from the angle of word frequency. It fails to incorporate the contextual information of the text, so the applicability of the extracted keywords is limited. Classification methods are relatively difficult to implement when abstracting features from a text, and their keyword extraction likewise fails to consider context. Although TextRank does incorporate the contextual information of the text, its feature-abstraction process is complicated to implement and requires subjective involvement, and its effect is poor and unstable on short texts and small corpora.
In view of this, the embodiments of the present application provide a keyword extraction method that uses a Seq2seq (sequence to sequence) network structure comprising an encoder and a decoder. The entire text is taken as the input of the Seq2seq neural network, so the neural network can understand the contextual information of the text. In addition, no feature vectors need to be extracted, which avoids the trouble of abstracting features from the text as in TextRank. Since no subjective feature abstraction is required, the implementation is relatively simple, the keyword extraction applies to both long and short texts, and the effect is more stable. Moreover, the method outputs vectors rather than keywords, which gives it good generalization ability.
Having introduced the design concept of the embodiments of the present application, the implementation of the embodiments is further explained below.
1. Seq2seq neural network training
This section mainly introduces the composition of the Seq2seq neural network in the embodiments of the present application, and how to train this Seq2seq neural network so that it can mine keywords.
As shown in Fig. 1, a schematic structural diagram of the Seq2seq neural network, the network includes an encoder 11 and a decoder 12. The encoder encodes the input data; the decoder decodes the output of the encoder and outputs vectors. Each output vector corresponds to a keyword.
During training, texts labeled with keywords are first obtained as the corpus. The selected corpus may include texts of different lengths. A matrix is constructed for each training text in the corpus. Specifically, the training text is first segmented into words; each word is then converted into a word vector; and the word vectors are finally assembled into a matrix in the order in which the words occur in the training text. That is, the matrix contains the word vectors arranged in order, the order being that of the corresponding words in the training text. Correspondingly, a matrix of the keywords of the text is constructed. In the text matrix, one vector corresponds to one segmented word; in the keyword matrix, one vector corresponds to one keyword.
The text matrix is then used as the input of the Seq2seq neural network and the corresponding keyword matrix as its output, and the Seq2seq neural network is trained.
Further, to strengthen the words that can serve as keywords and weaken the words that cannot, the embodiments of the present application also introduce an attention mechanism into the Seq2seq neural network.
Fig. 2 shows another schematic structural diagram of the Seq2seq neural network, which includes the encoder 11, the decoder 12, and a neural network module 13 with an attention mechanism. This second neural network is a neural network with an attention mechanism whose main role is to adjust the output of the encoder, so that important words in the text are strengthened and unimportant words are weakened. In this way, when the adjusted encoding result of the encoder is fed to the decoder, important keywords can be mined more accurately.
In a specific implementation, as shown in Fig. 3, the aforementioned neural network module with the attention mechanism includes, connected in series, a fully connected layer 31, a random deactivation (dropout) layer 32, and a normalization (softmax) layer 33, wherein:
the fully connected layer is used to process the current input word vector of the encoder and the state of the word vector preceding the current input word vector;
the random deactivation layer is used to process the output of the fully connected layer;
the softmax layer is used to normalize the output of the random deactivation layer to obtain the weight parameter of the preceding word vector.
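A minimal numerical sketch of such a module follows, assuming a toy fully connected layer and treating the first softmax component as the weight parameter. All names, dimensions, and the choice of component are illustrative assumptions, not the patented implementation:

```python
import math
import random

def attention_weight(current_vec, prev_state, W, b, drop_p=0.0):
    # Fully connected layer over the concatenated (current vector, previous state).
    x = list(current_vec) + list(prev_state)
    scores = [sum(wi * xi for wi, xi in zip(row, x)) + bi
              for row, bi in zip(W, b)]
    # Random deactivation (dropout); identity when drop_p == 0, e.g. at inference.
    if drop_p:
        scores = [0.0 if random.random() < drop_p else s / (1.0 - drop_p)
                  for s in scores]
    # Softmax normalization; one component serves as the weight of the previous state.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    return exps[0] / sum(exps)
```

With two score components this reduces to a sigmoid-like weight in (0, 1), which can then multiply the state of the preceding word vector.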
In brief, the processing flow in the embodiments of the present application may include four stages, as shown in Fig. 4:
Data preprocessing: segment the text and obtain the word vector of each segmented word.
Seq2seq neural network training: train the Seq2seq neural network on texts labeled with keywords, to obtain a Seq2seq neural network capable of extracting keywords.
Seq2seq neural network prediction: use the trained Seq2seq neural network to mine the word vectors of the candidate keywords of the text to be analyzed (discussed in detail below).
Result post-processing: determine the keywords of the text to be analyzed from the vectors predicted by the Seq2seq neural network.
For example, as shown in Fig. 5, keywords are first labeled manually for a batch of texts, which serve as the training corpus. Each text in the corpus is then segmented to obtain a sequence of words, and the word sequence is converted into word vectors to obtain a text sequence (labeled A); the keywords of each article are likewise converted into word vectors to obtain a keyword sequence (labeled B). A is then fed to the Seq2seq neural network for training, so that the Seq2seq neural network learns to output the keyword sequence B for the text.
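Under the assumption of a pre-built embedding table, preparing the (A, B) training pairs described above might look like the sketch below; `embed`, the data layout, and the function name are hypothetical:

```python
def build_corpus_matrices(corpus, embed):
    # corpus: list of (segmented_text_words, labeled_keywords) pairs.
    # embed: mapping from word to its word vector (e.g. from word2vec).
    pairs = []
    for words, keywords in corpus:
        A = [embed[w] for w in words]      # text sequence, reading order kept
        B = [embed[w] for w in keywords]   # keyword sequence, one row per keyword
        pairs.append((A, B))
    return pairs
```

Each pair (A, B) would then be one training example: A is fed to the network and B is the target it learns to output.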
2. Seq2seq neural network prediction
This section mainly introduces how to extract keywords with the Seq2seq neural network trained above. As shown in Fig. 6, a schematic flowchart of the method, it may include the following steps:
Step 601: construct the matrix of the text to be analyzed, the matrix comprising word vectors of the segmented words arranged in order, wherein the order is the order of the corresponding words in the text to be analyzed.
In one embodiment, the text to be analyzed is segmented into words; each word is converted into a word vector; and the matrix is then constructed from the word vectors in the order in which the words occur in the text to be analyzed.
In one embodiment, each obtained word can be converted into a word vector by word2vec (word to vector, a model for generating word vectors). In a specific implementation, stop words can also be removed from the analysis, to reduce the data volume of the matrix of the text to be analyzed.
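For illustration, constructing the input matrix with stop-word removal could be sketched as below; the segmenter and embedding lookup are stand-ins (a real system would use a Chinese word segmenter and a trained word2vec model), and words without a known vector are assumed to be skipped:

```python
def text_to_matrix(text, segment, embed, stopwords=frozenset()):
    # Segment the text, drop stop words, and keep only words with known vectors.
    words = [w for w in segment(text) if w not in stopwords and w in embed]
    # One row per remaining word, in the order the words occur in the text.
    return [embed[w] for w in words]
```

For whitespace-delimited text, `segment` can simply be `str.split`; the resulting list of rows is the matrix fed to the Seq2seq network.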
Step 602: input the matrix of the text to be analyzed to the pre-trained Seq2seq neural network to obtain an output matrix, the output matrix including at least one output vector; wherein the Seq2seq neural network is trained on a corpus annotated with keywords, and during training the input of the Seq2seq neural network is the matrix of a training text and the output is the matrix formed by the keywords corresponding to that training text; each vector in the keyword matrix corresponds to a keyword.

Step 603: determine the keywords of the text to be analyzed according to the correspondence between output vectors and keywords.
In one embodiment, in order to mine keywords better, an attention mechanism is introduced in the embodiments of the present application, as mentioned above. Accordingly, the Seq2seq neural network includes an encoder, a decoder, and a neural network module with the attention mechanism; the encoder and the decoder are both recurrent neural networks, and the neural network module with the attention mechanism is used to adjust the encoding result of the encoder for each word vector. In this way, the effect of important information can be strengthened and that of unimportant information weakened, so that keyword mining is more accurate.

In one embodiment, when the neural network module with the attention mechanism is used, as shown in Fig. 7, inputting the matrix of the text to be analyzed to the pre-trained Seq2seq neural network to obtain the output matrix may include the following steps:
Step 701: input the word vectors in the matrix of the text to be analyzed to the encoder one by one, in their order in the text to be analyzed, and obtain the state of each input word vector;

Step 702: input the word vector currently fed to the encoder, together with the state of the word vector preceding it, to the neural network module with the attention mechanism, to obtain a weight parameter of the preceding word vector;

Step 703: multiply the weight parameter of the preceding word vector by the state of the preceding word vector, to obtain the adjusted state of the preceding word vector;

Step 704: sequentially input the adjusted states of the word vectors to the decoder, to obtain the output matrix.
For example, the matrix of a text includes the word vectors of multiple words. The first vector is input to the encoder, and the encoder produces the state of that vector. When the second vector is processed, the second vector and the state of the first vector are input to the neural network module with the attention mechanism to obtain the weight parameter of the first vector. The weight parameter of the first vector is multiplied by the state of the first vector to obtain the vector input to the decoder. Each subsequent vector is processed in the same way, so that every vector input to the decoder can integrate contextual information. Moreover, since the encoder is a recurrent neural network, the state of each vector also integrates the state of the previous vector, allowing the state of each vector to take contextual information into account even further.
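Steps 701-704 can be sketched with toy stand-ins: `encode_step` plays the role of the recurrent encoder and `attention_weight` the role of the attention module. Neither is the trained network; both are hypothetical functions chosen only to make the data flow concrete.

```python
# Sketch of steps 701-704: encode each word vector in order, then
# scale each preceding state by an attention weight before it is
# handed to the decoder. All functions are illustrative stand-ins.

def encode_step(prev_state, vec):
    # Toy RNN step: the new state mixes the previous state with the input.
    return [0.5 * p + 0.5 * v for p, v in zip(prev_state, vec)]

def attention_weight(current_vec, prev_state):
    # Toy attention score: overlap between the current input and the
    # previous state (stand-in for FC -> dropout -> softmax).
    score = sum(c * s for c, s in zip(current_vec, prev_state))
    return min(1.0, max(0.0, score))

def encode_with_attention(matrix, dim=2):
    state = [0.0] * dim
    states = []
    for vec in matrix:                       # step 701: states in order
        state = encode_step(state, vec)
        states.append(state)
    adjusted = []
    for i in range(1, len(matrix)):
        w = attention_weight(matrix[i], states[i - 1])   # step 702
        adjusted.append([w * s for s in states[i - 1]])  # step 703
    return adjusted                          # step 704: fed to the decoder

out = encode_with_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The point of the sketch is the ordering: each state is computed recurrently, and the weight of a state depends on the *next* input vector, which is how the module injects context before decoding.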
In one embodiment, after the predicted matrix is obtained, for each output vector in the matrix, the vector closest in distance to that output vector may be looked up in a keyword vector set, and the keyword corresponding to the found vector is determined as a keyword of the text to be analyzed.

Of course, in specific implementation, the distance between an output vector and each vector in the keyword vector set may be calculated, and a corresponding vector in the set is deemed found only when that distance is smaller than a specified distance. In this way, it can be guaranteed that an accurate vector is found.

Further, in general, an extracted keyword should be contained in the text to be analyzed. Therefore, in the embodiments of the present application, for each keyword corresponding to a vector found in the keyword vector set, if the keyword is contained in the text to be analyzed, the keyword is determined as a keyword of the text to be analyzed; if the keyword is not contained in the text to be analyzed, the keyword is discarded. That is, if an extracted keyword does not appear in the text to be analyzed, it is not suitable as a final keyword of the text, and such keywords are filtered out. As a result, the extracted keywords are more accurate.
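The lookup, distance threshold, and containment filter just described can be sketched together. The keyword vector set, the threshold value, and the substring containment test are illustrative choices, not the embodiment's actual data or parameters.

```python
# Sketch of step 603: map each output vector to its nearest keyword
# vector, keep it only if the distance is under a threshold and the
# keyword actually occurs in the text. Toy vectors and threshold.

import math

KEYWORD_VECTORS = {
    "strategy": [0.9, 0.1],
    "puzzle": [0.1, 0.9],
    "racing": [0.5, 0.5],
}

def resolve_keywords(output_vectors, text, max_dist=0.5):
    keywords = []
    for out in output_vectors:
        # nearest keyword vector in the set (Euclidean distance)
        word, d = min(
            ((w, math.dist(out, v)) for w, v in KEYWORD_VECTORS.items()),
            key=lambda p: p[1],
        )
        if d >= max_dist:
            continue            # too far: no reliable match
        if word not in text:
            continue            # filter keywords absent from the text
        keywords.append(word)
    return keywords

kws = resolve_keywords([[0.88, 0.12], [0.1, 0.85]],
                       "a strategy game with some racing")
```

In the example, the second output vector matches "puzzle" closely, but "puzzle" does not occur in the text, so it is discarded by the containment filter.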
In one embodiment, the number of keywords can be set according to actual needs. When the decoder extracts too many keywords, some of them can be removed; when the decoder extracts too few, additional keywords can be added. This scheme can be implemented in the following two aspects:

1. Rejecting surplus keywords

If the number of keywords of the text to be analyzed is greater than a preset number, some keywords are removed from the keywords of the text to be analyzed so that the number of remaining keywords equals the preset number.

In one embodiment, which keywords to remove can be determined according to the distance between each output vector of the output matrix and the vectors in the keyword vector set, for example by removing the keywords with larger distances.
2. Expanding with similar keywords

In one embodiment, if the number of keywords of the text to be analyzed is less than the preset number, keywords similar to the keywords of the text to be analyzed are looked up in the keyword vector set, and the similar keywords found are determined as newly added keywords of the text to be analyzed.

For example, suppose three keywords are actually needed, the decoder produces one output vector, and one keyword is found in the keyword vector set according to that output vector. To expand the keywords, the vectors nearest to the obtained keyword may be looked up in the keyword vector set, and the keywords corresponding to those nearest vectors are taken as the expanded keywords.

Of course, in one embodiment, keywords semantically similar to the determined keywords can also be used as expanded keywords. For example, "lovely" and "adorkable" are semantically similar to a certain extent, so "adorkable" can be used as an expanded keyword.
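The two count-adjustment aspects can be sketched in one function: surplus keywords (those with larger match distances) are dropped first, and a shortfall is filled with the nearest neighbours of the found keywords in the keyword vector set. The vectors, names, and ranking rule are illustrative assumptions.

```python
# Sketch of adjusting the keyword count to a preset number.
# `found` maps each extracted keyword to its match distance; smaller
# distance means a more confident match. Toy keyword vector set.

import math

KEYWORD_VECTORS = {
    "children": [0.1, 0.9],
    "game": [0.9, 0.1],
    "puzzle": [0.2, 0.8],
    "racing": [0.8, 0.3],
}

def adjust_count(found, preset):
    ranked = sorted(found, key=lambda w: found[w])
    if len(ranked) >= preset:
        return ranked[:preset]          # reject the worst matches
    result = list(ranked)
    for w in ranked:                    # expand with nearest neighbours
        vec = KEYWORD_VECTORS[w]
        for cand, cvec in sorted(
            KEYWORD_VECTORS.items(),
            key=lambda p: math.dist(vec, p[1]),
        ):
            if cand not in result:
                result.append(cand)     # closest unused keyword
                break
        if len(result) == preset:
            break
    return result

kws = adjust_count({"children": 0.05, "game": 0.1}, 4)
```

With the toy vectors, "children" pulls in its neighbour "puzzle" and "game" pulls in "racing", filling the preset count of four.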
How keywords are mined using the attention mechanism is described in detail below. In the embodiments of the present application, the neural network module with the attention mechanism includes a fully connected layer, a random dropout layer, and a softmax layer; Fig. 8 is a structural schematic diagram of the Seq2seq neural network provided by the embodiments of the present application. Both the encoder (Encoder) and the decoder (Decoder) may use recurrent neural networks, for example LSTM (Long Short-Term Memory) networks. The internal structure of the neural network with the attention mechanism (Attention), expanded on the right side of Fig. 8, includes a fully connected layer, a random dropout layer, and a normalization layer. Here, Input denotes the input word vector sequence, in1…inn denote the current word vectors, h1…hn denote the states of the word vectors preceding the current word vectors, and α1…αn denote the weight parameters of the preceding word vectors. For any word vector, the dimension of its weight parameter is the same as the dimension of the word vector.

When keyword mining is performed on the text to be analyzed, the matrix formed by the word vectors of the text is input to the encoder, and the encoder processes the word vectors one by one to obtain the state of each word vector. The current vector and the state of the preceding word vector are input to the neural network module with the attention mechanism; after being processed by the fully connected layer of that module, the result is passed to the random dropout layer, and finally to the normalization layer, which yields the weight parameter of the word vector preceding the current word vector. The state of the preceding word vector is then multiplied (Multi) by its weight parameter and input to the decoder for processing.
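The three layers of the attention module (fully connected, then random dropout, then softmax) can be sketched in plain Python as below. The weights, the dropout rate, and the fixed random seed are toy assumptions; in practice the weights are learned, and dropout would be disabled at inference time.

```python
# Sketch of the attention module: FC layer -> random dropout ->
# softmax, producing a weight per state dimension. Toy weights.

import math
import random

def fully_connected(x, weights, bias):
    # One dense layer: y_i = sum_j w_ij * x_j + b_i
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def dropout(x, rate, rng):
    # Randomly zero activations with probability `rate`.
    return [0.0 if rng.random() < rate else v for v in x]

def softmax(x):
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(current_vec, prev_state):
    x = current_vec + prev_state          # concatenate the two inputs
    w = [[0.5, -0.2, 0.1, 0.3],
         [-0.1, 0.4, 0.2, -0.3]]          # toy FC weights
    b = [0.0, 0.1]
    h = fully_connected(x, w, b)
    h = dropout(h, rate=0.5, rng=random.Random(0))
    return softmax(h)                     # normalized weight parameters

alpha = attention_weights([1.0, 0.0], [0.2, 0.8])
```

The softmax guarantees the weight parameters are non-negative and sum to one, which is what lets the module strengthen some state components and weaken others before the multiply (Multi) step.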
The decoder decodes the input vectors to obtain output vectors; vectors matching each output vector in sequence are then found in the keyword vector set, and the keywords corresponding to the matched vectors are determined as the keywords of the text to be analyzed.

When keywords are extracted with the Seq2seq neural network provided by the embodiments of the present application, the output consists of keyword vectors rather than specific keywords, so the Seq2seq neural network has better generalization ability. In addition, filtering out keywords not present in the text to be analyzed filters the output result against the original text, improving the robustness of keyword extraction. Furthermore, since contextual information is fully considered during extraction, ambiguity can be suppressed, effectively improving the accuracy of keyword extraction.
The results of the keyword extraction method provided by the embodiments of the present application are demonstrated below with reference to three measured results.

1) Fig. 9 shows the output of the attention module (the neural network with the attention mechanism) when keyword mining is performed on the description text of the app "Honor of Kings". Lighter colors in the figure indicate higher weights, and it can be seen that the weights are strengthened where the annotated keywords (i.e., the underlined keywords in the figure) appear. Thus, in the Seq2seq neural network provided by the embodiments of the present application, the attention module plays its keyword-mining role well.

2) Fig. 10 shows the keywords obtained after keyword extraction is performed on the description text of a game. It can be seen that the extracted keywords include the game's title together with "strategy", "rpg", "nurturing", and "immortality cultivation"; for this game, these keywords accurately describe the content of the corresponding text.

For the game "Hero Kill" the effect is the same, and it is not described here again.

3) Consider a text to be analyzed that is a search term, where the amount of text is typically small. The keyword extraction scheme provided by the embodiments of the present application can also extract keywords well from such short texts, after which information retrieval can be performed.

For example, as shown in Fig. 11, suppose the input search term is "games suitable for children". The keywords extracted from the search term by the scheme provided by the embodiments of the present application include "children" and "games"; assuming that four keywords are needed, the keywords are expanded with "jigsaw puzzle" and "intelligence development". As a result, when an App search is performed, a children's jigsaw-puzzle game can be accurately located as a recommendation.

In the embodiments of the present application, how many keywords are ultimately needed can be determined according to actual needs. When one keyword is needed, the keyword hit rate can reach 96%; when multiple keywords are needed, the hit rate can reach 84%. Therefore, the Seq2seq neural network provided by the embodiments of the present application can extract keywords well.
Referring to Fig. 12, based on the same inventive concept, an embodiment of the present application further provides a text-based keyword extraction apparatus, comprising:

a text matrix construction unit 1201, configured to construct a matrix of the text to be analyzed, the matrix including the word vectors of the segmented words arranged in sequence, where the ordering is the order of the word vectors in the text to be analyzed;

an output matrix determination unit 1202, configured to input the matrix of the text to be analyzed to a pre-trained Seq2seq neural network to obtain an output matrix, the output matrix including at least one output vector; wherein the Seq2seq neural network is trained on a corpus annotated with keywords, and during training the input of the Seq2seq neural network is the matrix of a training text and the output is the matrix formed by the keywords corresponding to that training text, each vector in the keyword matrix corresponding to a keyword; and

a keyword determination unit 1203, configured to determine the keywords of the text to be analyzed according to the correspondence between output vectors and keywords.
Optionally, the Seq2seq neural network includes an encoder, a decoder, and a neural network module with an attention mechanism; the encoder and the decoder are both recurrent neural networks, and the neural network module with the attention mechanism is used to adjust the encoding result of the encoder for each word vector.

Optionally, the output matrix determination unit is configured to:

input the word vectors in the matrix of the text to be analyzed to the encoder one by one, in their order in the text to be analyzed, and obtain the state of each input word vector;

input the word vector currently fed to the encoder, together with the state of the word vector preceding it, to the neural network module with the attention mechanism, to obtain a weight parameter of the preceding word vector;

multiply the weight parameter of the preceding word vector by the state of the preceding word vector, to obtain the adjusted state of the preceding word vector; and

sequentially input the adjusted states of the word vectors to the decoder, to obtain the output matrix.
Optionally, the neural network module with the attention mechanism includes a fully connected layer, a random dropout layer, and a normalization layer (softmax) connected in series;

the fully connected layer is configured to process the input word vector currently fed to the encoder and the state of the word vector preceding it;

the random dropout layer is configured to process the processing result of the fully connected layer; and

the softmax layer is configured to normalize the processing result of the random dropout layer to obtain the weight parameter of the preceding word vector.
Optionally, the text matrix construction unit is configured to:

perform word segmentation on the text to be analyzed to obtain the individual words;

convert each word into a word vector; and

construct the matrix from the word vectors of the words according to the order of the words in the text to be analyzed.
Optionally, the keyword determination unit is configured to:

look up, in a keyword vector set, the vector closest in distance to an output vector; and

determine the keyword corresponding to the found vector as a keyword of the text to be analyzed.

Optionally, the keyword determination unit is configured to:

for each keyword corresponding to a vector found in the keyword vector set, determine the keyword as a keyword of the text to be analyzed if the keyword is contained in the text to be analyzed, and discard the keyword if it is not contained in the text to be analyzed.
Optionally, the apparatus further includes:

a filtering unit, configured to, if the number of keywords of the text to be analyzed is greater than a preset number, remove some keywords from the keywords of the text to be analyzed so that the number of remaining keywords equals the preset number.

Optionally, the apparatus further includes:

an expansion unit, configured to, if the number of keywords of the text to be analyzed is less than the preset number, look up keywords similar to the keywords of the text to be analyzed in the keyword vector set; and

determine the similar keywords found as newly added keywords of the text to be analyzed.
Referring to Fig. 13, based on the same technical concept, an embodiment of the present application further provides a computer device 130, which may include a memory 1301 and a processor 1302.

The memory 1301 is configured to store the computer program executed by the processor 1302. The memory 1301 may mainly include a program storage area and a data storage area: the program storage area may store the operating system, programs required by at least one function, and the like; the data storage area may store data created through the use of the computer device, and the like. The processor 1302 may be a central processing unit (CPU), a digital processing unit, or the like. The embodiments of the present application do not limit the specific connection medium between the memory 1301 and the processor 1302. In Fig. 13, the memory 1301 and the processor 1302 are connected by a bus 1303, which is represented by a thick line; the connections between other components are merely illustrative and not limiting. The bus 1303 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is drawn in Fig. 13, but this does not mean that there is only one bus or only one type of bus.

The memory 1301 may be a volatile memory, such as a random-access memory (RAM); the memory 1301 may also be a non-volatile memory, such as a read-only memory, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 1301 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 1301 may also be a combination of the above memories.

The processor 1302 is configured to execute, when calling the computer program stored in the memory 1301, the method performed by the device in the embodiments shown in Figs. 6-7.
In some possible embodiments, various aspects of the method provided by the present application may also be implemented in the form of a program product including program code. When the program product runs on a computer device, the program code is used to cause the computer device to execute the steps of the methods according to the various exemplary embodiments of the application described above in this specification; for example, the computer device may execute the method performed by the device in the embodiments shown in Figs. 6-7.

The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
Although preferred embodiments of the application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the application.

Obviously, those skilled in the art can make various modifications and variations to the application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the application and their technical equivalents, the application is also intended to include them.

Claims (10)

1. A text-based keyword extraction method, characterized in that the method comprises:

constructing a matrix of a text to be analyzed, the matrix including word vectors of segmented words arranged in sequence, wherein the ordering is the order of the word vectors in the text to be analyzed;

inputting the matrix of the text to be analyzed to a pre-trained Seq2seq neural network to obtain an output matrix, the output matrix including at least one output vector; wherein the Seq2seq neural network is trained on a corpus annotated with keywords, and during training the input of the Seq2seq neural network is the matrix of a training text and the output is the matrix formed by the keywords corresponding to the training text; each vector in the keyword matrix corresponds to a keyword; and

determining keywords of the text to be analyzed according to the correspondence between output vectors and keywords.
2. The method according to claim 1, characterized in that the Seq2seq neural network includes an encoder, a decoder, and a neural network module with an attention mechanism; the encoder and the decoder are both recurrent neural networks, and the neural network module with the attention mechanism is used to adjust the encoding result of the encoder for each word vector.

3. The method according to claim 2, characterized in that inputting the matrix of the text to be analyzed to the pre-trained Seq2seq neural network to obtain the output matrix comprises:

inputting the word vectors in the matrix of the text to be analyzed to the encoder one by one, in their order in the text to be analyzed, and obtaining the state of each input word vector;

inputting the word vector currently fed to the encoder, together with the state of the word vector preceding it, to the neural network module with the attention mechanism, to obtain a weight parameter of the preceding word vector;

multiplying the weight parameter of the preceding word vector by the state of the preceding word vector, to obtain the adjusted state of the preceding word vector; and

sequentially inputting the adjusted states of the word vectors to the decoder, to obtain the output matrix.
4. The method according to claim 3, characterized in that the neural network module with the attention mechanism includes a fully connected layer, a random dropout layer, and a normalization layer (softmax) connected in series;

the fully connected layer is used to process the input word vector currently fed to the encoder and the state of the word vector preceding it;

the random dropout layer is used to process the processing result of the fully connected layer; and

the softmax layer is used to normalize the processing result of the random dropout layer to obtain the weight parameter of the preceding word vector.
5. The method according to claim 1, characterized in that constructing the matrix of the text to be analyzed comprises:

performing word segmentation on the text to be analyzed to obtain individual words;

converting each word into a word vector; and

constructing the matrix from the word vectors of the words according to the order of the words in the text to be analyzed.
6. The method according to claim 1, characterized in that determining the keywords of the text to be analyzed according to the correspondence between output vectors and keywords comprises:

looking up, in a keyword vector set, the vector closest in distance to an output vector; and

determining the keyword corresponding to the found vector as a keyword of the text to be analyzed.

7. The method according to claim 6, characterized in that determining the keyword corresponding to the found vector as a keyword of the text to be analyzed comprises:

for each keyword corresponding to a vector found in the keyword vector set, determining the keyword as a keyword of the text to be analyzed if the keyword is contained in the text to be analyzed, and discarding the keyword if it is not contained in the text to be analyzed.
8. The method according to claim 1, characterized in that the method further comprises:

if the number of keywords of the text to be analyzed is greater than a preset number, removing some keywords from the keywords of the text to be analyzed so that the number of remaining keywords equals the preset number.

9. The method according to claim 1, characterized in that the method further comprises:

if the number of keywords of the text to be analyzed is less than a preset number, looking up, in the keyword vector set, keywords similar to the keywords of the text to be analyzed; and

determining the similar keywords found as newly added keywords of the text to be analyzed.

10. A computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that

the processor, when executing the computer program, implements the method steps of any one of claims 1 to 9.
CN201910360872.1A 2019-04-30 2019-04-30 Keyword extraction method based on text and computer equipment Active CN110110330B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910360872.1A CN110110330B (en) 2019-04-30 2019-04-30 Keyword extraction method based on text and computer equipment


Publications (2)

Publication Number Publication Date
CN110110330A true CN110110330A (en) 2019-08-09
CN110110330B CN110110330B (en) 2023-08-11

Family

ID=67487802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910360872.1A Active CN110110330B (en) 2019-04-30 2019-04-30 Keyword extraction method based on text and computer equipment

Country Status (1)

Country Link
CN (1) CN110110330B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598213A (en) * 2019-09-06 2019-12-20 腾讯科技(深圳)有限公司 Keyword extraction method, device, equipment and storage medium
CN110610003A (en) * 2019-08-15 2019-12-24 阿里巴巴集团控股有限公司 Method and system for assisting text annotation
CN110705268A (en) * 2019-09-02 2020-01-17 平安科技(深圳)有限公司 Article subject extraction method and device based on artificial intelligence and computer-readable storage medium
CN110796160A (en) * 2019-09-16 2020-02-14 腾讯科技(深圳)有限公司 Text classification method, device and storage medium
CN110866393A (en) * 2019-11-19 2020-03-06 北京网聘咨询有限公司 Resume information extraction method and system based on domain knowledge base
CN110991175A (en) * 2019-12-10 2020-04-10 爱驰汽车有限公司 Text generation method, system, device and storage medium under multiple modes
CN111178041A (en) * 2019-12-31 2020-05-19 北京妙笔智能科技有限公司 Intelligent text repeat system and method
CN111274815A (en) * 2020-01-15 2020-06-12 北京百度网讯科技有限公司 Method and device for mining entity attention points in text
CN111667192A (en) * 2020-06-12 2020-09-15 北京卓越讯通科技有限公司 Safety production risk assessment method based on NLP big data
WO2021042516A1 (en) * 2019-09-02 2021-03-11 平安科技(深圳)有限公司 Named-entity recognition method and device, and computer readable storage medium
WO2021147363A1 (en) * 2020-01-20 2021-07-29 中国电子科技集团公司电子科学研究院 Text-based major depressive disorder recognition method
CN113360639A (en) * 2020-03-06 2021-09-07 上海卓繁信息技术股份有限公司 Short text emotion classification method and device and storage device
CN114048742A (en) * 2021-10-26 2022-02-15 北京师范大学 Knowledge entity and relation extraction method of text information and text quality evaluation method
WO2022134759A1 (en) * 2020-12-21 2022-06-30 深圳壹账通智能科技有限公司 Keyword generation method and apparatus, and electronic device and computer storage medium
CN115114913A (en) * 2021-03-18 2022-09-27 马上消费金融股份有限公司 Labeling method, device, equipment and readable storage medium
WO2023060795A1 (en) * 2021-10-12 2023-04-20 平安科技(深圳)有限公司 Automatic keyword extraction method and apparatus, and device and storage medium
CN114048742B (en) * 2021-10-26 2024-09-06 北京师范大学 Knowledge entity and relation extraction method of text information and text quality assessment method

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018086470A1 (en) * 2016-11-10 2018-05-17 腾讯科技(深圳)有限公司 Keyword extraction method and device, and server
CN108304364A (en) * 2017-02-23 2018-07-20 腾讯科技(深圳)有限公司 keyword extracting method and device
CN108376131A (en) * 2018-03-14 2018-08-07 中山大学 Keyword abstraction method based on seq2seq deep neural network models
WO2018153265A1 (en) * 2017-02-23 2018-08-30 腾讯科技(深圳)有限公司 Keyword extraction method, computer device, and storage medium
CN108536678A (en) * 2018-04-12 2018-09-14 腾讯科技(深圳)有限公司 Text key message extracting method, device, computer equipment and storage medium
CN108959396A (en) * 2018-06-04 2018-12-07 众安信息技术服务有限公司 Machine reading model training method and device, answering method and device
CN109299268A (en) * 2018-10-24 2019-02-01 河南理工大学 A kind of text emotion analysis method based on dual channel model
CN109446328A (en) * 2018-11-02 2019-03-08 成都四方伟业软件股份有限公司 A kind of text recognition method, device and its storage medium
CN109446519A (en) * 2018-10-10 2019-03-08 西安交通大学 A kind of text feature of fused data classification information
CN109471933A (en) * 2018-10-11 2019-03-15 平安科技(深圳)有限公司 A kind of generation method of text snippet, storage medium and server
CN109472024A (en) * 2018-10-25 2019-03-15 安徽工业大学 A kind of file classification method based on bidirectional circulating attention neural network
CN109597884A (en) * 2018-12-28 2019-04-09 北京百度网讯科技有限公司 Talk with method, apparatus, storage medium and the terminal device generated
CN109635284A (en) * 2018-11-26 2019-04-16 北京邮电大学 Text snippet method and system based on deep learning associate cumulation attention mechanism


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIAOYU LIU: "Generating Keyword Queries for Natural Language Queries to Alleviate Lexical Chasm Problem", CIKM '18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT *
何鸿业; 郑瑾; 张祖平: "Text Sentiment Analysis Combining Part-of-Speech Features and Convolutional Neural Networks", Computer Engineering (计算机工程), no. 11 *
王盛玉; 曾碧卿; 商齐; 韩旭丽: "Sentiment Analysis Based on a Word-Attention Convolutional Neural Network Model", Journal of Chinese Information Processing (中文信息学报), no. 09 *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110610003B (en) * 2019-08-15 2023-09-15 创新先进技术有限公司 Method and system for assisting text annotation
CN110610003A (en) * 2019-08-15 2019-12-24 阿里巴巴集团控股有限公司 Method and system for assisting text annotation
CN110705268A (en) * 2019-09-02 2020-01-17 平安科技(深圳)有限公司 Article subject extraction method and device based on artificial intelligence and computer-readable storage medium
CN110705268B (en) * 2019-09-02 2024-06-25 平安科技(深圳)有限公司 Article subject matter extraction method and device based on artificial intelligence and computer readable storage medium
WO2021042517A1 (en) * 2019-09-02 2021-03-11 平安科技(深圳)有限公司 Artificial intelligence-based article gist extraction method and device, and storage medium
WO2021042516A1 (en) * 2019-09-02 2021-03-11 平安科技(深圳)有限公司 Named-entity recognition method and device, and computer readable storage medium
CN110598213A (en) * 2019-09-06 2019-12-20 腾讯科技(深圳)有限公司 Keyword extraction method, device, equipment and storage medium
CN110796160A (en) * 2019-09-16 2020-02-14 腾讯科技(深圳)有限公司 Text classification method, device and storage medium
CN110866393A (en) * 2019-11-19 2020-03-06 北京网聘咨询有限公司 Resume information extraction method and system based on domain knowledge base
CN110991175A (en) * 2019-12-10 2020-04-10 爱驰汽车有限公司 Multi-modal text generation method, system, device and storage medium
CN110991175B (en) * 2019-12-10 2024-04-09 爱驰汽车有限公司 Multi-modal text generation method, system, device and storage medium
CN111178041A (en) * 2019-12-31 2020-05-19 北京妙笔智能科技有限公司 Intelligent text rewriting system and method
CN111178041B (en) * 2019-12-31 2023-04-07 北京妙笔智能科技有限公司 Intelligent text rewriting system and method
CN111274815A (en) * 2020-01-15 2020-06-12 北京百度网讯科技有限公司 Method and device for mining entity focus in text
CN111274815B (en) * 2020-01-15 2024-04-12 北京百度网讯科技有限公司 Method and device for mining entity focus in text
US11775761B2 (en) 2020-01-15 2023-10-03 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for mining entity focus in text
WO2021147363A1 (en) * 2020-01-20 2021-07-29 中国电子科技集团公司电子科学研究院 Text-based major depressive disorder recognition method
CN113360639A (en) * 2020-03-06 2021-09-07 上海卓繁信息技术股份有限公司 Short text emotion classification method and device and storage device
CN111667192A (en) * 2020-06-12 2020-09-15 北京卓越讯通科技有限公司 Safety production risk assessment method based on NLP big data
WO2022134759A1 (en) * 2020-12-21 2022-06-30 深圳壹账通智能科技有限公司 Keyword generation method and apparatus, and electronic device and computer storage medium
CN115114913B (en) * 2021-03-18 2024-02-06 马上消费金融股份有限公司 Labeling method, labeling device, labeling equipment and readable storage medium
CN115114913A (en) * 2021-03-18 2022-09-27 马上消费金融股份有限公司 Labeling method, device, equipment and readable storage medium
WO2023060795A1 (en) * 2021-10-12 2023-04-20 平安科技(深圳)有限公司 Automatic keyword extraction method and apparatus, and device and storage medium
CN114048742A (en) * 2021-10-26 2022-02-15 北京师范大学 Knowledge entity and relation extraction method of text information and text quality evaluation method
CN114048742B (en) * 2021-10-26 2024-09-06 北京师范大学 Knowledge entity and relation extraction method of text information and text quality assessment method

Also Published As

Publication number Publication date
CN110110330B (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN110110330A (en) Text based keyword extracting method and computer equipment
Bamman et al. An annotated dataset of coreference in English literature
Norris Models of visual word recognition
CN108829719A (en) Non-factoid question-answering answer selection method and system
WO2015093541A1 (en) Scenario generation device and computer program therefor
CN110083705A (en) Multi-hop attention deep model, method, storage medium and terminal for targeted sentiment classification
CN110263324A (en) Text processing method, model training method and device
CN104484411B (en) Dictionary-based semantic knowledge base construction method
CN112650840A (en) Intelligent medical question-answering processing method and system based on knowledge graph reasoning
JP6403382B2 (en) Phrase pair collection device and computer program therefor
CN110188272A (en) Community question-answering website tag recommendation method based on user context
JP5907393B2 (en) Complex predicate template collection device and computer program therefor
CN108549658A (en) Deep learning video question answering method and system based on an attention mechanism over syntactic parse trees
Barhoom et al. Sarcasm detection in headline news using machine and deep learning algorithms
CN114201683A (en) Interest activation news recommendation method and system based on multi-level matching
CN114386410A (en) Training method and text processing method of pre-training model
CN108920446A (en) Engineering document processing method
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN116257698A (en) Social network sensitive and euphemistic language detection method based on supervised learning
Kim et al. Enhancing Korean named entity recognition with linguistic tokenization strategies
CN117312514A (en) Consultation reply method, consultation reply device and computer readable storage medium
Peng et al. Encoding Text Information By Pre-trained Model For Authorship Verification.
Xu et al. Research on depression tendency detection based on image and text fusion
CN106528764A (en) Retrieval method and device for question-type query terms
CN110245230A (en) Book grading method, system, storage medium and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant