CN111813927A - Sentence similarity calculation method based on topic model and LSTM

Info

Publication number
CN111813927A
Authority
CN
China
Prior art keywords
sentence
word
vector
topic
lstm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910292541.9A
Other languages
Chinese (zh)
Inventor
曹秀亭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Potevio Information Technology Co Ltd
Original Assignee
Potevio Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Potevio Information Technology Co Ltd filed Critical Potevio Information Technology Co Ltd
Priority to CN201910292541.9A priority Critical patent/CN111813927A/en
Publication of CN111813927A publication Critical patent/CN111813927A/en
Withdrawn legal-status Critical Current

Classifications

    • G06F16/35 Information retrieval of unstructured textual data; clustering; classification
    • G06F16/3329 Information retrieval; querying; natural language query formulation or dialogue systems
    • G06F16/3344 Information retrieval; query execution using natural language analysis
    • G06N3/045 Computing arrangements based on biological models; neural networks; combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a sentence similarity calculation method based on a topic model and LSTM, which comprises the following steps: each of the two sentences is processed as follows: a corresponding word vector and topic vector are generated from the sentence, the word vector and the topic vector are fused to obtain a fused vector, and the fused vector is taken as the input of an LSTM layer to obtain the corresponding LSTM output. The LSTM outputs of the two sentences are then taken as the input of a fully connected layer, and the similarity of the two sentences is obtained after Dropout and regularization. The application also discloses a corresponding system. Applying the disclosed technical scheme improves the accuracy of sentence similarity calculation.

Description

Sentence similarity calculation method based on topic model and LSTM
Technical Field
The application relates to the technical field of sentence matching, and in particular to a sentence similarity calculation method based on a topic model and a long short-term memory network (LSTM).
Background
An intelligent question-answering system mainly comprises three modules: question understanding, information retrieval, and answer extraction. In practice, to sidestep the difficulty of full natural language understanding, question matching can achieve good results when question-answer pairs are available. Question matching matches the natural language question input by a user against the questions of the question-answer pairs in the system; the matched question then yields the answer.
Question matching requires sentence similarity calculation, i.e., computing the similarity between two sentences with natural language processing techniques. Because questions are generally short, the task falls into the category of short-text similarity calculation, and methods from that area can be drawn upon.
In the prior art, the most widely applied methods take a "bag of words" as the basic unit, without considering the complete semantics expressed by the whole sentence. The vector space model is the most common question similarity model: it computes the weight of each word by TF-IDF, uses these weights as the components of sentence vectors, measures the distance between vectors by their cosine, and finally outputs the sentence similarity. The specific process is as follows:
First, a question is converted into individual feature words through the bag-of-words model, and the TF-IDF value of each feature word is obtained statistically. The TF (Term Frequency) value is the number of times the feature word appears in the question; if a feature word appears twice in question 1, its TF value is 2. The IDF (Inverse Document Frequency) value is determined by the number of questions in which the feature word occurs. For example, if the feature word occurs in 10 questions in total, then for question 1 the IDF value of the feature word is:
IDF = log(N / df_t)

where N is the number of questions in the question library, and df_t = 10 is the number of questions in which the feature word appears.
The TF-IDF value is the product of the TF value and the IDF value. TF-IDF is a statistically based weighting scheme, and practical tests show it to be an effective measure of feature-term weight when the global text collection contains sufficient corpus features.
Through TF-IDF, each sentence can thus be turned into a vector, and the cosine between two such vectors finally gives the similarity of the two sentences.
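For concreteness, the following is a minimal sketch of this TF-IDF baseline (the prior-art method, not the invention) using scikit-learn; the sample questions are illustrative placeholders.

```python
# Sketch of the TF-IDF baseline described above: each question becomes a
# TF-IDF weighted vector, and similarity is the cosine between the vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

questions = [
    "how do I reset my account password",    # illustrative question 1
    "how can my account password be reset",  # illustrative question 2
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(questions)  # rows are question vectors

sim = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(f"cosine similarity = {sim:.3f}")
```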
However, this traditional bag-of-words approach to sentence similarity loses information: it ignores the semantic content of the questions, judges similarity simply by edit distance and vocabulary matching, and has low robustness. Performing similarity analysis purely with the word vectors of single words has a clear disadvantage: word order is not considered and the word-vector distinctions are blurred. Consider the following two sentences:
1. Computing, by the above method, the similarity between "got into high school after graduating from elementary school" and "got into elementary school after graduating from high school" yields 1, yet the two sentences clearly express opposite meanings.
2. Computing the similarity between "the work is hard" and "the work is easy" by this method likely gives a very high score, yet the two phrases mean exactly the opposite of each other.
Disclosure of Invention
The application provides a sentence similarity calculation method and system based on a topic model and an LSTM, so as to improve the accuracy of sentence similarity calculation.
The application discloses a sentence similarity calculation method based on a topic model and LSTM, which comprises the following steps:
the following processing is performed on the two sentences respectively: generating corresponding word vectors and topic vectors according to sentences, fusing the word vectors and the topic vectors to obtain fused vectors, and taking the fused vectors as the input of an LSTM layer to obtain corresponding LSTM output;
and taking the LSTM outputs of the two sentences as the input of a fully connected layer, and obtaining the similarity of the two sentences after Dropout (randomly discarding a portion of the neuron units) and regularization.
Preferably, generating the corresponding word vector from the sentence includes:
reading in a sentence, performing word segmentation, and numbering each word;
generating a number vector for each sentence from the word numbers, where each sentence takes a fixed length and the remaining positions are filled with zeros;
saving the word numbers and the word vectors.
Preferably, generating the corresponding topic vector from the sentence includes:
inputting the word vectors of the sentence into a SentenceLDA model to obtain the probability distribution θ of the sentence over the topics;
converting the probability distribution θ of the sentence over the topics into a corresponding topic vector through a linear transformation.
Preferably, when the topic of the sentence is extracted, the number of word clusters is specified, clustering is performed in parallel, and the words in the sentence are expanded with similar words.
The application also discloses a sentence similarity calculation system based on a topic model and LSTM, comprising: a word vector processing module, a topic vector processing module, an LSTM layer, a fully connected layer, a Dropout module and a regularization module, wherein:
the word vector processing module, the topic vector processing module and the LSTM layer perform the following processing on each of the two sentences:
the word vector processing module generates the corresponding word vector from the sentence;
the topic vector processing module obtains the corresponding topic vector from the word vector;
the LSTM layer takes the vector obtained by fusing the word vector and the topic vector as its input and produces the corresponding LSTM output;
the fully connected layer processes the LSTM outputs of the two sentences, and the processing result passes through the Dropout module and the regularization module to yield the similarity of the two sentences.
Preferably, the word vector processing module is specifically configured to:
read in a sentence, perform word segmentation, and number each word;
generate a number vector for each sentence from the word numbers, where each sentence takes a fixed length and the remaining positions are filled with zeros;
save the word numbers and the word vectors.
Preferably, the topic vector processing module is specifically configured to:
input the word vectors of the sentence into a SentenceLDA model to obtain the probability distribution θ of the sentence over the topics;
convert the probability distribution θ of the sentence over the topics into a corresponding topic vector through a linear transformation.
Preferably, when the topic of the sentence is extracted, the number of word clusters is specified, clustering is performed in parallel, and the words in the sentence are expanded with similar words.
According to the above technical scheme, a question is an irregular short text whose grammatical structure is not necessarily standard, and representing it as vectors effectively sidesteps this problem. Sentence similarity is calculated by combining the improved topic model with LSTM, so the system attends to semantic information while also matching sentences with the help of word vectors. Compared with the traditional TF-IDF algorithm, the invention can understand the subject of a question and the associations between contextual words.
The sentence similarity calculation method and system based on a topic model and LSTM provided by the invention build on the original SentenceLDA topic model to obtain a simple, efficient and extensible topic model. An improved clustering algorithm expands the words in a sentence, which is equivalent to supplying background knowledge for those words and substantially enriches the semantic information of the sentence. In addition, the method borrows the idea of convolutional neural networks to combine the topic vector with the word vectors of the sentence before feeding them into the LSTM network, effectively improving the accuracy of sentence similarity calculation.
Drawings
FIG. 1 is a schematic diagram of a network architecture of a question matching system based on a topic model and LSTM according to the present invention;
FIG. 2 is a schematic diagram of the SentenceLDA model;
FIG. 3 is a schematic diagram of a generation algorithm for word and topic collections;
FIG. 4 is a schematic diagram illustrating the principle of sentence similarity calculation based on topic model and LSTM according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described in detail below by referring to the accompanying drawings and examples.
The invention establishes a sentence matching system based on a topic model and LSTM; the network architecture is shown in FIG. 1:
first, the following processing is performed on two sentences, respectively: generating corresponding word vectors and topic vectors according to sentences, fusing the word vectors and the topic vectors to obtain fused vectors, and taking the fused vectors as the input of an LSTM layer to obtain corresponding LSTM output;
then, the LSTM outputs of the two sentences are used as the input of the fully connected layer, and the similarity of the two sentences is obtained after Dropout and regularization. Here, Dropout randomly discards a portion of the neuron units in order to prevent overfitting.
The technical scheme of the invention is further explained in detail in the following sections.
First, generating word vectors
To address the word-order problem of sentences, the invention re-represents sentences with an LSTM (Long Short-Term Memory network) and measures sentence similarity through several fully connected layers.
An LSTM is a recurrent neural network over time. Natural language cannot be fed to a neural network directly, so sentences must first be encoded into the corresponding word vectors; this process corresponds to embedding-layer-1 and embedding-layer-2 in FIG. 1.
The process of generating word vectors comprises the following steps:
1. reading in a sentence, performing word segmentation, and numbering each word;
2. generating a number vector for each sentence from the word numbers, where each sentence takes a fixed length and the remaining positions are filled with zeros;
3. saving the word numbers to file and saving the word vectors, for convenient use at prediction time.
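As an illustration, here is a minimal sketch of steps 1-3; the whitespace tokenizer stands in for a real word segmenter, and MAX_LEN and the sample sentences are assumptions, not values from the patent.

```python
# Hedged sketch of steps 1-3: segment each sentence, number each word,
# and zero-pad the number vectors to a fixed length.
MAX_LEN = 10  # assumed fixed sentence length

def build_vocab(sentences):
    """Assign each word a number; 0 is reserved for padding."""
    vocab = {}
    for sent in sentences:
        for word in sent.split():  # stand-in for a real word segmenter
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def encode(sentence, vocab):
    """Turn a sentence into a fixed-length number vector."""
    ids = [vocab.get(w, 0) for w in sentence.split()][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))  # zero-fill the remaining positions

sentences = ["I want you", "you want me"]
vocab = build_vocab(sentences)
print([encode(s, vocab) for s in sentences])
# The vocabulary and encoded vectors would then be saved for prediction time.
```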
Secondly, generating a topic vector
The invention adopts the SentenceLDA topic model to extract question topics. The same word appears with different probabilities under different topic backgrounds, and the same topic occurs with different probabilities in different sentences.
SentenceLDA is an extension of LDA (Latent Dirichlet Allocation); its goal is to overcome the limitation of data sparsity by incorporating text structure into the generation and inference process. LDA and SentenceLDA differ in that the latter assumes strong latent topic dependencies between the words of a sentence, while the former essentially assumes independence between the words of a sentence.
Deep-learning approaches to vectorizing sentences use the co-occurrence information of local context words: for example, the first n-1 words are used to predict the next word, essentially exploiting word co-occurrence within a window of n words. The main idea here, by contrast, is to use global topic information to predict the probability of a word occurring in a sentence.
Training the SentenceLDA yields the probability distribution θ of a sentence over the topics (the "topic distribution" for short); the topic distribution θ is then converted into the corresponding topic vector through a linear transformation.
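The sketch below shows this linear conversion with NumPy; the number of topics, the target dimension, and the random stand-ins for θ and the projection matrix are assumptions (in practice the projection would be learned or fixed alongside the rest of the model).

```python
# Hedged sketch: linearly transform the SentenceLDA topic distribution
# theta into a topic vector of the same dimension as the word vectors.
import numpy as np

K = 4        # number of topics (assumed)
EMB_DIM = 8  # word-vector dimension to match (assumed)

rng = np.random.default_rng(0)
theta = rng.dirichlet(np.ones(K))   # stand-in for the SentenceLDA output
W = rng.normal(size=(K, EMB_DIM))   # linear transformation matrix

topic_vector = theta @ W            # shape: (EMB_DIM,)
print(topic_vector.round(3))
```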
The SentenceLDA model is shown in FIG. 2:
In FIG. 2, K represents the number of topics, D the number of sentences in the corpus, N the number of words in a sentence, and S the number of words under a given topic.
The probability formula of the SentenceLDA model is as follows:
p(w, z | α, β) = p(w | z, β) p(z | α)
where w is a word, z is a topic, and α and β are the Dirichlet prior parameters governing, respectively, the topic distribution and the per-topic word distribution. The correspondence between topics and words is then obtained from the topic distribution under α and the word distribution under β.
Determining the topic distribution θ of a sentence relies on the correspondence between words and topics, which is obtained through training. The generation algorithm for the set of words and topics is shown in FIG. 3 and comprises the following steps:
step 1: the document collection is processed.
Step 2: and judging whether all the documents are selected, if not, continuing to execute the 3 rd step and the 4 th step, and if all the documents are selected, ending the flow.
And 3, step 3: the number of sample sentences in the selected document belongs to the execution 5 step of the poisson distribution.
And 4, step 4: the sample topic of the selection blend is the execution 5 th step belonging to the dirichlet distribution.
And 5, step 5: and judging whether all sentences are selected, if so, returning to the step 1, and otherwise, continuing the steps 6 and 7.
And 6, step 6: the selection of the number of sample words is step 8 of the execution belonging to the poisson distribution.
And 7, step 7: the select sample topic is step 8 of execution belonging to a polynomial distribution.
And 8, step 8: and judging whether all the words are selected, if so, returning to the step 3, and if not, continuing to execute the step 9.
Step 9: sample words belonging to a polynomial distribution are retained.
Step 10: and storing the corresponding relation between the words and the topics, and ending the process.
A topic model runs into the problem of sparse information when abstracting the topic of a sentence; therefore, the invention adopts an improved clustering algorithm to expand the words in the sentence, using similar words to alleviate the sparsity.
The clustering algorithm is an agglomerative hierarchical clustering algorithm, i.e., a bottom-up method. Existing agglomerative hierarchical clustering is inefficient, so the invention improves it with a distributed approach: the number of word clusters is specified in advance and clustering proceeds in parallel. In actual training, the word clusters are preferably divided into 6 top-level categories (person, place, number, time, entity, unknown), further refined into 30 subclasses.
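A hedged sketch of this expansion idea with scikit-learn's agglomerative (bottom-up) clustering follows; the random word vectors, the cluster count of 3, and the tiny word list are placeholders rather than the 6-category/30-subclass setup described above.

```python
# Hedged sketch: cluster word vectors bottom-up with a fixed number of
# clusters, then expand a word with the similar words in its cluster.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
words = ["beijing", "shanghai", "monday", "tuesday", "two", "three"]
vectors = rng.normal(size=(len(words), 8))   # stand-in word vectors

labels = AgglomerativeClustering(n_clusters=3).fit_predict(vectors)

def expand(word):
    """Return the other words sharing this word's cluster."""
    cluster = labels[words.index(word)]
    return [w for w, l in zip(words, labels) if l == cluster and w != word]

print(expand("beijing"))
```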
Third, LSTM layer and subsequent processing
After a sentence is encoded into its word vectors, the word-vector mapping of the words in the sentence is prepared as the input of the LSTM layer. The neural network trains on the data with a simple single-layer LSTM and a fully connected layer; the structure of the network is shown in FIG. 4:
the section first encodes and maps the input sentences (as shown, "i want you" and "you want me") into a corresponding word vector list (as shown x1x2x3 in LSTM-a and x1x2x3 in LSTM _ b), and combines the word vectors with the topic vectors using the idea of convolutional neural networks, resulting in hidden vectors h1h2h 3. By innovatively adopting the combination mode, the invention can better fuse the relationship between the theme and the words, and finally, the LSTM layer outputs corresponding to two sentences are respectively obtained according to the hidden layer vector, such as y1 and y2 shown in FIG. 4. Then, the outputs of the two LSTMs are spliced and used as the input of the full connection layer, and the result is finally output after Dropout and Batchnormalization regularization. The sentence similarity calculation model can be obtained by training sentences in the training sentence set.
When computing sentence similarity, the method adds the topic vector, which is the probability distribution of the sentence over the topics. This implicitly re-weights words according to the sentence's topic and thereby improves the LSTM's judgment of sentence similarity: the semantic information of a sentence is fully considered and combined with its word-level information, so accuracy is greatly improved over traditional methods.
Based on the network architecture shown in FIG. 1, the sentence similarity calculation system based on the topic model and LSTM of the present invention comprises the following processing modules: a word vector processing module, a topic vector processing module, an LSTM layer, a fully connected layer, a Dropout module and a regularization module, wherein:
the word vector processing module, the topic vector processing module and the LSTM layer perform the following processing on each of the two sentences:
the word vector processing module generates the corresponding word vector from the sentence;
the topic vector processing module obtains the corresponding topic vector from the word vector;
the LSTM layer takes the vector obtained by fusing the word vector and the topic vector as its input and produces the corresponding LSTM output;
the fully connected layer processes the LSTM outputs of the two sentences, and the processing result passes through the Dropout module and the regularization module to yield the similarity of the two sentences.
Preferably, the word vector processing module is specifically configured to:
read in a sentence, perform word segmentation, and number each word;
generate a number vector for each sentence from the word numbers, where each sentence takes a fixed length and the remaining positions are filled with zeros;
save the word numbers and the word vectors.
Preferably, the topic vector processing module is specifically configured to:
input the word vectors of the sentence into a SentenceLDA model to obtain the probability distribution θ of the sentence over the topics;
convert the probability distribution θ of the sentence over the topics into a corresponding topic vector through a linear transformation.
Preferably, when the topic of the sentence is extracted, the number of word clusters is specified, clustering is performed in parallel, and the words in the sentence are expanded with similar words.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (8)

1. A sentence similarity calculation method based on a topic model and LSTM is characterized by comprising the following steps:
the following processing is performed on the two sentences respectively: generating corresponding word vectors and topic vectors according to sentences, fusing the word vectors and the topic vectors to obtain fused vectors, and taking the fused vectors as the input of an LSTM layer to obtain corresponding LSTM output;
and taking the LSTM outputs of the two sentences as the input of a fully connected layer, and obtaining the similarity of the two sentences after Dropout (randomly discarding a portion of the neuron units) and regularization.
2. The method of claim 1, wherein generating a corresponding word vector from a sentence comprises:
reading in a sentence, performing word segmentation, and numbering each word;
generating a number vector for each sentence from the word numbers, where each sentence takes a fixed length and the remaining positions are filled with zeros;
saving the word numbers and the word vectors.
3. The method of claim 2, wherein generating the corresponding topic vector from the sentence comprises:
inputting the word vectors of the sentence into a SentenceLDA model to obtain the probability distribution θ of the sentence over the topics;
converting the probability distribution θ of the sentence over the topics into a corresponding topic vector through a linear transformation.
4. A method according to any one of claims 1 to 3, characterized in that:
when the topic of a sentence is extracted, the number of word clusters is specified, clustering is performed in parallel, and the words in the sentence are expanded with similar words.
5. A system for calculating sentence similarity based on a topic model and LSTM, comprising: a word vector processing module, a topic vector processing module, an LSTM layer, a fully connected layer, a Dropout module and a regularization module, wherein:
the word vector processing module, the topic vector processing module and the LSTM layer perform the following processing on each of the two sentences:
the word vector processing module generates the corresponding word vector from the sentence;
the topic vector processing module obtains the corresponding topic vector from the word vector;
the LSTM layer takes the vector obtained by fusing the word vector and the topic vector as its input and produces the corresponding LSTM output;
the fully connected layer processes the LSTM outputs of the two sentences, and the processing result passes through the Dropout module and the regularization module to yield the similarity of the two sentences.
6. The system of claim 5, wherein the word vector processing module is specifically configured to:
read in a sentence, perform word segmentation, and number each word;
generate a number vector for each sentence from the word numbers, where each sentence takes a fixed length and the remaining positions are filled with zeros;
save the word numbers and the word vectors.
7. The system of claim 6, wherein the topic vector processing module is specifically configured to:
input the word vectors of the sentence into a SentenceLDA model to obtain the probability distribution θ of the sentence over the topics;
convert the probability distribution θ of the sentence over the topics into a corresponding topic vector through a linear transformation.
8. The system according to any one of claims 5 to 7, wherein:
when the topic of a sentence is extracted, the number of word clusters is specified, clustering is performed in parallel, and the words in the sentence are expanded with similar words.
CN201910292541.9A 2019-04-12 2019-04-12 Sentence similarity calculation method based on topic model and LSTM Withdrawn CN111813927A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910292541.9A CN111813927A (en) 2019-04-12 2019-04-12 Sentence similarity calculation method based on topic model and LSTM


Publications (1)

Publication Number Publication Date
CN111813927A true CN111813927A (en) 2020-10-23

Family

ID=72844605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910292541.9A Withdrawn CN111813927A (en) 2019-04-12 2019-04-12 Sentence similarity calculation method based on topic model and LSTM

Country Status (1)

Country Link
CN (1) CN111813927A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112650836A (en) * 2020-12-28 2021-04-13 成都网安科技发展有限公司 Text analysis method and device based on syntax structure element semantics and computing terminal
CN113806486A (en) * 2021-09-23 2021-12-17 深圳市北科瑞声科技股份有限公司 Long text similarity calculation method and device, storage medium and electronic device
CN113806486B (en) * 2021-09-23 2024-05-10 深圳市北科瑞声科技股份有限公司 Method and device for calculating long text similarity, storage medium and electronic device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201023