CN108415901A - A short-text topic model based on word vectors and contextual information - Google Patents
A short-text topic model based on word vectors and contextual information
- Publication number: CN108415901A (application number CN201810124600.7A)
- Authority
- CN
- China
- Prior art keywords
- word
- topic
- document
- semantic
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/258—Heading extraction; Automatic titling; Numbering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention discloses a short-text topic model based on word vectors and contextual information. Semantic relations between words are extracted from word vectors, and this explicitly acquired semantic knowledge compensates for the lack of word co-occurrence in short-text data. The semantic relations between words are then further filtered with the training-set data so that they better fit the training data. A background topic is added to the generative process, and noise words in documents are modeled by this background topic. During inference the model is solved by Gibbs sampling, and a sampling strategy based on the Generalized Pólya Urn model is used to raise the probability, under related topics, of words with strong semantic relatedness, which greatly improves the semantic coherence of the words under each topic. A series of experiments shows that the proposed method can substantially improve the semantic coherence of topics and provides a new method for short-text topic modeling.
Description
Technical field
The invention belongs to the field of natural language processing and relates to a short-text topic model based on word vectors and contextual information.
Background art
With the development of social networks, short texts have become one of the main channels through which information spreads on the internet. Short-text data contain abundant information, and extracting topic information from them is very valuable. Probabilistic topic models are an effective way to extract topic information from document collections. A topic model is an unsupervised learning method: its input is document data and its output is the topic information contained in that data. Each topic can be regarded as a distribution over words, and the words with high probability under a topic reflect its semantic features; for example, if words such as "education", "university" and "student" have high probability under a topic, that topic is about education. The effectiveness of topic models depends largely on word co-occurrence information: the more often two words appear in the same document, the more likely they are to belong to the same topic. Classical topic models such as LDA and PLSA achieve good results on large-scale data.
Because word co-occurrence is sparse in short-text data, traditional topic models cannot effectively extract high-quality topics from short texts, and the semantic coherence of the resulting topics is low. To extract high-quality topics from short-text data, it is therefore necessary to make full use of external knowledge and of the features of the training data itself to obtain the semantic information of words, compensate for the lack of word co-occurrence information, and further apply this semantic information during modeling to improve the semantic coherence of the topics.
Summary of the invention
Building on existing research, the present invention proposes a short-text topic model based on word vectors and contextual information. It uses the semantic information of words to compensate for the effect of insufficient word co-occurrence and raises the probability that semantically related words appear under the same topic. At the same time, a background topic is introduced into the model to capture noise words, which can further improve the semantic coherence of the words under each topic.
Technical scheme of the present invention:
A short-text topic model based on word vectors and contextual information; the steps are as follows:
(1) Semantic-feature extraction stage
First, word vectors are trained on a large-scale corpus, the semantic similarity between any two words in the training set is computed from the word vectors, and a set of semantically related words is then obtained for each word in the training set.
(2) Semantic-information filtering stage
Because the word vectors are trained on large-scale text data, the semantic relatedness between words is not necessarily suited to the training data, so it needs to be further filtered according to the information in the training data.
(3) Generative-process modeling stage
With reference to the DMM model, the generative process of the model is defined. Each short document is assumed to have only one topic, and each word in a document is generated either by that topic or by a background topic. Each word in a document is associated with a binary indicator variable: when the variable is 0, the word comes from a normal topic; when it is 1, the word comes from the background topic, i.e. it is a background word.
(4) Model-parameter solving stage
According to the generative process, the hidden variables of the model are sampled by Gibbs sampling, and the model parameters can then be obtained by maximum a posteriori estimation. The sampling strategy of the Generalized Pólya Urn model (General Polya Urn model) is used to increase the counts of semantically related words under the same topic; after maximum a posteriori estimation on the samples, the probabilities of semantically related words under each topic increase, so the semantic coherence of the topics improves.
The beneficial effect of the present invention is a short-text topic model based on word vectors and contextual information that effectively uses word vectors and contextual information to obtain the semantic relatedness between words, compensating for the lack of word co-occurrence in short-text data. During model inference, the probabilities of words with strong semantic relatedness under related topics are increased together, which improves the semantic coherence of the topics to a certain extent. Background-topic information added to the generative process effectively captures the noise words in documents and can further improve topic coherence. Compared with recently proposed short-text topic models, the present model improves both efficiency and effectiveness, is robust across different data sets, and provides a new framework for short-text topic modeling.
Description of the drawings
Fig. 1 is the probabilistic graphical model representation of the method of the present invention.
Fig. 2 shows the F1 value of document classification using the topics extracted by the present invention on the Amazon data set as document features.
Fig. 3 shows the accuracy of document classification using the topics extracted by the present invention on the Amazon data set as document features.
Fig. 4 shows the F1 value of document classification using the topics extracted by the present invention on the web query data set as document features.
Fig. 5 shows the accuracy of document classification using the topics extracted by the present invention on the web query data set as document features.
Detailed description of the embodiments
Specific embodiments of the present invention are described below to further illustrate its starting point and the corresponding technical solution.
The present invention is a short-text topic model based on word vectors and contextual information whose main purpose is to automatically extract high-quality topic information from short-text data. The method can be divided into the following four steps:
(1) Obtaining the semantic similarity between words:
First, word vectors are trained on a Wikipedia corpus using Google's open-source tool word2vec. For English training data, the vectors must be trained on the English Wikipedia corpus; for Chinese training data, Chinese word vectors must be trained on the Chinese Wikipedia corpus. English training data are used as the example here; the training data are the Amazon review data set (Amazon Reviews) and the web query data set (Web Snippet). We therefore use the word2vec tool to train vectors for English words on English Wikipedia data, with the vector dimension set to 300.
The training data for the model must be preprocessed for the subsequent operations. First, the nltk natural-language-processing library for Python is used to split the text into sentences, and each sentence is then tokenized; for English text, the space is the separator between words. Next, stop words are filtered out, then words that appear in fewer than 5 documents are removed, and documents shorter than 3 words are discarded. After this processing, the word list V of the training data is obtained.
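The preprocessing pipeline can be sketched as below. The patent uses nltk for sentence splitting and tokenization; a minimal stdlib approximation is used here, and the toy stop-word list and the helper name `preprocess` are assumptions for illustration:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "of", "to", "and", "in"}  # toy list; the patent uses a full stop-word list

def preprocess(docs, min_doc_freq=5, min_doc_len=3):
    # Lowercase and tokenize on letter runs (whitespace-separated English words),
    # dropping stop words.
    tokenized = [[w for w in re.findall(r"[a-z']+", d.lower()) if w not in STOPWORDS]
                 for d in docs]
    # Document frequency of each word.
    df = Counter(w for doc in tokenized for w in set(doc))
    # Remove words appearing in fewer than min_doc_freq documents.
    tokenized = [[w for w in doc if df[w] >= min_doc_freq] for doc in tokenized]
    # Discard documents shorter than min_doc_len words.
    tokenized = [doc for doc in tokenized if len(doc) >= min_doc_len]
    # The surviving vocabulary is the word list V.
    vocab = sorted({w for doc in tokenized for w in doc})
    return tokenized, vocab
```

The thresholds default to the values given in the text (5 documents, 3 words) but are parameters so they can be tuned per data set.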
For words w_i and w_j with corresponding word vectors v_i and v_j, the semantic similarity between the words is defined as the cosine similarity between the vectors:
SR(w_i, w_j) = cos(v_i, v_j) = (v_i · v_j) / (||v_i|| ||v_j||)
For a word w in the word list V, its set of semantically related words S(w) is defined as S(w) = {w_o | SR(w, w_o) > ε}, where the value of ε depends on the data set and ε ∈ [0, 1].
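The similarity and the related-word set S(w) can be sketched directly from the definitions above. This is a minimal pure-Python sketch; the function names and the dictionary-of-lists representation of the vectors are assumptions:

```python
import math

def cos_sim(v1, v2):
    # SR(wi, wj) = (vi . vj) / (||vi|| ||vj||)
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return dot / (n1 * n2)

def related_words(w, vectors, eps):
    # S(w) = {wo | SR(w, wo) > eps}, with eps in [0, 1].
    return {wo for wo, v in vectors.items()
            if wo != w and cos_sim(vectors[w], v) > eps}
```

With real 300-dimensional word2vec vectors one would typically vectorize this with numpy, but the logic is the same.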
(2) Filtering the semantic similarity information between words with the training data
In previous work, word vectors have been applied to the topic-model field as external knowledge. Word vectors are typically trained on large-scale text data, so the semantic information they contain may not suit the training data; for example, "bachelor" and "undergraduate" may have little association under a "family" topic. We therefore use pointwise mutual information (Point Mutual Information, PMI) to filter the semantic similarity information between words so that it better fits the short-text training data. Given words w_i and w_j, the PMI between them is defined as:
PMI(w_i, w_j) = log [ p(w_i, w_j) / (p(w_i) p(w_j)) ]
where p(w_i, w_j) is the probability that w_i and w_j co-occur in the same document, estimated as p(w_i, w_j) = |D_{w_i,w_j}| / |D|, in which |D_{w_i,w_j}| is the number of documents in which w_i and w_j co-occur and |D| is the total number of documents in the training set; p(w) is the probability that word w occurs in the document collection, estimated from the document frequency of the word as p(w) = |D_w| / |D|. According to PMI, the set S(w) of word w is redefined as S(w) = {w_o | SR(w, w_o) > ε, PMI(w, w_o) ≥ η}. That is, if two semantically related words have little association in the training data, then they are most likely not semantically related within the training set. Here η ∈ (−∞, +∞), with the specific value depending on the data set.
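The document-frequency estimates of p(w) and p(w_i, w_j) and the resulting PMI can be sketched as follows; the factory-function structure (`pmi_table` returning a closure) is an implementation choice assumed here, not taken from the patent:

```python
import math
from itertools import combinations
from collections import Counter

def pmi_table(docs):
    # Estimate p(w) = |D_w| / |D| and p(wi, wj) = |D_{wi,wj}| / |D|
    # from document frequencies, as in the text.
    n_docs = len(docs)
    df = Counter()      # document frequency of each word
    co_df = Counter()   # document co-occurrence frequency of each word pair
    for doc in docs:
        words = set(doc)
        df.update(words)
        co_df.update(frozenset(p) for p in combinations(sorted(words), 2))

    def pmi(wi, wj):
        joint = co_df[frozenset((wi, wj))] / n_docs
        if joint == 0:
            return float("-inf")  # never co-occur: PMI -> -infinity
        return math.log(joint / ((df[wi] / n_docs) * (df[wj] / n_docs)))

    return pmi
```

Filtering then keeps w_o in S(w) only when `pmi(w, w_o) >= eta` for the chosen threshold η.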
(3) Defining the generative process of the model
Since the method targets short-text data, the DMM and Twitter-LDA models can be drawn on to define the generative process of this method. In a topic model, the generative process is the assumed procedure by which documents are generated. First assume that the document collection has K topics and that each short document involves only one topic; because short documents are limited in length, generally under 100 words, this assumption is simple and reasonable. Suppose a short document d is associated with topic k. Not every word in the document is related to topic k: for example, words that appear in most documents can be regarded as background words, which make sentences more complete or expression cleaner. A global background topic B can therefore be set up to generate the background words. Each word w in document d, whose topic is z, is associated with a binary indicator variable y: if y = 0, the word comes from topic z; if y = 1, the word comes from the background topic B. The topic is sampled from a multinomial distribution θ, which is itself sampled from a Dirichlet distribution with parameter α. For each topic k and the background topic B, the multinomial distribution over words φ_k is sampled from a Dirichlet distribution with parameter β. The complete generative process is as follows:
a) Sample the topic distribution of the document collection from a Dirichlet distribution with parameter α: θ ~ Dirichlet(α)
b) For the background topic, sample the multinomial distribution over words from a Dirichlet distribution with parameter β: φ_B ~ Dirichlet(β)
c) Sample the distribution of the binary indicator variable: ψ ~ Dirichlet(γ)
d) For each topic k, sample the topic-word distribution: φ_k ~ Dirichlet(β)
e) For every document d in the collection, first sample the topic of the document, z_d ~ Multinomial(θ). For the i-th word of document d, first sample a binary indicator variable y_{d,i} ~ Bernoulli(ψ); if y_{d,i} = 0, the word is generated from topic z_d, i.e. w_{d,i} ~ Multinomial(φ_{z_d}); if y_{d,i} = 1, the word is generated from the background topic, i.e. w_{d,i} ~ Multinomial(φ_B), where w_{d,i} denotes the i-th word of document d.
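The generative process in steps a)–e) can be forward-simulated. The sketch below is illustrative: the helper `generate_corpus`, the symmetric hyperparameter defaults, and the gamma-variate construction of Dirichlet draws are assumptions, not part of the patent:

```python
import random

def generate_corpus(n_docs, doc_len, K, vocab, alpha=0.1, beta=0.01, gamma=0.5):
    # Forward simulation: one topic per short document; each word is drawn
    # either from that topic (y = 0) or from the background topic B (y = 1).
    def dirichlet(dim, conc):
        # Symmetric Dirichlet draw via normalized gamma variates.
        g = [random.gammavariate(conc, 1.0) for _ in range(dim)]
        s = sum(g)
        return [x / s for x in g]

    theta = dirichlet(K, alpha)                            # corpus topic distribution
    phi = [dirichlet(len(vocab), beta) for _ in range(K)]  # topic-word distributions
    phi_B = dirichlet(len(vocab), beta)                    # background topic B
    psi = dirichlet(2, gamma)                              # indicator distribution

    docs = []
    for _ in range(n_docs):
        z = random.choices(range(K), weights=theta)[0]     # z_d ~ Multinomial(theta)
        doc = []
        for _ in range(doc_len):
            y = random.choices([0, 1], weights=psi)[0]     # y_{d,i} ~ Bernoulli(psi)
            dist = phi_B if y == 1 else phi[z]
            doc.append(random.choices(vocab, weights=dist)[0])
        docs.append((z, doc))
    return docs
```

Such a simulation is useful mainly as a sanity check that an inference implementation can recover planted topics.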
The probabilistic graphical model corresponding to the above generative process is shown in Fig. 1. We assume that the documents of the training data are all generated by this process, so once the observed variables are available, the hidden variables of the model must be inferred from the generative process and the observations.
(4) Model-parameter solving
According to the generative process, the likelihood function L of the training data can be written out; in outline, it is the probability of the observed words with the topic assignments z, the indicator variables y, and the distributions θ, ψ, φ_B, φ_1..K marginalized under their Dirichlet priors. The parameters and hidden variables of the model should be obtained by maximizing this likelihood, but because the variables in it are coupled, an exact solution is impossible, so approximate methods must be used. Common approximate methods for probabilistic graphical models include the EM algorithm, variational EM, variational expectation propagation, and Gibbs sampling. Here we solve the parameters with Gibbs sampling, which is fairly simple to implement and, after sufficient sampling, yields a good solution. An equally important reason is that Gibbs sampling lets us incorporate the semantic relations between words into the sampling process. According to the likelihood function, the hidden variables y and z must be sampled, while the hidden variables φ_B, φ_{1,...,K}, θ and ψ can be obtained by maximum a posteriori estimation.
Given a word w and its set of semantically related words S(w), the words in S(w) have a strong semantic relation with w, so if word w has a high probability under a topic z, then any word w_o in S(w) should also have a higher probability under topic z. To achieve this we use the sampling strategy of the Generalized Pólya Urn model.
The Pólya urn model is one of the more classical models in statistics, and many problems can be reduced to it. In the simple Pólya urn model, an urn contains several balls, each painted one color; a ball is drawn at random, and then that ball, together with another ball of the same color, is put back into the urn. The Generalized Pólya Urn model extends this: a ball is drawn at random from the urn, its color is recorded, and the ball is then returned together with a certain number of balls of other, similar colors. In a topic model, the urn is analogous to a topic and the balls to words; the simple Pólya urn corresponds to ordinary Gibbs sampling. In the solving process of this model we use the Generalized Pólya Urn: for a word w, whenever it occurs once under topic k, not only is the count of w under topic k increased, but also the counts under topic k of the words in S(w). That is, during Gibbs sampling, if the topic of word w is set to k, then n_k^w ← n_k^w + 1 and, at the same time, for each w_o ∈ S(w), n_k^{w_o} ← n_k^{w_o} + A_{w,w_o}, where n_k^w denotes the count associating topic k with word w, and the promotion matrix A is defined (with a promotion weight μ whose specific value depends on the data set) as:
A_{w,w_o} = 1 if w_o = w; A_{w,w_o} = μ if w_o ∈ S(w); A_{w,w_o} = 0 otherwise.
For document d, the sampling formula of the topic z is:
p(z_d = k | z_{−d}, w) ∝ (n_k^{−d} + α) × [ Π_{w∈d} Π_{j=1}^{N_d^w} (n_k^{w,−d} + β + j − 1) ] / [ Π_{i=1}^{N_d} (n_k^{(·),−d} + Vβ + i − 1) ]
where n_k is the number of documents with topic k, n_k^w is the count of word w under topic k, n_k^{(·)} is the count of all words under topic k, N_d^w is the number of occurrences of word w in document d, N_d is the length of document d, V is the size of the word list, and the superscript −d means that the information of document d is excluded when computing these counts.
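One resampling step with the Generalized Pólya Urn count update can be sketched as below. This is a simplified illustration, not the patent's implementation: the conditional ignores within-document word repeats, the function name and count-table layout are assumptions, and the promotion weight `mu` plays the role of the matrix entry A_{w,w_o} for w_o ∈ S(w):

```python
import math
import random

def resample_topic(d_words, counts, n_k, n_docs_k, S, K, V, alpha, beta, mu):
    # counts[k]: word -> (possibly fractional) count under topic k
    # n_k[k]:    total word count under topic k; n_docs_k[k]: documents with topic k
    weights = []
    for k in range(K):
        log_w = math.log(n_docs_k[k] + alpha)
        for i, w in enumerate(d_words):
            # Simplified conditional: (n_k^w + beta) / (n_k + V*beta + i),
            # ignoring repeats of w within the document.
            log_w += math.log(counts[k].get(w, 0.0) + beta)
            log_w -= math.log(n_k[k] + V * beta + i)
        weights.append(log_w)
    m = max(weights)
    probs = [math.exp(x - m) for x in weights]
    k = random.choices(range(K), weights=probs)[0]
    # Generalized Polya Urn update: w itself adds 1; each wo in S(w) adds mu.
    n_docs_k[k] += 1
    for w in d_words:
        counts[k][w] = counts[k].get(w, 0.0) + 1.0
        n_k[k] += 1.0
        for wo in S.get(w, ()):
            counts[k][wo] = counts[k].get(wo, 0.0) + mu
            n_k[k] += mu
    return k
```

A full sampler would also decrement the counts of document d before resampling (the −d superscript) and interleave the y-indicator sampling; both are omitted here for brevity.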
For the i-th word of document d, the sampling formula of the binary indicator variable y is:
p(y_{d,i} = 1 | ·) ∝ (n_{y=1}^{−(d,i)} + γ) × (n_B^{w_{d,i},−(d,i)} + β) / (n_B^{−(d,i)} + Vβ)
p(y_{d,i} = 0 | ·) ∝ (n_{y=0}^{−(d,i)} + γ) × (n_{z_d}^{w_{d,i},−(d,i)} + β) / (n_{z_d}^{(·),−(d,i)} + Vβ)
where n_{y=1} is the number of background words in the document collection and, similarly, n_{y=0} is the number of non-background words; n_B^w is the number of times word w is generated by the background topic B, n_B is the total number of background words, and the superscript −(d,i) means that the information of the i-th word of document d is excluded when computing the related counts. After sufficient sampling, the probability of word w under topic k can be obtained as:
φ_k^w = (n_k^w + β) / (n_k^{(·)} + Vβ)
Since the distribution of the indicator variable and the word distribution of the background topic are not needed downstream, in practical applications only φ has to be computed.
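The final smoothed estimate of the topic-word distribution φ can be sketched directly from the counts; the function name and dictionary representation are assumptions:

```python
def topic_word_dist(counts_k, n_k, vocab, beta):
    # phi_k^w = (n_k^w + beta) / (n_k + V * beta): posterior-mean estimate of
    # the topic-word multinomial from the Gibbs counts, with Dirichlet
    # smoothing parameter beta.
    V = len(vocab)
    return {w: (counts_k.get(w, 0.0) + beta) / (n_k + V * beta) for w in vocab}
```

Because the counts include the fractional Generalized Pólya Urn promotions, semantically related words receive correspondingly higher probability under each topic.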
There are various ways to evaluate a topic model. We use the learned topics as document features for document classification and judge topic quality by the classification performance: for a given classifier, the higher the semantic coherence of the topics, the higher the classification accuracy. We use a random forest as the classifier and take classification accuracy and F1 value as the evaluation metrics. In the experiments, the model parameters are set to ε = 0.5 and η = 0.0, with the Amazon review data and the web query data as the model's training data. To further demonstrate the validity of this model, we compare it with 5 other common short-text topic models, with the number of topics K set to {20, 40, 60, 80}. The results are shown in Fig. 2, Fig. 3, Fig. 4 and Fig. 5. Judging from the classification performance, our model obtains better results in most cases, which shows that the quality of the topics it extracts is generally higher. From Fig. 4 and Fig. 5 we can further observe that, as the number of topics grows, the model remains robust: topic quality does not decline sharply merely because the number of topics increases. The experimental results demonstrate the feasibility of the proposed short-text topic model based on word vectors and contextual information.
The above describes specific embodiments of the present invention and the technical principles employed. Any changes conceived under the present invention whose resulting functions do not go beyond the spirit of the specification and drawings shall fall within the protection scope of the present invention.
Claims (1)
1. A short-text topic model based on word vectors and contextual information, characterized in that word vectors and contextual information are used effectively to obtain the semantic similarity between words, and the semantic similarity information is applied to the Gibbs sampling process to increase the semantic coherence of the topics:
(1) Obtaining the semantic similarity between words
Word vectors are trained on Wikipedia or Google News to obtain a vector representation of each word in the training data, and the cosine similarity between vectors expresses the semantic relatedness between two words. For words w_i and w_j with corresponding word vectors v_i and v_j, the semantic similarity between the words is defined as SR(w_i, w_j) = cos(v_i, v_j) = (v_i · v_j) / (||v_i|| ||v_j||). For each word in the training set, its set of semantically related words S(w) is obtained, defined as S(w) = {w_o | SR(w, w_o) > ε}, where the value of ε depends on the data set and ε ∈ [0, 1];
(2) Filtering the semantic similarity information between words with the training data
The word vectors are trained on a large corpus, and the semantic information they contain may not suit the training data. To further incorporate the features of the training data, pointwise mutual information (PMI) is used to filter the obtained semantic similarity information. The PMI between words w_i and w_j is defined as PMI(w_i, w_j) = log [ p(w_i, w_j) / (p(w_i) p(w_j)) ], where p(w_i, w_j) is the probability that words w_i and w_j co-occur in the same document, and p(w) is the probability that word w occurs in the document collection, estimated from the document frequency of the word. According to PMI, the set S(w) is redefined as S(w) = {w_o | SR(w, w_o) > ε, PMI(w, w_o) ≥ η}, where η ∈ (−∞, +∞), with the specific value depending on the data set;
(3) Defining the generative process of the model
The short-document collection is specified to have K topics and one background topic; a short document contains only one topic, and each word in a document is generated either by a normal topic or by the background topic. The specific generative process is:
a) sample the topic distribution of the document collection: θ ~ Dirichlet(α);
b) sample the word distribution of the background topic: φ_B ~ Dirichlet(β);
c) sample the distribution of the binary indicator variable: ψ ~ Dirichlet(γ);
d) for each topic k, sample the topic-word distribution: φ_k ~ Dirichlet(β);
e) for every document d in the collection, first sample the topic of the document, z_d ~ Multinomial(θ); for the i-th word of document d, first sample a binary variable y_{d,i} ~ Bernoulli(ψ); if y_{d,i} = 0, the word is generated from topic z_d, i.e. w_{d,i} ~ Multinomial(φ_{z_d}); if y_{d,i} = 1, the word is generated from the background topic B, i.e. w_{d,i} ~ Multinomial(φ_B);
(4) Model-parameter solving
The model parameters are solved by Gibbs sampling, with maximum a posteriori estimation of the parameters carried out on the obtained samples. To improve the semantic coherence of the topics, the sampling method of the General Polya Urn model is used to increase the counts, under related topics, of words with higher semantic similarity: when word w is assigned topic k, then n_k^w ← n_k^w + 1 and, at the same time, for each w_o ∈ S(w), n_k^{w_o} ← n_k^{w_o} + A_{w,w_o}, where A is defined (with a promotion weight μ) as A_{w,w_o} = 1 if w_o = w, μ if w_o ∈ S(w), and 0 otherwise;
According to the generative process, the hidden variables to be sampled are z and y, while the hidden variables φ_B, φ_{1,...,K}, θ and ψ are obtained by maximum a posteriori estimation; for document d, the sampling formula of the topic z is:
p(z_d = k | z_{−d}, w) ∝ (n_k^{−d} + α) × [ Π_{w∈d} Π_{j=1}^{N_d^w} (n_k^{w,−d} + β + j − 1) ] / [ Π_{i=1}^{N_d} (n_k^{(·),−d} + Vβ + i − 1) ]
where α and β are the hyperparameters of the Dirichlet distributions, V is the size of the word list, n_k^w is the count of word w under topic k, n_k^{(·)} is the count of all words under topic k, n_k is the number of documents with topic k, and the superscript −d means that document d is excluded when computing the current counts; after the samples z are obtained, the distribution of each topic over words is obtained by maximum a posteriori estimation:
φ_k^w = (n_k^w + β) / (n_k^{(·)} + Vβ)
Priority and publication data
Application number: CN201810124600.7A; priority/filing date: 2018-02-07
Publication number: CN108415901A; publication date: 2018-08-17
Legal events: publication (PB01); entry into force of request for substantive examination (SE01); invention patent application deemed withdrawn after publication (WD01), application publication date 2018-08-17.