CN110390018A - LSTM-based social network comment generation method - Google Patents
LSTM-based social network comment generation method Download PDF Info
- Publication number
- CN110390018A CN110390018A CN201910680645.7A CN201910680645A CN110390018A CN 110390018 A CN110390018 A CN 110390018A CN 201910680645 A CN201910680645 A CN 201910680645A CN 110390018 A CN110390018 A CN 110390018A
- Authority
- CN
- China
- Prior art keywords
- text
- word
- comment
- feature
- lstm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
An LSTM-based social network comment generation method, belonging to the technical field of social network comment generation. The invention addresses the problem that the scenarios covered by existing social network comment generation techniques are overly narrow and monotonous, so that they cannot supply a material corpus for public-opinion guidance. The invention uses LSTM-based NLG technology: the probabilistic relations between characters obtained through learning encode sentence structure, semantics, character types and individual characters. The intended comment information is fused at both the semantic and syntactic levels, and later stages apply methods such as specific-word replacement to generate vivid, fluent and varied high-quality comment text nearly indistinguishable from real social network comments. The invention supplies a favorable material corpus for public-opinion guidance; by spreading more truthful and trustworthy speech, it restores a positive network environment. The output can be fed as a material corpus into existing public-opinion guidance systems to generate comments for specific social network domains.
Description
Technical field
The present invention relates to an LSTM-based social network comment generation method, and belongs to the technical field of social network comment generation.
Background art
Nowadays, online social network platforms greatly enrich netizens' lives and communication. People and events around the world are closely linked through the network, user participation in network events keeps rising, and vast numbers of social network comments are generated as a result. A comment represents a language and a voice; it is a reflection of thought. Its sentences are concise, its intent clear and its structures varied, making it an ideal venue for testing automatic text generation. By posting, a user can express his or her view and position on a hot event, whether approving, neutral or negative, or spread a network rumor driven by interests. This patent focuses the social network on the Twitter platform, collecting user-generated content (UGC) in five fields — politics, health, education, entertainment, and science and technology — and mainly accomplishes the automatic generation of comment text on Twitter.
Automatic text generation is one of the research fields of artificial intelligence. Its main idea is to analyze the deep meaning of the information input to the computer, plan the text to be generated with the computer's internal text planner, convert that meaning into grammatical language structures with a text realizer, and output the result in the form of comment text. Artificial intelligence is a current hotspot of scientific development; text generation technology has gradually drawn attention and is widely applied in real life, strongly influencing human work and living. Document CN108256968A discloses an expert-comment generation method for e-commerce platform goods: it proposes an expert-comment summarization and generation technique based on a sequence-to-sequence model, which extracts the important information from all user comments on a commodity and generates a summarizing passage describing the commodity's characteristics. Consumers can learn a commodity's pros and cons from the generated expert comment and decide whether to buy; merchants can improve their goods accordingly. That method can extract the important comments representing product characteristics, giving merchants a good reference for improving goods, enhancing user experience, and raising sales and income; it also gives consumers a purchase reference and improves their shopping experience, and it can help the e-commerce platform attract more loyal users and expand its influence. That document, however, does not propose generating comments by deep learning.
Judging from the overall development of natural language text generation at home and abroad, existing natural language generation technology has the following problems with respect to comment generation for specific social network domains.

(1) Natural language generation research already offers many fairly mature models, but they concentrate on dialogue systems, machine translation, information retrieval, text classification and automatic summarization; the target texts studied are mostly well-formed text collections, published articles, or public standardized datasets. Research on generating social network comment text is scarce.

(2) Network comments are produced in many ways: social networks (e.g. Facebook, Twitter, Weibo, RenRen), e-commerce (e.g. Amazon, Alibaba, Dangdang), mail services (e.g. Gmail, Yahoo Mail) and network forums (e.g. Tianya, NetEase, Douban). Current work mostly concerns comment generation for e-commerce and mail-service platforms, and no general-purpose method yet realizes the generation of social network comment text.

(3) In existing models for comment text generation, the language patterns studied are fixed and monotonous. Comments on the Twitter platform target hot events in different fields; events are sudden, reactions are immediate and mostly dominated by first impressions, so comments are random and diverse and their patterns hard to capture. Although comment generation on the Yelp social network site has achieved rich results, it only involves user reviews of businesses, which follow a fairly standard structure: experience-driven, without preamble, tightly on topic, commenting on key points, and mostly dominated by like or dislike. The patterns are fixed and monotonous, the application scenario as a whole overly stable, and its rules easy to capture; such methods do not suit comments on the Twitter platform.
Summary of the invention
The technical problem to be solved by the present invention:

The present invention addresses the problem that the scenarios covered by existing social network comment generation techniques are overly narrow and monotonous, so that they cannot supply a material corpus for public-opinion guidance, and therefore proposes an LSTM-based social network comment generation method.
The technical solution adopted by the present invention to solve the above technical problem is:

An LSTM-based social network comment generation method, the method comprising:

Classifying comment text into seven categories: subject-link-predicative structure, comparative structure, interrogative structure, exclamatory structure, subject-verb-object structure, subject-verb-object-with-complement structure, and imperative structure. For each category a different LSTM model is designed; a probability structure is obtained by training each LSTM model, generating an initial comment IR_i of each category, where the subscript i denotes the seven categories, i = 1, 2, ..., 7.

According to the characteristics of each category, a corresponding text-processing strategy is formulated to correct the corresponding LSTM model, and high-quality comment text FR_i consistent with real social network comments is then generated.

Given a specific domain D under a social network W, whose hot topic set is T = {T_1, T_2, ..., T_n}, a topic T_i is selected; for topic T_i a specific main post P is selected, and the comment text set under P is crawled, denoted RR = {RR_1, RR_2, ..., RR_n}. Through classification, data of the different categories are filtered out, denoted FlR = {FlR_1, FlR_2, ..., FlR_n}, and fed separately into LSTM models with different parameters to generate the initial comment sets of the corresponding categories, denoted IR = {IR_1, IR_2, ..., IR_n}. For each category's characteristics, different strategies are formulated to apply deviation correction to IR_i, generating the final comment set FR = {FR_1, FR_2, ..., FR_n}, as shown in formulas (1) to (3), where i ∈ {1, 2, 3, ..., n} and n equals 7. Function C represents the text classification process based on the random forest model, function h represents the LSTM-based text generation process, and function z_j represents the deviation correction process; deviation correction comprises three strategies: text replacement, text repetition, and model customization.

C(RR) → FlR_i (1)
h(W, D, T, P, FlR_i) → IR_i (2)
z_j(IR_i) → FR_i, j ∈ {1, 2, 3} (3)
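The three-stage composition in formulas (1)-(3) can be sketched as plain function composition. The sketch below is a toy stand-in, not the patent's implementation: the classifier is a one-line rule instead of the random forest, the generator a stub instead of the LSTM, and the three correction functions merely tag their input; all function names are hypothetical.

```python
# Toy sketch of the pipeline C(RR) -> FlR_i, h(...) -> IR_i, z_j(IR_i) -> FR_i.
# classify_c, generate_h and the z_j strategies are hypothetical placeholders.

def classify_c(raw_reviews):
    """C(RR) -> FlR_i: bucket raw comments into the seven sentence-structure classes."""
    classes = ["subj-link-pred", "comparative", "interrogative", "exclamatory",
               "SVO", "SVOO-complement", "imperative"]
    buckets = {c: [] for c in classes}
    for r in raw_reviews:
        # one-line rule standing in for the random-forest classifier
        key = "interrogative" if r.endswith("?") else "SVO"
        buckets[key].append(r)
    return buckets

def generate_h(flr_bucket):
    """h(W, D, T, P, FlR_i) -> IR_i: per-class LSTM generation (stubbed)."""
    return [f"generated from: {r}" for r in flr_bucket]

def replace_z(ir):    return [s + " [replaced]" for s in ir]     # z_1: text replacement
def paraphrase_z(ir): return [s + " [paraphrased]" for s in ir]  # z_2: text repetition
def template_z(ir):   return [s + " [templated]" for s in ir]    # z_3: model customization

CORRECTIONS = [replace_z, paraphrase_z, template_z]

def pipeline(raw_reviews):
    buckets = classify_c(raw_reviews)                 # formula (1)
    final = {}
    for j, (cls, flr) in enumerate(buckets.items()):
        ir = generate_h(flr)                          # formula (2)
        final[cls] = CORRECTIONS[j % 3](ir)           # formula (3)
    return final

out = pipeline(["Is this real?", "He won the vote."])
```

The point of the sketch is only the data flow: each category gets its own generator and one of the three correction strategies, mirroring z_j with j ∈ {1, 2, 3}.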
Further, the process of classifying comment text based on the random forest model (i.e. sentence-structure classification based on the random forest model) is:

First, a data set is created by a crawler and subjected to word segmentation, part-of-speech tagging and sentence splitting. Next, feature selection and feature extraction are applied to the text, yielding feature vectors that represent the text; these are input into the random forest model, and the text classification output is obtained by random forest training. Finally the comment text is divided into the following seven classes: subject-link-predicative structure, comparative structure, interrogative structure, exclamatory structure, subject-verb-object structure, subject-verb-object-with-complement structure, and imperative structure.
In the above feature extraction, feature selection and feature extraction are used for dimensionality reduction. The selected features are denoted by identifiers; the specific features extracted are as follows:

(1) Word embedding + tf-idf feature
Each word is vectorized using the word embedding vectors obtained by CBOW model training. The objective function of CBOW is shown in formula (4). For a word w_t, its context is Context(w_t) = {w_{t-b}, ..., w_{t-1}, w_{t+1}, ..., w_{t+b}}, where the constant b determines the contextual window size of the word; the window size is b = 4 and the final vectorized word dimension is set to 120. The tf-idf feature selection method is also introduced to weight each word embedding vector, yielding the word embedding + tf-idf feature, named Wetfidf.

(2) WfreMatrix (word frequency matrix) feature
The WfreMatrix feature denotes a frequency matrix: rows index the articles, columns index the words appearing in the comment text, and the matrix elements are the frequencies with which words occur. Word frequency statistics are computed via the interface provided by sklearn.

(3) Pos (part of speech) feature
Part-of-speech tagging is applied to the comment text using the pos_tag_sents function in the NLTK toolkit.

(4) Key feature
The Key feature denotes keyword features; keywords are words that can represent a category. Key is calculated as in formula (5).

(5) Index feature
The Index feature denotes the positional ordering of words.

(6) Punc feature
The Punc feature denotes punctuation features; it is extracted by counting over the position sequence of a fixed punctuation sequence.
The process above shows how formula (1) is realized.
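A condensed sketch of this classification step follows, combining a tf-idf representation (the Wetfidf-style feature) with a raw word-frequency matrix (the WfreMatrix-style feature) as input to a random forest. This is an illustrative reduction, not the patent's full pipeline: the four sample comments and labels are invented, and the Pos, Key, Index and Punc features are omitted.

```python
# Sketch: tf-idf + word-frequency features feeding a random forest classifier.
# Sample comments, labels and hyperparameters are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.ensemble import RandomForestClassifier
import scipy.sparse as sp

comments = ["Is this policy fair?", "What a great match!",
            "He signed the bill.", "Close the borders now."]
labels = ["interrogative", "exclamatory", "SVO", "imperative"]

tfidf = TfidfVectorizer()   # Wetfidf-style feature (tf-idf weighting)
wfre = CountVectorizer()    # WfreMatrix-style word-frequency feature
X = sp.hstack([tfidf.fit_transform(comments), wfre.fit_transform(comments)])

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)

def classify(text):
    x = sp.hstack([tfidf.transform([text]), wfre.transform([text])])
    return clf.predict(x)[0]

pred = classify("Is this policy fair?")
```

In the real method the concatenated feature vector would also include the part-of-speech, keyword, word-position and punctuation features before training.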
Further, for each category a different LSTM model is designed; a probability structure is obtained by training each LSTM model, generating initial comments IR_i of each category (automatic comment text generation based on LSTM). The process is:

An LSTM-based encoder-decoder text generation model is built. Given a Twitter comment short text, the input text is first encoded; at the decoding stage, the probability distributions over candidate characters produced by the LSTM at successive time steps are used, and a suitable sampling technique (such as greedy sampling, random sampling, or beam search) selects and determines the next character to appear. The resulting character sequence constitutes the natural-language description of the input semantics. Both the encoder and the decoder consist of a single-layer LSTM. The encoding stage produces a context vector C, which serves as the input of the decoding stage, and the decoder outputs the final sequence. The process above shows how formula (2) is realized.
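The three decoding strategies named above can be contrasted on a toy character model. The bigram probability table below is invented purely for illustration; a real decoder would draw these distributions from the trained LSTM at each time step.

```python
# Greedy decoding, temperature/random sampling and a small beam search over a
# toy next-character distribution (stand-in for the LSTM's per-step softmax).
import math
import random

BIGRAM = {  # toy p(next_char | current_char); "<s>"/"</s>" mark start/end
    "<s>": {"g": 0.6, "b": 0.4},
    "g":   {"o": 0.9, "g": 0.1},
    "b":   {"a": 0.8, "b": 0.2},
    "o":   {"</s>": 1.0},
    "a":   {"</s>": 1.0},
}

def greedy_decode():
    """Always take the single most probable next character."""
    c, out = "<s>", []
    while c != "</s>":
        dist = BIGRAM[c]
        c = max(dist, key=dist.get)
        if c != "</s>":
            out.append(c)
    return "".join(out)

def sample_decode(temperature=1.0, rng=None):
    """Draw each next character at random, with temperature-scaled weights."""
    rng = rng or random.Random(0)
    c, out = "<s>", []
    while c != "</s>":
        dist = BIGRAM[c]
        weights = [math.exp(math.log(p) / temperature) for p in dist.values()]
        c = rng.choices(list(dist), weights=weights)[0]
        if c != "</s>":
            out.append(c)
    return "".join(out)

def beam_decode(width=2):
    """Keep the `width` highest log-probability partial sequences."""
    beams, done = [("<s>", "", 0.0)], []  # (current char, emitted text, log-prob)
    while beams:
        nxt = []
        for c, s, lp in beams:
            for c2, p in BIGRAM[c].items():
                cand = (c2, s if c2 == "</s>" else s + c2, lp + math.log(p))
                (done if c2 == "</s>" else nxt).append(cand)
        beams = sorted(nxt, key=lambda b: -b[2])[:width]
    return max(done, key=lambda b: b[2])[1]
```

Greedy and beam search are deterministic (and coincide on this tiny table), while random sampling trades fidelity for the diversity that varied comment text requires.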
The technical solutions corresponding to claims 4 and 5 below together explain how formula (3) is realized.
Further, according to the characteristics of each category, a corresponding text-processing strategy is formulated to correct the corresponding LSTM model and generate high-quality comment text FR_i consistent with real social network comments (domain-knowledge-based text deviation correction). The process is:

For the topic (event) T_i itself, relatively comprehensive prior knowledge, i.e. domain knowledge, is collected. Comments unrelated or only weakly related to the theme, as well as comments contrary to fact, undergo deviation correction processing, which comprises three treatments: text replacement, text repetition, and model customization; together these are called domain-knowledge-based text deviation correction.
Further, the text replacement algorithm proceeds as follows:

(1) Given a topic word C, select the corresponding reference data set F via C.

(2) Within the reference data set F, find all words related to the topic word C whose topic relevance exceeds a threshold, forming a set P. P is determined via the WordNet dictionary: a candidate set is extracted from the topic word's hyponymy, member and entailment relations, and the final candidate set P is obtained by comparing the similarity sim between C and each word p in P against a threshold K.

(3) Within the generated initial comment set IR, find the nouns similar to C as in step (2), forming a set Q; each word in Q is scored by degree of relevance and randomly replaced with a word from P.
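A minimal sketch of these three steps, under loud assumptions: the tiny lexicon stands in for the WordNet-derived candidate set, and the character-overlap similarity stands in for the real relatedness score sim; all names are hypothetical.

```python
# Sketch of the text-replacement strategy: build candidate set P for a topic
# word, find topic-similar words in the comment, replace them at random.
import random

LEXICON = {"election": {"vote", "ballot", "poll"}}  # toy stand-in for WordNet relations

def similarity(a, b):
    """Crude stand-in for sim(): Jaccard overlap of character sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def build_candidates(topic_word, threshold=0.2):
    # step (2): topic-related words whose similarity to C exceeds threshold K
    return {w for w in LEXICON.get(topic_word, set())
            if similarity(topic_word, w) > threshold}

def replace_terms(comment, topic_word, rng=None):
    # step (3): swap topic-similar words for a random candidate-set member
    rng = rng or random.Random(0)
    cands = sorted(build_candidates(topic_word))
    if not cands:
        return comment
    out = [rng.choice(cands) if similarity(w, topic_word) > 0.6 else w
           for w in comment.split()]
    return " ".join(out)

fixed = replace_terms("the election was close", "election")
```

The 0.6 threshold in `replace_terms` is an arbitrary illustration value; in the patent's method, replacement eligibility would come from the relevance score over the noun set Q.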
The text repetition algorithm proceeds as follows:

For any sentence s ∈ IR, part-of-speech judgement is performed on its word tokens to identify adjective, adverb and verb tokens. If a token belongs to the synonym dictionary Syn, the token undergoes a translate-and-retranslate process; cosine similarity is then used to replace the token with the word closest to the original token, yielding the repeated text FRpa of the text repetition algorithm.
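The core of this step, choosing the synonym whose embedding is closest in cosine similarity to the original token, can be sketched as below. The 2-d vectors and the synonym set are invented stand-ins for real word embeddings and for the synonym dictionary Syn with its translate-and-retranslate step.

```python
# Sketch of the text-repetition (paraphrase) strategy: replace a token with
# the synonym whose embedding has highest cosine similarity to the original.
import math

EMB = {"great": (1.0, 0.2), "excellent": (0.9, 0.25),   # toy embeddings
       "awesome": (0.5, 0.9), "bad": (-1.0, 0.1)}
SYN = {"great": ["excellent", "awesome"]}               # toy synonym dictionary

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def paraphrase(sentence):
    out = []
    for tok in sentence.split():
        if tok in SYN:
            # keep the synonym nearest (in cosine) to the original token
            tok = max(SYN[tok], key=lambda s: cosine(EMB[tok], EMB[s]))
        out.append(tok)
    return " ".join(out)

fr_pa = paraphrase("a great speech")
```

In the patent's version the candidate replacements would come from the round-trip translation of the token rather than from a fixed dictionary, but the nearest-by-cosine selection is the same.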
The model-customization algorithm proceeds as follows:

The type() function extracts the type of each text in the comment set FR; for the interrogative and exclamatory sentences in the comment set, the corresponding template type set T is extracted from the template library TR. By judging the template type t' of a sentence sent, the slots of template t' and their part-of-speech sequence are obtained and matched against template t; the translate function exchanges template slot positions, realizing the generation of the template-customized text sent'.
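A minimal sketch of template customization: detect the sentence type, look up a slotted template, and fill the slots. The punctuation-based type detector and the two templates are toy stand-ins for the patent's type() function and template library TR; the slot contents are passed in explicitly rather than parsed from the sentence.

```python
# Sketch of the model-customization strategy: per-type templates with slots.

TEMPLATES = {  # toy stand-in for the template library TR
    "interrogative": "Is {np} really {adj}?",
    "exclamatory": "What a {adj} {np}!",
}

def sent_type(sentence):
    """Toy stand-in for type(): punctuation-based sentence-type detection."""
    if sentence.endswith("?"):
        return "interrogative"
    if sentence.endswith("!"):
        return "exclamatory"
    return "declarative"

def customize(sentence, np, adj):
    t = TEMPLATES.get(sent_type(sentence))
    if t is None:
        return sentence             # no template for this sentence type
    return t.format(np=np, adj=adj)  # fill the template slots

new_sent = customize("Is the policy fair?", np="the plan", adj="fair")
```

The real method would additionally check the part-of-speech sequence of the slot fillers against the template before substitution.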
The invention has the following beneficial effects:

Research on text generation in general is mature, but research targeting the Twitter platform remains scarce. Addressing the problems raised in the previous section, and facing the diversity, randomness and colloquialism of Twitter comment text, the present invention divides comment text into categories according to language pattern and generates text of the corresponding category for each in a targeted way. Using LSTM-based NLG technology, the probabilistic relations between characters obtained through learning encode sentence structure, semantics, character types and individual characters. The intended comment information is fused at both the semantic and syntactic levels, and later stages apply methods such as specific-word replacement to generate vivid, fluent and varied high-quality comment text nearly indistinguishable from real social network comments.

The invention mainly studies comment generation for specific social network domains, supplying a favorable material corpus for public-opinion guidance. By spreading more truthful and trustworthy speech, it provides netizens with a reliable, positive, healthy and upward mainstream public-opinion environment and restores a positive network environment. The output can be fed as a material corpus into existing public-opinion guidance systems to counter large volumes of adverse speech in time, which is significant for purifying the Internet environment, safeguarding public opinion, building a harmonious society, and maintaining national security and all-round stable development.
Detailed description of the invention
Fig. 1 is the block diagram of the LSTM-based comment text generation model; Fig. 2 is a schematic example of deviation correction; Fig. 3 compares experimental results with and without classification; Fig. 4 compares the text F1 values of IR and FR; Fig. 5 compares the results of training the model on Twitter data versus Yelp data; Fig. 6 shows the variation of the repetition rate under each sample value.
Specific embodiment
The realization of the overall scheme of the present invention is illustrated below with reference to the drawings:

1. Since Twitter comment text is diverse, random and colloquial, comments from the five fields of politics, health, education, entertainment, and science and technology are classified by combining the structural characteristics of language. For each category a different LSTM model is designed; the probability structure obtained by learning encodes word structure, semantics, word types and individual words. The intended comment information is fused at both the semantic and syntactic levels, generating initial comments of each category. According to each category's characteristics, a corresponding text-processing strategy is formulated to correct the model, generating high-quality comment text nearly indistinguishable from real social network comments. Given a specific domain D under a social network W, whose hot topic set is T = {T_1, T_2, ..., T_n}, a topic T_i is selected; for T_i a specific main post P is selected, and the comment text set under P is crawled, denoted RR = {RR_1, RR_2, ..., RR_n}. Through classification we filter out data of the different categories, denoted FlR = {FlR_1, FlR_2, ..., FlR_n}, and feed them separately into LSTM models with different parameters, generating the initial comment sets of the corresponding categories, denoted IR = {IR_1, IR_2, ..., IR_n}. For each category's characteristics, different strategies are formulated to apply deviation correction to IR_i, generating the final comment set FR = {FR_1, FR_2, ..., FR_n}, as shown in formulas (1) to (3), where i ∈ {1, 2, 3, ..., n}. Function C represents the random-forest-based text classification process, function h the LSTM-based text generation process, and z_j the deviation correction process composed of text replacement, text repetition and model customization.

C(RR) → FlR_i (1)
h(W, D, T, P, FlR_i) → IR_i (2)
z_j(IR_i) → FR_i, j ∈ {1, 2, 3} (3)
2. Sentence-structure classification based on the random forest model

User participation in suddenly erupting social events keeps rising, generating large-scale comment text. Twitter writing style is complex and changeable, with low distinguishability; feeding massive text directly into a machine-learning model for text generation makes the style hard to learn and yields poor results. So this patent first classifies comment text from the angle of sentence structure, obtaining single-style comments whose speech has high distinguishability. A data set is created by a crawler and processed by word segmentation, part-of-speech tagging and sentence splitting. Because too many text features cause problems such as the curse of dimensionality and over-fitting, feature selection and feature extraction are applied to the text, yielding feature vectors that are input into the random forest model; random forest training yields the classification output, and the comment text is finally divided into the following seven classes: subject-link-predicative structure, comparative structure, interrogative structure, exclamatory structure, subject-verb-object structure, subject-verb-object-with-complement structure, and imperative structure.

Feature extraction is the most critical step of text classification; the usefulness of the extracted features directly affects the quality of the classification results. In classification, if feature dimensionality is too high, phenomena such as dimensional disaster, over-fitting and excessive noise data may occur, so feature selection and feature extraction are used for dimensionality reduction. The selected features, denoted by identifiers, are introduced one by one below.
(1) Word embedding+tf-idf feature
Vectorization is carried out to each word by selecting word embedding vector obtained by CBOW model training.
Shown in the objective function of CBOW such as formula (4).For word wt, word context is Context (wt)={ wt-b,...,
wt-1,wt+1,...,wt+b}.Wherein, b constant is used to determine the contextual window size of word.The window of model accuracy and word
Mouth size b is positively correlated.In the research of this chapter, the word dimension of window size b=4, final vectorization are determined as 120.Together
When introduce this feature selection approach of tf-idf each of text word embedding vector is considered, obtain
Word embedding+tf-idf feature, is named as Wetfidf.
(2) WfreMatrix (word frequency matrix) feature
For WfreMatrix feature for indicating frequency matrix, its row indicates the number of article, and column indicate all in article
The word occurred, the element in matrix are the frequency that word occurs.The statistics of word frequency is calculated by the interface that sklearn is provided
It completes.
(3) Pos (Part of speech) feature
English sentence structure follows certain rules and has clear templates that are closely related to the part of speech of each word in the text; the relative order of words and their dependency relations determine the shape of a clause. Comments are mostly colloquial, with short clauses and simple sentences; statistics show ten parts of speech in total. The pos_tag_sents function in the NLTK toolkit is used to tag the parts of speech of the text.
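A sketch of the POS-sequence feature. The real pipeline would call `nltk.pos_tag_sents` (which requires the averaged-perceptron tagger model to be downloaded); the tiny lookup table here is a hypothetical stand-in so the shape of the feature is visible:

```python
# Hypothetical toy tagger standing in for nltk.pos_tag_sents.
TOY_TAGS = {"the": "DT", "cat": "NN", "runs": "VB", "fast": "RB"}

def pos_feature(sentences):
    """Map each tokenized sentence to its part-of-speech tag sequence;
    unknown words default to NN, as taggers commonly do for nouns."""
    return [[TOY_TAGS.get(tok.lower(), "NN") for tok in sent]
            for sent in sentences]
```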
(4) Key feature
The Key feature denotes keyword features. A keyword is a word that, in many cases, can represent a category by itself. For example, a comparative structure can be basically identified from the word "than"; an exclamatory structure is reliably locked in by How/What combined with an exclamation mark; and a linking verb is a clear marker of the subject-linking-verb-predicative structure. Key is computed as in formula (5).
(5) Index feature
The Index feature denotes the feature of word position order. Most imperative sentences begin with a verb, i.e. the position index of the verb is one, which is a good feature. For the subject-predicate-object structure, the relative order of subject, predicate, and object is an important recognition criterion.
(6) Punc feature
The Punc feature denotes punctuation features. Punctuation types differ widely across clause types: the presence of a question mark is a strong indicator of an interrogative structure, and an exclamation mark indicates an exclamatory or imperative sentence. The punc feature of a text is extracted by counting the position sequence of a fixed set of punctuation marks.
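A small sketch of the punc feature as described above: the position sequence of each tracked punctuation mark (the mark set is an assumption; the patent does not list it):

```python
def punc_feature(text, marks="?!.,"):
    """Positions of each tracked punctuation mark in the text.
    A question mark strongly signals an interrogative clause; an
    exclamation mark signals an exclamatory or imperative one."""
    return {m: [i for i, ch in enumerate(text) if ch == m] for m in marks}
```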
3. Automatic comment text generation based on LSTM
A text generation model with an encoder-decoder structure based on LSTM is built; its basic structure is shown in Fig. 1. Given a Twitter comment (a short text), the input text is first encoded. In the decoding stage, the LSTM produces, one step at a time, a probability distribution over the candidate characters at each moment; a suitable sampling technique (greedy sampling, random sampling, beam search, etc.) then selects and determines the next character. The resulting character sequence constitutes a natural-language description of the input semantic items. Both the encoder and the decoder consist of a single-layer LSTM. The encoding stage produces a context vector C, which serves as the input to the decoding stage; the decoder outputs the final sequence, as shown in Fig. 1.
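The decoding loop described above can be sketched as follows. The model itself is abstracted into a `step_fn` that returns a probability distribution over candidate characters (a stand-in for the LSTM decoder, which the sketch does not implement); greedy and random sampling are shown as two of the mentioned sampling techniques:

```python
import random

def greedy_sample(probs):
    """Greedy sampling: pick the most probable candidate index."""
    return max(range(len(probs)), key=probs.__getitem__)

def random_sample(probs, rng=random.Random(0)):
    """Random sampling: draw an index proportionally to its probability."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def decode(step_fn, start, max_len, sample=greedy_sample):
    """Generate a character sequence step by step: at each moment the
    model (step_fn) yields a distribution over candidate characters and
    the chosen sampling technique picks the next one."""
    seq = [start]
    for _ in range(max_len):
        seq.append(sample(step_fn(seq)))
    return seq
```

Beam search would keep the k best partial sequences at each step instead of a single one; it is omitted here for brevity.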
4. Text drift correction based on domain knowledge
To keep the generated comments close to the topic of the model, i.e. to give them high topic relevance while remaining true to the facts, relatively comprehensive prior knowledge about the event itself, i.e. domain knowledge, is collected. Comments that are unrelated or weakly related to the topic, or that contradict the facts, undergo drift-correction processing, which comprises three kinds of handling: text replacement, text paraphrasing, and template-based customization, collectively referred to as domain-knowledge-based text drift correction. Noun-based text replacement is illustrated with an example: under a hot topic such as a national leadership election, the comment RR1 ("It is a great book") is clearly unrelated to the topic, so a text replacement operation is applied to it. As shown in Fig. 2, in the initial comment the topic-unrelated words that need replacing are marked in green, the candidate replacement words drawn from the original comment set on the Twitter platform are marked in red, and the words after replacement are marked in yellow.
5. Algorithm description
To make the final generated comments highly relevant to the topic of the target hot post, a noun-based text replacement method is proposed. The set P is determined using WordNet: a candidate set P' is extracted through the hypernym/hyponym, member, and entailment relations of the topic word, and the final candidate set P is obtained by comparing the similarity sim between the topic word C and each word p in P' against a threshold k. Inside the generated initial comment set R, nouns similar to C are found as in the second step, forming a set Q. Each word in Q is scored by relevance and randomly replaced with a word from P.
The noun-based text replacement algorithm is expressed as follows:
Algorithm 1: Text Replacement Method
Input: initial comment set IR, classified comment set FilR, topic word C, similarity threshold MINsim
Output: final comment set FRnoun
Step 1: find the words close to C in the classified comment set, forming the set P
For t ∈ FilR
For n ∈ Nouns(t)
a) initialize the sets
b) find all words related to the topic word C inside the reference data set F
c) filter out the words whose topic relevance exceeds the threshold, forming the set P
END For
END For
Step 2: find the words close to C in the initial comment set, forming the set Q
For n ∈ Nouns(IR)
find the nouns similar to C inside the generated initial comment set IR, forming the set Q
END For
Step 3: replacement
For p ∈ P do
for each word in Q, score by relevance and randomly replace it with a word from P
END For
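The candidate-set construction and replacement steps of Algorithm 1 could be sketched as below. The similarity function is abstracted (the patent derives it from WordNet relations; `sim` and the example words here are hypothetical):

```python
import random

def build_candidates(topic_word, reference_words, sim, k):
    """Step 1 sketch: words related to the topic word C whose
    similarity to C exceeds threshold k form the candidate set P."""
    return [w for w in reference_words if sim(topic_word, w) > k]

def replace_nouns(comment_tokens, Q, P, rng=random.Random(0)):
    """Step 3 sketch: tokens belonging to Q (topic-unrelated nouns
    found in the initial comments) are randomly replaced with words
    drawn from the candidate set P."""
    return [rng.choice(P) if t in Q else t for t in comment_tokens]
```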
Text paraphrasing means expressing the same meaning in multiple ways. In text generation research, paraphrasing can be applied to the automatic rewriting of sentences generated by the LSTM model and helps produce smoother, more vivid text. In particular, in the "lexical choice" step, vocabulary for expressing a given meaning can be chosen flexibly according to different contexts, enriching the generated corpus. The text paraphrasing algorithm is as follows:
Algorithm 2: Text Paraphrases Method
Input: initial comment set IR, adjectives ADJ, adverbs ADV
Output: paraphrased text FRpa
For each sentence s in IR, for each token e in s:
(1) judge the part of speech of the token; keep the adjective, adverb, and verb tokens so identified
(2) if the word belongs to the synonym dictionary Syn, apply the translate-and-retranslate process to the token
(3) obtain the back-translated word of the token
(4) compute cosine similarity and replace the original token with the nearest word
END For
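Step (4) of Algorithm 2, the cosine-similarity judgement, could be sketched as follows (the candidate words and vectors are hypothetical; a real system would use the trained word embeddings):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def closest_word(token_vec, candidates):
    """Pick the candidate whose vector is nearest (by cosine similarity)
    to the back-translated token's vector."""
    return max(candidates, key=lambda w: cosine(token_vec, candidates[w]))
```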
A "template" is an abstract expression generalized from natural-language phrases and sentences. Precisely because a template is more representative than its corresponding instances, templates are widely used in natural language generation research. A template consists of two parts, pattern words and pattern slots: the pattern words can be regarded as the constant part of the template, and the pattern slots as its variable part. Fixed-form templates are statistically induced from a large collected corpus of syntactic rules, and the matching degree between an input item and a template determines which instances are generated. The same template can be instantiated into many examples, thereby enriching the corpus. The template-based customization algorithm is expressed as follows:
Algorithm 3: Template-based Text Customization Method
Input: classified comment set FR, template library TR
Output: comment set with customized text added, FRim
For each sentence sent in FR:
(1) use the type() function to extract the type of each text in the comment set FR
(2) extract the corresponding template type set T from the template library TR
(3) judge the template type t' of the sentence sent
(4) obtain the pattern slots of t' and the corresponding part-of-speech sequence
(5) align with template t and use the translate function to fill the pattern slot positions
END For
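The pattern-word/pattern-slot split can be sketched as below: the pattern words stay fixed while the slots (written here as `{name}`, a notation assumed for illustration) are replaced by filler words:

```python
import re

def instantiate(template, fillers):
    """Fill a template's pattern slots (the variable part) with filler
    words, keeping the pattern words (the constant part) unchanged;
    unmatched slots are left as-is."""
    return re.sub(r"\{(\w+)\}",
                  lambda m: fillers.get(m.group(1), m.group(0)),
                  template)
```

The same template yields a different instance for every filler assignment, which is how one template enriches the corpus with many examples.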
The technical effects of the invention are verified as follows:
Model quality is assessed with precision and recall. In this setting, precision is the ratio of detected machine-generated comments to all detected comments and measures the exactness of the experimental results. Recall is the ratio of detected machine-generated comments to all machine-generated comments and measures the completeness of the results. Precision assesses how many of the comments flagged by the detection algorithm are indeed machine-generated fakes; recall assesses how many of all machine-generated fake comments are retrieved. The F1 value is the harmonic mean of precision and recall; lower precision and recall mean that the machine-generated comments are harder to detect. Therefore, to assess the authenticity of the generated fake comments, lower precision, recall, and F1 values are better.
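The three metrics defined above, computed on sets of comment identifiers:

```python
def precision_recall_f1(detected, machine_generated):
    """Precision = detected machine comments / all detected comments.
    Recall = detected machine comments / all machine-generated comments.
    F1 = harmonic mean of precision and recall."""
    tp = len(detected & machine_generated)
    p = tp / len(detected) if detected else 0.0
    r = tp / len(machine_generated) if machine_generated else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```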
Unlike previous text generation research, the present invention inserts an English-sentence-structure classification step before text generation. To prove the effectiveness of this text classification step, a controlled comparison is set up with all other variables held constant: in mode one, no text classification is performed before the comment generation experiment; in mode two, text classification is performed first. Fig. 3 shows the F1 values under both modes. The results show that the F1 value of the model with text classification is significantly lower than that of the model without prior classification, demonstrating that the classification step is indispensable. In each domain, the IR without text replacement is compared with the FR after text replacement, as shown in Fig. 4: the F1 value drops sharply after text processing, showing that text processing greatly improves the quality of the generated text.
The experiment uses a public data set (the restaurant review set from the Yelp website) for comparison with the data set of this research (comments from the Twitter platform). Fig. 5 shows the detection precision when the model learns from the Yelp platform and from the Twitter platform, respectively. The two plots are almost indistinguishable and precision is low in both, showing that the model has strong cross-platform adaptability.
Many comments on websites are simply copied hundreds or thousands of times to steer public opinion, or a new comment is formed by modifying a partially copied one. Such comments are obviously coordinated, painstakingly organized attempts to drive public opinion, and these mass-duplicated comments are easily classified as untrustworthy. Therefore, using the K-gram-based Winnowing plagiarism-detection technique, the FR of this patent is compared with a sample of genuine comments from the Twitter platform in the database; duplicate detection is run between the two sets and against the genuine comments in the database, yielding the duplication rates under different sampling rates shown in Fig. 6. Because genuine comments involve no plagiarism, their duplication rate is stable around 0.08 and fluctuates little as the sample grows. The duplication rate of the FR of this research rises with sample size, then declines when the sampling rate reaches 0.5, and falls below 0.08 at a sampling rate of 0.8; the FR duplication rate is thus lower than that of genuine comments.
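A simplified sketch of K-gram Winnowing fingerprinting (parameters k and w and the CRC hash are illustrative choices, not the patent's): hash every k-gram, keep the minimum hash in each window of w consecutive hashes, and compare the resulting fingerprint sets.

```python
import zlib

def winnow(text, k=5, w=4):
    """Simplified Winnowing: fingerprint a text by hashing all k-grams
    and keeping the minimum hash in each window of w hashes."""
    hashes = [zlib.crc32(text[i:i + k].encode())
              for i in range(len(text) - k + 1)]
    if not hashes:
        return set()
    return {min(hashes[i:i + w]) for i in range(max(1, len(hashes) - w + 1))}

def duplicate_rate(a, b, k=5, w=4):
    """Jaccard overlap of the two fingerprint sets, used here as a
    stand-in for the duplication rate between two comment texts."""
    fa, fb = winnow(a, k, w), winnow(b, k, w)
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0
```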
Claims (5)
1. A social network comment generation method based on LSTM, characterized in that the method comprises:
classifying comment texts into seven categories: subject-linking-verb-predicative structure, comparative structure, interrogative structure, exclamatory structure, subject-predicate-object structure, subject-predicate-object-complement structure, and imperative structure; for the different categories, designing different LSTM models, obtaining probability structures by learning each LSTM model, and generating initial comments IRi of the different categories, where the subscript i denotes the seven categories, i = 1, 2, 3, ..., 7;
according to the characteristics of each category, formulating corresponding text-processing strategies to correct the corresponding LSTM model, thereby generating high-quality comment texts FRi consistent with genuine social network comments;
given a specific domain D under a social network W, the set of hot topics contained in the domain being T = {T1, T2, ..., Tn}, selecting a topic Ti and, for the topic Ti, selecting a specific main post P; crawling the comment text set under the main post P, denoted RR = {RR1, RR2, ..., RRn}; filtering out data of the different categories through classification, denoted FlR = {FlR1, FlR2, ..., FlRn}; inputting them separately into LSTM models with different parameters to generate initial comment sets of the corresponding categories, denoted IR = {IR1, IR2, ..., IRn}; formulating, for the features of each category, different strategies to perform drift correction on IRi, and generating the final comment set, denoted FR = {FR1, FR2, ..., FRn}, as shown in formulas (1) to (3), where i ∈ {1, 2, 3, ..., n}, n equals 7, the function C represents the text classification process based on the random forest model, the function h represents the LSTM-based text generation process, and the function zj represents the drift correction process; the drift correction comprises three strategies: text replacement, text paraphrasing, and template-based customization;
C(RR) → FlRi (1)
h(W, D, T, P, FlRi) → IRi (2)
zj(IRi) → FRi, j ∈ {1, 2, 3} (3).
2. The LSTM-based social network comment generation method according to claim 1, characterized in that
the process of classifying comment texts based on the random forest model is as follows:
first, a data set is created by a crawler and undergoes word segmentation, part-of-speech tagging, and sentence splitting;
next, feature selection and feature extraction are performed on the texts to obtain feature vectors representing the texts, which are input into the random forest model; the output of the text classification is obtained by training the random forest, and the comment texts are finally divided into the following seven classes: subject-linking-verb-predicative structure, comparative structure, interrogative structure, exclamatory structure, subject-predicate-object structure, subject-predicate-object-complement structure, and imperative structure;
in the above feature extraction, feature selection and feature extraction are used for dimensionality reduction, and each selected feature is denoted by an identifier; the specific features extracted are as follows:
(1) word embedding + tf-idf feature
each word is vectorized using the word embedding vectors obtained by training a CBOW model; the objective function of CBOW is shown in formula (4); for a word wt, its context is Context(wt) = {wt-b, ..., wt-1, wt+1, ..., wt+b}, where the constant b determines the size of the word's context window; the window size is b = 4 and the final embedding dimension is 120; the tf-idf feature selection method is also introduced to weight each word embedding vector, yielding the word embedding + tf-idf feature, named Wetfidf;
(2) WfreMatrix (word frequency matrix) feature
the WfreMatrix feature denotes the word frequency matrix, whose rows correspond to articles, whose columns correspond to all words occurring in the comment texts, and whose elements are word frequencies; word frequency statistics are computed via the interface provided by sklearn;
(3) Pos (part of speech) feature
the pos_tag_sents function in the NLTK toolkit is used to tag the parts of speech of the comment texts;
(4) Key feature
the Key feature denotes keyword features; a keyword is a word that can represent a category; Key is computed as in formula (5);
(5) Index feature
the Index feature denotes the feature of word position order;
(6) Punc feature
the Punc feature denotes punctuation features, extracted by counting the position sequence of a fixed set of punctuation marks.
3. The LSTM-based social network comment generation method according to claim 2, characterized in that different LSTM models are designed for the different categories, probability structures are obtained by learning each LSTM model, and initial comments IRi of the different categories are generated by the following process:
a text generation model with an encoder-decoder structure based on LSTM is built; given a Twitter comment (a short text), the input text is first encoded; in the decoding stage, the LSTM produces, one step at a time, a probability distribution over the candidate characters at each moment, and a suitable sampling technique selects and determines the next character; the resulting character sequence constitutes a natural-language description of the input semantic items; both the encoder and the decoder consist of a single-layer LSTM; the encoding stage produces a context vector C, which serves as the input to the decoding stage, and the decoder outputs the final sequence.
4. The LSTM-based social network comment generation method according to claim 3, characterized in that, according to the characteristics of each category, corresponding text-processing strategies are formulated to correct the corresponding LSTM model, thereby generating high-quality comment texts FRi consistent with genuine social network comments, by the following process:
for a topic Ti itself, relatively comprehensive prior knowledge, i.e. domain knowledge, is collected; comments that are unrelated or weakly related to the topic, or that contradict the facts, undergo drift-correction processing, which comprises three kinds of handling: text replacement, text paraphrasing, and template-based customization, collectively referred to as domain-knowledge-based text drift correction.
5. The LSTM-based social network comment generation method according to claim 4, characterized in that
the text replacement algorithm proceeds as follows:
(1) a topic word C is given, and a corresponding reference data set F is chosen according to C;
(2) all words in the reference data set F that are related to the topic word C and whose topic relevance exceeds a threshold are found, forming a set P; the set P is determined using the WordNet dictionary: a candidate set is extracted through the hypernym/hyponym, member, and entailment relations of the topic word, and the final candidate set P is obtained by comparing the similarity sim between the topic word C and each word p against a threshold k;
(3) as in the second step, nouns similar to C are found inside the generated initial comment set IR, forming a set Q; each word in Q is scored by relevance and randomly replaced with a word from P;
the text paraphrasing algorithm proceeds as follows:
for any sentence s ∈ IR, the part of speech of each word token is judged; for the adjective, adverb, and verb tokens so identified, if the word belongs to the synonym dictionary Syn, the token undergoes a translate-and-retranslate process to obtain its back-translated word; cosine similarity is then computed, and the original token is replaced with the nearest word, yielding the paraphrased text FRpa of the text paraphrasing algorithm;
the template-based customization algorithm proceeds as follows:
the type() function is used to extract the type of each text in the comment set FR; for the interrogative and exclamatory types in the comment set, the corresponding template type set T is extracted from the template library TR; by judging the template type t' of a sentence sent, the pattern slots of t' and the corresponding part-of-speech sequence are obtained and aligned with the template t; the translate function is used to fill the pattern slot positions, realizing the generation of the template-based customized text sent'.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910680645.7A CN110390018A (en) | 2019-07-25 | 2019-07-25 | A kind of social networks comment generation method based on LSTM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110390018A true CN110390018A (en) | 2019-10-29 |
Family
ID=68287434
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910680645.7A Pending CN110390018A (en) | 2019-07-25 | 2019-07-25 | A kind of social networks comment generation method based on LSTM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110390018A (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111078888A (en) * | 2019-12-20 | 2020-04-28 | 电子科技大学 | Method for automatically classifying comment data of social network users |
CN111126063A (en) * | 2019-12-26 | 2020-05-08 | 北京百度网讯科技有限公司 | Text quality evaluation method and device |
CN111221940A (en) * | 2020-01-03 | 2020-06-02 | 京东数字科技控股有限公司 | Text generation method and device, electronic equipment and storage medium |
CN111541910A (en) * | 2020-04-21 | 2020-08-14 | 华中科技大学 | Video barrage comment automatic generation method and system based on deep learning |
CN113033179A (en) * | 2021-03-24 | 2021-06-25 | 北京百度网讯科技有限公司 | Knowledge acquisition method and device, electronic equipment and readable storage medium |
CN113705227A (en) * | 2020-05-21 | 2021-11-26 | 中国科学院上海高等研究院 | Method, system, medium and device for constructing Chinese non-segmented word and word embedding model |
CN113743086A (en) * | 2021-08-31 | 2021-12-03 | 北京阅神智能科技有限公司 | Chinese sentence evaluation output method |
CN114429403A (en) * | 2020-10-14 | 2022-05-03 | 国际商业机器公司 | Mediating between social network and payment curation content producers in false positive content mitigation |
CN114443809A (en) * | 2021-12-20 | 2022-05-06 | 西安理工大学 | Hierarchical text classification method based on LSTM and social network |
CN114510649A (en) * | 2022-02-25 | 2022-05-17 | 西安理工大学 | Social network and LSTM model accuracy rate calculation method based on de-duplication sample |
CN114510924A (en) * | 2022-02-14 | 2022-05-17 | 哈尔滨工业大学 | Text generation method based on pre-training language model |
CN114707489A (en) * | 2022-03-29 | 2022-07-05 | 马上消费金融股份有限公司 | Method and device for acquiring marked data set, electronic equipment and storage medium |
CN117807963A (en) * | 2024-03-01 | 2024-04-02 | 之江实验室 | Text generation method and device in appointed field |
CN113033179B (en) * | 2021-03-24 | 2024-05-24 | 北京百度网讯科技有限公司 | Knowledge acquisition method, knowledge acquisition device, electronic equipment and readable storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160180838A1 (en) * | 2014-12-22 | 2016-06-23 | Google Inc. | User specified keyword spotting using long short term memory neural network feature extractor |
CN107491531A (en) * | 2017-08-18 | 2017-12-19 | 华南师范大学 | Chinese network comment sensibility classification method based on integrated study framework |
Non-Patent Citations (3)
Title |
---|
YU TAI et al.: "Automatic Generation of Review Content in Specific Domain of Social Network Based on RNN", 《IEEE》 *
ZHANG WENYU, LI DONG: "Intelligent Technology of the Internet of Things" (《物联网智能技术》), 31 December 2012 *
LAN XIANG: "Research on Paraphrase Generation Technology Using Statistical Machine Translation Models", 《China Master's Theses Full-text Database, Information Science and Technology》 *
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111078888A (en) * | 2019-12-20 | 2020-04-28 | 电子科技大学 | Method for automatically classifying comment data of social network users |
CN111078888B (en) * | 2019-12-20 | 2021-12-10 | 电子科技大学 | Method for automatically classifying comment data of social network users |
CN111126063A (en) * | 2019-12-26 | 2020-05-08 | 北京百度网讯科技有限公司 | Text quality evaluation method and device |
CN111126063B (en) * | 2019-12-26 | 2023-06-20 | 北京百度网讯科技有限公司 | Text quality assessment method and device |
CN111221940A (en) * | 2020-01-03 | 2020-06-02 | 京东数字科技控股有限公司 | Text generation method and device, electronic equipment and storage medium |
CN111541910A (en) * | 2020-04-21 | 2020-08-14 | 华中科技大学 | Video barrage comment automatic generation method and system based on deep learning |
CN111541910B (en) * | 2020-04-21 | 2021-04-20 | 华中科技大学 | Video barrage comment automatic generation method and system based on deep learning |
CN113705227B (en) * | 2020-05-21 | 2023-04-25 | 中国科学院上海高等研究院 | Method, system, medium and equipment for constructing Chinese word-segmentation-free word embedding model |
CN113705227A (en) * | 2020-05-21 | 2021-11-26 | 中国科学院上海高等研究院 | Method, system, medium and device for constructing Chinese non-segmented word and word embedding model |
CN114429403A (en) * | 2020-10-14 | 2022-05-03 | 国际商业机器公司 | Mediating between social network and payment curation content producers in false positive content mitigation |
CN113033179A (en) * | 2021-03-24 | 2021-06-25 | 北京百度网讯科技有限公司 | Knowledge acquisition method and device, electronic equipment and readable storage medium |
CN113033179B (en) * | 2021-03-24 | 2024-05-24 | 北京百度网讯科技有限公司 | Knowledge acquisition method, knowledge acquisition device, electronic equipment and readable storage medium |
CN113743086A (en) * | 2021-08-31 | 2021-12-03 | 北京阅神智能科技有限公司 | Chinese sentence evaluation output method |
CN114443809A (en) * | 2021-12-20 | 2022-05-06 | 西安理工大学 | Hierarchical text classification method based on LSTM and social network |
CN114443809B (en) * | 2021-12-20 | 2024-04-09 | 西安理工大学 | Hierarchical text classification method based on LSTM and social network |
CN114510924A (en) * | 2022-02-14 | 2022-05-17 | 哈尔滨工业大学 | Text generation method based on pre-training language model |
CN114510649A (en) * | 2022-02-25 | 2022-05-17 | 西安理工大学 | Social network and LSTM model accuracy rate calculation method based on de-duplication sample |
CN114510649B (en) * | 2022-02-25 | 2024-04-09 | 西安理工大学 | Social network and LSTM model accuracy calculating method based on deduplication sample |
CN114707489A (en) * | 2022-03-29 | 2022-07-05 | 马上消费金融股份有限公司 | Method and device for acquiring marked data set, electronic equipment and storage medium |
CN114707489B (en) * | 2022-03-29 | 2023-08-18 | 马上消费金融股份有限公司 | Method and device for acquiring annotation data set, electronic equipment and storage medium |
CN117807963A (en) * | 2024-03-01 | 2024-04-02 | 之江实验室 | Text generation method and device in appointed field |
CN117807963B (en) * | 2024-03-01 | 2024-04-30 | 之江实验室 | Text generation method and device in appointed field |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110390018A (en) | A kind of social networks comment generation method based on LSTM | |
Gokulakrishnan et al. | Opinion mining and sentiment analysis on a twitter data stream | |
Cappallo et al. | New modality: Emoji challenges in prediction, anticipation, and retrieval | |
CN109829166B (en) | People and host customer opinion mining method based on character-level convolutional neural network | |
Aragón et al. | Overview of MEX-A3T at IberLEF 2020: Fake News and Aggressiveness Analysis in Mexican Spanish. | |
Barsever et al. | Building a better lie detector with BERT: The difference between truth and lies | |
CN112905739B (en) | False comment detection model training method, detection method and electronic equipment | |
CN114936266A (en) | Multi-modal fusion rumor early detection method and system based on gating mechanism | |
CN105869058B (en) | A kind of method that multilayer latent variable model user portrait extracts | |
CN112966117A (en) | Entity linking method | |
Maynard et al. | Multimodal sentiment analysis of social media | |
Yu et al. | BCMF: A bidirectional cross-modal fusion model for fake news detection | |
Pham | Transferring, transforming, ensembling: the novel formula of identifying fake news | |
CN116737922A (en) | Tourist online comment fine granularity emotion analysis method and system | |
Scola et al. | Sarcasm detection with BERT | |
Bölücü et al. | Hate Speech and Offensive Content Identification with Graph Convolutional Networks. | |
CN113220964B (en) | Viewpoint mining method based on short text in network message field | |
CN113704393A (en) | Keyword extraction method, device, equipment and medium | |
CN114372454A (en) | Text information extraction method, model training method, device and storage medium | |
Kavatagi et al. | A context aware embedding for the detection of hate speech in social media networks | |
Hamed et al. | DISINFORMATION DETECTION ABOUT ISLAMIC ISSUES ON SOCIAL MEDIA USING DEEP LEARNING TECHNIQUES | |
Wang et al. | Using ALBERT and Multi-modal Circulant Fusion for Fake News Detection | |
Li et al. | Multilingual toxic text classification model based on deep learning | |
Lan et al. | Mining semantic variation in time series for rumor detection via recurrent neural networks | |
Upadhyaya et al. | Food Items Prediction Using Sentimental Analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||