CN111553146A - News writing style modeling method, writing style-influence analysis method and news quality evaluation method - Google Patents


Info

Publication number
CN111553146A
Authority
CN
China
Prior art keywords
news
refers
modeling method
mark
influence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010387680.2A
Other languages
Chinese (zh)
Inventor
曹娟
杨玉婷
谢添
刘浩远
郭俊波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute Of Digital Economy Industry Institute Of Computing Technology Chinese Academy Of Sciences
Hangzhou Zhongke Ruijian Technology Co ltd
Original Assignee
Institute Of Digital Economy Industry Institute Of Computing Technology Chinese Academy Of Sciences
Hangzhou Zhongke Ruijian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute Of Digital Economy Industry Institute Of Computing Technology Chinese Academy Of Sciences, Hangzhou Zhongke Ruijian Technology Co ltd
Priority to CN202010387680.2A
Publication of CN111553146A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/205: Parsing
    • G06F40/216: Parsing using statistical methods
    • G06F40/258: Heading extraction; Automatic titling; Numbering
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a news writing style modeling method, a writing style-influence analysis method and a news quality evaluation method, and aims to provide all three. The technical scheme of the invention is as follows: a news writing style model is constructed from quantified readability, logicality, credibility, writing degree, interactivity, interestingness, humanity and structural integrity. The method is suitable for the field of social media data mining.

Description

News writing style modeling method, writing style-influence analysis method and news quality evaluation method
Technical Field
The invention relates to a news writing style modeling method, a writing style-influence analysis method and a news quality evaluation method. The method is suitable for the field of social media data mining.
Background
As the internet increasingly shapes people's lives and mobile terminal devices become widespread, the data generated by human production and daily life has grown rapidly in recent years. With the improvement and development of deep learning theory, the intrinsic value contained in big data is continuously being mined, and that value has drawn close attention from governments, business and the scientific and technological community.
For the media industry, as social media becomes a main channel through which people learn about news events, many traditional news media have begun to create accounts, post news and interact with readers on social media (such as Sina Weibo). How to generate high-quality news content on social media according to user needs has therefore become an important research task.
The quality of news on social media is often reflected in its influence, so most existing work focuses on predicting the influence of news, and concentrates on three aspects: the user, information dissemination and news content:
(1) Work on the user mainly studies how the personal influence of the user who publishes the news affects the influence of the news. News influence is usually predicted from the user's personal influence, such as the number of fans, combined with the user's social networks, such as the follower network.
(2) Work on information dissemination mainly studies how to predict the future propagation trend of news from the early network structure of its propagation, and mines important nodes in the propagation network to facilitate information dissemination.
(3) Work on news content studies the effect of the news content itself on its propagation influence. It can predict influence before release and help users polish news content in a targeted way while writing it, so as to produce higher-quality news.
Work on news content falls largely into two categories: topic-based and language-style-based. Topic-based work studies how the topic of a news item affects its influence, but authoritative news media need to ensure topic diversity, so covering only certain specific topics is not practical for them. How to polish the writing style to produce high-quality news is therefore more important, yet it has been studied less, especially for Chinese. Most existing work focuses only on basic lexical information, including word embeddings and n-grams. Such linguistic knowledge cannot accurately characterize the news writing style, and it makes the effect of writing style on influence difficult to capture.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the above problems, a news writing style modeling method, a writing style-influence analysis method and a news quality evaluation method are provided.
The technical scheme adopted by the invention is as follows: a news writing style modeling method, characterized in that the model is constructed from quantified readability, logicality, credibility, writing degree, interactivity, interestingness, humanity and structural integrity.
The readability is measured by the following features: the number of characters, the number of words, the number of sentences, the number of clauses, the average word length, Sentence_broken, RIX, LIX, LW and the number of professional vocabulary words;
wherein Sentence_broken represents the average number of clauses contained in a sentence of the news; RIX = LW / number of sentences; LIX = number of words / number of sentences + (100 × LW) / number of words, where LW refers to the number of long words in the news;
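The readability indices defined above can be illustrated with a minimal Python sketch. It assumes the news has already been segmented into words by some upstream Chinese word segmenter and that sentence and clause counts are supplied; the function name `readability_indices` and its parameters are hypothetical, and the long-word threshold (more than 2 characters) follows the definition given in the detailed description.

```python
def readability_indices(words, n_sentences, n_clauses, long_word_min_chars=3):
    """Compute Sentence_broken, RIX and LIX for one tokenized news item.

    words: list of word tokens with punctuation removed (segmentation is
    assumed to be done upstream; this sketch does not segment text itself).
    """
    if n_sentences <= 0 or not words:
        raise ValueError("need at least one sentence and one word")
    n_words = len(words)
    # LW: number of long words, i.e. words with more than 2 characters.
    lw = sum(1 for w in words if len(w) >= long_word_min_chars)
    return {
        "Words": n_words,
        "LW": lw,
        "Sentence_broken": n_clauses / n_sentences,  # clauses per sentence
        "RIX": lw / n_sentences,
        "LIX": n_words / n_sentences + (100 * lw) / n_words,
    }


# Illustrative input: five tokens, two of which are longer than 2 characters.
example = readability_indices(
    ["人民日报", "发布", "最新", "天气预报", "提醒"], n_sentences=1, n_clauses=2
)
```

The remaining readability features (character count, average word length, professional vocabulary count) are simple counts over the same token list and are omitted here.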
The logicality is measured by the following features: Forward_reference and Conjs;
wherein Forward_reference refers to the number of pronouns and third-person pronouns in the news; Conjs refers to the number of conjunctions in the news.
The credibility is measured by the following features: @, Nums, Time, Places, Objects, Official_speech and Uncertainty;
wherein @ refers to whether the news uses "@" to cite the source of the news message or an object of the news event; Nums refers to whether the news contains detailed numbers; Time refers to whether the news contains a time; Places refers to whether the news contains a place; Objects refers to whether the news contains a person or an organization; Official_speech refers to whether the news contains a word indicating that it comes from an official notice; Uncertainty refers to whether the news contains words expressing uncertainty.
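As a rough illustration of how some of these binary credibility features might be extracted, the sketch below checks a news text for "@", digits, and words from two small lexicons. The lexicons `OFFICIAL_WORDS` and `UNCERTAIN_WORDS` and the function `credibility_flags` are hypothetical placeholders, since the source does not list its word lists; Time, Places and Objects would need an entity recognizer and are left out.

```python
import re

# Hypothetical mini-lexicons for illustration only; the source does not
# publish the word lists it uses.
OFFICIAL_WORDS = ("通报", "公告")
UNCERTAIN_WORDS = ("可能", "或许", "据说")


def credibility_flags(text):
    """Return a subset of the binary credibility features for one news text."""
    return {
        "@": "@" in text,                           # cites a source or object
        "Nums": re.search(r"\d", text) is not None,  # contains a digit
        "Official_speech": any(w in text for w in OFFICIAL_WORDS),
        "Uncertainty": any(w in text for w in UNCERTAIN_WORDS),
    }


flags = credibility_flags("@人民日报 通报:今日新增3例,可能继续上升")
plain = credibility_flags("今天天气不错")
```

A substring match is the simplest possible test; a production system would more plausibly match against segmented words to avoid accidental hits inside longer words.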
The writing degree is measured by the following features: Noun, Adj, Prep, Pron, Verb and Adv;
wherein Noun refers to the number of nouns in the news; Adj refers to the number of adjectives in the news; Prep refers to the number of prepositions in the news; Pron refers to the number of pronouns in the news; Verb refers to the number of verbs in the news; Adv refers to the number of adverbs in the news.
The interactivity is measured by the following features: First_pron, Second_pron, Interrogative_pron and Que_mark;
wherein First_pron refers to the number of first-person pronouns in the news; Second_pron refers to the number of second-person pronouns in the news; Interrogative_pron refers to the number of interrogative pronouns in the news; Que_mark refers to the number of question marks in the news.
The interestingness is measured by the following features: Rhetoric, Idiom, Adaptive, Exc_mark, Emoticon and Adj;
wherein Rhetoric refers to the number of metaphors in the news; Idiom refers to the number of idioms in the news; Adaptive refers to the number of adversative (transition) words in the news; Exc_mark refers to the number of exclamation marks in the news; Emoticon refers to the number of emoticons in the news; Adj refers to the number of adjectives in the news.
The humanity is measured by the following features: Adv of degree, Modal particle, First_pron, Second_pron, Exc_mark and Que_mark;
wherein Adv of degree refers to the number of degree adverbs in the news; Modal particle refers to the number of modal particles in the news; First_pron refers to the number of first-person pronouns in the news; Second_pron refers to the number of second-person pronouns in the news; Exc_mark refers to the number of exclamation marks in the news; Que_mark refers to the number of question marks in the news.
The structural integrity is measured with the following characteristics: HasHead, HasImage, HasVideo, and HasTag;
wherein HasHead refers to whether a news headline is included; HasImage refers to whether a news picture is included; HasVideo refers to whether news video is included; HasTag refers to whether an event related topic is included.
A writing style-influence analysis method is characterized by comprising the following steps:
acquiring news on a plurality of media, and determining the news quality of the news on the media according to the influence of the media;
obtaining, based on news of different quality, the information gain IG of each feature used in the news writing style modeling method to quantify readability, logicality, credibility, writing degree, interactivity, interestingness, humanity and structural integrity;
the information gain IG is an index for measuring the influence of a feature on news quality;
computing, within each medium, the correlation between influence and each feature used to quantify readability, logicality, credibility, writing degree, interactivity, interestingness, humanity and structural integrity, and determining the Spearman correlation coefficient SRC;
the Spearman correlation coefficient SRC is an index for measuring the magnitude of a feature's influence on news quality.
The information gain IG of each feature is acquired as follows:
for a feature T, IG is calculated as:
$IG(T) = H(C) - H(C \mid T)$
$H(C) = -\sum_{i} p(C_i) \log_2 p(C_i)$
$H(C \mid T) = p(T)\,H(C \mid T) + p(T')\,H(C \mid T')$
wherein H(C) represents the information entropy of the news quality class C, the left-hand H(C|T) represents the conditional entropy, p(T) represents the probability that a news item contains the feature T, p(T') represents the probability that it does not contain the feature T, the terms H(C|T) and H(C|T') on the right-hand side denote the entropy of C among news that contain and do not contain T respectively, and IG ∈ [0,1].
A news quality evaluation method is characterized by comprising the following steps:
obtaining, for news X, the information gain IG and the Spearman correlation coefficient SRC of each feature based on the writing style-influence analysis method;
calculating a news quality evaluation score Q_score:
[Equation image not reproduced: Q_score as a function of W_i and F_i, i = 1, …, f]
$W_i = IG_i \times SRC_i$
wherein W_i indicates the magnitude of the influence of the i-th feature on news quality, F_i denotes the i-th feature of the news X, and f is the number of features.
The beneficial effects of the invention are as follows: the invention constructs a news writing style model from quantified readability, logicality, credibility, writing degree, interactivity, interestingness, humanity and structural integrity. Because each of these eight criteria is quantified by several features, the model characterizes the writing style of high-quality news media with higher accuracy.
The invention analyzes the effect of writing style on influence from two perspectives, between media and within a medium, which removes other noise factors and better isolates the relationship between writing style and influence.
The invention evaluates news quality based on writing style. It does not need information about the early propagation of the news in the network, so it can evaluate news before release. It can also give an interpretable analysis of the evaluation result, and can thus provide news writers with meaningful guidance for revision.
Drawings
FIG. 1 is a diagram of a writing style-based news quality assessment model according to the present invention.
Detailed Description
Example 1: this embodiment provides a writing style modeling method based on news criteria. The model is built on the eight news criteria most relevant to the quality of social media news: readability, logicality, credibility, writing degree, interactivity, interestingness, humanity and structural integrity. This embodiment quantifies the eight criteria with the following textual features:
1. Readability. Being clear and easy to read is a basic requirement of news, especially on a short-text platform such as Sina Weibo, where a post is limited to 140 characters and readers often spend very little time skimming it.
In this embodiment, 10 features affect the readability of a piece of news: the number of characters, the number of words, the number of sentences, the number of clauses, the average word length, Sentence_broken, RIX, LIX, LW and the number of professional vocabulary words; wherein Sentence_broken represents the average number of clauses contained in a sentence, Sentence_broken = number of clauses / number of sentences; RIX = LW / number of sentences; LIX = number of words / number of sentences + (100 × LW) / number of words; LW refers to the number of long words (a long word is a word containing more than 2 characters).
2. Logicality. Good news should be logical and coherent from beginning to end. Two features measure logicality: Forward_reference and Conjs. Forward_reference captures the logical links between different sentences of the news and is calculated by counting the pronouns and third-person pronouns in the news. Because conjunctions such as "therefore" and "so" also contribute to the logicality of news, this embodiment proposes the feature Conjs, which counts the conjunctions in the news as a further measure of logicality.
3. Credibility. Seven features relate to credibility: @, Nums, Time, Places, Objects, Official_speech and Uncertainty. Official media on Sina Weibo often use "@" to cite the source of a news message or an object of a news event, which helps enhance the credibility of the news. Detailed numbers (Nums), time (Time), place (Places) and object (Objects, a person or organization) information also improves credibility. Words indicating that the news originates from an official notice (Official_speech), such as "advisory", are also counted to measure credibility. In addition, words expressing uncertainty (Uncertainty), such as "possible", reduce the credibility of news.
4. Writing degree. News on social media tends to be more colloquial than news in traditional media. The writing degree of news is related to the use of different parts of speech, including nouns (Noun), adjectives (Adj), prepositions (Prep), pronouns (Pron), verbs (Verb) and adverbs (Adv): writing degree = Noun + Adj + Prep − Pron − Verb − Adv − Sentence_broken.
5. Interactivity. Appropriate interaction with the reader helps prompt the reader to think and to join the discussion. Because sentences such as "Do you feel worsted?" achieve the effect of interacting with readers, this embodiment measures interactivity by counting the first-person pronouns (First_pron), second-person pronouns (Second_pron), interrogative pronouns (Interrogative_pron) and question marks (Que_mark) in the news.
6. Interestingness. Interesting news is naturally more attractive to readers. The use of metaphors (Rhetoric), idioms (Idiom), adversative words (Adaptive), exclamation marks (Exc_mark), emoticons (Emoticon) and adjectives (Adj) all makes news more interesting.
7. Humanity. Good news evokes emotional resonance in the reader. In this embodiment, the use of degree adverbs (Adv of degree), modal particles (Modal particle), first-person pronouns (First_pron), second-person pronouns (Second_pron), exclamation marks (Exc_mark) and question marks (Que_mark) is considered to make news more moving.
8. Structural integrity. News on Sina Weibo follows a specific canonical format: a news headline enclosed in "【】" (HasHead), multimedia content such as pictures or video (HasImage, HasVideo), and event-related topics (HasTag).
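Two of the criteria above lend themselves to a short sketch: the writing degree formula and the structural integrity flags. The code below assumes that part-of-speech tags have already been mapped to the coarse labels Noun, Adj, Prep, Pron, Verb and Adv by some upstream tagger, that headlines are enclosed in "【】", and that topics are written as "#topic#"; the bracket and hashtag conventions, like the function names, are assumptions for illustration rather than details confirmed by the source.

```python
import re
from collections import Counter


def writing_degree(pos_tags, sentence_broken):
    """Noun + Adj + Prep - Pron - Verb - Adv - Sentence_broken.

    pos_tags: one coarse POS label per token (labels assumed to be mapped
    upstream); sentence_broken: average number of clauses per sentence.
    """
    c = Counter(pos_tags)  # missing labels count as 0
    return (c["Noun"] + c["Adj"] + c["Prep"]
            - c["Pron"] - c["Verb"] - c["Adv"] - sentence_broken)


def structure_flags(text, has_image=False, has_video=False):
    """Binary structural integrity features for one post.

    Image and video presence cannot be read from the text, so they are
    passed in from the post metadata (hypothetical parameters).
    """
    return {
        "HasHead": re.search(r"【[^】]+】", text) is not None,
        "HasImage": has_image,
        "HasVideo": has_video,
        "HasTag": re.search(r"#[^#]+#", text) is not None,
    }


degree = writing_degree(
    ["Noun", "Noun", "Verb", "Adj", "Adv", "Pron", "Prep"], sentence_broken=2.0
)
flags = structure_flags("【暴雨预警】#北京暴雨# 请注意安全", has_image=True)
```

A higher `degree` indicates a more written, less colloquial style under this formula; the value is not normalized by text length in this sketch.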
Example 2: this embodiment is a multi-angle writing style-influence analysis method comprising inter-media analysis and intra-media analysis. In the inter-media analysis, the topics covered by the media overlap heavily, so the effect of topic is weakened; in the intra-media analysis, the publisher of the news is fixed, so the effect of the user is eliminated. The relationship between writing style and news quality is then obtained from the conclusions common to the two analysis experiments.
1. Inter-media analysis experiment
a. News on a plurality of different media is acquired, and the quality of the news on each medium is determined according to the influence of that medium.
In this embodiment, according to the "two micro, one terminal" media convergence communication ranking published by People's Daily Online, People's Daily and CCTV News are among the most influential news media on Sina Weibo, so all news that these two accounts published on Sina Weibo from 2012 to 2018 is crawled first and taken as the high-quality news category (class 1). For comparison, two news media of medium influence are selected, Xinhuanet and Xinhua Viewpoint, and all the Sina Weibo news they published is crawled as medium-quality news (class 0). The influence of class 1 news is consistently tens of times that of class 0 news.
b. Based on news of different quality, the information gain IG of each feature used in the news writing style modeling method of Example 1 to quantify readability, logicality, credibility, writing degree, interactivity, interestingness, humanity and structural integrity is obtained.
Using the writing style features proposed in this embodiment together with a machine learning classifier (random forest), medium-influence and high-influence news are classified with an accuracy, precision, recall and F1 of 94%, 95%, 94% and 94% respectively, which shows that medium-quality and high-quality news differ in writing style in an obvious and distinguishable way.
In this embodiment, the information gain (IG) of each feature is obtained. IG measures the influence of a feature on classification: the criterion is how much information the feature brings to the classification system, and the more information it brings, the more important the feature is. For a given feature, the amount of information in the system differs depending on whether the system has the feature or not, and the difference between the two amounts is the information that the feature brings to the system. The amount of information is the entropy. For a feature T, IG is calculated as follows:
$IG(T) = H(C) - H(C \mid T)$
$H(C) = -\sum_{i} p(C_i) \log_2 p(C_i)$
$H(C \mid T) = p(T)\,H(C \mid T) + p(T')\,H(C \mid T')$
wherein H(C) represents the information entropy of the news quality class C, the left-hand H(C|T) represents the conditional entropy, p(T) represents the probability that a news item contains the feature T, p(T') represents the probability that it does not contain the feature T, the terms H(C|T) and H(C|T') on the right-hand side denote the entropy of C among news that contain and do not contain T respectively, and IG ∈ [0,1].
c. The information gain IG is used as an index for measuring the influence of a feature on news quality. In this embodiment, sorting the features by IG gives the following top 10: LIX, Exc_mark, Clauses (number of clauses), Sentences (number of sentences), Ave_word_len (average word length), RIX, Sentence_broken, Words (number of words), Second_pron and Characters (number of characters). This indicates that the difference in writing style between high-quality and medium-quality news is mainly reflected in readability, interactivity and interestingness.
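The information gain computation in step b can be sketched in a few lines of Python. This is a minimal illustration under one stated assumption: each feature is reduced to a binary "contains T / does not contain T" indicator, matching the p(T) and p(T') terms of the formula; how count-valued features are binarized is not specified in the source. The function names are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)


def information_gain(has_feature, labels):
    """IG(T) = H(C) - [p(T) H(C | T present) + p(T') H(C | T absent)].

    has_feature: one boolean per news item (binarized feature T).
    labels: one quality class per news item, e.g. 1 = high, 0 = medium.
    """
    with_t = [y for t, y in zip(has_feature, labels) if t]
    without_t = [y for t, y in zip(has_feature, labels) if not t]
    p_t = len(with_t) / len(labels)
    conditional = p_t * entropy(with_t) + (1 - p_t) * entropy(without_t)
    return entropy(labels) - conditional


labels = [1, 1, 0, 0]
ig_perfect = information_gain([True, True, False, False], labels)
ig_useless = information_gain([True, False, True, False], labels)
```

With two balanced classes, a feature that separates them perfectly reaches the upper bound of 1 and a feature independent of the class gives 0, which is consistent with the stated range IG ∈ [0,1].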
2. Intra-media analysis experiment
The intra-media analysis computes, within each medium, the correlation between each writing style feature and influence, using Spearman's rank correlation coefficient (SRC).
SRC reflects the direction and strength of the monotonic relationship between two variables. Its value ranges from -1 to +1: 0 indicates that the two variables are uncorrelated, a positive value indicates positive correlation, and a negative value indicates negative correlation.
SRC serves as an index of the magnitude of a feature's influence on news quality. After the SRC between each feature and influence is calculated within each medium, the final SRC of a feature with respect to influence is obtained by averaging over the media. Ranking the features by SRC leads to a conclusion similar to that of the inter-media analysis experiment: the features most relevant to influence come mainly from readability, interestingness and interactivity.
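The per-medium correlation and averaging step can be sketched as follows. The code implements Spearman's coefficient as the Pearson correlation of average ranks (so ties are handled) in plain Python, then averages it over media. The function names and the shape of the `per_media` input are hypothetical, and what is used as the numeric measure of influence (for example repost or comment counts) is an assumption not fixed by the source.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def mean_src(per_media):
    """Average the SRC of one feature over media.

    per_media: list of (feature_values, influence_values) pairs,
    one pair per media account.
    """
    return sum(spearman(f, inf) for f, inf in per_media) / len(per_media)


src_up = spearman([1, 2, 3, 4], [10, 20, 30, 40])
src_down = spearman([1, 2, 3], [3, 2, 1])
src_avg = mean_src([([1, 2, 3, 4], [10, 20, 30, 40]), ([1, 2, 3], [3, 2, 1])])
```

Where SciPy is available, `scipy.stats.spearmanr` computes the same coefficient and could replace the hand-written `spearman`.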
Example 3: this embodiment is a news quality evaluation method based on writing style:
the influence of each feature on news quality, namely the information gain (IG) and the Spearman correlation coefficient (SRC), is obtained from the two analysis perspectives of Example 2, inter-media and intra-media.
Given news X, its quality evaluation score (Q_score) is calculated according to the following news quality evaluation model:
[Equation image not reproduced: Q_score as a function of W_i and F_i, i = 1, …, f]
$W_i = IG_i \times SRC_i$
wherein W_i indicates the magnitude of the influence of the i-th feature on news quality, F_i denotes the i-th feature of the news X, and f is the number of features.
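The scoring step can be illustrated with a short sketch. Because the Q_score formula appears only as an image in the source, the aggregation used here, a weighted sum of feature values with W_i = IG_i × SRC_i, is an assumption, and any normalization the original may apply is unknown. The function name `q_score` and the example numbers are purely illustrative, not results from the source.

```python
def q_score(features, ig, src):
    """Assumed form: Q_score = sum_i W_i * F_i with W_i = IG_i * SRC_i.

    features: feature name -> value F_i for news X
    ig, src:  feature name -> IG_i and SRC_i from the analysis step
    Returns the score and each feature's contribution, which supports
    an interpretable breakdown of the result.
    """
    contributions = {}
    for name, value in features.items():
        weight = ig[name] * src[name]  # W_i = IG_i * SRC_i
        contributions[name] = weight * value
    return sum(contributions.values()), contributions


# Made-up illustrative values, not taken from the source.
score, parts = q_score(
    features={"LIX": 40.0, "Exc_mark": 2},
    ig={"LIX": 0.5, "Exc_mark": 0.2},
    src={"LIX": -0.1, "Exc_mark": 0.5},
)
```

Since SRC carries a sign, features negatively correlated with influence lower the score under this assumed form, and the per-feature contributions indicate which aspects of the writing a revision might target.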

Claims (12)

1. A news writing style modeling method, characterized in that the model is constructed from quantified readability, logicality, credibility, writing degree, interactivity, interestingness, humanity and structural integrity.
2. The news writing style modeling method of claim 1, wherein: the readability is measured by the following features: the number of characters, the number of words, the number of sentences, the number of clauses, the average word length, Sentence_broken, RIX, LIX, LW and the number of professional vocabulary words;
wherein Sentence_broken represents the average number of clauses contained in a sentence of the news; RIX = LW / number of sentences; LIX = number of words / number of sentences + (100 × LW) / number of words, where LW refers to the number of long words in the news.
3. The news writing style modeling method of claim 1, wherein: the logicality is measured by the following features: Forward_reference and Conjs;
wherein Forward_reference refers to the number of pronouns and third-person pronouns in the news; Conjs refers to the number of conjunctions in the news.
4. The news writing style modeling method of claim 1, wherein: the credibility is measured by the following features: @, Nums, Time, Places, Objects, Official_speech and Uncertainty;
wherein @ refers to whether the news uses "@" to cite the source of the news message or an object of the news event; Nums refers to whether the news contains detailed numbers; Time refers to whether the news contains a time; Places refers to whether the news contains a place; Objects refers to whether the news contains a person or an organization; Official_speech refers to whether the news contains a word indicating that it comes from an official notice; Uncertainty refers to whether the news contains words expressing uncertainty.
5. The news writing style modeling method of claim 1, wherein: the writing degree is measured by the following features: Noun, Adj, Prep, Pron, Verb and Adv;
wherein Noun refers to the number of nouns in the news; Adj refers to the number of adjectives in the news; Prep refers to the number of prepositions in the news; Pron refers to the number of pronouns in the news; Verb refers to the number of verbs in the news; Adv refers to the number of adverbs in the news.
6. The news writing style modeling method of claim 1, wherein: the interactivity is measured by the following features: First_pron, Second_pron, Interrogative_pron and Que_mark;
wherein First_pron refers to the number of first-person pronouns in the news; Second_pron refers to the number of second-person pronouns in the news; Interrogative_pron refers to the number of interrogative pronouns in the news; Que_mark refers to the number of question marks in the news.
7. The news writing style modeling method of claim 1, wherein: the interestingness is measured by the following features: Rhetoric, Idiom, Adaptive, Exc_mark, Emoticon and Adj;
wherein Rhetoric refers to the number of metaphors in the news; Idiom refers to the number of idioms in the news; Adaptive refers to the number of adversative (transition) words in the news; Exc_mark refers to the number of exclamation marks in the news; Emoticon refers to the number of emoticons in the news; Adj refers to the number of adjectives in the news.
8. The news writing style modeling method of claim 1, wherein: the humanity is measured by the following features: Adv of degree, Modal particle, First_pron, Second_pron, Exc_mark and Que_mark;
wherein Adv of degree refers to the number of degree adverbs in the news; Modal particle refers to the number of modal particles in the news; First_pron refers to the number of first-person pronouns in the news; Second_pron refers to the number of second-person pronouns in the news; Exc_mark refers to the number of exclamation marks in the news; Que_mark refers to the number of question marks in the news.
9. The news writing style modeling method of claim 1, wherein: the structural integrity is measured by the following features: HasHead, HasImage, HasVideo and HasTag;
wherein HasHead refers to whether a news headline is included; HasImage refers to whether a news picture is included; HasVideo refers to whether a news video is included; HasTag refers to whether an event-related topic is included.
10. A writing style-influence analysis method, characterized by comprising the following steps:
acquiring news on a plurality of different media, and determining the news quality of the news on each medium according to the influence of that medium;
obtaining, based on news of different quality, the information gain IG of each feature used in the news writing style modeling method according to any one of claims 1-9 to quantify readability, logicality, credibility, writing degree, interactivity, interestingness, humanity and structural integrity;
the information gain IG is an index for measuring the influence of a feature on news quality;
computing, within each medium, the correlation between influence and each feature used to quantify readability, logicality, credibility, writing degree, interactivity, interestingness, humanity and structural integrity, and determining the Spearman correlation coefficient SRC;
the Spearman correlation coefficient SRC is an index for measuring the magnitude of a feature's influence on news quality.
11. The writing style-influence analysis method of claim 10, wherein: the information gain IG of each feature is acquired as follows:
for a feature T, IG is calculated as:
$IG(T) = H(C) - H(C \mid T)$
$H(C) = -\sum_{i} p(C_i) \log_2 p(C_i)$
$H(C \mid T) = p(T)\,H(C \mid T) + p(T')\,H(C \mid T')$
wherein H(C) represents the information entropy of the news quality class C, the left-hand H(C|T) represents the conditional entropy, p(T) represents the probability that a news item contains the feature T, p(T') represents the probability that it does not contain the feature T, the terms H(C|T) and H(C|T') on the right-hand side denote the entropy of C among news that contain and do not contain T respectively, and IG ∈ [0,1].
12. A news quality evaluation method, characterized by comprising the following steps:
obtaining the information gain IG and the Spearman correlation coefficient SRC of each feature based on the writing style-influence analysis method of claim 10 or 11;
calculating a news quality evaluation score Q_score:
[Equation image not reproduced: Q_score as a function of W_i and F_i, i = 1, …, f]
$W_i = IG_i \times SRC_i$
wherein W_i indicates the magnitude of the influence of the i-th feature on news quality, F_i denotes the i-th feature of the news, and f is the number of features.
CN202010387680.2A 2020-05-09 2020-05-09 News writing style modeling method, writing style-influence analysis method and news quality evaluation method Pending CN111553146A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010387680.2A CN111553146A (en) 2020-05-09 2020-05-09 News writing style modeling method, writing style-influence analysis method and news quality evaluation method

Publications (1)

Publication Number Publication Date
CN111553146A true CN111553146A (en) 2020-08-18

Family

ID=72000566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010387680.2A Pending CN111553146A (en) 2020-05-09 2020-05-09 News writing style modeling method, writing style-influence analysis method and news quality evaluation method

Country Status (1)

Country Link
CN (1) CN111553146A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462406A (en) * 2014-12-10 2015-03-25 天津大学 Algorithm for extracting text model features to classify text models
US20180349781A1 (en) * 2017-06-02 2018-12-06 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and device for judging news quality and storage medium
CN108959364A (en) * 2018-05-21 2018-12-07 大连理工大学 News media's influence power appraisal procedure in a kind of social media event level news
CN110210016A (en) * 2019-04-25 2019-09-06 中国科学院计算技术研究所 Bilinearity neural network Deceptive news detection method and system based on style guidance
CN110688834A (en) * 2019-08-22 2020-01-14 阿里巴巴集团控股有限公司 Method and equipment for rewriting intelligent manuscript style based on deep learning model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CAO JUAN: "Web video topics discovery and structuralization with social network", pages 53 - 63 *
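Tang Zihui (唐子惠): "Introduction to Medical Artificial Intelligence" (医学人工智能导论), Shanghai Scientific & Technical Publishers (上海科学技术出版社), pages: 145 *
Li Yan (李严): "An analysis of the application value of machine news writing from the perspective of scene salience" (场景凸显视域下机器新闻写作的应用价值探析), no. 04, pages 155 - 158 *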

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 12 / F, building 4, 108 Xiangyuan Road, Gongshu District, Hangzhou City, Zhejiang Province 310015

Applicant after: Institute of digital economy industry, Institute of computing technology, Chinese Academy of Sciences

Applicant after: Hangzhou Zhongke Ruijian Technology Co.,Ltd.

Address before: Room 302, building 5, 17-1 Chuxin Road, Hangzhou City, Zhejiang Province, 310015

Applicant before: Hangzhou Zhongke Ruijian Technology Co.,Ltd.

Applicant before: Institute of digital economy industry, Institute of computing technology, Chinese Academy of Sciences