CN109241402A - Method for machine import of virtual comments based on news content - Google Patents

Info

Publication number
CN109241402A
Authority
CN
China
Prior art keywords
comment
news
keyword
data
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810858862.6A
Other languages
Chinese (zh)
Inventor
朱愚
李贤利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Hua Seiun Technology Co Ltd
Original Assignee
Chengdu Hua Seiun Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Hua Seiun Technology Co Ltd
Priority to CN201810858862.6A
Publication of CN109241402A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/194: Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a method for machine import of virtual comments based on news content. The method comprises the following steps: on a Web page of the information system, an administrator or editor manually enters comments, which are saved to a virtual comment library; and/or a big data system deployed on the information system's back end crawls comments on trending news and local government-affairs news in real time and saves them to the virtual comment library; the comment data is analyzed using a comment association algorithm, the analyzed data is stored in a MySQL database, and the MySQL data is synchronized in real time to a Redis database, where it is stored as hashes; the virtual comment data is imported into related news. The scheme targets comments on new-media news in the radio, film and television industry, requires no manual intervention, and safeguards the timeliness, authenticity and objectivity of the library to the greatest extent.

Description

Method for machine import of virtual comments based on news content
Technical field
The present invention relates to the field of news information processing, and in particular to a method for machine import of virtual comments based on news content.
Background
In the era of "Internet Plus" and converged media, and especially for broadcast and television new-media news content, traditional canned comments can no longer meet users' needs. Users want to see comments that are timely, closely tied to the news content, and objective and authentic.
Existing traditional canned comments have the following shortcomings:
The comment library is insufficient. Constrained by technology, traditional products generally store the comment library in a conventional relational database (e.g., MySQL, Oracle). Either all comments are stored in a single table, and data access becomes slow once the data reaches a certain volume; or they are stored in a main table plus multiple category tables, in which case join queries are still time-consuming when the data volume is large.
The comment library's content is monotonous and lacks objectivity and timeliness. In traditional products, comment data is usually added to the library by hand, which has drawbacks: people are subjective and the sentences are monotonous, typically stilted comments such as "This news is great" or "This article is worth recommending". Because the comments are added manually, they are only written after a news item has been published and someone has read it, and are then kept for reuse elsewhere.
News and virtual comments are associated manually. Virtual comments and news are stored in two separate places in the information system, and usually someone manually selects one or more comments from the library and associates them with a news item. As a result, news comments depend too heavily on manual work.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and provide a method for machine import of virtual comments based on news content. The method targets comments on new-media news in the radio, film and television industry, requires no manual intervention, and safeguards the timeliness, authenticity and objectivity of the library to the greatest extent.
The object of the present invention is achieved through the following technical solution:
A method for machine import of virtual comments based on news content, comprising the following steps:
S1: on a Web page of the information system, an administrator or editor manually enters comments, which are saved to the virtual comment library;
and/or, a big data system is deployed on the information system's back end to crawl comments on trending news and local government-affairs news in real time and save them to the virtual comment library;
S2: the comment data is analyzed using a comment association algorithm, the analyzed data is stored in a MySQL database, and the MySQL data is synchronized in real time to a Redis database, where it is stored as hashes;
S3: the virtual comment data is imported into related news.
Further, the sub-steps of step S2 are as follows:
S01: configure the web pages to crawl;
S02: extract the news list and the news detail pages from the web pages;
S03: find the comment URL of each news item in its detail page;
S04: for each news item, crawl the headline, keywords, content and comments;
S05: process the crawled data, and record the association between each news item's keywords and its comments.
Further, the comment data is analyzed as follows: N keywords are set for each news item and each keyword is given a weight W, such that W1 + W2 + … + WN = 1; the keywords are combined, and comment lists are stored against the keyword combinations;
Assuming there are n comments, the sum of the numbers of comments associated with each keyword is greater than or equal to n, since one comment may be associated with more than one keyword.
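As an illustration only, the weighting constraint and the keyword-to-comment association might be sketched as follows. The association rule (a comment belongs to a keyword when the keyword occurs in its text), the function names `check_weights` and `associate`, and the sample comments are all assumptions made for this sketch; the text does not define how a comment is associated with a keyword.

```python
from collections import defaultdict
from math import isclose


def check_weights(weights):
    """Raise ValueError unless the keyword weights sum to 1 (W1 + ... + WN = 1)."""
    total = sum(weights.values())
    if not isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"keyword weights sum to {total}, expected 1")


def associate(comments, keywords):
    """Map each keyword to the comments that mention it.

    Assumption: a comment is associated with a keyword when the keyword
    occurs in the comment text. This rule is hypothetical.
    """
    index = defaultdict(list)
    for comment in comments:
        for keyword in keywords:
            if keyword in comment:
                index[keyword].append(comment)
    return dict(index)


# Made-up sample data for one news item with three weighted keywords.
weights = {"Guangxi": 0.3, "flood": 0.5, "rescue": 0.2}
check_weights(weights)
comments = [
    "Stay safe, Guangxi!",
    "The flood rescue teams in Guangxi are heroes.",
    "Hope the flood recedes soon.",
]
index = associate(comments, weights)

# One comment can appear under several keywords, so the per-keyword
# counts add up to at least the number of comments.
total_links = sum(len(linked) for linked in index.values())
```

In this sample every comment mentions at least one keyword, so the per-keyword counts sum to 5 for 3 comments. A comment that mentions no keyword would not be counted under this assumed rule.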
Further, the specific steps of step S3 are as follows:
S11: after writing a news article, the editor sets the article's keywords and their weights;
S12: the editor saves the article, whereupon the news back-end system performs similarity matching between the article's keywords and the keywords stored in the library;
S13: the matching process in detail:
A similarity coefficient f is set. Suppose one keyword of the editor's article is X, a keyword in the virtual library is Y, and the matching result is Y1; their relationship is then:
Y1 = 0 ; (f = 0)
Y1 = X*f ; (0 < f < 100%)
Y1 = Y ; (f = 100%).
Further, the similarity coefficient f is set manually.
The beneficial effects of the present invention are as follows. A conventional relational database, MySQL (used as the system of record and convenient for interfacing with other systems), is combined with a non-relational database, Redis (used as a cache for real-time reads). The virtual comment system classifies the library by category (typically trending news and current political news) and by region (e.g., Shenzhen, Guangxi). For current-affairs and trending news, big data crawler technology collects comment data from top sites, trending news and local e-government websites, and a comment association analysis algorithm associates article keywords with their comments. In rare cases, manual review and rearrangement are required. The analyzed data is stored in the MySQL database and synchronized in real time to the Redis database; Redis can hold large volumes of data and offers fast reads.
Using crawler technology together with the comment association algorithm, the latest comments from major websites can be collected quickly, which safeguards the timeliness, authenticity and objectivity of the library to the greatest extent.
With the weighted association algorithm, no manual intervention is needed: the editor is only responsible for writing the article, and once the article is saved, some comments are automatically imported into the news item. The editor then only needs to review and modify them.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to specific embodiments, but the scope of protection of the present invention is not limited to what is described below.
As shown in Fig. 1, a method for machine import of virtual comments based on news content comprises the following main steps:
S1: on a Web page of the information system, an administrator or editor manually enters comments, which are saved to the virtual comment library;
and/or, a big data system is deployed on the information system's back end to crawl comments on trending news and local government-affairs news in real time and save them to the virtual comment library;
S2: the comment data is analyzed using a comment association algorithm, the analyzed data is stored in a MySQL database, and the MySQL data is synchronized in real time to a Redis database, where it is stored as hashes;
S3: the virtual comment data is imported into related news.
The purpose of step S1 is to obtain virtual comments. The sources of virtual comments include:
Manual entry: on a Web page of the information system, administrators and editors can enter comments manually, and the comments are saved to the database, as shown in the following figure.
Crawling news and its comments with big data crawler technology: a big data system deployed on the information system's back end crawls comments on trending news and local government-affairs news in real time. The crawl steps are as follows:
1> Configure the web pages to crawl, e.g., People's Net, Guangxi News Network;
2> Extract the news list and the news detail pages from the web pages;
3> Find the comment URL of each news item in its detail page;
4> For each news item, crawl the related data such as the headline, keywords, content and comments;
5> Process the crawled data, and record the association between each news item's keywords and its comments.
// A crawl flow chart and selected core code are to be inserted here.
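As a hedged illustration of crawl steps 3 and 4, the following sketch extracts the headline, keywords and comment URL from one news detail page that has already been downloaded. The page markup (an `<h1>` headline, a `<meta name="keywords">` tag and an `<a class="comments">` link), the class name `NewsDetailParser` and the sample page are hypothetical; real news sites differ, and downloading and scheduling are omitted.

```python
from html.parser import HTMLParser


class NewsDetailParser(HTMLParser):
    """Collect headline, keywords and comment URL from one detail page.

    Assumed (hypothetical) markup: an <h1> headline, a
    <meta name="keywords"> tag and an <a class="comments"> link.
    """

    def __init__(self):
        super().__init__()
        self.headline = ""
        self.keywords = []
        self.comment_url = None
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1":
            self._in_h1 = True
        elif tag == "meta" and attrs.get("name") == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]
        elif tag == "a" and attrs.get("class") == "comments":
            self.comment_url = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Only text inside the <h1> element contributes to the headline.
        if self._in_h1:
            self.headline += data.strip()


def parse_detail(html):
    """Return the fields of one news item as a plain dict."""
    parser = NewsDetailParser()
    parser.feed(html)
    parser.close()
    return {
        "headline": parser.headline,
        "keywords": parser.keywords,
        "comment_url": parser.comment_url,
    }


# Made-up detail page used only to exercise the parser.
SAMPLE = """
<html><head><meta name="keywords" content="Guangxi, flood, rescue"></head>
<body><h1>Flood rescue under way in Guangxi</h1>
<a class="comments" href="/news/123/comments">Comments</a></body></html>
"""
record = parse_detail(SAMPLE)
```

The resulting record keeps the keywords next to the comment URL, so that the comments fetched from that URL can later be recorded against the same keywords (step 5).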
Step S2 implements the analysis and storage of virtual comments. The specific steps are as follows:
Analyze the comment data using the comment association algorithm: the content of each news item is analyzed, keywords are set for it, each keyword is given a weight, the keywords are combined, and comment lists are stored against the keyword combinations. For example, suppose a news item has 3 keywords with weights W = 30% for keyword 1, W = 50% for keyword 2 and W = 20% for keyword 3. Suppose the news item has n comments; these n comments are analyzed and associated with the news item's keywords. One possible outcome is that keyword 1 is associated with n/m comment records, keyword 2 is associated with n/p comment records, and keyword 3 is associated with no comments, so that n/m + n/p >= n.
Unified storage combining a conventional relational database with a newer non-relational database: the data analyzed above is stored in the MySQL database, and the MySQL data is synchronized in real time to the Redis database, where it is stored as hashes.
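The hash layout is not specified in the text. One possible layout, sketched here under stated assumptions, uses one hash per library category, with each keyword as a field and the JSON-encoded comment list as its value. A plain dict stands in for the Redis server so the sketch runs without one; the key name `comments:hot` and both helper functions are hypothetical.

```python
import json


def to_hash_fields(index):
    """Turn {keyword: [comment, ...]} into flat hash fields.

    Each field name is a keyword and each value is a JSON-encoded list,
    because hash values are flat strings.
    """
    return {kw: json.dumps(cs, ensure_ascii=False) for kw, cs in index.items()}


def from_hash_fields(fields):
    """Inverse of to_hash_fields, as a reader of the cache would apply it."""
    return {kw: json.loads(raw) for kw, raw in fields.items()}


# Made-up analysis result for the "trending news" category.
index = {
    "Guangxi": ["Stay safe, Guangxi!"],
    "flood": ["Hope the flood recedes soon."],
}

fake_redis = {}  # stand-in for a Redis server: {hash name: {field: value}}
fake_redis["comments:hot"] = to_hash_fields(index)  # hypothetical key name

restored = from_hash_fields(fake_redis["comments:hot"])
```

With the redis-py client, the same field mapping could be written with `hset("comments:hot", mapping=fields)` and read back with `hgetall`. How rows are synchronized from MySQL is outside this sketch.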
Step S3 imports the virtual comment data into related news. The steps are as follows:
After writing a news article, the editor sets the article's keywords and their weights; by default, the program can assign the weights according to the order of the keywords.
The editor saves the article, whereupon the news back-end system performs similarity matching between the article's keywords and the keywords stored in the library.
The matching process in detail: a similarity coefficient f is set here, and its value can be configured manually. Suppose one keyword of the editor's article is X, a keyword in the virtual library is Y, and the matching result is Y1; their relationship is then:
Y1 = 0 ; (f = 0)
Y1 = X*f ; (0 < f < 100%)
Y1 = Y ; (f = 100%)
Explanation: for one input value X, zero or more Y1 values can be obtained, and adjusting the coefficient f affects the number of Y1 values. For example: when f = 0, the degree of association between X and Y is 0, i.e., no comment in the library can serve as a comment on the news item; when f = 100%, X and Y match exactly, and all comments under Y in the library can serve as comments on the news item; when f = 40%, suppose X is "Guangxi" and Y = [Guangxi, Guangxi TV Station, Guangxi Network TV Station]; then "Guangxi Network TV Station" will not be matched.
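The text does not say how the similarity between X and Y is computed. As a minimal sketch under that caveat, the following treats f as a threshold on a word-level Jaccard similarity, which is an assumed measure chosen only for illustration; `similarity` and `match_keywords` are hypothetical names. The two boundary cases follow the explanation above: f = 0 matches nothing, and f = 100% (1.0 here) requires a full match.

```python
def similarity(x, y):
    """Word-level Jaccard similarity in [0, 1] (an assumed measure)."""
    a, b = set(x.lower().split()), set(y.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_keywords(x, library_keywords, f):
    """Return the library keywords Y whose similarity to x reaches f.

    f = 0 yields no match, and f = 1 requires identical word sets,
    following the two boundary cases in the text.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if f == 0.0:
        return []
    return [y for y in library_keywords if similarity(x, y) >= f]


library = ["Guangxi", "Guangxi TV Station", "Guangxi Network TV Station"]

loose = match_keywords("Guangxi", library, 0.3)   # two keywords reach 0.3
strict = match_keywords("Guangxi", library, 1.0)  # only the identical keyword
empty = match_keywords("Guangxi", library, 0.0)   # f = 0: nothing is matched
```

Under this assumed measure, "Guangxi TV Station" scores 1/3 and "Guangxi Network TV Station" scores 1/4, so a threshold of 0.3 keeps the former and drops the latter. The numeric threshold that reproduces the 40% example in the text depends on the real similarity measure, which is not given.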
The above is only a preferred embodiment of the present invention. It should be understood that the present invention is not limited to the form disclosed herein, which should not be regarded as excluding other embodiments; it can be used in various other combinations, modifications and environments, and can be modified within the scope contemplated herein through the above teachings or the skill or knowledge of the related art. Modifications and changes made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the scope of protection of the appended claims.

Claims (5)

1. A method for machine import of virtual comments based on news content, characterized in that the method comprises the following steps:
S1: on a Web page of the information system, an administrator or editor manually enters comments, which are saved to the virtual comment library;
and/or, a big data system is deployed on the information system's back end to crawl comments on trending news and local government-affairs news in real time and save them to the virtual comment library;
S2: the comment data is analyzed using a comment association algorithm, the analyzed data is stored in a MySQL database, and the MySQL data is synchronized in real time to a Redis database, where it is stored as hashes;
S3: the virtual comment data is imported into related news.
2. The method for machine import of virtual comments based on news content according to claim 1, characterized in that the sub-steps of step S2 are as follows:
S01: configure the web pages to crawl;
S02: extract the news list and the news detail pages from the web pages;
S03: find the comment URL of each news item in its detail page;
S04: for each news item, crawl the headline, keywords, content and comments;
S05: process the crawled data, and record the association between each news item's keywords and its comments.
3. The method for machine import of virtual comments based on news content according to claim 2, characterized in that the comment data is analyzed as follows: N keywords are set for each news item and each keyword is given a weight W, such that W1 + W2 + … + WN = 1; the keywords are combined, and comment lists are stored against the keyword combinations;
Assuming there are n comments, the sum of the numbers of comments associated with each keyword is greater than or equal to n.
4. The method for machine import of virtual comments based on news content according to claim 3, characterized in that the specific steps of step S3 are as follows:
S11: after writing a news article, the editor sets the article's keywords and their weights;
S12: the editor saves the article, whereupon the news back-end system performs similarity matching between the article's keywords and the keywords stored in the library;
S13: the matching process in detail:
A similarity coefficient f is set. Suppose one keyword of the editor's article is X, a keyword in the virtual library is Y, and the matching result is Y1; their relationship is then:
Y1 = 0 ; (f = 0)
Y1 = X*f ; (0 < f < 100%)
Y1 = Y ; (f = 100%).
5. The method for machine import of virtual comments based on news content according to claim 1, characterized in that the similarity coefficient f is set manually.
CN201810858862.6A 2018-07-31 2018-07-31 A kind of virtual comment machine introduction method based on news content Pending CN109241402A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810858862.6A CN109241402A (en) 2018-07-31 2018-07-31 A kind of virtual comment machine introduction method based on news content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810858862.6A CN109241402A (en) 2018-07-31 2018-07-31 A kind of virtual comment machine introduction method based on news content

Publications (1)

Publication Number Publication Date
CN109241402A true CN109241402A (en) 2019-01-18

Family

ID=65073370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810858862.6A Pending CN109241402A (en) 2018-07-31 2018-07-31 A kind of virtual comment machine introduction method based on news content

Country Status (1)

Country Link
CN (1) CN109241402A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885770A (en) * 2019-02-20 2019-06-14 杭州威佩网络科技有限公司 A kind of information recommendation method, device, electronic equipment and storage medium
CN116306514A (en) * 2023-05-22 2023-06-23 北京搜狐新媒体信息技术有限公司 Text processing method and device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102087648A (en) * 2009-12-03 2011-06-08 北京大学 Method and system for fetching news comment page
CN102279894A (en) * 2011-09-19 2011-12-14 嘉兴亿言堂信息科技有限公司 Method for searching, integrating and providing comment information based on semantics and searching system
US20120215798A1 (en) * 2011-02-18 2012-08-23 International Business Machines Corporation System and Method for a Centralized URL Commenting Service Enabling Metadata Aggregation
CN103034722A (en) * 2012-12-13 2013-04-10 合一网络技术(北京)有限公司 Network video comment gathering device and network video comment gathering method
CN106202563A (en) * 2016-08-02 2016-12-07 西南石油大学 A kind of real time correlation evental news recommends method and system
CN106951409A (en) * 2017-03-17 2017-07-14 黄淮学院 A kind of network social intercourse media viewpoint tendency analysis system and method
CN107220352A (en) * 2017-05-31 2017-09-29 北京百度网讯科技有限公司 The method and apparatus that comment collection of illustrative plates is built based on artificial intelligence
CN108153723A (en) * 2017-12-27 2018-06-12 北京百度网讯科技有限公司 Hot spot information comment generation method, device and terminal device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885770A (en) * 2019-02-20 2019-06-14 杭州威佩网络科技有限公司 A kind of information recommendation method, device, electronic equipment and storage medium
CN109885770B (en) * 2019-02-20 2022-01-07 杭州威佩网络科技有限公司 Information recommendation method and device, electronic equipment and storage medium
CN116306514A (en) * 2023-05-22 2023-06-23 北京搜狐新媒体信息技术有限公司 Text processing method and device, electronic equipment and storage medium
CN116306514B (en) * 2023-05-22 2023-09-08 北京搜狐新媒体信息技术有限公司 Text processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN101174273B (en) News event detecting method based on metadata analysis
CN108154395B (en) Big data-based customer network behavior portrait method
JP4489994B2 (en) Topic extraction apparatus, method, program, and recording medium for recording the program
Jäschke et al. Tag recommendations in folksonomies
US9430568B2 (en) Method and system for querying information
CN100462969C (en) Method for providing and inquiry information for public by interconnection network
CN103246644B (en) Method and device for processing Internet public opinion information
US9959326B2 (en) Annotating schema elements based on associating data instances with knowledge base entities
CN103020159A (en) Method and device for news presentation facing events
JP2005525657A (en) Managing expressions in database systems
CN106383887A (en) Environment-friendly news data acquisition and recommendation display method and system
CN105718585B (en) Document and label word justice correlating method and its device
CN113297457B (en) High-precision intelligent information resource pushing system and pushing method
CN111125297B (en) Massive offline text real-time recommendation method based on search engine
CN109241402A (en) A kind of virtual comment machine introduction method based on news content
CN106776640A (en) A kind of stock information information displaying method and device
US20160246794A1 (en) Method for entity-driven alerts based on disambiguated features
KR102413961B1 (en) Method for providing news analysis service using robotic process automation monitoring
Singhal et al. DataGopher: Context-based search for research datasets
CN104216901B (en) The method and system of information search
Savyanavar et al. Multi-document summarization using TF-IDF Algorithm
Saravanan et al. Extraction of Core Web Content from Web Pages using Noise Elimination.
CN114706948A (en) News processing method and device, storage medium and electronic equipment
CN112434126B (en) Information processing method, device, equipment and storage medium
CN108733687A (en) A kind of information retrieval method and system based on Text region

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20190118)