CN109241297A - Content classification and aggregation method, electronic device, storage medium and engine - Google Patents

Content classification and aggregation method, electronic device, storage medium and engine

Info

Publication number
CN109241297A
Authority
CN
China
Prior art keywords
content
article
hot word
article content
word bank
Prior art date
Legal status
Granted
Application number
CN201810744608.3A
Other languages
Chinese (zh)
Other versions
CN109241297B (en)
Inventor
李剑
陈星
Current Assignee
Guangzhou Pinwei Software Co Ltd
Original Assignee
Guangzhou Pinwei Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Pinwei Software Co Ltd
CN109241297B claims priority to CN201810744608.3A
Publication of CN109241297A
Application granted
Publication of CN109241297B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a content classification and aggregation method, comprising: when the original article content and the article content to be classified are not comment-class articles, establishing attribute tags corresponding to the original article content according to category, and mapping the attribute tags to the original article content; using a word segmenter to deconstruct the different categories of original article content and extract the high-frequency phrases of each piece of original article content, and mapping each high-frequency phrase to its attribute tag; feeding each high-frequency phrase into one of several linear models to be trained, and training them to obtain a trained linear model corresponding to each attribute tag; and using the different trained linear models to screen the article content to be classified and match it with the corresponding attribute tag. The method reduces labor costs, and because each article to be classified carries an attribute tag, articles can be presented to the user all at once, grouped by tag, which greatly improves the user experience.

Description

Content classification and aggregation method, electronic device, storage medium and engine
Technical field
The present invention relates to the field of natural language processing, and in particular to a content classification and aggregation method, an electronic device, a storage medium and an engine.
Background
Natural language processing (NLP) is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. NLP is a science that integrates linguistics, computer science and mathematics. Its aim is not to study natural language in general, but to develop computer systems, and especially software systems, that can effectively communicate in natural language.
Content-based shopping guidance exists on every platform today, and good content makes users stickier. For example, makeup albums effectively attract female users, while fitness and outdoor albums effectively attract male users. These albums can also be combined with the shopping platform's campaigns and merchandise, which on the one hand increases user stickiness and on the other hand serves as content-based shopping guidance. As the number of articles written about all kinds of products grows and the number of crawled articles surges, managing and reusing these articles has become a problem. At present, all of these articles are tagged manually, which greatly increases labor costs, and once the number of articles becomes too large, manual labor can no longer cope.
Summary of the invention
To overcome the deficiencies of the prior art, a first object of the present invention is to provide a content classification and aggregation method that solves the problem that articles are currently all tagged manually, which greatly increases labor costs and cannot keep up once the number of articles becomes too large.
A second object of the present invention is to provide an electronic device that solves the same problem of manual article tagging, with its high labor costs and inability to cope with large numbers of articles.
A third object of the present invention is to provide a computer storage medium that solves the same problem of manual article tagging, with its high labor costs and inability to cope with large numbers of articles.
A fourth object of the present invention is to provide a content classification and aggregation engine that solves the same problem of manual article tagging, with its high labor costs and inability to cope with large numbers of articles.
The first object of the present invention is achieved by the following technical solution:
A content classification and aggregation method, characterized by comprising:
Establishing article tags: obtain different categories of original article content and article content to be classified from online platforms; when the original article content and the article content to be classified are not comment-class articles, establish attribute tags corresponding to the original article content according to category, and map the attribute tags to the original article content;
Inducing high-frequency words: use a word segmenter to deconstruct the different categories of original article content, extract the high-frequency phrases corresponding to each piece of original article content, and map each high-frequency phrase to its attribute tag;
Establishing linear models: feed each high-frequency phrase into one of several linear models to be trained, and train them to obtain a trained linear model corresponding to each attribute tag;
Classifying content: use the different trained linear models to screen the article content to be classified and match it with the corresponding attribute tag.
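The high-frequency word induction step can be illustrated with a minimal Python sketch. It assumes the articles have already been segmented into token lists (the embodiment uses the IKAnalyzer and Paoding segmenters, which are not reproduced here), and it reads the embodiment's parameters as: positive keywords are the top 50 terms of the tag's own category, negative keywords are the top 3 terms of every other category. The function names are hypothetical.

```python
from collections import Counter


def top_terms(docs, n):
    """Return the n most frequent tokens across a list of token lists."""
    counts = Counter(token for doc in docs for token in doc)
    return [term for term, _ in counts.most_common(n)]


def extract_keywords(corpus, pos_n=50, neg_n=3):
    """corpus: attribute tag -> list of pre-segmented articles (token lists).

    Returns tag -> (positive keywords, negative keywords). Positive keywords
    are the tag's own top-pos_n terms; negative keywords are the top-neg_n
    terms of every other tag (an interpretation of the embodiment).
    """
    ranked = {tag: top_terms(docs, max(pos_n, neg_n)) for tag, docs in corpus.items()}
    keywords = {}
    for tag, terms in ranked.items():
        negative = []
        for other, other_terms in ranked.items():
            if other != tag:
                negative.extend(other_terms[:neg_n])
        keywords[tag] = (terms[:pos_n], negative)
    return keywords
```

Together, the positive and negative keywords form the high-frequency phrase that is later mapped to the tag and fed to that tag's linear model.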
Further, when the original article content and the article content to be classified are comment-class articles, the following steps are performed:
Establishing a hot-word bank: obtain a number of real comments from mainstream online platforms and build a hot-word bank from them;
Organizing the hot-word bank: classify the real comments in the hot-word bank by attribute to obtain a character-count attribute and a quality attribute;
Enriching the hot-word bank: use word2vec to derive a near-synonym lexicon from the hot-word bank, and use the near-synonym lexicon to iteratively substitute words in the real comments of each character-count attribute, obtaining an enriched hot-word bank;
Classifying comments: feed the hot-word bank and the article content to be classified into a greedy matching model for classification; the greedy matching model matches against the hot-word bank to assign the corresponding quality attribute.
Further, organizing the hot-word bank specifically means classifying the real comments in the hot-word bank first by character count and then by quality, the quality attribute being good, poor or medium.
Further, each high-frequency phrase includes several high-frequency words, and the step of establishing linear models is preceded by high-frequency word normalization: count the current number of occurrences of each high-frequency word in its original article, together with the maximum and minimum numbers of occurrences in the original article content; compute the weight of each high-frequency word from its current, maximum and minimum occurrence counts; and rank the high-frequency words in each high-frequency phrase by weight.
Further, the content classification step specifically comprises: feeding the article content to be classified into each of the different trained linear models, each of which outputs a relevance score; selecting the trained linear model with the highest relevance score; and selecting the corresponding attribute tag according to that trained model.
Further, the attribute tags may be women's clothing, food, digital technology, film, fresh indie style and retro style, and the original article content correspondingly comprises women's clothing articles, food articles, digital technology articles, film articles, fresh indie-style articles and retro-style articles.
The second object of the present invention is achieved by the following technical solution:
An electronic device, comprising: a processor;
a memory; and a program, wherein the program is stored in the memory and configured to be executed by the processor, and the program includes instructions for executing the content classification and aggregation method of the present invention.
The third object of the present invention is achieved by the following technical solution:
A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the content classification and aggregation method of the present invention.
The fourth object of the present invention is achieved by the following technical solution:
A content classification and aggregation engine, characterized by comprising:
An article-tag establishing module, configured to obtain different categories of original article content and article content to be classified from online platforms and, when the original article content and the article content to be classified are not comment-class articles, to establish attribute tags corresponding to the original article content according to category and map the attribute tags to the original article content;
A high-frequency word induction module, configured to use a word segmenter to deconstruct the different categories of original article content, extract the high-frequency phrases corresponding to each piece of original article content, and map each high-frequency phrase to its attribute tag;
A linear-model establishing module, configured to feed each high-frequency phrase into one of several linear models to be trained and train them to obtain a trained linear model corresponding to each attribute tag;
A content classification module, configured to use the different trained linear models to screen the article content to be classified and match it with the corresponding attribute tag.
Further, when the original article content and the article content to be classified are comment-class articles, the engine comprises:
A hot-word bank establishing module, configured to obtain a number of real comments from mainstream online platforms and build a hot-word bank from them;
A hot-word bank organizing module, configured to classify the real comments in the hot-word bank by attribute to obtain a character-count attribute and a quality attribute;
A hot-word bank enriching module, configured to use word2vec to derive a near-synonym lexicon from the hot-word bank and use it to iteratively substitute words in the real comments of each character-count attribute, obtaining an enriched hot-word bank;
A comment classification module, configured to feed the hot-word bank and the article content to be classified into a greedy matching model for classification, the greedy matching model matching against the hot-word bank to assign the corresponding quality attribute.
Compared with the prior art, the beneficial effects of the present invention are as follows. The content classification and aggregation method first classifies the original article content and establishes the corresponding attribute tags; uses a word segmenter to deconstruct the different categories of original article content and extract the high-frequency phrases of each piece; maps the high-frequency phrases to the attribute tags; feeds the high-frequency phrases into linear models to obtain a trained linear model for each attribute tag; and then uses the trained linear models to screen the article content to be classified and match it with the corresponding attribute tag. In other words, the article content to be classified is associated with attribute tags and then classified and aggregated according to that association. This classification requires no manual intervention: articles to be classified are classified automatically, classification precision improves and labor costs fall, and because each article to be classified carries an attribute tag, articles can be presented to the user all at once, grouped by tag, which greatly improves the user experience.
The above is only an overview of the technical solution of the present invention. To make its technical means better understood so that it can be implemented according to the specification, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Description of the drawings
The drawings described here are provided for a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their description are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the content aggregation method of the present invention;
Fig. 2 is a module block diagram of the content aggregation engine of the present invention;
Fig. 3 is an operational schematic of the content aggregation engine of the present invention in the working state;
Fig. 4 is a first schematic of the display interface of the content aggregation engine of the present invention in the working state;
Fig. 5 is a second schematic of the display interface of the content aggregation engine of the present invention in the working state.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and specific embodiments. It should be noted that, provided they do not conflict, the embodiments described below, or their individual technical features, can be combined arbitrarily to form new embodiments.
As shown in Fig. 1, the content classification and aggregation method of the present invention comprises the following steps:
Establishing article tags: obtain different categories of original article content and article content to be classified from online platforms; when the original article content and the article content to be classified are not comment-class articles, establish attribute tags corresponding to the original article content according to category, and map the attribute tags to the original article content. In this embodiment, a web crawler collects different categories of original article content from various online platforms. By category, the content can be divided into women's clothing, food, digital technology, film, fresh indie style, retro style and so on, and the original article content can also be divided into comment-class and non-comment-class articles. When the original article content and the article content to be classified are not comment-class articles, and the original article content is user-generated or produced by the platform's professional staff, attribute tags are first established by category (women's clothing, food, digital technology, film, fresh indie style, retro style, etc.), and each attribute tag is mapped to the corresponding pieces of original article content, i.e. all original article content is classified by attribute tag. In this embodiment, each attribute tag corresponds to at least 1,000 pieces of original article content.
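The tag-to-article mapping is essentially an inverted index from attribute tag to article ids. The minimal Python sketch below assumes each crawled article arrives as a hypothetical `(article_id, tag)` pair and checks the embodiment's minimum of 1,000 original articles per tag; it is an illustration, not the patent's data model.

```python
from collections import defaultdict

# Embodiment: each attribute tag should map to at least 1,000 original articles.
MIN_ARTICLES_PER_TAG = 1000


def build_tag_index(labelled_articles):
    """Map each attribute tag to the ids of the original articles it labels.

    labelled_articles: iterable of (article_id, attribute_tag) pairs.
    """
    index = defaultdict(list)
    for article_id, tag in labelled_articles:
        index[tag].append(article_id)
    return dict(index)


def undersized_tags(index, minimum=MIN_ARTICLES_PER_TAG):
    """Return, sorted, the tags that do not yet have enough original articles."""
    return sorted(tag for tag, ids in index.items() if len(ids) < minimum)
```

Tags reported by `undersized_tags` would need more crawled articles before their linear model is trained.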
Inducing high-frequency words: use a word segmenter to deconstruct the different categories of original article content, extract the high-frequency phrases corresponding to each piece of original article content, and map each high-frequency phrase to its attribute tag. In this embodiment, the IKAnalyzer and Paoding segmenters are combined to extract high-frequency words from the different categories of original article content. First, the positive and negative keywords of each piece of original article content are ranked (the positive keywords are usually the top 50 high-frequency words of the article's own category, and the negative keywords can be the top 3 or top 5 high-frequency words of articles in other categories); these positive and negative keywords are the high-frequency words. Each high-frequency word is then normalized: count its current number of occurrences a in the corresponding original article, together with the maximum number of occurrences maxHot and the minimum number of occurrences minHot in the original article content; compute the weight of the high-frequency word from these three counts; and rank the high-frequency words in each high-frequency phrase by weight, as given by formula (1):
Weight = (a - minHot) / (maxHot - minHot)    (1)
where a is the current number of occurrences, maxHot is the maximum number of occurrences and minHot is the minimum number of occurrences.
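Formula (1) is a min-max normalization, so every weight falls in [0, 1]. A minimal Python sketch follows; the dictionary input and the function name are hypothetical, and because formula (1) is undefined when maxHot equals minHot, the sketch assumes a weight of 1.0 in that case.

```python
def normalize_weights(counts):
    """Apply formula (1) to term counts and rank terms by weight.

    counts: high-frequency word -> current number of occurrences a.
    Returns (term, weight) pairs sorted from highest to lowest weight.
    """
    max_hot = max(counts.values())
    min_hot = min(counts.values())
    span = max_hot - min_hot
    weights = {
        # Assumption: if all counts are equal, formula (1) divides by zero,
        # so every term gets weight 1.0.
        term: (a - min_hot) / span if span else 1.0
        for term, a in counts.items()
    }
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
```

For counts of 10, 4 and 2, the weights are 1.0, 0.25 and 0.0.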
Establishing linear models: feed each high-frequency phrase into one of several linear models to be trained, and train them to obtain a trained linear model corresponding to each attribute tag. In this embodiment, multiple trained linear models can be built according to the categories of high-frequency words, and the model weights are made to converge using the sigmoid function.
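The embodiment specifies only linear models whose weights converge via the sigmoid function. One plausible reading is one-vs-rest logistic regression, with one model per attribute tag. The Python sketch below trains such models with plain stochastic gradient descent over binary bag-of-words features built from the high-frequency vocabulary. The feature encoding, learning rate and epoch count are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def features(tokens, vocab):
    """Binary bag-of-words vector over the high-frequency vocabulary."""
    present = set(tokens)
    return [1.0 if term in present else 0.0 for term in vocab]


def train_one_vs_rest(samples, vocab, epochs=200, lr=0.5):
    """samples: list of (tokens, tag). Returns tag -> (weights, bias)."""
    tags = sorted({tag for _, tag in samples})
    data = [(features(tokens, vocab), tag) for tokens, tag in samples]
    models = {}
    for tag in tags:
        w, b = [0.0] * len(vocab), 0.0
        for _ in range(epochs):
            for x, label in data:
                y = 1.0 if label == tag else 0.0
                p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                err = p - y  # gradient of the log-loss w.r.t. the logit
                w = [wi - lr * err * xi for wi, xi in zip(w, x)]
                b -= lr * err
        models[tag] = (w, b)
    return models


def score(model, x):
    """Sigmoid output of one trained model for feature vector x."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

A library implementation of logistic regression would normally replace the hand-written loop. It is spelled out here only to show where the sigmoid enters.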
Classifying content: use the different trained linear models to screen the article content to be classified and match it with the corresponding attribute tag. The article content to be classified is fed into each of the different trained linear models; each trained linear model outputs a relevance score; the trained linear model with the highest relevance score is selected, and the corresponding attribute tag is selected according to that model. The higher the relevance score, the closer the article is to the corresponding attribute tag, so each article to be classified is assigned according to its highest relevance score, giving accurate and reasonable classification and aggregation of different articles.
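The selection rule is an argmax over the per-tag model outputs. In the minimal Python sketch below, each trained model is represented as a hypothetical `(weights, bias)` pair and the relevance score is taken to be the model's sigmoid output, which is an assumption about what the patent's score is.

```python
import math


def relevance(weights, bias, x):
    """Sigmoid output of one trained linear model for feature vector x."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def classify(models, x):
    """Pick the attribute tag whose trained model scores x highest.

    models: tag -> (weights, bias). Returns (best tag, tag -> score).
    """
    scores = {tag: relevance(w, b, x) for tag, (w, b) in models.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Returning the full score map as well makes it easy to inspect near-ties between tags.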
In this embodiment, when the original article content and the article content to be classified are comment-class articles, the comments themselves are classified and aggregated, and the following steps are performed:
Establishing a hot-word bank: obtain a number of real comments from mainstream online platforms and build a hot-word bank from them. In this embodiment, 900,000 real online comments were collected to build the hot-word bank.
Organizing the hot-word bank: classify the real comments in the hot-word bank by attribute to obtain a character-count attribute and a quality attribute. The real comments are classified first by character count and then by quality, the quality attribute being good, poor or medium. In this embodiment, the comments are first divided by character count into five classes (1-, 2-, 3-, 4- and 5-character comments), and each class is then divided by quality into good, poor and medium comments, so that the comments in the hot-word bank are organized by both quality and character count.
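The resulting organization is a two-level bucketing by character count and quality. The minimal Python sketch below assumes each comment already carries a quality label (the patent does not say where the label comes from), keeps only comments of 1 to 5 characters as in the embodiment, and uses hypothetical names throughout.

```python
from collections import defaultdict

QUALITIES = ("good", "medium", "poor")


def organize_hot_words(comments, max_len=5):
    """Bucket labelled comments by (character count, quality).

    comments: iterable of (text, quality) pairs.
    Comments longer than max_len characters are skipped.
    """
    bank = defaultdict(list)
    for text, quality in comments:
        if quality not in QUALITIES:
            raise ValueError(f"unknown quality: {quality}")
        n = len(text)  # one Chinese character counts as one
        if 1 <= n <= max_len:
            bank[(n, quality)].append(text)
    return dict(bank)
```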
Enriching the hot-word bank: use word2vec to derive a near-synonym lexicon from the hot-word bank, and use it to iteratively substitute words in the real comments of each character-count attribute, obtaining an enriched hot-word bank. Using word2vec, words in the real 1-, 2-, 3-, 4- and 5-character comments are substituted iteratively to enrich the hot-word bank.
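The enrichment step can be sketched in Python as follows, under stated assumptions. In practice the word vectors would come from a word2vec model trained on the comments (for example with gensim, whose `KeyedVectors.most_similar` returns nearest neighbours). The toy version below takes a plain vector dictionary, treats words whose cosine similarity reaches a hypothetical threshold as near-synonyms, and repeats the substitution for a fixed number of rounds.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def near_synonyms(vectors, threshold=0.9):
    """word -> other words whose cosine similarity is at least threshold."""
    words = list(vectors)
    return {
        w: [o for o in words if o != w and cosine(vectors[w], vectors[o]) >= threshold]
        for w in words
    }


def enrich(comments, synonyms, rounds=2):
    """Iteratively substitute near-synonyms inside comments to grow the bank."""
    bank = set(comments)
    frontier = set(comments)
    for _ in range(rounds):
        new = set()
        for text in frontier:
            for word, alternatives in synonyms.items():
                if word in text:  # substring test, which suits unspaced Chinese
                    for alt in alternatives:
                        new.add(text.replace(word, alt))
        frontier = new - bank  # only expand comments not seen before
        bank |= frontier
    return bank
```

The threshold and the number of rounds together control how far the bank drifts from the real comments.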
Classifying comments: feed the hot-word bank and the article content to be classified into a greedy matching model, which matches against the hot-word bank to assign the corresponding quality attribute. The enriched hot-word bank is imported into the greedy matching model, which classifies and aggregates the article content to be classified according to a greedy matching strategy; in this embodiment, the strategy has a strict mode and a loose mode. Finally, as shown in Fig. 5, all comments are presented to the user by quality attribute, i.e. good comments are shown together on the same page.
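The patent does not define the greedy matching model or its strict and loose modes. The Python sketch below is one reading, built on forward maximum matching, a common greedy technique for Chinese text. At each position it takes the longest hot-word bank phrase that starts there. In the assumed strict mode every character must be covered by some phrase; in the assumed loose mode unmatched characters are skipped. The comment's quality is the most common quality among its matched phrases. All names are hypothetical.

```python
from collections import Counter


def greedy_match(text, bank, strict=False):
    """Forward maximum matching of text against bank (phrase -> quality).

    Returns the most common quality among matched phrases, or None if nothing
    matches (or, in strict mode, if any character is left uncovered).
    """
    max_len = max(map(len, bank), default=0)
    i, hits = 0, []
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in bank:
                hits.append(bank[piece])
                i += size
                break
        else:  # no phrase starts at position i
            if strict:
                return None
            i += 1
    if not hits:
        return None
    return Counter(hits).most_common(1)[0][0]
```

Usage: with `bank = {"很好": "good", "很好用": "good", "太差": "poor"}`, the comment "物流太差" returns "poor" in loose mode but `None` in strict mode, because "物流" is not in the bank.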
The present invention also provides an electronic device, comprising: a processor;
a memory; and a program, wherein the program is stored in the memory and configured to be executed by the processor, and the program includes instructions for executing the content classification and aggregation method of the present invention.
The present invention also provides a computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the content classification and aggregation method of the present invention.
As shown in Fig. 2, the present invention provides a content classification and aggregation engine, comprising: an article-tag establishing module, configured to obtain different categories of original article content and article content to be classified from online platforms and, when the original article content and the article content to be classified are not comment-class articles, to establish attribute tags corresponding to the original article content according to category and map the attribute tags to the original article content;
A high-frequency word induction module, configured to use a word segmenter to deconstruct the different categories of original article content, extract the high-frequency phrases corresponding to each piece of original article content, and map each high-frequency phrase to its attribute tag;
A linear-model establishing module, configured to feed each high-frequency phrase into one of several linear models to be trained and train them to obtain a trained linear model corresponding to each attribute tag;
A content classification module, configured to use the different trained linear models to screen the article content to be classified and match it with the corresponding attribute tag. Fig. 5 shows the display interface after the articles to be classified have been classified and aggregated: the article content is divided into the attribute tags Makeup, Apparel, Home and Mother & Baby, and similar articles are displayed under each attribute tag.
Further, when the original article content and the article content to be classified are comment-class articles, the engine comprises:
A hot-word bank establishing module, configured to obtain a number of real comments from mainstream online platforms and build a hot-word bank from them;
A hot-word bank organizing module, configured to classify the real comments in the hot-word bank by attribute to obtain a character-count attribute and a quality attribute;
A hot-word bank enriching module, configured to use word2vec to derive a near-synonym lexicon from the hot-word bank and use it to iteratively substitute words in the real comments of each character-count attribute, obtaining an enriched hot-word bank;
A comment classification module, configured to feed the hot-word bank and the comments into a greedy matching model for classification, the greedy matching model matching against the hot-word bank to assign the corresponding quality attribute. Finally, as shown in Fig. 4, all comments are presented to the user by quality attribute, i.e. good comments are shown together on the same page.
The content classification and aggregation engine in this embodiment is a natural-language-processing engine. As shown in Fig. 3, when the engine is applied, image-text information and other shared data are first cached, and the engine then classifies and aggregates the shared data. Staff configure the service list of the content service platform through the content management system. The content service platform places the classified and aggregated shared data into the service list according to the configured list, and publishes it through a unified external interface to windows such as Discover, Smart, Live, Activities and sub-channels.
The content classification and aggregation method of the present invention first classifies the original article content and establishes the corresponding attribute tags; uses a word segmenter to deconstruct the different categories of original article content and extract the high-frequency phrases of each piece; maps the high-frequency phrases to the attribute tags; feeds the high-frequency phrases into linear models to obtain a trained linear model for each attribute tag; and then uses the trained linear models to screen the article content to be classified and match it with the corresponding attribute tag. In other words, the article content to be classified is associated with attribute tags and then classified and aggregated according to that association. This classification requires no manual intervention: articles to be classified are classified automatically, classification precision improves and labor costs fall, and because each article to be classified carries an attribute tag, articles can be presented to the user all at once, grouped by tag, which greatly improves the user experience.
The above are only preferred embodiments of the present invention and do not limit the invention in any form. Any person of ordinary skill in the art can implement the invention smoothly as shown in the drawings and described above. However, equivalent variations involving minor changes, modifications and evolutions made by those skilled in the art using the technical content disclosed above, without departing from the scope of the technical solution of the present invention, are all equivalent embodiments of the invention. Likewise, any equivalent changes, modifications and evolutions of the above embodiments made according to the essential technology of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (10)

1. A content classification and aggregation method, characterized by comprising:
Establishing article tags: obtaining different types of original article content and article content under test from an online platform; when the original article content and the article content under test are not review-type articles, establishing attribute tags corresponding to the original article content according to the different types, and establishing mapping relations between the attribute tags and the original article content;
Summarizing high-frequency words: using a word segmenter to deconstruct the different types of original article content, extracting the high-frequency phrase corresponding to each original article content, and establishing mapping relations between each high-frequency phrase and the attribute tags;
Establishing linear models: inputting each high-frequency phrase separately into several linear models to be trained, and training them to obtain trained linear models corresponding to the attribute tags;
Classifying content: screening the article content under test with the different trained linear models and matching it to the corresponding attribute tag.
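Claim 1 does not specify the segmenter, how many words form a "high-frequency phrase", or how the linear models are trained. The sketch below fills those gaps with stated assumptions: whitespace splitting stands in for a real (Chinese) word segmenter, the k most frequent tokens of an article form its high-frequency phrase, and a one-vs-rest perceptron (a swapped-in technique, not necessarily the patent's) trains one linear model per attribute tag. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Model = Dict[str, float]  # hypothetical representation: word -> weight


def segment(text: str) -> List[str]:
    # Placeholder segmenter: lower-case and split on whitespace.
    # Chinese text would need a real word segmenter instead.
    return text.lower().split()


def high_frequency_phrase(text: str, k: int = 3) -> List[str]:
    # The k most frequent tokens of one article form its high-frequency phrase.
    return [word for word, _ in Counter(segment(text)).most_common(k)]


def score(model: Model, phrase: List[str]) -> float:
    # Linear score over binary bag-of-words features.
    return sum(model.get(word, 0.0) for word in set(phrase))


def train_linear_models(labelled: List[Tuple[str, str]], k: int = 3,
                        epochs: int = 10) -> Dict[str, Model]:
    # Map every original article's high-frequency phrase to its attribute tag.
    samples = [(high_frequency_phrase(text, k), tag) for text, tag in labelled]
    models: Dict[str, Model] = {tag: {} for _, tag in samples}
    # One-vs-rest perceptron: one linear model per attribute tag.
    for _ in range(epochs):
        for phrase, gold in samples:
            for tag, model in models.items():
                target = 1.0 if tag == gold else -1.0
                if target * score(model, phrase) <= 0:  # wrong side or on the boundary
                    for word in set(phrase):
                        model[word] = model.get(word, 0.0) + target
    return models
```

For example, training on `("dress skirt dress fashion", "womenswear")`, `("noodle soup noodle recipe", "food")` and `("camera phone camera review", "digital")` gives the womenswear model positive weights for "dress", "skirt" and "fashion" and the other two models negative weights for them.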
2. The content classification and aggregation method according to claim 1, characterized in that, when the original article content and the article content under test are review-type articles, the following steps are performed:
Establishing a hot word bank: obtaining a number of genuine reviews from mainstream online platforms, and establishing a hot word bank from the genuine reviews;
Arranging the hot word bank: performing attribute classification on the genuine reviews in the hot word bank to obtain a word-count attribute and a quality attribute;
Enriching the hot word bank: deriving a near-synonym library from the hot word bank using word2vec, and using the near-synonym library to progressively substitute words in the genuine reviews of the different word-count attributes, obtaining an enriched hot word bank;
Classifying reviews: inputting the hot word bank and the article content under test into a greedy matching model for classification, the greedy matching model matching the article content under test against the hot word bank to obtain the corresponding quality attribute.
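The enrichment step can be sketched as follows, under assumptions the claim leaves open. Word vectors are taken as already available (in practice they would come from a word2vec model; here they are a plain dictionary), two words count as near synonyms when their cosine similarity reaches a hypothetical `threshold`, and "progressive alternate" is read as repeatedly swapping one word at a time for a near synonym, bounded by a hypothetical `rounds` parameter. The sketch ignores the word-count buckets for brevity.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def near_synonym_library(vectors: Dict[str, Vector],
                         threshold: float = 0.9) -> Dict[str, List[str]]:
    # Words whose embeddings are at least `threshold` similar count as near synonyms.
    return {
        word: [other for other, vec in vectors.items()
               if other != word and cosine(vectors[word], vec) >= threshold]
        for word in vectors
    }


def enrich_hot_word_bank(reviews: List[str], library: Dict[str, List[str]],
                         rounds: int = 2) -> List[str]:
    # Progressive substitution: each round swaps one word of each newly added
    # review for each of its near synonyms; `rounds` bounds the growth.
    bank = list(dict.fromkeys(reviews))
    seen = set(bank)
    frontier = list(bank)
    for _ in range(rounds):
        added = []
        for review in frontier:
            words = review.split()
            for i, word in enumerate(words):
                for synonym in library.get(word, []):
                    variant = " ".join(words[:i] + [synonym] + words[i + 1:])
                    if variant not in seen:
                        seen.add(variant)
                        added.append(variant)
        bank.extend(added)
        frontier = added
    return bank
```

With toy vectors in which "great" and "excellent" point in almost the same direction, the review "great fit" is enriched with the variant "excellent fit".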
3. The content classification and aggregation method according to claim 2, characterized in that arranging the hot word bank specifically comprises classifying the genuine reviews in the hot word bank by word count and by quality, the quality attribute being a good review, a poor review, or a medium review.
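The patent names a "greedy matching model" without defining it. One plausible reading, given the word-count grouping of claim 3, is greedy longest-match: scan the review under test left to right, at each position take the longest hot-bank entry that matches, and let the most frequent quality attribute among the matches win. The sketch below implements that reading; the function names and the majority vote are assumptions, and the quality labels follow claim 3 (good, poor, medium).

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# word count -> {review phrase: quality attribute}
Arranged = Dict[int, Dict[str, str]]


def arrange_hot_word_bank(bank: List[Tuple[str, str]]) -> Arranged:
    # Group (review, quality) pairs by word count, as in claim 3.
    arranged: Arranged = {}
    for review, quality in bank:
        words = review.lower().split()
        arranged.setdefault(len(words), {})[" ".join(words)] = quality
    return arranged


def greedy_match(text: str, arranged: Arranged) -> Optional[str]:
    # Scan left to right, always taking the longest hot-bank entry that
    # matches at the current position; the majority quality wins.
    words = text.lower().split()
    lengths = sorted(arranged, reverse=True)
    votes: Counter = Counter()
    i = 0
    while i < len(words):
        for n in lengths:
            chunk = " ".join(words[i:i + n])
            if len(words) - i >= n and chunk in arranged[n]:
                votes[arranged[n][chunk]] += 1
                i += n
                break
        else:  # no entry starts at this word; move on by one
            i += 1
    return votes.most_common(1)[0][0] if votes else None
```

Trying longer entries first means "fits well and looks great" is consumed as one match instead of stopping at the shorter "fits well"; text that matches nothing returns `None`.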
4. The content classification and aggregation method according to claim 1, characterized in that each high-frequency phrase comprises several high-frequency words, and establishing the linear models is preceded by normalizing the high-frequency words: counting, for each high-frequency word, its current number of occurrences in the corresponding original article and its maximum and minimum numbers of occurrences across the original article content; calculating the weight of the high-frequency word from the current, maximum and minimum numbers of occurrences; and sorting the high-frequency words in each high-frequency phrase by weight.
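Claim 4 says the weight is computed from the current, maximum and minimum occurrence counts but does not give the formula. A natural candidate, assumed here, is min-max normalization: weight = (current - minimum) / (maximum - minimum), with a hypothetical weight of 1.0 when the maximum equals the minimum. The sketch computes these weights for one article's high-frequency words and sorts them in descending order.

```python
from typing import Dict, List, Tuple


def minmax_weight(current: int, most: int, least: int) -> float:
    # Assumed formula: (current - least) / (most - least).
    if most == least:  # the word occurs equally often in every article
        return 1.0
    return (current - least) / (most - least)


def rank_high_frequency_words(article: Dict[str, int],
                              corpus: List[Dict[str, int]]) -> List[Tuple[str, float]]:
    # article: word -> occurrences in this original article
    # corpus: the same kind of counts for every original article
    ranked = []
    for word, current in article.items():
        counts = [doc.get(word, 0) for doc in corpus]
        ranked.append((word, minmax_weight(current, max(counts), min(counts))))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

For instance, if "dress" occurs 4, 2 and 0 times in three original articles, its weight in the second article is (2 - 0) / (4 - 0) = 0.5.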
5. The content classification and aggregation method according to claim 1, characterized in that classifying the content specifically comprises: inputting the article content under test into each of the different trained linear models, each trained linear model outputting a corresponding relevance value; selecting the trained linear model with the maximum relevance value; and selecting the corresponding attribute tag according to that trained model.
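The selection rule of claim 5 is an argmax over the relevance values produced by the per-tag models. The sketch below assumes each trained model is a simple linear scorer (a bias plus per-word weights) over the segmented words of the article under test; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrainedLinearModel:
    tag: str  # attribute tag this model was trained for
    weights: Dict[str, float] = field(default_factory=dict)
    bias: float = 0.0

    def relevance(self, words: List[str]) -> float:
        # Linear score: bias plus the weight of every word in the article.
        return self.bias + sum(self.weights.get(w, 0.0) for w in words)


def classify(words: List[str],
             models: List[TrainedLinearModel]) -> Tuple[str, Dict[str, float]]:
    # Feed the article under test to every model and keep the largest relevance value.
    scores = {m.tag: m.relevance(words) for m in models}
    best = max(models, key=lambda m: scores[m.tag])
    return best.tag, scores
```

Returning the full score dictionary alongside the winning tag makes it easy to inspect how close the runner-up models were.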
6. The content classification and aggregation method according to claim 1, characterized in that the attribute tags may be women's clothing, food, digital technology, film, fresh-and-simple style, and retro trend, and the original article content is correspondingly women's clothing articles, food articles, digital technology articles, film articles, fresh-and-simple-style articles, and retro-trend articles.
7. An electronic device, characterized by comprising: a processor;
a memory; and a program, wherein the program is stored in the memory and configured to be executed by the processor, the program comprising instructions for performing the method according to any one of claims 1-6.
8. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, performs the method according to any one of claims 1 to 6.
9. A content classification and aggregation engine, characterized by comprising:
Article tag establishing module, configured to obtain different types of original article content and article content under test from an online platform and, when the original article content and the article content under test are not review-type articles, to establish attribute tags corresponding to the original article content according to the different types and establish mapping relations between the attribute tags and the original article content;
High-frequency word summarizing module, configured to use a word segmenter to deconstruct the different types of original article content, extract the high-frequency phrase corresponding to each original article content, and establish mapping relations between each high-frequency phrase and the attribute tags;
Linear model establishing module, configured to input each high-frequency phrase separately into several linear models to be trained and to train them to obtain trained linear models corresponding to the attribute tags;
Content classification module, configured to screen the article content under test with the different trained linear models and match it to the corresponding attribute tag.
10. The content classification and aggregation engine according to claim 9, characterized by comprising, when the original article content and the article content under test are review-type articles:
Hot word bank establishing module, configured to obtain a number of genuine reviews from mainstream online platforms and establish a hot word bank from the genuine reviews;
Hot word bank arranging module, configured to perform attribute classification on the genuine reviews in the hot word bank to obtain a word-count attribute and a quality attribute;
Hot word bank enriching module, configured to derive a near-synonym library from the hot word bank using word2vec and to use the near-synonym library to progressively substitute words in the genuine reviews of the different word-count attributes, obtaining an enriched hot word bank;
Review classification module, configured to input the hot word bank and the article content under test into a greedy matching model for classification, the greedy matching model matching the article content under test against the hot word bank to obtain the corresponding quality attribute.
CN201810744608.3A 2018-07-09 2018-07-09 Content classification and aggregation method, electronic equipment, storage medium and engine Active CN109241297B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810744608.3A CN109241297B (en) 2018-07-09 2018-07-09 Content classification and aggregation method, electronic equipment, storage medium and engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810744608.3A CN109241297B (en) 2018-07-09 2018-07-09 Content classification and aggregation method, electronic equipment, storage medium and engine

Publications (2)

Publication Number Publication Date
CN109241297A (en) 2019-01-18
CN109241297B CN109241297B (en) 2022-04-19

Family

ID=65071818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810744608.3A Active CN109241297B (en) 2018-07-09 2018-07-09 Content classification and aggregation method, electronic equipment, storage medium and engine

Country Status (1)

Country Link
CN (1) CN109241297B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207913A (en) * 2013-04-15 2013-07-17 武汉理工大学 Method and system for acquiring commodity fine-grained semantic relation
CN104331394A (en) * 2014-08-29 2015-02-04 南通大学 Text classification method based on viewpoint
CN105740389A (en) * 2016-01-27 2016-07-06 上海晶赞科技发展有限公司 Classification method and device
US20180060302A1 (en) * 2016-08-24 2018-03-01 Microsoft Technology Licensing, Llc Characteristic-pattern analysis of text

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020729A (en) * 2019-03-05 2019-07-16 中国联合网络通信集团有限公司 Article reviewing method and device based on artificial intelligence
CN110413759A (en) * 2019-07-31 2019-11-05 杭州凡闻科技有限公司 Multi-platform user interaction data analysis method and system for self-media
CN110955816A (en) * 2019-11-08 2020-04-03 广州坚和网络科技有限公司 Method for aggregating subject content based on content label
CN111177369A (en) * 2019-11-19 2020-05-19 厦门二五八网络科技集团股份有限公司 Method and device for automatically classifying labels of articles
CN111159347A (en) * 2019-12-30 2020-05-15 掌阅科技股份有限公司 Article content quality data calculation method, calculation device and storage medium
CN111159347B (en) * 2019-12-30 2023-03-21 掌阅科技股份有限公司 Article content quality data calculation method, calculation device and storage medium
CN112131346A (en) * 2020-09-25 2020-12-25 北京达佳互联信息技术有限公司 Comment aggregation method and device, storage medium and electronic equipment
CN112131346B (en) * 2020-09-25 2024-04-30 北京达佳互联信息技术有限公司 Comment aggregation method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN109241297B (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN109241297A (en) Content classification and aggregation method, electronic equipment, storage medium and engine
CN103745210B (en) Leukocyte classification method and device
CN109145097A (en) Judgment document classification method based on information extraction
CN105930347B (en) Text analysis based power outage cause recognition system
CN107038480A (en) Text sentiment classification method based on convolutional neural networks
CN110277165A (en) Aided diagnosis method, device, equipment and storage medium based on graph neural network
CN106960214A (en) Object identification method based on image
CN105469376B (en) Method and apparatus for determining picture similarity
CN107705066A (en) Information input method and electronic equipment for commodity warehousing
WO2021043140A1 Method, apparatus and system for determining label
CN106649050B (en) Sequential system multi-parameter operation situation graphical representation method
CN108664538A (en) Automatic identification method and system for suspected familial defects of power transmission and transformation equipment
CN106709528A (en) Vehicle re-identification method and device based on multi-objective-function deep learning
CN107885849A (en) Moos index analysis system based on text classification
CN109766935A (en) Semi-supervised classification method based on hypergraph p-Laplacian graph convolutional neural networks
CN108897778A (en) Image labeling method based on multi-source big data analysis
CN110096519A (en) Optimization method and device for big data classification rules
CN108427713A (en) Video summarization method and system for homemade videos
CN106778834A (en) AP clustering image labeling method based on distance metric learning
CN108304479A (en) Fast density clustering double-layer network recommendation method based on graph structure filtering
CN110377659A (en) Intelligent chart recommendation system and method
CN108319518A (en) File fragment classification method and device based on recurrent neural network
CN110458600A (en) Portrait model training method, device, computer equipment and storage medium
CN110414626A (en) Pig breed classification method, apparatus and computer readable storage medium
CN107305640A (en) Method for classifying unbalanced data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant