CN104915399A - Recommended data processing method based on news headline and recommended data processing method system based on news headline - Google Patents

Info

Publication number
CN104915399A
Authority
CN
China
Prior art keywords
text fragments
eigenvalue
entity
rationale
headline
Prior art date
Application number
CN201510290279.6A
Other languages
Chinese (zh)
Inventor
罗剑波
张俊彬
蔡勋梁
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司 filed Critical 百度在线网络技术(北京)有限公司
Priority to CN201510290279.6A priority Critical patent/CN104915399A/en
Publication of CN104915399A publication Critical patent/CN104915399A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Abstract

The invention discloses a recommendation data processing method based on news headlines. The method comprises the following steps: identifying, from web pages, news headlines related to an entity pair; calculating a keyword set of the entity pair; extracting text fragments from the news headlines to obtain a text fragment set with time information, and extracting first characteristic values of all the text fragments in the set; calculating semantic vectors of all the text fragments in the set, and extracting second characteristic values of the text fragments according to the semantic vectors; and fitting the first characteristic values and the second characteristic values according to user click data to obtain a recommendation reason ranking. The method can solve the problem that the recommendation reasons of intelligent web page recommendation systems in the prior art lack interest, and makes the recommendation reasons both accurate and attractive.

Description

Recommendation data processing method and system based on news headlines

Technical Field

The present invention relates to the field of computer networks and, in particular, to a recommendation data processing method and system based on news headlines.

Background

With the rapid development of online information, people commonly use the network to read all kinds of news and information. As Internet news has evolved, the recommendation reason has become an indispensable component of a mature commercial recommendation system: it states the recommendation logic objectively and accurately.

Letting users perceive the intelligence of the recommendation system is significant for improving user experience. At present, recommendation reasons are mainly generated from predefined templates; limited by the richness of the templates, they lack diversity of expression. In entertainment recommendation scenarios such as recommending entertainment celebrities, the reasons are currently limited to stock phrases such as "related people", "guess you like", and "others are also searching", which are at odds with the entertainment-first spirit of the scenario and struggle to win users' favor.

To solve the problem that the recommendation reasons of intelligent web page recommendation systems in the prior art lack interest, and to make recommendation reasons both accurate and attractive, a new recommendation data processing method and system is urgently needed.

Summary of the Invention

To solve the problem that the recommendation reasons of intelligent web page recommendation systems in the prior art lack interest, embodiments of the present invention provide a recommendation data processing method and system based on news headlines.

In one aspect, an embodiment of the present invention provides a recommendation data processing method based on news headlines, the method comprising:

identifying, from web pages, news headlines related to an entity pair;

calculating a keyword set of the entity pair;

extracting text fragments from the news headlines to obtain a text fragment set with time information, and extracting a first characteristic value of each text fragment in the text fragment set;

calculating a semantic vector of each text fragment in the text fragment set, and extracting a second characteristic value of each text fragment according to the semantic vector; and

fitting the first characteristic values and the second characteristic values according to user click data to obtain a recommendation reason ranking.

Correspondingly, an embodiment of the present invention also provides a recommendation data processing system based on news headlines, the system comprising:

a headline identification module, configured to identify, from web pages, news headlines related to an entity pair;

a keyword calculation module, configured to calculate a keyword set of the entity pair;

a text fragment extraction module, configured to extract text fragments from the news headlines to obtain a text fragment set with time information, and to extract a first characteristic value of each text fragment in the text fragment set;

a characteristic value calculation module, configured to calculate a semantic vector of each text fragment in the text fragment set, and to extract a second characteristic value of each text fragment according to the semantic vector; and

a screening module, configured to fit the first characteristic values and the second characteristic values according to user click data to obtain a recommendation reason ranking.

Implementing the various embodiments of the present invention has the following beneficial effect: network information that is more interesting and attractive can be recommended to users both accurately and intelligently.

Brief Description of the Drawings

Fig. 1 is a flowchart of the recommendation data processing method based on news headlines according to an embodiment of the present invention;

Fig. 2 shows a detailed flowchart of step S5 of the method shown in Fig. 1;

Fig. 3 is an architecture diagram of the recommendation data processing system based on news headlines according to an embodiment of the present invention;

Fig. 4 shows a block diagram of the screening module 500 shown in Fig. 3.

Detailed Description

Various aspects of the present invention are described in detail below with reference to the drawings and specific embodiments. Well-known modules, units, and their interconnections, links, communications, or operations are not shown or described in detail. Furthermore, the described features, architectures, or functions may be combined in any manner in one or more embodiments. Those skilled in the art will understand that the following embodiments are illustrative only and do not limit the scope of the invention. It will also be readily understood that the modules, units, or processing methods in the embodiments described herein and shown in the drawings may be combined and designed in various different configurations.

Fig. 1 is a flowchart of the recommendation data processing method based on news headlines according to an embodiment of the present invention. Referring to Fig. 1, the method comprises the following steps:

S1, identifying, from web pages, news headlines related to an entity pair;

S2, calculating a keyword set of the entity pair;

S3, extracting text fragments from the news headlines to obtain a text fragment set with time information, and extracting a first characteristic value of each text fragment in the text fragment set;

S4, calculating a semantic vector of each text fragment in the text fragment set, and extracting a second characteristic value of each text fragment according to the semantic vector;

S5, fitting the first characteristic values and the second characteristic values according to user click data to obtain a recommendation reason ranking.

In an embodiment of the present invention, the recommendation data processing method based on news headlines may proceed as follows. Step S1 is performed: news headlines related to an entity pair are identified from web pages. Between step S1 and step S2, the method may further include detecting the time interval in which news about the entity pair bursts. A Gaussian outlier detection model may be used to detect this interval. For example, the total news volume for a given celebrity can be measured over period A; if the celebrity's news volume increases abnormally in period B, then period B is the celebrity's news burst period. Detecting the burst interval locates the period in which news related to the entity pair is concentrated, which narrows the query range for recommendation reason data and improves query efficiency.
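The burst detection described above can be illustrated with a minimal sketch. The text names a Gaussian outlier detection model but gives no formula, so the z-score test, the threshold of 3, and the function name `burst_days` are assumptions made for illustration only.

```python
from statistics import mean, stdev

def burst_days(baseline, candidate, z_threshold=3.0):
    """Return indices of days in `candidate` (period B) whose news count is
    an outlier relative to `baseline` (period A) under a Gaussian assumption.

    Hypothetical helper: both arguments are lists of daily news counts.
    """
    mu = mean(baseline)
    sigma = stdev(baseline) or 1.0  # fall back to 1.0 when period A is flat
    return [i for i, count in enumerate(candidate)
            if (count - mu) / sigma > z_threshold]

# Period A: ordinary daily news volume; period B: days to be checked.
period_a = [4, 5, 3, 6, 4, 5, 4]
period_b = [5, 40, 38, 6]
print(burst_days(period_a, period_b))
```

Keeping the baseline separate from the candidate days prevents a burst day from inflating the mean and variance it is compared against.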

Next, step S2 is performed: the keyword set of the entity pair is calculated. Specifically, this may include calculating the keyword set of the entity pair within a given time interval according to the tf-idf algorithm. tf-idf (term frequency-inverse document frequency) is a commonly used weighting technique in information retrieval and data mining. A keyword list can be extracted with the tf-idf model; for example, within a given time period, the top N keywords are taken in descending order of tf-idf value.
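As a rough sketch of the top-N selection, the following computes tf-idf over tokenized headlines. The text does not specify the tf-idf variant, the tokenizer, or the background corpus, so the formula below (relative term frequency times the log of inverse document frequency), the use of a background corpus, and the function name `top_keywords` are all assumptions.

```python
import math
from collections import Counter

def top_keywords(entity_docs, background_docs, n=3):
    """Return the n tokens with the highest tf-idf for an entity pair.

    entity_docs: tokenized headlines about the entity pair in the interval.
    background_docs: tokenized headlines about other topics (assumed input).
    """
    all_docs = entity_docs + background_docs
    doc_freq = Counter()
    for doc in all_docs:
        doc_freq.update(set(doc))  # count each token once per document
    term_freq = Counter(tok for doc in entity_docs for tok in doc)
    total = sum(term_freq.values())
    scores = {tok: (count / total) * math.log(len(all_docs) / doc_freq[tok])
              for tok, count in term_freq.items()}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda tok: (-scores[tok], tok))[:n]

entity = [["proposal", "success", "star"], ["proposal", "headline", "star"]]
background = [["star", "film"], ["star", "award"], ["film", "award"]]
print(top_keywords(entity, background, n=2))
```

Here "star" is frequent for the entity pair but also common in the background headlines, so it scores below "proposal".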

Next, step S3 is performed: text fragments are extracted from the news headlines to obtain a text fragment set with time information, and the first characteristic value of each text fragment in the set is extracted. For example, regular expressions can be used to extract text fragments from the news headlines, yielding a text fragment set for the entity pair with time information.
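A minimal sketch of regular-expression extraction follows. The text does not give the actual patterns, so splitting on punctuation, the length limits, and the names `DELIMITERS` and `extract_fragments` are illustrative assumptions; each fragment keeps the publication date of its headline as the time information.

```python
import re
from datetime import date

# Assumed delimiter set: common ASCII and full-width punctuation.
DELIMITERS = re.compile(r"\s*[,，:：;；!！?？|]\s*")

def extract_fragments(headlines, min_len=4, max_len=40):
    """Split (published_date, title) pairs into (fragment, date) pairs."""
    fragments = []
    for published, title in headlines:
        for piece in DELIMITERS.split(title):
            piece = piece.strip()
            if min_len <= len(piece) <= max_len:  # drop empty or overlong pieces
                fragments.append((piece, published))
    return fragments

headlines = [
    (date(2015, 5, 1), "Star A and Star B: romantic proposal succeeds!"),
    (date(2015, 5, 2), "Wow, proposal makes front-page headlines"),
]
for fragment, published in extract_fragments(headlines):
    print(published, fragment)
```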

Then, step S4 is performed: the semantic vector of each text fragment in the text fragment set is calculated, and the second characteristic value of each text fragment is extracted according to the semantic vector. For example, a convolutional neural network deep learning model can map each text fragment to a 200-dimensional semantic feature vector. For instance, "romantic proposal succeeds" yields V1 and "successful proposal makes front-page headlines" yields V2. Because these two fragments are semantically similar, the cosine similarity of V1 and V2 is close to 1, whereas the cosine similarity of semantically dissimilar fragments tends toward 0 or even falls below 0.
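The cosine similarity used to compare semantic vectors can be sketched as follows. The three-dimensional vectors are made-up stand-ins for the 200-dimensional vectors a trained model would produce; only the similarity computation itself is shown.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy vectors (assumed values, not model output).
v1 = [0.9, 0.1, 0.2]    # "romantic proposal succeeds"
v2 = [0.8, 0.2, 0.1]    # "successful proposal makes front-page headlines"
v3 = [-0.7, 0.6, -0.1]  # a semantically unrelated fragment

print(cosine_similarity(v1, v2))  # close to 1
print(cosine_similarity(v1, v3))  # below 0
```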

The first characteristic value comprises a syntactic structure feature and a timeliness feature; the second characteristic value comprises a relevance feature, an attention feature, and an attractiveness feature. Specifically, a dependency parsing tool can be used to compute the syntactic structure feature of each text fragment, and fragments that do not conform to Chinese syntactic structure are deleted. From the text fragments with time information, the timeliness feature of the entity pair can be looked up, for example the time interval in which news bursts. A batch of text fragments manually labeled as attractive or not can serve as a standard data set for training an SVM (Support Vector Machine) classification model, which is then used to predict the attractiveness of a text fragment, yielding the attractiveness feature. Hot search terms for the entity pair are mined from search engine web search logs, and the semantic similarity between the hot search terms and the entity pair's text fragments is calculated, yielding the user attention feature. The relationship of the entity pair is obtained from a knowledge base, and the semantic similarity between the relationship and a text fragment is calculated, yielding the relevance feature. For example, a convolutional neural network can produce semantic feature vectors for entity relationship words such as "husband and wife", "girlfriend", and "boyfriend", and their semantic similarity to a text fragment represents the relevance between that relationship and the fragment. For instance, the fragment "romantic proposal succeeds" is more similar to "boyfriend" than the fragment "drone wants to make front-page headlines" is, so the cosine similarity between the semantic feature vector of the entity pair's relationship and that of the text fragment can represent the relevance feature.
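One way to picture the five features of a fragment is as a single record. The sketch below is an assumed data layout, not one given in the text: the class `FragmentFeatures`, the binary `timeliness` helper, and the numeric values are hypothetical, and in practice the scores would come from a dependency parser, an SVM, and semantic similarity computations.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class FragmentFeatures:
    text: str
    syntax_ok: bool        # first characteristic value: syntactic structure
    timeliness: float      # first characteristic value: timeliness
    relevance: float       # second: similarity to the entity pair's relationship
    attention: float       # second: similarity to hot search terms
    attractiveness: float  # second: classifier score

    def as_vector(self):
        """Flatten the five features into a list for a ranking model."""
        return [1.0 if self.syntax_ok else 0.0, self.timeliness,
                self.relevance, self.attention, self.attractiveness]

def timeliness(published, burst_start, burst_end):
    """Assumed scoring: 1.0 inside the news burst interval, 0.0 outside."""
    return 1.0 if burst_start <= published <= burst_end else 0.0

def drop_ill_formed(fragments):
    """Delete fragments that failed the syntactic structure check."""
    return [f for f in fragments if f.syntax_ok]

good = FragmentFeatures(
    "romantic proposal succeeds", True,
    timeliness(date(2015, 5, 2), date(2015, 5, 1), date(2015, 5, 3)),
    0.9, 0.8, 0.7)
print(good.as_vector())
```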

By adopting the method of the present invention, the problem that the recommendation reasons of intelligent web page recommendation systems in the prior art lack interest can be solved, so that recommendation reasons are both accurate and attractive.

Fig. 2 shows a detailed flowchart of step S5 of the method shown in Fig. 1. Referring to Fig. 2, step S5 comprises:

S51, converting the click data into voting data on the first characteristic values and the second characteristic values;

S52, obtaining the ranking of the recommendation reasons according to the voting data, and extracting recommendation reasons in descending order of the ranking.

In an embodiment of the present invention, a ranking model for text fragments is trained on manual annotation results and online click data, taking into account features such as attractiveness, syntactic structure, user attention, relevance, and timeliness. For each entity pair, the highest-ranked text fragment serves as the recommendation reason for that pair. Each user click can be understood as a positive vote for the text fragment: the more clicks a fragment receives, the more popular it is and the better suited it is as a recommendation reason. In this way, users' click behavior is converted into training data for the ranking model. With this training data, a logistic regression model can be trained on the five basic features of the text fragments, so that high-quality text fragments are selected as recommendation reasons; the top-ranked fragment or the top N fragments may be extracted as recommendation reasons.
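The conversion of clicks into votes and the logistic regression fit can be sketched in plain Python. Several details are assumptions not stated in the text: impressions without a click are treated as negative votes, training uses batch gradient descent with a fixed learning rate, and the feature values and click counts are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def votes_from_clicks(records):
    """Turn (features, impressions, clicks) into weighted (features, label, weight) votes."""
    samples = []
    for features, impressions, clicks in records:
        if clicks > 0:
            samples.append((features, 1, clicks))  # each click is a positive vote
        if impressions - clicks > 0:
            samples.append((features, 0, impressions - clicks))  # assumed negative votes
    return samples

def score(features, weights, bias):
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train_logistic(samples, epochs=500, lr=0.5):
    """Fit a weighted logistic regression with batch gradient descent."""
    dim = len(samples[0][0])
    weights, bias = [0.0] * dim, 0.0
    total = sum(weight for _, _, weight in samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y, weight in samples:
            err = (score(x, weights, bias) - y) * weight
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / total for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / total
    return weights, bias

def rank_fragments(candidates, weights, bias, top_n=1):
    """Return the texts of the top_n (text, features) candidates by model score."""
    ordered = sorted(candidates, key=lambda c: -score(c[1], weights, bias))
    return [text for text, _ in ordered[:top_n]]

# Feature order: syntax, timeliness, relevance, attention, attractiveness.
popular = [1.0, 1.0, 0.9, 0.8, 0.9]
dull = [1.0, 0.0, 0.2, 0.1, 0.3]
samples = votes_from_clicks([(popular, 100, 60), (dull, 100, 5)])
weights, bias = train_logistic(samples)
print(rank_fragments([("dull fragment", dull), ("popular fragment", popular)],
                     weights, bias))
```

Weighting each sample by its vote count avoids expanding the data into one row per impression while giving the same gradient.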

Fig. 3 is an architecture diagram of the recommendation data processing system 1 based on news headlines according to an embodiment of the present invention. Referring to Fig. 3, the system 1 comprises:

a headline identification module 100, configured to identify, from web pages, news headlines related to an entity pair;

a keyword calculation module 200, configured to calculate a keyword set of the entity pair;

a text fragment extraction module 300, configured to extract text fragments from the news headlines to obtain a text fragment set with time information, and to extract a first characteristic value of each text fragment in the text fragment set;

a characteristic value calculation module 400, configured to calculate a semantic vector of each text fragment in the text fragment set, and to extract a second characteristic value of each text fragment according to the semantic vector;

a screening module 500, configured to fit the first characteristic values and the second characteristic values according to user click data to obtain a recommendation reason ranking.
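The module structure can be pictured as a pipeline of five pluggable components. This is only a structural sketch: the class name, the callable signatures, and in particular the way the keyword set is passed to the characteristic value module are assumptions, since the text does not specify the interfaces between modules. The stub lambdas stand in for real implementations.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RecommendationSystem:
    identify_headlines: Callable  # module 100
    compute_keywords: Callable    # module 200
    extract_fragments: Callable   # module 300
    compute_features: Callable    # module 400
    screen: Callable              # module 500

    def run(self, pages, entity_pair, click_data):
        headlines = self.identify_headlines(pages, entity_pair)
        keywords = self.compute_keywords(headlines, entity_pair)
        fragments = self.extract_fragments(headlines)
        features = self.compute_features(fragments, keywords)  # assumed interface
        return self.screen(features, click_data)

# Toy stand-ins for each module, for illustration only.
system = RecommendationSystem(
    identify_headlines=lambda pages, pair: [p for p in pages
                                            if all(e in p for e in pair)],
    compute_keywords=lambda headlines, pair: ["proposal"],
    extract_fragments=lambda headlines: [h.split(": ", 1)[-1] for h in headlines],
    compute_features=lambda frags, kws: [(f, float(any(k in f for k in kws)))
                                         for f in frags],
    screen=lambda feats, clicks: [f for f, _ in sorted(
        feats, key=lambda t: -(t[1] + clicks.get(t[0], 0)))],
)
pages = ["A and B: proposal succeeds", "C wins award", "A and B: seen shopping"]
print(system.run(pages, ("A", "B"), {}))
```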

In an embodiment of the present invention, the recommendation data processing system based on news headlines may operate as follows. The headline identification module 100 identifies, from web pages, news headlines related to an entity pair. The system may also include a detection module, configured to detect the time interval in which news about the entity pair bursts, after the news headlines related to the entity pair are identified from web pages and before the keyword set of the entity pair is calculated. For example, the total news volume for a given celebrity can be measured over period A; if the celebrity's news volume increases abnormally in period B, then period B is the celebrity's news burst period. Detecting the burst interval locates the period in which news related to the entity pair is concentrated, which narrows the query range for recommendation reason data and improves query efficiency.

The keyword calculation module 200 calculates the keyword set of the entity pair. Specifically, this may include calculating the keyword set of the entity pair within a given time interval according to the tf-idf algorithm. tf-idf is a commonly used weighting technique in information retrieval and data mining. A keyword list can be extracted with the tf-idf model; for example, within a given time period, the top N keywords are taken in descending order of tf-idf value.

The text fragment extraction module 300 extracts text fragments from the news headlines to obtain a text fragment set with time information, and extracts the first characteristic value of each text fragment in the set. For example, regular expressions can be used to extract text fragments from the news headlines, yielding a text fragment set for the entity pair with time information.

The characteristic value calculation module 400 calculates the semantic vector of each text fragment in the text fragment set, and extracts the second characteristic value of each text fragment according to the semantic vector. For example, a convolutional neural network deep learning model can map each text fragment to a 200-dimensional semantic feature vector. For instance, "romantic proposal succeeds" yields V1 and "successful proposal makes front-page headlines" yields V2. Because these two fragments are semantically similar, the cosine similarity of V1 and V2 is close to 1, whereas the cosine similarity of semantically dissimilar fragments tends toward 0 or even falls below 0.

The first characteristic value comprises a syntactic structure feature and a timeliness feature; the second characteristic value comprises a relevance feature, an attention feature, and an attractiveness feature. Specifically, a dependency parsing tool can be used to compute the syntactic structure feature of each text fragment, and fragments that do not conform to Chinese syntactic structure are deleted. From the text fragments with time information, the timeliness feature of the entity pair can be looked up, for example the time interval in which news bursts. A batch of text fragments manually labeled as attractive or not can serve as a standard data set for training an SVM classification model, which is then used to predict the attractiveness of a text fragment, yielding the attractiveness feature. Hot search terms for the entity pair are mined from search engine web search logs, and the semantic similarity between the hot search terms and the entity pair's text fragments is calculated, yielding the user attention feature. The relationship of the entity pair is obtained from a knowledge base, and the semantic similarity between the relationship and a text fragment is calculated, yielding the relevance feature. For example, a convolutional neural network can produce semantic feature vectors for entity relationship words such as "husband and wife", "girlfriend", and "boyfriend", and their semantic similarity to a text fragment represents the relevance between that relationship and the fragment. For instance, the fragment "romantic proposal succeeds" is more similar to "boyfriend" than the fragment "drone wants to make front-page headlines" is, so the cosine similarity between the semantic feature vector of the entity pair's relationship and that of the text fragment can represent the relevance feature.

By adopting the system of the present invention, the problem that the recommendation reasons of intelligent web page recommendation systems in the prior art lack interest can be solved, so that recommendation reasons are both accurate and attractive.

Fig. 4 shows a block diagram of the screening module 500 shown in Fig. 3. Referring to Fig. 4, the screening module 500 comprises:

a ranking unit 510, configured to convert the click data into voting data on the first characteristic values and the second characteristic values;

an extraction unit 520, configured to obtain the ranking of the recommendation reasons according to the voting data, and to extract recommendation reasons in descending order of the ranking.

In an embodiment of the present invention, a ranking model for text fragments is trained on manual annotation results and online click data, taking into account features such as attractiveness, syntactic structure, user attention, relevance, and timeliness. For each entity pair, the highest-ranked text fragment serves as the recommendation reason for that pair. Each user click can be understood as a positive vote for the text fragment: the more clicks a fragment receives, the more popular it is and the better suited it is as a recommendation reason. In this way, users' click behavior is converted into training data for the ranking model. With this training data, a logistic regression model can be trained on the five basic features of the text fragments, so that high-quality text fragments are selected as recommendation reasons; the top-ranked fragment or the top N fragments may be extracted as recommendation reasons.

From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software combined with a hardware platform, or entirely by hardware. Based on this understanding, all or part of the contribution of the technical solution of the present invention over the background art may be embodied in the form of a software product. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a smartphone, a network device, or the like) to perform the methods described in the embodiments, or in certain parts of the embodiments, of the present invention.

The terms and expressions used in this specification are for illustration only and are not meant to be limiting. Those skilled in the art will understand that various changes may be made to the details of the above embodiments without departing from the basic principles of the disclosed embodiments. The scope of the present invention is therefore determined only by the claims, in which, unless otherwise stated, all terms should be understood in their broadest reasonable meaning.

Claims (10)

1. A recommendation data processing method based on news headlines, characterized in that the method comprises:
identifying, from web pages, news headlines related to an entity pair;
calculating a keyword set of the entity pair;
extracting text fragments from the news headlines to obtain a text fragment set with time information, and extracting a first characteristic value of each text fragment in the text fragment set;
calculating a semantic vector of each text fragment in the text fragment set, and extracting a second characteristic value of each text fragment according to the semantic vector;
fitting the first characteristic values and the second characteristic values according to user click data to obtain a recommendation reason ranking.
2. The method of claim 1, characterized in that, after identifying from web pages the news headlines related to the entity pair and before calculating the keyword set of the entity pair, the method comprises:
detecting the time interval in which news about the entity pair bursts.
3. The method of claim 2, characterized in that calculating the keyword set of the entity pair comprises:
calculating the keyword set of the entity pair in the time interval according to the tf-idf algorithm.
4. The method of claim 1, characterized in that the first characteristic value comprises a syntactic structure feature and a timeliness feature, and the second characteristic value comprises a relevance feature, an attention feature, and an attractiveness feature.
5. The method of claim 1, characterized in that fitting the first characteristic values and the second characteristic values according to user click data to obtain the recommendation reason ranking comprises:
converting the click data into voting data on the first characteristic values and the second characteristic values, obtaining the ranking of the recommendation reasons according to the voting data, and extracting recommendation reasons in descending order of the ranking.
6. A recommendation data processing system based on news headlines, characterized in that the system comprises:
a headline identification module, configured to identify, from web pages, news headlines related to an entity pair;
a keyword calculation module, configured to calculate a keyword set of the entity pair;
a text fragment extraction module, configured to extract text fragments from the news headlines to obtain a text fragment set with time information, and to extract a first characteristic value of each text fragment in the text fragment set;
a characteristic value calculation module, configured to calculate a semantic vector of each text fragment in the text fragment set, and to extract a second characteristic value of each text fragment according to the semantic vector;
a screening module, configured to fit the first characteristic values and the second characteristic values according to user click data to obtain a recommendation reason ranking.
7. The system of claim 6, characterized in that the system comprises:
a detection module, configured to detect the time interval in which news about the entity pair bursts, after the news headlines related to the entity pair are identified from web pages and before the keyword set of the entity pair is calculated.
8. The system of claim 7, characterized in that calculating the keyword set of the entity pair in the keyword calculation module comprises:
calculating the keyword set of the entity pair in the time interval according to the tf-idf algorithm.
9. The system of claim 6, characterized in that the first characteristic value comprises a syntactic structure feature and a timeliness feature, and the second characteristic value comprises a relevance feature, an attention feature, and an attractiveness feature.
10. The system of claim 6, characterized in that the screening module comprises:
a ranking unit, configured to convert the click data into voting data on the first characteristic values and the second characteristic values;
an extraction unit, configured to obtain the ranking of the recommendation reasons according to the voting data, and to extract recommendation reasons in descending order of the ranking.
CN201510290279.6A 2015-05-29 2015-05-29 Recommended data processing method based on news headline and recommended data processing method system based on news headline CN104915399A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510290279.6A CN104915399A (en) 2015-05-29 2015-05-29 Recommended data processing method based on news headline and recommended data processing method system based on news headline

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510290279.6A CN104915399A (en) 2015-05-29 2015-05-29 Recommended data processing method based on news headline and recommended data processing method system based on news headline

Publications (1)

Publication Number Publication Date
CN104915399A true CN104915399A (en) 2015-09-16

Family

ID=54084462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510290279.6A CN104915399A (en) 2015-05-29 2015-05-29 Recommended data processing method based on news headline and recommended data processing method system based on news headline

Country Status (1)

Country Link
CN (1) CN104915399A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869016A (en) * 2016-03-28 2016-08-17 天津中科智能识别产业技术研究院有限公司 Method for estimating click through rate based on convolution neural network
CN107491436A (en) * 2017-08-21 2017-12-19 北京百度网讯科技有限公司 A kind of recognition methods of title party and device, server, storage medium
CN107609094A (en) * 2017-09-08 2018-01-19 北京百度网讯科技有限公司 Data disambiguation method, device and computer equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2378415A1 (en) * 2010-02-19 2011-10-19 Sap Ag Service integration modeling and execution framework
CN102831234A (en) * 2012-08-31 2012-12-19 北京邮电大学 Personalized news recommendation device and method based on news content and theme feature
CN103324665A (en) * 2013-05-14 2013-09-25 亿赞普(北京)科技有限公司 Hot spot information extraction method and device based on micro-blog
CN103383702A (en) * 2013-07-17 2013-11-06 中国科学院深圳先进技术研究院 Method and system for recommending personalized news based on ranking of votes of users
CN104036038A (en) * 2014-06-30 2014-09-10 北京奇虎科技有限公司 News recommendation method and system
CN104166668A (en) * 2014-06-09 2014-11-26 南京邮电大学 News recommendation system and method based on FOLFM model
CN104239587A (en) * 2014-10-17 2014-12-24 北京字节跳动网络技术有限公司 Method and device for refreshing news list

Similar Documents

Publication Publication Date Title
US9514405B2 (en) Scoring concept terms using a deep network
Baroni et al. Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors
US9229977B2 (en) Real-time and adaptive data mining
JP6440732B2 (en) Automatic task classification based on machine learning
CN104899273B (en) A kind of Web Personalization method based on topic and relative entropy
CN104408093B (en) A kind of media event key element abstracting method and device
CN107102989B (en) Entity disambiguation method based on word vector and convolutional neural network
CN104615767B (en) Training method, search processing method and the device of searching order model
KR101721338B1 (en) Search engine and implementation method thereof
JP6487201B2 (en) Method and apparatus for generating recommended pages
EP2866421B1 (en) Method and apparatus for identifying a same user in multiple social networks
JP5157314B2 (en) Similarity calculation method, context model derivation method, similarity calculation program, context model derivation program
CN102693272B (en) Keyword extraction from uniform resource locators (URLs)
Nahar et al. Sentiment analysis for effective detection of cyber bullying
CN102902821B (en) The image high-level semantics mark of much-talked-about topic Network Based, search method and device
CN103870973B (en) Information push, searching method and the device of keyword extraction based on electronic information
CN105389722B (en) Malicious order identification method and device
US20140279774A1 (en) Classifying Resources Using a Deep Network
US20150324065A1 (en) System and Method to Automatically Aggregate and Extract Key Concepts Within a Conversation by Semantically Identifying Key Topics
CN102549603B (en) Relevance-based image selection
CN104615608B (en) A kind of data mining processing system and method
WO2017066543A1 (en) Systems and methods for automatically analyzing images
WO2017024884A1 (en) Search intention identification method and device
KR20110115542A (en) Method for calculating semantic similarities between messages and conversations based on enhanced entity extraction
TWI465950B (en) Method and system for discovering suspicious account groups

Legal Events

Date Code Title Description
PB01 Publication
C06 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150916
