CN107679217A - Association method for extracting content and device based on data mining - Google Patents

Publication number
CN107679217A
Authority
CN
China
Prior art keywords
comment
data
label
query object
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710976636.3A
Other languages
Chinese (zh)
Other versions
CN107679217B (en)
Inventor
徐伟建
刘建林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201710976636.3A
Publication of CN107679217A
Application granted
Publication of CN107679217B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/954 Navigation, e.g. using categorised browsing
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

Embodiments of the present application disclose a method and device for extracting associated content based on data mining. One embodiment of the method includes: obtaining data to be processed, the data containing a preset query object; determining, in the data to be processed, candidate comment labels associated with the preset query object; screening comment labels out of the candidate comment labels; and determining the presentation order of the comment labels based on users' click counts for each comment label. Comment labels for the preset query object are thereby extracted intelligently and presented by priority.

Description

Association method for extracting content and device based on data mining
Technical field
The present application relates to the field of computer technology, specifically to the field of Internet technology, and more particularly to a method and device for extracting associated content based on data mining.
Background
In existing search tools, a user typically enters a search keyword and, after the user triggers the search, the corresponding search results are presented to the user.
When the user wants a summary of opinions about the search keyword, the user has to read the search results one by one and then summarize and distill them unaided.
Data mining generally refers to the process of algorithmically searching large amounts of data for the information hidden in them. Data mining is usually associated with computer science and achieves this goal through methods such as statistics, online analytical processing, information retrieval, machine learning, expert systems (which rely on past rules of thumb), and pattern recognition.
Existing search tools do not yet offer a technical solution that uses data mining to present, in the search results, summarized opinions about the search keyword.
Summary of the invention
The purpose of the embodiments of the present application is to propose a method and device for extracting associated content based on data mining.
In a first aspect, an embodiment of the present application provides a method for extracting associated content based on data mining, including: obtaining data to be processed, the data containing a preset query object; determining, in the data to be processed, candidate comment labels associated with the preset query object; screening comment labels out of the candidate comment labels; and determining the presentation order of the comment labels based on users' click counts for each comment label.
In some embodiments, determining the candidate comment labels associated with the preset query object includes: extracting, using a natural language processing method, the candidate comment labels associated with the preset query object from the data to be processed.
In some embodiments, screening comment labels out of the candidate comment labels includes: based on a preset matching rule, removing from the candidate comment labels those that are inconsistent with the preset query object, thereby obtaining the comment labels.
In some embodiments, the data to be processed includes comment data on the preset query object, and the method further includes: obtaining comment data containing the preset query object from a preset hotspot data source; determining the weight of each item of comment data; and determining the display order of the comment data based on those weights.
In some embodiments, obtaining comment data containing the preset query object from the preset hotspot data source includes: obtaining candidate comment data containing the preset query object from the preset hotspot data source; and determining the comment data from the candidate comment data based on the page views of each item of candidate comment data.
In some embodiments, determining the weight of each item of comment data includes determining the weight based on any one of the following: determining the weight according to whether the comment data contains a hot word whose number of co-occurrences with the preset query object exceeds a preset number; determining a quality score of the comment data using a machine learning algorithm and determining the weight from the quality score; and determining the weight from users' click counts for the comment data.
In some embodiments, the method further includes: determining the sentiment orientation of each item of comment data using a natural language processing tool, and determining the positive rating of the preset query object based on those sentiment orientations.
In some embodiments, the method further includes: generating a positive rating curve for the preset query object based on its positive rating in each preset period.
In a second aspect, an embodiment of the present application provides a device for extracting associated content based on data mining, including: a to-be-processed data acquiring unit for obtaining data to be processed, the data containing a preset query object; a determining unit for determining, in the data to be processed, candidate comment labels associated with the preset query object; a first screening unit for screening comment labels out of the candidate comment labels; and a first display unit for determining the presentation order of the comment labels based on users' click counts for each comment label.
In some embodiments, the determining unit is further configured to: extract, based on natural language processing, the candidate comment labels associated with the preset query object from the data to be processed.
In some embodiments, the first screening unit is further configured to: based on a preset matching rule, remove from the candidate comment labels those that are inconsistent with the preset query object, thereby obtaining the comment labels.
In some embodiments, the data to be processed includes comment data on the preset query object, and the device further includes: a comment data acquiring unit for obtaining comment data containing the preset query object from a preset hotspot data source; a weight determining unit for determining the weight of each item of comment data; and a second display unit for determining the display order of the comment data based on those weights.
In some embodiments, the comment data acquiring unit is further configured to: obtain candidate comment data containing the preset query object from the preset hotspot data source; and determine the comment data from the candidate comment data based on the page views of each item of candidate comment data.
In some embodiments, the weight determining unit is further configured to determine the weight of each item of comment data based on any one of the following: determining the weight according to whether the comment data contains a hot word whose number of co-occurrences with the preset query object exceeds a preset number; determining a quality score of the comment data using a machine learning algorithm and determining the weight from the quality score; and determining the weight from users' click counts for the comment data.
In some embodiments, the device further includes: a positive rating determining unit for determining the sentiment orientation of each item of comment data using a natural language processing tool and determining the positive rating of the preset query object based on those sentiment orientations.
In some embodiments, the device further includes: a positive rating curve generating unit for generating a positive rating curve for the preset query object based on its positive rating in each preset period.
In a third aspect, an embodiment of the present application provides a server, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method described above.
The method and device for extracting associated content based on data mining provided by the embodiments of the present application obtain data to be processed that contains a preset query object, determine from that data the candidate comment labels associated with the preset query object, screen comment labels out of the candidate comment labels, and finally determine the presentation order of the comment labels based on users' click counts for each comment label. Comment labels for the preset query object are thereby extracted intelligently and presented by priority.
Further, when the preset query object is used as a search keyword, the user needs to click through and read fewer search results one by one, which reduces the use of network resources and helps the search server run stably.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
Fig. 2 is a flowchart of one embodiment of the method for extracting associated content based on data mining according to the present application;
Fig. 3 is a flowchart of another embodiment of the method for extracting associated content based on data mining according to the present application;
Fig. 4 is a schematic structural diagram of one embodiment of the device for extracting associated content based on data mining according to the present application;
Fig. 5 is a schematic structural diagram of a computer system suitable for implementing the terminal device or server of the embodiments of the present application.
Detailed description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, provided they do not conflict, the embodiments of the present application and the features in those embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method or device for extracting associated content based on data mining of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, networks 104 and 105, a first server 106, and a second server 107. The network 104 serves as the medium providing communication links between the terminal devices 101, 102, 103 and the first server 106, and the network 105 serves as the medium providing a communication link between the first server 106 and the second server 107. The networks 104 and 105 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the first server 106 over the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as search applications, web browsers, shopping applications, email clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
The first server 106 may be a server providing various services, for example a backend search server that provides search results for search requests sent by the terminal devices 101, 102, 103. The backend search server may analyze and otherwise process received data such as search requests and feed the processing result (for example, search results) back to the terminal devices 101, 102, 103.
It should be noted that the method for extracting associated content based on data mining provided by the embodiments of the present application is generally executed by the first server 106, and accordingly the device for extracting associated content based on data mining is generally arranged in the first server 106.
The second server 107 may be a server providing various services, for example a backend server that crawls the search results on the first server 106 and generates comment labels for the search keyword to which those search results correspond. The second server 107 may obtain the search results corresponding to a search keyword on the first server 106, generate comment labels corresponding to that search keyword, and feed the generated comment labels back to the first server 106.
It should be understood that the numbers of terminal devices, networks, first servers, and second servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.
It should be noted that the method for extracting associated content based on data mining provided by the embodiments of the present application is generally executed by the second server 107, and accordingly the device for extracting associated content based on data mining is generally arranged in the second server 107.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for extracting associated content based on data mining according to the present application is shown. The method includes the following steps:
Step 210: obtain data to be processed, the data containing a preset query object.
In this embodiment, the electronic device on which the method runs (for example, the second server 107 shown in Fig. 1) may obtain the data to be processed, over a wired or wireless connection, from an electronic device communicatively connected to it (for example, the first server 106 shown in Fig. 1). It should be pointed out that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and UWB (ultra wideband) connections, as well as other wireless connection types now known or developed in the future.
The data to be processed obtained in this step may include, but is not limited to, semi-structured data such as articles and the comments that users post in social applications. Here, semi-structured data refers, for example, to data that can be turned into structured data through appropriate processing.
In some application scenarios, obtaining the data to be processed in this step may be tied to a user's search request. Specifically, when the user sends a search request for some search keyword to a search server (for example, the first server 106 shown in Fig. 1) through a terminal device (for example, the terminal devices 101, 102, 103 shown in Fig. 1), the electronic device on which the method of this embodiment runs (for example, the second server 107 shown in Fig. 1) may receive the search results sent by the search server and use them as the data to be processed for the current search keyword. In these scenarios, the search keyword entered by the user can be regarded as a preset query object in this step.
Alternatively, in other application scenarios, obtaining the data to be processed in this step may be independent of any user search request. Specifically, the electronic device on which the method of this embodiment runs may proactively crawl data to be processed that contains the preset query object. For example, the electronic device may proactively crawl, from the application servers of major social platforms, data to be processed that contains keyword A.
Step 220: determine, in the data to be processed, candidate comment labels associated with the preset query object.
Here, a candidate comment label associated with the preset query object can be understood, for example, as information that may describe a feature of the preset query object.
In some application scenarios, suppose the preset query object is "convolutional neural network (CNN)" and the content of some item of data to be processed mentions both convolutional neural networks and recurrent neural networks (RNN). Evaluative phrases that appear in that item, such as "high accuracy", "fast model computation", and "directed cycle", may all describe features of CNNs and can therefore be treated as candidate comment labels.
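As a rough illustration of step 220, the sketch below collects evaluative phrases from every record that mentions the query object. The function name, the fixed phrase list, and the substring matching are assumptions made for illustration; the patent does not specify how candidates are extracted beyond this example.

```python
def extract_candidate_labels(records, query, evaluative_phrases):
    """Collect evaluative phrases from records that mention the query object.

    Hypothetical helper: a real system would use an NLP pipeline rather than
    a fixed phrase list and substring matching.
    """
    candidates = []
    for text in records:
        lowered = text.lower()
        if query.lower() not in lowered:
            continue  # this record does not mention the query object
        for phrase in evaluative_phrases:
            if phrase.lower() in lowered and phrase not in candidates:
                candidates.append(phrase)
    return candidates


records = [
    "CNN offers high accuracy and fast model computation; RNN relies on a directed cycle.",
    "An unrelated record that also mentions fast model computation.",
]
phrases = ["high accuracy", "fast model computation", "directed cycle"]
candidates = extract_candidate_labels(records, "CNN", phrases)
```

Note that "directed cycle" is still returned at this stage, because the record mentions both CNN and RNN; separating the two is left to the later screening step.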
Step 230: screen comment labels out of the candidate comment labels.
The purpose of this step is to remove, from the candidate comment labels obtained in step 220, those that do not actually belong to the preset query object, so that the association between the resulting comment labels and the preset query object is more accurate.
Continuing the example from step 220: the candidate comment labels extracted for the preset query object "convolutional neural network (CNN)" include "high accuracy", "fast model computation", and "directed cycle". In this step, the candidate "directed cycle", which clearly cannot describe a feature of CNNs, is removed, so that the remaining comment labels "high accuracy" and "fast model computation" match CNNs more closely.
In addition, the comment labels obtained in this step can serve as one kind of extracted associated content related to the preset query object.
Step 240: determine the presentation order of the comment labels based on users' click counts for each comment label.
Step 230 has determined the comment labels associated with the preset query object. When a user initiates a search request with the preset query object as the search keyword, the comment labels associated with that object can be sent to the user's terminal device together with the search results page.
When comment labels are presented on the search results page, placing those with more user clicks in more prominent positions signals to the user which labels have drawn more attention over a period of time. The user can then, for example, click a high-attention comment label to further filter the search results and display the matching ones first.
Taking the preset query object "convolutional neural network (CNN)" as the example again: when a user searches with "convolutional neural network" as the search keyword, the associated comment labels "high accuracy" and "fast model computation" can be sent to the user's terminal device together with the search results page. Moreover, because the comment label "fast model computation" has more user clicks, it is displayed on the search results page ahead of other comment labels (for example, "high accuracy"). If the user then clicks "fast model computation", the search results for "convolutional neural network" are filtered further to those associated with that label.
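The ordering and filtering behaviour of step 240 can be sketched as follows. The click counts, result records, and function names are made up for illustration, and the patent does not prescribe a particular data layout.

```python
def order_labels_by_clicks(labels, click_counts):
    """Return labels sorted by descending click count; unseen labels count as 0."""
    return sorted(labels, key=lambda label: click_counts.get(label, 0), reverse=True)


def filter_results_by_label(results, label):
    """Keep only the search results tagged with the clicked comment label."""
    return [r for r in results if label in r["labels"]]


labels = ["high accuracy", "fast model computation"]
clicks = {"high accuracy": 120, "fast model computation": 450}  # made-up counts
ordered = order_labels_by_clicks(labels, clicks)

results = [
    {"title": "Result A", "labels": {"high accuracy"}},
    {"title": "Result B", "labels": {"fast model computation", "high accuracy"}},
]
filtered = filter_results_by_label(results, ordered[0])
```

Python's `sorted` is stable, so labels with equal click counts keep their original relative order.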
The method of this embodiment obtains data to be processed that contains a preset query object, determines from that data the candidate comment labels associated with the preset query object, screens comment labels out of the candidate comment labels, and finally determines the presentation order of the comment labels based on users' click counts for each comment label. Comment labels for the preset query object are thereby extracted intelligently and presented by priority.
In some optional implementations of this embodiment, determining the candidate comment labels associated with the preset query object in step 220 may further include: extracting, using a natural language processing method, the candidate comment labels associated with the preset query object from the data to be processed.
Natural language processing (NLP) is the technology that studies how computers process human language. It includes branches such as syntactic and semantic analysis, information extraction, text mining, machine translation, and information retrieval. NLP methods are existing, widely studied techniques and are not described further here.
In some optional implementations, screening comment labels out of the candidate comment labels in step 230 may further include: based on a preset matching rule, removing from the candidate comment labels those that are inconsistent with the preset query object, thereby obtaining the comment labels.
In some application scenarios, for example, the preset matching rules include: a comment label used to evaluate men cannot be applied to women. Suppose the candidate labels obtained in step 220 include "good acting", "beautiful", and "handsome", and the preset query object is a woman; the candidate comment label "handsome" will then clearly be removed.
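One way to picture such a preset matching rule is a table mapping each label to the attributes of query objects it must not be applied to. The table layout, attribute names, and function name below are assumptions for illustration, not the patent's rule format.

```python
def screen_labels(candidates, query_attributes, incompatible):
    """Drop candidates that a preset rule marks as inconsistent with the query object.

    `incompatible` maps a label to the set of query-object attributes it must
    not be applied to (a hypothetical rule representation).
    """
    kept = []
    for label in candidates:
        blocked_by = incompatible.get(label, set())
        if blocked_by & query_attributes:
            continue  # the rule forbids this label for this query object
        kept.append(label)
    return kept


rules = {"handsome": {"female"}}  # a label for men is not applied to women
screened = screen_labels(["good acting", "beautiful", "handsome"], {"female"}, rules)
```

Labels that have no entry in the rule table pass through unchanged, so the rules only ever remove candidates.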
Referring to Fig. 3, a schematic flowchart 300 of another embodiment of the method for extracting associated content based on data mining of the present application is shown.
The method of this embodiment includes:
Step 310: obtain data to be processed, the data containing a preset query object.
Step 320: determine, in the data to be processed, candidate comment labels associated with the preset query object.
Step 330: screen comment labels out of the candidate comment labels.
Step 340: determine the presentation order of the comment labels based on users' click counts for each comment label.
Steps 310 to 340 are carried out in a manner similar to steps 210 to 240 of the embodiment shown in Fig. 2 and are not described again here.
Unlike the embodiment shown in Fig. 2, this embodiment further includes:
Step 350: obtain comment data containing the preset query object from a preset hotspot data source.
Here, the preset hotspot data source may be, for example, the recent trending-search data of a pre-selected social platform. Suppose the preset query object is a recently released film. The film's title can be used as the preset query object. If the title appears in the trending-search data of some social platform, the comment data containing the title can be obtained from that platform's trending-search data.
Step 360: determine the weight of each item of comment data.
Determining the weight of comment data makes it possible to establish the degree of association between the comment data and the preset query object, and/or the quality of the comment data itself.
Step 370: determine the display order of the comment data based on the weight of each item of comment data.
By determining the display order from the weights, comment data that is more closely associated with the preset query object and/or of higher quality can be shown to users first.
In some optional implementations, obtaining comment data containing the preset query object from the preset hotspot data source in step 350 of this embodiment may further include:
Step 351: obtain candidate comment data containing the preset query object from the preset hotspot data source.
Step 352: determine the comment data from the candidate comment data based on the page views of each item of candidate comment data.
In this way, comment data that is associated with the preset query object and also has high user attention can be further screened out of the hotspot data source.
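A minimal sketch of steps 351 and 352, assuming the selection is a simple page-view threshold. The threshold, the record fields, and the sample values are hypothetical; the patent only says the selection is based on page views.

```python
def select_comment_data(candidates, min_page_views):
    """Keep candidate comment data whose page views reach a preset threshold."""
    return [c for c in candidates if c["page_views"] >= min_page_views]


candidates = [
    {"text": "Film X is a must-see", "page_views": 52000},
    {"text": "Saw Film X yesterday", "page_views": 300},
]
selected = select_comment_data(candidates, min_page_views=10000)
```

A top-k cut by page views would be an equally plausible reading of the same step.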
In some optional implementations, determining the weight of each item of comment data in step 360 of this embodiment may include, for example, at least one of the following:
Step 361: determine the weight of the comment data according to whether it contains a hot word whose number of co-occurrences with the preset query object exceeds a preset number. Checking for such hot words makes it possible to extract the core topics (that is, the hot words) that users focus on in the comment data and to increase the weight of comment data containing them, so that it is shown first.
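The hot-word check in step 361 could look roughly like the sketch below. It counts, for each word, the number of comments in which the word appears together with the query object, and treats words above a preset count as hot words. Whitespace tokenization, the `boost` value, the sample comments, and the query "FilmX" are all assumptions for illustration.

```python
from collections import Counter


def find_hot_words(comments, query, min_cooccurrence, stop_words=frozenset()):
    """Return words that co-occur with the query in more than min_cooccurrence comments."""
    counts = Counter()
    for text in comments:
        tokens = set(text.lower().split())  # naive tokenization (assumption)
        if query.lower() in tokens:
            counts.update(tokens - {query.lower()} - stop_words)
    return {word for word, n in counts.items() if n > min_cooccurrence}


def hot_word_weight(text, hot_words, boost=1.0):
    """Give a comment the boost if it contains at least one hot word, else 0."""
    tokens = set(text.lower().split())
    return boost if tokens & hot_words else 0.0


comments = [
    "FilmX soundtrack amazing",
    "FilmX soundtrack haunting",
    "FilmX plot confusing",
]
hot_words = find_hot_words(comments, "FilmX", min_cooccurrence=1)
weights = [hot_word_weight(c, hot_words) for c in comments]
```

For Chinese text a word segmenter would be needed in place of `split()`, which is why the tokenization is flagged as an assumption.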
Step 362: determine a quality score for the comment data using a machine learning algorithm, and determine the weight of the comment data from the quality score. In some application scenarios, comment data in the hotspot data source may have some association with the preset query object while clearly trying to ride on the topic's popularity; such comment data can be considered to have a low quality score. In these scenarios, the comment data can be fed into a pre-trained machine learning model (for example, a neural network model) to obtain its quality score, and comment data with a higher quality score is given a higher weight so that it is shown first.
On the other hand, in some application scenarios, the comment data in some hotspot data sources may contain spam that should not be shown to users, such as comments that piggyback on hotspot data to promote advertisements or comments containing inappropriate speech. In these scenarios, machine learning can likewise be used to filter such comment data out. For example, the same machine learning model that determines the quality score can be used to filter out comment data containing spam, or a separate machine learning model can be used for that purpose.
Step 363: determine the weight of the comment data from users' click counts for it. In this way, comment data that users pay more attention to (that is, comment data with more clicks) is given a higher weight so that it is shown first.
It should be understood that, if the weight of the comment data is determined according to at least two of steps 361 to 363, the weights so determined can be combined in a weighted sum to obtain the final weight of the comment data.
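The weighted sum and the resulting display order can be sketched as follows. The coefficient values, the partial-weight names, and the sample comments are illustrative assumptions; the patent does not state how the coefficients are chosen.

```python
def final_weight(partial_weights, coefficients):
    """Weighted sum of the partial weights from steps 361 to 363."""
    return sum(coefficients[name] * value for name, value in partial_weights.items())


def display_order(comments, coefficients):
    """Sort comment data by descending final weight."""
    return sorted(
        comments,
        key=lambda c: final_weight(c["weights"], coefficients),
        reverse=True,
    )


coefficients = {"hot_word": 0.5, "quality": 0.25, "clicks": 0.25}  # hypothetical
comments = [
    {"id": "c3", "weights": {"hot_word": 0.0, "quality": 0.5, "clicks": 0.0}},
    {"id": "c1", "weights": {"hot_word": 1.0, "quality": 0.5, "clicks": 0.0}},
    {"id": "c2", "weights": {"hot_word": 0.0, "quality": 1.0, "clicks": 1.0}},
]
ordered_ids = [c["id"] for c in display_order(comments, coefficients)]
```

In practice the partial weights would need to be normalized to a common scale before summing; the sample values here are already in the range 0 to 1.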
In some optional implementations, the method of this embodiment may further include:
Step 380: determine the sentiment orientation of each item of comment data using a natural language processing tool, and determine the positive rating of the preset query object based on those sentiment orientations.
For example, in some application scenarios, a natural language processing tool can be used to assign each item of comment data a sentiment score toward the preset query object (for example, 1 for a positive orientation, 0 for a negative orientation, and 0.5 for a neutral orientation), and the positive rating of the preset query object is then determined from these scores by some calculation.
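The patent leaves the calculation open. One plausible choice, shown here purely as an assumption, is to take the mean of the per-comment sentiment scores using the 1 / 0 / 0.5 assignment from the example above.

```python
SENTIMENT_SCORES = {"positive": 1.0, "negative": 0.0, "neutral": 0.5}


def positive_rating(sentiments):
    """Mean sentiment score over all comments; None when there are no comments.

    Averaging is an assumed calculation, not one stated in the source.
    """
    if not sentiments:
        return None
    return sum(SENTIMENT_SCORES[s] for s in sentiments) / len(sentiments)


rating = positive_rating(["positive", "positive", "neutral", "negative"])
```

With these four comments the scores sum to 2.5, giving a positive rating of 0.625.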
And step 390, the default query object of positive rating generation based on the default query object in each preset period of time Positive rating curve.
It is understood that because the quantity of comment data gradually changes (for example, increase), phase over time Ying Di, the positive rating for presetting query object will also be changed therewith.By based on the default query object in each preset period of time The positive rating curve of the default query object of positive rating generation, intuitively can be illustrated in a period of time, the default query object Positive rating development trend.
With further reference to Fig. 4, as an implementation of the methods shown in the above figures, this application provides an embodiment of an associated content extraction device based on data mining. The device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied in various electronic devices.
As shown in Fig. 4, the associated content extraction device 400 based on data mining of this embodiment includes: a pending data acquiring unit 410, a determining unit 420, a first screening unit 430, and a first display unit 440.
The pending data acquiring unit 410 can be used to obtain pending data, where the pending data includes a default query object.
The determining unit 420 can be used to determine, in the pending data, the candidate comment labels associated with the default query object.
The first screening unit 430 can be used to select comment labels from among the candidate comment labels.
The first display unit 440 can be used to determine the presentation order of the comment labels based on users' click volume on each comment label.
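The cooperation of units 420 to 440 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the class name `AssociatedContentExtractor`, the injected callables, the delimiter-based extraction, and the sample click counts are hypothetical stand-ins, not the extraction or screening logic of the application.

```python
from typing import Callable, Dict, List


class AssociatedContentExtractor:
    """Chains hypothetical stand-ins for units 420, 430 and 440 of Fig. 4."""

    def __init__(self,
                 extract_candidates: Callable[[str], List[str]],
                 keep_label: Callable[[str], bool],
                 click_counts: Dict[str, int]):
        self.extract_candidates = extract_candidates  # role of determining unit 420
        self.keep_label = keep_label                  # role of first screening unit 430
        self.click_counts = click_counts              # input of first display unit 440

    def run(self, pending_data: str) -> List[str]:
        candidates = self.extract_candidates(pending_data)
        labels = [label for label in candidates if self.keep_label(label)]
        # Labels with more clicks come first; labels never clicked count as 0.
        return sorted(labels, key=lambda label: self.click_counts.get(label, 0), reverse=True)


extractor = AssociatedContentExtractor(
    extract_candidates=lambda text: text.split("|"),  # placeholder for NLP extraction
    keep_label=lambda label: label != "buy now",      # placeholder screening rule
    click_counts={"plot": 120, "acting": 340},
)
ordered_labels = extractor.run("plot|acting|buy now|soundtrack")
```

Injecting the extraction and screening steps as callables keeps the ordering logic independent of whichever natural language processing method or matching rule is actually used.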
In some optional implementations, the determining unit 420 can be further used to:
extract, based on natural language processing, the candidate comment labels associated with the default query object from the pending data.
In some optional implementations, the first screening unit 430 can be further used to:
remove, based on a preset matching rule, the candidate comment labels that do not match the default query object from the candidate comment labels, thereby selecting the comment labels.
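A rule-based screening step of this kind might look like the sketch below. The form of the preset matching rule is not given here, so representing it as per-category exclusion patterns is an assumption, and `EXCLUSION_RULES`, the category name, and the sample labels are hypothetical.

```python
import re

# Hypothetical rule table: for each category of query object, patterns that
# mark a candidate label as not matching that kind of object.
EXCLUSION_RULES = {
    "movie": [re.compile(r"battery|shipping")],
}


def screen_labels(candidates, category, rules=EXCLUSION_RULES):
    """Keep only the candidate labels that no exclusion pattern matches."""
    patterns = rules.get(category, [])
    return [
        label for label in candidates
        if not any(pattern.search(label) for pattern in patterns)
    ]


kept = screen_labels(["moving plot", "fast shipping", "great cast"], "movie")
```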
In some optional implementations, the pending data can include comment data on the default query object.
In these optional implementations, the associated content extraction device based on data mining can further include: a comment data acquiring unit, for obtaining, from a default hot spot data source, comment data that contains the default query object; a weight determining unit, for determining the weight of each piece of comment data; and a second display unit, for determining the display order of the comment data based on the weight of each piece of comment data.
In some optional implementations, the comment data acquiring unit can be further used to: obtain, from the default hot spot data source, candidate comment data that contains the default query object; and determine the comment data from the candidate comment data based on the page view volume of each piece of candidate comment data.
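The page-view-based selection can be sketched as a simple threshold filter. The text only says that the selection is based on page view volume, so the minimum-threshold rule, the `page_views` field, and the default value of 1000 are assumptions.

```python
def select_comment_data(candidates, min_page_views=1000):
    """Keep candidate comments whose page views reach the assumed threshold."""
    return [c for c in candidates if c["page_views"] >= min_page_views]


candidates = [
    {"text": "in-depth review", "page_views": 5200},
    {"text": "first!", "page_views": 40},
    {"text": "detailed comparison", "page_views": 1000},
]
selected = [c["text"] for c in select_comment_data(candidates)]
```

A top-k selection by page views would be an equally plausible reading of the same step.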
In some optional implementations, the weight determining unit can be further used to determine the weight of each piece of comment data based on any one of the following:
Determining the weight of the comment data based on whether the comment data contains a focus word whose number of co-occurrences with the default query object exceeds a preset number.
Determining the quality score of the comment data based on a machine learning algorithm, and determining the weight of the comment data based on the quality score.
And determining the weight of the comment data based on users' click volume on the comment data.
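The first criterion in the list above can be sketched as follows. This is an illustration under assumptions: whitespace tokenization stands in for real word segmentation, co-occurrence is counted once per comment, and the function names, the `bonus` value, and the sample corpus are hypothetical.

```python
from collections import Counter


def find_focus_words(corpus, query_object, preset_count):
    """Words that co-occur with the query object in more than preset_count comments."""
    counts = Counter()
    for text in corpus:
        tokens = set(text.lower().split())
        if query_object in tokens:
            counts.update(tokens - {query_object})
    return {word for word, n in counts.items() if n > preset_count}


def focus_word_weight(comment, focus_words, bonus=1.0):
    """Give a fixed bonus weight to a comment that contains any focus word."""
    return bonus if set(comment.lower().split()) & focus_words else 0.0


corpus = [
    "phonex camera is sharp",
    "phonex camera beats rivals",
    "phonex camera wins",
    "phonex price high",
    "unrelated camera talk",
]
focus_words = find_focus_words(corpus, "phonex", preset_count=2)
weight_hit = focus_word_weight("love the camera", focus_words)
weight_miss = focus_word_weight("nice screen", focus_words)
```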
In some optional implementations, the associated content extraction device based on data mining can also include: a positive rating determining unit, for determining the sentiment orientation of each piece of comment data using a natural language processing tool, and determining the positive rating of the default query object based on the sentiment orientation of each piece of comment data.
In some optional implementations, the associated content extraction device based on data mining can also include: a positive rating curve generation unit, for generating the positive rating curve of the default query object based on the positive rating of the default query object in each preset period.
Referring now to Fig. 5, it shows a schematic structural diagram of a computer system 500 suitable for implementing the server of the embodiments of this application. The server shown in Fig. 5 is only an example and should not impose any limitation on the function or scope of use of the embodiments of this application.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from the storage part 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input part 506 including a keyboard, a mouse, and the like; an output part 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage part 508 including a hard disk and the like; and a communication part 509 including a network interface card such as a LAN card or a modem. The communication part 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A detachable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage part 508 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such embodiments, the computer program can be downloaded and installed from a network through the communication part 509, and/or installed from the detachable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above-described functions defined in the method of this application are performed. It should be noted that the computer-readable medium described in this application can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
More specific examples of computer-readable storage media can include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium can be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium can be transmitted using any suitable medium, including but not limited to: wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram can represent a module, a program segment, or a portion of code, and that module, program segment, or portion of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks can occur in an order different from the order marked in the drawings. For example, two blocks shown in succession can in fact be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of this application can be implemented in software or in hardware. The described units can also be provided in a processor; for example, this can be described as: a processor including a pending data acquiring unit, a determining unit, a first screening unit, and a first display unit. The names of these units do not, under certain circumstances, limit the units themselves; for example, the pending data acquiring unit can also be described as "a unit for obtaining pending data".
In another aspect, this application also provides a computer-readable medium. The computer-readable medium can be included in the device described in the above embodiments, or it can exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain pending data, the pending data including a default query object; determine, in the pending data, the candidate comment labels associated with the default query object; select comment labels from among the candidate comment labels; and determine the presentation order of the comment labels based on users' click volume on each comment label.
The above description is only of the preferred embodiments of this application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by the specific combination of the above technical features. It should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features that have similar functions and are disclosed in (but not limited to) this application.

Claims (18)

  1. An associated content extraction method based on data mining, characterized by comprising:
    obtaining pending data, the pending data including a default query object;
    determining, in the pending data, candidate comment labels associated with the default query object;
    selecting comment labels from among the candidate comment labels; and
    determining a presentation order of the comment labels based on users' click volume on each comment label.
  2. The method according to claim 1, characterized in that determining, in the pending data, the candidate comment labels associated with the default query object comprises:
    extracting, based on a natural language processing method, the candidate comment labels associated with the default query object from the pending data.
  3. The method according to claim 1, characterized in that selecting the comment labels from among the candidate comment labels comprises:
    removing, based on a preset matching rule, the candidate comment labels that do not match the default query object from the candidate comment labels, thereby selecting the comment labels.
  4. The method according to any one of claims 1-3, characterized in that the pending data includes comment data on the default query object, and the method further comprises:
    obtaining, from a default hot spot data source, comment data that contains the default query object;
    determining a weight of each piece of the comment data; and
    determining a display order of the comment data based on the weight of each piece of the comment data.
  5. The method according to claim 4, characterized in that obtaining, from the default hot spot data source, the comment data that contains the default query object comprises:
    obtaining, from the default hot spot data source, candidate comment data that contains the default query object; and
    determining the comment data from the candidate comment data based on the page view volume of each piece of the candidate comment data.
  6. The method according to claim 4, characterized in that determining the weight of each piece of the comment data comprises determining the weight of each piece of the comment data based on any one of the following:
    determining the weight of the comment data based on whether the comment data contains a focus word whose number of co-occurrences with the default query object exceeds a preset number;
    determining a quality score of the comment data based on a machine learning algorithm, and determining the weight of the comment data based on the quality score; and
    determining the weight of the comment data based on users' click volume on the comment data.
  7. The method according to any one of claims 4-6, characterized by further comprising:
    determining the sentiment orientation of each piece of the comment data using a natural language processing tool, and determining the positive rating of the default query object based on the sentiment orientation of each piece of the comment data.
  8. The method according to claim 7, characterized by further comprising:
    generating a positive rating curve of the default query object based on the positive rating of the default query object in each preset period.
  9. An associated content extraction device based on data mining, characterized by comprising:
    a pending data acquiring unit, for obtaining pending data, the pending data including a default query object;
    a determining unit, for determining, in the pending data, candidate comment labels associated with the default query object;
    a first screening unit, for selecting comment labels from among the candidate comment labels; and
    a first display unit, for determining a presentation order of the comment labels based on users' click volume on each comment label.
  10. The device according to claim 9, characterized in that the determining unit is further used to:
    extract, based on natural language processing, the candidate comment labels associated with the default query object from the pending data.
  11. The device according to claim 9, characterized in that the first screening unit is further used to:
    remove, based on a preset matching rule, the candidate comment labels that do not match the default query object from the candidate comment labels, thereby selecting the comment labels.
  12. The device according to any one of claims 9-11, characterized in that the pending data includes comment data on the default query object, and the device further comprises:
    a comment data acquiring unit, for obtaining, from a default hot spot data source, comment data that contains the default query object;
    a weight determining unit, for determining a weight of each piece of the comment data; and
    a second display unit, for determining a display order of the comment data based on the weight of each piece of the comment data.
  13. The device according to claim 12, characterized in that the comment data acquiring unit is further used to:
    obtain, from the default hot spot data source, candidate comment data that contains the default query object; and
    determine the comment data from the candidate comment data based on the page view volume of each piece of the candidate comment data.
  14. The device according to claim 12, characterized in that the weight determining unit is further used to determine the weight of each piece of the comment data based on any one of the following:
    determining the weight of the comment data based on whether the comment data contains a focus word whose number of co-occurrences with the default query object exceeds a preset number;
    determining a quality score of the comment data based on a machine learning algorithm, and determining the weight of the comment data based on the quality score; and
    determining the weight of the comment data based on users' click volume on the comment data.
  15. The device according to any one of claims 12-14, characterized by further comprising:
    a positive rating determining unit, for determining the sentiment orientation of each piece of the comment data using a natural language processing tool, and determining the positive rating of the default query object based on the sentiment orientation of each piece of the comment data.
  16. The device according to claim 15, characterized by further comprising:
    a positive rating curve generation unit, for generating a positive rating curve of the default query object based on the positive rating of the default query object in each preset period.
  17. A server, comprising:
    one or more processors; and
    a storage device, for storing one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-8.
  18. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-8.
CN201710976636.3A 2017-10-19 2017-10-19 Associated content extraction method and device based on data mining Active CN107679217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710976636.3A CN107679217B (en) 2017-10-19 2017-10-19 Associated content extraction method and device based on data mining

Publications (2)

Publication Number Publication Date
CN107679217A true CN107679217A (en) 2018-02-09
CN107679217B CN107679217B (en) 2021-12-07

Family

ID=61141669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710976636.3A Active CN107679217B (en) 2017-10-19 2017-10-19 Associated content extraction method and device based on data mining

Country Status (1)

Country Link
CN (1) CN107679217B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491463A (en) * 2018-03-05 2018-09-04 科大讯飞股份有限公司 Label determines method and device
CN109271609A (en) * 2018-09-14 2019-01-25 广州神马移动信息科技有限公司 Label generating method, device, terminal device and computer storage medium
CN110087118A (en) * 2019-04-26 2019-08-02 北京达佳互联信息技术有限公司 Comment on message treatment method, device, terminal, server and medium
CN110598786A (en) * 2019-09-09 2019-12-20 京东方科技集团股份有限公司 Neural network training method, semantic classification method and semantic classification device
CN110730382A (en) * 2019-09-27 2020-01-24 北京达佳互联信息技术有限公司 Video interaction method, device, terminal and storage medium
WO2020042376A1 (en) * 2018-08-31 2020-03-05 北京字节跳动网络技术有限公司 Method and apparatus for outputting information
CN111125028A (en) * 2019-12-25 2020-05-08 腾讯音乐娱乐科技(深圳)有限公司 Method, device, server and storage medium for identifying audio file
CN111177569A (en) * 2020-01-07 2020-05-19 腾讯科技(深圳)有限公司 Recommendation processing method, device and equipment based on artificial intelligence
CN111629270A (en) * 2019-02-27 2020-09-04 北京搜狗科技发展有限公司 Candidate item determination method and device and machine-readable medium
CN113553421A (en) * 2021-06-22 2021-10-26 北京百度网讯科技有限公司 Comment text generation method and device, electronic equipment and storage medium
CN113707335A (en) * 2021-09-06 2021-11-26 挂号网(杭州)科技有限公司 Method, device, electronic equipment and storage medium for determining target reception user

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102081642A (en) * 2010-10-28 2011-06-01 华南理工大学 Chinese label extraction method for clustering search results of search engine
CN104598607A (en) * 2015-01-29 2015-05-06 百度在线网络技术(北京)有限公司 Method and system for recommending search phrase
US20170060999A1 (en) * 2015-09-01 2017-03-02 Electronics And Telecommunications Research Institute Apparatus and method for tagging topic to content
CN107153641A (en) * 2017-05-08 2017-09-12 北京百度网讯科技有限公司 Comment information determines method, device, server and storage medium

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491463A (en) * 2018-03-05 2018-09-04 科大讯飞股份有限公司 Label determines method and device
WO2020042376A1 (en) * 2018-08-31 2020-03-05 北京字节跳动网络技术有限公司 Method and apparatus for outputting information
CN109271609A (en) * 2018-09-14 2019-01-25 广州神马移动信息科技有限公司 Label generating method, device, terminal device and computer storage medium
CN111629270A (en) * 2019-02-27 2020-09-04 北京搜狗科技发展有限公司 Candidate item determination method and device and machine-readable medium
CN110087118A (en) * 2019-04-26 2019-08-02 北京达佳互联信息技术有限公司 Comment on message treatment method, device, terminal, server and medium
CN110087118B (en) * 2019-04-26 2022-01-21 北京达佳互联信息技术有限公司 Comment message processing method, comment message processing device, comment message processing terminal, comment message processing server and comment message processing medium
CN110598786A (en) * 2019-09-09 2019-12-20 京东方科技集团股份有限公司 Neural network training method, semantic classification method and semantic classification device
CN110598786B (en) * 2019-09-09 2022-01-07 京东方科技集团股份有限公司 Neural network training method, semantic classification method and semantic classification device
CN110730382B (en) * 2019-09-27 2020-10-30 北京达佳互联信息技术有限公司 Video interaction method, device, terminal and storage medium
CN110730382A (en) * 2019-09-27 2020-01-24 北京达佳互联信息技术有限公司 Video interaction method, device, terminal and storage medium
CN111125028A (en) * 2019-12-25 2020-05-08 腾讯音乐娱乐科技(深圳)有限公司 Method, device, server and storage medium for identifying audio file
CN111125028B (en) * 2019-12-25 2023-10-24 腾讯音乐娱乐科技(深圳)有限公司 Method, device, server and storage medium for identifying audio files
CN111177569A (en) * 2020-01-07 2020-05-19 腾讯科技(深圳)有限公司 Recommendation processing method, device and equipment based on artificial intelligence
CN113553421A (en) * 2021-06-22 2021-10-26 北京百度网讯科技有限公司 Comment text generation method and device, electronic equipment and storage medium
CN113707335A (en) * 2021-09-06 2021-11-26 挂号网(杭州)科技有限公司 Method, device, electronic equipment and storage medium for determining target reception user

Also Published As

Publication number Publication date
CN107679217B (en) 2021-12-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant