CN108959550A - User focus point mining method, apparatus, device, and computer-readable medium - Google Patents

User focus point mining method, apparatus, device, and computer-readable medium

Info

Publication number
CN108959550A
Authority
CN
China
Prior art keywords
focus
user
class
entity
retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810712526.0A
Other languages
Chinese (zh)
Other versions
CN108959550B (en)
Inventor
刘昊
何伯磊
肖欣延
吕雅娟
吴甜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810712526.0A
Publication of CN108959550A
Application granted
Publication of CN108959550B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a method for mining user focus points, comprising: obtaining user retrieval behavior data; and, if both a theme-class focus point and an entity-class focus point are mined from the user retrieval behavior data, performing expansion processing on the entity-class focus point to obtain associated focus points of the entity-class focus point. By mining theme-class focus points, embodiments of the invention obtain a user's long-term, broad focus points; by mining entity-class focus points, they obtain the user's short-term, specific focus points; and by expanding the user's entity focus points, they make the mined entity focus points more comprehensive. The mined focus points therefore match the user's true interests more closely and more completely, which helps provide suitable recommendations to the user.

Description

User focus point mining method, apparatus, device, and computer-readable medium
Technical field
The present invention relates to the field of data mining technology, and in particular to a machine-reading-based method and apparatus for mining user focus points, a device, and a computer-readable medium.
Background
A focus point is a content tag that reflects a user's interest: a theme, topic, or entity that the user follows on an ongoing basis. Personalized recommendation systems use focus points as basic features for modeling users and text content, in support of precise, personalized content distribution.
Existing personalized recommendation systems based on focus points mainly mine a user's focus points from the user's behavior within the system, such as browsing, liking, and bookmarking. This approach has two problems: 1) for a cold-start user, the initial focus points may be biased, so the user model converges very slowly and a model built on them carries that bias; 2) early versions of a recommender system are not yet optimal and may stamp biased focus points into the user model, so that the later, optimized system cannot effectively influence the user.
Summary of the invention
Embodiments of the present invention provide a method, apparatus, device, and computer-readable medium for mining user focus points, to solve or alleviate one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a method for mining user focus points, comprising:
obtaining user retrieval behavior data;
if both a theme-class focus point and an entity-class focus point are mined from the user retrieval behavior data, performing expansion processing on the entity-class focus point to obtain associated focus points of the entity-class focus point.
With reference to the first aspect, in a first implementation of the first aspect, the method further comprises:
establishing a retrieval intent recognition model using at least one of a deep neural network, a convolutional neural network, a recurrent neural network, and a long short-term memory network, to identify the retrieval intent in the user retrieval behavior data, wherein the retrieval intent is navigational, informational, or transactional;
correspondingly, the method further comprises:
using the retrieval intent recognition model to obtain, from the user retrieval behavior data, the user retrieval behavior data whose retrieval intent is informational;
correspondingly, the method further comprises:
mining the theme-class focus point and/or the entity-class focus point from the informational user retrieval behavior data.
With reference to the first aspect, in a second implementation of the first aspect, the informational user retrieval behavior data comprises at least one of query text, clicked titles, displayed titles, and clicked links, and mining the theme-class focus point from the informational data comprises:
using a theme-class prediction model built on a deep neural network to match theme-class focus points in the query text, clicked titles, displayed titles, and clicked links according to a set of predefined theme system labels.
With reference to the first implementation of the first aspect, in a third implementation of the first aspect,
the informational user retrieval behavior data comprises at least one of query text, clicked titles, displayed titles, and clicked links, and mining the entity-class focus point from the informational user retrieval behavior data comprises:
obtaining candidate entities from the informational user retrieval behavior data;
using a similarity calculation model built on a deep neural network to calculate the semantic similarity between each candidate entity and the query text;
matching the entity-class focus point from the query text according to the semantic similarity.
With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect, obtaining candidate entities from the informational user retrieval behavior data comprises:
performing inverted indexing, named entity recognition, and term weight ranking on the query text in the informational user retrieval behavior data to obtain the candidate entities.
With reference to the first aspect or any implementation of the first aspect, in a fifth implementation of the first aspect, the method further comprises:
if the user retrieval behavior data is the user's historical retrieval behavior data within a set time period, aggregating the theme-class focus points within the set time period, and aggregating the entity-class focus points and their associated focus points within the set time period;
if the user retrieval behavior data is the user's real-time retrieval behavior data, updating the user's current focus points according to the weights of the mined theme-class and entity-class focus points.
In a second aspect, an embodiment of the present invention further provides an apparatus for mining user focus points, comprising:
an obtaining module, configured to obtain user retrieval behavior data;
an expansion processing module, configured to, if both a theme-class focus point and an entity-class focus point are mined from the user retrieval behavior data, perform expansion processing on the entity-class focus point to obtain associated focus points of the entity-class focus point.
With reference to the second aspect, in a first implementation of the second aspect,
the apparatus further comprises:
a retrieval intent recognition module, configured to establish a retrieval intent recognition model using at least one of a deep neural network, a convolutional neural network, a recurrent neural network, and a long short-term memory network, to identify the retrieval intent in the user retrieval behavior data, wherein the retrieval intent is navigational, informational, or transactional;
a retrieval data acquisition module, configured to use the retrieval intent recognition model to obtain, from the user retrieval behavior data, the user retrieval behavior data whose retrieval intent is informational;
a focus point mining module, configured to mine the theme-class focus point and/or the entity-class focus point from the informational user retrieval behavior data.
With reference to the first implementation of the second aspect, in a second implementation of the second aspect, the informational user retrieval behavior data comprises at least one of query text, clicked titles, displayed titles, and clicked links, and the focus point mining module, when mining theme-class focus points, is configured to use a theme-class prediction model built on a deep neural network to match theme-class focus points in the query text, clicked titles, displayed titles, and clicked links according to a set of predefined theme system labels.
With reference to the first implementation of the second aspect, in a third implementation of the second aspect,
the informational user retrieval behavior data comprises at least one of query text, clicked titles, displayed titles, and clicked links, and the focus point mining module comprises:
a candidate entity acquisition submodule, configured to obtain candidate entities from the informational user retrieval behavior data;
a similarity calculation submodule, configured to use a similarity calculation model built on a deep neural network to calculate the semantic similarity between each candidate entity and the query text;
a matching submodule, configured to match the entity-class focus point from the query text according to the semantic similarity.
With reference to the third implementation of the second aspect, in a fourth implementation of the second aspect, the candidate entity acquisition submodule is specifically configured to perform inverted indexing, named entity recognition, and term weight ranking on the query text in the informational user retrieval behavior data to obtain the candidate entities.
With reference to the second aspect or any implementation of the second aspect, in a fifth implementation of the second aspect, the apparatus further comprises:
an aggregation module, configured to, if the user retrieval behavior data is the user's historical retrieval behavior data within a set time period, aggregate the theme-class focus points within the set time period, and aggregate the entity-class focus points and their associated focus points within the set time period;
an update module, configured to, if the user retrieval behavior data is the user's real-time retrieval behavior data, update the user's current focus points according to the weights of the mined theme-class and entity-class focus points.
The functions of the apparatus may be implemented in hardware, or by hardware executing corresponding software. The hardware or software comprises one or more modules corresponding to the functions above.
In a third aspect, in one possible design, the structure of the user focus point mining apparatus includes a processor and a memory. The memory stores a program that supports the apparatus in executing the user focus point mining method of the first aspect, and the processor is configured to execute the program stored in the memory. The apparatus may further include a communication interface for communication between the apparatus and other devices or communication networks.
In a fourth aspect, an embodiment of the present invention provides a computer-readable medium for storing the computer software instructions used by the user focus point mining apparatus, including a program for executing the user focus point mining method of the first aspect.
By mining theme-class focus points, embodiments of the invention obtain a user's long-term, broad focus points; by mining entity-class focus points, they obtain the user's short-term, specific focus points; and by expanding the user's entity focus points, they make the mined entity focus points more comprehensive. The mined focus points therefore match the user's true interests more closely and more completely, which helps provide suitable recommendations to the user.
Further, mining focus points offline in advance from historical retrieval behavior data, and then updating them online with real-time retrieval behavior data, can speed up the convergence of focus points for cold-start users.
The above summary is provided for the purpose of the specification only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and the following detailed description.
Brief description of the drawings
In the drawings, unless otherwise specified, the same reference numerals denote the same or similar parts or elements throughout the several figures. The drawings are not necessarily drawn to scale. It should be understood that the drawings depict only some embodiments disclosed in accordance with the present invention and should not be regarded as limiting its scope.
Fig. 1 is a flowchart of a user focus point mining method according to one embodiment of the present invention;
Fig. 2 is a flowchart of the specific steps for obtaining focus points in one embodiment of the present invention;
Fig. 3 is a module diagram of the retrieval intent recognition model of one embodiment of the present invention;
Fig. 4 is a flowchart of the specific steps for obtaining entity-class focus points in one embodiment of the present invention;
Fig. 5 is a module diagram of entity-class focus point acquisition in one embodiment of the present invention;
Fig. 6 is a flowchart of the steps of a user focus point mining method according to another embodiment of the present invention;
Fig. 7 is a block diagram of a user focus point mining apparatus according to one embodiment of the present invention;
Fig. 8 is a connection block diagram of a user focus point mining apparatus according to another embodiment of the present invention;
Fig. 9 is a module block diagram of focus point acquisition in one embodiment of the present invention;
Fig. 10 is a connection block diagram of a user focus point mining apparatus according to another embodiment of the present invention;
Fig. 11 is an architecture diagram of a retrieval-behavior-based user focus point mining system according to one embodiment of the present invention;
Fig. 12 is a block diagram of a user focus point mining device according to one embodiment of the present invention.
Detailed description
Only certain exemplary embodiments are briefly described below. As those skilled in the art will recognize, the described embodiments may be modified in various ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature rather than restrictive. Embodiments of the present invention mainly provide a method and apparatus for mining user focus points; the technical solution is described in detail through the following embodiments.
The present invention provides a user focus point mining method and apparatus. The specific processing flow and principles of the method and apparatus of the embodiments are described in detail below.
Fig. 1 is a flowchart of the user focus point mining method of an embodiment of the present invention.
An embodiment of the present invention provides a user focus point mining method, comprising:
S110: obtaining user retrieval behavior data.
In one embodiment, the user retrieval behavior data includes historical retrieval behavior data within a set time period and real-time retrieval behavior data from the retrieval the user is currently performing. The set time period may be any time range before the current time, chosen according to the characteristics of different users or the needs of the actual application scenario. For example, the retrieval records of the three or six months before the current time may be used as the historical retrieval behavior data. The user's real-time retrieval behavior data may be the retrieval record of the query the user is currently making. In this embodiment, the user's retrieval records may be obtained from a search engine such as Baidu.
S120: if both a theme-class focus point and an entity-class focus point are mined from the user retrieval behavior data, performing expansion processing on the entity-class focus point to obtain associated focus points of the entity-class focus point.
From a user's retrieval behavior, the user's theme-class focus points can be mined, that is, the fields the user cares about, such as sports or entertainment. More specific focus points, namely entity focus points such as a particular person or event, can also be obtained from the user's retrieval behavior data. After an entity focus point is mined, expansion processing, that is, generalization, is applied to it. For example, when the mined entity focus point is "Curry", generalizing it can yield associated focus points such as "Warriors" and "NBA".
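The condition in S120 and the expansion step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not say how associations are obtained, so the `ASSOCIATIONS` table and the function name `mine_with_expansion` are assumptions, and the data mirrors the basketball example in the text (the player Curry, with "Warriors" and "NBA" as associated focus points).

```python
from typing import Dict, List

# Hypothetical association table. A real system might use a knowledge graph
# or co-occurrence statistics; the source does not specify the mechanism.
ASSOCIATIONS: Dict[str, List[str]] = {
    "Curry": ["Warriors", "NBA"],
}


def mine_with_expansion(theme_points: List[str],
                        entity_points: List[str]) -> Dict[str, List[str]]:
    """Expand entity-class focus points only when both classes were mined."""
    associated: List[str] = []
    if theme_points and entity_points:
        for entity in entity_points:
            for related in ASSOCIATIONS.get(entity, []):
                # Skip duplicates and anything already mined as an entity.
                if related not in associated and related not in entity_points:
                    associated.append(related)
    return {
        "theme": theme_points,
        "entity": entity_points,
        "associated": associated,
    }


result = mine_with_expansion(["sports"], ["Curry"])
print(result["associated"])
```

When no theme-class focus point was mined, the sketch returns an empty association list, which reflects the "both classes mined" condition of S120.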
Fig. 2 illustrates in detail how the theme-class and entity-class focus points are obtained. In one embodiment, the method further comprises:
S130: establishing a retrieval intent recognition model using at least one of a deep neural network, a convolutional neural network, a recurrent neural network, and a long short-term memory network, to identify the retrieval intent in the user retrieval behavior data, wherein the retrieval intent is navigational, informational, or transactional.
According to the user's retrieval intent, retrieval behavior falls into three categories: navigational, informational, and transactional. A navigational query aims to reach a specific website, for example the homepage of a company or organization. An informational query aims to obtain information expected to be present on one or more web pages, such as a tutorial on deep learning or the profile of an entertainment celebrity. A transactional query aims to carry out a web-based activity, such as shopping or downloading software. Because a user's focus points are usually implicit in informational retrieval behavior, only the relevant informational retrieval behavior data need be retained.
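The filtering idea, keeping only informational records, can be sketched with a pluggable classifier. The keyword rules below are an invented stand-in used purely so the example runs; the patent uses a trained neural model for this step, and the record layout (`{"query": ...}`) is likewise an assumption.

```python
from typing import Callable, Dict, List

INTENTS = ("navigational", "informational", "transactional")


def rule_based_intent(query: str) -> str:
    """Toy stand-in for the intent model; the keyword lists are invented."""
    q = query.lower()
    if any(word in q for word in ("homepage", "official site", "login")):
        return "navigational"
    if any(word in q for word in ("buy", "download", "price")):
        return "transactional"
    return "informational"


def keep_informational(
    records: List[Dict[str, str]],
    classify: Callable[[str], str] = rule_based_intent,
) -> List[Dict[str, str]]:
    """Retain only the records whose query is classified as informational."""
    return [r for r in records if classify(r["query"]) == "informational"]


records = [
    {"query": "Baidu homepage"},
    {"query": "deep learning tutorial"},
    {"query": "download video player"},
]
kept = keep_informational(records)
print([r["query"] for r in kept])
```

Passing the classifier as a parameter keeps the filter independent of how intent is actually predicted, so a trained model could replace the rules without changing the filter.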
In one embodiment, a retrieval intent recognition model can be established and used to classify the user's retrieval intent. Fig. 3 shows the module diagram of the model, which comprises a feature layer, a representation layer, and a classification layer. The feature layer holds retrieval behaviors and results such as the query text, clicked titles, and displayed titles. The feature layer feeds the acquired behavior data to the representation layer, the representation layer passes its computed result to the classification layer, and the classification layer outputs the recognized intent. The representation layer can be built using at least one of a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), and a long short-term memory network (LSTM). The retrieval intent recognition model can thus screen the retrieval behavior data to obtain the informational retrieval behavior data.
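The data flow through the three layers can be illustrated with a small forward pass in plain Python. Everything concrete here is an assumption made for illustration: the hashed bag-of-words features, the layer sizes, the single dense `tanh` layer standing in for the representation layer, and the random, untrained weights. The output is therefore a valid probability distribution over the three intents but not a meaningful prediction.

```python
import math
import random
from typing import Dict, List

INTENTS = ["navigational", "informational", "transactional"]
VOCAB_BUCKETS = 32  # size of the hashed bag-of-words vector (assumed)
HIDDEN = 8          # width of the representation layer (assumed)

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(VOCAB_BUCKETS)]
      for _ in range(HIDDEN)]
W2 = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]
      for _ in range(len(INTENTS))]


def bucket(token: str) -> int:
    """Deterministic toy hash of a token into a feature bucket."""
    return sum(ord(c) for c in token) % VOCAB_BUCKETS


def feature_layer(query: str, clicked_title: str,
                  shown_title: str) -> List[float]:
    """Combine query text, clicked title and displayed title into one vector."""
    vec = [0.0] * VOCAB_BUCKETS
    for text in (query, clicked_title, shown_title):
        for token in text.lower().split():
            vec[bucket(token)] += 1.0
    return vec


def representation_layer(features: List[float]) -> List[float]:
    """One dense layer with tanh, standing in for a DNN/CNN/RNN/LSTM encoder."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in W1]


def classification_layer(hidden: List[float]) -> List[float]:
    """Linear layer followed by a numerically stable softmax."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W2]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_intent(query: str, clicked_title: str,
                   shown_title: str) -> Dict[str, float]:
    hidden = representation_layer(
        feature_layer(query, clicked_title, shown_title))
    return dict(zip(INTENTS, classification_layer(hidden)))


probs = predict_intent(
    "deep learning tutorial",
    "Deep Learning Tutorial for Beginners",
    "Intro to Deep Learning",
)
print(probs)
```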
S140: using the retrieval intent recognition model to obtain, from the user retrieval behavior data, the user retrieval behavior data whose retrieval intent is informational.
S150: mining the theme-class focus point and/or the entity-class focus point from the informational user retrieval behavior data.
In one embodiment, the informational user retrieval behavior data includes at least one of query text, clicked titles, displayed titles, and clicked links. Mining the theme-class focus point from the informational data comprises: using a theme-class prediction model built on a deep neural network to match theme-class focus points in the query text, clicked titles, displayed titles, and clicked links according to a set of predefined theme system labels.
The theme system labels can be defined as needed and may include, for example, broad categories such as sports, entertainment, politics, economy, and education. During matching, similarity can be computed between the features in use and the features of the theme system labels to determine the user's theme-class focus point. For example, if the query text obtained from the user's retrieval record is "Lin Dan", matching against the label system above finds that the label with the highest similarity is "sports", so the user's theme-class focus point is extracted as "sports".
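The label-matching step can be sketched as a nearest-label search under cosine similarity. The three-dimensional vectors below are invented for illustration; in the described method they would come from the trained theme-class prediction model, and cosine similarity is an assumed choice, since the text only says that a similarity is computed.

```python
import math
from typing import Dict, List

# Hypothetical embeddings for the theme system labels and for one query text
# (the badminton player Lin Dan, as in the example from the text).
LABEL_VECTORS: Dict[str, List[float]] = {
    "sports": [0.9, 0.1, 0.0],
    "entertainment": [0.1, 0.9, 0.1],
    "politics": [0.0, 0.1, 0.9],
}
TEXT_VECTORS: Dict[str, List[float]] = {
    "Lin Dan": [0.8, 0.2, 0.1],
}


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_theme(text: str) -> str:
    """Return the theme label whose vector is most similar to the text's."""
    vec = TEXT_VECTORS[text]
    return max(LABEL_VECTORS, key=lambda label: cosine(vec, LABEL_VECTORS[label]))


theme = match_theme("Lin Dan")
print(theme)
```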
As shown in Fig. 4, in one embodiment, mining the entity-class focus point from the informational user retrieval behavior data comprises:
S151: obtaining candidate entities from the informational user retrieval behavior data.
As shown in Fig. 5, in one embodiment, candidate entities can be obtained by applying any one or more of inverted indexing, named entity recognition, and term weight ranking to the query text. An inverted index is an index that maps words to documents. Named entity recognition directly identifies proper nouns such as person names and place names. Term weight ranking segments the query text into terms and ranks them by importance. For example, if the acquired user retrieval behavior data is "Curry's birthday and birthplace", the candidate entities obtained in this way are "Curry", "birthday", and "birthplace".
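The three candidate-generation techniques can be sketched together on a toy example. The whitespace tokenizer, the stop-word list, the entity dictionary standing in for a named entity recognizer, and the term weights are all assumptions; the query is an English paraphrase of the example in the text.

```python
from collections import defaultdict
from typing import Dict, List, Set

STOPWORDS = {"and", "of", "the"}
KNOWN_ENTITIES = {"Curry"}  # stand-in for a named entity recognizer
TERM_WEIGHTS = {"Curry": 0.9, "birthday": 0.6, "birthplace": 0.6}  # invented


def build_inverted_index(queries: List[str]) -> Dict[str, Set[int]]:
    """Map each term to the ids of the queries that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, query in enumerate(queries):
        for term in query.split():
            index[term].add(doc_id)
    return index


def candidate_entities(query: str) -> List[str]:
    """Named entities first, then the remaining terms by descending weight."""
    terms = [t for t in query.split() if t not in STOPWORDS]
    named = [t for t in terms if t in KNOWN_ENTITIES]
    ranked = sorted(terms, key=lambda t: TERM_WEIGHTS.get(t, 0.0), reverse=True)
    seen: Set[str] = set()
    candidates: List[str] = []
    for term in named + ranked:
        if term not in seen:
            seen.add(term)
            candidates.append(term)
    return candidates


queries = ["Curry birthday and birthplace", "Curry Warriors"]
index = build_inverted_index(queries)
candidates = candidate_entities(queries[0])
print(candidates)
print(sorted(index["Curry"]))
```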
S152: using a similarity calculation model built on a deep neural network to calculate the semantic similarity between each candidate entity and the query text.
The similarity calculation model first obtains a semantic representation of the query text and a semantic representation of each candidate focus point; that is, after the model has been trained, it computes semantic vectors for the query text and the candidate focus points and then computes the similarity between them. For example, it computes the semantic vectors of the candidate entities "Curry", "birthday", and "birthplace" and the semantic vector of the query text "Curry's birthday and birthplace", and then computes the similarity between each candidate entity and the query text, such as the similarity between "Curry" and "Curry's birthday and birthplace".
S153: matching the entity-class focus point from the query text according to the semantic similarity.
During model learning, different words receive different weights. After the calculation, the candidate entity with the highest similarity is taken as the entity-class focus point. In the example above, the entity output after matching would likely be "Curry", because a person's name is generally more likely than a common noun to be an entity focus point.
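Steps S152 and S153 amount to scoring each candidate against the query and taking the best one. The sketch below assumes cosine similarity over precomputed semantic vectors; the vectors are invented so that the person name scores highest, which is the outcome the text describes, and a trained model would supply real ones.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical semantic vectors for the query text and its candidate entities.
EMBEDDINGS: Dict[str, List[float]] = {
    "Curry birthday and birthplace": [0.7, 0.5, 0.5],
    "Curry": [0.9, 0.3, 0.3],
    "birthday": [0.1, 0.9, 0.2],
    "birthplace": [0.1, 0.2, 0.9],
}


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Score every candidate against the query, highest similarity first."""
    query_vec = EMBEDDINGS[query]
    scored = [(c, cosine(EMBEDDINGS[c], query_vec)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates(
    "Curry birthday and birthplace", ["Curry", "birthday", "birthplace"]
)
best = ranking[0][0]
print(best)
```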
As shown in Fig. 6, in one embodiment, the method further comprises:
S160: if the user retrieval behavior data is the user's historical retrieval behavior data within a set time period, aggregating the theme-class focus points within the set time period, and aggregating the entity-class focus points and their associated focus points within the set time period.
When focus points are extracted from the user's historical retrieval behavior data, the extracted theme-class and entity-class focus points are aggregated, yielding the set of the user's focus points over that period.
S170: if the user retrieval behavior data is the user's real-time retrieval behavior data, updating the user's current focus points according to the weights of the mined theme-class and entity-class focus points.
In this embodiment, a weight can be obtained for each focus point as it is mined. The weight of a focus point can be calculated from how frequently it is extracted.
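Aggregation over a time period and frequency-based weighting can be sketched with a counter. The normalization to relative frequency is an assumption: the text says only that the weight can be calculated from the extraction frequency, not how.

```python
from collections import Counter
from typing import Dict, List


def focus_weights(mined_per_query: List[List[str]]) -> Dict[str, float]:
    """Aggregate the focus points mined per query and weight each one by its
    share of all extractions (relative frequency is an assumed weighting)."""
    counts = Counter(point for points in mined_per_query for point in points)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {point: count / total for point, count in counts.items()}


# Focus points mined from four historical queries (illustrative data).
history = [
    ["sports", "Curry"],
    ["sports", "NBA"],
    ["entertainment"],
    ["sports"],
]
weights = focus_weights(history)
print(weights)
```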
First, each of the user's theme-class focus points and its weight, and each of the user's entity-class focus points and its weight, are mined from the user's historical retrieval behavior data. Subsequently, each theme-class focus point and its weight, and each entity focus point and its weight, are mined from the user's real-time retrieval behavior data. If the real-time focus points and weights obtained from the real-time data differ from the historical focus points and weights obtained from the historical data, the real-time focus points and weights can be used to update the historical ones. The update can follow a set update condition, for example at a fixed interval or according to the size of the weight change.
For example, if a historical focus point is identical to a current focus point but their weights differ, the weight of the historical focus point can be updated to the weight of the current focus point.
As another example, the distinct current and historical focus points can be sorted by weight, with the top-ranked ones taken as the main focus points. If the number of focus points exceeds a set threshold, the lowest-ranked focus points can be deleted.
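The two update rules just described, overwriting the weight of a repeated focus point and truncating the ranked list at a threshold, can be sketched as a single merge function. The function name `update_focus`, the parameter `max_points`, and the sample weights are illustrative assumptions.

```python
from typing import Dict


def update_focus(history: Dict[str, float], realtime: Dict[str, float],
                 max_points: int) -> Dict[str, float]:
    """Merge real-time focus points into the historical ones.

    A focus point present in both takes its real-time weight; the merged
    points are ranked by weight and only the top `max_points` are kept.
    """
    merged = dict(history)
    merged.update(realtime)  # real-time weight replaces the historical one
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:max_points])


history = {"sports": 0.5, "Curry": 0.2, "entertainment": 0.1}
realtime = {"Curry": 0.6, "NBA": 0.3}
current = update_focus(history, realtime, max_points=3)
print(current)
```

Because Python dictionaries preserve insertion order, the returned mapping is ordered from the highest-weighted focus point to the lowest.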
By mining theme-class focus points, embodiments of the invention obtain a user's long-term, broad focus points; by mining entity-class focus points, they obtain the user's short-term, specific focus points; and by expanding the user's entity focus points, they make the mined entity focus points more comprehensive. The mined focus points therefore match the user's true interests more closely and more completely, which helps provide suitable recommendations to the user.
Further, mining focus points offline in advance from historical retrieval behavior data, and then updating them online with real-time retrieval behavior data, can speed up the convergence of focus points for cold-start users.
As shown in fig. 7, in another embodiment, the present invention also provides a kind of user's focus excavating gears, comprising:
Module 110 is obtained, for obtaining user retrieval behavior data;
Expand processing module 120, if for both having excavated theme class focus in the user retrieval behavior data, Entity class focus is excavated again, then expansion processing is carried out to the entity class focus, obtains the entity class focus It is associated with focus.
As shown in figure 8, described device further include:
Intention assessment module 130 is retrieved, for using deep neural network, convolutional neural networks, Recognition with Recurrent Neural Network, length At least one of short-term memory network mode establishes retrieval intention assessment model, to identify the user retrieval behavior data In retrieval be intended to;The retrieval is intended to include navigation type, info class, transactions classes.
Data acquisition module 140 is retrieved, for using the retrieval intention assessment model in the user retrieval behavior number According to the middle user retrieval behavior data for obtaining retrieval and being intended to info class.
Focus excavates module 150, for excavating the theme class in the user retrieval behavior data of the info class Focus and/or the entity class focus.
The user retrieval behavior data that the retrieval is intended to info class include query text, click title, show title With at least one in clickthrough, then the focus excavates module 150 when carrying out the excavation of theme class focus, for adopting With the theme class prediction model constructed based on deep neural network, according to the theme system label of setting, the query text, Title is clicked, shows and matches theme class focus in title and clickthrough.
As shown in Fig. 9, for obtaining entity-class focus points, the focus mining module 150 may include:
A candidate entity acquisition submodule 151, configured to obtain candidate entities from the informational user retrieval behavior data.
A similarity calculation submodule 152, configured to calculate the semantic similarity between each candidate entity and the query text using a similarity calculation model built on a deep neural network.
A matching submodule 153, configured to match entity-class focus points from the query text according to the semantic similarity.
The candidate entity acquisition submodule 151 is specifically configured to perform inverted indexing, named entity recognition, and term weight ranking on the query text in the informational user retrieval behavior data to obtain the candidate entities.
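The three submodules can be approximated by a small pipeline. In this sketch a lexicon lookup stands in for inverted indexing, named entity recognition, and term weight ranking, and bag-of-words cosine similarity stands in for the neural similarity model; `ENTITY_LEXICON`, the threshold, and the function names are assumptions.

```python
import math
from collections import Counter
from typing import List

# Hypothetical entity lexicon (assumption), standing in for the NER step.
ENTITY_LEXICON = {"great wall", "forbidden city"}


def candidate_entities(query: str, lexicon=ENTITY_LEXICON) -> List[str]:
    """Candidate entities: lexicon entries that occur in the query text."""
    q = query.lower()
    return sorted(entity for entity in lexicon if entity in q)


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity, a simple stand-in for semantic similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def match_entity_focus(query: str, threshold: float = 0.5) -> List[str]:
    """Keep candidates whose similarity to the query reaches the threshold."""
    scored = [(entity, cosine(entity, query)) for entity in candidate_entities(query)]
    scored.sort(key=lambda pair: -pair[1])
    return [entity for entity, score in scored if score >= threshold]
```

The threshold acts as the matching criterion: an entity that is merely mentioned inside a long query about something else scores low and is not treated as a focus point.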
As shown in Fig. 10, in one embodiment, the user focus mining apparatus further comprises:
An aggregation module 160, configured to, if the user retrieval behavior data are the user's historical retrieval behavior data within a set time period, aggregate the theme-class focus points within the set time period, and aggregate the entity-class focus points and their associated focus points within the set time period.
An update module 170, configured to, if the user retrieval behavior data are the user's real-time retrieval behavior data, update the user's current focus points according to the weights of the mined theme-class and entity-class focus points.
The functions of the modules of the apparatus in this embodiment follow the same principles as the user focus mining method of the above embodiments, and are therefore not repeated here.
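The aggregation and update modules can be sketched as two small functions. Normalized frequency as the aggregation rule and linear blending as the update rule are assumptions chosen for simplicity; the patent states only that focus points are aggregated over the period and updated according to their weights.

```python
from collections import Counter
from typing import Dict, Iterable, List


def aggregate(history: Iterable[List[str]]) -> Dict[str, float]:
    """Aggregate focus points mined over a time period into normalized weights.

    `history` holds one list of mined focus points per query or session.
    """
    counts = Counter(focus for session in history for focus in session)
    total = sum(counts.values())
    return {focus: count / total for focus, count in counts.items()} if total else {}


def update(current: Dict[str, float],
           realtime: Dict[str, float],
           rate: float = 0.5) -> Dict[str, float]:
    """Blend real-time focus weights into the user's current focus weights."""
    keys = set(current) | set(realtime)
    return {k: (1 - rate) * current.get(k, 0.0) + rate * realtime.get(k, 0.0)
            for k in keys}
```

With `rate` close to 1 the profile follows the latest behavior quickly; with `rate` close to 0 it stays near the aggregated history.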
In one embodiment, Fig. 11 shows an architecture diagram of a retrieval-behavior-based user focus mining system provided by an embodiment of the present invention. The system may mainly include:
1) A warm-start module. Its input is the user's historical retrieval behavior data over a period of time, and its output is the user's historical focus points during that period (in two categories, theme and entity). This serves as the warm-start model for each user.
2) An online update computing module. Its input is the user's real-time retrieval behavior data (usually the information of a single query or a single session), and its output is the user's real-time focus points. The focus points mined in real time can be used to re-weight the user model learned by the warm-start module (adjust the weight of a given focus point) or to retire focus points that are no longer valid.
Each module includes two submodules, one for entity focus points and one for theme focus points. Each submodule performs retrieval (query) intention analysis and focus extraction (theme type or entity type). In the entity focus (ATT) submodule, focus rewriting (for example, expansion processing) is also performed after focus extraction. In the theme focus submodule, focus points are extracted mainly through text topic analysis. In addition, for historical retrieval behavior data, the focus points obtained by each submodule also need to be aggregated separately. For the specific implementation of each module, refer to the related description in the user focus mining method embodiments above; details are not repeated here.
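The interplay between the warm-start model and the online update module (re-weighting and retiring focus points) might look like the following sketch. The class name, the multiplicative decay, the additive boost, and the retirement floor are all hypothetical choices; the patent does not prescribe a specific update rule.

```python
from typing import Dict, Iterable


class UserFocusModel:
    """Per-user focus weights, initialized from the warm-start (offline) result."""

    def __init__(self, warm_start: Dict[str, float],
                 decay: float = 0.5, floor: float = 0.2) -> None:
        self.weights = dict(warm_start)
        self.decay = decay    # factor applied to focus points not seen in real time
        self.floor = floor    # weights below this value are retired

    def online_update(self, realtime: Iterable[str], boost: float = 1.0) -> None:
        """Re-weight the model with focus points mined from one query or session."""
        seen = set(realtime)
        # Decay focus points that did not appear in the real-time data.
        for focus in list(self.weights):
            if focus not in seen:
                self.weights[focus] *= self.decay
        # Boost (or newly add) focus points that did appear.
        for focus in seen:
            self.weights[focus] = self.weights.get(focus, 0.0) + boost
        # Retire focus points whose weight has dropped below the floor.
        self.weights = {f: w for f, w in self.weights.items() if w >= self.floor}
```

Starting from the warm-start weights rather than from an empty profile is what lets the online updates converge faster for a given user.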
In one embodiment, the present invention further provides user focus mining equipment. As shown in Fig. 12, the equipment includes a memory 510 and a processor 520, the memory 510 storing a computer program executable on the processor 520. The processor 520 implements the user focus mining method of the above embodiments when executing the computer program. There may be one or more memories 510 and one or more processors 520.
The equipment further includes:
A communication interface 530, configured to communicate with external devices for data exchange.
The memory 510 may include high-speed RAM and may further include non-volatile memory, for example at least one magnetic disk memory.
If the memory 510, the processor 520, and the communication interface 530 are implemented independently, they may be connected to one another and communicate with one another through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 12, but this does not mean that there is only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 510, the processor 520, and the communication interface 530 are integrated on a single chip, they may communicate with one another through an internal interface.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means two or more, unless specifically defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention pertain.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.
The computer-readable medium described in the embodiments of the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. More specific examples (a non-exhaustive list) of computer-readable storage media include at least the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable storage medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
In the embodiments of the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by, or in connection with, an instruction execution system, input method, or apparatus. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), or any suitable combination thereof.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logical functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, or each unit may exist physically on its own, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. A user focus mining method, characterized by comprising:
Obtaining user retrieval behavior data;
If both a theme-class focus point and an entity-class focus point are mined from the user retrieval behavior data, performing expansion processing on the entity-class focus point to obtain an associated focus point of the entity-class focus point.
2. The method according to claim 1, characterized in that the method further comprises:
Building a retrieval intention recognition model using at least one of a deep neural network, a convolutional neural network, a recurrent neural network, and a long short-term memory network, so as to recognize the retrieval intention in the user retrieval behavior data, the retrieval intentions including navigational, informational, and transactional;
Correspondingly, the method further comprises:
Using the retrieval intention recognition model to obtain, from the user retrieval behavior data, the user retrieval behavior data whose retrieval intention is informational;
Correspondingly, the method further comprises:
Mining the theme-class focus point and/or the entity-class focus point from the informational user retrieval behavior data.
3. The method according to claim 2, characterized in that the user retrieval behavior data whose retrieval intention is informational include at least one of a query text, a clicked title, a displayed title, and a clicked link, and mining the theme-class focus point from the informational data comprises:
Using a theme-class prediction model built on a deep neural network to match, according to preset theme-system labels, theme-class focus points in the query text, clicked title, displayed title, and clicked link.
4. The method according to claim 2, characterized in that the user retrieval behavior data whose retrieval intention is informational include at least one of a query text, a clicked title, a displayed title, and a clicked link, and mining the entity-class focus point from the informational user retrieval behavior data comprises:
Obtaining candidate entities from the informational user retrieval behavior data;
Calculating the semantic similarity between each candidate entity and the query text using a similarity calculation model built on a deep neural network;
Matching entity-class focus points from the query text according to the semantic similarity.
5. The method according to claim 4, characterized in that obtaining candidate entities from the informational user retrieval behavior data comprises:
Performing inverted indexing, named entity recognition, and term weight ranking on the query text in the informational user retrieval behavior data to obtain the candidate entities.
6. The method according to any one of claims 1-5, characterized in that the method further comprises:
If the user retrieval behavior data are the user's historical retrieval behavior data within a set time period, aggregating the theme-class focus points within the set time period, and aggregating the entity-class focus points and their associated focus points within the set time period;
If the user retrieval behavior data are the user's real-time retrieval behavior data, updating the user's current focus points according to the weights of the mined theme-class and entity-class focus points.
7. A user focus mining apparatus, characterized by comprising:
An acquisition module, configured to obtain user retrieval behavior data;
An expansion processing module, configured to, if both a theme-class focus point and an entity-class focus point are mined from the user retrieval behavior data, perform expansion processing on the entity-class focus point to obtain an associated focus point of the entity-class focus point.
8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
A retrieval intention recognition module, configured to build a retrieval intention recognition model using at least one of a deep neural network, a convolutional neural network, a recurrent neural network, and a long short-term memory network, so as to recognize the retrieval intention in the user retrieval behavior data, the retrieval intentions including navigational, informational, and transactional;
A retrieval data acquisition module, configured to use the retrieval intention recognition model to obtain, from the user retrieval behavior data, the user retrieval behavior data whose retrieval intention is informational;
A focus mining module, configured to mine the theme-class focus point and/or the entity-class focus point from the informational user retrieval behavior data.
9. The apparatus according to claim 8, characterized in that the user retrieval behavior data whose retrieval intention is informational include at least one of a query text, a clicked title, a displayed title, and a clicked link, and the focus mining module, when mining theme-class focus points, is configured to use a theme-class prediction model built on a deep neural network to match, according to preset theme-system labels, theme-class focus points in the query text, clicked title, displayed title, and clicked link.
10. The apparatus according to claim 8, characterized in that the user retrieval behavior data whose retrieval intention is informational include at least one of a query text, a clicked title, a displayed title, and a clicked link, and the focus mining module comprises:
A candidate entity acquisition submodule, configured to obtain candidate entities from the informational user retrieval behavior data;
A similarity calculation submodule, configured to calculate the semantic similarity between each candidate entity and the query text using a similarity calculation model built on a deep neural network;
A matching submodule, configured to match entity-class focus points from the query text according to the semantic similarity.
11. The apparatus according to claim 10, characterized in that the candidate entity acquisition submodule is specifically configured to perform inverted indexing, named entity recognition, and term weight ranking on the query text in the informational user retrieval behavior data to obtain the candidate entities.
12. The apparatus according to any one of claims 7-11, characterized in that the apparatus further comprises:
An aggregation module, configured to, if the user retrieval behavior data are the user's historical retrieval behavior data within a set time period, aggregate the theme-class focus points within the set time period, and aggregate the entity-class focus points and their associated focus points within the set time period;
An update module, configured to, if the user retrieval behavior data are the user's real-time retrieval behavior data, update the user's current focus points according to the weights of the mined theme-class and entity-class focus points.
13. User focus mining equipment, characterized in that the equipment comprises:
One or more processors;
A storage device, configured to store one or more programs;
When the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the user focus mining method according to any one of claims 1-6.
14. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the user focus mining method according to any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810712526.0A CN108959550B (en) 2018-06-29 2018-06-29 User focus mining method, device, equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810712526.0A CN108959550B (en) 2018-06-29 2018-06-29 User focus mining method, device, equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN108959550A true CN108959550A (en) 2018-12-07
CN108959550B CN108959550B (en) 2022-03-25

Family

ID=64484882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810712526.0A Active CN108959550B (en) 2018-06-29 2018-06-29 User focus mining method, device, equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN108959550B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030036848A1 (en) * 2001-08-16 2003-02-20 Sheha Michael A. Point of interest spatial rating search method and system
US20100318412A1 (en) * 2009-06-10 2010-12-16 Nxn Tech, Llc Method and system for real-time location and inquiry based information delivery
US8775355B2 (en) * 2010-12-20 2014-07-08 Yahoo! Inc. Dynamic online communities
CN103970858A (en) * 2014-05-07 2014-08-06 百度在线网络技术(北京)有限公司 Recommended content determining system and method
CN104298683A (en) * 2013-07-18 2015-01-21 佳能株式会社 Theme digging method and equipment and query expansion method and equipment
CN105243136A (en) * 2015-09-30 2016-01-13 北京奇虎科技有限公司 Method and apparatus for mining point of interest (POI) data in internet
CN105488196A (en) * 2015-12-07 2016-04-13 中国人民大学 Automatic hot topic mining system based on internet corpora
CN105677873A (en) * 2016-01-11 2016-06-15 中国电子科技集团公司第十研究所 Text information associating and clustering collecting processing method based on domain knowledge model
CN106168947A (en) * 2016-07-01 2016-11-30 北京奇虎科技有限公司 A kind of related entities method for digging and system
CN107590235A (en) * 2017-09-08 2018-01-16 成都掌中全景信息技术有限公司 A kind of information association searches for recommendation method
CN107766449A (en) * 2017-09-26 2018-03-06 杭州云赢网络科技有限公司 Focus method for digging, device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KALYANI R. POLE等: ""Improvised fuzzy clustering using name entity recognition and natural language processing"", 《2017 1ST INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND INFORMATION MANAGEMENT (ICISIM)》 *
翟海军: ""面向Web信息检索的知识挖掘"", 《中国博士学位论文全文数据库 信息科技辑》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222341A (en) * 2019-06-10 2019-09-10 北京百度网讯科技有限公司 Text analyzing method and device
CN111639234A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Method and device for mining core entity interest points
CN111639234B (en) * 2020-05-29 2023-06-27 北京百度网讯科技有限公司 Method and device for mining core entity attention points
CN112905741A (en) * 2021-02-08 2021-06-04 合肥供水集团有限公司 Water supply user focus mining method considering space-time characteristics
CN113792149A (en) * 2021-11-15 2021-12-14 北京博瑞彤芸科技股份有限公司 Method and device for generating customer acquisition scheme based on user attention analysis

Also Published As

Publication number Publication date
CN108959550B (en) 2022-03-25

Similar Documents

Publication Publication Date Title
Liu Python machine learning by example
KR101778679B1 (en) Method and system for classifying data consisting of multiple attribues represented by sequences of text words or symbols using deep learning
CN111444320B (en) Text retrieval method and device, computer equipment and storage medium
CN110807150B (en) Information processing method and device, electronic equipment and computer readable storage medium
CN108829822A (en) The recommended method and device of media content, storage medium, electronic device
CN108959550A (en) User's focus method for digging, device, equipment and computer-readable medium
CN105468596B (en) Picture retrieval method and device
CN107480158A (en) The method and system of the matching of content item and image is assessed based on similarity score
CN110633407B (en) Information retrieval method, device, equipment and computer readable medium
US8825620B1 (en) Behavioral word segmentation for use in processing search queries
CN110413888B (en) Book recommendation method and device
CN108154198A (en) Knowledge base entity normalizing method, system, terminal and computer readable storage medium
CN110162594A (en) Viewpoint generation method, device and the electronic equipment of text data
CN111325030A (en) Text label construction method and device, computer equipment and storage medium
CN116108267A (en) Recommendation method and related equipment
CN108304381B (en) Entity edge establishing method, device and equipment based on artificial intelligence and storage medium
CN114281976A (en) Model training method and device, electronic equipment and storage medium
CN110110218A (en) A kind of Identity Association method and terminal
Prasanth et al. Effective big data retrieval using deep learning modified neural networks
CN113569018A (en) Question and answer pair mining method and device
US20230351473A1 (en) Apparatus and method for providing user's interior style analysis model on basis of sns text
CN108804491A (en) item recommendation method, device, computing device and storage medium
CN115168568B (en) Data content identification method, device and storage medium
CN117435685A (en) Document retrieval method, document retrieval device, computer equipment, storage medium and product
CN105095385B (en) A kind of output method and device of retrieval result

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant