CN109672706B - Information recommendation method and device, server and storage medium - Google Patents

Information recommendation method and device, server and storage medium

Info

Publication number
CN109672706B
CN109672706B · CN201710960175.0A
Authority
CN
China
Prior art keywords
knowledge point
chain
target document
word
document
Prior art date
Legal status
Active
Application number
CN201710960175.0A
Other languages
Chinese (zh)
Other versions
CN109672706A (en)
Inventor
许瑾
Current Assignee
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Baidu Online Network Technology Beijing Co Ltd filed Critical Baidu Online Network Technology Beijing Co Ltd
Priority to CN201710960175.0A priority Critical patent/CN109672706B/en
Publication of CN109672706A publication Critical patent/CN109672706A/en
Application granted granted Critical
Publication of CN109672706B publication Critical patent/CN109672706B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 Network services
    • H04L 67/55 Push-based network services

Abstract

The embodiment of the invention discloses an information recommendation method, an information recommendation device, a server and a storage medium. The method comprises the following steps: determining the relevance weight of a target document and each word chain knowledge point according to the co-occurrence word chain knowledge points of each word chain knowledge point in all documents, the reverse file frequency of each word chain knowledge point in all documents, and the word chain knowledge points contained in the target document; determining the word chain knowledge points to be recommended for the target document according to the relevance weight of the target document and each word chain knowledge point; and pushing information according to the word chain knowledge points to be recommended. The technical scheme provided by the embodiment of the invention improves the accuracy of information recommendation, increases the stickiness between users and the pushed information, and thus improves the user experience.

Description

Information recommendation method and device, server and storage medium
Technical Field
The invention relates to the technical field of internet application, in particular to an information recommendation method, an information recommendation device, a server and a storage medium.
Background
In recent years, with the rapid development of the internet, the internet has been flooded with massive amounts of content, and how to present better content to users and help them find what they want is one of the problems that urgently needs to be solved.
Currently, common recommendation methods include collaborative filtering, a classic recommendation algorithm that decides which items should be recommended to a user based on the user's preference data for items, and latent factor analysis, a classic derivative of recommendation systems. These two common recommendation methods have achieved great success in the field of internet advertising.
However, collaborative filtering often suffers from the cold start problem due to a lack of user interaction data; for example, when a new user has no browsing or purchasing records, the user's characteristics cannot be described and recommended items cannot be matched. The latent factor analysis model, in turn, is complex and computationally demanding, and the decomposed factors are abstract, unreadable vectors, so it cannot meet the fast iteration requirements of internet products. In addition, building word chains by entity mining is inefficient.
Disclosure of Invention
The embodiment of the invention provides an information recommendation method, an information recommendation device, a server and a storage medium, which can improve the accuracy of information recommendation, increase the stickiness between users and pushed information, and improve the user experience.
In a first aspect, an embodiment of the present invention provides an information recommendation method, where the method includes:
determining the relevance weight of the target document and each character chain knowledge point according to the co-occurrence character chain knowledge point of each character chain knowledge point in all documents, the reverse file frequency of each character chain knowledge point in all documents and the character chain knowledge point contained in the target document;
determining the word chain knowledge points to be recommended of the target document according to the relevance weight of the target document and each word chain knowledge point;
and pushing information according to the word chain knowledge points to be recommended.
In a second aspect, an embodiment of the present invention further provides an information recommendation apparatus, where the apparatus includes:
the relevance weight determining module is used for determining the relevance weight of the target document and each character chain knowledge point according to the co-occurrence character chain knowledge point of each character chain knowledge point in all documents, the reverse file frequency of each character chain knowledge point in all documents and the character chain knowledge point contained in the target document;
the word chain to be recommended determining module is used for determining word chain knowledge points to be recommended of the target document according to the relevance weight of the target document and each word chain knowledge point;
and the information pushing module is used for pushing information according to the word chain knowledge points to be recommended.
In a third aspect, an embodiment of the present invention further provides a server, where the server includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the information recommendation method of any of the first aspects.
In a fourth aspect, an embodiment of the present invention further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the information recommendation method according to any one of the first aspects.
According to the information recommendation method, device, server and storage medium provided by the embodiments of the invention, the relevance weight of the target document and each word chain knowledge point is determined according to the co-occurrence word chain knowledge points of each word chain knowledge point in all documents, the reverse file frequency of each word chain knowledge point in all documents and the word chain knowledge points contained in the target document; the word chain knowledge points to be recommended for the target document are determined according to the relevance weights and pushed to the user. This solves the problems of the conventional recommendation methods, such as cold start, complex weight determination models and low efficiency of word chain construction, improves the accuracy of information recommendation, increases the stickiness between users and the pushed information, and thus improves the user experience.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
fig. 1 is a flowchart of an information recommendation method according to a first embodiment of the present invention;
fig. 2 is a flowchart of an information recommendation method provided in the second embodiment of the present invention;
fig. 3 is a flowchart of a method for determining a knowledge point of a text chain included in a document based on NLP and a knowledge graph according to a third embodiment of the present invention;
fig. 4 is a block diagram of an information recommendation apparatus according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of a server provided in the fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings.
Example one
Fig. 1 is a flowchart of an information recommendation method according to the first embodiment of the present invention. This embodiment is applicable to the case where word chains based on NLP and a knowledge graph are applied on a library product line to help users find the documents and content they want. The method can be executed by the information recommendation device/server/computer-readable storage medium provided by the embodiments of the present invention, and the device/server/computer-readable storage medium can be implemented in software and/or hardware. Referring to fig. 1, the method specifically includes:
s110, determining the relevance weight of the target document and each character chain knowledge point according to the co-occurrence character chain knowledge point of each character chain knowledge point in all documents, the reverse file frequency of each character chain knowledge point in all documents and the character chain knowledge point contained in the target document.
Here, NLP (Natural Language Processing) is a subfield of Artificial Intelligence (AI) that studies theories and methods for enabling effective communication between humans and computers in natural language. A knowledge graph is a semantic network with entities and concepts as nodes and semantic relations as edges; it is essentially a semantic network formed by interconnected knowledge points. In general, a knowledge graph is a relational network obtained by connecting all kinds of information together. Knowledge graphs provide the ability to analyze problems from a "relational" perspective and make knowledge acquisition more direct.
The word chain knowledge points are the knowledge points selected from all the knowledge points contained in a document to serve as word chains. Word chains address the user's need to further explore, within the document content, the knowledge the user wants to learn about. Word chains can improve the distribution efficiency of content: by clicking a word chain, the user enters a next web page containing documents related to the word chain knowledge point as well as explanations of that knowledge. The word chains, which may be marked in green, can be derived from entities in an educational knowledge graph. If one word chain knowledge point and another word chain knowledge point appear in the same document, the two are co-occurrence word chain knowledge points. The Inverse Document Frequency (IDF) is a measure of the general importance of a term; the IDF of a particular term can be obtained by dividing the total number of documents by the number of documents containing the term and taking the logarithm of the quotient. The target document may be the Kth of all documents, for example one document selected from ten million documents.
S120, determining the word chain knowledge points to be recommended of the target document according to the relevance weight of the target document and each word chain knowledge point.
The word chain knowledge points to be recommended are those capable of expressing more of the user's potential interests. The obtained relevance weights of the target document and each word chain knowledge point are processed, for example by sorting the relevance weights and, according to the sorting result, screening out the word chain knowledge points that can express the user's intention, which are then pushed to the user as word chain knowledge points to be recommended.
Illustratively, S120 may specifically include: sorting the word chain knowledge points according to the relevance weight of the target document and each word chain knowledge point; filtering out the word chain knowledge points already contained in the target document from the sorting result; and, from the filtered result, selecting a preset number of word chain knowledge points with the highest relevance weights as the word chain knowledge points to be recommended for the target document. The preset number may be set as needed, for example 6. To capture the user's potential needs, the word chain knowledge points already contained in the target document are removed, and from the remaining word chain knowledge points those with the highest weights, such as the top 6 in descending order of weight, are selected as the word chain knowledge points to be recommended.
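For illustration only, a minimal Python sketch of this sort-filter-select step might look as follows; the function and variable names are hypothetical and not part of the patent:

    def select_chain_points_to_recommend(weights, doc_chain_points, top_n=6):
        """Pick the word chain knowledge points to recommend for one target document.

        weights          -- dict mapping word chain knowledge point -> relevance weight
        doc_chain_points -- set of word chain knowledge points already in the document
        top_n            -- preset number of knowledge points to recommend (e.g. 6)
        """
        # Sort all word chain knowledge points by relevance weight, highest first.
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
        # Filter out knowledge points the target document already contains.
        candidates = [(p, w) for p, w in ranked if p not in doc_chain_points]
        # Keep the preset number with the highest relevance weights.
        return [p for p, _ in candidates[:top_n]]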
And S130, carrying out information push according to the word chain knowledge points to be recommended.
The information pushing mode may be displaying on the right side of the search result display area. Specifically, the content related to the word chain knowledge point to be recommended, such as pictures, news or articles, can be found in the knowledge graph or the library according to the word chain knowledge point to be recommended, and the content is displayed on the right side of the search result display area so as to be recommended to the user.
According to the information recommendation method provided by the embodiment of the invention, the relevance weight of the target document and each word chain knowledge point is determined according to the co-occurrence word chain knowledge points of each word chain knowledge point in all documents, the reverse file frequency of each word chain knowledge point in all documents and the word chain knowledge points contained in the target document; the word chain knowledge points to be recommended for the target document are determined according to the relevance weights and pushed to the user. This solves the problems of the conventional recommendation methods, such as cold start, complex weight determination models and low efficiency of word chain construction, improves the accuracy of information recommendation, increases the stickiness between users and the pushed information, and thus improves the user experience.
Example two
Fig. 2 is a flowchart of an information recommendation method according to a second embodiment of the present invention, which is based on the first embodiment of the present invention and further provides a method for determining a relevance weight between a target document and each text chain knowledge point according to a co-occurrence text chain knowledge point of each text chain knowledge point in all documents, a reverse file frequency of each text chain knowledge point in all documents, and a text chain knowledge point included in the target document. Correspondingly, the method comprises the following steps:
s210, constructing a co-occurrence matrix according to the co-occurrence character chain knowledge points of the character chain knowledge points in all the documents.
If one word chain knowledge point and the other word chain knowledge point occur together in one document, the two word chain knowledge points are the co-occurrence word chain knowledge points. Taking the existing ten-million documents as an example, the co-occurrence matrix G can be obtained by counting the co-occurrence text chains in the ten-million documents for each text chain knowledge point.
For example, constructing the co-occurrence matrix according to the co-occurrence word chain knowledge points of each word chain knowledge point in all the documents may include: counting all the documents to determine the co-occurrence word chain knowledge points of each word chain knowledge point; if the jth word chain knowledge point and the kth word chain knowledge point appear together in a document, G_jk in the co-occurrence matrix takes 1; otherwise, G_jk takes 0, where j = 1, …, N, k = 1, …, N, and N is the number of all word chain knowledge points.
For example, taking N = 1,300,000, according to the above rule a 1,300,000 × 1,300,000 co-occurrence matrix G can be obtained, which can be expressed as a matrix of 0/1 entries:

G = [G_jk], j = 1, …, N, k = 1, …, N, with each G_jk taking the value 0 or 1
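A minimal sketch of how such a co-occurrence matrix could be assembled is given below (Python/NumPy). The point_index mapping, the dense int8 matrix and the treatment of the diagonal are illustrative assumptions; at the scale of 1.3 million knowledge points a sparse representation would be required in practice.

    import numpy as np

    def build_cooccurrence_matrix(doc_chain_points, point_index):
        """Build the 0/1 co-occurrence matrix G.

        doc_chain_points -- iterable of sets; each set holds the word chain
                            knowledge points found in one document
        point_index      -- dict mapping each word chain knowledge point to an
                            index 0..N-1 (a hypothetical helper, not from the patent)
        """
        n = len(point_index)
        G = np.zeros((n, n), dtype=np.int8)       # dense only for illustration
        for points in doc_chain_points:
            idx = [point_index[p] for p in points]
            for j in idx:
                for k in idx:
                    if j != k:                    # diagonal handling is not specified
                        G[j, k] = 1               # G_jk = 1 if j and k co-occur
        return G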
s220, determining a reverse file vector according to the reverse file frequency of each character chain knowledge point in all the documents.
The reverse file frequency IDF of a word chain knowledge point can be obtained by dividing the total number of documents by the number of documents containing that word chain knowledge point and taking the logarithm. The reverse file vector, denoted IDF below, is the vector formed by the reverse file frequencies of all the word chain knowledge points. For example, its element IDF_n1 may be determined according to the following formula:

IDF_n1 = log(M / E_n)

where M is the total number of all documents, E_n is the number of documents containing the nth word chain knowledge point, n = 1, …, N, and N is the number of all word chain knowledge points. Taking N = 1,300,000, a 1,300,000 × 1 column vector is obtained.
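For illustration, the reverse file vector could be computed along the following lines. This is a sketch that assumes a natural logarithm, since the text does not fix the base, and reuses the hypothetical point_index mapping from the sketch above.

    import numpy as np

    def build_idf_vector(doc_chain_points, point_index, total_docs):
        """Compute the reverse file vector, one entry per word chain knowledge point."""
        n = len(point_index)
        doc_counts = np.zeros(n)                  # E_n: documents containing point n
        for points in doc_chain_points:
            for p in points:
                doc_counts[point_index[p]] += 1
        # IDF_n = log(M / E_n); assumes every indexed point occurs in at least one document
        return np.log(total_docs / doc_counts)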
s230, determining a word chain correlation matrix of the target document according to the word chain knowledge points contained in the target document.
Illustratively, the word chain correlation matrix of the target document may be determined as follows: if the target document contains the ith word chain knowledge point, X_1i in the word chain correlation matrix of the target document takes 1; otherwise, X_1i takes 0, where i = 1, …, N, and N is the number of all word chain knowledge points. For example, if there are ten million documents and 1,300,000 word chain knowledge points, the word chain correlation matrix X over all documents is a 10,000,000 × 1,300,000 matrix of 0/1 entries. Taking the Kth of the ten million documents as the target document, its word chain correlation matrix X_K (the Kth row of X) can be expressed as:

X_K = [0 0 1 0 0 0 0 … 0 1]
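A sketch of building the word chain correlation row X_K for one target document, again using the hypothetical point_index mapping:

    import numpy as np

    def build_correlation_row(target_points, point_index):
        """Build X_K: a length-N 0/1 row, 1 where the target document contains that point."""
        x = np.zeros(len(point_index), dtype=np.int8)
        for p in target_points:
            x[point_index[p]] = 1
        return x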
s240, determining the relevance vector of the target document and each character chain knowledge point according to the character chain relevance matrix, the co-occurrence matrix and the reverse file vector of the target document.
For example, the relevance vector of the target document and each word chain knowledge point can be determined according to the following formula:

W = (X · G) × IDF

where W is the relevance vector of the target document and each word chain knowledge point, X is the word chain correlation matrix of the target document, G is the co-occurrence matrix, and IDF is the reverse file vector. Here "·" denotes the matrix product and "×" denotes the element-wise product of two vectors.

When the Kth document is taken as the target document, steps S210, S220 and S230 are executed to obtain the co-occurrence matrix G, the reverse file vector IDF and the correlation matrix X_K, which are substituted into the above formula to obtain the relevance vector W_K. Taking N = 1,300,000, the specific operation process is as follows:

First, the 1 × N matrix X_K is multiplied with the N × N matrix G to obtain the 1 × N matrix X'_K, namely:

X'_K = X_K · G

Then, to facilitate the subsequent element-wise multiplication, the matrix X'_K is transposed into an N × 1 column vector.

Finally, the relevance vector is obtained as the element-wise product of the two vectors, namely:

W_K = X'_K × IDF

each element of which corresponds to one word chain knowledge point.
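Putting the pieces together, the computation of S240 reduces to one matrix product followed by an element-wise product. A minimal sketch under the assumptions of the earlier sketches:

    import numpy as np

    def relevance_vector(x_k, G, idf):
        """Compute W_K = (X_K . G) x IDF, element-wise in the last step.

        x_k -- length-N 0/1 row for the target document (X_K)
        G   -- N x N co-occurrence matrix
        idf -- length-N reverse file vector
        """
        x_prime = x_k @ G        # X'_K: for each knowledge point, how many of the
                                 # document's points co-occur with it
        return x_prime * idf     # element-wise product with the reverse file vector

Each entry of the returned vector is then read, as in S250, as the relevance weight of the target document with respect to the corresponding word chain knowledge point.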
and S250, determining the relevance weight of the target document and each character chain knowledge point according to the relevance vector of the target document and each character chain knowledge point.
Each value in the relevance vector W_K obtained in step S240 represents the relevance weight of the target document to the corresponding word chain knowledge point.
And S260, determining the word chain knowledge points to be recommended of the target document according to the relevance weight of the target document and each word chain knowledge point.
And S270, carrying out information push according to the word chain knowledge points to be recommended.
According to the information recommendation method provided by this embodiment of the invention, the relevance vector of the target document and each word chain knowledge point is determined according to the word chain correlation matrix of the target document, the co-occurrence matrix and the reverse file vector, giving the relevance weight of the target document and each word chain knowledge point; the word chain knowledge points to be recommended for the target document are determined according to the relevance weights and pushed to the user. This solves the problem that existing weight determination models are complex, improves the accuracy of information recommendation, increases the stickiness between users and the pushed information, and thus improves the user experience.
EXAMPLE III
Fig. 3 is a flowchart of a method for determining a word chain knowledge point included in a document based on NLP and a knowledge graph according to a third embodiment of the present invention, where the method is based on the above-mentioned embodiment of the present invention, and the specific method may include the following steps:
s310, taking the product of the word frequency of each knowledge point in the document and the reverse file frequency of each knowledge point contained in the document as the correlation degree of each knowledge point and the document.
Here, the word frequency refers to the number of times a given knowledge point occurs in the document and can be represented by C(d, e), where d represents the document and e represents the knowledge point in the knowledge graph; the reverse file frequency of each knowledge point can be represented by IDF_e. The correlation degree refers to the degree to which two things are related to each other and is commonly used to evaluate the importance of a word to one document in a collection or corpus, i.e., here, the degree of association between a given knowledge point and the document in which it occurs. Expressed as the product of the word frequency and the reverse file frequency, it can be written as: C(d, e) × IDF_e.
S320, determining the weight of each knowledge point according to the correlation degree of each knowledge point and the document, the information amount of each knowledge point and the similarity of each knowledge point and the document title.
The information amount is a measure of the amount of information, and the information amount of each knowledge point can be determined according to the following formula:

I_e = log_2(len(e))

where e is a knowledge point, I_e is the information amount of e, and len(e) is the length of e.
The specific operation process for determining the similarity between each knowledge point and the document title is as follows: extract the subject terms in the document title and calculate the semantic similarity between the subject terms and each knowledge point, expressed in mathematical symbols as sim(t, e), where t represents the title of document d.

The weight of each knowledge point is the proportion of that knowledge point in the document and is represented by Q(d, e). According to the correlation degree C(d, e) × IDF_e of each knowledge point with the document, the information amount I_e = log_2(len(e)) of each knowledge point, and the similarity sim(t, e) of each knowledge point with the document title, the weight Q(d, e) of each knowledge point can be expressed as:

Q(d, e) = α(C(d, e) × IDF_e) + β·log_2(len(e)) + θ·sim(t, e)

where α, β and θ are tuning parameters whose values differ slightly for documents in different fields.
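As a sketch only, the weight Q(d, e) could be computed as follows; the tuning parameters α, β, θ and the similarity value sim(t, e) are passed in as placeholders, since their concrete values and computation are left open here:

    import math

    def knowledge_point_weight(tf, idf_e, length_e, title_sim,
                               alpha=1.0, beta=1.0, theta=1.0):
        """Q(d, e) = alpha*(C(d, e)*IDF_e) + beta*log2(len(e)) + theta*sim(t, e).

        tf        -- C(d, e): number of times knowledge point e occurs in document d
        idf_e     -- reverse file frequency of e
        length_e  -- len(e): length of the knowledge point e
        title_sim -- sim(t, e): semantic similarity between e and the document title
        """
        relevance = tf * idf_e                # degree of correlation with the document
        information = math.log2(length_e)     # information amount I_e
        return alpha * relevance + beta * information + theta * title_sim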
S330, screening the knowledge points contained in the document according to the weight of the knowledge points, and taking the screened knowledge points as character chain knowledge points.
The knowledge points are sorted from largest to smallest according to the weights calculated in step S320, and the knowledge points within a preset range are selected as word chain knowledge points. The preset range is user-defined according to the actual situation, for example 10 or 15; for instance, the 10 knowledge points with the largest weights are selected as word chain knowledge points.
According to the method for determining the character chain knowledge points in the document based on the NLP and the knowledge graph, provided by the embodiment of the invention, the weight of each knowledge point is determined according to the correlation degree of each knowledge point and the document, the information content of each knowledge point and the similarity of each knowledge point and the title of the document; and screening the knowledge points contained in the document according to the weight of the knowledge points to determine the text chain knowledge points in the document. The problem that the efficiency of constructing the character chain is low in the conventional recommendation method is solved.
Example four
Fig. 4 is a block diagram of an information recommendation apparatus according to a fourth embodiment of the present invention, which is capable of executing an information recommendation method according to any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the execution method. As shown in fig. 4, the apparatus may include:
a relevance weight determining module 401, configured to determine a relevance weight of the target document and each text chain knowledge point according to a co-occurrence text chain knowledge point of each text chain knowledge point in all documents, a reverse file frequency of each text chain knowledge point in all documents, and a text chain knowledge point included in the target document;
a module 402 for determining a word chain to be recommended, configured to determine a word chain knowledge point to be recommended for the target document according to the relevance weight between the target document and each word chain knowledge point;
and an information pushing module 403, configured to push information according to the word chain knowledge point to be recommended.
According to the information recommendation device provided by the embodiment of the invention, the relevance weight of the target document and each word chain knowledge point is determined according to the co-occurrence word chain knowledge points of each word chain knowledge point in all documents, the reverse file frequency of each word chain knowledge point in all documents and the word chain knowledge points contained in the target document; the word chain knowledge points to be recommended for the target document are determined according to the relevance weights and pushed to the user. This solves the problems of the conventional recommendation methods, such as cold start, complex weight determination models and low efficiency of word chain construction, improves the accuracy of information recommendation, increases the stickiness between users and the pushed information, and thus improves the user experience.
Illustratively, the relevance weight determining module 401 includes:
the co-occurrence matrix construction unit is used for constructing a co-occurrence matrix according to the co-occurrence character chain knowledge points of the character chain knowledge points in all the documents;
the reverse file vector determining unit is used for determining a reverse file vector according to the reverse file frequency of each character chain knowledge point in all the documents;
the correlation matrix determining unit is used for determining a word chain correlation matrix of the target document according to word chain knowledge points contained in the target document;
a relevance vector determining unit, configured to determine a relevance vector between the target document and each text chain knowledge point according to the text chain relevance matrix of the target document, the co-occurrence matrix, and the reverse file vector;
and the relevance weight determining unit is used for determining the relevance weight of the target document and each character chain knowledge point according to the relevance vector of the target document and each character chain knowledge point.
Optionally, the correlation matrix determining unit may be specifically configured to:
if the target document K contains the ith word chain knowledge point, the word chain correlation matrix of the target document
Figure BDA0001435059110000131
Taking 1; if not, then,
Figure BDA0001435059110000132
take 0, where i ═ 1, …, N, is the number of knowledge points for all literal chains.
Optionally, the co-occurrence matrix constructing unit may specifically be configured to:
counting all the documents to determine co-occurrence text chain knowledge points of the text chain knowledge points;
if the jth word chain knowledge point and the kth word chain knowledge point appear together in a document, G_jk in the co-occurrence matrix takes 1; otherwise, G_jk takes 0, where j = 1, …, N, k = 1, …, N, and N is the number of all word chain knowledge points.
Optionally, the inverse file vector determining unit may be specifically configured to:
determining IDF_n1 in the reverse file vector according to the following formula:

IDF_n1 = log(M / E_n)

where M is the total number of all documents, E_n is the number of documents containing the nth word chain knowledge point, n = 1, …, N, and N is the number of all word chain knowledge points.
Optionally, the correlation vector determination unit may be specifically configured to:
determine the relevance vector of the target document and each word chain knowledge point according to the word chain correlation matrix of the target document, the co-occurrence matrix and the reverse file vector, namely according to the following formula:

W_K = (X_K · G) × IDF

where W_K is the relevance vector of the target document K and each word chain knowledge point, X_K is the word chain correlation matrix of the target document K, G is the co-occurrence matrix, and IDF is the reverse file vector.
For example, the module 402 for determining word chain to be recommended may include:
the word chain knowledge point sequencing unit is used for sequencing the word chain knowledge points according to the relevance weight of the target document and each word chain knowledge point;
the word chain knowledge point filtering unit is used for filtering word chain knowledge points contained in the target document from the sequencing result;
and the word chain to be recommended determining unit is used for selecting a preset number of word chain knowledge points with high relevance weight as the word chain knowledge points to be recommended of the target document according to the filtering result.
Optionally, the apparatus may further include a text chain knowledge point determining module; the word chain knowledge point determination module may be specifically configured to:
taking the product of the word frequency of each knowledge point in the document and the reverse file frequency of each knowledge point contained in the document as the correlation degree of each knowledge point and the document;
determining the weight of each knowledge point according to the correlation degree of each knowledge point and the document, the information content of each knowledge point and the similarity of each knowledge point and the title of the document;
and screening the knowledge points contained in the document according to the weight of the knowledge points, and taking the screened knowledge points as character chain knowledge points.
Optionally, the information amount of each knowledge point is determined according to the following formula:
I_e = log_2(len(e))

where e is a knowledge point, I_e is the information amount of e, and len(e) is the length of e.
EXAMPLE five
Fig. 5 is a schematic structural diagram of a server according to a fifth embodiment of the present invention. FIG. 5 illustrates a block diagram of an exemplary server 12 suitable for use in implementing embodiments of the present invention. The server 12 shown in fig. 5 is only an example, and should not bring any limitation to the function and the scope of use of the embodiment of the present invention.
As shown in fig. 5, the server 12 is in the form of a general purpose computing device. The components of the server 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
The server 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by server 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. The server 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 5 and commonly referred to as a "hard drive"). Although not shown in FIG. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
The server 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with the device, and/or with any devices (e.g., network card, modem, etc.) that enable the server 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the server 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with the other modules of the server 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing, for example, implementing an information recommendation method provided by an embodiment of the present invention, by executing a program stored in the system memory 28.
EXAMPLE six
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, can implement any one of the information recommendation methods in the foregoing embodiments.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The above example numbers are for description only and do not represent the merits of the examples.
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and optionally they may be implemented by program code executable by a computing device, such that it may be stored in a memory device and executed by a computing device, or it may be separately fabricated into various integrated circuit modules, or it may be fabricated by fabricating a plurality of modules or steps thereof into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. An information recommendation method, comprising:
determining the relevance weight of the target document and each character chain knowledge point according to the co-occurrence character chain knowledge point of each character chain knowledge point in all documents, the reverse file frequency of each character chain knowledge point in all documents and the character chain knowledge point contained in the target document;
determining the word chain knowledge points to be recommended of the target document according to the relevance weight of the target document and each word chain knowledge point;
carrying out information pushing according to the word chain knowledge points to be recommended, wherein the word chain knowledge points are a part selected from all the knowledge points contained in the document;
determining the relevance weight of the target document and each character chain knowledge point according to the co-occurrence character chain knowledge point of each character chain knowledge point in all documents, the reverse file frequency of each character chain knowledge point in all documents and the character chain knowledge point contained in the target document, wherein the relevance weight comprises the following steps:
constructing a co-occurrence matrix according to the co-occurrence text chain knowledge points of the text chain knowledge points in all the documents;
determining a reverse file vector according to the reverse file frequency of each character chain knowledge point in all the documents;
determining a word chain correlation matrix of a target document according to word chain knowledge points contained in the target document;
determining a correlation vector of the target document and each word chain knowledge point according to the word chain correlation matrix of the target document, the co-occurrence matrix and the reverse file vector;
determining the relevance weight of the target document and each character chain knowledge point according to the relevance vector of the target document and each character chain knowledge point;
wherein, the determining the relevance vector of the target document and each word chain knowledge point according to the word chain relevance matrix of the target document, the co-occurrence matrix and the reverse file vector comprises:
determining a relevance vector of the target document and each word chain knowledge point according to the following formula:
W = (X · G) × IDF

wherein W is the relevance vector of the target document and each word chain knowledge point, X is the word chain correlation matrix of the target document, G is the co-occurrence matrix, and IDF is the reverse file vector.
2. The method of claim 1, wherein determining a word chain correlation matrix of a target document according to word chain knowledge points contained in the target document comprises:
if the target document contains the ith word chain knowledge point, X_1i in the word chain correlation matrix of the target document takes 1; otherwise, X_1i takes 0, where i = 1, …, N, and N is the number of all word chain knowledge points.
3. The method of claim 1, wherein constructing a co-occurrence matrix from co-occurrence word chain knowledge points of each word chain knowledge point in all documents comprises:
counting all the documents to determine co-occurrence text chain knowledge points of the text chain knowledge points;
if the jth word chain knowledge point and the kth word chain knowledge point appear together in a document, G_jk in the co-occurrence matrix takes 1; otherwise, G_jk takes 0, where j = 1, …, N, k = 1, …, N, and N is the number of all word chain knowledge points.
4. The method of claim 1, wherein determining a reverse document vector based on the reverse document frequency of each word chain knowledge point in all documents comprises:
determining IDF_n1 in the reverse file vector according to the following formula:

IDF_n1 = log(M / E_n)

where M is the total number of all documents, E_n is the number of documents containing the nth word chain knowledge point, n = 1, …, N, and N is the number of all word chain knowledge points.
5. The method of claim 1, wherein determining the word chain knowledge points to be recommended for the target document according to the relevance weight of the target document and each word chain knowledge point comprises:
sequencing the knowledge points of each character chain according to the relevance weight of the target document and the knowledge points of each character chain;
filtering out word chain knowledge points contained in the target document from the sequencing result;
and selecting a preset number of character chain knowledge points with high relevance weight as character chain knowledge points to be recommended of the target document according to the filtering result.
6. The method of claim 1, wherein determining word-chain knowledge points contained in the document comprises:
taking the product of the word frequency of each knowledge point in the document and the reverse file frequency of each knowledge point contained in the document as the correlation degree of each knowledge point and the document;
determining the weight of each knowledge point according to the correlation degree of each knowledge point and the document, the information content of each knowledge point and the similarity of each knowledge point and the title of the document;
and screening the knowledge points contained in the document according to the weight of the knowledge points, and taking the screened knowledge points as character chain knowledge points.
7. The method of claim 6, comprising:
the information amount of each knowledge point is determined according to the following formula:
I_e = log_2(len(e))

wherein e is a knowledge point, I_e is the information amount of e, and len(e) is the length of e.
8. An information recommendation apparatus, comprising:
the relevance weight determining module is used for determining the relevance weight of the target document and each character chain knowledge point according to the co-occurrence character chain knowledge point of each character chain knowledge point in all documents, the reverse file frequency of each character chain knowledge point in all documents and the character chain knowledge point contained in the target document;
the word chain to be recommended determining module is used for determining word chain knowledge points to be recommended of the target document according to the relevance weight of the target document and each word chain knowledge point;
the information pushing module is used for pushing information according to the word chain knowledge points to be recommended, wherein the word chain knowledge points are a part selected from all the knowledge points contained in a document;
the correlation weight determination module includes:
the co-occurrence matrix construction unit is used for constructing a co-occurrence matrix according to the co-occurrence character chain knowledge points of the character chain knowledge points in all the documents;
the reverse file vector determining unit is used for determining a reverse file vector according to the reverse file frequency of each character chain knowledge point in all the documents;
the correlation matrix determining unit is used for determining a word chain correlation matrix of the target document according to word chain knowledge points contained in the target document;
a relevance vector determining unit, configured to determine a relevance vector between the target document and each text chain knowledge point according to the text chain relevance matrix of the target document, the co-occurrence matrix, and the reverse file vector;
the relevance weight determining unit is used for determining the relevance weight of the target document and each character chain knowledge point according to the relevance vector of the target document and each character chain knowledge point;
wherein the correlation vector determination unit is specifically configured to:
determining a relevance vector of the target document and each word chain knowledge point according to the following formula:
W = (X · G) × IDF

wherein W is the relevance vector of the target document and each word chain knowledge point, X is the word chain correlation matrix of the target document, G is the co-occurrence matrix, and IDF is the reverse file vector.
9. The apparatus of claim 8, wherein the text chain to be recommended determining module comprises:
the word chain knowledge point sequencing unit is used for sequencing the word chain knowledge points according to the relevance weight of the target document and the word chain knowledge points;
the word chain knowledge point filtering unit is used for filtering word chain knowledge points contained in the target document from the sequencing result;
and the word chain to be recommended determining unit is used for selecting a preset number of word chain knowledge points with high relevance weight as the word chain knowledge points to be recommended of the target document according to the filtering result.
10. The apparatus of claim 9, further comprising a text chain knowledge point determination module; the word chain knowledge point determination module is specifically configured to:
taking the product of the word frequency of each knowledge point in the document and the reverse file frequency of each knowledge point contained in the document as the correlation degree of each knowledge point and the document;
determining the weight of each knowledge point according to the correlation degree of each knowledge point and the document, the information content of each knowledge point and the similarity of each knowledge point and the title of the document;
and screening the knowledge points contained in the document according to the weight of the knowledge points, and taking the screened knowledge points as character chain knowledge points.
11. A server, characterized in that the server comprises:
one or more processors;
storage means for storing one or more programs;
when executed by the one or more processors, cause the one or more processors to implement the information recommendation method of any of claims 1-7.
12. A storage medium on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the information recommendation method according to any one of claims 1-7.
CN201710960175.0A 2017-10-16 2017-10-16 Information recommendation method and device, server and storage medium Active CN109672706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710960175.0A CN109672706B (en) 2017-10-16 2017-10-16 Information recommendation method and device, server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710960175.0A CN109672706B (en) 2017-10-16 2017-10-16 Information recommendation method and device, server and storage medium

Publications (2)

Publication Number Publication Date
CN109672706A CN109672706A (en) 2019-04-23
CN109672706B true CN109672706B (en) 2022-06-14

Family

ID=66139318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710960175.0A Active CN109672706B (en) 2017-10-16 2017-10-16 Information recommendation method and device, server and storage medium

Country Status (1)

Country Link
CN (1) CN109672706B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611344B (en) * 2020-05-06 2023-06-13 北京智通云联科技有限公司 Complex attribute query method, system and equipment based on dictionary and knowledge graph
CN112329964B (en) * 2020-11-24 2024-03-29 北京百度网讯科技有限公司 Method, device, equipment and storage medium for pushing information
CN112434173B (en) * 2021-01-26 2021-04-20 浙江口碑网络技术有限公司 Search content output method and device, computer equipment and readable storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101432714A (en) * 2004-09-14 2009-05-13 A9.Com, Inc. Methods and apparatus for automatic generation of recommended links
CN104239298A (en) * 2013-06-06 2014-12-24 Tencent Technology (Shenzhen) Co., Ltd. Text message recommendation method, server, browser and system
CN104965902A (en) * 2015-06-30 2015-10-07 北京奇虎科技有限公司 Enriched URL (uniform resource locator) recognition method and apparatus
CN105653737A (en) * 2016-03-01 2016-06-08 广州神马移动信息科技有限公司 Method, equipment and electronic equipment for content document sorting
CN105740460A (en) * 2016-02-24 2016-07-06 中国科学技术信息研究所 Webpage collection recommendation method and device
CN105808636A (en) * 2016-02-03 2016-07-27 北京中搜云商网络技术有限公司 APP information data based hypertext link pushing system
CN106339407A (en) * 2016-08-09 2017-01-18 百度在线网络技术(北京)有限公司 Processing method and device for message containing URL (uniform resource locator) address in IM (instant messaging)
CN106815297A (en) * 2016-12-09 2017-06-09 宁波大学 A kind of academic resources recommendation service system and method
CN107220352A (en) * 2017-05-31 2017-09-29 北京百度网讯科技有限公司 The method and apparatus that comment collection of illustrative plates is built based on artificial intelligence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170139899A1 (en) * 2015-11-18 2017-05-18 Le Holdings (Beijing) Co., Ltd. Keyword extraction method and electronic device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101432714A (en) * 2004-09-14 2009-05-13 A9.Com, Inc. Methods and apparatus for automatic generation of recommended links
CN104239298A (en) * 2013-06-06 2014-12-24 Tencent Technology (Shenzhen) Co., Ltd. Text message recommendation method, server, browser and system
CN104965902A (en) * 2015-06-30 2015-10-07 北京奇虎科技有限公司 Enriched URL (uniform resource locator) recognition method and apparatus
CN105808636A (en) * 2016-02-03 2016-07-27 北京中搜云商网络技术有限公司 APP information data based hypertext link pushing system
CN105740460A (en) * 2016-02-24 2016-07-06 中国科学技术信息研究所 Webpage collection recommendation method and device
CN105653737A (en) * 2016-03-01 2016-06-08 广州神马移动信息科技有限公司 Method, equipment and electronic equipment for content document sorting
CN106339407A (en) * 2016-08-09 2017-01-18 百度在线网络技术(北京)有限公司 Processing method and device for message containing URL (uniform resource locator) address in IM (instant messaging)
CN106815297A (en) * 2016-12-09 2017-06-09 宁波大学 A kind of academic resources recommendation service system and method
CN107220352A (en) * 2017-05-31 2017-09-29 北京百度网讯科技有限公司 The method and apparatus that comment collection of illustrative plates is built based on artificial intelligence

Also Published As

Publication number Publication date
CN109672706A (en) 2019-04-23

Similar Documents

Publication Publication Date Title
CN107220352B (en) Method and device for constructing comment map based on artificial intelligence
US9645999B1 (en) Adjustment of document relationship graphs
CN108287864B (en) Interest group dividing method, device, medium and computing equipment
Ding et al. Entity discovery and assignment for opinion mining applications
US8620934B2 (en) Systems and methods for selecting data elements, such as population members, from a data source
CN108140018A (en) Creation is used for the visual representation of text based document
CN110390052B (en) Search recommendation method, training method, device and equipment of CTR (China train redundancy report) estimation model
CN109672706B (en) Information recommendation method and device, server and storage medium
CN111522886B (en) Information recommendation method, terminal and storage medium
CN110546633A (en) Named entity based category tag addition for documents
US20150169740A1 (en) Similar image retrieval
Weisser et al. Pseudo-document simulation for comparing LDA, GSDMM and GPM topic models on short and sparse text using Twitter data
KR101494795B1 (en) Method for representing document as matrix
CN110008396B (en) Object information pushing method, device, equipment and computer readable storage medium
US10296624B2 (en) Document curation
Becheru et al. Tourist review analytics using complex networks
Hosseinabadi et al. ISSE: a new iterative sentence scoring and extraction scheme for automatic text summarization
Lamothe Comparing usage patterns recorded between an electronic reference and an electronic monograph collection: The differences in searches and full-text content viewings
CN111222918B (en) Keyword mining method and device, electronic equipment and storage medium
CN109408725B (en) Method and apparatus for determining user interest
CN114090891A (en) Personalized content recommendation method, device, equipment and storage medium
CN113722593A (en) Event data processing method and device, electronic equipment and medium
CN112231444A (en) Processing method and device for corpus data combining RPA and AI and electronic equipment
CN111310016B (en) Label mining method, device, server and storage medium
CN104809165A (en) Determination method and equipment for relevancy of multi-media document

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant