CN113139049A - Associated document recommendation method and device, computer equipment and storage medium - Google Patents

Publication number
CN113139049A
Authority
CN
China
Prior art keywords
document
candidate
field
candidate document
user
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110427735.2A
Other languages
Chinese (zh)
Inventor
荆小兵
霍京超
曹雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Minglue Zhaohui Technology Co Ltd
Original Assignee
Beijing Minglue Zhaohui Technology Co Ltd
Application filed by Beijing Minglue Zhaohui Technology Co Ltd
Priority to CN202110427735.2A
Publication of CN113139049A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to an associated document recommendation method and device, computer equipment, and a storage medium. In the method, candidate documents that may interest a user are obtained from the user's browsing behavior log, and the user's degree of interest in each candidate document, referred to as the first matching degree, is estimated. The document attribute information of each candidate document is then compared with the document attribute information of the currently browsed document, and the comparison result is integrated with the first matching degree to form a second matching degree between the candidate document and the user. Because the second matching degree combines the user's browsing behavior log with the comparison between the candidate document and the currently browsed document, it allows associated documents that match the user's preferences and are more similar to the currently browsed document to be recommended more accurately. The scheme does not need to compare the full-text content of the currently browsed document with the full-text content of the candidate documents, which reduces the amount of comparison computation and improves recommendation speed.

Description

Associated document recommendation method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for recommending an associated document, a computer device, and a storage medium.
Background
A document management system manages common office documents, PDF documents, pictures, videos, and other multimedia documents. Users can view these documents online, and while a user is viewing a document the system automatically recommends documents associated with the currently browsed content. This helps the user find more related information, better meets the user's needs, and increases the value of the system.
In the prior art, however, recommending documents associated with the currently browsed content relies on comparing the similarity between the currently browsed content and the full-text content of other documents. This has two drawbacks. First, obtaining the full-text content of a document is difficult and is not always possible. Second, computing similarity over full-text content is computationally expensive, wastes computing resources, and makes real-time recommendation difficult to achieve.
Disclosure of Invention
In order to solve the technical problem, the application provides a method and a device for recommending associated documents, computer equipment and a storage medium.
In a first aspect, the present application provides a method for recommending associated documents, including:
acquiring browsing information of a currently browsed document, wherein the browsing information comprises document attribute information of the currently browsed document and current user information, and the user information comprises a browsing behavior log of a current user;
obtaining a candidate document to be recommended according to the browsing behavior log, wherein the candidate document carries corresponding document attribute information and a first matching degree between the candidate document and the current user;
comparing the document attribute information of the currently browsed document with the document attribute information of each candidate document to obtain the association degree between each candidate document and the currently browsed document;
integrating a first matching degree between each candidate document and the current user and an association degree between each candidate document and the current browsed document to obtain a second matching degree between the candidate document and the current user;
and recommending the candidate documents corresponding to the second matching degrees according to the descending order of the second matching degrees.
Optionally, the document attribute information has a plurality of fields, and the comparing the document attribute information of the currently browsed document with the document attribute information of each candidate document to obtain the association degree between each candidate document and the currently browsed document includes:
comparing the content carried by the field in the current browsed document with the content carried by the same field in the candidate document to obtain a plurality of field similarities;
and summing all field similarities between the candidate document and the current browsed document to obtain the association degree between the candidate document and the current browsed document.
Optionally, the comparing the content carried by the field in the currently browsed document with the content carried by the same field in the candidate document to obtain a plurality of field similarities includes:
comparing the content carried by the field in the current browsed document with the content carried by the same field in the candidate document to obtain the content similarity corresponding to each field in the candidate document;
determining a target weight value corresponding to each field in the candidate document according to a preset mapping relation between the field and the weight value;
and multiplying the content similarity corresponding to each field in the candidate document by the target weight value to obtain the field similarity corresponding to each field in the candidate document.
Optionally, the integrating the first matching degree between each candidate document and the current user and the association degree between each candidate document and the current browsing document to obtain the second matching degree between the candidate document and the current user includes:
counting the number of fields in the candidate document;
multiplying the first matching degree corresponding to the candidate document by the field number to obtain an estimated product;
and according to a preset weighting rule, carrying out weighted summation processing on the association degree and the pre-estimated product corresponding to the candidate document to obtain a second matching degree between the candidate document and the current user, wherein the preset weighting rule is used for weighting the association degree according to a first preset weight and weighting the pre-estimated product according to a second preset weight.
Optionally, after obtaining the field similarity, the method further includes:
and generating prompt information corresponding to each field according to the field similarity corresponding to each field in the candidate document and the content carried by the field, wherein the prompt information is used for showing the association state between the field in the candidate document and the same field in the current browsed document, and the association state is used for representing the similar state or the same state.
Optionally, before sequentially recommending and displaying the candidate documents corresponding to the respective second matching degrees according to the descending order of the respective second matching degrees, the method further includes:
classifying candidate documents containing the same prompt information to obtain a plurality of document sets, wherein each document set corresponds to one prompt information and comprises at least one candidate document;
and sequentially recommending and displaying the candidate documents corresponding to the second matching degrees in each document set according to the descending order of the second matching degrees.
Optionally, after obtaining a plurality of document sets, the method further includes:
carrying out duplicate removal processing on candidate documents in each document set, wherein one candidate document after duplicate removal only belongs to one document set;
and under the condition that the deduplication processing is finished, sequentially recommending and displaying the candidate documents corresponding to the second matching degrees in each document set according to the descending order of the second matching degrees.
In a second aspect, the present application provides an associated document recommendation apparatus, including:
an acquisition module, used for acquiring browsing information of a currently browsed document, wherein the browsing information comprises document attribute information of the currently browsed document and current user information, and the user information comprises a browsing behavior log of a current user;
the filtering recall module is used for acquiring a candidate document to be recommended according to the browsing behavior log, wherein the candidate document carries corresponding document attribute information and a first matching degree between the candidate document and the current user;
the comparison module is used for comparing the document attribute information of the currently browsed document with the document attribute information of each candidate document to obtain the association degree between each candidate document and the currently browsed document;
the integration module is used for integrating the first matching degree between each candidate document and the current user and the association degree between each candidate document and the current browsing document to obtain a second matching degree between the candidate document and the current user;
and the recommending module is used for recommending the candidate documents corresponding to the second matching degrees according to the descending order of the second matching degrees.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring browsing information of a currently browsed document, wherein the browsing information comprises document attribute information of the currently browsed document and current user information, and the user information comprises a browsing behavior log of a current user;
obtaining a candidate document to be recommended according to the browsing behavior log, wherein the candidate document carries corresponding document attribute information and a first matching degree between the candidate document and the current user;
comparing the document attribute information of the currently browsed document with the document attribute information of each candidate document to obtain the association degree between each candidate document and the currently browsed document;
integrating a first matching degree between each candidate document and the current user and an association degree between each candidate document and the current browsed document to obtain a second matching degree between the candidate document and the current user;
and recommending the candidate documents corresponding to the second matching degrees according to the descending order of the second matching degrees.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring browsing information of a currently browsed document, wherein the browsing information comprises document attribute information of the currently browsed document and current user information, and the user information comprises a browsing behavior log of a current user;
obtaining a candidate document to be recommended according to the browsing behavior log, wherein the candidate document carries corresponding document attribute information and a first matching degree between the candidate document and the current user;
comparing the document attribute information of the currently browsed document with the document attribute information of each candidate document to obtain the association degree between each candidate document and the currently browsed document;
integrating a first matching degree between each candidate document and the current user and an association degree between each candidate document and the current browsed document to obtain a second matching degree between the candidate document and the current user;
and recommending the candidate documents corresponding to the second matching degrees according to the descending order of the second matching degrees.
The method is applied in the technical field of information retrieval and is used to enhance the knowledge retrieval function. With the associated document recommendation method of the present application, candidate documents that may interest the user are obtained from the user's browsing behavior log, and the user's degree of interest in each candidate document, namely the first matching degree, is estimated from the same log. The document attribute information of each candidate document is then compared in detail with the document attribute information of the currently browsed document, and the comparison result is integrated with the first matching degree to form a second matching degree between the candidate document and the user. Because the second matching degree combines the user's browsing behavior log with the comparison between the candidate document and the currently browsed document, associated documents that match the user's preferences and are more similar to the currently browsed document can be recommended more accurately. The scheme does not need to compare the full-text content of the currently browsed document with the full-text content of the candidate documents, which reduces the amount of comparison computation and improves recommendation speed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. It will be apparent to those skilled in the art that other drawings can be derived from these drawings without inventive effort.
FIG. 1 is a diagram of an application environment of a method for associated document recommendation in one embodiment;
FIG. 2 is a flowchart illustrating a method for associated document recommendation in one embodiment;
FIG. 3 is a block diagram showing the configuration of an associated document recommending apparatus in one embodiment;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
FIG. 1 is a diagram of an application environment of a method for associated document recommendation in one embodiment. Referring to fig. 1, the associated document recommendation method is applied to an associated document recommendation system. The associated document recommendation system includes a terminal 110 and a server 120. The terminal 110 and the server 120 are connected through a network. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or a server cluster composed of a plurality of servers.
In one embodiment, FIG. 2 is a flowchart illustrating an associated document recommendation method. Referring to FIG. 2, an associated document recommendation method is provided. This embodiment is described mainly by taking the application of the method to the terminal 110 in FIG. 1 as an example. The method specifically includes the following steps:
Step S210, obtaining browsing information of a currently browsed document, where the browsing information includes document attribute information of the currently browsed document and current user information, and the user information includes a browsing behavior log of a current user.
In this embodiment, the document attribute information represents the structured information in the currently browsed document. The current user information includes identity information and a browsing behavior log. The identity information includes an identity tag, gender, age, and the like. The browsing behavior log includes a historical behavior log, a current behavior log, and the like, which record the user's browsing behaviors on documents. These browsing behaviors include reading the entire content of a document, reading a document for longer than a preset time, adding a document to favorites, downloading a document after reading it, and the like.
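As a rough illustration of the browsing information described above, the structures below sketch one possible in-memory layout. All class names, field names, and action labels (`BrowsingEvent`, `UserInfo`, `BrowsingInfo`, `"download"`, and so on) are assumptions made for this sketch and do not appear in the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# A field carries either a single string (tag or text field) or a list of items (list field).
FieldValue = Union[str, List[str]]


@dataclass
class BrowsingEvent:
    """One entry of the browsing behavior log (hypothetical layout)."""
    doc_id: str
    action: str       # e.g. "read_full", "read_long", "favorite", "download"
    timestamp: float


@dataclass
class UserInfo:
    """Identity information plus the browsing behavior log."""
    user_id: str
    gender: str = ""
    age: int = 0
    behavior_log: List[BrowsingEvent] = field(default_factory=list)


@dataclass
class BrowsingInfo:
    """Browsing information acquired in step S210."""
    doc_attributes: Dict[str, FieldValue]   # structured fields of the current document
    user: UserInfo


info = BrowsingInfo(
    doc_attributes={
        "author": "alice",
        "title": "Quarterly sales report",
        "tags": ["sales", "q1"],
    },
    user=UserInfo(user_id="u1", behavior_log=[BrowsingEvent("d7", "download", 1.0)]),
)
```

Keeping the attribute fields in a plain dictionary keyed by field name makes it straightforward to compare the same field across two documents later on.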
Step S220, obtaining a candidate document to be recommended according to the browsing behavior log, wherein the candidate document carries corresponding document attribute information and a first matching degree between the candidate document and the current user.
In this embodiment, candidate documents may be obtained from the browsing behavior log using different recommendation schemes, including content-based recommendation, collaborative filtering recommendation, rule-based recommendation, demographic-based recommendation, and hybrid recommendation. Content-based recommendation recommends similar content based on content the user has paid attention to. Collaborative filtering recommendation falls into three categories: user-based collaborative filtering, item-based collaborative filtering, and model-based collaborative filtering.
User-based collaborative filtering mainly considers the similarity between users: by finding the items liked by similar users and predicting the target user's scores for those items, the several highest-scoring items can be identified and recommended to the user. Item-based collaborative filtering is analogous: the similarity between items is computed, the target user's scores for highly similar items are then predicted from the scores the user has already given to some items, and the several highest-scoring similar items are recommended to the user.
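The item-based variant described above can be sketched as follows. This is a minimal illustration, assuming cosine similarity between item score vectors and a similarity-weighted average for prediction; the application does not specify either choice, and the function names and toy data are hypothetical.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> item -> score


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors keyed by user id."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def item_vectors(ratings: Ratings) -> Dict[str, Dict[str, float]]:
    """Transpose user -> item scores into item -> user scores."""
    items: Dict[str, Dict[str, float]] = {}
    for user, scored in ratings.items():
        for item, score in scored.items():
            items.setdefault(item, {})[user] = score
    return items


def predict_item_based(ratings: Ratings, user: str, target: str) -> float:
    """Predict the user's score for `target` from items the user already scored."""
    vectors = item_vectors(ratings)
    target_vec = vectors.get(target, {})
    num = den = 0.0
    for item, score in ratings.get(user, {}).items():
        if item == target:
            continue
        sim = cosine(target_vec, vectors[item])
        num += sim * score
        den += abs(sim)
    return num / den if den else 0.0


ratings = {
    "u1": {"a": 1.0, "b": 1.0},
    "u2": {"a": 1.0, "b": 1.0, "c": 1.0},
    "u3": {"c": 1.0},
}
prediction = predict_item_based(ratings, "u1", "c")
```

The user-based variant follows the same pattern with the roles of users and items swapped.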
Model-based collaborative filtering works with data on m items and n users in which only some user-item pairs have scores and the rest are blank. The existing sparse scores are used to predict the scores for the blank user-item pairs, and the items with higher predicted scores are recommended to the user. Model-based collaborative filtering builds its model using machine learning; specifically, the model may be built with association algorithms, clustering algorithms, classification algorithms, regression algorithms, matrix factorization, neural networks, and the like.
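As one concrete instance of the options listed above, matrix factorization is commonly formulated as follows. The notation is introduced here for illustration and is not taken from the application: $r_{ui}$ is a known score of user $u$ for item $i$, $\mathcal{K}$ is the set of user-item pairs with known scores, $p_u$ and $q_i$ are learned latent vectors, and $\lambda$ is a regularization strength.

```latex
\hat{r}_{ui} = p_u^{\top} q_i,
\qquad
\min_{P,\,Q} \sum_{(u,i) \in \mathcal{K}}
  \left( r_{ui} - p_u^{\top} q_i \right)^2
  + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)
```

Once the vectors are fitted on the known scores, $\hat{r}_{ui}$ can be evaluated for the blank user-item pairs to fill in the missing scores.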
In this embodiment, model-based collaborative filtering is selected to obtain the candidate documents to be recommended. The first matching degree represents the user's score for a candidate document as predicted from the browsing behavior log, that is, the predicted probability that the user is interested in the candidate document.
Step S230, comparing the document attribute information of the currently browsed document with the document attribute information of each candidate document to obtain a degree of association between each candidate document and the currently browsed document.
In this embodiment, a candidate document is merely a document recommended according to the user's past browsing behavior, and it may have no association with the currently browsed document. The fact that the user is browsing the current document indicates that its content is what the user is most interested in at the moment. Although the candidate documents also contain content of interest to the user, candidate documents related to the currently browsed document should be recommended in light of what the user is browsing now. To this end, the document attribute information of each candidate document is compared with that of the currently browsed document to obtain the association degree between them. The full-text content of the currently browsed document therefore does not need to be compared with the full-text content of the candidate documents, which reduces the amount of comparison computation and improves recommendation speed.
Step S240, integrating the first matching degree between each candidate document and the current user and the association degree between each candidate document and the current browsing document to obtain a second matching degree between the candidate document and the current user.
In this embodiment, the second matching degree indicates the probability that the user is interested in the candidate document when both the user's past browsing behavior and the currently browsed document are taken into account. The higher the second matching degree, the higher the probability that the user is interested in the candidate document.
Step S250, recommending the candidate documents corresponding to the second matching degrees according to the descending order of the second matching degrees.
In this embodiment, the candidate documents are displayed in descending order of their second matching degrees. Because the second matching degree combines the user's browsing behavior log with the comparison between the candidate document and the currently browsed document, associated documents that match the user's preferences and are more similar to the currently browsed document can be recommended more accurately, which improves the user's browsing experience.
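The ordering in step S250 amounts to a descending sort on the second matching degree. The sketch below assumes the scores have already been computed; the function name `rank_candidates` and the `top_k` cutoff are hypothetical additions, since the application does not state how many documents are shown.

```python
from typing import List, Tuple


def rank_candidates(scored: List[Tuple[str, float]], top_k: int = 10) -> List[str]:
    """Return candidate document ids in descending order of second matching degree."""
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ordered[:top_k]]


ranked = rank_candidates([("a", 0.2), ("b", 0.9), ("c", 0.5)], top_k=2)
```

Python's `sorted` is stable, so candidates with equal scores keep their input order.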
In one embodiment, the comparing the document attribute information of the currently browsed document with the document attribute information of each candidate document to obtain the association between each candidate document and the currently browsed document includes: comparing the content carried by the field in the current browsed document with the content carried by the same field in the candidate document to obtain a plurality of field similarities; and summing all field similarities between the candidate document and the current browsed document to obtain the association degree between the candidate document and the current browsed document.
In this embodiment, the document attribute information includes a plurality of fields that represent structured information. The fields may be divided into tag fields, text fields, and list fields: a tag field may be a document tag, an author, an uploader, and the like; a text field may be a document main title, a document subtitle, and the like; and a list field may be a statistical list or an item list. The content carried by each field in the currently browsed document is compared with the content carried by the same field in the candidate document. For example, the document title of the currently browsed document is compared with the document title of the candidate document to generate the field similarity corresponding to the document title.
The field similarity represents the similarity between the content of a field in the currently browsed document and the content carried by the same field in the candidate document. Summing the field similarities of all fields gives the association degree, which represents how closely the currently browsed document and the candidate document are related. The association degree is thus obtained without comparing the full-text content of the two documents, which reduces the amount of computation in the comparison, improves recommendation speed, and makes real-time recommendation achievable.
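Written as a formula, the summation described above takes the following form. The symbols are notation introduced here for illustration: $c$ is a candidate document, $d$ is the currently browsed document, $F$ is the set of fields being compared, and $s_f(c, d)$ is the field similarity for field $f$.

```latex
A(c, d) = \sum_{f \in F} s_f(c, d)
```

Because the sum runs over a small, fixed set of fields, its cost grows with the number of fields rather than with the length of the documents.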
In one embodiment, the comparing the content carried by the field in the currently browsed document with the content carried by the same field in the candidate document to obtain a plurality of field similarities includes: comparing the content carried by the field in the current browsed document with the content carried by the same field in the candidate document to obtain the content similarity corresponding to each field in the candidate document; determining a target weight value corresponding to each field in the candidate document according to a preset mapping relation between the field and the weight value; and multiplying the content similarity corresponding to each field in the candidate document by the target weight value to obtain the field similarity corresponding to each field in the candidate document.
In this embodiment, for a tag field, the content in the currently browsed document is compared with the content carried by the same field in the candidate document by directly testing whether the two are equal. If they are equal, the content similarity is 1; if they are not, the content similarity is 0. For example, to compare the authors of the currently browsed document and the candidate document: if the authors are the same, the content similarity between the two author fields is 1; if the authors differ, it is 0.
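The tag-field comparison reduces to an equality test. A minimal sketch, with the hypothetical function name `tag_similarity`; no case folding or whitespace normalization is applied, since the text describes a direct equality check.

```python
def tag_similarity(current: str, candidate: str) -> float:
    """Content similarity for a tag field: 1.0 if the values are equal, else 0.0."""
    return 1.0 if current == candidate else 0.0


same_author = tag_similarity("alice", "alice")
different_author = tag_similarity("alice", "bob")
```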
For a text field, the content in the currently browsed document is compared with the content carried by the same field in the candidate document, and the content similarity is obtained by computing, for example, a Jaccard similarity coefficient or an edit distance. For example, the main title of the currently browsed document is compared with the main title of the candidate document: the closer the resulting content similarity is to 0, the less similar the two main titles are; conversely, the closer it is to 1, the more similar they are.
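Both measures named above can be sketched in a few lines. The choices below are assumptions made for illustration: Jaccard is computed over lowercase whitespace-separated tokens (Chinese titles would need character-level or segmenter-based tokens instead), the edit distance is the Levenshtein distance, and it is turned into a similarity by normalizing by the longer string's length. Two empty strings are scored 0.0, on the view that empty fields carry no evidence of similarity.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient over lowercase whitespace tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row dynamic programming table."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        cur = [i]
        for j, ch_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                   # delete ch_a
                cur[j - 1] + 1,                # insert ch_b
                prev[j - 1] + (ch_a != ch_b),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map edit distance into [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 0.0


title_jaccard = jaccard("annual sales report", "monthly sales report")
title_edit = edit_similarity("kitten", "sitting")
```

Jaccard ignores word order and suits longer titles; edit distance is sensitive to small spelling differences and suits short strings.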
For a list field, the content in the currently browsed document is compared with the content carried by the same field in the candidate document, and the content similarity is obtained by computing the intersection over union of the item lists carried by that list field in the currently browsed document and in the candidate document.
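The list-field comparison can be sketched as an intersection-over-union ratio on the two item lists. The function name `list_iou` is hypothetical, duplicates within a list are ignored by converting to sets, and two empty lists are scored 0.0 as an assumption.

```python
from typing import Iterable


def list_iou(current: Iterable[str], candidate: Iterable[str]) -> float:
    """Intersection over union of two item lists, treated as sets."""
    a, b = set(current), set(candidate)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


tags_similarity = list_iou(["a", "b", "c"], ["b", "c", "d"])
```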
The preset mapping between fields and weight values can be set according to actual business requirements, with each weight reflecting how strongly the field should influence recommendation. If the business does not require documents to be recommended by author, the weight value of the author field is set to 0; if it does, the weight value of the author field is set to a value between 0 and 1. The more strongly recommendation should depend on a field, the higher the weight value of that field.
The content similarity between the content of a field in the currently browsed document and the content of the same field in the candidate document is multiplied by the weight value of that field. This combines how similar the two documents are on the field with how much the field matters for recommendation, and yields the field similarity corresponding to the field.
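As a rough sketch of the weighting step, the snippet below multiplies each content similarity by the weight mapped to its field and then sums the resulting field similarities into the association degree. The field names and weight values in `FIELD_WEIGHTS` are hypothetical, and treating an unmapped field as having weight 0 is an assumption of this example.

```python
from typing import Dict

# Hypothetical mapping between fields and weight values (set per business need).
FIELD_WEIGHTS: Dict[str, float] = {"author": 0.5, "title": 1.0, "tags": 0.8}


def field_similarities(content_sims: Dict[str, float],
                       weights: Dict[str, float]) -> Dict[str, float]:
    """Field similarity = content similarity x weight value of the same field."""
    # Assumption: a field missing from the mapping gets weight 0.0.
    return {field: sim * weights.get(field, 0.0)
            for field, sim in content_sims.items()}


def association_degree(field_sims: Dict[str, float]) -> float:
    """Association degree = sum of all field similarities."""
    return sum(field_sims.values())


content = {"author": 1.0, "title": 0.5, "tags": 0.5}
weighted = field_similarities(content, FIELD_WEIGHTS)
degree = association_degree(weighted)
```

Setting a weight to 0 removes that field from the association degree entirely, which matches the case where the business does not want recommendations by author.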
In one embodiment, the integrating the first matching degree between each candidate document and the current user and the association degree between each candidate document and the current browsing document to obtain the second matching degree between the candidate document and the current user includes: counting the number of fields in the candidate document; multiplying the first matching degree corresponding to the candidate document by the field number to obtain an estimated product; and according to a preset weighting rule, carrying out weighted summation processing on the association degree and the pre-estimated product corresponding to the candidate document to obtain a second matching degree between the candidate document and the current user, wherein the preset weighting rule is used for weighting the association degree according to a first preset weight and weighting the pre-estimated product according to a second preset weight.
In one embodiment, the number of fields in the candidate document is denoted n, the first matching degree between the candidate document and the current user is denoted Y, and the association degree between the candidate document and the currently browsed document is denoted X, where $X = X_1 + X_2 + \dots + X_n$ and $X_1$ to $X_n$ are the field similarities between the content of each field of the candidate document and the content carried by the same field in the currently browsed document. The estimated product is $n \times Y$. With the first preset weight denoted a and the second preset weight denoted b, the weighted sum of the association degree and the estimated product gives the second matching degree $a \times (X_1 + X_2 + \dots + X_n) + b \times n \times Y$. The first preset weight and the second preset weight can be customized according to actual business requirements: if the business places more emphasis on recommending documents according to the user's historical behavior, the second preset weight is set larger than the first preset weight; if it places more emphasis on recommending documents associated with the currently browsed document, the first preset weight is set larger than the second preset weight.
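The weighted combination can be written out directly. The sketch below uses the symbols from the paragraph above (n, Y, X, a, b); the function name and the sample numbers are illustrative assumptions only.

```python
from typing import List


def second_matching_degree_weighted(field_sims: List[float],
                                    first_match: float,
                                    a: float,
                                    b: float) -> float:
    """Return a * (X_1 + ... + X_n) + b * n * Y."""
    n = len(field_sims)                  # number of fields in the candidate document
    association = sum(field_sims)        # X, the association degree
    estimated_product = n * first_match  # n * Y
    return a * association + b * estimated_product


# Illustrative values: three fields, Y = 0.6, emphasis on the browsed document (a > b).
score = second_matching_degree_weighted([0.5, 0.5, 0.4], first_match=0.6, a=0.7, b=0.3)
```

One plausible reading of the factor n, which the source does not state explicitly, is that it puts Y on the same scale as X: when every field weight is at most 1, X ranges from 0 to n while Y alone ranges from 0 to 1.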
In one embodiment, the integrating the first matching degree between each candidate document and the current user and the association degree between each candidate document and the current browsing document to obtain the second matching degree between the candidate document and the current user includes:
multiplying the association degree between the candidate document and the currently browsed document by the first matching degree between the candidate document and the current user to obtain the second matching degree between the candidate document and the current user, namely $X \times Y = (X_1 + X_2 + \dots + X_n) \times Y$. The second matching degree represents the probability that the user is interested in the candidate document, taking into account both the user's previous browsing behavior and the currently browsed document. The first matching degree and the association degree may be combined either by weighted summation or by multiplication.
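The multiplicative variant is shorter still. The sketch below is an illustration under the same notation; the function name and sample values are assumptions.

```python
from typing import List


def second_matching_degree_product(field_sims: List[float],
                                   first_match: float) -> float:
    """Return (X_1 + ... + X_n) * Y."""
    return sum(field_sims) * first_match


related = second_matching_degree_product([0.5, 0.5, 0.4], first_match=0.6)
unrelated = second_matching_degree_product([0.0, 0.0, 0.0], first_match=0.9)
```

A practical difference from the weighted sum is that the product is 0 whenever either factor is 0, so a candidate with no field in common with the currently browsed document scores 0 however well it matches the user's history.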
In one embodiment, after obtaining the field similarity, the method further includes: and generating prompt information corresponding to each field according to the field similarity corresponding to each field in the candidate document and the content carried by the field, wherein the prompt information is used for showing the association state between the field in the candidate document and the same field in the current browsed document, and the association state is used for representing the similar state or the same state.
In this embodiment, the association state between fields is determined by the field similarity. For example, if the field similarity between the author field in the candidate document and the author field in the currently browsed document is 0, the prompt information corresponding to the author field is "different authors"; if it is 1, the prompt information corresponding to the author field is "same author". If the field similarity between the document title in the candidate document and the document title in the currently browsed document is 0.7, the prompt information corresponding to the document title is "similar title". If the field similarity between the item list in the candidate document and the item list in the currently browsed document is 0.6, the prompt information corresponding to the item list is "same tag (tag1, tag2)", where tag1 and tag2 are the tags that appear in both item lists, so that the shared items are shown to the user.
The prompt information shows the user why the candidate document is associated with the currently browsed document, that is, why it is recommended. This helps the user find associated information of interest more quickly and improves the efficiency of information search.
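The prompt strings in these examples could be generated along the following lines. This is a sketch only: the source gives no threshold for when a text field counts as "similar", so `SIMILAR_THRESHOLD` is a hypothetical value, and the function names and the choice to return `None` when there is nothing to show are likewise assumptions.

```python
from typing import List, Optional

SIMILAR_THRESHOLD = 0.5  # hypothetical cut-off; the source does not specify one


def tag_prompt(field: str, similarity: float) -> str:
    """Tag field: 'same <field>' for similarity 1, otherwise 'different <field>s'."""
    return f"same {field}" if similarity == 1.0 else f"different {field}s"


def text_prompt(field: str, similarity: float) -> Optional[str]:
    """Text field: 'similar <field>' when the similarity reaches the threshold."""
    return f"similar {field}" if similarity >= SIMILAR_THRESHOLD else None


def list_prompt(field: str,
                current_items: List[str],
                candidate_items: List[str]) -> Optional[str]:
    """List field: name the items that both documents share."""
    candidate_set = set(candidate_items)
    shared = [item for item in current_items if item in candidate_set]
    return f"same {field} ({', '.join(shared)})" if shared else None
```

For instance, `list_prompt("tag", ["tag1", "tag2", "x"], ["tag2", "tag1", "y"])` yields the string `same tag (tag1, tag2)` used in the example above.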
In an embodiment, before sequentially recommending and presenting the candidate documents corresponding to the respective second matching degrees according to the descending order of the respective second matching degrees, the method further includes: classifying candidate documents containing the same prompt information to obtain a plurality of document sets, wherein each document set corresponds to one prompt information and comprises at least one candidate document; and sequentially recommending and displaying the candidate documents corresponding to the second matching degrees in each document set according to the descending order of the second matching degrees.
In this embodiment, the candidate documents to be recommended may be displayed to the user directly in descending order of the second matching degree. Alternatively, they may first be classified by the reason for their association with the currently browsed document: each kind of prompt information corresponds to one document set, and the candidate documents within each document set are sorted and displayed in descending order of the second matching degree. The user can then quickly find the document to read according to the association reason, which reduces the time spent browsing multiple documents in search of associated information and improves the browsing experience.
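A minimal sketch of the classification step follows, assuming each candidate is represented as a tuple of document identifier, second matching degree, and list of prompt strings; this representation and the function name are choices made for the example, not taken from the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Candidate = Tuple[str, float, List[str]]  # (doc_id, second matching degree, prompts)


def group_by_prompt(candidates: List[Candidate]) -> Dict[str, List[Tuple[str, float]]]:
    """Build one document set per prompt, each sorted by descending second matching degree."""
    sets: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for doc_id, score, prompts in candidates:
        for prompt in prompts:
            sets[prompt].append((doc_id, score))
    return {prompt: sorted(docs, key=lambda d: d[1], reverse=True)
            for prompt, docs in sets.items()}


sample = [
    ("d1", 0.4, ["same author"]),
    ("d2", 0.9, ["same author", "similar title"]),
    ("d3", 0.7, ["similar title"]),
]
document_sets = group_by_prompt(sample)
```

Note that `d2` lands in two document sets here because it carries two prompts.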
In one embodiment, after obtaining the plurality of document sets, the method further comprises: carrying out duplicate removal processing on candidate documents in each document set, wherein one candidate document after duplicate removal only belongs to one document set; and under the condition that the deduplication processing is finished, sequentially recommending and displaying the candidate documents corresponding to the second matching degrees in each document set according to the descending order of the second matching degrees.
In this embodiment, one candidate document may have several kinds of prompt information, so classifying candidate documents by prompt information may place the same candidate document in several document sets. The candidate documents in the document sets are therefore deduplicated so that each candidate document appears in only one document set, which prevents the user from encountering the same document repeatedly and wasting reading time.
FIG. 2 is a flowchart illustrating a method for associated document recommendation in one embodiment. It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time or in sequence, but may be performed at different times, in turn, or alternately with other steps or with at least some of the sub-steps or stages of other steps.
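The source does not say which document set keeps a candidate that appears in several. The sketch below assumes a simple policy, which is purely an assumption of this example: the candidate stays in the first set encountered, and a set that becomes empty is dropped so that every remaining set still contains at least one candidate.

```python
from typing import Dict, List, Set, Tuple

DocumentSets = Dict[str, List[Tuple[str, float]]]  # prompt -> [(doc_id, score), ...]


def deduplicate(document_sets: DocumentSets) -> DocumentSets:
    """Keep each candidate only in the first document set in which it appears."""
    seen: Set[str] = set()
    result: DocumentSets = {}
    for prompt, docs in document_sets.items():
        kept = [(doc_id, score) for doc_id, score in docs if doc_id not in seen]
        seen.update(doc_id for doc_id, _ in kept)
        if kept:  # drop sets emptied by deduplication
            result[prompt] = kept
    return result


sets_before = {
    "same author": [("d2", 0.9), ("d1", 0.4)],
    "similar title": [("d2", 0.9), ("d3", 0.7)],
}
sets_after = deduplicate(sets_before)
```

Because each input list is already sorted by descending second matching degree and filtering preserves order, the deduplicated sets remain sorted.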
In one embodiment, as shown in fig. 3, there is provided an associated document recommending apparatus including:
an acquisition module 310, configured to obtain browsing information of a currently browsed document, where the browsing information includes document attribute information of the currently browsed document and current user information, and the user information includes a browsing behavior log of a current user;
a filtering recall module 320, configured to obtain a candidate document to be recommended according to the browsing behavior log, where the candidate document carries corresponding document attribute information and a first matching degree between the candidate document and the current user;
a comparison module 330, configured to compare the document attribute information of the currently browsed document with the document attribute information of each candidate document, to obtain a degree of association between each candidate document and the currently browsed document;
an integration module 340, configured to integrate a first matching degree between each candidate document and the current user and an association degree between each candidate document and the current browsing document to obtain a second matching degree between the candidate document and the current user;
and a recommending module 350, configured to recommend the candidate documents corresponding to the second matching degrees according to the descending order of the second matching degrees.
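The way the modules hand data to one another can be sketched as a single function. This is an illustrative arrangement only: the `Candidate` class, the `recommend` function, and the idea of passing the recall, comparison, and integration steps in as callables are assumptions of the example, not structures described in the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    doc_id: str
    attributes: Dict[str, str]  # document attribute information
    first_match: float          # first matching degree with the current user


def recommend(current_attributes: Dict[str, str],
              behavior_log: List[str],
              recall: Callable[[List[str]], List[Candidate]],
              compare: Callable[[Dict[str, str], Dict[str, str]], float],
              integrate: Callable[[float, float], float]) -> List[Tuple[str, float]]:
    """Chain the recall, comparison, integration, and recommendation steps."""
    candidates = recall(behavior_log)  # role of the filtering recall module 320
    scored = []
    for cand in candidates:
        association = compare(current_attributes, cand.attributes)  # module 330
        second_match = integrate(cand.first_match, association)     # module 340
        scored.append((cand.doc_id, second_match))
    # Role of the recommending module 350: descending second matching degree.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stub inputs for illustration only.
pool = [Candidate("d1", {"author": "A"}, 0.9), Candidate("d2", {"author": "B"}, 0.5)]
ranked = recommend(
    {"author": "B"},
    [],
    recall=lambda log: pool,
    compare=lambda cur, cand: 1.0 if cur["author"] == cand["author"] else 0.0,
    integrate=lambda first, assoc: first * assoc,
)
```

With these stubs, `d2` ranks ahead of `d1` even though its first matching degree is lower, because only `d2` shares an author with the currently browsed document.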
In one embodiment, the document attribute information has a plurality of fields, and the comparison module 330 is further configured to:
comparing the content carried by the field in the current browsed document with the content carried by the same field in the candidate document to obtain a plurality of field similarities;
and summing all field similarities between the candidate document and the current browsed document to obtain the association degree between the candidate document and the current browsed document.
In one embodiment, the comparison module 330 is further configured to:
comparing the content carried by the field in the current browsed document with the content carried by the same field in the candidate document to obtain the content similarity corresponding to each field in the candidate document;
determining a target weight value corresponding to each field in the candidate document according to a preset mapping relation between the field and the weight value;
and multiplying the content similarity corresponding to each field in the candidate document by the target weight value to obtain the field similarity corresponding to each field in the candidate document.
In one embodiment, the integration module 340 is further configured to:
counting the number of fields in the candidate document;
multiplying the first matching degree corresponding to the candidate document by the field number to obtain an estimated product;
and according to a preset weighting rule, carrying out weighted summation processing on the association degree and the pre-estimated product corresponding to the candidate document to obtain a second matching degree between the candidate document and the current user, wherein the preset weighting rule is used for weighting the association degree according to a first preset weight and weighting the pre-estimated product according to a second preset weight.
In one embodiment, the apparatus further includes:
a prompt module, configured to generate, after the field similarity is obtained, prompt information corresponding to each field according to the field similarity corresponding to each field in the candidate document and the content carried by the field, wherein the prompt information is used for showing the association state between the field in the candidate document and the same field in the currently browsed document, and the association state is used for representing a similar state or a same state.
In one embodiment, the prompt module is further configured to:
classifying candidate documents containing the same prompt information to obtain a plurality of document sets, wherein each document set corresponds to one prompt information and comprises at least one candidate document;
and sequentially recommending and displaying the candidate documents corresponding to the second matching degrees in each document set according to the descending order of the second matching degrees.
In one embodiment, the prompt module is further configured to:
carrying out duplicate removal processing on candidate documents in each document set, wherein one candidate document after duplicate removal only belongs to one document set;
and under the condition that the deduplication processing is finished, sequentially recommending and displaying the candidate documents corresponding to the second matching degrees in each document set according to the descending order of the second matching degrees.
FIG. 4 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the terminal 110 in fig. 1. As shown in fig. 4, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the associated document recommendation method. The internal memory may also have stored therein a computer program that, when executed by the processor, causes the processor to perform the associated document recommendation method. The display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device can be a touch layer covering the display screen; a key, a trackball, or a touch pad arranged on the housing of the computer device; or an external keyboard, touch pad, or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, the associated document recommendation apparatus provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 4. The memory of the computer device may store various program modules constituting the associated document recommendation apparatus, such as the acquisition module 310, the filtering recall module 320, the comparison module 330, the integration module 340 and the recommendation module 350 shown in fig. 3. The computer program constituted by the respective program modules causes the processor to execute the steps in the associated document recommendation method of the respective embodiments of the present application described in the present specification.
The computer device shown in fig. 4 may, through the acquisition module 310 in the associated document recommending apparatus shown in fig. 3, obtain browsing information of a currently browsed document, wherein the browsing information includes document attribute information of the currently browsed document and current user information, and the user information includes a browsing behavior log of a current user. The computer device may, through the filtering recall module 320, obtain a candidate document to be recommended according to the browsing behavior log, where the candidate document carries corresponding document attribute information and a first matching degree between the candidate document and the current user. The computer device may, through the comparison module 330, compare the document attribute information of the currently browsed document with the document attribute information of each candidate document to obtain the association degree between each candidate document and the currently browsed document. The computer device may, through the integration module 340, integrate the first matching degree between each candidate document and the current user and the association degree between each candidate document and the currently browsed document to obtain a second matching degree between the candidate document and the current user. The computer device may, through the recommending module 350, recommend the candidate documents corresponding to each of the second matching degrees in descending order of the second matching degrees.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the above embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the method of any of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for recommending associated documents, the method comprising:
acquiring browsing information of a currently browsed document, wherein the browsing information comprises document attribute information of the currently browsed document and current user information, and the user information comprises a browsing behavior log of a current user;
obtaining a candidate document to be recommended according to the browsing behavior log, wherein the candidate document carries corresponding document attribute information and a first matching degree between the candidate document and the current user;
comparing the document attribute information of the currently browsed document with the document attribute information of each candidate document to obtain the association degree between each candidate document and the currently browsed document;
integrating a first matching degree between each candidate document and the current user and an association degree between each candidate document and the current browsed document to obtain a second matching degree between the candidate document and the current user;
and recommending the candidate documents corresponding to the second matching degrees according to the descending order of the second matching degrees.
2. The method of claim 1, wherein the document attribute information has a plurality of fields, and the comparing the document attribute information of the currently viewed document with the document attribute information of each candidate document to obtain the association between each candidate document and the currently viewed document comprises:
comparing the content carried by the field in the current browsed document with the content carried by the same field in the candidate document to obtain a plurality of field similarities;
and summing all field similarities between the candidate document and the current browsed document to obtain the association degree between the candidate document and the current browsed document.
3. The method of claim 2, wherein the comparing the content carried by the field in the currently viewed document with the content carried by the same field in the candidate document to obtain a plurality of field similarities comprises:
comparing the content carried by the field in the current browsed document with the content carried by the same field in the candidate document to obtain the content similarity corresponding to each field in the candidate document;
determining a target weight value corresponding to each field in the candidate document according to a preset mapping relation between the field and the weight value;
and multiplying the content similarity corresponding to each field in the candidate document by the target weight value to obtain the field similarity corresponding to each field in the candidate document.
4. The method of claim 2, wherein the integrating the first degree of matching between each candidate document and the current user and the degree of association between each candidate document and the current browsing document to obtain the second degree of matching between the candidate document and the current user comprises:
counting the number of fields in the candidate document;
multiplying the first matching degree corresponding to the candidate document by the field number to obtain an estimated product;
and according to a preset weighting rule, carrying out weighted summation processing on the association degree and the pre-estimated product corresponding to the candidate document to obtain a second matching degree between the candidate document and the current user, wherein the preset weighting rule is used for weighting the association degree according to a first preset weight and weighting the pre-estimated product according to a second preset weight.
5. The method of claim 2, wherein after obtaining the field similarity, the method further comprises:
and generating prompt information corresponding to each field according to the field similarity corresponding to each field in the candidate document and the content carried by the field, wherein the prompt information is used for showing the association state between the field in the candidate document and the same field in the current browsed document, and the association state is used for representing the similar state or the same state.
6. The method according to claim 5, wherein before recommending and presenting the candidate documents corresponding to the second matching degrees in sequence according to the descending order of the second matching degrees, the method further comprises:
classifying candidate documents containing the same prompt information to obtain a plurality of document sets, wherein each document set corresponds to one prompt information and comprises at least one candidate document;
and sequentially recommending and displaying the candidate documents corresponding to the second matching degrees in each document set according to the descending order of the second matching degrees.
7. The method of claim 6, wherein after obtaining the plurality of document collections, the method further comprises:
carrying out duplicate removal processing on candidate documents in each document set, wherein one candidate document after duplicate removal only belongs to one document set;
and under the condition that the deduplication processing is finished, sequentially recommending and displaying the candidate documents corresponding to the second matching degrees in each document set according to the descending order of the second matching degrees.
8. An associated document recommendation apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire browsing information of a currently browsed document, wherein the browsing information comprises document attribute information of the currently browsed document and current user information, and the user information comprises a browsing behavior log of a current user;
a filtering recall module, configured to acquire a candidate document to be recommended according to the browsing behavior log, wherein the candidate document carries corresponding document attribute information and a first matching degree between the candidate document and the current user;
a comparison module, configured to compare the document attribute information of the currently browsed document with the document attribute information of each candidate document to obtain the association degree between each candidate document and the currently browsed document;
an integration module, configured to integrate the first matching degree between each candidate document and the current user and the association degree between each candidate document and the currently browsed document to obtain a second matching degree between the candidate document and the current user;
and a recommending module, configured to recommend the candidate documents corresponding to the second matching degrees according to the descending order of the second matching degrees.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202110427735.2A 2021-04-21 2021-04-21 Associated document recommendation method and device, computer equipment and storage medium Pending CN113139049A (en)

Publications (1)

Publication Number Publication Date
CN113139049A true CN113139049A (en) 2021-07-20

Family

ID=76813024

Country Status (1)

Country Link
CN (1) CN113139049A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113313539A (en) * 2021-07-29 2021-08-27 广东联讯科技发展股份有限公司 Digital marketing service management platform based on big data

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059448A1 (en) * 2006-09-06 2008-03-06 Walter Chang System and Method of Determining and Recommending a Document Control Policy for a Document
CN101529373A (en) * 2006-09-06 2009-09-09 奥多比公司 System and method of determining and recommending a document control policy for a document
CN107944033A (en) * 2017-12-13 2018-04-20 北京百度网讯科技有限公司 Associate topic and recommend method and apparatus
CN111475729A (en) * 2020-04-07 2020-07-31 腾讯科技(深圳)有限公司 Search content recommendation method and device

Similar Documents

Publication Publication Date Title
CN111199428B (en) Commodity recommendation method and device, storage medium and computer equipment
US11663254B2 (en) System and engine for seeded clustering of news events
CN111680219B (en) Content recommendation method, device, equipment and readable storage medium
CN109145215B (en) Network public opinion analysis method, device and storage medium
Marinho et al. Collaborative tag recommendations
Kim et al. Collaborative filtering based on collaborative tagging for enhancing the quality of recommendation
CN107391687B (en) Local log website-oriented hybrid recommendation system
Kim et al. A scientometric review of emerging trends and new developments in recommendation systems
US8234311B2 (en) Information processing device, importance calculation method, and program
Symeonidis et al. A unified framework for providing recommendations in social tagging systems based on ternary semantic analysis
CN111080398B (en) Commodity recommendation method, commodity recommendation device, computer equipment and storage medium
Ma et al. Exploring performance of clustering methods on document sentiment analysis
Kong et al. Predicting search intent based on pre-search context
US20080222105A1 (en) Entity recommendation system using restricted information tagged to selected entities
CN108228745B (en) Recommendation algorithm and device based on collaborative filtering optimization
Li et al. A hybrid recommendation system for Q&A documents
US11615494B2 (en) Intellectual property recommending method and system
Movahedian et al. Folksonomy-based user interest and disinterest profiling for improved recommendations: An ontological approach
CA2956627A1 (en) System and engine for seeded clustering of news events
CN113032668A (en) Product recommendation method, device and equipment based on user portrait and storage medium
US9552415B2 (en) Category classification processing device and method
Malhotra et al. A comprehensive review from hyperlink to intelligent technologies based personalized search systems
CN113139049A (en) Associated document recommendation method and device, computer equipment and storage medium
Yin et al. ISART: a generic framework for searching books with social information
CN112685635A (en) Item recommendation method, device, server and storage medium based on classification label

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination