WO2021027149A1 - Portrait similarity-based information retrieval recommendation method and device and storage medium - Google Patents

Portrait similarity-based information retrieval recommendation method and device and storage medium

Info

Publication number
WO2021027149A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
similarity
portrait
query
portraits
Prior art date
Application number
PCT/CN2019/117794
Other languages
French (fr)
Chinese (zh)
Inventor
刘利
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021027149A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering

Definitions

  • This application relates to the field of data analysis technology, and in particular to an information retrieval recommendation method, device, system and computer-readable storage medium based on the similarity of user portraits.
  • CIR Collaborative Information Retrieval
  • the CIR collaborative information retrieval system can analyze user interaction history records to more effectively respond to subsequent user queries.
  • When two users send the same query to the CIR system at the same time, they may be interested in two different document lists, because their goals and behavior characteristics may differ.
  • CIR therefore faces the problem of personalized query recommendation.
  • information retrieval is the main way for users to query and obtain information. It is a method and means to find information.
  • Information storage is the basis for information retrieval.
  • the information to be stored here includes original document data, pictures, videos, and audio. In order to achieve information retrieval, the original information must be converted into computer language and stored in the database, otherwise machine identification cannot be performed.
  • After the user enters a query request according to his or her intention, the retrieval system searches the database for information related to the query, calculates the similarity of the information through a certain matching mechanism, and outputs the information in descending order of similarity.
  • the inventor realizes that the existing information retrieval methods are either relatively complicated, or have poor retrieval accuracy and insufficient personalization, resulting in poor recommendation effects and poor user experience.
  • This application provides a method, electronic device, system and computer-readable storage medium for information retrieval and recommendation based on the similarity of user portraits.
  • the main purpose of the method is to obtain the similarity of user portraits through the maximum matching of weighted bipartite graphs, and thereby to obtain the similarity between different users.
  • This method can dynamically build user communities in a collaborative information retrieval environment and apply it to personalized information retrieval, improve retrieval accuracy, and optimize user experience.
  • this application provides an information retrieval recommendation method based on the similarity of user portraits, which is applied to an electronic device, and the method includes:
  • information retrieval recommendation is performed for the user.
  • the present application also provides an electronic device, the electronic device comprising: a memory and a processor, the memory includes an information retrieval recommendation program based on the similarity of user portraits, and the following steps are implemented when the information retrieval recommendation program based on the similarity of user portraits is executed by the processor:
  • information retrieval recommendation is performed for the user.
  • this application also provides an information retrieval recommendation system based on the similarity of portraits, including:
  • the user portrait similarity determination unit is used to obtain user portraits of different users and determine the user portrait similarity between user portraits;
  • the dynamic community creation unit is used to create user dynamic communities based on the similarity of user portraits, so that users with similar portraits belong to the same user dynamic community;
  • the search recommendation unit is used to perform information search and recommendation for users according to the user's dynamic community and the user's query sentence.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium includes an information retrieval recommendation program based on the similarity of user portraits.
  • when the information retrieval recommendation program based on the similarity of user portraits is executed by the processor, the steps of the above information retrieval recommendation method based on the similarity of user portraits are implemented.
  • the method, device, system and computer-readable storage medium for information retrieval and recommendation based on the similarity of user portraits proposed in this application construct a weighted bipartite graph based on user portraits, and obtain the maximum weighted matching value between user portraits by using the maximum matching of the weighted bipartite graph. A user community can thus be dynamically constructed based on the similarity of user portraits in a collaborative information retrieval environment, and personalized information retrieval recommendations can be made based on the user community, which can improve user retrieval accuracy, optimize user experience, and achieve personalized recommendations.
  • FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of an information retrieval recommendation method based on the similarity of user portraits according to the present application;
  • FIG. 2 is a schematic diagram of modules of a preferred embodiment of an information retrieval recommendation system based on the similarity of user portraits according to the present application;
  • FIG. 3 is a flowchart of a preferred embodiment of an information retrieval recommendation method based on the similarity of user portraits according to the present application;
  • Figure 4 is a flowchart of a method for calculating the similarity of user portraits based on graph algorithms;
  • Figure 5 is a bipartite graph constructed based on user portraits of two different users.
  • This application provides an information retrieval and recommendation method based on the similarity of user portraits, which is applied to an electronic device 1.
  • FIG. 1 is a schematic diagram of the application environment of the preferred embodiment of the information retrieval recommendation method based on the similarity of user portraits of this application.
  • the electronic device 1 may be a terminal device with arithmetic function, such as a server, a smart phone, a tablet computer, a portable computer, a desktop computer, and the like.
  • the electronic device 1 includes a processor 12, a memory 11, a network interface 14 and a communication bus 15.
  • the memory 11 includes at least one type of readable storage medium.
  • At least one type of readable storage medium may be a non-volatile storage medium such as flash memory, hard disk, multimedia card, card-type memory 11, and the like.
  • the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1.
  • the readable storage medium may also be an external memory 11 of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the electronic device 1.
  • SD Secure Digital
  • the readable storage medium of the memory 11 is generally used to store the information retrieval recommendation program 10 based on the similarity of user portraits installed in the electronic device 1 and the like.
  • the memory 11 can also be used to temporarily store data that has been output or will be output.
  • the processor 12 may be a central processing unit (CPU), a microprocessor or other data processing chip, which is used to run the program code or process the data stored in the memory 11, for example, to execute the information retrieval recommendation program 10 based on the similarity of user portraits.
  • CPU central processing unit
  • the network interface 14 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the communication bus 15 is used to realize the connection and communication between these components.
  • FIG. 1 only shows the electronic device 1 with the components 11-15, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the electronic device 1 may also include a user interface.
  • the user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with a voice recognition function, and a voice output device such as a speaker or earphones.
  • the user interface may also include a standard wired interface and a wireless interface.
  • the electronic device 1 may also include a display, and the display may also be called a display screen or a display unit.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an organic light-emitting diode (OLED) touch device.
  • OLED Organic Light-Emitting Diode
  • the display is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • the electronic device 1 further includes a touch sensor.
  • the area provided by the touch sensor for the user to perform a touch operation is called a touch area.
  • the touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like.
  • the touch sensor includes not only a contact type touch sensor, but also a proximity type touch sensor and the like.
  • the touch sensor may be a single sensor, or may be, for example, a plurality of sensors arranged in an array.
  • the area of the display of the electronic device 1 may be the same as or different from the area of the touch sensor.
  • the display and the touch sensor are stacked to form a touch display screen. The device detects the touch operation triggered by the user based on the touch screen.
  • the electronic device 1 may also include a radio frequency (RF) circuit, a sensor, an audio circuit, etc., which will not be repeated here.
  • RF radio frequency
  • the memory 11 as a computer storage medium may include an operating system and an information retrieval recommendation program 10 based on the similarity of user portraits; when the processor 12 executes the information retrieval recommendation program 10 based on the similarity of user portraits stored in the memory 11, the following steps are implemented:
  • information retrieval recommendation is performed for the user.
  • in obtaining the user portraits of different users and determining the user portrait similarity between them, the user portrait similarity is obtained by the user portrait similarity calculation method based on the graph algorithm;
  • the user portrait similarity calculation method based on the graph algorithm includes the following steps:
  • P(X) is the user portrait of user X
  • P(Y) is the user portrait of user Y
  • the vertex e of P(X) is connected to the vertex é of P(Y) through the edge (e, é);
  • the user portrait similarity of user X and user Y is obtained according to the maximum weighted matching value.
  • the user portrait P(X) of user X is stored as:
  • the user portrait P(Y) of user Y is stored as:
  • the vertex e of the user portrait P(X) includes a corresponding first query element and a first document element
  • the vertex é of the user portrait P(Y) includes a corresponding second query element and a second document element
  • the process of obtaining the similarity between the vertex e of the user portrait P(X) and the vertex é of the user portrait P(Y) includes:
  • the similarity between the vertex e and the vertex é is determined based on the first similarity and the second similarity.
  • the first similarity between the first query element and the second query element is obtained through edit distance algorithm, Jaccard coefficient algorithm, TF algorithm, TFIDF algorithm, or Word2Vec algorithm;
  • the second similarity between the first document element and the second document element is obtained by the TFIDF algorithm or the space vector-based cosine algorithm.
  • the user portrait P(X) of the user X includes elements A, B, C, D, and E, wherein the elements A, B, C, D, and E include the first query element and the first document element;
  • the user portrait P(Y) of user Y includes elements 1, 2, 3, 4, and 5, where elements 1, 2, 3, 4, and 5 include the second query element and the second document element;
  • Step 1 Obtain all weighted matching values of the weighted bipartite graph by the following formulas:
  • $M_1 = w(A,1) + w(B,3) + w(C,2) + w(D,4) + w(E,5)$
  • $M_2 = w(A,1) + w(B,3) + w(C,5) + w(D,4) + w(E,2)$
  • $M_3 = w(A,1) + w(B,4) + w(C,2) + w(D,3) + w(E,5)$
  • $M_4 = w(A,1) + w(B,4) + w(C,5) + w(D,3) + w(E,2)$
  • w(i, j) represents the similarity between element i and element j or the weight of edge ij;
  • Step 2 Determine the maximum weighted matching value from all weighted matching values.
  • a user community can be created based on the user portrait similarity between the user P(X) and the user P(Y), and the user query results can be ranked and recommended according to the created user community.
  • the steps of querying based on the similarity of user portraits between user P(X) and user P(Y) include:
  • Step 1 Find a historical query record A similar to query q.
  • $U_m$ represents a user
  • $q_m$ is a query of the user $U_m$
  • $D_{q_m}$ is all documents related to the query $q_m$
  • $P(U)$ is the user portrait of the user U
  • $P(U_i)$ is the user portrait of the user $U_i$
  • $S(P(U), P(U_i))$ is the user portrait similarity between user U and user $U_i$
  • $s(q, q_i)$ is the similarity between sentence q and sentence $q_i$. Both of the above similarities can be obtained by the user portrait similarity calculation method based on the graph algorithm.
  • Step 2 Calculate all document collections related to query q.
  • Step 3 For each document d in the corpus, calculate the similarity between d and q to obtain the similarity r(d, q);
  • Step 4 Calculate the final ranking of each document in the corpus:
  • a and b are setting coefficients.
  • Step 5 According to the final ranking, the documents are sorted to construct an output list, and the results for the sentence q that user U needs to query are output according to this list.
  • the electronic device 1 proposed in the above embodiment obtains the similarity between user portraits through the maximum matching of the weighted bipartite graph, and can dynamically construct a user community based on the similarity of user portraits in a collaborative information retrieval environment; performing personalized information retrieval recommendation according to the user community can improve user retrieval accuracy, optimize user experience, and achieve personalized recommendation.
  • this application also provides an information retrieval recommendation system based on the similarity of user portraits.
  • FIG. 2 is a program module diagram of a preferred embodiment of the information retrieval recommendation system based on the similarity of user portraits in the embodiment of this application.
  • the information retrieval recommendation system based on the similarity of user portraits can be divided into:
  • the user portrait similarity determination unit 110 is configured to obtain user portraits of different users and determine the user portrait similarity between the user portraits;
  • the dynamic community creation unit 120 is configured to create a user dynamic community based on the similarity of user portraits, so that users with similar portraits belong to the same user dynamic community;
  • the search recommendation unit 130 is configured to perform information search and recommendation for users based on the user's dynamic community and the user's query sentence.
  • the user portrait similarity determination unit 110 further includes:
  • the user portrait storage module 111 is configured to store the user portrait P as a collection of pairs $(q, D_q)$; where q represents any query record of the user, and $D_q$ represents all documents related to the query record q;
  • the weighted bipartite graph construction module 112 is used to construct a weighted bipartite graph based on the user portrait P(X) and the user portrait P(Y) to be processed; where P(X) is the user portrait of user X, P(Y) is the user portrait of user Y, and vertex e of P(X) is connected to vertex é of P(Y) through edge (e, é);
  • the similarity acquisition module 113 is configured to acquire the similarity between the vertex e of the user portrait P(X) and the vertex é of the user portrait P(Y) based on the weighted bipartite graph;
  • the weight determination module 114 is configured to determine the weight of the edge (e, é) according to the similarity between the vertex e of P(X) and the vertex é of P(Y);
  • the maximum weighted matching value obtaining module 115 is configured to obtain the maximum weighted matching value between the user portrait P(X) and the user portrait P(Y) based on the weight of the edge (e, é);
  • the user portrait similarity determination module 116 is configured to obtain the user portrait similarity of the user X and the user Y according to the maximum weighted matching value.
  • the user portrait P(X) of user X is stored as:
  • the user portrait P(Y) of user Y is stored as:
  • the vertex e of the user portrait P(X) includes the corresponding first query element and the first document element
  • the vertex é of the user portrait P(Y) includes the corresponding second query element and the second document element
  • the similarity acquisition module 113 includes:
  • the query element and document element similarity acquisition module 1131 configured to acquire the first similarity between the first query element and the second query element, and to acquire the second similarity between the first document element and the second document element;
  • the similarity determination module 1132 between vertices is used to determine the similarity between the vertex e and the vertex é based on the first similarity and the second similarity.
  • the query element and document element similarity acquisition module 1131 includes:
  • the first similarity acquisition module is used to acquire the first similarity between the first query element and the second query element through the edit distance algorithm, the Jaccard coefficient algorithm, the TF algorithm, the TFIDF algorithm, or the Word2Vec algorithm;
  • the second similarity acquisition module is configured to acquire the second similarity between the first document element and the second document element through the TFIDF algorithm or the space vector-based cosine algorithm.
  • the user portrait P(X) of user X includes elements A, B, C, D, and E, where elements A, B, C, D, and E include the first query element and the first document element;
  • the user portrait P(Y) of user Y includes elements 1, 2, 3, 4, and 5, where elements 1, 2, 3, 4, and 5 include the second query element and the second document element;
  • Step 1 Obtain all weighted matching values of the weighted bipartite graph by the following formulas:
  • $M_1 = w(A,1) + w(B,3) + w(C,2) + w(D,4) + w(E,5)$
  • $M_2 = w(A,1) + w(B,3) + w(C,5) + w(D,4) + w(E,2)$
  • $M_3 = w(A,1) + w(B,4) + w(C,2) + w(D,3) + w(E,5)$
  • $M_4 = w(A,1) + w(B,4) + w(C,5) + w(D,3) + w(E,2)$
  • w(i, j) represents the similarity between element i and element j or the weight of edge ij;
  • Step 2 Determine the maximum weighted matching value from all weighted matching values.
  • this application also provides an information retrieval recommendation method based on the similarity of user portraits.
  • FIG. 3 is a flowchart of a preferred embodiment of an information retrieval recommendation method based on the similarity of user portraits according to this application.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the information retrieval recommendation method based on the similarity of user portraits includes the following steps:
  • Step S11 Obtain user portraits of different users, and determine the user portrait similarity between the user portraits.
  • Step S12 Create a user dynamic community based on the similarity of user portraits, so that users with similar portraits belong to the same user dynamic community.
  • Step S13 Perform information search and recommendation on the user according to the user's dynamic community and the user's query sentence.
  • step S11 further includes the following steps:
  • Step S101 Store the user portrait P as a collection of pairs $(q, D_q)$; where q represents any query record of the user, and $D_q$ represents all documents related to the query record q.
  • user portraits are also known as user roles.
  • As an effective tool for delineating target users and connecting user demands with design directions, user portraits have been widely used in various fields. In practice, the simplest and most everyday words are often used to link users' attributes, behaviors and expectations. As virtual representatives of actual users, the user roles formed by user portraits are not constructed apart from the product and the market; they need to be representative of the product's main audience and target groups.
  • the user portrait P(X) of user X can be stored as:
  • the user portrait P(Y) of user Y can be stored as:
  • the User Profile Similarity (UPS) between user X and user Y is the similarity between the two sets P(X) and P(Y) above.
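As a minimal sketch of the storage scheme described in step S101, a user portrait can be held as a list of (q, D_q) pairs. The names `PortraitElement`, `query`, `documents` and `UserPortrait`, as well as the sample data, are illustrative assumptions and do not come from the application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PortraitElement:
    """One (q, D_q) pair: a query record and the documents related to it."""
    query: str
    documents: Tuple[str, ...] = ()


# A user portrait P is modelled here as a list of (q, D_q) elements.
UserPortrait = List[PortraitElement]

# Hypothetical sample data, for illustration only.
portrait_x: UserPortrait = [
    PortraitElement("car insurance quote", ("doc_12", "doc_40")),
    PortraitElement("health plan comparison", ("doc_7",)),
]
```

Each element of such a list would later become one vertex on one side of the weighted bipartite graph.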
  • Step S102 Construct a weighted bipartite graph based on the user portrait P(X) and user portrait P(Y) to be processed; where P(X) is the user portrait of user X, P(Y) is the user portrait of user Y, and the vertex e of P(X) is connected to the vertex é of P(Y) through the edge (e, é).
  • A bipartite graph, also called a bigraph, is a special model in graph theory.
  • the elements of the user portrait P(X) form part of the graph G, and the elements of P(Y) form another part of the graph.
  • Each vertex e of P(X) is connected to each vertex é of P(Y) by an edge (e, é).
  • the weight of the edge (e, é) is equal to the similarity between the vertices (or elements) e and é.
  • the weight of the edge (e, é) is related to the element type, and the element type includes query or document.
  • the vertex e of the user portrait P(X) includes a corresponding first query element and a first document element
  • the vertex é of the user portrait P(Y) includes a corresponding second query element and a second document element
  • the process of obtaining the similarity between the vertex e of the user portrait P(X) and the vertex é of the user portrait P(Y) includes:
  • Step S103 Obtain the similarity between the vertex e of the user portrait P(X) and the vertex é of the user portrait P(Y) based on the weighted bipartite graph.
  • each vertex e of the user portrait P(X) includes a corresponding query element and document element
  • each vertex of the user portrait P(Y) also includes a corresponding query element and document element.
  • the similarity between vertex e and vertex é is obtained from the similarity between their elements: based on the similarity between the query elements and the similarity between the document elements of the user portrait P(X) and the user portrait P(Y), the similarity between every pair of vertices e and é can be determined.
  • the current method for obtaining the similarity of query sentences mainly includes: Edit distance algorithm, Jaccard coefficient algorithm, TF algorithm, TFIDF algorithm, Word2Vec algorithm, etc.
  • Edit distance, also known as the Levenshtein distance, is the minimum number of edit operations required to convert one string into another. The greater the distance between two strings, the more different they are.
  • the permitted editing operations include replacing one character with another, inserting a character, deleting a character, etc.
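The edit distance described above can be sketched with the standard dynamic-programming formulation. The function names and the normalisation into a similarity in [0, 1] are assumptions made for illustration; the application does not state how a distance is turned into a similarity.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 when every
    character position has to be edited."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

For example, `levenshtein("kitten", "sitting")` needs two substitutions and one insertion.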
  • The Jaccard coefficient, also called the Jaccard index or Jaccard similarity coefficient, is used to compare the similarity and difference between finite sample sets.
  • the calculation method of the Jaccard coefficient is very simple. It is the size of the intersection of two samples divided by the size of their union. When the two samples are exactly the same, the result is 1, and when the two samples are completely different, the result is 0.
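A short sketch of the Jaccard coefficient applied to query sentences follows. Splitting queries into lowercase whitespace-separated tokens and treating two empty sets as identical are assumptions of this example, not choices stated in the application.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)


def query_jaccard(q1: str, q2: str) -> float:
    """Jaccard coefficient of two queries, using word tokens as the samples."""
    return jaccard(set(q1.lower().split()), set(q2.lower().split()))
```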
  • the similarity calculation methods between the documents of the user profile P(X) and the user profile P(Y) mainly include the TFIDF algorithm and the cosine algorithm based on space vectors.
  • the first similarity between the first query element and the second query element is obtained by the edit distance algorithm, the Jaccard coefficient algorithm, the TF algorithm, the TFIDF algorithm, or the Word2Vec algorithm; the second similarity between the first document element and the second document element is obtained by the TFIDF algorithm or the cosine algorithm based on space vectors.
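For the document side, the vector-space cosine measure can be sketched as below. This example uses raw term-frequency vectors over whitespace tokens, which is a simplifying assumption; a TFIDF weighting, as also named in the text, would additionally rescale each term by its inverse document frequency over the corpus.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of two documents."""
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())
    # Only terms present in both documents contribute to the dot product.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```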
  • Step S104 Determine the weight of the edge (e, é) according to the similarity between the vertex e of P(X) and the vertex é of P(Y).
  • the weight of the edge (e, é) can be set equal to the similarity between the vertex e of P(X) and the vertex é of P(Y).
  • Step S105 Obtain the maximum weighted matching value between the user portrait P(X) and the user portrait P(Y) based on the weight of the edge (e, é).
  • the maximum matching of a bipartite graph is defined as follows: given a bipartite graph G and a subgraph M of G, if no two edges in the edge set of M are incident to the same vertex, then M is called a matching. Finding a matching with the largest number of edges is called the maximum matching problem of the graph. If every vertex of the graph is incident to an edge of a matching, the matching is called a complete matching, also known as a perfect matching.
  • the user portrait P(X) of user X includes elements A, B, C, D, and E, where A, B, C, D, and E contain the first query element and the first document element, and the user portrait of user Y P(Y) contains elements 1, 2, 3, 4, and 5, of which 1, 2, 3, 4, and 5 contain the second query element and the second document element.
  • the bipartite graph constructed from the user portrait P(X) and the user portrait P(Y) is shown in Figure 4.
  • the weighted matching values of the maximum matchings are calculated by the following formulas:
  • $M_1 = w(A,1) + w(B,3) + w(C,2) + w(D,4) + w(E,5)$
  • $M_2 = w(A,1) + w(B,3) + w(C,5) + w(D,4) + w(E,2)$
  • $M_3 = w(A,1) + w(B,4) + w(C,2) + w(D,3) + w(E,5)$
  • $M_4 = w(A,1) + w(B,4) + w(C,5) + w(D,3) + w(E,2)$
  • w(i, j) represents the similarity between element i and element j, that is, the weight of edge (i, j); for example, w(A, 1) represents the similarity between element A and element 1, which is also the weight of the edge between them. w(B,3), w(C,2), ..., w(E,5) are defined similarly.
  • the maximum weighted matching value is determined from all the weighted matching values.
  • the maximum weighted matching value is 3.5.
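The enumerate-then-take-the-maximum procedure can be sketched as a brute-force search over all complete matchings. The weight matrix below is hypothetical, the function name is an assumption, and the sketch assumes both portraits have the same number of elements with a weight (possibly 0) for every pair.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def max_weight_matching(weights: Sequence[Sequence[float]]) -> Tuple[float, Tuple[int, ...]]:
    """Return the largest total weight over all complete matchings and the
    matching that attains it, where weights[i][j] is w(i, j) between
    element i of P(X) and element j of P(Y)."""
    n = len(weights)
    best_value = float("-inf")
    best_assignment: Tuple[int, ...] = ()
    # Each permutation pairs element i of P(X) with element assignment[i] of P(Y).
    for assignment in permutations(range(n)):
        value = sum(weights[i][assignment[i]] for i in range(n))
        if value > best_value:
            best_value, best_assignment = value, assignment
    return best_value, best_assignment


# Hypothetical similarity weights, for illustration only.
example_weights: List[List[float]] = [
    [0.9, 0.1, 0.2],
    [0.3, 0.8, 0.4],
    [0.2, 0.5, 0.7],
]
best_value, best_assignment = max_weight_matching(example_weights)
```

Enumeration costs n! evaluations, so it only suits small portraits. For larger ones a polynomial-time assignment solver such as the Hungarian algorithm could be substituted; this is a different technique from the enumeration described in the text.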
  • Step S106 Acquire the user portrait similarity of the user X and the user Y according to the maximum weighted matching value.
  • a user community can be created based on the user portrait similarity between the user portraits P(X) and P(Y), and the user's query results can be ranked and recommended according to the created user community.
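The application does not spell out how communities are formed from pairwise similarities. One possible reading, sketched here purely as an assumption, is to link any two users whose portrait similarity reaches a threshold and to take the connected groups as communities. The function name, the `threshold` parameter and its default value are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Set


def build_communities(
    users: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[Set[str]]:
    """Group users so that any pair with similarity >= threshold ends up in
    the same community (transitively), using a union-find structure."""
    parent: Dict[str, str] = {user: user for user in users}

    def find(user: str) -> str:
        # Follow parent links to the root, compressing the path on the way.
        while parent[user] != user:
            parent[user] = parent[parent[user]]
            user = parent[user]
        return user

    for index, first in enumerate(users):
        for second in users[index + 1:]:
            if similarity(first, second) >= threshold:
                parent[find(first)] = find(second)

    groups: Dict[str, Set[str]] = {}
    for user in users:
        groups.setdefault(find(user), set()).add(user)
    return list(groups.values())
```

Because the grouping is recomputed from current similarities, rerunning it as portraits change gives communities that evolve over time.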
  • the steps of querying based on the similarity of user portraits between user P(X) and user P(Y) include:
  • Step 1 Find a historical query record A similar to query q.
  • $U_m$ represents a user
  • $q_m$ is a query of the user $U_m$
  • $D_{q_m}$ is all documents related to the query $q_m$
  • $P(U)$ is the user portrait of the user U
  • $P(U_i)$ is the user portrait of the user $U_i$
  • $S(P(U), P(U_i))$ is the user portrait similarity between user U and user $U_i$
  • $s(q, q_i)$ is the similarity between sentence q and sentence $q_i$. Both of the above similarities can be obtained by the user portrait similarity calculation method based on the graph algorithm.
  • Step 2 Calculate all document collections related to query q.
  • Step 3 For each document d in the corpus, calculate the similarity between d and q to obtain the similarity r(d, q);
  • Step 4 Calculate the final ranking of each document in the corpus:
  • a and b are setting coefficients.
  • Step 5 According to the final ranking, the documents are sorted to construct an output list, and the results for the sentence q that user U needs to query are output according to this list.
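The ranking formula of Step 4 is not reproduced in this text, so the following is only a hypothetical illustration of Steps 4 and 5. It assumes, without confirmation from the source, that the final score is a linear combination of the direct relevance r(d, q) and a community-derived score, weighted by the coefficients a and b. The function name, the `community_score` input and the default coefficient values are all assumptions.

```python
from typing import Dict, List


def rank_documents(
    relevance: Dict[str, float],
    community_score: Dict[str, float],
    a: float = 0.5,
    b: float = 0.5,
) -> List[str]:
    """Sort documents by an assumed final score a * r(d, q) + b * c(d),
    where c(d) is a score derived from similar users' historical queries."""
    documents = set(relevance) | set(community_score)
    final = {
        doc: a * relevance.get(doc, 0.0) + b * community_score.get(doc, 0.0)
        for doc in documents
    }
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(final, key=lambda doc: (-final[doc], doc))
```

With equal coefficients, a document that is only moderately relevant to q but strongly endorsed by the user's community can outrank a document with higher direct relevance.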
  • the weighted bipartite graph maximum matching method is used to obtain the similarity between user portraits, and the user community can be dynamically constructed based on the similarity of user portraits in the collaborative information retrieval environment; performing personalized information retrieval recommendation according to the user community can improve user retrieval accuracy, optimize user experience, and achieve personalized recommendation.
  • the embodiment of the present application also proposes a computer-readable storage medium.
  • the computer-readable storage medium includes an information retrieval recommendation program based on the similarity of user portraits.
  • when the information retrieval recommendation program based on the similarity of user portraits is executed by the processor, the following steps are implemented:
  • information retrieval recommendation is performed for the user.
  • the specific implementation of the computer-readable storage medium of the present application is substantially the same as the specific implementation of the above-mentioned information retrieval recommendation method, electronic device, and system based on the similarity of user portraits, and will not be repeated here.

Abstract

A portrait similarity-based information retrieval recommendation method, a device, a system and a storage medium, the method comprising: obtaining user portraits of different users, and determining the user portrait similarity between the user portraits (S11); creating a dynamic user community on the basis of the user portrait similarity, and enabling users having similar portraits to be grouped into the same dynamic user community (S12); and performing information retrieval recommendation on the user according to the dynamic user community and a query statement of the user (S13). By calculating the similarity between the user portraits, the similarity between different users may be obtained, and personalized information retrieval and recommendation may be achieved.

Description

Information retrieval recommendation method, device and storage medium based on portrait similarity
This application claims priority to the patent application No. 201910748591.3, filed on August 14, 2019, and titled "Information retrieval recommendation method, device and storage medium based on portrait similarity".
Technical field
This application relates to the field of data analysis technology, and in particular to an information retrieval recommendation method, device, system and computer-readable storage medium based on the similarity of user portraits.
Background
Collaborative Information Retrieval (CIR) is an information retrieval method based on social relations. A CIR system can analyze the history of user interactions in order to respond more effectively to subsequent user queries. However, when two users send the same query to the CIR system at the same time, their goals and behavioral characteristics may differ, so they may be interested in two different lists of documents. In this case, CIR faces the problem of personalized query recommendation.
At present, information retrieval is the main way for users to query and obtain information; it is the method and means of finding information. Information storage is the basis of information retrieval. The information to be stored includes original document data, pictures, video, audio and so on. To enable retrieval, this original information must first be converted into a machine-readable form and stored in a database; otherwise it cannot be recognized by a machine. After the user enters a query request according to their intent, the retrieval system searches the database for information related to the query, calculates the similarity of each item through a matching mechanism, and outputs the items in descending order of similarity.
The inventor realized that existing information retrieval methods are either relatively complicated or suffer from poor retrieval accuracy and insufficient personalization, resulting in poor recommendation quality and a poor user experience.
Summary of the invention
This application provides an information retrieval recommendation method, electronic device, system and computer-readable storage medium based on the similarity of user portraits. Its main purpose is to obtain the portrait similarity between different users by means of maximum matching on a weighted bipartite graph. The method can dynamically build user communities in a collaborative information retrieval environment and apply them to personalized information retrieval, improving retrieval accuracy and optimizing user experience.
To achieve the above objective, this application provides an information retrieval recommendation method based on the similarity of user portraits, applied to an electronic device. The method includes:
acquiring user portraits of different users, and determining the user portrait similarity between the user portraits;
creating dynamic user communities based on the user portrait similarity, so that users with similar portraits belong to the same dynamic user community;
performing information retrieval recommendation for a user according to the dynamic user community and the user's query statement.
To achieve the above objective, this application also provides an electronic device including a memory and a processor. The memory includes an information retrieval recommendation program based on the similarity of user portraits, and the program, when executed by the processor, implements the following steps:
acquiring user portraits of different users, and determining the user portrait similarity between the user portraits;
creating dynamic user communities based on the user portrait similarity, so that users with similar portraits belong to the same dynamic user community;
performing information retrieval recommendation for a user according to the dynamic user community and the user's query statement.
To achieve the above objective, this application also provides an information retrieval recommendation system based on the similarity of user portraits, including:
a user portrait similarity determination unit, configured to acquire user portraits of different users and determine the user portrait similarity between the user portraits;
a dynamic community creation unit, configured to create dynamic user communities based on the user portrait similarity, so that users with similar portraits belong to the same dynamic user community;
a retrieval recommendation unit, configured to perform information retrieval recommendation for a user according to the dynamic user community and the user's query statement.
In addition, to achieve the above objective, this application also provides a computer-readable storage medium that includes an information retrieval recommendation program based on the similarity of user portraits. When the program is executed by a processor, the steps of the above information retrieval recommendation method based on the similarity of user portraits are implemented.
The information retrieval recommendation method, device, system and computer-readable storage medium proposed in this application construct a weighted bipartite graph from user portraits and obtain the maximum weighted matching value between user portraits by maximum matching on that graph. User communities can thus be built dynamically from user portrait similarity in a collaborative information retrieval environment, and personalized information retrieval recommendation can be performed according to the user community, which improves retrieval accuracy, optimizes user experience, and achieves personalized recommendation.
In order to achieve the above and related objectives, one or more aspects of this application include the features described in detail below. The following description and the drawings set out certain exemplary aspects of this application in detail. However, these aspects indicate only some of the various ways in which the principles of this application can be used. Furthermore, this application is intended to include all these aspects and their equivalents.
Description of the drawings
FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of the information retrieval recommendation method based on the similarity of user portraits according to this application;
FIG. 2 is a schematic diagram of the modules of a preferred embodiment of the information retrieval recommendation system based on the similarity of user portraits according to this application;
FIG. 3 is a flowchart of a preferred embodiment of the information retrieval recommendation method based on the similarity of user portraits according to this application;
FIG. 4 is a flowchart of the graph-based user portrait similarity calculation method;
FIG. 5 is a bipartite graph constructed from the user portraits of two different users.
The realization of the objective, the functional characteristics, and the advantages of this application are further described below in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
This application provides an information retrieval recommendation method based on the similarity of user portraits, applied to an electronic device 1. FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of this method.
In this embodiment, the electronic device 1 may be a terminal device with computing capability, such as a server, a smartphone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 1 includes a processor 12, a memory 11, a network interface 14 and a communication bus 15.
The memory 11 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, or card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the readable storage medium may be an external memory of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1.
In this embodiment, the readable storage medium of the memory 11 is generally used to store the information retrieval recommendation program 10 based on the similarity of user portraits that is installed on the electronic device 1, among other things. The memory 11 can also be used to temporarily store data that has been output or is about to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor or another data processing chip, and is used to run program code or process data stored in the memory 11, for example to execute the information retrieval recommendation program 10 based on the similarity of user portraits.
The network interface 14 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
The communication bus 15 is used to connect these components and carry communication between them.
FIG. 1 shows only the electronic device 1 with components 11-15, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
Optionally, the electronic device 1 may also include a user interface. The user interface may include an input unit such as a keyboard, a voice input device with speech recognition capability such as a microphone, and a voice output device such as a loudspeaker or headphones. Optionally, the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 1 may also include a display, which may also be called a display screen or display unit. In some embodiments it may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an Organic Light-Emitting Diode (OLED) touch display. The display is used to show information processed in the electronic device 1 and to present a visual user interface.
Optionally, the electronic device 1 further includes a touch sensor. The area that the touch sensor provides for the user's touch operations is called the touch area. The touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like. The touch sensor includes not only contact touch sensors but also proximity touch sensors. The touch sensor may be a single sensor or multiple sensors arranged, for example, in an array.
In addition, the area of the display of the electronic device 1 may be the same as or different from the area of the touch sensor. Optionally, the display and the touch sensor are stacked to form a touch display screen, through which the device detects touch operations triggered by the user.
Optionally, the electronic device 1 may also include a radio frequency (RF) circuit, sensors, an audio circuit and so on, which are not described further here.
In the device embodiment shown in FIG. 1, the memory 11, as a computer storage medium, may include an operating system and the information retrieval recommendation program 10 based on the similarity of user portraits. When the processor 12 executes the program 10 stored in the memory 11, the following steps are implemented:
acquiring user portraits of different users, and determining the user portrait similarity between the user portraits;
creating dynamic user communities based on the user portrait similarity, so that users with similar portraits belong to the same dynamic user community;
performing information retrieval recommendation for a user according to the dynamic user community and the user's query statement.
In the above steps, the user portrait similarity between the user portraits of different users can be obtained by a graph-based user portrait similarity calculation method.
Specifically, the graph-based user portrait similarity calculation method includes the following steps:
storing a user portrait P as a set of pairs (q, D_q), where q denotes any query record of the user and D_q denotes all documents related to query record q;
constructing a weighted bipartite graph from the user portraits P(X) and P(Y) to be processed, where P(X) is the user portrait of user X, P(Y) is the user portrait of user Y, and a vertex e of P(X) is connected to a vertex é of P(Y) by an edge (e, é);
obtaining, based on the weighted bipartite graph, the similarity between vertex e of user portrait P(X) and vertex é of user portrait P(Y);
determining the weight of edge (e, é) according to the similarity between vertex e of P(X) and vertex é of P(Y);
obtaining the maximum weighted matching value between user portrait P(X) and user portrait P(Y) based on the weights of the edges (e, é);
obtaining the user portrait similarity of user X and user Y according to the maximum weighted matching value.
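The first steps above (storing each portrait as (q, D_q) pairs and assigning a weight to every edge between the two portraits) can be sketched as follows. This is only an illustrative sketch: the type names, `build_weight_matrix`, and the stand-in `toy_vertex_sim` are hypothetical and not part of the application, and the stand-in scores a pair of vertices purely by the overlap of their document sets.

```python
from typing import Callable, List, Set, Tuple

# One portrait vertex: a query string paired with the IDs of its related documents.
Vertex = Tuple[str, Set[str]]
Portrait = List[Vertex]


def build_weight_matrix(
    px: Portrait,
    py: Portrait,
    vertex_sim: Callable[[Vertex, Vertex], float],
) -> List[List[float]]:
    """Return weights where weights[i][j] is the weight of edge (px[i], py[j])."""
    return [[vertex_sim(e, e2) for e2 in py] for e in px]


def toy_vertex_sim(e: Vertex, e2: Vertex) -> float:
    """Hypothetical stand-in: Jaccard overlap of the two related-document sets."""
    docs_e, docs_e2 = e[1], e2[1]
    union = docs_e | docs_e2
    return len(docs_e & docs_e2) / len(union) if union else 0.0


# Made-up portraits for two users X and Y.
px = [("python sort list", {"d1", "d2"}), ("bipartite matching", {"d3"})]
py = [("sort a list in python", {"d1", "d2", "d4"}), ("graph matching", {"d5"})]
weights = build_weight_matrix(px, py, toy_vertex_sim)
```

Passing the vertex similarity in as a function keeps the graph construction independent of whichever query and document similarity measures are chosen.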
Preferably, the user portrait P(X) of user X is stored as:
[Formula image: Figure PCTCN2019117794-appb-000001]
The user portrait P(Y) of user Y is stored as:
[Formula image: Figure PCTCN2019117794-appb-000002]
where [Figure PCTCN2019117794-appb-000003] denotes the i-th query of user X, and [Figure PCTCN2019117794-appb-000004] denotes all documents related to the query [Figure PCTCN2019117794-appb-000005]; [Figure PCTCN2019117794-appb-000006] denotes the j-th query of user Y, and [Figure PCTCN2019117794-appb-000007] denotes all documents related to the query [Figure PCTCN2019117794-appb-000008].
Preferably, vertex e of user portrait P(X) includes a corresponding first query element and first document element, and vertex é of user portrait P(Y) includes a corresponding second query element and second document element;
the process of obtaining the similarity between vertex e of user portrait P(X) and vertex é of user portrait P(Y) includes:
obtaining a first similarity between the first query element and the second query element, and obtaining a second similarity between the first document element and the second document element;
determining the similarity between vertex e and vertex é based on the first similarity and the second similarity.
Preferably, the first similarity between the first query element and the second query element is obtained by an edit distance algorithm, a Jaccard coefficient algorithm, a TF algorithm, a TF-IDF algorithm, or a Word2Vec algorithm;
the second similarity between the first document element and the second document element is obtained by a TF-IDF algorithm or by cosine similarity based on the vector space model.
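As a minimal sketch of two of the measures named above, the following computes a Jaccard coefficient over query tokens (first similarity) and a cosine similarity over term-frequency vectors (second similarity). The application does not state how the two similarities are combined into the vertex similarity; the weighted average with parameter `alpha` below is purely an assumption made for illustration, as are all function names and the whitespace tokenization.

```python
import math
from collections import Counter
from typing import List


def jaccard(query_a: str, query_b: str) -> float:
    """Jaccard coefficient of the two queries' token sets."""
    a, b = set(query_a.lower().split()), set(query_b.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_tf(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of raw term-frequency vectors."""
    ta, tb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(
        sum(v * v for v in tb.values())
    )
    return dot / norm if norm else 0.0


def vertex_similarity(
    q1: str, docs1: List[str], q2: str, docs2: List[str], alpha: float = 0.5
) -> float:
    """Assumed combination: alpha * first similarity + (1 - alpha) * second."""
    first = jaccard(q1, q2)
    second = cosine_tf(" ".join(docs1), " ".join(docs2))
    return alpha * first + (1 - alpha) * second
```

For example, `jaccard("a b c", "b c d")` shares two of four distinct tokens and therefore evaluates to 0.5.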
Preferably, the user portrait P(X) of user X includes elements A, B, C, D and E, each of which contains a first query element and a first document element;
the user portrait P(Y) of user Y includes elements 1, 2, 3, 4 and 5, each of which contains a second query element and a second document element;
Step 1: obtain all weighted matching values of the weighted bipartite graph by the following formulas:
M_1 = w(A,1) + w(B,3) + w(C,2) + w(D,4) + w(E,5)
M_2 = w(A,1) + w(B,3) + w(C,5) + w(D,4) + w(E,2)
M_3 = w(A,1) + w(B,4) + w(C,2) + w(D,3) + w(E,5)
M_4 = w(A,1) + w(B,4) + w(C,5) + w(D,3) + w(E,2)
where w(i,j) denotes the similarity between element i and element j, that is, the weight of edge ij;
Step 2: determine the maximum weighted matching value from all the weighted matching values.
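The two steps above amount to enumerating every complete matching and keeping the largest sum. A brute-force sketch follows; the edge weights are made-up numbers, chosen only so that the bipartite graph contains exactly the edges used in M_1 to M_4, and `max_weighted_matching` is a hypothetical name. Enumeration is factorial in the portrait size, so for larger portraits a polynomial-time assignment method such as the Hungarian algorithm would be the usual substitute.

```python
from itertools import permutations


def max_weighted_matching(left, right, w):
    """Enumerate matchings that use only edges present in w.

    Assumes len(left) <= len(right). Returns (best value, best pairing).
    """
    best_value, best_pairs = float("-inf"), None
    for perm in permutations(right, len(left)):
        pairs = list(zip(left, perm))
        if all(p in w for p in pairs):
            value = sum(w[p] for p in pairs)
            if value > best_value:
                best_value, best_pairs = value, pairs
    return best_value, best_pairs


# Hypothetical weights for the edges that appear in M_1..M_4.
w = {
    ("A", 1): 0.9,
    ("B", 3): 0.4, ("B", 4): 0.7,
    ("C", 2): 0.6, ("C", 5): 0.2,
    ("D", 3): 0.5, ("D", 4): 0.3,
    ("E", 2): 0.1, ("E", 5): 0.8,
}
value, pairs = max_weighted_matching("ABCDE", [1, 2, 3, 4, 5], w)
```

With these weights the four candidate sums are M_1 = 3.0, M_2 = 1.9, M_3 = 3.5 and M_4 = 2.4, so the pairing of M_3 is selected.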
Once the user portrait similarity of different users has been obtained, user communities can be created based on the user portrait similarity between users (for example, between user X and user Y), and the results of a user's query can be ranked and recommended according to the communities created.
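The application does not spell out how communities are formed from pairwise similarities. One simple possibility, shown here only as an assumption, is to link any two users whose portrait similarity exceeds a threshold and take the connected components as communities. The function name, the threshold, and the similarity values are all hypothetical.

```python
def build_communities(users, similarity, threshold):
    """Group users into connected components of the 'similarity > threshold' graph."""
    parent = {u: u for u in users}

    def find(u):
        # Follow parent links to the representative, compressing the path.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for (u, v), s in similarity.items():
        if s > threshold:
            parent[find(u)] = find(v)

    groups = {}
    for u in users:
        groups.setdefault(find(u), set()).add(u)
    return list(groups.values())


# Made-up pairwise portrait similarities.
similarity = {("X", "Y"): 0.8, ("Y", "Z"): 0.2, ("Z", "W"): 0.9}
communities = build_communities(["X", "Y", "Z", "W"], similarity, threshold=0.5)
```

Because the grouping is recomputed from the current similarities, rerunning it after portraits change yields updated communities, which is one way to read "dynamic" here.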
As a specific example, suppose the query submitted by user U is q. Querying based on the user portrait similarity between users includes the following steps:
Step 1: find the historical query records A that are similar to query q.
Let A = {(U_1, q_1, D_{q_1}), (U_2, q_2, D_{q_2}), …, (U_m, q_m, D_{q_m})}
such that s(q, q_i) > θ and s(P(U), P(U_i)) > ω, for 1 ≤ i ≤ m
where U_m denotes a user, q_m is the query of user U_m, D_{q_m} is the set of all documents related to query q_m, P(U) is the user portrait of user U, P(U_i) is the user portrait of user U_i, s(P(U), P(U_i)) is the user portrait similarity between user U and user U_i, and s(q, q_i) is the similarity between query q and query q_i. Both similarities can be obtained by the graph-based user portrait similarity calculation method.
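Step 1 is a filter over the query log with two thresholds. The sketch below assumes the portrait similarities between U and each other user have already been computed and are supplied as a dictionary, and it uses token-level Jaccard as a stand-in query similarity. The names, the log contents, and the values chosen for θ and ω are illustrative only.

```python
from typing import Callable, Dict, List, Set, Tuple

# (user ID U_i, query q_i, related documents D_{q_i})
Record = Tuple[str, str, Set[str]]


def token_jaccard(a: str, b: str) -> float:
    """Stand-in for s(q, q_i): Jaccard coefficient over whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def select_history(
    log: List[Record],
    q: str,
    portrait_sim_to_u: Dict[str, float],
    theta: float,
    omega: float,
    query_sim: Callable[[str, str], float] = token_jaccard,
) -> List[Record]:
    """Keep records with s(q, q_i) > theta and s(P(U), P(U_i)) > omega."""
    return [
        (u_i, q_i, docs)
        for (u_i, q_i, docs) in log
        if query_sim(q, q_i) > theta and portrait_sim_to_u.get(u_i, 0.0) > omega
    ]


log = [
    ("U1", "cheap flights paris", {"d1", "d2"}),
    ("U2", "cheap flights rome", {"d3"}),
    ("U3", "cheap flights paris", {"d4"}),
]
portrait_sim_to_u = {"U1": 0.9, "U2": 0.8, "U3": 0.1}
A = select_history(log, "cheap flights paris", portrait_sim_to_u, theta=0.6, omega=0.5)
```

In this example U2 is dropped because its query is not similar enough, and U3 is dropped because its portrait is too far from that of U, so only the record of U1 remains in A.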
Step 2: compute the set of all documents related to query q.
D_q = D_{q_1} ∪ D_{q_2} ∪ … ∪ D_{q_m}
Then, for each document d in the corpus with d ∈ D_q, compute the following score:
[Formula image: Figure PCTCN2019117794-appb-000009]
For each document d that does not belong to D_q, R(U, d, q) = 0 by default;
Step 3: for each document d in the corpus, compute the similarity r(d, q) between d and q;
Step 4: compute the final ranking score of each document in the corpus:
R_final(U, d, q) = a * r(d, q) + b * R(U, d, q)
where a and b are preset coefficients.
Step 5: sort the documents by their final ranking scores to construct an output list, and return the output list as the result of user U's query q.
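Steps 4 and 5 can be sketched as below. The community score R(U, d, q) is taken as an input because its formula appears only as an image in the source; documents outside D_q get R = 0, as stated above. The function name, the coefficients a and b, and all scores are made-up illustration values.

```python
from typing import Dict, List, Tuple


def final_ranking(
    corpus: List[str],
    r: Dict[str, float],
    R: Dict[str, float],
    a: float,
    b: float,
) -> Tuple[List[str], Dict[str, float]]:
    """Score each document as a*r(d,q) + b*R(U,d,q) and sort in descending order.

    Documents missing from R are treated as outside D_q, i.e. R = 0.
    """
    scores = {d: a * r[d] + b * R.get(d, 0.0) for d in corpus}
    ranked = sorted(corpus, key=lambda d: scores[d], reverse=True)
    return ranked, scores


corpus = ["d1", "d2", "d3"]
r = {"d1": 0.2, "d2": 0.5, "d3": 0.4}  # content similarity r(d, q)
R = {"d1": 0.9}                        # community score; only d1 is in D_q
ranked, scores = final_ranking(corpus, r, R, a=0.5, b=0.5)
```

Here d1 has the lowest content similarity but is lifted to the top by its community score, which illustrates how the coefficient b controls the degree of personalization.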
The electronic device 1 proposed in the above embodiment obtains the similarity between user portraits by maximum matching on a weighted bipartite graph. It can dynamically build user communities from user portrait similarity in a collaborative information retrieval environment and perform personalized information retrieval recommendation according to the user community, which improves retrieval accuracy, optimizes user experience, and achieves personalized recommendation.
Corresponding to the above electronic device, this application also provides an information retrieval recommendation system based on the similarity of user portraits. FIG. 2 is a program module diagram of a preferred embodiment of this system. The information retrieval recommendation system based on the similarity of user portraits can be divided into:
a user portrait similarity determination unit 110, configured to acquire user portraits of different users and determine the user portrait similarity between the user portraits;
a dynamic community creation unit 120, configured to create dynamic user communities based on the user portrait similarity, so that users with similar portraits belong to the same dynamic user community;
a retrieval recommendation unit 130, configured to perform information retrieval recommendation for a user according to the dynamic user community and the user's query statement.
The user portrait similarity determination unit 110 further includes:
a user portrait storage module 111, configured to store a user portrait P as a set of pairs (q, D_q), where q denotes any query record of the user and D_q denotes all documents related to query record q;
a weighted bipartite graph construction module 112, configured to construct a weighted bipartite graph from the user portraits P(X) and P(Y) to be processed, where P(X) is the user portrait of user X, P(Y) is the user portrait of user Y, and a vertex e of P(X) is connected to a vertex é of P(Y) by an edge (e, é);
a similarity acquisition module 113, configured to obtain, based on the weighted bipartite graph, the similarity between vertex e of user portrait P(X) and vertex é of user portrait P(Y);
a weight determination module 114, configured to determine the weight of edge (e, é) according to the similarity between vertex e of P(X) and vertex é of P(Y);
a maximum weighted matching value acquisition module 115, configured to obtain the maximum weighted matching value between user portrait P(X) and user portrait P(Y) based on the weights of the edges (e, é);
a user portrait similarity determination module 116, configured to obtain the user portrait similarity of user X and user Y according to the maximum weighted matching value.
Specifically, the user portrait P(X) of user X is stored as:
[Formula image: Figure PCTCN2019117794-appb-000010]
The user portrait P(Y) of user Y is stored as:
[Formula image: Figure PCTCN2019117794-appb-000011]
where [Figure PCTCN2019117794-appb-000012] denotes the i-th query of user X, and [Figure PCTCN2019117794-appb-000013] denotes all documents related to the query [Figure PCTCN2019117794-appb-000014]; [Figure PCTCN2019117794-appb-000015] denotes the j-th query of user Y, and [Figure PCTCN2019117794-appb-000016] denotes all documents related to the query [Figure PCTCN2019117794-appb-000017].
其中,用户画像P(X)的顶点e包括对应的第一查询元素和第一文档元素,用户画像P(Y)的顶点é包括对应的第二查询元素和第二文档元素;Wherein, the vertex e of the user portrait P(X) includes the corresponding first query element and the first document element, and the vertex é of the user portrait P(Y) includes the corresponding second query element and the second document element;
相似性获取模块113包括:The similarity acquisition module 113 includes:
查询元素和文档元素相似性获取模块1131,用于获取第一查询元素和第二查询元素之间的第一相似性,以及获取第一文档元素和第二文档元素之间的第二相似性;The query element and document element similarity acquisition module 1131, configured to acquire the first similarity between the first query element and the second query element, and to acquire the second similarity between the first document element and the second document element;
顶点间相似性确定模块1132,用于基于第一相似性和第二相似性确定顶点e和顶点é之间的相似性。The similarity determination module 1132 between vertices is used to determine the similarity between the vertex e and the vertex e based on the first similarity and the second similarity.
查询元素和文档元素相似性获取模块1131包括:The query element and document element similarity acquisition module 1131 includes:
第一相似性获取模块,用于通过编辑距离算法、杰卡德系数算法、TF算法、TFIDF算法或Word2Vec算法获取第一查询元素和第二查询元素的第一相似性;The first similarity acquisition module is used to acquire the first similarity between the first query element and the second query element through the edit distance algorithm, the Jaccard coefficient algorithm, the TF algorithm, the TFIDF algorithm, or the Word2Vec algorithm;
第二相似性获取模块,用于通过TFIDF算法或基于空间向量的余弦算法获取第一文档元素和第二文档元素之间的第二相似性。The second similarity acquisition module is configured to acquire the second similarity between the first document element and the second document element through the TFIDF algorithm or the space vector-based cosine algorithm.
The user portrait P(X) of user X includes elements A, B, C, D, and E, where the elements A, B, C, D, and E contain first query elements and first document elements;
The user portrait P(Y) of user Y includes elements 1, 2, 3, 4, and 5, where the elements 1, 2, 3, 4, and 5 contain second query elements and second document elements;
Step 1: Obtain all weighted matching values of the weighted bipartite graph using the following formulas:
M_1 = w(A,1) + w(B,3) + w(C,2) + w(D,4) + w(E,5)
M_2 = w(A,1) + w(B,3) + w(C,5) + w(D,4) + w(E,2)
M_3 = w(A,1) + w(B,4) + w(C,2) + w(D,3) + w(E,5)
M_4 = w(A,1) + w(B,4) + w(C,5) + w(D,3) + w(E,2)
where w(i, j) denotes the similarity between element i and element j, that is, the weight of the edge ij;
Step 2: Determine the maximum weighted matching value from all of the weighted matching values.
In addition, this application provides an information retrieval recommendation method based on user portrait similarity. FIG. 3 is a flowchart of a preferred embodiment of this method. The method can be executed by a device, and the device can be implemented in software and/or hardware.
In this embodiment, the information retrieval recommendation method based on user portrait similarity includes the following steps:
Step S11: Obtain the user portraits of different users, and determine the user portrait similarity between the user portraits.
Step S12: Create user dynamic communities based on the user portrait similarity, so that users with similar portraits are assigned to the same user dynamic community.
Step S13: Perform information retrieval recommendation for the user according to the user dynamic community and the user's query sentence.
As shown in FIG. 4, which illustrates the flow of the graph-algorithm-based user portrait similarity calculation method, step S11 further includes the following steps:
Step S101: Store the user portrait P as a set of coordinate pairs (q, D_q), where q denotes any query record of the user and D_q denotes all documents related to the query record q.
A user portrait, also called a user persona, is an effective tool for delineating target users and for connecting user needs with design direction, and it is widely used in many fields. In practice, a user's attributes, behaviors, and expectations are usually linked together in plain, everyday language. As a virtual representative of actual users, the persona formed by a user portrait is not constructed in isolation from the product and the market; it needs to be representative of the product's main audience and target group.
In this application, the user portrait P(X) of user X can be stored as:
Figure PCTCN2019117794-appb-000018
The user portrait P(Y) of user Y can be stored as:
Figure PCTCN2019117794-appb-000019
where [Figure PCTCN2019117794-appb-000020] denotes the i-th query of user X, and [Figure PCTCN2019117794-appb-000021] denotes all documents related to the query [Figure PCTCN2019117794-appb-000022]; [Figure PCTCN2019117794-appb-000023] denotes the j-th query of user Y, and [Figure PCTCN2019117794-appb-000024] denotes all documents related to the query [Figure PCTCN2019117794-appb-000025].
Therefore, computing the User Profile Similarity (UPS) between user X and user Y amounts to computing the similarity between the two sets P(X) and P(Y) above.
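The storage scheme of step S101 can be sketched as a small data structure. This is only an illustration: the names `PortraitElement` and `build_portrait`, and the use of document identifiers in place of full documents, are assumptions that do not appear in the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortraitElement:
    """One (q, D_q) pair: a query record and the documents related to it."""
    query: str
    documents: frozenset  # hypothetical: document ids stand in for documents


def build_portrait(history):
    """Build a user portrait P from a mapping {query: iterable of documents}."""
    return {PortraitElement(q, frozenset(docs)) for q, docs in history.items()}


# Hypothetical query history for user X.
portrait_x = build_portrait({
    "bipartite matching": ["doc1", "doc2"],
    "user portrait": ["doc3"],
})
```

Each element of `portrait_x` later becomes one vertex on the P(X) side of the weighted bipartite graph.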
Step S102: Construct a weighted bipartite graph based on the user portraits P(X) and P(Y) to be processed, where P(X) is the user portrait of user X, P(Y) is the user portrait of user Y, and a vertex e of P(X) is connected to a vertex é of P(Y) by the edge (e, é).
A bipartite graph, also called a bigraph, is a special model in graph theory. Let G = (V, E) be an undirected graph. If the vertex set V can be partitioned into two disjoint subsets A and B such that the two endpoints i and j of every edge (i, j) belong to different subsets (i ∈ A, j ∈ B), then G is called a bipartite graph.
Based on the user portraits P(X) and P(Y) above, a weighted bipartite graph G = (V = (P(X), P(Y)), E) is constructed. The elements of P(X) form one part of G, and the elements of P(Y) form the other part. Each vertex e of P(X) is connected to each vertex é of P(Y) by an edge (e, é), and the weight of the edge (e, é) equals the similarity between the vertices (elements) e and é. The weight of the edge (e, é) depends on the element type, which is either query or document.
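The construction in step S102 can be illustrated by computing one edge weight per pair of elements. The sketch below assumes the elements are plain strings and uses a toy word-overlap score as the similarity; `toy_similarity` and `build_weight_matrix` are hypothetical names, and the application leaves the actual similarity function to the later steps.

```python
def toy_similarity(a, b):
    """Placeholder similarity: shared words divided by all distinct words."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def build_weight_matrix(portrait_x, portrait_y, similarity):
    """Return weights[i][j], the weight of the edge between the i-th element
    of P(X) and the j-th element of P(Y)."""
    return [[similarity(e, f) for f in portrait_y] for e in portrait_x]


# Two elements on each side, so the graph has 2 x 2 = 4 weighted edges.
weights = build_weight_matrix(["a b", "c"], ["a", "c d"], toy_similarity)
```

Because every vertex of P(X) is joined to every vertex of P(Y), a dense matrix is a natural representation here.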
Preferably, the vertex e of the user portrait P(X) includes a corresponding first query element and first document element, and the vertex é of the user portrait P(Y) includes a corresponding second query element and second document element;
The process of obtaining the similarity between the vertex e of the user portrait P(X) and the vertex é of the user portrait P(Y) includes:
First, obtain the first similarity between the first query element and the second query element, and the second similarity between the first document element and the second document element; then, determine the similarity between the vertex e and the vertex é based on the first similarity and the second similarity.
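The application does not state how the first and second similarities are combined into a single vertex similarity. One possible choice, shown here purely as an assumption, is a convex combination controlled by a hypothetical parameter `alpha`:

```python
def vertex_similarity(query_sim, doc_sim, alpha=0.5):
    """Combine the query similarity and the document similarity of two
    vertices. The weighted average and `alpha` are assumptions, not taken
    from the application."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * query_sim + (1.0 - alpha) * doc_sim
```

With `alpha=0.5` both components count equally; values closer to 1 give more weight to the query similarity.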
Step S103: Obtain the similarity between the vertex e of the user portrait P(X) and the vertex é of the user portrait P(Y) based on the weighted bipartite graph.
Each vertex e of the user portrait P(X) includes a corresponding query element and document element, and so does each vertex é of the user portrait P(Y). To obtain the similarity between e and é, first obtain the similarity between the query elements of P(X) and P(Y) and the similarity between the document elements of P(X) and P(Y). From the similarities between the query elements and the similarities between the document elements, the similarity between every pair of vertices e and é can then be determined.
Specifically, computing the similarity between the queries of the user portraits P(X) and P(Y) amounts to computing the similarity between query sentences. Current methods for obtaining query sentence similarity mainly include the edit distance algorithm, the Jaccard coefficient algorithm, the TF algorithm, the TF-IDF algorithm, and the Word2Vec algorithm.
The edit distance, also known as the Levenshtein distance, is the minimum number of edit operations required to transform one string into another. The larger the distance between two strings, the more they differ. The permitted edit operations are replacing one character with another, inserting a character, and deleting a character.
The Jaccard coefficient, also known as the Jaccard index or Jaccard similarity coefficient, is used to compare the similarity and difference between finite sample sets. The larger the Jaccard coefficient, the more similar the samples. It is simple to compute: the size of the intersection of the two samples divided by the size of their union. When the two samples are identical the result is 1, and when they share no elements the result is 0.
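Both measures described above can be sketched in a few lines. The code below is a minimal illustration: converting the edit distance into a score in [0, 1] by dividing by the longer string's length, and splitting queries on whitespace for the Jaccard coefficient, are assumptions rather than details given in the application.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of substitutions, insertions,
    and deletions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Assumed normalization: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard_similarity(a, b):
    """Jaccard coefficient of the word sets of two queries."""
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0
```

For example, `edit_distance("kitten", "sitting")` is 3, and `jaccard_similarity("a b c", "b c d")` is 0.5 because two of the four distinct words are shared.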
In addition, the methods for computing the similarity between the documents of the user portraits P(X) and P(Y) mainly include the TF-IDF algorithm and the vector-space cosine algorithm.
In other words, the first similarity between the first query element and the second query element is obtained by an algorithm such as the edit distance algorithm, the Jaccard coefficient algorithm, the TF algorithm, the TF-IDF algorithm, or the Word2Vec algorithm; the second similarity between the first document element and the second document element is obtained by a method such as the TF-IDF algorithm or the vector-space cosine algorithm.
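As an illustration of the vector-space cosine approach for documents, the sketch below represents each document by raw term frequencies. Whitespace tokenization, lowercasing, and the omission of IDF weighting are simplifying assumptions; a TF-IDF variant would rescale each term count before taking the cosine.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    vec_a = Counter(doc_a.lower().split())
    vec_b = Counter(doc_b.lower().split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document has no direction to compare
    return dot / (norm_a * norm_b)
```

Documents with no words in common score 0, and documents with proportional term counts score 1 up to floating-point rounding.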
Step S104: Determine the weight of the edge (e, é) according to the similarity between the vertex e of P(X) and the vertex é of P(Y).
Specifically, the weight of the edge (e, é) can be set equal to the similarity between the vertex e of P(X) and the vertex é of P(Y).
Step S105: Obtain the maximum weighted matching value between the user portrait P(X) and the user portrait P(Y) based on the weights of the edges (e, é).
The maximum matching of a bipartite graph is defined as follows: given a bipartite graph G, a subgraph M of G is called a matching if no two edges in the edge set of M share a vertex. Finding a matching with the largest number of edges is called the maximum matching problem. If every vertex of the graph is incident to some edge of the matching, the matching is called a perfect matching, also known as a complete matching.
For example, the user portrait P(X) of user X includes elements A, B, C, D, and E, which contain first query elements and first document elements, and the user portrait P(Y) of user Y includes elements 1, 2, 3, 4, and 5, which contain second query elements and second document elements. The bipartite graph constructed from the user portraits P(X) and P(Y) is shown in FIG. 5.
For the bipartite graph constructed from the user portraits in FIG. 5, the weighted matching values of the maximum matchings are calculated by the following formulas:
M_1 = w(A,1) + w(B,3) + w(C,2) + w(D,4) + w(E,5)
M_2 = w(A,1) + w(B,3) + w(C,5) + w(D,4) + w(E,2)
M_3 = w(A,1) + w(B,4) + w(C,2) + w(D,3) + w(E,5)
M_4 = w(A,1) + w(B,4) + w(C,5) + w(D,3) + w(E,2)
where w(i, j) denotes the similarity between element i and element j, that is, the weight of the edge ij. For example, w(A,1) is the similarity between element A and element 1 and is also the weight of the edge A1; w(B,3), w(C,2), ..., w(E,5) are defined analogously.
The maximum weighted matching value is then determined from all of the weighted matching values. In this specific embodiment, the maximum weighted matching value is 3.5.
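The two-step procedure (enumerate the weighted matching values, then take the largest) can be sketched by brute force. The sketch assumes a complete bipartite graph with equally sized sides, so every permutation of the right-hand vertices is a perfect matching; the function name and the weights below are hypothetical and are not the values behind the 3.5 of the embodiment.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Return (best_value, assignment) for a square weight matrix, where
    assignment[i] is the column matched to row i. Tries all n! matchings."""
    n = len(weights)
    best_value, best_assignment = float("-inf"), None
    for perm in permutations(range(n)):
        value = sum(weights[i][perm[i]] for i in range(n))
        if value > best_value:
            best_value, best_assignment = value, perm
    return best_value, best_assignment


# Hypothetical 2 x 2 weights: rows are elements of P(X), columns of P(Y).
best_value, assignment = max_weight_matching([[0.75, 0.25],
                                              [0.5, 0.5]])
```

Enumeration is fine for five elements per side (120 matchings) but grows factorially; for larger portraits a polynomial-time assignment method such as the Hungarian algorithm would be the usual substitute.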
Step S106: Obtain the user portrait similarity of user X and user Y from the maximum weighted matching value.
After the user portrait similarities of different users have been obtained, user communities can be created based on the user portrait similarity between P(X) and P(Y), and the user's query results can be ranked and recommended according to the created user communities.
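One way to read the community step is that, for a given user, the dynamic community consists of all other users whose portrait similarity exceeds a preset threshold. The sketch below follows that reading; `dynamic_community`, the set-based toy portraits, and the `overlap` similarity are assumptions made for illustration only.

```python
def dynamic_community(user, portraits, similarity, omega):
    """Return the ids of all other users whose portrait similarity to
    `user` is greater than the threshold omega."""
    return {
        other
        for other, portrait in portraits.items()
        if other != user and similarity(portraits[user], portrait) > omega
    }


def overlap(p, q):
    """Toy portrait similarity: shared items over all distinct items."""
    union = p | q
    return len(p & q) / len(union) if union else 0.0


# Hypothetical portraits reduced to sets of query topics.
portraits = {"U": {"a", "b"}, "V": {"a", "b", "c"}, "W": {"x"}}
community = dynamic_community("U", portraits, overlap, omega=0.5)
```

In a full system `similarity` would be the user portrait similarity derived from the maximum weighted matching value, and the community would be recomputed as portraits change.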
As a specific example, suppose the sentence that user U wants to query is q. Querying based on user portrait similarity includes the following steps:
Step 1: Find the set A of historical query records similar to the query q.
Let A = {(U_1, q_1, D_{q_1}), (U_2, q_2, D_{q_2}), ..., (U_m, q_m, D_{q_m})}
with s(q, q_i) > θ and s(P(U), P(U_i)) > ω, for 1 ≤ i ≤ m,
where U_i denotes a user, q_i is a query of user U_i, D_{q_i} is the set of all documents related to the query q_i, P(U) is the user portrait of user U, P(U_i) is the user portrait of user U_i, s(P(U), P(U_i)) is the user portrait similarity between user U and user U_i, and s(q, q_i) is the similarity between the sentence q and the sentence q_i. θ and ω act as similarity thresholds. These similarities can all be obtained by the graph-algorithm-based user portrait similarity calculation method.
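Step 1 is a filter over the historical records with two thresholds. The sketch below assumes each record is a tuple `(user_id, query, docs)` and that the two similarity functions are supplied by the caller; all names are hypothetical.

```python
def similar_history(user, query, records, portraits,
                    query_sim, portrait_sim, theta, omega):
    """Select the records (U_i, q_i, D_qi) with s(q, q_i) > theta and
    s(P(U), P(U_i)) > omega."""
    return [
        (u_i, q_i, docs)
        for (u_i, q_i, docs) in records
        if query_sim(query, q_i) > theta
        and portrait_sim(portraits[user], portraits[u_i]) > omega
    ]


def exact(a, b):
    """Toy similarity used only for this demonstration."""
    return 1.0 if a == b else 0.0


portraits = {"U": "p1", "V": "p1", "W": "p2"}
records = [("V", "q", {"d1"}), ("W", "q", {"d2"}), ("V", "other", {"d3"})]
history = similar_history("U", "q", records, portraits,
                          exact, exact, theta=0.5, omega=0.5)
```

Only the first record survives here: the second comes from a user with a dissimilar portrait, and the third has a dissimilar query.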
Step 2: Compute the set of all documents related to the query q:
D_q = D_{q_1} ∪ D_{q_2} ∪ ... ∪ D_{q_m}
Next, for each document d in the corpus that satisfies d ∈ D_q, compute the following score:
Figure PCTCN2019117794-appb-000026
For each document d that does not belong to D_q, R(U, d, q) = 0 by default;
Step 3: For each document d in the corpus, compute the similarity r(d, q) between d and q;
Step 4: Compute the final ranking score of each document in the corpus:
R_final(U, d, q) = a * r(d, q) + b * R(U, d, q)
where a and b are preset coefficients.
Step 5: Sort the documents by their final ranking scores to construct an output list, and output the results for user U's query q according to that list.
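Steps 2 to 5 can be sketched together once the two score functions are available. Because the formula for R(U, d, q) is given only as a figure, the sketch takes it as a caller-supplied function `R`; `rank_documents`, the record layout `(user_id, query, docs)`, and the example scores are assumptions.

```python
def rank_documents(corpus, history, r, R, a=0.5, b=0.5):
    """Rank corpus documents by R_final = a * r(d) + b * R(d), with R(d)
    taken as 0 for documents outside D_q."""
    # Step 2: D_q is the union of the document sets of all similar records.
    related = set().union(*(docs for _, _, docs in history))

    def final_score(d):
        community_score = R(d) if d in related else 0.0
        return a * r(d) + b * community_score

    # Step 5: highest final score first.
    return sorted(corpus, key=final_score, reverse=True)


# Hypothetical content similarities r(d, q) and a constant community score.
r_scores = {"d1": 0.5, "d2": 0.25, "d3": 0.0}
ranking = rank_documents(["d1", "d2", "d3"],
                         [("V", "q1", {"d2"})],
                         r_scores.get,
                         lambda d: 1.0)
```

Here `d2` moves ahead of `d1` even though its content similarity is lower, because it was retrieved by a similar user for a similar query; the coefficients `a` and `b` control how strongly that community signal counts.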
With the above information retrieval recommendation method based on user portrait similarity, the similarity between user portraits is obtained by maximum matching on a weighted bipartite graph. User communities can thus be built dynamically from user portrait similarity in a collaborative information retrieval environment, and personalized information retrieval recommendation can be performed according to those communities, which improves retrieval accuracy, optimizes the user experience, and achieves personalized recommendation.
In addition, an embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium includes an information retrieval recommendation program based on user portrait similarity, and when the program is executed by a processor, the following operations are implemented:
Obtain the user portraits of different users, and determine the user portrait similarity between the user portraits;
Create user dynamic communities based on the user portrait similarity, so that users with similar portraits are assigned to the same user dynamic community;
Perform information retrieval recommendation for the user according to the user dynamic community and the user's query sentence.
The specific implementation of the computer-readable storage medium of this application is substantially the same as the specific implementations of the above information retrieval recommendation method, electronic device, and system based on user portrait similarity, and is not repeated here.
It should be noted that, in this document, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, device, article, or method. Without further limitation, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, device, article, or method that includes the element.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments. From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims (20)

  1. An information retrieval recommendation method based on portrait similarity, applied to an electronic device, wherein the method comprises:
    obtaining user portraits of different users, and determining the user portrait similarity between the user portraits;
    creating user dynamic communities based on the user portrait similarity, so that users with similar portraits are assigned to the same user dynamic community;
    performing information retrieval recommendation for the user according to the user dynamic community and the user's query sentence.
  2. The information retrieval recommendation method based on portrait similarity according to claim 1, wherein the step of obtaining user portraits of different users and determining the user portrait similarity between the user portraits comprises:
    storing the user portrait P as a set of coordinate pairs (q, D_q), where q denotes any query record of the user and D_q denotes all documents related to the query record q;
    constructing a weighted bipartite graph based on the user portraits P(X) and P(Y) to be processed, where P(X) is the user portrait of user X, P(Y) is the user portrait of user Y, and a vertex e of P(X) is connected to a vertex é of P(Y) by the edge (e, é);
    obtaining the similarity between the vertex e of the user portrait P(X) and the vertex é of the user portrait P(Y) based on the weighted bipartite graph;
    determining the weight of the edge (e, é) according to the similarity between the vertex e of P(X) and the vertex é of P(Y);
    obtaining the maximum weighted matching value between the user portrait P(X) and the user portrait P(Y) based on the weight of the edge (e, é);
    obtaining the user portrait similarity of the user X and the user Y according to the maximum weighted matching value.
  3. The information retrieval recommendation method based on portrait similarity according to claim 2, wherein
    the user portrait P(X) of the user X is stored as:
    Figure PCTCN2019117794-appb-100001
    the user portrait P(Y) of the user Y is stored as:
    Figure PCTCN2019117794-appb-100002
    where [Figure PCTCN2019117794-appb-100003] denotes the i-th query of user X, and [Figure PCTCN2019117794-appb-100004] denotes all documents related to the query [Figure PCTCN2019117794-appb-100005]; [Figure PCTCN2019117794-appb-100006] denotes the j-th query of user Y, and [Figure PCTCN2019117794-appb-100007] denotes all documents related to the query [Figure PCTCN2019117794-appb-100008].
  4. The information retrieval recommendation method based on portrait similarity according to claim 2, wherein
    the vertex e of the user portrait P(X) includes a corresponding first query element and first document element, and the vertex é of the user portrait P(Y) includes a corresponding second query element and second document element;
    the process of obtaining the similarity between the vertex e of the user portrait P(X) and the vertex é of the user portrait P(Y) comprises:
    obtaining a first similarity between the first query element and the second query element, and obtaining a second similarity between the first document element and the second document element;
    determining the similarity between the vertex e and the vertex é based on the first similarity and the second similarity.
  5. The information retrieval recommendation method based on portrait similarity according to claim 4, wherein
    the first similarity between the first query element and the second query element is obtained using the edit distance algorithm, the Jaccard coefficient algorithm, the TF algorithm, the TF-IDF algorithm, or the Word2Vec algorithm;
    the second similarity between the first document element and the second document element is obtained using the TF-IDF algorithm or a vector-space cosine algorithm.
  6. The information retrieval recommendation method based on portrait similarity according to claim 2, wherein
    the user portrait P(X) of the user X includes elements A, B, C, D, and E, where the elements A, B, C, D, and E contain first query elements and first document elements;
    the user portrait P(Y) of the user Y includes elements 1, 2, 3, 4, and 5, where the elements 1, 2, 3, 4, and 5 contain second query elements and second document elements;
    Step 1: obtaining all weighted matching values of the weighted bipartite graph using the following formulas:
    M_1 = w(A,1) + w(B,3) + w(C,2) + w(D,4) + w(E,5)
    M_2 = w(A,1) + w(B,3) + w(C,5) + w(D,4) + w(E,2)
    M_3 = w(A,1) + w(B,4) + w(C,2) + w(D,3) + w(E,5)
    M_4 = w(A,1) + w(B,4) + w(C,5) + w(D,3) + w(E,2)
    where w(i, j) denotes the similarity between element i and element j, that is, the weight of the edge ij;
    Step 2: determining the maximum weighted matching value from all of the weighted matching values.
  7. The information retrieval recommendation method based on portrait similarity according to claim 1, wherein the step of assigning users with similar portraits to the same user dynamic community comprises:
    assigning all users whose user portrait similarity falls within a preset similarity range to the same user dynamic community.
  8. The information retrieval recommendation method based on portrait similarity according to claim 1, wherein the step of performing information retrieval recommendation for the user according to the user dynamic community and the user's query sentence comprises:
    obtaining all historical query records related to the query sentence;
    obtaining the document set related to the historical query records;
    obtaining the similarity between each document in the document set and the query sentence;
    sorting the documents in the document set based on the similarity, and constructing an output list according to the sorting;
    outputting the query result corresponding to the query sentence based on the output list.
  9. The information retrieval recommendation method based on portrait similarity according to claim 8, wherein all the historical query records are obtained by the following formulas:
    A = {(U_1, q_1, D_{q_1}), (U_2, q_2, D_{q_2}), ..., (U_m, q_m, D_{q_m})}
    s(q, q_i) > θ and s(P(U), P(U_i)) > ω, for 1 ≤ i ≤ m
    where A denotes the historical query records, U_i denotes a user, q_i is a query of user U_i, D_{q_i} is the set of all documents related to the query q_i, P(U) is the user portrait of user U, P(U_i) is the user portrait of user U_i, s(P(U), P(U_i)) is the user portrait similarity between user U and user U_i, and s(q, q_i) is the similarity between the sentence q and the sentence q_i.
  10. The information retrieval recommendation method based on portrait similarity according to claim 8, wherein the document set related to the historical query records is obtained by the following formula:
    D_q = D_{q_1} ∪ D_{q_2} ∪ ... ∪ D_{q_m}
    where D_q is the document set, D_{q_m} is the set of all documents related to the query q_m, and each document d in the document set satisfies d ∈ D_q.
  11. An electronic device, comprising a memory and a processor, wherein the memory includes an information retrieval recommendation program based on portrait similarity, and when the program is executed by the processor, the following steps are implemented:
    obtaining user portraits of different users, and determining the user portrait similarity between the user portraits;
    creating user dynamic communities based on the user portrait similarity, so that users with similar portraits are assigned to the same user dynamic community;
    performing information retrieval recommendation for the user according to the user dynamic community and the user's query sentence.
  12. The electronic device according to claim 11, wherein the step of acquiring user portraits of different users and determining the user portrait similarity between the user portraits comprises:
    storing a user portrait P as a set of coordinates (q, D_q), where q denotes any query record of the user and D_q denotes all documents related to the query record q;
    constructing a weighted bipartite graph based on the user portraits P(X) and P(Y) to be processed, where P(X) is the user portrait of user X, P(Y) is the user portrait of user Y, and a vertex e of P(X) is connected to a vertex é of P(Y) by an edge (e, é);
    acquiring, based on the weighted bipartite graph, the similarity between the vertex e of the user portrait P(X) and the vertex é of the user portrait P(Y);
    determining the weight of the edge (e, é) according to the similarity between the vertex e of P(X) and the vertex é of P(Y);
    acquiring the maximum weighted matching value between the user portrait P(X) and the user portrait P(Y) based on the weights of the edges (e, é);
    acquiring the user portrait similarity of user X and user Y according to the maximum weighted matching value.
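The steps above can be sketched as follows. Several choices are assumptions rather than part of the claim: the Jaccard coefficient for both query tokens and document sets, the equal 0.5/0.5 weighting, the brute-force search (practical only for small portraits), and the normalisation of the matching value by the larger portrait size.

```python
from itertools import permutations
from typing import List, Set, Tuple

Vertex = Tuple[str, Set[str]]  # (query q, related documents D_q)


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0


def vertex_similarity(e: Vertex, e2: Vertex) -> float:
    """Edge weight w(e, é): assumed equal mix of query and document overlap."""
    q_sim = jaccard(set(e[0].split()), set(e2[0].split()))
    d_sim = jaccard(e[1], e2[1])
    return 0.5 * q_sim + 0.5 * d_sim


def max_weighted_matching(px: List[Vertex], py: List[Vertex]) -> float:
    """Try every assignment of the smaller side to distinct vertices of the
    larger side and keep the largest total edge weight."""
    small, large = (px, py) if len(px) <= len(py) else (py, px)
    best = 0.0
    for chosen in permutations(range(len(large)), len(small)):
        total = sum(vertex_similarity(small[i], large[j])
                    for i, j in enumerate(chosen))
        best = max(best, total)
    return best


def portrait_similarity(px: List[Vertex], py: List[Vertex]) -> float:
    """Assumed normalisation: matching value divided by the larger size."""
    if not px or not py:
        return 0.0
    return max_weighted_matching(px, py) / max(len(px), len(py))


# Hypothetical portraits P(X) and P(Y).
p_x = [("python sort list", {"d1", "d2"}), ("bipartite matching", {"d3"})]
p_y = [("bipartite graph matching", {"d3"}), ("sort list python", {"d1", "d2"})]
similarity_xy = portrait_similarity(p_x, p_y)
```

For larger portraits, a polynomial-time assignment algorithm such as the Hungarian method would replace the permutation search.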
  13. The electronic device according to claim 12, wherein the user portrait P(X) of the user X is stored as:
    [Figure PCTCN2019117794-appb-100009]
    and the user portrait P(Y) of the user Y is stored as:
    [Figure PCTCN2019117794-appb-100010]
    where [Figure PCTCN2019117794-appb-100011] denotes the i-th query of user X, [Figure PCTCN2019117794-appb-100012] denotes all documents related to the query [Figure PCTCN2019117794-appb-100013], [Figure PCTCN2019117794-appb-100014] denotes the j-th query of user Y, and [Figure PCTCN2019117794-appb-100015] denotes all documents related to the query [Figure PCTCN2019117794-appb-100016].
  14. An information retrieval recommendation system based on portrait similarity, comprising:
    a user portrait similarity determination unit, configured to acquire user portraits of different users and determine the user portrait similarity between the user portraits;
    a dynamic community creation unit, configured to create user dynamic communities based on the user portrait similarity, so that users with similar portraits belong to the same user dynamic community;
    a retrieval recommendation unit, configured to perform information retrieval recommendation for the user according to the user dynamic community and the query statement of the user.
  15. The information retrieval recommendation system based on portrait similarity according to claim 14, wherein the user portrait similarity determination unit comprises:
    a user portrait storage module, configured to store a user portrait P as a set of coordinates (q, D_q), where q denotes any query record of the user and D_q denotes all documents related to the query record q;
    a weighted bipartite graph construction module, configured to construct a weighted bipartite graph based on the user portraits P(X) and P(Y) to be processed, where P(X) is the user portrait of user X, P(Y) is the user portrait of user Y, and a vertex e of P(X) is connected to a vertex é of P(Y) by an edge (e, é);
    a similarity acquisition module, configured to acquire, based on the weighted bipartite graph, the similarity between the vertex e of the user portrait P(X) and the vertex é of the user portrait P(Y);
    a weight determination module, configured to determine the weight of the edge (e, é) according to the similarity between the vertex e of P(X) and the vertex é of P(Y);
    a maximum weighted matching value acquisition module, configured to acquire the maximum weighted matching value between the user portrait P(X) and the user portrait P(Y) based on the weights of the edges (e, é);
    a user portrait similarity determination module, configured to acquire the user portrait similarity of user X and user Y according to the maximum weighted matching value.
  16. The information retrieval recommendation system based on portrait similarity according to claim 15, wherein
    the user portrait P(X) of the user X is stored as:
    [Figure PCTCN2019117794-appb-100017]
    and the user portrait P(Y) of the user Y is stored as:
    [Figure PCTCN2019117794-appb-100018]
    where [Figure PCTCN2019117794-appb-100019] denotes the i-th query of user X, [Figure PCTCN2019117794-appb-100020] denotes all documents related to the query [Figure PCTCN2019117794-appb-100021], [Figure PCTCN2019117794-appb-100022] denotes the j-th query of user Y, and [Figure PCTCN2019117794-appb-100023] denotes all documents related to the query [Figure PCTCN2019117794-appb-100024].
  17. The information retrieval recommendation system based on portrait similarity according to claim 15, wherein
    the vertex e of the user portrait P(X) comprises a corresponding first query element and first document element, and the vertex é of the user portrait P(Y) comprises a corresponding second query element and second document element;
    the similarity acquisition module comprises:
    a query element and document element similarity acquisition module, configured to acquire a first similarity between the first query element and the second query element, and a second similarity between the first document element and the second document element;
    an inter-vertex similarity determination module, configured to determine the similarity between the vertex e and the vertex é based on the first similarity and the second similarity.
  18. The information retrieval recommendation system based on portrait similarity according to claim 17, wherein the query element and document element similarity acquisition module comprises:
    a first similarity acquisition module, configured to acquire the first similarity between the first query element and the second query element by an edit distance algorithm, a Jaccard coefficient algorithm, a TF algorithm, a TF-IDF algorithm or a Word2Vec algorithm;
    a second similarity acquisition module, configured to acquire the second similarity between the first document element and the second document element by a TF-IDF algorithm or a cosine algorithm based on space vectors.
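Two of the listed options can be sketched as follows: an edit distance for the first (query) similarity and a cosine over term-frequency vectors for the second (document) similarity. Normalising the edit distance by the longer string and treating a document element as one whitespace-tokenised text are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def query_similarity(q1: str, q2: str) -> float:
    """First similarity: 1 minus the edit distance scaled to [0, 1]."""
    longest = max(len(q1), len(q2))
    return 1.0 if longest == 0 else 1.0 - edit_distance(q1, q2) / longest


def document_similarity(doc1: str, doc2: str) -> float:
    """Second similarity: cosine of the term-frequency vectors."""
    tf1 = Counter(doc1.lower().split())
    tf2 = Counter(doc2.lower().split())
    dot = sum(tf1[t] * tf2[t] for t in tf1.keys() & tf2.keys())
    norm = (math.sqrt(sum(v * v for v in tf1.values()))
            * math.sqrt(sum(v * v for v in tf2.values())))
    return dot / norm if norm else 0.0
```

The two values would then be combined into the similarity between the vertices e and é; the claims leave the combination rule open.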
  19. The information retrieval recommendation system based on portrait similarity according to claim 15, wherein
    the user portrait P(X) of the user X comprises elements A, B, C, D and E, each of which comprises a first query element and a first document element;
    the user portrait P(Y) of the user Y comprises elements 1, 2, 3, 4 and 5, each of which comprises a second query element and a second document element;
    step 1: obtaining all weighted matching values of the weighted bipartite graph by the following formulas:
    M_1 = w(A,1) + w(B,3) + w(C,2) + w(D,4) + w(E,5)
    M_2 = w(A,1) + w(B,3) + w(C,5) + w(D,4) + w(E,2)
    M_3 = w(A,1) + w(B,4) + w(C,2) + w(D,3) + w(E,5)
    M_4 = w(A,1) + w(B,4) + w(C,5) + w(D,3) + w(E,2)
    where w(i, j) denotes the similarity between element i and element j, or the weight of the edge (i, j);
    step 2: determining the maximum weighted matching value from all the weighted matching values.
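The two steps can be sketched by enumerating every perfect matching over the edges that appear in M_1 to M_4 and taking the largest sum. The edge set is read off those four formulas; the numeric weights are hypothetical and serve only to make the example concrete.

```python
from typing import Dict, Iterator, List, Set, Tuple

Edge = Tuple[str, int]


def all_perfect_matchings(left: List[str],
                          edges: List[Edge]) -> Iterator[List[Edge]]:
    """Yield every matching that pairs each left element with a distinct
    right element along an existing edge (simple backtracking)."""
    def extend(index: int, used: Set[int],
               current: List[Edge]) -> Iterator[List[Edge]]:
        if index == len(left):
            yield list(current)
            return
        for u, v in edges:
            if u == left[index] and v not in used:
                used.add(v)
                current.append((u, v))
                yield from extend(index + 1, used, current)
                current.pop()
                used.remove(v)

    yield from extend(0, set(), [])


# Hypothetical edge weights w(i, j); only the edge set follows M_1..M_4.
weights: Dict[Edge, float] = {
    ("A", 1): 0.9,
    ("B", 3): 0.4, ("B", 4): 0.7,
    ("C", 2): 0.6, ("C", 5): 0.3,
    ("D", 3): 0.8, ("D", 4): 0.5,
    ("E", 2): 0.2, ("E", 5): 0.6,
}

# Step 1: all weighted matching values.
matchings = list(all_perfect_matchings(list("ABCDE"), list(weights)))
values = [sum(weights[edge] for edge in m) for m in matchings]

# Step 2: the maximum weighted matching value.
best_value = max(values)
best_matching = matchings[values.index(best_value)]
```

With these illustrative weights the enumeration yields exactly four matchings, and the best one corresponds to M_3.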
  20. A computer-readable storage medium, wherein the computer-readable storage medium stores an information retrieval recommendation program based on portrait similarity, and the program, when executed by a processor, implements the steps of the information retrieval recommendation method based on portrait similarity according to any one of claims 1 to 10.
PCT/CN2019/117794 2019-08-14 2019-11-13 Portrait similarity-based information retrieval recommendation method and device and storage medium WO2021027149A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910748591.3A CN110598123B (en) 2019-08-14 2019-08-14 Information retrieval recommendation method, device and storage medium based on image similarity
CN201910748591.3 2019-08-14

Publications (1)

Publication Number Publication Date
WO2021027149A1 (en) 2021-02-18

Family

ID=68854177

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117794 WO2021027149A1 (en) 2019-08-14 2019-11-13 Portrait similarity-based information retrieval recommendation method and device and storage medium

Country Status (2)

Country Link
CN (1) CN110598123B (en)
WO (1) WO2021027149A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111857660B (en) * 2020-07-06 2021-10-08 南京航空航天大学 Context-aware API recommendation method and terminal based on query statement
CN112686462A (en) * 2021-01-06 2021-04-20 广州视源电子科技股份有限公司 Student portrait-based anomaly detection method, device, equipment and storage medium
CN113486985B (en) * 2021-08-02 2023-04-18 汤恩智能科技(上海)有限公司 User identification method, management method, medium and electronic device for electric device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101398854A (en) * 2008-10-24 2009-04-01 清华大学 Video fragment searching method and system
CN102521659A (en) * 2011-11-26 2012-06-27 北京航空航天大学 Method for judging incidence relation between services orienting to cloud manufacturing
KR101752636B1 (en) * 2017-01-31 2017-07-03 주식회사 스켈터랩스 Recommended method using the entity's record application
CN110111167A (en) * 2018-02-01 2019-08-09 北京京东尚科信息技术有限公司 A kind of method and apparatus of determining recommended

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556603A (en) * 2009-05-06 2009-10-14 北京航空航天大学 Coordinate search method used for reordering search results
CN106021423B (en) * 2016-05-16 2019-05-21 西安电子科技大学 META Search Engine personalization results recommended method based on group division
CN106599148A (en) * 2016-12-02 2017-04-26 东软集团股份有限公司 Method and device for generating abstract
CN108062375B (en) * 2017-12-12 2021-12-10 百度在线网络技术(北京)有限公司 User portrait processing method and device, terminal and storage medium

Also Published As

Publication number Publication date
CN110598123A (en) 2019-12-20
CN110598123B (en) 2023-12-29

Similar Documents

Publication Publication Date Title
CN110162695B (en) Information pushing method and equipment
CN107679144B (en) News sentence clustering method and device based on semantic similarity and storage medium
US9342583B2 (en) Book content item search
WO2017045443A1 (en) Image retrieval method and system
US7769771B2 (en) Searching a document using relevance feedback
US8577882B2 (en) Method and system for searching multilingual documents
WO2017031996A1 (en) Method and device for calculating similarity of search terms, searching method and device using search terms
WO2021164231A1 (en) Official document abstract extraction method and apparatus, and device and computer readable storage medium
WO2021027149A1 (en) Portrait similarity-based information retrieval recommendation method and device and storage medium
CN109299383B (en) Method and device for generating recommended word, electronic equipment and storage medium
US9275128B2 (en) Method and system for document indexing and data querying
WO2019085474A1 (en) Calculation engine implementing method, electronic device, and storage medium
WO2020248379A1 (en) Method for searching for similar network pages, and apparatus
JP2015225669A (en) Annotation display assistance device and annotation display assistance method
WO2018176913A1 (en) Search method and apparatus, and non-temporary computer-readable storage medium
CN112632261A (en) Intelligent question and answer method, device, equipment and storage medium
US8799314B2 (en) System and method for managing information map
CN114780746A (en) Knowledge graph-based document retrieval method and related equipment thereof
WO2020164204A1 (en) Text template recognition method and apparatus, and computer readable storage medium
CN111967045A (en) Big data-based data publishing privacy protection algorithm and system
WO2023151576A1 (en) Search recommendation method, search recommendation system, computer device and storage medium
WO2022257455A1 (en) Determination metod and apparatus for similar text, and terminal device and storage medium
CN111985217B (en) Keyword extraction method, computing device and readable storage medium
CN116186198A (en) Information retrieval method, information retrieval device, computer equipment and storage medium
JP6495206B2 (en) Document concept base generation device, document concept search device, method, and program

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19941582; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 19941582; Country of ref document: EP; Kind code of ref document: A1)