WO2020019562A1 - Search sorting method and device, electronic device, and storage medium - Google Patents

Search sorting method and device, electronic device, and storage medium

Info

Publication number
WO2020019562A1
Authority
WO
WIPO (PCT)
Prior art keywords
weight
initial search
text similarity
search results
update time
Prior art date
Application number
PCT/CN2018/113348
Other languages
French (fr)
Chinese (zh)
Inventor
彭钊
Original Assignee
天津字节跳动科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 天津字节跳动科技有限公司
Publication of WO2020019562A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web

Definitions

  • The present application relates to the technical field of enterprise instant messaging systems, and in particular to a search sorting method, device, electronic device, and storage medium.
  • With the rapid development of smart devices, there is more and more chat application software.
  • Chat application software makes it easy for users in different places to communicate.
  • Chat application software includes personal chat application software and enterprise chat application software.
  • In enterprise chat application software, when a user needs to find relevant information, the user invokes a search function, for example to search chat messages, contacts, or group chats, in order to quickly find the information or quickly start a chat.
  • At present, the search results of enterprise chat application software are displayed separately by object type: contacts, group chats, messages, and so on are shown in separate columns, and the objects within each column are sorted by time. Finding relevant information by browsing these columns is tedious and time-consuming.
  • a search ranking method includes:
  • a search sorting device includes:
  • An initial search result extraction module which obtains search keywords and determines a plurality of initial search results matching the plurality of keywords
  • a feature factor extraction module that extracts text similarity, update time dimension, and user association degree related to each of the initial search results
  • A weight calculation module which obtains the corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and performs a fusion calculation on each of the initial search results according to the text similarity weight, the update time dimension weight, and the user association degree weight to obtain a comprehensive weight of each of the initial search results;
  • a sorting module sorts the plurality of initial search results according to the comprehensive weight.
  • An electronic device includes a memory and a processor.
  • the memory stores a computer program.
  • When the processor executes the computer program, the following steps are implemented:
  • a computer-readable storage medium stores a computer program thereon.
  • When the computer program is executed by a processor, the following steps are implemented:
  • The above search sorting method, device, electronic device, and storage medium extract the update time dimension so that sorting takes time into account, rank initial search results that share common characteristics with the user higher according to the user association degree, and sort the search results across multiple dimensions. This makes the sorting intelligent, lets users quickly find relevant information, simplifies operations, and improves search efficiency.
  • FIG. 1 is an application environment diagram of a search ranking method in an embodiment
  • FIG. 2 is a schematic flowchart of a search ranking method according to an embodiment
  • FIG. 3 is a schematic flowchart of a step of obtaining a text similarity weight in an embodiment
  • FIG. 4 is a schematic flowchart of a step of obtaining weights of an update time dimension according to an embodiment
  • FIG. 5 is a schematic flowchart of a step of obtaining a user association degree weight in an embodiment
  • FIG. 6 is a structural block diagram of a search and ranking device in an embodiment
  • FIG. 7 is a structural block diagram of a feature factor extraction module in an embodiment
  • FIG. 8 is a structural block diagram of a weight calculation module in an embodiment
  • FIG. 9 is an internal structure diagram of an electronic device in an embodiment
  • FIG. 10 is a block diagram of a server search subject in an embodiment.
  • the search ranking method provided in this application can be applied to the application environment shown in FIG. 1.
  • The terminal 102 communicates with the server 104 through the network.
  • The server 104 obtains the search keywords and determines a plurality of initial search results that match the search keywords; extracts the text similarity, update time dimension, and user association degree related to each of the initial search results; obtains the corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree; performs a fusion calculation on each of the initial search results according to these three weights to obtain a comprehensive weight of each of the initial search results; and sorts the plurality of initial search results according to the comprehensive weight. The ranking results are displayed on the terminal 102.
  • the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented by
  • In one embodiment, a search ranking method is provided.
  • Taking the method as applied to the server in FIG. 1 as an example, it includes the following steps:
  • Step 210: Obtain search keywords, and determine multiple initial search results that match the search keywords.
  • The search keywords are the input, such as characters, words, and symbols, that the user enters when using a search engine to find related information.
  • The initial search results include multiple fields. Specifically, the initial search results refer to contacts or group chats.
  • The user inputs a search keyword at the terminal; the terminal obtains the keyword and sends it to the server.
  • Step 220: Extract the text similarity, update time dimension, and user association degree related to each of the initial search results.
  • Each initial search result includes one or more of: object type, object status, object name, initial recall search engine score, chat update time, last message location, object pinyin name, object English name, and department information.
  • The object types include chat applications and emails, and the object status indicates whether the object is registered or has left.
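The fields listed above can be pictured as a simple record. The following is a minimal sketch only; the class name, field names, and status strings are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InitialSearchResult:
    """One recalled contact or group chat (all names here are hypothetical)."""
    object_type: str                          # e.g. "chat" or "email"
    object_status: str                        # e.g. "registered", "unregistered", "left"
    object_name: str
    recall_score: float = 0.0                 # initial recall search engine score
    chat_update_time: Optional[float] = None  # Unix seconds of the last chat, None if unknown
    last_message_position: Optional[int] = None
    pinyin_name: str = ""
    english_name: str = ""
    department: str = ""

    def has_chat_history(self) -> bool:
        # Chat history is inferred from the chat update time or the
        # position of the latest message, whichever is available.
        return (self.chat_update_time is not None
                or self.last_message_position is not None)
```

A record with neither a chat update time nor a last message position is treated as having no chat history.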
  • Before extracting the text similarity, the update time dimension, and the user association degree related to each of the initial search results, the method includes: filtering the initial search results.
  • Filtering the initial search results includes: initial search results for users who have left and have no chat history are not sorted; and initial search results for unregistered users are ranked last.
  • Whether there is chat history can be determined from the chat update time or the location of the latest message.
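The filtering rule can be sketched as follows. This is an illustration under assumptions: the dictionary keys and status strings are hypothetical, and "left and no chat history" is read as a single combined condition.

```python
def filter_results(results):
    """Split results into those to be ranked and those appended last.

    `results` is a list of dicts with the hypothetical keys
    'status' ("registered", "unregistered", "left") and 'has_chat_history'.
    """
    ranked, unregistered = [], []
    for r in results:
        if r["status"] == "left" and not r["has_chat_history"]:
            continue  # departed user without chat history: not sorted at all
        if r["status"] == "unregistered":
            unregistered.append(r)  # shown after every ranked result
        else:
            ranked.append(r)
    return ranked, unregistered
```

The caller would sort `ranked` by comprehensive weight and then append `unregistered` to the end of the list.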
  • Step 230: Obtain the corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and perform a fusion calculation on each of the initial search results according to the text similarity weight, the update time dimension weight, and the user association degree weight to obtain a comprehensive weight of each of the initial search results.
  • The text similarity weight represents the degree of matching between the search keywords and the initial search result.
  • The update time dimension weight represents how recently the chat record of the initial search result was updated.
  • The user association degree weight indicates how likely the initial search result is to be the target the user is interested in.
  • Step 240: Sort the multiple initial search results according to the comprehensive weight.
  • The results can be sorted by weight from largest to smallest or from smallest to largest. This technical solution does not sort separately within each column; it sorts all results by weight, so that relevant information can be found quickly.
  • The user association degree is determined by common feature data of the user currently performing the search and the initial search result.
  • The update time dimension is extracted so that the ranking takes time into account, initial search results that share common characteristics with the user are ranked higher according to the user association degree, and the search results are ranked across multiple dimensions.
  • This makes the sorting intelligent, so that users can quickly find relevant information, which simplifies operations and improves search efficiency.
  • Obtaining the text similarity weight includes: calculating the hit rate, order consistency index, position closeness, and coverage of the keywords in the initial search result, and calculating the text similarity weight from these four quantities.
  • The step of calculating the text similarity weight according to the hit rate, the order consistency index, the position closeness, and the coverage includes: obtaining an offset value and a correction value for each of the hit rate, the order consistency index, the position closeness, and the coverage; and performing a fusion calculation on these four quantities with their offset values and correction values to obtain the text similarity weight.
  • The offset value and the correction value can be determined through machine learning.
  • Obtaining the offset values and correction values according to the hit rate, the order consistency index, the position closeness, and the coverage includes: obtaining an offset value and a correction value according to the hit rate, obtaining an offset value and a correction value according to the order consistency index, obtaining an offset value and a correction value according to the position closeness, and obtaining an offset value and a correction value according to the coverage.
  • The specific formula for calculating the text similarity weight is:
  • text_similar = (a * hit + b) * (c * sequence + d) * (e * position + f) * (g * cover + h)
  • text_similar is the text similarity weight
  • hit is the text hit rate
  • sequence is the order consistency index
  • position is the position closeness
  • cover is the coverage
  • a, b are the offset value and correction value of the hit rate
  • c, d are the offset value and correction value of the order consistency index
  • e, f are the offset value and correction value of the position closeness
  • g, h are the offset value and correction value of the coverage. The larger the offset value, the more important the item.
  • the text hit ratio indicates the ratio of the number of search keywords hitting in the corresponding text document to the total number of search keywords. Obviously, the higher the ratio, the closer the initial search result is to the search target.
  • the order consistency index indicates the consistency of the order of the search keywords and the order of the search keywords that appear in the corresponding text documents.
  • Order consistency is expressed through the number of reverse-order pairs (inversions): for example, (1, 2, 3) has 0 inversions, which is the most ordered arrangement, and (3, 2, 1) has 3 inversions, which is the most disordered arrangement.
  • the position closeness indicates the ratio of the number of hit text documents to the sum of the number of hit text documents and the number of hit intervals.
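The text similarity calculation can be sketched in Python as below. This is an illustration under assumptions, not the application's implementation: the function names are hypothetical, the default coefficients are placeholders (the text says they can be learned), and the mapping from inversion count to an index in [0, 1] is one possible choice, since the text only says the index is based on the number of inversions.

```python
def hit_rate(keywords, doc_terms):
    """Fraction of search keywords that occur in the document's terms."""
    if not keywords:
        return 0.0
    doc = set(doc_terms)
    return sum(1 for k in keywords if k in doc) / len(keywords)


def inversion_count(seq):
    """Number of pairs (i, j) with i < j and seq[i] > seq[j]."""
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


def order_consistency(seq):
    """Assumed mapping: 1.0 for fully ordered, 0.0 for fully reversed."""
    n = len(seq)
    max_inv = n * (n - 1) // 2
    if max_inv == 0:
        return 1.0
    return 1.0 - inversion_count(seq) / max_inv


def text_similar(hit, sequence, position, cover,
                 a=1.0, b=0.0, c=1.0, d=0.0, e=1.0, f=0.0, g=1.0, h=0.0):
    """Product of four linear terms, as in the formula above."""
    return ((a * hit + b) * (c * sequence + d)
            * (e * position + f) * (g * cover + h))
```

Here `seq` holds the keyword indices in the order in which the keywords appear in the document, so `(1, 2, 3)` means the document preserves the query order.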
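  • Obtaining the update time dimension weight includes: obtaining, from the initial search result, the time interval between the last chat time and the current time, and calculating the ratio of the attenuation constant to the sum of the time interval and the attenuation constant.
  • The calculation formula of the update time dimension weight is as follows:
  • update_time_weight = factor / (factor + update_time_secs)
  • update_time_weight is the weight of the update time dimension
  • factor is the time attenuation constant
  • update_time_secs is the time interval between the last chat time and the current time, in seconds.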
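The decay formula is easy to check numerically. The sketch below assumes a hypothetical default of one day (86400 seconds) for `factor`; the application does not give a value.

```python
def update_time_weight(update_time_secs, factor=86400.0):
    """factor / (factor + elapsed seconds since the last chat).

    The default factor of one day is an assumption for illustration only.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if update_time_secs < 0:
        raise ValueError("update_time_secs must be non-negative")
    return factor / (factor + update_time_secs)
```

With this form the weight is 1 for a chat updated just now, falls to one half when the elapsed time equals `factor`, and approaches 0 for very old chats.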
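  • Acquiring the user association degree weight includes: calculating the number of common contacts, the common department characteristic value, the common office location characteristic value, and the number of common personal tags between the initial search result and the user performing the current search, and calculating the user association degree weight from these four quantities.
  • The step of calculating the user association degree weight based on the number of common contacts, the common department characteristic value, the common office location characteristic value, and the number of common personal tags includes: obtaining an offset value and a correction value for each of the number of common contacts, the common department characteristic value, the common office location characteristic value, and the number of common personal tags; and performing a fusion calculation on these four quantities with their offset values and correction values to obtain the user association degree weight.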
  • the offset value and the correction value can be determined through machine learning.
  • Obtaining the offset values and correction values according to the number of common contacts, the common department characteristic value, the common office location characteristic value, and the number of common personal tags includes: obtaining an offset value and a correction value according to the number of common contacts, obtaining an offset value and a correction value according to the common department characteristic value, obtaining an offset value and a correction value according to the common office location characteristic value, and obtaining an offset value and a correction value according to the number of common personal tags.
  • The user association degree describes the characteristics that the user and a contact have in common. Common characteristics include: people both have been in contact with, departments, offices, and personal tags. The user is the person performing the search; the contact is the contact corresponding to the initial search result. For example, if User A and Contact B have many contacts in common, User A and Contact B are highly related. If User A and Contact B have not yet been in contact but share many common characteristics, Contact B is likely to be someone User A is searching for. Calculating the user association degree supports personalized search: contacts who share characteristics with the user are ranked higher.
  • The user association degree is mined from offline data and calculated from a plurality of common characteristics.
  • The specific calculation formula for the user association degree weight is as follows:
  • user_relevant_weight = (i * same_user_num + j) * (k * same_department + l) * (m * same_place + n) * (o * same_tag + p)
  • user_relevant_weight is the user association degree weight
  • same_user_num is the number of common contacts, that is, the number of contacts shared by the searching user and the contact corresponding to the initial search result; the value is an integer greater than 0
  • same_department is the common department characteristic value: 1 when both are in the same department, 0 otherwise
  • same_place is the common office location characteristic value: 1 when both are in the same office, 0 otherwise
  • same_tag is the number of common personal tags, that is, the number of tags both users have. For example, if both have the "travel" and "reading" tags, the value of same_tag is 2.
  • i, j are the offset value and correction value of the number of common contacts
  • k, l are the offset value and correction value of the common department characteristic value
  • m, n are the offset value and correction value of the common office location characteristic value
  • o and p are the offset value and the correction value of the number of common personal tags. A larger offset value indicates that the item is more important.
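A minimal sketch of the user association degree weight follows. The helper `common_tag_count` and the default coefficients (all set to 1.0) are assumptions for illustration; the text says the real offset and correction values can be learned.

```python
def common_tag_count(user_tags, contact_tags):
    """Number of personal tags shared by the searching user and the contact."""
    return len(set(user_tags) & set(contact_tags))


def user_relevant_weight(same_user_num, same_department, same_place, same_tag,
                         i=1.0, j=1.0, k=1.0, l=1.0,
                         m=1.0, n=1.0, o=1.0, p=1.0):
    """Product of four linear terms, one per common characteristic."""
    return ((i * same_user_num + j) * (k * same_department + l)
            * (m * same_place + n) * (o * same_tag + p))
```

Because the terms are multiplied, a non-zero correction value keeps a feature whose value is 0 (for example, a different office) from forcing the whole product to 0.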
  • Performing the fusion calculation according to the text similarity weight, the update time dimension weight, and the user association degree weight to obtain the comprehensive weight of each of the initial search results includes: normalizing the text similarity weight, the update time dimension weight, and the user association degree weight to decimals between 0 and 1; and performing a fusion calculation on the normalized weights to obtain the comprehensive weight of each initial search result.
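The application does not say how the weights are normalized. One common option is min-max scaling across the current result list, sketched below as an assumption rather than as the method actually used.

```python
def min_max_normalize(values):
    """Scale a list of weights into [0, 1] (assumed normalization scheme)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All results tie on this dimension; treat them as equally strong.
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Each of the three weights would be normalized separately over all initial search results before the fusion step.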
  • In one embodiment, obtaining the corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and performing the fusion calculation on each of the initial search results according to these weights to obtain the comprehensive weight of each initial search result, includes: calculating the text similarity weight, the update time dimension weight, and the user association degree weight according to the text similarity, the update time dimension, and the user association degree; obtaining an offset value and a correction value for each of the three weights; for each weight, multiplying the weight by its offset value and adding its correction value to obtain a fusion coefficient; and multiplying the fusion coefficients to obtain the comprehensive weight of each initial search result.
  • the offset value and the correction value can be determined through machine learning.
  • Obtaining the offset values and correction values according to the text similarity weight, the update time dimension weight, and the user association degree weight includes: obtaining an offset value and a correction value according to the text similarity weight, obtaining an offset value and a correction value according to the update time dimension weight, and obtaining an offset value and a correction value according to the user association degree weight.
  • The formula for calculating the comprehensive weight is as follows:
  • weight = (a1 * text_weight + b1) * (a2 * update_time_weight + b2) * (a3 * user_relevant_weight + b3)
  • weight represents the comprehensive weight of the initial search result
  • text_weight represents the text similarity weight
  • update_time_weight represents the update time dimension weight (the chat update time weight)
  • user_relevant_weight represents the user association degree weight
  • a1, a2, and a3 are offset values
  • b1, b2, and b3 are correction values
  • a1 * text_weight + b1 gives the first fusion coefficient, a2 * update_time_weight + b2 gives the second fusion coefficient, and a3 * user_relevant_weight + b3 gives the third fusion coefficient
  • The fusion coefficients are multiplied to obtain the comprehensive weight of the initial search result.
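The fusion and the final sort can be sketched together. The tuple layout, function names, and default coefficients below are hypothetical; sorting from largest to smallest is one of the two orders the description allows.

```python
def comprehensive_weight(text_weight, update_time_weight, user_relevant_weight,
                         a1=1.0, b1=0.0, a2=1.0, b2=0.0, a3=1.0, b3=0.0):
    """Multiply the three fusion coefficients (a_n * weight_n + b_n)."""
    return ((a1 * text_weight + b1)
            * (a2 * update_time_weight + b2)
            * (a3 * user_relevant_weight + b3))


def rank(results):
    """Sort (name, text_w, time_w, user_w) tuples, largest weight first."""
    return sorted(results,
                  key=lambda r: comprehensive_weight(r[1], r[2], r[3]),
                  reverse=True)


if __name__ == "__main__":
    demo = [("alice", 0.5, 0.5, 0.5),
            ("bob", 1.0, 0.5, 1.0),
            ("carol", 0.25, 1.0, 1.0)]
    for name, *_ in rank(demo):
        print(name)
```

With the placeholder coefficients, the three demo weights are 0.125, 0.5, and 0.25, so the order is bob, carol, alice.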
  • Although the steps in the flowcharts of FIGS. 2-5 are displayed sequentially according to the directions of the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated in this document, the order of execution is not strictly limited, and the steps can be performed in other orders. Moreover, at least some of the steps in FIGS. 2-5 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily performed at the same time, but may be performed at different times, and they are not necessarily performed sequentially: they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
  • In one embodiment, a search ranking device is provided, which includes: an initial search result extraction module 601, a feature factor extraction module 602, a weight calculation module 603, and a sorting module 604, where:
  • The initial search result extraction module 601 is configured to obtain search keywords and determine a plurality of initial search results that match the search keywords.
  • The search keywords are the input, such as characters, words, and symbols, that the user enters when using a search engine to find related information.
  • The initial search results include multiple fields. Specifically, the initial search results refer to contacts or group chats.
  • The user inputs a search keyword at the terminal; the terminal obtains the keyword and sends it to the server.
  • a feature factor extraction module 602 is configured to extract a text similarity, an update time dimension, and a user association degree related to each of the initial search results.
  • the initial search result is a text document matching the search keywords; the text similarity, update time dimension, and user relevance are obtained from the initial search results, and some information related to the keywords is extracted according to the text document.
  • the search and ranking device further includes: a filtering module, configured to filter the initial search result.
  • Filtering the initial search results includes: initial search results for users who have left and have no chat history are not sorted; and initial search results for unregistered users are ranked last.
  • Whether there is chat history can be determined from the chat update time or the location of the latest message.
  • A weight calculation module 603 is configured to obtain the corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and to perform a fusion calculation on each of the initial search results according to the text similarity weight, the update time dimension weight, and the user association degree weight to obtain a comprehensive weight of each of the initial search results.
  • a sorting module 604 is configured to sort the multiple initial search results according to the comprehensive weight.
  • The results can be sorted by weight from largest to smallest or from smallest to largest. This technical solution does not sort separately within each column; it sorts all results by weight, so that relevant information can be found quickly.
  • the degree of user association is determined by common feature data of a user currently performing a search and the initial search result.
  • each initial search result is aimed at contacts or groups.
  • The fields contained in each initial search result include one or more of: object type, object status, object name, initial recall search engine score, chat update time, last message location, object pinyin name, object English name, and department information.
  • The object types include chat applications and emails, and the object status indicates whether the object is registered or has left.
  • the feature factor extraction module 602 includes a text similarity weight calculation unit 701, an update time dimension weight calculation unit 702, and a user relevance degree weight calculation unit 703, where:
  • The text similarity weight calculation unit 701 is configured to calculate the hit rate, order consistency index, position closeness, and coverage of the keywords in the initial search result, and to calculate the text similarity weight according to the hit rate, the order consistency index, the position closeness, and the coverage.
  • The text similarity weight calculation unit includes: a first offset value and correction value acquisition subunit, configured to obtain an offset value and a correction value for each of the hit rate, the order consistency index, the position closeness, and the coverage; and a text similarity fusion calculation subunit, configured to perform a fusion calculation on the hit rate, the order consistency index, the position closeness, and the coverage with their offset values and correction values to obtain the text similarity weight.
  • the offset value and the correction value can be determined through machine learning.
  • Obtaining the offset values and correction values according to the hit rate, the order consistency index, the position closeness, and the coverage includes: obtaining an offset value and a correction value according to the hit rate, obtaining an offset value and a correction value according to the order consistency index, obtaining an offset value and a correction value according to the position closeness, and obtaining an offset value and a correction value according to the coverage.
  • The specific formula for calculating the text similarity weight is:
  • text_similar = (a * hit + b) * (c * sequence + d) * (e * position + f) * (g * cover + h)
  • text_similar is the text similarity weight
  • hit is the text hit rate
  • sequence is the order consistency index
  • position is the position closeness
  • cover is the coverage
  • a, b are the offset value and correction value of the hit rate
  • c, d are the offset value and correction value of the order consistency index
  • e, f are the offset value and correction value of the position closeness
  • g, h are the offset value and correction value of the coverage. The larger the offset value, the more important the item.
  • the text hit ratio indicates the ratio of the number of search keywords hitting in the corresponding text document to the total number of search keywords. Obviously, the higher the ratio, the closer the initial search result is to the search target.
  • the order consistency index indicates the consistency of the order of the search keywords and the order of the search keywords that appear in the corresponding text documents.
  • Order consistency is expressed through the number of reverse-order pairs (inversions): for example, (1, 2, 3) has 0 inversions, which is the most ordered arrangement, and (3, 2, 1) has 3 inversions, which is the most disordered arrangement.
  • the position closeness indicates the ratio of the number of hit text documents to the sum of the number of hit text documents and the number of hit intervals.
  • An update time dimension weight calculation unit 702 is configured to obtain, from the initial search result, the time interval between the last chat time and the current time, and to calculate the ratio of the attenuation constant to the sum of the time interval and the attenuation constant to obtain the update time dimension weight.
  • The formula for calculating the update time dimension weight is as follows:
  • update_time_weight = factor / (factor + update_time_secs)
  • update_time_weight is the weight of the update time dimension
  • factor is the time attenuation constant
  • update_time_secs is the time interval between the last chat time and the current time, in seconds.
  • The user association degree weight calculation unit 703 is configured to calculate the number of common contacts, the common department characteristic value, the common office location characteristic value, and the number of common personal tags between the initial search result and the user performing the current search, and to calculate the user association degree weight according to these four quantities.
  • The user association degree weight calculation unit 703 includes: a second offset value and correction value acquisition subunit, configured to obtain an offset value and a correction value for each of the number of common contacts, the common department characteristic value, the common office location characteristic value, and the number of common personal tags; and a user association degree fusion calculation subunit, configured to perform a fusion calculation on these four quantities with their offset values and correction values to obtain the user association degree weight.
  • the offset value and the correction value can be determined through machine learning.
  • Obtaining the offset values and correction values according to the number of common contacts, the common department characteristic value, the common office location characteristic value, and the number of common personal tags includes: obtaining an offset value and a correction value according to the number of common contacts, obtaining an offset value and a correction value according to the common department characteristic value, obtaining an offset value and a correction value according to the common office location characteristic value, and obtaining an offset value and a correction value according to the number of common personal tags.
  • The user association degree describes the characteristics that the user and a contact have in common. Common characteristics include: people both have been in contact with, departments, offices, and personal tags. The user is the person performing the search; the contact is the contact corresponding to the initial search result. For example, if User A and Contact B have many contacts in common, User A and Contact B are highly related. If User A and Contact B have not yet been in contact but share many common characteristics, Contact B is likely to be someone User A is searching for. Calculating the user association degree supports personalized search: contacts who share characteristics with the user are ranked higher.
  • The user association degree is mined from offline data and calculated from a plurality of common characteristics.
  • the specific calculation formula for the user association degree weight is as follows:
  • user_relevant_weight (i * same_user_num + j) * (k * same_department + l) * (m * same_place + n) * (o * same_tag + p);
  • user_relevant_weight is the weight of the user's association degree
  • same_user_num is the number of common contacts
  • the number of common contacts represents the number of contacts shared by the user and the contact corresponding to the initial search result, and the value is an integer greater than 0
  • same_department is the characteristic value of the common department: the value is 1 when both are in the same department and 0 when they are not
  • same_place is the characteristic value of the common office location: the value is 1 when both are in the same office and 0 when they are not
  • same_tag is the number of common personal tags, that is, the number of tags the two users share. For example, if both have the tags "travel" and "reading", the value of same_tag is 2.
  • i, j are the offset values and correction values of the number of common contacts
  • k, l are the offset values and correction values of the common department characteristic values
  • m, n are the offset values and correction values of the common office location characteristic values
  • o and p are the offset value and the correction value of the number of common personal tags, where a larger offset value indicates that the item is more important.
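The user association degree formula above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the function name mirrors the formula's variable, and the coefficient values in `EXAMPLE_COEFFS` are hypothetical placeholders, since the text only says that offset and correction values can be determined through machine learning.

```python
def user_relevant_weight(same_user_num, same_department, same_place, same_tag, coeffs):
    """Product of four affine terms, one per common characteristic.

    coeffs holds (offset, correction) pairs in the order
    (i, j), (k, l), (m, n), (o, p) used by the formula in the text.
    """
    (i, j), (k, l), (m, n), (o, p) = coeffs
    return (
        (i * same_user_num + j)
        * (k * same_department + l)
        * (m * same_place + n)
        * (o * same_tag + p)
    )


# Hypothetical coefficients chosen only for illustration; the source does not give values.
EXAMPLE_COEFFS = ((0.1, 1.0), (0.5, 1.0), (0.25, 1.0), (0.2, 1.0))

# Five common contacts, same department, different office, two shared tags.
example = user_relevant_weight(5, 1, 0, 2, EXAMPLE_COEFFS)
```

With a correction value of 1.0, a characteristic that is absent (for example same_place = 0) contributes a neutral factor of 1 rather than zeroing the whole product, which is one reason a correction term is useful in a multiplicative fusion.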
  • the weight calculation module 603 includes:
  • a normalization unit 801 configured to normalize the text similarity weight, the update time dimension weight, and the user association degree weight to decimals between 0 and 1;
  • a fusion calculation unit 802 is configured to perform fusion calculation according to the normalized text similarity weight, update time dimension weight, and user association degree weight to obtain a comprehensive weight of each of the initial search results.
  • the weight calculation module includes a weight acquisition unit configured to calculate the text similarity weight, the update time dimension weight, and the user association degree weight according to the text similarity, the update time dimension, and the user association degree.
  • an offset value and correction value acquisition unit configured to obtain an offset value and a correction value for each of the text similarity weight, the update time dimension weight, and the user association degree weight;
  • a fusion coefficient calculation unit configured to multiply each of the text similarity weight, the update time dimension weight, and the user association degree weight by its corresponding offset value and add the corresponding correction value, to obtain a fusion coefficient for each weight;
  • a comprehensive weight calculation unit configured to multiply the fusion coefficients together to obtain the comprehensive weight of each of the initial search results.
  • the formula for calculating the comprehensive weight is as follows:
  • weight = (a1 * text_weight + b1) * (a2 * update_time_weight + b2) * (a3 * user_relevant_weight + b3)
  • weight represents the comprehensive weight of the initial search result
  • text_weight represents the text similarity weight
  • update_time_weight represents the update time dimension weight
  • user_relevant_weight represents the user association degree weight
  • a1, a2, and a3 are offset values
  • b1, b2, and b3 are correction values
  • a1 * text_weight + b1 is calculated to obtain the first fusion coefficient
  • a2 * update_time_weight + b2 is calculated to obtain the second fusion coefficient
  • a3 * user_relevant_weight + b3 is calculated to obtain the third fusion coefficient
  • the fusion coefficients are multiplied to obtain the comprehensive weight of the initial search result.
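The comprehensive weight formula can be sketched as below. This is an illustrative sketch under stated assumptions: the inputs are taken to be already normalized to the range 0 to 1, as the normalization unit 801 describes, and the coefficient values in the example are hypothetical, not taken from the source.

```python
def comprehensive_weight(text_weight, update_time_weight, user_relevant_weight, coeffs):
    """Multiply three fusion coefficients, each of the form a * weight + b.

    coeffs holds the (offset, correction) pairs (a1, b1), (a2, b2), (a3, b3).
    Inputs are assumed to be normalized to [0, 1] beforehand.
    """
    weights = (text_weight, update_time_weight, user_relevant_weight)
    for w in weights:
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must be normalized to the range 0..1")
    result = 1.0
    for w, (a, b) in zip(weights, coeffs):
        result *= a * w + b  # one fusion coefficient per dimension
    return result


# Hypothetical coefficients: text similarity is given the largest offset value.
EXAMPLE_COEFFS = ((2.0, 1.0), (1.0, 1.0), (1.0, 0.5))
example = comprehensive_weight(0.5, 0.5, 0.5, EXAMPLE_COEFFS)
```

A larger offset value makes the product more sensitive to that dimension, which matches the statement that a larger offset value indicates a more important item.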
  • the above-mentioned search sorting device extracts the update time dimension parameter to ensure that time is taken into account in the sorting, uses the degree of user association to rank initial search results that share characteristics with the user higher, and sorts the search results across multiple dimensions. This makes the sorting intelligent, lets users find relevant information quickly, simplifies operation, and improves search efficiency.
  • Each module in the search sorting device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in, or independent of, the processor of the electronic device in hardware form, or may be stored in the memory of the electronic device in software form, so that the processor can call them and execute the operations corresponding to each module.
  • an electronic device is provided.
  • the electronic device may be a server, and the internal structure diagram may be as shown in FIG. 9.
  • the electronic device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the electronic device is used to provide computing and control capabilities.
  • the memory of the electronic device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for running an operating system and computer programs in a non-volatile storage medium.
  • the database of the electronic device is used to store search-sorted data.
  • the network interface of the electronic device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by a processor to implement a search ranking method.
  • FIG. 9 is only a block diagram of the part of the structure related to the solution of the present application, and does not limit the electronic device to which the solution is applied.
  • a specific electronic device may include more or fewer parts than shown in the figure, combine certain parts, or have a different arrangement of parts.
  • ElasticSearch (hereinafter referred to as ES) is an open source distributed search engine.
  • ES is used for data storage, and it can quickly recall the matching initial search results by establishing an inverted index;
  • Searcher is used to pass the search request issued by the application layer to the ES and obtain the initial search results corresponding to the search request;
  • Ranker is used to combine the initial search results with the text similarity, update time dimension, and user association degree to perform the comprehensive weight calculation and sorting, and to return the sorted results to the Searcher.
  • the initial search results of the ES recall include the initial recall search engine scores.
  • the initial recall search engine scores cannot meet the needs of multi-dimensional sorting.
  • The search sorting method of the embodiment of the present invention can be used to sort the initial search results. Searcher and Ranker can be implemented on the server.
  • an electronic device is provided, which includes a memory and a processor.
  • a computer program is stored in the memory, and the processor executes the computer program to implement the following steps: acquiring search keywords, and determining a plurality of initial search results matching the plurality of keywords; extracting the text similarity, update time dimension, and user association degree related to each of the initial search results; obtaining a corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and performing a fusion calculation according to the text similarity weight, update time dimension weight, and user association degree weight together with the text similarity parameter, update time dimension parameter, and user association degree parameter, to obtain a comprehensive weight of each of the initial search results; and sorting the plurality of initial search results according to the comprehensive weight.
  • a computer-readable storage medium is provided, on which a computer program is stored.
  • when the computer program is executed by a processor, the following steps are implemented: acquiring search keywords, and determining a plurality of initial search results matching the plurality of keywords; extracting the text similarity, update time dimension, and user association degree related to each of the initial search results; obtaining a corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and performing a fusion calculation according to the text similarity weight, update time dimension weight, and user association degree weight together with the text similarity parameter, update time dimension parameter, and user association degree parameter, to obtain a comprehensive weight of each of the initial search results; and sorting the plurality of initial search results according to the comprehensive weight.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Abstract

A search sorting method and system, an electronic device, and a storage medium. The method comprises: acquiring search keywords, and determining a plurality of initial search results that match the keywords (S210); extracting a text similarity, an update time dimension, and a user association degree associated with each of the initial search results (S220); acquiring a corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, the update time dimension, and the user association degree, and performing a fusion calculation on each of the initial search results according to the text similarity weight, the update time dimension weight, and the user association degree weight so as to obtain a comprehensive weight of each of the initial search results (S230); and sorting the plurality of initial search results according to the comprehensive weights (S240). By sorting the initial search results of multiple columns, the method makes it possible to find a target result quickly, thereby saving operation time and improving search efficiency.

Description

Search sorting method, device, electronic device, and storage medium
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201810847290.1, filed by Tianjin ByteDance Technology Co., Ltd. on July 27, 2018 and entitled "Search Sorting Method, Device, Electronic Device, and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the technical field of enterprise instant messaging systems, and in particular to a search sorting method, device, electronic device, and storage medium.
Background
With the rapid development of smart devices, chat applications have become increasingly numerous, and they make it convenient for users in different locations to communicate. Chat applications include personal chat applications and enterprise chat applications. When a user of an enterprise chat application needs to find information, the user invokes the search function, for example to search chat messages, contacts, or group chats, in order to find the relevant information quickly or to open a chat quickly.
At present, the following problems have been found when implementing the search function of enterprise chat applications:
The search results of an enterprise chat application are displayed separately by object type: contacts, group chats, messages, and so on are each shown in their own column, and the objects displayed are sorted chronologically. The user has to look through the displayed columns to find the relevant information, which is tedious and time-consuming.
Summary of the invention
In view of the above technical problems, it is necessary to provide a search sorting method, device, electronic device, and storage medium capable of sorting in multiple dimensions.
A search sorting method, the method comprising:
acquiring search keywords, and determining a plurality of initial search results matching the plurality of keywords;
extracting the text similarity, update time dimension, and user association degree related to each of the initial search results;
obtaining a corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and performing a fusion calculation on each of the initial search results according to the text similarity weight, update time dimension weight, and user association degree weight, to obtain a comprehensive weight of each of the initial search results;
sorting the plurality of initial search results according to the comprehensive weight.
A search sorting device, the device comprising:
an initial search result extraction module, which acquires search keywords and determines a plurality of initial search results matching the plurality of keywords;
a feature factor extraction module, which extracts the text similarity, update time dimension, and user association degree related to each of the initial search results;
a weight calculation module, which obtains a corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and performs a fusion calculation on each of the initial search results according to the text similarity weight, update time dimension weight, and user association degree weight, to obtain a comprehensive weight of each of the initial search results;
a sorting module, which sorts the plurality of initial search results according to the comprehensive weight.
An electronic device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the following steps when executing the computer program:
acquiring search keywords, and determining a plurality of initial search results matching the plurality of keywords;
extracting the text similarity, update time dimension, and user association degree related to each of the initial search results;
obtaining a corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and performing a fusion calculation on each of the initial search results according to the text similarity weight, update time dimension weight, and user association degree weight, to obtain a comprehensive weight of each of the initial search results;
sorting the plurality of initial search results according to the comprehensive weight.
A computer-readable storage medium, on which a computer program is stored, wherein the following steps are implemented when the computer program is executed by a processor:
acquiring search keywords, and determining a plurality of initial search results matching the plurality of keywords;
extracting the text similarity, update time dimension, and user association degree related to each of the initial search results;
obtaining a corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and performing a fusion calculation on each of the initial search results according to the text similarity weight, update time dimension weight, and user association degree weight, to obtain a comprehensive weight of each of the initial search results;
sorting the plurality of initial search results according to the comprehensive weight.
The above search sorting method, device, electronic device, and storage medium extract the update time dimension parameter to ensure that time is taken into account in the sorting, use the degree of user association to rank initial search results that share characteristics with the user higher, and sort the search results across multiple dimensions. This makes the sorting intelligent, lets users find relevant information quickly, simplifies operation, and improves search efficiency.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present disclosure more clearly, one or more embodiments are described by way of example with reference to the corresponding figures in the accompanying drawings. These exemplary descriptions do not limit the embodiments, in which:
FIG. 1 is an application environment diagram of a search sorting method in an embodiment;
FIG. 2 is a schematic flowchart of a search sorting method in an embodiment;
FIG. 3 is a schematic flowchart of the step of obtaining the text similarity weight in an embodiment;
FIG. 4 is a schematic flowchart of the step of obtaining the update time dimension weight in an embodiment;
FIG. 5 is a schematic flowchart of the step of obtaining the user association degree weight in an embodiment;
FIG. 6 is a structural block diagram of a search sorting device in an embodiment;
FIG. 7 is a structural block diagram of a feature factor extraction module in an embodiment;
FIG. 8 is a structural block diagram of a weight calculation module in an embodiment;
FIG. 9 is an internal structure diagram of an electronic device in an embodiment;
FIG. 10 is a diagram of the main search modules of a server in an embodiment.
Detailed description
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the application and are not intended to limit it.
The search sorting method provided in this application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 through a network. Search keywords are entered at the terminal 102. The server 104 acquires the search keywords and determines a plurality of initial search results matching the plurality of keywords; extracts the text similarity, update time dimension, and user association degree related to each of the initial search results; obtains a corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and performs a fusion calculation on each of the initial search results according to these weights to obtain a comprehensive weight of each of the initial search results; and sorts the plurality of initial search results according to the comprehensive weight. The sorted results are displayed on the terminal 102. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a search sorting method is provided. Taking the application of the method to the server in FIG. 1 as an example, the method includes the following steps:
Step 210: acquire search keywords, and determine a plurality of initial search results matching the plurality of keywords.
The search keywords are the characters, words, symbols, and other input entered by the user when using a search engine to find related information. Each initial search result contains multiple fields; specifically, the object an initial search result refers to is a contact or a group chat.
Specifically, search keywords are entered at the terminal, and the terminal sends the search keywords entered by the user to the server.
Step 220: extract the text similarity, update time dimension, and user association degree related to each of the initial search results.
The fields contained in each initial search result include one or more of: object type, object status, object name, initial recall search engine score, chat update time, position of the most recent message, object pinyin name, object English name, and department information. Object types include chat application and email; object status includes whether the object is registered and whether the user has left the company.
As a preferred implementation, before extracting the text similarity, update time dimension, and user association degree related to each of the initial search results, the method includes filtering the initial search results. Filtering the initial search results includes: not sorting initial search results for users who have left the company and have no chat history; and placing the initial search results of unregistered users last. Whether chat history exists can be determined from the chat update time or from the position of the most recent message.
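The filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `InitialResult` class and its field names are hypothetical stand-ins for the result fields listed in the text, "not sorting" a departed user with no chat history is interpreted here as leaving that result out of the ranked list, and chat history is detected from the chat update time alone.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InitialResult:
    # Hypothetical record; field names are illustrative, not from the source.
    name: str
    registered: bool
    departed: bool
    chat_update_time: Optional[int] = None  # epoch seconds; None means no chat history


def prefilter(results: List[InitialResult]) -> Tuple[List[InitialResult], List[InitialResult]]:
    """Split results into those to be ranked and those appended at the end."""
    rankable, unregistered = [], []
    for r in results:
        if r.departed and r.chat_update_time is None:
            continue  # departed user with no chat history: not sorted
        if r.registered:
            rankable.append(r)
        else:
            unregistered.append(r)  # unregistered users are placed last
    return rankable, unregistered


sample = [
    InitialResult("Zhang San", registered=True, departed=False, chat_update_time=1_530_000_000),
    InitialResult("Li Si", registered=True, departed=True),
    InitialResult("Wang Wu", registered=False, departed=False),
    InitialResult("Zhao Liu", registered=True, departed=True, chat_update_time=1_520_000_000),
]
rankable, tail = prefilter(sample)
```

The final display order would then be the ranked `rankable` list followed by `tail`.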
Step 230: obtain a corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and perform a fusion calculation on each of the initial search results according to the text similarity weight, update time dimension weight, and user association degree weight, to obtain a comprehensive weight of each of the initial search results.
The text similarity weight characterizes how well the search keywords match the initial search result; the update time dimension weight characterizes how recently the chat history of the initial search result was updated; and the user association degree weight characterizes the initial search result as a target that multiple users pay attention to.
Step 240: sort the plurality of initial search results according to the comprehensive weight.
The sorting may be performed in descending or ascending order of weight. This technical solution does not sort separately within each column; it sorts by weight across all results, so that relevant information can be found quickly.
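Step 240 can be sketched as a single sort over results of mixed object types. This is an illustrative sketch: the tuple layout `(name, kind, weight)` and the sample values are assumptions made for the example, and descending order is chosen here although the text allows either direction.

```python
def rank_results(results, descending=True):
    """Sort (name, kind, comprehensive_weight) tuples by weight only.

    The object kind is ignored on purpose: contacts and group chats are
    interleaved in one list instead of being sorted per column.
    """
    return sorted(results, key=lambda r: r[2], reverse=descending)


# Hypothetical weights, for illustration only.
sample = [
    ("Zhang San", "contact", 1.2),
    ("Project group", "group_chat", 3.4),
    ("Li Si", "contact", 2.1),
]
ranked = rank_results(sample)
```

Python's `sorted` is stable, so results with equal weights keep their original relative order.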
In this embodiment, the degree of user association is determined from the characteristic data shared by the user currently performing the search and the initial search result.
In the above search sorting method, the update time dimension parameter is extracted to ensure that time is taken into account in the sorting, the degree of user association is used to rank initial search results that share characteristics with the user higher, and the search results are sorted across multiple dimensions. This makes the sorting intelligent, lets users find relevant information quickly, simplifies operation, and improves search efficiency.
In one embodiment, as shown in FIG. 3, obtaining the text similarity weight includes:
S321: calculate the hit rate, order consistency index, position closeness, and coverage of the keywords in the initial search result.
S322: calculate the text similarity weight according to the hit rate, order consistency index, position closeness, and coverage.
In one embodiment, the step of calculating the text similarity weight according to the hit rate, order consistency index, position closeness, and coverage includes: obtaining an offset value and a correction value for each of the hit rate, order consistency index, position closeness, and coverage; and performing a fusion calculation according to the hit rate, order consistency index, position closeness, and coverage together with the offset values and correction values, to obtain the text similarity weight. The offset values and correction values can be determined through machine learning. Obtaining an offset value and a correction value for each quantity means: obtaining an offset value and a correction value according to the hit rate, obtaining an offset value and a correction value according to the order consistency index, obtaining an offset value and a correction value according to the position closeness, and obtaining an offset value and a correction value according to the coverage.
In one embodiment, the specific formula for calculating the text similarity weight is:
text_similar = (a * hit + b) * (c * sequence + d) * (e * position + f) * (g * cover + h);
where text_similar is the text similarity weight, hit is the text hit rate, sequence is the order consistency index, position is the position closeness, and cover is the coverage. a and b are the offset value and correction value of the hit rate, c and d are the offset value and correction value of the order consistency index, e and f are the offset value and correction value of the position closeness, and g and h are the offset value and correction value of the coverage. A larger offset value indicates that the item is more important.
The text hit rate is the ratio of the number of search keywords hit in the corresponding text document to the total number of search keywords; the higher the ratio, the closer the initial search result is to the search target.
The order consistency index expresses how consistent the order of the search keywords is with the order in which they appear in the corresponding text document. It is expressed through the proportion of inversions: for example, (1, 2, 3) has 0 inversions and is the most ordered arrangement, while (3, 2, 1) has 3 inversions and is the least ordered arrangement.
The position closeness is the ratio of the number of hit text documents to the sum of the number of hit text documents and the number of gaps between hits. For example, with the keywords "Zhang San Zhang Si Li Si" and the hit initial search results "Zhang San" and "Li Si's group", the hit keywords are "Zhang San" and "Li Si", the number of hits is 2, and the number of gaps between hits is 1 (because "Zhang Si" lies between them), so position closeness = 2 / (1 + 2) = 2/3.
The coverage is the ratio of the hit keywords to the total fields of all hit text documents.
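The inversion count, the position closeness example, and the text similarity formula can be sketched as below. This is a sketch under stated assumptions: the source does not say exactly how the inversion count is converted into the `sequence` index, so only the count is computed; position closeness is computed from the indices of the hit keywords within the query, which is one reading of the "gaps between hits" example; and the coefficients in the usage example are hypothetical.

```python
from fractions import Fraction


def count_inversions(order):
    """Number of index pairs (p, q) with p < q and order[p] > order[q]."""
    n = len(order)
    return sum(1 for p in range(n) for q in range(p + 1, n) if order[p] > order[q])


def position_closeness(hit_positions):
    """hits / (hits + gaps), where gaps counts unmatched keywords between hits.

    hit_positions are the distinct indices of the matched keywords in the query.
    """
    hits = len(hit_positions)
    if hits == 0:
        return Fraction(0)
    gaps = (max(hit_positions) - min(hit_positions) + 1) - hits
    return Fraction(hits, hits + gaps)


def text_similar(hit, sequence, position, cover, coeffs):
    """Product of four affine terms; coeffs = ((a, b), (c, d), (e, f), (g, h))."""
    (a, b), (c, d), (e, f), (g, h) = coeffs
    return (a * hit + b) * (c * sequence + d) * (e * position + f) * (g * cover + h)


# Query "Zhang San / Zhang Si / Li Si": keywords 0 and 2 are hit, keyword 1 is not.
closeness = position_closeness([0, 2])

# Hypothetical identity coefficients (offset 1, correction 0) for illustration only.
IDENTITY = ((1, 0), (1, 0), (1, 0), (1, 0))
similarity = text_similar(Fraction(2, 3), 1, closeness, 1, IDENTITY)
```

Using `Fraction` keeps the 2/3 from the worked example exact; a production ranker would more likely use floats.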
In one embodiment, as shown in FIG. 4, obtaining the update time dimension weight includes:
S421. According to the initial search result, obtain the time interval between the last chat time and the current time.
S422. Calculate the ratio of the decay constant to the sum of the time interval and the decay constant to obtain the chat update time weight.
In one embodiment, the update time dimension weight is calculated as follows:
update_time_weight = factor / (factor + update_time_secs);
where update_time_weight is the update time dimension weight and factor is a constant, in seconds, that controls the decay over time. Here it is chosen so that the weight halves after 30 days: factor = 30 * 24 * 3600 = 2592000. update_time_secs is the number of seconds between the last chat time and now. For example, if the last chat was 30 days ago, then update_time_secs = 30 * 24 * 3600 = 2592000, and the update time dimension weight is update_time_weight = 2592000 / (2592000 + 2592000) = 1/2.
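A minimal sketch of the decay formula, assuming the 30-day constant from the description. The function name and the input validation are illustrative additions. Thirty days is 2,592,000 seconds, so a chat last updated 30 days ago receives a weight of exactly 1/2.

```python
FACTOR_SECS = 30 * 24 * 3600  # 30 days in seconds = 2,592,000


def update_time_weight(update_time_secs, factor=FACTOR_SECS):
    """Return factor / (factor + update_time_secs), a value in (0, 1]."""
    if update_time_secs < 0:
        raise ValueError("update_time_secs must be non-negative")
    return factor / (factor + update_time_secs)


# A chat updated just now scores 1.0; the score halves at 30 days
# and keeps shrinking toward 0 as the chat gets older.
for days in (0, 30, 90):
    print(days, update_time_weight(days * 24 * 3600))
```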
In one embodiment, as shown in FIG. 5, obtaining the user association degree weight includes:
S521. Calculate, between the initial search result and the user currently performing the search, the number of common contacts, the common department characteristic value, the common office location characteristic value, and the number of common personal tags.
S522. Calculate the user association degree weight according to the number of common contacts, the common department characteristic value, the common office location characteristic value, and the number of common personal tags.
In one embodiment, the step of calculating the user association degree weight according to the number of common contacts, the common department characteristic value, the common office location characteristic value, and the number of common personal tags includes: obtaining an offset value and a correction value for each of these four quantities; and performing a fusion calculation on the four quantities together with their offset values and correction values to obtain the user association degree weight. The offset values and correction values may be determined through machine learning. Obtaining an offset value and a correction value for each quantity means: obtaining an offset value and a correction value according to the number of common contacts, obtaining an offset value and a correction value according to the common department characteristic value, obtaining an offset value and a correction value according to the common office location characteristic value, and obtaining an offset value and a correction value according to the number of common personal tags.
The user association degree describes the features shared by the user and a contact, including: people both have contacted, a common department, a common office location, and common personal tags. Here the user is the user performing the search, and the contact is the contact corresponding to the initial search result. For example, if user A and contact B have both contacted many of the same people, user A and contact B are strongly related; if user A and contact B have not yet been in contact but share many features, contact B is likely to be someone user A wants to find. Calculating the user association degree supports personalized search by ranking contacts who share features with the user higher.
In one embodiment, the user association degree is mined from offline data and calculated from a plurality of common features. The user association degree weight is calculated as follows:
user_relevant_weight = (i * same_user_num + j) * (k * same_department + l) * (m * same_place + n) * (o * same_tag + p);
where user_relevant_weight is the user association degree weight; same_user_num is the number of common contacts, that is, the number of contacts shared by the user performing the search and the contact corresponding to the initial search result, and takes an integer value greater than 0; same_department is the common department characteristic value, which is 1 if both are in the same department and 0 otherwise; same_place is the common office location characteristic value, which is 1 if both are at the same office location and 0 otherwise; same_tag is the number of common personal tags, that is, the number of tags the two users share, for example same_tag is 2 if both have the tags "travel" and "reading". Here i and j are the offset value and correction value of the number of common contacts, k and l are those of the common department characteristic value, m and n are those of the common office location characteristic value, and o and p are those of the number of common personal tags; the larger an offset value, the more important the corresponding item.
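The user association formula can be sketched in the same way. The helper common_tag_count and the default coefficients are hypothetical; in particular, setting every correction value to 1.0 is only a placeholder choice that keeps a single zero-valued feature from zeroing the whole product.

```python
def common_tag_count(tags_a, tags_b):
    """Number of personal tags shared by two users."""
    return len(set(tags_a) & set(tags_b))


def user_relevant_weight(same_user_num, same_department, same_place, same_tag,
                         i=1.0, j=1.0, k=1.0, l=1.0,
                         m=1.0, n=1.0, o=1.0, p=1.0):
    """Product of four linear terms; i..p are placeholder coefficients."""
    return ((i * same_user_num + j)
            * (k * same_department + l)
            * (m * same_place + n)
            * (o * same_tag + p))


# Both users carry the "travel" and "reading" tags, so same_tag is 2.
shared = common_tag_count({"travel", "reading", "music"}, {"travel", "reading"})
print(user_relevant_weight(same_user_num=3, same_department=1,
                           same_place=0, same_tag=shared))
```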
In one embodiment, performing the fusion calculation according to the text similarity weight, the update time dimension weight, and the user association degree weight to obtain the comprehensive weight of each initial search result includes: normalizing the text similarity weight, the update time dimension weight, and the user association degree weight to decimals between 0 and 1; and performing the fusion calculation according to the normalized text similarity weight, update time dimension weight, and user association degree weight to obtain the comprehensive weight of each initial search result.
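The description does not say how the weights are normalized. One common option, shown here purely as an assumption, is min-max scaling of each weight across the current list of initial search results; the function name is hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of weights into [0, 1] (assumed method, not from the source)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All results share the same weight; treat them as equally strong.
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_normalize([2.0, 4.0, 6.0]))
```

Min-max scaling is relative to the result list, so the same raw weight can map to different normalized values for different queries.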
In one embodiment, obtaining the corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, the update time dimension, and the user association degree, and performing a fusion calculation on each initial search result according to these weights to obtain the comprehensive weight of each initial search result, includes: calculating the text similarity weight, the update time dimension weight, and the user association degree weight according to the text similarity, the update time dimension, and the user association degree; obtaining an offset value and a correction value for each of the three weights; for each weight, multiplying the weight by its offset value and adding its correction value to obtain a fusion coefficient; and multiplying the fusion coefficients together to obtain the comprehensive weight of each initial search result. The offset values and correction values may be determined through machine learning. Obtaining an offset value and a correction value for each weight means: obtaining an offset value and a correction value according to the text similarity weight, obtaining an offset value and a correction value according to the update time dimension weight, and obtaining an offset value and a correction value according to the user association degree weight.
In a specific embodiment, the comprehensive weight is calculated as follows:
weight = (a1 * text_weight + b1) * (a2 * update_time_weight + b2) * (a3 * user_relevant_weight + b3)
where weight is the comprehensive weight of the initial search result; text_weight is the text similarity weight, a1 is its offset value, b1 is its correction value, and a1 * text_weight + b1 gives the first fusion coefficient; update_time_weight is the update time dimension weight (the chat update time weight), a2 is its offset value, b2 is its correction value, and a2 * update_time_weight + b2 gives the second fusion coefficient; user_relevant_weight is the user association degree weight, a3 is its offset value, b3 is its correction value, and a3 * user_relevant_weight + b3 gives the third fusion coefficient. The fusion coefficients are multiplied together to obtain the comprehensive weight of the initial search result. In the formula, a1, a2, and a3 are offset values, and b1, b2, and b3 are correction values.
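A minimal sketch of the fusion step, assuming the three input weights have already been computed (and, where applicable, normalized). The function name and the default coefficient values are placeholders; the description leaves a1 to a3 and b1 to b3 to be set, for example, by machine learning.

```python
def comprehensive_weight(text_weight, update_time_weight, user_relevant_weight,
                         a1=1.0, b1=0.0, a2=1.0, b2=0.0, a3=1.0, b3=0.0):
    """Multiply the three fusion coefficients (offset * weight + correction)."""
    first = a1 * text_weight + b1
    second = a2 * update_time_weight + b2
    third = a3 * user_relevant_weight + b3
    return first * second * third


# A strong text match with an older chat and moderate association.
print(comprehensive_weight(0.8, 0.5, 0.5))

# A positive correction value keeps a zero association weight
# from eliminating an otherwise good result.
print(comprehensive_weight(1.0, 1.0, 0.0, b3=0.5))
```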
It should be understood that although the steps in the flowcharts of FIGS. 2-5 are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-5 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time but may be performed at different times, and they are not necessarily performed one after another but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 6, a search sorting device is provided, including an initial search result extraction module 601, a feature factor extraction module 602, a weight calculation module 603, and a sorting module 604, where:
The initial search result extraction module 601 is configured to obtain search keywords and determine a plurality of initial search results that match the keywords.
The search keywords are the characters, words, symbols, and other input that the user enters when using the search engine to find relevant information. Each initial search result contains multiple fields; specifically, the object an initial search result refers to is a contact or a group chat.
Specifically, the user enters search keywords at the terminal, and the terminal obtains the search keywords entered by the user and sends them to the server.
The feature factor extraction module 602 is configured to extract the text similarity, update time dimension, and user association degree related to each initial search result.
An initial search result is a text document that matches the search keywords. The text similarity, update time dimension, and user association degree are obtained from the initial search results, and information related to the keywords is extracted from the text document.
As a preferred implementation, the search sorting device further includes a screening module configured to screen the initial search results. Screening the initial search results includes: not sorting the initial search results of users who have left the company and have no chat history; and ranking the initial search results of unregistered users last. Whether chat history exists can be determined from the chat update time or from the position of the latest message.
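The screening rules can be sketched as follows. The Candidate type and its field names are hypothetical, and the sketch assumes that "not sorted" means the result is dropped from the list before ranking; the description does not state this explicitly. A chat_update_time of None stands in for "no chat history".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    name: str
    registered: bool
    departed: bool
    chat_update_time: Optional[int]  # None is taken to mean no chat history


def screen(results: List[Candidate]) -> Tuple[List[Candidate], List[Candidate]]:
    """Split results into those to rank and unregistered ones to append last."""
    to_rank, unregistered_tail = [], []
    for r in results:
        if r.departed and r.chat_update_time is None:
            continue  # departed user without chat history: excluded (assumed)
        if not r.registered:
            unregistered_tail.append(r)  # placed after all ranked results
        else:
            to_rank.append(r)
    return to_rank, unregistered_tail
```

After the ranked list is sorted by comprehensive weight, the unregistered tail would be appended to it.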
The weight calculation module 603 is configured to obtain the corresponding text similarity weight, update time dimension weight, and user association degree weight according to the text similarity, update time dimension, and user association degree, and to perform a fusion calculation according to the text similarity weight, update time dimension weight, and user association degree weight together with the text similarity parameter, update time dimension parameter, and user association degree parameter, to obtain the comprehensive weight of each initial search result.
The sorting module 604 is configured to sort the plurality of initial search results according to the comprehensive weight.
When sorting, the results may be ordered by weight from largest to smallest or from smallest to largest. With this technical solution, the sorting method is not chosen by category; instead, results are sorted by weight, so that relevant information can be found quickly.
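Sorting by comprehensive weight can be sketched with Python's built-in sorted. The function name and the (result, weight) pair layout are illustrative assumptions.

```python
def rank(results_with_weights, descending=True):
    """Sort (result, weight) pairs by weight; largest first by default."""
    return sorted(results_with_weights,
                  key=lambda pair: pair[1],
                  reverse=descending)


scored = [("Zhang San", 0.2), ("Li Si's group", 0.9), ("Zhang Si", 0.5)]
print(rank(scored))                    # largest weight first
print(rank(scored, descending=False))  # smallest weight first
```

Python's sort is stable, so results with equal weights keep the order in which they were retrieved.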
In this embodiment, the user association degree is determined from the feature data shared by the user currently performing the search and the initial search result.
The object of an initial search result is a contact or a group. The fields contained in each initial search result include one or more of: object type, object status, object name, initial recall search engine score, chat update time, position of the latest message, object pinyin name, object English name, and department information. The object type includes chat application and email; the object status includes whether the object is registered and whether it has left the company.
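The field list above could be modeled as a simple record. The class name, attribute names, and types below are assumptions made for illustration; every field except the name defaults to None because the description says a result contains "one or more" of the fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InitialSearchResult:
    object_name: str
    object_type: Optional[str] = None          # e.g. chat application or email
    registered: Optional[bool] = None          # object status: registered or not
    departed: Optional[bool] = None            # object status: left the company or not
    recall_score: Optional[float] = None       # initial recall search engine score
    chat_update_time: Optional[int] = None     # assumed to be a Unix timestamp
    last_message_position: Optional[int] = None
    pinyin_name: Optional[str] = None
    english_name: Optional[str] = None
    department: Optional[str] = None


result = InitialSearchResult(object_name="Zhang San", department="Search")
print(result)
```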
In one embodiment, as shown in FIG. 7, the feature factor extraction module 602 includes a text similarity weight calculation unit 701, an update time dimension weight calculation unit 702, and a user association degree weight calculation unit 703, where:
The text similarity weight calculation unit 701 is configured to calculate the hit rate, order consistency index, position closeness, and coverage rate of the keywords in the initial search result, and to calculate the text similarity weight according to the hit rate, order consistency index, position closeness, and coverage rate.
In one embodiment, the text similarity weight calculation unit includes: a first offset value and correction value acquisition subunit, configured to obtain an offset value and a correction value for each of the hit rate, order consistency index, position closeness, and coverage rate; and a text similarity fusion calculation subunit, configured to perform a fusion calculation on the hit rate, order consistency index, position closeness, and coverage rate together with their offset values and correction values to obtain the text similarity weight. The offset values and correction values may be determined through machine learning. Obtaining an offset value and a correction value for each quantity means: obtaining an offset value and a correction value according to the hit rate, obtaining an offset value and a correction value according to the order consistency index, obtaining an offset value and a correction value according to the position closeness, and obtaining an offset value and a correction value according to the coverage rate.
In one embodiment, the specific formula for calculating the text similarity weight is:
text_similar = (a * hit + b) * (c * sequence + d) * (e * position + f) * (g * cover + h);
where text_similar is the text similarity weight, hit is the text hit rate, sequence is the order consistency index, position is the position closeness, and cover is the coverage rate. Here a and b are the offset value and correction value of the hit rate, c and d are those of the order consistency index, e and f are those of the position closeness, and g and h are those of the coverage rate; the larger an offset value, the more important the corresponding item. The text hit rate is the ratio of the number of search keywords hit in the corresponding text document to the total number of search keywords; the higher the ratio, the closer the initial search result is to the search target. The order consistency index measures how well the order of the search keywords agrees with the order in which they appear in the corresponding text document, and is expressed through the proportion of inversions: for example, (1, 2, 3) has 0 inversions and is the most ordered arrangement, while (3, 2, 1) has 3 inversions and is the most disordered. The position closeness is the ratio of the number of hit text documents t to the sum of t and the total number of gaps between hits. For example, for the keywords "Zhang San, Zhang Si, Li Si" with hit initial search results "Zhang San" and "Li Si's group", the hit keywords are "Zhang San" and "Li Si", so t = 2 and the total number of gaps is 1 (because "Zhang Si" lies between them), giving position closeness = 2 / (1 + 2) = 2/3. The coverage rate is the ratio of the hit keywords to the total fields of all hit text documents.
The update time dimension weight calculation unit 702 is configured to obtain, according to the initial search result, the time interval between the last chat time and the current time, and to calculate the ratio of the decay constant to the sum of the time interval and the decay constant to obtain the chat update time weight.
In one embodiment, the update time dimension weight is calculated as follows:
update_time_weight = factor / (factor + update_time_secs);
where update_time_weight is the update time dimension weight and factor is a constant, in seconds, that controls the decay over time. Here it is chosen so that the weight halves after 30 days: factor = 30 * 24 * 3600 = 2592000. update_time_secs is the number of seconds between the last chat time and now. For example, if the last chat was 30 days ago, then update_time_secs = 30 * 24 * 3600 = 2592000, and the update time dimension weight is update_time_weight = 2592000 / (2592000 + 2592000) = 1/2.
The user association degree weight calculation unit 703 is configured to calculate, between the initial search result and the user currently performing the search, the number of common contacts, the common department characteristic value, the common office location characteristic value, and the number of common personal tags, and to calculate the user association degree weight according to these four quantities.
The user association degree weight calculation unit 703 includes: a second offset value and correction value acquisition subunit, configured to obtain an offset value and a correction value for each of the number of common contacts, the common department characteristic value, the common office location characteristic value, and the number of common personal tags; and a user association degree fusion calculation subunit, configured to perform a fusion calculation on these four quantities together with their offset values and correction values to obtain the user association degree weight. The offset values and correction values may be determined through machine learning. Obtaining an offset value and a correction value for each quantity means: obtaining an offset value and a correction value according to the number of common contacts, obtaining an offset value and a correction value according to the common department characteristic value, obtaining an offset value and a correction value according to the common office location characteristic value, and obtaining an offset value and a correction value according to the number of common personal tags.
The user association degree describes the features shared by the user and a contact, including: people both have contacted, a common department, a common office location, and common personal tags. Here the user is the user performing the search, and the contact is the contact corresponding to the initial search result. For example, if user A and contact B have both contacted many of the same people, user A and contact B are strongly related; if user A and contact B have not yet been in contact but share many features, contact B is likely to be someone user A wants to find. Calculating the user association degree supports personalized search by ranking contacts who share features with the user higher.
In one embodiment, the user association degree is mined from offline data and calculated from a plurality of common features. The user association degree weight is calculated as follows:
user_relevant_weight = (i * same_user_num + j) * (k * same_department + l) * (m * same_place + n) * (o * same_tag + p);
where user_relevant_weight is the user association degree weight; same_user_num is the number of common contacts, that is, the number of contacts shared by the user performing the search and the contact corresponding to the initial search result, and takes an integer value greater than 0; same_department is the common department characteristic value, which is 1 if both are in the same department and 0 otherwise; same_place is the common office location characteristic value, which is 1 if both are at the same office location and 0 otherwise; same_tag is the number of common personal tags, that is, the number of tags the two users share, for example same_tag is 2 if both have the tags "travel" and "reading". Here i and j are the offset value and correction value of the number of common contacts, k and l are those of the common department characteristic value, m and n are those of the common office location characteristic value, and o and p are those of the number of common personal tags; the larger an offset value, the more important the corresponding item.
In one embodiment, the weight calculation module 603 includes:
A normalization unit 801, configured to normalize the text similarity weight, the update time dimension weight, and the user association degree weight to decimals between 0 and 1;
A fusion calculation unit 802, configured to perform a fusion calculation according to the normalized text similarity weight, update time dimension weight, and user association degree weight to obtain the comprehensive weight of each initial search result.
In one embodiment, the weight calculation module includes: a weight acquisition unit, configured to calculate the text similarity weight, the update time dimension weight, and the user association degree weight according to the text similarity, the update time dimension, and the user association degree; an offset value and correction value acquisition unit, configured to obtain an offset value and a correction value for each of the three weights; a fusion coefficient calculation unit, configured to multiply each weight by its offset value and add its correction value to obtain a fusion coefficient; and a comprehensive weight calculation unit, configured to multiply the fusion coefficients together to obtain the comprehensive weight of each initial search result.
In a specific embodiment, the comprehensive weight is calculated as follows:
weight = (a1 * text_weight + b1) * (a2 * update_time_weight + b2) * (a3 * user_relevant_weight + b3)
where weight is the comprehensive weight of the initial search result; text_weight is the text similarity weight, a1 is its offset value, b1 is its correction value, and a1 * text_weight + b1 gives the first fusion coefficient; update_time_weight is the update time dimension weight (the chat update time weight), a2 is its offset value, b2 is its correction value, and a2 * update_time_weight + b2 gives the second fusion coefficient; user_relevant_weight is the user association degree weight, a3 is its offset value, b3 is its correction value, and a3 * user_relevant_weight + b3 gives the third fusion coefficient. The fusion coefficients are multiplied together to obtain the comprehensive weight of the initial search result. In the formula, a1, a2, and a3 are offset values, and b1, b2, and b3 are correction values.
The search sorting device described above extracts the update time dimension parameter so that recency is reflected in the ordering, uses the user association degree to rank initial search results that share features with the user higher, and sorts the search results across multiple dimensions. This makes the sorting more intelligent, helps users find relevant information quickly, simplifies operation, and improves search efficiency.
For the specific limitations of the search sorting device, refer to the limitations of the search sorting method above; they are not repeated here. Each module of the search sorting device may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor of the electronic device in hardware form, or stored in a memory of the electronic device in software form, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, an electronic device is provided. The electronic device may be a server, and its internal structure may be as shown in FIG. 9. The electronic device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores the search sorting data. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer program implements a search sorting method.
Those skilled in the art will understand that the structure shown in FIG. 9 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the electronic device to which the solution is applied. A specific electronic device may include more or fewer components than shown in the figure, may combine certain components, or may arrange the components differently.
In one embodiment, as shown in FIG. 10, ElasticSearch (hereinafter ES) is an open-source distributed search engine. ES stores the data and, by building an inverted index, can quickly recall the matching initial search results. Search passes the search request issued by the application layer to ES and obtains the initial search results corresponding to that request. Ranker calculates a comprehensive weight for each initial search result by combining the text similarity, the update time dimension, and the user relevance degree, sorts the results, and returns the sorted results to Search. The initial search results recalled by ES carry an initial recall score from the search engine, but that score cannot meet the needs of multi-dimensional sorting; the search sorting method of the embodiments of the present invention can be used to sort the initial search results. Search and Ranker may be implemented by servers.
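The division of labor between recall, Search, and Ranker can be sketched as follows. This is an illustrative sketch under stated assumptions: `fake_recall` is a hypothetical in-memory stand-in for the ES recall step (no real ElasticSearch API is called), all field names and values are invented for the example, and the fusion step uses offset values of 1 and correction values of 0 for brevity.

```python
from typing import Any, Callable, Dict, List

Hit = Dict[str, Any]  # one recalled result; all field names are hypothetical


def fake_recall(query: str) -> List[Hit]:
    """Stand-in for ES recall: hits carry an engine score plus three
    per-dimension weights that are assumed to be computed already."""
    return [
        {"id": "alice", "es_score": 9.1, "text_weight": 0.9,
         "update_time_weight": 0.1, "user_relevant_weight": 0.2},
        {"id": "alan", "es_score": 7.4, "text_weight": 0.7,
         "update_time_weight": 0.8, "user_relevant_weight": 0.9},
        {"id": "alina", "es_score": 5.0, "text_weight": 0.6,
         "update_time_weight": 0.5, "user_relevant_weight": 0.5},
    ]


def comprehensive_weight(hit: Hit) -> float:
    """Product of the three weights (offset 1, correction 0 assumed)."""
    return (hit["text_weight"]
            * hit["update_time_weight"]
            * hit["user_relevant_weight"])


def rank(hits: List[Hit], weigh: Callable[[Hit], float]) -> List[Hit]:
    """Ranker role: sort recalled hits by comprehensive weight, descending."""
    return sorted(hits, key=weigh, reverse=True)


def search(query: str,
           recall: Callable[[str], List[Hit]],
           weigh: Callable[[Hit], float]) -> List[Hit]:
    """Search role: forward the query to recall, then hand hits to Ranker."""
    return rank(recall(query), weigh)


if __name__ == "__main__":
    for hit in search("al", fake_recall, comprehensive_weight):
        print(hit["id"], round(comprehensive_weight(hit), 3))
```

In this made-up data the hit with the highest engine score has stale chat activity and little in common with the searcher, so it falls to the bottom once the other two dimensions are taken into account. That is the kind of reordering the paragraph above describes.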
In one embodiment, an electronic device is provided, including a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, implements the following steps: acquiring search keywords, and determining a plurality of initial search results matching the keywords; extracting the text similarity, update time dimension, and user relevance degree associated with each of the initial search results; obtaining, according to the text similarity, update time dimension, and user relevance degree, the corresponding text similarity weight, update time dimension weight, and user relevance degree weight, and performing a fusion calculation according to these three weights together with the text similarity parameter, update time dimension parameter, and user relevance degree parameter, to obtain a comprehensive weight of each of the initial search results; and sorting the plurality of initial search results according to the comprehensive weights.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps: acquiring search keywords, and determining a plurality of initial search results matching the keywords; extracting the text similarity, update time dimension, and user relevance degree associated with each of the initial search results; obtaining, according to the text similarity, update time dimension, and user relevance degree, the corresponding text similarity weight, update time dimension weight, and user relevance degree weight, and performing a fusion calculation according to these three weights together with the text similarity parameter, update time dimension parameter, and user relevance degree parameter, to obtain a comprehensive weight of each of the initial search results; and sorting the plurality of initial search results according to the comprehensive weights.
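Of the three weights used in the steps above, the update time dimension weight has the most explicit definition: claim 4 specifies it as the ratio of a decay constant to the sum of that constant and the time elapsed since the last chat. The sketch below illustrates that ratio. The function name, the choice of seconds as the unit, the seven-day default decay constant, and the clamping of negative intervals to zero are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical decay constant: seven days, expressed in seconds.
SEVEN_DAYS_SECONDS = 7 * 24 * 3600


def update_time_weight(
    last_chat: datetime,
    now: datetime,
    decay_seconds: float = SEVEN_DAYS_SECONDS,
) -> float:
    """Return decay / (interval + decay), per the ratio stated in claim 4.

    The weight is 1.0 for a chat happening right now and falls toward 0
    as the last chat recedes into the past.
    """
    interval = (now - last_chat).total_seconds()
    interval = max(interval, 0.0)  # assumption: treat future timestamps as "now"
    return decay_seconds / (interval + decay_seconds)


if __name__ == "__main__":
    now = datetime(2018, 7, 27, 12, 0, 0)
    for days in (0, 7, 21):
        print(days, update_time_weight(now - timedelta(days=days), now))
```

Under this form the decay constant is the interval at which the weight drops to one half, so it can be read as a tunable "half-weight" age for conversations.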
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features has been described; however, any combination of them that involves no contradiction should be regarded as falling within the scope of this specification.
The embodiments described above express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art can make a number of variations and improvements without departing from the concept of the present application, and all of these fall within the protection scope of the present application. The protection scope of this patent application shall therefore be determined by the appended claims.

Claims (12)

  1. A search sorting method, characterized in that the method comprises:
    acquiring search keywords, and determining a plurality of initial search results matching the keywords;
    extracting the text similarity, update time dimension, and user relevance degree associated with each of the initial search results;
    obtaining, according to the text similarity, update time dimension, and user relevance degree, a corresponding text similarity weight, update time dimension weight, and user relevance degree weight, and performing a fusion calculation for each of the initial search results according to the text similarity weight, update time dimension weight, and user relevance degree weight, to obtain a comprehensive weight of each of the initial search results; and
    sorting the plurality of initial search results according to the comprehensive weights.
  2. The method according to claim 1, characterized in that obtaining the text similarity weight comprises:
    calculating a hit rate, an order consistency index, a position closeness, and a coverage rate of the keywords in the initial search result; and
    calculating the text similarity weight according to the hit rate, order consistency index, position closeness, and coverage rate.
  3. The method according to claim 2, characterized in that the step of calculating the text similarity weight according to the hit rate, order consistency index, position closeness, and coverage rate comprises:
    obtaining an offset value and a correction value for each of the hit rate, order consistency index, position closeness, and coverage rate; and
    performing a fusion calculation according to the hit rate, order consistency index, position closeness, and coverage rate together with the offset values and correction values, to obtain the text similarity weight.
  4. The method according to claim 1, characterized in that obtaining the update time dimension weight comprises:
    obtaining, according to the initial search result, the time interval between the last chat time and the current time; and
    calculating the ratio of a decay constant to the sum of the time interval and the decay constant, to obtain the chat update time weight.
  5. The method according to claim 1, characterized in that obtaining the user relevance degree weight comprises:
    calculating, between the initial search result and the user currently performing the search, the number of common contacts, a common department characteristic value, a common office location characteristic value, and the number of common personal tags; and
    calculating the user relevance degree weight according to the number of common contacts, common department characteristic value, common office location characteristic value, and number of common personal tags.
  6. The method according to claim 5, characterized in that the step of calculating the user relevance degree weight according to the number of common contacts, common department characteristic value, common office location characteristic value, and number of common personal tags comprises:
    obtaining an offset value and a correction value for each of the number of common contacts, common department characteristic value, common office location characteristic value, and number of common personal tags; and
    performing a fusion calculation according to the number of common contacts, common department characteristic value, common office location characteristic value, and number of common personal tags together with the offset values and correction values, to obtain the user relevance degree weight.
  7. The method according to any one of claims 1 to 6, characterized in that performing the fusion calculation according to the text similarity weight, update time dimension weight, and user relevance degree weight to obtain the comprehensive weight of each of the initial search results comprises:
    normalizing the text similarity weight, update time dimension weight, and user relevance degree weight to decimals between 0 and 1; and
    performing the fusion calculation according to the normalized text similarity weight, update time dimension weight, and user relevance degree weight, to obtain the comprehensive weight of each of the initial search results.
  8. The method according to any one of claims 1 to 6, characterized in that obtaining, according to the text similarity, update time dimension, and user relevance degree, the corresponding text similarity weight, update time dimension weight, and user relevance degree weight, and performing the fusion calculation for each of the initial search results according to these weights to obtain the comprehensive weight of each of the initial search results comprises:
    calculating the text similarity weight, update time dimension weight, and user relevance degree weight according to the text similarity, update time dimension, and user relevance degree;
    obtaining an offset value and a correction value for each of the text similarity weight, update time dimension weight, and user relevance degree weight;
    for each of the text similarity weight, update time dimension weight, and user relevance degree weight, multiplying the weight by its corresponding offset value and adding its corresponding correction value, to obtain a fusion coefficient; and
    multiplying the fusion coefficients together to obtain the comprehensive weight of each of the initial search results.
  9. The method according to any one of claims 1 to 8, characterized in that, before extracting the text similarity, update time dimension, and user relevance degree associated with each of the initial search results, the method comprises:
    screening the initial search results, including:
    not sorting initial search results that correspond to users who have left the company and have no chat history; and
    placing the initial search results of unregistered users last.
  10. A search sorting device, characterized in that the device comprises:
    an initial search result extraction module, which acquires search keywords and determines a plurality of initial search results matching the keywords;
    a feature factor extraction module, which extracts the text similarity, update time dimension, and user relevance degree associated with each of the initial search results;
    a weight calculation module, which obtains, according to the text similarity, update time dimension, and user relevance degree, a corresponding text similarity weight, update time dimension weight, and user relevance degree weight, and performs a fusion calculation for each of the initial search results according to the text similarity weight, update time dimension weight, and user relevance degree weight, to obtain a comprehensive weight of each of the initial search results; and
    a sorting module, which sorts the plurality of initial search results according to the comprehensive weights.
  11. An electronic device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 9.
  12. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
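The screening step in claim 9 can be illustrated with a short sketch. It is one possible reading, not the claimed implementation: the field names (`registered`, `resigned`, `has_chat_history`, `weight`) are hypothetical, "not sorting" is interpreted here as leaving the result out of the returned list, and the departed-user check is applied before the unregistered-user check. The claims do not settle any of these points.

```python
from typing import Any, Dict, List

Hit = Dict[str, Any]  # one initial search result; field names are hypothetical


def screen_and_rank(hits: List[Hit]) -> List[Hit]:
    """Screen results as in claim 9, then sort the rest by comprehensive weight.

    Assumed interpretation:
    - departed users with no chat history are dropped from the output;
    - unregistered users keep their recall order and are appended last;
    - everything else is sorted by the precomputed 'weight', descending.
    """
    to_rank: List[Hit] = []
    tail: List[Hit] = []
    for hit in hits:
        if hit["resigned"] and not hit["has_chat_history"]:
            continue  # excluded from sorting (assumption: omitted entirely)
        if not hit["registered"]:
            tail.append(hit)  # always placed after the ranked results
        else:
            to_rank.append(hit)
    ranked = sorted(to_rank, key=lambda h: h["weight"], reverse=True)
    return ranked + tail


if __name__ == "__main__":
    sample = [
        {"id": "a", "registered": True, "resigned": False,
         "has_chat_history": True, "weight": 0.2},
        {"id": "b", "registered": False, "resigned": False,
         "has_chat_history": False, "weight": 0.9},
        {"id": "c", "registered": True, "resigned": True,
         "has_chat_history": False, "weight": 0.8},
        {"id": "d", "registered": True, "resigned": True,
         "has_chat_history": True, "weight": 0.5},
    ]
    print([h["id"] for h in screen_and_rank(sample)])
```

Screening before feature extraction, as the claim orders it, means that no weights need to be computed for results that will never be ranked.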
PCT/CN2018/113348 2018-07-27 2018-11-01 Search sorting method and device, electronic device, and storage medium WO2020019562A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810847290.1 2018-07-27
CN201810847290.1A CN109063108B (en) 2018-07-27 2018-07-27 Search ranking method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2020019562A1 (en) 2020-01-30

Family

ID=64835819

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/113348 WO2020019562A1 (en) 2018-07-27 2018-11-01 Search sorting method and device, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN109063108B (en)
WO (1) WO2020019562A1 (en)

Also Published As

Publication number Publication date
CN109063108B (en) 2020-03-03
CN109063108A (en) 2018-12-21

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18927399

Country of ref document: EP

Kind code of ref document: A1