WO2023138428A1 - Search result sorting method, search system and computer-readable storage medium - Google Patents

Search result sorting method, search system and computer-readable storage medium

Info

Publication number
WO2023138428A1
Authority
WO
WIPO (PCT)
Prior art keywords
search results
search
information
data
user preference
Prior art date
Application number
PCT/CN2023/071322
Other languages
French (fr)
Chinese (zh)
Inventor
殷俊杰
周祥生
高洪
屠要峰
钟斌
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023138428A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/954 Navigation, e.g. using categorised browsing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present application relates to, but is not limited to, the technical field of data processing, and in particular to a method for sorting search results, a search system, and a computer-readable storage medium.
  • Search processing technology is becoming increasingly mature and has become the main entry point through which users find information.
  • Current search methods filter, from the database of the search system or from the Internet, the search results that match the search requirement information according to the search keywords, and present them to the user.
  • However, users differ from one another: different users who submit the same search keywords may have different search requirements, so the target search results obtained by current search methods struggle to meet users' personalized search requirements.
  • Embodiments of the present application provide a method for sorting search results, a search system, and a computer-readable storage medium.
  • In a first aspect, an embodiment of the present application provides a method for sorting search results, including: obtaining a search request, the search request including keyword information and a user identification; obtaining at least two target search results according to the keyword information; determining user preference information according to the user identification; and determining the ranking of the at least two target search results according to the user preference information.
  • In a second aspect, an embodiment of the present application provides a search system, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for sorting search results described in any one of the embodiments of the first aspect.
  • In a third aspect, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the method for sorting search results described in any one of the embodiments of the first aspect.
  • FIG. 1 is a flow chart of the steps of a method for sorting search results provided by an embodiment of the present application.
  • FIG. 2 is a flow chart of the steps for obtaining user preference information provided by another embodiment of the present application.
  • FIG. 3 is a flow chart of the steps for obtaining search result types provided by another embodiment of the present application.
  • FIG. 4 is a flow chart of the steps of a method for sorting search results provided by another embodiment of the present application.
  • FIG. 5 is a flow chart of the steps for training the first ranking model provided by another embodiment of the present application.
  • FIG. 6 is a flow chart of the steps for obtaining training data provided by another embodiment of the present application.
  • FIG. 7 is a flow chart of the steps for sorting target search results before they are input into the first ranking model, provided by another embodiment of the present application.
  • FIG. 8 is a flow chart of the steps for filtering target search results provided by another embodiment of the present application.
  • FIG. 9 is a block diagram of a search system provided by another embodiment of the present application.
  • FIG. 10 is a flow chart of the steps of a ranking model training method provided by another embodiment of the present application.
  • FIG. 11 is a flow chart of the steps of a method for sorting search results provided by another embodiment of the present application.
  • FIG. 12 is a flow chart of the steps for labeling document information provided by another embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a search system provided by another embodiment of the present application.
  • The present application provides a method for sorting search results, a search system, and a computer-readable storage medium, wherein the method for sorting search results includes: obtaining a search request, the search request including keyword information and a user identification; obtaining at least two target search results according to the keyword information; determining user preference information according to the user identification; and determining the ranking of the at least two target search results according to the user preference information.
  • Because the user preference information is determined according to the user identification, and the ranking of the search results is determined according to the user preference information, the technical solution of the present application can improve the degree of matching between the target search results and the search needs, thereby meeting users' personalized search needs.
  • FIG. 1 is a flow chart of the steps of a method for sorting search results provided by an embodiment of the present application.
  • The method for sorting search results includes, but is not limited to, the following steps:
  • Step S110, obtaining a search request, the search request including keyword information and a user identification.
  • The search request includes the keyword information that the user enters in the user interface of the search system and the user identification corresponding to that keyword information, which provides a data basis for obtaining the target search results and the user preference information.
  • The embodiment of the present application does not limit the content of the user identification, which may be the Internet Protocol (IP) address of the user terminal device that initiates the search request, or the identification number (ID) of the user who initiates the search request; those skilled in the art can select according to the actual situation.
  • Step S120, obtaining at least two target search results according to the keyword information.
  • The embodiment of the present application does not limit the method of obtaining target search results based on the keyword information.
  • The target search results can be obtained from a database according to the keyword information.
  • Each candidate search result in the database has a mapping relationship with keyword information.
  • When new keyword information is received, the target search results corresponding to it are obtained from the database according to the mapping relationship; equivalently, after the keyword information of the search request is obtained, it is matched against all candidate search results in the database to obtain the target search results that match the keyword information.
  • The method of matching keyword information against candidate search results is well known to those skilled in the art and may be implemented through text matching; it is not limited in this embodiment of the present application.
  • The at least two target search results obtained can be combined with the user identification to provide a data basis for sorting the target search results.
  • Step S130, determining user preference information according to the user identification, and determining the ranking of the at least two target search results according to the user preference information.
  • The user preference information can be obtained, according to the user identification, from the user preference information database.
  • The user preference information in the user preference information database is derived from the historical search data corresponding to the user identification, for example by analyzing and processing historical browsing information and click information.
  • For example, when the historical search data shows that the user browsed or clicked a document A, the tag information of document A is obtained and determined as the user preference information.
  • The user preference information can represent the interests of the user corresponding to the user identification.
  • On top of the matching and sorting of target search results by keyword information in the related art, the technical solution of the present application adds the user preference information as a parameter of the sorting process, so that the order of the target search results presented to the user better fits the user's personalized search needs.
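Steps S110 to S130 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the in-memory `CANDIDATES` list, the `PREFERENCES` dictionary, the substring match on titles, and all names are hypothetical stand-ins for the search result database, the user preference information database, and the text matching step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    doc_id: str
    title: str
    result_type: str  # category label of the result, e.g. "news"


# Hypothetical stand-in for the candidate search results in the database.
CANDIDATES = [
    SearchResult("d1", "5G network slicing rules", "rules"),
    SearchResult("d2", "5G network slicing white paper", "technical"),
    SearchResult("d3", "Company picnic announcement", "news"),
    SearchResult("d4", "5G launch news", "news"),
]

# Hypothetical stand-in for the user preference information database:
# user identification -> preference score per search result type.
PREFERENCES = {"user-42": {"technical": 0.7, "news": 0.2, "rules": 0.1}}


def rank_search_results(request):
    # S110: the search request carries keyword information and a user identification.
    keyword = request["keyword"].lower()
    user_id = request["user_id"]
    # S120: obtain target search results by (assumed) substring matching on the title.
    targets = [r for r in CANDIDATES if keyword in r.title.lower()]
    # S130: look up the user's preferences and order the targets by them.
    prefs = PREFERENCES.get(user_id, {})
    return sorted(targets, key=lambda r: prefs.get(r.result_type, 0.0), reverse=True)


ranked = rank_search_results({"keyword": "5G", "user_id": "user-42"})
print([r.doc_id for r in ranked])
```

A user with no stored preferences falls back to the keyword-matched order, because every result then receives the same score and Python's sort is stable.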
  • Step S130 in the embodiment shown in FIG. 1 also includes, but is not limited to, the following steps:
  • Step S210, obtaining historical search data from the log center according to the user identification;
  • Step S220, determining the search result type according to the historical search data;
  • Step S230, determining distribution information of the search result types within at least one preset time period, and performing normalization processing on the distribution information to obtain the user preference information.
  • The historical search data is obtained from the log center according to the user identification, and the search result type is determined according to the historical search data.
  • The search result type is the category label of a historical browsing result in the historical search data.
  • The embodiment of the present application does not limit the rules for dividing the preset time periods or the number of preset time periods. For example, four preset time periods of 1 week, 1 month, 3 months, and 6 months can be used; those skilled in the art can choose according to the actual situation.
  • The distribution information represents the number of clicks on browsed documents corresponding to each search result type within the preset time period.
  • Because the historical search data in the log center is continuously updated, the value of the user preference information in each preset time period changes accordingly.
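Step S230 can be illustrated with a short sketch. It assumes that "normalization" means dividing each type's click count by the total clicks in the window, which the text does not spell out; the window lengths follow the 1 week / 1 month / 3 months / 6 months example, and the function and variable names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Preset time periods in days, following the example division in the text.
WINDOWS = {"1w": 7, "1m": 30, "3m": 90, "6m": 180}


def preference_by_window(clicks, now, windows=WINDOWS):
    """clicks: iterable of (timestamp, search_result_type) pairs for one user."""
    prefs = {}
    for name, days in windows.items():
        cutoff = now - timedelta(days=days)
        # Distribution information: clicks per search result type inside the window.
        counts = Counter(t for ts, t in clicks if cutoff <= ts <= now)
        total = sum(counts.values())
        # Assumed normalization: share of the window's clicks per type.
        prefs[name] = {t: c / total for t, c in counts.items()} if total else {}
    return prefs


now = datetime(2023, 1, 31)
clicks = [
    (now - timedelta(days=2), "news"),
    (now - timedelta(days=3), "technical"),
    (now - timedelta(days=20), "technical"),
    (now - timedelta(days=100), "rules"),
]
prefs = preference_by_window(clicks, now)
print(prefs["1w"], prefs["6m"])
```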
  • As historical search data accumulates, the cumulative value grows larger and larger, which makes the weight of long-term interest too high and weakens the timeliness of the user's interests.
  • To address this, the embodiment of the present application proposes a method for updating user preference information:
  • $$\text{score}_{\text{update}} = \lambda \cdot \text{score}_{\text{old}} + (1 - \lambda) \cdot \text{score}_{\text{new}}$$
  • where $\text{score}_{\text{update}}$ is the user preference information in the preset time period after updating, $\text{score}_{\text{old}}$ is the user preference information in the preset time period before updating, and $\text{score}_{\text{new}}$ is the user preference information in the current preset time period.
  • The value of $\text{score}_{\text{new}}$ is determined by a formula (not reproduced in the source text) in which $\text{click}_i$ is the number of times the user clicks on browsed documents corresponding to the $i$-th search result type within the current preset time period, and $L$ is the set of search result types.
  • $\lambda$ is the attenuation factor, with value range $[0, 1]$; its value is determined by a formula (not reproduced in the source text) in which $c$ is the number of clicks of the user within the current preset time period and $\lambda_t$ is a preset attenuation coefficient.
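The update rule can be sketched in a few lines. Two assumptions are made here because the corresponding formulas are not available in the text: the new score for each type is taken to be its share of the clicks in the current period, and the attenuation factor is passed in as a plain parameter (`decay`) rather than computed from the click count and the preset attenuation coefficient. All names are hypothetical.

```python
def normalize_clicks(clicks_by_type):
    """Assumed form of score_new: each type's share of the current period's clicks."""
    total = sum(clicks_by_type.values())
    if total == 0:
        return {t: 0.0 for t in clicks_by_type}
    return {t: c / total for t, c in clicks_by_type.items()}


def update_preference(score_old, score_new, decay):
    """score_update = decay * score_old + (1 - decay) * score_new, per type."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("the attenuation factor must lie in [0, 1]")
    types = set(score_old) | set(score_new)
    return {
        t: decay * score_old.get(t, 0.0) + (1.0 - decay) * score_new.get(t, 0.0)
        for t in types
    }


old = {"news": 0.5, "technical": 0.5}
new = normalize_clicks({"news": 1, "technical": 3})
updated = update_preference(old, new, decay=0.6)
print(updated)
```

A factor close to 1 keeps the long-term profile almost unchanged, while a factor close to 0 lets the current period dominate, which is the trade-off the text describes between long-term interest and timeliness.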
  • Step S220 in the embodiment shown in FIG. 2 also includes, but is not limited to, the following steps:
  • Step S310, obtaining the search result identifier from the historical search data;
  • Step S320, obtaining the historical search result from the search result database according to the search result identifier, and determining the search result type of the historical search result.
  • The search result database stores all candidate search result resources of the search system and is matched against the keyword information in a search request to filter out target search results that match the keyword information.
  • The candidate search results stored in the search result database carry search result type information. Obtaining the search result identifier from the historical search data, retrieving the historical search result from the search result database by that identifier, and determining its search result type provides a data basis for obtaining the user preference information.
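Steps S310 and S320 amount to a lookup by identifier. The sketch below assumes a dictionary keyed by search result identifier as the search result database and a list of log records as the historical search data; the field names (`result_id`, `type`) are hypothetical.

```python
# Hypothetical search result database: identifier -> stored result with its type.
RESULT_DB = {
    "d2": {"title": "5G network slicing white paper", "type": "technical"},
    "d4": {"title": "5G launch news", "type": "news"},
}

# Hypothetical historical search data for one user.
HISTORY = [
    {"user_id": "user-42", "result_id": "d2"},
    {"user_id": "user-42", "result_id": "d4"},
    {"user_id": "user-42", "result_id": "d2"},
    {"user_id": "user-42", "result_id": "d9"},  # no longer in the database
]


def result_types_from_history(history, result_db):
    types = []
    for record in history:
        result_id = record["result_id"]        # S310: search result identifier
        result = result_db.get(result_id)      # S320: fetch the historical search result
        if result is not None:
            types.append(result["type"])       # S320: read its search result type
    return types


types = result_types_from_history(HISTORY, RESULT_DB)
print(types)
```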
  • Step S130 in the embodiment shown in FIG. 1 also includes, but is not limited to, the following steps:
  • Step S410, obtaining a pre-trained first ranking model;
  • Step S420, inputting the keyword information, the user preference information, the historical search results, and the target search results into the first ranking model to obtain the sorted target search results.
  • For the first ranking model, a click-through rate prediction model based on deep learning or machine learning can be selected, such as the DeepFM model, the Wide&Deep model, or the LambdaMART model; those skilled in the art can choose according to the actual situation.
  • Inputting the keyword information, the user preference information, the historical search results, and the target search results into the first ranking model enables the model to sort the target search results according to preset rules. Adding the user preference information and the historical search data as parameters of the sorting process makes the order of the target search results presented to the user better fit the user's personalized search needs.
  • This embodiment of the present application does not limit the data processing method used when inputting the keyword information, the user preference information, the historical search results, and the target search results into the first ranking model.
  • For example, data may be extracted from the keyword information, the user preference information, and the historical search results according to preset rules and spliced to obtain feature data; the feature data and the target search results are then input into the pre-trained first ranking model, so that the model sorts the target search results according to the feature data, thereby improving the degree of matching between search results and search requirements.
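The splicing of feature data and its use by a ranking model can be sketched as below. The three features, the term-overlap matching score, and the fixed linear `toy_model` are all assumptions made for illustration; a trained DeepFM or Wide&Deep model would take the place of `toy_model` in practice.

```python
def match_score(keyword, text):
    """Fraction of keyword terms that occur in the text (assumed matching rule)."""
    terms = keyword.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def build_features(keyword, prefs, history_clicks, result):
    """Splice features from keyword, user preference and historical data."""
    return [
        match_score(keyword, result["title"]),       # keyword/result match
        prefs.get(result["type"], 0.0),              # preference for the result type
        float(history_clicks.get(result["id"], 0)),  # past clicks on this result
    ]


def toy_model(features):
    """Stand-in for the pre-trained first ranking model: a fixed linear scorer."""
    weights = [1.0, 2.0, 0.1]
    return sum(w * f for w, f in zip(weights, features))


def rank_with_model(model, keyword, prefs, history_clicks, results):
    scored = [(model(build_features(keyword, prefs, history_clicks, r)), r) for r in results]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored]


results = [
    {"id": "a", "title": "5G slicing news", "type": "news"},
    {"id": "b", "title": "5G slicing design guide", "type": "technical"},
]
ranked = rank_with_model(
    toy_model, "5G slicing", {"technical": 0.7, "news": 0.2}, {"a": 1}, results
)
print([r["id"] for r in ranked])
```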
  • Obtaining the first ranking model in the embodiment shown in FIG. 4 includes, but is not limited to, the following steps:
  • Step S510, obtaining training data according to the keyword information, the user preference information, and the historical search results;
  • Step S520, obtaining a preset second ranking model, and training the second ranking model according to the training data to obtain the first ranking model.
  • Obtaining training data according to the keyword information, the user preference information, and the historical search results, and training the preset second ranking model on that data to obtain the first ranking model, provides a data basis for determining the ranking of target search results, so as to better meet users' personalized search needs.
  • The embodiment of the present application does not limit the training method of the second ranking model, and those skilled in the art can adjust the training parameters of the model according to the actual situation.
  • The embodiment of the present application does not limit the way of obtaining training data according to the keyword information, the user preference information, and the historical search results; it may follow the method steps shown in FIG. 6. Referring to FIG. 6, in one embodiment, the acquisition of training data in step S510 includes, but is not limited to, the following steps:
  • Step S610, determining first matching data according to the keyword information and the historical search results, the first matching data representing the matching score between the keyword information and the historical search results;
  • Step S620, associating the first matching data, the historical search results, and the user preference information to obtain the training data.
  • The matching score between the keyword information and the historical search results is determined as the first matching data, and the first matching data, the historical search results, and the user preference information are associated to obtain the training data.
  • The training data obtained through the above embodiment has undergone data preprocessing and feature extraction, which can improve the utilization of the training data when training the second ranking model.
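Steps S610 and S620 can be sketched as a join over log records. The term-overlap matching score, the record fields, and the use of the click flag as the sample label are assumptions chosen for illustration; the names are hypothetical.

```python
def title_match_score(keyword, title):
    """Assumed first matching data: fraction of keyword terms found in the title."""
    terms = keyword.lower().split()
    words = set(title.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def build_training_rows(history, result_db, prefs_by_user):
    rows = []
    for rec in history:
        doc = result_db[rec["result_id"]]
        # S610: matching score between the keyword and the historical search result.
        first_match = title_match_score(rec["keyword"], doc["title"])
        # S620: associate the score, the result and the user's preference for its type.
        pref = prefs_by_user.get(rec["user_id"], {}).get(doc["type"], 0.0)
        rows.append({
            "user_id": rec["user_id"],
            "result_id": rec["result_id"],
            "features": [first_match, pref],
            "label": 1 if rec["clicked"] else 0,
        })
    return rows


RESULT_DB = {
    "d2": {"title": "5G slicing design guide", "type": "technical"},
    "d3": {"title": "Cafeteria news", "type": "news"},
}
HISTORY = [
    {"user_id": "u1", "keyword": "5G slicing", "result_id": "d2", "clicked": True},
    {"user_id": "u1", "keyword": "5G slicing", "result_id": "d3", "clicked": False},
]
rows = build_training_rows(HISTORY, RESULT_DB, {"u1": {"technical": 0.7, "news": 0.2}})
print(rows)
```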
  • Before step S420 in the embodiment shown in FIG. 4 is executed, the method also includes, but is not limited to, the following steps:
  • Step S710, determining second matching data according to the keyword information and the target search results, the second matching data representing the matching score between the keyword information and the target search results;
  • Step S720, sorting the target search results according to the second matching data to obtain the sorted target search results.
  • The second matching data is determined according to the keyword information and the target search results, and the target search results are initially sorted according to the second matching data. The sorted target search results are then input into the first ranking model together with the keyword information, the user preference information, and the historical search results, which can improve the accuracy of the first ranking model and make the target search results it produces better fit the individual needs of users.
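The initial sort in steps S710 and S720 can be sketched with any text matching score. Jaccard similarity over word sets is used below purely as an assumed example; the application does not prescribe a particular matching function, and the names are hypothetical.

```python
def jaccard(keyword, title):
    """Assumed second matching data: Jaccard similarity of the two word sets."""
    a = set(keyword.lower().split())
    b = set(title.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def presort(keyword, results):
    # S710: score each target search result; S720: sort by that score, best first.
    return sorted(results, key=lambda r: jaccard(keyword, r["title"]), reverse=True)


results = [
    {"id": "a", "title": "network news"},
    {"id": "b", "title": "5G network slicing overview"},
    {"id": "c", "title": "5G slicing"},
]
presorted = presort("5G network slicing", results)
print([r["id"] for r in presorted])
```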
  • After step S130 in the embodiment shown in FIG. 1 is executed, the method also includes, but is not limited to, the following steps:
  • Step S810, filtering the sorted target search results.
  • Filtering the sorted target search results, as proposed in the embodiment of the present application, effectively avoids cases where the sorted target search result list contains duplicate data, target search results with illegal content, or target search results that the user corresponding to the user identification is not authorized to browse. This improves the degree of matching between the target search results and the search requirements, thereby meeting users' personalized search needs.
  • The embodiment of the present application does not limit the operation mode of filtering target search results.
  • For example, preset filtering rules can be defined according to the needs of the actual business scenario: the matching score between target search results is obtained, and the target search results are screened and filtered according to that score.
  • Alternatively, the target search results are matched and screened according to the current user identification and the authority verification information of each target search result (the user identifications that have browsing authority for it), to obtain the target search results that the current user identification is authorized to browse.
  • Those skilled in the art can adjust and select according to the actual situation.
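The filtering of step S810 can be sketched as three checks applied in list order. The duplicate test by identifier, the banned-term test for illegal content, and the `allowed_users` field for browsing authority are hypothetical simplifications of rules that would be defined per business scenario.

```python
def filter_results(results, user_id, banned_terms=("forbidden",)):
    seen = set()
    kept = []
    for r in results:
        # Rule 1: drop duplicates (assumed: same identifier means same result).
        if r["id"] in seen:
            continue
        # Rule 2: drop results with illegal content (assumed: banned-term match).
        if any(term in r["title"].lower() for term in banned_terms):
            continue
        # Rule 3: drop results the user has no authority to browse.
        allowed = r.get("allowed_users")  # None means visible to everyone
        if allowed is not None and user_id not in allowed:
            continue
        seen.add(r["id"])
        kept.append(r)
    return kept


results = [
    {"id": "d1", "title": "5G design guide"},
    {"id": "d1", "title": "5G design guide"},
    {"id": "d2", "title": "Leaked forbidden material"},
    {"id": "d3", "title": "Board minutes", "allowed_users": {"admin"}},
    {"id": "d4", "title": "5G FAQ", "allowed_users": {"user-42", "admin"}},
]
filtered = filter_results(results, "user-42")
print([r["id"] for r in filtered])
```

Because the function only removes entries, the order produced by the ranking step is preserved.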
  • FIG. 9 is a block diagram of a search system provided by another embodiment of the present application.
  • The search system 900 includes functional modules and data units.
  • The functional modules include: a user interface module 901, a document recall module 902, a secondary sorting module 903, a document filtering module 904, an automatic document labeling module 905, a data preprocessing module 906, and a model training module 907. The functions of each module are described below:
  • User interface module 901. The function of the user interface module 901 is to interact with the user through the front-end page: the user inputs search keyword information through the user interface module 901, which transmits the search request to the relevant back-end modules, finally receives the target search result list delivered by the back-end modules, and displays it to the user on the front-end page.
  • The user interface module 901 also records the user's historical search record data, including the search request submitted by the user, the search result information returned by the system, and the user's browsing information on the documents in the search results, and transmits this information to the log center 908 for storage.
  • Document recall module 902. The function of the document recall module 902 is to perform initial matching of the search keyword information input by the user against all document information in the document database 910 and to recall a small number of candidate documents, providing a data basis for obtaining target search results that meet the user's personalized search needs.
  • Secondary sorting module 903. The function of the secondary sorting module 903 is to reorder the candidate documents recalled by the document recall module 902 through the ranking model.
  • The secondary sorting module 903 includes a trained secondary sorting model.
  • The model can be a click-through rate prediction model based on deep learning or machine learning, such as the DeepFM model, the Wide&Deep model, or the LambdaMART model.
  • The secondary sorting module 903 receives the features delivered by the data preprocessing module 906, inputs them into the secondary sorting model, outputs a score for each candidate document, sorts the documents by score, and passes the sorted document list to the downstream document filtering module 904.
  • Document filtering module 904. The function of the document filtering module 904 is to filter the document list output by the secondary sorting module 903 based on certain filtering rules, such as removing duplicate documents from the document list and document auditing (removing illegal documents and documents that the user has no authority to access), and finally to pass the filtered document list to the user interface module 901.
  • Automatic document labeling module 905. The function of the automatic document labeling module 905 is to label documents in the database of the search system 900 automatically, using text classification technology from the field of natural language processing.
  • The principle is as follows: first, a labeling system that can accurately describe all document categories is constructed based on domain expert knowledge; next, labelers who are familiar with the documents and the labeling system manually label some of the documents to generate high-quality manually labeled data; then the manually labeled data is used as training data to train a text classification model with supervised learning; finally, the trained text classification model is used to label all documents in the database, and the same operation is performed on newly entered documents, ensuring that all documents in the document database 910 of the search system 900 have corresponding document tags.
  • Data preprocessing module 906. The function of the data preprocessing module 906 covers two scenarios: online inference and offline.
  • In the online inference scenario, the data preprocessing module 906 receives the keyword information transmitted by the user interface module 901, the user preference information extracted from the user preference information database 909, and the corresponding information of the candidate documents in the document database 910, and performs feature extraction and preprocessing to generate the input features required by the online secondary sorting model.
  • In the offline scenario, the data preprocessing module 906 receives the historical search data extracted from the log center 908 together with the corresponding user preference information and document information, and performs feature extraction and preprocessing in the same way as in the online inference scenario.
  • The processed model input features are combined with the document tags in the historical search data to form training data, which is sent to the model training module 907 for model training.
  • Model training module 907. The function of the model training module 907 is to train the sorting model with the training data and, after offline training and offline testing, to deploy the updated model to the online secondary sorting module 903.
  • Log center 908. The function of the log center 908 is to store all search record data generated when users access the search system 900, including the search keyword information submitted by the user, the target search results returned by the system, and the user's browsing information on the documents in the search results (whether the user clicked and browsed, browsing time, and so on).
  • The log center 908 is responsible for providing users' historical search data so that subsequent modules can generate training data for the ranking model.
  • User preference information database 909. The function of the user preference information database 909 is to maintain user preference information and update it regularly. The principle is to calculate, from the user's historical clicked-document data, the category label distribution of the clicked documents in each time dimension, and to normalize each distribution to obtain the user preference information of the current user in the given time dimension.
  • Document database 910. The function of the document database 910 is to store all document resources of the search system 900, including document-related information such as the document title, document text, and document tags.
  • Embodiment 1, Embodiment 2, and Embodiment 3 are applied to the search system 900 shown in FIG. 9.
  • FIG. 10 is a flow chart of the steps of a ranking model training method provided by another embodiment of the present application.
  • The ranking model training method includes the following steps:
  • Step S1010, the user interface module 901 transmits the historical search data to the log center 908 for storage.
  • The historical search data includes the search keyword information input by the user, the search result list displayed to the user by the search system 900, and the user's click and browse information on documents in the search result list.
  • Step S1020, the data preprocessing module 906 obtains historical search data from the log center 908; the format of the historical search data of the log center 908 can be as shown in Table 1:
  • Step S1030, the data preprocessing module 906 obtains the user identification from the historical search data, and extracts user preference information from the user preference information database 909 according to the user identification.
  • The user preference information represents the preference score of the document tag corresponding to the historical search document within the preset time period. For example, assume that the document tag system in the automatic document labeling module 905 has three tags (such as the three categories of news, technical literature, and rules and regulations), and that the preset time periods take four dimensions: 5 days, 10 days, 30 days, and 60 days.
  • Step S1040, the data preprocessing module 906 obtains the search result document identifier from the historical search data, and obtains the corresponding document information from the document database 910 according to the search result document identifier.
  • The document information includes the title of the document, the text of the document, the category label of the document, and the historical click volume information of the document.
  • Step S1050, the data preprocessing module 906 performs feature extraction, preprocessing, and splicing operations on the data obtained from the log center 908, the user preference information database 909, and the document database 910 respectively (that is, the historical search data, the user preference information, and the document information) to generate training data.
  • The format of the training data can be as shown in Table 2:
  • Here, the sample identification is the serial number of the training sample; the user identification and the document identification are the unique identification numbers of the user and the document, respectively; the feature vector is spliced from the features extracted from the input data, including matching features, user preference information, and document features; and the sample label generally indicates whether the user clicked the corresponding document, with 1 for clicked and 0 for not clicked.
  • The data preprocessing module 906 integrates all training samples, generates the training data, and sends it to the model training module 907.
  • Step S1060, the model training module 907 uses the training data obtained from the data preprocessing module 906 to train the ranking model, and the trained ranking model is used to update the ranking model in the secondary sorting module 903.
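The training in step S1060 can be illustrated with a deliberately small stand-in. The sketch below assumes a plain logistic regression trained by stochastic gradient descent in place of DeepFM, Wide&Deep, or LambdaMART, and a two-element feature vector (matching score, preference score) with a click label of 1 or 0; the sample values and names are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, lr=0.5, epochs=500):
    """samples: list of (feature_vector, label) with label 1 = clicked, 0 = not clicked."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(model, x):
    """Predicted click probability, usable as a ranking score."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical training data: [matching score, preference score] -> click label.
samples = [
    ([1.0, 0.7], 1),
    ([1.0, 0.1], 0),
    ([0.5, 0.6], 1),
    ([0.5, 0.2], 0),
]
model = train_logistic(samples)
print(predict(model, [1.0, 0.7]), predict(model, [1.0, 0.1]))
```

With equal matching scores, the document whose category the user prefers receives the higher predicted click probability, which is the behaviour the secondary sorting step relies on.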
  • FIG. 11 is a flowchart of steps of a method for sorting search results provided in another embodiment of the present application.
  • the method for sorting search results includes the following steps:
  • Step S1110: the user interface module 901 interacts with the user of the search system 900 through the front-end page, and the user submits keyword information through the search dialog box on the page. After the user clicks the search button, the user interface module 901 passes the keyword information to the document recall module 902.
  • Step S1120: the document recall module 902 performs preliminary matching between the keyword information and all document information in the document database 910, for example by computing a matching score between the keyword information and each document title, sorting by score, and recalling some of the documents as the candidate document list, which it sends to the secondary ranking module 903.
  • Step S1130: the data preprocessing module 906 performs feature extraction, preprocessing, and splicing on the keyword information transmitted by the user interface module 901, the user preference information provided by the user preference information database 909, and the document information of the candidate document list provided by the document database 910, to generate feature vectors in the same format as the training data, and passes them to the secondary ranking module 903.
  • Step S1140: the secondary ranking module 903 inputs the feature vectors delivered by the data preprocessing module 906 into the pre-trained ranking model to generate ranking scores for the documents in the candidate document list, and reorders the documents by ranking score to generate a sorted target document list.
  • Step S1150: the document filtering module 904 filters the sorted target document list according to preset filtering rules.
  • The filtering rules generally include removing duplicate documents, removing documents with illegal content, removing documents that the user does not have permission to browse, and so on. The filtered document list is passed to the user interface module 901 as the target search result list.
  • Step S1160: the user interface module 901 displays the target search result list returned by the document filtering module 904 to the user on the page, and the user can click to browse the corresponding documents as needed.
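The recall, secondary ranking, and filtering stages of this flow can be sketched as three small functions. This is a minimal illustration, not the application's implementation: the token-overlap recall score, the use of a preference lookup in place of the trained ranking model, and all names and data (`recall`, `rerank`, `filter_docs`, `titles`, `doc_tag`, `user_pref`, `blocked`) are assumptions made for the example.

```python
from typing import Callable, Dict, List, Set


def recall(keyword: str, titles: Dict[str, str], top_k: int = 10) -> List[str]:
    """Recall stand-in: rank documents by the number of query tokens in the title."""
    query = set(keyword.lower().split())
    scored = [(len(query & set(title.lower().split())), doc_id)
              for doc_id, title in titles.items()]
    hits = sorted((pair for pair in scored if pair[0] > 0),
                  key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in hits[:top_k]]


def rerank(candidates: List[str], score_fn: Callable[[str], float]) -> List[str]:
    """Secondary ranking stand-in: order candidates by score, highest first."""
    return sorted(candidates, key=score_fn, reverse=True)


def filter_docs(ranked: List[str], blocked: Set[str]) -> List[str]:
    """Filtering stand-in: drop duplicates and documents the user may not browse."""
    seen: Set[str] = set()
    kept = []
    for doc_id in ranked:
        if doc_id not in seen and doc_id not in blocked:
            seen.add(doc_id)
            kept.append(doc_id)
    return kept


# Hypothetical document store, tags, preferences, and permission list.
titles = {
    "d1": "5G network slicing guide",
    "d2": "Network security rules",
    "d3": "Cafeteria menu",
    "d4": "5G network news",
}
doc_tag = {"d1": "technical literature", "d2": "rules and regulations",
           "d3": "news", "d4": "news"}
user_pref = {"news": 0.6, "technical literature": 0.3, "rules and regulations": 0.1}
blocked = {"d2"}

candidates = recall("5G network", titles)
ranked = rerank(candidates, lambda d: user_pref[doc_tag[d]])
results = filter_docs(ranked, blocked)
print(results)  # ['d4', 'd1']
```

In this toy run the keyword alone would place d1 first, while the preference-aware second stage moves the news document d4 ahead of it, which is the effect the flow is meant to achieve.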
  • FIG. 12 is a flow chart of the steps of labeling document information provided by another embodiment of the present application.
  • the application scenario of the step of labeling document information is the data interaction between the automatic document labeling module 905 and the document database 910.
  • the steps of labeling document information are as follows:
  • Step S1210: the automatic document labeling module 905 acquires a pre-labeled document data set, and the document data set carries document tags.
  • Step S1220: the automatic document labeling module 905 obtains a preset text classification model and trains the text classification model on the document data set to obtain a document labeling model.
  • Step S1230: the automatic document labeling module 905 obtains document information from the document database 910, inputs the document information into the document labeling model, and obtains document information carrying document tags.
  • Step S1240: the automatic document labeling module 905 stores the document information carrying the document tags into the document database 910.
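The four labeling steps can be illustrated with a deliberately tiny classifier. The application does not say which text classification model is used, so the token-count voting model below is purely an assumed stand-in, and the names `train_labeler`, `predict_tag`, `labeled_docs`, and `database` are hypothetical.

```python
from collections import Counter, defaultdict


def train_labeler(labeled_docs):
    """Train on (text, tag) pairs by counting the tokens seen under each tag."""
    counts = defaultdict(Counter)
    for text, tag in labeled_docs:
        counts[tag].update(text.lower().split())
    return counts


def predict_tag(model, text):
    """Pick the tag whose training tokens overlap most with the text."""
    tokens = text.lower().split()
    # Ties resolve to the alphabetically first tag, which keeps results deterministic.
    return max(sorted(model), key=lambda tag: sum(model[tag][t] for t in tokens))


# Pre-labeled data set carrying document tags (hypothetical).
labeled_docs = [
    ("company releases quarterly news update", "news"),
    ("protocol design and algorithm analysis", "technical literature"),
    ("employee attendance rules and regulations", "rules and regulations"),
]
model = train_labeler(labeled_docs)

# Label unlabeled documents and store the tag back with each document.
database = {
    "d1": {"text": "new algorithm for protocol analysis"},
    "d2": {"text": "updated attendance rules"},
}
for doc in database.values():
    doc["tag"] = predict_tag(model, doc["text"])

print(database["d1"]["tag"])  # technical literature
print(database["d2"]["tag"])  # rules and regulations
```

A production system would presumably swap in a real text classifier, but the data flow (train on tagged documents, predict for stored documents, write the tags back) stays the same.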
  • FIG. 13 is a schematic structural diagram of a search system 1300 provided by another embodiment of the present application.
  • An embodiment of the present application also provides a search system 1300 .
  • the processor 1320 and the memory 1310 may be connected through a bus or in other ways.
  • The non-transitory software programs and instructions required to implement the search result sorting method of the above embodiments are stored in the memory 1310 and, when executed by the processor 1320, perform the search result sorting method of the above embodiments, for example, method steps S110 to S130 in FIG. 1, method steps S210 to S230 in FIG. 2, method steps S310 to S320 in FIG. 3, method steps S410 to S420 in FIG. 4, method steps S510 to S520 in FIG. 5, method steps S610 to S620 in FIG. 6, method steps S710 to S720 in FIG. 7, and method step S810 in FIG. 8.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • An embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions. When the computer-executable instructions are executed by a processor or a controller, for example by the processor 1320 in the above embodiment of the search system 1300, the processor 1320 performs the search result sorting method of the above embodiments, for example, method steps S110 to S130 in FIG. 1, method steps S210 to S230 in FIG. 2, method steps S310 to S320 in FIG. 3, method steps S410 to S420 in FIG. 4, and method steps S510 to S520 in FIG. 5.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and that can be accessed by a computer.
  • communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and can include any information delivery media, as is known to those of ordinary skill in the art.
  • Embodiments of the present application include a search result ranking method, a search system, and a computer-readable storage medium, wherein the search result ranking method includes: obtaining a search request, the search request including keyword information and user identification; obtaining at least two target search results according to the keyword information; determining user preference information according to the user identification, and determining the ranking of at least two target search results according to the user preference information.
  • The user preference information is determined according to the user identification, and the ranking of the search results is determined according to the user preference information. Compared with solutions that match and rank target search results based only on keyword information, the technical solution of the present application can improve the match between the target search results and the search needs, thereby meeting the personalized search needs of users.

Abstract

Disclosed in the present application are a search result sorting method, a search system and a computer-readable storage medium. The search result sorting method comprises: acquiring a search request, wherein the search request comprises keyword information and a user identifier (S110); acquiring at least two target search results according to the keyword information (S120); and determining user preference information according to the user identifier, and determining the order of the at least two target search results according to the user preference information (S130).

Description

Search result sorting method, search system, and computer-readable storage medium
Cross-Reference to Related Applications
This application is based on and claims priority to Chinese patent application No. 202210059836.3, filed on January 19, 2022, the entire content of which is incorporated herein by reference.
Technical Field
The present application relates to, but is not limited to, the technical field of data processing, and in particular to a search result sorting method, a search system, and a computer-readable storage medium.
Background
With the development of information technology, search technology has matured and become the main entry point through which users find information. Current search methods can, according to the search keywords, select search results that match the search requirement from the database of the search system or from the Internet and present them to the user. However, users differ from one another: different users who submit the same search keywords may have different search needs, and the target search results obtained by current search methods struggle to meet users' personalized search needs.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Embodiments of the present application provide a search result sorting method, a search system, and a computer-readable storage medium.
In a first aspect, an embodiment of the present application provides a search result sorting method, including: obtaining a search request, the search request including keyword information and a user identifier; obtaining at least two target search results according to the keyword information; and determining user preference information according to the user identifier, and determining the ranking of the at least two target search results according to the user preference information.
In a second aspect, an embodiment of the present application provides a search system, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the search result sorting method according to any embodiment of the first aspect.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the search result sorting method according to any embodiment of the first aspect.
Other features and advantages of the present application will be set forth in the description that follows and will in part become apparent from the description or be learned by practicing the application. The objectives and other advantages of the application can be realized and obtained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings provide a further understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they serve to explain the technical solution and do not limit it.
FIG. 1 is a flowchart of the steps of a search result sorting method provided by an embodiment of the present application;
FIG. 2 is a flowchart of the steps of obtaining user preference information provided by another embodiment of the present application;
FIG. 3 is a flowchart of the steps of obtaining search result types provided by another embodiment of the present application;
FIG. 4 is a flowchart of the steps of a search result sorting method provided by another embodiment of the present application;
FIG. 5 is a flowchart of the steps of training the first ranking model provided by another embodiment of the present application;
FIG. 6 is a flowchart of the steps of obtaining training data provided by another embodiment of the present application;
FIG. 7 is a flowchart of the steps of sorting the target search results before they are input into the first ranking model, provided by another embodiment of the present application;
FIG. 8 is a flowchart of the steps of filtering the target search results provided by another embodiment of the present application;
FIG. 9 is a schematic module diagram of a search system provided by another embodiment of the present application;
FIG. 10 is a flowchart of the steps of a ranking model training method provided by another embodiment of the present application;
FIG. 11 is a flowchart of the steps of a search result sorting method provided by another embodiment of the present application;
FIG. 12 is a flowchart of the steps of labeling document information provided by another embodiment of the present application;
FIG. 13 is a schematic structural diagram of a search system provided by another embodiment of the present application.
Detailed Description
To make the purpose, technical solution, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described here only explain the present application and do not limit it.
It should be noted that although functional modules are divided in the device schematic diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that in the device, or in an order different from that in the flowcharts. The terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
The present application provides a search result sorting method, a search system, and a computer-readable storage medium. The search result sorting method includes: obtaining a search request, the search request including keyword information and a user identifier; obtaining at least two target search results according to the keyword information; and determining user preference information according to the user identifier, and determining the ranking of the at least two target search results according to the user preference information. In the solution provided by the embodiments of the present application, the user preference information is determined according to the user identifier, and the ranking of the search results is determined according to the user preference information. Compared with current solutions that match and rank target search results based only on keyword information, the technical solution of the present application can improve the match between the target search results and the search needs, thereby meeting the personalized search needs of users.
The embodiments of the present application are further described below with reference to the accompanying drawings.
As shown in FIG. 1, which is a flowchart of the steps of a search result sorting method provided by an embodiment of the present application, the search result sorting method includes but is not limited to the following steps:
Step S110: obtain a search request, the search request including keyword information and a user identifier.
It can be understood that the search request includes the keyword information that the user enters in the user interface of the search system and the user identifier associated with that input, which together provide the data basis for obtaining the target search results and the user preference information.
It should be noted that methods for obtaining keyword information and user identifiers are well known to those skilled in the art, and the embodiment of the present application places no limitation on them.
It should be noted that the embodiment of the present application does not limit the content of the user identifier. It may be the Internet Protocol (IP) address of the user terminal device that initiates the search request, or the user identifier (ID) of the user who initiates the search request; those skilled in the art can choose according to the actual situation.
Step S120: obtain at least two target search results according to the keyword information.
It should be noted that the embodiment of the present application does not limit the way in which target search results are obtained from the keyword information. The target search results may be obtained from a database in which each candidate search result has a mapping relationship with keyword information; when new keyword information is obtained, the target search results corresponding to it are retrieved from the database according to the mapping relationship. Alternatively, after the keyword information of the search request is obtained, it may be matched against all candidate search results in the database to obtain the target search results that match it. It can be understood that methods of matching keyword information with candidate search results are well known to those skilled in the art and may, for example, be implemented by text matching; the embodiment of the present application places no limitation on this.
It can be understood that obtaining at least two target search results, combined with the user identifier, provides the data basis for sorting the target search results.
Step S130: determine user preference information according to the user identifier, and determine the ranking of the at least two target search results according to the user preference information.
It can be understood that the user preference information corresponding to the user identifier may be obtained from a user preference information database according to the user identifier. The user preference information in that database is derived from the historical search data corresponding to the user identifier, for example by analyzing historical browsing and click information: when the number of clicks on a historically browsed document A is detected to exceed a preset threshold, the tag information of document A is obtained and determined as the user preference information.
It can be understood that the user preference information characterizes the interests of the user corresponding to the user identifier. Building on the related art, which matches and ranks target search results according to keyword information, the technical solution of the present application adds the user preference information as a factor in the ranking, so that the order of the target search results presented to the user better fits the user's personalized search needs.
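The click-threshold example given for step S130 (document A is clicked more often than a preset threshold, so its tag becomes a preference) can be sketched as below. The function name `preferred_tags`, the click log, and the threshold value are hypothetical; the application does not specify how the log is stored or what threshold is used.

```python
from collections import Counter


def preferred_tags(click_log, doc_tags, threshold):
    """Return the tags of documents whose click count exceeds the threshold."""
    clicks = Counter(click_log)
    return sorted({doc_tags[doc] for doc, n in clicks.items() if n > threshold})


# One user's historical clicks, as a list of document identifiers (hypothetical).
click_log = ["A", "A", "A", "B", "A", "C", "C"]
doc_tags = {"A": "technical literature", "B": "news", "C": "news"}

print(preferred_tags(click_log, doc_tags, threshold=3))  # ['technical literature']
```

Only document A (four clicks) passes the threshold of three, so only its tag is recorded as a preference.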
In addition, referring to FIG. 2, in an embodiment, step S130 in the embodiment shown in FIG. 1 further includes but is not limited to the following steps:
Step S210: obtain historical search data from the log center according to the user identifier;
Step S220: determine the search result type according to the historical search data;
Step S230: determine the distribution information of the search result types within at least one preset time period, and normalize the distribution information to obtain the user preference information.
It can be understood that historical search data is obtained from the log center according to the user identifier, and the search result type is determined according to the historical search data; the search result type is the category label of the historically browsed results in the historical search data. Determining the distribution information of the search result types within at least one preset time period and normalizing it to obtain the user preference information improves the timeliness of the user preference information and strengthens its ability to characterize the user's browsing preferences. This in turn improves the match between the target search results and the search needs, so that the user's personalized search needs can be met.
It should be noted that the embodiment of the present application does not limit the rules for dividing the preset time periods or their number. For example, four preset time periods of 1 week, 1 month, 3 months, and 6 months may be used; those skilled in the art can choose according to the actual situation.
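Steps S210 to S230 can be sketched as counting clicks per search result type inside each time window and then normalizing the counts. The sum-to-one normalization used here is an assumption, since the application leaves the normalization method open; the function name `preference_by_window`, the window lengths, and the click data are likewise illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta


def preference_by_window(clicks, now, windows_days, tags):
    """clicks: list of (timestamp, tag). Returns {window_days: {tag: click share}}."""
    result = {}
    for days in windows_days:
        start = now - timedelta(days=days)
        counts = Counter(tag for ts, tag in clicks if start <= ts <= now)
        total = sum(counts.values())
        # Normalize so that the shares within one window sum to 1 (or stay 0 if empty).
        result[days] = {t: (counts[t] / total if total else 0.0) for t in tags}
    return result


now = datetime(2023, 1, 31)
clicks = [
    (now - timedelta(days=1), "news"),
    (now - timedelta(days=3), "news"),
    (now - timedelta(days=20), "technical"),
    (now - timedelta(days=80), "rules"),
]
prefs = preference_by_window(clicks, now, windows_days=(7, 30, 90),
                             tags=["news", "technical", "rules"])
print(prefs[90])  # {'news': 0.5, 'technical': 0.25, 'rules': 0.25}
```

Short windows capture recent interests (here, only news in the last 7 days), while longer windows retain older interests with a smaller share.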
It can be understood that the distribution information represents the number of clicks, within the preset time period, on the browsed documents corresponding to each search result type. The historical search data in the log center is continuously updated; when the number of clicks on browsed documents within a preset time period changes, the user preference information for the different preset time periods changes accordingly. However, if the click counts on the browsed documents of each category label are simply accumulated over time, the accumulated values grow ever larger and long-term interests become over-weighted, which weakens the timeliness of the user's interests. To preserve the timeliness of the user preference information, the embodiment of the present application proposes a method for updating it. The update formula can be expressed as:
$$\mathrm{score}_{\mathrm{update}} = \lambda \times \mathrm{score}_{\mathrm{old}} + (1 - \lambda) \times \mathrm{score}_{\mathrm{new}}$$
where $\mathrm{score}_{\mathrm{update}}$ is the user preference information within the preset time period after the update, $\mathrm{score}_{\mathrm{old}}$ is the user preference information within the preset time period before the update, and $\mathrm{score}_{\mathrm{new}}$ is the user preference information within the current preset time period. The value of $\mathrm{score}_{\mathrm{new}}$ is determined by the following formula:
Figure PCTCN2023071322-appb-000001
where $\mathrm{click}_i$ is the number of times the user clicked, within the current preset time period, on browsed documents corresponding to the $i$-th search result type, and $L$ is the set of search result types. $\lambda$ is the attenuation factor, with a value range of $[0, 1]$, and its value is determined by the following formula:
Figure PCTCN2023071322-appb-000002
where $c$ is the number of clicks by the user within the current preset time period, and $\alpha_t$ is a preset attenuation coefficient.
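The blending step score_update = λ × score_old + (1 − λ) × score_new can be sketched as follows. Because the formulas for score_new and λ are only available as figures, this sketch takes both as given inputs rather than computing them; the function name `update_preference` and the numeric values are hypothetical.

```python
def update_preference(score_old, score_new, lam):
    """Blend old and new per-tag preference scores with attenuation factor lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    tags = sorted(set(score_old) | set(score_new))
    return {tag: lam * score_old.get(tag, 0.0) + (1.0 - lam) * score_new.get(tag, 0.0)
            for tag in tags}


score_old = {"news": 0.5, "technical": 0.5}
score_new = {"news": 1.0, "technical": 0.0}

print(update_preference(score_old, score_new, lam=0.5))
# {'news': 0.75, 'technical': 0.25}
```

With λ close to 1 the stored preference barely moves, and with λ close to 0 it is almost replaced by the current window, which is how the attenuation factor trades off long-term against recent interest.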
It should be noted that the embodiment of the present application does not limit the way in which the distribution information is normalized; those skilled in the art can choose according to the actual situation.
In addition, referring to FIG. 3, in an embodiment, step S220 in the embodiment shown in FIG. 2 further includes but is not limited to the following steps:
Step S310: obtain the search result identifiers from the historical search data;
Step S320: obtain the historical search results from the search result database according to the search result identifiers, and determine the search result type of each historical search result.
It can be understood that the search result database stores all candidate search result resources of the search system. It is used to match the keyword information in the search request so as to select the target search results that match the keyword information, and the candidate search results stored in it carry search result type information. Obtaining the search result identifiers from the historical search data, retrieving the historical search results from the search result database by those identifiers, and determining their search result types provides the data basis for obtaining the user preference information.
In addition, referring to FIG. 4, in an embodiment, step S130 in the embodiment shown in FIG. 1 further includes but is not limited to the following steps:
Step S410: obtain a pre-trained first ranking model;
Step S420: input the keyword information, the user preference information, the historical search results, and the target search results into the first ranking model to obtain the sorted target search results.
It should be noted that the embodiment of the present application does not involve improving the first ranking model, nor does it limit the type of ranking model used. A click-through rate prediction model based on deep learning or machine learning may be chosen, such as a DeepFM model, a Wide&Deep model, or a LambdaMart model; those skilled in the art can choose according to the actual situation.
It can be understood that inputting the keyword information, the user preference information, the historical search results, and the target search results into the first ranking model enables the model to sort the target search results according to preset rules based on the keyword information, the user preference information, and the historical search results. Adding the user preference information and the historical search data as factors in the ranking makes the order of the target search results presented to the user better fit the user's personalized search needs.
It should be noted that the embodiment of the present application does not limit how the keyword information, the user preference information, the historical search results, and the target search results are processed before being input into the first ranking model. For example, data may be extracted from the keyword information, the user preference information, and the historical search results according to preset rules and spliced into feature data; the feature data and the target search results are then input into the pre-trained first ranking model, so that the model sorts the target search results according to the feature data, thereby improving the match between the search results and the search needs.
In addition, referring to FIG. 5, in an embodiment, obtaining the first ranking model in the embodiment shown in FIG. 4 includes, but is not limited to, the following steps:
Step S510: obtain training data according to the keyword information, the user preference information, and the historical search results;
Step S520: obtain a preset second ranking model, and train the second ranking model according to the training data to obtain the first ranking model.
It can be understood that the training data is obtained according to the keyword information, the user preference information, and the historical search results, and that the first ranking model obtained by training the preset second ranking model on this training data provides a data basis for determining the order of the target search results, so that the user's personalized search needs can be better met.
It should be noted that the embodiments of the present application do not limit the training method of the second ranking model; those skilled in the art may adjust the training parameters of the model according to the actual situation.
It should be noted that the embodiments of the present application do not limit how the training data is obtained from the keyword information, the user preference information, and the historical search results; the method steps shown in FIG. 6 may be used. Referring to FIG. 6, in an embodiment, obtaining the training data in the embodiment shown in FIG. 5 includes, but is not limited to, the following steps:
Step S610: determine first matching data according to the keyword information and the historical search results, the first matching data representing a matching score between the keyword information and the historical search results;
Step S620: associate the first matching data, the historical search results, and the user preference information to obtain the training data.
It can be understood that the matching score between the keyword information and the historical search results is determined, and the first matching data, the historical search results, and the user preference information are associated to obtain the training data. Training data obtained in this way has undergone data preprocessing and feature extraction, which improves how effectively the data is used when training the second ranking model.
In addition, referring to FIG. 7, in an embodiment, before step S420 in the embodiment shown in FIG. 4, the method further includes, but is not limited to, the following steps:
Step S710: determine second matching data according to the keyword information and the target search results, the second matching data representing a matching score between the keyword information and the target search results;
Step S720: sort the target search results according to the second matching data to obtain sorted target search results.
It can be understood that, before the keyword information, the user preference information, the historical search results, and the target search results are input into the first ranking model, the second matching data is determined according to the keyword information and the target search results, and the target search results are given a preliminary sort according to the second matching data. The sorted target search results are then input into the first ranking model together with the keyword information, the user preference information, and the historical search results. This can improve the accuracy of the first ranking model and make the target search results processed by the first ranking model better match the user's personalized needs.
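Steps S710 and S720 can be sketched as follows. The application does not specify how the matching score is computed; the sketch assumes, purely for illustration, that the score is the fraction of query terms that also occur in a result's title, and that each target search result is a dictionary with a `title` field. Both `match_score` and `presort` are hypothetical names.

```python
def match_score(keywords: str, text: str) -> float:
    """Assumed scoring rule: share of query terms that appear in the text."""
    query_terms = set(keywords.lower().split())
    if not query_terms:
        return 0.0
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms)


def presort(keywords: str, target_results: list[dict]) -> list[dict]:
    """Preliminary sort of the target results by matching score, best-first."""
    return sorted(target_results,
                  key=lambda result: match_score(keywords, result["title"]),
                  reverse=True)
```

The preliminarily sorted list would then be passed on to the first ranking model together with the other inputs.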
In addition, referring to FIG. 8, in an embodiment, after step S130 in the embodiment shown in FIG. 1, the method further includes, but is not limited to, the following step:
Step S810: filter the sorted target search results.
It can be understood that, after the target search results are sorted, the embodiments of the present application propose filtering the sorted target search results. This effectively avoids situations in which the target search result list contains duplicate data, in which target search results in the sorted list contain non-compliant content, or in which the list contains target search results that the user corresponding to the user identifier has no permission to browse. The match between the target search results and the search requirements is thereby improved, and the user's personalized search needs can be met.
It should be noted that the embodiments of the present application do not limit how the target search results are filtered. Preset filtering rules, defined according to the needs of the actual business scenario, may be used. For example, a matching score between each pair of target search results is obtained, and the target search results are filtered according to the matching score: if the matching score between target search result A and target search result B is greater than a preset threshold, target search result A or target search result B is deleted from the presented target search result list. Alternatively, permission verification information is obtained for each target search result, the permission verification information including the user identifiers that have browsing permission for that target search result; the current user identifier is matched against the permission verification information of each target search result to obtain the target search results that the current user identifier is permitted to browse. Those skilled in the art may adjust and select these options according to the actual situation.
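The two filtering rules mentioned above (removing near-duplicates whose pairwise score exceeds a threshold, and keeping only results the current user may browse) might be combined as in the following sketch. The Jaccard similarity on title terms, the threshold value of 0.8, and the `title` and `allowed_users` fields are assumptions chosen for illustration; an actual system would define its own rules for the business scenario.

```python
def jaccard(a: str, b: str) -> float:
    """Assumed pairwise score: Jaccard similarity of the two titles' terms."""
    terms_a, terms_b = set(a.lower().split()), set(b.lower().split())
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def filter_results(sorted_results: list[dict], user_id: str,
                   threshold: float = 0.8) -> list[dict]:
    """Drop results the user may not browse and near-duplicates of kept ones."""
    kept: list[dict] = []
    for result in sorted_results:
        # Permission check: the result lists the user identifiers allowed to view it.
        if user_id not in result["allowed_users"]:
            continue
        # Duplicate check: keep only the higher-ranked of two similar results.
        if any(jaccard(result["title"], k["title"]) > threshold for k in kept):
            continue
        kept.append(result)
    return kept
```

Because the input is already sorted, the higher-ranked member of each near-duplicate pair is the one that survives.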
In addition, in order to describe the search result sorting method provided by the present application in more detail, the technical solutions of the present application are described below by way of three embodiments.
Referring to FIG. 9, FIG. 9 is a schematic block diagram of a search system provided by another embodiment of the present application. The search system 900 includes functional modules and data units. The functional modules include: a user interface module 901, a document recall module 902, a secondary ranking module 903, a document filtering module 904, an automatic document labeling module 905, a data preprocessing module 906, and a model training module 907. The data units include: a log center 908, a user preference information database 909, and a document database 910. The functions of the modules of the search system 900 are described below:
User interface module 901. The function of the user interface module 901 is to interact with the user through a front-end page. The user enters search keyword information through the user interface module 901; the user interface module 901 passes the search request to the relevant back-end modules, and finally receives the target search result list returned by the back-end modules and displays it to the user on the front-end page. In addition, the user interface module 901 records the user's historical search data, including the search requests submitted by the user, the search result information returned by the system, and the user's browsing information for the documents in the search results, and passes this information to the log center 908 for storage.
Document recall module 902. The function of the document recall module 902 is to perform a preliminary match between the search keyword information entered by the user and all the document information in the document database 910 and to recall a small number of candidate documents, providing a data basis for obtaining target search results that meet the user's personalized search needs.
Secondary ranking module 903. The function of the secondary ranking module 903 is to re-rank, using a ranking model, the candidate documents recalled by the document recall module 902. The secondary ranking module 903 includes a trained secondary ranking model; the model may be a click-through-rate prediction model based on deep learning or machine learning, such as a DeepFM model, a Wide&Deep model, or a LambdaMart model. The secondary ranking module 903 receives the features passed from the data preprocessing module 906, inputs them into the secondary ranking model, outputs scores for the candidate documents, sorts the documents accordingly, and passes the sorted document list to the downstream document filtering module 904.
Document filtering module 904. The function of the document filtering module 904 is to filter the document list output by the secondary ranking module 903 according to certain filtering rules, for example removing duplicate documents from the document list and performing document review, that is, removing non-compliant documents and documents that the user has no permission to access. Finally, the filtered document list is passed to the user interface module 901.
Automatic document labeling module 905. The function of the automatic document labeling module 905 is to automatically assign document labels to the documents in the database of the search system 900 by means of text classification techniques from the field of natural language processing. The principle is as follows: first, a label system that describes all document categories reasonably accurately is constructed from domain expert knowledge; next, annotators who are familiar with the documents and the label system manually label a portion of the documents, producing high-quality manually labeled data; the manually labeled data is then used as training data to train a text classification model by supervised learning; finally, the trained text classification model is used to assign document labels to all documents in the database, and the same operation is performed on newly added documents, ensuring that every document in the document database 910 of the search system 900 has a corresponding document label.
Data preprocessing module 906. The function of the data preprocessing module 906 is divided into an online inference scenario and an offline scenario. In the online inference scenario, the data preprocessing module 906 receives the keyword information passed by the user interface module 901, the user preference information extracted from the user preference information database 909, and the corresponding information of the candidate documents in the document database 910, and performs feature extraction and preprocessing to produce the input features required by the online secondary ranking model. In the offline scenario, the data preprocessing module 906 receives the historical search data extracted from the log center 908 together with the corresponding user preference information and document information, performs feature extraction and preprocessing in the same way as in the online inference scenario, combines the processed model input features with the document labels in the historical search data to form training data, and passes the training data to the model training module 907 for model training.
Model training module 907. The function of the model training module 907 is to train the ranking model using the training data, and to update the model that has passed offline training and offline testing into the online secondary ranking module 903.
Log center 908. The function of the log center 908 is to store all the search record data generated when users access the search system 900, including the search keyword information submitted by the user, the target search results returned by the system, and the user's browsing information for the documents in the search results (whether a document was clicked and browsed, the browsing duration, and so on). In addition, in the offline scenario, the log center 908 is responsible for providing the users' historical search data so that subsequent modules can generate training data for the ranking model.
User preference information database 909. The function of the user preference information database 909 is to maintain the user preference information and update it periodically. The principle is as follows: based on the user's historical document click data, the distribution of the category labels of the documents the user clicked is counted separately for each time dimension, and each distribution is normalized, yielding the current user's preference information in the given time dimension.
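The per-time-dimension label distribution described above can be sketched as follows. The three labels and the four time periods are taken from the example used elsewhere in this application; the function name `preference_vectors`, the `(timestamp, label)` shape of a click record, and the choice of normalizing each distribution so that it sums to 1 are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

LABELS = ["news", "technical", "regulation"]   # example label system
WINDOWS_DAYS = [5, 10, 30, 60]                 # example preset time periods


def preference_vectors(clicks: list[tuple[datetime, str]], now: datetime,
                       labels: list[str] = LABELS,
                       windows: list[int] = WINDOWS_DAYS) -> list[list[float]]:
    """Return one normalized label distribution per time window."""
    vectors = []
    for days in windows:
        cutoff = now - timedelta(days=days)
        # Count the labels of documents clicked inside this window.
        counts = Counter(label for clicked_at, label in clicks if clicked_at >= cutoff)
        total = sum(counts[label] for label in labels)
        # Assumed normalization: divide by the total so the scores sum to 1.
        vectors.append([counts[label] / total if total else 0.0 for label in labels])
    return vectors
```

With three labels and four windows, the result is four vectors of dimension 3, one per preset time period.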
Document database 910. The function of the document database 910 is to store all the document resources of the search system 900, including information related to the documents, such as the document title, document body, and document labels.
It should be noted that the method steps of Embodiment 1, Embodiment 2, and Embodiment 3 are applied to the search system 900 shown in FIG. 9.
Embodiment 1. Referring to FIG. 10, FIG. 10 is a flowchart of the steps of a ranking model training method provided by another embodiment of the present application. The ranking model training method includes the following steps:
Step S1010: the user interface module 901 passes the historical search data to the log center 908 for storage. The historical search data includes the search keyword information entered by the user, the search result list displayed to the user by the search system 900, and the user's click and browsing information for the documents in the search result list.
Step S1020: the data preprocessing module 906 obtains the historical search data from the log center 908. The format of the historical search data in the log center 908 may be as shown in Table 1:
Figure PCTCN2023071322-appb-000003
Table 1: List of historical search data
Step S1030: the data preprocessing module 906 obtains the user identifier from the historical search data and extracts the user preference information from the user preference information database 909 according to the user identifier. The user preference information represents the preference scores, within preset time periods, of the document labels corresponding to the historically searched documents. For example, assuming that the document label system in the automatic document labeling module 905 has three labels (for example, the three categories news, technical literature, and rules and regulations), and that the preset time periods are the four dimensions of 5 days, 10 days, 30 days, and 60 days, each user's preference feature consists of four normalized vectors of dimension 3, which respectively represent the user's preference scores for the three labels within the corresponding preset time period.
Step S1040: the data preprocessing module 906 obtains the search result document identifiers from the historical search data and obtains the corresponding document information from the document database 910 according to the search result document identifiers. The document information includes the document title, the document body, the document's category label, and the document's historical click count.
Step S1050: the data preprocessing module 906 performs feature extraction, preprocessing, and splicing on the data obtained from the log center 908, the user preference information database 909, and the document database 910 respectively, that is, the historical search data, the user preference information, and the document information, to produce the training data. The format of the training data may be as shown in Table 2:
Sample ID    User ID    Document ID    Feature vector    Sample label
0            User101    Doc 103        [0.1,0.3,...]     1
1            User101    Doc 101        [1.0,0.4,...]     0
Table 2: Training data list
Here, the sample ID is the serial number of the training sample; the user ID and the document ID are the unique identifiers of the user and the document, respectively. The feature vector is spliced from features extracted from the input data and has three parts: matching features, user preference information, and document features. The matching features include the matching scores between the keyword information and the document title and document body; the user preference information includes the feature information of the corresponding user in the different preset time periods, extracted directly from the user preference information database 909; the document features include vector representations of the document title and document body. The sample label is the training target of the sample; whether the user clicked the corresponding document is generally used as the sample label, with 1 for clicked and 0 for not clicked. The data preprocessing module 906 integrates all the training samples to generate the training data, which is passed to the model training module 907.
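A training sample with the fields of Table 2 might be assembled as in the sketch below. The function name `build_sample`, the dictionary keys, and the order in which the three feature groups are concatenated are illustrative assumptions; the application only states that the feature vector is spliced from matching features, user preference information, and document features.

```python
from typing import Sequence


def build_sample(sample_id: int, user_id: str, doc_id: str,
                 match_feats: Sequence[float],
                 preference_vectors: Sequence[Sequence[float]],
                 doc_feats: Sequence[float],
                 clicked: bool) -> dict:
    """Assemble one training row: identifiers, spliced features, click label."""
    # Flatten the per-time-period preference vectors into one list of scores.
    flat_preference = [score for vector in preference_vectors for score in vector]
    return {
        "sample_id": sample_id,
        "user_id": user_id,
        "doc_id": doc_id,
        "feature_vector": [*match_feats, *flat_preference, *doc_feats],
        "label": 1 if clicked else 0,  # 1 = clicked, 0 = not clicked
    }
```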
Step S1060: the model training module 907 trains the ranking model using the training data obtained from the data preprocessing module 906; the trained ranking model is used to update the ranking model in the secondary ranking module 903.
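The training step can be illustrated with a deliberately simple stand-in. The application names click-through-rate models such as DeepFM, Wide&Deep, and LambdaMart; the sketch below instead uses plain logistic regression trained by stochastic gradient descent, chosen only because it fits in a few lines, and should not be read as the model of the embodiment. The function names and hyperparameter values are assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_click_model(samples: list[tuple[list[float], int]],
                      epochs: int = 200, lr: float = 0.5) -> tuple[list[float], float]:
    """Fit weights and a bias on (feature_vector, click_label) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            predicted = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = predicted - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def predict_click(model: tuple[list[float], float], features: list[float]) -> float:
    """Predicted click probability, usable as a ranking score."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
```

The predicted click probability can serve as the score by which candidate documents are sorted.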
Embodiment 2. Referring to FIG. 11, FIG. 11 is a flowchart of the steps of a search result sorting method provided by another embodiment of the present application. The search result sorting method includes the following steps:
Step S1110: the user interface module 901 exchanges information with the user of the search system 900 through the front-end page. The user submits keyword information through the search box on the page; after the user clicks the search button, the user interface module 901 passes the keyword information to the document recall module 902.
Step S1120: the document recall module 902 performs a preliminary match between the keyword information and all the document information in the document database 910, for example using the matching score between the keyword information and the document title, recalls a portion of the documents in order of score as the candidate target document list, and sends the target document list to the secondary ranking module 903.
Step S1130: the data preprocessing module 906 performs feature extraction, preprocessing, and splicing on the keyword information transmitted by the user interface module 901, the user preference information provided by the user preference information database 909, and the document information of the documents in the candidate document list provided by the document database 910, produces feature vectors in the same format as the training data, and passes them to the secondary ranking module 903.
Step S1140: the secondary ranking module 903 inputs the feature vectors passed by the data preprocessing module 906 into the pre-trained ranking model to produce ranking scores for the documents in the candidate document list, and re-sorts the documents by ranking score to produce the sorted target document list.
Step S1150: the document filtering module 904 filters the sorted target document list according to preset filtering rules. The filtering rules generally include removing duplicate documents, removing documents with non-compliant content, and removing documents that the user has no permission to browse. The filtered document list is passed to the user interface module 901 as the target search result list.
Step S1160: the user interface module 901 displays the target search result list returned by the document filtering module 904 to the user on the page, and the user can click and browse the corresponding documents as needed.
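The flow of steps S1120 to S1150 can be summarized in one function that wires the stages together. In this sketch each stage is passed in as a callable, so the names `recall`, `featurize`, `score`, and `allow`, their signatures, and the `top_k` cut-off are hypothetical interfaces introduced for illustration rather than interfaces defined by the application.

```python
from typing import Callable


def search(keywords: str, user_id: str,
           recall: Callable[[str], list[dict]],
           featurize: Callable[[str, str, dict], list[float]],
           score: Callable[[list[float]], float],
           allow: Callable[[str, dict], bool],
           top_k: int = 10) -> list[dict]:
    """Recall candidates, re-rank them by model score, then filter."""
    candidates = recall(keywords)                               # S1120
    scored = [(score(featurize(keywords, user_id, doc)), doc)   # S1130, S1140
              for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)         # stable, best-first
    ranked = [doc for _, doc in scored]
    return [doc for doc in ranked if allow(user_id, doc)][:top_k]  # S1150
```

Keeping the stages as separate callables mirrors the module split of the search system 900, where recall, preprocessing, ranking, and filtering are distinct modules.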
Embodiment 3. Referring to FIG. 12, FIG. 12 is a flowchart of the steps of labeling document information provided by another embodiment of the present application. The application scenario of these steps is the data interaction between the automatic document labeling module 905 and the document database 910. The steps of labeling document information are as follows:
Step S1210: the automatic document labeling module 905 obtains a pre-labeled document data set, the document data set carrying document labels.
Step S1220: the automatic document labeling module 905 obtains a preset text classification model and trains the text classification model on the document data set to obtain a document labeling model.
Step S1230: the automatic document labeling module 905 obtains document information from the document database 910 and inputs the document information into the document labeling model to obtain document information carrying document labels.
Step S1240: the automatic document labeling module 905 stores the document information carrying the document labels in the document database 910.
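Steps S1210 to S1230 amount to supervised text classification. The application does not name a specific classifier; the sketch below assumes a multinomial naive Bayes model with add-one smoothing over whitespace-separated tokens, written without external libraries. The function names and the `(text, label)` shape of the labeled data are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def train_label_model(labeled_docs: list[tuple[str, str]]):
    """S1210-S1220: count words per label from (text, label) pairs."""
    word_counts: dict[str, Counter] = defaultdict(Counter)
    label_counts: Counter = Counter()
    vocabulary: set[str] = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary


def predict_label(model, text: str) -> str:
    """S1230: return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_log_prob = "", -math.inf
    for label in sorted(label_counts):
        log_prob = math.log(label_counts[label] / total_docs)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for token in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the probability.
            log_prob += math.log((word_counts[label][token] + 1) / denominator)
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label
```

Each document in the database would be passed through `predict_label` and stored again together with its predicted label, corresponding to step S1240.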
In addition, referring to FIG. 13, FIG. 13 is a schematic structural diagram of a search system 1300 provided by another embodiment of the present application. An embodiment of the present application further provides a search system 1300, which includes: a memory 1310, a processor 1320, and a computer program stored in the memory 1310 and executable on the processor 1320.
The processor 1320 and the memory 1310 may be connected by a bus or in other ways.
The non-transitory software programs and instructions required to implement the search result sorting method of the above embodiments are stored in the memory 1310 and, when executed by the processor 1320, perform the search result sorting method of the above embodiments, for example, method steps S110 to S130 in FIG. 1, method steps S210 to S230 in FIG. 2, method steps S310 to S320 in FIG. 3, method steps S410 to S420 in FIG. 4, method steps S510 to S520 in FIG. 5, method steps S610 to S620 in FIG. 6, method steps S710 to S720 in FIG. 7, and method step S810 in FIG. 8, as described above.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When executed by a processor 1320 or a controller, for example by a processor 1320 in the above embodiment of the search system 1300, the computer-executable instructions cause the processor 1320 to perform the search result sorting method of the above embodiments, for example, method steps S110 to S130 in FIG. 1, method steps S210 to S230 in FIG. 2, method steps S310 to S320 in FIG. 3, method steps S410 to S420 in FIG. 4, method steps S510 to S520 in FIG. 5, method steps S610 to S620 in FIG. 6, method steps S710 to S720 in FIG. 7, and method step S810 in FIG. 8, as described above.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or appropriate combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The embodiments of the present application include a search result sorting method, a search system, and a computer-readable storage medium, wherein the search result sorting method includes: obtaining a search request, the search request including keyword information and a user identifier; obtaining at least two target search results according to the keyword information; determining user preference information according to the user identifier; and determining the ranking of the at least two target search results according to the user preference information. According to the solution provided by the embodiments of the present application, the user preference information is determined according to the user identifier, and the ranking of the search results is determined according to the user preference information. Compared with existing technical solutions that match and sort the target search results according to keyword information alone, the technical solution of the present application can improve how well the target search results match the search requirements, and can therefore meet the user's personalized search needs.
Several implementations of the present application have been described above, but the present application is not limited to these implementations. Those skilled in the art may make various equivalent modifications or replacements without departing from the spirit of the present application, and all such equivalent modifications or replacements fall within the scope defined by the claims of the present application.

Claims (10)

  1. A search result ranking method, comprising:
    obtaining a search request, the search request comprising keyword information and a user identifier;
    obtaining at least two target search results according to the keyword information;
    determining user preference information according to the user identifier, and determining a ranking of the at least two target search results according to the user preference information.
  2. The method according to claim 1, wherein determining the user preference information according to the user identifier comprises:
    obtaining historical search data from a log center according to the user identifier;
    determining search result types according to the historical search data;
    determining distribution information of the search result types within at least one preset time period;
    normalizing the distribution information to obtain the user preference information.
  3. The method according to claim 2, wherein determining the search result types according to the historical search data comprises:
    obtaining a search result identifier from the historical search data;
    obtaining a historical search result from a search result database according to the search result identifier, and determining the search result type of the historical search result.
  4. The method according to claim 3, wherein determining the ranking of the at least two target search results according to the user preference information comprises:
    obtaining a pre-trained first ranking model;
    inputting the keyword information, the user preference information, the historical search results and the target search results into the first ranking model to obtain ranked target search results.
  5. The method according to claim 4, wherein the first ranking model is obtained through the following training steps:
    obtaining training data according to the keyword information, the user preference information and the historical search results;
    obtaining a preset second ranking model, and training the second ranking model according to the training data to obtain the first ranking model.
  6. The method according to claim 5, wherein obtaining the training data according to the keyword information, the user preference information and the historical search results comprises:
    determining first matching data according to the keyword information and the historical search results, the first matching data characterizing a matching score between the keyword information and the historical search results;
    associating the first matching data, the historical search results and the user preference information to obtain the training data.
  7. The method according to claim 4, further comprising, before inputting the keyword information, the user preference information, the historical search results and the target search results into the first ranking model:
    determining second matching data according to the keyword information and the target search results, the second matching data characterizing a matching score between the keyword information and the target search results;
    ranking the target search results according to the second matching data to obtain ranked target search results.
  8. The method according to claim 1, further comprising, after determining the ranking of the at least two target search results according to the user preference information:
    filtering the ranked target search results.
  9. A search system, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the search result ranking method according to any one of claims 1 to 8.
  10. A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the search result ranking method according to any one of claims 1 to 8.
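
The preference computation of claims 2 and 3 (historical search data, lookup of result types, distribution per preset time period, normalization) might look roughly like the following sketch. The data shapes are assumptions made for illustration: `history` stands in for log-center records as `(timestamp, result_id)` pairs, `type_of` stands in for the search result database, the 7-day and 30-day windows are arbitrary, and normalization is taken to mean scaling each window's counts to sum to 1, which the claims do not specify.

```python
from collections import Counter
from datetime import datetime, timedelta


def preference_info(history, type_of, now, windows_days=(7, 30)):
    """Return {window_days: {result_type: share}} for one user.

    history: iterable of (timestamp, result_id) pairs (assumed log format).
    type_of: dict mapping result_id -> result type (assumed database lookup).
    """
    prefs = {}
    for days in windows_days:
        cutoff = now - timedelta(days=days)
        # Distribution of result types among results seen within this window.
        counts = Counter(
            type_of[rid]
            for ts, rid in history
            if ts >= cutoff and rid in type_of
        )
        total = sum(counts.values())
        # Normalize counts so the shares in each window sum to 1.
        prefs[days] = {t: c / total for t, c in counts.items()} if total else {}
    return prefs
```

Keeping one distribution per window lets a downstream ranking model weigh recent interests separately from long-term ones; whether the application combines windows this way is not stated in the claims.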
PCT/CN2023/071322 2022-01-19 2023-01-09 Search result sorting method, search system and computer-readable storage medium WO2023138428A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210059836.3A CN116501951A (en) 2022-01-19 2022-01-19 Search result ordering method, search system, and computer-readable storage medium
CN202210059836.3 2022-01-19

Publications (1)

Publication Number Publication Date
WO2023138428A1 (en) 2023-07-27

Family

ID=87321812

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/071322 WO2023138428A1 (en) 2022-01-19 2023-01-09 Search result sorting method, search system and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN116501951A (en)
WO (1) WO2023138428A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8938463B1 (en) * 2007-03-12 2015-01-20 Google Inc. Modifying search result ranking based on implicit user feedback and a model of presentation bias
CN104462293A (en) * 2014-11-27 2015-03-25 百度在线网络技术(北京)有限公司 Search processing method and method and device for generating search result ranking model
CN107885889A (en) * 2017-12-13 2018-04-06 聚好看科技股份有限公司 Feedback method, methods of exhibiting and the device of search result
CN110020128A (en) * 2017-10-26 2019-07-16 阿里巴巴集团控股有限公司 A kind of search result ordering method and device
CN110489638A (en) * 2019-07-08 2019-11-22 广州视源电子科技股份有限公司 A kind of searching method, device, server, system and storage medium
CN111143695A (en) * 2019-12-31 2020-05-12 腾讯科技(深圳)有限公司 Searching method, searching device, server and storage medium
CN111177551A (en) * 2019-12-27 2020-05-19 百度在线网络技术(北京)有限公司 Method, device, equipment and computer storage medium for determining search result

Also Published As

Publication number Publication date
CN116501951A (en) 2023-07-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23742743

Country of ref document: EP

Kind code of ref document: A1