CN114780846A - Ranking model training method, device, medium and equipment of information retrieval system - Google Patents


Info

Publication number
CN114780846A
Authority
CN
China
Prior art keywords: sample, sample data, loss function, weight, query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210431856.9A
Other languages
Chinese (zh)
Inventor
曲瑛琪
吴奇飞
刘璟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210431856.9A priority Critical patent/CN114780846A/en
Publication of CN114780846A publication Critical patent/CN114780846A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F 16/43 Querying
    • G06F 16/435 Filtering based on additional data, e.g. user or group profiles
    • G06F 16/437 Administration of user profiles, e.g. generation, initialisation, adaptation, distribution

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a ranking model training method, apparatus, medium, and device for an information retrieval system, and relates to the field of artificial intelligence technology, in particular to intelligent search. The implementation scheme is as follows: acquiring a plurality of sample data batches; and sequentially performing the following operations on each sample data batch: inputting the sample query result and the corresponding query content of each sample data in the sample data batch into a ranking model to obtain a relevance prediction result between the sample query result corresponding to the sample data and the corresponding query content; calculating a single-sample loss function and a sample-pair loss function of the sample data batch based on the relevance prediction result and the relevance rank label of each sample data; calculating a comprehensive loss function of the sample data batch based on the single-sample loss function and the sample-pair loss function; and adjusting a plurality of parameters of the ranking model based on the comprehensive loss function.

Description

Ranking model training method, device, medium and equipment of information retrieval system
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for training a ranking model of an information retrieval system, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Artificial intelligence is the discipline that studies making computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), with technologies at both the hardware and software levels. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
To alleviate the problem of information overload, information retrieval (IR) technology has been extensively studied. The goal of information retrieval is, given a large-scale database and a user query, to find the data in the database most relevant to the query, thereby satisfying the user's information needs.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been acknowledged in any prior art, unless otherwise indicated.
Disclosure of Invention
The present disclosure provides a method, apparatus, electronic device, computer-readable storage medium, and computer program product for ranking model training of an information retrieval system.
According to an aspect of the present disclosure, there is provided a training method of a ranking model for an information retrieval system, including: obtaining a plurality of sample data batches, wherein each sample data batch in the plurality of sample data batches comprises a plurality of sample data, and each sample data of the plurality of sample data comprises a sample query result, query content corresponding to the sample query result, and a relevance rank label indicating the relevance between the sample query result and the corresponding query content; and sequentially carrying out the following operations on each sample data batch: inputting the sample query result and the corresponding query content of each sample data in the sample data batch into a ranking model to obtain a relevance prediction result between the sample query result corresponding to the sample data and the corresponding query content; calculating a single-sample loss function and a sample-pair loss function of the sample data batch based on the relevance prediction result and the relevance rank label of each sample data; calculating a comprehensive loss function of the sample data batch based on the single-sample loss function and the sample-pair loss function; and adjusting a plurality of parameters of the ranking model based on the comprehensive loss function.
According to another aspect of the present disclosure, there is provided a ranking method for an information retrieval system, including: recalling a plurality of query results based on the query content; inputting each query result in the plurality of query results into a ranking model in combination with the query content so as to obtain a relevancy score between each query result in the plurality of query results and the query content through the ranking model, wherein the ranking model is obtained by training based on the training method for the ranking model of the information retrieval system; and determining an order of ranking of the plurality of query results based on the relevancy score of each of the plurality of query results to the query content.
According to another aspect of the present disclosure, there is provided an information retrieval system including: a recall model configured to recall a plurality of query results based on query content; the ranking model is configured to obtain a relevance score of each query result in the plurality of query results and the query content based on the plurality of query results and the query content so as to be used for ranking the plurality of query results, wherein the ranking model is obtained based on training of the training method for the ranking model of the information retrieval system.
According to another aspect of the present disclosure, there is provided a training apparatus for a ranking model of an information retrieval system, including: an acquisition unit configured to acquire a plurality of sample data batches, wherein each sample data batch in the plurality of sample data batches comprises a plurality of sample data, and each sample data of the plurality of sample data comprises a sample query result, query content corresponding to the sample query result, and a relevance rank label indicating the relevance between the sample query result and the corresponding query content; and an execution unit configured to perform, for each sample data batch in turn, the operations of the following subunits: an input subunit configured to input the sample query result and the corresponding query content of each sample data in the sample data batch into the ranking model to obtain a relevance prediction result between the sample query result corresponding to the sample data and the corresponding query content; a first calculation subunit configured to calculate a single-sample loss function and a sample-pair loss function of the sample data batch based on the relevance prediction result and the relevance rank label of each sample data; a second calculation subunit configured to calculate a comprehensive loss function of the sample data batch based on the single-sample loss function and the sample-pair loss function; and an adjustment subunit configured to adjust a plurality of parameters of the ranking model based on the comprehensive loss function.
According to another aspect of the present disclosure, there is provided a ranking apparatus for an information retrieval system, including: a recall unit configured to recall a plurality of query results based on query content; an input unit configured to input each query result of the plurality of query results, in combination with the query content, into the ranking model so as to obtain, through the ranking model, a relevance score between each query result and the query content, wherein the ranking model is trained with the training method for the ranking model of the information retrieval system; and a determination unit configured to determine an arrangement order of the plurality of query results based on the relevance score between each query result and the query content.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a training method for a ranking model of an information retrieval system or a ranking method for an information retrieval system.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute a training method for a ranking model of an information retrieval system or a ranking method for an information retrieval system.
According to another aspect of the disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when being executed by a processor, implements a training method for a ranking model of an information retrieval system or a ranking method for an information retrieval system.
According to one or more embodiments of the disclosure, sample data is input into the ranking model in batches, and a single-sample loss function and a sample-pair loss function are calculated from the model output of each batch to obtain a comprehensive loss function, so that model parameters are adjusted based on the comprehensive loss function. The ranking model can thereby be trained by combining the two training modes, improving the ranking accuracy of the ranking model.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the embodiments and, together with the description, serve to explain the exemplary implementations of the embodiments. The illustrated embodiments are for purposes of example only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a training method for a ranking model of an information retrieval system according to an embodiment of the present disclosure;
FIG. 3 shows a flow diagram for obtaining a plurality of batches of sample data, according to an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of sample pairs in an exemplary sample data batch;
FIG. 5 shows a flow diagram for computing a composite loss function for the sample data batch according to an embodiment of the present disclosure;
FIG. 6 shows a flow diagram of a ranking method for an information retrieval system according to an embodiment of the present disclosure;
FIG. 7 shows a block diagram of an information retrieval system according to an embodiment of the present disclosure;
FIG. 8 shows a block diagram of a training apparatus for a ranking model of an information retrieval system according to an embodiment of the present disclosure;
FIG. 9 shows a block diagram of a ranking apparatus for an information retrieval system, according to an embodiment of the present disclosure;
FIG. 10 shows a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, it will be recognized by those of ordinary skill in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to define a positional relationship, a temporal relationship, or an importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the element may be one or a plurality of. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
With the popularization of the internet and the development of information technology, the flood of information imposes a heavy cognitive burden on users. To alleviate this problem of information overload, information retrieval (IR) technology has been widely studied. The goal of information retrieval is, given a large-scale database and a user query, to find the data in the database most relevant to the query, thereby satisfying the user's information needs.
Information retrieval systems typically contain two cascaded models: a recall model and a ranking model. Given a user query, the recall model is responsible for quickly screening a candidate data set relevant to the query out of the large-scale database, and the ranking model is responsible for further ordering the candidate data set and finally returning a list of relevant data for the user's reference.
In the related art, common training modes for the ranking model include a single-sample (Pointwise) training mode and a sample-pair (Pairwise) training mode. A ranking model trained in the single-sample (Pointwise) mode predicts the relevance between an individual query result and the user query more accurately, but ranks two query results against each other less accurately; a ranking model trained in the sample-pair (Pairwise) mode ranks two query results more consistently with user expectations, but predicts the relevance of each individual query result poorly.
Therefore, the inventors propose a training method for a ranking model of an information retrieval system, in which sample data is input into the ranking model in batches, and a single-sample loss function and a sample-pair loss function are calculated from the model output of each batch to obtain a comprehensive loss function, so that model parameters are adjusted based on the comprehensive loss function. The ranking model can thus be trained by combining the two training modes, improving its ranking accuracy.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable the method for ranking model training for an information retrieval system to be performed.
In some embodiments, the server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with the server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
A user may use client devices 101, 102, 103, 104, 105, and/or 106 to collect and upload sample data. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptops), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various Mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include a variety of handheld gaming devices, internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and/or 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and/or 106.
In some embodiments, the server 120 may be a server of a distributed system, or a server incorporating a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that addresses the drawbacks of difficult management and weak business scalability found in conventional physical hosts and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and video files. The database 130 may reside in various locations. For example, the database used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The database 130 may be of different types. In certain embodiments, the database used by the server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to the command.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or conventional stores supported by a file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
According to an embodiment of the present disclosure, as shown in fig. 2, there is provided a training method for a ranking model of an information retrieval system, including: step S201, obtaining a plurality of sample data batches, wherein each sample data batch in the plurality of sample data batches comprises a plurality of sample data, and each sample data of the plurality of sample data comprises a sample query result, query content corresponding to the sample query result, and a relevance rank label indicating the relevance between the sample query result and the corresponding query content; and sequentially performing the following operations on each sample data batch: step S202, inputting the sample query result and the corresponding query content of each sample data in the sample data batch into a ranking model to obtain a relevance prediction result between the sample query result corresponding to the sample data and the corresponding query content; step S203, calculating a single-sample loss function and a sample-pair loss function of the sample data batch based on the relevance prediction result and the relevance rank label of each sample data; step S204, calculating a comprehensive loss function of the sample data batch based on the single-sample loss function and the sample-pair loss function; and step S205, adjusting a plurality of parameters of the ranking model based on the comprehensive loss function.
Thus, sample data is input into the ranking model in batches, and a single-sample loss function and a sample-pair loss function are calculated from the model output of each batch to obtain a comprehensive loss function, so that model parameters are adjusted based on the comprehensive loss function. In this way, the ranking model can be trained by combining the two training modes, improving its ranking accuracy.
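To make steps S201 to S205 concrete, the sketch below shows the batch-level training loop in Python. It is a minimal illustration, not the patent's implementation: the helpers pointwise_loss, pairwise_loss, and composite_loss (sketched in later sections) and the (query, result, label) batch layout are assumed names introduced here.

```python
import torch

def train_one_round(model, optimizer, sample_data_batches):
    """Sequentially perform steps S202-S205 on each sample data batch."""
    for batch in sample_data_batches:
        queries, results, labels = zip(*batch)        # each sample: (query content, query result, label)
        scores = model(list(queries), list(results))  # S202: relevance prediction results s_ij
        labels = torch.tensor(labels)
        l_point = pointwise_loss(scores, labels)          # S203: single-sample loss
        l_pair = pairwise_loss(scores, labels, queries)   # S203: sample-pair loss
        loss = composite_loss(l_point, l_pair)            # S204: comprehensive loss
        optimizer.zero_grad()
        loss.backward()                                   # S205: adjust model parameters
        optimizer.step()
```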
In some embodiments, the ranking model may be composed of a pre-trained language model cascaded with a fully-connected layer, where the pre-trained language model may be, for example, ERNIE, BERT, RoBERTa, MacBERT, or a similar model.
In some embodiments, each sample data used for training includes a sample query result and its corresponding query content. The sample query result may take the form of text, image, audio, video, and the like, and the corresponding query content may take the form of text or voice information, and the like. Semantic features can be extracted from the sample query result and the corresponding query content respectively, the semantic features input into the pre-trained language model in the ranking model, and the relevance prediction result between the sample query result and the corresponding query content output through the fully-connected layer in the ranking model.
In some embodiments, the sample query results may be text data in a document, or text data extracted from original sample data in the form of images, audio, or video. Voice information in audio and video can be converted into text data through speech recognition or character recognition technology, and corresponding text data such as titles and description text can be extracted from audio, video, and images respectively.
The query content may be text query information, or text data extracted from voice query information. When applied to model training, the sample query result and the query content can be spliced into one piece of text data by text concatenation, the text data input into the pre-trained language model in the ranking model, and the relevance prediction result between the sample query result and the corresponding query content output through the fully-connected layer in the ranking model.
Each sample query result corresponds to a relevance grade label, and the label can be used for indicating the relevance between the sample query result and the corresponding query content.
In some embodiments, the relevance rank labels may include multiple levels; the larger the value of a label, the higher the level, and the greater the relevance between the corresponding sample query result and its query content.
In some examples, the relevance rank label can take a value in [0, 1, 2, ..., z], z ≥ 1, where z may, for example, take the value 3, 4, or 5. Taking z = 3 as an example, the relevance rank labels are divided into 4 levels, namely 0, 1, 2, and 3, where label 0 indicates that the sample query result is irrelevant to the query content, label 1 that it is weakly relevant, label 2 that it is relevant, and label 3 that it is strongly relevant.
It can be understood that persons skilled in the relevant art may set the values of the relevance rank labels and the number of label levels as needed, which is not limited herein.
In some embodiments, as shown in fig. 3, obtaining a plurality of sample data batches may comprise: step S301, obtaining a plurality of sample data sets respectively corresponding to a plurality of query contents, wherein a plurality of sample data in each sample data set in the plurality of sample data sets are respectively sorted according to the numerical value of the corresponding relevancy grade label, and the sample data with the same relevancy grade label is randomly arranged; step S302, arranging a plurality of sample data sets according to a preset sequence to obtain a sample data sequence; and step S303, dividing the sample data sequence into a plurality of sample data batches according to a preset batch.
Thus, by arranging the plurality of sample data into a sample sequence ordered by query content and further by label level, and dividing batches based on this sequence, the sample data can be distributed more uniformly: most batches will include multiple sample data with different relevance rank labels for the same query content. This provides convenient conditions for calculating the sample-pair loss function and the single-sample loss function for each batch, avoids reuse of sample data, and thereby improves sample preparation and model training efficiency.
In some embodiments, a plurality of sample data sets respectively corresponding to a plurality of query contents may first be obtained, where the plurality of query contents constitute a query content set Q = {q_1, q_2, ..., q_m}, m ∈ N. Each query content q_i (i ∈ [1, m]) corresponds to a sample data set containing n(q_i) sample query results:

D_i = {(d_{i,1}, l_{i,1}), (d_{i,2}, l_{i,2}), ..., (d_{i,n(q_i)}, l_{i,n(q_i)})}

where l_{i,j} denotes the relevance rank label of the j-th sample query result d_{i,j}, j ∈ [1, n(q_i)], n(q_i) ∈ N. Each sample query result d_{i,j} is, for example, text data.
In some embodiments, the sample data sets D_i may be arranged in the order of their corresponding query contents q_i.
In some cases, the ranking model may be trained for multiple rounds using the plurality of sample data sets described above. Before each round of training starts, the order of the query contents q_i in the query content set Q may first be randomly shuffled, and the sample data sets D_i then arranged based on the shuffled order of q_i. In this way, the sample data for each round of training is input into the model in a different order, further improving the effect of model training.
In some embodiments, within each sample data set D_i, the sample query results d_{i,j} may further be sorted by their relevance rank labels so that results with the same label are gathered together; that is, each relevance rank label corresponds to one subset of sample query results, the subsets are ordered by label value (for example, from largest to smallest), and the sample query results within each subset are randomly arranged.
In some cases, the ranking model may be trained for multiple rounds using the plurality of sample data sets described above. Before each round of training starts, in addition to shuffling the order of the sample data sets D_i, the order of the sample query results within each subset can be further shuffled, which can further improve the effect of model training.
Through the above operations, a sample data sequence formed by arranging the plurality of sample data sets is obtained, and this sequence can further be divided into a plurality of sample data batches according to the preset batch size B.
In some embodiments, the value of the preset batch size B may be determined according to the memory size of the graphics card (GPU) used for model training, and may be, for example, 16 or 32. It can be understood that persons skilled in the relevant art may determine the value of the preset batch size B according to the actual situation, which is not limited herein.
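The batch-construction procedure of steps S301 to S303 can be sketched as follows. This is a plausible rendering under the assumption that each sample is a (query, result, label) tuple; the function and variable names are illustrative.

```python
import random

def build_batches(sample_data_sets, batch_size=16):
    """Steps S301-S303: order each query's samples by label (ties shuffled),
    concatenate the sets in a shuffled query order, and split into batches."""
    random.shuffle(sample_data_sets)   # shuffle query order before each training round
    sequence = []
    for data_set in sample_data_sets:  # data_set: samples sharing one query content
        random.shuffle(data_set)       # randomize order within equal-label subsets
        data_set.sort(key=lambda sample: sample[2], reverse=True)  # sort by label, large to small
        sequence.extend(data_set)
    return [sequence[i:i + batch_size]
            for i in range(0, len(sequence), batch_size)]
```

Because Python's sort is stable, shuffling before sorting leaves samples with equal labels in random order, matching the requirement that same-label sample data be randomly arranged.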
In some embodiments, sample data may be input into the ranking model batch by batch; a single-sample loss function and a sample-pair loss function are calculated based on the model's prediction results for each batch, the comprehensive loss function is then calculated, and the parameters of the ranking model are adjusted based on the comprehensive loss function.
In some embodiments, the sample data in a sample data batch may correspond to one or more query contents q_i. Each sample query result d_{i,j} may first be spliced with its corresponding query content q_i, the spliced data input into the pre-trained model in the ranking model (for example, the pre-trained language model ERNIE), and the output of the fully-connected layer cascaded after the pre-trained model then gives the relevance prediction result (i.e., relevance score) between each sample query result d_{i,j} and its corresponding query content q_i:

s_{i,j} = ERNIE([CLS] q_i [SEP] d_{i,j} [SEP])
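The scoring step can be pictured as the following module. The encoder interface is an assumption standing in for ERNIE or a similar pre-trained language model; it is only assumed to return the [CLS] representation of the spliced input.

```python
import torch
from torch import nn

class RankingModel(nn.Module):
    """Pre-trained language model cascaded with a fully-connected layer."""
    def __init__(self, encoder, hidden_size):
        super().__init__()
        self.encoder = encoder          # assumed: returns (batch, hidden_size) [CLS] vectors
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, queries, results):
        # The encoder is assumed to splice each pair as "[CLS] q_i [SEP] d_ij [SEP]".
        cls_vectors = self.encoder(queries, results)
        return self.fc(cls_vectors).squeeze(-1)   # one relevance score s_ij per sample
```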
In some embodiments, the single-sample loss function corresponding to a sample data batch may be calculated by a formula of the following form (written here as a binary cross-entropy between the squashed prediction and the normalized label):

L_Pointwise = -(1 / |B|) · Σ_{(i,j) ∈ B} [ l̃_{i,j} · log σ(s_{i,j}) + (1 − l̃_{i,j}) · log(1 − σ(s_{i,j})) ]

where |B| represents the batch size of the sample data batch (i.e., the preset batch size),

σ(x) = 1 / (1 + e^(−x))

with x a variable, and l̃_{i,j} denotes the normalized relevance rank label corresponding to d_{i,j}.
In some embodiments, before computing the single-sample loss function, the relevance rank labels l_{i,j} may first be normalized, i.e., mapped into the [0, 1] interval. For example, when the relevance rank labels are divided into 4 levels 0, 1, 2, and 3, label 0 is normalized to 0, label 1 to 0.33, label 2 to 0.67, and label 3 to 1.
In some embodiments, the normalized relevance rank labels may be adjusted as hyper-parameters. For example, when the normalized relevance rank label corresponding to a certain sample query result is 0.33, but actual experience indicates that the relevance between that sample query result and the query content should be about 0.5, the label can be adjusted accordingly, so that it better reflects the actual situation and improves the training effect of the model.
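A sketch of the single-sample loss, combining the normalization above with the cross-entropy form given earlier; the choice of binary cross-entropy and the default z = 3 are assumptions consistent with that description.

```python
import torch
import torch.nn.functional as F

def normalize_labels(labels, z=3):
    # Map integer labels 0..z onto [0, 1]; e.g., 0, 0.33, 0.67, 1 for z = 3.
    # The normalized values may afterwards be tuned as hyper-parameters.
    return labels.float() / z

def pointwise_loss(scores, labels, z=3):
    # Compare each squashed score sigma(s_ij) with the normalized label.
    targets = normalize_labels(labels, z)
    return F.binary_cross_entropy_with_logits(scores, targets)
```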
In some embodiments, calculating the sample-pair loss function of the sample data batch based on the relevance prediction result and the relevance rank label of each sample data may include: in response to the sample data batch including at least one sample pair, calculating the sample-pair loss function based on the relevance prediction result and the relevance rank label of each sample data in each of the at least one sample pair, where each sample pair includes first sample data and second sample data and satisfies the following conditions: the query content corresponding to the first sample data is the same as the query content corresponding to the second sample data; and the value of the relevance rank label corresponding to the first sample data is greater than the value of the relevance rank label corresponding to the second sample data.
Therefore, by calculating the sample pair loss function between the sample pairs under the same query content, the obtained loss is more in line with the actual scene of the model application (namely, a plurality of query results recalled according to the same query content are ranked), and the performance of the model can be further improved by the sample pair loss function constructed based on the method.
In some embodiments, the sample query results d_{i,j} included in one sample data batch may all correspond to the same query content q_i, but to different relevance rank labels. For such a sample data batch, the sample pairs in the batch may first be obtained, where each sample pair includes first sample data and second sample data, and the value of the relevance rank label corresponding to the first sample data is greater than the value of the relevance rank label corresponding to the second sample data. The sample-pair loss l_{q_i} corresponding to the sample data batch can then be calculated by a formula of the following form (written here as a margin hinge over the valid pairs):

l_{q_i} = (1 / n_i) · Σ_{j,k: l_{i,j} > l_{i,k}} max(0, δ_{j,k} − (s_{i,j} − s_{i,k}))

where l_{i,j} and l_{i,k} are the relevance rank labels of sample query results d_{i,j} and d_{i,k} in the sample data batch (of batch size B), s_{i,j} and s_{i,k} are the relevance prediction results corresponding to d_{i,j} and d_{i,k}, j, k ∈ [1, B], n_i is the number of sample pairs in the sample data batch meeting the above conditions, and δ_{j,k} is an adjustable hyper-parameter, usually positively correlated with the difference between l_{i,j} and l_{i,k}.
In some embodiments, the distance enforced between sample query results d_{i,j} and d_{i,k} of different label levels can be adjusted through δ_{j,k}; for example, δ_{j,k} can be set to 0 or 1. It can be understood that persons skilled in the relevant art may adjust δ_{j,k} according to the actual situation, which is not limited herein.

In some embodiments, δ_{j,k} may be adjusted based on the validation results of the model after each round of training; once δ_{j,k} has been adjusted, the next round of model training is started, and δ_{j,k} is not adjusted during the model training process itself.
In some embodiments, the sample query results included in one sample data batch may correspond to a plurality of query contents, respectively, where the plurality of sample query results corresponding to the same query contents may correspond to different relevancy rating labels.
Sample pairs can be obtained from the plurality of sample query results corresponding to the same query content, where each sample pair includes first sample data and second sample data, and the value of the relevance rank label corresponding to the first sample data is greater than the value of the relevance rank label corresponding to the second sample data. The sample-pair loss l_q corresponding to each query content is calculated by applying the above formula. Further, the sample-pair loss function corresponding to the sample data batch may be calculated by the following formula:

L_Pairwise = (1 / |U_q(B)|) · Σ_{q ∈ U_q(B)} l_q

where U_q(B) denotes the collection of subsets of the sample data batch corresponding to different query contents, and |U_q(B)| denotes the number of such subsets in the sample data batch.
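A sketch of the sample-pair loss over a batch, following the hinge form above: hinge losses are averaged over the valid pairs within each query content (l_q) and then over the query subsets U_q(B). The fixed margin and the zero returned for batches without valid pairs are assumptions reflecting, respectively, the adjustable hyper-parameter δ and the predetermined value discussed below.

```python
import torch
import torch.nn.functional as F

def pairwise_loss(scores, labels, queries, margin=1.0):
    per_query_losses = []
    for q in set(queries):                         # U_q(B): one subset per query content
        idx = [i for i, qc in enumerate(queries) if qc == q]
        pair_losses = [F.relu(margin - (scores[j] - scores[k]))
                       for j in idx for k in idx
                       if labels[j] > labels[k]]   # valid pair: first label strictly larger
        if pair_losses:
            per_query_losses.append(torch.stack(pair_losses).mean())  # l_q over n_i pairs
    if not per_query_losses:                          # no valid sample pair in the batch
        return torch.zeros((), device=scores.device)  # predetermined value (here zero)
    return torch.stack(per_query_losses).mean()       # average over |U_q(B)| subsets
```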
Fig. 4 presents a schematic diagram of sample pairs in an exemplary sample data batch.
In one example, as shown in FIG. 4, the batch size of a sample data batch is 8, and the sample query results in the batch correspond to two query contents q1 and q2. The results corresponding to query content q1 are d11, d12, d13, and d14, with relevance rank labels 3, 2, 1, and 0 respectively (i.e., the sample query results are sorted by label value from largest to smallest); the results corresponding to query content q2 are d21, d22, d23, and d24, with relevance rank labels 3, 1, 1, and 0 respectively (likewise sorted from largest to smallest).

Among the sample query results d11, d12, d13, and d14 corresponding to query content q1, sample pairs can be formed. For example, sample pair 411 is the d11-d12 pair, where the label value 3 of d11 is greater than the label value 2 of d12. Similarly, the sample pairs among d11 to d14 that satisfy the above conditions further include sample pair 412 (the d11-d13 pair), sample pair 413 (the d11-d14 pair), sample pair 414 (the d12-d13 pair), sample pair 415 (the d12-d14 pair), and sample pair 416 (the d13-d14 pair).

Based on these sample pairs and the above formula, the sample-pair loss l_q1 corresponding to query content q1 can be obtained.

Similarly, among the sample query results d21, d22, d23, and d24 corresponding to query content q2, the following sample pairs can be obtained: sample pair 421 (the d21-d22 pair), sample pair 422 (the d21-d23 pair), sample pair 423 (the d21-d24 pair), sample pair 424 (the d22-d24 pair), and sample pair 425 (the d23-d24 pair); because d22 and d23 share the same label value, they do not form a sample pair.

Based on these sample pairs and the above formula, the sample-pair loss l_q2 corresponding to query content q2 can be obtained.

Then, the sample-pair loss function corresponding to the sample data batch can be calculated based on the formula for the sample-pair loss function:

L_Pairwise = (l_q1 + l_q2) / 2
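The pair counts in this example can be checked mechanically; the snippet below assumes the label sequences read off FIG. 4.

```python
# q1 labels 3, 2, 1, 0 yield six pairs (411-416); q2 labels 3, 1, 1, 0 yield
# five pairs (421-425), since the d22-d23 tie forms no valid pair.
def count_valid_pairs(labels):
    return sum(1 for j in range(len(labels))
                 for k in range(len(labels)) if labels[j] > labels[k])

assert count_valid_pairs([3, 2, 1, 0]) == 6
assert count_valid_pairs([3, 1, 1, 0]) == 5
```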
in some embodiments, calculating the sample pair loss function for the batch of sample data based on the respective relevancy prediction result and the respective relevancy rating label for each sample data may include: in response to not including at least one sample pair in the sample data batch, setting a sample pair loss function for the sample data batch to a predetermined value.
In some embodiments, the sample query results included in one sample data batch may all correspond to the same query content and have the same relevancy rating label, or the sample query results included in one sample data batch may correspond to a plurality of query contents, but the sample query results corresponding to each query content correspond to the same relevancy rating label, in which case, the sample pair loss function of the sample data batch may be set to a predetermined value, for example, to zero.
Therefore, when the sample data batch does not have the sample pairs meeting the requirements, the model parameters can be adjusted only based on the single sample loss function, and the influence of the sample pairs not meeting the training requirements on the model training is avoided.
In some embodiments, calculating the comprehensive loss function of the sample data batch based on the single-sample loss function and the sample-pair loss function may include: calculating the comprehensive loss function of the sample data batch based on a first weight corresponding to the single-sample loss function, a second weight corresponding to the sample-pair loss function, the single-sample loss function, and the sample-pair loss function.
Thus, the comprehensive loss function is obtained as a weighted sum of the single-sample loss function and the sample-pair loss function, which combines the two training modes while reducing the amount of computation and improving model training efficiency.
In some embodiments, on the basis of the single-sample loss function L_Pointwise and the sample-pair loss function L_Pairwise of the sample data batch obtained by the above methods, the comprehensive loss function of the sample data batch can further be obtained from the following formula:

L = α · L_Pointwise + (1 − α) · L_Pairwise

where α is the first weight, 1 − α is the second weight, and α ∈ [0, 1].
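In code, the comprehensive loss is a one-line weighted sum; the default α below is purely illustrative, since the patent leaves the weights adjustable.

```python
def composite_loss(l_pointwise, l_pairwise, alpha=0.5):
    # L = alpha * L_Pointwise + (1 - alpha) * L_Pairwise, with alpha in [0, 1].
    return alpha * l_pointwise + (1 - alpha) * l_pairwise
```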
In some embodiments, calculating the composite loss function for the sample data batch may include: adjusting the first weight and the second weight according to the order of the sample data batch in the plurality of sample data batches, wherein the sum of the adjusted first weight and the adjusted second weight is equal to the sum of the first weight and the second weight; and calculating a comprehensive loss function of the sample data batch based on the adjusted first weight, the adjusted second weight, the single-sample loss function and the sample pair loss function.
During training, the relevant technicians may adjust the two weights according to the progress of model training (judged, for example, by the position of the current batch within the total sequence of batches), so that training on each batch can pursue different training objectives and effects (for example, prioritizing the model's ability to predict relevance scores, or prioritizing its ability to order results).
It is understood that, persons skilled in the relevant art can determine whether to adjust the first weight and the second weight and adjust the magnitudes of the first weight and the second weight according to practical situations, and the invention is not limited thereto.
In some embodiments, as shown in fig. 5, calculating the comprehensive loss function of the sample data batch may include: step S501, determining a first weight corresponding to the sample data batch based on the position of the sample data batch among the plurality of sample data batches, a preset weight adjustment rate, and a first preset weight, wherein the first preset weight is the weight corresponding to the single-sample loss function of the first sample data batch input into the ranking model; step S502, determining a second weight corresponding to the sample data batch based on the position of the sample data batch among the plurality of sample data batches, the preset weight adjustment rate, and a second preset weight, wherein the second preset weight is the weight corresponding to the sample-pair loss function of the first sample data batch input into the ranking model, the first preset weight is greater than the second preset weight, and the sum of the first weight and the second weight is equal to the sum of the first preset weight and the second preset weight; and step S503, calculating the comprehensive loss function of the sample data batch based on the first weight, the second weight, the single-sample loss function, and the sample-pair loss function.
By presetting the initial values and the rate of change of the weights, the two weights are adjusted automatically according to training progress during model training: setting the weight of the single-sample loss function to a large value in the initial training stage lets the model converge rapidly to the point of roughly completing classification, while gradually increasing the weight of the sample-pair loss in the later training stage further optimizes the model's ranking capability. Model training efficiency can thus be improved while model performance is optimized.
In some embodiments, the first and second preset weights (i.e. the initial values of the first and second weights) and the preset weight adjustment rate (e.g. the first weight is decreased by a predetermined value while the second weight is increased by a predetermined value every time a predetermined number of sample data batches are passed) may be determined according to the verification result of the model after the previous training round.
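One plausible weight schedule under these constraints is sketched below; the initial value, floor, and adjustment rate are assumptions, to be chosen (per the description above) from the previous round's validation results.

```python
def weights_for_batch(batch_index, alpha_init=0.9, rate=0.001, alpha_min=0.1):
    # Start with a large first (single-sample) weight for fast convergence, then
    # shift weight toward the sample-pair loss to refine ranking ability.
    alpha = max(alpha_min, alpha_init - rate * batch_index)
    return alpha, 1.0 - alpha   # the two weights always sum to 1, preserving the preset sum
```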
To verify that the training method for the ranking model of the information retrieval system described above can effectively improve model performance, the relevant technicians trained the ranking model with the single-sample training mode, the sample-pair training mode, and the training method of the present disclosure respectively, and tested the trained models; the test results are shown in the following table:
[Table: NDCG test results comparing the three training approaches; reproduced in the original publication as an image.]
NDCG (Normalized Discounted Cumulative Gain) is a common metric for evaluating ranking results. According to the test results, the ranking model trained with the training method for the ranking model of the information retrieval system described above has better overall performance.
According to some embodiments, as shown in fig. 6, there is provided a ranking method for an information retrieval system, including: step S601, recalling a plurality of query results based on query content; step S602, inputting each query result of the plurality of query results, in combination with the query content, into a ranking model to obtain, through the ranking model, a relevance score between each query result and the query content, wherein the ranking model is trained with the training method for the ranking model of the information retrieval system described above; and step S603, determining the arrangement order of the plurality of query results based on the relevance score between each query result and the query content.
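An inference-time sketch of steps S601 to S603; recall_model.recall and the ranking-model call signature are assumed interfaces, not APIs defined by the patent.

```python
import torch

def rank_query_results(query, recall_model, ranking_model):
    results = recall_model.recall(query)                         # S601: recall candidates
    with torch.no_grad():
        scores = ranking_model([query] * len(results), results)  # S602: relevance scores
    order = torch.argsort(scores, descending=True)               # S603: sort by score
    return [results[i] for i in order.tolist()]
```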
According to some embodiments, as shown in fig. 7, there is provided an information retrieval system 700, comprising: a recall model 710 configured to recall a plurality of query results based on query content; a ranking model 720, configured to obtain a relevance score of each query result of the plurality of query results to the query content based on the plurality of query results and the query content, for ranking of the plurality of query results, wherein the ranking model is obtained based on training of the training method of the ranking model for the information retrieval system of the present disclosure.
According to some embodiments, as shown in fig. 8, there is provided a training apparatus 800 for a ranking model of an information retrieval system, including: an obtaining unit 810 configured to obtain a plurality of sample data batches, wherein each sample data batch in the plurality of sample data batches includes a plurality of sample data, and each sample data of the plurality of sample data includes a sample query result, query content corresponding to the sample query result, and a relevance rank label indicating the magnitude of relevance between the sample query result and the corresponding query content; and an execution unit 820 configured to perform, for each sample data batch in turn, the operations of the following subunits: an input subunit 821 configured to input the sample query result and the corresponding query content of each sample data in the sample data batch into the ranking model, so as to obtain a relevance prediction result between the sample query result corresponding to the sample data and the corresponding query content; a first calculating subunit 822 configured to calculate a single-sample loss function and a sample-pair loss function of the sample data batch based on the relevance prediction result and the relevance rank label of each sample data; a second calculating subunit 823 configured to calculate a comprehensive loss function of the sample data batch based on the single-sample loss function and the sample-pair loss function; and an adjusting subunit 824 configured to adjust a plurality of parameters of the ranking model based on the comprehensive loss function.
The operations of the unit 810, the unit 820 and the sub-units 821-824 in the training apparatus 800 for the ranking model of the information retrieval system are similar to the operations of the steps S201-S205 in the training method for the ranking model of the information retrieval system, and are not repeated herein.
According to some embodiments, the obtaining unit may include: the acquisition subunit is configured to acquire a plurality of sample data sets respectively corresponding to the plurality of query contents, wherein the plurality of sample data in each sample data set in the plurality of sample data sets are respectively sorted according to the numerical value of the corresponding relevancy grade label, and the sample data with the same relevancy grade label are randomly arranged; the arrangement subunit is configured to arrange the plurality of sample data sets according to a predetermined order to acquire a sample data sequence; and the dividing subunit is configured to divide the sample data sequence into a plurality of sample data batches according to a preset batch.
According to some embodiments, the first calculation subunit may be configured to: in response to the sample data batch including at least one sample pair, calculate the sample pair loss function based on the corresponding correlation degree prediction result and the corresponding relevancy grade label of each sample data in each of the at least one sample pair, wherein each sample pair in the at least one sample pair includes first sample data and second sample data, and each sample pair in the at least one sample pair satisfies the following conditions: the query content corresponding to the first sample data is the same as the query content corresponding to the second sample data; and the numerical value of the relevancy grade label corresponding to the first sample data is greater than the numerical value of the relevancy grade label corresponding to the second sample data.
According to some embodiments, the first calculation subunit may be further configured to: in response to the sample data batch not including at least one sample pair, set the sample pair loss function of the sample data batch to a predetermined value.
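Combining the two embodiments above, a hedged sketch of the sample pair loss; the margin form is an assumption (the disclosure does not fix the pairwise loss), and `query_ids` is a hypothetical per-sample identifier used to test whether two samples share the same query content:

```python
from typing import Sequence
import torch
import torch.nn.functional as F

def sample_pair_loss(
    preds: torch.Tensor,         # correlation degree predictions for the batch
    labels: torch.Tensor,        # numeric relevancy grade labels
    query_ids: Sequence,         # query content identifier of each sample
    margin: float = 1.0,         # margin of the assumed pairwise loss
    predetermined: float = 0.0,  # value returned when no valid pair exists
) -> torch.Tensor:
    terms = []
    n = len(labels)
    for i in range(n):
        for j in range(n):
            # A valid sample pair: same query content, and the first sample's
            # relevancy grade label strictly greater than the second's.
            if query_ids[i] == query_ids[j] and labels[i] > labels[j]:
                terms.append(F.relu(margin - (preds[i] - preds[j])))
    if not terms:                # the batch contains no sample pair
        return torch.tensor(predetermined)
    return torch.stack(terms).mean()
```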
According to some embodiments, the second calculation subunit may be configured to: calculate the comprehensive loss function of the sample data batch based on a first weight corresponding to the single-sample loss function, a second weight corresponding to the sample pair loss function, the single-sample loss function, and the sample pair loss function.
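Written out (notation ours, not from the disclosure), the comprehensive loss of a batch is the weighted sum

```latex
L_{\mathrm{batch}} = w_1 \, L_{\mathrm{single}} + w_2 \, L_{\mathrm{pair}}
```

with first weight $w_1$, second weight $w_2$, single-sample loss $L_{\mathrm{single}}$, and sample pair loss $L_{\mathrm{pair}}$.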
According to some embodiments, the second calculation subunit may be further configured to: adjust the first weight and the second weight according to the order of the sample data batch in the plurality of sample data batches, wherein the sum of the adjusted first weight and the adjusted second weight is equal to the sum of the first weight and the second weight; and calculate the comprehensive loss function of the sample data batch based on the adjusted first weight, the adjusted second weight, the single-sample loss function, and the sample pair loss function.
According to some embodiments, the second calculation subunit may be further configured to: determine a first weight corresponding to the sample data batch based on the order of the sample data batch in the plurality of sample data batches, a preset weight adjustment rate, and a first preset weight, where the first preset weight is the weight corresponding to the single-sample loss function of the first sample data batch input into the ranking model among the plurality of sample data batches; determine a second weight corresponding to the sample data batch based on the order of the sample data batch in the plurality of sample data batches, the preset weight adjustment rate, and a second preset weight, where the second preset weight is the weight corresponding to the sample pair loss function of the first sample data batch input into the ranking model among the plurality of sample data batches, the first preset weight is greater than the second preset weight, and the sum of the first weight and the second weight is equal to the sum of the first preset weight and the second preset weight; and calculate the comprehensive loss function of the sample data batch based on the first weight, the second weight, the single-sample loss function, and the sample pair loss function.
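A sketch of one weight schedule consistent with these constraints; the linear form, and the choice that the single-sample weight decays toward the pair weight, are assumptions, since the disclosure fixes only the preset values, the shared adjustment rate, and the constant sum:

```python
from typing import Tuple

def batch_weights(
    batch_index: int,    # 0 for the first batch input into the ranking model
    rate: float,         # preset weight adjustment rate
    w1_preset: float,    # first preset weight (single-sample loss), larger
    w2_preset: float,    # second preset weight (sample pair loss), smaller
) -> Tuple[float, float]:
    """Return the (first weight, second weight) pair for a batch; the sum
    always equals w1_preset + w2_preset."""
    w1 = max(w1_preset - rate * batch_index, 0.0)
    w2 = w2_preset + (w1_preset - w1)   # preserves the constant sum
    return w1, w2

# For example, with w1_preset=0.9, w2_preset=0.1, rate=0.01, the batch at
# index 9 gets weights (0.81, 0.19) under this assumed linear schedule.
```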
According to some embodiments, as shown in fig. 9, there is provided a ranking apparatus 900 for an information retrieval system, including: a recall unit 910 configured to recall a plurality of query results based on query content; an input unit 920 configured to input each query result of the plurality of query results, in combination with the query content, into a ranking model to obtain, through the ranking model, a relevancy score of each query result of the plurality of query results to the query content, wherein the ranking model is trained using the training method for the ranking model of the information retrieval system of the present disclosure; and a determining unit 930 configured to determine an arrangement order of the plurality of query results based on the relevancy score of each query result in the plurality of query results to the query content.
The operations of units 910 to 930 in the ranking apparatus 900 for an information retrieval system are similar to those of steps S601 to S603 in the ranking method for an information retrieval system described above and are not repeated here.
According to an embodiment of the present disclosure, there is also provided an electronic device, a readable storage medium, and a computer program product.
Referring to fig. 10, a block diagram of an electronic device 1000 will now be described; the electronic device 1000 may be a server or a client of the present disclosure and is an example of a hardware device to which aspects of the present disclosure may be applied. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the electronic device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. The RAM 1003 can also store various programs and data necessary for the operation of the electronic device 1000. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to one another by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
A number of components in the electronic device 1000 are connected to the I/O interface 1005, including an input unit 1006, an output unit 1007, the storage unit 1008, and a communication unit 1009. The input unit 1006 may be any type of device capable of inputting information to the electronic device 1000; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote control. The output unit 1007 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 1008 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 1009 allows the electronic device 1000 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a Bluetooth(TM) device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The computing unit 1001 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1001 executes the respective methods and processes described above, such as the training method for the ranking model of the information retrieval system or the ranking method for the information retrieval system of the present disclosure. For example, in some embodiments, these methods may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the training method for the ranking model of the information retrieval system or the ranking method for the information retrieval system described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured by any other suitable means (e.g., by means of firmware) to perform these methods.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems, and apparatuses are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (20)

1. A method of training a ranking model for an information retrieval system, the method comprising:
obtaining a plurality of sample data batches, wherein each sample data batch in the plurality of sample data batches comprises a plurality of sample data, and each sample data in the plurality of sample data comprises a sample query result, query content corresponding to the sample query result, and a relevancy grade label for indicating the degree of relevance between the sample query result and the corresponding query content;
and sequentially carrying out the following operations on each sample data batch:
inputting the sample query result and the corresponding query content in each sample data in the sample data batch into the ranking model to obtain a correlation degree prediction result between the sample query result corresponding to the sample data and the corresponding query content;
calculating a single-sample loss function and a sample pair loss function of the sample data batch based on the corresponding correlation degree prediction result and the corresponding relevancy grade label of each sample data;
calculating a comprehensive loss function of the sample data batch based on the single-sample loss function and the sample pair loss function; and
adjusting a plurality of parameters of the ranking model based on the comprehensive loss function.
2. The method of claim 1, wherein said obtaining a plurality of batches of sample data comprises:
acquiring a plurality of sample data sets respectively corresponding to a plurality of query contents, wherein a plurality of sample data in each sample data set in the plurality of sample data sets are respectively sorted according to the numerical value of the corresponding relevancy grade label, and the sample data with the same relevancy grade label is randomly arranged;
arranging the plurality of sample data sets according to a preset sequence to obtain a sample data sequence; and
dividing the sample data sequence into the plurality of sample data batches according to a preset batch size.
3. The method of claim 2, wherein said calculating the sample pair loss function of the sample data batch based on the corresponding correlation degree prediction result and the corresponding relevancy grade label of each sample data comprises:
in response to the sample data batch including at least one sample pair, calculating the sample pair loss function based on the corresponding correlation degree prediction result and the corresponding relevancy grade label of each sample data in each of the at least one sample pair,
wherein each sample pair of the at least one sample pair comprises first sample data and second sample data, and each sample pair of the at least one sample pair satisfies the following conditions:
the query content corresponding to the first sample data is the same as the query content corresponding to the second sample data; and
the numerical value of the relevancy grade label corresponding to the first sample data is greater than the numerical value of the relevancy grade label corresponding to the second sample data.
4. The method of claim 3, wherein said calculating the sample pair loss function of the sample data batch based on the corresponding correlation degree prediction result and the corresponding relevancy grade label of each sample data comprises:
in response to the sample data batch not including the at least one sample pair, setting the sample pair loss function of the sample data batch to a predetermined value.
5. The method of any of claims 2 to 4, wherein said calculating the comprehensive loss function of the sample data batch based on the single-sample loss function and the sample pair loss function comprises:
calculating the comprehensive loss function of the sample data batch based on a first weight corresponding to the single-sample loss function, a second weight corresponding to the sample pair loss function, the single-sample loss function, and the sample pair loss function.
6. The method of claim 5, wherein said calculating the comprehensive loss function of the sample data batch comprises:
adjusting the first weight and the second weight according to the order of the sample data batch in the plurality of sample data batches, wherein the sum of the adjusted first weight and the adjusted second weight is equal to the sum of the first weight and the second weight; and
calculating the comprehensive loss function of the sample data batch based on the adjusted first weight, the adjusted second weight, the single-sample loss function, and the sample pair loss function.
7. The method of claim 5, wherein said calculating the comprehensive loss function of the sample data batch comprises:
determining a first weight corresponding to the sample data batch based on the order of the sample data batch in the plurality of sample data batches, a preset weight adjustment rate, and a first preset weight, wherein the first preset weight is the weight corresponding to the single-sample loss function of the first sample data batch input into the ranking model among the plurality of sample data batches;
determining a second weight corresponding to the sample data batch based on the order of the sample data batch in the plurality of sample data batches, the preset weight adjustment rate, and a second preset weight, wherein the second preset weight is the weight corresponding to the sample pair loss function of the first sample data batch input into the ranking model among the plurality of sample data batches,
wherein the first preset weight is greater than the second preset weight, and the sum of the first weight and the second weight is equal to the sum of the first preset weight and the second preset weight; and
calculating the comprehensive loss function of the sample data batch based on the first weight, the second weight, the single-sample loss function, and the sample pair loss function.
8. A ranking method for an information retrieval system, the method comprising:
recalling a plurality of query results based on the query content;
inputting each query result in the plurality of query results into a ranking model in combination with the query content to obtain a relevancy score of each query result in the plurality of query results to the query content through the ranking model, wherein the ranking model is obtained by training based on the method of any one of claims 1 to 7; and
determining an order of ranking of the plurality of query results based on the relevancy score of each of the plurality of query results to the query content.
9. An information retrieval system comprising:
a recall model configured to recall a plurality of query results based on query content;
a ranking model configured to obtain a relevance score of each of the plurality of query results to the query content for ranking of the plurality of query results based on the plurality of query results and the query content, wherein the ranking model is trained based on the method of any one of claims 1-7.
10. A training apparatus for a ranking model of an information retrieval system, the apparatus comprising:
the system comprises an acquisition unit, a processing unit and a display unit, wherein the acquisition unit is configured to acquire a plurality of sample data batches, each sample data batch in the plurality of sample data batches comprises a plurality of sample data, and each sample data in the plurality of sample data comprises a sample query result, query content corresponding to the sample query result and a relevance grade label for indicating the relevance between the sample query result and the corresponding query content;
the execution unit is configured to execute the following sub-unit operations on each sample data batch in sequence:
the input subunit is configured to input the sample query result and the corresponding query content in each sample data in the sample data batch into the ranking model to obtain a correlation prediction result between the sample query result corresponding to the sample data and the corresponding query content;
the first calculation subunit is configured to calculate a single-sample loss function and a sample pair loss function of each sample data batch based on the corresponding correlation prediction result and the corresponding correlation grade label of each sample data;
a second calculation subunit configured to calculate a comprehensive loss function of the sample data batch based on the single-sample loss function and the sample pair loss function; and
an adjusting subunit configured to adjust a plurality of parameters of the ranking model based on the comprehensive loss function.
11. The apparatus of claim 10, wherein the obtaining unit comprises:
the acquisition subunit is configured to acquire a plurality of sample data sets respectively corresponding to a plurality of query contents, wherein a plurality of sample data in each sample data set in the plurality of sample data sets are respectively sorted according to the numerical value of the corresponding relevancy grade label, and the sample data with the same relevancy grade label is randomly arranged;
a permutation subunit configured to permute the plurality of sample data sets according to a predetermined order to obtain a sample data sequence; and
a dividing subunit configured to divide the sample data sequence into the plurality of sample data batches according to a preset batch size.
12. The apparatus of claim 11, wherein the first calculation subunit is configured to:
in response to the sample data batch including at least one sample pair, calculate the sample pair loss function based on the corresponding correlation degree prediction result and the corresponding relevancy grade label of each sample data in each of the at least one sample pair,
wherein each sample pair of the at least one sample pair comprises first sample data and second sample data, and each sample pair of the at least one sample pair satisfies the following conditions:
the query content corresponding to the first sample data is the same as the query content corresponding to the second sample data; and
the numerical value of the relevancy grade label corresponding to the first sample data is greater than the numerical value of the relevancy grade label corresponding to the second sample data.
13. The apparatus of claim 12, wherein the first calculation subunit is further configured to:
in response to the sample data batch not including the at least one sample pair, set the sample pair loss function of the sample data batch to a predetermined value.
14. The apparatus of any of claims 11 to 13, wherein the second calculation subunit is configured to:
calculate the comprehensive loss function of the sample data batch based on a first weight corresponding to the single-sample loss function, a second weight corresponding to the sample pair loss function, the single-sample loss function, and the sample pair loss function.
15. The apparatus of claim 14, wherein the second calculation subunit is further configured to:
adjust the first weight and the second weight according to the order of the sample data batch in the plurality of sample data batches, wherein the sum of the adjusted first weight and the adjusted second weight is equal to the sum of the first weight and the second weight; and
calculate the comprehensive loss function of the sample data batch based on the adjusted first weight, the adjusted second weight, the single-sample loss function, and the sample pair loss function.
16. The apparatus of claim 14, wherein the second calculation subunit is further configured to:
determine a first weight corresponding to the sample data batch based on the order of the sample data batch in the plurality of sample data batches, a preset weight adjustment rate, and a first preset weight, wherein the first preset weight is the weight corresponding to the single-sample loss function of the first sample data batch input into the ranking model among the plurality of sample data batches;
determine a second weight corresponding to the sample data batch based on the order of the sample data batch in the plurality of sample data batches, the preset weight adjustment rate, and a second preset weight, wherein the second preset weight is the weight corresponding to the sample pair loss function of the first sample data batch input into the ranking model among the plurality of sample data batches,
wherein the first preset weight is greater than the second preset weight, and the sum of the first weight and the second weight is equal to the sum of the first preset weight and the second preset weight; and
calculate the comprehensive loss function of the sample data batch based on the first weight, the second weight, the single-sample loss function, and the sample pair loss function.
17. A ranking apparatus for an information retrieval system, the apparatus comprising:
a recall unit configured to recall a plurality of query results based on the query content;
an input unit configured to input each query result in the plurality of query results, in combination with the query content, into a ranking model to obtain, through the ranking model, a relevancy score of each query result in the plurality of query results to the query content, wherein the ranking model is trained based on the method of any one of claims 1 to 7; and
a determining unit configured to determine an arrangement order of the plurality of query results based on the relevancy score of each of the plurality of query results to the query content.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor,
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8.
19. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1 to 8.
20. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 8.
CN202210431856.9A 2022-04-22 2022-04-22 Ranking model training method, device, medium and equipment of information retrieval system Pending CN114780846A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210431856.9A CN114780846A (en) 2022-04-22 2022-04-22 Ranking model training method, device, medium and equipment of information retrieval system

Publications (1)

Publication Number Publication Date
CN114780846A true CN114780846A (en) 2022-07-22

Family

ID=82432950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210431856.9A Pending CN114780846A (en) 2022-04-22 2022-04-22 Ranking model training method, device, medium and equipment of information retrieval system

Country Status (1)

Country Link
CN (1) CN114780846A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556275A (en) * 2024-01-11 2024-02-13 腾讯科技(深圳)有限公司 Correlation model data processing method, device, computer equipment and storage medium
CN117556275B (en) * 2024-01-11 2024-04-02 腾讯科技(深圳)有限公司 Correlation model data processing method, device, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination