CN112231545A - Sorting method, device and equipment of block aggregation and storage medium

Info

Publication number: CN112231545A
Application number: CN202011063125.0A
Authority: CN
Inventors: 祝升; 汤彪; 李宁; 高泽洲; 张敏; 王仲远; 张弓
Original assignee: Beijing Sankuai Online Technology Co Ltd
Current assignee: Beijing Sankuai Online Technology Co Ltd
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2021-01-15
Anticipated expiration: 2040-09-30
Also published as: CN112231545B

Abstract

The application discloses a method, an apparatus, a device and a storage medium for sorting cluster block sets, and belongs to the field of machine learning. The method comprises: determining at least two cluster block sets, comprising i sorted cluster block sets and j unordered cluster block sets; acquiring a first state feature corresponding to the i sorted cluster block sets; for each unordered cluster block set, calling a state transfer function to perform a state transfer on the first state feature, generating j second state features; sequentially inputting the j second state features into a scoring model, which outputs a score value for each unordered cluster block set when it is arranged at the (t+1)-th discrete time; and determining the unordered cluster block set corresponding to the maximum score value as the cluster block set arranged at the (t+1)-th discrete time. The method fully considers the influence of the list item form corresponding to each cluster block on the space utilization of the display area, and thereby makes better use of the list item display area on the terminal.

Description

Sorting method, device and equipment of block aggregation and storage medium

Technical Field

The present application relates to the field of machine learning, and in particular to a method, an apparatus, a device, and a storage medium for sorting cluster block sets.

Background

Search technology is an important traffic entry point and content distribution channel in the internet field. As search technology has developed, the display of search results has changed from a single style to a plurality of styles.

Illustratively, at least two types of search results are displayed on a page in a list item form, each type of search results are aggregated together to form a block to be displayed on a list item, and the list item forms corresponding to different blocks may be different.

The diversity of list item forms corresponding to each block directly affects the space utilization rate of a list item display area on a page.

Disclosure of Invention

The embodiments of the application provide a method, an apparatus, a device and a storage medium for sorting cluster block sets, which fully consider the influence of the list item form corresponding to each cluster block on the space utilization of the display area and thereby make better use of the list item display area on a terminal, for example by improving the space utilization of the list item display area on a page. The technical solution is as follows:

According to one aspect of the present application, a method for sorting cluster block sets is provided, which is applied to a search service platform, and the method includes:

determining at least two block aggregation sets, wherein the block aggregation sets comprise result sets of search results of the same type and form information of list item forms when the result sets are displayed on a terminal, the form information and the types of the search results in the result sets have a one-to-one correspondence relationship, and the at least two block aggregation sets comprise i sorted block aggregation sets and j unordered block aggregation sets;

acquiring first state features corresponding to the i sorted aggregation block sets, wherein the first state features are used for indicating the sorting states of the i sorted aggregation block sets at the tth discrete moment;

based on each unordered cluster block set, calling a state transfer function to perform state transfer on the first state features to generate j second state features, wherein the g-th second state feature is used for indicating the ordering states of the i ordered cluster block sets and the g-th unordered cluster block set at t + 1-th discrete time;

sequentially inputting the j second state characteristics into a scoring model, and outputting scoring values of each unordered cluster block set arranged at the t +1 th discrete time by the scoring model;

and determining the unordered block set corresponding to the maximum score value as a block set arranged at the t +1 th discrete time, wherein i and t are non-negative integers, j and g are positive integers, and g is less than or equal to j.

According to another aspect of the present application, there is provided an apparatus for sorting a cluster block set, the apparatus including:

the determining module is used for determining at least two block aggregation sets, the block aggregation sets comprise result sets of the same type of search results and form information of list item forms when the result sets are displayed on a terminal, the form information and the types of the search results in the result sets have a one-to-one correspondence relationship, and the at least two block aggregation sets comprise i sorted block aggregation sets and j unordered block aggregation sets;

the acquisition module is used for acquiring first state characteristics corresponding to the i sorted aggregation block sets, and the first state characteristics are used for indicating the sorting states of the i sorted aggregation block sets at the tth discrete moment;

a generating module, configured to call a state transfer function to perform a state transfer on the first state feature based on each unordered cluster block set, generating j second state features, where the g-th second state feature is used to indicate the sorting state of the i sorted cluster block sets and the g-th unordered cluster block set at the (t+1)-th discrete time;

the scoring module is used for sequentially inputting the j second state characteristics into a scoring model, and the scoring model outputs the scoring value of each unordered block set arranged at the t +1 th discrete time;

the determining module is used for determining the unordered block aggregation corresponding to the maximum scoring value as the block aggregation arranged at the t +1 th discrete time, wherein i and t are non-negative integers, j and g are positive integers, and g is smaller than or equal to j.

According to another aspect of the present application, there is provided a computer device, including a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the method for sorting cluster block sets as described above.

According to another aspect of the present application, there is provided a computer-readable storage medium storing a computer program, which is loaded and executed by a processor to implement the method for sorting cluster block sets as described above.

According to another aspect of the present application, a computer program product is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions to cause the computer device to execute the sorting method of the aggregation block set as described above.

The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:

In the sorting process, the search service platform constructs cluster block sets such that each cluster block set comprises a result set of search results of the same type together with the form information of the list item form corresponding to that type. The order in which the cluster blocks formed by aggregating search results of the same type are displayed on the terminal is then determined from the cluster block sets. Because the influence of the list item form corresponding to each cluster block on the space utilization of the display area is fully considered, the list item display area on the terminal is better utilized, for example by improving the space utilization of the list item display area on a page.

Drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 illustrates a block diagram of a computer system provided in an exemplary embodiment of the present application;

FIG. 2 is a flow chart illustrating a method for sorting cluster block sets provided by an exemplary embodiment of the present application;

FIG. 3 illustrates an interface diagram of cluster blocks as displayed in accordance with an exemplary embodiment of the present application;

FIG. 4 is a flow chart illustrating a method for sorting cluster block sets provided by another exemplary embodiment of the present application;

FIG. 5 is a flow chart illustrating a method of training a scoring model provided by an exemplary embodiment of the present application;

FIG. 6 is a schematic diagram illustrating a deep reinforcement learning network according to an exemplary embodiment of the present application;

FIG. 7 is a flow chart illustrating a method for training a scoring model provided by another exemplary embodiment of the present application;

FIG. 8 is a schematic structural diagram illustrating an apparatus for sorting cluster block sets according to an exemplary embodiment of the present application;

FIG. 9 shows a schematic structural diagram of a computer device provided in an exemplary embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Reference will first be made to several terms referred to in this application:

Cluster block: a list item in which the result set of search results of the same type is displayed in aggregated form on the terminal. Cluster blocks are generated per search result type, and each type of search result corresponds to one cluster block. Optionally, the list item forms of different cluster blocks are the same or different. Each type of search result corresponds to one list item form, and the list item forms of different types of search results are the same or different; that is, the form information of the list item forms corresponding to different types of search results is the same or different.

A cluster block comprises a result set of search results of the same type and the form information of the list item form used when the result set is displayed on the terminal, that is, the form information of the cluster block's list item form. Before display on the terminal, these two pieces of information exist together as a set, namely the cluster block set. For example, the form information of the list item form may include the length and width of the list item: as shown in FIG. 3, after searching for "shaddock", 4 cluster blocks are displayed on the display page 11, and the widths of the list items differ among cluster blocks 12, 13, and 14. The form information of the list item also includes the layout of the controls on the list item: in FIG. 3, cluster block 12 has the picture display control 17 on the left and the text control 16 on the right, whereas cluster block 13 has the text control 16 on the left and the picture display control 17 on the right. The form information of the list item further includes the shape of the list item, such as the shapes of cluster blocks 13, 14 and 15 in FIG. 3.
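The definition above pairs a result set with the form information of its list item. A minimal sketch of that pairing as a data structure is given below; the class names `FormInfo` and `ClusterBlockSet` and all field names are illustrative assumptions and do not come from the application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FormInfo:
    """Form information of a list item (hypothetical field names)."""
    width: int    # width of the list item, in arbitrary layout units
    height: int   # length of the list item, in arbitrary layout units
    shape: str    # shape of the list item, e.g. "rectangle"
    layout: str   # arrangement of controls, e.g. "picture-left"


@dataclass
class ClusterBlockSet:
    """A result set of same-type search results plus its form information."""
    result_type: str
    form: FormInfo
    results: List[str] = field(default_factory=list)


# One cluster block set for the "go to store" type with three results.
store_block = ClusterBlockSet(
    result_type="go to store",
    form=FormInfo(width=2, height=1, shape="rectangle", layout="picture-left"),
    results=["shop A", "shop B", "shop C"],
)
```

Keeping the form information next to the results means that a later sorting step can read both from a single object.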

Deep reinforcement learning network: referred to as DQN (Deep Q-learning Network), which is used to determine the next action A_t based on the state S_t of the environment; after the computer executes action A_t, state S_t transitions to state S_{t+1}. For example, in the present application, the environment refers to a scenario in which the cluster block sets corresponding to a search request are displayed in order on a terminal, the cluster block sets being sent to the terminal in sequence. The sorting of the cluster block sets is described as follows: the sorting state S_t of the sorted cluster block sets is defined; S_t is input into the trained DQN, which calculates the Q value of each of the unordered cluster block sets A_1, A_2, ..., A_j when taken as the next sorted cluster block set; and the next sorted cluster block set A_t is determined from the j unordered cluster block sets based on the Q values, where t is a non-negative integer and j is a positive integer.

FIG. 1 shows a schematic structural diagram of a computer system 100 provided by an exemplary embodiment of the present application, where the computer system 100 includes a terminal 120 and a server cluster 140.

The terminal 120 may be an electronic device held by a user; the terminal 120 is installed and operated with a client, and optionally, the client may be a client of a search service platform, or an applet of the search service platform is operated in the client, or the client is a browser. The client, the applet, or the browser is provided with a search engine to provide a search service for the user. For example, the terminal 120 may implement functions of document search, video search, news search, and the like through a search engine. Optionally, the client may be at least one of a lifestyle service client, a payment client, a financial client, a communication client, and a game client, and the type of the client is not limited in this application.

Illustratively, the terminal 120 may include at least one of a smart phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, a desktop computer, and a notebook computer.

The terminals 120 and the server clusters 140 are connected through a wired or wireless network. The server cluster 140 is used to provide background services for the search service platform. Optionally, a database is disposed on the server cluster 140, and the database may provide search results for the search request sent by the terminal 120.

Optionally, the server cluster 140 includes a database, a content distribution server, and a search server; the search server receives a search request sent by the terminal 120; the search server requests the content requested by the search request to the content distribution server, the content distribution server obtains the content requested by the search request from the database and distributes the content requested by the search request to the search server, and the search server feeds back the content requested by the search request to the terminal 120.

Optionally, the server cluster 140 divides at least two search results corresponding to the search request, aggregates the search results of the same type to form a cluster set, and orders the cluster set and then returns the cluster set to the terminal 120. Illustratively, the server cluster 140 executes the sorting method of the cluster block set provided by the present application to sort the cluster block set of the search result, and sequentially sends the sorted cluster block set to the terminal 120, and the terminal 120 renders and displays the ordered cluster block set.

Illustratively, the terminal 120 sends a search request to a search server, and the search server requests the distribution server for content requested by the search request; the distribution server searches the content requested by the search request from a database to obtain at least two search results; the distribution server distributes the at least two search results to the search server; the search server divides the search results of the same type in at least two search results into a block aggregation set corresponding to the type, so that the block aggregation set comprises the result set of the search results of the same type, and form information when the result set is displayed on a terminal in a list item form, wherein the form information has a one-to-one correspondence relationship with the type of the search results in the result set; further, the search server executes the sorting method of the aggregation block set provided by the present application, and finally returns the sorted aggregation block set to the terminal 120; the terminal 120 receives the aggregation blocks sequentially sent by the search server, and displays the aggregation blocks corresponding to the aggregation block set in a list item form on a display interface according to the form information in the aggregation block set.

For example, the sorting method of the block aggregation may be applied to an application scenario of information search, for example, may be applied to a life service platform, a payment service platform, a communication service platform, and the like, where the service platforms all provide search services. For example, a "shaddock" is searched in a client of a living service platform, the living service platform searches for content related to the "shaddock" to obtain a search result related to the "shaddock", the search result includes a store link selling the shaddock, a store selling the shaddock is divided into a store link providing delivery service and a store link not providing delivery service, and the search result further includes popular science of the shaddock; the life service platform generates a block set 1 linked with a store providing delivery service, a block set 2 linked with a store not providing delivery service, and a block set 3 of science popularization knowledge of the shaddock, then calls a scoring model provided by the application to sort the three block sets to obtain a block set 2, a block set 1, and a block set 3 which are sequentially arranged, and sends the three block sets after the sorting to the client for display, as shown in the display of the first 3 blocks in fig. 3 for example.

The server cluster 140 may include at least one of a server, a plurality of servers, a cloud computing platform, and a virtualization center. Those skilled in the art will appreciate that the number of terminals 120 described above may be greater or fewer. For example, the number of the terminals 120 may be only one, or the number of the terminals 120 may be tens or hundreds, or more, and the number of the terminals 120 and the type of the device are not limited in the embodiment of the present application.

FIG. 2 is a flowchart illustrating a method for sorting cluster block sets according to an exemplary embodiment of the present application, where the method is applied to a search service platform, and the method includes:

step 201, at least two block aggregation sets are determined, wherein the at least two block aggregation sets comprise i sorted block aggregation sets and j unordered block aggregation sets.

The block aggregation comprises a result set of the search results of the same type and form information of a list item form when the result set is displayed on a terminal, and the form information and the types of the search results in the result set have one-to-one correspondence.

After receiving a search request sent by a terminal, a search service platform acquires r search results corresponding to the search request, wherein r is a positive integer; and dividing the r search results according to the types of the search results, dividing the r search results into the block aggregation sets matched with the types of the search results respectively to obtain j unordered block aggregation sets, wherein the block aggregation sets are not ordered at the moment, and no ordered block aggregation set exists, namely i is 0.

Exemplarily, when r = 1, j = 1 unordered cluster block set is obtained; when r > 1 and the r search results are all of the same type, j = 1 unordered cluster block set is obtained; when r > 1 and the r search results are of different types, j unordered cluster block sets are obtained, with j > 1.
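The division of r search results into j unordered cluster block sets can be sketched as a group-by on the result type. The function name `build_unordered_sets`, the dictionary layout, and the sample form labels are assumptions made for illustration only.

```python
from collections import defaultdict


def build_unordered_sets(search_results, form_by_type):
    """Group (type, payload) search results into one cluster block set per type.

    search_results: list of (result_type, payload) pairs, length r.
    form_by_type:   hypothetical one-to-one mapping from type to form information.
    Returns a list of j dictionaries, one per distinct type, in first-seen order.
    """
    grouped = defaultdict(list)
    for result_type, payload in search_results:
        grouped[result_type].append(payload)
    return [
        {"type": t, "results": items, "form": form_by_type[t]}
        for t, items in grouped.items()
    ]


results = [
    ("go to store", "shop A"),
    ("delivery to home", "shop B"),
    ("go to store", "shop C"),
]
forms = {"go to store": "wide", "delivery to home": "narrow"}

unordered = build_unordered_sets(results, forms)
j = len(unordered)  # r = 3 results of 2 types, so j = 2
```

At this point no cluster block set has been sorted yet, which matches the case i = 0 described in step 201.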

It should be noted that, when j is 1, the search service platform directly feeds back the 1 unsorted aggregation block set to the terminal without sorting the unsorted aggregation block set; and when j is greater than 1, the search service platform executes the following steps 202 to 205, sorts the unordered cluster block sets, and feeds back the sorted cluster block sets to the terminal for display. Exemplarily, after finishing sorting d aggregation blocks, the search service platform sends the d sorted aggregation blocks as a batch to the terminal; the search service platform continues to sort the aggregation sets, and d aggregation sets of the next batch are sorted out; if the search service platform receives an acquisition request of a block set sent by a terminal, d block sets in a batch which are well ordered are sent to the terminal, and the acquisition request is sent to the search service platform when the terminal receives a refreshing operation or a page turning operation; if the search service platform does not receive the acquisition request of the block aggregation sent by the terminal, the sorting of the block aggregation is suspended, or the sorting of the block aggregation of the next batch is continued; wherein d is a positive integer.
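The batch-wise delivery described above can be sketched with a lazy generator, assuming each acquisition request from the terminal is modelled as one `next()` call. The name `iter_batches` is hypothetical; the generator does no work between requests, which corresponds to the option of suspending the sorting until the next request arrives.

```python
from itertools import islice


def iter_batches(ordered_stream, d):
    """Yield lists of up to d sorted cluster block sets, drawn lazily.

    ordered_stream: any iterable that produces cluster block sets in sorted order.
    d:              batch size, a positive integer.
    """
    iterator = iter(ordered_stream)
    while True:
        batch = list(islice(iterator, d))
        if not batch:
            return  # nothing left to send
        yield batch


batches = iter_batches(["s1", "s2", "s3", "s4", "s5"], d=2)
first_batch = next(batches)   # sent with the initial response
later_batches = list(batches) # sent on refresh or page-turn requests
```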

In the process of determining at least two block aggregation sets, if the block aggregation sets in the first round are sorted, no sorted block aggregation set exists, and j unordered block aggregation sets are determined, namely i is 0, and j is greater than 0; and if the cluster block sets after the first round are sorted, determining i sorted cluster block sets and j unordered cluster block sets, wherein i is greater than 0, and j is greater than 0.

For example, the types of the search results can be classified according to the source of the results, the nature of the result content, or the type of service to which the results belong. The result source refers to the provider of the search result; for example, the search result may be provided by an encyclopedia, by a life service platform, or by a knowledge library. The nature of the result content refers to the content form; for example, the content of the search result may be a paper, news, video, picture, prose, or poetry. The service type of the result refers to the type of service that the search result can provide; for example, the search result may provide a navigation service, a delivery service, or a reservation service. For example, as shown in FIG. 3, after "shaddock" is searched, 4 cluster blocks are displayed on the display page 11, showing search results of 4 different business types: "go to store", "delivery to home", "content", and "offer". 3 results are displayed in the cluster block corresponding to "go to store" and 2 results in the cluster block corresponding to "delivery to home"; the two cluster blocks differ in width and also in the structure used to display their search results. This embodiment does not limit how search results are divided into types.

Optionally, the search request includes search content. Illustratively, the search content may be key characters or keywords, and the search service platform searches for content related to them. Optionally, the search request further includes a search user feature. Illustratively, a user can log in to a user account on the search service platform through a terminal; when the user searches through the terminal, the terminal sends the search request to the search service platform under that user account. Correspondingly, the search user feature refers to the search behavior features of the user account that sends the search request, such as search content the user is interested in and the user's browsing behavior. The search service platform then searches for content that is related to the key characters or keywords and consistent with the search user feature.

Step 202, obtaining first state features corresponding to the i sorted aggregation block sets.

The first state feature is already available from the preceding round: the second state feature determined in that round serves as the first state feature of the current round, so the search service platform obtains it directly. The first state feature is used to indicate the sorting state of the i sorted cluster block sets at the t-th discrete time, and i is a non-negative integer.

Illustratively, the first state feature may be represented as S_t = {Z_t}, where Z_t is the sorting state of the i sorted cluster block sets at the t-th discrete time, i.e. Z_t = {a_1, a_2, ..., a_i}, and a_i is the i-th sorted cluster block set.

For example, the search service platform may also generate the first state feature directly: the platform obtains the i sorted cluster block sets, concatenates them in their arrangement order to generate the sorting state Z_t of the i sorted cluster block sets at the t-th discrete time, and from it generates the first state feature S_t.

It should be noted that, in this step, at the 0-th discrete time there is no sorted cluster block set (i.e., i = 0), and the sorting state may be represented as the empty set Φ, i.e. Z_0 = {Φ}.

Step 203, based on each unordered cluster block set, calling a state transfer function to perform state transfer on the first state features, and generating j second state features.

J unordered cluster block sets exist on a search service platform, each unordered cluster block set is respectively used as a cluster block set ordered at the t +1 th discrete time, a second state feature after the state transfer of the first state feature is determined by adopting a state transfer function, wherein the g-th second state feature is used for indicating the ordering states of the i ordered cluster block sets and the g-th unordered cluster block set at the t +1 th discrete time; g is a positive integer less than or equal to j.

Illustratively, for the j unordered cluster block sets A_1, A_2, ..., A_j, a state transfer function T is applied to the first state feature S_t based on the g-th unordered cluster block set A_g, giving the second state feature S_{t+1} = T(S_t, A_g) = {a_1, a_2, ..., a_i, A_g}; that is, the g-th unordered cluster block set A_g is appended to Z_t as the (i+1)-th sorted cluster block set a_{i+1}. In this way j second state features corresponding to the j unordered cluster block sets are obtained. At this point the j second state features are all candidates; none of them is yet the finally determined second state feature at the (t+1)-th discrete time.
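The state transfer in step 203 can be sketched as appending one candidate to an immutable sequence. Representing the state as a Python tuple and the names `transfer` and `candidate_states` are assumptions made for illustration.

```python
def transfer(state, candidate):
    """T(S_t, A_g): return a new state with the candidate appended.

    state is a tuple (a_1, ..., a_i); the input tuple is not modified.
    """
    return state + (candidate,)


def candidate_states(state, unordered):
    """Build the j candidate second state features, one per unordered set."""
    return [transfer(state, a) for a in unordered]


s_t = ("a1", "a2")                # i = 2 sets already sorted
unordered = ["A1", "A2", "A3"]    # j = 3 sets still unordered
second_states = candidate_states(s_t, unordered)
```

Because each call returns a fresh tuple, the j candidates do not interfere with one another, and the empty tuple `()` can stand for the state at the 0-th discrete time.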

Step 204, sequentially inputting the j second state features into a scoring model, the scoring model outputting a score value for each unordered cluster block set when it is arranged at the (t+1)-th discrete time.

A scoring model is deployed on the search service platform and is used to score the display effect of each unordered cluster block set when it is arranged at the (t+1)-th discrete time. The display effect includes the utilization of the display space on the terminal under the resulting ordering of the cluster block sets. Optionally, the display effect further includes at least one of the click rate of the cluster block sets under that ordering and the conversion rate of the search results in the cluster block sets.

The click rate refers to the ratio of the number of clicks to the number of views of the cluster block set arranged at the (t+1)-th discrete time; the conversion rate is the ratio of the number of participations to the number of views, where the number of participations is the number of times users engage with the content of a search result when viewing the search results in the cluster block set.
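The two ratios can be written out as a short sketch. It reads the click rate as the number of clicks divided by the number of views, which is an interpretation of the definition above. The function names and the choice to return 0.0 when there are no views are assumptions.

```python
def click_rate(clicks, views):
    """Clicks on the cluster block set divided by the number of times it was viewed."""
    return clicks / views if views else 0.0


def conversion_rate(participations, views):
    """Participations in the viewed search results divided by the number of views."""
    return participations / views if views else 0.0


ctr = click_rate(clicks=30, views=200)
cvr = conversion_rate(participations=5, views=200)
```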

The search service platform inputs the j second state features into the scoring model in sequence, and the scoring model outputs the score value of each unordered cluster block set when it is arranged at the (t+1)-th discrete time. Optionally, the scoring model is obtained by DQN training, and the score value is a Q value. When the search service platform scores the second state features, it first performs word embedding on each second state feature to obtain j second state feature vectors corresponding to the j second state features; it then inputs each second state feature vector into the scoring model, and the scoring model outputs the Q value corresponding to each second state feature vector, namely the Q value of each unordered cluster block set when arranged at the (t+1)-th discrete time.
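The embed-then-score flow can be sketched as follows. Both parts are stand-ins: `embed` is a hand-written feature extractor rather than a learned word embedding, and `q_value` is a fixed linear function rather than a trained DQN. The weights and the (width, height) representation of each cluster block set are hypothetical.

```python
def embed(state):
    """Toy stand-in for the embedding step.

    state is a sequence of (width, height) pairs, one per cluster block set.
    Returns [number of sets, total area, width of last set, height of last set].
    """
    count = len(state)
    area = sum(w * h for w, h in state)
    last_w, last_h = state[-1] if state else (0, 0)
    return [float(count), float(area), float(last_w), float(last_h)]


def q_value(vector, weights):
    """Toy stand-in for the scoring model: a dot product with fixed weights."""
    return sum(v * w for v, w in zip(vector, weights))


def score_candidates(second_states, weights):
    """Score each candidate second state feature in turn."""
    return [q_value(embed(s), weights) for s in second_states]


weights = [0.0, 0.1, 0.5, -0.2]          # hypothetical, not trained
second_states = [
    ((2, 1), (1, 1)),                    # sorted set followed by candidate 1
    ((2, 1), (2, 2)),                    # sorted set followed by candidate 2
]
scores = score_candidates(second_states, weights)
```

With these made-up weights the second candidate receives the higher score, so it would be the one selected in the next step.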

Step 205, determining the unordered block set corresponding to the maximum score value as the block set arranged at the t +1 th discrete time.

The search service platform determines the unordered cluster block set corresponding to the maximum score value as the cluster block set arranged at the (t+1)-th discrete time. Optionally, if a scoring model obtained by DQN training is used to obtain the score value of each unordered cluster block set when arranged at the (t+1)-th discrete time, the search service platform determines, through a greedy algorithm in the scoring model, the cluster block set arranged at the (t+1)-th discrete time from the j unordered cluster block sets, and outputs that cluster block set and its corresponding Q value. The platform then continues by sorting the cluster block set arranged at the (t+2)-th discrete time; once a batch of d cluster block sets has been arranged, the arranged cluster block sets are sent to the terminal in response to the terminal's acquisition requests. It should be noted that the search service platform is deployed on a server.
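Steps 202 to 205, repeated until no unordered cluster block set remains, can be sketched as a greedy loop. The function `greedy_order` takes the scoring model as a plain callable; the `toy_score` used in the example is a made-up heuristic, not a trained DQN, and the (name, width) items are hypothetical.

```python
def greedy_order(unordered, score_fn):
    """Repeatedly pick the unordered set whose candidate state scores highest.

    unordered: list of cluster block sets (any objects).
    score_fn:  callable mapping a candidate state (tuple) to a score value.
    """
    state = ()                      # sorting state at discrete time 0: nothing sorted
    remaining = list(unordered)
    while remaining:
        # Step 203: one candidate second state per unordered set.
        candidates = [state + (a,) for a in remaining]
        # Step 204: score every candidate.
        scores = [score_fn(c) for c in candidates]
        # Step 205: keep the candidate with the maximum score.
        best = max(range(len(remaining)), key=scores.__getitem__)
        state = candidates[best]
        remaining.pop(best)
    return list(state)


def toy_score(state):
    """Hypothetical score that rewards placing wider items earlier."""
    return sum(width / (pos + 1) for pos, (_, width) in enumerate(state))


items = [("content", 1), ("store", 3), ("delivery", 2)]
ordered = greedy_order(items, toy_score)
names = [name for name, _ in ordered]
```

Each round scores every remaining set once, so sorting j sets costs on the order of j squared calls to the scoring function.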

In summary, in the sorting method for the aggregation block set provided in this embodiment, in the sorting process of the aggregation block set, the search service platform constructs the aggregation block set, so that the aggregation block set includes both the result set of the search result of the same type and the form information of the list item form corresponding to the type of the search result in the aggregation block set, and then determines the arrangement order of the aggregation blocks formed by aggregating the search results of the same type when the aggregation block is displayed on the terminal based on the aggregation block set, so that the influence of the list item form corresponding to each aggregation block on the space utilization rate of the display area is fully considered, thereby achieving a better effect of utilizing the list item display area on the terminal, for example, achieving an effect of improving the space utilization rate of the list item display area on the page.

The method adopts a scoring model obtained by DQN training, which can determine a more accurate and more appropriate sorting state at the next discrete time based on the sorting state at the current discrete time, and can also determine the cluster block set arranged at the next discrete time more efficiently.

Because the list item form affects the sorting of the cluster block sets, list item form elements are added when the sets are sorted. However, the diversity of list item forms can degrade the response performance of the scoring model, so the form information carried by each cluster block set is simplified during sorting. For example, as shown in fig. 4 and based on the embodiment shown in fig. 2, step 203 above may include the following steps 2031 to 2034:

step 2031, reading the form information in each unordered cluster block set to obtain j form information.

When the search service platform sorts the unordered cluster block sets, it first reads the form information of the list item form in each unordered cluster block set, obtaining j pieces of form information corresponding to the j unordered cluster block sets. Optionally, the form information of the list item form includes at least one of: the length and width, the shape, and the structural elements of the list item. A structural element may refer to a control element that makes up the list item. Illustratively, the search service platform reads form information such as the length, width, shape and structural elements of the list item from each unordered cluster block set.

Step 2032, based on the correspondence, finding out the form grade information corresponding to each form information, and obtaining j form grade information corresponding to j form information.

The search service platform comprises a corresponding relation between form information and form grade information, and the form grade information is used for representing the influence degree of a list item form on the sorting of the aggregation block set. For j unordered cluster block sets, the search service platform searches for form grade information corresponding to the form information in the g-th unordered cluster block set based on the corresponding relation to obtain g-th form grade information corresponding to the g-th unordered cluster block set, and finally obtains j form grade information corresponding to the j form information.

Optionally, if the form information includes the length and width of the list item, the correspondence includes a correspondence between the length-width ratio of the list item and the form grade information. To determine the form grade information of the g-th unordered cluster block set, the search service platform calculates the length-width ratio of the list item from its length and width, and then looks up the form grade information corresponding to that ratio based on the correspondence, obtaining the g-th form grade information. The form grade information for each of the j pieces of form information is determined in turn, finally yielding j pieces of form grade information. The length-width ratio of the list item is negatively correlated with the form grade information: the larger the length-width ratio, the lower the form grade, and the greater the negative influence on the sorting of the cluster block set. For example, a lower form grade indicates that the cluster block set makes poorer use of the display space on the terminal.
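As a minimal sketch of such a lookup, the correspondence can be modelled as ratio buckets, each mapped to a form grade. The bucket boundaries and grade values below are invented for illustration; the source does not specify them.

```python
from bisect import bisect_left

# Hypothetical correspondence: upper bounds of length-width ratio buckets and
# the form grade assigned to each bucket (larger ratio -> lower grade).
RATIO_UPPER_BOUNDS = [1.0, 2.0, 4.0]
GRADES = [4, 3, 2, 1]


def form_grade(length: float, width: float) -> int:
    """Look up the form grade for a list item from its length-width ratio."""
    if length <= 0 or width <= 0:
        raise ValueError("length and width must be positive")
    ratio = length / width
    # bisect_left finds the first bucket whose upper bound is >= ratio.
    return GRADES[bisect_left(RATIO_UPPER_BOUNDS, ratio)]
```

With these assumed buckets, a square item (ratio 1.0) receives grade 4 and a very elongated item (ratio above 4.0) receives grade 1, matching the negative correlation stated above.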

Optionally, if the form information further includes the shape of the list item, a correspondence between the shape of the list item and a first adjustment weight is also set on the search service platform, and the first adjustment weight is used to adjust the form grade information determined from the length and width of the list item. Illustratively, if a list item has the shape of a rounded rectangle, a length-width ratio a is calculated from the length and width of the rounded rectangle, and a form grade value b corresponding to the ratio a is found based on the correspondence between the length-width ratio and the form grade information. A first adjustment weight p corresponding to the rounded rectangle is then determined based on the correspondence between shape and first adjustment weight, and the search service platform adjusts the form grade value b based on p to obtain an adjusted form grade value b', from which the form grade information is determined. For example, the adjusted form grade value may be used directly as the form grade information, or it may be mapped to a form grade that is used as the form grade information. Illustratively, the magnitude of the first adjustment weight is determined by the proportion of the bounding rectangle that the shape occupies, and this proportion is positively correlated with the first adjustment weight.

Optionally, if the form information further includes a structural element of the list item, the search service platform is further provided with a corresponding relationship between the structural element and the second adjustment weight. Illustratively, the magnitude of the second adjustment weight is determined by the density of the arrangement of the structural elements on the list items, which is in positive correlation with the second adjustment weight. Exemplarily, on the premise that the form information includes the length and the width of the list item and the structural element, the search service platform determines a second adjustment weight q based on the corresponding relationship between the structural element and the second adjustment weight, and then adjusts the form grade value b based on the second adjustment weight q; on the premise that the form information includes length, width, shape and structural elements of the list item, the search service platform adjusts the form rank value b' based on the second adjustment weight q.
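The text does not say how an adjustment weight is applied to a form grade value. The sketch below assumes simple multiplication (b' = p * b, then b' * q), and the shape weights are hypothetical values chosen only to illustrate the idea that a shape filling less of its bounding rectangle gets a smaller weight.

```python
from typing import Optional

# Hypothetical first adjustment weights: roughly the share of the bounding
# rectangle that each shape fills (assumed values, not from the source).
SHAPE_WEIGHT = {"rectangle": 1.0, "rounded_rectangle": 0.95, "circle": 0.785}


def adjusted_grade_value(
    b: float,
    shape: Optional[str] = None,
    density_weight: Optional[float] = None,
) -> float:
    """Apply the first (shape) and second (density) adjustment weights to b."""
    value = b
    if shape is not None:
        value *= SHAPE_WEIGHT[shape]  # assumed rule: b' = p * b
    if density_weight is not None:
        value *= density_weight  # assumed rule: multiply by q
    return value
```

When only length, width and structural elements are available, the call omits shape so that q is applied to b directly; when all three kinds of form information are present, q is applied to the shape-adjusted value b'.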

For example, the correspondence between the form information and the form grade information may further include a first sub-correspondence between the length and width of the list item and the form grade information, a second sub-correspondence between the shape of the list item and the form grade information, and a third sub-correspondence between the structural elements of the list item and the form grade information. Different weights are set for the form grade information corresponding to the different kinds of form information, and the form grade information obtained from at least one sub-correspondence is integrated based on the set weights to determine the final form grade information.
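One possible way to integrate the per-aspect form grades is a weighted average over whichever aspects are available. The normalisation by the sum of weights, the aspect names and the weight values are all assumptions made for this sketch.

```python
from typing import Dict


def integrated_grade(grades: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the form grades that are actually available."""
    used = [name for name in grades if name in weights]
    total_weight = sum(weights[name] for name in used)
    if total_weight <= 0:
        raise ValueError("no usable form grades or weights")
    # Normalise by the weights in use so missing aspects do not bias the result.
    return sum(grades[name] * weights[name] for name in used) / total_weight
```

For instance, with hypothetical weights {"ratio": 0.5, "shape": 0.25, "elements": 0.25} and grades known only for "ratio" and "shape", the result is the average of those two grades weighted 2:1.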

Alternatively, the correspondence between the form information and the form grade information may include separate correspondences between the length and width of the list item and the form grade information, one for each list item shape. The correspondence to use is first selected based on the shape of the list item, and the form grade information corresponding to the form information is then determined from the selected correspondence between length and width and form grade information. It should be noted that the form grade information may also be determined in other ways, and the determination is not limited to the methods described above.

Step 2033, replacing j pieces of form information in the j unordered cluster block sets with j pieces of form grade information in a one-to-one correspondence manner, and obtaining j replaced unordered cluster block sets.

The search service platform sequentially replaces the form information in the j unordered cluster block sets with form grade information to obtain j replaced unordered cluster block sets, for example, the search service platform replaces the form information in the c-th unordered cluster block set with form grade information, then continues to replace the form information in the c + 1-th unordered cluster block set with form grade information until the form information in the j unordered cluster block sets is replaced, and c is a positive integer greater than or equal to 1 and less than or equal to j.
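Step 2033 can be sketched as a pass over the j unordered sets that swaps the detailed form information for a grade. The dictionary layout (keys "type", "results", "form_info", "form_grade") is a hypothetical representation of a cluster block set, not one defined by the source.

```python
from typing import Any, Callable, Dict, List

BlockSet = Dict[str, Any]


def replace_form_info(
    block_sets: List[BlockSet],
    grade_of: Callable[[Dict[str, Any]], int],
) -> List[BlockSet]:
    """Return copies of the sets with form_info replaced by form_grade."""
    replaced: List[BlockSet] = []
    for block in block_sets:
        # Copy everything except the detailed form information.
        new_block = {k: v for k, v in block.items() if k != "form_info"}
        new_block["form_grade"] = grade_of(block["form_info"])
        replaced.append(new_block)
    return replaced
```

The input sets are left unchanged, so the detailed form information remains available for display on the terminal while the scoring model sees only the grade.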

Step 2034, based on each replaced unordered cluster block set, call a state transfer function to perform state transfer on the first state features, and generate j second state features.

Based on each replaced unordered cluster block set, the search service platform calls the state transfer function to perform state transfer on the first state feature, generating j second state features.

Optionally, the first state feature includes the search content and the search user feature carried by the search request corresponding to the search results, together with a first state, where the first state is the sorting state of the i sorted cluster block sets at the t-th discrete time. Illustratively, if the search content is denoted as $Q$ and the search user feature as $U$, the first state feature is denoted as $S_t = \{Q, U, Z_t\} = \{Q, U, [a_1, a_2, \ldots, a_i]\}$. For the j replaced unordered cluster block sets $A_1, A_2, \ldots, A_j$, a state transfer function $T$ is applied to the first state feature $S_t$ based on the g-th replaced unordered cluster block set $A_g$, giving the second state feature

$$S_{t+1} = T(S_t, A_g) = T(\{Q, U, Z_t\}, A_g) = \{Q, U, [Z_t + A_g]\} = \{Q, U, [a_1, a_2, \ldots, a_i, A_g]\} = \{Q, U, Z_{t+1}\},$$

where $Z_{t+1}$ is the sorting state of the i sorted cluster block sets and the g-th unordered cluster block set at the (t+1)-th discrete time; that is, the g-th unordered cluster block set $A_g$ is appended to $Z_t$ as the (i+1)-th sorted cluster block set $a_{i+1}$. In this way, j second state features corresponding to the j unordered cluster block sets are obtained.

Optionally, the search service platform reads the first state in the first state feature; calling a state transfer function to perform state transfer on the first state based on each replaced unordered cluster block set to generate j second states, wherein the g second state is an ordered state of the i ordered cluster block sets and the g unordered cluster block sets at t +1 discrete time; and respectively combining the search content and the search user characteristics with each second state to generate j second state characteristics.

Illustratively, the search service platform reads the first state $Z_t$ from the first state feature $\{Q, U, Z_t\}$. For the j replaced unordered cluster block sets $A_1, A_2, \ldots, A_j$, the state transfer function $T$ is applied to the first state $Z_t$ based on the g-th replaced unordered cluster block set $A_g$, giving the second state $Z_{t+1} = T(Z_t, A_g) = [a_1, a_2, \ldots, a_i, A_g]$. The search content $Q$ and the search user feature $U$ are then combined with each second state $Z_{t+1}$ to generate j second state features $\{Q, U, Z_{t+1}\}$.
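The state transfer described here amounts to appending one candidate set to the current ordering while carrying the search content and user feature along unchanged. The sketch below uses a hypothetical StateFeature record and plain strings for Q, U and the block sets.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class StateFeature:
    query: str                # search content Q
    user: str                 # search user feature U
    ordered: Tuple[str, ...]  # sorting state Z_t = [a_1, ..., a_i]


def transfer(state: StateFeature, block: str) -> StateFeature:
    """State transfer T: append block A_g to Z_t, keeping Q and U."""
    return StateFeature(state.query, state.user, state.ordered + (block,))


def candidate_states(state: StateFeature, unordered: Sequence[str]) -> List[StateFeature]:
    """One second state feature per unordered set (j in total)."""
    return [transfer(state, block) for block in unordered]
```

Because StateFeature is immutable, the first state feature can be reused to derive all j second state features without one candidate affecting another.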

It should be noted that the replacement of the form information of the list item form in the unordered cluster block sets may also be performed before the cluster block sets are sorted; in that case, the replaced unordered cluster block sets are obtained directly during the sorting process.

In summary, the sorting method provided by this embodiment replaces form information such as the length, width, shape and structural elements of the list items in the unordered cluster block sets with form grade information. Compared with the detailed form information, the form grade information still clearly expresses the degree of influence of the list item form on the sorting of the cluster block sets while being easier to extract features from, so the computation required for machine learning is reduced and the response efficiency of the sorting is improved.

In the method, search content and search user characteristics are added into the first state characteristics, namely, the relevance of search results and search content and user requirements are also considered, so that the blocks displayed after sorting better meet the user requirements, the click rate of the blocks is further improved, and even the conversion rate of the search results in the blocks is improved.

The scoring model used in the above embodiments is obtained by DQN training. The training method of the scoring model is described in detail below; as shown in fig. 5, the steps are as follows:

step 301, obtaining historical sorting data from a database, where the historical sorting data includes historical block sets corresponding to at least two historical search requests.

The historical block aggregation set comprises historical result sets of historical search results of the same type and historical form information of list item forms when the historical result sets are displayed on the terminal, and the historical form information and the types of the historical search results in the historical result sets have one-to-one correspondence.

Illustratively, the scoring model is trained on a model training platform, for example, a search service platform may also be used as the model training platform; the model training platform is arranged on the server. Historical sorting data is stored in the database, and the historical sorting data can be directly stored in the database after being generated in the application process by the scoring model, or collected and stored in the database after being sorted according to the data format of the historical block set.

The historical block set also carries click feedback of the historical block set, and the click feedback indicates whether the historical block set is clicked to check during sequencing display; for example, the click feedback may be represented by "0" and "1", where "0" represents that the historical chunk set was not clicked to view in the ranking presentation, and "1" represents that the historical chunk set was clicked to view in the ranking presentation.

Step 302, based on the ranking order of the history cluster sets corresponding to each history search request, generating a first history state feature at the T-th discrete time for the first k history cluster sets, and generating a second history state feature at the T + 1-th discrete time for the first k +1 history cluster sets.

For each historical search request, the model training platform generates historical state features for the historical cluster block sets corresponding to that request. That is, based on the arrangement order of the historical cluster block sets corresponding to the historical search request, a first historical state feature at the T-th discrete time is generated for the first k historical cluster block sets, and a second historical state feature at the (T+1)-th discrete time is generated for the first k+1 historical cluster block sets.

Illustratively, the state feature at the 0th discrete time is generated as $S_0 = \Phi$ (the empty set); if k takes 1 at the 1st discrete time, the state feature at the 1st discrete time is generated as $S_1 = \{a_1\}$; by analogy, the state feature at the 2nd discrete time is $S_2 = \{a_1, a_2\}$, ..., the state feature at the T-th discrete time is $S_T = \{a_1, a_2, \ldots, a_T\}$, and the state feature at the (T+1)-th discrete time is $S_{T+1} = \{a_1, a_2, \ldots, a_{T+1}\}$.

For example, when the first historical state feature at the T-th discrete time is known, the state transfer function may be used to perform state transfer on the first historical state feature based on the historical cluster block set arranged at the (T+1)-th discrete time, so as to obtain the second historical state feature.
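Under the prefix construction above, the pairs of first and second historical state features for one historical search request can be derived directly from its ranked list. The function name and the tuple representation are hypothetical; search content and user features are omitted to keep the sketch short.

```python
from typing import List, Sequence, Tuple

Transition = Tuple[Tuple[str, ...], str, Tuple[str, ...]]


def history_transitions(ranked: Sequence[str]) -> List[Transition]:
    """Build (S_T, a_{T+1}, S_{T+1}) for T = 0 .. len(ranked) - 1."""
    transitions: List[Transition] = []
    for t in range(len(ranked)):
        first_state = tuple(ranked[:t])       # S_T: first T historical sets
        next_block = ranked[t]                # a_{T+1}: set arranged next
        second_state = tuple(ranked[:t + 1])  # S_{T+1}: first T+1 sets
        transitions.append((first_state, next_block, second_state))
    return transitions
```

The first transition starts from the empty state, and each second state equals the next transition's first state, mirroring the step-by-step state transfer.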

Step 303, inputting the first historical state feature and the second historical state feature of each historical search request into a deep reinforcement learning network, and training the deep reinforcement learning network to obtain a scoring model, where T and k are non-negative integers.

The DQN includes a first network and a second network. Illustratively, the first network refers to an evaluation network (eval-net); the second network is referred to as a target-net, also called fixed network. In the training process, the loss calculated by a loss function (loss function) is used for updating the network parameters in the first network; correspondingly, the model training platform calculates a first Q value corresponding to the first historical state characteristic by adopting a first network; calculating a second Q value corresponding to the second historical state characteristic by adopting a second network; and updating the network parameters in the first network according to the first Q value and the second Q value until the error between the first Q value and the second Q value is converged to obtain the trained scoring model.

Exemplarily, fig. 6 shows a schematic diagram of the structure of the DQN. The first historical state feature $S_T$ and the historical cluster block set $A(S_T)$ arranged at the (T+1)-th discrete time are input into the first network 21 to obtain a first Q value $Q(S_T, A(S_T))$; the second historical state feature $S_{T+1}$ and the historical cluster block set $A(S_{T+1})$ arranged at the (T+2)-th discrete time are input into the second network 22 to obtain a second Q value $Q(S_{T+1}, A(S_{T+1}))$. The error Loss between the first Q value and the second Q value is calculated by the following formula:

$$\mathrm{Loss} = R_T + \gamma \max Q(S_{T+1}, A(S_{T+1})) - Q(S_T, A(S_T)) \tag{1}$$

updating the network parameters of the first network based on the error Loss, namely updating the Q value in the first network, wherein the updating formula is as follows:

$$Q(S_T, A(S_T)) \leftarrow Q(S_T, A(S_T)) + \alpha \left[ R_T + \gamma \max Q(S_{T+1}, A(S_{T+1})) - Q(S_T, A(S_T)) \right] \tag{2}$$

where $R_T$ in the above formulas denotes the click feedback of $A(S_T)$; for example, if k is 1 at the 1st discrete time, $A(S_T)$ is $a_{T+1}$; $\gamma$ and $\alpha$ are network parameters set in the model; and $\max Q(S_{T+1}, A(S_{T+1}))$ denotes the maximum of the values $Q(S_{T+1}, A(S_{T+1}))$ corresponding to at least two candidates $A(S_{T+1})$.
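Formulas (1) and (2) can be checked numerically with scalar Q values standing in for the outputs of the two networks. The function names are hypothetical, and treating an empty candidate list as contributing 0 (a terminal state) is an assumption not stated in the source.

```python
from typing import Sequence


def td_error(
    reward: float,
    gamma: float,
    next_q_values: Sequence[float],
    current_q: float,
) -> float:
    """Formula (1): R_T + gamma * max Q(S_{T+1}, .) - Q(S_T, A(S_T))."""
    # Assumption: with no candidate next sets, the future term is 0.
    best_next = max(next_q_values) if next_q_values else 0.0
    return reward + gamma * best_next - current_q


def q_update(current_q: float, alpha: float, loss: float) -> float:
    """Formula (2): move the first Q value toward the target by alpha * Loss."""
    return current_q + alpha * loss
```

For a clicked set (reward 1.0) with gamma 0.5, candidate second Q values 0.2 and 0.4, and a first Q value of 0.5, the error is 1.0 + 0.5 * 0.4 - 0.5; applying alpha 0.1 then nudges the first Q value upward by one tenth of that error.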

Optionally, the model training platform further copies and updates the network parameters in the first network to the second network according to a specified period, and based on the updated second network, the model training platform cooperates with the first network to continue the overall training of the DQN.
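The periodic copy from the first network to the second network can be sketched with parameter dictionaries in place of real networks. The class name, the sync_period parameter and the dictionary representation are all hypothetical.

```python
import copy
from typing import Dict


class TwoNetworks:
    """Holds eval-net and target-net parameters with periodic synchronisation."""

    def __init__(self, params: Dict[str, float], sync_period: int) -> None:
        if sync_period <= 0:
            raise ValueError("sync_period must be positive")
        self.eval_params = dict(params)
        self.target_params = copy.deepcopy(self.eval_params)
        self.sync_period = sync_period
        self.steps = 0

    def train_step(self, new_eval_params: Dict[str, float]) -> None:
        """Record one update of the first network; sync every sync_period steps."""
        self.eval_params = dict(new_eval_params)
        self.steps += 1
        if self.steps % self.sync_period == 0:
            # Copy the first network's parameters into the second network.
            self.target_params = copy.deepcopy(self.eval_params)
```

Between synchronisations the second network stays fixed, so the target in formula (1) does not shift with every update of the first network.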

In summary, in the training method for the scoring model provided in this embodiment, the DQN is trained by using the historical block set, the historical block set includes both the historical result set of the historical search result of the same type and the form information of the historical list item form corresponding to the type of the historical search result in the historical block set, so that the trained scoring model can evaluate a better second state based on the first state at the tth discrete time, thereby determining the block set ordered at the t +1 th discrete time, and fully considering the influence of the list item form corresponding to each block on the space utilization rate of the display area, thereby achieving a better effect of utilizing the list item display area on the terminal, for example, achieving an effect of improving the space utilization rate of the list item display area on the page.

In some alternative embodiments, the form information of the historical list item form in the historical cluster block sets used to train the DQN may also be replaced by historical form grade information. For example, as shown in fig. 7, step 302 in the embodiment shown in fig. 5 may include the following steps 3021 to 3026:

step 3021, based on the ranking order of the history cluster block sets corresponding to each history search request, generating a first history state feature at the Tth discrete time for the first k history cluster block sets.

Illustratively, if T is 0, then k is 0 and the historical state feature at the 0th discrete time is generated as $Z_0 = \Phi$ (the empty set); if T > 0, the first historical state feature at the T-th discrete time may be obtained by calling the state transfer function to perform state transfer on the historical state feature at the (T-1)-th discrete time based on the historical cluster block set arranged at the T-th discrete time.

Alternatively, the first historical state feature at the T-th discrete time may be obtained by concatenating the first k historical cluster block sets in their arrangement order.

Step 3022, a T +1 th history cluster block set arranged at the T +1 th discrete time in the arrangement order is obtained.

And the model training platform acquires the T +1 th historical block aggregation arranged at the T +1 th discrete time according to the arrangement sequence of the historical block aggregation corresponding to the historical search request.

Step 3023, reading the historical form information in the T +1 th historical block set.

The historical block collection comprises historical form information in a historical list item form, and the model training platform reads the historical form information from the T +1 th historical block collection. Optionally, the history form information includes at least one of length, width, shape, and structural element of the history list item.

And step 3024, finding out historical form grade information corresponding to the historical form information in the T +1 th historical block set based on the corresponding relation.

The model training platform has a corresponding relation between form information and form grade information, and the form grade information is used for representing the influence degree of a list item form on the sorting of the aggregation block set; and the model training platform finds out historical form grade information corresponding to the historical form information in the T +1 th historical block set based on the corresponding relation.

Optionally, the historical form information includes the length and width of the historical list item, and the corresponding relationship includes the corresponding relationship between the length-width ratio of the list item and the form level information; for the determination of the historical form grade information, the model training platform calculates the historical length-width ratio according to the length and width of the historical list items in the T +1 th historical block aggregation set; searching historical form grade information corresponding to the historical length-width ratio based on the corresponding relation; wherein, the length-width ratio of the list item is in negative correlation with the form grade information.

Optionally, the historical form information includes the shape of the historical list item, and the correspondence includes a correspondence between the shape of the list item and the form grade information. To determine the historical form grade information, the model training platform finds the historical form grade information corresponding to the shape of the historical list item in the (T+1)-th historical cluster block set based on the correspondence, where the form grade information is positively correlated with the proportion of the bounding rectangle that the shape occupies.

Optionally, the historical form information includes the length and width of the historical list item and its structural elements, and the correspondence includes a correspondence between the density of the structural elements distributed on the list item and the historical form grade information. To determine the historical form grade information, the model training platform calculates the distribution density of the structural elements on the list item from the length and width of the historical list item in the (T+1)-th historical cluster block set and its structural elements, and then looks up the historical form grade information corresponding to that density based on the correspondence, where the form grade information is positively correlated with the distribution density of the structural elements on the list item. It should be noted that the form grade information may also be determined in other ways, and the determination is not limited to the methods described above.

And step 3025, replacing the historical form information in the T +1 th historical block clustering set with the historical form grade information to obtain a replaced T +1 th historical block clustering set.

And the model training platform replaces the historical form information in the T +1 th historical block clustering set with the historical form grade information to obtain the T +1 th historical block clustering set after replacement.

Step 3026, based on the T +1 th history block set after replacement, calling a state transfer function to perform state transfer on the first history state feature, and generating a second history state feature.

And calling a state transfer function to perform state transfer on the first historical state feature by the model training platform based on the T +1 th historical block set after replacement to generate a second historical state feature.

Optionally, the first historical state feature includes the historical search content and the historical search user feature carried by the historical search request, together with a first historical state, where the first historical state is the sorting state of the first k historical cluster block sets at the T-th discrete time. Optionally, to generate the second historical state feature, the first historical state is read from the first historical state feature; based on the replaced (T+1)-th historical cluster block set, the state transfer function is called to perform state transfer on the first historical state to generate a second historical state, where the second historical state is the sorting state of the first k+1 historical cluster block sets at the (T+1)-th discrete time; and the historical search content, the historical search user feature and the second historical state are combined in sequence to generate the second historical state feature.

In summary, in the training method for the scoring model provided in this embodiment, form information such as the length, width, shape and structural elements of the list items in the historical cluster block sets is replaced with form grade information. Compared with the detailed form information, the form grade information still clearly expresses the degree of influence of the list item form on the sorting of the cluster block sets while being easier to extract features from, so the computation required for machine learning is reduced, the training efficiency of the scoring model is improved, and the response efficiency of the sorting is improved when the scoring model is applied.

Fig. 8 is a block diagram illustrating an apparatus for sorting a collection of blocks according to an exemplary embodiment of the present application. The apparatus may be implemented as part or all of a server or a terminal by software, hardware, or a combination of both. The device includes:

a determining module 401, configured to determine at least two aggregation block sets, where the aggregation block sets include result sets of search results of the same type and form information of list item forms when the result sets are displayed on a terminal, where the form information and types of the search results in the result sets have a one-to-one correspondence relationship, and the at least two aggregation block sets include i sorted aggregation block sets and j unordered aggregation block sets;

an obtaining module 402, configured to obtain first state features corresponding to the i sorted aggregation block sets, where the first state features are used to indicate sorting states of the i sorted aggregation block sets at a tth discrete time;

a generating module 403, configured to invoke a state transfer function to perform state transfer on the first state feature based on each unordered cluster block set, and generate j second state features, where the g-th second state feature is used to indicate the sorting state of the i sorted cluster block sets and the g-th unordered cluster block set at the (t+1)-th discrete time;

the scoring module 404 is configured to sequentially input the j second state features into a scoring model, and the scoring model outputs a scoring value of each unordered cluster block set when the unordered cluster block sets are arranged at the t +1 th discrete time;

a determining module 401, configured to determine an unordered block set corresponding to the maximum score value as a block set arranged at the t +1 th discrete time, where i and t are non-negative integers, j and g are positive integers, and g is less than or equal to j.

In some embodiments, the search service platform includes a corresponding relationship between form information and form grade information, where the form grade information is used to indicate a degree of influence of a list item form on the sorting of the aggregation block set; the generating module 403 includes:

the first reading submodule 4031 is used for reading the form information in each unordered cluster block set to obtain j form information;

the first searching submodule 4032 is configured to search, based on the correspondence, form level information corresponding to each piece of form information to obtain j pieces of form level information corresponding to the j pieces of form information;

a first replacement sub-module 4033, configured to replace j pieces of formal information in the j unordered cluster block sets with j pieces of formal grade information in a one-to-one correspondence manner, so as to obtain j replaced unordered cluster block sets;

the first generation submodule 4034 is configured to call a state transfer function to perform state transfer on the first state feature based on each replaced unsorted cluster set, and generate j second state features.

In some embodiments, the form information includes the length and width of the list item; the correspondence includes a correspondence between the length-width ratio of the list item and the form grade information;

the first lookup submodule 4032 is used for calculating the length-width ratio of the list items according to the length and the width of the list items; finding out form grade information corresponding to the length-width ratio based on the corresponding relation to obtain j form grade information corresponding to the j form information; wherein, the length-width ratio of the list item is in negative correlation with the form grade information.

In some embodiments, the first state features include search content and search user features carried by a search request corresponding to a search result, and a first state, where the first state is a sorting state of the i sorted aggregation sets at a tth discrete time;

a first generation submodule 4034 for reading a first state in the first state feature; calling a state transfer function to perform state transfer on the first state based on each replaced unordered cluster block set to generate j second states, wherein the g second state is an ordered state of the i ordered cluster block sets and the g unordered cluster block sets at t +1 discrete time; and respectively combining the search content and the search user characteristics with each second state to generate j second state characteristics.

In some embodiments, the scoring model is obtained by deep reinforcement learning network training, and the scoring value is a Q value;

the scoring module 404 is configured to perform word embedding on each second state feature to obtain j second state feature vectors, and to input each second state feature vector into the scoring model, which outputs the Q value of each unordered cluster block set when it is arranged at the (t+1)-th discrete time.
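
The selection step that follows scoring can be sketched as below. The scoring model is represented by an arbitrary callable `q_value`, which stands in for the trained network together with its embedding step; that callable and the function name `pick_next` are assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple, TypeVar

S = TypeVar("S")


def pick_next(
    candidates: Sequence[S], q_value: Callable[[S], float]
) -> Tuple[int, float]:
    """Score every candidate second state feature and return the index of
    the highest-scoring one together with its Q value.

    On ties, the earliest candidate wins.
    """
    if not candidates:
        raise ValueError("at least one unordered cluster block set is required")
    scores = [q_value(candidate) for candidate in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

The returned index identifies which unordered cluster block set is arranged at the (t+1)-th discrete time.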

In some embodiments, the apparatus further comprises a training module 405 for:

obtaining historical sorting data from a database, wherein the historical sorting data comprises at least two historical block aggregation sets corresponding to historical search requests, each historical block aggregation set comprises a historical result set of historical search results of the same type and historical form information of a list item form when the historical result set is displayed on a terminal, and the historical form information and the types of the historical search results in the historical result set have one-to-one correspondence;

generating, based on the arrangement order of the historical block aggregation sets corresponding to each historical search request, a first historical state feature at the T-th discrete time for the first k historical block aggregation sets, and a second historical state feature at the (T+1)-th discrete time for the first k+1 historical block aggregation sets;

and inputting the first historical state feature and the second historical state feature of each historical search request into a deep reinforcement learning network, and training the deep reinforcement learning network to obtain the scoring model, where T and k are non-negative integers.
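
One way to derive the training pairs from a single historical search request is sketched below. It assumes that every prefix of the historical arrangement order yields one pair (the first k sets and the first k+1 sets), and it represents a state feature as a plain `(query, user, state)` tuple; the function name `history_transitions` and this layout are hypothetical.

```python
from typing import List, Sequence, Tuple

StateFeature = Tuple[str, str, tuple]


def history_transitions(
    query: str, user: str, ordered_blocks: Sequence
) -> List[Tuple[StateFeature, StateFeature]]:
    """Pair the state of the first k sets with the state of the first k+1
    sets, for k = 0 .. n-1, following the historical arrangement order."""
    pairs = []
    for k in range(len(ordered_blocks)):
        first = (query, user, tuple(ordered_blocks[:k]))
        second = (query, user, tuple(ordered_blocks[:k + 1]))
        pairs.append((first, second))
    return pairs
```

A request whose history shows two block aggregation sets therefore contributes two pairs, the first of which starts from the empty sorting state.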

In some embodiments, there is a correspondence between form information and form grade information, and the form grade information indicates the degree to which the list item form influences the sorting of the cluster block sets; the training module 405 includes:

an obtaining submodule 4051, configured to obtain the (T+1)-th historical block aggregation set, which is arranged at the (T+1)-th discrete time in the arrangement order;

a second reading submodule 4052, configured to read the historical form information in the (T+1)-th historical block aggregation set;

a second searching submodule 4053, configured to look up, based on the correspondence, the historical form grade information corresponding to the historical form information in the (T+1)-th historical block aggregation set;

a second replacement submodule 4054, configured to replace the historical form information in the (T+1)-th historical block aggregation set with the historical form grade information, so as to obtain a replaced (T+1)-th historical block aggregation set;

a second generating submodule 4055, configured to call a state transfer function to perform state transfer on the first historical state feature based on the replaced (T+1)-th historical block aggregation set, so as to generate the second historical state feature.

In some embodiments, the historical form information includes the length and width of a historical list item, and the correspondence includes a correspondence between the length-width ratio of the list item and the form grade information;

the second searching submodule 4053 is configured to calculate a historical length-width ratio from the length and width of the historical list item in the (T+1)-th historical block aggregation set, and to look up the historical form grade information corresponding to the historical length-width ratio based on the correspondence; the length-width ratio of the list item is negatively correlated with the form grade information.

In some embodiments, the first historical state feature includes the historical search content and historical search user features carried by the historical search request, and a first historical state, where the first historical state is the sorting state of the first k historical block aggregation sets at the T-th discrete time;

the second generating submodule 4055 is configured to read the first historical state from the first historical state feature; call the state transfer function to perform state transfer on the first historical state based on the replaced (T+1)-th historical block aggregation set, so as to generate a second historical state, where the second historical state is the sorting state of the first k+1 historical block aggregation sets at the (T+1)-th discrete time; and combine the historical search content, the historical search user features and the second historical state, so as to generate the second historical state feature.

In some embodiments, the deep reinforcement learning network comprises a first network and a second network; the training module 405 further includes:

the training submodule 4056 is configured to calculate, with the first network, a first Q value corresponding to the first historical state feature; calculate, with the second network, a second Q value corresponding to the second historical state feature; and update the network parameters of the first network according to the first Q value and the second Q value until the error between the first Q value and the second Q value converges, so as to obtain the trained scoring model.

In some embodiments, the training module 405 further comprises:

the update submodule 4057 is configured to copy the network parameters of the first network to the second network at a specified period.
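
The interplay of the first network, the second network and the periodic parameter copy can be illustrated with the toy sketch below. It is not the application's implementation: a linear function over a feature vector stands in for each deep network, and the reward term, the discount factor `gamma`, the learning rate `lr` and the synchronisation period `sync_every` are all assumptions that this passage does not specify.

```python
from typing import List, Sequence, Tuple

Transition = Tuple[Sequence[float], float, Sequence[float]]


def q(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear stand-in for a Q network: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train(
    transitions: Sequence[Transition],
    dim: int,
    lr: float = 0.1,
    gamma: float = 0.9,
    sync_every: int = 10,
    epochs: int = 200,
) -> List[float]:
    """Fit the first (online) network against targets built from the second
    (target) network, copying parameters across every `sync_every` steps."""
    first_net = [0.0] * dim
    second_net = list(first_net)
    step = 0
    for _ in range(epochs):
        for features, reward, next_features in transitions:
            first_q = q(first_net, features)          # first Q value
            second_q = q(second_net, next_features)   # second Q value
            error = reward + gamma * second_q - first_q
            # Gradient step on the squared error, applied to the first network only.
            first_net = [w + lr * error * x for w, x in zip(first_net, features)]
            step += 1
            if step % sync_every == 0:
                second_net = list(first_net)          # periodic copy
    return first_net
```

Holding the second network fixed between copies keeps the target from shifting on every update, which is the usual motivation for using two networks in this kind of training.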

In summary, in the sorting apparatus for cluster block sets provided in this embodiment, the search service platform constructs each cluster block set so that it includes both a result set of search results of the same type and the form information of the list item form corresponding to that type. Based on the cluster block sets, the platform then determines the order in which the blocks, each aggregated from search results of the same type, are arranged when displayed on the terminal. The influence of each block's list item form on the space utilization of the display area is thus taken into account, so the list item display area on the terminal is put to better use; for example, the space utilization of the list item display area on a page can be improved.

The apparatus also uses a scoring model trained with a deep Q-network (DQN). Based on the sorting state at the current discrete time, it can determine a more accurate and more suitable sorting state at the next discrete time, and can determine more efficiently which cluster block set is arranged at the next discrete time.

Fig. 9 shows a schematic structural diagram of a computer device provided in an exemplary embodiment of the present application. The computer device may be a device that executes the sorting method for cluster block sets or the training method of the scoring model provided herein, and may be a terminal or a server. Specifically:

The computer device 500 includes a Central Processing Unit (CPU) 501, a system memory 504 including a Random Access Memory (RAM) 502 and a Read-Only Memory (ROM) 503, and a system bus 505 connecting the system memory 504 and the central processing unit 501. The computer device 500 also includes a basic Input/Output System (I/O System) 506, which facilitates information transfer between the devices within the computer, and a mass storage device 507, which stores an operating system 513, application programs 514, and other program modules 515.

The basic input/output system 506 comprises a display 508 for displaying information and an input device 509, such as a mouse or keyboard, for user input of information. The display 508 and the input device 509 are both connected to the central processing unit 501 through an input/output controller 510 connected to the system bus 505. The basic input/output system 506 may also include the input/output controller 510 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 510 also provides output to a display screen, a printer, or another type of output device.

The mass storage device 507 is connected to the central processing unit 501 through a mass storage controller (not shown) connected to the system bus 505. The mass storage device 507 and its associated computer-readable media provide non-volatile storage for the computer device 500. That is, mass storage device 507 may include a computer-readable medium (not shown) such as a hard disk or Compact disk Read Only Memory (CD-ROM) drive.

Computer-readable media may include computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state memory technology such as Solid State Drives (SSD), CD-ROM, Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. The Random Access Memory may include a Resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). Of course, those skilled in the art will appreciate that computer storage media are not limited to the foregoing. The system memory 504 and the mass storage device 507 described above may be collectively referred to as memory.

According to various embodiments of the present application, the computer device 500 may also run while connected, through a network such as the Internet, to a remote computer on that network. That is, the computer device 500 may be connected to the network 512 through the network interface unit 511 connected to the system bus 505, or may use the network interface unit 511 to connect to other types of networks or remote computer systems (not shown).

The memory further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU.

In an alternative embodiment, a computer device is provided, which includes a processor and a memory, wherein at least one instruction, at least one program, a code set, or an instruction set is stored in the memory, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the sorting method for cluster block sets or the training method of the scoring model as described above.

In an alternative embodiment, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a code set, or an instruction set is stored, which is loaded and executed by a processor to implement the sorting method for cluster block sets or the training method of the scoring model as described above.

Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The present application further provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or an instruction set is stored, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the sorting method for cluster block sets or the training method of the scoring model provided in the foregoing method embodiments.

The present application also provides a computer program product comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the sorting method for cluster block sets or the training method of the scoring model as described above.

It should be understood that "a plurality" herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.

The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

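1. A method for sorting cluster block sets, applied to a search service platform, the method comprising:

determining at least two cluster block sets, wherein each cluster block set comprises a result set of search results of a same type and form information of a list item form when the result set is displayed on a terminal, the form information and the type of the search results in the result set have a one-to-one correspondence, and the at least two cluster block sets comprise i sorted cluster block sets and j unordered cluster block sets;

acquiring first state features corresponding to the i sorted cluster block sets, wherein the first state features are used for indicating the sorting states of the i sorted cluster block sets at the t-th discrete time;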

based on each unordered cluster block set, calling a state transfer function to perform state transfer on the first state feature to generate j second state features, wherein the g-th second state feature is used for indicating the sorting states of the i sorted cluster block sets and the g-th unordered cluster block set at the (t+1)-th discrete time;

sequentially inputting the j second state features into a scoring model, the scoring model outputting a score value for each unordered cluster block set when arranged at the (t+1)-th discrete time;

and determining the unordered cluster block set corresponding to the maximum score value as the cluster block set arranged at the (t+1)-th discrete time, wherein i and t are non-negative integers, j and g are positive integers, and g is less than or equal to j.

2. The method according to claim 1, wherein the search service platform includes a corresponding relationship between form information and form grade information, and the form grade information is used for representing the degree of influence of list item forms on the sorting of the aggregation block set;

the calling a state transfer function to perform state transfer on the first state features based on each unordered cluster block set to generate j second state features, including:

reading form information in each unordered cluster block set to obtain j form information;

finding out form grade information corresponding to each form information based on the corresponding relation to obtain j form grade information corresponding to the j form information;

replacing the j pieces of form information in the j unordered cluster block sets with the j pieces of form grade information in a one-to-one correspondence manner to obtain j replaced unordered cluster block sets;

and calling the state transfer function to perform state transfer on the first state features based on each replaced unordered cluster block set to generate j second state features.

3. The method of claim 2, wherein the form information includes a length and a width of a list item; the corresponding relation comprises the corresponding relation between the length-width ratio of the list items and the form grade information;

the finding out the form grade information corresponding to each form information based on the corresponding relationship to obtain j form grade information corresponding to the j form information includes:

calculating the length-width ratio of the list items according to the length and the width of the list items;

finding out the form grade information corresponding to the length-width ratio based on the corresponding relation to obtain the j form grade information corresponding to the j form information;

wherein the length-width ratio of the list item is negatively correlated with the form grade information.

4. The method according to claim 2 or 3, wherein the first state features include search content and search user features carried by a search request corresponding to the search result, and a first state, and the first state is a sorting state of the i sorted aggregation block sets at the t-th discrete time;

the calling the state transfer function to perform state transfer on the first state feature based on each replaced unordered cluster block set to generate the j second state features, including:

reading the first state in the first state feature;

calling the state transfer function to perform state transfer on the first state based on each replaced unordered cluster block set to generate j second states, wherein the g-th second state is an ordered state of the i ordered cluster block sets and the g-th unordered cluster block set at the t + 1-th discrete time;

and combining the search content and the search user characteristics with each second state respectively to generate j second state characteristics.

5. The method according to any one of claims 1 to 4, wherein the scoring model is obtained by deep reinforcement learning network training, and the scoring value is a Q value;

the sequentially inputting the j second state features into a scoring model, and outputting score values of each unordered cluster block set arranged at the t +1 th discrete time by the scoring model, including:

performing word embedding processing on each second state feature to obtain j second state feature vectors;

inputting each second state feature vector into the scoring model, and outputting, by the scoring model, a Q value of each unordered cluster block set when arranged at the t +1 th discrete time.

6. The method of claim 5, wherein the scoring model is trained as follows:

obtaining historical sorting data from a database, wherein the historical sorting data comprises at least two historical block aggregation sets corresponding to historical search requests, the historical block aggregation sets comprise historical result sets of historical search results of the same type and historical form information of list item forms of the historical result sets when the historical result sets are displayed on the terminal, and the historical form information and the types of the historical search results in the historical result sets have one-to-one correspondence;

generating, based on the arrangement order of the historical block aggregation sets corresponding to each historical search request, first historical state features at the T-th discrete time for the first k historical block aggregation sets, and second historical state features at the (T+1)-th discrete time for the first k+1 historical block aggregation sets;

inputting the first historical state feature and the second historical state feature of each historical search request into the deep reinforcement learning network, and training the deep reinforcement learning network to obtain the scoring model, wherein T and k are non-negative integers.

7. The method according to claim 6, wherein there is a corresponding relationship between form information and form level information, and the form level information is used for representing the degree of influence of list item forms on the sorting of the aggregation block set;

generating a second historical state feature at the T +1 discrete time for the first k +1 historical block aggregation sets, including:

obtaining the (T+1)-th historical block aggregation set, which is arranged at the (T+1)-th discrete time in the arrangement order;

reading historical form information in the (T+1)-th historical block aggregation set;

finding out, based on the corresponding relation, historical form grade information corresponding to the historical form information in the (T+1)-th historical block aggregation set;

replacing the historical form information in the (T+1)-th historical block aggregation set with the historical form grade information to obtain a replaced (T+1)-th historical block aggregation set;

and calling a state transfer function to perform state transfer on the first historical state feature based on the replaced (T+1)-th historical block aggregation set to generate the second historical state feature.

8. The method according to claim 7, wherein the historical form information includes length and width of the historical list item, and the corresponding relationship includes the corresponding relationship between the length-width ratio of the list item and the form grade information;

the finding of the historical form grade information corresponding to the historical form information in the T +1 th historical block set based on the corresponding relationship comprises:

calculating a historical length-width ratio according to the length and width of the historical list items in the T +1 th historical block aggregation set;

searching the historical form grade information corresponding to the historical length-width ratio based on the corresponding relation;

wherein the length-width ratio of the list item is negatively correlated with the form grade information.

9. The method according to claim 7 or 8, wherein the first historical state features include historical search content and historical search user features carried by the historical search request, and a first historical state, and the first historical state is the sorting state of the first k historical block aggregation sets at the T-th discrete time;

the calling the state transfer function to perform state transfer on the first historical state feature based on the replaced T +1 th historical block set to generate the second historical state feature includes:

reading the first historical state in the first historical state feature;

calling the state transfer function to perform state transfer on the first historical state based on the replaced (T+1)-th historical block aggregation set to generate a second historical state, wherein the second historical state is the sorting state of the first k+1 historical block aggregation sets at the (T+1)-th discrete time;

and sequentially combining the historical search content, the historical search user characteristics and the second historical state to generate the second historical state characteristics.

10. The method of claim 6, wherein the deep reinforcement learning network comprises a first network and a second network;

the inputting the first historical state feature and the second historical state feature of each historical search request into the deep reinforcement learning network, and training the deep reinforcement learning network to obtain the scoring model includes:

calculating a first Q value corresponding to the first historical state characteristic by adopting the first network; calculating a second Q value corresponding to the second historical state characteristic by adopting the second network;

and updating the network parameters in the first network according to the first Q value and the second Q value until the error between the first Q value and the second Q value is converged to obtain the trained scoring model.

11. The method of claim 10, further comprising:

and copying and updating the network parameters in the first network to the second network according to a specified period.

12. An apparatus for sorting cluster block sets, the apparatus comprising:

a determining module, configured to determine at least two aggregation block sets, where each aggregation block set includes a result set of a search result of a same type and form information of a list item form when the result set is displayed on a terminal, where the form information and a type of the search result in the result set have a one-to-one correspondence relationship, and the at least two aggregation block sets include i sorted aggregation block sets and j unordered aggregation block sets;

an obtaining module, configured to obtain first state features corresponding to the i sorted aggregation block sets, where the first state features are used to indicate sorting states of the i sorted aggregation block sets at a tth discrete time;

a generating module, configured to call a state transfer function to perform state transfer on the first state feature based on each unordered cluster block set, so as to generate j second state features, where the g-th second state feature is used to indicate the sorting state of the i sorted cluster block sets and the g-th unordered cluster block set at the (t+1)-th discrete time;

a scoring module, configured to sequentially input the j second state features into a scoring model, the scoring model outputting a score value for each unordered cluster block set when arranged at the (t+1)-th discrete time;

the determining module being further configured to determine the unordered cluster block set corresponding to the maximum score value as the cluster block set arranged at the (t+1)-th discrete time, where i and t are non-negative integers, j and g are positive integers, and g is less than or equal to j.

13. A computer device, comprising a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the method for sorting cluster block sets according to any one of claims 1 to 11.

14. A computer-readable storage medium storing a computer program that is loaded and executed by a processor to implement the method for sorting cluster block sets according to any one of claims 1 to 11.