CN113722537B

CN113722537B - Short video ordering and model training method and device, electronic equipment and storage medium

Info

Publication number: CN113722537B
Application number: CN202110916738.2A
Authority: CN
Inventors: 温恒一
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2021-08-11
Filing date: 2021-08-11
Publication date: 2024-04-26
Anticipated expiration: 2041-08-11
Also published as: CN113722537A

Abstract

The embodiment of the invention provides a ranking method and device for short video search and a model training method and device, which are applied to the coarse ranking of short videos. The ranking method comprises the following steps: acquiring multi-path short video recall files to be ranked; generating feature arrays of the multi-path short video recall files; inputting the multi-path feature arrays into a trained network model, and correspondingly outputting recall scores of the multi-path short video recall files; and selecting the short video recall files whose recall scores meet a preset ranking condition as the ranking result of the multi-path short video recall files. The invention avoids coarse ranking of recall files with a general formula and instead ranks the multi-path short video recall files with the trained network model, which solves the technical problems that the conventional coarse ranking scheme cannot adapt to multi-path short video recall files and has low coarse ranking efficiency, and achieves the effects of adapting to multi-path short video recall files and improving coarse ranking efficiency.

Description

Short video ordering and model training method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for ranking short video searches, a method and apparatus for training a network model, an electronic device, and a computer-readable storage medium.

Background

In a search system, ranking is usually performed in two stages: coarse ranking, which is a rough ordering, and fine ranking, which is a precise ordering. The role of coarse ranking is to screen the hundreds of thousands of recalled results and pass the surviving candidates to fine ranking, so it plays a critical role in the final search results.

Limited by factors such as performance, current search systems perform coarse ranking with a general formula whose parameters correspond to fixed features. In a short video scenario, the recall files of each path have clearly distinct characteristics, and a general formula cannot adapt to recall files from different paths with different characteristics. In addition, coarse ranking of short videos must process a large number of recall files at the same time, whereas the existing coarse ranking scheme can only score recall files one at a time in sequence and then sort them by score, which makes coarse ranking inefficient.

Disclosure of Invention

The embodiment of the invention aims to provide a short video search ordering method and device, a network model training method and device, electronic equipment and a computer readable storage medium, and solves the problems that a traditional coarse arrangement scheme cannot adapt to multiple recall files and the coarse arrangement efficiency is low. The specific technical scheme is as follows:

in a first aspect of the present invention, there is provided a method for sorting short video searches, including: acquiring a plurality of short video recall files to be sequenced; generating a characteristic array of a plurality of paths of short video recall files; inputting the multiple paths of the feature arrays into a trained network model, and correspondingly outputting recall scores of the multiple paths of short video recall files; and selecting short video recall files corresponding to the recall scores meeting a preset sorting condition as sorting results of the short video recall files.

Optionally, the generating the feature array of the short video recall file includes: acquiring characteristic data of multiple paths of video recall files in each dimension; and compressing and storing the characteristic data to obtain the characteristic array.

Optionally, the acquiring feature data of multiple paths of the video recall files in each dimension includes: acquiring one of the following characteristics of the multiple paths of video recall files in the document dimension: quality characteristics, freshness characteristics, user characteristics; and/or acquiring query category characteristics of the multiple paths of video recall files in a query dimension; and/or acquiring one of the following characteristics of the multiple paths of video recall files in query and document dimensions: click rate characteristics, viewing duration characteristics, presentation characteristics.

Optionally, the compressing and storing the feature data to obtain the feature array includes: storing the feature data into a sparse matrix in a compressed sparse row format to obtain the feature array.

Optionally, the selecting the short video recall file corresponding to the recall score that meets the preset sorting condition as the sorting result of the multiple short video recall files includes: arranging the plurality of short video recall files in descending order of recall score; and taking the preset number of short video recall files ranked first in the descending order as the sorting result.

In a second aspect of the present invention, there is also provided a training method of a network model, including: acquiring a plurality of paths of short video recall sample files; adding corresponding sample characteristics for a plurality of paths of short video recall sample files; and training a network model according to the multipath short video recall sample files and the corresponding sample characteristics.

Optionally, the obtaining the multiple short video recall sample file includes: and acquiring a plurality of short video recall positive sample files and a plurality of short video recall negative sample files according to the watching time length and/or the display click condition.

Optionally, adding corresponding sample features to multiple short video recall sample files includes: adding, for the multi-path short video recall sample files, one of the following sample features: recall source information sample feature, recall score sample feature, recall file sample feature, user presentation click sample feature, user behavior sample feature.

In a third aspect of the present invention, there is also provided a sorting apparatus for short video search, including: the file acquisition module is used for acquiring a plurality of short video recall files to be sequenced; the feature generation module is used for generating a feature array of the short video recall file in multiple paths; the score output module is used for inputting a plurality of paths of the characteristic arrays into the trained network model and correspondingly outputting recall scores of a plurality of paths of short video recall files; and the file selection module is used for selecting short video recall files corresponding to the recall scores meeting a preset sorting condition and taking the short video recall files as sorting results of the plurality of paths of short video recall files.

Optionally, the feature generation module includes: the feature data acquisition module is used for acquiring feature data of multiple paths of video recall files in each dimension; and the compression storage module is used for compressing and storing the characteristic data to obtain the characteristic array.

Optionally, the feature data obtaining module is configured to obtain one of the following features of the multiple paths of video recall files in a document dimension: quality characteristics, freshness characteristics, user characteristics; and/or acquiring query category characteristics of the multiple paths of video recall files in a query dimension; and/or acquiring one of the following characteristics of the multiple paths of video recall files in query and document dimensions: click rate characteristics, viewing duration characteristics, presentation characteristics.

Optionally, the compressed storage module is configured to store the feature data into a sparse matrix in a compressed sparse row format, so as to obtain the feature array.

Optionally, the file selection module includes: the score sorting module is used for sorting the short video recall files in descending order according to the recall scores; and the result determining module is used for taking the preset number of short video recall files which are arranged in descending order and are arranged in the front as the sorting result.

In a fourth aspect of the present invention, there is also provided a training apparatus for a network model, including: the sample acquisition module is used for acquiring a plurality of paths of short video recall sample files; the feature adding module is used for adding corresponding sample features for the short video recall sample files; and the model training module is used for training the network model according to the short video recall sample files and the corresponding sample characteristics.

Optionally, the sample obtaining module is configured to obtain a plurality of short video recall positive sample files and a plurality of short video recall negative sample files according to the viewing duration and/or the display click condition.

Optionally, the feature adding module is configured to add, for the multi-path short video recall sample files, one of the following sample features: recall source information sample feature, recall score sample feature, recall file sample feature, user presentation click sample feature, user behavior sample feature.

In yet another aspect of the present invention, there is also provided an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory perform communication with each other through the communication bus; a memory for storing a computer program; and the processor is used for realizing the short video searching sequencing method in the first aspect or the network model training method in the second aspect when executing the program stored in the memory.

In yet another aspect of the implementation of the present invention, there is further provided a computer readable storage medium having instructions stored therein, which when run on a computer, cause the computer to perform the method for ordering short video searches according to the first aspect or the method for training a network model according to the second aspect.

In a further aspect of the implementation of the present invention, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of ordering short video searches according to the first aspect or the method of training a network model according to the second aspect.

The short video search ranking scheme provided by the embodiment of the invention acquires multi-path short video recall files, outputs recall scores of the multi-path short video recall files according to their feature arrays and a network model, and determines the ranking result according to the recall scores. The scheme avoids ranking recall files with a general formula and instead ranks the multi-path short video recall files with the trained network model. It thereby solves the technical problems that the existing coarse ranking scheme cannot adapt to multi-path short video recall files and has low coarse ranking efficiency, and achieves the effects of adapting to multi-path short video recall files and improving coarse ranking efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.

Fig. 1 is a flowchart illustrating a method for sorting short video searches according to an embodiment of the present invention.

Fig. 2 is a flowchart illustrating steps of a training method of a network model according to an embodiment of the present invention.

Fig. 3a is a schematic diagram of the offline training flow of an xgboost model according to an embodiment of the present invention.

Fig. 3b is a schematic diagram of the online application flow of an xgboost model according to an embodiment of the present invention.

Fig. 4 is a schematic structural diagram of a sorting device for short video searching according to an embodiment of the present invention.

Fig. 5 is a schematic structural diagram of a training device for a network model according to an embodiment of the present invention.

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.

The embodiment of the invention provides a ranking scheme for short video search, which is applied to a short video search scenario. When short video recall files are coarse-ranked, not only metrics such as click-through rate but also consumption metrics such as the per-capita number of clicks on displayed videos and the per-capita play duration of displayed videos are considered. Unlike an ordinary search scenario, a short video search scenario places relatively low requirements on the relevance of the short videos but has certain requirements on their diversity and divergence. Therefore, the embodiment of the invention can rank multi-path short video recall files, including term recall, embedding recall, and the like.

The embodiment of the invention also provides a training scheme for a network model, which can be trained based on an xgboost (an optimized distributed gradient boosting library) model. During training of the network model, sample features can be set flexibly for the training samples: adding sample features such as recall source information and recall score can alleviate the bias caused by different recall sources, and adding user behavior sample features enables personalized search over short video recall files.

FIG. 1 shows a flowchart of the steps of a method for ranking short video searches according to an embodiment of the present invention. The short video search ranking method may be applied to a server, which may be a short video server. The short video search ranking method may specifically include the following steps.

Step 101, acquiring a plurality of short video recall files to be sequenced.

In an embodiment of the present invention, the short video recall file may be documents obtained from multiple recall sources, each of which may contain one short video data and query information for the short video data. The query information may include, but is not limited to: video information and user information of the short video data. Wherein the video information contains name, number, type, capacity, duration, author, format, etc. The user information includes a user name, a click operation for video data, a viewing operation, a viewing time period, and the like.

Step 102, generating a feature array of the multi-channel short video recall file.

In an embodiment of the invention, since the short video recall files may be sourced from multiple recall sources, the short video recall files of each recall source have their own characteristics. Therefore, the characteristics of the multi-path short video recall file are constructed as the characteristic array, and the characteristics of the multi-path short video recall file are numerous in variety and quantity.

Step 103, inputting the multi-path feature array into the trained network model, and correspondingly outputting recall scores of the multi-path short video recall file.

In the embodiment of the invention, each path of short video recall files may correspond to one path of feature data; that is, taking the path as the unit, the features of one path of short video recall files may be constructed as one set of feature arrays. The network model may be trained based on an xgboost model. In practice, the xgboost model may contain 40 decision trees, each containing a maximum of 6 nodes. The recall score is a scalar representation of the underlying relevance of a short video recall file and can represent how popular the short video recall file is with users.

Step 104, selecting short video recall files corresponding to recall scores meeting preset sorting conditions as sorting results of the multi-channel short video recall files.

In the embodiment of the invention, after the recall score of each short video recall file is output, a part of short video recall files with higher recall score can be selected according to the recall score, and the selected short video recall files are used as the sorting result of the multi-channel short video recall files to be sorted.

The short video search ranking scheme provided by the embodiment of the invention acquires multi-path short video recall files, outputs recall scores of the multi-path short video recall files according to their feature arrays and a network model, and determines the ranking result according to the recall scores. The scheme avoids coarse ranking of recall files with a general formula and instead ranks the multi-path short video recall files with the trained network model. It thereby solves the technical problems that the conventional coarse ranking scheme cannot adapt to multi-path short video recall files and has low coarse ranking efficiency, and achieves the effects of adapting to multi-path short video recall files and improving coarse ranking efficiency.

In an exemplary embodiment of the present invention, when generating the feature array of the multi-path short video recall files, feature data of the multi-path short video recall files in each dimension may be obtained, and the feature data of each dimension may then be compressed and stored to obtain the feature array. In practical applications, the dimensions include, but are not limited to: the document dimension, the query dimension, and the query-and-document dimension. The feature data of the document dimension may include at least one of: quality features, freshness features, user features. The quality features may be the sharpness, bit rate, etc. of the short video. The freshness feature may be the upload time of the short video data. The user features may be the users who clicked, viewed, or were shown the short video. The feature data of the query dimension may include query category features and the like. The feature data of the query-and-document dimension may include at least one of: click-through rate features, viewing duration features, presentation features, click features, etc. The click-through rate feature may indicate, after the short video data has been presented to users, the proportion of users who clicked on it among all users. The viewing duration feature may represent the average viewing duration of the short video data. The presentation feature may indicate how many users the short video data was presented to, the duration of the presentation, etc. The click feature may represent the users who clicked on the short video data, the time of each click operation, and the like.
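The two query-and-document features defined most precisely above, click-through rate and average viewing duration, can be illustrated with a minimal sketch. The `Impression` record and both function names are hypothetical and are not taken from the patent. The sketch also assumes that "all users" means the users to whom the video was presented, and that viewing duration is averaged over clicked impressions only.

```python
from dataclasses import dataclass


@dataclass
class Impression:
    """One presentation of a short video to a user (hypothetical record layout)."""
    user_id: str
    clicked: bool
    watch_seconds: float  # 0.0 when the video was not clicked


def click_through_rate(impressions):
    """Proportion of the users shown the video who clicked on it."""
    shown = {imp.user_id for imp in impressions}
    clicked = {imp.user_id for imp in impressions if imp.clicked}
    return len(clicked) / len(shown) if shown else 0.0


def average_viewing_duration(impressions):
    """Mean watch time over the impressions that were clicked (an assumption)."""
    durations = [imp.watch_seconds for imp in impressions if imp.clicked]
    return sum(durations) / len(durations) if durations else 0.0


impressions = [
    Impression("u1", True, 12.0),
    Impression("u2", False, 0.0),
    Impression("u3", True, 18.0),
    Impression("u4", False, 0.0),
]
ctr = click_through_rate(impressions)              # 2 of 4 users clicked
avg_watch = average_viewing_duration(impressions)  # mean of 12.0 and 18.0
```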

In an exemplary embodiment of the present invention, when compressing and storing the feature data to obtain the feature array, the feature data may be stored into a sparse matrix in compressed sparse row (Compressed Sparse Row, abbreviated as CSR) format to obtain the feature array. A matrix is called a sparse matrix if the number of its zero-valued elements is far greater than the number of its non-zero elements and the non-zero elements are distributed irregularly. If a sparse matrix is stored as an ordinary two-dimensional array, a large number of storage units are wasted on zero elements, and a large amount of computing time is wasted on meaningless operations on zero elements. The sparse matrix therefore needs to be stored in compressed form, that is, only the non-zero elements are stored. The CSR format is a storage format for sparse matrices that is efficient for matrix-matrix and matrix-vector operations. CSR uses three arrays: the values, the row offsets (each giving the position within the values array of the first element of a row), and the column numbers; the total number of non-zero elements of the matrix is appended at the end of the row offsets.

The one-dimensional array data (values) stores all non-zero values in row-major order and has as many elements as the matrix has non-zero elements. The one-dimensional array indptr (row offsets) contains integers such that indptr[i] is the index in data of the first non-zero element of row i. If the entire row i is zero, indptr[i] = indptr[i+1]; if the original matrix has m rows, then len(indptr) = m+1. The one-dimensional array indices (column numbers) contains the column index information: indices[indptr[i]:indptr[i+1]] is an integer array holding the column indices of the non-zero elements in row i. A column index is the number of the column in which a value is located, counting from 0. Because the row information in CSR is compressed, there is no explicit row index; the rows are represented by the row offsets instead.
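The CSR layout described above can be made concrete with a small pure-Python sketch. The function names `dense_to_csr` and `csr_row` and the sample feature matrix are illustrative assumptions. A production system would more likely rely on an existing sparse-matrix library than on hand-written conversion.

```python
def dense_to_csr(rows):
    """Convert a list of equal-length rows into the CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # non-zero values, in row-major order
                indices.append(col)   # column number of each stored value
        indptr.append(len(data))      # offset at which the next row starts
    return data, indices, indptr


def csr_row(data, indices, indptr, i, n_cols):
    """Rebuild dense row i from the CSR arrays."""
    row = [0.0] * n_cols
    for k in range(indptr[i], indptr[i + 1]):
        row[indices[k]] = data[k]
    return row


# Three recall files with four features each; the second row is entirely zero.
features = [
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.3, 0.0, 0.0, 1.5],
]
data, indices, indptr = dense_to_csr(features)
```

Here `indptr` has one more entry than the matrix has rows, the all-zero row yields two equal consecutive offsets, and the last offset equals the number of stored non-zero values.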

In practical applications, the feature array input to the network model may be formed into a TensorProto (tensor prototype) structure, which may specifically be a map containing the following four keys: indptr, indices, data, and option_mask. Their values are, respectively, TensorProto(dtype=uint64, tensor_shape=[n1]), TensorProto(dtype=uint32, tensor_shape=[n2]), TensorProto(dtype=float32, tensor_shape=[n3]), and int, where data represents the feature data and option_mask specifies the recall score to be output. That is, the recall scores output from the network model are output through option_mask. n1, n2 and n3 each represent a number of elements.
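As a rough illustration of the four-key map, the sketch below assembles a request from CSR arrays using plain dictionaries as stand-ins for TensorProto messages. It does not use any real serving or TensorFlow API. The helper names `make_tensor` and `build_request`, the `values` field, and the `option_mask` value of 1 are all assumptions made for the example.

```python
def make_tensor(dtype, values):
    """Plain-dict stand-in for a TensorProto: dtype, 1-D shape, and values."""
    return {"dtype": dtype, "tensor_shape": [len(values)], "values": list(values)}


def build_request(data, indices, indptr, option_mask):
    """Assemble the four-key map described above after basic CSR sanity checks."""
    if len(data) != len(indices):
        raise ValueError("data and indices must have the same length")
    if not indptr or indptr[0] != 0 or indptr[-1] != len(data):
        raise ValueError("indptr must start at 0 and end at len(data)")
    if any(a > b for a, b in zip(indptr, indptr[1:])):
        raise ValueError("indptr must be non-decreasing")
    return {
        "indptr": make_tensor("uint64", indptr),    # shape [n1]
        "indices": make_tensor("uint32", indices),  # shape [n2]
        "data": make_tensor("float32", data),       # shape [n3]
        "option_mask": int(option_mask),            # hypothetical value
    }


request = build_request(
    data=[0.8, 0.3, 1.5], indices=[1, 0, 3], indptr=[0, 1, 1, 3], option_mask=1
)
```

Validating the arrays before sending them is a design choice of this sketch; under CSR, n2 and n3 are always equal and n1 is the number of rows plus one.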

In an exemplary embodiment of the present invention, when selecting the short video recall files whose recall scores satisfy the preset sorting condition as the sorting result of the multi-path short video recall files, the short video recall files may be sorted in descending order of recall score, and the preset number of short video recall files ranked first in that order may be used as the sorting result. For example, if the preset number is 100, the first 100 short video recall files after descending-order sorting may be used as the sorting result.
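The descending-order selection can be sketched in a few lines. The function name `select_top`, the file identifiers, and the scores are made up for illustration. `heapq.nlargest` returns the same result as a full descending sort followed by truncation.

```python
import heapq


def select_top(scored_files, preset_number):
    """Return the preset_number highest-scoring (file_id, score) pairs, best first."""
    return heapq.nlargest(preset_number, scored_files, key=lambda item: item[1])


scored = [("v1", 0.42), ("v2", 0.91), ("v3", 0.17), ("v4", 0.66)]
top_two = select_top(scored, 2)
```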

Referring to fig. 2, a flowchart illustrating steps of a training method for a network model according to an embodiment of the present invention is shown. The training method of the network model can be applied to a server, and the server can be a short video server. The training method of the network model specifically comprises the following steps.

Step 201, a multi-path short video recall sample file is obtained.

In the embodiment of the invention, multiple short video recall positive sample files and multiple short video recall negative sample files can be acquired. In practical application, a duration threshold may be set, where the duration threshold may be understood as a play duration threshold of a short video when a user views the short video. And determining whether the obtained short video recall sample file is a short video recall positive sample file or a short video recall negative sample file according to the time length threshold. Specifically, the average watching time length of the short video data in the short video recall sample file can be compared with a time length threshold, and if the average watching time length is greater than the time length threshold, the short video recall sample file is a short video recall positive sample file; and if the average watching time length is less than or equal to the time length threshold value, the short video recall sample file is a short video recall negative sample file. Besides judging whether the short video recall sample file is a short video recall positive sample file or a short video recall negative sample file by using a time length threshold, the multi-path short video recall positive sample file and the multi-path short video recall negative sample file can be obtained according to the watching time length and/or the display click condition. For example, if the short video data in a short video recall file is shown to a user, and the user clicks on the short video data, and at the same time, the average viewing time for the user to view the short video data is greater than the duration threshold, the short video recall sample file is a short video recall positive sample file. 
If the short video data in a short video recall file is displayed to a user, but the user does not click on the short video data, or the user clicks on the short video data, but the average watching duration of the user watching the short video data is less than or equal to a duration threshold, the short video recall sample file is a short video recall negative sample file.
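The combined labeling rule in the example above (shown, clicked, and watched for longer than the threshold) might be expressed as follows. The function name `label_sample` and the 10-second threshold are hypothetical. Returning `None` for a file that was never shown is an assumption of this sketch, since the description only covers displayed files.

```python
def label_sample(shown, clicked, avg_watch_seconds, duration_threshold):
    """Return 1 for a positive sample, 0 for a negative one, None if never shown."""
    if not shown:
        return None  # assumption: files that were never displayed are not labeled
    if clicked and avg_watch_seconds > duration_threshold:
        return 1     # displayed, clicked, and watched longer than the threshold
    return 0         # displayed but not clicked, or watched for too short a time


THRESHOLD = 10.0  # seconds; illustrative value only
labels = [
    label_sample(True, True, 12.0, THRESHOLD),
    label_sample(True, True, 10.0, THRESHOLD),
    label_sample(True, False, 0.0, THRESHOLD),
    label_sample(False, False, 0.0, THRESHOLD),
]
```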

Step 202, adding corresponding sample characteristics for the multi-path short video recall sample file.

In an embodiment of the present invention, one of the following sample features may be added to the multi-path short video recall sample files: recall source information sample feature, recall score sample feature, recall file sample feature, user presentation click sample feature, user behavior sample feature. The recall source information sample feature may include, among other things, a recall source sample name, a recall source sample type, a recall source sample state, and the like. The recall score sample feature may represent the underlying relevance score of the short video recall sample file at the corresponding recall source. The recall file sample feature may include quality sample features, freshness sample features, click-through rate sample features, and the like. The user presentation click sample feature may comprise sample features of the users who clicked or viewed the short video sample data in a short video recall sample file. The user behavior sample feature may represent the interactive operations that users performed on the short video sample data in a short video recall sample file, including, but not limited to: download, share, like, comment, and collect operations.
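One possible way to attach such sample features is to flatten them into a numeric vector, as in the sketch below. The vector layout, the dictionary keys, and the function name `add_sample_features` are assumptions for illustration. The recall source is one-hot encoded over the two recall paths named earlier in the description (term and embedding).

```python
RECALL_SOURCES = ["term", "embedding"]  # recall paths named in the description


def add_sample_features(sample):
    """Return a flat feature vector for one recall sample (hypothetical layout)."""
    # Recall source information: one-hot encoding of the recall path.
    source_one_hot = [
        1.0 if sample["recall_source"] == source else 0.0 for source in RECALL_SOURCES
    ]
    # User behavior: counts of interactive operations, defaulting to zero.
    behavior = sample.get("behavior", {})
    return source_one_hot + [
        float(sample["recall_score"]),  # score assigned by the recall source
        float(behavior.get("share", 0)),
        float(behavior.get("like", 0)),
        float(behavior.get("comment", 0)),
    ]


sample = {
    "recall_source": "embedding",
    "recall_score": 0.73,
    "behavior": {"like": 2, "share": 1},
}
vector = add_sample_features(sample)
```

Carrying the recall source in the vector lets the model learn a per-source correction, which is one way to read the description's remark that such features can relieve bias between recall sources.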

Step 203, training the network model according to the multipath short video recall sample file and the corresponding sample characteristics.

In the embodiment of the invention, sample features such as the recall score sample feature, the recall source information sample feature and the user behavior sample feature may be used as the annotation data of the short video recall sample files. During training of the network model, the training result is compared with the corresponding annotation data, and the parameters of the network model are then adjusted according to the comparison result until the training result is the same as or similar to the corresponding annotation data, or meets a preset training condition. The trained network model can output the short video recall sample files in order, and in particular in descending order of recall score.
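The compare-and-adjust loop described above can be illustrated with a toy gradient-boosting trainer written in plain Python. This is a simplified stand-in and not xgboost itself: it fits depth-1 regression stumps to squared-error residuals, whereas xgboost uses a regularized second-order objective and deeper trees. The default of 40 trees mirrors the tree count mentioned earlier; the learning rate, function names, and training data are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single-feature threshold split that minimises squared error."""
    best = None
    for f in range(len(xs[0])):
        for threshold in sorted({x[f] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[f] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[f] > threshold]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, threshold, lv, rv)
    return best[1:] if best else None


def predict(model, x):
    """Score one feature vector: base value plus the scaled output of every stump."""
    base, rate, stumps = model
    score = base
    for f, threshold, lv, rv in stumps:
        score += rate * (lv if x[f] <= threshold else rv)
    return score


def train(xs, ys, n_trees=40, rate=0.3):
    """Add stumps one at a time, each fitted to the current residuals."""
    model = (sum(ys) / len(ys), rate, [])
    for _ in range(n_trees):
        # Compare current predictions with the annotations, then adjust the model.
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        stump = fit_stump(xs, residuals)
        if stump is None:
            break
        model[2].append(stump)
    return model


# Each row: [click-through rate, average viewing seconds]; label 1 = positive sample.
xs = [[0.9, 30.0], [0.8, 25.0], [0.2, 5.0], [0.1, 3.0]]
ys = [1.0, 1.0, 0.0, 0.0]
model = train(xs, ys)
```

After training, `predict(model, x)` plays the role of the recall score, so unseen files with features close to the positive samples score higher than files close to the negative ones.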

Based on the above description of the embodiments of the short video search ranking method and the network model training method, a ranking scheme based on an xgboost model is described below. FIG. 3a shows a schematic diagram of the offline training flow of the xgboost model. First, multi-path sample data is acquired and features are extracted from it; the sample data is then divided into positive sample data and negative sample data according to the extracted features; finally, an xgboost model is trained based on artificial intelligence technology. Fig. 3b shows a schematic diagram of the online application flow of the xgboost model. First, multi-path recall files are received; then feature data of the recall files is calculated and a feature array in CSR format is constructed from it; next, an artificial intelligence service is called to input the recall files and the feature arrays into the trained xgboost model in parallel; finally, the recall scores returned by the xgboost model are received as the final ranking scores.
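The online flow can be sketched as follows, with each recall path scored in parallel and the results merged into one ranking. The scoring function is a fixed weighted sum standing in for the trained model, and the weights, names, and sample data are hypothetical. CSR construction and the call to an actual model service are omitted for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def score_batch(batch):
    """Stand-in for the trained model: a fixed weighted sum of each feature row."""
    weights = (0.7, 0.3)  # hypothetical weights, not from the patent
    return [sum(w * v for w, v in zip(weights, row)) for _, row in batch]


def rank_online(recall_paths, preset_number):
    """Score every recall path in parallel, then merge and keep the top results."""
    names = list(recall_paths)
    with ThreadPoolExecutor() as pool:
        per_path_scores = list(pool.map(score_batch, (recall_paths[n] for n in names)))
    merged = []
    for name, scores in zip(names, per_path_scores):
        for (file_id, _), score in zip(recall_paths[name], scores):
            merged.append((file_id, name, score))
    merged.sort(key=lambda item: item[2], reverse=True)
    return merged[:preset_number]


# Two recall paths, each holding (file_id, feature_row) pairs.
recall_paths = {
    "term": [("v1", (0.9, 0.2)), ("v2", (0.1, 0.4))],
    "embedding": [("v3", (0.5, 1.0)), ("v4", (0.3, 0.1))],
}
ranked = rank_online(recall_paths, 3)
```

Scoring whole batches per path, rather than one file at a time, reflects the efficiency argument made in the background section.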

As shown in fig. 4, a schematic structural diagram of a sorting apparatus for short video search according to an embodiment of the present invention is shown. The short video search ranking apparatus may include the following modules.

The file acquisition module 41 is used for acquiring a plurality of short video recall files to be sequenced;

a feature generation module 42, configured to generate a feature array of multiple paths of the short video recall file;

The score output module 43 is configured to input multiple paths of the feature arrays into the trained network model, and correspondingly output recall scores of multiple paths of the short video recall files;

the file selection module 44 is configured to select short video recall files corresponding to the recall scores that satisfy a preset sorting condition, as a sorting result of multiple paths of the short video recall files.

In an exemplary embodiment of the present invention, the feature generation module 42 includes:

the feature data acquisition module is used for acquiring feature data of multiple paths of video recall files in each dimension;

and the compression storage module is used for compressing and storing the characteristic data to obtain the characteristic array.

In an exemplary embodiment of the present invention, the feature data obtaining module is configured to obtain one of the following features of the multiple video recall files in a document dimension: quality characteristics, freshness characteristics, user characteristics; and/or acquiring query category characteristics of the multiple paths of video recall files in a query dimension; and/or acquiring one of the following characteristics of the multiple paths of video recall files in query and document dimensions: click rate characteristics, viewing duration characteristics, presentation characteristics.

In an exemplary embodiment of the present invention, the compression storage module is configured to store the feature data into a sparse matrix in a compressed sparse row format, to obtain the feature array.
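The compressed sparse row (CSR) layout keeps only the non-zero values together with their column indices and a row pointer array. The following Python sketch shows the idea using plain lists; the function names `to_csr` and `csr_row` are hypothetical, and a production system would more likely rely on an existing sparse matrix library.

```python
def to_csr(rows):
    """Convert dense rows into CSR arrays (data, indices, indptr).

    data    : non-zero values, row by row
    indices : column index of each value in data
    indptr  : indptr[i]..indptr[i + 1] delimits the entries of row i
    """
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def csr_row(data, indices, indptr, i, n_cols):
    """Rebuild dense row i from the CSR arrays."""
    row = [0.0] * n_cols
    for k in range(indptr[i], indptr[i + 1]):
        row[indices[k]] = data[k]
    return row


dense = [
    [0.8, 0.0, 0.0, 3.0],  # features of the first recall file
    [0.0, 0.0, 0.0, 0.0],  # a recall file with no non-zero features
    [0.0, 0.5, 0.0, 0.0],
]
data, indices, indptr = to_csr(dense)
```

Here three rows of four columns are stored as three values, three column indices, and four row pointers, and an all-zero row costs only one repeated pointer.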

In an exemplary embodiment of the present invention, the file selection module 44 includes:

a score sorting module, configured to sort the multiple paths of short video recall files in descending order of recall score;

and a result determining module, configured to take a preset number of top-ranked short video recall files in the descending order as the sorting result.
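The selection performed by these two sub-modules amounts to a top-N selection by recall score. A minimal Python sketch, assuming the recall scores are available as a mapping from a hypothetical video identifier to its score:

```python
import heapq


def select_top_n(recall_scores, n):
    """Return the ids of the n highest-scoring recall files, best first."""
    top = heapq.nlargest(n, recall_scores.items(), key=lambda item: item[1])
    return [video_id for video_id, _ in top]


scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.1}
result = select_top_n(scores, n=2)
```

`heapq.nlargest` returns the selected items already in descending order, so when the preset number is small relative to the number of candidates it avoids fully sorting them; a plain descending sort followed by slicing gives the same result.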

Fig. 5 is a schematic structural diagram of a training device for a network model according to an embodiment of the present invention. The training apparatus of the network model may include the following modules.

The sample acquisition module 51 is configured to acquire multiple short video recall sample files;

a feature adding module 52, configured to add corresponding sample features to multiple paths of the short video recall sample files;

The model training module 53 is configured to train a network model according to the multiple short video recall sample files and the corresponding sample features.

In an exemplary embodiment of the present invention, the sample acquisition module 51 is configured to acquire multiple paths of short video recall positive sample files and multiple paths of short video recall negative sample files according to the viewing duration and/or the presentation and click conditions.

In an exemplary embodiment of the present invention, the feature adding module 52 is configured to add, for the multiple paths of short video recall sample files, one of the following sample features: recall source information sample feature, recall score sample feature, recall file sample feature, user presentation click sample feature, user behavior sample feature.
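One possible way to derive positive and negative samples from viewing duration and presentation/click logs is sketched below in Python. The log field names, the 10-second threshold, and the rule of discarding clicked-but-briefly-watched records are assumptions chosen for illustration; the embodiment does not fix these details.

```python
# Hypothetical threshold: a click counts as positive only if the video
# was watched for at least this many seconds.
MIN_WATCH_SECONDS = 10.0


def split_samples(logs):
    """Split log records into positive and negative sample ids.

    positive: presented, clicked, and watched for at least MIN_WATCH_SECONDS
    negative: presented but not clicked
    Other records (e.g. clicked but barely watched) are left out as ambiguous.
    """
    positives, negatives = [], []
    for rec in logs:
        if not rec["shown"]:
            continue
        if rec["clicked"] and rec["watch_seconds"] >= MIN_WATCH_SECONDS:
            positives.append(rec["video_id"])
        elif not rec["clicked"]:
            negatives.append(rec["video_id"])
    return positives, negatives


logs = [
    {"video_id": "v1", "shown": True, "clicked": True, "watch_seconds": 42.0},
    {"video_id": "v2", "shown": True, "clicked": False, "watch_seconds": 0.0},
    {"video_id": "v3", "shown": True, "clicked": True, "watch_seconds": 2.0},
    {"video_id": "v4", "shown": False, "clicked": False, "watch_seconds": 0.0},
]
positives, negatives = split_samples(logs)
```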

An embodiment of the present invention also provides an electronic device. As shown in Fig. 6, the electronic device comprises a processor 61, a communication interface 62, a memory 63 and a communication bus 64, wherein the processor 61, the communication interface 62 and the memory 63 communicate with each other through the communication bus 64;

A memory 63 for storing a computer program;

the processor 61 is configured to execute the program stored in the memory 63, and implement the following steps:

Acquiring a plurality of short video recall files to be sequenced;

generating a characteristic array of a plurality of paths of short video recall files;

inputting the multiple paths of the feature arrays into a trained network model, and correspondingly outputting recall scores of the multiple paths of short video recall files;

and selecting short video recall files corresponding to the recall scores meeting a preset sorting condition as sorting results of the short video recall files.

The step of generating the characteristic array of the short video recall file comprises the following steps:

Acquiring characteristic data of multiple paths of video recall files in each dimension;

and compressing and storing the characteristic data to obtain the characteristic array.

The step of acquiring the feature data of the multiple paths of video recall files in each dimension comprises the following steps:

acquiring one of the following features of the multiple paths of short video recall files in the document dimension: quality features, freshness features, user features; and/or,

acquiring query category features of the multiple paths of short video recall files in the query dimension; and/or,

acquiring one of the following features of the multiple paths of short video recall files in the query and document dimensions: click rate features, viewing duration features, presentation features.

The step of compressing and storing the feature data to obtain the feature array comprises the following steps:

storing the feature data into a sparse matrix in compressed sparse row format to obtain the feature array.

The step of selecting the short video recall files corresponding to the recall scores meeting the preset sorting conditions as sorting results of the short video recall files comprises the following steps:

sorting the multiple paths of short video recall files in descending order of recall score;

and taking a preset number of top-ranked short video recall files in the descending order as the sorting result.

The processor 61 is further configured to execute the program stored in the memory 63, thereby implementing the following steps:

Acquiring a plurality of paths of short video recall sample files;

adding corresponding sample characteristics for a plurality of paths of short video recall sample files;

And training a network model according to the multipath short video recall sample files and the corresponding sample characteristics.

The step of obtaining the multipath short video recall sample file comprises the following steps:

and acquiring a plurality of short video recall positive sample files and a plurality of short video recall negative sample files according to the watching time length and/or the display click condition.

The step of adding corresponding sample characteristics for the short video recall sample files comprises the following steps:

adding, for the multiple paths of short video recall sample files, one of the following sample features: recall source information sample feature, recall score sample feature, recall file sample feature, user presentation click sample feature, user behavior sample feature.

The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.

The communication interface is used for communication between the electronic device and other devices.

The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment of the present invention, a computer readable storage medium is provided, in which instructions are stored, which when run on a computer, cause the computer to perform the short video search ranking method or the network model training method of any of the above embodiments.

In yet another embodiment of the present invention, a computer program product comprising instructions, which when run on a computer, causes the computer to perform the short video search ranking method of any one of the above embodiments or the training method of the network model is also provided.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.

It is noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims

1. A method for ranking short video searches, comprising:

Acquiring a plurality of short video recall files to be sequenced;

Generating a characteristic array of a plurality of paths of short video recall files; the feature array is constructed by taking the number of the short video recall files as a unit, and the features of one short video recall file are constructed into a group of feature arrays; the short video recall files are derived from multiple recall sources, and the short video recall files of each recall source have respective characteristics;

2. The method of claim 1, wherein generating the feature array of the plurality of short video recall files comprises:

3. The method of claim 2, wherein the obtaining feature data in each dimension for multiple paths of the video recall file comprises:

4. The method according to claim 2, wherein the compressing and storing the feature data to obtain the feature array includes:

5. The method according to any one of claims 1 to 4, wherein selecting the short video recall files corresponding to the recall scores satisfying a preset sorting condition as a sorting result of the multiple paths of short video recall files includes:

6. The method of claim 1, wherein the training method of the network model comprises:

Acquiring a plurality of paths of short video recall sample files;

and training a network model according to the multipath short video recall sample files and the corresponding sample features, wherein in the training process, the sample features are used as the marking data of the short video recall sample files.

7. The method of claim 6, wherein the acquiring of the multiple paths of short video recall sample files comprises:

8. The method of claim 6, wherein the adding corresponding sample features to the multiple short video recall sample files comprises:

9. A short video search ranking apparatus, comprising:

The file acquisition module is used for acquiring a plurality of short video recall files to be sequenced;

the feature generation module is used for generating a feature array of the short video recall file in multiple paths; the feature array is constructed by taking the number of the short video recall files as a unit, and the features of one short video recall file are constructed into a group of feature arrays; the short video recall files are derived from multiple recall sources, and the short video recall files of each recall source have respective characteristics;

The score output module is used for inputting a plurality of paths of the characteristic arrays into the trained network model and correspondingly outputting recall scores of a plurality of paths of short video recall files;

And the file selection module is used for selecting short video recall files corresponding to the recall scores meeting a preset sorting condition and taking the short video recall files as sorting results of the plurality of paths of short video recall files.

10. The ranking apparatus of claim 9, wherein the apparatus further comprises:

the sample acquisition module is used for acquiring a plurality of paths of short video recall sample files;

The feature adding module is used for adding corresponding sample features for the short video recall sample files;

And the model training module is used for training the network model according to the multipath short video recall sample files and the corresponding sample characteristics, and in the training process, the sample characteristics are used as the labeling data of the short video recall sample files.

11. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;

A memory for storing a computer program;

a processor for implementing the short video search ranking method of any one of claims 1 to 8 when executing a program stored on a memory.

12. A computer readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the short video search ranking method of any one of claims 1 to 8.