CN113722537B - Short video ordering and model training method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN113722537B
Authority
CN
China
Prior art keywords
recall
short video
files
sample
video recall
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110916738.2A
Other languages
Chinese (zh)
Other versions
CN113722537A (en
Inventor
温恒一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110916738.2A priority Critical patent/CN113722537B/en
Publication of CN113722537A publication Critical patent/CN113722537A/en
Application granted granted Critical
Publication of CN113722537B publication Critical patent/CN113722537B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74 Browsing; Visualisation therefor
    • G06F16/743 Browsing; Visualisation therefor a collection of video files or sequences
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention provide a ranking method and device for short video search and a training method and device for a network model, applied to the coarse ranking of short videos. The ranking method comprises the following steps: acquiring multiple paths of short video recall files to be ranked; generating feature arrays of the multiple paths of short video recall files; inputting the feature arrays into a trained network model, which correspondingly outputs recall scores of the multiple paths of short video recall files; and selecting the short video recall files whose recall scores meet a preset sorting condition as the sorting result of the multiple paths of short video recall files. Instead of coarse-ranking recall files with a general formula, the invention ranks the multi-path short video recall files with the trained network model. This solves the technical problems that existing coarse ranking schemes cannot adapt to multi-path short video recall files and have low coarse ranking efficiency, and achieves the effects of adapting to multi-path short video recall files and improving coarse ranking efficiency.

Description

Short video ordering and model training method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for ranking short video searches, a method and apparatus for training a network model, an electronic device, and a computer-readable storage medium.
Background
In a search system, ranking is typically divided into a coarse ranking stage and a fine ranking stage. The coarse ranking stage screens the hundreds of thousands of recalled results and passes the remaining candidates to the fine ranking stage, so it plays a critical role in the final search results.
Limited by factors such as performance, current search systems use a general formula for coarse ranking, and the parameters in the general formula correspond to fixed features. In a short video scene, the recall files of each recall path have clearly different characteristics, and a general formula cannot adapt to recall files from different paths with different features. In addition, coarse ranking of short videos requires a large number of recall files to be processed at the same time, whereas the existing coarse ranking scheme can only score recall files one at a time in sequence and then sort them according to the scoring results, which makes coarse ranking inefficient.
Disclosure of Invention
The embodiment of the invention aims to provide a short video search ordering method and device, a network model training method and device, electronic equipment and a computer readable storage medium, and solves the problems that a traditional coarse arrangement scheme cannot adapt to multiple recall files and the coarse arrangement efficiency is low. The specific technical scheme is as follows:
in a first aspect of the present invention, there is provided a method for sorting short video searches, including: acquiring a plurality of short video recall files to be sequenced; generating a characteristic array of a plurality of paths of short video recall files; inputting the multiple paths of the feature arrays into a trained network model, and correspondingly outputting recall scores of the multiple paths of short video recall files; and selecting short video recall files corresponding to the recall scores meeting a preset sorting condition as sorting results of the short video recall files.
Optionally, the generating the feature array of the short video recall file includes: acquiring characteristic data of multiple paths of video recall files in each dimension; and compressing and storing the characteristic data to obtain the characteristic array.
Optionally, the acquiring feature data of multiple paths of the video recall files in each dimension includes: acquiring one of the following characteristics of the multiple paths of video recall files in the document dimension: quality characteristics, freshness characteristics, user characteristics; and/or acquiring query category characteristics of the multiple paths of video recall files in a query dimension; and/or acquiring one of the following characteristics of the multiple paths of video recall files in query and document dimensions: click rate characteristics, viewing duration characteristics, presentation characteristics.
Optionally, the compressing and storing the feature data to obtain the feature array includes: storing the feature data into a sparse matrix in compressed sparse row (CSR) format to obtain the feature array.
Optionally, the selecting the short video recall file corresponding to the recall score that meets the preset sorting condition as the sorting result of the multiple short video recall files includes: descending order arrangement is carried out on a plurality of short video recall files according to the recall score; and taking the preset number of short video recall files which are arranged in descending order and are positioned in front of the descending order as the sorting result.
In a second aspect of the present invention, there is also provided a training method of a network model, including: acquiring a plurality of paths of short video recall sample files; adding corresponding sample characteristics for a plurality of paths of short video recall sample files; and training a network model according to the multipath short video recall sample files and the corresponding sample characteristics.
Optionally, the obtaining the multiple short video recall sample file includes: and acquiring a plurality of short video recall positive sample files and a plurality of short video recall negative sample files according to the watching time length and/or the display click condition.
Optionally, adding corresponding sample features to the multiple paths of short video recall sample files includes: adding, for the multiple paths of short video recall sample files, one of the following sample features: recall source information sample feature, recall score sample feature, recall file sample feature, user presentation click sample feature, user behavior sample feature.
In a third aspect of the present invention, there is also provided a sorting apparatus for short video search, including: the file acquisition module is used for acquiring a plurality of short video recall files to be sequenced; the feature generation module is used for generating a feature array of the short video recall file in multiple paths; the score output module is used for inputting a plurality of paths of the characteristic arrays into the trained network model and correspondingly outputting recall scores of a plurality of paths of short video recall files; and the file selection module is used for selecting short video recall files corresponding to the recall scores meeting a preset sorting condition and taking the short video recall files as sorting results of the plurality of paths of short video recall files.
Optionally, the feature generation module includes: the feature data acquisition module is used for acquiring feature data of multiple paths of video recall files in each dimension; and the compression storage module is used for compressing and storing the characteristic data to obtain the characteristic array.
Optionally, the feature data obtaining module is configured to obtain one of the following features of the multiple paths of video recall files in a document dimension: quality characteristics, freshness characteristics, user characteristics; and/or acquiring query category characteristics of the multiple paths of video recall files in a query dimension; and/or acquiring one of the following characteristics of the multiple paths of video recall files in query and document dimensions: click rate characteristics, viewing duration characteristics, presentation characteristics.
Optionally, the compressed storage module is configured to store the feature data into a sparse matrix in a compressed sparse row format, so as to obtain the feature array.
Optionally, the file selection module includes: the score sorting module is used for sorting the short video recall files in descending order according to the recall scores; and the result determining module is used for taking the preset number of short video recall files which are arranged in descending order and are arranged in the front as the sorting result.
In a fourth aspect of the present invention, there is also provided a training apparatus for a network model, including: the sample acquisition module is used for acquiring a plurality of paths of short video recall sample files; the feature adding module is used for adding corresponding sample features for the short video recall sample files; and the model training module is used for training the network model according to the short video recall sample files and the corresponding sample characteristics.
Optionally, the sample obtaining module is configured to obtain a plurality of short video recall positive sample files and a plurality of short video recall negative sample files according to the viewing duration and/or the display click condition.
Optionally, the feature adding module is configured to add, for the multiple paths of short video recall sample files, one of the following sample features: recall source information sample feature, recall score sample feature, recall file sample feature, user presentation click sample feature, user behavior sample feature.
In yet another aspect of the present invention, there is also provided an electronic device including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory perform communication with each other through the communication bus; a memory for storing a computer program; and the processor is used for realizing the short video searching sequencing method in the first aspect or the network model training method in the second aspect when executing the program stored in the memory.
In yet another aspect of the implementation of the present invention, there is further provided a computer readable storage medium having instructions stored therein, which when run on a computer, cause the computer to perform the method for ordering short video searches according to the first aspect or the method for training a network model according to the second aspect.
In a further aspect of the implementation of the present invention, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of ordering short video searches according to the first aspect or the method of training a network model according to the second aspect.
The short video search ranking scheme provided by the embodiments of the invention acquires multiple paths of short video recall files, outputs recall scores of the multiple paths of short video recall files according to their feature arrays and a network model, and determines the sorting result according to the recall scores. The embodiments avoid ordering the recall files with a general formula and instead order the multi-path short video recall files with the trained network model. This solves the technical problems that the existing coarse ranking scheme cannot adapt to multi-path short video recall files and has low coarse ranking efficiency, and achieves the effects of adapting to multi-path short video recall files and improving coarse ranking efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart illustrating a method for sorting short video searches according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating steps of a training method of a network model according to an embodiment of the present invention.
Fig. 3a is a schematic diagram of an offline training process of an xgboost model according to an embodiment of the present invention.
Fig. 3b is a schematic diagram of an online application flow of an xgboost model according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a sorting device for short video searching according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a training device for a network model according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention.
The embodiment of the invention provides a ranking scheme for short video search, which is applied to a short video search scene. When short video recall files are coarse-ranked, not only indicators such as click rate are considered, but also consumption indicators such as the per-user display click count and the per-user display play duration. Unlike an ordinary search scene, a short video search scene has relatively low requirements on the relevance of the short videos and certain requirements on their diversity and divergence. Therefore, the embodiment of the invention can sort multiple paths of short video recall files, including term recall, embedding recall, and the like.
The embodiment of the invention provides a training scheme for a network model, which can be trained based on an xgboost (an optimized distributed gradient boosting library) model. In the training process of the network model, sample features can be set flexibly for the training samples. For example, adding recall source information sample features, recall score sample features and the like can alleviate the bias caused by different recall sources, and adding user behavior sample features can enable personalized search over short video recall files.
As shown in FIG. 1, a flow chart of steps of a method for ranking short video searches in accordance with an embodiment of the present invention is shown. The short video search ranking method may be applied to a server, which may be a short video server. The short video search ranking method may specifically include the following steps.
Step 101, acquiring a plurality of short video recall files to be sequenced.
In an embodiment of the present invention, the short video recall files may be documents obtained from multiple recall sources, each of which may contain one piece of short video data and query information for that short video data. The query information may include, but is not limited to, video information and user information of the short video data. The video information contains the name, number, type, capacity, duration, author, format, etc. The user information includes the user name, click operations on the video data, viewing operations, viewing duration, and the like.
Step 102, generating a feature array of the multi-channel short video recall file.
In an embodiment of the invention, the short video recall files may come from multiple recall sources, and the short video recall files of each recall source have their own characteristics, so the features of the multi-path short video recall files are numerous in both variety and quantity. These features are therefore constructed as the feature array.
Step 103, inputting the multi-path feature array into the trained network model, and correspondingly outputting recall scores of the multi-path short video recall file.
In the embodiment of the invention, each path of short video recall files may correspond to one path of feature data, that is, the features of one path of short video recall files may be constructed as one set of feature arrays, with the path of the short video recall files as the unit. The network model may be trained based on an xgboost model. In practice, the xgboost model may contain 40 decision trees, each containing a maximum of 6 nodes. A recall score is a scalar representation of the underlying relevance of a short video recall file and can indicate how well received the short video recall file is by users.
Step 104, selecting short video recall files corresponding to recall scores meeting preset sorting conditions as sorting results of the multi-channel short video recall files.
In the embodiment of the invention, after the recall score of each short video recall file is output, a part of short video recall files with higher recall score can be selected according to the recall score, and the selected short video recall files are used as the sorting result of the multi-channel short video recall files to be sorted.
The short video search ranking scheme provided by the embodiments of the invention acquires multiple paths of short video recall files, outputs recall scores of the multiple paths of short video recall files according to their feature arrays and a network model, and determines the sorting result according to the recall scores. The embodiments avoid coarse-ranking the recall files with a general formula and instead order the multi-path short video recall files with the trained network model. This solves the technical problems that the existing coarse ranking scheme cannot adapt to multi-path short video recall files and has low coarse ranking efficiency, and achieves the effects of adapting to multi-path short video recall files and improving coarse ranking efficiency.
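Steps 101 to 104 can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the names `RecallFile`, `build_features`, `coarse_rank` and `toy_scorer` are hypothetical, and the weighted sum in `toy_scorer` is only a stand-in for the trained network model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class RecallFile:
    """One recalled short-video document (hypothetical structure)."""
    doc_id: str
    source: str                 # recall path, e.g. "term" or "embedding"
    features: Dict[str, float]  # raw feature values keyed by name


def build_features(files: Sequence[RecallFile], names: Sequence[str]) -> List[List[float]]:
    """Step 102: one feature row per recall file, in a fixed column order."""
    return [[f.features.get(n, 0.0) for n in names] for f in files]


def coarse_rank(
    files: Sequence[RecallFile],
    names: Sequence[str],
    score_batch: Callable[[List[List[float]]], List[float]],
    top_n: int,
) -> List[RecallFile]:
    """Steps 101-104: score all recall files in one batch and keep the top N."""
    rows = build_features(files, names)   # step 102
    scores = score_batch(rows)            # step 103 (stand-in for the model)
    ranked = sorted(zip(scores, files), key=lambda pair: pair[0], reverse=True)
    return [f for _, f in ranked[:top_n]]  # step 104


def toy_scorer(rows: List[List[float]]) -> List[float]:
    """Assumed weighted sum, NOT the trained model described in the text."""
    return [0.7 * ctr + 0.3 * fresh for ctr, fresh in rows]


files = [
    RecallFile("a", "term", {"ctr": 0.10, "freshness": 0.9}),
    RecallFile("b", "embedding", {"ctr": 0.30, "freshness": 0.2}),
    RecallFile("c", "term", {"ctr": 0.05}),
]
top = coarse_rank(files, ["ctr", "freshness"], toy_scorer, top_n=2)
```

Passing the scorer as a callable keeps the pipeline independent of the model, so the stand-in could be replaced by a call to a real scoring service without changing the other steps.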
In an exemplary embodiment of the present invention, when generating the feature array of the multi-path short video recall files, feature data of the multi-path short video recall files in each dimension may be obtained, and the feature data of each dimension may then be compressed and stored to obtain the feature array. In practical applications, the dimensions include, but are not limited to: the document dimension, the query dimension, and the query-and-document dimension. The feature data of the document dimension may include at least one of: quality features, freshness features, user features. The quality features may be the sharpness, code rate, etc. of the short video. The freshness feature may be the upload time of the short video data. The user features may be the users who click on, view, or are shown the short video. The feature data of the query dimension may include query category features and the like. The feature data of the query-and-document dimension may include at least one of: click rate features, viewing duration features, presentation features, click features, etc. The click rate feature may represent, after the short video data has been presented to users, the proportion of users who click on the short video data among all users. The viewing duration feature may represent the average viewing duration of the short video data. The presentation feature may indicate how many users the short video data is presented to, the duration of the presentation, etc. The click feature may represent the users who click on the short video data, the time of the click operation, and the like.
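As an illustration of the query-and-document features described above, the following sketch aggregates a click rate, an average viewing duration and a presentation count from a list of impressions. The log layout and the names `Impression` and `query_doc_features` are assumptions made for this example; the source does not specify how the raw data is stored.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical impression record: (user_id, clicked, watch_seconds or None).
Impression = Tuple[str, bool, Optional[float]]


def query_doc_features(impressions: List[Impression]) -> Dict[str, float]:
    """Aggregate interaction features for one (query, short video) pair."""
    shown_users = {user for user, _, _ in impressions}
    click_users = {user for user, clicked, _ in impressions if clicked}
    watch_times = [t for _, _, t in impressions if t is not None]
    return {
        # Presentation feature: number of distinct users the video was shown to.
        "impressions": float(len(shown_users)),
        # Click rate: clicking users as a share of all users shown the video.
        "click_rate": len(click_users) / len(shown_users) if shown_users else 0.0,
        # Viewing duration feature: mean watch time over recorded views.
        "avg_watch_seconds": sum(watch_times) / len(watch_times) if watch_times else 0.0,
    }


log = [("u1", True, 12.0), ("u2", False, None), ("u3", True, 8.0), ("u4", False, None)]
feats = query_doc_features(log)
```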
In an exemplary embodiment of the present invention, when compressing and storing the feature data to obtain the feature array, the feature data may be stored into a sparse matrix in compressed sparse row (Compressed Sparse Row, abbreviated as CSR) format to obtain the feature array. A sparse matrix is a matrix in which the number of zero-valued elements is far greater than the number of non-zero elements and the non-zero elements are distributed irregularly. If a sparse matrix is stored as a two-dimensional array, a large number of storage units are wasted on zero elements, and a large amount of computation time is wasted on meaningless operations on zero elements. It is therefore necessary to store the sparse matrix in compressed form, that is, to store only the non-zero elements. The CSR format is a storage format for sparse matrices that is efficient for matrix-matrix and matrix-vector operations. CSR uses three arrays: the values, the row offsets, and the column numbers.
The one-dimensional array data (values) stores all non-zero values of the matrix in row-major order and has as many elements as the matrix has non-zero elements. The one-dimensional array indptr (row offsets) contains integers such that indptr[i] is the index in data of the first non-zero element of row i, and the total number of non-zero elements is appended as its last entry. If the entire row i is zero, indptr[i] = indptr[i+1]. If the original matrix has m rows, then len(indptr) = m+1. The one-dimensional array indices (column numbers) holds the column indices, which start from 0: indices[indptr[i]:indptr[i+1]] is an integer array containing the column indices of the non-zero elements in row i. CSR stores no explicit row index; the row offsets represent it in compressed form.
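The CSR layout can be made concrete with a short pure-Python sketch that converts a small dense matrix into the three arrays and reads one row back. The helper names `dense_to_csr` and `csr_row` are hypothetical; the array names follow the text (and match the attribute names used by SciPy's `csr_matrix`).

```python
from typing import List, Tuple


def dense_to_csr(matrix: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix to CSR arrays (data, indices, indptr)."""
    data: List[float] = []    # non-zero values, stored row by row
    indices: List[int] = []   # column number of each stored value
    indptr: List[int] = [0]   # row offsets into data; one extra final entry
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))  # end of this row == start of the next
    return data, indices, indptr


def csr_row(data: List[float], indices: List[int], indptr: List[int], i: int) -> List[Tuple[int, float]]:
    """Return the (column, value) pairs of the non-zero entries in row i."""
    start, end = indptr[i], indptr[i + 1]
    return list(zip(indices[start:end], data[start:end]))


dense = [
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],   # an all-zero row: indptr[1] == indptr[2]
    [1.5, 0.0, 0.0, 2.0],
]
data, indices, indptr = dense_to_csr(dense)
# data    -> [0.5, 1.5, 2.0]
# indices -> [1, 0, 3]
# indptr  -> [0, 1, 1, 3]
```

For this 3 x 4 matrix only three values are stored instead of twelve, and `indptr` has m + 1 = 4 entries, the last of which equals the number of non-zero elements.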
In practical applications, the feature array input to the network model may form a TensorProto (tensor prototype) structure, which may specifically be a map containing the following four keywords: indptr, indices, data, option_mask. The values of these keywords are, in order, TensorProto(dtype=uint64, tensor_shape=[n1]), TensorProto(dtype=uint32, tensor_shape=[n2]), TensorProto(dtype=float32, tensor_shape=[n3]), and int, where data represents the feature data and option_mask specifies the recall score to be output. That is, the recall scores produced by the network model are output through option_mask. n1, n2 and n3 each represent a number of elements.
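The four-key map can be mimicked with the standard library to show how the element types and lengths relate. This is only a sketch: it uses Python's `array` module rather than real TensorProto messages, the function name `build_model_input` is hypothetical, and the typecode `"I"` is an unsigned int that is typically, but not guaranteed to be, 32 bits wide.

```python
from array import array
from typing import Dict, List, Union


def build_model_input(
    data: List[float], indices: List[int], indptr: List[int], option_mask: int
) -> Dict[str, Union[array, int]]:
    """Pack CSR arrays into a four-key map mirroring the layout described above."""
    if indptr[0] != 0 or indptr[-1] != len(data):
        raise ValueError("indptr must start at 0 and end at the number of non-zero values")
    if len(indices) != len(data):
        raise ValueError("indices and data must have the same length")
    return {
        "indptr": array("Q", indptr),    # unsigned 64-bit row offsets (n1 elements)
        "indices": array("I", indices),  # unsigned column numbers (n2 elements)
        "data": array("f", data),        # single-precision feature values (n3 elements)
        "option_mask": option_mask,      # plain int selecting the output
    }


request = build_model_input([0.5, 1.5, 2.0], [1, 0, 3], [0, 1, 1, 3], option_mask=1)
```

Under CSR, n2 and n3 are always equal (one column number per stored value), and n1 is the number of rows plus one, which is what the two checks at the top of the function enforce.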
In an exemplary embodiment of the present invention, when selecting the short video recall files whose recall scores satisfy the preset sorting condition as the sorting result of the multiple paths of short video recall files, the short video recall files may be sorted in descending order of recall score, and the preset number of top-ranked short video recall files may be used as the sorting result. For example, if the preset number is 100, the first 100 short video recall files after sorting in descending order may be used as the sorting result.
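The selection of a preset number of top-scoring files can be sketched as follows. The function name `top_n_by_score` and the sample scores are illustrative assumptions; `heapq.nlargest` returns the same result as a full descending sort followed by truncation, without sorting the entire candidate list.

```python
import heapq
from typing import List, Tuple


def top_n_by_score(scored: List[Tuple[str, float]], n: int) -> List[Tuple[str, float]]:
    """Return the n highest-scoring (doc_id, recall_score) pairs, best first."""
    return heapq.nlargest(n, scored, key=lambda pair: pair[1])


scored = [("v1", 0.42), ("v2", 0.91), ("v3", 0.17), ("v4", 0.66)]
result = top_n_by_score(scored, n=2)
# result -> [("v2", 0.91), ("v4", 0.66)]
```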
Referring to fig. 2, a flowchart illustrating steps of a training method for a network model according to an embodiment of the present invention is shown. The training method of the network model can be applied to a server, and the server can be a short video server. The training method of the network model specifically comprises the following steps.
Step 201, a multi-path short video recall sample file is obtained.
In the embodiment of the invention, multiple paths of short video recall positive sample files and multiple paths of short video recall negative sample files may be acquired. In practical applications, a duration threshold may be set, which may be understood as a play duration threshold for a user viewing a short video. Whether an acquired short video recall sample file is a positive sample file or a negative sample file is determined according to the duration threshold. Specifically, the average viewing duration of the short video data in the short video recall sample file is compared with the duration threshold: if the average viewing duration is greater than the duration threshold, the file is a short video recall positive sample file; if the average viewing duration is less than or equal to the duration threshold, the file is a short video recall negative sample file. In addition to using the duration threshold alone, the positive and negative sample files may be acquired according to the viewing duration and/or the display and click condition. For example, if the short video data in a short video recall sample file is displayed to a user, the user clicks on the short video data, and the average viewing duration of the user is greater than the duration threshold, the sample file is a short video recall positive sample file.
If the short video data in a short video recall sample file is displayed to a user but the user does not click on it, or the user clicks on it but the average viewing duration is less than or equal to the duration threshold, the sample file is a short video recall negative sample file.
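The labeling rule that combines display, click and viewing duration can be written as a small function. The name `label_sample`, the 10-second threshold in the usage lines, and the choice to return `None` for videos that were never displayed are assumptions for this sketch; the text defines labels only for displayed videos and does not give a threshold value.

```python
from typing import Optional


def label_sample(
    shown: bool, clicked: bool, avg_watch_seconds: float, threshold_seconds: float
) -> Optional[int]:
    """Return 1 for a positive sample, 0 for a negative sample, None if never shown."""
    if not shown:
        return None  # no label is defined in the text for videos that were not displayed
    if clicked and avg_watch_seconds > threshold_seconds:
        return 1     # displayed, clicked, and watched longer than the threshold
    return 0         # displayed but not clicked, or clicked but watched too briefly


THRESHOLD = 10.0  # assumed value, in seconds
positive = label_sample(True, True, 12.0, THRESHOLD)
not_clicked = label_sample(True, False, 0.0, THRESHOLD)
too_short = label_sample(True, True, 10.0, THRESHOLD)
```

Note that a viewing duration exactly equal to the threshold yields a negative sample, matching the "less than or equal to" wording above.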
Step 202, adding corresponding sample characteristics for the multi-path short video recall sample file.
In an embodiment of the present invention, one of the following sample features may be added to the multiple paths of short video recall sample files: recall source information sample feature, recall score sample feature, recall file sample feature, user presentation click sample feature, user behavior sample feature. The recall source information sample feature may include, among other things, a recall source sample name, a recall source sample type, a recall source sample state, and the like. The recall score sample feature may represent the underlying relevance score of the short video recall sample file at the corresponding recall source. The recall file sample feature may include quality sample features, freshness sample features, click rate sample features, and the like. The user presentation click sample feature may comprise sample features of the users who click on or view the short video sample data in a short video recall sample file. The user behavior sample feature may represent sample features of the operations by which a user interacts with the short video sample data in a short video recall sample file, including, but not limited to: download operations, share operations, like operations, comment operations, favorite operations, and the like.
Step 203, training the network model according to the multipath short video recall sample file and the corresponding sample characteristics.
In the embodiment of the invention, the recall score sample features, recall source information sample features, user behavior sample features and the like among the sample features can be used as the annotation data of the short video recall sample files. During training of the network model, the training result is compared with the corresponding annotation data, and the parameters of the network model are then adjusted according to the comparison result until the training result is the same as or similar to the corresponding annotation data, or a preset training condition is met. The trained network model can output the short video recall sample files in order, and in particular can output them in descending order of recall score.
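The compare-and-adjust loop described above can be sketched as follows. This is only an illustration under stated assumptions: a plain logistic regression trained by gradient descent is swapped in as a stand-in for the network model, binary positive/negative labels are used as the annotation data, and the function names, learning rate, epoch limit, and loss tolerance are all hypothetical.

```python
import math


def predict(weights, bias, features):
    """Score one sample in (0, 1) with a logistic function."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, lr=0.5, max_epochs=2000, tol=0.05):
    """Adjust parameters until the mean loss meets a preset condition.

    Assumption: `tol` plays the role of the preset training condition;
    training also stops after `max_epochs` passes over the samples.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(max_epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(samples, labels):
            p = predict(weights, bias, x)
            # Compare the training result with the annotation (label).
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            err = p - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        if loss / n < tol:
            break
        # Adjust the parameters according to the comparison result.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

After training, `predict` returns a higher score for samples that resemble the positive samples, so sorting by that score in descending order yields an ordered output of the kind described above.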
Based on the above description of the embodiments of the short video search ranking method and the network model training method, a ranking scheme based on an XGBoost model is described below. Fig. 3a shows a schematic diagram of the offline training flow of the XGBoost model: first, multiple paths of sample data are acquired; next, features are extracted from the sample data; then, the sample data is divided into positive sample data and negative sample data according to the extracted features; finally, the XGBoost model is trained based on an artificial intelligence technology. Fig. 3b shows a schematic diagram of the online application flow of the XGBoost model: first, multiple paths of recall files are received; then, feature data of the recall files is calculated and a feature array in CSR format is constructed from the feature data; next, an artificial intelligence service is called to input the recall files and the feature arrays into the trained XGBoost model in parallel; finally, the recall scores returned by the XGBoost model are received and used as the final ranking scores.
As shown in fig. 4, a schematic structural diagram of a sorting apparatus for short video search according to an embodiment of the present invention is shown. The short video search ranking apparatus may include the following modules.
The file acquisition module 41 is used for acquiring a plurality of short video recall files to be sequenced;
a feature generation module 42, configured to generate a feature array of multiple paths of the short video recall file;
The score output module 43 is configured to input multiple paths of the feature arrays into the trained network model, and correspondingly output recall scores of multiple paths of the short video recall files;
the file selection module 44 is configured to select short video recall files corresponding to the recall scores that satisfy a preset sorting condition, as a sorting result of multiple paths of the short video recall files.
In an exemplary embodiment of the present invention, the feature generation module 42 includes:
the feature data acquisition module is used for acquiring feature data of multiple paths of video recall files in each dimension;
and the compression storage module is used for compressing and storing the characteristic data to obtain the characteristic array.
In an exemplary embodiment of the present invention, the feature data obtaining module is configured to obtain one of the following features of the multiple video recall files in a document dimension: quality characteristics, freshness characteristics, user characteristics; and/or acquiring query category characteristics of the multiple paths of video recall files in a query dimension; and/or acquiring one of the following characteristics of the multiple paths of video recall files in query and document dimensions: click rate characteristics, viewing duration characteristics, presentation characteristics.
In an exemplary embodiment of the present invention, the compression storage module is configured to store the feature data into a sparse matrix in a compressed sparse row format, to obtain the feature array.
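As an illustration of the compressed sparse row layout mentioned above, the following Python sketch stores only the non-zero feature values of each recall file. The helper names `to_csr` and `csr_row` are hypothetical, and the three-list representation (`data`, `indices`, `indptr`) is the conventional CSR layout rather than something specified in this description.

```python
def to_csr(dense_rows):
    """Convert dense feature rows into CSR arrays.

    data    : non-zero feature values, row by row
    indices : column index of each value in `data`
    indptr  : indptr[i]..indptr[i+1] delimits row i inside `data`
    """
    data, indices, indptr = [], [], [0]
    for row in dense_rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def csr_row(data, indices, indptr, row, n_cols):
    """Rebuild one dense feature row from the CSR arrays."""
    dense = [0.0] * n_cols
    for k in range(indptr[row], indptr[row + 1]):
        dense[indices[k]] = data[k]
    return dense
```

Each row corresponds to the feature array of one short video recall file. Because most features of a given file are typically zero or absent, only the non-zero entries are stored, which reduces the memory and transfer cost of the feature arrays.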
In an exemplary embodiment of the present invention, the file selection module 44 includes:
the score sorting module is used for sorting the short video recall files in descending order according to the recall scores;
and the result determining module is used for taking a preset number of top-ranked short video recall files in the descending order as the sorting result.
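The two sub-modules above can be sketched in a few lines of Python. The function name `select_top_n` and its parameters are hypothetical, and the recall scores are assumed to have already been returned by the trained model.

```python
def select_top_n(recall_files, recall_scores, n):
    """Sort recall files by score in descending order and keep the first n."""
    ranked = sorted(
        zip(recall_files, recall_scores),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [recall_file for recall_file, _ in ranked[:n]]
```

Because the files from all recall sources are scored by the same model, a single sort over the merged list is sufficient to compare files that came from different recall paths.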
Fig. 5 is a schematic structural diagram of a training device for a network model according to an embodiment of the present invention. The training apparatus of the network model may include the following modules.
The sample acquisition module 51 is configured to acquire multiple short video recall sample files;
a feature adding module 52, configured to add corresponding sample features to multiple paths of the short video recall sample files;
The model training module 53 is configured to train a network model according to the multiple short video recall sample files and the corresponding sample features.
In an exemplary embodiment of the present invention, the sample obtaining module 51 is configured to obtain a plurality of short video recall positive sample files and a plurality of short video recall negative sample files according to a viewing duration and/or a display click condition.
In an exemplary embodiment of the present invention, the feature adding module 52 is configured to add, for the multiple paths of short video recall sample files, one of the following sample features: recall source information sample feature, recall score sample feature, recall file sample feature, user presentation click sample feature, user behavior sample feature.
The embodiment of the invention also provides an electronic device, as shown in fig. 6, which comprises a processor 61, a communication interface 62, a memory 63 and a communication bus 64, wherein the processor 61, the communication interface 62 and the memory 63 communicate with each other through the communication bus 64;
A memory 63 for storing a computer program;
the processor 61 is configured to execute the program stored in the memory 63, and implement the following steps:
Acquiring a plurality of short video recall files to be sequenced;
generating a characteristic array of a plurality of paths of short video recall files;
inputting the multiple paths of the feature arrays into a trained network model, and correspondingly outputting recall scores of the multiple paths of short video recall files;
and selecting short video recall files corresponding to the recall scores meeting a preset sorting condition as sorting results of the short video recall files.
The step of generating the characteristic array of the short video recall file comprises the following steps:
Acquiring characteristic data of multiple paths of video recall files in each dimension;
and compressing and storing the characteristic data to obtain the characteristic array.
The step of acquiring the feature data of the multiple paths of video recall files in each dimension comprises the following steps:
acquiring one of the following characteristics of the multiple paths of video recall files in the document dimension: quality characteristics, freshness characteristics, user characteristics; and/or,
Acquiring query category characteristics of multiple paths of video recall files in a query dimension; and/or,
Acquiring one of the following characteristics of the multiple paths of video recall files in query and document dimensions: click rate characteristics, viewing duration characteristics, presentation characteristics.
The step of compressing and storing the feature data to obtain the feature array comprises the following steps:
and storing the characteristic data into a sparse matrix in compressed sparse row (CSR) format to obtain the characteristic array.
The step of selecting the short video recall files corresponding to the recall scores meeting the preset sorting conditions as sorting results of the short video recall files comprises the following steps:
Descending order arrangement is carried out on a plurality of short video recall files according to the recall score;
and taking a preset number of top-ranked short video recall files in the descending order as the sorting result.
The processor 61 is further configured to execute the program stored in the memory 63, thereby implementing the following steps:
Acquiring a plurality of paths of short video recall sample files;
adding corresponding sample characteristics for a plurality of paths of short video recall sample files;
And training a network model according to the multipath short video recall sample files and the corresponding sample characteristics.
The step of obtaining the multipath short video recall sample file comprises the following steps:
and acquiring a plurality of short video recall positive sample files and a plurality of short video recall negative sample files according to the watching time length and/or the display click condition.
The step of adding corresponding sample characteristics for the short video recall sample files comprises the following steps:
For the multiple paths of short video recall sample files, adding one of the following sample features: recall source information sample feature, recall score sample feature, recall file sample feature, user presentation click sample feature, user behavior sample feature.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, the bus is drawn as a single bold line in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which instructions are stored, which when run on a computer, cause the computer to perform the short video search ranking method or the network model training method of any of the above embodiments.
In yet another embodiment of the present invention, a computer program product comprising instructions, which when run on a computer, causes the computer to perform the short video search ranking method of any one of the above embodiments or the training method of the network model is also provided.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment mainly describes its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (12)

1. A method for ranking short video searches, comprising:
Acquiring a plurality of short video recall files to be sequenced;
Generating a characteristic array of a plurality of paths of short video recall files; the feature array is constructed by taking the number of the short video recall files as a unit, and the features of one short video recall file are constructed into a group of feature arrays; the short video recall files are derived from multiple recall sources, and the short video recall files of each recall source have respective characteristics;
inputting the multiple paths of the feature arrays into a trained network model, and correspondingly outputting recall scores of the multiple paths of short video recall files;
and selecting short video recall files corresponding to the recall scores meeting a preset sorting condition as sorting results of the short video recall files.
2. The method of claim 1, wherein generating the feature array of the plurality of short video recall files comprises:
Acquiring characteristic data of multiple paths of video recall files in each dimension;
and compressing and storing the characteristic data to obtain the characteristic array.
3. The method of claim 2, wherein the obtaining feature data in each dimension for multiple paths of the video recall file comprises:
acquiring one of the following characteristics of the multiple paths of video recall files in the document dimension: quality characteristics, freshness characteristics, user characteristics; and/or,
Acquiring query category characteristics of multiple paths of video recall files in a query dimension; and/or,
Acquiring one of the following characteristics of the multiple paths of video recall files in query and document dimensions: click rate characteristics, viewing duration characteristics, presentation characteristics.
4. The method according to claim 2, wherein the compressing and storing the feature data to obtain the feature array includes:
and storing the characteristic data into a sparse matrix in compressed sparse row (CSR) format to obtain the characteristic array.
5. The method according to any one of claims 1 to 4, wherein selecting short video recall files corresponding to the recall scores satisfying a preset sorting condition as a sorting result of the multiple paths of short video recall files includes:
Descending order arrangement is carried out on a plurality of short video recall files according to the recall score;
and taking a preset number of top-ranked short video recall files in the descending order as the sorting result.
6. The method of claim 1, wherein the training method of the network model comprises:
Acquiring a plurality of paths of short video recall sample files;
adding corresponding sample characteristics for a plurality of paths of short video recall sample files;
and training a network model according to the multipath short video recall sample files and the corresponding sample features, wherein in the training process, the sample features are used as the marking data of the short video recall sample files.
7. The method of claim 6, wherein the obtaining the multiple paths of short video recall sample files comprises:
and acquiring a plurality of short video recall positive sample files and a plurality of short video recall negative sample files according to the watching time length and/or the display click condition.
8. The method of claim 6, wherein the adding corresponding sample features to the multiple short video recall sample files comprises:
For the multiple paths of short video recall sample files, adding one of the following sample features: recall source information sample feature, recall score sample feature, recall file sample feature, user presentation click sample feature, user behavior sample feature.
9. A short video search ranking apparatus, comprising:
The file acquisition module is used for acquiring a plurality of short video recall files to be sequenced;
the feature generation module is used for generating a feature array of the short video recall file in multiple paths; the feature array is constructed by taking the number of the short video recall files as a unit, and the features of one short video recall file are constructed into a group of feature arrays; the short video recall files are derived from multiple recall sources, and the short video recall files of each recall source have respective characteristics;
The score output module is used for inputting a plurality of paths of the characteristic arrays into the trained network model and correspondingly outputting recall scores of a plurality of paths of short video recall files;
And the file selection module is used for selecting short video recall files corresponding to the recall scores meeting a preset sorting condition and taking the short video recall files as sorting results of the plurality of paths of short video recall files.
10. The ranking apparatus of claim 9, wherein the apparatus further comprises:
the sample acquisition module is used for acquiring a plurality of paths of short video recall sample files;
The feature adding module is used for adding corresponding sample features for the short video recall sample files;
And the model training module is used for training the network model according to the multipath short video recall sample files and the corresponding sample characteristics, and in the training process, the sample characteristics are used as the labeling data of the short video recall sample files.
11. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
A memory for storing a computer program;
a processor for implementing the short video search ranking method of any one of claims 1 to 8 when executing a program stored on a memory.
12. A computer readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the short video search ranking method of any one of claims 1 to 8.
CN202110916738.2A 2021-08-11 2021-08-11 Short video ordering and model training method and device, electronic equipment and storage medium Active CN113722537B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110916738.2A CN113722537B (en) 2021-08-11 2021-08-11 Short video ordering and model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110916738.2A CN113722537B (en) 2021-08-11 2021-08-11 Short video ordering and model training method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113722537A CN113722537A (en) 2021-11-30
CN113722537B true CN113722537B (en) 2024-04-26

Family

ID=78675471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110916738.2A Active CN113722537B (en) 2021-08-11 2021-08-11 Short video ordering and model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113722537B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108388583A (en) * 2018-01-26 2018-08-10 北京览科技有限公司 A kind of video searching method and video searching apparatus based on video content
CN111046224A (en) * 2019-12-02 2020-04-21 上海麦克风文化传媒有限公司 Real-time recall method for audio products
CN111079022A (en) * 2019-12-20 2020-04-28 深圳前海微众银行股份有限公司 Personalized recommendation method, device, equipment and medium based on federal learning
CN111931055A (en) * 2020-08-14 2020-11-13 工银科技有限公司 Object recommendation method, object recommendation device and electronic equipment
CN112328909A (en) * 2020-11-17 2021-02-05 中国平安人寿保险股份有限公司 Information recommendation method and device, computer equipment and medium
CN112507216A (en) * 2020-12-01 2021-03-16 北京奇艺世纪科技有限公司 Data object recommendation method, device, equipment and storage medium
CN112785397A (en) * 2021-03-09 2021-05-11 中国工商银行股份有限公司 Product recommendation method, device and storage medium
CN112801760A (en) * 2021-03-30 2021-05-14 南京蓝鲸人网络科技有限公司 Sequencing optimization method and system of content personalized recommendation system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8364693B2 (en) * 2008-06-13 2013-01-29 News Distribution Network, Inc. Searching, sorting, and displaying video clips and sound files by relevance
US9449089B2 (en) * 2012-05-07 2016-09-20 Pixability, Inc. Methods and systems for identifying distribution opportunities
US20160041998A1 (en) * 2014-08-05 2016-02-11 NFL Enterprises LLC Apparatus and Methods for Personalized Video Delivery
US20190250998A1 (en) * 2018-02-14 2019-08-15 Commvault Systems, Inc. Machine-learning based data object retrieval
CN113746874B (en) * 2020-05-27 2024-04-05 百度在线网络技术(北京)有限公司 Voice package recommendation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113722537A (en) 2021-11-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant