CN113722537A - Short video sequencing and model training method and device, electronic equipment and storage medium

Publication number
CN113722537A
Authority
CN
China
Prior art keywords
recall
short video
files
sample
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110916738.2A
Other languages
Chinese (zh)
Other versions
CN113722537B (en)
Inventor
温恒一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110916738.2A (patent CN113722537B)
Publication of CN113722537A
Application granted
Publication of CN113722537B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74 Browsing; Visualisation therefor
    • G06F16/743 Browsing; Visualisation therefor a collection of video files or sequences
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention provide a method and a device for ranking short video search results and a method and a device for training a model, applied to the coarse-ranking scenario for short videos. The ranking method comprises the following steps: acquiring multi-path short video recall files to be sorted; generating feature arrays of the multi-path short video recall files; inputting the feature arrays into a trained network model, which correspondingly outputs recall scores of the multi-path short video recall files; and selecting the short video recall files whose recall scores meet a preset sorting condition as the sorting result of the multi-path short video recall files. Instead of coarse-ranking the recall files with a general formula, the method ranks the multi-path short video recall files with the trained network model. This solves the technical problems that existing coarse-ranking schemes cannot adapt to multi-path short video recall files and have low coarse-ranking efficiency, and achieves the effects of adapting to multi-path short video recall files and improving coarse-ranking efficiency.

Description

Short video sequencing and model training method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for ranking short video searches, a method and an apparatus for training a network model, an electronic device, and a computer-readable storage medium.
Background
In a search system, coarse ranking performs a rough ordering of candidates and fine ranking performs a precise ordering. The role of coarse ranking is to screen hundreds of results out of thousands of recalled candidates and hand them to fine ranking, so coarse ranking plays a crucial role in the final search result.
Current search systems perform coarse ranking with a general formula whose parameters correspond to fixed features. In a short video scenario, the recall files of each path have clearly different characteristics, and a general formula cannot adapt to recall files from different paths with different characteristics. Moreover, coarse ranking of short videos must process a large number of recall files at the same time, whereas the conventional coarse-ranking scheme can only score recall files one by one and then sort them by the scoring results.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for ranking short video searches, a method and an apparatus for training a network model, an electronic device, and a computer-readable storage medium, which solve the problems that a conventional coarse-ranking scheme cannot adapt to multi-path recall files and that coarse-ranking efficiency is low. The specific technical solution is as follows:
in a first aspect of the present invention, there is provided a method for ordering short video searches, including: acquiring a multi-channel short video recall file to be sorted; generating a characteristic array of a plurality of paths of the short video recall files; inputting the plurality of paths of feature arrays into a trained network model, and correspondingly outputting recall scores of the plurality of paths of short video recall files; and selecting the short video recall files corresponding to the recall scores meeting the preset sorting conditions as sorting results of the multiple short video recall files.
Optionally, the generating a feature array of multiple short video recall files includes: acquiring characteristic data of a plurality of paths of video recall files in each dimension; and compressing and storing the feature data to obtain the feature array.
Optionally, the acquiring feature data of multiple paths of the video recall files in each dimension includes: acquiring one of the following characteristics of a plurality of paths of the video recall files in the document dimension: quality, freshness, user characteristics; and/or acquiring query category characteristics of the multiple paths of video recall files in query dimensions; and/or acquiring one of the following characteristics of the plurality of paths of video recall files in query and document dimensions: click rate characteristics, viewing duration characteristics, and presentation characteristics.
Optionally, the compressing and storing the feature data to obtain the feature array includes: and storing the characteristic data into a sparse matrix in a compressed sparse row format to obtain the characteristic array.
Optionally, the selecting the short video recall file corresponding to the recall score meeting the preset sorting condition as the sorting result of the multiple short video recall files includes: according to the recall scores, performing descending order arrangement on the multiple short video recall files; and taking the short video recall files of the preset number at the front after descending order as the ordering result.
In a second aspect of the present invention, there is also provided a method for training a network model, including: acquiring a plurality of paths of short video recall sample files; adding corresponding sample characteristics to the multi-path short video recall sample file; and training a network model according to the multipath short video recall sample file and the corresponding sample characteristics.
Optionally, the acquiring multiple paths of short video recall sample files includes: acquiring multiple paths of short video recall positive sample files and multiple paths of short video recall negative sample files according to the watching duration and/or the display click condition.
Optionally, the adding corresponding sample features to the multi-path short video recall sample files includes: adding, for the multi-path short video recall sample files, one of the following sample features: a recall source information sample feature, a recall score sample feature, a recall file sample feature, a user display click sample feature, and a user behavior sample feature.
In a third aspect of the present invention, there is also provided a device for ordering short video searches, including: the file acquisition module is used for acquiring the multi-channel short video recall files to be sorted; the characteristic generating module is used for generating a characteristic array of the multipath short video recall file; the score output module is used for inputting the plurality of paths of feature arrays to a trained network model and correspondingly outputting the recall scores of the plurality of paths of short video recall files; and the file selection module is used for selecting the short video recall files corresponding to the recall scores meeting the preset sorting conditions as the sorting results of the multiple short video recall files.
Optionally, the feature generation module includes: the characteristic data acquisition module is used for acquiring characteristic data of the multiple paths of video recall files in all dimensions; and the compression storage module is used for carrying out compression storage on the feature data to obtain the feature array.
Optionally, the feature data acquiring module is configured to acquire one of the following features of the multiple paths of video recall files in a document dimension: quality, freshness, user characteristics; and/or acquiring query category characteristics of the multiple paths of video recall files in query dimensions; and/or acquiring one of the following characteristics of the plurality of paths of video recall files in query and document dimensions: click rate characteristics, viewing duration characteristics, and presentation characteristics.
Optionally, the compressed storage module is configured to store the feature data into a sparse matrix in a compressed sparse row format, so as to obtain the feature array.
Optionally, the file selection module includes: the score sorting module is used for performing descending sorting on the plurality of short video recall files according to the recall scores; and the result determining module is used for taking the short video recall files with the preset number at the front after the descending order as the ordering result.
In a fourth aspect of the present invention, there is also provided a network model training apparatus, including: the sample acquisition module is used for acquiring a plurality of paths of short video recall sample files; the characteristic adding module is used for adding corresponding sample characteristics for the multi-path short video recall sample file; and the model training module is used for training the network model according to the multipath short video recall sample file and the corresponding sample characteristics.
Optionally, the sample obtaining module is configured to obtain multiple short video recall positive sample files and multiple short video recall negative sample files according to the viewing duration and/or the display click condition.
Optionally, the feature adding module is configured to add, for the multi-path short video recall sample files, one of the following sample features: a recall source information sample feature, a recall score sample feature, a recall file sample feature, a user display click sample feature, and a user behavior sample feature.
In another aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus; a memory for storing a computer program; and a processor, configured to implement the short video search ranking method according to the first aspect or the network model training method according to the second aspect when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the ranking method for short video search according to the first aspect or the training method for network model according to the second aspect.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the ranking method for short video search according to the first aspect or the training method for network model according to the second aspect.
The short video search ranking scheme provided by the embodiment of the invention adopts the technical means of acquiring multi-path short video recall files, outputting recall scores of the multi-path short video recall files from their feature arrays and the network model, and determining the sorting result according to the recall scores. Instead of coarse-ranking the recall files with a general formula, the scheme ranks the multi-path short video recall files with a trained network model. This solves the technical problems that the existing coarse-ranking scheme cannot adapt to multi-path short video recall files and has low coarse-ranking efficiency, and achieves the effects of adapting to multi-path short video recall files and improving coarse-ranking efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart illustrating a method for ordering short video searches according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating steps of a method for training a network model according to an embodiment of the present invention.
Fig. 3a is a schematic diagram of an offline training process of an xgboost model according to an embodiment of the present invention.
Fig. 3b is a schematic diagram of an online application process of the xgboost model according to the embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a sorting apparatus for short video search according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a training apparatus for a network model according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
The embodiment of the invention provides a ranking scheme for short video search, applied to the short video search scenario. When short video recall files are coarse-ranked, not only metrics such as click rate but also consumption metrics, such as the number of clicks per user on displayed videos and the playing time per user of displayed videos, need to be considered. Unlike an ordinary search scenario, the short video search scenario places relatively low requirements on the relevance of the short videos and places certain requirements on their diversity and divergence. Therefore, the embodiment of the invention can rank multi-path short video recall files, including term recall, embedding recall, and the like.
The embodiment of the invention also provides a training scheme for a network model, which can be trained based on an xgboost model (xgboost is an optimized distributed gradient boosting library). During training, sample features can be set flexibly for the training samples: adding recall source information sample features and recall score sample features can mitigate the bias caused by different recall sources, and adding user behavior sample features enables personalized search over short video recall files.
Fig. 1 is a flowchart illustrating steps of a method for ranking short video searches according to an embodiment of the present invention. The ordering method of the short video search can be applied to a server, and the server can be a short video server. The method for ordering the short video search may specifically include the following steps.
Step 101, obtaining a multi-channel short video recall file to be sorted.
In an embodiment of the present invention, the short video recall file may be a document obtained from multiple recall sources, and each document may contain one short video data and query information of the short video data. The query information may include, but is not limited to: video information and user information of the short video data. The video information includes name, number, type, capacity, duration, author, format, etc. The user information includes a user name, a click operation for video data, a viewing operation, a viewing time period, and the like.
Step 102, generating a feature array of the multi-path short video recall files.
In embodiments of the present invention, since the short video recall file may originate from multiple recall sources, the short video recall files of each recall source have respective characteristics. Therefore, the multi-path short video recall file has a large number of feature types and quantity, and the features of the multi-path short video recall file can be constructed into a feature array.
Step 103, inputting the multi-path feature arrays into the trained network model, and correspondingly outputting the recall scores of the multi-path short video recall files.
In the embodiment of the present invention, each path of short video recall files may correspond to one path of feature data; that is, taking the path as the unit, the features of one path of short video recall files may be constructed as one group of feature arrays. The network model may be obtained by training an xgboost model. In practical applications, the xgboost model may contain 40 decision trees, each containing at most 6 nodes. The recall score is a scalar expression of the base relevance of a short video recall file and may represent how popular the short video recall file is with users.
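As an illustration of how a boosted tree ensemble such as xgboost turns a feature vector into a single score, the following minimal sketch sums the leaf values reached in each tree. The tree layout, feature names, thresholds, and leaf values are all hypothetical, and treating a missing feature as 0.0 is a simplification rather than xgboost's actual missing-value handling.

```python
def predict_tree(node, features):
    """Walk one decision tree.

    A node is either a leaf value (float) or a dict describing a split.
    Missing features are treated as 0.0 (a simplification for this sketch).
    """
    while isinstance(node, dict):
        value = features.get(node["feature"], 0.0)
        node = node["left"] if value < node["threshold"] else node["right"]
    return node


def recall_score(trees, features, base_score=0.0):
    """Raw ensemble score: base score plus the sum of the per-tree leaf values."""
    return base_score + sum(predict_tree(tree, features) for tree in trees)


# Two tiny hypothetical trees; the model in the text may contain 40.
trees = [
    {"feature": "click_rate", "threshold": 0.1, "left": -0.2, "right": 0.3},
    {
        "feature": "freshness_days",
        "threshold": 7.0,
        "left": 0.1,
        "right": {"feature": "quality", "threshold": 0.5, "left": -0.1, "right": 0.05},
    },
]

score = recall_score(
    trees, {"click_rate": 0.25, "freshness_days": 30.0, "quality": 0.8}
)
print(round(score, 4))
```

Because every tree contributes additively, the same function can be applied to each recall file independently, which is what allows many files to be scored in one batch.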
Step 104, selecting the short video recall files corresponding to the recall scores meeting the preset sorting conditions as the sorting results of the multi-path short video recall files.
In the embodiment of the present invention, after the recall score of each short video recall file is output, a part of the short video recall files with higher recall scores can be selected according to the recall score, and the selected short video recall file is used as the sorting result of the multiple short video recall files to be sorted.
The short video search ranking scheme provided by the embodiment of the invention adopts the technical means of acquiring multi-path short video recall files, outputting recall scores of the multi-path short video recall files from their feature arrays and the network model, and determining the sorting result according to the recall scores. Instead of coarse-ranking the recall files with a general formula, the scheme ranks the multi-path short video recall files with a trained network model. This solves the technical problems that the existing coarse-ranking scheme cannot adapt to multi-path short video recall files and has low coarse-ranking efficiency, and achieves the effects of adapting to multi-path short video recall files and improving coarse-ranking efficiency.
In an exemplary embodiment of the present invention, in the process of generating the feature array of the multi-path short video recall files, feature data of the multi-path short video recall files in each dimension may be acquired, and the feature data of each dimension may then be compressed and stored to obtain the feature array. In practical applications, the dimensions include, but are not limited to: the document dimension, the query dimension, and the query-and-document dimension. The feature data of the document dimension may comprise at least one of: quality features, freshness features, and user features. The quality features may be the definition, the bit rate, and so on of the short video. The freshness feature may be the upload time of the short video data. The user features may be the users who clicked, watched, or were shown the short video. The feature data of the query dimension may include query category features and the like. The feature data of the query-and-document dimension may contain at least one of: click rate features, viewing duration features, presentation features, click features, and the like. The click rate feature may represent, after the short video data has been presented to users, the proportion of users who clicked the short video data among all users to whom it was presented. The viewing duration feature may represent the average viewing duration of the short video data. The presentation feature may represent how many users the short video data was presented to, the duration of the presentation, and the like. The click feature may represent the users who clicked the short video data, the time of the click operation, and the like.
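The query-and-document features described above can be sketched as simple aggregates over display logs. This is a minimal illustration, not the patent's implementation: the `Interaction` record, its field names, and the choice to average viewing duration over clicked impressions only are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One presentation of a short video for a query (hypothetical log record)."""
    clicked: bool
    watch_seconds: float  # 0.0 when the video was shown but not clicked


def query_document_features(interactions):
    """Aggregate click rate, average viewing duration, and presentation count."""
    impressions = len(interactions)
    if impressions == 0:
        return {"click_rate": 0.0, "avg_watch_seconds": 0.0, "impressions": 0}
    clicks = sum(1 for item in interactions if item.clicked)
    total_watch = sum(item.watch_seconds for item in interactions if item.clicked)
    return {
        # Share of presentations that led to a click.
        "click_rate": clicks / impressions,
        # Assumption: average over clicked presentations only.
        "avg_watch_seconds": total_watch / clicks if clicks else 0.0,
        "impressions": impressions,
    }


log = [
    Interaction(True, 12.0),
    Interaction(False, 0.0),
    Interaction(True, 30.0),
    Interaction(False, 0.0),
]
features = query_document_features(log)
print(features)
```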
In an exemplary embodiment of the present invention, in the process of compressing and storing the feature data to obtain the feature array, the feature data may be stored in a sparse matrix in Compressed Sparse Row (CSR) format to obtain the feature array. A sparse matrix is a matrix in which the number of zero-valued elements is much larger than the number of non-zero elements and the non-zero elements are distributed irregularly. If a sparse matrix is stored as an ordinary two-dimensional array, a large number of storage units are wasted on zero elements, and a large amount of time is wasted on invalid operations on those zero elements. Compressed storage of the sparse matrix, which stores only the non-zero elements, must therefore be considered. The CSR format is a storage format for sparse matrices that is efficient for matrix-matrix and matrix-vector operations. CSR uses three arrays: values, row offsets, and column indices. The one-dimensional array data (values) stores all non-zero values in row-major order and has as many elements as the matrix has non-zero elements. The one-dimensional array indptr (row offsets) is built so that indptr[i] is the index in data of the first non-zero element of row i, and its final entry is the total number of non-zero elements of the matrix. If the entire row i is zero, then indptr[i] = indptr[i+1]; if the original matrix has m rows, then len(indptr) = m + 1. The one-dimensional array indices holds the column index information: indices[indptr[i]:indptr[i+1]] is the integer array of the column indices of the non-zero elements in row i. A column index gives the number of the column in which the value is located, starting from 0.
In summary, the array data contains the non-zero elements of the matrix, stored in row-major order. Row indices are not stored explicitly in CSR; they are compressed and represented by the row offsets.
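The three CSR arrays can be illustrated with a small pure-Python conversion. This is a sketch for a toy feature matrix with made-up values; a production system would more likely rely on a sparse-matrix library.

```python
def dense_to_csr(rows):
    """Convert a dense row-major matrix into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # non-zero value
                indices.append(col)   # its column index, starting from 0
        indptr.append(len(data))      # offset where the next row starts
    return data, indices, indptr


def csr_row(data, indices, indptr, i):
    """Return the (column, value) pairs of row i using the row offsets."""
    start, end = indptr[i], indptr[i + 1]
    return list(zip(indices[start:end], data[start:end]))


# Three recall files (rows) with four features each; row 1 is entirely zero.
dense = [
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.3, 0.0, 0.0, 1.5],
]
data, indices, indptr = dense_to_csr(dense)
print(data)     # [0.8, 0.3, 1.5]
print(indices)  # [1, 0, 3]
print(indptr)   # [0, 1, 1, 3]
```

Here `indptr[1] == indptr[2]` because row 1 has no non-zero elements, and `len(indptr)` is the row count plus one.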
In practical applications, the feature array input to the network model may be packaged as TensorProto values in a map structure containing the following four keys: index, indices, data, and option_mask. The values of these keys are, respectively, TensorProto(dtype=uint64, tensor_shape=n1), TensorProto(dtype=uint32, tensor_shape=n2), TensorProto(dtype=float32, tensor_shape=n3), and an int. Here data holds the feature data and option_mask specifies the recall score to output; that is, the recall score produced by the network model is output according to option_mask. n1, n2, and n3 denote the respective numbers of elements.
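The shape of such a request map can be sketched with typed arrays from the Python standard library standing in for the TensorProto values. This is only an illustration: the helper name `build_request`, the reading of the `index` key as the CSR row offsets, and the `option_mask` value of 1 are assumptions, not details confirmed by the text.

```python
from array import array


def build_request(data, indices, row_offsets, option_mask=1):
    """Pack CSR arrays into a four-key map (typed arrays stand in for TensorProto)."""
    return {
        "index": array("Q", row_offsets),  # unsigned 64-bit; assumed to hold row offsets
        "indices": array("I", indices),    # unsigned int, standing in for uint32
        "data": array("f", data),          # 32-bit floats: the feature values
        "option_mask": option_mask,        # int; hypothetical value
    }


request = build_request(
    data=[0.8, 0.3, 1.5], indices=[1, 0, 3], row_offsets=[0, 1, 1, 3]
)
# n1, n2, n3 from the text correspond to the lengths of the three arrays.
shapes = {key: len(value) for key, value in request.items() if key != "option_mask"}
print(shapes)
```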
In an exemplary embodiment of the present invention, in the process of selecting the short video recall files whose recall scores meet the preset sorting condition as the sorting result of the multi-path short video recall files, the multi-path short video recall files may be sorted in descending order of recall score, and the preset number of short video recall files at the front of that order may be used as the sorting result. For example, if the preset number is 100, the top 100 short video recall files in descending order can be used as the sorting result.
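The selection step amounts to a descending sort followed by truncation, as in the minimal sketch below. The file identifiers and scores are made up for illustration.

```python
def select_top_n(scored_files, n):
    """Sort (file_id, recall_score) pairs by score, descending, and keep the first n."""
    ranked = sorted(scored_files, key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


scored = [("v1", 0.42), ("v2", 0.91), ("v3", 0.17), ("v4", 0.66)]
top = select_top_n(scored, 2)
print(top)  # [('v2', 0.91), ('v4', 0.66)]
```

When the preset number is much smaller than the number of recall files, `heapq.nlargest` from the standard library gives the same result without sorting the whole list.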
Fig. 2 is a flowchart illustrating steps of a method for training a network model according to an embodiment of the present invention. The network model training method can be applied to a server, and the server can be a short video server. The training method of the network model specifically comprises the following steps.
Step 201, obtaining multiple paths of short video recall sample files.
In the embodiment of the invention, multiple short video recall positive sample files and multiple short video recall negative sample files can be obtained. In practical applications, a duration threshold may be set; it can be understood as a threshold on the playing duration when a user watches a short video. Whether an obtained short video recall sample file is a positive sample file or a negative sample file is then determined according to the duration threshold. Specifically, the average watching duration of the short video data in the short video recall sample file may be compared with the duration threshold: if the average watching duration is greater than the duration threshold, the short video recall sample file is a short video recall positive sample file; if the average watching duration is less than or equal to the duration threshold, it is a short video recall negative sample file. Besides classifying sample files by the duration threshold alone, multiple paths of short video recall positive sample files and negative sample files can also be obtained according to the watching duration and/or the display click condition. For example, if the short video data in a short video recall file is shown to the user, the user clicks on it, and the average watching duration of the short video data is greater than the duration threshold, the short video recall sample file is a short video recall positive sample file. If the short video data in a short video recall file is shown to the user but the user does not click on it, or the user clicks on it but the average watching duration is less than or equal to the duration threshold, the short video recall sample file is a short video recall negative sample file.
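The combined display, click, and duration rule can be written as a small labeling function. The function name, argument names, and the choice to return `None` for videos that were never shown (a case the text does not cover) are assumptions of this sketch.

```python
def label_sample(shown, clicked, avg_watch_seconds, duration_threshold):
    """Return 1 for a positive sample, 0 for a negative sample.

    Returns None for files that were never shown, which the rule does not cover.
    """
    if not shown:
        return None
    # Positive: shown, clicked, and watched longer than the threshold.
    if clicked and avg_watch_seconds > duration_threshold:
        return 1
    # Negative: shown but not clicked, or clicked but watched too briefly.
    return 0


print(label_sample(True, True, 20.0, 10.0))   # 1
print(label_sample(True, False, 0.0, 10.0))   # 0
print(label_sample(True, True, 10.0, 10.0))   # 0 (equal to the threshold)
```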
Step 202, adding corresponding sample characteristics for the multi-path short video recall sample file.
In an embodiment of the present invention, one of the following sample features may be added to the multi-path short video recall sample files: a recall source information sample feature, a recall score sample feature, a recall file sample feature, a user display click sample feature, and a user behavior sample feature. The recall source information sample feature may include the recall source sample name, the recall source sample type, the recall source sample status, and the like. The recall score sample feature may represent the base relevance score of the short video recall sample file at the corresponding recall source. The recall file sample feature may include a quality sample feature, a freshness sample feature, a click rate sample feature, and the like. The user display click sample feature may include sample features of the users who clicked or viewed the short video sample data in the short video recall sample file. The user behavior sample feature may represent sample features of the interactive operations that users performed on the short video sample data in the short video recall sample file, where the interactive operations include, but are not limited to: download operations, share operations, like operations, comment operations, favorite operations, and the like.
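One way to attach such sample features is to flatten them into a single named feature row per sample, with the recall source encoded as indicator values. The source names, feature keys, and one-hot encoding below are illustrative assumptions; the text does not prescribe an encoding.

```python
RECALL_SOURCES = ["term", "embedding"]  # hypothetical list of recall paths


def build_sample_features(recall_source, recall_score, file_features, user_behavior):
    """Flatten the sample features of one recall sample file into a single row."""
    # Recall source information as one indicator per known source.
    row = {
        f"source={name}": 1.0 if name == recall_source else 0.0
        for name in RECALL_SOURCES
    }
    # Base relevance score reported by the recall source.
    row["recall_score"] = recall_score
    # Recall file features (quality, freshness, click rate, ...).
    row.update({f"file.{key}": value for key, value in file_features.items()})
    # User behavior features (downloads, shares, likes, ...).
    row.update({f"user.{key}": value for key, value in user_behavior.items()})
    return row


row = build_sample_features(
    recall_source="embedding",
    recall_score=0.73,
    file_features={"quality": 0.9, "click_rate": 0.12},
    user_behavior={"likes": 3.0, "shares": 1.0},
)
print(row)
```

Keeping the recall source and the source's own score as explicit features is what lets the model learn that scores from different sources are not directly comparable.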
And step 203, training the network model according to the multi-channel short video recall sample file and the corresponding sample characteristics.
In the embodiment of the invention, the recall score sample feature, the recall source information sample feature, the user behavior sample feature, and the like among the sample features can be used as the labeled data of the short video recall sample file. During training of the network model, the training result is compared with the corresponding labeled data, and the parameters of the network model are adjusted according to the comparison result until the training result is the same as or similar to the corresponding labeled data, or meets a preset training condition. The trained network model can output the short video recall sample files in order, specifically in descending order of recall score.
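The compare-and-adjust loop described above can be illustrated with a deliberately small stand-in model. The sketch below uses plain logistic regression trained by gradient descent, which is not the network model of the embodiment; the function names, the two features, the learning rate, and the stopping threshold are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, row: Sequence[float]) -> float:
    """Score one feature row with a logistic model (stand-in for the real model)."""
    z = sum(w * x for w, x in zip(weights, row)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    features: List[List[float]],
    labels: List[int],
    lr: float = 0.5,
    max_epochs: int = 2000,
    target_loss: float = 0.05,
) -> Tuple[List[float], float]:
    """Adjust parameters until the mean loss meets the preset condition."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(max_epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        loss = 0.0
        for row, y in zip(features, labels):
            p = predict(weights, bias, row)
            # Compare the model output with the labeled data.
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1.0 - p + 1e-12)
            err = p - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        if loss / n <= target_loss:
            break  # preset training condition reached
        # Adjust the parameters according to the comparison result.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical (quality, click rate) features with positive/negative labels.
X = [[0.9, 0.8], [0.8, 0.6], [0.2, 0.1], [0.1, 0.3]]
y = [1, 1, 0, 0]
weights, bias = train_logistic(X, y)
scores = [predict(weights, bias, row) for row in X]
```

After training, sorting the samples by `scores` in descending order places the positive samples ahead of the negative ones, which mirrors the ordered output described above.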
Based on the above description of the embodiments of the short video search ranking method and the network model training method, a ranking scheme based on an xgboost model is introduced below. Fig. 3a shows a flow diagram of the offline training of the xgboost model. First, multi-path sample data is obtained and features are extracted from it; the sample data is then divided into positive sample data and negative sample data according to the extracted features; finally, the xgboost model is trained based on an artificial intelligence technology. Fig. 3b shows a schematic diagram of the online application flow of the xgboost model. First, multiple paths of recall files are received; feature data of the recall files is then calculated, and a feature array in CSR format is constructed from the feature data; next, an artificial intelligence service is called to input the recall files and the feature array to the trained xgboost model in parallel; finally, the recall scores returned by the xgboost model are received as the final ranking scores.
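The parallel scoring step of the online flow can be sketched as follows. This is a minimal illustration, not the actual service: `score_batch` is a hypothetical stand-in for the call to the trained xgboost model, and the recall path names, feature values, and weights are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def score_batch(feature_rows: List[List[float]]) -> List[float]:
    """Hypothetical stand-in for the model service; a real system would
    send the feature array to the trained model here."""
    weights = [0.6, 0.4]  # illustrative weights for (quality, click rate)
    return [sum(w * x for w, x in zip(weights, row)) for row in feature_rows]


def score_recall_paths(paths: Dict[str, Dict[str, List[float]]]) -> Dict[str, float]:
    """Score every recall path in parallel and merge the per-file scores."""

    def score_one(path_name: str) -> Dict[str, float]:
        files = paths[path_name]
        ids = list(files)
        return dict(zip(ids, score_batch([files[i] for i in ids])))

    merged: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        # One task per recall path; results come back in submission order.
        for result in pool.map(score_one, paths):
            merged.update(result)
    return merged


recall_paths = {
    "text_match": {"v1": [0.9, 0.5], "v2": [0.2, 0.1]},
    "hot_list": {"v3": [0.5, 1.0]},
}
final_scores = score_recall_paths(recall_paths)
```

Threads suit this step when scoring is a network call to a separate service, since the work is I/O-bound; whether the original system uses threads, processes, or batched requests is not stated in the text.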
Fig. 4 is a schematic structural diagram illustrating an apparatus for sorting short video searches according to an embodiment of the present invention. The ordering means for the short video search may comprise the following modules.
The file acquisition module 41 is configured to acquire a multi-channel short video recall file to be sorted;
the characteristic generating module 42 is configured to generate a characteristic array of the multiple short video recall files;
a score output module 43, configured to input the multiple paths of feature arrays to a trained network model, and correspondingly output recall scores of the multiple paths of short video recall files;
and the file selection module 44 is configured to select the short video recall file corresponding to the recall score meeting the preset sorting condition as a sorting result of the multiple short video recall files.
In an exemplary embodiment of the present invention, the feature generation module 42 includes:
the characteristic data acquisition module is used for acquiring characteristic data of the multiple paths of video recall files in all dimensions;
and the compression storage module is used for carrying out compression storage on the feature data to obtain the feature array.
In an exemplary embodiment of the present invention, the feature data obtaining module is configured to obtain one of the following features of the multiple video recall files in a document dimension: quality, freshness, user characteristics; and/or acquiring query category characteristics of the multiple paths of video recall files in query dimensions; and/or acquiring one of the following characteristics of the plurality of paths of video recall files in query and document dimensions: click rate characteristics, viewing duration characteristics, and presentation characteristics.
In an exemplary embodiment of the invention, the compressed storage module is configured to store the feature data into a sparse matrix in a compressed sparse row format, so as to obtain the feature array.
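To make the compressed sparse row (CSR) layout concrete, the following sketch builds the three CSR arrays by hand. The helper names `to_csr` and `csr_row` and the sample feature matrix are illustrative assumptions; a production system would more likely rely on an existing sparse-matrix library.

```python
from typing import List, Tuple


def to_csr(dense_rows: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert dense feature rows into CSR arrays (data, indices, indptr)."""
    data: List[float] = []    # non-zero values, row by row
    indices: List[int] = []   # column index of each stored value
    indptr: List[int] = [0]   # indptr[i]:indptr[i+1] delimits row i in data
    for row in dense_rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def csr_row(data: List[float], indices: List[int], indptr: List[int],
            i: int, n_cols: int) -> List[float]:
    """Rebuild dense row i from the CSR arrays."""
    row = [0.0] * n_cols
    for k in range(indptr[i], indptr[i + 1]):
        row[indices[k]] = data[k]
    return row


# Each row is one recall file; most feature slots are empty (zero).
features = [
    [0.8, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
data, indices, indptr = to_csr(features)
# data == [0.8, 1.0, 0.3], indices == [0, 3, 2], indptr == [0, 2, 3, 3]
```

Only the non-zero values are stored, so the memory cost grows with the number of populated features rather than with the full width of the feature array.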
In an exemplary embodiment of the present invention, the file selecting module 44 includes:
the score sorting module is used for performing descending sorting on the plurality of short video recall files according to the recall scores;
and the result determining module is used for taking the short video recall files with the preset number at the front after the descending order as the ordering result.
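The two sub-modules above amount to a descending sort followed by truncation. A minimal sketch, assuming the recall scores are held in a dictionary keyed by a hypothetical video identifier:

```python
from typing import Dict, List


def top_n(recall_scores: Dict[str, float], n: int) -> List[str]:
    """Sort recall files by score in descending order and keep the first n."""
    ranked = sorted(recall_scores.items(), key=lambda item: item[1], reverse=True)
    return [video_id for video_id, _ in ranked[:n]]


scores = {"v1": 0.74, "v2": 0.16, "v3": 0.70, "v4": 0.91}
result = top_n(scores, 3)  # ['v4', 'v1', 'v3']
```

Python's sort is stable, so files with equal scores keep their original relative order; the text does not specify a tie-breaking rule.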
Fig. 5 is a schematic structural diagram illustrating a training apparatus for a network model according to an embodiment of the present invention. The training device of the network model may include the following modules.
The sample acquisition module 51 is configured to acquire multiple paths of short video recall sample files;
a feature adding module 52, configured to add corresponding sample features to the multiple short video recall sample files;
and the model training module 53 is configured to train a network model according to the multiple short video recall sample files and the corresponding sample features.
In an exemplary embodiment of the present invention, the sample obtaining module 51 is configured to obtain multiple short video recall positive sample files and multiple short video recall negative sample files according to the viewing duration and/or the display click condition.
In an exemplary embodiment of the present invention, the feature adding module 52 is configured to add, for the multiple short video recall sample files, one of the following sample features: a recall source information sample feature, a recall score sample feature, a recall file sample feature, a user display click sample feature, and a user behavior sample feature.
An embodiment of the present invention further provides an electronic device, as shown in fig. 6, including a processor 61, a communication interface 62, a memory 63, and a communication bus 64, where the processor 61, the communication interface 62, and the memory 63 complete mutual communication through the communication bus 64,
a memory 63 for storing a computer program;
the processor 61 is configured to implement the following steps when executing the program stored in the memory 63:
acquiring a multi-channel short video recall file to be sorted;
generating a characteristic array of a plurality of paths of the short video recall files;
inputting the plurality of paths of feature arrays into a trained network model, and correspondingly outputting recall scores of the plurality of paths of short video recall files;
and selecting the short video recall files corresponding to the recall scores meeting the preset sorting conditions as sorting results of the multiple short video recall files.
The step of generating the feature array of the multipath short video recall file comprises the following steps:
acquiring characteristic data of a plurality of paths of video recall files in each dimension;
and compressing and storing the feature data to obtain the feature array.
The step of obtaining feature data of the plurality of paths of the video recall files in each dimension includes:
acquiring one of the following characteristics of a plurality of paths of the video recall files in the document dimension: quality, freshness, user characteristics; and/or,
acquiring query category characteristics of a plurality of paths of video recall files in a query dimension; and/or,
acquiring one of the following characteristics of the multi-channel video recall file in the dimensions of query and document: click rate characteristics, viewing duration characteristics, and presentation characteristics.
The step of compressing and storing the feature data to obtain the feature array comprises the following steps:
and storing the characteristic data into a sparse matrix in a compressed sparse row format to obtain the characteristic array.
The step of selecting the short video recall file corresponding to the recall score meeting the preset sorting condition as the sorting result of the plurality of short video recall files comprises the following steps:
according to the recall scores, performing descending order arrangement on the multiple short video recall files;
and taking the short video recall files of the preset number at the front after descending order as the ordering result.
The processor 61 is further configured to implement the following steps when executing the program stored in the memory 63:
acquiring a plurality of paths of short video recall sample files;
adding corresponding sample characteristics to the multi-path short video recall sample file;
and training a network model according to the multipath short video recall sample file and the corresponding sample characteristics.
The step of obtaining the plurality of short video recall sample files comprises the following steps:
and acquiring multiple paths of short video recall positive sample files and multiple paths of short video recall negative sample files according to the watching duration and/or the display click condition.
The step of adding corresponding sample features to the plurality of short video recall sample files comprises:
for the plurality of short video recall sample files, adding one of the following sample features: a recall source information sample feature, a recall score sample feature, a recall file sample feature, a user display click sample feature, and a user behavior sample feature.
The communication bus mentioned in the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In another embodiment of the present invention, there is also provided a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the ranking method of short video search or the training method of network model in any of the above embodiments.
In yet another embodiment of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method for ranking short video searches or the method for training the network model according to any of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A method for ordering short video searches, comprising:
acquiring a multi-channel short video recall file to be sorted;
generating a characteristic array of a plurality of paths of the short video recall files;
inputting the plurality of paths of feature arrays into a trained network model, and correspondingly outputting recall scores of the plurality of paths of short video recall files;
and selecting the short video recall files corresponding to the recall scores meeting the preset sorting conditions as sorting results of the multiple short video recall files.
2. The method of claim 1, wherein the generating a feature array of the plurality of short video recall files comprises:
acquiring characteristic data of a plurality of paths of video recall files in each dimension;
and compressing and storing the feature data to obtain the feature array.
3. The method of claim 2, wherein the obtaining feature data of the plurality of video recall files in each dimension comprises:
acquiring one of the following characteristics of a plurality of paths of the video recall files in the document dimension: quality, freshness, user characteristics; and/or,
acquiring query category characteristics of a plurality of paths of video recall files in a query dimension; and/or,
acquiring one of the following characteristics of the multi-channel video recall file in the dimensions of query and document: click rate characteristics, viewing duration characteristics, and presentation characteristics.
4. The method of claim 2, wherein the compressing the feature data to obtain the feature array comprises:
and storing the characteristic data into a sparse matrix in a compressed sparse row format to obtain the characteristic array.
5. The method according to any one of claims 1 to 4, wherein the selecting a short video recall file corresponding to the recall score meeting a preset ranking condition as a ranking result of a plurality of short video recall files comprises:
according to the recall scores, performing descending order arrangement on the multiple short video recall files;
and taking the short video recall files of the preset number at the front after descending order as the ordering result.
6. A method for training a network model, comprising:
acquiring a plurality of paths of short video recall sample files;
adding corresponding sample characteristics to the multi-path short video recall sample file;
and training a network model according to the multipath short video recall sample file and the corresponding sample characteristics.
7. The method of claim 6, wherein obtaining the plurality of short video recall sample files comprises:
and acquiring multiple paths of short video recall positive sample files and multiple paths of short video recall negative sample files according to the watching duration and/or the display click condition.
8. The method of claim 6, wherein adding corresponding sample features for the plurality of short video recall sample files comprises:
for the plurality of short video recall sample files, adding one of the following sample features: a recall source information sample feature, a recall score sample feature, a recall file sample feature, a user display click sample feature, and a user behavior sample feature.
9. An apparatus for ordering short video searches, comprising:
the file acquisition module is used for acquiring the multi-channel short video recall files to be sorted;
the characteristic generating module is used for generating a characteristic array of the multipath short video recall file;
the score output module is used for inputting the plurality of paths of feature arrays to a trained network model and correspondingly outputting the recall scores of the plurality of paths of short video recall files;
and the file selection module is used for selecting the short video recall files corresponding to the recall scores meeting the preset sorting conditions as the sorting results of the multiple short video recall files.
10. An apparatus for training a network model, comprising:
the sample acquisition module is used for acquiring a plurality of paths of short video recall sample files;
the characteristic adding module is used for adding corresponding sample characteristics for the multi-path short video recall sample file;
and the model training module is used for training the network model according to the multipath short video recall sample file and the corresponding sample characteristics.
11. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method for ranking short video searches of any of claims 1 to 5 or the method for training a network model of any of claims 6 to 8 when executing a program stored in a memory.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of ranking short video searches of any of claims 1 to 5 or the method of training a network model of any of claims 6 to 8.
CN202110916738.2A 2021-08-11 2021-08-11 Short video ordering and model training method and device, electronic equipment and storage medium Active CN113722537B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110916738.2A CN113722537B (en) 2021-08-11 2021-08-11 Short video ordering and model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110916738.2A CN113722537B (en) 2021-08-11 2021-08-11 Short video ordering and model training method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113722537A (en) 2021-11-30
CN113722537B (en) 2024-04-26

Family

ID=78675471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110916738.2A Active CN113722537B (en) 2021-08-11 2021-08-11 Short video ordering and model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113722537B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090313236A1 (en) * 2008-06-13 2009-12-17 News Distribution Network, Inc. Searching, sorting, and displaying video clips and sound files by relevance
US20130297638A1 (en) * 2012-05-07 2013-11-07 Pixability, Inc. Methods and systems for identifying distribution opportunities
US20160041998A1 (en) * 2014-08-05 2016-02-11 NFL Enterprises LLC Apparatus and Methods for Personalized Video Delivery
CN108388583A (en) * 2018-01-26 2018-08-10 北京览科技有限公司 A kind of video searching method and video searching apparatus based on video content
US20190250998A1 (en) * 2018-02-14 2019-08-15 Commvault Systems, Inc. Machine-learning based data object retrieval
CN111046224A (en) * 2019-12-02 2020-04-21 上海麦克风文化传媒有限公司 Real-time recall method for audio products
CN111079022A (en) * 2019-12-20 2020-04-28 深圳前海微众银行股份有限公司 Personalized recommendation method, device, equipment and medium based on federal learning
CN111931055A (en) * 2020-08-14 2020-11-13 工银科技有限公司 Object recommendation method, object recommendation device and electronic equipment
CN112328909A (en) * 2020-11-17 2021-02-05 中国平安人寿保险股份有限公司 Information recommendation method and device, computer equipment and medium
CN112507216A (en) * 2020-12-01 2021-03-16 北京奇艺世纪科技有限公司 Data object recommendation method, device, equipment and storage medium
CN112785397A (en) * 2021-03-09 2021-05-11 中国工商银行股份有限公司 Product recommendation method, device and storage medium
CN112801760A (en) * 2021-03-30 2021-05-14 南京蓝鲸人网络科技有限公司 Sequencing optimization method and system of content personalized recommendation system
KR20210090273A (en) * 2020-05-27 2021-07-19 바이두 온라인 네트웍 테크놀러지 (베이징) 캄파니 리미티드 Voice packet recommendation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113722537B (en) 2024-04-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant