CN113254714B - Video feedback method, device, equipment and medium based on query analysis - Google Patents


Info

Publication number
CN113254714B
Authority
CN
China
Prior art keywords
query
video
vector
target
processed
Prior art date
Legal status
Active
Application number
CN202110686106.1A
Other languages
Chinese (zh)
Other versions
CN113254714A (en)
Inventor
陈宇
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110686106.1A
Publication of CN113254714A
Application granted
Publication of CN113254714B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/70 — Information retrieval of video data
    • G06F 16/75 — Clustering; classification
    • G06F 16/78 — Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783 — Retrieval using metadata automatically derived from the content
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/24 — Classification techniques
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 — Classification based on parametric or probabilistic models, e.g. likelihood ratio or false acceptance rate versus false rejection rate
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks
    • G06N 3/047 — Probabilistic or stochastic networks
    • G06N 3/08 — Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of artificial intelligence and provides a video feedback method, device, equipment and medium based on query analysis. By analyzing a query log, a target query with higher user attention and the video segments associated with it can be obtained, so that the trained model better matches the user's points of interest. The trained video content analysis model analyzes and converts the content of an input video segment and outputs a vector representation that reflects the segment's main content. A target classification model is trained on spliced vectors that fuse the vector representation of each target query with the vector representations of its associated video segments, which effectively improves the relevance between a question and the fed-back video content and improves the classification performance of the model. A target video is then obtained, realizing accurate video feedback by means of artificial intelligence. The invention also relates to blockchain technology; the trained models can be stored on blockchain nodes.

Description

Video feedback method, device, equipment and medium based on query analysis
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a video feedback method, a device, equipment and a medium based on query analysis.
Background
Existing applications of video content analysis technology are mainly entertainment-related, for example generating sports highlight reels or assembling variety-show videos; video assembly in the field of professional training remains relatively underdeveloped.
In addition, when a training platform feeds back videos for a question raised by a user, it mainly relies on the video topic, and the video content is not related to the question the user entered. As a result, the fed-back videos are not detailed enough, their accuracy is low, and they cannot match the user's needs.
Disclosure of Invention
Embodiments of the invention provide a video feedback method, device, equipment and medium based on query analysis. By analyzing a query log, a target query with higher user attention and the video segments associated with it can be obtained for use in subsequent model training, so that the trained model better matches the user's points of interest and the main content of each video segment can be reflected. The target classification model is trained on spliced vectors obtained by fusing the vector representation of each target query with the vector representations of its associated video segments, which effectively improves the relevance between a question and the fed-back video content and improves the classification performance of the model, realizing accurate video feedback by means of artificial intelligence.
In a first aspect, an embodiment of the present invention provides a video feedback method based on query analysis, including:
acquiring a query log generated within a preset time length, acquiring browsing information from the query log, and analyzing the browsing information to obtain a target query and a video segment associated with the target query;
performing visual feature extraction on the video segments associated with the target query to obtain a vector representation of each video segment in the video segments associated with the target query;
determining the video segments associated with the target query as input, determining the vector representation of each video segment as output, and training a preset neural network model to obtain a video content analysis model;
converting the target query into query vectors, and splicing each query vector in the query vectors and the corresponding vector representation of each video segment to obtain a spliced vector;
training a preset classification model by using the splicing vector to obtain a target classification model;
when a query to be processed and a feedback video of the query to be processed are received, converting the query to be processed into a first vector, and inputting the feedback video into the video content analysis model to obtain at least one second vector;
splicing the first vector and each second vector in the at least one second vector to obtain at least one third vector, and inputting the at least one third vector to the target classification model to obtain at least one classification result and a first probability of each classification result;
calculating the similarity between the first vector and each second vector to obtain at least one second similarity;
and calculating the correlation between the query to be processed and the feedback video of the query to be processed according to the first probability of each classification result and the at least one second similarity, and determining the target video corresponding to the query to be processed according to the correlation.
According to a preferred embodiment of the present invention, the analyzing the browsing information to obtain a target query and a video segment associated with the target query includes:
obtaining the query input by a user from the browsing information, calculating the query frequency of each query according to the query input by the user, and obtaining the query with the query frequency greater than or equal to the configuration frequency as the target query; and/or
Acquiring the click rate of each video segment from the browsing information, and determining the query corresponding to the video segment with the click rate greater than or equal to the configured click rate as the target query; and/or
Acquiring the playing time length of each video segment from the browsing information, and determining the query corresponding to the video segment with the playing time length being greater than or equal to the configured time length as the target query;
and connecting a specified video library, and acquiring a video segment associated with the target query from the specified video library.
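The and/or selection rules above can be sketched as follows. This is a minimal sketch: the record fields (`query`, `clicks`, `play_seconds`) and the threshold values are illustrative assumptions standing in for the patent's "configuration frequency", "configured click rate" and "configured duration".

```python
from collections import Counter

# Hypothetical thresholds; the patent leaves these user-configurable.
CONFIG_FREQUENCY = 50     # minimum query frequency
CONFIG_CLICKS = 100       # minimum click count per video segment
CONFIG_DURATION_S = 60.0  # minimum playing duration in seconds

def mine_target_queries(browse_records):
    """Select high-attention target queries from browsing records.

    Each record is a dict with hypothetical keys 'query', 'clicks'
    and 'play_seconds'. A query becomes a target query if ANY of the
    three conditions holds (the patent's and/or combination).
    """
    freq = Counter(r["query"] for r in browse_records)
    targets = set()
    for r in browse_records:
        if (freq[r["query"]] >= CONFIG_FREQUENCY
                or r["clicks"] >= CONFIG_CLICKS
                or r["play_seconds"] >= CONFIG_DURATION_S):
            targets.add(r["query"])
    return targets
```

The associated video segments would then be fetched from the specified video library for each query in the returned set.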
According to the preferred embodiment of the present invention, the extracting the visual features of the video segments associated with the target query to obtain the vector representation of each video segment in the video segments associated with the target query includes:
calling an Inception-ResNet v2 model as a pre-training model;
and inputting the video segments associated with the target query into the pre-training model, and acquiring a vector output by the pre-training model as a vector representation of each video segment in the video segments associated with the target query.
According to the preferred embodiment of the present invention, the converting the target query into query vectors, and splicing each query vector in the query vectors and the corresponding vector representation of each video segment to obtain a spliced vector includes:
converting the target query into a query vector by adopting a word2vec algorithm;
acquiring a vector representation of a video segment associated with each target query, and acquiring a query vector of each target query;
and transversely splicing the vector representation of the video segment associated with each target query and the query vector of each target query to obtain the spliced vector.
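The transverse splicing step amounts to horizontal concatenation; a minimal sketch, with plain Python lists standing in for real embedding vectors:

```python
def splice_vectors(query_vec, video_vec):
    """Transverse (horizontal) splicing: append the video-segment
    vector representation to the query vector, yielding one fused
    spliced vector for the classifier."""
    return list(query_vec) + list(video_vec)

# Toy example: a 2-d query vector spliced with a 3-d segment vector
# gives a 5-d spliced vector.
spliced = splice_vectors([0.1, 0.2], [0.3, 0.4, 0.5])
```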
According to the preferred embodiment of the present invention, the training of the preset classification model by using the stitching vector to obtain the target classification model includes:
dividing the splicing vector according to a preset proportion to obtain a training sample and a verification sample;
training a linear classifier by using the training sample until the linear classifier reaches convergence, and stopping training;
verifying the trained model by using the verification sample;
and when the model obtained by training passes the verification, stopping training to obtain the target classification model.
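The four steps above can be sketched with a hand-rolled logistic-regression classifier (the patent permits logistic regression or softmax as the linear classifier). The 8:2 split, learning rate and epoch count are illustrative assumptions, and real spliced vectors would be far higher-dimensional:

```python
import math

def train_linear_classifier(samples, labels, split=0.8, lr=0.5, epochs=200):
    """Split spliced vectors into training/validation samples, fit a
    logistic-regression classifier by gradient descent, and report
    validation accuracy as the verification step."""
    n_train = int(len(samples) * split)
    x_tr, y_tr = samples[:n_train], labels[:n_train]
    x_va, y_va = samples[n_train:], labels[n_train:]
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(x_tr, y_tr):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            g = p - y                         # gradient of log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    # Verification: accuracy on the held-out validation sample.
    correct = sum(
        (1 if (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0 else 0) == y
        for x, y in zip(x_va, y_va))
    acc = correct / max(len(x_va), 1)
    return (w, b), acc
```

In practice one would stop training once the classifier converges and accept the model only when validation passes, as the steps above describe.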
According to a preferred embodiment of the present invention, the calculating the correlation between the query to be processed and the feedback video of the query to be processed according to the first probability of each classification result and the at least one second similarity includes:
determining a first weight of the first probability and determining a second weight of the second similarity;
calculating a product between the first probability and the first weight as a first product;
calculating a product between each second similarity in the at least one second similarity and the second weight as a second product corresponding to each second similarity;
and calculating the sum of the first product and each corresponding second product as the correlation between the query to be processed and the feedback video of the query to be processed.
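The computation above is a weighted sum per second similarity; a minimal sketch, where the weight values are illustrative assumptions (the patent only requires that a first and second weight be determined):

```python
def relevance(first_prob, second_sims, w1=0.6, w2=0.4):
    """Correlation between the query to be processed and its feedback
    video: for each second similarity, the sum of (first probability x
    first weight) and (second similarity x second weight). The weights
    w1 and w2 are hypothetical values."""
    return [w1 * first_prob + w2 * s for s in second_sims]
```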
According to the preferred embodiment of the present invention, the determining the target video corresponding to the query to be processed according to the correlation includes:
sequencing the feedback videos according to the sequence of the correlation degrees from high to low, and acquiring the feedback videos arranged at the front preset positions as target videos corresponding to the query to be processed; and/or
And acquiring the feedback video with the correlation degree larger than or equal to the configuration correlation degree as the target video corresponding to the query to be processed.
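Both selection strategies above (rank-then-take-top and threshold filtering) can be sketched together; `top_k` and `min_rel` are hypothetical stand-ins for the patent's "front preset positions" and "configuration correlation degree":

```python
def select_target_videos(videos, relevances, top_k=3, min_rel=None):
    """Rank feedback videos by correlation in descending order, keep
    the top_k, and optionally also require relevance >= min_rel."""
    ranked = sorted(zip(videos, relevances), key=lambda p: p[1], reverse=True)
    picked = ranked[:top_k]
    if min_rel is not None:
        picked = [(v, r) for v, r in picked if r >= min_rel]
    return [v for v, _ in picked]
```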
In a second aspect, an embodiment of the present invention provides a video feedback apparatus based on query analysis, including:
the analysis unit is used for acquiring a query log generated within a preset time length, acquiring browsing information from the query log, and analyzing the browsing information to obtain a target query and a video segment associated with the target query;
the extracting unit is used for extracting visual features of the video segments associated with the target query to obtain a vector representation of each video segment in the video segments associated with the target query;
the training unit is used for determining the video segments associated with the target query as input, determining the vector characteristics of each video segment as output, and training a preset neural network model to obtain a video content analysis model;
the splicing unit is used for converting the target query into query vectors, and splicing each query vector in the query vectors and the corresponding vector representation of each video segment to obtain a spliced vector;
the training unit is also used for training a preset classification model by using the splicing vector to obtain a target classification model;
the conversion unit is used for converting the query to be processed into a first vector when the query to be processed and the feedback video of the query to be processed are received, and inputting the feedback video into the video content analysis model to obtain at least one second vector;
the classification unit is used for splicing the first vector and each second vector in the at least one second vector to obtain at least one third vector, and inputting the at least one third vector into the target classification model to obtain at least one classification result and a first probability of each classification result;
the calculating unit is used for calculating the similarity between the first vector and each second vector to obtain at least one second similarity;
and the determining unit is used for calculating the correlation between the query to be processed and the feedback video of the query to be processed according to the first probability of each classification result and the at least one second similarity, and determining the target video corresponding to the query to be processed according to the correlation.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the query analysis-based video feedback method described in the first aspect.
In a fourth aspect, the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the query analysis-based video feedback method according to the first aspect.
Embodiments of the invention provide a video feedback method, device, equipment and medium based on query analysis. A query log generated within a preset time period is acquired, browsing information is obtained from the query log, and the browsing information is analyzed to obtain a target query and the video segments associated with it. By analyzing the query log, target queries with higher user attention and their associated video segments can be obtained for subsequent model training, so that the trained model better matches the user's points of interest. Visual features are then extracted from the video segments associated with each target query to obtain a vector representation of each segment. Taking the video segments as input and the vector representation of each segment as output, a preset neural network model is trained to obtain a video content analysis model; the trained model analyzes and converts the content of an input video segment and outputs a vector representation that reflects the segment's main content. Each target query is converted into a query vector, which is spliced with the vector representation of each corresponding video segment to obtain spliced vectors, and a preset classification model is trained on these spliced vectors to obtain a target classification model. Because the spliced vectors fuse the representation of the target query with the representations of its associated video segments, the relevance between a question and the fed-back video content is effectively improved and the classification performance of the model is better.
When a query to be processed and its feedback videos are received, the query is converted into a first vector and the feedback videos are input into the video content analysis model to obtain at least one second vector. The first vector is spliced with each second vector to obtain at least one third vector, which is input into the target classification model to obtain at least one classification result and a first probability for each classification result. The similarity between the first vector and each second vector is calculated to obtain at least one second similarity. The correlation between the query to be processed and each feedback video is then calculated from the first probability of each classification result and the at least one second similarity, and the target video corresponding to the query is determined according to this correlation, realizing accurate video feedback by means of artificial intelligence.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a video feedback method based on query analysis according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a query analysis-based video feedback device according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Please refer to fig. 1, which is a flowchart illustrating a video feedback method based on query analysis according to an embodiment of the present invention.
S10, obtaining a query log generated within a preset time length, obtaining browsing information from the query log, and analyzing the browsing information to obtain a target query and a video segment associated with the target query.
The preset duration can be configured in a user-defined manner, such as one month.
In this embodiment, the query log is a log generated by the system, and the query log stores a query of a user, a click condition of the user on an answer, and the like.
It can be understood that the attention degree of the user to the video can be analyzed through the browsing information in the query log.
In at least one embodiment of the present invention, the browsing information may include, but is not limited to, one or a combination of the following:
whether the fed-back video is clicked, the dwell time on the page, the number of clicks on the fed-back video, and the query input by the user.
In at least one embodiment of the present invention, the analyzing the browsing information to obtain a target query and a video segment associated with the target query includes:
obtaining the query input by a user from the browsing information, calculating the query frequency of each query according to the query input by the user, and obtaining the query with the query frequency greater than or equal to the configuration frequency as the target query; and/or
Acquiring the click rate of each video segment from the browsing information, and determining the query corresponding to the video segment with the click rate greater than or equal to the configured click rate as the target query; and/or
Acquiring the playing time length of each video segment from the browsing information, and determining the query corresponding to the video segment with the playing time length being greater than or equal to the configured time length as the target query;
and connecting a specified video library, and acquiring a video segment associated with the target query from the specified video library.
The configuration frequency, the configuration click rate and the configuration duration may be configured by user-defined, which is not limited in the present invention.
In addition, in other embodiments, depending on the data volume, the target query may also be determined by ranking; for example, the queries corresponding to the 100 video segments with the highest click counts may be determined as target queries, which the present invention does not limit.
In this embodiment, the designated video library refers to a video library containing abundant video resources, and the data volume in the designated video library is sufficient, so that the requirement of subsequent model training can be met.
By the embodiment, the target query with higher attention of the user and the video segment associated with the target query can be obtained by analyzing the query log so as to be used by a subsequent training model, and the model obtained by training can better accord with the attention point of the user.
And S11, performing visual feature extraction on the video segments associated with the target query to obtain the vector representation of each video segment in the video segments associated with the target query.
In at least one embodiment of the present invention, the performing visual feature extraction on the video segments associated with the target query to obtain a vector representation of each video segment in the video segments associated with the target query includes:
calling an Inception-ResNet v2 model as a pre-training model;
and inputting the video segments associated with the target query into the pre-training model, and acquiring a vector output by the pre-training model as a vector representation of each video segment in the video segments associated with the target query.
In this embodiment, Inception-ResNet v2 is used as the model for visual feature extraction, which increases the depth of the network and improves the generalization of the visual features.
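A sketch of the per-segment extraction: sample frames, run each through a pre-trained backbone, and pool the per-frame features into one vector representation. In a real implementation the backbone might be something like `tf.keras.applications.InceptionResNetV2(include_top=False, pooling='avg')`; here it is an arbitrary callable so the aggregation logic stands alone, and the mean-pooling choice is an assumption:

```python
def video_segment_vector(frames, backbone):
    """Vector representation of a video segment: apply a pre-trained
    CNN backbone (e.g. Inception-ResNet v2) to each sampled frame and
    average the per-frame feature vectors. `backbone` is any callable
    mapping one frame to a feature vector (a list of floats)."""
    feats = [backbone(f) for f in frames]
    dim = len(feats[0])
    return [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
```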
And S12, determining the video segments associated with the target query as input, determining the vector representation of each video segment as output, and training a preset neural network model to obtain a video content analysis model.
In this embodiment, the preset Neural network model may be a Convolutional Neural Network (CNN), which is not limited in the present invention.
The trained video content analysis model can analyze and convert the content of the input video segment, and then the vector representation of the content of the video segment is output.
That is, the output vector representation is the subject vector of the video content, and can reflect the main content of the video segment.
And S13, converting the target query into query vectors, and splicing each query vector in the query vectors and the corresponding vector representation of each video segment to obtain a spliced vector.
In at least one embodiment of the present invention, the converting the target query into query vectors, and splicing each query vector in the query vectors and the vector representation of each corresponding video segment to obtain a spliced vector includes:
converting the target query into a query vector by adopting a word2vec algorithm;
acquiring a vector representation of a video segment associated with each target query, and acquiring a query vector of each target query;
and transversely splicing the vector representation of the video segment associated with each target query and the query vector of each target query to obtain the spliced vector.
Through this implementation, the vector representation of the target query (i.e. the query vector) and the vector representation of the video segment associated with each target query are feature-fused; that is, the query and the video content are fused for use in subsequent model training.
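The word2vec conversion in S13 can be sketched as follows. The patent only names the word2vec algorithm; averaging per-word embeddings into one query vector is a common but assumed choice, and in practice the embeddings would come from a trained model such as gensim's `Word2Vec` rather than a hand-built dict:

```python
def query_to_vector(query, word_vectors, dim):
    """Convert a target query into a query vector by averaging the
    word2vec embeddings of its tokens. `word_vectors` maps token ->
    embedding (list of floats of length `dim`); unknown tokens are
    skipped, and an all-unknown query maps to the zero vector."""
    tokens = [t for t in query.lower().split() if t in word_vectors]
    if not tokens:
        return [0.0] * dim
    return [sum(word_vectors[t][i] for t in tokens) / len(tokens)
            for i in range(dim)]
```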
And S14, training a preset classification model by using the splicing vector to obtain a target classification model.
In at least one embodiment of the present invention, the training a preset classification model by using the stitching vector to obtain a target classification model includes:
dividing the splicing vector according to a preset proportion to obtain a training sample and a verification sample;
training a linear classifier by using the training sample until the linear classifier reaches convergence, and stopping training;
verifying the trained model by using the verification sample;
and when the model obtained by training passes the verification, stopping training to obtain the target classification model.
The preset proportion can be configured by the user, such as 8:2.
The linear classifier may include, but is not limited to, a logistic regression classifier or a softmax classifier.
Through this implementation, the classification model is obtained by training a linear classifier; linear classifiers are algorithmically simple yet capable of learning, perform well on simple binary and multi-class classification problems, and run fast.
Meanwhile, the target classification model is trained by adopting the splicing vector after the vector representation of the target query and the vector representation of the video segment associated with each target query are fused, so that the relevance between the problem and the feedback video content can be effectively improved, and the classification effect of the model is better.
S15, when a query to be processed and a feedback video of the query to be processed are received, converting the query to be processed into a first vector, and inputting the feedback video into the video content analysis model to obtain at least one second vector.
In this embodiment, the query to be processed refers to a query to be queried input by a user, and the feedback video of the query to be processed refers to a result of preliminary feedback performed by a query platform (such as a training platform) where the user is located according to the query input by the user, where the result is invisible to the user and is only used for subsequent further data analysis.
Moreover, the feedback videos of the query to be processed are comprehensive in coverage, providing sufficient data for subsequent analysis and avoiding inaccurate feedback videos.
In this embodiment, the query to be processed may be converted into the first vector by using a word2vec algorithm, which is not limited in the present invention.
In this embodiment, the feedback video is input to the video content analysis model, so that a vector representation of the video content included in the feedback video, that is, the at least one second vector, can be obtained.
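A minimal sketch of the conversion in S15, assuming a toy four-word vocabulary in place of trained word2vec embeddings; averaging the word vectors is one common way to turn word2vec output into a single query vector:

```python
import numpy as np

# Toy embedding table standing in for trained word2vec vectors.
rng = np.random.default_rng(1)
vocab = {"how": 0, "to": 1, "reset": 2, "password": 3}
embeddings = rng.normal(size=(len(vocab), 4))

def query_to_vector(query: str) -> np.ndarray:
    """Average the word vectors of the query's known tokens."""
    ids = [vocab[word] for word in query.lower().split() if word in vocab]
    return embeddings[ids].mean(axis=0)

first_vector = query_to_vector("How to reset password")
```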
S16, splicing the first vector and each second vector in the at least one second vector to obtain at least one third vector, inputting the at least one third vector to the target classification model to obtain at least one classification result and a first probability of each classification result.
In this embodiment, the splicing the first vector and each of the at least one second vector to obtain at least one third vector includes:
and transversely splicing the first vector and each second vector in the at least one second vector to obtain at least one third vector.
And further inputting the at least one third vector to the target classification model, and acquiring the output of the target classification model aiming at the at least one third vector as the at least one classification result and the first probability of each classification result.
For example, the classification result may take the form 0 or 1, where 0 represents a mismatch between the corresponding first vector and second vector, that is, a mismatch between the input query to be processed and the corresponding feedback video; the probability output with this classification result represents the probability that the input query to be processed and the corresponding feedback video do not match.
Similarly, 1 represents a match between the corresponding first vector and second vector, that is, a match between the input query to be processed and the corresponding feedback video; the probability output with this classification result represents the probability that they match.
The first probability represents the similarity between the query to be processed and the feedback video in the video content dimension.
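The splicing and classification of S16 can be sketched as follows; the fixed weight vector standing in for the trained target classification model is an assumption made to keep the example self-contained:

```python
import numpy as np

rng = np.random.default_rng(2)
first_vector = rng.normal(size=8)          # the query to be processed
second_vectors = rng.normal(size=(3, 8))   # one per candidate feedback video

# Transverse (horizontal) splicing: one third vector per second vector.
third_vectors = np.concatenate(
    [np.tile(first_vector, (len(second_vectors), 1)), second_vectors], axis=1
)

# First probability per candidate: sigmoid of the stand-in classifier's score.
w = rng.normal(size=third_vectors.shape[1])
first_probabilities = 1.0 / (1.0 + np.exp(-third_vectors @ w))
```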
S17, calculating the similarity between the first vector and each second vector to obtain at least one second similarity.
In this embodiment, the cosine similarity between the first vector and each second vector may be calculated to obtain the at least one second similarity, which is not limited in the present invention.
The second similarity represents the similarity between the input query to be processed and the corresponding feedback video in the text dimension.
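A minimal sketch of the cosine-similarity computation of S17:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

first_vector = np.array([1.0, 0.0, 1.0])
second_vectors = [np.array([2.0, 0.0, 2.0]), np.array([0.0, 1.0, 0.0])]
second_similarities = [cosine_similarity(first_vector, v) for v in second_vectors]
# parallel vectors score 1.0, orthogonal vectors score 0.0
```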
S18, calculating the correlation between the query to be processed and the feedback video of the query to be processed according to the first probability of each classification result and the at least one second similarity, and determining the target video corresponding to the query to be processed according to the correlation.
In at least one embodiment of the present invention, the calculating the correlation between the query to be processed and the feedback video of the query to be processed according to the first probability of each classification result and the at least one second similarity includes:
determining a first weight of the first probability and determining a second weight of the second similarity;
calculating a product between the first probability and the first weight as a first product;
calculating a product between each second similarity in the at least one second similarity and the second weight as a second product corresponding to each second similarity;
and calculating the sum of the first product and each corresponding second product as the correlation between the query to be processed and the feedback video of the query to be processed.
The first weight and the second weight can be custom-configured according to actual conditions, for example, a first weight of 0.5 and a second weight of 0.5.
In the above embodiment, the relevance between the query to be processed and its feedback videos is determined by weighting, so that the final relevance reflects both the text-dimension similarity and the video-content-dimension similarity; the fed-back videos are therefore strongly relevant to the query in multiple dimensions, further improving the accuracy of the feedback.
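The weighted combination can be sketched directly, using the example weights of 0.5 from the text:

```python
def relevance(first_probability: float, second_similarity: float,
              first_weight: float = 0.5, second_weight: float = 0.5) -> float:
    """Sum of the first product and the second product defined above."""
    return first_weight * first_probability + second_weight * second_similarity

score = relevance(0.9, 0.8)  # strong match in both dimensions
```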
In at least one embodiment of the present invention, the determining, according to the correlation, a target video corresponding to the query to be processed includes:
ranking the feedback videos in descending order of relevance, and taking the feedback videos in the first preset positions as the target videos corresponding to the query to be processed; and/or
taking the feedback videos whose relevance is greater than or equal to a configured relevance as the target videos corresponding to the query to be processed.
The configured relevance can be custom-configured, for example 95%.
The above embodiment determines the target videos by combining the relevance ranking with the relevance value itself, avoiding the limited-feedback problem of a single mode and making the fed-back target videos more reasonable.
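A sketch of combining both selection modes, in which a feedback video becomes a target if it ranks in the first preset positions or its relevance meets the configured relevance; the function and parameter names are illustrative, not from the patent:

```python
def select_target_videos(videos, relevances, top_k=3, min_relevance=0.95):
    """Union of the top-k-by-relevance videos and the above-threshold videos."""
    ranked = sorted(zip(videos, relevances), key=lambda pair: pair[1], reverse=True)
    by_rank = {video for video, _ in ranked[:top_k]}
    by_threshold = {video for video, rel in ranked if rel >= min_relevance}
    keep = by_rank | by_threshold
    return [video for video, _ in ranked if video in keep]

targets = select_target_videos(["a", "b", "c", "d"], [0.99, 0.50, 0.97, 0.40], top_k=2)
# → ["a", "c"]
```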
In at least one embodiment of the present invention, after the target video corresponding to the query to be processed is determined according to the relevance, the target video is fed back to a display interface of a related device (such as user equipment) for a user to view in time.
It should be noted that, in order to further ensure the security of the data and avoid malicious tampering of the data, the trained model may be stored on the blockchain node.
According to this technical scheme, the method acquires the query log generated within a preset duration, acquires browsing information from the query log, and analyzes the browsing information to obtain target queries and the video segments associated with them; by analyzing the query log, the target queries that users pay the most attention to, together with their associated video segments, are obtained for use by the subsequent training model, so that the trained model better matches users' points of interest. Visual features are extracted from the video segments associated with the target queries to obtain a vector representation of each such segment; the video segments are then taken as input and their vector representations as output to train a preset neural network model into a video content analysis model, which can analyze and convert the content of an input video segment and output a vector representation reflecting its main content. The target queries are converted into query vectors, each query vector is spliced with the vector representation of each corresponding video segment to obtain spliced vectors, and a preset classification model is trained with the spliced vectors to obtain a target classification model; training on spliced vectors that fuse the query representation with the video-segment representation effectively improves the relevance between the query and the fed-back video content, so the model classifies better. When a query to be processed and its feedback videos are received, the query is converted into a first vector and the feedback videos are input to the video content analysis model to obtain at least one second vector; the first vector is spliced with each second vector to obtain at least one third vector, which is input to the target classification model to obtain at least one classification result and the first probability of each classification result; the similarity between the first vector and each second vector is calculated to obtain at least one second similarity; the relevance between the query to be processed and its feedback videos is calculated from the first probability of each classification result and the at least one second similarity; and the target videos corresponding to the query to be processed are determined according to the relevance, thereby realizing accurate video feedback by means of artificial intelligence.
The embodiment of the invention also provides a query analysis-based video feedback device, which is used for executing any embodiment of the query analysis-based video feedback method. Specifically, please refer to fig. 2, fig. 2 is a schematic block diagram of a query analysis-based video feedback apparatus according to an embodiment of the present invention.
As shown in fig. 2, the video feedback apparatus 100 based on query analysis includes: the device comprises an analysis unit 101, an extraction unit 102, a training unit 103, a splicing unit 104, a conversion unit 105, a classification unit 106, a calculation unit 107 and a determination unit 108.
The analysis unit 101 acquires a query log generated within a preset time length, acquires browsing information from the query log, and analyzes the browsing information to obtain a target query and a video segment associated with the target query.
The preset duration can be configured in a user-defined manner, such as one month.
In this embodiment, the query log is a log generated by the system, and the query log stores a query of a user, a click condition of the user on an answer, and the like.
It can be understood that the attention degree of the user to the video can be analyzed through the browsing information in the query log.
In at least one embodiment of the present invention, the browsing information may include, but is not limited to, one or a combination of the following:
whether the fed-back video is clicked, the page dwell time, the number of clicks on the fed-back video, and the query input by the user.
In at least one embodiment of the present invention, the analyzing unit 101, for analyzing the browsing information to obtain a target query and a video segment associated with the target query, includes:
obtaining the queries input by users from the browsing information, calculating the query frequency of each query, and taking the queries whose query frequency is greater than or equal to a configured frequency as target queries; and/or
acquiring the click rate of each video segment from the browsing information, and determining the queries corresponding to the video segments whose click rate is greater than or equal to a configured click rate as target queries; and/or
acquiring the playing duration of each video segment from the browsing information, and determining the queries corresponding to the video segments whose playing duration is greater than or equal to a configured duration as target queries;
and connecting to a specified video library, and acquiring the video segments associated with the target queries from the specified video library.
The configured frequency, configured click rate, and configured duration may be custom-configured, which is not limited by the present invention.
In addition, in other embodiments, depending on the data volume, the target queries may also be determined by ranking; for example, the queries corresponding to the 100 video segments with the highest click rates may be determined as target queries, which is not limited by the present invention.
In this embodiment, the designated video library refers to a video library containing abundant video resources, and the data volume in the designated video library is sufficient, so that the requirement of subsequent model training can be met.
By the embodiment, the target query with higher attention of the user and the video segment associated with the target query can be obtained by analyzing the query log so as to be used by a subsequent training model, and the model obtained by training can better accord with the attention point of the user.
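One branch of this analysis, selecting target queries whose query frequency reaches the configured frequency, can be sketched as follows; the log contents and threshold are illustrative:

```python
from collections import Counter

def select_target_queries(logged_queries, configured_frequency=3):
    """Queries whose frequency meets the configured frequency become target queries."""
    counts = Counter(logged_queries)
    return [query for query, count in counts.items() if count >= configured_frequency]

log = ["reset password"] * 4 + ["change avatar"] * 2 + ["export report"] * 3
target_queries = select_target_queries(log)
# "change avatar" is queried only twice and is filtered out
```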
The extracting unit 102 performs visual feature extraction on the video segments associated with the target query to obtain a vector representation of each video segment in the video segments associated with the target query.
In at least one embodiment of the present invention, the extracting unit 102 performs visual feature extraction on the video segments associated with the target query, and obtaining the vector representation of each video segment in the video segments associated with the target query includes:
calling an Inception-ResNet v2 model as a pre-trained model;
and inputting the video segments associated with the target queries into the pre-trained model, and taking the vectors output by the pre-trained model as the vector representation of each video segment in the video segments associated with the target queries.
In this embodiment, Inception-ResNet v2 is used as the model for visual feature extraction, which allows the depth of the network to be increased and improves the generalization of the visual features.
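A hedged sketch of the extraction step: the random projection below stands in for a pretrained Inception-ResNet v2 backbone, whose pooled output is 1536-dimensional, so that the example runs without the real network weights:

```python
import numpy as np

rng = np.random.default_rng(3)
FEATURE_DIM = 1536  # dimensionality of Inception-ResNet v2's pooled features
projection = rng.normal(size=(3 * 8 * 8, FEATURE_DIM))  # toy 8x8 RGB "frames"

def frame_features(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the backbone's per-frame feature extraction."""
    return frame.reshape(-1) @ projection

def segment_vector(frames: list) -> np.ndarray:
    """Average per-frame features into one vector representation of the segment."""
    return np.mean([frame_features(f) for f in frames], axis=0)

frames = [rng.normal(size=(3, 8, 8)) for _ in range(5)]
vector_representation = segment_vector(frames)
```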
The training unit 103 determines the video segments associated with the target query as input, determines the vector characterization of each video segment as output, and trains a preset neural network model to obtain a video content analysis model.
In this embodiment, the preset Neural network model may be a Convolutional Neural Network (CNN), which is not limited in the present invention.
The trained video content analysis model can analyze and convert the content of the input video segment, and then the vector representation of the content of the video segment is output.
That is, the output vector representation is the subject vector of the video content, and can reflect the main content of the video segment.
The splicing unit 104 converts the target query into query vectors, and splices each query vector in the query vectors and the vector representation of each corresponding video segment to obtain a spliced vector.
In at least one embodiment of the present invention, the splicing unit 104 converting the target query into query vectors and splicing each query vector with the vector representation of each corresponding video segment to obtain the spliced vector includes:
converting the target query into a query vector by adopting a word2vec algorithm;
acquiring a vector representation of a video segment associated with each target query, and acquiring a query vector of each target query;
and transversely splicing the vector representation of the video segment associated with each target query and the query vector of each target query to obtain the spliced vector.
Through this implementation, the vector representation of the target query (namely, the query vector) and the vector representation of the video segment associated with each target query can be feature-fused; that is, fusion between the query and the video content is realized for use by the subsequent training model.
The training unit 103 trains a preset classification model by using the spliced vector to obtain a target classification model.
In at least one embodiment of the present invention, the training unit 103 training the preset classification model by using the spliced vector to obtain the target classification model includes:
dividing the spliced vectors according to a preset proportion to obtain training samples and verification samples;
training a linear classifier with the training samples until the classifier converges;
verifying the trained model with the verification samples;
and when the trained model passes the verification, stopping training to obtain the target classification model.
The preset proportion can be custom-configured, for example 8:2.
The linear classifier may include, but is not limited to, a logistic regression classifier or a softmax classifier.
Through this implementation, the classification model is obtained by training a linear classifier, whose algorithm is simple yet has learning capability, classifies simple binary and multi-class problems effectively, and runs fast.
Meanwhile, because the target classification model is trained on spliced vectors that fuse the vector representation of the target query with the vector representation of each associated video segment, the relevance between the query and the fed-back video content is effectively improved, and the model classifies better.
When a query to be processed and a feedback video of the query to be processed are received, the conversion unit 105 converts the query to be processed into a first vector, and inputs the feedback video to the video content analysis model to obtain at least one second vector.
In this embodiment, the query to be processed refers to a query to be queried input by a user, and the feedback video of the query to be processed refers to a result of preliminary feedback performed by a query platform (such as a training platform) where the user is located according to the query input by the user, where the result is invisible to the user and is only used for subsequent further data analysis.
Moreover, the feedback videos of the query to be processed are comprehensive in coverage, providing sufficient data for subsequent analysis and avoiding inaccurate feedback videos.
In this embodiment, the query to be processed may be converted into the first vector by using a word2vec algorithm, which is not limited in the present invention.
In this embodiment, the feedback video is input to the video content analysis model, so that a vector representation of the video content included in the feedback video, that is, the at least one second vector, can be obtained.
The classification unit 106 splices the first vector and each of the at least one second vector to obtain at least one third vector, and inputs the at least one third vector to the target classification model to obtain at least one classification result and a first probability of each classification result.
In this embodiment, the step of splicing the first vector and each of the at least one second vector by the classification unit 106 to obtain at least one third vector includes:
and transversely splicing the first vector and each second vector in the at least one second vector to obtain at least one third vector.
And further inputting the at least one third vector to the target classification model, and acquiring the output of the target classification model aiming at the at least one third vector as the at least one classification result and the first probability of each classification result.
For example, the classification result may take the form 0 or 1, where 0 represents a mismatch between the corresponding first vector and second vector, that is, a mismatch between the input query to be processed and the corresponding feedback video; the probability output with this classification result represents the probability that the input query to be processed and the corresponding feedback video do not match.
Similarly, 1 represents a match between the corresponding first vector and second vector, that is, a match between the input query to be processed and the corresponding feedback video; the probability output with this classification result represents the probability that they match.
The first probability represents the similarity between the query to be processed and the feedback video in the video content dimension.
The calculating unit 107 calculates the similarity between the first vector and each second vector to obtain at least one second similarity.
In this embodiment, the cosine similarity between the first vector and each second vector may be calculated to obtain the at least one second similarity, which is not limited in the present invention.
The second similarity represents the similarity between the input query to be processed and the corresponding feedback video in the text dimension.
The determining unit 108 calculates a correlation between the query to be processed and the feedback video of the query to be processed according to the first probability of each classification result and the at least one second similarity, and determines a target video corresponding to the query to be processed according to the correlation.
In at least one embodiment of the present invention, the calculating, by the determining unit 108, a correlation between the query to be processed and the feedback video of the query to be processed according to the first probability of each classification result and the at least one second similarity includes:
determining a first weight of the first probability and determining a second weight of the second similarity;
calculating a product between the first probability and the first weight as a first product;
calculating a product between each second similarity in the at least one second similarity and the second weight as a second product corresponding to each second similarity;
and calculating the sum of the first product and each corresponding second product as the correlation between the query to be processed and the feedback video of the query to be processed.
The first weight and the second weight can be custom-configured according to actual conditions, for example, a first weight of 0.5 and a second weight of 0.5.
In the above embodiment, the relevance between the query to be processed and its feedback videos is determined by weighting, so that the final relevance reflects both the text-dimension similarity and the video-content-dimension similarity; the fed-back videos are therefore strongly relevant to the query in multiple dimensions, further improving the accuracy of the feedback.
In at least one embodiment of the present invention, the determining unit 108, according to the correlation, determining the target video corresponding to the query to be processed includes:
ranking the feedback videos in descending order of relevance, and taking the feedback videos in the first preset positions as the target videos corresponding to the query to be processed; and/or
taking the feedback videos whose relevance is greater than or equal to a configured relevance as the target videos corresponding to the query to be processed.
The configured relevance can be custom-configured, for example 95%.
The above embodiment determines the target videos by combining the relevance ranking with the relevance value itself, avoiding the limited-feedback problem of a single mode and making the fed-back target videos more reasonable.
In at least one embodiment of the present invention, after the target video corresponding to the query to be processed is determined according to the relevance, the target video is fed back to a display interface of a related device (such as user equipment) for a user to view in time.
It should be noted that, in order to further ensure the security of the data and avoid malicious tampering of the data, the trained model may be stored on the blockchain node.
According to this technical scheme, the apparatus acquires the query log generated within a preset duration, acquires browsing information from the query log, and analyzes the browsing information to obtain target queries and the video segments associated with them; by analyzing the query log, the target queries that users pay the most attention to, together with their associated video segments, are obtained for use by the subsequent training model, so that the trained model better matches users' points of interest. Visual features are extracted from the video segments associated with the target queries to obtain a vector representation of each such segment; the video segments are then taken as input and their vector representations as output to train a preset neural network model into a video content analysis model, which can analyze and convert the content of an input video segment and output a vector representation reflecting its main content. The target queries are converted into query vectors, each query vector is spliced with the vector representation of each corresponding video segment to obtain spliced vectors, and a preset classification model is trained with the spliced vectors to obtain a target classification model; training on spliced vectors that fuse the query representation with the video-segment representation effectively improves the relevance between the query and the fed-back video content, so the model classifies better. When a query to be processed and its feedback videos are received, the query is converted into a first vector and the feedback videos are input to the video content analysis model to obtain at least one second vector; the first vector is spliced with each second vector to obtain at least one third vector, which is input to the target classification model to obtain at least one classification result and the first probability of each classification result; the similarity between the first vector and each second vector is calculated to obtain at least one second similarity; the relevance between the query to be processed and its feedback videos is calculated from the first probability of each classification result and the at least one second similarity; and the target videos corresponding to the query to be processed are determined according to the relevance, thereby realizing accurate video feedback by means of artificial intelligence.
The above-described query analysis based video feedback apparatus may be implemented in the form of a computer program which can be run on a computer device as shown in fig. 3.
Referring to fig. 3, fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 500 is a server, and the server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 3, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a storage medium 503 and an internal memory 504.
The storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform a video feedback method based on query analysis.
The processor 502 is used to provide computing and control capabilities that support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of the computer program 5032 in the storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be enabled to execute a video feedback method based on query analysis.
The network interface 505 is used for network communication, such as providing transmission of data information. Those skilled in the art will appreciate that the configuration shown in fig. 3 is a block diagram of only a portion of the configuration associated with aspects of the present invention and is not intended to limit the computing device 500 to which aspects of the present invention may be applied, and that a particular computing device 500 may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory, so as to implement the query analysis-based video feedback method disclosed in the embodiment of the present invention.
Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 3 does not constitute a limitation on the specific construction of the computer device, and in other embodiments a computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may only include a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 3, and are not described herein again.
It should be understood that, in the embodiment of the present invention, the processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer-readable storage medium may be a nonvolatile computer-readable storage medium or a volatile computer-readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the query analysis-based video feedback method disclosed in the embodiments of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatuses, devices and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described again here. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both; to clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in an actual implementation; units having the same function may be grouped into one unit; a plurality of units or components may be combined or integrated into another system; and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A video feedback method based on query analysis is characterized by comprising the following steps:
acquiring a query log generated within a preset time length, acquiring browsing information from the query log, and analyzing the browsing information to obtain a target query and a video segment associated with the target query;
performing visual feature extraction on the video segments associated with the target query to obtain a vector representation of each video segment in the video segments associated with the target query;
determining video segments associated with the target query as input, determining a vector characterization of each video segment as output, and training a preset neural network model to obtain a video content analysis model, wherein the video content analysis model is used for analyzing and converting the content of the input video segments and outputting the vector characterization of the content of the input video segments;
associating the target query with the content of each video segment, comprising: converting the target query into query vectors, and splicing each query vector in the query vectors and the corresponding vector representation of each video segment to obtain a spliced vector;
training a preset classification model by using the splicing vector to obtain a target classification model;
when a query to be processed and a feedback video of the query to be processed are received, converting the query to be processed into a first vector, and inputting the feedback video into the video content analysis model to obtain at least one second vector;
splicing the first vector and each second vector in the at least one second vector to obtain at least one third vector, inputting the at least one third vector to the target classification model to obtain at least one classification result and a first probability of each classification result, wherein the first probability represents the similarity between the query to be processed and the feedback video of the query to be processed in the video content dimension;
calculating the similarity between the first vector and each second vector to obtain at least one second similarity, wherein the second similarity represents the similarity between the query to be processed and the feedback video of the query to be processed in the text dimension;
calculating the correlation between the query to be processed and the feedback video of the query to be processed according to the first probability of each classification result and the at least one second similarity, comprising: determining a first weight of the first probability and determining a second weight of the second similarity; calculating a product between the first probability and the first weight as a first product; calculating a product between each second similarity in the at least one second similarity and the second weight as a second product corresponding to each second similarity; calculating the sum of the first product and each corresponding second product as the correlation between the query to be processed and the feedback video of the query to be processed;
and determining the target video corresponding to the query to be processed according to the correlation.
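For illustration only (this is not part of the claim language), the weighted correlation computation of claim 1 can be sketched as follows. The weight values 0.6 and 0.4, the probability, and the similarity values are assumptions, since the claim does not fix them:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Text-dimension similarity between the first vector (query)
    and a second vector (feedback video)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def correlation(first_probability: float,
                second_similarities: list,
                first_weight: float = 0.6,
                second_weight: float = 0.4) -> list:
    """Per the claim: the correlation is the sum of the weighted first
    probability (content dimension) and each weighted second similarity
    (text dimension)."""
    first_product = first_probability * first_weight
    return [first_product + s * second_weight for s in second_similarities]

# One correlation score per feedback-video vector (illustrative values).
scores = correlation(0.8, [0.5, 0.9])
```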
2. The query analysis-based video feedback method of claim 1, wherein the analyzing the browsing information to obtain a target query and a video segment associated with the target query comprises:
obtaining the queries input by a user from the browsing information, calculating the query frequency of each query according to the queries input by the user, and obtaining the query with a query frequency greater than or equal to the configured frequency as the target query; and/or
Acquiring the click rate of each video segment from the browsing information, and determining the query corresponding to the video segment with the click rate greater than or equal to the configured click rate as the target query; and/or
Acquiring the playing time length of each video segment from the browsing information, and determining the query corresponding to the video segment with the playing time length being greater than or equal to the configured time length as the target query;
and connecting a specified video library, and acquiring a video segment associated with the target query from the specified video library.
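As an illustration of the frequency filter in claim 2 (not part of the claim itself), target queries can be selected from the browsing information like this; the sample queries and the configured threshold are made-up values:

```python
from collections import Counter

# Illustrative queries extracted from the browsing information.
queries = ["python tutorial", "jazz piano", "python tutorial", "python tutorial"]
CONFIGURED_FREQUENCY = 2  # assumed configuration value

# Count how often each query was input, then keep those at or above the threshold.
frequency = Counter(queries)
target_queries = [q for q, n in frequency.items() if n >= CONFIGURED_FREQUENCY]
```

The click-rate and playing-duration criteria in the claim follow the same threshold pattern over per-segment statistics.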
3. The query analysis-based video feedback method according to claim 1, wherein the performing visual feature extraction on the video segments associated with the target query to obtain the vector representation of each of the video segments associated with the target query comprises:
calling an Inception-ResNet v2 model as a pre-training model;
and inputting the video segments associated with the target query into the pre-training model, and acquiring a vector output by the pre-training model as a vector representation of each video segment in the video segments associated with the target query.
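A sketch of the segment characterization step in claim 3, for illustration only. The stand-in `frame_features` function below replaces the real forward pass of a pre-trained Inception-ResNet v2 model (which in practice could be, e.g., `tf.keras.applications.InceptionResNetV2` with `include_top=False, pooling='avg'`); the 1536-dimensional feature width and the mean-pooling over frames are assumptions, not claim requirements:

```python
import numpy as np

FEATURE_DIM = 1536  # penultimate-layer width of Inception-ResNet v2

def frame_features(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the pre-trained model's per-frame feature vector."""
    rng = np.random.default_rng(int(frame.sum()) % (2**32))
    return rng.standard_normal(FEATURE_DIM)

def segment_characterization(frames: list) -> np.ndarray:
    """Mean-pool per-frame features into one vector per video segment."""
    return np.mean([frame_features(f) for f in frames], axis=0)

# Two dummy 299x299 RGB frames standing in for a sampled video segment.
frames = [np.zeros((299, 299, 3)), np.ones((299, 299, 3))]
segment_vector = segment_characterization(frames)
```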
4. The query analysis-based video feedback method according to claim 1, wherein the converting the target query into query vectors, and splicing each query vector in the query vectors with the corresponding vector representation of each video segment to obtain a spliced vector comprises:
converting the target query into a query vector by adopting a word2vec algorithm;
acquiring a vector representation of a video segment associated with each target query, and acquiring a query vector of each target query;
and transversely splicing the vector representation of the video segment associated with each target query and the query vector of each target query to obtain the spliced vector.
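A minimal sketch of the "transverse splicing" in claim 4. The vector values are made up, and the word2vec query vector is assumed to be precomputed (e.g. by averaging word embeddings of the query terms):

```python
import numpy as np

# Assumed precomputed word2vec representation of a target query.
query_vector = np.array([0.1, 0.2, 0.3])
# Assumed visual characterization of an associated video segment.
segment_vector = np.array([0.9, 0.8, 0.7, 0.6])

# Transverse (horizontal) splicing: concatenate the two vectors end to end.
spliced_vector = np.concatenate([query_vector, segment_vector])
```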
5. The query analysis-based video feedback method according to claim 1, wherein the training of a preset classification model by using the stitching vector to obtain a target classification model comprises:
dividing the splicing vector according to a preset proportion to obtain a training sample and a verification sample;
training a linear classifier by using the training sample until the linear classifier reaches convergence, and stopping training;
verifying the trained model by using the verification sample;
and when the model obtained by training passes the verification, stopping training to obtain the target classification model.
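The training procedure of claim 5 can be illustrated as follows. The claim does not specify the linear classifier, the split proportion, or the data, so this sketch assumes a plain logistic-regression classifier trained by gradient descent, an 80/20 train/verification split, and synthetic spliced vectors:

```python
import numpy as np

def train_linear_classifier(X, y, lr=0.1, epochs=500):
    """Minimal logistic-regression stand-in for the preset classification model."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient step on weights
        b -= lr * float(np.mean(p - y))          # gradient step on bias
    return w, b

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))             # synthetic spliced vectors
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # synthetic relevance labels

split = int(0.8 * len(X))                     # preset proportion (assumed 80/20)
X_train, X_val = X[:split], X[split:]
y_train, y_val = y[:split], y[split:]

w, b = train_linear_classifier(X_train, y_train)
val_pred = (1.0 / (1.0 + np.exp(-(X_val @ w + b))) > 0.5).astype(float)
accuracy = float(np.mean(val_pred == y_val))  # verification-sample check
```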
6. The query analysis-based video feedback method according to claim 1, wherein the determining the target video corresponding to the query to be processed according to the correlation comprises:
sequencing the feedback videos in descending order of correlation, and acquiring the feedback videos ranked in the top preset positions as target videos corresponding to the query to be processed; and/or
And acquiring the feedback video with the correlation degree larger than or equal to the configuration correlation degree as the target video corresponding to the query to be processed.
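Both selection modes in claim 6 amount to simple operations over the correlation scores; the video names, scores, cut-off position, and configured threshold below are illustrative assumptions:

```python
# Correlation score per feedback video (illustrative values).
correlations = {"video_a": 0.91, "video_b": 0.42, "video_c": 0.77}

# Sequence in descending order of correlation.
ranked = sorted(correlations, key=correlations.get, reverse=True)

top_k = ranked[:2]  # videos in the top preset positions (assumed k = 2)
above_threshold = [v for v in ranked if correlations[v] >= 0.5]  # assumed threshold
```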
7. A query analysis-based video feedback device, comprising:
the analysis unit is used for acquiring a query log generated within a preset time length, acquiring browsing information from the query log, and analyzing the browsing information to obtain a target query and a video segment associated with the target query;
the extracting unit is used for extracting visual features of the video segments associated with the target query to obtain a vector representation of each video segment in the video segments associated with the target query;
the training unit is used for determining the video segments associated with the target query as input, determining the vector characterization of each video segment as output, training a preset neural network model to obtain a video content analysis model, wherein the video content analysis model is used for analyzing and converting the content of the input video segments and outputting the vector characterization of the content of the input video segments;
the splicing unit is used for associating the target query with the content of each video segment, and comprises: converting the target query into query vectors, and splicing each query vector in the query vectors and the corresponding vector representation of each video segment to obtain a spliced vector;
the training unit is also used for training a preset classification model by using the splicing vector to obtain a target classification model;
the conversion unit is used for converting the query to be processed into a first vector when the query to be processed and the feedback video of the query to be processed are received, and inputting the feedback video into the video content analysis model to obtain at least one second vector;
the classification unit is used for splicing the first vector and each second vector in the at least one second vector to obtain at least one third vector, and inputting the at least one third vector into the target classification model to obtain at least one classification result and a first probability of each classification result, wherein the first probability represents the similarity between the query to be processed and the feedback video of the query to be processed in the video content dimension;
the calculating unit is used for calculating the similarity between the first vector and each second vector to obtain at least one second similarity, wherein the second similarity represents the similarity between the query to be processed and the feedback video of the query to be processed in the text dimension;
a determining unit, configured to calculate, according to the first probability of each classification result and the at least one second similarity, a correlation between the query to be processed and the feedback video of the query to be processed, including: determining a first weight of the first probability and determining a second weight of the second similarity; calculating a product between the first probability and the first weight as a first product; calculating a product between each second similarity in the at least one second similarity and the second weight as a second product corresponding to each second similarity; calculating the sum of the first product and each corresponding second product as the correlation between the query to be processed and the feedback video of the query to be processed;
the determining unit is further configured to determine a target video corresponding to the query to be processed according to the correlation.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the query analysis based video feedback method as claimed in any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, causes the processor to carry out the query analysis based video feedback method according to any one of claims 1 to 6.
CN202110686106.1A 2021-06-21 2021-06-21 Video feedback method, device, equipment and medium based on query analysis Active CN113254714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110686106.1A CN113254714B (en) 2021-06-21 2021-06-21 Video feedback method, device, equipment and medium based on query analysis


Publications (2)

Publication Number Publication Date
CN113254714A CN113254714A (en) 2021-08-13
CN113254714B true CN113254714B (en) 2021-11-05

Family

ID=77189145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110686106.1A Active CN113254714B (en) 2021-06-21 2021-06-21 Video feedback method, device, equipment and medium based on query analysis

Country Status (1)

Country Link
CN (1) CN113254714B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109089133B (en) * 2018-08-07 2020-08-11 北京市商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium
CN112541362B (en) * 2020-12-08 2022-08-23 北京百度网讯科技有限公司 Generalization processing method, device, equipment and computer storage medium
CN112668320B (en) * 2020-12-25 2024-02-02 平安科技(深圳)有限公司 Model training method and device based on word embedding, electronic equipment and storage medium
CN112380454A (en) * 2021-01-18 2021-02-19 平安科技(深圳)有限公司 Training course recommendation method, device, equipment and medium


Similar Documents

Publication Publication Date Title
CN110162593B (en) Search result processing and similarity model training method and device
JP6986527B2 (en) How and equipment to process video
CN107491432B (en) Low-quality article identification method and device based on artificial intelligence, equipment and medium
CN108776676B (en) Information recommendation method and device, computer readable medium and electronic device
US20180336193A1 (en) Artificial Intelligence Based Method and Apparatus for Generating Article
US9384214B2 (en) Image similarity from disparate sources
EP2866421B1 (en) Method and apparatus for identifying a same user in multiple social networks
CN112115299A (en) Video searching method and device, recommendation method, electronic device and storage medium
WO2017181612A1 (en) Personalized video recommendation method and device
CN109783671B (en) Method for searching picture by picture, computer readable medium and server
JP6428795B2 (en) Model generation method, word weighting method, model generation device, word weighting device, device, computer program, and computer storage medium
CN116171473A (en) Bimodal relationship network for audio-visual event localization
CN101305368A (en) Semantic visual search engine
CN111767396B (en) Data processing method, device, equipment and computer readable storage medium
CN111597446B (en) Content pushing method and device based on artificial intelligence, server and storage medium
CN107766316B (en) Evaluation data analysis method, device and system
CN109766259B (en) Classifier testing method and system based on composite metamorphic relation
CN111177459A (en) Information recommendation method and device, electronic equipment and computer-readable storage medium
CN111159563A (en) Method, device and equipment for determining user interest point information and storage medium
CN116894711A (en) Commodity recommendation reason generation method and device and electronic equipment
CN112507153A (en) Method, computing device, and computer storage medium for image retrieval
CN112100221A (en) Information recommendation method and device, recommendation server and storage medium
CN111782068A (en) Method, device and system for generating mouse track and data processing method
Guan et al. Urban perception: Sensing cities via a deep interactive multi-task learning framework
CN113254714B (en) Video feedback method, device, equipment and medium based on query analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant