CN110781835B - Data processing method and device, electronic equipment and storage medium - Google Patents
- Publication number: CN110781835B
- Application number: CN201911029239.0A
- Authority: CN (China)
- Prior art keywords: vector, key frame, feature vector, preset
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/485—End-user interface for client configuration
- H04N21/4852—End-user interface for client configuration for modifying audio parameters, e.g. switching between mono and stereo
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8106—Monomedia components thereof involving special audio data, e.g. different tracks for different languages
- H04N21/8113—Monomedia components thereof involving special audio data, e.g. different tracks for different languages comprising music, e.g. song in MP3 format
Abstract
The application provides a data processing method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring attribute feature vectors of key frames of a target video; obtaining the feature vector of each key frame from its attribute feature vectors; inputting the feature vector of the key frame, together with the vector to be sorted representing the notes of the previous key frame, into a decoding model as input parameters, so as to obtain the vector to be sorted representing the notes of the current key frame; and obtaining the background music of the target video from all the obtained vectors to be sorted. With this method, background music for the target video can be obtained without manual participation, which helps reduce the manual workload.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of technology, computer multimedia has become increasingly popular, and video production is now something ordinary people can do. People can shoot video production materials with tools such as digital video cameras, mobile phones and cameras, and then edit them into videos that record their study, work and daily life.
When producing a high-quality video, background music must be configured for the video after it is produced, so that the finished video reproduces its scenes well during playback. To configure the background music, the user has to search a large amount of music material for pieces that fit the video, so configuring background music for a video involves a heavy manual workload.
Disclosure of Invention
In view of the above, embodiments of the present application provide a data processing method, an apparatus, an electronic device, and a storage medium, so as to reduce the workload of configuring background music for a video.
In a first aspect, an embodiment of the present application provides a data processing method, including:
acquiring attribute feature vectors of key frames of a target video;
obtaining the feature vector of the key frame according to the attribute feature vector of the key frame;
inputting the feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame into a decoding model as input parameters, so as to obtain the vector to be sorted representing the notes of the key frame;
and obtaining the background music of the target video according to all the obtained vectors to be sorted.
Optionally, the attribute feature vector of the key frame includes:
the dynamic feature vector of the key frame, the static feature vector of the key frame, and/or the optical flow feature vector of the key frame.
Optionally, the obtaining the feature vector of the key frame according to the attribute feature vector of the key frame includes:
and according to the attribute feature vector of the key frame, carrying out vector splicing processing through a full connection layer to obtain the feature vector of the key frame.
Optionally, the obtaining the background music of the target video according to all the obtained vectors to be sorted includes:
judging each vector to be sorted by using a preset target note vector set, so as to determine whether the vector to be sorted meets a preset requirement;
and sorting all vectors to be sorted that meet the preset requirement according to a preset note arrangement rule, so as to take the sorting result as the background music of the target video.
Optionally, the judging each vector to be sorted by using a preset target note vector set to determine whether the vector to be sorted meets a preset requirement includes:
performing a mean square error operation on the vector to be sorted and the preset target note vector set to obtain a loss function value of the vector to be sorted relative to the target note vectors;
when the loss function value is within a preset range, determining that the vector to be sorted meets the preset requirement;
and when the loss function value is not within the preset range, determining that the vector to be sorted does not meet the preset requirement.
Optionally, the method further comprises:
and training the decoding model by taking all vectors to be sorted that do not meet the preset requirement as training samples.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including:
the acquiring unit is used for acquiring the attribute feature vector of the key frame of the target video;
the first processing unit is used for obtaining the feature vector of the key frame according to the attribute feature vector of the key frame;
the second processing unit is used for inputting the feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame into the decoding model as input parameters, so as to obtain the vector to be sorted representing the notes of the key frame;
and the third processing unit is used for obtaining the background music of the target video according to all the obtained vectors to be sorted.
Optionally, the attribute feature vector of the key frame includes:
the dynamic feature vector of the key frame, the static feature vector of the key frame, and/or the optical flow feature vector of the key frame.
Optionally, when the first processing unit is configured to obtain the feature vector of the key frame according to the attribute feature vector of the key frame, it is specifically configured to:
and according to the attribute feature vector of the key frame, carrying out vector splicing processing through a full connection layer to obtain the feature vector of the key frame.
Optionally, when the third processing unit is configured to obtain the background music of the target video according to all the obtained vectors to be sorted, it is specifically configured to:
judging each vector to be sorted by using a preset target note vector set, so as to determine whether the vector to be sorted meets a preset requirement;
and sorting all vectors to be sorted that meet the preset requirement according to a preset note arrangement rule, so as to take the sorting result as the background music of the target video.
Optionally, when the third processing unit is configured to judge each vector to be sorted by using a preset target note vector set to determine whether the vector to be sorted meets a preset requirement, it is specifically configured to:
performing a mean square error operation on the vector to be sorted and the preset target note vector set to obtain a loss function value of the vector to be sorted relative to the target note vectors;
when the loss function value is within a preset range, determining that the vector to be sorted meets the preset requirement;
and when the loss function value is not within the preset range, determining that the vector to be sorted does not meet the preset requirement.
Optionally, the data processing apparatus further includes:
and the training unit is used for training the decoding model by taking all vectors to be sorted that do not meet the preset requirement as training samples.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the data processing method according to any one of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the data processing method according to any one of the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
In this application, when configuring background music for a target video, attribute feature vectors of the video's key frames are obtained first. Because each key frame carries different attribute feature vectors, the feature vector of each key frame can be derived from them. The feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame are then input into a decoding model as input parameters. Since the feature vector of a key frame represents the content shown in that frame (for example, the people, actions and scenes it contains), the output vector can serve as a vector representing notes whose expressed content matches the key frame. The vector to be sorted for the previous key frame is also used as an input parameter so that the note vector obtained for the current key frame matches the note vector of the previous key frame, keeping the notes of adjacent key frames consistent. Because music is formed from many notes, the background music obtained from all the vectors to be sorted also matches the target video. In this way, background music for the target video is obtained without manual participation, which helps reduce the manual workload.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of another data processing method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a data processing apparatus according to a second embodiment of the present application;
fig. 4 is a schematic structural diagram of a data processing apparatus according to a second embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below completely with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. The components of the embodiments, as generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. The following detailed description is therefore not intended to limit the scope of the claimed application, but is merely representative of selected embodiments. All other embodiments obtained by a person skilled in the art from the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Example one
Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present application, as shown in fig. 1, the data processing method includes the following steps:
Step 101, acquiring attribute feature vectors of key frames of a target video.
Specifically, a video comprises a plurality of key frames, and each key frame has a plurality of attributes, such as objects, people, actions, scenes and person-object relationships. Together, these attributes form all the elements of a key frame. Once the attribute feature vectors of a key frame are obtained, quantized data describing the key frame are available, which provides data support for subsequent processing.
It should be noted that, the specific attribute feature vector may be set according to actual needs, and is not specifically limited herein.
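The quantization described above can be sketched as follows. This is a minimal illustration only: the multi-hot encoding and the attribute vocabulary are assumptions for the sake of example, since the patent deliberately leaves the specific attribute feature vectors open.

```python
# Hypothetical sketch: a key frame's attributes (objects, people, actions,
# scenes) quantized into an attribute feature vector. The vocabulary and the
# multi-hot encoding are illustrative assumptions, not the patent's method.

ATTRIBUTE_VOCAB = ["person", "dog", "running", "beach", "holding"]

def attribute_feature_vector(attributes):
    """Encode a key frame's attribute set as a multi-hot vector."""
    return [1.0 if a in attributes else 0.0 for a in ATTRIBUTE_VOCAB]

key_frames = [
    {"attributes": {"person", "running", "beach"}},
    {"attributes": {"person", "dog", "holding"}},
]
vectors = [attribute_feature_vector(f["attributes"]) for f in key_frames]
```

The resulting per-frame vectors are the "quantized data" that later steps build on.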
Step 102, obtaining the feature vector of the key frame according to the attribute feature vectors of the key frame.
Specifically, the attribute vectors of a key frame jointly represent the complete key frame, and configuring background music for the target video requires considering the video as a whole. The attribute feature vectors are therefore used to obtain the feature vector of each key frame; because these feature vectors can represent the content of the target video as a whole, they provide a reference basis, from a global perspective, for configuring the background music.
It should be noted that, the specific implementation manner of obtaining the feature vector of the key frame according to the attribute feature vector may be set according to actual needs, and is not specifically limited herein.
Step 103, inputting the feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame into a decoding model as input parameters, so as to obtain the vector to be sorted representing the notes of the key frame.
Specifically, once the feature vector of a key frame is obtained, the target video can be analyzed as a whole. To extract content such as people, actions, scenes and person-object relationships from the whole video, the feature vector of the key frame is input into the decoding model, which outputs content vectors representing these elements of the target video. Because the notes in a piece of music are all related, and the notes generated for the current key frame must match the notes corresponding to the previous key frame, the vector to be sorted representing the notes of the previous key frame is also input into the decoding model as an input parameter. For the first key frame of the target video, the vector to be sorted is obtained by inputting a preset note vector together with the feature vector of the first key frame into the decoding model. The output vectors are closely related to the content expressed by the target video and to one another, so they are used as the vectors to be sorted that represent notes: the notes corresponding to these vectors are closely related to the content of the target video, and all the obtained notes are related to each other. Background music with a high degree of matching to the target video can therefore be configured from the notes corresponding to the vectors to be sorted.
It should be noted that, which decoding model is specifically used may be set according to actual needs, and is not limited in particular here.
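The autoregressive loop described above can be sketched as follows. The linear blend standing in for the decoder is purely an assumption for illustration; the patent does not fix a particular decoding model, only that each frame's note vector is produced from the frame's feature vector and the previous frame's note vector, seeded by a preset note vector.

```python
# Sketch of the decoding loop: each key frame's feature vector is decoded
# together with the previous frame's note vector; the first frame uses a
# preset note vector. The element-wise blend below is a stand-in for a real
# trained decoding model.

def decode(feature_vec, prev_note_vec):
    """Toy decoder: blend frame features with the previous note vector."""
    return [0.5 * f + 0.5 * n for f, n in zip(feature_vec, prev_note_vec)]

def generate_note_vectors(frame_features, preset_note_vec):
    """Run the decoder over all key frames in order."""
    note_vectors = []
    prev = preset_note_vec
    for feat in frame_features:
        prev = decode(feat, prev)  # current notes depend on the previous ones
        note_vectors.append(prev)
    return note_vectors
```

Feeding the previous output back in is what keeps the notes of adjacent key frames related.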
Step 104, obtaining the background music of the target video according to all the obtained vectors to be sorted.
Specifically, since music is formed by combining many notes according to certain rules, once all the vectors to be sorted are obtained, the notes that make up the background music are available, and the background music of the target video can therefore be obtained from these vectors.
It should be noted that the specific method for obtaining the background music from all the vectors to be sorted can be set according to actual needs. For example, the corresponding notes may first be obtained from the vectors to be sorted and then combined according to a certain rule to form the background music; alternatively, the vectors to be sorted may be ordered first and the notes then arranged in the corresponding order, with the ordered notes used as the background music. The specific implementation is not limited here.
In the method, the obtained notes are matched with the key frames, so the background music obtained from the vectors to be sorted is also matched with the target video.
In one possible embodiment, the attribute feature vector of the key frame includes:
the dynamic feature vector of the key frame, the static feature vector of the key frame, and/or the optical flow feature vector of the key frame.
Specifically, the dynamic feature vector, the static feature vector and the optical flow feature vector of a key frame quantitatively describe the objects, people, actions, scenes and person-object relationships in the key frame, so once these vectors are obtained, the feature vector of the key frame can be derived from them.
It should be noted that, which attribute feature vector or attribute feature vectors are specifically used may be set according to actual needs, and is not specifically limited herein.
In a possible embodiment, in step 102, the feature vector of the key frame may be obtained by performing vector splicing processing on the attribute feature vectors of the key frame through a fully connected layer.
It should be noted that the specific fully connected layer used to splice the attribute feature vectors may be set according to actual needs and is not specifically limited herein.
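A hedged sketch of this "vector splicing through a fully connected layer": the dynamic, static and optical-flow vectors are concatenated and then projected through one linear layer. The plain-Python layer and the example weights are illustrative assumptions; a real model would learn the weights.

```python
# Concatenate the per-attribute vectors, then apply y = W x + b.
# Weights/bias here are toy values for illustration only.

def fully_connected(x, weights, bias):
    """One linear layer over a plain-Python weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def key_frame_feature(dynamic, static, flow, weights, bias):
    spliced = dynamic + static + flow  # vector splicing (concatenation)
    return fully_connected(spliced, weights, bias)
```

With one-dimensional inputs and a single output row that simply sums the spliced vector, the layer reduces to a weighted sum of the three attribute vectors.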
In a possible implementation, fig. 2 is a schematic flow chart of another data processing method provided in the first embodiment of the present application, and as shown in fig. 2, when step 104 is executed, the following steps may be implemented:
Step 201, judging each vector to be sorted by using a preset target note vector set, so as to determine whether the vector to be sorted meets a preset requirement.
Step 202, sorting all vectors to be sorted that meet the preset requirement according to a preset note arrangement rule, and taking the sorting result as the background music of the target video.
Specifically, the preset target note vectors correspond to notes that meet the requirements of the target video, so a vector to be sorted that meets the preset requirement satisfies the user's expectations. After all vectors to be sorted that meet the preset requirement are determined, they can be sorted according to the preset note arrangement rule, so that the notes corresponding to them are arranged in a certain order and form the background music of the target video.
It should be noted that, the specific preset requirement and the specific arrangement rule may be set according to actual needs, and are not specifically limited herein.
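The arrangement step above can be sketched as follows. The concrete rule used here — order by key-frame index, then map each qualifying vector to a note name — is an assumption chosen only to make the sketch concrete; the patent leaves the arrangement rule open.

```python
# Sketch: arrange qualifying note vectors into background music.
# Input: (frame_index, note_id) pairs for vectors that passed screening.
# The frame-order rule and the note-name table are illustrative assumptions.

NOTE_NAMES = {0: "C", 1: "D", 2: "E", 3: "F"}

def arrange(qualifying):
    """Sort by key-frame index, then map each vector to its note."""
    ordered = sorted(qualifying, key=lambda p: p[0])  # preset rule: frame order
    return [NOTE_NAMES[n] for _, n in ordered]
```

The returned note sequence plays the role of the "sorting result" used as the background music.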
In a possible embodiment, in step 201, a mean square error operation may be performed between each vector to be sorted and the preset target note vector set to obtain a loss function value of the vector to be sorted relative to the target note vectors. When the loss function value is within a preset range, the vector to be sorted is determined to meet the preset requirement; when it is not within the preset range, the vector to be sorted is determined not to meet the preset requirement.
It should be noted that the specific preset range may be set according to actual needs, and it may be a numerical interval or a single value. For example, the vector to be sorted may be determined to meet the preset requirement only when the loss function value is 0, and determined not to meet it otherwise. The specific preset range is not limited herein.
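The screening step can be sketched as follows: compute the mean squared error of a candidate note vector against each preset target note vector, take the lowest value as the loss, and accept the candidate when the loss falls inside the preset range. Taking the minimum over the set, and the threshold value, are assumptions for illustration.

```python
# Sketch of step 201: MSE of a candidate against a target note vector set.
# The min-over-set reduction and the 0.01 threshold are assumed examples.

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def meets_requirement(candidate, target_set, threshold=0.01):
    """True when the candidate's loss falls within the preset range."""
    loss = min(mse(candidate, t) for t in target_set)
    return loss <= threshold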
In a possible embodiment, all vectors to be ordered which do not meet the preset requirement are used as training samples to train the decoding model.
Specifically, when a vector to be sorted does not meet the preset requirement, it falls outside the preset target note vector set, which indicates that the accuracy of the decoding model's output needs to be improved. Using such vectors as training samples to train the decoding model therefore helps improve the accuracy of the results it outputs.
It should be noted that the specific model training mode may be set according to actual needs, and is not specifically limited herein.
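Collecting the rejected vectors for retraining can be sketched as follows. Only the collection step is shown: the patent states that failing vectors become training samples but leaves the training procedure itself open, so no update rule is assumed here.

```python
# Sketch: gather vectors to be sorted that fail the preset requirement,
# to be replayed as training samples for the decoding model. The MSE
# screening and the threshold mirror the assumed example above.

def collect_training_samples(candidates, target_set, threshold=0.01):
    """Return the rejected candidates for use as training samples."""
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return [c for c in candidates
            if min(mse(c, t) for t in target_set) > threshold]
```

The returned list would then be fed to whatever training mode is chosen for the decoding model.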
Example two
Fig. 3 is a schematic structural diagram of a data processing apparatus according to a second embodiment of the present application, and as shown in fig. 3, the data processing apparatus includes:
an obtaining unit 31, configured to obtain an attribute feature vector of a key frame of a target video;
a first processing unit 32, configured to obtain a feature vector of the key frame according to the attribute feature vector of the key frame;
a second processing unit 33, configured to input the feature vector of the key frame and the to-be-sorted vector for representing the note of the previous key frame into the decoding model as input parameters, so as to obtain the to-be-sorted vector for representing the note of the key frame;
and the third processing unit 34 is configured to obtain background music of the target video according to all the obtained vectors to be sorted.
In one possible embodiment, the attribute feature vector of the key frame includes:
the dynamic feature vector of the key frame, the static feature vector of the key frame, and/or the optical flow feature vector of the key frame.
In a possible embodiment, when the first processing unit 32 is configured to obtain the feature vector of the key frame according to the attribute feature vector of the key frame, it is specifically configured to:
and according to the attribute feature vector of the key frame, carrying out vector splicing processing through a full connection layer to obtain the feature vector of the key frame.
In a possible embodiment, when the third processing unit 34 is configured to obtain the background music of the target video according to all the obtained vectors to be sorted, it is specifically configured to:
judging each vector to be sorted by using a preset target note vector set, so as to determine whether the vector to be sorted meets a preset requirement;
and sorting all vectors to be sorted that meet the preset requirement according to a preset note arrangement rule, so as to take the sorting result as the background music of the target video.
In a possible embodiment, when the third processing unit 34 is configured to judge each vector to be sorted by using a preset target note vector set to determine whether the vector to be sorted meets a preset requirement, it is specifically configured to:
performing a mean square error operation on the vector to be sorted and the preset target note vector set to obtain a loss function value of the vector to be sorted relative to the target note vectors;
when the loss function value is within a preset range, determining that the vector to be sorted meets the preset requirement;
and when the loss function value is not within the preset range, determining that the vector to be sorted does not meet the preset requirement.
In a possible implementation, fig. 4 is a schematic structural diagram of a data processing apparatus provided in example two of the present application, and as shown in fig. 4, the data processing apparatus further includes:
and the training unit 35 is configured to train the decoding model by taking all vectors to be sorted that do not meet the preset requirement as training samples.
For the principles of the second embodiment, reference may be made to the related descriptions of the first embodiment, which are not repeated herein.
Example three
Fig. 5 is a schematic structural diagram of an electronic device according to a third embodiment of the present application. The electronic device includes a processor 501, a storage medium 502, and a bus 503, wherein the storage medium 502 stores machine-readable instructions executable by the processor 501. When the electronic device runs the data processing method, the processor 501 and the storage medium 502 communicate with each other through the bus 503, and the processor 501 executes the machine-readable instructions to perform the following steps:
acquiring attribute feature vectors of key frames of a target video;
obtaining the feature vector of the key frame according to the attribute feature vector of the key frame;
inputting the feature vector of the key frame and the vector to be sequenced representing the notes of the previous key frame into a decoding model as input parameters, to obtain the vector to be sequenced representing the notes of the key frame;
and obtaining the background music of the target video according to all the obtained vectors to be sequenced.
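The decoding step above, in which each key frame's feature vector is combined with the previous note vector, can be sketched as a simple feedback loop. This is a minimal illustration with a toy linear "decoder" standing in for the trained decoding model; all dimensions and names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
D_FEAT, D_NOTE = 256, 32  # assumed feature and note-vector sizes

# Toy "decoding model": one linear map from [feature ; previous note]
# to the next note vector. The real model would be a trained network.
W_dec = rng.standard_normal((D_NOTE, D_FEAT + D_NOTE)) * 0.01

def decode_note(feature, prev_note):
    """Produce the note vector for one key frame, conditioned on the
    previous key frame's note vector so adjacent notes match."""
    return np.tanh(W_dec @ np.concatenate([feature, prev_note]))

# Iterate over key-frame features, feeding each output note back in.
features = [rng.standard_normal(D_FEAT) for _ in range(5)]
prev = np.zeros(D_NOTE)  # no previous note before the first key frame
notes = []
for f in features:
    prev = decode_note(f, prev)
    notes.append(prev)
print(len(notes), notes[0].shape)  # 5 (32,)
```

The zero vector for the first key frame is an assumption; the application does not specify how the first "previous note" is initialized.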
In this embodiment of the application, the processor 501 may further execute other machine-readable instructions stored in the storage medium 502 to perform the other methods described in the first embodiment; for the specific method steps and principles, refer to the description of the first embodiment, which is not repeated here.
Example four
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the following steps:
acquiring attribute feature vectors of key frames of a target video;
obtaining the feature vector of the key frame according to the attribute feature vector of the key frame;
inputting the feature vector of the key frame and the vector to be sequenced representing the notes of the previous key frame into a decoding model as input parameters, to obtain the vector to be sequenced representing the notes of the key frame;
and obtaining the background music of the target video according to all the obtained vectors to be sequenced.
In this embodiment of the present application, the computer program, when executed by a processor, may further perform the other methods described in the first embodiment; for the specific method steps and principles, refer to the description of the first embodiment, which is not repeated here.
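Among the first-embodiment methods referenced above is the acceptance check recited in claims 1-2: each candidate note vector is compared against a preset target note vector set by mean square error, kept only if the loss falls within a preset range, and the kept vectors are then sequenced by a preset note arrangement rule. A hedged numpy sketch, with a made-up threshold and a caller-supplied ordering rule standing in for the preset arrangement rule:

```python
import numpy as np

def mse_to_nearest(candidate, target_set):
    """Loss of a candidate note vector against its closest target vector."""
    return min(float(np.mean((candidate - t) ** 2)) for t in target_set)

def filter_and_order(candidates, target_set, threshold, order_key):
    """Keep candidates whose loss is within the threshold, then order
    them by an arrangement rule (here a simple sort key)."""
    kept = [c for c in candidates if mse_to_nearest(c, target_set) <= threshold]
    return sorted(kept, key=order_key)

targets = [np.zeros(4), np.ones(4)]
candidates = [np.full(4, 0.1), np.full(4, 0.9), np.full(4, 5.0)]
kept = filter_and_order(candidates, targets, threshold=0.05,
                        order_key=lambda v: float(v.sum()))
print(len(kept))  # 2 -- the outlier at 5.0 is rejected
```

Rejected candidates would, per claim 3, be collected as training samples to further train the decoding model.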
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that the above embodiments are merely specific implementations of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed herein, anyone familiar with the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions of some technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (6)
1. A data processing method, comprising:
acquiring attribute feature vectors of key frames of a target video;
obtaining a feature vector of the key frame according to the attribute feature vector of the key frame, wherein the attribute feature vector of the key frame comprises a dynamic feature vector of the key frame, a static feature vector of the key frame and/or an optical flow feature vector of the key frame;
the obtaining the feature vector of the key frame according to the attribute feature vector of the key frame includes:
according to the attribute feature vector of the key frame, carrying out vector splicing processing through a full connection layer to obtain the feature vector of the key frame;
inputting the feature vector of the key frame and the vector to be sequenced representing the notes of the previous key frame into a decoding model as input parameters, to obtain the vector to be sequenced representing the notes of the key frame;
obtaining background music of the target video according to all the obtained vectors to be sequenced;
the obtaining the background music of the target video according to all the obtained vectors to be sequenced comprises:
evaluating the vector to be sequenced by using a preset target note vector set, so as to determine whether the vector to be sequenced meets a preset requirement;
and sequencing all vectors to be sequenced which meet the preset requirement according to a preset note arrangement rule so as to take a sequencing result as the background music of the target video.
2. The method as claimed in claim 1, wherein the evaluating the vector to be sequenced by using a preset target note vector set to determine whether the vector to be sequenced meets a preset requirement comprises:
performing a mean square error operation on the vector to be sequenced and the preset target note vector set to obtain a loss function value of the vector to be sequenced relative to the target note vector;
when the loss function value is within a preset range, determining that the vector to be sequenced meets the preset requirement;
and when the loss function value is not within the preset range, determining that the vector to be sequenced does not meet the preset requirement.
3. The method of claim 1, wherein the method further comprises:
and training the decoding model by using, as training samples, all vectors to be sequenced that do not meet the preset requirement.
4. A data processing apparatus, comprising:
the acquiring unit is used for acquiring attribute feature vectors of key frames of the target video;
the first processing unit is used for obtaining the feature vector of the key frame according to the attribute feature vector of the key frame, wherein the attribute feature vector of the key frame comprises a dynamic feature vector of the key frame, a static feature vector of the key frame and/or an optical flow feature vector of the key frame;
when the first processing unit is configured to obtain the feature vector of the key frame according to the attribute feature vector of the key frame, the configuration of the first processing unit includes:
according to the attribute feature vector of the key frame, carrying out vector splicing processing through a full connection layer to obtain the feature vector of the key frame;
the second processing unit is used for inputting the feature vector of the key frame and the vector to be sequenced representing the notes of the previous key frame into the decoding model as input parameters, to obtain the vector to be sequenced representing the notes of the key frame;
the third processing unit is used for obtaining background music of the target video according to all the obtained vectors to be sequenced;
when the third processing unit is configured to obtain the background music of the target video according to all the obtained vectors to be sequenced, the configuration of the third processing unit includes:
evaluating the vector to be sequenced by using a preset target note vector set to determine whether the vector to be sequenced meets the preset requirement;
and sequencing all vectors to be sequenced which meet the preset requirement according to a preset note arrangement rule so as to take a sequencing result as the background music of the target video.
5. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the data processing method according to any one of claims 1 to 3.
6. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the data processing method according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911029239.0A CN110781835B (en) | 2019-10-28 | 2019-10-28 | Data processing method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911029239.0A CN110781835B (en) | 2019-10-28 | 2019-10-28 | Data processing method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110781835A CN110781835A (en) | 2020-02-11 |
CN110781835B true CN110781835B (en) | 2022-08-23 |
Family
ID=69386876
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911029239.0A Active CN110781835B (en) | 2019-10-28 | 2019-10-28 | Data processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110781835B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112235517B (en) * | 2020-09-29 | 2023-09-12 | 北京小米松果电子有限公司 | Method for adding white-matter, device for adding white-matter, and storage medium |
CN113923517B (en) * | 2021-09-30 | 2024-05-07 | 北京搜狗科技发展有限公司 | Background music generation method and device and electronic equipment |
CN115052147B (en) * | 2022-04-26 | 2023-04-18 | 中国传媒大学 | Human body video compression method and system based on generative model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109086416A (en) * | 2018-08-06 | 2018-12-25 | 中国传媒大学 | A kind of generation method of dubbing in background music, device and storage medium based on GAN |
CN109599079A (en) * | 2017-09-30 | 2019-04-09 | 腾讯科技(深圳)有限公司 | A kind of generation method and device of music |
CN109862393A (en) * | 2019-03-20 | 2019-06-07 | 深圳前海微众银行股份有限公司 | Method of dubbing in background music, system, equipment and the storage medium of video file |
KR20190116199A (en) * | 2018-10-29 | 2019-10-14 | 바이두 온라인 네트웍 테크놀러지 (베이징) 캄파니 리미티드 | Video data processing method, device and readable storage medium |
2019
- 2019-10-28 CN CN201911029239.0A patent/CN110781835B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109599079A (en) * | 2017-09-30 | 2019-04-09 | 腾讯科技(深圳)有限公司 | A kind of generation method and device of music |
CN109086416A (en) * | 2018-08-06 | 2018-12-25 | 中国传媒大学 | A kind of generation method of dubbing in background music, device and storage medium based on GAN |
KR20190116199A (en) * | 2018-10-29 | 2019-10-14 | 바이두 온라인 네트웍 테크놀러지 (베이징) 캄파니 리미티드 | Video data processing method, device and readable storage medium |
CN109862393A (en) * | 2019-03-20 | 2019-06-07 | 深圳前海微众银行股份有限公司 | Method of dubbing in background music, system, equipment and the storage medium of video file |
Non-Patent Citations (1)
Title |
---|
Visual to Sound: Generating Natural Sound for Videos in the Wild;Yipin Zhou et al;《arXiv:1712.01393v2》;20180601;Abstract, Sections 1-6 * |
Also Published As
Publication number | Publication date |
---|---|
CN110781835A (en) | 2020-02-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102416558B1 (en) | Video data processing method, device and readable storage medium | |
CN109688463B (en) | Clip video generation method and device, terminal equipment and storage medium | |
CN110781835B (en) | Data processing method and device, electronic equipment and storage medium | |
US20240107127A1 (en) | Video display method and apparatus, video processing method, apparatus, and system, device, and medium | |
CN111259192A (en) | Audio recommendation method and device | |
CN111460179A (en) | Multimedia information display method and device, computer readable medium and terminal equipment | |
JP2022538702A (en) | Voice packet recommendation method, device, electronic device and program | |
CN112584062B (en) | Background audio construction method and device | |
CN111723289B (en) | Information recommendation method and device | |
CN109815448B (en) | Slide generation method and device | |
CN111435369B (en) | Music recommendation method, device, terminal and storage medium | |
CN116132711A (en) | Method and device for generating video template and electronic equipment | |
CN117014693A (en) | Video processing method, device, equipment and storage medium | |
CN114065720A (en) | Conference summary generation method and device, storage medium and electronic equipment | |
CN112843681A (en) | Virtual scene control method and device, electronic equipment and storage medium | |
CN117939190A (en) | Method for generating video content and music content with soundtrack and electronic equipment | |
KR101804679B1 (en) | Apparatus and method of developing multimedia contents based on story | |
CN115115901A (en) | Method and device for acquiring cross-domain learning model | |
CN114840743A (en) | Model recommendation method and device, electronic equipment and readable storage medium | |
CN112449249A (en) | Video stream processing method and device, electronic equipment and storage medium | |
CN110489581A (en) | A kind of image processing method and equipment | |
CN115237248B (en) | Virtual object display method, device, equipment, storage medium and program product | |
CN115440198B (en) | Method, apparatus, computer device and storage medium for converting mixed audio signal | |
CN113992866B (en) | Video production method and device | |
CN118780974A (en) | Method and device for converting original picture into cartoon style picture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |