WO2022042157A1 - Method and apparatus for manufacturing video data, and computer device and storage medium - Google Patents


Info

Publication number
WO2022042157A1
Authority
WO
WIPO (PCT)
Prior art keywords
video data
summary information
video
audiovisual
client
Prior art date
Application number
PCT/CN2021/108174
Other languages
French (fr)
Chinese (zh)
Inventor
裴得利
Original Assignee
百果园技术(新加坡)有限公司
裴得利
Priority date
Filing date
Publication date
Application filed by 百果园技术(新加坡)有限公司, 裴得利
Publication of WO2022042157A1 publication Critical patent/WO2022042157A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85: Assembly of content; Generation of multimedia applications
    • H04N 21/854: Content authoring
    • H04N 21/8549: Creating video summaries, e.g. movie trailer

Definitions

  • the present application relates to the technical field of multimedia, for example, to a method, apparatus, computer equipment and storage medium for producing video data.
  • the elements added by users to the video data are mostly templates provided by the platform.
  • the platform provides relatively few templates while many users produce video data, which leads to obvious homogeneity among the video data produced with these templates. Therefore, many users manually collect elements to personalize the elements and thereby personalize the video data, for example, by downloading data from the Internet as elements, parsing data out of other video data as elements, and so on.
  • the present application proposes a method, device, computer equipment and storage medium for producing video data, so as to solve the problem of how to reduce the cost of producing video data under the condition of keeping the individuality of video data.
  • the application provides a method for producing video data, including:
  • element summary information is displayed, wherein the element summary information is used to represent audiovisual elements included in the first video data;
  • third video data is collected, and audiovisual elements corresponding to the element summary information are added to the third video data.
  • the present application also provides a method for producing video data, including:
  • the client is configured to display element summary information when the first video data is played, and the element summary information is used to indicate that the first video data contains audiovisual elements;
  • the audiovisual element corresponding to the element summary information is sent to the client, and the client is further configured to collect third video data and add the audiovisual element corresponding to the element summary information to the third video data.
  • the present application also provides a device for producing video data, including:
  • a display screen configured to display element summary information when the first video data is played, wherein the element summary information is used to represent audiovisual elements included in the first video data;
  • a touch screen configured to receive a first operation acting on the element summary information
  • a display screen further configured to jointly display video summary information and production controls of second video data in response to the first operation, wherein the second video data includes the audiovisual element;
  • a touch screen further configured to receive a second operation acting on the production control
  • a camera configured to collect third video data in response to the second operation
  • the processor is configured to add the audiovisual element corresponding to the element summary information to the third video data.
  • the present application also provides a device for producing video data, including:
  • the first video data sending module is configured to send the first video data to the client, wherein the client is configured to display element summary information when the first video data is played, and the element summary information is used to represent the audiovisual elements contained in the first video data;
  • a second video data search module configured to search for the second video data containing the audiovisual element in the case of receiving a request triggered by the client based on the element summary information
  • a video summary information sending module configured to send the video summary information of the second video data to the client, wherein the client is also set to jointly display the video summary information and the production control;
  • the audiovisual element sending module is configured to send the audiovisual element corresponding to the element summary information to the client when receiving the request triggered by the client based on the production control, wherein the client is further configured to collect third video data and add the audiovisual element corresponding to the element summary information to the third video data.
  • the present application also provides a computer device, the computer device comprising:
  • one or more processors;
  • memory arranged to store one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the above-mentioned method for producing video data.
  • the present application also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the above-mentioned method for producing video data is implemented.
  • FIG. 1 is a flowchart of a method for producing video data according to Embodiment 1 of the present application
  • FIG. 2A is an exemplary diagram of producing video data according to Embodiment 1 of the present application.
  • FIG. 2B is another exemplary diagram of producing video data according to Embodiment 1 of the present application.
  • FIG. 2C is another exemplary diagram of producing video data according to Embodiment 1 of the present application.
  • FIG. 2D is another exemplary diagram of producing video data according to Embodiment 1 of the present application.
  • FIG. 2E is another exemplary diagram of producing video data according to Embodiment 1 of the present application.
  • FIG. 2F is another exemplary diagram of producing video data according to Embodiment 1 of the present application.
  • FIG. 3 is a flowchart of a method for producing video data according to Embodiment 2 of the present application.
  • FIG. 5 is a schematic structural diagram of a multi-task learning model provided in Embodiment 3 of the present application.
  • FIG. 6 is a flowchart of a method for producing video data according to Embodiment 4 of the present application.
  • FIG. 7 is a schematic structural diagram of an apparatus for producing video data according to Embodiment 5 of the present application.
  • FIG. 8 is a schematic structural diagram of an apparatus for producing video data according to Embodiment 6 of the present application.
  • FIG. 9 is a schematic structural diagram of a computer device according to Embodiment 7 of the present application.
  • video platforms need to maintain a good ecology of consumption and production. In terms of consumption, video platforms strive to push video data in line with users' interests and preferences so as to obtain higher consumption time and satisfaction; in terms of production, video platforms should also encourage users to shoot more video data and upload it to the platform for release, enriching the platform's content. Richer content in turn makes it easier for users to obtain video data matching their interests and preferences, forming a virtuous circle.
  • the video recommendation algorithm is mainly aimed at the consumption mechanism.
  • these implicit feedbacks are used to construct the positive and negative samples of the training data; the training data is used to train a ranking model, the ranking model is used to calculate the user's score for the video data, and the video data that best matches the user's interests and preferences is then selected and pushed to the user.
  • the optimization goal of the ranking model has also developed from a single playback duration (or completion rate) to a multi-task learning model that has both consumption indicators such as duration, and satisfaction indicators such as likes, comments and forwarding.
  • the user can reuse the audio-visual elements of interest more concisely when consuming video data, and quickly produce video data.
  • the multi-task learning model is used to introduce a goal of conversion from consumption to production, so as to predict which video data is more likely to arouse users' interest in producing new video data.
  • a factor is added, that is, whether the current video data will arouse users’ interest.
  • This type of video data is preferentially pushed to increase users' willingness to produce, which can guide more users from consumption alone to consumption plus further production, improve the conversion ratio from consumption to production, and thereby enrich the closed ecological loop of video platform content.
  • the video data production device can be implemented by software and/or hardware, and can be configured in computer equipment, for example, mobile terminals (such as mobile phones, tablet computers, etc.), smart wearable devices (such as smart watches, smart glasses, etc.), personal computers, etc. The method includes the following steps:
  • Step 101 When playing the first video data, display element summary information.
  • the operating system of the computer device may include Android, iOS (the mobile operating system developed by Apple), Windows, etc. These operating systems support running a client that plays and produces video data, for example, short video applications, instant messaging tools, online video applications, and so on.
  • the client may request the server to play video data in the form of a Uniform Resource Locators (URL), etc.
  • the video data is referred to as first video data in this embodiment; after the server receives the request, it searches for the first video data in a personalized or non-personalized manner, and sends part or all of the first video data to the client.
  • the first video data is video data that has been produced offline, and the form may include short videos, micro-movies, performance programs, and so on. This embodiment does not limit the form of the first video data.
  • the so-called personalization can refer to the adaptation of the video data to the user currently logged in on the client (represented by an identifier (ID), etc.), that is, video data adapted to the user is screened based on a multi-objective optimization algorithm, a collaborative filtering algorithm, etc.
  • For the matching method, refer to Embodiment 3; this embodiment does not describe the matching method in detail.
  • the video data can be regarded as the first video data.
  • non-personalization means that the screening of video data does not depend on the user currently logged in on the client (indicated by an ID, etc.); instead, video data can be screened based on non-personalized factors such as video quality (combining clarity, play count, likes, comments, etc.) and popularity, and the screened video data is used as the first video data.
  • the server can query one or more pieces of element summary information marked on the first video data, where the element summary information indicates the audiovisual elements contained in the first video data, and send the element summary information of the one or more audiovisual elements to the client.
  • the so-called audiovisual elements can include visual elements (that is, elements that users can see) and audible elements (that is, elements that users can hear), and the form of the audiovisual elements can be set according to actual conditions, such as audio data, video data, beauty special effects, filters, etc.; the form of the audiovisual elements is not limited in this embodiment.
  • the element summary information may include text data, image data, etc., for example, the name of the audio data, the cover of the audio data, the name of the video data, the cover of the video data, the size, the author, the publisher, the number of users, and so on.
  • the element summary information may also carry other information, for example, the ID of the audiovisual element, etc., which is not limited in this embodiment.
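  • As a purely illustrative sketch (not part of the patent), such element summary information might be represented by a data structure along the following lines, with field names assumed for the example:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ElementSummaryInfo:
    """Hypothetical structure for the element summary information of one audiovisual element."""
    element_id: str            # identifier (ID) of the audiovisual element
    element_type: str          # e.g. "audio" (audible element) or "video" (visual element)
    name: str                  # e.g. the name of the audio data or video data
    cover_url: Optional[str]   # cover image of the audio/video data
    author: Optional[str] = None
    publisher: Optional[str] = None
    usage_count: int = 0       # number of users who have used this element

# Example: the song shown in the lower corners of the first user interface in FIG. 2A
quiet_night = ElementSummaryInfo(
    element_id="a-123", element_type="audio", name="Quiet Night",
    cover_url="https://example.com/cover.jpg", publisher="Little Red", usage_count=2270)
```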
  • the client can call the video player provided by the operating system to play the first video data, that is, the client generates a first user interface and displays it in the first user interface The picture of the first video data, and the speaker is driven to play the audio of the first video data.
  • the first user interface has a play area, and the play area is used to display a picture of the first video data.
  • the play area is a partial area of the first user interface.
  • the other information of the video data can be displayed in the area outside the play area.
  • the play area is the entire area of the first user interface; at this time, other information for the first video data can be displayed in floating form above this display area.
  • Other information for the first video data may include controls for expressing positive emotions (such as "like"), comment information, controls for sharing, a field for inputting comment information, and the like.
  • components such as VideoView, MediaPlayer, SurfaceView, Vitamio, and JCPlayer can be called to play the first video data.
  • the element summary information can be presented in the first user interface in a first data structure, either in a static manner (such as displaying the name of the audiovisual element as text) or in a dynamic manner (such as rotating the cover of the audiovisual element), so that the element summary information is displayed in the first data structure.
  • the one or more element summary information can be displayed above the play area in a floating form, so that the one or more element summary information is displayed in the first video data on the screen.
  • the picture of the first video data is displayed in the entire area, and the first video data has an audible element, that is, the song "Quiet Night".
  • the lower left corner of the first user interface 210 displays the title of the song and the publisher "Quiet Night-Little Red" (element summary information 211), and the cover of the song (element summary information 212) is displayed in the lower right corner of the first user interface 210.
  • in the first user interface 240, a picture of the first video data is displayed in a partial area, and the first video data has an audible element and a visual element, that is, the song "Exciting" and the video of Xiaogang stepping on a tightrope.
  • the name of the song and the publisher "Exciting-Xiao Ming" (element summary information)
  • the introductory text of the video, which guides the user to use the video to produce new video data
  • the number of users of the video, "Duet with Xiaogang (2.27K)" (element summary information 241)
  • the cover of the song is displayed in the lower right corner of the first user interface 240.
  • Step 102 Receive a first operation acting on the element summary information.
  • the user can trigger, through the human-computer interaction tool provided by the computer device, a first operation on the element summary information corresponding to an audiovisual element, so as to select the audiovisual element represented by that element summary information.
  • the human-computer interaction tools provided by them are different, and correspondingly, the ways of triggering the first operation through the human-computer interaction tools are also different.
  • the manner in which the tool triggers the first action is not limited.
  • if the human-computer interaction tool provided by the computer device is a touch screen, when a touch operation (such as a click operation, a long-press operation, a hard-press operation, etc.) occurring on a piece of element summary information is detected, it is determined that a first operation acting on the element summary information is received.
  • if the human-computer interaction tool provided by the computer device is an external device, after receiving a key event (such as a single-click event, a double-click event, a long-press event, etc.) sent by the external device and occurring on a piece of element summary information, it is determined that the first operation acting on the element summary information is received.
  • the external device includes, but is not limited to, a mouse, a remote control, and the like.
  • Step 103 In response to the first operation, jointly display video summary information and production controls of the second video data.
  • the server can collect video data in various ways, mark the video data with audiovisual elements (represented by ID, etc.), and store the video data in a local database of the server.
  • a specific visual element can be detected in the picture of the video data by calling an object detection model with a specific visual element as a target, If the specific visual element is detected, the video data is marked with the specific visual element.
  • the target detection model includes a first-order (One Stage) target detection model and a second-order (Two Stage) target detection model.
  • a target detection model that generates a series of candidate boxes as samples, and then classifies the samples through a convolutional neural network is called a second-order target detection model, for example, a regional convolutional neural network (Region-CNN, R-CNN), Spatial Pyramid Pooling Network (SPP-Net), Fast-RCNN, Faster-RCNN, etc.
  • a target detection model that does not generate candidate frames and directly converts the problem of target frame positioning into a regression problem is called a first-order target detection model, for example, Generalized Congruence Neural Network (GCNN), YOLO ( You Only Look Once), first-order multi-box prediction (Single Shot Mutibox Detector, SSD), etc.
  • the audio contained in the video data can be extracted, and the features of the audio can be extracted; if the features of the audio are the same as or similar to the features of a specific audible element, the video data is marked with that specific audible element.
  • the custom audiovisual element is compared with the original audiovisual elements; if the custom element is the same as or similar to an original element, the video data is marked with that original element; if the custom element is neither the same as nor similar to any original element, the custom element is assigned a new identifier (such as a new ID), and the video data is marked with the custom element (represented by the new identifier).
  • the visual element can be marked for the new video data.
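  • A minimal sketch of this marking flow, assuming hypothetical helpers detect_visual_elements, extract_audio_features and a similarity function (none of these names come from the patent; the detection and matching models themselves are outside the sketch):

```python
def mark_audiovisual_elements(video, known_elements, detect_visual_elements,
                              extract_audio_features, similarity, threshold=0.9):
    """Return the set of element IDs to mark on `video` (illustrative only)."""
    marks = set()
    # 1. Visual elements: run a target detection model over the frames of the video data.
    for element_id in detect_visual_elements(video.frames):
        marks.add(element_id)
    # 2. Audible elements: compare the extracted audio features with features of known elements.
    audio_features = extract_audio_features(video.audio)
    for element in known_elements:
        if element.kind == "audio" and similarity(audio_features, element.features) >= threshold:
            marks.add(element.element_id)
    # 3. Custom elements: if nothing matched a known element, assign a new identifier
    #    (simplified stand-in for the comparison of custom elements described above).
    if not marks:
        marks.add(f"custom-{video.video_id}")
    return marks
```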
  • the client can send a request carrying the identifier (such as an ID) of the audiovisual element to the server, requesting the server to search for video data containing the audiovisual element (represented by the ID, etc.).
  • when the server receives the request, it parses the identifier of the audiovisual element from the request, uses the identifier as a search condition in the server's local database, searches for the video data marked with the identifier, and writes the video data into a video collection.
  • the video data is referred to as the second video data in this embodiment
  • the video collection is referred to as the first video collection in this embodiment
  • the server extracts, from the local database, the video summary information (such as the cover, name, producer, etc.) of the second video data in the first video set, and sends the video summary information to the client.
  • the so-called mark means that the video data contains the audiovisual element corresponding to the mark, that is, the plurality of second video data contains the same audiovisual element.
  • the second video data may be sorted according to a preset sorting method, and each time the top n (n is a positive integer) second video data of the sorting are selected and sent to the client.
  • the sorting method may include non-personalized sorting methods such as descending sorting according to video quality, descending sorting according to video popularity, etc., so as to reduce the processing complexity and improve the processing speed.
  • a personalized sorting manner such as collaborative filtering may also be used, which is not limited in this embodiment.
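  • The recall-and-sort step might look roughly as follows; this is a sketch under assumptions (the field names and the concrete quality metric are made up; the patent only states that a non-personalized descending sort by video quality or popularity, with top-n paging, may be used):

```python
def recall_second_videos(db, element_id, page, n=10, sort_key="quality"):
    """Search the local database for videos marked with `element_id` and return one top-n page."""
    # First video set: all second video data marked with the requested audiovisual element.
    first_video_set = [v for v in db if element_id in v["marked_elements"]]
    # Non-personalized descending sort, e.g. by video quality or popularity.
    first_video_set.sort(key=lambda v: v[sort_key], reverse=True)
    # Return the video summary information for the requested page of n items.
    start = page * n
    return [
        {"video_id": v["video_id"], "cover": v["cover"], "name": v["name"], "producer": v["producer"]}
        for v in first_video_set[start:start + n]
    ]
```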
  • video summary information of the second video data sent by the server is received and cached locally on the client.
  • a second user interface is generated.
  • the element summary information includes element image data (such as the cover of audio data, the thumbnail of video data, etc.)
  • the element image can be displayed in the form of background data, that is, the image data in the element summary information is set as the background, and when setting, the image data can be blurred.
  • one or more information areas may be displayed in a waterfall or the like at a position below the element summary information, wherein the area of the information area matches the type of audiovisual element, that is, according to the type of audiovisual element Sets the size of the information area.
  • if the type of the audiovisual element is a visual element, an area whose size is a first value can be set as a first area, and the first area is displayed as an information area; if the type of the audiovisual element is an audible element, an area whose size is a second value can be set as a second area, and the second area is displayed as an information area, where the first value is greater than the second value, that is, the area of the first area is larger than the area of the second area. The display area is enlarged for visual elements, so that visual elements retain more detail when displayed and users can browse them more clearly.
  • when second video data is recalled into a first video set for a visual element, the video summary information of that second video data is displayed in the first area; when second video data is recalled into another first video set for an audible element, the video summary information of that second video data is displayed in the second area.
  • the video summary information of the second video data in the first video set is sequentially loaded into the plurality of information areas, so that the video summary information of the second video data is displayed in the information area.
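  • The sizing rule above amounts to a simple conditional; a trivial sketch (the concrete pixel values are invented purely for illustration):

```python
def information_area_size(element_type):
    """Pick the information-area size by audiovisual element type (illustrative values only)."""
    FIRST_VALUE = (360, 480)   # larger area for visual elements, keeps more detail visible
    SECOND_VALUE = (180, 240)  # smaller area for audible elements
    return FIRST_VALUE if element_type == "video" else SECOND_VALUE
```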
  • the production control is displayed above the information areas in a floating manner, and the user can trigger operations such as a sliding operation or a page-turning operation on the element summary information corresponding to the audiovisual element through the human-computer interaction tool provided by the computer device, so that the second user interface switches to display the video summary information of other second video data.
  • the position of the video summary information of the second video data changes, but the production control keeps its position and does not change with operations such as sliding and page turning.
  • the client can continue to request other second video data in the first video set from the server and display them in the second user interface, until all of the other second video data in the first video set have been requested.
  • if the user triggers a click operation (the first operation) on the title of the song and the publisher "Quiet Night-Little Red" (element summary information 211) displayed in the lower left corner of the first user interface 210, or triggers a click operation (the first operation) on the cover of the song (element summary information 212) displayed in the lower right corner of the first user interface 210, the second user interface 220 shown in FIG. 2B is displayed.
  • the cover of the song is blurred and set as the background, on which the cover of the song, the name of the song, the publisher of the song and the number of users of the song are displayed centrally, and nine smaller information areas are displayed.
  • the video summary information of the second video data corresponding to the song is loaded in sequence into these areas.
  • the second video data ranked third (i.e., "No.3"), in addition to the song, also includes audiovisual elements from other video data; since the user selected a song, the video summary information of this third-ranked ("No.3") second video data is displayed in a smaller information area.
  • the cover of the video is blurred and set as a background, on which the cover, the producer, and the background of the video are collectively displayed.
  • Step 104 Receive a second operation acting on the production control.
  • the user can trigger, through the human-computer interaction tool provided by the computer device, the second operation on the current production control, so as to produce new video data including the audiovisual element.
  • the provided human-computer interaction tools are different, and correspondingly, the ways of triggering the second operation through the human-computer interaction tools are also different.
  • the manner in which the tool triggers the second action is not limited.
  • if the human-computer interaction tool provided by the computer device is a touch screen, when a touch operation (such as a click operation, a long-press operation, a hard-press operation, etc.) occurring on the production control is detected, it is determined that the second operation acting on the production control is received.
  • if the human-computer interaction tool provided by the computer device is an external device, when a key event (such as a single-click event, a double-click event, a long-press event, etc.) sent by the external device and occurring on the production control is received, it is determined that the second operation acting on the production control is received.
  • the external device includes but is not limited to a mouse, a remote control, and the like.
  • Step 105 In response to the second operation, collect third video data, and add audiovisual elements corresponding to the element summary information to the third video data.
  • the client sends a request for downloading the audiovisual element (represented by an ID, etc.) to the server in response to the user triggering the second operation on the production control; after receiving the request, the server searches for the independent audiovisual element (represented by an ID, etc.) and sends the audiovisual element to the client.
  • the so-called independent can mean that the audiovisual element is an independent file and does not depend on the first video data and the second video data.
  • the format of the audiovisual element (such as resolution, sampling rate, size, etc.) conforms to the production specification, and the client can directly use the audiovisual element to produce new video data.
  • the client can generate a third user interface, generate a control for producing video data in the third user interface, call the camera of the computer device to preview video data in the third user interface, and, upon receiving an operation on the control, collect video data; for convenience of distinction, this video data is referred to as third video data in this embodiment.
  • the audiovisual elements corresponding to the element summary information are added to the third video data as the produced material.
  • the third video data is kept synchronized on the time axis with the audiovisual element.
  • the audiovisual element is started to be played, so that the user can preview the effect of adding the audiovisual element.
  • the collection of the third video data may be stopped.
  • the collection of the third video data may also be continued, which is not limited in this embodiment.
  • if the audiovisual element includes audio data, then in this example the audio data starts to be played at the same time as the third video data is collected, so that the audio data corresponding to the element summary information is set as the background music of the third video data.
  • if the audiovisual element includes video data, then for convenience of distinction, this video data may be referred to as fourth video data in this embodiment.
  • the fourth video data is played at the same time when the third video data is collected.
  • the third video data is displayed on the left and the fourth video data on the right, or the third video data is displayed on the right and the fourth video data on the left, or the third video data and the fourth video data are displayed in picture-in-picture form, so that the fourth video data corresponding to the element summary information and the third video data are synthesized in a split-screen manner.
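  • The two synthesis modes could, for example, be realized by invoking ffmpeg from the client or from a transcoding service; the following is only a sketch of one possible realization under that assumption, not the patent's implementation:

```python
import subprocess

def set_background_music(third_video, audio_element, out_path):
    """Use the audio element as the soundtrack of the collected third video (background music)."""
    subprocess.run([
        "ffmpeg", "-y", "-i", third_video, "-i", audio_element,
        "-map", "0:v", "-map", "1:a", "-c:v", "copy", "-shortest", out_path,
    ], check=True)

def split_screen(third_video, fourth_video, out_path):
    """Place the collected third video on the left and the fourth video on the right.
    Assumes both inputs have the same frame height (required by hstack)."""
    subprocess.run([
        "ffmpeg", "-y", "-i", third_video, "-i", fourth_video,
        "-filter_complex", "[0:v][1:v]hstack=inputs=2[v]",
        "-map", "[v]", "-map", "0:a?", out_path,
    ], check=True)
```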
  • As shown in FIG. 2E, if the user triggers a click operation (the second operation) on the "Join" production control 251 displayed at the bottom of the second user interface 250, then, as shown in FIG. 2F, the camera is called to preview in the third user interface 260; when the confirmation operation triggered through the circular control is received, the third video data is collected and displayed on the left, and the video of Xiaogang stepping on the tightrope is played on the right, so that the video of Xiaogang stepping on the tightrope is merged with the third video data.
  • the third video data can be sent to the server; the server receives the third video data sent by the client and marks the third video data with the audiovisual element (represented by an ID, etc.); after the marking is completed, the server publishes the third video data, so the client publishes the third video data marked with the audiovisual element, and other clients can download the third video data from the server for playback and browsing by their users.
  • In this embodiment, when the first video data is played, element summary information is displayed, where the element summary information indicates the audiovisual elements contained in the first video data; a first operation acting on the element summary information is received; in response to the first operation, video summary information and production controls of second video data are jointly displayed, where the second video data includes the audiovisual element; a second operation acting on the production control is received; in response to the second operation, third video data is collected, and the audiovisual element corresponding to the element summary information is added to the third video data.
  • users can use the audiovisual elements contained in the existing video data to create new video data.
  • the audiovisual elements do not depend on the template of the system, and the channels are diversified, which can maintain the individualization of the audiovisual elements.
  • the audiovisual elements provided by the system can ensure that the format of the audiovisual elements conforms to the production specifications, so that they can be directly used to produce new video data, avoiding the need for users to revise the elements with professional applications, which greatly lowers the technical threshold, reduces the time consumed, and thus reduces the cost of producing video data.
  • Step 301 When playing the first video data, display element summary information.
  • the element summary information indicates audiovisual elements contained in the first video data.
  • Step 302 Receive a first operation acting on the element summary information.
  • Step 303 In response to the first operation, jointly display video summary information and production controls of the second video data.
  • the second video data contains the audiovisual element.
  • Step 304 Receive a third operation acting on the video summary information.
  • the user can trigger, through a human-computer interaction tool provided by the computer device, a third operation on the corresponding video summary information, thereby selecting the second video data to which the video summary information belongs.
  • the provided human-computer interaction tools are different, and correspondingly, the ways of triggering the third operation through the human-computer interaction tools are also different.
  • the manner in which the tool triggers the third action is not limited.
  • if the human-computer interaction tool provided by the computer device is a touch screen, when a touch operation (such as a click operation, a long-press operation, a hard-press operation, etc.) occurring on a piece of video summary information is detected, it is determined that a third operation acting on the video summary information is received.
  • if the human-computer interaction tool provided by the computer device is an external device, after receiving a key event (such as a single-click event, a double-click event, a long-press event, etc.) sent by the external device and occurring on a piece of video summary information, it is determined that a third operation acting on the video summary information is received.
  • the external device includes, but is not limited to, a mouse, a remote control, and the like.
  • Step 305 In response to the third operation, play the second video data to which the video summary information belongs.
  • the client may request the server to play the second video data in the form of a URL (carrying the identifier of the second video data, such as ID, etc.), and after the server receives the request , part or all of the second video data can be sent to the client.
  • the client can call the video player provided by the operating system to play the second video data, that is, the client generates a first user interface and displays it in the first user interface A picture of the second video data, and driving a speaker to play the audio of the second video data.
  • the second video data that the user may like can be pushed in a concentrated manner, which reduces the user's operations of searching for similar video data through keywords, page turning, etc., and reduces the occupation of resources such as processor resources, memory resources and bandwidth resources caused by searching for similar video data.
  • Step 306 Receive a fourth operation acting on the first video data.
  • the user can trigger, through a human-computer interaction tool provided by the computer device, a fourth operation on the first video data, so that the first user interface switches to play other first video data.
  • the human-computer interaction tools provided by them are different, and accordingly, the ways of triggering the fourth operation through the human-computer interaction tools are also different.
  • the manner in which the tool triggers the fourth operation is not limited.
  • if the human-computer interaction tool provided by the computer device is a touch screen, when the touch screen detects a touch operation (such as a sliding operation, etc.) occurring in the blank area of the first user interface (the area other than controls, element summary information and other operable data), it is determined that a fourth operation acting on the first video data is received.
  • if the human-computer interaction tool provided by the computer device is an external device, after receiving a key event (such as a drag event, etc.) sent by the external device and occurring in the blank area of the first user interface (the area other than controls, element summary information and other operable data), it is determined that a fourth operation acting on the first video data is received.
  • the external device includes, but is not limited to, a mouse, a remote control, and the like.
  • Step 307 In response to the fourth operation, play other first video data adapted to the current user, or other first video data including other audiovisual elements.
  • in response to the fourth operation triggered by the user to switch the first video data, the client can request the server, in the form of a URL, etc., to play other first video data adapted to the current user; after receiving the request, the server can send part or all of the other first video data adapted to the current user to the client.
  • the client can call the video player provided by the operating system to play the other first video data adapted to the current user, that is, in the first user interface, the picture is switched to display the other first video data adapted to the current user, and the speaker is driven to switch to playing the audio of the other first video data adapted to the current user.
  • if the currently playing first video data is non-personalized pushed video data and contains other audiovisual elements, the other first video data is video data in the video set corresponding to those other audiovisual elements, that is, the video set whose video data is marked as containing the other audiovisual elements; for convenience of distinction, this video set is referred to as the second video set in this embodiment.
  • the client may request the server to play other first video data in the second video set, and after receiving the request, the server may send the first video to the client. Part or all of the other first video data in the second video set.
  • the client can call the video player provided by the operating system to play the other first video data in the second video set, that is, in the first user interface, the picture is switched to display the other first video data in the second video set, and the speaker is driven to switch to playing the audio of the other first video data in the second video set.
  • the user can trigger a return operation for the return control in the first user interface through a touch operation or the like
  • the client receives the return operation acting on the return control, and, in response to the return operation, displays the second user interface, Video summary information of the first video data in the second video set is displayed in the second user interface.
  • the types of the first video data are distinguished, and other first video data adapted to the user and other first video data in the second video set are respectively pushed for personalized and non-personalized service scenarios, which can ensure the accuracy of switching the first video data and meet the requirements of business scenarios.
  • the video data production device can be implemented by software and/or hardware, and can be configured in computer equipment, such as a server, a workstation, etc., and includes the following steps:
  • Step 401 Send the first video data to the client.
  • the operating system of the computer device may include Unix, Linux, Windows Server, Netware, etc. These operating systems support running a server, and the server is configured to provide video services to multiple clients, such as pushing video data, publishing video data, and so on.
  • the server may determine the first video data in a personalized or non-personalized manner, and send part or all of the first video data to the client.
  • the server may send element summary information of one or more audiovisual elements to the client, where the element summary information represents the first video The audiovisual elements that the data contains.
  • the client is configured to display the one or more element summary information on the first user interface when playing the first video data.
  • step 401 may include the following steps:
  • Step 4011 Acquire historical data recorded when the user browses the video data.
  • the user browses the video data on the client side, and the server side records the information during the browsing process in a log file and stores it in the database.
  • the server can query, in the log file of the database, the historical data recorded when the user browses video data, in order to subsequently screen video data suitable for the user.
  • Step 4012 Extract features from historical data as behavior features.
  • the behavioral characteristic may include at least one of the following:
  • characteristics of the user may be collected from the historical data as user features.
  • the user characteristics include characteristics inherent to the user, eg, ID (ie, User ID (UID)), gender, age, country, and the like.
  • the user features include user dynamic features, for example, viewing behaviors in a recent period of time, interaction behaviors in a recent period of time, preferences for multiple types of video data in a recent period of time, and so on.
  • features of video data may be collected from historical data as video features.
  • the video features include features inherent to the video data, such as ID (ie, Video ID (VID)), length, tag, UID of the photographer (the user who made the video data), and the like.
  • the video features include dynamic features of the video data, for example, the number of times pushed to users in a recent period of time, the number of times it was viewed in a recent period of time, the number of times it was liked in a recent period of time, and so on.
  • the characteristics of the environment where the user browses the video data can be collected from the historical data, as the context characteristics, for example, the time of requesting to browse the video data, the location of the request to browse the video data, the network status of the request to browse the video data, etc.
  • At least two of the user feature, the video feature, and the context feature may be combined to obtain a cross feature, thereby increasing the dimension of the feature.
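  • A sketch of assembling the behavior features described above (the concrete field names are illustrative assumptions; the patent only enumerates the feature categories):

```python
def build_behavior_features(history, user, video, context):
    """Combine user, video and context features, plus simple cross features, into one dict."""
    user_features = {
        "uid": user["uid"], "gender": user["gender"], "age": user["age"], "country": user["country"],
        "recent_watch_count": len(history["watched"]),          # dynamic user feature
    }
    video_features = {
        "vid": video["vid"], "length": video["length"], "tag": video["tag"],
        "author_uid": video["author_uid"], "recent_push_count": video["recent_push_count"],
    }
    context_features = {
        "request_time": context["time"], "location": context["location"], "network": context["network"],
    }
    # Cross features: combine at least two of the above to increase the feature dimension.
    cross_features = {
        "uid_x_tag": f'{user["uid"]}_{video["tag"]}',
        "country_x_network": f'{user["country"]}_{context["network"]}',
    }
    return {**user_features, **video_features, **context_features, **cross_features}
```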
  • Step 4013 Use the behavior feature to predict multiple probabilities corresponding to the user performing multiple target behaviors on the video data.
  • a multi-task learning model can be set, and the multi-task learning model can be used to calculate the probabilities that the user performs multiple (two or more) target behaviors (such as clicking, playing duration, liking, commenting, sharing, favoriting, following, etc.) on the video data; the probability is expressed as follows:
  • where u_i represents the i-th user, v_j represents the j-th video data, and t is the current moment, so the probability is abbreviated as p_{i,j}.
  • a target behavior of conversion from consumption to production is added, that is, requesting other video data containing the same audiovisual elements as the video data in order to produce new video data; for this target behavior, reference may be made to steps 101-105.
  • video data whose audiovisual elements were triggered can be set as positive samples, and the negative samples are video data that has been viewed without the audiovisual elements being triggered.
  • the multi-task learning model can be a neural network, such as a deep neural network (Deep Neural Networks, DNN), etc., or can be other machine learning models, such as a logistic regression (Logistics Regression, LR) model, user click-through rate (Click-Through- Rate, CTR) model, etc., the type of the multi-task learning model is not limited in this embodiment.
  • the multi-task learning model can be trained based on multi-task learning.
  • Multi-task learning is a learning method derived from transfer learning. Multiple goals (such as the target behaviors in this embodiment) are put together to learn from each other; the information shared among related goals (such as the target behaviors in this embodiment) and the noise introduced by unrelated goals can improve the generalization ability of the multi-task learning model to a certain extent.
  • Multi-task learning belongs to the category of transfer learning.
  • the main difference between multi-task learning and transfer learning is that multi-task learning improves the effect of the model through multiple targets (such as the target behaviors in this example), while usual transfer learning uses other targets to improve the learning effect of a single target.
  • a model based on parameter sharing can be used as a multi-task learning model.
  • the multi-task learning model receives the same input (Input), the underlying network shares model parameters, multiple target behaviors (such as Task1, Task2, Task3, Task4, etc.) learn from each other, and the gradients are back-propagated at the same time, which can improve the generalization ability of the multi-task learning model.
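  • A minimal shared-bottom multi-task model of the kind FIG. 5 suggests, sketched in PyTorch; the layer sizes and the particular set of task heads (including the added consumption-to-production head named "produce" here) are assumptions for illustration:

```python
import torch
import torch.nn as nn

class SharedBottomMultiTask(nn.Module):
    """Shared underlying network with one sigmoid head per target behavior."""
    def __init__(self, input_dim, hidden_dim=128,
                 tasks=("click", "duration", "like", "comment", "share", "produce")):
        super().__init__()
        # Shared bottom: all target behaviors share these model parameters.
        self.shared = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # One small head per target behavior (Task1, Task2, ...).
        self.heads = nn.ModuleDict({t: nn.Linear(hidden_dim, 1) for t in tasks})

    def forward(self, x):
        h = self.shared(x)
        # Probability p_{i,j} of the user performing each target behavior on the video data.
        return {t: torch.sigmoid(head(h)).squeeze(-1) for t, head in self.heads.items()}

# Gradients of all task losses back-propagate through the shared bottom at the same time, e.g.:
# loss = sum(nn.functional.binary_cross_entropy(preds[t], labels[t]) for t in preds)
```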
  • Step 4014 fuse the multiple probabilities into a quality value of the video data for the user.
  • the quality value of the video data for the user can be evaluated, and the quality value can be used to indicate the degree of the user's preference for the video data under the target dimension.
  • the quality value is positively correlated with the probability, that is, the higher the probability, the greater the quality value, and the lower the probability, the smaller the quality value.
  • multiple probabilities can be fused into a quality value of video data for the user by means of linear fusion, and feature weights are configured for each probability.
  • the larger the feature weight the more important the target behavior is.
  • the product between each probability and the feature weight corresponding to each probability is calculated as the feature value, and the sum of all the feature values is calculated as the quality value of the video data for the user.
  • that is, the quality value can be written as q_{i,j} = Σ_l w_l · p_{i,j,l}, where w_l is the feature weight of the l-th target behavior.
  • Step 4015 If the quality value satisfies the preset recall condition, set the video data to which the quality value belongs as the first video data adapted to the user.
  • recall conditions can be preset, for example, the n (n is a positive integer) highest quality values, quality values greater than a threshold, the top m% (m is a positive number) of quality values, and so on.
  • the video data to which the quality value belongs is set as the first video data adapted to the user, and at this time, the association between the user's identifier and the identifier of the first video data can be recorded.
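  • Steps 4014-4015 reduce to a weighted sum followed by a recall filter; a short sketch, with the weights and the top-n cutoff being made-up values for illustration only:

```python
def quality_value(probabilities, weights):
    """Linear fusion: the sum of each probability times its feature weight, q = sum_l w_l * p_l."""
    return sum(weights[task] * p for task, p in probabilities.items())

def recall_first_videos(scored_videos, n=100):
    """Recall condition: keep the n videos with the highest quality values for this user."""
    return sorted(scored_videos, key=lambda item: item["quality"], reverse=True)[:n]

# Example with hypothetical weights that emphasize the consumption-to-production behavior:
weights = {"click": 0.1, "duration": 0.3, "like": 0.15, "comment": 0.1, "share": 0.1, "produce": 0.25}
probs = {"click": 0.8, "duration": 0.6, "like": 0.2, "comment": 0.05, "share": 0.1, "produce": 0.3}
q = quality_value(probs, weights)
```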
  • Step 4016 Send the first video data to the client.
  • steps 4011 to 4015 can be performed offline.
  • when a user (represented by an ID, etc.) requests video data online, the user's identifier can be used as a search condition to find the identifier of the first video data associated with the user's identifier; the first video data is then found based on that identifier, and the first video data is sent to the client.
  • Step 402 When a request triggered by the client based on the element summary information is received, search for second video data containing audiovisual elements.
  • when receiving the first operation acting on the element summary information, the client generates a request, sends the request to the server, and requests the server to push the second video data including the audiovisual element corresponding to the element summary information.
  • the server can collect video data in various ways, mark the video data with audiovisual elements (represented by ID, etc.), and store the video data in the local database of the server.
  • the server can search for the video data marked with the audiovisual element as the second video data, and write the second video data into the first video set.
  • Step 403 Send the video summary information of the second video data to the client.
  • the server extracts the video summary information (such as the cover, name, producer, etc.) of the second video data in the first video set from the local database, and sends the video summary information to the client.
  • the client may be configured to jointly display the video summary information and production controls on the second user interface.
  • Step 404 When a request triggered by the client based on the production control is received, send the audiovisual element corresponding to the element summary information to the client.
  • when receiving the second operation acting on the production control, the client generates a request, sends the request to the server, and requests the server to push the audiovisual element (represented by an ID, etc.) corresponding to the element summary information.
  • after receiving the request, the server searches for the independent audiovisual element (represented by an ID, etc.) corresponding to the element summary information, and sends the audiovisual element to the client.
  • the client may be configured to collect third video data, and add the audiovisual element corresponding to the element summary information to the third video data.
  • the audiovisual element includes audio data
  • the audio data corresponding to the element summary information may be sent to the client, and the client may be configured to set the audio data corresponding to the element summary information as the background music of the third video data.
  • the audiovisual element includes fourth video data
  • the fourth video data corresponding to the element summary information may be sent to the client, and the client may be configured to synthesize, in a split-screen manner, the fourth video data corresponding to the element summary information and the third video data.
  • the client can upload the third video data, and the server can receive the third video data sent by the client and mark the third video data with the audiovisual element (represented by an ID, etc.); after the marking is completed, the third video data is published, so that other clients can browse the third video data.
  • the first video data is sent to the client, and the client is set to display element summary information when the first video data is played, and the element summary information indicates the audiovisual elements contained in the first video data.
  • in the case of receiving a request triggered by the client based on the element summary information, the server searches for the second video data containing the audiovisual element, and sends the video summary information of the second video data to the client.
  • the client terminal is set to jointly display the video summary information and make controls.
  • in the case of receiving a request triggered by the client based on the production control, the server sends the audiovisual element corresponding to the element summary information to the client; the client is configured to collect the third video data and add the audiovisual element corresponding to the element summary information to the third video data.
  • users can use the audiovisual elements contained in the existing video data to create new video data.
  • the audiovisual elements do not depend on the template of the system, and the channels are diversified, which can maintain the personalization of the audiovisual elements, thereby ensuring the individuality of the newly produced video data.
  • the system provides audio-visual elements, which can ensure that the format of the audio-visual elements conforms to the production specifications, and can be directly used to produce new video data, avoiding the need for users to use professional applications to revise the elements, greatly reducing the technical threshold, Time consuming is reduced, thereby reducing the cost of producing video data.
  • FIG. 6 is a flowchart of a method for producing video data according to Embodiment 4 of the present application. Based on the foregoing embodiments, this embodiment adds operations of switching the first video data and playing the second video data.
  • the method includes the following steps: step:
  • Step 601 Send the first video data to the client.
  • the client is configured to display element summary information when the first video data is played, where the element summary information indicates audiovisual elements included in the first video data.
  • Step 602 When a request triggered by the client based on the element summary information is received, search for second video data containing audiovisual elements.
  • Step 603 Send the video summary information of the second video data to the client.
  • the client is set to jointly display the video summary information and the production controls.
  • Step 604 When a request triggered by the client based on the video summary information is received, send the second video data to which the video summary information belongs to the client for playback.
  • when receiving the third operation acting on the video summary information, the client generates a request, sends the request to the server, and requests the server to push the second video data corresponding to the video summary information.
  • the server can search for the second video data corresponding to the video summary information, and send part or all of the second video data to the client.
  • after buffering part or all of the second video data, the client can call the video player provided by the operating system to play the second video data.
  • the second video data that the user may like can be pushed in a concentrated manner, which reduces the user's operations of searching for similar video data through keywords, page turning, etc., and reduces the occupation of resources such as processor resources, memory resources and bandwidth resources caused by searching for similar video data.
  • Step 605 When receiving a request triggered by the client based on the first video data, send other first video data adapted to the user to the client for playback, or send other first video data containing other audiovisual elements to the client for playback.
  • when receiving the fourth operation acting on the first video data, the client generates a request and sends it to the server, requesting the server to push other first video data.
  • when receiving the request, the server identifies the type of the first video data, so as to distinguish between and push different first video data.
  • for the personalized case, the server can send to the client part or all of the data of other first video data adapted to the current user.
  • after buffering part or all of the other first video data adapted to the current user, the client can call the video player provided by the operating system to play that first video data.
  • for the non-personalized case, the server can send to the client part or all of the data of other first video data in the second video set.
  • after buffering part or all of the other first video data in the second video set, the client can call the video player provided by the operating system to play that first video data.
  • in this way, the types of the first video data are distinguished, and other first video data adapted to the user and other first video data in the second video set are respectively pushed for personalized and non-personalized service scenarios, which can ensure that the accuracy of switching the first video data meets the requirements of the business scenarios; a rough sketch of this branching is given below.
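  • The sketch below is illustrative only; the type value, helper names, and return shape are assumptions and not part of the original disclosure.

```python
# Rough sketch of the step 605 branching (assumed type values and helpers).
def recommend_for_user(user_id):
    # Placeholder for a personalized recommendation / ranking-model call.
    return {"video_id": f"rec-for-{user_id}"}

def next_from_second_video_set(set_id):
    # Placeholder for cursor-style iteration over a non-personalized video set.
    return {"video_id": f"{set_id}-next"}

def handle_switch_request(user_id, current_video):
    if current_video.get("type") == "personalized":
        nxt = recommend_for_user(user_id)          # other first video data adapted to the user
    else:
        nxt = next_from_second_video_set(current_video["set_id"])  # from the second video set
    return {"video_id": nxt["video_id"], "chunk": b"buffered segment ..."}

print(handle_switch_request("u1", {"type": "personalized"}))
print(handle_switch_request("u1", {"type": "set", "set_id": "s7"}))
```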
  • FIG. 7 is a structural block diagram of an apparatus for producing video data according to Embodiment 5 of the present application, which may include the following modules:
  • the display screen 701 is configured to display element summary information when the first video data is played, where the element summary information represents the audiovisual element contained in the first video data; the touch screen 702 is configured to receive a first operation acting on the element summary information; the display screen 701 is further configured to, in response to the first operation, jointly display video summary information of second video data and a production control, where the second video data includes the audiovisual element; the touch screen 702 is further configured to receive a second operation acting on the production control; the camera 703 is configured to collect third video data in response to the second operation; and the processor 704 is configured to add the audiovisual element corresponding to the element summary information to the third video data.
  • the apparatus for producing video data provided by the embodiment of the present application can execute the method for producing video data provided by any embodiment of the present application, and has functional modules and effects corresponding to the execution method.
  • FIG. 8 is a structural block diagram of an apparatus for producing video data according to Embodiment 6 of the present application, which may include the following modules:
  • the first video data sending module 801 is configured to send the first video data to the client, where the client is configured to display element summary information when the first video data is played, and the element summary information indicates the audiovisual element contained in the first video data;
  • the second video data search module 802 is configured to search for second video data containing the audiovisual element when receiving a request triggered by the client based on the element summary information;
  • the video summary information sending module 803 is configured to send the video summary information of the second video data to the client, where the client is configured to jointly display the video summary information and a production control;
  • the audiovisual element sending module 804 is configured to, when a request triggered by the client based on the production control is received, send the audiovisual element corresponding to the element summary information to the client, where the client is configured to collect third video data and add the audiovisual element corresponding to the element summary information to the third video data.
  • the apparatus for producing video data provided by the embodiment of the present application can execute the method for producing video data provided by any embodiment of the present application, and has functional modules and effects corresponding to the execution method.
  • FIG. 9 is a schematic structural diagram of a computer device according to Embodiment 7 of the present application.
  • Figure 9 shows a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present application.
  • computer device 12 takes the form of a general-purpose computing device.
  • Components of computer device 12 may include, but are not limited to, one or more processors or processing units 16 , system memory 28 , and a bus 18 connecting various system components including system memory 28 and processing unit 16 .
  • System memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32 .
  • Storage system 34 may be configured to read and write to non-removable, non-volatile magnetic media (not shown in Figure 9, commonly referred to as a "hard disk drive").
  • a program/utility 40 having a set (at least one) of program modules 42 may be stored in memory 28, for example.
  • Computer device 12 may also communicate with one or more external devices 14 (eg, keyboard, pointing device, display 24, etc.). Such communication may take place through an input/output (I/O) interface 22 . Also, computer device 12 may communicate with one or more networks (eg, Local Area Network (LAN), Wide Area Network (WAN), and/or public networks such as the Internet) through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18 .
  • the processing unit 16 executes a variety of functional applications and data processing by running the programs stored in the system memory 28, for example, implementing the video data production method provided by the embodiments of the present application.
  • Embodiment 8 of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the method for producing video data described above is implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • User Interface Of Digital Computer (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Disclosed are a method and apparatus for manufacturing video data, and a computer device and a storage medium. The method for manufacturing video data comprises: when first video data is played, displaying element digest information, wherein the element digest information is used for representing an audiovisual element included in the first video data; receiving a first operation that acts on the element digest information; in response to the first operation, jointly displaying video digest information of second video data and a manufacturing control, wherein the second video data includes the audiovisual element; receiving a second operation that acts on the manufacturing control; and in response to the second operation, collecting third video data, and adding, to the third video data, the audiovisual element corresponding to the element digest information.

Description

视频数据的制作方法、装置、计算机设备和存储介质Video data production method, device, computer equipment and storage medium
本申请要求在2020年08月31日提交中国专利局、申请号为202010896513.0的中国专利申请的优先权,该申请的全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application with application number 202010896513.0 filed with the China Patent Office on August 31, 2020, the entire contents of which are incorporated herein by reference.
技术领域technical field
本申请涉及多媒体的技术领域,例如涉及一种视频数据的制作方法、装置、计算机设备和存储介质。The present application relates to the technical field of multimedia, for example, to a method, apparatus, computer equipment and storage medium for producing video data.
背景技术Background technique
随着移动终端的广泛普及,用户可以随时随地使用移动终端制作视频数据,如短视频等,并发布到互联网上的平台。With the widespread popularity of mobile terminals, users can use mobile terminals to create video data, such as short videos, anytime, anywhere, and publish them to platforms on the Internet.
在制作视频数据时,用户通常向视频数据中添加多种元素,从而提高视频数据的精彩程度。When producing video data, users usually add various elements to the video data, so as to improve the splendor of the video data.
用户向视频数据添加的元素,在先多为平台提供的模板,但是,平台提供的模板较少,而制作视频数据的用户较多,导致使用这些模板制作的视频数据同质化较为明显,因此,许多用户手动搜集元素,以实现元素的个性化,从而实现视频数据的个性化,例如,从网上下载数据作为元素,从其他视频数据中解析数据作为元素,等等。The elements added by users to the video data are mostly templates provided by the platform. However, the platform provides fewer templates, and more users produce video data, which leads to the obvious homogeneity of the video data produced using these templates. Therefore, , many users manually collect elements to realize element personalization, thereby realizing the personalization of video data, for example, downloading data from the Internet as elements, parsing data from other video data as elements, and so on.
由于该元素的格式(如分辨率、采样率、体积大小等)可能不符合制作的规范,往往需要用户使用专业的应用对元素进行修正,如裁剪、压缩等,技术门槛较高,耗时较多,导致制作视频数据的成本较高。Since the format of the element (such as resolution, sampling rate, size, etc.) may not meet the production specifications, users are often required to use professional applications to correct the element, such as cropping, compression, etc., which requires a high technical threshold and takes a long time. The cost of producing video data is relatively high.
发明内容SUMMARY OF THE INVENTION
本申请提出了一种视频数据的制作方法、装置、计算机设备和存储介质,以解决在保持视频数据个性化的条件下,如何降低制作视频数据的成本的问题。The present application proposes a method, device, computer equipment and storage medium for producing video data, so as to solve the problem of how to reduce the cost of producing video data under the condition of keeping the individuality of video data.
本申请提供了一种视频数据的制作方法,包括:The application provides a method for producing video data, including:
在播放第一视频数据的情况下,显示元素摘要信息,其中,所述元素摘要信息用于表示所述第一视频数据包含的视听元素;In the case of playing the first video data, element summary information is displayed, wherein the element summary information is used to represent audiovisual elements included in the first video data;
接收作用于所述元素摘要信息的第一操作;receiving a first operation acting on the element summary information;
响应于所述第一操作,共同显示第二视频数据的视频摘要信息以及制作控件,其中,所述第二视频数据包含所述视听元素;in response to the first operation, collectively displaying video summary information and production controls for second video data, wherein the second video data includes the audiovisual element;
接收作用于所述制作控件的第二操作;receiving a second operation acting on the production control;
响应于所述第二操作,采集第三视频数据,将所述元素摘要信息对应的视听元素添加至所述第三视频数据中。In response to the second operation, third video data is collected, and audiovisual elements corresponding to the element summary information are added to the third video data.
本申请还提供了一种视频数据的制作方法,包括:The present application also provides a method for producing video data, including:
将第一视频数据发送至客户端,其中,所述客户端设置为在播放所述第一视频数据的情况下,显示元素摘要信息,所述元素摘要信息用于表示所述第一视频数据包含的视听元素;Send the first video data to the client, wherein the client is configured to display element summary information when the first video data is played, and the element summary information is used to indicate that the first video data contains audiovisual elements;
在接收到所述客户端基于所述元素摘要信息触发的请求的情况下,查找包含所述视听元素的第二视频数据;in the case of receiving a request triggered by the client based on the element summary information, searching for second video data containing the audiovisual element;
将所述第二视频数据的视频摘要信息发送至所述客户端,其中,所述客户端还设置为共同显示所述视频摘要信息以及制作控件;sending the video summary information of the second video data to the client, wherein the client is further configured to jointly display the video summary information and production controls;
在接收到所述客户端基于所述制作控件触发的请求的情况下,将所述元素摘要信息对应的视听元素发送至所述客户端,所述客户端还设置为采集第三视频数据,将所述元素摘要信息对应的视听元素添加至所述第三视频数据中。In the case of receiving the request triggered by the client based on the production control, the audiovisual element corresponding to the element summary information is sent to the client, and the client is further configured to collect third video data, The audiovisual element corresponding to the element summary information is added to the third video data.
本申请还提供了一种视频数据的制作装置,包括:The present application also provides a device for producing video data, including:
显示屏,设置为在播放第一视频数据的情况下,显示元素摘要信息,其中,所述元素摘要信息用于表示所述第一视频数据包含的视听元素;a display screen, configured to display element summary information when the first video data is played, wherein the element summary information is used to represent audiovisual elements included in the first video data;
触控屏,设置为接收作用于所述元素摘要信息的第一操作;a touch screen, configured to receive a first operation acting on the element summary information;
显示屏,还设置为响应于所述第一操作,共同显示第二视频数据的视频摘要信息以及制作控件,其中,所述第二视频数据包含所述视听元素;a display screen, further configured to jointly display video summary information and production controls of second video data in response to the first operation, wherein the second video data includes the audiovisual element;
触控屏,还设置为接收作用于所述制作控件的第二操作;a touch screen, further configured to receive a second operation acting on the production control;
摄像头,设置为响应于所述第二操作,采集第三视频数据;a camera, configured to collect third video data in response to the second operation;
处理器,设置为将所述元素摘要信息对应的视听元素添加至所述第三视频数据中。The processor is configured to add the audiovisual element corresponding to the element summary information to the third video data.
本申请还提供了一种视频数据的制作装置,包括:The present application also provides a device for producing video data, including:
第一视频数据发送模块,设置为将第一视频数据发送至客户端,其中,所述客户端设置为在播放所述第一视频数据的情况下,显示元素摘要信息,所述元素摘要信息用于表示所述第一视频数据包含的视听元素;The first video data sending module is configured to send the first video data to the client, wherein the client is configured to display element summary information when the first video data is played, and the element summary information uses to represent the audiovisual elements contained in the first video data;
第二视频数据查找模块,设置为在接收到所述客户端基于所述元素摘要信息触发的请求的情况下,查找包含所述视听元素的第二视频数据;A second video data search module, configured to search for the second video data containing the audiovisual element in the case of receiving a request triggered by the client based on the element summary information;
视频摘要信息发送模块,设置为将所述第二视频数据的视频摘要信息发送 至所述客户端,其中,所述客户端还设置为共同显示所述视频摘要信息以及制作控件;A video summary information sending module, configured to send the video summary information of the second video data to the client, wherein the client is also set to jointly display the video summary information and the production control;
视听元素发送模块,设置为在接收到所述客户端基于所述制作控件触发的请求的情况下,将所述元素摘要信息对应的视听元素发送至所述客户端,其中,所述客户端还设置为采集第三视频数据,将所述元素摘要信息对应的视听元素添加至所述第三视频数据中。The audiovisual element sending module is configured to send the audiovisual element corresponding to the element summary information to the client when receiving the request triggered by the client based on the production control, wherein the client also further Setting is to collect third video data, and add audiovisual elements corresponding to the element summary information to the third video data.
本申请还提供了一种计算机设备,所述计算机设备包括:The present application also provides a computer device, the computer device comprising:
一个或多个处理器;one or more processors;
存储器,设置为存储一个或多个程序;memory, arranged to store one or more programs;
当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现上述的视频数据的制作方法。When the one or more programs are executed by the one or more processors, the one or more processors implement the above-mentioned method for producing video data.
本申请还提供了一种计算机可读存储介质,所述计算机可读存储介质上存储计算机程序,所述计算机程序被处理器执行时实现上述的视频数据的制作方法。The present application also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the above-mentioned method for producing video data is implemented.
附图说明Description of drawings
图1为本申请实施例一提供的一种视频数据的制作方法的流程图;1 is a flowchart of a method for producing video data according to Embodiment 1 of the present application;
图2A为本申请实施例一提供的一种制作视频数据的示例图;FIG. 2A is an exemplary diagram of producing video data according to Embodiment 1 of the present application;
图2B为本申请实施例一提供的另一种制作视频数据的示例图;FIG. 2B is another exemplary diagram of producing video data according to Embodiment 1 of the present application;
图2C为本申请实施例一提供的另一种制作视频数据的示例图;FIG. 2C is another exemplary diagram of producing video data according to Embodiment 1 of the present application;
图2D为本申请实施例一提供的另一种制作视频数据的示例图;FIG. 2D is another exemplary diagram of producing video data according to Embodiment 1 of the present application;
图2E为本申请实施例一提供的另一种制作视频数据的示例图;FIG. 2E is another exemplary diagram of producing video data according to Embodiment 1 of the present application;
图2F为本申请实施例一提供的另一种制作视频数据的示例图;FIG. 2F is another exemplary diagram of producing video data according to Embodiment 1 of the present application;
图3为本申请实施例二提供的一种视频数据的制作方法的流程图;3 is a flowchart of a method for producing video data according to Embodiment 2 of the present application;
图4为本申请实施例三提供的一种视频数据的制作方法的流程图;4 is a flowchart of a method for producing video data according to Embodiment 3 of the present application;
图5为本申请实施例三提供的一种多任务学习模型的结构示意图;5 is a schematic structural diagram of a multi-task learning model provided in Embodiment 3 of the present application;
图6为本申请实施例四提供的一种视频数据的制作方法的流程图;6 is a flowchart of a method for producing video data according to Embodiment 4 of the present application;
图7为本申请实施例五提供的一种视频数据的制作装置的结构示意图;7 is a schematic structural diagram of an apparatus for producing video data according to Embodiment 5 of the present application;
图8为本申请实施例六提供的一种视频数据的制作装置的结构示意图;8 is a schematic structural diagram of an apparatus for producing video data according to Embodiment 6 of the present application;
图9为本申请实施例七提供的一种计算机设备的结构示意图。FIG. 9 is a schematic structural diagram of a computer device according to Embodiment 7 of the present application.
具体实施方式detailed description
下面结合附图和实施例对本申请进行说明。The present application will be described below with reference to the accompanying drawings and embodiments.
一般情况,视频平台需要维护一个良好的消费和生产的生态,即,对于消费方面,视频平台竭力为用户推送符合用户兴趣偏好的视频数据,获取用户更高的消费时长和满意度;对于生产方面,视频平台也要激励用户更多地拍摄视频数据并上传至视频平台发布,丰富视频平台的内容,而更丰富的内容能够使得用户更容易获得符合其兴趣偏好的视频数据,形成良性循环。In general, video platforms need to maintain a good ecology of consumption and production, that is, in terms of consumption, video platforms strive to push video data in line with users’ interests and preferences to obtain users’ higher consumption time and satisfaction; in terms of production , the video platform should also encourage users to shoot more video data and upload it to the video platform for release, enrich the content of the video platform, and richer content can make it easier for users to obtain video data that meets their interests and preferences, forming a virtuous circle.
视频推荐算法主要是针对消费的机制,通过记录用户对视频数据的隐式反馈,例如,观看时长、是否点赞、是否评论、是否转发,等等,使用这些隐式反馈来构造训练数据的正负样本,使用训练数据训练排序模型,使用该排序模型计算用户对视频数据的评分,进而选择最符合用户兴趣偏好的视频数据,并推送给用户。The video recommendation algorithm is mainly aimed at the consumption mechanism. By recording the user's implicit feedback on the video data, such as the viewing time, whether to like, whether to comment, whether to forward, etc., these implicit feedbacks are used to construct the positive feedback of the training data. Negative samples, use the training data to train a ranking model, use the ranking model to calculate the user's score on the video data, and then select the video data that best meets the user's interests and preferences, and push it to the user.
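A rough sketch of this consumption-side flow, under assumed feature names and labelling thresholds, might look like the following; scikit-learn's LogisticRegression merely stands in for whatever ranking model is actually used.

```python
# Sketch only: implicit feedback -> positive/negative samples -> ranking model -> scores.
from sklearn.linear_model import LogisticRegression

def make_samples(feedback_log):
    """feedback_log: iterable of dicts with watch_ratio, liked, commented, shared."""
    X, y = [], []
    for f in feedback_log:
        X.append([f["watch_ratio"], int(f["liked"]), int(f["commented"]), int(f["shared"])])
        # Assumed labelling rule: mostly-watched or explicitly endorsed videos are positives.
        y.append(1 if f["watch_ratio"] > 0.8 or f["liked"] or f["shared"] else 0)
    return X, y

log = [
    {"watch_ratio": 0.95, "liked": True, "commented": False, "shared": False},
    {"watch_ratio": 0.10, "liked": False, "commented": False, "shared": False},
]
model = LogisticRegression().fit(*make_samples(log))

def rank(candidate_features):
    scores = model.predict_proba(candidate_features)[:, 1]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

print(rank([[0.9, 1, 0, 0], [0.2, 0, 0, 0]]))  # indices of candidates, best first
```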
此外,在生产激励方面,常规的手段大都以运营机制为主,例如,以等级、贡献值、现金奖励等手段,直接对用户的生产视频数据的行为进行正向激励。In addition, in terms of production incentives, most of the conventional means are based on operation mechanisms, such as grades, contribution values, cash rewards and other means to directly motivate users' behavior of producing video data.
但是,这类方法需要人工进行运营、反作弊等,成本较大,而且,一旦停止激励,用户生产的意愿会迅速下降。However, such methods require manual operations, anti-cheating, etc., which are costly. Moreover, once the incentives are stopped, the willingness of users to produce will drop rapidly.
由于业务方面的需求,排序模型的优化目标也从单一的播放时长(或播完率),发展到既有时长等消费指标,也有点赞评论转发等满意度指标的多任务学习模型。Due to business requirements, the optimization goal of the ranking model has also developed from a single playback duration (or completion rate) to a multi-task learning model that has both consumption indicators such as duration, and satisfaction indicators such as likes, comments and forwarding.
在本实施例中,配合适当的产品形态,已经使得用户在消费视频数据时,对感兴趣的视听元素能够更加简洁地复用,快速生产视频数据,与此同时,利用多任务学习模型,引入一个从消费到生产转换的目标,来预测哪些视频数据更容易引发用户的兴趣进而生产新的视频数据,在最终对召回的视频数据排序时,增加一个因素,即当前视频数据是否会引起用户的兴趣进行生产,在消费和满意度条件接近的情况下,优先推送该类视频数据,提高用户生产的意愿,这样能够引导更多的用户从单一消费变为消费的同时,进一步进行生产,提高从消费到生产的转化比例,进而丰富视频平台内容的生态闭环。In this embodiment, with the appropriate product form, the user can reuse the audio-visual elements of interest more concisely when consuming video data, and quickly produce video data. At the same time, the multi-task learning model is used to introduce A goal of conversion from consumption to production, to predict which video data is more likely to arouse users’ interest and then produce new video data. When finally sorting the recalled video data, a factor is added, that is, whether the current video data will arouse users’ interest. Interested in production, when the consumption and satisfaction conditions are close, this type of video data will be preferentially pushed to increase the willingness of users to produce, which can guide more users from single consumption to consumption, while further production and improve The conversion ratio from consumption to production, thereby enriching the ecological closed loop of video platform content.
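One hedged way to picture the extra ranking factor described here is to combine the multi-task predictions, including a predicted consumption-to-production conversion probability, into a single score. The task names and weights below are illustrative assumptions, not values from the disclosure.

```python
# Illustrative score mixing: consumption/satisfaction predictions plus a
# predicted probability of converting the viewer into a producer ("produce").
def final_score(preds, weights=None):
    weights = weights or {"watch_time": 0.5, "like": 0.2, "share": 0.1, "produce": 0.2}
    return sum(w * preds.get(task, 0.0) for task, w in weights.items())

candidates = [
    {"video_id": "a", "preds": {"watch_time": 0.7, "like": 0.3, "share": 0.1, "produce": 0.4}},
    {"video_id": "b", "preds": {"watch_time": 0.7, "like": 0.3, "share": 0.1, "produce": 0.1}},
]
ranked = sorted(candidates, key=lambda c: final_score(c["preds"]), reverse=True)
# With comparable consumption and satisfaction scores, the video more likely to
# trigger production ("a") is pushed first.
print([c["video_id"] for c in ranked])  # ['a', 'b']
```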
实施例一 Embodiment 1
图1为本申请实施例一提供的一种视频数据的制作方法的流程图,本实施例可适用于提供已有视频数据的视听元素、制作新的视频数据的情况,该方法可以由视频数据的制作装置来执行,该视频数据的制作装置可以由软件和/或硬 件实现,可配置在计算机设备中,例如,移动终端(如手机、平板电脑等)、智能穿戴设备(如智能手表、智能眼镜等)、个人电脑,等等,包括如下步骤:1 is a flowchart of a method for producing video data according to Embodiment 1 of the present application. This embodiment is applicable to the situation of providing audiovisual elements of existing video data and producing new video data. The video data production device can be implemented by software and/or hardware, and can be configured in computer equipment, for example, mobile terminals (such as mobile phones, tablet computers, etc.), smart wearable devices (such as smart watches, smart watches, etc.) glasses, etc.), personal computer, etc., including the following steps:
步骤101、当播放第一视频数据时,显示元素摘要信息。Step 101: When playing the first video data, display element summary information.
在本实施例中,计算机设备的操作系统可以包括安卓(Android)、由苹果公司开发的移动操作系统(IOS)、Windows等等,在这些操作系统中支持运行播放、制作视频数据的客户端,例如,短视频应用、即时通讯工具、在线视频应用,等等。In this embodiment, the operating system of the computer device may include Android (Android), a mobile operating system (IOS) developed by Apple, Windows, etc., in these operating systems, a client for running and playing and producing video data is supported, For example, short video applications, instant messaging tools, online video applications, and so on.
该客户端可以以统一资源定位器(Uniform Resource Locators,URL)等形式,向服务端请求播放视频数据,该视频数据在本实施例中称之为第一视频数据,服务端接收到该请求之后,通过个性化或非个性化的方式查找第一视频数据,可向该客户端发送该第一视频数据的部分或全部数据。The client may request the server to play video data in the form of a Uniform Resource Locators (URL), etc. The video data is referred to as first video data in this embodiment, and after the server receives the request , searching for the first video data in a personalized or non-personalized manner, and sending part or all of the first video data to the client.
该第一视频数据为已离线完成制作的视频数据,其形式可以包括短视频、微电影、表演节目,等等,本实施例对第一视频数据的形式不加以限制。The first video data is video data that has been produced offline, and the form may include short videos, micro-movies, performance programs, and so on. This embodiment does not limit the form of the first video data.
所谓个性化,可以指视频数据与当前在客户端登录的用户(以标识(Identifier,ID)等表示)适配,即基于多目标优化算法、协同过滤算法等个性化的方式对视频数据与用户进行匹配,匹配方式可参见实施例三,本实施例对匹配方式不加以详述,在匹配成功时,该视频数据可视为第一视频数据。The so-called personalization can refer to the adaptation of the video data to the user currently logged in on the client (represented by an identifier (ID), etc.), that is, based on the multi-objective optimization algorithm, collaborative filtering algorithm, etc. For matching, refer to Embodiment 3 for the matching method. This embodiment does not describe the matching method in detail. When the matching is successful, the video data can be regarded as the first video data.
所谓非个性化,可以指视频数据的筛选并不依赖于当前在客户端登录的用户(以ID等表示),可以基于视频质量(综合清晰度、播放量、点赞量、评论量等进行评价)、热度等非个性化的因素筛选视频数据,将筛选出的视频数据作为第一视频数据。The so-called non-personalization means that the screening of video data does not depend on the user currently logged in on the client (indicated by ID, etc.), and can be evaluated based on video quality (integrated definition, playback volume, likes, comments, etc.) ), popularity and other non-personalized factors to screen video data, and use the screened video data as the first video data.
若第一视频数据为一视频数据与一个或多个视听元素混合制作而成,则服务端可以查询该第一视频数据标记的一个或多个元素摘要信息,该元素摘要信息表示第一视频数据包含的视听元素,以及,将该一个或多个视听元素的元素摘要信息发送至该客户端。If the first video data is made by mixing a video data with one or more audiovisual elements, the server can query one or more element summary information marked in the first video data, and the element summary information indicates the first video data containing audiovisual elements, and sending element summary information for the one or more audiovisual elements to the client.
所谓视听元素,可以包括可视元素(即用户可以看到的元素)、可听元素(即用户可以听到的元素),视听元素的形式可以根据实际情况进行设置,例如,音频数据、视频数据、美颜特效、滤镜,等等,本实施例对视听元素的形式不加以限制。The so-called audiovisual elements can include visual elements (that is, elements that users can see), audible elements (that are, elements that users can hear), and the form of audio-visual elements can be set according to actual conditions, such as audio data, video data. , beauty special effects, filters, etc., the form of audiovisual elements is not limited in this embodiment.
此外,该元素摘要信息可以包括文本数据、图像数据等形式,例如,音频数据的名称、音频数据的封面、视频数据的名称、视频数据的封面、体积大小、作者、发布者、使用人数,等等。元素摘要信息除了表示视听元素之外,还可以携带其他信息,例如,视听元素的ID,等等,本实施例对此不加以限制。In addition, the element summary information may include text data, image data, etc., for example, the name of the audio data, the cover of the audio data, the name of the video data, the cover of the video data, the size, the author, the publisher, the number of users, etc. Wait. In addition to representing the audiovisual element, the element summary information may also carry other information, for example, the ID of the audiovisual element, etc., which is not limited in this embodiment.
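As an illustration only, one possible shape for a piece of element summary information is sketched below; the field names are hypothetical, since the paragraph above only enumerates the kinds of content such information may carry.

```python
from dataclasses import dataclass

@dataclass
class ElementSummary:                 # hypothetical field names
    element_id: str                   # identifier carried along with the summary
    element_type: str                 # "audible" (e.g. a song) or "visual" (e.g. a clip, a filter)
    name: str                         # e.g. the name of the audio data
    cover_url: str                    # cover image or thumbnail
    author: str
    publisher: str
    size_bytes: int
    used_by: int                      # number of users who have used the element

summary = ElementSummary("e42", "audible", "Quiet Night", "cover.jpg",
                         "Little Red", "Little Red", 3_200_000, 2270)
print(summary.name, summary.used_by)
```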
客户端在缓存该第一视频数据的部分或全部数据之后,可调用操作系统提供的视频播放器播放该第一视频数据,即,客户端生成第一用户界面,在该第一用户界面中显示第一视频数据的画面,以及,驱动扬声器播放第一视频数据的音频。After buffering part or all of the first video data, the client can call the video player provided by the operating system to play the first video data, that is, the client generates a first user interface and displays it in the first user interface The picture of the first video data, and the speaker is driven to play the audio of the first video data.
该第一用户界面中具有播放区域,该播放区域用于显示第一视频数据的画面,在一些情况下,该播放区域为该第一用户界面的部分区域,此时,针对该第一视频数据的其他信息,可显示在该播放区域之外的区域,在一些情况下,该播放区域为该第一用户界面的全部区域,此时,针对该第一视频数据的其他信息,可以以悬浮的形式显示在该显示区域之上。The first user interface has a play area, and the play area is used to display a picture of the first video data. In some cases, the play area is a partial area of the first user interface. In this case, for the first video data The other information of the video data can be displayed in the area outside the play area. In some cases, the play area is the entire area of the first user interface. At this time, other information for the first video data can be displayed in the floating The form is displayed above this display area.
针对该第一视频数据的其他信息可以包括表达正向情感的控件(如“点赞”、“喜欢”)、评论信息、用于分享的控件、用于输入评论信息的栏目,等等。Other information for the first video data may include controls for expressing positive emotions (such as "like", "like"), comment information, controls for sharing, fields for inputting comment information, and the like.
以Android系统为例,可调用VideoView,MediaPlayer与SurfaceView,Vitamio,JCPlayer等组件播放第一视频数据。Taking the Android system as an example, components such as VideoView, MediaPlayer, SurfaceView, Vitamio, and JCPlayer can be called to play the first video data.
此外,若客户端缓存有该第一视频数据的一个或多个元素摘要信息,则可以在第一用户界面中,将元素摘要信息转换为第一数据结构,以静态的方式(如以文本的方式显示视听元素的名称)或动态的方式(如转动视听元素的封面)显示该第一数据结构下的元素摘要信息。In addition, if the client has cached one or more element summary information of the first video data, the element summary information can be converted into the first data structure in the first user interface, and the element summary information can be converted into the first data structure in a static manner (such as textual The name of the audiovisual element is displayed in a dynamic manner (such as rotating the cover of the audiovisual element) to display the element summary information under the first data structure.
若播放区域为第一用户界面的全部区域,则该一个或多个元素摘要信息可以以悬浮的形式显示在该播放区域之上,使得该一个或多个元素摘要信息显示在第一视频数据的画面之上。If the play area is the entire area of the first user interface, the one or more element summary information can be displayed above the play area in a floating form, so that the one or more element summary information is displayed in the first video data on the screen.
例如,如图2A所示,在第一用户界面210中,以全部区域显示第一视频数据的画面,该第一视频数据具有可听元素,即歌曲“宁静的夜晚”,此时,可在第一用户界面210的左下角显示该歌曲的名称与发布者“宁静的夜晚-小红”(元素摘要信息211),在第一用户界面210的右下角显示该歌曲的封面(元素摘要信息212)。For example, as shown in FIG. 2A, in the first user interface 210, the screen of the first video data is displayed in all areas, and the first video data has audible elements, that is, the song "Quiet Night". The lower left corner of the first user interface 210 displays the title of the song and the publisher "Quiet Night-Little Red" (element summary information 211), and the cover of the song (element summary information 212) is displayed in the lower right corner of the first user interface 210. ).
又例如,如图2D所示,在第一用户界面240中,以部分区域显示第一视频数据的画面,该第一视频数据具有可听元素、可视元素,即歌曲“激动人心”、小刚表演踩钢丝的视频,此时,可在第一用户界面240的左下角显示该歌曲的名称与发布者“激动人心-小明”(元素摘要信息)、该视频的引导语(引导用户使用该视频制作新的视频数据)及该视频的使用人数“与小刚二重唱(2.27K)”(元素摘要信息241),在第一用户界面240的右下角显示该歌曲的封面(元素摘要信息)。For another example, as shown in FIG. 2D, in the first user interface 240, a picture of the first video data is displayed in a partial area, and the first video data has audible elements and visual elements, that is, the song "Exciting", small The video of stepping on a tightrope has just been performed. At this time, the name of the song and the publisher "Exciting-Xiao Ming" (element summary information), the introductory language of the video (to guide the user to use the Video production new video data) and the number of users of the video "Due with Xiaogang (2.27K)" (element summary information 241), the cover of the song (element summary information) is displayed in the lower right corner of the first user interface 240.
步骤102、接收作用于元素摘要信息的第一操作。Step 102: Receive a first operation acting on the element summary information.
在播放第一视频数据的过程中,若用户对该第一视频数据包含的一视听元素感兴趣,则可以通过计算机设备提供的人机交互工具,针对该视听元素对应的元素摘要信息触发第一操作,从而选定该元素摘要信息表示的可视元素。In the process of playing the first video data, if the user is interested in an audio-visual element contained in the first video data, the human-computer interaction tool provided by the computer device can trigger the first audio-visual element summary information corresponding to the audio-visual element. action to select the visual element represented by the element's summary information.
在实现中,针对不同类型的计算机设备,其所提供的人机交互工具有所不同,相应地,通过该人机交互工具触发第一操作的方式也有所不同,本实施例对通过人机交互工具触发第一操作的方式不加以限制。In implementation, for different types of computer equipment, the human-computer interaction tools provided by them are different, and correspondingly, the ways of triggering the first operation through the human-computer interaction tools are also different. The manner in which the tool triggers the first action is not limited.
例如,若计算机设备提供的人机交互工具为触控屏,则在触控屏检测到发生在一个元素摘要信息内的触控操作(如点击操作、长按操作、重按操作等)时,确定接收作用于该元素摘要信息的第一操作。For example, if the human-computer interaction tool provided by the computer equipment is a touch screen, when the touch screen detects a touch operation (such as a click operation, long-press operation, re-press operation, etc.) It is determined to receive a first operation that acts on the element digest information.
又例如,若计算机设备提供的人机交互工具为外置设备,则在接收到外置设备发送的、发生在一个元素摘要信息内的按键事件(如单击事件、双击事件、长按事件等)时,确定接收作用于该元素摘要信息的第一操作。其中,该外置设备包括但不限定于鼠标、遥控器等。For another example, if the human-computer interaction tool provided by the computer device is an external device, after receiving a key event (such as a single-click event, double-click event, long-press event, etc.) that occurs in an element summary information sent by the external device ), it is determined to receive the first operation acting on the element summary information. Wherein, the external device includes, but is not limited to, a mouse, a remote control, and the like.
步骤103、响应于第一操作,共同显示第二视频数据的视频摘要信息、制作控件。Step 103: In response to the first operation, jointly display video summary information and production controls of the second video data.
应用本实施例,服务端可通过多种方式收集视频数据并对该视频数据标记其包含的视听元素(以ID等表示),存储在服务端本地的数据库中。Applying this embodiment, the server can collect video data in various ways, mark the video data with audiovisual elements (represented by ID, etc.), and store the video data in a local database of the server.
在一种标记视听元素的方式中,针对已有的视频数据,可以通过以特定的可视元素作为目标,调用目标检测(object detection)模型在该视频数据的画面中检测特定的可视元素,如果检测到该特定的可视元素,则对该视频数据标记该特定的可视元素。In a method of marking audiovisual elements, for existing video data, a specific visual element can be detected in the picture of the video data by calling an object detection model with a specific visual element as a target, If the specific visual element is detected, the video data is marked with the specific visual element.
该目标检测模型包括一阶(One Stage)目标检测模型和二阶(Two Stage)目标检测模型。The target detection model includes a first-order (One Stage) target detection model and a second-order (Two Stage) target detection model.
生成一系列作为样本的候选框,再通过卷积神经网络(Convolutional Neural Network,CNN)进行样本分类的目标检测模型被称为二阶目标检测模型,例如,区域卷积神经网络(Region-CNN,R-CNN)、空间金字塔池网络(Spatial Pyramid Pooling Network,SPP-Net)、Fast-RCNN、Faster-RCNN,等等。A target detection model that generates a series of candidate boxes as samples, and then classifies the samples through a convolutional neural network (CNN) is called a second-order target detection model, for example, a regional convolutional neural network (Region-CNN, R-CNN), Spatial Pyramid Pooling Network (SPP-Net), Fast-RCNN, Faster-RCNN, etc.
不生成候选框,直接将目标边框定位的问题转化为回归问题进行处理的目标检测模型则被称为一阶目标检测模型,例如,广义同余神经网络(Generalized Congruence Neural Network,GCNN)、YOLO(You Only Look Once)、一阶多框预测(Single Shot Mutibox Detector,SSD),等等。A target detection model that does not generate candidate frames and directly converts the problem of target frame positioning into a regression problem is called a first-order target detection model, for example, Generalized Congruence Neural Network (GCNN), YOLO ( You Only Look Once), first-order multi-box prediction (Single Shot Mutibox Detector, SSD), etc.
在另一种标记视听元素的方式中,针对已有的视频数据,可以提取其包含的音频,提取该音频的特征,若该音频的特征与特定的可视元素的特征相同或 相似,则对该视频数据标记该特定的可视元素。In another way of marking audiovisual elements, for existing video data, the audio contained in the audio can be extracted, and the features of the audio can be extracted. If the features of the audio are the same as or similar to the features of a specific visual element, then The video data marks the specific visual element.
在又一种标记视听元素的方式中,若用户使用自定义的可视元素制作视频数据,则将该自定义的可视元素与原有的可视元素进行对比,若该自定义的可视元素与原有的可视元素相同或相似,则对该视频数据标记原有的可视元素,若该自定义的可视元素与原有的可视元素不同或不相似,则对自定义的可视元素设置新的标识(如新的ID),并对该视频数据标记自定义的可视元素(以新的标识表示)。In yet another method of marking audiovisual elements, if the user uses a custom visual element to create video data, the custom visual element is compared with the original visual element, if the custom visual element is If the element is the same as or similar to the original visual element, the video data will be marked with the original visual element. If the custom visual element is different or not similar to the original visual element, The visual element is set with a new identifier (such as a new ID), and the video data is marked with a custom visual element (represented by the new identifier).
在再一种标记视听元素的方式中,若用户使用其他视频数据已标记的可视元素制作新的视频数据,则可以对该新的视频数据标记该可视元素。In yet another way of marking the audiovisual element, if the user makes new video data by using the marked visual element of other video data, the visual element can be marked for the new video data.
上述标记视听元素的方式只是作为示例,在实施本申请实施例时,可以根据实际情况设置其他标记视听元素的方式,本申请实施例对标记视听元素的方式不加以限制。The above manner of marking audiovisual elements is only an example. When implementing the embodiments of the present application, other manners of marking audiovisual elements may be set according to actual conditions, and the embodiments of the present application do not limit the manners of marking audiovisual elements.
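A compact sketch of how such marking might be wired together is given below. The detector, feature extractor, and similarity function are placeholders injected by the caller, not real APIs, and the threshold is an assumed value.

```python
# Sketch of element marking: a visual target detector is run over sampled frames,
# and extracted audio features are matched against known audible elements.
def mark_elements(video, known_elements, detector, audio_features, similarity, threshold=0.9):
    tags = set()
    for frame in video["frames"]:                      # visual elements via object detection
        for element in known_elements["visual"]:
            if detector(frame, element["id"]):
                tags.add(element["id"])
    feature = audio_features(video["audio"])           # audible elements via feature matching
    for element in known_elements["audible"]:
        if similarity(feature, element["feature"]) >= threshold:
            tags.add(element["id"])
    return tags

# Toy usage with trivial stand-ins for the detector and feature functions:
video = {"frames": ["frame-1"], "audio": "audio-1"}
known = {"visual": [{"id": "vis-1"}], "audible": [{"id": "aud-1", "feature": "audio-1"}]}
print(mark_elements(video, known,
                    detector=lambda frame, eid: True,
                    audio_features=lambda a: a,
                    similarity=lambda x, y: 1.0 if x == y else 0.0))
```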
响应于用户选定一元素摘要信息表示的视听元素的第一操作,客户端可向服务端发送携带该视听元素的标识(如ID)的请求,请求服务端搜索包含该视听元素(以ID等表示)的视频数据。In response to the user's first operation of selecting an audiovisual element represented by the element summary information, the client can send a request to the server carrying the identification (such as ID) of the audiovisual element, requesting the server to search for the audiovisual element (with ID, etc.) containing the audiovisual element. representation) of the video data.
服务端在接收到该请求时,从该请求中解析视听元素的标识,在服务端本地的数据库中以该标识作为搜索的条件,搜索标记该标识的视频数据,并将该视频数据写入视频集合中,为便于区分,该视频数据在本实施例中称之为第二视频数据,该视频集合在本实施例中称之为第一视频集合,服务端从本地的数据库中提取该第一视频集合中的第二视频数据的视频摘要信息(如封面、名称、制作者等),并将该视频摘要信息发送至客户端。When the server receives the request, it parses the identification of the audiovisual element from the request, uses the identification as a search condition in the local database of the server, searches for the video data marked with the identification, and writes the video data into the video In the collection, for the convenience of distinction, the video data is referred to as the second video data in this embodiment, the video collection is referred to as the first video collection in this embodiment, and the server extracts the first video data from the local database. Video summary information (such as cover, name, producer, etc.) of the second video data in the video set, and send the video summary information to the client.
所谓标记,表示该视频数据包含该标识对应的视听元素,即多个第二视频数据包含相同的视听元素。The so-called mark means that the video data contains the audiovisual element corresponding to the mark, that is, the plurality of second video data contains the same audiovisual element.
由于第二视频数据的数量较多,可按照预设的排序方式对该第二视频数据进行排序,每次选择排序最前的n(n为正整数)个第二视频数据发送至客户端。Due to the large quantity of the second video data, the second video data may be sorted according to a preset sorting method, and each time the top n (n is a positive integer) second video data of the sorting are selected and sent to the client.
一般情况下,该排序方式可以包括按照视频质量进行降序排序、按照视频热度降序排序等非个性化的排序方式,降低处理的复杂度,从而提高处理的速度。In general, the sorting method may include non-personalized sorting methods such as descending sorting according to video quality, descending sorting according to video popularity, etc., so as to reduce the processing complexity and improve the processing speed.
除了非个性化的排序方式之外,还可以使用协同过滤等个性化的排序方式,本实施例对该排序方式不加以限制。In addition to a non-personalized sorting manner, a personalized sorting manner such as collaborative filtering may also be used, which is not limited in this embodiment.
对于客户端而言,一方面,接收服务端发送的第二视频数据的视频摘要信息,并缓存在客户端本地。For the client, on the one hand, video summary information of the second video data sent by the server is received and cached locally on the client.
另一方面,生成第二用户界面,在第二用户界面中,若元素摘要信息包括元素图像数据(如音频数据的封面、视频数据的缩略图等),则可以以背景的形式显示该元素图像数据,即,将元素摘要信息中的图像数据设置为背景,在设置时,可以对该图像数据进行模糊处理。On the other hand, a second user interface is generated. In the second user interface, if the element summary information includes element image data (such as the cover of audio data, the thumbnail of video data, etc.), the element image can be displayed in the form of background data, that is, the image data in the element summary information is set as the background, and when setting, the image data can be blurred.
将元素摘要信息转换为第二数据结构,以标题的形式显示该第二数据结构中的元素摘要信息,从而表示视频摘要信息对应的第二视频数据包含该元素摘要信息对应的视听元素,例如,在第二用户界面的顶部独立显示可视元素的元素摘要信息。Convert the element summary information into a second data structure, and display the element summary information in the second data structure in the form of a title, thereby indicating that the second video data corresponding to the video summary information contains the audiovisual element corresponding to the element summary information, for example, Element summary information for the visual element is independently displayed at the top of the second user interface.
在第二用户界面中,可在元素摘要信息之下的位置,以瀑布流等方式显示一个或多个信息区域,其中,该信息区域的面积与视听元素的类型匹配,即根据视听元素的类型设置信息区域的面积。In the second user interface, one or more information areas may be displayed in a waterfall or the like at a position below the element summary information, wherein the area of the information area matches the type of audiovisual element, that is, according to the type of audiovisual element Sets the size of the information area.
在一个示例中,考虑到可听元素并不占用显示的空间,而可视元素占用显示的空间,若视听元素的类型为可视元素,则可以设置面积为第一数值的区域为第一区域,显示该第一区域、作为信息区域,若视听元素的类型为可听元素,则可以设置面积为第二数值的区域为第二区域,显示该第二区域、作为信息区域,其中,第一数值大于第二数值,即第一区域的面积大于第二区域的面积,对可视元素增大显示区域的面积,使得可视元素在显示时保留更多的细节,用户可以更清晰地浏览到可视元素。In an example, considering that the audible element does not occupy the display space, but the visual element occupies the display space, if the type of the audiovisual element is a visual element, the area with the area of the first value can be set as the first area , display the first area as an information area, if the type of the audiovisual element is an audible element, you can set the area with the second value as the second area, and display the second area as an information area, where the first area If the value is greater than the second value, that is, the area of the first area is larger than the area of the second area, and the area of the display area is increased for the visual element, so that the visual element retains more details when displayed, and the user can browse more clearly. visual elements.
在本示例中,若同一个第二视频数据既包含可视元素、又包含可听元素,当针对可视元素召回该第二视频数据至一第一视频集合时,以第一区域显示该第二视频数据的视频摘要,当针对可听元素召回该第二视频数据至另一第一视频集合时,以第二区域显示该第二视频数据的视频摘要。In this example, if the same second video data contains both visual elements and audible elements, when the second video data is recalled to a first video set for the visual elements, the second video data is displayed in the first area. Two video summaries of the video data, when recalling the second video data for the audible element to another first video set, displaying the video summaries of the second video data in the second area.
将第一视频集合中的第二视频数据的视频摘要信息按照顺序加载至多个信息区域中,从而在信息区域中显示第二视频数据的视频摘要信息。The video summary information of the second video data in the first video set is sequentially loaded into the plurality of information areas, so that the video summary information of the second video data is displayed in the information area.
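For illustration, the sizing rule and the ordered loading of video summary information could be sketched as follows; the concrete pixel values and function names are assumptions.

```python
# Sketch of the layout rule: visual elements get the larger (first) information
# area, audible elements the smaller (second) one; summaries are loaded in order.
FIRST_AREA = {"w": 340, "h": 480}     # assumed sizes, first value > second value
SECOND_AREA = {"w": 220, "h": 300}

def area_for(element_type):
    return FIRST_AREA if element_type == "visual" else SECOND_AREA

def layout(video_summaries, element_type):
    area = area_for(element_type)
    return [{"summary": s, "area": area} for s in video_summaries]

print(layout(["summary-1", "summary-2"], "audible"))
```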
此外,以悬浮的方式,在信息区域之上显示制作控件,用户可以通过计算机设备提供的人机交互工具,针对该视听元素对应的元素摘要信息触发滑动操作、翻页操作等操作,使得在第二用户界面切换显示第二视频数据的视频摘要信息,在切换显示第二视频数据的视频摘要信息的过程中,第二视频数据的视频摘要信息的位置发生变化,而制作控件保持位置,该位置并不随着滑动操作、翻页操作等操作而发生变化。In addition, the production control is displayed on the information area in a floating manner, and the user can trigger operations such as sliding operation and page turning operation for the element summary information corresponding to the audio-visual element through the human-computer interaction tool provided by the computer equipment, so that in the first 2. The user interface switches to display the video summary information of the second video data. During the process of switching to display the video summary information of the second video data, the position of the video summary information of the second video data changes, but the production control maintains the position. It does not change with operations such as sliding operations and page turning operations.
在切换显示第二视频数据的视频摘要信息的过程中,客户端可以继续向服务端请求该第一视频集合中的其他第二视频数据,显示至第二用户界面中,直至请求该第一视频集合中的其他第二视频数据完毕。In the process of switching and displaying the video summary information of the second video data, the client can continue to request the server for other second video data in the first video set, and display them in the second user interface until the first video is requested. The other second video data in the set is complete.
例如,如图2A所示,若用户针对在第一用户界面210的左下角显示的歌曲的名称与发布者“宁静的夜晚-小红”(元素摘要信息211)触发点击操作(第一操作),或者,针对在第一用户界面210的右下角显示的歌曲的封面(元素摘要信息212)触发点击操作(第一操作),则如图2B所示,在第二用户界面220中,对歌曲的封面进行模糊处理并设置为背景,在该背景上,集中显示歌曲的封面、歌曲的名称、歌曲的发布者、歌曲的使用人数,以及,显示九个较小的信息区域,在每个信息区域中按照顺序加载包含该歌曲对应的第二视频数据的视频摘要信息。For example, as shown in FIG. 2A , if the user triggers a click operation (the first operation) for the title of the song displayed in the lower left corner of the first user interface 210 and the publisher “Quiet Night-Little Red” (element summary information 211 ) , or, a click operation (the first operation) is triggered for the cover (element summary information 212 ) of the song displayed in the lower right corner of the first user interface 210 , then as shown in FIG. 2B , in the second user interface 220 , The cover of the song is blurred and set as the background, on which the cover of the song, the name of the song, the publisher of the song, the number of users of the song are displayed in a concentrated manner, and nine smaller information areas are displayed. The video summary information including the second video data corresponding to the song is loaded in sequence in the area.
排序第三(即“N0.3”)的第二视频数据,除了包含该歌曲之外,还包括其他视频数据的视听元素,在用户选定歌曲时,排序第三(即“N0.3”)的第二视频数据使用较小的信息区域显示其视频摘要信息。The second video data in the third order (ie "N0.3"), in addition to the song, also includes audiovisual elements of other video data, when the user selects a song, the third order (ie "N0.3") ) of the second video data uses a smaller information area to display its video summary information.
此外,在第二用户界面220的下方,显示制作控件221“加入”。In addition, below the second user interface 220, the authoring control 221 "Join" is displayed.
例如,如图2D所示,若用户针对在第一用户界面240的左下角显示的视频的引导语及该视频的使用人数“与小刚二重唱(2.27K)”(元素摘要信息241)触发点击操作(第一操作),则如图2E所示,在第二用户界面250中,对视频的封面进行模糊处理并设置为背景,在该背景上,集中显示视频的封面、制作者、视频的使用人数,以及,显示四个较大的信息区域,在每个信息区域中按照顺序加载包含该视频的第二视频数据的视频摘要信息。For example, as shown in FIG. 2D , if the user clicks on the introductory phrase of the video displayed in the lower left corner of the first user interface 240 and the number of users of the video “Due to Xiaogang (2.27K)” (element summary information 241 ) operation (the first operation), then as shown in FIG. 2E, in the second user interface 250, the cover of the video is blurred and set as a background, on which the cover, the producer, and the background of the video are collectively displayed. The number of users, and, displays four larger information areas, in each information area sequentially loading video summary information containing the second video data for the video.
此外,在第二用户界面250的下方,显示制作控件251“加入”。In addition, below the second user interface 250, an authoring control 251 "Join" is displayed.
步骤104、接收作用于制作控件的第二操作。Step 104: Receive a second operation for making the control.
在显示第二视频数据的视频摘要信息的过程中,若用户对当前的视听元素感兴趣,则可以通过计算机设备提供的人机交互工具,针对当前的制作控件触发第二操作,从而制作包含该视听元素的、新的视频数据。In the process of displaying the video summary information of the second video data, if the user is interested in the current audio-visual element, the human-computer interaction tool provided by the computer device can trigger the second operation for the current production control, so that the production includes the New video data for audiovisual elements.
在实现中,针对不同类型的计算机设备,其所提供的人机交互工具有所不同,相应地,通过该人机交互工具触发第二操作的方式也有所不同,本实施例对通过人机交互工具触发第二操作的方式不加以限制。In implementation, for different types of computer equipment, the provided human-computer interaction tools are different, and correspondingly, the ways of triggering the second operation through the human-computer interaction tools are also different. The manner in which the tool triggers the second action is not limited.
例如,若计算机设备提供的人机交互工具为触控屏,则在触控屏检测到发生在制作控件内的触控操作(如点击操作、长按操作、重按操作等)时,确定接收作用于该制作控件的第一操作。For example, if the human-computer interaction tool provided by the computer equipment is a touch screen, when the touch screen detects a touch operation (such as a click operation, long press operation, repress operation, etc.) The first operation that acts on the make control.
又例如,若计算机设备提供的人机交互工具为外置设备,则在接收到外置设备发送的、发生在制作控件内按键事件(如单击事件、双击事件、长按事件等)时,确定接收作用于该制作控件的第一操作。其中,该外置设备包括但不限定于鼠标、遥控器等。For another example, if the human-computer interaction tool provided by the computer device is an external device, when receiving a key event (such as a single-click event, double-click event, long-press event, etc.) It is determined to receive a first operation acting on the authoring control. Wherein, the external device includes but is not limited to a mouse, a remote control, and the like.
步骤105、响应于第二操作,采集第三视频数据,将元素摘要信息对应的视听元素添加至第三视频数据中。Step 105: In response to the second operation, collect third video data, and add audiovisual elements corresponding to the element summary information to the third video data.
在本实施例中,客户端响应于用户触发制作控件的第二操作,向服务端发送下载该视听元素(以ID等表示)的请求,服务端接收到该请求之后,查找独立的视听元素(以ID等表示),并将该视听元素发送至客户端。In this embodiment, the client sends a request for downloading the audiovisual element (represented by ID, etc.) to the server in response to the user triggering the second operation of making the control, and after receiving the request, the server searches for the independent audiovisual element ( represented by an ID, etc.) and send the audiovisual element to the client.
所谓独立,可以指视听元素是一个独立的文件,并不依赖第一视频数据、第二视频数据。The so-called independent can mean that the audiovisual element is an independent file and does not depend on the first video data and the second video data.
视听元素的格式(如分辨率、采样率、体积大小等)符合制作的规范,客户端可直接使用该视听元素制作新的视频数据。The format of the audiovisual element (such as resolution, sampling rate, size, etc.) conforms to the production specification, and the client can directly use the audiovisual element to produce new video data.
客户端在接收该视听元素完毕之后,可生成第三用户界面,在该第三用户界面中生成用于制作视频数据的控件,调用计算机设备的摄像头、在第三用户界面预览视频数据,在接收到确定操作时,采集视频数据,为便于区分,该视频数据在本实施例中称之为第三视频数据。After receiving the audiovisual element, the client can generate a third user interface, generate a control for making video data in the third user interface, call the camera of the computer device, preview the video data on the third user interface, and then receive the video data. When the operation is determined, video data is collected, and for convenience of distinction, the video data is referred to as third video data in this embodiment.
在采集第三视频数据的同时,将元素摘要信息对应的视听元素作为制作的素材,添加至第三视频数据中。While collecting the third video data, the audiovisual elements corresponding to the element summary information are added to the third video data as the produced material.
在添加视听元素的过程中,保持第三视频数据与视听元素在时间轴上同步。During the process of adding the audiovisual element, the third video data is kept synchronized on the time axis with the audiovisual element.
在本实施例中,在开始采集第三视频数据的同时,开始播放视听元素,以便用户预览添加视听元素的效果。In this embodiment, when the third video data is started to be collected, the audiovisual element is started to be played, so that the user can preview the effect of adding the audiovisual element.
一般情况下,在视听元素结束时,可停止采集第三视频数据,当然,在视听元素结束时,也可以继续采集第三视频数据,本实施例对此不加以限制。Generally, when the audiovisual element ends, the collection of the third video data may be stopped. Of course, when the audiovisual element ends, the collection of the third video data may also be continued, which is not limited in this embodiment.
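A pseudocode-style sketch of this synchronized capture is given below, with small fake camera and player objects standing in for real platform APIs; it only illustrates the shared-timeline idea, not any particular implementation.

```python
# Sketch: recording and the audiovisual element start together, share one
# timeline, and recording may stop when the element ends.
def record_with_element(camera, element_player, stop_when_element_ends=True):
    camera.start()
    element_player.play()                  # element playback starts at t = 0 of the recording
    clips = []
    while camera.is_recording():
        t, frame = camera.read_frame()     # frame stamped on the shared timeline
        clips.append((t, frame))
        if stop_when_element_ends and element_player.finished():
            camera.stop()
    return clips

class _FakeCamera:                         # minimal stand-in for demonstration only
    def __init__(self, frames): self._frames, self._i, self._on = frames, 0, False
    def start(self): self._on = True
    def stop(self): self._on = False
    def is_recording(self): return self._on and self._i < len(self._frames)
    def read_frame(self):
        t, self._i = self._i, self._i + 1
        return t, self._frames[t]

class _FakePlayer:                         # "finishes" after a fixed number of ticks
    def __init__(self, length): self._len, self._t = length, 0
    def play(self): self._t = 0
    def finished(self):
        self._t += 1
        return self._t > self._len

print(record_with_element(_FakeCamera(["f0", "f1", "f2"]), _FakePlayer(2)))
```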
在实际应用中,由于视听元素的类型不同,将视听元素添加至第三视频数据的方式也有所不同。In practical applications, due to the different types of audiovisual elements, the ways of adding audiovisual elements to the third video data are also different.
在一个示例中,视听元素包括可听元素中的音频数据,则在本示例中,在开始采集第三视频数据的同时,开始播放音频元素,从而将元素摘要信息对应的音频数据设置为第三视频数据的背景音乐。In an example, the audiovisual element includes audio data in the audiovisual element, then in this example, the audio element starts to be played at the same time when the third video data is collected, so that the audio data corresponding to the element summary information is set as the third video data. Background music for video data.
例如,如图2B所示,若用户针对在第二用户界面220的下方显示的显示制作控件221“加入”触发点击操作(第二操作),则如图2C所示,在第三用户界面230中调用摄像头进行预览,在接收到针对圆形控件触发的确定操作时,采集第三视频数据并播放歌曲“宁静的夜晚”,使得歌曲“宁静的夜晚”作为第三视频数据的背景音乐。For example, as shown in FIG. 2B , if the user “joins” the click operation (second operation) for the display creation control 221 displayed below the second user interface 220 , then as shown in FIG. 2C , in the third user interface 230 Call the camera to preview in the middle, when receiving the confirmation operation triggered by the circular control, collect the third video data and play the song "Quiet Night", so that the song "Quiet Night" is used as the background music of the third video data.
在另一个示例中,视听元素包括可视元素中的视频数据,为便于区分,该视频元素在本实施例中可称之为第四视频数据。In another example, the audiovisual element includes video data in the visual element, and for convenience of distinction, the video element may be referred to as fourth video data in this embodiment.
在本示例中,在开始采集第三视频数据的同时,开始播放第四视频数据,在同一个画面中,第三视频数据居左、第四视频数据居右显示,或者,第三视频数据居右、第四视频数据居左显示,第三视频数据与第四视频数据以画中画的形式显示,从而以分屏的方式合成元素摘要信息对应的第四视频数据与第三视频数据。In this example, the fourth video data is played at the same time when the third video data is collected. In the same screen, the third video data is displayed on the left and the fourth video data is displayed on the right, or the third video data is displayed on the right. . The fourth video data is displayed on the left, and the third video data and the fourth video data are displayed in the form of picture-in-picture, so that the fourth video data and the third video data corresponding to the element summary information are synthesized in a split-screen manner.
例如,如图2E所示,若用户针对在第二用户界面250的下方显示的显示制作控件251“加入”触发点击操作(第二操作),则如图2F所示,在第三用户界面260中调用摄像头进行预览,在接收到针对圆形控件触发的确定操作时,采集第三视频数据并显示在左侧,以及,在右侧播放小刚踩钢丝的视频,使得小刚踩钢丝的视频与第三视频数据合并。For example, as shown in FIG. 2E , if the user triggers a click operation (second operation) for “joining” the display creation control 251 displayed below the second user interface 250 , then as shown in FIG. 2F , in the third user interface 260 Call the camera to preview in the middle, and when receiving the confirmation operation triggered by the circular control, collect the third video data and display it on the left, and play the video of Xiaogang stepping on the wire on the right, so that the video of Xiaogang stepping on the wire Merged with the third video data.
上述添加视听元素的方式只是作为示例,在实施本申请实施例时,可以根据实际情况设置其他添加视听元素的方式,本申请实施例对添加视听元素的方式不加以限制。The above manner of adding audiovisual elements is only an example. When implementing the embodiments of the present application, other manners of adding audiovisual elements may be set according to actual conditions, and the embodiments of the present application do not limit the manners of adding audiovisual elements.
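The type-dependent composition could be sketched as follows; the dictionary layout and helper name are illustrative assumptions rather than the actual composition pipeline.

```python
# Illustrative branch on element type when composing the third video data.
def add_element(third_video, element):
    if element["type"] == "audio":
        # The audio data becomes the background music of the third video data.
        return {"video": third_video, "audio": element["data"], "layout": "single"}
    if element["type"] == "video":
        # The fourth video data is shown side by side (or picture-in-picture).
        return {"video": [third_video, element["data"]], "layout": "split_or_pip"}
    raise ValueError("unsupported element type")

print(add_element("capture.mp4", {"type": "audio", "data": "quiet_night.aac"}))
print(add_element("capture.mp4", {"type": "video", "data": "tightrope.mp4"}))
```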
将元素摘要信息对应的视听元素添加至第三视频数据之后,可将第三视频数据发送至服务端,服务端接收客户端发送的第三视频数据,对第三视频数据标记包含该视听元素(以ID等表示),若标记完成,则服务端发布第三视频数据,客户端从而发布标记包含视听元素的第三视频数据,其他客户端可从服务端下载该第三视频数据进行播放,供用户浏览。After adding the audiovisual element corresponding to the element summary information to the third video data, the third video data can be sent to the server, and the server receives the third video data sent by the client, and marks the third video data to include the audiovisual element ( Represented by ID, etc.), if the marking is completed, the server publishes the third video data, the client thus publishes the third video data marked with audiovisual elements, and other clients can download the third video data from the server for playback. User browses.
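For illustration, the publish-and-tag step might be sketched as follows, with an assumed in-memory store standing in for the server-side database.

```python
# Sketch of publishing: the uploaded third video data is marked with the IDs of
# the audiovisual elements it contains, so it can be recalled later by element.
PUBLISHED = {}

def publish_third_video(video_id, media, element_ids):
    PUBLISHED[video_id] = {"media": media, "elements": set(element_ids)}
    return video_id

def videos_containing(element_id):
    return [vid for vid, rec in PUBLISHED.items() if element_id in rec["elements"]]

publish_third_video("v900", b"...", ["e42"])
print(videos_containing("e42"))  # ['v900']
```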
在本实施例中,当播放第一视频数据时,显示元素摘要信息,元素摘要信息表示第一视频数据包含的视听元素,接收作用于元素摘要信息的第一操作,响应于第一操作,共同显示第二视频数据的视频摘要信息、制作控件,第二视频数据包含视听元素,接收作用于制作控件的第二操作,响应于第二操作,采集第三视频数据,将元素摘要信息对应的视听元素添加至第三视频数据中,一方面,用户可以使用已有视频数据包含的视听元素制作新的视频数据,视听元素并不依赖于系统的模板,渠道多样化,可以保持视听元素的个性化,从而保证新制作的视频数据的个性化,另一方面,由系统提供视听元素,可以保证该视听元素的格式符合制作的规范,直接用于制作新的视频数据,避免用户使用专业的应用对元素进行修正,大大降低了技术门槛,减少了耗时,从而降低了制作视频数据的成本。In this embodiment, when the first video data is played, the element summary information is displayed, and the element summary information indicates the audiovisual elements contained in the first video data, the first operation acting on the element summary information is received, and in response to the first operation, the Displaying video summary information and production controls of the second video data, where the second video data includes audiovisual elements, receiving a second operation acting on the production controls, collecting third video data in response to the second operation, and converting the audiovisual elements corresponding to the element summary information The element is added to the third video data. On the one hand, users can use the audiovisual elements contained in the existing video data to create new video data. The audiovisual elements do not depend on the template of the system, and the channels are diversified, which can maintain the individualization of the audiovisual elements. , so as to ensure the personalization of the newly produced video data. On the other hand, the audio-visual elements provided by the system can ensure that the format of the audio-visual elements conforms to the production specifications, and can be directly used to produce new video data, preventing users from using professional applications. Elements are revised, which greatly reduces the technical threshold, reduces the time-consuming, and thus reduces the cost of producing video data.
实施例二 Embodiment 2
FIG. 3 is a flowchart of a method for producing video data according to Embodiment 2 of the present application. Based on the foregoing embodiment, this embodiment adds the operations of switching the first video data and playing the second video data. The method includes the following steps:
步骤301、当播放第一视频数据时,显示元素摘要信息。Step 301: When playing the first video data, display element summary information.
元素摘要信息表示第一视频数据包含的视听元素。The element summary information indicates audiovisual elements contained in the first video data.
步骤302、接收作用于元素摘要信息的第一操作。Step 302: Receive a first operation acting on the element summary information.
步骤303、响应于第一操作,共同显示第二视频数据的视频摘要信息、制作控件。Step 303: In response to the first operation, jointly display video summary information and production controls of the second video data.
第二视频数据包含该视听元素。The second video data contains the audiovisual element.
步骤304、接收作用于视频摘要信息的第三操作。Step 304: Receive a third operation acting on the video summary information.
While the video summary information of the second video data is displayed, if the user is interested in one piece of second video data, the user may trigger a third operation on the corresponding video summary information through a human-computer interaction tool provided by the computer device, thereby selecting the second video data to which that video summary information belongs.
In implementation, different types of computer devices provide different human-computer interaction tools, and accordingly the manner of triggering the third operation through the human-computer interaction tool also differs; this embodiment does not limit the manner in which the third operation is triggered through the human-computer interaction tool.
For example, if the human-computer interaction tool provided by the computer device is a touch screen, then when the touch screen detects a touch operation (such as a tap, a long press, or a hard press) within one piece of video summary information, it is determined that a third operation acting on that video summary information is received.
For another example, if the human-computer interaction tool provided by the computer device is an external device, then when a key event (such as a single-click, double-click, or long-press event) sent by the external device and occurring within one piece of video summary information is received, it is determined that a third operation acting on that video summary information is received. The external device includes, but is not limited to, a mouse, a remote control, and the like.
步骤305、响应于第三操作,播放视频摘要信息所属的第二视频数据。Step 305: In response to the third operation, play the second video data to which the video summary information belongs.
客户端响应于用户选择视频摘要信息的第三操作,可以以URL(携带第二视频数据的标识,如ID等)等形式,向服务端请求播放第二视频数据,服务端接收到该请求之后,可向该客户端发送该第二视频数据的部分或全部数据。In response to the user's third operation of selecting the video summary information, the client may request the server to play the second video data in the form of a URL (carrying the identifier of the second video data, such as ID, etc.), and after the server receives the request , part or all of the second video data can be sent to the client.
After buffering part or all of the second video data, the client may call the video player provided by the operating system to play the second video data; that is, the client generates a first user interface, displays the pictures of the second video data in the first user interface, and drives the speaker to play the audio of the second video data.
In this embodiment, aggregating the second video data by audiovisual element makes it possible to push, in a concentrated manner, second video data the user is likely to enjoy, which reduces the operations the user needs to perform to search for similar video data by keywords, page turning and the like, reduces the time spent on such searches, and reduces the client-side and server-side resources (such as processor, memory and bandwidth resources) occupied by those search operations, thereby improving the efficiency with which the user browses the second video data.
步骤306、接收作用于第一视频数据的第四操作。Step 306: Receive a fourth operation acting on the first video data.
While the first video data is being played, if the user is not interested in the first video data, the user may trigger a fourth operation on the first video data through a human-computer interaction tool provided by the computer device, so that the first user interface switches to playing other first video data.
In implementation, different types of computer devices provide different human-computer interaction tools, and accordingly the manner of triggering the fourth operation through the human-computer interaction tool also differs; this embodiment does not limit the manner in which the fourth operation is triggered through the human-computer interaction tool.
For example, if the human-computer interaction tool provided by the computer device is a touch screen, then when the touch screen detects a touch operation (such as a sliding operation) in a blank area of the first user interface (an area other than operable data such as controls and element summary information), it is determined that a fourth operation acting on the first video data is received.
For another example, if the human-computer interaction tool provided by the computer device is an external device, then when a key event (such as a drag event) sent by the external device and occurring in a blank area of the first user interface (an area other than operable data such as controls and element summary information) is received, it is determined that a fourth operation acting on the first video data is received. The external device includes, but is not limited to, a mouse, a remote control, and the like.
步骤307、响应于第四操作,播放与当前用户适配的其他第一视频数据,或者,包含其他视听元素的其他第一视频数据。Step 307: In response to the fourth operation, play other first video data adapted to the current user, or other first video data including other audiovisual elements.
If the first video data currently being played is video data pushed in a personalized manner, that is, the first video data is adapted to the current user (represented by an ID or the like), then in response to the user's fourth operation of switching the first video data, the client may request, for example in the form of a URL, that the server play other first video data adapted to the current user; after receiving the request, the server may send part or all of the other first video data adapted to the current user to the client.
After buffering part or all of the other first video data adapted to the current user, the client may call the video player provided by the operating system to play it; that is, the client switches the first user interface to display the pictures of the other first video data adapted to the current user and drives the speaker to switch to playing its audio.
If the first video data currently being played is video data pushed in a non-personalized manner, the current first video data contains another audiovisual element, and the first video data belongs to the video set corresponding to that audiovisual element, that is, the set of video data containing the same audiovisual element; for ease of distinction, this set is referred to as the second video set in this embodiment.
In response to the user's fourth operation of switching the first video data, the client may request the server to play other first video data in the second video set; after receiving the request, the server may send part or all of the other first video data in the second video set to the client.
After buffering part or all of the other first video data in the second video set, the client may call the video player provided by the operating system to play it; that is, the client switches the first user interface to display the pictures of the other first video data in the second video set and drives the speaker to switch to playing its audio.
At this time, the user may trigger a return operation on the return control in the first user interface through a touch operation or the like; the client receives the return operation acting on the return control and, in response to the return operation, displays the second user interface, in which the video summary information of the first video data in the second video set is displayed.
In this embodiment, the type of the first video data is distinguished, and other first video data adapted to the user or other first video data in the second video set is pushed for personalized and non-personalized business scenarios respectively, which ensures the accuracy of switching the first video data and meets the requirements of the business scenarios.
实施例三 Embodiment 3
FIG. 4 is a flowchart of a method for producing video data according to Embodiment 3 of the present application. This embodiment is applicable to providing audiovisual elements of existing video data for producing new video data. The method may be executed by an apparatus for producing video data; the apparatus may be implemented in software and/or hardware and may be configured in a computer device, for example, a server, a workstation, and so on. The method includes the following steps:
步骤401、将第一视频数据发送至客户端。Step 401: Send the first video data to the client.
In this embodiment, the operating system of the computer device may include Unix, Linux, Windows Server, NetWare, and the like. These operating systems support running a server, which is configured to provide video services to multiple clients, such as pushing video data, publishing video data, and so on.
服务端可通过个性化或非个性化的方式确定第一视频数据,向客户端发送该第一视频数据的部分或全部数据。The server may determine the first video data in a personalized or non-personalized manner, and send part or all of the first video data to the client.
If the first video data is produced by mixing a piece of video data with one or more audiovisual elements, the server may send element summary information of the one or more audiovisual elements to the client, the element summary information indicating the audiovisual elements contained in the first video data.
对于第一视频数据、元素摘要信息,客户端设置为在播放第一视频数据时,在第一用户界面显示该一个或多个元素摘要信息。For the first video data and element summary information, the client is configured to display the one or more element summary information on the first user interface when playing the first video data.
在本申请的一个实施例中,通过多目标优化算法向当前客户端推送个性化的第一视频数据,则在本实施例中,步骤401可以包括如下步骤:In an embodiment of the present application, the personalized first video data is pushed to the current client through a multi-objective optimization algorithm. In this embodiment, step 401 may include the following steps:
步骤4011、获取用户浏览视频数据时记录的历史数据。Step 4011: Acquire historical data recorded when the user browses the video data.
用户(以ID等表示)在客户端浏览视频数据,服务端将在这个浏览过程中的信息记录在日志文件中,并存储在数据库中。The user (represented by ID, etc.) browses the video data on the client side, and the server side records the information during the browsing process in a log file and stores it in the database.
服务端针对该用户,可以在数据库的日志文件中查询该用户浏览视频数据时记录的历史数据,等待筛选与该用户适配的视频数据。For the user, the server can query the historical data recorded when the user browses the video data in the log file of the database, and wait to filter the video data suitable for the user.
步骤4012、从历史数据中提取特征,作为行为特征。Step 4012: Extract features from historical data as behavior features.
在特征的维度下,可从用户的历史数据中提取多种类型的特征,作为行为特征。Under the dimension of features, various types of features can be extracted from the user's historical data as behavioral features.
在一个示例中,该行为特征可以包括如下的至少一种:In one example, the behavioral characteristic may include at least one of the following:
1、用户特征1. User characteristics
在本示例中,可从历史数据中采集用户的特征,作为用户特征。In this example, user characteristics may be collected from historical data as user characteristics.
一方面,该用户特征包括用户固有的特征,例如,ID(即用户标识(User ID,UID))、性别、年龄、国家,等等。In one aspect, the user characteristics include characteristics inherent to the user, eg, ID (ie, User ID (UID)), gender, age, country, and the like.
另一方面,该用户特征包括用户动态的特征,例如,最近一段时间内的观看行为,最近一段时间内的互动行为,最近一段时间内对多种类型的视频数据的偏好度,等等。On the other hand, the user features include user dynamic features, for example, viewing behaviors in a recent period of time, interaction behaviors in a recent period of time, preferences for multiple types of video data in a recent period of time, and so on.
2、视频特征2. Video Features
在本示例中,可从历史数据中采集视频数据的特征,作为视频特征。In this example, features of video data may be collected from historical data as video features.
一方面,该视频特征包括视频数据固有的特征,例如,ID(即视频标识(Video ID,VID))、长度、标签、拍客(制作该视频数据的用户)的UID,等等。In one aspect, the video features include features inherent to the video data, such as ID (ie, Video ID (VID)), length, tag, UID of the photographer (the user who made the video data), and the like.
另一方面,该视频特征包括视频数据动态的特征,例如,最近一段时间内推送给用户的次数,最近一段时间内被观看的次数,最近一段时间内被点赞的次数,等等。On the other hand, the video features include dynamic features of the video data, for example, the number of times pushed to users in a recent period of time, the number of times it was viewed in a recent period of time, the number of times it was liked in a recent period of time, and so on.
3、上下文特征3. Contextual Features
在本示例中,可从历史数据中采集用户浏览视频数据时所处环境的特征,作为上下文特征,例如,请求浏览视频数据的时间,请求浏览视频数据的地点,请求浏览视频数据的网络状况,等等。In this example, the characteristics of the environment where the user browses the video data can be collected from the historical data, as the context characteristics, for example, the time of requesting to browse the video data, the location of the request to browse the video data, the network status of the request to browse the video data, etc.
4、交叉特征4. Cross feature
在本示例中,可将用户特征、视频特征与上下文特征中的至少两者进行组合,获得交叉特征,从而增加特征的维度。In this example, at least two of the user feature, the video feature, and the context feature may be combined to obtain a cross feature, thereby increasing the dimension of the feature.
例如,将用户的UID与视频数据的标签组合为交叉特征,将用户的ID与拍客的UID组合为交叉特征,等等。For example, combine the user's UID with the tag of the video data as a cross feature, combine the user's ID with the photographer's UID as a cross feature, and so on.
上述行为特征只是作为示例,在实施本申请实施例时,可以根据实际情况设置其他行为特征,本申请实施例对此不加以限制。The above behavioral features are only examples. When implementing the embodiments of the present application, other behavioral features may be set according to actual situations, which are not limited in the embodiments of the present application.
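The feature families above are described only informally; the sketch below shows one plausible way to assemble them for a single (user, video, context) triple. All field names are hypothetical and the string-keyed layout is an assumption, not something this document specifies.

```python
def build_behavior_features(user, video, context):
    """Assemble user, video, context and cross features for one (user, video) pair.

    `user`, `video` and `context` are plain dicts with hypothetical field names;
    the real feature schema is not specified in this document.
    """
    return {
        # user features: inherent attributes plus recent behaviour statistics
        "uid": user["uid"],
        "user_age": user["age"],
        "user_country": user["country"],
        "user_recent_watch_count": user["recent_watch_count"],
        # video features: inherent attributes plus recent exposure statistics
        "vid": video["vid"],
        "video_length": video["length"],
        "video_tag": video["tag"],
        "video_recent_like_count": video["recent_like_count"],
        # context features: environment at the time of the browse request
        "request_hour": context["hour"],
        "request_network": context["network"],
        # cross features: combinations that add interaction dimensions
        "uid_x_video_tag": f'{user["uid"]}_{video["tag"]}',
        "uid_x_author_uid": f'{user["uid"]}_{video["author_uid"]}',
    }
```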
步骤4013、使用行为特征预测用户对视频数据执行多个目标行为分别对应的多个概率。Step 4013: Use the behavior feature to predict multiple probabilities corresponding to the user performing multiple target behaviors on the video data.
In the dimension of targets, according to business requirements, attention is paid not only to whether the user clicks on the video data, but also to how long the video data is played after the click and whether there is further interaction with the video data, such as liking, commenting, and sharing.
In this embodiment, a multi-task learning model may be set up. The multi-task learning model can be used to calculate the probabilities that the user performs multiple (two or more) target behaviors (such as clicking, play duration, liking, commenting, sharing, favoriting, following, etc.) on the video data. The probability is expressed as follows:
$p(u_i, v_j, t)$

where $u_i$ denotes the $i$-th user, $v_j$ denotes the $j$-th video data, and $t$ is the current moment; this probability is abbreviated as $p_{i,j}$.
In this embodiment, to encourage the business of users producing content (that is, making new video data), a target behavior representing the conversion from consumption to production is added: requesting other video data that contains the same audiovisual element as the video data, and producing new video data containing that audiovisual element. For this target behavior, reference may be made to steps 101-105.
Then, when training the multi-task learning model, if the user requested other video data through an audiovisual element of a piece of video data and produced new video data containing that audiovisual element, that video data can be set as a positive sample, while negative samples are video data that were browsed without the audiovisual element being triggered.
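As a minimal illustration of this labelling rule (record field names are assumptions of the sketch), a browsing record becomes a positive sample only if the user went on to create a new video through the audiovisual element:

```python
def label_training_samples(browse_records):
    """Label browsing records for the consumption-to-production target behaviour.

    A record is positive if the user requested other videos through the
    audiovisual element and produced a new video containing it; records that
    were merely viewed are negative. Field names are illustrative only.
    """
    samples = []
    for record in browse_records:
        label = 1 if record.get("created_video_with_element") else 0
        samples.append((record["uid"], record["vid"], label))
    return samples
```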
The multi-task learning model may be a neural network, such as a deep neural network (Deep Neural Networks, DNN), or another machine learning model, such as a logistic regression (LR) model or a click-through-rate (CTR) model; this embodiment does not limit the type of the multi-task learning model.
The multi-task learning model can be trained based on multi-task learning. Multi-task learning is a learning method based on inductive transfer, in which multiple targets (such as the target behaviors in this embodiment) are learned together; the information shared among related targets, and even the noise introduced by unrelated targets, can improve the generalization ability of the multi-task learning model to a certain extent.
Multi-task learning belongs to the category of transfer learning. Its main difference from ordinary transfer learning is that it improves the model by learning multiple targets (such as the target behaviors in this embodiment) together, whereas ordinary transfer learning uses other targets to improve the learning effect on a single target.
In implementation, a model based on parameter sharing may be adopted as the multi-task learning model. Taking a neural network as an example, as shown in FIG. 5, the multi-task learning model receives the same input (Input), the bottom layers of the network share model parameters, the multiple target behaviors (such as Task1, Task2, Task3, Task4) learn from one another, and the gradients are back-propagated simultaneously, which can improve the generalization ability of the multi-task learning model.
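A minimal sketch of the parameter-sharing structure of FIG. 5, assuming PyTorch (the document names no framework): a shared bottom network feeds one small head per target behavior, and the per-task losses are summed so the gradients back-propagate through the shared layers together.

```python
import torch
from torch import nn

class SharedBottomMultiTaskModel(nn.Module):
    """Shared-bottom multi-task model: one shared encoder, one head per target behaviour."""

    def __init__(self, input_dim, hidden_dim=64, num_tasks=4):
        super().__init__()
        # bottom layers whose parameters are shared by every task
        self.shared = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # one small head per target behaviour (click, like, share, create-from-element, ...)
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, 1) for _ in range(num_tasks)])

    def forward(self, x):
        shared = self.shared(x)
        # each head outputs the probability of one target behaviour
        return [torch.sigmoid(head(shared)).squeeze(-1) for head in self.heads]

def training_step(model, x, labels, optimizer):
    """Joint step: per-task losses are added so all gradients flow through the shared bottom."""
    preds = model(x)
    loss = sum(nn.functional.binary_cross_entropy(p, y) for p, y in zip(preds, labels))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```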
步骤4014、将多个概率融合为视频数据对于用户的质量值。Step 4014 , fuse the multiple probabilities into a quality value of the video data for the user.
参考用户对同一视频数据执行多个目标行为的多个概率,可以评估该视频数据对于用户的质量值,该质量值可用于表示在目标的维度下,用户喜好该视频数据的程度。With reference to multiple probabilities of the user performing multiple target actions on the same video data, the quality value of the video data for the user can be evaluated, and the quality value can be used to indicate the degree of the user's preference for the video data under the target dimension.
一般情况下,该质量值与概率正相关,即概率越高,则质量值越大,概率越低,则质量值越小。In general, the quality value is positively correlated with the probability, that is, the higher the probability, the greater the quality value, and the lower the probability, the smaller the quality value.
在一个示例中,可以通过线性融合的方式将多个概率融合为视频数据对于用户的质量值,对每个概率配置特征权重,此时,特征权重越大、该目标行为越重要。In one example, multiple probabilities can be fused into a quality value of video data for the user by means of linear fusion, and feature weights are configured for each probability. In this case, the larger the feature weight, the more important the target behavior is.
计算每个概率与所述每个概率对应的特征权重之间的乘积,作为特征值,计算所有特征值的和值,作为视频数据对于用户的质量值。The product between each probability and the feature weight corresponding to each probability is calculated as the feature value, and the sum of all the feature values is calculated as the quality value of the video data for the user.
Denote the set of target behaviors as $O = \{O_1, O_2, \ldots, O_k\}$, and denote the predicted probability that user $u_i$ performs target behavior $O_l$ on video data $v_j$ as $p_{i,j,l}$. Then the quality value of video data $v_j$ for user $u_i$ is

$$q_{i,j} = \sum_{l=1}^{k} w_l \cdot p_{i,j,l}$$

where $w_l$ is the feature weight.
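Read literally, the fused quality value $q_{i,j}$ is just the weighted sum above; a minimal sketch, with purely hypothetical weights, could look like:

```python
def quality_value(probabilities, weights):
    """Fuse per-behaviour probabilities p_{i,j,l} into a single quality value q_{i,j}.

    `probabilities[l]` is the predicted probability that the user performs
    target behaviour O_l on the video; `weights[l]` is its feature weight w_l.
    """
    return sum(w * p for w, p in zip(weights, probabilities))

# example with hypothetical weights for click / finish / like / create-from-element
q = quality_value([0.62, 0.40, 0.08, 0.03], [0.3, 0.3, 0.2, 0.2])
```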
步骤4015、若质量值满足预设的召回条件,则将质量值所属的视频数据设置为与用户适配的第一视频数据。Step 4015: If the quality value satisfies the preset recall condition, set the video data to which the quality value belongs as the first video data adapted to the user.
在本实施例中,可以预先设置召回条件,例如,数值最高的n(n为正整数)个质量值,质量值大于阈值,数值最高的m%(m为正数)个质量值,等等。In this embodiment, recall conditions can be preset, for example, n (n is a positive integer) quality values with the highest numerical value, the quality value is greater than the threshold, m% (m is a positive number) quality values with the highest numerical value, etc. .
If the quality value of the current video data meets the recall condition, the video data to which the quality value belongs is set as the first video data adapted to the user; at this time, the association between the user's identifier and the identifier of the first video data may be recorded.
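As a hedged sketch of this recall step, assuming the "top-n quality values" form of the condition and an in-memory mapping in place of whatever storage the real service uses:

```python
def recall_first_videos(quality_by_video, n=100):
    """Pick the n candidate videos with the highest quality values for one user.

    `quality_by_video` maps video id -> quality value q_{i,j}; the returned ids
    would be stored as the user's adapted "first video data" associations.
    """
    ranked = sorted(quality_by_video.items(), key=lambda item: item[1], reverse=True)
    return [vid for vid, _ in ranked[:n]]

# the user id -> recalled video ids mapping can then be persisted for online lookup
user_recall_table = {"user_42": recall_first_videos({"v1": 0.91, "v2": 0.35, "v3": 0.77}, n=2)}
```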
步骤4016、将第一视频数据发送至客户端。Step 4016: Send the first video data to the client.
In this embodiment, steps 4011-4015 may be performed offline. For the online case, if a user (represented by an ID or the like) is currently logged in to a client, the user's identifier can be used as the search condition to search for the identifiers of the first video data associated with that user's identifier; the first video data is then looked up based on those identifiers and sent to the client.
步骤402、当接收到客户端基于元素摘要信息触发的请求时,查找包含视听元素的第二视频数据。Step 402: When a request triggered by the client based on the element summary information is received, search for second video data containing audiovisual elements.
客户端在接收到作用于元素摘要信息的第一操作时,生成请求,并将该请求发送至服务端,请求服务端推送包含该元素摘要信息对应的视听元素的第二视频数据。When receiving the first operation acting on the element summary information, the client generates a request, sends the request to the server, and requests the server to push the second video data including the audiovisual element corresponding to the element summary information.
服务端可通过多种方式收集视频数据并对该视频数据标记其包含的视听元素(以ID等表示),存储在服务端本地的数据库中。The server can collect video data in various ways, mark the video data with audiovisual elements (represented by ID, etc.), and store the video data in the local database of the server.
服务端在接收到客户端的请求时,可搜索标记该视听元素的视频数据,作为第二视频数据,并将该第二视频数据写入第一视频集合中。When receiving the request from the client, the server can search for the video data marked with the audiovisual element as the second video data, and write the second video data into the first video set.
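A toy sketch of this server-side bookkeeping, with in-memory dictionaries standing in for whatever database the real service uses: videos are stored together with the IDs of the audiovisual elements they are marked with, and a request carrying an element ID returns the matching first video set.

```python
from collections import defaultdict

# element id -> ids of videos marked as containing that audiovisual element
videos_by_element = defaultdict(set)

def mark_video(video_id, element_ids):
    """Record which audiovisual elements a stored video is marked as containing."""
    for element_id in element_ids:
        videos_by_element[element_id].add(video_id)

def find_second_videos(element_id):
    """Return the first video set: all videos tagged with the requested element."""
    return set(videos_by_element.get(element_id, set()))

mark_video("video_001", ["music_77"])
mark_video("video_002", ["music_77", "clip_12"])
assert find_second_videos("music_77") == {"video_001", "video_002"}
```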
步骤403、将第二视频数据的视频摘要信息发送至客户端。Step 403: Send the video summary information of the second video data to the client.
The server extracts, from its local database, the video summary information (such as the cover, the name, the producer, etc.) of the second video data in the first video set, and sends the video summary information to the client.
对于视频摘要信息,客户端可设置为在第二用户界面共同显示视频摘要信息、制作控件。For the video summary information, the client may be configured to jointly display the video summary information and production controls on the second user interface.
步骤404、当接收到客户端基于制作控件触发的请求时,将元素摘要信息对应的视听元素发送至客户端。Step 404: When a request triggered by the client based on the production control is received, send the audiovisual element corresponding to the element summary information to the client.
客户端在接收到作用于制作控件的第二操作时,生成请求,并将该请求发送至服务端,请求服务端推送该元素摘要信息(以ID等表示)对应的视听元素。When receiving the second operation acting on the production control, the client generates a request, sends the request to the server, and requests the server to push the audiovisual element corresponding to the element abstract information (represented by ID, etc.).
服务端接收到该请求之后,查找该元素摘要信息(以ID等表示)对应的、且独立的视听元素,并将该视听元素发送至客户端。After receiving the request, the server searches for an independent audiovisual element corresponding to the element abstract information (represented by ID, etc.), and sends the audiovisual element to the client.
对于该视听元素,客户端可设置为采集第三视频数据,将元素摘要信息对应的视听元素添加至第三视频数据中。For the audiovisual element, the client may be configured to collect third video data, and add the audiovisual element corresponding to the element summary information to the third video data.
In one example, the audiovisual element includes audio data. In this example, the audio data corresponding to the element summary information may be sent to the client, and the client may be configured to set the audio data corresponding to the element summary information as the background music of the third video data.
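One concrete, though not prescribed, way to realise "set the audio data as background music" is to remux the captured clip with the element's audio track; the ffmpeg invocation below is an assumption of this sketch, not something the document mandates.

```python
import subprocess

def set_background_music(video_path, music_path, output_path):
    """Mux the element's audio onto the third video data as background music.

    Keeps the original video stream, takes the audio stream from the music
    file, and stops at the shorter of the two.
    """
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-i", video_path,   # captured third video data
            "-i", music_path,   # audio data of the audiovisual element
            "-map", "0:v:0",    # video from the first input
            "-map", "1:a:0",    # audio from the second input
            "-c:v", "copy",
            "-c:a", "aac",
            "-shortest",
            output_path,
        ],
        check=True,
    )
```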
In another example, the audiovisual element includes fourth video data. In this example, the fourth video data corresponding to the element summary information may be sent to the client, and the client may be configured to combine the fourth video data corresponding to the element summary information with the third video data in a split-screen manner.
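For the split-screen case, a rough sketch using OpenCV (again an assumption rather than the document's own method): each pair of frames is scaled to a common height and placed side by side, echoing FIG. 2F where the captured video sits on the left and the element's video on the right.

```python
import cv2

def compose_split_screen(left_path, right_path, output_path, height=720):
    """Place the captured third video (left) and the element's fourth video (right) side by side."""
    left = cv2.VideoCapture(left_path)
    right = cv2.VideoCapture(right_path)
    fps = left.get(cv2.CAP_PROP_FPS) or 30.0
    writer = None
    while True:
        ok_left, frame_left = left.read()
        ok_right, frame_right = right.read()
        if not (ok_left and ok_right):
            break  # stop at the end of the shorter clip
        # scale both frames to a common height, preserving aspect ratio
        frame_left = cv2.resize(frame_left, (frame_left.shape[1] * height // frame_left.shape[0], height))
        frame_right = cv2.resize(frame_right, (frame_right.shape[1] * height // frame_right.shape[0], height))
        combined = cv2.hconcat([frame_left, frame_right])
        if writer is None:
            writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps,
                                     (combined.shape[1], combined.shape[0]))
        writer.write(combined)
    left.release()
    right.release()
    if writer is not None:
        writer.release()
```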
If the client finishes producing the third video data, the client may upload the third video data; the server then receives the third video data sent by the client and marks the third video data as containing the audiovisual element (represented by an ID or the like). Once the marking is completed, the third video data is published, so that other clients can browse the third video data.
由于本实施例与实施例一的应用基本相似,所以描述的比较简单,相关之处参见实施例一的部分说明即可,本实施例在此不加以详述。Since the application of this embodiment is basically similar to that of the first embodiment, the description is relatively simple, and the relevant parts may refer to the partial description of the first embodiment, and this embodiment will not be described in detail here.
In this embodiment, the first video data is sent to the client; the client is configured to display element summary information when playing the first video data, the element summary information indicating the audiovisual elements contained in the first video data. When a request triggered by the client based on the element summary information is received, second video data containing the audiovisual element is searched for, and the video summary information of the second video data is sent to the client; the client is configured to display the video summary information and a production control together. When a request triggered by the client based on the production control is received, the audiovisual element corresponding to the element summary information is sent to the client; the client is configured to collect third video data and add the audiovisual element corresponding to the element summary information to the third video data. On the one hand, users can produce new video data with audiovisual elements taken from existing video data; such elements do not depend on templates provided by the system and come from diverse channels, so the individuality of the audiovisual elements, and therefore of the newly produced video data, is preserved. On the other hand, because the system supplies the audiovisual element, its format is guaranteed to meet the production specifications and it can be used directly to produce new video data, so users do not have to correct the element with professional applications. This greatly lowers the technical threshold and the time required, and thus reduces the cost of producing video data.
实施例四 Embodiment 4
FIG. 6 is a flowchart of a method for producing video data according to Embodiment 4 of the present application. Based on the foregoing embodiments, this embodiment adds the operations of switching the first video data and playing the second video data. The method includes the following steps:
步骤601、将第一视频数据发送至客户端。Step 601: Send the first video data to the client.
对于第一视频数据,客户端设置为在播放第一视频数据时,显示元素摘要信息,其中,元素摘要信息表示第一视频数据包含的视听元素。For the first video data, the client is configured to display element summary information when the first video data is played, where the element summary information indicates audiovisual elements included in the first video data.
步骤602、当接收到客户端基于元素摘要信息触发的请求时,查找包含视听元素的第二视频数据。Step 602: When a request triggered by the client based on the element summary information is received, search for second video data containing audiovisual elements.
步骤603、将第二视频数据的视频摘要信息发送至客户端。Step 603: Send the video summary information of the second video data to the client.
对于第二视频数据的视频摘要信息,客户端设置为共同显示视频摘要信息、制作控件。For the video summary information of the second video data, the client is set to jointly display the video summary information and the production controls.
步骤604、当接收到客户端基于视频摘要信息触发的请求时,将视频摘要信息所属的第二视频数据发送至客户端进行播放。Step 604: When a request triggered by the client based on the video summary information is received, send the second video data to which the video summary information belongs to the client for playback.
客户端在接收到作用于视频摘要信息的第三操作时,生成请求,并将该请求发送至服务端,请求服务端推送包含该视频摘要信息对应的第二视频数据。When receiving the third operation acting on the video summary information, the client generates a request, sends the request to the server, and requests the server to push the second video data corresponding to the video summary information.
服务端在接收到客户端的请求时,可搜索该视频摘要信息对应的第二视频数据,并将该第二视频数据的部分或全部数据发送至客户端。When receiving the request from the client, the server can search for the second video data corresponding to the video summary information, and send part or all of the second video data to the client.
客户端在缓存该第二视频数据的部分或全部数据之后,可调用操作系统提供的视频播放器播放该第二视频数据。After buffering part or all of the second video data, the client can call the video player provided by the operating system to play the second video data.
In this embodiment, aggregating the second video data by audiovisual element makes it possible to push, in a concentrated manner, second video data the user is likely to enjoy, which reduces the operations the user needs to perform to search for similar video data by keywords, page turning and the like, reduces the time spent on such searches, and reduces the client-side and server-side resources (such as processor, memory and bandwidth resources) occupied by those search operations, thereby improving the efficiency with which the user browses the second video data.
步骤605、当接收到客户端基于第一视频数据触发的请求时,将与用户适配的其他第一视频数据发送至客户端进行播放,或者,将包含其他视听元素的其他第一视频数据发送至客户端进行播放。Step 605: When receiving a request triggered by the client based on the first video data, send other first video data adapted to the user to the client for playback, or send other first video data containing other audiovisual elements to the client for playback.
客户端在接收到作用于第一视频数据的第四操作时,生成请求,并将该请求发送至服务端,请求服务端推送其他第一视频数据。When receiving the fourth operation acting on the first video data, the client generates a request, sends the request to the server, and requests the server to push other first video data.
服务端在接收到该请求时,识别第一视频数据的类型,从而区分推送不同的第一视频数据。When receiving the request, the server identifies the type of the first video data, so as to distinguish and push different first video data.
If the first video data currently being played is video data pushed in a personalized manner, that is, it is adapted to the current user (represented by an ID or the like), the server may send part or all of other first video data adapted to the current user to the client.
客户端在缓存与当前用户适配的其他第一视频数据的部分或全部数据之后,可调用操作系统提供的视频播放器播放与当前用户适配的其他第一视频数据。After buffering part or all of the other first video data adapted to the current user, the client can call the video player provided by the operating system to play the other first video data adapted to the current user.
If the first video data currently being played is video data pushed in a non-personalized manner, the current first video data contains another audiovisual element, and the first video data comes from the second video set corresponding to that audiovisual element, the server may send part or all of other first video data in the second video set to the client.
客户端在缓存该第二视频集合中的其他第一视频数据的部分或全部数据之后,可调用操作系统提供的视频播放器播放该第二视频集合中的其他第一视频数据。After buffering part or all of the other first video data in the second video set, the client can call the video player provided by the operating system to play the other first video data in the second video set.
In this embodiment, the type of the first video data is distinguished, and other first video data adapted to the user or other first video data in the second video set is pushed for personalized and non-personalized business scenarios respectively, which ensures the accuracy of switching the first video data and meets the requirements of the business scenarios.
由于本实施例与实施例二的应用基本相似,所以描述的比较简单,相关之处参见实施例二的部分说明即可,本实施例在此不加以详述。Since the application of this embodiment is basically similar to that of the second embodiment, the description is relatively simple, and the relevant parts may refer to the partial description of the second embodiment, and this embodiment will not be described in detail here.
The method embodiments are described as a series of action combinations for simplicity of description, but the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in other orders or simultaneously. Moreover, the embodiments described herein are all examples, and the actions involved are not necessarily required by the embodiments of the present application.
实施例五Embodiment 5
图7为本申请实施例五提供的一种视频数据的制作装置的结构框图,可以包括如下模块:7 is a structural block diagram of an apparatus for producing video data according to Embodiment 5 of the present application, which may include the following modules:
A display screen 701, configured to display element summary information when first video data is played, the element summary information indicating audiovisual elements contained in the first video data; a touch screen 702, configured to receive a first operation acting on the element summary information; the display screen 701 is further configured to, in response to the first operation, display video summary information of second video data and a production control together, the second video data containing the audiovisual element; the touch screen 702 is further configured to receive a second operation acting on the production control; a camera 703, configured to collect third video data in response to the second operation; and a processor 704, configured to add the audiovisual element corresponding to the element summary information to the third video data.
本申请实施例所提供的视频数据的制作装置可执行本申请任意实施例所提供的视频数据的制作方法,具备执行方法相应的功能模块和效果。The apparatus for producing video data provided by the embodiment of the present application can execute the method for producing video data provided by any embodiment of the present application, and has functional modules and effects corresponding to the execution method.
实施例六Embodiment 6
图8为本申请实施例六提供的一种视频数据的制作装置的结构框图,可以包括如下模块:8 is a structural block diagram of an apparatus for producing video data according to Embodiment 6 of the present application, which may include the following modules:
A first video data sending module 801, configured to send first video data to a client, the client being configured to display element summary information when playing the first video data, the element summary information indicating audiovisual elements contained in the first video data; a second video data search module 802, configured to, upon receiving a request triggered by the client based on the element summary information, search for second video data containing the audiovisual element; a video summary information sending module 803, configured to send video summary information of the second video data to the client, the client being configured to display the video summary information and a production control together; and an audiovisual element sending module 804, configured to, upon receiving a request triggered by the client based on the production control, send the audiovisual element corresponding to the element summary information to the client, the client being configured to collect third video data and add the audiovisual element corresponding to the element summary information to the third video data.
本申请实施例所提供的视频数据的制作装置可执行本申请任意实施例所提供的视频数据的制作方法,具备执行方法相应的功能模块和效果。The apparatus for producing video data provided by the embodiment of the present application can execute the method for producing video data provided by any embodiment of the present application, and has functional modules and effects corresponding to the execution method.
实施例七Embodiment 7
图9为本申请实施例七提供的一种计算机设备的结构示意图。图9示出了适于用来实现本申请实施方式的示例性计算机设备12的框图。FIG. 9 is a schematic structural diagram of a computer device according to Embodiment 7 of the present application. Figure 9 shows a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present application.
如图9所示,计算机设备12以通用计算设备的形式表现。计算机设备12的组件可以包括但不限于:一个或者多个处理器或者处理单元16,系统存储器28,连接不同系统组件(包括系统存储器28和处理单元16)的总线18。As shown in FIG. 9, computer device 12 takes the form of a general-purpose computing device. Components of computer device 12 may include, but are not limited to, one or more processors or processing units 16 , system memory 28 , and a bus 18 connecting various system components including system memory 28 and processing unit 16 .
系统存储器28可以包括易失性存储器形式的计算机系统可读介质,例如随机存取存储器(Random Access Memory,RAM)30和/或高速缓存存储器32。存储系统34可以设置为读写不可移动的、非易失性磁介质(图9未显示,通常称为“硬盘驱动器”)。 System memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32 . Storage system 34 may be configured to read and write to non-removable, non-volatile magnetic media (not shown in Figure 9, commonly referred to as a "hard disk drive").
具有一组(至少一个)程序模块42的程序/实用工具40,可以存储在例如存储器28中。A program/utility 40 having a set (at least one) of program modules 42 may be stored in memory 28, for example.
计算机设备12也可以与一个或多个外部设备14(例如键盘、指向设备、显示器24等)通信。这种通信可以通过输入/输出(Input/Output,I/O)接口22进行。并且,计算机设备12还可以通过网络适配器20与一个或者多个网络(例如局域网(Local Area Network,LAN),广域网(Wide Area Network,WAN)和/或公共网络,例如因特网)通信。如图所示,网络适配器20通过总线18与计算机设备12的其它模块通信。 Computer device 12 may also communicate with one or more external devices 14 (eg, keyboard, pointing device, display 24, etc.). Such communication may take place through an input/output (I/O) interface 22 . Also, computer device 12 may communicate with one or more networks (eg, Local Area Network (LAN), Wide Area Network (WAN), and/or public networks such as the Internet) through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18 .
处理单元16通过运行存储在系统存储器28中的程序,从而执行多种功能应用以及数据处理,例如实现本申请实施例所提供的视频数据的制作方法。The processing unit 16 executes a variety of functional applications and data processing by running the programs stored in the system memory 28, for example, implementing the video data production method provided by the embodiments of the present application.
实施例八 Embodiment 8
Embodiment 8 of the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the processes of the above method for producing video data are implemented and the same technical effects can be achieved; to avoid repetition, details are not repeated here.

Claims (17)

  1. 一种视频数据的制作方法,包括:A method for producing video data, comprising:
    在播放第一视频数据的情况下,显示元素摘要信息,其中,所述元素摘要信息用于表示所述第一视频数据包含的视听元素;In the case of playing the first video data, element summary information is displayed, wherein the element summary information is used to represent audiovisual elements included in the first video data;
    接收作用于所述元素摘要信息的第一操作;receiving a first operation acting on the element summary information;
    响应于所述第一操作,共同显示第二视频数据的视频摘要信息以及制作控件,其中,所述第二视频数据包含所述视听元素;in response to the first operation, collectively displaying video summary information and production controls for second video data, wherein the second video data includes the audiovisual element;
    接收作用于所述制作控件的第二操作;receiving a second operation acting on the production control;
    响应于所述第二操作,采集第三视频数据,将所述元素摘要信息对应的视听元素添加至所述第三视频数据中。In response to the second operation, third video data is collected, and audiovisual elements corresponding to the element summary information are added to the third video data.
  2. 根据权利要求1所述的方法,其中,所述共同显示第二视频数据的视频摘要信息以及制作控件,包括:The method according to claim 1, wherein the jointly displaying the video summary information and the production control of the second video data comprises:
    显示信息区域,其中,所述信息区域的面积与所述视听元素的类型匹配;displaying an information area, wherein the area of the information area matches the type of the audiovisual element;
    在所述信息区域中显示所述第二视频数据的视频摘要信息;displaying video summary information of the second video data in the information area;
    显示所述制作控件,其中,所述制作控件悬浮于所述信息区域上。The authoring control is displayed, wherein the authoring control is suspended on the information area.
  3. 根据权利要求2所述的方法,其中,所述显示信息区域,包括:The method of claim 2, wherein the displaying the information area comprises:
    在所述视听元素的类型为可视元素的情况下,显示第一区域,将所述第一区域作为所述信息区域;In the case that the type of the audiovisual element is a visual element, display a first area, and use the first area as the information area;
    在所述视听元素的类型为可听元素的情况下,显示第二区域,将所述第二区域作为所述信息区域;In the case that the type of the audiovisual element is an audible element, display a second area, and use the second area as the information area;
    其中,所述第一区域的面积大于所述第二区域的面积。Wherein, the area of the first region is larger than that of the second region.
  4. 根据权利要求2所述的方法,其中,所述元素摘要信息包括元素图像数据;The method of claim 2, wherein the element summary information includes element image data;
    所述共同显示第二视频数据的视频摘要信息以及制作控件,还包括:The jointly displaying the video summary information and the production control of the second video data further includes:
    以背景的形式显示所述元素图像数据;displaying the element image data in the form of a background;
    以标题的形式显示所述元素摘要信息。The element summary information is displayed in the form of a title.
  5. 根据权利要求1所述的方法,其中,所述视听元素包括音频数据;The method of claim 1, wherein the audiovisual element comprises audio data;
    所述将所述元素摘要信息对应的视听元素添加至所述第三视频数据中,包括:The adding the audiovisual element corresponding to the element summary information to the third video data includes:
    将所述元素摘要信息对应的音频数据设置为所述第三视频数据的背景音乐。The audio data corresponding to the element summary information is set as the background music of the third video data.
  6. 根据权利要求1所述的方法,其中,所述视听元素包括第四视频数据;The method of claim 1, wherein the audiovisual element comprises fourth video data;
    所述将所述元素摘要信息对应的视听元素添加至所述第三视频数据中,包括:The adding the audiovisual element corresponding to the element summary information to the third video data includes:
    以分屏的方式合成所述元素摘要信息对应的第四视频数据与所述第三视频数据。The fourth video data and the third video data corresponding to the element summary information are synthesized in a split-screen manner.
  7. 根据权利要求1所述的方法,还包括:The method of claim 1, further comprising:
    接收作用于所述视频摘要信息的第三操作;receiving a third operation acting on the video summary information;
    响应于所述第三操作,播放所述视频摘要信息所属的第二视频数据。In response to the third operation, second video data to which the video summary information belongs is played.
  8. 根据权利要求1-7中任一项所述的方法,还包括:The method according to any one of claims 1-7, further comprising:
    接收作用于所述第一视频数据的第四操作;receiving a fourth operation acting on the first video data;
    响应于所述第四操作,播放与当前用户适配的其他第一视频数据,或者,在所述第一视频数据包含其他视听元素的情况下播放包含所述其他视听元素的其他第一视频数据。In response to the fourth operation, play other first video data adapted to the current user, or play other first video data including the other audiovisual elements if the first video data includes other audiovisual elements .
  9. 根据权利要求1-7中任一项所述的方法,在所述将所述元素摘要信息对应的视听元素添加至所述第三视频数据中之后,还包括:The method according to any one of claims 1-7, after adding the audiovisual element corresponding to the element summary information to the third video data, further comprising:
    publishing the third video data marked as containing the audiovisual element.
  10. 一种视频数据的制作方法,包括:A method for producing video data, comprising:
    将第一视频数据发送至客户端,其中,所述客户端设置为在播放所述第一视频数据的情况下,显示元素摘要信息,所述元素摘要信息用于表示所述第一视频数据包含的视听元素;Send the first video data to the client, wherein the client is configured to display element summary information when the first video data is played, and the element summary information is used to indicate that the first video data contains audiovisual elements;
    在接收到所述客户端基于所述元素摘要信息触发的请求的情况下,查找包含所述视听元素的第二视频数据;in the case of receiving a request triggered by the client based on the element summary information, searching for second video data containing the audiovisual element;
    将所述第二视频数据的视频摘要信息发送至所述客户端,其中,所述客户端还设置为共同显示所述视频摘要信息以及制作控件;sending the video summary information of the second video data to the client, wherein the client is further configured to jointly display the video summary information and production controls;
    在接收到所述客户端基于所述制作控件触发的请求的情况下,将所述元素摘要信息对应的视听元素发送至所述客户端,其中,所述客户端还设置为采集第三视频数据,将所述元素摘要信息对应的视听元素添加至所述第三视频数据中。In the case of receiving the request triggered by the client based on the production control, send the audiovisual element corresponding to the element summary information to the client, wherein the client is further configured to collect third video data , adding the audiovisual element corresponding to the element summary information to the third video data.
  11. 根据权利要求10所述的方法,其中,所述将第一视频数据发送至客户端,包括:The method according to claim 10, wherein the sending the first video data to the client comprises:
    获取用户浏览视频数据的情况下记录的历史数据,其中,所述用户当前登 录于一客户端;Obtain the historical data recorded under the situation that the user browses the video data, wherein, the user is currently logged in to a client;
    从所述历史数据中提取特征,将提取的特征作为行为特征;Extract features from the historical data, and use the extracted features as behavioral features;
    使用所述行为特征预测所述用户对所述视频数据执行多个目标行为分别对应的多个概率,其中,所述目标行为包括请求与所述视频数据包含相同视听元素的其他视频数据,且制作新的、包含所述视听元素的视频数据;Using the behavior feature to predict a plurality of probabilities respectively corresponding to the user performing a plurality of target actions on the video data, wherein the target actions include requesting other video data containing the same audiovisual elements as the video data, and making new video data containing the audiovisual element;
    将所述多个概率融合为所述视频数据对于所述用户的质量值;fusing the plurality of probabilities into a quality value of the video data for the user;
    在所述质量值满足预设的召回条件的情况下,将所述质量值所属的视频数据设置为与所述用户适配的第一视频数据;In the case that the quality value satisfies a preset recall condition, setting the video data to which the quality value belongs as the first video data adapted to the user;
    将所述第一视频数据发送至所述客户端。Sending the first video data to the client.
  12. 根据权利要求11所述的方法,其中,所述行为特征包括用户特征、视频特征、上下文特征、交叉特征中的至少一种;The method according to claim 11, wherein the behavioral features include at least one of user features, video features, contextual features, and cross-features;
    所述从所述历史数据中提取特征,作为行为特征,包括:The extracting features from the historical data, as behavioral features, include:
    从所述历史数据中采集所述用户的特征,作为所述用户特征;Collect the user's characteristics from the historical data as the user's characteristics;
    从所述历史数据中采集所述视频数据的特征,作为所述视频特征;The features of the video data are collected from the historical data as the video features;
    从所述历史数据中采集所述用户浏览所述视频数据的情况下所处环境的特征,作为所述上下文特征;Collect the characteristics of the environment in which the user browses the video data from the historical data, as the context characteristics;
    将所述用户特征、所述视频特征与所述上下文特征中的至少两者进行组合,获得所述交叉特征。The intersection feature is obtained by combining at least two of the user feature, the video feature, and the context feature.
  13. 根据权利要求11-12中任一项所述的方法,其中,所述将所述多个概率融合为所述视频数据对于所述用户的质量值,包括:The method according to any one of claims 11-12, wherein the fusing the plurality of probabilities into a quality value of the video data for the user comprises:
    对每个概率配置特征权重;Configure feature weights for each probability;
    计算每个概率与所述每个概率对应的特征权重之间的乘积,将所述乘积作为所述每个概率对应的特征值;Calculate the product between each probability and the feature weight corresponding to each probability, and use the product as the feature value corresponding to each probability;
    计算所有特征值的和值,作为所述视频数据对于所述用户的质量值。The sum of all feature values is calculated as the quality value of the video data for the user.
  14. 一种视频数据的制作装置,包括:A device for producing video data, comprising:
    显示屏,设置为在播放第一视频数据的情况下,显示元素摘要信息,其中,所述元素摘要信息用于表示所述第一视频数据包含的视听元素;a display screen, configured to display element summary information when the first video data is played, wherein the element summary information is used to represent audiovisual elements included in the first video data;
    触控屏,设置为接收作用于所述元素摘要信息的第一操作;a touch screen, configured to receive a first operation acting on the element summary information;
    显示屏,还设置为响应于所述第一操作,共同显示第二视频数据的视频摘要信息以及制作控件,其中,所述第二视频数据包含所述视听元素;a display screen, further configured to jointly display video summary information and production controls of second video data in response to the first operation, wherein the second video data includes the audiovisual element;
    触控屏,还设置为接收作用于所述制作控件的第二操作;a touch screen, further configured to receive a second operation acting on the production control;
    摄像头,设置为响应于所述第二操作,采集第三视频数据;a camera, configured to collect third video data in response to the second operation;
    处理器,设置为将所述元素摘要信息对应的视听元素添加至所述第三视频数据中。The processor is configured to add the audiovisual element corresponding to the element summary information to the third video data.
  15. 一种视频数据的制作装置,包括:A device for producing video data, comprising:
    第一视频数据发送模块,设置为将第一视频数据发送至客户端,其中,所述客户端设置为在播放所述第一视频数据的情况下,显示元素摘要信息,所述元素摘要信息用于表示所述第一视频数据包含的视听元素;The first video data sending module is configured to send the first video data to the client, wherein the client is configured to display element summary information when the first video data is played, and the element summary information uses to represent the audiovisual elements contained in the first video data;
    第二视频数据查找模块,设置为在接收到所述客户端基于所述元素摘要信息触发的请求的情况下,查找包含所述视听元素的第二视频数据;A second video data search module, configured to search for the second video data containing the audiovisual element in the case of receiving a request triggered by the client based on the element summary information;
    视频摘要信息发送模块,设置为将所述第二视频数据的视频摘要信息发送至所述客户端,其中,所述客户端还设置为共同显示所述视频摘要信息以及制作控件;a video summary information sending module, configured to send the video summary information of the second video data to the client, wherein the client is further configured to jointly display the video summary information and the production control;
    视听元素发送模块,设置为在接收到所述客户端基于所述制作控件触发的请求的情况下,将所述元素摘要信息对应的视听元素发送至所述客户端,其中,所述客户端还设置为采集第三视频数据,将所述元素摘要信息对应的视听元素添加至所述第三视频数据中。The audiovisual element sending module is configured to send the audiovisual element corresponding to the element summary information to the client when receiving the request triggered by the client based on the production control, wherein the client also further Setting is to collect third video data, and add audiovisual elements corresponding to the element summary information to the third video data.
  16. A computer device, comprising:
    at least one processor;
    a memory, configured to store at least one program;
    wherein, when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the method for producing video data according to any one of claims 1-13.
  17. A computer-readable storage medium, configured to store a computer program, wherein, when the computer program is executed by a processor, the method for producing video data according to any one of claims 1-13 is implemented.
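
The quality value recited at the end of claim 13 above is a weighted sum of per-feature probabilities. The following minimal sketch illustrates the two arithmetic steps; it is not part of the claims, and the function name, variable names and example numbers are assumptions introduced here for illustration only:

    # Weighted sum as recited above: multiply each probability by its feature weight,
    # then sum the resulting feature values into a single quality value.
    def quality_value(probabilities, weights):
        return sum(p * w for p, w in zip(probabilities, weights))

    # Hypothetical example: three per-feature probabilities for one candidate video.
    quality_value([0.8, 0.3, 0.6], [0.5, 0.2, 0.3])  # ≈ 0.64 = 0.8*0.5 + 0.3*0.2 + 0.6*0.3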
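Claim 14 describes the client-side flow: a first operation on the element summary information brings up second video data and a production control, and a second operation on that control records third video data and attaches the audiovisual elements. A minimal sketch of that flow, again illustrative only, with every class and method name being a hypothetical placeholder:

    # Hypothetical client-side flow for the device of claim 14.
    class VideoClient:
        def __init__(self, display, camera, server):
            self.display = display  # display screen: shows summaries and controls
            self.camera = camera    # camera: collects the third video data
            self.server = server    # backend corresponding to the device of claim 15

        def on_element_summary_tapped(self, element_summary):
            # First operation: fetch second videos containing the element and show
            # their summaries together with a production control.
            second_videos = self.server.find_videos(element_summary)
            self.display.show(second_videos, production_control=True)

        def on_production_control_tapped(self, element_summary):
            # Second operation: collect third video data and add the audiovisual element.
            element = self.server.fetch_element(element_summary)
            third_video = self.camera.record()
            third_video.add_element(element)
            return third_video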
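Claim 15 mirrors the same interaction on the server side as four sending and searching modules. A corresponding sketch, with the storage objects and method names likewise assumed for illustration:

    # Hypothetical server-side counterpart to the device of claim 15;
    # each method mirrors one of the four modules.
    class VideoServer:
        def __init__(self, video_index, element_store):
            self.video_index = video_index      # maps element summaries to videos
            self.element_store = element_store  # holds the audiovisual elements

        def send_first_video(self, client, first_video):
            # First video data sending module.
            client.play(first_video)

        def find_videos(self, element_summary):
            # Second video data search module: second videos containing the element.
            return self.video_index.lookup(element_summary)

        def send_video_summaries(self, client, second_videos):
            # Video summary information sending module.
            client.show_summaries([v.summary for v in second_videos])

        def fetch_element(self, element_summary):
            # Audiovisual element sending module.
            return self.element_store.get(element_summary)
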
PCT/CN2021/108174 2020-08-31 2021-07-23 Method and apparatus for manufacturing video data, and computer device and storage medium WO2022042157A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010896513.0A CN112040339A (en) 2020-08-31 2020-08-31 Method and device for making video data, computer equipment and storage medium
CN202010896513.0 2020-08-31

Publications (1)

Publication Number Publication Date
WO2022042157A1 (en)

Family

ID=73586447

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/108174 WO2022042157A1 (en) 2020-08-31 2021-07-23 Method and apparatus for manufacturing video data, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN112040339A (en)
WO (1) WO2022042157A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112040339A (en) * 2020-08-31 2020-12-04 广州市百果园信息技术有限公司 Method and device for making video data, computer equipment and storage medium
CN113011566A (en) * 2021-03-30 2021-06-22 北京深演智能科技股份有限公司 Data processing method, electronic device and computer readable storage medium
CN114268815B (en) * 2021-12-15 2024-08-13 北京达佳互联信息技术有限公司 Video quality determining method, device, electronic equipment and storage medium
CN115103219A (en) * 2022-07-01 2022-09-23 抖音视界(北京)有限公司 Audio distribution method, device and computer readable storage medium
CN116916082B (en) * 2023-09-12 2023-12-08 华光影像科技有限公司 Film and television making interface switching system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109874053B (en) * 2019-02-21 2021-10-22 南京航空航天大学 Short video recommendation method based on video content understanding and user dynamic interest

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004200811A (en) * 2002-12-16 2004-07-15 Canon Inc Moving picture photographing apparatus
CN106375782A (en) * 2016-08-31 2017-02-01 北京小米移动软件有限公司 Video playing method and device
CN107959873A (en) * 2017-11-02 2018-04-24 深圳天珑无线科技有限公司 Method, apparatus, terminal and the storage medium of background music are implanted into video
CN108600825A (en) * 2018-07-12 2018-09-28 北京微播视界科技有限公司 Select method, apparatus, terminal device and the medium of background music shooting video
CN108668164A (en) * 2018-07-12 2018-10-16 北京微播视界科技有限公司 Select method, apparatus, terminal device and the medium of background music shooting video
CN108900768A (en) * 2018-07-12 2018-11-27 北京微播视界科技有限公司 Video capture method, apparatus, terminal, server and storage medium
CN112040339A (en) * 2020-08-31 2020-12-04 广州市百果园信息技术有限公司 Method and device for making video data, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DREAM IN SHENXIANG: "How to use other people's video background music to shoot video in Douyin?", CN, pages 1 - 5, XP009534600, Retrieved from the Internet <URL:https://www.kafan.cn/edu/20089681.html> *
DREAM IN THE LANE: "TikTok How to Use Video Background Music for a Video in TikTok?", KAFAN NETWORK, 25 December 2018 (2018-12-25) *
NAO DONG DA KAI: "How to Shoot with the Same Special Effects in TikTok?", CN, XP009534750, Retrieved from the Internet <URL:https://jingyan.baidu.com/article/335530dad3f6ec19ca41c36d.html> *

Also Published As

Publication number Publication date
CN112040339A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
WO2022042157A1 (en) Method and apparatus for manufacturing video data, and computer device and storage medium
JP6967059B2 (en) Methods, devices, servers, computer-readable storage media and computer programs for producing video
US9332315B2 (en) Timestamped commentary system for video content
US10277696B2 (en) Method and system for processing data used by creative users to create media content
US8296797B2 (en) Intelligent video summaries in information access
KR101944469B1 (en) Estimating and displaying social interest in time-based media
JP6930041B1 (en) Predicting potentially relevant topics based on searched / created digital media files
WO2022052749A1 (en) Message processing method, apparatus and device, and storage medium
CN112948708B (en) Short video recommendation method
CN109165302A (en) Multimedia file recommendation method and device
CN111078939A (en) Method, system and recording medium for extracting and providing highlight image in video content
CN111368141B (en) Video tag expansion method, device, computer equipment and storage medium
JP7240505B2 (en) Voice packet recommendation method, device, electronic device and program
US20210117471A1 (en) Method and system for automatically generating a video from an online product representation
RU2714594C1 (en) Method and system for determining parameter relevance for content items
WO2022134689A1 (en) Multimedia resource display method and device
CN112765373A (en) Resource recommendation method and device, electronic equipment and storage medium
CN110413894A (en) The training method of commending contents model, the method for commending contents and relevant apparatus
WO2024021687A1 (en) Search result reordering method and apparatus, device, storage medium, and program product
CN100397401C (en) Method for multiple resources pools integral parallel search in open websites
Su et al. Classification and interaction of new media instant music video based on deep learning under the background of artificial intelligence
US20240121485A1 (en) Method, apparatus, device, medium and program product for obtaining text material
CN114817692A (en) Method, device and equipment for determining recommended object and computer storage medium
CN116980665A (en) Video processing method, device, computer equipment, medium and product
Jin et al. Personalized micro-video recommendation based on multi-modal features and user interest evolution

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21859997

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21859997

Country of ref document: EP

Kind code of ref document: A1