CN117336524A - Video data processing method, device, equipment and storage medium


Info

Publication number
CN117336524A
Authority
CN
China
Prior art keywords
node
video
data
residual
target object
Prior art date
Legal status
Pending
Application number
CN202311237209.5A
Other languages
Chinese (zh)
Inventor
苟亚明
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202311237209.5A
Publication of CN117336524A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An embodiment of this application discloses a video data processing method, apparatus, device and storage medium. The method is executed by a service cluster device, the service cluster device comprises M nodes, the M nodes include a root node, the root node is the node with the smallest load among the M nodes, and M is a positive integer. The method comprises the following steps: when a video editing request for a target object is acquired, determining, through the root node, a node among the M nodes that has switched to an online state as an online node, and distributing preprocessed image data associated with the target object to the online node; detecting and recognizing the target object in the preprocessed image data through the online node to obtain target object region information; and performing, through a pause node, video frame editing processing on the region indicated by the target object region information in the preprocessed image data, and re-encoding the video data to obtain a target encoded video. By adopting the embodiments of this application, video recognition efficiency can be improved.

Description

Video data processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of video recognition technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing video data.
Background
At present, with the development of video recognition technology, more and more application scenarios involve video recognition. As the requirements on video recognition increase, service objects in video data often need to be locally occluded, which means the service objects in the video data must first be recognized and then mosaicked. The current approach to recognizing and mosaicking video data generally relies on manual recognition and manual mosaicking: a person watches the video data to detect the service objects and apply the mosaic.
Disclosure of Invention
The embodiment of the application provides a video data processing method, a device, equipment and a storage medium, which can improve the efficiency of video identification.
In one aspect, an embodiment of this application provides a video data processing method, which is executed by a service cluster device. The service cluster device comprises M nodes, the M nodes include a root node, the root node is the node with the smallest load among the M nodes, and M is a positive integer. The method comprises the following steps:
when a video editing request for a target object is acquired, determining, through the root node, a node among the M nodes that has switched to an online state as an online node, and distributing preprocessed image data associated with the target object to the online node, where the preprocessed image data is generated based on video data associated with the target object;
detecting and recognizing the target object in the preprocessed image data through the online node to obtain target object region information;
determining, through the root node, a node among the M nodes that has switched to a pause state as a pause node, and distributing the preprocessed image data and the target object region information acquired from the online node to the pause node;
performing, through the pause node, video frame editing processing on the region indicated by the target object region information in the preprocessed image data, and re-encoding the video data to obtain a target encoded video, where a target video frame is a video frame obtained by decoding the target encoded video, and the regions associated with the target object in the target video frame are all edited regions.
An aspect of an embodiment of the present application provides a data processing apparatus, including:
The online node determining module is used for determining a node which is switched to an online state in M nodes as an online node through a root node when a video editing request for a target object is acquired, and distributing the preprocessed image data associated with the target object to the online node; preprocessing image data is generated based on video data associated with a target object;
the image data detection module is used for detecting and identifying a target object in the preprocessed image data through the online node to obtain target object region information;
a pause node determining module for determining a node which has been switched to a pause state among the M nodes as a pause node by the root node, and distributing the preprocessed image data and the target object region information acquired from the online node to the pause node;
the region editing module is used for carrying out video frame editing processing on the region indicated by the target object region information in the preprocessed image data through the pause node;
the video data coding module is used for recoding the video data to obtain a target coded video; the target video frame is a video frame obtained after decoding the target coded video; the regions associated with the target object in the target video frame are all the edited regions.
Wherein, the M nodes further comprise a master node; the data processing apparatus further includes:
the video frame intercepting module is used for intercepting video frames of the video data through the master node when detecting that the client requests the video data;
the minimum heap construction module is used for constructing a minimum heap node structure diagram based on the loads of M nodes; the root node in the minimum heap node structure diagram is the node with the minimum load in M nodes;
the video frame distribution module is used for distributing the video frames to the root node;
the video frame preprocessing module is used for preprocessing video frames in the offline node to obtain preprocessed image data;
and the image data transmitting module is used for returning the preprocessed image data to the root node.
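The relationship between node load and the root node in the minimum-heap node structure diagram can be illustrated with a short sketch. This is a minimal Python illustration under assumed inputs (the node identifiers and load values are hypothetical), not the implementation claimed by the patent.

```python
import heapq

# Hypothetical (node_id, load) pairs reported by the M nodes.
node_loads = [("node-a", 0.72), ("node-b", 0.15), ("node-c", 0.48), ("node-d", 0.33)]

# Build a min-heap keyed on load; the heap root is the least-loaded node,
# which plays the role of the root node in the minimum-heap node structure diagram.
heap = [(load, node_id) for node_id, load in node_loads]
heapq.heapify(heap)

root_load, root_node = heap[0]
print(f"root node: {root_node} (load {root_load})")  # node-b (load 0.15)
```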
Wherein, the video frame preprocessing module includes:
an offline node determining unit configured to determine, by the root node, a node, which has been switched to an offline state, among the M nodes as an offline node;
the video frame selection unit is used for sending the video frames to the offline node, and selecting key video frames from the video frames according to the acquired video frame selection strategy in the offline node;
the image size detection unit is used for detecting the image size of the key video frame to obtain the key video frame size data and the key video frame pixel data;
The image scaling unit is used for performing image scaling processing on the key video frame size data based on the acquired standard size data to obtain standard size video frame data;
the image pixel detection unit is used for detecting image pixels of the standard-size video frame data to obtain video frame pixel data;
and the color space conversion unit is used for carrying out color space conversion processing on the pixel data of the video frame based on the standard color space data to obtain preprocessed image data.
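The preprocessing chain described by these units (scaling a key video frame to standard size data, then converting its color space) can be sketched with OpenCV. The 224x224 standard size and the BGR-to-RGB conversion are illustrative assumptions; the patent only requires scaling to the acquired standard size data and converting to the standard color space data.

```python
import cv2
import numpy as np

def preprocess_key_frame(frame_bgr: np.ndarray,
                         standard_size=(224, 224)) -> np.ndarray:
    """Scale a key video frame to the standard size and convert its color space.

    The 224x224 size and the BGR->RGB conversion are assumptions for illustration.
    """
    h, w = frame_bgr.shape[:2]          # key video frame size data
    resized = cv2.resize(frame_bgr, standard_size, interpolation=cv2.INTER_AREA)
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)  # color space conversion
    return rgb

# Usage with a dummy frame standing in for a decoded key video frame.
dummy_frame = np.zeros((720, 1280, 3), dtype=np.uint8)
print(preprocess_key_frame(dummy_frame).shape)  # (224, 224, 3)
```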
Wherein, the image data detection module includes:
the image feature extraction unit is used for extracting image features corresponding to the preprocessed image data through the target neural network model;
the characteristic input unit is used for inputting the image characteristic into a residual network layer in the target neural network model;
the characteristic residual processing unit is used for carrying out residual processing on the image characteristics through a residual network layer to obtain residual image characteristics;
and the feature full-connection unit is used for carrying out full-connection processing on the residual image features to obtain target object region information.
Wherein the residual network layer comprises M residual units, the M residual units include a residual unit S_i, M is a positive integer, and i is a positive integer less than or equal to M; the feature residual processing unit includes:
a first convolution subunit, configured to input the input feature into the residual unit S_i and perform convolution processing on the input feature through the residual unit S_i to obtain an intermediate feature i; if the residual unit S_i is the first residual unit in the residual network layer, the input feature is the image feature; if the residual unit S_i is the second residual unit in the residual network layer, the input feature includes the intermediate feature i-1 output by the residual unit S_i-1 and the image feature, where the residual unit S_i-1 is the previous-layer residual unit of the residual unit S_i;
a second convolution subunit, configured to input an auxiliary feature and the intermediate feature i into the residual unit S_i+1 and perform convolution processing on the auxiliary feature and the intermediate feature i through the residual unit S_i+1 to obtain an intermediate feature i+1, where the residual unit S_i+1 is the next-layer residual unit of the residual unit S_i; if the residual unit S_i is the first residual unit in the residual network layer, the auxiliary feature is the image feature; if the residual unit S_i is not the first residual unit in the residual network layer, the auxiliary feature is the intermediate feature i-1;
a feature determination subunit, configured to determine the intermediate feature i+1 as the residual image feature if the residual unit S_i+1 is the last-layer residual unit in the residual network layer.
Wherein, the data processing device still includes:
the label type acquisition module is used for acquiring the label type of the target object, and sending the label type of the target object to the connectable online node in a heartbeat connection mode so that the connectable online node can inquire associated video data associated with the target object in a video cluster with the label type of the target object;
the video data distribution module is used for distributing the associated video data queried by the connectable online node to the offline node through the root node, extracting key associated video frames in the associated video data through the offline node, and carrying out size pixel preprocessing on the key associated video frames to obtain preprocessed associated data;
the object detection module is used for distributing the preprocessing associated data to the online node through the root node, and detecting and identifying a target object in the preprocessing associated data through the online node to obtain associated object area information corresponding to the preprocessing associated data;
the associated coding video acquisition module is used for distributing the preprocessing associated data and the associated object area information to the pause node through the root node, editing the area indicated by the associated object area information in the preprocessing associated data through the pause node to obtain an associated editing video frame, and recoding the associated video data according to the associated editing video frame to obtain an associated coding video.
The region editing module is specifically configured to perform video frame editing processing on a region indicated by the target object region information in the preprocessed image data through the pause node, so as to obtain an edited video frame.
Wherein, video data encoding module includes:
the frame identification mapping unit is used for mapping the frame identification corresponding to the edited video frame in the file header of the video data;
an adjacent video frame acquisition unit, configured to acquire a video frame adjacent to the key video frame in the video data as an adjacent video frame;
the frame group task acquisition unit is used for adding the edited video frames and the adjacent video frames associated with the same key video frame into the same frame group task, and adding the key video frames that are not associated with any edited video frame, together with their corresponding adjacent video frames, into the same frame group task, to obtain S frame group tasks, where S equals the number of key video frames and S is a positive integer;
and the task parallel coding unit is used for carrying out parallel coding on the S frame group tasks to obtain intra-frame coding frames corresponding to the edited video frames in the S frame group tasks respectively, intra-frame coding frames corresponding to the unedited key video frames in the S frame group tasks respectively and predictive coding frames corresponding to the S frame group tasks respectively, and packaging the intra-frame coding frames, the predictive coding frames and the file header into target coded video.
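A rough sketch of how edited and adjacent video frames might be grouped into S frame group tasks and encoded in parallel is given below. The field names (key_id, edited, is_key) and the placeholder encoder are assumptions for illustration; a real implementation would invoke an actual video encoder for each group.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def build_frame_group_tasks(frames):
    """Group frames by the key video frame they are associated with.

    `frames` is a list of dicts such as {"key_id": 0, "edited": True};
    the field names are illustrative, not taken from the patent.
    """
    groups = defaultdict(list)
    for frame in frames:
        groups[frame["key_id"]].append(frame)
    return list(groups.values())   # S frame group tasks, one per key video frame

def encode_group(group):
    # Placeholder encoder: an edited or key frame becomes an intra-coded frame (I),
    # the remaining frames become predictive-coded frames (P).
    return ["I" if f.get("edited") or f.get("is_key") else "P" for f in group]

frames = [{"key_id": 0, "is_key": True}, {"key_id": 0, "edited": True},
          {"key_id": 1, "is_key": True}, {"key_id": 1}]
tasks = build_frame_group_tasks(frames)
with ThreadPoolExecutor() as pool:               # parallel encoding of the S tasks
    encoded = list(pool.map(encode_group, tasks))
print(encoded)  # [['I', 'I'], ['I', 'P']]
```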
The video data coding module is specifically used for sending the target coded video to the client through the master node; the file header of the target coded video is used for informing a decoder in the client of the positions of the intra-frame coded frames that belong to edited video frames; in the process of decoding the target coded video, the decoder determines, based on the target object region information in a decoded edited video frame, the region to be edited in each video frame that has a coding dependency relationship with that edited video frame, and performs editing rendering in the region to be edited.
Wherein the offline node determining unit includes:
the characteristic information mapping subunit is used for acquiring characteristic information of the video data through the root node, mapping the characteristic information to a hash space and obtaining a hash value of the video data;
an offline node selecting subunit for determining a node having an offline state among the M nodes, and selecting an offline node for generating the pre-processed image data among the nodes having an offline state according to the hash value of the video data.
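A minimal sketch of this hash-based selection of an offline node is shown below. Mapping the characteristic information into a hash space with SHA-256 and taking the value modulo the number of offline nodes are illustrative assumptions; the patent only specifies that the offline node is selected according to the hash value of the video data.

```python
import hashlib

def select_offline_node(video_feature_info: str, offline_nodes: list) -> str:
    """Map the video's characteristic information into a hash space and use the
    hash value to pick one node among the nodes currently in the offline state.
    """
    digest = hashlib.sha256(video_feature_info.encode("utf-8")).hexdigest()
    hash_value = int(digest, 16)                  # hash value of the video data
    return offline_nodes[hash_value % len(offline_nodes)]

print(select_offline_node("video-42|duration=180s|uploader=u1",
                          ["offline-node-1", "offline-node-2", "offline-node-3"]))
```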
Wherein, the image data detection module includes:
the sample image acquisition unit is used for acquiring sample image data and sample two-dimensional position information corresponding to a sample object in the sample image data;
The sample image input unit is used for inputting sample image data into the initial neural network model, and extracting sample image features corresponding to the sample image data through the initial neural network model;
the sample characteristic acquisition unit is used for inputting the sample image characteristics into a residual error network layer in the initial neural network model, and carrying out residual error processing on the sample image characteristics through the residual error network layer to obtain sample residual error image characteristics;
the sample position information acquisition unit is used for carrying out full connection processing on the sample residual image characteristics to obtain sample prediction position information corresponding to the sample object in the sample image data;
and the model parameter adjustment unit is used for carrying out parameter adjustment on the initial neural network model based on the sample prediction position information and the sample two-dimensional position information to obtain the target neural network model.
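The parameter adjustment described by these units can be sketched as a standard regression training loop. The stand-in architecture, the smooth L1 loss and the Adam optimizer are assumptions for illustration; the patent only requires adjusting the initial neural network model based on the sample predicted position information and the sample two-dimensional position information.

```python
import torch
import torch.nn as nn

# Minimal stand-in for the initial neural network model: a feature extractor
# followed by a fully connected head that regresses four region coordinates.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 4),   # sample prediction position information (x1, y1, x2, y2)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.SmoothL1Loss()

sample_image_data = torch.randn(8, 3, 224, 224)   # batch of sample images
sample_positions = torch.rand(8, 4)                # sample two-dimensional position info

for _ in range(3):                                 # a few parameter-adjustment steps
    predicted_positions = model(sample_image_data)
    loss = criterion(predicted_positions, sample_positions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```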
In one aspect, the present application provides a computer device comprising: a processor, a memory, a network interface;
the processor is connected to the memory and the network interface, where the network interface is used to provide a data communication function, the memory is used to store a computer program, and the processor is used to call the computer program to make the computer device execute the method in the embodiment of the present application.
In one aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored therein, the computer program being adapted to be loaded by a processor and to perform a method according to embodiments of the present application.
In one aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium; the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the methods in the embodiments of the present application.
In the embodiment of the present application, when a video editing request for a target object is acquired, the node that has switched to the online state among the M nodes is determined as the online node through the root node, and the preprocessed image data associated with the target object is distributed to the online node. Further, the target object in the preprocessed image data is detected and recognized through the online node to obtain the target object region information. Further, the node that has switched to the pause state among the M nodes is determined as the pause node through the root node, and the preprocessed image data and the target object region information acquired from the online node are distributed to the pause node. Further, video frame editing processing is performed, through the pause node, on the region indicated by the target object region information in the preprocessed image data, and the video data is re-encoded to obtain the target encoded video. A target video frame is a video frame obtained by decoding the target encoded video, and the regions associated with the target object in the target video frame are all edited regions. In the embodiment of the present application, different steps of the image processing are performed on nodes in different states, so the image processing task is divided and the resource utilization of the server cluster device is higher; and because the online node and the pause node automatically recognize the data associated with the target object, a large amount of video data can be recognized on multiple nodes at the same time, which shortens the time needed to recognize the video data and improves the efficiency of video data recognition.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a network interaction architecture according to an embodiment of the present application;
fig. 2 is a schematic view of a scenario for performing server cluster device call according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of obtaining a target neural network model according to an embodiment of the present application;
FIG. 5 is a flowchart of another data processing method according to an embodiment of the present disclosure;
FIG. 6 is a flow chart for video recognition provided by an embodiment of the present application;
FIG. 7 is a flow chart of another embodiment of the present application for video recognition;
FIG. 8 is a flow chart for video recognition according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
It will be appreciated that in the specific embodiments of the present application, where object or user related data (e.g., video data) is involved, user permissions or consent may be required when the embodiments of the present application are applied to specific products or technologies, and the collection, use and processing of the related data may be required to comply with relevant national and regional laws and regulations and standards.
The embodiment of the application provides a data processing method, and the method relates to the field of artificial intelligence. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level technologies and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include directions such as computer vision technology, speech processing technology (voice technology), natural language processing technology, machine learning/deep learning, autonomous driving, and intelligent transportation.
Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field therefore involves natural language, i.e. the language that people use daily, so it is closely related to research in linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graph techniques, and the like.
The scheme provided by the embodiment of the application relates to artificial intelligence natural language processing and other technologies, and the specific process is described through the following embodiment.
Referring to fig. 1, fig. 1 is a schematic diagram of a network interaction architecture according to an embodiment of the present application. The network interaction architecture may include a server 100 and a terminal cluster, which may include: terminal device 200a, terminal device 200b, terminal devices 200c, …, and terminal device 200n, wherein server 100 may be a server cluster device in the embodiment of the present application. Wherein a communication connection may exist between the terminal clusters, for example, a communication connection exists between the terminal device 200a and the terminal device 200b, and a communication connection exists between the terminal device 200a and the terminal device 200 c. Meanwhile, any terminal device in the terminal cluster may have a communication connection with the server 100, for example, a communication connection exists between the terminal device 200a and the server 100, where the communication connection is not limited to a connection manner, and may be directly or indirectly connected through a wired communication manner, may also be directly or indirectly connected through a wireless communication manner, or may also be other manners, and the application is not limited herein.
It should be understood that each terminal device in the terminal cluster shown in fig. 1 may be provided with an application client having a video playing function, and when the application client runs in each terminal device, data interaction may be performed between the application client and the server 100 shown in fig. 1. The application client can be an application client with a video playing function, such as a live broadcast application, a short video application, a video playing application, a music application, a shopping application, a game application, a novel application, a payment application, a browser and the like. The application client may be a stand-alone client, or may be an embedded sub-client integrated in a client (e.g., a video client, a payment client, a financial client, or a shopping client), which is not limited herein. Taking a video application as an example, the service server 100 may be a set of multiple servers including a network resource server, a video data transmission scene server, a database proxy server, an AI server, and the like corresponding to the video application, so that each terminal device may perform data interaction related to video data with the server 100 through an application client corresponding to the video application, for example, each terminal device may perform video data interaction with the server 100 (for example, in the video application, play video data).
It may be appreciated that the method provided in the embodiments of the present application may be performed by a server cluster device, which includes, but is not limited to, a server or a server cluster made up of a plurality of servers. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing a cloud database, cloud service, cloud computing, cloud function, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, basic cloud computing service such as big data and an artificial intelligent platform. The above-mentioned terminal device may be an electronic device, including but not limited to a mobile phone, a tablet computer, a desktop computer, a notebook computer, a palm computer, a vehicle-mounted device, an augmented Reality/Virtual Reality (AR/VR) device, a head-mounted display, a smart television, a wearable device, a smart speaker, a digital camera, a camera, and other mobile internet devices (mobile internet device, MID) with network access capability, or a terminal device in a scene such as a train, a ship, or a flight.
Further, referring to fig. 2, fig. 2 is a schematic view of a scenario for performing server cluster device call according to an embodiment of the present application. As shown in fig. 2, where the server cluster device 2A may be the server 100 of fig. 1 above, the server cluster device 2A may include nodes in different states, such as an offline node in an offline state, an online node in an online state, and a pause node in a pause state. The server cluster device 2A may perform a state determination according to the communication connection. Specifically, the server cluster device 2A may further include a master node, a root node, and the like. The server cluster device 2A may divide the nodes into different nodes according to the node function. Wherein different nodes may be understood as different components of the server cluster device 2A that are isolated from each other. Alternatively, the server cluster device 2A may be understood as a server cluster, where the server cluster includes a plurality of servers, and different nodes are respectively different servers in the server cluster. Specifically, in the server cluster device 2A, when the server cluster device 2A acquires a video editing request for a target object, the node that has been switched to the online state among the M nodes is determined as the online node 22P by the root node 21P, and the preprocessed image data associated with the target object is distributed to the online node 22P. Wherein the pre-processed image data is generated based on video data associated with the target object. Wherein, the video data can be understood as reply information of the video data loading request. That is, the server cluster device 2A may perform streaming processing on the video data after receiving the video data loading request. Further, the server cluster device 2A may detect and identify the target object in the preprocessed image data through the online node 22P, to obtain the target object region information. Further, the server cluster device 2A may determine, as the suspended node 23P, a node that has been switched to the suspended state among the M nodes by the root node 21P, and distribute the preprocessed image data and the target object area information acquired from the online node 22P to the suspended node 23P. Further, the server cluster device 2A may perform video frame editing processing on the area indicated by the target object area information in the preprocessed image data through the pause node 23P, and re-encode the video data to obtain the target encoded video. The target video frame is a video frame obtained after decoding the target coded video. Specifically, the regions associated with the target object in the target video frame are all the edited regions.
The server cluster device 2A may further determine, through the root node 21P, the node that has switched to the offline state among the M nodes as the offline node 24P, distribute the video data to the offline node 24P, and perform video preprocessing there to obtain the preprocessed image data. Further, the server cluster device 2A may send the target encoded video acquired from the pause node 23P to the client, so that the client decodes the target encoded video to obtain decoded video data. During decoding, the client can render the region to be edited in each video frame where the target object is located, thereby achieving the effect that the target object is mosaicked.
It is understood that a server cluster device may refer to a computer system architecture formed by connecting multiple servers together, where the multiple servers cooperatively provide services. The server cluster device can distribute the video data acquisition request of the client to each server node through a reasonable load balancing strategy, and can realize system architecture availability, system architecture expandability and system architecture flexibility during high load, single-point node failure or system architecture maintenance.
Further, referring to fig. 3, fig. 3 is a flow chart of a data processing method according to an embodiment of the present application. As shown in fig. 3, the method may be performed by a server cluster device, which may be the server 100 of fig. 1 above, or a server cluster formed by a plurality of servers, which is not limited herein. For easy understanding, the embodiment of the application is described by taking the method performed by the server cluster device as an example, and the data processing method at least may include the following steps S101 to S104:
step S101, when a video editing request for a target object is acquired, determining a node which is switched to an online state in M nodes as an online node through a root node, and distributing preprocessing image data associated with the target object to the online node; the pre-processed image data is generated based on video data associated with the target object.
Specifically, the server cluster device may obtain node states of M nodes, where the M nodes may include an online node in an online state, a pause node in a pause state, and an offline node in an offline state. The server cluster device can be in communication connection with the client through the online node in the online state, or the server cluster device can be in communication connection with other online nodes in the online state through the online node in the online state. Further, the server cluster device may distribute the preprocessed image data associated with the target object to the online node through the root node. In particular, the server cluster device may determine a cluster member node that is not fully valid as an offline node; accordingly, the server cluster device may determine the maintained or suspended cluster member node as a suspended node; accordingly, the server cluster device may determine nodes other than the cluster member node that is not fully active and the cluster member node that is maintained or suspended as online nodes. Further, the online nodes in the server cluster equipment can be in communication connection through a heartbeat mechanism. Further, a root node in the server cluster device may distribute the preprocessed image data associated with the target object to the online node through a connection between the root node and the online node. The root node may refer to a state of a node, which is a state corresponding to a child node, the root node is a node for performing node state judgment, and connections between the root node and other nodes between the server cluster devices may always exist. The target object may be an object that needs to be identified. Wherein video data may refer to files in a video format that may be used for video playback. For example, in the field of short video, video data may refer to a video file that may be played in a short video playback application.
Further, the online node in the server cluster device can acquire the tag type of the target object while identifying the target object, and send the tag type of the target object to the connectable online node (i.e., other online nodes in the server cluster device) in a heartbeat connection manner, so that the connectable online node queries associated video data associated with the target object in a video cluster with the tag type of the target object.
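As a rough sketch of how the tag type might be piggybacked on a heartbeat message between online nodes, consider the following; the message fields and the JSON encoding are assumptions, since the patent does not specify a heartbeat format.

```python
import json
import time
from typing import Optional

def build_heartbeat(node_id: str, tag_type: Optional[str] = None) -> str:
    """Assemble a heartbeat message; when a target object has been recognized,
    its tag type is piggybacked so that connectable online nodes can query the
    video cluster for associated video data carrying that tag type.

    The field names are illustrative assumptions.
    """
    message = {"node_id": node_id, "timestamp": time.time()}
    if tag_type is not None:
        message["tag_type"] = tag_type
    return json.dumps(message)

print(build_heartbeat("online-node-7", tag_type="cartoon"))
```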
Further, before the root node of the server cluster device obtains the video editing request for the target object, the root node of the server cluster device may perform preset processing in advance, and a specific process of performing preset processing on the root node of the server cluster device may be as follows: the root node in the server cluster equipment can acquire the tag type through the historical video data calling condition and configuration information of the client. The root node in the server cluster device may obtain information including information about historical requests associated with the target object, historical topics or characteristics of historical topics, client configured information, such as request priority, existing historical video data classifications, associated data for historical video data in a database, and so forth. The root node in the server cluster device may categorize the tag types based on content characteristics of the video data, such as quality characteristics, popularity characteristics, in-life importance characteristics, etc. Specifically, the target label of the target object is used to represent classification information of the target object. For example, the target tags of the target objects may be "car", "movie star", "animal" and "plant", etc. In refinement, the target label of the target object may refer to a collection of certain attributes of the video data. Such as a certain cartoon, a certain actress, or a certain brand advertisement. That is, the associated video data corresponding to the target tag may be related clip video data of a certain cartoon, a short video clip of a television show starring by a certain actress, or video data with a certain brand advertisement.
Step S102, detecting and identifying the target object in the preprocessed image data through the online node to obtain target object region information.
Specifically, the online node in the server cluster device can extract the image features corresponding to the preprocessed image data through the target neural network model. Further, the online node in the server cluster device can input the image features into a residual network layer in the target neural network model, and perform residual processing on the image features through the residual network layer to obtain residual image features. Further, the online node in the server cluster device can perform full-connection processing on the residual image features to obtain the target object region information. The target neural network model may refer to a model for image feature extraction, for example a deep residual network (Deep Residual Network, e.g. ResNet50). Specifically, the online node in the server cluster device can perform residual processing through a plurality of residual units contained in the ResNet50; the residual processing alleviates the problem of gradient vanishing or gradient explosion in the target neural network model, improves the accuracy of feature extraction of the target neural network model, and improves the training stability and convergence rate of the target neural network model.
The process by which the online node in the server cluster device obtains the residual image features may be as follows. Specifically, the online node in the server cluster device may input the input feature into the residual unit S_i and perform convolution processing on the input feature through the residual unit S_i to obtain an intermediate feature i. If the residual unit S_i is the first residual unit in the residual network layer, the input feature is the image feature; if the residual unit S_i is the second residual unit in the residual network layer, the input feature includes the intermediate feature i-1 output by the residual unit S_i-1 and the image feature, where the residual unit S_i-1 is the previous-layer residual unit of the residual unit S_i; if the residual unit S_i is neither the first nor the second residual unit in the residual network layer, the input feature includes the intermediate feature i-1 output by the residual unit S_i-1 and the intermediate feature i-2 output by the residual unit S_i-2, where the residual unit S_i-2 is the previous-layer residual unit of the residual unit S_i-1. Here the residual network layer comprises M residual units, the M residual units include the residual unit S_i, M is a positive integer, and i is a positive integer less than or equal to M.
Further, the online node in the server cluster device may input the auxiliary feature and the intermediate feature i into the residual unit S_i+1, and perform convolution processing on the auxiliary feature and the intermediate feature i through the residual unit S_i+1 to obtain an intermediate feature i+1, where the residual unit S_i+1 is the next-layer residual unit of the residual unit S_i. If the residual unit S_i is the first residual unit in the residual network layer, the auxiliary feature is the image feature; if the residual unit S_i is not the first residual unit in the residual network layer, the auxiliary feature is the intermediate feature i-1.
Further, if the residual unit S_i+1 is the last-layer residual unit in the residual network layer, the intermediate feature i+1 is determined as the residual image feature.
For example, in a specific embodiment, if the residual network layer has four layers of residual units, the process by which the online node in the server cluster device obtains the residual image features may be as follows. The image feature is input into the first residual unit S_1, and convolution processing is performed on the image feature through the residual unit S_1 to obtain an intermediate feature i; the image feature and the intermediate feature i are input into the second residual unit S_2, and convolution processing is performed on the image feature and the intermediate feature i through the residual unit S_2 to obtain an intermediate feature i+1; the intermediate feature i+1 and the intermediate feature i are input into the third residual unit S_3, and convolution processing is performed on the intermediate feature i+1 and the intermediate feature i through the residual unit S_3 to obtain an intermediate feature i+2; the intermediate feature i+1 and the intermediate feature i+2 are input into the fourth residual unit S_4, and convolution processing is performed on the intermediate feature i+1 and the intermediate feature i+2 through the residual unit S_4 to obtain an intermediate feature i+3, and the intermediate feature i+3 is determined as the residual image feature.
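The wiring described above can be sketched as follows, assuming each residual unit combines its two inputs by elementwise addition before convolution. The combination operator is an assumption; the patent only states that each unit performs convolution on its input feature together with an earlier intermediate feature, i.e. a skip connection.

```python
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    """One residual unit: convolve the sum of its two inputs (skip connection)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        return self.conv(x + skip)

# Four-unit wiring matching the example above: S1(image), S2(image, f1),
# S3(f2, f1), S4(f2, f3); f4 is taken as the residual image feature.
units = [ResidualUnit(16) for _ in range(4)]
image_feature = torch.randn(1, 16, 56, 56)
f1 = units[0](image_feature, torch.zeros_like(image_feature))  # first unit: image feature only
f2 = units[1](image_feature, f1)
f3 = units[2](f2, f1)
residual_image_feature = units[3](f2, f3)
print(residual_image_feature.shape)   # torch.Size([1, 16, 56, 56])
```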
Further, the online node in the server cluster device performs full-connection processing on the residual image features to obtain the target object region information. The target object region information may refer to two-dimensional region information, for example the rectangular two-dimensional region enclosed by the coordinates (0, 0), (x, 0), (0, y) and (x, y).
For ease of understanding, please refer to fig. 4, fig. 4 is a schematic flow chart of acquiring a target neural network model according to an embodiment of the present application. As shown in fig. 4, in a specific embodiment, the procedure of calling, by an online node in the server cluster device, the residual network layer to perform residual processing may be as follows:
and S41, training the initial neural network model through the online nodes in the server cluster equipment to obtain an updated neural network model.
Specifically, the online node in the server cluster device may train the initial neural network model to obtain the updated neural network model 41T, where the updated neural network model 41T is a neural network model obtained after training the initial neural network model, and a specific model training process may refer to a training process of the target neural network model in step S104 below.
In step S42, the online nodes in the server cluster device may introduce a jump connection between successive network layers updating the neural network model.
In particular, online nodes in the server cluster appliance may introduce a hopping connection between successive network layers updating the neural network model 41T. In short, the online nodes in the server cluster device can superimpose the image features of the network layers which are not directly connected through the jump connection, that is, the online nodes in the server cluster device can process different network layers of the updated neural network model 41T through different nodes, so that the server cluster device processes the updated neural network model 41T through the distributed nodes, and the processing efficiency of the server cluster device is improved. For example, an online node in a server cluster device may process a first network layer of the updated neural network model 41T through a node K1 included in the server cluster device, process a second network layer of the updated neural network model 41T through a node K2 included in the server cluster device, and so on.
In step S43, the online node in the server cluster device sends the updated neural network model to the root node in the server cluster device for storage.
Specifically, the online node in the server cluster device may send the updated neural network model 42L with the jump connection to the root node in the server cluster device for data storage, and call the updated neural network model 42L with the jump connection through the online node in the server cluster device.
And step S44, carrying out residual processing through the online nodes in the server cluster equipment.
Specifically, the online node in the server cluster device may use the residual unit 43C to perform residual processing on the updated neural network model 42L, so as to obtain a residual neural network model. The residual unit 43C here may include the residual units S_i-2, S_i-1, S_i and S_i+1 described above.
In step S45, the online nodes in the server cluster device output the superimposed convolution layer through the jump connection.
Specifically, the neural network model that the online node in the server cluster device outputs after superimposing the convolution-layer outputs through the jump connections is the target neural network model 45S. That is, after the online node in the server cluster device performs convolution processing on the residual neural network model, the target neural network model 45S can be obtained. Specifically, the online node in the server cluster device may superimpose the outputs that the nodes distributed by the server cluster device produce for the updated neural network model 41T in step S42 above, to obtain the target neural network model 45S.
And step S103, determining the node which is switched to the pause state in the M nodes as the pause node through the root node, and distributing the preprocessed image data and the target object area information acquired from the online node to the pause node.
Specifically, a root node in the server cluster device may process (e.g., reassemble) key video frames and other video frames of the video data at a pause node. It should be appreciated that there is no communication connection between the node in the suspended state and the other nodes. Further, the root node in the server cluster device may distribute the preprocessed image data and the target object area information acquired from the online node to the pause node through a connection that always exists between the root node and the pause node.
Step S104, video frame editing processing is carried out on the area indicated by the target object area information in the preprocessed image data through a pause node, and video data is recoded to obtain a target coded video; the target video frame is a video frame obtained after decoding the target coded video; the regions associated with the target object in the target video frame are all the edited regions.
Specifically, the pause node in the server cluster device may perform video frame editing processing on the area indicated by the target object area information in the preprocessed image data, to obtain an edited video frame.
Specifically, the pause node in the server cluster device may obtain an edited video frame corresponding to the area indicated by the target object area information in the preprocessed image data through the inherent connection between the pause node in the server cluster device and the root node in the server cluster device, recode the video data through the encoder, and send the encoded target encoded video to the client. Wherein the pause node in the server cluster device may encode the video clip according to a bit rate and format. Specifically, the pause node in the server cluster device may determine the encoded bit rate and format according to the measured number of bits of encoded data, the software transmission algorithm of the encoded video data, the data presentation format (chart presentation or table presentation) of the video data, and the like. In particular, for video data, a pause node in a server cluster device can measure bit rate based on how many kilobits or megabits of data are processed in one second. For example, encoding 1 megabyte of video data corresponding to one second may be considered to have an encoding bit rate of 8mbps (i.e., 8 megabits per second), while processing 45 kilobytes of video data corresponding to one second may be considered to have an encoding bit rate of 360kbps (i.e., 360 kilobits per second). Wherein the encoding format may correspond to a content representation format for storing or transmitting video content, such as in a data file or bitstream. For example, the encoding formats may include, but are not limited to, moving picture experts group ("MPEG") MPEG-2 part 2, MPEG-4 part 2, H.264 (MPEG-4 part 10), H.265 high efficiency Video coding ("HEVC"), theta, realVideo RV40, VP9, and AOMedia Video 1 ("AV 1"), among others.
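A short arithmetic check of the bit-rate figures above, converting the bytes processed per second into kilobits per second:

```python
def encoding_bit_rate_kbps(bytes_per_second: int) -> float:
    """Convert a per-second payload in bytes into kilobits per second."""
    return bytes_per_second * 8 / 1000

print(encoding_bit_rate_kbps(1_000_000))  # 8000.0 kbps, i.e. 8 Mbps
print(encoding_bit_rate_kbps(45_000))     # 360.0 kbps
```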
Further, the pause node in the server cluster device may perform the editing operation by masking (i.e., applying a mosaic to) the target object. For example, the pause node in the server cluster device may perform the mosaic processing through an application program interface (Application Program Interface, API) of image processing software in a cross-platform computer vision library (Open Computer Vision Library, OpenCV).
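A minimal sketch of such a mosaic operation on the region indicated by the target object region information, using OpenCV, is shown below; the mosaic block size and the down-scale/up-scale approach are assumptions for illustration.

```python
import cv2
import numpy as np

def mosaic_region(frame: np.ndarray, x: int, y: int, w: int, h: int,
                  block: int = 16) -> np.ndarray:
    """Pixelate (mosaic) the rectangle given by the target object region information.

    Down-scaling and re-enlarging with nearest-neighbour interpolation is one
    common way to produce a mosaic; the block size of 16 is an assumption.
    """
    roi = frame[y:y + h, x:x + w]
    small = cv2.resize(roi, (max(1, w // block), max(1, h // block)),
                       interpolation=cv2.INTER_LINEAR)
    frame[y:y + h, x:x + w] = cv2.resize(small, (w, h),
                                         interpolation=cv2.INTER_NEAREST)
    return frame

edited = mosaic_region(np.random.randint(0, 255, (720, 1280, 3), dtype=np.uint8),
                       x=400, y=200, w=256, h=256)
```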
Specifically, the pause node in the server cluster device may map the frame identifier corresponding to the edited video frame in the file header of the video data. Further, a pause node in the server cluster device may acquire a video frame adjacent to the key video frame in the video data as an adjacent video frame. Further, the pause node in the server cluster device may add the edited video frames and the adjacent video frames associated with the same key video frame to the same frame group task, and add the key video frames not associated with the edited video frames and the corresponding adjacent video frames to the same frame group task, so as to obtain S frame group tasks, where the number of S is the same as the number of key video frames, and S is a positive integer. Further, the pause node in the server cluster device may encode the S frame group tasks in parallel to obtain intra-frame encoded frames corresponding to the edited video frames in the S frame group tasks, intra-frame encoded frames corresponding to the unedited key video frames in the S frame group tasks, and prediction encoded frames corresponding to the S frame group tasks, respectively, and encapsulate the intra-frame encoded frames, the prediction encoded frames, and the file header into the target encoded video. The intra-frame coded frame, I-frame (I-frame), which may be referred to as intra picture, is usually the first frame of each frame group task, and is moderately compressed to serve as a reference point for random access, and includes complete picture information, i.e., may be used as an image, and the server cluster device may independently decode adjacent video frames during playing to obtain a complete image. Among them, the predictive-coded frames may include two types of P frames (P frames) and B frames (B frames). In particular, a P frame depends on the previous I frame or the previous P frame, i.e. the P frame is used to record the difference information between the current predictive encoded frame and the previous video frame of the predictive encoded frame. Accordingly, the B frame depends on the previous and subsequent I frames or P frames, and bidirectional difference information is recorded. For example, in one frame group task, the composition may be 7 video frames, i.e., frame group task a= (I frame, P frame). For another example, in one frame group task, the composition may be 8 video frames, i.e., frame group task b= (I frame, P frame, B frame, I frame).
It can be appreciated that, in the specific embodiments of the present application, the related data, such as the viewing record of the currently logged-in account of the video playing application, the favorite video data of the currently logged-in account, and the video data of possible interest to the currently logged-in account, should be handled strictly according to the requirements of the relevant national laws and regulations when the embodiments above and below are applied to specific products or technologies. The related data collection process should obtain the informed consent or independent consent (or have a legal basis) of the personal information subject, and subsequent data use and processing should be carried out within the scope authorized by laws, regulations, and the personal information subject. When biological feature recognition technology is involved, the related data collection, use and processing should comply with national legal and regulatory requirements, the information processing rules should be disclosed and the independent consent (or legal basis) of the target object should be solicited before the biological features are collected, the biological features should be processed strictly according to legal and regulatory requirements and the personal information processing rules, and technical measures should be taken to ensure the security of the related data.
In the embodiment of the present application, when a video editing request for a target object is acquired, a node that has been switched to an online state among the M nodes is determined as an online node by the root node, and the preprocessed image data associated with the target object is distributed to the online node. Further, the target object in the preprocessed image data is detected and identified by the online node to obtain the target object region information. Further, a node that has been switched to a paused state among the M nodes is determined as a pause node by the root node, and the preprocessed image data and the target object area information acquired from the online node are distributed to the pause node. Further, video frame editing processing is performed by the pause node on the area indicated by the target object area information in the preprocessed image data, and the video data is re-encoded to obtain the target encoded video. In the video frames obtained after the target encoded video is decoded, the areas associated with the target object are all edited areas. In the embodiment of the present application, when one node in the server cluster device fails, the root node can distribute the image processing task to be processed to another node in the same state as the failed node, so that the fault tolerance and flexibility of the server cluster device are improved, the server cluster device can be applied to more scenarios, and the scenario suitability of the server cluster device is improved. Further, according to the embodiment of the present application, the image processing task is divided into steps performed by nodes in different states, so that the resource utilization rate of the server cluster device is higher, and the automatic identification of the data associated with the target object by the online node and the pause node enables batches of video data to be identified simultaneously on multiple nodes, thereby shortening the time for identifying the video data and improving the efficiency of identifying the video data. In the embodiment of the present application, the time for video identification is reduced, and the efficiency of video identification is improved.
Further, referring to fig. 5, fig. 5 is a flow chart of another data processing method according to an embodiment of the present application. As shown in fig. 5, the method may be performed by a server cluster device, which may be the server 100 of fig. 1 above, or a server cluster formed by a plurality of servers, which is not limited herein. For easy understanding, the embodiment of the application is described by taking the method performed by the server cluster device as an example, and the data processing method at least may include the following steps S201 to S209:
Step S201, when it is detected that the client requests video data, video frames of the video data are intercepted by the master node.
Specifically, the master node in the server cluster device may respond to a video data loading request sent by the client. Specifically, the business object may browse a page of the video data playing website in the client, where covers of a plurality of pieces of video data may be displayed. Further, when the business object browses to video data of interest, the corresponding video data may be loaded by clicking on the cover of that video data. Further, when the business object clicks on the cover of the target video data, the client may send a video data loading request directed to the target video data to the server (i.e., the server cluster device) of the video data playing website. The loading request may carry a video data identifier pointing to the target video data. For example, the video data identifier may be a number in the database of the server cluster device in which the target video data is stored. Further, after receiving the video data loading request sent by the client, the server cluster device may intercept video frames of the video data through the master node. Here, "master node" refers to a state of a node. The server cluster device may create a processing center through the master node, that is, the server cluster device may pull the target video data through the master node. Specifically, the master node in the server cluster device may acquire the target video from the network and perform synchronous stream pulling on the video. Stream pulling refers to a video data transmission process in the field of live video; in detail, stream pulling means pulling a third-party live stream address to the server cluster device for video identification and editing. Correspondingly, stream pushing also refers to a video data transmission process in the field of live video; in detail, stream pushing is the process of pushing the video data encapsulated in the acquisition stage to the server cluster device, specifically to a specific node.
Further, the server cluster device may intercept video frames of the video data through the master node. Specifically, the master node in the server cluster device may intercept each video frame of the video data to obtain the video frames corresponding to the video data. It should be appreciated that the video frames intercepted by the master node in the server cluster device may include all video frames of the video data. Further, the master node in the server cluster device may store the intercepted video frames in a database of the server cluster device. The server cluster device can realize the function of caching or temporarily storing data through the database. When the server cluster device is a specific server cluster device, each node of the server cluster device may call the data stored in the database. Specifically, the master node in the server cluster device, the root node in the server cluster device and the online node in the server cluster device can perform data calls through a communication connection based on a communication protocol. The communication protocol may include protocols such as HTTP Live Streaming ("HLS"), Dynamic Adaptive Streaming over HTTP ("DASH"), HTTP Dynamic Streaming ("HDS"), Real-Time Messaging Protocol ("RTMP"), Smooth Streaming, and the like. Correspondingly, the pause node in the server cluster device and the offline node in the server cluster device can perform data transmission through the inherent connections between each of them and the root node in the server cluster device.
Step S202, constructing a minimum heap node structure diagram based on loads of M nodes; the root node in the minimum heap node structure diagram is the node with the minimum load in the M nodes.
Specifically, the master node in the server cluster device may sort the M nodes into a complete binary tree according to the load sizes of the M nodes; specifically, the server cluster device may generate the minimum heap node structure diagram according to the data format of a minimum heap over the complete binary tree. A minimum heap can be understood as a complete binary tree in which the value of each non-leaf node is no greater than the values of its left and right child nodes. The master node in the server cluster device may determine the node with the minimum load in the minimum heap node structure diagram formed by the M nodes as the root node.
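A minimal sketch of this load-based selection, assuming illustrative node names and load values, can use a standard min-heap; the heap root is then the node with the smallest load:

```python
import heapq

# Hedged sketch: build a min-heap over (load, node id) pairs; the heap root is the
# node with the minimum load and is treated as the root node.
node_loads = {"node-1": 0.72, "node-2": 0.15, "node-3": 0.40}  # assumed sample loads

heap = [(load, node) for node, load in node_loads.items()]
heapq.heapify(heap)  # complete binary tree satisfying the min-heap property

root_load, root_node = heap[0]
print(root_node, root_load)  # node-2 0.15 -> selected as the root node
```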
Step S203, the video frame is distributed to the root node.
Specifically, the master node in the server cluster device may call the video frames from the database and send them to the root node, so that the root node can continue to distribute the video frames to other nodes. The master node in the server cluster device may be regarded as the processing center of the server cluster device and may perform tasks at a higher task level than those performed by the root node in the server cluster device. For example, the master node in the server cluster device may distribute video frames to the root node.
Step S204, preprocessing the video frames in the offline node to obtain preprocessed image data.
Specifically, a node that has been switched to an offline state among the M nodes is determined as an offline node by the root node in the server cluster device, and the root node in the server cluster device sends the video frames to the offline node. In the offline node, key video frames are selected from the video frames according to the acquired video frame selection policy, and the offline node in the server cluster device performs size pixel preprocessing on the key video frames to obtain the preprocessed image data.
Specifically, feature information of video data is obtained through a root node in the server cluster equipment, and the feature information is mapped to a hash space to obtain a hash value of the video data. Further, the root node in the server cluster device may determine a node having an offline state among the M nodes, and select an offline node for generating the pre-processed image data among the nodes having an offline state according to the hash value of the video data.
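One possible sketch of this hash-based selection, assuming an MD5 hash and illustrative feature and node values (neither is specified by this application), is:

```python
import hashlib

# Hedged sketch: map the video's feature information into a hash space and use the
# resulting hash value to pick one node among those currently in the offline state.
def select_offline_node(feature_info: str, offline_nodes: list[str]) -> str:
    digest = hashlib.md5(feature_info.encode("utf-8")).hexdigest()
    hash_value = int(digest, 16)          # hash value of the video data
    return offline_nodes[hash_value % len(offline_nodes)]

print(select_offline_node("video-42|1080p|sports", ["node-3", "node-7", "node-9"]))
```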
The process of performing size pixel preprocessing on the key video frames by the offline nodes in the server cluster device may be as follows: the offline node in the server cluster device can perform image size detection on the key video frame to obtain key video frame size data and key video frame pixel data. Further, the offline node in the server cluster device may perform image scaling processing on the key video frame size data based on the obtained standard size data, so as to obtain standard size video frame data. Further, the offline node in the server cluster device may perform image pixel detection on the standard-size video frame data to obtain video frame pixel data. Further, the offline node in the server cluster device may perform color space conversion processing on the video frame pixel data based on the standard color space data, to obtain the preprocessed image data.
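A rough OpenCV sketch of this preprocessing chain, assuming an illustrative standard size of 1280x720 and RGB as the standard color space (both assumptions, not values from the application), might look like:

```python
import cv2

# Hedged sketch of the size/pixel preprocessing: detect the key frame's size,
# scale it to an assumed standard size, then convert it to a standard color space.
STANDARD_SIZE = (1280, 720)  # assumed standard size data (width, height)

def preprocess_key_frame(frame):
    height, width = frame.shape[:2]                   # image size detection
    resized = cv2.resize(frame, STANDARD_SIZE)        # image scaling processing
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)    # color space conversion
    return rgb                                        # preprocessed image data
```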
Further, the root node in the server cluster device can distribute the associated video data queried by the connectable online node to the offline node through the root node, the offline node extracts key associated video frames in the associated video data, and the key associated video frames are subjected to size pixel preprocessing to obtain preprocessed associated data.
For ease of understanding, please refer to fig. 6, which is a flowchart for video recognition provided in an embodiment of the present application. In a specific embodiment, as shown in fig. 6, the master node in the server cluster device may perform the process of extracting video frames through the master node 61C to obtain the video frames 62P. Specifically, the server cluster device may select the master node 61C from the M nodes included in the server cluster device. Further, the master node in the server cluster device may perform the process of extracting video frames in the master node 61C through video data processing software. In particular, the video data processing software may be software commonly used for extracting video frames from video data, such as the cross-platform computer vision library (Open Computer Vision Library, OpenCV). In a specific embodiment, the master node in the server cluster device may extract all video frames contained in the target video data through OpenCV. Further, the master node in the server cluster device may distribute the video frames 62P to the root node in the server cluster device. Further, the root node in the server cluster device may distribute the video frames 62P to the offline node 63C. Specifically, the server cluster device may acquire all the offline nodes in the server cluster device, randomly select one offline node from all the offline nodes, determine it as the offline node 63C, and distribute the video frames 62P to the offline node 63C. For example, in the embodiment shown in fig. 6, the offline nodes included in the server cluster device may include the offline node 63C, the offline node 64C, and the offline node 65C. Specifically, the process of capturing video frames through the offline node 64C or the offline node 65C may refer to the process of capturing video frames through the offline node 63C, which is further described below. Further, in the offline node 63C, the offline node in the server cluster device may acquire the video data 631P. Further, the offline node in the server cluster device may acquire the video frame capturing policy 632L, and capture and preprocess video frames according to the video frame capturing policy 632L to obtain the preprocessed image data 633D. The video frame capturing policy 632L may include capturing policies such as selecting a video frame at a fixed time interval, or selecting one video frame from a set of similar video frames as a representative.
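A minimal sketch of the fixed-time-interval capture policy, with an assumed interval and relying only on OpenCV's standard VideoCapture interface, could be:

```python
import cv2

# Hedged sketch: take one frame every fixed time interval as a key video frame.
# The interval and video path are assumed illustration values.
def sample_key_frames(video_path, interval_seconds=2.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0       # fall back if FPS is unknown
    step = max(1, int(fps * interval_seconds))    # frames between two key frames
    key_frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            key_frames.append(frame)
        index += 1
    cap.release()
    return key_frames
```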
For ease of understanding, please refer to fig. 7, which is another flowchart for video recognition provided in an embodiment of the present application. As shown in fig. 7, in a specific embodiment, the master node in the server cluster device may receive a get-video-data request 71Q. Further, in a specific implementation process, after receiving the get-video-data request 71Q, the master node in the server cluster device may sort the short videos through the acquired script and then perform a crawling process, so as to achieve the effect of data collection. For example, the client may send the get-video-data request 71Q to the server before loading video data in a mobile phone application, an applet, or another terminal. Optionally, the master node in the server cluster device may acquire video data in batch under a certain category through a script, that is, the master node in the server cluster device may receive a request for acquiring video data in batch. Further, the master node in the server cluster device may acquire the request parameters 72Q. Further, the master node in the server cluster device may acquire the played video source data 73D. Further, the master node in the server cluster device may determine whether the acquired video source data 73D is interference data. The interference data may refer to data acquired in association with the target video data; specifically, the interference data may include, for example, the timestamp of the target video data, the exposure amount of the target video data, the click amount of the target video data, the duration of the target video data, the like count of the target video data, the favorite count of the target video data, the comments on the target video data, and the like. Further, if the master node in the server cluster device determines that the video source data 73D is interference data, the interference data is deleted; correspondingly, if the master node in the server cluster device determines that the video source data 73D is non-interference data, the video source data 73D is determined to be the video data 74D. For example, if the master node in the server cluster device acquires video data under a certain category as the video source data 73D, for example, acquires video data under the "automobile" category in batch, the server cluster device may determine data such as the timestamp, exposure amount, click amount, duration, like count, favorite count, and comments in the video source data 73D as the interference data. Alternatively, after deleting the interference data in the video source data 73D, the master node in the server cluster device may update the acquired script with the interference data to obtain an updated script. Specifically, the master node in the server cluster device may update the acquired script by updating the video code stream, the codec algorithm, or the video sampling rate, etc. Further, the master node in the server cluster device may re-acquire the video source data 73D using the updated script.
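The interference-data filtering described above can be sketched as a simple field filter; the field names are assumptions chosen to mirror the examples listed, not a specification from this application:

```python
# Hedged sketch: drop fields that describe the video (timestamp, exposure, clicks,
# and so on) and keep only what is treated as video data proper.
INTERFERENCE_FIELDS = {"timestamp", "exposure", "clicks", "duration",
                       "likes", "favorites", "comments"}

def strip_interference(video_source_data: dict) -> dict:
    return {key: value for key, value in video_source_data.items()
            if key not in INTERFERENCE_FIELDS}
```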
Further, referring back to fig. 7, the master node in the server cluster device may send video data 74D to the root node in the server cluster device. Further, the root node in the server cluster device may send the video data 74D to an offline node in the server cluster device. Further, the root node in the server cluster equipment can distribute the preprocessing associated data to the online node through the root node, and the online node detects and identifies the target object in the preprocessing associated data to obtain the associated object area information corresponding to the preprocessing associated data.
The offline node in the server cluster device may perform video preprocessing to obtain the key video frame 75Z. Specifically, the offline node in the server cluster device may read the video data through OpenCV and perform video preprocessing on the video data. In a specific embodiment, the video preprocessing may include adjusting the resolution, converting the color space, and the like. Further, the offline node in the server cluster device may set a video frame capturing policy based on the texture of the target object in the video data. Here, the video frame capturing policy may refer to the video frame capturing policy 632L described in fig. 6. Further, the offline node in the server cluster device may apply the video frame capturing policy corresponding to each video frame of the video to perform video preprocessing. In a specific embodiment, the offline node in the server cluster device may perform video preprocessing according to a capture threshold and a capture looseness degree of the video frame capturing policy. For example, the offline node in the server cluster device may perform video preprocessing on the video data using a convolutional neural network (Convolutional Neural Network, CNN) to obtain the key video frame 75Z. Specifically, the offline node in the server cluster device may perform normalization, standardization, size adjustment and other processing through the CNN. Further, after the video data is input to the filters of the CNN, the offline node in the server cluster device may capture spatial features through the filters of the CNN and retain neighborhood information to generate a video feature map. Further, the offline node in the server cluster device can perform nonlinear activation processing on the video feature map output by the CNN through a nonlinear activation function (such as ReLU or Sigmoid); this step introduces nonlinear characteristics and promotes the weight of the video feature map in the convolutional neural network. Further, the offline node in the server cluster device can also use repeated convolution and fully connected processing to increase the weight of the video feature map; this step increases the complexity of the convolutional neural network. Further, the offline node in the server cluster device may classify the video feature map of the video data through an output layer of the convolutional neural network according to a specific category scheme (such as classification by the video attribute in which the key frame is located, by file size, or by video format). Further, the offline node in the server cluster device may obtain the label type 76T. Further, the offline node in the server cluster device may convert the output of the convolutional neural network into a prediction probability distribution through a softmax activation function. The prediction probability refers to the probability that a single video feature map belongs to each category. Further, the offline node in the server cluster device may select the category with the highest prediction probability as the category prediction result (i.e., the label type 76T) of the video feature map.
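A toy sketch of this classification step, using PyTorch purely for illustration (the layer sizes, class count, and input shape are assumptions, not values from this application), shows how the convolution, nonlinear activation, fully connected output layer and softmax fit together:

```python
import torch
import torch.nn as nn

# Hedged sketch: a small CNN whose output is turned into a prediction probability
# distribution by softmax; the class with the highest probability is the label type.
class FrameClassifier(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # capture spatial features
            nn.ReLU(),                                    # nonlinear activation
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, num_classes)      # fully connected output layer

    def forward(self, x):
        feats = self.features(x).flatten(1)
        return torch.softmax(self.classifier(feats), dim=1)  # prediction probabilities

probs = FrameClassifier()(torch.rand(1, 3, 224, 224))
label_type = probs.argmax(dim=1)  # category with the highest prediction probability
```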
Step S205, the preprocessed image data is returned to the root node.
Specifically, the offline node in the server cluster device may return the preprocessed image data to the root node through the offline node in the offline state according to the inherent connection between the offline node in the offline state and the root node. Wherein the inherent connection between the offline node and the root node in the offline state may be a connection through a fixed communication channel.
Step S206, when a video editing request for a target object is acquired, determining a node which is switched to an online state in M nodes as an online node through a root node, and distributing preprocessing image data associated with the target object to the online node; the pre-processed image data is generated based on video data associated with the target object.
Specifically, the process of distributing the preprocessed image data associated with the target object to the online node is described in detail in step S101 in fig. 3, and the detailed description of the process of distributing the preprocessed image data associated with the target object to the online node is omitted here.
In step S207, the on-line node detects and identifies the target object in the preprocessed image data, and obtains the target object region information.
Specifically, the process of obtaining the target object area information is referred to the above detailed description of the process of obtaining the target object area information in step S102 in fig. 3, and will not be repeated here.
Further, referring back to fig. 7, the online node in the server cluster device may obtain associated video data 77P based on the tag type 76T. Further, the root node in the server cluster equipment can distribute the preprocessing associated data to the online node through the root node, and the online node detects and identifies the target object in the preprocessing associated data to obtain the associated object area information corresponding to the preprocessing associated data.
The process of training the target neural network model by the online node in the server cluster device may be: the online node in the server cluster device can acquire sample image data and sample two-dimensional position information corresponding to a sample object in the sample image data. Further, the online node in the server cluster device may input the sample image data into an initial neural network model, and extract sample image features corresponding to the sample image data through the initial neural network model. Further, the online node in the server cluster device can input the sample image characteristics into a residual network layer in the initial neural network model, and residual processing is performed on the sample image characteristics through the residual network layer to obtain the sample residual image characteristics. Further, the online node in the server cluster device can perform full connection processing on the sample residual image characteristics to obtain sample prediction position information corresponding to the sample object in the sample image data. Further, the online node in the server cluster device can perform parameter adjustment on the initial neural network model based on the sample prediction position information and the sample two-dimensional position information to obtain the target neural network model.
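A hedged, toy sketch of this training procedure (again using PyTorch for illustration; the shapes, optimizer, and mean-squared-error loss are assumptions rather than details of the application) could look like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hedged sketch: a toy model with one residual block and a fully connected head
# predicts 2D position information; the loss against the sample two-dimensional
# position information drives the parameter adjustment.
class ResidualBlock(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(x + self.conv(x))     # residual processing

class PositionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.residual = ResidualBlock(16)
        self.head = nn.Linear(16, 4)             # full connection -> (x, y, w, h)

    def forward(self, x):
        feats = self.residual(torch.relu(self.stem(x)))
        return self.head(feats.mean(dim=(2, 3)))

model = PositionModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
sample_image = torch.rand(8, 3, 64, 64)          # sample image data
sample_position = torch.rand(8, 4)               # sample two-dimensional position info

prediction = model(sample_image)                 # sample prediction position info
loss = F.mse_loss(prediction, sample_position)
loss.backward()
optimizer.step()                                 # parameter adjustment
```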
Step S208, determining the node which is switched to the pause state in the M nodes as the pause node through the root node, and distributing the preprocessed image data and the target object area information acquired from the online node to the pause node.
Specifically, the process of distributing the preprocessed image data and the target object area information acquired from the online node to the pause node is described in detail in step S103 in fig. 3, and the detailed description of the process of distributing the preprocessed image data and the target object area information acquired from the online node to the pause node is omitted here.
Further, the root node in the server cluster device may distribute the pre-processing association data and the association object region information to the suspension node through the root node. Further, the pause node in the server cluster device performs video frame editing processing on the area indicated by the associated object area information in the preprocessing associated data through the pause node, and re-encodes the associated video data to obtain the associated encoded video.
Step S209, performing video frame editing processing on the area indicated by the target object area information in the preprocessed image data through a pause node, and recoding the video data to obtain a target coded video; the target video frame is a video frame obtained after decoding the target coded video; the regions associated with the target object in the target video frame are all the edited regions.
Specifically, the process of obtaining the target encoded video is referred to the above detailed description of the process of obtaining the target encoded video in step S104 in fig. 3, and will not be described herein.
Optionally, corresponding to an editing operation in a coding manner of the target object, the pause node in the server cluster device may further perform hiding processing on the video frame containing the target object, to obtain an edited video frame. In particular, the hiding process for the video frame containing the target object herein may include directly deleting the video frame containing the target object or performing an ignoring process on the video frame containing the target object at the time of encoding.
Further, the master node in the server cluster device may send the target encoded video acquired from the pause node in the server cluster device to the client through the master node. Wherein the header file of the target encoded video is used to inform the decoder in the client of the location of the intra-coded frames of the edited video frame. The decoder is used for determining a region to be edited in the video frame with the coding dependency relationship with the decoded editing video frame based on the target object region information in the decoded editing video frame in the process of decoding the target encoding video, and performing editing rendering in the region to be edited.
Specifically, after receiving the target encoded video sent by the master node in the server cluster device, the client may decode the target encoded video to obtain decoded video frames corresponding to the target video. Each decoded video frame may comprise image data and header data. In a specific embodiment, the header data may include: the video frame rate, the video resolution, the video frame type, the target tag of the target object, and the DTS (Decode Time Stamp, decoding timestamp) and PTS (Presentation Time Stamp, display timestamp) generated from the system clock during encoding, and the like. Specifically, the client may determine through the video frame type whether the decoded video frame is an I frame, a P frame, or a B frame, and arrange the decoded video frames accordingly. Specifically, the target tag of the target object is used to represent classification information of the target object. For example, the target tags of target objects may be "car", "movie star", "animal", "plant", and so on. The client can determine, through the DTS, the time at which the bitstream read from the database is decoded in the decoder. Correspondingly, the client can determine the display time of the decoded video frame through the PTS. In a specific embodiment, when the client decodes a P frame or a B frame among the decoded video frames, the coded region in the I frame on which that P frame or B frame depends may be directly determined as the coded region of that P frame or B frame (because the coded region in the I frame on which the P frame or B frame depends is consistent with the coded region of the P frame or B frame, the coded region remaining unchanged in the time interval between that I frame and the dependent P frame or B frame). Further, the client may perform coding in the coded region of the P frame or B frame among the decoded video frames, and obtain the decoded target decoded video after the coding is finished. Through this decoding mode, the client can ensure the accuracy of coding the target object.
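The decode-side rule for P and B frames described above can be sketched as follows; the frame dictionary layout and field names are assumptions made only for illustration:

```python
# Hedged sketch: a P or B frame reuses the coded (edited) region of the frame it
# depends on, so the mosaic stays aligned without per-frame region data.
def region_for_frame(frame, frames_by_id):
    if frame["type"] == "I":
        return frame["coded_region"]               # region carried by the intra frame
    reference = frames_by_id[frame["depends_on"]]  # the frame this P/B frame depends on
    return region_for_frame(reference, frames_by_id)

frames = {
    1: {"type": "I", "coded_region": (120, 80, 64, 64)},
    2: {"type": "P", "depends_on": 1},
}
print(region_for_frame(frames[2], frames))  # -> (120, 80, 64, 64)
```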
Specifically, the client may determine, based on the frame identifier in the file header, a rendering size and rendering content of the video frame corresponding to the region to be edited. Further, the client may determine a rendering root node and a rendering child node based on the rendering size and rendering content of the video frame corresponding to the region to be edited. Further, the client may determine a tree-like rendering structure of a rendering effect corresponding to the rendering content in the video data based on the rendering root node and the rendering child node. For example, the rendering content may be a flower, and the rendering effect corresponding to the rendering content may be a dynamic flower blooming special effect, that is, the client may render the target object in the edited video frame in the video data by using the flower special effect, so as to obtain the video data that the flower (i.e. the rendering content) occludes the target object. Alternatively, the rendered content may include blurred images, such as images of a mosaic effect.
Further, the master node in the server cluster device may send the video data to the client using the communication protocol, taking the sending address in the header file of the video data as an index. The communication protocol may include protocols such as HTTP Live Streaming ("HLS"), Dynamic Adaptive Streaming over HTTP ("DASH"), HTTP Dynamic Streaming ("HDS"), Real-Time Messaging Protocol ("RTMP"), Smooth Streaming, and the like.
Further, the master node in the server cluster device may retransmit the target encoded video through the master node in the case that the target encoded video fails to be sent to the client. Specifically, the target encoded video may carry a first timestamp of the video frame to which it belongs, the intra-frame sequence number of the video frame to which it belongs, a second timestamp of the video frame preceding the video frame to which it belongs, and the number of video frame sets contained in the video frame to which it belongs. In a specific implementation, the sequence number of a video frame set of the target encoded video refers to the ordering of that video frame set within the target encoded video. For example, a video frame set with a sequence number of 10 is the 10th video frame set of the target encoded video. The first timestamp of the video frame is a DTS (Decoding Time Stamp); the master node in the server cluster device may determine the time to decode the video frame via the DTS. Likewise, the second timestamp of the previous video frame is also a DTS. Because the first timestamp of the video frame is carried, the number of video frame sets associated with that first timestamp can be determined; and because the number of video frame sets contained in the target encoded video is also carried, the number of video frame sets associated with the first timestamp can be compared with the number currently carried: if the number of video frame sets associated with the first timestamp is smaller than the number currently carried, video frame sets have been lost; if the two are equal, the video frame sets in the target encoded video have been received completely. If the master node in the server cluster device detects that a video frame set has been lost, the target encoded video is retransmitted through the master node.
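The loss check that triggers retransmission can be sketched as a simple comparison; the counts used here are assumed illustration values, not data from the application:

```python
# Hedged sketch: compare the number of video frame sets announced for the target
# encoded video with the number actually received; a smaller received count means
# video frame sets were lost and the video should be resent through the master node.
def needs_retransmission(announced_set_count: int, received_set_count: int) -> bool:
    return received_set_count < announced_set_count  # True -> loss detected

if needs_retransmission(announced_set_count=12, received_set_count=10):
    print("retransmit the target encoded video through the master node")
```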
For ease of understanding, please refer to fig. 8, which is a flowchart of still another video recognition method provided in an embodiment of the present application. As shown in fig. 8, in a specific embodiment, the root node in the server cluster device may obtain the preprocessed image data and the target object area information 81P through the root node. This step can be seen as the start of the video reorganization process. Further, the root node in the server cluster device may obtain the file header 82H corresponding to the target object. Further, the master node in the server cluster device may acquire the encoder and decoder 83P. Further, the pause node in the server cluster device may perform coding processing on the preprocessed image data and the target object area information to obtain the file header 84H of the coded video frame map. In a specific embodiment, the header 84H of the coded video frame map may include an identifier, version, encoding format, resolution, code rate, chroma, and the like. For example, if the header 84H of the coded video frame map is the header data of an MP4 file, the header 84H of the coded video frame map may use "ftyp" as the identifier. Further, the pause node in the server cluster device may obtain the target encoded video 85V encoded using the encoder. Further, after the master node in the server cluster device acquires the target encoded video 85V sent by the pause node in the server cluster device, the master node may package and send the target encoded video 85V to the client, so that the client performs video data stream pulling. Further, after receiving the target encoded video 85V, the client may decode the target encoded video 85V frame by frame using a decoder to obtain the header vector data 86D. The decoder here may be one of the above encoder and decoder 83P. The header vector data 86D is used to represent data such as the video frame rate, the video resolution, the video frame type, the target tag of the target object, and the DTS (Decode Time Stamp, decoding timestamp) and PTS (Presentation Time Stamp, display timestamp) generated from the system clock during encoding, and the like. Further, the client may render the target object region information for the P frames in the target encoded video 85V to obtain the target decoded video 87V. Specifically, the client may determine, based on the frame identifier in the file header, the rendering size and rendering content of the video frame corresponding to the region to be edited. Further, the client may determine a rendering root node and rendering child nodes based on the rendering size and rendering content of the video frame corresponding to the region to be edited. Further, the client may determine, based on the rendering root node and the rendering child nodes, a tree-like rendering structure of the rendering effect corresponding to the rendering content in the video data. For example, the rendering content may be a flower, and the rendering effect corresponding to the rendering content may be a dynamic flower-blooming special effect; that is, the client may render the target object in the edited video frame of the video data with the flower special effect, so as to obtain video data in which the flower (i.e., the rendering content) occludes the target object. Alternatively, the rendered content may include blurred images, such as images with a mosaic effect. In a particular embodiment, the client may render by way of shader filling.
Referring again to FIG. 8, a shader (Shader) is used to implement image rendering, replacing the editable program of the fixed rendering pipeline. Shaders mainly include two types, namely the vertex shader (Vertex Shader) and the pixel shader (Pixel Shader). In brief, a shader is a program that runs independently on a graphics processor (graphics processing unit, GPU) and outputs an image after processing only the input video data. For example, in an embodiment of moving object detection, if license plates running a red light on a road are to be detected, the detection process may be as follows: when the light is red, the camera acquires the video stream detected in real time. Further, the master node in the server cluster device acquires the real-time detected video stream through a central processing unit (Central Processing Unit, CPU). Further, the master node in the server cluster device detects the real-time detected video stream through a graphics processor (graphics processing unit, GPU). Further, the client may perform image conversion, through the vertex shader, on the received real-time detected video stream sent by the master node in the server cluster device, so as to detect a license plate running a red light. Further, when detecting license plate numbers, the client can compare license plate number outlines, store the labels of license plate outline images in the database, and binarize the license plate outline image to be detected through a fragment shader (for example, the license plate base color is uniformly set to black and the license plate letters are set to white). Further, the client can obtain a detection result, such as "certain XXXXXX", by comparing the license plate outline image to be detected with the labels of the license plate outline images. Specifically, if the license plate outline image to be detected differs from the label of the license plate outline image, the detection result is empty; correspondingly, if the license plate outline image to be detected is the same as the label of the license plate outline image, the detection result is determined to be the license plate outline image to be detected, that is, the license plate corresponding to the license plate outline image to be detected is a red-light-running license plate.
In the embodiment of the present application, when it is detected that the client requests video data, video frames of the video data are intercepted through the master node; a minimum heap node structure diagram is constructed based on the loads of the M nodes, where the root node in the minimum heap node structure diagram is the node with the minimum load among the M nodes; the video frames are distributed to the root node; a node that has been switched to an offline state among the M nodes is determined as an offline node through the root node, the video frames are sent to the offline node, key video frames are selected from the video frames in the offline node according to the acquired video frame selection policy, and size pixel preprocessing is performed on the key video frames to obtain the preprocessed image data; and the preprocessed image data is returned to the root node. According to the embodiment of the present application, the root node can be determined by constructing the minimum heap node structure diagram, and image processing tasks are distributed to different nodes through the root node. Through the regulation and control of image processing tasks by the root node, the orderliness of the server cluster device can be improved, so that the image processing tasks handled on the server cluster device are more orderly and the service efficiency of the server cluster device is improved. In the embodiment of the present application, when one node in the server cluster device fails, the root node can distribute the image processing task to be processed to another node in the same state as the failed node, so that the fault tolerance and flexibility of the server cluster device are improved, the server cluster device can be applied to more scenarios, and the scenario suitability of the server cluster device is improved. Further, querying the minimum connection number through the min-heap priority queue method has low time complexity, which improves the efficiency of using the priority queue method in a large-scale server cluster, saves labor cost, and improves the processing speed and resource utilization efficiency of processing video data with a parallel strategy in the pause node. Through the regulation and control of image processing tasks by the root node, the resource utilization rate of the server cluster device can be higher, and the automatic identification of the data associated with the target object by the online node and the pause node enables batches of video data to be identified simultaneously on multiple nodes, thereby shortening the time for identifying the video data and improving the efficiency of identifying the video data. In the embodiment of the present application, the time for video identification is reduced, and the efficiency of video identification is improved.
Further, referring to fig. 9, fig. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing means may be a computer program (comprising program code) running in a computer device, for example the data processing means is an application software; the device can be used for executing corresponding steps in the method provided by the embodiment of the application. As shown in fig. 9, the data processing apparatus 1 is applied to a service management platform, and the data processing apparatus 1 may include: an online node determination module 11, an image data detection module 12, a pause node determination module 13, a region editing module 14, and a video data encoding module 15.
An online node determining module 11, configured to determine, by a root node, a node that has been switched to an online state among the M nodes as an online node when a video editing request for a target object is acquired, and distribute preprocessed image data associated with the target object to the online node; preprocessing image data is generated based on video data associated with a target object;
the image data detection module 12 is configured to detect and identify a target object in the preprocessed image data through an online node, so as to obtain target object region information;
A suspended node determining module 13 for determining, by the root node, a node of the M nodes that has been switched to a suspended state as a suspended node, distributing the preprocessed image data and the target object area information acquired from the online node to the suspended node;
a region editing module 14 for performing video frame editing processing on a region indicated by the target object region information in the preprocessed image data by the pause node;
a video data encoding module 15, configured to re-encode video data to obtain a target encoded video; the target video frame is a video frame obtained after decoding the target coded video; the regions associated with the target object in the target video frame are all the edited regions.
The specific functional implementation manners of the online node determining module 11, the image data detecting module 12, the pause node determining module 13, the region editing module 14 and the video data encoding module 15 may be referred to in step S101-step S104 in the corresponding embodiment of fig. 3, and will not be described herein.
Referring to fig. 9, the M nodes further include a master node; the data processing apparatus 1 further includes:
the video frame intercepting module 16 is used for intercepting video frames of the video data through the master node when detecting that the client requests the video data;
A minimum heap construction module 17, configured to construct a minimum heap node structure diagram based on loads of M nodes; the root node in the minimum heap node structure diagram is the node with the minimum load in M nodes;
a video frame distribution module 18 for distributing video frames to the root node;
a video frame preprocessing module 19, configured to perform size pixel preprocessing on the key video frame to obtain preprocessed image data;
the image data transmitting module 20 is configured to return the preprocessed image data to the root node.
The specific functional implementation manners of the video frame capturing module 16, the minimum stack constructing module 17, the video frame distributing module 18, the video frame preprocessing module 19, and the image data transmitting module 20 may be referred to the above steps S201 to S205 in the corresponding embodiment of fig. 5, and will not be described herein.
Referring again to fig. 9, the video frame preprocessing module 19 includes:
an offline node determining unit 191 configured to determine, by the root node, a node that has been switched to an offline state among the M nodes as an offline node;
a video frame selection unit 192, configured to send video frames to an offline node, and select key video frames from the video frames in the offline node according to the acquired video frame selection policy;
An image size detection unit 193 for performing image size detection on the key video frame to obtain key video frame size data and key video frame pixel data;
an image scaling unit 194, configured to perform image scaling processing on the key video frame size data based on the obtained standard size data, to obtain standard size video frame data;
an image pixel detection unit 195, configured to perform image pixel detection on standard-size video frame data to obtain video frame pixel data;
the color space conversion unit 196 is configured to perform color space conversion processing on the video frame pixel data based on the standard color space data, resulting in preprocessed image data.
The specific functional implementation manners of the offline node determining unit 191, the video frame selecting unit 192, the image size detecting unit 193, the image scaling unit 194, the image pixel detecting unit 195 and the color space converting unit 196 may be referred to the step S204 in the corresponding embodiment of fig. 5, and will not be described herein.
Referring again to fig. 9, the image data detection module 12 includes:
an image feature extraction unit 121, configured to extract image features corresponding to the preprocessed image data through the target neural network model;
A feature input unit 122 for inputting image features to a residual network layer in the target neural network model;
a feature residual processing unit 123, configured to perform residual processing on the image feature through a residual network layer, so as to obtain a residual image feature;
and the feature full-connection unit 124 is configured to perform full-connection processing on the residual image features to obtain target object region information.
The specific functional implementation manners of the image feature extraction unit 121, the feature input unit 122, the feature residual processing unit 123, and the feature full connection unit 124 may refer to step S102 in the corresponding embodiment of fig. 3, and are not described herein.
Referring again to fig. 9, the residual network layer includes M residual units, including a residual unit S_i, where M is a positive integer and i is a positive integer less than or equal to M; the feature residual processing unit 123 includes:

a first convolution subunit 1231, configured to input the input features into the residual unit S_i, and perform convolution processing on the input features through the residual unit S_i to obtain an intermediate feature i; if the residual unit S_i is the first residual unit in the residual network layer, the input feature is the image feature; if the residual unit S_i is the second residual unit in the residual network layer, the input features include the intermediate feature i-1 output by the residual unit S_{i-1} and the image feature, where the residual unit S_{i-1} is the previous-layer residual unit of the residual unit S_i; if the residual unit S_i is neither the first residual unit nor the second residual unit in the residual network layer, the input features include the intermediate feature i-1 output by the residual unit S_{i-1} and the intermediate feature i-2 output by the residual unit S_{i-2}, where the residual unit S_{i-2} is the previous-layer residual unit of the residual unit S_{i-1};

a second convolution subunit 1232, configured to input the auxiliary feature and the intermediate feature i into the residual unit S_{i+1}, and perform convolution processing on the auxiliary feature and the intermediate feature i through the residual unit S_{i+1} to obtain an intermediate feature i+1, where the residual unit S_{i+1} is the next-layer residual unit of the residual unit S_i; if the residual unit S_i is the first residual unit in the residual network layer, the auxiliary feature is the image feature; if the residual unit S_i is not the first residual unit in the residual network layer, the auxiliary feature is the intermediate feature i-1;

a feature determination subunit 1233, configured to determine the intermediate feature i+1 as the residual image feature if the residual unit S_{i+1} is the last-layer residual unit in the residual network layer.
The specific functional implementation manner of the first convolution subunit 1231, the second convolution subunit 1232, and the feature determining subunit 1233 may refer to step S102 in the corresponding embodiment of fig. 3, which is not described herein.
Referring again to fig. 9, the data processing apparatus 1 further includes:
the tag type obtaining module 21 is configured to obtain a tag type of a target object, and send the tag type of the target object to a connectable online node in a heartbeat connection manner, so that the connectable online node queries associated video data associated with the target object in a video cluster having the tag type of the target object;
the specific functional implementation manner of the tag type obtaining module 21 may refer to step S101 in the corresponding embodiment of fig. 3, and will not be described herein.
The video data distribution module 22 is configured to distribute, via the root node, the associated video data queried by the connectable online node to the offline node, extract, via the offline node, a key associated video frame in the associated video data, and perform size pixel preprocessing on the key associated video frame to obtain preprocessed associated data;
the object detection module 23 is configured to distribute the pre-processing associated data to an online node through a root node, and detect and identify a target object in the pre-processing associated data through the online node to obtain associated object region information corresponding to the pre-processing associated data;
The specific functional implementation manner of the video data distribution module 22 and the object detection module 23 may refer to step S204 in the corresponding embodiment of fig. 5, and will not be described herein.
The associated encoded video obtaining module 24 is configured to distribute the preprocessed associated data and the associated object region information to the pause node through the root node, edit the region indicated by the associated object region information in the preprocessed associated data through the pause node, obtain an associated edited video frame, and re-encode the associated video data according to the associated edited video frame, so as to obtain an associated encoded video.
The specific functional implementation manner of the associated encoded video acquisition module 24 may refer to step S208 in the corresponding embodiment of fig. 5, which is not described herein.
Referring to fig. 9, the region editing module 14 is specifically configured to perform video frame editing processing on a region indicated by the target object region information in the preprocessed image data by using a pause node, so as to obtain an edited video frame.
The specific functional implementation of the region editing module 14 may refer to step S104 in the corresponding embodiment of fig. 3, and will not be described herein.
Referring again to fig. 9, the video data encoding module 15 includes:
a frame identifier mapping unit 151, configured to map a frame identifier corresponding to an edited video frame in a header of video data;
an adjacent video frame acquiring unit 152, configured to acquire, as an adjacent video frame, a video frame adjacent to the key video frame in the video data;
the frame group task obtaining unit 153 is configured to add an edited video frame and an adjacent video frame associated with the same key video frame to the same frame group task, and add a key video frame not associated with the edited video frame and the corresponding adjacent video frame to the same frame group task to obtain S frame group tasks, where the number of S is the same as the number of key video frames, and S is a positive integer;
the task parallel encoding unit 154 is configured to encode the S frame group tasks in parallel to obtain intra-frame encoded frames corresponding to the edited video frames in the S frame group tasks, intra-frame encoded frames corresponding to the unedited key video frames in the S frame group tasks, and prediction encoded frames corresponding to the S frame group tasks, and encapsulate the intra-frame encoded frames, the prediction encoded frames, and the header into the target encoded video.
The specific functional implementation manners of the frame identifier mapping unit 151, the adjacent video frame acquiring unit 152, the frame group task acquiring unit 153, and the task parallel encoding unit 154 may be referred to the step S104 in the corresponding embodiment of fig. 3, and will not be described herein.
Referring to fig. 9, the video data encoding module 15 is specifically configured to send the target encoded video to the client through the master node; the header file of the target coded video is used for informing a decoder in the client of the position of the intra-frame coded frame belonging to the edited video frame; the decoder is used for determining a region to be edited in the video frame with the coding dependency relationship with the decoded editing video frame based on the target object region information in the decoded editing video frame in the process of decoding the target encoding video, and performing editing rendering in the region to be edited.
The specific functional implementation manner of the video data encoding module 15 may refer to step S209 in the corresponding embodiment of fig. 5, and will not be described herein.
Referring again to fig. 9, the offline node determining unit 191 includes:
a feature information mapping subunit 1911, configured to obtain feature information of the video data through the root node, map the feature information to a hash space, and obtain a hash value of the video data;
an offline node selection subunit 1912 configured to determine a node having an offline state among the M nodes, and select an offline node for generating the preprocessed image data among the nodes having an offline state according to the hash value of the video data.
The specific functional implementation manner of the feature information mapping subunit 1911 and the offline node selection subunit 1912 may refer to step S204 in the corresponding embodiment of fig. 5, and will not be described herein.
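A minimal sketch of the two subunits follows, assuming the hash space is realised with an MD5 digest and the mapping onto offline nodes is a simple modulo over the nodes currently in the offline state; the text above only requires that the feature information be mapped to a hash space, so these concrete choices are assumptions.

```python
import hashlib

def select_offline_node(video_feature_info, nodes):
    """Pick an offline node for generating the preprocessed image data.

    video_feature_info -- string of characteristic information of the video data
    nodes              -- list of dicts like {"id": "node-3", "state": "offline"}
    """
    offline = [n for n in nodes if n["state"] == "offline"]
    if not offline:
        raise RuntimeError("no node is currently in the offline state")
    # Map the feature information into a hash space, then onto one of the offline nodes.
    h = int(hashlib.md5(video_feature_info.encode("utf-8")).hexdigest(), 16)
    return offline[h % len(offline)]

chosen = select_offline_node("video-42|1080p|25fps", [
    {"id": "node-1", "state": "online"},
    {"id": "node-3", "state": "offline"},
    {"id": "node-7", "state": "offline"},
])
```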
Wherein the image data detection module 12 includes:
a sample image obtaining unit 125, configured to obtain sample image data and sample two-dimensional position information corresponding to a sample object in the sample image data;
a sample image input unit 126, configured to input sample image data into an initial neural network model, and extract sample image features corresponding to the sample image data through the initial neural network model;
the sample feature obtaining unit 127 is configured to input the sample image feature into a residual network layer in the initial neural network model, and perform residual processing on the sample image feature through the residual network layer to obtain a sample residual image feature;
the sample position information obtaining unit 128 is configured to perform full connection processing on the sample residual image features to obtain sample prediction position information corresponding to a sample object in the sample image data;
the model parameter adjustment unit 129 is configured to perform parameter adjustment on the initial neural network model based on the sample prediction position information and the sample two-dimensional position information, so as to obtain a target neural network model.
The specific functional implementation manners of the sample image obtaining unit 125, the sample image input unit 126, the sample feature obtaining unit 127, the sample position information obtaining unit 128, and the model parameter adjustment unit 129 may refer to step S207 in the corresponding embodiment of fig. 5, and will not be described herein.
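The training flow of units 125 to 129 can be pictured with the following PyTorch sketch: sample images pass through a small backbone and a residual network layer, a fully connected head predicts two-dimensional position information, and one parameter-adjustment step is taken against the labelled positions. The network depth, the (x, y, w, h) parameterisation and the MSE loss are assumptions of the sketch, not details fixed by the embodiment.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x):
        # Residual processing: two convolutions plus a skip connection back to the input feature.
        return torch.relu(x + self.conv2(torch.relu(self.conv1(x))))

class PositionNet(nn.Module):
    """Backbone -> residual network layer -> fully connected head predicting (x, y, w, h)."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.residual = ResidualBlock(16)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, 4)          # sample prediction position information

    def forward(self, x):
        feat = self.residual(self.stem(x))  # sample residual image features
        return self.fc(self.pool(feat).flatten(1))

model = PositionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One illustrative parameter-adjustment step on random stand-in data.
sample_images = torch.rand(8, 3, 128, 128)   # sample image data
sample_positions = torch.rand(8, 4)          # sample two-dimensional position information
loss = loss_fn(model(sample_images), sample_positions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```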
Further, referring to fig. 10, fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 10, the computer device 1000 may include: at least one processor 1001 (such as a CPU), at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and a keyboard, and the network interface 1004 may optionally include a standard wired interface and a wireless interface (for example, a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, for example, at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the aforementioned processor 1001. As shown in fig. 10, the memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application.
In the computer device 1000 shown in fig. 10, the network interface 1004 may provide a network communication function; the user interface 1003 is mainly used to provide an input interface for the user; and the processor 1001 may be used to invoke the device control application stored in the memory 1005 to implement:
when a video editing request for a target object is acquired, determining, through the root node, a node which has been switched to an online state among the M nodes as an online node, and distributing preprocessed image data associated with the target object to the online node, the preprocessed image data being generated based on video data associated with the target object; detecting and identifying the target object in the preprocessed image data through the online node to obtain target object region information; determining, through the root node, a node which has been switched to a pause state among the M nodes as a pause node, and distributing the preprocessed image data and the target object region information acquired from the online node to the pause node; and performing video frame editing processing on the region indicated by the target object region information in the preprocessed image data through the pause node, and re-encoding the video data to obtain a target coded video, where in the video frames obtained after the target coded video is decoded, the regions associated with the target object are all regions after editing processing.
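Read as pseudocode, the processing implemented by the processor 1001 amounts to the following sketch; the node dictionaries and the helper functions (detect_fn, edit_fn, reencode_fn) are hypothetical stand-ins for the work done on the online node and the pause node, and the root node's dispatching is elided.

```python
def handle_edit_request(nodes, preprocessed_frames, video_data,
                        detect_fn, edit_fn, reencode_fn):
    """High-level sketch of the flow above; helper functions are illustrative placeholders."""
    online = next(n for n in nodes if n["state"] == "online")   # detection and recognition
    paused = next(n for n in nodes if n["state"] == "paused")   # editing and re-encoding

    region_info = detect_fn(online, preprocessed_frames)        # target object region information
    edited = edit_fn(paused, preprocessed_frames, region_info)  # edited video frames
    return reencode_fn(paused, video_data, edited)              # target coded video

target_coded_video = handle_edit_request(
    nodes=[{"id": "n1", "state": "online"}, {"id": "n2", "state": "paused"}],
    preprocessed_frames=["f0", "f1"],
    video_data="raw-video",
    detect_fn=lambda node, frames: {"f0": (0, 0, 8, 8)},
    edit_fn=lambda node, frames, regions: ["f0_edited", "f1"],
    reencode_fn=lambda node, video, edited: ("coded", edited),
)
```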
It should be understood that the computer device 1000 described in the embodiments of the present application may perform the data processing method described in the embodiments corresponding to fig. 2, fig. 3, fig. 4, fig. 5, fig. 6, fig. 7 and fig. 8, and may also perform the functions of the data processing apparatus 1 described in the embodiment corresponding to fig. 9, which are not described herein again. In addition, the description of the beneficial effects of the same method is omitted.
The embodiments of the present application further provide a computer-readable storage medium, and the computer-readable storage medium stores a computer program. The computer program includes program instructions, and the program instructions, when executed by a processor, implement the data processing method provided by the steps in fig. 2, fig. 3, fig. 4, fig. 5, fig. 6, fig. 7 and fig. 8; for details, reference may be made to the implementation manners provided by the steps in the corresponding figures, which are not repeated herein. In addition, the description of the beneficial effects of the same method is omitted.
The computer-readable storage medium may be an internal storage unit of the data processing apparatus provided in any one of the foregoing embodiments or of the computer device, such as a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer program product or computer program, the computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device can execute the data processing method in the embodiments corresponding to fig. 2, fig. 3, fig. 4, fig. 5, fig. 6, fig. 7 and fig. 8, which is not described herein again. In addition, the description of the beneficial effects of the same method is omitted.
The term "comprising" and any variations thereof in the description, claims and drawings of the embodiments of the present application are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, product, or device that comprises a list of steps or units is not limited to the listed steps or units, but may alternatively include other steps or units not listed or inherent to such process, method, apparatus, product, or device.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether such functions are implemented as hardware or software depends on the particular application and design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementation should not be considered to be beyond the scope of the present application.
The methods and related devices provided in the embodiments of the present application are described with reference to the method flowcharts and/or structural diagrams provided in the embodiments of the present application. Specifically, each flow and/or block of the method flowcharts and/or structural diagrams, and combinations of flows and/or blocks in the flowcharts and/or structural diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams. These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or the other programmable device to produce computer-implemented processing, and the instructions executed on the computer or the other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams.
The foregoing disclosure is merely illustrative of the preferred embodiments of the present application and is not intended to limit the scope of the claims of the present application; equivalent variations made in accordance with the claims of the present application therefore still fall within the scope of the present application.

Claims (14)

1. A video data processing method, characterized in that the method is performed by service cluster equipment, the service cluster equipment comprises M nodes, the M nodes comprise a root node, the root node is the node with the minimum load among the M nodes, and M is a positive integer; the method comprises the following steps:
when a video editing request aiming at a target object is acquired, determining a node which is switched to an online state in the M nodes as an online node through the root node, and distributing preprocessing image data associated with the target object to the online node; the pre-processed image data is generated based on video data associated with the target object;
detecting and identifying the target object in the preprocessed image data through the online node to obtain target object region information;
determining, by the root node, a node, which has been switched to a suspended state, of the M nodes as a suspended node, and distributing the preprocessed image data and the target object area information acquired from the online node to the suspended node;
performing video frame editing processing on the area indicated by the target object area information in the preprocessed image data through the pause node, and re-encoding the video data to obtain a target coded video; the target video frame is a video frame obtained after the target coded video is decoded; the areas associated with the target object in the target video frame are all areas after editing processing.
2. The method of claim 1, wherein the M nodes further comprise a master node; the method further comprises the steps of:
when detecting that a client requests video data, intercepting a video frame of the video data through the master node;
constructing a minimum heap node structure diagram based on the loads of the M nodes; the root node in the minimum heap node structure diagram is the node with the minimum load in the M nodes;
distributing the video frames to the root node;
preprocessing the video frame in the offline node to obtain preprocessed image data;
and returning the preprocessed image data to the root node.
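The minimum heap node structure diagram of claim 2 can be pictured with a standard binary min-heap keyed by node load, as in the sketch below; the (load, node_id) representation is an assumption made for illustration.

```python
import heapq

def build_min_heap_node_structure(node_loads):
    """Arrange the M nodes into a min-heap keyed by load; heap[0] is the root node.

    node_loads -- list of (load, node_id) tuples; the identifiers are illustrative
    """
    heap = list(node_loads)
    heapq.heapify(heap)
    return heap

heap = build_min_heap_node_structure([(0.72, "node-1"), (0.15, "node-2"), (0.40, "node-3")])
root_load, root_node = heap[0]   # the least-loaded node; intercepted video frames are distributed to it
```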
3. The method of claim 2, wherein the preprocessing the video frames in the offline node to obtain preprocessed image data comprises:
determining a node which is switched to an offline state from the M nodes as an offline node through the root node, sending the video frame to the offline node, and selecting a key video frame from the video frames according to an acquired video frame selection strategy in the offline node;
performing image size detection on the key video frames to obtain key video frame size data and key video frame pixel data;
performing image scaling processing on the key video frame size data based on the acquired standard size data to obtain standard size video frame data;
performing image pixel detection on the standard-size video frame data to obtain video frame pixel data;
and performing color space conversion processing on the video frame pixel data based on the standard color space data to obtain preprocessed image data.
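The preprocessing chain of claim 3 could be sketched as follows with OpenCV, assuming an every-Nth-frame key frame selection strategy, a fixed standard size and YUV as the standard color space; all three are illustrative assumptions, since the claim leaves the selection strategy, standard size data and standard color space data open.

```python
import cv2
import numpy as np

def preprocess_key_frames(frames, every_n=30, standard_size=(1280, 720)):
    """Select key frames, scale them to the standard size, and convert them to a standard color space.

    frames        -- iterable of BGR frames (H x W x 3 uint8 arrays)
    every_n       -- stand-in key frame selection strategy: one key frame every N frames
    standard_size -- (width, height) standard size data
    """
    preprocessed = []
    for index, frame in enumerate(frames):
        if index % every_n != 0:
            continue                                    # not selected as a key video frame
        height, width = frame.shape[:2]                 # image size detection
        if (width, height) != standard_size:
            frame = cv2.resize(frame, standard_size)    # image scaling to the standard size data
        yuv = cv2.cvtColor(frame, cv2.COLOR_BGR2YUV)    # color space conversion (YUV assumed as standard)
        preprocessed.append(yuv)
    return preprocessed

dummy_frames = (np.zeros((1080, 1920, 3), dtype=np.uint8) for _ in range(90))
preprocessed_image_data = preprocess_key_frames(dummy_frames)
```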
4. The method according to claim 1, wherein the detecting and identifying the target object in the preprocessed image data by the online node to obtain target object region information includes:
extracting image features corresponding to the preprocessed image data through a target neural network model;
inputting the image features into a residual network layer in the target neural network model, and carrying out residual processing on the image features through the residual network layer to obtain residual image features;
and performing full connection processing on the residual image characteristics to obtain target object region information.
5. The method of claim 4, wherein the residual network layer comprises M residual units, the M residual units include a residual unit S_i, M is a positive integer, and i is a positive integer less than or equal to M; the performing residual processing on the image features through the residual network layer to obtain residual image features includes:
inputting an input feature into the residual unit S_i, and performing convolution processing on the input feature through the residual unit S_i to obtain an intermediate feature i; if the residual unit S_i is the first residual unit in the residual network layer, the input feature is the image feature; if the residual unit S_i is the second residual unit in the residual network layer, the input feature includes the intermediate feature i-1 output by the residual unit S_{i-1} and the image feature; the residual unit S_{i-1} is the previous-layer residual unit of the residual unit S_i;
inputting an auxiliary feature and the intermediate feature i into a residual unit S_{i+1}, and performing convolution processing on the auxiliary feature and the intermediate feature i through the residual unit S_{i+1} to obtain an intermediate feature i+1; the residual unit S_{i+1} is the next-layer residual unit of the residual unit S_i; if the residual unit S_i is the first residual unit in the residual network layer, the auxiliary feature is the image feature; if the residual unit S_i is not the first residual unit in the residual network layer, the auxiliary feature is the intermediate feature i-1;
and if the residual unit S_{i+1} is the last-layer residual unit in the residual network layer, determining the intermediate feature i+1 as the residual image feature.
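A compact reading of claim 5 is sketched below: the first residual unit convolves the image feature, and every later unit convolves the previous intermediate feature together with an auxiliary feature (the image feature after the first unit, the earlier intermediate feature afterwards). The channel count, the element-wise addition used to combine the two inputs and the ReLU activations are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class ResidualChain(nn.Module):
    """Chain of residual units S_1..S_m as described in claim 5 (illustrative only)."""

    def __init__(self, channels, num_units):
        super().__init__()
        self.units = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_units)
        )

    def forward(self, image_feature):
        feats = [image_feature]                        # feats[0] stands in for the image feature itself
        x = torch.relu(self.units[0](image_feature))   # S_1 -> intermediate feature 1
        feats.append(x)
        for i in range(1, len(self.units)):
            aux = feats[i - 1]                         # image feature for S_2, intermediate feature i-1 afterwards
            x = torch.relu(self.units[i](x + aux))     # S_{i+1} convolves intermediate feature i with the auxiliary feature
            feats.append(x)
        return x                                       # output of the last unit = residual image feature

residual_image_feature = ResidualChain(channels=16, num_units=4)(torch.rand(1, 16, 64, 64))
```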
6. The method according to claim 2, wherein after the detecting and identifying the target object in the preprocessed image data through the online node to obtain target object area information, the method further comprises:
obtaining a tag type of the target object, and sending the tag type of the target object to a connectable online node in a heartbeat connection mode, so that the connectable online node queries, in a video cluster, associated video data that is associated with the target object and has the tag type of the target object;
distributing the associated video data queried by the connectable online node to the offline node through the root node, extracting key associated video frames in the associated video data through the offline node, and performing size and pixel preprocessing on the key associated video frames to obtain preprocessed associated data;
distributing the preprocessed associated data to the online node through the root node, and detecting and identifying the target object in the preprocessed associated data through the online node to obtain associated object area information corresponding to the preprocessed associated data;
and distributing the preprocessed associated data and the associated object area information to the pause node through the root node, editing the area indicated by the associated object area information in the preprocessed associated data through the pause node to obtain associated edited video frames, and re-encoding the associated video data according to the associated edited video frames to obtain an associated coded video.
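The tag-based lookup performed by the connectable online node in claim 6 reduces, in the simplest reading, to filtering a video cluster by the target object's tag type, as in the sketch below; the cluster representation and field names are hypothetical.

```python
def query_associated_videos(video_cluster, target_tag):
    """Return videos in the cluster whose tag set contains the tag type of the target object."""
    return [v for v in video_cluster if target_tag in v["tags"]]

cluster = [
    {"video_id": "v1", "tags": {"cat", "indoor"}},
    {"video_id": "v2", "tags": {"car", "street"}},
    {"video_id": "v3", "tags": {"cat", "outdoor"}},
]
associated = query_associated_videos(cluster, "cat")   # v1 and v3 then feed the same preprocessing/editing pipeline
```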
7. The method according to claim 2, wherein the performing, by the pause node, video frame editing processing on the area indicated by the target object area information in the preprocessed image data, and re-encoding the video data, to obtain a target encoded video, includes:
performing video frame editing processing on the area indicated by the target object area information in the preprocessed image data through the pause node to obtain an edited video frame;
mapping the frame identification corresponding to the edited video frame in the file header of the video data;
acquiring video frames adjacent to the key video frames in the video data as adjacent video frames;
adding the edited video frames and the adjacent video frames which are associated with the same key video frames into the same frame group task, and adding the key video frames which are not associated with the edited video frames and the corresponding adjacent video frames into the same frame group task to obtain S frame group tasks, wherein S is the same as the number of the key video frames, and S is a positive integer;
and carrying out parallel coding on the S frame group tasks to obtain intra-frame coded frames corresponding to the edited video frames in the S frame group tasks, intra-frame coded frames corresponding to the unedited key video frames in the S frame group tasks, and predictive coded frames corresponding to the S frame group tasks, and packaging the intra-frame coded frames, the predictive coded frames and the file header into the target coded video.
8. The method of claim 7, wherein the method further comprises:
transmitting the target coded video to a client through the master node; the file header of the target coded video is used for informing a decoder in the client of the positions of the intra-frame coded frames belonging to the edited video frames; the decoder is used for determining, in the process of decoding the target coded video, a region to be edited in a video frame having an encoding dependency relationship with a decoded edited video frame based on the target object region information in the decoded edited video frame, and performing editing rendering in the region to be edited.
9. The method according to claim 3, wherein the determining, by the root node, a node of the M nodes that has been switched to an offline state as an offline node comprises:
acquiring characteristic information of the video data through the root node, and mapping the characteristic information to a hash space to obtain a hash value of the video data;
and determining a node with an offline state from the M nodes, and selecting an offline node for generating preprocessing image data from the nodes with the offline state according to the hash value of the video data.
10. The method according to claim 4, wherein the method further comprises:
acquiring sample image data and sample two-dimensional position information corresponding to a sample object in the sample image data;
inputting the sample image data into an initial neural network model, and extracting sample image features corresponding to the sample image data through the initial neural network model;
inputting the sample image features into a residual network layer in the initial neural network model, and carrying out residual processing on the sample image features through the residual network layer to obtain sample residual image features;
performing full connection processing on the sample residual image characteristics to obtain sample prediction position information corresponding to a sample object in the sample image data;
and carrying out parameter adjustment on the initial neural network model based on the sample prediction position information and the sample two-dimensional position information to obtain a target neural network model.
11. A data processing apparatus, comprising:
an online node determining module, configured to determine, by the root node, a node that has been switched to an online state among the M nodes as an online node when a video editing request for a target object is acquired, and distribute preprocessed image data associated with the target object to the online node; the pre-processed image data is generated based on video data associated with the target object;
the image data detection module is used for detecting and identifying the target object in the preprocessed image data through the online node to obtain target object region information;
a suspended node determining module, configured to determine, by the root node, a node that has been switched to a suspended state among the M nodes as a suspended node, and distribute the preprocessed image data and the target object area information acquired from the online node to the suspended node;
The region editing module is used for carrying out video frame editing processing on the region indicated by the target object region information in the preprocessed image data through the pause node;
the video data coding module is used for recoding the video data to obtain a target coded video; and in the video frame obtained after the target coded video is decoded, the areas associated with the target object are all areas after editing processing.
12. A computer device, comprising: a processor, a memory, and a network interface;
the processor is connected to the memory and the network interface, wherein the network interface is configured to provide a data communication function, the memory is configured to store a computer program, and the processor is configured to invoke the computer program to cause the computer device to perform the method of any one of claims 1-10.
13. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, which computer program is adapted to be loaded and executed by a processor to cause a computer device with a processor to perform the method of any of claims 1-10.
14. A computer program product, characterized in that the computer program product comprises a computer program stored in a computer readable storage medium, the computer program being adapted to be read and executed by a processor to cause a computer device with a processor to carry out the steps of the method according to any one of claims 1-10.
CN202311237209.5A (filed 2023-09-22, priority date 2023-09-22) — Video data processing method, device, equipment and storage medium — status: Pending — published as CN117336524A (en)

Priority / Family Application (1)
Application Number: CN202311237209.5A — Priority Date: 2023-09-22 — Filing Date: 2023-09-22 — Title: Video data processing method, device, equipment and storage medium

Publication (1)
Publication Number: CN117336524A — Publication Date: 2024-01-02

Family ID: 89282273

Country Status (1)
CN: CN117336524A (en) — Pending
Legal Events

PB01 — Publication