CN114844889B - Video processing model updating method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114844889B
CN114844889B (application CN202210414026.5A)
Authority
CN
China
Prior art keywords
information
model
video processing
determining
processing model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210414026.5A
Other languages
Chinese (zh)
Other versions
CN114844889A (en)
Inventor
刘吉
章红
周吉文
贾俊铖
窦德景
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210414026.5A
Publication of CN114844889A
Application granted
Publication of CN114844889B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Analysis (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Algebra (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure provides a method, an apparatus, an electronic device, and a storage medium for updating a video processing model, relating to the technical field of artificial intelligence and in particular to the technical fields of deep learning, computer vision, and the like. The method comprises the following steps: acquiring a video processing model to be updated; determining a plurality of pieces of heterogeneous degree information respectively corresponding to a plurality of edge devices; and updating the video processing model to be updated according to shared video data and the heterogeneous degree information to obtain a target video processing model. Because the initial video processing model is optimized by combining the shared sample video with the heterogeneous degree information, the influence of distribution differences in the video data on model performance can be reduced, the training convergence efficiency of the video processing model is effectively improved, and the performance of the obtained target video processing model is improved.

Description

Video processing model updating method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, computer vision and the like, and particularly relates to a method and a device for updating a video processing model, electronic equipment and a storage medium.
Background
Artificial intelligence is the discipline that studies how to make a computer mimic certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
In the related art, a video processing model is generally trained based on federated machine learning: each edge device trains locally, without its data leaving the device, to obtain a model to be aggregated and reports that model to a server; the server aggregates the plurality of reported models and distributes the aggregated model back to the plurality of edge devices, thereby obtaining the video processing model.
Disclosure of Invention
The present disclosure provides a method, apparatus, electronic device, storage medium, and computer program product for updating a video processing model.
According to a first aspect of the present disclosure, there is provided a method for updating a video processing model, including: acquiring a video processing model to be updated, wherein the video processing model to be updated is obtained by aggregating a plurality of models to be aggregated, which are respectively reported by a plurality of edge devices; determining a plurality of pieces of heterogeneous degree information respectively corresponding to the plurality of edge devices, wherein the heterogeneous degree information describes the distribution difference degree between local video data and shared video data in the corresponding edge devices; and updating the video processing model to be updated according to the shared video data and the heterogeneous degree information to obtain a target video processing model.
According to a second aspect of the present disclosure, there is provided an apparatus for updating a video processing model, including: an acquisition module configured to acquire a video processing model to be updated, wherein the video processing model to be updated is obtained by aggregating a plurality of models to be aggregated respectively reported by a plurality of edge devices; a determining module configured to determine a plurality of pieces of heterogeneous degree information respectively corresponding to the plurality of edge devices, wherein the heterogeneous degree information describes the degree of distribution difference between local video data and shared video data in the corresponding edge device; and an updating module configured to update the video processing model to be updated according to the shared video data and the heterogeneous degree information to obtain a target video processing model.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of updating a video processing model as in the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of updating a video processing model as in the first aspect of the present disclosure.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of updating a video processing model as in the first aspect of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a model training flow in accordance with an embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure;
fig. 7 shows a schematic block diagram of an example electronic device for implementing the method of updating a video processing model of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure.
It should be noted that, the execution body of the method for updating a video processing model in this embodiment is an apparatus for updating a video processing model, and the apparatus may be implemented in a software and/or hardware manner, and the apparatus may be configured in an electronic device, where the electronic device may include, but is not limited to, a terminal, a server, and the like.
The embodiment of the disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of deep learning, computer vision and the like.
Artificial intelligence (AI) is a new technical science that researches and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence.
Deep learning learns the inherent regularities and hierarchical representations of sample data, and the information obtained during such learning is helpful in interpreting data such as text, images, and sounds. The ultimate goal of deep learning is to enable machines to analyze and learn like humans, and to recognize data such as text, images, and sounds.
Computer vision means using cameras and computers instead of human eyes to perform machine vision tasks such as recognition, tracking, and measurement on targets, and to perform further graphic processing, so that the computer produces images more suitable for human observation or for transmission to instruments for detection.
Federated machine learning (also known as federated learning or joint learning) defines a machine learning framework under which data owners with disparate data collaborate, without exchanging that data, by jointly building a virtual model. The virtual model is the optimal model that would result from aggregating the parties' data together, and each party serves its local objective according to the model. The result of this modeling should be infinitely close to the traditional result, i.e., the result of pooling the data of multiple data owners into one place for training. Under the federated mechanism, the participants have the same identity and status, and a shared data policy can be established. Because the data is not transferred, user privacy is not leaked and data regulations are not violated; thus the data privacy of users can be protected and legal compliance requirements can be met.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user's personal information comply with the provisions of relevant laws and regulations, and do not violate public order and good morals.
As shown in fig. 1, the method for updating the video processing model includes:
s101: and acquiring a video processing model to be updated, wherein the video processing model to be updated is obtained by aggregating a plurality of models to be aggregated, which are respectively reported by a plurality of edge devices.
The video processing model may be used to process image data and/or speech data in each video frame. The video processing model to be updated may be a video processing model to be updated currently, the video processing model to be updated may be obtained by aggregating a plurality of video processing models to be aggregated respectively reported by a plurality of edge devices, and the video processing model to be updated may be stored in a server.
The model to be aggregated may refer to a video processing model obtained by training the edge device in combination with local data of the edge device.
The edge device is an electronic device, such as a terminal, a computer, etc., that can establish a communication link with a server, receive and upload a model, and perform model training by using local data in the model training process, which is not limited.
In the embodiment of the disclosure, before performing model training, an initial model suitable for an application scene may be predetermined, and then issued to a plurality of edge devices by a server, and the edge devices combine with local data to perform training to obtain models to be aggregated corresponding to the plurality of edge devices.
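The issue-train-report-aggregate cycle described in this section can be sketched as a minimal simulation. This is an illustrative sketch only: a linear model with squared loss stands in for the video processing model, and all function names, device counts, and data are hypothetical, not from the patent.

```python
import numpy as np

def local_train(weights, local_data, lr=0.1, epochs=1):
    """Simulate local training on one edge device: SGD passes on a
    linear model with squared loss (illustrative stand-in for the
    video processing model)."""
    w = weights.copy()
    for _ in range(epochs):
        for x, y in local_data:
            grad = 2 * (w @ x - y) * x  # gradient of (w.x - y)^2 w.r.t. w
            w -= lr * grad
    return w

def aggregate(models, sample_counts):
    """Server-side aggregation: sample-count-weighted average of the
    models to be aggregated reported by the edge devices."""
    total = sum(sample_counts)
    return sum(n / total * w for w, n in zip(models, sample_counts))

# One communication round with 3 simulated edge devices.
rng = np.random.default_rng(0)
global_w = np.zeros(4)  # initial model issued by the server
devices = [[(rng.normal(size=4), rng.normal()) for _ in range(10)]
           for _ in range(3)]
reported = [local_train(global_w, d) for d in devices]  # local training
global_w = aggregate(reported, [len(d) for d in devices])  # aggregation
```

In a real deployment the local data never leaves each device; only the trained parameters (`reported`) travel to the server.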
S102: and determining a plurality of pieces of heterogeneous degree information respectively corresponding to the plurality of edge devices, wherein the heterogeneous degree information describes the distribution difference degree between the local video data and the shared video data in the corresponding edge devices.
The local video data may refer to local data corresponding to a plurality of edge devices. In the embodiment of the disclosure, the edge device may train each round of models issued by the server by using the local video data to obtain corresponding models to be aggregated.
Video data may refer to the picture data and/or voice data corresponding to each video frame. Shared video data may refer to video data held on the server; it may be a small portion of video data that each edge device has authorized in advance for sharing, or it may be obtained from any other possible third-party platform, which is not limited.
In the embodiment of the present disclosure, because features such as the type, version, and structure of the local video data and the shared video data may differ, determining the plurality of pieces of heterogeneous degree information respectively corresponding to the plurality of edge devices provides a reliable reference for the subsequent updating of the video processing model to be updated, and can effectively reduce the influence of the distribution differences between the local video data and the shared video data on model performance, thereby effectively improving the model training effect.
For example, in the embodiment of the present disclosure, determining the plurality of pieces of heterogeneous degree information respectively corresponding to the plurality of edge devices may involve determining attribute information of the local video data and of the shared video data, analyzing and comparing the attribute information of the two, and determining the plurality of pieces of heterogeneous degree information according to the comparison result.
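The attribute comparison just described can be sketched for the case where the compared attribute is the label distribution of the video data. The label names, counts, and helper names below are hypothetical, chosen only for illustration:

```python
from collections import Counter

def label_distribution(labels, label_space):
    """Empirical distribution P(y) of the labels in one data set,
    smoothed slightly so every label in the space has nonzero mass."""
    counts = Counter(labels)
    eps = 1e-8
    total = len(labels) + eps * len(label_space)
    return {y: (counts.get(y, 0) + eps) / total for y in label_space}

# Hypothetical label sets for one edge device and the server's shared data.
label_space = ["person", "vehicle", "animal"]
local_labels = ["person"] * 80 + ["vehicle"] * 15 + ["animal"] * 5
shared_labels = ["person"] * 40 + ["vehicle"] * 35 + ["animal"] * 25

p_local = label_distribution(local_labels, label_space)
p_shared = label_distribution(shared_labels, label_space)
# A large gap between p_local and p_shared signals high heterogeneity for
# this device; the disclosure quantifies such gaps with the JS divergence.
```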
S103: and updating the video processing model to be updated according to the shared video data and the heterogeneous degree information to obtain a target video processing model.
The target video processing model may be the video processing model obtained after the video processing model to be updated has been updated using the shared video data and the heterogeneous degree information.
In some embodiments, the video processing model to be updated is updated according to the shared video data and the heterogeneous degree information to obtain the target video processing model, which may be that update information corresponding to the shared video data and the heterogeneous degree information is determined by adopting a mathematical function calculation method, and then the video processing model to be updated is updated based on the update information to obtain the target video processing model.
Alternatively, the video processing model to be updated may be updated according to the shared video data and the heterogeneous degree information in any other feasible manner, such as an engineering method or a numerical combination method, to obtain the target video processing model, which is not limited herein.
That is, in the embodiment of the present disclosure, based on the shared video data local to the server side and the obtained degree of distribution difference between the local video data in each edge device and the shared video data, the model to be aggregated is optimized locally on the server before being issued to the edge devices, without distributing the shared video data to each edge device. By combining the centralized learning mode with the edge-device learning mode in this way, the training convergence efficiency of the video processing model can be effectively improved.
In this embodiment, a video processing model to be updated is acquired, the video processing model to be updated being obtained by aggregating a plurality of models to be aggregated respectively reported by a plurality of edge devices; a plurality of pieces of heterogeneous degree information respectively corresponding to the plurality of edge devices is determined, the heterogeneous degree information describing the degree of distribution difference between the local video data and the shared video data in the corresponding edge device; and the video processing model to be updated is then updated according to the shared video data and the heterogeneous degree information to obtain the target video processing model.
Fig. 2 is a schematic diagram according to a second embodiment of the present disclosure.
As shown in fig. 2, the method for updating the video processing model includes:
s201: and acquiring a video processing model to be updated, wherein the video processing model to be updated is obtained by aggregating a plurality of models to be aggregated, which are respectively reported by a plurality of edge devices.
S202: and determining a plurality of pieces of heterogeneous degree information respectively corresponding to the plurality of edge devices, wherein the heterogeneous degree information describes the distribution difference degree between the local video data and the shared video data in the corresponding edge devices.
The descriptions of S201 to S202 may be specifically referred to the above embodiments, and are not repeated herein.
S203: and determining model performance parameters of the video processing model to be updated.
The model performance parameter may refer to a parameter representing the working performance of the video processing model to be updated, such as the working efficiency and/or accuracy of the video processing model to be updated, and the like, which is not limited.
In some embodiments, determining the model performance parameter of the video processing model to be updated may be inputting the shared video data into the video processing model to be updated to obtain the data processing speed of the video processing model to be updated, and then taking the data processing speed as the model performance parameter.
In other embodiments, the determining the model performance parameter of the video processing model to be updated may also be performed by using a third party model performance evaluation device to evaluate the performance of the video processing model to be updated, and transmitting the evaluation result as the model performance parameter to the execution body of the embodiment of the present disclosure, which is not limited.
Optionally, in some embodiments, determining the model performance parameter of the video processing model to be updated may further include: obtaining labeling result information corresponding to the shared video data; inputting the shared video data into the video processing model to be updated to obtain processing result information output by that model; determining an accuracy parameter of the processing result information relative to the labeling result information; and taking the accuracy parameter as the model performance parameter. Since the accuracy parameter is obtained by comparing and analyzing the processing result information against the labeling result information, it can effectively characterize the output accuracy of the video processing model to be updated; taking the accuracy parameter as the model performance parameter quantizes the current model performance and effectively improves how accurately the obtained model performance parameter characterizes the current model performance.
The labeling result information may refer to labeling information related to a model processing result determined in advance based on the shared video data, and the labeling result information may be used as reference information for determining the output accuracy of the model subsequently.
The accuracy parameter may be a parameter determined based on the labeling result information and the processing result information and used for representing the output accuracy of the video processing model to be updated currently.
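The accuracy parameter just described can be sketched as follows. The labels and the exact-match criterion are hypothetical simplifications of comparing processing result information with labeling result information:

```python
def accuracy_parameter(processing_results, labeling_results):
    """Fraction of shared-video samples whose model output matches the
    pre-annotated labeling result; used as the model performance parameter."""
    if not labeling_results:
        raise ValueError("no labeled shared video data available")
    correct = sum(r == a for r, a in zip(processing_results, labeling_results))
    return correct / len(labeling_results)

# Hypothetical outputs of the to-be-updated model on 5 shared samples.
outputs = ["person", "vehicle", "person", "animal", "person"]
labels = ["person", "vehicle", "animal", "animal", "person"]
acc = accuracy_parameter(outputs, labels)  # 4 of 5 match -> 0.8
```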
In the embodiment of the present disclosure, by determining the model performance parameters of the video processing model to be updated, the model performance can be evaluated in real time, so that a video processing model meeting the preset conditions can be identified in time during the server's model updating process, effectively improving the performance reliability of the obtained target video processing model.
For example, the method for updating the video processing model in the embodiment of the present disclosure may be applied to a federated learning scenario comprising 1 server and N edge devices, where the N edge devices each train the initial model issued by the server using their locally stored local video data, and the server aggregates the model parameters uploaded by the edge devices after training and issues the aggregated model parameters to the N edge devices so that the edge devices perform a new round of model training. Assume that the local video data present locally on edge device k is denoted $\mathcal{D}_k$, expressed as follows:

$$\mathcal{D}_k = \{(x_{k,j}, y_{k,j})\}_{j=1}^{D_k}$$

where $D_k$ represents the data volume of the local video data $\mathcal{D}_k$, $x_{k,j}$ represents the j-th input data of edge device k, and $y_{k,j}$ represents the label of $x_{k,j}$. The set of the plurality of local video data may be expressed as $\mathcal{D} = \bigcup_{k=1}^{N} \mathcal{D}_k$, and the total number of samples as $D = \sum_{k=1}^{N} D_k$.

The objective of model training is to find the model parameters w that minimize the loss function over the overall video data, so as to ensure the output reliability of the subsequently obtained target video processing model across multiple uses. The optimal solution of the loss function can be expressed as follows:

$$w^{*} = \arg\min_{w} F(w), \qquad F(w) = \sum_{k=1}^{N} \frac{D_k}{D} F_k(w)$$

where F(w) is the loss function of the parameters w over the overall video data and $w^{*}$ is the minimizing solution of F(w), i.e., the optimal solution. $F_k(w)$ is the local loss function in edge device k, whose expression is as follows:

$$F_k(w) = \frac{1}{D_k} \sum_{j=1}^{D_k} f(w, x_{k,j}, y_{k,j})$$

where the function $f(w, x_{k,j}, y_{k,j})$ can be used to measure the error of the model parameters w on a single data pair $\{x_{k,j}, y_{k,j}\}$.

Since in federated learning the data is generally non-independent and non-identically distributed, the Jensen-Shannon divergence (JS divergence) can be used to characterize the degree of distribution difference between the data on a device and the data on the server. Its expression $D(P_k)$ is as follows:

$$D(P_k) = \frac{1}{2} D_{KL}\big(P_k \,\|\, \bar{P}\big) + \frac{1}{2} D_{KL}\big(P_0 \,\|\, \bar{P}\big), \qquad \bar{P} = \frac{P_k + P_0}{2}$$

where $P_k = \{P_k(y) \mid y \in Y\}$, Y represents the set of labels y, and $P_k(y)$ denotes the probability of the label y on device k, with k = 0 denoting the server. $D_{KL}(P_i \,\|\, P_j)$ is the Kullback-Leibler divergence (KL divergence), expressed as follows:

$$D_{KL}(P_i \,\|\, P_j) = \sum_{y \in Y} P_i(y) \log \frac{P_i(y)}{P_j(y)}$$

where $P_i$ and $P_j$ are two label distributions over Y, $P_i(y)$ representing the probability of the label y under the distribution $P_i$, and $P_j(y)$ its probability under the distribution $P_j$.

Thus, $D_{KL}(P_i \,\|\, P_j)$ represents the degree of distribution difference between $P_i$ and $P_j$: when the value of this relative entropy is small, $P_i$ and $P_j$ are relatively close, while a higher degree of distribution difference indicates a larger gap between the data distribution of a device (or the server) and that of the shared video data, which may affect model performance.
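The JS and KL divergences described in this section can be computed directly from the label distributions. The following sketch uses hypothetical three-class distributions; the function names are illustrative:

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(P || Q) over a shared label space."""
    return sum(p[y] * math.log(p[y] / q[y]) for y in p if p[y] > 0)

def js_divergence(p_k, p_0):
    """Jensen-Shannon divergence between the label distribution P_k of
    edge device k and the server distribution P_0, using the mixture
    distribution (P_k + P_0) / 2 as the reference."""
    m = {y: 0.5 * (p_k[y] + p_0[y]) for y in p_k}
    return 0.5 * kl_divergence(p_k, m) + 0.5 * kl_divergence(p_0, m)

# Hypothetical label distributions over three classes.
p_k = {"a": 0.8, "b": 0.15, "c": 0.05}  # skewed local distribution
p_0 = {"a": 0.4, "b": 0.35, "c": 0.25}  # server's shared-data distribution
heterogeneity = js_divergence(p_k, p_0)  # bounded in [0, ln 2]
```

Unlike the KL divergence, the JS divergence is symmetric and always finite, which makes it a convenient heterogeneity score even when one distribution assigns a label zero probability.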
S204: and determining model updating information according to the shared video data, the heterogeneous degree information and the model performance parameters.
The model update information is information, determined based on the shared video data, the heterogeneous degree information, and the model performance parameter, that guides the update process of the video processing model to be updated.
In some embodiments, determining the model update information according to the shared video data, the heterogeneous degree information, and the model performance parameter may be inputting the shared video data, the heterogeneous degree information, and the model performance parameter into a pre-trained machine learning model to determine corresponding model update information, and transmitting the model update information to an execution subject of the embodiments of the present disclosure.
Alternatively, a big data analysis method may be adopted in advance to obtain a functional relationship between the shared video data, the heterogeneous degree information, and the model performance parameter on one side and the model update information on the other, and the model update information may then be determined from the shared video data, the heterogeneous degree information, and the model performance parameter based on that functional relationship, which is not limited herein.
Optionally, in some embodiments, determining the model update information according to the shared video data, the heterogeneous degree information, and the model performance parameter may include: determining a target iteration number according to the shared video data, the heterogeneous degree information, and the model performance parameter; determining stochastic gradient information corresponding to the server; determining a target normalized gradient according to the target iteration number and the stochastic gradient information; and using the target iteration number and the target normalized gradient together as the model update information. The obtained target iteration number effectively characterizes the preferred number of server-side updates during model training. Because the stochastic gradient information is highly relevant to the model update process, and normalization can effectively reduce the feature difference between the target iteration number and the stochastic gradient information, using the target normalized gradient together with the target iteration number as the model update information can effectively improve the reliability of the model update information and the convergence of the model during training.
The target iteration number may refer to the effective number of server-side iterative updates determined based on the shared video data, the heterogeneous degree information, and the model performance parameter, i.e., the number of updates after which the server can reach the best model training effect.
It will be appreciated that during model training, a loss function is typically introduced to measure the gap between the model output and the actual results. In the embodiment of the disclosure, a gradient descent algorithm can be adopted to minimize the loss function, ensure the model convergence effect, and improve the output accuracy of the trained model.
A gradient may refer to the slope of a function curve, i.e., the rate of change of the function, and may be used to determine the extrema of the function. A random (stochastic) gradient may refer to the gradient of the loss function in the server at any point of the model update process. The random gradient information may be used to describe information related to the random gradient, such as the gradient value corresponding to the random gradient or its trend of change, which is not limited.
The target normalized gradient may be a gradient value obtained by processing the target iteration number and the random gradient information with a normalization calculation.
For example, let the video processing model to be updated (the aggregated model of the t-th round) be $\bar{w}_t$. Its calculation formula can be expressed as follows:

$$\bar{w}_t = w_t - \eta \sum_{k \in S_t} \frac{n_k}{n^{(t)}} g_k$$

wherein $n^{(t)} = \sum_{k \in S_t} n_k$ is the total number of local video data on the edge devices selected during the t-th round of model training, η refers to the learning rate during model training, and $g_k$ refers to the gradient calculated based on the local video data in the edge device k.
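The weighted aggregation above can be sketched as follows; the function name and the NumPy representation of models and gradients are illustrative, not the patent's implementation:

```python
import numpy as np

def aggregate_gradients(local_gradients, local_data_counts, lr, w_t):
    """Weight each edge device's gradient by its share of the round's
    local data, then apply one aggregated descent step (sketch)."""
    n_total = sum(local_data_counts)  # total local data across selected devices
    agg = sum((n_k / n_total) * g_k
              for g_k, n_k in zip(local_gradients, local_data_counts))
    return w_t - lr * agg

# two edge devices: one with 300 local samples, one with 100
w = np.array([1.0, 2.0])
grads = [np.array([0.4, 0.0]), np.array([0.0, 0.8])]
updated = aggregate_gradients(grads, [300, 100], lr=0.1, w_t=w)  # → [0.97, 1.98]
```

The device with more local data (weight 0.75) dominates the aggregated step, which is exactly the behavior the weighted sum encodes.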
In an application scenario, the amount of shared video data in the server may be far greater than the amount of local video data on a single edge device, which can cause target inconsistency when the shared video data is used to perform stochastic-gradient-descent updates. In the embodiment of the present disclosure, the calculated gradient may therefore be normalized, and the resulting model update expression may be as follows:

$$w_{t+1} = \bar{w}_t - \eta \, \tau_t \, \tilde{g}_t$$

wherein $w_t$ represents the initial video processing model of the t-th round, $\tau_t$ refers to the target iteration number of the server, and $\tilde{g}_t$ refers to the normalized gradient corresponding to the server in the t-th round. The initial video processing model $w_t$ is transmitted by the server to the plurality of selected edge devices during the t-th round of model training, and the edge devices train it on their local video data to obtain the video processing models to be aggregated.
Wherein the computational expression of the normalized gradient $\tilde{g}_t$ may be as follows:

$$\tilde{g}_t = \frac{1}{\tau_0} \sum_{i=0}^{\tau_0 - 1} g(w_{t,i}), \qquad \tau_0 = \frac{E \cdot n_0}{B}$$

wherein $w_{t,i}$ is the model parameter obtained after the server performs the i-th batch iteration on the video processing model to be updated in the t-th round, $g(w_{t,i})$ is the corresponding random gradient on the server, and $\tau_0$ is the initial iteration number of the server update, where $n_0$ represents the amount of shared video data in the server, E represents the number of rounds of local model training, and B represents the batch size (i.e., the amount of data used in one model update). During each round of training, the shared video data may be used to iteratively update the video processing model to be updated multiple times.
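The batch loop and the 1/τ₀ averaging can be sketched as follows; the quadratic loss, the helper names, and the batch container are assumptions used only to make the sketch runnable:

```python
import numpy as np

def normalized_server_gradient(w, shared_batches, grad_fn, lr):
    """Run one server-side SGD pass over the shared-data batches and
    return the normalized gradient: the average of the per-batch
    stochastic gradients (a sketch of the formula above)."""
    grads = []
    for batch in shared_batches:     # tau_0 = E * n_0 / B iterations in total
        g = grad_fn(w, batch)        # stochastic gradient g(w_{t,i})
        grads.append(g)
        w = w - lr * g               # server-side SGD step
    return np.mean(grads, axis=0), w # dividing by tau_0 via the mean

# illustrative quadratic loss 0.5 * ||w - mean(batch)||^2
grad_fn = lambda w, b: w - np.mean(b, axis=0)
batches = [np.array([[0.0, 0.0]]), np.array([[2.0, 2.0]])]
g_norm, w_out = normalized_server_gradient(np.array([1.0, 1.0]),
                                           batches, grad_fn, lr=0.5)
# g_norm → [-0.25, -0.25], w_out → [1.25, 1.25]
```

Averaging over τ₀ keeps the magnitude of the server's contribution comparable to a single edge-device gradient, regardless of how many batches the shared data supplies.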
Wherein the calculation formula of the target iteration number $\tau_t$ may be as follows:

$$\tau_t = C \cdot f'(acc_t) \cdot \frac{D(P^{(t)})}{D(P_0)} \cdot decay^{t} \cdot \tau_0$$

wherein $acc_t$ is the accuracy of the aggregated model $\bar{w}_t$ of the t-th round on the server data, $n_0$ refers to the amount of shared video data, $P^{(t)}$ represents the overall data distribution of the edge devices selected in the t-th round, $D(P^{(t)})$ denotes the degree of difference of the distribution $P^{(t)}$, $P_k$ represents the local video data distribution on edge device k, $P_0$ represents the shared video data distribution, $D(P_0)$ represents the degree of difference of the distribution $P_0$, $decay \in (0, 1)$ is a decay factor that reduces the influence of the server term over the course of model training so as to ensure the final convergence of the model, C is a hyper-parameter, and $f'(acc)$ is a function of acc, for which $f'(acc) = 1 - acc$ may be used. In the initial stage of model training, the value of acc is small and the corresponding value of $f'(acc)$ is large, so the model is updated using more of the server's data and computing power. In the later stages of model training, the value of $f'(acc)$ decreases, reducing the impact of the server data.
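Since the exact formula for the target iteration number survives only as an image in the source, the sketch below is one plausible composition of the quantities the text names — f'(acc) = 1 − acc, the distribution-difference ratio, the decay factor, the hyper-parameter C, and the initial iteration number — and should be read as an assumption, not the patent's formula:

```python
def target_iterations(acc_t, d_pt, d_p0, tau_0, C=1.0, decay=0.9, t=0):
    """Plausible sketch: more server iterations when accuracy is low
    and when the selected devices' data are more heterogeneous; the
    decay^t factor shrinks the server's influence in later rounds."""
    f_prime = 1.0 - acc_t  # f'(acc) = 1 - acc, per the text
    return max(1, round(C * f_prime * (d_pt / d_p0) * (decay ** t) * tau_0))

# early round, low accuracy, heterogeneous devices: many server updates
early = target_iterations(acc_t=0.5, d_pt=0.4, d_p0=0.2, tau_0=10)   # → 10
# late round, high accuracy: server influence decays toward the minimum
late = target_iterations(acc_t=0.95, d_pt=0.2, d_p0=0.2, tau_0=10, t=5)  # → 1
```

The qualitative behavior matches the text: large f'(acc) early pulls in more server data and compute, and the decay term guarantees the server's contribution eventually vanishes.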
In the embodiment of the disclosure, adopting this normalization method can effectively improve the parameter update rate, reduce the time for model training to converge, and effectively reduce the influence of differences in scale between quantities on the model.
S205: and updating the video processing model to be updated according to the shared video data and the model updating information to obtain a target video processing model.
In some embodiments, updating the video processing model to be updated according to the shared video data and the model update information to obtain the target video processing model may include: performing update processing on the video processing model to be updated based on the model update information to obtain an updated video processing model, and then training the updated video processing model based on the shared video data to obtain the target video processing model. Alternatively, any other possible method may be adopted to update the video processing model to be updated according to the shared video data and the model update information, which is not limited.
Optionally, in some embodiments, updating the video processing model to be updated according to the shared video data and the model update information to obtain the target video processing model may include: training the video processing model to be updated for the target number of iterations on the shared video data to obtain an intermediate video processing model, and then performing reverse iterative training on the intermediate video processing model according to the target normalized gradient to obtain the target video processing model.
The intermediate video processing model may refer to the video processing model obtained by training the video processing model to be updated for the target number of iterations on the shared video data.
The reverse iterative training may refer to a training process, performed with a model reverse iterator, that traverses the data container in reverse: the model traverses the container from the last element to the first, thereby achieving the training purpose.
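Read literally, the reverse traversal described above might be sketched as follows; the gradient function and the batch container are hypothetical, since the patent does not specify the exact procedure:

```python
def reverse_iterative_training(w, batches, grad_fn, lr):
    """Traverse the batch container from the last element to the first,
    applying one gradient step per batch (a sketch of the 'reverse
    iterative training' described above)."""
    for batch in reversed(batches):
        w = w - lr * grad_fn(w, batch)
    return w

# illustrative 1-D quadratic loss per batch: 0.5 * (w - batch)^2
w_final = reverse_iterative_training(0.0, [1.0, 3.0],
                                     grad_fn=lambda w, b: w - b, lr=1.0)
# the last batch (3.0) is visited first, then 1.0 → w_final = 1.0
```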
In the embodiment of the disclosure, the model performance parameters of the video processing model to be updated are determined, the model updating information is determined according to the shared video data, the heterogeneous degree information and the model performance parameters, and then the video processing model to be updated is updated according to the shared video data and the model updating information to obtain the target video processing model.
In this embodiment, the video processing model to be updated is obtained and the heterogeneous degree information corresponding to the plurality of edge devices is determined. The model performance parameters of the video processing model to be updated are then determined, the model update information is determined according to the shared video data, the heterogeneous degree information, and the model performance parameters, and the video processing model to be updated is updated according to the shared video data and the model update information to obtain the target video processing model.
Fig. 3 is a schematic diagram according to a second embodiment of the present disclosure.
As shown in fig. 3, the method for updating the video processing model includes:
s301: and acquiring a video processing model to be updated, wherein the video processing model to be updated is obtained by aggregating a plurality of models to be aggregated, which are respectively reported by a plurality of edge devices.
S302: and determining a plurality of pieces of heterogeneous degree information respectively corresponding to the plurality of edge devices, wherein the heterogeneous degree information describes the distribution difference degree between the local video data and the shared video data in the corresponding edge devices.
S303: and determining model performance parameters of the video processing model to be updated.
The descriptions of S301 to S303 may be specifically referred to the above embodiments, and are not repeated herein.
S304: and determining the initial iteration times of the video processing model to be updated.
The initial iteration number may refer to the actual number of iterative updates of the video processing model to be updated in the server at the current point in time.
In the embodiment of the present disclosure, determining the initial iteration number of the video processing model to be updated may involve pre-configuring a counting module on the server side to count the number of update iterations of the video processing model to be updated in the server and to transmit that count, as the initial iteration number, to the execution body of the embodiment of the present disclosure. This provides a reliable reference basis for the subsequent determination of the target iteration number.
S305: and determining iteration number adjustment information according to the shared video data, the heterogeneous degree information, and the model performance parameters.
The iteration number adjustment information may be used to describe the information by which the initial iteration number is adjusted to obtain the target iteration number.
In some embodiments, determining the iteration number adjustment information according to the shared video data, the heterogeneous degree information, and the model performance parameter may include: determining corresponding iteration adjustment parameters from the shared video data, the heterogeneous degree information, and the model performance parameter respectively, then combining the obtained iteration adjustment parameters and using the processing result as the iteration number adjustment information.
Alternatively, in some embodiments, an engineering method may be used to determine the iteration number adjustment information according to the shared video data, the heterogeneous degree information, and the model performance parameter, which is not limited.
Optionally, in some embodiments, determining the iteration number adjustment information according to the shared video data, the heterogeneous degree information, and the model performance parameter may include: determining data amount information of the shared video data; determining, according to the heterogeneous degree information, first data distribution information of the selected edge devices in the current round of iterative updating; determining second data distribution information of the server according to the heterogeneous degree information; and determining the iteration number adjustment information according to the data amount information, the first data distribution information, the second data distribution information, and the model performance parameter. A plurality of relevant dimensional features of the selected edge devices and the server are thereby effectively combined in determining the iteration number adjustment information, effectively improving the suitability and accuracy of the obtained iteration number adjustment information across different application environments.
Wherein the data amount information may be used to describe information about the amount of shared video data in the server, such as a value of the amount of shared video data.
Data distribution may refer to the proportion of each type of data in an edge device or server. For example, if the server contains only two types of pictures, A and B, with 400 type-A pictures and 100 type-B pictures, the data distribution in the server can be determined to be (0.8, 0.2). The data distribution information may refer to related information describing such a distribution condition. The first data distribution information refers to the data distribution information of the edge devices selected for the current round of iterative updating, and can be determined from the total amount of data the selected edge devices upload to the server and their respective data distributions. The second data distribution information may refer to the data distribution information in the server.
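The (0.8, 0.2) example above can be reproduced directly from label counts; the helper name is illustrative:

```python
from collections import Counter

def data_distribution(labels, classes):
    """Empirical class distribution of a dataset: the share of each
    class, matching the example of 400 type-A pictures and 100 type-B
    pictures giving (0.8, 0.2)."""
    counts = Counter(labels)
    total = len(labels)
    return tuple(counts[c] / total for c in classes)

dist = data_distribution(["A"] * 400 + ["B"] * 100, classes=["A", "B"])
# → (0.8, 0.2)
```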
In some embodiments, determining the first data distribution information of the selected edge devices in the current round of iterative updating may involve inputting the plurality of heterogeneous degree information items into a pre-trained data distribution information determination model to obtain the first data distribution information corresponding to the selected edge devices, which is then transmitted to the execution body of the embodiment of the present disclosure.
Alternatively, any other possible method may be used to determine the first data distribution information of the selected edge device in the iterative updating process according to the multiple heterogeneous degree information, which is not limited.
Optionally, in some embodiments, determining the first data distribution information of the selected edge devices in the current round of iterative updating according to the multiple heterogeneous degree information may include: determining target heterogeneous degree information corresponding to the selected edge devices, determining the data distribution information of the selected edge devices according to the target heterogeneous degree information, and then taking that data distribution information as the first data distribution information.
The target heterogeneous degree information refers to heterogeneous degree information corresponding to the selected edge device.
In the embodiment of the disclosure, by determining the iteration number adjustment information according to the shared video data, the heterogeneous degree information and the model performance parameter, a reliable execution strategy can be provided for the adjustment process of the subsequent initial iteration number, so that the accuracy of the obtained target iteration number is effectively improved.
S306: and adjusting the initial iteration times according to the iteration times adjustment information to obtain the target iteration times.
In the embodiment of the disclosure, adjusting the initial iteration number according to the iteration number adjustment information to obtain the target iteration number may be done in either of two ways: determine a proportional relationship between the initial and target iteration numbers based on the iteration number adjustment information and scale the initial iteration number accordingly; or determine the number of invalid iterations within the initial iteration number based on the iteration number adjustment information and subtract it from the initial iteration number. Neither approach is limiting.
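The two adjustment strategies just described — proportional scaling and subtracting invalid iterations — might be sketched as follows; the dict keys and the function name are hypothetical:

```python
def adjust_iterations(initial_iters, adjustment):
    """Adjust the initial iteration count per the iteration number
    adjustment information: either scale by a proportional factor or
    subtract a number of invalid iterations (sketch)."""
    if "ratio" in adjustment:
        # proportional relationship between initial and target counts
        return max(1, round(initial_iters * adjustment["ratio"]))
    # otherwise subtract the invalid iterations from the initial count
    return max(1, initial_iters - adjustment.get("invalid", 0))

scaled = adjust_iterations(10, {"ratio": 0.5})    # → 5
trimmed = adjust_iterations(10, {"invalid": 3})   # → 7
```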
In the embodiment of the disclosure, the initial iteration number of the video processing model to be updated is determined, the iteration number adjustment information is determined according to the shared video data, the heterogeneous degree information, and the model performance parameter, and the initial iteration number is adjusted according to the iteration number adjustment information to obtain the target iteration number. Because the target iteration number is highly related to the current initial iteration number of the video processing model to be updated in the server, the iteration number adjustment information determined from the shared video data, the heterogeneous degree information, and the model performance parameter can accurately characterize the difference between the initial and target iteration numbers, so adjusting the initial iteration number according to that information effectively improves the accuracy of the obtained target iteration number.
S307: random gradient information corresponding to the server is determined.
S308: and determining a target normalized gradient according to the target iteration times and the random gradient information.
S309: and jointly using the target iteration times and the target normalized gradient as model updating information.
S310: and updating the video processing model to be updated according to the shared video data and the model updating information to obtain a target video processing model.
The descriptions of S307 to S310 may be specifically referred to the above embodiments, and are not repeated herein.
For example, fig. 4 is a schematic diagram of a model training flow according to an embodiment of the present disclosure. As shown, the entire model training process may consist of multiple rounds of sub-training, each of which may include: (1) the server randomly selects N edge devices from the plurality of edge devices to participate in training and transmits a predetermined initial model to the selected edge devices; (2) the selected edge devices perform model training using their local video data; (3) after training, the edge devices upload the respectively obtained models to be aggregated to the server; (4) the server aggregates the plurality of uploaded models to be aggregated to obtain the video processing model to be updated; (5) the video processing model to be updated is updated using the shared data in the server to obtain the target video processing model, which can serve as the initial model for the next round of training.
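Steps (1) through (5) of the round can be sketched as a single loop; the callables stand in for the operations the text describes, and their signatures (and the toy numeric stand-ins) are assumptions:

```python
import random

def training_round(server_model, edge_devices, n_select, local_train,
                   aggregate, server_update):
    """One round of the flow in fig. 4 (sketch)."""
    selected = random.sample(edge_devices, n_select)                 # step (1)
    local_models = [local_train(server_model, d) for d in selected]  # (2)-(3)
    to_update = aggregate(local_models)                              # step (4)
    return server_update(to_update)                                  # step (5)

# toy stand-ins: models and devices are plain numbers; selecting all
# four devices makes the result independent of the random ordering
result = training_round(10.0, [1, 2, 3, 4], 4,
                        local_train=lambda m, d: m + d,
                        aggregate=lambda ms: sum(ms) / len(ms),
                        server_update=lambda m: m * 2)
# local models {11, 12, 13, 14} → average 12.5 → server update → 25.0
```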
In this embodiment, the video processing model to be updated is acquired, having been obtained by aggregating the plurality of models to be aggregated respectively reported by the plurality of edge devices, and the plurality of heterogeneous degree information items corresponding to the edge devices are determined, where the heterogeneous degree information describes the degree of distribution difference between the local video data in the corresponding edge device and the shared video data. The model performance parameters of the video processing model to be updated are determined, along with its initial iteration number. The iteration number adjustment information is then determined according to the shared video data, the heterogeneous degree information, and the model performance parameters, and the initial iteration number is adjusted accordingly to obtain the target iteration number. Because the target iteration number is highly related to the current initial iteration number of the video processing model to be updated in the server, the iteration number adjustment information determined from the shared video data, the heterogeneous degree information, and the model performance parameters can accurately characterize the difference between the initial and target iteration numbers.
Fig. 5 is a schematic diagram according to a fourth embodiment of the present disclosure.
As shown in fig. 5, the updating device 50 of the video processing model includes:
the obtaining module 501 is configured to obtain a video processing model to be updated, where the video processing model to be updated is obtained by aggregating a plurality of models to be aggregated reported by a plurality of edge devices respectively;
a determining module 502, configured to determine a plurality of heterogeneous degree information corresponding to a plurality of edge devices, where the heterogeneous degree information describes a degree of distribution difference between local video data and shared video data in the corresponding edge device; and
and the updating module 503 is configured to update the video processing model to be updated according to the shared video data and the heterogeneous degree information, so as to obtain a target video processing model.
In some embodiments of the present disclosure, as shown in fig. 6, fig. 6 is a schematic diagram of a fifth embodiment of the present disclosure, an updating apparatus 60 of the video processing model, including: the device comprises an acquisition module 601, a determination module 602 and an updating module 603, wherein the updating module 603 comprises:
a first determining submodule 6031 for determining model performance parameters of the video processing model to be updated;
a second determining submodule 6032 for determining model update information according to the shared video data, the heterogeneous degree information and the model performance parameter; and
And the updating submodule 6033 is used for updating the video processing model to be updated according to the shared video data and the model updating information to obtain a target video processing model.
In some embodiments of the present disclosure, wherein the second determining submodule 6032 includes:
a first determining unit 60321, configured to determine a target iteration number according to the shared video data, the heterogeneous degree information, and the model performance parameter;
a second determining unit 60322 for determining random gradient information corresponding to the server;
a third determining unit 60323, configured to determine a target normalized gradient according to the target iteration number and the random gradient information; and
the fourth determining unit 60324 is configured to use the target iteration number and the target normalized gradient together as model update information.
In some embodiments of the present disclosure, the first determining unit 60321 is specifically configured to:
determining the initial iteration times of a video processing model to be updated;
determining iteration number adjustment information according to the shared video data, the heterogeneous degree information and the model performance parameters; and
and adjusting the initial iteration times according to the iteration times adjustment information to obtain the target iteration times.
In some embodiments of the present disclosure, wherein the first determining unit 60321 is further configured to:
Determining data amount information of the shared video data;
according to the heterogeneous degree information, determining first data distribution information of the selected edge equipment in the iterative updating process of the round; and
determining second data distribution information of the server according to the heterogeneous degree information; and
and determining iteration number adjustment information according to the data volume information, the first data distribution information, the second data distribution information and the model performance parameter.
In some embodiments of the present disclosure, wherein the first determining unit 60321 is further configured to:
determining target heterogeneous degree information corresponding to the selected edge equipment;
determining data distribution information of the selected edge equipment according to the target heterogeneous degree information; and
and taking the data distribution information of the selected edge device as first data distribution information.
In some embodiments of the present disclosure, the first determining submodule 6031 is specifically configured to:
obtaining labeling result information corresponding to the shared video data;
inputting the shared video data into a video processing model to be updated to obtain processing result information output by the video processing model to be updated; and
and determining accuracy parameters of the processing result information relative to the labeling result information, and taking the accuracy parameters as model performance parameters.
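The accuracy parameter this submodule determines reduces to a direct comparison of processing results against labeling results; the function name is illustrative:

```python
def model_performance(processing_results, labeling_results):
    """Accuracy of the model's processing results relative to the
    labeling results on the shared video data, used as the model
    performance parameter (sketch)."""
    correct = sum(p == y for p, y in zip(processing_results, labeling_results))
    return correct / len(labeling_results)

acc = model_performance(["cat", "dog", "cat"], ["cat", "dog", "dog"])
# 2 of 3 results match the labels → accuracy 2/3
```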
In some embodiments of the present disclosure, the update sub-module 6033 is specifically configured to:
training the target iteration times of the video processing model to be updated according to the shared video data to obtain an intermediate video processing model;
and performing reverse iterative training on the intermediate video processing model according to the target normalized gradient to obtain a target video processing model.
It will be understood that, in the present embodiment, the video processing model updating apparatus 60 of fig. 6 and the video processing model updating apparatus 50 of the above embodiment may have the same functions and structures: the acquisition module 601 corresponds to the acquisition module 501, the determination module 602 to the determination module 502, and the update module 603 to the update module 503.
The foregoing explanation of the method for updating the video processing model is also applicable to the apparatus for updating the video processing model of the present embodiment.
In this embodiment, the video processing model to be updated is acquired, having been obtained by aggregating the plurality of models to be aggregated respectively reported by the plurality of edge devices, and the plurality of heterogeneous degree information items corresponding to the edge devices are determined, where the heterogeneous degree information describes the degree of distribution difference between the local video data in the corresponding edge device and the shared video data. The video processing model to be updated is then updated according to the shared video data and the heterogeneous degree information to obtain the target video processing model.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 shows a schematic block diagram of an example electronic device for implementing the method of updating a video processing model of an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The calculation unit 701 performs the respective methods and processes described above, for example, an update method of a video processing model. For example, in some embodiments, the method of updating a video processing model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When a computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the above-described method of updating a video processing model may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the method of updating the video processing model in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, and blockchain networks.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (18)

1. A method of updating a video processing model, comprising:
acquiring a video processing model to be updated, wherein the video processing model to be updated is obtained by aggregating a plurality of models to be aggregated that are respectively reported by a plurality of edge devices, the models to be aggregated are obtained by the edge devices training, with local video data, each round of model issued by a server, and the local video data are local data respectively corresponding to the plurality of edge devices;
determining a plurality of pieces of heterogeneous degree information respectively corresponding to the plurality of edge devices, wherein the heterogeneous degree information describes a degree of distribution difference between the local video data in the corresponding edge device and shared video data; and
updating the video processing model to be updated according to the shared video data and the heterogeneous degree information to obtain a target video processing model.
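Read as a federated-learning procedure, claim 1 can be sketched in a few lines of Python. Everything below is illustrative: the function names, the L1 label-histogram distance standing in for the "heterogeneous degree information", and the heterogeneity-scaled correction step are assumptions, since the claim fixes neither the distance metric nor the server-side update rule.

```python
import numpy as np

def heterogeneity(local_hist, shared_hist):
    """Distribution difference between a device's local data and the shared
    data, measured here as total-variation (L1/2) distance between label
    histograms -- one plausible choice; the patent does not fix the metric."""
    p = np.asarray(local_hist, dtype=float)
    q = np.asarray(shared_hist, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return 0.5 * np.abs(p - q).sum()

def update_global_model(aggregated_weights, shared_grad, device_hists,
                        shared_hist, lr=0.1):
    """Fine-tune the aggregated model on the shared data, scaling the step by
    the mean heterogeneity across devices: the more the devices drift from the
    shared distribution, the stronger the server-side correction."""
    h = np.mean([heterogeneity(d, shared_hist) for d in device_hists])
    return aggregated_weights - lr * (1.0 + h) * shared_grad
```

With identical local and shared distributions the heterogeneity is zero and the rule collapses to a plain gradient step on the shared data.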
2. The method of claim 1, wherein the updating the video processing model to be updated according to the shared video data and the heterogeneous degree information to obtain a target video processing model comprises:
determining a model performance parameter of the video processing model to be updated;
determining model update information according to the shared video data, the heterogeneous degree information, and the model performance parameter; and
updating the video processing model to be updated according to the shared video data and the model update information to obtain the target video processing model.
3. The method of claim 2, wherein the determining model update information according to the shared video data, the heterogeneous degree information, and the model performance parameter comprises:
determining target iteration times according to the shared video data, the heterogeneous degree information, and the model performance parameter;
determining random gradient information corresponding to the server;
determining a target normalized gradient according to the target iteration times and the random gradient information; and
taking the target iteration times and the target normalized gradient together as the model update information.
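The "target normalized gradient" of claim 3 is named but not defined. One plausible reading, borrowed from normalized-averaging schemes such as FedNova (an assumption, not the patent's stated formula), divides the accumulated stochastic gradient by the target iteration count so that updates produced with different numbers of steps contribute comparable directions:

```python
import numpy as np

def target_normalized_gradient(stochastic_grads, target_iters):
    """Average the first `target_iters` per-step stochastic gradients.

    Hypothetical normalization: dividing the accumulated gradient by the
    iteration count removes the bias toward runs that took more steps."""
    accumulated = np.sum(np.asarray(stochastic_grads)[:target_iters], axis=0)
    return accumulated / target_iters
```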
4. The method of claim 3, wherein the determining target iteration times according to the shared video data, the heterogeneous degree information, and the model performance parameter comprises:
determining initial iteration times of the video processing model to be updated;
determining iteration number adjustment information according to the shared video data, the heterogeneous degree information, and the model performance parameter; and
adjusting the initial iteration times according to the iteration number adjustment information to obtain the target iteration times.
5. The method of claim 4, wherein the determining iteration number adjustment information according to the shared video data, the heterogeneous degree information, and the model performance parameter comprises:
determining data amount information of the shared video data;
determining first data distribution information of a selected edge device in a current round of iterative updating according to the plurality of pieces of heterogeneous degree information;
determining second data distribution information of the server according to the plurality of pieces of heterogeneous degree information; and
determining the iteration number adjustment information according to the data amount information, the first data distribution information, the second data distribution information, and the model performance parameter.
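Claims 4 and 5 leave the functional form of the adjustment open. A hypothetical rule combining the four named quantities (data amount, the selected device's distribution, the server's distribution, and the performance parameter) might look like the following; the log/gap/accuracy terms are illustrative choices, not the patent's formula:

```python
import math

def iteration_adjustment(shared_size, first_dist, second_dist, accuracy):
    """Illustrative adjustment factor: more shared data and a larger gap
    between the selected device's and the server's data distributions push
    the factor up, while a model already accurate on the shared data
    pushes it down."""
    gap = 0.5 * sum(abs(a - b) for a, b in zip(first_dist, second_dist))
    return math.log1p(shared_size) * (1.0 + gap) * (1.0 - accuracy)

def target_iterations(initial_iters, adjustment):
    """Claim 4's final step: scale the initial iteration count by the
    adjustment information, keeping at least one iteration."""
    return max(1, round(initial_iters * adjustment))
```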
6. The method of claim 5, wherein the determining first data distribution information of the selected edge device in the current round of iterative updating according to the plurality of pieces of heterogeneous degree information comprises:
determining target heterogeneous degree information corresponding to the selected edge device;
determining data distribution information of the selected edge device according to the target heterogeneous degree information; and
taking the data distribution information of the selected edge device as the first data distribution information.
7. The method of claim 2, wherein the determining a model performance parameter of the video processing model to be updated comprises:
obtaining labeling result information corresponding to the shared video data;
inputting the shared video data into the video processing model to be updated to obtain processing result information output by the video processing model to be updated; and
determining an accuracy parameter of the processing result information relative to the labeling result information, and taking the accuracy parameter as the model performance parameter.
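Claim 7 amounts to measuring accuracy on the shared, labeled videos. A minimal sketch, with `model`, `shared_videos`, and `labels` as illustrative stand-ins for the video model, the shared video data, and the labeling result information:

```python
def model_performance(model, shared_videos, labels):
    """Accuracy of the model to be updated on the shared video data, used
    as the model performance parameter. `model` is any callable returning
    a predicted label for one input."""
    correct = sum(1 for video, label in zip(shared_videos, labels)
                  if model(video) == label)
    return correct / len(labels)
```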
8. The method of claim 3, wherein the updating the video processing model to be updated according to the shared video data and the model update information to obtain the target video processing model comprises:
training the video processing model to be updated for the target iteration times according to the shared video data to obtain an intermediate video processing model; and
performing reverse iterative training on the intermediate video processing model according to the target normalized gradient to obtain the target video processing model.
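The two phases of claim 8 can be sketched with a gradient-descent stand-in. Claim 8 does not specify the optimizer; plain SGD for the shared-data phase and a single corrective step with the normalized gradient are assumptions, and all names are illustrative:

```python
import numpy as np

def two_phase_update(weights, shared_grad_fn, normalized_grad,
                     target_iters, lr=0.1):
    """Phase 1: train for `target_iters` iterations on the shared data,
    via `shared_grad_fn` (maps weights -> gradient on shared data), giving
    the intermediate model. Phase 2: one reverse corrective step using the
    target normalized gradient, giving the target model."""
    w = np.asarray(weights, dtype=float)
    for _ in range(target_iters):          # phase 1: shared-data training
        w = w - lr * shared_grad_fn(w)
    return w - lr * np.asarray(normalized_grad, dtype=float)  # phase 2
```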
9. An apparatus for updating a video processing model, comprising:
an acquisition module configured to acquire a video processing model to be updated, wherein the video processing model to be updated is obtained by aggregating a plurality of models to be aggregated that are respectively reported by a plurality of edge devices, the models to be aggregated are obtained by the edge devices training, with local video data, each round of model issued by a server, and the local video data are local data respectively corresponding to the plurality of edge devices;
a determining module configured to determine a plurality of pieces of heterogeneous degree information respectively corresponding to the plurality of edge devices, wherein the heterogeneous degree information describes a degree of distribution difference between the local video data in the corresponding edge device and shared video data; and
an updating module configured to update the video processing model to be updated according to the shared video data and the heterogeneous degree information to obtain a target video processing model.
10. The apparatus of claim 9, wherein the update module comprises:
a first determining submodule configured to determine a model performance parameter of the video processing model to be updated;
a second determining submodule configured to determine model update information according to the shared video data, the heterogeneous degree information, and the model performance parameter; and
an updating submodule configured to update the video processing model to be updated according to the shared video data and the model update information to obtain the target video processing model.
11. The apparatus of claim 10, wherein the second determination submodule comprises:
a first determining unit configured to determine target iteration times according to the shared video data, the heterogeneous degree information, and the model performance parameter;
a second determining unit configured to determine random gradient information corresponding to the server;
a third determining unit configured to determine a target normalized gradient according to the target iteration times and the random gradient information; and
a fourth determining unit configured to take the target iteration times and the target normalized gradient together as the model update information.
12. The apparatus of claim 11, wherein the first determining unit is specifically configured to:
determining initial iteration times of the video processing model to be updated;
determining iteration number adjustment information according to the shared video data, the heterogeneous degree information, and the model performance parameter; and
adjusting the initial iteration times according to the iteration number adjustment information to obtain the target iteration times.
13. The apparatus of claim 12, wherein the first determining unit is further configured to:
determining data amount information of the shared video data;
determining first data distribution information of a selected edge device in a current round of iterative updating according to the plurality of pieces of heterogeneous degree information;
determining second data distribution information of the server according to the plurality of pieces of heterogeneous degree information; and
determining the iteration number adjustment information according to the data amount information, the first data distribution information, the second data distribution information, and the model performance parameter.
14. The apparatus of claim 13, wherein the first determining unit is further configured to:
determining target heterogeneous degree information corresponding to the selected edge device;
determining data distribution information of the selected edge device according to the target heterogeneous degree information; and
taking the data distribution information of the selected edge device as the first data distribution information.
15. The apparatus of claim 10, wherein the first determining submodule is configured to:
obtain labeling result information corresponding to the shared video data;
input the shared video data into the video processing model to be updated to obtain processing result information output by the video processing model to be updated; and
determine an accuracy parameter of the processing result information relative to the labeling result information, and take the accuracy parameter as the model performance parameter.
16. The apparatus of claim 11, wherein the update sub-module is specifically configured to:
train the video processing model to be updated for the target iteration times according to the shared video data to obtain an intermediate video processing model; and
perform reverse iterative training on the intermediate video processing model according to the target normalized gradient to obtain the target video processing model.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
CN202210414026.5A 2022-04-14 2022-04-14 Video processing model updating method and device, electronic equipment and storage medium Active CN114844889B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210414026.5A CN114844889B (en) 2022-04-14 2022-04-14 Video processing model updating method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210414026.5A CN114844889B (en) 2022-04-14 2022-04-14 Video processing model updating method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114844889A (en) 2022-08-02
CN114844889B (en) 2023-07-07

Family

ID=82566255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210414026.5A Active CN114844889B (en) 2022-04-14 2022-04-14 Video processing model updating method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114844889B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378835A (en) * 2021-06-28 2021-09-10 北京百度网讯科技有限公司 Labeling model training method, sample labeling method and related device

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210158211A1 (en) * 2019-11-22 2021-05-27 Google Llc Linear time algorithms for privacy preserving convex optimization
CN113159332B (en) * 2020-01-23 2024-01-30 华为技术有限公司 Method and equipment for realizing model update
US11651292B2 (en) * 2020-06-03 2023-05-16 Huawei Technologies Co., Ltd. Methods and apparatuses for defense against adversarial attacks on federated learning systems
CN112990488B (en) * 2021-03-16 2024-03-26 香港理工大学深圳研究院 Federal learning method based on machine isomerism
CN113139662B (en) * 2021-04-23 2023-07-14 深圳市大数据研究院 Global and local gradient processing method, device, equipment and medium for federal learning
CN113408743B (en) * 2021-06-29 2023-11-03 北京百度网讯科技有限公司 Method and device for generating federal model, electronic equipment and storage medium
CN113518007B (en) * 2021-07-06 2022-09-20 华东师范大学 Multi-internet-of-things equipment heterogeneous model efficient mutual learning method based on federal learning
CN113435534A (en) * 2021-07-09 2021-09-24 新智数字科技有限公司 Data heterogeneous processing method and device based on similarity measurement, computer equipment and computer readable storage medium
CN113705610B (en) * 2021-07-26 2024-05-24 广州大学 Heterogeneous model aggregation method and system based on federal learning
CN113837399B (en) * 2021-10-26 2023-05-30 医渡云(北京)技术有限公司 Training method, device, system, storage medium and equipment for federal learning model
CN114268422A (en) * 2021-11-12 2022-04-01 国网浙江省电力有限公司宁波供电公司 Multi-source heterogeneous data sharing method, system and equipment for power system
CN114065863B (en) * 2021-11-18 2023-08-29 北京百度网讯科技有限公司 Federal learning method, apparatus, system, electronic device and storage medium


Also Published As

Publication number Publication date
CN114844889A (en) 2022-08-02

Similar Documents

Publication Publication Date Title
US11853882B2 (en) Methods, apparatus, and storage medium for classifying graph nodes
CN112801164A (en) Training method, device and equipment of target detection model and storage medium
CN112784778B (en) Method, apparatus, device and medium for generating model and identifying age and sex
WO2022257487A1 (en) Method and apparatus for training depth estimation model, and electronic device and storage medium
CN112541122A (en) Recommendation model training method and device, electronic equipment and storage medium
CN113033537A (en) Method, apparatus, device, medium and program product for training a model
CN113313053B (en) Image processing method, device, apparatus, medium, and program product
CN112580733B (en) Classification model training method, device, equipment and storage medium
CN113705362B (en) Training method and device of image detection model, electronic equipment and storage medium
CN113627361B (en) Training method and device for face recognition model and computer program product
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
CN112949433B (en) Method, device and equipment for generating video classification model and storage medium
CN115082920A (en) Deep learning model training method, image processing method and device
CN114187459A (en) Training method and device of target detection model, electronic equipment and storage medium
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN117911370A (en) Skin image quality evaluation method and device, electronic equipment and storage medium
CN114844889B (en) Video processing model updating method and device, electronic equipment and storage medium
CN115170919B (en) Image processing model training and image processing method, device, equipment and storage medium
CN116341680A (en) Artificial intelligence model adaptation method, device, electronic equipment and storage medium
CN115082742A (en) Training method and device for image classification model, electronic equipment and storage medium
CN114817476A (en) Language model training method and device, electronic equipment and storage medium
CN114067415A (en) Regression model training method, object evaluation method, device, equipment and medium
CN113642510A (en) Target detection method, device, equipment and computer readable medium
CN113361575A (en) Model training method and device and electronic equipment
CN112749707A (en) Method, apparatus, and medium for object segmentation using neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant