CN111488476A - Image pushing method, model training method and corresponding device


Info

Publication number
CN111488476A
CN111488476A
Authority
CN
China
Prior art keywords
image
neural network
face
score
image processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010263423.8A
Other languages
Chinese (zh)
Other versions
CN111488476B (en)
Inventor
梁喆
朱雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aixin Technology Co., Ltd.
Original Assignee
Beijing Aixin Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aixin Technology Co., Ltd.
Priority to CN202010263423.8A
Publication of CN111488476A
Application granted
Publication of CN111488476B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/535 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the technical field of artificial intelligence and provides an image pushing method, a model training method, and corresponding devices. The image pushing method includes: acquiring a plurality of first images; inputting the first images into a first neural network deployed on a terminal device for scoring, to obtain a plurality of first scores output by the first neural network, wherein the scoring result of the first neural network simulates the scoring result that an image processing algorithm deployed on a server would produce for an image to be processed; and determining, as a target image, the first image corresponding to a target score satisfying a preset rule among the plurality of first scores, and sending the target image to the server so that the server processes the target image with the image processing algorithm. The method requires no large set of attribute networks on the terminal device, so its demands on terminal performance are modest; and because the target image is selected to match what the image processing algorithm expects, better results can be obtained when the target image is used for the image processing task.

Description

Image pushing method, model training method and corresponding device
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an image pushing method, a model training method and a corresponding device.
Background
In some security system schemes, a main control server and multiple image acquisition devices (such as ordinary cameras, capture cameras, and the like) are deployed. If every image acquisition device transmitted all acquired images to the main control server for subsequent processing, network bandwidth would prevent a large number of acquisition devices from being deployed, limiting the system's functionality. A common solution is for the image acquisition device to filter the acquired images in advance according to some criterion and push only a small number of selected images to the server, reducing bandwidth pressure.
Taking a face recognition scenario as an example, suppose an image acquisition device captures several face images of the same person. The existing pushing strategy scores the quality of these face images and pushes only the highest-scoring image to the server for face recognition. To obtain the quality score, current practice deploys multiple attribute networks on the acquisition device to extract attributes of the face image, such as 3D pose, degree of blur, face size, occlusion, eye state, and illumination intensity, and then takes a weighted average of these attributes as the quality score.
However, in the above scheme, the acquisition device must run multiple attribute networks to obtain the attributes of the face image, performing complex computation and placing high demands on device performance. In addition, the weighting coefficients of the attributes are usually determined subjectively, so the selected face image carries a subjective bias and is not necessarily the image that the face recognition algorithm deployed on the server would prefer.
Disclosure of Invention
Embodiments of the present application provide an image pushing method, a model training method, and corresponding apparatuses to solve the above technical problems.
In order to achieve the above purpose, the present application provides the following technical solutions:
In a first aspect, an embodiment of the present application provides an image pushing method applied to a terminal device. The method includes: acquiring a plurality of first images; inputting the first images into a first neural network deployed on the terminal device for scoring, to obtain a plurality of first scores correspondingly output by the first neural network, wherein the scoring result of the first neural network simulates the scoring result of an image processing algorithm deployed on a server for an image to be processed, and the scoring result of the image processing algorithm represents how suitable the image to be processed is for performing a preset image processing task; and determining, as a target image, the first image corresponding to a target score satisfying a preset rule among the plurality of first scores, and sending the target image to the server so that the server processes the target image with the image processing algorithm.
In this method, the terminal device only needs the locally deployed first neural network to obtain the first scores of the first images, with no need to deploy a large number of attribute networks; the amount of computation is therefore very limited, and the demands on terminal performance are modest. Moreover, the first neural network can simulate the scoring result of the image processing algorithm deployed on the server for executing the preset image processing task, so selecting the target image according to the first neural network's scores involves no subjective factors; the selection starts from what the image processing algorithm expects (characterized by its scoring result), and better results can therefore be obtained when the target image is used for the image processing task (i.e., when the image processing algorithm processes the target image).
In an implementation manner of the first aspect, the first image is a face image, the image processing algorithm is a face recognition algorithm, the image to be processed is a face image to be recognized, the image processing task is a face recognition task, and the scoring result of the face recognition algorithm represents the degree of similarity between the face image to be recognized and the base library face images.
The image pushing method provided by the embodiments of the present application can be used in a face recognition scenario, though it is not limited to that scenario. In this scenario, the identities of the persons in the base library face images are known; the server processes the face image to be recognized with a locally deployed face recognition algorithm (e.g., performs feature extraction), and finally determines the identity of the person in the face image to be recognized according to its degree of similarity to the base library face images, thereby achieving face recognition.
In one implementation manner of the first aspect, the acquiring a plurality of first images includes: acquiring a first face track, the first face track comprising the plurality of first images.
In an implementation manner of the first aspect, the acquiring the first face track includes: performing face detection on video frames in a first video to obtain the plurality of first images, wherein each first image corresponds to a face frame detected from a video frame; and associating the plurality of first images to form the first face track.
In the above two implementations, the first face track may refer to a set of temporally consecutive face images (first images); in principle, the face images in the same face track belong to the same person.

For example, the first face track may be a sequence of frames in a video that contain the face of the same person, and for the sake of recognition accuracy the target image selected from it should be a face image in a normal state (e.g., unoccluded, frontal, well lit). In some implementations, each face image in the first face track may be only the face frame cropped from the video frame rather than the whole frame, which improves the efficiency of subsequent processing and means that pushing the target image to the server occupies only a small bandwidth.
In one implementation of the first aspect, the first neural network includes at least one convolutional layer and at least one fully-connected layer.
In the above implementation, the at least one convolutional layer extracts image features, and the at least one fully-connected layer converts the extracted features into a score. It will be appreciated that the first neural network may also include other layers, such as pooling layers.
In one implementation form of the first aspect, the parameters of the first neural network and the features generated in the first neural network are quantized.
Because the first neural network is deployed on the terminal device, whose processing capability is weaker than that of devices such as servers, high-precision floating-point processing may exceed the terminal's performance. The parameters of the first neural network and the features generated inside it can therefore be quantized, for example with 4-bit or 8-bit quantization.
In an implementation manner of the first aspect, if the scoring result of the image processing algorithm is positively correlated with the suitability degree of the image to be processed for executing the image processing task, the preset rule includes taking a maximum value.
In a second aspect, an embodiment of the present application provides a model training method, including: acquiring a plurality of second images; inputting the second images into a first neural network to be deployed on a terminal device for scoring, to obtain second scores output by the first neural network; obtaining third scores produced by an image processing algorithm deployed on a server scoring the second images, wherein a third score characterizes how suitable a second image is for performing a preset image processing task; and taking the third scores as the labels corresponding to the second scores, calculating the prediction loss of the first neural network from the second scores and the third scores, and updating the parameters of the first neural network based on the prediction loss.
The above training method provides a way of obtaining the first neural network: the scoring results of the image processing algorithm deployed on the server are used as the labels of the training samples, so that when the trained first neural network scores a first image, its scoring result simulates the scoring result the image processing algorithm would produce for that image. The first neural network can therefore be used in the image pushing method provided by the embodiments of the present application. Note that where the training method is executed is not limited: it may run on the terminal device, on the server, or on another device.
In addition, the method utilizes an image processing algorithm to automatically generate the training labels without manual labeling, and the training mode is simple and efficient.
In an implementation manner of the second aspect, the second image is a face image, the image processing algorithm is a face recognition algorithm, the image processing task is a face recognition task, and the third score represents the degree of similarity between the second image and the base library face images.
The image pushing method provided by the embodiment of the application can be used for a face recognition scene, but is not necessarily used for the scene.
In one implementation manner of the second aspect, obtaining a third score produced by a face recognition algorithm deployed on a server scoring the second image includes: inputting the second image into a pre-trained face recognition network to obtain the features of the second image extracted by that network; and calculating the distances between the features of the second image and the features of the base library face images, and determining the third score based on the minimum of the calculated distances.
The above implementation provides one specific way for the face recognition algorithm deployed on the server to produce scores; of course, other scoring schemes may also be adopted.
In a third aspect, an embodiment of the present application provides an image pushing apparatus applied to a terminal device. The apparatus includes: a first image acquisition module for acquiring a plurality of first images; a first scoring module for inputting the first images into a first neural network deployed on the terminal device for scoring, to obtain a plurality of first scores correspondingly output by the first neural network, wherein the scoring result of the first neural network simulates the scoring result of an image processing algorithm deployed on a server for an image to be processed, and the scoring result of the image processing algorithm represents how suitable the image to be processed is for performing a preset image processing task; and an image pushing module for determining, as a target image, the first image corresponding to a target score satisfying a preset rule among the plurality of first scores, and sending the target image to the server so that the server processes the target image with the image processing algorithm.
In a fourth aspect, an embodiment of the present application provides a model training apparatus, including: a second image acquisition module for acquiring a plurality of second images; a second scoring module for inputting the second images into a first neural network to be deployed on a terminal device for scoring, to obtain second scores output by the first neural network; a third scoring module for obtaining third scores produced by an image processing algorithm deployed on a server scoring the second images, wherein a third score characterizes how suitable a second image is for performing a preset image processing task; and a parameter updating module for taking the third scores as the labels corresponding to the second scores, calculating the prediction loss of the first neural network, and updating the parameters of the first neural network based on the prediction loss.
In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer program instructions which, when read and executed by a processor, perform the method provided by the first aspect, the second aspect, or any possible implementation manner of either aspect.
In a sixth aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, the memory storing computer program instructions which, when read and executed by the processor, perform the method provided by the first aspect, the second aspect, or any possible implementation manner of either aspect.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of the scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 shows an application scenario of the image pushing method provided by an embodiment of the present application;
Fig. 2 shows a flowchart of the image pushing method provided by an embodiment of the present application;
Fig. 3 shows a flowchart of the model training method provided by an embodiment of the present application;
Fig. 4 shows a functional block diagram of the image pushing apparatus provided by an embodiment of the present application;
Fig. 5 shows a functional block diagram of the model training apparatus provided by an embodiment of the present application;
Fig. 6 shows a block diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element. The terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Fig. 1 shows an application scenario of the image pushing method provided by an embodiment of the present application. Referring to fig. 1, the scenario involves a terminal device 100 and a server 200. A first neural network is deployed on the terminal device 100, and an image processing algorithm for executing a preset image processing task is deployed on the server 200. The present application does not limit which image processing task the algorithm on the server 200 executes: it may be, for example, a face recognition task (the algorithm then being a face recognition algorithm), a target detection task (a target detection algorithm), an image segmentation task (an image segmentation algorithm), and so on. The following text mainly describes the case where the image processing task is a face recognition task.
The image pushing method provided by the embodiments of the present application may be executed by the terminal device 100; fig. 6 shows one possible structure of the terminal device 100 and is explained later. In executing the image pushing method, the terminal device 100 selects a target image from a plurality of first images and transmits it to the server 200, and the server 200 processes the target image with the image processing algorithm (i.e., performs the preset image processing task on it). The number of selected target images is of course smaller than the number of first images on the terminal device 100.
The reason for selecting target images from the plurality of first images depends on the application scenario and is not limited by the present application. For example, the bandwidth between the terminal device 100 and the server 200 may be limited: if the terminal device 100 sent every first image to the server 200 without selection, it would occupy a large bandwidth, limiting how many terminal devices 100 could access the server 200 and failing to meet practical requirements; transmitting only target images greatly reduces the bandwidth pressure. Typical cases include the security systems described above. As another example, the processing capacity of the server 200 may be limited: if the terminal devices 100 indiscriminately sent first images, the server 200 could not process the received images efficiently once many terminal devices were connected, whereas transmitting only target images significantly reduces its processing load. Other reasons may of course exist and are not enumerated here. In the image pushing method provided by the embodiments of the present application, the terminal device 100 selects and pushes the target image based on the first neural network's scoring of the first images; the specific process is described below.
In most cases, the computing capability of the terminal device 100 is significantly lower than that of the server 200, and the present application sets its strategy with this in mind; this does not mean, however, that the image pushing method imposes mandatory performance requirements on the terminal device 100 or the server 200.
Fig. 2 shows a flowchart of an image pushing method provided in an embodiment of the present application. Referring to fig. 2, the method includes:
step S300: the terminal device acquires a plurality of first images.
The plurality of first images acquired in step S300 serve as candidates for the target image. How the first images are acquired is not limited: for example, decoded video frames may be taken from a video as first images, or images may be taken from an existing image library. Moreover, the first images may be generated by the terminal device itself or acquired from another device.
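As one concrete acquisition route, the sketch below decodes frames from a video as candidate first images. It assumes OpenCV is available for decoding; the function name and its parameters are illustrative, not part of the patent:

```python
import cv2  # assumes OpenCV; any video decoder would serve the same purpose

def collect_first_images(video_path: str, max_images: int = 64):
    """Decode frames from a video and collect them as candidate first images.
    Decoding a local file is just one of the acquisition routes mentioned in
    the text; frames could equally come from an image library or another device."""
    images, cap = [], cv2.VideoCapture(video_path)
    while len(images) < max_images:
        ok, frame = cap.read()   # returns (success_flag, decoded BGR frame)
        if not ok:
            break                # end of stream
        images.append(frame)
    cap.release()
    return images
```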
The plurality of first images may have a certain association in content, for example, may be a series of images captured for the same environment (such as a room, a road, etc.), may be a series of images captured for the same object (such as a person, a vehicle, etc.), and so on.
Step S310: the terminal device inputs the first images into the first neural network for scoring, obtaining a plurality of first scores correspondingly output by the first neural network.
The first neural network has the following property: its scoring result can simulate the scoring result of the image processing algorithm deployed on the server for the image to be processed. In other words, if the same image to be processed were input both to the first neural network and to the image processing algorithm deployed on the server, the two scoring results would be similar.
The model training method provided by the embodiments of the present application can be used to obtain a first neural network with the above property; the specific steps are described later. Of course, the first neural network may also be obtained in other ways. For example, when the image processing algorithm is itself based on a neural network (call the network on the server the second neural network), the first neural network may be obtained by simplifying the second neural network, or the second neural network may even be used directly. Using the second neural network directly is uncommon, however: as mentioned above, the computing capability of the terminal device is in most cases significantly lower than that of the server, and it is difficult to run exactly the same algorithm on the terminal device as on the server.
The present application does not limit which specific neural network the first neural network is; examples include a Convolutional Neural Network (CNN), a Region-based Convolutional Neural Network (R-CNN), a Recurrent Neural Network (RNN), and so on.
In one implementation, the first neural network is a convolutional neural network whose structure comprises at least one convolutional layer and at least one fully-connected layer connected in sequence. The at least one convolutional layer extracts features of the input image, and the at least one fully-connected layer converts the extracted features into a score. "Connected in sequence" refers only to the order of the layers and does not necessarily mean the layers are directly connected; for example, other layers, such as pooling layers for down-sampling, may be inserted between adjacent convolutional layers. Considering that the terminal device may have weak performance and is unsuited to an overly complex network, in some alternatives the number of convolutional layers is kept modest (though not too small, or features cannot be extracted effectively); for example, 7 to 11 convolutional layers may be used.
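As an illustrative sketch only (the text fixes no concrete architecture beyond convolutional layers followed by fully-connected layers), a compact scoring network of this kind might look as follows in PyTorch; the channel widths, strides, and pooling choice are assumptions:

```python
import torch
import torch.nn as nn

class FirstNeuralNetwork(nn.Module):
    """Illustrative sketch of the first neural network: a few convolutional
    layers extract features, fully-connected layers turn them into one score.
    Layer widths and stride-2 convolutions are assumptions, not the patent's spec."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        layers, c = [], in_channels
        # 7 convolutional layers, the low end of the 7-11 range mentioned above
        for out_c in (16, 32, 64, 64, 128, 128, 128):
            layers += [nn.Conv2d(c, out_c, kernel_size=3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
            c = out_c
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # a pooling layer between the conv and FC parts
        self.head = nn.Sequential(nn.Linear(128, 32),
                                  nn.ReLU(inplace=True),
                                  nn.Linear(32, 1))  # converts features to a score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.pool(self.features(x)).flatten(1)  # (batch, 128)
        return self.head(feat).squeeze(1)              # one first score per image
```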
Further, still in consideration of terminal performance, high-precision floating-point processing in the first neural network may be difficult for the terminal device to bear. In some implementations, the parameters of the first neural network and the features generated inside it are therefore quantized, for example with 4-bit or 8-bit quantization, to reduce the amount of floating-point computation. By contrast, if the server executes the image processing task with the second neural network, the parameters and features of the second neural network need not be quantized.
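A minimal sketch of the kind of low-bit quantization mentioned above; the text only states that 4-bit or 8-bit quantization may be used, so the symmetric uniform scheme below is an assumption:

```python
import torch

def fake_quantize(t: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate low-bit quantization of weights or features with symmetric
    uniform quantization. The patent only says 4-bit or 8-bit quantization
    may be adopted; this particular scheme is an illustrative assumption."""
    qmax = 2 ** (num_bits - 1) - 1                     # 127 for 8-bit, 7 for 4-bit
    scale = t.abs().max().clamp(min=1e-8) / qmax       # one scale per tensor
    q = torch.round(t / scale).clamp(-qmax - 1, qmax)  # snap to the integer grid
    return q * scale                                   # dequantize for simulation
```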
In a simple implementation, the scoring result of the image processing algorithm is positively correlated with the suitability of the image to be processed for the preset image processing task: the more suitable the image, the higher the score, and vice versa. For the face recognition task, for example, the higher the probability that a face image will be recognized successfully, the higher the face recognition algorithm's score. Note that some image processing algorithms do not directly output a score; in that case an intermediate or final output of the algorithm may be mapped to a score, which is then regarded as the algorithm's scoring result (the image processing algorithm is still considered to score the image to be processed).
As the description of step S310 shows, the first neural network's scoring of a first image is similar to the scoring the image processing algorithm on the server would give it (although the image processing algorithm does not actually score the first image at that moment, such similarity is expected). Thus the first score can characterize how suitable the first image is for performing the preset image processing task.
Step S320: the terminal device determines, as the target image, the first image corresponding to a target score that satisfies a preset rule among the plurality of first scores, and sends the target image to the server.
The present application does not limit which preset rule is adopted. For example, since the first score can characterize the suitability of a first image for the preset image processing task, the first score characterizing the highest suitability may be selected as the target score, and the target image determined from it, in order to optimize the result of the task. If the scoring result of the image processing algorithm is positively correlated with suitability, the preset rule may simply be to select the largest first score.
As another example, if the best possible result is not required and any first image whose result meets a preset requirement suffices, it is enough to select any first score whose characterized suitability meets the requirement as the target score and determine the target image from it. If the scoring result is positively correlated with suitability, the preset rule may then be to select a first score greater than a preset threshold.
After the target image is determined, the terminal device sends it to the server, and the server processes the target image with the image processing algorithm, i.e., executes the preset image processing task.
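Under the two preset rules just described, the selection in step S320 reduces to an argmax or a threshold test over the first scores; a sketch with illustrative names:

```python
import torch

def select_target_image(first_images: list, first_scores: torch.Tensor,
                        threshold=None):
    """Apply the preset rule to the first scores. With no threshold, take the
    image with the largest first score; with a threshold, take any image whose
    score exceeds it. Assumes higher scores mean higher suitability."""
    if threshold is None:
        return first_images[int(torch.argmax(first_scores))]
    for image, score in zip(first_images, first_scores.tolist()):
        if score > threshold:
            return image          # first image meeting the preset requirement
    return None                   # no first image qualified
```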
In this method, the terminal device only needs the locally deployed first neural network to obtain the first scores of the first images, with no need to deploy a large number of attribute networks; the amount of computation is therefore very limited, and the demands on terminal performance are modest. Moreover, the first neural network can simulate the scoring result of the image processing algorithm deployed on the server, so selecting the target image according to the first neural network's scores involves no subjective factors; the selection starts from what the image processing algorithm on the server expects (characterized by its scoring result), and better results can therefore be obtained when the selected target image is used for the image processing task (i.e., when the server processes the target image with the image processing algorithm).
The method of fig. 2 is described below taking a face recognition scenario as an example. In this scenario, the first images mentioned in the method of fig. 2 are face images, the image processing algorithm is a face recognition algorithm, the image to be processed is a face image to be recognized, and the image processing task is a face recognition task. One possible approach to face recognition is: the server processes the face image to be recognized with a locally deployed face recognition algorithm (e.g., performs feature extraction), and finally determines the identity of the person in the image according to its degree of similarity to the base library face images (e.g., the similarity of image features), for instance taking the identity corresponding to the most similar base library image as the recognition result. The specific face recognition algorithm is not limited and may be, for example, Fast R-CNN, FaceNet, CosFace, or the like.
With this approach, the scoring result of the face recognition algorithm represents the degree of similarity between the face image to be recognized and the base library face images: if the image to be recognized is highly similar to some base library face image, the identity of the person in it can be determined with confidence, meaning the image is well suited for face recognition; if it is not very similar to any base library face image, the identity remains unclear, meaning the image is poorly suited for face recognition.
If the scoring result of the face recognition algorithm is positively correlated with the similarity between the face image to be recognized and the base library face images, the preset rule for selecting the target image can be set to take the largest first score as the target score; pushing the target image selected in this way to the server for face recognition can be expected to yield a more accurate recognition result.
In some implementations, the plurality of first images in step S300 may come from a first face track. A first face track may refer to a set of temporally consecutive face images (first images), and in principle the face images in the same face track belong to the same person, so in principle any first image could be chosen from the first face track for face recognition. In practice, for the sake of recognition accuracy, the target image selected from the first face track should preferably be a face image in a normal state (e.g., unoccluded, frontal, well lit), which is why the target image is selected according to the first scores output by the first neural network.
For example, the first face track may be a series of consecutive frames in a video that contain the face of the same person. A video may of course contain several persons, each of whose faces may form a face track; the first face track does not denote any particular one of them, and since the images in every face track are pushed in a similar way, the first face track is representative of the general case.
The first face track may be extracted from the video in the following way, among others:
First, face detection is performed on video frames in a first video to obtain a plurality of first images, each first image corresponding to a face frame detected from a video frame. The face detection method is not limited and may be, for example, FPN, SSD, or Fast R-CNN.
Next, the obtained first images are associated to form the first face track. The main purpose of this step is to ensure that the first images in the first face track all belong to the same person. A possible way is Intersection-over-Union (IoU) tracking: a first image detected in one frame is associated with a first image detected in an adjacent frame when the IoU of their face frames is greater than a certain threshold.
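A sketch of the greedy IoU association just described, assuming face frames are (x1, y1, x2, y2) tuples; the threshold value and the greedy matching strategy are illustrative assumptions:

```python
def iou(a, b):
    """Intersection over Union of two face frames given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate(last_boxes, new_boxes, thresh=0.5):
    """Greedy IoU association: attach each newly detected face frame to the
    track whose latest frame overlaps it most, if the IoU exceeds `thresh`;
    otherwise the detection starts a new track. Returns (box, track index or
    None) pairs; `thresh` and the greedy strategy are illustrative choices."""
    result = []
    for box in new_boxes:
        overlaps = [iou(box, last) for last in last_boxes]
        best = max(range(len(overlaps)), key=overlaps.__getitem__) if overlaps else -1
        if best >= 0 and overlaps[best] > thresh:
            result.append((box, best))    # extend the existing face track
        else:
            result.append((box, None))    # begin a new face track
    return result
```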
In the first face track determined in this way, each first image is only a face frame from a video frame rather than the entire frame, which improves the efficiency of subsequent processing and means that pushing the target image to the server occupies only a small bandwidth. Of course, some implementations may instead take the entire video frame as a first image in the first face track.
Fig. 3 shows a flowchart of the model training method provided by an embodiment of the present application. The method is used to train the first neural network, and the device performing it is called the training device. The training device may adopt the structure of the electronic device 700 in fig. 6, described in detail later; it may be a terminal device, a server, or another device. In view of terminal performance, a reasonable implementation is to train the first neural network on a high-performance device and deploy it on the terminal device only after training is complete. Referring to fig. 3, the method includes:
step S400: the training device acquires a plurality of second images.
Step S410: the training device inputs the second images into the first neural network for scoring, obtaining a plurality of second scores correspondingly output by the first neural network.
The above two steps are similar to steps S300 and S310, the main difference being that the second images are images used for training; the description is therefore not repeated. Note that the first neural network in step S410 is deployed on the training device: as stated above, the training device need not be the terminal device, and the first neural network is deployed on the terminal device only after training is complete.
Step S420: the training device acquires third scores obtained by scoring the second images with the image processing algorithm.
A third score characterizes how suitable the second image is for performing the preset image processing task. As noted in the introduction of step S310, the image processing algorithm deployed on the server can score an image to be processed; step S420 is similar, with the second image taking the role of the image to be processed, so the process is not repeated.
During training of the first neural network, the image processing algorithm may be deployed on the server or on the training device; this is not limited by the present application, though the algorithm is ultimately deployed on the server. If the image processing algorithm is deployed on the server and the training device is not the server, the training device sends the second image to the server, the server scores it with the image processing algorithm to obtain the third score, and the third score is returned to the training device.
Step S430: the training device takes the third score as the label corresponding to the second score, calculates the prediction loss of the first neural network from the second score and the third score, and updates the parameters of the first neural network based on the prediction loss.
The second score and the third score are the scoring results of the first neural network and of the image processing algorithm, respectively, for the second image. Updating the parameters of the first neural network (i.e., training it) with the third score as the label of the second score means that, once trained, the first neural network's scoring of a first image can simulate the scoring of the image processing algorithm deployed on the server, so the trained network can be used in the image pushing method provided by the embodiments of the present application. In addition, this training method uses the image processing algorithm to generate training labels automatically, without manual annotation, making training simple and efficient. For the back-propagation algorithm that updates model parameters according to the prediction loss, reference can be made to the prior art; it is not explained here.
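A minimal sketch of one training iteration as described in step S430, with mean squared error as an assumed prediction loss (the text does not fix a particular loss function):

```python
import torch
import torch.nn.functional as F

def train_step(first_net, optimizer, second_images, third_scores):
    """One parameter update of the first neural network. `third_scores` are
    the labels produced by the server-side image processing algorithm; MSE
    is an illustrative choice of prediction loss."""
    optimizer.zero_grad()
    second_scores = first_net(second_images)          # network's second scores
    loss = F.mse_loss(second_scores, third_scores)    # prediction loss vs. labels
    loss.backward()                                   # back-propagation
    optimizer.step()                                  # update the parameters
    return loss.item()
```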
The method of fig. 3 is described below taking a face recognition scenario as an example. In this scenario, the second images mentioned in the method of fig. 3 are face images, the image processing algorithm is a face recognition algorithm, the image processing task is a face recognition task, and the third score represents the degree of similarity between the second image and the base library face images.
Further, in the face recognition scene, the process of obtaining the third score by the face recognition algorithm may be implemented as follows:
First, the second image is input into a pre-trained face recognition network to obtain the features of the second image extracted by that network. The face recognition network may be a neural network, such as the second neural network mentioned above.
Then, the distances between the features of the second image and the features of the base library face images are calculated. The base library features may be extracted and stored in advance, and the features may be represented as vectors; the distance here may be the L2 distance between vectors, which reflects the similarity between two features, or between the two images bearing those features (the higher the similarity, the smaller the distance).
Finally, the third score is determined based on the minimum of the calculated distances. For example, a mapping between distance and score may be preset, so that the third score is obtained from the determined minimum distance; since the third score is directly tied to the minimum distance, it can represent the degree of similarity between the second image and the base library face images.
Further, since there may be multiple base library face images, multiple distances are calculated in the previous step, and taking the minimum is equivalent to finding the base library image most similar to the second image. In other words, the person identity corresponding to that base library image would be the face recognition result for the second image, so the third score calculated from the minimum distance reflects how suitable the second image is for the face recognition task.
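Putting the three steps together, the third score might be computed as sketched below; the recognition-network handle, the pre-extracted gallery features, and the exponential distance-to-score mapping are illustrative assumptions consistent with the description:

```python
import torch

@torch.no_grad()
def third_score(second_image, recognition_net, gallery_features):
    """Compute a third score for one face image. `recognition_net` stands in
    for the pre-trained face recognition network, `gallery_features` for the
    pre-extracted base library features of shape (N, D); the exponential
    distance-to-score mapping is an assumed example of the preset mapping."""
    feature = recognition_net(second_image.unsqueeze(0)).squeeze(0)  # (D,)
    distances = torch.norm(gallery_features - feature, dim=1)        # L2 distances
    min_distance = distances.min()           # most similar base library image
    return torch.exp(-min_distance).item()   # smaller distance, higher score
```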
The above implementation provides one specific way for the face recognition algorithm deployed on the server to produce scores; other scoring schemes may of course be adopted. Note also that the face recognition algorithm deployed on the server may or may not be based on a neural network: the model training method provided by the embodiments of the present application mainly uses the third score output by the algorithm and does not limit how the algorithm is implemented.
Fig. 4 shows a functional block diagram of an image pushing apparatus 500 provided in an embodiment of the present application. Referring to fig. 4, the image push apparatus 500 includes:
a first image acquisition module 510 for acquiring a plurality of first images;
a first scoring module 520, configured to input the plurality of first images into a first neural network deployed on the terminal device for scoring, to obtain a plurality of first scores correspondingly output by the first neural network; wherein the scoring result of the first neural network simulates the scoring result of an image processing algorithm deployed on a server for an image to be processed, and the scoring result of the image processing algorithm represents how suitable the image to be processed is for performing a preset image processing task;
an image pushing module 530, configured to determine, as a target image, a first image corresponding to a target score that meets a preset rule in the plurality of first scores, and send the target image to the server, so that the server processes the target image by using the image processing algorithm.
In an implementation manner of the image pushing apparatus 500, the first image is a face image, the image processing algorithm is a face recognition algorithm, the image to be processed is a face image to be recognized, the image processing task is a face recognition task, and the scoring result of the face recognition algorithm represents the degree of similarity between the face image to be recognized and the base library face images.
In one implementation of the image pushing apparatus 500, the first image obtaining module 510 obtains a plurality of first images, including: a first face track is obtained and comprises a plurality of first images.
In one implementation of the image pushing apparatus 500, the first image obtaining module 510 obtains a first face track, including: carrying out face detection on video frames in a first video to obtain a plurality of first images; wherein each first image corresponds to a face frame detected from a video frame; associating the plurality of first images to form the first face trajectory.
In one implementation of the image pushing device 500, the first neural network includes at least one convolutional layer and at least one fully-connected layer.
In one implementation of the image pushing apparatus 500, the parameters of the first neural network and the features generated in the first neural network are quantized.
In an implementation manner of the image pushing apparatus 500, if the scoring result of the image processing algorithm is positively correlated with the suitability of the image to be processed for executing the image processing task, the preset rule includes taking a maximum value.
The image pushing apparatus 500 provided in the embodiment of the present application, the implementation principle and the generated technical effects thereof have been introduced in the foregoing method embodiments, and for brief description, reference may be made to corresponding contents in the method embodiments where no mention is made in part of the apparatus embodiments.
Fig. 5 shows a functional block diagram of a model training apparatus 600 provided in an embodiment of the present application. Referring to fig. 5, the model training apparatus 600 includes:
a second image obtaining module 610, configured to obtain a plurality of second images;
a second scoring module 620, configured to input the second images into a first neural network to be deployed on a terminal device for scoring, to obtain second scores output by the first neural network;
a third scoring module 630, configured to obtain a third score obtained by scoring the second image by using an image processing algorithm deployed on the server; wherein the third score characterizes how well the second image is to be used to perform a preset image processing task;
a parameter updating module 640, configured to use the third score as a corresponding label of the second score, calculate a predicted loss of the first neural network according to the second score and the third score, and update a parameter of the first neural network based on the predicted loss.
In one implementation of the model training apparatus 600, the second image is a face image, the image processing algorithm is a face recognition algorithm, the image processing task is a face recognition task, and the third score represents the degree of similarity between the second image and the base library face images.
In an implementation manner of the model training apparatus 600, the third scoring module 630 is configured to obtain a third score produced by a face recognition algorithm deployed on the server scoring the second image, including: inputting the second image into a pre-trained face recognition network to obtain the features of the second image extracted by that network; and calculating the distances between the features of the second image and the features of the base library face images, and determining the third score based on the minimum of the calculated distances.
The model training apparatus 600 provided in the embodiment of the present application, the implementation principle and the technical effects thereof have been introduced in the foregoing method embodiments, and for the sake of brief description, portions of the apparatus embodiments that are not mentioned in the foregoing description may refer to corresponding contents in the method embodiments.
Fig. 6 shows a possible structure of an electronic device 700 provided in an embodiment of the present application. Referring to fig. 6, the electronic device 700 includes: a processor 710, a memory 720, and a communication interface 730, interconnected and in communication with each other by a communication bus 740 and/or other form of connection mechanism (not shown).
The memory 720 includes one or more memories (only one is shown in the figure), which may be, but are not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The processor 710, and possibly other components, may access, read, and/or write data to the memory 720.
The processor 710 includes one or more processors (only one is shown), which may be an integrated circuit chip having signal processing capability. The processor 710 may be a general-purpose processor, including a Central Processing Unit (CPU), a Microcontroller Unit (MCU), a Network Processor (NP), or another conventional processor; or a special-purpose processor, including a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Communication interface 730 includes one or more devices (only one of which is shown) that can be used to communicate directly or indirectly with other devices for interaction of data. Communication interface 730 may include an interface to communicate wired and/or wireless.
One or more computer program instructions may be stored in the memory 720 and read and executed by the processor 710 to implement the image pushing method and/or the model training method provided by the embodiments of the present application.
It will be appreciated that the configuration shown in fig. 6 is merely illustrative and that electronic device 700 may include more or fewer components than shown in fig. 6 or have a different configuration than shown in fig. 6. The components shown in fig. 6 may be implemented in hardware, software, or a combination thereof. The electronic device 700 may be a physical device, such as a server, a PC, a laptop, a tablet, a mobile phone, a wearable device, an image capture device, an in-vehicle device, a drone, a robot, etc., or may be a virtual device, such as a virtual machine, a virtualized container, etc. The electronic device 700 is not limited to a single device, and may be a combination of a plurality of devices or one or more clusters of a large number of devices. For example, the terminal device 100 in fig. 1 may be implemented by using the structure of the electronic device 700, and the training device in fig. 3 may also be implemented by using the structure of the electronic device 700.
The embodiment of the present application further provides a computer-readable storage medium, where computer program instructions are stored on the computer-readable storage medium, and when the computer program instructions are read and executed by a processor of a computer, the image pushing method and/or the model training method provided in the embodiment of the present application are/is executed. For example, a computer-readable storage medium may be implemented as memory 720 in electronic device 700 in FIG. 6.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. An image pushing method, applied to a terminal device, the method comprising:
acquiring a plurality of first images;
inputting the first images respectively into a first neural network deployed on the terminal device for scoring, to obtain a plurality of first scores correspondingly output by the first neural network, wherein the first neural network is trained based on scoring results of an image processing algorithm deployed on a server, and the scoring result for an image to be processed represents the degree to which the image to be processed is suitable for executing a preset image processing task;
and determining, as a target image, a first image corresponding to a target score that meets a preset rule among the plurality of first scores, and sending the target image to the server, so that the server processes the target image by using the image processing algorithm.
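As a minimal, non-authoritative sketch of the flow in claim 1, the PyTorch snippet below scores a batch of candidate images on the terminal and pushes only the best one. The names `quality_net` (the first neural network) and `send_to_server` (the upload path) are illustrative assumptions, not identifiers from the patent; the argmax selection follows the positive-correlation case of claim 7.

```python
# Illustrative sketch only; `quality_net` and `send_to_server` are assumed,
# not defined by the patent.
import torch

def push_best_image(first_images, quality_net, send_to_server):
    """Score candidate images on the terminal and push only the best one."""
    quality_net.eval()
    with torch.no_grad():
        batch = torch.stack(first_images)              # (N, C, H, W) candidates
        first_scores = quality_net(batch).squeeze(-1)  # one score per image
    # Preset rule (claim 7): take the maximum when a higher score means the
    # image is more suitable for the server-side image processing task.
    target_index = int(torch.argmax(first_scores))
    target_image = first_images[target_index]
    send_to_server(target_image)                       # push only the target image
    return target_index, first_scores
```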
2. The image pushing method according to claim 1, wherein the first image is a face image, the image processing algorithm is a face recognition algorithm, the image to be processed is a face image to be recognized, the image processing task is a face recognition task, and the scoring result of the face recognition algorithm represents the degree of similarity between the face image to be recognized and the face images in a base library.
3. The image pushing method according to claim 2, wherein the acquiring a plurality of first images comprises:
obtaining a first face track, wherein the first face track comprises the plurality of first images.
4. The image pushing method according to claim 3, wherein the obtaining a first face track comprises:
performing face detection on video frames in a first video to obtain the plurality of first images, wherein each first image corresponds to a face frame detected from a video frame;
and associating the plurality of first images to form the first face track.
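Claim 4 leaves the association step open. As one hedged possibility, the sketch below chains per-frame face frames into a track by greedy IoU matching between consecutive detections; the threshold and the matching strategy are assumptions for illustration, not requirements of the patent.

```python
# Greedy IoU association, used here purely as an illustrative choice.
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) face frames."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def build_face_track(detections_per_frame, iou_threshold=0.5):
    """Chain per-frame face boxes into one track of (frame_idx, box) pairs."""
    track = []
    for frame_idx, boxes in enumerate(detections_per_frame):
        if not boxes:
            continue
        if not track:
            track.append((frame_idx, boxes[0]))  # start the track
            continue
        last_box = track[-1][1]
        # Associate the detection that best overlaps the previous face frame.
        best = max(boxes, key=lambda b: iou(last_box, b))
        if iou(last_box, best) >= iou_threshold:
            track.append((frame_idx, best))
    return track
```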
5. The image pushing method of claim 1, wherein the first neural network comprises at least one convolutional layer and at least one fully-connected layer.
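A first neural network satisfying claim 5 can be as small as one convolutional layer followed by one fully-connected layer. The PyTorch sketch below is an assumed minimal architecture; the channel counts and kernel size are illustrative, since the patent fixes only the layer types.

```python
# Assumed minimal architecture; only the layer types (convolutional +
# fully-connected) come from claim 5, all sizes are illustrative.
import torch
import torch.nn as nn

class QualityNet(nn.Module):
    """Tiny on-terminal scoring network: conv feature extractor + FC head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),  # conv layer
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # collapse spatial dimensions
        )
        self.head = nn.Linear(16, 1)  # fully-connected scoring layer

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.head(x)  # one suitability score per input image
```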
6. The image pushing method according to claim 1, wherein the parameters of the first neural network and the features generated in the first neural network are quantized.
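Claim 6 states only that the network's parameters and intermediate features are quantized, not how. One common realization is PyTorch post-training dynamic quantization, sketched below under the assumption that `QualityNet` is the class from the previous sketch.

```python
# One possible realization of claim 6's quantization; the patent does not
# prescribe a scheme. Assumes the QualityNet class sketched above.
import torch

float_net = QualityNet()
quantized_net = torch.ao.quantization.quantize_dynamic(
    float_net, {torch.nn.Linear}, dtype=torch.qint8
)
# Linear weights are stored as int8 and their input activations are quantized
# on the fly, shrinking the model for deployment on the terminal device.
```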
7. The image pushing method according to any one of claims 1 to 6, wherein the preset rule comprises taking a maximum value when the scoring result of the image processing algorithm is positively correlated with the degree to which the image to be processed is suitable for performing the image processing task.
8. A method of model training, comprising:
acquiring a plurality of second images;
inputting the second image into a first neural network to be deployed on a terminal device for scoring, to obtain a second score output by the first neural network;
obtaining a third score obtained by scoring the second image through an image processing algorithm deployed on a server; wherein the third score characterizes how well the second image is to be used to perform a preset image processing task;
and taking the third score as a label corresponding to the second score, calculating a prediction loss of the first neural network according to the second score and the third score, and updating parameters of the first neural network based on the prediction loss.
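Claim 8's training loop amounts to regressing the on-terminal network's second score onto the server algorithm's third score, in the spirit of knowledge distillation. The sketch below assumes an MSE prediction loss and a standard optimizer; neither choice is fixed by the patent.

```python
# Sketch of one update step for claim 8; the MSE loss and the optimizer are
# illustrative assumptions, not choices stated in the patent.
import torch
import torch.nn.functional as F

def train_step(quality_net, optimizer, second_images, third_scores):
    """One parameter update of the first neural network."""
    quality_net.train()
    second_scores = quality_net(second_images).squeeze(-1)  # network's scores
    loss = F.mse_loss(second_scores, third_scores)  # third score as the label
    optimizer.zero_grad()
    loss.backward()   # back-propagate the prediction loss
    optimizer.step()  # update the first neural network's parameters
    return loss.item()
```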
9. The model training method of claim 8, wherein the second image is a face image, the image processing algorithm is a face recognition algorithm, the image processing task is a face recognition task, and the third score represents the degree of similarity between the second image and the face images in a base library.
10. The model training method of claim 9, wherein obtaining a third score for scoring the second image by a face recognition algorithm deployed on a server comprises:
inputting the second image into a pre-trained face recognition network to obtain the features of the second image extracted by the face recognition network;
and calculating distances between the features of the second image and the features of the base-library face images, and determining the third score based on the minimum distance among the calculated distances.
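Claim 10 derives the third score from the smallest feature distance to the base library. In the sketch below, cosine distance over L2-normalized features and the linear distance-to-score mapping are illustrative assumptions; the patent fixes neither the metric nor the mapping.

```python
# Illustrative sketch of claim 10; the distance metric and the mapping from
# minimum distance to third score are assumptions.
import torch

def compute_third_score(second_feature, base_features):
    """second_feature: (D,) and base_features: (M, D), both L2-normalized."""
    sims = base_features @ second_feature  # cosine similarity to each base face
    distances = 1.0 - sims                 # cosine distance, shape (M,)
    min_distance = distances.min()         # closest base-library face
    # Smaller minimum distance -> closer match -> higher third score.
    return float((1.0 - min_distance).clamp(0.0, 1.0))
```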
11. An image pushing apparatus, applied to a terminal device, the apparatus comprising:
the first image acquisition module is used for acquiring a plurality of first images;
the first scoring module is used for inputting the first images respectively into a first neural network deployed on the terminal device for scoring, to obtain a plurality of first scores correspondingly output by the first neural network, wherein the first neural network is trained based on scoring results of an image processing algorithm deployed on a server, and the scoring result for an image to be processed represents the degree to which the image to be processed is suitable for executing a preset image processing task;
and the image pushing module is used for determining, as a target image, a first image corresponding to a target score that meets a preset rule among the plurality of first scores, and sending the target image to the server, so that the server processes the target image by using the image processing algorithm.
12. A model training apparatus, comprising:
the second image acquisition module is used for acquiring a plurality of second images;
the second scoring module is used for inputting the second image into a first neural network to be deployed on a terminal device for scoring, to obtain a second score output by the first neural network;
the third scoring module is used for obtaining a third score obtained by scoring the second image through an image processing algorithm deployed on the server; wherein the third score characterizes how well the second image is to be used to perform a preset image processing task;
and the parameter updating module is used for taking the third score as a label corresponding to the second score, calculating a prediction loss of the first neural network according to the second score and the third score, and updating parameters of the first neural network based on the prediction loss.
13. A computer-readable storage medium having stored thereon computer program instructions which, when read and executed by a processor, perform the method of any one of claims 1-10.
14. An electronic device, comprising: a processor and a memory, the memory having stored therein computer program instructions which, when read and executed by the processor, perform the method of any one of claims 1-10.
CN202010263423.8A 2020-04-03 2020-04-03 Image pushing method, model training method and corresponding devices Active CN111488476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010263423.8A CN111488476B (en) 2020-04-03 2020-04-03 Image pushing method, model training method and corresponding devices

Publications (2)

Publication Number Publication Date
CN111488476A 2020-08-04
CN111488476B CN111488476B (en) 2023-06-27

Family

ID=71810837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010263423.8A Active CN111488476B (en) 2020-04-03 2020-04-03 Image pushing method, model training method and corresponding devices

Country Status (1)

Country Link
CN (1) CN111488476B (en)

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180107902A1 (en) * 2016-10-16 2018-04-19 Ebay Inc. Image analysis and prediction based visual search
WO2018180588A1 (en) * 2017-03-27 2018-10-04 株式会社日立国際電気 Facial image matching system and facial image search system
CN108229277A (en) * 2017-03-31 2018-06-29 北京市商汤科技开发有限公司 Gesture identification, control and neural network training method, device and electronic equipment
CN108228872A (en) * 2017-07-21 2018-06-29 北京市商汤科技开发有限公司 Facial image De-weight method and device, electronic equipment, storage medium, program
US20190266441A1 (en) * 2017-07-21 2019-08-29 Beijing Sensetime Technology Development Co., Ltd Method and apparatus for face image deduplication and storage medium
US20190188453A1 (en) * 2017-12-15 2019-06-20 Hyperconnect, Inc. Terminal and server for providing video call service
US20190042870A1 (en) * 2017-12-28 2019-02-07 Yen-Kuang Chen Multi-domain convolutional neural network
US20190238759A1 (en) * 2018-01-31 2019-08-01 Hyperconnect, Inc. Terminal and image processing method thereof
WO2019184823A1 (en) * 2018-03-26 2019-10-03 华为技术有限公司 Convolutional neural network model-based image processing method and device
CN108830288A (en) * 2018-04-25 2018-11-16 北京市商汤科技开发有限公司 Image processing method, the training method of neural network, device, equipment and medium
US20200104642A1 (en) * 2018-04-25 2020-04-02 Beijing Sensetime Technology Development Co., Ltd. Image processing methods, training methods, apparatuses, devices, media, and programs
CN110188627A (en) * 2019-05-13 2019-08-30 睿视智觉(厦门)科技有限公司 A kind of facial image filter method and device
CN110516671A (en) * 2019-08-27 2019-11-29 腾讯科技(深圳)有限公司 Training method, image detecting method and the device of neural network model
CN110909815A (en) * 2019-11-29 2020-03-24 深圳市商汤科技有限公司 Neural network training method, neural network training device, neural network processing device, neural network training device, image processing device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Y. Yin et al., 2017 International Joint Conference on Neural Networks (IJCNN) *
Chen Feng, "Optimization of color image information feature retrieval in digital image libraries", Computer Simulation, no. 08 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882048A (en) * 2020-09-28 2020-11-03 深圳追一科技有限公司 Neural network structure searching method and related equipment

Also Published As

Publication number Publication date
CN111488476B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
CN111523621B (en) Image recognition method and device, computer equipment and storage medium
CN110765860B (en) Tumble judging method, tumble judging device, computer equipment and storage medium
CN111666857A (en) Human behavior recognition method and device based on environment semantic understanding and storage medium
CN113239874B (en) Behavior gesture detection method, device, equipment and medium based on video image
CN109871821B (en) Pedestrian re-identification method, device, equipment and storage medium of self-adaptive network
CN111797983A (en) Neural network construction method and device
US20200012887A1 (en) Attribute recognition apparatus and method, and storage medium
CN112052837A (en) Target detection method and device based on artificial intelligence
CN112633159B (en) Human-object interaction relation identification method, model training method and corresponding device
CN110991380A (en) Human body attribute identification method and device, electronic equipment and storage medium
CN112200041B (en) Video motion recognition method and device, storage medium and electronic equipment
CN107944381B (en) Face tracking method, face tracking device, terminal and storage medium
CN110807362A (en) Image detection method and device and computer readable storage medium
CN111652181B (en) Target tracking method and device and electronic equipment
CN111667001A (en) Target re-identification method and device, computer equipment and storage medium
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN113255617B (en) Image recognition method and device, electronic equipment and computer-readable storage medium
CN109325539B (en) Insulator string drop detection method and device
JP2024511171A (en) Action recognition method and device
CN111340213B (en) Neural network training method, electronic device, and storage medium
CN115131849A (en) Image generation method and related device
CN111488476B (en) Image pushing method, model training method and corresponding devices
KR20190056873A (en) Apparatus for detecting object using multi neural network and method thereof, and method for learning detection of object
CN113591865B (en) Loop detection method and device and electronic equipment
CN115731620A (en) Method for detecting counter attack and method for training counter attack detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant