CN111488476B - Image pushing method, model training method and corresponding devices - Google Patents

Image pushing method, model training method and corresponding devices

Info

Publication number
CN111488476B
Authority
CN
China
Prior art keywords
image
score
neural network
face
image processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010263423.8A
Other languages
Chinese (zh)
Other versions
CN111488476A (en
Inventor
梁喆
朱雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aixin Technology Co ltd
Original Assignee
Beijing Aixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aixin Technology Co ltd filed Critical Beijing Aixin Technology Co ltd
Priority to CN202010263423.8A priority Critical patent/CN111488476B/en
Publication of CN111488476A publication Critical patent/CN111488476A/en
Application granted granted Critical
Publication of CN111488476B publication Critical patent/CN111488476B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/535Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and provides an image pushing method, a model training method and corresponding devices. The image pushing method comprises: acquiring a plurality of first images; inputting the plurality of first images into a first neural network deployed on a terminal device for scoring, and obtaining a plurality of first scores output by the first neural network, wherein the scoring results of the first neural network simulate the scoring results, for an image to be processed, of an image processing algorithm deployed on a server; and determining, as a target image, a first image corresponding to a target score that satisfies a preset rule among the plurality of first scores, and sending the target image to the server so that the server processes the target image with the image processing algorithm. Because the method does not require a large number of attribute networks to be deployed on the terminal device, its performance requirements on the device are modest; and because the target image is selected so as to meet the expectations of the image processing algorithm, a better result can be obtained when the target image is used for the image processing task.

Description

Image pushing method, model training method and corresponding devices
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an image pushing method, a model training method and a corresponding device.
Background
In some security systems, a main control server and multiple image acquisition devices (such as ordinary cameras and snapshot cameras) are deployed. If every acquisition device transmitted all of its captured images to the main control server for subsequent processing, network bandwidth would suffer and the system could not support many acquisition devices, limiting its functions. A common solution is for the acquisition device to screen the captured images against some criterion in advance and push only a small number of screened images to the server for processing, so as to reduce the bandwidth pressure.
Taking a face recognition scenario as an example, suppose the image acquisition device captures multiple face images of the same person. The existing image pushing strategy is to score the quality of the face images and push the highest-scoring image to the server for face recognition. To score face image quality, multiple attribute networks are deployed on the acquisition device to extract attributes of the face image, such as 3D pose, blur, face size, face occlusion, eye state and illumination intensity, and the attributes are then combined by a weighted average into a quality score.
However, in the above scheme, the acquisition device must run multiple attribute networks to obtain the attributes of the face image, which is computationally complex and places high demands on device performance. Moreover, the weighting coefficients of the attributes are usually chosen manually and subjectively, so the selected face image carries a subjective bias and is not necessarily the image that the face recognition algorithm deployed on the server would prefer.
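The prior-art quality score described above can be sketched as a simple weighted average. The attribute names and weight values below are illustrative assumptions, not values from the patent:

```python
def quality_score(attrs, weights):
    """Prior-art scheme: weighted average of hand-picked attribute scores.

    attrs   -- attribute name -> score in [0, 1] from an attribute network
    weights -- attribute name -> manually chosen weighting coefficient
    """
    total = sum(weights.values())
    return sum(attrs[name] * w for name, w in weights.items()) / total

# Hypothetical attribute scores for one face image.
score = quality_score(
    {"pose": 0.8, "blur": 0.6, "occlusion": 0.9},
    {"pose": 2.0, "blur": 1.0, "occlusion": 1.0},
)
```

Because the weights are chosen by hand, the resulting ranking inherits the designer's subjective bias, which is exactly the drawback the patent identifies.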
Disclosure of Invention
An objective of the embodiments of the present application is to provide an image pushing method, a model training method, and corresponding devices, so as to address the above technical problems.
In order to achieve the above purpose, the present application provides the following technical solutions:
In a first aspect, an embodiment of the present application provides an image pushing method applied to a terminal device. The method includes: acquiring a plurality of first images; inputting the plurality of first images into a first neural network deployed on the terminal device for scoring, and obtaining a plurality of first scores correspondingly output by the first neural network, wherein the scoring results of the first neural network simulate the scoring results, for an image to be processed, of an image processing algorithm deployed on a server, and the scoring result of the image processing algorithm characterizes how suitable the image to be processed is for performing a preset image processing task; and determining, as a target image, a first image corresponding to a target score that satisfies a preset rule among the first scores, and sending the target image to the server so that the server processes the target image with the image processing algorithm.
In this method, first, to obtain the first scores the terminal device only needs a locally deployed first neural network rather than a large number of attribute networks, so the amount of computation is limited and the performance requirements on the terminal device are modest. Second, the first neural network simulates the scoring results of the image processing algorithm deployed on the server for performing the preset image processing task. The target image is therefore selected according to the first neural network's scores; the selection involves no subjective factors and starts from meeting the expectations of the image processing algorithm (as characterized by its scoring results), so a better result can be obtained when the target image is used for the image processing task (that is, when the server processes the target image with the image processing algorithm).
In an implementation of the first aspect, the first image is a face image, the image processing algorithm is a face recognition algorithm, the image to be processed is a face image to be recognized, the image processing task is a face recognition task, and the scoring result of the face recognition algorithm characterizes the degree of similarity between the face image to be recognized and the base (gallery) face images.
The image pushing method provided by the embodiments of the application can be used in a face recognition scenario, though it is not limited to that scenario. In such a scenario, the identities of the persons in the base face images are known; the server processes the face image to be recognized with a locally deployed face recognition algorithm (for example, performs feature extraction), and finally determines the identity of the person in the face image to be recognized according to its similarity to the base face images, thereby achieving face recognition.
In an implementation of the first aspect, acquiring a plurality of first images includes: acquiring a first face track, the first face track including the plurality of first images.
In an implementation of the first aspect, acquiring the first face track includes: performing face detection on video frames in a first video to obtain the plurality of first images, wherein each first image corresponds to a face frame detected in a video frame; and associating the plurality of first images to form the first face track.
In both implementations, the first face track may refer to a set of temporally consecutive face images (first images); in principle, the face images in the same face track belong to the same person.
For example, the first face track may come from multiple consecutive frames of a video containing the face of the same person, and in view of the face recognition effect, the target image selected from it should be a face image in a normal state (for example, unoccluded, frontal, and well illuminated). In some implementations, a face image in the first face track may be only the face frame cropped from the video frame rather than the entire frame, which improves the efficiency of subsequent processing and occupies less bandwidth when the target image is pushed to the server.
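The patent does not specify how detections are associated into a face track; as a hedged illustration, consecutive face frames could be grouped by greedy intersection-over-union matching against the last box of each existing track:

```python
from dataclasses import dataclass, field

@dataclass
class FaceTrack:
    # A face track: temporally consecutive face frames assumed to be one person.
    boxes: list = field(default_factory=list)  # (frame_idx, x1, y1, x2, y2)

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, frame_idx, detections, thr=0.3):
    """Greedy association: each detection joins the best-overlapping track,
    or starts a new track if no overlap exceeds thr (threshold is assumed)."""
    for det in detections:
        best, best_iou = None, thr
        for t in tracks:
            if t.boxes:
                s = iou(t.boxes[-1][1:], det)
                if s > best_iou:
                    best, best_iou = t, s
        if best is None:
            best = FaceTrack()
            tracks.append(best)
        best.boxes.append((frame_idx, *det))
    return tracks
```

A production tracker would also handle track termination and identity switches; this sketch only shows the grouping idea behind "associating the first images to form the first face track."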
In one implementation of the first aspect, the first neural network includes at least one convolutional layer and at least one fully-connected layer.
In the above implementation, the at least one convolution layer extracts image features, and the at least one fully-connected layer converts the extracted features into a score output. It will be appreciated that the first neural network may also include other layers, such as pooling layers.
In one implementation manner of the first aspect, the parameters of the first neural network and the features generated in the first neural network are quantized.
Because the first neural network is deployed on the terminal device, whose processing capability is weaker than that of a server, high-precision floating-point processing may exceed what the device can deliver. The parameters of the first neural network and the features generated inside it can therefore be quantized, for example with 4-bit or 8-bit quantization.
In an implementation of the first aspect, if the scoring result of the image processing algorithm is positively correlated with how suitable the image to be processed is for performing the image processing task, the preset rule includes taking the maximum value.
In a second aspect, an embodiment of the present application provides a model training method, including: acquiring a plurality of second images; inputting a second image into a first neural network to be deployed on a terminal device for scoring, and obtaining a second score output by the first neural network; obtaining a third score produced by an image processing algorithm deployed on a server scoring the second image, wherein the third score characterizes how suitable the second image is for performing a preset image processing task; and taking the third score as the label of the second score, calculating the prediction loss of the first neural network from the second score and the third score, and updating the parameters of the first neural network based on the prediction loss.
This training method provides a way to obtain the first neural network: the scoring result of the image processing algorithm deployed on the server serves as the training label, so that after training, the first neural network's scores on first images simulate the image processing algorithm's scores on images to be processed, and the network can be used in the image pushing method provided by the embodiments of the application. Note that where the training method is executed is not limited: it may run on the terminal device, on the server, or on another device.
In addition, the method uses the image processing algorithm to generate training labels automatically, without manual annotation, which makes training simple and efficient.
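The training loop above can be sketched as score distillation: the server-side algorithm's third scores become regression targets for the on-device scorer. The linear student model, synthetic features, and mean-squared prediction loss below are stand-in assumptions for the first neural network and real second images:

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.standard_normal((256, 8))               # stand-in features of second images
teacher_w = rng.standard_normal(8)                  # hidden "server algorithm" parameters
labels = 1.0 / (1.0 + np.exp(-feats @ teacher_w))   # third scores, used as labels

w = np.zeros(8)                                     # first-network (student) parameters
init_loss = np.mean((0.5 - labels) ** 2)            # loss before training (pred == 0.5)
for _ in range(500):
    pred = 1.0 / (1.0 + np.exp(-feats @ w))         # second scores
    # Gradient of the MSE prediction loss through the sigmoid output.
    grad = feats.T @ ((pred - labels) * pred * (1.0 - pred)) / len(feats)
    w -= 0.5 * grad                                 # parameter update
final_loss = np.mean((1.0 / (1.0 + np.exp(-feats @ w)) - labels) ** 2)
```

After training, the student's scores track the teacher's, which is the property the first neural network needs for the pushing method.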
In an implementation of the second aspect, the second image is a face image, the image processing algorithm is a face recognition algorithm, the image processing task is a face recognition task, and the third score characterizes the degree of similarity between the second image and the base face images.
As noted above, the image pushing method provided by the embodiments of the application can be used in a face recognition scenario, though it is not limited to that scenario.
In one implementation of the second aspect, obtaining the third score produced by the face recognition algorithm deployed on the server includes: inputting the second image into a pre-trained face recognition network to obtain the features of the second image extracted by that network; and calculating the distances between the features of the second image and the features of the base face images, and determining the third score from the minimum of the calculated distances.
The above implementation gives one concrete scoring method for the face recognition algorithm deployed on the server; other scoring methods may of course be used.
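A minimal sketch of this scoring rule, assuming L2-normalized embeddings and a simple monotone mapping from minimum gallery distance to a score (the exact mapping is not specified in the text):

```python
import numpy as np

def third_score(feat, base_feats):
    """Score an image by its closest match among base (gallery) face features.

    feat       -- (d,) feature vector of the second image
    base_feats -- (n, d) feature vectors of the base face images
    """
    f = feat / np.linalg.norm(feat)
    B = base_feats / np.linalg.norm(base_feats, axis=1, keepdims=True)
    d_min = float(np.min(np.linalg.norm(B - f, axis=1)))
    # Smaller minimum distance -> higher score; maps [0, inf) into (0, 1].
    return 1.0 / (1.0 + d_min)
```

An exact gallery match gives the maximum score of 1.0; distant features score lower, consistent with the third score characterizing similarity to the base images.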
In a third aspect, an embodiment of the present application provides an image pushing device applied to a terminal device. The device includes: a first image acquisition module for acquiring a plurality of first images; a first scoring module for inputting the plurality of first images into a first neural network deployed on the terminal device for scoring and obtaining a plurality of first scores correspondingly output by the first neural network, wherein the scoring results of the first neural network simulate the scoring results, for an image to be processed, of an image processing algorithm deployed on a server, and the scoring result of the image processing algorithm characterizes how suitable the image to be processed is for performing a preset image processing task; and an image pushing module for determining, as a target image, a first image corresponding to a target score that satisfies a preset rule among the first scores, and sending the target image to the server so that the server processes the target image with the image processing algorithm.
In a fourth aspect, an embodiment of the present application provides a model training apparatus, including: a second image acquisition module for acquiring a plurality of second images; a second scoring module for inputting a second image into a first neural network to be deployed on a terminal device for scoring and obtaining a second score output by the first neural network; a third scoring module for obtaining a third score produced by an image processing algorithm deployed on a server scoring the second image, wherein the third score characterizes how suitable the second image is for performing a preset image processing task; and a parameter updating module for calculating the prediction loss of the first neural network with the third score as the label of the second score, and updating the parameters of the first neural network based on the prediction loss.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon computer program instructions that, when read and executed by a processor, perform a method provided by the first aspect, the second aspect, or any one of the possible implementations of the two aspects.
In a sixth aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor, the memory having stored therein computer program instructions which, when read and executed by the processor, perform the method provided by the first aspect, the second aspect or any one of the possible implementations of the two aspects.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows an application scenario diagram of an image pushing method provided in an embodiment of the present application;
fig. 2 shows a flowchart of an image pushing method provided in an embodiment of the present application;
FIG. 3 shows a flowchart of a model training method provided by an embodiment of the present application;
FIG. 4 shows a functional block diagram of an image pushing device according to an embodiment of the present disclosure;
FIG. 5 shows a functional block diagram of a model training apparatus according to an embodiment of the present application;
fig. 6 shows a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises that element. The terms "first," "second," "third," and the like are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Fig. 1 shows an application scenario diagram of the image pushing method provided in an embodiment of the present application. Referring to fig. 1, this scenario involves a terminal device 100 and a server 200: the terminal device 100 is deployed with a first neural network, and the server 200 is deployed with an image processing algorithm for executing a preset image processing task. The application does not limit which image processing task the algorithm deployed on the server 200 executes; it may be, for example, a face recognition task (where the image processing algorithm is a face recognition algorithm), a target detection task (a target detection algorithm), an image segmentation task (an image segmentation algorithm), and so on. The case where the image processing task is face recognition is mainly described hereinafter as an example.
The image pushing method provided by the embodiments of the present application may be performed by the terminal device 100; fig. 6 shows a possible structure of the terminal device 100, described below. In performing the image pushing method, the terminal device 100 selects a target image from the plurality of first images and transmits it to the server 200, and the server 200 processes the target image with the image processing algorithm (that is, performs the preset image processing task on it). The number of target images may be one or more; selecting a single target image is the typical case and is mainly used as the example hereinafter. The number of selected target images is necessarily smaller than the number of first images on the terminal device 100.
The reason for selecting a target image from the plurality of first images depends on the specific application scenario, which the application does not limit. For example, when the bandwidth between the terminal device 100 and the server 200 is limited, indiscriminately sending every first image to the server 200 would occupy more bandwidth, limiting the number of terminal devices 100 that can access the server 200 and failing to meet actual requirements; transmitting only the target image greatly reduces the bandwidth pressure. The security system described above is a typical case. As another example, when the processing capability of the server 200 is limited and many terminal devices 100 access it, the server 200 cannot effectively process the received images if every first image is sent indiscriminately; transmitting only the target image greatly reduces the server's processing pressure. Other reasons may of course exist and are not listed here. In the image pushing method provided in the embodiments of the present application, the terminal device 100 selects and pushes the target image based on the first neural network's scores on the first images; the specific process is described below.
In most cases, the arithmetic processing capability of the terminal device 100 is significantly lower than that of the server 200, and the application provides targeted strategies for such situations; this does not mean, however, that the image pushing method of the embodiments imposes mandatory performance requirements on the terminal device 100 or the server 200.
Fig. 2 shows a flowchart of an image pushing method provided in an embodiment of the present application. Referring to fig. 2, the method includes:
step S300: the terminal device acquires a plurality of first images.
The plurality of first images acquired in step S300 serve as candidates for the target image. How the first images are obtained is not limited: for example, decoded video frames may be taken from a video as first images, or images may be taken from an existing image library. The first images may be generated by the terminal device itself or acquired from another device.
The plurality of first images may have a certain association in content, for example, may be a series of images collected for the same environment (such as a room, a road, etc.), may be a series of images collected for the same object (such as a person, a vehicle, etc.), and so on.
Step S310: and the terminal equipment inputs the plurality of first images into the first neural network respectively for scoring, and obtains a plurality of first scores correspondingly output by the first neural network.
The first neural network has the following property: its scoring results simulate the scoring results, for an image to be processed, of the image processing algorithm deployed on the server. In other words, if the same image to be processed were input both to the first neural network and to the image processing algorithm on the server, the two scoring results would be similar.
The model training method provided in the embodiments of the present application may be used to obtain a first neural network with the above property; the specific steps are described later. The first neural network may of course be obtained by other means: for example, if the image processing algorithm is itself a neural-network-based algorithm, the network on the server may be called a second neural network, and the first neural network may be a simplified version of the second neural network, or even the second neural network used directly. Using the second neural network directly is, however, uncommon: as mentioned above, the arithmetic processing capability of the terminal device is often significantly lower than that of the server, and it is difficult to run on the terminal device exactly the same algorithm as on the server.
The present application is not limited to what kind of neural network is specifically used for the first neural network, and may be, for example, a convolutional neural network (Convolutional Neural Network, abbreviated as CNN), a regional convolutional neural network (Region-based CNN, abbreviated as R-CNN), a recurrent neural network (Recurrent Neural Network, abbreviated as RNN), and the like.
In one implementation, the first neural network is a convolutional neural network whose structure includes at least one convolution layer and at least one fully-connected layer connected in sequence. The at least one convolution layer extracts features of the input image, and the at least one fully-connected layer converts the extracted features into a score output. "Connected in sequence" refers only to the order of the layers and does not mean the layers must be directly connected; for example, other layers, such as pooling layers for downsampling, may be inserted between adjacent convolution layers. Considering that the terminal device may be too weak to run an overly complex network, in some alternatives the number of convolution layers should not be set too large (nor too small, or features cannot be extracted effectively); for example, 7 to 11 convolution layers may be used.
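A toy numpy sketch of such a scorer: a few valid 3×3 convolutions with ReLU, global average pooling, and one fully-connected layer producing a score in (0, 1). Layer counts, channel widths, and the random weights are illustrative assumptions, not the patent's architecture:

```python
import numpy as np

def conv2d(x, w):
    """Valid 2-D convolution: x is (H, W, Cin), w is (k, k, Cin, Cout)."""
    k = w.shape[0]
    H, W, _ = x.shape
    out = np.zeros((H - k + 1, W - k + 1, w.shape[-1]))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            # Contract the (k, k, Cin) patch against the first 3 axes of w.
            out[i, j] = np.tensordot(x[i:i + k, j:j + k], w, axes=3)
    return out

class ScoreNet:
    """At least one conv layer + one fully-connected layer -> scalar score."""
    def __init__(self, channels=(3, 8, 16), seed=0):
        rng = np.random.default_rng(seed)
        self.convs = [0.1 * rng.standard_normal((3, 3, cin, cout))
                      for cin, cout in zip(channels[:-1], channels[1:])]
        self.fc = 0.1 * rng.standard_normal(channels[-1])

    def __call__(self, img):
        x = img
        for w in self.convs:
            x = np.maximum(conv2d(x, w), 0.0)         # convolution + ReLU
        feat = x.mean(axis=(0, 1))                    # global average pooling
        return 1.0 / (1.0 + np.exp(-feat @ self.fc))  # first score in (0, 1)
```

A deployed network would use 7 to 11 conv layers and trained weights; the point of the sketch is only the conv-then-fully-connected score pipeline.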
Further, also for the sake of terminal device performance, high-precision floating-point processing in the first neural network may be more than the device can bear. In some implementations, the parameters of the first neural network and the features generated inside it are therefore quantized, for example with 4-bit or 8-bit quantization, to reduce the amount of floating-point computation. By contrast, if the image processing task is performed on the server with a second neural network, the parameters of the second neural network and the features generated inside it need not be quantized.
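The quantization mentioned here can be sketched as symmetric linear quantization; the bit widths and rounding scheme below are generic choices, not specifics from the patent:

```python
import numpy as np

def quantize(w, bits=8):
    """Symmetric linear quantization of a float tensor to signed integers."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.abs(w).max())
    scale = max_abs / qmax if max_abs else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values; rounding error is at most scale / 2."""
    return q.astype(np.float64) * scale
```

Fewer bits give a coarser scale and larger error, which is the precision/computation trade-off the text describes for 4-bit versus 8-bit quantization.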
On the server, the scoring result output by the image processing algorithm characterizes how suitable the image to be processed is for performing the preset image processing task. In a simple implementation, the scoring result is positively correlated with that suitability: the more suitable the image is for the task, the higher the score, and vice versa. For a face recognition task, for example, the higher the probability that a face image is successfully recognized, the higher the face recognition algorithm's score. Note that some image processing algorithms do not directly output a score; in that case an intermediate or final output of the algorithm may be mapped to a score, which serves as the algorithm's scoring result (this is also referred to as the image processing algorithm scoring the image to be processed).
Based on the above description of step S310, the first neural network's score on a first image is similar to the score the image processing algorithm on the server would give it (although this similarity is by design; the image processing algorithm does not actually score the first image). The first score can therefore characterize how suitable the first image is for performing the preset image processing task.
Step S320: the terminal device determines, among the first scores, the first image corresponding to a target score that satisfies a preset rule as the target image, and sends the target image to the server.
The application does not restrict which rule the preset rule is. For example, since the first score can characterize how suitable the first image is for performing the preset image processing task, the execution result of the image processing task can be optimized by selecting the first score characterizing the highest suitability as the target score and determining the target image accordingly. In particular, if the scoring result of the image processing algorithm is positively correlated with the suitability of the image to be processed for performing the image processing task, the preset rule may simply be to select the largest first score.
As another example, if the best possible execution result is not required and it suffices to find one first image whose result meets a preset requirement, it is enough to select any first score whose characterized suitability meets that requirement as the target score and determine the target image accordingly. In particular, if the scoring result of the image processing algorithm is positively correlated with the suitability of the image to be processed for performing the image processing task, the preset rule may be to select a first score greater than a preset threshold.
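The two preset rules above can be sketched as follows; the function names and data shapes are illustrative, not part of the application:

```python
def select_max(images, scores):
    """Preset rule (a): choose the image with the largest first score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return images[best]

def select_first_above(images, scores, threshold):
    """Preset rule (b): choose any image whose first score exceeds a
    preset threshold; return None if no image qualifies."""
    for img, s in zip(images, scores):
        if s > threshold:
            return img
    return None
```

Rule (a) optimizes the eventual task result; rule (b) lets the terminal stop scanning as soon as an acceptable image is found.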
After the target image is determined, the terminal device sends it to the server, and the server processes it with the image processing algorithm, i.e. performs the preset image processing task.
The method has two advantages. First, to obtain the first scores of the first images, the terminal device only needs the locally deployed first neural network rather than a large number of attribute networks, so the computational load is limited and the performance requirements on the terminal device are modest. Second, the first neural network simulates the scoring result of the image processing algorithm deployed on the server for performing the preset image processing task, so selecting the target image according to the first neural network's scores involves no subjective factors; the selection starts from what the server-side image processing algorithm actually expects (as characterized by its scoring result), so a better result can be obtained when the selected target image is used to perform the image processing task (i.e., when the server processes the target image with the image processing algorithm).
The method of fig. 2 is described again below in a face recognition scenario. In this scenario, the first image in the method of fig. 2 is a face image, the image processing algorithm is a face recognition algorithm, the image to be processed is a face image to be recognized, and the image processing task is a face recognition task. One possible face recognition procedure is: a number of base (gallery) face images, whose person identities are known, are stored in advance; the face image to be recognized is processed (for example, features are extracted) by the locally deployed face recognition algorithm; finally, the identity of the person in the face image to be recognized is determined from the degree of similarity (for example, of image features) between the face image to be recognized and the base face images, e.g. by taking the identity of the person corresponding to the most similar base image as the recognition result. The specific face recognition algorithm is not limited; it may be, for example, Faster R-CNN, FaceNet, or CosFace.
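The gallery-matching step described above can be sketched as follows, assuming feature extraction (e.g. by a network such as FaceNet) has already produced fixed-length vectors; the function and variable names are illustrative:

```python
import numpy as np

def recognize(query_feat, base_feats, base_ids):
    """Return the identity of the base (gallery) image whose feature vector
    is closest, by L2 distance, to the query image's feature vector."""
    dists = np.linalg.norm(base_feats - query_feat, axis=1)
    return base_ids[int(np.argmin(dists))]

base_feats = np.eye(3)                    # toy gallery: three orthogonal features
base_ids = ["alice", "bob", "carol"]
who = recognize(np.array([0.1, 0.9, 0.0]), base_feats, base_ids)
```

In practice a distance threshold would also be applied so that unfamiliar faces are rejected rather than forced onto the nearest gallery entry.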
With this approach, the scoring result of the face recognition algorithm characterizes how similar the face image to be recognized is to the base face images: if it is highly similar to some base face image, the identity of the person in it is clear, i.e. the image is suitable for face recognition; if it is not highly similar to any base face image, the identity is unclear, i.e. the image is not suitable for face recognition.
If the scoring result of the face recognition algorithm is positively correlated with the similarity between the face image to be recognized and the base face images, the preset rule for selecting the target image may be to select the largest first score as the target score; pushing the target image so selected to the server for face recognition can then be expected to yield a relatively accurate recognition result.
In some implementations, the plurality of first images in step S300 may come from a first face track. A first face track is a set of temporally consecutive face images (first images) that, in principle, all belong to the same person; in principle, therefore, any first image could be picked from the first face track for face recognition. In practice, however, for the sake of the recognition result, the target image selected from the first face track should preferably show the face in a normal state (for example, unoccluded, frontal, well lit), which is why target-image selection is performed according to the first scores output by the first neural network.
For example, the first face track may be a set of consecutive frames in a video that contain the face of the same person. A video may of course contain several people, and each person's face may form its own face track; "first face track" does not denote one specific track, since the images of every face track are pushed in the same way, so the first face track is generic.
The first face track may be extracted from the video in the following (non-limiting) manner:
First, face detection is performed on the video frames of a first video to obtain a plurality of first images, where each first image corresponds to a face frame detected in a video frame. The face detection method is not limited; it may be, for example, FPN, SSD, or Faster R-CNN.
Next, the obtained first images are associated to form the first face track. The main purpose of this step is to ensure that all first images in the first face track belong to the same person. For example, association may be performed by intersection-over-union tracking (IoU tracking): first images detected in adjacent frames whose intersection over union exceeds a certain threshold are associated with each other.
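The IoU-tracking association above can be sketched as follows; the box format, threshold value, and function names are illustrative assumptions:

```python
def iou(box_a, box_b):
    """Intersection over union of two face boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def associate(track_last_box, detections, thresh=0.5):
    """Link the track to the best-overlapping detection in the next frame,
    or to nothing if no detection overlaps above the threshold."""
    best = max(detections, key=lambda d: iou(track_last_box, d), default=None)
    if best is not None and iou(track_last_box, best) > thresh:
        return best
    return None
```

Detections never linked to an existing track start a new face track, which is how multiple people in the same video each receive their own track.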
In a first face track determined in this way, each first image is only a face frame cropped from a video frame rather than the whole frame, which makes subsequent processing of the first images more efficient and reduces the bandwidth needed to push the target image to the server. Of course, some implementations may instead take the entire video frame as the first image in the first face track.
Fig. 3 shows a flowchart of a model training method according to an embodiment of the present application. The method is used to train the first neural network; the device performing it is referred to as the training device. The training device may be implemented using the structure of the electronic device 700 of fig. 6, which is described in detail below. It may be a terminal device, a server, or another device. In view of terminal device performance, a reasonable implementation is to train the first neural network on a high-performance device and, after training, deploy it on the terminal device. Referring to fig. 3, the method includes:
step S400: the training device acquires a plurality of second images.
Step S410: the training device inputs the plurality of second images into the first neural network for scoring and obtains a plurality of second scores correspondingly output by the first neural network.
The above two steps are similar to steps S300 and S310; the main difference is that the second images are training images, so the description is not repeated. Note that in step S410 the first neural network is deployed on the training device; as mentioned above, the training device need not be a terminal device, and the first neural network is deployed on the terminal device only after training is complete.
Step S420: the training device obtains a third score obtained by scoring the second image by the image processing algorithm.
The third score characterizes how suitable the second image is for performing the preset image processing task. The introduction to step S310 already explained that the image processing algorithm deployed on the server can score an image to be processed; step S420 is analogous, with the second image playing the role of the image to be processed, so the process is not repeated.
While the first neural network is being trained, the image processing algorithm may be deployed on the server or on the training device; the application does not restrict this, although the image processing algorithm is eventually deployed on the server. If it is deployed on the server and the training device is not the server, the training device must also send the second image to the server; the server scores it with the image processing algorithm and returns the resulting third score to the training device.
Step S430: the training device takes the third score as a label corresponding to the second score, calculates the predicted loss of the first neural network according to the second score and the third score, and updates the parameters of the first neural network based on the predicted loss.
The third score and the second score are, respectively, the image processing algorithm's and the first neural network's scores for the second image. Using the third score as the label for the second score to update (i.e. train) the parameters of the first neural network means that the trained first neural network, when scoring a first image, simulates the scoring of the image processing algorithm deployed on the server; the trained first neural network can therefore be used in the image pushing method provided by the embodiments of this application. Moreover, this training method generates the training labels automatically with the image processing algorithm, without manual annotation, making training simple and efficient. For the back-propagation algorithm used to update the model parameters from the prediction loss, reference may be made to the prior art; it is not explained here.
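The training step above amounts to regressing the first network's output onto the algorithm's score. The sketch below, under heavy simplifying assumptions (the "first neural network" is reduced to a single linear scoring layer, the data are random placeholders, and MSE is one possible choice of prediction loss), shows the label-then-update cycle:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)   # toy "first neural network": score = feats @ w + b
b = 0.0

def predict(feats):
    """Second scores produced by the (toy) first neural network."""
    return feats @ w + b

feats = rng.normal(size=(8, 16))   # placeholder features of second images
teacher = rng.uniform(size=8)      # third scores from the image processing algorithm

initial_loss = float(np.mean((predict(feats) - teacher) ** 2))
lr = 0.01
for _ in range(200):               # gradient descent on the MSE prediction loss
    err = predict(feats) - teacher           # second score minus third-score label
    w -= lr * 2 * feats.T @ err / len(err)   # update parameters from the loss gradient
    b -= lr * 2 * err.mean()
final_loss = float(np.mean((predict(feats) - teacher) ** 2))
```

A real implementation would backpropagate through all layers of the first neural network with a deep learning framework; only the label source (the image processing algorithm rather than a human annotator) is the point being illustrated.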
The method of fig. 3 is described again below in the face recognition scenario, where the second image in the method of fig. 3 is a face image, the image processing algorithm is a face recognition algorithm, the image processing task is a face recognition task, and the third score characterizes the similarity between the second image and the base face images.
Further, in the face recognition scenario, the process by which the face recognition algorithm obtains the third score may be implemented as follows:
First, the second image is input into a pre-trained face recognition network to obtain the features of the second image extracted by that network. The face recognition network may be a neural network, for example the aforementioned second neural network.
Then, the distances between the features of the second image and the features of the base face images are calculated; the base features may be extracted and stored in advance. Image features may be represented as vectors, and the distance may be, for example, the L2 distance between the vectors, which reflects the similarity between the two features, i.e. between the two images they describe (the higher the similarity, the smaller the distance).
Finally, the third score is determined from the smallest of the calculated distances. For example, a mapping between distance and score may be preset, so that the third score is obtained from the determined minimum distance; since the third score is directly tied to the value of the minimum distance, it can characterize the similarity between the second image and the base face images.
Further, since there may be many base face images, the previous step computes many distances, and taking the minimum is equivalent to finding the base image most similar to the second image. The person identity corresponding to that base image would be the result of performing face recognition on the second image, so the third score computed from the minimum distance reflects how suitable the second image is for the face recognition task.
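The minimum-distance-to-score mapping can be sketched as follows; the exponential mapping is one hypothetical choice of the preset distance-to-score relationship, not one prescribed by the application:

```python
import numpy as np

def third_score(query_feat, base_feats, scale=1.0):
    """Map the minimum L2 distance between the second image's feature and
    the stored base features to a score in (0, 1]: the smaller the
    distance, the higher the score."""
    d_min = np.linalg.norm(base_feats - query_feat, axis=1).min()
    return float(np.exp(-d_min / scale))

base_feats = np.eye(2)                         # toy gallery of two features
high = third_score(np.array([1.0, 0.0]), base_feats)   # exact match
low = third_score(np.array([3.0, 3.0]), base_feats)    # far from every base image
```

Any strictly decreasing function of the minimum distance would work equally well as a label for training the first neural network.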
The above implementation gives one specific way for the face recognition algorithm deployed on the server to produce a score; other scoring methods may also be used. Note further that the server-side face recognition algorithm may or may not be based on a neural network; the model training method provided in the embodiments of this application mainly uses the third score it outputs and does not depend on how the algorithm is implemented.
Fig. 4 shows a functional block diagram of an image pushing device 500 according to an embodiment of the present application. Referring to fig. 4, the image pushing device 500 includes:
a first image acquisition module 510 for acquiring a plurality of first images;
The first scoring module 520 is configured to input the plurality of first images into a first neural network deployed on the terminal device for scoring, to obtain a plurality of first scores correspondingly output by the first neural network; the first neural network is used to simulate the scoring result of an image processing algorithm deployed on a server, and the scoring result of the image processing algorithm characterizes the suitability of an image to be processed for performing a preset image processing task;
and the image pushing module 530 is configured to determine, as a target image, a first image corresponding to a target score that satisfies a preset rule from the plurality of first scores, and send the target image to the server, so that the server processes the target image by using the image processing algorithm.
In one implementation manner of the image pushing device 500, the first image is a face image, the image processing algorithm is a face recognition algorithm, the image to be processed is a face image to be recognized, the image processing task is a face recognition task, and a scoring result of the face recognition algorithm characterizes a similarity degree between the face image to be recognized and a face image of a base.
In one implementation of the image pushing device 500, the first image acquisition module 510 acquires a plurality of first images, including: a first face track is acquired, the first face track including a plurality of first images.
In one implementation of the image pushing device 500, the first image obtaining module 510 obtains a first face track, including: performing face detection on video frames in a first video to obtain a plurality of first images; wherein each first image corresponds to a face frame detected from the video frame; the plurality of first images are associated to form the first face track.
In one implementation of the image pushing device 500, the first neural network includes at least one convolutional layer and at least one fully-connected layer.
In one implementation of the image pushing device 500, the parameters of the first neural network and the features generated in the first neural network are both quantized.
In one implementation of the image pushing device 500, the preset rule includes taking a maximum value if the scoring result of the image processing algorithm is positively correlated with the suitability of the image to be processed for performing the image processing task.
The image pushing device 500 according to the embodiment of the present application has been described in the foregoing method embodiment, and for brevity, reference may be made to the corresponding content in the method embodiment for the purpose of description.
Fig. 5 shows a functional block diagram of a model training apparatus 600 provided in an embodiment of the present application. Referring to fig. 5, the model training apparatus 600 includes:
a second image acquisition module 610, configured to acquire a plurality of second images;
a second scoring module 620, configured to input the second image into a first neural network to be deployed on a terminal device for scoring, to obtain a second score output by the first neural network;
a third scoring module 630, configured to obtain a third score obtained by scoring the second image by using an image processing algorithm deployed on a server; wherein the third score characterizes how well the second image is used to perform a preset image processing task;
and a parameter updating module 640, configured to take the third score as a corresponding label of the second score, calculate a predicted loss of the first neural network according to the second score and the third score, and update a parameter of the first neural network based on the predicted loss.
In one implementation of the model training apparatus 600, the second image is a face image, the image processing algorithm is a face recognition algorithm, the image processing task is a face recognition task, and the third score characterizes a similarity degree between the second image and the face image of the base.
In one implementation of the model training apparatus 600, the third scoring module 630 is configured to score the second image by using a face recognition algorithm deployed on a server, where the third scoring module includes: inputting the second image into a pre-trained face recognition network to obtain the characteristics of the second image extracted by the face recognition network; and calculating the distance between the features of the second image and the features of the face image of the base, and determining the third score based on the minimum distance in the calculated distances.
The model training apparatus 600 provided in the embodiments of the present application has been described in the foregoing method embodiments, and for brevity, reference may be made to the corresponding contents in the method embodiments where the apparatus embodiment portion is not mentioned.
Fig. 6 shows one possible structure of an electronic device 700 provided in an embodiment of the present application. Referring to fig. 6, an electronic device 700 includes: processor 710, memory 720, and communication interface 730, which are interconnected and communicate with each other by a communication bus 740 and/or other forms of connection mechanisms (not shown).
The memory 720 includes one or more memories (only one is shown in the figure), which may be, but are not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), and the like. The processor 710, as well as other possible components, may access the memory 720 to read and/or write data therein.
The processor 710 includes one or more processors (only one is shown), which may be integrated circuit chips with signal processing capabilities. The processor 710 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a micro control unit (Micro Controller Unit, MCU), a network processor (Network Processor, NP), or another conventional processor; it may also be a special-purpose processor, including a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The communication interface 730 includes one or more interfaces (only one is shown), which may be used to communicate directly or indirectly with other devices for data exchange. The communication interface 730 may include interfaces for wired and/or wireless communication.
One or more computer program instructions may be stored in memory 720 that may be read and executed by processor 710 to implement the image pushing method and/or the model training method provided by embodiments of the present application.
It is to be understood that the configuration shown in fig. 6 is illustrative only, and that electronic device 700 may also include more or fewer components than shown in fig. 6, or have a different configuration than shown in fig. 6. The components shown in fig. 6 may be implemented in hardware, software, or a combination thereof. The electronic device 700 may be a physical device such as a server, a PC, a notebook, a tablet, a cell phone, a wearable device, an image capturing device, a vehicle-mounted device, a drone, a robot, etc., or may be a virtual device such as a virtual machine, a virtualized container, etc. The electronic device 700 is not limited to a single device, and may be a combination of a plurality of devices or one or more clusters formed by a large number of devices. For example, the terminal device 100 in fig. 1 may be implemented by using the structure of the electronic device 700, and the training device in fig. 3 may also be implemented by using the structure of the electronic device 700.
The embodiment of the application also provides a computer readable storage medium, and the computer readable storage medium stores computer program instructions, and when the computer program instructions are read and executed by a processor of a computer, the image pushing method and/or the model training method provided by the embodiment of the application are executed. For example, the computer-readable storage medium may be implemented as memory 720 in electronic device 700 in FIG. 6.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
Further, the units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, functional modules in various embodiments of the present application may be integrated together to form a single portion, or each module may exist alone, or two or more modules may be integrated to form a single portion.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application, and various modifications and variations may be suggested to one skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (14)

1. An image pushing method, which is applied to a terminal device, the method comprising:
acquiring a plurality of first images;
respectively inputting the plurality of first images into a first neural network deployed on the terminal device for scoring, and obtaining a plurality of first scores correspondingly output by the first neural network; wherein the first neural network is used to simulate the scoring result of an image processing algorithm deployed on a server, and the scoring result of the image processing algorithm is used for representing the suitability of the image to be processed for executing a preset image processing task;
And determining a first image corresponding to a target score meeting a preset rule in the first scores as a target image, and sending the target image to the server so that the server processes the target image by using the image processing algorithm.
2. The image pushing method according to claim 1, wherein the first image is a face image, the image processing algorithm is a face recognition algorithm, the image to be processed is a face image to be recognized, the image processing task is a face recognition task, and a scoring result of the face recognition algorithm characterizes a similarity degree between the face image to be recognized and a face image of a base.
3. The image pushing method according to claim 2, wherein the acquiring a plurality of first images includes:
a first face track is acquired, the first face track including a plurality of first images.
4. The image pushing method according to claim 3, wherein the acquiring the first face track includes:
performing face detection on video frames in a first video to obtain a plurality of first images; wherein each first image corresponds to a face frame detected from the video frame;
The plurality of first images are associated to form the first face track.
5. The image pushing method of claim 1, wherein the first neural network comprises at least one convolutional layer and at least one fully-connected layer.
6. The image pushing method according to claim 1, wherein the parameters of the first neural network and the features generated in the first neural network are quantized.
7. The image pushing method according to any one of claims 1-6, wherein the preset rule comprises taking a maximum value if the scoring result of the image processing algorithm is positively correlated with the suitability of the image to be processed for performing the image processing task.
8. A model training method for training a first neural network for use in the image pushing method of any one of claims 1-7, the method comprising:
acquiring a plurality of second images;
inputting the second image to the first neural network for deployment on terminal equipment to score, and obtaining a second score output by the first neural network;
obtaining a third score obtained by scoring the second image by an image processing algorithm deployed on a server; wherein the third score characterizes how well the second image is used to perform a preset image processing task;
And taking the third score as a label corresponding to the second score, calculating the predicted loss of the first neural network according to the second score and the third score, and updating the parameters of the first neural network based on the predicted loss.
9. The model training method of claim 8, wherein the second image is a face image, the image processing algorithm is a face recognition algorithm, the image processing task is a face recognition task, and the third score characterizes a degree of similarity between the second image and a base face image.
10. The model training method of claim 9, wherein obtaining a third score for scoring the second image for a face recognition algorithm deployed on a server comprises:
inputting the second image into a pre-trained face recognition network to obtain the characteristics of the second image extracted by the face recognition network;
and calculating the distance between the features of the second image and the features of the face image of the base, and determining the third score based on the minimum distance in the calculated distances.
11. An image pushing apparatus, characterized by being applied to a terminal device, comprising:
The first image acquisition module is used for acquiring a plurality of first images;
the first scoring module is used for respectively inputting the plurality of first images into a first neural network deployed on the terminal device for scoring, so as to obtain a plurality of first scores correspondingly output by the first neural network; wherein the first neural network is used to simulate the scoring result of an image processing algorithm deployed on a server, and the scoring result of the image processing algorithm is used for representing the suitability of the image to be processed for executing a preset image processing task;
and the image pushing module is used for determining a first image corresponding to a target score meeting a preset rule in the first scores as a target image, and sending the target image to the server so that the server can process the target image by using the image processing algorithm.
12. A model training apparatus for training a first neural network for use in the image pushing method of any one of claims 1-7, the apparatus comprising:
the second image acquisition module is used for acquiring a plurality of second images;
the second scoring module is used for inputting the second image into the first neural network for deployment on terminal equipment to score so as to obtain a second score output by the first neural network;
A third scoring module, configured to obtain a third score obtained by scoring the second image by using an image processing algorithm deployed on a server; wherein the third score characterizes how well the second image is used to perform a preset image processing task;
and the parameter updating module is used for taking the third score as a label corresponding to the second score, calculating the predicted loss of the first neural network according to the second score and the third score, and updating the parameter of the first neural network based on the predicted loss.
13. A computer readable storage medium, having stored thereon computer program instructions which, when read and executed by a processor, perform the method of any of claims 1-10.
14. An electronic device, comprising: a memory and a processor, the memory having stored therein computer program instructions which, when read and executed by the processor, perform the method of any of claims 1-10.
CN202010263423.8A 2020-04-03 2020-04-03 Image pushing method, model training method and corresponding devices Active CN111488476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010263423.8A CN111488476B (en) 2020-04-03 2020-04-03 Image pushing method, model training method and corresponding devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010263423.8A CN111488476B (en) 2020-04-03 2020-04-03 Image pushing method, model training method and corresponding devices

Publications (2)

Publication Number Publication Date
CN111488476A CN111488476A (en) 2020-08-04
CN111488476B true CN111488476B (en) 2023-06-27

Family

ID=71810837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010263423.8A Active CN111488476B (en) 2020-04-03 2020-04-03 Image pushing method, model training method and corresponding devices

Country Status (1)

Country Link
CN (1) CN111488476B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882048A (en) * 2020-09-28 2020-11-03 深圳追一科技有限公司 Neural network structure searching method and related equipment

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10860898B2 (en) * 2016-10-16 2020-12-08 Ebay Inc. Image analysis and prediction based visual search
JP6854881B2 (en) * 2017-03-27 2021-04-07 株式会社日立国際電気 Face image matching system and face image search system
CN108229277B (en) * 2017-03-31 2020-05-01 北京市商汤科技开发有限公司 Gesture recognition method, gesture control method, multilayer neural network training method, device and electronic equipment
CN108228871A (en) * 2017-07-21 2018-06-29 北京市商汤科技开发有限公司 Facial image dynamic storage method and device, electronic equipment, medium, program
KR102056806B1 (en) * 2017-12-15 2019-12-18 주식회사 하이퍼커넥트 Terminal and server providing a video call service
US11481583B2 (en) * 2017-12-28 2022-10-25 Intel Corporation Algorithm management blockchain
KR102079091B1 (en) * 2018-01-31 2020-02-19 주식회사 하이퍼커넥트 Terminal and image processing method thereof
CN110363279B (en) * 2018-03-26 2021-09-21 华为技术有限公司 Image processing method and device based on convolutional neural network model
CN108830288A (en) * 2018-04-25 2018-11-16 北京市商汤科技开发有限公司 Image processing method, the training method of neural network, device, equipment and medium
CN110188627B (en) * 2019-05-13 2021-11-23 睿视智觉(厦门)科技有限公司 Face image filtering method and device
CN110516671B (en) * 2019-08-27 2022-06-07 腾讯科技(深圳)有限公司 Training method of neural network model, image detection method and device
CN110909815B (en) * 2019-11-29 2022-08-12 深圳市商汤科技有限公司 Neural network training method, neural network training device, neural network processing device, neural network training device, image processing device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Y. Yin et al.; 2017 International Joint Conference on Neural Networks (IJCNN); 2017; full text. *
Optimization of color image information feature retrieval in digital image libraries; Chen Feng; Computer Simulation (Issue 08); full text *

Also Published As

Publication number Publication date
CN111488476A (en) 2020-08-04

Similar Documents

Publication Publication Date Title
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN110807385B (en) Target detection method, target detection device, electronic equipment and storage medium
CN109697434B (en) Behavior recognition method and device and storage medium
US20180114071A1 (en) Method for analysing media content
CN113936256A (en) Image target detection method, device, equipment and storage medium
CN110765860A (en) Tumble determination method, tumble determination device, computer apparatus, and storage medium
CN112200041B (en) Video motion recognition method and device, storage medium and electronic equipment
US20200012887A1 (en) Attribute recognition apparatus and method, and storage medium
CN111652181B (en) Target tracking method and device and electronic equipment
CN112633159B (en) Human-object interaction relation identification method, model training method and corresponding device
CN112052837A (en) Target detection method and device based on artificial intelligence
CN111723773B (en) Method and device for detecting carryover, electronic equipment and readable storage medium
CN109325539B (en) Insulator string drop detection method and device
JP2024511171A (en) Action recognition method and device
CN109783680A (en) Image pushing method, image acquiring method, device and image processing system
CN114782859B (en) Method for establishing target behavior perception space-time positioning model and application
CN109740527B (en) Image processing method in video frame
CN111488476B (en) Image pushing method, model training method and corresponding devices
CN112084371B (en) Movie multi-label classification method and device, electronic equipment and storage medium
CN113570615A (en) Image processing method based on deep learning, electronic equipment and storage medium
CN116543333A (en) Target recognition method, training method, device, equipment and medium of power system
CN113591865B (en) Loop detection method and device and electronic equipment
CN115512207A (en) Single-stage target detection method based on multipath feature fusion and high-order loss sensing sampling
CN114998814A (en) Target video generation method and device, computer equipment and storage medium
CN113537359A (en) Training data generation method and device, computer readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant