CN110569721B

CN110569721B - Recognition model training method, image recognition method, device, equipment and medium

Info

Publication number: CN110569721B
Application number: CN201910706615.9A
Authority: CN
Inventors: 王健宗; 彭俊清; 瞿晓阳
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-08-01
Filing date: 2019-08-01
Publication date: 2023-08-29
Anticipated expiration: 2039-08-01
Also published as: CN110569721A; WO2021017261A1

Abstract

The invention discloses a recognition model training method, an image recognition device, equipment and a medium. The recognition model training method comprises the following steps: acquiring an original positive sample image and an original negative sample image carrying first labeling information; performing downsampling on the original positive sample image and the original negative sample image, and performing screenshot processing by using a screenshot tool to obtain a target positive sample image and a target negative sample image carrying second labeling information; inputting the target positive sample image and the target negative sample image into an MB-FCN model for model training, and obtaining a target MB-FCN detector; taking an original positive sample image and a target positive sample image which are matched with a current image identifier in the first annotation information and a source image identifier in the second annotation information as a group of target training data; and inputting the target training data into the GAN model for model training to obtain the target GAN model. The target GAN model can be helpful for improving the accuracy of fuzzy image face recognition.

Description

Recognition model training method, image recognition method, device, equipment and medium

Technical Field

The present invention relates to the field of image recognition technologies, and in particular, to a recognition model training method, an image recognition device, an apparatus, and a medium.

Background

Along with the rapid development of the face detection technology, the accuracy of the face detector for recognizing large and medium-sized face images is rapidly improved. In the practical application process, the face detection technology generally needs to identify smaller or more blurred face images so as to identify the identities of people corresponding to the face images, but the traditional face detector has lower accuracy in identifying the smaller or more blurred face images, and affects the practical application effect. For example, in a scene of crime suspects tracking, missing person searching or other person searching, a face detector may be used to identify a monitored image shot by a monitoring device to determine the trace of the crime suspects, missing persons or other target persons, and due to limitations of factors such as shooting pixels or shooting angles of the monitoring device, the faces of the crime suspects, missing persons or other target persons in the monitored image are smaller or more fuzzy, and when the conventional face detector is used for identification, the identities of the crime suspects, missing persons or other target persons cannot be accurately identified, so that the tracking efficiency of the target persons is affected.

Disclosure of Invention

The embodiment of the invention provides a recognition model training method, an image recognition device, equipment and a medium, which are used for solving the problem of lower accuracy rate when a current face detector recognizes a smaller or more blurred face image.

A method of training an identification model, comprising:

acquiring an original positive sample image and an original negative sample image carrying first annotation information, wherein the first annotation information comprises a current image identifier, a face identifier and a first fuzzy identifier;

performing downsampling processing on the original positive sample image and the original negative sample image to respectively obtain a corresponding positive sample thumbnail and a corresponding negative sample thumbnail;

performing screenshot processing on the positive sample thumbnail and the negative sample thumbnail by using a screenshot tool to obtain a target positive sample image and the target negative sample image carrying second annotation information, wherein the second annotation information comprises a source image identifier, a face identifier and a second fuzzy identifier;

inputting the target positive sample image and the target negative sample image with different face identifications into an MB-FCN model for model training to obtain a target MB-FCN detector;

The original positive sample image and the target positive sample image which are matched with the current image identification in the first annotation information and the source image identification in the second annotation information are used as a group of target training data;

and inputting the target training data into a GAN model for model training to obtain a target GAN model, wherein a generating network in the target GAN model is a generating network formed based on a super-resolution reconstruction technology.

An identification model training apparatus comprising:

the system comprises an original sample image acquisition module, a first image acquisition module and a second image acquisition module, wherein the original sample image acquisition module is used for acquiring an original positive sample image and an original negative sample image carrying first annotation information, and the first annotation information comprises a current image identifier, a face identifier and a first fuzzy identifier;

the sample thumbnail acquisition module is used for carrying out downsampling on the original positive sample image and the original negative sample image to respectively acquire a corresponding positive sample thumbnail and a corresponding negative sample thumbnail;

the target sample image acquisition module is used for carrying out screenshot processing on the positive sample thumbnail and the negative sample thumbnail by adopting a screenshot tool to acquire a target positive sample image and the target negative sample image carrying second annotation information, wherein the second annotation information comprises a source image identifier, a face identifier and a second fuzzy identifier;

The target detector acquisition module is used for inputting the target positive sample image and the target negative sample image with different face identifications into an MB-FCN model for model training to acquire a target MB-FCN detector;

the target training data acquisition module is used for taking the original positive sample image and the target positive sample image which are matched with the current image identification in the first labeling information and the source image identification in the second labeling information as a group of target training data;

the target GAN model acquisition module is used for inputting the target training data into a GAN model for model training to acquire a target GAN model, and the generation network in the target GAN model is a generation network formed based on a super-resolution reconstruction technology.

A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the above-described recognition model training method when executing the computer program.

A computer readable storage medium storing a computer program which when executed by a processor implements the recognition model training method described above.

An image recognition method, comprising:

acquiring an image to be identified, and carrying out face detection and image interception on the image to be identified by adopting the target MB-FCN detector to acquire a target face image;

generating the target face image by adopting a generating network formed based on the super-resolution reconstruction technology in the target GAN model to acquire a target generated image;

performing feature similarity calculation on the target generated image and a target person image in a target image library to obtain feature similarity;

and if the feature similarity is larger than a similarity threshold, acquiring a detection result of determining the person corresponding to the target face image as the target person.

An image recognition apparatus comprising:

the target face image acquisition module is used for acquiring an image to be identified, carrying out face detection and image interception on the image to be identified by adopting the target MB-FCN detector, and acquiring a target face image;

the target generation image acquisition module is used for generating the target face image by adopting a generation network formed based on the super-resolution reconstruction technology in the target GAN model to acquire a target generation image;

The feature similarity acquisition module is used for carrying out feature similarity calculation on the target generated image and the target personnel image in the target image library to acquire feature similarity;

and the detection result acquisition module is used for acquiring a detection result of determining the person corresponding to the target face image as the target person if the feature similarity is greater than a similarity threshold.

A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the above image recognition method when executing the computer program.

A computer readable storage medium storing a computer program which, when executed by a processor, implements the image recognition method described above.

In the recognition model training method, the device, the equipment and the medium, the original positive sample image and the original negative sample image carrying the first labeling information are subjected to downsampling processing and image interception operation to obtain the target positive sample image and the target negative sample image carrying the second labeling information, so that the original positive sample image and the target positive sample image have an association relationship, and the realization of subsequent model training is facilitated. And carrying out MB-FCN model training by adopting a target positive sample image and a target negative sample image with different face marks and second fuzzy marks, so that the target MB-FCN model obtained by training can be used for rapidly and accurately comparing the face images in the fuzzy images. And taking the original positive sample image and the target positive sample image which are matched with the current image identification in the first annotation information and the source image identification in the second annotation information as a group of target training data, and performing GAN model training by utilizing the target training data to update model parameters in a generating network and a judging network formed based on a super-resolution reconstruction technology, so that the generated target GAN model can reconstruct a smaller or more blurred low-resolution image into a clearer super-resolution image so as to perform face recognition by utilizing the super-resolution image, thereby guaranteeing the accuracy and efficiency of face recognition.

According to the image recognition method, device, equipment and medium, the target MB-FCN model is adopted to perform face detection and image interception on the image to be processed with low resolution, so that the target face image containing the face is obtained, the data size of subsequent image reconstruction and image recognition is reduced, and the recognition efficiency is improved. Then, a generating network formed by a super-resolution reconstruction technology of the target GAN model is adopted to process the target face image so as to obtain a clearer target generating image with super resolution, so that the recognition accuracy is higher and the recognition efficiency is faster when the target generating image is subsequently recognized. And finally, comparing the feature similarity calculated by the target generated image and the target person image with a similarity threshold value to acquire a detection result of whether the person corresponding to the target face image is a target person or not, thereby realizing quick tracking of the target person.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic view of an application environment of a recognition model training method or an image recognition method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method of training an identification model in accordance with an embodiment of the present invention;

FIG. 3 is another flow chart of a method of training a recognition model in an embodiment of the invention;

FIG. 4 is another flow chart of a method of training a recognition model in an embodiment of the invention;

FIG. 5 is another flow chart of a method of training a recognition model in an embodiment of the invention;

FIG. 6 is another flow chart of a method of training a recognition model in an embodiment of the invention;

FIG. 7 is another flow chart of an image recognition method according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of an apparatus for training a recognition model according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of an image recognition apparatus according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a computer device in accordance with an embodiment of the invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The recognition model training method provided by the embodiment of the invention can be applied to an application environment shown in fig. 1. Specifically, the recognition model training method is applied to a recognition model training system, the recognition model training system comprises a client and a server as shown in fig. 1, and the client and the server are communicated through a network and are used for training a recognition model capable of accurately recognizing smaller or more fuzzy face images so as to expand the application of the recognition model in the field of image recognition. The client is also called a client, and refers to a program corresponding to the server for providing local service for the client. The client may be installed on, but is not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.

In one embodiment, as shown in fig. 2, a recognition model training method is provided, taking the application of the recognition model training method in the server shown in fig. 1 as an example, the recognition model training method includes the following steps:

s201: and acquiring an original positive sample image and an original negative sample image carrying first annotation information, wherein the first annotation information comprises a current image identifier, a face identifier and a first fuzzy identifier.

Wherein the original positive sample image is a clearer image containing a human face for model training. The original negative-sample image is a clearer image for model training that does not contain a human face.

The first labeling information is labeling information corresponding to the original positive sample image and the original negative sample image, and is information for identifying image attributes of the original positive sample image and the original negative sample image. In this embodiment, after the server obtains the original positive sample image and the original negative sample image, the original positive sample image and the original negative sample image need to be labeled by adopting a preset labeling rule, so that each original positive sample image and each original negative sample image carry corresponding first labeling information. The preset labeling rule is a rule for labeling a current image identifier, a source image identifier, a face identifier and a fuzzy identifier corresponding to the image.

The current image identifier is a unique identifier for identifying the image to be marked, and can be determined according to the sequence number obtained by the image, for example, T001 represents the 1 st image, and T002 represents the 2 nd image. The source image identifier is an identifier for identifying a source image corresponding to an image to be annotated, and can be understood as an identifier corresponding to a source image of an original positive sample image or an original negative sample image, for example, P001 represents a 1 st source image, and P002 represents a 2 nd source image. The face identification refers to identification for distinguishing whether an image to be marked contains a face or not, and the face identification comprises a first face identification for indicating that the image contains the face and a second face identification for indicating that the image does not contain the face. For example, a first face identification may be represented by 1 and a second face identification may be represented by 0. The blurred identity is an identity for distinguishing whether an image to be annotated is a blurred image or not, and the blurred identity comprises a first blurred identity for indicating that the image is not blurred and a second blurred identity for indicating that the image is blurred, namely the first blurred identity is an identity for indicating that the image is clearer. For example, a first ambiguous identifier may be represented by 0 and a second ambiguous identifier may be represented by 1.

Because the original positive sample image is a clearer image containing a human face for model training, the first labeling information carried by the original positive sample image comprises a current image identifier, a source image identifier, a first human face identifier and a first fuzzy identifier. For example, { T001, P001,1,0} means that the image identified as T001 for the current image is a sharper image containing a human face taken from the source image identified as P001 for the source.

Because the original negative sample image is a clearer image which is used for model training and does not contain a human face, the first annotation information carried by the original negative sample image comprises a current image identifier, a source image identifier, a second human face identifier and a first fuzzy identifier. For example, { T002, P002, 0} means that the image identified as T002 for the current image is a sharper image containing no faces taken from the source image identified as P002 for the source.

Further, in order to avoid interference caused by inconsistent sizes of sample images in the subsequent recognition model training process, the accuracy and efficiency of the finally formed recognition model training are affected, and the sizes of the original positive sample image and the original negative sample image can be limited to be consistent. For example, an original positive sample image containing a face and an original negative sample image excluding a face, which are identical in image size, are taken from the same clearer image.

S202: and performing downsampling processing on the original positive sample image and the original negative sample image to respectively obtain a corresponding positive sample thumbnail and a corresponding negative sample thumbnail.

Among them, downsampling (Subsampled), also called downsampling (downsampling), is a process for reducing an image so that the image conforms to the size of a display area or for generating a thumbnail of the image. In the downsampling process, if the size of an image I is m×n, s times downsampling is performed on the image I to obtain a resolution image with a size of (M/s) ×n/s, where s is a common divisor of M and N. For an image in matrix form, the image within the s-s window of the image needs to be changed into a pixel, and the value of the pixel point is the average value of all pixels within the window. In this embodiment, methods such as nearest neighbor interpolation, bilinear interpolation, mean interpolation, and median interpolation may be used in the downsampling process.

Specifically, the server performs downsampling processing on the original positive sample image and the original negative sample image to different degrees so as to reduce the original positive sample image and the original negative sample image into positive sample thumbnails and negative sample thumbnails with different scaling ratios, and the blurring processing purpose is achieved by reducing the resolution ratio of the positive sample thumbnail and the negative sample thumbnail. For example, the original positive and negative sample images may be reduced to 20% positive and negative sample thumbnails, respectively.

S203: and carrying out screenshot processing on the positive sample thumbnail and the negative sample thumbnail by adopting a screenshot tool to obtain a target positive sample image and a target negative sample image carrying second labeling information, wherein the second labeling information comprises a source image identifier, a face identifier and a second fuzzy identifier.

Specifically, the server firstly adopts a screenshot tool to perform screenshot processing on the positive sample thumbnail and the negative sample thumbnail, and obtains corresponding target positive sample images and target negative sample images, so that the images of the obtained target positive sample images and target negative sample images are smaller or more blurred, and the images are used as training data for training an identification model capable of identifying the smaller or more blurred images. And labeling the target positive sample image and the target negative sample image to obtain the target positive sample image and the target negative sample image carrying the second labeling information. The screenshot tools include, but are not limited to, openCV screenshot tools employed in this embodiment, openCV (Open Source Computer Vision Library ) is a cross-platform computer vision library issued based on BSD permissions (open source) that can run on Linux, windows, android and Mac OS operating systems. The system is lightweight and efficient, is composed of a series of C functions and a small number of C++ classes, provides interfaces of Python, ruby, MATLAB and other languages, and realizes a plurality of general algorithms in the aspects of image processing and computer vision.

In this embodiment, the second labeling information carried by the target positive sample image includes a current image identifier, a source image identifier, a first face identifier and a second blur identifier. For example, the second annotation information of the target positive sample image obtained based on the original positive sample image whose first annotation information is { T001, P001,1,0} is (M001, T001, 1}, i.e., the image whose current image is identified as M001 is a blurred image containing a human face obtained from the source image whose source is identified as T001, where the source image whose source is identified as T001 refers to the original positive sample image whose current image is identified as T001.

Correspondingly, the second annotation information carried by the target negative sample image comprises a current image identifier, a source image identifier, a second face identifier and a second fuzzy identifier. For example, the second labeling information of the target negative image acquired based on the original negative image whose first labeling information is { T002, P002, 0} is { M002, T002,0,1}, that is, the image whose current image is identified as M002 is a blurred image containing a human face acquired from the source image whose source is identified as T002, where the source image whose source is identified as T002 refers to the original negative image whose current image is identified as T002. It can be appreciated that the current image identifier T002 in the first labeling information of the original negative-sample image is matched with the source image identifier T002 in the second labeling information of the target negative-sample image, which indicates that the target negative-sample image is an image obtained after blurring processing based on a clearer original negative-sample image.

S204: and inputting the target positive sample image and the target negative sample image with different face identifications into an MB-FCN model for model training to obtain a target MB-FCN detector.

Wherein the target MB-FCN (Multi-branch fully convolutional network, multi-branch full convolutional network) detector is a detector for identifying whether the more blurred image is a face image or not, so that the target MB-FCN detector is subsequently utilized to identify smaller or more blurred faces from the image to be identified. Since the target MB-FCN detector is a detector for identifying whether the blurred image is a face image, training with the blurred images including a face and without a face is required in order to train the detector, a target positive sample image (i.e., the blurred image including a face) carrying the first face identification and the second blurred identification is required as a positive sample of the target MB-FCN detector, and a target negative sample image (i.e., the blurred image without a face) carrying the second face identification and the second blurred identification is required as a negative sample of the target MB-FCN detector.

Compared with a network used by a common CNN classification model, which is connected with a full connection layer after a convolution layer, the FCN (fully convolutional network) can accept an input image with any size by compressing an original two-dimensional matrix into a dimensional output vector (namely a classification label), and the deconvolution layer is adopted to up-sample the feature map of the last convolution layer so as to restore the feature map to the same size of the input image, so that a prediction can be generated for each pixel, meanwhile, the spatial information in the original input image is reserved, and finally, the pixel-by-pixel classification is carried out on the up-sampled feature map. The FCN changes the fully connected layer of the CNN to a convolutional layer, so that the entire network becomes a convolutional layer-and-pooling layer-only network, which is then called a fully convolutional network. FCNs are typically used to semantically segment images, and therefore, it is necessary to classify each pixel on an image, which requires an upsampling to upsample the resulting output to the size of the original image, thereby supporting semantic segmentation, so that FCN-based detectors have higher recognition accuracy and higher recognition efficiency than CNN detectors.

In order to further improve the recognition accuracy and recognition efficiency of the detector, in this embodiment, an MB-FCN (Multi-branch fully convolutional network, multi-branch full convolutional network) model is used to perform training instead of the FCN model, so as to obtain the target MB-FCN detector. The MB-FCN can use feature extractors with different scales to extract richer features, compared with the feature extracted by the FCN, the feature extraction method has better characterization effect, and is beneficial to higher recognition accuracy of the model. In addition, the MB-FCN considers the efficiency and effectiveness in the design process, the facial features in all the proportion ranges can be processed by only one pass through the backbone network, and compared with a method that the FCN needs to be transmitted for many times, the MB-FCN saves calculation, is more effective, and enables the recognition efficiency of the model to be faster.

S205: and taking the original positive sample image and the target positive sample image, which are matched with the current image identification in the first annotation information and the source image identification in the second annotation information, as a group of target training data.

The target training data is training data for training the GAN model. In this embodiment, the first labeling information carried by the original positive sample image includes a current image identifier, a source image identifier, a first face identifier and a first blur identifier, that is, the original positive sample image is a clearer image including a face. The second labeling information carried by the target positive sample image comprises a current image identifier, a source image identifier, a first face identifier and a second blurring identifier, namely the target positive sample image is a blurred image containing a face. If the current image identification in the first labeling information carried by the original positive sample image is matched with the source image identification in the second labeling information carried by the target positive sample image, the target positive sample image is an image obtained after blurring processing based on the clearer original positive sample image.

In this embodiment, the original positive sample image and the target positive sample image in which the current image identifier in the first labeling information is matched with the source image identifier in the second labeling information are used as a set of target training data, so that the original positive sample image and the target positive sample image in each set of target training data are two images with different sharpness based on the same face image, and in the subsequent GAN model training process, the blurred target positive sample image can be verified by using the clearer original positive sample image to update the model parameters in the GAN model.

S206: inputting target training data into a GAN model for model training, and obtaining a target GAN model, wherein a generating network in the target GAN model is a generating network formed based on a super-resolution reconstruction technology.

The generated countermeasure network (Generative Adversarial Networks, abbreviated as GAN) is a deep learning model, and is an unsupervised learning network. The Generation Antagonism Network (GAN) framework includes two sub-networks, one is a generation network (generation) and the other is a discrimination network (discrimination), and a relatively good output is generated by the generation network and the discrimination network for game learning with each other. In this embodiment, in the generation network of the GAN model, the facet is sampled to a fine scale using a super resolution reconstruction (Super Resolution Reconstruction, hereinafter referred to as SPN) technique to find a finer facet; correspondingly, in the discrimination network, a loss function is adopted, and whether the image contains a human face or not is forcedly discriminated by the loss function, namely, whether the human face or the non-human face is discriminated.

Specifically, the server inputs each set of target training data into the GAN model for model training, i.e., inputs the clearer original positive sample data and the blurred target positive sample image of the same content into the GAN model for training. In the training process of the GAN model, a generating network formed based on a super-resolution reconstruction technology is adopted to reconstruct a relatively blurred target positive sample image, then the generated image and an original positive sample image are input into a judging network of the GAN for processing, and the output value of the judging network is utilized for iterative updating, so that model parameters corresponding to the generating network and the judging network of the GAN model are updated, and the target GAN model is obtained, so that the target GAN model is ensured to reconstruct a relatively small or relatively blurred low-resolution image into a relatively clear super-resolution image, and the super-resolution image is utilized for face recognition, so that the accuracy and the efficiency of face recognition are ensured.

In a specific embodiment, the server counts the target number corresponding to the target training data, and if the target number is greater than the number threshold, divides the target training data into at least two target batches of target training data, so that the target number of the target training data of each target batch is less than the number threshold. And then inputting the target training data into the GAN model in batches for model training, and updating model parameters in the GAN model by adopting a small batch gradient descent algorithm to obtain the target GAN model. The process of carrying out GAN model training on the target training data in batches can avoid the condition that the sample set of the target training data is too large and the efficiency and the accuracy of the GAN model training are affected when the display limit of the GPU of the system is limited.

The small batch gradient descent (Mini-batch Gradient Descent, MBGD for short) algorithm is a processing method for updating model parameters by accumulating errors generated in a training process in batches according to preset batches to obtain accumulated errors corresponding to a plurality of batches and adopting the accumulated errors corresponding to the batches. When model parameters such as weight and offset in the neural network are adjusted, the minimum value of the loss function is required to be obtained, and in the embodiment, the minimum value of the loss function can be calculated by adopting a backward propagation algorithm. A Back Propagation algorithm (BP algorithm) is a training and learning method in neural network learning, and is used to adjust weights and offsets between nodes in the neural network.

In the recognition model training method provided by the embodiment, the original positive sample image and the original negative sample image carrying the first annotation information are subjected to downsampling processing and image interception operation so as to obtain the target positive sample image and the target negative sample image carrying the second annotation information, so that the original positive sample image and the target positive sample image have an association relationship, and the realization of subsequent model training is facilitated. And carrying out MB-FCN model training by adopting a target positive sample image and a target negative sample image with different face marks and second fuzzy marks, so that the target MB-FCN model obtained by training can be used for rapidly and accurately comparing the face images in the fuzzy images. The method comprises the steps of taking an original positive sample image and a target positive sample image, which are matched with a current image identifier in first annotation information and a source image identifier in second annotation information, as a group of target training data, performing GAN model training by using the target training data to update model parameters in a generating network and a judging network formed based on a super-resolution reconstruction technology, so that the generated target GAN model can reconstruct a smaller or more blurred low-resolution image into a clearer super-resolution image, and face recognition is performed by using the super-resolution image, thereby guaranteeing the accuracy and efficiency of face recognition.

In one embodiment, as shown in fig. 3, step S201, namely, acquiring an original positive sample image and an original negative sample image carrying first labeling information, specifically includes the following steps:

s301: a training sample image including a face is obtained from an image database.

The image database is a database for storing training sample images containing human faces. The image database can be a system local database, namely a local database connected with the server, so that the server can quickly acquire corresponding training sample images containing the human face. Alternatively, the image database may be a network image database, i.e. a database on the internet for storing image information, such as a hundred degree image database. The training sample image is an unprocessed image for model training.

In one embodiment, if the image database is a network image database, the server needs to use a crawler tool to capture training sample images including faces from the network image database. The crawler tool is a web crawler (also called web spider or web robot), and is a program or script for automatically capturing web information according to a certain rule. The crawler tools include, but are not limited to, python crawler tools. That is, the crawler tool is a tool for crawling an image satisfying a specific condition from the internet, which may be set to include a human face.

Specifically, the server executes the crawler file by adopting a crawler tool, and crawls training sample images meeting the data crawling conditions set by the crawler file from the network image library. Wherein the crawler file includes, but is not limited to, two data crawling conditions, a target URL and a search keyword. The target URL is a URL of a target website of the web image library in the crawler file, where the target website is used to define the web image library where the required crawled image is located, and the URL (Uniform Resource Locator, which is abbreviated as a uniform resource locator) is a concise representation of a location and an access method of a resource available from the internet, and is an address of a standard resource on the internet, that is, a URL address of the web image library. The search keywords refer to keywords in the crawler file, which are used for limiting the common characteristics of the images required to be crawled by the crawler file, and can be specifically set as faces.

S302: and carrying out fuzzy detection on the training sample image by adopting a fuzzy detection algorithm to obtain the corresponding ambiguity of the training sample.

In the training process of the recognition model, a clearer image is required to be adopted to verify a blurred image in the training process of the recognition model, and an original positive sample image and an original negative sample image are clearer sample images, so that the ambiguity of the training sample image needs to be detected in advance, and the training sample image is filtered based on the ambiguity so as to obtain an effective sample image capable of carrying out model training. It will be appreciated that the blur detection algorithm is an algorithm for detecting the blur degree of an image, and may employ a detection algorithm commonly used in the art.

S303: based on the ambiguity of the training sample image, a valid sample image is determined.

Specifically, the server compares the ambiguity of the training sample image with an ambiguity threshold preset by the system; if the ambiguity of the training sample image is smaller than the ambiguity threshold value, determining the training sample image as a valid sample image; if the ambiguity of the training sample image is not less than the ambiguity threshold, the training sample image is not used as an effective sample image, so that the purpose of filtering the blurred training sample image is achieved, the definition of the finally determined effective sample image reaches the preset standard, and the accuracy of the subsequent recognition model training is guaranteed. The blur threshold is a threshold which is preset and used for evaluating whether the blur degree of the image reaches a preset standard. The effective sample image is a training sample image with the ambiguity less than the ambiguity threshold, and is a source image which can be taken as a subsequent image.

S304: and carrying out face detection on the effective sample image by adopting a face detection algorithm to obtain the size of a face area.

The face detection algorithm is an algorithm for identifying whether the image contains a face, and in this embodiment, an industry-general face detection algorithm may be used. Specifically, the server performs face detection on the effective sample image by adopting a face detection algorithm to identify a face area in a face circumscribed rectangular frame, and determines the size of the face area according to the face circumscribed rectangular frame. The human face circumscribed rectangular frame is a rectangular frame for detecting a human face region in an image by adopting a human face detection algorithm. It can be understood that the face circumscribed rectangular frame is a virtual face frame for selecting a face area identified by a face detection algorithm, and the size of the face circumscribed rectangular frame is adjusted automatically along with the size of the face area. It will be appreciated that the size of the face region may be determined by the length and height of the rectangular frame circumscribed by the face.

S305: and if the size of the face area is larger than the size of the preset area, intercepting an original positive sample image and an original negative sample image corresponding to the standard area from the effective sample image.

The preset area size is the smallest area size which is preset and can be cut out as an original positive sample image. The standard region size is a region size for truncating the original positive sample image and the original negative sample image. The preset area size may be the same as or smaller than the standard area size.

When the face detection algorithm is adopted to detect the effective sample image in the step S304, a face external rectangular frame is adopted to select all face areas containing faces, if the face area corresponding to the face area selected by the face external rectangular frame is smaller in size, the face area is indicated to be a small face image, and if the small face image is intercepted as an original positive sample image to carry out subsequent recognition model training, the accuracy of recognition model training is affected because the original positive sample image intercepted based on the small face image is insufficient in definition and cannot be used as a verification image of the target positive sample image after downsampling and thumbnail interception. Therefore, after the server adopts a face detection algorithm to carry out face detection on the effective sample image and obtains the size of the face area corresponding to each face area, the size of the face area is required to be compared with the size of a preset area; if the size of the face area is larger than the size of the preset area, the original positive sample image and the original negative sample image corresponding to the standard area are intercepted from the effective sample image, so that the definition of the intercepted original positive sample image and original negative sample image is ensured, and the accuracy of the recognition model obtained by training is ensured; if the size of the face area is not larger than the size of the preset area, the original positive sample image and the original negative sample image are not required to be intercepted from the effective sample image.

It can be understood that the original positive sample image and the original negative sample image corresponding to the standard area are intercepted from the effective sample image, so that the sizes of the finally obtained original positive sample image and the finally obtained original negative sample image are consistent, interference caused by inconsistent sizes of the sample images in the training process of the recognition model is avoided, and the accuracy rate and the training efficiency of the training of the recognition model are ensured.

In this embodiment, the capturing of the original positive sample image and the original negative sample image corresponding to the standard area size from the effective sample image specifically includes: the server captures a face region and a non-face region in the effective sample image by using an OpenCV tool, and acquires an original positive sample image and an original negative sample image corresponding to the standard region in size. In this embodiment, the server performs screenshot processing on coordinates of 4 angles of a face region and a non-face region in an effective sample image through an OpenCV tool to obtain a corresponding original positive sample image and an original negative sample image, and performs screenshot processing through the OpenCV tool, so that the method is simple in calculation, higher in operation efficiency and more stable in performance.

S306: and labeling the original positive sample image and the original negative sample image to obtain the original positive sample image and the original negative sample image carrying the first labeling information.

Specifically, after the server acquires the original positive sample image and the original negative sample image, the original positive sample image and the original negative sample image need to be marked by adopting a preset marking rule so as to acquire the original positive sample image and the original negative sample image carrying the first marking information. The first annotation information carried by the original positive sample image comprises a current image identifier, a source image identifier, a first face identifier and a first fuzzy identifier. The second annotation information carried by the original negative sample image comprises a current image identifier, a source image identifier, a second face identifier and a first fuzzy identifier.

In the recognition model training method provided by the embodiment, the blur detection is performed on the training sample image so as to filter based on the blur degree of the training sample image to obtain an effective sample image with the blur degree reaching the standard, thereby guaranteeing the definition of obtaining the original positive sample image and the original negative sample image based on the effective sample image and guaranteeing the recognition accuracy of the recognition model obtained by training. Then, when the size of the face area in the effective sample image is larger than the size of the preset area, the original positive sample image and the original negative sample image are intercepted from the effective sample image, so that the definition of the intercepted original positive sample image and original negative sample image is ensured, and the accuracy of the recognition model obtained through training is ensured.

In one embodiment, as shown in fig. 4, step S302, namely, performing blur detection on a training sample image by using a blur detection algorithm, obtains a corresponding blur degree of the training sample image, specifically includes the following steps:

s401: and sharpening the training sample image by using the Laplacian operator to obtain a sharpened image and pixel gray values of the sharpened image.

Wherein the laplace operator (Laplacian operator) is a second order differential operator adapted to improve image blur due to diffuse reflection of light. The principle is that in the process of shooting and recording an image, a light spot diffusely reflects light to the surrounding area, and the degree of blurring of the image caused by the diffuse reflection of the light is often a constant multiple of the Laplace operator compared with the image shot under normal conditions. In this embodiment, the training sample image is sharpened by using a laplace operator, so as to obtain a sharpened image, which specifically includes: and processing the training sample image by using a Laplace operator to obtain a Laplace image describing the gray mutation, and overlapping the Laplace image with the training sample image to obtain a sharpened image. After the sharpened image is acquired, the RGB value of each pixel point in the sharpened image is acquired, and the RGB value is processed to acquire the pixel gray value corresponding to the sharpened image.

S402: and carrying out variance calculation on the pixel gray values of the sharpened image, obtaining a target variance value corresponding to the sharpened image, and determining the target variance value as the ambiguity corresponding to the training sample image.

In this embodiment, variance calculation is performed on the pixel gray values of the sharpened image to obtain the target variance values thereof, and the target variance values can be understood as the ambiguity of the sharpened image. Specifically, performing variance calculation on the pixel gray value of the sharpened image specifically includes: and calculating the square sum of the pixel gray value of each pixel point in the sharpened image minus the average gray value of the sharpened image, and dividing the square sum by the number of the pixel points to obtain the target variance value capable of reflecting the blurring degree of the sharpened image. In this embodiment, the smaller the target variance value, the closer the pixel gray value of each pixel in the sharpened image is to the average gray value, the less the difference between the pixel gray values of each pixel in the sharpened image is, so that the image edge is unclear, and therefore, the smaller the target variance value, the more blurred the sharpened image.

In the recognition model training method provided by the embodiment, the original image is sharpened by the Laplacian first so as to obtain a sharpened image with clearer details than the training sample image, thereby improving the definition of the image. Then, the target variance value of the sharpened image is calculated to represent the difference between the pixel gray values of the pixels of the sharpened image. And taking the target variance value of the sharpened image as the ambiguity of the training sample image so as to achieve the purpose of carrying out fuzzy filtering on the training sample image according to the comparison result of the ambiguity and a preset threshold value, thereby achieving the purpose of obtaining a clearer training sample image.

In one embodiment, as shown in fig. 5, step S204, namely, inputting a target positive sample image and a target negative sample image with different face identifications into an MB-FCN model for model training, and obtaining a target MB-FCN detector, specifically includes the following steps:

s501: and taking the target positive sample image carrying the first face mark and the target negative sample image carrying the second face mark as model training data based on the positive and negative sample distribution proportion.

The positive and negative sample distribution proportion is a proportion preset by the system and used for distributing positive samples and negative samples. The setting of the positive and negative sample proportion is to avoid the phenomenon of overfitting in the model training process. For example, the positive and negative sample ratios in the present embodiment may be set to be equal ratios, i.e., positive sample numbers: negative sample number = 1:1, may be set to other values not exceeding the proportional range threshold. The scale range threshold is a scale range that may cause the model to train through the fit.

Specifically, the server acquires a corresponding target positive sample image carrying the first face mark and a corresponding target negative sample image carrying the second face mark based on the positive and negative sample distribution proportion, and takes the target positive sample image and the target negative sample image as model training data so as to carry out subsequent MB-FCN model training. It will be appreciated that the model training data is a relatively blurred target positive sample image containing a face and a target negative sample image not containing a face.

S502: based on the training test allocation proportion, model training data is divided into a training set and a test set.

The training test allocation proportion is a proportion preset by the system and used for dividing the training set and the test set. The training set is a set of model training data for training the MB-FCN model, and the test set is a set of model training data for testing whether the accuracy of the MB-FCN model obtained by training meets the standard. For example, the training test allocation ratio may be set to the number of images in the training set: number of images in test set = 9:1.

s503: model training data in the training set is input into an MB-FCN model for model training, and an original MB-FCN detector is obtained.

Specifically, the server inputs model training data such as a target positive sample image carrying a first face identifier and a target negative sample image carrying a second face identifier in a training set into an MB-FCN model to carry out model training, in the model training process, error calculation is carried out according to a recognition result corresponding to each model training data and the first face identifier or the second face identifier carried in advance, and a loss function of the model is updated based on the error, so that model parameters in the MB-FCN model are updated based on the loss function, and an original MB-FCN detector is obtained.

S504: and testing the original MB-FCN detector by adopting model training data in the test set to obtain the test accuracy.

Specifically, the server inputs each model training data in the test set to the original MB-FCN detector for testing, and a test result corresponding to the model training data is obtained; if the test result is matched with the face identification corresponding to the model training data, the model training data is successfully tested; otherwise, if the test result is not matched with the face identification corresponding to the model training data, the model test data fails to be tested. Then, the server counts the number corresponding to all the model training data which are successfully tested in the test set as a first number, and counts the number corresponding to all the model training data in the test set as a second number; based on the quotient of the first number and the second number, the test accuracy of the original MB-FCN detector is obtained.

S505: and if the test accuracy is greater than the accuracy threshold, determining the original MB-FCN detector as a target MB-FCN detector.

The accuracy threshold is a threshold which is preset by the system and used for evaluating whether the accuracy of the model meets the standard. Specifically, the server compares the test accuracy of the original MB-FCN detector with an accuracy threshold; if the test accuracy is greater than the accuracy threshold, determining the original MB-FCN detector as a target MB-FCN detector; if the test accuracy is not greater than the accuracy threshold, the model training data needs to be re-acquired to perform model training.

In the recognition model training method provided by the embodiment, corresponding model training data is obtained based on the positive and negative sample distribution proportion, so that the influence on recognition accuracy caused by overfitting of a target MB-FCN detector obtained through final training is avoided. And training the original MB-FCN detector based on model training data in a training set, testing by adopting model training data in a testing set to determine the testing accuracy, and determining the original MB-FCN detector as a target MB-FCN detector when the testing accuracy is greater than an accuracy threshold value so as to enable a smaller or more fuzzy face image to be recognized from the image to be recognized by using the target MB-FCN detector later so as to ensure the recognition accuracy of the smaller or more fuzzy face image.

In one embodiment, as shown in fig. 6, step S206, namely, inputting target training data into the GAN model for model training, and obtaining a target GAN model, specifically includes the following steps:

s601: and inputting the target positive sample image in the target training data into a generation network formed based on the super-resolution reconstruction technology in the GAN model for processing, and obtaining a training generation image.

In this embodiment, the generation network in the GAN model specifically adopts a generation network formed based on the super-resolution reconstruction technique. Specifically, the generating network may adopt a DenseNet framework, including a convolution layer, N DenseBlock (DenseBlock) and a full connection layer, where each DenseBlock (DenseBlock) is provided with an up-sampling subnet and a refinement subnet capable of realizing super-resolution reconstruction. Wherein the up-sampling sub-network may coarsely magnify the resolution of the image by a factor of several, i.e. by an up-sampling operation. The refinement sub-network is a detail which is amplified by several times through fine granularity processing resolution, wherein the refinement sub-network is processed by adopting a deconvolution method. Since in a conventional convolutional neural network, if there are L layers, there are L connections required, and L (L+1)/2 connections are found in the DenseNet framework. I.e. in the DenseNet framework, the inputs of each layer come from the outputs of all the preceding layers, this way of connection makes the network narrower, the parameters smaller, and the transfer of features and gradients more efficient, and the network is also easier to train.

Specifically, the server inputs a relatively blurred target positive sample image with lower resolution in target training data into a generation network of a GAN model, in the generation network of the GAN model, a super-resolution reconstruction technology is adopted to reconstruct the target positive sample image, that is, an up-sampling sub-network and a refinement sub-network in N DenseBlock (DenseBlock) are used to perform iterative processing, so as to obtain a reconstructed super-resolution training generation image, so that model parameters of the GAN model are updated based on the training generation image.

S602: and inputting the original positive sample image and the training generation image in the target training data into a discrimination network in the GAN model for processing, and obtaining a target loss value.

Specifically, the server inputs a clearer original positive sample image in a group of target training data and a training generation image formed by performing super-resolution reconstruction based on the blurred target positive sample image into a discrimination network in the GAN model for processing, so that the original positive sample image is used as a reference object of the reconstructed super-resolution training generation image, and a target loss value between the original positive sample image and the training generation image is obtained.

S603: if the target loss value is larger than the preset loss value, updating the loss function based on the target loss value, updating the model parameters in the generating network and the judging network based on the updated loss function, repeatedly executing the process of inputting the target positive sample image in the target training data into the generating network formed by the super-resolution reconstruction technology in the GAN model, and acquiring the training generating image.

S604: if the target loss value is not greater than the preset loss value, the model converges, and a target GAN model is obtained.

The preset loss value is a minimum loss value preset by the system and used for evaluating the convergence of the model. In this embodiment, the server compares the target loss value with a preset loss value; if the target loss value is greater than the preset loss value, it indicates that the GAN model does not converge to reach the preset standard, at this time, the loss function needs to be updated based on the target loss value, the model parameters in the generating network and the discriminating network need to be updated based on the updated loss function, and the steps S601-S602 are repeatedly executed until the target loss value after iterative calculation is not greater than the preset loss value, so as to obtain the target GAN model with the model converging to reach the preset standard, so that the accuracy of the super-resolution image reconstructed based on the generating network in the target GAN model is higher, and the recognition efficiency is faster. In this embodiment, generating the model parameters of the network includes generating weights and offset values of the network, and discriminating the model parameters in the network includes discriminating the weights and offset values of the network.

In the recognition model training method provided by the embodiment, a relatively blurred target positive sample image is input into a generation network formed based on a super-resolution reconstruction technology for processing, and a relatively clear super-resolution training generation image is obtained; and inputting the training generated image and an original positive sample image corresponding to the same face into a discrimination network for processing, updating model parameters according to the target loss value until the model converges, and obtaining a target GAN model so that the accuracy and the efficiency of the super-resolution image reconstructed by the target GAN model are higher.

Further, in step S205, the original negative image and the target negative image that match the current image identifier in the first labeling information and the source image identifier in the second labeling information may be used as a set of target training data. It can be appreciated that the process of acquiring the target training data corresponding to the negative sample image is the same as the process of acquiring the target training data corresponding to the positive sample image, and in order to avoid repetition, details are not repeated here. Accordingly, in step S601, the target negative sample image in the target training data may be further input into a generating network formed based on the super resolution reconstruction technology in the GAN model for processing, so as to obtain a training generating image.

At this time, the judging network of the GAN model adopts two parallel full-connection layers (Fully connected layer, abbreviated as FC layer), and is used for distinguishing a natural real image (i.e., an original positive sample image or an original negative sample image) and a generated super-resolution image (i.e., a training generated image) in the first full-connection layer; and the second full-connection layer is used for identifying the image containing the human face and the image not containing the human face, so that the target GAN model obtained through final training not only can reconstruct the super-resolution image, but also can identify the human face in the image, and the efficiency of reconstructing the super-resolution image and identifying the human face by the target GAN model is ensured.

The embodiment of the invention also provides an image recognition method which can be applied to the application environment shown in the figure 1. Specifically, the recognition model training method is applied to an image recognition system, the image recognition system comprises a client and a server as shown in fig. 1, the client and the server are communicated through a network, and the recognition model training method is used for recognizing an image to be processed by using the recognition model obtained through the training in the embodiment so as to accurately recognize a smaller or more blurred face image, and therefore whether the face image is an image of a target face or not is determined.

As shown in fig. 7, this embodiment provides an image recognition method, which is applied to the server shown in fig. 1, and includes the following steps:

s701: and acquiring an image to be identified, carrying out face detection and image interception on the image to be identified by adopting a target MB-FCN detector, and acquiring a target face image.

The target MB-FCN detector is a detector trained based on the target positive sample image and the target negative sample image in the embodiment. The image to be recognized is an image that needs to be recognized to determine whether it is a target person. The target person is a person in need of searching, including but not limited to criminal suspects, missing persons, or other persons in need of searching.

The target face image is a face image obtained by detecting an image to be recognized by using a target MB-FCN detector to detect at least one face area in the image to be recognized and capturing at least one face area by using a capture tool. Because the MB-FCN detector adopts the blurred target positive sample image and the blurred target negative sample image to carry out model training and then obtains the detector for identifying whether the image is a face image, the detector can identify the blurred image to be identified so as to determine whether the blurred image to be identified contains a face area corresponding to a face, so that a server can intercept the target face image containing the face from the face area by adopting a screenshot tool.

S702: and generating the target face image by adopting a generating network formed based on the super-resolution reconstruction technology in the target GAN model to acquire a target generated image.

The target GAN model is a model obtained by performing model training by using target training data in the above embodiment, and the generating network in the target GAN model is a generating network formed based on a super-resolution reconstruction technology, so that a relatively blurred low-resolution image can be reconstructed into a relatively clear super-resolution image, and the reconstruction process has relatively high accuracy and relatively high efficiency.

Specifically, the server inputs the target face image into the target GAN model, and generates the target face image by using a generating network formed based on the super-resolution reconstruction technology in the target GAN model to obtain a super-resolution target generated image, so that the target generated image is used for carrying out subsequent face image recognition, and the recognition accuracy and recognition efficiency are improved. The target generation image is obtained by inputting a target face image with lower resolution into a target GAN model, and reconstructing a clearer super-resolution image by using a generation network formed based on a super-resolution reconstruction technology.

S703: and carrying out feature similarity calculation on the target generated image and the target personnel image in the target image library to obtain feature similarity.

The target image library is a database for storing the corresponding target personnel images. The target person image refers to an image of a target person to be tracked. Specifically, the server calculates feature similarity of the target generated image and each target personnel image in the target image library by adopting a face similarity detection algorithm, and obtains the calculated feature similarity. For example, the server may perform feature extraction on the target generated image and each target personnel image to obtain corresponding generated image features and personnel image features, and then calculate the generated image features and the personnel image features by using a feature distance algorithm (including but not limited to a euclidean distance algorithm or a cosine similarity algorithm) to obtain corresponding feature similarities.

S704: and if the feature similarity is greater than the similarity threshold, acquiring a detection result of determining the person corresponding to the target face image as the target person.

The similarity threshold is a threshold which is preset by the system and is required to be reached when the similarity is evaluated as the same person. After acquiring the feature similarity of the target generated image and each target personnel image, the server compares the feature similarity with a similarity threshold. If the feature similarity is larger than the similarity threshold, the target face image in the image to be recognized and the target person image in the target image library are more likely to be the same person, so that a detection result that the person corresponding to the target person image is the target person can be obtained. Correspondingly, if the feature similarity is not greater than the similarity threshold, the target face image is not the image of the target person.

The image recognition method provided by the embodiment can be applied to criminal suspects tracking, missing person searching or other scenes needing person searching. According to the method, the target MB-FCN model is firstly adopted to detect the human face and intercept the image from the monitoring equipment or the image to be processed with lower resolution ratio which is randomly shot by the person, so that the target human face image containing the human face is obtained, the data volume of the subsequent image reconstruction and image recognition is reduced, and the recognition efficiency is improved. Then, a generating network formed by a super-resolution reconstruction technology of the target GAN model is adopted to process the target face image so as to obtain a clearer target generating image with super resolution, so that the recognition accuracy is higher and the recognition efficiency is faster when the target generating image is subsequently recognized. And finally, comparing the feature similarity calculated by the target generated image and the target person image with a similarity threshold value to acquire a detection result of whether the person corresponding to the target face image is a target person or not, thereby realizing quick tracking of the target person.

It should be understood that the sequence number of each step in the foregoing embodiment does not mean that the execution sequence of each process should be determined by the function and the internal logic, and should not limit the implementation process of the embodiment of the present invention.

In one embodiment, an apparatus for training an identification model is provided, where the apparatus for training an identification model corresponds to the method for training an identification model in the above embodiment one by one. As shown in fig. 8, the recognition model training apparatus includes an original sample image acquisition module 801, a sample thumbnail acquisition module 802, a target sample image acquisition module 803, a target detector acquisition module 804, a target training data acquisition module 805, and a target GAN model acquisition module 806. The functional modules are described in detail as follows:

the original sample image obtaining module 801 is configured to obtain an original positive sample image and an original negative sample image that carry first labeling information, where the first labeling information includes a current image identifier, a face identifier, and a first blur identifier.

The sample thumbnail obtaining module 802 is configured to perform downsampling processing on an original positive sample image and an original negative sample image, and obtain a corresponding positive sample thumbnail and a corresponding negative sample thumbnail respectively.

The target sample image obtaining module 803 is configured to obtain a target positive sample image and a target negative sample image carrying second labeling information by performing screenshot processing on the positive sample thumbnail and the negative sample thumbnail by using a screenshot tool, where the second labeling information includes a source image identifier, a face identifier, and a second blur identifier.

The target detector obtaining module 804 is configured to input a target positive sample image and a target negative sample image with different face identifiers to the MB-FCN model for model training, and obtain a target MB-FCN detector.

The target training data obtaining module 805 is configured to use, as a set of target training data, an original positive sample image and a target positive sample image that match a current image identifier in the first labeling information with a source image identifier in the second labeling information.

The target GAN model obtaining module 806 is configured to input target training data into the GAN model for model training, obtain a target GAN model, where the generation network in the target GAN model is a generation network formed based on the super-resolution reconstruction technology.

Preferably, the raw sample image acquisition module 801 comprises:

and the training sample image acquisition unit is used for acquiring the training sample image comprising the human face from the image database.

And the ambiguity acquisition unit is used for carrying out ambiguity detection on the training sample image by adopting an ambiguity detection algorithm to acquire the ambiguity corresponding to the training sample.

And the effective sample image determining unit is used for determining the effective sample image based on the ambiguity of the training sample image.

The face region size acquisition unit is used for carrying out face detection on the effective sample image by adopting a face detection algorithm to acquire the face region size.

The original sample image intercepting unit is used for intercepting an original positive sample image and an original negative sample image corresponding to the standard area size from the effective sample image if the face area size is larger than the preset area size.

The original sample image labeling unit is used for labeling the original positive sample image and the original negative sample image and acquiring the original positive sample image and the original negative sample image carrying the first labeling information.

Preferably, the ambiguity acquisition unit includes:

the gray level value acquisition subunit is used for sharpening the training sample image by using the Laplacian operator to acquire a sharpened image and pixel gray level values of the sharpened image.

And the ambiguity determining subunit is used for carrying out variance calculation on the pixel gray values of the sharpened image, obtaining a target variance value corresponding to the sharpened image, and determining the target variance value as the ambiguity corresponding to the training sample image.

Preferably, the object detector acquisition module 804 includes:

the model training data acquisition unit is used for taking the target positive sample image carrying the first face mark and the target negative sample image carrying the second face mark as model training data based on the positive and negative sample distribution proportion.

The model training data dividing unit is used for dividing model training data into a training set and a testing set based on the training test allocation proportion.

The original detector acquisition unit is used for inputting model training data in the training set into the MB-FCN model to perform model training, and acquiring an original MB-FCN detector.

The test accuracy obtaining unit is used for testing the original MB-FCN detector by adopting model training data in the test set to obtain the test accuracy.

And the target detector determining unit is used for determining the original MB-FCN detector as a target MB-FCN detector if the test accuracy is greater than the accuracy threshold.

Preferably, the target GAN model acquisition module 806 includes:

the training generation image acquisition unit is used for inputting the target positive sample image in the target training data into a generation network formed based on the super-resolution reconstruction technology in the GAN model for processing, and acquiring a training generation image.

The target loss value acquisition unit is used for inputting the original positive sample image and the training generation image in the target training data into the discrimination network in the GAN model for processing, and acquiring the target loss value.

And the model parameter updating unit is used for updating the loss function based on the target loss value if the target loss value is larger than the preset loss value, updating the model parameters in the generating network and the judging network based on the updated loss function, repeatedly executing the process of inputting the target positive sample image in the target training data into the generating network formed by the super-resolution reconstruction technology in the GAN model, and acquiring the training generating image.

And the target GAN model acquisition unit is used for acquiring a target GAN model by converging the model if the target loss value is not greater than the preset loss value.

In one embodiment, an image recognition apparatus is provided, which corresponds to the image recognition method in the above embodiment one by one. As shown in fig. 9, the image recognition apparatus includes a target face image acquisition module 901, a target generation image acquisition module 902, a feature similarity acquisition module 903, and a detection result acquisition module 904. The functional modules are described in detail as follows:

The target face image obtaining module 901 is configured to obtain an image to be identified, perform face detection and image capturing on the image to be identified by using the target MB-FCN detector, and obtain a target face image.

The target generated image obtaining module 902 is configured to perform generation processing on the target face image by using a generating network formed based on the super resolution reconstruction technology in the target GAN model to obtain a target generated image.

The feature similarity obtaining module 903 is configured to perform feature similarity calculation on the target generated image and the target person image in the target image library, so as to obtain feature similarity.

And the detection result obtaining module 904 is configured to obtain a detection result of determining a person corresponding to the target face image as the target person if the feature similarity is greater than the similarity threshold.

For specific limitations of the recognition model training apparatus, reference may be made to the above limitations of the recognition model training method, and no further description is given here. For specific limitations of the image recognition apparatus, reference may be made to the above limitations of the image recognition method, and no further description is given here. The above-described respective modules in the recognition model training apparatus or the image recognition apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.

In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data adopted or generated in the process of executing the recognition model training method or the image recognition method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a recognition model training method or an image recognition method.

In one embodiment, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the recognition model training method of the above embodiments, such as the steps shown in fig. 2-6, when the computer program is executed by the processor. Alternatively, the steps of the image recognition method in the above embodiment are implemented when the processor executes a computer program, as shown in fig. 7. Alternatively, the processor, when executing a computer program, implements the functions of the various modules/units in this embodiment of the recognition model training apparatus, such as the functions of the raw sample image acquisition module 801, the sample thumbnail acquisition module 802, the target sample image acquisition module 803, the target detector acquisition module 804, the target training data acquisition module 805, and the target GAN model acquisition module 806 shown in fig. 8. Or the processor when executing a computer program implements the functions of the respective modules/units in this embodiment of the image recognition apparatus, such as the functions of the target face image acquisition module 901, the target generation image acquisition module 902, the feature similarity acquisition module 903, and the detection result acquisition module 904 shown in fig. 9. In order to avoid repetition, a description thereof is omitted.

In an embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor, implements the steps of the recognition model training method of the above embodiment, such as the steps shown in fig. 2 to 6. Alternatively, the steps of the image recognition method in the above embodiment, such as the steps shown in fig. 7, are implemented when the computer program is executed by the processor, and are not repeated here. Alternatively, the computer program when executed by the processor implements the functions of the modules/units in this embodiment of the recognition model training apparatus, for example, the functions of the original sample image acquisition module 801, the sample thumbnail acquisition module 802, the target sample image acquisition module 803, the target detector acquisition module 804, the target training data acquisition module 805, and the target GAN model acquisition module 806 shown in fig. 8. Alternatively, the computer program when executed by the processor realizes the functions of the respective modules/units in this embodiment of the image recognition apparatus described above, such as the functions of the target face image acquisition module 901, the target generation image acquisition module 902, the feature similarity acquisition module 903, and the detection result acquisition module 904 shown in fig. 9. In order to avoid repetition, a description thereof is omitted.

Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims

1. A method of training an identification model, comprising:

acquiring a training sample image comprising a human face from an image database; performing fuzzy detection on the training sample image by adopting a fuzzy detection algorithm to obtain the corresponding ambiguity of the training sample; determining a valid sample image based on the ambiguity of the training sample image; performing face detection on the effective sample image by adopting a face detection algorithm to obtain the size of a face area; if the size of the face area is larger than the size of a preset area, an original positive sample image and an original negative sample image corresponding to the standard area size are intercepted from the effective sample image; labeling the original positive sample image and the original negative sample image to obtain the original positive sample image and the original negative sample image carrying first labeling information, wherein the first labeling information comprises a current image identifier, a face identifier and a first fuzzy identifier;

Downsampling the original positive sample image and the original negative sample image to obtain a corresponding positive sample thumbnail and a corresponding negative sample thumbnail respectively;

performing screenshot processing on the positive sample thumbnail and the negative sample thumbnail by using a screenshot tool to obtain a target positive sample image and a target negative sample image carrying second annotation information, wherein the second annotation information comprises a source image identifier, a face identifier and a second fuzzy identifier;

2. The method for training the recognition model according to claim 1, wherein the performing the blur detection on the training sample image by using a blur detection algorithm to obtain the corresponding blur degree of the training sample image comprises:

Sharpening the training sample image by using a Laplacian operator to obtain a sharpened image and a pixel gray value of the sharpened image;

and carrying out variance calculation on the pixel gray values of the sharpened image, obtaining a target variance value corresponding to the sharpened image, and determining the target variance value as the ambiguity corresponding to the training sample image.

3. The recognition model training method of claim 1, wherein the inputting the target positive sample image and the target negative sample image with different face identifications into an MB-FCN model for model training, obtaining a target MB-FCN detector, comprises:

based on the positive and negative sample distribution proportion, taking the target positive sample image carrying the first face mark and the target negative sample image carrying the second face mark as model training data;

dividing the model training data into a training set and a testing set based on a training test allocation proportion;

inputting model training data in the training set into an MB-FCN model for model training, and obtaining an original MB-FCN detector;

testing the original MB-FCN detector by adopting model training data in the test set to obtain test accuracy;

And if the test accuracy is greater than an accuracy threshold, determining the original MB-FCN detector as a target MB-FCN detector.

4. The method for training an identification model of claim 1, wherein the inputting the target training data into a GAN model for model training, obtaining a target GAN model, comprises:

inputting the target positive sample image in the target training data into a generation network formed based on a super-resolution reconstruction technology in a GAN model for processing, and obtaining a training generation image;

inputting the original positive sample image and the training generation image in the target training data into a discrimination network in a GAN model for processing to obtain a target loss value;

if the target loss value is larger than a preset loss value, updating a loss function based on the target loss value, updating model parameters in the generation network and the discrimination network based on the updated loss function, repeatedly executing the generation network formed by inputting the target positive sample image in the target training data into a GAN model based on a super-resolution reconstruction technology, and obtaining a training generation image;

and if the target loss value is not greater than the preset loss value, the model converges, and a target GAN model is obtained.

5. An image recognition method, comprising:

acquiring an image to be identified, and carrying out face detection and image interception on the image to be identified by adopting a target MB-FCN detector obtained by the identification model training method according to any one of claims 1-4 to acquire a target face image;

generating a target face image by adopting a generating network formed by a super-resolution reconstruction technology in a target GAN model obtained by the recognition model training method according to any one of claims 1-4 to acquire a target generated image;

6. An identification model training device, comprising:

the original sample image acquisition module is used for acquiring training sample images comprising human faces from the image database; performing fuzzy detection on the training sample image by adopting a fuzzy detection algorithm to obtain the corresponding ambiguity of the training sample; determining a valid sample image based on the ambiguity of the training sample image; performing face detection on the effective sample image by adopting a face detection algorithm to obtain the size of a face area; if the size of the face area is larger than the size of a preset area, an original positive sample image and an original negative sample image corresponding to the standard area size are intercepted from the effective sample image; labeling the original positive sample image and the original negative sample image to obtain the original positive sample image and the original negative sample image carrying first labeling information, wherein the first labeling information comprises a current image identifier, a face identifier and a first fuzzy identifier;

The sample thumbnail acquisition module is used for carrying out downsampling processing on the original positive sample image and the original negative sample image to respectively acquire a corresponding positive sample thumbnail and a corresponding negative sample thumbnail;

the target sample image acquisition module is used for carrying out screenshot processing on the positive sample thumbnail and the negative sample thumbnail by adopting a screenshot tool to acquire a target positive sample image and a target negative sample image carrying second annotation information, wherein the second annotation information comprises a source image identifier, a face identifier and a second fuzzy identifier;

7. An image recognition apparatus, comprising:

the target face image acquisition module is used for acquiring an image to be identified, and carrying out face detection and image interception on the image to be identified by adopting the target MB-FCN detector obtained by the identification model training method according to any one of claims 1-4 to acquire a target face image;

the target generation image acquisition module is used for adopting a generation network formed based on a super-resolution reconstruction technology in the target GAN model obtained by the recognition model training method according to any one of claims 1-4 to generate and process the target face image so as to acquire a target generation image;

8. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the recognition model training method according to any one of claims 1 to 4 when executing the computer program or the image recognition method according to claim 5 when the processor executes the computer program.

9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the recognition model training method according to any one of claims 1 to 4 or the image recognition method according to claim 5 when the processor executes the computer program.