CN113705294A - Image identification method and device based on artificial intelligence
- Publication number: CN113705294A (application CN202110238449.1A)
- Authority: CN (China)
- Prior art keywords: image, target object, key point, processing
- Legal status: Pending
Classifications
- G06F18/214 — Pattern recognition; analysing; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/24 — Pattern recognition; analysing; classification techniques
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/08 — Neural networks; learning methods
Abstract
The application provides an image identification method and device based on artificial intelligence. The method includes: performing target detection processing on an image to be recognized to obtain a target object image from the image to be recognized; performing feature extraction processing based on the target object image to obtain corresponding image features; performing key point identification processing based on the image features to obtain key points of the target object and their corresponding positions; and determining the degree of integrity of the target object in the image to be recognized based on those key points and positions. With the method and device, the target object can be accurately identified in the image, and the degree to which the target object in the image is incomplete can be flexibly judged.
Description
Technical Field
The present application relates to artificial intelligence technology, and in particular, to an image recognition method and apparatus based on artificial intelligence, an electronic device, and a computer-readable storage medium.
Background
Artificial Intelligence (AI) encompasses the theories, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
Image recognition refers to techniques that process, analyze, and understand an image with a computer in order to recognize targets and objects in various patterns. With the development of artificial intelligence technology in recent years, image recognition technology has continuously advanced; technologies such as face recognition and human body recognition are widely applied across many fields, and accurately evaluating the integrity of an object in an image to be recognized has become an important challenge for image recognition processing.
In the related art, when identifying whether an object in an image is complete, a simple classification model is usually used: the classification model judges whether the image is complete, and the completeness of the target object is rarely judged after first distinguishing the recognition object. When images are identified in a targeted manner, this produces misjudgments on images that do not contain the target recognition object at all; the integrity of the target object cannot be accurately evaluated, which affects the accuracy and precision of the integrity judgment of the target object.
Disclosure of Invention
The embodiments of the present application provide an image identification method and apparatus based on artificial intelligence, an electronic device, and a computer-readable storage medium, which can accurately identify a target object and flexibly judge the degree to which the target object in an image is incomplete.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides an image identification method based on artificial intelligence, which comprises the following steps:
carrying out target detection processing on an image to be recognized so as to obtain a target object image from the image to be recognized;
performing feature extraction processing based on the target object image to obtain corresponding image features;
performing key point identification processing based on the image features to obtain key points of the target object and corresponding positions;
and determining the integrity degree of the target object in the image to be recognized based on the key points and the corresponding positions of the target object.
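By way of orientation only, the following minimal Python sketch strings these four steps together; the injected callables and the preset key point count are hypothetical stand-ins for the models and key point sets described in the detailed description, not part of the claimed implementation.

```python
NUM_PRESET_KEYPOINTS = 17  # assumed size of the preset key point set (e.g. a human-body template)

def recognize_integrity(image, detect_target, extract_features, detect_keypoints):
    """detect_target / extract_features / detect_keypoints are injected callables
    standing in for the three models described below (hypothetical names)."""
    target_image = detect_target(image)          # target detection -> target object image
    if target_image is None:
        return None                              # no target object: no integrity judgment
    features = extract_features(target_image)    # feature extraction on the target image
    keypoints = detect_keypoints(features)       # key point identification -> points + positions
    # integrity degree = identified key points / preset key point count
    return len(keypoints) / NUM_PRESET_KEYPOINTS
```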
An embodiment of the present application provides an image recognition apparatus based on artificial intelligence, including:
The target detection module is used for carrying out target detection processing on the image to be recognized so as to acquire a target object image from the image to be recognized;
the feature extraction module is used for performing feature extraction processing based on the target object image to obtain corresponding image features;
the key point identification module is used for performing key point identification processing based on the image features to obtain key points of the target object and corresponding positions;
and the integrity judging module is used for determining the integrity degree of the target object in the image to be identified based on the key point and the corresponding position of the target object.
In the foregoing solution, the target detection module is further configured to:
carrying out target detection processing on an image to be recognized to obtain a detection frame comprising a target object;
and cutting out a target object image from the image to be recognized based on the position of the detection frame.
In the foregoing solution, the target detection module is further configured to:
extracting the features of the image to be identified to obtain a corresponding feature map;
determining a plurality of candidate frames in the image to be identified;
mapping the candidate frame to the feature map to obtain a plurality of corresponding candidate feature maps;
performing maximum pooling on the candidate feature maps to obtain a plurality of candidate area maps with the same size;
and carrying out classification processing and candidate frame position regression processing on the plurality of candidate region maps to obtain a detection frame comprising the target object.
In the foregoing solution, the feature extraction module is further configured to:
filling the surrounding area of the target object image, and performing feature extraction processing on the filled target object image to obtain corresponding image features; or,
performing feature extraction processing directly on the cropped target object image to obtain corresponding image features.
In the foregoing solution, when the image feature is extracted from the target object image subjected to the filling process, the keypoint identification module is further configured to:
calling the first key point detection model to execute the following processing:
mapping the image features into probability maps of a plurality of channels, wherein the probability map of each channel corresponds to the probability distribution of one key point in a preset key point set, and the probability distribution is used for representing the probability that each pixel point in the target object image belongs to the key point corresponding to the probability map;
the following processing is performed for each probability map: identifying the pixel point with the maximum probability in the probability map as a key point corresponding to the probability map, and identifying the position of the pixel point with the maximum probability as the position of the key point corresponding to the probability map;
combining the key points and the corresponding positions identified from each probability map to form a key point identification result of the target object;
the key point identification result comprises a plurality of key points and corresponding positions, and the key points correspond to all key points in the preset key point set one by one.
In the foregoing solution, when the image feature is extracted from the target object image subjected to the filling process, the keypoint identification module is further configured to:
calling a second key point detection model to execute the following processing:
mapping the image features into a probability map, wherein the probability map comprises the probability that each pixel point in the target object image corresponds to each key point in a preset key point set;
executing the following processing aiming at each pixel point in the target object image:
determining the maximum probability in the probabilities of the pixel points corresponding to all the key points in the preset key point set;
when the maximum probability exceeds a probability threshold, identifying the pixel point as a key point corresponding to the maximum probability, and identifying the position of the pixel point as the position of the key point corresponding to the maximum probability;
combining the key points identified from the probability map and the corresponding positions to form a key point identification result of the target object;
and the key point identification result comprises at least one key point in the preset key point set and a corresponding position.
In the foregoing solution, when the image feature is extracted from the target object image subjected to the filling processing, the completeness discriminating module is further configured to:
performing the following for each keypoint of the target object:
when the positions of the key points are located in the surrounding area for the filling processing, determining that the key points are the key points missing from the image to be identified;
removing the missing key points from the identified key points of the target object to update the identified key points of the target object;
taking the ratio of the number of the key points of the target object obtained by the updated identification to the number of the preset key points as the integrity degree of the target object in the image to be identified; wherein the preset number of keypoints is a keypoint count of a preset set of keypoints of the target object.
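By way of illustration, a minimal sketch of this completeness computation, assuming the key points are given as a name-to-position mapping and that the bounds of the un-padded target image within the filled image are known (both assumptions of the sketch, not prescribed by the solution):

```python
def integrity_degree(keypoints, padded_box, preset_count):
    """keypoints: dict of name -> (x, y); padded_box: (x0, y0, x1, y1) bounds
    of the original (un-padded) target image inside the filled image."""
    x0, y0, x1, y1 = padded_box
    # A key point that falls in the filled surrounding area is treated as
    # missing from the image to be recognized and removed.
    kept = {n: (x, y) for n, (x, y) in keypoints.items()
            if x0 <= x < x1 and y0 <= y < y1}
    # Integrity degree: remaining key points over the preset key point count.
    return len(kept) / preset_count
```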
In the above solution, when the image feature is extracted from the target object image directly, the completeness discrimination module is further configured to:
taking the ratio of the number of the key points of the target object obtained by the key point identification processing to the number of preset key points as the integrity degree of the target object in the image to be identified; wherein the preset number of keypoints is a keypoint count of a preset set of keypoints of the target object.
In the above solution, before determining the ratio, the integrity determination module is further configured to:
carrying out occlusion recognition processing on each key point of the target object so as to determine occluded key points in the key points of the target object image;
removing the occluded key points from the key points of the target object to update the key points of the target object.
In the above solution, the artificial-intelligence-based image recognition apparatus further includes:
the processing module is used for deleting the image to be recognized from the candidate cover image set when the image to be recognized is a candidate cover image of a media account and the integrity degree of the image to be recognized is lower than an integrity threshold; and for blocking the image to be recognized from recommendation, or reducing its recommendation weight, when the image to be recognized is carried in information to be recommended and its integrity degree is lower than the integrity threshold.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the method provided by the embodiment of the application when executing the executable instructions stored in the memory.
The embodiment of the application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the artificial-intelligence-based image recognition method provided by the embodiments of the present application.
The embodiment of the application has the following beneficial effects:
the image to be recognized is subjected to explicit target detection so as to perform key point recognition processing on the target object image, namely, subsequent key point recognition is performed when the target object is determined to be included, and further the incomplete degree of the target object is determined, so that the condition of wrong judgment caused by the image without the target object is eliminated, and the degree of the integrity of the target object is accurately and efficiently obtained.
Drawings
FIG. 1 is an alternative structural diagram of an architecture of an artificial intelligence-based image recognition system provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an alternative structure of an artificial intelligence based image recognition apparatus provided in an embodiment of the present application;
FIG. 3A is a schematic flow chart of an alternative artificial intelligence-based image recognition method according to an embodiment of the present disclosure;
FIG. 3B is a schematic flow chart of an alternative artificial intelligence-based image recognition method according to an embodiment of the present disclosure;
FIG. 4 is an alternative schematic structure diagram of a target detection model provided by an embodiment of the present application;
FIG. 5 is an alternative schematic diagram of a target detection model provided by an embodiment of the present application;
FIG. 6 is an alternative schematic diagram of a method for obtaining a candidate box according to an embodiment of the present disclosure;
FIG. 7 is an alternative schematic diagram of a filling process provided in an embodiment of the present application;
FIG. 8 is an alternative structural diagram of a keypoint identification model provided by an embodiment of the present application;
fig. 9 is an alternative schematic diagram of a feature extraction processing method provided in an embodiment of the present application;
FIG. 10 is an alternative diagram of a predetermined set of keypoints provided by embodiments of the present application;
FIG. 11 is a schematic flow chart of an alternative artificial intelligence based image recognition method provided by the embodiment of the present application;
fig. 12 is an alternative flowchart illustrating an application of the image recognition method to media account cover map selection according to the embodiment of the present application;
fig. 13 is an alternative flowchart illustrating that the image recognition method provided in the embodiment of the present application is applied to a recommendation system.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the attached drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Where "first/second" appears in the specification, the terms merely distinguish between similar items and do not denote a particular order. It should be understood that, where permissible, "first/second" may be interchanged in a specific order or sequence, so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before the embodiments of the present application are described in further detail, the terms and expressions used in the embodiments are explained as follows.
1) Target detection: the method is an image segmentation technology based on target geometry and statistical characteristics, combines the segmentation and identification of a target into a whole, namely finds out all interested objects in an image, comprises two subtasks of object positioning and object classification, and determines the category and the position of the object at the same time.
2) Key points: a feature description used in image processing; an abstract description of a fixed area or a spatial physical relationship that characterizes the composition or context of a region. It includes, but is not limited to, point information, and chiefly characterizes the relationship of a location or context to its surrounding neighborhood.
3) Down-sampling: reducing an image. For example, for an image of size M × N, down-sampling by a factor of s yields an image of resolution (M/s) × (N/s), where s is a common divisor of M and N. For an image in matrix form, each s × s window of the original image becomes one pixel whose value is the average or the maximum of all pixels in the window.
4) Up-sampling: enlarging an image, usually by interpolation; a suitable interpolation algorithm inserts new elements between the pixels of the original image.
5) Non-Maximum Suppression (NMS): suppressing elements that are not maxima, which can be understood as a local maximum search; in target detection it extracts the windows (detection boxes) with the highest scores. For example, after sliding windows are subjected to feature extraction and classified by a classifier, each window receives a score. Since sliding windows may completely cover one another or intersect heavily, NMS is used to pick the window with the highest score (the highest probability of being the target object) in each neighborhood and suppress the windows with low scores.
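As a concrete sketch of this procedure in NumPy (the 0.7 IoU threshold is an assumed value):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.7):
    """boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,). Returns kept indices."""
    order = scores.argsort()[::-1]          # sort windows by score, descending
    keep = []
    while order.size > 0:
        i = order[0]                        # highest-scoring remaining window
        keep.append(i)
        # intersection-over-union of window i with the remaining windows
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        # suppress windows whose overlap with the kept window is too large
        order = order[1:][iou <= iou_threshold]
    return keep
```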
In the related art, incompleteness identification in image recognition processing uses only a simple classification model for judgment. If the picture does not include the target object to be identified, it is identified as complete by default; because the classification method has no explicit process for identifying the target object, images that do not include the target object are wrongly judged as to incompleteness. In addition, the existing classification model can only output the two final results of complete or incomplete, cannot judge the degree of incompleteness of the target object, and thus has difficulty meeting the requirements for incompleteness judgment in different business scenarios.
Based on this, the embodiment of the application provides an image recognition method, an image recognition device, an electronic device and a computer-readable storage medium based on artificial intelligence, which can reduce misjudgment of the incompleteness of a target object in an image to be recognized, and accurately and flexibly obtain the incompleteness degree of the target object in the image to be recognized.
First, an artificial intelligence based image recognition system provided in an embodiment of the present application is described, referring to fig. 1, fig. 1 is an optional architecture schematic diagram of an artificial intelligence based image recognition system 100 provided in an embodiment of the present application, and in order to support an artificial intelligence based image recognition application, a terminal 400 is connected to a server 200 through a network 300.
The artificial-intelligence-based image recognition system provided by the embodiment of the application can be applied to media account cover picture selection. Specifically, the terminal 400 obtains a candidate cover image as the image to be recognized and generates an image recognition request based on it. The terminal 400 or the server 200 responds to the image recognition request, implements the artificial-intelligence-based image recognition method provided by the application, obtains the degree of incompleteness of the target object in the image to be recognized, and determines the integrity result of the target object. The terminal 400 then processes the candidate cover image according to that result: if the target object is complete, the candidate cover image is displayed on the user interface as the cover image of the media account; if the target object is incomplete, the current candidate cover image is discarded.
The artificial-intelligence-based image recognition system provided by the embodiment of the application can also be applied to an online recommendation system. Specifically, the terminal 400 obtains an image to be recommended as the image to be recognized and generates an image recognition request based on it. The terminal 400 or the server 200 responds to the image recognition request, implements the artificial-intelligence-based image recognition method provided by the application, obtains the degree of incompleteness of the target object in the image to be recognized, and determines the integrity result of the target object. The terminal 400 then processes the image to be recommended according to that result: if the target object is complete, the image is displayed on the user interface; if the target object is incomplete, the image is not recommended, or its recommendation priority is reduced.
In some embodiments, the artificial-intelligence-based image recognition method provided by the present application may be implemented by a terminal and a server in response to an image recognition request, as follows. The terminal 400 sends an image recognition request, which includes the image to be recognized, to the server 200. The server 200 receives and parses the image recognition request to obtain the image to be recognized, and in response performs target detection processing on it so as to obtain a target object image from the image to be recognized; performs feature extraction processing based on the target object image to obtain corresponding image features; performs key point identification processing based on the image features to obtain key points of the target object and corresponding positions; determines the degree of integrity of the target object in the image to be recognized based on those key points and positions; determines the integrity of the target object according to that degree; and returns the result to the terminal 400 as the image recognition result.
In other embodiments, the artificial intelligence based image recognition method provided by the present application may be implemented by the terminal 400 independently in response to the image recognition request, specifically: the terminal 400 may respond to the image recognition request, and implement the artificial intelligence-based image recognition method provided by the present application based on the image to be recognized, obtain the integrity of the target object in the image to be recognized, and determine the integrity of the target object.
In some embodiments, the terminal 400 may be, but is not limited to, a laptop, a tablet, a desktop computer, a smart phone, a dedicated messaging device, a portable gaming device, a smart speaker, a smart watch, and the like.
The server 200 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like. The network 300 may be a wide area network or a local area network, or a combination of both. The terminal 400 and the server 200 may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited thereto.
Next, an electronic device for implementing the artificial intelligence based image recognition method according to the embodiment of the present application is described, where in practical applications, the electronic device may be implemented as the terminal 400 or the server 200 in fig. 1, and the electronic device for implementing the artificial intelligence based image recognition method according to the embodiment of the present application is described by taking the electronic device as the server 200 shown in fig. 1 as an example. Referring to fig. 2, fig. 2 is a schematic structural diagram of a server 200 according to an embodiment of the present application, where the server 200 shown in fig. 2 includes: at least one processor 210, memory 250, and at least one network interface 220. The various components in server 200 are coupled together by a bus system 240. It is understood that the bus system 240 is used to enable communications among the components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 240 in fig. 2.
The Processor 210 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 250 optionally includes one or more storage devices physically located remotely from processor 210.
The memory 250 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 250 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 250 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 251 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 252 for communicating with other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), and the like;
in some embodiments, the artificial intelligence based image recognition device provided by the embodiments of the present application can be implemented in software, and fig. 2 shows an artificial intelligence based image recognition device 255 stored in a memory 250, which can be software in the form of programs and plug-ins, and the like, and includes the following software modules: an object detection module 2551, a feature extraction module 2552, a keypoint identification module 2553, and an integrity determination module 2554, which are logical and therefore can be arbitrarily combined or further split depending on the functionality implemented. The functions of the respective modules will be explained below.
In other embodiments, the artificial intelligence based image recognition Device provided in the embodiments of the present Application may be implemented in hardware, for example, the artificial intelligence based image recognition Device provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the artificial intelligence based image recognition method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
The artificial intelligence image recognition method provided by the embodiment of the present application will be described in conjunction with an exemplary application and implementation of the server provided by the embodiment of the present application.
Referring to fig. 3A, fig. 3A is an optional flowchart of the artificial-intelligence-based image recognition method according to the embodiment of the present application, which will be described with reference to steps 101 to 104 shown in fig. 3A.
Step 101: and carrying out target detection processing on the image to be recognized so as to acquire a target object image from the image to be recognized.
Here, the image to be recognized may be acquired in any manner, for example, captured through tools such as a camera of the terminal device, intercepted from a video frame, or acquired from an image data set; the embodiment of the application is not limited in this regard.
It should be noted that the target object may be any of various objects to be recognized, such as an animal or a person. For an animal, the target detection process may recognize, through the target detection model, the detection frame of one or more animals in the image to be recognized and obtain the corresponding position of each animal; for a person, the target detection process may recognize detection frames containing different human bodies (or faces) in the image to be recognized and obtain the corresponding position of each human body (or face).
In some embodiments, referring to fig. 3B, fig. 3B is an optional flowchart of the artificial intelligence based image recognition method provided by the embodiment of the present application. Based on fig. 3A, the step 101 of performing the target detection process on the image to be recognized to obtain the target object image from the image to be recognized may be specifically implemented by the following steps 1011 and 1012.
Step 1011: and carrying out target detection processing on the image to be recognized to obtain a detection frame comprising a target object.
In some embodiments, the target detection processing is performed on the image to be recognized in step 1011 to obtain the detection frame including the target object, which is specifically realized by the following technical scheme: extracting the features of the image to be identified to obtain a corresponding feature map; determining a plurality of candidate frames in an image to be identified; mapping the candidate frames to the feature maps to obtain a plurality of corresponding candidate feature maps; performing maximum pooling on the candidate feature maps to obtain a plurality of candidate area maps with the same size; and carrying out classification processing and candidate frame position regression processing on the multiple candidate region maps to obtain a detection frame comprising the target object.
In some embodiments, the object detection process may be implemented by an object detection model, where the object detection model may be a Region Convolutional Neural Network model (RCNN), a Faster Region Convolutional Neural Network model (Faster-RCNN), an efficient detection network model (EfficientDet), or the like.
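For orientation, the following sketch runs an off-the-shelf Faster-RCNN from torchvision as one possible detector; this library model is a stand-in, not the model structure claimed by the application, and the 0.5 confidence threshold is an assumed value.

```python
import torch
import torchvision

# One possible off-the-shelf detector; the patent does not prescribe this API.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 600, 800)            # stand-in for the image to be recognized
with torch.no_grad():
    pred = model([image])[0]               # dict with 'boxes', 'labels', 'scores'

# Keep detection frames of the target object class above a confidence threshold,
# e.g. COCO class 1 ("person") for human-body detection.
person = (pred["labels"] == 1) & (pred["scores"] > 0.5)
boxes = pred["boxes"][person]              # (K, 4) tensor of (x1, y1, x2, y2)
```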
In some embodiments, the target detection process in step 1011 is described using the Faster-RCNN model as an example. Referring to fig. 4, fig. 4 is an optional structural schematic diagram of a target detection model provided in the embodiment of the present application, where the target detection model is a Faster-RCNN model, and is used to implement the method, provided in step 1011, of performing target detection processing on the image to be recognized to obtain a detection frame including a target object.
Specifically, in fig. 4, the Faster-RCNN model includes convolution layers (Conv layers), a Region Proposal Network (RPN) for candidate-frame extraction, a Region-of-Interest pooling (ROI Pooling) layer, and a classification layer.
In some embodiments, features of the image to be recognized are extracted based on the convolution layers of the Faster-RCNN model shown in fig. 4 to obtain corresponding feature maps; a plurality of candidate frames in the image to be identified are determined based on the RPN network; the candidate frames are mapped into the feature maps to obtain a plurality of corresponding candidate feature maps, and the region-of-interest pooling layer performs maximum pooling on the candidate feature maps to obtain a plurality of candidate region maps of the same size; the classification layer classifies the candidate region maps to obtain the image classification results corresponding to the candidate frames; combined with the results of candidate-frame position regression on the candidate region maps, target detection frames containing different types of objects in the image to be identified are obtained.
Here, according to the service requirements of a specific scenario, the detection frame including the target object may be selected from the detection frames of the image to be recognized according to object type, and the above embodiment proceeds on that basis.
In practical implementation, referring to fig. 5, fig. 5 is an alternative structural schematic diagram of the target detection model provided in the embodiment of the present application, where the target detection model is a Faster-RCNN model; the model structure of the Faster-RCNN model shown in fig. 4 can be embodied as the model structure shown in fig. 5. In fig. 5, the Faster-RCNN model specifically includes a feature extraction network, an RPN network, and an RCNN network. Here, the feature extraction network includes a number of convolution (conv) layers, nonlinear activation (ReLU) layers, and pooling layers; the RPN network includes a number of convolution layers and nonlinear activation layers; and the RCNN network includes a region-of-interest pooling layer and the fully connected (FC) layers of two branches.
In some embodiments, the extraction of features of the image to be recognized implemented by the convolutional layers in fig. 4, which yields the corresponding feature map, may be performed by the feature extraction network in fig. 5. Here, the feature extraction network may be a VGG16 network or a ZF network. Taking the VGG16 network as an example, the feature extraction network may contain 13 convolutional layers, 13 nonlinear activation layers, and 4 pooling layers. Specifically, extracting features of the image to be recognized through the feature extraction network to obtain the corresponding feature map may be implemented as follows: the convolution layers and nonlinear activation layers perform convolution processing on the image and extract its feature information; the pooling layers perform a down-sampling operation on the image and pool the features; after many rounds of convolution, nonlinear activation, and pooling, the feature map corresponding to the image to be identified is obtained.
Here, the kernel size of all convolution layers is set to 3 × 3 with 1-pixel edge filling, and the kernel size of all down-sampling layers is set to 2 × 2. If the input image size is M × N, after 1-pixel edge filling the image size becomes (M+2) × (N+2), and after the 3 × 3 convolution the output feature map has the same size as the input image. The down-sampling kernel size is 2 × 2 with a stride of 2, so when the input matrix is M × N, the output matrix becomes (M/2) × (N/2). Each convolution-plus-pooling operation therefore reduces each dimension of the input to half of its original size. After the four pooling layers, the feature map output by the feature extraction network is 1/16 of the input image in each dimension (1/256 of its area), which makes it fast to map features back onto the original image to be recognized.
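This size bookkeeping can be checked with a small PyTorch sketch (the 224 × 224 input and the channel counts are arbitrary choices for the sketch):

```python
import torch
import torch.nn as nn

x = torch.rand(1, 3, 224, 224)                     # M x N input
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)  # 3x3 kernel, 1-pixel edge fill
pool = nn.MaxPool2d(kernel_size=2, stride=2)       # 2x2 kernel, stride 2

print(conv(x).shape)   # torch.Size([1, 64, 224, 224]) -- size preserved
y = x
for _ in range(4):     # four pooling layers
    y = pool(y)
print(y.shape)         # torch.Size([1, 3, 14, 14]) -- 1/16 per side, 1/256 by area
```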
In some embodiments, determining a plurality of candidate frames in the image to be recognized, implemented by the RPN network in fig. 4, may be performed by the RPN network in fig. 5. Here, the backbone of the RPN network may be a VGG16 network or a ZF network; taking the VGG16 network as an example, the RPN network may include a number of convolutional layers and nonlinear activation layers. Specifically, in fig. 5, on the feature map output by the feature extraction network, the RPN performs one convolution with a 3 × 3 kernel (rpn_conv) using 1-pixel edge filling followed by nonlinear activation, determines a plurality of anchors for the image to be recognized during the convolution, and obtains the features of those anchors. Based on the anchor features, two parallel processing branches follow: the first branch applies a 1 × 1 convolution and then classifies the anchors, and the second branch applies a 1 × 1 convolution and then performs candidate-frame regression on the anchors; finally, a plurality of candidate frames are obtained from the classification results and the candidate-frame regression results.
In some embodiments, the classification of the anchors may be implemented as follows: for each anchor, the anchor features are reshaped, probability mapping is performed with a softmax function that maps the features of the anchors to foreground/background probabilities (here the target object to be identified is defined as foreground and everything else as background), the dimensions are changed back, and the foreground/background classification result for each anchor is output. Note that because softmax classifies over channels, the data is reshaped to expose a separate dimension for the two classes and then reshaped back into the original data layout.
In some embodiments, the anchors belonging to the foreground are screened from the full set of anchors, their position coordinates are corrected through boundary regression, they are sorted by softmax output score, the first N anchors are extracted, and redundant anchors are removed by non-maximum suppression. M anchors with the highest remaining scores are then selected as the candidate frames. Here, non-maximum suppression for screening out redundant candidate frames may be implemented as follows: sort the candidate frames of each category in descending order of classification confidence; select the candidate frame with the highest confidence, compute its intersection-over-union with the remaining candidate frames, and delete those whose intersection-over-union exceeds the threshold.
Illustratively, referring to fig. 6, fig. 6 is an alternative schematic diagram of a method for obtaining candidate frames according to an embodiment of the present application, implemented by the RPN network. The method of determining multiple anchors for an image to be recognized, obtaining their features, and processing them in two branches, as provided in fig. 5, can be realized by the steps in fig. 6, described as follows. When the feature map is convolved, k anchors are generated in the image to be identified around the center point of each sliding convolution-kernel window. Here, three aspect ratios (1:1, 1:2, 2:1) and three area scales (128 × 128, 256 × 256, 512 × 512) are typically used as constraints for selecting anchors, giving 9 anchors, whose features are obtained after the convolution. Based on each anchor's features, foreground/background classification and candidate-frame position regression are performed (position regression here means regressing the candidate frame against a pre-labeled target frame to obtain four offsets: the center-point coordinates and the length and width). Taking the VGG network as an example, each point in the feature map is 512-dimensional; each anchor is mapped onto the feature map, and after the two parallel branches the feature map is converted into a 2k-dimensional map (the foreground/background classification probabilities) and a 4k-dimensional map (the position regression results, i.e., the four offsets).
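As a small NumPy sketch, the 9 anchors at one window center can be generated under the aspect-ratio and area-scale constraints above (the parameterization via square roots is one common convention, assumed here):

```python
import numpy as np

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """9 anchors (x1, y1, x2, y2) centered at (cx, cy): 3 ratios x 3 area scales."""
    boxes = []
    for s in scales:        # s * s is the anchor area
        for r in ratios:    # r = height / width, covering 1:1, 1:2, 2:1
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)  # shape (9, 4)

print(anchors_at(300, 300).round(1))
```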
In some embodiments, the maximal pooling of the plurality of candidate feature maps based on the region-of-interest pooling layer in fig. 4 to obtain a plurality of candidate region maps with the same size may be implemented by the region-of-interest pooling layer in fig. 5.
For example, the ROI Pooling layer divides each input candidate feature map, whatever its scale, into 7 equal parts in both the horizontal and vertical directions to obtain 49 regions, and performs max pooling on each region, yielding candidate region maps of the same size.
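For orientation, torchvision's roi_pool operator performs this fixed-size max pooling; the feature-map size, the box coordinates, and the 1/16 spatial scale below are assumed values for the sketch:

```python
import torch
from torchvision.ops import roi_pool

feat = torch.rand(1, 512, 38, 50)                     # feature map from a stride-16 backbone
boxes = torch.tensor([[0., 100., 80., 420., 380.]])   # (batch_idx, x1, y1, x2, y2) in image coords
# 7x7 output per candidate frame; spatial_scale maps image coords onto the feature map
regions = roi_pool(feat, boxes, output_size=(7, 7), spatial_scale=1.0 / 16)
print(regions.shape)                                  # torch.Size([1, 512, 7, 7])
```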
In some embodiments, the method of classifying multiple candidate region maps to obtain multiple candidate frames, which is implemented based on the classification layer in fig. 4, may be implemented by a fully-connected layer of a first branch in fig. 5, and the fully-connected layer of a second branch shown in fig. 5 is used for performing frame candidate position regression processing.
Illustratively, the following processing is performed for each candidate region map: the classification processing is carried out through the full connection layer of the first branch: mapping each candidate region map into a probability matrix belonging to each category, and judging the corresponding object category in the image to be identified; and performing candidate frame regression processing by utilizing the parallel full-connected layer of the second branch to obtain position offset so as to correct the position of the candidate frame, and after performing classification processing and candidate frame regression processing on all the candidate region images, selecting corresponding candidate frames so as to obtain target detection frames containing different types of objects in the image to be identified.
Step 1012: and cutting out the target object image from the image to be recognized based on the position of the detection frame.
In actual implementation, according to the position of the detection frame in the image to be recognized, the image to be recognized is subjected to cutting processing, and a target object image with the same size as the detection frame is extracted.
For example, based on the position of the detection frame containing the human body, a picture of the human body is cut out at the corresponding position of the image to be recognized, and it should be noted that, in general, the detection frame obtained after the target detection processing contains a single human body.
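A minimal NumPy sketch of this cropping step (the clamping to the image bounds is an added safeguard of the sketch, not required by the description):

```python
import numpy as np

def crop_target(image, box):
    """Cut the target object image out of the image to be recognized.
    image: H x W x C array; box: (x1, y1, x2, y2) detection frame."""
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    # Clamp to the image bounds so a frame touching the border stays valid.
    h, w = image.shape[:2]
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(w, x2), min(h, y2)
    return image[y1:y2, x1:x2]
```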
Here, the target object is first identified in the image to be identified; for example, human bodies are identified and screened in the image through target detection. This avoids the misjudgments that an existing classification model makes on images containing no human body when judging whether the human body in an image is incomplete, and improves the accuracy of the object-integrity judgment when the model is applied in a concrete image recognition business process.
Step 102: and carrying out feature extraction processing based on the target object image to obtain corresponding image features.
In some embodiments, the feature extraction processing is performed based on the target object image, and the obtaining of the corresponding image feature is specifically realized by the following technical scheme: and filling the surrounding area of the target object image, and performing feature extraction processing on the filled target object image to obtain corresponding image features.
In practical implementation, a circle of uniform pixel values is filled in the surrounding area of the target object image to obtain a target object image with preset size and subjected to filling processing, so as to be used for feature extraction processing and key point identification processing.
For example, referring to fig. 7, fig. 7 is an alternative schematic diagram of a filling processing method provided by an embodiment of the present application, a region 71 of fig. 7 is a target object image (hereinafter referred to as an original target image) cut out from an image to be recognized, and a region 72 is a surrounding region where filling processing is performed with uniform pixel values. Here, in order to adapt to the sizes of different detection frames, the original target image is adjusted to an image of a preset size, and the region outside the original target image within the preset size is filled with uniform pixel values, so as to obtain a target image subjected to filling processing.
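By way of illustration, a minimal NumPy sketch of this filling step; the square output size, the centering of the original target image, and the fill value are all assumptions of the sketch (the uniform pixel value is discussed below):

```python
import numpy as np

def pad_target(target, size=256, fill=(124, 117, 104)):
    """Resize the cropped target image to fit inside size x size, then fill the
    surrounding area with a uniform pixel value (an assumed ImageNet-like mean)."""
    h, w = target.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(h * scale), int(w * scale)
    # nearest-neighbour resize keeps the sketch dependency-free
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = target[ys][:, xs]
    canvas = np.full((size, size, 3), fill, dtype=target.dtype)
    y0, x0 = (size - nh) // 2, (size - nw) // 2
    canvas[y0:y0 + nh, x0:x0 + nw] = resized
    return canvas, (x0, y0, x0 + nw, y0 + nh)  # filled image + un-padded bounds
```

Returning the un-padded bounds is a convenience for the later completeness computation, which checks whether key points fall in the filled surrounding area.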
It should be noted that the uniform pixel value is an empirical pixel value used to indicate that the surrounding area lies outside the original target image and, compared to the original target image, belongs to a background or occluded region. The uniform pixel value may be a preset specific value, for example the average pixel value computed over a large image recognition database (ImageNet); the embodiment of the present application does not specifically limit its setting.
In some embodiments, the feature extraction process may be implemented by a key point identification model, where the key point identification model may be a Cascaded Pose Regression model (CPR), a High-Resolution Network model (HRNet), or the like.
In some embodiments, the target object images subjected to the padding process may be stored in an image database or image sample set for training the keypoint recognition model as training sample images. Here, the key points of the target object are marked in the target object image after the filling processing, and the target object image after the filling processing and the key points of the target object are marked as training sample images, so that the key point recognition model can extract image features in the surrounding area and then recognize the key points.
In some embodiments, taking the HRnet model as an example, the HRnet model includes two parts: a feature extraction part and a key point identification part, wherein the feature extraction processing is performed based on the target object image in step 102, and the obtaining of the corresponding image feature can be realized by the feature extraction part of the HRnet.
Here, a typical key point identification model down-samples the input image to reduce its resolution and then up-samples to restore resolution and obtain the final feature extraction result. This down-sample-then-up-sample approach, which keeps strong image features and then restores image resolution, loses spatial information to a certain extent and introduces quantization error when the extracted features are used for key point identification. HRNet instead adopts a parallel network design in which one branch always performs feature extraction at high resolution, preserving rich spatial information and reducing quantization error.
Referring to fig. 8, fig. 8 is an alternative structural diagram of a keypoint identification model provided in an embodiment of the present application, where the keypoint identification model is an HRnet model, and in fig. 8, a horizontal direction represents a depth (depth) of a network, and a vertical direction represents a scale (scale) of the network. The depths 1 to 15 represent a network structure of a feature extraction portion of the HRnet, and the network feature extraction portion may be divided into a plurality of stages (for example, the depths 1 to 3, 4 to 6, 7 to 10, 11 to 14 are respectively different stages).
In practical implementation, before each stage is started, a feature map with a smaller resolution (the scale is also smaller) is added, feature maps with different scales are obtained by means of interpolation up-sampling and convolution down-sampling respectively, and feature maps with the same scale are subjected to fusion processing, so that the initial feature map is ensured to be combined with features of feature maps with different scales in the previous stage. In each stage process, a residual error neural network (ResNet) is adopted to carry out deep learning on the feature maps of all scales respectively; through the last stage (depth 11 to 14), 4 feature maps with different scales are obtained; based on the feature maps of 4 different scales, a feature map is obtained that results in a final output at depth 15. Specifically, referring to fig. 9, fig. 9 is an optional schematic diagram of a feature extraction processing method provided in an embodiment of the present application, in fig. 9, a feature map having the same size as a maximum-scale feature map (a feature map that always maintains high resolution) is obtained by performing interpolation upsampling on all small-scale feature maps, for example, feature maps 902, 903, and 904 are respectively subjected to interpolation upsampling processing to obtain feature maps 905, 906, and 907 having the same size as feature map 901, and feature maps 901, 905, 906, and 907 are fused to obtain a final output feature map at depth 15, where the feature map is used to represent feature information of a target object image.
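By way of illustration, a small PyTorch sketch of the final fusion step: every smaller-scale map is interpolation-upsampled to the largest scale and fused. The channel counts, sizes, and additive fusion are illustrative simplifications for the sketch, not HRNet's actual configuration.

```python
import torch
import torch.nn.functional as F

# Feature maps from the four parallel branches at the last stage
# (shapes are illustrative only).
f1 = torch.rand(1, 32, 64, 64)   # highest-resolution branch, kept throughout
f2 = torch.rand(1, 32, 32, 32)
f3 = torch.rand(1, 32, 16, 16)
f4 = torch.rand(1, 32, 8, 8)

# Interpolation-upsample every smaller-scale map to the largest scale and fuse.
up = lambda f: F.interpolate(f, size=f1.shape[-2:], mode="bilinear",
                             align_corners=False)
fused = f1 + up(f2) + up(f3) + up(f4)   # fused feature map output at depth 15
print(fused.shape)                       # torch.Size([1, 32, 64, 64])
```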
Here, extracting features from the filled target object image makes it possible to detect all theoretically available key points in the target object image, so that the degree of incompleteness of the target object can be quantified from the relationship between the key point positions and the filled surrounding area.
In other embodiments, the feature extraction processing is performed based on the target object image, and the obtaining of the corresponding image feature is specifically realized by the following technical scheme: and directly carrying out feature extraction processing on the cut target object image to obtain corresponding image features.
In actual implementation, the feature extraction part of the HRnet directly performs feature extraction processing on the clipped target object image to obtain corresponding image features, where the feature extraction processing may refer to the feature extraction processing method in the embodiment of the present application, and details are not described herein.
Extracting features from the directly cropped target object image helps detect the key points that are actually present in it, makes the missing key points of the target object directly visible, and makes the degree of key point incompleteness easy to obtain.
Step 103: and performing key point identification processing based on the image characteristics to obtain key points and corresponding positions of the target object.
In some embodiments, when the image features are extracted from the target object image subjected to the filling processing, the key point identification processing is performed based on the image features, and obtaining the key points and the corresponding positions of the target object is specifically realized by the following technical solutions: calling the first key point detection model to execute the following processing: mapping the image features into probability maps of a plurality of channels, wherein the probability map of each channel corresponds to the probability distribution of one key point in a preset key point set, and the probability distribution is used for representing the probability that each pixel point in the target object image belongs to the key point corresponding to the probability map; the following processing is performed for each probability map: identifying the pixel point with the maximum probability in the probability map as a key point corresponding to the probability map, and identifying the position of the pixel point with the maximum probability as the position of the key point corresponding to the probability map; and combining the key points and the corresponding positions identified from each probability map to form a key point identification result of the target object. The key point identification result comprises a plurality of key points and corresponding positions, and the plurality of key points correspond to all key points in the preset key point set one by one.
Here, the first keypoint detection model may be the above keypoint identification model, for example, an HRnet model. Following from the above description, the feature extraction part of the first keypoint detection model is called to perform feature extraction processing on the target object image after the filling processing to obtain a corresponding feature map, and the keypoint identification part of the first keypoint detection model is called to perform keypoint identification processing based on the feature map.
In practical implementation, the keypoint identification part adds a convolution layer behind the feature map, and maps the image features into probability maps of a plurality of channels, wherein the number of the channels is set as the number of the keypoints in a preset keypoint set, so that the probability map of each channel corresponds to the probability distribution of one keypoint in the preset keypoint set.
In some embodiments, in each probability map corresponding to different channels, a pixel point with the highest probability in the probability map may be identified as a key point corresponding to the probability map, and a position of the pixel point with the highest probability may be identified as a position of the key point corresponding to the probability map, specifically, a coordinate of the key point may be determined by the following formula (1):
P_i = argmax(Heatmap_i)    (1)

where P_i is the coordinate of the i-th key point, and Heatmap_i is the prediction heat map, namely the probability map, corresponding to the i-th channel, where i is a positive integer and i ≥ 1.
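As an illustration of formula (1), the following NumPy sketch (the function name and the heatmap shape [K, H, W] are assumptions) recovers each key point as the coordinates of the maximum-probability pixel of its channel:

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps):
    """heatmaps: array of shape [K, H, W], one probability map per channel.
    Returns one (y, x) position per key point, following formula (1)."""
    positions = []
    for heatmap in heatmaps:
        # P_i = argmax(Heatmap_i): flatten, take argmax, map back to 2-D
        y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        positions.append((int(y), int(x)))
    return positions

heatmaps = np.random.rand(17, 64, 48)  # 17 channels for 17 human key points
print(keypoints_from_heatmaps(heatmaps))
```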
In other embodiments, each probability map corresponding to a different channel may be a ground-truth (binary) probability map, and the pixel point whose probability value in the ground-truth probability map is 1 may be identified as the key point corresponding to that probability map.
The following describes the training process of the first keypoint detection model. Before invoking the first keypoint detection model, a first training sample set is also obtained to train it. Here, the first training sample set includes occluded image samples and non-occluded image samples, both carrying keypoint position annotations. Part of the key points of the target object in the occluded image samples are occluded by the background, while the key points of the object in the non-occluded image samples are not occluded. The occlusion of part of the key points by the background can be any occlusion situation; for example, referring to the filling processing of the embodiment of the present application, uniform pixel values may be filled at the corresponding key points to serve as a background region that occludes them, and the positions of the key points in the occluded region are still annotated. The first keypoint detection model is trained through this training sample set so that the keypoint identification model can successfully identify key points in occluded areas and form the keypoint identification result of the target object, the result including a plurality of key points corresponding one-to-one to all key points in the preset keypoint set of the target object, together with their positions.
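One possible way to construct such occluded training samples is sketched below, assuming NumPy images in H x W x C layout; the patch size and fill value are illustrative assumptions, not values fixed by the application.

```python
import numpy as np

def occlude_keypoint(image, keypoint_xy, patch=24, fill_value=128):
    """Fill a uniform-pixel-value square over a key point so that it is
    occluded by an artificial background region; the key point position
    annotation is kept so the model learns to predict occluded points."""
    h, w = image.shape[:2]
    x, y = keypoint_xy
    x0, y0 = max(0, x - patch // 2), max(0, y - patch // 2)
    x1, y1 = min(w, x + patch // 2), min(h, y + patch // 2)
    occluded = image.copy()
    occluded[y0:y1, x0:x1] = fill_value  # uniform pixel values as background
    return occluded
```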
Here, the preset keypoint set is a preset set of keypoints representing the key information of the target object, for example, when the target object is a human body, the preset keypoint set for each human body is defined to include 17 keypoints, see fig. 10, where fig. 10 is an optional schematic diagram of the preset keypoint set provided in the embodiment of the present application, where the preset keypoint set is 17 keypoints of the human body, and the 17 keypoints are respectively a nose, left and right eyes, left and right ears, left and right shoulders, left and right elbows, left and right wrists, left and right crotches, left and right knees, and left and right ankles.
The first keypoint detection model identifies key points on the target object image after the filling processing, so that no keypoint information is discarded during identification: which keypoint information the target object is missing is then judged from the positions of all identified key points, and the integrity degree of the target object is obtained accurately.
In other embodiments, when the image feature is directly extracted from the target object image, the key point identification processing is performed based on the image feature, and the key point and the corresponding position of the target object are obtained specifically by the following technical solutions: calling a second key point detection model to execute the following processing: mapping the image characteristics into a probability map, wherein the probability map comprises the probability that each pixel point in the target object image corresponds to each key point in a preset key point set; aiming at each pixel point in the target object image, the following processing is executed: determining the maximum probability in the probability of the pixel point corresponding to each key point in the preset key point set; when the maximum probability exceeds a probability threshold, identifying the pixel point as a key point corresponding to the maximum probability, and identifying the position of the pixel point as the position of the key point corresponding to the maximum probability; combining the key points and the corresponding positions identified from the probability map to form a key point identification result of the target object; and the key point identification result comprises at least one key point in a preset key point set and a corresponding position.
Here, the second keypoint detection model may be the above keypoint identification model, for example, an HRnet model. Following from the above description, the feature extraction part of the second keypoint detection model is called to perform feature extraction processing on the directly cropped target object image to obtain a corresponding feature map, and the keypoint identification part of the second keypoint detection model is called to perform keypoint identification processing based on the feature map.
In actual implementation, the keypoint identification part adds a fully connected layer behind the feature map to flatten it, and maps the image features into a probability map; the probability map may be a probability matrix representing, for each pixel point, the probability that the pixel corresponds to each key point in the preset keypoint set. Here, the mapping may be a multi-class mapping in which the probability of each pixel point is expressed as the probability of belonging to each of a plurality of keypoint categories. For each pixel point, the maximum probability is determined and compared with a preset probability threshold: when the maximum probability exceeds the threshold, the keypoint category corresponding to the maximum probability is taken as the classification result of that pixel point; if it does not exceed the threshold, identification of the current pixel fails, representing that the pixel does not belong to any key point.
Here, the probability threshold may be set to 80%; for a maximum probability exceeding this threshold, the category corresponding to the maximum probability is generally the keypoint category to which the current pixel belongs or most likely belongs.
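The per-pixel multi-class thresholding described above can be sketched as follows; the array shape [H, W, K] and the choice of keeping the single most confident pixel per keypoint category are assumptions made for illustration.

```python
import numpy as np

def classify_pixels(probs, threshold=0.8):
    """probs: array of shape [H, W, K], the probability that each pixel
    belongs to each of the K key point categories.
    Returns {keypoint_index: (y, x)}; pixels whose maximum probability
    does not exceed the threshold belong to no key point."""
    best = {}
    max_probs = probs.max(axis=-1)        # maximum probability per pixel
    max_classes = probs.argmax(axis=-1)   # corresponding key point category
    ys, xs = np.where(max_probs > threshold)
    for y, x in zip(ys, xs):
        k = int(max_classes[y, x])
        # keep the single most confident pixel per key point category
        if k not in best or max_probs[y, x] > best[k][0]:
            best[k] = (float(max_probs[y, x]), (int(y), int(x)))
    return {k: pos for k, (p, pos) in best.items()}

# toy usage: random probabilities over 17 key point categories
probs = np.random.rand(64, 48, 17) * 0.5   # nothing exceeds 0.8 here
print(classify_pixels(probs))              # likely {}
```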
The following describes the training process of the second keypoint detection model. Before invoking the second keypoint detection model, a second training sample set is also obtained to train it. Here, the second training sample set is an image sample set containing the target object and is used for performing keypoint identification on the target object. The second training sample set includes incomplete object image samples and complete object image samples, where a complete object image sample contains all key points in the preset keypoint set of the target object, and an incomplete object image sample contains at least one key point in that set. The second keypoint detection model is trained through this training sample set so that it can directly perform feature extraction and keypoint identification processing on the cropped target object image and form the keypoint identification result of the target object, the result including at least one key point in the preset keypoint set of the target object and its position.
Performing keypoint identification on the directly cropped target object image with the second keypoint detection model allows the identified key points to be compared with the preset keypoint set of the target object, so that the missing keypoint information of the target object, and in turn its integrity degree, is obtained quickly and conveniently.
Step 104: and determining the integrity degree of the target object in the image to be recognized based on the key points and the corresponding positions of the target object.
In some embodiments, when the image features are extracted from the target object image subjected to the filling processing, based on the key points and the corresponding positions of the target object, determining the integrity of the target object in the image to be recognized is achieved by the following technical solutions: the following processing is performed for each keypoint of the target object: when the positions of the key points are located in the surrounding area for filling processing, determining that the key points are the key points missing in the image to be identified; removing missing key points from the identified key points of the target object to update the identified key points of the target object; taking the ratio of the number of the key points of the target object obtained by the updated identification to the number of the preset key points as the integrity degree of the target object in the image to be identified; wherein the preset number of keypoints is a keypoint count of a preset set of keypoints for the target object.
In practical implementation, when the image features are extracted from the target object image subjected to the filling processing, the keypoint identification result of the target object is obtained through keypoint identification detection; the result includes a plurality of key points corresponding one-to-one to all key points in the preset keypoint set of the target object, together with their positions. For each identified key point, whether it lies in the filled surrounding area or in the original target image area is determined from its coordinates in the image to be recognized. When a detected key point lies in the surrounding area 72 shown in fig. 7, the target object obtained through target detection does not actually carry the key information of that key point, so the current key point is determined to be a key point missing from the target object; the integrity degree of the target object in the image to be recognized is then judged by comparing the number of missing key points against the preset keypoint set of the target object.
For example, when the target object is a human body, 17 key points and their position information are obtained through keypoint recognition. Judging the relationship between the key points and the surrounding area shows that the key points of the left and right knees and the left and right ankles are located in the surrounding area, from which it can be inferred that the target human body is at least missing the parts below the knees. To quantify the degree of incompleteness of the target human body, the missing key points (left and right knees and left and right ankles) are removed from the identified key points, and the remaining key points are taken as the updated result (13 key points remain). The ratio of the number of updated key points (13) to the preset number of key points (17) is taken as the integrity degree of the target object in the image to be recognized.
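This padding-based completeness computation can be sketched in a few lines of Python; the function name, the border-width parameter, and the coordinate convention (positions given in the padded image) are assumptions for illustration.

```python
def completeness_from_padding(keypoints, padded_size, pad, preset_count=17):
    """keypoints: list of (x, y) positions in the padded image coordinates.
    A key point lying in the filled surrounding area counts as missing;
    completeness = remaining key points / preset key point count."""
    w, h = padded_size
    remaining = [(x, y) for x, y in keypoints
                 if pad <= x < w - pad and pad <= y < h - pad]
    return len(remaining) / preset_count

# e.g. knees and ankles (4 key points) fall in the border: 13 / 17 ≈ 0.765
```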
It should be noted that a key point in the embodiment of the present application may be a single pixel point or a point set composed of a plurality of pixel points; for example, the key point representing the nose may be composed of 7 pixel points fitting the nose contour. In the embodiment of the present application, the preset number of key points is described as the number of key points in the preset keypoint set of the target object, ignoring the point-set composition of each key point; in actual implementation, the point-set composition of each key point can be taken into account, enriching the keypoint count of the preset keypoint set so as to refine the ratio and determine the completeness of the target object based on the keypoint information, which is not repeated in the embodiment of the present application.
In some embodiments, a completeness threshold may be preset, the completeness threshold characterizing whether the target object contained in the image to be recognized is complete. The integrity degree is compared with the completeness threshold, and when the integrity degree exceeds the completeness threshold, the current target object is determined to be a complete object. In practical implementation, the threshold can be set flexibly to decide whether a target object is complete, so as to adapt to different business requirements.
By performing keypoint identification on the filled target object image, all theoretically obtainable key points can be located in the image; judging the positional relationship of each key point with the filled surrounding area and the original target image area makes the missing keypoint information explicit in the image, so the degree of incompleteness of the target object is obtained quickly.
In other embodiments, when the image features are extracted directly from the cropped target object image, determining the integrity degree of the target object in the image to be recognized based on the key points and corresponding positions of the target object is implemented by the following technical solution: taking the ratio of the number of key points of the target object obtained by the keypoint identification processing to the preset number of key points as the integrity degree of the target object in the image to be recognized; wherein the preset number of key points is the keypoint count of the preset keypoint set of the target object.
In actual implementation, following from the above, when the image features are extracted directly from the target object image, keypoint identification processing is performed based on the image features to obtain the keypoint identification result of the target object; the result includes at least one key point in the preset keypoint set and its position. Here, the key points obtained directly from the target object image cropped out of the image to be recognized may be all key points of the preset keypoint set, or only part of them. When all key points of the preset keypoint set are identified, the target object is generally considered complete (leaving occlusion aside); when only part of the key points are identified, comparing them with the preset keypoint set shows which keypoint information is missing, from which the integrity degree of the target object in the image to be recognized is obtained.
Illustratively, when the target object is a human body, 9 key points (nose, left eye, left ear, left shoulder, left elbow, left wrist, left crotch, left knee, and left ankle) and corresponding position information are obtained through key point recognition, and it can be inferred that the target human body lacks at least the right side of the body by comparing with 17 key points preset by the target human body. In order to quantify the incomplete degree of the target human body, the ratio of the number of the identified key points (9 key points) to the number of the preset key points (17 key points) is used as the complete degree of the target object in the image to be identified.
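For the direct-cropping branch, the comparison with the preset keypoint set reduces to a set difference, as in the following sketch; the identifier names are illustrative (the document's left and right crotches are rendered as hips here).

```python
PRESET = {"nose", "left_eye", "right_eye", "left_ear", "right_ear",
          "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
          "left_wrist", "right_wrist", "left_hip", "right_hip",
          "left_knee", "right_knee", "left_ankle", "right_ankle"}

identified = {"nose", "left_eye", "left_ear", "left_shoulder", "left_elbow",
              "left_wrist", "left_hip", "left_knee", "left_ankle"}

missing = PRESET - identified                 # the whole right side is missing
completeness = len(identified) / len(PRESET)  # 9 / 17 ≈ 0.53
print(sorted(missing), round(completeness, 2))
```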
Directly performing keypoint identification on the target object image cropped from the image to be recognized thus determines the integrity degree of the target object simply and conveniently: the actual key information of the target object is obtained, and comparing the identified key information with the theoretically obtainable key information quickly reveals the missing keypoint information, from which the integrity degree of the target object is determined.
In other embodiments, before determining the ratio of the number of the key points of the target object to the preset number of key points in step 104, the following process may be performed: carrying out occlusion recognition processing on each key point of the target object to determine occluded key points in the key points of the target object image; and removing the occluded key points from the key points of the target object so as to update the key points of the target object.
Here, it is fully considered that the keypoint identification processing of the embodiment of the present application can be affected when the target object is occluded by another object or by overlapping objects; therefore, occlusion identification processing is performed on the detected key points before the ratio of the number of key points of the target object to the preset number of key points is determined.
In some embodiments, the occlusion recognition processing for each key point of the target object may be implemented by the following specific scheme: for each key point, binary classification is performed on the pixel points characterizing the keypoint information (the classification distinguishes a background category from a foreground category, the background category being the occlusion category); the keypoint information is converted into a feature vector, the feature vector is mapped into the probability of belonging to the background category, and the key points whose background probability exceeds the background probability threshold are determined to be occluded key points, which are then removed from the key points of the target object.
In other embodiments, the occlusion recognition processing for each key point of the target object may also be implemented by the following specific embodiment: determining a region of a preset range in the image to be recognized based on the position of the key point, and extracting the image of the corresponding region as the keypoint image of that key point; extracting features of the keypoint image, and performing binary classification prediction on the keypoint image based on the extracted features; attributing the keypoint image to the foreground category or the occlusion category according to the binary classification result; and when the keypoint image belongs to the occlusion category, determining the corresponding key point to be an occluded key point, so as to remove it from the key points of the target object.
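A sketch of this patch-based occlusion recognition is given below in PyTorch; the patch size, the tiny classifier architecture, and the class indices are assumptions, and the model would of course need to be trained before use.

```python
import torch
import torch.nn as nn

class OcclusionClassifier(nn.Module):
    """Tiny binary classifier: foreground (visible) vs occlusion category."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, 2)  # two-class (binary) prediction

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def filter_occluded(image, keypoints, model, patch=32):
    """image: [1, 3, H, W]; keypoints: list of (x, y).
    Crop a patch of a preset range around each key point, classify it,
    and keep only the key points whose patch is foreground."""
    kept = []
    for (x, y) in keypoints:
        x0, y0 = max(0, x - patch // 2), max(0, y - patch // 2)
        crop = image[:, :, y0:y0 + patch, x0:x0 + patch]
        cls = model(crop).argmax(dim=1).item()  # 0: foreground, 1: occluded
        if cls == 0:
            kept.append((x, y))
    return kept
```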
In some embodiments, the occluded key points are removed from the key points of the target object to update the key points of the target object, and in step 104, the ratio of the number of updated key points of the target object to the preset number of key points is used as the integrity degree of the target object in the image to be recognized.
Illustratively, when the target object is a human body, 9 key points (nose, left eye, left ear, left shoulder, left elbow, left wrist, left crotch, left knee, and left ankle) are obtained after performing keypoint identification processing on the directly cropped human body image with the second keypoint detection model. After occlusion identification, three of these key points (nose, left eye, and left ear) are found to belong to the occlusion category. Comparing with the 17 preset key points of the target human body, it can be inferred that the target human body at least lacks the right side of the body and that its face is occluded. To quantify the degree of incompleteness of the target human body, the three occluded key points are removed from the identified key points, and the ratio of the number of updated key points (6) to the preset number of key points (17) is taken as the integrity degree of the target object in the image to be recognized. Directly comparing the key points obtained by keypoint identification with the preset number of key points yields the degree of incompleteness of the target object quickly; performing occlusion identification on each key point fully considers the actual occlusion situation of the target object and filters out occluded key points. When the degree of incompleteness is judged on the filtered key points, the integrity degree covers two dimensions, namely keypoint information substantially absent from the image to be recognized and keypoint information that is occluded, so the judgment of the integrity degree of the target object is more accurate; this has higher application value in scenarios that require the target object not only to appear in the image but also to be free of occlusion.
In the embodiment of the present application, explicit target detection is performed on the image to be recognized, the image containing the target object is extracted, and the subsequent keypoint identification processing is carried out once the target object is determined to be present; the degree of incompleteness of the target object is then judged accurately and flexibly by comparing the identified keypoint information with the theoretically identifiable keypoint information. Compared with a simple image-classification method, which can only output a binary complete/incomplete result, this enriches both the judgment conditions and the judgment results and can realize different application values in specific service scenarios. In addition, performing the detection of the target object first eliminates misjudgments caused by pictures that contain no target object at all, improving the accuracy and precision of the incompleteness judgment.
In some embodiments, the method of steps 101-104 may be applied to the following usage scenarios: when the image to be recognized is a candidate cover image of a media account and its integrity degree is lower than the integrity threshold, the image to be recognized is deleted from the candidate cover image set; and when the image to be recognized is carried in information to be recommended and its integrity degree is lower than the integrity threshold, recommendation of the image to be recognized is blocked, or its recommendation weight is reduced.
In practical implementation, when the image to be recognized is used for cover image selection of a media account, deleting the image from the candidate cover image set when its integrity degree is lower than the integrity threshold can be implemented by the following specific scheme: acquire the candidate cover image set of the media account (the set may be acquired in any way, for example from media video frames, from an image database related to the target media, or from images or videos uploaded by the user of the media account); acquire an image to be recognized from the candidate cover image set, and perform target object detection and incompleteness identification on it through steps 101-104 provided in the embodiment of the present application to obtain the integrity degree of the target object in the image. When the integrity degree is lower than the integrity threshold, the target object of the image to be recognized is determined to be incomplete, and the current image is deleted from the candidate cover image set and not used as a cover image; when the integrity degree is higher than the integrity threshold, the target object is determined to be complete, and the image can serve as a priority candidate for the cover image and enter the subsequent cover image selection or processing.
In practical implementation, when the image to be recognized is applied to an online recommendation system for image recommendation, blocking its recommendation or reducing its recommendation weight when its integrity degree is lower than the integrity threshold can be implemented by the following specific scheme: acquire the image to be recognized; perform target object detection and incompleteness identification through steps 101-104 provided in the embodiment of the present application to obtain the integrity degree of the target object in the image to be recognized; when the integrity degree is lower than the integrity threshold, the target object of the image to be recognized is determined to be incomplete, and the current image is filtered or blocked and not recommended, or its recommendation weight is reduced so as to reduce its chance of being recommended.
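Both scenarios reduce to a comparison against the completeness threshold; a hypothetical sketch follows, in which the threshold value and all function names are assumptions rather than anything fixed by the application.

```python
COMPLETENESS_THRESHOLD = 0.8  # assumed value; set per business scenario

def filter_cover_candidates(candidates, completeness_of):
    """Drop candidate cover images whose target object is too incomplete."""
    return [img for img in candidates
            if completeness_of(img) >= COMPLETENESS_THRESHOLD]

def recommendation_weight(image, base_weight, completeness_of):
    """Block, or down-weight, images carrying an incomplete target object."""
    if completeness_of(image) < COMPLETENESS_THRESHOLD:
        return 0.0  # blocked; alternatively return base_weight * 0.5
    return base_weight
```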
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The image recognition method provided by the application is described below taking the recognition of a human body in an image as an example: target detection is performed on the image to be recognized to obtain an image containing the human body, keypoint detection is performed on that image to obtain the key points and their positions, the degree of incompleteness of the human body is judged from the key points and positions, and the human-body-incompleteness result is obtained accurately and flexibly based on preset rules.
Referring to fig. 11, fig. 11 is a schematic flow chart of an image recognition method based on artificial intelligence according to an embodiment of the present application, specifically illustrating a method for performing human body incompleteness recognition on an image, which will be described with reference to steps 201 to 205.
Step 201: and acquiring an incomplete human body identification request.
In actual implementation, after an image to be recognized including a human body is acquired, an incomplete recognition request for the human body is received, and incomplete recognition processing is performed on the image in response to the request.
Step 202: and carrying out target detection by using the human body detection model to obtain a plurality of detection frames containing different objects in the image to be identified and the positions of the detection frames.
Here, the human body detection model may be the target detection model provided in the embodiment of the present application. Taking the Faster-RCNN model as an example, the model integrates feature extraction, candidate region extraction, detection frame regression, and category judgment into one network, which greatly improves the detection speed; see fig. 4, which shows the target detection flow: the image to be recognized is input, and the detection frame of the target region and the category of the target object are output.
Exemplarily, the image to be recognized is input into the convolutional layers (Conv layers) in fig. 4, which extract the image features of the image to be recognized; specifically, feature maps are extracted from the picture using a series of convolution (conv), nonlinear activation (relu), and pooling operations. The region proposal network (RPN) in fig. 4 is mainly used to generate candidate boxes; specifically, the RPN obtains a plurality of anchors and performs classification processing and bounding box regression (bounding box regression) on them to obtain a plurality of candidate boxes (region proposals). The feature map and the candidate boxes are input into the region-of-interest pooling (RoI pooling) layer in fig. 4: before entering the RoI pooling layer, the candidate boxes are mapped onto the feature map to obtain a feature map or feature matrix corresponding to each candidate box, and the RoI pooling layer performs maximum-pooling downsampling on the feature map of each candidate box to obtain candidate feature maps (proposal feature maps) of the same size. Specifically, the input feature maps or matrices of different proportions and scales are divided into 7 equal parts in both the horizontal and vertical directions to obtain 49 regions, and maximum-value downsampling is performed on each region, yielding a plurality of candidate feature maps of the same size. The candidate feature maps are then sent into two subsequent parallel fully connected layers to respectively judge the target category and perform candidate bounding box position regression.
Specifically, the Classifier layer shown in fig. 4 is the fully connected layer of one branch, configured to perform multi-class prediction on the candidate feature maps and map them, through softmax functions, into a probability matrix over the classes; the fully connected layer of the other branch performs candidate bounding box position regression, obtains the position offset relative to the pre-annotated target detection box, and adjusts the candidate box position according to the offset. Combining the classification results, candidate boxes that belong to the same category and largely overlap are removed, and together with the bounding box regression this yields a plurality of target detection boxes containing different objects and the final accurate positions of the corresponding detection boxes.
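For illustration only, this detection step can be approximated with the off-the-shelf Faster-RCNN shipped in torchvision; the library choice, the score threshold, and the use of COCO class index 1 for the person category are assumptions, not the implementation of the present application.

```python
import torch
import torchvision

# pretrained Faster-RCNN: feature extraction, RPN, RoI pooling and the two
# parallel heads (category judgment + bounding box regression) in one network
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

def detect_persons(image_tensor, score_threshold=0.7):
    """image_tensor: float tensor [3, H, W] scaled to [0, 1].
    Returns detection boxes [x1, y1, x2, y2] for the person category."""
    with torch.no_grad():
        output = model([image_tensor])[0]
    return [box.tolist()
            for box, label, score in zip(output["boxes"], output["labels"],
                                         output["scores"])
            if label.item() == 1 and score.item() >= score_threshold]
```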
Step 203: selecting a detection frame containing a human body, cropping the corresponding human body region from the image to be recognized to obtain a human body image containing a single human body, and performing surround filling processing on the human body image to obtain the filled human body image.
In practical implementation, after target detection by the target detection model, the categories and detection frames of a plurality of objects in the image to be recognized are obtained. The objects whose category is human body, together with their corresponding detection frames, are found out, and according to the positions and sizes of the detection frames, single human bodies are cut out from the corresponding positions in the original image, so that a plurality of human body pictures each containing a single human body are obtained. Surround filling processing is then performed on each human body picture; specifically, as shown in fig. 7, a ring of uniform pixel values is filled in the surrounding area of the original human body image containing the human body, so as to obtain the filled human body image.
Here, the uniform pixel value is a predetermined specific pixel value, and may be an empirical pixel value, for example: average pixel values calculated in a large image recognition database (imagenet), etc.
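A sketch of the cropping and surround-filling step is given below using OpenCV and NumPy; the border width and the specific mean pixel value are assumptions for illustration.

```python
import cv2

# an assumed mean pixel value in BGR order (cf. the ImageNet average)
MEAN_PIXEL_BGR = (104, 117, 123)

def crop_and_pad(image, box, pad=32):
    """Cut the human body region given by box = (x1, y1, x2, y2) out of
    the image, then fill a ring of uniform pixel values around it."""
    x1, y1, x2, y2 = (int(v) for v in box)
    person = image[y1:y2, x1:x2]
    return cv2.copyMakeBorder(person, pad, pad, pad, pad,
                              cv2.BORDER_CONSTANT, value=MEAN_PIXEL_BGR)
```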
Step 204: and performing key point detection on the filled human body image by using the human body key point model to obtain key points and corresponding positions of the human body.
Here, the human body keypoint model may be a keypoint identification model, a first keypoint detection model, or the like provided in the embodiment of the present application.
In practical implementation, taking HRnet as the human body keypoint model as an example: HRnet adopts a parallel design in which one image-processing branch is kept at high resolution throughout feature extraction. Processing is divided into a plurality of stages; at the beginning of each stage, a feature map with a smaller resolution (and smaller scale) is added and a branch of the corresponding resolution is created, and interpolation upsampling and convolution downsampling ensure that the feature map of each branch at the beginning of each stage combines the features of the different-scale feature maps of the previous stage. After several stages of processing, all small-scale feature maps are interpolation-upsampled to the size of the largest-scale feature map (the branch that always keeps high resolution). The resulting feature map is mapped into probability maps of different channels, the probability map of each channel corresponding to the probability distribution of one key point in the preset human keypoint set; the position of the maximum value in each channel is taken as the key point position predicted by that channel, and the predictions of all channels are combined to obtain all key points of the preset keypoint set and their position information.
Here, it is defined that each of the preset key point sets of the human body includes 17 key points, and referring to fig. 10, fig. 10 shows that the 17 key points of the human body are respectively a nose, left and right eyes, left and right ears, left and right shoulders, left and right elbows, left and right wrists, left and right crotches, left and right knees, and left and right ankles.
It should be noted that most human body keypoint detection models first downsample to reduce the resolution and then upsample to restore it before producing the final result; this loses spatial information to a certain extent and introduces a certain quantization error when the recognition result is finally computed. To obtain a higher-resolution feature map at the end, HRnet adopts a parallel design that always keeps one high-resolution branch, preserving rich spatial information and reducing the quantization error of the detection result.
Step 205: and judging the incompleteness degree of the human body based on a preset rule.
In practical implementation, if the detected positions of the key points are located in the filled surrounding area, it is determined that the key points of the current human body part are cut off and are not in the original human body image. And sequentially judging the positions and the regions of all the key points to obtain whether all the detected key points are in the filled surrounding region or the original human body image region, and obtaining the human body incomplete degree based on a preset rule according to the information of all the key points.
Here, the degree of representing the human body incomplete may be quantified by using a preset rule, for example, a ratio of the number of key points of the target object located in the original human body image to the preset number of key points is used as the degree of completeness of the target object in the image to be recognized.
In practical implementation, if the detected key points are more in the peripheral area subjected to the filling processing, the human body is incomplete to a greater extent, and conversely, if the detected key points are more in the original image area, the human body is incomplete to a lesser extent.
Because a conventional image-classification-based method can only output a binary complete/incomplete result for the picture and makes no explicit judgment on whether a human body exists in the picture at all, pictures without a human body can be misjudged. Moreover, an image classification model cannot produce an intermediate result for the degree of human body incompleteness, so it can hardly meet the requirements of different service scenarios; once a service scenario changes the definition of human body incompleteness, the model must be retrained, which lacks flexibility.
The incomplete human body recognition method based on keypoint detection provided by the embodiment of the application first judges whether a human body exists in the picture to be recognized, avoiding misjudging the integrity of pictures that contain no human body; the degree of human body incompleteness is then obtained from whether each key point lies in the filled surrounding area, and the final judgment of human body integrity is made according to that degree. By adjusting the preset rules and completeness thresholds for different business scenarios, incompleteness recognition of pictures can be applied flexibly in various business scenarios.
Referring to fig. 12, fig. 12 is a schematic flowchart illustrating a process of applying the image recognition method provided by the embodiment of the present application to media account cover map selection. This is explained in conjunction with steps 301 to 305 below.
Step 301: teletext/video content is acquired.
Specifically, cover image selection for a media account can be applied to cover image selection in Tencent Kandian and in video accounts. For example, when an article/video of sports information is displayed in an online video client, if the current sports information needs to show figures such as an athlete or an event host, the cover image of the information should at least ensure that the human body information of the athlete or event host is complete.
In practical implementation, the image-text/video content uploaded by the media account user is obtained, wherein the image-text/video content provides the material of the cover page.
Step 302: and acquiring a plurality of cover picture candidate pictures.
In practical implementation, a plurality of cover picture candidate pictures can be obtained in the image-text/video content in any mode. For example, it is selected or clipped from a teletext material, clipped from a video frame, etc.
Step 303: and selecting a cover picture candidate picture, carrying out incomplete human body recognition processing on the cover picture candidate picture, and judging whether a human body contained in the candidate picture is complete or not.
The object detection model and the key point identification model provided by the embodiment of the application are adopted to carry out human body identification and incompleteness detection on the cover picture candidate picture, so as to obtain the identification result of whether the human body is complete or not. If the human body is complete, go to step 305, and if the human body is not complete, go to step 304.
Step 304: and removing the current cover picture candidate picture from the plurality of cover picture candidate pictures.
Step 305: and processing the cover picture candidate pictures.
Here, the cover image candidate picture processing is performed, and the current cover image candidate picture can be used as a priority candidate image of the cover image, and the subsequent selection processing is performed or the cover image processing stage is directly performed.
Referring to fig. 13, fig. 13 is a schematic flowchart illustrating that the image recognition method provided by the embodiment of the present application is applied to a recommendation system. This is explained in conjunction with steps 401 to 404 below.
Step 401: teletext/video content is acquired.
Specifically, the recommendation system can be applied to QQ Kandian and WeChat official accounts to recommend picture information of interest to users. For example, when recommending pictures of a sports event to a sports enthusiast, if the current picture needs to show figures such as a player or the event host, the integrity of the player's or host's information in the picture should at least be ensured.
Here, the teletext/video content may be a content pre-stored in a recommendation pool of the recommendation system, or a content uploaded by a user of the system may be received in real time, or the like.
Step 402: selecting a picture to be recommended, carrying out incomplete human body identification processing on the picture to be recommended, and judging whether a human body contained in the picture to be recommended is complete or not.
Here, the target detection model and the key point identification model provided by the embodiment of the application are used for carrying out human body identification and incompleteness detection on the picture to be recommended, so as to obtain the identification result of whether the human body is complete or not. If the human body is complete, go to step 404, and if the human body is incomplete, go to step 403.
Step 403: not enabling or shielding the current picture to be recommended, or reducing the recommended weight of the picture to be recommended.
Step 404: and (4) putting the image to be identified into a recommendation pool, and waiting for recommendation processing.
With the human body incompleteness identification method described above, whether a human body exists in the picture to be recognized is judged first, avoiding misjudgment of the integrity of pictures without a human body, and the degree of human body incompleteness is obtained quickly and flexibly according to whether the key points lie in the filled peripheral area. In practical application, the human body incompleteness identification method provided by the embodiment of the application can supply higher-quality candidate images for cover image selection and higher-quality content for online recommendation systems, noticeably improving the user experience.
Continuing with the exemplary structure of the artificial intelligence based image recognition device 255 provided by the embodiments of the present application as software modules, in some embodiments, as shown in fig. 2, the software modules stored in the artificial intelligence based image recognition device 255 of the memory 250 may include: a target detection module 2551, configured to perform target detection processing on the image to be recognized, so as to obtain a target object image from the image to be recognized; a feature extraction module 2552, configured to perform feature extraction processing based on the target object image to obtain corresponding image features; a key point identification module 2553, configured to perform key point identification processing based on image features to obtain key points and corresponding positions of the target object; and the integrity judging module 2554 is configured to determine the integrity degree of the target object in the image to be recognized based on the key points and the corresponding positions of the target object.
In some embodiments, the target detection module 2551 is further configured to: carrying out target detection processing on an image to be recognized to obtain a detection frame comprising a target object; and cutting out the target object image from the image to be recognized based on the position of the detection frame.
In some embodiments, the target detection module 2551 is further configured to: extracting the features of the image to be identified to obtain a corresponding feature map; determining a plurality of candidate frames in an image to be identified; mapping the candidate frames to the feature maps to obtain a plurality of corresponding candidate feature maps; performing maximum pooling on the candidate feature maps to obtain a plurality of candidate area maps with the same size; and carrying out classification processing and candidate frame position regression processing on the multiple candidate region maps to obtain a detection frame comprising the target object.
In some embodiments, the feature extraction module 2552 is further to: and filling the surrounding area of the target object image, and performing feature extraction processing on the filled target object image to obtain corresponding image features, or directly performing feature extraction processing on the cut target object image to obtain corresponding image features.
In some embodiments, when the image features are extracted from the target object image subjected to the filling process, the keypoint identification module 2553 is further configured to: calling the first key point detection model to execute the following processing: mapping the image features into probability maps of a plurality of channels, wherein the probability map of each channel corresponds to the probability distribution of one key point in a preset key point set, and the probability distribution is used for representing the probability that each pixel point in the target object image belongs to the key point corresponding to the probability map; the following processing is performed for each probability map: identifying the pixel point with the maximum probability in the probability map as a key point corresponding to the probability map, and identifying the position of the pixel point with the maximum probability as the position of the key point corresponding to the probability map; combining the key points and the corresponding positions identified from each probability map to form a key point identification result of the target object; the key point identification result comprises a plurality of key points and corresponding positions, and the plurality of key points correspond to all key points in the preset key point set one by one.
In some embodiments, when the image features are extracted directly from the cropped target object image, the keypoint identification module 2553 is further configured to: call the second keypoint detection model to execute the following processing: mapping the image features into a probability map, wherein the probability map comprises the probability that each pixel point in the target object image corresponds to each key point in a preset keypoint set; performing the following processing for each pixel point in the target object image: determining the maximum probability among the probabilities of the pixel point corresponding to each key point in the preset keypoint set; when the maximum probability exceeds a probability threshold, identifying the pixel point as the key point corresponding to the maximum probability, and identifying the position of the pixel point as the position of that key point; and combining the key points and corresponding positions identified from the probability map to form a keypoint identification result of the target object, wherein the keypoint identification result comprises at least one key point in the preset keypoint set and its corresponding position.
In some embodiments, when the image features are extracted from the target object image subjected to the filling process, the completeness discrimination module 2554 is further configured to: the following processing is performed for each keypoint of the target object: when the positions of the key points are located in the surrounding area for filling processing, determining that the key points are the key points missing in the image to be identified; removing missing key points from the identified key points of the target object to update the identified key points of the target object; taking the ratio of the number of the key points of the target object obtained by the updated identification to the number of the preset key points as the integrity degree of the target object in the image to be identified; wherein the preset number of keypoints is a keypoint count of a preset set of keypoints for the target object.
In some embodiments, when the image features are extracted directly from the cropped target object image, the completeness discrimination module 2554 is further configured to: take the ratio of the number of key points of the target object obtained by the keypoint identification processing to the preset number of key points as the integrity degree of the target object in the image to be recognized; wherein the preset number of key points is the keypoint count of the preset keypoint set of the target object.
In some embodiments, before determining the ratio, the completeness discrimination module 2554 is further configured to: carrying out occlusion recognition processing on each key point of the target object to determine occluded key points in the key points of the target object image; and removing the occluded key points from the key points of the target object so as to update the key points of the target object.
In some embodiments, the artificial intelligence based image recognition apparatus further comprises: a processing module 2555, configured to delete the image to be recognized from the candidate cover image set when the image to be recognized is a candidate cover image of a media account and its integrity degree is lower than the integrity threshold; and to block recommendation of the image to be recognized, or reduce its recommendation weight, when the image to be recognized is carried in information to be recommended and its integrity degree is lower than the integrity threshold.
It should be noted that, the description of the image recognition apparatus based on artificial intelligence in the embodiment of the present application is similar to the description of the image recognition method based on artificial intelligence, and has similar beneficial effects to the image recognition method based on artificial intelligence, and therefore, the description thereof is omitted.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the image recognition method based on artificial intelligence provided by the embodiment of the application when the executable instructions stored in the memory are executed.
The embodiment of the application provides a computer-readable storage medium, which stores executable instructions and is used for realizing the image identification method based on artificial intelligence provided by the embodiment of the application when being executed by a processor.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the artificial intelligence based image recognition method according to the embodiment of the application.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions, which when executed by a processor, will cause the processor to perform a method provided by embodiments of the present application, for example, the method illustrated in fig. 3A and 3B.
In some embodiments, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, for example, in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, the embodiment of the application performs explicit target detection on the image to be recognized and extracts the image containing the target object for keypoint identification processing; by comparing the identified keypoint information with the theoretically identifiable keypoint information, the degree of incompleteness of the target object is judged accurately and flexibly, the precision and accuracy of the incompleteness judgment are improved, and different application values can be realized in specific service scenarios.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.
Claims (10)
1. An image recognition method based on artificial intelligence, characterized by comprising:
carrying out target detection processing on an image to be recognized so as to obtain a target object image from the image to be recognized;
performing feature extraction processing based on the target object image to obtain corresponding image features;
performing key point identification processing based on the image features to obtain key points and corresponding positions of the target object;
and determining the integrity degree of the target object in the image to be recognized based on the key points and the corresponding positions of the target object.
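By way of example, the four steps of claim 1 can be wired together as in the following minimal Python sketch; the helper callables (detect_target, extract_features, detect_keypoints) and all names are illustrative assumptions, not part of the claimed method.

```python
import numpy as np

def integrity_of_target(image: np.ndarray,
                        detect_target,          # step 1: assumed detector callable
                        extract_features,       # step 2: assumed feature extractor
                        detect_keypoints,       # step 3: assumed key point model
                        num_preset_keypoints: int) -> float:
    """Illustrative pipeline; each stage is supplied by the caller."""
    target_image = detect_target(image)          # target object image
    features = extract_features(target_image)    # corresponding image features
    keypoints = detect_keypoints(features)       # list of (index, x, y) key points
    # Step 4: integrity degree = recognized key points / preset key points.
    return len(keypoints) / num_preset_keypoints
```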
2. The method of claim 1, wherein the performing target detection processing on the image to be recognized to obtain the target object image from the image to be recognized comprises:
carrying out target detection processing on an image to be recognized to obtain a detection frame comprising a target object;
cutting out a target object image from the image to be recognized based on the position of the detection frame;
the feature extraction processing based on the target object image to obtain corresponding image features includes:
padding the surrounding area of the target object image, and performing feature extraction processing on the padded target object image to obtain corresponding image features; or,
and directly carrying out feature extraction processing on the cut target object image to obtain corresponding image features.
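By way of example, the cutting-out and optional padding of claim 2 might be sketched as follows; the (H, W, C) image layout, zero-valued padding, and the pad width are assumptions the claim does not fix.

```python
import numpy as np

def crop_target(image: np.ndarray, box, pad: int = 0) -> np.ndarray:
    # box is the detection frame as (x0, y0, x1, y1) pixel coordinates.
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1]  # cut the target object image out
    if pad > 0:
        # Pad the surrounding area (assumed constant zero padding) so that
        # key points landing in this border can later be flagged as missing.
        crop = np.pad(crop, ((pad, pad), (pad, pad), (0, 0)),
                      mode="constant", constant_values=0)
    return crop
```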
3. The method according to claim 2, wherein performing the target detection processing on the image to be recognized to obtain the detection frame comprising the target object comprises:
extracting features of the image to be recognized to obtain a corresponding feature map;
determining a plurality of candidate frames in the image to be recognized;
mapping the candidate frames to the feature map to obtain a plurality of corresponding candidate feature maps;
performing maximum pooling on the candidate feature maps to obtain a plurality of candidate area maps with the same size;
and carrying out classification processing and candidate frame position regression processing on the plurality of candidate region maps to obtain a detection frame comprising the target object.
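By way of example, the maximum pooling of claim 3, which brings differently sized candidate feature maps to one common size, might be sketched as follows; the (H, W, C) layout and the 7×7 output grid are assumptions, and production detectors usually add sub-pixel binning and batching over many candidate frames.

```python
import numpy as np

def roi_max_pool(feature_map: np.ndarray, roi, out_size: int = 7) -> np.ndarray:
    # feature_map: (H, W, C) array; roi: a candidate frame (x0, y0, x1, y1)
    # already mapped into feature-map coordinates.
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1, x0:x1]
    h, w, c = region.shape
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.zeros((out_size, out_size, c), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Each output cell is the maximum over one bin of the region, so
            # candidate regions of any size pool to out_size x out_size.
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[i, j] = cell.max(axis=(0, 1))
    return out
```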
4. The method of claim 1, wherein when the image feature is extracted from the target object image subjected to the padding processing, performing the key point identification processing based on the image feature to obtain the key points and the corresponding positions of the target object comprises:
calling a first key point detection model to perform the following processing:
mapping the image features into probability maps of a plurality of channels, wherein the probability map of each channel corresponds to the probability distribution of one key point in a preset key point set, and the probability distribution is used for representing the probability that each pixel point in the target object image belongs to the key point corresponding to the probability map;
the following processing is performed for each probability map: identifying the pixel point with the maximum probability in the probability map as a key point corresponding to the probability map, and identifying the position of the pixel point with the maximum probability as the position of the key point corresponding to the probability map;
combining the key points and the corresponding positions identified from each probability map to form a key point identification result of the target object;
wherein the key point identification result comprises a plurality of key points and corresponding positions, the key points being in one-to-one correspondence with the key points in the preset key point set.
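By way of example, decoding the per-channel probability maps of claim 4, where each channel's maximum-probability pixel becomes one key point, might look as follows; the (K, H, W) layout is an assumption.

```python
import numpy as np

def decode_per_channel_maps(prob_maps: np.ndarray):
    # prob_maps: (K, H, W), one probability map per key point in the preset set.
    k, h, w = prob_maps.shape
    keypoints = []
    for idx in range(k):
        flat = int(prob_maps[idx].argmax())  # pixel with the maximum probability
        y, x = divmod(flat, w)               # recover its position
        keypoints.append((idx, x, y))
    return keypoints  # always K key points, one per preset key point
```

Because every channel contributes exactly one maximum, this variant always reports the full preset key point set, which is why claim 6 later removes the key points that fall into the padded border.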
5. The method of claim 1, wherein when the image feature is directly extracted from the target object image, performing the key point identification processing based on the image feature to obtain the key points and the corresponding positions of the target object comprises:
calling a second key point detection model to execute the following processing:
mapping the image features into a probability map, wherein the probability map comprises the probability that each pixel point in the target object image corresponds to each key point in a preset key point set;
performing the following processing for each pixel point in the target object image:
determining the maximum probability in the probabilities of the pixel points corresponding to all the key points in the preset key point set;
when the maximum probability exceeds a probability threshold, identifying the pixel point as a key point corresponding to the maximum probability, and identifying the position of the pixel point as the position of the key point corresponding to the maximum probability;
combining the key points identified from the probability map and the corresponding positions to form a key point identification result of the target object;
wherein the key point identification result comprises at least one key point in the preset key point set and the corresponding position.
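By way of example, decoding the single shared probability map of claim 5 might look as follows; the (H, W, K) layout and keeping only the strongest pixel per key point (the claim itself identifies every pixel that passes the threshold) are assumptions.

```python
import numpy as np

def decode_shared_map(prob_map: np.ndarray, threshold: float = 0.5):
    # prob_map: (H, W, K), per pixel a probability for each preset key point.
    h, w, k = prob_map.shape
    best = {}  # key point index -> (probability, x, y)
    for y in range(h):
        for x in range(w):
            idx = int(prob_map[y, x].argmax())  # maximum over the K key points
            p = float(prob_map[y, x, idx])
            if p > threshold and (idx not in best or p > best[idx][0]):
                best[idx] = (p, x, y)
    # May contain fewer than K key points: missing ones never pass the threshold.
    return [(idx, x, y) for idx, (p, x, y) in sorted(best.items())]
```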
6. The method of claim 1, wherein when the image feature is extracted from the target object image subjected to the padding processing, determining the integrity degree of the target object in the image to be recognized based on the key points and the corresponding positions of the target object comprises:
performing the following processing for each key point of the target object:
when the position of the key point is located in the surrounding area used for the padding processing, determining that the key point is missing from the image to be recognized;
removing the missing key points from the identified key points of the target object to update the identified key points of the target object;
taking the ratio of the number of the updated key points of the target object to the preset key point number as the integrity degree of the target object in the image to be recognized;
wherein the preset key point number is the number of key points in the preset key point set of the target object.
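By way of example, claim 6's removal of key points that fall in the padded surrounding area, followed by the integrity ratio, might look as follows; positions are assumed to be in the padded image's coordinate frame, matching the crop_target sketch above.

```python
def integrity_with_padding(keypoints, pad: int, crop_w: int, crop_h: int,
                           num_preset: int) -> float:
    # keypoints: (index, x, y) tuples in the padded image's coordinates.
    # Key points inside the padded border are treated as missing and removed.
    kept = [(idx, x, y) for (idx, x, y) in keypoints
            if pad <= x < pad + crop_w and pad <= y < pad + crop_h]
    return len(kept) / num_preset  # integrity degree in [0, 1]
```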
7. The method of claim 1, wherein when the image feature is directly extracted from the target object image, determining the integrity degree of the target object in the image to be recognized based on the key points and the corresponding positions of the target object comprises:
taking the ratio of the number of the key points of the target object obtained by the key point identification processing to the preset key point number as the integrity degree of the target object in the image to be recognized;
wherein the preset key point number is the number of key points in the preset key point set of the target object.
8. The method of claim 6 or 7, wherein prior to determining the ratio, the method further comprises:
performing occlusion recognition processing on each key point of the target object to determine occluded key points among the key points of the target object;
removing the occluded key points from the key points of the target object to update the key points of the target object.
9. The method according to any one of claims 1 to 8, further comprising:
when the image to be recognized is a candidate cover image of a media account and the integrity degree of the image to be recognized is lower than an integrity degree threshold value, deleting the image to be recognized from a candidate cover image set;
and when the image to be recognized is carried in information to be recommended and the integrity degree of the image to be recognized is lower than the integrity degree threshold, blocking recommendation of the image to be recognized or reducing the recommendation weight of the image to be recognized.
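By way of example, the two service-side decisions of claim 9 might reduce to a rule such as the following; the return labels and function name are illustrative only.

```python
def cover_policy(integrity: float, threshold: float,
                 is_candidate_cover: bool) -> str:
    # Below-threshold candidate cover images are deleted from the candidate
    # set; below-threshold images carried in information to be recommended
    # are blocked from recommendation or have their weight reduced.
    if integrity >= threshold:
        return "keep"
    return "delete_from_candidate_set" if is_candidate_cover else "block_or_downweight"
```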
10. An image recognition apparatus based on artificial intelligence, comprising:
the target detection module is used for carrying out target detection processing on the image to be recognized so as to acquire a target object image from the image to be recognized;
the feature extraction module is used for performing feature extraction processing based on the target object image to obtain corresponding image features;
the key point identification module is used for performing key point identification processing based on the image features to obtain key points and corresponding positions of the target object;
and the integrity judging module is used for determining the integrity degree of the target object in the image to be recognized based on the key points and the corresponding positions of the target object.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110238449.1A CN113705294A (en) | 2021-03-04 | 2021-03-04 | Image identification method and device based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113705294A (en) | 2021-11-26
Family
ID=78647849
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110238449.1A (CN113705294A, pending) | Image identification method and device based on artificial intelligence | 2021-03-04 | 2021-03-04
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113705294A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115214430A (en) * | 2022-03-23 | 2022-10-21 | 广州汽车集团股份有限公司 | Vehicle seat adjusting method and vehicle |
CN115214430B (en) * | 2022-03-23 | 2023-11-17 | 广州汽车集团股份有限公司 | Vehicle seat adjusting method and vehicle |
CN114758124A (en) * | 2022-03-30 | 2022-07-15 | 北京奇艺世纪科技有限公司 | Occlusion detection method, device, equipment and computer readable medium for target object |
CN114581959A (en) * | 2022-05-09 | 2022-06-03 | 南京安元科技有限公司 | Work clothes wearing detection method based on clothes style feature extraction |
CN115359265A (en) * | 2022-08-18 | 2022-11-18 | 腾讯科技(深圳)有限公司 | Key point extraction method, device, equipment and storage medium |
CN116468654A (en) * | 2023-02-01 | 2023-07-21 | 北京纳通医用机器人科技有限公司 | Image processing method, device, equipment and storage medium |
Similar Documents
Publication | Title
---|---
CN112052787B | Target detection method and device based on artificial intelligence and electronic equipment
CN109948497B | Object detection method and device and electronic equipment
CN113705294A | Image identification method and device based on artificial intelligence
CN111814902A | Target detection model training method, target identification method, device and medium
CN109272509B | Target detection method, device and equipment for continuous images and storage medium
CN112734775B | Image labeling, image semantic segmentation and model training methods and devices
CN110517246B | Image processing method and device, electronic equipment and storage medium
JP6330385B2 | Image processing apparatus, image processing method, and program
CN112052781A | Feature extraction model training method, face recognition device, face recognition equipment and medium
CN112381837B | Image processing method and electronic equipment
CN111160335A | Image watermarking processing method and device based on artificial intelligence and electronic equipment
CN111368758A | Face ambiguity detection method and device, computer equipment and storage medium
CN107871314B | Sensitive image identification method and device
CN112418195B | Face key point detection method and device, electronic equipment and storage medium
CN111340195A | Network model training method and device, image processing method and storage medium
CN110569731A | Face recognition method and device and electronic equipment
CN111461070B | Text recognition method, device, electronic equipment and storage medium
CN110378837A | Object detection method, device and storage medium based on fish-eye camera
CN112733802A | Image occlusion detection method and device, electronic equipment and storage medium
CN110046574A | Safety helmet wearing recognition method and equipment based on deep learning
CN107516102B | Method, device and system for classifying image data and establishing classification model
CN112836625A | Face living body detection method and device and electronic equipment
CN113239875B | Method, system and device for acquiring face characteristics and computer readable storage medium
CN112633159A | Human-object interaction relation recognition method, model training method and corresponding device
CN109543629B | Blink identification method, device, equipment and readable storage medium
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination