CN113837236A - Method and device for identifying target object in image, terminal equipment and storage medium


Info

Publication number
CN113837236A
CN113837236A
Authority
CN
China
Prior art keywords
image
target object
preset
transition region
edge transition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111013970.1A
Other languages
Chinese (zh)
Other versions
CN113837236B (en)
Inventor
邓立邦 (Deng Libang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Zhimeiyuntu Tech Corp ltd
Original Assignee
Guangdong Zhimeiyuntu Tech Corp ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Zhimeiyuntu Tech Corp Ltd
Priority to CN202111013970.1A
Publication of CN113837236A
Application granted
Publication of CN113837236B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, an apparatus, a terminal device and a storage medium for identifying a target object in an image. The method comprises: acquiring an image to be recognized; and inputting the image to be recognized into a preset image target recognition model, so that the model recognizes whether the image contains a preset target object. The image target recognition model is trained with a preset neural network on a first main body image of the preset target object in an original image, a first edge transition region image of the preset target object, and the semantic annotation images corresponding to the first main body image and to the first edge transition region image. By implementing the embodiments of the invention, the accuracy of target object identification in images can be improved even when few samples are available.

Description

Method and device for identifying target object in image, terminal equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for identifying a target object in an image, a terminal device, and a storage medium.
Background
In the field of artificial-intelligence image recognition, training an existing image target recognition model requires collecting images of the target object in different scenes and from different angles, segmenting the target object from the background in each image, and training on the segmented main body images of the target object. This scheme requires a large number of samples, and the accuracy of the resulting target recognition model is poor when large amounts of sample data are unavailable.
Disclosure of Invention
The embodiment of the invention provides a method and a device for identifying a target object in an image, a terminal device and a storage medium, which can improve the accuracy of identifying the target object in the image under the condition of few samples.
The invention provides a method for identifying a target object in an image, which comprises the following steps:
acquiring an image to be identified;
inputting the image to be recognized into a preset image target recognition model so that the image target recognition model recognizes whether the image to be recognized contains a preset target object;
the image target recognition model is formed by training through a preset neural network based on a first main body image of a preset target object in an original image, a first edge transition region image of the preset target object, a semantic annotation image corresponding to the first main body image and a semantic annotation image corresponding to the first edge transition region image.
In this embodiment, a main body image and an edge transition region image of the target object are extracted as training samples. The main body image represents characteristics of the target object such as color, shape and texture, while the edge transition region image represents the characteristics of the boundary between the target object's edge and the image background. Even with few samples, or a single sample, the trained model can therefore identify the target object accurately from both sets of characteristics, which improves the accuracy of image target recognition in few-sample and single-sample settings.
In a preferred embodiment, the method for constructing the image object recognition model comprises the following steps:
acquiring a first main body image and a first edge transition region image of a preset target object in the original image;
obtaining a semantic annotation image corresponding to the first main body image to obtain a second main body image; obtaining a semantic annotation image corresponding to the first edge transition region image to obtain a second edge transition region image;
inputting the first main body image, the first edge transition region image, the second main body image and the second edge transition region image into a preset GAN neural network, performing alternate iterative training on a generator and a discriminator in the GAN neural network, and taking the generator after training as the image target recognition model.
In this embodiment, adversarial training is performed with a GAN neural network. Compared with other generative models, only backpropagation is used and no complex Markov chain is needed, which reduces the difficulty of training the model.
In a preferred embodiment, the generator comprises: a plurality of levels of hidden layers;
when the generator is trained, extracting a feature vector of each image from the first main body image and the first edge transition region image to generate a feature vector set; respectively inputting the feature vector set into hidden layers of each level, and training each hidden layer in the generator;
when the hidden layer to be trained is a first hidden layer, the hidden layer to be trained is trained according to the feature vector set and the influence weight of the feature vector set on the first hidden layer;
and when the hidden layer to be trained is not the first hidden layer, training the hidden layer to be trained according to the feature vector set, the influence weight of the feature vector set on the hidden layer to be trained and the output result of the previous hidden layer.
This differs from a conventional GAN, in which the first hidden layer trains on the input samples and passes its output to the second hidden layer, the second hidden layer trains on that output and passes its result to the third, and so on. Trained this way, the intermediate hidden layers suffer from under-training or overfitting when samples are scarce. The embodiment of the invention therefore changes the network structure of the generator in the GAN: the sample feature vectors extracted by the input layer are fed into every hidden layer according to preset influence weights, so each intermediate hidden layer trains on both the output of the previous hidden layer and the sample feature vectors. This alleviates the under-training and overfitting of hidden layers caused by the lack of training samples in few-sample or single-sample settings, and further improves the model.
In a preferred embodiment, the extracting feature vectors of the images from the first subject image and the first edge transition region image to generate a feature vector set specifically includes:
and extracting color pixel matrixes of the images from the first main image and the first edge transition region image to generate a color pixel matrix set, and taking the color pixel matrix set as the feature vector set.
In a preferred embodiment, the obtaining of the semantic annotation image corresponding to the first subject image obtains a second subject image; obtaining a semantic annotation image corresponding to the first edge transition region image to obtain a second edge transition region image, which specifically includes:
and performing semantic annotation on the first main body image and the first edge transition region image according to a preset color to obtain the second main body image and the second edge transition region image.
In a preferred embodiment, a preset target object in the original image is segmented by a preset image segmentation model to obtain the first subject image, which reduces the effort of manual segmentation.
In a preferred embodiment, the first edge transition region image of the preset target object is obtained by combining two regions: the image region gained by expanding the preset target object outwards along its edge by a first preset proportion of its area, and the image region lost by contracting the preset target object inwards along its edge by a second preset proportion of its area.
In a preferred embodiment, the first preset proportion takes a value in [10%, 50%], and the second preset proportion takes a value in [10%, 50%].
In this embodiment, the band formed around the edge of the target object by expanding outwards by at least 10% and contracting inwards by no more than 50% is taken as the edge transition region. This band best reflects the characteristics of the boundary between the image background and the edge of the preset target object, so extracting the image within it as the first edge transition region lets the trained model recognize the target object more accurately, further improving the recognition effect.
On the basis of the above method embodiments, the invention correspondingly provides apparatus embodiments:
an embodiment of the present invention provides an apparatus for identifying a target object in an image, including: the device comprises an image to be recognized acquisition module and a target object recognition module;
the image to be identified acquisition module is used for acquiring an image to be identified;
the target object recognition module is used for inputting the image to be recognized into a preset image target recognition model so as to enable the image target recognition model to recognize whether the image to be recognized contains a preset target object or not; the image target recognition model is formed by training through a preset neural network based on a first main body image of a preset target object in an original image, a first edge transition region image of the preset target object, a semantic annotation image corresponding to the first main body image and a semantic annotation image corresponding to the first edge transition region image.
In a preferred embodiment, the system further comprises a model building module;
the model building module is used for acquiring a first main body image and a first edge transition region image of a preset target object in the original image;
obtaining a semantic annotation image corresponding to the first main body image to obtain a second main body image; obtaining a semantic annotation image corresponding to the first edge transition region image to obtain a second edge transition region image;
inputting the first main body image, the first edge transition region image, the second main body image and the second edge transition region image into a preset GAN neural network, performing alternate iterative training on a generator and a discriminator in the GAN neural network, and taking the generator after training as the image target recognition model.
In a preferred embodiment, the generator comprises: a plurality of levels of hidden layers;
when the generator is trained, extracting a feature vector of each image from the first main body image and the first edge transition region image to generate a feature vector set; respectively inputting the feature vector set into hidden layers of each level, and training each hidden layer in the generator;
when the hidden layer to be trained is a first hidden layer, the hidden layer to be trained is trained according to the feature vector set and the influence weight of the feature vector set on the first hidden layer;
and when the hidden layer to be trained is not the first hidden layer, training the hidden layer to be trained according to the feature vector set, the influence weight of the feature vector set on the hidden layer to be trained and the output result of the previous hidden layer.
In a preferred embodiment, the extracting feature vectors of the images from the first subject image and the first edge transition region image to generate a feature vector set specifically includes:
and extracting color pixel matrixes of the images from the first main image and the first edge transition region image to generate a color pixel matrix set, and taking the color pixel matrix set as the feature vector set.
In a preferred embodiment, the semantic annotation image of the first subject image is obtained to obtain a second subject image; obtaining a semantic annotation image of the first edge transition region image to obtain a second edge transition region image, which specifically includes:
and performing semantic annotation on the first main body image and the first edge transition region image according to a preset color to obtain the second main body image and the second edge transition region image.
In a preferred embodiment, a preset target object in the original image is segmented by a preset image segmentation model, so as to obtain the first subject image.
In a preferred embodiment, the first edge transition region image of the preset target object is obtained by combining two regions: the image region gained by expanding the preset target object outwards along its edge by a first preset proportion of its area, and the image region lost by contracting the preset target object inwards along its edge by a second preset proportion of its area.
In a preferred embodiment, the first preset proportion takes a value in [10%, 50%], and the second preset proportion takes a value in [10%, 50%].
On the basis of the above method embodiments, the invention correspondingly provides a terminal device embodiment:
an embodiment of the present invention provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and when the processor executes the computer program, the method for identifying a target object in an image according to any one of the present invention is implemented.
On the basis of the above method embodiments, the invention correspondingly provides a storage medium embodiment:
an embodiment of the present invention provides a storage medium, where the storage medium includes a stored computer program, and when the computer program runs, a device on which the storage medium is located is controlled to execute the method for identifying a target object in an image according to any one of the present invention.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides an image target identification method, an image target identification device, terminal equipment and a storage medium. Compared with the existing GauGAN image target recognition model, the method not only obtains the main body image of the target object in the image, but also obtains the edge transition region image of the target object when training the image target recognition model. The edge transition region image can represent the characteristics of the border part of the target object and the image background. Therefore, even under the condition of few samples, the trained image target recognition model can recognize the main body characteristic of the target object in the image and the characteristic of the edge area, and the accuracy of target recognition is improved.
Drawings
FIG. 1 is a flowchart of a method for identifying a target object in an image according to an embodiment of the present invention;
FIG. 2 is a diagram of an original image provided by an embodiment of the present invention;
FIG. 3 is a first subject image of a preset target object in an original image according to an embodiment of the present invention;
FIG. 4 is a first edge transition region image of a preset target object in an original image according to an embodiment of the present invention;
FIG. 5 is a second subject image, i.e. the semantic annotation image corresponding to the first subject image, provided by an embodiment of the present invention;
FIG. 6 is a second edge transition region image, i.e. the semantic annotation image corresponding to the first edge transition region image, according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the edge hair texture of a dog according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a generator according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of another generator according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a further generator according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an apparatus for identifying a target object in an image according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a method for identifying a target object in an image, which at least includes the following steps:
step S1, acquiring an image to be recognized;
step S2, inputting the image to be recognized into a preset image target recognition model so that the image target recognition model recognizes whether the image to be recognized contains a preset target object or not; the image target recognition model is formed by training through a preset neural network based on a first main body image of a preset target object in an original image, a first edge transition region image of the preset target object, a semantic annotation image corresponding to the first main body image and a semantic annotation image corresponding to the first edge transition region image.
In step S1, an image to be recognized is acquired; it may be input by a user or obtained from another device. The subsequent steps perform target object recognition on this image to determine whether it contains a preset target object.
For step S2, the image target recognition model is explained first.
obtaining a training sample: training samples of the model are a first main image of a preset target object in an original image, a first edge transition region image of the preset target object, a semantic annotation image corresponding to the first main image and a semantic annotation image corresponding to the first edge transition region image; taking a 'dog' in an image as a preset target, then performing image segmentation on the dog in the original image to obtain a whole image as a first main body image shown in fig. 3, then performing segmentation on the dog in the original image and a background bordering part of the image to obtain a first edge transition image shown in fig. 4, and then performing semantic annotation on the first main body image and the second main body image to obtain a second main body image shown in fig. 5 and a second edge transition area image shown in fig. 6;
the model can learn the texture, color and shape characteristics of the whole dog image from the first main image and the second main image; the feature of the joint of the edge of the dog and the background environment thereof can be learned from the first edge transition region image and the second edge transition region image, but the relationship between the edge of the dog and the background environment is actually obtained, for example, the edge of the dog is a hair, and after the edge transition region image of the dog is obtained, the edge of the dog is found to be not a completely smooth edge straight line, but a certain hollow position along with the texture of the hair, specifically as shown in fig. 7, as can be seen from fig. 7, the edge of the hair of the dog is not a straight line, but the edge of the dog has many hollow parts due to the existence of many fine hairs. Therefore, the model can learn the shape characteristics of the dog edge, the color, the texture and other characteristics of the dog edge. Therefore, when the edge transition region image is input into the model for learning, the model can identify the relationship between the edge of the dog and the background environment, so that the target of the dog can be identified more accurately.
In an optional embodiment, obtaining the semantic annotation image corresponding to the first subject image to obtain a second subject image, and obtaining the semantic annotation image corresponding to the first edge transition region image to obtain a second edge transition region image, specifically comprises: performing semantic annotation on the first subject image and the first edge transition region image with a preset color to obtain the second subject image and the second edge transition region image. That is, in this embodiment the first subject image and the first edge transition region image are semantically annotated with a preset color, for example green. In practice, the first subject image and the first edge transition region may be colored manually with a drawing tool to obtain the second subject image and the second edge transition region image. In some optional embodiments, the original image may instead be automatically annotated by an existing image semantic segmentation model (such as a label model) to form a semantic annotation image corresponding to the original image, from which the second subject image and the second edge transition region image are then segmented.
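As a minimal sketch of this preset-color annotation step (the green color, the mask convention and the function name are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def annotate_with_color(first_img, mask, color=(0, 255, 0)):
    """Paint the masked pixels (the target object or its edge transition
    band) in a preset color, e.g. green, to produce the semantic
    annotation image; everything else is left black.

    first_img: HxWx3 uint8 image; mask: HxW array, nonzero = annotate."""
    annotated = np.zeros_like(first_img)
    annotated[mask.astype(bool)] = color
    return annotated
```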
In an optional embodiment, the preset target object in the original image is segmented as a whole by a preset image segmentation model to obtain the first subject image. In this embodiment, the first subject image may be obtained by segmenting the preset target object from the original image with an existing image segmentation model; in other alternative embodiments, the main body image of the preset target object may be segmented manually, or with auxiliary tools such as the magic wand in software like Photoshop.
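The patent does not name a particular segmentation model; purely as an illustration, a pretrained DeepLabV3 from torchvision could supply the subject mask. The model choice, the Pascal VOC class index 12 for "dog", the preprocessing, and the deprecated `pretrained=True` flag (newer torchvision versions use `weights=`) are all assumptions:

```python
import torch
import torchvision
from torchvision import transforms
from PIL import Image

def subject_mask(path, class_idx=12):  # 12 = "dog" in the Pascal VOC labels
    model = torchvision.models.segmentation.deeplabv3_resnet50(pretrained=True)
    model.eval()
    prep = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        scores = model(prep(img).unsqueeze(0))["out"][0]  # (21, H, W)
    # Binary mask of pixels whose most likely class is the target class
    return (scores.argmax(0) == class_idx).numpy().astype("uint8") * 255
```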
In a preferred embodiment, the image region increased by expanding the preset target object outwards by a first preset ratio according to the area of the preset target object along the object edge is combined with the image region decreased by contracting the preset target object inwards by a second preset ratio according to the area of the preset target object along the object edge to obtain the first edge transition region image of the preset target object. The preferable value range of the first preset proportion is [ 10%, 50% ]; the value range of the second preset proportion is [ 10%, 50% ]. In this embodiment, a region in which the edge of the preset target object expands outward by at least 10% and contracts inward by no more than 50% is taken as an edge transition region; the characteristics of the border part between the image background and the edge of the preset target object can be well reflected in the area range, so that the image area in the area range is extracted to serve as a first edge transition area, the target object can be recognized more accurately by the trained model, and the recognition effect of the target object is further improved. Also in some alternative embodiments, the first edge transition region may be manually segmented according to the above-mentioned rules.
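A minimal sketch of deriving the band with morphological dilation and erosion; because the patent states the expansion and contraction as proportions of the object's area rather than pixel radii, the pixel radii below are assumptions:

```python
import cv2
import numpy as np

def edge_transition_image(image, subject_mask, expand_px=15, contract_px=15):
    """Build the first edge transition region image from a binary subject mask.

    subject_mask: HxW uint8, 255 = preset target object. expand_px and
    contract_px stand in for the first and second preset proportions;
    converting an area proportion into a pixel radius is an implementation
    choice the patent leaves open."""
    k_out = np.ones((2 * expand_px + 1, 2 * expand_px + 1), np.uint8)
    k_in = np.ones((2 * contract_px + 1, 2 * contract_px + 1), np.uint8)

    dilated = cv2.dilate(subject_mask, k_out)  # object grown outwards
    eroded = cv2.erode(subject_mask, k_in)     # object shrunk inwards

    # Region added by expansion, combined with region removed by contraction
    band = cv2.bitwise_or(cv2.subtract(dilated, subject_mask),
                          cv2.subtract(subject_mask, eroded))
    return cv2.bitwise_and(image, image, mask=band)  # keep band pixels only
```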
Note that if the original image is automatically annotated by an existing image semantic segmentation model (such as a label model) to form the semantic annotation image corresponding to the original image, the second subject image and the second edge transition region image are then segmented from that semantic annotation image: the second subject image may be segmented by an existing image segmentation model or manually, and the second edge transition region image may be segmented manually or automatically using the same segmentation rule as the first edge transition region image.
Constructing a model: in a preferred embodiment, the method for constructing the image object recognition model comprises the following steps:
acquiring a first main body image and a first edge transition region image of a preset target object in the original image; obtaining a semantic annotation image corresponding to the first main body image to obtain a second main body image; obtaining a semantic annotation image corresponding to the first edge transition region image to obtain a second edge transition region image; inputting the first main body image, the first edge transition region image, the second main body image and the second edge transition region image into a preset GAN neural network, performing alternate iterative training on a generator and a discriminator in the GAN neural network, and taking the generator after training as the image target recognition model.
After the images are acquired, the first main body image, the second main body image, the first edge transition region image and the second edge transition region image are input into the GAN neural network for alternating iterative training. The GAN comprises a generator and a discriminator. During training, the generator takes the first main body image and the first edge transition region image as input; the discriminator takes the second main body image, the second edge transition region image and the output generated by the generator as input. The network parameters of the generator are then adjusted according to the discriminator's judgment, and the trained generator is finally used as the image target recognition model.
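As an illustrative sketch of this alternating scheme only (the module definitions, optimizers, tensor shapes and loss choice are assumptions, not specified by the patent), one training step could look like:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def gan_step(G, D, opt_g, opt_d, first_img, second_img):
    """One alternating iteration: first_img is a first main body / edge
    transition image, second_img its semantic annotation counterpart.
    G, D (D ending in a sigmoid) and their optimizers opt_g, opt_d are
    assumed to be defined elsewhere."""
    n = first_img.size(0)
    real, fake = torch.ones(n, 1), torch.zeros(n, 1)

    # Discriminator step: real annotation images vs. generator output
    opt_d.zero_grad()
    d_loss = bce(D(second_img), real) + bce(D(G(first_img).detach()), fake)
    d_loss.backward()
    opt_d.step()

    # Generator step: adjust generator parameters to fool the discriminator
    opt_g.zero_grad()
    g_loss = bce(D(G(first_img)), real)
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```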
Preferably, when the generator is trained, a feature vector is extracted from each of the first main body image and the first edge transition region image to form a feature vector set, and the generator is trained on this set. When the discriminator is trained, feature vectors are extracted from the second main body image, the second edge transition region image and the images output by the generator to form a second feature vector set, and the discriminator is trained on that set. Illustratively, the color pixel matrix of each image is extracted as its feature vector; the color pixel matrix may be, among others, a gray-value pixel matrix or an RGB pixel matrix.
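A minimal sketch of extracting such a color pixel matrix (the fixed resolution and the scaling to [0, 1] are assumptions):

```python
import numpy as np
from PIL import Image

def color_pixel_matrix(path, size=(256, 256), grayscale=False):
    # The patent allows either a gray-value pixel matrix or an RGB pixel
    # matrix as the per-image feature vector; resizing and normalising
    # are implementation assumptions.
    mode = "L" if grayscale else "RGB"
    img = Image.open(path).convert(mode).resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0
```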
In order to further improve the accuracy of target object identification with few or even single samples, a preferred embodiment of the invention improves the structure of the generator in the GAN neural network.
in a preferred embodiment, as shown in fig. 8, the generator comprises: a plurality of levels of hidden layers;
when the generator is trained, extracting a feature vector of each image from the first main body image and the first edge transition region image to generate a feature vector set; respectively inputting the feature vector set into hidden layers of each level, and training each hidden layer in the generator;
when the hidden layer to be trained is a first hidden layer, the hidden layer to be trained is trained according to the feature vector set and the influence weight of the feature vector set on the first hidden layer;
and when the hidden layer to be trained is not the first hidden layer, it is trained according to the feature vector set, the influence weight (i.e., the weight coefficient) of the feature vector set on the hidden layer to be trained, and the output result of the previous hidden layer.
This differs from a conventional GAN, in which the first hidden layer trains on the input samples and passes its output to the second hidden layer, the second trains on the first layer's output and passes its result to the third, and so on. Trained this way, the intermediate hidden layers suffer from under-training or overfitting when samples are scarce. In this embodiment of the invention, the network structure of the generator in the GAN is therefore improved along the lines of a long-memory model: the sample feature vectors extracted by the input layer are fed into every hidden layer according to preset influence weights, and each intermediate hidden layer trains on both the output of the previous hidden layer and the sample feature vectors. This alleviates the under-training or overfitting of intermediate hidden layers caused by the lack of training samples in few-sample or single-sample settings and further improves the model. Note that the influence weight coefficient of each hidden layer can be set according to the actual situation, and the influence weights of different hidden layers may be the same or different. A sketch of such a generator follows.
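As a sketch of this weighted-skip generator (the fully connected layers, dimensions, activations and default weights are all assumptions; only the wiring, i.e., feeding the weighted feature vector into every hidden layer alongside the previous layer's output, follows the text above):

```python
import torch
import torch.nn as nn

class WeightedSkipGenerator(nn.Module):
    """Hidden layer k trains on (previous output + w[k] * input features);
    the first hidden layer sees only the weighted feature vector."""

    def __init__(self, feat_dim=512, hidden_dim=512, out_dim=512,
                 n_hidden=4, influence_w=(1.0, 0.5, 0.5, 0.5)):
        super().__init__()
        assert len(influence_w) == n_hidden
        self.inp = nn.Linear(feat_dim, hidden_dim)  # input layer (features)
        self.hidden = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(n_hidden)])
        self.out = nn.Linear(hidden_dim, out_dim)
        self.register_buffer("w", torch.tensor(influence_w))

    def forward(self, feats):
        f = torch.relu(self.inp(feats))       # shared feature vector
        h = self.w[0] * f                     # first hidden layer input
        for k, layer in enumerate(self.hidden):
            if k > 0:
                h = h + self.w[k] * f         # weighted skip into layer k
            h = torch.relu(layer(h))
        return self.out(h)
```

Setting all entries of influence_w equal reproduces the case, explicitly allowed above, where every hidden layer weights the shared features identically.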
In practice, the generator may have a network structure in which a hidden layer at a given level consists of several sub-networks; in fig. 9, for example, the second hidden layer comprises second hidden layer 1, second hidden layer 2 and second hidden layer 3. In this case, inputting the feature vectors into the second hidden layer from the input layer means inputting them into each sub-network of the second hidden layer, i.e., into second hidden layer 1, second hidden layer 2 and second hidden layer 3 of fig. 9, so that each sub-network trains on the feature vectors, its own influence weight, and the output result of the first hidden layer.
Preferably, to further guard against overfitting, when the generator has the network structure shown in fig. 9, the existing dropout method (Dropout) may be used to randomly disconnect a sub-network of a hidden layer from the preceding hidden layer, so that the preceding layer's output is no longer fed into the disconnected sub-network. As shown schematically in fig. 10, when the first hidden layer is disconnected from second hidden layer 2, second hidden layer 2 trains only on the feature vector input by the input layer and its corresponding influence weight coefficient.
After the image target recognition model is trained, it is used as follows: the image to be recognized is input into the model, and the output image generated by the generator is checked for a region whose color matches that of the second main body image and the second edge transition region image. If such a region is present, the preset target object exists in the image to be recognized. A sketch of this check follows.
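Again only as a sketch (the preset annotation color, tolerance and minimum region size are assumptions), the recognition check reduces to looking for pixels of the preset semantic color in the generator's output:

```python
import numpy as np

def contains_target(generated, color=(0, 255, 0), tol=30, min_pixels=50):
    """True if the generated image contains a region of the preset
    annotation color, i.e. the preset target object was recognized.
    generated: HxWx3 uint8 array; tol and min_pixels are assumptions."""
    diff = np.abs(generated.astype(np.int32) - np.asarray(color))
    match = (diff <= tol).all(axis=-1)
    return int(match.sum()) >= min_pixels
```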
The method for identifying a target object in an image provided by the invention has a wide range of applications. For example, in a customs security-check scenario it can work with a security scanner to teach the machine to recognize a novel suspicious object. When such an object is found, the available samples are usually extremely few, yet the machine urgently needs to learn to identify the object so that similar suspicious objects, and their locations, can be checked for quickly and on a large scale. The method is therefore well suited to this setting, letting the machine learn to identify the target object from few samples.
As shown in fig. 11, on the basis of the above embodiment of the method, the present invention correspondingly provides an embodiment of the apparatus:
the embodiment of the invention provides a device for identifying a target object in an image, which comprises an image acquisition module to be identified and a target object identification module;
the image to be identified acquisition module is used for acquiring an image to be identified;
the target object recognition module is used for inputting the image to be recognized into a preset image target recognition model so as to enable the image target recognition model to recognize whether the image to be recognized contains a preset target object or not; the image target recognition model is formed by training through a preset neural network based on a first main body image of a preset target object in an original image, a first edge transition region image of the preset target object, a semantic annotation image corresponding to the first main body image and a semantic annotation image corresponding to the first edge transition region image.
In a preferred embodiment, the system further comprises a model building module; the model building module is used for acquiring a first main body image and a first edge transition region image of a preset target object in the original image;
obtaining a semantic annotation image corresponding to the first main body image to obtain a second main body image; obtaining a semantic annotation image corresponding to the first edge transition region image to obtain a second edge transition region image;
inputting the first main body image, the first edge transition region image, the second main body image and the second edge transition region image into a preset GAN neural network, performing alternate iterative training on a generator and a discriminator in the GAN neural network, and taking the generator after training as the image target recognition model.
In a preferred embodiment, the generator comprises: a plurality of levels of hidden layers;
when the generator is trained, extracting a feature vector of each image from the first main body image and the first edge transition region image to generate a feature vector set; respectively inputting the feature vector set into hidden layers of each level, and training each hidden layer in the generator;
when the hidden layer to be trained is a first hidden layer, the hidden layer to be trained is trained according to the feature vector set and the influence weight of the feature vector set on the first hidden layer;
and when the hidden layer to be trained is not the first hidden layer, training the hidden layer to be trained according to the feature vector set, the influence weight of the feature vector set on the hidden layer to be trained and the output result of the previous hidden layer.
In a preferred embodiment, the extracting feature vectors of the images from the first subject image and the first edge transition region image to generate a feature vector set specifically includes:
and extracting color pixel matrixes of the images from the first main image and the first edge transition region image to generate a color pixel matrix set, and taking the color pixel matrix set as the feature vector set.
In a preferred embodiment, the semantic annotation image of the first subject image is obtained to obtain a second subject image; obtaining a semantic annotation image of the first edge transition region image to obtain a second edge transition region image, which specifically includes:
and performing semantic annotation on the first main body image and the first edge transition region image according to a preset color to obtain the second main body image and the second edge transition region image.
In a preferred embodiment, a preset target object in the original image is segmented by a preset image segmentation model, so as to obtain the first subject image.
In a preferred embodiment, the first edge transition region image of the preset target object is obtained by combining two regions: the image region gained by expanding the preset target object outwards along its edge by a first preset proportion of its area, and the image region lost by contracting the preset target object inwards along its edge by a second preset proportion of its area.
In a preferred embodiment, the first preset proportion takes a value in [10%, 50%], and the second preset proportion takes a value in [10%, 50%].
It should be noted that the above apparatus embodiments correspond to the method embodiments of the present invention and can implement any of the methods for identifying a target object in an image described above. The apparatus embodiments are merely schematic: modules described as separate components may or may not be physically separate, and components displayed as modules may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of this embodiment's solution. In the drawings of the apparatus embodiments, the connections between modules indicate communication connections, which may be implemented as one or more communication buses or signal lines.
On the basis of the above method embodiments, the invention correspondingly provides a terminal device embodiment:
An embodiment of the present invention provides a terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor. When executing the computer program, the processor implements a method for identifying a target object in an image according to any embodiment of the invention, such as the method shown in fig. 1, or implements the functions of the modules in the apparatus embodiments described above.
Illustratively, the computer program may be partitioned into one or more modules that are stored in the memory and executed by the processor to implement the invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program in the terminal device.
The terminal device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The terminal device may include, but is not limited to, a processor, a memory.
The Processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and so on. The general-purpose processor may be a microprocessor or any conventional processor; it is the control center of the terminal device and connects the parts of the whole terminal device using various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor implements the various functions of the terminal device by running the computer programs and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly comprise a program storage area and a data storage area: the program storage area may store the operating system, applications required for at least one function, and so on, while the data storage area may store data created according to the use of the device. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash memory card, at least one magnetic disk storage device, a Flash memory device, or another non-volatile solid-state storage device.
On the basis of the above method embodiments, the invention correspondingly provides a storage medium embodiment:
another embodiment of the present invention provides a storage medium, which includes a stored computer program, wherein when the computer program runs, a device on which the storage medium is located is controlled to execute the method for identifying a target object in any one of the images according to the present invention. The storage medium is a computer-readable storage medium, and the computer program includes a computer program code, and the computer program code may be in a source code form, an object code form, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (18)

1. A method for identifying a target object in an image, comprising:
acquiring an image to be identified;
inputting the image to be recognized into a preset image target recognition model so that the image target recognition model recognizes whether the image to be recognized contains a preset target object;
the image target recognition model is formed by training through a preset neural network based on a first main body image of a preset target object in an original image, a first edge transition region image of the preset target object, a semantic annotation image corresponding to the first main body image and a semantic annotation image corresponding to the first edge transition region image.
2. The method for identifying the target object in the image according to claim 1, wherein the method for constructing the image target identification model comprises the following steps:
acquiring a first main body image and a first edge transition region image of a preset target object in the original image;
obtaining a semantic annotation image corresponding to the first main body image to obtain a second main body image; obtaining a semantic annotation image corresponding to the first edge transition region image to obtain a second edge transition region image;
inputting the first main body image, the first edge transition region image, the second main body image and the second edge transition region image into a preset GAN neural network, performing alternate iterative training on a generator and a discriminator in the GAN neural network, and taking the generator after training as the image target recognition model.
3. The method of identifying a target object in an image of claim 2, wherein the generator comprises: a plurality of levels of hidden layers;
when the generator is trained, extracting a feature vector of each image from the first main body image and the first edge transition region image to generate a feature vector set; respectively inputting the feature vector set into hidden layers of each level, and training each hidden layer in the generator;
when the hidden layer to be trained is a first hidden layer, the hidden layer to be trained is trained according to the feature vector set and the influence weight of the feature vector set on the first hidden layer;
and when the hidden layer to be trained is not the first hidden layer, training the hidden layer to be trained according to the feature vector set, the influence weight of the feature vector set on the hidden layer to be trained and the output result of the previous hidden layer.
4. The method for identifying a target object in an image according to claim 3, wherein the extracting feature vectors of each image from the first main body image and the first edge transition region image to generate a feature vector set specifically includes:
and extracting color pixel matrixes of the images from the first main image and the first edge transition region image to generate a color pixel matrix set, and taking the color pixel matrix set as the feature vector set.
5. The method for identifying a target object in an image according to claim 2, wherein the semantic annotation image corresponding to the first subject image is obtained to obtain a second subject image; obtaining a semantic annotation image corresponding to the first edge transition region image to obtain a second edge transition region image, which specifically includes:
and performing semantic annotation on the first main body image and the first edge transition region image according to a preset color to obtain the second main body image and the second edge transition region image.
6. The method according to claim 2, wherein the first subject image is obtained by segmenting a preset target object in the original image through a preset image segmentation model.
7. The method as claimed in claim 2, wherein the image region increased by extending the preset target object along the edge of the object by a first preset ratio according to the area of the preset target object and the image region decreased by contracting the preset target object along the edge of the object by a second preset ratio according to the area of the preset target object are combined to obtain the first edge transition region image of the preset target object.
8. The method for identifying the target object in the image according to claim 7, wherein the first preset proportion has a value range of [10%, 50%]; the second preset proportion has a value range of [10%, 50%].
9. An apparatus for identifying a target object in an image, comprising: the device comprises an image to be recognized acquisition module and a target object recognition module;
the image to be identified acquisition module is used for acquiring an image to be identified;
the target object recognition module is used for inputting the image to be recognized into a preset image target recognition model so as to enable the image target recognition model to recognize whether the image to be recognized contains a preset target object or not; the image target recognition model is formed by training through a preset neural network based on a first main body image of a preset target object in an original image, a first edge transition region image of the preset target object, a semantic annotation image corresponding to the first main body image and a semantic annotation image corresponding to the first edge transition region image.
10. An apparatus for identifying a target object in an image as defined in claim 9, further comprising a model building module;
the model building module is used for acquiring a first main body image and a first edge transition region image of a preset target object in the original image;
obtaining a semantic annotation image corresponding to the first main body image to obtain a second main body image; obtaining a semantic annotation image corresponding to the first edge transition region image to obtain a second edge transition region image;
inputting the first main body image, the first edge transition region image, the second main body image and the second edge transition region image into a preset GAN neural network, performing alternate iterative training on a generator and a discriminator in the GAN neural network, and taking the generator after training as the image target recognition model.
11. An apparatus for identifying a target object in an image as defined in claim 10, wherein the generator comprises: a plurality of levels of hidden layers;
when the generator is trained, extracting a feature vector of each image from the first main body image and the first edge transition region image to generate a feature vector set; respectively inputting the feature vector set into hidden layers of each level, and training each hidden layer in the generator;
when the hidden layer to be trained is a first hidden layer, the hidden layer to be trained is trained according to the feature vector set and the influence weight of the feature vector set on the first hidden layer;
and when the hidden layer to be trained is not the first hidden layer, training the hidden layer to be trained according to the feature vector set, the influence weight of the feature vector set on the hidden layer to be trained and the output result of the previous hidden layer.
12. The apparatus for recognizing a target object in an image according to claim 11, wherein the extracting feature vectors of each image from the first subject image and the first edge transition region image to generate a feature vector set specifically includes:
and extracting color pixel matrixes of the images from the first main image and the first edge transition region image to generate a color pixel matrix set, and taking the color pixel matrix set as the feature vector set.
13. The apparatus for identifying a target object in an image according to claim 10, wherein the semantic annotation image of the first subject image is obtained to obtain a second subject image; obtaining a semantic annotation image of the first edge transition region image to obtain a second edge transition region image, which specifically includes:
and performing semantic annotation on the first main body image and the first edge transition region image according to a preset color to obtain the second main body image and the second edge transition region image.
14. The apparatus for identifying a target object in an image according to claim 10, wherein the first subject image is obtained by segmenting a preset target object in the original image through a preset image segmentation model.
15. The apparatus for identifying a target object in an image as claimed in claim 10, wherein the image region increased by extending the preset target object along the edge of the object by a first preset ratio according to the area of the preset target object and the image region decreased by contracting the preset target object along the edge of the object by a second preset ratio according to the area of the preset target object are combined to obtain the first edge transition region image of the preset target object.
16. The apparatus according to claim 15, wherein the first preset proportion has a value range of [10%, 50%]; the second preset proportion has a value range of [10%, 50%].
17. A terminal device, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for identifying a target object in an image according to any one of claims 1 to 8.
18. A storage medium, characterized in that the storage medium comprises a stored computer program, wherein, when the computer program runs, the device on which the storage medium is located is controlled to execute the method for identifying a target object in an image according to any one of claims 1 to 8.
CN202111013970.1A 2021-08-31 2021-08-31 Method and device for identifying target object in image, terminal equipment and storage medium Active CN113837236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111013970.1A CN113837236B (en) 2021-08-31 2021-08-31 Method and device for identifying target object in image, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113837236A (en) 2021-12-24
CN113837236B CN113837236B (en) 2022-11-15

Family

ID=78961856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111013970.1A Active CN113837236B (en) 2021-08-31 2021-08-31 Method and device for identifying target object in image, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113837236B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439845A (en) * 2022-08-02 2022-12-06 北京邮电大学 Image extrapolation method and device based on graph neural network, storage medium and terminal

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101504717A (en) * 2008-07-28 2009-08-12 上海高德威智能交通系统有限公司 Characteristic area positioning method, car body color depth and color recognition method
CN105787482A (en) * 2016-02-26 2016-07-20 华北电力大学 Specific target outline image segmentation method based on depth convolution neural network
US20180260970A1 (en) * 2017-03-08 2018-09-13 Casio Computer Co., Ltd. Identification apparatus, identification method and non-transitory computer-readable recording medium
CN111160313A (en) * 2020-01-02 2020-05-15 华南理工大学 Face representation attack detection method based on LBP-VAE anomaly detection model
CN111539259A (en) * 2020-03-31 2020-08-14 广州富港万嘉智能科技有限公司 Target object recognition method, artificial neural network training method, computer-readable storage medium, and manipulator
CN111783863A (en) * 2020-06-23 2020-10-16 腾讯科技(深圳)有限公司 Image processing method, device, equipment and computer readable storage medium
CN113012189A (en) * 2021-03-31 2021-06-22 影石创新科技股份有限公司 Image recognition method and device, computer equipment and storage medium
CN113159006A (en) * 2021-06-23 2021-07-23 杭州魔点科技有限公司 Attendance checking method and system based on face recognition, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN113837236B (en) 2022-11-15

Similar Documents

Publication Publication Date Title
WO2019100724A1 (en) Method and device for training multi-label classification model
CN105144239B (en) Image processing apparatus, image processing method
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
US8792722B2 (en) Hand gesture detection
US8750573B2 (en) Hand gesture detection
CN111161311A (en) Visual multi-target tracking method and device based on deep learning
CN112434721A (en) Image classification method, system, storage medium and terminal based on small sample learning
WO2023185785A1 (en) Image processing method, model training method, and related apparatuses
CN112052186B (en) Target detection method, device, equipment and storage medium
CN109002755B (en) Age estimation model construction method and estimation method based on face image
CN104915972A (en) Image processing apparatus, image processing method and program
KR20170026222A (en) Method and device for classifying an object of an image and corresponding computer program product and computer-readable medium
CN108256454B (en) Training method based on CNN model, and face posture estimation method and device
US11893773B2 (en) Finger vein comparison method, computer equipment, and storage medium
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN112836625A (en) Face living body detection method and device and electronic equipment
JP6989450B2 (en) Image analysis device, image analysis method and program
CN112102929A (en) Medical image labeling method and device, storage medium and electronic equipment
CN110610131B (en) Face movement unit detection method and device, electronic equipment and storage medium
CN116206334A (en) Wild animal identification method and device
CN113837236B (en) Method and device for identifying target object in image, terminal equipment and storage medium
CN112418256A (en) Classification, model training and information searching method, system and equipment
CN112434576A (en) Face recognition method and system based on depth camera
CN115713669B (en) Image classification method and device based on inter-class relationship, storage medium and terminal
CN116246161A (en) Method and device for identifying target fine type of remote sensing image under guidance of domain knowledge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant