CN115019032A - Image processing method, image processing device, electronic equipment and storage medium - Google Patents

Image processing method, image processing device, electronic equipment and storage medium

Info

Publication number
CN115019032A
CN115019032A
Authority
CN
China
Prior art keywords
image
feature
sample
target
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210612912.9A
Other languages
Chinese (zh)
Inventor
陈魏然
李丽
王永森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Post Information Technology Beijing Co ltd
Original Assignee
China Post Information Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Post Information Technology Beijing Co ltd filed Critical China Post Information Technology Beijing Co ltd
Priority to CN202210612912.9A priority Critical patent/CN115019032A/en
Publication of CN115019032A publication Critical patent/CN115019032A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image processing method, an image processing device, electronic equipment and a storage medium. The method comprises the following steps: acquiring an image frame to be identified, and identifying at least one target image area in the image frame to be identified; for any target image area, extracting the image features of the current target image area based on a pre-trained feature extraction model, and determining the target position features of the current target image area based on the image features; acquiring a pre-constructed feature category library, and determining the region categories to which the target image regions in the image frame to be identified respectively belong based on the feature center points of each category in the feature category library and the target position features. The technical scheme disclosed by the embodiment of the invention solves the problems of low dressing identification efficiency and low accuracy in the prior art, reduces the false detection rate of dressing identification, and thereby improves the identification accuracy.

Description

Image processing method, image processing device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of computer vision technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
Many industries, such as warehouse logistics and factory production lines, have certain requirements on dressing specifications and require operators to wear proper work clothes. Generally, in fields such as warehouse logistics and factory production lines, because the number of operators is large, work clothes with different styles, colors and marks, including coats, waistcoats, T-shirts and the like, are provided in order to distinguish different operation links and outsourcing parties. For a large-scale logistics center or factory production line, there may be dozens of kinds of work clothes, and the appearance of some work clothes is close to that of ordinary clothes, which makes supervision of the dressing code difficult. With the development of informatization, networking and intelligent technologies, some enterprises and factories have introduced video monitoring systems and deep learning technology to intelligently analyze and raise alarms on the dressing condition of operators in videos.
The existing methods mostly adopt a target detection method based on deep learning to distinguish work clothes from non-work clothes. Such a method can achieve a good effect in scenes with few kinds of work clothes, but faces the following difficulties in complex scenes such as warehouse logistics centers or factory production lines: (1) there are many kinds of work clothes and non-work clothes, the non-work clothes cannot be enumerated in advance, different operation links overlap to some extent in the working space, and in certain scenes multiple kinds of work clothes and multiple kinds of non-work clothes may appear at the same time; (2) the number of operators required in each link differs, so different kinds of work clothes appear with different frequencies, leading to a sample imbalance problem in actual detection.
Disclosure of Invention
The invention provides an image processing method, an image processing device, electronic equipment and a storage medium, which are used for solving the problems of low dressing identification efficiency and low accuracy in the prior art, reducing the false detection rate of dressing identification and thereby improving the identification accuracy.
In a first aspect, an embodiment of the present invention provides an image processing method, where the method includes:
acquiring an image frame to be identified, and identifying at least one target image area in the image frame to be identified;
for any target image area, extracting the image characteristics of the current target image area based on a pre-trained characteristic extraction model, and determining the target position characteristics of the current target image area based on the image characteristics;
and acquiring a pre-constructed feature category library, and determining the region categories to which the target image regions in the image frame to be recognized respectively belong based on the feature center points of all categories in the feature category library and the target position features.
Optionally, the identifying at least one target image area in the image frame to be identified includes:
acquiring a pre-trained target recognition model, and performing image area recognition on the image frame to be recognized based on the target recognition model to obtain image coordinates of at least one target image area in the image frame to be recognized;
and clipping the image frame to be recognized based on each image coordinate to generate at least one target image area in the image frame to be recognized.
Optionally, the training process of the feature extraction model includes:
acquiring a sample image library, and generating a group of sample image pairs based on any two sample region images in the sample image library; the sample image library comprises various labeled sample region images;
respectively inputting each sample image pair into a pre-established feature extraction model of a backbone network for training to obtain a training output result corresponding to each sample image pair;
determining a model loss of the feature extraction model based on each of the training output results and each of the sample images, and adjusting model parameters of the feature extraction model based on the model loss.
Optionally, the training output result includes sample image features;
the determining a model loss for the feature extraction model based on the training output results and the sample images comprises:
for any group of sample image pairs, obtaining sample image features respectively corresponding to each sample area image in the current sample image pair, and determining a sample feature distance between the sample image features in the current sample image pair;
determining a model loss for the feature extraction model based on the sample feature distances for each of the sample image pairs.
Optionally, the method for constructing the feature category library includes:
based on the feature extraction model, performing feature extraction on various sample region images in a sample image library to obtain image features corresponding to the various sample region images respectively;
and determining category feature central points corresponding to the various image features respectively, and constructing a feature category library based on the category feature central points.
Optionally, the determining, based on the feature center points of each category in the feature category library and the target position features, the region categories to which the target image regions in the image frame to be recognized respectively belong includes:
respectively determining the characteristic distance between the target position characteristic of the current target image area and the central point of each category characteristic for any target image area;
and determining the region class to which the target image region belongs based on each characteristic distance.
Optionally, the target image area includes a dressing area; the region categories include at least one work clothes category and a non-work-clothes category;
the determining the region class to which the target image region belongs based on each of the feature distances includes:
determining a minimum feature distance and a next-smallest feature distance among the feature distances, and determining a feature distance difference value between the minimum feature distance and the next-smallest feature distance;
if the feature distance difference is larger than a preset threshold value, determining the region category corresponding to the category feature center point with the minimum feature distance to the target position feature as the region category to which the target position feature belongs;
if the feature distance difference is smaller than the preset threshold value, acquiring the region category corresponding to the category feature center point with the minimum feature distance to the target position feature; if that region category is any work clothes category, determining it as the region category to which the target position feature belongs; and if that region category is the non-work-clothes category, determining the region category corresponding to the category feature center point with the next-smallest feature distance to the target position feature as the region category to which the target position feature belongs.
In a second aspect, an embodiment of the present invention further provides an image processing apparatus, including:
the target image area identification module is used for acquiring an image frame to be identified and identifying at least one target image area in the image frame to be identified;
the target position characteristic determining module is used for extracting the image characteristics of the current target image area based on a pre-trained characteristic extraction model for any target image area and determining the target position characteristics of the current target image area based on the image characteristics;
and the region type determining module is used for acquiring a pre-constructed feature type library and determining the region type to which each target image region in the image frame to be identified belongs respectively based on the feature center point of each type in the feature type library and the target position feature.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the image processing method according to any of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where computer instructions are stored, and the computer instructions are configured to, when executed, enable a processor to implement the image processing method according to any embodiment of the present invention.
The technical scheme of the embodiment of the invention acquires an image frame to be identified and identifies at least one target image area in the image frame to be identified; for any target image area, the image features of the current target image area are extracted based on a pre-trained feature extraction model, and the target position feature of the current target image area is determined based on the image features; a pre-constructed feature category library is acquired, and the region category to which each target image region in the image frame to be recognized belongs is determined based on the feature center points of each category in the feature category library and each target position feature. The model training method based on a twin structure maximizes the distance between sample classes, avoids the convergence and generalization problems caused by direct detection, and quickly obtains a trained feature extraction model; meanwhile, combined with the constructed sample image library, recognition of non-work clothes and subdivision of work clothes can be realized; in addition, for the false detections easily caused by the high similarity between some work clothes and non-work clothes, a secondary judgment strategy is designed to reduce false alarms. The problems of low dressing identification efficiency and accuracy in the prior art are thereby solved, the false detection rate of dressing identification is reduced, and the identification accuracy is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an image processing apparatus according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention;
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
Example one
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present invention, where the embodiment is applicable to identifying and classifying a target area in an image, and the method may be executed by an image processing apparatus, where the image processing apparatus may be implemented in a form of hardware and/or software, and the image processing apparatus may be configured in a server or an intelligent terminal. As shown in fig. 1, the method includes:
s110, acquiring an image frame to be identified, and identifying at least one target image area in the image frame to be identified.
In the embodiment of the invention, the image frame to be recognized can be understood as an image frame obtained by decomposing video data recorded in a preset scene into frames, so that a plurality of image frames are obtained from the video data. Optionally, the preset scene may include, but is not limited to, a service scene, an electric power scene, a warehouse logistics scene, a factory production line scene, and the like. The target image area can be understood as a partial image area in the image frame to be recognized that contains a target object; optionally, the corresponding target objects differ between scenes. For example, in a service scene, the target object may be a service person with a different working role, and in a warehouse logistics scene, the target object may be a staff member such as a sorter or a courier. The number of target image areas contained in the image frame to be recognized may be one or more, and may be determined according to the target objects contained in the image frame to be recognized, which is not limited in this embodiment.
And under the condition of acquiring the image frame to be recognized, recognizing the image frame to be recognized so as to determine each target image area in the image frame to be recognized. Optionally, the identification method may include: acquiring a pre-trained target recognition model, and performing image area recognition on an image frame to be recognized based on the target recognition model to obtain image coordinates of at least one target image area in the image frame to be recognized; and clipping the image frame to be recognized based on the image coordinates to generate at least one target image area in the image frame to be recognized.
In this embodiment, the target recognition model may be a target detection network based on a convolutional neural network, or may be other detection networks, and the specific composition structure of the model is not limited in this embodiment. Specifically, for any image frame to be recognized, inputting the current image frame to be recognized into a target recognition model trained in advance to obtain at least one image coordinate output by the model; further, image areas corresponding to the image coordinates in the image frame to be recognized are respectively determined, the image frame to be recognized is cut based on the image areas, and at least one target image area in the image frame to be recognized is generated.
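For illustration only, the detect-then-clip step might be sketched as follows. This is a minimal example assuming Python with NumPy; the (x1, y1, x2, y2) pixel coordinate format and the function names are assumptions of the example, not requirements of this embodiment.

```python
from typing import List, Tuple
import numpy as np

Box = Tuple[int, int, int, int]  # assumed coordinate format: (x1, y1, x2, y2) in pixels

def crop_target_regions(frame: np.ndarray, boxes: List[Box]) -> List[np.ndarray]:
    """Clip the image frame to be recognized at each image coordinate, yielding target image areas."""
    h, w = frame.shape[:2]
    regions = []
    for x1, y1, x2, y2 in boxes:
        # Clamp the box to the frame so slightly out-of-range detections do not fail.
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(w, x2), min(h, y2)
        if x2 > x1 and y2 > y1:
            regions.append(frame[y1:y2, x1:x2].copy())
    return regions
```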
And S120, for any target image area, extracting the image characteristics of the current target image area based on the pre-trained characteristic extraction model, and determining the target position characteristics of the current target image area based on the image characteristics.
In the embodiment of the invention, on the basis of acquiring each target image area in the image frame to be identified, the image features in each target image area are extracted. Specifically, for any target image area, the current target image area is input into the feature extraction model trained in advance to obtain the image features of the current target image area. It should be noted that the image features output by the feature extraction model lie in a high-dimensional space; the benefit is that more detailed image features can be represented in the high-dimensional space, so that the subsequent recognition result is more accurate.
Further, the obtained image features are subjected to feature processing to obtain target position features of the current target image area, so that the target area images can be classified based on the position features. In this embodiment, the method for determining the target position feature may adopt a conventional image processing method, or may also adopt a preset neural network for processing, which is not limited in this embodiment.
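As a non-limiting sketch of this step (assuming Python/PyTorch, which this embodiment does not mandate): the small backbone layout, the embedding dimension d = 128, and the L2 normalisation standing in for the "feature processing" that yields the target position feature are all assumptions of the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractor(nn.Module):
    """Toy backbone mapping a cropped target image area to a d-dimensional image feature."""

    def __init__(self, d: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        emb = self.fc(self.conv(x).flatten(1))
        # L2 normalisation is used here as an assumed choice of "feature processing";
        # the normalised vector plays the role of the target position feature in later
        # distance comparisons.
        return F.normalize(emb, dim=1)
```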
In addition to the above-described embodiments, in order to obtain a trained feature extraction model, it is necessary to train the feature extraction model in advance. Specifically, the training process comprises: acquiring a sample image library, and generating a group of sample image pairs based on any two sample region images in the sample image library; the sample image library comprises various labeled sample area images; respectively inputting each sample image pair into a pre-established feature extraction model of a backbone network for training to obtain a training output result corresponding to each sample image pair; and determining model loss of the feature extraction model based on each training output result and each sample image, and adjusting model parameters of the feature extraction model based on the model loss.
In this embodiment, based on the historical monitoring video, frame extraction is performed on the video monitoring data to obtain a plurality of video frame images. Further, the target region in each image is labeled and cut out to obtain various sample region images. For example, if the target area is a dressing image area, the different classes of sample region images may be region images of different kinds of clothing; optionally, the dressing region images may be further subdivided according to the kinds of work clothes, and all non-work clothes may be grouped into one large class, so as to obtain a dressing region image library containing n+1 classes, where n is the number of kinds of work clothes, that is, the clothing classes are: work clothes 1, work clothes 2, …, work clothes n, and non-work clothes.
It is to be noted that, in the technical solution of this embodiment, a data enhancement method may further be used to amplify the work clothes classes with fewer samples, so as to reduce the influence of the unbalanced data distribution. The data enhancement methods used include, but are not limited to, horizontal flipping, luminance transformation and affine transformation.
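A possible realisation of this amplification step is sketched below with standard torchvision transforms; the concrete parameter values are assumptions chosen for illustration rather than values given by this embodiment.

```python
from torchvision import transforms

# Illustrative augmentation pipeline for under-represented work clothes classes.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                      # horizontal flipping
    transforms.ColorJitter(brightness=0.3),                      # luminance transformation
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1)),   # affine transformation
    transforms.ToTensor(),
])
```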
Further, any two sample region images in the sample image library are paired to generate at least one sample image pair, where the sample region images in the sample image pair may be sample images of the same type or sample images of different types, which is not limited in this embodiment. And respectively inputting each sample image pair into a pre-established feature extraction model of the backbone network to obtain the sample image features output by the model. And determining the model loss of the feature extraction model according to the sample image features of each sample image pair and the belonged category of each sample region image in each sample image pair, and adjusting the model parameters of the feature extraction model based on the model loss.
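For illustration, the pairing of any two labelled sample region images could be sketched as below; the roughly equal split between same-class and different-class pairs and the random sampling strategy are assumptions of the example.

```python
import random
from typing import Dict, List, Tuple

def sample_image_pairs(library: Dict[str, List[str]], n_pairs: int) -> List[Tuple[str, str, int]]:
    """Return (image_a, image_b, same_class) triples drawn from a labelled sample image library."""
    classes = list(library)
    pairs = []
    for _ in range(n_pairs):
        if len(classes) > 1 and random.random() < 0.5:
            c1, c2 = random.sample(classes, 2)            # negative pair: two different classes
            pairs.append((random.choice(library[c1]), random.choice(library[c2]), 0))
        else:
            c = random.choice(classes)                    # positive pair: same class
            pairs.append((random.choice(library[c]), random.choice(library[c]), 1))
    return pairs
```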
Optionally, the method for determining a model loss of the feature extraction model based on each training output result and each sample image may include: for any group of sample image pairs, obtaining sample image characteristics corresponding to each sample area image in the current sample image pair respectively, and determining a sample characteristic distance between the sample image characteristics in the current sample image pair; model loss for the feature extraction model is determined based on the sample feature distances for each sample image pair.
Specifically, two sample region images in the current sample image pair can be simultaneously input into two feature extraction models with the same parameters to obtain image features output by the models, so that the efficiency of model training can be improved.
On the basis of obtaining the image features of the two sample region images in the current sample image pair, the sample feature distance between the two image features is determined; the sample feature distance of each sample image pair is then determined in the same way, the model loss of the feature extraction model is determined based on the sample feature distances of all sample image pairs, and the model parameters of the feature extraction model are adjusted based on the model loss.
In the above embodiment, the feature similarity between the sample image pairs is used as a loss function, and the back propagation is used to prompt the feature extraction model to learn parameters with small intra-class distance and large inter-class distance, so as to achieve the purpose of maximizing the feature differentiation of various samples in a high-dimensional (d-dimensional) feature space, and achieve better generalization performance for classifying unknown samples.
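One way to realise the loss described above is a contrastive loss over the pairwise feature distance of the two twin branches. The Euclidean distance and the margin value below are assumptions consistent with, but not mandated by, this description.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(f1: torch.Tensor, f2: torch.Tensor,
                     same_class: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Pull same-class pairs together and push different-class pairs beyond a margin."""
    dist = F.pairwise_distance(f1, f2)                       # sample feature distance per pair
    pos = same_class * dist.pow(2)                           # encourages small intra-class distance
    neg = (1.0 - same_class) * F.relu(margin - dist).pow(2)  # encourages large inter-class distance
    return (pos + neg).mean()

# Usage sketch: both branches share the same parameters (twin structure), e.g.
#   f1, f2 = model(img_a), model(img_b)
#   loss = contrastive_loss(f1, f2, same_class_labels.float())
```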
S130, acquiring a pre-constructed feature category library, and determining the region categories to which the target image regions in the image frame to be identified respectively belong based on the feature center points of each category in the feature category library and the target position features.
In this embodiment, the category feature library includes various feature center points respectively corresponding to various sample region images in the sample image library.
Optionally, the method for constructing the feature category library in this embodiment may include: based on the feature extraction model, performing feature extraction on various sample region images in the sample image library to obtain image features corresponding to the various sample region images respectively; and determining category feature central points corresponding to the various image features respectively, and constructing a feature category library based on the category feature central points.
Specifically, for any category of sample region images in the sample image library, each sample region image of the current category is input into the trained feature extraction model to obtain the image features of the sample region images of the current category. Further, these image features are processed by calculation to obtain the feature center point of the current category. Optionally, the feature center point may be determined by a clustering method, or it may be computed by a formula. For example, the feature center point may be calculated as:

Fc = (1/Nc) · Σ_{i=1}^{Nc} fi

where Fc represents the feature center point of the class-c samples in the high-dimensional space, Nc is the number of samples of class c, and fi is the d-dimensional image feature corresponding to the i-th sample of class c.
Based on the above embodiment, the feature center points of each category corresponding to each type of sample region image are calculated, and a feature category library is constructed based on the feature center points of each category.
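Following the expression above, the feature category library can be sketched as a mapping from each category to the mean of its d-dimensional sample features; the dictionary layout and NumPy usage are assumptions of the example.

```python
from typing import Dict, List
import numpy as np

def build_feature_category_library(features_by_class: Dict[str, List[np.ndarray]]) -> Dict[str, np.ndarray]:
    """For each class c, compute Fc = (1 / Nc) * sum_i fi over its Nc sample image features."""
    return {cls: np.mean(np.stack(feats), axis=0) for cls, feats in features_by_class.items()}
```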
In addition to the above embodiment, the region types to which the target image regions respectively belong are determined based on the target position features of the acquired target region image and the feature center points of the respective types in the feature type library.
Optionally, the method for determining the region categories to which the target image regions respectively belong may include: respectively determining the characteristic distance between the target position characteristic of the current target image area and the central point of each class characteristic; and determining the region type of the target image region based on the characteristic distances.
In this embodiment, the target image area includes a dressing area, and the region categories comprise at least one work clothes category and a non-work-clothes category. Optionally, determining the region category to which the target image region belongs based on the feature distances includes: determining a minimum feature distance and a next-smallest feature distance among the feature distances, and determining a feature distance difference value between them; if the feature distance difference is larger than a preset threshold value, determining the region category corresponding to the category feature center point with the minimum feature distance to the target position feature as the region category to which the target position feature belongs; if the feature distance difference is smaller than the preset threshold value, acquiring the region category corresponding to the category feature center point with the minimum feature distance to the target position feature; if that region category is any work clothes category, determining it as the region category to which the target position feature belongs; and if that region category is the non-work-clothes category, determining the region category corresponding to the category feature center point with the next-smallest feature distance to the target position feature as the region category to which the target position feature belongs.
Specifically, the target position feature Fx of the target image area is determined, and the distances between Fx and the feature center points F1, F2, …, Fn, Fn+1 of each category in the feature category library are calculated respectively. Optionally, the present invention employs the Euclidean distance. Further, the minimum feature distance minDis and the next-smallest feature distance sec_minDis among the n+1 feature distances are determined, and the feature distance difference diff between them is determined as diff = sec_minDis − minDis.
The category Cm corresponding to the minimum feature distance and the category Cs corresponding to the next-smallest feature distance are acquired. If diff is larger than a preset threshold thr, the category of the target image region is judged to be Cm; otherwise, if diff is smaller than the preset threshold thr, two cases are distinguished: if Cm is a work clothes category, the category of the target image region is determined to be Cm, that is, that work clothes category; on the contrary, if Cm is the non-work-clothes category, the category of the target image region is determined to be Cs, that is, a work clothes category. The advantage of this secondary judgment strategy is that, for work clothes whose style is close to non-work clothes, it effectively reduces false alarms of non-work clothes and thereby improves classification accuracy.
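A minimal sketch of the secondary judgment strategy just described follows; the class names, the non-work-clothes label and the threshold value are placeholders for illustration, not fixed by this embodiment.

```python
from typing import Dict
import numpy as np

NON_WORK = "non_work_clothes"  # assumed label of the single non-work-clothes category

def classify_region(fx: np.ndarray, centers: Dict[str, np.ndarray], thr: float) -> str:
    """Assign a region category to the target position feature Fx via min / second-min distances."""
    dists = {cls: float(np.linalg.norm(fx - fc)) for cls, fc in centers.items()}
    ranked = sorted(dists, key=dists.get)
    cm, cs = ranked[0], ranked[1]          # categories of minDis and sec_minDis
    diff = dists[cs] - dists[cm]           # diff = sec_minDis - minDis
    if diff > thr:
        return cm                          # confident case: nearest category wins
    # Ambiguous case: prefer a work clothes category to reduce non-work-clothes false alarms.
    return cm if cm != NON_WORK else cs
```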
The technical scheme of the embodiment of the invention acquires an image frame to be identified and identifies at least one target image area in the image frame to be identified; for any target image area, the image features of the current target image area are extracted based on a pre-trained feature extraction model, and the target position feature of the current target image area is determined based on the image features; a pre-constructed feature category library is acquired, and the region category to which each target image region in the image frame to be recognized belongs is determined based on the feature center points of each category in the feature category library and each target position feature. The model training method based on a twin structure maximizes the inter-class distance of samples, avoids the convergence and generalization problems caused by direct detection, and quickly obtains a trained feature extraction model; meanwhile, combined with the constructed sample image library, recognition of non-work clothes and subdivision of work clothes can be realized; in addition, for the false detections easily caused by the high similarity between some work clothes and non-work clothes, a secondary judgment strategy is designed to reduce false alarms. The problems of low dressing identification efficiency and accuracy in the prior art are thereby solved, the false detection rate of dressing identification is reduced, and the identification accuracy is improved.
Example two
Fig. 2 is a schematic structural diagram of an image processing apparatus according to a second embodiment of the present invention. As shown in fig. 2, the apparatus includes: a target image area identification module 210, a target position feature determination module 220, and a region category determination module 230; wherein,
a target image area identification module 210, configured to acquire an image frame to be identified, and identify at least one target image area in the image frame to be identified;
a target position feature determination module 220, configured to, for any target image region, extract an image feature of a current target image region based on a pre-trained feature extraction model, and determine a target position feature of the current target image region based on the image feature;
the region type determining module 230 is configured to obtain a pre-constructed feature type library, and determine, based on feature center points of each type in the feature type library and feature of each target position, region types to which each target image region in the image frame to be identified belongs.
On the basis of the foregoing embodiments, optionally, the target image area identifying module 210 includes:
the image coordinate obtaining unit is used for obtaining a pre-trained target identification model, and carrying out image area identification on the image frame to be identified based on the target identification model to obtain the image coordinate of at least one target image area in the image frame to be identified;
and the target image area identification unit is used for cutting the image frame to be identified based on each image coordinate to generate at least one target image area in the image frame to be identified.
On the basis of the foregoing embodiments, optionally, the apparatus further includes a model training module, configured to train the feature extraction model; wherein the model training module comprises:
a sample image pair generating unit for acquiring a sample image library and generating a set of sample image pairs based on any two sample region images in the sample image library; the sample image library comprises various labeled sample region images;
a training output result obtaining unit, configured to input each sample image pair into a pre-established feature extraction model of a backbone network, respectively, for training, so as to obtain a training output result corresponding to each sample image pair;
and the model loss determining unit is used for determining the model loss of the feature extraction model based on each training output result and each sample image and adjusting the model parameters of the feature extraction model based on the model loss.
On the basis of the foregoing embodiments, optionally, the training output result includes a sample image feature;
accordingly, a model loss determination unit comprises:
a sample feature distance determining subunit, configured to, for any group of sample image pairs, obtain sample image features corresponding to each sample region image in the current sample image pair, and determine a sample feature distance between the sample image features in the current sample image pair;
a model loss determination subunit for determining a model loss of the feature extraction model based on the sample feature distance of each of the sample image pairs.
On the basis of the foregoing embodiments, optionally, the apparatus further includes a feature category library construction module, configured to construct a feature category library, where the feature category library construction module includes:
the image feature obtaining unit is used for performing feature extraction on various types of sample region images in a sample image library based on the feature extraction model to obtain image features respectively corresponding to the various types of sample region images;
and the feature class library construction unit is used for determining class feature central points respectively corresponding to the various image features and constructing a feature class library based on the class feature central points.
On the basis of the foregoing embodiments, optionally, the area category determining module 230 includes:
the characteristic distance determining unit is used for respectively determining the characteristic distance between the target position characteristic of the current target image area and the central point of each class characteristic for any target image area;
and the region type determining unit is used for determining the region type to which the target image region belongs based on each characteristic distance.
On the basis of the foregoing embodiments, optionally, the target image area includes a dressing area; the region categories include at least one work clothes category and a non-work-clothes category;
correspondingly, the area category determination unit comprises:
a feature distance difference determining unit, configured to determine a minimum feature distance and a next-smallest feature distance among the feature distances, and determine a feature distance difference between the minimum feature distance and the next-smallest feature distance;
a first region category determining unit, configured to, if the feature distance difference is greater than a preset threshold, determine the region category corresponding to the category feature center point with the minimum feature distance to the target position feature as the region category to which the target position feature belongs;
a second region category determining unit, configured to, if the feature distance difference is smaller than the preset threshold, acquire the region category corresponding to the category feature center point with the minimum feature distance to the target position feature; if that region category is any work clothes category, determine it as the region category to which the target position feature belongs; and if that region category is the non-work-clothes category, determine the region category corresponding to the category feature center point with the next-smallest feature distance to the target position feature as the region category to which the target position feature belongs.
The image processing device provided by the embodiment of the invention can execute the image processing method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE III
FIG. 3 illustrates a block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 3, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM)12, a Random Access Memory (RAM)13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM)12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as an image processing method.
In some embodiments, the image processing method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the image processing method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the image processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Computer programs for implementing the methods of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired result of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An image processing method, comprising:
acquiring an image frame to be identified, and identifying at least one target image area in the image frame to be identified;
for any target image area, extracting the image characteristics of the current target image area based on a pre-trained characteristic extraction model, and determining the target position characteristics of the current target image area based on the image characteristics;
and acquiring a pre-constructed feature category library, and determining the region categories to which the target image regions in the image frame to be recognized respectively belong based on the feature center points of all categories in the feature category library and the target position features.
2. The method according to claim 1, wherein said identifying at least one target image area in said image frame to be identified comprises:
acquiring a pre-trained target recognition model, and performing image area recognition on the image frame to be recognized based on the target recognition model to obtain image coordinates of at least one target image area in the image frame to be recognized;
and clipping the image frame to be recognized based on each image coordinate to generate at least one target image area in the image frame to be recognized.
3. The method of claim 1, wherein the training process of the feature extraction model comprises:
acquiring a sample image library, and generating a group of sample image pairs based on any two sample region images in the sample image library; the sample image library comprises various labeled sample region images;
respectively inputting each sample image pair into a pre-established feature extraction model of a backbone network for training to obtain a training output result corresponding to each sample image pair;
determining a model loss of the feature extraction model based on each of the training output results and each of the sample images, and adjusting model parameters of the feature extraction model based on the model loss.
4. The method of claim 3, wherein the training output comprises sample image features;
the determining a model loss for the feature extraction model based on the training output results and the sample images comprises:
for any group of sample image pairs, obtaining sample image features respectively corresponding to each sample area image in the current sample image pair, and determining a sample feature distance between the sample image features in the current sample image pair;
determining a model loss for the feature extraction model based on the sample feature distances for each of the sample image pairs.
5. The method of claim 1, wherein the method of constructing the feature class library comprises:
based on the feature extraction model, performing feature extraction on various sample region images in a sample image library to obtain image features corresponding to the various sample region images respectively;
and determining category feature central points corresponding to the various image features respectively, and constructing a feature category library based on the category feature central points.
6. The method according to claim 1, wherein the determining, based on the feature center points of the categories in the feature category library and the target position features, the category to which the target image regions in the image frame to be recognized respectively belong comprises:
respectively determining the characteristic distance between the target position characteristic of the current target image area and the central point of each category characteristic for any target image area;
and determining the region class to which the target image region belongs based on each characteristic distance.
7. The method of claim 6, wherein the target image area comprises a dressing area; the region categories include at least one work clothes category and a non-work-clothes category;
the determining the region class to which the target image region belongs based on each of the feature distances includes:
determining a minimum feature distance and a next-smallest feature distance among the feature distances, and determining a feature distance difference value between the minimum feature distance and the next-smallest feature distance;
if the feature distance difference is larger than a preset threshold value, determining the region category corresponding to the category feature center point with the minimum feature distance to the target position feature as the region category to which the target position feature belongs;
if the feature distance difference is smaller than the preset threshold value, acquiring the region category corresponding to the category feature center point with the minimum feature distance to the target position feature; if that region category is any work clothes category, determining it as the region category to which the target position feature belongs; and if that region category is the non-work-clothes category, determining the region category corresponding to the category feature center point with the next-smallest feature distance to the target position feature as the region category to which the target position feature belongs.
8. An image processing apparatus characterized by comprising:
the target image area identification module is used for acquiring an image frame to be identified and identifying at least one target image area in the image frame to be identified;
the target position characteristic determining module is used for extracting the image characteristics of the current target image area based on a pre-trained characteristic extraction model for any target image area and determining the target position characteristics of the current target image area based on the image characteristics;
and the region type determining module is used for acquiring a pre-constructed feature type library and determining the region type to which each target image region in the image frame to be identified belongs respectively based on the feature center point of each type in the feature type library and the target position feature.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the image processing method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for causing a processor to implement the image processing method of any one of claims 1 to 7 when executed.
CN202210612912.9A 2022-05-31 2022-05-31 Image processing method, image processing device, electronic equipment and storage medium Pending CN115019032A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210612912.9A CN115019032A (en) 2022-05-31 2022-05-31 Image processing method, image processing device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210612912.9A CN115019032A (en) 2022-05-31 2022-05-31 Image processing method, image processing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115019032A true CN115019032A (en) 2022-09-06

Family

ID=83070319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210612912.9A Pending CN115019032A (en) 2022-05-31 2022-05-31 Image processing method, image processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115019032A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination