CN117078997A - Image processing or training method, device, equipment and medium of image processing model


Info

Publication number
CN117078997A
Authority
CN
China
Prior art keywords
image
processed
grid
sample
image processing
Prior art date
Legal status
Pending
Application number
CN202310780968.XA
Other languages
Chinese (zh)
Inventor
赵一麟 (Zhao Yilin)
沈智勇 (Shen Zhiyong)
陆勤 (Lu Qin)
龚建 (Gong Jian)
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority claimed from CN202310780968.XA
Publication of CN117078997A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image processing method and a training method, apparatus, device, and medium for an image processing model, relating to the field of image processing, and in particular to smart cities, deep learning, and computer vision. A specific implementation scheme is as follows: features are extracted from an image to be processed to obtain image features of the image to be processed; the image features are gridded to obtain the to-be-processed grid features of at least one to-be-processed grid; target category prediction is performed on each to-be-processed grid according to its grid features to obtain the grid's category confidence; the to-be-processed grid features of the grids are screened according to each grid's category confidence to obtain the screening features of the image to be processed; and an image segmentation result of the image to be processed is generated based on the screening features and the image features. The method and apparatus can improve the accuracy and efficiency of target segmentation.

Description

Image processing or training method, device, equipment and medium of image processing model
Technical Field
The present disclosure relates to the field of image processing, in particular to the fields of smart cities, deep learning, and computer vision, and more particularly to an image processing method and a training method, apparatus, device, and medium for an image processing model.
Background
At present, shared bicycles serve as a form of public transportation, carrying citizens' short-trip needs and playing a vital role in daily urban traffic. This has given rise to a need for supervision of shared bicycles: standardized parking not only improves the city's appearance but also improves the bicycles' utilization efficiency.
Disclosure of Invention
The present disclosure provides a training method, apparatus, device, and medium for image processing or image processing model.
According to an aspect of the present disclosure, there is provided an image processing method including:
extracting features from an image to be processed to obtain image features of the image to be processed, wherein the size of the image features is smaller than the size of the image to be processed;
gridding the image features of the image to be processed to obtain the to-be-processed grid features of at least one to-be-processed grid;
performing target category prediction on the to-be-processed grid according to its to-be-processed grid features to obtain a category confidence of the to-be-processed grid;
screening the to-be-processed grid features of each to-be-processed grid according to the category confidence of each to-be-processed grid to obtain screening features of the image to be processed;
and generating an image segmentation result of the image to be processed based on the screening features of the image to be processed and the image features of the image to be processed.
According to an aspect of the present disclosure, there is provided an image processing apparatus, including:
a feature extraction module, used for extracting features from an image to be processed to obtain image features of the image to be processed, the size of the image features being smaller than the size of the image to be processed;
a target classification module, used for gridding the image features of the image to be processed to obtain the to-be-processed grid features of at least one to-be-processed grid;
the target classification module being further used for performing target category prediction on the to-be-processed grid according to its to-be-processed grid features to obtain a category confidence of the to-be-processed grid;
a target segmentation module, used for screening the to-be-processed grid features of each to-be-processed grid according to the category confidence of each to-be-processed grid to obtain screening features of the image to be processed;
the target segmentation module being further used for generating an image segmentation result of the image to be processed based on the screening features of the image to be processed and the image features of the image to be processed.
According to an aspect of the present disclosure, there is provided a training method of an image processing model, wherein the image processing model implements an image processing method according to any of the embodiments of the present disclosure, the training method including:
extracting features of a sample image through the image processing model to obtain image features of the sample image, wherein the size of the image features of the sample image is smaller than that of the sample image;
gridding the image features of the sample image through the image processing model to obtain sample grid features of at least one sample grid;
performing target category prediction on the sample grid according to sample grid characteristics of the sample grid through the image processing model to obtain category confidence of the sample grid;
screening sample grid features of each sample grid according to the category confidence of each sample grid through the image processing model to obtain screening features of the sample image;
generating, through the image processing model, a prediction segmentation result of the sample image based on the screening features of the sample image and the image features of the sample image;
calculating, by the image processing model, a first difference between a standard segmentation result and the predicted segmentation result of the sample image;
calculating, by the image processing model, a second difference between the standard class of the sample image and the class confidence of each of the sample grids;
and adjusting model parameters of the image processing model according to the first difference and the second difference through the image processing model.
According to an aspect of the present disclosure, there is provided a training apparatus for an image processing model, wherein the image processing model is configured with an image processing apparatus according to any one of the embodiments of the present disclosure;
the image processing model is used for extracting features of a sample image to obtain image features of the sample image, and the size of the image features of the sample image is smaller than that of the sample image;
the image processing model is used for gridding the image characteristics of the sample image to obtain sample grid characteristics of at least one sample grid;
The image processing model is used for predicting the target category of the sample grid according to the sample grid characteristics of the sample grid to obtain the category confidence of the sample grid;
the image processing model is used for screening sample grid features of each sample grid according to the category confidence of each sample grid to obtain screening features of the sample image;
the image processing model is used for generating a prediction segmentation result of the sample image based on the screening feature of the sample image and the image feature of the sample image;
the image processing model is used for calculating a first difference between a standard segmentation result and the prediction segmentation result of the sample image;
the image processing model is used for calculating a second difference between the standard class of the sample image and the class confidence of each sample grid;
the image processing model is used for adjusting model parameters of the image processing model according to the first difference and the second difference.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method or the training method of the image processing model according to any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the image processing method or the training method of the image processing model of any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the image processing method or the training method of an image processing model according to any of the embodiments of the present disclosure.
The method and the device can improve accuracy and efficiency of target segmentation.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of an image processing method disclosed in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow chart of another image processing method disclosed in accordance with an embodiment of the present disclosure;
FIG. 3 is a flow chart of another image processing method disclosed in accordance with an embodiment of the present disclosure;
FIG. 4 is a flow chart of a training method of an image processing model disclosed in accordance with an embodiment of the present disclosure;
FIG. 5 is a scene diagram of an image processing method disclosed in accordance with an embodiment of the disclosure;
FIG. 6 is a schematic diagram of an output result according to an embodiment of the present disclosure;
fig. 7 is a schematic structural view of an image processing apparatus according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural view of a training device of an image processing model disclosed in accordance with an embodiment of the present disclosure;
fig. 9 is a block diagram of an electronic device of an image processing method or a training method of an image processing model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present disclosure, which may be applied to a case where segmentation and classification of a target object are performed on an image. The method of the embodiment may be performed by an image processing apparatus, which may be implemented in software and/or hardware, and specifically configured in an electronic device with a certain data computing capability, where the electronic device may be a client device or a server device, and the client device may be a mobile phone, a tablet computer, a vehicle-mounted terminal, a desktop computer, or the like.
S101, extracting features of an image to be processed to obtain image features of the image to be processed, wherein the size of the image features of the image to be processed is smaller than that of the image to be processed.
The image to be processed may refer to an image in which a target is to be identified. Illustratively, the target is a shared non-motor vehicle, an animal, a vehicle, or the like. A shared non-motor vehicle may refer to a shared bicycle or a shared electric vehicle. Feature extraction may be achieved by convolution. The image features may be represented in matrix or vector form. The image to be processed may be regarded as a matrix of pixels, whose size may refer to the length and width of the matrix; the size of the image features may refer to the length and width of the matrix describing the image features. Alternatively, the image features of the image to be processed can be understood as a feature map. The image features are obtained by processing the image to be processed, and their size is smaller than that of the image to be processed.
Optionally, extracting features from the image to be processed to obtain image features of the image to be processed includes: performing multi-scale feature extraction on the image to be processed to obtain features of multiple sizes; and fusing the features of the multiple sizes to obtain the image features of the image to be processed.
A backbone network may be used to perform feature extraction on the image to be processed to obtain features of multiple sizes, and a feature aggregation network may be used to fuse the features of the multiple sizes to obtain the image features of the image to be processed. Illustratively, the backbone network may be a Swin Transformer and the feature aggregation network may be an FPN (Feature Pyramid Network).
Multi-scale feature extraction is performed in multiple stages (convolution, linear transformation, and the like); each stage produces features of one size, and different stages produce features of different sizes.
By extracting multi-scale features, features of different sizes can be fully utilized: features rich in semantic information are fused with features containing detail information, enriching the feature content, so that classification and segmentation performed on these rich features can be more accurate.
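As a minimal illustration of this step (not part of the patent text), the PyTorch sketch below fuses multi-scale features in an FPN-like top-down pass; the class name MultiScaleFusion, the channel counts, and the random stand-in tensors replacing a Swin Transformer backbone are all assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Fuse multi-scale backbone features in an FPN-like top-down pass."""

    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        # 1x1 lateral convs project every stage to a common channel count
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )

    def forward(self, feats):
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        fused = laterals[-1]  # coarsest stage: richest semantics
        for lat in reversed(laterals[:-1]):
            # upsample coarse semantics and add detail-rich shallow features
            fused = lat + F.interpolate(fused, size=lat.shape[-2:], mode="nearest")
        return fused  # image features, smaller than the input image

# usage: stand-in features for three backbone stages (strides 8/16/32)
feats = [torch.randn(1, c, s, s) for c, s in [(256, 64), (512, 32), (1024, 16)]]
image_features = MultiScaleFusion([256, 512, 1024])(feats)
print(image_features.shape)  # torch.Size([1, 256, 64, 64])
```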
S102, gridding the image features of the image to be processed to obtain the grid features to be processed of at least one grid to be processed.
The image to be processed is divided into a plurality of grids to obtain at least one to-be-processed grid. Gridding the image features may mean dividing the image features to obtain the to-be-processed grid features corresponding to each to-be-processed grid. For example, the image to be processed is divided into S×S to-be-processed grids, and the to-be-processed grid features of each grid are determined from the content corresponding to that grid in the image features. The corresponding content in the image features may be used directly, or it may be convolved to obtain the to-be-processed grid features of the grid.
Optionally, gridding the image features of the image to be processed to obtain the to-be-processed grid features of at least one to-be-processed grid includes: interpolating the image features of the image to be processed to obtain interpolation features of the image to be processed; and performing grid division on the interpolation features to obtain the to-be-processed grid features of at least one to-be-processed grid.
In practice, the image features of the image to be processed are smaller in size than the image to be processed, so the image features are interpolated to form interpolation features of the same size as the image to be processed. The interpolation features are then divided into grids, and the content of the region at the position corresponding to each to-be-processed grid in the division result is taken as that grid's to-be-processed grid features. Alternatively, the interpolation features may be further convolved, and the convolution result of the region at the corresponding position taken as the to-be-processed grid features.
By classifying and segmenting the targets of the image to be processed grid by grid at different positions, detection of small targets is facilitated and detection accuracy is improved.
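A minimal sketch of the interpolation-and-gridding step, assuming bilinear interpolation and per-cell average pooling (the patent also allows taking the divided regions directly or convolving them):

```python
import torch
import torch.nn.functional as F

def grid_features(image_features, image_size, S=16):
    """Interpolate features to the image size, then split into an S x S grid."""
    # [B, C, h, w] -> [B, C, H, W]: same size as the image to be processed
    interp = F.interpolate(image_features, size=image_size,
                           mode="bilinear", align_corners=False)
    # pool each grid cell to one vector -> to-be-processed grid features [B, C, S, S]
    return F.adaptive_avg_pool2d(interp, output_size=(S, S))

cells = grid_features(torch.randn(1, 256, 64, 64), image_size=(512, 512), S=16)
print(cells.shape)  # torch.Size([1, 256, 16, 16])
```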
S103, according to the characteristics of the to-be-processed grids, carrying out target category prediction on the to-be-processed grids to obtain category confidence of the to-be-processed grids.
Target category prediction may be predicting whether the to-be-processed grid contains a target of at least one category. If the grid contains a target of a certain category, the grid's probability for that category is high; if it does not, the probability is low.
Category confidence describes the reliability of the category prediction for the to-be-processed grid. Processing, for example convolution, is performed on the to-be-processed grid features of the S×S to-be-processed grids to obtain a target category prediction result of size S×S×C, where C represents the category probabilities of a grid, i.e., the probability that the grid belongs to each of at least one category. The category confidence of a grid is then determined from its category probabilities; for example, the highest category probability may be taken as the category confidence, or the average of the category probabilities may be used.
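A hedged sketch of this per-grid prediction, assuming a single 3×3 convolution head with sigmoid probabilities and the highest-probability option for the category confidence:

```python
import torch
import torch.nn as nn

class GridClassHead(nn.Module):
    """Predict per-grid class probabilities and a per-grid category confidence."""

    def __init__(self, n_dim=256, num_classes=3):
        super().__init__()
        self.cls = nn.Conv2d(n_dim, num_classes, kernel_size=3, padding=1)

    def forward(self, grid_feats):                   # [B, n_dim, S, S]
        probs = torch.sigmoid(self.cls(grid_feats))  # [B, C, S, S]
        probs = probs.permute(0, 2, 3, 1)            # S x S x C prediction result
        confidence, _ = probs.max(dim=-1)            # highest class probability
        return probs, confidence                     # [B, S, S, C], [B, S, S]

probs, conf = GridClassHead()(torch.randn(1, 256, 16, 16))
print(probs.shape, conf.shape)  # [1, 16, 16, 3] [1, 16, 16]
```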
S104, screening the to-be-processed grid features of each to-be-processed grid according to the category confidence of each to-be-processed grid to obtain screening features of the image to be processed.
The to-be-processed grids whose category confidence satisfies a reliability condition are selected, and the to-be-processed grid features of the retained grids are determined as the screening features of the image to be processed. The reliability condition may be, for example, that the category confidence is greater than or equal to a preset confidence threshold, that it is less than the preset confidence threshold, or that it lies within a preset confidence range. The reliability condition may be used to detect whether a target exists in the image region corresponding to the grid, and/or whether the target's category is reliable, etc.
S105, generating an image segmentation result of the image to be processed based on the screening feature of the image to be processed and the image feature of the image to be processed.
The screening features and the image features can both serve as features of the image to be processed and be used to segment it, yielding the image segmentation result. The image segmentation result may be a mask map, i.e., different pixel values distinguish target pixels from non-target pixels in the image to be processed. Because the image segmentation result is obtained by processing the screening features, which include the positions of the grids and the category information of the screened key grids, the screening features provide richer and more accurate position and category information during segmentation, improving segmentation accuracy; and because the screening features are post-screening data, the amount of data to process is reduced, improving segmentation efficiency.
Optionally, the image processing method is implemented by a pre-trained image processing model.
The image processing model is a machine learning model. Illustratively, the image processing model includes a feature extraction module, a target classification module, and a target segmentation module. The feature extraction module extracts features from the input image; the target classification module classifies the image according to the features; and the target segmentation module performs target segmentation on the image. Implementing the image processing method through the image processing model can improve segmentation accuracy.
According to the technical solution of the present disclosure, features are extracted from the image to be processed to obtain image features; the image features are gridded to form the to-be-processed grid features of a plurality of to-be-processed grids; category prediction is performed on the grids, and the grid features are screened according to category confidence to obtain screening features; the screening features and the image features are then processed to generate the image segmentation result of the image to be processed. During segmentation, this increases the position sensitivity and target sensitivity of the detection head, improving segmentation accuracy, while processing only the screened data reduces the amount of data processed and improves segmentation efficiency.
Fig. 2 is a flowchart of another image processing method according to an embodiment of the present disclosure, further optimized and expanded on the basis of the above technical solution, and combinable with the various optional embodiments above. The step of screening the to-be-processed grid features of each to-be-processed grid according to the category confidence of each to-be-processed grid to obtain the screening features of the image to be processed is refined as: screening out target to-be-processed grids in which targets exist according to the category confidence of each to-be-processed grid; and combining the to-be-processed grid features of each target to-be-processed grid to generate the screening features of the image to be processed.
S201, extracting features of an image to be processed to obtain image features of the image to be processed, wherein the size of the image features of the image to be processed is smaller than that of the image to be processed.
S202, gridding the image features of the image to be processed to obtain the grid features to be processed of at least one grid to be processed.
S203, according to the characteristics of the to-be-processed grids, performing target category prediction on the to-be-processed grids to obtain category confidence of the to-be-processed grids.
S204, screening out target to-be-processed grids in which targets exist according to the category confidence of each to-be-processed grid.
A target to-be-processed grid in which a target exists means that a target is present in the image region corresponding to that grid. A preset confidence threshold is used to judge whether a target exists in the corresponding image region. Illustratively, grids whose category confidence is greater than or equal to the confidence threshold are target to-be-processed grids, and grids whose category confidence is below the threshold are not. By way of example, the preset confidence threshold may be 0.3.
S205, combining the grid characteristics to be processed of each target grid to be processed to generate screening characteristics of the images to be processed.
Combining the to-be-processed grid features of the target to-be-processed grids means adding a dimension, so that the grid features form multi-dimensional screening features.
In a specific example, the to-be-processed grid features of the grids of an image to be processed form [Batch, n_dim, S, S], where Batch indexes the image to be processed, S×S is the number of to-be-processed grids, and n_dim is the dimension of each grid's features. After screening, P target to-be-processed grids remain, each with an n_dim-dimensional grid feature; combining them gives screening features of shape [Batch, n_dim, P], indicating that image Batch has P target to-be-processed grids whose P grid features are each n_dim-dimensional.
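A minimal sketch of this screening step, assuming the 0.3 threshold from the example above; since P can differ per image, the result is returned as a per-image list:

```python
import torch

def select_screening_features(grid_feats, confidence, threshold=0.3):
    """Keep grids whose category confidence reaches the threshold and
    stack their features: [Batch, n_dim, S, S] -> list of [n_dim, P_b]."""
    B, n_dim, S, _ = grid_feats.shape
    flat_feats = grid_feats.reshape(B, n_dim, S * S)
    flat_conf = confidence.reshape(B, S * S)
    screening = []
    for b in range(B):                        # P may differ per image
        keep = flat_conf[b] >= threshold      # grids where a target exists
        screening.append(flat_feats[b][:, keep])
    return screening

feats, conf = torch.randn(2, 256, 16, 16), torch.rand(2, 16, 16)
print([f.shape for f in select_screening_features(feats, conf)])
```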
S206, generating an image segmentation result of the image to be processed based on the screening feature of the image to be processed and the image feature of the image to be processed.
Optionally, the performing target class prediction on the to-be-processed grid according to the to-be-processed grid characteristics of the to-be-processed grid to obtain a class confidence coefficient of the to-be-processed grid includes: according to the characteristics of the to-be-processed grid, carrying out target category prediction on the to-be-processed grid to obtain the probability of at least one category to which the to-be-processed grid belongs; and selecting the highest probability from the probabilities of at least one category to which the grid to be processed belongs, and determining the highest probability as the category confidence of the grid to be processed.
If a target of a category exists in the to-be-processed grid, the probability that the grid belongs to that category is high; if not, the probability is low. There may be multiple categories. The probabilities of the grid belonging to each category are compared, and the highest value is selected as the grid's category confidence.
Illustratively, the category probabilities of the to-be-processed grids may be [Batch, S, S, C] with C = (c1, c2, c3), where c1 is the probability of the first category, c2 of the second, and c3 of the third. If c3 is the largest, the category confidence of the grid is c3.
By performing at least one category prediction on the to-be-processed grid and determining the highest of the category probabilities as the grid's category confidence, the detection of category confidence is simplified while remaining accurately representative, improving the effectiveness of grid screening.
Optionally, the generating an image segmentation result of the image to be processed based on the screening feature of the image to be processed and the image feature of the image to be processed includes: convolving and adjusting the image characteristics of the image to be processed to obtain adjusted image characteristics of the image to be processed, wherein the size of the adjusted image characteristics of the image to be processed is the same as the size of the image to be processed; and determining the screening characteristics of the image to be processed as a convolution kernel, convolving the adjusted image characteristics of the image to be processed, generating a mask segmentation result in the image to be processed, and determining the mask segmentation result as an image segmentation result of the image to be processed.
The size of the image features is smaller than the image size, while the adjusted image features have the same size as the image to be processed. Illustratively, the image features are [Batch, A, H/8, W/8], where A is the number of categories, H is the height of the image to be processed, and W is its width; the adjusted image features are [Batch, n_dim, H, W]. The size adjustment may be performed using a linear transformation, and the convolution and adjustment may be implemented using 1×1 convolution, group normalization (GroupNorm), a rectified linear unit (ReLU), and the like.
The screening features are determined as convolution kernels, and the adjusted image features are convolved with them to obtain a mask map of the image to be processed, which is determined as the mask segmentation result, i.e., the image segmentation result. In fact, in a typical target recognition scene, targets are sparsely distributed in the image, and only the features corresponding to a small fraction of the grids are effective in a single inference pass. Determining the screening features as convolution kernels means the kernels are dynamically determined from the output of a convolution learning process rather than being fixed; having them participate in the convolution calculation and then generating the segmentation result increases effective computation, reduces invalid computation, reduces memory consumption, and improves computational efficiency.
By using the screening features as dynamic convolution kernels, image segmentation is realized through dynamic convolution, which reduces invalid computation, improves computational efficiency, and allows the kernels to be adapted effectively, improving the flexibility and adaptability of the convolution.
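A minimal sketch of the dynamic convolution, assuming each retained grid feature acts as a 1×1 convolution kernel over the adjusted image features:

```python
import torch
import torch.nn.functional as F

def dynamic_conv_masks(adjusted_feats, screening_feats):
    """Use screening features as dynamic 1x1 kernels: each retained grid
    feature produces one mask over the adjusted image features."""
    n_dim, P = screening_feats.shape
    kernels = screening_feats.t().reshape(P, n_dim, 1, 1)  # P dynamic kernels
    return F.conv2d(adjusted_feats, kernels)               # [1, P, H, W] mask logits

masks = dynamic_conv_masks(torch.randn(1, 256, 512, 512), torch.randn(256, 7))
print(masks.shape)  # torch.Size([1, 7, 512, 512])
```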
According to the technical solution of the present disclosure, target to-be-processed grids in which targets exist are screened out, and their to-be-processed grid features are combined to form the screening features, which improves the accuracy of image segmentation while reducing the amount of computation and improving segmentation efficiency.
Fig. 3 is a flowchart of another image processing method according to an embodiment of the present disclosure, further optimized and expanded on the basis of the above technical solution, and combinable with the various optional embodiments above. The image processing method is further optimized with: detecting a target frame of the image to be processed based on the screening features of the image to be processed and the image features of the image to be processed.
S301, extracting features of an image to be processed to obtain image features of the image to be processed, wherein the size of the image features of the image to be processed is smaller than that of the image to be processed.
S302, gridding the image features of the image to be processed to obtain the grid features to be processed of at least one grid to be processed.
S303, according to the characteristics of the to-be-processed grids, performing target category prediction on the to-be-processed grids to obtain category confidence of the to-be-processed grids; the category confidence includes a brand detection result of the shared non-motor vehicle.
Optionally, the brand detection result of the shared non-motor vehicle is yes or no; or the brand detection result of the shared non-motor vehicle is single vehicle or group vehicle.
The brand detection result includes a detection result for at least one brand. For example, each brand's detection result is yes or no, represented as a probability, where yes indicates the brand is present and no indicates it is not. As another example, each brand's detection result is single vehicle or group vehicle, also represented as a probability, where single vehicle indicates one shared non-motor vehicle of the brand is present and group vehicle indicates multiple (2 or more) are present.
By configuring brand detection results such as yes/no or single vehicle/group vehicle, the variety of categories is increased, flexibility is improved, and various shared non-motor-vehicle detection scenes can be accommodated.
S304, screening the to-be-processed grid features of each to-be-processed grid according to the category confidence of each to-be-processed grid to obtain screening features of the image to be processed.
S305, generating an image segmentation result of the image to be processed based on the screening feature of the image to be processed and the image feature of the image to be processed.
S306, detecting whether the shared non-motor vehicles are illegally parked according to the segmentation result of the shared non-motor vehicles and the brand detection result of the shared non-motor vehicles in the image to be processed.
In one application scenario, the target is a shared non-motor vehicle, and the category confidence corresponds to the brand of the shared non-motor vehicle. The segmentation result and the brand detection result of each target non-motor vehicle can be identified in the image to be processed, and different brands of shared non-motor vehicles are subject to different illegal-parking detection rules; for example, brand A and brand B shared non-motor vehicles may be judged against different permitted parking ranges at a subway station entrance.
According to the brand detection result of a given shared non-motor vehicle, the detection mode for its illegal parking is determined, and accordingly an illegal-parking area or a standard parking area is determined in the image to be processed. The segmentation result of the shared non-motor vehicle is then checked against that area to judge whether the vehicle is inside the illegal-parking area or the standard parking area, thereby determining whether it is illegally parked.
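A hedged sketch of such a rule check; the brand names, zone masks, and the 50% overlap criterion are illustrative assumptions, not taken from the patent:

```python
import numpy as np

# Hypothetical per-brand no-parking zones (boolean masks in image coordinates);
# the brand names and zones are illustrative only.
NO_PARKING_ZONES = {
    "brand_A": np.zeros((512, 512), dtype=bool),
    "brand_B": np.zeros((512, 512), dtype=bool),
}

def is_illegally_parked(vehicle_mask, brand):
    """Flag a vehicle whose segmented pixels mostly fall inside the
    no-parking zone defined for its detected brand."""
    zone = NO_PARKING_ZONES[brand]
    overlap = np.logical_and(vehicle_mask, zone).sum()
    return overlap / max(int(vehicle_mask.sum()), 1) > 0.5
```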
Optionally, the image processing method further includes: and detecting a target frame of the image to be processed based on the screening characteristics of the image to be processed and the image characteristics of the image to be processed.
After the screening feature of the image to be processed is obtained, the image to be processed can be subjected to target frame detection based on the screening feature of the image to be processed and the image feature of the image to be processed.
Target frame detection may refer to locating a target in the image to be processed. Optionally, the image features of the image to be processed may be processed, for example by further convolution, to obtain the position of the target frame in the image to be processed. Optionally, the screening features of the image to be processed may be processed, for example by further convolution, to obtain the position of the target frame. Optionally, the target frame detection results of the screening features and the image features may be fused to obtain the target frame.
By adopting the screening characteristics and the image characteristics of the image to be processed, the representativeness of key information can be increased, the positioning frame can be further detected, and the detection accuracy of the positioning frame can be improved.
Optionally, detecting the target frame of the image to be processed based on the screening features of the image to be processed and the image features of the image to be processed includes: detecting coordinates of a target frame in the image to be processed according to the screening features of the image to be processed.
And convolving the screening characteristics of the image to be processed to obtain the coordinates of the target frame in the image to be processed. The target frame may be a rectangular frame and the coordinates may be coordinates of four vertices of the target frame.
As in the previous example, the screening features form a matrix [Batch, n_dim, P]. This matrix is convolved to obtain a matrix [Batch, P, 4], which represents the vertex coordinates of the P target frames of the image to be processed.
By adopting the screening features to detect the target frame in the image to be processed, both the detection accuracy and the detection efficiency of the positioning frame can be improved.
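A minimal sketch of the coordinate regression, assuming 1×1 convolutions over the screening features [Batch, n_dim, P] to produce [Batch, P, 4]:

```python
import torch
import torch.nn as nn

class BoxHead(nn.Module):
    """Regress 4 box coordinates per retained grid from the screening features."""

    def __init__(self, n_dim=256):
        super().__init__()
        self.reg = nn.Sequential(
            nn.Conv1d(n_dim, n_dim, kernel_size=1), nn.ReLU(),
            nn.Conv1d(n_dim, 4, kernel_size=1),
        )

    def forward(self, screening_feats):    # [Batch, n_dim, P]
        boxes = self.reg(screening_feats)  # [Batch, 4, P]
        return boxes.permute(0, 2, 1)      # [Batch, P, 4]

print(BoxHead()(torch.randn(1, 256, 7)).shape)  # torch.Size([1, 7, 4])
```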
According to the technical solution of the present disclosure, by setting the application scene to illegal-parking detection for shared non-motor vehicles, the segmentation result of each shared non-motor vehicle and its brand detection result can be detected in parallel in the image to be processed, and illegal-parking detection can then be performed per brand according to both results, which speeds up detection while maintaining accuracy.
Fig. 4 is a flowchart of a training method of an image processing model disclosed according to an embodiment of the present disclosure, which may be applied to training an image processing model that segments and classifies target objects in images. The method of this embodiment may be performed by a training apparatus of an image processing model, which may be implemented in software and/or hardware and configured in an electronic device with a certain data computing capability; the electronic device may be a client device or a server device, and the client device may be a mobile phone, a tablet computer, a vehicle-mounted terminal, a desktop computer, or the like. The image processing model implements an image processing method according to any of the embodiments of the present disclosure.
S401, extracting features of a sample image through the image processing model to obtain image features of the sample image, wherein the size of the image features of the sample image is smaller than that of the sample image.
The sample image is used for training an image processing model, and the sample image and the image to be processed have the same size.
S402, gridding the image features of the sample image through the image processing model to obtain sample grid features of at least one sample grid.
The sample image is divided into a plurality of grids to obtain at least one sample grid. Gridding the image features may refer to dividing the image features to obtain the sample grid features corresponding to each sample grid.
S403, performing target category prediction on the sample grid according to sample grid characteristics of the sample grid through the image processing model to obtain category confidence of the sample grid.
Target category prediction may be predicting whether the sample grid contains a target of at least one category. If the sample grid contains a target of a certain category, its probability for that category is high; if not, the probability is low.
S404, screening sample grid features of each sample grid according to the category confidence of each sample grid through the image processing model to obtain screening features of the sample image.
The sample grids whose category confidence satisfies a reliability condition are selected, and the sample grid features of the retained sample grids are determined as the screening features of the sample image. The reliability condition may be, for example, that the category confidence is greater than or equal to a preset confidence threshold, that it is less than the threshold, or that it lies within a preset confidence range. The reliability condition is used to describe the category of the target in the image region corresponding to the sample grid.
S405, generating a prediction segmentation result of the sample image based on the screening feature of the sample image and the image feature of the sample image through the image processing model.
The screening features and the image features can both serve as features of the sample image and be used to segment it, yielding the prediction segmentation result. The prediction segmentation result may be a mask map, i.e., different pixel values distinguish target from non-target pixels in the sample image. Because the prediction segmentation result is obtained by processing the screening features, which include the positions of the grids and the category information of the screened key grids, the screening features provide richer and more accurate position and category information during segmentation, improving accuracy, while processing only the screened data reduces the amount of data processed and improves efficiency.
S406, calculating a first difference between a standard segmentation result of the sample image and the prediction segmentation result through the image processing model.
The standard segmentation result may be a correct segmentation result of the sample image. The first difference is used to describe the difference between the prediction segmentation result and the true value.
S407, calculating a second difference between the standard class of the sample image and the class confidence of each sample grid through the image processing model.
The standard class may be the correct classification result of the sample image. The second difference is used to describe the difference between the category confidence and the true value.
S408, adjusting model parameters of the image processing model according to the first difference and the second difference through the image processing model.
Model parameters of the image processing model are adjusted with the aim of reducing the first difference and the second difference. The value of a loss function may be calculated from the first difference and the second difference; when the loss function converges, training of the image processing model is determined to be complete.
Optionally, the standard class of the sample image includes: and target categories exist in the region of the sample image corresponding to the sample grid, wherein the number of the target categories is at least one.
The standard class is a class of the object existing in the region to which the sample grid corresponds. There may be at least one object, and the categories of different objects may be the same or different. For example, categories of objects that exist may be combined to form a standard category.
By determining the standard class of the sample image as at least one of the classes of the targets existing in the region corresponding to the sample grid, the detection granularity of the class can be increased, and the class detection accuracy can be improved.
Optionally, the calculating a second difference between the standard class of the sample image and the class confidence of the sample grid includes: a class confidence of the sample grid and a class cross entropy loss value between at least one target class of the corresponding region of the sample image are calculated.
The difference between the category confidence and the standard category may be represented by a binary cross-entropy loss value (Binary Cross-Entropy loss, BCE loss).
When there are multiple target categories, the binary cross-entropy loss value may be calculated between the category confidence and only one of the target categories, or between the category confidence and multiple target categories.
Illustratively, the categories of the targets present in the region corresponding to the sample grid include category a, category b, and category c. A binary cross-entropy loss value may be calculated between the category confidence and (i1=1, i2=1, i3=1), where i1 corresponds to category a, i2 to category b, and i3 to category c. As another example, category c alone may be used, giving a binary cross-entropy loss between the category confidence and (i1=0, i2=0, i3=1). Categories may also be selected sequentially, a later-selected category overriding an earlier one, with the last-selected category determined as the target category for calculating the binary cross-entropy loss.
By flexibly selecting the categories used to calculate the binary cross-entropy loss value, the detection precision of the categories can be configured flexibly.
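A minimal sketch of the second difference, assuming a multi-hot target vector built from the selected target categories (e.g., categories a and c give (1, 0, 1); selecting category c alone gives (0, 0, 1)):

```python
import torch
import torch.nn.functional as F

def grid_bce_loss(class_probs, target_classes, num_classes=3):
    """Binary cross-entropy between one grid's class probabilities and a
    multi-hot target built from the selected target categories."""
    target = torch.zeros(num_classes)
    target[target_classes] = 1.0  # e.g. categories a and c -> (1, 0, 1)
    return F.binary_cross_entropy(class_probs, target)

probs = torch.tensor([0.8, 0.1, 0.7])              # one grid's class probabilities
print(grid_bce_loss(probs, torch.tensor([0, 2])))  # target (1, 0, 1)
print(grid_bce_loss(probs, torch.tensor([2])))     # category c alone: (0, 0, 1)
```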
Optionally, the training method of the image processing model further includes: performing sample frame detection on the sample image based on the screening features of the sample image and the image features of the sample image through the image processing model; the adjusting, by the image processing model, model parameters of the image processing model according to the first difference and the second difference, including: calculating, by the image processing model, a third difference between a standard frame of the sample image and the sample frame; and adjusting model parameters of the image processing model according to the first difference, the second difference and the third difference through the image processing model.
The image processing model is also used for simultaneously detecting the positioning frame of the sample image. The standard frame is the correct target frame of the target in the sample image; the sample frame is the predicted target frame obtained by the image processing model processing the sample image. The third difference describes the difference between the sample frame and the standard frame.
A third difference may be calculated using GIoU (Generalized Intersection over Union), and the boundaries of the sample frame are supervised based on the third difference, thereby assisting the learning of the target segmentation module. In practice, when the image processing model is deployed, the sample frame detection process can be removed; it only assists the model in learning frame boundaries during training, improving the recognition accuracy of segmentation boundaries.
By detecting the sample frame through the image processing model and calculating the third difference between the sample frame and the corresponding standard frame, the vertical and horizontal boundaries of the object can be perceived, assisting the learning of the target segmentation module.
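A minimal sketch of a GIoU-based third difference between predicted sample frames and standard frames, assuming boxes in (x1, y1, x2, y2) form:

```python
import torch

def giou_loss(pred, gt):
    """GIoU loss between predicted sample frames and standard frames,
    both given as [N, 4] boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = torch.max(pred[:, 0], gt[:, 0]), torch.max(pred[:, 1], gt[:, 1])
    ix2, iy2 = torch.min(pred[:, 2], gt[:, 2]), torch.min(pred[:, 3], gt[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_p + area_g - inter
    iou = inter / union.clamp(min=1e-6)
    # smallest box enclosing both: penalizes non-overlapping predictions
    ex1, ey1 = torch.min(pred[:, 0], gt[:, 0]), torch.min(pred[:, 1], gt[:, 1])
    ex2, ey2 = torch.max(pred[:, 2], gt[:, 2]), torch.max(pred[:, 3], gt[:, 3])
    enclose = ((ex2 - ex1) * (ey2 - ey1)).clamp(min=1e-6)
    giou = iou - (enclose - union) / enclose
    return (1 - giou).mean()
```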
Optionally, the detecting the sample frame of the sample image based on the screening feature of the sample image and the image feature of the sample image includes: and detecting coordinates of a target frame in the sample image according to the screening characteristics of the sample image to obtain a sample frame of the sample image.
By adopting the screening features and the image features of the sample image, the representativeness of key information can be increased and the detection accuracy of the positioning frame improved, further strengthening the supervision of target boundaries and thereby improving the segmentation capability of the image processing model.
Optionally, the training method of the image processing model further includes: adjusting, through the image processing model, the number of sample grids into which the sample image is divided according to the number of target categories, so as to reduce the number of target categories present in the region corresponding to each sample grid; and the gridding of the image features of the sample image through the image processing model includes: gridding the image features of the sample image according to the adjusted grid number through the image processing model.
For detection accuracy, each sample grid should typically contain only one target category. The sample image is divided into sample grids, and for each sample grid the number of target categories it contains is counted. The sample grid size is then adjusted so that, ideally, each grid contains at most one target category. The size of the sample grids is adjusted by adjusting the number of grids into which the sample image is divided: the larger the number, the smaller the size; the smaller the number, the larger the size.
By convolving and gridding the image features of the sample image based on the adjusted grid number, the representation of each sample grid can be enhanced, redundant and interfering information reduced, and the accuracy of image segmentation and classification improved.
According to the technical solution of the present disclosure, the target frame is detected in the image to be processed using the screening features and the image features, which increases the representativeness of key information and improves the detection accuracy of the positioning frame.
Fig. 5 and 6 are scene diagrams of an image processing method according to an embodiment of the present disclosure. The specific method comprises the following steps:
the structure of the whole network is shown in fig. 5. The image processing model adopts a multi-task joint training mode with three branches in total: classification (target classification module), segmentation (target segmentation module), and positioning (coordinate) frame regression. The coordinate frame regression branch and the segmentation branch both locate the boundary of the object and reinforce each other, while the classification branch and the segmentation branch are associated through dynamic convolution kernels and likewise promote each other.
The input of the image processing model is an RGB (red, green, blue) picture. Features are extracted by a backbone network (for example, Swin-Transformer), multi-scale features are then fused by a feature aggregation network (for example, FPN), and the result is output through the three task branches.
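The following is a minimal PyTorch sketch of this three-branch layout, assuming the aggregated FPN feature as input; the class name, channel counts, grid number S, and kernel dimension n_dim are placeholders, and the backbone/FPN themselves are omitted.

```python
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Sketch of the classification / kernel-prediction / box / mask layout."""
    def __init__(self, in_ch=256, n_classes=4, n_dim=64, grid_s=16):
        super().__init__()
        self.grid_s = grid_s
        # convolution tower shared by classification and kernel prediction
        self.shared = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU())
            for _ in range(4)
        ])
        self.cls_out = nn.Conv2d(in_ch, n_classes, 1)   # -> [B, C, S, S]
        self.kernel_out = nn.Conv2d(in_ch, n_dim, 1)    # -> [B, n_dim, S, S]
        self.box_out = nn.Conv2d(in_ch, 4, 1)           # -> [B, 4, S, S]
        self.mask_feat = nn.Conv2d(in_ch, n_dim, 1)     # convolved by dynamic kernels

    def forward(self, feat):
        # feat: aggregated FPN feature of shape [B, in_ch, H/8, W/8]
        grid = nn.functional.interpolate(
            feat, size=(self.grid_s, self.grid_s),
            mode='bilinear', align_corners=False)       # gridding to S x S
        t = self.shared(grid)
        cls = self.cls_out(t).sigmoid()   # class confidence per grid cell
        kernels = self.kernel_out(t)      # dynamic convolution kernels
        boxes = self.box_out(t)           # coordinate frame regression
        masks = self.mask_feat(feat)      # image features for segmentation
        return cls, kernels, boxes, masks
```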
The image processing model adopts a dynamic convolution kernel design: the segmentation result is not produced by a fixed, learned convolution kernel. Instead, the convolution kernel is determined dynamically from the output of a kernel-prediction process and then participates in the convolution operation that generates the segmentation result, hence the name dynamic convolution kernel.
The aggregated image features have shape [Batch, C, H/8, W/8] and are processed by the classification branch. The convolution kernel prediction process and the classification branch share several convolution layers (for example, 4 layers), strengthening their association through feature sharing. The shared convolution layers produce the grid features to be processed.
The classification branch may include: gridding the image features of the image to be processed to obtain the grid features to be processed of at least one grid to be processed; and performing target category prediction on each grid to be processed according to its grid features, obtaining the category confidence of the grid to be processed. The convolution kernel prediction process may include: building on the classification branch, screening the grid features to be processed of each grid to be processed according to its category confidence, obtaining the screening features of the image to be processed.
The output shape of the classification branch is [Batch, S, S, C], where C is the number of categories and S is the grid number, meaning that the feature map is divided into an S×S grid of regions. As shown in FIG. 6, the features of the cells whose category confidence is higher than the preset confidence threshold are gathered into a tensor of shape [Batch, n_dim, P] (P is the number of selected cells): these are the screening features, taken from the kernel-prediction output of shape [Batch, n_dim, S, S] (n_dim is the feature dimension; each small grid cell predicts one feature). The screening features are used as dynamic convolution kernels to convolve the adjusted image features [Batch, n_dim, H, W], giving the target segmentation module output of shape [Batch, P, H, W].
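A minimal PyTorch sketch of this screening and dynamic convolution step for a single image follows; the function name, the 1×1 kernel shape, and the sigmoid activation are assumptions made for the example.

```python
import torch.nn.functional as F

def segment_with_dynamic_kernels(cls, kernels, mask_feat, thr=0.3):
    """Screen grid cells by confidence and use their features as
    dynamic 1x1 convolution kernels (single image, no batch dim).
    cls:       [C, S, S] class confidences
    kernels:   [n_dim, S, S] per-cell kernel predictions
    mask_feat: [n_dim, H, W] adjusted image features"""
    conf, _ = cls.max(dim=0)                  # [S, S] best score per cell
    keep = conf > thr                         # cells above the threshold
    k = kernels[:, keep].t()                  # [P, n_dim] screening features
    weight = k.unsqueeze(-1).unsqueeze(-1)    # [P, n_dim, 1, 1] dynamic kernels
    masks = F.conv2d(mask_feat.unsqueeze(0), weight)  # [1, P, H, W]
    return masks.squeeze(0).sigmoid()         # [P, H, W] mask predictions
```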
During model training, if the center of a target gt (ground truth) falls in a certain grid cell, the sample features [Batch, n_dim, P] corresponding to the sample grids containing targets are extracted; these are the screening features. Convolving them with the adjusted segmentation features [Batch, n_dim, H, W] gives an output of shape [Batch, P, H, W]. The Dice loss between the output predicted segmentation result and the gt is calculated for supervision.
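One common form of the Dice loss (the squared-denominator variant) is sketched below; the function name and the smoothing constant are assumptions of this example.

```python
def dice_loss(pred, gt, eps=1.0):
    """Dice loss between predicted masks and ground truth.
    pred, gt: [P, H, W]; pred in [0, 1], gt binary."""
    p = pred.flatten(1)
    g = gt.flatten(1)
    inter = (p * g).sum(dim=1)
    dice = (2 * inter + eps) / (p.pow(2).sum(dim=1) + g.pow(2).sum(dim=1) + eps)
    return (1 - dice).mean()
```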
For the classification branch, the score of the gt category in each occupied grid cell is 1 and all other scores are 0, which generates a target (label) tensor of shape [Batch, S, S, C]; the BCE loss between this standard category and the category confidence of the classification branch is calculated for supervision.
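A sketch of building that one-hot target and computing the BCE loss; the gt_cells representation (batch index, row, column, category index) is an assumption of this example.

```python
import torch
import torch.nn.functional as F

def classification_loss(cls_pred, gt_cells):
    """cls_pred: [B, S, S, C] confidences in [0, 1].
    gt_cells: iterable of (batch, row, col, category) tuples for grid
    cells in which a gt center falls."""
    target = torch.zeros_like(cls_pred)       # all-zero [B, S, S, C] label
    for b, i, j, c in gt_cells:
        target[b, i, j, c] = 1.0              # gt category scored 1
    return F.binary_cross_entropy(cls_pred, target)
```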
For the coordinate frame positioning branch, either the image features are convolved separately or the screening features are convolved, and the GIoU loss between the sample frame and the standard frame is calculated for supervision, so that the model better perceives the transverse and longitudinal boundaries of the object and the learning of the segmentation branch is assisted.
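Composing the three loss sketches above, the joint objective could be combined as follows; the loss weights and the tensor names (pred_masks, cls_pred, pred_boxes and their gt counterparts, assumed to come from the sketches above) are illustrative only.

```python
# assumed loss weights for the segmentation, classification, and box branches
w_seg, w_cls, w_box = 3.0, 1.0, 1.0

total_loss = (w_seg * dice_loss(pred_masks, gt_masks)
              + w_cls * classification_loss(cls_pred, gt_cells)
              + w_box * giou_loss(pred_boxes, gt_boxes))
total_loss.backward()  # adjust model parameters from all three differences
```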
In a deployed application, the coordinate regression branch can be removed, since it provides additional supervision information only during training to promote the learning of the segmentation branch; only the classification and segmentation branches then need to be output. During model inference, if the category score of a certain grid cell in the [S, S, C] output is higher than the confidence threshold (0.3 by default), the mask at the corresponding index of the segmentation branch output is taken as the segmentation map. To support different brands and violation-scale judgment, C = N×2 can be set, where N is the number of brands, and each brand additionally distinguishes a single vehicle from a group.
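An inference sketch for one image, reusing segment_with_dynamic_kernels from above; the output dictionary layout, the brands list, and the 0.5 binarization are assumptions, with C = N×2 encoding (brand, single/group) as described.

```python
def infer(cls, kernels, mask_feat, brands, thr=0.3):
    """cls: [C, S, S] with C = len(brands) * 2; kernels: [n_dim, S, S];
    mask_feat: [n_dim, H, W]. Returns brand / scale / mask per detection."""
    scores, classes = cls.max(dim=0)          # [S, S] best score and category
    keep = scores > thr
    masks = segment_with_dynamic_kernels(cls, kernels, mask_feat, thr)
    results = []
    for mask, c in zip(masks, classes[keep].tolist()):
        results.append({
            'brand': brands[c // 2],                  # which operator
            'scale': 'group' if c % 2 else 'single',  # violation scale
            'mask': mask > 0.5,                       # binary segmentation
        })
    return results
```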
In addition, the coordinate regression branch can also be trained separately and then connected in series with the image processing model for joint training.
Existing schemes only perform position detection of shared bicycles or electric bicycles. However, accurate supervision of shared bicycles requires accurate algorithmic recognition: for example, bicycles of different brands must be routed to the processing flows of different operators, and different numbers of bicycles correspond to illegal-parking events of different scales, so information such as brand and number needs to be output together with the position.
The embodiment of the disclosure enriches the functional dimensions of the product: besides the basic recognition of shared bicycles and shared electric bicycles, it adds brand recognition and scale judgment. Moreover, with these multiple functions, the joint training scheme lets strongly correlated tasks promote one another, improving the recognition effect.
Fig. 7 is a block diagram of an image processing apparatus according to an embodiment of the present disclosure, applicable to the case of segmenting and classifying target objects in an image. The apparatus is implemented in software and/or hardware and is specifically configured in an electronic device with a certain data computing capability.
An image processing apparatus 700 as shown in fig. 7 includes: a feature extraction module 701, a target classification module 702, and a target segmentation module 703. Wherein,
the feature extraction module 701 is configured to perform feature extraction on an image to be processed to obtain an image feature of the image to be processed, where a size of the image feature of the image to be processed is smaller than a size of the image to be processed;
the object classification module 702 is configured to gridde the image features of the image to be processed to obtain a grid feature to be processed of at least one grid to be processed;
the target classification module 702 is configured to predict a target class of the to-be-processed grid according to the to-be-processed grid characteristics of the to-be-processed grid, so as to obtain a class confidence coefficient of the to-be-processed grid;
the target segmentation module 703 is configured to screen the feature of the to-be-processed grid of each to-be-processed grid according to the category confidence coefficient of each to-be-processed grid, so as to obtain a screened feature of the to-be-processed image;
the object segmentation module 703 is configured to generate an image segmentation result of the image to be processed based on the filtering feature of the image to be processed and the image feature of the image to be processed.
According to this technical scheme, features are extracted from the image to be processed to obtain image features, and the image features are gridded to form the grid features to be processed of a plurality of grids to be processed. Category prediction is performed on the grids to be processed, and the grid features to be processed are screened according to the category confidence to obtain the screening features. The screening features and the image features are then processed to generate the image segmentation result of the image to be processed. In the image segmentation process, the position sensitivity and target sensitivity of the detection head are increased, improving segmentation accuracy; meanwhile, because processing is based on the screened data, the amount of processed data is reduced and the image segmentation efficiency is improved.
Further, the object segmentation module includes: the grid screening unit is used for screening target grids to be processed with targets according to the category confidence level of each grid to be processed; and the feature screening unit is used for combining the grid features to be processed of each target grid to be processed to generate screening features of the images to be processed.
Further, the target classification module includes: the category prediction unit, which is used for performing target category prediction on the grid to be processed according to its grid features to be processed, obtaining the probability of at least one category to which the grid to be processed belongs; and the confidence determining unit, which is used for selecting the highest probability among the probabilities of the at least one category to which the grid to be processed belongs and determining it as the category confidence of the grid to be processed.
Further, the object segmentation module includes: the size adjustment unit is used for carrying out convolution and size adjustment on the image characteristics of the image to be processed to obtain adjusted image characteristics of the image to be processed, and the size of the adjusted image characteristics of the image to be processed is the same as the size of the image to be processed; the dynamic convolution unit is used for determining the screening feature of the image to be processed as a convolution kernel, convolving the adjusted image feature of the image to be processed, generating a mask segmentation result in the image to be processed, and determining the mask segmentation result as an image segmentation result of the image to be processed.
Further, the object classification module includes: the characteristic interpolation unit is used for interpolating the image characteristics of the image to be processed to obtain the interpolation characteristics of the image to be processed; and the grid dividing unit is used for carrying out grid division on the interpolation characteristics of the image to be processed to obtain the grid characteristics to be processed of at least one grid to be processed.
Further, the category confidence includes a brand detection result of the shared non-motor vehicle; the image processing apparatus further includes: the violation detection module, which is used for detecting, after the image segmentation result of the image to be processed is generated, whether the shared non-motor vehicle is illegally parked according to the segmentation result of the shared non-motor vehicle in the image to be processed and the brand detection result of the shared non-motor vehicle.
Further, the brand detection result of the shared non-motor vehicle includes yes or no; or the scale detection result for the brand of the shared non-motor vehicle includes a single vehicle or a vehicle group.
Further, the image processing apparatus further includes: and the target frame detection module is used for detecting the target frame of the image to be processed based on the screening characteristics of the image to be processed and the image characteristics of the image to be processed.
Further, the feature extraction module includes: the multi-scale feature extraction unit, which is used for performing multi-scale feature extraction on the image to be processed to obtain features of multiple sizes; and the multi-scale feature fusion unit, which is used for fusing the features of multiple sizes to obtain the image features of the image to be processed.
Further, the image processing device is configured in the image processing model.
The image processing device can execute the image processing method provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of executing the image processing method.
Fig. 8 is a block diagram of a training apparatus of an image processing model according to an embodiment of the present disclosure, applicable to the case of training an image processing model that segments and classifies target objects in an image. The apparatus is implemented in software and/or hardware and is specifically configured in an electronic device with a certain data computing capability. The image processing model is configured with an image processing apparatus according to any embodiment of the present disclosure.
An image processing model training apparatus 800 as shown in fig. 8, comprising: image processing model 801. Wherein,
the image processing model is used for extracting the characteristics of a sample image to obtain the image characteristics of the sample image, and the size of the image characteristics of the sample image is smaller than that of the sample image; gridding the image features of the sample image to obtain sample grid features of at least one sample grid; according to sample grid characteristics of the sample grid, carrying out target category prediction on the sample grid to obtain category confidence of the sample grid; screening sample grid features of each sample grid according to the category confidence of each sample grid to obtain screening features of the sample image; generating a prediction segmentation result of the sample image based on the screening features of the sample image and the image features of the sample image; calculating a first difference between a standard segmentation result and the predicted segmentation result of the sample image; calculating a second difference between the standard class of the sample image and the class confidence of each of the sample grids; and adjusting model parameters of the image processing model according to the first difference and the second difference.
Further, the standard class of the sample image includes: and target categories exist in the region of the sample image corresponding to the sample grid, wherein the number of the target categories is at least one.
Further, the image processing model is further configured to: a class confidence of the sample grid and a class cross entropy loss value between at least one target class of the corresponding region of the sample image are calculated.
Further, the image processing model is configured to perform sample frame detection on the sample image based on the screening feature of the sample image and the image feature of the sample image; the image processing model is used for calculating a third difference between a standard frame of the sample image and the sample frame; the image processing model is used for adjusting model parameters of the image processing model according to the first difference, the second difference and the third difference.
Further, the image processing model is configured to detect coordinates of a target frame in the sample image according to the screening feature of the sample image, so as to obtain a sample frame of the sample image.
Further, the image processing model is configured to adjust, according to the number of target categories, the number of grids of the sample image including sample grids, so as to reduce the number of target categories existing in a region corresponding to the sample grids in the sample image; the image processing model is used for gridding the image characteristics of the sample image according to the adjusted grid quantity.
The training device for the image processing model can execute the training method for the image processing model provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of executing the training method for the image processing model.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the user all conform to the provisions of relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, for example, an image processing method or a training method of an image processing model. For example, in some embodiments, the image processing method or the training method of the image processing model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the image processing method or the training method of the image processing model described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the image processing method or the training method of the image processing model in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service expansibility in traditional physical host and VPS (Virtual Private Server) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
Artificial intelligence is the discipline of studying the process of making a computer mimic certain mental processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning, etc.) of a person, both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; the artificial intelligent software technology mainly comprises a computer vision technology, a voice recognition technology, a natural language processing technology, a machine learning/deep learning technology, a big data processing technology, a knowledge graph technology and the like.
Cloud computing refers to a technical system in which an elastically extensible pool of shared physical or virtual resources is accessed through a network, where resources can include servers, operating systems, networks, software, applications, storage devices and the like, and can be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capability for artificial intelligence, blockchain, and other technical applications, as well as for model training.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions provided by the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (35)

1. An image processing method, comprising:
extracting features of an image to be processed to obtain image features of the image to be processed, wherein the size of the image features of the image to be processed is smaller than that of the image to be processed;
gridding the image features of the image to be processed to obtain the grid features to be processed of at least one grid to be processed;
according to the grid characteristics to be processed of the grid to be processed, carrying out target category prediction on the grid to be processed to obtain category confidence of the grid to be processed;
screening the grid characteristics to be processed of each grid to be processed according to the category confidence degree of each grid to be processed to obtain screening characteristics of the images to be processed;
and generating an image segmentation result of the image to be processed based on the screening feature of the image to be processed and the image feature of the image to be processed.
2. The method of claim 1, wherein the screening the feature of each of the grids to be processed according to the category confidence of each of the grids to be processed to obtain the screened feature of the image to be processed comprises:
screening target grids to be processed with targets according to the category confidence of each grid to be processed;
and combining the grid characteristics to be processed of each target grid to be processed to generate screening characteristics of the images to be processed.
3. The method according to claim 1 or 2, wherein the performing target category prediction on the grid to be processed according to the grid features to be processed of the grid to be processed to obtain the category confidence of the grid to be processed includes:
according to the characteristics of the to-be-processed grid, carrying out target category prediction on the to-be-processed grid to obtain the probability of at least one category to which the to-be-processed grid belongs;
and selecting the highest probability from the probabilities of at least one category to which the grid to be processed belongs, and determining the highest probability as the category confidence of the grid to be processed.
4. The method of claim 1, wherein the generating the image segmentation result of the image to be processed based on the screening feature of the image to be processed and the image feature of the image to be processed comprises:
Convolving and resizing the image features of the image to be processed to obtain adjusted image features of the image to be processed, wherein the size of the adjusted image features of the image to be processed is the same as the size of the image to be processed;
and determining the screening characteristics of the image to be processed as a convolution kernel, convolving the adjusted image characteristics of the image to be processed, generating a mask segmentation result in the image to be processed, and determining the mask segmentation result as an image segmentation result of the image to be processed.
5. The method of claim 1, wherein the gridding the image features of the image to be processed to obtain the grid features to be processed of at least one grid to be processed, comprises:
interpolation is carried out on the image characteristics of the image to be processed, so that the interpolation characteristics of the image to be processed are obtained;
and carrying out grid division on the interpolation characteristics of the image to be processed to obtain the grid characteristics to be processed of at least one grid to be processed.
6. The method of claim 1, wherein the category confidence comprises a shared non-motor vehicle brand detection result;
after generating the image segmentation result of the image to be processed, the method further comprises the following steps:
And detecting whether the shared non-motor vehicle is illegally parked or not according to the segmentation result of the shared non-motor vehicle and the brand detection result of the shared non-motor vehicle in the image to be processed.
7. The method of claim 6, wherein the brand detection result of the shared non-motor vehicle comprises yes or no; or the scale detection result for the brand of the shared non-motor vehicle comprises a single vehicle or a vehicle group.
8. The method of claim 1, further comprising:
and detecting a target frame of the image to be processed based on the screening characteristics of the image to be processed and the image characteristics of the image to be processed.
9. The method of claim 1, wherein the feature extraction of the image to be processed to obtain the image features of the image to be processed comprises:
carrying out multi-scale feature extraction on the image to be processed to obtain features with multiple sizes;
and fusing the features with the multiple sizes to obtain the image features of the image to be processed.
10. The method of claim 1, wherein the image processing method is implemented by a pre-trained image processing model.
11. A training method of an image processing model, wherein the image processing model implements the image processing method according to any one of claims 1 to 10, the method comprising:
Extracting features of a sample image through the image processing model to obtain image features of the sample image, wherein the size of the image features of the sample image is smaller than that of the sample image;
gridding the image features of the sample image through the image processing model to obtain sample grid features of at least one sample grid;
performing target category prediction on the sample grid according to sample grid characteristics of the sample grid through the image processing model to obtain category confidence of the sample grid;
screening sample grid features of each sample grid according to the category confidence coefficient of each sample grid through the image processing model to obtain screening features of the sample images;
generating a prediction segmentation result of the sample image based on the screening feature of the sample image and the image feature of the sample image through the image processing model;
calculating, by the image processing model, a first difference between a standard segmentation result and the predicted segmentation result of the sample image;
calculating, by the image processing model, a second difference between the standard class of the sample image and the class confidence of each of the sample grids;
And adjusting model parameters of the image processing model according to the first difference and the second difference through the image processing model.
12. The method of claim 11, wherein the standard class of sample images comprises: and target categories exist in the region of the sample image corresponding to the sample grid, wherein the number of the target categories is at least one.
13. The method of claim 12, wherein the calculating a second difference between the standard class of the sample image and the class confidence of the sample grid comprises:
a class confidence of the sample grid and a class cross entropy loss value between at least one target class of the corresponding region of the sample image are calculated.
14. The method of claim 13, further comprising:
performing sample frame detection on the sample image based on the screening features of the sample image and the image features of the sample image through the image processing model;
the adjusting, by the image processing model, model parameters of the image processing model according to the first difference and the second difference, including:
Calculating, by the image processing model, a third difference between a standard frame of the sample image and the sample frame;
and adjusting model parameters of the image processing model according to the first difference, the second difference and the third difference through the image processing model.
15. The method of claim 14, wherein the sample frame detecting the sample image based on the screening features of the sample image and the image features of the sample image comprises:
and detecting coordinates of a target frame in the sample image according to the screening characteristics of the sample image to obtain a sample frame of the sample image.
16. The method of claim 11, further comprising:
adjusting the grid number of the sample image including sample grids according to the number of the target categories through the image processing model so as to reduce the number of the target categories existing in the region corresponding to the sample grids in the sample image;
the gridding the image features of the sample image by the image processing model comprises:
and gridding the image characteristics of the sample image according to the adjusted grid quantity through the image processing model.
17. An image processing apparatus comprising:
the feature extraction module is used for extracting features of the image to be processed to obtain image features of the image to be processed, and the size of the image features of the image to be processed is smaller than that of the image to be processed;
the target classification module is used for gridding the image characteristics of the image to be processed to obtain the grid characteristics to be processed of at least one grid to be processed;
the target classification module is used for predicting the target category of the grid to be processed according to the grid characteristics to be processed of the grid to be processed, so as to obtain the category confidence coefficient of the grid to be processed;
the target segmentation module is used for screening the grid characteristics to be processed of each grid to be processed according to the category confidence coefficient of each grid to be processed to obtain screening characteristics of the images to be processed;
the target segmentation module is used for generating an image segmentation result of the image to be processed based on the screening feature of the image to be processed and the image feature of the image to be processed.
18. The apparatus of claim 17, the target segmentation module comprising:
the grid screening unit is used for screening target grids to be processed with targets according to the category confidence level of each grid to be processed;
And the feature screening unit is used for combining the grid features to be processed of each target grid to be processed to generate screening features of the images to be processed.
19. The apparatus of claim 17 or 18, wherein the target classification module comprises:
the class prediction unit is used for predicting the target class of the to-be-processed grid according to the to-be-processed grid characteristics of the to-be-processed grid, so as to obtain the probability of at least one class to which the to-be-processed grid belongs;
the confidence determining unit is used for selecting the highest probability from the probabilities of at least one category to which the grid to be processed belongs, and determining the highest probability as the category confidence of the grid to be processed.
20. The apparatus of claim 17, wherein the target segmentation module comprises:
the size adjustment unit is used for carrying out convolution and size adjustment on the image characteristics of the image to be processed to obtain adjusted image characteristics of the image to be processed, and the size of the adjusted image characteristics of the image to be processed is the same as the size of the image to be processed;
the dynamic convolution unit is used for determining the screening feature of the image to be processed as a convolution kernel, convolving the adjusted image feature of the image to be processed, generating a mask segmentation result in the image to be processed, and determining the mask segmentation result as an image segmentation result of the image to be processed.
21. The apparatus of claim 17, wherein the object classification module comprises:
the characteristic interpolation unit is used for interpolating the image characteristics of the image to be processed to obtain the interpolation characteristics of the image to be processed;
and the grid dividing unit is used for carrying out grid division on the interpolation characteristics of the image to be processed to obtain the grid characteristics to be processed of at least one grid to be processed.
22. The apparatus of claim 17, wherein the category confidence comprises a shared non-motor vehicle brand detection result;
the apparatus further comprises:
and the violation detection module is used for detecting whether the shared non-motor vehicle is illegally parked or not according to the segmentation result of the shared non-motor vehicle in the image to be processed and the brand detection result of the shared non-motor vehicle after the image segmentation result of the image to be processed is generated.
The apparatus further comprises:
and the target frame detection module is used for detecting the target frame of the image to be processed based on the screening characteristics of the image to be processed and the image characteristics of the image to be processed.
23. The apparatus of claim 22, wherein the brand detection result of the shared non-motor vehicle comprises yes or no; or the scale detection result for the brand of the shared non-motor vehicle comprises a single vehicle or a vehicle group.
24. The apparatus of claim 17, further comprising:
and the target frame detection module is used for detecting the target frame of the image to be processed based on the screening characteristics of the image to be processed and the image characteristics of the image to be processed.
25. The apparatus of claim 17, wherein the feature extraction module comprises:
the multi-scale feature extraction unit is used for carrying out multi-scale feature extraction on the image to be processed to obtain features with multiple sizes;
and the multi-scale feature fusion unit is used for fusing the features with the multiple sizes to obtain the image features of the image to be processed.
26. The apparatus of claim 17, wherein the image processing apparatus is configured in the image processing model.
27. A training apparatus for an image processing model, comprising: an image processing model configured with the image processing apparatus as claimed in any one of claims 17 to 26;
the image processing model is used for extracting features of a sample image to obtain image features of the sample image, and the size of the image features of the sample image is smaller than that of the sample image;
the image processing model is used for gridding the image characteristics of the sample image to obtain sample grid characteristics of at least one sample grid;
The image processing model is used for predicting the target category of the sample grid according to the sample grid characteristics of the sample grid to obtain the category confidence of the sample grid;
the image processing model is used for screening sample grid characteristics of each sample grid according to the category confidence coefficient of each sample grid to obtain screening characteristics of the sample image;
the image processing model is used for generating a prediction segmentation result of the sample image based on the screening feature of the sample image and the image feature of the sample image;
the image processing model is used for calculating a first difference between a standard segmentation result and the prediction segmentation result of the sample image;
the image processing model is used for calculating a second difference between the standard class of the sample image and the class confidence of each sample grid;
the image processing model is used for adjusting model parameters of the image processing model according to the first difference and the second difference.
28. The apparatus of claim 27, wherein the standard class of sample images comprises: and target categories exist in the region of the sample image corresponding to the sample grid, wherein the number of the target categories is at least one.
29. The apparatus of claim 28, wherein the image processing model is further configured to:
a class confidence of the sample grid and a class cross entropy loss value between at least one target class of the corresponding region of the sample image are calculated.
30. The apparatus of claim 29, further comprising:
the image processing model is used for detecting a sample frame of the sample image based on the screening characteristics of the sample image and the image characteristics of the sample image;
the image processing model is used for calculating a third difference between a standard frame of the sample image and the sample frame;
the image processing model is used for adjusting model parameters of the image processing model according to the first difference, the second difference and the third difference.
31. The apparatus of claim 30, further comprising:
and the image processing model is used for detecting coordinates of a target frame in the sample image according to the screening characteristics of the sample image to obtain a sample frame of the sample image.
32. The apparatus of claim 27, further comprising:
the image processing model is used for adjusting the grid number of the sample grids in the sample image according to the number of the target categories so as to reduce the number of the target categories in the region corresponding to the sample grids in the sample image;
The image processing model is used for gridding the image characteristics of the sample image according to the adjusted grid quantity.
33. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method of any one of claims 1-10 or the training method of the image processing model of any one of claims 11-16.
34. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the image processing method according to any one of claims 1-10 or the training method of the image processing model according to any one of claims 11-16.
35. A computer program product comprising a computer program which, when executed by a processor, implements the image processing method according to any one of claims 1-10 or the training method of the image processing model according to any one of claims 11-16.
CN202310780968.XA 2023-06-28 2023-06-28 Image processing or training method, device, equipment and medium of image processing model Pending CN117078997A (en)

Priority Applications (1)

Application Number: CN202310780968.XA — Priority/Filing Date: 2023-06-28 — Title: Image processing or training method, device, equipment and medium of image processing model

Publications (1)

Publication Number: CN117078997A — Publication Date: 2023-11-17

Family

ID=88705023

Family Applications (1)

Application Number: CN202310780968.XA — Title: Image processing or training method, device, equipment and medium of image processing model

Country Status (1)

CN: CN117078997A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination