CN117237596B - Image recognition method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN117237596B (application CN202311515064.0A)
- Authority
- CN
- China
- Prior art keywords
- target area
- information
- position information
- pixel value
- area
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The application relates to an image recognition method, an image recognition device, computer equipment and a storage medium. A neural network model obtained through training first locates the target areas in image information based on the key point position information it outputs. After the target areas are located, flood-fill expansion is performed simultaneously from the obtained key point position information within each target area and from arbitrarily set starting position information in the background area, yielding the boundaries between the target areas and between each target area and the background area. With the scheme provided by the application, the actual sizes of overlapping target areas can be identified by combining key point detection with flood filling, thereby improving the efficiency of image recognition.
Description
Technical Field
The present invention relates to the field of image processing, and in particular, to an image recognition method, apparatus, computer device, and storage medium.
Background
Image segmentation is one of the most popular techniques in computer vision and has changed tremendously over the past decade. Today it is widely used in image classification, face recognition, object detection, video analysis, and image processing for robots and self-driving cars.
In industrial production it is often necessary to locate the target areas in an image and count them, a problem that instance segmentation, a branch of image segmentation, solves well. However, when conventional instance segmentation is used to locate target areas, overlapping target areas cannot be positioned accurately in the image, so their areas cannot be calculated, resulting in low image recognition efficiency.
Disclosure of Invention
In view of the above, the present application aims to provide an image recognition method that locates overlapping target areas in an image by inputting the image information into a trained neural network and then identifies each target area by combining the result with flood filling, thereby improving image recognition efficiency.
In one embodiment, the present application provides an image recognition method, including:
Acquiring image information; the image information comprises a plurality of target areas, and the target areas are overlapped;
Inputting the image information into a trained neural network model, and outputting to obtain the position information of key points of each target area;
And expanding simultaneously, with each target area as a positive sample and the background area as a negative sample, from expansion starting points given by the key point position information determined for each target area and by starting position information arbitrarily set in the background area, and forming boundary information at the intersections between target areas where the pixel values change and between the target areas and the background area, so as to identify each target area in the image information.
Compared with the prior art, the image recognition method provided by the application first locates the target areas in the image information based on the key point position information output by the neural network model obtained through training. After the target areas are located, flood filling expands each target area from its obtained key point position information and expands the background area from arbitrary starting position information, yielding the boundaries between the target areas and between each target area and the background area. For overlapping target areas, the scheme of the application can therefore identify the actual size of each area by combining key point detection with flood filling, thereby improving the efficiency of image recognition.
Further, training the initial neural network model to obtain a trained neural network model, which specifically comprises the following steps:
acquiring first image information, wherein the first image information comprises a plurality of target areas, and the target areas are overlapped;
Labeling the position information of the key points in each target area in the first image information to obtain second image information; the second image information is marked with the position information of key points of each target area;
And inputting the second image information into an initial neural network model for training to obtain a trained neural network model.
Further, inputting the second image information into an initial neural network model for training, and obtaining a trained neural network model includes:
extracting features of the second image information to obtain image feature information of the second image information;
calculating the image characteristic information based on a multi-layer network to obtain the key point position information of each target area in the second image information;
Outputting the key point position information of each target area;
Calculating, with a loss function, the loss between the output key point position information and the actual key point position information of each target area; if the loss is larger than a threshold, back-propagating the loss to the initial neural network model to update its network parameters and calculating the contribution of each network parameter to the loss; and continuing to correct the network parameters, executing the above steps in a loop to train the initial neural network until the loss is smaller than the threshold, thereby obtaining the trained neural network.
Further, the step of expanding simultaneously, with each target area as a positive sample and the background area as a negative sample, from the key point position information determined for each target area and the starting position information arbitrarily set in the background area as expansion starting points, and forming boundary information at the intersections between target areas where the pixel values change and between the target areas and the background area, so as to identify each target area in the image information, includes:
acquiring pixel values of target areas where the position information of the key points is located, and taking the pixel value corresponding to the target area with the lowest pixel value as a first pixel value;
The key point position information of the target area where the first pixel value is located is used as a first expansion starting point to expand outwards until the expanded pixel value reaches a second pixel value of a target area adjacent to the target area where the current key point position information is located, and first boundary information is formed at the intersection of the current target area and the adjacent target area where the second pixel value is located;
continuing to expand outwards by taking the key point position information in the target area where the second pixel value is positioned as a new first expansion starting point until forming a plurality of first boundary information among all target areas is completed;
Acquiring a third pixel value of a background area, using starting position information arbitrarily set in the background area as a second expansion starting point to expand outwards until the expanded fourth pixel value reaches a first pixel value corresponding to each target area, and forming a plurality of second boundary information at the intersection of the background area and all the target areas;
And identifying and obtaining the target area where the position information of each key point is located, based on the plurality of first boundary information and the plurality of second boundary information.
Further, identifying, based on the plurality of first boundary information and the plurality of second boundary information, the target area in which each piece of key point position information is located includes:
And obtaining the area of each target area based on the number of pixels in the identified target area where each piece of key point position information is located.
In another embodiment, the present application further provides an image recognition apparatus, including:
The acquisition module is used for acquiring image information; the image information comprises a plurality of target areas, and the target areas are overlapped;
The output module is used for inputting the image information into the trained neural network model and outputting the key point position information of each target area;
The identification module is used for expanding simultaneously, with each target area as a positive sample and the background area as a negative sample, from the key point position information determined for each target area and the starting position information arbitrarily set in the background area as expansion starting points, and forming boundary information at the intersections between target areas where the pixel values change and between the target areas and the background area, so as to identify each target area in the image information.
Further, the identification module includes:
the first acquisition unit is used for acquiring pixel values of target areas where the position information of the key points is located, and taking the pixel value corresponding to the target area with the lowest pixel value as a first pixel value;
the first expansion unit is used for taking the key point position information of the target area where the first pixel value is located as a first expansion starting point to expand outwards until the expanded pixel value reaches a second pixel value of the target area adjacent to the target area where the current key point position information is located, and first boundary information is formed at the intersection of the current target area and the adjacent target area where the second pixel value is located;
a second expansion unit, configured to continue to expand outwards with the keypoint location information in the target area where the second pixel value is located as a new first expansion starting point until forming a plurality of first boundary information between all target areas is completed;
A second obtaining unit, configured to obtain a third pixel value of a background area, and take the starting position information arbitrarily set in the background area as a second expansion starting point to expand outwards until the fourth pixel value after expansion reaches a first pixel value corresponding to each target area, where the background area meets all the target areas to form a plurality of second boundary information;
And the identification unit is used for identifying and obtaining the target area where the position information of each key point is located based on the plurality of first boundary information and the second boundary information.
In another embodiment, the present application also provides a computer device comprising:
a processor;
A memory for storing a computer program for execution by the processor; wherein the processor implements the image recognition method described in the above embodiments when executing the computer program.
In another embodiment, the present application also provides a storage medium having stored thereon a computer program which, when executed, implements the image recognition method described in the above embodiment.
For a clearer understanding of the present invention, specific embodiments of the invention will be set forth in the following description taken in conjunction with the accompanying drawings.
Drawings
FIG. 1 is a flowchart of an image recognition method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating steps of an image recognition method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an image recognition device according to an embodiment of the present application.
Detailed Description
As described in the prior art, image instance segmentation is generally used to locate image areas in industrial production. For a product image, however, multiple target areas may stick together and overlap, and existing instance segmentation techniques cannot locate and identify them accurately.
Based on the above-mentioned problems, please refer to fig. 1. In one embodiment, the present application provides an image recognition method, including:
step S1: acquiring image information; the image information comprises a plurality of target areas, and the target areas are overlapped;
Specifically, an image acquisition device may be employed to acquire the image information. The image information may be product image information from industrial production, containing a plurality of overlapping target areas. By acquiring product image information that includes a plurality of overlapping target areas and inputting it into the neural network model trained as described later, the key point position information of each target area in the product image information can be output directly.
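As a minimal sketch of this acquisition step, the image may be loaded as a grayscale array; the file name below is purely hypothetical:

```python
# Minimal sketch of the acquisition step S1 (file name is an assumption).
import cv2

image = cv2.imread("product.png", cv2.IMREAD_GRAYSCALE)
if image is None:
    raise FileNotFoundError("product.png not found")
# image is expected to contain several mutually overlapping target areas.
```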
Step S2: inputting the image information into a trained neural network model, and outputting to obtain the position information of key points of each target area;
In this embodiment, an initial neural network model is trained on acquired sample image information: sample image information containing a plurality of overlapping target areas is acquired, and each target area to be identified is labeled with its corresponding key point position information. Specifically, the key point position information in each target area can be labeled manually. After image information labeled with the key point position information of each target area is obtained, it is input into the initial neural network model for training, so that the model learns the key point position information of all the target areas; the initial neural network model is thereby trained into the trained neural network model.
Specifically, in the step S2, the training is performed on the initial neural network model to obtain a trained neural network model, which specifically includes the following steps:
Step S21: acquiring first image information, wherein the first image information comprises a plurality of target areas, and the target areas are overlapped;
Step S22: labeling the position information of the key points in each target area in the first image information to obtain second image information; the second image information is marked with the position information of key points of each target area;
Specifically, when the initial neural network model is trained, the key point position information of each target area in the acquired first image information is labeled manually. It can be understood that the key point position information of a target area is the position information of its center point; that is, the center point of each area is labeled manually on the image information. Because the key point position information is labeled manually during training, the initial neural network model learns it, and the trained neural network model can then automatically identify and output the key point position information of the target areas in the image information. In this embodiment, the center-point position information of an image area is identified by combining manual labeling with neural network training; preferably, the neural network model is a convolutional neural network (CNN) model. In practice, other machine vision methods may be used to identify the center point of an image area, and different methods may be chosen for different data, which is not limited in this embodiment.
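The patent does not fix a network architecture, so the following is only a hedged sketch of one plausible CNN that regresses a center-point heatmap per image; every layer size here is an assumption:

```python
# Hypothetical key point CNN: every architectural choice is an assumption.
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Feature extraction over the labeled image (step S231).
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        # Per-pixel key point score (steps S232-S233).
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        # Returns a heatmap whose peaks are the predicted center points
        # of the target areas.
        return torch.sigmoid(self.head(self.features(x)))
```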
Step S23: and inputting the second image information into an initial neural network model for training to obtain a trained neural network model.
Specifically, after image information in which a plurality of target areas overlap is collected and key point position information is labeled manually in each target area to be identified, the image information is input into the initial neural network model for training, so that the model learns the key point position information of each target area in the image; the initial neural network model is thereby trained into the trained neural network model. Afterwards, once image information containing a plurality of overlapping target areas is acquired, it is input directly into the trained neural network model, which outputs the key point position information in each target area.
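A hedged inference sketch for this step, assuming the hypothetical KeypointNet above and an illustrative score threshold, might read key points off the heatmap as follows:

```python
# Inference sketch (step S2); the 0.5 threshold is an assumption.
import numpy as np
import torch

def detect_keypoints(model, image, thresh=0.5):
    # image: HxW uint8 grayscale array; normalize and add batch/channel dims.
    x = torch.from_numpy(image).float().div(255).unsqueeze(0).unsqueeze(0)
    with torch.no_grad():
        heatmap = model(x)[0, 0].numpy()
    # Keep above-threshold pixels as candidates; a real system would apply
    # non-maximum suppression so that one peak remains per target area.
    ys, xs = np.where(heatmap > thresh)
    return list(zip(xs.tolist(), ys.tolist()))  # (x, y) key point positions
```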
Further, the step S23 includes:
Step S231: extracting features of the second image information to obtain image feature information of the second image information;
Step S232: calculating the image characteristic information based on a multi-layer network to obtain the key point position information of each target area in the second image information;
Step S233: outputting the key point position information of each target area;
Step S234: calculating the loss of the key point position information and the actual key point position information of each target area by using a loss function, if the loss is larger than a threshold value, updating the network parameters by the network parameters which are transmitted back to the initial neural network model, and calculating the contribution degree of each network parameter to the loss; continuously correcting network parameters, and circularly executing the steps S231-S234 to train the initial neural network until the loss is smaller than a threshold value to obtain the trained neural network. Wherein the threshold value may be determined from different data sets.
Specifically, the loss function measures the deviation between the model prediction and the actual result and can be used to update the model parameters. In this embodiment, the loss function compares the key point position information output by the model for each target area with the actual key point position information. When the loss meets the stop condition, the model has learned the key point position information; otherwise, the network parameters are updated from the computed loss and training of the initial neural network continues until it completes. The loss function may be calculated as a cross-entropy loss.
In this embodiment, during training of the initial neural network model, the key point position information of each target area in the first image information is labeled manually to obtain the second image information, and the second image information is input into the initial neural network model. The model extracts image features from the second image information to obtain an intermediate result, and a multi-layer network operates on each intermediate result so that the model learns and outputs the key point position information of each target area in the image information. After the model outputs the key point position information of each target area, the loss between the output and the actual key point position information is calculated; if the loss is large, it is propagated back to update the network parameters of the initial neural network model, and the parameters are updated continually until the loss becomes smaller and smaller and the stop condition is met, yielding the trained neural network model. When the trained neural network model is used, image information containing a plurality of overlapping target areas is input into it, and the key point position information corresponding to each target area is output directly.
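The following is a minimal, hedged sketch of this training loop, assuming the KeypointNet sketched above, a DataLoader yielding (image, heatmap) pairs, binary cross-entropy as the loss, and an illustrative stop threshold; none of these choices is fixed by the application:

```python
# Training loop sketch for steps S231-S234; hyperparameters are assumptions.
import torch

def train(model, loader, threshold=1e-3, lr=1e-3, max_epochs=100):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.BCELoss()  # loss between predicted and labeled heatmaps
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for images, target_heatmaps in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), target_heatmaps)
            loss.backward()    # back-propagate: contribution of each parameter
            optimizer.step()   # correct the network parameters
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < threshold:  # stop condition (step S234)
            break
    return model
```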
Step S3: and simultaneously expanding by taking each target area as a positive sample, taking the background area as a negative sample, taking the key point position information determined by each target area and the starting position information arbitrarily set by the background area as expansion starting points, and respectively forming boundary information at the intersection between each target area with changed pixel values and between the target area and the background area so as to identify and obtain each target area in the image information.
In this embodiment, after the key point position information of each target area is determined, each target area corresponding to the key point position information is taken as a positive sample and the background area as a negative sample. The key point position information determined for each target area and the starting position information arbitrarily set in the background area serve as expansion starting points, flood-fill expansion is performed simultaneously from all of them, and boundary information forms at the intersections between target areas where the pixel values change and between the target areas and the background area, so that each target area in the image information can be identified.
Specifically, referring to fig. 2, the step S3 specifically includes:
Step S31: acquiring pixel values of target areas where the position information of the key points is located, and taking the pixel value corresponding to the target area with the lowest pixel value as a first pixel value;
Step S32: the key point position information of the target area where the first pixel value is located is used as a first expansion starting point to be expanded outwards until the expanded pixel value reaches a second pixel value of a target area adjacent to the target area where the current key point position information is located, and first boundary information is formed at the intersection of the current target area and the adjacent target area where the second pixel value is located;
Specifically, suppose the image information contains three overlapping target areas with mutually different pixel values. If the pixel value of the target area containing a piece of key point position information is high, the corresponding target area lies at a highlight position in the image information; if the pixel value is low, the corresponding target area lies at a valley position; and a region whose pixel value matches neither the highlight nor the valley positions is judged to be the background area of the image information.
Further, in this embodiment, after the key point position information is output, the pixel values of the target areas containing each key point are obtained first. For example, suppose the three overlapping target areas in the image information are A, B and C, with pixel values 30, 50 and 70 respectively. The flood-fill algorithm starts filling from target area A, whose pixel value 30 is the lowest, using the key point position information of target area A as the first expansion starting point. The pixel value grows continuously during the flood-fill expansion, and all pixels with values 30-49 are included in target area A; while the value remains below 50, target areas B and C are unchanged. When the value rises to 50, that is, when the expanded pixel value reaches the pixel value of target area B, a boundary is formed at the intersection of target area A and target area B. Preferably, the boundary of the two regions may be formed by drawing a curve where target area A and target area B meet.
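A minimal sketch of this first expansion with OpenCV's flood fill is shown below, under the worked example above (target area A at pixel value 30, target area B at 50) and continuing from the image array loaded earlier; the seed coordinate stands in for the key point output by the model and is purely an assumption:

```python
# Flood-fill sketch of steps S31-S32; seed coordinate is hypothetical.
import cv2
import numpy as np

seed_a = (40, 60)  # (x, y) key point of target area A -- assumed position
mask = np.zeros((image.shape[0] + 2, image.shape[1] + 2), np.uint8)
flags = 4 | cv2.FLOODFILL_MASK_ONLY | cv2.FLOODFILL_FIXED_RANGE | (255 << 8)
# Grow from the seed over pixels in [30, 30 + 19] = [30, 49]; the fill stops
# where values reach 50, i.e. at the boundary with target area B.
cv2.floodFill(image, mask, seed_a, 0, loDiff=0, upDiff=19, flags=flags)
region_a = mask[1:-1, 1:-1] > 0  # binary mask of target area A
```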
Step S33: continuing to expand outwards by taking the key point position information in the target area where the second pixel value is positioned as a new first expansion starting point until forming a plurality of first boundary information among all target areas is completed;
Specifically, as described above, once the expanded pixel value reaches the pixel value 50 of target area B and the boundary between target area A and target area B has formed, flood-fill expansion starts from target area B, with the key point position information of target area B as the new first expansion starting point. Target area C remains unchanged until the pixel value rises to 70, that is, until the expanded pixel value reaches the pixel value of target area C, at which point the boundary between target area B and target area C forms at their intersection. In this manner, flood filling forms the boundary information between target areas A, B and C.
Step S34: acquiring a third pixel value of a background area, using starting position information arbitrarily set in the background area as a second expansion starting point to expand outwards until the expanded fourth pixel value reaches a first pixel value corresponding to each target area, and forming a plurality of second boundary information at the intersection of the background area and all the target areas;
Specifically, after the first boundary information between all the target areas in the image information has been established, the second boundary information between the target areas and the background area is determined. As explained above, the region outside the target areas at highlight or valley positions can be understood as the background area, whose third pixel value is relatively stable compared with those target areas. Assuming the third pixel value of the background area is 20, any position information in the background area may serve as the second expansion starting point for flood filling the background: when the pixel value rises to 30, second boundary information forms at the intersection of the background area and target area A; as expansion continues and the value rises to 50, second boundary information forms at the intersection with target area B; and when the value rises to 70, second boundary information forms at the intersection with target area C. At this point, flood filling has formed the boundary information between target areas A, B and C and between each of them and the background area.
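Putting steps S31-S34 together, the following hedged sketch labels every target area and the background with per-band flood fills; all seed coordinates and pixel values are the illustrative ones from the example above, not values fixed by the application:

```python
# Combined sketch of steps S31-S34; seeds and pixel values are assumptions.
import cv2
import numpy as np

# pixel value -> key point of the target area with that value (hypothetical)
seeds = {30: (40, 60), 50: (90, 60), 70: (140, 60)}
background_seed = (5, 5)   # arbitrary starting position in the background
values = sorted(seeds)     # [30, 50, 70]

labels = np.zeros(image.shape, np.int32)
flags = 4 | cv2.FLOODFILL_MASK_ONLY | cv2.FLOODFILL_FIXED_RANGE | (255 << 8)
for i, value in enumerate(values):
    # Each area grows over its own value band [value, next_value - 1], so
    # the fill stops exactly where the next area's pixel value begins
    # (first boundary information).
    up = values[i + 1] - value - 1 if i + 1 < len(values) else 255
    mask = np.zeros((image.shape[0] + 2, image.shape[1] + 2), np.uint8)
    cv2.floodFill(image, mask, seeds[value], 0, loDiff=0, upDiff=up, flags=flags)
    labels[mask[1:-1, 1:-1] > 0] = i + 1
# The background (assumed value 20) expands from its arbitrary seed up to
# the first target-area value, forming the second boundary information.
bg_mask = np.zeros((image.shape[0] + 2, image.shape[1] + 2), np.uint8)
cv2.floodFill(image, bg_mask, background_seed, 0, loDiff=0,
              upDiff=values[0] - 20 - 1, flags=flags)
background = bg_mask[1:-1, 1:-1] > 0
```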
Step S35: and identifying and obtaining the target area where the position information of each key point is based on the plurality of first boundary information and the plurality of second boundary information.
Specifically, after the flood-fill algorithm has formed the boundary information between target areas A, B and C and between each of them and the background area, the region inside each boundary is the actual extent of the corresponding target area, so the actual area of each target area containing key point position information in the image information can be identified from the formed first boundary information and second boundary information.
Further, the step S35 includes:
Step S351: and obtaining the area of the target area based on the pixel number of the target area where the position information of each key point obtained by recognition is located.
Specifically, after the actual area of each target area where the key point position information is located in the image information is identified, the area of the target area can be determined based on the number of pixels in the target area.
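As a small illustration, the area of each identified target area can be read off as a pixel count from the labels array of the previous sketch; the printed numbers are illustrative only:

```python
# Pixel count per identified target area (step S351).
import numpy as np

areas = {int(label): int(np.count_nonzero(labels == label))
         for label in np.unique(labels) if label != 0}
print(areas)  # e.g. {1: 1234, 2: 987, 3: 1502} -- values illustrative
```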
According to the image recognition method provided by the application, the target areas in the image information are first located from the key point position information output by the neural network model obtained through training. After the target areas are located, flood filling expands each target area from its obtained key point position information and expands the background area from arbitrary starting position information, yielding the boundaries between the target areas and between each target area and the background area. For overlapping target areas, the actual size of each area can thus be identified by combining key point detection with flood filling, improving the efficiency of image recognition. Moreover, once the actual sizes of the overlapping target areas have been identified, the area of each target area can be obtained from its pixel count; the calculated area corresponds to the actual size of the target area and meets production-line requirements.
In another embodiment, referring to fig. 3, the present application further provides an image recognition apparatus, including:
An acquisition module 12 for acquiring image information; the image information comprises a plurality of target areas, and the target areas are overlapped;
The output module 14 is configured to input the image information into a trained neural network model, and output the location information of the key points of each target area;
The identifying module 16 is configured to expand simultaneously, with each target area as a positive sample and the background area as a negative sample, from the key point position information determined for each target area and the starting position information arbitrarily set in the background area as expansion starting points, and to form boundary information at the intersections between target areas where the pixel values change and between the target areas and the background area, so as to identify each target area in the image information.
Further, the identification module 16 includes:
the first acquisition unit is used for acquiring pixel values of target areas where the position information of the key points is located, and taking the pixel value corresponding to the target area with the lowest pixel value as a first pixel value;
the first expansion unit is used for taking the key point position information of the target area where the first pixel value is located as a first expansion starting point to expand outwards until the expanded pixel value reaches a second pixel value of the target area adjacent to the target area where the current key point position information is located, and first boundary information is formed at the intersection of the current target area and the adjacent target area where the second pixel value is located;
a second expansion unit, configured to continue to expand outwards with the keypoint location information in the target area where the second pixel value is located as a new first expansion starting point until forming a plurality of first boundary information between all target areas is completed;
A second obtaining unit, configured to obtain a third pixel value of a background area, and take the starting position information arbitrarily set in the background area as a second expansion starting point to expand outwards until the fourth pixel value after expansion reaches a first pixel value corresponding to each target area, where the background area meets all the target areas to form a plurality of second boundary information;
And the identification unit is used for identifying and obtaining the target area where the position information of each key point is located based on the plurality of first boundary information and the second boundary information.
In another embodiment, the present application also provides a computer device comprising:
a processor;
A memory for storing a computer program for execution by the processor; wherein the processor implements the image recognition method described in the above embodiments when executing the computer program.
In another embodiment, the present application also provides a storage medium having stored thereon a computer program which, when executed, implements the image recognition method described in the above embodiment.
The above examples illustrate only a few embodiments of the invention, which are described in detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention.
Claims (8)
1. An image recognition method, characterized by comprising the following steps:
Acquiring image information; the image information comprises a plurality of target areas, and the target areas are overlapped;
Inputting the image information into a trained neural network model, and outputting to obtain the position information of key points of each target area;
Expanding, based on flood filling, with each target area corresponding to the key point position information as a positive sample and the background area as a negative sample, from the key point position information determined for each target area and the starting position information arbitrarily set in the background area as expansion starting points, and forming boundary information at the intersections between target areas where the pixel values change and between the target areas and the background area, so as to identify each target area in the image information;
wherein the expanding based on flood filling, with each target area corresponding to the key point position information as a positive sample and the background area as a negative sample, from the key point position information determined for each target area and the starting position information arbitrarily set in the background area as expansion starting points, forming boundary information at the intersections between target areas where the pixel values change and between the target areas and the background area, and identifying each target area in the image information specifically comprises the following steps:
acquiring pixel values of target areas where the position information of the key points is located, and taking the pixel value corresponding to the target area with the lowest pixel value as a first pixel value;
The key point position information of the target area where the first pixel value is located is used as a first expansion starting point to expand outwards until the expanded pixel value reaches a second pixel value of a target area adjacent to the target area where the current key point position information is located, and first boundary information is formed at the intersection of the current target area and the adjacent target area where the second pixel value is located;
continuing to expand outwards by taking the key point position information in the target area where the second pixel value is positioned as a new first expansion starting point until forming a plurality of first boundary information among all target areas is completed;
Acquiring a third pixel value of a background area, and taking the starting position information arbitrarily set in the background area as a second expansion starting point to expand outwards until the fourth pixel value after expansion reaches a first pixel value corresponding to each target area, and forming a plurality of second boundary information at the intersection of the background area and all the target areas;
And identifying and obtaining the target area where the position information of each key point is located, based on the plurality of first boundary information and the plurality of second boundary information.
2. The image recognition method according to claim 1, wherein: the trained neural network model is obtained based on the following modes:
acquiring first image information, wherein the first image information comprises a plurality of target areas, and the target areas are overlapped;
Labeling the position information of the key points in each target area in the first image information to obtain second image information; the second image information is marked with the position information of key points of each target area;
And inputting the second image information into an initial neural network model for training to obtain a trained neural network model.
3. The image recognition method according to claim 2, wherein: inputting the second image information into an initial neural network model for training, and obtaining a trained neural network model comprises the following steps:
extracting features of the second image information to obtain image feature information of the second image information;
calculating the image characteristic information based on a multi-layer network to obtain the key point position information of each target area in the second image information;
Outputting the key point position information of each target area;
Calculating, with a loss function, the loss between the output key point position information and the actual key point position information of each target area; if the loss is larger than a threshold, back-propagating the loss to the initial neural network model to update its network parameters and calculating the contribution of each network parameter to the loss; and continuing to correct the network parameters, executing the above steps in a loop until the loss is smaller than the threshold, to obtain the trained neural network model.
4. The image recognition method according to claim 1, wherein: and drawing a curve at the intersection between each target area to form the first boundary information, and drawing a curve at the intersection between the target area and the background area to form the second boundary information.
5. The image recognition method according to claim 1, wherein: the identifying, based on the plurality of first boundary information and second boundary information, a target area in which each of the key point position information is located includes:
And obtaining the area of each target area based on the number of pixels in the identified target area where each piece of key point position information is located.
6. An image recognition apparatus, comprising:
The acquisition module is used for acquiring image information; the image information comprises a plurality of target areas, and the target areas are overlapped;
The output module is used for inputting the image information into the trained neural network model and outputting the key point position information of each target area;
The identification module is used for expanding, based on flood filling, with each target area corresponding to the key point position information as a positive sample and the background area as a negative sample, from the key point position information determined for each target area and the starting position information arbitrarily set in the background area as expansion starting points, and forming boundary information at the intersections between target areas where the pixel values change and between the target areas and the background area, so as to identify each target area in the image information;
The identification module comprises:
the first acquisition unit is used for acquiring pixel values of target areas where the position information of the key points is located, and taking the pixel value corresponding to the target area with the lowest pixel value as a first pixel value;
the first expansion unit is used for taking the key point position information of the target area where the first pixel value is located as a first expansion starting point to expand outwards until the expanded pixel value reaches a second pixel value of the target area adjacent to the target area where the current key point position information is located, and first boundary information is formed at the intersection of the current target area and the adjacent target area where the second pixel value is located;
a second expansion unit, configured to continue to expand outwards with the keypoint location information in the target area where the second pixel value is located as a new first expansion starting point until forming a plurality of first boundary information between all target areas is completed;
A second obtaining unit, configured to obtain a third pixel value of a background area, and take the starting position information arbitrarily set in the background area as a second expansion starting point to expand outwards until the fourth pixel value after expansion reaches a first pixel value corresponding to each target area, where the background area meets all the target areas to form a plurality of second boundary information;
And the identification unit is used for identifying and obtaining the target area where the position information of each key point is located based on the plurality of first boundary information and the second boundary information.
7. A computer device, the computer device comprising:
a processor;
A memory for storing a computer program for execution by the processor; wherein the processor, when executing the computer program, implements the image recognition method of any one of claims 1-5.
8. A storage medium having stored thereon a computer program which, when executed, implements the image recognition method of any one of claims 1-5.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311515064.0A | 2023-11-15 | 2023-11-15 | Image recognition method, device, computer equipment and storage medium |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN117237596A | 2023-12-15 |
| CN117237596B | 2024-07-23 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |