CN109165645B - Image processing method and device and related equipment - Google Patents


Info

Publication number
CN109165645B
CN109165645B (Application CN201810865247.8A)
Authority
CN
China
Prior art keywords
target
region
image
area
size
Prior art date
Legal status
Active
Application number
CN201810865247.8A
Other languages
Chinese (zh)
Other versions
CN109165645A (en)
Inventor
辛愿 (Xin Yuan)
王嘉雯 (Wang Jiawen)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810865247.8A
Publication of CN109165645A
Application granted
Publication of CN109165645B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses an image processing method, an image processing device and related equipment, wherein the method comprises the following steps: acquiring a target image containing a target object and a reference object, detecting the pixel size of the target object in the target image as a target pixel size, and detecting the pixel size of the reference object in the target image as a reference pixel size; and acquiring a reference actual size of the reference object, and determining the target actual size of the target object according to the target pixel size, the reference pixel size and the reference actual size. By adopting the invention, the size of the business object is automatically determined, and the efficiency of measuring the size of the target object is improved.

Description

Image processing method and device and related equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an image processing method and apparatus, and a related device.
Background
In agricultural insurance, if a farmer guarantees cultivated poultry or planted crops, and natural disasters or accidents occur to protected objects, a claim settlement request needs to be sent to an insurance institution to obtain insurance compensation. In order to obtain the loss amount of the peasant household and determine the amount of the claim, an insurance clerk of the insurance institution needs to measure the volume of the protected object on the scene of an accident, and then determines the amount of the claim according to the measured volume. Measuring the volume of the protected object from the clerk to the field, however, can be labor intensive and can take a significant amount of time for the clerk.
Disclosure of Invention
The embodiment of the invention provides an image processing method, an image processing device and related equipment, which can automatically determine the size of a business object and improve the efficiency of measuring the size of the business object.
An embodiment of the present invention provides an image processing method, including:
acquiring a target image containing a target object and a reference object, detecting the pixel size of the target object in the target image as a target pixel size, and detecting the pixel size of the reference object in the target image as a reference pixel size;
and acquiring a reference actual size of the reference object, and determining the target actual size of the target object according to the target pixel size, the reference pixel size and the reference actual size.
An embodiment of the present invention provides an image processing apparatus, including:
an acquisition module for acquiring a target image including the target object and a reference object;
the first detection module is used for detecting the pixel size of the target object in the target image as a target pixel size;
the second detection module is used for detecting the pixel size of the reference object in the target image as a reference pixel size;
and the determining module is used for acquiring the reference actual size of the reference object and determining the target actual size of the target object according to the target pixel size, the reference pixel size and the reference actual size.
Wherein, the first detection module comprises:
the first convolution unit is used for performing convolution processing on the target image based on a convolution layer in a first complete convolution neural network model to obtain a first convolution characteristic diagram formed by combining first convolution characteristic information;
a searching unit, configured to search a plurality of first interest areas in the first convolution feature map;
the first convolution unit is further configured to perform pooling processing on first convolution feature information included in each first interest region to obtain first structure feature information, and identify first matching degrees of the first structure feature information included in each first interest region and a plurality of attribute type features in the first complete convolution neural network model;
the first convolution unit is further configured to use the maximum first matching degree as a confidence corresponding to the first interest region among the plurality of first matching degrees corresponding to each first interest region;
the first convolution unit is further configured to, among the confidence degrees corresponding to the plurality of first interest regions, take the first interest region corresponding to the maximum confidence degree as a first target region, and take the size of the first target region as the target pixel size.
Wherein the search unit includes:
a score calculating subunit, configured to slide windows on the first convolution feature map, and determine, in each window, a plurality of candidate regions based on anchor frames at different scales;
the score calculating subunit is further configured to calculate a target foreground score of the first convolution feature information in each candidate region;
and the score determining subunit is used for determining the plurality of first interest areas according to the target foreground score of each candidate area.
Wherein the score determining subunit includes:
a first determining subunit, configured to determine, as a first auxiliary area, a candidate area where the target foreground score is greater than a score threshold;
the first determining subunit is further configured to determine, as the second auxiliary region, the first auxiliary region having the largest target foreground score among the first auxiliary regions;
a deletion reservation subunit, configured to calculate overlapping areas between the second auxiliary region and the remaining first auxiliary regions, delete the first auxiliary region whose overlapping area is greater than an area threshold, and reserve the first auxiliary region whose overlapping area is less than or equal to the area threshold;
the first determining subunit is further configured to determine the second auxiliary region and the reserved first auxiliary region as the first region of interest.
Wherein the second detection module comprises:
a conversion unit for converting the target image into a target grayscale image;
the area determining unit is used for taking a connected area in the target gray level image as a first reference area;
the conversion unit is further configured to input the grayscale images in the plurality of first reference regions into a classification model respectively, and identify a first probability that each first reference region contains the reference object;
the selecting unit is used for selecting a first reference area meeting the matching condition from the plurality of first reference areas as a second reference area according to the first probability;
a size determination unit for determining the reference pixel size according to the size of the second reference region.
Wherein the area determination unit includes:
the gradient calculation subunit is used for calculating a first gradient map corresponding to the target gray image according to a gradient operator and performing a closing operation on the first gradient map to obtain a second gradient map;
the first detection subunit is used for respectively detecting the connected regions of the first gradient map and the second gradient map, and rectifying the detected connected regions to obtain a plurality of auxiliary reference areas;
the first detection subunit is further configured to use, as the first reference region, a region having the same position information as the auxiliary reference region in the target grayscale image.
Wherein the selection unit includes:
a first extraction subunit operable to extract a maximum first probability among the plurality of first probabilities;
a second determining subunit, configured to determine, if the maximum first probability is smaller than or equal to a probability threshold, a weight of the first reference region according to the position information of the first reference region, the length-width ratio of the first reference region, and the size of the first reference region, determine the first reference region with the largest weight as the first reference region that satisfies the matching condition, and determine the first reference region with the largest weight as the second reference region;
the second determining subunit is further configured to determine, if the maximum first probability is greater than the probability threshold, the first reference region corresponding to the maximum first probability as the first reference region satisfying the matching condition, and determine the first reference region corresponding to the maximum first probability as the second reference region.
Wherein the size determination unit includes:
the second extraction subunit is used for extracting the gray level image edge information in the second reference area to obtain an auxiliary gradient image when the reference object belongs to a first reference object with a fixed shape;
the second detection subunit is used for detecting a continuous curve in the auxiliary gradient image as a first target curve, and determining a first target diameter according to the first target curve;
the second detection subunit is further configured to determine the first target diameter as the reference pixel size.
Wherein the size determination unit further includes:
a clustering subunit configured to, when the reference object belongs to a second reference object whose shape is not fixed, take, as a third reference region, a region in the target image that has the same positional information as the second reference region;
the clustering subunit is further configured to perform color clustering processing on the images in the third reference region according to the colors of the images in the third reference region, so as to obtain a clustering result region;
and the size determining subunit is used for determining the size of the reference pixel according to the size of the clustering result area.
Wherein, the second detection module further comprises:
the second convolution unit is used for performing convolution processing on the target image based on a convolution layer in the mask region convolution neural network model when the reference object belongs to a second reference object with an unfixed shape to obtain a second convolution characteristic diagram formed by combining second convolution characteristic information;
the identification unit is used for searching a plurality of second interest areas in the second convolution feature map and performing pooling processing on second convolution feature information contained in each second interest area to obtain second structure feature information;
the identification unit is further configured to identify a second probability that each second interest region includes the second reference object according to second structural feature information included in the second interest region;
the calculation unit is used for determining a second interest area corresponding to the maximum second probability as an auxiliary target area and calculating a binary mask of each pixel in the auxiliary target area;
the computing unit is further configured to combine all pixels corresponding to the binary masks belonging to the foreground masks into a target sub-image, and determine the size of the reference pixel according to the size of the target sub-image.
Wherein, the second detection module further comprises:
the third convolution unit is used for performing convolution processing on the target image based on a convolution layer in a second complete convolution neural network model to obtain a third convolution characteristic diagram formed by combining third convolution characteristic information when the reference object belongs to a first reference object with a fixed shape;
the third convolution unit is further configured to search a plurality of third interest areas in the third convolution feature map, and perform pooling processing on third convolution feature information included in each third interest area to obtain third structure feature information;
the third convolution unit is further configured to identify, according to third structural feature information included in each third interest region, a third probability that the first reference object is included in each third interest region;
the third convolution unit is further configured to determine, as the second target region, a third interest region with a largest third probability among third probabilities corresponding to the plurality of third interest regions;
the extraction unit is used for extracting the edge information of the image in the second target area to obtain an edge gradient image;
the third convolution unit is further configured to detect a continuous curve in the edge gradient image as a second target curve, determine a second target diameter according to the second target curve, and determine the second target diameter as the reference pixel size.
The obtaining module is specifically configured to:
and receiving a service request associated with the target object, and acquiring a target image containing the target object and the reference object according to the service request.
The image processing apparatus further includes:
a display module, configured to use an attribute category of the attribute type feature corresponding to the confidence of the first target region as tag information corresponding to the first target region, and determine, according to the tag information corresponding to the first target region and the actual size of the target, target service data associated with the service request;
the display module is further configured to display the target service data and store the target service data; the target service data comprises: service claim amount and physical sign information of the target object;
and the sending module is used for sending the target service data to a service terminal associated with the service request.
An embodiment of the present invention provides an electronic device, including: a processor and a memory;
the processor is connected to a memory, wherein the memory is used for storing program codes and the processor is used for calling the program codes to execute the method in the embodiment of the invention.
An aspect of the embodiments of the present invention provides a computer storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, perform a method as in the embodiments of the present invention.
According to the embodiment of the invention, a target image containing a target object and a reference object is acquired, the pixel size of the target object in the target image is detected as the target pixel size, and the pixel size of the reference object in the target image is detected as the reference pixel size; the reference actual size of the reference object is acquired, and the target actual size of the target object is determined according to the target pixel size, the reference pixel size and the reference actual size. In this way, the pixel size of the target object in the image and the pixel size of the reference object in the image are respectively detected from the image containing the target object and the reference object, the real size of the reference object is obtained, and the real size of the target object can be determined according to the proportional relation, so that the real size of the target object can be determined automatically, measuring the size of the business object manually is avoided, and the efficiency of measuring the size of the business object is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1a is a system architecture diagram of an image processing method according to an embodiment of the present invention;
FIGS. 1b-1d are schematic views of a scene of an image processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an image processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a process for detecting a reference pixel size according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating an image processing method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an image processing apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1a, which is a system architecture diagram of an image processing method according to an embodiment of the present invention, a server 10a provides a service for a user terminal cluster, where the user terminal cluster may include: a user terminal 10b, a user terminal 10c, and a user terminal 10d. The following description takes agricultural insurance, in which a farmer initiates a claim settlement request to an insurance institution, as a scenario. When a user (which may be the user 10e, 10f, or 10g) needs to initiate a claim settlement request to the insurance institution, the user may take a picture containing a target object and a reference object based on a user terminal (which may be the user terminal 10b, the user terminal 10c, or the user terminal 10d), and the user terminal acquires the picture and then sends the picture to the server 10a. The server 10a detects the pixel sizes of the target object and the reference object in the picture, determines the real size of the target object according to the actual size of the reference object and the proportional relation, and further determines the physical sign information and the claim amount associated with the target object according to the real size of the target object. The server 10a stores the determined physical sign information and claim settlement amount in a database in an associated manner, and sends them to the user terminal, so that the user can know the physical sign information of the target object in the claim settlement request and the claim settlement amount obtained. Of course, if the user terminal itself can detect the pixel sizes of the target object and the reference object in the picture, the physical sign information and the claim amount of the target object can also be determined directly in the user terminal, and then only the physical sign information and the claim amount need to be sent to the server 10a for storage. The following fig. 1b-1d illustrate how to determine the pixel sizes of the target object and the reference object, and the real size of the target object, taking the user terminal 10b and the server 10a as an example.
The user terminal may include a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a smart speaker, a mobile internet device (MID), a point-of-sale (POS) device, a wearable device (e.g., a smart watch or a smart bracelet), and the like.
Fig. 1b is a schematic view of a scene of an image processing method according to an embodiment of the present invention. When an accident occurs to a target object (which may be poultry, livestock, etc.), a farmer issues a claim settlement request to an insurance institution. First, in the interface 20a of the user terminal 10b, basic information of the application is filled in, such as the policy number when the insurance was purchased, and the reason for and time of the accident. In the interface 20b, a photo containing the target object is taken; in order to accurately calculate the real size of the target object, a reference object is placed beside or on the target object before the photo is taken, where the reference object can be a card of standard size, such as an identification card or a bus card, or a coin. After the preparation is completed, a picture containing the target object and the reference object may be taken. In order to center the target object in the captured picture and facilitate subsequent detection of the size of the target object, as shown in the interface 20c, the user may be prompted to align the target object and the reference object with the preset rectangular frame when capturing the picture.
As shown in fig. 1c, when the user terminal 10b acquires the photographed picture 20d, the picture 20d is transmitted to the server 10a, and the recognition model 20e that can detect the pixel size of the target object in the picture and the recognition model 20g that can detect the pixel size of the reference object in the picture are stored in the server 10a. The picture 20d is input into the recognition model 20e, and a convolution operation is performed on the picture 20d based on the convolution layers in the recognition model 20e, so that a convolution feature map can be obtained, where the convolution feature map contains convolution feature information of the target object in the picture 20d. Generally, the size of the obtained convolution feature map is smaller than the size of the picture 20d; in order to unify the sizes of the convolution feature maps, the boundary of each convolution feature map may be padded with the value "0", so that the size of the convolution feature map is consistent with the size of the picture 20d, and the padded value "0" does not affect subsequent calculation results. A window (the window size may be 3 × 3) is slid over the convolution feature map according to a preset step (the step size may be 1); in each window, a plurality of candidate frames are determined according to a preset anchor frame (the anchor frame size may be 16 × 16) and a plurality of scales, where the multiple scales may be obtained by adjusting the aspect ratio of the anchor frame to 0.5, 1, and 2, respectively, and scaling the aspect-ratio-adjusted anchor frames by 0.5 times, 1 times, and 2 times, respectively. It can be appreciated that each sliding window may correspond to multiple candidate frames. The foreground score of each candidate frame is calculated according to the convolution feature information contained in the candidate frame; the higher the foreground score, the higher the probability that the corresponding candidate frame contains the target object. The candidate frames whose foreground scores are larger than a preset score threshold are taken as first interest regions. Since a large amount of overlap may exist among the plurality of first interest regions, the first interest regions that heavily overlap the first interest region with the highest foreground score may be deleted, and only the first interest regions that overlap it little or not at all may be retained. After the plurality of first interest regions are determined, pooling and full-connection processing are performed on the convolution feature information in each first interest region, and the matching degrees between the fully-connected feature information and a plurality of attribute type features included in the classifier are calculated based on the classifier in the recognition model 20e; the higher the matching degree, the higher the probability that the attribute category of the object contained in the first interest region is the attribute category corresponding to that attribute type feature in the classifier.
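As a non-limiting illustration of the multi-scale candidate frames described above, the following Python sketch generates the nine anchor frames per sliding-window position that result from a 16 × 16 base anchor, aspect ratios 0.5/1/2, and scales 0.5/1/2; the function name and the use of NumPy are assumptions, not part of the embodiment.

```python
import numpy as np

def generate_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(0.5, 1, 2)):
    """Generate (width, height) anchor frames for one sliding-window position.

    The base anchor is stretched to each aspect ratio (keeping its area
    roughly constant) and then scaled, giving ratios x scales anchors."""
    anchors = []
    for ratio in ratios:
        w = base_size / np.sqrt(ratio)
        h = base_size * np.sqrt(ratio)
        for scale in scales:
            anchors.append((w * scale, h * scale))
    return np.array(anchors)

print(generate_anchors().shape)  # (9, 2): nine candidate frames per window
```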
Each first interest region can obtain a plurality of matching degrees; the maximum matching degree is used as the confidence of the first interest region, and the attribute category corresponding to the maximum matching degree is used as the tag information of the first interest region, that is, one first interest region corresponds to one item of tag information and one confidence. The tag information is the attribute category of the object in the first interest region; for example, the attribute category may be livestock such as pigs, sheep, and cattle. Among the confidences corresponding to the plurality of first interest regions, the first interest region corresponding to the maximum confidence is selected as the first target region 20f, where the first target region 20f carries the tag information "pig", and the tag information "pig" is the attribute category of the target object. The pixel size of the target object in the picture 20d can be determined from the size of the first target region 20f; for example, the length of the target region 20f can be taken as the pixel size of the target object in the picture 20d. The recognition model 20e is trained based on a Region-based Fully Convolutional Network (R-FCN); other target detection algorithms, such as R-CNN (Regions with CNN features), Fast R-CNN, and Faster R-CNN, can also identify the first target region in the picture 20d and the attribute category of the target object in the first target region.
As shown in fig. 1c, the picture 20d is input into the recognition model 20g, and a convolution operation is performed on the picture 20d based on the convolution layers in the recognition model 20g, so that a convolution feature map containing convolution feature information of the reference object in the picture 20d can be obtained. Similarly, the value "0" is padded at the boundary of the convolution feature map, so that the size of the convolution feature map is consistent with the size of the picture 20d, and the padded value "0" does not affect subsequent calculation results. Second interest regions are searched for in the convolution feature map; for the specific process of searching for the second interest regions, reference may be made to the related steps of searching for the first interest regions. Pooling and full-connection processing are performed on the convolution feature information in each second interest region, the classifier is used to calculate the matching probability between the fully-connected feature information and the attribute category of the reference object (that is, the probability that each second interest region contains the reference object is calculated), and the second interest region corresponding to the maximum matching probability is taken as the target auxiliary region. Based on the regression mask layer in the recognition model 20g, a binary mask for each pixel in the target auxiliary region is calculated. The binary mask includes a foreground mask indicating that the corresponding pixel belongs to the reference object and a background mask indicating that the corresponding pixel belongs to the background. The pixels belonging to the foreground mask are combined into a target sub-image 20h, and the pixel size of the reference object in the picture 20d can be determined according to the size of the target sub-image 20h; for example, the diagonal length of the target sub-image 20h can be used as the pixel size of the reference object in the picture 20d. The recognition model 20g is trained based on Mask R-CNN (Mask Region-based Convolutional Neural Network), which is an instance segmentation algorithm; Mask R-CNN can accurately obtain the position and shape of the reference object in the picture, and for a reference object of small size, an accurate pixel size ensures a smaller error when the compensation amount is subsequently determined. The order of determining the target sub-image 20h and determining the first target region 20f is not limited, and the attribute category of the target object is determined by the recognition model 20e.
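The mask post-processing just described can be sketched as a minimal Python function (an assumption for illustration, not the embodiment's implementation): given the per-pixel binary mask of the target auxiliary region, the foreground pixels are combined into the target sub-image and its diagonal length is returned as the reference pixel size.

```python
import numpy as np

def reference_pixel_size_from_mask(binary_mask: np.ndarray) -> float:
    """Combine all pixels whose binary mask is a foreground mask (value 1)
    into a target sub-image and return its diagonal length in pixels."""
    ys, xs = np.nonzero(binary_mask)
    if ys.size == 0:
        raise ValueError("no foreground pixels in the mask")
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    return float(np.hypot(height, width))  # diagonal of the target sub-image
```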
As shown in fig. 1d, the server 10a may determine the real size (including the real length and the real width) of the target object according to the pixel size of the target object "pig" in the picture 20d, the pixel size of the reference object, and the real size of the reference object, for example, the real length of the pig body displayed in the interface 30a is: 80cm, determining the weight of the target object according to the real body length and the attribute category of the target object: 70kg, and further determining the compensation amount of the claim request according to the weight of the target object and the attribute category of the target object as follows: 900 yuan. The server 10a may transmit the length, weight, and compensation amount of the target object to the user terminal 10b, and display the data received from the server 10a in the interface 30b of the user terminal. The real size of the target object is determined by detecting the pixel sizes of the target object and the reference object in the image. The real size of the target object can be automatically determined, the size is prevented from being measured in a manual mode, and the efficiency of measuring the size of the target object is improved.
The specific process of detecting the pixel sizes of the target object and the reference object in the image may refer to the following embodiments corresponding to fig. 2 to 4.
Further, please refer to fig. 2, which is a flowchart illustrating an image processing method according to an embodiment of the present invention. As shown in fig. 2, the image processing method may include:
step S101, acquiring a target image including the target object and a reference object, and detecting a pixel size of the target object in the target image as a target pixel size.
Specifically, the terminal device (which may be the server in fig. 1d) receives a service request (e.g., the insurance claim settlement request in fig. 1a) associated with a target object, and obtains a target image (which may be the picture 20d in fig. 1c) according to the service request, where the target image includes the target object (which may be the pig body in fig. 1c) and a reference object (which may be the card in fig. 1c); the object whose real size is to be calculated is referred to as the target object, and the object used for comparison with the target object is referred to as the reference object. Based on the convolution layers in the first complete convolution neural network model constructed in the terminal device, the terminal device performs convolution processing on the target image, that is, it randomly selects a small part of the feature information in the target image as a sample (convolution kernel) and slides the sample as a window over the entire target image in sequence, i.e., performs a convolution operation between the sample and the target image, so as to obtain convolution feature information about the target object in the target image. After the convolution processing, the terminal device may obtain a convolution feature map containing the convolution feature information of the target object in the target image; the terminal device refers to the convolution feature information of the target object as first convolution feature information and refers to the convolution feature image formed by combining the first convolution feature information as the first convolution feature map. Based on an RPN (Region Proposal Network) algorithm, a plurality of interest regions are searched for in the first convolution feature map, and these interest regions are called first interest regions. Taking one first interest region as an example, if there are multiple first interest regions, the terminal device may calculate the confidence and the tag information of each first interest region in the same manner. The first convolution feature information contained in the first interest region is pooled, that is, the first convolution feature information is aggregated and counted; static structural feature information about the target object in the target image can be obtained after aggregation and counting, and this static structural feature information is referred to as first structural feature information. Pooling serves to reduce the amount of subsequent computation and to share weights. Based on the classifier in the first complete convolution neural network model, the matching degrees between the aggregated first structural feature information in the first interest region and a plurality of attribute type features in the first complete convolution neural network model are identified; these are called first matching degrees, that is, each first interest region corresponds to a plurality of first matching degrees, and the number of first matching degrees corresponding to each first interest region is the same as the number of attribute type features in the first complete convolution neural network model, where each attribute type feature corresponds to one attribute category.
Among the plurality of first matching degrees corresponding to each first interest region, the terminal device takes the maximum first matching degree as the confidence of that first interest region, and takes the attribute category of the attribute type feature corresponding to the maximum first matching degree as the tag information of the object in that first interest region. For example, the first complete convolution neural network model includes the attribute type feature corresponding to the attribute category "pig", the attribute type feature corresponding to the attribute category "sheep", and the attribute type feature corresponding to the attribute category "cow", and the classifier identifies the first matching degrees between first structural feature information A and the attribute type features corresponding to these 3 attribute categories, obtaining: the first matching degree between the first structural feature information A and the attribute category "pig" is 0.1; the first matching degree between the first structural feature information A and the attribute category "sheep" is 0.7; the first matching degree between the first structural feature information A and the attribute category "cow" is 0.9. Therefore, the confidence of the first interest region corresponding to the first structural feature information A is 0.9, and the tag information of the object in the first interest region is: cow.
In the above manner, the terminal device may determine a corresponding confidence and tag information for each first interest region, and may then determine the first interest region corresponding to the maximum confidence as the first target region (for example, the first target region 20f in fig. 1c), and use the pixel size of the first target region in the target image as the target pixel size, for example, use the pixel length of the first target region or the pixel area of the first target region as the target pixel size. The first complete convolution neural network model is trained based on the target detection algorithm R-FCN.
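A minimal Python sketch of this selection step is given below; the function and argument names are hypothetical, and the matching degrees are assumed to be available as a NumPy array produced by the classifier.

```python
import numpy as np

def pick_first_target_region(regions, match_degrees, attribute_categories):
    """regions: list of (x, y, w, h) first interest regions.
    match_degrees: array of shape (num_regions, num_categories) holding the
    first matching degrees against each attribute type feature."""
    confidences = match_degrees.max(axis=1)      # confidence per region
    labels = match_degrees.argmax(axis=1)        # tag information per region
    best = int(confidences.argmax())             # index of the first target region
    x, y, w, h = regions[best]
    target_pixel_size = w                        # e.g. pixel length of the region
    return regions[best], attribute_categories[labels[best]], target_pixel_size
```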
Based on the RPN algorithm, a specific process of searching for a plurality of first interest regions in the first convolution feature map may be: presetting a window size (for example, the window size is 3 × 3), sequentially sliding the windows on the first convolution feature map according to a preset step size (the step size may be 1), in each window, determining a plurality of candidate frames according to a preset anchor frame (the anchor frame size may be 16 × 16) and a plurality of scales, where the plurality of scales may be adjusting an aspect ratio of the anchor frame or scaling the anchor frame size. It can be appreciated that multiple candidate boxes may correspond within each sliding window. The terminal device calculates a target foreground score of the first convolution characteristic information in each candidate region, namely, the classifier calculates the probability that the corresponding candidate region belongs to the foreground region, and if the target foreground score is higher, the probability that the candidate region belongs to the foreground region is higher. Since each target image corresponds to a large number of candidate regions (the number may exceed 2000), and these candidate regions have a large number of overlapping regions, a Non Maximum Suppression (NMS) algorithm may be used to eliminate a part of overlapping candidate regions. The specific process of NMS may be: the terminal equipment takes a candidate area with a target foreground score larger than a preset score threshold value as a first auxiliary area, and selects the first auxiliary area with the maximum target foreground score from the target foreground scores corresponding to all the first auxiliary areas to determine the first auxiliary area as a second auxiliary area. The overlapping area between the second auxiliary region and each of the first auxiliary regions may be calculated separately using formula (1) (calculating IOU) or using formula (2) (calculating overlap),
IOU = (A ∩ B) / (A ∪ B)    (1)

overlap = (A ∩ B) / A    (2)
wherein A represents the area of the second auxiliary area, B represents the area of the first auxiliary area, and IOU is equal to the intersection of the first auxiliary area and the second auxiliary area divided by the union of the first auxiliary area and the second auxiliary area; overlap is equal to the intersection of the first and second auxiliary areas divided by the second auxiliary area. Whether it is the IOU calculated by equation (1) or the overlap calculated by equation (2), a higher value indicates a larger overlap area. Deleting the first auxiliary region with the overlapping area larger than the preset area threshold, correspondingly reserving the first auxiliary region with the overlapping area smaller than or equal to the area threshold, and taking the second auxiliary region and the reserved first auxiliary region as the first interest region. The above process is the whole process of searching the first interest region by the RPN algorithm, and in order to reduce the subsequent calculation amount, an NMS algorithm is also nested in the RPN algorithm for deleting the candidate region with too much overlapping area.
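The suppression pass described above can be sketched as follows in Python (illustrative only; the thresholds and the box representation are assumptions, and formula (2) is used as the overlap measure):

```python
def area(box):
    # box = (x1, y1, x2, y2) in pixel coordinates
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def intersection(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def select_first_interest_regions(boxes, scores,
                                  score_threshold=0.7, area_threshold=0.5):
    """One suppression pass: candidates above the score threshold become
    first auxiliary regions, the top-scoring one is the second auxiliary
    region, and regions overlapping it too much are deleted."""
    aux = [i for i, s in enumerate(scores) if s > score_threshold]
    if not aux:
        return []
    second = max(aux, key=lambda i: scores[i])   # second auxiliary region
    kept = [second]
    for i in aux:
        if i == second:
            continue
        overlap = intersection(boxes[second], boxes[i]) / area(boxes[second])
        if overlap <= area_threshold:
            kept.append(i)                       # retained first auxiliary region
    return [boxes[i] for i in kept]              # the first interest regions
```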
Step S102, detecting a pixel size of the reference object in the target image as a reference pixel size.
After determining the tag information of the target object in the first target region and determining the target pixel size of the target object from the first target region, the terminal device continues to detect the pixel size of the reference object in the target image, referred to as the reference pixel size. First, the terminal device detects whether the target image is a color image; if the target image is a color image, it needs to be converted to grayscale, and the grayscale image is called the target grayscale image; if the target image is already a grayscale image, no processing is applied, but the target image is still referred to as the target grayscale image. Converting a color image into a grayscale image converts the 3 channels (R channel, G channel, B channel) of the color image into a single-channel image. The terminal device identifies connected regions in the target grayscale image based on edge detection, and a connected region is called a first reference region. Edge detection can identify the contour lines of the target object and the reference object in the target grayscale image, and the region enclosed by a continuous and closed contour line can be used as a connected region. Taking one first reference region as an example to describe how to determine the corresponding first probability, the terminal device inputs the grayscale image in the first reference region into a classification model trained in advance; the classifier in the classification model is a binary classifier, and the binary classifier can calculate the probability that the first reference region contains the reference object, which is called the first probability. For example, if the reference object is a card, the binary classifier can identify the first probability that the first reference region contains the card; of course, the higher the first probability, the more likely the corresponding first reference region contains the reference object. When training the classifier in the classification model, only two types of samples (positive samples and negative samples) are needed: the positive samples are reference regions containing the reference object, and the negative samples are reference regions not containing the reference object.
If there are multiple first reference regions, the terminal device may determine the first probability corresponding to each first reference region in the foregoing manner. According to the first probabilities calculated by the classifier, the terminal device selects a first reference region satisfying the matching condition from the plurality of first reference regions, and takes the selected first reference region as the second reference region. The reference pixel size is determined according to the pixel size of the second reference region in the target image; for example, the pixel length of the second reference region or the pixel area of the second reference region is taken as the reference pixel size.
Step S103, obtaining a reference actual size of the reference object, and determining a target actual size of the target object according to the target pixel size, the reference pixel size and the reference actual size.
Specifically, the terminal device obtains an actual size (referred to as a reference actual size) of the reference object, and calculates the actual size (referred to as a target actual size) of the target object according to a proportional relationship (specifically, the proportional relationship is that a ratio between the reference pixel size and the reference actual size is the same as a ratio between the target pixel size and the target actual size) according to the target pixel size of the target object, the reference pixel size of the reference object, and the reference actual size of the reference object. Wherein, the target actual size can be calculated according to the formula (3):
B2 = (B1 × A2) / A1    (3)
wherein A1 denotes a reference pixel size; a2 denotes a reference actual size; b1 represents a target pixel size; b2 denotes the target actual size.
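For illustration, formula (3) can be evaluated as follows; the example numbers are assumptions and merely show the proportional relationship:

```python
def target_actual_size(target_pixel_size, reference_pixel_size, reference_actual_size):
    """Formula (3): B2 = (B1 * A2) / A1 -- the target and the reference share
    the same pixel-to-actual ratio because they appear in the same image."""
    return target_pixel_size * reference_actual_size / reference_pixel_size

# Illustrative numbers only: an 85.6 mm wide card spanning 200 px and a pig
# body spanning 1870 px give a body length of about 800 mm (80 cm).
print(target_actual_size(1870, 200, 85.6))
```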
Optionally, the target service data may be further determined subsequently according to the target actual size of the target object, and the attribute type of the target object plays a decisive role in the target service data, so that the terminal device needs to determine the attribute of the target object, and the specific process of determining the attribute type is as follows: the terminal device obtains label information corresponding to the first target area, wherein the label information is determined according to the attribute type of the attribute type feature corresponding to the confidence degree of the first target area. And the terminal equipment determines target service data associated with the service request according to the label information corresponding to the first target area and the actual size of the target. For example, when the service request is an insurance claim request, the target service data may be service claim amount, physical sign information such as weight and volume of the target object, that is, physical sign information such as weight of the target object is determined according to the actual size of the target object, and then the service claim amount is determined according to the physical sign information and the tag information of the target object in the first target area.
Optionally, after the terminal device determines the target service data, the terminal device may display the target service data, where the target service data includes: the amount of the service claim, and the information of the physical sign (weight, volume, etc.) of the target object. The terminal device sends the target service data to the service terminal (e.g. the user terminal 10b in the embodiment corresponding to fig. 1 d) associated with the service request.
Further, referring to fig. 3, fig. 3 is a schematic flow chart illustrating a method for detecting a reference pixel size according to an embodiment of the invention. As shown in fig. 3, the specific process of detecting the reference pixel size includes the following steps S201 to S204, and the steps S201 to S204 are an embodiment of the step S102 in the embodiment corresponding to fig. 2:
step S201, converting the target image into a target grayscale image, and using a connected region in the target grayscale image as a first reference region.
Specifically, the terminal device converts the target image into a grayscale image, referred to as the target grayscale image. A gradient operator is preset; the gradient operator can be a Sobel gradient operator, a Canny gradient operator, a Laplace gradient operator, or the like. Using the gradient operator, the gradient of each pixel in the target image can be calculated quickly; the image obtained by combining the gradients of the pixels is called a gradient map, and the gradient map obtained from the target image is called the first gradient map. The terminal device performs a closing operation on the first gradient map to obtain a gradient map called the second gradient map; the closing operation dilates and then erodes the first gradient map, and after the closing operation, small holes in the first gradient map can be filled and adjacent objects in the first gradient map can be connected. The first gradient map and the second gradient map contain the contour lines of all objects (including the target object and the reference object) in the target image; the terminal device calls a region enclosed by a continuous and closed contour line in the first gradient map a connected region, and likewise calls a region enclosed by a continuous and closed contour line in the second gradient map a connected region. Since a connected region may have an irregular shape, the terminal device also needs to rectify the determined connected region so that the rectified connected region is a regular shape (e.g., a rectangle). The rectified connected region is called an auxiliary reference region. Since the auxiliary reference region corresponds to the first gradient map and the second gradient map, the auxiliary reference region is mapped onto the target grayscale image, that is, a region having the same position information as the auxiliary reference region in the target grayscale image is used as the first reference region. For example, if the coordinates of an auxiliary reference region in the first gradient map are (3, 3, 20, 20), the first term of the coordinates represents the starting abscissa of the auxiliary reference region; the second term represents the starting ordinate of the auxiliary reference region; the third term represents the length of the auxiliary reference region; and the fourth term represents the width of the auxiliary reference region. The region identified by the coordinates "(3, 3, 20, 20)" in the target grayscale image is then taken as the first reference region.
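A possible OpenCV-based sketch of this step is shown below, assuming the Sobel operator, a 5 × 5 closing kernel, and Otsu thresholding before connected-component analysis; these choices are assumptions rather than requirements of the embodiment.

```python
import cv2
import numpy as np

def first_reference_regions(image_bgr, kernel_size=5):
    """Return candidate first reference regions as (x, y, w, h) rectangles
    in the coordinates of the target grayscale image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)        # target grayscale image
    # First gradient map via a Sobel operator (Canny/Laplace would also work).
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1)
    grad1 = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
    # Second gradient map: closing (dilate then erode) fills small holes
    # and connects adjacent objects.
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    grad2 = cv2.morphologyEx(grad1, cv2.MORPH_CLOSE, kernel)
    regions = []
    for grad in (grad1, grad2):
        _, binary = cv2.threshold(grad, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
        for i in range(1, n):  # skip the background label 0
            x, y, w, h = stats[i, :4]
            regions.append((int(x), int(y), int(w), int(h)))  # rectified region
    return regions
```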
Step S202, inputting the gray level images in the plurality of first reference regions into a classification model respectively, and identifying a first probability that each first reference region contains the reference object.
Specifically, the following takes one first reference region as an example to describe how the terminal device calculates the first probability corresponding to the first reference region. The terminal device inputs the grayscale image in the first reference region into a pre-trained classification model; the classifier in the classification model is a binary classifier, and the binary classifier can calculate the probability that the first reference region contains the reference object, namely the first probability. For example, if the reference object is a card, the binary classifier can identify the first probability that the first reference region contains the card; of course, the higher the first probability, the more likely the corresponding first reference region contains the reference object. The classification model can be trained based on a convolutional neural network in deep learning. If there are multiple first reference regions, the first probability corresponding to each first reference region may be identified based on the classification model.
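A binary classifier of the kind described could, for example, be a small convolutional network; the PyTorch sketch below is an assumption for illustration and is not the classification model used in the embodiment.

```python
import torch
import torch.nn as nn

class ReferenceClassifier(nn.Module):
    """A minimal two-class CNN: outputs the probability that a grayscale
    crop (one first reference region) contains the reference object."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1), nn.Sigmoid()
        )

    def forward(self, x):          # x: (N, 1, H, W) grayscale crops in [0, 1]
        return self.head(self.features(x)).squeeze(1)   # first probability per crop
```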
Step S203, selecting a first reference area meeting the matching condition from the plurality of first reference areas as a second reference area according to the first probability.
Specifically, the terminal device extracts the largest first probability among the plurality of first probabilities. Detecting a maximum first probability, and if the maximum first probability is less than or equal to a preset probability threshold, indicating that the first probability determined by the classification model cannot be used as a precondition for screening the first reference regions, so that the terminal device needs to determine the weight of each first reference region according to the position information of each first reference region, the length-width ratio of each first reference region, and the size of each first reference region, wherein the weight is larger if the position of the first reference region is closer to the center of the target image; if the aspect ratio of the first reference area is closer to the preset aspect ratio condition, the weight is larger; the weight is larger if the size of the first reference area is closer to the predetermined size condition. After the weight corresponding to each first reference region is obtained through calculation, the first reference region with the largest weight is used as the first reference region meeting the matching condition, and the first reference region meeting the matching condition is used as the second reference region, namely, the first reference region with the largest weight is used as the second reference region.
If the maximum first probability is greater than the preset probability threshold, it is stated that the first probability determined by the classification model may be used as a precondition for screening the first reference region, and therefore the terminal device determines the first reference region corresponding to the maximum first probability as the first reference region satisfying the matching condition, and uses the first reference region satisfying the matching condition as the second reference region, that is, uses the first reference region corresponding to the maximum first matching probability as the second reference region. In the above description, no matter which condition is satisfied by the maximum first probability, only one first reference region is selected from the plurality of first reference regions as the second reference region, and the rest of the first reference regions do not need to refer to the subsequent operation.
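The selection logic of step S203 can be sketched as follows; the probability threshold, the 85.6:54 card aspect ratio, and the particular weighting of position, aspect ratio, and size are assumptions, since the exact weighting formula is not fixed here.

```python
def select_second_reference_region(regions, probs, image_shape,
                                   prob_threshold=0.8,
                                   target_aspect=85.6 / 54.0):
    """If the best classifier probability is convincing, trust it; otherwise
    fall back to a heuristic weight built from position, aspect ratio, and size."""
    best = max(range(len(regions)), key=lambda i: probs[i])
    if probs[best] > prob_threshold:
        return regions[best]                      # classifier-selected region
    img_h, img_w = image_shape[:2]

    def weight(region):
        x, y, w, h = region
        cx, cy = x + w / 2, y + h / 2
        # Closer to the image centre, closer to the expected aspect ratio,
        # and larger area all increase the weight.
        center_score = 1 - (abs(cx - img_w / 2) / img_w + abs(cy - img_h / 2) / img_h)
        aspect_score = 1 / (1 + abs(w / max(h, 1) - target_aspect))
        size_score = (w * h) / (img_w * img_h)
        return center_score + aspect_score + size_score

    return max(regions, key=weight)               # weight-selected region
```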
Step S204, determining the size of the reference pixel according to the size of the second reference area.
Specifically, reference objects are divided into first reference objects whose shape is fixed and second reference objects whose shape is not fixed. For example, a coin is a first reference object with a fixed shape, while a card (because a paper card may curl or crease, etc.) is a second reference object whose shape is not fixed. Different reference objects correspond to different calculation modes. When the reference object belongs to a first reference object with a fixed shape, the terminal device extracts the edge information of the grayscale image in the second reference region based on the gradient operator, obtaining a gradient image corresponding to the grayscale image in the second reference region, which is called the auxiliary gradient image. Based on an equal-step Hough transform, the terminal device determines a continuous curve and its circle center in the auxiliary gradient image, and the continuous curve is called the first target curve. From the first target curve and the circle center, the diameter corresponding to the first target curve, referred to as the first target diameter, can be determined. The first target diameter is taken as the reference pixel size of the reference object.
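For the fixed-shape case, a circle-detection sketch using OpenCV's Hough transform is shown below; the HoughCircles parameters are assumptions that would normally be tuned for the image resolution.

```python
import cv2
import numpy as np

def coin_diameter_pixels(gray_region: np.ndarray) -> float:
    """Detect the most salient circle in the second reference region and
    return its diameter in pixels (used as the reference pixel size)."""
    blurred = cv2.GaussianBlur(gray_region, (5, 5), 0)
    circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1.2,
                               minDist=gray_region.shape[0] // 2,
                               param1=100, param2=30,
                               minRadius=5, maxRadius=0)
    if circles is None:
        raise ValueError("no circle (first target curve) found")
    x, y, r = circles[0, 0]          # strongest circle: centre and radius
    return float(2 * r)              # first target diameter
```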
When the reference object belongs to the second reference object with a non-fixed shape, the second reference area in the target grayscale image is mapped back to the target image; that is, the terminal device takes the area of the target image having the same position information as the second reference area as a third reference area. According to the colors of the image in the third reference region, the terminal device performs clustering processing on those colors to eliminate the shadow at the boundary of the third reference region and improve the precision of the subsequent calculation of the reference pixel size of the second reference object. The terminal device refers to the area obtained by the color clustering as the clustering result area, and determines the reference pixel size of the reference object according to the size of the clustering result area. For example, the pixel length of the clustering result region, the pixel area of the clustering result region, or the diagonal length of the clustering result region is taken as the reference pixel size.
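The color clustering step can be sketched as follows, assuming OpenCV's k-means. The number of clusters and the rule of keeping the largest cluster are assumptions; the embodiment only requires that the shadow pixels at the boundary be separated from the reference object.

    import cv2
    import numpy as np

    def cluster_result_region(bgr_patch, k=2):
        """k-means on the pixel colours of the third reference area; the dominant
        cluster's pixels form the clustering result region (shadow pixels dropped)."""
        pixels = bgr_patch.reshape(-1, 3).astype(np.float32)
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
        _, labels, _ = cv2.kmeans(pixels, k, None, criteria, 5, cv2.KMEANS_PP_CENTERS)
        labels = labels.reshape(bgr_patch.shape[:2])
        dominant = np.argmax(np.bincount(labels.ravel()))
        mask = (labels == dominant).astype(np.uint8)
        ys, xs = np.nonzero(mask)
        h, w = ys.max() - ys.min() + 1, xs.max() - xs.min() + 1
        return mask, (w, h)  # the region size then yields the reference pixel size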
The method comprises the steps of acquiring a target image containing a target object and a reference object, detecting the pixel size of the target object in the target image as the target pixel size, and detecting the pixel size of the reference object in the target image as the reference pixel size; and acquiring a reference actual size of the reference object, and determining the target actual size of the target object according to the target pixel size, the reference pixel size and the reference actual size. In the above way, the pixel size of the target object and the pixel size of the reference object are respectively detected from the image containing both objects, the real size of the reference object is obtained, and the real size of the target object can be determined according to the proportional relation, so that the real size of the target object is determined automatically, manual size measurement is avoided, and the efficiency of measuring the size of the target object is improved.
Referring to fig. 4, it is a schematic flow chart of an image processing method according to an embodiment of the present invention, where the image processing method includes the following steps:
step S301, acquiring a target image including the target object and a reference object, and detecting a pixel size of the target object in the target image as a target pixel size.
The specific implementation manner of step S301 may refer to step S101 in the embodiment corresponding to fig. 2.
Step S302, when the reference object belongs to a second reference object with unfixed shape, performing convolution processing on the target image based on a convolution layer in the mask region convolution neural network model to obtain a second convolution feature map formed by combining second convolution feature information.
Specifically, when the reference object belongs to the second reference object with a non-fixed shape, based on a convolutional layer in the mask region convolutional neural network model (such as the recognition model 20g in fig. 1 c) constructed in the terminal device, the terminal device performs convolution processing on the target image; that is, it randomly selects a small part of feature information in the target image as a sample (convolution kernel) and slides the sample as a window over the entire target image, i.e., performs a convolution operation between the sample and the target image, so as to obtain convolution feature information about the second reference object in the target image. After the convolution processing, the terminal device obtains a convolution feature map containing the convolution feature information of the second reference object in the target image; the obtained convolution feature information is referred to as second convolution feature information, and the obtained convolution feature map is referred to as a second convolution feature map.
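The sliding-window convolution described above can be illustrated with a toy single-channel sketch. Real convolution layers apply many kernels over many channels; this only shows the window-and-dot-product mechanics.

    import numpy as np

    def convolve2d(image, kernel):
        """Slide the kernel ("sample") over the whole image as a window and take a
        dot product at each position, producing one feature-map channel."""
        kh, kw = kernel.shape
        ih, iw = image.shape
        out = np.zeros((ih - kh + 1, iw - kw + 1), dtype=np.float32)
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out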
Step S303, searching a plurality of second interest areas in the second convolution feature map, and performing pooling processing on the second convolution feature information included in each second interest area to obtain second structure feature information.
Specifically, based on the RPN algorithm, the terminal device searches the second convolution feature map for a plurality of interest regions, referred to as second interest regions; the specific process of searching for interest regions based on the RPN algorithm may refer to step S101 in the embodiment corresponding to fig. 2. Taking one second interest region as an example: when there are multiple second interest regions, the terminal device calculates the second structure feature information of each of them in the same manner. Pooling the second convolution feature information contained in the second interest region means aggregating and summarizing that information; after aggregation, static structure feature information about the second reference object in the target image is obtained, referred to as second structure feature information.
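A possible sketch of the pooling over one interest region is given below. The 7x7 output grid and the use of max-pooling are borrowed from common RoI-pooling practice and are assumptions, as is the requirement that the region be at least as large as the output grid.

    import numpy as np

    def roi_max_pool(feature_map, roi, output_size=(7, 7)):
        """feature_map: (H, W, C); roi: (x0, y0, x1, y1) in feature-map coordinates.
        Aggregates the convolution feature information inside the region into a
        fixed-size grid of structure feature information."""
        x0, y0, x1, y1 = roi
        region = feature_map[y0:y1, x0:x1, :]
        oh, ow = output_size
        h_edges = np.linspace(0, region.shape[0], oh + 1).astype(int)
        w_edges = np.linspace(0, region.shape[1], ow + 1).astype(int)
        pooled = np.zeros((oh, ow, feature_map.shape[2]), dtype=feature_map.dtype)
        for i in range(oh):
            for j in range(ow):
                cell = region[h_edges[i]:h_edges[i + 1], w_edges[j]:w_edges[j + 1], :]
                pooled[i, j, :] = cell.max(axis=(0, 1))
        return pooled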
Step S304, identifying, according to the second structure feature information contained in each second interest region, a second probability that each second interest region contains the second reference object.
Specifically, based on the classifier in the mask region convolutional neural network model, the classifier outputs the matching probability (referred to as a second probability) between each piece of second structure feature information and the second reference object; that is, the terminal device identifies the second probability that each second interest region contains the second reference object. Naturally, the higher the second probability, the more likely the corresponding second interest region is to contain the second reference object. When training this classifier, only two types of samples (positive samples and negative samples) are needed: the positive samples are interest regions containing the second reference object, and the negative samples are interest regions not containing the second reference object.
Step S305, determining the second interest region corresponding to the maximum second probability as an auxiliary target region, and calculating a binary mask of each pixel in the auxiliary target region.
Specifically, since there are a plurality of second interest regions and each second interest region corresponds to one second probability, the terminal device takes the second interest region corresponding to the maximum second probability as the auxiliary target region. The position detection of the second reference object in the target image is completed once the auxiliary target area is determined. Since the second reference object is an irregular figure, in order to improve the accuracy of calculating the pixel size of the second reference object, the terminal device further needs to segment out the specific pixel points corresponding to the second reference object within the auxiliary target area. A binary mask for each pixel in the auxiliary target region is calculated based on the regression mask layer in the mask region convolutional neural network model. The binary mask includes a foreground mask indicating that the corresponding pixel belongs to the second reference object and a background mask indicating that the corresponding pixel belongs to the background.
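As a sketch, the per-pixel decision can be expressed as thresholding the mask branch's output for the auxiliary target region; the 0.5 threshold is an assumed, commonly used value.

    import numpy as np

    def binary_mask_from_logits(mask_logits, threshold=0.5):
        """Pixels whose (sigmoid) mask value exceeds the threshold are foreground
        (the second reference object); the rest are background."""
        probs = 1.0 / (1.0 + np.exp(-mask_logits))    # sigmoid over the mask output
        return (probs >= threshold).astype(np.uint8)  # 1 = foreground mask, 0 = background mask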
Step S306, all pixels corresponding to the binary mask belonging to the foreground mask are combined into a target sub-image, and the size of the reference pixel is determined according to the size of the target sub-image.
Specifically, the terminal device refers to the image formed by combining the pixels belonging to the foreground mask as a target sub-image (such as the target sub-image 20h in fig. 1 c), and determines the reference pixel size of the second reference object according to the size of the target sub-image. For example, the pixel length of the target sub-image, the pixel area of the target sub-image, or the diagonal length of the target sub-image is taken as the reference pixel size. The mask region convolutional neural network model is trained based on Mask-RCNN (an instance segmentation algorithm). Mask-RCNN can accurately obtain the position and the shape of the second reference object in the target image, and for a second reference object of small physical size, the more accurate pixel size ensures that the subsequently determined actual size of the target object has a smaller error.
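A brief sketch of step S306 follows, assuming the binary mask from the previous step; which of the three candidate measures is used as the reference pixel size is left open by the embodiment.

    import numpy as np

    def reference_pixel_size_from_mask(binary_mask):
        """The foreground pixels form the target sub-image; its pixel length, pixel
        area, or diagonal can each serve as the reference pixel size."""
        ys, xs = np.nonzero(binary_mask)
        if ys.size == 0:
            return None
        height = ys.max() - ys.min() + 1
        width = xs.max() - xs.min() + 1
        return {
            "pixel_length": int(max(height, width)),
            "pixel_area": int(binary_mask.sum()),
            "diagonal": float(np.hypot(height, width)),
        }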
Optionally, when the reference object belongs to the first reference object with a fixed shape, based on a convolution layer in the second complete convolution neural network model constructed in the terminal device, the terminal device performs convolution processing on the target image; that is, it randomly selects a small part of feature information in the target image as a sample (convolution kernel) and slides the sample as a window over the entire target image, i.e., performs a convolution operation between the sample and the target image, so as to obtain convolution feature information about the first reference object in the target image. After the convolution processing, the terminal device obtains a convolution feature map containing the convolution feature information of the first reference object in the target image; the obtained convolution feature information is referred to as third convolution feature information, and the obtained convolution feature map is referred to as a third convolution feature map. Based on the RPN algorithm, the terminal device searches the third convolution feature map for a plurality of interest regions, referred to as third interest regions; the specific process of searching for interest regions based on the RPN algorithm may refer to step S101 in the embodiment corresponding to fig. 2. The third convolution feature information contained in each third interest region is pooled, that is, aggregated and summarized; after aggregation, static structure feature information about the first reference object in the target image is obtained, referred to as third structure feature information. Based on the classifier in the second complete convolution neural network model, the terminal device identifies the matching probability (referred to as a third probability) between each piece of third structure feature information and the first reference object, that is, identifies the third probability that each third interest region contains the first reference object. Naturally, the higher the third probability, the more likely the corresponding third interest region is to contain the first reference object. When training this classifier, only two types of samples (positive samples and negative samples) are needed: the positive samples are interest regions containing the first reference object, and the negative samples are interest regions not containing the first reference object. The second complete convolution neural network model is trained based on the object detection algorithm R-FCN. Since there are a plurality of third interest regions and each third interest region corresponds to one third probability, the terminal device takes the third interest region corresponding to the maximum third probability as the second target region. The terminal device converts the image in the second target region into a grayscale image, extracts the edge information of the grayscale image based on a gradient operator, and obtains a gradient image corresponding to the image in the second target region, referred to as an edge gradient image. Based on the equal-step Hough transform, the terminal device determines a continuous curve and a circle center in the edge gradient image, and the continuous curve is referred to as a second target curve.
The terminal device may determine a diameter corresponding to the second target curve, referred to as a second target diameter, according to the second target curve and the center of the circle. The second target diameter is taken as the reference pixel size of the reference object.
Step S307, obtaining a reference actual size of the reference object, and determining a target actual size of the target object according to the target pixel size, the reference pixel size, and the reference actual size.
The specific implementation manner of step S307 may refer to step S103 in the embodiment corresponding to fig. 2.
The method comprises the steps of acquiring a target image containing a target object and a reference object, detecting the pixel size of the target object in the target image as the target pixel size, and detecting the pixel size of the reference object in the target image as the reference pixel size; and acquiring a reference actual size of the reference object, and determining the target actual size of the target object according to the target pixel size, the reference pixel size and the reference actual size. According to the method, the pixel size of the target object and the pixel size of the reference object are respectively detected from the image containing both objects, the real size of the reference object is obtained, and the real size of the target object can be determined according to the proportional relation, so that the real size of the target object is determined automatically, manual size measurement is avoided, and the efficiency of measuring the size of the target object is improved.
Further, please refer to fig. 5, which is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention. As shown in fig. 5, the image processing apparatus 1 may include: an obtaining module 11, a first detecting module 12, a second detecting module 13, and a determining module 14.
An obtaining module 11, configured to acquire a target image containing a target object and a reference object;
specifically, the obtaining module 11 receives a service request associated with a target object, and obtains a target image according to the service request, where the target image includes the target object and a reference object, an object used for calculating a real size is referred to as the target object, and an object used for comparing with the target object is referred to as the reference object.
A first detecting module 12, configured to detect a pixel size of the target object in the target image as a target pixel size;
specifically, based on the convolution layer in the constructed first complete convolution neural network model, the first detection module 12 performs convolution processing on the target image to obtain a convolution feature map, which contains the convolution feature information of the target object in the target image; the first detection module 12 refers to the convolution feature information of the target object as first convolution feature information, and refers to the convolution feature map formed by combining the first convolution feature information as a first convolution feature map. Based on an RPN (Region Proposal Network) algorithm, a plurality of interest regions, referred to as first interest regions, are searched in the first convolution feature map. The first convolution feature information contained in each first interest region is pooled, that is, aggregated and summarized; after aggregation, static structure feature information about the target object in the target image is obtained, referred to as first structure feature information. Based on the classifier in the first complete convolution neural network model, the matching degrees, referred to as first matching degrees, between the aggregated first structure feature information in each first interest region and the plurality of attribute type features in the first complete convolution neural network model are identified; that is, each first interest region corresponds to a plurality of first matching degrees, the number of first matching degrees corresponding to each first interest region is the same as the number of attribute type features in the first complete convolution neural network model, and each attribute type feature corresponds to one attribute category. Among the plurality of first matching degrees corresponding to each first interest region, the first detection module 12 takes the maximum first matching degree as the confidence of that first interest region, and takes the attribute category of the attribute type feature corresponding to the maximum first matching degree as the tag information of the object in the first interest region.
A second detection module 13, configured to detect a pixel size of the reference object in the target image as a reference pixel size;
specifically, the second detection module 13 continues to detect the pixel size of the reference object in the target image, referred to as the reference pixel size. First, the second detection module 13 detects whether the target image is a color image; if so, the target image needs to be converted to grayscale, and the grayscale image is referred to as a target grayscale image; if the target image is already a grayscale image, no conversion is performed, and the target image is still referred to as a target grayscale image. The second detection module 13 identifies connected regions in the target grayscale image, referred to as first reference regions, based on edge detection. Edge detection can identify the contour lines of the target object and the reference object in the target grayscale image, and the area enclosed by a continuous, closed contour line can be taken as a connected region. The second detection module 13 inputs the grayscale image in each first reference region into a classification model trained in advance, in which the classifier is a binary classifier; the binary classifier can calculate the probability, referred to as a first probability, that the first reference region contains the reference object, and naturally, the higher the first probability, the more likely the corresponding first reference region is to contain the reference object.
A determining module 14, configured to obtain a reference actual size of the reference object, and determine a target actual size of the target object according to the target pixel size, the reference pixel size, and the reference actual size.
Specifically, the determining module 14 obtains an actual size (referred to as a reference actual size) of the reference object, and calculates the actual size (referred to as a target actual size) of the target object according to a proportional relationship (specifically, the proportional relationship is that a ratio between the reference pixel size and the reference actual size is the same as a ratio between the target pixel size and the target actual size) based on the target pixel size of the target object, the reference pixel size of the reference object, and the reference actual size of the reference object.
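As a one-line sketch of that proportional relationship (the numeric values in the usage comment are purely illustrative):

    def target_actual_size(target_px, reference_px, reference_actual):
        """The ratio of pixel size to actual size is the same for the reference object
        and the target object, so target_actual = target_px * reference_actual / reference_px."""
        return target_px * reference_actual / reference_px

    # e.g. an object spanning 300 px next to a 25 mm reference spanning 150 px measures 50 mm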
For specific functional implementation manners of the obtaining module 11, the first detecting module 12, the second detecting module 13, and the determining module 14, reference may be made to steps S101 to S103 in the corresponding embodiment of fig. 2, which is not described herein again.
Referring to fig. 5, the first detecting module 12 may include: a first convolution unit 121, a search unit 122.
A first convolution unit 121, configured to perform convolution processing on the target image based on a convolution layer in the first complete convolution neural network model to obtain a first convolution feature map formed by combining first convolution feature information;
a searching unit 122, configured to search a plurality of first interest regions in the first convolution feature map;
the first convolution unit 121 is further configured to perform pooling processing on the first convolution feature information included in each first interest region to obtain first structure feature information, and identify first matching degrees between the first structure feature information included in each first interest region and the multiple attribute type features in the first complete convolution neural network model;
the first convolution unit 121 is further configured to use the maximum first matching degree as a confidence corresponding to the first interest region in the plurality of first matching degrees corresponding to each first interest region;
the first convolution unit 121 is further configured to, in the confidence degrees corresponding to the multiple first interest regions, use the first interest region corresponding to the maximum confidence degree as the first target region, and use the size of the first target region as the target pixel size.
For specific functional implementation manners of the first convolution unit 121 and the search unit 122, reference may be made to step S101 in the corresponding embodiment of fig. 2, which is not described herein again.
Referring to fig. 5, the search unit 122 may include: score calculation subunit 1221, score determination subunit 1222.
A score calculation subunit 1221, configured to slide windows on the first convolution feature map, and determine, in each window, a plurality of candidate regions based on anchor boxes at different scales;
the score calculating subunit 1221 is further configured to calculate a target foreground score of the first convolution feature information in each candidate region;
a score determining subunit 1222, configured to determine the plurality of first regions of interest according to the target foreground score of each candidate region.
The specific functional implementation manner of the score calculating subunit 1221 and the score determining subunit 1222 may refer to step S101 in the corresponding embodiment of fig. 2, which is not described herein again.
Referring to fig. 5, the score determining subunit 1222 may include: a first determining subunit 12221 and a deletion-retention subunit 12222.
A first determining subunit 12221, configured to determine, as the first auxiliary region, a candidate region where the target foreground score is greater than a score threshold;
the first determining subunit 12221, further configured to determine, as the second auxiliary region, the first auxiliary region having the largest target foreground score, in the first auxiliary region;
a deletion-retention subunit 12222, configured to calculate the overlapping areas between the second auxiliary region and the remaining first auxiliary regions, delete the first auxiliary regions whose overlapping areas are greater than an area threshold, and retain the first auxiliary regions whose overlapping areas are less than or equal to the area threshold;
the first determining subunit 12221 is further configured to determine the second auxiliary area and the reserved first auxiliary area as the first region of interest.
For specific functional implementation manners of the first determining subunit 12221 and the deletion-retention subunit 12222, reference may be made to step S101 in the embodiment corresponding to fig. 2, and details are not described here again.
Referring to fig. 5, the second detection module 13 may include: a conversion unit 131, an area determination unit 132, a selection unit 133, a size determination unit 134.
A conversion unit 131 for converting the target image into a target grayscale image;
a region determining unit 132, configured to use a connected region in the target grayscale image as a first reference region;
the converting unit 131 is further configured to input the grayscale images in the plurality of first reference regions into a classification model, and identify a first probability that each first reference region contains the reference object;
a selecting unit 133, configured to select, according to the first probability, a first reference region that meets a matching condition from the plurality of first reference regions as a second reference region;
a size determining unit 134, configured to determine the reference pixel size according to the size of the second reference area.
For specific functional implementation manners of the converting unit 131, the area determining unit 132, the selecting unit 133, and the size determining unit 134, reference may be made to step S102 in the corresponding embodiment of fig. 2, which is not described herein again.
Referring to fig. 5, the area determination unit 132 may include: a gradient operator subunit 1321 and a first detection subunit 1322.
The gradient operator subunit 1321 is configured to calculate a first gradient map corresponding to the target grayscale image according to a gradient operator, and perform a closing operation on the first gradient map to obtain a second gradient map;
a first detecting subunit 1322, configured to detect the connected regions of the first gradient map and the second gradient map, respectively, and correct the detected connected regions to obtain a plurality of auxiliary reference areas;
the first detecting subunit 1322 is further configured to use a region having the same position information as the auxiliary reference region in the target grayscale image as the first reference region.
The specific functional implementation manners of the gradient operator subunit 1321 and the first detecting subunit 1322 can refer to step S201 in the embodiment corresponding to fig. 3, and are not described herein again.
Referring to fig. 5, the selection unit 133 may include: a first extraction sub-unit 1331, a second determination sub-unit 1332.
A first extraction subunit 1331 configured to extract a maximum first probability among the plurality of first probabilities;
a second determining subunit 1332, configured to, if the maximum first probability is less than or equal to a probability threshold, determine a weight of the first reference region according to the position information of the first reference region, the length-width ratio of the first reference region, and the size of the first reference region, determine the first reference region with the largest weight as the first reference region that satisfies the matching condition, and determine the first reference region with the largest weight as the second reference region;
the second determining subunit 1332 is further configured to, if the maximum first probability is greater than the probability threshold, determine the first reference region corresponding to the maximum first probability as the first reference region meeting the matching condition, and determine the first reference region corresponding to the maximum first probability as the second reference region.
For specific functional implementation manners of the first extracting sub-unit 1331 and the second determining sub-unit 1332, reference may be made to step S203 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 5, the size determining unit 134 may include: a second extraction subunit 1341, a second detection subunit 1342.
A second extracting subunit 1341, configured to, when the reference object belongs to a first reference object with a fixed shape, extract gray-scale image edge information in the second reference region to obtain an auxiliary gradient image;
a second detecting subunit 1342, configured to detect a continuous curve in the auxiliary gradient image as a first target curve, and determine a first target diameter according to the first target curve;
the second detecting subunit 1342 is further configured to determine the first target diameter as the reference pixel size.
For specific functional implementation manners of the second extracting subunit 1341 and the second detecting subunit 1342, refer to step S204 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 5, the size determining unit 134 may include: a second extraction subunit 1341 and a second detection subunit 1342; and may further include: a clustering subunit 1343 and a size determination subunit 1344.
A clustering subunit 1343, configured to, when the reference object belongs to a second reference object whose shape is not fixed, take an area having the same position information as the second reference area in the target image as a third reference area;
the clustering subunit 1343 is further configured to perform color clustering processing on the images in the third reference region according to the colors of the images in the third reference region, so as to obtain a clustering result region;
a size determining subunit 1344, configured to determine the reference pixel size according to the size of the clustering result region.
For specific functional implementation manners of the clustering subunit 1343 and the size determining subunit 1344, reference may be made to step S204 in the embodiment corresponding to fig. 3, which is not described herein again.
Referring to fig. 5, the second detection module 13 may include the conversion unit 131, the area determination unit 132, the selection unit 133, and the size determination unit 134, and may further include: a second convolution unit 135, a recognition unit 136, and a calculation unit 137.
A second convolution unit 135, configured to, when the reference object belongs to a second reference object with an unfixed shape, perform convolution processing on the target image based on a convolution layer in the mask region convolution neural network model to obtain a second convolution feature map formed by combining second convolution feature information;
the identifying unit 136 is configured to search a plurality of second interest areas in the second convolution feature map, and perform pooling processing on second convolution feature information included in each second interest area to obtain second structure feature information;
the identifying unit 136 is further configured to identify a second probability that each second interest region includes the second reference object according to second feature information included in the second interest region;
the calculating unit 137 is configured to determine a second interest region corresponding to the maximum second probability as an auxiliary target region, and calculate a binary mask of each pixel in the auxiliary target region;
the calculating unit 137 is further configured to combine all pixels corresponding to the binary mask belonging to the foreground mask into a target sub-image, and determine the size of the reference pixel according to the size of the target sub-image.
The specific functional implementation manners of the second convolution unit 135, the identification unit 136, and the calculation unit 137 may refer to steps S302 to S306 in the embodiment corresponding to fig. 4, which is not described herein again.
Referring to fig. 5, the second detecting module 13 may include the conversion unit 131, the area determination unit 132, the selection unit 133, the size determination unit 134, the second convolution unit 135, the identification unit 136, and the calculation unit 137, and may further include: a third convolution unit 138 and an extraction unit 139.
A third convolution unit 138, configured to, when the reference object belongs to the first reference object with a fixed shape, perform convolution processing on the target image based on the convolution layer in the second complete convolution neural network model to obtain a third convolution feature map formed by combining third convolution feature information;
the third convolution unit 138 is further configured to search a plurality of third interest areas in the third convolution feature map, and perform pooling processing on third convolution feature information included in each third interest area to obtain third structure feature information;
the third convolution unit 138 is further configured to identify, according to third feature information included in each third interest region, a third probability that each third interest region includes the first reference object;
the third convolution unit 138 is further configured to determine, as the second target region, a third region of interest with a maximum third probability among the third probabilities corresponding to the plurality of third regions of interest;
an extracting unit 139, configured to extract edge information of the image in the second target region to obtain an edge gradient image;
the third convolution unit 138 is further configured to detect a continuous curve in the edge gradient image as a second target curve, determine a second target diameter according to the second target curve, and determine the second target diameter as the reference pixel size.
The specific functional implementation manner of the third convolution unit 138 and the extraction unit 139 may refer to step S306 in the embodiment corresponding to fig. 4, which is not described herein again.
Referring to fig. 5, the obtaining module 11 is specifically configured to:
and receiving a service request associated with the target object, and acquiring a target image containing the target object and the reference object according to the service request.
The image processing apparatus further includes:
a display module 15, configured to take the attribute type of the attribute type feature corresponding to the confidence of the first target region as the tag information corresponding to the first target region, and determine, according to the tag information corresponding to the first target region and the target actual size, target service data associated with the service request.
The display module 15 is further configured to display target service data and store the target service data; the target service data comprises: service claim amount and physical sign information of the target object;
a sending module 16, configured to send the target service data to a service terminal associated with the service request.
The specific functional implementation manner of the obtaining module 11 may refer to step S101 in the embodiment corresponding to fig. 2, which is not described herein again; the specific functional implementation manner of the display module 15 and the sending module 16 may refer to step S103 in the corresponding embodiment of fig. 2, which is not described herein again.
The method comprises the steps of respectively detecting the pixel size of a target object in an image and the pixel size of a reference object in the image from the image containing the target object and the reference object, then obtaining the real size of the reference object, and determining the real size of the target object according to the proportional relation, so that the real size of the target object can be automatically determined, the size is prevented from being measured in a manual mode, and the efficiency of measuring the size of the target object is improved.
Further, please refer to fig. 6, which is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the image processing apparatus 1 in fig. 5 may be applied to the electronic device 1000, and the electronic device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005; the electronic device 1000 may further include: a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to implement connection communication among these components. The user interface 1003 may include a Display screen (Display) and a Keyboard (Keyboard), and optionally the user interface 1003 may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the processor 1001. As shown in fig. 6, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the electronic device 1000 shown in fig. 6, the network interface 1004 may provide a network communication function, the user interface 1003 mainly provides an input interface for the user, and the processor 1001 may be configured to invoke the device control application program stored in the memory 1005 to implement:
acquiring a target image containing the target object and a reference object, detecting the pixel size of the target object in the target image as a target pixel size, and detecting the pixel size of the reference object in the target image as a reference pixel size;
and acquiring a reference actual size of the reference object, and determining the target actual size of the target object according to the target pixel size, the reference pixel size and the reference actual size.
In one embodiment, when the processor 1001 performs the detection of the pixel size of the target object in the target image as the target pixel size, specifically performs the following steps:
performing convolution processing on the target image based on a convolution layer in a first complete convolution neural network model to obtain a first convolution characteristic diagram formed by combining first convolution characteristic information;
searching a plurality of first interest areas in the first convolution feature map, and performing pooling processing on first convolution feature information contained in each first interest area to obtain first structure feature information;
identifying a first degree of matching of first structure feature information contained in each first region of interest to a plurality of attribute type features in the first complete convolution neural network model;
taking the maximum first matching degree as the confidence degree corresponding to the first interest region in a plurality of first matching degrees corresponding to each first interest region;
and in the confidence degrees corresponding to the multiple first interest regions, taking the first interest region corresponding to the maximum confidence degree as a first target region, and taking the size of the first target region as the target pixel size.
In an embodiment, when the processor 1001 searches for a plurality of first interest areas in the first volume feature map, it specifically performs the following steps:
sliding windows on the first convolution feature map, and determining a plurality of candidate regions in each window based on anchor boxes under different scales;
and respectively calculating a target foreground score of the first convolution feature information in each candidate region, and determining the plurality of first interest regions according to the target foreground score of each candidate region.
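The candidate-region generation just described can be sketched as follows; the stride, scales, and aspect ratios of the anchor boxes are assumptions, since the embodiment only states that anchor boxes at different scales are laid down at each sliding-window position.

    import numpy as np

    def generate_anchors(feature_shape, scales=(64, 128, 256),
                         ratios=(0.5, 1.0, 2.0), stride=16):
        """At every sliding-window position on the convolution feature map, lay down
        anchor boxes of several scales and aspect ratios; each box later receives a
        target foreground score."""
        h, w = feature_shape[:2]
        anchors = []
        for y in range(h):
            for x in range(w):
                cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # centre in image coords
                for s in scales:
                    for r in ratios:
                        aw, ah = s * np.sqrt(r), s / np.sqrt(r)
                        anchors.append((cx - aw / 2, cy - ah / 2,
                                        cx + aw / 2, cy + ah / 2))
        return np.array(anchors, dtype=np.float32)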
In one embodiment, when the processor 1001 determines the plurality of first interest areas according to the target foreground score of each candidate area, the following steps are specifically performed:
determining candidate areas with the target foreground score larger than a score threshold value as first auxiliary areas;
determining a first auxiliary area with the largest target foreground score as a second auxiliary area in the first auxiliary area;
respectively calculating the overlapping areas between the second auxiliary area and the rest first auxiliary areas, deleting the first auxiliary areas with the overlapping areas larger than an area threshold value, and reserving the first auxiliary areas with the overlapping areas smaller than or equal to the area threshold value;
determining the second auxiliary area and the reserved first auxiliary area as the first region of interest.
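The region filtering described in these steps amounts to a single round of non-maximum suppression, sketched below. Measuring the overlap as a ratio of the candidate's area, and the two threshold values, are assumptions; the embodiment only specifies a score threshold and an area threshold.

    def filter_candidate_regions(boxes, scores, score_threshold=0.7, area_threshold=0.3):
        """boxes are (x0, y0, x1, y1); returns the regions kept as first interest regions."""
        keep_idx = [i for i, s in enumerate(scores) if s > score_threshold]  # first auxiliary regions
        if not keep_idx:
            return []
        best = max(keep_idx, key=lambda i: scores[i])                        # second auxiliary region

        def overlap(a, b):
            x0, y0 = max(a[0], b[0]), max(a[1], b[1])
            x1, y1 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0, x1 - x0) * max(0, y1 - y0)
            area_a = (a[2] - a[0]) * (a[3] - a[1])
            return inter / area_a if area_a else 0.0

        kept = [boxes[best]]
        for i in keep_idx:
            if i != best and overlap(boxes[i], boxes[best]) <= area_threshold:
                kept.append(boxes[i])   # delete regions overlapping the best one too much
        return kept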
In one embodiment, when the processor 1001 detects the pixel size of the reference object in the target image as the reference pixel size, the following steps are specifically performed:
converting the target image into a target gray image, and taking a connected region in the target gray image as a first reference area;
respectively inputting the gray level images in the plurality of first reference regions into a classification model, and identifying a first probability that each first reference region contains the reference object;
according to the first probability, selecting a first reference region which meets a matching condition from the plurality of first reference regions as a second reference region;
and determining the size of the reference pixel according to the size of the second reference area.
In an embodiment, when the processor 1001 executes the step of taking the connected region in the target grayscale image as the first reference region, the following steps are specifically executed:
calculating a first gradient map corresponding to the target gray image according to a gradient operator, and performing closed operation on the first gradient map to obtain a second gradient map;
respectively detecting the connected regions of the first gradient map and the second gradient map, and correcting the detected connected regions to obtain a plurality of auxiliary reference areas;
and taking the area with the same position information as the auxiliary reference area in the target gray scale image as the first reference area.
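A compact sketch of these three steps is given below, assuming OpenCV (version 4 return conventions). The Sobel operator stands in for the unspecified gradient operator, and the closing kernel size and binarisation threshold are assumed values.

    import cv2

    def first_reference_regions(gray):
        """First gradient map, closed second gradient map, and connected regions
        (contours) detected on both; returns bounding boxes of the auxiliary regions."""
        gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
        grad1 = cv2.convertScaleAbs(cv2.magnitude(gx, gy))           # first gradient map
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
        grad2 = cv2.morphologyEx(grad1, cv2.MORPH_CLOSE, kernel)     # second gradient map
        regions = []
        for g in (grad1, grad2):
            _, binary = cv2.threshold(g, 40, 255, cv2.THRESH_BINARY)
            contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            regions.extend(cv2.boundingRect(c) for c in contours)    # (x, y, w, h) per connected region
        return regions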
In an embodiment, when the processor 1001 selects, according to the first probability, a first reference region that satisfies a matching condition from the plurality of first reference regions as a second reference region, the following steps are specifically performed:
extracting a maximum first probability from the plurality of first probabilities;
if the maximum first probability is less than or equal to a probability threshold, determining the weight of the first reference region according to the position information of the first reference region, the length-width ratio of the first reference region and the size of the first reference region, determining the first reference region with the maximum weight as the first reference region meeting the matching condition, and determining the first reference region with the maximum weight as the second reference region;
if the maximum first probability is greater than the probability threshold, determining the first reference region corresponding to the maximum first probability as the first reference region satisfying the matching condition, and determining the first reference region corresponding to the maximum first probability as the second reference region.
In one embodiment, when the processor 1001 determines the reference pixel size according to the size of the second reference area, the following steps are specifically performed:
when the reference object belongs to a first reference object with a fixed shape, extracting gray level image edge information in the second reference area to obtain an auxiliary gradient image;
detecting a continuous curve in the auxiliary gradient image to be used as a first target curve, and determining a first target diameter according to the first target curve;
determining the first target diameter as the reference pixel size.
In one embodiment, when the processor 1001 determines the reference pixel size according to the size of the second reference region, the following steps are specifically performed:
when the reference object belongs to a second reference object of which the shape is not fixed, taking an area having the same position information as the second reference area in the target image as a third reference area;
according to the color of the image in the third reference area, carrying out color clustering processing on the image in the third reference area to obtain a clustering result area;
and determining the size of the reference pixel according to the size of the clustering result area.
In one embodiment, when the processor 1001 performs the detection of the pixel size of the reference object in the target image as the reference pixel size, specifically performs the following steps:
when the reference object belongs to a second reference object with an unfixed shape, performing convolution processing on the target image based on a convolution layer in a mask region convolution neural network model to obtain a second convolution characteristic diagram formed by combining second convolution characteristic information;
searching a plurality of second interest areas in the second convolution feature map, and performing pooling processing on second convolution feature information contained in each second interest area to obtain second structure feature information;
identifying a second probability that each second interest region contains the second reference object according to second feature information contained in each second interest region;
determining a second interest area corresponding to the maximum second probability as an auxiliary target area, and calculating a binary mask of each pixel in the auxiliary target area;
and combining all pixels corresponding to the binary masks belonging to the foreground masks into a target sub-image, and determining the size of the reference pixel according to the size of the target sub-image.
In one embodiment, when detecting the pixel size of the reference object in the target image as the reference pixel size, the processor 1001 specifically performs the following steps:
when the reference object belongs to a first reference object with a fixed shape, performing convolution processing on the target image based on a convolution layer in a second complete convolution neural network model to obtain a third convolution characteristic diagram formed by combining third convolution characteristic information;
searching a plurality of third interest areas in the third convolution feature map, and performing pooling processing on third convolution feature information contained in each third interest area to obtain third structure feature information;
identifying a third probability that each third interest region contains the first reference object according to third feature information contained in each third interest region;
among the third probabilities corresponding to the plurality of third interest areas, determining the third interest area with the maximum third probability as the second target area;
extracting edge information of the image in the second target area to obtain an edge gradient image;
and detecting a continuous curve in the edge gradient image to be used as a second target curve, determining a second target diameter according to the second target curve, and determining the second target diameter as the reference pixel size.
In one embodiment, when acquiring a target image including a target object and a reference object, the processor 1001 specifically performs the following steps:
receiving a service request associated with the target object, and acquiring a target image containing the target object and the reference object according to the service request;
the processor 1001 further performs the following steps:
taking the attribute type of the attribute type feature corresponding to the confidence degree of the first target area as the label information corresponding to the first target area, and determining target service data associated with the service request according to the label information corresponding to the first target area and the target actual size;
displaying target service data and storing the target service data; the target service data comprises: the service claim amount and the physical sign information of the target object;
and sending the target service data to a service terminal associated with the service request.
The method comprises the steps of acquiring a target image containing a target object and a reference object, detecting the pixel size of the target object in the target image as the target pixel size, and detecting the pixel size of the reference object in the target image as the reference pixel size; and acquiring a reference actual size of the reference object, and determining the target actual size of the target object according to the target pixel size, the reference pixel size and the reference actual size. In the above way, the pixel size of the target object and the pixel size of the reference object are respectively detected from the image containing both objects, the real size of the reference object is obtained, and the real size of the target object can be determined according to the proportional relation, so that the real size of the target object is determined automatically, manual size measurement is avoided, and the efficiency of measuring the size of the target object is improved.
It should be understood that the electronic device 1000 described in the embodiment of the present invention may perform the description of the image processing method in the embodiment corresponding to fig. 2 to fig. 4, and may also perform the description of the image processing apparatus 1 in the embodiment corresponding to fig. 5, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
Furthermore, it is to be noted here that: an embodiment of the present invention further provides a computer storage medium, and the computer storage medium stores the aforementioned computer program executed by the image processing apparatus 1, and the computer program includes program instructions, and when the processor executes the program instructions, the description of the image processing method in the embodiment corresponding to fig. 2 to fig. 4 can be performed, so that details are not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer storage medium to which the present invention relates, reference is made to the description of the method embodiments of the present invention.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present invention and is not intended to limit the scope of the present invention, which is defined by the appended claims.

Claims (14)

1. An image processing method, comprising:
acquiring a target image containing a target object and a reference object, detecting the pixel size of the target object in the target image as a target pixel size, and detecting the pixel size of the reference object in the target image as a reference pixel size;
acquiring a reference actual size of the reference object, and determining a target actual size of the target object according to the target pixel size, the reference pixel size and the reference actual size;
wherein the detecting a pixel size of the reference object in the target image as a reference pixel size includes:
converting the target image into a target gray image, and taking a connected region in the target gray image as a first reference area;
respectively inputting the gray level images in the plurality of first reference regions into a classification model, and identifying a first probability that each first reference region contains the reference object;
according to the first probability, selecting a first reference area meeting a matching condition from the plurality of first reference areas as a second reference area;
and determining the size of the reference pixel according to the size of the second reference area.
2. The method according to claim 1, wherein the detecting a pixel size of the target object in the target image as a target pixel size comprises:
performing convolution processing on the target image based on a convolution layer in a first complete convolution neural network model to obtain a first convolution characteristic diagram formed by combining first convolution characteristic information;
searching a plurality of first interest areas in the first convolution feature map, and performing pooling processing on first convolution feature information contained in each first interest area to obtain first structure feature information;
identifying a first degree of matching of first structure feature information contained in each first region of interest to a plurality of attribute type features in the first complete convolution neural network model;
taking the maximum first matching degree as the confidence degree corresponding to the first interest region in a plurality of first matching degrees corresponding to each first interest region;
and in the confidence degrees corresponding to the multiple first interest regions, taking the first interest region corresponding to the maximum confidence degree as a first target region, and taking the size of the first target region as the target pixel size.
3. The method of claim 2, wherein searching the first convolution feature map for a plurality of first regions of interest comprises:
sliding windows on the first convolution feature map, determining a plurality of candidate regions in each window based on anchor boxes at different scales;
and respectively calculating a target foreground score of the first convolution feature information in each candidate region, and determining the plurality of first interest regions according to the target foreground score of each candidate region.
4. The method of claim 3, wherein the determining the first plurality of regions of interest according to the target foreground score of each candidate region comprises:
determining candidate areas with the target foreground score larger than a score threshold value as first auxiliary areas;
determining a first auxiliary area with the largest target foreground score as a second auxiliary area in the first auxiliary area;
respectively calculating the overlapping areas between the second auxiliary area and the rest first auxiliary areas, deleting the first auxiliary areas with the overlapping areas larger than an area threshold value, and reserving the first auxiliary areas with the overlapping areas smaller than or equal to the area threshold value;
determining the second auxiliary area and the reserved first auxiliary area as the first region of interest.
5. The method according to claim 1, wherein the using the connected region in the target gray-scale image as a first reference region comprises:
calculating a first gradient map corresponding to the target gray image according to a gradient operator, and performing closed operation on the first gradient map to obtain a second gradient map;
respectively detecting the connected regions of the first gradient map and the second gradient map, and correcting the detected connected regions to obtain a plurality of auxiliary reference areas;
and taking the area with the same position information as the auxiliary reference area in the target gray scale image as the first reference area.
6. The method according to claim 1, wherein the selecting, according to the first probability, a first reference region satisfying a matching condition from the plurality of first reference regions as a second reference region comprises:
extracting the maximum first probability from the plurality of first probabilities;
if the maximum first probability is less than or equal to a probability threshold, determining a weight for each first reference region according to the position information, the aspect ratio and the size of the first reference region, determining the first reference region with the largest weight as the first reference region satisfying the matching condition, and determining that first reference region as the second reference region;
and if the maximum first probability is greater than the probability threshold, determining the first reference region corresponding to the maximum first probability as the first reference region satisfying the matching condition, and determining that first reference region as the second reference region.
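Illustrative note (not part of the claims): claim 6 falls back to a weight built from position, aspect ratio and size only when the classifier is not confident enough. A minimal sketch; the concrete weight formula, the probability threshold and the `image_size` parameter are illustrative assumptions, since the claim does not specify how the three quantities are combined:

```python
def select_second_reference_region(regions, probs, prob_threshold=0.8,
                                   image_size=(1000, 1000)):
    """Pick the second reference region: trust the classifier when its best
    probability clears the threshold, otherwise fall back to a heuristic
    weight over position, aspect ratio and size (all values illustrative)."""
    best_idx = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best_idx] > prob_threshold:
        # Confident classification: take the most probable region directly
        return regions[best_idx]

    def weight(region):
        x, y, w, h = region
        cx, cy = x + w / 2.0, y + h / 2.0
        # Favour regions near the image centre ...
        centredness = 1.0 - (abs(cx - image_size[0] / 2.0) / image_size[0]
                             + abs(cy - image_size[1] / 2.0) / image_size[1])
        # ... with a near-square aspect ratio and a larger area
        aspect = min(w, h) / float(max(w, h))
        return centredness * aspect * w * h

    return max(regions, key=weight)
```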
7. The method of claim 1, wherein determining the reference pixel size based on the size of the second reference region comprises:
when the reference object is a first reference object with a fixed shape, extracting the gray-level edge information in the second reference region to obtain an auxiliary gradient image;
detecting a continuous curve in the auxiliary gradient image as a first target curve, and determining a first target diameter according to the first target curve;
and determining the first target diameter as the reference pixel size.
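Illustrative note (not part of the claims): for a fixed-shape reference object such as a coin, the diameter of the detected outline can serve as the reference pixel size. A minimal OpenCV sketch, assuming the OpenCV 4 `findContours` signature and illustrative Canny thresholds; taking the longest contour and its minimum enclosing circle is one plausible reading of "continuous curve" and "target diameter":

```python
import cv2

def reference_diameter_from_region(gray_region):
    """Edge map of the second reference region, longest continuous curve,
    and its minimum enclosing circle; returns the diameter in pixels."""
    # Auxiliary gradient image from the grey-level edge information
    edges = cv2.Canny(gray_region, 50, 150)        # thresholds illustrative
    # Continuous curves: contours of the edge map (OpenCV 4 signature)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    # First target curve: the longest detected curve
    target_curve = max(contours, key=lambda c: cv2.arcLength(c, True))
    # First target diameter: diameter of its minimum enclosing circle
    (_, _), radius = cv2.minEnclosingCircle(target_curve)
    return 2.0 * radius
```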
8. The method of claim 1, wherein determining the reference pixel size based on the size of the second reference region comprises:
when the reference object is a second reference object without a fixed shape, taking the region in the target image whose position information is the same as that of the second reference region as a third reference region;
performing color clustering on the image in the third reference region according to the colors of that image to obtain a clustering result region;
and determining the reference pixel size according to the size of the clustering result region.
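Illustrative note (not part of the claims): a minimal sketch of the colour-clustering step of claim 8 using OpenCV k-means. The number of clusters, the termination criteria and the choice of the largest cluster as the clustering result region are illustrative assumptions:

```python
import cv2
import numpy as np

def cluster_result_region(bgr_region, k=2):
    """Colour-cluster the pixels of the third reference region with k-means
    and return the bounding box of the largest cluster."""
    pixels = bgr_region.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, _ = cv2.kmeans(pixels, k, None, criteria, 5,
                              cv2.KMEANS_RANDOM_CENTERS)
    labels = labels.reshape(bgr_region.shape[:2])
    # Take the largest cluster as the clustering result region
    dominant = np.bincount(labels.ravel()).argmax()
    ys, xs = np.where(labels == dominant)
    w = int(xs.max() - xs.min() + 1)
    h = int(ys.max() - ys.min() + 1)
    # (w, h) is then used as the reference pixel size
    return (int(xs.min()), int(ys.min()), w, h)
```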
9. The method according to claim 1, wherein the detecting a pixel size of the reference object in the target image as a reference pixel size comprises:
when the reference object is a second reference object without a fixed shape, performing convolution processing on the target image based on a convolution layer in a mask region convolutional neural network model to obtain a second convolution feature map formed by combining second convolution feature information;
searching the second convolution feature map for a plurality of second regions of interest, and performing pooling processing on the second convolution feature information contained in each second region of interest to obtain second structure feature information;
identifying, according to the second structure feature information contained in each second region of interest, a second probability that the second region of interest contains the second reference object;
determining the second region of interest corresponding to the maximum second probability as an auxiliary target region, and calculating a binary mask for each pixel in the auxiliary target region;
and combining all pixels whose binary masks belong to the foreground mask into a target sub-image, and determining the reference pixel size according to the size of the target sub-image.
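Illustrative note (not part of the claims): once the mask branch has labelled each pixel of the auxiliary target region as foreground or background, the reference pixel size follows from the extent of the foreground pixels. A minimal NumPy sketch:

```python
import numpy as np

def reference_size_from_mask(binary_mask):
    """Given the per-pixel binary mask of the auxiliary target region
    (1 = foreground, 0 = background), gather the foreground pixels into the
    target sub-image and return its pixel width and height."""
    ys, xs = np.nonzero(binary_mask)
    if xs.size == 0:
        return None
    # Reference pixel size = extent of the foreground pixels
    width = int(xs.max() - xs.min() + 1)
    height = int(ys.max() - ys.min() + 1)
    return width, height

mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:60, 30:80] = 1
print(reference_size_from_mask(mask))   # (50, 40)
```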
10. The method according to claim 1, wherein the detecting a pixel size of the reference object in the target image as a reference pixel size comprises:
when the reference object is a first reference object with a fixed shape, performing convolution processing on the target image based on a convolution layer in a second complete convolutional neural network model to obtain a third convolution feature map formed by combining third convolution feature information;
searching the third convolution feature map for a plurality of third regions of interest, and performing pooling processing on the third convolution feature information contained in each third region of interest to obtain third structure feature information;
identifying, according to the third structure feature information contained in each third region of interest, a third probability that the third region of interest contains the first reference object;
among the third probabilities corresponding to the plurality of third regions of interest, determining the third region of interest with the maximum third probability as a second target region;
extracting the edge information of the image in the second target region to obtain an edge gradient image;
and detecting a continuous curve in the edge gradient image as a second target curve, determining a second target diameter according to the second target curve, and determining the second target diameter as the reference pixel size.
11. The method of claim 2, wherein the obtaining a target image containing a target object and a reference object comprises:
receiving a service request associated with the target object, and acquiring a target image containing the target object and the reference object according to the service request;
the method further comprises:
taking the attribute type of the attribute type feature corresponding to the confidence of the first target region as the label information corresponding to the first target region, and determining target service data associated with the service request according to the label information corresponding to the first target region and the target actual size;
displaying the target service data and storing the target service data, wherein the target service data comprises a service claim amount and physical sign information of the target object;
and sending the target service data to a service terminal associated with the service request.
12. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring a target image containing a target object and a reference object;
the first detection module is used for detecting the pixel size of the target object in the target image as a target pixel size;
the second detection module is used for converting the target image into a target gray-scale image, taking the connected regions in the target gray-scale image as first reference regions, respectively inputting the gray-scale images in the plurality of first reference regions into a classification model, identifying a first probability that each first reference region contains the reference object, selecting, according to the first probability, a first reference region satisfying a matching condition from the plurality of first reference regions as a second reference region, and determining the reference pixel size according to the size of the second reference region;
and the determining module is used for acquiring the reference actual size of the reference object and determining the target actual size of the target object according to the target pixel size, the reference pixel size and the reference actual size.
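Illustrative note (not part of the claims): the determining module reduces to a simple proportion, as summarised in the abstract: the known actual size of the reference object divided by its pixel size gives a scale factor that converts the target's pixel size into its actual size. A minimal sketch with made-up numbers:

```python
def target_actual_size(target_pixel, reference_pixel, reference_actual):
    """Scale the target's pixel measurement by the known real-world size of
    the reference object: actual = pixel * (reference_actual / reference_pixel).
    All arguments are lengths along the same axis, in consistent units."""
    scale = reference_actual / reference_pixel     # real-world units per pixel
    return target_pixel * scale

# A 25 mm coin spanning 100 px implies 0.25 mm/px,
# so a target spanning 480 px is about 120 mm long.
print(target_actual_size(480, 100, 25.0))   # 120.0
```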
13. An electronic device, comprising: a processor and a memory;
the processor is coupled to the memory, wherein the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method according to any one of claims 1 to 11.
14. A computer storage medium, characterized in that it stores a computer program comprising program instructions which, when executed by a processor, perform the method according to any one of claims 1-11.
CN201810865247.8A 2018-08-01 2018-08-01 Image processing method and device and related equipment Active CN109165645B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810865247.8A CN109165645B (en) 2018-08-01 2018-08-01 Image processing method and device and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810865247.8A CN109165645B (en) 2018-08-01 2018-08-01 Image processing method and device and related equipment

Publications (2)

Publication Number Publication Date
CN109165645A CN109165645A (en) 2019-01-08
CN109165645B true CN109165645B (en) 2023-04-07

Family

ID=64898622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810865247.8A Active CN109165645B (en) 2018-08-01 2018-08-01 Image processing method and device and related equipment

Country Status (1)

Country Link
CN (1) CN109165645B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767448B (en) * 2019-01-17 2021-06-01 上海长征医院 Segmentation model training method and device
CN109840905A (en) * 2019-01-28 2019-06-04 山东鲁能软件技术有限公司 Power equipment rusty stain detection method and system
CN111723830B (en) * 2019-03-20 2023-08-29 杭州海康威视数字技术股份有限公司 Image mapping method, device and equipment and storage medium
CN110111382B (en) * 2019-03-21 2021-09-14 北京弘和中科健康科技发展有限公司 Irregular area calculation method and device, computer equipment and storage medium
CN110175503A (en) * 2019-04-04 2019-08-27 财付通支付科技有限公司 Length acquisition methods, device, settlement of insurance claim system, medium and electronic equipment
CN111091536B (en) * 2019-11-25 2023-04-07 腾讯科技(深圳)有限公司 Medical image processing method, apparatus, device, medium, and endoscope
CN111160470B (en) * 2019-12-30 2024-01-23 四川慈石召铁科技有限公司 Archaeological object form processing and analyzing method and device and computer storage medium
CN111402320A (en) * 2020-03-17 2020-07-10 北京和众视野科技有限公司 Fiber section diameter detection method based on deep learning
CN111753766A (en) * 2020-06-28 2020-10-09 平安科技(深圳)有限公司 Image processing method, device, equipment and medium
CN111507432A (en) * 2020-07-01 2020-08-07 四川智迅车联科技有限公司 Intelligent weighing method and system for agricultural insurance claims, electronic equipment and storage medium
CN111985477A (en) * 2020-08-27 2020-11-24 平安科技(深圳)有限公司 Monocular camera-based animal body online claims checking method and device and storage medium
CN112257506A (en) * 2020-09-21 2021-01-22 北京豆牛网络科技有限公司 Fruit and vegetable size identification method and device, electronic equipment and computer readable medium
CN112153320B (en) * 2020-09-23 2022-11-08 北京京东振世信息技术有限公司 Method and device for measuring size of article, electronic equipment and storage medium
CN112183461A (en) * 2020-10-21 2021-01-05 广州市晶华精密光学股份有限公司 Vehicle interior monitoring method, device, equipment and storage medium
CN113191221B (en) * 2021-04-15 2022-04-19 浙江大华技术股份有限公司 Vehicle detection method and device based on panoramic camera and computer storage medium
CN113995435A (en) * 2021-10-25 2022-02-01 上海杏脉信息科技有限公司 Ultrasound image-based measurement device, ultrasound image-based measurement method, ultrasound image-based measurement medium, and electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104103062A (en) * 2013-04-08 2014-10-15 富士通株式会社 Image processing device and image processing method
CN105486234A (en) * 2015-11-11 2016-04-13 丁克金 Method for measuring length of object by using relation of camera pixel and reference object
CN105486233A (en) * 2015-11-11 2016-04-13 丁克金 Method for measuring size of object by using relation of camera pixel and object distance

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106373156A (en) * 2015-07-20 2017-02-01 小米科技有限责任公司 Method and apparatus for determining spatial parameter by image and terminal device

Also Published As

Publication number Publication date
CN109165645A (en) 2019-01-08

Similar Documents

Publication Publication Date Title
CN109165645B (en) Image processing method and device and related equipment
CN108229509B (en) Method and device for identifying object class and electronic equipment
US10885365B2 (en) Method and apparatus for detecting object keypoint, and electronic device
US11216690B2 (en) System and method for performing image processing based on a damage assessment image judgement model
US10962404B2 (en) Systems and methods for weight measurement from user photos using deep learning networks
CN110569878B (en) Photograph background similarity clustering method based on convolutional neural network and computer
CN110532970B (en) Age and gender attribute analysis method, system, equipment and medium for 2D images of human faces
CN110660484B (en) Bone age prediction method, device, medium, and electronic apparatus
CN108447061B (en) Commodity information processing method and device, computer equipment and storage medium
US20230290120A1 (en) Image classification method and apparatus, computer device, and storage medium
CN108416902A (en) Real-time object identification method based on difference identification and device
CN111340126A (en) Article identification method and device, computer equipment and storage medium
CN112085534B (en) Attention analysis method, system and storage medium
CN111339884B (en) Image recognition method, related device and apparatus
CN111814846A (en) Training method and recognition method of attribute recognition model and related equipment
CN115063656A (en) Image detection method and device, computer readable storage medium and electronic equipment
CN114882213A (en) Animal weight prediction estimation system based on image recognition
CN110796145A (en) Multi-certificate segmentation association method based on intelligent decision and related equipment
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN113837257A (en) Target detection method and device
CN111353325A (en) Key point detection model training method and device
CN111310531A (en) Image classification method and device, computer equipment and storage medium
CN110210314B (en) Face detection method, device, computer equipment and storage medium
CN115187982B (en) Algae detection method and device and terminal equipment
CN113269730B (en) Image processing method, image processing device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant