CN112541489A - Image detection method and device, mobile terminal and storage medium - Google Patents

Image detection method and device, mobile terminal and storage medium Download PDF

Info

Publication number
CN112541489A
CN112541489A (application CN201910898654.3A)
Authority
CN
China
Prior art keywords
image
pixel
training sample
target object
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910898654.3A
Other languages
Chinese (zh)
Inventor
栾屹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
Original Assignee
SF Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd filed Critical SF Technology Co Ltd
Priority to CN201910898654.3A priority Critical patent/CN112541489A/en
Publication of CN112541489A publication Critical patent/CN112541489A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses an image detection method, an image detection device, a mobile terminal and a storage medium. The embodiment of the application can acquire an image to be detected containing a target object; detect the image to be detected to generate a heatmap; extract pixel features of the target object from the heatmap; and determine the area of the target object according to the pixel features. According to the scheme, a heatmap of the target object can be generated based on the image to be detected, and the area where the target object is located can be determined according to the pixel features in the heatmap, which improves the efficiency and accuracy of image detection.

Description

Image detection method and device, mobile terminal and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image detection method, an image detection device, a mobile terminal, and a storage medium.
Background
In the prior art, conventional image processing techniques are used to detect and locate a telephone number in an image. First, operations such as noise reduction, grayscale conversion and binarization are performed on the image to extract the more salient character, pattern and texture areas. Then, a connected-component search algorithm is used to screen out areas with denser characters as candidate areas. Finally, further screening is performed according to the aspect ratio and position of each candidate area to determine the area where the telephone number is located.
The existing detection method has at least the following problems. First, searching for a telephone number region based on pattern and texture complexity is too simplistic: some complex patterns or marks in an image may be mistaken for a telephone number, so the detection accuracy is low. Second, using a connected-component search algorithm and taking character-dense regions as candidate areas cannot accurately distinguish different types of text (digit sequences versus lines of Chinese characters). For example, in an express waybill with dense characters, text such as the telephone number, recipient name and recipient address lies in adjacent positions, and the telephone number cannot be accurately separated out. Third, dense characters can make the telephone number region determined by the connected-component search algorithm inaccurate, with characters being cut off or extra characters included. Fourth, after multiple candidate areas are obtained, the telephone number area with the highest confidence must be selected by aspect ratio and position, which results in poor generality: once the font of the telephone number, the layout of the waybill or the photographing angle changes, accurate detection is no longer possible, and the method is entirely inapplicable to handwritten express waybills, which have no strict layout constraints and come in many formats.
Disclosure of Invention
The embodiment of the application provides an image detection method, an image detection device, a mobile terminal and a storage medium, which can improve the efficiency and accuracy of image detection.
In a first aspect, an embodiment of the present application provides an image detection method, including:
acquiring an image to be detected containing a target object;
detecting the image to be detected to generate a heatmap;
extracting pixel features of the target object from the heatmap;
and determining the area of the target object according to the pixel features.
In some embodiments, the detecting the image to be detected to generate the heatmap includes:
performing feature extraction on the image to be detected through a trained detection model to obtain feature information;
and generating a heatmap according to the feature information.
In some embodiments, the performing feature extraction on the image to be detected through the trained detection model to obtain the feature information includes:
determining a preset receptive field range centered on each pixel point in the image to be detected;
and calculating the pixels in the preset receptive field range based on the trained detection model to obtain the feature information.
In some embodiments, before the feature extraction is performed on the image to be detected by the trained detection model, the method further includes:
acquiring a training sample image and a target heatmap corresponding to the training sample image;
preprocessing the training sample image to obtain a preprocessed training sample image;
generating a sample heatmap based on the preprocessed training sample image through a preset detection model;
and converging the target heatmap and the sample heatmap through a preset loss function so as to adjust the parameters of the detection model and obtain the trained detection model.
In some embodiments, before converging the target heatmap and the sample heatmap through the preset loss function, the method further comprises:
acquiring each pixel point in the sample heatmap and each pixel point in the target heatmap;
and constructing the preset loss function according to the difference between each pixel point in the sample heatmap and the corresponding pixel point in the target heatmap.
In some embodiments, the target object includes a phone number, and after determining the area of the target object according to the pixel characteristics, the method further includes:
and identifying the telephone number in the area.
In some embodiments, the heatmap includes four heatmaps, each heatmap includes one vertex, and the extracting the pixel features of the target object from the heatmap includes:
extracting the pixel with the maximum pixel value from each heatmap to obtain the pixel features of four vertexes;
the determining the area of the target object according to the pixel features comprises:
and determining the position of the circumscribed text box of the target object according to the pixel features of the four vertexes to obtain the area where the target object is located.
In a second aspect, an embodiment of the present application further provides an image detection apparatus, including:
the first acquisition module is used for acquiring an image to be detected containing a target object;
the detection module is used for detecting the image to be detected and generating a heatmap;
an extraction module, configured to extract pixel features of the target object from the heatmap;
and the determining module is used for determining the area of the target object according to the pixel features.
In some embodiments, the detection module comprises:
the extraction submodule is used for performing feature extraction on the image to be detected through the trained detection model to obtain feature information;
and the generating submodule is used for generating a heatmap according to the feature information.
In some embodiments, the extraction submodule is specifically configured to:
determining a preset receptive field range centered on each pixel point in the image to be detected;
and calculating the pixels in the preset receptive field range based on the trained detection model to obtain feature information.
In some embodiments, the image detection apparatus further comprises:
the second acquisition module is used for acquiring a training sample image and a target heatmap corresponding to the training sample image;
the processing module is used for preprocessing the training sample image to obtain a preprocessed training sample image;
the generating module is used for generating a sample heatmap based on the preprocessed training sample image through a preset detection model;
and the convergence module is used for converging the target heatmap and the sample heatmap through a preset loss function so as to adjust the parameters of the detection model and obtain the trained detection model.
In some embodiments, the image detection apparatus further comprises:
the third acquisition module is used for acquiring each pixel point in the sample heatmap and each pixel point in the target heatmap;
and the construction module is used for constructing the preset loss function according to the difference between each pixel point in the sample heatmap and the corresponding pixel point in the target heatmap.
In some embodiments, the target object includes a phone number, and the image detection apparatus further includes:
and the identification module is used for identifying the telephone number in the area.
In some embodiments, the heatmap includes four heatmaps, each heatmap includes one vertex, and the extraction module is specifically configured to: extract the pixel with the maximum pixel value from each heatmap to obtain the pixel features of four vertexes;
the determining module is specifically configured to: determine the position of the circumscribed text box of the target object according to the pixel features of the four vertexes to obtain the area where the target object is located.
In a third aspect, an embodiment of the present application further provides a mobile terminal, including a memory and a processor, where the memory stores a computer program, and the processor executes any one of the image detection methods provided in the embodiment of the present application when calling the computer program in the memory.
In a fourth aspect, the present application further provides a storage medium, where the storage medium is used to store a computer program, and the computer program is loaded by a processor to execute any one of the image detection methods provided in the embodiments of the present application.
According to the embodiment of the application, an image to be detected containing a target object can be acquired, the image to be detected can be detected to generate a heatmap, the pixel features of the target object can then be extracted from the heatmap, and at this point the area where the target object is located can be determined according to the pixel features. According to the scheme, a heatmap of the target object can be generated based on the image to be detected, and the area where the target object is located can be determined according to the pixel features in the heatmap, which improves the efficiency and accuracy of image detection.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
Fig. 1 is a schematic flowchart of an image detection method provided in an embodiment of the present application;
FIG. 2 is another schematic flow chart diagram of an image detection method provided in an embodiment of the present application;
fig. 3 is a schematic diagram illustrating labeling of an express waybill image according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a deep learning neural network model provided by an embodiment of the present application;
FIG. 5 is another schematic flow chart diagram of an image detection method provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of heatmap generation provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an image detection apparatus provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of a mobile terminal according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flow chart of an image detection method according to an embodiment of the present application. The execution subject of the image detection method may be the image detection apparatus provided by the embodiment of the present application, or a mobile terminal integrated with the image detection apparatus, wherein the image detection apparatus may be implemented in hardware or software, and the mobile terminal may be a smartphone, a tablet computer, a PDA, a notebook computer, or the like. The image detection method may include:
s101, acquiring an image to be detected containing a target object.
The target object may be flexibly set according to actual needs, and specific contents are not limited herein. For example, the target object may be a telephone number (including a mobile phone number, a landline number, and the like), an identification number, or text, and the like.
The image to be detected can be an image obtained by shooting an express bill, an image obtained by shooting an identity card, an image obtained by shooting a personal business card, an image obtained by downloading from a server and the like.
S102, detecting the image to be detected to generate a heatmap.
After the image to be detected is obtained, a heatmap can be generated based on the image to be detected.
In some embodiments, detecting the image to be detected to generate the heatmap may include: performing feature extraction on the image to be detected through a trained detection model to obtain feature information; and generating a heatmap according to the feature information.
The detection model may be a deep learning neural network model or another type of model; the specific type is not limited here. For example, to meet the requirements of mobile terminal deployment, a highly simplified deep learning neural network model can be used as the detection model. Such a detection model contains only two types of network layers, convolutional layers and max pooling layers; its structure is simple, its calculation speed is high, and it is easy to port, so it can be deployed directly on the mobile terminal. In addition, the middle part of the detection model can adopt a dense connection pattern, which gives the detection model strong adaptability to changes in the size of the target object, so that target objects of different sizes can be accurately located. For example, the size of the region where the target object is located is related to the distance between the lens and the target object: when the lens is far from the target object, the target object is small in the acquired image; when the lens is close, the target object is large. Because of this adaptability to size changes, the detection model is robust to shooting distance, and target objects of different sizes can be accurately located.
After the detection model is determined, it can be trained, and feature extraction can be performed on the image to be detected through the trained detection model to obtain feature information, so that a heatmap is generated from the feature information. The feature information may be a feature map generated after performing convolution operations on the image to be detected, and the feature map may be one that describes the target object. The heatmap can be an image that highlights the area where the target object is located; for example, the greater the pixel value of an area in the heatmap, the darker its color. A pixel area whose pixel value is greater than a preset pixel threshold can be determined as a vertex area of the region where the target object is located, and the vertex position can be determined within the vertex area using a non-maximum suppression algorithm. Alternatively, the heatmap may be a truth map; for example, the vertices of the region where the target object is located are 1, and the remaining positions are 0.
In some embodiments, before performing feature extraction on the image to be detected through the trained detection model, the image detection method may further include: acquiring a training sample image and a target heatmap corresponding to the training sample image; preprocessing the training sample image to obtain a preprocessed training sample image; generating a sample heatmap based on the preprocessed training sample image through a preset detection model; and converging the target heatmap and the sample heatmap through a preset loss function so as to adjust the parameters of the detection model and obtain the trained detection model.
Specifically, in the process of training the detection model, training sample images may be obtained. The training sample images may include multiple images, each containing an object, where the object may be a telephone number, an identification number, or text; the training sample images may be acquired by a camera, or downloaded from a server, and so on. The target heatmap may be generated from the training sample image; for example, the target heatmap (which may also be referred to as the real heatmap) may be generated based on the four vertices (the upper left, upper right, lower right, and lower left vertices) of the rectangular region in which the object is located in the training sample image. Labeling the four vertices of the rectangular region where the object is located can generate a text file in json format, which can include the coordinate values of the four vertices, such as A [175, 104], B [451, 107], C [451, 140], D [175, 135].
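As a hedged illustration of the json label file described above, the sketch below serializes the four labeled vertices for one training sample. The key names and the image file name are hypothetical; the patent only specifies that the file contains the coordinate values of the four vertices (the coordinates here are the ones quoted in the description).

```python
import json

# Hypothetical annotation record for one training sample image.
# The four vertices are the labeled corners of the rectangular
# region containing the object, in the order given in the text:
# upper left, upper right, lower right, lower left.
annotation = {
    "image": "waybill_0001.jpg",   # assumed file name
    "vertices": {
        "A": [175, 104],  # upper left
        "B": [451, 107],  # upper right
        "C": [451, 140],  # lower right
        "D": [175, 135],  # lower left
    },
}

label_json = json.dumps(annotation, indent=2)
print(label_json)
```

Each such file would pair one training sample image with the ground truth needed to build its target heatmap.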
The target heatmap may be a single heatmap containing all four vertices of the rectangular area where the object in the training sample image is located, with larger pixel values at the positions of the four vertices. Alternatively, there may be four target heatmaps, each containing one vertex of the rectangular region where the object is located; that is, each target heatmap corresponds to the distribution of the upper left, upper right, lower right, or lower left vertex of the region where the object is located. The four vertices across the four target heatmaps constitute the four vertices of the rectangular region where the object is located in the training sample image, and the pixel values at the positions of the four vertices are larger.
After the training sample images are obtained, each training sample image and its corresponding target heatmap can be input into the detection model to be trained, and the detection model is trained so that it can extract the feature patterns inside an object (e.g., digit features). For example, the detection model learns the four vertices of the area where an object (e.g., a telephone number) with large pixel values (i.e., high brightness) is located in the target heatmap, gradually acquiring the ability to distinguish the object from other interfering objects. The learned feature patterns generalize to data outside the training sample image set, so the trained detection model can identify the object area of any input image.
In order to improve the accuracy and reliability of the detection model training, the training sample images can be enriched, and specifically, after the training sample images are obtained, the training sample images can be preprocessed to obtain preprocessed training sample images. The preprocessing can be flexibly set according to actual needs, for example, the chrominance, luminance or saturation of the acquired training sample image can be adjusted, and a series of preprocessing such as scaling, clipping, translation or turning can be performed on the training sample image.
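As a minimal sketch of two of the preprocessing operations mentioned above (brightness adjustment and flipping), the function below transforms a grayscale image stored as a list of rows. The function name and parameters are illustrative, not from the patent; chroma and saturation adjustments would require a color image and are omitted.

```python
def augment(image, brightness=1.0, hflip=False):
    """Return a preprocessed copy of a grayscale image (list of rows).

    brightness scales every pixel value (clipped to [0, 255]);
    hflip mirrors each row left-to-right.
    """
    out = []
    for row in image:
        new_row = [min(255, max(0, int(p * brightness))) for p in row]
        if hflip:
            new_row = new_row[::-1]
        out.append(new_row)
    return out

img = [[10, 200, 30],
       [40, 50, 255]]
brighter = augment(img, brightness=1.5)   # pixel values scaled, clipped at 255
flipped = augment(img, hflip=True)        # rows mirrored
```

In practice each training sample image would pass through a randomized chain of such operations (scaling, cropping, translation, flipping, and so on) to enrich the training set, with the labeled vertex coordinates transformed accordingly.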
Then, a sample heatmap (which may also be referred to as a predicted heatmap) may be generated by the detection model based on the preprocessed training sample image. Specifically, feature extraction may be performed on the object by the detection model to obtain feature information: for example, a receptive field range (e.g., 65 × 65 pixels) may be determined centered on each pixel point in the training sample image, the pixels in the receptive field range may be processed by the detection model (e.g., computed and converted into a feature vector by a multi-layer neural network of convolutional layers and max pooling layers) to obtain the feature information, and the sample heatmap may be generated from the feature information. The sample heatmap may be a single heatmap containing all four vertices of the rectangular area where the object is located, with larger pixel values at the positions of the four vertices. Alternatively, there may be four sample heatmaps, each containing one vertex of the rectangular area where the object is located; the four vertices across the four sample heatmaps constitute the four vertices of the rectangular area where the object is located in the training sample image, and the pixel values at the positions of the four vertices are larger.
Secondly, a preset loss function may be constructed. In some embodiments, before converging the target heatmap and the sample heatmap through the preset loss function, the image detection method may further include: acquiring each pixel point in the sample heatmap and each pixel point in the target heatmap; and constructing the preset loss function according to the difference between each pixel point in the sample heatmap and the corresponding pixel point in the target heatmap. For example, the preset loss function Loss may be a pixel-wise sum of differences:

Loss = Σi Σj Σk |Pijk − Gijk|

wherein P is the sample heatmap generated by the detection model, G is the target heatmap, Pijk is the corresponding pixel point on the sample heatmap, Gijk is the corresponding pixel point on the target heatmap, i is the row index of the pixel in the heatmap, j is the column index, and k is the layer index (i.e., the index of the sample heatmap); for example, when one training sample image corresponds to 4 sample heatmaps, k ranges over those 4 heatmaps.
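The published text renders the loss formula only as an image, so its exact form is an assumption here. The sketch below uses a sum of absolute per-pixel differences between the predicted and target heatmaps, which matches the stated property that the loss is 0 exactly when the two heatmaps coincide.

```python
def preset_loss(P, G):
    """Pixel-wise loss between sample heatmaps P and target heatmaps G.

    P and G are nested lists indexed [k][i][j]: k heatmaps, each a
    grid of rows i and columns j. The loss is the sum of absolute
    per-pixel differences -- one plausible reading of "difference
    value between each pixel point".
    """
    loss = 0.0
    for Pk, Gk in zip(P, G):
        for Pi, Gi in zip(Pk, Gk):
            for p, g in zip(Pi, Gi):
                loss += abs(p - g)
    return loss

target = [[[0.0, 1.0], [0.0, 0.0]]]   # one 2x2 target heatmap, vertex at (0, 1)
sample = [[[0.1, 0.8], [0.0, 0.2]]]   # predicted heatmap from the model
```

With these toy values the loss is 0.1 + 0.2 + 0.0 + 0.2 = 0.5, and it would drop to 0 if the prediction matched the target exactly.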
At this point, the target heatmap and the sample heatmap can be converged through the preset loss function to adjust the parameters of the detection model and obtain the trained detection model. That is, the training process for the detection model may adjust the parameters of the detection model until the sample heatmap approaches the target heatmap, that is, until the preset loss function approaches 0 as closely as possible; if the sample heatmap generated by the detection model and the target heatmap coincide completely, the value of the preset loss function is 0.
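The convergence criterion above can be illustrated with a toy stand-in for training. In the real method the detection model's weights are what gets adjusted via backpropagation; here, purely for illustration, the predicted heatmap values themselves are treated as the adjustable parameters and nudged toward the target until the loss approaches 0.

```python
def converge(sample, target, lr=0.05, steps=200):
    """Toy illustration of convergence: repeatedly move each predicted
    pixel toward the target value (a gradient step on the squared
    per-pixel error), driving the loss toward 0. This stands in for
    adjusting the detection model's parameters."""
    P = [row[:] for row in sample]
    for _ in range(steps):
        for i, row in enumerate(P):
            for j, p in enumerate(row):
                P[i][j] = p + lr * (target[i][j] - p)
    return P

target = [[0.0, 1.0], [0.0, 0.0]]
pred = [[0.5, 0.2], [0.4, 0.3]]
fitted = converge(pred, target)
loss = sum(abs(p - g) for Pr, Gr in zip(fitted, target) for p, g in zip(Pr, Gr))
```

After 200 small steps the remaining loss is negligible, mirroring the stated goal of making the preset loss function approach 0.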
After the trained detection model is obtained, it may perform feature extraction on the image to be detected. In some embodiments, performing feature extraction on the image to be detected through the trained detection model to obtain the feature information may include: determining a preset receptive field range centered on each pixel point in the image to be detected; and calculating the pixels in the preset receptive field range based on the trained detection model to obtain the feature information.
For example, a receptive field range (e.g., 65 × 65 pixels) may be determined centered on each pixel point in the image to be detected, and the pixels within the receptive field range may be processed by the trained detection model (e.g., pixel values computed and converted into feature vectors by a multi-layer network of convolutional layers and max pooling layers) to obtain feature information, from which a heatmap is generated. For example, after a 256 × 64 image to be detected is input into the detection model, 64 × 16 local regions of 65 × 65 pixels are defined at intervals of 4 pixels (the parts extending beyond the boundary of the image are zero-padded). The detection model converts the 64 × 16 local regions into one-dimensional feature vectors and calculates the corresponding scores, so the generated heatmap has 64 × 16 values, each being the probability (i.e., feature information) that the center of the corresponding receptive field is a vertex; the pixel with the highest score can subsequently be taken as a vertex of the region where the target object is located.
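The shape bookkeeping in the 256 × 64 example can be sketched as follows. This only computes where the receptive-field windows sit and how many heatmap values result; the actual scores would come from the convolution and pooling layers, which are not reproduced here.

```python
def heatmap_grid(width, height, stride=4):
    """Number of receptive-field centers along each axis when one
    window is placed every `stride` pixels (out-of-bounds parts are
    zero-padded, so every stride position is kept)."""
    return width // stride, height // stride

def receptive_field(cx, cy, size=65):
    """Inclusive pixel bounds of the size x size window centered at
    (cx, cy); negative or out-of-range coordinates correspond to the
    zero-padded part beyond the image boundary."""
    half = size // 2
    return cx - half, cy - half, cx + half, cy + half

cols, rows = heatmap_grid(256, 64)   # 64 x 16 heatmap values for a 256 x 64 input
corner_window = receptive_field(0, 0)   # window at the image corner, mostly padding
```

This reproduces the figures in the text: a 256 × 64 input with stride 4 yields 64 × 16 scores, one per 65 × 65 local region.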
S103, extracting the pixel features of the target object from the heatmap.
After the heatmap is obtained, the pixel features of the target object, which may be pixel values or brightness values, can be extracted from it. When there is a single heatmap, the pixels of the four areas with the largest pixel values can be extracted from it to obtain the pixel features of the target object; alternatively, the pixels of the four areas with the largest brightness values can be extracted.
In some embodiments, the heatmap includes four heatmaps, each containing one vertex, and extracting the pixel features of the target object from the heatmaps may include: extracting the pixel with the maximum pixel value from each heatmap to obtain the pixel features of the four vertexes.
When there are four heatmaps, each contains one vertex of the region where the target object is located, and the pixel with the largest pixel value may be extracted from each heatmap using a non-maximum suppression algorithm or another algorithm to obtain the pixel features of the four vertexes.
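When each heatmap contains exactly one vertex, the simplest stand-in for non-maximum suppression is a plain argmax over the heatmap, as sketched below. The heatmap values and names are hypothetical toy data, not from the patent.

```python
def peak(heatmap):
    """Return (row, col, value) of the maximum-score pixel in one
    heatmap -- a minimal substitute for non-maximum suppression when
    the heatmap contains a single vertex."""
    best = (0, 0, heatmap[0][0])
    for i, row in enumerate(heatmap):
        for j, v in enumerate(row):
            if v > best[2]:
                best = (i, j, v)
    return best

# Four hypothetical 2x2 heatmaps, one per vertex of the target region.
maps = {
    "top_left":     [[0.9, 0.1], [0.1, 0.0]],
    "top_right":    [[0.2, 0.8], [0.0, 0.1]],
    "bottom_right": [[0.0, 0.1], [0.1, 0.7]],
    "bottom_left":  [[0.1, 0.0], [0.6, 0.2]],
}
vertices = {name: peak(m)[:2] for name, m in maps.items()}
```

A real implementation would suppress nearby lower-scoring peaks before taking the maximum, which matters when spurious high-scoring pixels cluster around the true vertex.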
S104, determining the area where the target object is located according to the pixel features.
In some embodiments, determining the region where the target object is located according to the pixel features may include: determining the position of the circumscribed text box of the target object according to the pixel features of the four vertexes to obtain the area of the target object.
For example, suppose the heatmaps corresponding to the image to be detected are heatmap A, heatmap B, heatmap C and heatmap D. Pixel feature 1 of vertex a is extracted from heatmap A, pixel feature 2 of vertex b from heatmap B, pixel feature 3 of vertex c from heatmap C, and pixel feature 4 of vertex d from heatmap D. The position of the circumscribed text box of the target object can then be determined from pixel features 1 to 4, and the position of the circumscribed text box is the area where the target object is located.
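One simple way to turn the four detected vertices into a circumscribed box is an axis-aligned bounding rectangle, sketched below using the vertex coordinates labeled in the earlier example. The patent does not fix this exact construction (the labeled quadrilateral need not be perfectly axis-aligned), so treat it as one plausible realization.

```python
def circumscribed_box(vertices):
    """Axis-aligned rectangle (x_min, y_min, x_max, y_max) enclosing
    the four detected vertices given as (x, y) pairs -- the region
    where the target object is located."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)

# The four vertex coordinates from the annotation example (A, B, C, D).
box = circumscribed_box([(175, 104), (451, 107), (451, 140), (175, 135)])
```

The resulting region can then be cropped from the original image and passed to the character recognition step described next.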
In some embodiments, the target object includes a phone number, and after determining the area where the target object is located according to the pixel characteristics, the image detection method may further include: telephone numbers within the area are identified.
When the target object is a telephone number, after the area where the telephone number is located is determined, the telephone number in the area can be identified through a character recognition model, the identified telephone number can be stored, the identified telephone number can be called, and the like.
According to the embodiment of the application, an image to be detected containing a target object can be obtained, the image is detected to generate hotspot maps, the pixel features of the target object can be extracted from the hotspot maps, and the region where the target object is located can then be determined from those pixel features. With this scheme, hotspot maps of the target object can be generated based on the image to be detected, and the region where the target object is located can be determined from the pixel features in the hotspot maps, improving both the efficiency and the accuracy of image detection.
The image detection method according to the above embodiment will be described in further detail below.
Referring to fig. 2 and fig. 5, fig. 2 is a schematic flowchart illustrating the training of the detection model in the image detection method provided in an embodiment of the present application, and fig. 5 is a schematic flowchart illustrating the application of the detection model in that method. The image detection method can be applied to a mobile terminal; in this example, the mobile terminal detects the telephone number in an express waybill image, and the detection model is a deep learning neural network model.
Training of detection model
As shown in fig. 2, the flow of the image detection method may be as follows:
S201, the mobile terminal obtains a training sample image containing a telephone number and obtains a target hotspot map corresponding to the training sample image.
Because training a deep learning neural network model requires a large number of training sample images, a large number of express waybill images containing telephone numbers need to be acquired when preparing them. The training sample images can be obtained by photographing express waybills with a camera, by downloading waybill-related images from a server, and so on.
A target hotspot map corresponding to each training sample image is also acquired. For example, the upper-left, upper-right, lower-right, and lower-left vertices of the region where the telephone number is located in the training sample image may be labeled. As shown in fig. 3, the customer's name and delivery address are blurred to protect privacy. After the four vertices are labeled, a text file in json format may be generated containing the x and y coordinates of the four vertices, for example A [175, 104], B [451, 107], C [451, 140], and D [175, 135].
Four target hotspot maps are then generated based on the training sample image labeled with the four vertices. Each target hotspot map corresponds to one vertex, that is, to the distribution of the upper-left, upper-right, lower-right, or lower-left vertex of the region where the telephone number is located, and the position with the largest pixel value in a target hotspot map is that vertex's position. In this way, each training sample image is paired with four target hotspot maps, so that the training sample images can subsequently be fed into the deep learning neural network model; the model learns the four high-brightness vertices of the telephone-number region in the target hotspot maps and thereby gradually acquires the ability to distinguish telephone numbers from other text.
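The patent does not fix the exact shape of a target hotspot map beyond the maximum pixel value sitting on the vertex (and later mentions a 0/1 truth map as one option). A common realization, sketched here purely as an assumption, places a Gaussian peak on each labelled vertex; `make_target_hotspot` is an illustrative name, and the coordinates reuse the json-style annotation above.

```python
import numpy as np

def make_target_hotspot(shape, vertex, sigma=3.0):
    """Build one target map: a Gaussian peak whose maximum sits on the vertex."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    vx, vy = vertex
    return np.exp(-((xs - vx) ** 2 + (ys - vy) ** 2) / (2.0 * sigma ** 2))

# Hypothetical labelled vertices, as in the json annotation example
vertices = [(175, 104), (451, 107), (451, 140), (175, 135)]
targets = [make_target_hotspot((200, 500), v) for v in vertices]
```

By construction, the argmax of each target map recovers the labelled vertex, which is the property the training relies on.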
S202, the mobile terminal preprocesses the training sample image to obtain a preprocessed training sample image.
After the training sample images are obtained, the mobile terminal can input each training sample image and its corresponding target hotspot maps into the deep learning neural network model to be trained. Training allows the model to extract the feature patterns of telephone numbers (e.g., the shapes of digits): for example, it learns the four vertices of the region where the telephone number with large pixel values (i.e., high brightness) is located in the target hotspot maps, gradually acquiring the ability to distinguish telephone numbers from other text and generalizing the learned patterns to data outside the training set, so that the trained model can recognize the telephone-number region of any input image.
To improve the precision and reliability of training, the training sample images can be enriched: after they are obtained, their chroma, brightness, or saturation can be adjusted, and a series of preprocessing operations such as scaling, cropping, translation, or flipping can be applied.
Specifically, the chroma, brightness, and saturation of the training sample image can be adjusted to obtain an adjusted training sample image; the adjusted image is randomly scaled to obtain a scaled training sample image; the scaled image is cropped to obtain a cropped training sample image; the cropped image is flipped to obtain a flipped training sample image; and the flipped image is set as the preprocessed training sample image.
For example, the chroma, brightness, and saturation of the training sample image are adjusted first: the chroma may be scaled by a factor in the range 0.6 to 1.4, the brightness by a factor in the range 0.6 to 1.4, and the saturation by a factor in the range 0.6 to 1.4, yielding the adjusted training sample image. Alternatively, only the chroma, only the brightness, or only the saturation may be adjusted, or both brightness and saturation, and so on.
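A hedged sketch of one such adjustment, assuming 8-bit images stored as NumPy arrays: brightness is scaled by a random factor drawn from [0.6, 1.4] and clipped back to the valid range. Chroma and saturation jitter would follow the same pattern in an HSV-like color space; `random_adjust` is an illustrative name, not from the patent.

```python
import random
import numpy as np

def random_adjust(img, rng=None):
    """Scale brightness by a random factor in [0.6, 1.4], clipping to [0, 255]."""
    rng = rng or random.Random()
    factor = rng.uniform(0.6, 1.4)
    out = np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)
    return out, factor
```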
The adjusted training sample image may then be randomly scaled to obtain a scaled training sample image. For example, a random number between 0 and 1 may be generated: if it falls between 0 and 0.5, the adjusted image is scaled down, and if it falls between 0.5 and 1, the adjusted image is scaled up. For example, mappings between value intervals and reduction ratios may be set; after it is determined that the adjusted image needs to be reduced, another random number between 0 and 1 is generated, the interval in which it falls is determined, the current reduction ratio is looked up from the mapping, and the image is reduced by that ratio. Similarly, mappings between value intervals and enlargement ratios may be set; after it is determined that the adjusted image needs to be enlarged, a random number between 0 and 1 is generated, its interval is determined, the current enlargement ratio is looked up, and the image is enlarged by that ratio.
Next, the scaled training sample image may be cropped to obtain a cropped training sample image. In some embodiments, this may include: randomly cropping the scaled image to obtain candidate training sample images; acquiring the center position of the bounding box of the telephone number in each candidate; and selecting, from the candidates, those whose area is larger than a preset value and which contain the center of the telephone number's bounding box, to obtain the cropped training sample image.
To ensure that the cropped training sample image contains the telephone-number region and to improve its reliability for model training, the candidate images may be filtered. For example, the scaled training sample image may be randomly cropped to obtain candidate images, and each candidate's area is then checked against a preset value, which can be set flexibly according to actual needs; for example, the preset value may be 80% of the area of the training sample image, i.e., it is checked whether the candidate's area exceeds 80% of the original. If the candidate's area is smaller than or equal to the preset value, the candidate is discarded; if it is larger, the bounding box of the telephone number in the candidate is detected. The size and shape of the bounding box can also be set flexibly; for example, it may be the inscribed quadrilateral bounding box of the telephone number. It is then judged whether the center of the bounding box lies inside the candidate image: if not, the candidate is discarded; if so, the candidate is used as the cropped training sample image.
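The two screening conditions above (area above the preset 80% threshold, bounding-box center inside the crop) can be expressed directly. `crop_is_valid` is an illustrative helper, with crops given as (x0, y0, x1, y1) corner coordinates and the center as a hypothetical example value.

```python
def crop_is_valid(crop, image_area, bbox_center, min_ratio=0.8):
    """crop = (x0, y0, x1, y1); keep it only if it is large enough and still
    contains the center of the telephone number's bounding box."""
    x0, y0, x1, y1 = crop
    cx, cy = bbox_center
    area_ok = (x1 - x0) * (y1 - y0) > min_ratio * image_area
    center_ok = x0 <= cx <= x1 and y0 <= cy <= y1
    return area_ok and center_ok
```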
After the cropped training sample image is obtained, a random number in the range 0 to 1 (inclusive) may be generated, and the cropped image is flipped according to it. For example, if the random number falls between 0 and 0.5, the cropped image may be flipped left-right, up-down, or diagonally to obtain the flipped training sample image; if it falls between 0.5 and 1, the image is not flipped. Finally, the flipped image is set as the preprocessed training sample image, so that the deep learning neural network model can subsequently be trained on it to obtain the trained model.
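A minimal sketch of the flip step, assuming the left-right variant and the [0, 0.5) threshold described above; `maybe_flip` is an illustrative name.

```python
import random
import numpy as np

def maybe_flip(img, rng=None):
    """Flip left-right when the random number falls in [0, 0.5); otherwise keep."""
    rng = rng or random.Random()
    return img[:, ::-1].copy() if rng.random() < 0.5 else img
```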
It should be noted that the preprocessed training sample image may also be obtained by applying only one of these operations: only adjusting chroma, brightness, or saturation; only random scaling; only cropping; only flipping; or a combination such as random scaling plus flipping, and so on.
And S203, the mobile terminal generates a sample heat point diagram based on the preprocessed training sample image through a preset deep learning neural network model.
The mobile terminal may train the deep learning neural network model on the preprocessed training sample images. To improve training precision and efficiency, the preprocessed images may first be size-normalized. If the long side of a preprocessed image is larger than a preset length, the image is reduced so that the long side equals the preset length, and the short side is padded with a preset value until it also equals the preset length, giving the size-normalized training sample image. Conversely, if the long side is smaller than the preset length, the image is enlarged so that the long side equals the preset length, and the short side is padded with the preset value. The preset length and padding value can be set flexibly according to actual needs. For example, if the long side of a preprocessed image is larger than 550, the image is reduced so that the long side equals 550, and the short side is padded with the value 0 until its length is 550, giving the size-normalized training sample image.
If the long side of the preprocessed image is smaller than 550, the image is enlarged so that the long side equals 550 and the short side is padded with the value 0 until its length is 550, yielding a size-normalized 550 × 550 training sample image on which the deep learning neural network model can be trained.
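The size normalization described above can be sketched as follows, assuming grayscale or HWC NumPy images; the nearest-neighbour index-based resize is a stand-in for whatever resize operation the deployment actually uses, and `normalize_size` is an illustrative name.

```python
import numpy as np

def normalize_size(img, target=550, fill=0):
    """Scale so the long side equals `target`, then pad with `fill` to target x target."""
    h, w = img.shape[:2]
    scale = target / max(h, w)
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # nearest-neighbour resize via index arrays (stand-in for a real resize op)
    rows = (np.arange(new_h) * h) // new_h
    cols = (np.arange(new_w) * w) // new_w
    resized = img[rows][:, cols]
    out = np.full((target, target) + img.shape[2:], fill, dtype=img.dtype)
    out[:new_h, :new_w] = resized
    return out
```

Both the reduction branch (long side > 550) and the enlargement branch (long side < 550) fall out of the same scale factor.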
The structure of the deep learning neural network model can be as shown in fig. 4. To meet the requirements of mobile deployment, a very compact model can be used, containing only two kinds of network layers, convolutional layers and max pooling layers; its structure is simple, its computation is fast, it is easy to port, and it can be deployed directly on the mobile terminal. Moreover, the middle part of the model can comprise 14 convolutional layers in a densely connected arrangement, giving it strong adaptability to changes in the size of telephone numbers, so that numbers of different sizes can be located accurately. For example, the size of the telephone-number region depends on the distance between the lens and the number: when the lens is far away, the number appears small in the captured image; when the lens is close, it appears large. This adaptability to size changes gives the model strong robustness to shooting distance, so telephone numbers of different sizes can be located accurately.
That is to say, in this embodiment the deep learning neural network model does not need to generate candidate boxes at different confidence levels, merge them by confidence, and then regress precisely over the candidates to obtain the exact position of the telephone number, nor does it need multiple passes through a network (first locating candidate boxes, then refining each by regression, entering the network once per candidate). The whole process requires only a single pass through the model, with no candidate-box screening or merging, which greatly increases the computation speed.
During training, the mobile terminal can generate sample hotspot maps from the preprocessed (or size-normalized) training sample images through the deep learning neural network model. For example, feature extraction may be performed on a training sample image by the model to obtain feature information, from which the sample hotspot maps are generated. The feature information may be a feature map produced by convolution operations over the training sample image with respect to the telephone number; in the sample hotspot map, the pixel values of the region where the telephone number is located are larger, and regions with larger pixel values are rendered in deeper color. Alternatively, the sample hotspot map may be a truth map, e.g., the vertices of the telephone-number region are 1 and the rest are 0.
For example, a receptive-field range (e.g., 65 × 65 pixels) may be determined centered on each sampled pixel point of the training sample image, and the pixels within each receptive field may be processed by the deep learning neural network model (e.g., converted into feature vectors by multiple layers such as convolutional and max pooling layers) to obtain feature information, from which a sample hotspot map is generated. There can be four sample hotspot maps, each containing one vertex of the rectangular region where the telephone number is located; together, the four maps give the four vertices of that region in the training sample image, and the pixel values at the vertex positions are the largest. Concretely, in an input training sample image (scaled to a resolution of 256 × 64), 64 × 16 local regions of 65 × 65 pixels are drawn out at intervals of 4 pixels (parts beyond the image boundary are padded with 0). The model converts each local region into a one-dimensional feature vector and computes a score, so the generated hotspot map has 64 × 16 values, each being the probability that the center of the corresponding receptive field is a vertex. During training, the model automatically adjusts its internal parameters so that regions whose centers are close to a vertex score higher and higher, while regions whose centers are far from a vertex score lower and lower.
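The arithmetic relating the 256 × 64 input, the 4-pixel sampling stride, and the 64 × 16 score grid can be checked directly; `heatmap_grid` is an illustrative helper, not part of the patent.

```python
def heatmap_grid(in_w=256, in_h=64, stride=4):
    """One 65x65 receptive field is scored at every `stride`-pixel center,
    so a 256x64 input yields a 64x16 grid of vertex probabilities."""
    return in_w // stride, in_h // stride
```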
Taking the top left vertex as an example, if the center of a certain receptive field area is the top left vertex of the telephone number area, the bottom right quarter of the receptive field area should contain a sequence of digits, and the remaining three parts should not contain consecutive digits.
And S204, the mobile terminal constructs a preset loss function.
The mobile terminal can obtain each pixel point in the sample hotspot maps and each pixel point in the target hotspot maps, and construct the preset loss function from the differences between corresponding pixel points; the preset loss function Loss can be as described above.
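The document only states that the loss is built from per-pixel differences between the sample and target hotspot maps. A mean-squared-error realization over the four vertex maps, shown here purely as an assumption, is one consistent choice; `hotspot_loss` is an illustrative name.

```python
import numpy as np

def hotspot_loss(sample_maps, target_maps):
    """Mean squared per-pixel difference, summed over the four vertex maps."""
    return sum(float(np.mean((s - t) ** 2)) for s, t in zip(sample_maps, target_maps))
```

The loss is zero exactly when every sample map matches its target map pixel for pixel, which is the convergence condition described in the next step.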
S205, the mobile terminal converges the target heat point diagram and the sample heat point diagram through a preset loss function so as to adjust parameters of the deep learning neural network model and obtain the trained deep learning neural network model.
The mobile terminal can converge the target hotspot maps and the sample hotspot maps through the preset loss function, adjusting the parameters of the deep learning neural network model to appropriate values so that the loss becomes small and the gradient no longer decreases. The error between the sample hotspot maps and the target hotspot maps is thereby reduced until the sample maps approach the target maps, i.e., the preset loss function approaches 0 as closely as possible, so that an accurate hotspot map can be generated for every input image and the trained model is obtained. The trained deep learning neural network model can analyze the image features of any express waybill image and generate four hotspot maps, from which the coordinates of the upper-left, upper-right, lower-right, and lower-left vertices of the telephone-number region can be obtained, determining the region where the telephone number is located.
The deep learning neural network model is well suited to densely printed text such as express waybills: even when text appears immediately before and after the telephone number, the number can still be located accurately, without picking up extra characters or dropping any. In addition, the model is highly robust to shooting conditions and can accurately locate the telephone-number text in images captured at different angles or distances. It is also unaffected by text layout: different waybill types and page layouts do not prevent accurate localization of the number text. Finally, the model's structure is simple and its computation fast, greatly improving couriers' work efficiency.
Application of deep learning neural network model
As shown in fig. 5, the flow of the image detection method may be as follows:
S301, the mobile terminal collects an image to be detected.
The image to be detected can be an image obtained by photographing an express waybill, an image downloaded from a server, and so on.
S302, the mobile terminal extracts the features of the image to be detected through the trained deep learning neural network model, and generates four hot spot maps based on the extracted feature information.
For example, the mobile terminal may determine a receptive-field range (e.g., 65 × 65 pixels) centered on each sampled pixel point of the image to be detected, and process the pixels within each receptive field with the trained deep learning neural network model (e.g., converting the pixel values into feature vectors through multiple layers such as convolutional and max pooling layers) to obtain feature information, from which the hotspot maps are generated. For example, after a 256 × 64 image to be detected is input into the model, 64 × 16 local regions of 65 × 65 pixels are drawn out at intervals of 4 pixels (parts beyond the image boundary are padded with 0). The model converts each local region into a one-dimensional feature vector and computes its score; the generated hotspot map thus has 64 × 16 values, each being the probability (i.e., feature information) that the center of the corresponding receptive field is a vertex, so that the highest-scoring pixel can subsequently be taken as a vertex of the telephone-number region.
S303, the mobile terminal extracts the pixel with the maximum pixel value from each hot spot image to obtain the pixel characteristics of four vertexes.
When the hotspot graph comprises four hotspots, each hotspot graph respectively comprises one vertex of the area where the telephone number is located, and at this time, the pixel with the maximum pixel value can be respectively extracted from each hotspot graph by using a non-maximum suppression algorithm or other algorithms to obtain the pixel characteristics of the four vertices.
S304, the mobile terminal determines the position of the circumscribed text box of the telephone number according to the pixel features of the four vertices, obtaining the region where the telephone number is located.
For example, the hotspot maps corresponding to the image to be detected are hotspot map A, hotspot map B, hotspot map C, and hotspot map D. Pixel feature 1 of vertex a is extracted from hotspot map A, pixel feature 2 of vertex b from hotspot map B, pixel feature 3 of vertex c from hotspot map C, and pixel feature 4 of vertex d from hotspot map D. The position of the circumscribed text box of the telephone number can then be determined from pixel features 1 through 4, and that circumscribed text box position is the region where the telephone number is located.
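Under the assumption that the circumscribed text box is taken as the axis-aligned rectangle enclosing the four detected vertices (the document could equally keep the quadrilateral itself), the box follows directly from the vertex coordinates; `circumscribed_box` is an illustrative name.

```python
def circumscribed_box(vertices):
    """Axis-aligned text box enclosing the four detected vertices a, b, c, d."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))
```

With the example annotation coordinates from the training section, this yields the box (175, 104, 451, 140).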
S305, the mobile terminal identifies a telephone number in the area.
After the region where the telephone number is located is determined, the number in the region can be recognized by a character recognition model, and the recognized number can be stored, dialed, and so on. That is, after the image to be detected is input into the deep learning neural network model, the model analyzes its features and outputs four hotspot maps through the last layer, for example as shown in fig. 6, corresponding respectively to the upper-left, upper-right, lower-right, and lower-left vertices of the telephone-number region. Once the four hotspot maps are obtained, the position of the largest pixel value in each map can be located, giving the coordinates of the four vertices of the telephone-number region, so that the circumscribed text box of the telephone number is located and the exact position of the region is obtained.
With the embodiments of the application, an image to be detected containing a telephone number can be acquired, the number can be detected in the image by the trained deep learning neural network model, the region where it is located can be determined, and the number within the region can be recognized. First, the strong recognition and discrimination capability of the model can be exploited: based on the distinctive shapes of digits, numbers can be accurately distinguished from Chinese characters, English letters, and various patterns, avoiding false detections of other patterns or characters as digits. Second, image features are extracted at the four vertices of the telephone-number region for recognition and judgment, so the start and end positions of the number can be located precisely, solving the problem of separating the number from adjacent characters on densely printed express waybills. Third, because digits are distinguished by their shape patterns through the model, detection is unaffected by variations in the writing of the number: even handwritten numbers at different positions on a waybill can be accurately detected, located, and given a circumscribed text box, and the model structure has a small computation load and high speed while maintaining accuracy. The scheme thus solves telephone-number detection and localization on all kinds of express waybill images, improving the efficiency, discriminability, accuracy, and generality of telephone-number detection in images.
In order to better implement the image detection method provided by the embodiment of the present application, the embodiment of the present application further provides a device based on the image detection method. The terms are the same as those in the image detection method, and details of implementation can be referred to the description in the method embodiment.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present disclosure, wherein the image detection apparatus 300 may include a first obtaining module 301, a detecting module 302, an extracting module 303, a determining module 304, and the like.
The first obtaining module 301 is configured to obtain an image to be detected including a target object;
the detection module 302 is configured to detect an image to be detected and generate a hotspot graph;
an extracting module 303, configured to extract pixel features of the target object from the hotspot graph;
and the determining module 304 is configured to determine the region where the target object is located according to the pixel characteristics.
In some embodiments, the detection module 302 may include an extraction sub-module, a generation sub-module, and the like, which may be specifically as follows:
the extraction submodule is used for extracting the characteristics of the image to be detected through the trained detection model to obtain characteristic information;
and the generating submodule is used for generating a hot spot diagram according to the characteristic information.
In some embodiments, the extraction submodule is specifically configured to: determining a preset receptive field range by taking each pixel point in an image to be detected as a center; and calculating pixels in a preset receptive field range based on the trained detection model to obtain characteristic information.
In some embodiments, the image detection apparatus 300 may further include a second obtaining module, a processing module, a generating module, a convergence module, and the like, which may specifically be as follows:
the second obtaining module is configured to obtain a training sample image and a target heat map corresponding to the training sample image;
the processing module is configured to preprocess the training sample image to obtain a preprocessed training sample image;
the generating module is configured to generate a sample heat map based on the preprocessed training sample image through a preset detection model;
and the convergence module is configured to converge the target heat map and the sample heat map through a preset loss function so as to adjust the parameters of the detection model and obtain the trained detection model.
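The training flow (generate a sample heat map, then converge it toward the target heat map through a preset loss) can be illustrated with a deliberately tiny model. Here the "detection model" is a single scale parameter `w` and the loss is a pixelwise mean squared error; the real embodiment would use a deep network, for which a framework's autograd would perform the same kind of gradient step.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8))       # preprocessed training sample image (stand-in)
target = 3.0 * x             # target heat map corresponding to the sample

w = 0.0                      # the "detection model": one scale parameter
lr = 0.1
for _ in range(200):
    sample = w * x           # sample heat map generated by the model
    # preset loss: mean squared difference between corresponding pixels;
    # its gradient with respect to w drives the parameter update
    grad = 2 * np.mean((sample - target) * x)
    w -= lr * grad           # adjust model parameters to converge the maps
```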
In some embodiments, the image detection apparatus 300 may further include a third obtaining module, a construction module, and the like, which may specifically be as follows:
the third obtaining module is configured to obtain each pixel point in the sample heat map and each pixel point in the target heat map;
and the construction module is configured to construct the preset loss function according to the difference between each pixel point in the sample heat map and the corresponding pixel point in the target heat map.
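The constructed loss can be as simple as the mean of the squared per-pixel differences between the two heat maps. The embodiment fixes only that the loss is built from these pixel differences, so the mean squared error below is an illustrative choice, not the mandated form.

```python
import numpy as np

def heatmap_loss(sample, target):
    """Preset loss built from the difference between each pixel of the
    sample heat map and the corresponding pixel of the target heat map.
    Mean squared error is one common choice for such a per-pixel loss."""
    diff = sample - target
    return float(np.mean(diff ** 2))

loss = heatmap_loss(np.ones((4, 4)), np.zeros((4, 4)))
```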
In some embodiments, the target object includes a telephone number, and the image detection apparatus 300 may further include an identification module configured to identify the telephone number within the region.
In some embodiments, there are four heat maps, each corresponding to one vertex, and the extraction module 303 is specifically configured to: extract the pixel with the maximum pixel value from each heat map to obtain the pixel features of the four vertices; the determining module 304 is specifically configured to: determine the position of the circumscribed text box of the target object according to the pixel features of the four vertices to obtain the region where the target object is located.
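The four-heat-map variant can be sketched as below: the maximum-valued pixel of each map is taken as one vertex of the circumscribed text box, and the four vertices then bound the region of the target object. The axis-aligned bounding rectangle used here is a simplification; the actual circumscribed box may be an arbitrary quadrilateral.

```python
import numpy as np

def locate_region(heatmaps):
    """Given four heat maps, one per vertex of the circumscribed text box,
    take the maximum-valued pixel of each map as that vertex, then return
    the axis-aligned region spanned by the four vertices."""
    vertices = [np.unravel_index(np.argmax(h), h.shape) for h in heatmaps]
    rows = [v[0] for v in vertices]
    cols = [v[1] for v in vertices]
    return (min(rows), min(cols), max(rows), max(cols)), vertices

maps = [np.zeros((32, 32)) for _ in range(4)]
# plant one peak per map at the four corners of a hypothetical text box
for h, (r, c) in zip(maps, [(5, 4), (5, 20), (12, 20), (12, 4)]):
    h[r, c] = 1.0
region, vertices = locate_region(maps)
```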
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
In the embodiment of the application, the first obtaining module 301 may obtain an image to be detected containing a target object, the detection module 302 detects the image to be detected to generate a heat map, the extraction module 303 may extract pixel features of the target object from the heat map, and the determining module 304 may determine the region where the target object is located according to the pixel features. In this scheme, a heat map of the target object can be generated from the image to be detected, and the region where the target object is located can be determined according to the pixel features in the heat map, which improves the efficiency and accuracy of image detection.
Accordingly, an embodiment of the present application also provides a mobile terminal. As shown in fig. 8, the mobile terminal may include a Radio Frequency (RF) circuit 601, a memory 602 including one or more computer-readable storage media, an input unit 603, a display unit 604, a sensor 605, an audio circuit 606, a Wireless Fidelity (WiFi) module 607, a processor 608 including one or more processing cores, and a power supply 609. Those skilled in the art will appreciate that the mobile terminal structure shown in fig. 8 does not constitute a limitation of the mobile terminal, which may include more or fewer components than those shown, combine some components, or arrange the components differently. Wherein:
the RF circuit 601 may be used for receiving and transmitting signals during a message transmission or communication process, and in particular, for receiving downlink messages from a base station and then processing the received downlink messages by one or more processors 608; in addition, data relating to uplink is transmitted to the base station. In general, the RF circuit 601 includes, but is not limited to, an antenna, at least one Amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 601 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), and the like.
The memory 602 may be used to store software programs and modules, and the processor 608 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile terminal, and the like. Further, the memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 602 may also include a memory controller to provide the processor 608 and the input unit 603 with access to the memory 602.
The input unit 603 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. In particular, in one specific embodiment, the input unit 603 may include a touch-sensitive surface as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations by a user on or near it (e.g., operations performed by the user on or near the touch-sensitive surface using a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch-sensitive surface may comprise two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends them to the processor 608, and can also receive and execute commands sent by the processor 608. In addition, the touch-sensitive surface may be implemented using resistive, capacitive, infrared, surface acoustic wave, and other types of technology. The input unit 603 may include other input devices in addition to the touch-sensitive surface. In particular, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 604 may be used to display information input by or provided to the user and various graphical user interfaces of the mobile terminal, which may be made up of graphics, text, icons, video, and any combination thereof. The Display unit 604 may include a Display panel, and optionally, the Display panel may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch-sensitive surface may overlay the display panel, and when a touch operation is detected on or near the touch-sensitive surface, the touch operation is transmitted to the processor 608 to determine the type of touch event, and the processor 608 then provides a corresponding visual output on the display panel according to the type of touch event. Although in FIG. 8 the touch sensitive surface and the display panel are two separate components to implement input and output functions, in some embodiments the touch sensitive surface may be integrated with the display panel to implement input and output functions.
The mobile terminal may also include at least one sensor 605, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which may adjust the brightness of the display panel according to the brightness of ambient light, and a proximity sensor, which may turn off the display panel and/or the backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and the magnitude and direction of gravity when stationary, and can be used in applications for recognizing the attitude of the mobile terminal (such as switching between horizontal and vertical screens, related games, and magnetometer attitude calibration), vibration-recognition-related functions (such as a pedometer and tapping), and the like; other sensors that may be configured on the mobile terminal, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described further herein.
The audio circuit 606, a speaker, and a microphone may provide an audio interface between the user and the mobile terminal. The audio circuit 606 may convert received audio data into an electrical signal and transmit it to the speaker, which converts it into a sound signal for output; on the other hand, the microphone converts a collected sound signal into an electrical signal, which is received by the audio circuit 606 and converted into audio data; the audio data is then processed by the processor 608 and sent, for example, to another mobile terminal via the RF circuit 601, or output to the memory 602 for further processing. The audio circuit 606 may also include an earphone jack to provide communication between a peripheral headset and the mobile terminal.
WiFi is a short-range wireless transmission technology. Through the WiFi module 607, the mobile terminal can help the user send and receive e-mail, browse web pages, access streaming media, and the like; it provides the user with wireless broadband Internet access. Although fig. 8 shows the WiFi module 607, it is understood that it is not an essential part of the mobile terminal and may be omitted as needed without changing the essence of the invention.
The processor 608 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 602 and calling data stored in the memory 602, thereby performing overall monitoring of the mobile terminal. Optionally, processor 608 may include one or more processing cores; preferably, the processor 608 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 608.
The mobile terminal also includes a power supply 609 (e.g., a battery) for supplying power to the various components. Preferably, the power supply may be logically connected to the processor 608 via a power management system, so that functions such as managing charging, discharging, and power consumption are implemented through the power management system. The power supply 609 may also include one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other such components.
Although not shown, the mobile terminal may further include a camera, a bluetooth module, and the like, which will not be described herein. Specifically, in this embodiment, the processor 608 in the mobile terminal loads the executable file corresponding to the process of one or more application programs into the memory 602 according to the following instructions, and the processor 608 runs the application program stored in the memory 602, thereby implementing various functions:
acquiring an image to be detected containing a target object; detecting the image to be detected to generate a heat map; extracting pixel features of the target object from the heat map; and determining the region where the target object is located according to the pixel features.
In some embodiments, when detecting the image to be detected to generate the heat map, the processor 608 further performs: performing feature extraction on the image to be detected through the trained detection model to obtain feature information; and generating a heat map according to the feature information.
In some embodiments, when performing feature extraction on the image to be detected through the trained detection model to obtain feature information, the processor 608 further performs: determining a preset receptive field range centred on each pixel point in the image to be detected; and computing over the pixels within the preset receptive field range based on the trained detection model to obtain the feature information.
In some embodiments, before the feature extraction of the image to be detected by the trained detection model, the processor 608 further performs: acquiring a training sample image and a target heat map corresponding to the training sample image; preprocessing the training sample image to obtain a preprocessed training sample image; generating a sample heat map based on the preprocessed training sample image through a preset detection model; and converging the target heat map and the sample heat map through a preset loss function so as to adjust the parameters of the detection model and obtain the trained detection model.
In some embodiments, before converging the target heat map and the sample heat map through the preset loss function, the processor 608 further performs: acquiring each pixel point in the sample heat map and each pixel point in the target heat map; and constructing the preset loss function according to the difference between each pixel point in the sample heat map and the corresponding pixel point in the target heat map.
In some embodiments, the target object includes a telephone number, and after determining the region where the target object is located according to the pixel features, the processor 608 further performs: identifying the telephone number within the region.
In some embodiments, there are four heat maps, each corresponding to one vertex, and the processor 608 further performs: extracting the pixel with the maximum pixel value from each heat map to obtain the pixel features of the four vertices; and, when determining the region of the target object according to the pixel features, the processor 608 further performs: determining the position of the circumscribed text box of the target object according to the pixel features of the four vertices to obtain the region where the target object is located.
In the above embodiments, the descriptions of the embodiments have respective emphasis, and parts that are not described in detail in a certain embodiment may refer to the above detailed description of the image detection method, and are not described herein again.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be completed by a computer program, or by related hardware controlled by a computer program; the computer program may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a storage medium in which a computer program is stored, where the computer program can be loaded by a processor to execute any one of the image detection methods provided by the embodiments of the present application. For example, the computer program, when loaded by a processor, may perform the following steps:
acquiring an image to be detected containing a target object; detecting the image to be detected to generate a heat map; extracting pixel features of the target object from the heat map; and determining the region where the target object is located according to the pixel features.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute any image detection method provided in the embodiments of the present application, beneficial effects that can be achieved by any image detection method provided in the embodiments of the present application can be achieved, for details, see the foregoing embodiments, and are not described herein again.
The image detection method and apparatus, mobile terminal, and storage medium provided by the embodiments of the present application are described in detail above. Specific examples are used herein to explain the principles and implementation of the present application, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, for those skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. An image detection method, comprising:
acquiring an image to be detected containing a target object;
detecting the image to be detected to generate a heat map;
extracting pixel features of the target object from the heat map;
and determining the region where the target object is located according to the pixel features.
2. The image detection method according to claim 1, wherein the detecting the image to be detected and generating the heat map comprises:
performing feature extraction on the image to be detected through a trained detection model to obtain feature information;
and generating the heat map according to the feature information.
3. The image detection method according to claim 2, wherein the performing feature extraction on the image to be detected through the trained detection model to obtain feature information comprises:
determining a preset receptive field range centred on each pixel point in the image to be detected;
and computing over the pixels within the preset receptive field range based on the trained detection model to obtain the feature information.
4. The image detection method according to claim 2, wherein before the feature extraction of the image to be detected by the trained detection model, the method further comprises:
acquiring a training sample image and a target heat map corresponding to the training sample image;
preprocessing the training sample image to obtain a preprocessed training sample image;
generating a sample heat map based on the preprocessed training sample image through a preset detection model;
and converging the target heat map and the sample heat map through a preset loss function so as to adjust parameters of the detection model and obtain the trained detection model.
5. The image detection method according to claim 4, wherein before the converging the target heat map and the sample heat map through a preset loss function, the method further comprises:
acquiring each pixel point in the sample heat map and each pixel point in the target heat map;
and constructing the preset loss function according to the difference between each pixel point in the sample heat map and the corresponding pixel point in the target heat map.
6. The image detection method according to any one of claims 1 to 5, wherein the target object includes a telephone number, and after the region where the target object is located is determined according to the pixel features, the method further comprises:
identifying the telephone number within the region.
7. The image detection method according to any one of claims 1 to 5, wherein there are four heat maps, each heat map corresponding to one vertex, and the extracting the pixel features of the target object from the heat maps comprises:
extracting the pixel with the maximum pixel value from each heat map to obtain the pixel features of four vertices;
and the determining the region where the target object is located according to the pixel features comprises:
determining the position of the circumscribed text box of the target object according to the pixel features of the four vertices to obtain the region where the target object is located.
8. An image detection apparatus, comprising:
a first obtaining module, configured to acquire an image to be detected containing a target object;
a detection module, configured to detect the image to be detected and generate a heat map;
an extraction module, configured to extract pixel features of the target object from the heat map;
and a determining module, configured to determine the region where the target object is located according to the pixel features.
9. A mobile terminal, comprising a processor and a memory, the memory storing a computer program, wherein the processor executes the image detection method according to any one of claims 1 to 7 when calling the computer program in the memory.
10. A storage medium storing a computer program, the computer program being loaded by a processor to perform the image detection method according to any one of claims 1 to 7.
CN201910898654.3A 2019-09-23 2019-09-23 Image detection method and device, mobile terminal and storage medium Pending CN112541489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910898654.3A CN112541489A (en) 2019-09-23 2019-09-23 Image detection method and device, mobile terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910898654.3A CN112541489A (en) 2019-09-23 2019-09-23 Image detection method and device, mobile terminal and storage medium

Publications (1)

Publication Number Publication Date
CN112541489A true CN112541489A (en) 2021-03-23

Family

ID=75012973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910898654.3A Pending CN112541489A (en) 2019-09-23 2019-09-23 Image detection method and device, mobile terminal and storage medium

Country Status (1)

Country Link
CN (1) CN112541489A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881770A (en) * 2015-06-03 2015-09-02 秦志勇 Express bill information identification system and express bill information identification method
WO2018054326A1 (en) * 2016-09-22 2018-03-29 北京市商汤科技开发有限公司 Character detection method and device, and character detection training method and device
CN108520251A (en) * 2018-04-20 2018-09-11 北京市商汤科技开发有限公司 Critical point detection method and device, electronic equipment and storage medium
CN108846379A (en) * 2018-07-03 2018-11-20 南京览笛信息科技有限公司 Face list recognition methods, system, terminal device and storage medium
CN109886282A (en) * 2019-02-26 2019-06-14 腾讯科技(深圳)有限公司 Method for checking object, device, computer readable storage medium and computer equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113140005A (en) * 2021-04-29 2021-07-20 上海商汤科技开发有限公司 Target object positioning method, device, equipment and storage medium
CN113140005B (en) * 2021-04-29 2024-04-16 上海商汤科技开发有限公司 Target object positioning method, device, equipment and storage medium
CN113139545A (en) * 2021-05-08 2021-07-20 金陵科技学院 Curved character correction method, storage medium and electronic device
CN113139545B (en) * 2021-05-08 2023-07-18 金陵科技学院 Method for correcting bent characters, storage medium and electronic device
CN113743405A (en) * 2021-09-07 2021-12-03 南方电网数字电网研究院有限公司 Intelligent meter reading method and device for electric energy meter

Similar Documents

Publication Publication Date Title
CN111476780B (en) Image detection method and device, electronic equipment and storage medium
CN108875451B (en) Method, device, storage medium and program product for positioning image
CN110163806B (en) Image processing method, device and storage medium
CN106156711B (en) Text line positioning method and device
CN106874906B (en) Image binarization method and device and terminal
CN106296617B (en) The processing method and processing device of facial image
CN111464716B (en) Certificate scanning method, device, equipment and storage medium
CN108269220B (en) Method and device for positioning digital watermark
CN108259758B (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN106203235B (en) Living body identification method and apparatus
CN111209423B (en) Image management method and device based on electronic album and storage medium
CN105989572B (en) Picture processing method and device
CN112541489A (en) Image detection method and device, mobile terminal and storage medium
WO2015003606A1 (en) Method and apparatus for recognizing pornographic image
CN109086761B (en) Image processing method and device, storage medium and electronic equipment
CN110490953B (en) Text-based image generation method, terminal device and medium
CN110717486B (en) Text detection method and device, electronic equipment and storage medium
CN108376255B (en) Image processing method, device and storage medium
CN115841575A (en) Key point detection method, device, electronic apparatus, storage medium, and program product
CN114140655A (en) Image classification method and device, storage medium and electronic equipment
CN110490272B (en) Image content similarity analysis method and device and storage medium
CN111899042B (en) Malicious exposure advertisement behavior detection method and device, storage medium and terminal
CN112733573B (en) Form detection method and device, mobile terminal and storage medium
CN113536876A (en) Image recognition method and related device
CN107194363B (en) Image saturation processing method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination