CN113011409A - Image identification method and device, electronic equipment and storage medium - Google Patents

Image identification method and device, electronic equipment and storage medium

Info

Publication number
CN113011409A
CN113011409A
Authority
CN
China
Prior art keywords
target
image
target detection
image segmentation
detection frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110359351.1A
Other languages
Chinese (zh)
Inventor
单海蛟
何小坤
熊泽法
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd filed Critical Beijing Century TAL Education Technology Co Ltd
Priority to CN202110359351.1A priority Critical patent/CN113011409A/en
Publication of CN113011409A publication Critical patent/CN113011409A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image recognition method and apparatus, an electronic device and a storage medium. A target image is acquired, and a pre-trained image segmentation model is used to obtain a target detection frame and an image segmentation result graph corresponding to the target detection frame. The target image is cut according to the target detection frame and the image segmentation result graph to obtain a target area corresponding to a target object, and the content of the target object in the target area is determined by an optical character recognition algorithm to obtain a recognition result. By combining the target detection algorithm and the image segmentation algorithm, the target image can be accurately cut into single target areas, interference from other text information is effectively reduced, the target area is accurately recognized, and the accuracy of image recognition is improved.

Description

Image identification method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an image recognition method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of artificial intelligence, searching for questions from images that contain question information in order to obtain professional answers has become a popular way of learning.
Existing image-based question search mainly relies on target detection: each question contained in the image is framed with a rectangular box, the multiple questions contained in the image are cut according to the framing result to obtain question areas each containing the information of a single question, character recognition is performed on the cut question areas, and a search is performed on the recognized content to obtain an accurate search result.
However, the question content in captured images is often tilted or distorted, and the question areas obtained by the prior art can hardly delimit the boundary of each question accurately. In particular, when an image contains several question areas, the framed areas easily overlap, so an area cut according to the framing result also contains other question information. Interfering text then appears before, after or in the middle of the recognition result of a single question, an accurate recognition result cannot be obtained, and the search accuracy is low.
Disclosure of Invention
To solve, or at least partially solve, the above technical problem, the present disclosure provides an image recognition method and apparatus, an electronic device and a storage medium.
In a first aspect, an embodiment of the present disclosure provides an image recognition method, including:
acquiring a target image, wherein the target image comprises one or more target objects;
according to the target image, obtaining a target detection frame and an image segmentation result graph corresponding to the target detection frame by using a pre-trained image segmentation model;
according to the target detection frame and the image segmentation result graph, cutting the target image to obtain a target area corresponding to the target object;
and determining the content of the target object in the target area by using an optical character recognition algorithm to obtain a recognition result.
Optionally, the cutting the target image according to the target detection frame and the image segmentation result map to obtain a target region corresponding to the target object includes:
cutting the target image according to the target detection frame to obtain a first target image;
determining the maximum connected region of the target object segmented from the image segmentation result graph according to the image segmentation result graph;
obtaining a minimum tilted rectangle of the contour according to the pixel points of the contour of the maximum connected region;
correcting the first target image according to the tilt angle of the minimum tilted rectangle;
and cutting the corrected first target image according to the width and the height of the minimum tilted rectangle to obtain a target area corresponding to the target object.
Optionally, before obtaining, according to the target image, a target detection frame corresponding to the target object and an image segmentation result map corresponding to the target detection frame by using an image segmentation model trained in advance, the method further includes:
inputting the target image into a pre-trained angle classification model to obtain an angle classification result of the target image, and rotating the target image according to the angle classification result;
and according to the rotated target image, obtaining a target detection frame and an image segmentation result graph corresponding to the target detection frame by using a pre-trained image segmentation model.
Optionally, the image segmentation model includes a target detection layer and an image segmentation layer, the target detection layer is configured to perform feature extraction and target detection on the target image to obtain target feature information and a target detection frame, and the image segmentation layer is configured to obtain the image segmentation result map according to the target feature information and the target detection frame.
Optionally, the image segmentation layer is configured to obtain the image segmentation result graph according to the target feature information and the target detection frame, and includes:
the image segmentation layer is used for determining first target feature information corresponding to the target detection frame in the target feature information, calculating a probability value of each pixel point in the first target feature information, and obtaining the image segmentation result graph according to the probability value of each pixel point.
Optionally, before the acquiring the target image, the method further includes generating an image segmentation model, including:
acquiring a first sample image and a first target detection frame containing a target object in the first sample image;
according to the first sample image and the first target detection frame, performing model training on a target detection layer in the image segmentation model to obtain a first target detection layer;
acquiring a second sample image and a second target segmentation map containing a target object in the second sample image;
and performing model training on a first target detection layer and an image segmentation layer in the image segmentation model according to the second sample image and the second target segmentation image.
Optionally, the performing model training on the target detection layer in the image segmentation model according to the first sample image and the first target detection frame to obtain a first target detection layer includes:
inputting the first sample image into a target detection layer in the image segmentation model to obtain a first predicted target detection frame;
determining a first loss function according to the first predicted target detection box and the first target detection box;
and updating the parameters of the target detection layer according to the first loss function to obtain a first target detection layer.
Optionally, the performing model training on the first target detection layer and the image segmentation layer in the image segmentation model according to the second sample image and the second target segmentation map includes:
inputting the second sample image into the first target detection layer to obtain second feature information and a second prediction target detection frame corresponding to the second sample image;
inputting the second feature information and the second prediction target detection frame into the image segmentation layer in the image segmentation model to obtain a second prediction target segmentation map;
determining a second loss function according to the second feature information, the second predicted target detection frame, the second predicted target segmentation map and the second target segmentation map;
and updating the parameters of the first target detection layer and the parameters of the image segmentation layer according to the second loss function.
In a second aspect, an embodiment of the present disclosure provides an image recognition apparatus, including:
The acquisition module is used for acquiring a target image, wherein the target image comprises one or more target objects.
And the image segmentation module is used for obtaining a target detection frame and an image segmentation result graph corresponding to the target detection frame by utilizing a pre-trained image segmentation model according to the target image.
And the image cutting module is used for cutting the target image according to the target detection frame and the image segmentation result graph to obtain a target area corresponding to the target object.
And the image recognition module is used for determining the content of the target object in the target area by using an optical character recognition algorithm to obtain a recognition result.
In a third aspect, an embodiment of the present disclosure provides an electronic device, which includes a memory; a processor; and a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method as described above.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the above-described method.
The embodiments of the present disclosure provide an image recognition method and apparatus, an electronic device and a storage medium. A target image is acquired, a pre-trained image segmentation model is used to obtain a target detection frame and an image segmentation result graph corresponding to the target detection frame, the target image is cut according to the target detection frame and the image segmentation result graph to obtain a target area corresponding to a target object, and the content of the target object in the target area is determined by an optical character recognition algorithm to obtain a recognition result. By combining the target detection algorithm and the image segmentation algorithm to cut the target image, the boundary of each question can be accurately distinguished and a target area containing a single question obtained, overlap between the target areas of different questions is effectively reduced, the target area is accurately recognized, interfering text in the recognition of a single question is avoided, and the accuracy of image recognition is effectively improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious to those skilled in the art that other drawings can also be obtained from these drawings without inventive effort.
FIG. 1 is a network architecture diagram of a target detection algorithm provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a block selection result based on a target detection algorithm according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of an application scenario provided by the embodiment of the present disclosure;
fig. 4 is a schematic diagram of an image segmentation model training method provided in an embodiment of the present disclosure;
FIG. 5 is a diagram of a network structure of an image segmentation model provided in an embodiment of the present disclosure;
fig. 6 is a schematic diagram of an image segmentation model training method provided in an embodiment of the present disclosure;
fig. 7 is a flowchart of an image recognition method according to an embodiment of the present disclosure;
fig. 8 is a flowchart of an image recognition method provided by an embodiment of the present disclosure;
fig. 9 is a schematic diagram of an image recognition method according to an embodiment of the present disclosure;
fig. 10 is a schematic diagram of a framing result of an image recognition method according to an embodiment of the disclosure;
fig. 11 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
Existing target detection algorithms are mainly single-stage methods, such as the real-time target detection algorithm Yolo (You Only Look Once), which can locate a target quickly but can hardly segment question boundaries accurately, while the question content in actually captured images is prone to tilt and distortion. Existing target detection algorithms therefore have difficulty obtaining single question areas, and the accuracy of the search results is low.
The current Yolo series of detection algorithms offers the most balanced trade-off between speed and precision among target detection networks and has evolved from Yolo v1 to Yolo v5, reaching up to 140 FPS at its fastest. Fig. 1 is a network structure diagram of a target detection algorithm provided in an embodiment of the present disclosure. The Yolo network structure 100 shown in Fig. 1 comprises an input layer 110, a backbone layer 120, a sampling layer 130 and an output layer 140. The input layer 110 performs data enhancement (Mosaic) and adaptive anchor frame calculation on the data fed into the network and passes the processed feature map to the backbone layer 120 for slicing and convolutional processing (using Focus and CSP structures); the sampling layer 130 up-samples and down-samples the feature map output by the backbone layer 120; and the output layer 140 scores the output detection results and outputs the detection result frames.
Taking the network structure of Fig. 1 as an example, features at 3 different scales are obtained through the input layer 110, the backbone layer 120 and the sampling layer 130. Assuming the width and height of the input image are 512, the sizes of the output features are 64, 32 and 16, and the channel counts of the 3 scales are 256, 512 and 1024, respectively. The features at the 3 scales finally pass through the convolution layers in the output layer 140 to produce all candidate detection results at the 3 scales, where an output frame in a detection result comprises probability information and coordinate information, the coordinates being represented as (x, y, w, h), i.e., the center-point coordinates, width and height of the rectangular frame. All detection results are then passed together through non-maximum suppression (NMS) in the output layer 140 to obtain the most accurate detection result frames.
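To make the shapes and post-processing concrete, the following minimal Python sketch is given for illustration only; the fabricated candidate boxes and the use of torchvision's NMS operator are assumptions, not part of this disclosure. It decodes (x, y, w, h) center-form candidates into corner form and filters them with NMS:

    import torch
    from torchvision.ops import nms

    # For a 512 x 512 input, the three feature maps have sizes 64, 32 and 16
    # with 256, 512 and 1024 channels, as described above.
    feats = [torch.randn(1, 256, 64, 64),
             torch.randn(1, 512, 32, 32),
             torch.randn(1, 1024, 16, 16)]

    # Fabricated candidates already decoded to (x, y, w, h) center form.
    boxes_xywh = torch.tensor([[100., 80., 60., 40.],
                               [102., 82., 58., 42.],   # near-duplicate of box 0
                               [300., 200., 90., 50.]])
    scores = torch.tensor([0.9, 0.8, 0.7])

    # NMS works on corner coordinates (x1, y1, x2, y2).
    corners = torch.cat([boxes_xywh[:, :2] - boxes_xywh[:, 2:] / 2,
                         boxes_xywh[:, :2] + boxes_xywh[:, 2:] / 2], dim=1)
    keep = nms(corners, scores, iou_threshold=0.5)
    print(keep)  # tensor([0, 2]): the near-duplicate frame is suppressed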
However, owing to the network design, the coordinates output by the Yolo detection model describe an axis-aligned rectangle, while questions in a shooting scene are easily tilted. If a detected single question is represented by an axis-aligned rectangular box, the boxes of multiple questions will overlap: as shown in Fig. 2, the framing of the individual questions overlaps, and a cut question area will also contain other question information; for example, the content framed for question B in Fig. 2 contains information from question C. The result of the optical character recognition (OCR) algorithm will then also contain text from other questions, and when that text is relatively long, the recognition result of the current question is likely to be affected.
The image recognition method may be performed by a terminal or a server. Specifically, the terminal or the server may perform target detection and image segmentation on the target object in the target image through an image segmentation model. The execution subject of the training method of the image segmentation model and the execution subject of the image recognition method may be the same or different.
For example, in one application scenario, as shown in FIG. 3, server 320 trains an image segmentation model. The terminal 310 obtains the trained image segmentation model from the server 320, and the terminal 310 performs target detection and image segmentation on the target object in the target image through the trained image segmentation model. The target image may be captured by the terminal 310. Alternatively, the target image is obtained by the terminal 310 from another device. Still alternatively, the target image is an image obtained by image processing of a preset image by the terminal 310, where the preset image may be obtained by shooting by the terminal 310, or the preset image may be obtained by the terminal 310 from another device. Here, the other devices are not particularly limited.
In another application scenario, the server 320 trains the image segmentation model. Further, the server 320 performs target detection and image segmentation on the target object in the target image through the trained image segmentation model. The manner in which the server 320 acquires the target image may be similar to the manner in which the terminal 310 acquires the target image as described above, and will not be described herein again.
In yet another application scenario, the terminal 310 trains an image segmentation model. Further, the terminal 310 performs target detection and image segmentation on the target object in the target image through the trained image segmentation model.
It can be understood that the image segmentation model training method and the image recognition method provided by the embodiments of the present disclosure are not limited to the several possible scenarios described above. Since the trained image segmentation model is applied in the image recognition method, the image segmentation model training method is described first, before the image recognition method.
Taking the example of training the image segmentation model by the server 320, a method for training the image segmentation model, that is, a training process of the image segmentation model, is described below. It is understood that the image segmentation model training method is equally applicable to the scenario in which the terminal 310 trains the image segmentation model.
Fig. 4 is a schematic diagram of an image segmentation model training method provided in an embodiment of the present disclosure. The image segmentation model comprises a target detection layer and an image segmentation layer; the image segmentation model network structure 500 is shown in Fig. 5. The target detection layer comprises the input layer 110, the backbone layer 120, the sampling layer 130 and the output layer 140 shown in Fig. 5, that is, the network structure of the Yolo target detection algorithm described above, and is used to perform feature extraction and target detection on the target image to obtain target feature information and a target detection frame. The image segmentation layer comprises the segmentation layer 150 and is used to obtain the image segmentation result graph according to the target feature information and the target detection frame. As shown in Fig. 4, the method comprises the following steps:
s410, acquiring a first sample image and a first target detection frame containing a target object in the first sample image.
In this embodiment, the first sample image may specifically refer to an image containing the information of one or more questions; correspondingly, the target object may specifically refer to each question in the first sample image, and the first target detection frame may specifically refer to the image in which each question contained in the first sample image is framed, the framing result of each question on the first sample image being accurate.
Optionally, the first sample image may be an image shot by a terminal that contains the information of one or more questions, or an image obtained through operations such as screenshot or downloading. The question information may specifically refer to a mathematics question, a language question or the like, and may also refer to the content of an article, a newspaper or a webpage containing segments of text information; text recognition can then be performed with the image recognition method described in this embodiment, which is not limited here.
And S420, performing model training on a target detection layer in the image segmentation model according to the first sample image and the first target detection frame to obtain a first target detection layer.
Understandably, the target detection layer in the image segmentation model is trained according to the first sample image obtained in S410 and the first target detection frame serving as the label, so as to obtain the trained first target detection layer, where the target detection layer may be constructed as the target detection network (Yolo) described above.
Optionally, the specific implementation step of S420 includes: inputting the first sample image into a target detection layer in the image segmentation model to obtain a first predicted target detection frame; determining a first loss function according to the first predicted target detection box and the first target detection box; and updating the parameters of the target detection layer according to the first loss function to obtain a first target detection layer.
Understandably, the first sample image is input into the constructed target detection layer to obtain a first predicted target detection frame, i.e., the result of the target detection layer, namely the Yolo network, framing the target object, that is, the question information. A first loss function of the target detection layer is then determined according to the first predicted target detection frame and the first target detection frame serving as the label; the specific formula of the first loss function is not limited and can be chosen according to the input image. The parameters of the target detection layer are updated step by step according to the first loss function, so that the first target detection layer with updated network parameters is obtained and stored.
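As an illustration of S420, a single training step might be sketched as follows; detection_layer, detection_loss and the optimizer are placeholders, since the disclosure fixes neither a particular network nor a loss formula:

    import torch

    def train_detection_step(detection_layer, detection_loss, optimizer,
                             first_sample_image, first_target_frames):
        # First predicted target detection frame for the first sample image.
        first_predicted_frames = detection_layer(first_sample_image)
        # First loss function: predicted frames against the labelled frames.
        loss = detection_loss(first_predicted_frames, first_target_frames)
        optimizer.zero_grad()
        loss.backward()   # gradients w.r.t. the target detection layer parameters
        optimizer.step()  # the update yields the first target detection layer
        return loss.item()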
S430, acquiring a second sample image and a second target segmentation map containing a target object in the second sample image.
In this embodiment, the second sample image may specifically be an image containing the information of one or more questions, and the target object may specifically be each question contained in the second sample image. The second target segmentation map may specifically be the result map obtained by segmenting the target object inside a second target detection frame framing the one or more questions in the second sample image, that is, by separating the target object in the second target detection frame as foreground from the background. The number of second target segmentation maps obtained is the same as the number of questions contained in the second sample image, that is, each question in the second detection frame is segmented, and the segmentation result of the second target segmentation map is accurate.
S440, according to the second sample image and the second target segmentation image, performing model training on a first target detection layer and an image segmentation layer in the image segmentation model.
Optionally, the image segmentation layer is configured to determine first target feature information corresponding to the target detection box in the target feature information, calculate a probability value of each pixel in the first target feature information, and obtain the image segmentation result graph according to the probability value of each pixel.
Understandably, in S440, the first target detection layer obtained in S420 and the image segmentation layer in the image segmentation model are model-trained using the second sample image and the second target segmentation map obtained in S430, so as to generate the image segmentation model.
The image segmentation model training method provided by the embodiment of the present disclosure trains the target detection layer in the image segmentation model with a first sample image and the first target detection frame corresponding to it, and then performs model training on the first target detection layer and the image segmentation layer in the image segmentation model with a second sample image and the second target segmentation map corresponding to it, so as to obtain the image segmentation model. Joint training of the pre-trained target detection layer and the image segmentation layer on new sample images keeps the network layers converging; this not only further improves the training precision of the network model, accelerates the convergence of the model and keeps the network training stable, but also ensures that, after the image segmentation layer is added, the accuracy of the model is not lower than that of the original target detection layer, so the accuracy of the image segmentation model is effectively guaranteed.
Fig. 6 is a schematic diagram of an image segmentation model training method provided in an embodiment of the present disclosure. On the basis of the foregoing embodiment, optionally, the model training of the first target detection layer and the image segmentation layer in the image segmentation model according to the second sample image and the second target segmentation map is implemented by the following steps, as shown in Fig. 6:
s610, inputting the second sample image into the first target detection layer, and obtaining second feature information and a second prediction target detection frame corresponding to the second sample image.
It can be understood that the first target detection layer is the network layer trained and updated with the first sample image; it extracts the second feature information, i.e., the image features of the second sample image, and the second predicted target detection frame. The trained first target detection layer therefore frames the question information in the second sample image with relatively high accuracy, which facilitates the training of the image segmentation layer in the image segmentation model.
S620, inputting the second feature information and the second prediction target detection frame into the image segmentation layer in the image segmentation model to obtain a second prediction target segmentation map.
Understandably, the second feature information and the second predicted target detection frame obtained in S610 are input into the image segmentation layer in the image segmentation model. The image segmentation layer determines the target feature information corresponding to the second predicted target detection frame within the second feature information, calculates a probability value for each pixel point in that target feature information, and obtains the second predicted target segmentation map according to the probability value of each pixel point.
Optionally, as shown in Fig. 5, the segmentation layer 150 may include a convolution layer, a regional feature aggregation layer and an instance segmentation layer. The regional feature aggregation layer (ROI Align) may be used to determine the target feature information corresponding to the second predicted target detection frame in the second feature information and to scale that target feature information, preferably to a fixed size of 7 × 7. The instance segmentation layer (mask predictor) calculates the probability value of each pixel point in the target region scaled to 7 × 7 and performs image segmentation to obtain the second predicted target segmentation map, where the number of second predicted target segmentation maps is the same as the number of target objects framed by the second predicted target detection frame.
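The pooling and mask prediction described above can be sketched as follows; only ROI Align and the 7 × 7 size come from the text, while the stride, the feature shapes and the two-layer mask head are illustrative assumptions:

    import torch
    from torchvision.ops import roi_align

    features = torch.randn(1, 256, 64, 64)           # second feature information
    frames = [torch.tensor([[40., 32., 120., 64.]])] # predicted frame, (x1, y1, x2, y2)

    # Stride 8 relative to a 512 x 512 input, hence spatial_scale = 1/8;
    # ROI Align scales the framed features to the fixed 7 x 7 size.
    pooled = roi_align(features, frames, output_size=(7, 7), spatial_scale=1.0 / 8)

    mask_head = torch.nn.Sequential(
        torch.nn.Conv2d(256, 256, 3, padding=1), torch.nn.ReLU(),
        torch.nn.Conv2d(256, 1, 1))                  # one foreground logit per pixel
    probabilities = torch.sigmoid(mask_head(pooled)) # probability value of each pixel
    print(probabilities.shape)                       # torch.Size([1, 1, 7, 7])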
S630, determining a second loss function according to the second feature information, the second predicted target detection frame, the second predicted target segmentation map and the second target segmentation map.
Understandably, a second loss function is determined according to the second feature information and the second predicted target detection frame obtained in S610, the second predicted target segmentation map obtained in S620, and the second target segmentation map.
And S640, updating the parameters of the first target detection layer and the parameters of the image segmentation layer according to the second loss function.
Understandably, according to the second loss function obtained in S630, the parameters of the first target detection layer and the parameters of the image segmentation layer are updated to obtain the image segmentation model, where the first target detection layer is the updated target detection layer.
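A minimal sketch of the joint update in S610 to S640 follows; the call signatures and the equal weighting of the two loss terms are assumptions, since the disclosure does not fix the second loss function:

    import torch

    def joint_training_step(first_target_detection_layer, image_segmentation_layer,
                            detection_loss, mask_loss, optimizer,
                            second_sample_image, target_frames, target_masks):
        # S610: second feature information and second predicted detection frame.
        feats, predicted_frames = first_target_detection_layer(second_sample_image)
        # S620: second predicted target segmentation map.
        predicted_masks = image_segmentation_layer(feats, predicted_frames)
        # S630: second loss function; the 1:1 weighting of the terms is assumed.
        loss = (detection_loss(predicted_frames, target_frames)
                + mask_loss(predicted_masks, target_masks))
        # S640: one step updates the parameters of both layers.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()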
With the image segmentation model training method provided above, training the image segmentation model updates both the parameters of the first target detection layer and the parameters of the image segmentation layer. On the basis of the trained target detection layer, repeated iterative training updates the parameters of the first target detection layer and of the image segmentation layer simultaneously, so the image segmentation model becomes increasingly accurate and its convergence faster and more stable, which improves the accuracy of the image segmentation model.
Fig. 7 is a flowchart of an image recognition method according to an embodiment of the disclosure. For example, the image recognition method may be performed by the terminal 310. Similarly, the image recognition method may also be performed by the server 320. Specifically, the terminal 310 may obtain a trained image segmentation model from the server 320, and further, the terminal 310 performs image recognition on a target object in the target image according to the trained image segmentation model. Specifically, the method illustrated in fig. 7 includes the following steps:
s710, acquiring a target image, wherein the target image comprises one or more target objects.
Optionally, the target image may specifically refer to an image shot, captured or received by a user, where the target image contains one or more target objects; for example, a shot image containing the information of one or more questions serves as the target image, and the target object may be the content of each question contained in the target image, e.g., the content corresponding to question A, question B or question C in Fig. 2.
Optionally, the size of the obtained target image is normalized: the height and width of the target image are compared with a preset maximum side length, and the target image is scaled proportionally according to the comparison result, so that the long side of the image is smaller than or equal to the preset maximum side length.
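A minimal sketch of this normalization, assuming OpenCV and an illustrative maximum side length of 1024 (the disclosure does not fix the value):

    import cv2

    def normalize_size(image, max_side=1024):
        h, w = image.shape[:2]
        if max(h, w) <= max_side:
            return image                     # long side already within the bound
        scale = max_side / max(h, w)         # equal-ratio scaling factor
        return cv2.resize(image, (int(w * scale), int(h * scale)))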
S720, according to the target image, obtaining a target detection frame and an image segmentation result graph corresponding to the target detection frame by using a pre-trained image segmentation model.
It can be understood that the image segmentation model trained in the above embodiment performs target detection and image segmentation on the target image obtained in S710 to obtain a target detection frame and an image segmentation result graph corresponding to the target detection frame. The target detection frame frames all the questions contained in the target image on the target image, and the image segmentation result graph is the image segmentation of the question information framed by the target detection frame; that is, the size of the target detection frame of each question is the same as the size of its image segmentation result graph.
And S730, cutting the target image according to the target detection frame and the image segmentation result graph to obtain a target area corresponding to the target object.
Understandably, the target image is cut according to the target detection frame and the image segmentation result graph obtained in S720 to obtain the target area corresponding to the target object. Optionally, the target image is cut according to the target detection frame to obtain a first target image; the maximum connected region of the target object segmented in the image segmentation result graph is determined according to the image segmentation result graph; a minimum tilted rectangle of the contour is obtained according to the pixel points of the contour of the maximum connected region; the first target image is corrected according to the tilt angle of the minimum tilted rectangle; and the corrected first target image is cut according to the width and the height of the minimum tilted rectangle to obtain the target area corresponding to the target object.
Understandably, the target image is cut according to the coordinate information of the target detection frame to obtain the first target image corresponding to the target object in the target image. The foreground segmented in the image segmentation result graph, i.e., the maximum connected region of the target object, is then determined, where the number of image segmentation result graphs obtained by the image segmentation model is the same as the number of first target images. The minimum tilted rectangle of the contour is obtained according to the pixel points of the contour of the maximum connected region; the minimum tilted rectangle can be represented by the center-point coordinates (x, y), the width and height (width, height) of the tilted rectangle and the tilt angle θ, where θ is the angle through which the horizontal axis (x-axis) is rotated counterclockwise until it first touches an edge of the rectangle, the length of that edge being the width and the length of the other edge being the height. The first target image is rotated upright according to the tilt angle θ of each minimum tilted rectangle, and the minimal question area is cut out of the first target image according to the center point and the width and height of the minimum tilted rectangle.
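This deskew-and-crop step maps naturally onto OpenCV primitives; the following sketch is an illustration under stated assumptions (OpenCV 4, a binary mask the size of the first target image, and cv2.minAreaRect's angle convention standing in for the θ defined above):

    import cv2
    import numpy as np

    def crop_target_area(first_target_image, segmentation_mask):
        # Contour of the maximum connected region of the segmented foreground.
        contours, _ = cv2.findContours(segmentation_mask.astype(np.uint8),
                                       cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        largest = max(contours, key=cv2.contourArea)

        # Minimum tilted rectangle: center (x, y), (width, height) and angle theta.
        (cx, cy), (w, h), theta = cv2.minAreaRect(largest)

        # Rotate the image about the rectangle center to correct the tilt.
        rotation = cv2.getRotationMatrix2D((cx, cy), theta, 1.0)
        ih, iw = first_target_image.shape[:2]
        upright = cv2.warpAffine(first_target_image, rotation, (iw, ih))

        # Cut out the rectangle around its center using its width and height.
        return cv2.getRectSubPix(upright, (int(w), int(h)), (cx, cy))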
It can be understood that, in this embodiment, the finally calculated minimum tilted rectangle information may also be output to the user; the user then cuts out the desired question area according to the information of the minimum tilted rectangle and returns the final question area to the terminal or the server for recognition.
S740, determining the content of the target object in the target area by using an optical character recognition algorithm to obtain a recognition result.
It can be understood that the content of the target object in the target area obtained in S730 is determined by an optical character recognition algorithm to obtain a recognition result; the target area obtained may be an image containing only the information of a single question, that is, only the content of one target object.
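As an illustration of S740, assuming the Tesseract engine stands in for the unspecified optical character recognition algorithm:

    import pytesseract

    def recognize_target_area(target_area_image):
        # The cut area contains a single question; return its text content.
        return pytesseract.image_to_string(target_area_image, lang='chi_sim+eng')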
With the image recognition method provided by the embodiment of the present disclosure, a target image is acquired, a pre-trained image segmentation model is used to obtain a target detection frame and an image segmentation result graph corresponding to the target detection frame, the target image is cut according to the target detection frame and the image segmentation result graph to obtain the target area corresponding to the target object, and the content of the target object in the target area is determined by an optical character recognition algorithm to obtain a recognition result. Because the target detection algorithm and the image segmentation algorithm are combined to segment the target object inside the target detection frame before the target image is cut, the boundary of each question can be accurately distinguished and a target area containing a single question obtained, overlap between the target areas of different questions is effectively reduced, the target area is accurately recognized, interfering text in the recognition of a single question is avoided, and the accuracy of image recognition is effectively improved.
Fig. 8 is a flowchart of an image recognition method according to an embodiment of the present disclosure. On the basis of the foregoing embodiment, optionally, before the target detection frame corresponding to the target object and the image segmentation result graph corresponding to the target detection frame are obtained from the target image with the pre-trained image segmentation model, the method further includes:
and S810, inputting the target image into a pre-trained angle classification model to obtain an angle classification result of the target image, and rotating the target image according to the angle classification result.
Understandably, the acquired target image is input into a pre-trained angle classification model, which judges the angle of the target object in the target image, and the target image is rotated according to the determined angle classification result. Preferably, the angle classes determined by the angle classification model may be the four directions 0, 90, 180 and 270 degrees, and the target image can be corrected according to the angle classification result.
Optionally, a convolutional neural network may be selected to construct an angle classification model, and the constructed network is trained to obtain the angle classification model.
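A minimal sketch of such a four-direction classifier and the corresponding correction follows; the architecture is an illustrative assumption, as the disclosure only requires some convolutional classifier over the classes 0, 90, 180 and 270 degrees:

    import torch
    import torch.nn as nn

    angle_classifier = nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 4))                 # logits for 0, 90, 180 and 270 degrees

    def correct_rotation(image_tensor):
        # image_tensor: (1, 3, H, W); k is the predicted multiple of 90 degrees.
        with torch.no_grad():
            k = angle_classifier(image_tensor).argmax(dim=1).item()
        return torch.rot90(image_tensor, k=-k, dims=(2, 3))  # undo the rotation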
And S820, according to the rotated target image, obtaining a target detection frame and an image segmentation result graph corresponding to the target detection frame by using a pre-trained image segmentation model.
It can be understood that the pre-trained image segmentation model performs target detection and image segmentation on the rotated target image obtained in S810 to obtain a target detection frame framing one or more target objects in the rotated target image and an image segmentation result graph corresponding to the target detection frame. The subsequent steps of cutting and recognizing the target objects according to the target detection frame and the image segmentation result graph are the same as in the above embodiment and are not repeated here.
With the image recognition method provided by the embodiment of the present disclosure, the target image is classified by angle, the rotated target image is obtained according to the classification result, and target framing, image segmentation, image recognition and other operations are performed on the corrected target image; this effectively improves the accuracy of question framing and ensures that the question content in the cut target area is recognized correctly.
Fig. 9 is a schematic diagram of an image recognition method according to an embodiment of the present disclosure. On the basis of the above embodiments, the results obtained in each step of the image recognition method are described below, taking Fig. 9 as an example.
Taking Fig. 2 as the acquired target image, the question information contained in question A, question B and question C serves as the target objects, and question B in Fig. 2 is taken as the example for describing each step in detail.
According to the target image, a pre-trained image segmentation model is used to obtain the target detection frames and the image segmentation result graphs corresponding to the target detection frames. The target detection frames may be as shown in Fig. 2: all the question information contained in the target image is framed to obtain the target detection frames, and an image segmentation result graph corresponding to each framed question in the target detection frames is also obtained. For example, Fig. 2 contains 3 question detection frames, so 3 image segmentation result graphs are obtained after the image segmentation processing.
Cutting the target image according to the target detection frame and the image segmentation result graph to obtain a target area corresponding to the target object, which may specifically include:
in Fig. 2, cutting is performed according to the framing result of each target object to obtain a first target image 910 containing a single target object, i.e., the information of one question;
determining, according to the image segmentation result graph 920, the maximum connected region of the target object, namely question B, segmented in the image segmentation result graph 920;
obtaining a minimum tilted rectangle 940 of the contour according to the pixel points of the contour 930 of the maximum connected region, where the gray line in 930 is the contour of the maximum connected region; in the minimum tilted rectangle 940, θ is the tilt angle and the black dot marks the center point of the minimum tilted rectangle. Framing the questions in the obtained target image according to the minimum tilted rectangle information yields the framing result shown in Fig. 10; compared with the framing result shown in Fig. 2, the framing of the question information is more accurate;
and correcting the corresponding first target image 910 according to the tilt angle of the minimum tilted rectangle to obtain 950, the image produced by rotating the first target image 910 by the tilt angle, and cutting out the minimal question area according to the center point and the width and height of the minimum tilted rectangle, that is, cutting 950 according to the center point and the width and height of the minimum tilted rectangle to obtain 960, the target area corresponding to the target object.
The content of the target object in the target area 960 is determined by the optical character recognition algorithm, and a recognition result 970, namely the question information of question B, is obtained.
With the image recognition method provided by the embodiment of the present disclosure, the target detection algorithm and the image segmentation algorithm are combined to cut the target image, so the boundary of each question can be accurately distinguished and a target area containing a single question obtained; overlap between the target areas of different questions is effectively reduced, the target area is accurately recognized, interfering text in the recognition of a single question is avoided, and the accuracy of image recognition is effectively improved.
Fig. 11 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present disclosure. The apparatus 1100 comprises an acquisition module 1101, an image segmentation module 1102, an image cropping module 1103, and an image recognition module 1104.
The acquisition module 1101 is configured to acquire a target image, where the target image includes one or more target objects.
And an image segmentation module 1102, configured to obtain, according to the target image, a target detection frame and an image segmentation result map corresponding to the target detection frame by using a pre-trained image segmentation model.
And the image cropping module 1103 is configured to cut the target image according to the target detection frame and the image segmentation result graph to obtain the target area corresponding to the target object.
And the image recognition module 1104 is configured to determine the content of the target object in the target area by using an optical character recognition algorithm, so as to obtain a recognition result.
Optionally, the image recognition apparatus 1100 further includes an image rotation module, where the image rotation module is configured to input the target image into a pre-trained angle classification model to obtain an angle classification result of the target image, rotate the target image according to the angle classification result, and obtain, according to the rotated target image, a target detection frame and an image segmentation result graph corresponding to the target detection frame by using the pre-trained image segmentation model.
Optionally, the image cropping module 1103 is specifically configured to: cut the target image according to the target detection frame to obtain the first target image; determine, according to the image segmentation result graph, the maximum connected region of the target object segmented in the image segmentation result graph; obtain a minimum tilted rectangle of the contour according to the pixel points of the contour of the maximum connected region; and correct the first target image according to the minimum tilted rectangle and cut the corrected first target image according to the minimum tilted rectangle to obtain the target area corresponding to the target object.
Understandably, the image cropping module 1103 is connected to the image segmentation module 1102 and the acquisition module 1101; it cuts the target image obtained by the acquisition module 1101 according to the target detection frame obtained by the image segmentation module 1102 to obtain the first target image, determines the minimum tilted rectangle according to the image segmentation result graph obtained by the image segmentation module 1102, and cuts the corrected first target image according to the minimum tilted rectangle to obtain the target area corresponding to the target object.
Fig. 11 shows an image recognition apparatus provided in an embodiment of the present disclosure, which can be used to implement the technical solutions of the method embodiments; the implementation principle and technical effect are similar and are not repeated here.
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure; the electronic device 1200 may be the terminal or the server described above. The electronic device provided in the embodiment of the present disclosure may execute the processing procedure provided in the embodiments of the image recognition method. As shown in Fig. 12, the electronic device 1200 includes a memory 1210, a processor 1220 and a communication interface 1230, where a computer program is stored in the memory 1210 and configured to be executed by the processor 1220 to perform the image recognition method described above.
In addition, the embodiments of the present disclosure also provide a computer program product, which includes a computer program or instructions, and when the computer program or instructions are executed by a processor, the image recognition method as described above is implemented.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. An image recognition method, comprising:
acquiring a target image, wherein the target image comprises one or more target objects;
according to the target image, obtaining a target detection frame and an image segmentation result graph corresponding to the target detection frame by using a pre-trained image segmentation model;
according to the target detection frame and the image segmentation result graph, cutting the target image to obtain a target area corresponding to the target object;
and determining the content of the target object in the target area by using an optical character recognition algorithm to obtain a recognition result.
2. The method according to claim 1, wherein the cropping the target image according to the target detection frame and the image segmentation result map to obtain a target region corresponding to the target object comprises:
cropping the target image according to the target detection frame to obtain a first target image;
determining, according to the image segmentation result map, a maximum connected region of the target object segmented in the image segmentation result map;
obtaining a minimum tilted rectangle of the outline according to the pixel points of the outline of the maximum connected region;
correcting the first target image according to the tilt angle of the minimum tilted rectangle;
and cropping the corrected first target image according to the width and the height of the minimum tilted rectangle to obtain a target region corresponding to the target object.
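A possible reading of this cropping step, sketched with OpenCV primitives (the claim itself is library-agnostic, and the mask is assumed to share the first target image's coordinate frame):

    import cv2
    import numpy as np

    def crop_target_region(target_image, frame, mask):
        # Step 1: crop by the detection frame to obtain the first target image.
        x, y, w, h = frame
        first_target_image = target_image[y:y + h, x:x + w]

        # Step 2: outline of the maximum connected region in the result map.
        contours, _ = cv2.findContours(mask.astype(np.uint8),
                                       cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        outline = max(contours, key=cv2.contourArea)

        # Step 3: minimum tilted rectangle around the outline's pixel points.
        (cx, cy), (rw, rh), angle = cv2.minAreaRect(outline)

        # Step 4: rotate by the tilt angle to correct the first target image.
        rotation = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
        hh, ww = first_target_image.shape[:2]
        corrected = cv2.warpAffine(first_target_image, rotation, (ww, hh))

        # Step 5: crop by the rectangle's width and height.
        return cv2.getRectSubPix(corrected, (int(rw), int(rh)), (cx, cy))

cv2.minAreaRect returns exactly the center, width, height, and tilt angle that the correction and final crop of claim 2 rely on, which is why it is a natural stand-in here.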
3. The method according to claim 1, wherein before the obtaining, according to the target image, a target detection frame and an image segmentation result map corresponding to the target detection frame by using a pre-trained image segmentation model, the method further comprises:
inputting the target image into a pre-trained angle classification model to obtain an angle classification result of the target image, and rotating the target image according to the angle classification result;
and according to the rotated target image, obtaining a target detection frame and an image segmentation result map corresponding to the target detection frame by using the pre-trained image segmentation model.
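One plausible realization of this orientation step assumes a four-class angle classifier (0, 90, 180, 270 degrees); the claim does not fix the class set, so both the label convention and angle_model below are assumptions:

    import cv2
    import numpy as np

    ANGLE_CLASSES = [0, 90, 180, 270]  # assumed labels, clockwise convention

    def rotate_by_classification(target_image, angle_model):
        # Hypothetical classifier returning one score per angle class.
        scores = angle_model(target_image)
        predicted = ANGLE_CLASSES[int(np.argmax(scores))]

        # Undo the detected rotation (direction follows the assumed labels).
        undo = {90: cv2.ROTATE_90_COUNTERCLOCKWISE,
                180: cv2.ROTATE_180,
                270: cv2.ROTATE_90_CLOCKWISE}
        if predicted in undo:
            target_image = cv2.rotate(target_image, undo[predicted])
        return target_image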
4. The method according to claim 1, wherein the image segmentation model includes a target detection layer and an image segmentation layer, the target detection layer is configured to perform feature extraction and target detection on the target image to obtain target feature information and a target detection frame, and the image segmentation layer is configured to obtain the image segmentation result map according to the target feature information and the target detection frame.
5. The method according to claim 4, wherein the image segmentation layer is configured to obtain the image segmentation result map according to the target feature information and the target detection frame, which comprises:
the image segmentation layer is used for determining first target feature information corresponding to the target detection frame in the target feature information, calculating a probability value of each pixel point in the first target feature information, and obtaining the image segmentation result map according to the probability value of each pixel point.
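A segmentation layer of this shape could be sketched in PyTorch as follows; the ROI size and the head architecture are assumptions, since the claims only require per-pixel probabilities computed on the features inside each detection frame:

    import torch
    import torch.nn as nn
    from torchvision.ops import roi_align

    class SegmentationLayer(nn.Module):
        def __init__(self, in_channels=256):
            super().__init__()
            self.head = nn.Sequential(
                nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(128, 1, kernel_size=1),
            )

        def forward(self, target_features, detection_frames):
            # First target feature information: the features inside each
            # detection frame (a list of (x1, y1, x2, y2) boxes per image),
            # resampled to a fixed size.
            rois = roi_align(target_features, detection_frames,
                             output_size=(28, 28))
            # Probability value of each pixel point; thresholding (e.g. > 0.5)
            # would turn this into the binary segmentation result map.
            return torch.sigmoid(self.head(rois))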
6. The method of claim 1, wherein, prior to acquiring the target image, the method further comprises generating the image segmentation model, the generating comprising:
acquiring a first sample image and a first target detection frame containing a target object in the first sample image;
according to the first sample image and the first target detection frame, performing model training on a target detection layer in the image segmentation model to obtain a first target detection layer;
acquiring a second sample image and a second target segmentation map containing a target object in the second sample image;
and performing model training on the first target detection layer and an image segmentation layer in the image segmentation model according to the second sample image and the second target segmentation map.
7. The method of claim 6, wherein the performing model training on the target detection layer in the image segmentation model according to the first sample image and the first target detection frame to obtain a first target detection layer comprises:
inputting the first sample image into a target detection layer in the image segmentation model to obtain a first predicted target detection frame;
determining a first loss function according to the first predicted target detection frame and the first target detection frame;
and updating the parameters of the target detection layer according to the first loss function to obtain a first target detection layer.
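The first training stage could look like the following PyTorch sketch; the claim does not name the first loss function, so smooth L1 on the frame coordinates is used purely as a stand-in, and the detection layer's call signature is assumed:

    import torch
    import torch.nn.functional as F

    def train_first_stage(detection_layer, loader, epochs=10, lr=1e-4):
        optimizer = torch.optim.Adam(detection_layer.parameters(), lr=lr)
        for _ in range(epochs):
            for first_sample_image, first_target_frame in loader:
                # Hypothetical layer returning predicted frame coordinates.
                first_predicted_frame = detection_layer(first_sample_image)
                # Stand-in first loss: smooth L1 between predicted and
                # labeled detection frames.
                loss = F.smooth_l1_loss(first_predicted_frame,
                                        first_target_frame)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()  # update the detection-layer parameters
        return detection_layer  # the first target detection layer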
8. The method of claim 6, wherein the model training of the first target detection layer and the image segmentation layer in the image segmentation model according to the second sample image and the second target segmentation map comprises:
inputting the second sample image into the first target detection layer to obtain second feature information and a second predicted target detection frame corresponding to the second sample image;
inputting the second feature information and the second predicted target detection frame into an image segmentation layer in the image segmentation model to obtain a second predicted target segmentation map;
determining a second loss function according to the second feature information, the second predicted target detection frame, the second predicted target segmentation map and the second target segmentation map;
and updating the parameters of the first target detection layer and the parameters of the image segmentation layer according to the second loss function.
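The second stage might then fine-tune both layers jointly, as in the sketch below. The exact composition of the second loss is left open by the claim (which also conditions it on the second feature information, omitted in this simplification), so the sum of a frame term and a per-pixel term is an assumption:

    import torch
    import torch.nn.functional as F

    def train_second_stage(first_detection_layer, segmentation_layer,
                           loader, epochs=10, lr=1e-4):
        parameters = (list(first_detection_layer.parameters())
                      + list(segmentation_layer.parameters()))
        optimizer = torch.optim.Adam(parameters, lr=lr)
        for _ in range(epochs):
            for second_sample_image, second_target_map, target_frame in loader:
                # Second feature information and second predicted frame
                # (hypothetical two-output call signature).
                predicted_frame, features = first_detection_layer(
                    second_sample_image)
                # Second predicted segmentation map, assumed to be returned
                # as per-pixel probabilities in [0, 1].
                predicted_map = segmentation_layer(features, predicted_frame)
                loss = (F.smooth_l1_loss(predicted_frame, target_frame)
                        + F.binary_cross_entropy(predicted_map,
                                                 second_target_map))
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()  # update both layers' parameters
        return first_detection_layer, segmentation_layer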
9. An image recognition apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a target image, wherein the target image comprises one or more target objects;
the image segmentation module is used for obtaining a target detection frame and an image segmentation result map corresponding to the target detection frame by using a pre-trained image segmentation model according to the target image;
the image cropping module is used for cropping the target image according to the target detection frame and the image segmentation result map to obtain a target region corresponding to the target object;
and the image recognition module is used for determining the content of the target object in the target region by using an optical character recognition algorithm to obtain a recognition result.
10. An electronic device, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN202110359351.1A 2021-04-02 2021-04-02 Image identification method and device, electronic equipment and storage medium Pending CN113011409A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110359351.1A CN113011409A (en) 2021-04-02 2021-04-02 Image identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110359351.1A CN113011409A (en) 2021-04-02 2021-04-02 Image identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113011409A true CN113011409A (en) 2021-06-22

Family

ID=76387941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110359351.1A Pending CN113011409A (en) 2021-04-02 2021-04-02 Image identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113011409A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6301386B1 (en) * 1998-12-09 2001-10-09 Ncr Corporation Methods and apparatus for gray image based text identification
WO2017162069A1 (en) * 2016-03-25 2017-09-28 阿里巴巴集团控股有限公司 Image text identification method and apparatus
CN107609549A (en) * 2017-09-20 2018-01-19 北京工业大学 The Method for text detection of certificate image under a kind of natural scene
CN109697440A (en) * 2018-12-10 2019-04-30 浙江工业大学 A kind of ID card information extracting method
CN110969129A (en) * 2019-12-03 2020-04-07 山东浪潮人工智能研究院有限公司 End-to-end tax bill text detection and identification method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673576A (en) * 2021-07-26 2021-11-19 浙江大华技术股份有限公司 Image detection method, terminal and computer readable storage medium thereof
WO2024066375A1 (en) * 2022-09-29 2024-04-04 青岛海尔空调器有限总公司 Method and apparatus used by air conditioner for monitoring, and air conditioner and storage medium
CN116664822A (en) * 2023-06-01 2023-08-29 广州阅数科技有限公司 Image target detection method based on automatic graph cutting algorithm

Similar Documents

Publication Publication Date Title
CN109146892B (en) Image clipping method and device based on aesthetics
US20190188528A1 (en) Text detection method and apparatus, and storage medium
CN113011409A (en) Image identification method and device, electronic equipment and storage medium
US9235759B2 (en) Detecting text using stroke width based text detection
CN107220640B (en) Character recognition method, character recognition device, computer equipment and computer-readable storage medium
KR101479387B1 (en) Methods and apparatuses for face detection
CN111797821B (en) Text detection method and device, electronic equipment and computer storage medium
CN113313083B (en) Text detection method and device
RU2697649C1 (en) Methods and systems of document segmentation
CN110909724B (en) Thumbnail generation method of multi-target image
CN110460838B (en) Lens switching detection method and device and computer equipment
CN113850238B (en) Document detection method and device, electronic equipment and storage medium
JP2017211939A (en) Generation device, generation method, and generation program
CN114511041B (en) Model training method, image processing method, device, equipment and storage medium
CN112949649B (en) Text image identification method and device and computing equipment
CN111652140A (en) Method, device, equipment and medium for accurately segmenting questions based on deep learning
CN111652142A (en) Topic segmentation method, device, equipment and medium based on deep learning
CN108304840B (en) Image data processing method and device
CN114283431B (en) Text detection method based on differentiable binarization
CN113657370B (en) Character recognition method and related equipment thereof
CN113657369B (en) Character recognition method and related equipment thereof
CN111144156B (en) Image data processing method and related device
US20170091760A1 (en) Device and method for currency conversion
CN112434696A (en) Text direction correction method, device, equipment and storage medium
CN116777734A (en) Method, device, equipment and storage medium for generating background penetration image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination