CN111783639A - Image detection method and device, electronic equipment and readable storage medium


Info

Publication number
CN111783639A
Authority
CN
China
Prior art keywords
image
candidate
detected
mark
network
Prior art date
Legal status
Pending
Application number
CN202010613093.0A
Other languages
Chinese (zh)
Inventor
戴兵
叶芷
李扬曦
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010613093.0A priority Critical patent/CN111783639A/en
Publication of CN111783639A publication Critical patent/CN111783639A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image detection method and device, an electronic device, and a readable storage medium, relating to computer vision technology within computer technology. The specific implementation scheme is as follows: an image to be detected is acquired; at least one candidate region containing a preset mark is obtained from the image to be detected based on a pre-trained mark detection network; features of the at least one candidate region are obtained based on a pre-trained mark recognition network; and the category of the image to be detected is determined according to the features of the at least one candidate region. Compared with prior-art schemes based on template comparison, this scheme achieves higher detection efficiency and accuracy.

Description

Image detection method and device, electronic equipment and readable storage medium
Technical Field
The present disclosure relates to computer vision technologies, and in particular, to an image detection method and apparatus, an electronic device, and a readable storage medium.
Background
With the continuous development of the internet, more and more pictures and videos appear online. Among these huge volumes of pictures and videos, some require special attention because of their sensitive content. For example, some videos may contain violent or terrorist scenes; if such videos are allowed to circulate on the internet, the security of the network environment is greatly affected, so pictures and videos entering the internet need to be reviewed and classified. Generally, one important criterion for judging whether a picture or video belongs to the violent-terrorist category is whether it contains a sensitive mark, such as the flag or emblem of a specific organization.
Disclosure of Invention
The application provides an image detection method, an image detection device, an electronic device and a readable storage medium.
According to a first aspect of the present application, there is provided an image detection method, comprising:
acquiring an image to be detected;
obtaining at least one candidate area in the image to be detected based on a pre-trained mark detection network according to the image to be detected, wherein the candidate area comprises a preset mark;
according to the at least one candidate region, based on a pre-trained mark recognition network, obtaining the characteristics of the at least one candidate region;
and determining the category of the image to be detected according to the characteristics of the at least one candidate region.
According to a second aspect of the present application, there is provided an image detection apparatus comprising:
the acquisition module is used for acquiring an image to be detected;
an obtaining module, configured to obtain at least one candidate region in the image to be detected based on a pre-trained marker detection network according to the image to be detected, where the candidate region includes a preset marker; according to the at least one candidate region, based on a pre-trained mark recognition network, obtaining the characteristics of the at least one candidate region;
and the determining module is used for determining the category of the image to be detected according to the characteristics of the at least one candidate region.
According to a third aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first aspect.
According to the scheme of the embodiments of the application, an image to be detected is first acquired; at least one candidate region containing a preset mark is then obtained from the image to be detected based on a pre-trained mark detection network; the features of the at least one candidate region are obtained based on a pre-trained mark recognition network; and the category of the image to be detected is determined according to those features. Because two independent convolutional neural networks are adopted, namely the mark detection network and the mark recognition network, the image to be detected undergoes coarse screening followed by fine screening: the mark detection network first coarsely screens out candidate regions containing a preset mark, and the mark recognition network then extracts finer features from which the category of the image is determined. Compared with prior-art schemes based on template comparison, the detection efficiency and accuracy of this scheme are therefore higher.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is an exemplary scene schematic diagram to which an image detection method provided in an embodiment of the present application is applicable;
FIG. 2 is a schematic flowchart of an image detection method provided in an embodiment of the present application;
FIG. 3 is a schematic flowchart of an image detection method provided in an embodiment of the present application;
fig. 4 is a block configuration diagram of an image detection apparatus according to an embodiment of the present application;
fig. 5 is a block diagram of an electronic device for implementing the image detection method according to the embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
In the prior art, detection of a sensitive mark is mainly based on traditional image feature extraction and template matching. Specifically, feature analysis is performed on the sensitive mark: conventional features such as Histogram of Oriented Gradients (HOG) features are extracted, and a feature distribution template of the sensitive mark is established from its characteristic feature distribution. At detection time, the HOG features of the image to be detected are extracted and compared with the feature distribution template of the sensitive mark, thereby judging whether the image contains the sensitive mark.
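For concreteness, the following is a minimal sketch of this prior-art pipeline using off-the-shelf HOG utilities; the fixed input size, the HOG parameters, and the distance threshold are illustrative assumptions, not values taken from any particular prior scheme.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def hog_features(image, size=(128, 128)):
    """Resize a grayscale image to a fixed shape and extract its HOG descriptor."""
    image = resize(image, size, anti_aliasing=True)
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

def matches_any_template(image, templates, threshold=0.5):
    """Compare the image's HOG descriptor with every stored template,
    one by one -- the per-template cost the application criticizes."""
    feat = hog_features(image)
    for tpl in templates:  # cost grows linearly with the number of templates
        if np.linalg.norm(feat - tpl) < threshold:
            return True
    return False
```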
However, the prior-art detection schemes have several disadvantages. On one hand, as described above, image detection generally relies on conventional hand-designed features (such as HOG features); such features often lack universality when facing mass data and describe the data poorly, so the accuracy is low. On the other hand, a template matching method must extract the features of the image to be detected and then compare them with the templates one by one, which is inefficient; and as society progresses, the number of sensitive marks keeps growing, so more and more templates need to be compared, further hurting detection efficiency.
In view of the foregoing problems, the present application provides an image detection method and apparatus, an electronic device, and a readable storage medium, applied to the field of computer vision within computer technology, to detect a preset mark in an image more accurately and efficiently. The scheme of the embodiments of the application detects and recognizes a preset mark in an image or video based on deep-learning Convolutional Neural Networks (CNNs). For video detection, a segment of video is input and key frames are extracted from it; detection and recognition of the preset mark, performed by two independent CNNs, are then applied to each extracted key frame image. From the per-frame detection and recognition results, whether the video belongs to a specific category, for example whether it is a violent-terrorist video, is judged according to a preset policy. The technical solution of the present application is described below with reference to specific embodiments.
Fig. 1 is a schematic diagram of an exemplary scene to which the image detection method provided in the embodiments of the present application is applicable. As shown in fig. 1, the method can be applied to a scene in which a server reviews pictures or videos that users upload. The scenario involves a terminal device used by the user and a server. Using a terminal application, the user operates the terminal device to upload pictures or videos to the server through the internet. Since some pictures or videos may contain sensitive content whose circulation harms network security, the server needs to inspect their content to filter out pictures or videos bearing sensitive marks.
In another application scenario, the terminal device operated by the user may itself detect the picture or video to be uploaded, thereby preventing the user from uploading a picture or video bearing a sensitive mark.
It should be noted that the above description takes sensitive marks only as an example; the method of the embodiments of the present application is not limited to detecting sensitive marks. In fact, it also applies to detecting any mark of interest, such as the detection and recognition of a specific station logo, a specific flag, and similar objects.
Fig. 2 is a schematic flowchart of an image detection method according to an embodiment of the present application. The execution subject of the method may be the terminal device shown in fig. 1, or a server; the following embodiments take a terminal device as the execution subject. As shown in fig. 2, the method comprises the following steps:
s201, obtaining an image to be detected.
The image to be detected may be, for example, a photo to be uploaded to the internet, or an image captured from a video. In object detection and recognition scenarios, the image to be detected may also be a picture of the object captured by a camera.
The photo or video can be shot by a camera, which may or may not be integrated in the terminal device; when it is not, the photo or video can be transmitted to the terminal device via a communication link.
S202, obtaining at least one candidate area in the image to be detected based on a pre-trained mark detection network according to the image to be detected.
After the image to be detected is acquired, the image to be detected can be detected. In the method of the embodiment of the application, the detection of the image to be detected is completed by the mark detection network and the mark identification network. The mark detection network performs preliminary detection on an image to be detected, firstly determines a candidate region containing a preset mark, and then performs feature extraction on the candidate region by the mark identification network so as to further determine the category of the image to be detected.
For example, the mark detection network may be a pre-trained CNN which, based on the input image to be detected, performs a preliminary screening for the preset mark, thereby yielding the candidate regions that contain it.
The preset mark may be a mark of current interest. For example, during image review, if the goal is to determine whether the image contains the mark of a specific organization, the marks of such organizations can all be used as preset marks, and the mark detection network then detects each preset mark to obtain candidate regions containing it. The preset mark may also be another mark such as a specific station logo or a specific flag.
It should be noted that the number of the preset marks is not limited herein, and one or more preset marks may be used. In addition, the image to be detected may include more than one candidate region including the preset mark, and all candidate regions that may include the preset mark may be obtained through the detection of the mark detection network. Moreover, since there may be more than one preset flag, the preset flags included in each candidate area may be the same or different.
S203, according to the at least one candidate region, based on a pre-trained mark recognition network, obtaining the characteristics of the at least one candidate region.
After the candidate regions containing preset marks are obtained through the preliminary screening of the mark detection network, feature extraction may be performed on each candidate region based on the pre-trained mark recognition network. It should be noted that, since there may be more than one candidate region, feature extraction needs to be performed on the preset mark contained in each candidate region.
For example, the above-mentioned mark recognition network may be a pre-trained convolutional neural network, which may perform feature extraction on a preset mark included in the candidate region based on an input region image, so as to obtain a feature of the candidate region.
And S204, determining the category of the image to be detected according to the characteristics of the at least one candidate area.
After the features of the candidate region are obtained, the category of the image to be detected may be determined based on the features of the candidate region.
Optionally, the type of the preset mark may be determined based on the feature of the preset mark, and then the type of the image to be detected may be determined.
For example, if the three candidate regions A1, A2, and A3 obtained by the mark detection network all contain preset marks, the respective features of A1, A2, and A3 are extracted by the mark recognition network, and the category of the image to be detected is determined from the extracted features.
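The overall flow of steps S201 to S204 can be summarized by the sketch below, assuming the two networks and a final classification rule are available as callables; the function names and signatures are illustrative assumptions, not interfaces defined by the application.

```python
def detect_image(image, mark_detection_net, mark_recognition_net, classify):
    # S202: coarse screening -- candidate regions that may contain a preset mark.
    candidate_regions = mark_detection_net(image)
    if not candidate_regions:
        return "normal"  # no candidate region: no preset mark was found
    # S203: fine screening -- one feature vector per candidate region.
    features = [mark_recognition_net(region) for region in candidate_regions]
    # S204: decide the category of the image from the region features.
    return classify(features)
```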
According to the method of the embodiments of the application, an image to be detected is first acquired; at least one candidate region containing a preset mark is then obtained from the image to be detected based on a pre-trained mark detection network; the features of the at least one candidate region are obtained based on a pre-trained mark recognition network; and the category of the image to be detected is determined according to those features. Because two independent convolutional neural networks are adopted, namely the mark detection network and the mark recognition network, the image to be detected undergoes coarse screening followed by fine screening: the mark detection network first coarsely screens out candidate regions containing a preset mark, and the mark recognition network then extracts finer features from which the category of the image is determined. Compared with prior-art schemes based on template comparison, the detection efficiency and accuracy of this scheme are therefore higher.
Fig. 3 is a schematic flowchart of an image detection method according to an embodiment of the present application. The execution subject of the method may be the terminal device shown in fig. 1, or may be a server. The following embodiments of the present application take an execution subject as an example of a terminal device. As shown in fig. 3, the method includes:
s301, acquiring an image to be detected.
The description of step S201 in the foregoing embodiment applies equally to this step and is not repeated here.
S302, inputting the image to be detected into a feature extraction network, and extracting key features of the image to be detected.
After the image to be detected is obtained, feature extraction can be performed on it with a pre-trained feature extraction network to obtain its key features. For example, a mainstream network model such as a residual network can be used as the feature extraction network. Such models extract image features that typically carry rich semantic information and, compared with traditional features such as HOG, describe varied images to be detected far better, thereby improving the accuracy of the detection result. Optionally, the feature extraction network may be, for example, resnet50.
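As an illustration, a resnet50 trunk can serve as the feature extraction network; the sketch below keeps only the convolutional layers so that spatial feature maps remain for the subsequent candidate region generation network. It assumes PyTorch/torchvision (a recent version with the weights API) purely for illustration; the application does not prescribe a framework.

```python
import torch
import torchvision.models as models

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
# Drop the average pooling and classification head, keeping the
# convolutional trunk that produces spatial feature maps.
trunk = torch.nn.Sequential(*list(backbone.children())[:-2])
trunk.eval()

with torch.no_grad():
    image = torch.randn(1, 3, 800, 800)   # stand-in for an image to be detected
    key_features = trunk(image)           # shape: (1, 2048, 25, 25)
```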
S303, inputting the key features of the image to be detected into the candidate region generation network to obtain the position coordinates of the at least one candidate region.
After extracting the key features of the image to be detected, the extracted key features may be input into a candidate region generation network (Region Proposal Network, RPN) to obtain the position coordinates of at least one candidate region.
The RPN here may include a pre-trained regression model and a pre-trained classification model. The regression model is used to locate regions to be processed in the image to be detected, a region to be processed being a region that may contain a preset mark; the classification model then determines, based on the output of the regression model, whether each region to be processed does contain a preset mark. Composing the RPN of a regression model and a classification model allows the candidate regions containing preset marks to be screened out more quickly.
Specifically, the key features of the image to be detected are input into the regression model to obtain the position coordinates of at least one region to be processed. Since more than one region to be processed may be obtained, the key features of each region to be processed are input into the classification model to determine whether that region contains a preset mark; when it is determined to contain one, its position coordinates are taken as the position coordinates of a candidate region. All regions to be processed can be traversed in this way: every region that the classification model judges to contain a preset mark becomes a candidate region, with its position coordinates as those of the candidate region, thereby yielding the at least one candidate region.
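The split into a regression model and a classification model can be sketched as below. A production RPN would add anchor boxes and non-maximum suppression; this deliberately stripped-down module, with assumed channel sizes and score threshold, only illustrates how box regression and a per-region mark/no-mark decision combine to yield candidate region coordinates.

```python
import torch
import torch.nn as nn

class TinyRPN(nn.Module):
    def __init__(self, channels=2048):
        super().__init__()
        self.conv = nn.Conv2d(channels, 256, 3, padding=1)
        self.regressor = nn.Conv2d(256, 4, 1)   # box (x1, y1, x2, y2) per cell
        self.classifier = nn.Conv2d(256, 2, 1)  # binary: preset mark / none

    def forward(self, key_features, score_threshold=0.5):
        h = torch.relu(self.conv(key_features))
        boxes = self.regressor(h).flatten(2).transpose(1, 2)                # (N, HW, 4)
        scores = self.classifier(h).flatten(2).transpose(1, 2).softmax(-1)  # (N, HW, 2)
        keep = scores[..., 1] > score_threshold  # regions judged to contain a mark
        return boxes[keep]                       # coordinates of the candidate regions
```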
Optionally, to improve convergence speed and classification accuracy, the classification model may be a binary classifier, i.e., it only determines whether the region to be processed contains the preset mark.
S304, according to the position coordinates of the at least one candidate region, the characteristics of the at least one candidate region are obtained based on a pre-trained mark recognition network.
After the position coordinates of the candidate regions are obtained, at least one region image, each corresponding to one candidate region, can be obtained from the image to be detected according to those coordinates. For example, the image to be detected may be cropped according to the position coordinates of each candidate region to yield the region images. In this way the image to be detected is divided into smaller region images; since the object of subsequent operations shrinks, feature extraction is sped up. Further, since each candidate region obtained in step S303 contains a preset mark, the corresponding region image also contains that mark. As there may be more than one preset mark, the preset marks contained in different region images may be the same or different.
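Cropping the region images from the position coordinates is straightforward; the sketch below assumes a channels-first image array and an (x1, y1, x2, y2) box layout, both of which are assumptions for illustration.

```python
def crop_region_images(image, boxes):
    """image: array of shape (C, H, W); boxes: iterable of (x1, y1, x2, y2)."""
    region_images = []
    for x1, y1, x2, y2 in boxes:
        # Each crop is the smaller region image the recognition network consumes.
        region_images.append(image[:, int(y1):int(y2), int(x1):int(x2)])
    return region_images
```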
After at least one region image corresponding to the at least one candidate region is obtained, each region image may be input into the mark recognition network for feature extraction, so as to determine the feature of that region image, which serves as the feature of the corresponding candidate region.
For example, if the three candidate regions A1, A2, and A3 are found by the mark detection network to contain preset marks, the region images B1, B2, and B3 corresponding to A1, A2, and A3 are obtained from their position coordinates, and each region image is input into the mark recognition network to extract its features.
Alternatively, the mark recognition network may be used to extract high-dimensional, fixed-length features of the region image. For example, the mark recognition network may be composed of a residual network and a fine-grained classification function, such as resnet50 plus a mainstream fine-grained classification function; such a combination has good inter-class separability between different objects within the same broad class, thereby improving detection accuracy.
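A mark recognition network along these lines might look like the sketch below: a resnet50 trunk whose final layer is replaced by a projection to a fixed-length embedding. The embedding size, the L2 normalization, and training under a fine-grained metric loss are assumptions; the application only requires fixed-length high-dimensional features with good inter-class separability.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

class MarkRecognitionNet(torch.nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        net = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        # Replace the 1000-way classifier with a fixed-length feature head;
        # during training a fine-grained classification loss would sit on top.
        net.fc = torch.nn.Linear(net.fc.in_features, dim)
        self.net = net

    def forward(self, region_image):
        # L2-normalised, fixed-length feature of the region image.
        return F.normalize(self.net(region_image), dim=-1)
```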
S305, determining the category of the image to be detected according to the characteristics of the at least one candidate area.
As described in step S304, the mark recognition network performs feature extraction on at least one region image corresponding to at least one candidate region, and obtains the feature of each region image as the feature of the candidate region corresponding to the region image.
For each region image, the feature of the region image may be compared with each candidate feature in the preset candidate feature set, so as to determine the category of the preset mark included in the region image.
The preset candidate feature set is a predetermined set of features of all preset marks. Optionally, it may be prepared as follows: each candidate mark in a preset candidate mark set is input into the mark recognition network to obtain its candidate feature, and the candidate features of all candidate marks form the preset candidate feature set. The marks of current interest are placed in the preset candidate mark set; for example, to detect whether an image is sensitive, the flags or emblems of specific organizations or institutions can be collected as candidate marks and gathered into the preset candidate mark set. The process by which the mark recognition network extracts a candidate mark's feature is similar to the extraction of the preset mark's feature from a region image and is not repeated here. In practice, as sensitive marks keep increasing, only the newly added sensitive marks need their features extracted in the above manner and added to the preset candidate feature set, after which they are detected whenever an image is checked; building the preset candidate feature set in this way therefore scales well as sensitive marks gradually accumulate.
When comparing the features of a region image with the candidate features in the preset candidate feature set, the Euclidean distance between the region image's features and each candidate feature is determined. When the Euclidean distance between some candidate feature in the preset candidate feature set and the region image's features is smaller than a preset threshold, the category of the candidate mark corresponding to that candidate feature is determined as the category of the preset mark contained in the region image. When no candidate feature in the set is within the preset threshold of the region image's features, the region image is determined to contain none of the candidate marks. The Euclidean distance is simple to compute, so using it as the comparison criterion further improves speed; moreover, the preset threshold can be set according to actual requirements, allowing flexible control over detection precision.
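The comparison step then reduces to a nearest-neighbour test under the Euclidean distance; in the sketch below, the threshold value and the convention of returning None when nothing matches are assumptions.

```python
import numpy as np

def match_mark(region_feature, candidate_features, labels, threshold=0.8):
    """Return the category of the first candidate mark whose feature lies
    within `threshold` of the region feature, or None if none is close."""
    for feature, label in zip(candidate_features, labels):
        if np.linalg.norm(region_feature - feature) < threshold:
            return label
    return None  # region contains no candidate mark from the set
```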
After the category of the preset mark contained in each region image is determined, the category of the image to be detected can be determined from the categories of the preset marks contained in all region images. Basing the image category on the sensitive marks found in every region image makes the judgment more accurate.
For example, assume the preset candidate mark set is {X1, X2, X3}, where X1 is the mark of sensitive organization S1, X2 the mark of sensitive organization S2, and X3 the mark of sensitive organization S3. Extracting the features of these three candidate marks with the mark recognition network yields C1, C2, and C3, so the preset candidate feature set is {C1, C2, C3}. Assume two region images P1 and P2 are obtained from the image to be detected; each is compared against the preset candidate feature set. For region image P1, let its Euclidean distances to the three candidate features be D1, D2, and D3, and let the preset threshold be T1, with D1 smaller than T1 and both D2 and D3 larger than T1. The preset mark contained in P1 is thus close to the candidate mark X1 corresponding to candidate feature C1, so the category of X1 is determined as the category of the preset mark contained in P1; that is, P1 contains a mark of the same category as X1. Similarly, for region image P2, the distances to all three candidate features exceed the threshold, so P2 is determined to contain none of the corresponding candidate marks. However, since P1 contains a sensitive mark, the combined comparison results of the two region images lead to the conclusion that the image to be detected contains a sensitive mark and should be classified as a sensitive image.
Optionally, when the object to be detected is a video, before step S301 the video to be detected may first be acquired, its key frame images extracted, and those key frame images used as the images to be detected. Existing tools such as ffmpeg can perform frame extraction on the input video; to improve efficiency, one frame is extracted at a fixed interval (e.g., every 2 seconds) as a key frame image. Because key frames usually carry the key information of the whole video, this reduces computation and completes the detection and classification of the video faster.
It should be noted that a video to be detected may have more than one key frame image; the method of the embodiments of the application is therefore applied to each key frame image to determine its category, and the classification of the video is finally decided comprehensively according to a preset policy. For example, to judge whether a video segment is sensitive (e.g., a violent-terrorist video), it can be determined whether the number of its key frame images containing sensitive content exceeds a preset threshold; if so, the segment is judged sensitive. Since individual classifications may be erroneous, one or two sensitive frames alone need not make the whole video sensitive. The choice of the preset threshold depends on actual requirements and is not limited here.
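The video path can be sketched as follows, combining ffmpeg frame sampling (the tool named above, at the 2-second interval used as an example) with a frame-count decision rule; the output file naming and the threshold of 3 sensitive frames are illustrative assumptions standing in for the preset policy.

```python
import glob
import os
import subprocess

def extract_key_frames(video_path, out_dir, every_sec=2):
    """Sample one key frame every `every_sec` seconds with ffmpeg."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", f"fps=1/{every_sec}",
         os.path.join(out_dir, "frame_%04d.jpg")],
        check=True)
    return sorted(glob.glob(os.path.join(out_dir, "frame_*.jpg")))

def video_is_sensitive(frame_categories, min_sensitive_frames=3):
    """Preset policy: the video is sensitive only when enough key frames are,
    which tolerates one or two misclassified frames."""
    return sum(c == "sensitive" for c in frame_categories) >= min_sensitive_frames
```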
According to the method of the embodiments of the application, an image to be detected is first acquired; at least one candidate region in the image is then obtained based on a pre-trained mark detection network; the features of the at least one candidate region are obtained based on a pre-trained mark recognition network; and the category of the image to be detected is finally determined according to those features. Because two independent convolutional neural networks are adopted, namely the mark detection network and the mark recognition network, the image to be detected undergoes coarse screening followed by fine screening: the mark detection network first coarsely screens out candidate regions containing a preset mark, and the mark recognition network then extracts finer features from which the category of the image is determined. Compared with prior-art schemes based on template comparison, the detection efficiency and accuracy of this scheme are therefore higher.
Fig. 4 is a block configuration diagram of an image detection apparatus according to an embodiment of the present application. As shown in fig. 4, the image detection apparatus 400 includes:
an acquisition module 401, configured to acquire an image to be detected;
an obtaining module 402, configured to obtain, according to the to-be-detected image, at least one candidate region in the to-be-detected image based on a pre-trained marker detection network, where the candidate region includes a preset marker; according to the at least one candidate region, based on a pre-trained mark recognition network, obtaining the characteristics of the at least one candidate region;
a determining module 403, configured to determine a category of the image to be detected according to a feature of the at least one candidate region.
As a possible implementation, the marker detection network includes a pre-trained feature extraction network and a pre-trained candidate area generation network; the obtaining module 402 is specifically configured to:
inputting the image to be detected into the feature extraction network, and extracting key features of the image to be detected;
and inputting the key features of the image to be detected into the candidate region generation network to obtain the position coordinates of the at least one candidate region.
As a possible implementation, the candidate region generation network includes a regression model and a classification model; the obtaining module 402 is specifically configured to:
inputting the key characteristics of the image to be detected into the regression model to obtain the position coordinates of at least one region to be processed;
inputting the key features of the to-be-processed area into the classification model for each to-be-processed area, and determining whether the to-be-processed area contains the preset mark; and when the to-be-processed area is determined to contain the preset mark, taking the position coordinate of the to-be-processed area as the position coordinate of the candidate area.
As a possible implementation, the obtaining module 402 is specifically configured to:
obtaining at least one region image respectively corresponding to the at least one candidate region based on the image to be detected according to the at least one candidate region;
for each region image, the region image is input to the marker recognition network, and the feature of the region image is specified as the feature of the candidate region corresponding to the region image.
As a possible implementation manner, the determining module 403 is specifically configured to:
comparing the characteristics of the area image with all the alternative characteristics in a preset alternative characteristic set aiming at each area image, and determining the type of the preset mark contained in the area image;
and determining the type of the image to be detected according to the type of the preset mark contained in the at least one area image.
As a possible implementation, the obtaining module 402 is further configured to:
inputting each optional mark in a preset optional mark set into the mark identification network to obtain optional features of each optional mark;
and constructing the preset candidate feature set by using the candidate features of the candidate marks.
As a possible implementation manner, the determining module 403 is specifically configured to:
respectively determining Euclidean distances between the features of the region image and the alternative features;
and when the Euclidean distance between the candidate feature and the feature of the area image is smaller than a preset threshold value, determining the type of the candidate mark corresponding to the candidate feature as the type of the preset mark contained in the area image.
As a possible implementation manner, the obtaining module 401 is specifically configured to:
acquiring a video to be detected;
and extracting the key frame image of the video to be detected, and taking the key frame image as the image to be detected.
As a possible implementation, the above-mentioned mark detection network is a convolutional neural network.
As a possible implementation, the feature extraction network is a residual network.
As a possible implementation, the above-mentioned mark recognition network is a convolutional neural network.
As a possible implementation, the above-mentioned mark recognition network is composed of a residual network and a fine-grained classification function.
According to the apparatus of the embodiments of the application, two independent convolutional neural networks are adopted, namely the mark detection network and the mark recognition network, so the image to be detected undergoes coarse screening followed by fine screening: the mark detection network first coarsely screens out candidate regions containing a preset mark, and the mark recognition network then extracts finer features from which the category of the image is determined. Compared with prior-art schemes based on template comparison, the detection efficiency and accuracy of this scheme are therefore higher.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 5, it is a block diagram of an electronic device according to the image detection method of the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 501, memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on a memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.
Memory 502 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of image detection provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of image detection provided herein.
The memory 502, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the image detection method in the embodiments of the present application (e.g., the acquisition module 401, the obtaining module 402, and the determining module 403 shown in fig. 4). By running the non-transitory software programs, instructions, and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the server, i.e., implements the image detection method in the above method embodiments.
The memory 502 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created by the use of the electronic device for image detection, and the like. Further, the memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 optionally includes memory located remotely from the processor 501, which may be connected to the electronic device for image detection over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the image detection method may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for image detection, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using a high-level procedural and/or object-oriented programming language, and/or assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the scheme of the embodiment of the application, firstly, an image to be detected is obtained, then at least one candidate area in the image to be detected is obtained based on a pre-trained mark detection network according to the image to be detected, the candidate area contains a preset mark, at least one candidate area is obtained based on a pre-trained mark identification network according to the at least one candidate area, the characteristics of the at least one candidate area are obtained, and then the category of the image to be detected is determined according to the characteristics of the at least one candidate area. Because two independent convolutional neural networks, namely the mark detection network and the mark identification network, are adopted, the picture to be detected is respectively subjected to coarse screening and fine screening, firstly, a candidate area containing a preset mark is obtained through the coarse screening of the mark detection network, then, the characteristics are further extracted through the mark identification network, and the category of the picture to be detected is further determined. Therefore, compared with a scheme based on template comparison in the prior art, the detection efficiency and accuracy of the scheme are higher.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; this is not limited here, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (26)

1. An image detection method, comprising:
acquiring an image to be detected;
obtaining at least one candidate area in the image to be detected based on a pre-trained mark detection network according to the image to be detected, wherein the candidate area comprises a preset mark;
according to the at least one candidate region, based on a pre-trained mark recognition network, obtaining the characteristics of the at least one candidate region;
and determining the category of the image to be detected according to the characteristics of the at least one candidate region.
2. The method of claim 1, wherein the signature detection network comprises a pre-trained feature extraction network and a pre-trained candidate region generation network;
according to the image to be detected, at least one candidate area in the image to be detected is obtained based on a pre-trained mark detection network, and the method comprises the following steps:
inputting the image to be detected into the feature extraction network, and extracting key features of the image to be detected;
and inputting the key features of the image to be detected into the candidate region generation network to obtain the position coordinates of the at least one candidate region.
3. The method of claim 2, wherein the candidate region generation network comprises a regression model and a classification model;
the step of inputting the key features of the image to be detected into the candidate area generation network to obtain the position coordinates of the at least one candidate area comprises the following steps:
inputting the key characteristics of the image to be detected into the regression model to obtain the position coordinates of at least one region to be processed;
inputting the key features of the regions to be processed into the classification model for each region to be processed, and determining whether the regions to be processed contain the preset marks; and when the to-be-processed area is determined to contain the preset mark, taking the position coordinate of the to-be-processed area as the position coordinate of the candidate area.
4. The method of claim 1, wherein the obtaining features of the at least one candidate region based on a pre-trained landmark recognition network according to the at least one candidate region comprises:
obtaining at least one region image respectively corresponding to the at least one candidate region based on the image to be detected according to the at least one candidate region;
and inputting the area image into the mark identification network for each area image, and determining the characteristics of the area image as the characteristics of the candidate area corresponding to the area image.
5. The method according to claim 4, wherein said determining the class of the image to be detected according to the characteristics of the at least one candidate region comprises:
for each regional image, comparing the characteristics of the regional image with all the alternative characteristics in a preset alternative characteristic set, and determining the category of the preset mark contained in the regional image;
and determining the category of the image to be detected according to the category of the preset mark contained in the at least one region image.
6. The method according to claim 5, wherein before comparing the features of the region image with the candidate features in the preset candidate feature set, the method further comprises:
inputting each alternative mark in a preset alternative mark set into the mark identification network to obtain alternative characteristics of each alternative mark;
and constructing the preset candidate feature set by using the candidate features of each candidate mark.
7. The method according to claim 6, wherein the comparing the feature of the region image with each candidate feature in a preset candidate feature set to determine the category of the preset mark included in the region image comprises:
respectively determining Euclidean distances between the features of the region image and the alternative features;
and when the Euclidean distance between the candidate features and the features of the region image is smaller than a preset threshold value, determining the category of the candidate mark corresponding to the candidate features as the category of the preset mark contained in the region image.
8. The method of any one of claims 1-7, wherein said acquiring an image to be detected comprises:
acquiring a video to be detected;
and extracting a key frame image of the video to be detected, and taking the key frame image as the image to be detected.
9. The method of any of claims 1-7, wherein the marker detection network is a convolutional neural network.
10. A method according to claim 2 or 3, wherein the feature extraction network is a residual network.
11. The method of any one of claims 1-7, wherein the mark recognition network is a convolutional neural network.
12. The method of claim 11, wherein the mark recognition network comprises a residual network and a fine-grained classification function.
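One way to realize claim 12, using torchvision's resnet50 as the residual network; the cosine-similarity head stands in for the unspecified fine-grained classification function and is an assumption, not the patent's confirmed design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.models import resnet50

    class MarkRecognitionNet(nn.Module):
        def __init__(self, num_categories=100, feat_dim=2048):
            super().__init__()
            backbone = resnet50(weights=None)  # residual network
            self.backbone = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
            self.category_weights = nn.Parameter(torch.randn(num_categories, feat_dim))

        def forward(self, x):
            feature = self.backbone(x).flatten(1)  # (B, 2048) feature for matching
            feature = F.normalize(feature, dim=1)
            logits = feature @ F.normalize(self.category_weights, dim=1).t()
            return feature, logits  # logits drive fine-grained training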
13. An image detection apparatus comprising:
the acquisition module is used for acquiring an image to be detected;
an obtaining module, configured to obtain at least one candidate region in the image to be detected based on a pre-trained marker detection network according to the image to be detected, wherein the candidate region contains a preset mark, and to obtain, according to the at least one candidate region, features of the at least one candidate region based on a pre-trained mark recognition network;
and the determining module is used for determining the category of the image to be detected according to the features of the at least one candidate region.
14. The apparatus of claim 13, wherein the marker detection network comprises a pre-trained feature extraction network and a pre-trained candidate region generation network;
the obtaining module is specifically configured to:
inputting the image to be detected into the feature extraction network, and extracting key features of the image to be detected;
and inputting the key features of the image to be detected into the candidate region generation network to obtain the position coordinates of the at least one candidate region.
15. The apparatus of claim 14, wherein the candidate region generation network comprises a regression model and a classification model;
the obtaining module is specifically configured to:
inputting the key features of the image to be detected into the regression model to obtain the position coordinates of at least one region to be processed;
and for each region to be processed, inputting the key features of the region to be processed into the classification model to determine whether the region to be processed contains the preset mark; and when the region to be processed is determined to contain the preset mark, taking the position coordinates of the region to be processed as the position coordinates of a candidate region.
16. The apparatus of claim 13, wherein the obtaining module is specifically configured to:
obtaining, from the image to be detected, at least one region image respectively corresponding to the at least one candidate region;
and for each region image, inputting the region image into the mark recognition network, and taking the features of the region image as the features of the candidate region corresponding to the region image.
17. The apparatus of claim 16, wherein the determining module is specifically configured to:
for each region image, comparing the features of the region image with each candidate feature in a preset candidate feature set, and determining the category of the preset mark contained in the region image;
and determining the category of the image to be detected according to the category of the preset mark contained in the at least one region image.
18. The apparatus of claim 17, wherein the obtaining module is further configured to:
inputting each candidate mark in a preset candidate mark set into the mark recognition network to obtain the candidate feature of each candidate mark;
and constructing the preset candidate feature set by using the candidate features of each candidate mark.
19. The apparatus of claim 18, wherein the determining module is specifically configured to:
respectively determining the Euclidean distance between the features of the region image and each candidate feature;
and when the Euclidean distance between a candidate feature and the features of the region image is smaller than a preset threshold, determining the category of the candidate mark corresponding to that candidate feature as the category of the preset mark contained in the region image.
20. The apparatus according to any one of claims 13 to 19, wherein the acquisition module is specifically configured to:
acquiring a video to be detected;
and extracting a key frame image of the video to be detected, and taking the key frame image as the image to be detected.
21. The apparatus of any one of claims 13-19, wherein the marker detection network is a convolutional neural network.
22. The apparatus of claim 14 or 15, wherein the feature extraction network is a residual network.
23. The apparatus of any one of claims 13-19, wherein the mark recognition network is a convolutional neural network.
24. The apparatus of claim 23, wherein the mark recognition network comprises a residual network and a fine-grained classification function.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-12.
26. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-12.
CN202010613093.0A 2020-06-30 2020-06-30 Image detection method and device, electronic equipment and readable storage medium Pending CN111783639A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010613093.0A CN111783639A (en) 2020-06-30 2020-06-30 Image detection method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN111783639A true CN111783639A (en) 2020-10-16

Family

ID=72761228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010613093.0A Pending CN111783639A (en) 2020-06-30 2020-06-30 Image detection method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111783639A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326858A (en) * 2016-08-23 2017-01-11 北京航空航天大学 Road traffic sign automatic identification and management system based on deep learning
CN109145901A (en) * 2018-08-14 2019-01-04 腾讯科技(深圳)有限公司 Item identification method, device, computer readable storage medium and computer equipment
CN110135307A (en) * 2019-04-30 2019-08-16 北京邮电大学 Method for traffic sign detection and device based on attention mechanism

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966609A (en) * 2021-03-05 2021-06-15 北京百度网讯科技有限公司 Target detection method and device
CN112966609B (en) * 2021-03-05 2023-08-11 北京百度网讯科技有限公司 Target detection method and device
CN113076882A (en) * 2021-04-03 2021-07-06 国家计算机网络与信息安全管理中心 Specific mark detection method based on deep learning
CN113157160A (en) * 2021-04-20 2021-07-23 北京百度网讯科技有限公司 Method and apparatus for identifying misleading play buttons
CN113157160B (en) * 2021-04-20 2023-08-15 北京百度网讯科技有限公司 Method and apparatus for identifying misleading play button
CN114549576A (en) * 2022-01-26 2022-05-27 广州方图科技有限公司 Walking ability detection method and device, storage medium and physical examination machine
CN114549576B (en) * 2022-01-26 2023-05-23 广州方图科技有限公司 Walking capacity detection method, walking capacity detection device, storage medium and physical examination machine


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201016