CN113065591B - Target detection method and device, electronic equipment and storage medium

Info

Publication number: CN113065591B
Application number: CN202110340256.7A
Authority: CN (China)
Prior art keywords: candidate frame, image, sample, network, frame
Other versions: CN113065591A (Chinese)
Inventors: 李�诚, 张正明, 李南贤
Current Assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Legal status: Active (granted)

Events: application filed by Shanghai Sensetime Intelligent Technology Co Ltd; priority to CN202110340256.7A; publication of CN113065591A; application granted; publication of CN113065591B.

Classifications

    • G06F18/241 Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/214 Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/29 Pattern recognition; graphical models, e.g. Bayesian networks

Abstract

The disclosure relates to a target detection method and device, an electronic device, and a storage medium. The method includes: acquiring an image to be processed; performing image recognition processing on the image to be processed through a recognition network to obtain at least one recognition result of the image to be processed, where the recognition result comprises a candidate frame and category information corresponding to the candidate frame; and obtaining a detection result of the image to be processed according to each candidate frame and the category information corresponding to each candidate frame. The recognition network comprises a candidate frame generation network and a classification network. During training of the recognition network, a second training set for the classification network is constructed according to the processing results of the pre-trained candidate frame generation network on the sample images in a first training set and the labeling information of the sample images, and the classification network is trained according to the second training set, so that the recognition network is obtained according to the candidate frame generation network and the trained classification network. Embodiments of the disclosure can reduce the difficulty of target detection.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to a target detection method and device, an electronic device, and a storage medium.
Background
With the development of science and technology, AI (Artificial Intelligence) technology is increasingly used across industries. Computer vision is an important area of AI, and object detection is the basis of many computer vision tasks, so training an object detection network is a prerequisite for accomplishing many of them.
Disclosure of Invention
The disclosure provides a technical solution for target detection.
According to an aspect of the present disclosure, there is provided a target detection method, applied to an electronic device, including:
acquiring an image to be processed; performing image recognition processing on the image to be processed through a recognition network to obtain at least one recognition result of the image to be processed, where the recognition result comprises a candidate frame and category information corresponding to the candidate frame; and obtaining a detection result of the image to be processed according to each candidate frame and the category information corresponding to each candidate frame. The recognition network comprises a candidate frame generation network and a classification network. During training of the recognition network, a second training set for the classification network is constructed according to the processing results of the pre-trained candidate frame generation network on the sample images in a first training set and the labeling information of the sample images, and the classification network is trained according to the second training set, so that the recognition network is obtained according to the candidate frame generation network and the trained classification network.
According to the target detection method provided by the embodiments of the disclosure, target detection on the image to be processed can be realized through the recognition network, and only the classification network needs to be trained during training of the recognition network, which keeps the training overhead low.
In one possible implementation manner, the obtaining the detection result of the image to be processed according to each candidate frame and the category information corresponding to each candidate frame includes:
determining, from the candidate frames, first candidate frames whose category information is the target category information; determining a target candidate frame from the first candidate frames; and obtaining a detection result of the image to be processed according to the target candidate frame and the category information corresponding to the target candidate frame.
According to the target detection method provided by the embodiments of the disclosure, target detection on the image to be processed can be realized through the recognition network, reducing the complexity of target detection.
In one possible implementation manner, the obtaining the detection result of the image to be processed according to the target candidate frame and the category information corresponding to the target candidate frame includes:
correcting and adjusting the coordinate information of the target candidate frame, and obtaining a detection result of the image to be processed according to the category information corresponding to the target candidate frame and the adjusted target candidate frame.
According to the target detection method provided by the embodiment of the disclosure, the target candidate frame can be corrected and adjusted to obtain a more accurate detection result, so that the accuracy of the detection result is improved.
In one possible implementation manner, the performing, through the recognition network, image recognition processing on the image to be processed to obtain at least one recognition result of the image to be processed includes:
performing image processing on the image to be processed through the candidate frame generation network to obtain at least one piece of candidate frame information of the image to be processed, where the candidate frame information comprises a candidate frame and image characteristic information of the candidate frame; performing, through the classification network, image recognition on the image content in each candidate frame according to the image characteristic information corresponding to each candidate frame, to obtain the category information corresponding to each candidate frame; and obtaining at least one recognition result of the image to be processed according to the category information corresponding to the at least one candidate frame.
According to the target detection method provided by the embodiments of the disclosure, target detection on the image to be processed can be realized through the recognition network, which can reduce the complexity of target detection.
In one possible implementation, the method further includes: training the recognition network according to the first training set, where the first training set comprises a plurality of sample groups and each sample group comprises a sample image and labeling information of the sample image. Training the recognition network according to the first training set includes the following steps:
performing image processing on a sample image through the candidate frame generation network to obtain at least one piece of sample candidate frame information, where the sample candidate frame information comprises a sample candidate frame and image characteristic information of the sample candidate frame; obtaining labeling category information corresponding to the sample candidate frame according to the sample candidate frame and the labeling information of the sample image; obtaining a second training set according to the sample candidate frame information and the labeling category information corresponding to each sample candidate frame; training the classification network through the second training set; and obtaining the recognition network according to the trained classification network and the candidate frame generation network.
According to the target detection method provided by the embodiments of the disclosure, a user can create the first training set through a simple labeling operation to train the recognition network; during training of the recognition network, only the classification network needs to be trained, and the recognition network is obtained according to the trained classification network and the pre-trained candidate frame generation network.
In one possible implementation, the training the classification network through the second training set includes:
classifying each sample candidate frame through the classification network according to the image characteristic information corresponding to each sample candidate frame to obtain prediction category information corresponding to each sample candidate frame; and training the classification network according to the prediction category information corresponding to the sample candidate frame and the labeling category information corresponding to the sample candidate frame to obtain a trained classification network.
In a possible implementation manner, the performing image processing on the sample image through the candidate frame generation network to obtain at least one piece of sample candidate frame information includes:
in response to a training operation for the recognition network, calling, from a link library, a candidate frame generation network matching a target format, according to the target format supported by the electronic device; and performing image processing on the sample image through the candidate frame generation network matching the target format to obtain at least one piece of sample candidate frame information in the sample image.
According to the target detection method provided by the embodiments of the disclosure, the network in the link library is called directly and no deep learning framework needs to be installed, so the method can be used across platforms, that is, it can run on Windows, Linux and macOS systems. This alleviates the poor compatibility of target detection methods whose network implementations depend on a deep learning framework and existing algorithm packages, are heavily environment-dependent, and often cannot run under different operating systems.
In one possible implementation manner, the labeling information includes a labeling frame and labeling category information corresponding to the labeling frame, and the obtaining the labeling category information corresponding to the sample candidate frame according to the sample candidate frame and the labeling information of the sample image includes:
for any sample candidate frame, determining the overlapping degree of the sample candidate frame and any labeling frame; and determining the labeling category information of the sample candidate frame according to the overlapping degree of the sample candidate frame and any labeling frame.
In one possible implementation manner, the determining the labeling category information of the sample candidate frame according to the overlapping degree of the sample candidate frame and any labeling frame includes:
when the overlapping degree of the sample candidate frame and at least one labeling frame is greater than or equal to an overlapping degree threshold, determining a target labeling frame from the at least one labeling frame, where the target labeling frame has the highest overlapping degree with the sample candidate frame; and taking the labeling category information of the target labeling frame as the labeling category information of the sample candidate frame.
According to the target detection method provided by the embodiments of the disclosure, the labeling category information of all sample candidate frames of the sample image can be obtained, and the classification network can then be trained according to the labeling category information of the sample candidate frames to obtain the recognition network and realize target detection on the image to be processed.
According to an aspect of the present disclosure, there is provided an object detection apparatus applied to an electronic device, the apparatus including:
the acquisition module is used for acquiring the image to be processed; the first processing module is used for performing image recognition processing on the image to be processed through a recognition network to obtain at least one recognition result of the image to be processed, where the recognition result comprises a candidate frame and category information corresponding to the candidate frame; and the second processing module is used for obtaining a detection result of the image to be processed according to each candidate frame and the category information corresponding to each candidate frame. The recognition network comprises a candidate frame generation network and a classification network. During training of the recognition network, a second training set for the classification network is constructed according to the processing results of the candidate frame generation network on the sample images in the first training set and the labeling information of the sample images, and the classification network is trained according to the second training set, so that the recognition network is obtained according to the candidate frame generation network and the trained classification network.
In one possible implementation manner, the second processing module is further configured to:
determining, from the candidate frames, first candidate frames whose category information is the target category information; determining a target candidate frame from the first candidate frames; and obtaining a detection result of the image to be processed according to the target candidate frame and the category information corresponding to the target candidate frame.
In one possible implementation manner, the second processing module is further configured to:
and correcting and adjusting the coordinate information of the target candidate frame, and obtaining a detection result of the image to be processed according to the category information corresponding to the target candidate frame and the adjusted target candidate frame.
In one possible implementation manner, the first processing module is further configured to:
performing image processing on the image to be processed through the candidate frame generation network to obtain at least one piece of candidate frame information of the image to be processed, where the candidate frame information comprises a candidate frame and image characteristic information of the candidate frame; performing, through the classification network, image recognition on the image content in each candidate frame according to the image characteristic information corresponding to each candidate frame, to obtain the category information corresponding to each candidate frame; and obtaining at least one recognition result of the image to be processed according to the category information corresponding to the at least one candidate frame.
In a possible implementation manner, the apparatus further includes a training module. The training module is configured to train the recognition network according to the first training set, where the first training set includes a plurality of sample groups and each sample group includes a sample image and labeling information of the sample image. The training module is further configured to:
performing image processing on a sample image through the candidate frame generation network to obtain at least one piece of sample candidate frame information, where the sample candidate frame information comprises a sample candidate frame and image characteristic information of the sample candidate frame; obtaining labeling category information corresponding to the sample candidate frame according to the sample candidate frame and the labeling information of the sample image; obtaining a second training set according to the sample candidate frame information and the labeling category information corresponding to each sample candidate frame; training the classification network through the second training set; and obtaining the recognition network according to the trained classification network and the candidate frame generation network.
In one possible implementation, the training module is further configured to:
classifying each sample candidate frame through the classification network according to the image characteristic information corresponding to each sample candidate frame to obtain prediction category information corresponding to each sample candidate frame; and training the classification network according to the prediction category information corresponding to the sample candidate frame and the labeling category information corresponding to the sample candidate frame to obtain a trained classification network.
In one possible implementation, the training module is further configured to:
responding to a training operation for the recognition network, and calling, from a link library, a candidate frame generation network matching a target format, according to the target format supported by the electronic device; and performing image processing on the sample image through the candidate frame generation network matching the target format to obtain at least one piece of sample candidate frame information in the sample image.
In one possible implementation manner, the labeling information includes a labeling frame and labeling category information corresponding to the labeling frame, and the training module is further configured to:
for any sample candidate frame, determining the overlapping degree of the sample candidate frame and any labeling frame; and determining the labeling category information of the sample candidate frame according to the overlapping degree of the sample candidate frame and any labeling frame.
In one possible implementation, the training module is further configured to:
when the overlapping degree of the sample candidate frame and at least one labeling frame is greater than or equal to an overlapping degree threshold, determining a target labeling frame from the at least one labeling frame, where the target labeling frame has the highest overlapping degree with the sample candidate frame; and taking the labeling category information of the target labeling frame as the labeling category information of the sample candidate frame.
According to an aspect of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the target detection method and device, the electronic equipment and the storage medium provided by the embodiments of the disclosure, target detection on the image to be processed can be realized through the recognition network, and only the classification network needs to be trained during training of the recognition network, which keeps the training overhead low.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
FIG. 1 illustrates a flow chart of a target detection method according to an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of a target detection method according to an embodiment of the disclosure;
FIG. 3 shows a schematic diagram of a target detection method according to an embodiment of the disclosure;
FIG. 4 shows a schematic diagram of a target detection method according to an embodiment of the disclosure;
FIG. 5 shows a block diagram of an object detection device according to an embodiment of the present disclosure;
FIG. 6 illustrates a block diagram of an electronic device 800 according to an embodiment of the disclosure;
FIG. 7 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
Related target detection methods can be implemented through a target detection network, but such implementations often depend on a deep learning framework and existing algorithm packages, making them heavily environment-dependent; moreover, a target detection network has a large number of parameters, placing extremely high demands on the storage and computing capabilities of electronic devices. It is therefore important to develop a target detection method that is lightweight, consumes few resources, and is simple to operate.
The embodiments of the disclosure provide a target detection method in which a candidate frame generation network can be pre-trained; the candidate frame generation network can be used to generate candidate frames in the image to be processed and extract the image characteristic information corresponding to the image content in each candidate frame area. The sample images in the first training set can then be subjected to image processing by calling the candidate frame generation network, generating the sample candidate frames corresponding to each sample image and the image characteristic information corresponding to the image content in the sample candidate frames. A sample image has labeling information (which can comprise labeling frames and the labeling category information corresponding to the labeling frames); by matching the labeling frames of the sample image with the sample candidate frames, the labeling category information corresponding to each sample candidate frame can be determined according to the matching result and the labeling category information of the labeling frames.
Further, a second training set for training the classification network may be constructed according to each sample candidate frame, the image characteristic information corresponding to each sample candidate frame, and the labeling category information corresponding to each sample candidate frame, so as to train the classification network through the second training set. Specifically, the image characteristic information corresponding to a sample candidate frame can be classified through the classification network to obtain the prediction category information corresponding to the sample candidate frame; the classification network is trained according to the prediction category information and the labeling category information of the sample candidate frame, and the recognition network is then obtained according to the trained classification network and the pre-trained candidate frame generation network.
After the image to be processed is acquired, image recognition processing is performed on it through the recognition network to obtain at least one recognition result, where a recognition result can comprise a candidate frame in the image to be processed and the category information corresponding to the candidate frame; the detection result of the image to be processed is then obtained according to each candidate frame and the category information corresponding to each candidate frame. For example, when the target detection object is a person, a candidate frame whose category information is "person" may be determined from the recognition results, and the detection result obtained according to that candidate frame and its category information. After the detection result is obtained, it may be displayed; for example, the candidate frame corresponding to the detection result can be displayed on the image to be processed, optionally together with the category information corresponding to the candidate frame.
According to the target detection method provided by the embodiments of the disclosure, detection of the image to be processed can be realized with the recognition network, and during training of the recognition network the pre-trained candidate frame generation network can guide the training of the classification network, so that the recognition network is constructed from the candidate frame generation network and the trained classification network. That is, the method can realize target detection by training only the classification network; because the classification network is lightweight and has few parameters, the training process reduces the storage and computing requirements on the electronic device and shortens the network training period.
FIG. 1 shows a flowchart of an object detection method according to an embodiment of the present disclosure. The object detection method may be performed by an electronic device such as a terminal device or a server; the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, etc., and the method may be implemented by a processor invoking computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server.
As shown in FIG. 1, the target detection method may include:
in step S11, an image to be processed is acquired;
in step S12, performing image recognition processing on the image to be processed through a recognition network to obtain at least one recognition result of the image to be processed, where the recognition result includes a candidate frame and category information corresponding to the candidate frame;
in step S13, a detection result of the image to be processed is obtained according to each candidate frame and the category information corresponding to each candidate frame.
The recognition network comprises a candidate frame generation network and a classification network. During training of the recognition network, a second training set for the classification network is constructed according to the processing results of the pre-trained candidate frame generation network on the sample images in a first training set and the labeling information of the sample images, and the classification network is trained according to the second training set, so that the recognition network is obtained according to the candidate frame generation network and the trained classification network.
In the embodiment of the disclosure, the electronic device may acquire the image to be processed in any one of image acquisition modes such as image data acquisition, image data uploading or image data downloading; alternatively, the image to be processed may be a video frame image in video data acquired by the electronic device. After the image to be processed is acquired, image recognition processing can be carried out on the image to be processed through a recognition network, at least one recognition result of the image to be processed is obtained, and the recognition result can comprise a candidate frame and category information corresponding to the candidate frame.
The recognition network can be a pre-trained network used to perform image recognition on the image to be processed, yielding at least one candidate frame in the image to be processed and the category information corresponding to each candidate frame. The recognition network may include a candidate frame generation network and a classification network: the candidate frame generation network may be configured to generate a plurality of candidate frames in the image to be processed and extract the image characteristic information corresponding to the image content in each candidate frame area, and the classification network may be configured to process the image characteristic information corresponding to each candidate frame area to obtain the category information corresponding to each candidate frame. A candidate frame is used to represent an area in the image to be processed where a target (e.g., a person or an object) may be present.
It should be noted that the candidate frame generation network may be a pre-trained network; when the user wants to train the recognition network, the candidate frame generation network may be called from the link library to perform image processing on the sample images in the first training set, generating the sample candidate frames corresponding to each sample image and the image characteristic information corresponding to the image content in each sample candidate frame area. A sample image has labeling information (which can comprise labeling frames and the labeling category information corresponding to the labeling frames); by matching the labeling frames of the sample image with the sample candidate frames, the labeling category information corresponding to each sample candidate frame can be obtained according to the matching result and the labeling category information of each labeling frame.
Further, a second training set for training the classification network can be constructed according to each sample candidate frame, the image characteristic information corresponding to each sample candidate frame, and the labeling category information corresponding to each sample candidate frame, and the classification network is trained through the second training set. Specifically, a sample candidate frame can be classified through the classification network according to its corresponding image characteristic information to obtain the prediction category information of the sample candidate frame; the classification loss of the classification network is calculated according to the prediction category information and the labeling category information of the sample candidate frame, the classification network is trained according to the classification loss, and the recognition network is formed from the trained classification network and the pre-trained candidate frame generation network.
In one possible implementation manner, the performing, through the recognition network, image recognition processing on the image to be processed to obtain at least one recognition result of the image to be processed may include:
performing image processing on the image to be processed through the candidate frame generation network to obtain at least one piece of candidate frame information of the image to be processed, where the candidate frame information comprises a candidate frame and image characteristic information of the candidate frame;
performing, through the classification network, image recognition on the image content in each candidate frame according to the image characteristic information corresponding to each candidate frame, to obtain the category information corresponding to each candidate frame;
and obtaining at least one recognition result of the image to be processed according to the category information corresponding to the at least one candidate frame.
For example, the candidate frame generation network may be used to generate at least one candidate frame in the image to be processed and extract the image characteristic information of the image content in each candidate frame area, obtaining at least one piece of candidate frame information. The classification network may be configured to classify the candidate frames according to their image characteristic information, obtaining the category information corresponding to each candidate frame.
The image to be processed can be processed through a candidate frame generation network in the recognition network to obtain at least one candidate frame information of the image to be processed, and the at least one candidate frame information is used as input information of a classification network, so that the classification network can obtain category information corresponding to each candidate frame after image recognition is carried out on the image characteristic information of each candidate frame. And according to each candidate frame and the category information corresponding to each candidate frame, at least one identification result of the image to be processed can be obtained.
After the recognition network processes the image to be processed, at least one candidate frame in the image to be processed and the category information corresponding to each candidate frame can be obtained from the recognition results, and the candidate frame corresponding to the target detection object can be determined from the candidate frames according to the category information corresponding to each candidate frame, yielding the detection result. For example: first candidate frames whose category information is a first specified category can be selected from the candidate frames, and the detection result of the image to be processed obtained according to the first candidate frames of the first specified category and their corresponding category information, where the first specified category is the category information of the target detection object; or second candidate frames whose category information is a second specified category can be screened out from the candidate frames, and the detection result of the image to be processed obtained according to the third candidate frames remaining after screening and their corresponding category information, where all category information except the second specified category is the category information of the target detection object.
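As a concrete illustration of this flow, the following minimal Python sketch strings the two networks together and filters the recognition results by a first specified category. The names frame_net and cls_net are hypothetical stand-ins, not interfaces defined by the disclosure:

```python
# Minimal sketch of the recognition-and-filtering flow described above.
# frame_net and cls_net are hypothetical callables standing in for the
# candidate frame generation network and the classification network.

def detect(image, frame_net, cls_net, first_specified_categories):
    # Candidate frame generation: each entry is (candidate_frame, feature).
    candidate_info = frame_net(image)
    # Classification: category information for each candidate frame.
    recognition_results = [(frame, cls_net(feature))
                           for frame, feature in candidate_info]
    # Keep candidate frames whose category matches the target detection object.
    return [(frame, category) for frame, category in recognition_results
            if category in first_specified_categories]
```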
In this way, the electronic device can perform image recognition processing on the image to be processed through the recognition network after the image to be processed is acquired, so as to obtain at least one recognition result of the image to be processed, wherein the recognition result comprises the candidate frame and the category information corresponding to the candidate frame. The electronic equipment can obtain the detection result of the image to be processed according to each candidate frame and the category information corresponding to each candidate frame. The recognition network comprises a candidate frame generation network and a classification network, and in the training process of the recognition network, a second training set aiming at the classification network can be constructed according to the processing result of the pre-trained candidate frame generation network aiming at the sample image in the first training set and the labeling information of the sample image, and the classification network is trained according to the second training set, so that the recognition network is obtained according to the pre-trained candidate frame generation network and the trained classification network.
According to the target detection method provided by the embodiments of the disclosure, target detection on the image to be processed can be realized through the recognition network, and only the classification network needs to be trained during training of the recognition network.
In one possible implementation manner, the obtaining the detection result of the image to be processed according to each candidate frame and the category information corresponding to each candidate frame may include:
determining, from the candidate frames, first candidate frames whose category information is the target category information;
determining a target candidate frame from the first candidate frames;
and obtaining a detection result of the image to be processed according to the target candidate frame and the category information corresponding to the target candidate frame.
For example, after the recognition network performs recognition processing on the image to be processed to obtain a plurality of candidate frames and the category information corresponding to each candidate frame, first candidate frames whose category information is the target category information may be determined from the plurality of candidate frames. The target category information may include the category information corresponding to the target detection object to be detected, or may be all category information excluding preset categories that do not need to be detected. The target category information may be specified or preset by the user as needed, or all category information other than the background may be used as the default target category information.
For example: in the case where the categories corresponding to the candidate frames of the image to be processed include person, horse, car, dog, background, and the like, it may be determined that the target category information includes person, horse, car and dog, and the candidate frames corresponding to any one of the categories person, horse, car and dog may accordingly be determined as first candidate frames.
After the plurality of first candidate frames are obtained, a target candidate frame can be determined from the plurality of first candidate frames, wherein the target candidate frame can be the candidate frame with the highest precision in the first candidate frames corresponding to the target category information. For example: in the case where the target category information includes a person, a horse, a car, and a dog, target candidate frames corresponding to the four categories of the person, the horse, the car, and the dog may be determined from the first candidate frames, respectively. The number of the target candidate frames in the embodiment of the present disclosure is not specifically limited, and may be one or more.
For example, the candidate frame with the highest accuracy may be determined as the target candidate frame from the first candidate frames with the same category information, where the accuracy may be represented by the confidence of the category information output by the recognition network for the candidate frame; the higher the confidence, the higher the accuracy. For example: the first candidate frame with the highest confidence may be determined as the target candidate frame from the first candidate frames whose category information is person, horse, car or dog, respectively.
Alternatively, an NMS (Non-Maximum Suppression) algorithm may be employed to determine the target candidate frames from the first candidate frames. For example, after all the first candidate frames are ranked according to confidence, the first candidate frame with the highest confidence may be selected as the first target candidate frame. The remaining first candidate frames are then traversed; the overlapping degree of each traversed first candidate frame and the first target candidate frame is determined, and the traversed first candidate frame is deleted when its overlapping degree with the first target candidate frame is larger than a preset threshold. After all the first candidate frames have been traversed, all the remaining first candidate frames (including the first target candidate frame) are determined as target candidate frames.
The overlapping degree of the traversed first candidate frame and the first target candidate frame may be determined according to the area of the traversed first candidate frame and the area of the first target candidate frame, for example by the following formula (1):

IOU1 = S_{1,2} / (S_1 + S_2 - S_{1,2})    (1)

where IOU1 denotes the overlapping degree of the currently traversed first candidate frame and the first target candidate frame, S_1 denotes the area of the traversed first candidate frame, S_2 denotes the area of the first target candidate frame, and S_{1,2} denotes the overlapping area of the traversed first candidate frame and the first target candidate frame.
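As an illustration, a standard greedy variant of this NMS procedure can be sketched in Python as follows. Frames are assumed to be (x1, y1, x2, y2) corner tuples and candidates (frame, confidence) pairs; these representations are illustrative assumptions, not specified by the disclosure:

```python
def iou(frame_a, frame_b):
    """Overlapping degree per formula (1): overlap area divided by union area."""
    x1, y1 = max(frame_a[0], frame_b[0]), max(frame_a[1], frame_b[1])
    x2, y2 = min(frame_a[2], frame_b[2]), min(frame_a[3], frame_b[3])
    s_12 = max(0.0, x2 - x1) * max(0.0, y2 - y1)                  # S_{1,2}
    s_1 = (frame_a[2] - frame_a[0]) * (frame_a[3] - frame_a[1])   # S_1
    s_2 = (frame_b[2] - frame_b[0]) * (frame_b[3] - frame_b[1])   # S_2
    return s_12 / (s_1 + s_2 - s_12)

def nms(first_candidates, threshold):
    """Greedy NMS over (frame, confidence) pairs, highest confidence first."""
    ordered = sorted(first_candidates, key=lambda c: c[1], reverse=True)
    target_frames = []
    for frame, confidence in ordered:
        # Keep the frame unless it overlaps a kept frame beyond the threshold.
        if all(iou(frame, kept) <= threshold for kept, _ in target_frames):
            target_frames.append((frame, confidence))
    return target_frames
```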
After the target candidate frames are obtained, detection results aiming at the images to be processed can be obtained according to the target candidate frames and the category information of the target candidate frames, wherein the image content in the target candidate frames corresponds to the target detection objects, and the category information of the target candidate frames corresponds to the category information of the target detection objects.
In a possible implementation manner, the obtaining, according to the target candidate frame and the category information corresponding to the target candidate frame, a detection result of the image to be processed may include:
and correcting and adjusting the coordinate information of the target candidate frame, and obtaining a detection result of the image to be processed according to the category information corresponding to the target candidate frame and the adjusted target candidate frame.
For example, after the target candidate frame is obtained, the coordinate information of the target candidate frame may be corrected and adjusted to obtain a more accurate target candidate frame, thereby obtaining a more accurate detection result.
For example, a linear regression network can be trained according to the sample candidate frames generated by the candidate frame generation network and the labeling frames in the sample images; the linear regression network can be used to correct and adjust the size, position and other information of a sample candidate frame so that, after adjustment, the sample candidate frame approaches the labeling frame as closely as possible. For example: the linear regression network can be trained according to the upper-left and lower-right corner coordinates of the sample candidate frame and the upper-left and lower-right corner coordinates of the labeling frame, so that after the linear regression network corrects and adjusts the corner coordinates of the sample candidate frame, they come as close as possible to those of the labeling frame.
After the target candidate frame is obtained, it can be corrected and adjusted by the linear regression network to obtain the adjusted target candidate frame, and the detection result of the image to be processed is obtained according to the category information of the target candidate frame and the adjusted target candidate frame, where the image content in the adjusted target candidate frame corresponds to the target detection object and the category information of the target candidate frame corresponds to the category of the target detection object.
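A minimal sketch of such a corner-coordinate regressor is given below, assuming a plain least-squares fit on matched (sample candidate frame, labeling frame) corner pairs; the fitting method and all names are illustrative choices, not specified by the disclosure:

```python
import numpy as np

def fit_linear_regressor(candidate_corners, labeling_corners):
    """Least-squares fit mapping candidate (x1, y1, x2, y2) corners toward
    labeling-frame corners. Both arguments are (n, 4) arrays of matched pairs."""
    X = np.hstack([candidate_corners, np.ones((len(candidate_corners), 1))])
    P, *_ = np.linalg.lstsq(X, labeling_corners, rcond=None)  # X @ P ≈ labels
    return P  # shape (5, 4): four weight rows plus a bias row

def refine_frame(frame, P):
    """Correct and adjust a target candidate frame's corner coordinates."""
    x = np.append(np.asarray(frame, dtype=float), 1.0)
    return x @ P
```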
Therefore, the target candidate frame can be corrected and adjusted to obtain a more accurate detection result, and the accuracy of the detection result is improved.
In one possible implementation, the method may further include: training the identification network according to the first training set, wherein the first training set comprises a plurality of sample groups, the sample groups comprise sample images and labeling information of the sample images,
the training the identification network according to the first training set may include:
performing image processing on a sample image through the candidate frame generation network to obtain at least one piece of sample candidate frame information, where the sample candidate frame information comprises a sample candidate frame and image characteristic information of the sample candidate frame;
obtaining labeling category information corresponding to the sample candidate frame according to the sample candidate frame and the labeling information of the sample image;
obtaining a second training set according to the sample candidate frame information and the labeling category information corresponding to each sample candidate frame;
training the classification network through the second training set;
and obtaining the recognition network according to the trained classification network and the candidate frame generation network.
For example, the electronic device may obtain the first training set by uploading, downloading, or the like; alternatively, the electronic device may obtain the first training set in response to a user's creation operation for the first training set. The user can obtain a sample group by labeling a labeling frame of a sample image and the labeling category information corresponding to the labeling frame, and the first training set is then created from the plurality of sample groups obtained by labeling. For example: the user can add and display a labeling frame in the sample image by triggering an adding control and can adjust the position and size of the labeling frame by dragging or the like; after labeling the labeling category information corresponding to the labeling frame, one piece of labeling information of the sample image is obtained. Repeating this, after at least one piece of labeling information of the sample image is obtained, a sample group can be obtained by triggering a confirmation control. In the same way, after a plurality of sample groups are obtained, the first training set may be created from the plurality of sample groups.
The pre-trained candidate frame generation network is then called to process the sample images in the first training set, obtaining at least one sample candidate frame in each sample image and the image characteristic information corresponding to each sample candidate frame. For example, the labeling frame matching each sample candidate frame may be determined from the labeling frames corresponding to the sample image, and the labeling category information corresponding to the matching labeling frame may be determined as the labeling category information of that sample candidate frame.
For example, the matching degree of a sample candidate frame and each labeling frame may be determined, and the labeling frame having the highest matching degree with the sample candidate frame may be determined as the labeling frame matching the sample candidate frame. In the embodiments of the disclosure, the matching degree between a sample candidate frame and a labeling frame can be represented by their overlapping degree; the higher the overlapping degree, the higher the matching degree. In one possible implementation, the overlapping degree of the sample candidate frame and the labeling frame may be determined according to the area of the sample candidate frame and the area of the labeling frame, for example by the following formula (2):

IOU2 = S_{3,4} / (S_3 + S_4 - S_{3,4})    (2)

where IOU2 denotes the overlapping degree of the sample candidate frame and the labeling frame, S_3 denotes the area of the sample candidate frame, S_4 denotes the area of the labeling frame, and S_{3,4} denotes the overlapping area of the sample candidate frame and the labeling frame.
After the labeling category information corresponding to the sample candidate frames is obtained, a plurality of sample groups can be obtained according to the sample candidate frame information and the labeling category information corresponding to the sample candidate frames, wherein the plurality of sample groups form a second training set, namely, any sample group in the second training set comprises the sample candidate frames, the image characteristic information corresponding to the sample candidate frames and the labeling category information corresponding to the sample candidate frames.
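A sketch of assembling the second training set from these pieces follows; frame_net and match_labeling_category are hypothetical helpers, the latter implementing the overlap-based matching of formula (2):

```python
def build_second_training_set(first_training_set, frame_net, match_labeling_category):
    """Each sample group in the returned set holds a sample candidate frame,
    its image characteristic information, and its labeling category information."""
    second_training_set = []
    for sample_image, labeling_info in first_training_set:
        for sample_frame, feature in frame_net(sample_image):
            labeling_category = match_labeling_category(sample_frame, labeling_info)
            second_training_set.append((sample_frame, feature, labeling_category))
    return second_training_set
```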
The image characteristic information corresponding to the sample candidate frames in the second training set may be used as the input data of the classification network, and the labeling category information corresponding to the sample candidate frames as the labeling data, to train the classification network. After training of the classification network is completed, the recognition network can be constructed from the pre-trained candidate frame generation network and the trained classification network; the image to be processed is processed through the recognition network to obtain at least one candidate frame in the image to be processed and the category information corresponding to each candidate frame, i.e., at least one recognition result, and the detection result for the image to be processed is obtained according to the recognition results.
In the embodiments of the disclosure, the user can create the first training set through a simple labeling operation to train the recognition network; during training of the recognition network, only the classification network needs to be trained, and the recognition network is obtained according to the trained classification network and the pre-trained candidate frame generation network.
In one possible implementation, the training the classification network through the second training set may include:
Classifying each sample candidate frame through the classification network according to the image characteristic information corresponding to each sample candidate frame to obtain prediction category information corresponding to each sample candidate frame;
and training the classification network according to the prediction category information corresponding to the sample candidate frame and the labeling category information corresponding to the sample candidate frame to obtain a trained classification network.
For example, the image feature information corresponding to the sample candidate frame may be input into a classification network, and the classification network obtains the prediction category information corresponding to the sample candidate frame after classifying the image feature information corresponding to the sample candidate frame. According to the prediction category information corresponding to the sample candidate frame and the labeling category information corresponding to the sample candidate frame, the classification loss of the classification network can be determined, and then the network parameters of the classification network are adjusted according to the classification loss until the classification loss of the classification network meets the training requirement (for example, the classification loss of the classification network is smaller than a loss threshold value), the training on the classification network is completed, and the trained classification network is obtained.
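As a concrete illustration, and keeping with the framework-free spirit of the disclosure, a lightweight linear softmax classifier can be trained on the second training set with plain NumPy; the architecture, learning rate, and loss threshold below are illustrative assumptions rather than values given by the disclosure:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_classification_network(features, labels, num_classes,
                                 lr=0.1, loss_threshold=0.05, max_steps=10000):
    """features: (n, d) image characteristic information of sample candidate frames;
    labels: (n,) labeling category indices. Trains until the classification
    loss meets the training requirement (falls below loss_threshold)."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    one_hot = np.eye(num_classes)[labels]
    for _ in range(max_steps):
        probs = softmax(features @ W)                  # prediction category information
        loss = -np.mean(np.sum(one_hot * np.log(probs + 1e-12), axis=1))
        if loss < loss_threshold:                      # training requirement met
            break
        W -= lr * features.T @ (probs - one_hot) / n   # cross-entropy gradient step
    return W
```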
In a possible implementation manner, the performing image processing on the sample image through the candidate frame generation network to obtain at least one piece of sample candidate frame information may include:
in response to a training operation for the recognition network, calling, from a link library, a candidate frame generation network matching a target format, according to the target format supported by the electronic device;
and performing image processing on the sample image through the candidate frame generation network matching the target format to obtain at least one piece of sample candidate frame information in the sample image.
For example, the candidate frame generation network may be trained in advance on a large-scale target detection data set, and after network model compression and acceleration, it may be packaged into formats suitable for systems such as Windows, Linux, and macOS and stored, respectively, in link libraries such as .dll (Dynamic Link Library), .so (Shared Object), and .dylib files for external calling. The compression and acceleration of the candidate frame generation network may be implemented by model quantization; for the specific process, reference may be made to the related art, which is not described herein.
When a user triggers a training operation for the identification network (for example, by sending a training instruction in a command-line manner, or by triggering a control used for sending the training instruction), the target format supported by the electronic equipment can be determined according to the system run by the electronic equipment, and the candidate frame generation network matched with the target format is called from the link library, so that the sample image is processed by the called candidate frame generation network to obtain the sample candidate frames in the sample image and the image feature information corresponding to the sample candidate frames. For example, in the case that the electronic device runs a Windows system, it may be determined that the target format supported by the electronic device is the dll format, and the candidate frame generation network in the dll format may then be called from the dll link library.
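The format-matched call can be pictured with the following Python sketch; the library file names (proposal_net.*) and the idea of loading them via ctypes are assumptions of this example — the disclosure only requires that a network packaged in the target format be called from the link library.

```python
import ctypes
import platform

# Hypothetical file names for the packaged candidate frame generation network.
LIB_BY_SYSTEM = {
    "Windows": "proposal_net.dll",    # target format on Windows
    "Linux":   "proposal_net.so",     # target format on Linux
    "Darwin":  "proposal_net.dylib",  # target format on macOS
}

def load_candidate_frame_network() -> ctypes.CDLL:
    """Load the candidate frame generation network matched to this system."""
    system = platform.system()
    if system not in LIB_BY_SYSTEM:
        raise RuntimeError(f"unsupported system: {system}")
    # The network is called directly from the link library, so no deep
    # learning framework needs to be installed on the electronic device.
    return ctypes.CDLL(LIB_BY_SYSTEM[system])
```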
In this way, the target detection method provided by the embodiment of the disclosure can directly call the network in the link library without installing a deep learning framework, and can be used across platforms, that is, it can run on Windows, Linux, and macOS systems. This alleviates the poor compatibility of target detection methods whose implementations depend on a deep learning framework and existing algorithm packages, which carry many environment dependencies and frequently fail to run under different operating systems. The target detection method provided by the embodiment of the disclosure therefore has high compatibility and is simple to operate during network training.
In one possible implementation manner, the labeling information includes a labeling frame and labeling category information corresponding to the labeling frame, and the obtaining of the labeling category information corresponding to the sample candidate frame according to the sample candidate frame and the labeling information of the sample image may include:
determining the overlapping degree of the sample candidate frame and any labeling frame aiming at any sample candidate frame;
and determining the labeling category information of the sample candidate frame according to the overlapping degree of the sample candidate frame and any labeling frame.
For example, after at least one sample candidate frame of the sample image is obtained, the overlapping degree of each sample candidate frame and each labeling frame of the sample image may be determined; for the manner of determining the overlapping degree, reference may be made to the foregoing embodiment, which is not described herein. The matching relation between a sample candidate frame and a labeling frame can be determined according to their overlapping degree, and the labeling category information of the sample candidate frame can then be determined according to the matching relation and the labeling category information of the labeling frame, that is, the labeling category information of the labeling frame matched with the sample candidate frame is determined as the labeling category information of the sample candidate frame.
In a possible implementation manner, the determining the labeling category information of the sample candidate frame according to the overlapping degree of the sample candidate frame and any labeling frame may include:
determining a target annotation frame from at least one annotation frame under the condition that the overlapping degree of the sample candidate frame and the at least one annotation frame is greater than or equal to an overlapping degree threshold, wherein the overlapping degree of the target annotation frame and the sample candidate frame is highest;
and taking the labeling category information of the target labeling frame as the labeling category information of the sample candidate frame.
For example, after the overlapping degree of any sample candidate frame and at least one labeling frame of the sample image is obtained, the labeling frames whose overlapping degree with the sample candidate frame is greater than or equal to the overlapping degree threshold may be determined, the labeling frame with the highest overlapping degree among them is taken as the target labeling frame, and the labeling category information of the target labeling frame is determined as the labeling category information of the sample candidate frame. The overlapping degree threshold is a preset value, and its specific value is not limited in the embodiment of the disclosure.
In a possible implementation manner, the determining the labeling category information of the sample candidate frame according to the overlapping degree of the sample candidate frame and any labeling frame may include:
and under the condition that the overlapping degree of the sample candidate frame and any labeling frame is smaller than an overlapping degree threshold value, determining labeling category information of the sample candidate frame as a background category.
For example, if the overlapping degree of a sample candidate frame and every labeling frame in the sample image is smaller than the overlapping degree threshold, there is no labeling frame matching the sample candidate frame in the sample image, that is, it may be determined that the sample candidate frame does not frame a target detection object (or does not frame one completely), and the labeling category of the sample candidate frame may then be determined as the background category.
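Combining the two rules above, the assignment of labeling category information can be sketched as follows; the [x1, y1, x2, y2] box convention, the background index 0, and the threshold of 0.5 are assumptions of this example.

```python
BACKGROUND = 0  # illustrative index for the background category

def iou(a, b):
    """Overlapping degree (intersection over union) of two [x1, y1, x2, y2] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def assign_label(candidate, label_boxes, label_classes, threshold=0.5):
    """Label one sample candidate frame from the labeling frames of its image."""
    if not label_boxes:
        return BACKGROUND
    overlaps = [iou(candidate, g) for g in label_boxes]
    best = max(range(len(overlaps)), key=overlaps.__getitem__)
    # Target labeling frame: the highest-overlap labeling frame, provided the
    # overlap reaches the threshold; otherwise the candidate is background.
    return label_classes[best] if overlaps[best] >= threshold else BACKGROUND
```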
In this way, the labeling category information of all sample candidate frames of the sample image can be obtained, and then the classification network can be trained according to the labeling category information of the sample candidate frames so as to obtain the identification network, and target detection for the image to be processed can be realized according to the identification network.
In order that those skilled in the art may better understand the embodiments of the present disclosure, the embodiments of the present disclosure will be described below with specific examples.
Referring to fig. 2, the candidate frame generation network may be written in advance in a programming language such as C++, and after the candidate frame generation network is trained on a large-scale target detection dataset, it may be packaged into link libraries such as .dll, .so, and .dylib for external calling. For a given picture, the candidate frame generation network can generate a plurality of candidate frames that may contain objects such as persons and things, together with the image feature information corresponding to the image content in each candidate frame area.
It should be noted that C++ is used in this example because it runs efficiently and can be conveniently packaged into a dynamic link library, which facilitates hybrid programming. In practice, the candidate frame generation network may also be written in other programming languages, for example the C language or the Python language; the embodiment of the disclosure does not specifically limit the programming language in which the candidate frame generation network is written.
The user may mark the images in the prepared training set through marking software, that is, mark the labeling frames of the sample images and the labeling category information corresponding to the image content in each labeling frame (for the marking process, reference may be made to the foregoing embodiment, which is not repeated herein), so as to obtain the first training set.
A matched candidate frame generation network is called according to the system run by the electronic equipment, and image processing is performed on the sample images in the first training set to obtain a plurality of sample candidate frames in each sample image and the image feature information corresponding to each sample candidate frame. Referring to FIG. 3, the left image in FIG. 3 is a sample image and the right image is the sample image processed by the candidate frame generation network; the processed sample image includes a plurality of sample candidate frames.
A labeling frame matched with a sample candidate frame is determined according to the overlapping degree of the sample candidate frame and each labeling frame of the sample image, and the labeling category information of the matched labeling frame is taken as the labeling category information of the sample candidate frame; alternatively, in the case where there is no labeling frame matching the sample candidate frame, the labeling category information of the sample candidate frame may be determined as the background category. A second training set is then constructed from the sample candidate frames, the image feature information corresponding to the sample candidate frames, and the labeling category information of the sample candidate frames, so as to train the classification network on the second training set.
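Using the illustrative helpers sketched earlier (assign_label and build_second_training_set, both hypothetical names of this example), the construction of the second training set could be glued together as follows.

```python
def make_training_set(samples, threshold=0.5):
    """samples: iterable of (boxes, feats, label_boxes, label_classes) per image."""
    groups = []
    for boxes, feats, label_boxes, label_classes in samples:
        # Assign labeling category information to every sample candidate frame,
        # then pair frames, features, and labels into sample groups.
        labels = [assign_label(b, label_boxes, label_classes, threshold)
                  for b in boxes]
        groups.extend(build_second_training_set(boxes, feats, labels))
    return groups
```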
The image feature information corresponding to the sample candidate frames in the second training set is classified through the classification network to obtain the prediction category information corresponding to each sample candidate frame, and the classification network is trained according to the prediction category information and the labeling category information of the sample candidate frames.
The identification network is then formed from the trained classification network and the pre-trained candidate frame generation network.
Referring to fig. 4, after the image to be processed (see the upper-left image in fig. 4) is input into the recognition network for recognition processing, at least one candidate frame of the image to be processed (see the upper-right image in fig. 4) and the category information corresponding to each candidate frame can be obtained. After the first candidate frames whose category information is the target category information are determined from the candidate frames (see the lower-right image in fig. 4), the first candidate frame with the highest precision is determined as the target candidate frame (see the lower-left image in fig. 4), the target candidate frame is adjusted and corrected, and the detection result of the image to be processed is obtained according to the adjusted target candidate frame and the category information corresponding to the target candidate frame.
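The post-processing of the recognition results can be illustrated by the sketch below; it assumes the recognition network returns (box, class_id, score) triples and leaves the coordinate correction as a stub, since the adjustment method itself is not fixed here.

```python
def detect(recognition_results, target_class):
    """Pick and refine the target candidate frame for one target category.

    recognition_results is assumed to be a list of (box, class_id, score).
    """
    # First candidate frames: candidates whose category is the target category.
    first = [(box, score) for box, cls, score in recognition_results
             if cls == target_class]
    if not first:
        return None
    # Target candidate frame: the first candidate frame ranked highest.
    box, score = max(first, key=lambda item: item[1])
    return refine_box(box), target_class, score

def refine_box(box):
    # Placeholder for correcting and adjusting the coordinate information;
    # a real system might apply learned regression offsets here.
    return box
```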
According to the target detection method provided by the embodiment of the disclosure, detection of the image to be processed can be realized using the identification network. Because the pre-trained candidate frame generation network can guide the training of the classification network during training of the identification network, the identification network can be constructed from the candidate frame generation network and the trained classification network. The method only needs to train the classification network, and because the classification network is a lightweight network with few parameters, the requirements on the storage and computing capacity of the electronic equipment can be reduced and the network training period shortened; the operation is simple and the method has high compatibility.
It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from the principles and logic; for brevity, the combinations are not described in the present disclosure. It will be appreciated by those skilled in the art that, in the above methods of the embodiments, the specific execution order of the steps should be determined by their functions and possible inherent logic.
In addition, the disclosure further provides a target detection device, an electronic device, a computer-readable storage medium, and a program, all of which may be used to implement any of the target detection methods provided in the disclosure; for the corresponding technical solutions and descriptions, reference may be made to the method parts, which are not repeated.
Fig. 5 shows a block diagram of an object detection apparatus according to an embodiment of the present disclosure, as shown in fig. 5, the apparatus including:
an acquisition module 51, which may be configured to acquire an image to be processed;
the first processing module 52 may be configured to perform image recognition processing on the image to be processed through a recognition network, to obtain at least one recognition result of the image to be processed, where the recognition result includes a candidate frame and category information corresponding to the candidate frame;
a second processing module 53, which may be configured to obtain a detection result of the image to be processed according to each candidate frame and the category information corresponding to each candidate frame,
the recognition network comprises a candidate frame generation network and a classification network, wherein in the training process of the recognition network, a second training set aiming at the classification network is constructed according to the processing result of the pre-trained candidate frame generation network aiming at a sample image in a first training set and the labeling information of the sample image, and the classification network is trained according to the second training set, so that the recognition network is obtained according to the candidate frame generation network and the trained classification network.
In this way, the electronic device can perform image recognition processing on the image to be processed through the recognition network after the image to be processed is acquired, so as to obtain at least one recognition result of the image to be processed, wherein the recognition result comprises the candidate frame and the category information corresponding to the candidate frame. The electronic equipment can obtain the detection result of the image to be processed according to each candidate frame and the category information corresponding to each candidate frame. The recognition network comprises a candidate frame generation network and a classification network, and in the training process of the recognition network, a second training set aiming at the classification network can be constructed according to the processing result of the pre-trained candidate frame generation network aiming at the sample image in the first training set and the labeling information of the sample image, and the classification network is trained according to the second training set, so that the recognition network is obtained according to the pre-trained candidate frame generation network and the trained classification network.
According to the target detection device provided by the embodiment of the disclosure, target detection for the image to be processed can be realized through the identification network, and only the classification network is required to be trained in the training process of the identification network.
In a possible implementation manner, the second processing module 53 may be further configured to:
determining a first candidate frame of which the category information is target category information from the candidate frames;
determining a target candidate frame from the first candidate frames;
and obtaining a detection result of the image to be processed according to the target candidate frame and the category information corresponding to the target candidate frame.
In a possible implementation manner, the second processing module 53 may be further configured to:
and correcting and adjusting the coordinate information of the target candidate frame, and obtaining a detection result of the image to be processed according to the category information corresponding to the target candidate frame and the adjusted target candidate frame.
In one possible implementation, the first processing module 52 may be further configured to:
Performing image processing on the image to be processed through the candidate frame generating network to obtain at least one piece of candidate frame information of the image to be processed, wherein the candidate frame information comprises a candidate frame and image characteristic information of the candidate frame;
respectively carrying out image recognition on the image content in each candidate frame according to the image characteristic information corresponding to each candidate frame through the classification network to obtain the category information corresponding to each candidate frame;
and obtaining at least one identification result of the image to be processed according to the category information corresponding to the at least one candidate frame.
In a possible implementation manner, the apparatus may further include a training module, where the training module may be configured to train the identification network according to the first training set, and the first training set includes a plurality of sample groups, where the sample groups include sample images and labeling information of the sample images, and the training module may be further configured to:
performing image processing on a sample image through the candidate frame generation network to obtain at least one piece of sample candidate frame information, wherein the sample candidate frame information comprises a sample candidate frame and image feature information of the sample candidate frame;
obtaining labeling category information corresponding to the sample candidate frame according to the sample candidate frame and the labeling information of the sample image;
obtaining a second training set according to the sample candidate frame information and the labeling category information corresponding to each sample candidate frame;
training the classification network through the second training set;
and generating a network according to the trained classification network and the candidate frame to obtain the identification network.
In one possible implementation, the training module may be further configured to:
classifying each sample candidate frame through the classification network according to the image characteristic information corresponding to each sample candidate frame to obtain prediction category information corresponding to each sample candidate frame;
and training the classification network according to the prediction category information corresponding to the sample candidate frame and the labeling category information corresponding to the sample candidate frame to obtain a trained classification network.
In one possible implementation, the training module may be further configured to:
responding to a training operation for the identification network, and calling, according to a target format supported by the electronic equipment, a candidate frame generation network matched with the target format from a link library;
And performing image processing on the sample image through the candidate frame generation network matched with the target format to obtain at least one sample candidate frame information in the sample image.
In one possible implementation manner, the labeling information includes a labeling frame and labeling category information corresponding to the labeling frame, and the training module may be further configured to:
determining the overlapping degree of the sample candidate frame and any labeling frame aiming at any sample candidate frame;
and determining the labeling category information of the sample candidate frame according to the overlapping degree of the sample candidate frame and any labeling frame.
In one possible implementation, the training module may be further configured to:
determining a target annotation frame from at least one annotation frame under the condition that the overlapping degree of the sample candidate frame and the at least one annotation frame is greater than or equal to an overlapping degree threshold, wherein the overlapping degree of the target annotation frame and the sample candidate frame is highest;
and taking the labeling category information of the target labeling frame as the labeling category information of the sample candidate frame.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a non-volatile computer readable storage medium.
The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the object detection method as provided in any of the embodiments above.
The disclosed embodiments also provide another computer program product for storing computer readable instructions that, when executed, cause a computer to perform the operations of the object detection method provided in any of the above embodiments.
The electronic device may be provided as a terminal, server or other form of device.
Fig. 6 shows a block diagram of an electronic device 800, according to an embodiment of the disclosure. For example, electronic device 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 6, an electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen providing an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an on/off state of the electronic device 800, a relative positioning of the components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of a user's contact with the electronic device 800, an orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a photosensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communication between the electronic device 800 and other devices, either wired or wireless. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (WiFi), a second generation mobile communication technology (2G) or a third generation mobile communication technology (3G), or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including computer program instructions executable by processor 820 of electronic device 800 to perform the above-described methods.
Fig. 7 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server. Referring to FIG. 7, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system (Mac OS X™) developed by Apple Inc., the multi-user multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, an optical pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, and the electronic circuitry can execute the computer readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A method of object detection, characterized in that it is applied to an electronic device, the method comprising:
acquiring an image to be processed;
performing image recognition processing on the image to be processed through a recognition network to obtain at least one recognition result of the image to be processed, wherein the recognition result comprises a candidate frame and category information corresponding to the candidate frame;
Obtaining a detection result of the image to be processed according to each candidate frame and the category information corresponding to each candidate frame,
the recognition network comprises a candidate frame generation network and a classification network, wherein in the training process of the recognition network, a second training set aiming at the classification network is constructed according to the processing result of the pre-trained candidate frame generation network aiming at a sample image in a first training set and the labeling information of the sample image, and the classification network is trained according to the second training set, so that the recognition network is obtained according to the candidate frame generation network and the trained classification network.
2. The method according to claim 1, wherein the obtaining the detection result of the image to be processed according to each candidate frame and the category information corresponding to each candidate frame includes:
determining a first candidate frame of which the category information is target category information from the candidate frames;
determining a target candidate frame from the first candidate frames;
and obtaining a detection result of the image to be processed according to the target candidate frame and the category information corresponding to the target candidate frame.
3. The method according to claim 2, wherein the obtaining the detection result of the image to be processed according to the target candidate frame and the category information corresponding to the target candidate frame includes:
And correcting and adjusting the coordinate information of the target candidate frame, and obtaining a detection result of the image to be processed according to the category information corresponding to the target candidate frame and the adjusted target candidate frame.
4. A method according to any one of claims 1 to 3, wherein said performing image recognition processing on the image to be processed through the recognition network to obtain at least one recognition result of the image to be processed comprises:
performing image processing on the image to be processed through the candidate frame generating network to obtain at least one piece of candidate frame information of the image to be processed, wherein the candidate frame information comprises a candidate frame and image characteristic information of the candidate frame;
respectively carrying out image recognition on the image content in each candidate frame according to the image characteristic information corresponding to each candidate frame through the classification network to obtain the category information corresponding to each candidate frame;
and obtaining at least one identification result of the image to be processed according to the category information corresponding to the at least one candidate frame.
5. The method according to any one of claims 1 to 4, further comprising: training the identification network according to the first training set, wherein the first training set comprises a plurality of sample groups, the sample groups comprise sample images and labeling information of the sample images,
The training the identification network according to the first training set includes:
performing image processing on a sample image through the candidate frame generation network to obtain at least one piece of sample candidate frame information, wherein the sample candidate frame information comprises a sample candidate frame and image feature information of the sample candidate frame;
obtaining labeling category information corresponding to the sample candidate frame according to the sample candidate frame and the labeling information of the sample image;
obtaining a second training set according to the sample candidate frame information and the labeling category information corresponding to each sample candidate frame;
training the classification network through the second training set;
and generating a network according to the trained classification network and the candidate frame to obtain the identification network.
6. The method of claim 5, wherein the training the classification network through the second training set comprises:
classifying each sample candidate frame through the classification network according to the image characteristic information corresponding to each sample candidate frame to obtain prediction category information corresponding to each sample candidate frame;
and training the classification network according to the prediction category information corresponding to the sample candidate frame and the labeling category information corresponding to the sample candidate frame to obtain a trained classification network.
7. The method according to claim 5 or 6, wherein the image processing of the sample image through the candidate frame generation network to obtain at least one sample candidate frame information comprises:
responding to a training operation for the identification network, and calling, according to a target format supported by the electronic equipment, a candidate frame generation network matched with the target format from a link library;
and performing image processing on the sample image through the candidate frame generation network matched with the target format to obtain at least one sample candidate frame information in the sample image.
8. The method according to any one of claims 5 to 7, wherein the labeling information includes a labeling frame and labeling category information corresponding to the labeling frame, and the obtaining of the labeling category information corresponding to the sample candidate frame according to the sample candidate frame and the labeling information of the sample image comprises:
determining the overlapping degree of the sample candidate frame and any labeling frame aiming at any sample candidate frame;
and determining the labeling category information of the sample candidate frame according to the overlapping degree of the sample candidate frame and any labeling frame.
9. The method of claim 8, wherein determining the labeling category information of the sample candidate box according to the overlapping degree of the sample candidate box and any labeling box comprises:
determining a target annotation frame from at least one annotation frame under the condition that the overlapping degree of the sample candidate frame and the at least one annotation frame is greater than or equal to an overlapping degree threshold, wherein the overlapping degree of the target annotation frame and the sample candidate frame is highest;
and taking the labeling category information of the target labeling frame as the labeling category information of the sample candidate frame.
10. An object detection apparatus, characterized by being applied to an electronic device, comprising:
the acquisition module is used for acquiring the image to be processed;
the first processing module is used for carrying out image recognition processing on the image to be processed through a recognition network to obtain at least one recognition result of the image to be processed, wherein the recognition result comprises a candidate frame and category information corresponding to the candidate frame;
a second processing module, configured to obtain a detection result of the image to be processed according to each candidate frame and category information corresponding to each candidate frame,
The recognition network comprises a candidate frame generation network and a classification network, wherein in the training process of the recognition network, a second training set aiming at the classification network is constructed according to the processing result of the pre-trained candidate frame generation network aiming at a sample image in a first training set and the labeling information of the sample image, and the classification network is trained according to the second training set, so that the recognition network is obtained according to the candidate frame generation network and the trained classification network.
11. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 9.
12. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 9.
CN202110340256.7A 2021-03-30 2021-03-30 Target detection method and device, electronic equipment and storage medium Active CN113065591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110340256.7A CN113065591B (en) 2021-03-30 2021-03-30 Target detection method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113065591A CN113065591A (en) 2021-07-02
CN113065591B true CN113065591B (en) 2023-11-28

Family

ID=76564543

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110340256.7A Active CN113065591B (en) 2021-03-30 2021-03-30 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113065591B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241374B (en) * 2021-12-14 2022-12-13 百度在线网络技术(北京)有限公司 Training method of live broadcast processing model, live broadcast processing method, device and equipment
CN114821513B (en) * 2022-06-29 2022-09-09 威海凯思信息科技有限公司 Image processing method and device based on multilayer network and electronic equipment
CN116128954B (en) * 2022-12-30 2023-12-05 上海强仝智能科技有限公司 Commodity layout identification method, device and storage medium based on generation network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229509B (en) * 2016-12-16 2021-02-26 北京市商汤科技开发有限公司 Method and device for identifying object class and electronic equipment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409517A (en) * 2018-09-30 2019-03-01 北京字节跳动网络技术有限公司 The training method and device of object detection network
WO2020228179A1 (en) * 2019-05-15 2020-11-19 平安科技(深圳)有限公司 Picture instance detection method and apparatus, computer device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Experimental research on orchard fruit detection based on a fast convolutional neural network; 张磊; 姜军生; 李昕昱; 宋健; 解福祥; Journal of Chinese Agricultural Mechanization (10); full text *

Also Published As

Publication number Publication date
CN113065591A (en) 2021-07-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant