CN112560956A - Target detection method and device, nonvolatile storage medium and electronic equipment - Google Patents
Target detection method and device, nonvolatile storage medium and electronic equipment Download PDFInfo
- Publication number
- CN112560956A (application number CN202011493902.5A)
- Authority
- CN
- China
- Prior art keywords
- layer
- target detection
- feature
- characteristic
- detection model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a target detection method and apparatus, a nonvolatile storage medium, and an electronic device. The method includes: obtaining a plurality of feature layers of a target detection model, where the plurality of feature layers include a first feature layer and a second feature layer, and the semantic information of the first feature layer is less than that of the second feature layer; performing fusion processing on the first feature layer and the second feature layer among the plurality of feature layers to obtain a target detection layer, where the fusion processing is used to enhance the semantic information of the first feature layer; and updating the target detection model based on the target detection layer. The invention solves the technical problem of low detection accuracy in conventional target detection models.
Description
Technical Field
The invention relates to the field of target detection, and in particular to a target detection method and apparatus, a nonvolatile storage medium, and an electronic device.
Background
Classical target detection frameworks often perform poorly on small-target detection tasks. For the problem of small-target detection in industrial environments, no approach has yet been proposed for improving a classical target detection framework so as to raise the detection accuracy for small-scale targets.
Disclosure of Invention
Embodiments of the invention provide a target detection method and apparatus, a nonvolatile storage medium, and an electronic device, which at least solve the technical problem of low detection accuracy in conventional target detection models.
According to one aspect of an embodiment of the invention, a target detection method is provided, including: obtaining a plurality of feature layers of a target detection model, where the plurality of feature layers include a first feature layer and a second feature layer, and the semantic information of the first feature layer is less than that of the second feature layer; performing fusion processing on the first feature layer and the second feature layer among the plurality of feature layers to obtain a target detection layer, where the fusion processing is used to enhance the semantic information of the first feature layer; and updating the target detection model based on the target detection layer.
Optionally, performing fusion processing on the first feature layer and the second feature layer among the plurality of feature layers to obtain the target detection layer includes: enlarging the first scale of the first feature layer to a second scale by interpolation, where the second scale equals the scale of the second feature layer adjacent to the first feature layer; adding the processed first feature layer to the second feature layer to obtain a new first feature layer; enlarging the third scale of the new first feature layer to a fourth scale by interpolation, where the fourth scale equals the scale of the new second feature layer adjacent to the new first feature layer; and adding the processed new first feature layer to the new second feature layer, until the fusion processing of all first and second feature layers among the plurality of feature layers is finished, thereby obtaining the target detection layer.
Optionally, the method further includes: acquiring the actual (effective) receptive field of the target detection model; adjusting the anchor boxes of the target detection model based on the actual receptive field, where the size of the anchor boxes affects the anchor-box classification and anchor-box regression of the target detection model; and determining the numbers of samples of the target detection model according to the anchor boxes, where the numbers of samples include the number of positive samples and the number of negative samples.
Optionally, determining the numbers of samples of the target detection model according to the anchor boxes includes: determining a loss function of the target detection model according to the anchor boxes, where the loss function includes a classification loss function and a regression loss function; and determining the numbers of samples based on the loss function.
Optionally, the loss function L of the target detection model is determined from the anchor boxes by the following calculation formula:

L = (1/N_c) Σ_i L_c(P_i, P_i*) + λ (1/N_r) Σ_i P_i* L_r(t_i, t_i*)

where i denotes the index of an anchor box; P_i denotes the predicted probability that anchor box i is a target; P_i* denotes the ground-truth label of anchor box i, with P_i* equal to 1 when the anchor box is a target and 0 otherwise; t_i denotes the predicted coordinate correction values of the detection box and t_i* denotes the ground-truth coordinate values of the detection box; the factor P_i* before L_r indicates that the regression loss is computed only for positive-sample anchor boxes; N_c denotes the number of positive and negative anchor boxes used in classification, N_r denotes the number of positive anchor boxes used in regression, and λ denotes a balance parameter that weights the classification loss against the regression loss.
Optionally, the classification loss function is a Softmax loss function, and the regression loss function is a Smooth-L1 loss function.
According to another aspect of the embodiments of the invention, a target detection apparatus is further provided, including: an obtaining module configured to obtain a plurality of feature layers of a target detection model, where the plurality of feature layers include a first feature layer and a second feature layer, and the semantic information of the first feature layer is less than that of the second feature layer; a fusion processing module configured to perform fusion processing on the first feature layer and the second feature layer among the plurality of feature layers to obtain a target detection layer, where the fusion processing is used to enhance the semantic information of the first feature layer; and an updating module configured to update the target detection model based on the target detection layer.
According to another aspect of the embodiments of the present invention, there is also provided a non-volatile storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor and to perform any one of the above object detection methods.
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program is configured to execute any one of the above object detection methods when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the above object detection methods.
In the embodiments of the invention, a plurality of feature layers of a target detection model are obtained, where the plurality of feature layers include a first feature layer and a second feature layer, and the semantic information of the first feature layer is less than that of the second feature layer; fusion processing is performed on the first feature layer and the second feature layer among the plurality of feature layers to obtain a target detection layer, where the fusion processing is used to enhance the semantic information of the first feature layer; and the target detection model is updated based on the target detection layer.
By recursively fusing the first feature layer and the second feature layer of the target detection model, the semantic features of the first feature layer are enhanced and the detection accuracy of the target detection model is improved, thereby solving the technical problem of low detection accuracy in conventional target detection models.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a method of target detection according to an embodiment of the present invention;
FIG. 2 is a flow diagram of an alternative target detection method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an implementation scenario of an optional fusion process performed on feature layers according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a target detection apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, an embodiment of a target detection method is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Fig. 1 is a flowchart of an object detection method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:
Step S102: obtain a plurality of feature layers of a target detection model, where the plurality of feature layers include a first feature layer and a second feature layer, and the semantic information of the first feature layer is less than that of the second feature layer;
Step S104: perform fusion processing on the first feature layer and the second feature layer among the plurality of feature layers to obtain a target detection layer, where the fusion processing is used to enhance the semantic information of the first feature layer;
Step S106: update the target detection model based on the target detection layer.
In the embodiments of the invention, a plurality of feature layers of a target detection model are obtained, where the plurality of feature layers include a first feature layer and a second feature layer, and the semantic information of the first feature layer is less than that of the second feature layer; fusion processing is performed on the first feature layer and the second feature layer among the plurality of feature layers to obtain a target detection layer, where the fusion processing is used to enhance the semantic information of the first feature layer; and the target detection model is updated based on the target detection layer.
A shallow feature layer of the target detection model is used as a detection layer, and the shallow and deep feature layers are recursively fused so that the semantic features of the shallow feature layer are enhanced. This improves the detection accuracy of the target detection model and thereby solves the technical problem of low detection accuracy in conventional target detection models.
Optionally, the target detection model may be any type of target detection model, for example a Faster R-CNN model, an SSD model, or a YOLO model.
Taking the SSD model as an example: the SSD is a single-stage, efficient, general object detection framework. It can detect targets in real time while maintaining accuracy, but its results are not ideal when detecting small targets. Analysis shows that the feature detection layers of the SSD are too deep and the downsampling factor is too large, so the feature information of small targets is lost. In addition, the anchor boxes of the SSD are laid out at intervals that are too large, so a small target can easily fall outside every anchor box and be missed.
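As a rough, hypothetical illustration of the downsampling problem (the strides and the 300×300 input below are assumed values for a typical VGG-based SSD, not figures from the patent), one can tabulate how little of a feature map a 16-pixel target occupies at each layer:

```python
# Hypothetical strides for a VGG-backbone SSD at 300x300 input; a 16-pixel
# target shrinks below one feature-map cell once the stride exceeds 16.
STRIDES = {"Conv3_3": 4, "Conv4_3": 8, "Conv5_3": 16, "Conv6_2": 32, "Conv7_2": 64}

def object_extent(object_px, stride):
    """Size of the object measured in feature-map cells at a given stride."""
    return object_px / stride

for name, stride in STRIDES.items():
    fmap = 300 // stride
    print(f"{name}: {fmap}x{fmap} map, 16-px target spans {object_extent(16, stride):.2f} cells")
```

Under these assumed strides, a 16-pixel target covers only a quarter of a cell on the deepest layer, which is why the method below moves detection onto shallower layers.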
In the embodiments of the present application, for the problem of low recall in small-target detection, a shallow feature layer is used as a detection layer and is fused with a deep feature layer to enhance its semantic feature information. For the problem that positive and negative samples are extremely unbalanced in small-target detection, a negative-sample screening strategy is added to balance the numbers of positive and negative samples. Using the shallow feature layer as a detection layer, fusing the deep and shallow feature layers, and balancing the positive and negative samples reduces the interference of negative samples on detection results.
Optionally, the plurality of feature layers include a first feature layer and a second feature layer, and the semantic information of the first feature layer is less than that of the second feature layer.
It should be noted that "shallow" and "deep" features are relative terms: when two adjacent feature layers are compared, the lower one is the shallow feature and the upper one is the deep feature; but relative to another, still higher layer adjacent to it, that upper layer is itself the shallow feature.
In an optional embodiment, FIG. 2 is a flowchart of an optional target detection method according to an embodiment of the invention. As shown in FIG. 2, performing fusion processing on the first feature layer and the second feature layer among the plurality of feature layers to obtain the target detection layer includes:
Step S202: enlarge the first scale of the first feature layer to a second scale by interpolation, where the second scale equals the scale of the second feature layer adjacent to the first feature layer;
Step S204: add the processed first feature layer to the second feature layer to obtain a new first feature layer;
Step S206: enlarge the third scale of the new first feature layer to a fourth scale by interpolation, where the fourth scale equals the scale of the new second feature layer adjacent to the new first feature layer;
Step S208: add the processed new first feature layer to the new second feature layer, until the fusion processing of all first and second feature layers among the plurality of feature layers is finished, thereby obtaining the target detection layer.
To address this, in the embodiments of the present application a detection layer may be added on a shallow feature layer. However, the semantic information of a shallow feature layer is not rich enough, and objects are difficult to identify from it. To compensate, the semantic information of the shallow layer can be enhanced by feature fusion with deep feature layers through a feature pyramid. The fusion process is shown in FIG. 3. For an input image, the feature layers Conv3_3, Conv4_3, Conv5_3, Conv6_2, and Conv7_2 of the SSD model are taken as the feature pyramid layers. Starting from Conv7_2, the feature layer is enlarged by interpolation to the same size as the previous layer (Conv6_2), and the two are added to obtain a new feature layer P1. P1 is then interpolated in the same way and added to Conv5_3 to obtain P2. Repeating these operations yields P3 and P4, and P4 is used as the target detection layer.
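The recursive fusion just described can be sketched as follows. This is a minimal illustration under stated assumptions: single-channel feature maps, a uniform 2× scale step between adjacent pyramid layers, and nearest-neighbour interpolation (the patent does not fix the interpolation kernel).

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour interpolation that doubles the spatial size (H, W)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse_pyramid(layers):
    """Fuse feature layers ordered shallow -> deep (e.g. Conv3_3 ... Conv7_2).

    Starting from the deepest layer, the running result is upsampled to the
    scale of the next shallower layer and added to it; the final sum at the
    shallowest scale plays the role of the target detection layer (P4 above).
    """
    fused = layers[-1]
    for shallow in reversed(layers[:-1]):
        fused = upsample2x(fused) + shallow
    return fused
```

For example, `fuse_pyramid([np.ones((8, 8)), np.ones((4, 4)), np.ones((2, 2))])` returns an 8×8 map in which each deeper layer's information has been accumulated into the shallowest scale.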
In an optional embodiment, the method further includes:
Step S302: acquire the actual (effective) receptive field of the target detection model;
Step S304: adjust the anchor boxes of the target detection model based on the actual receptive field, where the size of the anchor boxes affects the anchor-box classification and anchor-box regression of the model;
Step S306: determine the numbers of samples of the target detection model according to the anchor boxes, where the numbers of samples include the number of positive samples and the number of negative samples.
In the embodiments of the present application, because the actual receptive field is roughly twenty to forty percent of the theoretical receptive field, the anchor-box size of the target detection model may be, but is not limited to being, set to 16, so that the anchor boxes can be classified and regressed more accurately.
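A small helper makes this sizing rule concrete. Only the 20–40% rule and the anchor size of 16 come from the text; the theoretical receptive field of 60 pixels below is a made-up example value.

```python
def anchor_size_range(theoretical_rf, low=0.20, high=0.40):
    """Candidate anchor-size range if the actual (effective) receptive field
    is taken to be 20-40% of the theoretical receptive field."""
    return theoretical_rf * low, theoretical_rf * high

lo, hi = anchor_size_range(60)   # hypothetical theoretical RF of 60 px
assert lo <= 16 <= hi            # the chosen anchor size of 16 falls inside
```

The point of the check is simply that an anchor sized to the effective rather than the theoretical receptive field ends up much smaller, which suits small targets.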
It should be noted that, although the anchor boxes laid out by the target detection model in the embodiments of the present application cover small targets more comprehensively, they generate a large number of negative samples, so the number of negative samples far exceeds the number of positive samples. With the positives and negatives this unbalanced, the result of network training is necessarily biased toward the negative samples. For example, if only one of 100 anchor boxes is a positive sample (target) and the other 99 are negative samples (background), then a detector that simply outputs "background" for every anchor box achieves 99% accuracy.
In an alternative embodiment, determining the numbers of samples of the target detection model according to the anchor boxes includes:
Step S402: determine a loss function of the target detection model according to the anchor boxes, where the loss function includes a classification loss function and a regression loss function;
Step S404: determine the numbers of samples based on the loss function.
To address the imbalance of positive and negative samples, the loss function of the target detection model can be determined from the anchor boxes and the losses sorted, and the positive and negative samples are then selected at a ratio of 3:1 of positive samples to negative samples.
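One way this sorting-and-selection step might look is sketched below. This is a hedged illustration: the text states a 3:1 ratio of positive to negative samples (the reverse of the 1:3 convention used by the original SSD's hard-negative mining), so the ratio is kept as a parameter, and the losses and labels are made-up inputs.

```python
def select_balanced_samples(losses, labels, pos_to_neg=3):
    """Keep all positive anchors and only enough negatives so that
    positives : negatives is pos_to_neg : 1, preferring the hardest
    (highest-loss) negatives after sorting."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    hardest_first = sorted(negatives, key=lambda i: losses[i], reverse=True)
    n_neg = max(1, len(positives) // pos_to_neg)
    return positives, hardest_first[:n_neg]
```

With losses `[0.9, 0.1, 0.8, 0.2, 0.3, 0.7]` and labels `[1, 1, 1, 0, 0, 0]`, the three positives are kept and only the hardest negative (index 5, loss 0.7) survives the screening.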
In an alternative embodiment, the loss function consists of two parts: a classification loss function for whether an anchor box is a human face, and a regression loss function for the anchor box's coordinate corrections toward the face detection box.
In an alternative embodiment, the classification loss function is a Softmax loss function, and the regression loss function is a Smooth-L1 loss function.
In an alternative embodiment, the loss function L of the target detection model is determined from the anchor boxes by the following calculation formula:

L = (1/N_c) Σ_i L_c(P_i, P_i*) + λ (1/N_r) Σ_i P_i* L_r(t_i, t_i*)

where i denotes the index of an anchor box; P_i denotes the predicted probability that anchor box i is a target; P_i* denotes the ground-truth label of anchor box i, with P_i* equal to 1 when the anchor box is a target and 0 otherwise; t_i denotes the predicted coordinate correction values of the detection box and t_i* denotes the ground-truth coordinate values of the detection box; the factor P_i* before L_r indicates that the regression loss is computed only for positive-sample anchor boxes; N_c denotes the number of positive and negative anchor boxes used in classification, N_r denotes the number of positive anchor boxes used in regression, and λ denotes a balance parameter that weights the classification loss against the regression loss.
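The Softmax-plus-Smooth-L1 loss described here can be sketched numerically. This is a minimal, assumption-laden version: binary classification, four box coordinates per anchor, λ = 1 by default, and all helper names and tensor shapes are illustrative rather than taken from the patent.

```python
import numpy as np

def softmax_loss(logits, label):
    """Classification loss L_c for one anchor: Softmax cross-entropy."""
    z = logits - logits.max()            # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return float(-np.log(p[label]))

def smooth_l1(t, t_star):
    """Regression loss L_r: Smooth-L1 over the coordinate corrections."""
    d = np.abs(t - t_star)
    return float(np.where(d < 1, 0.5 * d ** 2, d - 0.5).sum())

def detection_loss(logits, p_star, t, t_star, lam=1.0):
    """L = (1/N_c) sum_i L_c(P_i, P_i*) + lam * (1/N_r) sum_i P_i* L_r(t_i, t_i*)."""
    n_c = len(p_star)                    # all sampled anchors enter classification
    n_r = max(1, sum(p_star))            # only positive anchors enter regression
    cls = sum(softmax_loss(l, y) for l, y in zip(logits, p_star)) / n_c
    reg = sum(y * smooth_l1(ti, tsi) for y, ti, tsi in zip(p_star, t, t_star)) / n_r
    return cls + lam * reg
```

Note that Smooth-L1 is quadratic for coordinate errors below 1 and linear beyond, so a single coordinate error of 2 contributes 1.5 to the regression term.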
In view of the SSD's poor detection of small targets, a shallow feature layer is used as a detection layer and fused with the deep feature layers; the anchor-box size is adjusted according to the actual receptive field; and the numbers of positive and negative samples are balanced, thereby improving the detection accuracy for small targets.
Example 2
According to an embodiment of the present invention, an apparatus embodiment for implementing the above target detection method is further provided. FIG. 4 is a schematic structural diagram of a target detection apparatus according to an embodiment of the invention. As shown in FIG. 4, the target detection apparatus includes an obtaining module 400, a fusion processing module 402, and an updating module 404, where:
an obtaining module 400, configured to obtain a plurality of feature layers of a target detection model, where the plurality of feature layers include: the semantic information of the first characteristic layer is less than that of the second characteristic layer; a fusion processing module 402, configured to perform fusion processing on the first feature layer and the second feature layer in the plurality of feature layers to obtain a target detection layer, where the fusion processing is used to enhance semantic information of the first feature layer; an updating module 404, configured to update the object detection model based on the object detection layer.
It should be noted that the above modules may be implemented in software or hardware. In the latter case, for example, the modules may all be located in the same processor, or may be located in different processors in any combination.
It should be noted here that the obtaining module 400, the fusion processing module 402, and the updating module 404 correspond to steps S102 to S106 in Embodiment 1; the modules share the examples and application scenarios of the corresponding steps, but are not limited to the disclosure of Embodiment 1. The modules may run in a computer terminal as part of the apparatus.
For alternative or preferred implementations of this embodiment, reference may be made to the relevant description in Embodiment 1, which is not repeated here.
The above target detection apparatus may further include a processor and a memory. The obtaining module 400, the fusion processing module 402, the updating module 404, and so on are stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor includes a kernel, and the kernel calls the corresponding program unit from the memory; one or more kernels may be provided. The memory may include volatile memory on a computer-readable medium, random access memory (RAM), and/or nonvolatile memory such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
According to an embodiment of the present application, an embodiment of a nonvolatile storage medium is also provided. Optionally, in this embodiment, the nonvolatile storage medium includes a stored program, and when the program runs, the device on which the nonvolatile storage medium resides is controlled to perform any one of the above target detection methods.
Optionally, in this embodiment, the nonvolatile storage medium may be located in any computer terminal of a group of computer terminals in a computer network, or in any mobile terminal of a group of mobile terminals.
Optionally, the apparatus in which the non-volatile storage medium is controlled to perform the following functions when the program is executed: obtaining a plurality of feature layers of a target detection model, wherein the plurality of feature layers comprise: the semantic information of the first characteristic layer is less than that of the second characteristic layer; performing fusion processing on the first feature layer and the second feature layer in the plurality of feature layers to obtain a target detection layer, wherein the fusion processing is used for enhancing semantic information of the first feature layer; and updating the target detection model based on the target detection layer.
Optionally, the apparatus in which the non-volatile storage medium is controlled to perform the following functions when the program is executed: amplifying the first scale of the first characteristic layer into a second scale by adopting an interpolation mode, wherein the second scale is equal to the scale of a second characteristic layer adjacent to the first characteristic layer; adding the processed first characteristic layer and the second characteristic layer to obtain a new first characteristic layer; amplifying the third scale of the new first feature layer into a fourth scale by adopting an interpolation mode, wherein the fourth scale is equal to the scale of a new second feature layer adjacent to the new first feature layer; and adding the processed new first feature layer and the new second feature layer until the fusion processing of all the first feature layers and the second feature layers in the plurality of feature layers is finished, thereby obtaining the target detection layer.
Optionally, when the program runs, the apparatus in which the non-volatile storage medium is located is controlled to perform the following functions: acquiring a real receptive field of the target detection model; adjusting an anchor frame of the target detection model based on the real receptive field, wherein the size of the anchor frame affects the anchor frame classification and the anchor frame regression of the target detection model; and determining the number of samples of the target detection model according to the anchor frame, wherein the number of samples comprises the number of positive samples and the number of negative samples.
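The patent does not spell out how the anchor frame is adjusted or how samples are counted, so the following is only a plausible sketch: anchor side lengths are capped by the real receptive field, and anchors are labeled positive or negative by their best IoU against the ground-truth boxes. The IoU thresholds (0.7 / 0.3) are illustrative assumptions, not values from the patent:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def clip_anchor_to_receptive_field(anchor_side, receptive_field):
    """One plausible reading of 'adjusting the anchor frame based on the real
    receptive field': cap the anchor side length at the receptive field size."""
    return min(anchor_side, receptive_field)

def count_samples(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Label each anchor by its best IoU with any ground-truth box and return
    (number of positive samples, number of negative samples)."""
    n_pos = n_neg = 0
    for a in anchors:
        best = max(iou(a, g) for g in gt_boxes)
        if best >= pos_thr:
            n_pos += 1
        elif best < neg_thr:
            n_neg += 1
    return n_pos, n_neg
```

Anchors whose best IoU falls between the two thresholds are ignored, which is a common convention in anchor-based detectors.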
Optionally, when the program runs, the apparatus in which the non-volatile storage medium is located is controlled to perform the following functions: determining a loss function of the target detection model according to the anchor frame, wherein the loss function includes a classification loss function and a regression loss function; and determining the number of samples based on the loss function.
Optionally, when the program runs, the apparatus in which the non-volatile storage medium is located is controlled to perform the following function: determining the loss function L of the target detection model according to the anchor frame by adopting the following calculation formula:

L = (1/Nc) Σi Lc(Pi, Pi*) + λ (1/Nr) Σi Pi* Lr(ti, ti*)

where i denotes the index of the anchor frame; Pi denotes the predicted probability that the anchor frame is a target; Pi* denotes the ground-truth label of the anchor frame, which is 1 when the anchor frame is a target and 0 otherwise; ti denotes the predicted coordinate correction value of the detection frame, and ti* denotes the ground-truth coordinate value of the detection frame; the factor Pi* in front of Lr indicates that the regression loss is computed only for positive-sample anchor frames; Nc and Nr respectively denote the number of positive and negative anchor frames used in classification and the number of positive anchor frames used in regression; and λ denotes a balance parameter for balancing the classification loss and the regression loss.
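As an illustrative sketch (not part of the patent), the loss described above — a Softmax classification loss averaged over positive and negative anchors, plus a Smooth-L1 regression loss restricted to positive anchors and weighted by λ — could be written as follows. The function names and data layout are assumptions for the example:

```python
import numpy as np

def softmax_ce(logits, label):
    """Softmax cross-entropy for one anchor (the classification loss Lc)."""
    e = np.exp(logits - logits.max())
    p = e / e.sum()
    return -np.log(p[label])

def smooth_l1(t, t_star):
    """Smooth-L1 regression loss Lr between predicted and ground-truth
    coordinate corrections."""
    d = np.abs(t - t_star)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def detection_loss(cls_logits, labels, t_pred, t_true, lam=1.0):
    """L = (1/Nc) * sum_i Lc(Pi, Pi*) + lam * (1/Nr) * sum_i Pi* * Lr(ti, ti*).
    labels[i] is 1 for a positive anchor and 0 otherwise, so the regression
    term sums only over positive anchors."""
    n_c = len(labels)
    cls_loss = sum(softmax_ce(l, y) for l, y in zip(cls_logits, labels)) / n_c
    pos = [i for i, y in enumerate(labels) if y == 1]
    n_r = max(len(pos), 1)  # guard against division by zero
    reg_loss = sum(smooth_l1(t_pred[i], t_true[i]) for i in pos) / n_r
    return cls_loss + lam * reg_loss
```

When the classifier is confident and the positive anchor's regression is exact, both terms approach zero, matching the intent of the formula.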
According to an embodiment of the present application, there is also provided an embodiment of a processor. Optionally, in this embodiment, the processor is configured to run a program, wherein the program, when running, executes any one of the above object detection methods.
According to an embodiment of the present application, there is also provided an embodiment of an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the above object detection methods.
There is further provided, in accordance with an embodiment of the present application, a computer program product which, when executed on a data processing device, is adapted to perform the steps of any one of the above object detection methods.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable non-volatile storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a non-volatile storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned nonvolatile storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A method of object detection, comprising:
obtaining a plurality of feature layers of a target detection model, wherein the plurality of feature layers comprise a first feature layer and a second feature layer, and the semantic information of the first feature layer is less than that of the second feature layer;
performing fusion processing on the first feature layer and the second feature layer in the plurality of feature layers to obtain a target detection layer, wherein the fusion processing is used for enhancing the semantic information of the first feature layer; and
updating the target detection model based on the target detection layer.
2. The method according to claim 1, wherein fusing the first feature layer and the second feature layer of the plurality of feature layers to obtain a target detection layer comprises:
enlarging a first scale of the first feature layer to a second scale by interpolation, wherein the second scale is equal to the scale of a second feature layer adjacent to the first feature layer;
adding the processed first feature layer and the second feature layer to obtain a new first feature layer;
enlarging a third scale of the new first feature layer to a fourth scale by interpolation, wherein the fourth scale is equal to the scale of a new second feature layer adjacent to the new first feature layer; and
adding the processed new first feature layer and the new second feature layer, until the fusion processing of all the first feature layers and second feature layers in the plurality of feature layers is finished, so as to obtain the target detection layer.
3. The method of claim 1, further comprising:
acquiring a real receptive field of the target detection model;
adjusting an anchor frame of the target detection model based on the real receptive field, wherein the size of the anchor frame affects anchor frame classification and anchor frame regression of the target detection model;
determining a sample number of the target detection model according to the anchor frame, wherein the sample number comprises: the number of positive samples and the number of negative samples.
4. The method of claim 3, wherein determining the number of samples of the target detection model from the anchor box comprises:
determining a loss function of the target detection model according to the anchor frame, wherein the loss function comprises: a classification loss function and a regression loss function;
determining the number of samples based on the loss function.
5. The method of claim 4, wherein determining the loss function L of the target detection model according to the anchor frame adopts the following calculation formula:
L = (1/Nc) Σi Lc(Pi, Pi*) + λ (1/Nr) Σi Pi* Lr(ti, ti*)
where i denotes the index of the anchor frame; Pi denotes the predicted probability that the anchor frame is a target; Pi* denotes the ground-truth label of the anchor frame, which is 1 when the anchor frame is a target and 0 otherwise; ti denotes the predicted coordinate correction value of the detection frame, and ti* denotes the ground-truth coordinate value of the detection frame; the factor Pi* in front of Lr indicates that the regression loss is computed only for positive-sample anchor frames; Nc and Nr respectively denote the number of positive and negative anchor frames used in classification and the number of positive anchor frames used in regression; and λ denotes a balance parameter for balancing the classification loss and the regression loss.
6. The method of claim 4,
the classification loss function adopts a Softmax loss function, and the regression loss function adopts a Smooth-L1 loss function.
7. An object detection device, comprising:
an obtaining module, configured to obtain a plurality of feature layers of a target detection model, wherein the plurality of feature layers comprise a first feature layer and a second feature layer, and the semantic information of the first feature layer is less than that of the second feature layer;
a fusion processing module, configured to perform fusion processing on the first feature layer and the second feature layer in the plurality of feature layers to obtain a target detection layer, where the fusion processing is used to enhance semantic information of the first feature layer;
an update module to update the target detection model based on the target detection layer.
8. A non-volatile storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the object detection method of any one of claims 1 to 6.
9. A processor for running a program, wherein the program is arranged to perform the object detection method of any one of claims 1 to 6 when running.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the object detection method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011493902.5A CN112560956A (en) | 2020-12-16 | 2020-12-16 | Target detection method and device, nonvolatile storage medium and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112560956A true CN112560956A (en) | 2021-03-26 |
Family
ID=75064445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011493902.5A Pending CN112560956A (en) | 2020-12-16 | 2020-12-16 | Target detection method and device, nonvolatile storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112560956A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108520229A (en) * | 2018-04-04 | 2018-09-11 | 北京旷视科技有限公司 | Image detecting method, device, electronic equipment and computer-readable medium |
CN109741318A (en) * | 2018-12-30 | 2019-05-10 | 北京工业大学 | The real-time detection method of single phase multiple dimensioned specific objective based on effective receptive field |
US20190287215A1 (en) * | 2018-03-13 | 2019-09-19 | Disney Enterprises, Inc. | Image Processing Using A Convolutional Neural Network |
CN110287927A (en) * | 2019-07-01 | 2019-09-27 | 西安电子科技大学 | Based on the multiple dimensioned remote sensing image object detection method with context study of depth |
CN111027547A (en) * | 2019-12-06 | 2020-04-17 | 南京大学 | Automatic detection method for multi-scale polymorphic target in two-dimensional image |
CN111126472A (en) * | 2019-12-18 | 2020-05-08 | 南京信息工程大学 | Improved target detection method based on SSD |
CN111524106A (en) * | 2020-04-13 | 2020-08-11 | 北京推想科技有限公司 | Skull fracture detection and model training method, device, equipment and storage medium |
CN111652288A (en) * | 2020-05-11 | 2020-09-11 | 北京航天自动控制研究所 | Improved SSD small target detection method based on dense feature pyramid |
CN111832655A (en) * | 2020-07-16 | 2020-10-27 | 四川大学 | Multi-scale three-dimensional target detection method based on characteristic pyramid network |
CN111914727A (en) * | 2020-07-28 | 2020-11-10 | 联芯智能(南京)科技有限公司 | Small target human body detection method based on balance sampling and nonlinear feature fusion |
CN112052861A (en) * | 2019-06-05 | 2020-12-08 | 高新兴科技集团股份有限公司 | Method for calculating effective receptive field of deep convolutional neural network and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109034660B (en) | Method and related device for determining risk control strategy based on prediction model | |
CN113344910B (en) | Defect labeling image generation method and device, computer equipment and storage medium | |
CN113536081B (en) | Data center data management method and system based on artificial intelligence | |
CN112348765A (en) | Data enhancement method and device, computer readable storage medium and terminal equipment | |
CN107818301A (en) | Update the method, apparatus and electronic equipment of biometric templates | |
CN114359563B (en) | Model training method, device, computer equipment and storage medium | |
CN113268641B (en) | User data processing method based on big data and big data server | |
CN110362563A (en) | The processing method and processing device of tables of data, storage medium, electronic device | |
CN110321892A (en) | A kind of picture screening technique, device and electronic equipment | |
CN114783021A (en) | Intelligent detection method, device, equipment and medium for wearing of mask | |
CN115203496A (en) | Project intelligent prediction and evaluation method and system based on big data and readable storage medium | |
CN113850523A (en) | ESG index determining method based on data completion and related product | |
CN117234938A (en) | Screening method and device for test cases, electronic equipment and storage medium | |
CN108055638A (en) | Obtain method, apparatus, computer-readable medium and the equipment of target location | |
CN111127592B (en) | Picture color filling method and device, electronic equipment and readable storage medium | |
CN111353577B (en) | Multi-task-based cascade combination model optimization method and device and terminal equipment | |
CN112560956A (en) | Target detection method and device, nonvolatile storage medium and electronic equipment | |
CN116758373A (en) | Training method, image processing method, device and equipment for deep learning model | |
CN116258923A (en) | Image recognition model training method, device, computer equipment and storage medium | |
CN110691362B (en) | Station address determination method and device | |
CN107040603A (en) | For determining the method and apparatus that application program App enlivens scene | |
CN115935274A (en) | Training method, device, equipment and storage medium for reselling behavior recognition model | |
CN112052883A (en) | Clothes detection method, device and storage medium | |
CN113469057B (en) | Fire eye video self-adaptive detection method, device, equipment and medium | |
CN113569727B (en) | Method, system, terminal and medium for identifying construction site in remote sensing image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||