CN111523403B - Method and device for acquiring target area in picture and computer readable storage medium - Google Patents
Method and device for acquiring target area in picture and computer readable storage medium
- Publication number: CN111523403B (application number CN202010258207.4A)
- Authority: CN (China)
- Prior art keywords: module, detection, picture, sub, target area
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Abstract
The application relates to the field of image processing, and discloses a method and a device for acquiring a target area in a picture, and a computer readable storage medium. The method for acquiring the target area in the picture comprises the following steps: inputting the picture to be detected into a detection model, and acquiring a target detection point in the picture to be detected, wherein the detection model comprises a backbone module and a multi-stage detection module connected with the backbone module; the backbone module is used for acquiring feature maps of the picture to be detected in each preset dimension; the multi-stage detection module is used for performing position detection and category detection on the feature maps, and determining the target detection point according to the detection results of the position detection and the category detection; and acquiring the target area from the picture to be detected according to the target detection point. Compared with the prior art, the method, the device and the computer readable storage medium provided by the embodiments of the application can accurately detect the target area where the target to be identified is located, making it convenient to crop the picture precisely.
Description
Technical Field
The present application relates to the field of image processing, and in particular, to a method and apparatus for acquiring a target area in a picture, and a computer readable storage medium.
Background
With the development of modern information technology toward intelligence and ease of use, various human-computer interaction, virtual reality and intelligent monitoring systems have emerged. Computer-vision-based techniques such as human body posture estimation, action recognition and behavior understanding play an important role in these systems.
However, the inventors of the present application found that the prior art generally relies on a deep learning model, and when such a model performs action recognition on a picture in which the subject appears small (for example, a long-range shot of a football match, where the players occupy only a small portion of the frame), the picture usually has to be scaled down, so some action details are lost and the action recognition effect is poor. To improve the action recognition effect and reduce the loss of action details, the prior art also proposes cropping a large picture into a number of small pictures and performing action recognition on each small picture. However, since the prior art cannot accurately crop out the region where the target to be identified is located, action recognition must be performed on every small picture separately, which multiplies the amount of calculation.
Disclosure of Invention
The embodiments of the present application aim to provide a method and a device for acquiring a target area in a picture, and a computer readable storage medium, which can accurately detect the target area where the target to be identified is located and thereby make it convenient to crop the picture precisely.
In order to solve the above technical problems, an embodiment of the present application provides a method for acquiring a target area in a picture, where the method includes: inputting the picture to be detected into a detection model, and acquiring a target detection point in the picture to be detected; the detection model comprises a backbone module and a multi-stage detection module connected with the backbone module; the backbone module is used for acquiring feature graphs of the picture to be detected in each preset dimension; the multi-stage detection module is used for carrying out position detection and category detection on the feature map, and determining the target detection point according to the detection result of the position detection and the detection result of the category detection; and acquiring a target area from the picture to be detected according to the target detection point.
The embodiment of the application also provides a target detection device, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method for acquiring a target area in a picture as described above.
The embodiment of the application also provides a computer readable storage medium which stores a computer program, and the computer program realizes the method for acquiring the target area in the picture when being executed by a processor.
Compared with the prior art, the constructed detection model comprises a backbone module and a multi-stage detection module. The backbone module acquires feature maps of the picture to be detected in each preset dimension; the feature maps in the plurality of preset dimensions are respectively input into the multi-stage detection module, which performs position detection and category detection on each of them and then combines the detection results of the position detection with the detection results of the category detection to determine the position corresponding to the target detection point. After the picture to be detected is input into the trained detection model, the model outputs the target detection point in the picture to be detected, and once the position of the target detection point is determined, the range of the target area in the picture to be detected can be determined from it. Accurate identification of the target area makes it easier to crop the picture to be detected precisely; action recognition can then be performed on the cropped picture, reducing the loss of action details while also reducing the amount of calculation required for action recognition.
In addition, the detection result of the position detection comprises a central point coordinate parameter of the target area and a size parameter of the target area; a loss function of the detection model is constructed from the central point coordinate parameter and the size parameter, wherein the weight of the central point coordinate parameter in the loss function is larger than the weight of the size parameter, and both weights are preset constants. Because the validity of the central point coordinate parameter of the target detection point is greater than that of the size parameter, setting the weight of the central point coordinate parameter in the loss function larger than that of the size parameter can effectively improve the accuracy of the target detection point output by the detection model.
In addition, the backbone module comprises a first sub-module, a second sub-module, a third sub-module and a fourth sub-module which are sequentially connected; the first sub-module, the second sub-module, the third sub-module and the fourth sub-module comprise a plurality of convolution modules with the same convolution kernel, and the number of the convolution kernels of the first sub-module, the second sub-module, the third sub-module and the fourth sub-module is increased in sequence; the first sub-module, the second sub-module, the third sub-module and the fourth sub-module are respectively used for outputting the feature graphs of the preset dimensions corresponding to the first sub-module, the second sub-module, the third sub-module and the fourth sub-module.
In addition, the convolution modules with the same convolution kernel comprise a feature mapping sub-module, a convolution computing sub-module, a batch norm computing sub-module, a linear rectification computing sub-module and a feature mapping sub-module which are connected in sequence.
In addition, the detection model further comprises a feature pyramid network connecting the fourth sub-module and the multi-stage detection module; and the characteristic pyramid network is used for combining output results output by the fourth submodule and inputting the combined output results into the multi-stage detection module.
In addition, the determining the target detection point according to the detection result of the position detection and the detection result of the category detection specifically includes: and combining the detection result of the position detection and the detection result of the category detection according to a preset dimension to obtain the target detection point.
In addition, the obtaining the target area from the picture to be detected according to the target detection point specifically includes: and cutting the picture to be detected by taking the target detection point as a center point and the preset size as a side length to obtain the target area.
In addition, before inputting the picture to be detected into the detection model, the method further comprises the following steps: acquiring a plurality of football match pictures; inputting the plurality of football match pictures into a far, middle and near view judgment model to obtain near-view football match pictures, middle-view football match pictures and far-view football match pictures; and taking the middle-view football match pictures and the distant-view football match pictures as the pictures to be detected. Using the middle-view and distant-view football match pictures as the input to the detection model can effectively improve the accuracy of detecting football actions in them. In addition, action recognition is performed directly on the close-range football match pictures without cropping, which can effectively reduce the amount of calculation.
Drawings
Fig. 1 is a program flow chart of a method for acquiring a target area in a picture according to a first embodiment of the present application;
fig. 2 is a schematic structural diagram of a detection model in the method for acquiring a target area in a picture according to the first embodiment of the present application;
fig. 3 is a schematic structural diagram of a convolution module in the method for obtaining a target area in a picture according to the first embodiment of the present application;
fig. 4 is a schematic structural diagram of a pyramid module in the method for obtaining a target area in a picture according to the first embodiment of the present application;
fig. 5 is a flowchart of a method for acquiring a target area in a picture according to a second embodiment of the present application;
fig. 6 is a schematic structural diagram of an object detection device according to a third embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that numerous technical details are set forth in the various embodiments in order to provide a better understanding of the present application; the claimed application may nevertheless be practiced without these specific details, and with various changes and modifications, based on the following embodiments.
The first embodiment of the application relates to a method for acquiring a target area in a picture. The specific flow is shown in fig. 1, and comprises the following steps:
step S101: and inputting the picture to be detected into the detection model.
Specifically, in the present embodiment, as shown in fig. 2, the detection model includes a backbone module 201 and a multi-stage detection module 202 connected to the backbone module 201. The backbone module is used for acquiring feature maps of the picture to be detected in each preset dimension, and the multi-stage detection module is used for performing position detection and category detection on the feature maps and determining the target detection point in the picture to be detected by combining the detection results of the position detection and the category detection.
Further, in the present embodiment, the backbone module 201 includes a first sub-module 2011, a second sub-module 2012, a third sub-module 2013, and a fourth sub-module 2014 that are sequentially connected; the first sub-module 2011, the second sub-module 2012, the third sub-module 2013 and the fourth sub-module 2014 each comprise a plurality of convolution modules 203 with identical convolution kernels. And the number of convolution kernels within the first 2011, second 2012, third 2013 and fourth 2014 sub-modules increases in sequence.
The convolution kernel of the convolution module 203 is a 3x3 convolution kernel. It should be understood that this is only a specific example in the present embodiment; in other embodiments of the present application, the convolution kernel of the convolution module 203 may take other sizes, such as 4x4 or 5x5, which are not listed here and may be set flexibly according to actual needs.
In the following, the operation process of the backbone module in this embodiment is specifically illustrated, and it will be understood that the number of convolution kernels and the size of the output feature map described below are only one specific illustration in this embodiment, and are not limited thereto. In this embodiment, the picture to be detected with the size of 1280x720 is input into the backbone module, the number of convolution kernels of the convolution module 203 included in the first submodule 2011 is 128, the feature map output size is 1280x720, the number of convolution kernels of the convolution module 203 included in the second submodule 2012 is 256, the feature map output size is 640x360, the number of convolution kernels of the convolution module 203 included in the third submodule 2013 is 512, the feature map output size is 320x180, and the number of convolution kernels of the convolution module 203 included in the fourth submodule 2014 is 1024, and the feature map output size is 160x90.
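The stage configuration in the example above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the channel counts and spatial sizes are taken from the text, while the stride-2 downsampling between successive sub-modules is an assumption consistent with the halving of the feature-map size from one stage to the next.

```python
def backbone_output_sizes(width, height, stage_channels=(128, 256, 512, 1024)):
    """Return (channels, width, height) for each sub-module's output.

    The first sub-module keeps the input resolution; each later
    sub-module halves the width and height (assumed stride-2 downsampling).
    """
    sizes = []
    w, h = width, height
    for i, channels in enumerate(stage_channels):
        if i > 0:
            w, h = w // 2, h // 2
        sizes.append((channels, w, h))
    return sizes

# For a 1280x720 input this reproduces the sizes given in the text:
# 128@1280x720, 256@640x360, 512@320x180, 1024@160x90.
print(backbone_output_sizes(1280, 720))
```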
Preferably, in this embodiment, as shown in fig. 3, the convolution module 203 includes a feature mapping submodule 2031, a convolution calculation submodule 2032, a batch norm operator submodule 2033, a linear rectification calculation submodule 2034, and a feature mapping submodule 2035, which are connected in sequence.
Note that, in fig. 2, the horizontally extending arrow indicates a 3x3 convolution calculation with a step size of 1, the same as in the convolution module 203; the top-down arrow indicates the downsampling process and the bottom-up arrow indicates the upsampling process. Both the upsampling and the downsampling are convolution calculation processes whose step sizes differ from that of the convolution module 203; for example, they may be convolution calculation processes with a step size of 2 or a step size of 4.
Further, in the present embodiment, as shown in fig. 2, the multi-level detection module 202 includes a position detection sub-module 2021 and a category detection sub-module 2022. The position detection sub-module 2021 is configured to perform position detection on an input feature map, and the category detection sub-module 2022 is configured to perform category detection on the input feature map.
In this embodiment, the feature pyramid module 204 is further included. The feature pyramid module 204 is connected to the fourth sub-module 2014, the position detection sub-module 2021, and the category detection sub-module 2022, respectively. The specific structure of the feature pyramid module 204 is shown in fig. 4, and is used for combining the feature information output by the fourth submodule 2014 to form a feature map, and transmitting the feature map to the position detection submodule 2021 and the category detection submodule 2022.
Specifically, in the present embodiment, the multi-stage detection module 202 further includes a combination module connected to the position detection sub-module 2021 and the category detection sub-module 2022. The combination module combines the detection results of the position detection sub-module 2021 with those of the category detection sub-module 2022 to determine the location of the target detection point. In the present embodiment, the combination module performs a combination calculation over a preset dimension on the two sets of detection results and takes the combined result as the target detection point. For example, the detection result of the position detection sub-module 2021 has dimensions 5x5x24 before combination; since position detection uses uniform 4-dimensional position coordinates, 5x5x24 is reshaped to 150x4. Similarly, the detection result of the category detection sub-module 2022 has dimensions 5x5x18 before combination; since category detection uses a uniform 3-dimensional class count, 5x5x18 is reshaped to 150x3. Combination is then performed along the common dimension of 150. It should be understood that the foregoing is merely a specific example of the present embodiment and is not limiting; other combinations may be adopted in other embodiments of the present application and may be set flexibly according to actual needs.
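The reshaping and pairing along the common dimension can be sketched in pure Python. The flat-list layout and the function name are illustrative assumptions; what the sketch shows is only the arithmetic: 5x5x24 values regroup into 150 four-dimensional coordinates, 5x5x18 values regroup into 150 three-dimensional class scores, and the two are paired along the shared dimension of 150.

```python
def combine_detections(loc, cls, h, w, loc_depth=24, cls_depth=18,
                       coord_dim=4, num_classes=3):
    """Pair position and category outputs along their common dimension.

    loc: flat list of h*w*loc_depth position values (e.g. 5x5x24 = 600)
    cls: flat list of h*w*cls_depth class scores (e.g. 5x5x18 = 450)
    Returns one entry per candidate: coord_dim coords + num_classes scores.
    """
    n_loc = h * w * loc_depth // coord_dim      # 600 // 4 = 150
    n_cls = h * w * cls_depth // num_classes    # 450 // 3 = 150
    assert n_loc == n_cls, "position and category results must share a dimension"
    boxes = [loc[i * coord_dim:(i + 1) * coord_dim] for i in range(n_loc)]
    scores = [cls[i * num_classes:(i + 1) * num_classes] for i in range(n_cls)]
    return [b + s for b, s in zip(boxes, scores)]
```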
Specifically, the detection model is a learning model whose training has been completed. The training process is as follows; in the present embodiment, football match pictures are taken as an example.
First, a plurality of football match pictures are acquired; for example, multiple frames may be extracted from a football match video. Then, the target areas in the football match pictures are labeled with anchor frame labels; for example, the football and the ball carrier may be labeled by manual annotation or other methods, forming a plurality of labeled football match pictures as training samples. Finally, the training samples are input into the detection model: the backbone module first acquires the feature maps of each preset dimension of the training sample, then the position detection sub-module performs position detection on the feature maps, and the category detection sub-module performs category detection on them. A loss function is computed from the position detection result and the category detection result together with the labels of the training samples. Specifically, the position detection sub-module presets anchor frames on the feature map, generating a corresponding anchor frame for each coordinate point on the feature map. Each anchor frame is compared with the anchor frame label, and the loss function is calculated.
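The step of generating one anchor frame per feature-map coordinate point can be sketched as follows. The single fixed anchor size per cell and the stride-based mapping from feature-map coordinates back to picture coordinates are assumptions, since the patent does not specify these details.

```python
def generate_anchors(feat_w, feat_h, stride, box_w, box_h):
    """One anchor frame per feature-map coordinate, centered on that cell.

    stride maps feature-map coordinates back to picture coordinates
    (assumed); box_w/box_h are the preset anchor size (assumed).
    Returns (cx, cy, w, h) tuples in picture coordinates.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx = (x + 0.5) * stride
            cy = (y + 0.5) * stride
            anchors.append((cx, cy, box_w, box_h))
    return anchors
```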
The loss function calculation formula of the position detection result is as follows:
L_loc = (1 / n_positives) * Σ_(i ∈ positives) [ α1 * SmoothL1(SizeLoss_i) + α2 * SmoothL1(SizeCenter_i) ]
wherein L_loc is the position detection loss, n_positives is the number of positives, and positives are the anchor frames whose IoU with the anchor frame label is greater than 0.5; the SizeLoss input is the last two of the four combined and transformed coordinates in the detection module, namely the width and height of the target area, and the SizeCenter input is the first two of those coordinates, namely the coordinates of the target detection point; SmoothL1 denotes the smooth L1 loss function, defined as follows:
SmoothL1(x) = 0.5 * x², if |x| < 1; |x| - 0.5, otherwise.
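As a small self-contained sketch in pure Python, the piecewise definition above (applied to a scalar input) reads:

```python
def smooth_l1(x):
    """Smooth L1 loss: 0.5*x**2 for |x| < 1, |x| - 0.5 otherwise."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5
```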
IoU is calculated as IoU = (A ∩ B) / (A ∪ B), where A and B are the anchor frame and the anchor frame label, respectively.
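The intersection-over-union between two axis-aligned boxes can be computed as below. The (x1, y1, x2, y2) corner representation is an assumption for illustration, since the patent does not fix a box format.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # intersection area
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```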
In the present embodiment, since the validity of the center point coordinate parameter of the target detection point is greater than that of the size parameter of the target area, different weights are assigned, with α2 > α1; that is, the weight of the center-point position offset is greater than the weight of the predicted size offset.
The loss function calculation formula of the category detection result is as follows:
L_conf = (1 / n_positives) * Σ_(i ∈ positives ∪ hard negatives) CELoss_i
wherein L_conf is the loss of the category detection result, CELoss is the cross-entropy loss function, and the number of hard negatives is a fixed multiple of the number of positives. The hard negatives are the negative matches (IoU < 0.5) whose anchor frame predictions have the largest cross-entropy losses.
The overall loss function is L = L_loc + β * L_conf, wherein β is a preset constant.
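Putting the loss terms above together gives the sketch below. The per-coordinate application of SmoothL1 and the exact placement of α1 and α2 are assumptions consistent with the description (α2, the center-point weight, is larger than α1); the smooth L1 function is reproduced so the block is self-contained.

```python
def smooth_l1(x):
    """Smooth L1 loss: 0.5*x**2 for |x| < 1, |x| - 0.5 otherwise."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def position_loss(center_offsets, size_offsets, alpha1, alpha2):
    """L_loc over matched anchors: alpha2 (> alpha1) weights the
    center-point term, alpha1 the width/height term (assumed layout)."""
    n = len(center_offsets)
    total = sum(alpha2 * (smooth_l1(dx) + smooth_l1(dy))
                + alpha1 * (smooth_l1(dw) + smooth_l1(dh))
                for (dx, dy), (dw, dh) in zip(center_offsets, size_offsets))
    return total / n

def total_loss(l_loc, l_conf, beta):
    """Overall loss L = L_loc + beta * L_conf."""
    return l_loc + beta * l_conf
```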
After the loss function is obtained, the model is trained through a back propagation algorithm, continuously reducing the value of the loss function until it falls below a preset threshold.
Step S102: and acquiring a target area from the picture to be detected according to the target detection point.
Specifically, in this embodiment, after the target detection point is obtained, the picture to be detected is cropped using the target detection point as the center point and a preset size as the side length, the preset size being settable according to actual requirements, so as to obtain the target area.
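A minimal sketch of this cropping step is shown below. Clamping the box so it stays inside the picture is an assumption for boundary cases, since the patent does not say how crops near the picture edge are handled.

```python
def crop_box(cx, cy, side, img_w, img_h):
    """Square crop of side length `side` centered on the target
    detection point (cx, cy), clamped to stay inside the picture
    (clamping behavior is an assumption). Returns (x1, y1, x2, y2)."""
    x1 = min(max(cx - side // 2, 0), img_w - side)
    y1 = min(max(cy - side // 2, 0), img_h - side)
    return (x1, y1, x1 + side, y1 + side)
```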
Compared with the prior art, in the method for acquiring the target area in the picture provided by the first embodiment of the present application, the established detection model comprises two parts, a backbone module and a multi-stage detection module. After the backbone module acquires the feature maps of the picture to be detected in a plurality of preset dimensions, these feature maps are respectively input into the multi-stage detection module, which performs position detection and category detection on each of them and then combines the detection results of the position detection with those of the category detection to determine the position corresponding to the target detection point. Training the detection model on data improves the accuracy of the target detection point it outputs; after the picture to be detected is input into the trained detection model, the model outputs the target detection point, and once its position is determined, the range of the target area in the picture to be detected can be determined from it. Accurate identification of the target area makes it easier to crop the picture to be detected precisely; action recognition can then be performed on the cropped picture, reducing the loss of action details while also reducing the amount of calculation required for action recognition.
The second embodiment of the application relates to a method for acquiring a target area in a picture. The second embodiment is substantially the same as the first embodiment, and differs mainly in that: the second embodiment is a cropping process applied to a soccer game picture. The specific steps are shown in fig. 5, including:
step S201: a plurality of football match pictures are acquired.
Specifically, in this embodiment, a football match video may be obtained; and extracting a plurality of picture frames from the football match video to obtain a plurality of football match pictures. It should be understood that the foregoing is merely a specific illustration of capturing a plurality of football match pictures in the present embodiment, and is not meant to be limiting.
Step S202: and inputting a plurality of football match images into a far-middle and near-view judging model to obtain near-view football match pictures, middle-view football match pictures and far-view football match pictures.
Specifically, in this embodiment, the far, middle and near view judgment model is a ResNet34 network. It should be understood that the ResNet34 network is merely a specific example in this embodiment and is not limiting; in other embodiments of the present application, other network structures may be used, which are not listed here and may be set flexibly according to actual needs.
Step S203: cutting the close-range football match pictures according to a preset size to obtain a target area.
Step S204: inputting the middle-view football match pictures and the distant-view football match pictures into a detection model after training is completed, and obtaining target detection points output by the detection model.
Step S205: and acquiring a target area from the middle-view football match picture and the distant-view football match picture according to the target detection point.
It should be understood that the steps S204 to S205 are substantially the same as the steps S101 to S102 in the first embodiment, and specific reference may be made to the specific description of the first embodiment, which is not repeated herein.
Compared with the prior art, the method for acquiring the target area in the picture provided by the second embodiment of the present application retains the technical effect of the first embodiment while using the far, middle and near view judgment model to separate close-range, middle-view and distant-view football match pictures. For the close-range football match pictures, since the action details are already clear, the target area can be obtained by directly cropping to the preset size, or cropping can be skipped entirely, which effectively reduces the amount of calculation.
The above division of the methods into steps is for clarity of description; when implemented, steps may be combined into one step or split into multiple steps, and as long as the same logical relationship is retained, such variations are within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without altering the core design of the algorithm and flow, is also within the scope of this patent.
A third embodiment of the present application relates to an object detection apparatus, as shown in fig. 6, including: at least one processor 601; and a memory 602 communicatively coupled to the at least one processor 601; the memory 602 stores instructions executable by the at least one processor 601, the instructions being executed by the at least one processor 601 to enable the at least one processor 601 to perform the method for acquiring a target area in a picture described above.
The memory 602 and the processor 601 are connected by a bus, which may comprise any number of interconnected buses and bridges connecting the various circuits of the one or more processors 601 and the memory 602. The bus may also connect various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. Data processed by the processor 601 is transmitted over a wireless medium via an antenna, which further receives data and transmits it to the processor 601.
The processor 601 is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory 602 may be used to store data used by processor 601 in performing operations.
A fourth embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program implements the above-described method embodiments when executed by a processor.
That is, it will be understood by those skilled in the art that all or part of the steps of the methods in the embodiments described above may be implemented by a program stored in a storage medium, the program including several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods of the embodiments of the application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the application and that various changes in form and details may be made therein without departing from the spirit and scope of the application.
Claims (7)
1. A method for acquiring a target area in a picture, characterized by comprising the following steps:
inputting the picture to be detected into a detection model, and obtaining a target detection point in the picture to be detected; the detection model comprises a backbone module and a multi-stage detection module connected with the backbone module; the backbone module is used for acquiring feature graphs of the picture to be detected in each preset dimension; the multi-stage detection module is used for carrying out position detection and category detection on the feature map, and determining the target detection point according to the detection result of the position detection and the detection result of the category detection;
acquiring a target area from the picture to be detected according to the target detection point;
the backbone module comprises a first sub-module, a second sub-module, a third sub-module and a fourth sub-module which are sequentially connected; the first sub-module, the second sub-module, the third sub-module and the fourth sub-module each comprise a plurality of convolution modules with the same convolution kernel, and the numbers of convolution kernels of the first sub-module, the second sub-module, the third sub-module and the fourth sub-module increase in sequence; the first sub-module, the second sub-module, the third sub-module and the fourth sub-module are respectively used for outputting the feature maps of the corresponding preset dimensions; each of the convolution modules with the same convolution kernel comprises a feature mapping sub-module, a convolution computing sub-module, a batch normalization computing sub-module, a linear rectification computing sub-module and a feature mapping sub-module which are connected in sequence; the detection model further comprises a feature pyramid network connecting the fourth sub-module and the multi-stage detection module; and the feature pyramid network is used for combining the output results of the fourth sub-module and inputting the combined output results into the multi-stage detection module.
2. The method for obtaining a target area in a picture according to claim 1, wherein the detection result of the position detection includes a center point coordinate parameter of the target area and a size parameter of the target area;
and constructing a loss function of the detection model according to the central point coordinate parameter and the size parameter, wherein the weight of the central point coordinate parameter in the loss function is larger than the weight of the size parameter, and the weight of the central point coordinate parameter and the weight of the size parameter are preset constants.
3. The method for acquiring the target area in the picture according to claim 1, wherein the determining the target detection point according to the detection result of the position detection and the detection result of the category detection specifically includes:
and combining the detection result of the position detection and the detection result of the category detection according to a preset dimension to obtain the target detection point.
4. The method for obtaining a target area in a picture according to claim 1, wherein the obtaining the target area from the picture to be detected according to the target detection point specifically includes:
and cutting the picture to be detected by taking the target detection point as a center point and the preset size as a side length to obtain the target area.
5. The method for obtaining a target area in a picture according to claim 1, wherein before inputting the picture to be detected into the detection model, the method further comprises:
acquiring a plurality of football match pictures;
inputting a plurality of football match pictures into a far-middle and near-view judging model to obtain near-view football match pictures, middle-view football match pictures and far-view football match pictures;
and taking the middle-view football match picture and the distant-view football match picture as the pictures to be detected.
6. An object detection apparatus, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for acquiring a target area in a picture according to any one of claims 1 to 5.
7. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of acquiring a target area in a picture as claimed in any one of claims 1 to 5.
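The weighted loss construction of claim 2 — preset constant weights in which the center-point coordinate parameters weigh more than the size parameters — can be illustrated with a minimal sketch. The L1 form, the (cx, cy, w, h) parameter layout, and the example weight values are assumptions made for illustration; the patent only specifies that the center-point weight exceeds the size weight and that both are preset constants.

```python
def detection_loss(pred, target, w_center=2.0, w_size=1.0):
    """Weighted regression loss in the spirit of claim 2.

    pred and target are (cx, cy, w, h) tuples: center-point coordinates
    followed by the size of the target area. The center-point terms carry
    a larger preset constant weight than the size terms (w_center > w_size),
    biasing training toward accurate localization of the detection point.
    """
    cx_p, cy_p, w_p, h_p = pred
    cx_t, cy_t, w_t, h_t = target
    center_term = abs(cx_p - cx_t) + abs(cy_p - cy_t)
    size_term = abs(w_p - w_t) + abs(h_p - h_t)
    return w_center * center_term + w_size * size_term
```

With these example weights, a 2-pixel center error costs twice as much as a 2-pixel size error, reflecting that the later cropping step depends chiefly on the center point.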
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010258207.4A CN111523403B (en) | 2020-04-03 | 2020-04-03 | Method and device for acquiring target area in picture and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111523403A CN111523403A (en) | 2020-08-11 |
CN111523403B true CN111523403B (en) | 2023-10-20 |
Family
ID=71901943
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010258207.4A Active CN111523403B (en) | 2020-04-03 | 2020-04-03 | Method and device for acquiring target area in picture and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111523403B (en) |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102541494A (en) * | 2010-12-30 | 2012-07-04 | 中国科学院声学研究所 | Video size switching system and video size switching method facing display terminal |
CN104091171A (en) * | 2014-07-04 | 2014-10-08 | 华南理工大学 | Vehicle-mounted far infrared pedestrian detection system and method based on local features |
CN107392244A (en) * | 2017-07-18 | 2017-11-24 | 厦门大学 | The image aesthetic feeling Enhancement Method returned based on deep neural network with cascade |
CN108108657A (en) * | 2017-11-16 | 2018-06-01 | 浙江工业大学 | A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning |
WO2019041360A1 (en) * | 2017-09-04 | 2019-03-07 | 华为技术有限公司 | Pedestrian attribute recognition and positioning method and convolutional neural network system |
CN109492608A (en) * | 2018-11-27 | 2019-03-19 | 腾讯科技(深圳)有限公司 | Image partition method, device, computer equipment and storage medium |
CN109886273A (en) * | 2019-02-26 | 2019-06-14 | 四川大学华西医院 | A kind of CMR classification of image segmentation system |
CN109919097A (en) * | 2019-03-08 | 2019-06-21 | 中国科学院自动化研究所 | Face and key point combined detection system, method based on multi-task learning |
CN110069993A (en) * | 2019-03-19 | 2019-07-30 | 同济大学 | A kind of target vehicle detection method based on deep learning |
CN110309876A (en) * | 2019-06-28 | 2019-10-08 | 腾讯科技(深圳)有限公司 | Object detection method, device, computer readable storage medium and computer equipment |
CN110363104A (en) * | 2019-06-24 | 2019-10-22 | 中国科学技术大学 | A kind of detection method of diesel oil black smoke vehicle |
CN110414574A (en) * | 2019-07-10 | 2019-11-05 | 厦门美图之家科技有限公司 | A kind of object detection method calculates equipment and storage medium |
CN110503112A (en) * | 2019-08-27 | 2019-11-26 | 电子科技大学 | A kind of small target deteection of Enhanced feature study and recognition methods |
CN110729045A (en) * | 2019-10-12 | 2020-01-24 | 闽江学院 | Tongue image segmentation method based on context-aware residual error network |
WO2020024584A1 (en) * | 2018-08-03 | 2020-02-06 | 华为技术有限公司 | Method, device and apparatus for training object detection model |
CN110781728A (en) * | 2019-09-16 | 2020-02-11 | 北京嘀嘀无限科技发展有限公司 | Face orientation estimation method and device, electronic equipment and storage medium |
WO2020032354A1 (en) * | 2018-08-06 | 2020-02-13 | Samsung Electronics Co., Ltd. | Method, storage medium and apparatus for converting 2d picture set to 3d model |
CN110852177A (en) * | 2019-10-17 | 2020-02-28 | 北京全路通信信号研究设计院集团有限公司 | Obstacle detection method and system based on monocular camera |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108694401B (en) * | 2018-05-09 | 2021-01-12 | 北京旷视科技有限公司 | Target detection method, device and system |
CN109272530B (en) * | 2018-08-08 | 2020-07-21 | 北京航空航天大学 | Target tracking method and device for space-based monitoring scene |
Non-Patent Citations (5)
Title |
---|
Adult Image and Video Recognition by a Deep Multicontext Network and Fine-to-Coarse Strategy;Ou X 等;ACM Transactions on Intelligent Systems and Technology (TIST);第8卷(第5期);1-5 * |
Deep fully-connected networks for video compressive sensing;Iliadis M 等;Digital Signal Processing;9-18 * |
Research on Road Vehicle Detection Algorithms Based on Faster RCNN; Liu Dunqiang; China Master's Theses Full-text Database, Engineering Science and Technology II (No. 02, 2019); C034-519 *
Research on Object Detection Methods Based on SSD and MobileNet Networks; Ren Yujie et al.; Journal of Frontiers of Computer Science and Technology; Vol. 13 (No. 11); 1881-1893 *
Research Progress on Video Saliency Detection; Cong Runmin et al.; Journal of Software; Vol. 29 (No. 08); 2527-2544 *
Also Published As
Publication number | Publication date |
---|---|
CN111523403A (en) | 2020-08-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108229509B (en) | Method and device for identifying object class and electronic equipment | |
CN110363817B (en) | Target pose estimation method, electronic device, and medium | |
WO2022116423A1 (en) | Object posture estimation method and apparatus, and electronic device and computer storage medium | |
CN114186632B (en) | Method, device, equipment and storage medium for training key point detection model | |
CN109858476B (en) | Tag expansion method and electronic equipment | |
CN108573471B (en) | Image processing apparatus, image processing method, and recording medium | |
CN112509036B (en) | Pose estimation network training and positioning method, device, equipment and storage medium | |
CN114519881A (en) | Face pose estimation method and device, electronic equipment and storage medium | |
CN110910375A (en) | Detection model training method, device, equipment and medium based on semi-supervised learning | |
CN114092963A (en) | Key point detection and model training method, device, equipment and storage medium | |
CN114359665A (en) | Training method and device of full-task face recognition model and face recognition method | |
CN110276801B (en) | Object positioning method and device and storage medium | |
CN111177811A (en) | Automatic fire point location layout method applied to cloud platform | |
CN115965961B (en) | Local-global multi-mode fusion method, system, equipment and storage medium | |
CN111523403B (en) | Method and device for acquiring target area in picture and computer readable storage medium | |
CN115131621A (en) | Image quality evaluation method and device | |
CN110633630B (en) | Behavior identification method and device and terminal equipment | |
CN116069801B (en) | Traffic video structured data generation method, device and medium | |
WO2023109086A1 (en) | Character recognition method, apparatus and device, and storage medium | |
CN113721240B (en) | Target association method, device, electronic equipment and storage medium | |
CN113033578B (en) | Image calibration method, system, terminal and medium based on multi-scale feature matching | |
CN113705643A (en) | Target detection method and device and electronic equipment | |
CN113792660B (en) | Pedestrian detection method, system, medium and equipment based on improved YOLOv3 network | |
CN113139579B (en) | Image classification method and system based on image feature self-adaptive convolution network | |
CN116152345B (en) | Real-time object 6D pose and distance estimation method for embedded system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||