CN113378857A - Target detection method and device, electronic equipment and storage medium - Google Patents

Target detection method and device, electronic equipment and storage medium

Info

Publication number
CN113378857A
CN113378857A
Authority
CN
China
Prior art keywords
detection, module, detection frames, processing, image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110721556.XA
Other languages
Chinese (zh)
Inventor
叶锦
谭啸
孙昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110721556.XA
Publication of CN113378857A
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a target detection method, a target detection device, an electronic device and a storage medium, and relates to the technical fields of computer vision and deep learning. The specific implementation scheme is as follows: feature extraction is performed on an image to obtain a plurality of feature maps; the feature maps are classified to obtain a plurality of heat maps; target detection processing is performed according to the heat maps to obtain a plurality of first detection frames; and the first detection frames are screened to obtain a plurality of second detection frames, which are used to identify target objects of different classes in the image. The method and device improve the target detection speed.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular to computer vision and deep learning technology, and is applicable in particular to smart city and intelligent traffic scenarios.
Background
With the development of technology and improvements in hardware performance, artificial intelligence is applicable in a wide variety of scenarios, such as image processing, video processing, face recognition, target positioning and target detection. In application scenarios that require real-time performance in particular, the requirement on detection speed is increasingly high, and how to improve the detection speed as much as possible while ensuring detection accuracy is an urgent problem to be solved. The related art offers no effective solution to this problem.
Disclosure of Invention
The disclosure provides a target detection method, a target detection device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided a target detection method, including:
performing feature extraction on an image to obtain a plurality of feature maps;
classifying the plurality of feature maps to obtain a plurality of heat maps;
performing target detection processing according to the plurality of heat maps to obtain a plurality of first detection frames; and
screening the plurality of first detection frames to obtain a plurality of second detection frames, wherein the plurality of second detection frames are used for identifying target objects of different classes in the image.
According to another aspect of the present disclosure, there is provided a target detection apparatus, including:
a first processing module, configured to perform feature extraction on an image to obtain a plurality of feature maps;
a second processing module, configured to classify the plurality of feature maps to obtain a plurality of heat maps;
a third processing module, configured to perform target detection processing according to the plurality of heat maps to obtain a plurality of first detection frames; and
a fourth processing module, configured to screen the plurality of first detection frames to obtain a plurality of second detection frames, wherein the plurality of second detection frames are used for identifying target objects of different classes in the image.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method provided by any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method provided by any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement the method provided by any one of the embodiments of the present disclosure.
With the method and device of the present disclosure, feature extraction is performed on an image to obtain a plurality of feature maps; the feature maps are classified to obtain a plurality of heat maps; target detection processing is performed according to the heat maps to obtain a plurality of first detection frames; and the first detection frames are screened to obtain a plurality of second detection frames, which are used to identify target objects of different classes in the image. The target detection speed is thereby improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow diagram of a target detection method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an application scenario of a target detection method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a target detection method in an application example according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the structure of the target detection device according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing the target detection method of the embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. The term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B and C" may mean including any one or more elements selected from the set formed by A, B and C. The terms "first" and "second" are used to distinguish between similar objects and do not necessarily imply a particular sequence or order, nor do they limit the number of objects to two; a "first" or "second" object may be one or more.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
With the development of technology, artificial intelligence technology with neural networks at its core is widely applied in computer vision scenarios such as face recognition, image classification, optical character recognition (OCR), object detection, image segmentation, object tracking, event detection, unmanned driving, and smart city and intelligent traffic scenarios.
Taking an intelligent traffic scene as an example, target detection in such a scene can be realized based on computer vision and artificial intelligence technology, using anchors or pixel-wise traversal. For example, Faster R-CNN (Faster Region-based Convolutional Neural Network) performs target detection using predefined boxes such as anchors together with a region proposal network (RPN) and region-of-interest pooling (ROI pooling); its detection speed is slow but its accuracy is high. YOLO (You Only Look Once) performs target detection by directly regressing the position and class of a box from pixel points, and can identify objects in an image and their positions. SSD (Single Shot MultiBox Detector), a one-stage method, performs target detection by scanning a feature pyramid pixel by pixel; its detection speed is fast but its accuracy is low.
In summary, target detection methods in the related art, whether one-stage (one-stage detection generally needs no predefined anchors and is fast but less accurate) or two-stage (two-stage detection generally needs a large amount of predefined anchor data and is accurate but slow), either require a large amount of anchor data to be preset or require traversal over pixel points, and cannot achieve both detection accuracy and detection speed.
To solve these problems, the method of the present disclosure needs neither a large amount of preset anchor data nor traversal over pixel points; instead, multi-class target detection is performed mainly based on heat maps, so the detection speed can be improved as much as possible while detection accuracy is guaranteed. The method is particularly suited to detection scenarios with high real-time requirements, such as intelligent traffic and automatic driving, where it can meet real-time detection requirements while maintaining high detection accuracy.
According to an embodiment of the present disclosure, a target detection method is provided. FIG. 1 is a flowchart of a target detection method according to an embodiment of the present disclosure. The method may be applied to a target detection apparatus; for example, the apparatus may be deployed in a terminal, a server or another processing device, and may perform image processing, video processing, and the like. The terminal may be a user equipment (UE), a mobile device, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, and so on. In some possible implementations, the method may also be implemented by a processor calling computer-readable instructions stored in a memory. As shown in FIG. 1, the method includes:
and S101, extracting the features of the image to obtain a plurality of feature maps.
And S102, classifying the plurality of feature maps to obtain a plurality of thermodynamic diagrams.
And S103, carrying out target detection processing according to the thermodynamic diagrams to obtain a plurality of first detection frames.
S104, screening the plurality of first detection frames to obtain a plurality of second detection frames, wherein the plurality of second detection frames are used for identifying target objects of different classifications in the image.
In an example of S101-S104, a convolutional neural network (CNN) may be used to perform feature extraction on the input image; the resulting feature maps are classified to obtain a plurality of heat maps; a connected-component operation is performed on the heat maps to implement the target detection processing and obtain a plurality of first detection frames; and the first detection frames are screened, including regression and classification, to obtain a plurality of second detection frames. A minimal sketch of how these steps could be wired together is given below.
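The following sketch (in PyTorch) only shows the composition of the four steps; the class name HeatmapDetector and the sub-module interfaces are illustrative assumptions rather than the disclosure's exact design, and each stage is sketched in more detail in the embodiments below.

```python
import torch
from torch import nn


class HeatmapDetector(nn.Module):
    """Wires steps S101-S104 together; each stage is an injected sub-module."""

    def __init__(self, backbone, heatmap_head, box_extractor, screening_head):
        super().__init__()
        self.backbone = backbone              # S101: image -> C x H x W feature maps
        self.heatmap_head = heatmap_head      # S102: feature maps -> N x H x W heat maps
        self.box_extractor = box_extractor    # S103: heat maps -> first detection frames
        self.screening_head = screening_head  # S104: regression + classification screening

    def forward(self, image: torch.Tensor):
        features = self.backbone(image)                    # (B, C, H, W)
        heatmaps = self.heatmap_head(features)             # (B, N, H, W)
        first_boxes = self.box_extractor(heatmaps)         # per-class first detection frames
        second_boxes = self.screening_head(features, first_boxes)
        return second_boxes                                # screened second detection frames
```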
With the method and device of the present disclosure, feature extraction is performed on an image to obtain a plurality of feature maps; the feature maps are classified to obtain a plurality of heat maps; target detection processing is performed according to the heat maps to obtain a plurality of first detection frames; and the first detection frames are screened to obtain a plurality of second detection frames, which are used to identify target objects of different classes in the image. The target detection speed is thereby improved.
In one embodiment, performing feature extraction on an image to obtain a plurality of feature maps includes: inputting the image into a feature extraction module; and performing the feature extraction in the feature extraction module and outputting the plurality of feature maps. For example, the feature extraction module uses a CNN to perform feature extraction on the input image and outputs a C×H×W feature, where C is the number of channels (categories) of the feature maps, H is their height, and W is their width. With this embodiment, image features can be extracted accurately by the feature extraction module, which improves the detection precision. A sketch of such a backbone is given below.
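A minimal sketch of this feature extraction step, assuming a torchvision ResNet-18 backbone (the disclosure only requires a CNN; ResNet and MobileNet are named later as examples). The input resolution and the choice to truncate the network after its last residual stage are illustrative.

```python
import torch
import torchvision

# Drop the global average pooling and fully connected layers so the spatial
# C x H x W structure of the features is preserved.
resnet = torchvision.models.resnet18(weights=None)
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])

image = torch.randn(1, 3, 512, 512)   # dummy RGB image, batch size 1
features = backbone(image)            # shape (1, 512, 16, 16): C=512, H=W=16
print(features.shape)
```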
In one embodiment, classifying the plurality of feature maps to obtain a plurality of heat maps includes: inputting the plurality of feature maps into a classification module; and performing the classification in the classification module and outputting the plurality of heat maps, which characterize a plurality of classes of image features. For example, the classification module classifies the C×H×W features output by the CNN to generate a plurality of heat maps, one per class, characterizing different image features. With this embodiment, image features can be classified accurately by the classification module, which improves the detection precision. A sketch of such a classification head is given below.
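A minimal sketch of such a classification module, assuming a 1×1 convolution followed by a sigmoid as the classifier head; this head design and the number of classes are illustrative assumptions, since the disclosure only states that the feature maps are classified into per-class heat maps.

```python
import torch
from torch import nn

num_classes = 3   # e.g. vehicle, lane line, roadside object (illustrative)

# 1x1 convolution + sigmoid: one H x W probability map per class.
heatmap_head = nn.Sequential(
    nn.Conv2d(512, num_classes, kernel_size=1),
    nn.Sigmoid(),
)

features = torch.randn(1, 512, 16, 16)   # stand-in for the C x H x W backbone output
heatmaps = heatmap_head(features)        # shape (1, num_classes, 16, 16): one heat map per class
```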
In one embodiment, performing target detection processing according to the heat maps to obtain a plurality of first detection frames includes: inputting the plurality of heat maps into a connected-component module; and, in the connected-component module, performing a connected-component operation on each heat map to implement the target detection processing and outputting the plurality of first detection frames. For example, the connected-component module performs connected-component computation on the regions of each heat map to obtain a plurality of first detection frames; in intelligent traffic, the detection frames may include frames for vehicles, lane lines, objects around roads, and the like. With this embodiment, accurate detection of target objects can be realized by the connected-component module, which improves the detection precision and, correspondingly, the detection speed. A sketch of turning heat maps into first detection frames is given below.
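A minimal sketch of the connected-component step, using SciPy's connected-component labeling as a stand-in for the connected-component module; the score threshold is an illustrative assumption. Each connected region of a thresholded class heat map yields one first detection frame.

```python
import numpy as np
from scipy import ndimage


def heatmaps_to_boxes(heatmaps: np.ndarray, score_thresh: float = 0.5):
    """heatmaps: (N, H, W) per-class scores -> list of (class_id, x1, y1, x2, y2)."""
    boxes = []
    for class_id, heatmap in enumerate(heatmaps):
        mask = heatmap > score_thresh                  # keep only confident pixels
        labeled, _ = ndimage.label(mask)               # connected-component labeling
        for region in ndimage.find_objects(labeled):   # one bounding slice per component
            if region is None:
                continue
            ys, xs = region
            boxes.append((class_id, xs.start, ys.start, xs.stop - 1, ys.stop - 1))
    return boxes


# Example: one 8x8 heat map with a single hot region -> one first detection frame.
heatmap = np.zeros((1, 8, 8), dtype=np.float32)
heatmap[0, 2:5, 3:6] = 0.9
print(heatmaps_to_boxes(heatmap))   # [(0, 3, 2, 5, 4)]
```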
In one embodiment, screening the plurality of first detection frames to obtain a plurality of second detection frames, the plurality of second detection frames being used for identifying target objects of different classes in the image, includes: inputting the plurality of first detection frames into a screening module comprising a regression branch and a classification branch; and, in the screening module, regressing the coordinates of each first detection frame based on the position of the detection frame of the corresponding class via the regression branch, extracting the features at the position corresponding to each first detection frame and classifying them via the classification branch, and outputting the plurality of second detection frames. The regression branch obtains a more accurate detection frame position; for example, a regressor may directly regress the coordinates of the first detection frame of the corresponding class. The classification branch performs a more accurate classification; for example, a region-of-interest alignment (ROI Align) algorithm may extract the features at the position corresponding to the first detection frame, which are then fed to a binary classifier that determines whether the class is actually present. With this embodiment, the screening of the first detection frames is realized by the regression branch and the classification branch, which improves the detection precision and, correspondingly, the detection speed. A sketch of such a screening module is given below.
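A minimal sketch of such a screening module, assuming a stride-32 backbone (hence the spatial_scale), illustrative layer sizes, and an illustrative keep threshold; only the overall two-branch structure (a coordinate regressor plus ROI-Align features fed to a binary classifier) is taken from the text.

```python
import torch
from torch import nn
from torchvision.ops import roi_align


class ScreeningHead(nn.Module):
    """Regression branch refines box coordinates; classification branch keeps or drops each box."""

    def __init__(self, in_channels: int = 512, roi_size: int = 7):
        super().__init__()
        flat_dim = in_channels * roi_size * roi_size
        self.regressor = nn.Sequential(       # regression branch: coordinate refinement
            nn.Linear(flat_dim, 256), nn.ReLU(), nn.Linear(256, 4),
        )
        self.classifier = nn.Sequential(      # classification branch: "is the class present?"
            nn.Linear(flat_dim, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid(),
        )
        self.roi_size = roi_size

    def forward(self, features, boxes, spatial_scale=1.0 / 32, keep_thresh=0.5):
        # features: (1, C, H, W) backbone output; boxes: (K, 4) first detection frames
        # in image coordinates (x1, y1, x2, y2).
        rois = torch.cat([torch.zeros(len(boxes), 1), boxes.float()], dim=1)  # prepend batch index
        roi_feats = roi_align(features, rois, output_size=self.roi_size,
                              spatial_scale=spatial_scale, aligned=True)
        flat = roi_feats.flatten(start_dim=1)
        refined = boxes.float() + self.regressor(flat)       # regressed coordinates
        keep = self.classifier(flat).squeeze(1) > keep_thresh
        return refined[keep]                                 # second detection frames
```

In practice, the regression branch would be trained to predict offsets relative to the first detection frames, and the binary classifier trained with labels indicating whether the class is truly present in each frame.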
According to an embodiment of the present disclosure, a target detection method is provided. FIG. 2 is a schematic diagram of an application scenario of the target detection method according to an embodiment of the present disclosure. The method may be implemented using a target detection network, which may include a feature extraction module, a classification module, a connected-component module and a screening module. The target detection network may be deployed in various terminals, such as the server 11, the terminals 21 to 22, the testing machine 31 and the desktop 41 shown in FIG. 2, and is not limited to the architecture shown in FIG. 2; any terminal may implement the processing logic 200 of the above embodiments of the present disclosure, which specifically includes:
S201, performing feature extraction on the image using the feature extraction module and outputting a plurality of feature maps.
S202, classifying the plurality of feature maps using the classification module and outputting a plurality of heat maps.
S203, performing connected-component computation on the regions in each heat map using the connected-component module and outputting a plurality of first detection frames.
S204, screening the plurality of first detection frames using the screening module and outputting a plurality of second detection frames.
With this embodiment, the target detection network comprising the feature extraction module, the classification module, the connected-component module and the screening module can be deployed on whichever terminals actual requirements call for, enabling accurate and fast target detection.
Application example:
FIG. 3 is a schematic diagram of a target detection method in an application example according to an embodiment of the present disclosure. In this application example, aimed at target detection in an intelligent traffic scene, detection is performed based on heat maps, and both detection accuracy and detection speed can be taken into account. The method includes:
firstly, for an input image, a CNN is used for carrying out feature extraction, the CNN network can be ResNet, MobileNet and other networks, and the output feature is CxHxW. Wherein C refers to the category of the feature map, H refers to the height of the feature map, and W refers to the width of the feature map.
Second, the C×H×W features output by the CNN are classified with a classifier; for example, if there are N classes in total, an N×H×W heat map is generated, and each H×W matrix represents the heat map of one class.
Third, a connected-component operation is applied to the regions of each heat map by the connected-component module; the connected-component algorithm used may be the Two-Pass algorithm. This generates the corresponding first detection frames for each class. A sketch of the Two-Pass labeling is given below.
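A minimal sketch of the Two-Pass connected-component labeling mentioned above, written for a binary (thresholded) heat map with 4-connectivity; the union-find bookkeeping is an implementation detail chosen here, not prescribed by the disclosure.

```python
import numpy as np


def two_pass_label(mask: np.ndarray) -> np.ndarray:
    """mask: 2-D boolean array -> integer label image (0 = background)."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    parent = {}                                   # union-find forest over provisional labels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]         # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    next_label = 1
    # First pass: assign provisional labels and record equivalences.
    for y in range(mask.shape[0]):
        for x in range(mask.shape[1]):
            if not mask[y, x]:
                continue
            up = labels[y - 1, x] if y > 0 else 0
            left = labels[y, x - 1] if x > 0 else 0
            neighbors = [n for n in (up, left) if n > 0]
            if not neighbors:
                parent[next_label] = next_label
                labels[y, x] = next_label
                next_label += 1
            else:
                labels[y, x] = min(neighbors)
                if len(neighbors) == 2:
                    union(neighbors[0], neighbors[1])
    # Second pass: replace each provisional label with its representative.
    for y in range(mask.shape[0]):
        for x in range(mask.shape[1]):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels


# Example: two separate hot regions produce two distinct labels.
mask = np.array([[1, 1, 0, 0],
                 [0, 1, 0, 1],
                 [0, 0, 0, 1]], dtype=bool)
print(two_pass_label(mask))
```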
Fourth, the generated first detection frames are processed by the regression branch and the classification branch respectively, and the detection frames are screened to obtain the final second detection frames. Specifically, for the regression branch, a regressor may directly regress the coordinates of the first detection frame of the corresponding class; for the classification branch, the ROI Align algorithm may extract the features at the position corresponding to the first detection frame, which are then fed into a binary classifier (whose role is to determine whether the class is actually present), finally yielding the required second detection frames.
With this application example, no large amount of anchor data needs to be preset, and target detection does not require traversal over pixel points.
According to an embodiment of the present disclosure, a target detection apparatus is provided. FIG. 4 is a schematic structural diagram of the target detection apparatus according to an embodiment of the present disclosure. As shown in FIG. 4, a target detection apparatus 400 includes: a first processing module 401, configured to perform feature extraction on an image to obtain a plurality of feature maps; a second processing module 402, configured to classify the plurality of feature maps to obtain a plurality of heat maps; a third processing module 403, configured to perform target detection processing according to the plurality of heat maps to obtain a plurality of first detection frames; and a fourth processing module 404, configured to screen the plurality of first detection frames to obtain a plurality of second detection frames, wherein the plurality of second detection frames are used for identifying target objects of different classes in the image.
In one embodiment, the first processing module is configured to: input the image into a feature extraction module; and perform the feature extraction in the feature extraction module and output the plurality of feature maps.
In one embodiment, the second processing module is configured to: input the plurality of feature maps into a classification module; and perform the classification in the classification module and output the plurality of heat maps, which characterize a plurality of classes of image features.
In one embodiment, the third processing module is configured to: input the plurality of heat maps into a connected-component module; and, in the connected-component module, perform a connected-component operation on each heat map to implement the target detection processing and output the plurality of first detection frames.
In one embodiment, the fourth processing module is configured to: input the plurality of first detection frames into a screening module comprising a regression branch and a classification branch; and, in the screening module, regress the coordinates of each first detection frame based on the position of the detection frame of the corresponding class via the regression branch, extract the features at the position corresponding to each first detection frame and classify them via the classification branch, and output the plurality of second detection frames.
The functions of each module in each apparatus in the embodiments of the present disclosure may refer to the corresponding description in the above method, and are not described herein again.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 is a block diagram of an electronic device for implementing the target detection method, the image processing method and the video processing method of the embodiments of the present disclosure. The electronic device may be the aforementioned deployment device or proxy device. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 5, the electronic device 500 includes a computing unit 501, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. Various programs and data required for the operation of the electronic device 500 can also be stored in the RAM 503. The computing unit 501, the ROM 502 and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the electronic device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard or a mouse; an output unit 507 such as various types of displays and speakers; a storage unit 508 such as a magnetic disk or an optical disk; and a communication unit 509 such as a network card, a modem or a wireless communication transceiver. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 501 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller or microcontroller. The computing unit 501 performs the methods and processes described above, such as the target detection method, the image processing method and the video processing method. For example, in some embodiments, these methods may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the methods described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform these methods in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (13)

1. A target detection method, comprising:
performing feature extraction on an image to obtain a plurality of feature maps;
classifying the plurality of feature maps to obtain a plurality of heat maps;
performing target detection processing according to the plurality of heat maps to obtain a plurality of first detection frames; and
screening the plurality of first detection frames to obtain a plurality of second detection frames, wherein the plurality of second detection frames are used for identifying target objects of different classes in the image.
2. The method of claim 1, wherein the performing feature extraction on the image to obtain a plurality of feature maps comprises:
inputting the image into a feature extraction module; and
performing the feature extraction in the feature extraction module and outputting the plurality of feature maps.
3. The method of claim 1 or 2, wherein the classifying the plurality of feature maps to obtain a plurality of heat maps comprises:
inputting the plurality of feature maps into a classification module; and
performing the classification in the classification module and outputting the plurality of heat maps, wherein the plurality of heat maps characterize a plurality of classes of image features.
4. The method of claim 1 or 2, wherein the performing target detection processing according to the plurality of heat maps to obtain a plurality of first detection frames comprises:
inputting the plurality of heat maps into a connected-component module; and
in the connected-component module, performing a connected-component operation on each heat map to implement the target detection processing, and outputting the plurality of first detection frames.
5. The method of claim 1 or 2, wherein the screening the plurality of first detection frames to obtain a plurality of second detection frames, the plurality of second detection frames being used for identifying target objects of different classes in the image, comprises:
inputting the plurality of first detection frames into a screening module comprising a regression branch and a classification branch; and
in the screening module, regressing the coordinates of each first detection frame based on the position of the detection frame of the corresponding class via the regression branch, extracting the features at the position corresponding to each first detection frame and classifying them via the classification branch, and outputting the plurality of second detection frames.
6. A target detection apparatus, comprising:
a first processing module, configured to perform feature extraction on an image to obtain a plurality of feature maps;
a second processing module, configured to classify the plurality of feature maps to obtain a plurality of heat maps;
a third processing module, configured to perform target detection processing according to the plurality of heat maps to obtain a plurality of first detection frames; and
a fourth processing module, configured to screen the plurality of first detection frames to obtain a plurality of second detection frames, wherein the plurality of second detection frames are used for identifying target objects of different classes in the image.
7. The apparatus of claim 6, wherein the first processing module is configured to:
input the image into a feature extraction module; and
perform the feature extraction in the feature extraction module and output the plurality of feature maps.
8. The apparatus of claim 6 or 7, wherein the second processing module is configured to:
input the plurality of feature maps into a classification module; and
perform the classification in the classification module and output the plurality of heat maps, wherein the plurality of heat maps characterize a plurality of classes of image features.
9. The apparatus of claim 6 or 7, wherein the third processing module is configured to:
input the plurality of heat maps into a connected-component module; and
in the connected-component module, perform a connected-component operation on each heat map to implement the target detection processing, and output the plurality of first detection frames.
10. The apparatus of claim 6 or 7, wherein the fourth processing module is configured to:
input the plurality of first detection frames into a screening module comprising a regression branch and a classification branch; and
in the screening module, regress the coordinates of each first detection frame based on the position of the detection frame of the corresponding class via the regression branch, extract the features at the position corresponding to each first detection frame and classify them via the classification branch, and output the plurality of second detection frames.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
13. A computer program product comprising computer instructions which, when executed by a processor, implement the method of any one of claims 1-5.
CN202110721556.XA 2021-06-28 2021-06-28 Target detection method and device, electronic equipment and storage medium Withdrawn CN113378857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110721556.XA CN113378857A (en) 2021-06-28 2021-06-28 Target detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113378857A 2021-09-10

Family

ID=77579619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110721556.XA Withdrawn CN113378857A (en) 2021-06-28 2021-06-28 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113378857A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310770A (en) * 2020-02-21 2020-06-19 集美大学 Target detection method and device
CN111461182A (en) * 2020-03-18 2020-07-28 北京小米松果电子有限公司 Image processing method, image processing apparatus, and storage medium
CN111814755A (en) * 2020-08-18 2020-10-23 深延科技(北京)有限公司 Multi-frame image pedestrian detection method and device for night motion scene
CN112101195A (en) * 2020-09-14 2020-12-18 腾讯科技(深圳)有限公司 Crowd density estimation method and device, computer equipment and storage medium
CN112183435A (en) * 2020-10-12 2021-01-05 河南威虎智能科技有限公司 Two-stage hand target detection method
CN112967315A (en) * 2021-03-02 2021-06-15 北京百度网讯科技有限公司 Target tracking method and device and electronic equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989568A (en) * 2021-10-29 2022-01-28 北京百度网讯科技有限公司 Target detection method, training method, device, electronic device and storage medium
CN114549874A (en) * 2022-03-02 2022-05-27 北京百度网讯科技有限公司 Training method of multi-target image-text matching model, image-text retrieval method and device
CN114549874B (en) * 2022-03-02 2024-03-08 北京百度网讯科技有限公司 Training method of multi-target image-text matching model, image-text retrieval method and device
CN115049954A (en) * 2022-05-09 2022-09-13 北京百度网讯科技有限公司 Target identification method, device, electronic equipment and medium
CN115049954B (en) * 2022-05-09 2023-09-22 北京百度网讯科技有限公司 Target identification method, device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
CN113379718B (en) Target detection method, target detection device, electronic equipment and readable storage medium
CN112633276B (en) Training method, recognition method, device, equipment and medium
CN112597837B (en) Image detection method, apparatus, device, storage medium, and computer program product
CN112560862B (en) Text recognition method and device and electronic equipment
CN113378857A (en) Target detection method and device, electronic equipment and storage medium
CN115880536B (en) Data processing method, training method, target object detection method and device
CN113392794B (en) Vehicle line crossing identification method and device, electronic equipment and storage medium
CN114359932B (en) Text detection method, text recognition method and device
CN113657483A (en) Model training method, target detection method, device, equipment and storage medium
CN113947188A (en) Training method of target detection network and vehicle detection method
CN113378969A (en) Fusion method, device, equipment and medium of target detection results
CN113326773A (en) Recognition model training method, recognition method, device, equipment and storage medium
CN113239807A (en) Method and device for training bill recognition model and bill recognition
CN114443794A (en) Data processing and map updating method, device, equipment and storage medium
CN114724133A (en) Character detection and model training method, device, equipment and storage medium
CN113706705B (en) Image processing method, device, equipment and storage medium for high-precision map
CN113344121B (en) Method for training a sign classification model and sign classification
CN113569911A (en) Vehicle identification method and device, electronic equipment and storage medium
CN114724113B (en) Road sign recognition method, automatic driving method, device and equipment
CN114429631B (en) Three-dimensional object detection method, device, equipment and storage medium
CN115761698A (en) Target detection method, device, equipment and storage medium
CN113887394A (en) Image processing method, device, equipment and storage medium
CN115147814A (en) Recognition method of traffic indication object and training method of target detection model
CN112818972B (en) Method and device for detecting interest point image, electronic equipment and storage medium
CN113936158A (en) Label matching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210910