CN112966565A - Object detection method and device, terminal equipment and storage medium - Google Patents

Object detection method and device, terminal equipment and storage medium

Info

Publication number
CN112966565A
Authority
CN
China
Prior art keywords
network model
model
target
pruning
target network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110160557.1A
Other languages
Chinese (zh)
Inventor
赵雨佳
程骏
庞建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202110160557.1A priority Critical patent/CN112966565A/en
Publication of CN112966565A publication Critical patent/CN112966565A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application is applicable to the technical field of artificial intelligence, and provides an object detection method, an object detection device, terminal equipment and a storage medium, wherein the method comprises the following steps: training a preset optimization network model to obtain a pre-trained optimization network model; performing model pruning on the pre-trained optimization network model to obtain a target network model; and when the target image is acquired, detecting the target object through the target network model to acquire a detection result. According to the embodiment of the application, the optimized network model is trained and then pruned, so that the accuracy of object detection can be improved, the operation time can be reduced, and the target detection efficiency can be improved.

Description

Object detection method and device, terminal equipment and storage medium
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to an object detection method, an object detection device, a terminal device and a storage medium.
Background
With the rapid development of artificial intelligence technology, various artificial intelligence products have emerged, and intelligent products with a mobile function can autonomously judge road conditions during driving and realize functions such as obstacle avoidance by detecting objects.
Because such products move at speed, the requirements on the performance and accuracy of the target detection algorithm are very high; however, current mobile intelligent products that detect obstacles with sensors such as laser, ultrasonic radar and infrared, or with traditional algorithms, have low detection efficiency.
Disclosure of Invention
The embodiment of the application provides an object detection method, an object detection device, terminal equipment and a storage medium, and aims to solve the problem that an existing intelligent product with a mobile function is low in target detection efficiency.
In a first aspect, an embodiment of the present application provides an object detection method, including:
training a preset optimization network model to obtain a pre-trained optimization network model;
performing model pruning on the pre-trained optimization network model to obtain a target network model;
and when the target image is acquired, detecting the target object through the target network model to acquire a detection result.
In one embodiment, after obtaining the detection result, the method further includes: when the detection result comprises a target object, obtaining the type and the position of the target object;
and executing the operation corresponding to the type and the position of the target object according to the type and the position of the target object.
In an embodiment, before the target object is detected through the target network model to obtain a detection result when the target image is obtained, the method further includes:
and optimizing the target network model through a TensorRT framework model to obtain a TensorRT engine model.
In one embodiment, the optimizing the target network model through a TensorRT framework model to obtain a TensorRT engine model includes:
converting the target network model into a target network model in a preset format;
and loading the target network model with the preset format into a TensorRT framework model to obtain the TensorRT engine model loaded with the target network model with the preset format.
In one embodiment, the detecting a target object through the target network model to obtain a detection result includes:
and detecting a target object through the TensorRT engine model to obtain the detection result.
In an embodiment, the performing model pruning on the pre-trained optimized network model to obtain a target network model includes:
and carrying out network structure pruning and network weight pruning on the pre-trained optimized network model according to a preset compression algorithm to obtain a target network model.
In one embodiment, the optimized network model is constructed based on the YOLOv5 model and uses the small model of MobileNetV3 as the backbone network.
In a second aspect, an embodiment of the present application provides an object detection apparatus, including:
the training module is used for training a preset optimization network model to obtain a pre-trained optimization network model;
the pruning module is used for carrying out model pruning on the pre-trained optimized network model to obtain a target network model;
and the detection module is used for detecting the target object through the target network model when the target image is obtained, so as to obtain a detection result.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above object detection method when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps of the object detection method are implemented.
In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on an electronic device, causes the electronic device to perform the steps of the above object detection method.
Compared with the prior art, the embodiment of the application has the following advantages: the embodiment of the application can obtain the pre-trained optimized network model by training the preset optimized network model; model pruning is performed on the pre-trained optimized network model to obtain a target network model; and when a target image is obtained, a target object is detected through the target network model to obtain a detection result. Training the optimized network model and then pruning it can improve the accuracy of object detection and reduce the operation time, thereby improving the target detection efficiency.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative efforts.
Fig. 1 is a schematic flow chart of an object detection method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of an object detection method according to another embodiment of the present application;
Fig. 3 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The object detection method provided by the embodiment of the application can be applied to a robot, the robot can be a mobile robot, and the mobile robot can be an intelligent vehicle. The robot may also be a service robot, an entertainment robot, a military robot, an agricultural robot, etc., and the embodiments of the present application do not limit the specific type of the robot.
For example, in one application scenario, the object detection method provided by the embodiment of the application is applied to an intelligent car. An automatic-driving intelligent car generally uses a camera as a sensor and completes an automatic tracking driving task on a specific path through an automatic tracking algorithm. To give the intelligent car AI capability, it also needs to autonomously judge road conditions during autonomous tracking driving, for example, giving stop/go feedback in response to traffic lights on the road, outputting accurate position information for obstacles (such as pedestrians, other vehicles and obstacle objects), and correctly identifying traffic signs. The object detection method provided by the embodiment of the application can meet the identification requirements of the intelligent car beyond path identification, can efficiently output the position information of various target objects, supports the car in making corresponding decisions, and embodies the AI capability of the intelligent car.
In order to explain the technical means described in the present application, the following examples are given below.
Referring to fig. 1, an object detection method provided in an embodiment of the present application includes:
step S101, training a preset optimization network model to obtain a pre-trained optimization network model.
Specifically, an optimized network model may be constructed in advance based on a lightweight network model, and a large number of sample images containing the various types of objects to be identified are prepared, each sample image carrying the class labels of the objects it contains. The preset optimized network model is trained on the prepared sample images until its preset loss function converges, at which point the optimized network model is determined to be trained (i.e., the pre-trained optimized network model). The preset loss function may be a cross-entropy loss function or a mean-square-error loss function. The obtained pre-trained optimized network model detects the target object and, when a target object is detected, determines its object class and position information.
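For illustration only (code of this kind does not form part of the claimed subject matter), the training step may be sketched in PyTorch as follows; build_optimized_model and SampleImageDataset are hypothetical placeholders for the lightweight detection model and the labeled sample images, and the hyper-parameters are examples, not values specified by this application:

    import torch
    from torch.utils.data import DataLoader

    # Hypothetical helpers: build_optimized_model() constructs the lightweight
    # detector; SampleImageDataset yields (image, target) pairs with class labels.
    model = build_optimized_model()
    loader = DataLoader(SampleImageDataset("samples/"), batch_size=16, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    prev_loss, epoch = float("inf"), 0
    while True:  # train until the preset loss function converges
        epoch_loss = 0.0
        for images, targets in loader:
            loss = model(images, targets)  # detector computes its training loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        epoch += 1
        if abs(prev_loss - epoch_loss) < 1e-3 or epoch >= 300:  # convergence check
            break
        prev_loss = epoch_loss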
In one embodiment, the optimized network model is constructed based on the YOLOv5 model and uses the small model of MobileNetV3 as the backbone network.
Specifically, the network model is constructed on the basis of YOLOv5. YOLOv5 is mainly offered in the sizes l (large), m (middle) and s (small); because the network widths and parameter counts differ, the YOLOv5 algorithm can be applied to scenes with different performance requirements. Since the execution terminal is a mobile intelligent product, the network structure of the s model, i.e., the one with the smallest number of parameters, is selected as the reference network, which suits scenes requiring short running time and high precision. However, even the s model of YOLOv5, with the smallest number of parameters, is still time-consuming to run on a mobile intelligent product, and the model needs to be made lightweight to further compress its size. The original backbone of YOLOv5 uses a Cross Stage Partial Network (CSPNet), and the l, m and s models merely scale the model depth and channel number without changing the essential structure. Therefore, the backbone formed by stacking CSPNets is still heavy even for the s model, so the present application adopts the small model of MobileNetV3, which is friendly for deployment on a mobile terminal, as the backbone network, which can further compress the size of the model.
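As a rough sketch of this design choice (not the application's actual network definition), a MobileNetV3-small trunk from torchvision could stand in for YOLOv5's CSPNet backbone; the 640x640 input size is an assumption:

    import torch
    import torch.nn as nn
    import torchvision

    class MobileNetV3Backbone(nn.Module):
        """MobileNetV3-small feature extractor replacing the CSPNet backbone."""
        def __init__(self):
            super().__init__()
            # keep only the convolutional trunk; the classifier head is unused
            self.features = torchvision.models.mobilenet_v3_small(weights=None).features

        def forward(self, x):
            return self.features(x)  # feature map passed to the detection head

    backbone = MobileNetV3Backbone()
    feat = backbone(torch.randn(1, 3, 640, 640))
    print(feat.shape)  # torch.Size([1, 576, 20, 20]) for a 640x640 input

In practice, YOLOv5's detection head consumes feature maps at several scales, so intermediate outputs of the trunk would also be tapped; the single feature map above is a simplification.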
And S102, carrying out model pruning on the pre-trained optimized network model to obtain a target network model.
Specifically, the parameters in the pre-trained optimized network model have already been adjusted during training, but some parameters have little influence on the final detection performance of the model, so these parameters can be pruned from the model by a pruning algorithm.
In an embodiment, the performing model pruning on the pre-trained optimized network model to obtain a target network model includes: and carrying out network structure pruning and network weight pruning on the pre-trained optimized network model according to a preset compression algorithm to obtain a target network model.
Specifically, the pruning algorithm for the network model may include network structure pruning and network weight pruning. Network structure pruning prunes structural elements such as filters and channels in the network model, so as to implement model compression and acceleration; network weight pruning compresses the model by clipping individual weight parameters in the network model. For example, both network structure pruning and network weight pruning can be performed through the compression algorithms provided by Distiller, Intel's open-source neural network compression library.
In one application, filters whose corresponding structural parameters are smaller than a first preset threshold are pruned, and channels whose corresponding structural parameters are smaller than a second preset threshold are judged to be unimportant channels and are deleted, so that the complexity of the model can be reduced.
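A minimal sketch of the two pruning styles, using PyTorch's built-in pruning utilities rather than Distiller's YAML-driven interface (the pruning ratios are illustrative):

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    def prune_model(model, weight_ratio=0.3, channel_ratio=0.2):
        """Apply network weight pruning and network structure pruning to conv layers."""
        for module in model.modules():
            if isinstance(module, nn.Conv2d):
                # network weight pruning: zero the smallest individual weights (L1)
                prune.l1_unstructured(module, name="weight", amount=weight_ratio)
                prune.remove(module, "weight")  # bake the mask into the weights
                # network structure pruning: zero whole output channels by L2 norm
                prune.ln_structured(module, name="weight", amount=channel_ratio,
                                    n=2, dim=0)
                prune.remove(module, "weight")
        return model

Note that ln_structured only zeroes channels; physically removing them (and thus reducing run time) additionally requires rebuilding the affected layers with fewer channels, which is the part a library such as Distiller automates.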
And step S103, when the target image is obtained, detecting the target object through the target network model to obtain a detection result.
Specifically, while the execution terminal is moving, an image, called a target image, may be acquired by an image sensor (e.g., a camera), and the acquired target image is detected through the target network model to obtain a detection result. For example, when the target object is detected, the detection result may be the position information of the target object in the image and the classification label of the target object, which are output.
In an embodiment, when the target image is obtained, the detecting the target object by the target network model includes, before obtaining a detection result: and optimizing the target network model through a TensorRT framework model to obtain a TensorRT engine model.
Specifically, the TensorRT technology can optimize and deploy the pre-trained target network model and improve its inference speed; when there are multiple target objects, it can further improve the real-time performance of detecting them. Therefore, the TensorRT framework model is adopted to optimize the target network model to obtain the TensorRT engine model. For example, the TensorRT framework model may run on an NVIDIA Graphics Processing Unit (GPU).
In one embodiment, the optimizing the target network model through a TensorRT framework model to obtain the TensorRT engine model includes: converting the target network model into a target network model in a preset format; and loading the target network model with the preset format into a TensorRT framework model to obtain the TensorRT engine model loaded with the target network model with the preset format.
Specifically, TensorRT supports the Open Neural Network Exchange (ONNX) model, so the target network model is first converted into an ONNX model and then into a TensorRT engine model. For example, the target network model may be converted into an ONNX model file, i.e., an .onnx file, through an ONNX interface plug-in. The converted ONNX model file is then loaded with the TensorRT framework model to generate the TensorRT engine model, i.e., the TensorRT engine model loaded with the target network model in the preset format. The TensorRT engine model adopts an FP16 (16-bit floating point) engine instead of FP32 (32-bit floating point), which further reduces the size of the model; the FP16 model can roughly double the speed with little precision loss.
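A hedged sketch of this two-step conversion (TensorRT's Python API differs across versions; the file names, input shape and opset are assumptions, and model denotes the pruned target network model from the previous step):

    import torch
    import tensorrt as trt

    # Step 1: export the target network model to the preset (ONNX) format.
    dummy = torch.randn(1, 3, 640, 640)
    torch.onnx.export(model, dummy, "target_net.onnx", opset_version=11)

    # Step 2: parse the ONNX file and build an FP16 TensorRT engine.
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open("target_net.onnx", "rb") as f:
        assert parser.parse(f.read()), "ONNX parsing failed"

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # FP16 engine: smaller, roughly 2x faster
    engine_bytes = builder.build_serialized_network(network, config)
    with open("target_net.engine", "wb") as f:
        f.write(engine_bytes)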
In one embodiment, the detecting a target object through the target network model to obtain a detection result includes: detecting a target object through the TensorRT engine model to obtain the detection result.
Specifically, the object detection is performed on the TensorRT engine model loaded with the target network model in the preset format, and the detection result is obtained.
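For completeness, inference against the serialized engine might be sketched as follows; buffer handling is version-dependent (the binding calls below follow TensorRT 7/8), and pycuda is one common choice, not one mandated by this application:

    import numpy as np
    import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
    import pycuda.driver as cuda
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    with open("target_net.engine", "rb") as f:
        engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    def detect(image):
        """Run the engine on a preprocessed NCHW float32 target image."""
        inp = np.ascontiguousarray(image, dtype=np.float32)
        out = np.empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
        d_inp, d_out = cuda.mem_alloc(inp.nbytes), cuda.mem_alloc(out.nbytes)
        cuda.memcpy_htod(d_inp, inp)
        context.execute_v2([int(d_inp), int(d_out)])  # synchronous inference
        cuda.memcpy_dtoh(out, d_out)
        return out  # raw detections to be decoded into boxes, classes and scores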
In one embodiment, after obtaining the detection result, the method further includes steps S201 to S202:
step S201, when the detection result comprises the target object, obtaining the type and the position of the target object.
Specifically, when the detection result is that the target object is detected, the type and position of the target object are determined according to the detection result.
Step S202, according to the type and the position of the target object, executing the operation corresponding to the type and the position of the target object.
Specifically, the type and the position of the target object are determined according to the detection result. The position is the position of the target object relative to the execution terminal in the image coordinate system; according to the calibration parameters of the execution terminal, this position in the image coordinate system is converted into the position of the target object in the world coordinate system, and the pre-associated linkage operation corresponding to the type of the target object and its position in the world coordinate system is executed. For example, if the type of the target object is detected to be a red light, the distance to the red light is determined from its position, and the operation pre-associated with that distance under the red-light type is executed. In this way, the type (such as a pedestrian, another vehicle or an obstacle object) and the position of the object can be determined, and corresponding feedback can be made according to them.
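One common way to realize the image-to-world conversion described above is a ground-plane homography, sketched here under the assumptions of a calibrated pinhole camera and a flat ground (Z = 0 in world coordinates); K, R and t stand in for the execution terminal's calibration parameters:

    import numpy as np

    def pixel_to_world(u, v, K, R, t):
        """Map pixel (u, v) to world coordinates on the Z = 0 ground plane.

        K: 3x3 intrinsic matrix; R, t: extrinsic rotation and translation.
        Uses the ground-plane homography H = K [r1 r2 t].
        """
        H = K @ np.column_stack((R[:, 0], R[:, 1], t))
        world = np.linalg.inv(H) @ np.array([u, v, 1.0])
        world /= world[2]  # dehomogenize
        return world[0], world[1]  # X, Y on the ground plane

    # e.g., distance to a detected object from the bottom-center of its box:
    # X, Y = pixel_to_world(cx, y_bottom, K, R, t); distance = np.hypot(X, Y)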
The embodiment of the application can obtain the pre-trained optimized network model by training the preset optimized network model; model pruning is performed on the pre-trained optimized network model to obtain a target network model; and when a target image is obtained, a target object is detected through the target network model to obtain a detection result. Training the optimized network model and then pruning it can improve the accuracy of object detection and reduce the operation time, thereby improving the target detection efficiency.
The embodiment of the application also provides an object detection device, which is configured to execute the steps in the foregoing object detection method embodiments. The object detection device may be a virtual device in the terminal device, executed by a processor of the terminal device, or may be the terminal device itself.
As shown in fig. 3, an object detection apparatus 300 according to an embodiment of the present application includes:
the training module 301 is configured to train a preset optimization network model to obtain a pre-trained optimization network model;
a pruning module 302, configured to perform model pruning on the pre-trained optimized network model to obtain a target network model;
and the detection module 303 is configured to perform target object detection through the target network model when the target image is obtained, so as to obtain a detection result.
In one embodiment, the object detection apparatus further comprises:
an obtaining module, configured to obtain a type and a position of a target object when the detection result includes the target object;
and the execution module is used for executing the operation corresponding to the type and the position of the target object according to the type and the position of the target object.
In one embodiment, the object detection apparatus further comprises:
and the optimization module is used for optimizing the target network model through the TensorRT framework model to obtain the TensorRT engine model, before the target object is detected through the target network model to obtain the detection result when the target image is obtained.
In one embodiment, the optimization module specifically includes:
the conversion unit is used for converting the target network model into a target network model in a preset format;
and the loading unit is used for loading the target network model with the preset format into the TensorRT framework model to obtain the TensorRT engine model loaded with the target network model with the preset format.
In one embodiment, the detection module is specifically configured to: and detecting a target object through the TensorRT engine model to obtain the detection result.
In one embodiment, the pruning module is specifically configured to: and carrying out network structure pruning and network weight pruning on the pre-trained optimized network model according to a preset compression algorithm to obtain a target network model.
In one embodiment, the optimized network model is constructed based on the YOLOv5 model and uses the small model of MobileNetV3 as the backbone network.
The embodiment of the application can obtain the pre-trained optimized network model by training the preset optimized network model; model pruning is performed on the pre-trained optimized network model to obtain a target network model; and when a target image is obtained, a target object is detected through the target network model to obtain a detection result. Training the optimized network model and then pruning it can improve the accuracy of object detection and reduce the operation time, thereby improving the target detection efficiency.
As shown in fig. 4, an embodiment of the present invention further provides a terminal device 400 including: a processor 401, a memory 402 and a computer program 403, such as an object detection program, stored in said memory 402 and executable on said processor 401. The processor 401, when executing the computer program 403, implements the steps in the various object detection method embodiments described above. The processor 401, when executing the computer program 403, implements the functions of the modules in the device embodiments described above, such as the functions of the modules 301 to 303 shown in fig. 3.
Illustratively, the computer program 403 may be partitioned into one or more modules that are stored in the memory 402 and executed by the processor 401 to implement the present invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program 403 in the terminal device 400. For example, the computer program 403 may be divided into a training module, a pruning module and a detection module, and specific functions of the modules are described in the foregoing embodiments, and are not described herein again.
The terminal device 400 may be a robot with a mobile function, or a computing device such as a desktop computer, a notebook, a palm computer, and a cloud server. The terminal device may include, but is not limited to, a processor 401, a memory 402. Those skilled in the art will appreciate that fig. 4 is merely an example of a terminal device 400 and does not constitute a limitation of terminal device 400 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The Processor 401 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 402 may be an internal storage unit of the terminal device 400, such as a hard disk or a memory of the terminal device 400. The memory 402 may also be an external storage device of the terminal device 400, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card or a Flash memory Card (Flash Card) equipped on the terminal device 400. Further, the memory 402 may also include both an internal storage unit and an external storage device of the terminal device 400. The memory 402 is used for storing the computer program and other programs and data required by the terminal device. The memory 402 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated module, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. An object detection method, comprising:
training a preset optimization network model to obtain a pre-trained optimization network model;
performing model pruning on the pre-trained optimization network model to obtain a target network model;
and when the target image is acquired, detecting the target object through the target network model to acquire a detection result.
2. The object detection method according to claim 1, further comprising, after obtaining the detection result:
when the detection result comprises a target object, obtaining the type and the position of the target object;
and executing the operation corresponding to the type and the position of the target object according to the type and the position of the target object.
3. The object detection method according to claim 1, wherein before the target object detection is performed by the target network model when the target image is acquired and a detection result is obtained, the method comprises:
and optimizing the target network model through a TensorRT framework model to obtain a TensorRT engine model.
4. The object detection method according to claim 3, wherein the optimizing the target network model through a TensorRT framework model to obtain a TensorRT engine model comprises:
converting the target network model into a target network model in a preset format;
and loading the target network model with the preset format into a TensorRT framework model to obtain the TensorRT engine model loaded with the target network model with the preset format.
5. The object detection method according to claim 4, wherein the detecting a target object through the target network model to obtain a detection result comprises:
and detecting a target object through the TensorRT engine model to obtain the detection result.
6. The object detection method according to any one of claims 1 to 5, wherein the performing model pruning on the pre-trained optimized network model to obtain a target network model comprises:
and carrying out network structure pruning and network weight pruning on the pre-trained optimized network model according to a preset compression algorithm to obtain a target network model.
7. The object detection method according to any one of claims 1 to 5, wherein the optimized network model is constructed based on a YOLOv5 model and uses a small model of MobileNet V3 as a backbone network.
8. An object detecting device, comprising:
the training module is used for training a preset optimization network model to obtain a pre-trained optimization network model;
the pruning module is used for carrying out model pruning on the pre-trained optimized network model to obtain a target network model;
and the detection module is used for detecting the target object through the target network model when the target image is obtained, so as to obtain a detection result.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202110160557.1A 2021-02-05 2021-02-05 Object detection method and device, terminal equipment and storage medium Pending CN112966565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110160557.1A CN112966565A (en) 2021-02-05 2021-02-05 Object detection method and device, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110160557.1A CN112966565A (en) 2021-02-05 2021-02-05 Object detection method and device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112966565A true CN112966565A (en) 2021-06-15

Family

ID=76274324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110160557.1A Pending CN112966565A (en) 2021-02-05 2021-02-05 Object detection method and device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112966565A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205707A (en) * 2017-09-27 2018-06-26 深圳市商汤科技有限公司 Generate the method, apparatus and computer readable storage medium of deep neural network
CN109670525A (en) * 2018-11-02 2019-04-23 平安科技(深圳)有限公司 Object detection method and system based on once shot detection
WO2020155518A1 (en) * 2019-02-03 2020-08-06 平安科技(深圳)有限公司 Object detection method and device, computer device and storage medium
CN111950723A (en) * 2019-05-16 2020-11-17 武汉Tcl集团工业研究院有限公司 Neural network model training method, image processing method, device and terminal equipment
CN110674878A (en) * 2019-09-26 2020-01-10 苏州航韧光电技术有限公司 Target detection method and device for dual-mode decision-level image fusion
CN111553406A (en) * 2020-04-24 2020-08-18 上海锘科智能科技有限公司 Target detection system, method and terminal based on improved YOLO-V3

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114700957A (en) * 2022-05-26 2022-07-05 北京云迹科技股份有限公司 Robot control method and device with low computational power requirement of model
CN114700957B (en) * 2022-05-26 2022-08-26 北京云迹科技股份有限公司 Robot control method and device with low computational power requirement of model

Similar Documents

Publication Publication Date Title
CN109508580B (en) Traffic signal lamp identification method and device
CN112133089B (en) Vehicle track prediction method, system and device based on surrounding environment and behavior intention
CN109598066B (en) Effect evaluation method, apparatus, device and storage medium for prediction module
Taheri Tajar et al. A lightweight Tiny-YOLOv3 vehicle detection approach
US20180211119A1 (en) Sign Recognition for Autonomous Vehicles
KR20170140214A (en) Filter specificity as training criterion for neural networks
CN110348278B (en) Vision-based sample efficient reinforcement learning framework for autonomous driving
CN113412505A (en) System and method for ordered representation and feature extraction of point clouds obtained by detection and ranging sensors
CN109558854B (en) Obstacle sensing method and device, electronic equipment and storage medium
CN111079533B (en) Unmanned vehicle driving decision method, unmanned vehicle driving decision device and unmanned vehicle
CN113343461A (en) Simulation method and device for automatic driving vehicle, electronic equipment and storage medium
WO2022037085A1 (en) Vehicle simulation test scenario construction method and device
CN112966565A (en) Object detection method and device, terminal equipment and storage medium
CN114241448A (en) Method and device for obtaining heading angle of obstacle, electronic equipment and vehicle
Barodi et al. Intelligent transportation system based on smart soft-sensors to analyze road traffic and assist driver behavior applicable to smart cities
CN110363847B (en) Map model construction method and device based on point cloud data
CN115527187A (en) Method and device for classifying obstacles
CN112216133B (en) Information pushing method, device, equipment and medium
CN111145187B (en) Object identification method, system, equipment and storage medium based on Hough space
CN113313654A (en) Laser point cloud filtering and denoising method, system, equipment and storage medium
CN115147672A (en) Artificial intelligence system and method for identifying object types
Suarez-Mash et al. Using Deep Neural Networks to Classify Symbolic Road Markings for Autonomous Vehicles
CN115236696B (en) Method and device for determining obstacle, electronic equipment and storage medium
CN111338336B (en) Automatic driving method and device
CN117576926B (en) Method, device and storage medium for detecting vehicle violations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination