WO2023095934A1

WO2023095934A1 - Method and system for lightening head neural network of object detector

Info

Publication number: WO2023095934A1
Application number: PCT/KR2021/017317
Authority: WO
Inventors: 채명수; 김태호; 박한철
Original assignee: 주식회사 노타
Priority date: 2021-11-23
Filing date: 2021-11-23
Publication date: 2023-06-01
Also published as: KR20230162676A

Abstract

A method and system for lightening a head neural network of an object detector are disclosed. The method for lightening according to an embodiment may comprise the steps of: receiving an object detector model as an input; replacing a head neural network of the object detector model that has been received as an input; determining whether or not to perform anchor pruning; if it is determined to perform the anchor pruning, performing anchor pruning on the object detector model in which the head neural network has been replaced; and outputting the lightened object detector model.

Description

Method and system for lightening head neural network of object detector

Embodiments of the present invention relate to a method and system for lightening the head neural network of an object detector, and more particularly, to a weight reduction method and system specialized for weight reduction of the head neural network rather than weight reduction centered on the backbone neural network.

This work was supported in 2021 by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00907, Development of Adaptive and Lightweight Edge-Collaborative Analysis Technology for Enabling Proactively Immediate Response and Rapid Learning).

Systems such as self-driving cars and smart signal control require real-time performance. Because a deep neural network-based object detector in such systems runs on a device with relatively limited resources, making the deep neural network model lightweight is essential.

A deep neural network-based object detector model consists of a backbone neural network that extracts features from the input and a head neural network that predicts the coordinates and type of an object. Until now, research has focused only on lightening the backbone neural network.

As a method of lightening the backbone neural network, a method of using an efficient convolutional neural network such as MobileNet instead of a convolutional neural network model having a relatively large size and excellent performance has been mainly used. In addition, the model of the backbone neural network is compressed using lightweight techniques such as pruning and low-rank approximation of the backbone neural network.

Recently, research on efficient backbone neural networks has been actively conducted, and as lightweight techniques of backbone neural networks have been advanced, the proportion of computations in backbone neural networks is decreasing and the proportion of computations in head neural networks that perform actual predictions is relatively increasing. However, discussions and research on weight reduction of the head neural network of object detectors have not been actively conducted.

We provide a method and system for reducing the weight of the head neural network of an object detector, which is specialized for weight reduction of the head neural network, rather than the weight reduction of the backbone neural network, which has been mainly studied in the past.

A lightweight method performed by a computer device including at least one processor is provided, the method comprising: receiving, by the at least one processor, an object detector model as an input; replacing, by the at least one processor, a head neural network of the input object detector model; determining, by the at least one processor, whether to perform anchor pruning; performing, by the at least one processor, anchor pruning on the object detector model in which the head neural network has been replaced, when it is determined to perform the anchor pruning; and outputting, by the at least one processor, a lightweight object detector model.

According to one aspect, the step of replacing the head neural network may be characterized by reducing the number of output channels of a convolutional layer constituting the head neural network of the input object detector model.

According to another aspect, the step of replacing the head neural network may be characterized by replacing a convolutional layer constituting the head neural network of the input object detector model with another, more efficient convolutional layer or block (eg, a shuffle block).

According to another aspect, the step of replacing the head neural network may be characterized by replacing the head neural network of the input object detector model with a head neural network found using a neural architecture search (NAS) method.

According to another aspect, the pruning of anchors may include measuring importance of anchors; removing anchors belonging to a predetermined ratio or less based on the importance of the anchors; and re-learning an object detector model from which anchors belonging to the predetermined ratio or less are removed.

According to another aspect, the step of measuring the importance of each anchor may be characterized in that the importance of each independent anchor is determined based on the extent of performance degradation before and after removing the output of each independent anchor.

According to another aspect, the step of measuring the importance of the anchor may be characterized in that the importance of the anchor is determined based on the degree of redundancy of the bounding box predicted by each anchor.

According to another aspect, the redundancy of the bounding box may be calculated based on a value obtained by dividing the number of anchors whose Intersection over Union (IoU) score with the bounding box predicted by a first anchor is equal to or greater than a preset value by the number of bounding boxes predicted by one anchor.

According to another aspect, the IoU score may be calculated based on a value obtained by dividing the area of an overlapping region of two bounding boxes predicted by two anchors by the total area of the two bounding boxes.

According to another aspect, the outputting of the lightweight object detector model may include: when the anchor pruning is not performed, outputting, as the lightweight object detector model, the object detector model in which the head neural network has been replaced; and when the anchor pruning is performed, outputting, as the lightweight object detector model, the object detector model in which the head neural network has been replaced and on which the anchor pruning has been performed.

A computer program is provided that is stored in a computer-readable recording medium and, in combination with a computer device, causes the computer device to execute the method.

A computer-readable recording medium is provided on which a program for executing the method on a computer device is recorded.

A computer device is provided that includes at least one processor implemented to execute computer-readable instructions, wherein the at least one processor receives an object detector model as an input, replaces a head neural network of the input object detector model, determines whether to perform anchor pruning, performs anchor pruning on the object detector model in which the head neural network has been replaced when it is determined to perform the anchor pruning, and outputs a lightweight object detector model.

It is possible to provide a method and system for reducing the weight of the head neural network of an object detector, which is specialized for weight reduction of the head neural network, rather than weight reduction centered on the backbone neural network, which has been mainly studied in the past.

FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention.

FIG. 2 is a block diagram illustrating an example of a computer device according to one embodiment of the present invention.

FIG. 3 is a diagram showing an example of a structure of an object detector according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating an example of an original image and a feature map according to an embodiment of the present invention.

FIG. 5 is a flowchart illustrating an example of a weight reduction method according to an embodiment of the present invention.

FIG. 6 is a diagram illustrating an example of calculating an IoU score according to an embodiment of the present invention.

Hereinafter, an embodiment will be described in detail with reference to the accompanying drawings.

A lightweight system according to embodiments of the present invention may be implemented by at least one computer device. At this time, a computer program according to an embodiment of the present invention may be installed and driven in the computer device, and the computer device may perform the weight reduction method according to the embodiments of the present invention under the control of the driven computer program. The above-described computer program may be combined with a computer device and stored in a computer readable recording medium to execute the weight reduction method on a computer.

FIG. 1 is a diagram illustrating an example of a network environment according to an embodiment of the present invention. The network environment of FIG. 1 shows an example including a plurality of electronic devices 110, 120, 130, and 140, a plurality of servers 150 and 160, and a network 170. FIG. 1 is an example for explaining the invention, and the number of electronic devices or servers is not limited to that shown in FIG. 1. In addition, the network environment of FIG. 1 only describes one example of environments applicable to the present embodiments, and the environment applicable to the present embodiments is not limited to the network environment of FIG. 1.

The plurality of electronic devices 110, 120, 130, and 140 may be fixed terminals or mobile terminals implemented as computer devices. Examples of the plurality of electronic devices 110, 120, 130, and 140 include a smart phone, a mobile phone, a navigation device, a computer, a laptop computer, a digital broadcast terminal, a personal digital assistant (PDA), a portable multimedia player (PMP), and a tablet PC. As an example, FIG. 1 shows the shape of a smartphone as an example of the electronic device 110, but in the embodiments of the present invention, the electronic device 110 may refer to any one of various physical computer devices capable of communicating with the other electronic devices 120, 130, and 140 and/or the servers 150 and 160 via the network 170 using a wireless or wired communication method.

The communication method is not limited, and may include not only a communication method utilizing a communication network (eg, a mobile communication network, a wired Internet, a wireless Internet, or a broadcasting network) that the network 170 may include, but also short-distance wireless communication between devices. For example, the network 170 may include any one or more networks among a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), and the Internet. In addition, the network 170 may include any one or more network topologies including a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network, but is not limited thereto.

Each of the servers 150 and 160 may be implemented as a computer device or a plurality of computer devices that communicate with the plurality of electronic devices 110, 120, 130, and 140 through the network 170 to provide commands, codes, files, contents, services, and the like. For example, the server 150 may be a system that provides a service (eg, an instant messaging service, a social network service, a payment service, a virtual exchange service, a risk monitoring service, a game service, a group call service (or voice conference service), a messaging service, a mail service, a map service, a translation service, a financial service, a search service, a content provision service, etc.) to the plurality of electronic devices 110, 120, 130, and 140 connected through the network 170.

FIG. 2 is a block diagram illustrating an example of a computer device according to one embodiment of the present invention. Each of the plurality of electronic devices 110, 120, 130, and 140 or each of the servers 150 and 160 described above may be implemented by the computer device 200 shown in FIG. 2.

As shown in FIG. 2 , the computer device 200 may include a memory 210, a processor 220, a communication interface 230, and an input/output interface 240. The memory 210 is a computer-readable recording medium and may include a random access memory (RAM), a read only memory (ROM), and a permanent mass storage device such as a disk drive. Here, a non-perishable mass storage device such as a ROM and a disk drive may be included in the computer device 200 as a separate permanent storage device distinct from the memory 210 . Also, an operating system and at least one program code may be stored in the memory 210 . These software components may be loaded into the memory 210 from a computer-readable recording medium separate from the memory 210 . The separate computer-readable recording medium may include a computer-readable recording medium such as a floppy drive, a disk, a tape, a DVD/CD-ROM drive, and a memory card. In another embodiment, software components may be loaded into the memory 210 through the communication interface 230 rather than a computer-readable recording medium. For example, software components may be loaded into memory 210 of computer device 200 based on a computer program installed by files received over network 170 .

The processor 220 may be configured to process commands of a computer program by performing basic arithmetic, logic, and input/output operations. Instructions may be provided to processor 220 by memory 210 or communication interface 230 . For example, processor 220 may be configured to execute received instructions according to program codes stored in a recording device such as memory 210 .

The communication interface 230 may provide a function for the computer device 200 to communicate with other devices (eg, the storage devices described above) through the network 170. For example, a request, command, data, file, etc. generated by the processor 220 of the computer device 200 according to a program code stored in a recording device such as the memory 210 may be transmitted to other devices through the network 170 under the control of the communication interface 230. Conversely, signals, commands, data, files, etc. from other devices may be received by the computer device 200 through the communication interface 230 of the computer device 200 via the network 170. Signals, commands, data, etc. received through the communication interface 230 may be transferred to the processor 220 or the memory 210, and files, etc. may be stored in a storage medium (the permanent storage device described above) that the computer device 200 may further include.

The input/output interface 240 may be a means for interface with the input/output device 250 . For example, the input device may include a device such as a microphone, keyboard, or mouse, and the output device may include a device such as a display or speaker. As another example, the input/output interface 240 may be a means for interface with a device in which functions for input and output are integrated into one, such as a touch screen. At least one of the input/output devices 250 may be configured as one device with the computer device 200 . For example, like a smart phone, a touch screen, a microphone, a speaker, and the like may be implemented in a form included in the computer device 200 .

Also, in other embodiments, the computer device 200 may include fewer or more elements than those of FIG. 2. However, most prior art components need not be shown explicitly. For example, the computer device 200 may be implemented to include at least some of the aforementioned input/output devices 250 or may further include other components such as a transceiver and a database.

FIG. 3 is a diagram showing an example of a structure of an object detector according to an embodiment of the present invention. The object detector may be divided into a backbone neural network, a neck neural network, and a head neural network, but in the object detector according to the present embodiment, the head neural network may be defined as including both the neck neural network and the head neural network.

The pixels of the input image can be compressed into abstract values while passing through the convolutional layers of the backbone neural network, and can be expressed as a feature map with a smaller resolution than before passing through the convolutional layer. The "(1/n)" in "Convolutional layer (1/n)" shown in FIG. 3 may indicate the factor by which the spatial size of an image is reduced after passing through the corresponding convolutional layer. For example, when an image with a resolution of 800×800 passes through a "convolutional layer (1/2)", it can be reduced to a resolution of 400×400.

In the neck neural network, a new feature map can be generated by passing feature maps of different sizes through another convolutional layer. At this time, except for the feature map of the smallest size ("Feature map (1/32)" in the embodiment of FIG. 3), the smaller feature maps are upsampled and then combined to create a new feature map, which is then passed through a convolutional layer.

FIG. 4 is a diagram illustrating an example of an original image and a feature map according to an embodiment of the present invention. Each grid cell of a feature map reduced to 1/n size represents n cells of the original image. For each grid cell, the detector predicts how the m anchors pre-specified for that grid cell must be calibrated to form a bounding box that encloses an object, together with the type of the object contained in the bounding box. The same m anchors are applied across each feature map, and different anchors can be defined in different feature maps. If 9 predefined anchors exist for each feature map and 4 feature maps are generated, a total of 36 independent anchors can be defined.
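The anchor arithmetic in this example can be checked with a short sketch. The function names, the 800-pixel image size reused from the earlier resolution example, and the reduction factors 4, 8, 16, and 32 are assumptions chosen for illustration only; the disclosure does not fix them.

```python
from typing import Sequence

# Illustrative sketch only: names and reduction factors are assumptions.


def count_independent_anchors(anchors_per_map: int, num_feature_maps: int) -> int:
    """Each feature map defines its own set of anchors."""
    return anchors_per_map * num_feature_maps


def count_predicted_boxes(image_size: int, reductions: Sequence[int],
                          anchors_per_map: int) -> int:
    """Every grid cell of every feature map predicts one box per anchor."""
    total = 0
    for n in reductions:
        cells_per_side = image_size // n  # a 1/n feature map of a square image
        total += cells_per_side * cells_per_side * anchors_per_map
    return total


print(count_independent_anchors(9, 4))                 # 36
print(count_predicted_boxes(800, [4, 8, 16, 32], 9))   # 478125
```

Under these assumed sizes, 36 independent anchors already produce several hundred thousand candidate bounding boxes per image, which is why removing anchors affects the amount of work done after the head.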

FIG. 5 is a flowchart illustrating an example of a weight reduction method according to an embodiment of the present invention. The weight reduction method according to the present embodiment may be performed by at least one computer device 200 implementing the weight reduction system. For example, the processor 220 of the computer device 200 may be implemented to execute a control instruction according to an operating system code or at least one computer program code included in the memory 210. Here, the processor 220 may control the computer device 200 so that the computer device 200 performs steps 510 to 550 included in the method of FIG. 5 according to a control command provided by a code stored in the computer device 200.

In step 510, the computer device 200 may receive an object detector model. For example, the computer device 200 may receive an original model of an object detector for lightening a head neural network out of an overall structure.

In step 520, the computer device 200 may replace the head neural network of the input object detector model. For example, the computer device 200 may replace the head neural network included in the original model of the object detector with a relatively small head neural network (lightweight head neural network). Since a backbone neural network trained as part of a large model has better feature extraction ability, better performance can be obtained if, in step 520, the computer device 200 modifies the head neural network while maintaining the parameters of the backbone neural network of the pre-trained input model.

In this case, the computer device 200 may replace the head neural network with a lightweight head neural network through at least one of methods (1) to (3) below.

(1) A method of searching for a high-performance, high-efficiency head neural network using neural architecture search (NAS). For example, the computer device 200 may replace the head neural network of the input object detector model with a head neural network found using a NAS technique.

(2) A method of reducing the number of output channels of the convolutional layers constituting the head neural network. For example, the computer device 200 may replace the head neural network by reducing the number of output channels of the convolutional layers constituting the head neural network of the input object detector model.

(3) A method of replacing the convolutional layers constituting the head neural network with a more efficient convolutional layer or block (eg, a shuffle block). For example, the computer device 200 may replace a convolutional layer constituting the head neural network of the input object detector model with another convolutional layer or block.
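To make the effect of method (2) concrete, the following sketch counts the parameters of a stack of standard convolutions before and after the output channels are halved. The layer widths, the 3×3 kernel, the 0.5 ratio, and all function names are assumptions for illustration and do not come from the disclosure.

```python
# Illustrative sketch of method (2): shrinking the output channels of the
# convolutional layers in a head. Layer sizes are made-up examples.


def conv_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weights plus biases of one standard 2-D convolution."""
    return kernel * kernel * in_ch * out_ch + out_ch


def head_params(in_ch: int, widths, kernel: int = 3) -> int:
    """Total parameters of a stack of convolutions with the given widths."""
    total = 0
    for out_ch in widths:
        total += conv_params(in_ch, out_ch, kernel)
        in_ch = out_ch  # the next layer consumes this layer's output
    return total


def scale_widths(widths, ratio: float):
    """Reduce every layer's output channels by the same ratio (at least 1)."""
    return [max(1, int(w * ratio)) for w in widths]


original = [256, 256, 256]
reduced = scale_widths(original, 0.5)  # [128, 128, 128]
print(head_params(256, original), head_params(256, reduced))
```

Because both the input and output widths of the inner layers shrink, the parameter count falls faster than the channel ratio. In a real detector the final prediction layer would keep the output size required by the anchors and object types, so only the intermediate layers would be scaled this way.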

After the head neural network is replaced, the parameters of the replaced head neural network are set to initial values. Therefore, to bring the performance of the model with the replaced head neural network up to that of the original object detector model, the model with the replaced head neural network may be retrained using the training data of the original model. At this time, as in transfer learning, training starts from the parameters of the backbone neural network, which was trained as part of a large model and has good parameters. Either training only the replaced (lightweight) head neural network or updating the backbone neural network together with it is possible.

Among the methods (1) to (3) described above, method (2) of reducing the number of output channels of the convolutional layers is the easiest to use. Experiments confirmed that adjusting the number of channels of the head neural network while maintaining the parameters of the backbone neural network, and then retraining the entire backbone and head neural networks, can improve latency (inference speed) while achieving better performance than the original model.

The object detector model used in the experiment was "Yolo v5". While the parameters of the backbone neural network of the input object detector model were maintained, the head neural network was replaced with a head neural network composed of convolutional layers with fewer output channels. The data used was OCR (Optical Character Reader) data for recognizing string objects in images; the model was trained on 36,939 images and evaluated on 3,000 evaluation samples. The F1 score was used as the performance indicator of the object detector. The performance evaluation results are shown in Table 1 below.

Model	F1 score	Latency (ms)	Number of parameters
Original object detector	0.7945	137.06	36.34
Object detector with head neural network replaced	0.7998	116.02	24.56
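The relative changes implied by Table 1 can be computed directly from the reported values. The sketch below uses only the numbers in the table; the helper name `relative_change` is an assumption for illustration.

```python
# Values copied from Table 1; the helper name is hypothetical.


def relative_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


table_1 = {
    "F1 score": (0.7945, 0.7998),
    "Latency (ms)": (137.06, 116.02),
    "Number of parameters": (36.34, 24.56),
}

for name, (original, replaced) in table_1.items():
    print(f"{name}: {relative_change(original, replaced):+.2f}%")
```

By this arithmetic, the replaced head corresponds to roughly 15% lower latency and roughly 32% fewer parameters, with a slightly higher F1 score.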

In step 530, the computer device 200 may determine whether to perform anchor pruning. Anchor pruning can be performed to further improve inference speed after the head neural network is replaced. For example, by removing predefined anchors in the object detector, anchor pruning can improve inference speed in the following two aspects (1) and (2).

(1) Fewer per-anchor modifiers need to be computed.

(2) Fewer bounding boxes are used in the non-maximum suppression (NMS) process, which compares predicted bounding boxes pairwise to select the best bounding box among overlapping bounding boxes.
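Aspect (2) can be illustrated with a minimal greedy NMS written in plain Python. This is a generic sketch of the well-known procedure, not the implementation used in the embodiment; the box format `(x1, y1, x2, y2)`, the 0.5 threshold, and the returned comparison counter are assumptions added for illustration.

```python
# Minimal greedy NMS sketch. Boxes are (x1, y1, x2, y2) tuples.


def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return (indices of kept boxes, number of IoU comparisons made)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep, comparisons = [], 0
    for i in order:
        suppressed = False
        for j in keep:  # compare against every box kept so far
            comparisons += 1
            if iou(boxes[i], boxes[j]) > iou_threshold:
                suppressed = True
                break
        if not suppressed:
            keep.append(i)
    return keep, comparisons


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # ([0, 2], 2)
```

The number of comparisons grows with the number of candidate boxes, so every anchor removed before NMS reduces this pairwise work.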

Anchor pruning is difficult to use if the total number of predefined anchors is small or if the object detector is anchor-free. However, since most high-performance object detectors have a large number of anchors, anchor pruning can be applied to such detectors.

If it is determined to perform anchor pruning, step 540 may be performed, and if it is determined not to perform anchor pruning, step 550 may be performed.

In step 540, the computer device 200 may perform anchor pruning on the model in which the head neural network is replaced. Before performing anchor pruning, prediction values for all bounding boxes and object types obtained from each anchor can be stored in advance through a validation dataset for the lightweight model. At this time, anchor pruning can be performed in three steps (1) to (3) below.

(1) Anchor importance measurement. For example, the computer device 200 may measure the importance of each anchor.

(2) Removal of a certain percentage (eg, r%) of unimportant anchors. For example, the computer device 200 may remove r% of the anchors based on their importance. Here, r may be a natural number, and the computer device 200 may remove the anchors that fall within the lowest r% of all anchors in terms of importance.

(3) Model retraining. For example, the computer device 200 may retrain the model from which r% of the anchors have been removed.

Anchor importance measurement (1) can be performed through one of the following two methods (a) and (b).

(a) performance-based importance measures

(b) redundant anchor measurement

For the performance-based importance measurement (a), the computer device 200 may remove the output of each independent anchor from the stored per-anchor prediction values and then evaluate performance on the verification data set. At this time, the computer device 200 may measure the extent of performance degradation relative to the performance before removal, regard anchors whose removal causes a relatively large performance degradation as important anchors and anchors whose removal causes a relatively small performance degradation as unimportant anchors, and sort the anchors accordingly. In other words, the computer device 200 may sort the anchors by determining the importance of each independent anchor based on the extent of performance degradation before and after removing the output of each independent anchor.
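A minimal sketch of this ranking is shown below. It assumes that an anchor whose removal causes a larger performance drop is treated as more important, and it takes a hypothetical `evaluate` callback that scores the stored predictions restricted to a given set of anchors. The callback, the function name, and the toy scores are illustrative assumptions.

```python
# Sketch of performance-based importance. `evaluate` is a hypothetical
# callback: it receives the set of active anchors and returns a score
# computed from the stored predictions on the verification data set.


def rank_anchors_by_performance(anchor_ids, evaluate):
    """Return (anchor_id, degradation) pairs, least important first."""
    baseline = evaluate(set(anchor_ids))
    degradation = {}
    for anchor in anchor_ids:
        remaining = set(anchor_ids) - {anchor}
        # How much the score drops when this anchor's output is removed.
        degradation[anchor] = baseline - evaluate(remaining)
    return sorted(degradation.items(), key=lambda item: item[1])


# Toy stand-in for a real evaluation: each anchor contributes a fixed
# number of correct detections (made-up values).
correct_detections = {"a0": 30, "a1": 5, "a2": 15}


def toy_evaluate(active):
    return sum(correct_detections[a] for a in active)


print(rank_anchors_by_performance(list(correct_detections), toy_evaluate))
# [('a1', 5), ('a2', 15), ('a0', 30)]
```

Because the predictions are stored in advance, each evaluation only filters stored outputs and does not require running the model again.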

Regarding redundant anchor measurement (b), an anchor defined in one feature map often predicts almost the same bounding box as anchors in other feature maps. By finding and removing anchors that produce such redundant bounding boxes, the number of unnecessarily predicted bounding boxes can be reduced. The redundancy of the bounding boxes of an anchor may be calculated as a value obtained by dividing the number of anchors whose Intersection over Union (IoU) score with bounding boxes predicted by other anchors is x or more by the number of bounding boxes predicted by one anchor. In this case, the computer device 200 may determine the importance of anchors based on the degree of redundancy of the bounding box predicted by each anchor. Here, the IoU score measures the degree to which two bounding boxes overlap, and can be calculated by dividing the area of the region where the two bounding boxes overlap by the area of the union of the two bounding boxes. The closer the IoU score is to 1, the more nearly identical the two bounding boxes can be regarded, and the closer it is to 0, the more different they can be regarded.

FIG. 6 is a diagram illustrating an example of calculating an IoU score according to an embodiment of the present invention. In FIG. 6, "Score" may correspond to the IoU score, "Area of overlap" may mean the area of the region where the two bounding boxes overlap, and "Area of union" may mean the total area covered by the two bounding boxes. As shown in FIG. 6, "Score" may be calculated as a value obtained by dividing "Area of overlap" by "Area of union".
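The IoU score and one possible reading of the redundancy measure can be sketched as follows. The box format `(x1, y1, x2, y2)` and the function names are assumptions. The redundancy function in particular is only one interpretation of the description: it reports the fraction of an anchor's predicted boxes that are matched, at IoU of at least x, by a box from some other anchor.

```python
# Sketch of the IoU score and a hypothetical redundancy measure.
# Boxes are (x1, y1, x2, y2) tuples.


def iou_score(box_a, box_b):
    """"Area of overlap" divided by "Area of union"."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    overlap = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0


def redundancy(own_boxes, other_boxes, x):
    """Fraction of this anchor's boxes duplicated by other anchors' boxes.

    Assumed interpretation: a box counts as redundant if any box predicted
    by another anchor has an IoU score of x or more with it.
    """
    if not own_boxes:
        return 0.0
    matched = sum(
        1 for box in own_boxes
        if any(iou_score(box, other) >= x for other in other_boxes)
    )
    return matched / len(own_boxes)


own = [(0, 0, 10, 10), (20, 20, 30, 30)]
others = [(0, 0, 10, 10), (100, 100, 110, 110)]
print(iou_score((0, 0, 10, 10), (5, 0, 15, 10)))  # overlap 50, union 150
print(redundancy(own, others, x=0.9))             # 0.5
```

Under this reading, an anchor with redundancy close to 1 predicts boxes that other anchors already cover, which makes it a candidate for removal.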

For the removal of r% of unimportant anchors (2), the computer device 200 may remove, from among the predefined anchors, the anchors in the lowest r% of the sorted order. In this case, the value of r may be preset using the verification data set, in consideration of the improvement in inference speed relative to the performance degradation.

For model retraining (3), to adapt the model to the newly defined anchors after anchor pruning, the computer device 200 may retrain the model using the same training data that was used to train the original object detector.
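The removal step can be sketched as a simple slice over the sorted anchors. The function name is hypothetical, and rounding the number of removed anchors down is an assumption, since the description does not say how a fractional count is handled.

```python
import math

# Sketch of removing the lowest r% of anchors. The input list is assumed
# to be sorted from least important to most important.


def prune_lowest_percent(ranked_anchor_ids, r_percent):
    """Return the anchors that remain after dropping the lowest r%."""
    if not 0 <= r_percent <= 100:
        raise ValueError("r_percent must be between 0 and 100")
    # Assumption: a fractional number of anchors is rounded down.
    n_remove = math.floor(len(ranked_anchor_ids) * r_percent / 100)
    return ranked_anchor_ids[n_remove:]


ranked = list(range(36))  # 36 anchors, index 0 is the least important
kept = prune_lowest_percent(ranked, 25)
print(len(kept))  # 27
```

With 36 anchors and r = 25, nine anchors are removed and 27 remain for retraining.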

In step 550, the computer device 200 may output a lightweight object detector model. When step 540 is omitted, the computer device 200 may output a lightweight object detector model in which the head neural network has been replaced. When step 540 is performed, the computer device 200 may output a lightweight object detector model in which the head neural network has been replaced and on which anchor pruning has been performed.
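The control flow of steps 510 to 550 can be summarized in a short sketch. The three callables and the dictionary used as a stand-in model are hypothetical placeholders; the actual head replacement, pruning decision, and pruning procedure are those described above.

```python
# Sketch of the flow of steps 510 to 550. All callables are hypothetical
# placeholders supplied by the caller.


def lighten_object_detector(model, replace_head, should_prune, prune_anchors):
    """Replace the head, optionally prune anchors, and return the result."""
    model = replace_head(model)        # step 520 (model received in step 510)
    if should_prune(model):            # step 530
        model = prune_anchors(model)   # step 540
    return model                       # step 550


# Toy stand-in: a dictionary plays the role of the object detector model.
toy_model = {"head": "original", "anchors": 36}
light = lighten_object_detector(
    toy_model,
    replace_head=lambda m: {**m, "head": "light"},
    should_prune=lambda m: m["anchors"] > 9,
    prune_anchors=lambda m: {**m, "anchors": 27},
)
print(light)  # {'head': 'light', 'anchors': 27}
```

If `should_prune` returns false, the function returns the model with only the head replaced, which matches the case where step 540 is omitted.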

As such, according to the embodiments of the present invention, it is possible to provide a method and system for reducing the weight of the head neural network of an object detector, which is specialized for weight reduction of the head neural network, rather than the weight reduction of the backbone neural network, which has been mainly studied in the past.

The system or device described above may be implemented as a hardware component or a combination of hardware components and software components. For example, the devices and components described in the embodiments may be implemented using a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to execution of software. For convenience of understanding, the description sometimes refers to a single processing device, but those skilled in the art will understand that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors or a processor and a controller. Other processing configurations are also possible, such as parallel processors.

Software may include a computer program, code, instructions, or a combination of one or more of the foregoing, and may configure a processing device to operate as desired or may command the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed on networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer readable media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer readable medium. The computer readable medium may include program instructions, data files, data structures, etc. alone or in combination. The medium may continuously store programs executable by a computer or temporarily store them for execution or download. In addition, the medium may be various recording means or storage means in the form of a single or combined hardware, but is not limited to a medium directly connected to a certain computer system, and may be distributed on a network. Examples of the medium include magnetic media such as hard disks, floppy disks and magnetic tapes, optical recording media such as CD-ROM and DVD, magneto-optical media such as floptical disks, and ROM, RAM, flash memory, etc. configured to store program instructions. In addition, examples of other media include recording media or storage media managed by an app store that distributes applications, a site that supplies or distributes various other software, and a server. Examples of program instructions include high-level language codes that can be executed by a computer using an interpreter, as well as machine language codes such as those produced by a compiler.

As described above, although the embodiments have been described with reference to limited examples and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results can be achieved even if the described techniques are performed in an order different from the method described, and/or components of the described system, structure, device, circuit, etc. are coupled or combined in a form different from the method described, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims are within the scope of the following claims.

Claims

A weight reduction method performed by a computer device including at least one processor, the method comprising:

receiving, by the at least one processor, an object detector model as an input;

replacing, by the at least one processor, a head neural network of the input object detector model;

determining, by the at least one processor, whether to perform anchor pruning;

performing, by the at least one processor, anchor pruning on the object detector model in which the head neural network has been replaced, when it is determined to perform the anchor pruning; and

outputting, by the at least one processor, a lightweight object detector model.
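
The overall flow recited above can be sketched as follows. This is a minimal illustration only, not the applicant's implementation: `Detector`, `lighten`, and the two callables are hypothetical names that do not appear in the application, and the head replacement and anchor pruning steps are left as pluggable functions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Detector:
    """Toy stand-in for an object detector model (hypothetical structure)."""
    backbone: str
    head: str
    anchors: Tuple[Tuple[int, int], ...]


def lighten(
    model: Detector,
    replace_head: Callable[[Detector], Detector],
    prune_anchors: Callable[[Detector], Detector],
    do_anchor_pruning: bool,
) -> Detector:
    # Replace the head neural network of the input model.
    lightened = replace_head(model)
    # Anchor pruning is optional and is applied to the head-replaced model.
    if do_anchor_pruning:
        lightened = prune_anchors(lightened)
    # Output: the head-replaced model, additionally pruned if pruning was selected.
    return lightened
```

Keeping the two steps as separate callables mirrors the claim structure: the head replacement always runs, while anchor pruning depends on the determination step.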
The weight reduction method of claim 1,

wherein the replacing of the head neural network comprises

reducing the number of output channels of a convolutional layer constituting the head neural network of the input object detector model.
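
As a rough illustration of channel reduction, the sketch below describes a head as a list of convolution specifications and scales the output channels of every layer except the last. The names `ConvSpec`, `shrink_head`, and `width_ratio`, and the choice to leave the final prediction layer's output channels unchanged, are assumptions made for this example and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConvSpec:
    """Shape of one convolutional layer (hypothetical helper type)."""
    in_channels: int
    out_channels: int
    kernel_size: int

    def num_params(self) -> int:
        # Weights (k * k * C_in * C_out) plus one bias per output channel.
        return self.kernel_size ** 2 * self.in_channels * self.out_channels + self.out_channels


def shrink_head(layers: List[ConvSpec], width_ratio: float) -> List[ConvSpec]:
    """Scale the output channels of all but the last layer by width_ratio."""
    shrunk = []
    in_ch = layers[0].in_channels  # fixed by whatever feeds the head
    for i, layer in enumerate(layers):
        is_last = i == len(layers) - 1
        # Assumption: the last layer emits the predictions, so its width is kept.
        out_ch = layer.out_channels if is_last else max(1, int(layer.out_channels * width_ratio))
        shrunk.append(ConvSpec(in_ch, out_ch, layer.kernel_size))
        in_ch = out_ch  # the next layer consumes the reduced channel count
    return shrunk
```

Because each layer's input width follows the previous layer's output width, halving the output channels of intermediate layers reduces their parameter count by more than half.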
The weight reduction method of claim 1,

wherein the replacing of the head neural network comprises

replacing a convolutional layer constituting the head neural network of the input object detector model with another convolutional layer block.
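
The claim does not name a particular replacement block. Purely as an assumed example, the sketch below compares the weight count of a standard convolution with that of a depthwise separable convolution, one commonly used lighter block; the function names are hypothetical and biases are ignored.

```python
def standard_conv_weights(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_weights(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise
```

For a 3 x 3 layer with 256 input and 256 output channels, the standard convolution has 589,824 weights, while the depthwise separable block has 67,840.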
The weight reduction method of claim 1,

wherein the replacing of the head neural network comprises

replacing the head neural network of the input object detector model with a head neural network found using a Neural Architecture Search (NAS) technique.
The weight reduction method of claim 1,

wherein the performing of the anchor pruning comprises:

measuring the importance of anchors;

removing anchors that fall at or below a predetermined ratio in terms of the importance of the anchors; and

retraining the object detector model from which the anchors that fall at or below the predetermined ratio have been removed.
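
The three steps of anchor pruning might be sketched as below. This is an illustration under stated assumptions: anchors are represented as (width, height) tuples, "a predetermined ratio or less" is read as the lowest-ranked share of anchors by importance, and `measure_importance` and `retrain` are hypothetical callables standing in for the measuring and retraining steps.

```python
from typing import Callable, Hashable, List, Sequence, TypeVar

A = TypeVar("A", bound=Hashable)
M = TypeVar("M")


def anchor_pruning(
    anchors: Sequence[A],
    measure_importance: Callable[[A], float],
    retrain: Callable[[List[A]], M],
    ratio: float,
) -> M:
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    # Step 1: measure the importance of every anchor.
    importance = {a: measure_importance(a) for a in anchors}
    # Step 2: remove the least important share of anchors (assumed reading of the ratio).
    ranked = sorted(anchors, key=lambda a: importance[a])
    removed = set(ranked[: int(len(ranked) * ratio)])
    kept = [a for a in anchors if a not in removed]
    # Step 3: retrain the detector using only the remaining anchors.
    return retrain(kept)
```

With four anchors and `ratio=0.5`, the two anchors with the lowest importance scores are dropped before retraining.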
The weight reduction method of claim 5,

wherein the measuring of the importance of the anchors comprises

determining the importance of each individual anchor based on the degree of performance degradation observed between before and after the output of that anchor is removed.
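
One way to picture this measurement is an ablation loop: evaluate the detector once with all anchors active, then once per anchor with that anchor's output removed, and take the drop in the metric as the importance. The sketch below assumes a hypothetical `evaluate` callable that returns a performance score (for example, a detection accuracy metric) given the set of disabled anchors; neither the callable nor the metric is specified in the claim.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Sequence, TypeVar

A = TypeVar("A", bound=Hashable)


def ablation_importance(
    anchors: Sequence[A],
    evaluate: Callable[[FrozenSet[A]], float],
) -> Dict[A, float]:
    """Importance of each anchor = baseline score minus score without that anchor."""
    baseline = evaluate(frozenset())  # performance with every anchor active
    return {a: baseline - evaluate(frozenset([a])) for a in anchors}
```

An anchor whose removal barely changes the score receives an importance close to zero, which makes it a natural candidate for pruning.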
The weight reduction method of claim 5,

wherein the measuring of the importance of the anchors comprises

determining the importance of each anchor based on the redundancy of the bounding boxes predicted by that anchor.
The weight reduction method of claim 7,

wherein the redundancy of the bounding box is calculated based on a value obtained by dividing the number of anchors whose Intersection over Union (IoU) score with the bounding box predicted by a first anchor is equal to or greater than a preset value by the number of bounding boxes predicted by one anchor.
The weight reduction method of claim 8,

wherein the IoU score is calculated based on a value obtained by dividing the area of the region where two bounding boxes predicted by two anchors overlap by the area of the total region (union) covered by the two bounding boxes.
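
The IoU score and a redundancy value built on it can be sketched as follows. The claim wording on redundancy is terse, so the `redundancy` function reflects only one possible reading: count the boxes from other anchors whose IoU with a box of the first anchor meets the threshold, then divide by the number of boxes the first anchor predicted. The corner box format `(x1, y1, x2, y2)`, the function names, and the default threshold of 0.5 are assumptions made for this example.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Overlapping area of two boxes divided by the area of their union."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def redundancy(
    first_boxes: Sequence[Box],
    other_boxes: Sequence[Box],
    threshold: float = 0.5,
) -> float:
    """Overlapping predictions from other anchors per box of the first anchor."""
    if not first_boxes:
        return 0.0
    overlapping = sum(
        1 for fb in first_boxes for ob in other_boxes if iou(fb, ob) >= threshold
    )
    return overlapping / len(first_boxes)
```

For example, the boxes `(0, 0, 2, 2)` and `(1, 1, 3, 3)` overlap in a unit square and cover a union of area 7, so their IoU is 1/7.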
The weight reduction method of claim 1,

wherein the outputting of the lightweight object detector model comprises:

when the anchor pruning is not performed, outputting the object detector model in which the head neural network has been replaced as the lightweight object detector model; and when the anchor pruning is performed, outputting the object detector model in which the head neural network has been replaced and on which the anchor pruning has been performed as the lightweight object detector model.
A computer program stored in a computer-readable recording medium, the computer program being combined with a computer device to cause the computer device to execute the method of any one of claims 1 to 10.
A computer-readable recording medium on which a program for causing a computer device to execute the method of any one of claims 1 to 10 is recorded.
A computer device comprising:

at least one processor implemented to execute instructions readable by the computer device,

wherein the at least one processor is configured to:

receive an object detector model as an input;

replace a head neural network of the input object detector model;

determine whether to perform anchor pruning;

perform, when it is determined to perform the anchor pruning, anchor pruning on the object detector model in which the head neural network has been replaced; and

output a lightweight object detector model.
The computer device of claim 13,

wherein, to replace the head neural network, the at least one processor is configured to:

reduce the number of output channels of a convolutional layer constituting the head neural network of the input object detector model; replace a convolutional layer constituting the head neural network of the input object detector model with another convolutional layer block; or replace the head neural network of the input object detector model with a head neural network found using a Neural Architecture Search (NAS) technique.
The computer device of claim 13,

wherein, to perform the anchor pruning, the at least one processor is configured to:

measure the importance of anchors;

remove anchors that fall at or below a predetermined ratio in terms of the importance of the anchors; and

retrain the object detector model from which the anchors that fall at or below the predetermined ratio have been removed.
The computer device of claim 15,

wherein, to measure the importance of the anchors, the at least one processor is configured to:

determine the importance of each individual anchor based on the degree of performance degradation observed between before and after the output of that anchor is removed, or determine the importance of each anchor based on the redundancy of the bounding boxes predicted by that anchor.