CN113076955A - Target detection method, system, computer equipment and machine readable medium - Google Patents


Info

Publication number
CN113076955A
CN113076955A (application CN202110398900.6A)
Authority
CN
China
Prior art keywords
target
structured
image
target image
human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110398900.6A
Other languages
Chinese (zh)
Inventor
张婷 (Zhang Ting)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yuncong Enterprise Development Co ltd
Original Assignee
Shanghai Yuncong Enterprise Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yuncong Enterprise Development Co ltd filed Critical Shanghai Yuncong Enterprise Development Co ltd
Priority to CN202110398900.6A priority Critical patent/CN113076955A/en
Publication of CN113076955A publication Critical patent/CN113076955A/en
Pending legal-status Critical Current


Classifications

    • G06V10/40 Extraction of image or video features
    • G06F18/23 Clustering techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V2201/07 Target detection

Abstract

The invention provides a target detection method, system, computer device and machine-readable medium. A neural network extracts a plurality of target features from a target image; a plurality of targets are associated based on the extracted features to form a structured target; the classification confidence and regression values of the structured target are then predicted, and the structured target is displayed in the target image according to the prediction result. The method improves target-association accuracy and thereby improves subsequent tracking and recognition. By merging individually detected targets such as the human body, human face and human head into a single structured target, the invention preserves the correlation among the body, head and face targets of the same person, reduces target-association errors in the tracking and recognition stage, and improves the tracking and recognition effect.

Description

Target detection method, system, computer equipment and machine readable medium
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a target detection method, a target detection system, a computer device, and a machine-readable medium.
Background
Structured detection of persons comprises human-body detection, face detection, head detection and so on, and is important for face recognition, crowd counting, and behavior analysis and recognition in the surveillance and security field. At present, structured person detection mainly relies on deep-learning-based detection of each task individually. This approach is easily affected by lighting, weather and occlusion in surveillance scenes with complex backgrounds; moreover, because people are flexible and their posture angles vary, it often happens that a detected head has no corresponding body box, or only one of the detection boxes exists. Because such approaches ignore the structured information of a person, namely the correlation among the face, head and body, the detection effect is degraded to some extent.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide a target detection method, system, computer device and machine-readable medium that solve the above technical problems in the prior art.
To achieve the above and other related objects, the present invention provides a target detection method, comprising the steps of:
extracting a plurality of target features from the target image by using a neural network, and associating a plurality of targets based on the extracted plurality of target features to form a structured target; wherein the plurality of targets belong to the same object;
and predicting the classification confidence of the structured target and the regression value of the structured target, and displaying the structured target in the target image according to the prediction result.
Optionally, the displaying the structured target in the target image according to the prediction result includes:
extracting a regression position of the structured target from the prediction result;
mapping the regression location of the structured target into the target image, and locating and displaying the structured target in the target image.
Optionally, the process of predicting the regression value of the structured target comprises: and combining a plurality of target features extracted from the target image with anchor frame information or anchor point information which is associated in advance in the target image, and predicting a regression value of the structured target according to a combination result.
Optionally, before extracting a plurality of target features from the target image by using the neural network, normalizing the target image is further included.
Optionally, the method further includes adding an FPN (Feature Pyramid Network) structure to the network structure of the neural network, and extracting the plurality of target features from the target image by using the neural network with the added FPN structure.
Optionally, the plurality of objects includes at least a human body, a human head, and a human face.
The invention also provides a target detection system, comprising:
the characteristic extraction module is used for extracting a plurality of target characteristics from the target image by utilizing a neural network;
the association module is used for associating a plurality of targets according to the extracted target features to form a structured target; wherein the plurality of targets belong to the same object;
a structured prediction module for predicting a classification confidence of the structured target and a regression value of the structured target;
and the display module is used for displaying the structured target in the target image according to the prediction result.
Optionally, the displaying module displays the structured target in the target image according to the prediction result, including:
extracting a regression position of the structured target from the prediction result;
mapping the regression location of the structured target into the target image, and locating and displaying the structured target in the target image.
Optionally, the process of predicting the regression value of the structured target by the structured prediction module comprises: combining the plurality of target features extracted from the target image with anchor frame information or anchor point information associated in advance in the target image, and predicting the regression value of the structured target according to the combination result.
Optionally, an FPN (Feature Pyramid Network) structure is added to the network structure of the neural network, and the plurality of target features are extracted from the target image by the neural network with the added FPN structure.
The present invention also provides a computer apparatus comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform a method as in any one of the above.
The invention also provides one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the method as described in any one of the above.
As described above, the present invention provides a target detection method, system, computer device and machine-readable medium with the following advantages. A neural network extracts a plurality of target features from a target image; a plurality of targets are associated based on the extracted features to form a structured target; the classification confidence and regression values of the structured target are predicted, and the structured target is displayed in the target image according to the prediction result. The multiple targets belong to the same object and comprise the body, head and face targets of that object. In the prior art, when each target of a person is detected independently, the correlation among the body, face and head of the same person is ignored, so the structured information of the person is lost. To address this, the invention designs a structured person-detection scheme: a detection neural network is built with deep learning; multiple targets are integrated and associated by improving the target prediction scheme; individual target-label prediction is changed into holistic structured prediction; and anchor (anchor point or anchor frame) association clustering is performed on the combined labels of the multiple targets to build a structured model. Because the structured model incorporates the structural correlation of person targets, a single body target is associated with its corresponding face and head targets, which reduces both missed detections of the several targets of one person and false detections of unrelated targets.
In addition, before a person is tracked and recognized, the body, face and head targets of that person can first be associated, which improves association accuracy and therefore the subsequent tracking and recognition. The embodiment directly merges multiple targets such as body, face and head detection into one structured target for detection, so the correlation among the body, head and face targets of the same person is preserved. This not only eliminates false detections but also simplifies the association operations of the subsequent tracking and recognition stages: for a person under real-time monitoring, once the person's identity is determined, the body, head, face and other information is obtained together, without judging again whether the face and head information belongs to the same person. Person-target association is thereby strengthened, association errors in the tracking and recognition stage are reduced, and the tracking and recognition effect is improved.
Drawings
Fig. 1 is a schematic flowchart of a target detection method according to an embodiment;
Fig. 2 is a schematic flowchart of a target detection method according to another embodiment;
Fig. 3 is a schematic hardware structure diagram of a target detection system according to an embodiment;
Fig. 4 is a diagram of a structured prediction module according to an embodiment;
Fig. 5 is a schematic hardware structure diagram of a terminal device according to an embodiment;
Fig. 6 is a schematic hardware structure diagram of a terminal device according to another embodiment.
Description of the element reference numerals
M10 feature extraction module
M20 association module
M30 structured prediction module
M40 display module
1100 input device
1101 first processor
1102 output device
1103 first memory
1104 communication bus
1200 processing component
1201 second processor
1202 second memory
1203 communication component
1204 power supply component
1205 multimedia component
1206 audio component
1207 input/output interface
1208 sensor component
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the invention; they show only the components related to the invention rather than the actual number, shape and size of components in implementation. In practice, the type, quantity and proportion of components may change freely, and the layout may be more complicated.
Referring to fig. 1, the present invention provides a target detection method, including the following steps:
s100, extracting a plurality of target features from a target image by using a neural network; the method and the device for extracting the target features from the target image can use one neural network to extract a plurality of target features from the target image, and can also use a plurality of neural networks to extract a plurality of target features from the target image; the target image may be a single-frame image or a multi-frame image.
S200, associating a plurality of targets based on the extracted target features to form a structured target; the multiple targets belong to the same object, and the multiple targets at least comprise a human body target, a human head target and a human face target.
S300, predicting the classification confidence of the structured target and the regression value of the structured target, and displaying the structured target in the target image according to the prediction result.
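Step S200 above associates the separate body, head and face detections of one person into a structured target. As an illustrative sketch only (the patent does not specify the association rule), the following snippet groups boxes with a simple center-containment heuristic; the helper names `contains_center` and `associate` are hypothetical:

```python
def contains_center(body, part):
    """True when the center of `part` (x, y, w, h) lies inside `body`."""
    cx, cy = part[0] + part[2] / 2.0, part[1] + part[3] / 2.0
    return (body[0] <= cx <= body[0] + body[2]
            and body[1] <= cy <= body[1] + body[3])

def associate(bodies, heads, faces):
    """Group body/head/face boxes of the same person into structured targets.

    A head or face box is attached to the first body box that contains its
    center; unmatched slots stay None. Returns (body, head, face) tuples.
    """
    structured = []
    for body in bodies:
        head = next((h for h in heads if contains_center(body, h)), None)
        face = next((f for f in faces if contains_center(body, f)), None)
        structured.append((body, head, face))
    return structured
```

In a trained network this grouping would instead emerge from the joint prediction head, but the heuristic shows the intended body-head-face binding.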
Aiming at the above problems, the embodiment of the application designs a structured person-detection scheme: a detection neural network is built with deep learning; multiple targets are integrated and associated by improving the target prediction scheme; prediction of individual target labels is changed into holistic structured prediction; and anchor (anchor point or anchor frame) association clustering is performed on the combined labels of the multiple targets to build a structured model. Because the structured model incorporates the structural correlation of person targets, a single body target is associated with its corresponding face and head targets, which reduces missed detections of the several targets of one person as well as false detections of unrelated targets. In addition, before a person is tracked and recognized, the body, face and head targets of that person can first be associated, which improves association accuracy and therefore the subsequent tracking and recognition.
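The anchor association clustering mentioned above clusters the combined labels of the multiple targets rather than individual boxes. A minimal numpy sketch of that idea, assuming the joint label is represented as the six concatenated width/height values of the body, head and face boxes (the patent does not fix the representation, and `kmeans_anchors` is a hypothetical name):

```python
import numpy as np

def kmeans_anchors(sizes, k, iters=20, seed=0):
    """Naive k-means over structured box sizes to derive associated anchors.

    sizes: (N, 6) array of (w_body, h_body, w_head, h_head, w_face, h_face)
    for each annotated person; returns k cluster centers.
    """
    sizes = np.asarray(sizes, dtype=float)
    rng = np.random.default_rng(seed)
    centers = sizes[rng.choice(len(sizes), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every sample to its nearest center (Euclidean distance)
        dist = np.linalg.norm(sizes[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep empty clusters unchanged
                centers[j] = sizes[labels == j].mean(axis=0)
    return centers
```

Each resulting center is a joint body/head/face size prior, so one anchor slot always carries all three sub-boxes together.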
The embodiment of the application directly merges multiple targets such as body, face and head detection into one structured target for detection, so the correlation among the body, head and face targets of the same person is preserved. This not only eliminates false detections but also simplifies the association operations of the subsequent tracking and recognition stages: for a person under real-time monitoring, once the person's identity is determined, the body, head, face and other information is obtained together, without judging again whether the face and head information belongs to the same person. Person-target association is thereby strengthened, association errors in the tracking and recognition stage are reduced, and the tracking and recognition effect is improved. As an example, take single-person detection: body, face and head detection are different target tasks and can be performed separately, but the three targets belong to, and describe, the same person. The body, head and face of one person can therefore be detected and/or predicted as a single structured target, which is exactly what this embodiment does.
In accordance with the above, in an exemplary embodiment, the process of displaying the structured target in the target image according to the prediction result includes: and extracting regression positions of the multiple targets according to the prediction result, mapping the regression positions of the multiple targets to the target image, and positioning and displaying the structured target in the target image. As an example, for example, regression positions of the human body target, the human head target, and the human face target are extracted according to the prediction result, and then the regression positions of the human body target, the human head target, and the human face target are mapped to the position of the original input image (i.e., the target image), so as to obtain the actual positions of the human body, the human head, and the human face in the original input image, and complete the positioning and displaying of the structured target on the original input image.
According to the above description, the process of predicting the regression value of the structured target comprises: after the structured target is formed, combining the plurality of target features extracted from the target image with the associated anchor frame information or anchor point information in the target image, and then predicting the regression value of the structured target according to the combination result. As an example, for a person A in a single-frame target image, the head, face and body features of person A are first extracted from the image, and the head, face and body targets of person A are associated according to these features to form a structured target. The head frame, face frame and body frame of person A in that frame are then associated to obtain the associated anchor corresponding to person A. Finally, the extracted head, face and body features are combined with the associated anchors in the image, and the regression value of the structured target is predicted from the combination result.
According to the above description, before extracting a plurality of target features from a target image with the neural network, the method further includes normalizing the target image to eliminate its average characteristics. As an example, the target image is acquired by an image acquisition device and normalized, i.e. the mean is subtracted and the result is divided by the standard deviation, so that the average characteristics of the image are removed while its distinguishing characteristics are retained, making the input to the feature extraction module more representative.
According to the above description, the neural network that extracts the target features may use a classical network structure or a custom fully convolutional network as the basic feature extraction layer, which then performs feature extraction on the acquired target image. As an example, the basic feature extraction network may be a VGG network, a network of the ResNet series, or a custom fully convolutional network. The embodiment of the application may also add an FPN (Feature Pyramid Network) structure on top of the neural network and extract the plurality of target features with the FPN-augmented network.
In one embodiment, the method provides a structured person-detection procedure, as shown in fig. 2, comprising:
step S101, image preprocessing. Firstly, a target image is obtained through an image acquisition device, normalization, namely, mean value reduction and variance removal, is carried out on the obtained target image, the average characteristic of the target image is eliminated, the difference characteristic of the target image is reserved, and the input entering an extraction feature module is more representative.
Step S102, image feature extraction. A classical network structure or a custom fully convolutional network is selected as the basic feature extraction layer to extract features from the input target image. The basic feature extraction network may be a VGG network, a network of the ResNet series, or a custom network, and an FPN (Feature Pyramid Network) structure is added on top of the basic network to enhance the composition of the target features. This embodiment extracts image features with a basic fully convolutional framework and retains the depth information of the target image.
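A minimal numpy illustration of the FPN top-down merge mentioned above, with the lateral 1x1 convolutions omitted and channel counts assumed equal across levels (a simplification of the real structure, for intuition only):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x spatial upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(features):
    """Minimal top-down FPN merge.

    `features` is ordered shallow -> deep, each (C, H, W) with H and W
    halving at every level; returns merged maps in the same order.
    """
    merged = [features[-1]]                        # start from deepest map
    for f in reversed(features[:-1]):
        merged.append(f + upsample2x(merged[-1]))  # add upsampled deeper map
    return merged[::-1]
```

The merged shallow maps then carry both fine spatial detail and deep semantic information, which helps detect the small head and face boxes alongside the larger body box.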
Step S103, structured image prediction. The classification confidence of the structured target and the regression value of the structured target are obtained from the extracted features. The regression value of the structured target is generated jointly with the associated anchor information (anchor point or anchor frame information): for a person A in an image, after the structured target of person A is formed, the associated anchor frame or anchor point information corresponding to person A in the image is obtained, the target features extracted from the image are combined with that associated anchor information, and the regression value of the structured target is predicted from the combination result. Compared with the anchor of an individual target, the associated anchor in this embodiment targets a structured multi-target label instead of an individual target-frame label. As an example, the target-frame label of a single body, head or face target is (x, y, w, h), while the multi-target label of a structured target concatenates the three boxes, e.g. (x_b, y_b, w_b, h_b, x_h, y_h, w_h, h_h, x_f, y_f, w_f, h_f). A reasonable threshold is then set, and the regression positions of the body, head and face are extracted at one time according to the threshold and the predicted regression values; that is, predicting the structured target is equivalent to predicting three classes of targets (body, head and face), so the position information of each class is obtained at once, with the result again in the concatenated (x, y, w, h) form for the three boxes.
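The structured regression can be decoded roughly as below. The center-offset/log-size parameterization is an assumption borrowed from common anchor-based detectors; the patent only states that the features are combined with the associated anchor and a confidence threshold is applied:

```python
import numpy as np

def decode_structured(anchors, deltas, scores, thresh=0.5):
    """Decode structured (body, head, face) boxes from associated anchors.

    anchors: (N, 12) three (x, y, w, h) anchor boxes per person slot
    deltas:  (N, 12) predicted offsets (dx, dy, dw, dh) per sub-box, using
             a standard center-offset / log-size encoding (an assumption)
    scores:  (N,)    structured-target classification confidence
    """
    keep = scores >= thresh                       # confidence threshold
    a = anchors[keep].reshape(-1, 3, 4)
    d = deltas[keep].reshape(-1, 3, 4)
    cx = a[..., 0] + a[..., 2] / 2 + d[..., 0] * a[..., 2]
    cy = a[..., 1] + a[..., 3] / 2 + d[..., 1] * a[..., 3]
    w = a[..., 2] * np.exp(d[..., 2])
    h = a[..., 3] * np.exp(d[..., 3])
    boxes = np.stack([cx - w / 2, cy - h / 2, w, h], axis=-1)
    return boxes.reshape(-1, 12), scores[keep]
```

One pass over the 12-value vector thus yields the body, head and face positions together, which is the "extracted at one time" behavior described above.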
Therefore, all regression positions of the output structured target can be directly predicted by combining the associated anchors, and the person structured prediction modeling is facilitated, so that the detection of each part can be improved, and the false detection rate can be reduced.
Step S104, target display. The predicted target position information is mapped back to the original input image (i.e. the target image) to obtain the actual positions of the structured target, completing the positioning and display of the structured target on the original input image.
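Mapping predicted positions back to the original image amounts to undoing the preprocessing resize; a sketch assuming a plain resize with no padding (the patent does not detail the mapping):

```python
import numpy as np

def map_to_original(boxes, net_size, orig_size):
    """Map (x, y, w, h) boxes from network-input coordinates back to the
    original image, assuming a plain resize during preprocessing.

    net_size / orig_size: (width, height) of the network input and the
    original image respectively.
    """
    sx = orig_size[0] / net_size[0]   # width scale factor
    sy = orig_size[1] / net_size[1]   # height scale factor
    boxes = np.asarray(boxes, dtype=float).copy()
    boxes[..., 0::2] *= sx            # scale x and w
    boxes[..., 1::2] *= sy            # scale y and h
    return boxes
```

The same call works for the concatenated 12-value structured box, since the x/w and y/h components alternate throughout the vector.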
As shown in fig. 3 and 4, the present invention also provides a target detection system, including:
a feature extraction module M10, configured to extract a plurality of target features from the target image by using a neural network; one neural network may be used to extract the plurality of target features, or several neural networks may be used, and the target image may be a single-frame or multi-frame image.
The association module M20 is used for associating a plurality of targets according to the extracted target features to form a structured target; wherein the plurality of targets belong to the same object;
a structural prediction module M30, configured to predict the classification confidence of the structural target and the regression value of the structural target;
a display module M40, configured to display the structured target in the target image according to the prediction result.
Aiming at the existing problems, the embodiment of the application designs a personnel structured detection mode, a detection neural network is constructed by adopting deep learning, a plurality of targets are integrated and associated by improving the prediction mode of the targets, the prediction of an independent target label is changed into integral structured prediction, the anchor (namely anchor point or anchor frame) associated clustering is carried out by combining the labels of the targets, and a structured model is established. Due to the fact that the structural model is combined with the structural correlation of the human target, the single human target can be correlated with the human face target and the human head target which are correlated with the human body target, missing detection of a plurality of targets in the same person is reduced, and false detection of irrelevant targets is also reduced. In addition, before tracking and identifying a certain person, the human body target, the human face target and the human head target of the person can be associated firstly, so that the target association accuracy is improved, and the subsequent tracking and identifying effects can be improved. 
The embodiments of the present application directly integrate multiple targets, such as body detection, face detection, and head detection, into one structured target for detection, so that the correlation among the body, head, and face targets of the same person is preserved. This not only eliminates false detections, but the binding of the structured target also simplifies the association operations on the targets in the subsequent tracking and identification stages: for a person under real-time monitoring, the system only needs to determine that person's identity once to obtain the body, head, face, and other information of that person, without re-checking whether the face and head information correspond to the same person. The association of person targets is thereby strengthened, target-association errors in the tracking and identification stages are reduced, and the tracking and identification results are improved. As an example, take structured detection of a single person: detection of the body, face, and head are different target tasks and could be performed separately, but the three targets belong to the same person and carry information about the same person. The body, head, and face of the same person can therefore be detected and/or predicted as one structured target, and the embodiments of the present application detect them as one structured target.
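As a minimal sketch of associating the body, head, and face targets of one person into a single structured target, one simple heuristic, an assumption for illustration and not the patent's association method, is to attach each head and face box to the body box that geometrically contains it:

```python
def box_contains(outer, inner):
    """True if box `inner` (x, y, w, h) lies entirely inside box `outer`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

def associate(bodies, heads, faces):
    """Hypothetical containment heuristic: group each head/face box with
    the body box that contains it, forming one structured target per body."""
    structured = []
    for body in bodies:
        head = next((h for h in heads if box_contains(body, h)), None)
        face = next((f for f in faces if box_contains(body, f)), None)
        structured.append({"body": body, "head": head, "face": face})
    return structured
```

A production system would likely use learned features rather than pure geometry, as the surrounding text describes, but the output shape is the same: one record binding all three parts of the same person.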
In accordance with the above, in an exemplary embodiment, displaying the structured target in the target image according to the prediction result includes: extracting the regression positions of the multiple targets from the prediction result, mapping these regression positions into the target image, and locating and displaying the structured target in the target image. As an example, the regression positions of the body target, head target, and face target are extracted from the prediction result and then mapped to positions in the original input image (i.e., the target image), yielding the actual positions of the body, head, and face in the original input image and completing the locating and display of the structured target on it.
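The mapping step can be sketched as a simple rescaling from the network's input coordinates back to the original image; the function name and the (x, y, w, h) box convention are illustrative assumptions:

```python
def map_to_original(boxes, net_size, orig_size):
    """Scale (x, y, w, h) boxes from network-input coordinates back to the
    original image, so the structured target can be located and drawn on it.
    net_size and orig_size are (width, height) pairs."""
    sx = orig_size[0] / net_size[0]
    sy = orig_size[1] / net_size[1]
    return [(x * sx, y * sy, w * sx, h * sy) for (x, y, w, h) in boxes]
```

For a network fed 100x100 crops of a 200x400 frame, every predicted box is simply stretched by the per-axis ratio before display.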
According to the above description, predicting the regression value of the structured target includes: after the structured target is formed, combining the plurality of target features extracted from the target image with the associated anchors in the target image, i.e., with the associated anchor-frame or anchor-point information in the target image, and then predicting the regression value of the structured target from the combination result. As an example, for a person A in a single-frame target image, the head features, face features, and body features of person A are first extracted from the target image, and the head target, face target, and body target of person A are associated according to these extracted features to form a structured target. The head frame, face frame, and body frame of person A in that frame are then associated to obtain the associated anchor corresponding to person A. Finally, the head, face, and body features extracted from the target image are combined with the associated anchors in the target image, and the regression value of the structured target is predicted from the combination result.
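One common way that anchor-based detectors turn regression outputs into boxes, shown here as a hedged sketch of the idea rather than the patent's exact parameterisation, is to decode the network's (dx, dy, dw, dh) offsets against each associated anchor:

```python
import math

def decode_box(anchor, offsets):
    """Decode one (dx, dy, dw, dh) regression output against its anchor
    (cx, cy, w, h), using the standard anchor-based parameterisation."""
    acx, acy, aw, ah = anchor
    dx, dy, dw, dh = offsets
    cx = acx + dx * aw          # shift centre by a fraction of anchor size
    cy = acy + dy * ah
    w = aw * math.exp(dw)       # scale width/height multiplicatively
    h = ah * math.exp(dh)
    return (cx, cy, w, h)

def decode_structured(assoc_anchors, offsets_per_part):
    """Decode the body/head/face boxes of one structured target against the
    person's associated anchors in a single pass."""
    return {part: decode_box(assoc_anchors[part], off)
            for part, off in offsets_per_part.items()}
```

Zero offsets reproduce the anchor itself, which makes the role of the associated anchor explicit: the regression value refines the person's pre-associated frames rather than free-floating boxes.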
According to the above description, before the plurality of target features are extracted from the target image with a neural network, the method further includes normalizing the target image to eliminate its average characteristics. As an example, the target image is acquired by an image acquisition device and normalized, i.e., the mean is subtracted and the variance is divided out, so that the average characteristics of the target image are eliminated, its difference characteristics are retained, and the input to the feature extraction module is more representative.
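The mean-subtraction and variance-removal step can be written in a few lines; per-channel statistics are an assumption here (per-image or dataset-wide statistics are equally common choices):

```python
import numpy as np

def normalize(image):
    """Subtract the per-channel mean and divide by the per-channel std,
    removing the image's average characteristics while keeping its
    difference characteristics. `image` is an (H, W, C) float array."""
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True)
    return (image - mean) / (std + 1e-8)  # epsilon guards against flat channels
```

After this step every channel has approximately zero mean and unit variance, which is the "more representative" input the text refers to.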
According to the above description, the neural network used to extract the plurality of target features from the target image may use a classical network structure or a custom fully convolutional network as the basic feature extraction layer, which then performs feature extraction on the acquired target image. As an example, the basic feature extraction network may be a VGG network, a ResNet-series network, or the like, or a custom fully convolutional network. The embodiments of the present application may also add an FPN (Feature Pyramid Network) structure on top of the neural network and extract the plurality of target features with the FPN-augmented network.
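The core idea of an FPN, fusing a coarse backbone level into the next finer one via a top-down pathway, can be illustrated with a toy numpy version (a sketch of the mechanism only; real FPNs use learned lateral 1x1 convolutions, which are omitted here):

```python
import numpy as np

def fpn_merge(c_levels):
    """Toy FPN top-down pathway: starting from the coarsest backbone level,
    2x nearest-neighbour upsample each merged level and add it to the next
    finer backbone level. `c_levels` is ordered fine -> coarse."""
    p_levels = [c_levels[-1]]                       # coarsest level passes through
    for c in reversed(c_levels[:-1]):
        up = np.kron(p_levels[0], np.ones((2, 2)))  # 2x nearest upsampling
        p_levels.insert(0, c + up[: c.shape[0], : c.shape[1]])
    return p_levels
```

The finer output levels thus carry both high-resolution detail and coarse semantic context, which is why the FPN structure enriches the extracted target features.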
In one embodiment, the present system provides a person-structured detection method, as shown in fig. 2, comprising:
Step S101, image preprocessing. A target image is first acquired by an image acquisition device and normalized, i.e., the mean is subtracted and the variance is divided out, so that the average characteristics of the target image are eliminated, its difference characteristics are retained, and the input to the feature extraction module is more representative.
Step S102, image feature extraction. A classical network structure or a custom fully convolutional network is selected as the basic feature extraction layer to perform feature extraction on the input target image. The basic feature extraction network may be a VGG (Visual Geometry Group) network, a ResNet-series network, or the like, or a custom network, and an FPN (Feature Pyramid Network) structure is added on top of the basic network structure to enrich the composition of the target features. The embodiments of the present application use this basic fully convolutional framework to extract image features while preserving the depth information of the target image.
Step S103, structured image prediction. The classification confidence of the structured target and the regression value of the structured target are obtained from the extracted features. The regression value of the structured target is generated jointly with the associated anchor information (i.e., anchor-point or anchor-frame information): for a person A in an image, after the structured target of person A is formed, the associated anchor-frame or anchor-point information corresponding to person A in the image is obtained, the plurality of target features extracted from the image is combined with that information, and the regression value of the structured target is predicted from the combination result. In the embodiments of the present application, the associated anchor differs from the anchor of an individual target in that its target becomes a structured multi-target label rather than an individual target-frame label. As an example, in the embodiments of the present application, the target-frame label of a single body, head, or face target is (x, y, w, h), while the multi-target label of a structured target is (x1, y1, w1, h1, x2, y2, w2, h2, x3, y3, w3, h3). A reasonable threshold is then set, and the regression positions of the body, head, and face are extracted in one pass from the set threshold and the predicted regression values; that is, predicting the structured target is equivalent to predicting three types of targets, namely body, head, and face, so the position information of all three types of targets is obtained at once as (x1, y1, w1, h1, x2, y2, w2, h2, x3, y3, w3, h3).
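Splitting the 12-value structured label into its three boxes and applying the confidence threshold can be sketched as follows; the part ordering (body, head, face) within the label is an assumption for illustration:

```python
def split_structured_label(label):
    """Split a 12-value structured label
    (x1, y1, w1, h1, x2, y2, w2, h2, x3, y3, w3, h3)
    into its three component boxes in one pass.
    The body/head/face ordering is a hypothetical convention."""
    assert len(label) == 12
    parts = ("body", "head", "face")
    return {p: tuple(label[4 * i: 4 * i + 4]) for i, p in enumerate(parts)}

def filter_structured(predictions, threshold=0.5):
    """Keep structured targets whose classification confidence clears the
    threshold, yielding all three regression positions at once.
    `predictions` is a list of (confidence, 12-value label) pairs."""
    return [split_structured_label(label)
            for conf, label in predictions if conf >= threshold]
```

One threshold check thus accepts or rejects a whole person, in contrast to per-part detectors where the body, head, and face would each be thresholded independently.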
Therefore, all regression positions of the output structured target can be predicted directly by combining the associated anchors, which facilitates the modeling of person-structured prediction, improves the detection of each part, and reduces the false-detection rate.
Step S104, target display. The predicted target position information is mapped to positions in the original input image (i.e., the target image) to obtain the actual position of the structured target, completing the locating and display of the structured target on the original input image.
The system extracts a plurality of target features from a target image using a neural network; associates a plurality of targets based on the extracted target features to form a structured target; predicts the classification confidence and the regression value of the structured target; and displays the structured target in the target image according to the prediction result. The plurality of targets belong to the same object and comprise the body target, head target, and face target of that object. In the prior art, when person targets are detected independently, the correlation among the body, face, and head of the same person may be ignored, so the structural information of the person is lost. To address this, the system designs a person-structured detection scheme: a detection neural network is built with deep learning, multiple targets are integrated and associated by improving the target prediction scheme, single-target-label prediction is replaced by holistic structured prediction, and anchor (i.e., anchor-point or anchor-frame) association clustering is performed on the combined target labels to establish a structured model. Because the structured model incorporates the structural correlation of person targets, a single person target can be associated with the face target and head target that belong to its body target, which reduces missed detections among the multiple targets of the same person as well as false detections of irrelevant targets.
In addition, before a person is tracked and identified, the body target, face target, and head target of that person can first be associated, which improves target-association accuracy and thus the subsequent tracking and identification results. The embodiments of the present application directly integrate multiple targets, such as body detection, face detection, and head detection, into one structured target for detection, so that the correlation among the body, head, and face targets of the same person is preserved. This not only eliminates false detections, but the binding of the structured target also simplifies the association operations on the targets in the subsequent tracking and identification stages: for a person under real-time monitoring, the system only needs to determine that person's identity once to obtain the body, head, face, and other information of that person, without re-checking whether the face and head information correspond to the same person. The association of person targets is thereby strengthened, target-association errors in the tracking and identification stages are reduced, and the tracking and identification results are improved.
An embodiment of the present application further provides a computer device, which may include: one or more processors; and one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the device to perform the method of fig. 1. In practical applications, the device may serve as a terminal device or as a server. Examples of the terminal device include: a smart phone, a tablet computer, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, a vehicle-mounted computer, a desktop computer, a set-top box, a smart television, a wearable device, and the like.
The present embodiment also provides a non-volatile readable storage medium, in which one or more modules (programs) are stored; when the one or more modules are applied to a device, the device may be caused to execute the instructions of the steps included in the method of fig. 1 according to the present embodiment.
Fig. 5 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.
Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.
Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like.
In this embodiment, the processor of the terminal device includes functions for executing each module of the apparatus in the above embodiments; for specific functions and technical effects, reference may be made to the above embodiments, which are not repeated here.
Fig. 6 is a schematic hardware structure diagram of a terminal device according to another embodiment of the present application. FIG. 6 is a specific embodiment of the implementation of FIG. 5. As shown, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.
The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 1 in the above embodiment.
The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
Optionally, a second processor 1201 is provided in the processing assembly 1200. The terminal device may further include: communication components 1203, power components 1204, multimedia components 1205, audio components 1206, input/output interfaces 1207, and/or sensor components 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
The processing component 1200 generally controls the overall operation of the terminal device. The processing assembly 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the method illustrated in fig. 1 described above. Further, the processing component 1200 can include one or more modules that facilitate interaction between the processing component 1200 and other components. For example, the processing component 1200 can include a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.
The power supply component 1204 provides power to the various components of the terminal device. The power components 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.
The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The audio component 1206 is configured to output and/or input speech signals. For example, the audio component 1206 includes a Microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, audio component 1206 also includes a speaker for outputting voice signals.
The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect an open/closed state of the terminal device, relative positioning of the components, presence or absence of user contact with the terminal device. The sensor assembly 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 1208 may also include a camera or the like.
The communication component 1203 is configured to facilitate communications between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.
As can be seen from the above, the communication component 1203, the audio component 1206, the input/output interface 1207 and the sensor component 1208 in the embodiment of fig. 6 may be implemented as the input device in the embodiment of fig. 5.
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.

Claims (11)

1. A method of target detection, comprising the steps of:
extracting a plurality of target features from the target image by using a neural network, and associating a plurality of targets based on the extracted plurality of target features to form a structured target; wherein the plurality of targets belong to the same object;
and predicting the classification confidence of the structured target and the regression value of the structured target, and displaying the structured target in the target image according to the prediction result.
2. The object detection method of claim 1, wherein displaying the structured object in the object image according to the prediction result comprises:
extracting a regression position of the structured target from the prediction result;
mapping the regression location of the structured target into the target image, and locating and displaying the structured target in the target image.
3. The method of claim 1 or 2, wherein predicting the regression value of the structured target comprises: and combining a plurality of target features extracted from the target image with anchor frame information or anchor point information which is associated in advance in the target image, and predicting a regression value of the structured target according to a combination result.
4. The method of claim 1, further comprising normalizing the target image prior to extracting the plurality of target features from the target image using the neural network.
5. The object detection method according to claim 1, further comprising adding an FPN structure to the network structure of the neural network, and extracting a plurality of object features from the object image using the neural network to which the FPN structure is added.
6. The object detection method according to any one of claims 1 to 5, wherein the plurality of objects includes at least a human body, a human head, and a human face.
7. An object detection system, comprising:
the characteristic extraction module is used for extracting a plurality of target characteristics from the target image by utilizing a neural network;
the association module is used for associating a plurality of targets according to the extracted target features to form a structured target; wherein the plurality of targets belong to the same object;
a structured prediction module for predicting a classification confidence of the structured target and a regression value of the structured target;
and the display module is used for displaying the structured target in the target image according to the prediction result.
8. The object detection system of claim 7, wherein the display module displays the structured object in the object image according to the prediction result, comprising:
extracting a regression position of the structured target from the prediction result;
mapping the regression location of the structured target into the target image, and locating and displaying the structured target in the target image.
9. The object detection system of claim 7 or 8, wherein the process of the structuring module predicting regression values of the structured object comprises: and combining a plurality of target features extracted from the target image with anchor frame information or anchor point information associated in advance in the target image, and predicting a regression value of the structured target according to a combination result.
10. A computer device, comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of any of claims 1-6.
11. One or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the method of any of claims 1-6.
CN202110398900.6A 2021-04-14 2021-04-14 Target detection method, system, computer equipment and machine readable medium Pending CN113076955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110398900.6A CN113076955A (en) 2021-04-14 2021-04-14 Target detection method, system, computer equipment and machine readable medium


Publications (1)

Publication Number Publication Date
CN113076955A true CN113076955A (en) 2021-07-06

Family

ID=76618703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110398900.6A Pending CN113076955A (en) 2021-04-14 2021-04-14 Target detection method, system, computer equipment and machine readable medium

Country Status (1)

Country Link
CN (1) CN113076955A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160217319A1 (en) * 2012-10-01 2016-07-28 The Regents Of The University Of California Unified face representation for individual recognition in surveillance videos and vehicle logo super-resolution system
CN108875577A (en) * 2018-05-11 2018-11-23 深圳市易成自动驾驶技术有限公司 Object detection method, device and computer readable storage medium
CN109886208A (en) * 2019-02-25 2019-06-14 北京达佳互联信息技术有限公司 Method, apparatus, computer equipment and the storage medium of object detection
CN111444850A (en) * 2020-03-27 2020-07-24 北京爱笔科技有限公司 Picture detection method and related device
CN111882582A (en) * 2020-07-24 2020-11-03 广州云从博衍智能科技有限公司 Image tracking correlation method, system, device and medium
CN112200187A (en) * 2020-10-16 2021-01-08 广州云从凯风科技有限公司 Target detection method, device, machine readable medium and equipment
CN112560705A (en) * 2020-12-17 2021-03-26 北京捷通华声科技股份有限公司 Face detection method and device and electronic equipment


Similar Documents

Publication Publication Date Title
CN112200187A (en) Target detection method, device, machine readable medium and equipment
CN111898495B (en) Dynamic threshold management method, system, device and medium
CN111723746A (en) Scene recognition model generation method, system, platform, device and medium
CN111310725A (en) Object identification method, system, machine readable medium and device
CN111340848A (en) Object tracking method, system, device and medium for target area
CN107562356B (en) Fingerprint identification positioning method and device, storage medium and electronic equipment
CN114581998A (en) Deployment and control method, system, equipment and medium based on target object association feature fusion
CN112529939A (en) Target track matching method and device, machine readable medium and equipment
CN111291638A (en) Object comparison method, system, equipment and medium
CN112989210A (en) Insurance recommendation method, system, equipment and medium based on health portrait
CN108960213A (en) Method for tracking target, device, storage medium and terminal
CN111260697A (en) Target object identification method, system, device and medium
CN115623336B (en) Image tracking method and device for hundred million-level camera equipment
CN113076955A (en) Target detection method, system, computer equipment and machine readable medium
CN115409869A (en) Snow field trajectory analysis method and device based on MAC tracking
CN111626369B (en) Face recognition algorithm effect evaluation method and device, machine readable medium and equipment
CN112417197B (en) Sorting method, sorting device, machine readable medium and equipment
CN112347982A (en) Video-based unsupervised difficult case data mining method, device, medium and equipment
CN111639705B (en) Batch picture marking method, system, machine readable medium and equipment
CN112596846A (en) Method and device for determining interface display content, terminal equipment and storage medium
CN112150685A (en) Vehicle management method, system, machine readable medium and equipment
CN112580472A (en) Rapid and lightweight face recognition method and device, machine readable medium and equipment
CN111753852A (en) Tea leaf identification method, recommendation method, tea leaf identification device, equipment and medium
CN111079662A (en) Figure identification method and device, machine readable medium and equipment
CN116468883B (en) High-precision image data volume fog recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination