CN112417205A - Target retrieval device and method and electronic equipment - Google Patents

Target retrieval device and method and electronic equipment

Info

Publication number
CN112417205A
CN112417205A (application CN201910767234.1A)
Authority
CN
China
Prior art keywords
detection result
person
detection
attribute
input images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910767234.1A
Other languages
Chinese (zh)
Inventor
尹汭
谭志明
丁蓝
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201910767234.1A
Counterpart application JP2020092444A, granted as JP7491057B2
Publication of CN112417205A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783: Retrieval using metadata automatically derived from the content
    • G06F 16/7837: Retrieval using objects detected or recognised in the video content
    • G06F 16/784: Retrieval where the detected or recognised objects are people
    • G06F 16/73: Querying
    • G06F 16/735: Filtering based on additional data, e.g. user or group profiles


Abstract

Embodiments of the invention provide a target retrieval device and method, and electronic equipment. The device comprises: a first detection unit configured to perform object detection on each of a plurality of input images to obtain object detection results of the plurality of input images; a second detection unit configured to detect attributes of a person based on the object detection results of the plurality of input images to obtain attribute detection results; a third detection unit configured to detect the behavior of the person according to the object detection results and the attribute detection results to obtain a behavior detection result; and a retrieval unit configured to perform target retrieval according to the object detection results, the attribute detection results and the behavior detection result to obtain a target retrieval result.

Description

Target retrieval device and method and electronic equipment
Technical Field
The invention relates to the field of information technology.
Background
Target retrieval is an important application in video surveillance. Targets with specified characteristics or functions can be quickly found by using the technology. For example, this technique can be used to locate criminals, or to locate missing children and the elderly, etc.
In a conventional target retrieval method, features of the items carried by a person or of the person's motion are generally extracted from an image, and target retrieval is performed based on these features.
It should be noted that the above background description is provided only to clarify and completely describe the technical solutions of the present invention and to aid the understanding of those skilled in the art. These solutions are not to be considered known to a person skilled in the art merely because they are set forth in this background section.
Disclosure of Invention
However, in the above conventional target retrieval method, the features used for target retrieval are limited, resulting in low retrieval efficiency and low retrieval accuracy; moreover, the types of features used for target retrieval are fixed, so the method cannot flexibly adapt to different retrieval requirements.
Embodiments of the invention provide a target retrieval device and method, and electronic equipment. Object detection is performed first; attribute detection of a person is performed according to the object detection results; behavior detection of the person is performed according to the attribute detection results; and finally target retrieval is performed according to all of these detection results. Because the object detection results, the attribute detection results and the behavior detection results are integrated during target retrieval, that is, rich multidimensional features are combined, rapid and accurate target retrieval can be achieved. In addition, since the types of attributes detected in the attribute detection of a person can be determined according to actual needs, the device has good expandability and customizability.
According to a first aspect of embodiments of the present invention, there is provided a target retrieval apparatus, the apparatus comprising: a first detection unit configured to perform object detection on each of a plurality of input images to obtain object detection results of the plurality of input images; a second detection unit configured to detect attributes of a person based on the object detection results of the plurality of input images to obtain attribute detection results; a third detection unit configured to detect the behavior of the person according to the object detection results and the attribute detection results to obtain a behavior detection result; and a retrieval unit configured to perform target retrieval according to the object detection results, the attribute detection results and the behavior detection result to obtain a target retrieval result.
According to a second aspect of embodiments of the present invention, there is provided an electronic device comprising the apparatus according to the first aspect of embodiments of the present invention.
According to a third aspect of embodiments of the present invention, there is provided a target retrieval method, the method comprising: performing object detection on each of a plurality of input images to obtain object detection results of the plurality of input images; detecting attributes of a person according to the object detection results of the plurality of input images to obtain attribute detection results; detecting the behavior of the person according to the object detection results and the attribute detection results to obtain a behavior detection result; and performing target retrieval according to the object detection results, the attribute detection results and the behavior detection result to obtain a target retrieval result.
The beneficial effects of the invention are as follows: object detection is performed first; attribute detection of a person is performed according to the object detection results; behavior detection of the person is performed according to the attribute detection results; and finally target retrieval is performed according to all of these detection results. Because the object detection results, the attribute detection results and the behavior detection results are integrated during target retrieval, that is, rich multidimensional features are combined, rapid and accurate target retrieval can be achieved.
Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not so limited in scope. The embodiments of the invention include many variations, modifications and equivalents within the spirit and scope of the appended claims.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments, in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps or components.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
Fig. 1 is a schematic diagram of a target retrieval apparatus according to embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of an object detection result of an input image according to embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of a method of motion detection of a person according to embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of a detection result of key points of a human body in embodiment 1 of the present invention;
Fig. 5 is a schematic diagram of the third detection unit 103 according to embodiment 1 of the present invention;
Fig. 6 is a schematic diagram of a target retrieval result according to embodiment 1 of the present invention;
Fig. 7 is a schematic diagram of an electronic device according to embodiment 2 of the present invention;
Fig. 8 is a schematic block diagram of a system configuration of an electronic device according to embodiment 2 of the present invention;
Fig. 9 is a schematic diagram of a target retrieval method according to embodiment 3 of the present invention.
Detailed Description
The foregoing and other features of the invention will become apparent from the following description taken in conjunction with the accompanying drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the embodiments in which the principles of the invention may be employed, it being understood that the invention is not limited to the embodiments described, but, on the contrary, is intended to cover all modifications, variations, and equivalents falling within the scope of the appended claims.
Example 1
The embodiment of the invention provides a target retrieval device. Fig. 1 is a schematic diagram of a target search apparatus according to embodiment 1 of the present invention.
As shown in fig. 1, the object retrieval apparatus 100 includes:
a first detection unit 101, configured to perform object detection on each of the plurality of input images to obtain object detection results of the plurality of input images;
a second detection unit 102 configured to perform attribute detection of a person based on object detection results of a plurality of input images, and obtain an attribute detection result;
a third detection unit 103, configured to perform behavior detection on a person according to the object detection result and the attribute detection result, so as to obtain a behavior detection result; and
and the retrieval unit 104 is configured to perform target retrieval according to the object detection result, the attribute detection result, and the behavior detection result to obtain a target retrieval result.
It can be seen from the above embodiment that object detection is performed first, attribute detection of a person is performed according to the object detection results, behavior detection of the person is performed according to the attribute detection results, and finally target retrieval is performed according to all of these detection results. Because the object detection results, the attribute detection results and the behavior detection results are integrated during target retrieval, that is, rich multidimensional features are combined, rapid and accurate target retrieval can be achieved. In addition, since the types of attributes detected in the attribute detection of a person can be determined according to actual needs, the device has good expandability and customizability.
In this embodiment, the input image may be an image obtained in real time or obtained in advance. For example, the input images are video images captured by the monitoring device, each input image corresponds to one frame of the video image, and the plurality of input images may be a plurality of consecutive frames.
In this embodiment, the first detection unit 101 performs object detection on each of the plurality of input images, and obtains object detection results of the plurality of input images.
In the present embodiment, the object may include a person, a car, a bus, a truck, a bicycle, a motorcycle, various animals, and the like.
In this embodiment, the first detection unit 101 may perform detection based on various object detection methods, such as Faster R-CNN, FPN or YOLO networks.
In this embodiment, different networks may be used according to different requirements; for example, a YOLO network may be used when a high processing speed is required, and a Faster R-CNN network may be used when high recognition accuracy is required.
The first detection unit 101 detects each of the plurality of input images and obtains the object detection results of the plurality of input images, that is, the respective objects identified by bounding boxes in each input image.
Fig. 2 is a schematic diagram of an object detection result of an input image according to embodiment 1 of the present invention. As shown in fig. 2, a bounding box of a person to be detected is marked in the input image.
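By way of illustration only, the processing of the first detection unit can be sketched in Python as follows. The mock detector, the function names and the confidence threshold are assumptions for illustration, not part of the invention; a real detector such as YOLO or Faster R-CNN would take its place.

```python
# Hypothetical sketch of the first detection unit: run a detector on each
# input image and keep per-image object detection results as bounding boxes.

def detect_objects(image, detector):
    """Return a list of (label, confidence, bbox) for one input image."""
    detections = detector(image)
    # Keep only sufficiently confident detections (threshold is an assumption).
    return [d for d in detections if d[1] >= 0.5]

def detect_all(images, detector):
    """Object detection results for a plurality of input images."""
    return [detect_objects(img, detector) for img in images]

# Mock detector standing in for a real network such as YOLO.
def mock_detector(image):
    return [("person", 0.92, (10, 20, 50, 120)),
            ("bicycle", 0.40, (0, 0, 30, 30))]

results = detect_all(["frame0", "frame1"], mock_detector)
print(results[0])   # only the confident "person" box survives
```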
In this embodiment, as shown in fig. 1, the apparatus 100 may further include:
a fourth detection unit 105, configured to perform tracking detection of a person on the plurality of input images, and determine an Identification (ID) of the person in the plurality of input images.
For example, the fourth detection unit 105 determines the identification of the person in the plurality of input images from at least one of the motion trajectory of the person in the plurality of input images and the features of the plurality of input images.
For example, the DeepSORT method may be used for tracking detection of persons. Based on the motion trajectories of persons in the plurality of input images and the features of the plurality of input images, it describes the motion of a person in time (motion trajectory) and space (features extracted by convolution), and can effectively overcome the influence of factors such as occlusion and changes in human appearance on the detection results.
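A much-simplified, hypothetical sketch of assigning identifications across frames is shown below. Unlike DeepSORT, it matches only by bounding-box centers rather than by appearance features; all function names and the distance threshold are assumptions for illustration.

```python
import math

def center(bbox):
    x, y, w, h = bbox
    return (x + w / 2.0, y + h / 2.0)

def assign_ids(frames, max_dist=50.0):
    """Greedy nearest-center tracker. Returns, per frame, a list of
    (person_id, bbox); a new ID is issued when no previous box is close."""
    next_id = 0
    prev = []   # (person_id, bbox) pairs from the previous frame
    out = []
    for boxes in frames:
        cur = []
        for b in boxes:
            best, best_d = None, max_dist
            for pid, pb in prev:
                d = math.dist(center(b), center(pb))
                if d < best_d and pid not in [c[0] for c in cur]:
                    best, best_d = pid, d
            if best is None:
                best = next_id
                next_id += 1
            cur.append((best, b))
        out.append(cur)
        prev = cur
    return out

tracks = assign_ids([[(10, 10, 40, 80)], [(14, 12, 40, 80)]])
print(tracks)  # the same person keeps ID 0 across both frames
```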
In the present embodiment, the second detection unit 102 performs attribute detection of a person based on object detection results of a plurality of input images, resulting in an attribute detection result. For example, the second detection unit performs attribute detection of the person based on the bounding box of the person in the object detection result.
In detection, the second detection unit 102 detects the attributes of a person based on the object detection result of each of the plurality of input images, that is, based on the bounding box of the person in each input image.
In this embodiment, the type of the attribute of the person detected by the second detection unit 102 may be determined according to actual needs, i.e. the functionality of the second detection unit 102 is expandable and customizable.
For example, the attribute detection of a person comprises at least one of the following: motion detection of the person; detection of the person's pedestrian items; age detection of the person; gender detection of the person; and expression detection of the person.
In this embodiment, the motion detection of the person may be performed based on the key points.
Fig. 3 is a schematic diagram of a method for detecting human motion according to embodiment 1 of the present invention. As shown in fig. 3, the method includes:
step 301: detecting key points of the person in the detected boundary frame of the person;
step 302: calculating the characteristics of the person according to the detected key points of the person; and
step 303: according to the features of the person, torso movements, upper limb movements and head movements of the person are detected based on the classifier.
In step 301, the key points of the human body can be detected by various methods, for example based on a Cascaded Pyramid Network (CPN). Alternatively, the detection may be performed by a method such as OpenPose or AlphaPose.
In the present embodiment, the key points of the human body may include a plurality of points respectively representing positions where a plurality of parts of the human body are located, for example, points respectively representing two ears, two eyes, a nose, two shoulders, two elbows, two wrists, two hips, two knees, and two ankles of the human body.
Fig. 4 is a schematic diagram of a detection result of a key point of a human body in embodiment 1 of the present invention. As shown in fig. 4, in the bounding box of one human body, key points representing respective parts of the human body are detected by the CPN and position information of the key points can be output.
In step 302, a feature of the person is calculated according to the detected key points of the person, for example, the feature of the human body may include: two-dimensional coordinates of a plurality of points respectively representing positions of a plurality of parts of the human body; and at least one angle between the connecting lines of the plurality of points.
In this embodiment, the features of the human body to be calculated may be determined according to actual needs.
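By way of illustration, the feature computation of step 302 might be sketched as follows. The keypoint names and the choice of the elbow angle are assumptions for illustration; any angle between connecting lines of the keypoints could be computed the same way.

```python
import math

def joint_angle(a, b, c):
    """Angle at point b (degrees) between lines b->a and b->c,
    e.g. the elbow angle for (shoulder, elbow, wrist) keypoints."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

def person_features(keypoints):
    """Concatenate the 2-D coordinates of the keypoints with one
    illustrative joint angle. `keypoints` maps part names to (x, y)."""
    coords = [v for part in sorted(keypoints) for v in keypoints[part]]
    angle = joint_angle(keypoints["shoulder"], keypoints["elbow"],
                        keypoints["wrist"])
    return coords + [angle]

kp = {"shoulder": (0.0, 0.0), "elbow": (0.0, 1.0), "wrist": (1.0, 1.0)}
print(person_features(kp)[-1])  # 90.0: a right-angled elbow
```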
In step 303, torso motions, upper limb motions, and head motions of the person are detected based on the classifier according to the characteristics of the person.
In the present embodiment, the torso motion of the human body may be detected by various classifiers, for example a Multi-Layer Perceptron (MLP) classifier. Performing the detection with an MLP classifier on the calculated features can yield good detection performance.
In the present embodiment, head movements and upper limb movements of the human body, for example raising the head, lowering the head and raising a hand, may be detected based on preset rules. Preset rules can be set for different actions according to actual needs; for example, when both ears are higher than both eyes, the head is judged to be lowered, and when the wrist is higher than the elbow, the hand is judged to be raised.
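The preset rules above can be illustrated with a hypothetical sketch. Image coordinates are assumed to grow downward (smaller y means higher in the image), and the keypoint names are assumptions for illustration.

```python
def head_down(keypoints):
    """Rule from the description: if both ears are higher than both eyes,
    the head is judged to be lowered (smaller y = higher in the image)."""
    ears = [keypoints["left_ear"][1], keypoints["right_ear"][1]]
    eyes = [keypoints["left_eye"][1], keypoints["right_eye"][1]]
    return max(ears) < min(eyes)

def hand_raised(keypoints, side="left"):
    """Rule from the description: the wrist is higher than the elbow."""
    return keypoints[side + "_wrist"][1] < keypoints[side + "_elbow"][1]

kp = {"left_ear": (10, 40), "right_ear": (30, 41),
      "left_eye": (14, 50), "right_eye": (26, 50),
      "left_wrist": (5, 60), "left_elbow": (8, 90)}
print(head_down(kp), hand_raised(kp))  # True True
```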
In this embodiment, in pedestrian item detection, the type and/or attributes of the person's items may be detected within the bounding box of the detected person. For example, a YOLO network may be used for pedestrian item detection.
In this embodiment, the pedestrian articles may include various types of clothing, carry-on articles, accessories, and the like. The attribute of the pedestrian article may be various attributes of the article, for example, the color of the clothing.
In this embodiment, the detection of the age of the person, the detection of the gender of the person, and the detection of the expression of the person can all use the existing detection methods, and the details are not repeated herein.
After the second detection unit 102 obtains the attribute detection result, the third detection unit 103 performs behavior detection of the person according to the object detection result and the attribute detection result to obtain a behavior detection result.
Fig. 5 is a schematic diagram of the third detecting unit 103 according to embodiment 1 of the present invention. As shown in fig. 5, the third detection unit 103 includes:
a fusion unit 501 for fusing the object detection result and the attribute detection result; and
a determining unit 502, configured to determine a behavior of the person according to the fused detection result and a preset rule, so as to obtain a behavior detection result.
In this embodiment, the fusion unit 501 fuses the object detection result and the attribute detection result, for example, the attribute detection result includes the human motion detection result, and the fusion unit 501 temporally fuses the human motion detection result and the object detection result. The determining unit 502 determines the behavior of the person according to the fused detection result and a preset rule to obtain a behavior detection result.
For example, the motion detection result of a person is that the person continuously performs a sitting motion, and the object detection result is that a bicycle is detected in the leg region of the person. The fusion unit 501 fuses these detection results, and the obtained feature may be: the person continuously performs a sitting motion near a bicycle. The determining unit 502 may then determine from the fused result that the behavior of the person is "riding a bicycle".
For another example, the motion detection result of a person is that the person continuously performs a walking motion, and the object detection result is that a dog is detected in the vicinity of the person. The fusion unit 501 fuses these detection results, and the obtained feature may be: the person continuously walks in the vicinity of a dog. The determining unit 502 may then determine from the fused result that the behavior of the person is "walking a dog".
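By way of illustration, the fusion and rule-based determination of the two examples above might be sketched as follows; the rule table and field names are assumptions for illustration, not part of the invention.

```python
def fuse(action_result, object_result):
    """Fuse the per-person action detection result with the nearby
    objects from the object detection result."""
    return {"action": action_result, "nearby": object_result}

RULES = [
    # (required action, required nearby object, inferred behavior)
    ("sitting", "bicycle", "riding a bicycle"),
    ("walking", "dog", "walking a dog"),
]

def infer_behavior(fused):
    """Determine the behavior from the fused result and preset rules."""
    for action, obj, behavior in RULES:
        if fused["action"] == action and obj in fused["nearby"]:
            return behavior
    return "unknown"

print(infer_behavior(fuse("sitting", ["bicycle"])))  # riding a bicycle
print(infer_behavior(fuse("walking", ["dog"])))      # walking a dog
```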
In this embodiment, as shown in fig. 1, the apparatus 100 may further include:
a storage unit 106 configured to store, for each input image, the object detection result, the attribute detection result and the behavior detection result in correspondence with the identification of the person.
for example, each input image is each frame of a video, and various detection results are stored for each frame.
In the storage content corresponding to one input image, the object detection result, the attribute detection result, and the behavior detection result are stored corresponding to the identification of the person. For example, the stored content corresponding to the first frame (frame1) includes: the position, motion, pedestrian items, behavior, etc. of the bounding box corresponding to the person whose ID is 0; the position, motion, pedestrian item, behavior, etc. of the bounding box corresponding to the person with ID 1.
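A hypothetical sketch of such per-frame storage keyed by person identification is shown below; the field names and values are assumptions for illustration.

```python
# Stored content: frame -> person ID -> detection results for that person.
store = {
    "frame1": {
        0: {"bbox": (10, 20, 50, 120), "action": "sitting",
            "items": ["red jacket"], "behavior": "riding a bicycle"},
        1: {"bbox": (200, 30, 45, 110), "action": "walking",
            "items": ["backpack"], "behavior": "walking a dog"},
    },
}

def record(store, frame, person_id, bbox, action, items, behavior):
    """Store all detection results in correspondence with the person's ID."""
    store.setdefault(frame, {})[person_id] = {
        "bbox": bbox, "action": action,
        "items": items, "behavior": behavior}

record(store, "frame2", 0, (12, 22, 50, 120), "sitting",
       ["red jacket"], "riding a bicycle")
print(sorted(store))  # ['frame1', 'frame2']
```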
In the present embodiment, the retrieval unit 104 searches the content stored in the storage unit 106 according to the retrieval target and obtains a target retrieval result.
For example, if the search target is a person having an ID of 1, all search results of the person having an ID of 1 can be quickly searched in the stored content.
For example, if the search target is a person running with a red jacket, the stored content is searched for among the stored detection results based on the feature, and all the search results matching the feature can be quickly searched for.
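By way of illustration, a feature-based search over the stored content might be sketched as follows; the matching logic and field names are assumptions for illustration. It assumes the per-frame storage structure described above.

```python
def search(store, **criteria):
    """Return (frame, person_id) pairs whose stored detection results
    match every given criterion; list-valued fields match by membership."""
    hits = []
    for frame, people in store.items():
        for pid, rec in people.items():
            ok = True
            for key, want in criteria.items():
                have = rec.get(key)
                if isinstance(have, list):
                    ok = ok and want in have
                else:
                    ok = ok and have == want
            if ok:
                hits.append((frame, pid))
    return hits

store = {"frame1": {0: {"action": "running", "items": ["red jacket"]},
                    1: {"action": "walking", "items": ["backpack"]}}}
# e.g. find the person running in a red jacket
print(search(store, action="running", items="red jacket"))
```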
In this embodiment, as shown in fig. 1, the apparatus 100 may further include:
a display unit 107 for displaying the target retrieval result in at least one of the plurality of input images.
Fig. 6 is a schematic diagram of a target retrieval result according to embodiment 1 of the present invention. As shown in fig. 6, if the retrieval target is a standing person wearing pink short sleeves, the bounding box of the person matching the retrieval target is identified in the input image.
In addition, when the plurality of input images are a plurality of consecutive frames of a video, the identified retrieval target may be displayed continuously in the respective frames by playing the video or by dragging the progress bar below the images. A retrieval target may also be set and displayed on the right side of the displayed image, and selected by clicking.
It can be seen from the above embodiment that object detection is performed first, attribute detection of a person is performed according to the object detection results, behavior detection of the person is performed according to the attribute detection results, and finally target retrieval is performed according to all of these detection results. Because the object detection results, the attribute detection results and the behavior detection results are integrated during target retrieval, that is, rich multidimensional features are combined, rapid and accurate target retrieval can be achieved. In addition, since the types of attributes detected in the attribute detection of a person can be determined according to actual needs, the device has good expandability and customizability.
Example 2
An embodiment of the present invention further provides an electronic device, and fig. 7 is a schematic diagram of an electronic device in embodiment 2 of the present invention. As shown in fig. 7, the electronic device 700 includes a target retrieval apparatus 701, and the structure and function of the target retrieval apparatus 701 are the same as those described in embodiment 1, and are not described herein again.
Fig. 8 is a schematic block diagram of a system configuration of an electronic device according to embodiment 2 of the present invention. As shown in fig. 8, the electronic device 800 may include a central processor 801 and a memory 802; the memory 802 is coupled to the central processor 801. The figure is exemplary; other types of structures may be used in addition to or in place of this structure to implement telecommunications or other functions.
As shown in fig. 8, the electronic device 800 may further include: an input unit 803, a display 804, a power supply 805.
In one embodiment, the functions of the target retrieval apparatus described in example 1 may be integrated into the central processor 801. Among other things, the central processor 801 may be configured to: respectively carrying out object detection on a plurality of input images to obtain object detection results of the plurality of input images; detecting the attribute of the person according to the object detection results of the plurality of input images to obtain an attribute detection result; detecting the behavior of the person according to the object detection result and the attribute detection result to obtain a behavior detection result; and performing target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result.
For example, the central processor 801 may also be configured to: performing tracking detection of people on the plurality of input images, and determining identification of people in the plurality of input images.
For example, the tracking detection of the person on the plurality of input images includes: determining an identity of a person in the plurality of input images from at least one of a motion trajectory of the person in the plurality of input images and features of the plurality of input images.
For example, the central processor 801 may also be configured to: store, for each input image, the object detection result, the attribute detection result and the behavior detection result in correspondence with the identification of the person; and the performing of target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result includes: searching in the stored content to obtain the target retrieval result.
For example, the central processor 801 may also be configured to: displaying the target retrieval result in at least one of the plurality of input images.
For example, the detecting of the attributes of the person from the object detection results of the plurality of input images includes: detecting the attributes of the person according to the bounding box of the person in the object detection result.
For example, the detecting the behavior of the person according to the object detection result and the attribute detection result to obtain a behavior detection result includes: fusing the object detection result and the attribute detection result; and determining the behavior of the person according to the fused detection result and a preset rule to obtain the behavior detection result.
For example, the attribute detection of a person comprises at least one of the following: motion detection of the person; detection of the person's pedestrian items; age detection of the person; gender detection of the person; and expression detection of the person.
In another embodiment, the target retrieval device described in embodiment 1 may be configured separately from the central processor 801; for example, the target retrieval device may be configured as a chip connected to the central processor 801, and its functions are realized under the control of the central processor 801.
It is not necessary that the electronic device 800 in this embodiment include all of the components shown in fig. 8.
As shown in fig. 8, the central processor 801, sometimes referred to as a controller or operation controller, may include a microprocessor or other processor device and/or logic device; the central processor 801 receives inputs and controls the operation of the various components of the electronic device 800.
The memory 802, for example, may be one or more of a buffer, a flash memory, a hard drive, a removable media, a volatile memory, a non-volatile memory, or other suitable device. And the central processor 801 can execute the program stored in the memory 802 to realize information storage or processing, or the like. The functions of other parts are similar to the prior art and are not described in detail here. The components of electronic device 800 may be implemented in dedicated hardware, firmware, software, or combinations thereof, without departing from the scope of the invention.
It can be seen from the above embodiments that object detection is performed first, attribute detection of a person is then performed according to the object detection result, behavior detection of the person is performed according to the attribute detection result, and target retrieval is finally performed according to the above detection results. Since the object detection result, the attribute detection result, and the behavior detection result are integrated at the time of target retrieval, that is, rich multidimensional features are integrated for target retrieval, rapid and accurate target retrieval can be achieved. In addition, in the attribute detection of a person, the types of attributes to be detected can be determined according to actual needs, so the method has good extensibility and customizability.
Example 3
An embodiment of the invention also provides a target retrieval method, which corresponds to the target retrieval device of embodiment 1. Fig. 9 is a schematic diagram of the target retrieval method according to embodiment 3 of the present invention. As shown in fig. 9, the method includes:
step 901: performing object detection on each of a plurality of input images to obtain object detection results of the plurality of input images;
step 902: detecting the attribute of the person according to the object detection results of the plurality of input images to obtain an attribute detection result;
step 903: detecting the behavior of the person according to the object detection result and the attribute detection result to obtain a behavior detection result; and
step 904: performing target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result.
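Steps 901-904 can be sketched as a sequential pipeline. The stub detectors below, their names, and their return formats are assumptions for illustration only; real implementations would call trained models:

```python
def detect_objects(images):
    # step 901: per-image object detection (stub returning fixed detections)
    return [{"person_bboxes": [(0, 0, 50, 100)], "objects": ["bag"]}
            for _ in images]

def detect_attributes(images, object_results):
    # step 902: attribute detection of the person, driven by the object results
    return [{"action": "standing"} for _ in object_results]

def detect_behaviors(object_results, attribute_results):
    # step 903: behavior detection from the fused object + attribute results
    return ["leaving_item" if "bag" in o["objects"] and a["action"] == "standing"
            else "normal"
            for o, a in zip(object_results, attribute_results)]

def retrieve(query, object_results, attribute_results, behavior_results):
    # step 904: return indices of frames matching a multidimensional query
    return [i for i, (o, a, b) in
            enumerate(zip(object_results, attribute_results, behavior_results))
            if query(o, a, b)]

images = ["frame0", "frame1"]
objs = detect_objects(images)
attrs = detect_attributes(images, objs)
behs = detect_behaviors(objs, attrs)
hits = retrieve(lambda o, a, b: b == "leaving_item", objs, attrs, behs)
```

The query at the end may combine any of the three result dimensions, which is how the integrated multidimensional features enable retrieval by object, attribute, behavior, or a combination of them.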
In this embodiment, the specific implementation method of the above steps is the same as that described in embodiment 1, and is not repeated here.
It can be seen from the above embodiments that object detection is performed first, attribute detection of a person is then performed according to the object detection result, behavior detection of the person is performed according to the attribute detection result, and target retrieval is finally performed according to the above detection results. Since the object detection result, the attribute detection result, and the behavior detection result are integrated at the time of target retrieval, that is, rich multidimensional features are integrated for target retrieval, rapid and accurate target retrieval can be achieved. In addition, in the attribute detection of a person, the types of attributes to be detected can be determined according to actual needs, so the method has good extensibility and customizability.
An embodiment of the present invention also provides a computer-readable program which, when executed in a target retrieval apparatus or an electronic device, causes a computer to execute, in the target retrieval apparatus or the electronic device, the target retrieval method described in embodiment 3.
An embodiment of the present invention further provides a storage medium storing a computer-readable program which causes a computer to execute, in a target retrieval apparatus or an electronic device, the target retrieval method described in embodiment 3.
The target retrieval method performed in the target retrieval device or the electronic device described in connection with the embodiments of the present invention may be directly embodied as hardware, a software module executed by a processor, or a combination of both. For example, one or more of the functional blocks and/or one or more combinations of the functional blocks illustrated in fig. 1 may correspond to software modules of a computer program flow or to hardware modules. These software modules may correspond, respectively, to the steps shown in fig. 9. These hardware modules may be implemented, for example, by realizing the software modules in a field-programmable gate array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium; or the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of the mobile terminal or in a memory card that is insertable into the mobile terminal. For example, if the electronic device employs a relatively large capacity MEGA-SIM card or a large capacity flash memory device, the software module may be stored in the MEGA-SIM card or the large capacity flash memory device.
One or more of the functional blocks and/or one or more combinations of the functional blocks described with respect to fig. 1 may be implemented as a general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described herein. One or more of the functional blocks and/or one or more combinations of the functional blocks described with respect to fig. 1 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
While the invention has been described with reference to specific embodiments, it will be apparent to those skilled in the art that these descriptions are illustrative and not intended to limit the scope of the invention. Various modifications and alterations of this invention will become apparent to those skilled in the art based upon the spirit and principles of this invention, and such modifications and alterations are also within the scope of this invention.
With respect to the above embodiments, the following supplementary notes are also disclosed:
1. a method of object retrieval, the method comprising:
performing object detection on each of a plurality of input images to obtain object detection results of the plurality of input images;
detecting the attribute of the person according to the object detection results of the plurality of input images to obtain an attribute detection result;
detecting the behavior of the person according to the object detection result and the attribute detection result to obtain a behavior detection result; and
performing target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result.
2. The method according to supplementary note 1, wherein the method further comprises:
performing tracking detection of people on the plurality of input images, and determining identification of people in the plurality of input images.
3. The method according to supplementary note 2, wherein the performing tracking detection of a person on the plurality of input images includes:
determining an identity of a person in the plurality of input images from at least one of a motion trajectory of the person in the plurality of input images and features of the plurality of input images.
4. The method according to supplementary note 2, wherein the method further comprises:
storing object detection results, attribute detection results, and behavior detection results corresponding to the identification of the person in accordance with the respective input images,
wherein the performing target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result comprises:
retrieving from the stored content to obtain the target retrieval result.
5. The method according to supplementary note 1, wherein the method further comprises:
displaying the target retrieval result in at least one of the plurality of input images.
6. The method according to supplementary note 1, wherein the detecting of the attribute of the person according to the object detection results of the plurality of input images, comprises:
detecting the attribute of the person according to the bounding box of the person in the object detection result.
7. The method according to supplementary note 1, wherein the detecting behavior of a person according to the object detection result and the attribute detection result to obtain a behavior detection result includes:
fusing the object detection result and the attribute detection result; and
determining the behavior of the person according to the fused detection result and a preset rule to obtain the behavior detection result.
8. The method according to any of the supplementary notes 1-7, wherein the detection of the person's attributes comprises at least one of the following detections:
detection of a person's action;
detection of items carried by a person;
detection of a person's age;
detection of a person's gender; and
detection of a person's facial expression.
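Supplementary notes 2 and 3 describe determining a person's identification across the input images from the motion trajectory. One common way to realize such tracking is greedy IoU matching of bounding boxes between consecutive frames; the sketch below is an illustrative assumption, not the patent's prescribed tracker:

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def assign_ids(prev_tracks, detections, next_id, threshold=0.3):
    # Greedily match each new detection to the previous track with the highest
    # IoU above the threshold; unmatched detections start new identities.
    tracks, used = {}, set()
    for box in detections:
        best_pid, best_iou = None, threshold
        for pid, pbox in prev_tracks.items():
            score = iou(box, pbox)
            if pid not in used and score > best_iou:
                best_pid, best_iou = pid, score
        if best_pid is None:
            best_pid, next_id = next_id, next_id + 1
        used.add(best_pid)
        tracks[best_pid] = box
    return tracks, next_id

# Person 1 moved slightly between frames; a second person entered the scene.
prev = {1: (10, 10, 50, 100)}
tracks, next_id = assign_ids(prev, [(12, 12, 52, 102), (200, 50, 240, 150)],
                             next_id=2)
```

A production tracker would also combine appearance features of the input images with the trajectory, as supplementary note 3 allows, to keep identities stable through occlusions.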

Claims (10)

1. A target retrieval apparatus, the apparatus comprising:
a first detection unit, configured to perform object detection on each of a plurality of input images to obtain object detection results of the plurality of input images;
a second detection unit for detecting attributes of the person based on the object detection results of the plurality of input images to obtain attribute detection results;
a third detection unit for detecting the behavior of the person according to the object detection result and the attribute detection result to obtain a behavior detection result; and
a retrieval unit for performing target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result.
2. The apparatus of claim 1, wherein the apparatus further comprises:
a fourth detection unit configured to perform tracking detection of a person on the plurality of input images and determine an identification of the person in the plurality of input images.
3. The apparatus of claim 2, wherein,
the fourth detection unit determines the identification of the person in the plurality of input images according to at least one of the motion trajectory of the person in the plurality of input images and the features of the plurality of input images.
4. The apparatus of claim 2, wherein the apparatus further comprises:
a storage unit for storing an object detection result, an attribute detection result, and a behavior detection result corresponding to an identification of a person in each input image,
wherein the retrieval unit retrieves from the content stored in the storage unit to obtain the target retrieval result.
5. The apparatus of claim 1, wherein the apparatus further comprises:
a display unit for displaying the target retrieval result in at least one of the plurality of input images.
6. The apparatus of claim 1, wherein,
the second detection unit detects the attribute of the person according to the bounding box of the person in the object detection result.
7. The apparatus of claim 1, wherein the third detection unit comprises:
a fusion unit for fusing the object detection result and the attribute detection result; and
a determining unit for determining the behavior of the person according to the fused detection result and a preset rule to obtain the behavior detection result.
8. The apparatus of claim 1, wherein the detection of the attribute of the person comprises at least one of:
detection of a person's action;
detection of items carried by a person;
detection of a person's age;
detection of a person's gender; and
detection of a person's facial expression.
9. An electronic device comprising the apparatus of claim 1.
10. A method of object retrieval, the method comprising:
performing object detection on each of a plurality of input images to obtain object detection results of the plurality of input images;
detecting the attribute of the person according to the object detection results of the plurality of input images to obtain an attribute detection result;
detecting the behavior of the person according to the object detection result and the attribute detection result to obtain a behavior detection result; and
performing target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result.
CN201910767234.1A 2019-08-20 2019-08-20 Target retrieval device and method and electronic equipment Pending CN112417205A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910767234.1A CN112417205A (en) 2019-08-20 2019-08-20 Target retrieval device and method and electronic equipment
JP2020092444A JP7491057B2 (en) 2019-08-20 2020-05-27 Target search device and method, electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910767234.1A CN112417205A (en) 2019-08-20 2019-08-20 Target retrieval device and method and electronic equipment

Publications (1)

Publication Number Publication Date
CN112417205A true CN112417205A (en) 2021-02-26

Family

ID=74678545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910767234.1A Pending CN112417205A (en) 2019-08-20 2019-08-20 Target retrieval device and method and electronic equipment

Country Status (2)

Country Link
JP (1) JP7491057B2 (en)
CN (1) CN112417205A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023195305A1 (en) * 2022-04-08 2023-10-12 コニカミノルタ株式会社 Information processing device, information processing program, machine-learning device, and machine-learning program
CN115131825A (en) * 2022-07-14 2022-09-30 北京百度网讯科技有限公司 Human body attribute identification method and device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915655A (en) * 2015-06-15 2015-09-16 西安电子科技大学 Multi-path monitor video management method and device
CN109359515A (en) * 2018-08-30 2019-02-19 东软集团股份有限公司 A kind of method and device that the attributive character for target object is identified
CN109446364A (en) * 2018-10-23 2019-03-08 北京旷视科技有限公司 Capture search method, image processing method, device, equipment and storage medium
CN109522790A (en) * 2018-10-08 2019-03-26 百度在线网络技术(北京)有限公司 Human body attribute recognition approach, device, storage medium and electronic equipment
CN109598176A (en) * 2017-09-30 2019-04-09 佳能株式会社 Identification device and recognition methods
CN109803067A (en) * 2017-11-16 2019-05-24 富士通株式会社 Video concentration method, video enrichment facility and electronic equipment
CN109992685A (en) * 2017-12-29 2019-07-09 杭州海康威视系统技术有限公司 A kind of method and device of retrieving image
CN110135246A (en) * 2019-04-03 2019-08-16 平安科技(深圳)有限公司 A kind of recognition methods and equipment of human action

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000293685A (en) 1999-04-06 2000-10-20 Toyota Motor Corp Scene recognizing device
JP6532043B2 (en) 2017-10-26 2019-06-19 パナソニックIpマネジメント株式会社 Lost object monitoring device, left object monitoring system provided with the same, and left object monitoring method
JP2018120644A (en) 2018-05-10 2018-08-02 シャープ株式会社 Identification apparatus, identification method, and program

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915655A (en) * 2015-06-15 2015-09-16 西安电子科技大学 Multi-path monitor video management method and device
CN109598176A (en) * 2017-09-30 2019-04-09 佳能株式会社 Identification device and recognition methods
CN109803067A (en) * 2017-11-16 2019-05-24 富士通株式会社 Video concentration method, video enrichment facility and electronic equipment
CN109992685A (en) * 2017-12-29 2019-07-09 杭州海康威视系统技术有限公司 A kind of method and device of retrieving image
CN109359515A (en) * 2018-08-30 2019-02-19 东软集团股份有限公司 A kind of method and device that the attributive character for target object is identified
CN109522790A (en) * 2018-10-08 2019-03-26 百度在线网络技术(北京)有限公司 Human body attribute recognition approach, device, storage medium and electronic equipment
CN109446364A (en) * 2018-10-23 2019-03-08 北京旷视科技有限公司 Capture search method, image processing method, device, equipment and storage medium
CN110135246A (en) * 2019-04-03 2019-08-16 平安科技(深圳)有限公司 A kind of recognition methods and equipment of human action

Also Published As

Publication number Publication date
JP2021034015A (en) 2021-03-01
JP7491057B2 (en) 2024-05-28

Similar Documents

Publication Publication Date Title
US11790682B2 (en) Image analysis using neural networks for pose and action identification
Tapu et al. A smartphone-based obstacle detection and classification system for assisting visually impaired people
CN110427905A (en) Pedestrian tracting method, device and terminal
CN108304819B (en) Gesture recognition system and method, and storage medium
CN107633206B (en) Eyeball motion capture method, device and storage medium
CN106030610A (en) Real-time 3D gesture recognition and tracking system for mobile devices
CN111950321B (en) Gait recognition method, device, computer equipment and storage medium
Do et al. Real-time and robust multiple-view gender classification using gait features in video surveillance
CN112417205A (en) Target retrieval device and method and electronic equipment
CN104794446A (en) Human body action recognition method and system based on synthetic descriptors
Pang et al. Analysis of computer vision applied in martial arts
US20220129669A1 (en) System and Method for Providing Multi-Camera 3D Body Part Labeling and Performance Metrics
CN111126102A (en) Personnel searching method and device and image processing equipment
CN116246343A (en) Light human body behavior recognition method and device
Desai Segmentation and recognition of fingers using Microsoft Kinect
WO2020016963A1 (en) Information processing device, control method, and program
JP2017097549A (en) Image processing apparatus, method, and program
CN111274854A (en) Human body action recognition method and vision enhancement processing system
Yang et al. Football referee gesture recognition algorithm based on YOLOv8s
Tsinikos et al. Real-time activity recognition for surveillance applications on edge devices
Pham et al. Detection and tracking hand from FPV: benchmarks and challenges on rehabilitation exercises dataset
Elshami et al. A Comparative Study of Recent 2D Human Pose Estimation Methods
GB2603640A (en) Action identification using neural networks
WO2023084778A1 (en) Image processing device, image processing method, and program
WO2023209955A1 (en) Information processing device, information processing method, and recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination