CN112417205A - Target retrieval device and method and electronic equipment - Google Patents

Target retrieval device and method and electronic equipment

Info

Publication number
CN112417205A
CN112417205A (application CN201910767234.1A)
Authority
CN
China
Prior art keywords
detection result
person
detection
attribute
input images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910767234.1A
Other languages
Chinese (zh)
Inventor
尹汭
谭志明
丁蓝
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201910767234.1A
Counterpart application JP2020092444A, granted as JP7491057B2
Publication of CN112417205A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783: Retrieval using metadata automatically derived from the content
    • G06F 16/7837: Retrieval using objects detected or recognised in the video content
    • G06F 16/784: Retrieval where the detected or recognised objects are people
    • G06F 16/73: Querying
    • G06F 16/735: Filtering based on additional data, e.g. user or group profiles


Abstract

Embodiments of the invention provide a target retrieval device and method, and electronic equipment. The device comprises: a first detection unit configured to perform object detection on each of a plurality of input images to obtain object detection results of the plurality of input images; a second detection unit configured to detect attributes of a person based on the object detection results of the plurality of input images to obtain attribute detection results; a third detection unit configured to detect the behavior of the person according to the object detection results and the attribute detection results to obtain a behavior detection result; and a retrieval unit configured to perform target retrieval according to the object detection results, the attribute detection results and the behavior detection result to obtain a target retrieval result.

Description

Target retrieval device and method and electronic equipment
Technical Field
The invention relates to the field of information technology.
Background
Target retrieval is an important application in video surveillance. Targets with specified characteristics or functions can be quickly found by using the technology. For example, this technique can be used to locate criminals, or to locate missing children and the elderly, etc.
In a conventional target retrieval method, features of the items carried by a person or of the person's motion are generally extracted from an image, and target retrieval is performed based on these features.
It should be noted that the above background description is provided only to clarify and completely describe the technical solutions of the present invention and to aid the understanding of those skilled in the art. These solutions are not to be considered known to a person skilled in the art merely because they are set forth in this background section.
Disclosure of Invention
However, in the above conventional target retrieval method, the features used for target retrieval are limited, resulting in low retrieval efficiency and low retrieval accuracy; moreover, the types of features used for target retrieval are fixed, so the method cannot flexibly adapt to different retrieval requirements.
Embodiments of the invention provide a target retrieval device and method, and electronic equipment. Object detection is performed first; attribute detection of a person is performed according to the object detection results; behavior detection of the person is performed according to the attribute detection results; and finally target retrieval is performed according to all of these detection results. Because the object detection results, the attribute detection results and the behavior detection results are integrated during target retrieval, that is, rich multidimensional features are combined, rapid and accurate target retrieval can be achieved. In addition, since the types of attributes detected in the attribute detection of a person can be determined according to actual needs, the device has good expandability and customizability.
According to a first aspect of embodiments of the present invention, there is provided a target retrieval apparatus, the apparatus comprising: a first detection unit configured to perform object detection on each of a plurality of input images to obtain object detection results of the plurality of input images; a second detection unit configured to detect attributes of a person based on the object detection results of the plurality of input images to obtain attribute detection results; a third detection unit configured to detect the behavior of the person according to the object detection results and the attribute detection results to obtain a behavior detection result; and a retrieval unit configured to perform target retrieval according to the object detection results, the attribute detection results and the behavior detection result to obtain a target retrieval result.
According to a second aspect of embodiments of the present invention, there is provided an electronic device comprising the apparatus according to the first aspect of embodiments of the present invention.
According to a third aspect of embodiments of the present invention, there is provided a target retrieval method, the method comprising: performing object detection on each of a plurality of input images to obtain object detection results of the plurality of input images; detecting attributes of a person according to the object detection results of the plurality of input images to obtain attribute detection results; detecting the behavior of the person according to the object detection results and the attribute detection results to obtain a behavior detection result; and performing target retrieval according to the object detection results, the attribute detection results and the behavior detection result to obtain a target retrieval result.
The beneficial effects of the invention are as follows: object detection is performed first; attribute detection of a person is performed according to the object detection results; behavior detection of the person is performed according to the attribute detection results; and finally target retrieval is performed according to all of these detection results. Because the object detection results, the attribute detection results and the behavior detection results are integrated during target retrieval, that is, rich multidimensional features are combined, rapid and accurate target retrieval can be achieved.
Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not so limited in scope. The embodiments of the invention include many variations, modifications and equivalents within the spirit and scope of the appended claims.
Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments, in combination with or instead of the features of the other embodiments.
It should be emphasized that the term "comprises/comprising" when used herein, is taken to specify the presence of stated features, integers, steps or components but does not preclude the presence or addition of one or more other features, integers, steps or components.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
Fig. 1 is a schematic diagram of a target retrieval apparatus according to embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of an object detection result of an input image according to embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of a method of motion detection of a person according to embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of a detection result of key points of a human body in embodiment 1 of the present invention;
Fig. 5 is a schematic diagram of the third detection unit 103 according to embodiment 1 of the present invention;
Fig. 6 is a schematic diagram of a target retrieval result according to embodiment 1 of the present invention;
Fig. 7 is a schematic diagram of an electronic device according to embodiment 2 of the present invention;
Fig. 8 is a schematic block diagram of a system configuration of an electronic device according to embodiment 2 of the present invention;
Fig. 9 is a schematic diagram of a target retrieval method according to embodiment 3 of the present invention.
Detailed Description
The foregoing and other features of the invention will become apparent from the following description taken in conjunction with the accompanying drawings. In the description and drawings, particular embodiments of the invention have been disclosed in detail as being indicative of some of the embodiments in which the principles of the invention may be employed, it being understood that the invention is not limited to the embodiments described, but, on the contrary, is intended to cover all modifications, variations, and equivalents falling within the scope of the appended claims.
Example 1
The embodiment of the invention provides a target retrieval device. Fig. 1 is a schematic diagram of a target search apparatus according to embodiment 1 of the present invention.
As shown in fig. 1, the object retrieval apparatus 100 includes:
a first detection unit 101, configured to perform object detection on each of the plurality of input images to obtain object detection results of the plurality of input images;
a second detection unit 102 configured to perform attribute detection of a person based on object detection results of a plurality of input images, and obtain an attribute detection result;
a third detection unit 103, configured to perform behavior detection on a person according to the object detection result and the attribute detection result, so as to obtain a behavior detection result; and
and the retrieval unit 104 is configured to perform target retrieval according to the object detection result, the attribute detection result, and the behavior detection result to obtain a target retrieval result.
It can be seen from the above embodiment that object detection is performed first, attribute detection of a person is performed according to the object detection results, behavior detection of the person is performed according to the attribute detection results, and finally target retrieval is performed according to all of these detection results. Because the object detection results, the attribute detection results and the behavior detection results are integrated during target retrieval, that is, rich multidimensional features are combined, rapid and accurate target retrieval can be achieved. In addition, since the types of attributes detected in the attribute detection of a person can be determined according to actual needs, the device has good expandability and customizability.
In this embodiment, the input image may be an image obtained in real time or obtained in advance. For example, the input images are video images captured by the monitoring device, each input image corresponds to one frame of the video image, and the plurality of input images may be a plurality of consecutive frames.
In this embodiment, the first detection unit 101 performs object detection on each of the plurality of input images, and obtains object detection results of the plurality of input images.
In the present embodiment, the object may include a person, a car, a bus, a truck, a bicycle, a motorcycle, various animals, and the like.
In this embodiment, the first detection unit 101 may perform detection based on various object detection methods, such as Faster R-CNN, FPN or YOLO networks.
In this embodiment, different networks may be used according to different requirements; for example, a YOLO network may be used when a high processing speed is required, and a Faster R-CNN network may be used when high recognition accuracy is required.
The first detection unit 101 detects each of the plurality of input images and obtains the object detection results of the plurality of input images, that is, the respective objects identified by bounding boxes in each input image.
Fig. 2 is a schematic diagram of an object detection result of an input image according to embodiment 1 of the present invention. As shown in fig. 2, a bounding box of a person to be detected is marked in the input image.
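By way of illustration only, the processing of the first detection unit can be sketched in Python as follows. The mock detector, the function names and the confidence threshold are assumptions for illustration, not part of the invention; a real detector such as YOLO or Faster R-CNN would take its place.

```python
# Hypothetical sketch of the first detection unit: run a detector on each
# input image and keep per-image object detection results as bounding boxes.

def detect_objects(image, detector):
    """Return a list of (label, confidence, bbox) for one input image."""
    detections = detector(image)
    # Keep only sufficiently confident detections (threshold is an assumption).
    return [d for d in detections if d[1] >= 0.5]

def detect_all(images, detector):
    """Object detection results for a plurality of input images."""
    return [detect_objects(img, detector) for img in images]

# Mock detector standing in for a real network such as YOLO.
def mock_detector(image):
    return [("person", 0.92, (10, 20, 50, 120)),
            ("bicycle", 0.40, (0, 0, 30, 30))]

results = detect_all(["frame0", "frame1"], mock_detector)
print(results[0])   # only the confident "person" box survives
```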
In this embodiment, as shown in fig. 1, the apparatus 100 may further include:
a fourth detection unit 105, configured to perform tracking detection of a person on the plurality of input images, and determine an Identification (ID) of the person in the plurality of input images.
For example, the fourth detection unit 105 determines the identification of the person in the plurality of input images from at least one of the motion trajectory of the person in the plurality of input images and the features of the plurality of input images.
For example, the DeepSORT method may be used for tracking detection of persons. Based on the motion trajectories of persons in the plurality of input images and the features of the plurality of input images, it describes the motion of a person in time (motion trajectory) and space (features extracted by convolution), and can effectively overcome the influence of factors such as occlusion and changes in human appearance on the detection results.
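A much-simplified, hypothetical sketch of assigning identifications across frames is shown below. Unlike DeepSORT, it matches only by bounding-box centers rather than by appearance features; all function names and the distance threshold are assumptions for illustration.

```python
import math

def center(bbox):
    x, y, w, h = bbox
    return (x + w / 2.0, y + h / 2.0)

def assign_ids(frames, max_dist=50.0):
    """Greedy nearest-center tracker. Returns, per frame, a list of
    (person_id, bbox); a new ID is issued when no previous box is close."""
    next_id = 0
    prev = []   # (person_id, bbox) pairs from the previous frame
    out = []
    for boxes in frames:
        cur = []
        for b in boxes:
            best, best_d = None, max_dist
            for pid, pb in prev:
                d = math.dist(center(b), center(pb))
                if d < best_d and pid not in [c[0] for c in cur]:
                    best, best_d = pid, d
            if best is None:
                best = next_id
                next_id += 1
            cur.append((best, b))
        out.append(cur)
        prev = cur
    return out

tracks = assign_ids([[(10, 10, 40, 80)], [(14, 12, 40, 80)]])
print(tracks)  # the same person keeps ID 0 across both frames
```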
In the present embodiment, the second detection unit 102 performs attribute detection of a person based on object detection results of a plurality of input images, resulting in an attribute detection result. For example, the second detection unit performs attribute detection of the person based on the bounding box of the person in the object detection result.
In detection, the second detection unit 102 detects the attributes of a person based on the object detection result of each of the plurality of input images, that is, based on the bounding box of the person in each input image.
In this embodiment, the type of the attribute of the person detected by the second detection unit 102 may be determined according to actual needs, i.e. the functionality of the second detection unit 102 is expandable and customizable.
For example, the attribute detection of a person comprises at least one of the following: motion detection of the person; detection of the person's pedestrian items; age detection of the person; gender detection of the person; and expression detection of the person.
In this embodiment, the motion detection of the person may be performed based on the key points.
Fig. 3 is a schematic diagram of a method for detecting human motion according to embodiment 1 of the present invention. As shown in fig. 3, the method includes:
step 301: detecting key points of the person in the detected boundary frame of the person;
step 302: calculating the characteristics of the person according to the detected key points of the person; and
step 303: according to the features of the person, torso movements, upper limb movements and head movements of the person are detected based on the classifier.
In step 301, the key points of the human body can be detected by various methods, for example based on a Cascaded Pyramid Network (CPN). Alternatively, the detection may be performed by a method such as OpenPose or AlphaPose.
In the present embodiment, the key points of the human body may include a plurality of points respectively representing positions where a plurality of parts of the human body are located, for example, points respectively representing two ears, two eyes, a nose, two shoulders, two elbows, two wrists, two hips, two knees, and two ankles of the human body.
Fig. 4 is a schematic diagram of a detection result of a key point of a human body in embodiment 1 of the present invention. As shown in fig. 4, in the bounding box of one human body, key points representing respective parts of the human body are detected by the CPN and position information of the key points can be output.
In step 302, a feature of the person is calculated according to the detected key points of the person, for example, the feature of the human body may include: two-dimensional coordinates of a plurality of points respectively representing positions of a plurality of parts of the human body; and at least one angle between the connecting lines of the plurality of points.
In this embodiment, the features of the human body to be calculated may be determined according to actual needs.
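By way of illustration, the feature computation of step 302 might be sketched as follows. The keypoint names and the choice of the elbow angle are assumptions for illustration; any angle between connecting lines of the keypoints could be computed the same way.

```python
import math

def joint_angle(a, b, c):
    """Angle at point b (degrees) between lines b->a and b->c,
    e.g. the elbow angle for (shoulder, elbow, wrist) keypoints."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

def person_features(keypoints):
    """Concatenate the 2-D coordinates of the keypoints with one
    illustrative joint angle. `keypoints` maps part names to (x, y)."""
    coords = [v for part in sorted(keypoints) for v in keypoints[part]]
    angle = joint_angle(keypoints["shoulder"], keypoints["elbow"],
                        keypoints["wrist"])
    return coords + [angle]

kp = {"shoulder": (0.0, 0.0), "elbow": (0.0, 1.0), "wrist": (1.0, 1.0)}
print(person_features(kp)[-1])  # 90.0: a right-angled elbow
```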
In step 303, torso motions, upper limb motions, and head motions of the person are detected based on the classifier according to the characteristics of the person.
In the present embodiment, the torso motion of the human body may be detected by various classifiers, for example a Multi-Layer Perceptron (MLP) classifier. Performing the detection with an MLP classifier on the calculated features can yield good detection performance.
In the present embodiment, head movements and upper limb movements of the human body, for example raising the head, lowering the head and raising a hand, may be detected based on preset rules. Preset rules can be set for different actions according to actual needs; for example, when both ears are higher than both eyes, the head is judged to be lowered, and when the wrist is higher than the elbow, the hand is judged to be raised.
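The preset rules above can be illustrated with a hypothetical sketch. Image coordinates are assumed to grow downward (smaller y means higher in the image), and the keypoint names are assumptions for illustration.

```python
def head_down(keypoints):
    """Rule from the description: if both ears are higher than both eyes,
    the head is judged to be lowered (smaller y = higher in the image)."""
    ears = [keypoints["left_ear"][1], keypoints["right_ear"][1]]
    eyes = [keypoints["left_eye"][1], keypoints["right_eye"][1]]
    return max(ears) < min(eyes)

def hand_raised(keypoints, side="left"):
    """Rule from the description: the wrist is higher than the elbow."""
    return keypoints[side + "_wrist"][1] < keypoints[side + "_elbow"][1]

kp = {"left_ear": (10, 40), "right_ear": (30, 41),
      "left_eye": (14, 50), "right_eye": (26, 50),
      "left_wrist": (5, 60), "left_elbow": (8, 90)}
print(head_down(kp), hand_raised(kp))  # True True
```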
In this embodiment, in pedestrian item detection, the type and/or attributes of the person's items may be detected within the bounding box of the detected person. For example, a YOLO network may be used for pedestrian item detection.
In this embodiment, the pedestrian articles may include various types of clothing, carry-on articles, accessories, and the like. The attribute of the pedestrian article may be various attributes of the article, for example, the color of the clothing.
In this embodiment, the detection of the age of the person, the detection of the gender of the person, and the detection of the expression of the person can all use the existing detection methods, and the details are not repeated herein.
After the second detection unit 102 obtains the attribute detection result, the third detection unit 103 performs behavior detection of the person according to the object detection result and the attribute detection result to obtain a behavior detection result.
Fig. 5 is a schematic diagram of the third detecting unit 103 according to embodiment 1 of the present invention. As shown in fig. 5, the third detection unit 103 includes:
a fusion unit 501 for fusing the object detection result and the attribute detection result; and
a determining unit 502, configured to determine a behavior of the person according to the fused detection result and a preset rule, so as to obtain a behavior detection result.
In this embodiment, the fusion unit 501 fuses the object detection result and the attribute detection result, for example, the attribute detection result includes the human motion detection result, and the fusion unit 501 temporally fuses the human motion detection result and the object detection result. The determining unit 502 determines the behavior of the person according to the fused detection result and a preset rule to obtain a behavior detection result.
For example, the motion detection result of a person is that the person continuously performs a sitting motion, and the object detection result is that a bicycle is detected in the leg region of the person. The fusion unit 501 fuses these detection results, and the obtained feature may be: the person continuously performs a sitting motion near a bicycle. The determining unit 502 may then determine from the fused result that the behavior of the person is "riding a bicycle".
For another example, the motion detection result of a person is that the person continuously performs a walking motion, and the object detection result is that a dog is detected in the vicinity of the person. The fusion unit 501 fuses these detection results, and the obtained feature may be: the person continuously walks in the vicinity of a dog. The determining unit 502 may then determine from the fused result that the behavior of the person is "walking a dog".
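By way of illustration, the fusion and rule-based determination of the two examples above might be sketched as follows; the rule table and field names are assumptions for illustration, not part of the invention.

```python
def fuse(action_result, object_result):
    """Fuse the per-person action detection result with the nearby
    objects from the object detection result."""
    return {"action": action_result, "nearby": object_result}

RULES = [
    # (required action, required nearby object, inferred behavior)
    ("sitting", "bicycle", "riding a bicycle"),
    ("walking", "dog", "walking a dog"),
]

def infer_behavior(fused):
    """Determine the behavior from the fused result and preset rules."""
    for action, obj, behavior in RULES:
        if fused["action"] == action and obj in fused["nearby"]:
            return behavior
    return "unknown"

print(infer_behavior(fuse("sitting", ["bicycle"])))  # riding a bicycle
print(infer_behavior(fuse("walking", ["dog"])))      # walking a dog
```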
In this embodiment, as shown in fig. 1, the apparatus 100 may further include:
a storage unit 106 configured to store, for each input image, the object detection result, the attribute detection result and the behavior detection result in correspondence with the identification of the person.
for example, each input image is each frame of a video, and various detection results are stored for each frame.
In the storage content corresponding to one input image, the object detection result, the attribute detection result, and the behavior detection result are stored corresponding to the identification of the person. For example, the stored content corresponding to the first frame (frame1) includes: the position, motion, pedestrian items, behavior, etc. of the bounding box corresponding to the person whose ID is 0; the position, motion, pedestrian item, behavior, etc. of the bounding box corresponding to the person with ID 1.
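A hypothetical sketch of such per-frame storage keyed by person identification is shown below; the field names and values are assumptions for illustration.

```python
# Stored content: frame -> person ID -> detection results for that person.
store = {
    "frame1": {
        0: {"bbox": (10, 20, 50, 120), "action": "sitting",
            "items": ["red jacket"], "behavior": "riding a bicycle"},
        1: {"bbox": (200, 30, 45, 110), "action": "walking",
            "items": ["backpack"], "behavior": "walking a dog"},
    },
}

def record(store, frame, person_id, bbox, action, items, behavior):
    """Store all detection results in correspondence with the person's ID."""
    store.setdefault(frame, {})[person_id] = {
        "bbox": bbox, "action": action,
        "items": items, "behavior": behavior}

record(store, "frame2", 0, (12, 22, 50, 120), "sitting",
       ["red jacket"], "riding a bicycle")
print(sorted(store))  # ['frame1', 'frame2']
```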
In the present embodiment, the retrieval unit 104 searches the content stored in the storage unit 106 according to the retrieval target and obtains a target retrieval result.
For example, if the search target is a person having an ID of 1, all search results of the person having an ID of 1 can be quickly searched in the stored content.
For example, if the search target is a person running with a red jacket, the stored content is searched for among the stored detection results based on the feature, and all the search results matching the feature can be quickly searched for.
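By way of illustration, a feature-based search over the stored content might be sketched as follows; the matching logic and field names are assumptions for illustration. It assumes the per-frame storage structure described above.

```python
def search(store, **criteria):
    """Return (frame, person_id) pairs whose stored detection results
    match every given criterion; list-valued fields match by membership."""
    hits = []
    for frame, people in store.items():
        for pid, rec in people.items():
            ok = True
            for key, want in criteria.items():
                have = rec.get(key)
                if isinstance(have, list):
                    ok = ok and want in have
                else:
                    ok = ok and have == want
            if ok:
                hits.append((frame, pid))
    return hits

store = {"frame1": {0: {"action": "running", "items": ["red jacket"]},
                    1: {"action": "walking", "items": ["backpack"]}}}
# e.g. find the person running in a red jacket
print(search(store, action="running", items="red jacket"))
```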
In this embodiment, as shown in fig. 1, the apparatus 100 may further include:
a display unit 107 for displaying the target retrieval result in at least one of the plurality of input images.
Fig. 6 is a schematic diagram of a target retrieval result according to embodiment 1 of the present invention. As shown in fig. 6, if the retrieval target is a standing person wearing pink short sleeves, the bounding box of the person matching the retrieval target is identified in the input image.
In addition, when the plurality of input images are a plurality of consecutive frames of a video, the identified retrieval target may be displayed continuously in the respective frames by playing the video or by dragging the progress bar below the images. A retrieval target may also be set and displayed on the right side of the displayed image, and selected by clicking.
It can be seen from the above embodiment that object detection is performed first, attribute detection of a person is performed according to the object detection results, behavior detection of the person is performed according to the attribute detection results, and finally target retrieval is performed according to all of these detection results. Because the object detection results, the attribute detection results and the behavior detection results are integrated during target retrieval, that is, rich multidimensional features are combined, rapid and accurate target retrieval can be achieved. In addition, since the types of attributes detected in the attribute detection of a person can be determined according to actual needs, the device has good expandability and customizability.
Example 2
An embodiment of the present invention further provides an electronic device, and fig. 7 is a schematic diagram of an electronic device in embodiment 2 of the present invention. As shown in fig. 7, the electronic device 700 includes a target retrieval apparatus 701, and the structure and function of the target retrieval apparatus 701 are the same as those described in embodiment 1, and are not described herein again.
Fig. 8 is a schematic block diagram of a system configuration of an electronic device according to embodiment 2 of the present invention. As shown in fig. 8, the electronic device 800 may include a central processor 801 and a memory 802; the memory 802 is coupled to the central processor 801. The figure is exemplary; other types of structures may be used in addition to or in place of this structure to implement telecommunications or other functions.
As shown in fig. 8, the electronic device 800 may further include: an input unit 803, a display 804, a power supply 805.
In one embodiment, the functions of the target retrieval apparatus described in example 1 may be integrated into the central processor 801. Among other things, the central processor 801 may be configured to: respectively carrying out object detection on a plurality of input images to obtain object detection results of the plurality of input images; detecting the attribute of the person according to the object detection results of the plurality of input images to obtain an attribute detection result; detecting the behavior of the person according to the object detection result and the attribute detection result to obtain a behavior detection result; and performing target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result.
For example, the central processor 801 may also be configured to: performing tracking detection of people on the plurality of input images, and determining identification of people in the plurality of input images.
For example, the tracking detection of the person on the plurality of input images includes: determining an identity of a person in the plurality of input images from at least one of a motion trajectory of the person in the plurality of input images and features of the plurality of input images.
For example, the central processor 801 may also be configured to: store, for each input image, the object detection result, the attribute detection result and the behavior detection result in correspondence with the identification of the person; and the performing of target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result includes: searching in the stored content to obtain the target retrieval result.
For example, the central processor 801 may also be configured to: displaying the target retrieval result in at least one of the plurality of input images.
For example, the detecting of the attributes of the person from the object detection results of the plurality of input images includes: detecting the attributes of the person according to the bounding box of the person in the object detection result.
For example, the detecting the behavior of the person according to the object detection result and the attribute detection result to obtain a behavior detection result includes: fusing the object detection result and the attribute detection result; and determining the behavior of the person according to the fused detection result and a preset rule to obtain the behavior detection result.
For example, the attribute detection of a person comprises at least one of the following: motion detection of the person; detection of the person's pedestrian items; age detection of the person; gender detection of the person; and expression detection of the person.
In another embodiment, the target retrieval device described in embodiment 1 may be configured separately from the central processor 801; for example, the target retrieval device may be configured as a chip connected to the central processor 801, and its functions are realized under the control of the central processor 801.
It is not necessary that the electronic device 800 in this embodiment include all of the components shown in fig. 8.
As shown in fig. 8, the central processor 801, sometimes referred to as a controller or operation controller, may include a microprocessor or other processor device and/or logic device; the central processor 801 receives inputs and controls the operation of the various components of the electronic device 800.
The memory 802, for example, may be one or more of a buffer, a flash memory, a hard drive, a removable media, a volatile memory, a non-volatile memory, or other suitable device. And the central processor 801 can execute the program stored in the memory 802 to realize information storage or processing, or the like. The functions of other parts are similar to the prior art and are not described in detail here. The components of electronic device 800 may be implemented in dedicated hardware, firmware, software, or combinations thereof, without departing from the scope of the invention.
It can be seen from the above embodiments that object detection is performed first, attribute detection of a person is then performed according to the object detection result, behavior detection of the person is performed according to the attribute detection result, and target retrieval is finally performed according to the above detection results. Since the object detection result, the attribute detection result, and the behavior detection result are integrated at the time of target retrieval, that is, rich multidimensional features are integrated for target retrieval, rapid and accurate target retrieval can be achieved. In addition, in the attribute detection of a person, the types of attributes to be detected can be determined according to actual needs, so the method has good extensibility and customizability.
Example 3
An embodiment of the invention also provides a target retrieval method, which corresponds to the target retrieval device of embodiment 1. Fig. 9 is a schematic diagram of the target retrieval method according to embodiment 3 of the present invention. As shown in fig. 9, the method includes:
step 901: performing object detection on each of a plurality of input images to obtain object detection results of the plurality of input images;
step 902: detecting the attribute of the person according to the object detection results of the plurality of input images to obtain an attribute detection result;
step 903: detecting the behavior of the person according to the object detection result and the attribute detection result to obtain a behavior detection result; and
step 904: performing target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result.
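Steps 901-904 can be sketched as a sequential pipeline. The stub detectors below, their names, and their return formats are assumptions for illustration only; real implementations would call trained models:

```python
def detect_objects(images):
    # step 901: per-image object detection (stub returning fixed detections)
    return [{"person_bboxes": [(0, 0, 50, 100)], "objects": ["bag"]}
            for _ in images]

def detect_attributes(images, object_results):
    # step 902: attribute detection of the person, driven by the object results
    return [{"action": "standing"} for _ in object_results]

def detect_behaviors(object_results, attribute_results):
    # step 903: behavior detection from the fused object + attribute results
    return ["leaving_item" if "bag" in o["objects"] and a["action"] == "standing"
            else "normal"
            for o, a in zip(object_results, attribute_results)]

def retrieve(query, object_results, attribute_results, behavior_results):
    # step 904: return indices of frames matching a multidimensional query
    return [i for i, (o, a, b) in
            enumerate(zip(object_results, attribute_results, behavior_results))
            if query(o, a, b)]

images = ["frame0", "frame1"]
objs = detect_objects(images)
attrs = detect_attributes(images, objs)
behs = detect_behaviors(objs, attrs)
hits = retrieve(lambda o, a, b: b == "leaving_item", objs, attrs, behs)
```

The query at the end may combine any of the three result dimensions, which is how the integrated multidimensional features enable retrieval by object, attribute, behavior, or a combination of them.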
In this embodiment, the specific implementation method of the above steps is the same as that described in embodiment 1, and is not repeated here.
It can be seen from the above embodiments that object detection is performed first, attribute detection of a person is then performed according to the object detection result, behavior detection of the person is performed according to the attribute detection result, and target retrieval is finally performed according to the above detection results. Since the object detection result, the attribute detection result, and the behavior detection result are integrated at the time of target retrieval, that is, rich multidimensional features are integrated for target retrieval, rapid and accurate target retrieval can be achieved. In addition, in the attribute detection of a person, the types of attributes to be detected can be determined according to actual needs, so the method has good extensibility and customizability.
An embodiment of the present invention also provides a computer-readable program which, when executed in a target retrieval apparatus or an electronic device, causes a computer to execute, in the target retrieval apparatus or the electronic device, the target retrieval method described in embodiment 3.
An embodiment of the present invention further provides a storage medium storing a computer-readable program which causes a computer to execute, in a target retrieval apparatus or an electronic device, the target retrieval method described in embodiment 3.
The target retrieval method performed in the target retrieval device or the electronic device described in connection with the embodiments of the present invention may be directly embodied as hardware, a software module executed by a processor, or a combination of both. For example, one or more of the functional blocks and/or one or more combinations of the functional blocks illustrated in fig. 1 may correspond to software modules of a computer program flow or to hardware modules. These software modules may correspond, respectively, to the steps shown in fig. 9. These hardware modules may be implemented, for example, by realizing the software modules in a field-programmable gate array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium; or the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of the mobile terminal or in a memory card that is insertable into the mobile terminal. For example, if the electronic device employs a relatively large capacity MEGA-SIM card or a large capacity flash memory device, the software module may be stored in the MEGA-SIM card or the large capacity flash memory device.
One or more of the functional blocks and/or one or more combinations of the functional blocks described with respect to fig. 1 may be implemented as a general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described herein. One or more of the functional blocks and/or one or more combinations of the functional blocks described with respect to fig. 1 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
While the invention has been described with reference to specific embodiments, it will be apparent to those skilled in the art that these descriptions are illustrative and not intended to limit the scope of the invention. Various modifications and alterations of this invention will become apparent to those skilled in the art based upon the spirit and principles of this invention, and such modifications and alterations are also within the scope of this invention.
With respect to the above embodiments, the following supplementary notes are also disclosed:
1. a method of object retrieval, the method comprising:
performing object detection on each of a plurality of input images to obtain object detection results of the plurality of input images;
detecting the attribute of the person according to the object detection results of the plurality of input images to obtain an attribute detection result;
detecting the behavior of the person according to the object detection result and the attribute detection result to obtain a behavior detection result; and
performing target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result.
2. The method according to supplementary note 1, wherein the method further comprises:
performing tracking detection of people on the plurality of input images, and determining identification of people in the plurality of input images.
3. The method according to supplementary note 2, wherein the performing tracking detection of a person on the plurality of input images includes:
determining an identity of a person in the plurality of input images from at least one of a motion trajectory of the person in the plurality of input images and features of the plurality of input images.
4. The method according to supplementary note 2, wherein the method further comprises:
storing object detection results, attribute detection results, and behavior detection results corresponding to the identification of the person in accordance with the respective input images,
wherein the performing target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result comprises:
retrieving from the stored content to obtain the target retrieval result.
5. The method according to supplementary note 1, wherein the method further comprises:
displaying the target retrieval result in at least one of the plurality of input images.
6. The method according to supplementary note 1, wherein the detecting of the attribute of the person according to the object detection results of the plurality of input images, comprises:
detecting the attribute of the person according to the bounding box of the person in the object detection result.
7. The method according to supplementary note 1, wherein the detecting behavior of a person according to the object detection result and the attribute detection result to obtain a behavior detection result includes:
fusing the object detection result and the attribute detection result; and
determining the behavior of the person according to the fused detection result and a preset rule to obtain the behavior detection result.
8. The method according to any of the supplementary notes 1-7, wherein the detection of the person's attributes comprises at least one of the following detections:
detection of a person's action;
detection of items carried by a person;
detection of a person's age;
detection of a person's gender; and
detection of a person's facial expression.
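Supplementary notes 2 and 3 describe determining a person's identification across the input images from the motion trajectory. One common way to realize such tracking is greedy IoU matching of bounding boxes between consecutive frames; the sketch below is an illustrative assumption, not the patent's prescribed tracker:

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def assign_ids(prev_tracks, detections, next_id, threshold=0.3):
    # Greedily match each new detection to the previous track with the highest
    # IoU above the threshold; unmatched detections start new identities.
    tracks, used = {}, set()
    for box in detections:
        best_pid, best_iou = None, threshold
        for pid, pbox in prev_tracks.items():
            score = iou(box, pbox)
            if pid not in used and score > best_iou:
                best_pid, best_iou = pid, score
        if best_pid is None:
            best_pid, next_id = next_id, next_id + 1
        used.add(best_pid)
        tracks[best_pid] = box
    return tracks, next_id

# Person 1 moved slightly between frames; a second person entered the scene.
prev = {1: (10, 10, 50, 100)}
tracks, next_id = assign_ids(prev, [(12, 12, 52, 102), (200, 50, 240, 150)],
                             next_id=2)
```

A production tracker would also combine appearance features of the input images with the trajectory, as supplementary note 3 allows, to keep identities stable through occlusions.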

Claims (10)

1. A target retrieval apparatus, the apparatus comprising:
a first detection unit, configured to perform object detection on each of a plurality of input images to obtain object detection results of the plurality of input images;
a second detection unit for detecting attributes of the person based on the object detection results of the plurality of input images to obtain attribute detection results;
a third detection unit for detecting the behavior of the person according to the object detection result and the attribute detection result to obtain a behavior detection result; and
a retrieval unit for performing target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result.
2. The apparatus of claim 1, wherein the apparatus further comprises:
a fourth detection unit configured to perform tracking detection of a person on the plurality of input images and determine an identification of the person in the plurality of input images.
3. The apparatus of claim 2, wherein,
the fourth detection unit determines the identification of the person in the plurality of input images according to at least one of the motion trajectory of the person in the plurality of input images and the features of the plurality of input images.
4. The apparatus of claim 2, wherein the apparatus further comprises:
a storage unit for storing an object detection result, an attribute detection result, and a behavior detection result corresponding to an identification of a person in each input image,
wherein the retrieval unit retrieves from the content stored in the storage unit to obtain the target retrieval result.
5. The apparatus of claim 1, wherein the apparatus further comprises:
a display unit for displaying the target retrieval result in at least one of the plurality of input images.
6. The apparatus of claim 1, wherein,
the second detection unit detects the attribute of the person according to the bounding box of the person in the object detection result.
7. The apparatus of claim 1, wherein the third detection unit comprises:
a fusion unit for fusing the object detection result and the attribute detection result; and
a determining unit for determining the behavior of the person according to the fused detection result and a preset rule to obtain the behavior detection result.
8. The apparatus of claim 1, wherein the detection of the attribute of the person comprises at least one of:
detection of a person's action;
detection of items carried by a person;
detection of a person's age;
detection of a person's gender; and
detection of a person's facial expression.
9. An electronic device comprising the apparatus of claim 1.
10. A method of object retrieval, the method comprising:
performing object detection on each of a plurality of input images to obtain object detection results of the plurality of input images;
detecting the attribute of the person according to the object detection results of the plurality of input images to obtain an attribute detection result;
detecting the behavior of the person according to the object detection result and the attribute detection result to obtain a behavior detection result; and
performing target retrieval according to the object detection result, the attribute detection result and the behavior detection result to obtain a target retrieval result.
CN201910767234.1A 2019-08-20 2019-08-20 Target retrieval device and method and electronic equipment Pending CN112417205A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910767234.1A CN112417205A (en) 2019-08-20 2019-08-20 Target retrieval device and method and electronic equipment
JP2020092444A JP7491057B2 (en) 2019-08-20 2020-05-27 Target search device and method, electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910767234.1A CN112417205A (en) 2019-08-20 2019-08-20 Target retrieval device and method and electronic equipment

Publications (1)

Publication Number Publication Date
CN112417205A true CN112417205A (en) 2021-02-26

Family

ID=74678545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910767234.1A Pending CN112417205A (en) 2019-08-20 2019-08-20 Target retrieval device and method and electronic equipment

Country Status (2)

Country Link
JP (1) JP7491057B2 (en)
CN (1) CN112417205A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023195305A1 (en) * 2022-04-08 2023-10-12 コニカミノルタ株式会社 Information processing device, information processing program, machine-learning device, and machine-learning program
CN115131825A (en) * 2022-07-14 2022-09-30 北京百度网讯科技有限公司 Human body attribute identification method and device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915655A (en) * 2015-06-15 2015-09-16 西安电子科技大学 Multi-path monitor video management method and device
CN109359515A (en) * 2018-08-30 2019-02-19 东软集团股份有限公司 A kind of method and device that the attributive character for target object is identified
CN109446364A (en) * 2018-10-23 2019-03-08 北京旷视科技有限公司 Capture search method, image processing method, device, equipment and storage medium
CN109522790A (en) * 2018-10-08 2019-03-26 百度在线网络技术(北京)有限公司 Human body attribute recognition approach, device, storage medium and electronic equipment
CN109598176A (en) * 2017-09-30 2019-04-09 佳能株式会社 Identification device and recognition methods
CN109803067A (en) * 2017-11-16 2019-05-24 富士通株式会社 Video concentration method, video enrichment facility and electronic equipment
CN109992685A (en) * 2017-12-29 2019-07-09 杭州海康威视系统技术有限公司 A kind of method and device of retrieving image
CN110135246A (en) * 2019-04-03 2019-08-16 平安科技(深圳)有限公司 A kind of recognition methods and equipment of human action

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000293685A (en) 1999-04-06 2000-10-20 Toyota Motor Corp Scene recognizing device
JP6532043B2 (en) 2017-10-26 2019-06-19 パナソニックIpマネジメント株式会社 Lost object monitoring device, left object monitoring system provided with the same, and left object monitoring method
JP2018120644A (en) 2018-05-10 2018-08-02 シャープ株式会社 Identification apparatus, identification method, and program

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915655A (en) * 2015-06-15 2015-09-16 西安电子科技大学 Multi-path monitor video management method and device
CN109598176A (en) * 2017-09-30 2019-04-09 佳能株式会社 Identification device and recognition methods
CN109803067A (en) * 2017-11-16 2019-05-24 富士通株式会社 Video concentration method, video enrichment facility and electronic equipment
CN109992685A (en) * 2017-12-29 2019-07-09 杭州海康威视系统技术有限公司 A kind of method and device of retrieving image
CN109359515A (en) * 2018-08-30 2019-02-19 东软集团股份有限公司 A kind of method and device that the attributive character for target object is identified
CN109522790A (en) * 2018-10-08 2019-03-26 百度在线网络技术(北京)有限公司 Human body attribute recognition approach, device, storage medium and electronic equipment
CN109446364A (en) * 2018-10-23 2019-03-08 北京旷视科技有限公司 Capture search method, image processing method, device, equipment and storage medium
CN110135246A (en) * 2019-04-03 2019-08-16 平安科技(深圳)有限公司 A kind of recognition methods and equipment of human action

Also Published As

Publication number Publication date
JP2021034015A (en) 2021-03-01
JP7491057B2 (en) 2024-05-28

Similar Documents

Publication Publication Date Title
US11790682B2 (en) Image analysis using neural networks for pose and action identification
Tapu et al. A smartphone-based obstacle detection and classification system for assisting visually impaired people
CN110427905A (en) Pedestrian tracting method, device and terminal
CN108304819B (en) Gesture recognition system and method, and storage medium
CN107633206B (en) Eyeball motion capture method, device and storage medium
CN106030610A (en) Real-time 3D gesture recognition and tracking system for mobile devices
CN111950321B (en) Gait recognition method, device, computer equipment and storage medium
Do et al. Real-time and robust multiple-view gender classification using gait features in video surveillance
CN112417205A (en) Target retrieval device and method and electronic equipment
CN104794446A (en) Human body action recognition method and system based on synthetic descriptors
Pang et al. Analysis of computer vision applied in martial arts
US20220129669A1 (en) System and Method for Providing Multi-Camera 3D Body Part Labeling and Performance Metrics
CN111126102A (en) Personnel searching method and device and image processing equipment
CN116246343A (en) Light human body behavior recognition method and device
Desai Segmentation and recognition of fingers using Microsoft Kinect
WO2020016963A1 (en) Information processing device, control method, and program
JP2017097549A (en) Image processing apparatus, method, and program
CN111274854A (en) Human body action recognition method and vision enhancement processing system
Yang et al. Football referee gesture recognition algorithm based on YOLOv8s
Tsinikos et al. Real-time activity recognition for surveillance applications on edge devices
Pham et al. Detection and tracking hand from FPV: benchmarks and challenges on rehabilitation exercises dataset
Elshami et al. A Comparative Study of Recent 2D Human Pose Estimation Methods
GB2603640A (en) Action identification using neural networks
WO2023084778A1 (en) Image processing device, image processing method, and program
WO2023209955A1 (en) Information processing device, information processing method, and recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination