CN108256401B - Method and device for obtaining target attribute feature semantics - Google Patents


Info

Publication number
CN108256401B
CN108256401B (granted from application CN201611244945.3A)
Authority
CN
China
Prior art keywords
attribute
target
features
video image
semantics
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611244945.3A
Other languages
Chinese (zh)
Other versions
CN108256401A (en)
Inventor
陈锡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201611244945.3A
Publication of application CN108256401A
Application granted
Publication of granted patent CN108256401B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V20/00: Scenes; scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08: Detecting or categorising vehicles

Abstract

An embodiment of the invention provides a method and a device for obtaining target attribute feature semantics, the method comprising: acquiring a video image, wherein the video image contains at least one target; obtaining a detection result of the target from the video image; extracting attribute features of the target from the video image by using the detection result, to obtain an attribute comprehensive feature set of the target containing data of a plurality of attribute features; and processing the data of the attribute features in the attribute comprehensive feature set by using the relationships among the attributes, to obtain the semantics of the target's attribute features. Embodiments of the invention can obtain the semantics of more types of attribute features, can meet practical application requirements for more attribute features, and yield attribute feature semantics with higher accuracy.

Description

Method and device for obtaining target attribute feature semantics
Technical Field
The invention relates to the field of intelligent video monitoring, in particular to a method and a device for acquiring target attribute feature semantics.
Background
Video surveillance is an important research topic in computer vision, pattern recognition, artificial intelligence and related fields, and has broad application prospects in security monitoring, intelligent transportation, military navigation and other areas. Modern intelligent video surveillance is no longer limited to providing video images for manual monitoring; instead, it detects targets in the acquired video images to obtain their attribute features. The acquired attribute features of a target can then satisfy higher-level requirements for monitoring, searching and locating it.
In the prior art, a computer builds a semantic scene model for scene images of a moving object and obtains the object's trajectory in the video image, thereby deriving the semantics of the trajectory features. In practical applications, however, it is also necessary to obtain attribute features other than the target's trajectory, such as the color and model of a vehicle or the sex and age of a pedestrian, so as to describe the target from angles other than its trajectory and convert those attribute features into semantics, thereby meeting higher monitoring, searching and locating requirements. The existing method therefore acquires few types of target attributes, has a narrow application range in the field of intelligent video surveillance, and cannot meet practical application requirements for more attribute features.
Disclosure of Invention
The embodiment of the invention aims to provide a method and a device for acquiring object attribute feature semantics, which are used for acquiring semantics of more attribute features of an object from a video image so as to meet the actual application requirements of the more attribute features. The specific technical scheme is as follows:
the embodiment of the invention provides a method for acquiring target attribute feature semantics, which comprises the following steps:
acquiring a video image, wherein the video image contains at least one target;
obtaining a detection result of a target from the video image;
extracting attribute features of the target from the video image by using the detection result of the target to obtain an attribute comprehensive feature set of the target, wherein the attribute comprehensive feature set contains data of a plurality of attribute features;
and processing the data of the attribute features in the attribute comprehensive feature set by using the relation among the attributes to obtain the semantics of the attribute features of the target.
Optionally, the step of acquiring a video image includes:
acquiring a video image from video monitoring equipment, wherein the video monitoring equipment at least comprises a camera;
the target at least comprises one of a motor vehicle, a non-motor vehicle and a pedestrian or any combination thereof.
Optionally, the detection result of the target at least includes: type of object, size of object, location of object in video image.
Optionally, the step of extracting the attribute features of the target from the video image by using the detection result of the target to obtain the attribute comprehensive feature set of the target includes:
extracting an image of the target from the video image by using the detection result of the target;
converting the format of the image of the target to obtain image data of a specified format corresponding to the target;
acquiring a universal attribute set corresponding to the type of the target in a plurality of preset universal attribute sets, wherein the universal attribute set comprises a plurality of attributes;
calculating the image data of the target by using the general attribute set corresponding to the type of the target to obtain a general attribute feature set of the target, wherein the general attribute feature set comprises a plurality of attribute features;
and respectively calculating the attribute features in the general attribute feature set to obtain data of a plurality of attribute features, and integrating the data of the attribute features into an attribute comprehensive feature set of a target.
Optionally, the step of processing the data of the plurality of attribute features in the attribute comprehensive feature set by using the relationship among the plurality of attributes to obtain the semantics of the plurality of attribute features of the target includes:
processing the data of each attribute feature in the attribute comprehensive feature set into a one-dimensional array, and acquiring an attribute relation matrix of a target by the one-dimensional array corresponding to the data of the plurality of attribute features;
multiplying the attribute relation matrix by a preset coefficient matrix to obtain an attribute relation network of the target, wherein the preset coefficient matrix contains the relation among a plurality of attributes;
classifying and mapping the attributes in the attribute relationship network to obtain the classification probability of the attribute characteristics;
obtaining a classification result of the attribute characteristics according to the classification probability of the attribute characteristics;
and converting the classification result of the attribute features into semantics to obtain the semantics of a plurality of attribute features of the target.
Optionally, after obtaining semantics of a plurality of attribute features of the target, the method further includes:
transmitting semantics of a plurality of attribute features of the target to an application device.
The embodiment of the invention also provides a device for obtaining target attribute feature semantics, which comprises:
the video image acquisition module is used for acquiring a video image, wherein the video image contains at least one target;
the target detection module is used for obtaining a detection result of a target from the video image;
the attribute feature extraction module is used for extracting the attribute features of the target from the video image by using the detection result of the target to obtain an attribute comprehensive feature set of the target, wherein the attribute comprehensive feature set contains data of a plurality of attribute features;
and the attribute collaborative judgment module is used for processing the data of the plurality of attribute features in the attribute comprehensive feature set by using the relation among the plurality of attributes to obtain the semantics of the plurality of attribute features of the target.
Optionally, the video image acquisition module is specifically configured to:
acquiring a video image from video monitoring equipment, wherein the video monitoring equipment at least comprises a camera;
the target at least comprises one of a motor vehicle, a non-motor vehicle and a pedestrian or any combination thereof.
Optionally, the detection result of the target at least includes: type of object, size of object, location of object in video image.
Optionally, the attribute feature extraction module includes:
the image extraction submodule is used for extracting an image of the target from the video image by using the detection result of the target;
the format conversion submodule is used for converting the format of the image of the target to obtain image data of a specified format corresponding to the target;
the universal attribute set acquisition submodule is used for acquiring a universal attribute set corresponding to the type of the target in a plurality of preset universal attribute sets, wherein each universal attribute set comprises a plurality of attributes;
the universal attribute feature set acquisition submodule is used for calculating the image data of the target by utilizing a universal attribute set corresponding to the type of the target to obtain a universal attribute feature set of the target, wherein the universal attribute feature set comprises a plurality of attribute features;
and the attribute comprehensive characteristic set acquisition submodule is used for respectively calculating the attribute characteristics in the universal attribute characteristic set to acquire data of a plurality of attribute characteristics and integrating the data of the attribute characteristics into a target attribute comprehensive characteristic set.
Optionally, the attribute collaborative determination module includes:
the attribute relationship matrix acquisition submodule is used for processing the data of each attribute feature in the attribute comprehensive feature set into a one-dimensional array, and acquiring an attribute relationship matrix of a target by the one-dimensional array corresponding to the data of a plurality of attribute features;
the attribute relation network acquisition submodule is used for multiplying the attribute relation matrix by a preset coefficient matrix to obtain an attribute relation network of a target, wherein the preset coefficient matrix contains the relation among a plurality of attributes;
the classification mapping submodule is used for performing classification mapping on the attributes in the attribute relation network to obtain the classification probability of the attribute characteristics;
the classification result acquisition submodule is used for acquiring a classification result of the attribute characteristics according to the classification probability of the attribute characteristics;
and the semantic conversion submodule is used for converting the classification result of the attribute features into semantics to obtain the semantics of the plurality of attribute features of the target.
Optionally, the apparatus further comprises:
a sending module, configured to send semantics of the plurality of attribute features of the target to an application device.
In the method for obtaining the semantics of target attribute features provided by the embodiment of the invention, a video image containing a target is first acquired; a detection result of the target is then obtained from the video image; next, the attribute features of the target are extracted from the video image by using the detection result, yielding an attribute comprehensive feature set containing data of a plurality of attribute features of the target; finally, the data of the plurality of attribute features in the set are processed by using the relationships among the attributes to obtain the semantics of the target's attribute features. Embodiments of the invention can obtain the semantics of more types of attribute features, such as the color and model of a vehicle or the sex and age of a pedestrian, can meet practical application requirements for more attribute features, and yield attribute feature semantics with higher accuracy. Of course, it is not necessary for any product or method embodying the invention to achieve all of the above advantages at the same time.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for obtaining target attribute feature semantics according to an embodiment of the present invention;
FIG. 2 is a flow chart based on an example of the method shown in FIG. 1;
FIG. 3 is a diagram of an apparatus for obtaining target attribute feature semantics according to an embodiment of the present invention;
FIG. 4 is a block diagram based on an example of the apparatus shown in FIG. 3.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The embodiment of the invention discloses a method and a device for acquiring object attribute feature semantics, which can acquire the semantics of more attribute features of an object from a video image so as to meet the actual application requirements of the more attribute features.
Currently, intelligent video surveillance requires target detection on the acquired video images to obtain the attributes of targets, and the acquired attributes are used to meet higher-level requirements for monitoring, searching and locating them; for example, a target is located and tracked in the video image by using its attributes. However, the existing method of extracting the target's trajectory features and obtaining trajectory-feature semantics can only describe the target from the angle of its trajectory: it acquires few types of target attributes, has a narrow application range in intelligent video surveillance, and cannot meet practical application requirements for more attribute features.
The embodiment of the invention provides a method for obtaining the semantics of target attribute features, which mainly comprises the following steps: extracting an image of a target from the video image, extracting a plurality of attribute features of the target aiming at the image of the target, and processing the obtained attribute features by utilizing the relation among the attributes to obtain the semantics of the attribute features of the target.
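As an illustrative sketch (not part of the patent text), the overall flow just described can be expressed as a small Python skeleton; every function name here is a hypothetical placeholder standing in for the concrete detection, extraction, collaborative-judgment and conversion steps:

```python
def get_attribute_semantics(frame, detect, extract_features, correct, to_semantics):
    """Illustrative pipeline skeleton for the method described above.

    detect, extract_features, correct and to_semantics are placeholders,
    not the patent's API: they stand in for the detection model, the
    attribute extractor, the coefficient-matrix correction and the
    semantic conversion, respectively.
    """
    semantics = []
    for detection in detect(frame):                    # obtain detection results
        features = extract_features(frame, detection)  # attribute comprehensive feature set
        corrected = correct(features)                  # apply inter-attribute relationships
        semantics.append(to_semantics(corrected))      # convert to text semantics
    return semantics
```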
Referring to fig. 1, fig. 1 is a flowchart of a method for obtaining target attribute feature semantics according to an embodiment of the present invention. The method comprises the following steps:
step 101, acquiring a video image.
In this embodiment, the video image is obtained through a video monitoring device, i.e., equipment located in the monitored area that performs video surveillance and provides video images of that area. The video monitoring device may include video cameras, still cameras, mobile phones, and so on.
The video image contains at least one object. The target is a monitoring object in the monitoring scene, and the target may be any object or living being specified as required, and may include: various vehicles, pedestrians, animals, buildings, and so forth.
And 102, obtaining a detection result of the target from the video image.
In this embodiment, a target detection method or a target detection model is used to obtain the target detection result from the video image. The application does not limit which detection method or model is used; any method capable of detecting targets in a video image can be applied. Examples include detection methods based on deep learning, pattern recognition, image processing and related technologies, such as boosting methods that improve the accuracy of weak classifiers, the deformable part model (DPM), and Fast R-CNN.
The detection result of the target may include: the type of object, the size of the object, the location of the object, the motion state of the object, etc.
Wherein the types of targets include: vehicle type, pedestrian type, animal type, building type, etc. The size of the target and the position of the target represent the area range of the target in the video image. The motion state of the object includes a stationary state or a motion state.
It should be noted that, in the embodiment of the present invention, all the targets in the video image can be detected at one time, and the detection results corresponding to all the targets are obtained.
The target can be accurately positioned in the video image and the image of the target can be acquired by acquiring the detection result of the target.
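A minimal sketch of such a detection result as a data structure, with illustrative field names (the patent does not prescribe any concrete representation):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """Illustrative detection result; the field names are assumptions."""
    target_type: str     # e.g. "pedestrian", "vehicle", "animal", "building"
    bbox: tuple          # (x, y, width, height): position and size in the frame
    moving: bool = False # motion state: stationary or moving

det = Detection(target_type="pedestrian", bbox=(120, 40, 64, 128), moving=True)
```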
And 103, extracting the attribute features of the target from the video image by using the detection result of the target to obtain an attribute comprehensive feature set of the target. The attribute comprehensive characteristic set comprises data of a plurality of attribute characteristics.
In this embodiment, the type, size and position of the target are used to locate it in the video image; the image of the target is extracted separately from the video image; the target's attribute features are then extracted from that image and represented in data form, yielding the attribute comprehensive feature set of the target.
For example, for a type that is a pedestrian type, the attributes may include: gender, age, hairstyle, body type, top dress style, bottom dress style, whether to carry a bag, etc. For a type of motor vehicle, the attributes may include: vehicle type, color, vehicle brand, sub-brand, etc.
Taking pedestrian A as an example, according to step 103, data can be acquired for each of the attributes gender, hairstyle, top style, bottom style and whether a bag is carried; the meanings corresponding to the data of these attribute features may be: male, short hair, red short sleeves, white shorts, carrying a red bag.
And 104, processing the data of the attribute features in the attribute comprehensive feature set by using the relation among the attributes to obtain the semantics of the attribute features of the target.
The preset relationships among the attributes are data describing the degree to which each attribute influences the other attributes, obtained by statistically analyzing the real attributes of a large number of images using big-data analysis techniques. The influence of each attribute on the others is expressed as a one-dimensional array, and these arrays together form a coefficient matrix that represents the relationships among the attributes.
In this embodiment, the multiple attributes in the obtained attribute comprehensive feature set are judged cooperatively by using the preset relationships among the attributes. That is, the attribute comprehensive feature set is multiplied by the coefficient matrix, and the influence data in the coefficient matrix determine the classification probability of each attribute, i.e., the probability that the attribute is judged to be each of its possible features. The corrected feature of each attribute is then obtained from its classification probabilities, and finally the corrected attribute features are converted into semantics to obtain the semantics of the target's multiple attribute features. Multiplying the attribute comprehensive feature set by the coefficient matrix is precisely the process of correcting the attribute features.
Taking pedestrian A as an example, the data of the attribute features obtained in step 103 correspond to: male, short hair, red short sleeves, white shorts, carrying a red bag. In step 104, the relationships among the attributes "gender", "top style", "bottom style" and "whether a bag is carried" are used. The combination "wearing red short sleeves and white shorts and carrying a red bag" contributes more to the feature "female" of the attribute "gender" than to "male", so after each attribute is corrected through the relationship matrix, the probability obtained for "female" exceeds that for "male", and the attribute feature "male" of pedestrian A is corrected to "female". Finally, the data of the attribute features are converted into a textual description: pedestrian A is a short-haired female wearing a red short-sleeved top and white shorts and carrying a red bag.
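The collaborative correction above can be sketched numerically. The coefficient values below are invented purely for illustration; the patent obtains its coefficient matrix by statistical analysis of large image sets:

```python
import numpy as np

def softmax(x):
    """Convert raw scores to classification probabilities."""
    e = np.exp(x - x.max())
    return e / e.sum()

# Raw scores for the features of the attribute "gender" (values invented).
gender_scores = np.array([0.6, 0.4])          # [male, female]
# Evidence from the other attributes: red top, white shorts, red bag present.
other_evidence = np.array([1.0, 1.0, 1.0])

# Coefficient matrix encoding how each piece of evidence shifts each gender
# feature; these weights are illustrative, not the patent's learned values.
coeff = np.array([[-0.2, -0.1, -0.3],   # contribution to "male"
                  [ 0.3,  0.2,  0.4]])  # contribution to "female"

corrected = gender_scores + coeff @ other_evidence   # the correction step
probs = softmax(corrected)                           # classification probabilities
label = ["male", "female"][int(probs.argmax())]      # classification result
```

With these invented weights the evidence outweighs the raw "male" score, so the corrected label flips to "female", mirroring the pedestrian A example.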
In the embodiment of the invention, through the steps 101 to 104, the video image is converted into the attribute characteristics of the target described by the characters, the types of the acquired attributes are increased, and the accuracy of attribute characteristic judgment can be improved by adopting multi-attribute cooperative judgment.
It can be seen that the method for obtaining the semantics of target attribute features provided by the embodiment of the invention acquires a video image containing a target, obtains a detection result of the target from the video image, extracts the target's attribute features from the video image by using the detection result to obtain an attribute comprehensive feature set containing data of multiple attribute features, and finally processes those data by using the relationships among the attributes to obtain the semantics of the target's multiple attribute features. Embodiments of the invention can obtain the semantics of more types of attribute features, such as the color, model, brand and sub-brand of a vehicle, or a pedestrian's sex, age, hairstyle, body type, top style and bottom style, and whether the pedestrian carries something, wears a backpack, glasses, a mask or a hat, or is using a mobile phone. They can meet practical application requirements for more attribute features, and the obtained attribute feature semantics have higher accuracy.
Referring to fig. 2, fig. 2 is a flowchart based on an example of the method shown in fig. 1, as an embodiment of the method shown in fig. 1. The method comprises the following steps:
step 201, acquiring a video image from a video monitoring device.
In this embodiment, the video monitoring device includes at least a camera.
The video image contains at least one target, and the target at least comprises one of a motor vehicle, a non-motor vehicle and a pedestrian or any combination thereof.
Step 202, obtaining a detection result of the target from the video image.
In this embodiment, the prior art is adopted for detecting the target in the video image, and details are not described herein.
The detection result of the target at least comprises: type of object, size of object, location of object in video image.
The types of targets include at least: vehicle type, non-vehicle type, pedestrian type.
The size of the target may be expressed as a bounding frame of a preset shape that contains the target, such as a rectangular box or a circular box.
The position of the target in the video image may be the coordinates of the target's center point, or the coordinates of preset edge points of the bounding frame, and so on. For a rectangular frame, for example, the preset edge points may be the four corners of the rectangle.
Step 2031, using the detection result of the target to extract the image of the target from the video image.
In this embodiment, the target is positioned in the video image according to the type of the target, the size of the target, and the position of the target in the video image, and the image of the target is extracted.
The prior art is adopted for extracting the target image from the video image, and details are not repeated here.
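A minimal sketch of the cropping step, assuming the size and position are given as an (x, y, w, h) rectangle (a hypothetical concrete form of the representations described above):

```python
import numpy as np

def crop_target(frame, bbox):
    """Extract the target's image from a frame; bbox = (x, y, w, h), assumed."""
    x, y, w, h = bbox
    return frame[y:y + h, x:x + w]

# A blank 640x480 RGB frame stands in for a real video image.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
patch = crop_target(frame, (100, 50, 64, 128))
```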
Step 2032, the format of the image of the target is converted to obtain the image data of the specified format corresponding to the target.
In this embodiment, the format of the target image is RGB (red-green-blue) or YUV (a color encoding expressed as luminance and chrominance), and it is converted into the data format required for attribute feature extraction to obtain the target's image data. The format conversion may further include various image enhancement operations on the target image, such as changing the brightness or adding noise, according to the requirements of the data format needed for attribute feature extraction.
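A hedged sketch of such a conversion, assuming the required format is simply a normalized floating-point array; the actual format depends on the attribute extractor and is not specified by the patent:

```python
import numpy as np

def to_model_input(img_rgb, brightness_shift=0):
    """Convert an RGB uint8 image into normalized float data.

    brightness_shift mimics the brightness-change enhancement mentioned
    in the text; both the function and its parameters are illustrative.
    """
    img = img_rgb.astype(np.float32) + brightness_shift
    return np.clip(img, 0.0, 255.0) / 255.0   # scale to [0, 1]

img = np.full((128, 64, 3), 128, dtype=np.uint8)  # mid-gray test image
data = to_model_input(img, brightness_shift=10)
```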
Step 2033, a generic attribute set corresponding to the type of the target is obtained from a plurality of pre-set generic attribute sets, where the generic attribute set includes a plurality of attributes.
In this embodiment, corresponding generic attribute sets are preset for different types of targets. The generic attribute set contains all the attributes of this type of object.
For example, the generic set of attributes for an object of type vehicle type may include attributes for a vehicle such as: vehicle type, color, vehicle brand, sub-brand, etc. The generic set of attributes for objects of the type pedestrian may include attributes for pedestrians such as: gender, age, hairstyle, body type, top style, bottom style, carrying, backpack, wearing glasses, wearing mask, wearing hat, wearing mobile phone, etc.
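A minimal sketch of the per-type lookup; the attribute names follow the examples above, but the mapping itself is an assumed illustration:

```python
# Illustrative mapping from target type to its preset generic attribute set.
GENERIC_ATTRIBUTE_SETS = {
    "vehicle": ["vehicle_type", "color", "brand", "sub_brand"],
    "pedestrian": ["gender", "age", "hairstyle", "body_type",
                   "top_style", "bottom_style", "carrying", "backpack",
                   "glasses", "mask", "hat", "using_phone"],
}

def get_generic_attribute_set(target_type):
    """Look up the generic attribute set corresponding to the target's type."""
    return GENERIC_ATTRIBUTE_SETS[target_type]
```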
In this embodiment, the type of the target is obtained according to the image data of the target, and a generic attribute set corresponding to the type of the target is obtained in a plurality of pre-set generic attribute sets.
The attributes of each type of target are different, and the embodiment of the invention can acquire the general attribute set corresponding to the type of the target aiming at various types of targets, thereby accurately and thoroughly acquiring the attribute characteristics of the target.
In the following embodiments, the type of the object is taken as an example of a pedestrian type, and the method for acquiring the object attribute feature semantics of other types of objects is similar to that of the pedestrian type.
Step 2034, calculating the image data of the target by using the generic attribute set corresponding to the type of the target, and obtaining a generic attribute feature set of the target containing a plurality of attribute features.
In this embodiment, a deep learning technique is adopted for each attribute in the general attribute set corresponding to the type of the target, and the image data of the target is calculated to obtain the general attribute feature set of the target. The general attribute feature set contains the attribute features of the target, which are extracted for each attribute in the general attribute set, and the attribute features are embodied in a data form.
Taking the pedestrian B as an example, the general attribute set corresponding to the pedestrian type is: gender, age, hairstyle, style of top dressing, style of bottom dressing, whether to wear a backpack, whether to wear glasses, and whether to wear a mask. Calculating the image data of the pedestrian B, extracting the features of each attribute, and obtaining the general attribute feature set of the pedestrian B may include: male characteristics, young characteristics, long hair characteristics, short sleeve characteristics, skirt characteristics, no backpack characteristics, glasses wearing characteristics, and no mask wearing characteristics. The above attribute features may further include a plurality of detailed attribute features, for example, the feature of long hair may include: long straight hair, long perm, and the like; the features of wearing glasses may include: wearing myopia glasses, sunglasses, etc.
It should be noted that a general attribute set applies to all targets of its type. The resulting general attribute feature set makes a basic judgment on each attribute in the general attribute set: it covers the basic attribute features of all attributes and is a basic description of the target's attribute features, so it covers as many attributes as possible and is highly descriptive and inclusive. Consequently, only a small number of further operations are required in step 2035 to refine the attribute features. Because the embodiment of the invention adopts a general attribute set, new attribute types can be added easily as they arise, which limits the growth in time and space overhead of extracting target attribute features and improves the extensibility of the method; the method is therefore well suited to hardware platforms with limited resources.
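As a rough illustration of steps 2033 and 2034, selecting a general attribute set by target type and extracting one feature per attribute could be sketched as follows. The attribute names, the lookup table, and the `network` callable are hypothetical placeholders, not the patent's actual implementation:

```python
# Illustrative sketch only: GENERIC_ATTRIBUTE_SETS and `network` are
# assumptions for demonstration, not the patent's real model or data.
GENERIC_ATTRIBUTE_SETS = {
    "pedestrian": ["gender", "age", "hairstyle", "top_style", "bottom_style",
                   "backpack", "glasses", "mask"],
    "vehicle": ["color", "model"],
}

def extract_generic_features(image_data, target_type, network):
    """Apply the (assumed) deep-learning network once per attribute in the
    type's general attribute set and collect the resulting features into
    the target's general attribute feature set."""
    return {attr: network(image_data, attr)
            for attr in GENERIC_ATTRIBUTE_SETS[target_type]}
```

Used this way, one general attribute set serves every target of a type, so adding a new attribute is a one-line change to the table rather than a change to the extraction loop.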
Step 2035, calculating the attribute features in the generic attribute feature set respectively to obtain data of a plurality of attribute features, and integrating the data of a plurality of attribute features into a target attribute comprehensive feature set.
In this embodiment, the attribute features in the general attribute feature set are calculated separately; the calculations include convolution, sampling, truncation, normalization, and other operations of a convolutional neural network.
Taking pedestrian B as an example, starting from the general attribute feature set (male, young, long hair, short sleeves, skirt, no backpack, wearing glasses, no mask), a series of calculations on the video image of pedestrian B yields the data of a plurality of attribute features. After this data is further integrated and processed, the attribute comprehensive feature set describing pedestrian B's specific attribute features is obtained; it can be used to describe: male, young, long hair, black short sleeves, white skirt, no backpack, sunglasses, no mask.
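A minimal sketch of the per-feature calculation in step 2035, using two of the operations the text names (truncation and normalization). The convolution and sampling stages of an actual convolutional neural network are omitted, and the function names are invented:

```python
import numpy as np

def refine_feature(feature, clip=3.0):
    """Illustrative refinement of one attribute feature: truncation
    (clipping) followed by L2 normalization. These are two of the
    operations named in the text; the clip threshold is an assumption."""
    f = np.clip(np.asarray(feature, dtype=np.float64), -clip, clip)
    norm = np.linalg.norm(f)
    return f / norm if norm > 0 else f

def build_comprehensive_set(generic_features):
    """Integrate the refined data of all attribute features into the
    target's attribute comprehensive feature set."""
    return {attr: refine_feature(vec) for attr, vec in generic_features.items()}
```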
Step 2041, processing the data of each attribute feature in the attribute comprehensive feature set into a one-dimensional array, and obtaining an attribute relationship matrix of the target from the one-dimensional arrays corresponding to the data of the plurality of attribute features.
Taking pedestrian B as an example, the data of each attribute feature in pedestrian B's attribute comprehensive feature set is processed into a one-dimensional array, yielding one array per attribute feature (eight in total). Taking the eight one-dimensional arrays, in the order of the attributes in the attribute comprehensive feature set, as the rows of a matrix gives pedestrian B's two-dimensional attribute relationship matrix.
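The flattening and row-stacking described above can be sketched with NumPy as follows. Zero-padding unequal feature lengths to a common width is an assumption, since the patent does not specify how differing lengths are handled:

```python
import numpy as np

def build_relationship_matrix(comprehensive_set, attribute_order):
    """Flatten each attribute feature's data into a one-dimensional array
    and stack the arrays, in attribute order, as the rows of the target's
    two-dimensional attribute relationship matrix. Rows are zero-padded
    to a common width (an assumption, not stated in the patent)."""
    rows = [np.asarray(comprehensive_set[a], dtype=np.float64).ravel()
            for a in attribute_order]
    width = max(r.size for r in rows)
    padded = [np.pad(r, (0, width - r.size)) for r in rows]
    return np.stack(padded)  # shape: (num_attributes, width)
```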
The embodiment of the invention converts the attribute comprehensive feature set into the attribute relation matrix for subsequent matrix calculation.
Step 2042, multiplying the attribute relationship matrix by a preset coefficient matrix to obtain an attribute relationship network of the target, wherein the preset coefficient matrix contains the relation among a plurality of attributes.
In this embodiment, the coefficient matrix is a two-dimensional matrix obtained by analyzing the inherent relationships among the attributes through big data analysis; its elements are numerical values representing the degree of relationship between attributes. The coefficient matrix thus encodes the connections between the attributes.
The attribute relationship matrix is multiplied by the preset coefficient matrix to obtain the target's attribute relationship network, which is itself a two-dimensional matrix; each row of the attribute relationship network is an attribute feature.
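Under the assumption that the coefficient matrix is attribute-by-attribute (N×N for N attributes) and is applied on the left, so that each output row blends the features of related attributes, the multiplication could look like this; the ordering convention and the example values are assumptions:

```python
import numpy as np

def attribute_relationship_network(rel_matrix, coef_matrix):
    """Multiply the (assumed N-by-N) coefficient matrix with the
    N-by-width attribute relationship matrix; each row of the result is
    an attribute feature influenced by its related attributes."""
    return np.asarray(coef_matrix) @ np.asarray(rel_matrix)
```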
Step 2043, the attributes in the attribute relationship network are classified and mapped to obtain the classification probability of the attribute features.
In this embodiment, the classification mapping may adopt various methods, such as the regression model softmax. Classification mapping is performed on the data of the attribute features in the attribute relationship network, yielding the classification probabilities of the attribute features.
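A standard, numerically stable softmax, mentioned in the text as one possible classification mapping, can be written as:

```python
import numpy as np

def softmax(x):
    """Map a vector of scores to classification probabilities. Shifting
    by the maximum keeps the exponentials numerically stable without
    changing the result."""
    z = np.asarray(x, dtype=np.float64)
    e = np.exp(z - z.max())
    return e / e.sum()
```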
Taking pedestrian B as an example: after steps 2041, 2042 and 2043, the degree of influence of the attribute features "young, long hair, black short sleeves, white skirt, no backpack, sunglasses, no mask" on the attribute "gender" yields classification probabilities for "female" and "male" of 0.8 and 0.2, respectively.
Step 2044, obtain the classification result of the attribute feature according to the classification probability of the attribute feature.
Taking pedestrian B as an example: since the classification probability of "female" is greater than that of "male" for the attribute "gender", the classification result for "gender" is "female", and the attribute feature "male" obtained in step 2035 is corrected to "female".
The classification results of the other attribute features are obtained in the same way. For example, the attribute "age" may have the attribute features child, teenager, youth, middle-aged, and elderly; the classification probability of each is obtained from the relationships among the attributes, and the attribute feature with the largest classification probability is taken as the result for "age".
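Taking the largest classification probability, as described above, amounts to an argmax over the class probabilities. A sketch, with illustrative label strings:

```python
def classification_result(class_probs):
    """Pick the class label whose probability is largest, mirroring the
    text's rule of keeping the maximum classification probability.
    `class_probs` maps class label -> probability."""
    return max(class_probs, key=class_probs.get)
```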
Through steps 2041 to 2044, the attribute features in the attribute comprehensive feature set are cooperatively judged and corrected using the relationships among the attributes, yielding more accurate attribute features. This method of judging all attributes cooperatively, based on their interrelationships, lets the attributes correct one another, raises the overall accuracy of attribute judgment, and avoids the contradictory results that arise when each attribute is judged independently.
Step 2045, convert the classification result of the attribute features into semantics, and obtain semantics of multiple attribute features of the target.
Taking pedestrian B as an example, the attribute features of pedestrian B described by data are: female, young, long hair, black short sleeves, white skirt, no backpack, sunglasses, and no mask. The semantics of pedestrian B's attribute features are obtained according to the preset semantic correspondence of the attribute features. The attribute features of pedestrian B described in text are thus: female, young, long hair, black short sleeves, white skirt, no backpack, sunglasses, and no mask.
The preset semantic correspondence of attribute features may be a correspondence between data and text, where the data includes numerical values, characters, and the like. For example, the numerical value "0" may correspond to the attribute feature "female", or the character "woman" may correspond to the attribute feature "female". Whenever the value "0" or the character "woman" appears in the data-described attribute features, semantic conversion yields the attribute feature "female".
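The described data-to-text correspondence can be sketched as a simple lookup table. The concrete keys and values below are illustrative, following the text's example that "0" corresponds to "female":

```python
# Hypothetical correspondence table mapping (attribute, classifier output)
# to attribute-feature semantics; the entries are illustrative only.
SEMANTIC_MAP = {
    ("gender", 0): "female",
    ("gender", 1): "male",
    ("mask", 0): "no mask",
    ("mask", 1): "mask",
}

def to_semantics(attribute, value):
    """Convert one classification result into its textual semantics via
    the preset correspondence."""
    return SEMANTIC_MAP[(attribute, value)]
```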
In the embodiment of the present invention, through steps 201 to 2045, semantics of a plurality of attribute features of a target are obtained through a video image containing the target.
The method for obtaining the semantics of the attribute features of the target according to the embodiment of the present invention may further include step 205, sending the semantics of the plurality of attribute features of the target to the application device.
In this embodiment, the application device includes a computer and the like. The application device may perform various operations with the obtained semantics of the target's attribute features, for example: displaying the target's attribute information, or searching for targets using attribute features as filter conditions. The attribute feature semantics obtained by the embodiment of the invention convert the image into a textual description of attributes; they can supply information resources for video structuring, provide solid technical support for image-based image search, and meet the needs of more intelligent video monitoring applications.
It can be seen that the method for obtaining target attribute feature semantics provided by the embodiment of the present invention first acquires a video image containing the target from a video monitoring device and obtains a detection result of the target from the video image. It then extracts an image of the target from the video image using the detection result, calculates the target's image data with the general attribute set corresponding to the target's type to obtain the target's general attribute feature set, and builds the target's attribute comprehensive feature set on that basis. Finally, it performs collaborative judgment of the attributes on the attribute comprehensive feature set using the connections between attributes, and processes the judgment results to obtain the semantics of a plurality of the target's attribute features. The embodiment of the invention can thus obtain the semantics of more types of attribute features, such as the color and model of a vehicle or the gender and age of a pedestrian, meeting practical application demands for more attribute features, and the obtained attribute feature semantics are more accurate.
Referring to fig. 3, fig. 3 is a structural diagram of an apparatus for obtaining target attribute feature semantics according to an embodiment of the present invention. The apparatus comprises:
the video image acquiring module 301 is configured to acquire a video image, where the video image includes at least one target.
And an object detection module 302, configured to obtain a detection result of the object from the video image.
The attribute feature extraction module 303 is configured to extract an attribute feature of the target from the video image by using a detection result of the target, and obtain an attribute comprehensive feature set of the target, where the attribute comprehensive feature set includes data of a plurality of attribute features.
The attribute collaborative determination module 304 is configured to process data of a plurality of attribute features in the attribute comprehensive feature set by using a relationship between the plurality of attributes, and obtain semantics of the plurality of attribute features of the target.
It can be seen that the apparatus for obtaining target attribute feature semantics according to the embodiment of the present invention acquires a video image containing the target, obtains a detection result of the target from the video image, extracts the target's attribute features from the video image using that detection result to obtain an attribute comprehensive feature set containing the data of a plurality of attribute features, and finally processes that data using the relationships among the attributes to obtain the semantics of a plurality of the target's attribute features. The embodiment of the invention can thus obtain the semantics of more types of attribute features, such as the color and model of a vehicle or the gender and age of a pedestrian, meeting practical application demands for more attribute features, and the obtained attribute feature semantics are more accurate.
It should be noted that, the apparatus according to the embodiment of the present invention is an apparatus applying the method for obtaining target attribute feature semantics, and all embodiments of the method for obtaining target attribute feature semantics are applicable to the apparatus and can achieve the same or similar beneficial effects.
Referring to fig. 4, fig. 4 is a block diagram of an embodiment of the apparatus shown in fig. 3. The apparatus comprises:
the video image obtaining module 401 is specifically configured to:
acquiring a video image from video monitoring equipment, wherein the video monitoring equipment at least comprises a camera; the video image contains at least one target, and the target at least comprises one of a motor vehicle, a non-motor vehicle and a pedestrian or any combination thereof.
An object detection module 402, configured to obtain a detection result of a target from the video image, wherein the detection result of the target at least comprises: the type of the target, the size of the target, and the location of the target in the video image.
An attribute feature extraction module 403, including:
the image extraction sub-module 4031 is configured to extract an image of the target from the video image by using the detection result of the target.
And a format conversion module 4032, configured to convert the format of the image of the target, and obtain image data in a specified format corresponding to the target.
The generic attribute set obtaining sub-module 4033 is configured to obtain a generic attribute set corresponding to the type of the target in a plurality of preset generic attribute sets, where the generic attribute set includes a plurality of attributes.
The general attribute feature set acquisition submodule 4034 is configured to calculate image data of the target by using a general attribute set corresponding to the type of the target, and acquire a general attribute feature set of the target, which includes a plurality of attribute features.
And the attribute comprehensive feature set acquisition submodule 4035 is used for respectively calculating the attribute features in the general attribute feature set to acquire data of a plurality of attribute features, and integrating the data of the attribute features into a target attribute comprehensive feature set.
The attribute cooperation judging module 404 includes:
the attribute relationship matrix obtaining sub-module 4041 is configured to process the data of each attribute feature in the attribute comprehensive feature set into a one-dimensional array, and obtain the attribute relationship matrix of the target according to the one-dimensional array corresponding to the data of the plurality of attribute features.
The attribute relationship network obtaining sub-module 4042 is configured to multiply the attribute relationship matrix with a preset coefficient matrix to obtain an attribute relationship network of the target, where the preset coefficient matrix includes a relationship among a plurality of attributes.
And the classification mapping submodule 4043 is used for performing classification mapping on the attributes in the attribute relationship network to obtain the classification probability of the attribute features.
The classification result obtaining sub-module 4044 is configured to obtain a classification result of the attribute feature according to the classification probability of the attribute feature.
The semantic conversion module 4045 is configured to convert the classification result of the attribute features into semantics, and obtain semantics of the multiple attribute features of the target.
The device of the embodiment of the invention also comprises:
a sending module 405, configured to send semantics of the plurality of attribute features of the target to the application device.
It can be seen that the apparatus for obtaining target attribute feature semantics provided by the embodiment of the present invention first acquires a video image containing the target from a video monitoring device and detects the target in the video image to obtain a detection result. It then extracts an image of the target from the video image using the detection result, calculates the target's image data with the general attribute set corresponding to the target's type to obtain the general attribute feature set, and builds the attribute comprehensive feature set on that basis. Finally, using the connections between attributes, it performs collaborative judgment on the attribute comprehensive feature set and processes the judgment results to obtain the semantics of a plurality of the target's attribute features. The embodiment of the invention can thus obtain the semantics of more types of attribute features, such as the color and model of a vehicle or the gender and age of a pedestrian, meeting practical application demands for more attribute features, and the obtained attribute feature semantics are more accurate.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for obtaining target attribute feature semantics, applied to the field of intelligent video monitoring, characterized by comprising:
acquiring a video image, wherein the video image contains at least one target;
obtaining a detection result of a target from the video image;
extracting attribute features of the target from the video image by using the detection result of the target to obtain an attribute comprehensive feature set of the target, wherein the attribute comprehensive feature set contains data of a plurality of attribute features;
processing the data of the attribute features in the attribute comprehensive feature set by using the relation among the attributes to obtain the semantics of the attribute features of the target;
the step of processing the data of the plurality of attribute features in the attribute comprehensive feature set by using the relation among the plurality of attributes to obtain the semantics of the plurality of attribute features of the target comprises the following steps:
processing the data of each attribute feature in the attribute comprehensive feature set into a one-dimensional array, and acquiring an attribute relation matrix of a target by the one-dimensional array corresponding to the data of the plurality of attribute features;
multiplying the attribute relation matrix by a preset coefficient matrix to obtain an attribute relation network of the target, wherein the preset coefficient matrix contains the relation among a plurality of attributes;
classifying and mapping the attributes in the attribute relationship network to obtain the classification probability of the attribute characteristics;
obtaining a classification result of the attribute characteristics according to the classification probability of the attribute characteristics;
and converting the classification result of the attribute features into semantics to obtain the semantics of a plurality of attribute features of the target.
2. The method of claim 1, wherein the step of obtaining the video image comprises:
acquiring a video image from video monitoring equipment, wherein the video monitoring equipment at least comprises a camera;
the target at least comprises one of a motor vehicle, a non-motor vehicle and a pedestrian or any combination thereof.
3. The method of claim 1, wherein the detection result of the target at least comprises: the type of the target, the size of the target, and the location of the target in the video image.
4. The method according to claim 3, wherein the step of extracting the attribute features of the target from the video image by using the detection result of the target to obtain the attribute comprehensive feature set of the target comprises:
extracting an image of the target from the video image by using the detection result of the target;
converting the format of the image of the target to obtain image data of a specified format corresponding to the target;
acquiring a universal attribute set corresponding to the type of the target in a plurality of preset universal attribute sets, wherein the universal attribute set comprises a plurality of attributes;
calculating the image data of the target by using the general attribute set corresponding to the type of the target to obtain a general attribute feature set of the target, wherein the general attribute feature set comprises a plurality of attribute features;
and respectively calculating the attribute features in the general attribute feature set to obtain data of a plurality of attribute features, and integrating the data of the attribute features into an attribute comprehensive feature set of a target.
5. The method of claim 1, wherein after obtaining the semantics of the plurality of attribute features of the target, the method further comprises:
transmitting semantics of a plurality of attribute features of the target to an application device.
6. An apparatus for obtaining target attribute feature semantics, applied to the field of intelligent video monitoring, characterized by comprising:
a video image acquisition module, configured to acquire a video image, wherein the video image contains at least one target;
the target detection module is used for obtaining a detection result of a target from the video image;
the attribute feature extraction module is used for extracting the attribute features of the target from the video image by using the detection result of the target to obtain an attribute comprehensive feature set of the target, wherein the attribute comprehensive feature set contains data of a plurality of attribute features;
the attribute collaborative judgment module is used for processing the data of the attribute features in the attribute comprehensive feature set by utilizing the relation among the attributes to obtain the semantics of the attribute features of the target;
the attribute collaborative judgment module comprises:
the attribute relationship matrix acquisition submodule is used for processing the data of each attribute feature in the attribute comprehensive feature set into a one-dimensional array, and acquiring an attribute relationship matrix of a target by the one-dimensional array corresponding to the data of a plurality of attribute features;
the attribute relation network acquisition submodule is used for multiplying the attribute relation matrix by a preset coefficient matrix to obtain an attribute relation network of a target, wherein the preset coefficient matrix contains the relation among a plurality of attributes;
the classification mapping submodule is used for performing classification mapping on the attributes in the attribute relation network to obtain the classification probability of the attribute characteristics;
the classification result acquisition submodule is used for acquiring a classification result of the attribute characteristics according to the classification probability of the attribute characteristics;
and the semantic conversion submodule is used for converting the classification result of the attribute features into semantics to obtain the semantics of the plurality of attribute features of the target.
7. The apparatus of claim 6, wherein the video image acquisition module is specifically configured to:
acquiring a video image from video monitoring equipment, wherein the video monitoring equipment at least comprises a camera;
the target at least comprises one of a motor vehicle, a non-motor vehicle and a pedestrian or any combination thereof.
8. The apparatus of claim 6, wherein the detection result of the target at least comprises: the type of the target, the size of the target, and the location of the target in the video image.
9. The apparatus of claim 8, wherein the attribute feature extraction module comprises:
the image extraction submodule is used for extracting an image of the target from the video image by using the detection result of the target;
the format conversion submodule is used for converting the format of the image of the target to obtain image data of a specified format corresponding to the target;
the universal attribute set acquisition submodule is used for acquiring a universal attribute set corresponding to the type of the target in a plurality of preset universal attribute sets, wherein each universal attribute set comprises a plurality of attributes;
the universal attribute feature set acquisition submodule is used for calculating the image data of the target by utilizing a universal attribute set corresponding to the type of the target to obtain a universal attribute feature set of the target, wherein the universal attribute feature set comprises a plurality of attribute features;
and the attribute comprehensive characteristic set acquisition submodule is used for respectively calculating the attribute characteristics in the universal attribute characteristic set to acquire data of a plurality of attribute characteristics and integrating the data of the attribute characteristics into a target attribute comprehensive characteristic set.
10. The apparatus of claim 6, further comprising:
a sending module, configured to send semantics of the plurality of attribute features of the target to an application device.
CN201611244945.3A 2016-12-29 2016-12-29 Method and device for obtaining target attribute feature semantics Active CN108256401B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611244945.3A CN108256401B (en) 2016-12-29 2016-12-29 Method and device for obtaining target attribute feature semantics

Publications (2)

Publication Number Publication Date
CN108256401A CN108256401A (en) 2018-07-06
CN108256401B true CN108256401B (en) 2021-03-26


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111976593A (en) * 2020-08-21 2020-11-24 大众问问(北京)信息科技有限公司 Voice prompt method, device, equipment and storage medium for vehicle external object
CN115099684B (en) * 2022-07-18 2023-04-07 江西中科冠物联网科技有限公司 Enterprise safety production management system and management method thereof

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1936885A (en) * 2005-09-21 2007-03-28 富士通株式会社 Natural language component identifying correcting apparatus and method based on morpheme marking
CN103020624A (en) * 2011-09-23 2013-04-03 杭州海康威视系统技术有限公司 Intelligent marking, searching and replaying method and device for surveillance videos of shared lanes
CN103593335A (en) * 2013-09-05 2014-02-19 姜赢 Chinese semantic proofreading method based on ontology consistency verification and reasoning
CN103810266A (en) * 2014-01-27 2014-05-21 中国电子科技集团公司第十研究所 Semantic network object identification and judgment method
CN104378539A (en) * 2014-11-28 2015-02-25 华中科技大学 Scene-adaptive video structuring semantic extraction camera and method thereof
CN104992142A (en) * 2015-06-03 2015-10-21 江苏大学 Pedestrian recognition method based on combination of depth learning and property learning
CN105979210A (en) * 2016-06-06 2016-09-28 深圳市深网视界科技有限公司 Pedestrian identification system based on multi-ball multi-gun camera array

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8311973B1 (en) * 2011-09-24 2012-11-13 Zadeh Lotfi A Methods and systems for applications for Z-numbers
GB2519348B (en) * 2013-10-18 2021-04-14 Vision Semantics Ltd Visual data mining

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Correlation-Based Video Semantic Concept Detection Using Multiple Correspondence Analysis"; Lin Lin et al.; 2008 Tenth IEEE International Symposium on Multimedia; 2009-01-09; pp. 316-321 *
"Models of Semantic Representation with Visual Attributes"; Carina Silberer et al.; Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics; 2013-12-31; pp. 572-582 *
"Research and Application of Several Key Issues in Attribute Learning"; Liu Mingxia; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2016-07-15; Vol. 2016, No. 7; I140-23 *
"Research on Description Methods for Semantic Models of Computer Vision Images"; Shi Yuexiang; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2006-06-15; Vol. 2006, No. 6; I138-9 *

Also Published As

Publication number Publication date
CN108256401A (en) 2018-07-06

Similar Documents

Publication Publication Date Title
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
US20190228211A1 (en) AU feature recognition method and device, and storage medium
CN104268583B (en) Pedestrian re-identification method and system based on color region features
CN108388882B (en) Gesture recognition method based on global-local RGB-D multi-mode
CN110728209A (en) Gesture recognition method and device, electronic equipment and storage medium
CN103020992B (en) Video image saliency detection method based on motion-color association
CN106960181B (en) RGBD data-based pedestrian attribute identification method
CN103020985B (en) Video image saliency detection method based on field-quantity analysis
CN109101934A (en) Vehicle model recognition method, device and computer-readable storage medium
Liu et al. Detection of citrus fruit and tree trunks in natural environments using a multi-elliptical boundary model
CN110298281B (en) Video structuring method and device, electronic equipment and storage medium
CN104598871A (en) Correlation regression based face age calculating method
CN110032932B (en) Human body posture identification method based on video processing and decision tree set threshold
CN107273866A (en) Human abnormal behavior recognition method based on a monitoring system
CN112750147A (en) Pedestrian multi-target tracking method and device, intelligent terminal and storage medium
CN115170792B (en) Infrared image processing method, device and equipment and storage medium
CN104951440B (en) Image processing method and electronic equipment
CN106570885A (en) Background modeling method based on brightness and texture fusion threshold value
CN108256401B (en) Method and device for obtaining target attribute feature semantics
CN113065568A (en) Target detection, attribute identification and tracking method and system
CN115035581A (en) Facial expression recognition method, terminal device and storage medium
WO2023279799A1 (en) Object identification method and apparatus, and electronic system
CN110458004B (en) Target object identification method, device, equipment and storage medium
CN115661903B (en) Picture identification method and device based on space mapping collaborative target filtering
Jeong et al. Homogeneity patch search method for voting-based efficient vehicle color classification using front-of-vehicle image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant