CN106920250A - Robot target identification and localization method and system based on RGB-D videos - Google Patents

Robot target identification and localization method and system based on RGB-D videos

Info

Publication number
CN106920250A
CN106920250A (application CN201710078328.9A; granted as CN106920250B)
Authority
CN
China
Prior art keywords
target
frame
video
candidate area
confidence level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710078328.9A
Other languages
Chinese (zh)
Other versions
CN106920250B (en)
Inventor
陶文兵
李坤乾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201710078328.9A priority Critical patent/CN106920250B/en
Publication of CN106920250A publication Critical patent/CN106920250A/en
Application granted granted Critical
Publication of CN106920250B publication Critical patent/CN106920250B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a robot target identification and localization method and system based on RGB-D videos. Through target candidate extraction, identification, temporal-consistency-based reliability estimation, target segmentation optimization, and location estimation, the target class is determined in the scene and an accurate spatial position is obtained. The invention exploits scene depth information to enhance the spatial-hierarchy perception of the identification and localization algorithms; by applying long- and short-term spatiotemporal consistency constraints based on key frames, it improves video processing efficiency while ensuring the identity and association of targets across long-sequence identification and localization tasks. During localization, the target is accurately segmented in the image plane and its positional consistency in space is evaluated with depth information, realizing collaborative target localization across multiple information modalities. The method has low computational cost, good real-time performance, and high identification and localization accuracy, and can be applied to robot tasks based on online visual-information parsing and understanding.

Description

Robot target identification and localization method and system based on RGB-D videos
Technical field
The invention belongs to the technical field of computer vision, and more particularly relates to a robot target identification and localization method and system based on RGB-D videos.
Background technology
In recent years, with the rapid development of robotics, machine vision techniques for robot manipulation tasks have also received extensive attention from researchers. Target identification and accurate localization are a key part of the robot vision problem and a precondition for executing subsequent tasks.
Existing target identification methods generally comprise two steps: extracting information about the target to be identified as a representation, and matching that representation against the scene to be recognized. Traditional representations of the target generally rely on geometry, target appearance, or extracted local features; such methods often suffer from poor generality, insufficient stability, and weak target abstraction. These representational defects in turn create difficulties that are hard to overcome in the subsequent matching process.
After the representation of the target to be identified is obtained, target matching compares that representation with features of the scene to be recognized in order to identify the target. Broadly, existing methods fall into two classes: region matching and feature matching. Region-based matching compares information extracted from local image subregions, and its computational cost is proportional to the number of subregions to be matched; feature-based methods match characteristic features in the image, and their matching accuracy is closely tied to the effectiveness of the feature representation. Both classes place high demands on candidate region acquisition and feature representation, but owing to the limitations of two-dimensional image information and structural features, they often perform poorly in the complex-environment identification tasks faced by robot manipulators.
Target localization is widespread in industrial production and daily life, for example GPS in outdoor activities, military radar surveillance, and shipborne sonar; such equipment is accurate and has a very wide operating range, but is expensive. Vision-based localization systems have become a new research hotspot in recent years. According to the vision sensor used, they are mainly divided into methods based on monocular vision sensors, binocular and depth sensors, and panoramic vision sensors. Monocular vision sensors are cheap, structurally simple, and easy to calibrate, but their localization accuracy is often poor; panoramic vision sensors can acquire complete scene information and achieve higher accuracy, but are computationally heavy, poor in real-time performance, and complex and expensive; depth estimation based on binocular vision or dedicated depth acquisition devices offers strong scene-distance perception, relatively simple systems, and easily achieved real-time performance, and has attracted growing attention in recent years. Research in this area is still at an early stage, however, and efficient target localization methods capable of processing RGB-Depth videos in real time are still lacking.
Because of their high demand for depth perception, most existing robot systems already acquire RGB-Depth videos as their visual information source; depth provides rich information for three-dimensional scene perception, hierarchical decomposition of complex targets, and localization. However, owing to the complexity of robot operating scenes, the high computational complexity, and the large amount of computation, there is as yet no systematic, fast, and convenient method for RGB-Depth video target identification and accurate localization. Research on indoor robot target identification and precise localization algorithms based on RGB-Depth videos therefore has not only strong research value but also broad application prospects.
The content of the invention
In view of the above defects or improvement requirements of the prior art, the invention provides a robot target identification and localization method and system based on RGB-D videos, which processes the RGB-Depth video obtained from the robot's viewpoint to realize real-time, accurate target identification and precise localization of the target in the robot's working environment, thereby assisting complex robot tasks such as target grasping. This solves the technical problem that efficient target localization methods capable of processing RGB-Depth videos in real time are currently lacking.
To achieve the above object, according to one aspect of the present invention, there is provided a robot target identification and localization method based on RGB-D videos, comprising:
(1) obtaining the RGB-D video frame sequence of the scene where the target to be identified and localized is located;
(2) extracting key video frames from the RGB-D video frame sequence, extracting target candidate regions from the key video frames, and filtering and screening the target candidate regions according to the depth information corresponding to each key video frame;
(3) identifying the filtered target candidate regions with a deep network, and ranking the target identification results by confidence through long-temporal spatiotemporal association constraints and multi-frame identification consistency estimation;
(4) performing local fast segmentation on the filtered target candidate regions; according to the confidence of the target identification results and the temporal interval relation of the key video frames, selecting principal video frames from the key video frames, and performing forward/backward adjacent-frame extension and collaborative optimization on the segmented regions;
(5) determining key feature points in the scene as localization reference points, estimating the camera viewing angle and camera motion, applying target feature consistency constraints and target position consistency constraints to the identification and segmentation results of the principal video frames, estimating the collaborative confidence of the target to be identified and localized, and performing accurate spatial localization.
Preferably, step (2) specifically comprises:
(2.1) determining, by interval sampling or a key frame extraction method, the key video frames used to identify and localize the target;
(2.2) obtaining the target candidate regions in the key video frames using a confidence ranking method based on an objectness prior to form a target candidate region set; using the depth information corresponding to each key video frame to obtain the hierarchy attributes inside each target candidate region and in its neighborhood, and optimizing, screening, and re-ranking the target candidate region set.
Preferably, step (3) specifically comprises:
(3.1) feeding the target candidate regions screened in step (2) into a trained target identification deep network, and obtaining the target identification prediction of the key video frame corresponding to each screened target candidate region together with a first confidence of each prediction;
(3.2) according to the long-temporal spatiotemporal association constraint, performing feature consistency evaluation on the target identification predictions of the key video frames, evaluating a second confidence of each prediction, ranking the accumulated confidence obtained from the first confidence and the second confidence, and further filtering out target candidate regions whose accumulated confidence is below a preset confidence threshold.
Preferably, step (4) specifically comprises:
(4.1) performing a fast target segmentation operation on the target candidate regions obtained in step (3.2) and their extended neighborhoods to obtain an initial segmentation of the target and determine the target boundary;
(4.2) with short-term spatiotemporal consistency as the constraint, screening principal video frames from the key video frames based on the accumulated confidence ranking results of step (3.2);
(4.3) with long-term spatiotemporal consistency as the constraint, building an appearance model of the target to be identified and localized based on the initial segmentation of step (4.1), constructing a three-dimensional graph structure over the principal video frames and their adjacent frames, designing a maximum a posteriori probability–Markov random field (MAP-MRF) energy function, optimizing the initial segmentation with a graph cut algorithm, and extending and optimizing the single-frame target segmentation result in the adjacent frames before and after that frame.
Preferably, step (5) specifically comprises:
(5.1) for the principal video frames obtained in step (4.2), extracting multiple groups of corresponding point pairs as localization reference points according to the adjacency and field-of-view overlap relations between the principal video frames;
(5.2) estimating the camera viewing-angle change from the principal video frames with overlapping fields of view, and then, through geometric relations, estimating the camera motion information using the depth information of the localization reference point pairs;
(5.3) evaluating the spatial position consistency of the target to be identified and localized across the principal video frames according to the measured depth information of the target, the camera viewing angle, and the camera motion information in the principal video frames;
(5.4) evaluating, according to the result of step (4.3), the feature consistency of the two-dimensional segmented regions of the target to be identified and localized;
(5.5) determining the spatial position of the target to be identified and localized by jointly evaluating the feature consistency of its two-dimensional segmented regions and its spatial position consistency.
According to another aspect of the invention, there is provided a robot target identification and localization system based on RGB-D videos, comprising:
an acquisition module for obtaining the RGB-D video frame sequence of the scene where the target to be identified and localized is located;
a filtering and screening module for extracting the key video frames from the RGB-D video frame sequence, extracting target candidate regions from the key video frames, and filtering and screening the target candidate regions according to the depth information corresponding to each key video frame;
a confidence ranking module for identifying the filtered target candidate regions with a deep network and ranking the target identification results by confidence through long-temporal spatiotemporal association constraints and multi-frame identification consistency estimation;
an optimization module for performing local fast segmentation on the filtered target candidate regions, selecting principal video frames from the key video frames according to the confidence of the target identification results and the temporal interval relation of the key video frames, and performing forward/backward adjacent-frame extension and collaborative optimization on the segmented regions;
a localization module for determining key feature points in the scene as localization reference points, estimating the camera viewing angle and camera motion, applying target feature consistency constraints and target position consistency constraints to the identification and segmentation results of the principal video frames, estimating the collaborative confidence of the target to be identified and localized, and performing accurate spatial localization.
In general, compared with the prior art, the above technical scheme conceived by the present invention has the following technical advantages: the invention exploits scene depth information to enhance the spatial-hierarchy perception of the identification and localization algorithms; by applying long- and short-term spatiotemporal consistency constraints based on key frames, it improves video processing efficiency while ensuring the identity and association of targets across long-sequence identification and localization tasks. During localization, the target is accurately segmented in the image plane and its positional consistency in space is evaluated with depth information, realizing collaborative target localization across multiple information modalities. The method has low computational cost, good real-time performance, and high identification and localization accuracy, and can be applied to robot tasks based on online visual-information parsing and understanding.
Brief description of the drawings
Fig. 1 is an overall flow diagram of the method according to an embodiment of the present invention;
Fig. 2 is a flow diagram of target identification in an embodiment of the present invention;
Fig. 3 is a flow diagram of accurate target localization in an embodiment of the present invention.
Specific embodiment
In order to make the purpose, technical scheme, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the invention described below may be combined with each other as long as they do not conflict.
The method disclosed by the invention involves key frame screening, deep-network-based target identification, segmentation, inter-frame label propagation, consistency-constrained location estimation, and collaborative optimization. It can be directly used in robot systems that take RGB-D videos as visual input, assisting the robot in completing target identification and accurate target localization tasks.
Fig. 1 is an overall flow diagram of the method of this embodiment of the present invention. As can be seen from Fig. 1, the method comprises two major stages, target identification and accurate target localization, where target identification is the precondition of accurate target localization. The specific embodiments are as follows:
(1) obtaining the RGB-D video frame sequence of the scene where the target to be identified and localized is located;
Preferably, in an embodiment of the invention, the RGB-D video sequence of the scene where the target to be identified and localized is located can be acquired with a depth vision sensor such as Kinect; alternatively, RGB image pairs can be acquired with a binocular imaging device, and the scene depth information computed by disparity estimation serves as the depth channel, so that an RGB-D video is synthesized as input.
(2) extracting key video frames from the RGB-D video frame sequence, extracting target candidate regions from the key video frames, and filtering and screening the target candidate regions according to the depth information corresponding to each key video frame;
(3) identifying the filtered target candidate regions with a deep network, and ranking the target identification results by confidence through long-temporal spatiotemporal association constraints and multi-frame identification consistency estimation;
(4) performing local fast segmentation on the filtered target candidate regions; according to the confidence of the target identification results and the temporal interval relation of the key video frames, selecting principal video frames from the key video frames, and performing forward/backward adjacent-frame extension and collaborative optimization on the segmented regions;
(5) determining key feature points in the scene as localization reference points, estimating the camera viewing angle and camera motion, applying target feature consistency constraints and target position consistency constraints to the identification and segmentation results of the principal video frames, estimating the collaborative confidence of the target to be identified and localized, and performing accurate spatial localization.
Preferably, in one embodiment of the invention, the above step (1) specifically comprises:
(1.1) acquiring with Kinect the RGB-D video sequence of the scene where the target to be identified and localized is located, filling holes in the depth image by neighborhood-sampling smoothing, correcting the depth according to the Kinect parameters and converting it to real depth information, which serves as input together with the RGB data;
(1.2) when image pairs are acquired with a binocular device, performing in sequence camera calibration and stereo matching (feature extraction on the image pair, extraction of corresponding points of the same physical structure, and disparity computation), and finally estimating depth through the projection model as the input to the depth channel of the video.
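For illustration only (this sketch is not part of the claimed method), step (1.2) could be realized with OpenCV on an already rectified image pair roughly as follows; the matcher choice and the fx and baseline values are assumptions standing in for the calibration results.

```python
import cv2
import numpy as np

def depth_from_stereo(left_gray, right_gray, fx=525.0, baseline_m=0.06):
    """Estimate a depth map (in meters) from a rectified grayscale stereo pair.

    fx and baseline_m are placeholder calibration values; in practice they
    come from the camera calibration step described in (1.2).
    """
    # Semi-global block matching yields a dense disparity map.
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=64,   # search range; must be divisible by 16
        blockSize=5,
    )
    # OpenCV returns fixed-point disparity scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0

    # Projection model: depth = fx * baseline / disparity.
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = fx * baseline_m / disparity[valid]
    return depth
```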
Preferably, in one embodiment of the invention, the above step (2) specifically comprises:
(2.1) determining, by interval sampling or a key frame extraction method, the key video frames used to identify and localize the target;
Specifically, step (2.1) uses fast scale-invariant feature transform (SIFT) point matching to obtain the scene overlap ratio between adjacent frames and thereby estimate the rate of change of the currently captured scene: for video frames whose scene changes quickly, the sampling frequency is increased, and for video frames whose scene changes slowly, the sampling frequency is reduced. In addition, when the application places high demands on algorithmic efficiency, interval sampling can directly replace this step.
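As an illustrative sketch only (the patent fixes neither the matcher nor any thresholds), the adaptive key-frame sampling of step (2.1) could be approximated as follows; the ratio-test threshold and the interval-update rule are assumptions.

```python
import cv2

def frame_overlap_ratio(frame_a, frame_b, ratio=0.75):
    """Estimate the scene overlap ratio of two frames from SIFT matches."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(frame_a, None)
    kp_b, des_b = sift.detectAndCompute(frame_b, None)
    if des_a is None or des_b is None:
        return 0.0
    good = []
    for pair in cv2.BFMatcher().knnMatch(des_a, des_b, k=2):
        # Lowe's ratio test keeps only distinctive matches.
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return len(good) / max(1, min(len(kp_a), len(kp_b)))

def next_sampling_interval(overlap, base_interval=10):
    """Sample densely when the scene changes fast (low overlap)."""
    if overlap < 0.3:        # fast scene change: raise sampling frequency
        return max(1, base_interval // 4)
    if overlap > 0.8:        # slow scene change: lower sampling frequency
        return base_interval * 2
    return base_interval
```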
(2.2) obtaining the target candidate regions in the key video frames using a confidence ranking method based on an objectness prior to form a target candidate region set; using the depth information corresponding to each key video frame to obtain the hierarchy attributes inside each target candidate region and in its neighborhood, and optimizing, screening, and re-ranking the target candidate region set.
Here the confidence ranking method based on an objectness prior can be the BING algorithm or the Edge Boxes algorithm. As shown in Fig. 2, the depth information of the corresponding frame is then used to obtain the hierarchy attributes inside each target candidate region and in its neighborhood; following the principle that a high-confidence candidate box has smooth depth in its interior and a large depth-information gradient across its boundary, the target candidate region set is optimized, screened, and re-ranked.
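Purely as a sketch of the depth-based screening principle stated above (smooth interior depth, large boundary depth gradient), candidate boxes could be scored and re-ranked as below; the scoring formula is an assumption of this sketch, not the patented criterion.

```python
import numpy as np

def depth_consistency_score(depth, box):
    """Score a candidate box (x, y, w, h) lying inside the image: smooth
    interior depth and a strong depth gradient along the box boundary."""
    x, y, w, h = box
    inner = depth[y + 1:y + h - 1, x + 1:x + w - 1]
    inner = inner[inner > 0]                    # ignore missing depth
    if inner.size == 0:
        return 0.0
    smoothness = 1.0 / (1.0 + np.std(inner))    # low variance -> smooth

    gy, gx = np.gradient(depth)                 # depth gradient field
    grad = np.hypot(gx, gy)
    border = np.concatenate([
        grad[y, x:x + w], grad[y + h - 1, x:x + w],   # top, bottom edges
        grad[y:y + h, x], grad[y:y + h, x + w - 1],   # left, right edges
    ])
    return smoothness * float(np.mean(border))

def rerank_candidates(depth, boxes):
    """Re-rank the candidate region set by depth-consistency score."""
    return sorted(boxes, key=lambda b: depth_consistency_score(depth, b),
                  reverse=True)
```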
Preferably, in one embodiment of the invention, the above step (3) specifically comprises:
(3.1) as shown in Fig. 2, feeding the target candidate regions screened in step (2) into a trained target identification deep network, and obtaining the target identification prediction of the key video frame corresponding to each screened target candidate region together with a first confidence of each prediction;
Here the trained target identification deep network can be a deep recognition network such as SPP-Net, R-CNN, or Fast R-CNN, and can also be replaced by other deep recognition networks.
(3.2) according to the long-temporal spatiotemporal association constraint, performing feature consistency evaluation on the target identification predictions of the key video frames, evaluating a second confidence of each prediction, ranking the accumulated confidence obtained from the first confidence and the second confidence, and further filtering out target candidate regions whose accumulated confidence is below a preset confidence threshold.
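The patent does not state how the first and second confidences are combined. The sketch below assumes, for illustration only, that the second confidence is the mean cosine similarity between a region's deep feature and the matched regions of neighboring key frames, and that the accumulated confidence is the product of the two; the threshold value is likewise an assumption.

```python
import numpy as np

def second_confidence(feature, neighbor_features):
    """Feature-consistency confidence: mean cosine similarity between a
    region's deep feature and the matched regions in neighboring key frames."""
    f = feature / (np.linalg.norm(feature) + 1e-8)
    sims = [float(f @ (g / (np.linalg.norm(g) + 1e-8)))
            for g in neighbor_features]
    return float(np.mean(sims)) if sims else 0.0

def filter_by_accumulated_confidence(regions, threshold=0.5):
    """Keep regions whose accumulated confidence exceeds the preset threshold.

    Each region is a dict with 'conf1' (network score), 'feat', and
    'neighbor_feats'; the product combination is an assumption of this sketch.
    """
    kept = []
    for r in regions:
        conf2 = second_confidence(r['feat'], r['neighbor_feats'])
        r['conf'] = r['conf1'] * max(0.0, conf2)
        if r['conf'] >= threshold:
            kept.append(r)
    return sorted(kept, key=lambda r: r['conf'], reverse=True)
```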
Optionally, in one embodiment of the invention, the detection and recognition result of the target to be identified and localized can be obtained by issuing a recognition instruction to the algorithm, and algorithm efficiency can be improved by filtering out low-confidence recognition results.
Optionally, in one embodiment of the invention, the above step (4) specifically comprises:
(4.1) as shown in Fig. 3, performing a fast target segmentation operation on the target candidate regions obtained in step (3.2) and their extended neighborhoods to obtain an initial segmentation of the target and determine the target boundary;
As an optional implementation, the GrabCut segmentation algorithm based on RGB-D information can be used to perform the fast target segmentation operation and obtain the initial segmentation of the target, thereby obtaining the two-dimensional localization result of the target in the current video frame.
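As a simplified stand-in for the RGB-D GrabCut variant mentioned above, the sketch below runs standard color-only GrabCut from OpenCV initialized with a candidate box; extending the appearance models with the depth channel, as the patent does, is left out of this sketch.

```python
import cv2
import numpy as np

def initial_segmentation(bgr, box, iters=5):
    """Fast initial segmentation of a candidate box with standard GrabCut.

    `bgr` is an 8-bit 3-channel image and `box` is (x, y, w, h). Returns a
    binary foreground mask. Note: this is color-only GrabCut; the patented
    variant also exploits the depth channel.
    """
    mask = np.zeros(bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(bgr, mask, box, bgd_model, fgd_model,
                iters, cv2.GC_INIT_WITH_RECT)
    # Pixels labeled definite/probable foreground form the target region.
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
    return fg.astype(np.uint8)
```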
(4.2) as shown in Fig. 3, in order to further improve video target localization efficiency, with short-term spatiotemporal consistency as the constraint and based on the accumulated confidence ranking results of step (3.2), screening principal video frames from the key video frames using the criterion that single-frame recognition confidence is high and adjacent-frame spatiotemporal consistency is strong;
(4.3) with long-term spatiotemporal consistency as the constraint, building an appearance model of the target to be identified and localized based on the initial segmentation of step (4.1), constructing a three-dimensional graph structure over the principal video frames and their adjacent frames, designing a maximum a posteriori probability–Markov random field (MAP-MRF) energy function, optimizing the initial segmentation with a graph cut algorithm, and extending the single-frame target segmentation result into the adjacent frames before and after that frame, thereby realizing two-dimensional target segmentation and localization optimization based on long- and short-term spatiotemporal consistency.
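The patent does not write out the MAP-MRF energy. For orientation only, a standard form of such an energy over binary pixel labels $x_i$ on the three-dimensional graph spanning a principal frame and its adjacent frames would be

$$E(\mathbf{x}) = \sum_{i} -\log P(x_i \mid \theta_{\mathrm{fg}}, \theta_{\mathrm{bg}}) \;+\; \lambda \sum_{(i,j)\in\mathcal{N}} [\,x_i \neq x_j\,]\, \exp\!\left(-\beta\, \lVert I_i - I_j \rVert^2\right),$$

where the unary term scores each pixel under the foreground/background appearance models of step (4.3), the pairwise term penalizes label changes between spatially and temporally adjacent pixels, and minimizing $E(\mathbf{x})$ with a graph cut yields the optimized, temporally extended segmentation. The specific potentials used by the invention may differ.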
Optionally, in one embodiment of the invention, the above step (5) specifically comprises:
(5.1) as shown in Fig. 3, for the principal video frames obtained in step (4.2), extracting multiple groups of corresponding point pairs as localization reference points according to the adjacency and field-of-view overlap relations between the principal video frames;
(5.2) estimating the camera viewing-angle change from the principal video frames with overlapping fields of view, and then, through geometric relations, estimating the camera motion information using the depth information of the localization reference point pairs;
Here the camera motion information includes the camera displacement and motion trajectory.
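As an illustrative sketch (the estimator is not prescribed by the patent), the matched reference points can be back-projected to 3-D using the depth map and camera intrinsics, and the relative camera motion recovered as the rigid transform aligning the two point sets, here via the Kabsch/SVD solution:

```python
import numpy as np

def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with depth z to camera coords."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def rigid_motion(points_a, points_b):
    """Least-squares rigid transform (R, t) with points_b ~ R @ points_a + t.

    points_a, points_b: (N, 3) arrays of corresponding 3-D reference points
    from two principal frames. Kabsch/SVD solution.
    """
    ca, cb = points_a.mean(axis=0), points_b.mean(axis=0)
    H = (points_a - ca).T @ (points_b - cb)     # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # fix an improper rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cb - R @ ca
    return R, t                                  # camera displacement is ||t||
```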
(5.3) as shown in Fig. 3, evaluating the spatial position consistency of the target to be identified and localized across the principal video frames according to the measured depth information of the target, the camera viewing angle, and the camera motion information in the principal video frames;
(5.4) evaluating, according to the result of step (4.3), the feature consistency of the two-dimensional segmented regions of the target to be identified and localized, where a region-based deep network extracts regional deep features that are generally used for feature distance measurement and feature consistency evaluation;
(5.5) determining the spatial position of the target to be identified and localized by jointly evaluating the feature consistency of its two-dimensional segmented regions and its spatial position consistency.
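Finally, as a sketch of steps (5.3) through (5.5) under stated assumptions (the weighting scheme and the use of the per-axis median are illustrative choices, not the patented rule), the collaborative confidence and the spatial position could be computed as:

```python
import numpy as np

def normalized(f):
    return f / (np.linalg.norm(f) + 1e-8)

def collaborative_localization(world_centroids, region_feats, w_spatial=0.5):
    """Jointly evaluate spatial and feature consistency over principal frames.

    world_centroids: (N, 3) target centroids, each back-projected with the
    frame's depth and mapped into a common world frame via that frame's
    estimated camera motion (R, t).
    region_feats: list of N deep features of the 2-D segmented regions.
    """
    # Spatial consistency: tight clustering of the transformed 3-D centroids.
    spread = float(np.mean(np.std(world_centroids, axis=0)))
    spatial_consistency = 1.0 / (1.0 + spread)

    # Feature consistency: mean pairwise cosine similarity of region features.
    feats = [normalized(f) for f in region_feats]
    sims = [float(feats[i] @ feats[j])
            for i in range(len(feats)) for j in range(i + 1, len(feats))]
    feature_consistency = float(np.mean(sims)) if sims else 0.0

    confidence = (w_spatial * spatial_consistency
                  + (1.0 - w_spatial) * feature_consistency)
    # Robust spatial position estimate: per-axis median of the centroids.
    position = np.median(world_centroids, axis=0)
    return position, confidence
```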
In one embodiment of the invention, a robot target identification and localization system based on RGB-D videos is disclosed, the system comprising:
an acquisition module for obtaining the RGB-D video frame sequence of the scene where the target to be identified and localized is located;
a filtering and screening module for extracting the key video frames from the RGB-D video frame sequence, extracting target candidate regions from the key video frames, and filtering and screening the target candidate regions according to the depth information corresponding to each key video frame;
a confidence ranking module for identifying the filtered target candidate regions with a deep network and ranking the target identification results by confidence through long-temporal spatiotemporal association constraints and multi-frame identification consistency estimation;
an optimization module for performing local fast segmentation on the filtered target candidate regions, selecting principal video frames from the key video frames according to the confidence of the target identification results and the temporal interval relation of the key video frames, and performing forward/backward adjacent-frame extension and collaborative optimization on the segmented regions;
a localization module for determining key feature points in the scene as localization reference points, estimating the camera viewing angle and camera motion, applying target feature consistency constraints and target position consistency constraints to the identification and segmentation results of the principal video frames, estimating the collaborative confidence of the target to be identified and localized, and performing accurate spatial localization.
The specific implementation of each module may refer to the description of the method embodiments, and the embodiments of the present invention will not repeat it.
It will be readily understood by those skilled in the art that the foregoing is only preferred embodiments of the present invention and is not intended to limit the present invention; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall all be included within the protection scope of the present invention.

Claims (6)

1. A robot target identification and localization method based on RGB-D videos, characterized by comprising:
(1) obtaining the RGB-D video frame sequence of the scene where the target to be identified and localized is located;
(2) extracting key video frames from the RGB-D video frame sequence, extracting target candidate regions from the key video frames, and filtering and screening the target candidate regions according to the depth information corresponding to each key video frame;
(3) identifying the filtered target candidate regions with a deep network, and ranking the target identification results by confidence through long-temporal spatiotemporal association constraints and multi-frame identification consistency estimation;
(4) performing local fast segmentation on the filtered target candidate regions; according to the confidence of the target identification results and the temporal interval relation of the key video frames, selecting principal video frames from the key video frames, and performing forward/backward adjacent-frame extension and collaborative optimization on the segmented regions;
(5) determining key feature points in the scene as localization reference points, estimating the camera viewing angle and camera motion, applying target feature consistency constraints and target position consistency constraints to the identification and segmentation results of the principal video frames, estimating the collaborative confidence of the target to be identified and localized, and performing accurate spatial localization.
2. The method according to claim 1, characterized in that step (2) specifically comprises:
(2.1) determining, by interval sampling or a key frame extraction method, the key video frames used to identify and localize the target;
(2.2) obtaining the target candidate regions in the key video frames using a confidence ranking method based on an objectness prior to form a target candidate region set; using the depth information corresponding to each key video frame to obtain the hierarchy attributes inside each target candidate region and in its neighborhood, and optimizing, screening, and re-ranking the target candidate region set.
3. The method according to claim 2, characterized in that step (3) specifically comprises:
(3.1) feeding the target candidate regions screened in step (2) into a trained target identification deep network, and obtaining the target identification prediction of the key video frame corresponding to each screened target candidate region together with a first confidence of each prediction;
(3.2) according to the long-temporal spatiotemporal association constraint, performing feature consistency evaluation on the target identification predictions of the key video frames, evaluating a second confidence of each prediction, ranking the accumulated confidence obtained from the first confidence and the second confidence, and further filtering out target candidate regions whose accumulated confidence is below a preset confidence threshold.
4. The method according to claim 3, characterized in that step (4) specifically comprises:
(4.1) performing a fast target segmentation operation on the target candidate regions obtained in step (3.2) and their extended neighborhoods to obtain an initial segmentation of the target and determine the target boundary;
(4.2) with short-term spatiotemporal consistency as the constraint, screening principal video frames from the key video frames based on the accumulated confidence ranking results of step (3.2);
(4.3) with long-term spatiotemporal consistency as the constraint, building an appearance model of the target to be identified and localized based on the initial segmentation of step (4.1), constructing a three-dimensional graph structure over the principal video frames and their adjacent frames, designing a maximum a posteriori probability–Markov random field energy function, optimizing the initial segmentation with a graph cut algorithm, and extending and optimizing the single-frame target segmentation result in the adjacent frames before and after that frame.
5. The method according to claim 4, characterized in that step (5) specifically comprises:
(5.1) for the principal video frames obtained in step (4.2), extracting multiple groups of corresponding point pairs as localization reference points according to the adjacency and field-of-view overlap relations between the principal video frames;
(5.2) estimating the camera viewing-angle change from the principal video frames with overlapping fields of view, and then, through geometric relations, estimating the camera motion information using the depth information of the localization reference point pairs;
(5.3) evaluating the spatial position consistency of the target to be identified and localized across the principal video frames according to the measured depth information of the target, the camera viewing angle, and the camera motion information in the principal video frames;
(5.4) evaluating, according to the result of step (4.3), the feature consistency of the two-dimensional segmented regions of the target to be identified and localized;
(5.5) determining the spatial position of the target to be identified and localized by jointly evaluating the feature consistency of its two-dimensional segmented regions and its spatial position consistency.
6. A robot target identification and localization system based on RGB-D videos, characterized by comprising:
an acquisition module for obtaining the RGB-D video frame sequence of the scene where the target to be identified and localized is located;
a filtering and screening module for extracting the key video frames from the RGB-D video frame sequence, extracting target candidate regions from the key video frames, and filtering and screening the target candidate regions according to the depth information corresponding to each key video frame;
a confidence ranking module for identifying the filtered target candidate regions with a deep network and ranking the target identification results by confidence through long-temporal spatiotemporal association constraints and multi-frame identification consistency estimation;
an optimization module for performing local fast segmentation on the filtered target candidate regions, selecting principal video frames from the key video frames according to the confidence of the target identification results and the temporal interval relation of the key video frames, and performing forward/backward adjacent-frame extension and collaborative optimization on the segmented regions;
a localization module for determining key feature points in the scene as localization reference points, estimating the camera viewing angle and camera motion, applying target feature consistency constraints and target position consistency constraints to the identification and segmentation results of the principal video frames, estimating the collaborative confidence of the target to be identified and localized, and performing accurate spatial localization.
CN201710078328.9A 2017-02-14 2017-02-14 Robot target identification and localization method and system based on RGB-D video Active CN106920250B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710078328.9A CN106920250B (en) 2017-02-14 2017-02-14 Robot target identification and localization method and system based on RGB-D video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710078328.9A CN106920250B (en) 2017-02-14 2017-02-14 Robot target identification and localization method and system based on RGB-D video

Publications (2)

Publication Number Publication Date
CN106920250A true CN106920250A (en) 2017-07-04
CN106920250B CN106920250B (en) 2019-08-13

Family

ID=59453597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710078328.9A Active CN106920250B (en) 2017-02-14 2017-02-14 Robot target identification and localization method and system based on RGB-D video

Country Status (1)

Country Link
CN (1) CN106920250B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110013807A1 (en) * 2009-07-17 2011-01-20 Samsung Electronics Co., Ltd. Apparatus and method for recognizing subject motion using a camera
US20160132754A1 (en) * 2012-05-25 2016-05-12 The Johns Hopkins University Integrated real-time tracking system for normal and anomaly tracking and the methods therefor
CN104598890A (en) * 2015-01-30 2015-05-06 南京邮电大学 Human body behavior recognition method based on RGB-D video
CN104867161A (en) * 2015-05-14 2015-08-26 国家电网公司 Video-processing method and device
CN105589974A (en) * 2016-02-04 2016-05-18 通号通信信息集团有限公司 Surveillance video retrieval method and system based on Hadoop platform
CN105931270A (en) * 2016-04-27 2016-09-07 石家庄铁道大学 Video keyframe extraction method based on movement trajectory analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Zhiguo, Liu Liman, Tao Wenbing, et al., "Confidence-driven infrared target detection", Infrared Physics & Technology *
Zhongwei Guo et al., "Battlefield Video Target Mining", International Congress on Image & Signal Processing *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108214487A (en) * 2017-12-16 2018-06-29 广西电网有限责任公司电力科学研究院 Robot target positioning and grasping method based on binocular vision and laser radar
CN109977981A (en) * 2017-12-27 2019-07-05 深圳市优必选科技有限公司 Scene analysis method based on binocular vision, robot and storage device
CN109977981B (en) * 2017-12-27 2020-11-24 深圳市优必选科技有限公司 Scene analysis method based on binocular vision, robot and storage device
CN108304808A (en) * 2018-02-06 2018-07-20 广东顺德西安交通大学研究院 Surveillance video object detection method based on spatiotemporal information and deep network
CN108304808B (en) * 2018-02-06 2021-08-17 广东顺德西安交通大学研究院 Monitoring video object detection method based on temporal-spatial information and deep network
CN108627816A (en) * 2018-02-28 2018-10-09 沈阳上博智像科技有限公司 Image distance measuring method, device, storage medium and electronic equipment
CN108460790A (en) * 2018-03-29 2018-08-28 西南科技大学 Visual tracking method based on a consistency predictor model
CN108981698A (en) * 2018-05-29 2018-12-11 杭州视氪科技有限公司 Visual positioning method based on multi-modal data
CN108981698B (en) * 2018-05-29 2020-07-14 杭州视氪科技有限公司 Visual positioning method based on multi-mode data
CN110675421A (en) * 2019-08-30 2020-01-10 电子科技大学 Depth image collaborative segmentation method based on few labeling frames
CN110675421B (en) * 2019-08-30 2022-03-15 电子科技大学 Depth image collaborative segmentation method based on few labeling frames
CN115091472A (en) * 2022-08-26 2022-09-23 珠海市南特金属科技股份有限公司 Target positioning method based on artificial intelligence and clamping manipulator control system

Also Published As

Publication number Publication date
CN106920250B (en) 2019-08-13

Similar Documents

Publication Publication Date Title
CN106920250B (en) Robot target identification and localization method and system based on RGB-D video
Čech et al. Scene flow estimation by growing correspondence seeds
KR101788225B1 (en) Method and System for Recognition/Tracking Construction Equipment and Workers Using Construction-Site-Customized Image Processing
CN109784130B (en) Pedestrian re-identification method, device and equipment thereof
CN103458261B (en) Video scene variation detection method based on stereoscopic vision
CN104517095B (en) Human head segmentation method based on depth images
CN103164858A (en) Segmentation and tracking method for overlapping crowds based on superpixels and graph models
CN107560592A (en) Precision ranging method for photoelectric tracker linked targets
TWI686748B (en) People-flow analysis system and people-flow analysis method
CN110264493A (en) Multi-target object tracking method and device under motion states
KR101139389B1 (en) Video Analysing Apparatus and Method Using Stereo Cameras
WO2024114119A1 (en) Sensor fusion method based on binocular camera guidance
US11645777B2 (en) Multi-view positioning using reflections
US8989481B2 (en) Stereo matching device and method for determining concave block and convex block
CN112633096B (en) Passenger flow monitoring method and device, electronic equipment and storage medium
Nair Camera-based object detection, identification and distance estimation
CN110415297A (en) Localization method, device and unmanned equipment
CN117456114B (en) Multi-view-based three-dimensional image reconstruction method and system
RU2370817C2 (en) System and method for object tracking
CN114022531A (en) Image processing method, electronic device, and storage medium
CN103679699A (en) Stereo matching method based on translation and combined measurement of salient images
CN108090930A (en) Obstacle vision detection system and method based on binocular stereo camera
CN112767452B (en) Active sensing method and system for camera
JP6548306B2 (en) Image analysis apparatus, program and method for tracking a person appearing in a captured image of a camera
CN110473246B (en) Distance measurement method for multiple occluded targets based on binocular vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant