CN106920250A - Robot target identification and localization method and system based on RGB-D videos - Google Patents
Robot target identification and localization method and system based on RGB-D videos
- Publication number
- CN106920250A CN106920250A CN201710078328.9A CN201710078328A CN106920250A CN 106920250 A CN106920250 A CN 106920250A CN 201710078328 A CN201710078328 A CN 201710078328A CN 106920250 A CN106920250 A CN 106920250A
- Authority
- CN
- China
- Prior art keywords
- target
- frame
- video
- candidate area
- confidence level
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Abstract
The invention discloses a robot target identification and localization method and system based on RGB-D videos. Through the steps of target candidate extraction, recognition, reliability estimation based on temporal consistency, target segmentation optimization, and location estimation, the target category is determined in the scene and an accurate spatial position is obtained. The invention exploits scene depth information to enhance the spatial-hierarchy perception ability of the recognition and localization algorithms. By using long- and short-term spatio-temporal consistency constraints based on key frames, it improves video processing efficiency while ensuring the identity and association of targets across long-sequence recognition and localization tasks. During localization, the target is accurately segmented in the image plane and the positional consistency of the same target is evaluated in space using depth information, realizing cooperative target localization across multiple information modalities. The computational load is small, the real-time performance is good, and the recognition and localization accuracy is high, so the method can be applied to robot tasks based on online visual information parsing and understanding.
Description
Technical field
The invention belongs to the technical field of computer vision, and more particularly relates to a robot target identification and localization method and system based on RGB-D videos.
Background technology
In recent years, with the rapid development of robot technology, machine vision techniques for robot manipulation tasks have also received extensive attention from researchers. Among them, accurate target identification and localization is an important part of the robot vision problem and a precondition for executing subsequent tasks.
Existing target identification methods generally comprise two steps: extracting information about the target to be identified as a representation basis, and matching this representation against the scene to be identified. Traditional representations of the target to be identified typically rely on geometry, target appearance, or extracted local features. Such methods often suffer from poor universality, insufficient stability, and weak target abstraction ability, and these defects in target representation also bring difficulties that are hard to overcome in the subsequent matching process.
After the representation of the target to be identified is obtained, target matching refers to comparing the obtained target representation with the features of the scene to be identified in order to recognize the target. Generally speaking, existing methods fall into two classes: region-based matching and feature-based matching. Region-based matching compares information extracted from local sub-regions of the image, and its computational load is proportional to the number of sub-regions to be matched; feature-based methods match characteristic features in the image, and their matching accuracy is closely related to the validity of the feature representation. Both classes of methods place high demands on candidate region acquisition and feature representation, but owing to the limitations of two-dimensional image information and of their design, they often perform poorly in the complex-environment recognition tasks faced by robots.
Target localization is widely present in industrial production and daily life, for example GPS in outdoor activities, military radar surveillance, and shipboard sonar. Such equipment offers accurate positioning over a very wide operating range, but is expensive. Vision-based localization systems have become a new research hotspot in recent years. According to the vision sensor used, they are mainly divided into localization methods based on monocular vision sensors, binocular and depth sensors, and panoramic vision sensors. Monocular vision sensors are cheap, structurally simple, and easy to calibrate, but their positioning accuracy is often poor. Panoramic vision sensors can capture complete scene information with higher positioning accuracy, but are computationally intensive, poor in real-time performance, and complex and expensive. Depth estimation based on binocular vision and depth-acquisition devices perceive scene distance well, the systems are relatively simple, and real-time operation is easy to achieve, so they have attracted growing attention in recent years. However, research in this field is still at an early stage, and efficient target localization methods capable of processing RGB-Depth videos in real time are still lacking.
Because of their high demand for depth perception, most existing robot systems collect RGB-Depth videos as the source of visual information; depth provides rich cues for three-dimensional scene perception and for the hierarchical division and localization of complex targets. However, because robot operating scenes are complex and the computational complexity and workload are high, there is as yet no systematic, fast, and convenient method for RGB-Depth video target identification and accurate localization. Therefore, research on indoor robot target identification and precise localization algorithms based on RGB-Depth videos not only has strong research value but also very broad application prospects.
Content of the invention
In view of the above defects or improvement needs of the prior art, the present invention provides a robot target identification and localization method and system based on RGB-D videos. By processing the RGB-Depth video captured from the robot's viewpoint, it realizes real-time, accurate target identification and precise localization of the target in the robot's working environment, thereby assisting complex robot tasks such as target grasping. This solves the technical problem that efficient target localization methods capable of processing RGB-Depth videos in real time are currently lacking.
To achieve the above object, according to one aspect of the present invention, a robot target identification and localization method based on RGB-D videos is provided, comprising:
(1) obtaining the RGB-D video frame sequence of the scene where the target to be identified and localized is located;
(2) extracting key video frames from the RGB-D video frame sequence, extracting target candidate regions from the key video frames, and filtering and screening the target candidate regions according to the depth information corresponding to each key video frame;
(3) identifying the filtered target candidate regions based on a deep network, and ranking the target identification results by confidence through long-term spatio-temporal correlation constraints and multi-frame recognition consistency estimation;
(4) performing local fast segmentation on the filtered target candidate regions, selecting dominant video frames from the key video frames according to the confidence of the target identification results and the temporal interval relation of the key video frames, and extending and cooperatively optimizing the segmented regions over preceding and following adjacent frames;
(5) determining key feature points in the scene as positioning reference points, estimating therefrom the camera viewpoint and camera motion estimates, applying target feature consistency constraints and target position consistency constraints to the recognition and segmentation results of the dominant video frames, estimating the cooperative confidence of the target to be identified and localized, and performing accurate spatial localization.
Preferably, the step (2) specifically includes:
(2.1) determining, by interval sampling or a key frame extraction method, the key video frames used for identifying and localizing the target;
(2.2) obtaining the target candidate regions in the key video frames with a confidence ranking method based on an objectness prior to form a target candidate region set, then using the depth information corresponding to each key video frame to obtain the hierarchical properties inside each target candidate region and in its neighborhood, and optimizing, screening, and re-ranking the target candidate region set.
Preferably, the step (3) specifically includes:
(3.1) feeding the target candidate regions screened in step (2) into a trained target identification deep network to obtain the target identification prediction for the key video frame corresponding to each screened candidate region, together with a first confidence for each prediction;
(3.2) performing feature consistency evaluation on the target identification predictions of the key video frames according to the long-term spatio-temporal correlation constraint, evaluating a second confidence for each prediction, ranking by the accumulated confidence obtained from the first confidence and the second confidence, and further filtering out target candidate regions whose accumulated confidence is below a preset confidence threshold.
Preferably, the step (4) specifically includes:
(4.1) performing a fast target segmentation operation on the target candidate regions obtained in step (3.2) and their extended neighborhoods to obtain an initial segmentation of the target and determine the target boundary;
(4.2) with short-term spatio-temporal consistency as the constraint, screening dominant video frames out of the key video frames based on the accumulated-confidence ranking results of step (3.2);
(4.3) with long-term spatio-temporal consistency as the constraint, modeling the appearance of the target to be identified and localized based on the initial segmentation of step (4.1), constructing a three-dimensional graph over each dominant video frame and its adjacent frames, designing a maximum a posteriori probability-Markov random field energy function, optimizing the initial segmentation with a graph cut algorithm, and extending and optimizing the single-frame target segmentation result in the adjacent frames before and after that frame.
Preferably, the step (5) specifically includes:
(5.1) for the dominant video frames obtained in step (4.2), extracting multiple groups of corresponding point pairs as positioning reference points according to the adjacency and field-of-view overlap relations between the dominant video frames;
(5.2) estimating the camera viewpoint change from the dominant video frames whose fields of view overlap, and then estimating the motion information of the camera through geometric relations using the depth information of the positioning reference point pairs;
(5.3) evaluating the spatial position consistency of the target to be identified and localized across the dominant video frames according to its measured depth information and the camera viewpoint and camera motion information in the dominant video frames;
(5.4) evaluating the feature consistency of the two-dimensional segmented regions of the target to be identified and localized according to the result of step (4.3);
(5.5) determining the spatial position of the target to be identified and localized by comprehensively evaluating the feature consistency of its two-dimensional segmented regions and its spatial position consistency.
According to another aspect of the present invention, a robot target identification and localization system based on RGB-D videos is provided, comprising:
an acquisition module for obtaining the RGB-D video frame sequence of the scene where the target to be identified and localized is located;
a filtering and screening module for extracting the key video frames in the RGB-D video frame sequence, extracting target candidate regions from the key video frames, and filtering and screening the target candidate regions according to the depth information corresponding to each key video frame;
a confidence ranking module for identifying the filtered target candidate regions based on a deep network and ranking the target identification results by confidence through long-term spatio-temporal correlation constraints and multi-frame recognition consistency estimation;
an optimization module for performing local fast segmentation on the filtered target candidate regions, selecting dominant video frames from the key video frames according to the confidence of the target identification results and the temporal interval relation of the key video frames, and extending and cooperatively optimizing the segmented regions over preceding and following adjacent frames;
a localization module for determining key feature points in the scene as positioning reference points, estimating therefrom the camera viewpoint and camera motion estimates, applying target feature consistency constraints and target position consistency constraints to the recognition and segmentation results of the dominant video frames, estimating the cooperative confidence of the target to be identified and localized, and performing accurate spatial localization.
In general, compared with the prior art, the above technical scheme conceived by the present invention mainly has the following technical advantages: the invention exploits scene depth information to enhance the spatial-hierarchy perception ability of the recognition and localization algorithms; by using long- and short-term spatio-temporal consistency constraints based on key frames, it improves video processing efficiency while ensuring the identity and association of targets across long-sequence recognition and localization tasks; during localization, the target is accurately segmented in the image plane and the positional consistency of the same target is evaluated in space using depth information, realizing cooperative target localization across multiple information modalities; the computational load is small, the real-time performance is good, and the recognition and localization accuracy is high, so the method can be applied to robot tasks based on online visual information parsing and understanding.
Brief description of the drawings
Fig. 1 is an overall flow diagram of the method provided by the embodiments of the present invention;
Fig. 2 is a flow diagram of target identification in an embodiment of the present invention;
Fig. 3 is a flow diagram of accurate target localization in an embodiment of the present invention.
Specific embodiment
In order to make the purpose, technical scheme, and advantages of the present invention clearer, the present invention is further elaborated below in conjunction with the drawings and embodiments. It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it. Additionally, the technical features involved in the embodiments described below can be combined with each other as long as they do not conflict.
The method disclosed by the invention involves key frame screening, deep-network-based target identification, segmentation, inter-frame label propagation, and location estimation based on consistency constraints and cooperative optimization. It can be used directly in robot systems that take RGB-D videos as visual information input, assisting the robot to complete target identification and accurate target localization tasks.
Fig. 1 shows the overall flow of the method provided by an embodiment of the present invention. As can be seen from Fig. 1, the method comprises two major stages, target identification and accurate target localization, with target identification being the precondition of accurate localization. A specific embodiment is as follows:
(1) obtaining the RGB-D video frame sequence of the scene where the target to be identified and localized is located;
Preferably, in an embodiment of the invention, the RGB-D video sequence of the scene where the target to be identified and localized is located can be collected by a depth vision sensor such as the Kinect; alternatively, RGB image pairs can be collected by a binocular imaging device and the scene depth estimated from the computed disparity used as the depth channel information, so as to synthesize an RGB-D video as input.
(2) extracting key video frames from the RGB-D video frame sequence, extracting target candidate regions from the key video frames, and filtering and screening the target candidate regions according to the depth information corresponding to each key video frame;
(3) identifying the filtered target candidate regions based on a deep network, and ranking the target identification results by confidence through long-term spatio-temporal correlation constraints and multi-frame recognition consistency estimation;
(4) performing local fast segmentation on the filtered target candidate regions, selecting dominant video frames from the key video frames according to the confidence of the target identification results and the temporal interval relation of the key video frames, and extending and cooperatively optimizing the segmented regions over preceding and following adjacent frames;
(5) determining key feature points in the scene as positioning reference points, estimating therefrom the camera viewpoint and camera motion estimates, applying target feature consistency constraints and target position consistency constraints to the recognition and segmentation results of the dominant video frames, estimating the cooperative confidence of the target to be identified and localized, and performing accurate spatial localization.
Preferably, in one embodiment of the invention, the above step (1) specifically includes:
(1.1) collecting the RGB-D video sequence of the scene where the target to be identified and localized is located with a Kinect, filling holes in the depth image by neighborhood smoothing, correcting the raw measurements according to the Kinect parameters and converting them to real depth information, and taking this together with the RGB data as input;
(1.2) when image pairs are collected with binocular equipment, performing camera calibration and stereo matching in sequence (extracting features from the image pair, extracting corresponding points of the same physical structure, and computing disparity), and finally estimating depth through the projection model as the input of the depth channel of the video.
Preferably, in one embodiment of the invention, the above step (2) specifically includes:
(2.1) determining, by interval sampling or a key frame extraction method, the key video frames used for identifying and localizing the target;
Specifically, step (2.1) may use fast scale-invariant feature transform (SIFT) point matching to obtain the scene overlap rate between adjacent frames and thereby estimate how fast the photographed scene is changing: the sampling frequency is increased for video frames where the scene changes quickly and reduced for video frames where it changes slowly. Additionally, when the practical application places high demands on algorithmic efficiency, interval sampling can be used directly in place of this step; a sketch of the adaptive variant follows.
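As a hedged illustration of this adaptive sampling (Python with OpenCV; the thresholds and the use of Lowe's ratio test are assumptions, not prescribed by the patent), the sketch below measures adjacent-frame overlap as the fraction of SIFT keypoints that find a good match, and starts a new key frame once overlap with the previous key frame drops below a threshold.

```python
import cv2

def scene_overlap(frame_a, frame_b, ratio=0.75):
    """Overlap rate between two BGR frames: the fraction of SIFT keypoints
    in frame_a with a good match in frame_b (Lowe's ratio test)."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY), None)
    kp_b, des_b = sift.detectAndCompute(cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY), None)
    if des_a is None or des_b is None:
        return 0.0
    matches = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
    good = [p for p in matches if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) / max(len(kp_a), 1)

def adaptive_key_frames(frames, overlap_threshold=0.5):
    """Sample more densely where the scene changes fast: a frame becomes a
    key frame when its overlap with the last key frame falls below the
    threshold."""
    key_indices = [0]
    for i in range(1, len(frames)):
        if scene_overlap(frames[key_indices[-1]], frames[i]) < overlap_threshold:
            key_indices.append(i)
    return key_indices
```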
(2.2) obtaining the target candidate regions in the key video frames with a confidence ranking method based on an objectness prior to form a target candidate region set, then using the depth information corresponding to each key video frame to obtain the hierarchical properties inside each target candidate region and in its neighborhood, and optimizing, screening, and re-ranking the target candidate region set.
The confidence ranking method based on the objectness prior can be the BING algorithm or the Edge Boxes algorithm. As shown in Fig. 2, the depth information of the corresponding frame is then used to obtain the hierarchical properties inside each target candidate region and in its neighborhood, and the candidate region set is optimized, screened, and re-ranked according to the principle that the depth inside a high-confidence candidate box is smooth while the depth gradient across the box boundary is large; one such scoring is sketched below.
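A minimal sketch of this depth-based screening criterion follows (NumPy); the particular score, combining interior depth smoothness with the depth step across a ring around the box, is an illustrative assumption, since the patent states only the principle.

```python
import numpy as np

def depth_box_score(depth, box, margin=8):
    """Score a candidate box (x, y, w, h) on a depth map: a good box has
    smooth interior depth and a large depth step across its boundary."""
    x, y, w, h = box
    inner = depth[y:y + h, x:x + w]
    y0, y1 = max(y - margin, 0), min(y + h + margin, depth.shape[0])
    x0, x1 = max(x - margin, 0), min(x + w + margin, depth.shape[1])
    outer = depth[y0:y1, x0:x1]
    ring_mask = np.ones(outer.shape, bool)
    ring_mask[y - y0:y - y0 + h, x - x0:x - x0 + w] = False  # mask out the interior
    inner_valid = inner[inner > 0]
    ring_valid = outer[ring_mask & (outer > 0)]
    if inner_valid.size == 0 or ring_valid.size == 0:
        return 0.0
    smoothness = 1.0 / (1.0 + np.std(inner_valid))           # interior smoothness
    boundary_step = abs(np.median(ring_valid) - np.median(inner_valid))
    return smoothness * boundary_step

def rerank_candidates(depth, boxes):
    """Re-rank an objectness-ranked candidate set by the depth criterion."""
    return sorted(boxes, key=lambda b: depth_box_score(depth, b), reverse=True)
```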
Preferably, in one embodiment of the invention, the above step (3) specifically includes:
(3.1) as shown in Fig. 2, feeding the target candidate regions screened in step (2) into a trained target identification deep network to obtain the target identification prediction for the key video frame corresponding to each screened candidate region, together with a first confidence for each prediction;
The trained target identification deep network can be a deep recognition network such as SPP-Net, R-CNN, or Fast R-CNN, and can also be replaced by other deep recognition networks.
(3.2) performing feature consistency evaluation on the target identification predictions of the key video frames according to the long-term spatio-temporal correlation constraint, evaluating a second confidence for each prediction, ranking by the accumulated confidence obtained from the first confidence and the second confidence, and further filtering out target candidate regions whose accumulated confidence is below a preset confidence threshold; a sketch of this scheme follows.
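A sketch of the two-stage confidence scheme in steps (3.1)-(3.2) is given below (NumPy). Modeling the second confidence as the mean cosine similarity to matched candidates in neighboring key frames, and the accumulated confidence as the product of the two, are assumptions; the patent specifies only that the confidences are combined, ranked, and thresholded.

```python
import numpy as np

def temporal_consistency(feat, neighbor_feats):
    """Second confidence: mean cosine similarity between a candidate's
    feature vector and its matched candidates in neighboring key frames."""
    sims = [float(np.dot(feat, g) /
                  (np.linalg.norm(feat) * np.linalg.norm(g) + 1e-8))
            for g in neighbor_feats]
    return float(np.mean(sims)) if sims else 0.0

def accumulate_and_filter(detections, threshold=0.5):
    """Combine the network's first confidence with the temporal second
    confidence, rank, and drop candidates below the preset threshold.
    Each detection is a dict with 'conf_net' and 'conf_temporal' keys."""
    for d in detections:
        d['conf_accum'] = d['conf_net'] * d['conf_temporal']
    ranked = sorted(detections, key=lambda d: d['conf_accum'], reverse=True)
    return [d for d in ranked if d['conf_accum'] >= threshold]
```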
Optionally, in one embodiment of the invention, the detection and recognition results of the target to be identified and localized can be obtained by issuing a recognition instruction to the algorithm, and the efficiency of the algorithm can be improved by filtering out low-confidence recognition results.
Optionally, in one embodiment of the invention, the above step (4) specifically includes:
(4.1) as shown in Fig. 3, performing a fast target segmentation operation on the target candidate regions obtained in step (3.2) and their extended neighborhoods to obtain an initial segmentation of the target and determine the target boundary;
As an optional implementation, the GrabCut segmentation algorithm based on RGB-D information can be used to perform the fast target segmentation operation and obtain the initial segmentation of the target, thereby obtaining the two-dimensional localization result of the target in the current video frame.
(4.2) in order to further improve the efficiency of video target localization, as shown in Fig. 3, with short-term spatio-temporal consistency as the constraint and based on the accumulated-confidence ranking results of step (3.2), screening dominant video frames out of the key video frames, taking high single-frame recognition confidence and strong adjacent-frame spatio-temporal consistency as the criterion;
(4.3) with long-term spatio-temporal consistency as the constraint, modeling the appearance of the target to be identified and localized based on the initial segmentation of step (4.1), constructing a three-dimensional graph over each dominant video frame and its adjacent frames, designing a maximum a posteriori probability-Markov random field energy function, optimizing the initial segmentation with a graph cut algorithm, and extending the single-frame target segmentation result into the adjacent frames before and after that frame, thereby realizing two-dimensional target segmentation and localization optimization based on long- and short-term spatio-temporal consistency.
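For clarity, a standard form of such a maximum a posteriori probability-Markov random field energy is written out below; the patent does not give its exact terms, so the appearance unary and the contrast-sensitive pairwise term are the conventional choices assumed here:

```latex
E(\mathbf{x}) = \sum_{i \in \mathcal{V}} -\log p(z_i \mid x_i)
  + \lambda \sum_{(i,j) \in \mathcal{E}} [x_i \neq x_j]\,
    \exp\!\left(-\beta \lVert z_i - z_j \rVert^2\right)
```

where V contains the pixels of a dominant video frame and its adjacent frames (the three-dimensional graph), E its intra-frame and inter-frame edges, x_i ∈ {0, 1} the foreground label of pixel i, and z_i the observed RGB-D feature; minimizing E is equivalent to maximum a posteriori inference and is carried out by the graph cut algorithm.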
Optionally, in one embodiment of the invention, the above step (5) specifically includes:
(5.1) as shown in Fig. 3, for the dominant video frames obtained in step (4.2), extracting multiple groups of corresponding point pairs as positioning reference points according to the adjacency and field-of-view overlap relations between the dominant video frames;
(5.2) estimating the camera viewpoint change from the dominant video frames whose fields of view overlap, and then estimating the motion information of the camera through geometric relations using the depth information of the positioning reference point pairs;
The motion information of the camera includes the camera displacement and the motion trajectory; the geometric relation involved is sketched below.
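As a hedged illustration (NumPy; the pinhole intrinsics and the SVD-based Kabsch alignment are assumptions about how the geometric relation could be realized), reference points are lifted to 3-D with their depths and the rigid camera motion between two dominant frames is recovered from the matched 3-D point sets; a robust wrapper such as RANSAC would be used on real data.

```python
import numpy as np

def backproject(points_px, depths, fx, fy, cx, cy):
    """Lift pixel coordinates (u, v) with depth z to 3-D camera
    coordinates using the pinhole model."""
    pts = [[(u - cx) * z / fx, (v - cy) * z / fy, z]
           for (u, v), z in zip(points_px, depths)]
    return np.asarray(pts)

def estimate_camera_motion(pts_a, pts_b):
    """Rigid motion (R, t) mapping 3-D reference points of frame A onto
    their matches in frame B, via SVD-based (Kabsch) alignment."""
    ca, cb = pts_a.mean(axis=0), pts_b.mean(axis=0)
    H = (pts_a - ca).T @ (pts_b - cb)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t
```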
(5.3) as shown in Fig. 3, evaluating the spatial position consistency of the target to be identified and localized across the dominant video frames according to its measured depth information and the camera viewpoint and camera motion information in the dominant video frames;
(5.4) evaluating the feature consistency of the two-dimensional segmented regions of the target to be identified and localized according to the result of step (4.3), generally using a region-based deep network to extract regional deep features for feature distance measurement and feature consistency evaluation;
(5.5) determining the spatial position of the target to be identified and localized by comprehensively evaluating the feature consistency of its two-dimensional segmented regions and its spatial position consistency; a sketch of this evaluation follows.
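A minimal sketch of steps (5.3)-(5.5) follows (NumPy). Scoring the position residual with a Gaussian and fusing the two consistencies by a weighted mean are illustrative assumptions; the patent specifies only a comprehensive evaluation of the two.

```python
import numpy as np

def position_consistency(target_a, target_b, R, t, sigma=0.05):
    """Spatial position consistency between two dominant frames: map the
    target's 3-D position in frame A through the estimated camera motion
    (R, t) and compare with the position measured in frame B."""
    residual = np.linalg.norm(R @ target_a + t - target_b)
    return float(np.exp(-residual ** 2 / (2 * sigma ** 2)))

def cooperative_confidence(position_scores, feature_scores, alpha=0.5):
    """Fuse spatial position consistency and 2-D segmented-region feature
    consistency into a cooperative localization confidence."""
    return alpha * float(np.mean(position_scores)) + \
           (1 - alpha) * float(np.mean(feature_scores))
```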
In one embodiment of the invention, a robot target identification and localization system based on RGB-D videos is disclosed, the system comprising:
an acquisition module for obtaining the RGB-D video frame sequence of the scene where the target to be identified and localized is located;
a filtering and screening module for extracting the key video frames in the RGB-D video frame sequence, extracting target candidate regions from the key video frames, and filtering and screening the target candidate regions according to the depth information corresponding to each key video frame;
a confidence ranking module for identifying the filtered target candidate regions based on a deep network and ranking the target identification results by confidence through long-term spatio-temporal correlation constraints and multi-frame recognition consistency estimation;
an optimization module for performing local fast segmentation on the filtered target candidate regions, selecting dominant video frames from the key video frames according to the confidence of the target identification results and the temporal interval relation of the key video frames, and extending and cooperatively optimizing the segmented regions over preceding and following adjacent frames;
a localization module for determining key feature points in the scene as positioning reference points, estimating therefrom the camera viewpoint and camera motion estimates, applying target feature consistency constraints and target position consistency constraints to the recognition and segmentation results of the dominant video frames, estimating the cooperative confidence of the target to be identified and localized, and performing accurate spatial localization.
The specific implementation of each module may refer to the description in the method embodiments and will not be repeated here.
It will be readily appreciated by those skilled in the art that the foregoing is only preferred embodiments of the present invention and is not intended to limit the present invention; any modification, equivalent substitution, and improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (6)
1. A robot target identification and localization method based on RGB-D videos, characterized by comprising:
(1) obtaining the RGB-D video frame sequence of the scene where the target to be identified and localized is located;
(2) extracting key video frames from the RGB-D video frame sequence, extracting target candidate regions from the key video frames, and filtering and screening the target candidate regions according to the depth information corresponding to each key video frame;
(3) identifying the filtered target candidate regions based on a deep network, and ranking the target identification results by confidence through long-term spatio-temporal correlation constraints and multi-frame recognition consistency estimation;
(4) performing local fast segmentation on the filtered target candidate regions, selecting dominant video frames from the key video frames according to the confidence of the target identification results and the temporal interval relation of the key video frames, and extending and cooperatively optimizing the segmented regions over preceding and following adjacent frames;
(5) determining key feature points in the scene as positioning reference points, estimating therefrom the camera viewpoint and camera motion estimates, applying target feature consistency constraints and target position consistency constraints to the recognition and segmentation results of the dominant video frames, estimating the cooperative confidence of the target to be identified and localized, and performing accurate spatial localization.
2. The method according to claim 1, characterized in that the step (2) specifically includes:
(2.1) determining, by interval sampling or a key frame extraction method, the key video frames used for identifying and localizing the target;
(2.2) obtaining the target candidate regions in the key video frames with a confidence ranking method based on an objectness prior to form a target candidate region set, then using the depth information corresponding to each key video frame to obtain the hierarchical properties inside each target candidate region and in its neighborhood, and optimizing, screening, and re-ranking the target candidate region set.
3. The method according to claim 2, characterized in that the step (3) specifically includes:
(3.1) feeding the target candidate regions screened in step (2) into a trained target identification deep network to obtain the target identification prediction for the key video frame corresponding to each screened candidate region, together with a first confidence for each prediction;
(3.2) performing feature consistency evaluation on the target identification predictions of the key video frames according to the long-term spatio-temporal correlation constraint, evaluating a second confidence for each prediction, ranking by the accumulated confidence obtained from the first confidence and the second confidence, and further filtering out target candidate regions whose accumulated confidence is below a preset confidence threshold.
4. The method according to claim 3, characterized in that the step (4) specifically includes:
(4.1) performing a fast target segmentation operation on the target candidate regions obtained in step (3.2) and their extended neighborhoods to obtain an initial segmentation of the target and determine the target boundary;
(4.2) with short-term spatio-temporal consistency as the constraint, screening dominant video frames out of the key video frames based on the accumulated-confidence ranking results of step (3.2);
(4.3) with long-term spatio-temporal consistency as the constraint, modeling the appearance of the target to be identified and localized based on the initial segmentation of step (4.1), constructing a three-dimensional graph over each dominant video frame and its adjacent frames, designing a maximum a posteriori probability-Markov random field energy function, optimizing the initial segmentation with a graph cut algorithm, and extending and optimizing the single-frame target segmentation result in the adjacent frames before and after that frame.
5. The method according to claim 4, characterized in that the step (5) specifically includes:
(5.1) for the dominant video frames obtained in step (4.2), extracting multiple groups of corresponding point pairs as positioning reference points according to the adjacency and field-of-view overlap relations between the dominant video frames;
(5.2) estimating the camera viewpoint change from the dominant video frames whose fields of view overlap, and then estimating the motion information of the camera through geometric relations using the depth information of the positioning reference point pairs;
(5.3) evaluating the spatial position consistency of the target to be identified and localized across the dominant video frames according to its measured depth information and the camera viewpoint and camera motion information in the dominant video frames;
(5.4) evaluating the feature consistency of the two-dimensional segmented regions of the target to be identified and localized according to the result of step (4.3);
(5.5) determining the spatial position of the target to be identified and localized by comprehensively evaluating the feature consistency of its two-dimensional segmented regions and its spatial position consistency.
6. A robot target identification and localization system based on RGB-D videos, characterized by comprising:
an acquisition module for obtaining the RGB-D video frame sequence of the scene where the target to be identified and localized is located;
a filtering and screening module for extracting the key video frames in the RGB-D video frame sequence, extracting target candidate regions from the key video frames, and filtering and screening the target candidate regions according to the depth information corresponding to each key video frame;
a confidence ranking module for identifying the filtered target candidate regions based on a deep network and ranking the target identification results by confidence through long-term spatio-temporal correlation constraints and multi-frame recognition consistency estimation;
an optimization module for performing local fast segmentation on the filtered target candidate regions, selecting dominant video frames from the key video frames according to the confidence of the target identification results and the temporal interval relation of the key video frames, and extending and cooperatively optimizing the segmented regions over preceding and following adjacent frames;
a localization module for determining key feature points in the scene as positioning reference points, estimating therefrom the camera viewpoint and camera motion estimates, applying target feature consistency constraints and target position consistency constraints to the recognition and segmentation results of the dominant video frames, estimating the cooperative confidence of the target to be identified and localized, and performing accurate spatial localization.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710078328.9A CN106920250B (en) | 2017-02-14 | 2017-02-14 | Robot target identification and localization method and system based on RGB-D video |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710078328.9A CN106920250B (en) | 2017-02-14 | 2017-02-14 | Robot target identification and localization method and system based on RGB-D video |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106920250A true CN106920250A (en) | 2017-07-04 |
CN106920250B CN106920250B (en) | 2019-08-13 |
Family
ID=59453597
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710078328.9A Active CN106920250B (en) | 2017-02-14 | 2017-02-14 | Robot target identification and localization method and system based on RGB-D video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106920250B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108214487A (en) * | 2017-12-16 | 2018-06-29 | 广西电网有限责任公司电力科学研究院 | Based on the positioning of the robot target of binocular vision and laser radar and grasping means |
CN108304808A (en) * | 2018-02-06 | 2018-07-20 | 广东顺德西安交通大学研究院 | A kind of monitor video method for checking object based on space time information Yu depth network |
CN108460790A (en) * | 2018-03-29 | 2018-08-28 | 西南科技大学 | A kind of visual tracking method based on consistency fallout predictor model |
CN108627816A (en) * | 2018-02-28 | 2018-10-09 | 沈阳上博智像科技有限公司 | Image distance measuring method, device, storage medium and electronic equipment |
CN108981698A (en) * | 2018-05-29 | 2018-12-11 | 杭州视氪科技有限公司 | A kind of vision positioning method based on multi-modal data |
CN109977981A (en) * | 2017-12-27 | 2019-07-05 | 深圳市优必选科技有限公司 | Scene analysis method based on binocular vision, robot and storage device |
CN110675421A (en) * | 2019-08-30 | 2020-01-10 | 电子科技大学 | Depth image collaborative segmentation method based on few labeling frames |
CN115091472A (en) * | 2022-08-26 | 2022-09-23 | 珠海市南特金属科技股份有限公司 | Target positioning method based on artificial intelligence and clamping manipulator control system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110013807A1 (en) * | 2009-07-17 | 2011-01-20 | Samsung Electronics Co., Ltd. | Apparatus and method for recognizing subject motion using a camera |
CN104598890A (en) * | 2015-01-30 | 2015-05-06 | 南京邮电大学 | Human body behavior recognizing method based on RGB-D video |
CN104867161A (en) * | 2015-05-14 | 2015-08-26 | 国家电网公司 | Video-processing method and device |
US20160132754A1 (en) * | 2012-05-25 | 2016-05-12 | The Johns Hopkins University | Integrated real-time tracking system for normal and anomaly tracking and the methods therefor |
CN105589974A (en) * | 2016-02-04 | 2016-05-18 | 通号通信信息集团有限公司 | Surveillance video retrieval method and system based on Hadoop platform |
CN105931270A (en) * | 2016-04-27 | 2016-09-07 | 石家庄铁道大学 | Video keyframe extraction method based on movement trajectory analysis |
- 2017
- 2017-02-14: CN application CN201710078328.9A, granted as patent CN106920250B (Active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110013807A1 (en) * | 2009-07-17 | 2011-01-20 | Samsung Electronics Co., Ltd. | Apparatus and method for recognizing subject motion using a camera |
US20160132754A1 (en) * | 2012-05-25 | 2016-05-12 | The Johns Hopkins University | Integrated real-time tracking system for normal and anomaly tracking and the methods therefor |
CN104598890A (en) * | 2015-01-30 | 2015-05-06 | 南京邮电大学 | Human body behavior recognizing method based on RGB-D video |
CN104867161A (en) * | 2015-05-14 | 2015-08-26 | 国家电网公司 | Video-processing method and device |
CN105589974A (en) * | 2016-02-04 | 2016-05-18 | 通号通信信息集团有限公司 | Surveillance video retrieval method and system based on Hadoop platform |
CN105931270A (en) * | 2016-04-27 | 2016-09-07 | 石家庄铁道大学 | Video keyframe extraction method based on movement trajectory analysis |
Non-Patent Citations (2)
Title |
---|
Zhang Zhiguo, Liu Liman, Tao Wenbing, et al.: "Confidence-driven infrared target detection", Infrared Physics & Technology *
Zhongwei Guo et al.: "Battlefield Video Target Mining", International Congress on Image & Signal Processing *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108214487A (en) * | 2017-12-16 | 2018-06-29 | 广西电网有限责任公司电力科学研究院 | Based on the positioning of the robot target of binocular vision and laser radar and grasping means |
CN109977981A (en) * | 2017-12-27 | 2019-07-05 | 深圳市优必选科技有限公司 | Scene analysis method based on binocular vision, robot and storage device |
CN109977981B (en) * | 2017-12-27 | 2020-11-24 | 深圳市优必选科技有限公司 | Scene analysis method based on binocular vision, robot and storage device |
CN108304808A (en) * | 2018-02-06 | 2018-07-20 | 广东顺德西安交通大学研究院 | A kind of monitor video method for checking object based on space time information Yu depth network |
CN108304808B (en) * | 2018-02-06 | 2021-08-17 | 广东顺德西安交通大学研究院 | Monitoring video object detection method based on temporal-spatial information and deep network |
CN108627816A (en) * | 2018-02-28 | 2018-10-09 | 沈阳上博智像科技有限公司 | Image distance measuring method, device, storage medium and electronic equipment |
CN108460790A (en) * | 2018-03-29 | 2018-08-28 | 西南科技大学 | A kind of visual tracking method based on consistency fallout predictor model |
CN108981698A (en) * | 2018-05-29 | 2018-12-11 | 杭州视氪科技有限公司 | A kind of vision positioning method based on multi-modal data |
CN108981698B (en) * | 2018-05-29 | 2020-07-14 | 杭州视氪科技有限公司 | Visual positioning method based on multi-mode data |
CN110675421A (en) * | 2019-08-30 | 2020-01-10 | 电子科技大学 | Depth image collaborative segmentation method based on few labeling frames |
CN110675421B (en) * | 2019-08-30 | 2022-03-15 | 电子科技大学 | Depth image collaborative segmentation method based on few labeling frames |
CN115091472A (en) * | 2022-08-26 | 2022-09-23 | 珠海市南特金属科技股份有限公司 | Target positioning method based on artificial intelligence and clamping manipulator control system |
Also Published As
Publication number | Publication date |
---|---|
CN106920250B (en) | 2019-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106920250B (en) | Robot target identification and localization method and system based on RGB-D video | |
Čech et al. | Scene flow estimation by growing correspondence seeds | |
KR101788225B1 (en) | Method and System for Recognition/Tracking Construction Equipment and Workers Using Construction-Site-Customized Image Processing | |
CN109784130B (en) | Pedestrian re-identification method, device and equipment thereof | |
CN103458261B (en) | Video scene variation detection method based on stereoscopic vision | |
CN104517095B (en) | A kind of number of people dividing method based on depth image | |
CN103164858A (en) | Adhered crowd segmenting and tracking methods based on superpixel and graph model | |
CN107560592A (en) | A kind of precision ranging method for optronic tracker linkage target | |
TWI686748B (en) | People-flow analysis system and people-flow analysis method | |
CN110264493A (en) | A kind of multiple target object tracking method and device under motion state | |
KR101139389B1 (en) | Video Analysing Apparatus and Method Using Stereo Cameras | |
WO2024114119A1 (en) | Sensor fusion method based on binocular camera guidance | |
US11645777B2 (en) | Multi-view positioning using reflections | |
US8989481B2 (en) | Stereo matching device and method for determining concave block and convex block | |
CN112633096B (en) | Passenger flow monitoring method and device, electronic equipment and storage medium | |
Nair | Camera-based object detection, identification and distance estimation | |
CN110415297A (en) | Localization method, device and unmanned equipment | |
CN117456114B (en) | Multi-view-based three-dimensional image reconstruction method and system | |
RU2370817C2 (en) | System and method for object tracking | |
CN114022531A (en) | Image processing method, electronic device, and storage medium | |
CN103679699A (en) | Stereo matching method based on translation and combined measurement of salient images | |
CN108090930A (en) | Barrier vision detection system and method based on binocular solid camera | |
CN112767452B (en) | Active sensing method and system for camera | |
JP6548306B2 (en) | Image analysis apparatus, program and method for tracking a person appearing in a captured image of a camera | |
CN110473246B (en) | Distance measurement method of multiple shielding targets based on binocular vision |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||