CN109782902A

CN109782902A - Operation prompt method and glasses

Info

Publication number: CN109782902A
Application number: CN201811543901.XA
Authority: CN
Inventors: 程俊; 王鹏
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2018-12-17
Filing date: 2018-12-17
Publication date: 2019-05-21
Also published as: WO2020125499A9; WO2020125499A1

Abstract

The present invention provides an operation prompt method, an apparatus, and glasses, applicable to the technical field of data processing. The method comprises: obtaining an image of the user's environment, and constructing a 3D semantic map of that environment based on the image; performing eye-movement recognition on the user to determine whether the user is gazing at an item included in the 3D semantic map; if the user is gazing at an item included in the 3D semantic map, obtaining the operation mode corresponding to the 3D semantic map, the operation mode including operation steps for one or more items; monitoring whether the user's operation of the item meets the requirements of the operation steps; and, if the user's operation of the item does not meet the requirements of an operation step, outputting the operation prompt corresponding to that operation step. The user no longer needs to perform any manual input or keep actively watching a screen, yet can learn of problems in their own operation in time. This achieves intelligent matching of operation tutorials to the user and greatly improves the efficiency with which the user identifies operating errors.

Description

An operation prompt method and glasses

Technical field

The invention belongs to the technical field of data processing, and in particular relates to an operation prompt method and glasses.

Background

In daily life and work, users often need to search the Internet for operation tutorials to check whether their own operations are correct. For example, a user may search for an illustrated cooking tutorial and compare their own cooking against it, or search for an equipment operation tutorial to judge whether the equipment is being operated correctly. In the prior art, the user manually enters keywords on a device such as a computer or mobile phone, obtains the corresponding text, image, or video tutorial, and then compares their operation against it step by step. This requires a large amount of manual input and requires the user to keep actively checking the screen in order to identify operating errors, which is cumbersome and inefficient.

Summary of the invention

In view of this, embodiments of the present invention provide an operation prompt method and glasses, to solve the problem in the prior art that the user needs a large amount of manual lookup and comparison before errors in their own operation can be identified, which is cumbersome and inefficient.

A first aspect of the embodiments of the present invention provides an operation prompt method, comprising:

obtaining an image of the user's environment, and constructing a 3D semantic map of the user's environment based on the image;

performing eye-movement recognition on the user, and determining whether the user is gazing at an item included in the 3D semantic map;

if the user is gazing at an item included in the 3D semantic map, obtaining the operation mode corresponding to the 3D semantic map, the operation mode including operation steps for one or more of the items;

monitoring whether the user's operation of the item meets the requirements of the operation steps;

if the user's operation of the item does not meet the requirements of an operation step, outputting the operation prompt corresponding to that operation step.

A second aspect of the embodiments of the present invention provides glasses. The glasses include a memory and a processor; the memory stores a computer program that can run on the processor, and the processor implements the steps of the operation prompt method described above when executing the computer program.

Compared with the prior art, the embodiments of the present invention have the following beneficial effects. A 3D semantic map of the user's environment is constructed. When the user is recognized as gazing at an item, that is, when the user is determined to need operation prompts, the operation mode the user needs (the operation tutorial for the item) is intelligently identified according to the actual content of the 3D semantic map, and the user's operation is monitored and prompted according to how the user actually operates the item. The user therefore no longer needs to perform any manual input or keep actively watching a screen, yet can learn of problems in their own operation in time. This achieves intelligent matching of operation tutorials to the user and greatly improves the efficiency with which the user learns of operating errors.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of the operation prompt method provided by Embodiment 1 of the present invention;

Fig. 2 is a schematic flowchart of the operation prompt method provided by Embodiment 2 of the present invention;

Fig. 3A and Fig. 3B are schematic flowcharts of the operation prompt method provided by Embodiment 3 of the present invention;

Fig. 4A and Fig. 4B are schematic flowcharts of the operation prompt method provided by Embodiment 4 of the present invention;

Fig. 5 is a schematic flowchart of the operation prompt method provided by Embodiment 5 of the present invention;

Fig. 6 is a schematic flowchart of the operation prompt method provided by Embodiment 6 of the present invention;

Fig. 7 is a schematic structural diagram of the operation prompt apparatus provided by Embodiment 7 of the present invention;

Fig. 8 is a schematic diagram of the glasses provided by Embodiment 8 of the present invention.

Detailed description of the embodiments

In the following description, specific details such as particular system structures and techniques are set forth for purposes of illustration rather than limitation, to provide a thorough understanding of the embodiments of the present invention. However, it will be clear to those skilled in the art that the present invention may also be practiced in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.

To illustrate the technical solutions of the present invention, specific embodiments are described below.

To facilitate understanding of the present invention, the embodiments are first briefly described here. To help the user discover problems in their own operation promptly, the embodiment first constructs a corresponding 3D semantic map based on images of the user's environment, thereby determining the environment the user is actually in and the objects it contains. In practice, however, being in a certain environment does not necessarily mean the user needs to operate anything: the user may merely be passing through, or may be in a place without needing to perform any operation. To prevent false triggering, the embodiment therefore also performs eye-movement analysis on the user to determine whether the user is gazing at an item, and determines that the user needs operation prompts only when gazing behavior is present. The operation tutorial the user needs is then intelligently matched according to the 3D semantic map of the user's actual environment. Finally, the user's real-time operation of the item is analyzed, and when the operation does not meet the requirements of the tutorial, the correct operation step is prompted. The user thus obtains the required tutorial in time, learns of problems in their own operation, and learns the correct way to operate, without cumbersome manual input.

It should be clear that the executing entity in the embodiments of the present invention may differ according to the needs of the actual application. It may be a smart device such as a wearable device (for example smart glasses), or a device such as a server. When the executing entity is a wearable device, all data acquisition, processing, and output operations in the embodiments are completed by the wearable device. When the executing entity is a device, such as a server, that cannot directly acquire data from or output data to the user, acquisition and output are completed by other devices. That is, the direct object of data acquisition and output is not the user but another device that can acquire data from and output data to the user, and the purpose of prompting the user is achieved through that device. For example, smart glasses collect the user's data and send it to the server for processing; the server sends the prompt generated after processing back to the smart glasses, which finally output it to the user. The executing entity can be selected and designed by developers according to the actual development situation and needs. The description here takes smart glasses as the executing entity, detailed as follows:

Fig. 1 shows the implementation flowchart of the operation prompt method provided by Embodiment 1 of the present invention, detailed as follows:

S101: obtain an image of the user's environment, and construct a 3D semantic map of the user's environment based on the image.

To identify the user's needs intelligently and accurately, the embodiment first acquires images of the user's environment and constructs a 3D semantic map. A 3D semantic map is a three-dimensional map of the surroundings that contains semantic information. In the embodiment, a wide-angle camera may be provided in the smart glasses to acquire environment images. After the environment images are obtained, a three-dimensional map of the surroundings is built from them, and the items it contains and the attribute data associated with each item are identified. Methods for constructing the 3D semantic map include, but are not limited to, a stereo-vision-based 3D semantic mapping algorithm or other construction algorithms. The method is not limited here and can be chosen or set by a technician according to actual needs, or other related embodiments of the present invention may be referred to.

S102: perform eye-movement recognition on the user, and determine whether the user is gazing at an item included in the 3D semantic map.

The fact that the user is in a certain environment does not show that the user needs tutorial guidance: the user may merely be passing through, or may be in a place without needing to perform any operation. If tutorial matching and prompting were performed directly from the user's environment, the user might be prompted in error. However, when the user is in a certain environment and keeps gazing at certain items, the user very likely needs to operate them. The embodiment therefore identifies whether the user is gazing at an item, and performs subsequent operations such as tutorial matching only when gazing behavior is confirmed. Specifically, the smart glasses in the embodiment have a built-in device for capturing the user's eye movement, such as an eye tracker. This device acquires eye image data from the user, and eye-movement analysis is performed on the acquired images to determine the user's eye-movement state, for example whether the user blinks, where the gaze region is, and whether the user is gazing at an item. The specific eye-movement analysis and tracking method can be selected by a technician: an existing algorithm may be used, a method may be designed as required, or the methods of other related embodiments of the present invention may be referred to.

In the embodiment, it is necessary to confirm whether the user is gazing at an item. That is, eye-movement recognition must not only identify the item the user sees, but also confirm whether the user keeps looking at that item (sustained gazing). Therefore, after determining the item the user sees, the embodiment also measures how long the user looks at the item and uses this to judge whether the user is gazing at it. See Embodiment 3 of the present invention for details.
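The dwell-time check described above can be sketched as follows. This is a minimal illustration and not the method of Embodiment 3: the `GazeSample` structure, the `find_fixated_item` function, and the 1.0 s threshold are all hypothetical, and the sketch assumes an upstream step has already mapped each gaze sample to an item in the 3D semantic map (or to no item).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class GazeSample:
    timestamp: float          # seconds (hypothetical unit)
    item_id: Optional[str]    # item hit by the gaze in the 3D semantic map, or None


def find_fixated_item(samples: Sequence[GazeSample],
                      min_dwell: float = 1.0) -> Optional[str]:
    """Return the first item looked at continuously for at least min_dwell seconds.

    min_dwell is an assumed threshold; the source does not give a value.
    """
    current: Optional[str] = None
    start = 0.0
    for s in samples:
        if s.item_id != current:
            # Gaze moved to a different item (or off all items): restart the timer.
            current, start = s.item_id, s.timestamp
        if current is not None and s.timestamp - start >= min_dwell:
            return current
    return None
```

A brief glance at an item that is interrupted before the threshold does not count as gazing, which is the false-trigger protection the text asks for.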

S103: if the user is gazing at an item included in the 3D semantic map, obtain the operation mode corresponding to the 3D semantic map. The operation mode includes operation steps for one or more items.

When the user is in a certain environment and keeps gazing at certain items, the user very likely needs to operate them. At this point the embodiment analyzes the 3D semantic map of the user's environment to determine the actual environment, for example a kitchen or an equipment cabinet, and then estimates from the environment the operations the user may perform and the corresponding tutorial required. In the embodiment, a technician may preset multiple operation tutorials, from which one is selected according to the scene type of the 3D semantic map and similar information; alternatively, a technician may preset tutorial generation rules, and the tutorial is generated according to the scene type of the 3D semantic map and similar information.

As one embodiment of the present invention, determining the operation mode comprises:

identifying the scene type corresponding to the 3D semantic map;

obtaining the corresponding operation mode based on the scene type and the items included in the 3D semantic map.

In this embodiment, to account for the user's different needs in different real scenes, the user's likely need is judged from both the scene type of the 3D semantic map and the items it actually contains. For example, if the user is in a kitchen that contains a large amount of food, the user is very likely cooking, and corresponding cooking tutorials can be generated. The specific scene type recognition method and item recognition method are not limited here and can be set by a technician according to actual needs.
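One way to picture the preset-tutorial variant is a lookup keyed by scene type and required items. The sketch below is only an assumption about how such a lookup could work: the `TUTORIALS` catalogue, its entries, and the `select_operation_mode` function are hypothetical and do not come from the source.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical preset catalogue: (scene type, required items) -> ordered operation steps.
TUTORIALS: Dict[Tuple[str, FrozenSet[str]], List[str]] = {
    ("kitchen", frozenset({"egg", "pan"})): ["heat pan", "crack egg", "fry egg"],
    ("kitchen", frozenset({"rice", "rice cooker"})): ["wash rice", "add water", "start cooker"],
    ("equipment cabinet", frozenset({"device"})): ["press power", "select mode", "press start"],
}


def select_operation_mode(scene_type: str, items: Set[str]) -> Optional[List[str]]:
    """Pick a tutorial for this scene whose required items are all present.

    When several tutorials qualify, prefer the one that uses the most detected items.
    """
    best: Optional[List[str]] = None
    best_size = -1
    for (scene, required), steps in TUTORIALS.items():
        if scene == scene_type and required <= items and len(required) > best_size:
            best, best_size = steps, len(required)
    return best
```

Returning `None` for an unknown scene keeps the system silent rather than prompting with an unrelated tutorial.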

As another embodiment of the present invention, determining the operation mode comprises:

identifying the scene type corresponding to the 3D semantic map, and obtaining user data of the user;

obtaining the corresponding operation mode based on the scene type, the items included in the 3D semantic map, and the user data.

Even in the same scene with the same items, the actual needs of different users may differ. For example, even if two users are both in a kitchen containing a large amount of food, their tastes may differ, and the corresponding cooking tutorials will differ considerably. The scene type of the user's environment and the items it contains are therefore sometimes insufficient to identify the user's tutorial needs accurately. In this embodiment, the user's actual operation needs are determined from the scene type of the environment, the items actually contained, and some user data together, and the corresponding tutorial is selected accordingly. The content of the user data can be set by a technician or by the user. It includes, but is not limited to, the user's personal information and taste data, or operation needs preset by the user, such as the user's own dietary requirements.

S104: monitor whether the user's operation of the item meets the requirements of the operation steps.

After the tutorial the user needs is obtained, the embodiment begins to monitor and recognize the user's actual operation and compares it with the operation steps in the tutorial, to judge whether the actual operation meets the tutorial's requirements and thereby identify the user's operating errors. To monitor and recognize the user's operation, the camera in the smart glasses captures the user's behavior, and behavior analysis is performed on the resulting images and video, that is, the user's actions and their order are analyzed to determine whether they meet the requirements of the tutorial.

S105: if the user's operation of the item does not meet the requirements of an operation step, output the operation prompt corresponding to that operation step.

When the user performs an operation that does not meet the requirements of an operation step in the tutorial, the user has made an operating error. The embodiment then prompts the user, that is, informs the user that the current operation is wrong and may at the same time inform the user of the correct operation step. For example, suppose that operating a device requires pressing the power key, then selecting the device mode with the mode key, and then pressing the start key. If the user presses the start key directly after the power key, the embodiment finds that the operation does not meet the tutorial's requirements and prompts the user not to press the start key yet but to select the device mode with the mode key first. Output forms of the operation prompt include, but are not limited to, audio, video, and text prompts, and can be set by a technician.
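The power key / mode key / start key example can be sketched as a simple ordered comparison. This sketch assumes that behavior analysis has already turned the camera video into a list of recognized action labels; the `check_operation` function and the label strings are hypothetical.

```python
from typing import List, Optional


def check_operation(expected_steps: List[str],
                    observed_actions: List[str]) -> Optional[str]:
    """Compare observed actions with the tutorial steps, in order.

    Returns a prompt for the first deviation, or None if no deviation so far.
    """
    for i, action in enumerate(observed_actions):
        if i >= len(expected_steps):
            return f"Unexpected action '{action}': the tutorial is already complete."
        if action != expected_steps[i]:
            return f"Do not '{action}' yet; first '{expected_steps[i]}'."
    return None


steps = ["press power", "select mode", "press start"]
print(check_operation(steps, ["press power", "press start"]))
```

An exact, in-order match is the simplest possible rule; a real system would likely need to tolerate recognition noise and optional steps.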

As a specific implementation of prompt output in Embodiment 1 of the present invention, the method comprises:

identifying the user's gaze region in the 3D semantic map, and outputting the operation prompt in augmented reality based on the gaze region.

In this embodiment, augmented reality (AR) technology can be used to output the prompt information, so that the user can see the problem and the corresponding correct operation more intuitively.

The embodiment of the present invention constructs a 3D semantic map of the user's environment. When the user is recognized as gazing at an item, that is, when the user is determined to need operation prompts, the operation mode the user needs (the operation tutorial for the item) is intelligently identified according to the actual content of the 3D semantic map, and the user's operation is monitored and prompted according to how the user actually operates the item. The user therefore no longer needs to perform any manual input or keep actively watching a screen, yet can learn of problems in their own operation in time. This achieves intelligent matching of operation tutorials to the user and greatly improves the efficiency with which the user learns of operating errors.

As a specific implementation of 3D semantic map construction in Embodiment 1 of the present invention, the environment images to be acquired include a color image and a depth image of the environment. As shown in Fig. 2, Embodiment 2 of the present invention comprises:

S201: based on the color image and the depth image, obtain the position and pose information of the user, and the position information and item information of the items in the user's environment.

In this embodiment, the smart glasses use an RGB-D camera as the sensor to obtain the color image and the depth image. A visual SLAM algorithm performs autonomous localization of the smart glasses and pose estimation and optimization (that is, it obtains the user's position and pose information). At the same time, item detection is performed to obtain the semantic information of the environment, and an RGB-D segmentation algorithm then segments out the perceived items, so that the 3D semantic map of the environment is constructed. The basic framework of visual SLAM consists of four parts: visual odometry, back-end optimization, loop closure detection, and 3D mapping. Visual odometry estimates the motion between two adjacent frames and gives a rough estimate of the current camera pose. Back-end optimization performs globally consistent optimization of the visual odometry estimates and removes noise; it also uses loop closure constraints to optimize the pose, making localization and pose estimation more accurate. Loop closure detection eliminates the error accumulated along the way when the camera returns to a position it has passed before. Mapping creates the three-dimensional map of the environment based on the motion and pose estimated by the first three parts.

The visual SLAM algorithm as a whole produces, from the sensor input data, a globally consistent estimate of the current position and pose, that is, it determines its own localization and motion. It is completed by three cooperating parts: visual odometry, back-end optimization, and loop closure detection.

Visual odometry extracts ORB feature points, matches the ORB features of two adjacent frames with the FLANN algorithm, and uses the matching result together with the PnP and RANSAC algorithms to produce a rough estimate of the position and pose of the smart glasses. The task of visual odometry is to estimate the change in pose of the smart glasses between two frames, and thereby their pose and trajectory over a period of time. It consists of the following processes:

1. Feature extraction

Image features are extracted using ORB feature extraction. An ORB feature consists of a FAST keypoint and a BRIEF descriptor, and is additionally given rotation and scale invariance.

The process of feature extraction:

1) Coarse extraction. For a point p with pixel value Ip, consider the 16 points on the circle of radius 3 centered on p. If 12 or more of these points have pixel values that differ from Ip by more than a threshold, p is taken as a candidate FAST keypoint.

2) Compute the information entropy of each subset. Using information gain as the evaluation criterion, the pixel with the highest value is set as the root node of the decision tree, and the process is iterated on its subsets until the nature of the point is determined, that is, FAST keypoint or non-FAST keypoint. This yields an ID3 decision tree, which is used to filter out the optimal FAST keypoints.

3) Apply non-maximum suppression. Within a local range, keep the FAST keypoint with the highest score and delete the other, lower-scoring FAST keypoints. A single traversal completes the screening.

4) Assign scale and rotation invariance. Scale invariance is achieved using the pyramid principle: the image is down-sampled to obtain an image pyramid, and the feature extraction steps above are carried out on each layer, which gives the FAST keypoints scale invariance. Rotation invariance is achieved with the gray-scale centroid method: the centroid of the image block U centered on the keypoint is computed, and the vector from the center to the centroid is defined as the direction of the keypoint, which gives the FAST keypoints rotation invariance.
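The gray-scale centroid step in 4) can be illustrated with a few lines of NumPy. This is a sketch under simplifying assumptions: the `keypoint_orientation` function is hypothetical, and it uses a square patch, whereas ORB implementations typically use a circular one. The direction is the angle of the vector from the patch center to the intensity centroid, computed from the first-order image moments.

```python
import numpy as np


def keypoint_orientation(patch: np.ndarray) -> float:
    """Orientation (radians) of a grayscale patch centered on a keypoint,
    using the gray-scale (intensity) centroid method."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs = xs - (w - 1) / 2.0          # x coordinates relative to the patch center
    ys = ys - (h - 1) / 2.0          # y coordinates relative to the patch center
    m10 = float((xs * patch).sum())  # first-order moment in x
    m01 = float((ys * patch).sum())  # first-order moment in y
    # The centroid is (m10/m00, m01/m00); its direction does not depend on m00.
    return float(np.arctan2(m01, m10))
```

Rotating the patch rotates the centroid with it, so describing the keypoint relative to this angle is what makes the descriptor rotation invariant.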

Descriptor: improvements are made on the basis of BRIEF. First, all points in the 31 × 31 neighborhood of the keypoint are considered; after Gaussian smoothing of the image, the mean gray value of a 5 × 5 neighborhood is used in place of the gray value of a single point, which gives strong noise immunity. Second, when selecting the 5 × 5 neighborhoods, a greedy search for uncorrelated tests with mean close to 0.5 is used, which keeps the descriptor representative and unique and makes it discriminative.

2. Feature matching

The fast approximate nearest neighbor (FLANN) algorithm is selected. Its core idea is to use an index structure to narrow the search to the neighborhood of a feature and to complete feature matching within that neighborhood, which effectively speeds up matching. Because a BRIEF descriptor consists of 0s and 1s, locality-sensitive hashing is used as the index structure. Features are projected into hash space in the same way. Two features that were originally adjacent are very likely to remain adjacent in hash space after projection, while two features that were not adjacent are very unlikely to become adjacent. Matching can therefore be carried out within a hash-space neighborhood, which effectively narrows the search range.
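The idea of matching only inside a hash-space neighborhood can be sketched with a single bit-sampling hash table. This is a deliberately simplified illustration and not the FLANN implementation, which uses several tables and multi-probe lookup. The `hamming` and `lsh_match` functions, the `key_bits` parameter, and the `max_dist` threshold are all hypothetical; descriptors are assumed to be rows of 0/1 values.

```python
from collections import defaultdict
from typing import List, Tuple

import numpy as np


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two 0/1 descriptors."""
    return int(np.count_nonzero(a != b))


def lsh_match(desc_a: np.ndarray, desc_b: np.ndarray,
              key_bits: np.ndarray, max_dist: int = 64) -> List[Tuple[int, int]]:
    """Match rows of desc_a to rows of desc_b.

    Each descriptor is hashed by the bits at positions key_bits; only
    descriptors that land in the same bucket are compared.
    """
    buckets = defaultdict(list)
    for j, d in enumerate(desc_b):
        buckets[tuple(d[key_bits])].append(j)

    matches = []
    for i, d in enumerate(desc_a):
        candidates = buckets.get(tuple(d[key_bits]), [])
        if not candidates:
            continue
        # Nearest candidate by Hamming distance within the bucket.
        j = min(candidates, key=lambda c: hamming(d, desc_b[c]))
        if hamming(d, desc_b[j]) <= max_dist:
            matches.append((i, j))
    return matches
```

Similar descriptors agree on most bits, so they usually agree on the sampled key bits and land in the same bucket, while dissimilar ones rarely do; this is what shrinks the search range.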

3. Pose estimation algorithm design

After the features of two adjacent frames are extracted and matched, the matching relationship is used to estimate the motion and pose of the smart glasses. To estimate the motion and pose quantitatively, the geometric relationship between the image formed by the smart glasses and points in space must first be understood. The imaging process of the smart glasses, also called the observation process, is the process in which a point in three-dimensional space reflects or emits light that passes through the optical center of the smart glasses and is projected onto their imaging plane.
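The observation process described above is commonly modeled with the pinhole camera model. The sketch below assumes that model, which the source does not name explicitly; the `project_point` function and the intrinsic values in the usage lines are hypothetical. `K` is the intrinsic matrix, and `R`, `t` are the rotation and translation from the world frame to the camera frame.

```python
import numpy as np


def project_point(K: np.ndarray, R: np.ndarray, t: np.ndarray,
                  p_world: np.ndarray) -> np.ndarray:
    """Project a 3D world point to pixel coordinates with the pinhole model."""
    p_cam = R @ p_world + t           # world frame -> camera frame
    if p_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    uvw = K @ p_cam                   # apply the camera intrinsics
    return uvw[:2] / uvw[2]           # perspective division -> (u, v)


# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
uv = project_point(K, np.eye(3), np.zeros(3), np.array([0.2, -0.1, 2.0]))
```

PnP works in the opposite direction: given 3D points and their observed pixels, it searches for the `R` and `t` that make this projection agree with the observations.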

The present invention uses the PnP algorithm for a rough estimate of the pose between two frames, and then uses the RANSAC algorithm to optimize the inter-frame consistency of that pose, so that mismatched features do not seriously affect the pose estimate.

In summary, the ORB feature points of the image are extracted first; the features of two adjacent frames are then matched using the fast approximate nearest neighbor algorithm; finally, camera motion and pose estimation are completed using the PnP and RANSAC algorithms.

4. Back-end optimization

Data noise, mismatches, and calculation errors all introduce error into the pose estimate. Over a long run the error accumulates and can seriously degrade system performance. Back-end optimization performs globally consistent optimization of the visual odometry estimates and removes noise, making the pose estimate more accurate. In addition, when the system detects a loop, the information is passed to the back end so that the accumulated error can be eliminated.

The embodiment introduces a key frame mechanism: representative image frames are selected for pose optimization, which reduces unnecessary computation. For local pose optimization, bundle adjustment is used to optimize the current camera pose and feature points. When the system detects a loop, the back end uses pose graph optimization to obtain a globally consistent trajectory and pose. The back end receives the poses and feature points passed from visual odometry and optimizes them with bundle adjustment. The workflow is: check the queue; process key frames; cull, generate, and fuse map points; and optimize poses with local bundle adjustment.

An image is set as a key frame when the following four principles are met:

(1) Because the camera acquires data at a high frame rate, a certain frame interval must pass between the current key frame and the previous key frame.

(2) The back-end optimization part is not currently running.

(3) The co-visible region between the current key frame and the earlier key frames is below a certain range.

(4) The current frame has enough feature points and matches, which guarantees feature richness.
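The four principles above amount to a conjunction of simple checks, sketched below. The `FrameStats` fields, the `is_keyframe` function, and every threshold value are hypothetical; the source only says "a certain interval", "a certain range", and "enough" without giving numbers.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    frames_since_last_keyframe: int
    backend_busy: bool            # True while back-end optimization is running
    covisibility_ratio: float     # share of tracked points also seen in earlier key frames
    num_matched_features: int


def is_keyframe(s: FrameStats, min_gap: int = 20, max_covisibility: float = 0.9,
                min_features: int = 50) -> bool:
    """Apply the four key frame principles; all thresholds are assumed values."""
    return (s.frames_since_last_keyframe >= min_gap      # (1) enough frames have passed
            and not s.backend_busy                       # (2) back end is idle
            and s.covisibility_ratio < max_covisibility  # (3) limited overlap with old key frames
            and s.num_matched_features >= min_features)  # (4) enough features and matches
```

Criteria (1) and (3) keep key frames from being redundant, (2) avoids overloading the back end, and (4) keeps poorly textured frames out of the optimization.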

5. Loop closure detection

Loop closure detection mainly solves the problem that the pose estimation error of the smart glasses gradually accumulates. When the camera returns to a place it has passed before, loop closure detection confirms this and establishes the connection between the current pose and the historical pose, then passes it to the back end for optimization. This eliminates the error accumulated during long runs and yields a globally consistent trajectory and pose. In addition, loop closure detection associates the current data with all historical data, so when visual odometry loses track of features, loop closure detection can be used for relocalization, which makes pose estimation more robust.

Loop closure detection runs continuously while the system is operating and eliminates the accumulated error of the smart glasses' pose estimate through the constraints created when a loop occurs. When the camera returns to a position it has visited before, the closed-loop constraint is passed to the back end for pose graph optimization. The workflow of loop closure detection includes the following steps. Step 1: detect loop candidate frames. Step 2: establish the connection with earlier key frames. Step 3: detect whether a loop has occurred; if not, return to step 1; if so, go to step 4. Step 4: pose graph optimization.

5.1 Building the bag-of-words model

Features are treated as individual words. A dictionary containing all feature types is trained in advance, and for each image a set of equivalent words, that is, a bag of words, is generated from its features according to the dictionary. Judging the similarity of two images then only requires comparing their bags of words, which greatly speeds up loop closure detection. Features are clustered with the unsupervised machine learning algorithm K-means++, and a K-d tree structure is used to improve search efficiency.

The training process of the dictionary is as follows:

1) At the root node, all samples are divided into k classes with the K-means++ algorithm described above, giving the first layer.

2) For each node of each layer, the samples belonging to that node are again clustered into k classes with K-means++, giving the next layer.

3) This is repeated until the final leaf layer is reached. The leaf layer contains the words that correspond to the features.

5.2 Similarity calculation method

The TF-IDF algorithm is introduced. If the similarity between the current frame and some earlier key frame is more than 3 times the similarity between the current frame and the previous key frame, a loop may have occurred. A verification step is still needed, so a caching mechanism is set up for loop closure detection: a single high similarity is not enough to declare a loop, and the loop is confirmed only when the similarity stays high over consecutive frames. Once the loop is confirmed, the loop closure detection part sends this information to the back end, and the back end optimizes the pose by graph optimization, which eliminates the accumulated error and yields a globally consistent trajectory and pose.
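The candidate test and the caching mechanism described above can be sketched as follows. This is an illustration under assumptions: the class name `LoopDetector` and the parameter `required_consecutive` are hypothetical, and the similarity scores are assumed to be computed elsewhere (for example from TF-IDF weighted bag-of-words vectors).

```python
from collections import deque


class LoopDetector:
    """Confirm a loop only after several consecutive candidate frames."""

    def __init__(self, ratio=3.0, required_consecutive=3):
        self.ratio = ratio
        # Cache of the most recent candidate decisions.
        self.recent = deque(maxlen=required_consecutive)

    def update(self, sim_to_candidate, sim_to_previous):
        """Feed one frame; return True when the loop is confirmed."""
        # Candidate: similarity to an earlier key frame exceeds `ratio` times
        # the similarity to the previous key frame.
        hit = sim_to_previous > 0 and sim_to_candidate > self.ratio * sim_to_previous
        self.recent.append(hit)
        # A single hit is not enough; every cached decision must be a hit.
        return len(self.recent) == self.recent.maxlen and all(self.recent)
```

With `required_consecutive=3`, the first two candidate frames only fill the cache, and the loop is reported on the third consecutive hit.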

Summary of the overall flow of the visual SLAM algorithm:

The visual SLAM algorithm is designed in three parts: visual odometry, back-end optimization and loop closure detection. The visual odometry part first extracts the ORB features of the frame input by the sensor, then matches the features of two adjacent frames with a fast nearest neighbor algorithm, and estimates the camera pose by combining the PnP algorithm with the RANSAC algorithm. The back-end part optimizes the poses and feature points passed by visual odometry by bundle adjustment. Loop closure detection uses the bag-of-words model and the trained dictionary to detect whether the camera has returned to a previously visited position. If a loop occurs, this constraint is sent to the back end, which optimizes the pose by graph optimization, eliminating the accumulated error and guaranteeing the global consistency of the camera trajectory and pose.

In order to obtain the item information and position information of the articles, the embodiment of the present invention first performs article detection on the color image, where the detection algorithm includes but is not limited to object detection algorithms such as YOLOv3 and is not restricted here, and then performs article segmentation on the color image, as detailed below:

The depth map of the corresponding pixels, that is, a three-dimensional point cloud, is obtained together with the color image. Therefore, before the semantic map is built, the detected objects need to be segmented out, and the camera pose estimated and optimized by the visual SLAM module is then used to project the pixels to their positions in space, thereby constructing the three-dimensional semantic map. Target RGB-D segmentation is achieved with an improved GrabCut algorithm, which uses the geometric plane information segmented by the CPF (Constrained Plane Fitting) algorithm to improve the segmentation result of GrabCut and realize RGB-D segmentation of three-dimensional targets. The image is first segmented with the GrabCut algorithm, the three-dimensional point cloud is then segmented with the CPF algorithm, and finally the point cloud segmentation result is used as a filter to reject the pixels in the image segmentation result that do not satisfy the spatial geometric relationships of the object, which completes the RGB-D segmentation of the target.

S202, construct the 3D semantic map according to the position information and posture information of the user and the item information and position information of the articles.

At this point all the information needed to build the semantic map has been obtained, including the optimized position and posture of the smart glasses at the key frames (that is, the position information and posture information of the user), the categories of the articles in the key frames (that is, the item information) and their positions, and the three-dimensional segmentation of the detected target objects. The next step is to integrate the obtained information into a globally consistent three-dimensional semantic map. The process is divided into three steps: first perform data association and update the target object models, then build the three-dimensional map of the environment, and finally fuse the semantic information into the three-dimensional map to obtain a three-dimensional semantic map containing rich information.

Research and design of the semantic map construction algorithm. First, a target RGB-D segmentation algorithm based on improved GrabCut is designed, which combines GrabCut segmentation of the color image with the information from CPF segmentation of the depth map to segment out the detected targets, completing the target extraction work for semantic mapping at the level of individual objects. The targets are labeled according to object category, and data association of the target objects is then used to update the object models, which avoids modeling the same target more than once. Finally, a colored octree map structure is used to build and store the three-dimensional semantic map containing rich information.

1. Design of the data association and model update algorithm

The role of data association is, when the result for a target is obtained after RGB-D segmentation, to determine whether the target is already in the map, that is, whether a new object needs to be added or an existing object needs to be maintained, so that the same target is not modeled repeatedly and no ghosting appears in the map.

First, for each detection, a group of candidate target landmarks is selected based on the Euclidean distance between the centroids of the segmented point clouds. Then a nearest neighbor search is performed between the three-dimensional points of the landmarks already in the map and those of the current target, and the Euclidean distance of each matched point pair is calculated. The Euclidean distance between two three-dimensional points is the 2-norm of the difference between the two points.

If more than half of the three-dimensional points of the target are closer than a certain threshold to an existing target in the map, the target is considered to be the same as that existing target, and the current target information is associated with the existing target so that the target model is maintained jointly. In addition, when the environment is complex, the nearest neighbor search over three-dimensional points is accelerated by building a k-d tree structure. To ensure that data association obtains the newest information, an object is segmented with the improved RGB-D target segmentation algorithm designed above whenever it is detected. In this way each object in the map retains three pieces of information: the target model obtained by data association, the key frame poses from which the target was observed, and the per-category probabilities provided by the target detection module. The probability of a target object in the map is updated according to the probability values provided by the target detection module. If C categories of objects are currently detected in total, Sc denotes the vector of the per-category probabilities of the target, and n is the number of key frames in which this target was detected, the target detection probability is updated as:

The category of the target in the map is then the one given by max(Sc), and its confidence is p = max(Sc)/n, which provides the information for marking the category and probability of the target in the semantic map.
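The association test described above (more than half of the target's points lying within a threshold of an existing target) can be sketched as follows. This is a minimal illustration: the function name and the default `dist_threshold` are hypothetical, and a brute-force nearest neighbor search is used in place of the k-d tree for brevity.

```python
import math


def is_same_target(target_points, map_points, dist_threshold=0.05):
    """Return True if more than half of target_points lie within
    dist_threshold (Euclidean distance) of some point in map_points."""
    if not target_points or not map_points:
        return False
    close = 0
    for p in target_points:
        # Brute-force nearest neighbor; a k-d tree would replace this loop
        # when the point clouds are large.
        nearest = min(math.dist(p, q) for q in map_points)
        if nearest < dist_threshold:
            close += 1
    return close > len(target_points) / 2
```

When the function returns True the new observation would be merged into the existing target model; otherwise a new object would be added to the map.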

2. Construction and storage form of the semantic map

The present invention uses a map representation that is flexible, occupies little memory and supports real-time updating: the octree map.

The entire space is taken as the root node and divided into eight child nodes according to the spatial coordinate system, and each child node is further divided into eight child nodes until the required resolution is reached, which gives the leaf nodes. The octree map differs from the voxel model of a point cloud map in that when all points within a cube are occupied, or all are unoccupied, the node does not need to be expanded, so comparatively little memory is used. Searching for a leaf node is also fast: for an octree with d layers the time complexity is O(d). In addition, the octree map supports setting the color of each node, which gives the colored octree map representation, and it supports updating and correcting information at any time, which makes it well suited to building a three-dimensional semantic map. Therefore, the colored octree map is selected here as the construction and storage form of the three-dimensional semantic map.
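The O(d) lookup follows from the fact that locating a leaf only requires choosing one of eight children at each layer. The sketch below is illustrative only: the function name, the axis-aligned root cube and the child numbering are assumptions, not part of this description.

```python
def octree_path(point, origin, size, depth):
    """Return the child index (0 to 7) chosen at each of `depth` layers.

    origin is the minimum corner of the root cube and size its edge length.
    """
    x, y, z = point
    ox, oy, oz = origin
    if not (ox <= x < ox + size and oy <= y < oy + size and oz <= z < oz + size):
        raise ValueError("point lies outside the root cube")
    path = []
    for _ in range(depth):
        size /= 2.0
        # One bit per axis: 1 if the point is in the upper half of the cube.
        ix = int(x >= ox + size)
        iy = int(y >= oy + size)
        iz = int(z >= oz + size)
        path.append(ix | (iy << 1) | (iz << 2))
        # Move the origin to the minimum corner of the chosen child.
        ox += ix * size
        oy += iy * size
        oz += iz * size
    return path
```

The returned list has exactly `depth` entries, one comparison step per layer, which is why the search cost grows linearly with the number of layers rather than with the number of stored points.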

To build the three-dimensional semantic map, the three-dimensional map of the environment is first established and continuously updated, and the semantic information is then fused into the three-dimensional map in real time, which yields a three-dimensional semantic map containing rich information. Information is obtained and processed continuously while the camera moves, so the semantic map is continuously updated.

Three-dimensional mapping uses the position and posture of the key frames estimated and optimized by the visual SLAM algorithm to map the depth information captured by the RGB-D camera into three-dimensional space and establish the three-dimensional map of the environment. Because the RGB-D camera used in the present invention provides the depth of every pixel in the field of view, dense mapping can be performed directly from the depth map: according to the optimized camera pose, each depth map is converted to a point cloud and the point clouds are stitched together to obtain the three-dimensional map.
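The mapping of a depth pixel into three-dimensional space can be sketched with the standard pinhole camera model. This is a sketch under assumptions: the intrinsic parameters `fx`, `fy`, `cx`, `cy` and the camera-to-world pose given as a 3 × 3 rotation matrix plus a translation are not specified in this description.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Convert pixel (u, v) with measured depth into camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def camera_to_world(p_cam, rotation, translation):
    """Apply the optimized key frame pose: p_world = R * p_cam + t."""
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
```

Applying both functions to every pixel of a key frame's depth map gives that frame's point cloud in the world frame, and the clouds of successive key frames can then be stitched into one map.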

The semantic map is a three-dimensional map that contains semantic information, that is, the semantic information of the environment is annotated while the octree map is being built. The semantic information is incorporated into the octree structure at the same time as the three-dimensional octree map is established, which gives the three-dimensional semantic map containing rich information.

The semantic mapping system of the present invention uses an RGB-D camera as the visual sensor to capture color information and depth information. The visual SLAM algorithm performs autonomous localization of the AR smart glasses as well as pose estimation and optimization, which gives a globally consistent trajectory and pose. At the same time, target detection is performed with the convolutional neural network YOLOv3 model, which detects the categories, probabilities and positions of the objects appearing in the key frames and thus obtains the semantic information of the environment. The perceived objects are then segmented out with the RGB-D segmentation algorithm, and the octree map representation is chosen to build the three-dimensional map of the environment. Finally, the semantic information is incorporated into the octree map, which completes the construction of the three-dimensional semantic map of the environment.

The embodiment of the present invention designs a visual SLAM algorithm that uses an RGB-D camera as the sensor. The present invention selects ORB features as the basis of the algorithm; they not only provide the spatial geometric relationships for pose estimation in visual odometry, but also serve as the criterion for judging image similarity in loop closure detection, which unifies the system to some extent. In the pose estimation algorithm designed in the embodiment of the present invention, the PnP algorithm replaces the ICP algorithm. The PnP algorithm computes the pose from the camera coordinates optimized in the previous frame and the pixel coordinates of the current frame, which avoids interference from camera measurement error. The back-end optimization part uses targeted algorithms: the poses and map points passed by visual odometry are optimized with local bundle adjustment, and the closed-loop constraints passed by loop closure detection are optimized with graph optimization.

The overall structure of the semantic mapping system is designed. A target RGB-D segmentation algorithm based on improved GrabCut is designed here: the CPF segmentation result, which uses the depth point cloud information, corrects the GrabCut segmentation, which uses only the color image information, so that the two methods complement each other. The detected target objects are segmented out and labeled according to object category, the camera poses obtained by the visual SLAM algorithm are combined with the detection and segmentation results of the targets, and the three-dimensional semantic map of the environment is built and stored in a colored octree map structure. When the system is run in a laboratory environment, it builds a readable and accurate three-dimensional semantic map while performing self-localization, pose estimation and semantic perception, which demonstrates the feasibility of the semantic mapping scheme of the present invention.

As a specific implementation of recognizing the gaze behavior of the user in Embodiment 1 of the present invention, as shown in Fig. 3A, Embodiment 3 of the present invention includes:

S301, obtain an eye image of the user, locate the pupil in the eye image, and determine the gaze region of the user in the 3D semantic map based on the obtained pupil position information.

In the embodiment of the present invention, the line of sight of the user can be tracked based on the pupil and the Purkinje image of the user to determine the region at which the user is gazing, so the position of the pupil in the eye must be determined first. The specific pupil recognition algorithm can be chosen by the technician as required, including but not limited to training a neural network model on sample pupil images and using it to recognize the pupil in the eye image, or performing the recognition with reference to Embodiment 4 of the present invention. Because the technique of tracking the line of sight of the user based on the pupil and the Purkinje image is relatively mature, it is not described further here.

S302, identify the articles contained in the gaze region, and count how long each article contained in the gaze region has been continuously present in the gaze region.

S303, if any of the continuous presence durations is greater than a preset duration threshold, determine that the user is gazing at an article contained in the 3D semantic map.

After the gaze region of the user is determined, the articles contained in the gaze region must also be determined, together with the continuous duration for which each article is present in the gaze region (that is, how long the user continuously gazes at the article). If this duration is long, the user is gazing at a particular article. The specific value of the preset duration threshold can be set by the technician.
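Steps S302 and S303 can be sketched as a dwell-time check over time-ordered gaze samples. This is a minimal illustration: the sample format (a timestamp in seconds plus the set of item identifiers currently in the gaze region) and the function name are assumptions.

```python
def gazed_items(samples, threshold_s):
    """Return the items whose continuous presence in the gaze region
    exceeded threshold_s.

    samples: time-ordered list of (timestamp_s, set_of_item_ids).
    """
    run_start = {}  # item -> timestamp at which its current run began
    gazed = set()
    for t, items in samples:
        # An item that left the gaze region loses its accumulated duration.
        for item in list(run_start):
            if item not in items:
                del run_start[item]
        for item in items:
            run_start.setdefault(item, t)
            if t - run_start[item] > threshold_s:
                gazed.add(item)
    return gazed
```

An item that briefly leaves the gaze region starts counting again from zero, so only uninterrupted gazing longer than the threshold is reported.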

As an embodiment of the present invention, considering that the head of the user may move, so that the line of sight changes during the gaze tracking described above, and in order to achieve more accurate gaze tracking, the embodiment of the present invention includes:

The natural motion of the head consists of two basic motions: pitching in the vertical direction and left-right motion in the horizontal direction. The common mapping model based on a quadratic polynomial is obtained by multi-point calibration while the user keeps the head still, so when the head position changes, the error of the gaze point estimated by this model increases greatly. On this basis, a head movement compensation algorithm based on polynomial mapping is proposed. The algorithm needs head motion tracking equipment to obtain the head motion information in real time. When the head moves, the polynomial mapping model obtained at calibration is first used to make a preliminary estimate of the gaze point position, the coordinates of this estimated point are combined with the initial eye position to establish a three-dimensional gaze direction, and the head motion information is then used to apply rotation and translation compensation to this gaze direction. The intersection of the compensated gaze direction with the screen is taken as the final estimated gaze point coordinates. Furthermore, because the influence of head movement actually comes from the change in eye position that it causes, the line of sight can be compensated as long as the change in eye position is known.

Based on the points above, a gaze estimation model under natural head motion is established. Fig. 3B shows a side plan view of the gaze estimation principle, in which O1 denotes the initial eye position and O2 denotes the eye position after movement. At the initial position, gazing at point S1 on the screen gives the pupil-cornea vector pccr1. After the head rotates to the right about the Y axis by angle α, the eye moves to position O2. Assume that pccr2 equals pccr1, that is, the eye image features do not change at all; the corresponding line of sight is then deflected to the right, and its intersection with the screen is assumed to be S2, which is the gaze point position at this moment. However, because the pupil-cornea vector has not changed, estimating the gaze with pccr2 gives a large error. The point estimated in this way is therefore taken as the preliminary estimate of the gaze, the gaze direction g1 is established from this point and the initial eye position, g1 is corrected with the angle information of the head movement, and the intersection of the corrected gaze direction with the screen is taken as the current estimated gaze point.

As an implementation of locating the pupil in Embodiment 3 of the present invention, as shown in Fig. 4A, Embodiment 4 of the present invention includes:

S401, divide the eye image into N × M region images, apply gray-level binarization to all region images, and obtain the corresponding N × M eye gray values, where N and M are positive integers.

As shown in Fig. 4B, the eye features in the embodiment of the present invention are detailed as follows. The basic rectangular blocks in the figure are all the same size. A, B, C and D are the most basic rectangular features; E consists of 3 basic rectangles, F consists of 9 rectangles, G is a single rectangle, H and I each consist of 4 basic rectangles, J consists of 12 rectangles, and K and L each consist of 4 rectangles. Each rectangular feature is computed as the pixel sum of the black part of the figure minus the pixel sum of the white part; feature G is a single rectangular feature, so only the pixel sum inside the rectangle is computed. The eye features are designed on the basis of the structure of the eye. Because the brightness of the eye corner differs from that of its surroundings, with the eye corner darker than the surrounding pixels, feature F represents this characteristic well. The center of the eyeball is essentially black, so the simple rectangle G represents this characteristic of the eyeball. In the horizontal direction of the eye there are two obvious pixel transitions, from sclera to iris and then from iris to sclera, and feature H reflects this change. Similarly, there is a comparable gray-level change in the vertical direction of the eye, on which feature I is based. Together with C and E, these features strengthen the description of the horizontal and vertical gray-level changes of the eye. J represents the gray-level change of the part between the eye corner and the eyeball, and K and L represent the edge information of the eye corner. After the new features are added, the number of eye features needed for classification is reduced, which makes eye detection easier.

The pupil positioning of the embodiment of the present invention uses a frame-independent design. Careful examination of eye screenshots shows that many parts of the eye image signal change with the incident light, with the person, and with specular reflection from the eye; moreover, as the rotation angle of the face differs, their relative position in the screenshot also differs, so direct classification training gives unsatisfactory results. After many comparisons it was found that the pupil region is the more stable image information in the eye screenshot, and the features of this part are obvious when the eyes are open. The pupil is therefore located first, which reduces the complexity of the problem.

In the embodiment of the present invention, N and M are positive integers whose specific values are set by the technician. Taking M = N = 10 as an example, the eye image is divided into 10 × 10 region images, and the gray value of each region image is calculated, which gives 10 × 10 eye gray values.

S402, obtain a skin image of the user, and calculate the average gray value of the skin image after gray-level binarization.

In order to strike a balance between the gray level of the skin color and the minimum gray level, the embodiment of the present invention also acquires a skin image of the user and calculates its average gray value. The skin image is preferably an image of the skin around the eyes.

S403, in order of absolute difference from small to large, screen the region images whose absolute difference between the corresponding eye gray value and the average gray value is less than a preset gray threshold, until the number of pixels contained in the screened region images falls within a preset quantity threshold range, which gives the region images corresponding to the pupil and thereby determines the position of the pupil in the eye image.

After the eye gray value of each region image and the average gray value of the skin have been obtained, the embodiment of the present invention calculates, for each region image, the absolute difference between its eye gray value and the average gray value of the skin, and selects the region images that satisfy the preset gray threshold one by one in order from small to large. Each time a region image is selected, the number of pixels contained in all region images selected so far is counted, until the number of pixels contained in the selected region images falls within the preset quantity range, which ensures that the pupil is identified accurately and reliably. The specific values of the preset gray threshold and the preset quantity threshold can be set by the technician.
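The screening in S403 can be sketched as follows. This is an illustration under assumptions: the function name, the dictionary keyed by (row, column) region index, and the fixed `pixels_per_region` (all region images being the same size) are hypothetical.

```python
def select_pupil_regions(region_grays, skin_mean, gray_threshold,
                         pixels_per_region, min_pixels, max_pixels):
    """Pick region images in ascending order of |gray - skin_mean| until the
    accumulated pixel count falls inside [min_pixels, max_pixels].

    region_grays maps a (row, col) region index to its gray value.
    Returns the list of selected indices, or None if the range is never hit.
    """
    # Keep only regions under the gray threshold, smallest difference first.
    candidates = sorted(
        (abs(gray - skin_mean), index)
        for index, gray in region_grays.items()
        if abs(gray - skin_mean) < gray_threshold
    )
    selected, total = [], 0
    for _, index in candidates:
        selected.append(index)
        total += pixels_per_region
        # The pixel count is re-checked after every newly selected region.
        if min_pixels <= total <= max_pixels:
            return selected
    return None
```

Returning None when the pixel range is never reached lets the caller decide whether to relax the thresholds or discard the frame.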

As a specific method of performing scene recognition on the 3D semantic map in Embodiment 1 of the present invention, the embodiment of the present invention includes:

To classify each picture individually, the Places205-AlexNet network model is used here; its performance on various benchmark datasets exceeds that of other methods. The Places205-AlexNet network model follows the same network architecture as AlexNet, but it is trained specifically on the scene classification task. The training dataset contains about 2,500,000 pictures in 205 semantic categories, with at least 5000 sample pictures per category. These images come from various Internet resources, such as Google Images, Bing and Flickr, and the pictures are annotated by category. The number and variety of samples in the training dataset ensure that the resulting classifier generalizes well, so that no retraining or fine-tuning is needed when it is applied to an environment it was not trained on. This ensures that the semantic mapping system described here is portable and can be used by users operating in a variety of environments.

The input of the Places205-AlexNet network model is RGB pictures resized to 224 × 224 × 3 pixels, regardless of their original dimensions. The Places205-AlexNet convolutional neural network model has 8 layers in total, namely 5 convolutional layers followed by 3 fully connected layers. Given the picture I_t of the current scene, the soft-max output layer of the network outputs the discrete probability distribution p(o_t|I_t) over the 205 known scene classes. The classifier used here takes the output of the fc7 layer of the Places205-AlexNet network model as a feature vector; fc7 is the last generic (that is, class-independent) fully connected layer in the network. Because the Places205-AlexNet network model is designed to recognize the 205 scene categories in the Places205 scene dataset, the final fc8 layer and prob layer each have 205 output neuron nodes.

Suppose that, given the current image I_t, the prob output layer of the network outputs the discrete probability distribution p(o_t|I_t) over the 205 known classes C_i. The known scene class labels are represented as a combined vector, and the corresponding combined likelihood is then defined as:

where each term denotes the probability that the image I_t at time t belongs to scene class C_i, and the C_i are mutually independent. Because two adjacent pictures obtained by the camera are continuous in time, a recursive Bayesian filtering technique can be applied. The scene classification problem of the robot is described here as a probability estimation problem: the discrete probability distribution over all possible scene labels C_i is estimated, given that all observed pictures from the past up to the present are known. Assuming that the first-order Markov property holds, the following Bayesian filter formula is obtained:

Once the scene classification problem is treated as a Bayesian estimation problem, other information sources can be integrated naturally. For example, because an indoor service robot works indoors, the outdoor scenes among the 205 scene classes can hardly ever be observed.
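The recursive estimate described above can be illustrated with a generic discrete Bayes filter step. This is the textbook form under the first-order Markov assumption, not necessarily the exact formula of this description; the function name and the `transition` matrix (the probability of moving from one scene class to another between consecutive pictures) are assumptions.

```python
def bayes_filter_step(prior, likelihood, transition):
    """One recursive update of the belief over scene classes.

    prior[j]         belief in class j after the previous picture
    likelihood[i]    classifier probability of class i for the current picture
    transition[j][i] assumed probability of moving from class j to class i
    """
    n = len(prior)
    # Prediction: propagate the previous belief through the transition model.
    predicted = [
        sum(transition[j][i] * prior[j] for j in range(n)) for i in range(n)
    ]
    # Correction: weight by the current observation, then normalize.
    unnormalized = [likelihood[i] * predicted[i] for i in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation is impossible under the current belief")
    return [u / total for u in unnormalized]
```

Prior knowledge such as the near impossibility of outdoor scenes can be expressed by giving those classes a prior of zero, after which the filter never assigns them any belief.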

As Embodiment 5 of the present invention, considering that in practice there may sometimes be several operation tutorials that satisfy the requirements of the scene type, the article situation and the user data, and in order to ensure that the finally determined operation tutorial is the one the user actually needs, as shown in Fig. 5, the embodiment of the present invention includes:

S501, obtain multiple operation modes.

S502, obtain a mode selection instruction from the user and, based on the mode selection instruction, select one operation mode from the multiple operation modes.

When several operation tutorials satisfy the requirements, the embodiment of the present invention lets the user choose the one that is needed for the subsequent operations. The mode selection instruction may be input in ways that include but are not limited to voice, eye movement and head posture change. For example, the relevant information of the several operation tutorials is displayed in the smart glasses, and the user inputs the mode selection instruction by voice selection, by an eye movement operation (such as blinking or gazing) or by a head posture change (such as shaking the head to switch the operation tutorial and nodding to confirm). The specific instruction interaction methods to be supported are set by the technician according to actual needs.

As an implementation of eye-movement-based information interaction between the user and the smart glasses in the embodiment of the present invention, the user can perform a variety of eye movement operations to interact with the smart glasses, including:

1) Judging whether the behavior is reading:

This is used to mark the main gaze point of reading behavior (the gaze target has been confirmed and no more recognition time is needed), whose duration falls within a preset range, preferably 600 to 1100 milliseconds.

2) Judging the look-back behavior of the user:

A look-back can be defined from the saccade data: the gaze point coordinate falls within the space formed by circles centered on the previous 5 gaze points with a radius of 1 gaze region, where the immediately preceding gaze point is not counted.

3) Judging whether the behavior is a global focus change:

A global focus change is a saccade whose length exceeds 3 gaze regions after the previous main gaze point (one whose duration exceeds a preset duration, preferably 1100 milliseconds or more).

4) Judging the behavior of a local focus change:

A local focus change is the behavior in which, after the previous main gaze point (one whose duration exceeds a preset duration, preferably 1100 milliseconds or more), several saccades occur whose individual lengths do not exceed 3 gaze regions but whose total change in position exceeds 3 gaze regions.

5) Judging search behavior:

Search behavior: when a global focus change or a local focus change occurs, the user is considered to be searching. A global focus change is typical search behavior, while a local focus change means that the user believes the approximate region has been found and continues to look for the specific target.

Continuous search: when saccades of 10 gaze regions or more occur continuously after the previous main gaze point (one whose duration exceeds a preset duration, preferably 1100 milliseconds or more), the user is considered to be thinking, distracted or resting.
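The reading, focus change and search rules above can be sketched as simple threshold tests. This is an illustration under assumptions: the function names are hypothetical, saccade lengths are assumed to be measured in gaze regions, and the total change in position is approximated by the sum of the saccade lengths.

```python
def is_reading_fixation(duration_ms, low=600, high=1100):
    """A main gaze point of reading lasts within the preset range."""
    return low <= duration_ms <= high


def classify_focus_change(main_fixation_ms, saccade_lengths,
                          main_ms=1100, max_regions=3):
    """Classify the saccades that follow a main gaze point.

    Returns "global", "local" or "none".
    """
    if main_fixation_ms < main_ms or not saccade_lengths:
        return "none"
    # Global: a single saccade longer than max_regions gaze regions.
    if any(length > max_regions for length in saccade_lengths):
        return "global"
    # Local: every saccade is short, but together they exceed max_regions.
    if sum(saccade_lengths) > max_regions:
        return "local"
    return "none"


def is_searching(main_fixation_ms, saccade_lengths):
    """Either kind of focus change counts as search behavior."""
    return classify_focus_change(main_fixation_ms, saccade_lengths) != "none"
```

The classification is only attempted after a main gaze point of sufficient duration, so short glances between fixations are not mistaken for a change of focus.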

In the embodiment of the present invention, the user can control the smart glasses by eye movement in the ways described above, which realizes the various human-computer interaction operations needed by the other embodiments of the present invention.

As a specific method of monitoring the operation of the user in Embodiment 1 of the present invention, as shown in Fig. 6, Embodiment 6 of the present invention includes:

S601, identify the behavior of the user while operating the article, and judge whether the behavior satisfies the requirement of the operating step.

S602, obtain the preset attribute threshold corresponding to the article involved in the operation mode, and identify whether the attribute data of the article satisfies the preset attribute threshold while the user operates the article.

S603, if the behavior does not satisfy the requirement of the operating step and/or the attribute data of the article does not satisfy the preset attribute threshold, determine that the operation of the user on the article does not satisfy the requirement of the operating step.

Even when the operating behavior of the user appears correct, it is sometimes difficult to tell whether each individual step is entirely accurate. For example, the user may indeed power on first, then select a mode and then start, but select the wrong mode during mode selection, which still causes the whole operation to go wrong. Therefore, in order to identify the operating problems of the user in time, the embodiment of the present invention does not only monitor the operating behavior of the user; it also monitors the attribute data of the article being operated, so that mistakes of this kind do not occur without being recognized.

Specifically, in the embodiment of the present invention corresponding attribute threshold can be respectively provided with to each article being related in operation study course Value, such as in some study course of cooking, set the cooking time of oven as 1 hour temperature be 120 degree, and in user operation process The attribute data of article is identified, judges that it operates whether caused goods attribute data variation is in threshold requirement range, As above-mentioned, judge oven set by user cooking time and temperature whether be 1 hour and 120 degree (can be according to intelligent glasses Image when the user setting oven taken is identified to obtain).

When the attribute data that article caused by wrong (such as pressing the wrong button) or user's operation occurs in user behavior is unsatisfactory for wanting All illustrate that the operation of user is wrong when asking (such as oven temperature setting mistake), therefore the embodiment of the present invention can all determine user Operation study course medium-height grass is unsatisfactory in the requirement of step to the operation of article.
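
The combined check of S601 to S603 can be sketched as follows, using the oven example. This is a minimal sketch under stated assumptions: the `OperationStep` structure, the action labels, and the attribute names are hypothetical, and behavior and attribute recognition from the glasses' images is assumed to have already produced the observed values.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class OperationStep:
    # Hypothetical representation of one step of an operation mode.
    expected_action: str
    # attribute name -> (min, max) allowed range, i.e. the preset attribute threshold
    attribute_ranges: Dict[str, Tuple[float, float]] = field(default_factory=dict)

def step_satisfied(step: OperationStep,
                   observed_action: str,
                   observed_attributes: Dict[str, float]) -> bool:
    # S601: does the recognized behavior match what the step requires?
    behavior_ok = observed_action == step.expected_action
    # S602: does every monitored attribute fall inside its preset range?
    attributes_ok = all(
        name in observed_attributes and lo <= observed_attributes[name] <= hi
        for name, (lo, hi) in step.attribute_ranges.items()
    )
    # S603: the step fails if either check fails.
    return behavior_ok and attributes_ok

# Oven example from the text: 1 hour (60 minutes) at 120 degrees.
oven_step = OperationStep(
    expected_action="set_oven",
    attribute_ranges={"time_min": (60, 60), "temperature_deg": (120, 120)},
)
```

Expressing each threshold as a range rather than a single value lets the same check cover tutorials that tolerate some deviation.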

Corresponding to the method of the foregoing embodiments, Fig. 7 shows a structural block diagram of the operation prompting device provided by an embodiment of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown. The operation prompting device illustrated in Fig. 7 may be the execution subject of the operation prompting method provided by Embodiment 1 above.

Referring to Fig. 7, the operation prompting device includes:

Map construction module 71, configured to obtain an image of the user's environment and construct a 3D semantic map of the user's environment based on the image.

Gaze identification module 72, configured to perform eye movement recognition on the user and judge whether the user gazes at an article contained in the 3D semantic map.

Mode acquisition module 73, configured to, if the user gazes at an article contained in the 3D semantic map, obtain the operation mode corresponding to the 3D semantic map, the operation mode containing operation steps for one or more articles.

Operation monitoring module 74, configured to monitor whether the user's operation on the article meets the requirement of the operation step.

Operation prompting module 75, configured to, if the user's operation on the article does not meet the requirement of the operation step, output the operation prompt corresponding to the operation step.
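
The way modules 71 to 75 hand results to one another can be sketched as a single control flow. This is only an illustrative skeleton: each module is passed in as a callable, and all parameter names and signatures are assumptions rather than interfaces defined by this description.

```python
from typing import Any, Callable, Iterable, List, Optional

def run_operation_prompting(
    build_map: Callable[[Any], Any],                      # stands in for module 71
    gazed_article: Callable[[Any, Any], Optional[str]],   # stands in for module 72
    get_operation_steps: Callable[[Any], Iterable[Any]],  # stands in for module 73
    step_met: Callable[[Any], bool],                      # stands in for module 74
    output_prompt: Callable[[Any], None],                 # stands in for module 75
    environment_image: Any,
    eye_image: Any,
) -> List[Any]:
    """Returns the list of steps for which a prompt was output."""
    semantic_map = build_map(environment_image)
    if gazed_article(semantic_map, eye_image) is None:
        return []  # the user is not gazing at any article in the map
    prompted = []
    for step in get_operation_steps(semantic_map):
        if not step_met(step):
            output_prompt(step)
            prompted.append(step)
    return prompted
```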

Further, the map construction module 71 is configured to:

Based on the color image and the depth image included in the image, obtain the position information and posture information of the user, as well as the position information and article information of the articles in the user's environment.

Construct the 3D semantic map according to the position information and posture information of the user and the article information and position information of the articles.

Further, the gaze identification module 72 includes:

Pupil positioning module, configured to obtain an eye image of the user, perform pupil positioning on the eye image, and determine the user's gaze area in the 3D semantic map based on the obtained pupil position information.

Duration statistics module, configured to identify the articles contained in the gaze area and count how long each article contained in the gaze area continuously remains in the gaze area.

Gaze confirmation module, configured to determine that the user gazes at an article contained in the 3D semantic map if any of the continuous presence durations exceeds a preset duration threshold.
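
The duration statistics and gaze confirmation described above can be sketched as follows. The sketch assumes that each processed frame yields a timestamp and the set of article identifiers currently inside the gaze area; this frame format and the function name are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

def gazed_articles(frames: Iterable[Tuple[int, Set[str]]],
                   threshold_ms: int) -> List[str]:
    """frames: chronological (timestamp_ms, articles_in_gaze_area) pairs.

    Returns the articles whose continuous presence in the gaze area
    exceeded threshold_ms, in the order they were confirmed.
    """
    start: Dict[str, int] = {}   # article -> time its current unbroken presence began
    confirmed: List[str] = []
    for t_ms, present in frames:
        # An article that left the gaze area loses its accumulated duration.
        for article in list(start):
            if article not in present:
                del start[article]
        for article in sorted(present):
            start.setdefault(article, t_ms)
            if t_ms - start[article] > threshold_ms and article not in confirmed:
                confirmed.append(article)
    return confirmed
```

Resetting the start time whenever an article leaves the gaze area is what makes the measured duration continuous rather than cumulative.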

Further, the pupil positioning module is configured to:

Divide the eye image into N × M region images, and perform gray-value binarization on all the region images to obtain the corresponding N × M eye gray values, where N and M are positive integers.

Obtain a skin image of the user, and calculate the average gray value of the skin image after gray-value binarization.

In ascending order of absolute difference, screen the region images whose absolute difference between the corresponding eye gray value and the average gray value is less than a preset gray threshold, until the number of pixels contained in the screened region images falls within a preset quantity threshold range, thereby obtaining the region images corresponding to the pupil and determining the position of the pupil in the eye image.
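
The screening procedure above can be sketched in plain Python on images stored as nested lists of gray values. This is a sketch under stated assumptions: the binarization cutoff of 128, the use of the mean of the binarized pixels as a region's gray value, and all function and parameter names are choices made for illustration, since the description does not fix them.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of gray values in 0..255

def binarized_mean(pixels: Image, cutoff: int = 128) -> float:
    # Binarize each pixel to 0 or 255 (cutoff is an assumed value), then average.
    flat = [255 if p >= cutoff else 0 for row in pixels for p in row]
    return sum(flat) / len(flat)

def locate_pupil(eye: Image, n: int, m: int, skin: Image,
                 gray_threshold: float,
                 min_pixels: int, max_pixels: int) -> List[Tuple[int, int]]:
    """Returns (row, col) indices of the selected region images, or [] if
    the pixel count never lands inside [min_pixels, max_pixels]."""
    block_h, block_w = len(eye) // n, len(eye[0]) // m
    skin_avg = binarized_mean(skin)
    candidates = []
    for i in range(n):
        for j in range(m):
            block = [row[j * block_w:(j + 1) * block_w]
                     for row in eye[i * block_h:(i + 1) * block_h]]
            diff = abs(binarized_mean(block) - skin_avg)
            if diff < gray_threshold:          # keep only regions under the gray threshold
                candidates.append((diff, i, j))
    candidates.sort()                          # ascending absolute difference
    selected, count = [], 0
    for _diff, i, j in candidates:
        selected.append((i, j))
        count += block_h * block_w
        if min_pixels <= count <= max_pixels:
            return selected
        if count > max_pixels:
            break
    return []
```

The indices returned identify the region images; multiplying them by the region height and width gives the pupil's position in the eye image.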

Further, the mode acquisition module 73 is configured to:

Identify the scene type corresponding to the 3D semantic map.

Obtain the corresponding operation mode based on the scene type and the articles contained in the 3D semantic map.

Further, the mode acquisition module 73 also includes:

Scene recognition module, configured to identify the scene type corresponding to the 3D semantic map and obtain the user data of the user.

Mode decision module, configured to obtain the corresponding operation mode based on the scene type, the articles contained in the 3D semantic map, and the user data.

Further, the mode decision module is configured to:

Obtain a plurality of operation modes.

Obtain the user's mode selection instruction, and based on the mode selection instruction, select one operation mode from the plurality of operation modes.
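
Mode acquisition with user data and a final user selection can be sketched as a filter followed by a choice. The `OperationMode` fields, the use of a skill level as the user data, and the index-based selection instruction are all hypothetical; the description does not specify what the user data contains or how the selection instruction is encoded.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class OperationMode:
    # Hypothetical record describing one stored operation mode.
    name: str
    scene_type: str
    required_articles: Set[str]
    skill_level: str  # assumed example of user data

def candidate_modes(library: List[OperationMode], scene_type: str,
                    articles: Set[str], skill_level: str) -> List[OperationMode]:
    # Keep modes that match the scene, need only articles present in the map,
    # and suit the user data.
    return [
        mode for mode in library
        if mode.scene_type == scene_type
        and mode.required_articles <= articles
        and mode.skill_level == skill_level
    ]

def choose_mode(candidates: List[OperationMode],
                selection_index: int) -> Optional[OperationMode]:
    # The mode selection instruction is modeled here as an index into the candidates.
    if 0 <= selection_index < len(candidates):
        return candidates[selection_index]
    return None
```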

Further, the operation monitoring module 74 is configured to:

Identify the user's behavior while operating the article, and judge whether the behavior meets the requirement of the operation step.

Obtain the preset attribute threshold corresponding to the article involved in the operation mode, and identify whether the article attribute data of the article meets the preset attribute threshold while the user operates the article.

If the behavior does not meet the requirement of the operation step and/or the article attribute data does not meet the preset attribute threshold, determine that the user's operation on the article does not meet the requirement of the operation step.

Further, the operation prompting module 75 is configured to:

Identify the user's gaze area in the 3D semantic map, and output the operation prompt in augmented reality based on the gaze area.

For the process by which each module of the operation prompting device provided by the embodiment of the present invention implements its function, refer to the description of Embodiment 1 shown in Fig. 1 above; details are not repeated here.

It should be understood that the magnitude of the step sequence numbers in the above embodiments does not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.

It should also be understood that although the terms "first", "second", and so on are used in some embodiments of the present invention to describe various elements, these elements should not be limited by these terms. The terms are only used to distinguish one element from another. For example, a first table could be named a second table, and similarly a second table could be named a first table, without departing from the scope of the various described embodiments. The first table and the second table are both tables, but they are not the same table.

Fig. 8 is a schematic diagram of the glasses provided by an embodiment of the present invention. As shown in Fig. 8, the glasses 8 of this embodiment include a processor 80 and a memory 81, and the memory 81 stores a computer program 82 executable on the processor 80. When executing the computer program 82, the processor 80 implements the steps in each of the above operation prompting method embodiments, for example steps 101 to 105 shown in Fig. 1. Alternatively, when executing the computer program 82, the processor 80 implements the functions of the modules/units in each of the above device embodiments, for example the functions of modules 71 to 75 shown in Fig. 7.

The processor 80 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The memory 81 may be an internal storage unit of the glasses 8, such as a hard disk or internal memory of the glasses 8. The memory 81 may also be an external storage device of the glasses 8, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) fitted to the glasses 8. Further, the memory 81 may include both an internal storage unit of the glasses 8 and an external storage device. The memory 81 is used to store the computer program and the other programs and data required by the glasses. The memory 81 may also be used to temporarily store data that has been sent or is to be sent.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, it can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like.

The embodiments described above are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements of some of the technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims

1. An operation prompting method, characterized by comprising:

if the user gazes at an article contained in the 3D semantic map, obtaining the operation mode corresponding to the 3D semantic map, the operation mode containing operation steps for one or more articles;

if the user's operation on the article does not meet the requirement of the operation step, outputting the operation prompt corresponding to the operation step.

2. The operation prompting method according to claim 1, characterized in that the image includes a color image and a depth image, and obtaining the image of the user's environment and constructing the 3D semantic map of the user's environment based on the image comprises:

based on the color image and the depth image, obtaining the position information and posture information of the user, as well as the position information and article information of the articles in the user's environment;

constructing the 3D semantic map according to the position information and posture information of the user and the article information and position information of the articles.

3. The operation prompting method according to claim 1, characterized in that performing eye movement recognition on the user and judging whether the user gazes at an article contained in the 3D semantic map comprises:

obtaining an eye image of the user, performing pupil positioning on the eye image, and determining the user's gaze area in the 3D semantic map based on the obtained pupil position information;

identifying the articles contained in the gaze area, and counting how long each article contained in the gaze area continuously remains in the gaze area;

if any of the continuous presence durations exceeds a preset duration threshold, determining that the user gazes at an article contained in the 3D semantic map.

4. The operation prompting method according to claim 3, characterized in that performing pupil positioning on the eye image comprises:

dividing the eye image into N × M region images, and performing gray-value binarization on all the region images to obtain the corresponding N × M eye gray values, where N and M are positive integers;

obtaining a skin image of the user, and calculating the average gray value of the skin image after gray-value binarization;

in ascending order of absolute difference, screening the region images whose absolute difference between the corresponding eye gray value and the average gray value is less than a preset gray threshold, until the number of pixels contained in the screened region images falls within a preset quantity threshold range, thereby obtaining the region images corresponding to the pupil and determining the position of the pupil in the eye image.

5. The operation prompting method according to claim 1, characterized in that obtaining the operation mode corresponding to the 3D semantic map comprises:

identifying the scene type corresponding to the 3D semantic map;

6. The operation prompting method according to claim 1, characterized in that obtaining the operation mode corresponding to the 3D semantic map comprises:

identifying the scene type corresponding to the 3D semantic map, and obtaining the user data of the user;

obtaining the corresponding operation mode based on the scene type, the articles contained in the 3D semantic map, and the user data.

7. The operation prompting method according to claim 6, characterized in that the process of obtaining the operation mode comprises:

obtaining a plurality of operation modes;

obtaining the user's mode selection instruction, and based on the mode selection instruction, selecting one operation mode from the plurality of operation modes.

8. The operation prompting method according to claim 1, characterized in that monitoring whether the user's operation on the article meets the requirement of the operation step comprises:

identifying the user's behavior while operating the article, and judging whether the behavior meets the requirement of the operation step;

obtaining the preset attribute threshold corresponding to the article involved in the operation mode, and identifying whether the article attribute data of the article meets the preset attribute threshold while the user operates the article;

if the behavior does not meet the requirement of the operation step and/or the article attribute data does not meet the preset attribute threshold, determining that the user's operation on the article does not meet the requirement of the operation step.

9. The operation prompting method according to claim 1, characterized in that outputting the operation prompt corresponding to the operation step comprises:

identifying the user's gaze area in the 3D semantic map, and outputting the operation prompt in augmented reality based on the gaze area.

10. Glasses, characterized in that the glasses comprise a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 9.