CN104020853A - Kinect-based system and method for controlling network browser - Google Patents

Kinect-based system and method for controlling network browser

Info

Publication number: CN104020853A
Application number: CN201410283898.8A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 张庆丰, 董侠, 张嘉昕, 汤中伟, 林烈峰, 容玉钿
Applicant and current assignee: Jinan University
Filing / priority date: 2014-06-23
Publication date: 2014-09-03
Legal status: Pending (application later deemed withdrawn after publication)


Abstract

The invention discloses a Kinect-based system and method for controlling a web browser, realized through a three-layer architecture consisting of a support layer, a logic processing layer and an interaction layer. The system comprises a data stream receiving module, an action information acquisition module, an action information processing module and a display module. The method comprises the steps of: receiving the raw data streams of the Kinect sensor at the support layer; obtaining relevant information about the user from those raw data streams, triggering a corresponding event according to that information, and sending the action information corresponding to the event to the logic processing layer; processing, at the logic processing layer, the action information sent by the support layer, and sending the processing result to the interaction layer; and displaying, at the interaction layer, the processing result sent by the logic processing layer on a browser interface or an input interface. With the disclosed system and method, when the web browser is controlled through Kinect, both basic control of the browser and input of text information can be accomplished by gestures.

Description

Kinect-based system and method for controlling a web browser
Technical field
The present invention relates to a system and method for controlling a web browser, and in particular to a Kinect-based system and method for controlling a web browser. The invention belongs to the field of human-computer interaction.
Background art
In 2010, Microsoft officially released Kinect, a novel 3D motion-sensing camera device. It can capture raw sensing data streams such as skeleton and depth data, track and detect people's movements, and thereby control a computer. This mode of human-computer interaction, which captures the user's limb movements in real time, is of genuinely revolutionary significance. Moreover, because it is compact, inexpensive and easy to use, Kinect has spread worldwide and become a common household computer peripheral.
At present, several Kinect applications related to web browsers already exist on the market, as follows:
1) Microsoft Research offers a free JavaScript API, Kinected Browser, which exposes Kinect data to JavaScript and DOM programming. The IE browser on the Xbox 360 can be operated with Kinect gestures. A team at the Massachusetts Institute of Technology (MIT) developed DepthJS, which allows web browsing with Kinect, including clicking, going forward and back, page scrolling, panning and zooming. However, none of these projects implements gesture-based text input. Input is inevitably needed while browsing; requiring the user to operate the browser with gestures but type on a keyboard is obviously clumsy and inconvenient for human-computer interaction.
2) Chinese Patent Application No. 201110388900.4 describes a conference presentation interaction method based on gesture recognition. It controls the output of presentation content by capturing image data in a presentation control area, recognizing the gestures of a user within that area, and retrieving the corresponding control command from a set of control gestures. However, the presenter can only advance, rewind, play or stop the PPT; marking or writing on the PPT is not possible, and auxiliary functions such as a magnifier are not provided, which causes a degree of difficulty for a remote operator.
3) The invention of Chinese Patent Application No. 201180061879.5, proposed by Intel, covers methods, devices and systems for interacting with content on web browsers. It proposes a macroscopic concept and focuses mainly on the recognition of user gestures; its tooling is not specific to Kinect.
Summary of the invention
The object of the present invention is to overcome the above-mentioned defects of the prior art by providing a Kinect-based system for controlling a web browser. When the web browser is controlled through Kinect, this system enables both basic control of the browser and input of text information by gestures.
Another object of the present invention is to provide a Kinect-based method for controlling a web browser.
The object of the present invention can be achieved by adopting the following technical solution:
A Kinect-based system for controlling a web browser, implemented as a three-layer architecture consisting of a support layer, a logic processing layer and an interaction layer, and specifically comprising:
a data stream receiving module, for receiving the raw data streams of the Kinect sensor at the support layer;
an action information acquisition module, for obtaining relevant information about the user from the raw data streams of the Kinect sensor at the support layer, triggering a corresponding event according to that information, and sending the action information corresponding to the event to the logic processing layer;
an action information processing module, for processing, at the logic processing layer, the action information sent by the support layer, and sending the processing result to the interaction layer;
a display module, for displaying, at the interaction layer, the processing result sent by the logic processing layer on a browser interface or an input interface.
In a preferred embodiment, the raw data streams received by the data stream receiving module comprise a color image data stream, a depth image data stream and a skeleton data stream.
In a preferred embodiment, the action information acquisition module specifically comprises:
a data stream acquiring unit, for obtaining the depth image data stream and the skeleton data stream;
a coordinate transformation unit, for automatically tracking the movement of the user's hand, converting depth and skeleton coordinates into two-dimensional coordinates, and mapping the position of the hand to a coordinate point on the screen;
an information extraction unit, for saving the information extracted from the depth image data stream and the skeleton data stream;
a hand state judging unit, for listening to the hand state in real time and judging it: when the hand is in the clenched-fist state, an input event is triggered and the input information is sent to the logic processing layer; when the Z coordinate of the hand changes while its X and Y coordinates change only slightly, a press-click event is triggered and the press-click information is sent to the logic processing layer.
In a preferred embodiment, the action information processing module specifically comprises:
an input information judging and processing unit, for judging the input information sent by the support layer: if the input information is gesture information from the browser interface, the gesture is mapped to an actual operation according to a conversion table and the operation result is sent to the interaction layer; if the input information is trajectory information from the input interface, a set of ink strokes is generated from the data recorded during the hand's movement, the strokes are analyzed, and the ink stroke information and the candidate word set are sent to the interaction layer;
a press-click information judging and processing unit, for judging the press-click information sent by the support layer: if the press-click information indicates pressing the zoom button, the forward button, the back button or the input-interface switch button on the browser interface, or pressing the Chinese/English toggle button, the recognize button or the browser-interface switch button on the input interface, the corresponding event is triggered, and that event in turn triggers another event which the interaction layer receives and responds to; if the press-click information indicates pressing the hover-click button on the browser interface, a timer is started and the position points of the hand are collected; when the timer reaches 1.5 seconds and all position points lie within a small range, a hover click is judged to have occurred, the mean of all position points is computed to determine the click target position, and the target position information is sent to the interaction layer.
In a preferred embodiment, the display module operates as follows:
when the interaction layer receives gesture information of waving up or down, the web page is scrolled up or down on the browser interface; when it receives ink stroke information, the handwriting is shown on, or cleared from, the input area of the drawing board on the input interface; when it receives a zoom button event, the web page is magnified on the browser interface; when it receives a forward or back button event, a page jump is performed on the browser interface; when it receives an input-interface switch event, the browser interface is switched to the input interface; when it receives a Chinese/English toggle event, the input interface switches between Chinese and English; when it receives a recognize button event, candidate words are generated on the input interface; when it receives a browser-interface switch event, the input interface is switched back to the browser interface; when it receives the target position information of a hover click, a mouse click event is triggered at the target position on the browser interface.
The other object of the present invention can be achieved by adopting the following technical solution:
A Kinect-based method for controlling a web browser, implemented through a three-layer architecture consisting of a support layer, a logic processing layer and an interaction layer, and specifically comprising:
receiving the raw data streams of the Kinect sensor at the support layer;
obtaining, at the support layer, relevant information about the user from the raw data streams of the Kinect sensor, triggering a corresponding event according to that information, and sending the action information corresponding to the event to the logic processing layer;
processing, at the logic processing layer, the action information sent by the support layer, and sending the processing result to the interaction layer;
displaying, at the interaction layer, the processing result sent by the logic processing layer on a browser interface or an input interface.
In a preferred embodiment, the raw data streams comprise a color image data stream, a depth image data stream and a skeleton data stream.
In a preferred embodiment, the step of obtaining relevant information about the user from the raw data streams of the Kinect sensor at the support layer, triggering a corresponding event according to that information, and sending the corresponding action information to the logic processing layer specifically comprises:
obtaining the depth image data stream and the skeleton data stream;
automatically tracking the movement of the user's hand, converting depth and skeleton coordinates into two-dimensional coordinates, and mapping the position of the hand to a coordinate point on the screen;
saving the information extracted from the depth image data stream and the skeleton data stream;
listening to the hand state in real time and judging it: when the hand is in the clenched-fist state, an input event is triggered and the input information is sent to the logic processing layer; when the Z coordinate of the hand changes while its X and Y coordinates change only slightly, a press-click event is triggered and the press-click information is sent to the logic processing layer.
In a preferred embodiment, the step of processing, at the logic processing layer, the action information sent by the support layer and sending the result to the interaction layer specifically comprises:
judging the input information sent by the support layer: if the input information is gesture information from the browser interface, the gesture is mapped to an actual operation according to a conversion table and the operation result is sent to the interaction layer; if the input information is trajectory information from the input interface, a set of ink strokes is generated from the data recorded during the hand's movement, the strokes are analyzed, and the ink stroke information and the candidate word set are sent to the interaction layer;
judging the press-click information sent by the support layer: if the press-click information indicates pressing the zoom button, the forward button, the back button or the input-interface switch button on the browser interface, or pressing the Chinese/English toggle button, the recognize button or the browser-interface switch button on the input interface, the corresponding event is triggered, and that event in turn triggers another event which the interaction layer receives and responds to; if the press-click information indicates pressing the hover-click button on the browser interface, a timer is started and the position points of the hand are collected; when the timer reaches 1.5 seconds and all position points lie within a small range, a hover click is judged to have occurred, the mean of all position points is computed to determine the click target position, and the target position information is sent to the interaction layer.
In a preferred embodiment, the step of displaying the processing result on the browser interface or input interface at the interaction layer is as follows:
when the interaction layer receives gesture information of waving up or down, the web page is scrolled up or down on the browser interface; when it receives ink stroke information, the handwriting is shown on, or cleared from, the input area of the drawing board on the input interface; when it receives a zoom button event, the web page is magnified on the browser interface; when it receives a forward or back button event, a page jump is performed on the browser interface; when it receives an input-interface switch event, the browser interface is switched to the input interface; when it receives a Chinese/English toggle event, the input interface switches between Chinese and English; when it receives a recognize button event, candidate words are generated on the input interface; when it receives a browser-interface switch event, the input interface is switched back to the browser interface; when it receives the target position information of a hover click, a mouse click event is triggered at the target position on the browser interface.
Compared with the prior art, the present invention has the following beneficial effects:
1. The system and method of the present invention provide both a browser interface and an input interface, so the user can enter text or browse web pages by gesture. The implementation is divided into three layers (support layer, logic processing layer and interaction layer), which gives low coupling, more stable operation and a clearer implementation path, and can be widely applied to web browser operation.
2. When the web browser is controlled through Kinect, the system and method of the present invention enable both basic control of the browser and input of text information by gestures. Basic control includes clicking links, magnifying the page, performing page jumps and scrolling pages, while text input supports both Chinese and English. This overcomes the shortcoming of existing systems for controlling a web browser through Kinect, which offer gesture control only and no gesture input.
Brief description of the drawings
Fig. 1 is a block diagram of the architecture of the system of the present invention.
Fig. 2 is a schematic diagram of the browser interface of the system of the present invention.
Fig. 3 is a schematic diagram of the input interface of the system of the present invention.
Fig. 4 is a flowchart of action information acquisition in the system of the present invention.
Fig. 5 is a flowchart of the hover-click implementation in the system of the present invention.
Fig. 6 is a flowchart of user operation of the system of the present invention.
Detailed description of embodiments
Embodiment 1:
As shown in Figure 1, the Kinect-based system for controlling a web browser of this embodiment is implemented as a three-layer architecture consisting of a support layer, a logic processing layer and an interaction layer. The support layer carries the data stream receiving module and the action information acquisition module; the logic processing layer carries the action information processing module; and the interaction layer carries the display module, which contains the browser interface and the input interface. The browser interface, shown in Figure 2, is provided with a menu bar containing five buttons: the zoom button, the forward button, the back button, the hover-click button and the input-interface switch button. The input interface, shown in Figure 3, mainly provides a drawing board for the user to write on, together with a menu bar containing three buttons: the Chinese/English toggle button, the recognize button and the browser-interface switch button. Both interfaces carry a Hand icon (the on-screen representation of the user's hand skeleton point) and a KinectUserViewer control (which displays the person's depth image from Kinect); these two controls give the user global feedback that the system is operating normally. The layers are implemented as follows:
1) Implementation of the support layer
The data stream receiving module receives the raw data streams of the Kinect sensor at the support layer. This module obtains the raw data streams directly through the Kinect SDK and performs relatively low-level application development on that basis. The data streams obtained from Kinect comprise the color image data stream, the skeleton data stream and the depth image data stream, all of which are composed of data frames; by setting up a buffer, the application can save each frame. There are two ways of obtaining data frames, the polling model and the event model. The system of this embodiment mainly uses the event model: the FrameReady event of a data stream is first registered, and when the event fires, the data frame is obtained from the event's FrameReadyEventArgs argument. The event model requires no additional checking or exception handling.
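The event model just described might be set up as in the following C# sketch against the Kinect for Windows SDK v1.x. This is a minimal illustration, not the patent's own code, and it assumes a single connected sensor.

```csharp
using System;
using Microsoft.Kinect;

class SupportLayer
{
    private KinectSensor sensor;
    private DepthImagePixel[] depthPixels;
    private Skeleton[] skeletons;

    public void Start()
    {
        // Take the first connected sensor (assumes one Kinect is plugged in).
        sensor = KinectSensor.KinectSensors[0];

        sensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
        sensor.SkeletonStream.Enable();

        // Buffers that each FrameReady handler fills in.
        depthPixels = new DepthImagePixel[sensor.DepthStream.FramePixelDataLength];
        skeletons = new Skeleton[sensor.SkeletonStream.FrameSkeletonArrayLength];

        // Event model: register FrameReady handlers instead of polling.
        sensor.DepthFrameReady += OnDepthFrameReady;
        sensor.SkeletonFrameReady += OnSkeletonFrameReady;

        sensor.Start();
    }

    private void OnDepthFrameReady(object s, DepthImageFrameReadyEventArgs e)
    {
        using (DepthImageFrame frame = e.OpenDepthImageFrame())
        {
            if (frame != null)
                frame.CopyDepthImagePixelDataTo(depthPixels); // buffer the frame
        }
    }

    private void OnSkeletonFrameReady(object s, SkeletonFrameReadyEventArgs e)
    {
        using (SkeletonFrame frame = e.OpenSkeletonFrame())
        {
            if (frame != null)
                frame.CopySkeletonDataTo(skeletons); // buffer the frame
        }
    }
}
```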
The action information acquisition module obtains relevant information about the user from the raw data streams of the Kinect sensor at the support layer, triggers the corresponding event according to that information, and sends the action information corresponding to the event (input information and press-click information) to the logic processing layer. The flow by which this module acquires action information is shown in Figure 4. The module specifically comprises a data stream acquiring unit, a coordinate transformation unit, an information extraction unit and a hand state judging unit, as follows:
The data stream acquiring unit obtains the depth image data stream and the skeleton data stream by registering the DepthFrameReady and SkeletonFrameReady events.
The coordinate transformation unit automatically tracks the movement of the user's hand, converts depth and skeleton coordinates into two-dimensional coordinates, and maps the position of the hand to a coordinate point on the screen.
The information extraction unit registers the InteractionFrameReady event and saves the information extracted from the depth image data stream and the skeleton data stream. This information includes a timestamp (the time at which each data frame was acquired), hand information (state and position) and a user ID (which user the information belongs to).
The hand state judging unit listens to the hand state in real time through the InteractionStream and judges it: when the hand is in the clenched-fist state, an input event is triggered and the input information is sent to the logic processing layer; when the Z coordinate of the hand changes while its X and Y coordinates change only slightly, a press-click event is triggered and the press-click information is sent to the logic processing layer. Because all action information has been stored, the logic processing layer can retrieve it from the InteractionStream whenever needed and process it accordingly.
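A sketch of how these units might be realized with the Kinect Toolkit's InteractionStream follows. It is an illustrative assumption rather than the patent's code: the grip state is taken from the hand pointer's event type, the press state from its IsPressed flag, and the hand position is mapped to screen coordinates through the SDK's CoordinateMapper. The IInteractionClient is supplied by the caller.

```csharp
using System;
using Microsoft.Kinect;
using Microsoft.Kinect.Toolkit.Interaction;

class HandTracker
{
    private readonly KinectSensor sensor;
    private readonly InteractionStream interactionStream;
    private bool isGripped; // tracked from Grip / GripRelease events

    public HandTracker(KinectSensor sensor, IInteractionClient client)
    {
        this.sensor = sensor;
        interactionStream = new InteractionStream(sensor, client);
        interactionStream.InteractionFrameReady += OnInteractionFrameReady;
        // Depth and skeleton frames must be fed in elsewhere via
        // interactionStream.ProcessDepth(...) and interactionStream.ProcessSkeleton(...).
    }

    private void OnInteractionFrameReady(object s, InteractionFrameReadyEventArgs e)
    {
        using (InteractionFrame frame = e.OpenInteractionFrame())
        {
            if (frame == null) return;
            var users = new UserInfo[InteractionFrame.UserInfoArrayLength];
            frame.CopyInteractionDataTo(users);

            foreach (UserInfo user in users)
            {
                foreach (InteractionHandPointer hand in user.HandPointers)
                {
                    if (!hand.IsTracked) continue;

                    // Clenched fist: grip marks the start of handwriting input.
                    if (hand.HandEventType == InteractionHandEventType.Grip)
                        isGripped = true;
                    else if (hand.HandEventType == InteractionHandEventType.GripRelease)
                        isGripped = false;

                    // Press: Z movement with little X/Y movement surfaces as IsPressed.
                    if (hand.IsPressed)
                        SendPressClick(hand.X, hand.Y); // forward to the logic layer
                    else if (isGripped)
                        SendInput(hand.X, hand.Y);      // forward a trajectory point
                }
            }
        }
    }

    // Maps a skeleton-space hand joint to a 640x480 depth-space point,
    // then scales it to the screen resolution.
    public System.Windows.Point MapHandToScreen(SkeletonPoint hand,
                                                double screenW, double screenH)
    {
        DepthImagePoint p = sensor.CoordinateMapper.MapSkeletonPointToDepthPoint(
            hand, DepthImageFormat.Resolution640x480Fps30);
        return new System.Windows.Point(p.X * screenW / 640.0, p.Y * screenH / 480.0);
    }

    private void SendInput(double x, double y) { /* hand off to the logic layer */ }
    private void SendPressClick(double x, double y) { /* hand off to the logic layer */ }
}
```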
2) Implementation of the logic processing layer
The action information processing module processes, at the logic processing layer, the action information sent by the support layer, and sends the processing result to the interaction layer. This module specifically comprises an input information judging and processing unit and a press-click information judging and processing unit, as follows:
The input information judging and processing unit judges the input information sent by the support layer. If the input information is gesture information from the browser interface, the gesture is mapped to an actual operation according to a conversion table, and the operation result is sent to the interaction layer. If the input information is trajectory information from the input interface, a set of ink strokes is generated from the data recorded during the hand's movement, the strokes are analyzed, and the ink stroke information and the candidate word set are sent to the interaction layer.
Processing the trajectory information requires the InkAnalyzer class, a class dedicated to analyzing ink strokes that provides functions such as topology analysis, writing/drawing classification and handwriting recognition. Adding and removing ink strokes uses the AddStroke and RemoveStroke methods, and the Analyze() method starts a synchronous ink analysis operation. Ink analysis covers topology analysis, writing/drawing classification and handwriting recognition, and the GetAlternates() method returns alternative analysis results. A very high recognition rate can be obtained by using the InkAnalyzer class.
When the support layer delivers input information, input has begun. Because the hand moves over the drawing board of the input interface and what is recorded is the hand's position point at each instant, a type conversion must be performed first, converting the Point data into a Stroke; the Stroke is then analyzed and recognized with the recognition function. This is described in pseudocode as follows (a C# rendering is sketched after the pseudocode):
1) Function Begin
2) if there is no input
3)     return an empty candidate set
4) else
5)     call the GetCombinedStore(strokes) function to obtain the ink strokes
6)     create a new InkAnalyzer object
7)     call the class method AddStroke to add the strokes to the object
8)     call the class method SetStrokeType to set the stroke type
9)     call the class method Analyze to start analyzing the strokes
10)    if the analysis succeeds
11)        return the candidate set
12)    else
13)        release all resources used by the InkAnalyzer
14)        return an empty candidate set
15)    endif
16) endif
17) Function End
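A runnable C# rendering of this pseudocode is sketched below, using the WPF ink analysis API (InkAnalyzer from IAWinFX.dll). It is a minimal sketch under stated assumptions: the patent's GetCombinedStore helper is replaced here by a hypothetical BuildStroke helper that converts the recorded hand points into a System.Windows.Ink.Stroke, and recognition results are read back through GetAlternates().

```csharp
using System.Collections.Generic;
using System.Windows;
using System.Windows.Ink;   // Stroke, plus the IAWinFX InkAnalyzer and analysis types
using System.Windows.Input; // StylusPoint, StylusPointCollection

static class HandwritingRecognizer
{
    // Hypothetical stand-in for the patent's GetCombinedStore(strokes):
    // converts the recorded hand position points into a single ink stroke.
    static Stroke BuildStroke(IEnumerable<Point> handPoints)
    {
        var stylusPoints = new StylusPointCollection();
        foreach (Point p in handPoints)
            stylusPoints.Add(new StylusPoint(p.X, p.Y));
        return new Stroke(stylusPoints);
    }

    public static List<string> Recognize(IList<Point> handPoints)
    {
        var candidates = new List<string>();
        if (handPoints == null || handPoints.Count == 0)
            return candidates;                      // no input: empty candidate set

        Stroke stroke = BuildStroke(handPoints);    // Point -> Stroke type conversion
        var analyzer = new InkAnalyzer();
        try
        {
            analyzer.AddStroke(stroke);                         // add the ink
            analyzer.SetStrokeType(stroke, StrokeType.Writing); // mark as handwriting
            AnalysisStatus status = analyzer.Analyze();         // synchronous analysis
            if (status.Successful)
                foreach (AnalysisAlternate alt in analyzer.GetAlternates())
                    candidates.Add(alt.RecognizedString);       // candidate words
        }
        finally
        {
            analyzer.Dispose(); // release all resources used by the InkAnalyzer
        }
        return candidates;
    }
}
```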
Each candidate word in the set is made into a KinectTileButton, and all the buttons are gathered in a ScrollViewer control; on the input interface the user drags the scroll bar and presses a button to select the desired character.
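For illustration, the candidate buttons could be generated as in the following sketch, which assumes a ScrollViewer hosting a horizontal StackPanel of KinectTileButtons; the helper and parameter names are assumptions, not the patent's identifiers.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Controls;
using Microsoft.Kinect.Toolkit.Controls;

static class CandidateBar
{
    // Fills a scrollable panel with one KinectTileButton per candidate word.
    public static void Populate(ScrollViewer scrollViewer, StackPanel panel,
                                IEnumerable<string> candidates,
                                Action<string> onSelected)
    {
        panel.Children.Clear();
        foreach (string word in candidates)
        {
            var button = new KinectTileButton { Label = word };
            button.Click += (s, e) => onSelected(word); // insert the chosen character
            panel.Children.Add(button);
        }
        scrollViewer.ScrollToHome(); // reset the drag position of the scroll bar
    }
}
```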
The press-click information judging and processing unit judges the press-click information sent by the support layer. If the press-click information indicates pressing the zoom button, the forward button, the back button or the input-interface switch button on the browser interface, or pressing the Chinese/English toggle button, the recognize button or the browser-interface switch button on the input interface (all of these being KinectTileButton controls supported by Kinect), the corresponding event is triggered, and that event in turn triggers another event which the interaction layer receives and responds to. If the press-click information indicates pressing the hover-click button on the browser interface, then, as shown in Figure 5, a timer is started and the position points of the hand are collected; when the timer reaches 1.5 seconds and all position points lie within a small range, a hover click is judged to have occurred, the mean of all position points is computed to determine the click target position, and the target position information is sent to the interaction layer. This hover-click mode is designed for clicking web page elements and the browser's function menu.
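The hover-click judgment described above might be realized as in the following sketch. The 1.5-second window and the mean-position calculation come from the patent; the 30-pixel radius used for "a small range" and the class and method names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows;

class HoverClickDetector
{
    private const double HoverSeconds = 1.5;  // dwell time from the patent
    private const double SmallRange = 30.0;   // assumed radius, in pixels
    private readonly List<Point> points = new List<Point>();
    private DateTime startTime;

    public void Begin() { points.Clear(); startTime = DateTime.Now; }

    // Feed one hand position per frame; returns the click target once the hand
    // has dwelled for 1.5 s inside a small range, otherwise null.
    public Point? Update(Point handPosition)
    {
        points.Add(handPosition);
        if ((DateTime.Now - startTime).TotalSeconds < HoverSeconds)
            return null;

        // The mean of all collected position points determines the target.
        var center = new Point(points.Average(p => p.X), points.Average(p => p.Y));

        // All points must lie within a small range around the center.
        bool steady = points.All(p => (p - center).Length <= SmallRange);

        Begin(); // restart the collection window either way
        return steady ? center : (Point?)null;
    }
}
```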
3) Implementation of the interaction layer
The display module displays, at the interaction layer, the processing result sent by the logic processing layer on the browser interface or the input interface, as follows:
when the interaction layer receives gesture information of waving up or down, the web page is scrolled up or down on the browser interface; when it receives ink stroke information, the handwriting is shown on, or cleared from, the input area of the drawing board on the input interface; when it receives a zoom button event, the web page is magnified on the browser interface; when it receives a forward or back button event, a page jump is performed on the browser interface; when it receives an input-interface switch event, the browser interface is switched to the input interface; when it receives a Chinese/English toggle event, the input interface switches between Chinese and English; when it receives a recognize button event, candidate words are generated on the input interface; when it receives a browser-interface switch event, the input interface is switched back to the browser interface; when it receives the target position information of a hover click, as shown in Figure 5, a mouse click event is triggered at the target position on the browser interface.
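One way to organize this dispatch at the interaction layer is a single handler that switches on the kind of result, as in the minimal sketch below; the ResultKind enumeration and the helper methods are illustrative assumptions rather than the patent's identifiers.

```csharp
using System.Windows;
using System.Windows.Controls;

enum ResultKind
{
    WaveUp, WaveDown, InkStrokes, Zoom, Forward, Back,
    SwitchToInput, ToggleLanguage, Recognize, SwitchToBrowser, HoverClick
}

class InteractionLayer
{
    private readonly WebBrowser browser = new WebBrowser();

    // Receives one processing result from the logic layer and updates the UI.
    public void OnResult(ResultKind kind, object payload)
    {
        switch (kind)
        {
            case ResultKind.WaveUp:   ScrollPage(-1); break;  // wave up: scroll up
            case ResultKind.WaveDown: ScrollPage(+1); break;  // wave down: scroll down
            case ResultKind.Forward:  if (browser.CanGoForward) browser.GoForward(); break;
            case ResultKind.Back:     if (browser.CanGoBack) browser.GoBack(); break;
            case ResultKind.HoverClick: ClickAt((Point)payload); break; // simulated click
            // Zoom, ink display, interface switching, language toggle and candidate
            // generation are dispatched to the corresponding views in the same way.
        }
    }

    private void ScrollPage(int direction) { /* scroll the page view */ }
    private void ClickAt(Point target) { /* raise a mouse click event at target */ }
}
```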
Regarding the design of the input interface, the core is to create a drawing panel on which the hand can write like a mouse. Kinect provides Kinect Interactions (gesture recognition): pushing the palm forward implements a button press (Push-to-Press), and a grip gesture implements functions such as panning (Grip-to-Pan). The prerequisite for using them is adding a KinectRegion container class to the interface. KinectRegion is the central element for Kinect Interaction in WPF: it is the container for the other Kinect interaction controls, and it is also responsible for displaying and moving the Hand icon, i.e. the on-screen representation of the user's hand skeleton point. An application may have several KinectRegions on its main interface, but KinectRegions cannot be nested, and each KinectRegion can have its own Kinect sensor object. The interaction layer therefore consists mainly of one KinectRegion container, on which the drawing board, the buttons and the other controls are placed.
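A minimal sketch of wiring up such a container with the Kinect Toolkit controls follows; the layout here is an assumption for illustration, with the KinectRegion bound to the sensor in code-behind.

```csharp
using System.Windows;
using System.Windows.Controls;
using Microsoft.Kinect;
using Microsoft.Kinect.Toolkit.Controls;

static class MainWindowSetup
{
    // Builds the interaction layer: one KinectRegion hosting the controls.
    public static KinectRegion BuildRegion(KinectSensor sensor, Window window)
    {
        var region = new KinectRegion();
        region.KinectSensor = sensor;   // each region may own its sensor object

        var root = new Grid();
        root.Children.Add(new KinectUserViewer()); // depth-image feedback control
        // The drawing board and menu-bar buttons are added here as children,
        // so the region can route hand-pointer interaction to them.
        region.Content = root;

        window.Content = region;        // regions must not be nested
        return region;
    }
}
```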
A traditional drawing board defines a region that receives and displays ink strokes; it is controlled by the InkCanvas class, on which the mouse can move and write. To bind the hand to the mouse and display it on the drawing board, support for Kinect must be added to the InkCanvas class, so a new KinectInkCanvas class is created: it inherits from InkCanvas and adds methods for recognizing hand motion. Among them, InitializeKinectInkCanvas is the core method; it hooks into the gesture operations owned by the KinectRegion container, so that the drawing board can also recognize gestures. In addition, when the hand state changes, the OnKinectRegionChanged method updates the data and synchronizes it to the drawing board. Through the KinectInkCanvas class, the drawing board can display and move the Hand icon, and, thanks to the drawing board's native capabilities, the hand can write on it just like a mouse.
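The subclassing idea might look like the following sketch. The member names InitializeKinectInkCanvas and OnKinectRegionChanged follow the description above, but their bodies here are illustrative assumptions: hand-pointer positions are simply converted into stylus-like points on the canvas while the fist is clenched.

```csharp
using System.Windows;
using System.Windows.Controls;
using System.Windows.Ink;
using System.Windows.Input;
using Microsoft.Kinect.Toolkit.Controls;

// InkCanvas extended so that a Kinect hand pointer can write like a mouse.
class KinectInkCanvas : InkCanvas
{
    private Stroke currentStroke; // the stroke being written while the fist is clenched

    public void InitializeKinectInkCanvas(KinectRegion region)
    {
        // The canvas must live inside the KinectRegion so hand-pointer events
        // reach it; register for per-element hand-pointer move events here.
        KinectRegion.AddHandPointerMoveHandler(this, OnKinectRegionChanged);
    }

    // Called as the hand moves; appends points while writing is in progress.
    private void OnKinectRegionChanged(object sender, HandPointerEventArgs e)
    {
        Point p = e.HandPointer.GetPosition(this);
        if (e.HandPointer.IsInGripInteraction) // clenched fist: keep writing
        {
            if (currentStroke == null)
            {
                currentStroke = new Stroke(new StylusPointCollection
                {
                    new StylusPoint(p.X, p.Y)
                });
                Strokes.Add(currentStroke);
            }
            else
            {
                currentStroke.StylusPoints.Add(new StylusPoint(p.X, p.Y));
            }
        }
        else
        {
            currentStroke = null; // fist released: the stroke is finished
        }
    }
}
```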
As shown in Fig. 2, Fig. 3 and Fig. 6, the user operation steps of the Kinect-based browser control system of this embodiment are as follows:
S1: connect the Kinect, and start the system and the browser;
S2: switch to the browser interface, press the hover-click button, and turn on hover-click mode;
S3: move the hand onto the address bar or search bar and hover for 1.5 seconds;
S4: once the input focus is obtained, switch to the input interface, select Chinese or English mode with the Chinese/English toggle button, and prepare for handwriting input;
S5: clench the fist and start writing; the motion track of the hand is recorded, and the input area 1 on the input interface shows the handwriting at the same time, until the fist is released;
S6: if rewriting is needed, clench the fist and draw a line to the right to clear the handwriting, then re-enter it; otherwise press the recognize button, and a candidate word region 2 containing several candidate words appears below the input interface, where the scroll bar 3 can be dragged and a candidate word button pressed to select the desired character;
S7: after selecting the desired character, switch to the browser interface if input is complete; otherwise return to step S5 and continue entering text;
S8: on the browser interface, enter the desired page (the page corresponding to the entered text) by hover-clicking. The operations now available are: hover-clicking the browser's menu items and page elements, waving up or down to scroll the page, magnifying the page with the zoom button, and jumping between pages with the forward and back buttons. Whenever text input is needed, hover for 1.5 seconds over the input field and enter text following steps S3 to S6. When the user no longer needs to browse, the system is shut down and the Kinect is disconnected.
Embodiment 2:
The Kinect-based method for controlling a web browser of this embodiment corresponds to the system of Embodiment 1 and is likewise implemented through the three-layer architecture of support layer, logic processing layer and interaction layer. It specifically comprises:
receiving the raw data streams of the Kinect sensor at the support layer, the raw data streams comprising a color image data stream, a depth image data stream and a skeleton data stream;
obtaining, at the support layer, relevant information about the user from the raw data streams of the Kinect sensor, triggering a corresponding event according to that information, and sending the action information corresponding to the event to the logic processing layer;
processing, at the logic processing layer, the action information sent by the support layer, and sending the processing result to the interaction layer;
displaying, at the interaction layer, the processing result sent by the logic processing layer on the browser interface or the input interface.
The step of obtaining relevant information about the user from the raw data streams of the Kinect sensor at the support layer, triggering a corresponding event according to that information, and sending the corresponding action information to the logic processing layer specifically comprises:
obtaining the depth image data stream and the skeleton data stream;
automatically tracking the movement of the user's hand, converting depth and skeleton coordinates into two-dimensional coordinates, and mapping the position of the hand to a coordinate point on the screen;
saving the information extracted from the depth image data stream and the skeleton data stream;
listening to the hand state in real time and judging it: when the hand is in the clenched-fist state, an input event is triggered and the input information is sent to the logic processing layer; when the Z coordinate of the hand changes while its X and Y coordinates change only slightly, a press-click event is triggered and the press-click information is sent to the logic processing layer.
The step of processing, at the logic processing layer, the action information sent by the support layer and sending the result to the interaction layer specifically comprises:
judging the input information sent by the support layer: if the input information is gesture information from the browser interface, the gesture is mapped to an actual operation according to a conversion table and the operation result is sent to the interaction layer; if the input information is trajectory information from the input interface, a set of ink strokes is generated from the data recorded during the hand's movement, the strokes are analyzed, and the ink stroke information and the candidate word set are sent to the interaction layer;
judging the press-click information sent by the support layer: if the press-click information indicates pressing the zoom button, the forward button, the back button or the input-interface switch button on the browser interface, or pressing the Chinese/English toggle button, the recognize button or the browser-interface switch button on the input interface, the corresponding event is triggered, and that event in turn triggers another event which the interaction layer receives and responds to; if the press-click information indicates pressing the hover-click button on the browser interface, a timer is started and the position points of the hand are collected; when the timer reaches 1.5 seconds and all position points lie within a small range, a hover click is judged to have occurred, the mean of all position points is computed to determine the click target position, and the target position information is sent to the interaction layer.
The step of displaying, at the interaction layer, the processing result sent by the logic processing layer on the browser interface or input interface is as follows:
when the interaction layer receives gesture information of waving up or down, the web page is scrolled up or down on the browser interface; when it receives ink stroke information, the handwriting is shown on, or cleared from, the input area of the drawing board on the input interface; when it receives a zoom button event, the web page is magnified on the browser interface; when it receives a forward or back button event, a page jump is performed on the browser interface; when it receives an input-interface switch event, the browser interface is switched to the input interface; when it receives a Chinese/English toggle event, the input interface switches between Chinese and English; when it receives a recognize button event, candidate words are generated on the input interface; when it receives a browser-interface switch event, the input interface is switched back to the browser interface; when it receives the target position information of a hover click, a mouse click event is triggered at the target position on the browser interface.
Those of ordinary skill in the art will appreciate that all or part of the steps of the system and method of the above embodiments can be accomplished by instructing the relevant hardware through a program, and the corresponding program can be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disc.
In summary, the system and method of the present invention provide both a browser interface and an input interface, so the user can enter text or browse web pages by gesture. The implementation is divided into three layers (support layer, logic processing layer and interaction layer), which gives low coupling, more stable operation and a clearer implementation path, and can be widely applied to web browser operation.
The above is only a preferred embodiment of the present patent, but the scope of protection of the present patent is not limited thereto. Any equivalent replacement or modification of the patented technical solution and its inventive concept, made by anyone familiar with the art within the scope disclosed by the present patent, falls within the scope of protection of the present patent.

Claims (10)

1. A Kinect-based system for controlling a web browser, characterized in that the system is implemented as a three-layer architecture consisting of a support layer, a logic processing layer and an interaction layer, and specifically comprises:
a data stream receiving module, for receiving the raw data streams of the Kinect sensor at the support layer;
an action information acquisition module, for obtaining relevant information about the user from the raw data streams of the Kinect sensor at the support layer, triggering a corresponding event according to that information, and sending the action information corresponding to the event to the logic processing layer;
an action information processing module, for processing, at the logic processing layer, the action information sent by the support layer, and sending the processing result to the interaction layer;
a display module, for displaying, at the interaction layer, the processing result sent by the logic processing layer on a browser interface or an input interface.
2. The Kinect-based system for controlling a web browser according to claim 1, characterized in that the raw data streams received by the data stream receiving module comprise a color image data stream, a depth image data stream and a skeleton data stream.
3. The Kinect-based system for controlling a web browser according to claim 2, characterized in that the action information acquisition module specifically comprises:
a data stream acquiring unit, for obtaining the depth image data stream and the skeleton data stream;
a coordinate transformation unit, for automatically tracking the movement of the user's hand, converting depth and skeleton coordinates into two-dimensional coordinates, and mapping the position of the hand to a coordinate point on the screen;
an information extraction unit, for saving the information extracted from the depth image data stream and the skeleton data stream;
a hand state judging unit, for listening to the hand state in real time and judging it: when the hand is in the clenched-fist state, an input event is triggered and the input information is sent to the logic processing layer; when the Z coordinate of the hand changes while its X and Y coordinates change only slightly, a press-click event is triggered and the press-click information is sent to the logic processing layer.
4. The Kinect-based system for controlling a web browser according to claim 3, characterized in that the action information processing module specifically comprises:
an input information judging and processing unit, for judging the input information sent by the support layer: if the input information is gesture information from the browser interface, the gesture is mapped to an actual operation according to a conversion table and the operation result is sent to the interaction layer; if the input information is trajectory information from the input interface, a set of ink strokes is generated from the data recorded during the hand's movement, the strokes are analyzed, and the ink stroke information and the candidate word set are sent to the interaction layer;
a press-click information judging and processing unit, for judging the press-click information sent by the support layer: if the press-click information indicates pressing the zoom button, the forward button, the back button or the input-interface switch button on the browser interface, or pressing the Chinese/English toggle button, the recognize button or the browser-interface switch button on the input interface, the corresponding event is triggered, and that event in turn triggers another event which the interaction layer receives and responds to; if the press-click information indicates pressing the hover-click button on the browser interface, a timer is started and the position points of the hand are collected; when the timer reaches 1.5 seconds and all position points lie within a small range, a hover click is judged to have occurred, the mean of all position points is computed to determine the click target position, and the target position information is sent to the interaction layer.
5. The Kinect-based system for controlling a web browser according to claim 4, characterized in that the display module operates as follows:
when the interaction layer receives gesture information of waving up or down, the web page is scrolled up or down on the browser interface; when it receives ink stroke information, the handwriting is shown on, or cleared from, the input area of the drawing board on the input interface; when it receives a zoom button event, the web page is magnified on the browser interface; when it receives a forward or back button event, a page jump is performed on the browser interface; when it receives an input-interface switch event, the browser interface is switched to the input interface; when it receives a Chinese/English toggle event, the input interface switches between Chinese and English; when it receives a recognize button event, candidate words are generated on the input interface; when it receives a browser-interface switch event, the input interface is switched back to the browser interface; when it receives the target position information of a hover click, a mouse click event is triggered at the target position on the browser interface.
6. A Kinect-based method for controlling a web browser, characterized in that the method is implemented through a three-layer architecture consisting of a support layer, a logic processing layer and an interaction layer, and specifically comprises:
receiving the raw data streams of the Kinect sensor at the support layer;
obtaining, at the support layer, relevant information about the user from the raw data streams of the Kinect sensor, triggering a corresponding event according to that information, and sending the action information corresponding to the event to the logic processing layer;
processing, at the logic processing layer, the action information sent by the support layer, and sending the processing result to the interaction layer;
displaying, at the interaction layer, the processing result sent by the logic processing layer on a browser interface or an input interface.
7. The Kinect-based method for controlling a web browser according to claim 6, characterized in that the raw data streams comprise a color image data stream, a depth image data stream and a skeleton data stream.
8. The Kinect-based method for controlling a web browser according to claim 7, characterized in that the step of obtaining relevant information about the user from the raw data streams of the Kinect sensor at the support layer, triggering a corresponding event according to that information, and sending the corresponding action information to the logic processing layer specifically comprises:
obtaining the depth image data stream and the skeleton data stream;
automatically tracking the movement of the user's hand, converting depth and skeleton coordinates into two-dimensional coordinates, and mapping the position of the hand to a coordinate point on the screen;
saving the information extracted from the depth image data stream and the skeleton data stream;
listening to the hand state in real time and judging it: when the hand is in the clenched-fist state, an input event is triggered and the input information is sent to the logic processing layer; when the Z coordinate of the hand changes while its X and Y coordinates change only slightly, a press-click event is triggered and the press-click information is sent to the logic processing layer.
9. The Kinect-based method for controlling a web browser according to claim 8, characterized in that the step of processing, at the logic processing layer, the action information sent by the support layer and sending the result to the interaction layer specifically comprises:
judging the input information sent by the support layer: if the input information is gesture information from the browser interface, the gesture is mapped to an actual operation according to a conversion table and the operation result is sent to the interaction layer; if the input information is trajectory information from the input interface, a set of ink strokes is generated from the data recorded during the hand's movement, the strokes are analyzed, and the ink stroke information and the candidate word set are sent to the interaction layer;
judging the press-click information sent by the support layer: if the press-click information indicates pressing the zoom button, the forward button, the back button or the input-interface switch button on the browser interface, or pressing the Chinese/English toggle button, the recognize button or the browser-interface switch button on the input interface, the corresponding event is triggered, and that event in turn triggers another event which the interaction layer receives and responds to; if the press-click information indicates pressing the hover-click button on the browser interface, a timer is started and the position points of the hand are collected; when the timer reaches 1.5 seconds and all position points lie within a small range, a hover click is judged to have occurred, the mean of all position points is computed to determine the click target position, and the target position information is sent to the interaction layer.
10. The Kinect-based method for controlling a web browser according to claim 9, characterized in that the step of displaying, at the interaction layer, the processing result sent by the logic processing layer on the browser interface or input interface is as follows:
when the interaction layer receives gesture information of waving up or down, the web page is scrolled up or down on the browser interface; when it receives ink stroke information, the handwriting is shown on, or cleared from, the input area of the drawing board on the input interface; when it receives a zoom button event, the web page is magnified on the browser interface; when it receives a forward or back button event, a page jump is performed on the browser interface; when it receives an input-interface switch event, the browser interface is switched to the input interface; when it receives a Chinese/English toggle event, the input interface switches between Chinese and English; when it receives a recognize button event, candidate words are generated on the input interface; when it receives a browser-interface switch event, the input interface is switched back to the browser interface; when it receives the target position information of a hover click, a mouse click event is triggered at the target position on the browser interface.
Priority Applications (1)

Application CN201410283898.8A — "Kinect-based system and method for controlling network browser" — filed 2014-06-23 by Jinan University (priority date 2014-06-23); status: pending.

Publications (1)

CN104020853A — published 2014-09-03. Family ID: 51437646.


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number — Priority date / Publication date — Assignee — Title

US20090217211A1 * — 2008-02-27 / 2009-08-27 — Gesturetek, Inc. — Enhanced input using recognized gestures
CN103270474A * — 2010-12-23 / 2013-08-28 — 英特尔公司 — Method, apparatus and system for interacting with content on web browsers
CN102520793A * — 2011-11-30 / 2012-06-27 — 苏州奇可思信息科技有限公司 — Gesture identification-based conference presentation interaction method
CN102945079A * — 2012-11-16 / 2013-02-27 — 武汉大学 — Intelligent recognition and control-based stereographic projection system and method
CN103226388A * — 2013-04-07 / 2013-07-31 — 华南理工大学 — Kinect-based handwriting method
CN103853464A * — 2014-04-01 / 2014-06-11 — 郑州捷安高科股份有限公司 — Kinect-based railway hand signal identification method

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number — Priority date / Publication date — Assignee — Title

CN104589356A * — 2014-11-27 / 2015-05-06 — 北京工业大学 — Dexterous hand teleoperation control method based on Kinect human hand motion capturing
CN104484035A * — 2014-12-04 / 2015-04-01 — 北京百度网讯科技有限公司 — Control method, device and system based on somatosensory equipment
CN105787485A * — 2014-12-25 / 2016-07-20 — 联想(北京)有限公司 — Identification clicking operation device and identification clicking operation method
CN105787485B * — 2014-12-25 / 2019-11-26 — 联想(北京)有限公司 — The device and method for identifying clicking operation
CN105872691A * — 2015-12-14 / 2016-08-17 — 乐视致新电子科技(天津)有限公司 — Method and device for controlling browser
CN107423392A * — 2017-07-24 / 2017-12-01 — 上海明数数字出版科技有限公司 — Word, dictionaries query method, system and device based on AR technologies


Legal Events

C06 / PB01 — Publication (application publication date: 2014-09-03)
C10 / SE01 — Entry into substantive examination (entry into force of request for substantive examination)
WD01 — Invention patent application deemed withdrawn after publication