CN104199834A - Method and system for interactively obtaining and outputting remote resources on surface of information carrier - Google Patents

Method and system for interactively obtaining and outputting remote resources on surface of information carrier

Info

Publication number
CN104199834A
CN104199834A CN201410377980.7A
Authority
CN
China
Prior art keywords
indication
finger
instruction
user
steps
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410377980.7A
Other languages
Chinese (zh)
Other versions
CN104199834B (en)
Inventor
徐明
徐颢毓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201410377980.7A
Publication of CN104199834A
Application granted
Publication of CN104199834B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9554 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL], by using bar codes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/23 Image preprocessing by selection of a specific region based on positionally close patterns or neighbourhood relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to video tracking and image recognition technology, and provides a method and system for interactively obtaining and outputting remote resources from the surface of an information carrier. The method comprises the following steps: A, a camera at the user side performs target recognition on a pointer, tracks the movement track and pauses of the pointer tip, and derives the user's pointing intention and the indicated area; B, the bitmap data of the image block in the indicated area is extracted according to the pointing intention, and the content information it contains is recognized; C, a retrieval expression is generated from the content information and sent to a remote server; D, the remote server uses the retrieval expression to retrieve the qualifying multimedia resources from a specialized knowledge base and sends them to the user side; E, the user side outputs the multimedia resources and prepares for the next user interaction. With this method, the content of an indicated area on the surface of a target information carrier can be recognized, and associated resources can be retrieved and output.

Description

Method and system for interactively obtaining and outputting remote resources from the surface of an information carrier
Technical field
The present invention relates to video and image acquisition and recognition technology, and more particularly to a method and system for interactively obtaining and outputting remote resources from the surface of an information carrier.
Background art
Point-and-read machines and talking pens, which have appeared in recent years, are well-received electronic products for children's education. The point-and-read machine, also called an intelligent book-reading computer, interactive English electronic textbook, synchronized reading machine or electronic textbook, is an audio-interactive learning product: by means of electromagnetic positioning technology, it turns printed textbooks into audio teaching material that can speak wherever the learner points. In 1999, the US company LeapFrog, building on the rules by which children learn language, developed the earliest point-and-read products, which quickly became an indispensable tool for American children at the beginning stage of language learning and later became popular in Japan, Singapore and the countries of Southeast Asia. In 2001 the concept and technology of the point-and-read machine were introduced into China, and domestic companies such as Yingyipai and BBK, among many others, entered the industry one after another. Through the participation and effort of these companies, the point-and-read machine has evolved from a single panel to a folding two-panel design, from wired to wireless, from small-capacity to large-capacity storage, from RS232 serial-port download to USB download to no download at all, from dedicated speech-compression chips to general-purpose MP3 compression, from optical panels to panels with additional printed content, and in form factor from a drawer type to an integrally moulded folding design.
The basic principle of the point-and-read machine is as follows: when the pronunciation files are produced, each file is assigned in advance to the latitude and longitude position of the corresponding text content. Not every textbook can be read aloud; only textbooks whose audio file resources have been produced in advance and stored in the machine's memory can be used with the point-and-read machine. In use, the textbook must be placed at the correct position on the machine's tablet. The user first selects the book and the page number with a special pen, and then taps the text at position (X, Y) on the textbook page; the tablet senses the pen tapping at point (X, Y), receives the instruction, reads the audio file corresponding to that point, and the machine plays it. In the point-and-read learning mode, touching a word or picture in the teaching material with the talking pen starts an interactive learning session: the user can listen to an explanation of the current page, read after the recording, repeat it, or record and compare his or her own pronunciation.
In recent years an improved talking pen has appeared, which replaces the "latitude-longitude coordinate tablet" with a transparent "latitude-longitude coordinate plastic film". This improvement moves the coordinate-recognition work from the tablet into the talking pen itself: in use, the film only needs to be laid over a page of the book and the corresponding page number clicked, and the page can then be read aloud.
The working principle of the newest talking-pen learning systems is as follows: two-dimensional codes are printed on the books in advance, or invisible magnetically inductive material is added to the inner layer of the paper; at the same time, audio files matching the books are produced and stored in a memory chip inside the talking pen. The nib can identify the information carried by the two-dimensional code printed on the page or by the magnetic material. When the user selects a page and clicks on a specific picture, word or number on it, the pen either recognizes the book's two-dimensional code through a camera mounted in the nib, or reads out the index information contained in the paper's magnetic material through an electromagnetic induction head in the nib, and thereby locates the corresponding audio file in the memory chip for study.
Whether point-and-read machine or talking pen, these products share common features: first, they require a special tablet or film containing latitude-longitude coordinates, or books printed in advance with a special process; second, the learning resources are fixedly matched to particular books and stored in a memory chip on the machine or pen. These features make traditional point-and-read products expensive and inconvenient to carry; the stored learning content is tied to specific books and hard to extend, so the range of usable material is restricted. For the text and graphic information in the vast number of ordinary books, point-and-read products cannot work at all.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and system for interactively obtaining and outputting remote resources from the surface of an information carrier, aiming to solve the problems of existing point-and-read products: high technical cost and a narrow range of application.
The present invention is achieved as follows: a method for interactively obtaining and outputting remote resources from the surface of an information carrier comprises the following steps (a sketch of this loop follows the steps):
Step A: the user side performs target recognition on a pointer (the user's finger or a pen-style pointer) entering the field of view of its camera, tracks the movement track and pause positions of the pointer tip, and derives the user's pointing intention and the indicated area from the movement track and pause positions; the camera of the user side is located above the surface of the target information carrier and photographs the carrier;
Step B: according to the user's pointing intention, the user side extracts the bitmap data of the image block in the indicated area and recognizes the content information it contains;
Step C: according to the user's pointing intention, the user side converts the content information into a current classification code or current search terms; if the current search terms have been updated, it generates a retrieval expression and sends the retrieval expression to a remote server, otherwise it returns to step A;
Step D: the remote server uses the retrieval expression to retrieve the qualifying multimedia resources from a specialized knowledge base and sends them to the user side;
Step E: the user side outputs the received multimedia resources and then returns to step A to prepare for the next user interaction.
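To make the interaction loop of steps A-E concrete, the following Python sketch shows one way the user-side control flow could be organized. Every helper name (track_pointer, recognize_content, segment_words, build_expression, present) is a hypothetical placeholder standing in for the corresponding step; none of them comes from the patent itself.

```python
# Hypothetical sketch of the client loop for steps A-E; every helper
# called here is an assumed placeholder, not part of the patent text.

def client_loop(camera, server):
    current_class_code = None
    current_terms = []
    while True:
        # Step A: track the pointer tip and wait for a complete gesture
        intention, region = track_pointer(camera)   # e.g. ("text", bbox)
        # Step B: crop the indicated image block and recognize its content
        frame = camera.last_frame()
        bitmap = frame[region.y0:region.y1, region.x0:region.x1]
        content = recognize_content(bitmap, intention)
        # Step C: turn the content into a classification code or search terms
        if intention in ("icon", "code"):
            current_class_code, current_terms = content, []
        else:  # text indication: segment into words/phrases
            current_terms = segment_words(content)
        if current_terms:
            expr = build_expression(current_class_code, current_terms)
            # Steps D-E: the server searches the knowledge base; the
            # client displays or plays whatever comes back
            for resource in server.search(expr):
                present(resource)
```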
Further, step A specifically comprises the following steps:
Step A1: the camera of the user side captures the current video image frame;
Step A2: the user side reads, pixel by pixel, the RGB colour data of the current video image frame captured in step A1, computes for each pixel, against the pointer appearance-colour Gaussian mixture model established in advance, the probability that the pixel belongs to that model, and judges whether the pixel matches the appearance colour of the pointer;
Step A3: step A2 is repeated until every pixel in the current video image frame has been processed; all pixels matching the pointer's appearance colour are then selected to obtain the foreground image of the current frame; the foreground image is filtered to remove noise, contour detection based on mathematical morphology is performed, the largest connected region among the detected contours is chosen as the pointer's contour, and the position of the pointer tip is searched for within that contour;
Step A4: based on the pointer-tip position extracted from the current video frame in step A3, trajectory analysis and pointing-intention understanding are performed on the tip; if the user's pointing intention and indicated area cannot yet be determined, the method returns to step A1 and continues.
Further, the pointer appearance-colour Gaussian mixture models in step A2 are divided into a finger skin-colour model and a pen-style pointer colour model;
Both pointer appearance-colour models are built by applying Gaussian-mixture modelling in the CrCgCb colour space, mixing several single Gaussian distributions; the probability density function $G(x)$ of the model is computed as the weighted mixture $G(x) = \sum_{j=1}^{M} \alpha_j P_j(c, \mu_j, \Sigma_j)$, with $\sum_{j=1}^{M} \alpha_j = 1$, where $M$ is the number of single Gaussian distributions contained in the model, $\alpha_j$ is the mixture weight of the probability density function of each single Gaussian, and $P_j(c, \mu_j, \Sigma_j)$ is defined as $P(c, \mu, \Sigma) = \frac{1}{(2\pi)^{3/2}|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(c-\mu)^T \Sigma^{-1} (c-\mu)\right]$, where $T$ denotes matrix transposition, $c = [c_r, c_g, c_b]^T$ is the three-component CrCgCb colour column vector of the pixel under evaluation, $\mu$ is the model mean and $\Sigma$ the model covariance; $\mu$ and $\Sigma$ are estimated from the CrCgCb feature column vectors $c_i$ of a number of training-sample pixels as the mean vector $\mu = \frac{1}{n}\sum_{i=1}^{n} c_i$ and covariance matrix $\Sigma = \frac{1}{n}\sum_{i=1}^{n} (c_i - \mu)(c_i - \mu)^T$, where $n$ is the number of training samples;
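A minimal sketch of this per-pixel mixture evaluation, assuming the mixture parameters (weights, means, covariances) have already been trained; the decision threshold and array layout are illustrative assumptions, not values from the patent.

```python
import numpy as np

def gmm_pdf(c, weights, means, covs):
    """Weighted mixture G(x) of single 3-D Gaussians over a colour vector c."""
    p = 0.0
    for a, mu, cov in zip(weights, means, covs):
        d = c - mu
        norm = ((2 * np.pi) ** 1.5) * np.sqrt(np.linalg.det(cov))
        p += a * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d) / norm
    return p

def foreground_mask(frame_crcgcb, weights, means, covs, threshold=1e-4):
    """Mark every pixel whose mixture probability exceeds an assumed threshold."""
    h, w, _ = frame_crcgcb.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            mask[y, x] = gmm_pdf(frame_crcgcb[y, x], weights, means, covs) > threshold
    return mask
```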
The finger skin-colour model is trained according to the following steps G01-G02 to obtain the model's computation parameters:
Step G01: the RGB values of finger-skin pixels collected in advance from test subjects of different sexes and ages, under several illumination conditions and with cameras of different models, are used as observation sample values; the expectation-maximization algorithm is applied for maximum-likelihood estimation to determine the computation parameters of the Gaussian-mixture probability density function of the finger skin-colour model;
Step G02: before the user uses step A, RGB data of the user's own finger skin colour are collected under the user's current working environment; these values are used as new observation samples, the expectation-maximization algorithm is applied again for maximum-likelihood estimation, the parameters of the Gaussian-mixture probability density function determined in step G01 are re-estimated, and the parameters of the finger skin-colour model are updated with the result of the re-estimation training;
The pen-style pointer colour model is trained according to the following steps G11-G12 to obtain the model's computation parameters:
Step G11: the RGB values of pen-style pointer appearance-colour pixels collected in advance under several illumination conditions and with cameras of different models are used as observation sample values; the expectation-maximization algorithm is applied for maximum-likelihood estimation to determine the computation parameters of the Gaussian-mixture probability density function of the pen-style pointer colour model;
Step G12: before the user uses step A, RGB data of the pen-style pointer's appearance colour are collected under the user's current working environment; these values are used as new observation samples, the expectation-maximization algorithm is applied again for maximum-likelihood estimation, the parameters determined in step G11 are re-estimated, and the parameters of the pen-style pointer colour model are updated with the result of the re-estimation training.
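One way to realize the EM training and user-specific re-estimation of steps G01-G02/G11-G12, sketched here with scikit-learn's GaussianMixture standing in for the patent's EM procedure; the component count and the warm-started re-fit are assumptions the text does not specify.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_colour_model(sample_pixels, n_components=3):
    """Fit a Gaussian mixture to (n, 3) colour samples with EM (steps G01/G11)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(sample_pixels)
    return gmm

def reestimate_for_user(gmm, user_pixels):
    """Re-estimate the parameters from the user's own samples (steps G02/G12),
    starting EM from the previously trained parameters."""
    refined = GaussianMixture(
        n_components=gmm.n_components,
        covariance_type="full",
        means_init=gmm.means_,
        weights_init=gmm.weights_,
        precisions_init=np.linalg.inv(gmm.covariances_),
    )
    refined.fit(user_pixels)
    return refined
```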
Further, the pointer-contour processing in step A3 is divided into finger-contour processing and pen-style pointer contour processing;
The finger-contour processing, which determines the fingertip pointing position, specifically comprises the following steps A301-A304 (a sketch of steps A301-A302 follows these steps):
Step A301: a template subimage is defined, and erosion is applied to the binarized finger-contour image;
Step A302: the processed finger contour is projected onto the horizontal and vertical coordinate axes; scanning from top to bottom and from left to right, the place where the projection value changes significantly is located and taken as the rough fingertip position, and a rough search window is constructed around that position;
Step A303: a single-layer functional-link neural network is used to predict the fingertip position and determine a precise search window, computed as $Y = [X \mid f(XW_h + \beta_h)]W$, where $X$ is the input vector, $\beta_h$ the bias matrix, $W_h$ the weight matrix from the input layer to the hidden layer, and $W$ the pre-trained weight matrix; in the training strategy, the weight matrices are trained on left-to-right horizontal strokes, right-to-left horizontal strokes, clockwise circles, counter-clockwise circles, clockwise rectangles and counter-clockwise rectangles;
Step A304: fingertip detection is performed by template matching: several fingertip templates are defined, and within the precise search window the absolute-value distance between the subimage to be matched and each fingertip template is computed to obtain the exact fingertip position;
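A sketch of the rough localization in steps A301-A302, using OpenCV erosion and axis projections; the kernel size, scan direction and jump threshold are illustrative assumptions rather than values from the patent.

```python
import cv2
import numpy as np

def rough_fingertip(binary_contour_img, jump_ratio=0.5):
    """Erode the binarized contour (A301), then scan the row/column
    projections for the first significant jump (A302)."""
    kernel = np.ones((3, 3), np.uint8)          # assumed template subimage
    eroded = cv2.erode(binary_contour_img, kernel)
    rows = eroded.sum(axis=1)                   # vertical (row) projection
    cols = eroded.sum(axis=0)                   # horizontal (column) projection
    # first row from the top / column from the left where the projection
    # jumps above an assumed fraction of its maximum
    y = next(i for i in range(1, len(rows)) if rows[i] > jump_ratio * rows.max())
    x = next(j for j in range(1, len(cols)) if cols[j] > jump_ratio * cols.max())
    return x, y                                 # centre of the rough search window
```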
The pen-style pointer contour processing, which determines the tip position of the pen-style pointer, specifically comprises the following steps A311-A312:
Step A311: connected-component processing is applied to the binarized pen-pointer contour image, and the centre of gravity of the connected graph is computed;
Step A312: taking the centre of gravity of the connected graph as the search centre, and searching around that centre in order from the upper left, through the position directly above, to the upper right, the Euclidean distance from the search centre to each pixel on the contour is computed in turn; the largest of these Euclidean distances is taken, and the corresponding contour point along this maximum-distance path is used as the final position of the pen-style pointer tip.
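The centroid-and-farthest-point search of steps A311-A312 might look as follows; restricting candidates to the upper half-plane stands in for the upper-left-to-upper-right search order and is an assumption.

```python
import numpy as np

def pen_tip(contour_points):
    """contour_points: (n, 2) array of (x, y) pixels on the pen contour.
    Returns the contour point farthest from the centroid, searched over
    the upward directions only (image y grows downward)."""
    centroid = contour_points.mean(axis=0)                 # step A311
    above = contour_points[contour_points[:, 1] <= centroid[1]]
    d = np.linalg.norm(above - centroid, axis=1)           # step A312
    return tuple(above[np.argmax(d)])
```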
Further, step A4 specifically comprises the following steps:
Step A401: the pointer-tip position extracted from the current video frame in step A3 is obtained and compared with the tip positions in several preceding frames, and the moving speed and direction are computed;
Step A402: the tip moving speed in each frame is analysed in temporal order; when an obvious pause is detected for the first time, the position coordinates at which the pause occurs are taken as the starting indication position;
Step A403: after the starting indication position has been recognized, when another marked halt in the tip's motion is detected, the position coordinates of that pause are taken as the end indication position; if no end indication position has been detected yet, execution returns to step A1;
Step A404: the tip's movement track between the starting and end indication positions is analysed (see the sketch after these steps): if the tip is detected to move in an approximately straight line from left to right or from right to left, the user's intention is a text indication, and the indicated area is the image block occupied by the line of text above the tip's track; if the tip is detected to make a roughly circular closed or nearly closed motion, the user's intention is an icon indication, and the indicated area is the image block contained in the bounding rectangle of the tip's circular track; if the tip is detected to make a roughly rectangular closed or nearly closed motion, the user's intention is a graphic-code indication, and the indicated area is the image block enclosed by the tip's rectangular track.
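One plausible way to discriminate the three gesture types of step A404 from the recorded tip track; the straightness and closure thresholds, and the area test used to tell circles from rectangles, are invented for illustration.

```python
import numpy as np

def classify_gesture(track):
    """track: (n, 2) array of tip positions between the two pauses.
    Returns 'text', 'icon' or 'code' per the step A404 criteria."""
    track = np.asarray(track, dtype=float)
    span = np.linalg.norm(track[-1] - track[0])
    length = np.linalg.norm(np.diff(track, axis=0), axis=1).sum()
    if span > 0.8 * length:                    # nearly straight horizontal stroke
        return "text"
    if span < 0.2 * length:                    # closed or nearly closed loop
        # a circle encloses ~pi/4 of its bounding box, a rectangle ~all of it;
        # compare the polygon (shoelace) area with the bounding-box area
        x, y = track[:, 0], track[:, 1]
        area = 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
        bbox = (x.max() - x.min()) * (y.max() - y.min())
        return "icon" if area < 0.85 * bbox else "code"
    return "unknown"
```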
Further, the image-content recognition method of step B specifically comprises (a sketch follows):
if the user's pointing intention discriminated in step A is a text indication, an OCR text-recognition method is selected to perform content recognition on the image-block bitmap data, obtaining the text and character-string information it contains;
if the user's pointing intention discriminated in step A is an icon indication, an image-recognition method is selected that uses the feature templates of each icon in a preset icon library to perform content recognition on the image-block bitmap data, obtaining the character-string information of the icon's index in the icon library;
if the user's pointing intention discriminated in step A is a graphic-code indication, two-dimensional-code and barcode recognition methods are selected to perform content recognition on the image-block bitmap data, obtaining the character-string information they contain.
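A sketch of the step B dispatch, standing in pytesseract for the OCR method, OpenCV template matching for the icon library, and pyzbar for the QR/barcode decoder; these particular libraries are assumptions, as the patent names no implementation, and the icon templates are assumed to be no larger than the image block.

```python
import cv2
import pytesseract
from pyzbar import pyzbar

def recognize_content(bitmap, intention, icon_library):
    """icon_library: dict mapping icon index strings to template images."""
    if intention == "text":
        return pytesseract.image_to_string(bitmap)          # OCR branch
    if intention == "icon":
        # pick the icon template with the smallest matching distance
        scores = {idx: cv2.matchTemplate(bitmap, tmpl, cv2.TM_SQDIFF_NORMED).min()
                  for idx, tmpl in icon_library.items()}
        return min(scores, key=scores.get)                  # icon index string
    if intention == "code":
        decoded = pyzbar.decode(bitmap)                     # QR and barcodes
        return decoded[0].data.decode("utf-8") if decoded else ""
```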
Further, the process of generating the retrieval expression in step C specifically comprises the following steps (a sketch follows this list):
Step C1: according to the user's pointing intention discriminated in step A, the content information recognized in step B is converted into retrieval condition items, specifically:
if the user's intention is an icon indication or a graphic-code indication, the content information recognized in step B is taken as the current classification code, and all current search terms are cleared;
if the user's intention is a text indication, word segmentation is applied to the content information recognized in step B, and the words or phrases extracted from it become the current search terms;
Step C2: if the current search terms have been updated, the current classification code and the current search terms are combined logically to generate the retrieval expression.
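A sketch of steps C1-C2; jieba is used here as a stand-in Chinese word segmenter, and the AND/OR combination rule is an assumed example of the "logical combination" the text leaves open.

```python
import jieba

def build_expression(state, intention, content):
    """state: dict holding the current classification code and search terms."""
    if intention in ("icon", "code"):           # step C1, first branch
        state["class_code"], state["terms"] = content.strip(), []
        return None                             # terms cleared: no new query yet
    new_terms = [w for w in jieba.cut(content) if w.strip()]   # text branch
    if new_terms == state["terms"]:
        return None                             # step C2 guard: terms not updated
    state["terms"] = new_terms
    clause = " OR ".join(state["terms"])
    if state.get("class_code"):                 # step C2: logical combination
        return f'class:{state["class_code"]} AND ({clause})'
    return clause
```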
The present invention also provides a system for interactively obtaining and outputting remote resources from the surface of an information carrier, comprising: a pointer-type setting module, a video-capture and target-recognition module, a pointer appearance-colour Gaussian-mixture model library, a trajectory-analysis module, an image-block extraction and content-recognition module, a retrieval-condition generation module, a network transfer module, a remote-resource retrieval module, a specialized knowledge base, and an information display or playback module;
the pointer-type setting module is connected with the video-capture and target-recognition module and is used to select the pointer type currently used by the user, the pointer types comprising fingers of different ethnic groups and pen-style pointers of several colours; the video-capture and target-recognition module enables the pointer appearance-colour Gaussian mixture model and pointer-tip search method corresponding to the pointer type currently selected in the pointer-type setting module;
the pointer appearance-colour Gaussian-mixture model library comprises finger skin-colour models for different ethnic groups and pen-style pointer colour models of several colours, each model in the library being selected by the video-capture and target-recognition module according to the current pointer type chosen in the pointer-type setting module;
the video-capture and target-recognition module comprises a camera, a foreground-image extraction unit and a pointer-tip locating unit; the camera is located above the surface of the target information carrier, captures video, and performs target recognition on any pointer entering its field of view;
the foreground-image extraction unit reads, pixel by pixel, the RGB colour data of each captured video frame, computes against the pre-established pointer appearance-colour Gaussian mixture model the probability that each pixel belongs to the model, discriminates whether the pixel matches the pointer's appearance colour, and, after every pixel of the current frame has been processed, obtains the foreground image of the current frame;
the pointer-tip locating unit filters noise from the foreground image, performs contour detection based on mathematical morphology, chooses the largest connected region among the detected contours as the pointer's contour, and searches within that contour for the position of the pointer tip;
the trajectory-analysis module is connected with the video-capture and target-recognition module, tracks the movement track and pauses of the pointer tip, and derives the user's pointing intention and indicated area from the track and pauses;
the image-block extraction and content-recognition module extracts, according to the user's pointing intention, the bitmap data of the image block in the indicated area and applies an image-content recognition method to recognize the content information it contains;
the retrieval-condition generation module is connected with the image-block extraction and content-recognition module and with the network transfer module; according to the user's pointing intention it either takes the content information recognized by the image-block extraction and content-recognition module as the current classification code, or applies word segmentation to it and extracts the words or phrases as the current search terms; if the current search terms have been updated, it combines the current classification code and the current search terms logically into a retrieval expression and sends the retrieval expression to the network transfer module;
the network transfer module is connected with the remote-resource retrieval module and with the information display or playback module; it sends the retrieval expression to the remote-resource retrieval module over a wired or wireless network, and also forwards the multimedia resources returned by the remote-resource retrieval module to the information display or playback module for output;
the remote-resource retrieval module is connected with the specialized knowledge base, uses the retrieval expression to retrieve the qualifying multimedia resources from the specialized knowledge base, and sends them back to the network transfer module;
the specialized knowledge base contains at least text, hypertext, audio, video, animation and three-dimensional simulation resources, each resource being annotated at least with keywords, a classification code and title information for retrieval.
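As an illustration of how such annotated resources might be stored and matched, the following sketch models one knowledge-base record and a naive retrieval over the classification code and keywords; the record fields mirror the annotations named above, while everything else (field names, matching rule) is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    kind: str                       # "text", "audio", "video", "animation", ...
    uri: str
    title: str
    class_code: str
    keywords: set[str] = field(default_factory=set)

def search(knowledge_base, class_code, terms):
    """Return resources whose classification code matches and that share a
    keyword or title word with the query terms; a naive stand-in for the
    patent's unspecified retrieval engine."""
    hits = []
    for r in knowledge_base:
        if class_code and r.class_code != class_code:
            continue
        if not terms or r.keywords & set(terms) or any(t in r.title for t in terms):
            hits.append(r)
    return hits
```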
Further, the pointer appearance-colour Gaussian mixture models are divided into a finger skin-colour model and a pen-style pointer colour model;
both pointer appearance-colour models are built by applying Gaussian-mixture modelling in the CrCgCb colour space, mixing several single Gaussian distributions; the probability density function $G(x)$ of the model is computed as the weighted mixture $G(x) = \sum_{j=1}^{M} \alpha_j P_j(c, \mu_j, \Sigma_j)$, with $\sum_{j=1}^{M} \alpha_j = 1$, where $M$ is the number of single Gaussian distributions contained in the model, $\alpha_j$ is the mixture weight of the probability density function of each single Gaussian, and $P_j(c, \mu_j, \Sigma_j)$ is defined as $P(c, \mu, \Sigma) = \frac{1}{(2\pi)^{3/2}|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(c-\mu)^T \Sigma^{-1} (c-\mu)\right]$, where $T$ denotes matrix transposition, $c = [c_r, c_g, c_b]^T$ is the three-component CrCgCb colour column vector of the pixel under evaluation, $\mu$ is the model mean and $\Sigma$ the model covariance; $\mu$ and $\Sigma$ are estimated from the CrCgCb feature column vectors $c_i$ of a number of training-sample pixels as the mean vector $\mu = \frac{1}{n}\sum_{i=1}^{n} c_i$ and covariance matrix $\Sigma = \frac{1}{n}\sum_{i=1}^{n} (c_i - \mu)(c_i - \mu)^T$, where $n$ is the number of training samples;
the finger skin-colour model and the pen-style pointer colour model are both trained in advance on collected observation samples to obtain the models' computation parameters;
when the finger skin-colour model is trained, the RGB values of finger-skin pixels collected in advance from test subjects of different sexes and ages, under several illumination conditions and with cameras of different models, are used as observation sample values; the expectation-maximization algorithm is applied for maximum-likelihood estimation to determine the computation parameters of the Gaussian-mixture probability density function of the finger skin-colour model;
before the video-capture and target-recognition module uses the finger skin-colour model, it may additionally collect RGB data of the user's own finger skin colour under the current working environment, use these values as new observation samples, apply the expectation-maximization algorithm again for maximum-likelihood estimation, and re-estimate the computation parameters of the finger skin-colour model's Gaussian-mixture probability density function;
when the pen-style pointer colour model is trained, the RGB values of pen-pointer appearance-colour pixels collected in advance under several illumination conditions and with cameras of different models are used as observation sample values; the expectation-maximization algorithm is applied for maximum-likelihood estimation to determine the computation parameters of the pen-style pointer colour model's Gaussian-mixture probability density function;
before the video-capture and target-recognition module uses the pen-style pointer colour model, it may additionally collect RGB data of the pen-style pointer's appearance colour under the user's current working environment, use these values as new observation samples, apply the expectation-maximization algorithm again for maximum-likelihood estimation, and re-estimate the computation parameters of the pen-style pointer colour model's Gaussian-mixture probability density function.
Further, the pointer-tip locating unit comprises a fingertip locating unit and a pen-style pointer tip locating unit;
the fingertip locating unit first defines a template subimage and applies erosion to the binarized finger-contour image; it then projects the processed finger contour onto the horizontal and vertical coordinate axes, scans from top to bottom and from left to right for the place where the projection value changes significantly, takes that place as the rough fingertip position, and constructs a rough search window around it; next it uses a single-layer functional-link neural network to predict the fingertip position and determine a precise search window, computed as $Y = [X \mid f(XW_h + \beta_h)]W$, where $X$ is the input vector, $\beta_h$ the bias matrix, $W_h$ the weight matrix from the input layer to the hidden layer, and $W$ the pre-trained weight matrix, the weight matrices being trained on left-to-right horizontal strokes, right-to-left horizontal strokes, clockwise circles, counter-clockwise circles, clockwise rectangles and counter-clockwise rectangles; finally it performs fingertip detection by template matching, defining several fingertip templates and computing, within the precise search window, the absolute-value distance between the subimage to be matched and each fingertip template to obtain the exact fingertip position;
the pen-style pointer tip locating unit first applies connected-component processing to the binarized pen-pointer contour image and computes the centre of gravity of the connected graph; then, taking that centre of gravity as the search centre and searching around it in order from the upper left, through the position directly above, to the upper right, it computes in turn the Euclidean distance from the search centre to each pixel on the contour, takes the largest of these distances, and uses the contour point along this maximum-distance path as the final position of the pen-style pointer tip.
Further, the trajectory-analysis module comprises a tip speed-and-direction computing unit, a start-point locating unit, an end-point locating unit and a pointing-intention understanding unit;
the tip speed-and-direction computing unit compares the pointer-tip position in the current video frame with the tip positions in several preceding frames and computes the moving speed and moving direction;
the start-point locating unit analyses the tip's moving speed in each frame in temporal order and, when an obvious pause is detected for the first time, takes the position coordinates at which the pause occurs as the starting indication position;
the end-point locating unit, after the starting indication position has been recognized, takes the position coordinates of the next marked halt in the tip's motion as the end indication position; if no end indication position has been detected yet, the speed-and-direction computing unit continues to compute the tip's moving speed and direction;
the pointing-intention understanding unit analyses the tip's movement track between the starting and end indication positions: if the tip is detected to move in an approximately straight line from left to right or from right to left, the user's intention is a text indication, and the indicated area is the image block occupied by the line of text above the tip's track; if the tip is detected to make a roughly circular closed or nearly closed motion, the user's intention is an icon indication, and the indicated area is the image block contained in the bounding rectangle of the tip's circular track; if the tip is detected to make a roughly rectangular closed or nearly closed motion, the user's intention is a graphic-code indication, and the indicated area is the image block enclosed by the tip's rectangular track.
Further, the image-block extraction and content-recognition module comprises an image-block extraction unit, a text-recognition unit, an icon-recognition unit and a graphic-code recognition unit;
the image-block extraction unit extracts the bitmap data of the image block in the indicated area understood by the trajectory-analysis module and, according to the user's pointing intention understood by that module, calls the corresponding one of the text-recognition unit, icon-recognition unit and graphic-code recognition unit to recognize the content information contained in the image-block bitmap data;
the text-recognition unit applies an OCR text-recognition method to the image-block bitmap data, obtaining the text and character-string information it contains;
the icon-recognition unit applies, according to a preset icon library, an image-recognition method using the feature templates of each icon in the library to the image-block bitmap data, obtaining the character-string information of the icon's index in the icon library;
the graphic-code recognition unit applies two-dimensional-code and barcode recognition methods to the image-block bitmap data, obtaining the character-string information they contain.
Compared with the prior art, the present invention has the following beneficial effects: the described method for interactively obtaining and outputting remote resources from an information-carrier surface can be applied to recognizing the information inside a designated area on the surface of various media, such as print media and electronic display devices; it converts the recognized content information into a retrieval expression for a specialized knowledge base, searches for the associated multimedia resources, and feeds them back to the user. The method can not only recognize the content of an indicated area on the surface of all kinds of media, but can also retrieve the associated multimedia resources from a large specialized knowledge base and output them, so its range of application is very broad.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention for interactively obtaining and outputting remote resources from the surface of an information carrier;
Fig. 2 is a schematic structural diagram of the system for interactively obtaining and outputting remote resources from the surface of an information carrier;
Fig. 3 is a schematic structural diagram of the image-block extraction and content-recognition module.
Detailed description of the embodiments
In order to make the object, technical scheme and advantages of the present invention clearer, the present invention is further elaborated below in conjunction with the drawings and embodiments. It should be appreciated that the specific embodiments described herein serve only to explain the present invention and are not intended to limit it.
The present invention retrieves network resources associated with the information on an information carrier and outputs them; the output may take the form of speech playback, playback of video or animation, display of text or graphic images, hypertext web browsing, and so on. The information carrier includes paper media and digital media: paper media include books, publications, newspapers, magazines and printed matter; digital media include the web pages, text, hypertext, figures, images, video and animation shown on a flat-panel display device, where the flat-panel display device may be a mobile phone, tablet computer, notebook computer, liquid-crystal display, television, etc. The network resources include single-media resources such as text, hypertext, graphic images, audio, video, planar animation and three-dimensional simulated scenes, as well as hypermedia composite structures assembled from these single-media resources around a certain concept or theme.
As shown in Fig. 1, in a preferred embodiment of the present invention, a method for interactively obtaining and outputting remote resources from the surface of an information carrier comprises the following steps: step A, the user side photographs and performs target recognition on a pointer entering the field of view of its camera, tracks the movement track and pause positions of the pointer tip, and derives the user's pointing intention and indicated area from the movement track and pause positions, the camera of the user side being located above the surface of the target information carrier to photograph the carrier; step B, according to the user's pointing intention, the user side extracts the bitmap data of the image block in the indicated area and applies an image-content recognition method to recognize the content information it contains; step C, according to the user's pointing intention, the user side converts the content information into a current classification code or current search terms, and if the current search terms have been updated it generates a retrieval expression and sends it to a remote server, otherwise it returns to step A; step D, the remote server uses the retrieval expression to retrieve the qualifying multimedia resources from a specialized knowledge base and sends them to the user side; step E, the user side outputs the received multimedia resources and then returns to step A to prepare for the next user interaction.
Step A specifically comprises the following steps.
Step A1: the camera of the user side, located above the surface of the target information carrier, captures the current video image frame.
Step A2: the user side reads, pixel by pixel, the RGB colour data of the current video image frame captured in step A1, computes for each pixel, against the pointer appearance-colour Gaussian mixture model established in advance, the probability that the pixel belongs to that model, and judges whether the pixel matches the appearance colour of the pointer.
The pointer appearance-colour Gaussian mixture models are divided into a finger skin-colour model and a pen-style pointer colour model.
Both models are built by applying Gaussian-mixture modelling in the CrCgCb colour space, mixing several single Gaussian distributions; the probability density function $G(x)$ of the model is computed as the weighted mixture $G(x) = \sum_{j=1}^{M} \alpha_j P_j(c, \mu_j, \Sigma_j)$, with $\sum_{j=1}^{M} \alpha_j = 1$, where $M$ is the number of single Gaussian distributions contained in the model, $\alpha_j$ is the mixture weight of the probability density function of each single Gaussian, and $P_j(c, \mu_j, \Sigma_j)$ is defined as $P(c, \mu, \Sigma) = \frac{1}{(2\pi)^{3/2}|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(c-\mu)^T \Sigma^{-1} (c-\mu)\right]$, where $T$ denotes matrix transposition, $c = [c_r, c_g, c_b]^T$ is the three-component CrCgCb colour column vector of the pixel under evaluation, $\mu$ is the model mean and $\Sigma$ the model covariance; $\mu$ and $\Sigma$ are estimated from the CrCgCb feature column vectors $c_i$ of a number of training-sample pixels as the mean vector $\mu = \frac{1}{n}\sum_{i=1}^{n} c_i$ and covariance matrix $\Sigma = \frac{1}{n}\sum_{i=1}^{n} (c_i - \mu)(c_i - \mu)^T$, where $n$ is the number of training samples.
The finger skin-colour model is trained according to steps G01-G02 to obtain its computation parameters. Step G01: the RGB values of finger-skin pixels collected in advance from test subjects of different sexes and ages, under several illumination conditions and with cameras of different models, are used as observation sample values; the expectation-maximization algorithm is applied for maximum-likelihood estimation to determine the computation parameters of the Gaussian-mixture probability density function of the finger skin-colour model. Step G02: before the user uses step A, RGB data of the user's own finger skin colour are collected under the current working environment; these values are used as new observation samples, the expectation-maximization algorithm is applied again for maximum-likelihood estimation, the parameters determined in step G01 are re-estimated, and the parameters of the finger skin-colour model are updated with the result of the re-estimation training.
The pen-style pointer colour model is trained according to steps G11-G12 to obtain its computation parameters. Step G11: the RGB values of pen-pointer appearance-colour pixels collected in advance under several illumination conditions and with cameras of different models are used as observation sample values; the expectation-maximization algorithm is applied for maximum-likelihood estimation to determine the computation parameters of the Gaussian-mixture probability density function of the pen-style pointer colour model. Step G12: before the user uses step A, RGB data of the pen-style pointer's appearance colour are collected under the user's current working environment; these values are used as new observation samples, the expectation-maximization algorithm is applied again for maximum-likelihood estimation, the parameters determined in step G11 are re-estimated, and the parameters of the pen-style pointer colour model are updated with the result of the re-estimation training.
Step A3: step A2 is repeated until every pixel in the current video image frame has been processed; all pixels matching the pointer's appearance colour are then selected to obtain the foreground image of the current frame; the foreground image is filtered to remove noise, contour detection based on mathematical morphology is performed, the largest connected region among the detected contours is chosen as the pointer's contour, and the pointer-tip position is searched for within that contour.
The pointer-contour processing in step A3 is divided into finger-contour processing and pen-style pointer contour processing.
The finger-contour processing, which determines the fingertip pointing position, specifically comprises steps A301-A304. Step A301: a template subimage is defined, and erosion is applied to the binarized finger-contour image. Step A302: the processed finger contour is projected onto the horizontal and vertical coordinate axes; scanning from top to bottom and from left to right, the place where the projection value changes significantly is located, taken as the rough fingertip position, and a rough search window is constructed around it. Step A303: a single-layer functional-link neural network is used to predict the fingertip position and determine a precise search window, computed as $Y = [X \mid f(XW_h + \beta_h)]W$, where $X$ is the input vector, $\beta_h$ the bias matrix, $W_h$ the weight matrix from the input layer to the hidden layer, and $W$ the pre-trained weight matrix; in the training strategy the weight matrices are trained on left-to-right horizontal strokes, right-to-left horizontal strokes, clockwise circles, counter-clockwise circles, clockwise rectangles and counter-clockwise rectangles. Step A304: fingertip detection is performed by template matching: several fingertip templates are defined, and within the precise search window the absolute-value distance between the subimage to be matched and each fingertip template is computed to obtain the exact fingertip position.
The pen-style pointer contour processing, which determines the tip position of the pen-style pointer, specifically comprises steps A311-A312. Step A311: connected-component processing is applied to the binarized pen-pointer contour image, and the centre of gravity of the connected graph is computed. Step A312: taking the centre of gravity of the connected graph as the search centre and searching around it in order from the upper left, through the position directly above, to the upper right, the Euclidean distance from the search centre to each pixel on the contour is computed in turn; the largest of these distances is taken, and the contour point along this maximum-distance path is used as the final position of the pen-style pointer tip.
Step A4: based on the pointer-tip position extracted from the current video frame in step A3, trajectory analysis and pointing-intention understanding are performed on the tip; if the user's pointing intention and indicated area cannot yet be determined, the method returns to step A1 and continues.
Step A4 specifically comprises the following steps. Step A401: the pointer-tip position extracted from the current frame in step A3 is compared with the tip positions in several preceding frames, and the moving speed and direction are computed. Step A402: the tip moving speed in each frame is analysed in temporal order; when an obvious pause is detected for the first time, the position coordinates of that pause are taken as the starting indication position. Step A403: after the starting indication position has been recognized, the position coordinates of the next marked halt in the tip's motion are taken as the end indication position; if no end indication position has been detected yet, execution returns to step A1. Step A404: the tip's movement track between the starting and end indication positions is analysed: if the tip moves in an approximately straight line from left to right or from right to left, the user's intention is a text indication, and the indicated area is the image block occupied by the line of text above the tip's track; if the tip makes a roughly circular closed or nearly closed motion, the user's intention is an icon indication, and the indicated area is the image block contained in the bounding rectangle of the tip's circular track; if the tip makes a roughly rectangular closed or nearly closed motion, the user's intention is a graphic-code indication, and the indicated area is the image block enclosed by the tip's rectangular track.
Step A thus discriminates the user's pointing intention as one of three kinds: text indication, icon indication and graphic-code indication. The image-content recognition method of step B specifically comprises:
if the intention discriminated in step A is a text indication, an OCR text-recognition method is selected to perform content recognition on the image-block bitmap data, obtaining the text and character-string information it contains;
if the intention discriminated in step A is an icon indication, an image-recognition method is selected that uses the feature templates of each icon in a preset icon library to perform content recognition on the image-block bitmap data, obtaining the character-string information of the icon's index in the icon library;
if the intention discriminated in step A is a graphic-code indication, two-dimensional-code and barcode recognition methods are selected to perform content recognition on the image-block bitmap data, obtaining the character-string information they contain.
The process of generating the retrieval expression in step C specifically comprises the following steps.
Step C1: according to the user's pointing intention discriminated in step A, the content information recognized in step B is converted into retrieval condition items, specifically: if the user's intention is an icon indication or a graphic-code indication, the content information recognized in step B is taken as the current classification code and all current search terms are cleared; if the user's intention is a text indication, word segmentation is applied to the content information recognized in step B, and the extracted words or phrases become the current search terms.
Step C2: if the current search terms have been updated, the current classification code and the current search terms are combined logically to generate the retrieval expression.
As shown in Figure 2, the system for interactively obtaining and outputting remote resources from an information carrier surface comprises: an indicator type setting module 210, a video capture and target recognition module 202, an indicator surface-color Gaussian mixture model library 211, a trajectory analysis module 203, an image block extraction and content recognition module 204, a search condition generation module 205, a network transmission module 206, a remote resource retrieval module 207, a specialized knowledge base 209, and an information display or playback module 208.

The indicator type setting module 210 is connected with the video capture and target recognition module 202 and is used to select the indicator type currently used by the user. Indicator types include fingers of different ethnic groups and pen-shaped pointers of multiple colors; the ethnic groups include Caucasian, Asian, Black, and Brown, among others. The video capture and target recognition module 202 enables the surface-color Gaussian mixture model and pointing-end endpoint search method corresponding to the indicator type currently selected in module 210.

The indicator surface-color Gaussian mixture model library 211 comprises finger skin-color models for different ethnic groups and pen-shaped pointer color models for multiple colors; each model in the library is selected by the video capture and target recognition module 202 according to the indicator type currently set in module 210.

The video capture and target recognition module 202 comprises a camera, a foreground image extraction unit, and a pointing-end endpoint locating unit. The camera is positioned above the information carrier surface 201 to capture video, and target recognition is performed on any indicator entering its field of view. The foreground image extraction unit reads the RGB color data of each pixel of the current video frame one by one, calculates from the pre-built surface-color Gaussian mixture model the probability that the pixel belongs to that model, and judges whether the pixel matches the surface color of the indicator; after every pixel of the current frame has been processed, the foreground image of the frame is obtained. The pointing-end endpoint locating unit filters noise from the foreground image, performs contour detection based on mathematical morphology, takes the contour with the largest connected region among all detected contours as the indicator contour, and searches within that contour for the pointing-end endpoint location of the indicator.
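Assuming an OpenCV implementation (the patent names no library), the foreground-to-endpoint chain of module 202 might look like the following sketch. `skin_probability` stands for the Gaussian mixture evaluation described below, and the topmost-contour-point rule is a simplified placeholder for the endpoint search.

```python
import cv2
import numpy as np

def pointer_endpoint(frame_bgr, skin_probability, threshold=0.5):
    """frame_bgr: HxWx3 image; skin_probability: maps it to an HxW score map."""
    prob = skin_probability(frame_bgr)                 # per-pixel mixture score
    mask = (prob > threshold).astype(np.uint8) * 255   # foreground mask
    mask = cv2.medianBlur(mask, 5)                     # noise filtering
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # morphological cleanup
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    pointer = max(contours, key=cv2.contourArea)       # largest connected region
    idx = pointer[:, 0, 1].argmin()                    # topmost point as a first guess
    return tuple(pointer[idx, 0])
```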
The trajectory analysis module 203 is connected with the video capture and target recognition module 202 and is used to track the movement trajectory and pauses of the pointing-end endpoint of the indicator, obtaining the user's instruction intent and indicated region from that trajectory and those pauses.

The image block extraction and content recognition module 204 extracts, according to the user's instruction intent, the image block bitmap data within the indicated region, and applies an image content recognition method to identify the content information the bitmap contains.

The search condition generation module 205 is connected with the image block extraction and content recognition module 204 and the network transmission module 206 respectively. According to the user's instruction intent, it either takes the content information identified by module 204 as the current classification code, or applies word segmentation to it and extracts the words or phrases as current search terms; if the current search terms have been updated, it logically combines the current classification code with each current search term to generate the search expression, and sends the search expression to the network transmission module.

The network transmission module 206 is connected with the remote resource retrieval module 207 and the information display or playback module 208 respectively. It sends the search expression to the remote resource retrieval module 207 over a wired or wireless network, and also forwards the multimedia resources retrieved by module 207 to the information display or playback module 208 for output.

The remote resource retrieval module 207 is connected with the specialized knowledge base 209; it retrieves the multimedia resources matching the search expression from the knowledge base and sends them back to the network transmission module.

The specialized knowledge base 209 comprises resources such as text, hypertext, audio, video, animation, and three-dimensional simulation; each resource is annotated at least with keyword, classification code, and title information for retrieval.
The indicator surface-color Gaussian mixture models are divided into finger skin-color models and pen-shaped pointer color models; in actual use, the appropriate color model is selected for target recognition and trajectory analysis according to whether the user points with a finger or with a pen-shaped pointer.

Each indicator surface-color Gaussian mixture model is built in the CrCgCb color space using Gaussian mixture modeling, mixing several single Gaussian distributions. The probability density function $G(x)$ of the model is computed as the weighted mixture

$$G(x) = \sum_{j=1}^{M} \alpha_j P_j(c, \mu_j, \Sigma_j), \qquad \sum_{j=1}^{M} \alpha_j = 1,$$

where $M$ is the number of single Gaussian distributions in the model, $\alpha_j$ is the mixture weight of the probability density function of each single Gaussian distribution, and $P_j(c, \mu_j, \Sigma_j)$ is defined as

$$P(c, \mu, \Sigma) = \frac{1}{(2\pi)^{3/2}\,|\Sigma|^{1/2}} \exp\!\left[-\frac{1}{2}(c-\mu)^T \Sigma^{-1} (c-\mu)\right],$$

where $T$ denotes matrix transposition, $c = [c_r, c_g, c_b]^T$ is the CrCgCb three-component column vector of the pixel under evaluation, $\mu$ is the model mean and $\Sigma$ the model covariance, both derived from the CrCgCb feature vectors $c_i$ of a set of training sample pixels: $\mu = \frac{1}{n}\sum_{i=1}^{n} c_i$ is the mean vector, $\Sigma = \frac{1}{n}\sum_{i=1}^{n}(c_i - \mu)(c_i - \mu)^T$ is the covariance matrix, and $n$ is the number of training samples.
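In numerical form, the mixture density above can be evaluated as in the following sketch; the weights, means, and covariances are assumed to come from training.

```python
import numpy as np

def gaussian_pdf(c, mu, sigma):
    """Single 3-D Gaussian P(c, mu, Sigma) for a CrCgCb column vector c."""
    d = c - mu
    norm = ((2 * np.pi) ** 1.5) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * d @ np.linalg.inv(sigma) @ d) / norm

def mixture_pdf(c, weights, mus, sigmas):
    """G(x) = sum_j alpha_j * P_j(c, mu_j, Sigma_j), with the weights summing to 1."""
    return sum(a * gaussian_pdf(c, m, s)
               for a, m, s in zip(weights, mus, sigmas))
```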
Both the finger skin-color model and the pen-shaped pointer color model obtain their parameters by model training on observation samples collected in advance.

When training the finger skin-color model, finger skin pixel RGB values collected in advance from test subjects of different sexes and ages, under multiple lighting conditions and with cameras of different models, are used as observation sample values; maximum likelihood estimation via the expectation-maximization algorithm then determines each parameter of the Gaussian mixture probability density function of the finger skin-color model. Before the video capture and target recognition module uses the finger skin-color model, RGB data of the user's own finger skin under the current usage environment can additionally be collected; taking these values as new observation samples, expectation-maximization maximum likelihood estimation is run again to perform re-estimation training of each parameter of the model's Gaussian mixture probability density function.

When training the pen-shaped pointer color model, surface color pixel RGB values of the pointer, collected in advance under multiple lighting conditions and with cameras of different models, are used as observation sample values; maximum likelihood estimation via the expectation-maximization algorithm then determines each parameter of the Gaussian mixture probability density function of the pointer color model. Before the video capture and target recognition module uses the pointer color model, RGB data of the pointer's surface color under the user's current usage environment can additionally be collected; taking these values as new observation samples, expectation-maximization maximum likelihood estimation is run again to perform re-estimation training of each parameter of the model's Gaussian mixture probability density function.
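Both training stages can be sketched with scikit-learn's EM implementation standing in for the patent's unspecified one; `warm_start=True` makes the second fit continue from the offline parameters, which is one plausible reading of the re-estimation training. The sample arrays here are random placeholders for real captures.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# placeholder (n, 3) arrays of RGB rows -- assumed data, not from the patent
offline_skin_samples = np.random.rand(5000, 3) * 255   # collected in advance
user_skin_samples = np.random.rand(300, 3) * 255       # captured on this device

gmm = GaussianMixture(n_components=3, covariance_type="full",
                      warm_start=True, max_iter=200)
gmm.fit(offline_skin_samples)     # offline EM / maximum likelihood training

# re-estimation training: EM resumes from the previous parameters on user data
gmm.fit(user_skin_samples)

def skin_probability(pixels):
    """Mixture density for an (n, 3) array of pixels, via log-likelihoods."""
    return np.exp(gmm.score_samples(pixels))
```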
The pointing-end endpoint locating unit comprises a finger pointing-end endpoint locating unit and a pen-shaped pointer pointing-end endpoint locating unit.

The finger pointing-end endpoint locating unit first defines a template sub-image and applies an erosion operation to the binarized finger contour image. It then projects the processed finger contour onto the horizontal and vertical coordinate axes and searches, from top to bottom and from left to right, for the place where the projection value changes significantly; this is taken as the rough position of the fingertip, and a coarse search window is constructed centered on that position. A single-layer functional-link neural network is then used to predict the fingertip position and determine a precise search window, computed as $Y = [\,X \mid f(XW_h + \beta_h)\,]\,W$, where $X$ is the input vector, $\beta_h$ is the bias matrix, $W_h$ is the input-to-hidden-layer weight matrix, and $W$ is the pretrained weight matrix; the training strategy covers left-to-right horizontal lines, right-to-left horizontal lines, clockwise and counterclockwise circle shapes, and clockwise and counterclockwise rectangle shapes. Finally, fingertip detection is performed by template matching: several fingertip templates are defined, and within the precise search window the absolute-value distance between the sub-image to be matched and each fingertip template is calculated, yielding the exact fingertip position.
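The coarse projection search and the final L1 template scan can be sketched as follows; the jump test and the exhaustive scan are illustrative assumptions, and the functional-link network stage is omitted for brevity.

```python
import numpy as np

def coarse_fingertip(mask, jump=500):
    """Rough fingertip position from projections of the eroded contour mask (int array)."""
    rows = mask.sum(axis=1)        # horizontal projection, scanned top to bottom
    cols = mask.sum(axis=0)        # vertical projection, scanned left to right
    y = int(np.argmax(np.abs(np.diff(rows)) > jump))   # first significant jump
    x = int(np.argmax(np.abs(np.diff(cols)) > jump))
    return y, x                    # center of the coarse search window

def match_fingertip(window, template):
    """Best position by minimal absolute-value (L1) distance; int arrays assumed."""
    th, tw = template.shape
    best, pos = np.inf, (0, 0)
    for y in range(window.shape[0] - th + 1):
        for x in range(window.shape[1] - tw + 1):
            d = np.abs(window[y:y + th, x:x + tw] - template).sum()
            if d < best:
                best, pos = d, (y + th // 2, x + tw // 2)
    return pos, best
```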
The pen-shaped pointer pointing-end endpoint locating unit first applies connected-component processing to the binarized pointer contour image and computes the centroid of the connected region. Taking the centroid as the search center, and searching from the upper left, through the top, to the upper right, it calculates in turn the Euclidean distance from the search center to each pixel on the contour, takes the maximum of these distances, and follows that maximum-distance path to the corresponding point on the pointer contour, which is taken as the final position of the pointer's pointing-end endpoint.
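This rule reduces to "the contour point above the centroid that is farthest from it", as in this sketch; restricting to the upper half-plane is an assumed encoding of the upper-left-to-upper-right search order.

```python
import numpy as np

def pen_tip(contour):
    """contour: (n, 2) array of (x, y) pixels on the pointer outline."""
    centroid = contour.mean(axis=0)
    dist = np.linalg.norm(contour - centroid, axis=1)
    dist[contour[:, 1] > centroid[1]] = -1.0  # image y grows downward: keep upper half
    return tuple(contour[int(dist.argmax())])
```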
The trajectory analysis module 203 comprises an endpoint speed and direction calculating unit, a start-point locating unit, an end-point locating unit, and an instruction intent understanding unit.

The endpoint speed and direction calculating unit compares the pointing-end endpoint location of the indicator in the current video frame with its locations in several preceding frames and calculates the movement speed and direction.

The start-point locating unit analyzes the movement speed of the pointing-end endpoint in each frame in time order; when an obvious pause is detected for the first time, the position coordinates of that pause are taken as the start indicating position.

The end-point locating unit, after the start indicating position has been recognized, takes the position coordinates of the next detected obvious pause in the pointing-end motion as the end indicating position; if no end indicating position has yet been detected, the speed and direction calculating unit continues to calculate the movement speed and direction of the pointing end.

The instruction intent understanding unit analyzes the movement trajectory of the pointing end between the start and end indicating positions: if the pointing end moves in a near-straight line from left to right or from right to left, the user's instruction intent is a text instruction, and the indicated region is the image block occupied by the line of text directly above the trajectory; if the pointing end traces a roughly circular closed (or nearly closed) path, the intent is an icon instruction, and the indicated region is the image block contained in the bounding rectangle of the circular trajectory; if the pointing end traces a roughly rectangular closed (or nearly closed) path, the intent is a graphic-code instruction, and the indicated region is the image block contained in the rectangular trajectory.
As shown in Figure 3, the image block extraction and content recognition module 204 comprises an image block extraction unit 301, a text recognition unit 302, an icon recognition unit 303, and a graphic code recognition unit 304.

The image block extraction unit 301 extracts the image block bitmap data within the user's indicated region as understood by the trajectory analysis module 203 and, according to the user's instruction intent understood by that module, calls the corresponding recognition unit among the text recognition unit 302, icon recognition unit 303, and graphic code recognition unit 304 to identify the content information contained in the bitmap: the text recognition unit 302 when the intent is a text instruction, the icon recognition unit 303 when it is an icon instruction, and the graphic code recognition unit 304 when it is a graphic-code instruction.

The text recognition unit 302 applies an OCR text recognition method to the image block bitmap data, obtaining the text and character string information it contains.

The icon recognition unit 303 performs content recognition on the image block bitmap data by image recognition against a preset icon library, using the feature template of each icon in the library, and obtains the index character string of the matching icon in the library.

The graphic code recognition unit 304 applies QR code and bar code recognition methods in turn to the image block bitmap data, obtaining the character string information it contains.
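Wiring the three units to commonly available libraries gives the following sketch. pytesseract, pyzbar, and OpenCV template matching are stand-ins chosen for illustration, none of them named by the patent, and the icon templates are assumed to be no larger than the extracted patch.

```python
import cv2
import pytesseract
from pyzbar import pyzbar

def recognize(intent, patch_bgr, icon_library):
    """Dispatch an extracted image block to the recognizer matching the intent."""
    if intent == "text":
        return pytesseract.image_to_string(patch_bgr)       # OCR
    if intent == "graphic_code":
        codes = pyzbar.decode(patch_bgr)                    # tries QR and bar codes
        return codes[0].data.decode() if codes else ""
    # icon: return the index string of the best-matching library template
    scores = {key: cv2.matchTemplate(patch_bgr, tpl, cv2.TM_SQDIFF_NORMED).min()
              for key, tpl in icon_library.items()}
    return min(scores, key=scores.get)
```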
Reading products built with the described system for interactively obtaining and outputting remote resources from an information carrier surface are easy to carry, easy to extend with new learning content, broadly applicable, and able to support autonomous learning anytime and anywhere.

Concrete applications of the method for interactively obtaining and outputting remote resources from an information carrier surface are now given by example.
Embodiment 1: a smartphone-assisted system for searching book-associated network resources and for autonomous learning
This system consists of smartphone client software and a remote resource management server. The phone and the server holding the specialized knowledge base communicate over Wifi or a 3G/4G mobile network: the information category and text obtained through the user's finger interaction are uploaded, and the teaching resources retrieved are downloaded from the specialized knowledge base.

The autonomous-learning client software is designed for smartphone operating systems such as Android and iOS. It uses the phone's camera to capture the current page of a printed medium such as a book while performing target tracking and trajectory analysis on the finger over the page, understands the finger movement intent, judges the type of media information in the indicated region (text instruction, icon instruction, or graphic-code instruction), locates the region specified by the finger, and extracts the image block bitmap data of that region. Naturally, a pen-shaped pointer can be used in place of a finger. According to the information type of the indicated region: for icon and graphic-code regions, the extracted image block bitmap is recognized and the information string it contains is taken as the current classification code; for text regions, text recognition is applied to the extracted bitmap, the text it contains is identified, and its words are extracted as current search terms. Using the phone's Wifi or 3G/4G communication network, the resulting current classification code and current search terms are composed into a search expression and sent to the remote server, where the associated digital resources of various forms are retrieved from the specialized knowledge base. The digital resources retrieved from the server's knowledge base are passed back over the wireless network and organized, displayed, or played on the phone's LCD screen.

In actual use, slight shake is unavoidable when the user holds the phone to film the media surface, so the video frames collected by the phone camera shift slightly and the background is not stable. Therefore, when tracking the finger and extracting its trajectory, a finger skin-color modeling and recognition method is used to extract the foreground image containing the hand: the Gaussian mixture probability density function of the finger skin-color model judges whether each pixel of the frame matches the finger skin color, and morphology-based contour detection extracts the finger contour shape so as to suppress the influence of ordinary lighting and shadow. A sub-template image is then defined, the obtained finger contour is connectivity-processed, the rough fingertip position is predicted by projection analysis, the precise search window is determined with the prediction model, and finally the exact fingertip position is detected within the search window by template matching. Once the exact fingertip position has been extracted, gesture understanding and trajectory tracking locate the information region the user indicates; the text or classification information contained in that region is recognized, composed into a search expression, and transmitted to the specialized knowledge base through the wireless network module; the associated teaching resources are retrieved, and the matching resources found are downloaded to the phone, organized, and displayed on its screen.
Embodiment 2: a networked learning system for primary and secondary school students based on an intelligent learning LED desk lamp

This system consists of an intelligent learning LED desk lamp and a remote teaching resource server, which communicate over Wifi or a 3G/4G mobile network: the information category and text obtained through the user's finger interaction are uploaded, and the retrieved teaching resources are downloaded from the network knowledge base. Physically, the intelligent learning LED desk lamp is based on an ordinary LED desk lamp: a camera is mounted on the plane of the LED module at the top of the stand, an LCD screen is embedded in the surface of the base, and embedded CPU, memory, Flash storage, wireless network, and video acquisition modules are integrated inside. The LED lamp provides a light source at night. In use, the student places a book below the desk lamp and can adjust the relative position of the book and the camera with the buttons beside the LCD screen, so that the page currently being studied stays within the camera's field of view.

In the actual design, an ARM11 is selected as the embedded microprocessor and Eclipse IDE for C/C++ Developers is adopted as the development tool. The embedded software supports video acquisition, target tracking, gesture recognition, trajectory extraction, image-based text recognition, image-based icon recognition, image-based graphic code recognition, wireless network transmission, and multimedia information display, realizing the networked autonomous learning system for primary and secondary school students based on the intelligent learning LED desk lamp.
Embodiment 3: a professional knowledge inquiry system based on smart glasses and finger interaction

This system consists of smart glasses and a remote knowledge server, which communicate over Wifi or a 3G/4G mobile wireless network: the information category and text obtained through the user's finger interaction are uploaded, and the associated resources retrieved are downloaded from the networked specialized knowledge base. Physically, the smart glasses are based on ordinary glasses: a miniature outward-facing camera is mounted on the frame, a flexible ultra-thin LCD screen is embedded on the inner surface of a lens, and embedded CPU, memory, Flash storage, wireless network, and video acquisition modules are integrated inside. In use, printed material such as a book, or a flat-panel information display device, is placed in front of the smart glasses, and the user adjusts the position of the head so that the written information of current interest is within the camera's field of view.

Under the Android operating system, the Google Glass Mirror API development interface is used to design the embedded software at the smart glasses end. It calls the camera on the frame to capture the current page of the printed medium, or the surface of the flat-panel display device, entering the field of view, while performing target tracking and trajectory analysis on the finger; it understands the finger movement intent, judges the type of media information in the indicated region, locates the region specified by the finger, extracts the image block bitmap data of that region, and recognizes the text or classification information it contains. Using the Wi-Fi network communication module embedded in the smart glasses, the recognized classification string or text is sent to the remote knowledge server, where the associated digital resources of various forms are retrieved from the specialized knowledge base. The digital resources retrieved from the server's knowledge base are passed back over the wireless network and organized, displayed, or played on the LCD screen of the smart glasses.
The described method of interactively obtaining and outputting remote resources from an information carrier surface can serve as a transformative technology, upgrading the point readers and talking pens currently in wide use in early primary language education. The method is simple to use, low in cost, large in information scale, convenient to extend, and broad in applicability. Beyond language learning, the technical method proposed by the present invention can also be applied to learning activities in other subjects: once the corresponding networked knowledge bases are expanded, students of any grade, and even university students, can browse textbooks or materials in mathematics, physics, chemistry, history, geography, and so on, and obtain, through natural interaction and network transmission, the learning resources related to whatever difficult problem they currently face, supporting autonomous learning. The method can also be applied in people's daily work and life, allowing difficulties encountered on printed material or flat-panel display devices to be resolved promptly through network search.

The foregoing are merely preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalent replacements, and improvements made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (12)

1. A method for interactively obtaining and outputting remote resources from an information carrier surface, characterized by comprising the following steps:

Step A: the user side performs target recognition on an indicator entering the field of view of its camera, tracks the movement trajectory and pause positions of the pointing-end endpoint of the indicator, and obtains the user's instruction intent and indicated region from that trajectory and those pause positions; the camera of the user side is positioned above the target information carrier surface to photograph the target information carrier;

Step B: the user side, according to the user's instruction intent, extracts the image block bitmap data within the indicated region and identifies the content information the bitmap contains;

Step C: the user side, according to the user's instruction intent, converts the content information into a current classification code or current search terms; if the current search terms have been updated, it generates a search expression and sends the search expression to a remote server, otherwise it returns to step A;

Step D: the remote server retrieves the multimedia resources matching the search expression from a specialized knowledge base and sends them to the user side;

Step E: the user side outputs the received multimedia resources and then returns to step A, ready for the next user interaction.
2. The method according to claim 1, characterized in that step A specifically comprises the following steps:

Step A1: the camera of the user side collects the current video frame;

Step A2: the user side reads, one by one, the RGB color data of each pixel of the current video frame collected in step A1, calculates from the pre-built indicator surface-color Gaussian mixture model the probability that the pixel belongs to that model, and judges whether the pixel matches the surface color of the indicator;

Step A3: repeat step A2 until every pixel of the current video frame has been processed; then take all pixels matching the indicator surface color to obtain the foreground image of the current video frame, filter noise from the foreground image, perform contour detection based on mathematical morphology, take the contour with the largest connected region among all detected contours as the indicator contour, and search within that contour for the pointing-end endpoint location of the indicator;

Step A4: according to the pointing-end endpoint location of the indicator in the current video frame extracted in step A3, perform trajectory analysis and instruction intent understanding on the pointing-end endpoint; if the user's instruction intent and indicated region have not yet been determined, return to step A1 and continue.
3. The method according to claim 2, characterized in that the indicator surface-color Gaussian mixture models in step A2 are divided into finger skin-color models and pen-shaped pointer color models;

each indicator surface-color Gaussian mixture model is built in the CrCgCb color space using Gaussian mixture modeling, mixing several single Gaussian distributions, and the probability density function $G(x)$ of the model is computed as the weighted mixture $G(x) = \sum_{j=1}^{M} \alpha_j P_j(c, \mu_j, \Sigma_j)$ with $\sum_{j=1}^{M} \alpha_j = 1$, where $M$ is the number of single Gaussian distributions in the model, $\alpha_j$ is the mixture weight of the probability density function of each single Gaussian distribution, and $P_j(c, \mu_j, \Sigma_j)$ is defined as $P(c, \mu, \Sigma) = \frac{1}{(2\pi)^{3/2}|\Sigma|^{1/2}} \exp[-\frac{1}{2}(c-\mu)^T \Sigma^{-1} (c-\mu)]$, where $T$ denotes matrix transposition, $c = [c_r, c_g, c_b]^T$ is the CrCgCb three-component column vector of the pixel under evaluation, $\mu$ is the model mean and $\Sigma$ the model covariance, both derived from the CrCgCb feature vectors $c_i$ of a set of training sample pixels: $\mu = \frac{1}{n}\sum_{i=1}^{n} c_i$ is the mean vector, $\Sigma = \frac{1}{n}\sum_{i=1}^{n}(c_i-\mu)(c_i-\mu)^T$ is the covariance matrix, and $n$ is the number of training samples;

the finger skin-color model obtains its parameters by training according to the following steps G01-G02:

Step G01: using finger skin pixel RGB values collected in advance from test subjects of different sexes and ages, under multiple lighting conditions and with cameras of different models, as observation sample values, perform maximum likelihood estimation with the expectation-maximization algorithm to determine each parameter of the Gaussian mixture probability density function of the finger skin-color model;

Step G02: before the user uses step A, collect RGB data of this user's own finger skin under the current usage environment, take these values as new observation samples, run expectation-maximization maximum likelihood estimation again to perform re-estimation training of each parameter of the Gaussian mixture probability density function determined in step G01, and update the parameters of the finger skin-color model with the result of the re-estimation training;

the pen-shaped pointer color model obtains its parameters by training according to the following steps G11-G12:

Step G11: using pointer surface color pixel RGB values collected in advance under multiple lighting conditions and with cameras of different models as observation sample values, perform maximum likelihood estimation with the expectation-maximization algorithm to determine each parameter of the Gaussian mixture probability density function of the pointer color model;

Step G12: before the user uses step A, collect RGB data of the pointer's surface color under this user's current usage environment, take these values as new observation samples, run expectation-maximization maximum likelihood estimation again to perform re-estimation training of each parameter of the Gaussian mixture probability density function determined in step G11, and update the parameters of the pointer color model with the result of the re-estimation training.
4. The method according to claim 2, characterized in that the contour processing of the indicator in step A3 is divided into finger contour processing and pen-shaped pointer contour processing;

the finger contour processing, when determining the fingertip indicating position, specifically comprises the following steps A301-A304:

Step A301: define a template sub-image and apply an erosion operation to the binarized finger contour image;

Step A302: project the processed finger contour onto the horizontal and vertical coordinate axes, search from top to bottom and from left to right for the place where the projection value changes significantly, take it as the rough position of the fingertip, and construct a coarse search window centered on that position;

Step A303: use a single-layer functional-link neural network to predict the fingertip position and determine a precise search window, computed as $Y = [\,X \mid f(XW_h + \beta_h)\,]\,W$, where $X$ is the input vector, $\beta_h$ is the bias matrix, $W_h$ is the input-to-hidden-layer weight matrix, and $W$ is the pretrained weight matrix; the training strategy covers left-to-right horizontal lines, right-to-left horizontal lines, clockwise and counterclockwise circle shapes, and clockwise and counterclockwise rectangle shapes;

Step A304: perform fingertip detection based on template matching: define several fingertip templates and, within the precise search window, calculate the absolute-value distance between the sub-image to be matched and each fingertip template, yielding the exact fingertip position;

the pen-shaped pointer contour processing, when determining the pointing-end endpoint location of the pointer, specifically comprises the following steps A311-A312:

Step A311: apply connected-component processing to the binarized pointer contour image and compute the centroid of the connected region;

Step A312: taking the centroid of the connected region as the search center, and searching from the upper left, through the top, to the upper right, calculate in turn the Euclidean distance from the search center to each pixel on the contour, take the maximum of these distances, and follow the maximum-distance path to the corresponding point on the pointer contour, which is taken as the final position of the pointer's pointing-end endpoint.
5. The method according to claim 2, characterized in that step A4 specifically comprises the following steps:

Step A401: obtain the pointing-end endpoint location of the indicator in the current video frame extracted in step A3, compare it with the pointing-end endpoint locations in several preceding frames, and calculate the movement speed and direction;

Step A402: analyze the endpoint movement speed of each frame in time order; when an obvious pause is detected for the first time, take the position coordinates of that pause as the start indicating position;

Step A403: after the start indicating position has been recognized, when another obvious pause in the endpoint motion is detected, take the position coordinates of that pause as the end indicating position; if no end indicating position has yet been detected, return to step A1;

Step A404: analyze the movement trajectory of the pointing-end endpoint between the start and end indicating positions: if the endpoint moves in a near-straight line from left to right or from right to left, the user's instruction intent is a text instruction, and the indicated region is the image block occupied by the line of text directly above the trajectory; if the endpoint traces a roughly circular closed (or nearly closed) path, the intent is an icon instruction, and the indicated region is the image block contained in the bounding rectangle of the circular trajectory; if the endpoint traces a roughly rectangular closed (or nearly closed) path, the intent is a graphic-code instruction, and the indicated region is the image block contained in the rectangular trajectory.
6. The method according to claim 1, characterized in that the image content recognition method in step B specifically comprises:

according to the user's instruction intent determined in step A, if it is a text instruction, selecting an OCR text recognition method to perform content recognition on the image block bitmap data, obtaining the text and character string information it contains;

according to the user's instruction intent determined in step A, if it is an icon instruction, performing content recognition on the image block bitmap data by image recognition against a preset icon library, using the feature template of each icon in the library, and obtaining the index character string of the matching icon in the library;

according to the user's instruction intent determined in step A, if it is a graphic-code instruction, applying QR code and bar code recognition methods in turn to the image block bitmap data, obtaining the character string information it contains.
7. The method according to claim 1, characterized in that the process of generating the search expression in step C specifically comprises the following steps:

Step C1: according to the user's instruction intent determined in step A, convert the content information recognized in step B into search condition items, specifically:

if the instruction intent is an icon or graphic-code instruction, take the content information recognized in step B as the current classification code and clear all current search terms;

if the instruction intent is a text instruction, apply word segmentation to the content information recognized in step B and extract the words or phrases as current search terms;

Step C2: if the current search terms have been updated, logically combine the current classification code with each current search term to generate the search expression.
8. A system for interactively obtaining and outputting remote resources from an information carrier surface, characterized by comprising: an indicator type setting module, a video capture and target recognition module, an indicator surface-color Gaussian mixture model library, a trajectory analysis module, an image block extraction and content recognition module, a search condition generation module, a network transmission module, a remote resource retrieval module, a specialized knowledge base, and an information display or playback module;

the indicator type setting module is connected with the video capture and target recognition module and is used to select the indicator type currently used by the user, the indicator types comprising fingers of different ethnic groups and pen-shaped pointers of multiple colors; the video capture and target recognition module enables the indicator surface-color Gaussian mixture model and pointing-end endpoint search method corresponding to the indicator type currently selected in the indicator type setting module;

the indicator surface-color Gaussian mixture model library comprises finger skin-color models for different ethnic groups and pen-shaped pointer color models for multiple colors, each model in the library being selected by the video capture and target recognition module according to the indicator type currently set in the indicator type setting module;

the video capture and target recognition module comprises a camera, a foreground image extraction unit, and a pointing-end endpoint locating unit, the camera being positioned above the target information carrier surface to capture video and performing target recognition on any indicator entering its field of view;

the foreground image extraction unit reads, one by one, the RGB color data of each pixel of each collected current video frame, calculates from the pre-built indicator surface-color Gaussian mixture model the probability that the pixel belongs to that model, and judges whether the pixel matches the surface color of the indicator; after every pixel of the current video frame has been processed, the foreground image of the current video frame is obtained;

the pointing-end endpoint locating unit filters noise from the foreground image, performs contour detection based on mathematical morphology, takes the contour with the largest connected region among all detected contours as the indicator contour, and searches within that contour for the pointing-end endpoint location of the indicator;

the trajectory analysis module is connected with the video capture and target recognition module and is used to track the movement trajectory and pauses of the pointing-end endpoint of the indicator, obtaining the user's instruction intent and indicated region from that trajectory and those pauses;

the image block extraction and content recognition module extracts, according to the user's instruction intent, the image block bitmap data within the indicated region and applies an image content recognition method to identify the content information the bitmap contains;

the search condition generation module is connected with the image block extraction and content recognition module and the network transmission module respectively; according to the user's instruction intent, it either takes the content information identified by the image block extraction and content recognition module as the current classification code, or applies word segmentation to it and extracts the words or phrases as current search terms; if the current search terms have been updated, it logically combines the current classification code with each current search term to generate the search expression and sends the search expression to the network transmission module;

the network transmission module is connected with the remote resource retrieval module and the information display or playback module respectively; it sends the search expression over a wired or wireless network to the remote resource retrieval module, and also forwards the received multimedia resources retrieved by the remote resource retrieval module to the information display or playback module for output;

the remote resource retrieval module is connected with the specialized knowledge base; it retrieves the multimedia resources matching the search expression from the specialized knowledge base and sends them back to the network transmission module;

the specialized knowledge base comprises at least text, hypertext, audio, video, animation, and three-dimensional simulation resources, each resource being annotated at least with keyword, classification code, and title information for retrieval.
9. The system according to claim 8, characterized in that the indicator surface-color Gaussian mixture models are divided into finger skin-color models and pen-shaped pointer color models;

each indicator surface-color Gaussian mixture model is built in the CrCgCb color space using Gaussian mixture modeling, mixing several single Gaussian distributions, and the probability density function $G(x)$ of the model is computed as the weighted mixture $G(x) = \sum_{j=1}^{M} \alpha_j P_j(c, \mu_j, \Sigma_j)$ with $\sum_{j=1}^{M} \alpha_j = 1$, where $M$ is the number of single Gaussian distributions in the model, $\alpha_j$ is the mixture weight of the probability density function of each single Gaussian distribution, and $P_j(c, \mu_j, \Sigma_j)$ is defined as $P(c, \mu, \Sigma) = \frac{1}{(2\pi)^{3/2}|\Sigma|^{1/2}} \exp[-\frac{1}{2}(c-\mu)^T \Sigma^{-1} (c-\mu)]$, where $T$ denotes matrix transposition, $c = [c_r, c_g, c_b]^T$ is the CrCgCb three-component column vector of the pixel under evaluation, $\mu$ is the model mean and $\Sigma$ the model covariance, both derived from the CrCgCb feature vectors $c_i$ of a set of training sample pixels: $\mu = \frac{1}{n}\sum_{i=1}^{n} c_i$ is the mean vector, $\Sigma = \frac{1}{n}\sum_{i=1}^{n}(c_i-\mu)(c_i-\mu)^T$ is the covariance matrix, and $n$ is the number of training samples;

both the finger skin-color model and the pen-shaped pointer color model obtain their parameters by model training on observation samples collected in advance;

when training the finger skin-color model, finger skin pixel RGB values collected in advance from test subjects of different sexes and ages, under multiple lighting conditions and with cameras of different models, are used as observation sample values, and maximum likelihood estimation with the expectation-maximization algorithm determines each parameter of the Gaussian mixture probability density function of the finger skin-color model;

before the video capture and target recognition module uses the finger skin-color model, RGB data of the user's own finger skin under the current usage environment can additionally be collected and taken as new observation samples, and expectation-maximization maximum likelihood estimation is run again to perform re-estimation training of each parameter of the Gaussian mixture probability density function of the finger skin-color model;

when training the pen-shaped pointer color model, pointer surface color pixel RGB values collected in advance under multiple lighting conditions and with cameras of different models are used as observation sample values, and maximum likelihood estimation with the expectation-maximization algorithm determines each parameter of the Gaussian mixture probability density function of the pointer color model;

before the video capture and target recognition module uses the pointer color model, RGB data of the pointer's surface color under this user's current usage environment can additionally be collected and taken as new observation samples, and expectation-maximization maximum likelihood estimation is run again to perform re-estimation training of each parameter of the Gaussian mixture probability density function of the pointer color model.
10. The system according to claim 8, characterized in that the pointing-end endpoint locating unit comprises a finger pointing-end endpoint locating unit and a pen-shaped pointer pointing-end endpoint locating unit;

the finger pointing-end endpoint locating unit first defines a template sub-image and applies an erosion operation to the binarized finger contour image; it then projects the processed finger contour onto the horizontal and vertical coordinate axes, searches from top to bottom and from left to right for the place where the projection value changes significantly, takes it as the rough position of the fingertip, and constructs a coarse search window centered on that position; a single-layer functional-link neural network is then used to predict the fingertip position and determine a precise search window, computed as $Y = [\,X \mid f(XW_h + \beta_h)\,]\,W$, where $X$ is the input vector, $\beta_h$ is the bias matrix, $W_h$ is the input-to-hidden-layer weight matrix, and $W$ is the pretrained weight matrix, the training strategy covering left-to-right horizontal lines, right-to-left horizontal lines, clockwise and counterclockwise circle shapes, and clockwise and counterclockwise rectangle shapes; finally, fingertip detection is performed by template matching: several fingertip templates are defined and, within the precise search window, the absolute-value distance between the sub-image to be matched and each fingertip template is calculated, yielding the exact fingertip position;

the pen-shaped pointer pointing-end endpoint locating unit first applies connected-component processing to the binarized pointer contour image and computes the centroid of the connected region; then, taking the centroid as the search center, and searching from the upper left, through the top, to the upper right, it calculates in turn the Euclidean distance from the search center to each pixel on the contour, takes the maximum of these distances, and follows the maximum-distance path to the corresponding point on the pointer contour, which is taken as the final position of the pointer's pointing-end endpoint.
11. The system according to claim 8, characterized in that the trajectory analysis module comprises an endpoint speed and direction calculating unit, a start-point locating unit, an end-point locating unit, and an instruction intent understanding unit;

the endpoint speed and direction calculating unit compares the pointing-end endpoint location of the indicator in the current video frame with its locations in several preceding frames and calculates the movement speed and direction;

the start-point locating unit analyzes the movement speed of the pointing-end endpoint in each frame in time order and, when an obvious pause is detected for the first time, takes the position coordinates of that pause as the start indicating position;

the end-point locating unit, after the start indicating position has been recognized, takes the position coordinates of the next detected obvious pause in the pointing-end motion as the end indicating position; if no end indicating position has yet been detected, the speed and direction calculating unit continues to calculate the movement speed and direction of the pointing end;

the instruction intent understanding unit analyzes the movement trajectory of the pointing end between the start and end indicating positions: if the pointing end moves in a near-straight line from left to right or from right to left, the user's instruction intent is a text instruction, and the indicated region is the image block occupied by the line of text directly above the trajectory; if the pointing end traces a roughly circular closed (or nearly closed) path, the intent is an icon instruction, and the indicated region is the image block contained in the bounding rectangle of the circular trajectory; if the pointing end traces a roughly rectangular closed (or nearly closed) path, the intent is a graphic-code instruction, and the indicated region is the image block contained in the rectangular trajectory.
12. The system according to claim 8, characterized in that the image block extraction and content recognition module comprises an image block extraction unit, a text recognition unit, an icon recognition unit, and a graphic code recognition unit;

the image block extraction unit extracts the image block bitmap data within the user's indicated region as understood by the trajectory analysis module and, according to the user's instruction intent understood by the trajectory analysis module, calls the corresponding recognition unit among the text recognition unit, icon recognition unit, and graphic code recognition unit to identify the content information contained in the image block bitmap data;

the text recognition unit applies an OCR text recognition method to the image block bitmap data, obtaining the text and character string information it contains;

the icon recognition unit performs content recognition on the image block bitmap data by image recognition against a preset icon library, using the feature template of each icon in the library, and obtains the index character string of the matching icon in the library;

the graphic code recognition unit applies QR code and bar code recognition methods in turn to the image block bitmap data, obtaining the character string information it contains.
CN201410377980.7A 2014-08-04 2014-08-04 The method and system for obtaining remote resource from information carrier surface interactive mode and exporting Active CN104199834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410377980.7A CN104199834B (en) 2014-08-04 2014-08-04 The method and system for obtaining remote resource from information carrier surface interactive mode and exporting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410377980.7A CN104199834B (en) 2014-08-04 2014-08-04 The method and system for obtaining remote resource from information carrier surface interactive mode and exporting

Publications (2)

Publication Number Publication Date
CN104199834A true CN104199834A (en) 2014-12-10
CN104199834B CN104199834B (en) 2018-11-27

Family

ID=52085127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410377980.7A Active CN104199834B (en) 2014-08-04 2014-08-04 The method and system for obtaining remote resource from information carrier surface interactive mode and exporting

Country Status (1)

Country Link
CN (1) CN104199834B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101324922A (en) * 2008-07-30 2008-12-17 北京中星微电子有限公司 Method and apparatus for acquiring a fingertip trajectory
CN202600994U (en) * 2012-05-31 2012-12-12 刘建生 Camera-type point-reading machine and pen-tip module thereof
CN103763453A (en) * 2013-01-25 2014-04-30 陈旭 Image and text collection and recognition device
CN103236195A (en) * 2013-04-22 2013-08-07 中山大学 On-line touch-and-talk pen system and touch reading method thereof

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989344A (en) * 2015-02-26 2016-10-05 阿里巴巴集团控股有限公司 Barcode recognition method and device
CN104618741A (en) * 2015-03-02 2015-05-13 浪潮软件集团有限公司 Information pushing system and method based on video content
CN104850484A (en) * 2015-05-22 2015-08-19 成都千牛信息技术有限公司 Bitmap analysis based automatic determination method for character terminal interaction state
CN104850484B (en) * 2015-05-22 2017-11-24 成都千牛信息技术有限公司 Automatic determination method for character terminal interaction state based on bitmap analysis
CN105631051A (en) * 2016-02-29 2016-06-01 华南理工大学 Character recognition based mobile augmented reality reading method and reading system thereof
CN105956092A (en) * 2016-04-29 2016-09-21 广东小天才科技有限公司 Test question search method and device applied to electronic terminal
CN108733687A (en) * 2017-04-18 2018-11-02 陈伯妤 Information retrieval method and system based on text recognition
CN107273895A (en) * 2017-06-15 2017-10-20 幻视互动(北京)科技有限公司 Method for real-time text recognition and translation of video streams on head-mounted intelligent devices
CN107273895B (en) * 2017-06-15 2020-07-14 幻视互动(北京)科技有限公司 Method for recognizing and translating real-time text of video stream of head-mounted intelligent device
CN109323159A (en) * 2017-07-31 2019-02-12 科尼克自动化株式会社 Lighting-stand-type multimedia device
CN107465910A (en) * 2017-08-17 2017-12-12 康佳集团股份有限公司 Method and system for real-time pushing of AR information in combination with AR glasses
CN107748744B (en) * 2017-10-31 2021-01-26 广东小天才科技有限公司 Method and device for establishing drawing box knowledge base
CN107844552A (en) * 2017-10-31 2018-03-27 广东小天才科技有限公司 Drawing box knowledge base content providing method and device
CN107748744A (en) * 2017-10-31 2018-03-02 广东小天才科技有限公司 Method and device for establishing a drawing box knowledge base
CN107831896B (en) * 2017-11-07 2021-06-25 Oppo广东移动通信有限公司 Audio information playing method and device, storage medium and electronic equipment
CN107831896A (en) * 2017-11-07 2018-03-23 广东欧珀移动通信有限公司 Audio information playing method and device, storage medium and electronic equipment
CN108052581A (en) * 2017-12-08 2018-05-18 四川金英科技有限责任公司 Case video analysis and judgment device
CN108536287A (en) * 2018-03-26 2018-09-14 深圳市深晓科技有限公司 Method and device for reading according to user instruction
CN108536287B (en) * 2018-03-26 2021-03-02 深圳市同维通信技术有限公司 Method and device for reading according to user instruction
CN108897778A (en) * 2018-06-04 2018-11-27 四川创意信息技术股份有限公司 Image labeling method based on multi-source big data analysis
CN109033455A (en) * 2018-08-27 2018-12-18 深圳艺达文化传媒有限公司 Shot labeling method for promotional videos and related product
CN111144414A (en) * 2019-01-25 2020-05-12 邹玉平 Image processing method, related device and system
CN109947273A (en) * 2019-03-25 2019-06-28 广东小天才科技有限公司 Point reading positioning method and device
CN109947273B (en) * 2019-03-25 2022-04-05 广东小天才科技有限公司 Point reading positioning method and device
CN112307867A (en) * 2020-03-03 2021-02-02 北京字节跳动网络技术有限公司 Method and apparatus for outputting information
CN112001380A (en) * 2020-07-13 2020-11-27 上海翎腾智能科技有限公司 Method and system for recognizing Chinese semantic phrases in real-world scenes based on artificial intelligence
CN112001380B (en) * 2020-07-13 2024-03-26 上海翎腾智能科技有限公司 Method and system for recognizing Chinese semantic phrases in real-world scenes based on artificial intelligence
CN114187605A (en) * 2021-12-13 2022-03-15 苏州方兴信息技术有限公司 Data integration method and device and readable storage medium

Also Published As

Publication number Publication date
CN104199834B (en) 2018-11-27

Similar Documents

Publication Publication Date Title
CN104199834A (en) Method and system for interactively obtaining and outputting remote resources on surface of information carrier
CN111582241B (en) Video subtitle recognition method, device, equipment and storage medium
US8358320B2 (en) Interactive transcription system and method
CN111339246B (en) Query statement template generation method, device, equipment and medium
US20130108994A1 (en) Adaptive Multimodal Communication Assist System
CN104317827B (en) Picture-based navigation method for commodities
CN109344793A (en) Method, apparatus, device and computer-readable storage medium for recognizing in-air handwriting
CN109815955A (en) Question assistance method and system
CN110070089A (en) Calligraphy guidance method and device, computer equipment and medium
CN107783718A (en) Online handwriting input method and device based on paper homework/examinations
CN109858009A (en) Device, method and computer storage medium for generating control instructions from text
CN108509567B (en) Method and device for building digital culture content library
CN111610901B (en) AI vision-based English lesson auxiliary teaching method and system
Lu Mobile augmented reality technology for design and implementation of library document push system
CN112861750B (en) Video extraction method, device, equipment and medium based on inflection point detection
TWM537702U (en) Augmented reality learning and reference system and architecture thereof
Ouali et al. Real-time application for recognition and visualization of Arabic words with vowels based on DL and AR
CN102339535A (en) System and method for learning text
Karatzas et al. Human-Document Interaction Systems--A New Frontier for Document Image Analysis
CN115294573A (en) Job correction method, device, equipment and medium
Angrave et al. Creating TikToks, memes, accessible content, and books from engineering videos? First solve the scene detection problem
Li et al. A platform for creating Smartphone apps to enhance Chinese learning using augmented reality
Rai et al. MyOcrTool: visualization system for generating associative images of Chinese characters in smart devices
CN115035759B (en) Chinese character learning system based on tangible user interface and working method thereof
CN110033655A (en) Projection writing method and system based on a laser pen

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant