[Embodiments]
The technical solution is described in detail below in conjunction with specific embodiments and the accompanying drawings.
In one embodiment, as shown in Figure 1, a sign language recognition method comprises the following steps:
Step S10: capture an image containing a marked region.
In the present embodiment, the marked region is a region in the captured image, and this region can be formed by an interactive device.
Specifically, in one embodiment, the interactive device can be a hand-held device. Part or all of the hand-held device can be given a specified color or shape; when an image of the hand-held device is captured, the part with the specified color or shape in the image forms the marked region. Alternatively, the interactive device can be a hand-held device carrying a marker, that is, a marker of a specified color or shape (such as reflective material) is attached to the hand-held device; when an image of the hand-held device is captured, the attached marker of the specified color or shape in the image forms the marked region.
In another embodiment, the interactive device can also be a human body part (such as the face, a palm, or an arm); when an image of the human body is captured, the body part in the image forms the marked region. Alternatively, the interactive device can be a human body part carrying a marker, that is, a marker of a specified color or shape (such as reflective material) is attached to the body; when the image of the human body is captured, the marker of the specified color or shape in the image forms the marked region.
Step S20: identify the attitude of the marked region.
Specifically, the captured image is processed to extract the marked region, and the attitude of the marked region is then produced from the pixel coordinates of the pixels of the marked region in a constructed image coordinate system. The attitude refers to the posture that the marked region forms in the image. In a two-dimensional image, the attitude is the angle between the marked region and a preset position, i.e. an attitude angle; in a three-dimensional image, the attitude is a vector formed by multiple attitude angles between the marked region and a preset position, i.e. an attitude vector. The expressions "attitude produced by the marked region", "attitude of the marked region", and "attitude" used herein all refer to this attitude, namely the attitude angle or attitude vector of the respective embodiment.
Step S30: generate the steering instruction corresponding to the attitude.
In the present embodiment, mapping relations between attitudes of the marked region and steering instructions are preset and stored in a database. After the attitude of the marked region is identified, the steering instruction corresponding to the identified attitude can be looked up in the database.
Step S40: convert the steering instruction into natural language information.
Natural language information is language readily understood by a hearing person, such as Chinese, English, or Latin. A mapping table between steering instructions and natural language information can be preset and stored in a database; the natural language information corresponding to a steering instruction is then obtained by querying the database with that instruction.
For example, the preset mapping table between steering instructions and natural language information can be as shown in Table 1:
Table 1

| Steering instruction | Natural language information |
| -------------------- | ---------------------------- |
| command_cir          | Circle                       |
| command_heart        | Love                         |
| command_ok           | Good                         |
| ......               | ......                       |
Take a finger of the human body as the interactive device. When the user's finger traces a circle, the steering instruction command_cir representing a circle is generated, and querying the database yields the corresponding natural language information "circle". When the user's fingers form a heart shape, the steering instruction command_heart is generated, and the database query yields "love". When the user's forefinger and thumb touch and the other three fingers are extended to make an OK shape, the steering instruction command_ok is generated, and the database query yields "good".
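The database lookup of steps S30 and S40 can be sketched with an in-memory table; this is a minimal illustrative stand-in (the dictionary and function names are hypothetical), using the instruction names from Table 1:

```python
# Hypothetical in-memory stand-in for the preset instruction-to-natural-language
# mapping table stored in the database; contents follow Table 1 above.
INSTRUCTION_TABLE = {
    "command_cir": "circle",
    "command_heart": "love",
    "command_ok": "good",
}

def to_natural_language(instruction):
    """Return the natural language information mapped to a steering
    instruction, or None when no mapping is preset."""
    return INSTRUCTION_TABLE.get(instruction)
```

In a real system the table would live in a database and be queried per recognized instruction, but the lookup semantics are the same.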
In one embodiment, steering instructions can be arranged into a steering instruction sequence, and natural language information is generated from that sequence.
First, steps S10, S20, and S30 are executed once every preset sampling interval T to generate multiple steering instructions. The generated steering instructions are then arranged into a steering instruction sequence in order of generation, and natural language information is generated from that sequence.
For example, with the user's finger and arm as the interactive device, the attitude of the finger and arm is captured every 0.1 second, and a steering instruction identifying the attitude at that moment is generated. After multiple steering instructions are generated over a period of time, the sequence they compose, in order of generation, describes the trajectory of the user's finger and arm.
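The sampling loop described above (steps S10–S30 repeated every interval T) can be sketched as follows; the function names and the injected `capture`/`recognize`/`to_instruction` callables are hypothetical placeholders for the real capture and recognition stages:

```python
import time

def build_instruction_sequence(capture, recognize, to_instruction,
                               interval_s=0.1, count=10):
    """Execute steps S10-S30 once per sampling interval and collect the
    generated steering instructions in order of generation."""
    sequence = []
    for _ in range(count):
        image = capture()                          # step S10: capture image
        attitude = recognize(image)                # step S20: identify attitude
        sequence.append(to_instruction(attitude))  # step S30: generate instruction
        time.sleep(interval_s)                     # preset sampling interval T
    return sequence
```

With `interval_s=0.1` this matches the 0.1-second sampling of the finger-and-arm example above.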
Further, a feature value of the steering instruction sequence can be calculated, and natural language information is generated according to preset mapping relations between feature values and natural language information.
A steering instruction comprises an attitude part representing attitude information and a coordinate part representing the coordinates of that attitude in the image. With a finger and arm as the interactive device, the attitude part can represent the shape formed by the attitude of the finger and arm, or an abstract vector graphic, such as an open five-finger shape or a ring enclosed by the fingers. The coordinate part records the position in the image of the shape formed by the finger or arm. When the image is two-dimensional, the coordinates are two-dimensional; when the image is three-dimensional, the coordinates are three-dimensional.
A feature value represents the features common to steering instruction sequences with a certain similarity. A posture feature value can be used to represent the variation of the attitude parts in the sequence, and a coordinate feature value to represent the variation of the coordinate parts. The posture feature value and the coordinate feature value together compose the feature value of the steering instruction sequence.
Feature values and natural language information have preset mapping relations. For example, the feature value 2(1F@1L) can be mapped in advance to the natural language "thanks". This feature value is based on the change of the attitude parts of the sequence: it represents an extended thumb (the 1F in the feature value) becoming a bent thumb (the 1L), bent twice (the bracket multiplied by 2). Likewise, the feature value 5F_y-down can be mapped in advance to the natural language "press". This feature value is based on the change of the coordinate parts of the sequence: it represents a palm (the 5F) pressing from top to bottom (the y-down). Further, repeated steering instructions in the sequence can first be removed before the feature value is calculated; removing the repeats reduces the amount of computation.
Further, steering instructions whose difference from adjacent instructions exceeds a threshold can first be removed before the feature value of the sequence is calculated. When the preset sampling interval is small, if the attitude part or coordinate part of a steering instruction differs from its adjacent instructions by more than the threshold, that instruction is judged erroneous. Erroneous instructions are filtered out of the sequence and are not used in calculating its feature value.
For example, with the user's finger as the input device, suppose the captured steering instruction sequence is [3F(0,0), 3F(1,0), 3F(2,0), 2F(3,0), 3F(4,0), 3F(5,0)], where the attitude part 3F represents a three-finger attitude, 2F a two-finger attitude, and the coordinate parts (0,0)-(5,0) the coordinates of the finger attitude in the captured images. This sequence represents the trajectory formed by a horizontal translation of three of the user's fingers, each instruction being a sample point on the trajectory. The instruction 2F(3,0), perhaps because the user's fingers momentarily closed while moving, differs markedly in finger count (the attitude part) from its neighbours 3F(2,0) and 3F(4,0); it is therefore judged erroneous and removed from the sequence. A large difference between an instruction and its neighbours usually arises because a nonstandard sign language gesture was captured, or because the interactive device was occluded while moving so its complete shape could not be obtained. In deaf-mute sign language, for example, some gestures require the two hands to move alternately, and occlusion can occur during the movement; removing instructions whose gap from their neighbours exceeds the threshold (a different shape, or a coordinate distance above the threshold) therefore makes the feature value calculation more accurate.
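The two filtering passes described above can be sketched as follows. This is an illustrative simplification: instructions are modelled as (attitude, coordinate) tuples, and "differs from its neighbours" is reduced to a mismatch of the attitude part against both adjacent instructions, as in the 2F(3,0) example:

```python
def remove_repeats(seq):
    """Drop consecutive duplicate steering instructions to reduce computation."""
    return [ins for i, ins in enumerate(seq) if i == 0 or ins != seq[i - 1]]

def remove_outliers(seq):
    """Drop instructions whose attitude part differs from BOTH adjacent
    instructions; such isolated mismatches are judged erroneous captures."""
    kept = []
    for i, (att, xy) in enumerate(seq):
        prev_att = seq[i - 1][0] if i > 0 else att
        next_att = seq[i + 1][0] if i + 1 < len(seq) else att
        if att != prev_att and att != next_att:
            continue  # isolated outlier, e.g. 2F between two 3F samples
        kept.append((att, xy))
    return kept
```

Running `remove_outliers` on the example sequence drops 2F(3,0) and keeps the five 3F samples. A fuller implementation would also compare the coordinate parts against a distance threshold.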
When calculating the feature value of a steering instruction sequence, first calculate the posture feature value of each instruction in the sequence, then segment the sequence by posture feature value so that the instructions in each resulting steering instruction subsequence share the same posture feature value. Next, calculate the coordinate feature value of each subsequence from the monotonicity of the coordinates described by the coordinate parts of its instructions, or from the variance of the distances to a reference coordinate in the image. Finally, integrate the posture feature values and coordinate feature values of all subsequences into the feature value of the whole sequence.
For example, with a finger as the interactive device, the user makes the following motion:
extend one finger and translate it from left to right; on reaching a certain position, open all five fingers and trace a quarter circle; finally hold still, bend the forefinger and thumb, and touch their tips together to form an OK-shaped gesture.
The captured steering instruction sequence is [1F(0,0), 1F(1,0), 1F(2,0), 1F(3,0), 5F(2.5,1.5), 5F(1.5,2.5), 5F(0,3), OK(0,3), OK(0,3), OK(0,3)], where 1F is the posture feature value of one finger, 5F that of five fingers, OK that of the OK-shaped gesture, and the numbers in brackets are the attitude's coordinates in the image. After removing repeated instructions and instructions differing from their neighbours by more than the threshold, the sequence is divided into three subsequences according to posture feature value:
Subsequence 1: [1F(0,0), 1F(1,0), 1F(2,0), 1F(3,0)].
Subsequence 2: [5F(2.5,1.5), 5F(1.5,2.5), 5F(0,3)].
Subsequence 3: [OK(0,3)].
The coordinate feature value of subsequence 1 is x-right, meaning the coordinates of subsequence 1 are monotonic with the abscissa (x-axis) increasing and the ordinate (y-axis) constant. The coordinate feature value of subsequence 2 is q-cir, meaning the variance of the distances from the coordinates of subsequence 2 to the reference point (0,0) is below the threshold, i.e. the trajectory is a quarter circle centred on (0,0). The coordinate feature value of subsequence 3 is hold, meaning the final attitude is held still.
The posture feature values and coordinate feature values of subsequences 1, 2, and 3 are then integrated, giving the feature value of the steering instruction sequence: 1F_x-right@5F_q-cir@OK_hold, where _ is the separator between a posture feature value and its coordinate feature value, and @ separates subsequences.
After the feature value of the steering instruction sequence is obtained, the natural language corresponding to it is retrieved according to the preset mapping relations between feature values and natural language. If, as in the example above, the natural language corresponding to 1F_x-right@5F_q-cir@OK_hold is "perfect", then the sign language the user made, namely extending one finger and translating it from left to right, opening the five fingers and tracing a quarter circle, then holding still and touching forefinger and thumb tips to form the OK-shaped gesture, has been converted into the natural language "perfect".
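The segmentation and feature-value assembly described above can be sketched as follows. The coordinate classification here is a deliberately simplified stand-in for the monotonicity/variance analysis (the variance threshold and the `other` fallback label are hypothetical), but it reproduces the worked example:

```python
import statistics
from itertools import groupby

def coord_feature(coords, reference=(0, 0), var_threshold=0.5):
    """Classify a subsequence's coordinate trajectory: hold (single point),
    x-right (abscissa increasing, ordinate constant), or q-cir (distances
    to the reference point have low variance, i.e. an arc around it)."""
    if len(coords) == 1:
        return "hold"
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    if all(b > a for a, b in zip(xs, xs[1:])) and len(set(ys)) == 1:
        return "x-right"
    rx, ry = reference
    dists = [((x - rx) ** 2 + (y - ry) ** 2) ** 0.5 for x, y in coords]
    if statistics.pvariance(dists) < var_threshold:
        return "q-cir"
    return "other"

def sequence_feature_value(seq):
    """Segment by posture feature value, derive one coordinate feature value
    per segment, and join segments as posture_coord@posture_coord@..."""
    parts = []
    for posture, group in groupby(seq, key=lambda ins: ins[0]):
        coords = [xy for _, xy in group]
        parts.append(posture + "_" + coord_feature(coords))
    return "@".join(parts)
```

Applied to the de-duplicated example sequence, this yields the feature value 1F_x-right@5F_q-cir@OK_hold.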
It should be noted that sign language herein is not limited to standard deaf-mute sign language; it can also be user-defined sign language. The correspondence between sign language and natural language depends on the preset mapping relations between feature values and natural language.
Further, when arranging steering instructions into a steering instruction sequence, the instructions can be arranged into a queue in order of generation; when a preset number of consecutive identical instructions is detected at the end of the queue, the arrangement is complete and the steering instruction sequence is generated from the queue.
A buffer area is preset; whenever a steering instruction is generated it is stored in the buffer in order, forming a queue. Meanwhile the buffer is monitored: when the preset number of consecutive identical instructions appears at the end of the queue, such as the instruction OK(0,3) in the example above, the queue arrangement is complete, and the queue is taken out of the buffer to form the steering instruction sequence.
Delimiting steering instruction sequences by runs of identical instructions makes it convenient to distinguish multiple sign language gestures (analogous to the pauses between English words), thereby avoiding misrecognition when several gestures are made.
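The buffer-and-queue mechanism described above can be sketched as follows; the function name and the default repeat count of 3 are illustrative assumptions:

```python
def collect_sequence(instruction_stream, repeat_count=3):
    """Append each generated instruction to a queue; when the last
    repeat_count entries are identical (a held gesture, e.g. the repeated
    OK(0,3) above), the arrangement is complete and the queue is emitted
    as one steering instruction sequence."""
    queue = []
    for ins in instruction_stream:
        queue.append(ins)
        tail = queue[-repeat_count:]
        if len(tail) == repeat_count and len(set(tail)) == 1:
            return queue  # run of identical instructions ends the sequence
    return queue
```

A production version would keep consuming the stream after emitting, starting a fresh queue per gesture; the sketch shows a single cut.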
In one embodiment, after the steering instruction is converted into natural language information, the natural language information is presented as text and/or audio.
The natural language information can be shown as text on a display screen, or played back through an audio device.
For example, after the steering instruction is converted into natural language information, the information is encoded and the code is sent to a remote terminal (a mobile phone, computer, television, etc.); on receiving it, the remote terminal first decodes it back into natural language information, then shows it as text on a display screen or plays it as speech through an audio device.
Presenting natural language information as text and/or audio makes it easy for others to understand. By encoding the natural language information and sending it over a telephone network, the Internet, or a television network to a remote terminal that decodes and presents it, a deaf-mute person can communicate remotely with a hearing person who does not understand sign language, facilitating communication between them.
As shown in Figure 2, in one embodiment the captured image containing the marked region is a two-dimensional image, and step S20 specifically comprises:
Step S202: extract the pixels in the image that match a preset color model, perform connected domain detection on the obtained pixels, and extract the marked region from the detected connected domains.
Specifically, the image containing the marked region is captured by a camera, yielding a two-dimensional visible-light image. Preferably, an infrared filter can be added in front of the camera lens to eliminate light outside the infrared band, so that the captured image is a two-dimensional infrared image. In a visible-light image, objects in the scene can interfere with recognition of the marked region, whereas an infrared image, having filtered out the visible-light information, suffers less interference; a two-dimensional infrared image is therefore more conducive to extracting the marked region.
In the present embodiment, a color model is established in advance. For example, if the color of the marked region is red, a red model is established in which the R component of a pixel's RGB value lies between 200 and 255 while the G and B components are close to zero; pixels in the captured image whose RGB values satisfy this red model are taken as red pixels. Alternatively, when the marked region in the captured image is formed by a human body part, the pixels matching a preset skin color model are obtained. Connected domain detection is then performed on the obtained pixels, yielding multiple connected domains, a connected domain being a set of contiguous pixels.
In the present embodiment, because the size and shape of the marked region should be roughly fixed, the perimeter and/or area of every connected domain among the obtained pixels can be calculated during connected domain detection. Specifically, the perimeter of a connected domain can be the number of its boundary pixels, and its area the number of all pixels it contains. The perimeter and/or area of each obtained connected domain is then compared with the preset perimeter and/or area of the marked region, and the connected domain satisfying them is taken as the marked region. Preferably, the ratio of the perimeter squared to the area can also be used as the criterion: a connected domain whose ratio matches that of the preset marked region is the marked region.
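The perimeter-squared-to-area criterion mentioned above is scale-invariant, which is why it suits a marked region whose apparent size varies with distance. A minimal sketch (the function name and the 20% tolerance are hypothetical):

```python
def is_marked_region(perimeter, area, expected_ratio, tolerance=0.2):
    """Judge a connected domain by the scale-invariant perimeter^2/area
    ratio against the ratio of the preset marked region; e.g. a square of
    side s has ratio (4s)^2 / s^2 = 16 at any scale."""
    ratio = perimeter ** 2 / area
    return abs(ratio - expected_ratio) / expected_ratio <= tolerance
```

A 10x10 square region (perimeter 40, area 100) matches a preset ratio of 16, while a long thin region of the same area does not.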
Step S204: obtain the pixel coordinates in the marked region and produce the attitude of the marked region from these coordinates.
Specifically, in one embodiment, as shown in Figure 3, the interactive device comprises a handle portion and a marker attached to the handle portion; the marker can be an elongated strip of reflective material, preferably elliptical or rectangular in shape. In other embodiments the interactive device can be a human body part, such as the face, a palm, or an arm, in which case the marked region in the captured image is the region of that body part.
In the present embodiment, the marked region is a single continuous region, and the attitude is produced from the pixel coordinates as follows: calculate the covariance matrix of the pixel coordinates, obtain the eigenvector corresponding to the covariance matrix's largest eigenvalue, and produce the attitude of the marked region from that eigenvector; here the attitude of the marked region is an attitude angle.
Specifically, as shown in Figure 4, a two-dimensional image coordinate system is built. For two points A(u1, v1) and B(u2, v2) in this coordinate system, the attitude angle they form is the arctangent of the slope, i.e. arctan((v2-v1)/(u2-u1)). Specifically, in the present embodiment, the covariance matrix of the pixel coordinates in the extracted marked region is calculated, and the eigenvector corresponding to the covariance matrix's largest eigenvalue is obtained; the direction of this eigenvector is the direction of the straight line along the marked region's major axis. In Figure 4, the major-axis direction is the direction of the line through points A and B. Let the eigenvector be [dir_u, dir_v]^T, where dir_u describes the projection of the major-axis direction on the u axis, its absolute value proportional to the projection on the u coordinate axis (i.e. u2-u1) of the vector pointing from A to B, and dir_v describes the projection of the major-axis direction on the v axis, its absolute value proportional to the projection on the v coordinate axis (i.e. v2-v1) of the vector pointing from A to B. If dir_u is less than 0, the eigenvector is corrected to [-dir_u, -dir_v]^T. The attitude angle of the marked region is then arctan(dir_v/dir_u).
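The covariance-eigenvector construction above can be sketched in pure Python for the 2x2 case; a real implementation would likely use numpy's `eigh`, and the sign convention (forcing dir_u non-negative) follows the correction step described:

```python
import math

def attitude_angle(pixels):
    """Attitude angle (degrees) of one continuous marked region: the
    principal axis of its pixel coordinates, via the largest-eigenvalue
    eigenvector of their 2x2 covariance matrix, then arctan(dir_v/dir_u)."""
    n = len(pixels)
    mu = sum(u for u, _ in pixels) / n
    mv = sum(v for _, v in pixels) / n
    cuu = sum((u - mu) ** 2 for u, _ in pixels) / n
    cvv = sum((v - mv) ** 2 for _, v in pixels) / n
    cuv = sum((u - mu) * (v - mv) for u, v in pixels) / n
    # Largest eigenvalue of [[cuu, cuv], [cuv, cvv]] (closed form for 2x2).
    lam = (cuu + cvv) / 2 + math.sqrt(((cuu - cvv) / 2) ** 2 + cuv ** 2)
    if abs(cuv) > 1e-12:
        dir_u, dir_v = lam - cvv, cuv   # eigenvector for eigenvalue lam
    else:
        dir_u, dir_v = (1.0, 0.0) if cuu >= cvv else (0.0, 1.0)
    if dir_u < 0:                       # sign correction described above
        dir_u, dir_v = -dir_u, -dir_v
    return math.degrees(math.atan2(dir_v, dir_u))
```

Pixels lying along a 45-degree diagonal give an attitude angle of 45 degrees; a horizontal run gives 0.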
In another embodiment, the marked region comprises a first continuous region and a second continuous region, and the attitude is produced from the pixel coordinates as follows: calculate the centroid of the first continuous region and the centroid of the second continuous region from the pixel coordinates, then produce the attitude of the marked region from the pixel coordinates of the two centroids. Specifically, in one embodiment, the interactive device comprises a handle portion and two markers attached to the handle portion. As shown in Figure 5, the two markers are attached to the front end of the handle portion; each marker can be elliptical or rectangular in shape, and preferably the markers are two dots at the front end of the handle portion. As shown in Figure 6, the markers can instead be arranged at the two ends of the handle portion. In other embodiments, the markers can be arranged on the human body, for example on the face, a palm, or an arm. It should be noted that the two markers may differ in features such as size, shape, and color.
In the present embodiment, the extracted marked region comprises two continuous regions, namely the first continuous region and the second continuous region. The centroid of each is calculated from the pixel coordinates: the mean of all pixel coordinates in the continuous region is the centroid of that region. As shown in Figure 4, with the two calculated centroids A(u1, v1) and B(u2, v2), the attitude angle of the marked region is the arctangent of the slope, i.e. arctan((v2-v1)/(u2-u1)).
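The two-centroid construction can be sketched directly (function name is illustrative); `atan2` is used so vertical lines are handled without a division by zero:

```python
import math

def attitude_angle_two_regions(region1, region2):
    """Attitude angle (degrees) from the centroids A(u1,v1) and B(u2,v2)
    of the two continuous regions: arctan((v2-v1)/(u2-u1))."""
    def centroid(pixels):
        n = len(pixels)
        return (sum(u for u, _ in pixels) / n,
                sum(v for _, v in pixels) / n)

    (u1, v1), (u2, v2) = centroid(region1), centroid(region2)
    return math.degrees(math.atan2(v2 - v1, u2 - u1))
```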
In another embodiment, the captured image can be a three-dimensional image. Specifically, the three-dimensional image (i.e. a three-dimensional depth image) can be captured with a traditional stereo vision system (two cameras with known spatial positions plus a data processing device), a structured-light system (a camera, a light source, and a data processing device), or a TOF (time of flight) depth camera.
In the present embodiment, as shown in Figure 7, step S20 specifically comprises:
Step S210: segment the image, extract the connected domains in it, calculate the property values of the connected domains, and compare them with the preset property values of the marked region; the marked region is the connected domain satisfying the preset property values.
Specifically, when the depth difference between two adjacent pixels in the three-dimensional depth image is less than a preset threshold, for example 5 centimetres, the two pixels are considered connected. Performing connected domain detection on the whole image yields a series of connected domains that includes the marker's connected domain.
In the present embodiment, the property values of a connected domain include its size and shape. Specifically, the size/shape of each connected domain is calculated and compared with the size/shape of the marker on the interactive device; the connected domain whose size/shape matches the marker's is the connected domain of the marked region. Take a rectangular marker as an example: the marker on the interactive device appears rectangular in the captured image. With the length and width of the marker preset, the length and width of the physical region corresponding to a connected domain are calculated; the closer they are to the marker's length and width, the more similar the connected domain is to the marked region.
Further, the length and width of the physical region corresponding to a connected domain are calculated as follows: compute the covariance matrix of the three-dimensional coordinates of the connected domain's pixels, then apply

l = k·√λ

where k is a preset coefficient, for example set to 4; when λ is the covariance matrix's largest eigenvalue, l is the length of the connected domain, and when λ is the covariance matrix's second largest eigenvalue, l is the width of the connected domain.
Further, the aspect ratio of the rectangular marker can also be preset, for example an aspect ratio of 2; the closer the aspect ratio of the physical region corresponding to a connected domain is to the preset aspect ratio of the rectangular marker, the more similar the connected domain is to the marked region. Specifically, the aspect ratio of the physical region corresponding to the connected domain is calculated as

r = √(λ0/λ1)

where r is the aspect ratio of the connected domain, λ0 is the largest eigenvalue of the covariance matrix, and λ1 is the second largest eigenvalue of the covariance matrix.
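Given the two covariance eigenvalues, the size and aspect-ratio checks above reduce to a couple of lines; a minimal sketch, with the formulas written as l = k·√λ and r = √(λ0/λ1) per the surrounding description:

```python
import math

def region_length_width(eigen_max, eigen_second, k=4):
    """Length and width of the physical region for a connected domain:
    l = k * sqrt(lambda), with the preset coefficient k (e.g. 4); the
    largest eigenvalue gives the length, the second largest the width."""
    return k * math.sqrt(eigen_max), k * math.sqrt(eigen_second)

def region_aspect_ratio(eigen_max, eigen_second):
    """Aspect ratio r = sqrt(lambda0 / lambda1); note k cancels out."""
    return math.sqrt(eigen_max / eigen_second)
```

For eigenvalues 4 and 1 with k = 4 this gives length 8, width 4, and aspect ratio 2, matching the preset example ratio above.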
Step S220: obtain the pixel coordinates in the marked region and produce the attitude of the marked region from these coordinates.
Specifically, in the present embodiment, the attitude of the marked region is an attitude vector. As shown in Figure 8, a three-dimensional image coordinate system is built as a right-handed coordinate system. For a space vector OP whose projection on the plane XOY is p, the attitude vector of OP in polar coordinates is [α, θ]^T, where α is the angle XOp, i.e. the angle from the X axis to Op, with a range of 0 to 360 degrees, and θ is the angle pOP, i.e. the angle between OP and the XOY plane, with a range of -90 to 90 degrees. For two points A(x1, y1, z1) and B(x2, y2, z2) on a space ray in this coordinate system, the attitude vector [α, θ]^T is uniquely determined by:

tan α = (y2 - y1)/(x2 - x1)   (1)

tan θ = (z2 - z1)/√((x2 - x1)² + (y2 - y1)²)   (2)
In the present embodiment, after the marked region is extracted, the covariance matrix of the pixel coordinates in the marked region is calculated, the eigenvector corresponding to the covariance matrix's largest eigenvalue is obtained, and this eigenvector is converted into the attitude vector. Specifically, let the obtained eigenvector be [dir_x, dir_y, dir_z]^T, where dir_x represents the distance between the two points along the x axis, dir_y the distance along the y axis, and dir_z the distance along the z axis. The ray described by this vector can be considered to pass through two points, namely (0, 0, 0) and (dir_x, dir_y, dir_z); that is, the ray starts from the origin and points toward (dir_x, dir_y, dir_z). The attitude angles must then satisfy formulas (1) and (2) above; setting x1 = 0, y1 = 0, z1 = 0, x2 = dir_x, y2 = dir_y, z2 = dir_z in formulas (1) and (2) yields the attitude vector [α, θ]^T.
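The conversion from a direction (dir_x, dir_y, dir_z) to the attitude vector [α, θ] can be sketched as follows; `atan2` resolves the quadrant so α covers the full 0 to 360 degree range, and the function name is illustrative:

```python
import math

def attitude_vector(dir_x, dir_y, dir_z):
    """[alpha, theta] for the ray from the origin toward (dir_x, dir_y, dir_z):
    alpha is the angle from the X axis to the projection Op in the XOY
    plane (0..360 degrees); theta is the angle between OP and the XOY
    plane (-90..90 degrees)."""
    alpha = math.degrees(math.atan2(dir_y, dir_x)) % 360.0
    theta = math.degrees(math.atan2(dir_z, math.hypot(dir_x, dir_y)))
    return alpha, theta
```

A ray in the XOY plane along the diagonal gives α = 45, θ = 0; a ray rising at 45 degrees in the XOZ plane gives α = 0, θ = 45.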
In one embodiment, the marked region is a single continuous region, and the attitude is produced from the pixel coordinates as follows: calculate the covariance matrix of the pixel coordinates, obtain the eigenvector corresponding to the covariance matrix's largest eigenvalue, and produce the attitude of the marked region from that eigenvector. As described above, the attitude of the marked region here is an attitude vector.
In another embodiment, the marked region comprises a first continuous region and a second continuous region, and the attitude is produced from the pixel coordinates as follows: calculate the centroid of the first continuous region and the centroid of the second continuous region from the pixel coordinates, then calculate the attitude of the marked region from the pixel coordinates of the two centroids. As shown in Figure 8, in the present embodiment the pixel coordinates in the marked region are three-dimensional; specifically, the attitude produced from the pixel coordinates of the two calculated centroids is an attitude vector.
In one embodiment, before the step of identifying the attitude of the marked region, the method can further comprise a step of judging whether the captured image is a two-dimensional or three-dimensional image. Specifically, if the captured image is a two-dimensional image, steps S202 to S204 above are performed; if it is a three-dimensional image, steps S210 to S220 above are performed.
As shown in Figure 9, in one embodiment step S30 specifically comprises:
Step S302: obtain the attitude of the marked region in the current frame image.
As described above, the attitude obtained in step S302 can be the attitude (i.e. attitude angle) of the marked region in the current frame of a two-dimensional image, or the attitude (i.e. attitude vector) of the marked region in the current frame of a three-dimensional depth image. In the present embodiment, mapping relations between attitudes and steering instructions are preset. This attitude can also be called an absolute attitude.
Step S304: generate the steering instruction corresponding to the attitude according to the preset mapping relations between attitudes and steering instructions.
For example, suppose the steering instructions are a left mouse button instruction and a right mouse button instruction, and take a two-dimensional image, where the attitude angle ranges from -180 to 180 degrees. It can be preset that an attitude angle in the current frame within the range (a, b) triggers the left button instruction, and an attitude angle within the range (c, d) triggers the right button instruction, where a, b, c, d are all preset angles satisfying a < b and c < d, and the intersection of the sets [a, b] and [c, d] is empty.
In a three-dimensional image, the identified attitude comprises two attitude angles; either one of them, or both, can be used to obtain the steering instruction. The method and principle of using one attitude angle are similar to the two-dimensional case and are not repeated here. When both attitude angles are used, it can be arranged that the steering instruction is triggered only when both attitude angles fall within their preset trigger ranges.
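The absolute-attitude mapping above can be sketched as a simple range check; the concrete ranges (30, 60) and (-60, -30) are illustrative stand-ins for the preset (a, b) and (c, d):

```python
def instruction_for_angle(angle, left_range=(30, 60), right_range=(-60, -30)):
    """Map an attitude angle (degrees, -180..180) to a steering instruction
    via preset, non-overlapping trigger ranges; returns None when the
    angle falls in neither range."""
    a, b = left_range
    c, d = right_range
    if a <= angle <= b:
        return "left_button"
    if c <= angle <= d:
        return "right_button"
    return None
```

The two-angle three-dimensional case would simply AND two such range checks before triggering.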
As shown in Figure 10, in another embodiment the captured images containing the marked region form an image sequence, and step S30 specifically comprises:
Step S310: obtain the relative attitude between the attitude of the marked region in the current frame image and its attitude in the previous frame image.
In the present embodiment, an image sequence composed of multiple images containing the marked region can be captured in real time. As described above, the attitudes obtained in step S310 can be the attitude angles of the marked region in the current and previous frames, or its attitude vectors in the current and previous frames. The relative attitude is the difference between the attitude in the current frame and the attitude in the previous frame.
Step S320, the relative attitude according to presetting generates the steering order corresponding with this relative attitude with the mapping relations between steering order.
Such as, for two dimensional image, relative attitude is relative attitude angle, the attitude angle that can preset current frame image is greater than 30 degree than the attitude angle increase of previous frame, namely when relative attitude angle is greater than 30 degree, then the instruction of the roller roll counter-clockwise of mouse is triggered, when the attitude angle of current frame image is greater than 40 degree than the attitude angle minimizing of previous frame, namely relative attitude angle is less than-40 when spending, then the instruction that the roller triggering mouse rolls clockwise.The principle of 3-D view is similar with it, then repeats no more at this.
In a three-dimensional image, the identified attitude comprises two attitude angles; either one of them, or both together, can be used to obtain the control instruction. Using one of the attitude angles follows the same method and principle as the two-dimensional case, so it is not repeated here. When both attitude angles are used, it can be arranged that the control instruction is triggered only when the changes of both attitude angles satisfy preset conditions, for example when the change of the first attitude angle exceeds a preset first threshold and the change of the second attitude angle exceeds a preset second threshold.
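A minimal sketch of the two-angle condition, assuming both changes are compared by magnitude; the threshold values here are arbitrary assumptions for illustration.

```python
FIRST_THRESHOLD = 15.0   # assumed preset first threshold (degrees)
SECOND_THRESHOLD = 10.0  # assumed preset second threshold (degrees)

def should_trigger(delta_angle1, delta_angle2):
    """Trigger only when both attitude-angle changes exceed their
    respective preset thresholds."""
    return (abs(delta_angle1) > FIRST_THRESHOLD
            and abs(delta_angle2) > SECOND_THRESHOLD)
```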
In one embodiment, as shown in Figure 11, a sign language recognition system comprises an image capture module 10, a gesture recognition module 20, an instruction generation module 30 and an instruction conversion module 40, wherein:
The image capture module 10 is used for collecting an image containing a marked region.
In this embodiment, the marked region is a region in the collected image, and this region can be formed by an interactive device. Specifically, in one embodiment, the interactive device can be a handheld device, part or all of which is set to a specified color or shape; when the image of the handheld device is collected, the part of the specified color or shape in the image forms the marked region. In addition, the interactive device can also be a handheld device with a marker, i.e. a marker of a specified color or shape (such as reflective material) is attached to the handheld device; when the image of the handheld device is collected, the attached marker of the specified color or shape in the image forms the marked region.
In another embodiment, the interactive device can also be a human body part (such as the face, palm or arm); the image of the body part is collected, and the body part in the image forms the marked region. In addition, the interactive device can also be a body part with a marker, i.e. a marker of a specified color or shape (such as reflective material) is attached to the body part; when the image is collected, the marker of the specified color or shape in the image forms the marked region.
The gesture recognition module 20 is used for identifying the attitude of the marked region.
Specifically, the collected image is processed to extract the marked region, and the attitude of the marked region is then obtained according to the pixel coordinates of the pixels of the marked region in a constructed image coordinate system. The so-called attitude refers to the posture state formed by the marked region in the image. Further, in a two-dimensional image, the attitude is the angle between the marked region and a preset position, i.e. an attitude angle; in a three-dimensional image, the attitude is the vector formed by the multiple attitude angles between the marked region and the preset position, i.e. an attitude vector. The expressions "attitude produced by the marked region" and "attitude of the marked region" used in the present invention all refer to this attitude, namely the attitude angle or attitude vector of the different embodiments.
The instruction generation module 30 is used for generating the control instruction corresponding to the attitude.
In this embodiment, mapping relations between attitudes of the marked region and control instructions are preset and stored in a database (not shown). After the attitude of the marked region is identified, the instruction generation module 30 can look up the control instruction corresponding to the attitude identified by the gesture recognition module 20 from the database.
The instruction conversion module 40 is used for converting the control instruction into natural language information.
Natural language information is information in a language easily understood by hearing people, such as Chinese, English or Latin. A mapping table between control instructions and natural language information can be preset and stored in the database; the natural language information corresponding to a control instruction is then obtained by querying the database with that instruction.
For example, the preset mapping table between control instructions and natural language information can be as shown in Table 2:
Table 2
Control instruction | Natural language information
command_cir        | Circle
command_heart      | Love
command_ok         | Alright
......             | ......
Taking a finger of the human body as the interactive device: when the user's finger traces a circle, the control instruction command_cir representing a circle is generated, and the corresponding natural language information "Circle" is obtained by querying the database. When the user's fingers enclose a heart shape, the control instruction command_heart is generated, and the corresponding natural language information "Love" is obtained from the database. When the user's index finger and thumb touch and the other three fingers extend to form an OK gesture, the control instruction command_ok is generated, and the corresponding natural language information "Alright" is obtained from the database.
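The Table 2 lookup can be sketched with an in-memory dictionary standing in for the database that stores the mapping; only the three example entries from Table 2 are used.

```python
# Stand-in for the database table of control instructions
# and their natural language information (Table 2).
COMMAND_TO_TEXT = {
    "command_cir": "Circle",
    "command_heart": "Love",
    "command_ok": "Alright",
}

def to_natural_language(control_instruction):
    """Query the mapping; return None for an unknown instruction."""
    return COMMAND_TO_TEXT.get(control_instruction)
```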
In one embodiment, the instruction conversion module 40 can also be used for arranging control instructions into a control instruction sequence and generating the natural language information according to the sequence.
A sampling interval T can be preset; a control instruction is then generated every interval T, the generated instructions are arranged into a control instruction sequence in the order of generation, and the natural language information is generated according to the sequence.
For example, the user's finger and arm can serve as the interactive device; the attitude of the finger and arm is then captured every 0.1 second, and a control instruction identifying the attitude at that moment is generated. After multiple control instructions have been generated over a period of time, the sequence they compose, in order of generation, describes the motion trajectory of the user's finger and arm.
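Assembling the sequence can be sketched as follows, where `recognize` is a hypothetical stand-in for the whole attitude-recognition and instruction-lookup pipeline applied to each sampled frame.

```python
def build_sequence(frames, recognize):
    """Arrange the control instructions generated for each sampled
    frame into a control instruction sequence, preserving the order
    of generation (one frame is captured every interval T)."""
    return [recognize(frame) for frame in frames]
```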
Further, the instruction conversion module 40 can also be used for calculating a feature value of the control instruction sequence and generating the natural language information according to preset mapping relations between feature values and natural language information.
A control instruction can contain two parts of information: an attitude part representing the attitude, and a coordinate part representing the coordinate information of that attitude in the image. When a finger and arm are used as the interactive device, the attitude part can represent the shape, or an abstract vector graphic, formed by the attitude of the finger and arm, such as an open five-finger shape or a ring enclosed by the fingers. The coordinate part describes the position in the image of the shape formed by the finger or arm. When the image is two-dimensional the coordinate is a two-dimensional coordinate; when the image is three-dimensional it is a three-dimensional coordinate.
A feature value represents the features shared by control instruction sequences with a certain similarity. A posture feature value can represent the variation of the attitude parts in the sequence, and a coordinate feature value the variation of the coordinate parts. The feature value of the control instruction sequence is then composed of the posture feature value and the coordinate feature value.
Feature values and natural language information have preset mapping relations. For example, the feature value 2(1F@1L) can be mapped in advance to the natural language "thanks". This feature value is based on the change of the attitude parts of the sequence: it represents an extended thumb (the 1F in the feature value) changing to a bent thumb (the 1L), bent twice (the multiplication by 2 outside the bracket). Likewise, the feature value 5F_y-down can be mapped in advance to the natural language "press". This feature value is based on the change of the coordinate parts of the sequence: it represents a palm (the 5F) pressing from top to bottom (the y-down).
Further, the instruction conversion module 40 can also first remove repeated control instructions from the sequence and then calculate its feature value. Removing repeated instructions reduces the amount of computation.
Further, the instruction conversion module 40 can also first remove from the sequence any control instruction whose difference from its adjacent instructions exceeds a threshold, and then calculate the feature value. When the preset sampling interval is small, if the attitude part or coordinate part of a control instruction differs from its adjacent instructions by more than the threshold, that instruction is judged to be erroneous. Erroneous instructions are filtered out of the sequence and are not used in calculating its feature value.
For example, when the user uses a finger as the input device, suppose the obtained control instruction sequence is [3F(0,0), 3F(1,0), 3F(2,0), 2F(3,0), 3F(4,0), 3F(5,0)], where the attitude parts 3F and 2F represent three-finger and two-finger attitudes respectively, and the coordinate parts (0,0)-(5,0) are the coordinates of the finger attitude in the captured image. This sequence represents the trajectory formed by the horizontal translation of the user's three fingers, each instruction being a sampled point of the trajectory. The instruction 2F(3,0) differs greatly in finger count (the attitude part) from its neighbours 3F(2,0) and 3F(4,0), possibly because the user's fingers momentarily closed during the movement; it is therefore judged to be erroneous and removed from the sequence.
A large difference between a control instruction and its neighbours is usually caused by capturing a non-standard sign gesture, or by the interactive device being occluded while moving so that its shape cannot be obtained completely. For example, some gestures in deaf-mute sign language require the two hands to move alternately, and occlusion may occur during the movement. Removing instructions whose gap from their neighbours exceeds the threshold (a different shape, or a coordinate distance greater than the threshold) therefore makes the feature value calculation more accurate.
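The two clean-up steps can be sketched on control instructions of the form `(gesture, (x, y))`: consecutive duplicates are removed, then any interior instruction whose attitude part differs from both neighbours, or whose coordinate jump from both neighbours exceeds a threshold, is dropped as erroneous. The exact thresholding rule is an assumption for illustration.

```python
def clean_sequence(seq, max_jump=1.5):
    """Remove repeated instructions, then instructions that disagree
    with both of their neighbours (attitude part or coordinate part)."""
    # 1. Remove consecutive repeated control instructions.
    deduped = [s for i, s in enumerate(seq) if i == 0 or s != seq[i - 1]]
    # 2. Remove instructions judged erroneous against their neighbours.
    cleaned = []
    for i, (gesture, (x, y)) in enumerate(deduped):
        if 0 < i < len(deduped) - 1:
            prev_g, (px, py) = deduped[i - 1]
            next_g, (nx, ny) = deduped[i + 1]
            gesture_outlier = gesture != prev_g and gesture != next_g
            coord_outlier = (abs(x - px) > max_jump
                             and abs(x - nx) > max_jump)
            if gesture_outlier or coord_outlier:
                continue  # filtered out, not used for the feature value
        cleaned.append((gesture, (x, y)))
    return cleaned
```

On the example sequence above, the lone 2F(3,0) instruction is removed while the 3F instructions are kept.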
When calculating the feature value of the control instruction sequence, the instruction conversion module 40 is also used for calculating the posture feature values of the instructions in the sequence and segmenting the sequence accordingly, so that the instructions in each resulting subsequence share the same posture feature value. The coordinate feature value of each subsequence is then calculated from the monotonicity of the coordinates described by the coordinate parts of its instructions, or from the variance of their distances to a reference coordinate in the image. Finally, the posture feature values and coordinate feature values of all subsequences are combined into the feature value of the whole sequence.
For example, with a finger as the interactive device, the user makes the following action:
Extend one finger and translate it from left to right; after reaching a certain position, spread all five fingers and make a quarter-circle motion; finally hold still, bend the index finger and thumb, and bring their tips together to form an OK gesture.
The obtained control instruction sequence is [1F(0,0), 1F(1,0), 1F(2,0), 1F(3,0), 5F(2.5,1.5), 5F(1.5,2.5), 5F(0,3), OK(0,3), OK(0,3), OK(0,3)], where 1F is the posture feature value of one finger, 5F that of five fingers, OK that of the OK gesture, and the numbers in brackets are the coordinates of the attitude in the image. After removing repeated instructions and instructions differing from their neighbours by more than the threshold, the sequence can be divided into three subsequences according to the different posture feature values:
Subsequence 1: [1F(0,0), 1F(1,0), 1F(2,0), 1F(3,0)].
Subsequence 2: [5F(2.5,1.5), 5F(1.5,2.5), 5F(0,3)].
Subsequence 3: [OK(0,3)].
The coordinate feature value of subsequence 1 is x-right, indicating that the coordinates of subsequence 1 are monotonic with increasing abscissa (x-axis) and constant ordinate (y-axis). The coordinate feature value of subsequence 2 is q-cir, indicating that the variance of the distances of its coordinates to the reference point (0,0) is below a threshold, i.e. the trajectory is a quarter circle centred on (0,0). The coordinate feature value of subsequence 3 is hold, indicating a final held attitude.
The instruction conversion module 40 then combines the posture feature values and coordinate feature values of subsequences 1, 2 and 3, obtaining the feature value of the control instruction sequence: 1F_x-right@5F_q-cir@OK_hold, where "_" and "@" are separators distinguishing posture feature values from coordinate feature values.
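The segmentation and combination step can be sketched as follows on the cleaned sequence from the example. The coordinate classifier here is a deliberately simplified assumption (it only distinguishes the three cases appearing in the example: hold, x-right, and a q-cir fallback); the "_" and "@" separators follow the format above.

```python
from itertools import groupby

def coord_feature(coords):
    """Simplified coordinate feature value for a subsequence (assumed
    classifier covering only the three cases in the example)."""
    if len(coords) == 1:
        return "hold"                      # a single, held attitude
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    if all(b > a for a, b in zip(xs, xs[1:])) and len(set(ys)) == 1:
        return "x-right"                   # x increasing, y constant
    return "q-cir"                         # fallback: quarter-circle arc

def sequence_feature(seq):
    """Segment by posture feature value and join the per-subsequence
    feature values with "_" and "@" separators."""
    parts = []
    for gesture, group in groupby(seq, key=lambda s: s[0]):
        coords = [c for _, c in group]
        parts.append(f"{gesture}_{coord_feature(coords)}")
    return "@".join(parts)
```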
After the feature value of the control instruction sequence is obtained, the natural language corresponding to it is obtained according to the preset mapping relations between feature values and natural language. In the example above, the natural language corresponding to 1F_x-right@5F_q-cir@OK_hold is "perfect". The sign made by the user (extending one finger and translating it from left to right, spreading five fingers and making a quarter-circle motion, then holding still and bending the index finger and thumb so that their tips meet in an OK gesture) has thus been converted into the natural language "perfect".
It should be noted that sign language here is not limited to standard deaf-mute sign language; it can also be user-defined. The correspondence between signs and natural language depends on the preset mapping relations between feature values and natural language.
Further, when arranging control instructions into a sequence, the instruction conversion module 40 can also arrange the instructions into a queue in the order of generation; when a preset number of consecutive identical instructions is detected at the end of the queue, the arrangement is complete, and the control instruction sequence is generated from the queue.
A buffer is preset; whenever a control instruction is generated it is stored into the buffer in order, forming a queue. The buffer is monitored at the same time: when the preset number of consecutive identical instructions appears at the end of the queue, such as the instruction OK(0,3) in the example above, the arrangement is complete and the queue is taken out of the buffer to form the control instruction sequence.
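The end-of-sequence detection can be sketched as follows; the preset number of identical trailing instructions is an assumed parameter.

```python
def try_take_sequence(buffer, n_identical=3):
    """Return the completed control instruction sequence and clear the
    buffer, or return None if the end condition is not yet met (fewer
    than `n_identical` identical instructions at the end of the queue)."""
    if len(buffer) >= n_identical and len(set(buffer[-n_identical:])) == 1:
        seq = list(buffer)
        buffer.clear()  # the queue is taken out of the buffer
        return seq
    return None
```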
Terminating the control instruction sequence on a preset number of consecutive identical instructions makes it convenient to separate multiple sign gestures (similar to the pauses between English words), thereby avoiding misrecognition when many gestures are made.
In one embodiment, as shown in Figure 16, the sign language recognition system also comprises an information display module 50 for displaying the natural language information, by text and/or audio, after the control instruction has been converted into it.
The information display module 50 can display the natural language information as text on a display screen, or play it back through an audio device.
For example, after the control instruction is converted into natural language information, the information is encoded and the code is sent to a remote terminal (a mobile phone, computer, television, etc.); on receiving the code, the remote terminal first decodes it back into natural language information, and then displays it as text on a display screen or plays it as speech through an audio device.
Presenting the natural language information by text and/or audio makes it easy for others to understand. Encoding the information and sending it to a remote terminal over a telephone network, the Internet or a television network, where it is decoded and displayed, enables a deaf-mute person to communicate remotely with hearing people who do not understand sign language, thereby facilitating their communication.
As shown in Figure 12, in one embodiment, the image collected by the image capture module 10 is a two-dimensional image, and the gesture recognition module 20 comprises a first image processing module 202 and a first attitude generation module 204, wherein:
The first image processing module 202 is used for extracting the pixels in the image that match a preset color model, performing connected domain detection on the obtained pixels, and extracting the marked region from the detected connected domains.
Specifically, the image capture module 10 can be a camera, and the image it collects can be a two-dimensional visible-light image. Preferably, an infrared filter can be placed in front of the camera lens to filter out light outside the infrared band, in which case the image collected by the image capture module 10 is a two-dimensional infrared image. In a visible-light image, objects in the scene interfere with the identification of the marked region, whereas an infrared image filters out the visible-light information and suffers less interference, so a two-dimensional infrared image is more conducive to extracting the marked region.
Specifically, the first image processing module 202 establishes a color model in advance. For example, if the color of the marked region is red, a red model is established in which the R component of a pixel's RGB value can lie between 200 and 255 while the G and B components are close to zero; the first image processing module 202 then takes the pixels of the frame image whose RGB values satisfy this red model as red pixels. In addition, when the marked region in the collected image is formed by a body part, the first image processing module 202 obtains the pixels matching a preset skin color model. The first image processing module 202 also performs connected domain detection on the obtained pixels to obtain multiple connected domains, where a connected domain is a set of contiguous pixels.
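The red color model described above can be sketched as a per-pixel test; the exact bound for "close to zero" on the G and B components is an assumption.

```python
def matches_red_model(r, g, b, gb_max=30):
    """Red model: R in [200, 255], G and B close to zero
    (the gb_max bound is an assumed value)."""
    return 200 <= r <= 255 and g <= gb_max and b <= gb_max

def extract_marked_pixels(image):
    """Return the (row, col) coordinates of pixels matching the model.
    `image` is a nested list of (r, g, b) tuples."""
    return [(i, j)
            for i, row in enumerate(image)
            for j, (r, g, b) in enumerate(row)
            if matches_red_model(r, g, b)]
```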
In this embodiment, because the size and shape of the marked region should be roughly constant, the first image processing module 202 can, while performing connected domain detection on the obtained pixels, calculate the perimeter and/or area of every connected domain. Specifically, the perimeter of a connected domain can be the number of its boundary pixels, and its area the number of all pixels it contains. Further, the first image processing module 202 can compare the perimeter and/or area of each obtained connected domain with the preset perimeter and/or area of the marked region; a connected domain meeting the preset perimeter and/or area is the marked region. Preferably, the first image processing module 202 can also use the ratio of the squared perimeter to the area as the criterion: if this ratio for a connected domain meets that of the preset marked region, the connected domain is the marked region.
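The squared-perimeter-to-area screening can be sketched as follows; the relative tolerance used to decide whether the ratios "meet" is an assumed parameter.

```python
def is_marked_region(perimeter, area, ref_perimeter, ref_area, tol=0.2):
    """Accept a connected domain as the marked region when its
    perimeter**2 / area ratio is within a relative tolerance of the
    preset marked region's ratio (tol is an assumption)."""
    ratio = perimeter ** 2 / area
    ref_ratio = ref_perimeter ** 2 / ref_area
    return abs(ratio - ref_ratio) <= tol * ref_ratio
```

This ratio is scale-invariant, which matches the idea that the marked region's shape, rather than its exact size in pixels, should stay roughly constant.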
The first attitude generation module 204 is used for obtaining the pixel coordinates in the marked region and producing the attitude of the marked region from them.
In this embodiment, the attitude produced by the marked region is an attitude angle. In one embodiment, the marked region is a single continuous region; the first attitude generation module 204 then calculates the covariance matrix of the pixel coordinates, obtains the eigenvector corresponding to the largest eigenvalue of the covariance matrix, and produces the attitude of the marked region from this eigenvector. The attitude of the marked region is then a single attitude angle.
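The single-continuous-region case can be sketched as follows: the covariance matrix of the 2-D pixel coordinates is formed, and the angle of the eigenvector belonging to its largest eigenvalue (the region's principal direction) gives the attitude angle. For a 2x2 symmetric matrix this angle has the closed form used below.

```python
import math

def attitude_angle(pixels):
    """Attitude angle (degrees) of a continuous region given as a list
    of (x, y) pixel coordinates."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Angle of the eigenvector of the largest eigenvalue.
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))
```

A horizontal strip of pixels yields an attitude angle of 0 degrees, a diagonal one 45 degrees.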
In another embodiment, the marked region comprises a first continuous region and a second continuous region; the first attitude generation module 204 then calculates the center of gravity of each region from the pixel coordinates, and calculates the attitude of the marked region from the coordinates of the two centers of gravity. Specifically, the mean of all pixel coordinates in a continuous region is calculated, and the resulting coordinate is the center of gravity of that region.
In another embodiment, the image collected by the image capture module 10 is a three-dimensional image. Specifically, the image capture module 10 can use a traditional stereo vision system (two cameras at known positions plus associated software), a structured-light system (a camera, a light source and associated software) or a TOF (time-of-flight) depth camera to collect the three-dimensional image (i.e. a three-dimensional depth image).
In this embodiment, as shown in Figure 13, the gesture recognition module 20 comprises a second image processing module 210 and a second attitude generation module 220, wherein:
The second image processing module 210 is used for segmenting the image, extracting the connected domains in it, calculating the attribute values of the connected domains, and comparing them with preset marked-region attribute values; the marked region is the connected domain meeting the preset marked-region attribute values.
Specifically, the second image processing module 210 considers two adjacent pixels in the three-dimensional image to be connected when their depth difference is less than a preset threshold, for example 5 centimeters; performing connected domain detection on the whole image then yields a series of connected domains, including the one formed by the marker.
In this embodiment, the attribute values of a connected domain include its size and shape. Specifically, the second image processing module 210 calculates the size/shape of each connected domain and compares it with the size/shape of the marker on the interactive device; the connected domain whose size/shape matches the marker is the connected domain of the marked region. Taking a rectangular marker as an example, the marker on the interactive device appears as a rectangle in the collected image, and its length and width are preset. The second image processing module 210 calculates the length and width of the physical region corresponding to each connected domain; the closer these are to the length and width of the marker, the more similar the connected domain is to the marked region.
Further, the process by which the second image processing module 210 calculates the length and width of the physical region corresponding to a connected domain is as follows: the covariance matrix of the three-dimensional coordinates of the connected domain's pixels is calculated, and the length and width of the corresponding physical region are obtained by the formula

l = k·√λ

where k is a preset coefficient, for example set to 4, and λ is an eigenvalue of the covariance matrix: when λ is the largest eigenvalue of the covariance matrix, l is the length of the connected domain; when λ is the second-largest eigenvalue, l is the width of the connected domain.
Further, the second image processing module 210 can also preset the aspect ratio of the rectangular marker, for example an aspect ratio of 2; the closer the aspect ratio of the physical region corresponding to a connected domain is to that preset aspect ratio, the more similar the connected domain is to the marked region. Specifically, the second image processing module 210 calculates the aspect ratio of the physical region corresponding to the connected domain by the formula

r = √(λ₀/λ₁)

where r is the aspect ratio of the connected domain, λ₀ is the largest eigenvalue of the covariance matrix, and λ₁ is its second-largest eigenvalue.
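The length, width, and aspect-ratio formulas above can be sketched directly, assuming the two leading eigenvalues of the coordinate covariance matrix have already been computed.

```python
import math

def region_length_width(lambda0, lambda1, k=4):
    """Length and width of the physical region: l = k * sqrt(lambda),
    with lambda0 the largest and lambda1 the second-largest eigenvalue
    of the covariance matrix, and k a preset coefficient (e.g. 4)."""
    return k * math.sqrt(lambda0), k * math.sqrt(lambda1)

def aspect_ratio(lambda0, lambda1):
    """Aspect ratio of the physical region: r = sqrt(lambda0 / lambda1)."""
    return math.sqrt(lambda0 / lambda1)
```

Note that the aspect ratio equals the length divided by the width, so the preset coefficient k cancels out of the comparison with the marker's preset ratio.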
The second attitude generation module 220 is used for obtaining the pixel coordinates in the marked region and producing the attitude of the marked region from them.
In this embodiment, the attitude of the marked region is an attitude vector. In one embodiment, the marked region is a single continuous region; the second attitude generation module 220 then calculates the covariance matrix of the pixel coordinates, obtains the eigenvector corresponding to its largest eigenvalue, and produces the attitude of the marked region from this eigenvector. As described above, the attitude of the marked region is then an attitude vector.
In another embodiment, the marked region comprises a first continuous region and a second continuous region; the second attitude generation module 220 then calculates the center of gravity of each region from the pixel coordinates and produces the attitude of the marked region from the coordinates of the two centers of gravity. In this embodiment, the pixel coordinates in the marked region are three-dimensional coordinates; specifically, the attitude of the marked region, an attitude vector, can be produced from the coordinates of the two calculated centers of gravity.
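The two-continuous-region case in three dimensions can be sketched as follows: each center of gravity is the mean of its region's 3-D pixel coordinates, and the attitude vector runs from the first center to the second.

```python
def centre_of_gravity(pixels):
    """Mean of a region's (x, y, z) pixel coordinates."""
    n = len(pixels)
    return tuple(sum(c) / n for c in zip(*pixels))

def attitude_vector(region1, region2):
    """Attitude vector from the center of gravity of the first
    continuous region to that of the second."""
    g1 = centre_of_gravity(region1)
    g2 = centre_of_gravity(region2)
    return tuple(b - a for a, b in zip(g1, g2))
```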
In one embodiment, the gesture recognition module 20 also comprises a judging module (not shown) for judging whether the collected image is a two-dimensional or a three-dimensional image. Specifically, in this embodiment, when the judging module determines that the collected image is two-dimensional, it notifies the first image processing module 202 to extract the marked region from the two-dimensional image, after which the first attitude generation module 204 produces the attitude of the marked region. When the judging module determines that the collected image is three-dimensional, it notifies the second image processing module 210 to extract the marked region from the three-dimensional image, after which the second attitude generation module 220 produces the attitude. Understandably, in this embodiment the gesture recognition module 20 comprises the judging module (not shown), the first image processing module 202, the first attitude generation module 204, the second image processing module 210 and the second attitude generation module 220 at the same time, so that the attitude of the marked region can be identified both from two-dimensional images and from three-dimensional images.
As shown in Figure 14, in one embodiment, the instruction generation module 30 comprises a first attitude acquisition module 302 and a first instruction lookup module 304, wherein:
The first attitude acquisition module 302 is used for obtaining from the gesture recognition module 20 the attitude of the marked region in the current frame image.
Specifically, this attitude can be the attitude angle of the marked region in the current two-dimensional frame image, or the attitude vector of the marked region in the current three-dimensional depth frame image. In this embodiment, mapping relations between attitudes and control instructions are preset. This attitude can also be called an absolute attitude.
The first instruction lookup module 304 is used for generating the control instruction corresponding to the attitude according to the preset mapping relations between attitudes and control instructions.
In this embodiment, the collected images containing the marked region can form an image sequence. The first attitude acquisition module 302 can also obtain from the gesture recognition module 20 the relative attitude between the attitude of the marked region in the current frame image and its attitude in the previous frame image. The first instruction lookup module 304 can also generate the control instruction corresponding to the relative attitude according to preset mapping relations between relative attitudes and control instructions.
In another embodiment, the collected images containing the marked region can form an image sequence. As shown in Figure 15, the instruction generation module 30 comprises a second attitude acquisition module 310 and a second instruction lookup module 320, wherein:
The second attitude acquisition module 310 is used for obtaining from the gesture recognition module 20 the relative attitude between the attitude of the marked region in the current frame image and its attitude in the previous frame image.
The second instruction lookup module 320 is used for generating the control instruction corresponding to the relative attitude according to the preset mapping relations between relative attitudes and control instructions.
With the above sign language recognition method and system, the attitude produced by the marked region is recognized from the collected image containing the marked region, the control instruction corresponding to the attitude is generated, and the instruction is then converted into natural language information easily understood by hearing people. Because the trajectory and attitude of a limb action are judged from captured images of the action, the whole course of the limb action is recorded, avoiding the situation where an action or attitude is missed for lack of a sensor, and thereby improving the accuracy of sign language recognition.
The above embodiments express only several implementations of the present invention, and their description is comparatively specific and detailed, but they should not therefore be construed as limiting the scope of the claims. It should be pointed out that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be determined by the appended claims.