CN1246793C - Method of sign language translation through an intermediate mode language - Google Patents

Method of sign language translation through an intermediate mode language

Info

Publication number
CN1246793C
CN1246793C CN02121369 CN02121369A
Authority
CN
China
Prior art keywords
sign language
language
points
face
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 02121369
Other languages
Chinese (zh)
Other versions
CN1464433A (en)
Inventor
高文
马继勇
王春立
吴江琴
陈熙林
宋益波
尹宝才
王兆其
山世光
曾炜
晏洁
吴枫
姚鸿勋
张洪明
吕岩
王瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN 02121369 priority Critical patent/CN1246793C/en
Publication of CN1464433A publication Critical patent/CN1464433A/en
Application granted granted Critical
Publication of CN1246793C publication Critical patent/CN1246793C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to a method of sign language translation through an intermediate mode language. The method comprises: collecting sign language word data; extracting characteristic information from the sign language word data; recognizing continuous sign language sentences according to the characteristic information and recording the recognition result as intermediate mode language data; and converting the intermediate mode language data into non-sign-language words and outputting them. Conversely, the method comprises: collecting non-sign-language word data; converting the non-sign-language words into intermediate mode language data and recording them according to the correspondence between the intermediate mode language data and the non-sign language; finding the corresponding sign language word data in a sign language word library according to the intermediate mode language data; and synthesizing the sign language word data into sign language image information and outputting it. Because both the sign language and the non-sign-language mode correspond to the intermediate mode language, the invention facilitates the extension of a sign language translation system and the mutual conversion between non-sign languages and sign language.

Description

Method for sign language translation through an intermediate mode language
Technical Field
The present invention relates to a method for translating sign language through an intermediate mode language, and more particularly to a method for translating sign language into a non-sign language, and a non-sign language into sign language, via an intermediate mode language data form.
Background Art
Language is an indispensable tool for people to communicate with each other, but hundreds of languages are in use in the world, and if dialects, the sign languages of deaf people and the like are added, the number is difficult to count. Such a large variety of languages makes communication between people using different languages very difficult, not only between physiologically healthy people but also for people with physiological disabilities. Translation between languages has therefore long been a problem that plagues all mankind.
With the continuous progress of science and technology, and especially the rapid development of computer technology in the last 20 years, it has become practical to translate one language into another using a computer. At present, however, computer language translation usually translates one mode of language directly into another mode of language; this method of translation and its systems have the following disadvantages:
In the field of automatic computer translation, there is usually only one fixed vocabulary correspondence between a source language and a target language. An existing translation system therefore cannot translate one source language into multiple target languages; to translate one source language into several target languages, a corresponding fixed vocabulary correspondence must be established between the source language and each target language. This makes the design and implementation of a corresponding translation system very laborious on the one hand, and makes language extension of the translation system difficult to implement on the other.
Disclosure of Invention
The invention mainly aims to provide a method for sign language translation through an intermediate mode language, which translates sign language into the intermediate mode language and further translates the intermediate mode language into the required non-sign-language form, or converts a non-sign language into the intermediate mode language and then translates the intermediate mode language into sign language.
Another objective of the present invention is to provide a method for sign language translation through an intermediate mode language, wherein the sign language and the language in the non-sign language mode both correspond to the intermediate mode language, which is beneficial to the extension of the sign language translation system, so as to realize the interconversion between the non-sign language and the sign language.
The purpose of the invention is realized as follows:
the invention provides a method for sign language translation through an intermediate mode language, which translates sign language into non-sign language through an intermediate mode language data form and comprises the following specific steps:
step 101: collecting sign language word data;
step 102: extracting characteristic information in the sign language word data;
step 103: carrying out sign language continuous statement identification according to the characteristic information, and then recording the identification result of the intermediate mode language data;
step 104: and converting the intermediate mode language data into non-sign language words and outputting the non-sign language words according to the corresponding relation between the intermediate mode language data and the corresponding non-sign language.
The invention also provides a method for sign language translation through the intermediate mode language, which translates the non-sign language into the sign language through an intermediate mode language data form, and comprises the following specific steps:
step 201: collecting non-sign language word data;
step 202: converting the non-sign language words into intermediate mode language data and recording according to the corresponding relation between the intermediate mode language data and the non-sign language;
step 203: and finding corresponding sign language word data in the sign language word library according to the intermediate mode language data, synthesizing the sign language word data into sign language image information and outputting the sign language image information.
The method for converting sign language into non-sign language, or non-sign language into sign language, further comprises the following steps: corresponding face information is also collected while the sign language word data or non-sign language data are collected; the characteristic data in the face information are then extracted; finally, the characteristic data are used to synthesize the output face image when the translation result is output.
The specific method for collecting sign language word data comprises the following steps: collecting sensing data of each joint of the hand by using a data glove; inputting position and direction data of sign language gestures by adopting a position tracker; wherein, the data glove is arranged on the left hand and the right hand of the human body; the position tracker comprises a transmitter and more than one receiver; the transmitter sends out electromagnetic waves, the receiver is arranged on the left wrist and the right wrist of a human body, and the receiver receives the electromagnetic waves and calculates the position and the direction data of the receiver relative to the transmitter.
The specific method for extracting the characteristic information from the sign language word data comprises the following steps: calculating the positions and directions of the left and right hands relative to a reference, normalizing each component of the sensing data of each hand joint, and establishing a sign language sample model library using the processed data as training samples of a hidden Markov model (HMM).
An HMM can be represented by the parameter set λ = (A, B, π), where:

A = {a_ij} is the state transition probability matrix,

satisfying: a_ij = P[q_{t+1} = S_j | q_t = S_i], 1 ≤ i, j ≤ N;

and the constraints: a_ij ≥ 0, 1 ≤ i, j ≤ N; Σ_{j=1}^{N} a_ij = 1, 1 ≤ i ≤ N;

where N is the number of states of the model;

π = {π_i}, where π_i is the probability of starting from the i-th state node,

satisfying: π_i = P[q_1 = S_i], 1 ≤ i ≤ N;

and the constraints: π_i ≥ 0, Σ_{i=1}^{N} π_i = 1, 1 ≤ i ≤ N;

B = {b_j(k)} is the probability density of the observed signal; since the observed symbols are continuous vectors, b_j(k) is a continuous probability density function:

b_j(k) = Σ_{m=1}^{M} c_jm · G[μ_jm, Σ_jm, O_k], 1 ≤ j ≤ N;

where N is the number of states of the model, M is the number of mixture components, O_k is the observation vector at time k, and c_jm is the mixing probability, satisfying:

Σ_{m=1}^{M} c_jm = 1, 1 ≤ j ≤ N; c_jm ≥ 0, 1 ≤ j ≤ N, 1 ≤ m ≤ M;

where G is the Gaussian probability density function, and μ_jm and Σ_jm are the mean vector and covariance matrix of the m-th component of the Gaussian mixture density:

G[μ_jm, Σ_jm, O_k] = (1 / ((2π)^{1/2} |Σ_jm|^{1/2})) · exp[ −(1/2) (O_k − μ_jm)^T Σ_jm^{−1} (O_k − μ_jm) ]

K sets of training data O = [O^(1), O^(2), …, O^(K)] correspond to the same sign language word, where O^(k) = [O_1^(k) O_2^(k) … O_{T_k}^(k)] is the k-th set of training data and T_k is the total number of frames of the k-th set.

The re-estimation formulas for π_i, a_ij, c_jm, μ_m and Σ_m are:

π̄_i = (1/K) Σ_{k=1}^{K} γ_1^(k)(i),

i.e. the expected probability that state node i occurs at time t = 1;

ā_ij = Σ_{k=1}^{K} Σ_{t=1}^{T_k−1} ξ_t^(k)(i, j) / Σ_{k=1}^{K} Σ_{t=1}^{T_k−1} γ_t^(k)(i), wherein

the numerator is the expected probability of transitioning from state node i to state node j;

the denominator is the expected probability of transitioning out of state node i;

c̄_jm = Σ_{k=1}^{K} Σ_{t=1}^{T_k} γ_t^(k)(j, m) / Σ_{k=1}^{K} Σ_{t=1}^{T_k} Σ_{m=1}^{M} γ_t^(k)(j, m), wherein

the numerator is the expected probability of occupying the m-th branch of state node j;

the denominator is the expected probability of occupying state node j;

μ̄_m = Σ_{k=1}^{K} Σ_{t=1}^{T_k} γ_t^(k)(j, m) · O_t^(k) / Σ_{k=1}^{K} Σ_{t=1}^{T_k} γ_t^(k)(j, m), wherein

the numerator is the expectation of the observations O occurring in the m-th branch of state node j;

the denominator is the expected probability of the m-th branch occurring at state node j;

Σ̄_m = Σ_{k=1}^{K} Σ_{t=1}^{T_k} γ_t^(k)(j, m) · (O_t^(k) − μ_m)(O_t^(k) − μ_m)′ / Σ_{k=1}^{K} Σ_{t=1}^{T_k} γ_t^(k)(j, m), wherein

the numerator is the expectation of the squared deviation of the observations O from μ_m in the m-th branch of state node j;

the denominator is the expected probability of the m-th branch occurring at state node j.
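As an illustration of the observation density B defined above, the following Python sketch evaluates b_j(O_k) for one state of a Gaussian-mixture HMM. It assumes NumPy, uses the standard multivariate normalization constant (2π)^{d/2} rather than the scalar constant written above, and all function names and toy parameters are assumptions made for this example, not taken from the patent.

```python
import numpy as np

def gaussian_density(o, mean, cov):
    """Multivariate Gaussian G[mu, Sigma, o] with the standard normalisation constant."""
    d = len(mean)
    diff = o - mean
    norm = np.sqrt(((2 * np.pi) ** d) * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm)

def observation_density(o, weights, means, covs):
    """b_j(o) = sum_m c_jm * G[mu_jm, Sigma_jm, o] for one state j."""
    return sum(c * gaussian_density(o, mu, cov)
               for c, mu, cov in zip(weights, means, covs))

# One state with two mixture components over a 3-dimensional feature vector.
weights = [0.6, 0.4]                        # c_j1, c_j2, summing to 1
means = [np.zeros(3), np.ones(3)]           # mu_j1, mu_j2
covs = [np.eye(3), 0.5 * np.eye(3)]         # Sigma_j1, Sigma_j2
o_k = np.array([0.2, -0.1, 0.4])            # observation vector at time k
print(observation_density(o_k, weights, means, covs))
```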
The specific method for recognizing the sign language continuous sentences comprises the following steps:
after the model base is built, likelihood probabilities of the test sample and various possible model sequences are calculated by a Viterbi (Viterbi) decoding method, and the word sequence corresponding to the model sequence with the maximum probability value is the recognition result.
Let the vocabulary capacity be V, let the words be numbered k = 1, …, V with corresponding model parameters (π_k, A_k, c_k, μ_k, U_k), let the number of states of each word be L, and let the input sign language frames be numbered i = 1, 2, …, N.

When the transition stays within the same word model (j > 1):

Pr(j, k) = p(i, j, k) · max{ Pr(j, k) · A^k_{j,j}, Pr(j−1, k) · A^k_{j−1,j} }

T(i, j, k) = k

F(i, j, k) = j_0 = argmax{ Pr(j, k) · A^k_{j,j}, Pr(j−1, k) · A^k_{j−1,j} }

When a word transitions at the boundary of the model (j = 1):

Pr(j, k) = p(i, j, k) · max{ Pr(j, k) · A^k_{j,j}, Pr(L, k*) · Pr(k | k*), 1 ≤ k* ≤ V }

T(i, j, k) = k_0 = argmax{ Pr(j, k) · A^k_{j,j}, Pr(L, k*) · Pr(k | k*), 1 ≤ k* ≤ V }

F(i, j, k) = j_0 = argmax{ Pr(j, k) · A^k_{j,j}, Pr(L, k*) · Pr(k | k*), 1 ≤ k* ≤ V }

Initially (i = 1): Pr(1, k) = p(1, 1, k), and Pr(j, k) = 0 for j > 1,

T(1, j, k) = −1,

F(1, j, k) = −1.

With the above formulas each Pr(L, k) can be solved recursively, and the global maximum probability is obtained as

Pr_max = max_k { Pr(L, k) }.

The optimal path (T_i, F_i) is then backtracked from T(i, j, k) and F(i, j, k) in reverse order:

T_N = argmax_k { Pr(L, k) }

F_N = L

T_i = T(i+1, T_{i+1}, F_{i+1}), N−1 ≥ i ≥ 1

F_i = F(i+1, T_{i+1}, F_{i+1}), N−1 ≥ i ≥ 1

which yields the recognition result; wherein,

p(i, j, k): the probability of the i-th frame occurring at the j-th state of word k;

Pr(j, k): the maximum probability, over all state transition paths from the beginning up to the current input frame i, of being in the j-th state of word k;

T(i, j, k): records the index of the model in which the previous frame is located;

F(i, j, k): records the state of the previous frame in model T(i, j, k).

To improve recognition accuracy, a bigram (second-order Markov chain) is embedded in the Viterbi search process; that is, the prior probability of a sentence is calculated by:

P(W) = P(w_1, w_2, …, w_n) = Π_{i=1}^{n} P(w_i | w_{i−1})

wherein,

W is the recognized sentence;

w_1, w_2, …, w_n are the words in the recognized sentence W;

P(w_i | w_{i−1}) is the frequency of occurrence of the word pair.
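A minimal sketch of the frame-synchronous Viterbi recursion with an embedded bigram follows. It assumes the per-frame emission scores p(i, j, k) are already available, works directly in the probability domain (a real system would use log probabilities), and the array layouts, names and toy data are assumptions made here rather than the patent's implementation.

```python
import numpy as np

def viterbi_connected_words(emis, trans, bigram):
    """Frame-synchronous Viterbi over connected word models.
    emis[i, k, j]  : p(i, j, k), emission score of frame i in state j of word k
    trans[k, a, b] : within-word transition probability A^k_{a,b}
    bigram[kp, k]  : word-pair probability P(k | kp)
    Returns the recognized word-index sequence."""
    N, V, L = emis.shape
    Pr = np.zeros((V, L))               # best score of being in (word k, state j)
    T = -np.ones((N, V, L), dtype=int)  # word index of the previous frame
    F = -np.ones((N, V, L), dtype=int)  # state index of the previous frame
    Pr[:, 0] = emis[0, :, 0]            # initialisation: Pr(1, k) = p(1, 1, k)

    for i in range(1, N):
        new = np.zeros_like(Pr)
        for k in range(V):
            # word boundary (j = 0): stay in state 0, or enter from the last
            # state of any word k*, weighted by the bigram P(k | k*).
            stay = Pr[k, 0] * trans[k, 0, 0]
            enter = Pr[:, L - 1] * bigram[:, k]
            k_star = int(np.argmax(enter))
            if stay >= enter[k_star]:
                new[k, 0] = emis[i, k, 0] * stay
                T[i, k, 0], F[i, k, 0] = k, 0
            else:
                new[k, 0] = emis[i, k, 0] * enter[k_star]
                T[i, k, 0], F[i, k, 0] = k_star, L - 1
            # inside the word (j > 0): stay in state j or advance from j - 1.
            for j in range(1, L):
                stay = Pr[k, j] * trans[k, j, j]
                adv = Pr[k, j - 1] * trans[k, j - 1, j]
                new[k, j] = emis[i, k, j] * max(stay, adv)
                T[i, k, j] = k
                F[i, k, j] = j if stay >= adv else j - 1
        Pr = new

    # global best final score and backtracking of (T_i, F_i) in reverse order;
    # adjacent frames with the same word index are merged here (repeated words
    # would need explicit boundary tracking).
    k, j = int(np.argmax(Pr[:, L - 1])), L - 1
    words = [k]
    for i in range(N - 1, 0, -1):
        k, j = int(T[i, k, j]), int(F[i, k, j])
        if words[-1] != k:
            words.append(k)
    return list(reversed(words))

# Tiny toy run: 2 words, 3 states each, 8 random frames.
rng = np.random.default_rng(0)
emis = rng.random((8, 2, 3))
trans = np.tile(np.array([[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]), (2, 1, 1))
bigram = np.full((2, 2), 0.5)
print(viterbi_connected_words(emis, trans, bigram))
```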
The specific method for synthesizing the sign language word data into the sign language image information comprises the following steps:
establishing a virtual human by adopting a VRML human body representation model;
determining the angle value of each degree of freedom of the virtual human;
calculating the position and the direction of each limb of the virtual human, and determining a gesture of the virtual human;
ignoring the non-upper extremity joint angle for that pose;
and continuously displaying each gesture in one sign language motion according to a specified time interval to generate a corresponding sign language motion image.
When generating a sign language moving image, smooth interpolation is further performed between adjacent frames of the sign language movement; the interpolation is calculated according to the following formula:

G_i(t_f) = G_i(t_{f1}) + ((t_f − t_{f1}) / (t_{f2} − t_{f1})) · (G_i(t_{f2}) − G_i(t_{f1}))

wherein,

f_1 and f_2 are two adjacent image frames in a sign language movement;

t_{f1} and t_{f2} are the time values of f_1 and f_2 from the starting point, respectively;

t_f is the time value of the inserted frame from the starting point;

G_i(t_f) is the interpolated value of the i-th degree-of-freedom curve at the inserted frame;

G_i(t_{f1}) and G_i(t_{f2}) are the values of the degree-of-freedom curve at f_1 and f_2, respectively.
When generating the sign language moving image, a quaternion-based motion interpolation method is further used to smooth the transition of complex joints between discontinuous sign language frames; the smoothing is calculated according to the following formula:

q_{t_f} = (sin((1 − t_f) θ) / sin θ) · q_{f1} + (sin(t_f θ) / sin θ) · q_{f2}

wherein,

f_1 and f_2 are the image frames of two adjacent gestures in a sign language movement;

t_{f1} and t_{f2} are the time values of f_1 and f_2 from the starting point, respectively;

q_{f1} and q_{f2} are the orientations (quaternions) of each joint at times t_{f1} and t_{f2};

t_f is the time value of the inserted frame from the starting point, normalized between f_1 and f_2;

θ satisfies q_{f1} · q_{f2} = cos θ.
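A minimal sketch of this spherical linear interpolation (slerp) between two joint orientations is given below, assuming unit quaternions stored as NumPy arrays in [w, x, y, z] order; the helper name and the sample rotations are illustrative.

```python
import numpy as np

def slerp(q1, q2, t):
    """Spherical linear interpolation between unit quaternions q1 and q2,
    with t in [0, 1]; theta is defined by q1 . q2 = cos(theta)."""
    q1, q2 = np.asarray(q1, float), np.asarray(q2, float)
    dot = np.dot(q1, q2)
    if dot < 0.0:                 # take the shorter arc
        q2, dot = -q2, -dot
    if dot > 0.9995:              # nearly parallel: fall back to linear interpolation
        q = q1 + t * (q2 - q1)
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1 - t) * theta) * q1 + np.sin(t * theta) * q2) / np.sin(theta)

# Interpolate a wrist orientation halfway between two key gesture frames.
q_f1 = np.array([1.0, 0.0, 0.0, 0.0])                               # identity rotation
q_f2 = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4), 0.0, 0.0])   # 90 deg about x
print(slerp(q_f1, q_f2, 0.5))
```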
Extracting the feature data from the face information at least comprises: detecting frontal face features and detecting face side feature points; wherein,
the detection of the front face features at least comprises the following steps: coarse positioning of facial features, detection of key feature points and detection of feature shapes based on a deformation template;
the detection of the characteristic points of the face at the side at least comprises the following steps: extracting a face side contour line and detecting face side characteristic points.
The coarse positioning of the facial features is as follows: the method comprises the steps of firstly positioning the position of an iris, and then obtaining position data of other organs of a human face according to position data of the center point of the iris, statistical prior data of the structure of the facial organs and the gray level distribution characteristics of the facial organs.
The detection of the key feature points of the human face is as follows: acquiring the main feature points on the eye corner points, the mouth corner points and the chin curve as the initial values of the corresponding organ template parameters; it specifically comprises: detecting the eye key points, the mouth key points and the chin key points; wherein: the eye key points comprise the left and right eye corner points and the boundary points of the upper and lower eyelids; the mouth key points comprise the two mouth corner points, the highest point of the upper lip and the lowest point of the lower lip; the chin key points comprise the intersection points of the extension lines of the left and right mouth corners with the chin, the intersection point of the perpendicular line through the middle lip point with the chin, the intersection points of the perpendicular lines through the left and right mouth corner points with the chin, the intersection point of the straight line through the left mouth corner point at 45 degrees down-left with the chin, and the intersection point of the straight line through the right mouth corner point at 45 degrees down-right with the chin.
The characteristic shape detection based on the deformed template comprises the following steps: detecting the characteristic shape of the eye area to obtain eye template parameters; detecting the shape of the mouth to obtain mouth template parameters; and detecting the shape of the chin to obtain parameters of the chin template.
The extraction of the side contour line of the human face is as follows: segmenting a face region by using the skin color characteristics of the face; then edge detection is adopted, and contour lines are positioned according to prior data of the face contour.
The detection of the face side characteristic points is as follows: dividing the contour line of the human face into an upper section and a lower section by taking the nose tip point as a boundary; and obtaining an approximate function expression of the contour line through curve fitting, calculating a point with a first derivative of the function being zero, and taking the point as a face side characteristic point.
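As an illustration of the side-profile feature-point detection just described, the sketch below fits a polynomial to one section of the profile contour line and returns the points where the first derivative is zero; the polynomial degree and the synthetic contour data are assumptions made for this example.

```python
import numpy as np

def profile_feature_points(ys, xs, degree=4):
    """Fit x = f(y) to one section of the profile contour and return the
    points where f'(y) = 0 (candidate profile feature points)."""
    coeffs = np.polyfit(ys, xs, degree)        # approximate contour function
    deriv = np.polyder(coeffs).coeffs          # coefficients of the first derivative
    roots = np.roots(deriv)
    roots = roots[np.isreal(roots)].real       # keep real roots only
    roots = roots[(roots >= ys.min()) & (roots <= ys.max())]
    return [(float(np.polyval(coeffs, y)), float(y)) for y in roots]

# Example with a synthetic contour section (y running down the profile).
ys = np.linspace(0.0, 10.0, 50)
xs = 0.1 * (ys - 3.0) ** 2 - 0.01 * (ys - 3.0) ** 4 + 5.0
print(profile_feature_points(ys, xs))
```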
The method for synthesizing and outputting the face image using the feature data comprises the following steps: a number of feature points are defined on a face model; these feature points can be extracted from the frontal and profile images of a specific person, and the automatic extraction of the feature points belongs to the category of face image detection and analysis. It is assumed that analysis and recognition techniques have already been applied to extract the desired features or deformation curves from the specific face image, which are then used as the deformation parameters for a generic face model. The generic neutral face model is a three-dimensional mesh in which the three-dimensional coordinates of each feature point are known, and two kinds of transformation are performed in the process of modifying the generic neutral face model into the specific neutral face model. First, the generic neutral face model undergoes a global transformation that modifies the overall outline of the face so that it matches the approximate face shape and positions of the facial features of the specific person. Let the coordinates of a point on the face model be (x, y, z) before the transformation and (x′, y′, z′) after it, and let the face center point be o(x_0, y_0, z_0) before the transformation and o′(x′_0, y′_0, z′_0) after it, where the face center point o is defined as the intersection of the line connecting the eye corner points of the two eyes with the longitudinal central axis of the face. The parameters p, q1 and q2 are defined as the distances from the center point to the temple, from the center point to the center of the forehead, and from the center point to the chin, respectively; the parameter u is defined as the distance from the center of the mouth to the lower edge of the ear; the parameters r1, r2 and r3 are defined on the profile as the distance from the highest point of the forehead to the hairline, the distance from the center point to the upper edge of the ear, and the distance from the mouth corner point to the lower edge of the ear, respectively.
For the upper half of the face (the part above the horizontal line through the eyes) the following modification formulas are used:

x′ = x′_0 + (p′/p)(x − x_0)

y′ = y′_0 + (q1′/q1)(y − y_0)

z′ = z′_0 + (r1′/r1)(z − z_0)

The middle and lower portions of the face are modified similarly.
The modification formulas for the points in the eye region in the local transformation are as follows:

Let (x, y, z) be the coordinates of an eye-region point before the transformation and (x′, y′, z′) its coordinates after the transformation; then:

x′ = ax + by + cz

y′ = dx + ey + fz

The coefficients a, b, c, d, e and f are obtained by substituting three pairs of feature points before and after the transformation and solving the resulting system of six linear equations; through this transformation, small positional movements of the eye and changes of its shape can be achieved. As the above equations show, the modification occurs only in the x and y directions and no modification in depth is performed, so the profile information of the specific face is not well reflected. The eyebrows and the mouth are modified in the same way.
Let the coordinates of a nose-region point before and after the transformation be (x, y, z) and (x′, y′, z′) respectively, and let the coordinates of the nose center point be (x_0, y_0, z_0) and (x′_0, y′_0, z′_0); the transformation of the nose portion is calculated according to the following formulas:

x′ = x′_0 + (p′/p)(x − x_0)

y′ = y′_0 + (q′/q)(y − y_0)

z′ = z′_0 + (r′/r)(z − z_0)
After the global transformation and the local transformations are completed, a neutral three-dimensional face mesh that basically has the features of the specific face is obtained.
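For illustration, the following minimal Python sketch applies the region-wise scaling transformation described above to the upper-face vertices of a generic mesh; the array layout, function name and sample values are assumptions made for this example, not data structures from the patent.

```python
import numpy as np

def transform_upper_face(points, center, center_new, ratios):
    """Scale upper-face vertices about the face center point.
    points:     (n, 3) array of (x, y, z) mesh vertices before transformation
    center:     (x0, y0, z0) face center of the generic model
    center_new: (x0', y0', z0') face center measured on the specific person
    ratios:     (p'/p, q1'/q1, r1'/r1) per-axis scale factors"""
    points = np.asarray(points, float)
    return np.asarray(center_new) + np.asarray(ratios) * (points - np.asarray(center))

# Example: scale a few generic-model vertices to a slightly wider, shorter face.
generic_pts = np.array([[10.0, 25.0, 5.0], [-12.0, 30.0, 4.0]])
print(transform_upper_face(generic_pts,
                           center=(0.0, 0.0, 0.0),
                           center_new=(0.0, 1.0, 0.0),
                           ratios=(1.1, 0.95, 1.0)))
```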
The method for synthesizing and outputting the face image by using the feature data further comprises the following steps: the lip model uses two parabolas to fit the upper lip line, and one parabola to fit the lower lip line. Selecting two mouth corner points, the highest points of the two upper lip parabolas, the lowest point of the lower lip parabola and the intersection point of the two upper lip parabolas; in addition, a plurality of points are respectively added on the lower lip parabola, the upper lip parabola and the connecting line of the two angular points of the mouth to respectively become the points on the upper inner lip parabola and the lower inner lip parabola. The parabola of the outer contour of the lip portion then satisfies the following formula:
y = a(x − b)^2 + c
the coefficients a, b, c can be solved by substituting the coordinates of known points into the above equation.
The dynamic lip movement model is described as five mutually-associated parabolas by utilizing the opening model, wherein the five parabolas comprise two upper lips, one lower lip and one upper inner lip and one lower inner lip. In the lip dynamic synthesis driven by parameters, the opening distance in the longitudinal direction and the transverse direction is given, or the mouth corner point and the highest points of the upper lip and the lower lip are given, so that the corresponding lip opening state can be determined.
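As an illustration of the lip parabola model, the sketch below recovers the coefficients a, b and c of one lip parabola from three known contour points; it first solves the general form y = A·x² + B·x + C and then converts it to the vertex form used above. The point coordinates and the function name are assumptions made for this example.

```python
import numpy as np

def fit_lip_parabola(p1, p2, p3):
    """Fit y = a*(x - b)**2 + c through three lip contour points.
    Assumes the three points are not collinear (so A != 0)."""
    xs, ys = zip(p1, p2, p3)
    xs, ys = np.array(xs, float), np.array(ys, float)
    M = np.vstack([xs ** 2, xs, np.ones(3)]).T
    A, B, C = np.linalg.solve(M, ys)       # general-form coefficients
    a = A
    b = -B / (2 * A)
    c = C - B ** 2 / (4 * A)
    return a, b, c

# Example: left mouth corner, upper-lip peak, right mouth corner (made-up coordinates).
a, b, c = fit_lip_parabola((-20.0, 0.0), (0.0, 6.0), (20.0, 0.0))
print(a, b, c)   # y = a*(x - b)**2 + c
```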
The invention provides a method for sign language translation through an intermediate mode language, which translates sign language and non-sign language into the intermediate mode language, and further translates the intermediate mode language into a required language form; the languages of the sign language and the non-sign language mode correspond to the intermediate mode language, so that the extension of a sign language translation system is facilitated, and the mutual conversion between the non-sign language and the sign language is facilitated.
The invention is further described in detail below with reference to the following figures and specific examples:
drawings
Fig. 1 is a schematic diagram of the basic principle of the present invention.
FIG. 2 is a flow chart illustrating the translation of sign language into intermediate mode language according to the present invention.
FIG. 3 is a flow chart illustrating the translation of an intermediate mode language into a sign language according to the present invention.
Fig. 4 is a schematic view of an overall flow structure according to an embodiment of the present invention.
Detailed Description
Referring to fig. 1, the basic principle of the present invention is to use the intermediate mode language M as the necessary route for conversion between a sign language M_A and a non-sign language M_B; that is, translating the sign language M_A into the non-sign language M_B, or translating the non-sign language M_B into the sign language M_A, is always done through the intermediate mode language M.
Example 1: converting sign language into speech output
Referring to fig. 2 and 4, the specific method for translating sign language into non-sign language through an intermediate schema language data form is as follows:
Firstly, sign language word data are collected. In one embodiment of the invention, two data gloves with 18 sensors each and their companion device, a position tracker, are used as the gesture input devices. The position tracker is composed of a transmitter and several receivers: the transmitter sends out electromagnetic waves, each receiver receives them, and the position and direction of the receiver relative to the transmitter are then calculated from the received electromagnetic waves. A receiver is mounted on each of the left and right wrists of the person performing the sign language movement. Because the position of the transmitter is not fixed, the coordinates of the sign language data collected during testing change frequently; therefore a third receiver is placed on the body, its position is used as the reference point and reference coordinate system, and the positions and directions of the receivers on the left and right hands relative to this reference coordinate system, obtained by referring to the position and direction data of the third receiver, are acquired as invariant features.
The specific method for extracting the characteristic information from the sign language word data is as follows: the positions and directions of the left and right hands relative to the reference are calculated, each component of the sensing data of each hand joint is normalized, and the processed data are used as HMM training samples to establish a sign language sample model library.
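For illustration, the sketch below assembles one normalized feature frame from glove and tracker readings in the spirit of the description above; the channel layout, calibration ranges and the simple subtraction used for the relative orientation are placeholder assumptions, not the patent's exact computation.

```python
import numpy as np

def make_feature_frame(joint_angles, sensor_min, sensor_max,
                       hand_pos, hand_dir, ref_pos, ref_dir):
    """Build one normalized feature vector from raw sensor readings.
    joint_angles:       raw bend-sensor values from both data gloves
    sensor_min/max:     per-sensor calibration ranges
    hand_pos, hand_dir: left+right wrist position and orientation from the tracker
    ref_pos, ref_dir:   position and orientation of the third (reference) receiver"""
    angles = np.asarray(joint_angles, float)
    lo = np.asarray(sensor_min, float)
    hi = np.asarray(sensor_max, float)
    # normalize each joint component to [0, 1] using its calibrated range
    angles = (angles - lo) / (hi - lo + 1e-9)
    # express hand position and direction relative to the body-mounted reference
    # receiver (a plain difference is used here as a stand-in for the full
    # relative-pose computation)
    rel_pos = np.asarray(hand_pos, float) - np.asarray(ref_pos, float)
    rel_dir = np.asarray(hand_dir, float) - np.asarray(ref_dir, float)
    return np.concatenate([angles, rel_pos.ravel(), rel_dir.ravel()])

# Example with stand-in readings for two 18-sensor gloves and two wrists.
frame = make_feature_frame(np.random.rand(36) * 90, np.zeros(36), np.full(36, 90.0),
                           hand_pos=np.random.rand(2, 3), hand_dir=np.random.rand(2, 3),
                           ref_pos=np.zeros(3), ref_dir=np.zeros(3))
print(frame.shape)
```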
An HMM can be represented by the parameter set λ = (A, B, π), where:

A = {a_ij} is the state transition probability matrix,

satisfying: a_ij = P[q_{t+1} = S_j | q_t = S_i], 1 ≤ i, j ≤ N;

and the constraints: a_ij ≥ 0, 1 ≤ i, j ≤ N; Σ_{j=1}^{N} a_ij = 1, 1 ≤ i ≤ N;

where N is the number of states of the model;

π = {π_i}, where π_i is the probability of starting from the i-th state node,

satisfying: π_i = P[q_1 = S_i], 1 ≤ i ≤ N;

and the constraints: π_i ≥ 0, Σ_{i=1}^{N} π_i = 1, 1 ≤ i ≤ N;

B = {b_j(k)} is the probability density of the observed signal; since the observed symbols are continuous vectors, b_j(k) is a continuous probability density function:

b_j(k) = Σ_{m=1}^{M} c_jm · G[μ_jm, Σ_jm, O_k], 1 ≤ j ≤ N;

where N is the number of states of the model, M is the number of mixture components, O_k is the observation vector at time k, and c_jm is the mixing probability, satisfying:

Σ_{m=1}^{M} c_jm = 1, 1 ≤ j ≤ N; c_jm ≥ 0, 1 ≤ j ≤ N, 1 ≤ m ≤ M;

where G is the Gaussian probability density function, and μ_jm and Σ_jm are the mean vector and covariance matrix of the m-th component of the Gaussian mixture density:

G[μ_jm, Σ_jm, O_k] = (1 / ((2π)^{1/2} |Σ_jm|^{1/2})) · exp[ −(1/2) (O_k − μ_jm)^T Σ_jm^{−1} (O_k − μ_jm) ]

K sets of training data O = [O^(1), O^(2), …, O^(K)] correspond to the same sign language word, where O^(k) = [O_1^(k) O_2^(k) … O_{T_k}^(k)] is the k-th set of training data and T_k is the total number of frames of the k-th set.

The re-estimation formulas for π_i, a_ij, c_jm, μ_m and Σ_m are:

π̄_i = (1/K) Σ_{k=1}^{K} γ_1^(k)(i),

i.e. the expected probability that state node i occurs at time t = 1;

ā_ij = Σ_{k=1}^{K} Σ_{t=1}^{T_k−1} ξ_t^(k)(i, j) / Σ_{k=1}^{K} Σ_{t=1}^{T_k−1} γ_t^(k)(i), wherein

the numerator is the expected probability of transitioning from state node i to state node j;

the denominator is the expected probability of transitioning out of state node i;

c̄_jm = Σ_{k=1}^{K} Σ_{t=1}^{T_k} γ_t^(k)(j, m) / Σ_{k=1}^{K} Σ_{t=1}^{T_k} Σ_{m=1}^{M} γ_t^(k)(j, m), wherein

the numerator is the expected probability of occupying the m-th branch of state node j;

the denominator is the expected probability of occupying state node j;

μ̄_m = Σ_{k=1}^{K} Σ_{t=1}^{T_k} γ_t^(k)(j, m) · O_t^(k) / Σ_{k=1}^{K} Σ_{t=1}^{T_k} γ_t^(k)(j, m), wherein

the numerator is the expectation of the observations O occurring in the m-th branch of state node j;

the denominator is the expected probability of the m-th branch occurring at state node j;

Σ̄_m = Σ_{k=1}^{K} Σ_{t=1}^{T_k} γ_t^(k)(j, m) · (O_t^(k) − μ_m)(O_t^(k) − μ_m)′ / Σ_{k=1}^{K} Σ_{t=1}^{T_k} γ_t^(k)(j, m), wherein

the numerator is the expectation of the squared deviation of the observations O from μ_m in the m-th branch of state node j;

the denominator is the expected probability of the m-th branch occurring at state node j.
After extracting the characteristic information of the sign language word data, carrying out sign language continuous statement recognition according to the characteristic information, and then recording the recognition result of the intermediate mode language data; the specific method comprises the following steps:
the hand shape, position and direction are separately processed using a Semi-Continuous Hidden Markov Model (SCHMM) to reduce the number of codebooks, and then the sign language is described by building a multi-dimensional letter string of the position, direction and hand shape.
Firstly, a continuous model is established for all words using a single data stream, and the mean vectors on all state nodes are clustered separately for each of the six data streams (left-hand shape, the position of the right hand relative to the left hand, the direction of the right hand relative to the left hand, the distances among the three receivers, and so on); the clustering steps are:

Initialization: randomly select a number of vectors from the training vector set as the initial codebook;

Finding the closest codeword: for each training vector, search the current codebook for the codeword vector closest to it and assign the vector to the set corresponding to that codeword;

Codeword modification: replace each codeword by the mean of all training vectors in its corresponding set;

The last two steps are repeated until the mean square error of the vectors in each class falls below a given threshold.
After the cluster centers are obtained, the mean vectors of every state node of all models are quantized: each state node records only the index of the codeword closest to it. During recognition, for each frame of input data only the distances between the frame and the codewords are calculated, and the distance between the frame and the codeword recorded by a node replaces the distance between the frame and that node's mean vector.
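A minimal sketch of the codebook clustering and quantization described above follows (a k-means-style loop); the array shapes, stopping criterion and codebook size are assumptions made for this example.

```python
import numpy as np

def build_codebook(vectors, codebook_size, tol=1e-3, max_iter=100, seed=0):
    """Cluster state-node mean vectors into a small codebook.
    vectors: (n, d) array of mean vectors pooled over all models' state nodes."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), codebook_size, replace=False)]
    prev_err = np.inf
    for _ in range(max_iter):
        # assign each vector to its closest codeword
        dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each codeword to the mean of its assigned vectors
        for c in range(codebook_size):
            members = vectors[labels == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
        err = float(np.mean(dists[np.arange(len(vectors)), labels] ** 2))
        if prev_err - err < tol:       # stop once the mean square error settles
            break
        prev_err = err
    return codebook, labels            # labels: codeword index recorded per state node

# Quantize: each state node keeps only the index of its nearest codeword.
means = np.random.rand(200, 6)         # stand-in state-node mean vectors
codebook, node_to_codeword = build_codebook(means, codebook_size=16)
print(codebook.shape, node_to_codeword[:10])
```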
After the model base is built, likelihood probabilities of the test sample and various possible model sequences are calculated by a Viterbi (Viterbi) decoding method, and the word sequence corresponding to the model sequence with the maximum probability value is the recognition result.
Let the vocabulary capacity be V, let the words be numbered k = 1, …, V with corresponding model parameters (π_k, A_k, c_k, μ_k, U_k), let the number of states of each word be L, and let the input sign language frames be numbered i = 1, 2, …, N.

When the transition stays within the same word model (j > 1):

Pr(j, k) = p(i, j, k) · max{ Pr(j, k) · A^k_{j,j}, Pr(j−1, k) · A^k_{j−1,j} }

T(i, j, k) = k

F(i, j, k) = j_0 = argmax{ Pr(j, k) · A^k_{j,j}, Pr(j−1, k) · A^k_{j−1,j} }

When a word transitions at the boundary of the model (j = 1):

Pr(j, k) = p(i, j, k) · max{ Pr(j, k) · A^k_{j,j}, Pr(L, k*) · Pr(k | k*), 1 ≤ k* ≤ V }

T(i, j, k) = k_0 = argmax{ Pr(j, k) · A^k_{j,j}, Pr(L, k*) · Pr(k | k*), 1 ≤ k* ≤ V }

F(i, j, k) = j_0 = argmax{ Pr(j, k) · A^k_{j,j}, Pr(L, k*) · Pr(k | k*), 1 ≤ k* ≤ V }

Initially (i = 1): Pr(1, k) = p(1, 1, k), and Pr(j, k) = 0 for j > 1,

T(1, j, k) = −1,

F(1, j, k) = −1.

With the above formulas each Pr(L, k) can be solved recursively, and the global maximum probability is obtained as

Pr_max = max_k { Pr(L, k) }.

The optimal path (T_i, F_i) is then backtracked from T(i, j, k) and F(i, j, k) in reverse order:

T_N = argmax_k { Pr(L, k) }

F_N = L

T_i = T(i+1, T_{i+1}, F_{i+1}), N−1 ≥ i ≥ 1

F_i = F(i+1, T_{i+1}, F_{i+1}), N−1 ≥ i ≥ 1

which yields the recognition result; wherein,

p(i, j, k): the probability of the i-th frame occurring at the j-th state of word k;

Pr(j, k): the maximum probability, over all state transition paths from the beginning up to the current input frame i, of being in the j-th state of word k;

T(i, j, k): records the index of the model in which the previous frame is located;

F(i, j, k): records the state of the previous frame in model T(i, j, k).

To improve recognition accuracy, a bigram (second-order Markov chain) is embedded in the Viterbi search process; that is, the prior probability of a sentence is calculated by:

P(W) = P(w_1, w_2, …, w_n) = Π_{i=1}^{n} P(w_i | w_{i−1})

wherein,

W is the recognized sentence;

w_1, w_2, …, w_n are the words in the recognized sentence W;

P(w_i | w_{i−1}) is the frequency of occurrence of the word pair.
In this way, the motion information of the sign language is converted into a sentence in the intermediate mode language.
When the input sign language needs to be translated into a non-sign-language sentence (for example, a speech sentence), the intermediate mode language data are converted into non-sign-language words and output according to the correspondence between the intermediate mode language data and the corresponding non-sign language. In the present embodiment, the language used as the intermediate mode between the sign language and the non-sign language is text. Taking speech output as an example: when speech output is required, the characters in the text can be rendered by speech synthesis. The speech synthesis can be performed by applying simple smoothing to the audio data of isolated words and then playing them back in concatenation; the specific speech synthesis adopts an existing system that synthesizes speech from text files.
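As one possible way to drive the speech output from the intermediate text, the sketch below uses the third-party pyttsx3 package as a stand-in for the existing text-to-speech system mentioned above; the package choice and the sample sentence are assumptions for illustration only.

```python
import pyttsx3

def speak_intermediate_text(sentence: str) -> None:
    """Render a recognized intermediate-language (text) sentence as speech."""
    engine = pyttsx3.init()        # which voices are available depends on the platform
    engine.say(sentence)
    engine.runAndWait()

speak_intermediate_text("你好，欢迎使用手语翻译系统")
```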
Example 2: converting speech into sign language and outputting
Referring to fig. 3 and 4, a specific method for translating a non-sign language into a sign language through an intermediate mode language data format includes:
firstly, collecting word data of non-sign language, and converting spoken language into intermediate mode language data by using the existing voice recognition technology; in this embodiment, the speech recognition may be implemented using a speech data development tool developed by IBM corporation.
Then, converting the non-sign language words into intermediate mode language data and recording according to the corresponding relation between the intermediate mode language data and the non-sign language; in this embodiment, the intermediate mode language is stored as a text.
Finally, according to the text, finding out corresponding sign language word data in a sign language word library, synthesizing the sign language word data into sign language image information and outputting the sign language image information; the specific implementation comprises the following steps: inputting, analyzing and segmenting text, converting natural language expressed by the text into sign language codes, synthesizing sign language word data into sign language image information and the like.
The input, analysis and segmentation of the text can be realized by using the existing natural language identification method, and then the segmented natural language words correspond to corresponding sign language words in a sign language word stock to obtain sign language feature data of the sign language words. The sign language feature data is used to finally synthesize a corresponding sign language image.
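A minimal sketch of the lookup step, mapping segmented text words of the intermediate language to entries in the sign language word library; the dictionary layout and sample entries are illustrative assumptions.

```python
# Stand-in sign language word library: text word -> reference to its feature data.
sign_word_library = {
    "你好": "poses/nihao.dat",
    "谢谢": "poses/xiexie.dat",
}

def lookup_sign_words(segmented_words):
    """Return the sign-language feature reference for each segmented text word;
    words missing from the library are reported as None."""
    return [(w, sign_word_library.get(w)) for w in segmented_words]

print(lookup_sign_words(["你好", "谢谢"]))
```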
The specific method for synthesizing the sign language image comprises the following steps:
firstly, a human body representation model of Virtual Reality Modeling Language (VRML) is adopted to establish a Virtual human; determining the angle value of each degree of freedom of the virtual human, calculating the position and the direction of each limb of the virtual human, and determining one gesture of the virtual human;
Because sign language is a movement of the upper limbs of the human body, and a sign language movement is the projection of a whole-body movement onto the upper-limb joints, a sign language representation can be expanded into a complete human motion representation when the sign language is displayed (i.e. when the sign language is mapped onto the posture of the virtual human). In other words, sign language display can be performed with a general method of human motion display: by ignoring the non-upper-limb joint angles of the virtual human posture obtained in the above steps, one sign language posture of the virtual human is obtained, and it can be expressed as a complete human motion posture.
After all sign language motion posture data are obtained, each posture in one sign language motion is continuously displayed according to a specified time interval, and a corresponding sign language motion image is generated.
When generating the sign language moving image, smooth interpolation is further performed between adjacent frames of the sign language movement; the interpolation is calculated according to the following formula:

G_i(t_f) = G_i(t_{f1}) + ((t_f − t_{f1}) / (t_{f2} − t_{f1})) · (G_i(t_{f2}) − G_i(t_{f1}))

wherein,

f_1 and f_2 are two adjacent image frames in a sign language movement;

t_{f1} and t_{f2} are the time values of f_1 and f_2 from the starting point, respectively;

t_f is the time value of the inserted frame from the starting point;

G_i(t_f) is the interpolated value of the i-th degree-of-freedom curve at the inserted frame;

G_i(t_{f1}) and G_i(t_{f2}) are the values of the degree-of-freedom curve at f_1 and f_2, respectively.
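A minimal sketch of the linear interpolation of one degree-of-freedom curve between two adjacent key frames, following the formula above; the function name and sample values are illustrative.

```python
def interpolate_dof(g_f1, g_f2, t_f1, t_f2, t_f):
    """G_i(t_f) = G_i(t_f1) + (t_f - t_f1) / (t_f2 - t_f1) * (G_i(t_f2) - G_i(t_f1))."""
    return g_f1 + (t_f - t_f1) / (t_f2 - t_f1) * (g_f2 - g_f1)

# Insert a frame at t = 0.4 s between key frames at 0.0 s and 1.0 s
# for one joint angle (in degrees).
print(interpolate_dof(g_f1=30.0, g_f2=90.0, t_f1=0.0, t_f2=1.0, t_f=0.4))  # 54.0
```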
When generating the sign language moving image, a quaternion-based motion interpolation method is further used to smooth the transition of complex joints between discontinuous sign language frames; the smoothing is calculated according to the following formula:

q_{t_f} = (sin((1 − t_f) θ) / sin θ) · q_{f1} + (sin(t_f θ) / sin θ) · q_{f2}

wherein,

f_1 and f_2 are the image frames of two adjacent gestures in a sign language movement;

t_{f1} and t_{f2} are the time values of f_1 and f_2 from the starting point, respectively;

q_{f1} and q_{f2} are the orientations (quaternions) of each joint at times t_{f1} and t_{f2};

t_f is the time value of the inserted frame from the starting point, normalized between f_1 and f_2;

θ satisfies q_{f1} · q_{f2} = cos θ.
In the above two embodiments, only the sign language motion recognition and synthesis methods are given. In fact, sign language usually also carries facial expression information and lip movement information; moreover, when recognizing or synthesizing sign language, the facial expression features and lip movement information of a specific expression often also need to be described.
Referring to fig. 4, in the embodiment of the present invention, a face specific to an expression is further detected to obtain corresponding face features and lip movement information, and the face features and the lip movement information are synthesized at an output end and output in synchronization with a sign language sentence. The specific method for detecting and synthesizing the face and lip movement information comprises the following steps:
firstly, extracting feature data in the face information, wherein the feature data at least comprises the following components: detecting front face features and side face feature points; wherein,
the detection of the front face features at least comprises the following steps: coarse positioning of facial features, detection of key feature points and detection of feature shapes based on a deformation template;
the detection of the characteristic points of the face at the side at least comprises the following steps: extracting a face side contour line and detecting face side characteristic points.
The coarse positioning of the facial features is as follows: the method comprises the steps of firstly positioning the positions of irises of human eyes, and then obtaining position data of other organs of the human face according to position data of central points of two irises, statistical prior data of facial organ structures and facial gray scale distribution characteristics.
The detection of the key feature points of the human face is as follows: acquiring main characteristic points on an eye corner point, a mouth corner point and a chin curve as initial values of corresponding organ template parameters; the method specifically comprises the following steps: detecting eye key points, mouth key points and chin key points; wherein: the eye key points comprise left and right eye corner points and boundary points of upper and lower eyelids; the key points of the mouth part comprise two mouth corner points, an upper lip highest point and a lower lip lowest point; the key points of the chin comprise intersection points of extension lines of the left and right mouth corners and the chin, intersection points of a perpendicular line passing through the middle lip point and the chin, intersection points of a perpendicular line passing through the left and right mouth corner points and the chin, intersection points of a straight line passing through the left mouth corner point and 45 degrees left and lower and the chin, and intersection points of a straight line passing through the right mouth corner point and 45 degrees right and lower and the chin.
The characteristic shape detection based on the deformed template comprises the following steps: detecting the characteristic shape of the eye area to obtain eye template parameters; detecting the shape of the mouth to obtain mouth template parameters; and detecting the shape of the chin to obtain parameters of the chin template.
The extraction of the side contour line of the human face is as follows: segmenting a face region by using the skin color characteristics of the face; then edge detection is adopted, and contour lines are positioned according to prior data of the face contour.
The detection of the face side characteristic points is as follows: dividing the contour line of the human face into an upper section and a lower section by taking the nose tip point as a boundary; and obtaining an approximate function expression of the contour line through curve fitting, calculating a point with a first derivative of the function being zero, and taking the point as a face side characteristic point.
To extract features from a specific face, 41 feature points may be defined on the face model. These feature points can be extracted from the frontal and profile images of a specific person; the automatic extraction of the feature points belongs to the category of face image detection and analysis. It is assumed that analysis and recognition techniques have already been applied to extract the desired features or deformation curves from the specific face image, which are then used as the deformation parameters for a generic face model. Since the generic neutral face model is a three-dimensional mesh, the three-dimensional coordinates of each feature point are known, and two kinds of transformation are performed in the process of modifying the generic neutral face model into the specific neutral face model. First, the generic neutral face model undergoes a global transformation, whose aim is to modify the overall outline of the face so that it matches the approximate face shape and positions of the facial features of the specific person. Let the coordinates of a point on the face model be (x, y, z) before the transformation and (x′, y′, z′) after it, and let the face center point be o(x_0, y_0, z_0) before the transformation and o′(x′_0, y′_0, z′_0) after it, where the face center point o is defined as the intersection of the line connecting the eye corner points of the two eyes with the longitudinal central axis of the face. The parameters p, q1 and q2 are defined as the distances from the center point to the temple, from the center point to the center of the forehead, and from the center point to the chin, respectively; the parameter u is defined as the distance from the center of the mouth to the lower edge of the ear; the parameters r1, r2 and r3 are defined on the profile as the distance from the highest point of the forehead to the hairline, the distance from the center point to the upper edge of the ear, and the distance from the mouth corner point to the lower edge of the ear, respectively.
For the upper half of the face (the part above the horizontal line through the eyes) the modification formulas are:

x′ = x′_0 + (p′/p)(x − x_0)

y′ = y′_0 + (q1′/q1)(y − y_0)

z′ = z′_0 + (r1′/r1)(z − z_0)
the middle and lower portions of the face may be similarly modified.
The modification formulas for the points in the eye region in the local transformation are as follows:
setting: (x, y, z) is the coordinates of the eye region points before transformation, (x ', y ', z ') is the coordinates of the eye region points after transformation, then
x′=ax+by+cz
y′=dx+ey+fz
The coefficients a, b, c, d, e and f can be obtained by substituting three pairs of feature points before and after the transformation and solving the resulting system of six linear equations; through this transformation, small positional movements of the eye and changes of its shape can be achieved.
The above equation shows that the modification only occurs in the x and y directions, and no modification in depth is performed, and therefore the side information of the characteristic face is not well reflected. The same applies to the modification of the eyebrows and the mouth.
The transformation formulas for the nose portion are as follows: let the coordinates of a nose-region point before and after the transformation be (x, y, z) and (x′, y′, z′) respectively, and let the coordinates of the nose center point be (x_0, y_0, z_0) and (x′_0, y′_0, z′_0); then:

x′ = x′_0 + (p′/p)(x − x_0)

y′ = y′_0 + (q′/q)(y − y_0)

z′ = z′_0 + (r′/r)(z − z_0)
After the global transformation and the local transformations are completed, a neutral three-dimensional face mesh that basically has the features of the specific face is obtained.
The lip model in the face image adopts two parabolas to fit the upper lip line, one parabola is used to fit the lower lip line, and two mouth corner points, the highest points of the two upper lip parabolas, the lowest points of the lower lip parabolas and the intersection point of the two upper lip parabolas are selected. In addition, two points are added on the lower lip parabola, two points are added on the upper lip parabola, and three groups of two coincident points are added on the connecting line of the two mouth corner points. For the open-mouth version, each pair of coincident points is separated and becomes a point on the parabola of the upper and lower inner lips. The parabolic equation of the outer contour of the lip portion satisfies:
y = a(x − b)^2 + c
the coefficients a, b, c can be solved by substituting the coordinates of known points into the above equation.
The dynamic lip movement model is described as five mutually-associated parabolas by utilizing the opening model, wherein the five parabolas comprise two upper lips, one lower lip and one upper inner lip and one lower inner lip. In the lip dynamic synthesis driven by parameters, the opening distance in the longitudinal direction and the transverse direction is given, or the mouth corner point and the highest points of the upper lip and the lower lip are given, so that the corresponding lip opening state can be determined.
In Chinese, each naturally distinguishable phonetic unit of the language is a syllable; usually one Chinese character is one syllable, and a syllable is usually composed of an initial and a final. The duration of the initial is very short, after which the mouth quickly slides into the mouth shape of the final. In Chinese pinyin there are 19 initials and 39 finals, and the finals are divided into single finals, compound finals and nasal finals. When a single final is pronounced, the tongue position and lip shape remain unchanged throughout the articulation, so it can be regarded as one mouth shape. Several basic mouth shapes are defined on the basis of the common mouth shapes of Chinese pronunciation; the mouth shape parameters are changed interactively, and the positions of the grid points in the mouth region of the skin mesh are adjusted to form mesh bodies representing the basic mouth shapes, which are stored in advance. From the basic mouth shapes described above, a final mouth-shape library can be derived; the derivation rules are as follows:
(1) The pronunciation of each single final has a basic mouth shape corresponding to it;

(2) The pronunciation mouth shape of a compound final or nasal final can be decomposed into a linear combination of several basic mouth shapes.
Corresponding mouth shape parameters can be obtained for all compound finals and nasal finals, and thus a final mouth shape library is formed. During synthesis, mouth shapes corresponding to the initial consonants and the final consonants are found according to the pinyin of Chinese characters, and then the mouth shapes are synthesized, and interpolation can be carried out between the mouth shapes if necessary to smooth the change of lips.
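For illustration, the sketch below composes the mouth shape of a final from the basic mouth shapes according to the derivation rules above; the viseme table, the weights and the two-value parameterization are assumptions made for this example.

```python
# Stand-in basic mouth shape parameters (e.g. lip opening height and width).
basic_mouth_shapes = {"a": (1.0, 0.6), "i": (0.2, 0.9), "u": (0.3, 0.3), "o": (0.7, 0.4)}

# Compound finals decomposed into weighted combinations of basic mouth shapes.
compound_finals = {"ai": [("a", 0.6), ("i", 0.4)], "ao": [("a", 0.5), ("o", 0.5)]}

def mouth_shape_for_final(final):
    """Return the mouth-shape parameters for a single or compound final."""
    if final in basic_mouth_shapes:
        return basic_mouth_shapes[final]
    height = sum(w * basic_mouth_shapes[b][0] for b, w in compound_finals[final])
    width = sum(w * basic_mouth_shapes[b][1] for b, w in compound_finals[final])
    return (height, width)

print(mouth_shape_for_final("ai"))   # linear combination of the 'a' and 'i' shapes
```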
Finally, it should be noted that: although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: the invention can be modified and replaced by other modifications or equivalents without departing from the spirit and scope of the invention, and the invention is to be covered by the claims.

Claims (27)

1. A method for translating sign language by intermediate mode language is characterized in that the sign language is translated into non-sign language by an intermediate mode language data form, and the method comprises the following specific steps:
step 101: collecting sign language word data;
step 102: extracting characteristic information in the sign language word data;
step 103: carrying out sign language continuous statement identification according to the characteristic information, and then recording the identification result of the intermediate mode language data;
step 104: and converting the intermediate mode language data into non-sign language words and outputting the non-sign language words according to the corresponding relation between the intermediate mode language data and the corresponding non-sign language.
2. The method of claim 1 for sign language translation using an intermediate mode language, wherein: the method for translating sign language into non-sign language further comprises: collecting corresponding face information while collecting sign language word data; then, extracting the characteristic data in the face information, and finally synthesizing the output face image when the characteristic data is used for translation output.
3. The method of claim 1 for sign language translation using an intermediate mode language, wherein: the specific method for collecting sign language word data comprises the following steps: collecting sensing data of each joint of the hand by using a data glove; inputting position and direction data of sign language gestures by adopting a position tracker; wherein, the data glove is arranged on the left hand and the right hand of the human body; the position tracker comprises a transmitter and more than one receiver; the transmitter sends out electromagnetic waves, the receiver is arranged on the left wrist and the right wrist of a human body, and the receiver receives the electromagnetic waves and calculates the position and the direction data of the receiver relative to the transmitter.
4. The method of claim 1 for sign language translation using an intermediate mode language, wherein: the specific method for extracting the characteristic information in the sign language word data comprises the following steps:
and calculating the positions and the directions of the left hand and the right hand relative to the reference, carrying out normalization processing on each component of the sensing data of each joint of the hands, taking the processed data as a training sample of the HMM, and establishing a sign language sample model library.
5. The method of claim 1 for sign language translation using an intermediate mode language, wherein: the specific method for continuous sign language sentence recognition comprises: calculating the likelihood probability of the test sample against the possible model sequences by using Viterbi decoding, and taking the word sequence corresponding to the model sequence with the maximum probability value as the recognition result.
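Claim 5 relies on Viterbi decoding; the sketch below is the textbook discrete-observation version with toy probabilities, not the continuous-density HMM decoder a sign language recognizer would actually use:

    import numpy as np

    def viterbi(obs, start_p, trans_p, emit_p):
        """Return the most likely state path and its log-probability.

        obs:     sequence of observation indices
        start_p: initial state probabilities, shape (S,)
        trans_p: transition probabilities,   shape (S, S)
        emit_p:  emission probabilities,     shape (S, O)
        """
        S = len(start_p)
        T = len(obs)
        logp = np.full((T, S), -np.inf)
        back = np.zeros((T, S), dtype=int)
        logp[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
        for t in range(1, T):
            for s in range(S):
                scores = logp[t - 1] + np.log(trans_p[:, s])
                back[t, s] = np.argmax(scores)
                logp[t, s] = scores[back[t, s]] + np.log(emit_p[s, obs[t]])
        path = [int(np.argmax(logp[-1]))]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1], float(logp[-1].max())

    # Toy 2-state, 3-symbol example
    states_path, score = viterbi(
        obs=[0, 2, 1],
        start_p=np.array([0.6, 0.4]),
        trans_p=np.array([[0.7, 0.3], [0.4, 0.6]]),
        emit_p=np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]),
    )
    print(states_path, score)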
6. The method of claim 5 for sign language translation using an intermediate mode language, wherein: the specific method for continuous sign language sentence recognition further comprises embedding a second-order Markov chain, based on a statistical language model, in the Viterbi search process; that is, the probability of the sentence W = w_1, w_2, …, w_n can be calculated using the following equation:
P(W) = P(w_1, w_2, …, w_n) = ∏_{i=1}^{n} P(w_i | w_{i-1});
wherein,
W is the recognized sentence;
w_1, w_2, …, w_n are the words in the recognized sentence W;
P(w_i | w_{i-1}) is the occurrence frequency of the word pair (w_{i-1}, w_i).
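To illustrate the bigram factor P(w_i | w_{i-1}) used in claim 6, the log-probability of a recognized word sequence could be accumulated as follows; the word-pair frequencies and the sentence-start token are invented for the example:

    import math

    # Hypothetical bigram relative frequencies P(w_i | w_{i-1}),
    # estimated offline from a text corpus.
    BIGRAM = {
        ("<s>", "I"): 0.20,
        ("I", "want"): 0.10,
        ("want", "water"): 0.05,
    }

    def sentence_log_prob(words, bigram, floor=1e-6):
        """log P(W) = sum_i log P(w_i | w_{i-1}); unseen pairs get a small floor."""
        logp = 0.0
        prev = "<s>"
        for w in words:
            logp += math.log(bigram.get((prev, w), floor))
            prev = w
        return logp

    print(sentence_log_prob(["I", "want", "water"], BIGRAM))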
7. The method of claim 2 for sign language translation using an intermediate mode language, wherein: extracting the feature data from the face information at least comprises: detection of front-face features and detection of side-face feature points; wherein,
the detection of the front-face features at least comprises: coarse positioning of the facial features, detection of key feature points, and detection of feature shapes based on a deformation template;
the detection of the side-face feature points at least comprises: extraction of the side contour line of the face and detection of the side-face feature points.
8. The method of claim 7 for sign language translation using an intermediate mode language, wherein: the coarse positioning of the facial features is as follows: the method comprises the steps of firstly positioning the position of an iris, and then obtaining position data of other organs of a human face according to position data of the center point of the iris, statistical prior data of the structure of the facial organs and the gray level distribution characteristics of the facial organs.
9. The method of claim 7 for sign language translation using an intermediate mode language, wherein: the detection of the key feature points of the face is as follows: acquiring the main feature points on the eye corner points, the mouth corner points and the chin curve as initial values of the corresponding organ template parameters; specifically comprising: detecting eye key points, mouth key points and chin key points; wherein: the eye key points comprise the left and right eye corner points and the boundary points of the upper and lower eyelids; the mouth key points comprise the two mouth corner points, the highest point of the upper lip and the lowest point of the lower lip; and the chin key points comprise the intersections of the chin with the extension lines of the left and right mouth corners, with the vertical line through the middle lip point, with the vertical lines through the left and right mouth corner points, with the straight line through the left mouth corner point at 45 degrees down-left, and with the straight line through the right mouth corner point at 45 degrees down-right.
10. The method of claim 7 for sign language translation using an intermediate mode language, wherein: the characteristic shape detection based on the deformed template comprises the following steps: detecting the characteristic shape of the eye area to obtain eye template parameters; detecting the shape of the mouth to obtain mouth template parameters; and detecting the shape of the chin to obtain parameters of the chin template.
11. The method of claim 7 for sign language translation using an intermediate mode language, wherein: the extraction of the side contour line of the human face is as follows: segmenting a face region by using the skin color characteristics of the face; then edge detection is adopted, and contour lines are positioned according to prior data of the face contour.
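A rough illustration of the skin-color segmentation step in claim 11; the YCbCr chrominance thresholds below are a common heuristic and are assumptions, not values given in the patent:

    import numpy as np

    def skin_mask(rgb: np.ndarray) -> np.ndarray:
        """Return a boolean mask of skin-colored pixels for an RGB image
        (H, W, 3, values 0-255), using simple YCbCr chrominance thresholds."""
        r = rgb[..., 0].astype(float)
        g = rgb[..., 1].astype(float)
        b = rgb[..., 2].astype(float)
        cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
        cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
        return (cb > 77) & (cb < 127) & (cr > 133) & (cr < 173)

    # Tiny synthetic image: one skin-like pixel and one blue pixel
    img = np.array([[[200, 150, 120], [0, 0, 255]]], dtype=np.uint8)
    print(skin_mask(img))  # expected: [[ True False]]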
12. The method of claim 7 for sign language translation using an intermediate mode language, wherein: the detection of the face side characteristic points is as follows: dividing the contour line of the human face into an upper section and a lower section by taking the nose tip point as a boundary; and obtaining an approximate function expression of the contour line through curve fitting, calculating a point with a first derivative of the function being zero, and taking the point as a face side characteristic point.
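A minimal sketch of the zero-derivative criterion in claim 12, assuming the profile contour segment is approximated by a low-order polynomial; the sample contour points are fabricated:

    import numpy as np

    def side_feature_points(x, y, degree=4):
        """Fit y = f(x) to a face-profile contour segment and return the x
        positions where f'(x) = 0 (candidate side-face feature points)."""
        coeffs = np.polyfit(x, y, degree)  # approximate function expression
        deriv = np.polyder(coeffs)         # first derivative
        roots = np.roots(deriv)
        real = roots[np.isreal(roots)].real
        return real[(real >= x.min()) & (real <= x.max())]

    # Fabricated contour sample with a bump (e.g. around the nose tip)
    xs = np.linspace(0.0, 10.0, 50)
    ys = np.exp(-(xs - 4.0) ** 2) + 0.02 * xs
    print(side_feature_points(xs, ys))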
13. The method of claim 2 for sign language translation using an intermediate mode language, wherein: the method for synthesizing and outputting the face image by using the feature data comprises the following steps: defining more than one characteristic point on the face model, and taking the characteristic points as deformation parameters of a general face model; integrally transforming a general face neutral model to modify the whole outline of the face so as to match the whole outline with the positions of the specific face shape and five sense organs; and obtaining a neutral three-dimensional face mesh body with specific face features.
14. A method for sign language translation using an intermediate mode language according to claim 2 or 13, wherein: the method for synthesizing and outputting the face image by using the feature data further comprises the following steps: establishing a parameterized lip movement model according to the lip characteristic data, and finally synthesizing a corresponding mouth shape according to the lip movement model and the corresponding mouth shape of the language pronunciation; the method specifically comprises the following steps:
the lip model adopts two parabolas to fit an upper lip line, and one parabola is used to fit a lower lip line; selecting two mouth corner points, the highest points of two upper lip parabolas, the lowest point of a lower lip parabola and the intersection point of the two upper lip parabolas, adding two points on the lower lip parabola, adding two points on the upper lip parabola, and adding three groups of two coincident points on the connecting line of the two mouth corner points; when opening the mouth, the coincident points are separated and become points on the parabola of the upper and lower inner lips; and, the parabolic equation of the outer contour of the lip satisfies:
y = a(x - b)^2 + c
wherein a, b and c are obtained by substituting the coordinate values of the lip line points into the equation and solving.
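As a hedged illustration of solving for a, b and c in the lip parabola of claim 14: one can fit the equivalent form y = A*x^2 + B*x + C through three lip-line points and convert it to the vertex form (the sample points are invented):

    import numpy as np

    def fit_lip_parabola(p1, p2, p3):
        """Fit y = a*(x - b)**2 + c through three lip-line points.

        Solves the equivalent y = A*x**2 + B*x + C linearly, then converts:
        a = A, b = -B / (2A), c = C - B**2 / (4A).
        """
        xs, ys = zip(*(p1, p2, p3))
        M = np.array([[x * x, x, 1.0] for x in xs])
        A, B, C = np.linalg.solve(M, np.array(ys))
        a = A
        b = -B / (2.0 * A)
        c = C - B * B / (4.0 * A)
        return a, b, c

    # Invented points: two mouth corners and the lower-lip lowest point
    print(fit_lip_parabola((-1.0, 0.0), (1.0, 0.0), (0.0, -0.5)))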
15. A method for sign language translation through an intermediate mode language is characterized in that a non-sign language is translated into a sign language through an intermediate mode language data form, and the method specifically comprises the following steps:
step 201: collecting non-sign language word data;
step 202: converting the non-sign language words into intermediate mode language data and recording according to the corresponding relation between the intermediate mode language data and the non-sign language;
step 203: and finding corresponding sign language word data in the sign language word library according to the intermediate mode language data, synthesizing the sign language word data into sign language image information and outputting the sign language image information.
16. The method of sign language translation using an intermediate mode language according to claim 15, wherein: the method for translating a non-sign language into sign language further comprises: collecting corresponding face information while collecting the non-sign language word data; then extracting feature data from the face information; and finally synthesizing an output face image from the feature data when the translation result is output.
17. The method of sign language translation using an intermediate mode language according to claim 15, wherein: the specific method for synthesizing the sign language word data into the sign language image information comprises the following steps:
establishing a virtual human by adopting a VRML human body representation model;
determining the angle value of each degree of freedom of the virtual human;
calculating the position and the direction of each limb of the virtual human, and determining a gesture of the virtual human;
ignoring the non-upper extremity joint angle for that pose;
and continuously displaying each gesture in one sign language motion according to a specified time interval to generate a corresponding sign language motion image.
18. The method of claim 17, wherein the method comprises: when generating the sign language moving image, further performing smooth interpolation between adjacent frames of the sign language moving image; the specific interpolation is calculated according to the following formula:
G_i(t_{f'}) = G_i(t_1) + ((t_{f'} - t_{f1}) / (t_{f2} - t_{f1})) · (G_i(t_{f2}) - G_i(t_{f1}))
wherein,
f1 and f2 are two adjacent image frames in a sign language movement, respectively;
t_{f1} and t_{f2} are the time values of f1 and f2 from the starting point, respectively;
t_{f'} is the time value of the inserted frame from the starting point;
t_1 is the time value of the starting point;
G_i(t_{f'}) is the function value of the interpolated degree-of-freedom curve;
G_i(t_1) is the degree-of-freedom curve function value at the starting point;
G_i(t_{f1}) is the degree-of-freedom curve function value of f1;
G_i(t_{f2}) is the degree-of-freedom curve function value of f2.
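A small numeric sketch of the frame interpolation in claim 18, with made-up degree-of-freedom values and times and assuming t_1 coincides with t_{f1}:

    def interpolate_dof(g_f1, g_f2, t_f1, t_f2, t_new):
        """Linearly interpolate one degree-of-freedom value at time t_new
        between two adjacent key frames f1 (value g_f1 at t_f1) and
        f2 (value g_f2 at t_f2)."""
        alpha = (t_new - t_f1) / (t_f2 - t_f1)
        return g_f1 + alpha * (g_f2 - g_f1)

    # Elbow angle 30 deg at t=0.0 s and 90 deg at t=0.2 s: value at t=0.05 s
    print(interpolate_dof(30.0, 90.0, 0.0, 0.2, 0.05))  # -> 45.0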
19. The method of claim 17, wherein the method comprises: when generating the sign language moving image, further adopting a quaternion-based motion interpolation method to carry out smooth transition processing on the complex joints in discontinuous sign language frames; the specific smoothing is calculated according to the following formula:
q_{t_{f'}} = (sin((1 - t_{f'}) · θ) / sin θ) · q_{f1} + (sin(t_{f'} · θ) / sin θ) · q_{f2}
wherein,
f1 and f2 are the image frames of two adjacent gestures in a sign language movement, respectively;
t_{f1} and t_{f2} are the time values of f1 and f2 from the starting point, respectively;
q_{f1} and q_{f2} are the orientations of the joint at times t_{f1} and t_{f2}, respectively;
t_{f'} is the time value of the inserted frame from the starting point;
θ is determined by q_{f1} · q_{f2} = cos θ.
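The smoothing in claim 19 is the standard spherical linear interpolation of unit quaternions; the sketch below assumes the interpolation parameter is normalized to [0, 1], and the sample quaternions are arbitrary:

    import numpy as np

    def slerp(q1, q2, t):
        """Spherical linear interpolation between unit quaternions q1 and q2.

        t in [0, 1]; q1, q2 as arrays (w, x, y, z). Falls back to linear
        interpolation when the quaternions are nearly parallel.
        """
        q1 = q1 / np.linalg.norm(q1)
        q2 = q2 / np.linalg.norm(q2)
        dot = np.dot(q1, q2)
        if dot < 0.0:                 # take the shorter arc
            q2, dot = -q2, -dot
        if dot > 0.9995:              # nearly parallel: avoid dividing by sin(theta) ~ 0
            q = q1 + t * (q2 - q1)
            return q / np.linalg.norm(q)
        theta = np.arccos(dot)        # q1 . q2 = cos(theta)
        return (np.sin((1.0 - t) * theta) * q1 + np.sin(t * theta) * q2) / np.sin(theta)

    # Interpolate halfway between identity and a 90-degree rotation about Z
    q_a = np.array([1.0, 0.0, 0.0, 0.0])
    q_b = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
    print(slerp(q_a, q_b, 0.5))       # ~45-degree rotation about Z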
20. The method of claim 16, wherein: extracting the feature data from the face information at least comprises: detection of front-face features and detection of side-face feature points; wherein,
the detection of the front-face features at least comprises: coarse positioning of the facial features, detection of key feature points, and detection of feature shapes based on a deformation template;
the detection of the side-face feature points at least comprises: extraction of the side contour line of the face and detection of the side-face feature points.
21. The method of claim 20, wherein the method comprises: the coarse positioning of the facial features is as follows: the method comprises the steps of firstly positioning the position of an iris, and then obtaining position data of other organs of a human face according to position data of the center point of the iris, statistical prior data of the structure of the facial organs and the gray level distribution characteristics of the facial organs.
22. The method of claim 20, wherein: the detection of the key feature points of the face is as follows: acquiring the main feature points on the eye corner points, the mouth corner points and the chin curve as initial values of the corresponding organ template parameters; specifically comprising: detecting eye key points, mouth key points and chin key points; wherein: the eye key points comprise the left and right eye corner points and the boundary points of the upper and lower eyelids; the mouth key points comprise the two mouth corner points, the highest point of the upper lip and the lowest point of the lower lip; and the chin key points comprise the intersections of the chin with the extension lines of the left and right mouth corners, with the vertical line through the middle lip point, with the vertical lines through the left and right mouth corner points, with the straight line through the left mouth corner point at 45 degrees down-left, and with the straight line through the right mouth corner point at 45 degrees down-right.
23. The method of claim 20, wherein the method comprises: the characteristic shape detection based on the deformed template comprises the following steps: detecting the characteristic shape of the eye area to obtain eye template parameters; detecting the shape of the mouth to obtain mouth template parameters; and detecting the shape of the chin to obtain parameters of the chin template.
24. The method of claim 20, wherein the method comprises: the extraction of the side contour line of the human face is as follows: segmenting a face region by using the skin color characteristics of the face; then edge detection is adopted, and contour lines are positioned according to prior data of the face contour.
25. The method of claim 20, wherein the method comprises: the detection of the face side characteristic points is as follows: dividing the contour line of the human face into an upper section and a lower section by taking the nose tip point as a boundary; and obtaining an approximate function expression of the contour line through curve fitting, calculating a point with a first derivative of the function being zero, and taking the point as a face side characteristic point.
26. The method of claim 16, wherein the method comprises: the method for synthesizing and outputting the face image by using the feature data comprises the following steps: defining more than one characteristic point on the face model, and taking the characteristic points as deformation parameters of a general face model; integrally transforming a general face neutral model to modify the whole outline of the face so as to match the whole outline with the positions of the specific face shape and five sense organs; and obtaining a neutral three-dimensional face mesh body with specific face features.
27. A method for sign language translation using an intermediate mode language according to claim 16 or 26, wherein: the method for synthesizing and outputting the face image by using the feature data further comprises the following steps: establishing a parameterized lip movement model according to the lip characteristic data, and finally synthesizing a corresponding mouth shape according to the lip movement model and the corresponding mouth shape of the language pronunciation; the method specifically comprises the following steps:
the lip model adopts two parabolas to fit an upper lip line, and one parabola is used to fit a lower lip line; selecting two mouth corner points, the highest points of two upper lip parabolas, the lowest point of a lower lip parabola and the intersection point of the two upper lip parabolas, adding two points on the lower lip parabola, adding two points on the upper lip parabola, and adding three groups of two coincident points on the connecting line of the two mouth corner points; when opening the mouth, the coincident points are separated and become points on the parabola of the upper and lower inner lips; and, the parabolic equation of the outer contour of the lip satisfies:
y = a(x - b)^2 + c
wherein a, b and c are obtained by substituting the coordinate values of the lip line points into the equation and solving.
CN 02121369 2002-06-17 2002-06-17 Method of hand language translation through a intermediate mode language Expired - Fee Related CN1246793C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 02121369 CN1246793C (en) 2002-06-17 2002-06-17 Method of hand language translation through a intermediate mode language

Publications (2)

Publication Number Publication Date
CN1464433A CN1464433A (en) 2003-12-31
CN1246793C true CN1246793C (en) 2006-03-22

Family

ID=29742946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 02121369 Expired - Fee Related CN1246793C (en) 2002-06-17 2002-06-17 Method of hand language translation through a intermediate mode language

Country Status (1)

Country Link
CN (1) CN1246793C (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100332229A1 (en) * 2009-06-30 2010-12-30 Sony Corporation Apparatus control based on visual lip share recognition
CN102737397B (en) * 2012-05-25 2015-10-07 北京工业大学 What map based on motion excursion has rhythm head movement synthetic method
CN106203235B (en) * 2015-04-30 2020-06-30 腾讯科技(深圳)有限公司 Living body identification method and apparatus
CN108629241B (en) * 2017-03-23 2022-01-14 华为技术有限公司 Data processing method and data processing equipment
CN108766434B (en) * 2018-05-11 2022-01-04 东北大学 Sign language recognition and translation system and method
CN109166409B (en) * 2018-10-10 2021-02-12 长沙千博信息技术有限公司 Sign language conversion method and device
WO2021218750A1 (en) * 2020-04-30 2021-11-04 Guangdong Oppo Mobile Telecommunications Corp., Ltd. System and method for translating sign language

Also Published As

Publication number Publication date
CN1464433A (en) 2003-12-31

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20060322

Termination date: 20200617
