CN109460748B - Three-dimensional visual sign language recognition device and multi-information fusion sign language recognition method - Google Patents

Three-dimensional visual sign language recognition device and multi-information fusion sign language recognition method

Info

Publication number
CN109460748B
CN109460748B
Authority
CN
China
Prior art keywords
sign language
camera device
monocular
image pickup
gesture
Prior art date
Legal status: Active
Application number
CN201811501014.6A
Other languages
Chinese (zh)
Other versions
CN109460748A (en)
Inventor
张晓利
刘欢
邹亚男
Current Assignee
Inner Mongolia University of Science and Technology
Original Assignee
Inner Mongolia University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Inner Mongolia University of Science and Technology filed Critical Inner Mongolia University of Science and Technology
Priority to CN201811501014.6A
Publication of CN109460748A
Application granted
Publication of CN109460748B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00Teaching, or communicating with, the blind, deaf or mute
    • G09B21/009Teaching or communicating with deaf persons

Abstract

The invention discloses a three-dimensional visual sign language recognition device and a multi-information fusion sign language recognition method, and relates to the technical field of intelligent sign language translation. It addresses two problems: the visual blind area of binocular equipment, and the inability of a standalone gesture recognition scheme to accurately capture the information a deaf-mute's sign language is meant to convey. The invention adds a monocular camera device on the vertical plane of the binocular equipment to collect information inside the visual blind area, which serves as complementary data for that blind area. A combined sign language is defined jointly by the expression of the sign language speaker, the relative position of the hands on the body during sign language communication, and the recognized sign language gesture; defining sign languages by combination increases the number of recognizable gestures and enriches the sign language library. At the same time, the combined elements refine each sign language, improving its specificity, avoiding judgment errors caused by similar gestures, and improving the accuracy of sign language recognition.

Description

Three-dimensional visual sign language recognition device and multi-information fusion sign language recognition method
Technical Field
The invention relates to the technical field of intelligent translation of sign language, in particular to a three-dimensional visual sign language recognition device and a multi-information fusion sign language recognition method.
Background
According to national statistics, among the five major categories of disability in China (hearing, visual, limb, intellectual disability and so on), deaf-mutes are the most numerous: roughly one person in every hundred is a deaf-mute, and about 800,000 of them are under 7 years old. According to the latest data, deaf-mute newborns account for about 0.02% of births in China each year, and newborns with hearing disorders are estimated at about 1%. In daily life, deaf-mutes are a group whose communication with the outside world is obstructed, and that obstruction largely determines their future from birth; the circles they can live in and the environments they can communicate in are confined to the world of the deaf-mute. For example, when they go to a hospital unaccompanied, the barrier to spoken expression and the absence of sign language interpretation limit how they can describe their illness to a doctor; at a service window such as a bank, they cannot communicate with the clerk. Sign language, a highly structured system of gestures, is an indispensable means of daily communication for the deaf-mute. Sign language recognition is an important component of the field of human-computer interaction, and its research and implementation have great academic value and broad application prospects.
Stereoscopic vision is an important branch of computer vision, and binocular stereoscopic vision is one of its sub-branches: two or more images of a measured object are acquired from different positions by imaging equipment, the positional deviation between corresponding image points is computed, three-dimensional information of points on the object's surface is obtained by the triangulation principle, and finally the object's shape or digital surface topography is reconstructed. A conventional binocular device, however, suffers from a visual blind area: as shown in fig. 1, when the recognition plane is perpendicular to the plane of the binocular cameras, parts of the object occlude one another and the blind-area portion cannot be recognized accurately. A standalone gesture recognition scheme therefore cannot accurately capture the information a deaf-mute's sign language is meant to convey, which lowers the accuracy of sign language recognition.
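The triangulation principle described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the focal length, baseline and pixel positions below are made-up values, since the patent gives no camera parameters.

```python
# Depth from binocular disparity via the triangulation principle: Z = f * B / d,
# where f is the focal length (pixels), B the baseline between the two cameras,
# and d the horizontal disparity between matching points in the left and right
# images. A disparity of zero corresponds to an unmatched or blind-area point.

def depth_from_disparity(focal_px, baseline_mm, disparity_px):
    """Return the depth (mm) of a point observed with the given disparity."""
    if disparity_px <= 0:
        raise ValueError("point not matched in both views (e.g. blind area)")
    return focal_px * baseline_mm / disparity_px

# A point matched at x_left = 420 px and x_right = 400 px, with a 700 px
# focal length and a 120 mm baseline between the left and right cameras:
depth = depth_from_disparity(700.0, 120.0, 420 - 400)
print(depth)  # 4200.0
```

Points that fall in the blind area produce no usable disparity, which is exactly the gap the side-mounted monocular camera device is meant to fill.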
Disclosure of Invention
The invention aims to provide a three-dimensional visual sign language recognition device and a multi-information fusion sign language recognition method that solve two problems: the visual blind area of binocular equipment, and the inability of a standalone gesture recognition scheme to accurately capture the information a deaf-mute's sign language is meant to convey. To overcome the visual blind area of the binocular equipment, a monocular camera device is added on the vertical plane of the binocular equipment to collect information inside the blind area, which serves as complementary data for that blind area. Because an existing standalone gesture recognition scheme cannot accurately capture the information to be conveyed, a combined sign language is defined jointly by the expression of the sign language speaker, the relative position of the hands on the body during sign language communication, and the recognized sign language gesture; defining sign languages by combination increases the number of recognizable gestures and enriches the sign language library. At the same time, the combined elements refine each sign language, improving its specificity, avoiding judgment errors caused by similar gestures, and improving the accuracy of sign language recognition.
The technical scheme adopted by the invention is as follows: a three-dimensional visual sign language recognition apparatus comprising: the device comprises a visual platform, a left camera device, a right camera device, a monocular adjusting bracket, a monocular camera device and image processing equipment;
the visual platform is arranged horizontally; the left and right camera devices are mounted symmetrically on it, and the monocular adjusting bracket stands on the perpendicular bisector of the line joining the lens centers of the left and right camera devices. The monocular adjusting bracket comprises a supporting rod, a lead screw, lead-screw mounting seats, bearings, a hand crank and a lifting nut. The supporting rod is perpendicular to the visual platform; a lifting limit groove runs along its length on its front face, and a height scale is marked on its side face. The lead screw is mounted parallel to and in front of the supporting rod, its upper and lower ends held rotatably in lead-screw mounting seats containing bearings, with the hand crank fitted to its upper end. The lifting nut rides on the lead screw: its rear face carries a lifting clamping part that slides vertically in the lifting limit groove, and its front face carries a rotation connecting part on which a horizontal rotating part can swivel left and right; the monocular camera device is mounted horizontally at the front end of the horizontal rotating part. A height pointer on the lifting clamping part points at the height scale. A protractor extends from the bottom face of the lifting nut, parallel to the visual platform, and an angle pointer on the monocular camera device points at the graduations of the protractor. The height pointer sits at the same height as the lens center of the monocular camera device and thus indicates that center height; the angle pointer is parallel to the central axis of the monocular camera lens and thus indicates its horizontal rotation angle. The central axis of the monocular camera lens is perpendicular to the vertical plane in which the left and right camera devices lie.
The left, right and monocular camera devices jointly define a virtual three-dimensional coordinate system: the X axis is the line containing the central axis of the monocular camera lens; the Z axis is the line perpendicular to the visual platform passing through the midpoint of the line joining the lens centers of the left and right camera devices; the Y axis is the line perpendicular to the XZ plane passing through the intersection of the X and Z axes; the intersection of the three axes is the coordinate origin (X0, Y0, Z0). Once established, the virtual coordinate system is never changed, i.e. the positions and angles of the left, right and monocular camera devices stay fixed, and the monocular adjusting bracket is used to calibrate the monocular camera device back to this initial position before each use. A gesture displayed by a deaf-mute can then be judged accurately whenever it reproduces a gesture recorded in the gesture model library, avoiding recognition errors caused by changes in the position or angle of the monocular camera device. In use, turning the hand crank clockwise or anticlockwise raises or lowers the lifting nut; watching the height pointer against the height scale allows the lens center height of the monocular camera device to be adjusted precisely, and watching the angle pointer against the graduations of the protractor allows its horizontal rotation angle to be adjusted precisely.
the left image pickup device, the right image pickup device and the monocular image pickup device are connected with image processing equipment, wherein the left image pickup device and the right image pickup device form a binocular stereoscopic vision system;
the image processing device comprises an image acquisition unit, a three-dimensional modeling unit, a gesture model library, a human face expression library, a human hand and human body relative position library, a combined sign language library, a gesture verification unit, a human face expression verification unit, a human hand position verification unit, a combined sign language verification unit, a sign language conversion unit and a sign language output unit;
the gesture model library stores gesture models, wherein the gesture models are gesture model information acquired and recorded in advance by a left camera device, a right camera device and a monocular camera device, and the gesture model information comprises knuckle coordinates and vector data;
Facial expression pictures are stored in the facial expression library, and relative position pictures of the human hand and body are stored in the relative position library. Combined sign languages are stored in the combined sign language library, each defined jointly by a gesture model, a facial expression picture and a picture of the relative position of the human hand and body.
Further, the image processing device is a PC.
Further, the sign language conversion unit converts the combined sign language into characters and sends the characters to the sign language output unit for output; the sign language output unit is a text display.
Further, the sign language conversion unit converts the combined sign language into voice and sends the voice to the sign language output unit for output; the sign language output unit is a voice player.
Further, the three-dimensional modeling unit adopts the OpenCV computer vision library and establishes the gesture model from depth distance information: the left and right camera devices acquire, from below, the depth distance from the human hand to themselves, and the monocular camera device acquires, from the side, the depth distance from the human hand to itself.
A multi-information fusion sign language recognition method of a three-dimensional visual sign language recognition device is characterized by comprising the following steps:
(1) Initial position calibration: the height and the angle of the monocular camera device are adjusted through the monocular adjusting bracket, so that the monocular camera device meets the requirement of establishing a virtual three-dimensional coordinate system;
(2) Gesture models are entered in advance through the left, right and monocular camera devices to build the gesture model library: a gesture is made inside the virtual three-dimensional coordinate system and held while the three camera devices acquire continuously for 10-15 minutes. The left and right camera devices form a binocular stereoscopic vision system and obtain, from below, the depth distance from the human hand to themselves; the monocular camera device obtains, from the side, the depth distance from the human hand to itself, complementing the blind-area data of the binocular system, so that relatively complete gesture information is extracted and the binocular blind area is largely or entirely eliminated. The data are entered into a MySQL database using the Java programming language, completing the gesture model entry; the entered gesture model information contains the coordinate and vector data of the finger joints in the virtual three-dimensional coordinate system;
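Step (2) stores knuckle coordinates and vectors in a database. The patent uses Java and MySQL; the sketch below substitutes Python's built-in sqlite3 so the idea is runnable, and the table name, columns and coordinate values are all hypothetical.

```python
import sqlite3

# Stand-in for the patent's MySQL gesture model table: each row holds one
# knuckle's coordinates in the virtual three-dimensional coordinate system
# plus the vector toward the next knuckle (direction and extent of the joint).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE gesture_model (
    gesture TEXT, knuckle INTEGER,
    x REAL, y REAL, z REAL,         -- knuckle coordinate
    vx REAL, vy REAL, vz REAL       -- vector toward the next knuckle
)""")

# Two thumb knuckles of a hypothetical "right hand vertical thumb" gesture.
rows = [
    ("thumb_up", 0, 1.0, 2.0, 3.0, 0.0, 0.0, 2.5),
    ("thumb_up", 1, 1.0, 2.0, 5.5, 0.0, 0.0, 0.0),
]
conn.executemany("INSERT INTO gesture_model VALUES (?,?,?,?,?,?,?,?)", rows)

stored = conn.execute(
    "SELECT knuckle, x, y, z FROM gesture_model WHERE gesture=? ORDER BY knuckle",
    ("thumb_up",)).fetchall()
print(stored)  # [(0, 1.0, 2.0, 3.0), (1, 1.0, 2.0, 5.5)]
```

Holding the gesture for the stated 10-15 minutes would yield many such samples per knuckle; averaging them before insertion is one plausible way to stabilize the stored model.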
(3) Storing the facial expression pictures into a facial expression library, and storing the relative position pictures of the human hands and the human bodies into a relative position library of the human hands and the human bodies; the method comprises the steps that a combined sign language library stores combined sign languages, and each combined sign language is defined by a gesture model, a facial expression picture and a picture of the relative positions of a human hand and a human body;
(4) The deaf-mute faces the monocular camera device and stretches the hands into the virtual three-dimensional coordinate system to express sign language;
(5) The left camera device and the right camera device form a binocular stereoscopic vision system, and depth distance information from a human hand to the left camera device and the right camera device is obtained from bottom to top; the monocular camera device obtains depth distance information from a human hand to the monocular camera device from the side surface and is used as data complement of blind areas of the binocular stereoscopic vision system, so that relatively complete gesture information is extracted, and the binocular vision blind areas are eliminated to a great extent or completely; in addition, the monocular camera device also collects the relative position picture of the human hand and the human body and the facial expression picture of the human face;
(6) The image acquisition unit receives information acquired by the left camera device, the right camera device and the monocular camera device at the same moment and respectively sends the information to the three-dimensional modeling unit, the facial expression verification unit and the hand position verification unit;
the three-dimensional modeling unit establishes a gesture model from the depth distance information acquired by the three camera devices and sends it to the gesture verification unit. The gesture verification unit retrieves the gesture models stored in the gesture model library, matches their finger-joint coordinate and vector data against the newly built model, confirms the model with the highest matching degree, and sends it to the combined sign language verification unit. The facial expression verification unit retrieves the facial expression pictures stored in the facial expression library, performs feature matching against the facial expression picture acquired by the monocular camera device, confirms the picture with the highest matching degree, and sends it to the combined sign language verification unit. The hand position verification unit retrieves the hand/body relative-position pictures stored in the relative position library, performs feature matching against the relative-position picture acquired by the monocular camera device, confirms the picture with the highest matching degree, and sends it to the combined sign language verification unit;
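One simple way to realize the "highest matching degree" selection in the gesture verification step is a nearest-model search over knuckle coordinates. The patent does not specify the metric; the sketch below assumes a summed Euclidean distance, and all coordinates are illustrative.

```python
import math

# Compare a freshly built gesture model's knuckle coordinates against each
# stored model; the model with the smallest total distance is the best match.

def match_score(model_a, model_b):
    """Sum of knuckle-to-knuckle Euclidean distances; lower is better."""
    return sum(math.dist(p, q) for p, q in zip(model_a, model_b))

library = {
    "thumb_up":  [(0, 0, 0), (0, 0, 3)],   # hypothetical stored knuckles
    "flat_palm": [(0, 0, 0), (3, 0, 0)],
}
observed = [(0.1, 0.0, 0.0), (0.0, 0.1, 2.9)]  # noisy capture of thumb_up

best = min(library, key=lambda name: match_score(library[name], observed))
print(best)  # thumb_up
```

Matching the stored joint vectors as well (direction and length, not just positions) would make the comparison robust to the hand being offered at a slightly different place in the coordinate system.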
the combined sign language verification unit retrieves the combined sign language stored in the combined sign language library for comparison, confirms the combined sign language, sends the combined sign language to the sign language conversion unit for conversion, and sends the converted combined sign language to the sign language output unit for output.
The invention has the beneficial effects that it provides a three-dimensional visual sign language recognition device and a multi-information fusion sign language recognition method that solve two problems: the visual blind area of binocular equipment, and the inability of a standalone gesture recognition scheme to accurately capture the information a deaf-mute's sign language is meant to convey. To overcome the visual blind area of the binocular equipment, a monocular camera device is added on the vertical plane of the binocular equipment to collect information inside the blind area, which serves as complementary data for that blind area. Because an existing standalone gesture recognition scheme cannot accurately capture the information to be conveyed, a combined sign language is defined jointly by the expression of the sign language speaker, the relative position of the hands on the body during sign language communication, and the recognized sign language gesture; defining sign languages by combination increases the number of recognizable gestures and enriches the sign language library. At the same time, the combined elements refine each sign language, improving its specificity, avoiding judgment errors caused by similar gestures, and improving the accuracy of sign language recognition.
Drawings
Fig. 1 is a schematic diagram of a conventional binocular apparatus having a visual blind area limitation.
Fig. 2 is a schematic perspective view of a three-dimensional visual sign language recognition device according to the present invention.
Fig. 3 is a schematic view of the monocular adjusting bracket structure of the present invention.
Fig. 4 is a schematic view of a lifting nut according to the present invention.
Fig. 5 is a schematic view of an angle pointer structure according to the present invention.
Fig. 6 is a flow chart of a multi-information fusion sign language recognition method of the present invention.
FIG. 7 is a schematic representation of the expression of "good" combination sign language in an embodiment of the invention.
FIG. 8 is a schematic diagram of gesture model information of a "right hand vertical thumb gesture" entered in an embodiment of the invention.
In the figure: the visual platform 1, the left camera device 2, the right camera device 3, the monocular adjusting bracket 4, the monocular camera device 5, the supporting rod 6, the lead screw 7, the lead screw mounting seat 8, the bearing 9, the hand crank 10, the lifting nut 11, the lifting limit groove 6-1, the height scale 6-2, the lifting clamping part 11-1, the rotating connecting part 11-2, the horizontal rotating part 11-3, the height pointer 11-4, the protractor 11-5 and the angle pointer 5-1.
Detailed Description
For a clearer understanding of the technical solutions, objects and effects of the present invention, specific embodiments of the present invention will now be described with reference to the accompanying drawings.
In an embodiment, as shown in fig. 7, the combination of "right hand vertical thumb gesture", "facial smiling expression" and "gesture in front of the left chest" is defined as a combined sign language meaning "good"; the three-dimensional visual sign language recognition device and the multi-information fusion sign language recognition method provided by the invention are applied to judge and recognize it;
(1) Initial position calibration: the height and angle of the monocular camera device are adjusted by means of the monocular adjusting bracket so that it satisfies the requirements of the virtual three-dimensional coordinate system. The left, right and monocular camera devices jointly define that system: the X axis is the line containing the central axis of the monocular camera lens; the Z axis is the line perpendicular to the visual platform passing through the midpoint of the line joining the lens centers of the left and right camera devices; the Y axis is the line perpendicular to the XZ plane passing through the intersection of the X and Z axes; the intersection of the three axes is the coordinate origin (X0, Y0, Z0). Once established, the coordinate system is never changed, i.e. the positions and angles of the three camera devices stay fixed, and the monocular adjusting bracket is used to calibrate the monocular camera device back to this initial position before each use. When a gesture is recognized, the gesture displayed by the deaf-mute can be judged accurately as long as it reproduces a gesture recorded in the gesture model library, avoiding recognition errors caused by changes in the position or angle of the monocular camera device;
(2) The gesture model of the "right hand vertical thumb gesture" is entered in advance through the left, right and monocular camera devices and stored in the gesture model library: the gesture is made inside the virtual three-dimensional coordinate system and held while the three camera devices acquire continuously for 10-15 minutes. The left and right camera devices form a binocular stereoscopic vision system and obtain, from below, the depth distance from the human hand to themselves; the thumb lies in the binocular visual blind area, so the monocular camera device obtains, from the side, the depth distance from the thumb to itself, complementing the blind-area data of the binocular system, so that relatively complete gesture information is extracted and the binocular blind area is largely or entirely eliminated. The data are entered into a MySQL database using the Java language, completing the gesture model entry. The entered gesture model information contains the coordinate and vector data of the finger joints in the virtual three-dimensional coordinate system; as shown in fig. 8, two coordinates (X1, Y1, Z1) and (X2, Y2, Z2) together with vector data represent the direction and length of the front joint of the thumb;
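The vector data fig. 8 pairs with the two coordinates can be reconstructed directly from them. The sketch below uses made-up values for (X1, Y1, Z1) and (X2, Y2, Z2), since the patent gives none.

```python
import math

# The front joint of the thumb is described by two knuckle coordinates plus
# a vector giving the joint's direction and length; the vector is simply the
# difference of the two coordinates, its length the Euclidean norm.

p1 = (10.0, 4.0, 20.0)   # (X1, Y1, Z1), hypothetical values
p2 = (10.0, 4.0, 23.0)   # (X2, Y2, Z2)

vector = tuple(b - a for a, b in zip(p1, p2))   # direction of the joint
length = math.sqrt(sum(c * c for c in vector))  # length of the joint

print(vector, length)  # (0.0, 0.0, 3.0) 3.0
```

Here the joint points purely along the Z axis, i.e. straight up from the visual platform, which is consistent with a vertical thumb.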
(3) Storing the picture of the smiling expression of the face into a facial expression library, and storing the picture of the gesture in front of the left chest into a relative position library of the hand and the human body; the gesture model of the right hand vertical thumb gesture, the picture of the smiling expression of the face, the picture of the gesture in front of the left chest are jointly defined as the meaning of good, and the meaning is stored in a combined sign language library;
(4) The deaf-mute faces the monocular camera device, stretches a hand into the virtual three-dimensional coordinate system, makes the "right hand vertical thumb gesture" in front of the left chest, and at the same time makes the "facial smiling expression";
(5) The left camera device and the right camera device form a binocular stereoscopic vision system, and depth distance information from a human hand to the left camera device and the right camera device is obtained from bottom to top; the monocular camera device acquires depth distance information from a human hand to the monocular camera device from the side surface, and the depth distance information is used as data complement of blind areas of the binocular stereoscopic vision system, and in addition, the monocular camera device also acquires gesture pictures of right hand vertical thumb gesture positioned in front of left chest and pictures of facial smile expression;
(6) The image acquisition unit receives information acquired by the left camera device, the right camera device and the monocular camera device at the same moment and respectively sends the information to the three-dimensional modeling unit, the facial expression verification unit and the hand position verification unit;
the three-dimensional modeling unit establishes a gesture model of 'right hand vertical thumb gesture' according to depth distance information acquired by the three camera devices, and sends the established gesture model to the gesture verification unit; the gesture verification unit retrieves a gesture model which is input into the gesture model library in advance, performs finger joint coordinate and vector data matching with the built gesture model, confirms the gesture model with the highest matching degree, and sends the confirmed gesture model with the highest matching degree to the combined sign language verification unit; the three-dimensional modeling unit can adopt an Opencv computer vision library, and establishes a gesture model according to depth distance information from a human hand to the left image pickup device and the right image pickup device, wherein the depth distance information is acquired from the left image pickup device and the right image pickup device from bottom to top, and the depth distance information from the human hand to the monocular image pickup device is acquired from the side surface by the monocular image pickup device; the matching of the finger joint coordinates and the vector data is realized through computer programming;
the facial expression verification unit retrieves the facial expression pictures stored in the facial expression library, performs feature matching against the facial expression picture acquired by the monocular camera device, confirms the picture with the highest matching degree, and sends it to the combined sign language verification unit. The matching can be programmed with the existing "perceptual hash" algorithm for similar-picture retrieval: a fingerprint string is generated for each image, and the fingerprints of different images are compared; the closer the fingerprints, the more similar the images. As this is prior art, it is not described in detail here;
the hand position verification unit retrieves the hand-to-body relative position pictures stored in the hand and human body relative position library, performs feature matching against the relative position picture acquired by the monocular camera device, confirms the picture with the highest matching degree, and sends it to the combined sign language verification unit; the prior-art 'perceptual hash algorithm' can likewise be used to implement this recognition;
the combined sign language verification unit retrieves the combined sign languages stored in the combined sign language library for comparison and confirms the combined sign language that simultaneously satisfies 'right-hand thumbs-up gesture', 'smiling facial expression' and 'gesture positioned in front of the left chest', i.e. the combined sign language meaning 'good'; it sends this combined sign language to the sign language conversion unit, which converts it into text or voice and sends the result to the sign language output unit for output; the sign language output unit uses a text display and a voice player; the comparison and confirmation of the combined sign language are realized through computer programming.
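The simultaneous three-way comparison described above amounts to a lookup keyed on the (gesture, expression, hand position) triple confirmed by the three verification units. A minimal sketch, with all entries and names invented for illustration:

```python
# Each combined sign is keyed by the triple confirmed by the three
# verification units: (gesture model, facial expression, hand position).
COMBINED_SIGNS = {
    ("right_hand_thumbs_up", "smiling", "front_of_left_chest"): "good",
    ("right_hand_thumbs_up", "neutral", "front_of_left_chest"): "agree",
}

def confirm_combined_sign(gesture: str, expression: str, position: str):
    """Return the combined sign matching all three cues, or None if absent."""
    return COMBINED_SIGNS.get((gesture, expression, position))

print(confirm_combined_sign("right_hand_thumbs_up", "smiling",
                            "front_of_left_chest"))  # → good
```

The confirmed string would then be handed to the sign language conversion unit for text display or speech synthesis.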
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the foregoing embodiments may be modified, or some of their features replaced by equivalents; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention is intended to fall within its scope of protection.

Claims (6)

1. A three-dimensional visual sign language recognition apparatus, comprising: a visual platform, a left camera device, a right camera device, a monocular adjusting bracket, a monocular camera device and image processing equipment;
the visual platform is horizontally arranged, the left camera device and the right camera device are symmetrically mounted on it, and the monocular adjusting bracket is arranged on the perpendicular bisector of the line connecting the lens centers of the left and right camera devices; the monocular adjusting bracket comprises a support rod, a screw rod, screw rod mounting seats, bearings, a hand crank and a lifting nut; the support rod is arranged perpendicular to the visual platform, a lifting limit groove running along the length direction of the rod is formed in its front face, and a height scale is arranged on the side surface of the lifting limit groove; the screw rod is arranged in front of the support rod and parallel to it, its upper and lower ends are rotatably mounted in screw rod mounting seats fitted with bearings, and the hand crank is arranged at its upper end; the lifting nut is mounted on the screw rod, a lifting clamping part on its rear side can slide vertically within the lifting limit groove, a rotation connecting part on its front side carries a horizontal rotating part that can rotate horizontally left and right, and the monocular camera device is horizontally mounted at the front end of the horizontal rotating part; a height pointer on the lifting clamping part points to the height scale; a protractor extends from the bottom surface of the lifting nut parallel to the visual platform, and an angle pointer on the monocular camera device points to the scale marks on the protractor; the height pointer is at the same height as the center of the lens of the monocular camera device and indicates that lens center height; the angle pointer is parallel to the central axis of the lens of the monocular camera device and indicates its horizontal rotation angle; the central axis of the lens of the monocular camera device is perpendicular to the vertical plane in which the left and right camera devices lie; the left camera device, the right camera device and the monocular camera device jointly establish a virtual three-dimensional coordinate system, in which the straight line along the central axis of the lens of the monocular camera device is the X-axis, the straight line perpendicular to the visual platform and passing through the midpoint of the line connecting the lens centers of the left and right camera devices is the Z-axis, and the straight line perpendicular to the XZ plane and intersecting both the X-axis and the Z-axis is the Y-axis; the common intersection point of the X-, Y- and Z-axes is the coordinate origin (X0, Y0, Z0);
The left image pickup device, the right image pickup device and the monocular image pickup device are connected with image processing equipment, wherein the left image pickup device and the right image pickup device form a binocular stereoscopic vision system;
the image processing device comprises an image acquisition unit, a three-dimensional modeling unit, a gesture model library, a human face expression library, a human hand and human body relative position library, a combined sign language library, a gesture verification unit, a human face expression verification unit, a human hand position verification unit, a combined sign language verification unit, a sign language conversion unit and a sign language output unit;
the gesture model library stores gesture models, wherein the gesture models are gesture model information acquired and recorded in advance by a left camera device, a right camera device and a monocular camera device, and the gesture model information comprises knuckle coordinates and vector data;
storing facial expression pictures in a facial expression library; storing relative position pictures of the human hand and the human body in a relative position library of the human hand and the human body; and storing combined sign language in a combined sign language library, wherein each combined sign language is defined by a gesture model, a facial expression picture and a picture of the relative position of a human hand and a human body.
2. The three-dimensional visual sign language recognition apparatus according to claim 1, wherein the image processing device is a PC.
3. The three-dimensional visual sign language recognition apparatus according to claim 1, wherein the sign language conversion unit converts the combined sign language into text and sends the text to the sign language output unit for output; the sign language output unit is a text display.
4. The three-dimensional visual sign language recognition apparatus according to claim 1, wherein the sign language conversion unit converts the combined sign language into voice and sends the voice to the sign language output unit for output; the sign language output unit is a voice player.
5. The three-dimensional visual sign language recognition apparatus according to claim 1, wherein the three-dimensional modeling unit uses the OpenCV computer vision library to build the gesture model from the depth distance information from the human hand to the left and right camera devices, acquired by the left and right camera devices from below, and the depth distance information from the human hand to the monocular camera device, acquired by the monocular camera device from the side.
6. A multi-information fusion sign language recognition method using the three-dimensional visual sign language recognition apparatus according to claim 1, comprising the steps of:
(1) Initial position calibration: the height and the angle of the monocular camera device are adjusted through the monocular adjusting bracket, so that the monocular camera device meets the requirement of establishing a virtual three-dimensional coordinate system;
(2) Gesture models are entered in advance through the left camera device, the right camera device and the monocular camera device to establish the gesture model library: a gesture is made within the virtual three-dimensional coordinate system and held while the left camera device, the right camera device and the monocular camera device acquire continuously for 10-15 minutes; the left and right camera devices form a binocular stereoscopic vision system and obtain the depth distance information from the human hand to the left and right camera devices from below; the monocular camera device obtains the depth distance information from the human hand to the monocular camera device from the side; the data are entered into a MySQL database using the Java language to complete the entry of the gesture model; the entered gesture model information comprises the coordinates and vector data of the finger joints in the virtual three-dimensional coordinate system;
(3) Storing the facial expression pictures into a facial expression library, and storing the relative position pictures of the human hands and the human bodies into a relative position library of the human hands and the human bodies; the method comprises the steps that a combined sign language library stores combined sign languages, and each combined sign language is defined by a gesture model, a facial expression picture and a picture of the relative positions of a human hand and a human body;
(4) The deaf-mute user faces the monocular camera device and extends the hands into the virtual three-dimensional coordinate system to express sign language;
(5) The left camera device and the right camera device form a binocular stereoscopic vision system and obtain the depth distance information from the human hand to the left and right camera devices from below; the monocular camera device obtains the depth distance information from the human hand to the monocular camera device from the side and uses it to complement data in the blind areas of the binocular stereoscopic vision system; in addition, the monocular camera device also collects the picture of the relative position of the human hand and the human body and the facial expression picture;
(6) The image acquisition unit receives information acquired by the left camera device, the right camera device and the monocular camera device at the same moment and respectively sends the information to the three-dimensional modeling unit, the facial expression verification unit and the hand position verification unit;
the three-dimensional modeling unit establishes a gesture model according to depth distance information acquired by the three camera devices, and sends the established gesture model to the gesture verification unit; the gesture verification unit retrieves a gesture model stored in advance in the gesture model library, performs finger joint coordinate and vector data matching with the built gesture model, confirms the gesture model with the highest matching degree, and sends the gesture model to the combined sign language verification unit;
the facial expression verification unit is used for calling facial expression pictures stored in the facial expression library, carrying out feature matching on the facial expression pictures and facial expression pictures acquired by the monocular camera device, confirming the facial expression picture with the highest matching degree, and sending the facial expression picture to the combined sign language verification unit;
the hand position verification unit invokes the hand and human body relative position pictures stored in the hand and human body relative position library, performs feature matching on the hand and human body relative position pictures acquired by the monocular camera device, confirms the hand and human body relative position picture with the highest matching degree, and sends the hand and human body relative position picture to the combined sign language verification unit;
the combined sign language verification unit retrieves the combined sign language stored in the combined sign language library for comparison, confirms the combined sign language, sends the combined sign language to the sign language conversion unit for conversion, and sends the converted combined sign language to the sign language output unit for output.
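The binocular depth acquisition in steps (2) and (5) rests on standard stereo triangulation for a rectified camera pair, Z = f·B/d (focal length f in pixels, baseline B, disparity d in pixels). A minimal sketch of that relation (the calibration numbers are arbitrary examples, not values from the patent):

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d.

    focal_px:     focal length in pixels (from camera calibration)
    baseline_m:   distance between the left and right lens centers, in meters
    disparity_px: horizontal pixel shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return focal_px * baseline_m / disparity_px

# Example: 700 px focal length, 12 cm baseline, 120 px disparity
print(round(stereo_depth(700.0, 0.12, 120.0), 3))  # → 0.7 (meters)
```

In practice OpenCV's stereo pipeline (calibration, rectification, disparity computation) produces the per-pixel disparity that this formula converts to depth.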
CN201811501014.6A 2018-12-10 2018-12-10 Three-dimensional visual sign language recognition device and multi-information fusion sign language recognition method Active CN109460748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811501014.6A CN109460748B (en) 2018-12-10 2018-12-10 Three-dimensional visual sign language recognition device and multi-information fusion sign language recognition method


Publications (2)

Publication Number Publication Date
CN109460748A CN109460748A (en) 2019-03-12
CN109460748B true CN109460748B (en) 2024-03-01

Family

ID=65612920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811501014.6A Active CN109460748B (en) 2018-12-10 2018-12-10 Three-dimensional visual sign language recognition device and multi-information fusion sign language recognition method

Country Status (1)

Country Link
CN (1) CN109460748B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112734712B (en) * 2020-12-31 2022-07-01 武汉第二船舶设计研究所(中国船舶重工集团公司第七一九研究所) Imaging detection method and system for health state of ship vibration equipment
CN113197403B (en) * 2021-05-14 2023-02-17 广州乾睿医疗科技有限公司 Method capable of preventing virus infection and smart bracelet

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101527092A (en) * 2009-04-08 2009-09-09 西安理工大学 Computer assisted hand language communication method under special session context
KR20160109708A (en) * 2015-03-12 2016-09-21 주식회사 디지털스케치 Sign language translator, system and method
CN108960158A (en) * 2018-07-09 2018-12-07 珠海格力电器股份有限公司 A kind of system and method for intelligent sign language translation
CN209980267U (en) * 2018-12-10 2020-01-21 内蒙古科技大学 Three-vision sign language recognition device


Non-Patent Citations (1)

Title
Vision-based gesture recognition technology; Sun Lijuan, Zhang Licai, Guo Cailong; Computer Technology and Development (No. 10); full text *

Also Published As

Publication number Publication date
CN109460748A (en) 2019-03-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant