CN114067433A - Language and image understanding system based on multiple protocols - Google Patents
Language and image understanding system based on multiple protocols
- Publication number
- CN114067433A (application number CN202111325893.3A)
- Authority
- CN
- China
- Prior art keywords
- user
- language
- robot
- image
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 claims abstract description 49
- 230000006870 function Effects 0.000 claims abstract description 26
- 206010011878 Deafness Diseases 0.000 claims abstract description 19
- 230000004044 response Effects 0.000 claims abstract description 7
- 230000008569 process Effects 0.000 claims description 43
- 230000008030 elimination Effects 0.000 claims description 11
- 238000003379 elimination reaction Methods 0.000 claims description 11
- 206010044565 Tremor Diseases 0.000 claims description 9
- 210000001061 forehead Anatomy 0.000 claims description 9
- 230000009471 action Effects 0.000 claims description 6
- 230000008859 change Effects 0.000 claims description 6
- 230000006735 deficit Effects 0.000 claims description 6
- 238000001514 detection method Methods 0.000 claims description 6
- 238000000926 separation method Methods 0.000 claims description 6
- 208000024891 symptom Diseases 0.000 claims description 6
- 238000004891 communication Methods 0.000 claims description 4
- 206010010356 Congenital anomaly Diseases 0.000 claims description 3
- 208000032041 Hearing impaired Diseases 0.000 claims description 3
- 230000004888 barrier function Effects 0.000 claims description 3
- 230000006399 behavior Effects 0.000 claims description 3
- 230000037237 body shape Effects 0.000 claims description 3
- 238000013500 data storage Methods 0.000 claims description 3
- 230000000694 effects Effects 0.000 claims description 3
- 208000016354 hearing loss disease Diseases 0.000 claims description 3
- 230000003993 interaction Effects 0.000 claims description 3
- 238000004519 manufacturing process Methods 0.000 claims description 3
- 238000012544 monitoring process Methods 0.000 claims description 3
- 230000008520 organization Effects 0.000 claims description 3
- 230000000717 retained effect Effects 0.000 claims description 3
- 238000013519 translation Methods 0.000 claims description 3
- 238000010586 diagram Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
Images
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J19/00—Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
- B25J19/02—Sensing devices
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/238—Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
- H04N21/2387—Stream processing in response to a playback request from an end-user, e.g. for trick-play
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Manipulator (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention discloses a language and image understanding system based on multiple protocols, comprising an image collection module, a voice broadcast module and a human body identification module. The human body identification module comprises a limb movement speed analysis unit, a user disability condition analysis unit and a sound and image separation unit. The limb movement speed analysis unit measures the speed at which the user's limb moves while reaching out to touch the robot, in order to judge the user's limb agility; if the agility level is high, the playing speed of the robot's display screen is raised so as to save time and avoid excessive waiting for subsequent deaf-mute users. The user disability condition analysis unit detects whether the user has a hearing impairment, a speech impairment, or both, and the sound and image separation unit starts the voice function and the image function separately according to the user's disability analysis report. The system is highly practical, automatically recognizes sign language, and answers questions in multiple response modes.
Description
Technical Field
The invention relates to the technical field of sign language, in particular to a multi-protocol-based language and image understanding system.
Background
Because deaf-mute people have difficulty communicating and asking passers-by for help is inefficient, a deaf-mute user can turn to a robot for assistance; the robot conducts the question-and-answer exchange on the basis of language and image understanding, which effectively improves the comfort of deaf-mute users in public spaces.
Disclosure of Invention
It is an object of the present invention to provide a multi-protocol based language and image understanding system to solve the problems set forth in the background above.
In order to solve the above technical problems, the invention provides the following technical scheme: a language and image understanding system based on multiple protocols comprises an image collection module, a voice broadcast module and a human body identification module, and is characterized in that the human body identification module comprises a limb movement speed analysis unit, a user disability condition analysis unit and a sound and image separation unit; the limb movement speed analysis unit measures the speed at which the user's limb moves while reaching out to touch the robot, so as to judge the user's limb agility, and if the agility level is high, the display speed of the robot's display screen is raised to save time and avoid excessive waiting for subsequent deaf-mute users; the user disability condition analysis unit detects whether the user has a hearing impairment, a speech impairment, or both; and the sound and image separation unit starts the voice function and the image function separately according to the user's disability analysis report, so as to reduce unnecessary consumption of the robot's battery.
According to the technical scheme, the detection process of the limb movement speed of the user comprises the following steps:
the robot operates in a public area; when a user needing help stands in front of the robot, the robot stops moving immediately, detects the current user, scans the user's height and body shape, calculates the horizontal distance from the body to the robot's display screen and records it as L_horizontal, calculates the distance from the point of the body closest to the display screen to the user's hand and records it as L_vertical, and then obtains the distance L_hand from the user's hand to the display screen by the Pythagorean theorem;
The effective scanning distance of the robot is L_effective. If L_horizontal is greater than L_effective, the robot issues a voice broadcast reminding the user to come closer; if L_horizontal is less than or equal to L_effective, the robot calculates the limb movement speed of the user as V_hand = L_hand / T_contact, where T_contact is the time from the moment the robot stops to the moment the user's limb touches the display screen. A rated user movement speed V_rated is set and divided into six levels V1-V6, where V1 indicates the slowest limb movement and V6 the fastest. V_hand is compared against V_rated to obtain the corresponding voice broadcast speed level and video playing speed level: the voice broadcast speed levels are A1-A6, where A1 is the slowest and A6 the fastest, and the video playing speed levels are B1-B6, where B1 is the slowest and B6 the fastest. In this way the information receiving speed is judged from the user's body movement speed, information is conveyed effectively while the service is personalized, and the user is more comfortable when seeking help.
According to the technical scheme, the process for judging the disability degree of the deaf-mutes comprises the following steps:
a user with hearing impairment only is defined as first-level disability and recorded as a c1 person; a user with speech impairment only is defined as second-level disability and recorded as a c2 person; a user with both hearing and speech impairment is defined as third-level disability and recorded as a c3 person. A prompt is shown on the robot's display screen; if the robot receives no spoken information from the user within 3 seconds, it automatically judges the user to be speech-impaired. At the same time the robot issues a voice broadcast asking the user to perform a sign-language action or tap the display screen to confirm; if the robot receives no such confirmation within 3 seconds, it automatically judges the user to be hearing-impaired. The two judging processes run simultaneously and both finish within 3 seconds, after which the result is analyzed. If the user can speak, the robot receives the voice information and the sound and image separation unit drives both the image playing function and the voice playing function, so the user can obtain effective help information from the image playback; at the same time the fluency of the user's language organization is judged, and the user is reminded to use sign language if the fluency is not up to standard, while no reminder is needed if it is. If the user cannot speak but can hear, the sound and image separation unit still drives both the image playing function and the voice playing function. If the user can neither speak nor hear, the sound and image separation unit drives only the image playing function and the sign-language service function is executed immediately.
According to the above technical scheme, the image collection module comprises a gesture recognition unit, a surrounding-environment interference elimination unit and a shaking amplitude elimination unit; the gesture recognition unit monitors the user's gesture changes in order to recognize the user's intention, the surrounding-environment interference elimination unit shields noise and other dynamic behaviors around the robot to improve the fluency of communication between the user and the robot, and the shaking amplitude elimination unit removes slight shaking during gesture changes to increase the accuracy of sign language recognition.
According to the above technical scheme, the voice broadcast module comprises a voice receiving unit, a voice playing unit and a lip language recognition unit; the voice receiving unit receives the user's voice information, the voice playing unit plays preset voice information to help the user, and the lip language recognition unit provides a lip-reading service for deaf-mute users who cannot use sign language.
According to the technical scheme, the sign language information interaction process comprises the following steps:
the robot scans the user's dynamic gestures, matches the real-time dynamic gestures against the gesture records in the data storage library, translates the meaning of the user's sign language, and answers according to the translated content so as to solve the user's problem. During sign-language translation the system prejudges the meaning of the sentence the user is expressing and displays on the screen the ten sentences closest to that meaning for the user to choose from, which reduces the time the user spends signing; because sign-language actions are complex, offering multiple prejudged choices improves efficiency, reduces errors in the expressed information and increases accuracy. The user selects the sentence closest to the intended meaning from the ten prejudged sentences; if the selection succeeds, the language and image understanding system answers the question. If none of the ten prejudged sentences satisfies the user, the user can tap exit and continue signing, and the gesture recognition unit keeps receiving and translating the sign-language information; whenever the new translation differs greatly from the previous prejudgment, sentence prejudgment is performed again and ten prejudged sentences are offered for selection, repeating until a prejudgment succeeds. If no prejudgment succeeds, the user's sign-language information is translated in full and the answer is given for the complete information.
According to the above technical scheme, the answer flow of the language and image understanding system is as follows:
when answering a question, picture answering, voice broadcasting and image display can be selected; the language and image understanding system makes the selection according to the detection information of the user disability condition analysis unit. For c1 persons, picture answering and image display are available and image display is preferred, because image display conveys information concretely and is easy for the user to understand; for c2 persons, picture answering, voice broadcasting and image display are all available and voice broadcasting is preferred because of its high efficiency; for c3 persons, picture answering and image display are available and image display is preferred. During image display the answer information is presented on the display screen in sign-language form;
while the user is signing, if the user makes no selection during the sentence prejudgment process and the prejudged sentences remain on the display screen for 6 seconds, the user's reading ability is judged to be weak, and the response mode is adjusted accordingly: answers are given preferentially with pictures, and the number of pictures is kept small to avoid misunderstanding. Each answer can use any of the three modes of picture answering, voice broadcasting and image display, and if the preferred mode does not satisfy the user, the user can manually select another answer mode until satisfied.
According to the above technical scheme, the special-case analysis is as follows:
a person who is congenitally deaf will also become mute, because spoken language is acquired after birth and such a person has never been able to hear it; for this case the flow described above is followed. A person who became deaf later in life retains the ability to speak, so a lip-reading mode can be selected on the robot's display screen. The lip language recognition unit locates the lips from the facial information scanned by the human body identification module and dynamically prejudges, from the lip movements, the information the user wants to express. Because lip-reading accuracy is relatively poor, only six prejudged sentences are offered, which narrows the choice and speeds up the user's selection. If no prejudged sentence is selected, lip-reading information continues to be collected and sentence prejudgment is performed again; after two rounds of lip-reading prejudgment the language and image understanding system recommends that the user switch to sign language for semantic output. If a sentence prejudgment succeeds, the answer is given with voice broadcast and image display at the same time.
According to the above technical scheme, the environment interference elimination and shaking amplitude elimination flow comprises the following steps:
hand tremor is common among elderly users; during gesture recognition, in addition to the dynamic hand changes made while signing, there are slight hand movements caused by tremor. The shaking amplitude elimination unit divides the dynamic hand amplitude into 12 levels Y1-Y12, where Y1 denotes the smallest amplitude and Y12 the largest; movements at levels Y1-Y2 are automatically discarded, which reduces the errors of the language and image understanding system when recognizing sign language;
because the space may be a public space with people moving through it, the surrounding-environment interference elimination unit accepts gesture information only from the area directly in front of the robot, which guarantees that the information comes from a single source.
According to the technical scheme, the image display information storage source is as follows:
the sign-language information is recorded manually and handed to an animation production company to produce sign-language videos with a unified standard; both the sign-language videos and the sign-language logic are stored in the language and image understanding system.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of the system of the present invention;
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides the following technical solution: a language and image understanding system based on multiple protocols comprises an image collection module, a voice broadcast module and a human body identification module, and is characterized in that the human body identification module comprises a limb movement speed analysis unit, a user disability condition analysis unit and a sound and image separation unit; the limb movement speed analysis unit measures the speed at which the user's limb moves while reaching out to touch the robot, so as to judge the user's limb agility, and if the agility level is high, the display speed of the robot's display screen is raised to save time and avoid excessive waiting for subsequent deaf-mute users; the user disability condition analysis unit detects whether the user has a hearing impairment, a speech impairment, or both; and the sound and image separation unit starts the voice function and the image function separately according to the user's disability analysis report, so as to reduce unnecessary consumption of the robot's battery.
The user limb moving speed detection process comprises the following steps:
the robot operates in a public area; when a user needing help stands in front of the robot, the robot stops moving immediately, detects the current user, scans the user's height and body shape, calculates the horizontal distance from the body to the robot's display screen and records it as L_horizontal, calculates the distance from the point of the body closest to the display screen to the user's hand and records it as L_vertical, and then obtains the distance L_hand from the user's hand to the display screen by the Pythagorean theorem;
The effective scanning distance of the robot is L_effective. If L_horizontal is greater than L_effective, the robot issues a voice broadcast reminding the user to come closer; if L_horizontal is less than or equal to L_effective, the robot calculates the limb movement speed of the user as V_hand = L_hand / T_contact, where T_contact is the time from the moment the robot stops to the moment the user's limb touches the display screen. A rated user movement speed V_rated is set and divided into six levels V1-V6, where V1 indicates the slowest limb movement and V6 the fastest. V_hand is compared against V_rated to obtain the corresponding voice broadcast speed level and video playing speed level: the voice broadcast speed levels are A1-A6, where A1 is the slowest and A6 the fastest, and the video playing speed levels are B1-B6, where B1 is the slowest and B6 the fastest. In this way the information receiving speed is judged from the user's body movement speed, information is conveyed effectively while the service is personalized, and the user is more comfortable when seeking help.
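For illustration only, the following minimal Python sketch shows one way the limb-speed measurement and level mapping described above could be implemented; it is not part of the original disclosure, and the effective scanning distance and level boundaries are assumed values, since the patent gives none.

```python
import math

# Minimal sketch of the limb-speed measurement and level mapping described above.
# L_EFFECTIVE and SPEED_BOUNDS are illustrative assumptions, not values from the patent.

L_EFFECTIVE = 1.2                            # assumed effective scanning distance, metres
SPEED_BOUNDS = [0.2, 0.4, 0.6, 0.8, 1.0]     # assumed cut-offs separating levels V1..V6, m/s


def hand_to_screen_distance(l_horizontal: float, l_vertical: float) -> float:
    """L_hand obtained from L_horizontal and L_vertical by the Pythagorean theorem."""
    return math.hypot(l_horizontal, l_vertical)


def limb_speed(l_horizontal: float, l_vertical: float, t_contact: float) -> float | None:
    """Return V_hand = L_hand / T_contact, or None when the user is outside the
    scanning range and should instead be asked to come closer."""
    if l_horizontal > L_EFFECTIVE:
        return None
    return hand_to_screen_distance(l_horizontal, l_vertical) / t_contact


def speed_level(v_hand: float) -> int:
    """Map V_hand onto levels 1..6; the same index would select the voice broadcast
    speed level A1..A6 and the video playing speed level B1..B6."""
    for level, bound in enumerate(SPEED_BOUNDS, start=1):
        if v_hand <= bound:
            return level
    return 6
```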
The process of judging the disability degree of the deaf-mutes comprises the following steps:
a user with hearing impairment only is defined as first-level disability and recorded as a c1 person; a user with speech impairment only is defined as second-level disability and recorded as a c2 person; a user with both hearing and speech impairment is defined as third-level disability and recorded as a c3 person. A prompt is shown on the robot's display screen; if the robot receives no spoken information from the user within 3 seconds, it automatically judges the user to be speech-impaired. At the same time the robot issues a voice broadcast asking the user to perform a sign-language action or tap the display screen to confirm; if the robot receives no such confirmation within 3 seconds, it automatically judges the user to be hearing-impaired. The two judging processes run simultaneously and both finish within 3 seconds, after which the result is analyzed. If the user can speak, the robot receives the voice information and the sound and image separation unit drives both the image playing function and the voice playing function, so the user can obtain effective help information from the image playback; at the same time the fluency of the user's language organization is judged, and the user is reminded to use sign language if the fluency is not up to standard, while no reminder is needed if it is. If the user cannot speak but can hear, the sound and image separation unit still drives both the image playing function and the voice playing function. If the user can neither speak nor hear, the sound and image separation unit drives only the image playing function and the sign-language service function is executed immediately.
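As an illustrative aid only, a minimal sketch of the two parallel 3-second checks that grade the user is given below; the boolean inputs stand in for the voice-receiving unit and the touch-screen confirmation, and the function name is hypothetical.

```python
# Minimal sketch of the disability grading described above. The grade names
# c1/c2/c3 follow the description; everything else is an assumption.

def classify_user(spoke_within_3s: bool, confirmed_on_screen_within_3s: bool) -> str:
    """Return "c1" (hearing impairment only), "c2" (speech impairment only),
    "c3" (both) or "none" (no impairment detected)."""
    hearing_impaired = not confirmed_on_screen_within_3s
    speech_impaired = not spoke_within_3s
    if hearing_impaired and speech_impaired:
        return "c3"   # image playback only; sign-language service starts at once
    if hearing_impaired:
        return "c1"
    if speech_impaired:
        return "c2"
    return "none"
```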
The image collection module comprises a gesture recognition unit, a surrounding-environment interference elimination unit and a shaking amplitude elimination unit; the gesture recognition unit monitors the user's gesture changes in order to recognize the user's intention, the surrounding-environment interference elimination unit shields noise and other dynamic behaviors around the robot to improve the fluency of communication between the user and the robot, and the shaking amplitude elimination unit removes slight shaking during gesture changes to increase the accuracy of sign language recognition.
The voice broadcast module comprises a voice receiving unit, a voice playing unit and a lip language recognition unit; the voice receiving unit receives the user's voice information, the voice playing unit plays preset voice information to help the user, and the lip language recognition unit provides a lip-reading service for deaf-mute users who cannot use sign language.
Sign language information interaction flow:
the robot scans the user's dynamic gestures, matches the real-time dynamic gestures against the gesture records in the data storage library, translates the meaning of the user's sign language, and answers according to the translated content so as to solve the user's problem. During sign-language translation the system prejudges the meaning of the sentence the user is expressing and displays on the screen the ten sentences closest to that meaning for the user to choose from, which reduces the time the user spends signing; because sign-language actions are complex, offering multiple prejudged choices improves efficiency, reduces errors in the expressed information and increases accuracy. The user selects the sentence closest to the intended meaning from the ten prejudged sentences; if the selection succeeds, the language and image understanding system answers the question. If none of the ten prejudged sentences satisfies the user, the user can tap exit and continue signing, and the gesture recognition unit keeps receiving and translating the sign-language information; whenever the new translation differs greatly from the previous prejudgment, sentence prejudgment is performed again and ten prejudged sentences are offered for selection, repeating until a prejudgment succeeds. If no prejudgment succeeds, the user's sign-language information is translated in full and the answer is given for the complete information.
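A minimal sketch of this incremental sentence-prejudgment loop follows, for illustration only; the word-overlap score and the "differs greatly" heuristic are assumptions, since the patent fixes only the candidate count (ten) and the repeat-until-accepted behaviour.

```python
# Minimal sketch of the candidate-sentence prejudgment loop described above.
# All function names are hypothetical.

def predict_candidates(partial_translation: str, corpus: list[str], k: int = 10) -> list[str]:
    """Rank stored sentences by naive word overlap with the partial translation (assumed metric)."""
    words = set(partial_translation.split())
    return sorted(corpus, key=lambda s: len(words & set(s.split())), reverse=True)[:k]


def differs_greatly(current: str, previous: str) -> bool:
    """Assumed heuristic: re-run prejudgment once the translation has grown by 3+ words."""
    return len(current.split()) - len(previous.split()) >= 3


def sign_language_session(translation_stream, corpus, ask_user):
    """translation_stream yields progressively longer translations of the user's signing;
    ask_user(candidates) returns the chosen sentence, or None if the user taps exit."""
    previous, latest = "", ""
    for latest in translation_stream:
        if differs_greatly(latest, previous):
            choice = ask_user(predict_candidates(latest, corpus))
            if choice is not None:
                return choice          # answer against the accepted sentence
            previous = latest
    return latest                      # no prejudgment accepted: answer on the full translation
```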
Language and image understanding system answer flow:
when answering a question, picture answering, voice broadcasting and image display can be selected; the language and image understanding system makes the selection according to the detection information of the user disability condition analysis unit. For c1 persons, picture answering and image display are available and image display is preferred, because image display conveys information concretely and is easy for the user to understand; for c2 persons, picture answering, voice broadcasting and image display are all available and voice broadcasting is preferred because of its high efficiency; for c3 persons, picture answering and image display are available and image display is preferred. During image display the answer information is presented on the display screen in sign-language form;
while the user is signing, if the user makes no selection during the sentence prejudgment process and the prejudged sentences remain on the display screen for 6 seconds, the user's reading ability is judged to be weak, and the response mode is adjusted accordingly: answers are given preferentially with pictures, and the number of pictures is kept small to avoid misunderstanding. Each answer can use any of the three modes of picture answering, voice broadcasting and image display, and if the preferred mode does not satisfy the user, the user can manually select another answer mode until satisfied.
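For illustration, a minimal sketch of the answer-mode selection is given below; the per-grade mode lists and the 6-second threshold follow the description, while the names and the fallback logic are assumptions.

```python
# Minimal sketch of the answer-mode selection described above.

ANSWER_MODES = {
    "c1": ["image", "picture"],            # hearing impairment: no voice, image display preferred
    "c2": ["voice", "picture", "image"],   # speech impairment only: voice broadcast preferred
    "c3": ["image", "picture"],            # both impairments: sign-language image playback preferred
}


def choose_answer_mode(grade: str, seconds_candidates_ignored: float = 0.0) -> str:
    """Pick the preferred answer mode; fall back to a small set of pictures when the
    prejudged sentences were left unselected on screen for 6 s or more."""
    if seconds_candidates_ignored >= 6.0:
        return "picture"
    return ANSWER_MODES[grade][0]
```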
Special case analysis:
a person who is congenitally deaf will also become mute, because spoken language is acquired after birth and such a person has never been able to hear it; for this case the flow described above is followed. A person who became deaf later in life retains the ability to speak, so a lip-reading mode can be selected on the robot's display screen. The lip language recognition unit locates the lips from the facial information scanned by the human body identification module and dynamically prejudges, from the lip movements, the information the user wants to express. Because lip-reading accuracy is relatively poor, only six prejudged sentences are offered, which narrows the choice and speeds up the user's selection. If no prejudged sentence is selected, lip-reading information continues to be collected and sentence prejudgment is performed again; after two rounds of lip-reading prejudgment the language and image understanding system recommends that the user switch to sign language for semantic output. If a sentence prejudgment succeeds, the answer is given with voice broadcast and image display at the same time.
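The following minimal sketch illustrates the lip-reading fallback; it is not part of the original disclosure, capture_lip_frames(), lip_reader() and ask_user() are placeholders, and only the candidate count (six) and the two-attempt limit come from the description.

```python
# Minimal sketch of the lip-reading fallback for post-lingually deaf users.

LIP_CANDIDATES = 6
MAX_LIP_ATTEMPTS = 2


def lip_language_session(capture_lip_frames, lip_reader, ask_user):
    for _ in range(MAX_LIP_ATTEMPTS):
        candidates = lip_reader(capture_lip_frames())[:LIP_CANDIDATES]
        choice = ask_user(candidates)
        if choice is not None:
            return choice, "voice+image"        # answer with voice broadcast and image display
    return None, "recommend_sign_language"      # two failed rounds: suggest switching to sign language
```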
Environment elimination and jitter amplitude elimination flow:
hand tremor is common among elderly users; during gesture recognition, in addition to the dynamic hand changes made while signing, there are slight hand movements caused by tremor. The shaking amplitude elimination unit divides the dynamic hand amplitude into 12 levels Y1-Y12, where Y1 denotes the smallest amplitude and Y12 the largest; movements at levels Y1-Y2 are automatically discarded, which reduces the errors of the language and image understanding system when recognizing sign language;
because the space may be a public space with people moving through it, the surrounding-environment interference elimination unit accepts gesture information only from the area directly in front of the robot, which guarantees that the information comes from a single source.
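A minimal sketch of the tremor filter and the front-of-robot restriction follows, for illustration only; the Y1-Y12 bucket edges and the region bounds are assumptions, since the patent states only that levels Y1-Y2 are discarded and that only gestures made directly in front of the robot are accepted.

```python
# Minimal sketch of the shaking-amplitude filter and front-of-robot restriction.

AMPLITUDE_EDGES = [i * 0.005 for i in range(1, 12)]   # assumed bucket edges (metres) for Y1..Y11


def amplitude_level(displacement: float) -> int:
    """Map a per-frame hand displacement onto levels Y1..Y12."""
    for level, edge in enumerate(AMPLITUDE_EDGES, start=1):
        if displacement <= edge:
            return level
    return 12


def remove_tremor(displacements: list[float]) -> list[float]:
    """Discard Y1-Y2 motion, treated as tremor rather than signing."""
    return [d for d in displacements if amplitude_level(d) > 2]


def in_front_of_robot(x: float, depth: float, half_width: float = 0.5, max_depth: float = 1.2) -> bool:
    """Accept gesture information only from the region directly in front of the display
    screen (bounds assumed), so passers-by cannot become a second information source."""
    return abs(x) <= half_width and 0.0 <= depth <= max_depth
```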
The information storage source of the image display is as follows:
the sign-language information is recorded manually and handed to an animation production company to produce sign-language videos with a unified standard; both the sign-language videos and the sign-language logic are stored in the language and image understanding system.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A language and image understanding system based on multiple protocols, comprising an image collection module, a voice broadcast module and a human body identification module, characterized in that: the human body identification module comprises a limb movement speed analysis unit, a user disability condition analysis unit and a sound and image separation unit; the limb movement speed analysis unit measures the speed at which the user's limb moves while reaching out to touch the robot, so as to judge the user's limb agility, and if the agility level is high, the display speed of the robot's display screen is raised to save time and avoid excessive waiting for subsequent deaf-mute users; the user disability condition analysis unit detects whether the user has a hearing impairment, a speech impairment, or both; and the sound and image separation unit starts the voice function and the image function separately according to the user's disability analysis report, so as to reduce unnecessary consumption of the robot's battery.
2. A multi-protocol based language and image understanding system according to claim 1, wherein: the user limb moving speed detection process comprises the following steps:
the robot operates in a public area; when a user needing help stands in front of the robot, the robot stops moving immediately, detects the current user, scans the user's height and body shape, calculates the horizontal distance from the body to the robot's display screen and records it as L_horizontal, calculates the distance from the point of the body closest to the display screen to the user's hand and records it as L_vertical, and then obtains the distance L_hand from the user's hand to the display screen by the Pythagorean theorem;
The effective scanning distance of the robot is L_effective. If L_horizontal is greater than L_effective, the robot issues a voice broadcast reminding the user to come closer; if L_horizontal is less than or equal to L_effective, the robot calculates the limb movement speed of the user as V_hand = L_hand / T_contact, where T_contact is the time from the moment the robot stops to the moment the user's limb touches the display screen. A rated user movement speed V_rated is set and divided into six levels V1-V6, where V1 indicates the slowest limb movement and V6 the fastest. V_hand is compared against V_rated to obtain the corresponding voice broadcast speed level and video playing speed level: the voice broadcast speed levels are A1-A6, where A1 is the slowest and A6 the fastest, and the video playing speed levels are B1-B6, where B1 is the slowest and B6 the fastest. In this way the information receiving speed is judged from the user's body movement speed, information is conveyed effectively while the service is personalized, and the user is more comfortable when seeking help.
3. A multi-protocol based language and image understanding system according to claim 2, wherein: the process of judging the disability degree of the deaf-mutes comprises the following steps:
a user with hearing impairment only is defined as first-level disability and recorded as a c1 person; a user with speech impairment only is defined as second-level disability and recorded as a c2 person; a user with both hearing and speech impairment is defined as third-level disability and recorded as a c3 person. A prompt is shown on the robot's display screen; if the robot receives no spoken information from the user within 3 seconds, it automatically judges the user to be speech-impaired. At the same time the robot issues a voice broadcast asking the user to perform a sign-language action or tap the display screen to confirm; if the robot receives no such confirmation within 3 seconds, it automatically judges the user to be hearing-impaired. The two judging processes run simultaneously and both finish within 3 seconds, after which the result is analyzed. If the user can speak, the robot receives the voice information and the sound and image separation unit drives both the image playing function and the voice playing function, so the user can obtain effective help information from the image playback; at the same time the fluency of the user's language organization is judged, and the user is reminded to use sign language if the fluency is not up to standard, while no reminder is needed if it is. If the user cannot speak but can hear, the sound and image separation unit still drives both the image playing function and the voice playing function. If the user can neither speak nor hear, the sound and image separation unit drives only the image playing function and the sign-language service function is executed immediately.
4. A multi-protocol based language and image understanding system according to claim 3, wherein: the image collection module comprises a gesture recognition unit, a surrounding-environment interference elimination unit and a shaking amplitude elimination unit; the gesture recognition unit monitors the user's gesture changes in order to recognize the user's intention, the surrounding-environment interference elimination unit shields noise and other dynamic behaviors around the robot to improve the fluency of communication between the user and the robot, and the shaking amplitude elimination unit removes slight shaking during gesture changes to increase the accuracy of sign language recognition.
5. A multi-protocol based language and image understanding system according to claim 4, wherein: the voice broadcast module comprises a voice receiving unit, a voice playing unit and a lip language recognition unit; the voice receiving unit receives the user's voice information, the voice playing unit plays preset voice information to help the user, and the lip language recognition unit provides a lip-reading service for deaf-mute users who cannot use sign language.
6. A multi-protocol based language and image understanding system according to claim 5, wherein: sign language information interaction flow:
the robot scans the user's dynamic gestures, matches the real-time dynamic gestures against the gesture records in the data storage library, translates the meaning of the user's sign language, and answers according to the translated content so as to solve the user's problem. During sign-language translation the system prejudges the meaning of the sentence the user is expressing and displays on the screen the ten sentences closest to that meaning for the user to choose from, which reduces the time the user spends signing; because sign-language actions are complex, offering multiple prejudged choices improves efficiency, reduces errors in the expressed information and increases accuracy. The user selects the sentence closest to the intended meaning from the ten prejudged sentences; if the selection succeeds, the language and image understanding system answers the question. If none of the ten prejudged sentences satisfies the user, the user can tap exit and continue signing, and the gesture recognition unit keeps receiving and translating the sign-language information; whenever the new translation differs greatly from the previous prejudgment, sentence prejudgment is performed again and ten prejudged sentences are offered for selection, repeating until a prejudgment succeeds. If no prejudgment succeeds, the user's sign-language information is translated in full and the answer is given for the complete information.
7. A multi-protocol based language and image understanding system according to claim 6, wherein: language and image understanding system answer flow:
when answering a question, picture answering, voice broadcasting and image display can be selected; the language and image understanding system makes the selection according to the detection information of the user disability condition analysis unit. For c1 persons, picture answering and image display are available and image display is preferred, because image display conveys information concretely and is easy for the user to understand; for c2 persons, picture answering, voice broadcasting and image display are all available and voice broadcasting is preferred because of its high efficiency; for c3 persons, picture answering and image display are available and image display is preferred. During image display the answer information is presented on the display screen in sign-language form;
while the user is signing, if the user makes no selection during the sentence prejudgment process and the prejudged sentences remain on the display screen for 6 seconds, the user's reading ability is judged to be weak, and the response mode is adjusted accordingly: answers are given preferentially with pictures, and the number of pictures is kept small to avoid misunderstanding. Each answer can use any of the three modes of picture answering, voice broadcasting and image display, and if the preferred mode does not satisfy the user, the user can manually select another answer mode until satisfied.
8. A multi-protocol based language and image understanding system according to claim 7, wherein: special case analysis:
a person who is congenitally deaf will also become mute, because spoken language is acquired after birth and such a person has never been able to hear it; for this case the flow described above is followed. A person who became deaf later in life retains the ability to speak, so a lip-reading mode can be selected on the robot's display screen. The lip language recognition unit locates the lips from the facial information scanned by the human body identification module and dynamically prejudges, from the lip movements, the information the user wants to express. Because lip-reading accuracy is relatively poor, only six prejudged sentences are offered, which narrows the choice and speeds up the user's selection. If no prejudged sentence is selected, lip-reading information continues to be collected and sentence prejudgment is performed again; after two rounds of lip-reading prejudgment the language and image understanding system recommends that the user switch to sign language for semantic output. If a sentence prejudgment succeeds, the answer is given with voice broadcast and image display at the same time.
9. A multi-protocol based language and image understanding system according to claim 8, wherein: environment elimination and jitter amplitude elimination flow:
hand tremor is common among elderly users; during gesture recognition, in addition to the dynamic hand changes made while signing, there are slight hand movements caused by tremor. The shaking amplitude elimination unit divides the dynamic hand amplitude into 12 levels Y1-Y12, where Y1 denotes the smallest amplitude and Y12 the largest; movements at levels Y1-Y2 are automatically discarded, which reduces the errors of the language and image understanding system when recognizing sign language;
because the space may be a public space with people moving through it, the surrounding-environment interference elimination unit accepts gesture information only from the area directly in front of the robot, which guarantees that the information comes from a single source.
10. A multi-protocol based language and image understanding system according to claim 9, wherein: the information storage source of the image display is as follows:
the sign-language information is recorded manually and handed to an animation production company to produce sign-language videos with a unified standard; both the sign-language videos and the sign-language logic are stored in the language and image understanding system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111325893.3A CN114067433A (en) | 2021-11-10 | 2021-11-10 | Language and image understanding system based on multiple protocols |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111325893.3A CN114067433A (en) | 2021-11-10 | 2021-11-10 | Language and image understanding system based on multiple protocols |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114067433A true CN114067433A (en) | 2022-02-18 |
Family
ID=80274464
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111325893.3A Pending CN114067433A (en) | 2021-11-10 | 2021-11-10 | Language and image understanding system based on multiple protocols |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114067433A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116805272A (en) * | 2022-10-29 | 2023-09-26 | 武汉行已学教育咨询有限公司 | Visual education teaching analysis method, system and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN205058054U (en) * | 2015-09-29 | 2016-03-02 | 塔米智能科技(北京)有限公司 | Multi -functional interactive usher robot |
CN106502424A (en) * | 2016-11-29 | 2017-03-15 | 上海小持智能科技有限公司 | Based on the interactive augmented reality system of speech gestures and limb action |
CN108260006A (en) * | 2018-01-12 | 2018-07-06 | 南京工程学院 | Interactive Intelligent home theater and its control method based on the detection of human body pose |
CN110598576A (en) * | 2019-08-21 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Sign language interaction method and device and computer medium |
CN111144367A (en) * | 2019-12-31 | 2020-05-12 | 重庆百事得大牛机器人有限公司 | Auxiliary semantic recognition method based on gesture recognition |
-
2021
- 2021-11-10 CN CN202111325893.3A patent/CN114067433A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN205058054U (en) * | 2015-09-29 | 2016-03-02 | 塔米智能科技(北京)有限公司 | Multi -functional interactive usher robot |
CN106502424A (en) * | 2016-11-29 | 2017-03-15 | 上海小持智能科技有限公司 | Based on the interactive augmented reality system of speech gestures and limb action |
CN108260006A (en) * | 2018-01-12 | 2018-07-06 | 南京工程学院 | Interactive Intelligent home theater and its control method based on the detection of human body pose |
CN110598576A (en) * | 2019-08-21 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Sign language interaction method and device and computer medium |
CN111144367A (en) * | 2019-12-31 | 2020-05-12 | 重庆百事得大牛机器人有限公司 | Auxiliary semantic recognition method based on gesture recognition |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116805272A (en) * | 2022-10-29 | 2023-09-26 | 武汉行已学教育咨询有限公司 | Visual education teaching analysis method, system and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11241789B2 (en) | Data processing method for care-giving robot and apparatus | |
CN111709358B (en) | Teacher-student behavior analysis system based on classroom video | |
EP2012304B1 (en) | Methods for electronically analysing a dialogue and corresponding systems | |
US20040152060A1 (en) | Learning condition judging program and user condition judging system | |
CN109448851A (en) | A kind of cognition appraisal procedure and device | |
CN109754653B (en) | Method and system for personalized teaching | |
CN109961047A (en) | Study measure of supervision, device, robot and the storage medium of educational robot | |
Garcia et al. | Dysarthric sentence intelligibility: Contribution of iconic gestures and message predictiveness | |
CN112768070A (en) | Mental health evaluation method and system based on dialogue communication | |
CN114067433A (en) | Language and image understanding system based on multiple protocols | |
CN114582355B (en) | Infant crying detection method and device based on audio and video fusion | |
CN110349063A (en) | A kind of school work growth curve test method and system | |
Abbasi et al. | Student mental state inference from unintentional body gestures using dynamic Bayesian networks | |
CN116088675A (en) | Virtual image interaction method, related device, equipment, system and medium | |
CN113313982B (en) | Education system based on 5G network | |
WO2022180860A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program | |
CN114792521A (en) | Intelligent answering method and device based on voice recognition | |
CN113208592A (en) | Psychological test system with multiple answering modes | |
CN110288986A (en) | Online cognition self-appraisal voice system and its processing method | |
WO2022180853A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program | |
WO2022180862A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program | |
WO2022180856A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program | |
WO2022180861A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program | |
WO2022180855A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program | |
WO2022180858A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |