CN114067433A - Language and image understanding system based on multiple protocols - Google Patents


Info

Publication number
CN114067433A
CN114067433A
Authority
CN
China
Prior art keywords
user
language
robot
image
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111325893.3A
Other languages
Chinese (zh)
Inventor
周超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202111325893.3A
Publication of CN114067433A
Legal status: Pending

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J19/00Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J19/02Sensing devices
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/238Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
    • H04N21/2387Stream processing in response to a playback request from an end-user, e.g. for trick-play

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses a multi-protocol-based language and image understanding system, which comprises an image collection module, a voice broadcasting module and a human body identification module. The human body identification module comprises a limb movement speed analysis unit, a user disability condition analysis unit and a sound and image splitting unit. The limb movement speed analysis unit measures the speed at which the user's limb moves while extending out to touch the robot, so as to judge the user's level of limb dexterity; if the dexterity level is high, the robot display screen plays content at a faster speed, saving time and preventing subsequent deaf-mute users from waiting too long. The user disability condition analysis unit detects whether the user has a hearing impairment, a speech impairment, or both, and the sound and image splitting unit activates the voice function and the image function at different times according to the user's disability analysis report. The invention is highly practical, automatically recognizes sign language, and solves the user's problem through multiple response modes.

Description

Language and image understanding system based on multiple protocols
Technical Field
The invention relates to the technical field of sign language, in particular to a multi-protocol-based language and image understanding system.
Background
Because deaf-mute people have difficulty communicating and asking passers-by for help is inefficient, a deaf-mute person can turn to a robot for help. The robot asks and answers questions based on language and image understanding, effectively improving the comfort of deaf-mute people in the space.
Disclosure of Invention
It is an object of the present invention to provide a multi-protocol based language and image understanding system to solve the problems set forth in the background above.
In order to solve the technical problems, the invention provides the following technical scheme: a multi-protocol-based language and image understanding system comprises an image collection module, a voice broadcasting module and a human body identification module, and is characterized in that: the human body identification module comprises a limb movement speed analysis unit, a user disability condition analysis unit and a sound and image splitting unit; the limb movement speed analysis unit measures the speed at which the user's limb moves while extending out to touch the robot, so as to judge the user's level of limb dexterity, and if the dexterity level is high, the robot display screen plays content at a faster speed, saving time and preventing subsequent deaf-mute users from waiting too long; the user disability condition analysis unit detects whether the user has a hearing impairment, a speech impairment, or both; and the sound and image splitting unit activates the voice function and the image function according to the user's disability analysis report, reducing unnecessary consumption of the robot battery.
According to the technical scheme, the detection process of the user's limb movement speed comprises the following steps:
The robot operates in a public area. When a user who needs help stands in front of the robot, the robot immediately stops moving, detects the current user, scans the user's height and body shape, calculates the horizontal distance from the user's body to the robot display screen and records it as L_horizontal, calculates the vertical distance from the point of the body closest to the display screen to the user's hand and records it as L_vertical, and then obtains the distance L_hand from the user's hand to the display screen by the Pythagorean theorem, L_hand = √(L_horizontal² + L_vertical²).
The effective scanning distance of the robot is L_effective. If L_horizontal is greater than L_effective, the robot issues a voice broadcast reminding the user to move closer; if L_horizontal is less than or equal to L_effective, the robot calculates the user's limb movement speed V_hand = L_hand / T_contact, where T_contact is the time from the moment the robot stops to the moment the user's limb touches the robot display screen. A rated user movement speed V_rated is set and divided into six levels V1 to V6, where V1 indicates the slowest limb movement speed and V6 the fastest. V_hand is compared against V_rated to obtain the corresponding voice broadcasting speed level and video playing speed level: voice broadcasting speed levels A1 to A6 are set, where A1 indicates the slowest and A6 the fastest broadcasting speed, and video playing speed levels B1 to B6 are set, where B1 indicates the slowest and B6 the fastest playing speed. In this way, the speed at which the user can receive information is judged from the user's body movement speed, so the service is personalized while information is still transmitted effectively, and the user is more comfortable when seeking help.
According to the technical scheme, the process of judging the disability degree of a deaf-mute user comprises the following steps:
A user with only a hearing impairment is set as a first-level disability and recorded as a c1 user; a user with only a speech impairment is set as a second-level disability and recorded as a c2 user; a user with both hearing and speech impairments is set as a third-level disability and recorded as a c3 user. The robot shows a prompt on its display screen; if the robot receives no spoken information from the user within 3 seconds, it automatically judges that the user has a speech impairment. At the same time, the robot issues a voice broadcast asking the user to perform a sign language gesture or tap the display screen to confirm; if the robot receives no such response within 3 seconds, it automatically judges that the user has a hearing impairment. The two judging processes run simultaneously and both end within 3 seconds, after which the results are analyzed. If the user can speak, the robot receives the voice information, and the sound and image splitting unit drives both the image playing function and the voice playing function, so the user can obtain effective help information from the image playing; the robot also judges whether the user's speech is fluent, reminding the user to use sign language if the fluency does not reach the standard, while no reminder is needed if it does. If the user cannot speak but can still hear, the sound and image splitting unit likewise drives the image playing function and the voice playing function. If the user can neither speak nor hear, the sound and image splitting unit drives only the image playing function, and the sign language service function is executed immediately.
According to the technical scheme, the image collection module comprises a gesture recognition unit, a surrounding environment interference removal unit and a shaking amplitude elimination unit; the gesture recognition unit monitors the user's gesture changes to identify the user's intention, the surrounding environment interference removal unit filters out noise and other dynamic behavior around the robot to improve the fluency of communication between the user and the robot, and the shaking amplitude elimination unit removes slight shaking during gesture changes to improve the accuracy of sign language recognition.
According to the technical scheme, the voice broadcasting module comprises a voice receiving unit, a voice playing unit and a lip language recognition unit; the voice receiving unit receives the user's voice information, the voice playing unit plays preset voice information to help the user, and the lip language recognition unit provides a lip-reading service for deaf-mute users who cannot use sign language.
According to the technical scheme, the sign language information interaction process comprises the following steps:
The robot scans the user's dynamic gestures, matches the real-time dynamic gestures against the gesture records in the data repository, translates the meaning of the user's sign language, and answers accordingly to solve the user's problem. During sign language translation, the system predicts the meaning of the sentence the user is expressing and shows the ten sentences closest to that meaning on the display screen for the user to select from, which shortens the time the user spends signing. Because sign language gestures are complex, offering multiple predicted choices through the language and image understanding system improves efficiency, reduces errors in the information expressed in sign language, and increases accuracy. The user selects the sentence closest to the intended meaning from the ten predicted sentences; if the selection succeeds, the language and image understanding system answers the question. If none of the ten predicted sentences satisfies the user, the user can tap to exit and continue signing, and the gesture recognition unit keeps receiving sign language information and translates it as it is received. When the newly translated sign language information differs substantially from the previous prediction, sentence prediction is performed again and ten new predicted sentences are offered for the user to select; prediction is repeated until it succeeds. If prediction never succeeds, the user's sign language information is translated in full and the answer is given for the complete information.
According to the above technical solution, the answer flow of the language and image understanding system is as follows:
When answering a question, the system can choose among picture answering, voice broadcasting and image displaying, and the language and image understanding system makes the choice according to the detection information from the user disability condition analysis unit. For c1 users, picture answering and image displaying are available, and image displaying is preferred because the displayed image information is concrete and easy for the user to understand; for c2 users, picture answering, voice broadcasting and image displaying are all available, and voice broadcasting is preferred because it is the most efficient; for c3 users, picture answering and image displaying are available, and image displaying is preferred. When image displaying is used, the answer information is presented on the display screen in sign language form;
During the user's sign language display, if the user makes no selection during sentence prediction and the predicted sentences remain on the display screen for 6 seconds, the user's literacy is judged to be weak, and the response mode is adjusted in the answer process of the language and image understanding system: answers are given preferentially with pictures, and the number of pictures is kept small to avoid misunderstanding. Every answer still offers the three answer modes of picture answering, voice broadcasting and image displaying; if the preferred answer mode does not satisfy the user, the user can manually select another answer mode until satisfied.
According to the technical scheme, special condition analysis is as follows:
A person who is congenitally deaf will also be mute, because human language is acquired after birth and a person who has never been able to hear speech cannot learn it; for such users the process described above is followed. A person who became deaf after acquiring speech retains the ability to speak, and such a user can select the lip language recognition mode on the screen displayed by the robot. The lip language recognition unit locates the lips from the facial information scanned by the human body identification module and dynamically predicts, from the lip movements, the information the user wants to express. Because lip language recognition accuracy is relatively poor, only six predicted sentences are offered, which narrows the selection range and speeds up the user's choice. If no predicted sentence is selected, the system continues to collect lip language information and performs sentence prediction again; after two rounds of lip language prediction, the language and image understanding system recommends that the user switch to sign language for semantic output. If the sentence prediction succeeds, voice broadcasting and image displaying are given simultaneously.
According to the technical scheme, the environment interference removal and shaking amplitude elimination process comprises the following steps:
Hand tremor is common among elderly users; during gesture recognition, in addition to the dynamic hand changes made while signing, there are slight hand movements caused by tremor. The shaking amplitude elimination unit divides hand movement amplitude into 12 levels Y1 to Y12, where Y1 indicates the smallest movement amplitude and Y12 the largest; movements at levels Y1 to Y2 are automatically discarded, reducing the errors made by the language and image understanding system when recognizing sign language;
Because the space is likely to be a public space with passing pedestrians, the surrounding environment interference removal unit accepts only the information presented directly in front of the robot, ensuring that the information source is unique.
According to the technical scheme, the storage source of the image display information is as follows:
Sign language information is recorded manually and handed to an animation production company to produce sign language images with a unified standard; both the sign language images and the sign language logic are stored in the language and image understanding system.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of the system of the present invention;
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a technical solution: a multi-protocol-based language and image understanding system comprises an image collection module, a voice broadcasting module and a human body identification module, and is characterized in that: the human body identification module comprises a limb movement speed analysis unit, a user disability condition analysis unit and a sound and image splitting unit; the limb movement speed analysis unit measures the speed at which the user's limb moves while extending out to touch the robot, so as to judge the user's level of limb dexterity, and if the dexterity level is high, the robot display screen plays content at a faster speed, saving time and preventing subsequent deaf-mute users from waiting too long; the user disability condition analysis unit detects whether the user has a hearing impairment, a speech impairment, or both; and the sound and image splitting unit activates the voice function and the image function according to the user's disability analysis report, reducing unnecessary consumption of the robot battery.
The user limb movement speed detection process comprises the following steps:
The robot operates in a public area. When a user who needs help stands in front of the robot, the robot immediately stops moving, detects the current user, scans the user's height and body shape, calculates the horizontal distance from the user's body to the robot display screen and records it as L_horizontal, calculates the vertical distance from the point of the body closest to the display screen to the user's hand and records it as L_vertical, and then obtains the distance L_hand from the user's hand to the display screen by the Pythagorean theorem, L_hand = √(L_horizontal² + L_vertical²).
The effective scanning distance of the robot is L_effective. If L_horizontal is greater than L_effective, the robot issues a voice broadcast reminding the user to move closer; if L_horizontal is less than or equal to L_effective, the robot calculates the user's limb movement speed V_hand = L_hand / T_contact, where T_contact is the time from the moment the robot stops to the moment the user's limb touches the robot display screen. A rated user movement speed V_rated is set and divided into six levels V1 to V6, where V1 indicates the slowest limb movement speed and V6 the fastest. V_hand is compared against V_rated to obtain the corresponding voice broadcasting speed level and video playing speed level: voice broadcasting speed levels A1 to A6 are set, where A1 indicates the slowest and A6 the fastest broadcasting speed, and video playing speed levels B1 to B6 are set, where B1 indicates the slowest and B6 the fastest playing speed. In this way, the speed at which the user can receive information is judged from the user's body movement speed, so the service is personalized while information is still transmitted effectively, and the user is more comfortable when seeking help.
The process of judging the disability degree of a deaf-mute user comprises the following steps:
A user with only a hearing impairment is set as a first-level disability and recorded as a c1 user; a user with only a speech impairment is set as a second-level disability and recorded as a c2 user; a user with both hearing and speech impairments is set as a third-level disability and recorded as a c3 user. The robot shows a prompt on its display screen; if the robot receives no spoken information from the user within 3 seconds, it automatically judges that the user has a speech impairment. At the same time, the robot issues a voice broadcast asking the user to perform a sign language gesture or tap the display screen to confirm; if the robot receives no such response within 3 seconds, it automatically judges that the user has a hearing impairment. The two judging processes run simultaneously and both end within 3 seconds, after which the results are analyzed. If the user can speak, the robot receives the voice information, and the sound and image splitting unit drives both the image playing function and the voice playing function, so the user can obtain effective help information from the image playing; the robot also judges whether the user's speech is fluent, reminding the user to use sign language if the fluency does not reach the standard, while no reminder is needed if it does. If the user cannot speak but can still hear, the sound and image splitting unit likewise drives the image playing function and the voice playing function. If the user can neither speak nor hear, the sound and image splitting unit drives only the image playing function, and the sign language service function is executed immediately.
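The classification that results from the two parallel 3-second tests can be sketched as below; the DisabilityReport fields and function names are illustrative assumptions standing in for the robot's actual speech and gesture/touch detectors.

```python
from dataclasses import dataclass

@dataclass
class DisabilityReport:
    can_speak: bool  # a spoken reply was received within 3 s of the on-screen prompt
    can_hear: bool   # a gesture or screen tap was received within 3 s of the voice prompt

def classify(report: DisabilityReport) -> str:
    """Return the disability class: c1 (hearing only), c2 (speech only), c3 (both)."""
    if report.can_speak and not report.can_hear:
        return "c1"  # first-level disability: hearing impairment only
    if report.can_hear and not report.can_speak:
        return "c2"  # second-level disability: speech impairment only
    if not report.can_speak and not report.can_hear:
        return "c3"  # third-level disability: both impairments
    return "none"    # responded to both prompts: no impairment detected

def splitting_unit_outputs(disability_class: str) -> set:
    """Functions the sound and image splitting unit drives for each class."""
    if disability_class == "c3":
        return {"image"}         # image playing only; sign language service starts at once
    return {"image", "voice"}    # image playing and voice playing
```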
The image collection module comprises a gesture recognition unit, a surrounding environment interference removal unit and a shaking amplitude elimination unit; the gesture recognition unit monitors the user's gesture changes to identify the user's intention, the surrounding environment interference removal unit filters out noise and other dynamic behavior around the robot to improve the fluency of communication between the user and the robot, and the shaking amplitude elimination unit removes slight shaking during gesture changes to improve the accuracy of sign language recognition.
The voice broadcasting module comprises a voice receiving unit, a voice playing unit and a lip language recognition unit; the voice receiving unit receives the user's voice information, the voice playing unit plays preset voice information to help the user, and the lip language recognition unit provides a lip-reading service for deaf-mute users who cannot use sign language.
Sign language information interaction flow:
The robot scans the user's dynamic gestures, matches the real-time dynamic gestures against the gesture records in the data repository, translates the meaning of the user's sign language, and answers accordingly to solve the user's problem. During sign language translation, the system predicts the meaning of the sentence the user is expressing and shows the ten sentences closest to that meaning on the display screen for the user to select from, which shortens the time the user spends signing. Because sign language gestures are complex, offering multiple predicted choices through the language and image understanding system improves efficiency, reduces errors in the information expressed in sign language, and increases accuracy. The user selects the sentence closest to the intended meaning from the ten predicted sentences; if the selection succeeds, the language and image understanding system answers the question. If none of the ten predicted sentences satisfies the user, the user can tap to exit and continue signing, and the gesture recognition unit keeps receiving sign language information and translates it as it is received. When the newly translated sign language information differs substantially from the previous prediction, sentence prediction is performed again and ten new predicted sentences are offered for the user to select; prediction is repeated until it succeeds. If prediction never succeeds, the user's sign language information is translated in full and the answer is given for the complete information.
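A minimal sketch of this predict-select-retry loop follows. The gesture_stream, repository, present_choices and answer interfaces are assumptions introduced for illustration; the disclosure does not specify how gestures are segmented or how candidate sentences are generated.

```python
def sign_language_dialog(gesture_stream, repository, present_choices, answer,
                         max_candidates=10):
    """Translate sign fragments, offer up to ten predicted sentences, retry on rejection."""
    fragments = []
    last_candidates = None
    for fragment in gesture_stream:            # recognized sign fragments, in order
        fragments.append(fragment)
        candidates = repository.predict(fragments, n=max_candidates)
        # Re-present choices only when the prediction has changed.
        if candidates == last_candidates:
            continue
        last_candidates = candidates
        choice = present_choices(candidates)   # user taps a sentence, or None to keep signing
        if choice is not None:
            return answer(choice)
    # Prediction never succeeded: translate the full sign sequence and answer that.
    return answer(repository.translate(fragments))
```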
Language and image understanding system answer flow:
When answering a question, the system can choose among picture answering, voice broadcasting and image displaying, and the language and image understanding system makes the choice according to the detection information from the user disability condition analysis unit. For c1 users, picture answering and image displaying are available, and image displaying is preferred because the displayed image information is concrete and easy for the user to understand; for c2 users, picture answering, voice broadcasting and image displaying are all available, and voice broadcasting is preferred because it is the most efficient; for c3 users, picture answering and image displaying are available, and image displaying is preferred. When image displaying is used, the answer information is presented on the display screen in sign language form;
During the user's sign language display, if the user makes no selection during sentence prediction and the predicted sentences remain on the display screen for 6 seconds, the user's literacy is judged to be weak, and the response mode is adjusted in the answer process of the language and image understanding system: answers are given preferentially with pictures, and the number of pictures is kept small to avoid misunderstanding. Every answer still offers the three answer modes of picture answering, voice broadcasting and image displaying; if the preferred answer mode does not satisfy the user, the user can manually select another answer mode until satisfied.
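The mode preferences described above can be summarized in a small lookup table, sketched below; the mode names and the choose_answer_mode signature are illustrative assumptions, not part of the original disclosure.

```python
from typing import Optional

# (available modes, preferred mode) per disability class, as described above.
ANSWER_MODES = {
    "c1": (["picture", "image"], "image"),           # hearing impairment only
    "c2": (["picture", "voice", "image"], "voice"),  # speech impairment only
    "c3": (["picture", "image"], "image"),           # both impairments
}

def choose_answer_mode(disability_class: str, weak_literacy: bool = False,
                       manual_choice: Optional[str] = None) -> str:
    """Manual choice wins; weak literacy falls back to pictures; otherwise the class default."""
    available, preferred = ANSWER_MODES[disability_class]
    if manual_choice in available:
        return manual_choice
    if weak_literacy:
        return "picture"  # predicted sentences lingered 6 s unselected: prefer pictures
    return preferred
```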
Special case analysis:
A person who is congenitally deaf will also be mute, because human language is acquired after birth and a person who has never been able to hear speech cannot learn it; for such users the process described above is followed. A person who became deaf after acquiring speech retains the ability to speak, and such a user can select the lip language recognition mode on the screen displayed by the robot. The lip language recognition unit locates the lips from the facial information scanned by the human body identification module and dynamically predicts, from the lip movements, the information the user wants to express. Because lip language recognition accuracy is relatively poor, only six predicted sentences are offered, which narrows the selection range and speeds up the user's choice. If no predicted sentence is selected, the system continues to collect lip language information and performs sentence prediction again; after two rounds of lip language prediction, the language and image understanding system recommends that the user switch to sign language for semantic output. If the sentence prediction succeeds, voice broadcasting and image displaying are given simultaneously.
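A sketch of this lip-reading fallback is given below, under the same caveat that lip_stream, predictor and present_choices are assumed interfaces: at most two rounds of six candidates are offered before sign language is recommended.

```python
def lip_reading_dialog(lip_stream, predictor, present_choices,
                       max_rounds=2, n_candidates=6):
    """Offer six lip-reading candidates per round; after two failed rounds, suggest sign language."""
    collected = []
    for _, frame_batch in zip(range(max_rounds), lip_stream):
        collected.extend(frame_batch)                     # accumulate lip-movement frames
        candidates = predictor.predict(collected, n=n_candidates)
        choice = present_choices(candidates)              # user picks a sentence or None
        if choice is not None:
            return {"sentence": choice, "outputs": ["voice", "image"]}
    return {"recommendation": "please use sign language instead"}
```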
Environment elimination and shaking amplitude elimination flow:
Hand tremor is common among elderly users; during gesture recognition, in addition to the dynamic hand changes made while signing, there are slight hand movements caused by tremor. The shaking amplitude elimination unit divides hand movement amplitude into 12 levels Y1 to Y12, where Y1 indicates the smallest movement amplitude and Y12 the largest; movements at levels Y1 to Y2 are automatically discarded, reducing the errors made by the language and image understanding system when recognizing sign language;
Because the space is likely to be a public space with passing pedestrians, the surrounding environment interference removal unit accepts only the information presented directly in front of the robot, ensuring that the information source is unique.
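The Y1 to Y12 amplitude scale and the Y1 to Y2 cutoff can be sketched as follows; the disclosure does not give numeric boundaries for the levels, so the millimetre values below are illustrative assumptions.

```python
# Illustrative amplitude boundaries (mm of hand displacement) separating levels Y1-Y12;
# only the existence of the twelve levels comes from the disclosure.
Y_BOUNDARIES = [2, 4, 6, 8, 10, 15, 20, 30, 40, 60, 80]

def amplitude_level(amplitude_mm: float) -> int:
    """Map a hand movement amplitude onto level 1..12 (Y1 smallest, Y12 largest)."""
    for level, bound in enumerate(Y_BOUNDARIES, start=1):
        if amplitude_mm < bound:
            return level
    return 12

def filter_tremor(movement_amplitudes):
    """Discard movements at levels Y1-Y2 (treated as tremor) and keep the rest."""
    return [a for a in movement_amplitudes if amplitude_level(a) > 2]
```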
The storage source of the image display information is as follows:
Sign language information is recorded manually and handed to an animation production company to produce sign language images with a unified standard; both the sign language images and the sign language logic are stored in the language and image understanding system.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A multi-protocol-based language and image understanding system, comprising an image collection module, a voice broadcasting module and a human body identification module, characterized in that: the human body identification module comprises a limb movement speed analysis unit, a user disability condition analysis unit and a sound and image splitting unit; the limb movement speed analysis unit measures the speed at which the user's limb moves while extending out to touch the robot, so as to judge the user's level of limb dexterity, and if the dexterity level is high, the robot display screen plays content at a faster speed, saving time and preventing subsequent deaf-mute users from waiting too long; the user disability condition analysis unit detects whether the user has a hearing impairment, a speech impairment, or both; and the sound and image splitting unit activates the voice function and the image function according to the user's disability analysis report, reducing unnecessary consumption of the robot battery.
2. The multi-protocol-based language and image understanding system according to claim 1, wherein the user limb movement speed detection process comprises the following steps:
The robot operates in a public area. When a user who needs help stands in front of the robot, the robot immediately stops moving, detects the current user, scans the user's height and body shape, calculates the horizontal distance from the user's body to the robot display screen and records it as L_horizontal, calculates the vertical distance from the point of the body closest to the display screen to the user's hand and records it as L_vertical, and then obtains the distance L_hand from the user's hand to the display screen by the Pythagorean theorem, L_hand = √(L_horizontal² + L_vertical²).
The effective scanning distance of the robot is L_effective. If L_horizontal is greater than L_effective, the robot issues a voice broadcast reminding the user to move closer; if L_horizontal is less than or equal to L_effective, the robot calculates the user's limb movement speed V_hand = L_hand / T_contact, where T_contact is the time from the moment the robot stops to the moment the user's limb touches the robot display screen. A rated user movement speed V_rated is set and divided into six levels V1 to V6, where V1 indicates the slowest limb movement speed and V6 the fastest. V_hand is compared against V_rated to obtain the corresponding voice broadcasting speed level and video playing speed level: voice broadcasting speed levels A1 to A6 are set, where A1 indicates the slowest and A6 the fastest broadcasting speed, and video playing speed levels B1 to B6 are set, where B1 indicates the slowest and B6 the fastest playing speed. In this way, the speed at which the user can receive information is judged from the user's body movement speed, so the service is personalized while information is still transmitted effectively, and the user is more comfortable when seeking help.
3. The multi-protocol-based language and image understanding system according to claim 2, wherein the process of judging the disability degree of a deaf-mute user comprises the following steps:
A user with only a hearing impairment is set as a first-level disability and recorded as a c1 user; a user with only a speech impairment is set as a second-level disability and recorded as a c2 user; a user with both hearing and speech impairments is set as a third-level disability and recorded as a c3 user. The robot shows a prompt on its display screen; if the robot receives no spoken information from the user within 3 seconds, it automatically judges that the user has a speech impairment. At the same time, the robot issues a voice broadcast asking the user to perform a sign language gesture or tap the display screen to confirm; if the robot receives no such response within 3 seconds, it automatically judges that the user has a hearing impairment. The two judging processes run simultaneously and both end within 3 seconds, after which the results are analyzed. If the user can speak, the robot receives the voice information, and the sound and image splitting unit drives both the image playing function and the voice playing function, so the user can obtain effective help information from the image playing; the robot also judges whether the user's speech is fluent, reminding the user to use sign language if the fluency does not reach the standard, while no reminder is needed if it does. If the user cannot speak but can still hear, the sound and image splitting unit likewise drives the image playing function and the voice playing function. If the user can neither speak nor hear, the sound and image splitting unit drives only the image playing function, and the sign language service function is executed immediately.
4. The multi-protocol-based language and image understanding system according to claim 3, wherein the image collection module comprises a gesture recognition unit, a surrounding environment interference removal unit and a shaking amplitude elimination unit; the gesture recognition unit monitors the user's gesture changes to identify the user's intention, the surrounding environment interference removal unit filters out noise and other dynamic behavior around the robot to improve the fluency of communication between the user and the robot, and the shaking amplitude elimination unit removes slight shaking during gesture changes to improve the accuracy of sign language recognition.
5. The multi-protocol-based language and image understanding system according to claim 4, wherein the voice broadcasting module comprises a voice receiving unit, a voice playing unit and a lip language recognition unit; the voice receiving unit receives the user's voice information, the voice playing unit plays preset voice information to help the user, and the lip language recognition unit provides a lip-reading service for deaf-mute users who cannot use sign language.
6. The multi-protocol-based language and image understanding system according to claim 5, wherein the sign language information interaction flow is as follows:
The robot scans the user's dynamic gestures, matches the real-time dynamic gestures against the gesture records in the data repository, translates the meaning of the user's sign language, and answers accordingly to solve the user's problem. During sign language translation, the system predicts the meaning of the sentence the user is expressing and shows the ten sentences closest to that meaning on the display screen for the user to select from, which shortens the time the user spends signing. Because sign language gestures are complex, offering multiple predicted choices through the language and image understanding system improves efficiency, reduces errors in the information expressed in sign language, and increases accuracy. The user selects the sentence closest to the intended meaning from the ten predicted sentences; if the selection succeeds, the language and image understanding system answers the question. If none of the ten predicted sentences satisfies the user, the user can tap to exit and continue signing, and the gesture recognition unit keeps receiving sign language information and translates it as it is received. When the newly translated sign language information differs substantially from the previous prediction, sentence prediction is performed again and ten new predicted sentences are offered for the user to select; prediction is repeated until it succeeds. If prediction never succeeds, the user's sign language information is translated in full and the answer is given for the complete information.
7. The multi-protocol-based language and image understanding system according to claim 6, wherein the answer flow of the language and image understanding system is as follows:
When answering a question, the system can choose among picture answering, voice broadcasting and image displaying, and the language and image understanding system makes the choice according to the detection information from the user disability condition analysis unit. For c1 users, picture answering and image displaying are available, and image displaying is preferred because the displayed image information is concrete and easy for the user to understand; for c2 users, picture answering, voice broadcasting and image displaying are all available, and voice broadcasting is preferred because it is the most efficient; for c3 users, picture answering and image displaying are available, and image displaying is preferred. When image displaying is used, the answer information is presented on the display screen in sign language form;
During the user's sign language display, if the user makes no selection during sentence prediction and the predicted sentences remain on the display screen for 6 seconds, the user's literacy is judged to be weak, and the response mode is adjusted in the answer process of the language and image understanding system: answers are given preferentially with pictures, and the number of pictures is kept small to avoid misunderstanding. Every answer still offers the three answer modes of picture answering, voice broadcasting and image displaying; if the preferred answer mode does not satisfy the user, the user can manually select another answer mode until satisfied.
8. The multi-protocol-based language and image understanding system according to claim 7, wherein special condition analysis is as follows:
A person who is congenitally deaf will also be mute, because human language is acquired after birth and a person who has never been able to hear speech cannot learn it; for such users the process described above is followed. A person who became deaf after acquiring speech retains the ability to speak, and such a user can select the lip language recognition mode on the screen displayed by the robot. The lip language recognition unit locates the lips from the facial information scanned by the human body identification module and dynamically predicts, from the lip movements, the information the user wants to express. Because lip language recognition accuracy is relatively poor, only six predicted sentences are offered, which narrows the selection range and speeds up the user's choice. If no predicted sentence is selected, the system continues to collect lip language information and performs sentence prediction again; after two rounds of lip language prediction, the language and image understanding system recommends that the user switch to sign language for semantic output. If the sentence prediction succeeds, voice broadcasting and image displaying are given simultaneously.
9. The multi-protocol-based language and image understanding system according to claim 8, wherein the environment elimination and shaking amplitude elimination flow is as follows:
Hand tremor is common among elderly users; during gesture recognition, in addition to the dynamic hand changes made while signing, there are slight hand movements caused by tremor. The shaking amplitude elimination unit divides hand movement amplitude into 12 levels Y1 to Y12, where Y1 indicates the smallest movement amplitude and Y12 the largest; movements at levels Y1 to Y2 are automatically discarded, reducing the errors made by the language and image understanding system when recognizing sign language;
Because the space is likely to be a public space with passing pedestrians, the surrounding environment interference removal unit accepts only the information presented directly in front of the robot, ensuring that the information source is unique.
10. The multi-protocol-based language and image understanding system according to claim 9, wherein the storage source of the image display information is as follows:
Sign language information is recorded manually and handed to an animation production company to produce sign language images with a unified standard; both the sign language images and the sign language logic are stored in the language and image understanding system.
CN202111325893.3A 2021-11-10 2021-11-10 Language and image understanding system based on multiple protocols Pending CN114067433A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111325893.3A CN114067433A (en) 2021-11-10 2021-11-10 Language and image understanding system based on multiple protocols

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111325893.3A CN114067433A (en) 2021-11-10 2021-11-10 Language and image understanding system based on multiple protocols

Publications (1)

Publication Number Publication Date
CN114067433A true CN114067433A (en) 2022-02-18

Family

ID=80274464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111325893.3A Pending CN114067433A (en) 2021-11-10 2021-11-10 Language and image understanding system based on multiple protocols

Country Status (1)

Country Link
CN (1) CN114067433A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN205058054U (en) * 2015-09-29 2016-03-02 塔米智能科技(北京)有限公司 Multi -functional interactive usher robot
CN106502424A (en) * 2016-11-29 2017-03-15 上海小持智能科技有限公司 Based on the interactive augmented reality system of speech gestures and limb action
CN108260006A (en) * 2018-01-12 2018-07-06 南京工程学院 Interactive Intelligent home theater and its control method based on the detection of human body pose
CN110598576A (en) * 2019-08-21 2019-12-20 腾讯科技(深圳)有限公司 Sign language interaction method and device and computer medium
CN111144367A (en) * 2019-12-31 2020-05-12 重庆百事得大牛机器人有限公司 Auxiliary semantic recognition method based on gesture recognition

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116805272A (en) * 2022-10-29 2023-09-26 武汉行已学教育咨询有限公司 Visual education teaching analysis method, system and storage medium

Similar Documents

Publication Publication Date Title
US11241789B2 (en) Data processing method for care-giving robot and apparatus
CN111709358B (en) Teacher-student behavior analysis system based on classroom video
EP2012304B1 (en) Methods for electronically analysing a dialogue and corresponding systems
US20040152060A1 (en) Learning condition judging program and user condition judging system
CN109448851A (en) A kind of cognition appraisal procedure and device
CN109754653B (en) Method and system for personalized teaching
CN109961047A (en) Study measure of supervision, device, robot and the storage medium of educational robot
Garcia et al. Dysarthric sentence intelligibility: Contribution of iconic gestures and message predictiveness
CN112768070A (en) Mental health evaluation method and system based on dialogue communication
CN114067433A (en) Language and image understanding system based on multiple protocols
CN114582355B (en) Infant crying detection method and device based on audio and video fusion
CN110349063A (en) A kind of school work growth curve test method and system
Abbasi et al. Student mental state inference from unintentional body gestures using dynamic Bayesian networks
CN116088675A (en) Virtual image interaction method, related device, equipment, system and medium
CN113313982B (en) Education system based on 5G network
WO2022180860A1 (en) Video session evaluation terminal, video session evaluation system, and video session evaluation program
CN114792521A (en) Intelligent answering method and device based on voice recognition
CN113208592A (en) Psychological test system with multiple answering modes
CN110288986A (en) Online cognition self-appraisal voice system and its processing method
WO2022180853A1 (en) Video session evaluation terminal, video session evaluation system, and video session evaluation program
WO2022180862A1 (en) Video session evaluation terminal, video session evaluation system, and video session evaluation program
WO2022180856A1 (en) Video session evaluation terminal, video session evaluation system, and video session evaluation program
WO2022180861A1 (en) Video session evaluation terminal, video session evaluation system, and video session evaluation program
WO2022180855A1 (en) Video session evaluation terminal, video session evaluation system, and video session evaluation program
WO2022180858A1 (en) Video session evaluation terminal, video session evaluation system, and video session evaluation program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination