CN114067433A - Language and image understanding system based on multiple protocols - Google Patents
Language and image understanding system based on multiple protocols
- Publication number
- CN114067433A (application number CN202111325893.3A)
- Authority
- CN
- China
- Prior art keywords
- user
- language
- robot
- image
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 claims abstract description 49
- 230000006870 function Effects 0.000 claims abstract description 26
- 206010011878 Deafness Diseases 0.000 claims abstract description 19
- 230000004044 response Effects 0.000 claims abstract description 7
- 230000008569 process Effects 0.000 claims description 43
- 230000008030 elimination Effects 0.000 claims description 11
- 238000003379 elimination reaction Methods 0.000 claims description 11
- 206010044565 Tremor Diseases 0.000 claims description 9
- 210000001061 forehead Anatomy 0.000 claims description 9
- 230000009471 action Effects 0.000 claims description 6
- 230000008859 change Effects 0.000 claims description 6
- 230000006735 deficit Effects 0.000 claims description 6
- 238000001514 detection method Methods 0.000 claims description 6
- 238000000926 separation method Methods 0.000 claims description 6
- 208000024891 symptom Diseases 0.000 claims description 6
- 238000004891 communication Methods 0.000 claims description 4
- 206010010356 Congenital anomaly Diseases 0.000 claims description 3
- 208000032041 Hearing impaired Diseases 0.000 claims description 3
- 230000004888 barrier function Effects 0.000 claims description 3
- 230000006399 behavior Effects 0.000 claims description 3
- 230000037237 body shape Effects 0.000 claims description 3
- 238000013500 data storage Methods 0.000 claims description 3
- 230000000694 effects Effects 0.000 claims description 3
- 208000016354 hearing loss disease Diseases 0.000 claims description 3
- 230000003993 interaction Effects 0.000 claims description 3
- 238000004519 manufacturing process Methods 0.000 claims description 3
- 238000012544 monitoring process Methods 0.000 claims description 3
- 230000008520 organization Effects 0.000 claims description 3
- 230000000717 retained effect Effects 0.000 claims description 3
- 238000013519 translation Methods 0.000 claims description 3
- 238000010586 diagram Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
Images
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J19/00—Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
- B25J19/02—Sensing devices
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/238—Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
- H04N21/2387—Stream processing in response to a playback request from an end-user, e.g. for trick-play
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Manipulator (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention discloses a language and image understanding system based on multiple protocols, comprising an image collection module, a voice broadcast module and a human body identification module. The human body identification module comprises a limb movement speed analysis unit, a user disability condition analysis unit and a sound and image separation unit. The limb movement speed analysis unit measures the speed at which the user's limb moves while reaching out to touch the robot, in order to judge the user's limb agility; if the agility level is high, the playing speed of the robot's display screen is raised so as to save time and avoid excessive waiting for subsequent deaf-mute users. The user disability condition analysis unit detects whether the user has a hearing impairment, a speech impairment, or both, and the sound and image separation unit starts the voice function and the image function separately according to the user's disability analysis report. The system is highly practical, automatically recognizes sign language, and answers questions in multiple response modes.
Description
Technical Field
The invention relates to the technical field of sign language, in particular to a multi-protocol-based language and image understanding system.
Background
Because deaf-mute people have difficulty communicating and asking passers-by for help is inefficient, a deaf-mute user can turn to a robot for assistance; the robot conducts the question-and-answer exchange on the basis of language and image understanding, which effectively improves the comfort of deaf-mute users in public spaces.
Disclosure of Invention
It is an object of the present invention to provide a multi-protocol based language and image understanding system to solve the problems set forth in the background above.
In order to solve the above technical problems, the invention provides the following technical scheme: a language and image understanding system based on multiple protocols comprises an image collection module, a voice broadcast module and a human body identification module, and is characterized in that the human body identification module comprises a limb movement speed analysis unit, a user disability condition analysis unit and a sound and image separation unit; the limb movement speed analysis unit measures the speed at which the user's limb moves while reaching out to touch the robot, so as to judge the user's limb agility, and if the agility level is high, the display speed of the robot's display screen is raised to save time and avoid excessive waiting for subsequent deaf-mute users; the user disability condition analysis unit detects whether the user has a hearing impairment, a speech impairment, or both; and the sound and image separation unit starts the voice function and the image function separately according to the user's disability analysis report, so as to reduce unnecessary consumption of the robot's battery.
According to the technical scheme, the detection process of the limb movement speed of the user comprises the following steps:
the robot operates in a public area; when a user needing help stands in front of the robot, the robot stops moving immediately, detects the current user, scans the user's height and body shape, calculates the horizontal distance from the body to the robot's display screen and records it as L_horizontal, calculates the distance from the point of the body closest to the display screen to the user's hand and records it as L_vertical, and then obtains the distance L_hand from the user's hand to the display screen by the Pythagorean theorem;
The effective scanning distance of the robot is L_effective. If L_horizontal is greater than L_effective, the robot issues a voice broadcast reminding the user to come closer; if L_horizontal is less than or equal to L_effective, the robot calculates the limb movement speed of the user as V_hand = L_hand / T_contact, where T_contact is the time from the moment the robot stops to the moment the user's limb touches the display screen. A rated user movement speed V_rated is set and divided into six levels V1-V6, where V1 indicates the slowest limb movement and V6 the fastest. V_hand is compared against V_rated to obtain the corresponding voice broadcast speed level and video playing speed level: the voice broadcast speed levels are A1-A6, where A1 is the slowest and A6 the fastest, and the video playing speed levels are B1-B6, where B1 is the slowest and B6 the fastest. In this way the information receiving speed is judged from the user's body movement speed, information is conveyed effectively while the service is personalized, and the user is more comfortable when seeking help.
According to the technical scheme, the process for judging the disability degree of the deaf-mutes comprises the following steps:
a user with hearing impairment only is defined as first-level disability and recorded as a c1 person; a user with speech impairment only is defined as second-level disability and recorded as a c2 person; a user with both hearing and speech impairment is defined as third-level disability and recorded as a c3 person. A prompt is shown on the robot's display screen; if the robot receives no spoken information from the user within 3 seconds, it automatically judges the user to be speech-impaired. At the same time the robot issues a voice broadcast asking the user to perform a sign-language action or tap the display screen to confirm; if the robot receives no such confirmation within 3 seconds, it automatically judges the user to be hearing-impaired. The two judging processes run simultaneously and both finish within 3 seconds, after which the result is analyzed. If the user can speak, the robot receives the voice information and the sound and image separation unit drives both the image playing function and the voice playing function, so the user can obtain effective help information from the image playback; at the same time the fluency of the user's language organization is judged, and the user is reminded to use sign language if the fluency is not up to standard, while no reminder is needed if it is. If the user cannot speak but can hear, the sound and image separation unit still drives both the image playing function and the voice playing function. If the user can neither speak nor hear, the sound and image separation unit drives only the image playing function and the sign-language service function is executed immediately.
According to the above technical scheme, the image collection module comprises a gesture recognition unit, a surrounding-environment interference elimination unit and a shaking amplitude elimination unit; the gesture recognition unit monitors the user's gesture changes in order to recognize the user's intention, the surrounding-environment interference elimination unit shields noise and other dynamic behaviors around the robot to improve the fluency of communication between the user and the robot, and the shaking amplitude elimination unit removes slight shaking during gesture changes to increase the accuracy of sign language recognition.
According to the above technical scheme, the voice broadcast module comprises a voice receiving unit, a voice playing unit and a lip language recognition unit; the voice receiving unit receives the user's voice information, the voice playing unit plays preset voice information to help the user, and the lip language recognition unit provides a lip-reading service for deaf-mute users who cannot use sign language.
According to the technical scheme, the sign language information interaction process comprises the following steps:
the robot scans the user's dynamic gestures, matches the real-time dynamic gestures against the gesture records in the data storage library, translates the meaning of the user's sign language, and answers according to the translated content so as to solve the user's problem. During sign-language translation the system prejudges the meaning of the sentence the user is expressing and displays on the screen the ten sentences closest to that meaning for the user to choose from, which reduces the time the user spends signing; because sign-language actions are complex, offering multiple prejudged choices improves efficiency, reduces errors in the expressed information and increases accuracy. The user selects the sentence closest to the intended meaning from the ten prejudged sentences; if the selection succeeds, the language and image understanding system answers the question. If none of the ten prejudged sentences satisfies the user, the user can tap exit and continue signing, and the gesture recognition unit keeps receiving and translating the sign-language information; whenever the new translation differs greatly from the previous prejudgment, sentence prejudgment is performed again and ten prejudged sentences are offered for selection, repeating until a prejudgment succeeds. If no prejudgment succeeds, the user's sign-language information is translated in full and the answer is given for the complete information.
According to the above technical scheme, the answer flow of the language and image understanding system is as follows:
when answering a question, picture answering, voice broadcasting and image display can be selected; the language and image understanding system makes the selection according to the detection information of the user disability condition analysis unit. For c1 persons, picture answering and image display are available and image display is preferred, because image display conveys information concretely and is easy for the user to understand; for c2 persons, picture answering, voice broadcasting and image display are all available and voice broadcasting is preferred because of its high efficiency; for c3 persons, picture answering and image display are available and image display is preferred. During image display the answer information is presented on the display screen in sign-language form;
while the user is signing, if the user makes no selection during the sentence prejudgment process and the prejudged sentences remain on the display screen for 6 seconds, the user's reading ability is judged to be weak, and the response mode is adjusted accordingly: answers are given preferentially with pictures, and the number of pictures is kept small to avoid misunderstanding. Each answer can use any of the three modes of picture answering, voice broadcasting and image display, and if the preferred mode does not satisfy the user, the user can manually select another answer mode until satisfied.
According to the above technical scheme, the special-case analysis is as follows:
a person who is congenitally deaf will also become mute, because spoken language is acquired after birth and such a person has never been able to hear it; for this case the flow described above is followed. A person who became deaf later in life retains the ability to speak, so a lip-reading mode can be selected on the robot's display screen. The lip language recognition unit locates the lips from the facial information scanned by the human body identification module and dynamically prejudges, from the lip movements, the information the user wants to express. Because lip-reading accuracy is relatively poor, only six prejudged sentences are offered, which narrows the choice and speeds up the user's selection. If no prejudged sentence is selected, lip-reading information continues to be collected and sentence prejudgment is performed again; after two rounds of lip-reading prejudgment the language and image understanding system recommends that the user switch to sign language for semantic output. If a sentence prejudgment succeeds, the answer is given with voice broadcast and image display at the same time.
According to the above technical scheme, the environment interference elimination and shaking amplitude elimination flow comprises the following steps:
hand tremor is common among elderly users; during gesture recognition, in addition to the dynamic hand changes made while signing, there are slight hand movements caused by tremor. The shaking amplitude elimination unit divides the dynamic hand amplitude into 12 levels Y1-Y12, where Y1 denotes the smallest amplitude and Y12 the largest; movements at levels Y1-Y2 are automatically discarded, which reduces the errors of the language and image understanding system when recognizing sign language;
because the space may be a public space with people moving through it, the surrounding-environment interference elimination unit accepts gesture information only from the area directly in front of the robot, which guarantees that the information comes from a single source.
According to the technical scheme, the image display information storage source is as follows:
the sign-language information is recorded manually and handed to an animation production company to produce sign-language videos with a unified standard; both the sign-language videos and the sign-language logic are stored in the language and image understanding system.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of the system of the present invention;
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides the following technical solution: a language and image understanding system based on multiple protocols comprises an image collection module, a voice broadcast module and a human body identification module, and is characterized in that the human body identification module comprises a limb movement speed analysis unit, a user disability condition analysis unit and a sound and image separation unit; the limb movement speed analysis unit measures the speed at which the user's limb moves while reaching out to touch the robot, so as to judge the user's limb agility, and if the agility level is high, the display speed of the robot's display screen is raised to save time and avoid excessive waiting for subsequent deaf-mute users; the user disability condition analysis unit detects whether the user has a hearing impairment, a speech impairment, or both; and the sound and image separation unit starts the voice function and the image function separately according to the user's disability analysis report, so as to reduce unnecessary consumption of the robot's battery.
The user limb moving speed detection process comprises the following steps:
the robot operates in a public area; when a user needing help stands in front of the robot, the robot stops moving immediately, detects the current user, scans the user's height and body shape, calculates the horizontal distance from the body to the robot's display screen and records it as L_horizontal, calculates the distance from the point of the body closest to the display screen to the user's hand and records it as L_vertical, and then obtains the distance L_hand from the user's hand to the display screen by the Pythagorean theorem;
The effective scanning distance of the robot is L_effective. If L_horizontal is greater than L_effective, the robot issues a voice broadcast reminding the user to come closer; if L_horizontal is less than or equal to L_effective, the robot calculates the limb movement speed of the user as V_hand = L_hand / T_contact, where T_contact is the time from the moment the robot stops to the moment the user's limb touches the display screen. A rated user movement speed V_rated is set and divided into six levels V1-V6, where V1 indicates the slowest limb movement and V6 the fastest. V_hand is compared against V_rated to obtain the corresponding voice broadcast speed level and video playing speed level: the voice broadcast speed levels are A1-A6, where A1 is the slowest and A6 the fastest, and the video playing speed levels are B1-B6, where B1 is the slowest and B6 the fastest. In this way the information receiving speed is judged from the user's body movement speed, information is conveyed effectively while the service is personalized, and the user is more comfortable when seeking help.
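For illustration only, the following minimal Python sketch shows one way the limb-speed measurement and level mapping described above could be implemented; it is not part of the original disclosure, and the effective scanning distance and level boundaries are assumed values, since the patent gives none.

```python
import math

# Minimal sketch of the limb-speed measurement and level mapping described above.
# L_EFFECTIVE and SPEED_BOUNDS are illustrative assumptions, not values from the patent.

L_EFFECTIVE = 1.2                            # assumed effective scanning distance, metres
SPEED_BOUNDS = [0.2, 0.4, 0.6, 0.8, 1.0]     # assumed cut-offs separating levels V1..V6, m/s


def hand_to_screen_distance(l_horizontal: float, l_vertical: float) -> float:
    """L_hand obtained from L_horizontal and L_vertical by the Pythagorean theorem."""
    return math.hypot(l_horizontal, l_vertical)


def limb_speed(l_horizontal: float, l_vertical: float, t_contact: float) -> float | None:
    """Return V_hand = L_hand / T_contact, or None when the user is outside the
    scanning range and should instead be asked to come closer."""
    if l_horizontal > L_EFFECTIVE:
        return None
    return hand_to_screen_distance(l_horizontal, l_vertical) / t_contact


def speed_level(v_hand: float) -> int:
    """Map V_hand onto levels 1..6; the same index would select the voice broadcast
    speed level A1..A6 and the video playing speed level B1..B6."""
    for level, bound in enumerate(SPEED_BOUNDS, start=1):
        if v_hand <= bound:
            return level
    return 6
```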
The process of judging the disability degree of the deaf-mutes comprises the following steps:
a user with hearing impairment only is defined as first-level disability and recorded as a c1 person; a user with speech impairment only is defined as second-level disability and recorded as a c2 person; a user with both hearing and speech impairment is defined as third-level disability and recorded as a c3 person. A prompt is shown on the robot's display screen; if the robot receives no spoken information from the user within 3 seconds, it automatically judges the user to be speech-impaired. At the same time the robot issues a voice broadcast asking the user to perform a sign-language action or tap the display screen to confirm; if the robot receives no such confirmation within 3 seconds, it automatically judges the user to be hearing-impaired. The two judging processes run simultaneously and both finish within 3 seconds, after which the result is analyzed. If the user can speak, the robot receives the voice information and the sound and image separation unit drives both the image playing function and the voice playing function, so the user can obtain effective help information from the image playback; at the same time the fluency of the user's language organization is judged, and the user is reminded to use sign language if the fluency is not up to standard, while no reminder is needed if it is. If the user cannot speak but can hear, the sound and image separation unit still drives both the image playing function and the voice playing function. If the user can neither speak nor hear, the sound and image separation unit drives only the image playing function and the sign-language service function is executed immediately.
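As an illustrative aid only, a minimal sketch of the two parallel 3-second checks that grade the user is given below; the boolean inputs stand in for the voice-receiving unit and the touch-screen confirmation, and the function name is hypothetical.

```python
# Minimal sketch of the disability grading described above. The grade names
# c1/c2/c3 follow the description; everything else is an assumption.

def classify_user(spoke_within_3s: bool, confirmed_on_screen_within_3s: bool) -> str:
    """Return "c1" (hearing impairment only), "c2" (speech impairment only),
    "c3" (both) or "none" (no impairment detected)."""
    hearing_impaired = not confirmed_on_screen_within_3s
    speech_impaired = not spoke_within_3s
    if hearing_impaired and speech_impaired:
        return "c3"   # image playback only; sign-language service starts at once
    if hearing_impaired:
        return "c1"
    if speech_impaired:
        return "c2"
    return "none"
```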
The image collection module comprises a gesture recognition unit, a surrounding-environment interference elimination unit and a shaking amplitude elimination unit; the gesture recognition unit monitors the user's gesture changes in order to recognize the user's intention, the surrounding-environment interference elimination unit shields noise and other dynamic behaviors around the robot to improve the fluency of communication between the user and the robot, and the shaking amplitude elimination unit removes slight shaking during gesture changes to increase the accuracy of sign language recognition.
The voice broadcast module comprises a voice receiving unit, a voice playing unit and a lip language recognition unit; the voice receiving unit receives the user's voice information, the voice playing unit plays preset voice information to help the user, and the lip language recognition unit provides a lip-reading service for deaf-mute users who cannot use sign language.
Sign language information interaction flow:
the robot scans the user's dynamic gestures, matches the real-time dynamic gestures against the gesture records in the data storage library, translates the meaning of the user's sign language, and answers according to the translated content so as to solve the user's problem. During sign-language translation the system prejudges the meaning of the sentence the user is expressing and displays on the screen the ten sentences closest to that meaning for the user to choose from, which reduces the time the user spends signing; because sign-language actions are complex, offering multiple prejudged choices improves efficiency, reduces errors in the expressed information and increases accuracy. The user selects the sentence closest to the intended meaning from the ten prejudged sentences; if the selection succeeds, the language and image understanding system answers the question. If none of the ten prejudged sentences satisfies the user, the user can tap exit and continue signing, and the gesture recognition unit keeps receiving and translating the sign-language information; whenever the new translation differs greatly from the previous prejudgment, sentence prejudgment is performed again and ten prejudged sentences are offered for selection, repeating until a prejudgment succeeds. If no prejudgment succeeds, the user's sign-language information is translated in full and the answer is given for the complete information.
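A minimal sketch of this incremental sentence-prejudgment loop follows, for illustration only; the word-overlap score and the "differs greatly" heuristic are assumptions, since the patent fixes only the candidate count (ten) and the repeat-until-accepted behaviour.

```python
# Minimal sketch of the candidate-sentence prejudgment loop described above.
# All function names are hypothetical.

def predict_candidates(partial_translation: str, corpus: list[str], k: int = 10) -> list[str]:
    """Rank stored sentences by naive word overlap with the partial translation (assumed metric)."""
    words = set(partial_translation.split())
    return sorted(corpus, key=lambda s: len(words & set(s.split())), reverse=True)[:k]


def differs_greatly(current: str, previous: str) -> bool:
    """Assumed heuristic: re-run prejudgment once the translation has grown by 3+ words."""
    return len(current.split()) - len(previous.split()) >= 3


def sign_language_session(translation_stream, corpus, ask_user):
    """translation_stream yields progressively longer translations of the user's signing;
    ask_user(candidates) returns the chosen sentence, or None if the user taps exit."""
    previous, latest = "", ""
    for latest in translation_stream:
        if differs_greatly(latest, previous):
            choice = ask_user(predict_candidates(latest, corpus))
            if choice is not None:
                return choice          # answer against the accepted sentence
            previous = latest
    return latest                      # no prejudgment accepted: answer on the full translation
```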
Language and image understanding system answer flow:
when answering a question, picture answering, voice broadcasting and image display can be selected; the language and image understanding system makes the selection according to the detection information of the user disability condition analysis unit. For c1 persons, picture answering and image display are available and image display is preferred, because image display conveys information concretely and is easy for the user to understand; for c2 persons, picture answering, voice broadcasting and image display are all available and voice broadcasting is preferred because of its high efficiency; for c3 persons, picture answering and image display are available and image display is preferred. During image display the answer information is presented on the display screen in sign-language form;
while the user is signing, if the user makes no selection during the sentence prejudgment process and the prejudged sentences remain on the display screen for 6 seconds, the user's reading ability is judged to be weak, and the response mode is adjusted accordingly: answers are given preferentially with pictures, and the number of pictures is kept small to avoid misunderstanding. Each answer can use any of the three modes of picture answering, voice broadcasting and image display, and if the preferred mode does not satisfy the user, the user can manually select another answer mode until satisfied.
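For illustration, a minimal sketch of the answer-mode selection is given below; the per-grade mode lists and the 6-second threshold follow the description, while the names and the fallback logic are assumptions.

```python
# Minimal sketch of the answer-mode selection described above.

ANSWER_MODES = {
    "c1": ["image", "picture"],            # hearing impairment: no voice, image display preferred
    "c2": ["voice", "picture", "image"],   # speech impairment only: voice broadcast preferred
    "c3": ["image", "picture"],            # both impairments: sign-language image playback preferred
}


def choose_answer_mode(grade: str, seconds_candidates_ignored: float = 0.0) -> str:
    """Pick the preferred answer mode; fall back to a small set of pictures when the
    prejudged sentences were left unselected on screen for 6 s or more."""
    if seconds_candidates_ignored >= 6.0:
        return "picture"
    return ANSWER_MODES[grade][0]
```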
Special case analysis:
a person who is congenitally deaf will also become mute, because spoken language is acquired after birth and such a person has never been able to hear it; for this case the flow described above is followed. A person who became deaf later in life retains the ability to speak, so a lip-reading mode can be selected on the robot's display screen. The lip language recognition unit locates the lips from the facial information scanned by the human body identification module and dynamically prejudges, from the lip movements, the information the user wants to express. Because lip-reading accuracy is relatively poor, only six prejudged sentences are offered, which narrows the choice and speeds up the user's selection. If no prejudged sentence is selected, lip-reading information continues to be collected and sentence prejudgment is performed again; after two rounds of lip-reading prejudgment the language and image understanding system recommends that the user switch to sign language for semantic output. If a sentence prejudgment succeeds, the answer is given with voice broadcast and image display at the same time.
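The following minimal sketch illustrates the lip-reading fallback; it is not part of the original disclosure, capture_lip_frames(), lip_reader() and ask_user() are placeholders, and only the candidate count (six) and the two-attempt limit come from the description.

```python
# Minimal sketch of the lip-reading fallback for post-lingually deaf users.

LIP_CANDIDATES = 6
MAX_LIP_ATTEMPTS = 2


def lip_language_session(capture_lip_frames, lip_reader, ask_user):
    for _ in range(MAX_LIP_ATTEMPTS):
        candidates = lip_reader(capture_lip_frames())[:LIP_CANDIDATES]
        choice = ask_user(candidates)
        if choice is not None:
            return choice, "voice+image"        # answer with voice broadcast and image display
    return None, "recommend_sign_language"      # two failed rounds: suggest switching to sign language
```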
Environment elimination and jitter amplitude elimination flow:
hand tremor is common among elderly users; during gesture recognition, in addition to the dynamic hand changes made while signing, there are slight hand movements caused by tremor. The shaking amplitude elimination unit divides the dynamic hand amplitude into 12 levels Y1-Y12, where Y1 denotes the smallest amplitude and Y12 the largest; movements at levels Y1-Y2 are automatically discarded, which reduces the errors of the language and image understanding system when recognizing sign language;
because the space may be a public space with people moving through it, the surrounding-environment interference elimination unit accepts gesture information only from the area directly in front of the robot, which guarantees that the information comes from a single source.
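A minimal sketch of the tremor filter and the front-of-robot restriction follows, for illustration only; the Y1-Y12 bucket edges and the region bounds are assumptions, since the patent states only that levels Y1-Y2 are discarded and that only gestures made directly in front of the robot are accepted.

```python
# Minimal sketch of the shaking-amplitude filter and front-of-robot restriction.

AMPLITUDE_EDGES = [i * 0.005 for i in range(1, 12)]   # assumed bucket edges (metres) for Y1..Y11


def amplitude_level(displacement: float) -> int:
    """Map a per-frame hand displacement onto levels Y1..Y12."""
    for level, edge in enumerate(AMPLITUDE_EDGES, start=1):
        if displacement <= edge:
            return level
    return 12


def remove_tremor(displacements: list[float]) -> list[float]:
    """Discard Y1-Y2 motion, treated as tremor rather than signing."""
    return [d for d in displacements if amplitude_level(d) > 2]


def in_front_of_robot(x: float, depth: float, half_width: float = 0.5, max_depth: float = 1.2) -> bool:
    """Accept gesture information only from the region directly in front of the display
    screen (bounds assumed), so passers-by cannot become a second information source."""
    return abs(x) <= half_width and 0.0 <= depth <= max_depth
```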
The information storage source of the image display is as follows:
the sign-language information is recorded manually and handed to an animation production company to produce sign-language videos with a unified standard; both the sign-language videos and the sign-language logic are stored in the language and image understanding system.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A language and image understanding system based on multiple protocols, comprising an image collection module, a voice broadcast module and a human body identification module, characterized in that: the human body identification module comprises a limb movement speed analysis unit, a user disability condition analysis unit and a sound and image separation unit; the limb movement speed analysis unit measures the speed at which the user's limb moves while reaching out to touch the robot, so as to judge the user's limb agility, and if the agility level is high, the display speed of the robot's display screen is raised to save time and avoid excessive waiting for subsequent deaf-mute users; the user disability condition analysis unit detects whether the user has a hearing impairment, a speech impairment, or both; and the sound and image separation unit starts the voice function and the image function separately according to the user's disability analysis report, so as to reduce unnecessary consumption of the robot's battery.
2. A multi-protocol based language and image understanding system according to claim 1, wherein: the user limb moving speed detection process comprises the following steps:
the robot operates in a public area; when a user needing help stands in front of the robot, the robot stops moving immediately, detects the current user, scans the user's height and body shape, calculates the horizontal distance from the body to the robot's display screen and records it as L_horizontal, calculates the distance from the point of the body closest to the display screen to the user's hand and records it as L_vertical, and then obtains the distance L_hand from the user's hand to the display screen by the Pythagorean theorem;
The effective scanning distance of the robot is L_effective. If L_horizontal is greater than L_effective, the robot issues a voice broadcast reminding the user to come closer; if L_horizontal is less than or equal to L_effective, the robot calculates the limb movement speed of the user as V_hand = L_hand / T_contact, where T_contact is the time from the moment the robot stops to the moment the user's limb touches the display screen. A rated user movement speed V_rated is set and divided into six levels V1-V6, where V1 indicates the slowest limb movement and V6 the fastest. V_hand is compared against V_rated to obtain the corresponding voice broadcast speed level and video playing speed level: the voice broadcast speed levels are A1-A6, where A1 is the slowest and A6 the fastest, and the video playing speed levels are B1-B6, where B1 is the slowest and B6 the fastest. In this way the information receiving speed is judged from the user's body movement speed, information is conveyed effectively while the service is personalized, and the user is more comfortable when seeking help.
3. A multi-protocol based language and image understanding system according to claim 2, wherein: the process of judging the disability degree of the deaf-mutes comprises the following steps:
a user with hearing impairment only is defined as first-level disability and recorded as a c1 person; a user with speech impairment only is defined as second-level disability and recorded as a c2 person; a user with both hearing and speech impairment is defined as third-level disability and recorded as a c3 person. A prompt is shown on the robot's display screen; if the robot receives no spoken information from the user within 3 seconds, it automatically judges the user to be speech-impaired. At the same time the robot issues a voice broadcast asking the user to perform a sign-language action or tap the display screen to confirm; if the robot receives no such confirmation within 3 seconds, it automatically judges the user to be hearing-impaired. The two judging processes run simultaneously and both finish within 3 seconds, after which the result is analyzed. If the user can speak, the robot receives the voice information and the sound and image separation unit drives both the image playing function and the voice playing function, so the user can obtain effective help information from the image playback; at the same time the fluency of the user's language organization is judged, and the user is reminded to use sign language if the fluency is not up to standard, while no reminder is needed if it is. If the user cannot speak but can hear, the sound and image separation unit still drives both the image playing function and the voice playing function. If the user can neither speak nor hear, the sound and image separation unit drives only the image playing function and the sign-language service function is executed immediately.
4. A multi-protocol based language and image understanding system according to claim 3, wherein: the image collection module comprises a gesture recognition unit, a surrounding-environment interference elimination unit and a shaking amplitude elimination unit; the gesture recognition unit monitors the user's gesture changes in order to recognize the user's intention, the surrounding-environment interference elimination unit shields noise and other dynamic behaviors around the robot to improve the fluency of communication between the user and the robot, and the shaking amplitude elimination unit removes slight shaking during gesture changes to increase the accuracy of sign language recognition.
5. A multi-protocol based language and image understanding system according to claim 4, wherein: the voice broadcast module comprises a voice receiving unit, a voice playing unit and a lip language recognition unit; the voice receiving unit receives the user's voice information, the voice playing unit plays preset voice information to help the user, and the lip language recognition unit provides a lip-reading service for deaf-mute users who cannot use sign language.
6. A multi-protocol based language and image understanding system according to claim 5, wherein: sign language information interaction flow:
the robot scans the user's dynamic gestures, matches the real-time dynamic gestures against the gesture records in the data storage library, translates the meaning of the user's sign language, and answers according to the translated content so as to solve the user's problem. During sign-language translation the system prejudges the meaning of the sentence the user is expressing and displays on the screen the ten sentences closest to that meaning for the user to choose from, which reduces the time the user spends signing; because sign-language actions are complex, offering multiple prejudged choices improves efficiency, reduces errors in the expressed information and increases accuracy. The user selects the sentence closest to the intended meaning from the ten prejudged sentences; if the selection succeeds, the language and image understanding system answers the question. If none of the ten prejudged sentences satisfies the user, the user can tap exit and continue signing, and the gesture recognition unit keeps receiving and translating the sign-language information; whenever the new translation differs greatly from the previous prejudgment, sentence prejudgment is performed again and ten prejudged sentences are offered for selection, repeating until a prejudgment succeeds. If no prejudgment succeeds, the user's sign-language information is translated in full and the answer is given for the complete information.
7. A multi-protocol based language and image understanding system according to claim 6, wherein: language and image understanding system answer flow:
when answering a question, picture answering, voice broadcasting and image display can be selected; the language and image understanding system makes the selection according to the detection information of the user disability condition analysis unit. For c1 persons, picture answering and image display are available and image display is preferred, because image display conveys information concretely and is easy for the user to understand; for c2 persons, picture answering, voice broadcasting and image display are all available and voice broadcasting is preferred because of its high efficiency; for c3 persons, picture answering and image display are available and image display is preferred. During image display the answer information is presented on the display screen in sign-language form;
while the user is signing, if the user makes no selection during the sentence prejudgment process and the prejudged sentences remain on the display screen for 6 seconds, the user's reading ability is judged to be weak, and the response mode is adjusted accordingly: answers are given preferentially with pictures, and the number of pictures is kept small to avoid misunderstanding. Each answer can use any of the three modes of picture answering, voice broadcasting and image display, and if the preferred mode does not satisfy the user, the user can manually select another answer mode until satisfied.
8. A multi-protocol based language and image understanding system according to claim 7, wherein: special case analysis:
a person who is congenitally deaf will also become mute, because spoken language is acquired after birth and such a person has never been able to hear it; for this case the flow described above is followed. A person who became deaf later in life retains the ability to speak, so a lip-reading mode can be selected on the robot's display screen. The lip language recognition unit locates the lips from the facial information scanned by the human body identification module and dynamically prejudges, from the lip movements, the information the user wants to express. Because lip-reading accuracy is relatively poor, only six prejudged sentences are offered, which narrows the choice and speeds up the user's selection. If no prejudged sentence is selected, lip-reading information continues to be collected and sentence prejudgment is performed again; after two rounds of lip-reading prejudgment the language and image understanding system recommends that the user switch to sign language for semantic output. If a sentence prejudgment succeeds, the answer is given with voice broadcast and image display at the same time.
9. A multi-protocol based language and image understanding system according to claim 8, wherein: environment elimination and jitter amplitude elimination flow:
hand tremor is common among elderly users; during gesture recognition, in addition to the dynamic hand changes made while signing, there are slight hand movements caused by tremor. The shaking amplitude elimination unit divides the dynamic hand amplitude into 12 levels Y1-Y12, where Y1 denotes the smallest amplitude and Y12 the largest; movements at levels Y1-Y2 are automatically discarded, which reduces the errors of the language and image understanding system when recognizing sign language;
because the space may be a public space with people moving through it, the surrounding-environment interference elimination unit accepts gesture information only from the area directly in front of the robot, which guarantees that the information comes from a single source.
10. A multi-protocol based language and image understanding system according to claim 9, wherein: the information storage source of the image display is as follows:
the sign-language information is recorded manually and handed to an animation production company to produce sign-language videos with a unified standard; both the sign-language videos and the sign-language logic are stored in the language and image understanding system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111325893.3A CN114067433A (en) | 2021-11-10 | 2021-11-10 | Language and image understanding system based on multiple protocols |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111325893.3A CN114067433A (en) | 2021-11-10 | 2021-11-10 | Language and image understanding system based on multiple protocols |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114067433A true CN114067433A (en) | 2022-02-18 |
Family
ID=80274464
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111325893.3A Pending CN114067433A (en) | 2021-11-10 | 2021-11-10 | Language and image understanding system based on multiple protocols |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114067433A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116805272A (en) * | 2022-10-29 | 2023-09-26 | 武汉行已学教育咨询有限公司 | Visual education teaching analysis method, system and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN205058054U (en) * | 2015-09-29 | 2016-03-02 | 塔米智能科技(北京)有限公司 | Multi -functional interactive usher robot |
CN106502424A (en) * | 2016-11-29 | 2017-03-15 | 上海小持智能科技有限公司 | Based on the interactive augmented reality system of speech gestures and limb action |
CN108260006A (en) * | 2018-01-12 | 2018-07-06 | 南京工程学院 | Interactive Intelligent home theater and its control method based on the detection of human body pose |
CN110598576A (en) * | 2019-08-21 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Sign language interaction method and device and computer medium |
CN111144367A (en) * | 2019-12-31 | 2020-05-12 | 重庆百事得大牛机器人有限公司 | Auxiliary semantic recognition method based on gesture recognition |
-
2021
- 2021-11-10 CN CN202111325893.3A patent/CN114067433A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN205058054U (en) * | 2015-09-29 | 2016-03-02 | 塔米智能科技(北京)有限公司 | Multi -functional interactive usher robot |
CN106502424A (en) * | 2016-11-29 | 2017-03-15 | 上海小持智能科技有限公司 | Based on the interactive augmented reality system of speech gestures and limb action |
CN108260006A (en) * | 2018-01-12 | 2018-07-06 | 南京工程学院 | Interactive Intelligent home theater and its control method based on the detection of human body pose |
CN110598576A (en) * | 2019-08-21 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Sign language interaction method and device and computer medium |
CN111144367A (en) * | 2019-12-31 | 2020-05-12 | 重庆百事得大牛机器人有限公司 | Auxiliary semantic recognition method based on gesture recognition |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116805272A (en) * | 2022-10-29 | 2023-09-26 | 武汉行已学教育咨询有限公司 | Visual education teaching analysis method, system and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11241789B2 (en) | Data processing method for care-giving robot and apparatus | |
CN111709358B (en) | Teacher-student behavior analysis system based on classroom video | |
EP2012304B1 (en) | Methods for electronically analysing a dialogue and corresponding systems | |
US20040152060A1 (en) | Learning condition judging program and user condition judging system | |
CN109448851A (en) | A kind of cognition appraisal procedure and device | |
CN109754653B (en) | Method and system for personalized teaching | |
CN109961047A (en) | Study measure of supervision, device, robot and the storage medium of educational robot | |
Garcia et al. | Dysarthric sentence intelligibility: Contribution of iconic gestures and message predictiveness | |
CN112768070A (en) | Mental health evaluation method and system based on dialogue communication | |
CN114067433A (en) | Language and image understanding system based on multiple protocols | |
CN114582355B (en) | Infant crying detection method and device based on audio and video fusion | |
CN110349063A (en) | A kind of school work growth curve test method and system | |
Abbasi et al. | Student mental state inference from unintentional body gestures using dynamic Bayesian networks | |
CN116088675A (en) | Virtual image interaction method, related device, equipment, system and medium | |
CN113313982B (en) | Education system based on 5G network | |
WO2022180860A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program | |
CN114792521A (en) | Intelligent answering method and device based on voice recognition | |
CN113208592A (en) | Psychological test system with multiple answering modes | |
CN110288986A (en) | Online cognition self-appraisal voice system and its processing method | |
WO2022180853A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program | |
WO2022180862A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program | |
WO2022180856A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program | |
WO2022180861A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program | |
WO2022180855A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program | |
WO2022180858A1 (en) | Video session evaluation terminal, video session evaluation system, and video session evaluation program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |