CN111611868A - System and method for head action semantic recognition for a sign language system - Google Patents


Info

Publication number
CN111611868A
Authority
CN
China
Prior art keywords
module, action, head, semantic, data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010332961.8A
Other languages
Chinese (zh)
Inventor
林羽晨
张金艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN202010332961.8A priority Critical patent/CN111611868A/en
Publication of CN111611868A publication Critical patent/CN111611868A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a system and method for head action semantic recognition for a sign language system. The system consists of a camera unit, a processor unit and a display unit; the steps of the method are embodied by the flow in which the processor unit detects and recognizes head action semantics. With the invention, not only can the overall head actions of a sign language user be recognized, but also the detail actions of the user's facial features, such as the eyes and mouth, achieving more complete semantic recognition. The system is simple and intuitive, the method performs well, and it generalizes across practical sign language recognition scenarios.

Description

System and method for head action semantic recognition for a sign language system
Technical Field
The invention relates to a video semantic recognition system and method, and in particular to a head action semantic recognition system and method for a sign language system.
Background
Video semantic recognition has long been a research hotspot in both academia and industry, and is of great value in the field of human-computer interaction. Since the beginning of the 21st century, video semantic recognition technology has been a focus of government policy deployment in China. Action semantic recognition is an important branch of video semantic recognition and can be applied across numerous fields. Head action semantic recognition for sign language systems is an important application of action semantic recognition: it can recognize the head action semantics of a sign language user and thereby enrich sign language recognition systems.
Existing sign language recognition systems are largely homogeneous: most judge the semantics expressed in sign language only by detecting and recognizing the user's gestures. In real life, however, a sign language user can express rich semantics through head actions, and can convey more varied and complete semantics by combining them with gestures. Recognizing head action semantics is therefore increasingly important in a sign language system. Current mature sign language recognition systems, however, lack the ability to detect and recognize head action semantics, so their performance in practical applications is limited.
The invention provides a system and method for head action semantic recognition for a sign language system. In the system of the invention, the camera unit captures video data of the sign language user, the processor unit detects and recognizes the user's head action semantics, and the display unit outputs the processing result. With the invention, not only can the overall head actions of a sign language user be recognized, but also the detail actions of the user's facial features, such as the eyes and mouth, achieving more complete semantic recognition.
Because the body posture of a sign language user is relatively fixed and the illumination of the environment changes relatively little, using the camera unit to capture video data and communicate with the processor unit (for example over USB, RS-485, WiFi or Bluetooth) to recognize head action semantics is entirely feasible, and the approach generalizes across practical application environments.
Disclosure of Invention
The invention aims to provide a system and method for head action semantic recognition for a sign language system, addressing the limitations of current sign language recognition systems in practical applications. The system is simple and intuitive, the method performs well, and it generalizes across sign language recognition scenarios.
In order to achieve the purpose, the invention adopts the following technical scheme:
a system and a method for recognizing the semantic of head action based on a mute system can recognize the whole head action of a mute user, and can also recognize the detail actions of five sense organs such as eyes and mouth of the mute user, thereby realizing more complete semantic recognition. The system mainly comprises a camera unit, a processor unit and a display unit, and is characterized in that: the camera units are connected with the processor unit in a wired or wireless mode, and the number n of the camera units is at least 1; the processor unit is connected with the display unit in a wired or wireless mode, and the number m of the display units is at least 1.
The number n of camera units may vary with the scale of the system, but is at least 1.
The processor unit comprises a data receiving module, a filtering and noise reduction module, a head action detection module, a data clipping module, a head action recognition module, a facial feature action recognition module, a semantic generation module and a data sending module. The data receiving module is connected with the filtering and noise reduction module in a wired manner; the filtering and noise reduction module is connected with the head action detection module in a wired manner; the head action detection module is connected with the data clipping module in a wired manner; the data clipping module is connected with the head action recognition module and the facial feature action recognition module in a wired manner; the head action recognition module and the facial feature action recognition module are each connected with the semantic generation module in a wired manner; the semantic generation module is connected with the data sending module in a wired manner; and the head action detection module is also connected with the data sending module in a wired manner.
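The module topology described above can be sketched as a simple data-flow function. This is a minimal illustration only: the module names follow the text, but the patent does not specify any implementation, so every function passed in here is a hypothetical placeholder.

```python
def run_processor_unit(video, *, denoise, detect_head_action, clip_keyframes,
                       recognize_head, recognize_face, generate_semantics):
    """Mirror the wired connections of the processor unit: each keyword
    argument stands in for one module of the pipeline described above."""
    clean = denoise(video)                    # filtering and noise reduction module
    if not detect_head_action(clean):         # head action detection module
        # detection module is wired directly to the data sending module
        return "unrecognized"
    keyframes = clip_keyframes(clean)         # data clipping module
    head_class = recognize_head(keyframes)    # head action recognition module
    face_class = recognize_face(keyframes)    # facial feature action recognition module
    # semantic generation module feeds the data sending module
    return generate_semantics(head_class, face_class)
```

The two-branch wiring (detection straight to output, or clipping followed by the two parallel recognizers) is the only structure the text fixes; everything else is left open by the patent.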
The number m of display units may vary with the scale of the system, but is at least 1.
The head action semantic recognition system for a sign language system can efficiently and accurately recognize the head action semantics of a sign language user.
A head action semantic recognition method for a sign language system, operated using the above system, is characterized in that the head action semantic recognition process is embodied by the flow in which the processor unit detects and recognizes head action semantics.
The flow in which the processor unit detects and recognizes head action semantics:
1) the data receiving module receives the video data transmitted by the camera unit and sends it to the filtering and noise reduction module;
2) the filtering and noise reduction module filters noise from the video data, improving the reliability of the data;
3) the head action detection module detects whether the video contains a head action; if not, the data sending module sends an "unrecognized" result directly to the display unit; if so, the video is sent to the data clipping module;
4) the data clipping module preprocesses the data, selecting key frames from the video to improve the processing speed of the system;
5) the head action recognition module recognizes and classifies the overall head action in the video; the facial feature action recognition module recognizes and classifies the detail actions of the facial features;
6) the semantic generation module converts the classified overall head action and facial feature detail actions into meaningful head semantics and generates a corresponding semantic description;
7) the data sending module sends the final semantic description to the display unit, completing head action semantic recognition.
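Step 4's key-frame selection is not specified further in the text. A common heuristic, shown here as a hedged sketch rather than the patent's actual method, is to keep a frame only when it differs enough from the last kept frame, so that near-duplicate frames are dropped and downstream recognition runs faster.

```python
import numpy as np

def select_keyframes(frames, threshold=10.0):
    """Keep the first frame plus every frame whose mean absolute pixel
    difference from the last kept frame exceeds `threshold`.
    The difference-based criterion is an assumption; the patent only
    states that key frames are selected to speed up processing."""
    if not frames:
        return []
    kept = [frames[0]]
    for frame in frames[1:]:
        diff = np.mean(np.abs(frame.astype(np.float32) -
                              kept[-1].astype(np.float32)))
        if diff > threshold:
            kept.append(frame)
    return kept
```

For a mostly static signer, runs of identical frames collapse to one representative, which matches the stated goal of the data clipping module: fewer frames for the two recognition modules to process.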
Compared with the prior art, the invention has the following substantive features and advantages:
The system of the invention consists of a camera unit, a processor unit and a display unit; the method comprises the flow in which the processor unit detects and recognizes head action semantics. The invention can recognize not only the overall head actions of a sign language user but also the detail actions of the user's facial features, such as the eyes and mouth, achieving more complete semantic recognition. Because the body posture of a sign language user is relatively fixed and the illumination of the environment changes relatively little, using the camera unit to capture video data and communicate with the processor unit (for example over USB, RS-485, WiFi or Bluetooth) to recognize head action semantics is entirely feasible, and the approach generalizes across practical application environments.
Drawings
Fig. 1 is a schematic structural diagram of the head action semantic recognition system for a sign language system according to the first embodiment of the invention.
Fig. 2 is a structural block diagram of the processor unit implementing head action semantic recognition for a sign language system according to the second embodiment of the invention.
Fig. 3 is a flowchart of the processor unit detecting and recognizing head action semantics according to the third embodiment of the invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings:
example one
Referring to fig. 1, the head action semantic recognition system for a sign language system is composed of camera units (1.1, 1.2, …, 1.n), a processor unit 2 and display units (3.1, 3.2, …, 3.m). The camera units (1.1, 1.2, …, 1.n) are connected with the processor unit 2 in a wired or wireless manner; the processor unit 2 is connected with the display units (3.1, 3.2, …, 3.m) in a wired or wireless manner. The number n of camera units may vary with the scale of the system, but is at least 1; the number m of display units may vary with the scale of the system, but is at least 1.
Example two
This embodiment is substantially the same as the first embodiment, and is characterized in that:
referring to fig. 2, the structure of the processor unit 2: the device comprises a data receiving module 4, a filtering and noise reducing module 5, a head action detecting module 6, a data clipping module 7, a head action identifying module 8, a five sense organs action identifying module 9, a semantic generating module 10 and a data sending module 11. The data receiving module 4 is connected with the filtering and noise reducing module 5 in a wired manner, the filtering and noise reducing module 5 is connected with the head action detecting module 6 in a wired manner, the head action detecting module 6 is connected with the data clipping module 7 in a wired manner, the data clipping module 7 is connected with the head action identifying module 8 in a wired manner, the data clipping module 7 is connected with the facial features action identifying module 9 in a wired manner, the head action identifying module 8 is connected with the semantic generating module 10 in a wired manner, the facial features action identifying module 9 is connected with the semantic generating module 10 in a wired manner, the semantic generating module 10 is connected with the data sending module 11 in a wired manner, and the head action detecting module 6 is connected with the data sending module 11 in a wired manner.
EXAMPLE III
The head action semantic recognition method for a sign language system is operated using the above system. It is characterized in that the head action semantic recognition process is embodied by the flow in which the processor unit detects and recognizes head action semantics.
Referring to fig. 3, the flow in which the processor unit detects and recognizes head action semantics:
1) the data receiving module 4 receives the video data transmitted by the camera units (1.1, 1.2, …, 1.n) and sends it to the filtering and noise reduction module 5;
2) the filtering and noise reduction module 5 filters noise from the video data, improving the reliability of the data;
3) the head action detection module 6 detects whether the video contains a head action; if not, the data sending module 11 sends the "unrecognized" result directly to the display units (3.1, 3.2, …, 3.m); if so, the video is sent to the data clipping module 7;
4) the data clipping module 7 preprocesses the data, selecting key frames from the video to improve the processing speed of the system;
5) the head action recognition module 8 recognizes and classifies the overall head action in the video; the facial feature action recognition module 9 recognizes and classifies the detail actions of the facial features;
6) the semantic generation module 10 converts the classified overall head action and facial feature detail actions into meaningful head semantics and generates a corresponding semantic description;
7) the data sending module 11 sends the final semantic description to the display units (3.1, 3.2, …, 3.m), completing head action semantic recognition.
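Step 6's conversion of the two classification results into a semantic description could be as simple as a lookup table keyed on the (overall action, facial detail) pair. The class labels and descriptions below are purely illustrative assumptions; the patent does not define a vocabulary.

```python
# Hypothetical (head action, facial detail) pairs mapped to semantic
# descriptions; the actual vocabulary is not specified in the patent.
SEMANTIC_TABLE = {
    ("nod", "neutral"): "yes / agreement",
    ("shake", "neutral"): "no / disagreement",
    ("nod", "smile"): "enthusiastic agreement",
    ("tilt", "raised_eyebrows"): "question / doubt",
}

def generate_semantics(head_class, face_class):
    """Combine the overall head action class with the facial detail class
    into a description; fall back to naming the head action alone when
    the pair is not in the table."""
    return SEMANTIC_TABLE.get((head_class, face_class),
                              "unmapped head action: " + head_class)
```

A table like this makes the role of the semantic generation module concrete: the two recognition modules reduce the video to a pair of labels, and the semantic generation module turns that pair into the description the data sending module forwards to the display units.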
In the above embodiments of the invention, the camera unit captures video data of the sign language user, the processor unit detects and recognizes the user's head action semantics, and the display unit outputs the processing result. The invention can recognize not only the overall head actions of a sign language user but also the detail actions of the user's facial features, such as the eyes and mouth, achieving more complete semantic recognition. The system is simple and intuitive, the method performs well, and it generalizes across sign language recognition scenarios.

Claims (3)

1. A head action semantic recognition system for a sign language system, composed of n camera units (1.1, 1.2, …, 1.n), a processor unit (2) and m display units (3.1, 3.2, …, 3.m), characterized in that: the camera units (1.1, 1.2, …, 1.n) are connected with the processor unit (2) in a wired or wireless manner, and the number n of camera units is at least 1; the processor unit (2) is connected with the display units (3.1, 3.2, …, 3.m) in a wired or wireless manner, and the number m of display units is at least 1.
2. The head action semantic recognition system for a sign language system of claim 1, characterized in that: the processor unit (2) consists of a data receiving module (4), a filtering and noise reduction module (5), a head action detection module (6), a data clipping module (7), a head action recognition module (8), a facial feature action recognition module (9), a semantic generation module (10) and a data sending module (11), wherein the data receiving module (4) is connected with the filtering and noise reduction module (5) in a wired manner; the filtering and noise reduction module (5) is connected with the head action detection module (6) in a wired manner; the head action detection module (6) is connected with the data clipping module (7) in a wired manner; the data clipping module (7) is connected with the head action recognition module (8) and the facial feature action recognition module (9) in a wired manner; the head action recognition module (8) and the facial feature action recognition module (9) are each connected with the semantic generation module (10) in a wired manner; the semantic generation module (10) is connected with the data sending module (11) in a wired manner; and the head action detection module (6) is also connected with the data sending module (11) in a wired manner.
3. A head action semantic recognition method for a sign language system, operated using the head action semantic recognition system for a sign language system of claim 1, characterized in that: the head action semantic recognition process is embodied by the flow in which the processor unit detects and recognizes head action semantics, namely:
1) the data receiving module (4) receives the video data transmitted by the camera units (1.1, 1.2, …, 1.n) and sends it to the filtering and noise reduction module (5); the filtering and noise reduction module (5) filters noise from the video data, improving the reliability of the data;
2) the head action detection module (6) detects whether the video contains a head action; if not, the data sending module (11) sends the "unrecognized" result directly to the display units (3.1, 3.2, …, 3.m); if so, the video is sent to the data clipping module (7);
3) the data clipping module (7) preprocesses the data, selecting key frames from the video to improve the processing speed of the system;
4) the head action recognition module (8) recognizes and classifies the overall head action in the video;
5) the facial feature action recognition module (9) recognizes and classifies the detail actions of the facial features of the head;
6) the semantic generation module (10) converts the classified overall head action and facial feature detail actions into meaningful head semantics and generates a corresponding semantic description;
7) the data sending module (11) sends the final semantic description to the display units (3.1, 3.2, …, 3.m), completing head action semantic recognition.
CN202010332961.8A 2020-04-24 2020-04-24 System and method for recognizing head action semantics facing to dumb language system Pending CN111611868A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010332961.8A CN111611868A (en) 2020-04-24 2020-04-24 System and method for recognizing head action semantics facing to dumb language system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010332961.8A CN111611868A (en) 2020-04-24 2020-04-24 System and method for recognizing head action semantics facing to dumb language system

Publications (1)

Publication Number Publication Date
CN111611868A 2020-09-01

Family

ID=72204679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010332961.8A Pending CN111611868A (en) 2020-04-24 2020-04-24 System and method for recognizing head action semantics facing to dumb language system

Country Status (1)

Country Link
CN (1) CN111611868A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117117A (en) * 2010-01-06 2011-07-06 致伸科技股份有限公司 System and method for control through identifying user posture by image extraction device
CN103440640A (en) * 2013-07-26 2013-12-11 北京理工大学 Method for clustering and browsing video scenes
CN108470206A (en) * 2018-02-11 2018-08-31 北京光年无限科技有限公司 Head exchange method based on visual human and system
CN110334600A (en) * 2019-06-03 2019-10-15 武汉工程大学 A kind of multiple features fusion driver exception expression recognition method
CN110688921A (en) * 2019-09-17 2020-01-14 东南大学 Method for detecting smoking behavior of driver based on human body action recognition technology
CN110931042A (en) * 2019-11-14 2020-03-27 北京欧珀通信有限公司 Simultaneous interpretation method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200901