CN111611868A - System and method for recognizing head action semantics facing to dumb language system
System and method for recognizing head action semantics facing to dumb language system
- Publication number
- CN111611868A (application CN202010332961.8A)
- Authority
- CN
- China
- Prior art keywords
- module
- action
- head
- semantic
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Abstract
The invention relates to a system and a method for recognizing head-action semantics for a sign language (dumb language) system. The system consists of a camera unit, a processor unit and a display unit; the steps of the method are embodied in the flow by which the processor unit detects and recognizes head-action semantics. The invention can recognize not only the overall head movements of a sign language user but also the detailed movements of facial features such as the eyes and mouth, achieving more complete semantic recognition. The system is simple and intuitive, the method performs well, and the approach generalizes across sign language recognition scenarios.
Description
Technical Field
The invention relates to a video semantic recognition system and method, and in particular to a system and method for recognizing head-action semantics in a sign language (dumb language) system.
Background
Video semantic recognition has long been a research hotspot in academia and industry, and is of great value in the field of human-computer interaction. Since the beginning of the 21st century, video semantic recognition technology has been a focus of government policy deployment in China. Activity semantic recognition is an important branch of video semantic recognition that can be applied across numerous fields. Head-action semantic recognition for sign language systems is an important application of activity semantic recognition: it recognizes the head-action semantics of sign language users and thereby enriches sign language recognition systems.
Existing sign language recognition systems are homogeneous and narrow: most judge the semantics being expressed only by detecting and recognizing the user's hand gestures. In real life, however, sign language users express rich semantics through head movements and convey more varied, complete semantics by combining them with gestures. Recognizing head-action semantics within a sign language system is therefore increasingly important. Mature sign language recognition systems currently lack the ability to detect and recognize head-action semantics, so their performance in practical applications is poor.
The invention provides a system and a method for recognizing head-action semantics for sign language systems. In the system of the invention, a camera unit captures video data of the sign language user, a processor unit detects and recognizes the user's head-action semantics, and a display unit outputs the processing result. The invention can recognize not only the user's overall head movements but also the detailed movements of facial features such as the eyes and mouth, achieving more complete semantic recognition.
Because the body posture of a sign language user is relatively fixed and the illumination of the environment changes relatively little, it is entirely feasible to recognize head-action semantics by using a camera unit to capture video data and communicate with the processor unit over, for example, USB, RS-485, WiFi or Bluetooth; the approach generalizes across practical application environments.
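By way of illustration only, the following minimal Python sketch shows a processor-side receiver for one such camera-to-processor link. The patent names only the transports (USB, RS-485, WiFi, Bluetooth); the TCP port and the 4-byte length-prefix framing used here are assumptions added for the example, not part of the patent.

```python
import socket
import struct

def receive_frames(host: str = "0.0.0.0", port: int = 5000):
    """Yield encoded frames sent by a camera unit over TCP (e.g., WiFi).
    Framing (assumed): a 4-byte big-endian length prefix before each frame."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(1)
    conn, _ = srv.accept()
    try:
        while True:
            header = conn.recv(4)
            if len(header) < 4:          # connection closed (short read handling omitted)
                break
            (length,) = struct.unpack(">I", header)
            payload = b""
            while len(payload) < length:  # accumulate exactly one frame
                chunk = conn.recv(length - len(payload))
                if not chunk:
                    return
                payload += chunk
            yield payload                 # one encoded frame, ready for denoising
    finally:
        conn.close()
        srv.close()
```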
Disclosure of Invention
The invention aims to provide a system and a method for recognizing head-action semantics for sign language systems, addressing the limitations of current sign language recognition systems in practical applications. The system is simple and intuitive, the method performs well, and the approach generalizes across sign language recognition scenarios.
In order to achieve this purpose, the invention adopts the following technical scheme:
a system and a method for recognizing the semantic of head action based on a mute system can recognize the whole head action of a mute user, and can also recognize the detail actions of five sense organs such as eyes and mouth of the mute user, thereby realizing more complete semantic recognition. The system mainly comprises a camera unit, a processor unit and a display unit, and is characterized in that: the camera units are connected with the processor unit in a wired or wireless mode, and the number n of the camera units is at least 1; the processor unit is connected with the display unit in a wired or wireless mode, and the number m of the display units is at least 1.
The number n of the camera units may vary depending on the size of the system, but is at least 1.
The structure of the processor unit: the device comprises a data receiving module, a filtering and noise reducing module, a head action detecting module, a data cutting module, a head action identifying module, a five sense organs action identifying module, a semantic generating module and a data sending module. The data receiving module is connected with the filtering and noise reducing module in a wired mode, the filtering and noise reducing module is connected with the head action detecting module in a wired mode, the head action detecting module is connected with the data cutting module in a wired mode, the data cutting module is connected with the head action identifying module in a wired mode, the data cutting module is connected with the five sense organ action identifying module in a wired mode, the head action identifying module is connected with the semantic generating module in a wired mode, the five sense organ action identifying module is connected with the semantic generating module in a wired mode, the semantic generating module is connected with the data sending module in a wired mode, and the head action detecting module is connected with the data sending module in a wired mode.
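By way of illustration, the module graph described above might be wired in software as in the following minimal Python sketch; every method body is a placeholder assumption added for the example, since the patent does not specify the modules' internal algorithms.

```python
from typing import List

class ProcessorUnit:
    """Sketch of the processor unit's module graph (names follow the patent;
    all method bodies are illustrative placeholders)."""

    def filter_noise(self, video: List) -> List:           # filtering and noise reduction module
        return video

    def contains_head_action(self, video: List) -> bool:   # head action detection module
        return len(video) > 0

    def crop(self, video: List) -> List:                   # data cropping module: keep key frames
        return video[::5]

    def recognize_head(self, frames: List) -> str:         # head action recognition module
        return "nod"

    def recognize_features(self, frames: List) -> str:     # facial feature action recognition module
        return "blink"

    def generate_semantics(self, head: str, face: str) -> str:  # semantic generation module
        return f"overall head action '{head}' with facial detail '{face}'"

    def send(self, result: str) -> str:                    # data sending module -> display units
        print(result)
        return result

    def process(self, video: List) -> str:
        clean = self.filter_noise(video)
        if not self.contains_head_action(clean):
            return self.send("unrecognized")  # direct wire: detection module -> sending module
        key_frames = self.crop(clean)
        head_cls = self.recognize_head(key_frames)
        face_cls = self.recognize_features(key_frames)
        return self.send(self.generate_semantics(head_cls, face_cls))

ProcessorUnit().process([object()] * 30)  # toy 30-"frame" video
```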
The number m of display units may vary with the scale of the system, but is at least 1.
The head-action semantic recognition system for sign language systems can efficiently and accurately recognize the head-action semantics of sign language users.
A method for recognizing head-action semantics for a sign language system, operated with the system above, is characterized in that: the head-action semantic recognition procedure is embodied in the flow by which the processor unit detects and recognizes head-action semantics.
The flow by which the processor unit detects and recognizes head-action semantics:
1) the data receiving module receives the video data transmitted by the camera unit and passes it to the filtering and noise reduction module;
2) the filtering and noise reduction module filters noise out of the video data, improving its reliability;
3) the head action detection module detects whether the video contains head movement; if not, the data sending module sends an 'unrecognized' result directly to the display unit; if so, the video is sent to the data cropping module;
4) the data cropping module preprocesses the data, selecting key frames from the video to speed up processing;
5) the head action recognition module recognizes and classifies the overall head movement in the video, while the facial feature action recognition module recognizes and classifies the detailed movements of the facial features of the head;
6) the semantic generation module converts the classified overall head movement and facial feature movements into meaningful head semantics and generates the corresponding semantic description;
7) the data sending module sends the final semantic description to the display unit, completing head-action semantic recognition.
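For concreteness, the following minimal Python/OpenCV sketch walks through steps 2) to 7) on a single video file. The Haar face detector, the frame-differencing key-frame rule, the spread-based nod/shake classifier and the semantic table are all illustrative assumptions standing in for the algorithms the patent leaves unspecified.

```python
import cv2
import numpy as np

# Stock OpenCV frontal-face Haar cascade as a stand-in "head detector".
face_det = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Illustrative semantic table (step 6): classified action -> description.
SEMANTICS = {
    "nod": "affirmation (yes)",
    "shake": "negation (no)",
    "still": "head present, no meaningful action",
}

def recognize(video_path: str) -> str:
    cap = cv2.VideoCapture(video_path)
    centers, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (5, 5), 0)  # step 2: filtering / noise reduction
        # Step 4: crude key-frame selection -- skip frames nearly identical to the last kept one.
        if prev is not None and cv2.norm(gray, prev, cv2.NORM_L1) / gray.size < 2.0:
            continue
        prev = gray
        faces = face_det.detectMultiScale(gray, 1.3, 5)  # step 3: is a head present?
        if len(faces) > 0:
            x, y, w, h = faces[0]
            centers.append((x + w / 2.0, y + h / 2.0))
    cap.release()
    if len(centers) < 2:
        return "unrecognized"  # step 3: no head action -> send result directly
    c = np.asarray(centers)
    dx, dy = np.ptp(c[:, 0]), np.ptp(c[:, 1])  # step 5: trajectory spread as a crude feature
    action = "still" if max(dx, dy) < 5 else ("nod" if dy > dx else "shake")
    return SEMANTICS[action]  # steps 6-7: semantic description for the display unit

if __name__ == "__main__":
    print(recognize("signer_clip.mp4"))  # hypothetical input file
```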
Compared with the prior art, the invention has the following substantive features and notable advantages:
The system of the invention consists of a camera unit, a processor unit and a display unit; the method consists of the flow by which the processor unit detects and recognizes head-action semantics. The invention can recognize not only the overall head movements of a sign language user but also the detailed movements of facial features such as the eyes and mouth, achieving more complete semantic recognition. Because the body posture of a sign language user is relatively fixed and the illumination of the environment changes relatively little, it is entirely feasible to recognize head-action semantics by using the camera unit to capture video data and communicate with the processor unit over, for example, USB, RS-485, WiFi or Bluetooth; the approach generalizes across practical application environments.
Drawings
Fig. 1 is a schematic structural diagram of the head-action semantic recognition system for a sign language system according to the first embodiment of the invention.
Fig. 2 is a block diagram of the processor unit implementing head-action semantic recognition according to the second embodiment of the invention.
Fig. 3 is a flowchart of the processor unit detecting and recognizing head-action semantics according to the third embodiment of the invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings:
example one
Referring to fig. 1, the head-action semantic recognition system for a sign language system is composed of camera units (1.1, 1.2, …, 1.n), a processor unit 2 and display units (3.1, 3.2, …, 3.m). The camera units (1.1, 1.2, …, 1.n) are connected to the processor unit 2 in a wired or wireless manner; the processor unit 2 is connected to the display units (3.1, 3.2, …, 3.m) in a wired or wireless manner. The number of camera units may vary with the scale of the system, but is at least 1; the number of display units may likewise vary, but is at least 1.
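By way of illustration, opening the n camera units of fig. 1 from the processor unit might look like the following Python/OpenCV sketch; the integer device indices and the camera count are assumptions added for the example.

```python
import cv2

N_CAMERAS = 2  # n >= 1 camera units (1.1 ... 1.n); the value here is an assumption

# Open each camera unit; USB webcams typically enumerate as integer device indices.
cameras = []
for idx in range(N_CAMERAS):
    cap = cv2.VideoCapture(idx)
    if cap.isOpened():
        cameras.append(cap)

# Grab one frame per connected camera and hand it to the processor unit.
frames = []
for cap in cameras:
    ok, frame = cap.read()
    if ok:
        frames.append(frame)

print(f"{len(cameras)} camera unit(s) connected, {len(frames)} frame(s) captured")
for cap in cameras:
    cap.release()
```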
Example two
This embodiment is substantially the same as the first embodiment, with the following particular features:
referring to fig. 2, the structure of the processor unit 2: the device comprises a data receiving module 4, a filtering and noise reducing module 5, a head action detecting module 6, a data clipping module 7, a head action identifying module 8, a five sense organs action identifying module 9, a semantic generating module 10 and a data sending module 11. The data receiving module 4 is connected with the filtering and noise reducing module 5 in a wired manner, the filtering and noise reducing module 5 is connected with the head action detecting module 6 in a wired manner, the head action detecting module 6 is connected with the data clipping module 7 in a wired manner, the data clipping module 7 is connected with the head action identifying module 8 in a wired manner, the data clipping module 7 is connected with the facial features action identifying module 9 in a wired manner, the head action identifying module 8 is connected with the semantic generating module 10 in a wired manner, the facial features action identifying module 9 is connected with the semantic generating module 10 in a wired manner, the semantic generating module 10 is connected with the data sending module 11 in a wired manner, and the head action detecting module 6 is connected with the data sending module 11 in a wired manner.
EXAMPLE III
The method for recognizing head-action semantics for a sign language system is operated with the system described above. It is characterized in that the head-action semantic recognition procedure is embodied in the flow by which the processor unit detects and recognizes head-action semantics.
Referring to fig. 3, the flow by which the processor unit detects and recognizes head-action semantics:
1) the data receiving module 4 receives the video data transmitted by the camera units (1.1, 1.2, …, 1.n) and passes it to the filtering and noise reduction module 5;
2) the filtering and noise reduction module 5 filters noise out of the video data, improving its reliability;
3) the head action detection module 6 detects whether the video contains head movement; if not, the data sending module 11 sends the 'unrecognized' result directly to the display units (3.1, 3.2, …, 3.m); if so, the video is sent to the data cropping module 7;
4) the data cropping module 7 preprocesses the data, selecting key frames from the video to speed up processing;
5) the head action recognition module 8 recognizes and classifies the overall head movement in the video, while the facial feature action recognition module 9 recognizes and classifies the detailed movements of the facial features of the head;
6) the semantic generation module 10 converts the classified overall head movement and facial feature movements into meaningful head semantics and generates the corresponding semantic descriptions;
7) the data sending module 11 sends the final semantic description to the display units (3.1, 3.2, …, 3.m), completing head-action semantic recognition.
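By way of illustration of the facial-feature detail actions handled by module 9, the following Python/OpenCV sketch counts blinks using OpenCV's stock Haar cascades. Treating "no eyes detected inside the upper face region" as "eyes closed" is a crude assumption made for the example; it is not the patent's method.

```python
import cv2

face_det = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_det = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def count_blinks(video_path: str) -> int:
    """Count open->closed eye transitions across the frames of a video."""
    cap = cv2.VideoCapture(video_path)
    blinks, eyes_were_open = 0, False
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_det.detectMultiScale(gray, 1.3, 5)
        if len(faces) == 0:
            continue
        x, y, w, h = faces[0]
        roi = gray[y:y + h // 2, x:x + w]  # eyes sit in the upper half of the face
        eyes_open = len(eye_det.detectMultiScale(roi, 1.1, 5)) >= 1
        if eyes_were_open and not eyes_open:
            blinks += 1  # an open->closed transition marks the start of a blink
        eyes_were_open = eyes_open
    cap.release()
    return blinks

print(count_blinks("signer_clip.mp4"))  # hypothetical input file
```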
In the above embodiments of the invention, the camera units capture video data of the sign language user, the processor unit detects and recognizes the user's head-action semantics, and the display units output the processing result. The invention can recognize not only the user's overall head movements but also the detailed movements of facial features such as the eyes and mouth, achieving more complete semantic recognition. The system is simple and intuitive, the method performs well, and the approach generalizes across sign language recognition scenarios.
Claims (3)
1. A head-action semantic recognition system for a sign language system, composed of n camera units (1.1, 1.2, …, 1.n), a processor unit (2) and m display units (3.1, 3.2, …, 3.m), characterized in that: the camera units (1.1, 1.2, …, 1.n) are connected to the processor unit (2) in a wired or wireless manner, their number n being at least 1; the processor unit (2) is connected to the display units (3.1, 3.2, …, 3.m) in a wired or wireless manner, their number m being at least 1.
2. The head-action semantic recognition system for a sign language system according to claim 1, characterized in that: the processor unit (2) consists of a data receiving module (4), a filtering and noise reduction module (5), a head action detection module (6), a data cropping module (7), a head action recognition module (8), a facial feature action recognition module (9), a semantic generation module (10) and a data sending module (11), wherein the data receiving module (4) is wired to the filtering and noise reduction module (5), the filtering and noise reduction module (5) is wired to the head action detection module (6), the head action detection module (6) is wired to the data cropping module (7), the data cropping module (7) is wired to both the head action recognition module (8) and the facial feature action recognition module (9), the head action recognition module (8) and the facial feature action recognition module (9) are each wired to the semantic generation module (10), the semantic generation module (10) is wired to the data sending module (11), and the head action detection module (6) is also wired directly to the data sending module (11).
3. A method for recognizing head-action semantics for a sign language system, operated with the head-action semantic recognition system of claim 1, characterized in that: the head-action semantic recognition procedure is embodied in the flow by which the processor unit detects and recognizes head-action semantics, as follows:
1) the data receiving module (4) receives the video data transmitted by the camera units (1.1, 1.2, …, 1.n) and passes it to the filtering and noise reduction module (5); the filtering and noise reduction module (5) filters noise out of the video data, improving its reliability;
2) the head action detection module (6) detects whether the video contains head movement; if not, the data sending module (11) sends the 'unrecognized' result directly to the display units (3.1, 3.2, …, 3.m); if so, the video is sent to the data cropping module (7);
3) the data cropping module (7) preprocesses the data, selecting key frames from the video to speed up processing;
4) the head action recognition module (8) recognizes and classifies the overall head movement in the video;
5) the facial feature action recognition module (9) recognizes and classifies the detailed movements of the facial features of the head;
6) the semantic generation module (10) converts the classified overall head movement and facial feature movements into meaningful head semantics and generates the corresponding semantic descriptions;
7) the data sending module (11) sends the final semantic description to the display units (3.1, 3.2, …, 3.m), completing head-action semantic recognition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010332961.8A | 2020-04-24 | 2020-04-24 | System and method for recognizing head action semantics facing to dumb language system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010332961.8A | 2020-04-24 | 2020-04-24 | System and method for recognizing head action semantics facing to dumb language system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111611868A (en) | 2020-09-01 |
Family
ID=72204679
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010332961.8A (pending) | System and method for recognizing head action semantics facing to dumb language system | 2020-04-24 | 2020-04-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111611868A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102117117A (en) * | 2010-01-06 | 2011-07-06 | 致伸科技股份有限公司 | System and method for control through identifying user posture by image extraction device |
CN103440640A (en) * | 2013-07-26 | 2013-12-11 | 北京理工大学 | Method for clustering and browsing video scenes |
CN108470206A (en) * | 2018-02-11 | 2018-08-31 | 北京光年无限科技有限公司 | Head exchange method based on visual human and system |
CN110334600A (en) * | 2019-06-03 | 2019-10-15 | 武汉工程大学 | A kind of multiple features fusion driver exception expression recognition method |
CN110688921A (en) * | 2019-09-17 | 2020-01-14 | 东南大学 | Method for detecting smoking behavior of driver based on human body action recognition technology |
CN110931042A (en) * | 2019-11-14 | 2020-03-27 | 北京欧珀通信有限公司 | Simultaneous interpretation method and device, electronic equipment and storage medium |
Similar Documents
Publication | Title |
---|---|
CN106057205B (en) | Automatic voice interaction method for intelligent robot |
US11854550B2 (en) | Determining input for speech processing engine |
KR20190084789A (en) | Electric terminal and method for controlling the same |
CN105139858B (en) | A kind of information processing method and electronic equipment |
CN102932212A (en) | Intelligent household control system based on multichannel interaction manner |
CN104125523A (en) | Dynamic earphone system and application method thereof |
EP2118722A1 (en) | Controlling a document based on user behavioral signals detected from a 3d captured image stream |
CN112052333B (en) | Text classification method and device, storage medium and electronic equipment |
US11281302B2 (en) | Gesture based data capture and analysis device and system |
CN111696562B (en) | Voice wake-up method, device and storage medium |
CN112434139A (en) | Information interaction method and device, electronic equipment and storage medium |
US20210110815A1 (en) | Method and apparatus for determining semantic meaning of pronoun |
KR102353486B1 (en) | Mobile terminal and method for controlling the same |
US20230386461A1 (en) | Voice user interface using non-linguistic input |
CN113671846B (en) | Intelligent device control method and device, wearable device and storage medium |
CN113571053A (en) | Voice wake-up method and device |
CN114610158B (en) | Data processing method and device, electronic equipment and storage medium |
CN115206306A (en) | Voice interaction method, device, equipment and system |
WO2016206644A1 (en) | Robot control engine and system |
KR102592613B1 (en) | Automatic interpretation server and method thereof |
CN111611868A (en) | System and method for recognizing head action semantics facing to dumb language system |
EP4231267B1 (en) | Transportation vehicle type identification method and apparatus |
CN111985252A (en) | Dialogue translation method and device, storage medium and electronic equipment |
CN106815264B (en) | Information processing method and system |
CN117198286A (en) | Voice interaction method and terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200901 |