CN111611868A - System and method for head action semantic recognition for a sign language system - Google Patents


Info

Publication number
CN111611868A
Authority
CN
China
Prior art keywords
module, action, head, semantic, data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010332961.8A
Other languages
Chinese (zh)
Inventor
林羽晨
张金艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN202010332961.8A priority Critical patent/CN111611868A/en
Publication of CN111611868A publication Critical patent/CN111611868A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a system and method for head action semantic recognition for a sign language system. The system consists of a camera unit, a processor unit and a display unit; the steps of the method are embodied by the flow in which the processor unit detects and recognizes head action semantics. With the invention, not only can the overall head actions of a sign language user be recognized, but also the detail actions of the user's facial features, such as the eyes and mouth, achieving more complete semantic recognition. The system is simple and intuitive, the method performs well, and it generalizes across practical sign language recognition scenarios.

Description

System and method for head action semantic recognition for a sign language system
Technical Field
The invention relates to a video semantic recognition system and method, and in particular to a head action semantic recognition system and method for a sign language system.
Background
Video semantic recognition has long been a research hotspot in both academia and industry, and is of great value in the field of human-computer interaction. Since the beginning of the 21st century, video semantic recognition technology has been a focus of government policy deployment in China. Action semantic recognition is an important branch of video semantic recognition and can be applied across numerous fields. Head action semantic recognition for sign language systems is an important application of action semantic recognition: it can recognize the head action semantics of a sign language user and thereby enrich sign language recognition systems.
Existing sign language recognition systems are largely homogeneous: most judge the semantics expressed in sign language only by detecting and recognizing the user's gestures. In real life, however, a sign language user can express rich semantics through head actions, and can convey more varied and complete semantics by combining them with gestures. Recognizing head action semantics is therefore increasingly important in a sign language system. Current mature sign language recognition systems, however, lack the ability to detect and recognize head action semantics, so their performance in practical applications is limited.
The invention provides a system and method for head action semantic recognition for a sign language system. In the system of the invention, the camera unit captures video data of the sign language user, the processor unit detects and recognizes the user's head action semantics, and the display unit outputs the processing result. With the invention, not only can the overall head actions of a sign language user be recognized, but also the detail actions of the user's facial features, such as the eyes and mouth, achieving more complete semantic recognition.
Because the body posture of a sign language user is relatively fixed and the illumination of the environment changes relatively little, using the camera unit to capture video data and communicate with the processor unit (for example over USB, RS-485, WiFi or Bluetooth) to recognize head action semantics is entirely feasible, and the approach generalizes across practical application environments.
Disclosure of Invention
The invention aims to provide a system and method for head action semantic recognition for a sign language system, addressing the limitations of current sign language recognition systems in practical applications. The system is simple and intuitive, the method performs well, and it generalizes across sign language recognition scenarios.
In order to achieve the purpose, the invention adopts the following technical scheme:
a system and a method for recognizing the semantic of head action based on a mute system can recognize the whole head action of a mute user, and can also recognize the detail actions of five sense organs such as eyes and mouth of the mute user, thereby realizing more complete semantic recognition. The system mainly comprises a camera unit, a processor unit and a display unit, and is characterized in that: the camera units are connected with the processor unit in a wired or wireless mode, and the number n of the camera units is at least 1; the processor unit is connected with the display unit in a wired or wireless mode, and the number m of the display units is at least 1.
The number n of camera units may vary with the scale of the system, but is at least 1.
The processor unit comprises a data receiving module, a filtering and noise reduction module, a head action detection module, a data clipping module, a head action recognition module, a facial feature action recognition module, a semantic generation module and a data sending module. The data receiving module is connected with the filtering and noise reduction module in a wired manner; the filtering and noise reduction module is connected with the head action detection module in a wired manner; the head action detection module is connected with the data clipping module in a wired manner; the data clipping module is connected with the head action recognition module and the facial feature action recognition module in a wired manner; the head action recognition module and the facial feature action recognition module are each connected with the semantic generation module in a wired manner; the semantic generation module is connected with the data sending module in a wired manner; and the head action detection module is also connected with the data sending module in a wired manner.
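The module topology described above can be sketched as a simple data-flow function. This is a minimal illustration only: the module names follow the text, but the patent does not specify any implementation, so every function passed in here is a hypothetical placeholder.

```python
def run_processor_unit(video, *, denoise, detect_head_action, clip_keyframes,
                       recognize_head, recognize_face, generate_semantics):
    """Mirror the wired connections of the processor unit: each keyword
    argument stands in for one module of the pipeline described above."""
    clean = denoise(video)                    # filtering and noise reduction module
    if not detect_head_action(clean):         # head action detection module
        # detection module is wired directly to the data sending module
        return "unrecognized"
    keyframes = clip_keyframes(clean)         # data clipping module
    head_class = recognize_head(keyframes)    # head action recognition module
    face_class = recognize_face(keyframes)    # facial feature action recognition module
    # semantic generation module feeds the data sending module
    return generate_semantics(head_class, face_class)
```

The two-branch wiring (detection straight to output, or clipping followed by the two parallel recognizers) is the only structure the text fixes; everything else is left open by the patent.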
The number m of display units may vary with the scale of the system, but is at least 1.
The head action semantic recognition system for a sign language system can efficiently and accurately recognize the head action semantics of a sign language user.
A head action semantic recognition method for a sign language system, operated using the above system, is characterized in that the head action semantic recognition process is embodied by the flow in which the processor unit detects and recognizes head action semantics.
The flow in which the processor unit detects and recognizes head action semantics:
1) the data receiving module receives the video data transmitted by the camera unit and sends it to the filtering and noise reduction module;
2) the filtering and noise reduction module filters noise from the video data, improving the reliability of the data;
3) the head action detection module detects whether the video contains a head action; if not, the data sending module sends an "unrecognized" result directly to the display unit; if so, the video is sent to the data clipping module;
4) the data clipping module preprocesses the data, selecting key frames from the video to improve the processing speed of the system;
5) the head action recognition module recognizes and classifies the overall head action in the video; the facial feature action recognition module recognizes and classifies the detail actions of the facial features;
6) the semantic generation module converts the classified overall head action and facial feature detail actions into meaningful head semantics and generates a corresponding semantic description;
7) the data sending module sends the final semantic description to the display unit, completing head action semantic recognition.
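Step 4's key-frame selection is not specified further in the text. A common heuristic, shown here as a hedged sketch rather than the patent's actual method, is to keep a frame only when it differs enough from the last kept frame, so that near-duplicate frames are dropped and downstream recognition runs faster.

```python
import numpy as np

def select_keyframes(frames, threshold=10.0):
    """Keep the first frame plus every frame whose mean absolute pixel
    difference from the last kept frame exceeds `threshold`.
    The difference-based criterion is an assumption; the patent only
    states that key frames are selected to speed up processing."""
    if not frames:
        return []
    kept = [frames[0]]
    for frame in frames[1:]:
        diff = np.mean(np.abs(frame.astype(np.float32) -
                              kept[-1].astype(np.float32)))
        if diff > threshold:
            kept.append(frame)
    return kept
```

For a mostly static signer, runs of identical frames collapse to one representative, which matches the stated goal of the data clipping module: fewer frames for the two recognition modules to process.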
Compared with the prior art, the invention has the following substantive features and advantages:
The system of the invention consists of a camera unit, a processor unit and a display unit; the method comprises the flow in which the processor unit detects and recognizes head action semantics. The invention can recognize not only the overall head actions of a sign language user but also the detail actions of the user's facial features, such as the eyes and mouth, achieving more complete semantic recognition. Because the body posture of a sign language user is relatively fixed and the illumination of the environment changes relatively little, using the camera unit to capture video data and communicate with the processor unit (for example over USB, RS-485, WiFi or Bluetooth) to recognize head action semantics is entirely feasible, and the approach generalizes across practical application environments.
Drawings
Fig. 1 is a schematic structural diagram of the head action semantic recognition system for a sign language system according to the first embodiment of the invention.
Fig. 2 is a structural block diagram of the processor unit implementing head action semantic recognition for a sign language system according to the second embodiment of the invention.
Fig. 3 is a flowchart of the processor unit detecting and recognizing head action semantics according to the third embodiment of the invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings:
example one
Referring to fig. 1, the head action semantic recognition system for a sign language system is composed of camera units (1.1, 1.2, …, 1.n), a processor unit 2 and display units (3.1, 3.2, …, 3.m). The camera units (1.1, 1.2, …, 1.n) are connected with the processor unit 2 in a wired or wireless manner; the processor unit 2 is connected with the display units (3.1, 3.2, …, 3.m) in a wired or wireless manner. The number n of camera units may vary with the scale of the system, but is at least 1; the number m of display units may vary with the scale of the system, but is at least 1.
Example two
This embodiment is substantially the same as the first embodiment, and is characterized in that:
referring to fig. 2, the structure of the processor unit 2: the device comprises a data receiving module 4, a filtering and noise reducing module 5, a head action detecting module 6, a data clipping module 7, a head action identifying module 8, a five sense organs action identifying module 9, a semantic generating module 10 and a data sending module 11. The data receiving module 4 is connected with the filtering and noise reducing module 5 in a wired manner, the filtering and noise reducing module 5 is connected with the head action detecting module 6 in a wired manner, the head action detecting module 6 is connected with the data clipping module 7 in a wired manner, the data clipping module 7 is connected with the head action identifying module 8 in a wired manner, the data clipping module 7 is connected with the facial features action identifying module 9 in a wired manner, the head action identifying module 8 is connected with the semantic generating module 10 in a wired manner, the facial features action identifying module 9 is connected with the semantic generating module 10 in a wired manner, the semantic generating module 10 is connected with the data sending module 11 in a wired manner, and the head action detecting module 6 is connected with the data sending module 11 in a wired manner.
EXAMPLE III
The head action semantic recognition method for a sign language system is operated using the above system. It is characterized in that the head action semantic recognition process is embodied by the flow in which the processor unit detects and recognizes head action semantics.
Referring to fig. 3, the flow in which the processor unit detects and recognizes head action semantics:
1) the data receiving module 4 receives the video data transmitted by the camera units (1.1, 1.2, …, 1.n) and sends it to the filtering and noise reduction module 5;
2) the filtering and noise reduction module 5 filters noise from the video data, improving the reliability of the data;
3) the head action detection module 6 detects whether the video contains a head action; if not, the data sending module 11 sends the "unrecognized" result directly to the display units (3.1, 3.2, …, 3.m); if so, the video is sent to the data clipping module 7;
4) the data clipping module 7 preprocesses the data, selecting key frames from the video to improve the processing speed of the system;
5) the head action recognition module 8 recognizes and classifies the overall head action in the video; the facial feature action recognition module 9 recognizes and classifies the detail actions of the facial features;
6) the semantic generation module 10 converts the classified overall head action and facial feature detail actions into meaningful head semantics and generates a corresponding semantic description;
7) the data sending module 11 sends the final semantic description to the display units (3.1, 3.2, …, 3.m), completing head action semantic recognition.
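Step 6's conversion of the two classification results into a semantic description could be as simple as a lookup table keyed on the (overall action, facial detail) pair. The class labels and descriptions below are purely illustrative assumptions; the patent does not define a vocabulary.

```python
# Hypothetical (head action, facial detail) pairs mapped to semantic
# descriptions; the actual vocabulary is not specified in the patent.
SEMANTIC_TABLE = {
    ("nod", "neutral"): "yes / agreement",
    ("shake", "neutral"): "no / disagreement",
    ("nod", "smile"): "enthusiastic agreement",
    ("tilt", "raised_eyebrows"): "question / doubt",
}

def generate_semantics(head_class, face_class):
    """Combine the overall head action class with the facial detail class
    into a description; fall back to naming the head action alone when
    the pair is not in the table."""
    return SEMANTIC_TABLE.get((head_class, face_class),
                              "unmapped head action: " + head_class)
```

A table like this makes the role of the semantic generation module concrete: the two recognition modules reduce the video to a pair of labels, and the semantic generation module turns that pair into the description the data sending module forwards to the display units.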
In the above embodiments of the invention, the camera unit captures video data of the sign language user, the processor unit detects and recognizes the user's head action semantics, and the display unit outputs the processing result. The invention can recognize not only the overall head actions of a sign language user but also the detail actions of the user's facial features, such as the eyes and mouth, achieving more complete semantic recognition. The system is simple and intuitive, the method performs well, and it generalizes across sign language recognition scenarios.

Claims (3)

1. A head action semantic recognition system for a sign language system, composed of n camera units (1.1, 1.2, …, 1.n), a processor unit (2) and m display units (3.1, 3.2, …, 3.m), characterized in that: the camera units (1.1, 1.2, …, 1.n) are connected with the processor unit (2) in a wired or wireless manner, and the number n of camera units is at least 1; the processor unit (2) is connected with the display units (3.1, 3.2, …, 3.m) in a wired or wireless manner, and the number m of display units is at least 1.
2. The head action semantic recognition system for a sign language system of claim 1, characterized in that: the processor unit (2) consists of a data receiving module (4), a filtering and noise reduction module (5), a head action detection module (6), a data clipping module (7), a head action recognition module (8), a facial feature action recognition module (9), a semantic generation module (10) and a data sending module (11), wherein the data receiving module (4) is connected with the filtering and noise reduction module (5) in a wired manner; the filtering and noise reduction module (5) is connected with the head action detection module (6) in a wired manner; the head action detection module (6) is connected with the data clipping module (7) in a wired manner; the data clipping module (7) is connected with the head action recognition module (8) and the facial feature action recognition module (9) in a wired manner; the head action recognition module (8) and the facial feature action recognition module (9) are each connected with the semantic generation module (10) in a wired manner; the semantic generation module (10) is connected with the data sending module (11) in a wired manner; and the head action detection module (6) is also connected with the data sending module (11) in a wired manner.
3. A head action semantic recognition method for a sign language system, operated using the head action semantic recognition system for a sign language system of claim 1, characterized in that: the head action semantic recognition process is embodied by the flow in which the processor unit detects and recognizes head action semantics, namely:
1) the data receiving module (4) receives the video data transmitted by the camera units (1.1, 1.2, …, 1.n) and sends it to the filtering and noise reduction module (5); the filtering and noise reduction module (5) filters noise from the video data, improving the reliability of the data;
2) the head action detection module (6) detects whether the video contains a head action; if not, the data sending module (11) sends the "unrecognized" result directly to the display units (3.1, 3.2, …, 3.m); if so, the video is sent to the data clipping module (7);
3) the data clipping module (7) preprocesses the data, selecting key frames from the video to improve the processing speed of the system;
4) the head action recognition module (8) recognizes and classifies the overall head action in the video;
5) the facial feature action recognition module (9) recognizes and classifies the detail actions of the facial features of the head;
6) the semantic generation module (10) converts the classified overall head action and facial feature detail actions into meaningful head semantics and generates a corresponding semantic description;
7) the data sending module (11) sends the final semantic description to the display units (3.1, 3.2, …, 3.m), completing head action semantic recognition.
CN202010332961.8A 2020-04-24 2020-04-24 System and method for recognizing head action semantics facing to dumb language system Pending CN111611868A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010332961.8A CN111611868A (en) 2020-04-24 2020-04-24 System and method for recognizing head action semantics facing to dumb language system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010332961.8A CN111611868A (en) 2020-04-24 2020-04-24 System and method for recognizing head action semantics facing to dumb language system

Publications (1)

Publication Number Publication Date
CN111611868A 2020-09-01

Family

ID=72204679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010332961.8A Pending CN111611868A (en) 2020-04-24 2020-04-24 System and method for recognizing head action semantics facing to dumb language system

Country Status (1)

Country Link
CN (1) CN111611868A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117117A (en) * 2010-01-06 2011-07-06 致伸科技股份有限公司 System and method for control through identifying user posture by image extraction device
CN103440640A (en) * 2013-07-26 2013-12-11 北京理工大学 Method for clustering and browsing video scenes
CN108470206A (en) * 2018-02-11 2018-08-31 北京光年无限科技有限公司 Head exchange method based on visual human and system
CN110334600A (en) * 2019-06-03 2019-10-15 武汉工程大学 A kind of multiple features fusion driver exception expression recognition method
CN110688921A (en) * 2019-09-17 2020-01-14 东南大学 Method for detecting smoking behavior of driver based on human body action recognition technology
CN110931042A (en) * 2019-11-14 2020-03-27 北京欧珀通信有限公司 Simultaneous interpretation method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200901