CN114924645A - Interaction method and system based on gesture recognition - Google Patents

Interaction method and system based on gesture recognition

Info

Publication number
CN114924645A
Authority
CN
China
Prior art keywords
human
gesture
area
speaker
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210542825.0A
Other languages
Chinese (zh)
Inventor
徐东升
丁为国
朱雷震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhuangsheng Xiaomeng Information Technology Co., Ltd.
Original Assignee
Shanghai Zhuangsheng Xiaomeng Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhuangsheng Xiaomeng Information Technology Co., Ltd.
Priority to CN202210542825.0A
Publication of CN114924645A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an interaction method and system based on gesture recognition. The interaction method comprises the following steps: acquiring a video data stream of a monitoring area, and obtaining image frames from the video data stream; performing human-shape detection on the image frames to determine the human-shaped areas in the image frames; and performing gesture detection on the human-shaped areas, and determining a focusing area according to the gesture detection result. Because the focusing area is determined according to the gesture detection result, the target area in a conference is focused in real time. If the target area is set to the human-shaped area of the speaker or of a participant, the speaker or the participant in the conference can be effectively focused in real time.

Description

Interaction method and system based on gesture recognition
Technical Field
The invention relates to the technical field of artificial intelligence recognition interaction, in particular to an interaction method and system based on gesture recognition.
Background
With the rapid development of artificial intelligence, the field of computer vision has achieved major breakthroughs, and vision algorithms such as face recognition, object detection, and object tracking are widely applied across industries. For conference interaction, intelligent interaction is the trend of future development. The pure-voice and pure-video interaction of a traditional online conference is too monotonous: when the displayed picture contains too much background, the participants cannot be effectively focused, and the speaker cannot be highlighted in a multi-person conference scene.
Therefore, the invention provides an interaction method and system based on gesture recognition, so as to effectively focus on a speaker and participants in a conference.
Disclosure of Invention
The invention provides an interaction method and an interaction system based on gesture recognition, which are used to effectively focus on the speaker and participants in a conference.
In a first aspect, the present invention provides an interaction method based on gesture recognition, including: acquiring a video data stream of a monitoring area, and acquiring an image frame from the video data stream; performing human shape detection on the image frame to determine human shape regions in the image frame; and executing gesture detection on the human-shaped area, and determining a focusing area according to the gesture detection result.
The beneficial effects are that: because the focusing area is determined according to the gesture detection result, the target area in the conference is focused in real time. If the target area is set to the human-shaped area of the speaker or of a participant, the speaker or the participant in the conference can be effectively focused in real time.
Optionally, the performing gesture detection on the human-shaped areas includes: performing gesture detection on the human-shaped areas; if a first gesture is detected, determining that the human-shaped area containing the first gesture is the human-shaped area of the speaker and that the human-shaped areas not containing the first gesture are the human-shaped areas of the participants, and performing real-time focusing processing on the human-shaped area of the speaker; and if a second gesture is detected, performing focusing processing on the human-shaped areas of the participants. The beneficial effects are that: the focusing area can be switched freely according to the first and second gestures made by the speaker, improving the effect of the online conference.
Further optionally, the performing gesture detection on the human-shaped areas further includes: if neither the first gesture nor the second gesture is detected, performing real-time focusing processing on the human-shaped areas of the participants. The beneficial effects are that: performing real-time focusing on the human-shaped areas of the participants protects the participants' privacy and effectively masks useless or distracting information.
Optionally, the performing real-time focusing processing on the human-shaped area of the speaker includes: performing face detection on the human-shaped area of the speaker, and determining the facial features of the speaker according to the face detection result; when the human-shaped area of the speaker is not detected, performing face recognition on the image frame based on the facial features of the speaker; and determining the human-shaped area containing the speaker based on the face recognition result, and performing real-time focusing processing on it. The beneficial effects are that: when human-shape detection of the speaker fails, for example because of occlusion, the human-shaped area of the speaker can be re-determined from the face recognition result and focused in real time, preventing loss of the focusing target.
Further optionally, the performing face recognition on the image frame based on the facial features of the speaker includes: if the face of the speaker is not recognized, exiting the real-time focusing processing on the human-shaped area of the speaker, and performing real-time focusing processing on the human-shaped areas of the participants. The beneficial effects are that: if the face of the speaker is not detected, the speaker may have temporarily left the conference, so focusing is switched to the human-shaped areas of the participants.
Optionally, the interaction method based on gesture recognition further includes: arranging anti-shake areas around the human-shaped areas of the speaker and the participants; if the position of the speaker exceeds the anti-shake area, re-performing human-shape detection on the image frame and re-determining the human-shaped area of the speaker according to the detection result; and if the position of a participant exceeds the anti-shake area, re-performing human-shape detection on the image frame and re-determining the human-shaped area of that participant according to the detection result. The beneficial effects are that: since a person's position may change (for example, lowering the head to take notes, picking up a cup, or suddenly standing up or sitting down), an anti-shake range must be set to keep the human-shaped area within a reasonable range and to reduce the number of human-shape re-detections.
Further optionally, if the position of the speaker does not exceed the anti-shake area, the human-shaped area of the speaker is locked; and if the position of a participant does not exceed the anti-shake area, the human-shaped area of that participant is locked.
Optionally, the performing real-time focusing on the human-shaped area of the speaker includes: performing feature extraction on the human-shaped area of the speaker, and predicting the motion trajectory of the speaker in the next frame based on the detected first gesture, thereby achieving real-time focusing processing on the human-shaped area of the speaker.
In a second aspect, the present invention provides an interaction system based on gesture recognition, configured to perform the interaction method based on gesture recognition according to any one of the first aspect; the system comprises modules/units for performing the method according to any possible design of the first aspect. These modules/units may be implemented by hardware, or by hardware executing corresponding software.
As for the advantageous effects of the above second aspect, reference may be made to the description of the above first aspect.
Drawings
FIG. 1 is a flowchart of an embodiment of an interaction method based on gesture recognition according to the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of an interactive system based on gesture recognition according to the present invention;
FIG. 3 is a schematic diagram of a screenshot of an online conference provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application. In the description of the embodiments of the present application, the terminology used in the following embodiments is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of the present application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of the present application, "at least one" and "one or more" mean one, two, or more than two. The term "and/or" describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather mean "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise. The term "coupled" includes direct coupling and indirect coupling, unless otherwise noted. "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
In the embodiments of the present application, the words "exemplary" or "such as" are used herein to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
The present invention provides an interaction method based on gesture recognition, the flow of which is shown in FIG. 1. The method comprises the following steps:
s101: acquiring a video data stream of a monitoring area, and acquiring an image frame from the video data stream;
s102: performing human shape detection on the image frame to determine human shape regions in the image frame;
s103: and executing gesture detection on the human-shaped area, and determining a focusing area according to the gesture detection result.
In a possible embodiment, the performing gesture detection on the human-shaped areas includes: performing gesture detection on the human-shaped areas; if a first gesture is detected, determining that the human-shaped area containing the first gesture is the human-shaped area of the speaker and that the human-shaped areas not containing the first gesture are the human-shaped areas of the participants, and performing real-time focusing processing on the human-shaped area of the speaker; and if a second gesture is detected, performing focusing processing on the human-shaped areas of the participants. In this embodiment, the focusing area can be switched freely according to the first and second gestures made by the speaker, improving the effect of the online conference.
In another possible embodiment, the performing gesture detection on the human-shaped areas further includes: if neither the first gesture nor the second gesture is detected, performing real-time focusing processing on the human-shaped areas of the participants. In this embodiment, real-time focusing on the human-shaped areas of the participants protects the participants' privacy and effectively masks useless or distracting information.
Illustratively, the picture captured by a monocular camera is transmitted to a display to present the video-conference picture, and the image is fed into the human-shape and gesture detection models to recognize the human shapes and gestures in the picture. If no first gesture is found, a dynamic real-time focused picture of the participants is displayed; if the first gesture is found, the picture dynamically focuses on the gesture maker, i.e., the speaker. Only when the speaker makes the second gesture is the speaker focusing mode closed and the display switched back to the dynamic real-time focused picture of the participants; this switching logic is sketched below.
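A two-state summary of that switching rule, reusing the union helper from the sketch above; the Gesture names are invented here for illustration and are not specified by the patent:

```python
from enum import Enum

class Gesture(Enum):
    NONE = 0
    FIRST = 1   # claims the speaker role and opens speaker focusing
    SECOND = 2  # made by the speaker to close speaker focusing

def next_focus(current_focus, gesture, gesture_box, participant_boxes):
    """Return the area to focus next, per the first/second-gesture rules."""
    if gesture == Gesture.FIRST:
        return gesture_box                  # focus the gesture maker, i.e. the speaker
    if gesture == Gesture.SECOND and current_focus == gesture_box:
        return union(participant_boxes)     # only the speaker's second gesture closes
                                            # speaker focusing: back to the participants
    if current_focus is None:
        return union(participant_boxes)     # no gesture yet: dynamic participant view
    return current_focus                    # otherwise keep the current focusing area
```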
In a further possible embodiment, the performing real-time focusing processing on the human-shaped area of the speaker includes: performing face detection on the human-shaped area of the speaker, and determining the facial features of the speaker according to the face detection result; when the human-shaped area of the speaker is not detected, performing face recognition on the image frame based on the facial features of the speaker; and determining the human-shaped area containing the speaker based on the face recognition result, and performing real-time focusing processing on it. In this embodiment, when human-shape detection of the speaker fails due to occlusion or similar causes, the human-shaped area of the speaker can be re-determined from the face recognition result and focused in real time, preventing loss of the focusing target.
In one possible embodiment, the performing face recognition on the image frame based on the facial features of the speaker includes: if the face of the speaker is not recognized, exiting the real-time focusing processing on the human-shaped area of the speaker, and performing real-time focusing processing on the human-shaped areas of the participants. In this embodiment, if the face of the speaker is not detected, the speaker may have temporarily left the conference, so focusing is switched to the participants; a sketch of this fallback logic follows.
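A minimal sketch of this occlusion-and-departure fallback; the detect_speaker_shape and match_face callables and the participants_view argument are hypothetical stand-ins for components the patent leaves unspecified:

```python
def focus_speaker(frame, speaker_face_features, detect_speaker_shape, match_face,
                  participants_view):
    """Track the speaker's area; fall back to face recognition, then to participants."""
    box = detect_speaker_shape(frame)
    if box is not None:
        return box                                  # human-shape detection succeeded
    box = match_face(frame, speaker_face_features)  # shape lost (e.g. occlusion)
    if box is not None:
        return box                                  # speaker re-acquired by face recognition
    return participants_view                        # face not found: speaker likely left,
                                                    # so switch focusing to the participants
```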
In one possible embodiment, anti-shake areas are arranged around the human-shaped area of the speaker and the human-shaped areas of the participants; if the position of the speaker exceeds the anti-shake area, human-shape detection is performed on the image frame again and the human-shaped area of the speaker is re-determined according to the detection result; and if the position of a participant exceeds the anti-shake area, human-shape detection is performed on the image frame again and the human-shaped area of that participant is re-determined according to the detection result. In this embodiment, since a person's position may change (for example, lowering the head to take notes, picking up a cup, or suddenly standing up or sitting down), an anti-shake range must be set to keep the human-shaped area within a reasonable range and to reduce the number of human-shape re-detections.
In yet another possible embodiment, the human-shaped area of the speaker is locked if the position of the speaker does not exceed the anti-shake area; and if the position of the participant does not exceed the anti-shake area, locking the human-shaped area of the participant.
Illustratively, because the position of the detected person changes constantly, the focused picture may jitter slightly, so the target position needs to be controlled within a reasonable range, i.e., an anti-shake range must be set. If the target position detected in the current frame falls outside this range, the newly detected position is adopted; otherwise, the target position of the previous frame is kept. Let the current frame be i, the target position x, and the shake range k; then:
$$x_i = \begin{cases} \hat{x}_i, & |\hat{x}_i - x_{i-1}| > k \\ x_{i-1}, & |\hat{x}_i - x_{i-1}| \le k \end{cases}$$

where $\hat{x}_i$ denotes the position detected in frame $i$.
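In code, the anti-shake rule is a single comparison per tracked quantity. A minimal sketch, assuming a one-dimensional position (the same rule can be applied to each box coordinate):

```python
def stabilize(x_detected, x_prev, k):
    """Anti-shake: hold last frame's position unless the detection moved beyond range k."""
    if x_prev is None or abs(x_detected - x_prev) > k:
        return x_detected   # outside the anti-shake range: adopt the new detection
    return x_prev           # inside the range: keep the previous position (no jitter)
```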
Optionally, redundancy processing may also be performed: based on the position of the speaker or of the participants, the region is widened so that the display is more reasonable. If the current frame is i, the target size is y, and the redundancy coefficient is r, then $y_i = y_i \cdot r$ (with $r \ge 1$). Focusing processing is then performed on the target human-shaped area as a sliding process: when the focus moves from one position to another, the picture also moves slowly from one position to the other, producing a smooth dynamic transition. If the current frame is i, the sliding coefficient is a with range (0, 1), and the target has width w and center point (cx, cy), then sliding by width and center point gives:
$$w_i = a\,\hat{w}_i + (1-a)\,w_{i-1}, \qquad cx_i = a\,\hat{cx}_i + (1-a)\,cx_{i-1}, \qquad cy_i = a\,\hat{cy}_i + (1-a)\,cy_{i-1}$$

where hatted quantities denote the (widened) target values of frame $i$.
the actual picture being a final output displayThe height of the picture with fixed size is calculated according to the size proportion of the actual picture by the calculated width, and finally the picture is determined by the width, the height and the central point and is scaled in the same proportion to obtain the actual picture.
In one possible embodiment, the performing real-time focusing on the human-shaped area of the speaker includes: performing feature extraction on the human-shaped area of the speaker, and predicting the motion trajectory of the speaker in the next frame based on the detected first gesture, thereby achieving real-time focusing processing on the human-shaped area of the speaker.
Illustratively, the motion trajectory of the speaker in the next frame is predicted by a method combining deep learning with traditional tracking ideas. First, the tracking mode is started through gesture recognition and the speaker is identified by a detection algorithm; the speaker's next-frame trajectory and feature information are then predicted by Kalman filtering and a ReID (person re-identification) model; finally, the Hungarian algorithm matches the predicted trajectory and feature information against the speaker detected and recognized in the current frame.
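The final matching step is an assignment problem. Below is a minimal sketch using scipy's Hungarian solver; the cost function (a mix of center distance and ReID cosine dissimilarity, weighted by alpha) is an assumption, as the patent does not specify one:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracks(pred_pos, pred_feat, det_pos, det_feat, alpha=0.5):
    """Match Kalman-predicted tracks to current-frame detections.
    pred_pos/det_pos: arrays of box centers; pred_feat/det_feat: ReID feature vectors."""
    n, m = len(pred_pos), len(det_pos)
    cost = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            dist = np.linalg.norm(pred_pos[i] - det_pos[j])        # motion cue (Kalman)
            sim = np.dot(pred_feat[i], det_feat[j]) / (
                np.linalg.norm(pred_feat[i]) * np.linalg.norm(det_feat[j]) + 1e-9)
            cost[i, j] = alpha * dist + (1 - alpha) * (1.0 - sim)  # appearance cue (ReID)
    rows, cols = linear_sum_assignment(cost)   # Hungarian: minimum-cost one-to-one match
    return list(zip(rows, cols))               # (track index, detection index) pairs
```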
The interaction method based on gesture recognition provided by the invention frees users from the traditional remote-control mode and achieves intelligent control through gesture commands performed at a distance.
The invention provides an interaction system based on gesture recognition, configured to perform the interaction method based on gesture recognition according to any one of the above embodiments. As shown in FIG. 2, the interaction system comprises an acquisition module 201, a human-shape detection module 202, a gesture detection module 203, and a focusing module 204. The acquisition module 201 is configured to acquire a video data stream of a monitoring area and obtain image frames from the video data stream; the human-shape detection module 202 is configured to perform human-shape detection on the image frames to determine the human-shaped areas in the image frames; the gesture detection module 203 is configured to perform gesture detection on the human-shaped areas; and the focusing module 204 is configured to determine a focusing area according to the gesture detection result.
In one possible embodiment, the gesture detection module comprises: a setting unit and a detection unit; the setting unit is used for setting a first gesture and a second gesture; the detection unit is used for executing gesture detection on the human-shaped area; if the first gesture is detected, determining that a humanoid area containing the first gesture is a humanoid area of a speaker, and a humanoid area not containing the first gesture is a humanoid area of a participant, wherein the focusing module executes real-time focusing processing on the humanoid area of the speaker; and if the second gesture is detected, the focusing module performs focusing processing on the human-shaped area of the participant.
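As an illustration of how the four modules of FIG. 2 might be composed, a skeleton follows; the class and method names are assumptions for illustration only:

```python
class GestureInteractionSystem:
    """Wires the modules of FIG. 2 into one per-frame pipeline (illustrative sketch)."""

    def __init__(self, acquisition, human_detector, gesture_detector, focuser):
        self.acquisition = acquisition            # module 201: stream -> image frames
        self.human_detector = human_detector      # module 202: frame -> human-shaped areas
        self.gesture_detector = gesture_detector  # module 203: area -> first/second/none
        self.focuser = focuser                    # module 204: gesture result -> focusing area

    def step(self):
        """Process one frame and return the focusing area chosen by the focusing module."""
        frame = self.acquisition.next_frame()
        areas = self.human_detector.detect(frame)
        gestures = [self.gesture_detector.detect(frame, a) for a in areas]
        return self.focuser.select(areas, gestures)
```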
Illustratively, as shown in FIG. 3, the full picture of the online conference contains five participants including the speaker: human-shaped area 1 of the speaker and human-shaped areas 2-4 of the participants. If the first gesture is detected, the focusing module performs real-time focusing processing on human-shaped area 1 of the speaker; if the second gesture is detected, the focusing module performs focusing processing on human-shaped areas 2-4 of the participants.
The above description is only a specific implementation of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the embodiments of the present application should be covered within the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An interaction method based on gesture recognition is characterized by comprising the following steps:
acquiring a video data stream of a monitoring area, and acquiring an image frame from the video data stream;
performing human shape detection on the image frame to determine human shape regions in the image frame;
and executing gesture detection on the human-shaped area, and determining a focusing area according to the gesture detection result.
2. The gesture recognition based interaction method according to claim 1, wherein the performing gesture detection on the humanoid region comprises:
performing gesture detection on the human-shaped area; if a first gesture is detected, determining that the human-shaped area containing the first gesture is the human-shaped area of a speaker and that the human-shaped areas not containing the first gesture are the human-shaped areas of participants, and performing real-time focusing processing on the human-shaped area of the speaker; and if a second gesture is detected, performing focusing processing on the human-shaped areas of the participants.
3. The gesture recognition based interaction method according to claim 2, wherein the performing gesture detection on the humanoid region further comprises: and if neither the first gesture nor the second gesture is detected, performing real-time focusing processing on the human-shaped area of the participant.
4. The interaction method based on gesture recognition according to claim 2, wherein the performing of real-time focusing processing on the humanoid region of the speaker comprises: executing face detection on the human-shaped area of the speaker, and determining the facial features of the speaker according to the face detection result;
performing face recognition on the image frame based on facial features of the speaker when a human-shaped region of the speaker is not detected;
and determining a humanoid area containing the speaker based on the result of the face recognition, and performing real-time focusing processing on the humanoid area containing the speaker.
5. The gesture recognition based interaction method according to claim 4, wherein the performing of the face recognition on the image frame based on the facial features of the speaker comprises:
and if the face of the speaker is not recognized, exiting the real-time focusing processing on the human-shaped area of the speaker, and executing the real-time focusing processing on the human-shaped area of the participant.
6. The gesture recognition based interaction method according to claim 2, further comprising: arranging anti-shake areas around the human-shaped area of the speaker and the human-shaped area of the participant; if the human shape position of the speaker exceeds the anti-shake area, human shape detection is carried out on the image frame again, and the human shape area of the speaker is determined again according to a detection result; and if the position of the human figure of the participant exceeds the anti-shake area, re-performing human figure detection on the image frame, and re-determining the human figure area of the participant according to the detection result.
7. The interaction method based on gesture recognition according to claim 6, wherein if the position of the speaker does not exceed the anti-shake area, the human-shaped area of the speaker is locked; and if the position of the participant does not exceed the anti-shake area, locking the human-shaped area of the participant.
8. The interaction method based on gesture recognition according to claim 2, wherein the performing of real-time focusing on the humanoid region of the speaker comprises:
and performing feature extraction on the humanoid area of the speaker, and predicting the action track of the speaker in the next frame based on the detected first gesture so as to realize real-time focusing processing on the humanoid area of the speaker.
9. A gesture recognition based interaction system configured to perform the gesture recognition based interaction method according to any one of claims 1 to 8, comprising: the system comprises an acquisition module, a human shape detection module, a gesture detection module and a focusing module;
the acquisition module is used for acquiring a video data stream of a monitoring area and acquiring an image frame from the video data stream;
the human shape detection module is used for performing human shape detection on the image frame so as to determine a human shape area in the image frame;
the gesture detection module is used for executing gesture detection on the humanoid area, and the focusing module is used for determining a focusing area according to a gesture detection result.
10. The gesture recognition based interaction system according to claim 9, wherein the gesture detection module comprises: a setting unit and a detection unit; the setting unit is used for setting a first gesture and a second gesture; the detection unit is used for executing gesture detection on the human-shaped area; if the first gesture is detected, determining that a humanoid area containing the first gesture is a humanoid area of a speaker, and a humanoid area not containing the first gesture is a humanoid area of a participant, wherein the focusing module executes real-time focusing processing on the humanoid area of the speaker; and if the second gesture is detected, the focusing module executes focusing processing on the human-shaped area of the participant.
CN202210542825.0A 2022-05-18 2022-05-18 Interaction method and system based on gesture recognition Pending CN114924645A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210542825.0A CN114924645A (en) 2022-05-18 2022-05-18 Interaction method and system based on gesture recognition


Publications (1)

Publication Number Publication Date
CN114924645A 2022-08-19

Family

ID=82808675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210542825.0A Pending CN114924645A (en) 2022-05-18 2022-05-18 Interaction method and system based on gesture recognition

Country Status (1)

Country Link
CN (1) CN114924645A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103281508A (en) * 2013-05-23 2013-09-04 深圳锐取信息技术股份有限公司 Video picture switching method, video picture switching system, recording and broadcasting server and video recording and broadcasting system
CN105049764A (en) * 2015-06-17 2015-11-11 武汉智亿方科技有限公司 Image tracking method and system for teaching based on multiple positioning cameras
CN108664853A (en) * 2017-03-30 2018-10-16 北京君正集成电路股份有限公司 Method for detecting human face and device
CN109257559A (en) * 2018-09-28 2019-01-22 苏州科达科技股份有限公司 A kind of image display method, device and the video conferencing system of panoramic video meeting
CN111079686A (en) * 2019-12-25 2020-04-28 开放智能机器(上海)有限公司 Single-stage face detection and key point positioning method and system
CN112052805A (en) * 2020-09-10 2020-12-08 深圳数联天下智能科技有限公司 Face detection frame display method, image processing device, equipment and storage medium
CN112689092A (en) * 2020-12-23 2021-04-20 广州市迪士普音响科技有限公司 Automatic tracking conference recording and broadcasting method, system, device and storage medium
CN112954451A (en) * 2021-02-05 2021-06-11 广州市奥威亚电子科技有限公司 Method, device and equipment for adding information to video character and storage medium
CN113784045A (en) * 2021-08-31 2021-12-10 北京安博盛赢教育科技有限责任公司 Focusing interaction method, device, medium and electronic equipment
CN113784046A (en) * 2021-08-31 2021-12-10 北京安博盛赢教育科技有限责任公司 Follow-up shooting method, device, medium and electronic equipment
CN113705510A (en) * 2021-09-02 2021-11-26 广州市奥威亚电子科技有限公司 Target identification tracking method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
EP3855731B1 (en) Context based target framing in a teleconferencing environment
US9396399B1 (en) Unusual event detection in wide-angle video (based on moving object trajectories)
US6894714B2 (en) Method and apparatus for predicting events in video conferencing and other applications
CN103595953B (en) A kind of method and apparatus for controlling video capture
CN104169842B (en) For controlling method, the method for operating video clip, face orientation detector and the videoconference server of video clip
US20220319032A1 (en) Optimal view selection in a teleconferencing system with cascaded cameras
CN111488774A (en) Image processing method and device for image processing
US20220327732A1 (en) Information processing apparatus, information processing method, and program
US20220319034A1 (en) Head Pose Estimation in a Multi-Camera Teleconferencing System
WO2021253259A1 (en) Presenter-tracker management in a videoconferencing environment
CN113591562A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN114924645A (en) Interaction method and system based on gesture recognition
JP6859641B2 (en) Evaluation system, information processing equipment and programs
Kumano et al. Collective first-person vision for automatic gaze analysis in multiparty conversations
CN114816045A (en) Method and device for determining interaction gesture and electronic equipment
JP4831750B2 (en) Communication trigger system
Komiya et al. Image-based attention level estimation of interaction scene by head pose and gaze information
CN112287877A (en) Multi-role close-up shot tracking method
JPH10149447A (en) Gesture recognition method/device
JP2009106325A (en) Communication induction system
Nishida et al. SOANets: Encoder-decoder based Skeleton Orientation Alignment Network for White Cane User Recognition from 2D Human Skeleton Sequence.
Al-Hames et al. Automatic multi-modal meeting camera selection for video-conferences and meeting browsers
US11805225B2 (en) Tracker activation and deactivation in a videoconferencing system
WO2023137715A1 (en) Gimbal control method and apparatus, and movable platform and computer-readable medium
Nishida et al. Exemplar-based Pseudo-Viewpoint Rotation for White-Cane User Recognition from a 2D Human Pose Sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination