CN111918106A - Multimedia playing system and method for application scene recognition - Google Patents


Info

Publication number
CN111918106A
Authority
CN
China
Legal status
Pending
Application number
CN202010648167.4A
Other languages
Chinese (zh)
Inventor
胡飞青
Current Assignee
Individual
Original Assignee
Individual
Application filed by Individual
Priority to CN202010648167.4A
Priority to PCT/CN2020/109760 (WO2022007130A1)
Publication of CN111918106A
Legal status: Pending (current)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213 Monitoring of end-user related data
    • H04N21/44218 Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Social Psychology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Databases & Information Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a multimedia playing system applying scene recognition, which comprises: a command recognition device, connected to the signal processing device, for recognizing the person's voice in the third signal to obtain the corresponding external command of the person; an LED display array for dynamically displaying the received dynamic pattern when the waveform amplitude of the person's voice signal in the received third signal exceeds a limit; and an MCU control chip, arranged in the multimedia playing terminal, for controlling playback of the corresponding multimedia file based on the received external command of the person. The invention also relates to a multimedia playing method applying scene recognition. The system and method are intelligent in operation and widely applicable. Because the audio played by the multimedia playing terminal itself is removed before the person's voice control command is recognized, the recognition accuracy of that command is effectively improved.

Description

Multimedia playing system and method for application scene recognition
Technical Field
The invention relates to the field of multimedia playing control, in particular to a multimedia playing system and method applying scene recognition.
Background
A multimedia playing terminal is a new type of communication terminal that handles many different media (voice, text, data, still images, moving images and so on) and combines the functions of a telephone, telegraph, fax machine, television, computer and similar devices. As the direct interface between a communication network and its users, the terminal is where the capability and performance of the network are ultimately realized, so multimedia terminals play a very important role in the development of communication technology as a whole.
Users access multimedia information through multimedia playing terminals, which are interconnected over a high-speed communication network to share that information.
A multimedia playing terminal processes and controls multimedia information using computer technology and digital communication network technology.
Disclosure of Invention
To solve these technical problems in the related field, the invention provides a multimedia playing system and method applying scene recognition. While recognizing a person's voice control command, the system removes the audio played by the multimedia playing terminal itself to improve recognition accuracy; at the same time, it simulates the layout of the facial organs and, driven by the person's voice, draws a dynamic pattern of a talking mouth, which makes the terminal more interactive.
The invention therefore has at least the following two key points:
(1) to overcome interference from the audio played by the multimedia playing terminal when control commands are recognized in a person's speech, that audio is removed before the person's speech is recognized;
(2) a dynamic simulated pattern matching the nearest person's face is drawn from that person's face contour and the relative positions of the facial organs, and is displayed dynamically whenever the person's voice is detected, which makes the multimedia playing terminal more engaging.
According to an aspect of the present invention, there is provided a multimedia playing system applying scene recognition, the system including:
the front camera shooting mechanism, located on the housing of the multimedia playing terminal, for capturing images of the terminal's playing environment to obtain and output a corresponding playing environment image;
the contour recognition device, connected to the front camera shooting mechanism, for identifying, based on face contours, the face region with the largest area in the playing environment image;
the layout detection device, connected to the contour recognition device, for identifying the geometric outline of the largest face region and each facial organ within it, and for depicting a simulated pattern of the current face target from the geometric outline and the positions of the facial organs within it;
the dynamic drawing device, connected to the layout detection device, for drawing a speaking animation of the mouth organ based on the simulated pattern to obtain a dynamic pattern corresponding to the simulated pattern;
the data acquisition device, arranged in the multimedia playing terminal, for acquiring, as a first signal, the audio signal of the multimedia file currently being played by the terminal;
the content capture device, located on the housing of the multimedia playing terminal, for acquiring in real time, as a second signal, the audio signal of the environment in which the terminal is located;
the signal processing device, connected to both the data acquisition device and the content capture device, for stripping the first signal out of the second signal to obtain a corresponding third signal;
the command recognition device, connected to the signal processing device, for recognizing the person's voice in the third signal to obtain the corresponding external command of the person;
the LED display array, connected to both the command recognition device and the dynamic drawing device, for dynamically displaying the received dynamic pattern when the waveform amplitude of the person's voice signal in the received third signal exceeds a limit;
and the MCU control chip, arranged in the multimedia playing terminal and connected to the command recognition device, for controlling playback of the corresponding multimedia file based on the received external command of the person.
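The disclosure does not say how the signal processing device strips the first signal out of the second. Because the first signal is exactly the audio the terminal is playing, one conventional option is acoustic echo cancellation: an adaptive filter learns the path from loudspeaker to microphone and subtracts its estimate of the playback from the captured audio. The Python sketch below assumes a normalized least-mean-squares (NLMS) filter; the function name `strip_playback` and the `taps` and `mu` values are illustrative assumptions, not taken from the patent.

```python
def strip_playback(first, second, taps=8, mu=0.5, eps=1e-8):
    """Return a third signal: `second` with the estimated playback removed.

    Assumption: an NLMS adaptive FIR filter models the path from the
    played audio (`first`) to the captured environment audio (`second`).
    """
    weights = [0.0] * taps      # FIR estimate of the playback path
    history = [0.0] * taps      # latest samples of `first`, newest first
    third = []
    for x, d in zip(first, second):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate    # what the filter cannot explain, e.g. a speaker's voice
        norm = eps + sum(h * h for h in history)
        # Normalized LMS update: step size scaled by the input power.
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        third.append(error)
    return third
```

With nobody speaking, the residual decays toward zero once the filter converges, while speech that is uncorrelated with the playback remains in the third signal. A practical canceller would normally add double-talk detection, which this sketch omits.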
According to another aspect of the present invention, there is also provided a multimedia playing method applying scene recognition, the method including:
using a front camera shooting mechanism, located on the housing of the multimedia playing terminal, to capture images of the terminal's playing environment and obtain and output a corresponding playing environment image;
using a contour recognition device, connected to the front camera shooting mechanism, to identify, based on face contours, the face region with the largest area in the playing environment image;
using a layout detection device, connected to the contour recognition device, to identify the geometric shape of the largest face region and each facial organ within it, and to depict a simulated pattern of the current face target from the geometric shape and the positions of the facial organs within it;
using a dynamic drawing device, connected to the layout detection device, to draw a speaking animation of the mouth organ based on the simulated pattern and obtain a dynamic pattern corresponding to the simulated pattern;
using a data acquisition device, arranged in the multimedia playing terminal, to acquire, as a first signal, the audio signal of the multimedia file currently being played by the terminal;
using a content capture device, located on the housing of the multimedia playing terminal, to acquire in real time, as a second signal, the audio signal of the environment in which the terminal is located;
using a signal processing device, connected to both the data acquisition device and the content capture device, to strip the first signal out of the second signal and obtain a corresponding third signal;
using a command recognition device, connected to the signal processing device, to recognize the person's voice in the third signal and obtain the corresponding external command of the person;
using an LED display array, connected to both the command recognition device and the dynamic drawing device, to dynamically display the received dynamic pattern when the waveform amplitude of the person's voice signal in the received third signal exceeds a limit;
and using an MCU control chip, arranged in the multimedia playing terminal and connected to the command recognition device, to control playback of the corresponding multimedia file based on the received external command of the person.
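The LED display step hinges on whether the waveform amplitude of the person's voice in the third signal exceeds a limit. The patent gives neither the limit nor how amplitude is measured; the sketch below assumes the peak absolute amplitude over fixed-length frames, with the hypothetical names `frames_over_limit`, `frame_len` and `limit`.

```python
from typing import List, Sequence

def frames_over_limit(third: Sequence[float], frame_len: int, limit: float) -> List[bool]:
    """Flag each frame of the third signal whose peak |amplitude| exceeds `limit`.

    True: the LED display array plays the dynamic pattern for that frame.
    False: it does not (the not-over-limit case).
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    flags = []
    for start in range(0, len(third), frame_len):
        frame = third[start:start + frame_len]
        flags.append(max(abs(s) for s in frame) > limit)
    return flags
```

For example, `frames_over_limit([0.1, -0.2, 0.9, 0.0, 0.05, -0.04], 3, 0.5)` returns `[True, False]`. A real device might add hysteresis so the animation does not flicker when the voice hovers around the limit.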
The multimedia playing system and method applying scene recognition are intelligent in operation and widely applicable. Because the audio played by the multimedia playing terminal itself is removed before the person's voice control command is recognized, the recognition accuracy of that command is effectively improved.
Detailed Description
Embodiments of the multimedia playing system and method using scene recognition according to the present invention will be described in detail below.
In the early development of communication technology, people's requirements on the form of transmitted and exchanged information were simple, the amount of information was small, and a single-medium terminal was sufficient. A telephone, for example, uses only one medium, voice, to represent information. The telegraph is likewise a single-medium terminal that represents information only as combinations of dots and dashes. The computer, a more recent communication terminal, likewise uses only data and similar media to represent information.
As society and daily life developed, requirements on the content of transmitted and exchanged information became higher and more diverse. Scenes, images and facial expressions are basic requirements for making a scene and a conversation feel real, and voice alone represents information far less richly and deeply than images do. Communication terminals such as television sets and videophones were therefore developed; they usually combine two or more media, such as voice and images, which greatly improves the quality and information content of communication services. Television sets and videophones are nevertheless not true multimedia terminals. A multimedia terminal must not only use various media (voice, text, data, graphics, images, etc.) to represent information, but also integrate these media into an organic whole, synchronized and coordinated so that real information and its changes are represented in real time, and so that both parties to a communication can exchange information interactively. Such a terminal offers users greater convenience and more satisfactory service. For example, in a computerized personnel management system, a multimedia terminal lets a user retrieve at any time not only data such as a person's age, sex and history, but also the person's photograph, voice and appearance.
In actual multimedia playback, however, the audio played by the multimedia playing terminal itself cannot be removed to improve accuracy while a person's voice control command is being recognized, and the layout of the facial organs cannot be simulated to draw a dynamic pattern of a talking mouth under voice control, so the interactivity of the multimedia playing terminal is not enhanced.
To overcome these defects, the invention provides a multimedia playing system and method applying scene recognition that effectively solve the corresponding technical problems.
The multimedia playing system applying scene recognition according to the embodiment of the present invention includes:
the front camera shooting mechanism, located on the housing of the multimedia playing terminal, for capturing images of the terminal's playing environment to obtain and output a corresponding playing environment image;
the contour recognition device, connected to the front camera shooting mechanism, for identifying, based on face contours, the face region with the largest area in the playing environment image;
the layout detection device, connected to the contour recognition device, for identifying the geometric outline of the largest face region and each facial organ within it, and for depicting a simulated pattern of the current face target from the geometric outline and the positions of the facial organs within it;
the dynamic drawing device, connected to the layout detection device, for drawing a speaking animation of the mouth organ based on the simulated pattern to obtain a dynamic pattern corresponding to the simulated pattern;
the data acquisition device, arranged in the multimedia playing terminal, for acquiring, as a first signal, the audio signal of the multimedia file currently being played by the terminal;
the content capture device, located on the housing of the multimedia playing terminal, for acquiring in real time, as a second signal, the audio signal of the environment in which the terminal is located;
the signal processing device, connected to both the data acquisition device and the content capture device, for stripping the first signal out of the second signal to obtain a corresponding third signal;
the command recognition device, connected to the signal processing device, for recognizing the person's voice in the third signal to obtain the corresponding external command of the person;
the LED display array, connected to both the command recognition device and the dynamic drawing device, for dynamically displaying the received dynamic pattern when the waveform amplitude of the person's voice signal in the received third signal exceeds a limit;
and the MCU control chip, arranged in the multimedia playing terminal and connected to the command recognition device, for controlling playback of the corresponding multimedia file based on the received external command of the person.
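The contour recognition device keeps only the face region with the largest area, which the disclosure associates with the nearest person. Assuming some face detector has already returned axis-aligned bounding boxes (the patent does not name a detector), the selection step reduces to a maximum over box area. `FaceBox` and `largest_face` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional, Sequence

class FaceBox(NamedTuple):
    """Hypothetical detector output: top-left corner plus width and height."""
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self) -> int:
        return self.w * self.h

def largest_face(boxes: Sequence[FaceBox]) -> Optional[FaceBox]:
    """Return the face region with the largest area, or None if no face was found."""
    if not boxes:
        return None
    return max(boxes, key=lambda b: b.area)
```

Using area as a stand-in for distance is cheap, but it can pick the wrong person when faces differ greatly in size or are partly out of frame.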
Next, the detailed structure of the multimedia playing system applying scene recognition is further described.
In the multimedia playing system applying scene recognition:
the LED display array is further configured to refrain from dynamically displaying the received dynamic pattern when the waveform amplitude of the person's voice signal in the received third signal does not exceed the limit;
in the MCU control chip, the external command of the person includes the file name of the multimedia file.
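Since the external command of the person carries the file name of the multimedia file, the MCU control chip has to map recognized text onto a file it can play. The command grammar is not given in the patent; the sketch below assumes a hypothetical form "<verb> <file name>", an invented verb list `PLAY_VERBS`, and a case-insensitive match against the available file names.

```python
from typing import Iterable, Optional

# Hypothetical command vocabulary; the patent does not define one.
PLAY_VERBS = ("play", "open")

def requested_file(command_text: str, library: Iterable[str]) -> Optional[str]:
    """Return the library file named in a recognized command, or None."""
    words = command_text.lower().split()
    if len(words) < 2 or words[0] not in PLAY_VERBS:
        return None
    wanted = " ".join(words[1:])
    for name in library:
        if name.lower() == wanted:
            return name   # the MCU would start playback of this file
    return None
```

For example, `requested_file("Play Morning News.mp4", ["Morning News.mp4", "song.mp3"])` returns `"Morning News.mp4"`, while an unknown name or a command without a play verb returns `None`.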
In the multimedia playing system applying scene recognition:
depicting the simulated pattern of the current face target from the geometric shape and the positions of the facial organs within it comprises: identifying and locating each facial organ within the geometric shape based on that organ's outline.
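Locating each facial organ from its outline yields the layout that the simulated pattern reproduces. One simple way to express that layout, assumed here rather than taken from the patent, is the centroid of each organ's outline points normalized to the face's bounding box, so the pattern can be redrawn at any display size. `organ_layout` and its input format are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # x, y, width, height

def organ_layout(face_box: Box, organ_outlines: Dict[str, List[Point]]) -> Dict[str, Point]:
    """Map each organ to its outline centroid in face-relative coordinates (0..1)."""
    x0, y0, w, h = face_box
    layout = {}
    for organ, points in organ_outlines.items():
        cx = sum(p[0] for p in points) / len(points)
        cy = sum(p[1] for p in points) / len(points)
        layout[organ] = ((cx - x0) / w, (cy - y0) / h)
    return layout
```

For instance, a mouth outlined by the corners (180, 120), (220, 120), (220, 140), (180, 140) inside the face box (100, 50, 200, 100) maps to (0.5, 0.8): horizontally centered and 80% of the way down the face.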
In the multimedia playing system applying scene recognition:
drawing the speaking animation of the mouth organ based on the simulated pattern to obtain the corresponding dynamic pattern comprises: drawing, from the imaging area of the mouth organ in the simulated pattern, a plurality of drawn areas of the mouth organ corresponding to a speaking action, and playing the drawn areas in succession to obtain a dynamic area for the mouth organ.
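The speaking animation is built from several drawn versions of the mouth's imaging area, played in succession. The patent does not say how the drawn areas differ; this sketch assumes each frame is the mouth rectangle stretched vertically about its center by an openness factor, and loops the frames for continuous playback. `mouth_frames`, `play` and the default openness cycle are illustrative assumptions.

```python
from itertools import cycle, islice
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height

def mouth_frames(mouth_box: Box, openness: Sequence[float] = (0.2, 0.6, 1.0, 0.6)) -> List[Box]:
    """One mouth rectangle per frame, scaled vertically about the mouth's center line."""
    x, y, w, h = mouth_box
    cy = y + h / 2
    return [(x, cy - h * k / 2, w, h * k) for k in openness]

def play(frames: Sequence[Box], n: int) -> List[Box]:
    """Return the first n frames of a continuous (looping) playback."""
    return list(islice(cycle(frames), n))
```

Keeping the width fixed and varying only the height mimics a mouth opening and closing; a richer drawing could also vary width or shape per frame.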
In the multimedia playing system applying scene recognition:
drawing the speaking animation of the mouth organ based on the simulated pattern to obtain the corresponding dynamic pattern comprises: replacing the imaging area of the mouth organ in the simulated pattern with the dynamic area of the mouth organ to obtain the dynamic pattern corresponding to the simulated pattern.
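Replacing the mouth's imaging area with its dynamic area turns the static simulated pattern into a sequence of frames. The sketch below assumes the simulated pattern is a 2D grid of pixel values (for example, a bitmap for the LED display array) and that each mouth frame is a small patch pasted at the mouth's top-left position; `paste_patch` and `dynamic_pattern` are hypothetical names.

```python
from typing import List

Grid = List[List[int]]

def paste_patch(pattern: Grid, patch: Grid, top: int, left: int) -> Grid:
    """Return a copy of `pattern` with `patch` written over the mouth area."""
    if top < 0 or left < 0 or top + len(patch) > len(pattern) or any(
        left + len(row) > len(pattern[0]) for row in patch
    ):
        raise ValueError("patch does not fit inside the pattern")
    out = [row[:] for row in pattern]  # leave the simulated pattern untouched
    for r, patch_row in enumerate(patch):
        out[top + r][left:left + len(patch_row)] = patch_row
    return out

def dynamic_pattern(pattern: Grid, mouth_patches: List[Grid], top: int, left: int) -> List[Grid]:
    """One full frame per mouth patch; together the frames form the dynamic pattern."""
    return [paste_patch(pattern, p, top, left) for p in mouth_patches]
```

Copying the grid for every frame keeps the code simple; a memory-constrained MCU would more likely redraw only the mouth region in place.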
The multimedia playing method applying scene recognition shown in the embodiment of the invention comprises the following steps:
using a front camera shooting mechanism, located on the housing of the multimedia playing terminal, to capture images of the terminal's playing environment and obtain and output a corresponding playing environment image;
using a contour recognition device, connected to the front camera shooting mechanism, to identify, based on face contours, the face region with the largest area in the playing environment image;
using a layout detection device, connected to the contour recognition device, to identify the geometric shape of the largest face region and each facial organ within it, and to depict a simulated pattern of the current face target from the geometric shape and the positions of the facial organs within it;
using a dynamic drawing device, connected to the layout detection device, to draw a speaking animation of the mouth organ based on the simulated pattern and obtain a dynamic pattern corresponding to the simulated pattern;
using a data acquisition device, arranged in the multimedia playing terminal, to acquire, as a first signal, the audio signal of the multimedia file currently being played by the terminal;
using a content capture device, located on the housing of the multimedia playing terminal, to acquire in real time, as a second signal, the audio signal of the environment in which the terminal is located;
using a signal processing device, connected to both the data acquisition device and the content capture device, to strip the first signal out of the second signal and obtain a corresponding third signal;
using a command recognition device, connected to the signal processing device, to recognize the person's voice in the third signal and obtain the corresponding external command of the person;
using an LED display array, connected to both the command recognition device and the dynamic drawing device, to dynamically display the received dynamic pattern when the waveform amplitude of the person's voice signal in the received third signal exceeds a limit;
and using an MCU control chip, arranged in the multimedia playing terminal and connected to the command recognition device, to control playback of the corresponding multimedia file based on the received external command of the person.
Next, the specific steps of the multimedia playing method applying scene recognition are further described.
In the multimedia playing method applying scene recognition:
the LED display array is further configured to refrain from dynamically displaying the received dynamic pattern when the waveform amplitude of the person's voice signal in the received third signal does not exceed the limit;
in the MCU control chip, the external command of the person includes the file name of the multimedia file.
In the multimedia playing method applying scene recognition:
depicting the simulated pattern of the current face target from the geometric shape and the positions of the facial organs within it comprises: identifying and locating each facial organ within the geometric shape based on that organ's outline.
In the multimedia playing method applying scene recognition:
drawing the speaking animation of the mouth organ based on the simulated pattern to obtain the corresponding dynamic pattern comprises: drawing, from the imaging area of the mouth organ in the simulated pattern, a plurality of drawn areas of the mouth organ corresponding to a speaking action, and playing the drawn areas in succession to obtain a dynamic area for the mouth organ.
In the multimedia playing method applying scene recognition:
drawing the speaking animation of the mouth organ based on the simulated pattern to obtain the corresponding dynamic pattern comprises: replacing the imaging area of the mouth organ in the simulated pattern with the dynamic area of the mouth organ to obtain the dynamic pattern corresponding to the simulated pattern.
In addition, MCUs can be classified by memory type into those without on-chip ROM and those with on-chip ROM. A chip without on-chip ROM must be used with an external EPROM (the 8031 is a typical example). Chips with on-chip ROM are further divided into on-chip EPROM types (typically the 87C51), on-chip mask ROM types (typically the 8051), on-chip flash types (typically the 89C51), and so on; some companies also offer chips with on-chip one-time-programmable (OTP) ROM (typically the 97C51). Mask ROM MCUs are cheap, but the program is fixed at the factory, so they suit applications whose program never changes. Flash ROM MCUs can be erased and rewritten repeatedly; they are flexible but more expensive, and suit price-insensitive applications or development work. OTP ROM MCUs are priced between the two and can be programmed once, so they suit applications that need some flexibility at low cost, especially electronic products whose functions are updated continually and which must go into volume production quickly.
Finally, it should be noted that the functional devices in the embodiments of the invention may be integrated into one processing device, may each exist physically on their own, or two or more of them may be integrated into one device.
If the functions are implemented as software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the invention may be embodied as a software product stored in a storage medium and including instructions that cause a computer device (a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above description covers only specific embodiments of the invention, and the scope of protection is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed here shall fall within the scope of protection of the invention, which shall therefore be determined by the claims.

Claims (10)

1. A multimedia playing system using scene recognition, the system comprising:
a front camera mechanism, located on the housing of a multimedia playing terminal and configured to capture images of the playing environment of the multimedia playing terminal so as to obtain and output a corresponding playing-environment image;
a contour recognition device, connected to the front camera mechanism and configured to identify, based on face contours, the face region with the largest area in the playing-environment image;
a layout detection device, connected to the contour recognition device and configured to identify the geometric shape of the largest face region and each facial organ inside that region, and to depict a simulated pattern of the current face target based on the geometric shape and the positions of the facial organs within it;
a dynamic drawing device, connected to the layout detection device and configured to perform speaking-motion drawing of the mouth organ based on the simulated pattern so as to obtain a dynamic pattern corresponding to the simulated pattern;
a data acquisition device, arranged inside the multimedia playing terminal and configured to acquire, as a first signal, the audio signal of the multimedia file currently being played by the multimedia playing terminal;
a content capture device, located on the housing of the multimedia playing terminal and configured to acquire in real time, as a second signal, the audio signal of the environment in which the multimedia playing terminal is located;
a signal processing device, connected to both the data acquisition device and the content capture device and configured to strip the first signal out of the second signal to obtain a corresponding third signal;
a command recognition device, connected to the signal processing device and configured to recognize human speech in the third signal to obtain a corresponding external command issued by a person;
an LED display array, connected to both the command recognition device and the dynamic drawing device and configured to dynamically display the received dynamic pattern when the waveform amplitude of the human voice signal in the received third signal exceeds a limit; and
an MCU control chip, arranged inside the multimedia playing terminal, connected to the command recognition device and configured to control playback of the corresponding multimedia file based on the received external command.
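Claim 1 does not say how the signal processing device "strips" the first signal out of the second. One common way to do this, and the topic of the cited CN106297815A, is acoustic echo cancellation: an adaptive filter learns how the played audio reaches the microphone and subtracts that estimate, leaving the third signal. The sketch below assumes a normalized least-mean-squares (NLMS) filter, sample-aligned signals of equal length, and illustrative values for `taps` and `mu`; the function name `strip_playback` is hypothetical and none of this is taken from the patent itself.

```python
from typing import List


def strip_playback(first: List[float], second: List[float],
                   taps: int = 8, mu: float = 0.5, eps: float = 1e-6) -> List[float]:
    """Return a 'third signal': the second signal with the playback echo removed.

    first  -- reference audio of the file being played (the first signal)
    second -- microphone capture of the environment (the second signal)
    Assumed technique: an NLMS adaptive FIR filter (not specified by the patent).
    """
    if len(first) != len(second):
        raise ValueError("signals must be sample-aligned and of equal length")
    w = [0.0] * taps    # running estimate of the speaker-to-microphone echo path
    buf = [0.0] * taps  # most recent reference samples, newest first
    third = []
    for x, d in zip(first, second):
        buf = [x] + buf[:-1]                               # shift in the newest reference sample
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))  # predicted echo at the microphone
        e = d - echo_est                                   # residual: what is left is the room voice
        norm = eps + sum(bi * bi for bi in buf)            # input power for step normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        third.append(e)
    return third
```

While no one is speaking, the residual converges toward zero as the filter learns the echo path; when the played file is silent, the microphone signal passes through unchanged. A production canceller would also need double-talk detection so that the viewer's own voice does not disturb the adaptation.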
2. The multimedia playing system using scene recognition according to claim 1, wherein:
the LED display array is further configured not to dynamically display the received dynamic pattern when the waveform amplitude of the human voice signal in the received third signal does not exceed the limit; and
in the MCU control chip, the external command issued by the person includes the file name of a multimedia file.
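Claim 2 pairs an on/off rule for the LED animation with a command that carries a file name. A minimal sketch of both decisions follows, assuming that "waveform amplitude exceeds the limit" means the peak absolute sample in a window exceeds a fixed threshold, and that the recognized command arrives as plain text. The names `should_animate` and `match_file`, the threshold, and the substring-matching rule are all hypothetical illustrations.

```python
from typing import Iterable, List, Optional


def should_animate(third: Iterable[float], threshold: float) -> bool:
    """True when the peak amplitude of the voice window exceeds the threshold."""
    return max((abs(s) for s in third), default=0.0) > threshold


def match_file(command_text: str, library: List[str]) -> Optional[str]:
    """Return the file whose name (without extension) appears in the command text."""
    text = command_text.lower()

    def stem(name: str) -> str:
        return name.rsplit(".", 1)[0].lower()

    # Try longer names first so that "song 10" is preferred over "song 1".
    for name in sorted(library, key=lambda n: len(stem(n)), reverse=True):
        if stem(name) and stem(name) in text:
            return name
    return None
```

For example, `match_file("Play Song 10 please", ["song 1.mp3", "song 10.mp3"])` selects `"song 10.mp3"`, and an empty or quiet window leaves the LED animation off.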
3. The multimedia playing system using scene recognition according to claim 2, wherein:
depicting the simulated pattern of the current face target based on the geometric shape and the positions of the facial organs within it comprises: identifying and locating the facial organs within the geometric shape based on the outline of each facial organ.
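Claim 3 locates each facial organ from its own outline and expresses it relative to the face's geometric shape. The sketch below assumes that some landmark detector (not named in the patent) has already produced the face outline and each organ outline as lists of points, and it represents the simulated pattern as organ bounding boxes normalized to the face box. `bbox` and `describe_pattern` are hypothetical names.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x, y, width, height)


def bbox(points: List[Point]) -> Box:
    """Axis-aligned bounding box of an outline."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


def describe_pattern(face_outline: List[Point],
                     organ_outlines: Dict[str, List[Point]]) -> Dict[str, Box]:
    """Locate each organ by its outline, in coordinates relative to the face (0..1)."""
    fx, fy, fw, fh = bbox(face_outline)
    if fw <= 0 or fh <= 0:
        raise ValueError("face outline must enclose a non-empty region")
    pattern: Dict[str, Box] = {"face": (0.0, 0.0, 1.0, 1.0)}
    for organ, pts in organ_outlines.items():
        x, y, w, h = bbox(pts)
        pattern[organ] = ((x - fx) / fw, (y - fy) / fh, w / fw, h / fh)
    return pattern
```

Normalizing to the face box makes the pattern independent of how far the viewer stands from the camera, which is convenient when it is later drawn on a fixed-size LED array.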
4. The multimedia playing system using scene recognition according to claim 3, wherein:
performing speaking-motion drawing of the mouth organ based on the simulated pattern to obtain the dynamic pattern corresponding to the simulated pattern comprises: based on the imaging region corresponding to the mouth organ in the simulated pattern, drawing a plurality of mouth-organ regions corresponding to the speaking action, and playing the plurality of drawn regions in succession to obtain a dynamic region corresponding to the mouth organ.
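Claim 4 draws several versions of the mouth region, one per stage of a speaking action, and plays them in succession. The patent does not say how each version is drawn. As an assumed illustration only, the sketch below renders the mouth as a filled ellipse on a small monochrome grid (1 = lit LED) whose vertical opening grows with an "opening ratio"; `mouth_frames` and the ellipse model are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def mouth_frames(width: int, height: int, openings: List[float]) -> List[Grid]:
    """Draw one width x height mouth image per opening ratio (0 = closed, 1 = fully open).

    A closed mouth is a single lit row; larger ratios grow an ellipse about the centre.
    """
    cx, cy = (width - 1) / 2, (height - 1) / 2
    rx = max(cx, 0.5)                      # horizontal semi-axis spans the whole region
    frames = []
    for r in openings:
        ry = max(r * cy, 0.5)              # 0.5 keeps one lit row when the mouth is closed
        frame = [[1 if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1 else 0
                  for x in range(width)]
                 for y in range(height)]
        frames.append(frame)
    return frames
```

Cycling through a sequence such as `[0.0, 0.5, 1.0, 0.5]` in a loop yields the "dynamic region" of claim 4.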
5. The multimedia playing system using scene recognition according to claim 4, wherein:
performing speaking-motion drawing of the mouth organ based on the simulated pattern to obtain the dynamic pattern corresponding to the simulated pattern comprises: replacing the imaging region corresponding to the mouth organ in the simulated pattern with the dynamic region corresponding to the mouth organ to obtain the dynamic pattern corresponding to the simulated pattern.
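Claim 5 builds the dynamic pattern by replacing the mouth's imaging region in the static simulated pattern with each frame of the dynamic mouth region. A minimal sketch, assuming both the pattern and the mouth frames are 2-D pixel grids and the mouth region is an axis-aligned box in pattern pixels; `dynamic_pattern` is a hypothetical name.

```python
import copy
from typing import List, Tuple

Grid = List[List[int]]


def dynamic_pattern(pattern: Grid, mouth_box: Tuple[int, int, int, int],
                    mouth_frames: List[Grid]) -> List[Grid]:
    """Return one full frame per mouth frame, with the mouth region replaced.

    mouth_box is (x, y, width, height) in pattern pixels; each mouth frame must
    be exactly `height` rows by `width` columns.
    """
    x0, y0, w, h = mouth_box
    if x0 < 0 or y0 < 0 or y0 + h > len(pattern) or x0 + w > len(pattern[0]):
        raise ValueError("mouth region lies outside the simulated pattern")
    out = []
    for frame in mouth_frames:
        if len(frame) != h or any(len(row) != w for row in frame):
            raise ValueError("mouth frame does not match the mouth region size")
        full = copy.deepcopy(pattern)          # keep the static simulated pattern intact
        for dy in range(h):
            full[y0 + dy][x0:x0 + w] = frame[dy]  # overwrite one row of the mouth region
        out.append(full)
    return out
```

Only the mouth region changes from frame to frame, so the rest of the face stays stable on the LED display while the mouth appears to speak.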
6. A multimedia playing method using scene recognition, characterized by comprising the following steps:
using a front camera mechanism, located on the housing of a multimedia playing terminal, to capture images of the playing environment of the multimedia playing terminal so as to obtain and output a corresponding playing-environment image;
using a contour recognition device, connected to the front camera mechanism, to identify, based on face contours, the face region with the largest area in the playing-environment image;
using a layout detection device, connected to the contour recognition device, to identify the geometric shape of the largest face region and each facial organ inside that region, and to depict a simulated pattern of the current face target based on the geometric shape and the positions of the facial organs within it;
using a dynamic drawing device, connected to the layout detection device, to perform speaking-motion drawing of the mouth organ based on the simulated pattern so as to obtain a dynamic pattern corresponding to the simulated pattern;
using a data acquisition device, arranged inside the multimedia playing terminal, to acquire, as a first signal, the audio signal of the multimedia file currently being played by the multimedia playing terminal;
using a content capture device, located on the housing of the multimedia playing terminal, to acquire in real time, as a second signal, the audio signal of the environment in which the multimedia playing terminal is located;
using a signal processing device, connected to both the data acquisition device and the content capture device, to strip the first signal out of the second signal to obtain a corresponding third signal;
using a command recognition device, connected to the signal processing device, to recognize human speech in the third signal to obtain a corresponding external command issued by a person;
using an LED display array, connected to both the command recognition device and the dynamic drawing device, to dynamically display the received dynamic pattern when the waveform amplitude of the human voice signal in the received third signal exceeds a limit; and
using an MCU control chip, arranged inside the multimedia playing terminal and connected to the command recognition device, to control playback of the corresponding multimedia file based on the received external command.
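Both the system of claim 1 and the method of claim 6 keep only the face region with the largest area, presumably the viewer closest to the camera. The claims do not name a face detector. The sketch below assumes one has already returned each detected face outline as a closed polygon, and it picks the largest by the shoelace area formula; `polygon_area` and `largest_face` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(contour: Sequence[Point]) -> float:
    """Area enclosed by a closed contour, computed with the shoelace formula."""
    n = len(contour)
    twice = sum(contour[i][0] * contour[(i + 1) % n][1]
                - contour[(i + 1) % n][0] * contour[i][1]
                for i in range(n))
    return abs(twice) / 2.0


def largest_face(contours: List[Sequence[Point]]) -> Optional[Sequence[Point]]:
    """Return the face contour enclosing the largest area, or None if no face was found."""
    return max(contours, key=polygon_area, default=None)
```

Using the enclosed contour area rather than a bounding-box area follows the claims' wording ("based on the human face contour"); with rectangular detections the two choices give the same result.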
7. The multimedia playing method using scene recognition according to claim 6, wherein:
the LED display array is further configured not to dynamically display the received dynamic pattern when the waveform amplitude of the human voice signal in the received third signal does not exceed the limit; and
in the MCU control chip, the external command issued by the person includes the file name of a multimedia file.
8. The multimedia playing method using scene recognition according to claim 7, wherein:
depicting the simulated pattern of the current face target based on the geometric shape and the positions of the facial organs within it comprises: identifying and locating the facial organs within the geometric shape based on the outline of each facial organ.
9. The multimedia playing method using scene recognition according to claim 8, wherein:
performing speaking-motion drawing of the mouth organ based on the simulated pattern to obtain the dynamic pattern corresponding to the simulated pattern comprises: based on the imaging region corresponding to the mouth organ in the simulated pattern, drawing a plurality of mouth-organ regions corresponding to the speaking action, and playing the plurality of drawn regions in succession to obtain a dynamic region corresponding to the mouth organ.
10. The multimedia playing method using scene recognition according to claim 9, wherein:
performing speaking-motion drawing of the mouth organ based on the simulated pattern to obtain the dynamic pattern corresponding to the simulated pattern comprises: replacing the imaging region corresponding to the mouth organ in the simulated pattern with the dynamic region corresponding to the mouth organ to obtain the dynamic pattern corresponding to the simulated pattern.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010648167.4A CN111918106A (en) 2020-07-07 2020-07-07 Multimedia playing system and method for application scene recognition
PCT/CN2020/109760 WO2022007130A1 (en) 2020-07-07 2020-08-18 Multimedia playing system and method for application scene identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010648167.4A CN111918106A (en) 2020-07-07 2020-07-07 Multimedia playing system and method for application scene recognition

Publications (1)

Publication Number Publication Date
CN111918106A (en) 2020-11-10

Family

ID=73227585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010648167.4A Pending CN111918106A (en) 2020-07-07 2020-07-07 Multimedia playing system and method for application scene recognition

Country Status (2)

Country Link
CN (1) CN111918106A (en)
WO (1) WO2022007130A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837051A (en) * 2021-09-17 2021-12-24 泰州蝶金软件有限公司 Intelligent broadcast control platform based on big data management

Citations (12)

Publication number Priority date Publication date Assignee Title
CN101339606B (en) * 2008-08-14 2011-10-12 北京中星微电子有限公司 Method and device for locating and tracking contour feature points of key facial organs
CN106297815A (en) * 2016-07-27 2017-01-04 武汉诚迈科技有限公司 Method for echo cancellation in speech recognition scenarios
CN106326980A (en) * 2016-08-31 2017-01-11 北京光年无限科技有限公司 Robot and method for simulating human facial movements by robot
US20170221099A1 (en) * 2016-01-29 2017-08-03 Tyco Fire & Security Gmbh Adaptive video advertising using eas pedestals or similar structure
JP2018049305A (en) * 2016-09-20 2018-03-29 ナレルシステム株式会社 Communication method, computer program and device
CN109003606A (en) * 2018-07-05 2018-12-14 江苏海事职业技术学院 A kind of vehicle mounted multimedia interaction systems and its application method
CN109087636A (en) * 2017-12-15 2018-12-25 蔚来汽车有限公司 Interactive device
CN109117770A (en) * 2018-08-01 2019-01-01 吉林盘古网络科技股份有限公司 Facial animation acquisition method, device and terminal device
WO2019236171A1 (en) * 2018-06-07 2019-12-12 Motorola Mobility Llc Methods and devices for identifying multiple persons within an environment of an electronic device
CN110750152A (en) * 2019-09-11 2020-02-04 云知声智能科技股份有限公司 Human-computer interaction method and system based on lip action
CN110834338A (en) * 2019-11-04 2020-02-25 深圳勇艺达机器人有限公司 Vehicle-mounted robot and control method thereof
CN111273833A (en) * 2020-03-25 2020-06-12 北京百度网讯科技有限公司 Man-machine interaction control method, device and system and electronic equipment

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US20110304629A1 (en) * 2010-06-09 2011-12-15 Microsoft Corporation Real-time animation of facial expressions
CN202196369U (en) * 2011-04-27 2012-04-18 德信互动科技(北京)有限公司 Electronic system based on video control
US10560737B2 (en) * 2018-03-12 2020-02-11 Amazon Technologies, Inc. Voice-controlled multimedia device

Non-Patent Citations (1)

Title
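林荣暖 (Lin Rongnuan): "Research on Several Quality Assurance Issues in Audio and Video Communication on Android Terminals", China Master's Theses Full-text Database (Electronic Journals), Information Science and Technology Series *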

Also Published As

Publication number Publication date
WO2022007130A1 (en) 2022-01-13

Similar Documents

Publication Publication Date Title
CN110941954B (en) Text broadcasting method and device, electronic equipment and storage medium
WO2022001593A1 (en) Video generation method and apparatus, storage medium and computer device
JP2021192222A (en) Video image interactive method and apparatus, electronic device, computer readable storage medium, and computer program
CN111885414B (en) Data processing method, device and equipment and readable storage medium
US20140223279A1 (en) Data augmentation with real-time annotations
CN109872297A (en) Image processing method and device, electronic equipment and storage medium
CN109729420A (en) Image processing method and device, mobile terminal and computer readable storage medium
CN109474850B (en) Motion pixel video special effect adding method and device, terminal equipment and storage medium
WO2008073637A1 (en) Mute function for video applications
KR102491773B1 (en) Image deformation control method, device and hardware device
WO2022089224A1 (en) Video communication method and apparatus, electronic device, computer readable storage medium, and computer program product
CN110750161A (en) Interactive system, method, mobile device and computer readable medium
CN110475157A (en) Multimedia messages methods of exhibiting, device, computer equipment and storage medium
CN109862380A (en) Video data handling procedure, device and server, electronic equipment and storage medium
CN112151041B (en) Recording method, device, equipment and storage medium based on recorder program
CN111629222B (en) Video processing method, device and storage medium
CN111738769B (en) Video processing method and device
CN111340848A (en) Object tracking method, system, device and medium for target area
CN108563327A (en) Augmented reality method, apparatus, storage medium and electronic equipment
CN113378583A (en) Dialogue reply method and device, dialogue model training method and device, and storage medium
CN111918106A (en) Multimedia playing system and method for application scene recognition
CN113012500A (en) Remote teaching system
CN112149599A (en) Expression tracking method and device, storage medium and electronic equipment
KR20060102651A (en) Wireless communication terminal with message transmission according to feeling of terminal-user and method of message transmission using same
CN114398517A (en) Video data acquisition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201110
