CN110021297A - An intelligent display method and device based on audio-video identification

Publication number
CN110021297A
Authority
CN
China
Prior art keywords
audio
paraphrase
control
image data
data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910296455.5A
Other languages
Chinese (zh)
Inventor
倪雪平
尹大海
金文俊
倪末萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ying Long Opto Electronic Co Ltd
Original Assignee
Shanghai Ying Long Opto Electronic Co Ltd
Application filed by Shanghai Ying Long Opto Electronic Co Ltd
Priority to CN201910296455.5A
Publication of CN110021297A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04 Time compression or expansion
    • G10L21/055 Time compression or expansion for synchronising with other signals, e.g. video signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

The present invention relates to the field of interactive display technology and discloses an intelligent display method and device based on audio-video identification. The method includes: acquiring image data and audio data; identifying an image object and its position coordinates in the image data; identifying, according to the position coordinates, the activities belonging to the image object; identifying an audio object and the local audio belonging to the audio object from the audio data; matching the activities against the local audio; if the match succeeds, adding a label symbol to the image data according to the position coordinates; and displaying the image data. The method judges whether the activities and the local audio correspond; if they do, the image object is taken to be correctly identified and the image data with the label symbol is displayed, which keeps the accuracy of the position coordinates high. The user can use motion and sound at the same time to control or change the displayed image content.

Description

An intelligent display method and device based on audio-video identification
Technical field
The present invention relates to the field of interactive display technology, and more specifically to an intelligent display method and device based on audio-video identification.
Background art
With the development of display-screen interaction technology, cameras have been added to LED screens: the camera captures video of the area in front of the LED screen, and the LED screen shows the picture that the camera captured.
To realize interaction, the camera is connected to a computing module that reads the video images; the computing module can be a mobile workstation. The computing module performs a preliminary identification of the face image in the video image by template matching and marks the position of the face image in the video image. The computing module sends the position to the LED screen, and the LED screen displays it, so that a person watching the LED screen can see where their own face is located. This achieves a preliminary form of interaction.
However, the prior art does not bring audio into the interaction, so the user cannot use their own motion and sound at the same time to change the image on the LED screen.
Summary of the invention
In view of the problems in the prior art, the first object of the present invention is to provide an intelligent display method based on audio-video identification, which has the advantage that the user can use motion and sound at the same time to control or change the displayed image content. The second object of the present invention is to provide an intelligent display device based on audio-video identification, which has the same advantage.
To achieve the first object, the present invention provides the following technical solution:
An intelligent display method based on audio-video identification, including the following steps:
Acquiring image data and audio data;
Identifying an image object and its position coordinates in the image data;
Identifying, according to the position coordinates, the activities belonging to the image object;
Identifying an audio object and the local audio belonging to the audio object from the audio data;
Matching the activities against the local audio;
If the match succeeds, adding a label symbol to the image data according to the position coordinates;
Displaying the image data with the label symbol.
With the above technical solution, the position coordinates of the image object are first identified from the image data, and the activities of the image object are then identified on the basis of the position coordinates. It is then judged whether the activities and the local audio correspond; if they do, the image object is taken to be correctly identified and the image data with the label symbol is displayed, which keeps the accuracy of the position coordinates high. The activities can be a mouth movement, in which case the corresponding local audio is the spoken voice; the activities can also be a gesture, in which case the corresponding local audio is the sound of wind. The user can use motion and sound at the same time to control or change the displayed image content.
Further, the method also includes:
Establishing an audio interpreted library preset with multiple control paraphrases;
Associating the control paraphrases with the image data or with display properties;
Identifying the control paraphrase in the local audio;
Changing the control paraphrase of the image data according to the control paraphrase in the local audio;
Or, changing the control paraphrase of the display properties according to the control paraphrase in the local audio.
With the above technical solution, a control paraphrase can be any of multiple words with a control meaning, such as "previous item", "next item" or "increase screen brightness". If "next item" is identified, the next image data or the next image object is displayed; if "increase screen brightness" is identified, the brightness of the display screen is increased.
Further, the method also includes:
Establishing an audio interpreted library preset with multiple control paraphrases, and a display interpreted library preset with multiple action paraphrases in one-to-one correspondence with the control paraphrases;
Associating the control paraphrases with the image data;
Associating the action paraphrases with the image data;
Identifying the control paraphrase in the local audio;
Adding to the image data, according to the control paraphrase in the local audio, the action paraphrase corresponding to the image data.
With the above technical solution, a control paraphrase can be any of multiple words with a control meaning, such as "previous item", "next item" or "increase screen brightness". If "next item" is identified, the next image data or the next image object is displayed; if "increase screen brightness" is identified, the brightness of the display screen is increased. An action paraphrase can be any of multiple words with an action meaning, such as "grayscale image", "enlarge" or "shrink". If "grayscale image" is identified, the grayscale version of the image data is displayed; if "enlarge" or "shrink" is identified, the image object in the displayed image data is enlarged or shrunk. The enlargement or reduction factor can be set in advance, or can be determined by a number identified in the subsequent speech.
Further, the method also includes:
There are multiple image objects;
Identifying one audio object corresponding to one of the image objects;
Displaying the multiple image data side by side.
Further, the method also includes:
Changing the control paraphrase of all image data according to the control paraphrase in the local audio.
To achieve the second object, the present invention provides the following technical solution:
An intelligent display device based on audio-video identification, built around a display screen. The display screen is data-connected to a control centre module, and a picture recognition module and an audio identification module are each data-connected to the control centre module. The picture recognition module acquires and analyzes images, the audio identification module acquires and analyzes audio, and the control centre module receives the images and analysis results and shows them on the display screen;
The picture recognition module acquires image data and sends the image data to the control centre module, and the audio identification module acquires audio data and sends the audio data to the control centre module;
The device further includes the following:
The picture recognition module includes a camera and a recognition processor data-connected to the camera. The camera acquires the image data, and the recognition processor receives the image data and identifies an image object and its position coordinates in the image data;
The audio identification module includes an audio collection device and an analysis processor data-connected to the audio collection device. The audio collection device acquires the audio data, and the analysis processor receives the audio data and identifies an audio object and the local audio belonging to the audio object from the audio data;
The control centre module includes a center processor data-connected to the recognition processor and the analysis processor. The center processor is data-connected to a video-stream processor, and the video-stream processor is data-connected to the display screen. The center processor receives the data of the recognition processor and the analysis processor and drives the display screen through the video-stream processor;
The center processor includes a matched data component, which matches the activities against the local audio; if the match succeeds, it sends the recognition processor a marking signal whose content is to add a label symbol to the image data according to the position coordinates;
The recognition processor receives the marking signal, modifies and updates the image data according to the marking signal and preset instructions, and sends the image data to the center processor.
Further, the analysis processor includes:
An initial audio component, for establishing an audio interpreted library preset with multiple control paraphrases;
A corresponding paraphrase component, data-connected to the recognition processor, for associating the control paraphrases with the image data or display properties;
An identify paraphrase component, for identifying the control paraphrase in the local audio;
A change paraphrase component, for changing the control paraphrase of the image data according to the control paraphrase in the local audio, or changing the control paraphrase of the display properties according to the control paraphrase in the local audio.
Further, the analysis processor includes:
A volume initial component, for establishing an audio interpreted library preset with multiple control paraphrases, and a display interpreted library preset with multiple action paraphrases in one-to-one correspondence with the control paraphrases;
A volume corresponding component, for associating the control paraphrases with the image data and associating the action paraphrases with the image data;
A volume recognizer component, for identifying the control paraphrase in the local audio;
A volume change component, for adding to the image data, according to the control paraphrase in the local audio, the action paraphrase corresponding to the image data.
Further, there are multiple image objects;
The center processor includes an identify corresponding component, for identifying one audio object corresponding to one of the image objects;
The video-stream processor includes a side-by-side display component, for displaying the multiple image data side by side.
Further, the device also includes:
The center processor includes a global change component, for changing the control paraphrase of all image data according to the control paraphrase in the local audio.
Compared with the prior art, the beneficial effects of the present invention are as follows. The position coordinates of the image object are first identified from the image data, and the activities of the image object are then identified on the basis of the position coordinates. It is then judged whether the activities and the local audio correspond; if they do, the image object is taken to be correctly identified and the image data with the label symbol is displayed, which keeps the accuracy of the position coordinates high. The activities can be a mouth movement, in which case the corresponding local audio is the spoken voice; the activities can also be a gesture, in which case the corresponding local audio is the sound of wind. The user can use motion and sound at the same time to control or change the displayed image content.
Brief description of the drawings
Fig. 1 is the method flow diagram of embodiment one of the present invention;
Fig. 2 is the device block diagram of embodiment two of the present invention;
Fig. 3 is the component block diagram of the control centre module of embodiment two of the present invention;
Fig. 4 is the component block diagram of the analysis processor of embodiment two of the present invention.
Reference numerals: 1, display screen; 2, control centre module; 21, center processor; 211, matched data component; 212, identify corresponding component; 213, global change component; 22, video-stream processor; 221, side-by-side display component; 3, picture recognition module; 31, camera; 32, recognition processor; 4, audio identification module; 41, audio collection device; 42, analysis processor; 421, initial audio component; 422, corresponding paraphrase component; 423, identify paraphrase component; 424, change paraphrase component; 425, volume initial component; 426, volume corresponding component; 427, volume recognizer component; 428, volume change component.
Specific embodiments
The present invention is described in detail below with reference to the accompanying drawings and embodiments.
Embodiment one
An intelligent display method based on audio-video identification, as shown in Fig. 1, includes the following steps:
Acquiring image data and audio data. The image data is planar or stereoscopic. Planar image data is acquired by a black-and-white camera or a color camera, while stereoscopic image data is acquired by a Kinect device. The audio data can be obtained by spectrum analysis of the audio waveform acquired by a microphone.
Identifying an image object and its position coordinates in the image data. The image object can be a face in the image data. Template matching can be used to match the face and the local images of the face; algorithms for identifying faces and local face images are prior art and are not repeated here. After the face is identified, the coordinates of the face in the image data are the position coordinates.
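The template-matching step can be illustrated with a minimal sketch. It assumes grayscale images stored as nested lists and a sum-of-absolute-differences score; the function name `locate_template` and the toy data are hypothetical and are not taken from the patent, which leaves the face-matching algorithm to the prior art.

```python
def locate_template(image, template):
    """Return (row, col) of the top-left corner where the template fits best.

    Scores each placement by the sum of absolute differences (SAD);
    a lower score means a closer match.
    """
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best_score, best_pos = None, None
    for r in range(img_h - tpl_h + 1):
        for c in range(img_w - tpl_w + 1):
            score = sum(
                abs(image[r + dr][c + dc] - template[dr][dc])
                for dr in range(tpl_h)
                for dc in range(tpl_w)
            )
            if best_score is None or score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


# Toy 4x5 "image" with a bright 2x2 patch standing in for a face.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 8, 9, 0],
    [0, 0, 0, 0, 0],
]
face_template = [[9, 8], [8, 9]]
position = locate_template(image, face_template)  # position coordinates of the match
```

The returned pair plays the role of the position coordinates; a practical system would more likely use an optimized library routine and a normalized score that tolerates lighting changes.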
Identifying, according to the position coordinates, the activities belonging to the image object. Detecting the local image of the face starting from the position coordinates increases the accuracy of the local face image. For example, if the image object is person A and person B is standing next to person A, this prevents the local face image from being taken from a position on person B, which improves the accuracy of the local image. The activities are a combination of several local images along the time axis; the change process across several local images can also be the activities. For example, when the activities concern the mouth, the combination of mouth movements during speech is the activities.
Identifying an audio object and the local audio belonging to the audio object from the audio data. Text is identified from the audio data. Speech recognition is prior art: the speech API interfaces provided by companies such as Baidu, iFLYTEK or Sogou can be called to identify the text information contained in the audio data.
Matching the activities against the local audio. When the mouth speaks the text information, it produces corresponding movements. For example, the activities can be "open" and "flat": when the text information is "good" (as pronounced in Chinese), the activity is open; when the text information is "one", the activity is flat.
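The matching rule can be sketched as a table lookup. The sketch assumes that "good" and "one" in the text stand for the Chinese characters 好 and 一, and that mouth shapes have already been reduced to one label per spoken character; the table `EXPECTED_MOUTH_SHAPE` and the function name are hypothetical.

```python
# Hypothetical table: the mouth shape expected for each recognised character.
EXPECTED_MOUTH_SHAPE = {
    "好": "open",  # "good"
    "一": "flat",  # "one"
}


def activities_match_audio(observed_shapes, recognised_text):
    """Return True when each character's expected mouth shape equals the observed one."""
    if len(observed_shapes) != len(recognised_text):
        return False
    for shape, char in zip(observed_shapes, recognised_text):
        # Characters missing from the table never match.
        if EXPECTED_MOUTH_SHAPE.get(char) != shape:
            return False
    return True
```

For instance, `activities_match_audio(["open", "flat"], "好一")` succeeds, while an observed "flat" shape paired with 好 fails, which corresponds to the failed-match case.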
If the match succeeds, adding a label symbol to the image data according to the position coordinates. When the activities in the image data correspond to the text information in the audio data, the match succeeds; otherwise the match fails. A successful match indicates that the image data is accurate, so a label symbol is added at the corresponding position of the image data according to the position coordinates. The label symbol can be a red frame, or it can be tonal processing of the image data near the position coordinates, such as contrast processing, chroma processing or image distortion processing.
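As an illustration of the red-frame variant of the label symbol, the following sketch draws a one-pixel frame on an RGB image held as nested lists of tuples. The function name `add_red_frame` and the box parameters are hypothetical, and the box is assumed to lie fully inside the image.

```python
RED = (255, 0, 0)


def add_red_frame(pixels, top, left, height, width, color=RED):
    """Return a copy of the image with a 1-pixel frame drawn around the given box."""
    framed = [row[:] for row in pixels]  # copy rows so the input stays unchanged
    bottom, right = top + height - 1, left + width - 1
    for c in range(left, right + 1):
        framed[top][c] = color
        framed[bottom][c] = color
    for r in range(top, bottom + 1):
        framed[r][left] = color
        framed[r][right] = color
    return framed


black = (0, 0, 0)
image = [[black] * 6 for _ in range(5)]
# Frame a 3x3 box whose top-left corner is at the position coordinates (1, 2).
marked = add_red_frame(image, top=1, left=2, height=3, width=3)
```

Returning a copy keeps the unmarked frame available in case the match later fails and the label has to be dropped.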
Displaying the image data with the label symbol. The processed image data is shown on display screen 1.
The position coordinates of the image object are first identified from the image data, and the activities of the image object are then identified on the basis of the position coordinates. It is then judged whether the activities and the local audio correspond; if they do, the image object is taken to be correctly identified and the image data with the label symbol is displayed, which keeps the accuracy of the position coordinates high. The activities can be a mouth movement, in which case the corresponding local audio is the spoken voice; the activities can also be a gesture, in which case the corresponding local audio is the sound of wind. The user can use motion and sound at the same time to control or change the displayed image content.
When processing the audio data, the method also includes:
Establishing an audio interpreted library preset with multiple control paraphrases.
Associating the control paraphrases with the image data or with display properties.
Identifying the control paraphrase in the local audio.
Changing the control paraphrase of the image data according to the control paraphrase in the local audio.
Or, changing the control paraphrase of the display properties according to the control paraphrase in the local audio.
A control paraphrase can be any of multiple words with a control meaning, such as "previous item", "next item" or "increase screen brightness". If "next item" is identified, the next image data or the next image object is displayed; if "increase screen brightness" is identified, the brightness of display screen 1 is increased.
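One way to picture the audio interpreted library is as a table from control phrases to handlers. The sketch below is an assumption-laden illustration: the English phrases stand in for the spoken Chinese commands, and `DisplayState`, the handler functions and the 10-point brightness step are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DisplayState:
    image_index: int = 0
    image_count: int = 3
    brightness: int = 50  # percent, assumed scale


def previous_item(state):
    state.image_index = (state.image_index - 1) % state.image_count


def next_item(state):
    state.image_index = (state.image_index + 1) % state.image_count


def increase_brightness(state):
    state.brightness = min(100, state.brightness + 10)


# Hypothetical audio interpreted library: control paraphrase -> handler.
AUDIO_INTERPRETED_LIBRARY = {
    "previous item": previous_item,
    "next item": next_item,
    "increase screen brightness": increase_brightness,
}


def apply_control_paraphrase(recognised_text, state):
    """Run the first control paraphrase found in the text; return it, or None."""
    for phrase, handler in AUDIO_INTERPRETED_LIBRARY.items():
        if phrase in recognised_text:
            handler(state)
            return phrase
    return None
```

Keeping the phrases in a table means new commands can be preset without touching the recognition code, which is in the spirit of a preset library.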
In some other embodiments, the method also includes:
Establishing an audio interpreted library preset with multiple control paraphrases, and a display interpreted library preset with multiple action paraphrases in one-to-one correspondence with the control paraphrases.
Associating the control paraphrases with the image data;
Associating the action paraphrases with the image data;
Identifying the control paraphrase in the local audio;
Adding to the image data, according to the control paraphrase in the local audio, the action paraphrase corresponding to the image data.
A control paraphrase can be any of multiple words with a control meaning, such as "previous item", "next item" or "increase screen brightness". If "next item" is identified, the next image data or the next image object is displayed; if "increase screen brightness" is identified, the brightness of display screen 1 is increased. An action paraphrase can be any of multiple words with an action meaning, such as "grayscale image", "enlarge" or "shrink". If "grayscale image" is identified, the grayscale version of the image data is displayed; if "enlarge" or "shrink" is identified, the image object in the displayed image data is enlarged or shrunk. The enlargement or reduction factor can be set in advance, or can be determined by a number identified in the subsequent speech.
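The action paraphrases and the optional spoken factor can be sketched as follows. The English keywords, the preset factor of 2.0 and the function names are assumptions for illustration; the grayscale conversion uses the common BT.601 luma weights, which the patent does not specify.

```python
import re

DEFAULT_ZOOM_FACTOR = 2.0  # assumed preset factor


def parse_action(recognised_text):
    """Return (action, factor) for a recognised action paraphrase, or None."""
    if "grayscale" in recognised_text:
        return ("grayscale", None)
    for action in ("enlarge", "shrink"):
        if action in recognised_text:
            # Use the first number spoken after the command, else the preset factor.
            tail = recognised_text.split(action, 1)[1]
            number = re.search(r"\d+(?:\.\d+)?", tail)
            factor = float(number.group()) if number else DEFAULT_ZOOM_FACTOR
            return (action, factor)
    return None


def to_grayscale(pixel):
    """Convert one (r, g, b) pixel to a gray level using BT.601 luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)
```

With this sketch, a recognised "enlarge 3" yields a factor taken from the speech, while a bare "shrink" falls back to the preset factor.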
In some other embodiments, the method also includes:
There are multiple image objects;
Identifying one audio object corresponding to one of the image objects;
Displaying the multiple image data side by side.
After the multiple image data are displayed, the control paraphrase of all image data is changed according to the control paraphrase in the local audio. For example, if "close all images" is identified in the audio data, none of the image data is displayed; if "horizontally arranged" is identified in the audio data, all displayed image data are arranged horizontally.
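The two global commands mentioned above can be sketched as operations over a list of displayed images. The dictionary fields (`width`, `x`, `y`, `visible`) and the function name are hypothetical; the patent does not describe how layout is represented.

```python
def apply_global_command(command, images):
    """Apply a recognised command to every displayed image; return the new list."""
    if command == "close all images":
        # Hide every image without discarding its data.
        return [dict(img, visible=False) for img in images]
    if command == "horizontally arranged":
        arranged, x = [], 0
        for img in images:
            # Place images left to right on one row, each after the previous one.
            arranged.append(dict(img, x=x, y=0))
            x += img["width"]
        return arranged
    return images  # unknown commands leave the display unchanged
```

Each branch builds new dictionaries rather than mutating the inputs, so the previous layout can still be restored if a later command is rejected.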
Embodiment two
An intelligent display device based on audio-video identification, as shown in Fig. 2, is built around display screen 1. Display screen 1 is data-connected to control centre module 2, and picture recognition module 3 and audio identification module 4 are each data-connected to control centre module 2. Picture recognition module 3 acquires and analyzes images, audio identification module 4 acquires and analyzes audio, and control centre module 2 receives the images and analysis results and shows them on display screen 1. In the actual device, display screen 1 is an LED screen, and the driving circuit and control centre module 2 are arranged behind the LED screen.
Picture recognition module 3 includes camera 31 and recognition processor 32 data-connected to camera 31. Camera 31 acquires the image data, and recognition processor 32 receives the image data and identifies an image object and its position coordinates in the image data. Picture recognition module 3 is fixedly mounted at the upper end of display screen 1 and may include a black-and-white camera and/or a color camera and/or a motion-sensing camera. Planar image data is acquired by a black-and-white camera or a color camera, while stereoscopic image data is acquired by a Kinect device. Recognition processor 32 can be an embedded computer, and the embedded chip can be a Qualcomm-series processor.
Picture recognition module 3 acquires image data and sends the image data to control centre module 2.
Audio identification module 4 includes audio collection device 41 and analysis processor 42 data-connected to audio collection device 41. Audio collection device 41 acquires the audio data, and analysis processor 42 receives the audio data and identifies an audio object and the local audio belonging to the audio object from the audio data. Audio identification module 4 acquires audio data and sends the audio data to control centre module 2. Audio collection device 41 can be a microphone, and analysis processor 42 can be an embedded chip. A speech recognition program burned into the chip runs on the embedded chip and identifies the Chinese in the audio offline.
Control centre module 2 can be built from an MCU, a PLC, an industrial computer, a home computer or the like. Control centre module 2 includes center processor 21 data-connected to recognition processor 32 and analysis processor 42. Center processor 21 is data-connected to video-stream processor 22, and video-stream processor 22 is data-connected to display screen 1. Center processor 21 receives the data of recognition processor 32 and analysis processor 42 and drives display screen 1 through video-stream processor 22.
As shown in Fig. 3, center processor 21 includes matched data component 211, which can be implemented with the comparator unit of center processor 21. The comparator unit is a hardware circuit that compares the magnitudes of two values. The Chinese text of the activities is converted to a value a, and the Chinese text of the action corresponding to the Chinese text of the local audio is converted to a value b; a and b are compared in matched data component 211 to determine whether the data are consistent. Matched data component 211 matches the activities against the local audio; if the match succeeds, it sends recognition processor 32 a marking signal whose content is to add a label symbol to the image data according to the position coordinates.
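The comparison of the values a and b, and the marking signal sent on success, can be mimicked in software as below. This is only a sketch of the logic, not of the hardware comparator: the numeric codes, the character table (which assumes "good" and "one" stand for 好 and 一) and the signal layout are all hypothetical.

```python
# Hypothetical numeric codes for mouth shapes.
SHAPE_CODE = {"open": 1, "flat": 2}
# Hypothetical table: recognised character -> mouth shape it should produce.
CHAR_TO_SHAPE = {"好": "open", "一": "flat"}


def build_marking_signal(observed_shape, recognised_char, position):
    """Compare value a (image side) with value b (audio side).

    Returns a marking signal for the recognition processor when they agree,
    otherwise None.
    """
    a = SHAPE_CODE.get(observed_shape)                      # value a
    b = SHAPE_CODE.get(CHAR_TO_SHAPE.get(recognised_char))  # value b
    if a is not None and a == b:
        return {"type": "add_label_symbol", "position": position}
    return None
```

Reducing both sides to small integers is what makes a plain equality comparator sufficient; any richer similarity measure would need more than the comparator unit described here.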
Recognition processor 32 receives the marking signal, modifies and updates the image data according to the marking signal and preset instructions, and sends the image data to center processor 21.
As shown in Fig. 4, analysis processor 42 includes:
Initial audio component 421, for establishing an audio interpreted library preset with multiple control paraphrases. Initial audio component 421 can be a flash unit that retains data when powered down; the preset audio interpreted library is stored offline in the flash unit.
Corresponding paraphrase component 422, data-connected to recognition processor 32, for associating the control paraphrases with the image data or display properties. Corresponding paraphrase component 422 can be a flash unit for the image data or display properties, in which the relation information between the control paraphrases and the image data or display properties is stored offline in advance.
Identify paraphrase component 423, for identifying the control paraphrase in the local audio. Identify paraphrase component 423 is a flash unit in which the recognition program is stored offline.
Change paraphrase component 424, for changing the control paraphrase of the image data according to the control paraphrase in the local audio, or changing the control paraphrase of the display properties according to the control paraphrase in the local audio. Change paraphrase component 424 can be a flash unit for changing the image data or display properties, in which the control instructions for changing the control paraphrase are stored offline. Calling a control instruction changes the control paraphrase.
In some other embodiments, analysis processor 42 includes:
Volume initial component 425, for establishing an audio interpreted library preset with multiple control paraphrases, and a display interpreted library preset with multiple action paraphrases in one-to-one correspondence with the control paraphrases. Volume initial component 425 consists of multiple serially connected flash units, in which the preset audio interpreted library is stored offline.
Volume corresponding component 426, for associating the control paraphrases with the image data and associating the action paraphrases with the image data. Volume corresponding component 426 consists of multiple serially connected flash units, in which the relation information between the control paraphrases and the image data or display properties is stored offline in advance.
Volume recognizer component 427, for identifying the control paraphrase in the local audio. Volume recognizer component 427 consists of multiple serially connected flash units in which the recognition program is stored offline.
Volume change component 428, for adding to the image data, according to the control paraphrase in the local audio, the action paraphrase corresponding to the image data. Volume change component 428 consists of multiple serially connected flash units, in which the control instructions for changing the control paraphrase are stored offline. Calling a control instruction changes the control paraphrase.
There are multiple image objects.
Center processor 21 includes identify corresponding component 212, for identifying one audio object corresponding to one of the image objects. Video-stream processor 22 includes side-by-side display component 221, for displaying the multiple image data side by side; side-by-side display component 221 can be multiple display screens 1 arranged side by side.
Center processor 21 includes global change component 213, for changing the control paraphrase of all image data according to the control paraphrase in the local audio. Global change component 213 can be a flash unit in which a global control instruction for changing the control paraphrases is stored offline. Calling the global control instruction changes all control paraphrases uniformly.
The above is only a preferred embodiment of the present invention. The protection scope of the present invention is not limited to the above embodiments; all technical solutions under the concept of the present invention fall within its protection scope. It should be pointed out that, for those of ordinary skill in the art, improvements and modifications made without departing from the principle of the present invention should also be regarded as within the protection scope of the present invention.

Claims (10)

1. a kind of intelligent display method based on audio-video identification, which comprises the steps of:
Acquire image data and audio data;
From the position coordinates for identifying image object in image data and its in image data;
According to activities belonging to position coordinates identification image object;
Local audio belonging to audio object and audio object is identified from audio data;
Match activities and local audio;
If successful match, label symbol is added in image data according to position coordinates;
Display has the image data of label symbol.
2. The method according to claim 1, characterized by further comprising:
Establishing an audio interpreted library preset with multiple control paraphrases;
Making the control paraphrases correspond to image data or display properties;
Identifying the control paraphrase in the local audio;
Changing the control paraphrase of the image data according to the control paraphrase in the local audio;
Or, changing the control paraphrase of the display properties according to the control paraphrase in the local audio.
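A minimal Python sketch of the lookup described in claim 2 is given below. It assumes the local audio has already been converted to text and that a control paraphrase is found by simple substring search; the library entries, the `State` class and the function names are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical audio interpreted library:
# recognized phrase -> (target kind, control paraphrase).
AUDIO_LIBRARY: Dict[str, Tuple[str, str]] = {
    "zoom in": ("image", "zoom_in"),
    "zoom out": ("image", "zoom_out"),
    "brighter": ("display", "brightness_up"),
}


@dataclass
class State:
    image_control: Dict[str, str] = field(default_factory=dict)  # image id -> control paraphrase
    display_control: Optional[str] = None


def identify_control(local_audio_text: str) -> Optional[Tuple[str, str]]:
    """Return the first library entry whose phrase occurs in the recognized text."""
    text = local_audio_text.lower()
    for phrase, entry in AUDIO_LIBRARY.items():
        if phrase in text:
            return entry
    return None


def apply_control(state: State, image_id: str, local_audio_text: str) -> bool:
    """Change the control paraphrase of one image or of the display; report success."""
    entry = identify_control(local_audio_text)
    if entry is None:
        return False
    target, control = entry
    if target == "image":
        state.image_control[image_id] = control
    else:
        state.display_control = control
    return True
```

The target kind stored with each entry decides which branch of the claim applies: the image data branch or the display properties branch.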
3. The method according to claim 1, characterized by further comprising:
Establishing an audio interpreted library preset with multiple control paraphrases, and a display interpreted library preset with multiple action paraphrases in one-to-one correspondence with the control paraphrases;
Making the control paraphrases correspond to image data;
Making the action paraphrases correspond to image data;
Identifying the control paraphrase in the local audio;
Adding, to the image data, the action paraphrase corresponding to that image data according to the control paraphrase in the local audio.
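The one-to-one relation between control paraphrases and action paraphrases in claim 3 can be sketched in Python as below. The library contents, the dictionary-based image record and the names `is_one_to_one` and `add_action` are assumptions for illustration; the patent does not specify how either library is stored.

```python
from typing import Dict, List, Optional

# Hypothetical libraries: control paraphrases, and the action paraphrase paired with each.
AUDIO_LIBRARY: List[str] = ["applaud", "cheer"]
DISPLAY_LIBRARY: Dict[str, str] = {
    "applaud": "clapping_overlay",
    "cheer": "confetti_overlay",
}


def is_one_to_one(audio_lib: List[str], display_lib: Dict[str, str]) -> bool:
    """Check that each control paraphrase has exactly one distinct action paraphrase."""
    same_keys = set(audio_lib) == set(display_lib)
    distinct_actions = len(set(display_lib.values())) == len(display_lib)
    return same_keys and distinct_actions


def add_action(image: dict, local_audio_text: str) -> Optional[str]:
    """Append the action paraphrase for the first control paraphrase found in the text."""
    text = local_audio_text.lower()
    for control in AUDIO_LIBRARY:
        if control in text:
            action = DISPLAY_LIBRARY[control]
            image.setdefault("actions", []).append(action)
            return action
    return None
```

Checking the pairing up front guards against a control paraphrase with no action, or two control paraphrases sharing one action, either of which would break the one-to-one correspondence the claim requires.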
4. The method according to claim 1, characterized by further comprising:
There being multiple image objects;
Identifying one kind of audio object corresponding to one of the image objects;
Displaying multiple image data side by side.
5. The method according to claim 2 or 3, characterized by further comprising:
Changing the control paraphrase of all image data according to the control paraphrase in the local audio.
6. An intelligent display device based on audio-video identification, characterized in that it is based on a display screen (1); the display screen (1) is data-connected to a control centre module (2); a picture recognition module (3) and an audio identification module (4) are each data-connected to the control centre module (2); the picture recognition module (3) is used to acquire and analyze images, the audio identification module (4) is used to acquire and analyze audio, and the control centre module (2) is used to receive the images and analysis results and to show the images and analysis results on the display screen (1);
The picture recognition module (3) acquires image data and sends the image data to the control centre module (2), and the audio identification module (4) acquires audio data and sends the audio data to the control centre module (2);
Further comprising:
The picture recognition module (3) includes a camera (31) and a recognition processor (32) data-connected to the camera (31); the camera (31) is used to acquire image data, and the recognition processor (32) is used to receive the image data and identify, from the image data, an image object and its position coordinates in the image data;
The audio identification module (4) includes an audio collection device (41) and an analysis processor (42) data-connected to the audio collection device (41); the audio collection device (41) is used to acquire audio data, and the analysis processor (42) is used to receive the audio data and identify, from the audio data, an audio object and the local audio to which the audio object belongs;
The control centre module (2) includes a center processor (21) data-connected to the recognition processor (32) and the analysis processor (42); the center processor (21) is data-connected to a video-stream processor (22), and the video-stream processor (22) is data-connected to the display screen (1); the center processor (21) receives the data from the recognition processor (32) and the analysis processor (42) and drives the display screen (1) to display through the video-stream processor (22);
The center processor (21) includes a matched data component (211); the matched data component (211) is used to match the activities against the local audio and, if the match succeeds, to send the recognition processor (32) a marking signal whose content is to add a label symbol to the image data according to the position coordinates;
The recognition processor (32) receives the marking signal and, according to the marking signal, modifies and updates the image data following a pre-set instruction; the recognition processor (32) then sends the image data to the center processor (21).
7. The device according to claim 6, characterized in that the analysis processor (42) includes:
An initial audio component (421), for establishing an audio interpreted library preset with multiple control paraphrases;
A corresponding paraphrase component (422), data-connected to the recognition processor (32), for making the control paraphrases correspond to image data or display properties;
An identifying paraphrase component (423), for identifying the control paraphrase in the local audio;
A changing paraphrase component (424), for changing the control paraphrase of the image data according to the control paraphrase in the local audio, or changing the control paraphrase of the display properties according to the control paraphrase in the local audio.
8. The device according to claim 6, characterized in that the analysis processor (42) includes:
A volume initial component (425), for establishing an audio interpreted library preset with multiple control paraphrases, and a display interpreted library preset with multiple action paraphrases in one-to-one correspondence with the control paraphrases;
A volume corresponding component (426), for making the control paraphrases correspond to image data and making the action paraphrases correspond to image data;
A volume recognizer component (427), for identifying the control paraphrase in the local audio;
A volume changing component (428), for adding, to the image data, the action paraphrase corresponding to that image data according to the control paraphrase in the local audio.
9. The device according to claim 6, characterized in that there are multiple image objects;
The center processor (21) includes an identifying corresponding component (212), for identifying one kind of audio object corresponding to one of the image objects;
The video-stream processor (22) includes a side-by-side display component (221), for displaying multiple image data side by side.
10. The device according to claim 7 or 8, characterized by further comprising:
The center processor (21) includes a global change component (213), for changing the control paraphrase of all image data according to the control paraphrase in the local audio.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910296455.5A CN110021297A (en) 2019-04-13 2019-04-13 A kind of intelligent display method and its device based on audio-video identification

Publications (1)

Publication Number Publication Date
CN110021297A true CN110021297A (en) 2019-07-16

Family

ID=67191283

Country Status (1)

Country Link
CN (1) CN110021297A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328086A (en) * 2020-11-14 2021-02-05 上海卓腾展览展示有限公司 Intelligent display method, system and device based on video identification and storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063903A (en) * 2010-09-25 2011-05-18 中国科学院深圳先进技术研究院 Speech interactive training system and speech interactive training method
CN102194456A (en) * 2010-03-11 2011-09-21 索尼公司 Information processing device, information processing method and program
US20120201404A1 (en) * 2011-02-09 2012-08-09 Canon Kabushiki Kaisha Image information processing apparatus and control method therefor
CN102932212A (en) * 2012-10-12 2013-02-13 华南理工大学 Intelligent household control system based on multichannel interaction manner
CN104428832A (en) * 2012-07-09 2015-03-18 Lg电子株式会社 Speech recognition apparatus and method
US20150088515A1 (en) * 2013-09-25 2015-03-26 Lenovo (Singapore) Pte. Ltd. Primary speaker identification from audio and video data
CN104966053A (en) * 2015-06-11 2015-10-07 腾讯科技(深圳)有限公司 Face recognition method and recognition system
TW201643689A (en) * 2015-05-19 2016-12-16 卡訊電子股份有限公司 Broadcast control system, method, computer program product and computer readable medium
CN106875947A (en) * 2016-12-28 2017-06-20 北京光年无限科技有限公司 For the speech output method and device of intelligent robot
CN108227903A (en) * 2016-12-21 2018-06-29 深圳市掌网科技股份有限公司 A kind of virtual reality language interactive system and method
CN108259844A (en) * 2018-03-29 2018-07-06 合肥惠科金扬科技有限公司 Intelligent display device with face and speech identifying function
CN208000587U (en) * 2018-03-08 2018-10-23 上海分泽时代软件技术有限公司 Image based on big data and speech recognition system
CN108831462A (en) * 2018-06-26 2018-11-16 北京奇虎科技有限公司 Vehicle-mounted voice recognition methods and device
CN109448708A (en) * 2018-10-15 2019-03-08 四川长虹电器股份有限公司 Far field voice wakes up system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination