CN110021297A

CN110021297A - Intelligent display method and device based on audio-video recognition

Info

Publication number: CN110021297A
Application number: CN201910296455.5A
Authority: CN
Inventors: 倪雪平; 尹大海; 金文俊; 倪末萍
Original assignee: Shanghai Ying Long Opto Electronic Co Ltd
Current assignee: Shanghai Ying Long Opto Electronic Co Ltd
Priority date: 2019-04-13
Filing date: 2019-04-13
Publication date: 2019-07-16

Abstract

The present invention relates to the field of interactive display technology and discloses an intelligent display method and device based on audio-video recognition. The method includes: acquiring image data and audio data; identifying an image target in the image data and its position coordinates in the image data; identifying, according to the position coordinates, the local action belonging to the image target; identifying an audio target and the local audio belonging to the audio target from the audio data; matching the local action with the local audio; if the match succeeds, adding a marker symbol to the image data according to the position coordinates; and displaying the image data. By judging whether the local action corresponds to the local audio, the method confirms that the image target has been identified correctly before the image data carrying the marker symbol is displayed, which keeps the accuracy of the position coordinates high. The user can use both movement and sound to take part in controlling or changing the displayed picture content.

Description

Intelligent display method and device based on audio-video recognition

Technical field

The present invention relates to the field of interactive display technology, and more specifically to an intelligent display method and device based on audio-video recognition.

Background

With the development of interactive display technology, cameras have been added to LED screens: the camera captures video images of the area in front of the LED screen, and the LED screen displays the picture captured by the camera.

To realize interaction, the camera is connected to a computing module that reads the video images; the computing module may be a mobile workstation. The computing module performs a preliminary identification of face images in the video images by template matching, marks the position of each face image in the video image, and sends the position to the LED screen. The LED screen displays the position, so that people watching the LED screen can see where their own faces are located, which realizes a preliminary form of interaction.

However, the prior art does not bring audio into the interaction, so users cannot use both their own movements and their voice to change the image on the LED screen.

Summary of the invention

In view of the problems in the prior art, a first object of the present invention is to provide an intelligent display method based on audio-video recognition, with the advantage that a user can use both movement and sound to take part in controlling or changing the displayed picture content. A second object of the present invention is to provide an intelligent display device based on audio-video recognition with the same advantage.

To achieve the first object, the present invention provides the following technical solution:

An intelligent display method based on audio-video recognition, comprising the following steps:

Acquire image data and audio data;

Identify an image target in the image data and its position coordinates in the image data;

Identify, according to the position coordinates, the local action belonging to the image target;

Identify an audio target and the local audio belonging to the audio target from the audio data;

Match the local action with the local audio;

If the match succeeds, add a marker symbol to the image data according to the position coordinates;

Display the image data carrying the marker symbol.

With the above technical solution, the position coordinates of the image target are first identified from the image data, and the local action of the image target is then identified on the basis of those coordinates. The method judges whether the local action corresponds to the local audio; if it does, the image target has been identified correctly, and the image data carrying the marker symbol is displayed, which keeps the accuracy of the position coordinates high. The local action may be a mouth movement, in which case the corresponding local audio is the spoken voice; the local action may also be a gesture, in which case the corresponding local audio is the sound of the wind it produces. The user can thus use both movement and sound to take part in controlling or changing the displayed picture content.
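The gating logic at the heart of these steps (mark the target only when its local action agrees with the local audio) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `mark_if_consistent`, the `expected_action` lookup table and the marker tuple format are hypothetical, and the recognition steps are assumed to have already produced the position coordinates, the observed local action and the recognized text.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]  # (row, col) position coordinates of the image target


def mark_if_consistent(
    markers: List[Tuple[str, Position]],
    position: Optional[Position],
    observed_action: str,
    recognized_text: str,
    expected_action: Dict[str, str],
) -> bool:
    """Add a marker at `position` only if the observed local action matches
    the action expected for the recognized local audio."""
    if position is None:
        return False  # no image target was found in the image data
    expected = expected_action.get(recognized_text)
    if expected is None or expected != observed_action:
        return False  # match failed: the image data stays unmarked
    markers.append(("marker", position))  # match succeeded
    return True
```

For example, with the table `{"good": "open"}`, an open mouth at `(120, 64)` while "good" is recognized adds one marker, whereas a flat mouth adds nothing.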

Further, the method also includes:

Establishing an audio interpretation library in which multiple control meanings are preset;

Associating each control meaning with the image data or with a display property;

Identifying the control meaning in the local audio;

Changing the image data according to the control meaning identified in the local audio;

Or, changing the display property according to the control meaning identified in the local audio.

With the above technical solution, the control meanings may be words that carry a control meaning, such as "previous item", "next item" or "increase screen brightness". If "next item" is identified, the next image data or the next image target is displayed; if "increase screen brightness" is identified, the brightness of the display screen is increased.
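An audio interpretation library of this kind can be sketched as a dictionary from preset control phrases to handlers that change the display state. This is only an assumption-laden illustration: the `DisplayState` fields, the English phrases, the handler names and the one-level brightness step are hypothetical, and the recognized text is assumed to come from an upstream speech recognizer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class DisplayState:
    index: int = 0            # which image data / image target is currently shown
    count: int = 1            # number of available images
    brightness: int = 5       # current brightness level
    max_brightness: int = 10  # highest allowed level


def previous_item(s: DisplayState) -> None:
    s.index = (s.index - 1) % s.count  # wrap around to the last image


def next_item(s: DisplayState) -> None:
    s.index = (s.index + 1) % s.count  # wrap around to the first image


def brighter(s: DisplayState) -> None:
    s.brightness = min(s.brightness + 1, s.max_brightness)


# Hypothetical audio interpretation library: preset control phrase -> handler.
CONTROL_LIBRARY: Dict[str, Callable[[DisplayState], None]] = {
    "previous item": previous_item,
    "next item": next_item,
    "increase screen brightness": brighter,
}


def apply_control(state: DisplayState, recognized_text: str) -> Optional[str]:
    """Run the handler of the first preset phrase found in the recognized text."""
    for phrase, handler in CONTROL_LIBRARY.items():
        if phrase in recognized_text:
            handler(state)
            return phrase
    return None
```

Keeping the phrases in data rather than in branching code means the library can be extended, or stored offline, without changing the dispatch function.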

Further, the method also includes:

Establishing an audio interpretation library in which multiple control meanings are preset, and a display interpretation library in which multiple action meanings are preset in one-to-one correspondence with the control meanings;

Associating each control meaning with the image data;

Associating each action meaning with the image data;

Identifying the control meaning in the local audio;

Applying the action meaning corresponding to the image data to the image data, according to the control meaning identified in the local audio.

With the above technical solution, the control meanings may be words such as "previous item", "next item" or "increase screen brightness": if "next item" is identified, the next image data or the next image target is displayed, and if "increase screen brightness" is identified, the brightness of the display screen is increased. The action meanings may be words that carry an action meaning, such as "grayscale", "enlarge" or "shrink": if "grayscale" is identified, a grayscale version of the image data is displayed; if "enlarge" or "shrink" is identified, the image target in the displayed image data is enlarged or reduced. The enlargement or reduction factor may be set in advance, or determined by a number recognized from subsequent speech.
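The "grayscale" action meaning can be sketched as an entry in a display interpretation library that maps an action phrase to an image operation. The patent does not say how the grayscale image is computed; the ITU-R BT.601 luma weights used here are an assumption, and the names `to_grayscale`, `ACTION_LIBRARY` and `apply_action` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]


def to_grayscale(image: Image) -> List[List[int]]:
    """Convert an RGB image to 8-bit gray levels with ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in image]


# Hypothetical display interpretation library: action phrase -> image operation.
ACTION_LIBRARY: Dict[str, Callable[[Image], list]] = {
    "grayscale": to_grayscale,
}


def apply_action(image: Image, recognized_text: str) -> Optional[list]:
    """Return the transformed image for the first action phrase found, else None."""
    for phrase, operation in ACTION_LIBRARY.items():
        if phrase in recognized_text:
            return operation(image)
    return None
```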

Further, the method also includes:

There are multiple image targets;

Identifying the audio target that corresponds to one of the image targets;

Displaying the multiple pieces of image data side by side.
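The text does not state how an audio target is attributed to one of several image targets. One plausible technique, swapped in here as an assumption, is to correlate each target's per-frame mouth opening with the per-frame audio energy and choose the best-correlated target. The function names and input format below are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def speaking_target(mouth_opening: Dict[str, List[float]],
                    audio_energy: List[float]) -> str:
    """Pick the image target whose per-frame mouth opening best follows the audio energy."""
    return max(mouth_opening, key=lambda t: pearson(mouth_opening[t], audio_energy))
```

A target whose mouth stays still gets a correlation of 0 and is never preferred over one that moves in step with the sound.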

Further, the method also includes:

Applying the control meaning identified in the local audio to all of the image data.

To achieve the second object, the present invention provides the following technical solution:

An intelligent display device based on audio-video recognition, comprising a display screen, a control center module in data connection with the display screen, and an image recognition module and an audio recognition module each in data connection with the control center module. The image recognition module acquires and analyzes images, the audio recognition module acquires and analyzes audio, and the control center module receives the images and analysis results and displays them on the display screen;

The image recognition module acquires image data and sends it to the control center module, and the audio recognition module acquires audio data and sends it to the control center module;

Further:

The image recognition module comprises a camera and a recognition processor in data connection with the camera. The camera acquires image data, and the recognition processor receives the image data and identifies an image target in the image data and its position coordinates in the image data;

The audio recognition module comprises an audio collector and an analysis processor in data connection with the audio collector. The audio collector acquires audio data, and the analysis processor receives the audio data and identifies an audio target and the local audio belonging to the audio target from the audio data;

The control center module comprises a central processor in data connection with the recognition processor and the analysis processor. The central processor is in data connection with a display processor, which is in data connection with the display screen. The central processor receives data from the recognition processor and the analysis processor and drives the display screen through the display processor;

The central processor includes a data matching component that matches the local action with the local audio. If the match succeeds, it sends the recognition processor a marking signal instructing it to add a marker symbol to the image data according to the position coordinates;

On receiving the marking signal, the recognition processor modifies and updates the image data according to a preset instruction, and sends the image data to the central processor.
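The message flow between the processors described above can be sketched in software as follows. This is a minimal model under stated assumptions: the class and method names, the `MarkingSignal` type and the dictionary used as "image data" are hypothetical, and the "preset instruction" is reduced to recording the marker position.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MarkingSignal:
    position: Tuple[int, int]  # where the marker symbol should be added


@dataclass
class RecognitionProcessor:
    markers: List[Tuple[int, int]] = field(default_factory=list)

    def on_marking_signal(self, signal: MarkingSignal) -> dict:
        # Hypothetical "preset instruction": record a marker at the position.
        self.markers.append(signal.position)
        return {"markers": list(self.markers)}  # updated image data


@dataclass
class DisplayProcessor:
    shown: List[dict] = field(default_factory=list)

    def show(self, image_data: dict) -> None:
        self.shown.append(image_data)  # stands in for driving the display screen


@dataclass
class CenterProcessor:
    recognition: RecognitionProcessor
    display: DisplayProcessor

    def on_results(self, position: Tuple[int, int],
                   local_action: str, audio_action: str) -> bool:
        """Data matching component: compare, then signal, receive and display."""
        if local_action != audio_action:
            return False  # match failed: no marking signal is sent
        image_data = self.recognition.on_marking_signal(MarkingSignal(position))
        self.display.show(image_data)
        return True
```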

Further, the analysis processor includes:

An initial audio component, which establishes an audio interpretation library in which multiple control meanings are preset;

A meaning correspondence component, in data connection with the recognition processor, which associates each control meaning with the image data or with a display property;

A meaning identification component, which identifies the control meaning in the local audio;

A meaning change component, which changes the image data according to the control meaning identified in the local audio, or changes the display property according to that control meaning.

Further, the analysis processor includes:

A volume initial component, which establishes an audio interpretation library in which multiple control meanings are preset, and a display interpretation library in which multiple action meanings are preset in one-to-one correspondence with the control meanings;

A volume correspondence component, which associates the control meanings and the corresponding action meanings with the image data;

A volume recognition component, which identifies the control meaning in the local audio;

A volume change component, which applies the action meaning corresponding to the image data to the image data, according to the control meaning identified in the local audio.

Further, there are multiple image targets;

The central processor includes an identification correspondence component, which identifies the audio target corresponding to one of the image targets;

The display processor includes a side-by-side display component, which displays the multiple pieces of image data side by side.

Further:

The central processor includes a global change component, which applies the control meaning identified in the local audio to all of the image data.

Compared with the prior art, the beneficial effects of the present invention are as follows. The position coordinates of the image target are first identified from the image data, and the local action of the image target is then identified on the basis of those coordinates. The method judges whether the local action corresponds to the local audio; if it does, the image target has been identified correctly, and the image data carrying the marker symbol is displayed, which keeps the accuracy of the position coordinates high. The local action may be a mouth movement, in which case the corresponding local audio is the spoken voice; the local action may also be a gesture, in which case the corresponding local audio is the sound of the wind it produces. The user can thus use both movement and sound to take part in controlling or changing the displayed picture content.

Brief description of the drawings

Fig. 1 is a flow chart of the method of Embodiment 1;

Fig. 2 is a block diagram of the device of Embodiment 2;

Fig. 3 is a block diagram of the components of the control center module in Embodiment 2;

Fig. 4 is a block diagram of the components of the analysis processor in Embodiment 2.

Reference numerals: 1, display screen; 2, control center module; 21, central processor; 211, data matching component; 212, identification correspondence component; 213, global change component; 22, display processor; 221, side-by-side display component; 3, image recognition module; 31, camera; 32, recognition processor; 4, audio recognition module; 41, audio collector; 42, analysis processor; 421, initial audio component; 422, meaning correspondence component; 423, meaning identification component; 424, meaning change component; 425, volume initial component; 426, volume correspondence component; 427, volume recognition component; 428, volume change component.

Detailed description of the embodiments

The present invention is described in detail below with reference to the accompanying drawings and embodiments.

Embodiment 1

An intelligent display method based on audio-video recognition, as shown in Fig. 1, comprises the following steps:

Acquire image data and audio data. The image data may be planar (2D) or stereoscopic (3D): planar image data is acquired by a black-and-white or color camera, while stereoscopic image data is acquired by a Kinect device. The audio data may be obtained by spectrum analysis of the audio waveform captured by a microphone.
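The spectrum analysis step can be illustrated with a naive discrete Fourier transform that finds the strongest frequency in a block of microphone samples. This is a teaching sketch only, assuming mono samples in a Python list; the function names are hypothetical, and a real system would use an FFT library and windowed frames rather than this O(n²) loop.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (use an FFT library in practice)."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def dominant_frequency(samples: Sequence[float], sample_rate: float) -> float:
    """Frequency in Hz of the strongest non-DC bin."""
    spectrum = magnitude_spectrum(samples)
    k = max(range(1, len(spectrum)), key=spectrum.__getitem__)  # skip bin 0 (DC)
    return k * sample_rate / len(samples)
```

For a 100 Hz tone sampled at 800 Hz over 64 samples, the energy falls exactly in bin 8, so the function reports 100 Hz.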

Identify an image target in the image data and its position coordinates in the image data. The image target may be a face in the image data, and template matching can be used to match the face and its local regions (partial images). Algorithms for identifying faces and their local regions are prior art and are not described further here. Once a face is identified, the coordinates of the face in the image data are taken as the position coordinates.
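Template matching, which the text names but does not detail, can be sketched as an exhaustive sum-of-squared-differences (SSD) search that returns the top-left corner of the best match as the position coordinates. This pure-Python version assumes a single grayscale template and images stored as lists of rows; the name `match_template` is hypothetical. In practice a library routine such as OpenCV's `cv2.matchTemplate` with `cv2.TM_SQDIFF`, combined with a multi-scale search, would be the more usual choice.

```python
from typing import List, Optional, Tuple


def match_template(image: List[List[int]],
                   template: List[List[int]]) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the window with the smallest sum of squared differences."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best, best_pos = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum((image[r + i][c + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if best is None or ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos  # None if the template is larger than the image
```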

Identify, according to the position coordinates, the local action belonging to the image target. The local images of the face are detected using the position coordinates as the starting point, which increases the accuracy of the local face images. For example, if the image target is person A and person B stands next to person A, this prevents a local region that actually belongs to person B from being attributed to person A. A local action is the combination of multiple local images along the time axis; the change process across several local images can also constitute a local action. For example, if the local region is the mouth, the combination of mouth movements while speaking is the local action.
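The idea of a local action as "a combination of local images along the time axis", anchored at the position coordinates, can be sketched by cropping the same window from every frame and measuring how much it changes between frames. The window size and the absolute-difference motion measure are assumptions for illustration; the function names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]


def crop(frame: Frame, position: Tuple[int, int], height: int, width: int) -> Frame:
    """Local image starting at the position coordinates (row, col)."""
    r, c = position
    return [row[c:c + width] for row in frame[r:r + height]]


def local_action_patches(frames: List[Frame], position: Tuple[int, int],
                         height: int, width: int) -> List[Frame]:
    """The local action as the sequence of local images over time."""
    return [crop(f, position, height, width) for f in frames]


def motion_energy(patches: List[Frame]) -> List[int]:
    """Sum of absolute pixel differences between consecutive local images."""
    return [sum(abs(a - b) for ra, rb in zip(p, q) for a, b in zip(ra, rb))
            for p, q in zip(patches, patches[1:])]
```

Because every crop starts at the target's own coordinates, pixels belonging to a neighbouring person outside the window never enter the measurement.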

Identify an audio target and the local audio belonging to the audio target from the audio data. Text is recognized from the audio data. Speech recognition is prior art: the speech API interfaces provided by companies such as Baidu, iFlytek or Sogou can be called to recognize the text information contained in the audio data.

Match the local action with the local audio. When the mouth speaks the text, it produces corresponding movements. For example, the local action may be "open" or "spread flat": when the text is "good", the local action is "open"; when the text is "one", the local action is "spread flat".
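The matching step can be sketched with a lookup table built from the two examples in the text ("good" with an open mouth, "one" with a mouth spread flat). The table, the English word keys standing in for the recognized Chinese text, and the function name are hypothetical; a real system would need a much larger word-to-mouth-shape mapping.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical table from the text's examples:
# recognized word -> mouth shape expected while saying it.
EXPECTED_MOUTH_SHAPE: Dict[str, str] = {
    "good": "open",
    "one": "flat",
}


def action_matches_audio(words: Sequence[str], observed_shapes: Sequence[str]) -> bool:
    """True when every recognized word is accompanied, in order, by its expected mouth shape."""
    expected: List[Optional[str]] = [EXPECTED_MOUTH_SHAPE.get(w) for w in words]
    return None not in expected and expected == list(observed_shapes)
```

Unknown words make the match fail rather than pass, which keeps the marker from being added on uncertain evidence.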

If the match succeeds, add a marker symbol to the image data according to the position coordinates. The match succeeds when the local action in the image data corresponds to the text information in the audio data, and fails otherwise. A successful match indicates that the image data is accurate, so a marker symbol is added at the corresponding position in the image data according to the position coordinates. The marker symbol may be a red frame, or a tonal processing of the image data near the position coordinates, such as contrast adjustment, chroma adjustment or image distortion.
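The red-frame marker symbol can be sketched as drawing a one-pixel rectangle outline on a copy of an RGB image. The image representation (rows of `(R, G, B)` tuples), the box size and the function name `draw_red_box` are assumptions for illustration.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def draw_red_box(image: List[List[RGB]], top_left: Tuple[int, int],
                 height: int, width: int,
                 color: RGB = (255, 0, 0)) -> List[List[RGB]]:
    """Return a copy of the image with a 1-pixel rectangle outline at top_left."""
    out = [list(row) for row in image]  # copy rows so the input is not modified
    r0, c0 = top_left
    r1 = min(r0 + height - 1, len(out) - 1)     # clip the box to the image
    c1 = min(c0 + width - 1, len(out[0]) - 1)
    for c in range(c0, c1 + 1):  # top and bottom edges
        out[r0][c] = color
        out[r1][c] = color
    for r in range(r0, r1 + 1):  # left and right edges
        out[r][c0] = color
        out[r][c1] = color
    return out
```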

Display the image data carrying the marker symbol. The processed image data is displayed on the display screen 1.

The position coordinates of the image target are first identified from the image data, and the local action of the image target is then identified on the basis of those coordinates. The method judges whether the local action corresponds to the local audio; if it does, the image target has been identified correctly, and the image data carrying the marker symbol is displayed, which keeps the accuracy of the position coordinates high. The local action may be a mouth movement, in which case the corresponding local audio is the spoken voice; the local action may also be a gesture, in which case the corresponding local audio is the sound of the wind it produces. The user can thus use both movement and sound to take part in controlling or changing the displayed picture content.

When processing the audio data, the method further includes:

Establishing an audio interpretation library in which multiple control meanings are preset.

Associating each control meaning with the image data or with a display property.

Identifying the control meaning in the local audio.

Changing the image data according to the control meaning identified in the local audio.

The control meanings may be words that carry a control meaning, such as "previous item", "next item" or "increase screen brightness". If "next item" is identified, the next image data or the next image target is displayed; if "increase screen brightness" is identified, the brightness of the display screen is increased by one level.

In some other embodiments, the method further includes:

Establishing an audio interpretation library in which multiple control meanings are preset, and a display interpretation library in which multiple action meanings are preset in one-to-one correspondence with the control meanings.

Associating each control meaning with the image data;

Associating each action meaning with the image data;

Identifying the control meaning in the local audio;

The control meanings may be words such as "previous item", "next item" or "increase screen brightness": if "next item" is identified, the next image data or the next image target is displayed, and if "increase screen brightness" is identified, the brightness of the display screen is increased by one level. The action meanings may be words that carry an action meaning, such as "grayscale", "enlarge" or "shrink": if "grayscale" is identified, a grayscale version of the image data is displayed; if "enlarge" or "shrink" is identified, the image target in the displayed image data is enlarged or reduced. The enlargement or reduction factor may be set in advance, or determined by a number recognized from subsequent speech.
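Choosing the enlargement or reduction factor, either from a preset value or from a number spoken afterwards, and then resizing the image can be sketched as below. The preset value of 2, the assumption that the recognizer outputs the number as digits, and nearest-neighbour resampling are all illustrative choices not taken from the text; the function names are hypothetical.

```python
import re
from typing import List

PRESET_FACTOR = 2.0  # hypothetical preset factor used when no number is spoken


def zoom_factor(command: str, following_text: str = "") -> float:
    """Scale factor for "enlarge"/"shrink", taken from a spoken number if one follows."""
    match = re.search(r"\d+(?:\.\d+)?", following_text)
    factor = float(match.group()) if match else PRESET_FACTOR
    if factor <= 0:
        factor = PRESET_FACTOR  # ignore meaningless factors such as "0"
    return factor if command == "enlarge" else 1.0 / factor


def resize_nearest(image: List[List[int]], factor: float) -> List[List[int]]:
    """Nearest-neighbour resize of a 2-D image by `factor`."""
    h, w = len(image), len(image[0])
    nh, nw = max(1, round(h * factor)), max(1, round(w * factor))
    return [[image[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
             for c in range(nw)]
            for r in range(nh)]
```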

In some other embodiments, the method further includes:

There are multiple image targets;

Identifying the audio target that corresponds to one of the image targets;

Displaying the multiple pieces of image data side by side.

After the multiple pieces of image data are displayed, the control meaning identified in the local audio is applied to all of the image data. For example, if "close all images" is identified in the audio data, none of the image data is displayed; if "arrange horizontally" is identified, all of the displayed image data is arranged horizontally.
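A global change that applies one control meaning to every displayed image can be sketched as follows. The dictionary layout of each image entry (`visible`, `width`, `position`) and the English command phrases are hypothetical stand-ins for whatever the display processor actually stores.

```python
from typing import Dict, List


def apply_global(images: List[Dict], recognized_text: str) -> List[Dict]:
    """Apply one control meaning to all image data (hypothetical layout model)."""
    if "close all images" in recognized_text:
        for img in images:
            img["visible"] = False  # none of the image data is displayed
    elif "arrange horizontally" in recognized_text:
        x = 0
        for img in images:
            if img["visible"]:
                img["position"] = (x, 0)  # place images left to right in one row
                x += img["width"]
    return images
```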

Embodiment 2

An intelligent display device based on audio-video recognition, as shown in Fig. 2, comprises a display screen 1, a control center module 2 in data connection with the display screen 1, and an image recognition module 3 and an audio recognition module 4 each in data connection with the control center module 2. The image recognition module 3 acquires and analyzes images, the audio recognition module 4 acquires and analyzes audio, and the control center module 2 receives the images and analysis results and displays them on the display screen 1. In the actual device, the display screen 1 is an LED screen, with the driving circuit and the control center module 2 mounted behind it.

The image recognition module 3 comprises a camera 31 and a recognition processor 32 in data connection with the camera 31. The camera 31 acquires image data, and the recognition processor 32 receives the image data and identifies an image target in the image data and its position coordinates in the image data. The image recognition module 3 is fixed at the upper edge of the display screen 1 and may include a black-and-white camera, a color camera and/or a motion-sensing camera. Planar image data is acquired by the black-and-white or color camera, while stereoscopic image data is acquired by a Kinect device. The recognition processor 32 may be an embedded computer, and its embedded chip may be a Qualcomm-series processor.

The image recognition module 3 acquires image data and sends it to the control center module 2.

The audio recognition module 4 comprises an audio collector 41 and an analysis processor 42 in data connection with the audio collector 41. The audio collector 41 acquires audio data, and the analysis processor 42 receives the audio data and identifies an audio target and the local audio belonging to the audio target from the audio data. The audio recognition module 4 acquires audio data and sends it to the control center module 2. The audio collector 41 may be a microphone, and the analysis processor 42 may be an embedded chip running a built-in speech recognition program that recognizes Chinese in the audio offline.

The control center module 2 may be implemented with an MCU, a PLC, an industrial computer, a home computer or similar. The control center module 2 comprises a central processor 21 in data connection with the recognition processor 32 and the analysis processor 42. The central processor 21 is in data connection with a display processor 22, which is in data connection with the display screen 1. The central processor 21 receives data from the recognition processor 32 and the analysis processor 42 and drives the display screen 1 through the display processor 22.

As shown in Fig. 3, the central processor 21 includes a data matching component 211, which may use the comparator unit of the central processor 21. A comparator unit is a hardware circuit that compares the magnitudes of two values: the Chinese label of the local action is converted to a value a, the Chinese label of the action corresponding to the text of the local audio is converted to a value b, and a and b are compared in the data matching component 211 to determine whether the data are consistent. The data matching component 211 matches the local action with the local audio; if the match succeeds, it sends the recognition processor 32 a marking signal instructing it to add a marker symbol to the image data according to the position coordinates.
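The text does not specify how a Chinese label is converted to a number before the comparator compares a and b. One possible software stand-in, assumed here purely for illustration, reads the label's UTF-8 bytes as a big-endian integer, so that distinct labels that do not begin with a NUL character get distinct numbers, and then applies a three-way comparison like a magnitude comparator. All function names are hypothetical.

```python
def to_number(label: str) -> int:
    """Map a text label to an integer by reading its UTF-8 bytes as big-endian."""
    return int.from_bytes(label.encode("utf-8"), "big")


def comparator(a: int, b: int) -> int:
    """Software stand-in for a magnitude comparator: -1 if a < b, 0 if equal, 1 if a > b."""
    return (a > b) - (a < b)


def labels_match(local_action_label: str, audio_action_label: str) -> bool:
    """Data matching: the labels agree when the comparator reports equality."""
    return comparator(to_number(local_action_label), to_number(audio_action_label)) == 0
```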

On receiving the marking signal, the recognition processor 32 modifies and updates the image data according to a preset instruction, and sends the image data to the central processor 21.

As shown in Fig. 4, the analysis processor 42 includes:

An initial audio component 421, which establishes an audio interpretation library in which multiple control meanings are preset. The initial audio component 421 may be a flash memory unit that retains its contents when power is lost, with the configured audio interpretation library stored offline inside it.

A meaning correspondence component 422, in data connection with the recognition processor 32, which associates each control meaning with the image data or with a display property. The meaning correspondence component 422 may be a flash unit for the image data or display properties, in which the relationship between each control meaning and the image data or display property is stored offline in advance.

A meaning identification component 423, which identifies the control meaning in the local audio. The meaning identification component 423 is a flash unit in which a recognition program is stored offline.

A meaning change component 424, which changes the image data according to the control meaning identified in the local audio, or changes the display property according to that control meaning. The meaning change component 424 may be a flash unit for changing the image data or display properties, in which the control instructions for applying control meanings are stored offline; calling a control instruction applies the corresponding control meaning.

In some other embodiments, the analysis processor 42 includes:

A volume initial component 425, which establishes an audio interpretation library in which multiple control meanings are preset, and a display interpretation library in which multiple action meanings are preset in one-to-one correspondence with the control meanings. The volume initial component 425 consists of multiple flash units connected in series, in which the configured audio interpretation library is stored offline.

A volume correspondence component 426, which associates the control meanings and the corresponding action meanings with the image data. The volume correspondence component 426 consists of multiple flash units connected in series, in which the relationship between each control meaning and the image data or display property is stored offline in advance.

A volume recognition component 427, which identifies the control meaning in the local audio. The volume recognition component 427 consists of multiple flash units connected in series, in which a recognition program is stored offline.

A volume change component 428, which applies the action meaning corresponding to the image data to the image data, according to the control meaning identified in the local audio. The volume change component 428 consists of multiple flash units connected in series, in which the control instructions for applying control meanings are stored offline; calling a control instruction applies the corresponding control meaning.

There are multiple image targets.

The central processor 21 includes an identification correspondence component 212, which identifies the audio target corresponding to one of the image targets. The display processor 22 includes a side-by-side display component 221, which displays multiple pieces of image data side by side; the side-by-side display component 221 may be multiple display screens 1 arranged side by side.
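Side-by-side display can be sketched as splitting the available width into equal columns, one per piece of image data; each column could equally correspond to one of several display screens 1 placed next to each other. The equal-width layout, the optional gap and the function name `side_by_side` are assumptions for illustration.

```python
from typing import List, Tuple


def side_by_side(count: int, screen_width: int, screen_height: int,
                 gap: int = 0) -> List[Tuple[int, int, int, int]]:
    """Split the display into `count` equal columns; returns (x, y, w, h) per image."""
    width = (screen_width - gap * (count - 1)) // count  # integer pixel width per column
    return [(i * (width + gap), 0, width, screen_height) for i in range(count)]
```

For three images on a 1920 x 1080 screen this yields three 640-pixel-wide columns starting at x = 0, 640 and 1280.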

The central processor 21 includes a global change component 213, which applies the control meaning identified in the local audio to all of the image data. The global change component 213 may be a flash unit in which a global control instruction for applying control meanings is stored offline; calling the global control instruction changes all control meanings uniformly.

The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments; all technical solutions that fall under the idea of the present invention belong to its protection scope. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications made without departing from the principle of the present invention should also be regarded as falling within the protection scope of the present invention.

Claims

1. An intelligent display method based on audio-video recognition, characterized by comprising the following steps:

Acquiring image data and audio data;

Matching the local action with the local audio;

Displaying the image data carrying the marker symbol.

2. The method according to claim 1, characterized by further comprising:

Associating each control meaning with the image data or with a display property;

Identifying the control meaning in the local audio;

3. The method according to claim 1, characterized by further comprising:

Associating each control meaning with the image data;

Associating each action meaning with the image data;

Identifying the control meaning in the local audio;

4. The method according to claim 1, characterized by further comprising:

There are multiple image targets;

Identifying the audio target that corresponds to one of the image targets;

Displaying the multiple pieces of image data side by side.

5. The method according to claim 2 or 3, characterized by further comprising:

6. An intelligent display device based on audio-video recognition, characterized by comprising a display screen (1), a control center module (2) in data connection with the display screen (1), and an image recognition module (3) and an audio recognition module (4) each in data connection with the control center module (2); the image recognition module (3) acquires and analyzes images, the audio recognition module (4) acquires and analyzes audio, and the control center module (2) receives the images and analysis results and displays them on the display screen (1);

The image recognition module (3) acquires image data and sends it to the control center module (2), and the audio recognition module (4) acquires audio data and sends it to the control center module (2);

Further:

The image recognition module (3) comprises a camera (31) and a recognition processor (32) in data connection with the camera (31); the camera (31) acquires image data, and the recognition processor (32) receives the image data and identifies an image target in the image data and its position coordinates in the image data;

The audio recognition module (4) comprises an audio collector (41) and an analysis processor (42) in data connection with the audio collector (41); the audio collector (41) acquires audio data, and the analysis processor (42) receives the audio data and identifies an audio target and the local audio belonging to the audio target from the audio data;

The control center module (2) comprises a central processor (21) in data connection with the recognition processor (32) and the analysis processor (42); the central processor (21) is in data connection with a display processor (22), which is in data connection with the display screen (1); the central processor (21) receives data from the recognition processor (32) and the analysis processor (42) and drives the display screen (1) through the display processor (22);

The central processor (21) includes a data matching component (211) that matches the local action with the local audio; if the match succeeds, it sends the recognition processor (32) a marking signal instructing it to add a marker symbol to the image data according to the position coordinates;

On receiving the marking signal, the recognition processor (32) modifies and updates the image data according to a preset instruction, and sends the image data to the central processor (21).

7. The device according to claim 6, characterized in that the analysis processor (42) includes:

An initial audio component (421), which establishes an audio interpretation library in which multiple control meanings are preset;

A meaning correspondence component (422), in data connection with the recognition processor (32), which associates each control meaning with the image data or with a display property;

A meaning identification component (423), which identifies the control meaning in the local audio;

A meaning change component (424), which changes the image data according to the control meaning identified in the local audio, or changes the display property according to that control meaning.

8. The device according to claim 6, characterized in that the analysis processor (42) includes:

A volume initial component (425), which establishes an audio interpretation library in which multiple control meanings are preset, and a display interpretation library in which multiple action meanings are preset in one-to-one correspondence with the control meanings;

A volume correspondence component (426), which associates the control meanings and the corresponding action meanings with the image data;

A volume recognition component (427), which identifies the control meaning in the local audio;

A volume change component (428), which applies the action meaning corresponding to the image data to the image data, according to the control meaning identified in the local audio.

9. The device according to claim 6, characterized in that there are multiple image targets;

The central processor (21) includes an identification correspondence component (212), which identifies the audio target corresponding to one of the image targets;

The display processor (22) includes a side-by-side display component (221), which displays the multiple pieces of image data side by side.

10. The device according to claim 7 or 8, characterized by further comprising:

The central processor (21) includes a global change component (213), which applies the control meaning identified in the local audio to all of the image data.