CN107318054A - Audio-visual automated processing system and method - Google Patents

Audio-visual automated processing system and method

Info

Publication number
CN107318054A
CN107318054A
Authority
CN
China
Prior art keywords
video
audio
audio data
target feature
receiving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610266079.1A
Other languages
Chinese (zh)
Inventor
张书纶
林延宜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yuzhan Precision Technology Co ltd
Hon Hai Precision Industry Co Ltd
Original Assignee
Shenzhen Yuzhan Precision Technology Co ltd
Hon Hai Precision Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yuzhan Precision Technology Co ltd and Hon Hai Precision Industry Co Ltd
Priority to CN201610266079.1A
Publication of CN107318054A
Legal status: Pending


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85: Assembly of content; Generation of multimedia applications
    • H04N 21/854: Content authoring
    • H04N 21/8543: Content authoring using a description language, e.g. Multimedia and Hypermedia information coding Expert Group [MHEG], eXtensible Markup Language [XML]
    • H04N 21/8547: Content authoring involving timestamps for synchronizing content

Abstract

The present invention provides an audio-visual automated processing system that runs in a server connected with a sending terminal. The system includes: a setting module for determining the target features to be detected and the effect corresponding to each target feature; a receiving module for receiving audio-visual data from the sending terminal; a detecting module for detecting each target feature in the audio-visual data when the data is received; and a processing module for, when a target feature is detected, obtaining the effect corresponding to the target feature and adding the effect to the audio-visual data. The present invention also provides an audio-visual automated processing method. With the present invention, dynamically loaded audio-visual data can be processed automatically.

Description

Audio-visual automated processing system and method
Technical field
The present invention relates to an audio-visual automated processing system and an audio-visual automated processing method.
Background
When processing audio and video, an existing audio-visual processing system must first load the audio/video file in its entirety, then recognize the static objects set by the user in the loaded file, and finally add user-defined static objects to the file according to the recognized objects. The file therefore has to be fully loaded every time it is processed, and both the recognized objects and the added objects are static.
Summary of the invention
In view of the foregoing, it is necessary to provide an audio-visual automated processing system and an audio-visual automated processing method that can automatically process dynamically loaded audio-visual data.
An audio-visual automated processing system runs in a server, the server being connected with a sending terminal. The system includes: a setting module for determining the target features to be detected and the effect corresponding to each target feature; a receiving module for receiving audio-visual data from the sending terminal; a detecting module for detecting each target feature in the audio-visual data when the data is received; and a processing module for, when a target feature is detected, obtaining the effect corresponding to the target feature and adding the effect to the audio-visual data.
An audio-visual automated processing method is applied in a server, the server being connected with a sending terminal. The method includes: a setting step, which determines the target features to be detected and the effect corresponding to each target feature; a receiving step, which receives audio-visual data from the sending terminal; a detecting step, which detects each target feature in the audio-visual data when the data is received; and a processing step, which, when a target feature is detected, obtains the effect corresponding to the target feature and adds the effect to the audio-visual data.
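As an illustration only (the patent prescribes no particular implementation), the four steps can be sketched as a minimal pipeline; every name below is an assumption:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator

@dataclass
class Pipeline:
    # Setting step: which target features to look for, and what to do on a hit.
    detectors: Dict[str, Callable[[bytes], bool]]  # feature name -> "present in this chunk?"
    effects: Dict[str, Callable[[bytes], bytes]]   # feature name -> effect applier

    def process(self, chunks: Iterable[bytes]) -> Iterator[bytes]:
        """Receive, detect, and process the audio-visual data chunk by chunk."""
        for chunk in chunks:                            # receiving step
            for name, detect in self.detectors.items():
                if detect(chunk):                       # detecting step
                    chunk = self.effects[name](chunk)   # processing step
            yield chunk
```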
Compared with the prior art, the audio-visual automated processing system of the present invention can detect the set target features in dynamically loaded audio-visual data and add the effect corresponding to each target feature to the audio-visual data.
Brief description of the drawings
Fig. 1 is a schematic diagram of the running environment of an embodiment of the audio-visual automated processing system of the present invention.
Fig. 2 is a functional module diagram of an embodiment of the audio-visual automated processing system of the present invention.
Fig. 3 is a flowchart of an embodiment of the audio-visual automated processing method of the present invention.
Main element symbol description
The following embodiments will further illustrate the present invention with reference to the above drawings.
Detailed description of the embodiments
As shown in Fig. 1, the audio-visual automated processing system 10 is installed in the server 1. The server 1 is communicatively connected with a sending terminal 2 and at least one receiving terminal 3 (only one is drawn in the figure).
The server 1 includes, but is not limited to, a first communication device 11, a first storage device 12, and a first processor 13. The sending terminal 2 includes, but is not limited to, a second communication device 21, a second storage device 22, a second processor 23, and an input device 24. The receiving terminal 3 includes, but is not limited to, a third communication device 31, a third storage device 32, a third processor 33, and a playing device 34.
The server 1 communicates with the sending terminal 2 and the receiving terminal 3 through the first communication device 11, the second communication device 21, and the third communication device 31. Each of these communication devices can be a device capable of wireless communication, such as a wireless network card or a GPRS module, or a device capable of wired communication, such as a network interface card. In the present embodiment, the server 1, the sending terminal 2, and the receiving terminal 3 each connect to the Internet through their respective communication devices and are then communicatively connected with one another via the Internet.
The first storage device 12, the second storage device 22, and the third storage device 32 respectively store the program instruction segments and data of the programs installed in the server 1, the sending terminal 2, and the receiving terminal 3. Each storage device can be an internal storage device such as memory, or an external storage device such as a Smart Media Card, a Secure Digital Card, or a Flash Card. The first processor 13, the second processor 23, and the third processor 33 respectively execute the program instruction segments of the programs installed in the server 1, the sending terminal 2, and the receiving terminal 3, and control the respective devices to perform the corresponding operations.
The input device 24 receives the input operations of the user of the sending terminal 2. The input operations include setting the target features to be detected. The effect corresponding to each target feature can be a default effect or can be set by the user of the sending terminal 2; that is, the input operations can further include setting the effect corresponding to each target feature. The input operations can also include receiving audio-visual data input by the user; the audio-visual data can include only audio, only video, or both audio and video. The input device 24 can be an input device such as a touch screen or a keyboard, and can further include audio and video input devices such as a microphone and a camera.
A target feature can be a preset facial expression (such as a smile, a crying face, or a funny face), a preset action (such as raising a hand, falling down, or wiping away tears), a preset sound (such as laughter, applause, or a call for help), or a preset object (such as a cup, glasses, or a hat). The audio-visual automated processing system 10 monitors for these target features through preset programs, which can be one or more of a facial expression recognition program, a speech recognition program, and an object recognition program.
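For illustration, the kinds of target feature and the preset program assumed to monitor each kind could be organized as follows (a sketch; the patent names the three program types but not this exact assignment):

```python
from enum import Enum

class FeatureKind(Enum):
    FACIAL_EXPRESSION = "facial expression"  # smile, crying face, funny face
    ACTION = "action"                        # raising a hand, falling, wiping tears
    SOUND = "sound"                          # laughter, applause, a call for help
    OBJECT = "object"                        # cup, glasses, hat

# Assumed pairing of feature kind and the preset program that monitors it.
MONITORING_PROGRAM = {
    FeatureKind.FACIAL_EXPRESSION: "facial expression recognition program",
    FeatureKind.ACTION: "facial expression recognition program",  # or a pose recognizer
    FeatureKind.SOUND: "speech recognition program",
    FeatureKind.OBJECT: "object recognition program",
}
```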
The corresponding effect can be playing a preset sound (such as laughter, cheering, or a sound recorded in advance), playing a preset picture, animation, or video, adding a special effect (such as adding sunglasses to a person's face or putting a loudspeaker in a person's hand), or a combination of two or more of the above. The audio-visual automated processing system 10 merges the effect into the audio-visual data through a corresponding program, which can be a video rendering program corresponding to the effect.
The playing device 34 plays the processed audio-visual data that is received. It can be an audio playing device such as a speaker or a loudspeaker, and can further include a video playing device such as a display screen.
The sending terminal 2 sends the audio-visual data to be processed, together with a list of the at least one receiving terminal 3 to which the data should be sent, to the server 1. The receiving terminal 3 receives the processed audio-visual data from the server 1 and plays it. The sending terminal 2 and the receiving terminal 3 can be mobile devices such as mobile phones, tablet computers, and wearable devices, or equipment such as notebook computers and personal computers. The server 1 receives from the sending terminal 2 the audio-visual data to be processed and the receiving terminals 3 to which the data should be sent, processes the received audio-visual data accordingly, and sends the processed audio-visual data to the specified receiving terminals 3. The server 1 is a remotely located computer, server, or other equipment.
It should be noted that in certain embodiments the sending terminal 2 can also be a receiving terminal 3. That is, after the sending terminal 2 sends the audio-visual data to be processed to the server 1, it also receives the processed audio-visual data from the server 1. In this case the sending terminal 2 is simultaneously a receiving terminal 3, i.e. the sending terminal 2 and the receiving terminal 3 are located in the same device or equipment.
The audio-visual automated processing system 10 receives from the sending terminal 2 the audio-visual data to be processed and the target features to be detected. While receiving the audio-visual data, it immediately detects the target features in the data and automatically adds the effect corresponding to each detected target feature to the audio-visual data.
As shown in Fig. 2, the audio-visual automated processing system 10 can be divided into a setting module 101, a receiving module 102, a detecting module 103, and a processing module 104. A module as referred to in the present invention is a series of computer program segments that complete a specific function and that are better suited than a whole program to describing the execution process of the audio-visual automated processing system 10. The specific functions of the modules are described below with reference to the flowchart of Fig. 3.
As shown in Fig. 3, which is a flowchart of an embodiment of the audio-visual automated processing method of the present invention, the execution order of the steps in the flowchart can change, and some steps can be omitted, according to different requirements.
Step S31: the setting module 101 determines the target features to be detected and the effect corresponding to each target feature.
In the present embodiment, the target features to be detected and the effect corresponding to each target feature are set by the user of the sending terminal 2. That is, the sending terminal 2 receives through the input device 24 the target features to be detected and the effect corresponding to each target feature as set by the user, and then sends them to the server 1 through the second communication device 21. Specifically, the server 1 can send all the target features it can detect and all the effects it can realize to the sending terminal 2, so that the user of the sending terminal 2 can select the target features to be detected and the effect corresponding to each target feature. For example, one target feature can be set to a certain phrase being uttered, and its effect can be set to playing a specified animation.
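For example, the user's selection could be serialized as a simple feature-to-effect mapping before being sent to the server 1; the keys and asset names below are invented for illustration:

```python
# Hypothetical configuration a sending terminal might submit after the
# user picks target features and effects from the server's offerings.
user_config = {
    "phrase:Hanabi":    {"effect": "play_animation", "asset": "fireworks.webm"},
    "expression:smile": {"effect": "play_sound",     "asset": "laughter.wav"},
    "object:cup":       {"effect": "overlay",        "asset": "sunglasses.png"},
}
```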
In another embodiment, the target features to be detected can be set by the user of the sending terminal 2, while the effect corresponding to each target feature is a default effect. That is, the sending terminal 2 receives through the input device 24 the target features to be detected as set by the user, and then sends the target features to be detected to the server 1 through the second communication device 21. Specifically, the server 1 can send all the target features it can detect, together with the effect corresponding to each target feature, to the sending terminal 2, so that the user of the sending terminal 2 can select the target features to be detected.
In yet another embodiment, both the target features to be detected and the effect corresponding to each target feature are defaults; that is, the target features to be detected and their corresponding effects have already been set.
Step S32: the receiving module 102 receives from the sending terminal 2 the audio-visual data to be processed and the one or more receiving terminals 3 to which the data should be sent. The audio-visual data can be an audio file (for example, a recording), a video file (for example, a recorded video), an audio stream (for example, a connected phone call), or a video stream (for example, video that is being recorded).
It should be noted that in the present embodiment the sending terminal 2 sends the audio-visual data in the form of a file stream. As soon as the receiving module 102 begins receiving the file stream sent by the sending terminal 2, step S33 is executed immediately, while the receiving module 102 continues to receive the audio-visual data; step S33 is not deferred until the audio-visual data has been fully received.
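A minimal sketch of this receive-while-detecting behavior, assuming the file stream arrives as an iterable of chunks (all names are illustrative):

```python
import queue
import threading

def receive(stream, chunks: queue.Queue) -> None:
    """Step S32: keep pulling chunks off the incoming file stream."""
    for chunk in stream:
        chunks.put(chunk)   # hand each chunk over for detection immediately
    chunks.put(None)        # sentinel: the stream has ended

def detect_loop(chunks: queue.Queue, on_chunk) -> None:
    """Step S33 starts as soon as the first chunk arrives and runs in
    parallel with the ongoing reception of the rest of the stream."""
    while (chunk := chunks.get()) is not None:
        on_chunk(chunk)

stream = iter([b"chunk-1", b"chunk-2", b"chunk-3"])  # stand-in for a real stream
q: queue.Queue = queue.Queue()
threading.Thread(target=receive, args=(stream, q), daemon=True).start()
detect_loop(q, on_chunk=lambda c: print("detecting in", c))
```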
Step S33: when receiving the audio-visual data, the detecting module 103 immediately detects each target feature in the audio-visual data. Upon receiving the audio-visual data, the detecting module 103 immediately uses the preset programs to detect whether the audio-visual data contains any of the target features set by the user of the sending terminal 2, and which ones it contains. When the detecting module 103 detects a target feature, step S34 is executed immediately, while the detecting module 103 continues to check whether the received audio-visual data contains other target features. The preset programs can be one or more of a facial expression recognition program, a speech recognition program, and an object recognition program.
Step S34: when a target feature is detected in the audio-visual data, the processing module 104 obtains the effect corresponding to the target feature and adds the effect to the audio-visual data. The corresponding effect can be playing a preset sound (such as laughter, cheering, or a sound recorded in advance), playing a preset picture, animation, or video, adding a special effect (such as adding sunglasses to a person's face or putting a loudspeaker in a person's hand), or a combination of two or more of the above. The processing module 104 merges the effect corresponding to the target feature into the audio-visual data through a corresponding program, which can be a video rendering program corresponding to the effect.
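Continuing the sketch, effect application could dispatch on the configured effect type from the mapping shown earlier; the branches below are placeholders for calls into a real rendering program:

```python
def apply_effect(chunk: bytes, effect: dict) -> bytes:
    """Sketch of step S34: merge one configured effect into a data chunk.
    A real implementation would invoke a video rendering program here."""
    asset = effect["asset"].encode()
    if effect["effect"] == "play_sound":      # mix a preset sound into the audio
        return chunk + b"<mixed:" + asset + b">"
    if effect["effect"] == "play_animation":  # splice in a picture/animation/video
        return chunk + b"<animated:" + asset + b">"
    if effect["effect"] == "overlay":         # draw a special effect onto frames
        return chunk + b"<overlaid:" + asset + b">"
    return chunk  # unknown effect type: pass the data through unchanged
```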
Step S35: the processing module 104 sends the processed audio-visual data to the one or more receiving terminals 3 to which the data should be sent. Upon receiving the processed audio-visual data, a receiving terminal 3 plays the received audio-visual data through its playing device 34.
For example, suppose user A has set in the system the sound "Hanabi" to correspond to the effect "a fireworks light-and-sound animation appears on the screen". During a video call between user A and user B, user A invites user B to go and watch fireworks together. To persuade user B and build up interest, user A can utter the sound "Hanabi"; the audio-visual automated processing system 10 detects the sound "Hanabi" uttered by user A (the target feature) and accordingly adds a fireworks light-and-sound animation (the corresponding effect) to the current picture.
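Wiring the sketches above together, the "Hanabi" scenario might look like this (the substring check stands in for real speech recognition):

```python
# End-to-end toy run of the "Hanabi" example using the sketches above.
detectors = {"phrase:Hanabi": lambda chunk: b"Hanabi" in chunk}
effects = {"phrase:Hanabi":
           lambda chunk: apply_effect(chunk, user_config["phrase:Hanabi"])}

pipeline = Pipeline(detectors=detectors, effects=effects)
call_stream = [b"...speech...", b"...Hanabi!...", b"...speech..."]
for processed in pipeline.process(call_stream):
    print(processed)  # the middle chunk now carries the fireworks animation
```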
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to restrict it. Those skilled in the art will understand that the technical solution of the present invention can be modified or equivalently substituted without departing from its spirit and scope.

Claims (12)

1. An audio-visual automated processing system running in a server, the server being connected with a sending terminal, characterized in that the system comprises:
a setting module for determining the target features to be detected and the effect corresponding to each target feature;
a receiving module for receiving audio-visual data from the sending terminal;
a detecting module for detecting each target feature in the audio-visual data when the audio-visual data is received;
a processing module for, when a target feature is detected, obtaining the effect corresponding to the target feature and adding the effect to the audio-visual data.
2. The audio-visual automated processing system of claim 1, characterized in that the detecting module detects each target feature in the audio-visual data immediately upon receiving the audio-visual data.
3. The audio-visual automated processing system of claim 1, characterized in that the detecting module detects each target feature in the audio-visual data after the audio-visual data has been completely received.
4. The audio-visual automated processing system of claim 1, characterized in that the receiving module further receives from the sending terminal one or more receiving terminals, connected with the server, to which the data should be sent; and the processing module is further configured to send the processed audio-visual data to the one or more receiving terminals.
5. The audio-visual automated processing system of any one of claims 1 to 4, characterized in that the target feature is at least one of a preset facial expression, a preset action, a preset sound, and a preset object.
6. The audio-visual automated processing system of any one of claims 1 to 4, characterized in that the corresponding effect is one or more of playing a preset sound, playing a preset picture, animation, or video, and adding a special effect.
7. An audio-visual automated processing method applied in a server, the server being connected with a sending terminal, characterized in that the method comprises:
a setting step, which determines the target features to be detected and the effect corresponding to each target feature;
a receiving step, which receives audio-visual data from the sending terminal;
a detecting step, which detects each target feature in the audio-visual data when the audio-visual data is received;
a processing step, which, when a target feature is detected, obtains the effect corresponding to the target feature and adds the effect to the audio-visual data.
8. The audio-visual automated processing method of claim 7, characterized in that the detecting step detects each target feature in the audio-visual data immediately upon receiving the audio-visual data.
9. The audio-visual automated processing method of claim 7, characterized in that the detecting step detects each target feature in the audio-visual data after the audio-visual data has been completely received.
10. The audio-visual automated processing method of claim 7, characterized in that the receiving step further receives from the sending terminal one or more receiving terminals, connected with the server, to which the data should be sent; and the processing step further sends the processed audio-visual data to the one or more receiving terminals.
11. The audio-visual automated processing method of any one of claims 7 to 10, characterized in that the target feature is at least one of a preset facial expression, a preset action, a preset sound, and a preset object.
12. The audio-visual automated processing method of any one of claims 7 to 10, characterized in that the corresponding effect is one or more of playing a preset sound, playing a preset picture, animation, or video, and adding a special effect.
CN201610266079.1A 2016-04-26 2016-04-26 Audio-visual automated processing system and method Pending CN107318054A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610266079.1A CN107318054A (en) 2016-04-26 2016-04-26 Audio-visual automated processing system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610266079.1A CN107318054A (en) 2016-04-26 2016-04-26 Audio-visual automated processing system and method

Publications (1)

Publication Number Publication Date
CN107318054A 2017-11-03

Family

ID=60184462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610266079.1A Pending CN107318054A (en) 2016-04-26 2016-04-26 Audio-visual automated processing system and method

Country Status (1)

Country Link
CN (1) CN107318054A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1336288A (en) * 2000-07-31 2002-02-20 赵晓峰 Direct image printing method on metal foil
CN1532775A * 2003-03-19 2004-09-29 Matsushita Electric Industrial Co., Ltd. Visual telephone terminal
CN102455898A (en) * 2010-10-29 2012-05-16 张明 Cartoon expression based auxiliary entertainment system for video chatting
CN104252226A (en) * 2013-06-28 2014-12-31 联想(北京)有限公司 Information processing method and electronic equipment
CN104703043A (en) * 2015-03-26 2015-06-10 努比亚技术有限公司 Video special effect adding method and device
CN104780339A (en) * 2015-04-16 2015-07-15 美国掌赢信息科技有限公司 Method and electronic equipment for loading expression effect animation in instant video
CN104780459A (en) * 2015-04-16 2015-07-15 美国掌赢信息科技有限公司 Method and electronic equipment for loading effects in instant video
CN104780458A (en) * 2015-04-16 2015-07-15 美国掌赢信息科技有限公司 Method and electronic equipment for loading effects in instant video
CN104917994A (en) * 2015-06-02 2015-09-16 烽火通信科技股份有限公司 Audio and video calling system and method
CN105049911A (en) * 2015-07-10 2015-11-11 西安理工大学 Video special effect processing method based on face identification
US20160086368A1 (en) * 2013-03-27 2016-03-24 Nokia Technologies Oy Image Point of Interest Analyser with Animation Generator
CN105468142A (en) * 2015-11-16 2016-04-06 上海璟世数字科技有限公司 Interaction method and system based on augmented reality technique, and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171103