CN114286114A

CN114286114A - Intelligent switching method and system for video streams

Info

Publication number: CN114286114A
Application number: CN202111363782.1A
Authority: CN
Inventors: 陆趣; 杨君蔚; 黄海峰; 朱竝清; 徐俊; 李辉石; 周鑫; 胡春玲
Original assignee: Shanghai Media Tech Co ltd
Current assignee: Shanghai Media Tech Co ltd
Priority date: 2021-11-17
Filing date: 2021-11-17
Publication date: 2022-04-05
Anticipated expiration: 2041-11-17
Also published as: CN114286114B

Abstract

The invention provides a method and a system for intelligently switching video streams, relating to the technical field of signal processing. The method comprises the following steps: continuously acquiring pulse code modulation (PCM) audio data of multiple information sources and calculating a dual-channel volume value for each information source; performing single channel conversion on the dual-channel volume value to obtain a monaural volume value for each information source, and adding the monaural volume values of each information source, in order, to a corresponding volume cache set; and, for each volume cache set, performing volume detection on each monaural volume value to obtain a corresponding volume detection result and, when the audio duration corresponding to the monaural volume values in the volume cache set is not less than a preset value, intelligently switching the video pictures of the video streams corresponding to the information sources according to the sound judgment results of the volume detection of all information sources, and updating each volume cache set. The beneficial effects are that video pictures are switched intelligently on the basis of the sound judgment results while noise interference is prevented, and labor cost is effectively saved.

Description

Intelligent switching method and system for video streams

Technical Field

The invention relates to the technical field of signal processing, in particular to a method and a system for intelligently switching video streams.

Background

With the development of internet technology, live video has attracted increasingly wide attention. During the live broadcast of some video programs (for example, product launch events), it is often necessary to show viewers various anchors located at different positions; after the target anchor is determined, the live picture is switched from the current anchor to the target anchor, who may be at a different position.

Traditional live picture switching is implemented with hardware such as a director's switcher. However, this conventional approach is costly and poorly suited to internet live broadcasting. With the development of the internet, cloud director platforms have emerged: virtual director consoles for switching multiple video streams. They support switching among multiple audio and video feeds and user-defined scenes, live broadcasting of multiple pushed streams, and additional functions such as custom picture logos, custom text, custom scoreboards, custom caption bars, custom elements, and delayed broadcast.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides an intelligent video stream switching method, which comprises the following steps:

step S1, continuously collecting dual-channel pulse code modulation audio from multiple information sources, and calculating the dual-channel volume value of the pulse code modulation audio of each information source;

step S2, performing single channel conversion on the dual-channel volume value to obtain a single channel volume value of each information source, and adding the single channel volume values of each information source to a corresponding volume cache set in order of acquisition time;

step S3, for each volume buffer set, respectively performing volume detection on each monaural volume value to obtain a corresponding volume detection result, and determining whether an audio duration of the pulse code modulation audio corresponding to each monaural volume value in the volume buffer set is less than a preset value:

if yes, returning to the step S1;

if not, processing according to each volume detection result to obtain a sound judgment result of the current volume detection of the corresponding information source, and then turning to the step S4;

step S4, intelligently switching video frames of video streams corresponding to the information sources according to the sound determination result of the current volume detection of all the information sources, updating each volume cache set, and then returning to the step S3.

Preferably, the volume detection result is silent or voiced; then in step S4:

for each volume cache set, processing to obtain the sound judgment result of the corresponding information source according to a first proportion of volume detection results indicating silence within a first preset time length and a second proportion of volume detection results indicating voiced sound within a second preset time length;

the first preset time length and the second preset time length are both larger than the preset value.

Preferably, in step S3, the detecting the volume of each of the monaural volume values includes:

comparing each single sound channel volume value with a volume threshold value respectively, and judging whether the single sound channel volume value is smaller than the volume threshold value:

if yes, outputting the volume detection result representing silence;

and if not, outputting the volume detection result indicating sound.

Preferably, the first preset duration is less than the second preset duration, and the pulse code modulation audio data of the second preset duration includes the pulse code modulation audio data of the first preset duration; the process of processing to obtain the sound determination result of the corresponding information source according to each of the volume detection results includes:

step A1a, calculating the first proportion of the volume detection result indicating silence within the first preset duration, and determining whether the first proportion is less than a first threshold:

if not, outputting the sound judgment result indicating that the signal source is continuously silent within the first preset time period, and then turning to the step S4;

if yes, go to step A2a;

step A2a, calculating the second proportion of the volume detection result indicating voiced sound within the second preset time period, and determining whether the second proportion is smaller than a second threshold:

if not, outputting the sound judgment result indicating that the signal source is continuously voiced within the second preset time period, and then turning to the step S4;

if so, the sound determination result of the last volume detection is used as the sound determination result of the current volume detection, and the process then proceeds to step S4.

Preferably, the first preset duration is not less than the second preset duration, and the pulse code modulation audio data of the first preset duration includes the pulse code modulation audio data of the second preset duration; the process of processing to obtain the sound determination result of the corresponding information source according to each of the volume detection results includes:

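step A1b, calculating the second proportion of the volume detection result indicating voiced sound within the second preset time period, and determining whether the second proportion is smaller than a second threshold:

if not, outputting the sound judgment result indicating that the signal source is continuously voiced within the second preset time period, and then turning to the step S4;

if yes, go to step A2b;

step A2b, calculating the first proportion of the volume detection result indicating silence within the first preset time period, and determining whether the first proportion is less than a first threshold:

if not, outputting the sound judgment result indicating that the signal source is continuously silent within the first preset time period, and then turning to the step S4;

if so, the sound determination result of the last volume detection is used as the sound determination result of the current volume detection, and the process then proceeds to step S4.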

Preferably, in step S2, the two-channel volume value is mono-converted by using a Max channel conversion algorithm.

Preferably, in step S4, the video frames of the video streams corresponding to the respective signal sources are intelligently switched according to the number of the sound determination results indicating that the signal sources are continuously voiced in the sound determination results of the current volume detection of all the signal sources.

Preferably, in step S4, the number of the sound determination results indicating that the source is continuously voiced is counted, and when the number is one, the current video playing interface is intelligently switched to the video frame of the video stream corresponding to the source that is continuously voiced, and when the number is greater than one, the current video playing interface is intelligently switched to the video frame indicating a panorama.

Preferably, in step S3, for each volume buffer set, a sliding time window is adopted to perform volume detection on each monaural volume value to obtain the corresponding volume detection result.

The invention also provides a video stream intelligent switching system, which applies the above video stream intelligent switching method and comprises:

the volume calculation module is used for continuously acquiring the pulse code modulation audio of the double channels of the multi-channel information sources and calculating to obtain the double-channel volume value of the pulse code modulation audio of each information source;

the volume conversion module is connected with the volume calculation module and used for carrying out single-channel conversion on the double-channel volume value to obtain a single-channel volume value of each information source and correspondingly adding each single-channel volume value corresponding to each information source into a volume cache set according to the sequence of corresponding acquisition time;

the volume detection module is connected with the volume conversion module and is used for respectively carrying out volume detection on each single sound channel volume value aiming at each volume cache set to obtain a corresponding volume detection result and outputting a judgment signal when the audio time of the pulse code modulation audio corresponding to each single sound channel volume value in the volume cache set is not less than a preset value;

and the sound judgment module is connected with the volume detection module and used for processing according to the judgment signal and each volume detection result to obtain a sound judgment result of the corresponding information source, intelligently switching the video pictures of the video streams corresponding to the information sources according to the sound judgment result of the current volume detection of all the information sources, and updating each volume cache set respectively.

The technical scheme has the following advantages or beneficial effects: by detecting and analyzing the pulse code modulation audio of the multiple information sources, the voiced or silent determination is protected against noise interference; meanwhile, the corresponding video pictures can be switched intelligently based on the voiced or silent sound judgment results, realizing an unattended, automatic cloud directing workflow and effectively saving labor cost.

Drawings

FIG. 1 is a flow chart illustrating a method for intelligently switching video streams according to a preferred embodiment of the present invention;

FIG. 2 is a schematic diagram illustrating the comparison between the volume detection of each single-channel volume value in the volume buffer set according to the preferred embodiment of the present invention;

FIG. 3 is a flowchart illustrating a sound determination process when a first predetermined duration is less than a second predetermined duration according to a preferred embodiment of the present invention;

FIG. 4 is a schematic diagram illustrating volume detection results corresponding to a first preset duration and a second preset duration, respectively, in a preferred embodiment of the present invention;

FIG. 5 is a flowchart illustrating a sound determination process performed when the first predetermined duration is not less than the second predetermined duration according to a preferred embodiment of the present invention;

FIG. 6 is a schematic diagram of two adjacent sliding steps of volume detection according to the preferred embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an intelligent video stream switching system according to a preferred embodiment of the present invention.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments. The present invention is not limited to the embodiment, and other embodiments may be included in the scope of the present invention as long as the gist of the present invention is satisfied.

In a preferred embodiment of the present invention, based on the above problems in the prior art, there is provided a method for intelligently switching video streams, as shown in fig. 1, including:

step S1, continuously collecting dual-channel pulse code modulation audio from multiple information sources, and calculating the dual-channel volume value of the pulse code modulation audio of each information source;

step S2, performing single channel conversion on the dual-channel volume value to obtain a monaural volume value of each information source, and adding the monaural volume values of each information source to a corresponding volume cache set in order of acquisition time;

step S3, for each volume buffer set, respectively carrying out volume detection on each single sound channel volume value to obtain a corresponding volume detection result, and judging whether the audio time length of the pulse code modulation audio corresponding to each single sound channel volume value in the volume buffer set is less than a preset value:

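if yes, return to step S1;

if not, processing according to each volume detection result to obtain a sound judgment result of the current volume detection of the corresponding information source, and then turning to step S4;

and step S4, intelligently switching the video pictures of the video streams corresponding to the information sources according to the sound judgment results of the current volume detection of all the information sources, respectively updating each volume cache set, and then returning to the step S3.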

Specifically, in this embodiment, the technical solution may be applied to cloud directing, which is used to switch video pictures intelligently while a television program is playing. During playback, the dual-channel Pulse Code Modulation (PCM) audio of the multiple information sources corresponding to the television program is continuously acquired, and the PCM audio of each information source is continuously processed to obtain that source's sound judgment result; the video pictures of the video streams corresponding to the information sources are then switched intelligently according to the sound judgment results of all the information sources.

More specifically, after the dual-channel pulse code modulation audio of the multiple information sources is acquired, a volume decibel calculator may calculate the dual-channel volume value of each source's pulse code modulation audio. The dual-channel volume value may then be sent to a volume processor, which converts the left and right channel volume values into a single monaural volume value. Finally, the continuously acquired monaural volume values of each source may be sent to an intelligent switcher for volume detection and determination.
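
The calculation performed by the volume decibel calculator can be sketched as follows. The source does not state the sample format or the level formula, so this sketch assumes interleaved 16-bit little-endian stereo PCM and an RMS level in dBFS; the function name `stereo_volume_db` and the `floor_db` parameter are hypothetical.

```python
import math
import struct


def stereo_volume_db(pcm_bytes: bytes, floor_db: float = -100.0) -> tuple[float, float]:
    """Return (left_db, right_db) for one chunk of interleaved stereo PCM.

    Assumptions (not from the source): 16-bit signed little-endian samples,
    RMS level expressed in dBFS, and a floor value for digital silence.
    """
    count = len(pcm_bytes) // 2
    samples = struct.unpack(f"<{count}h", pcm_bytes[: count * 2])
    left, right = samples[0::2], samples[1::2]

    def rms_db(channel) -> float:
        if not channel:
            return floor_db
        rms = math.sqrt(sum(s * s for s in channel) / len(channel))
        if rms == 0:
            return floor_db
        # 32768 is full scale for 16-bit audio.
        return max(floor_db, 20.0 * math.log10(rms / 32768.0))

    return rms_db(left), rms_db(right)


# Two stereo frames: left channel at half scale, right channel silent.
chunk = struct.pack("<4h", 16384, 0, -16384, 0)
left_db, right_db = stereo_volume_db(chunk)
```

Here `left_db` is about -6.02 dBFS (half of full scale) and `right_db` is the floor value, which is what the later threshold comparison would treat as silent.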

Preferably, multiple audio buffers may be configured in the volume decibel calculator, each buffering the pulse code modulation audio of one information source; further preferably, multiple volume buffers may be configured in the intelligent switcher, each buffering the volume cache set of one information source. Both the audio buffers and the volume buffers are first-in first-out: the pulse code modulation audio stored first in an audio buffer is the first to have its dual-channel volume calculated and sent to the volume processor to obtain the monaural volume, and that monaural volume is in turn the first to be stored in the volume buffer and the first to take part in volume detection and determination. After the volume detection and determination finish, the monaural volume values that took part can be removed from the volume cache set, which updates the set for the next volume detection. Each volume detection is followed by one switching of the video picture based on the corresponding sound judgment result. It can be understood that if two adjacent sound judgment results are the same, the corresponding video picture remains unchanged.

Further preferably, to prevent noise from interfering with volume detection, the technical solution makes the sound determination over multiple volume detection results spanning a period of time. The sound determination is therefore performed only when the audio duration of the pulse code modulation audio corresponding to the monaural volume values in the volume cache set is long enough, that is, when the set has buffered enough values. Specifically, a preset value is configured as the criterion for whether the audio duration is long enough.
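
The first-in first-out volume cache set and the duration check against the preset value can be illustrated with a small sketch. It assumes that every monaural volume value covers a fixed acquisition period (50 ms here, a made-up figure); the class `VolumeCache` and its method names are hypothetical.

```python
from collections import deque


class VolumeCache:
    """FIFO volume cache set for one information source (illustrative only)."""

    def __init__(self, period_ms=50):
        # Assumption: each cached value represents period_ms of PCM audio.
        self.period_ms = period_ms
        self._values = deque()

    def add(self, mono_db):
        """Append the newest monaural volume value (acquisition order)."""
        self._values.append(mono_db)

    def duration_ms(self):
        """Audio duration currently covered by the cached values."""
        return len(self._values) * self.period_ms

    def is_ready(self, preset_ms):
        """True when the buffered duration is not less than the preset value."""
        return self.duration_ms() >= preset_ms

    def drop_oldest(self, count):
        """Remove values that already took part in detection (oldest first)."""
        for _ in range(min(count, len(self._values))):
            self._values.popleft()

    def values(self):
        return list(self._values)


cache = VolumeCache(period_ms=50)
for level in (-55.0, -20.0, -18.0, -19.0):
    cache.add(level)
ready = cache.is_ready(preset_ms=200)  # 4 values * 50 ms = 200 ms
```

With a 200 ms preset value the cache becomes ready once the fourth value arrives; before that the flow would return to step S1 and keep collecting audio.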

In a preferred embodiment of the present invention, the volume detection result is silent or voiced; in step S4:

processing each volume cache set according to a first proportion representing a silent volume detection result in a first preset time length and a second proportion representing a voiced volume detection result in a second preset time length to obtain a sound judgment result of a corresponding information source;

the first preset time length and the second preset time length are both larger than a preset value.

Specifically, in this embodiment, when the audio time of the pulse code modulation audio corresponding to each monaural volume value in the volume buffer set reaches a preset value, the volume detection results of the first preset time and the second preset time may be respectively selected from the volume buffer set to perform the silent or voiced determination. The first preset time and the second preset time can be set according to the requirement.

In a preferred embodiment of the present invention, in step S3, the detecting the sound volume of each monaural sound volume value includes:

comparing each single sound track volume value with a volume threshold value respectively, and judging whether the single sound track volume value is smaller than the volume threshold value:

if yes, outputting a volume detection result representing silence;

if not, outputting a volume detection result indicating sound.

Specifically, in this embodiment, as shown in fig. 2, taking a volume cache set corresponding to one of the channels of information sources as an example, a volume detection result obtained by respectively performing volume detection on each monaural volume value can be visually seen.
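
The per-value comparison described above amounts to a single threshold test. A minimal sketch, assuming volume values in decibels and a hypothetical threshold of -40 dB (the source does not give a threshold value):

```python
from enum import Enum


class Detection(Enum):
    SILENT = "silent"
    VOICED = "voiced"


def detect(mono_db: float, threshold_db: float = -40.0) -> Detection:
    """Silent if the monaural volume is smaller than the threshold, else voiced."""
    return Detection.SILENT if mono_db < threshold_db else Detection.VOICED


# One detection result per cached monaural volume value, oldest first.
results = [detect(v) for v in (-62.0, -40.0, -12.5)]
```

A value exactly equal to the threshold counts as voiced, because the text only outputs silence when the value is smaller than the threshold.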

In a preferred embodiment, after the volume detection result of each monaural volume value has been obtained, the priority of the silent determination and the voiced determination is configured in advance for the sound determination: when the first preset duration and the second preset duration are equal, the voiced determination is performed first; when they differ, the determination associated with the shorter duration is performed first.

Further specifically, the first preset duration is less than the second preset duration, and the pulse code modulation audio data of the second preset duration comprises the pulse code modulation audio data of the first preset duration; as shown in fig. 3, the process of processing the sound determination result of the corresponding source according to each volume detection result includes:

step A1a, calculating a first ratio of the volume detection result indicating silence within a first preset time period, and determining whether the first ratio is less than a first threshold:

if not, outputting a sound judgment result indicating that the information source is continuously silent within a first preset time length, and then turning to the step S4;

if yes, go to step A2a;

step A2a, calculating a second ratio of the volume detection result indicating voiced sound within a second preset time period, and determining whether the second ratio is smaller than a second threshold:

if not, outputting a sound judgment result indicating that the information source is continuously voiced within a second preset time period, and then turning to the step S4;

if yes, taking the sound judgment result of the previous volume detection as the sound judgment result of the current volume detection, and then turning to the step S4.

Specifically, when the first preset duration is less than the second preset duration, the silent determination is performed first. If the first ratio is not less than the first threshold, the proposition that the source is silent is true, the sound judgment result is silent, and the voiced determination is skipped. If the first ratio is less than the first threshold, the proposition that the source is silent is false, and the voiced determination is performed. If the second ratio is not less than the second threshold, the proposition that the source is voiced is true, and the sound judgment result indicating that the information source is continuously voiced within the second preset duration is output. If the second ratio is less than the second threshold, the proposition that the source is voiced is also false; the current volume detection then yields no valid sound judgment result, and the sound judgment result of the previous volume detection is taken as that of the current volume detection.

Preferably, the first threshold and the second threshold may both be 80%, or may be set as needed. For each volume cache set, the volume detection results for the monaural volume values are shown in fig. 4, where the voiced volume detection result on the rightmost side has the earliest audio acquisition time. When sound determination is performed, that acquisition time may be used as the starting point, and the volume detection results within the first preset duration are selected. The first ratio of silent volume detection results within the first preset duration is 20%, which is less than 80%, so the voiced determination is performed next. As fig. 4 shows, the second preset duration must start at the same point as the first preset duration. The second ratio of voiced volume detection results within the second preset duration is 75%, which is also less than 80%, so the sound judgment result of the previous volume detection is taken as the sound judgment result of the current volume detection.
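
The silent-first procedure (steps A1a and A2a) and the 20% / 75% example can be reproduced with a short sketch. The detection results are modeled as booleans ordered oldest first (`True` for voiced), and both windows start at the oldest entry, as described for fig. 4. The window sizes of 5 and 8 results are hypothetical choices that yield the same ratios as the example; the function name is also an assumption.

```python
def judge_silent_first(flags, n_first, n_second, previous,
                       first_threshold=0.8, second_threshold=0.8):
    """Sound judgment when the first preset duration is the shorter one.

    flags    -- detection results, oldest first (True = voiced, False = silent)
    n_first  -- number of results covered by the first preset duration
    n_second -- number of results covered by the second preset duration
    previous -- sound judgment result of the previous volume detection
    """
    # Step A1a: ratio of silent results within the first preset duration.
    first = flags[:n_first]
    if first.count(False) / len(first) >= first_threshold:
        return "silent"
    # Step A2a: ratio of voiced results within the second preset duration.
    second = flags[:n_second]
    if second.count(True) / len(second) >= second_threshold:
        return "voiced"
    # Neither proposition holds: keep the previous judgment.
    return previous


V, S = True, False
flags = [V, V, S, V, V, V, S, V]  # 1/5 silent in the first window, 6/8 voiced in the second
result = judge_silent_first(flags, n_first=5, n_second=8, previous="voiced")
print(result)  # voiced (carried over from the previous detection)
```

Because 20% is below the first threshold and 75% is below the second, neither proposition is accepted and the previous judgment is reused, which is the hysteresis that keeps short noises or pauses from triggering a switch.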

In a preferred embodiment of the present invention, the first predetermined duration is not less than the second predetermined duration, and the pulse code modulation audio data of the first predetermined duration includes pulse code modulation audio data of the second predetermined duration; as shown in fig. 5, the process of processing the sound determination result of the corresponding source according to each volume detection result includes:

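step A1b, calculating a second ratio of the volume detection result indicating voiced sound within a second preset time period, and determining whether the second ratio is smaller than a second threshold:

if not, outputting a sound judgment result indicating that the information source is continuously voiced within the second preset time period, and then turning to the step S4;

if yes, go to step A2b;

step A2b, calculating a first ratio of the volume detection result indicating silence within a first preset time period, and determining whether the first ratio is less than a first threshold:

if not, outputting a sound judgment result indicating that the information source is continuously silent within the first preset time period, and then turning to the step S4;

if yes, taking the sound judgment result of the previous volume detection as the sound judgment result of the current volume detection, and then turning to the step S4.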

Specifically, in this embodiment, when the first preset duration is not less than the second preset duration, the voiced determination is performed first. If the second ratio is not less than the second threshold, the proposition that the source is voiced is true, the sound judgment result is voiced, and the silent determination is skipped. If the second ratio is less than the second threshold, the proposition that the source is voiced is false, and the silent determination is performed. If the first ratio is not less than the first threshold, the proposition that the source is silent is true, and the sound judgment result indicating that the information source is continuously silent within the first preset duration is output. If the first ratio is less than the first threshold, the proposition that the source is silent is also false; the current volume detection then yields no valid sound judgment result, and the sound judgment result of the previous volume detection is taken as that of the current volume detection.
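
The voiced-first order (steps A1b and A2b) can be sketched in the same way. As before, detection results are booleans ordered oldest first, both windows are assumed to start at the oldest entry, and the function name, window sizes, and 80% thresholds are illustrative assumptions.

```python
def judge_voiced_first(flags, n_first, n_second, previous,
                       first_threshold=0.8, second_threshold=0.8):
    """Sound judgment when the first preset duration is not the shorter one.

    flags    -- detection results, oldest first (True = voiced, False = silent)
    n_first  -- number of results covered by the first preset duration
    n_second -- number of results covered by the second preset duration
    previous -- sound judgment result of the previous volume detection
    """
    # Step A1b: ratio of voiced results within the second preset duration.
    second = flags[:n_second]
    if second.count(True) / len(second) >= second_threshold:
        return "voiced"
    # Step A2b: ratio of silent results within the first preset duration.
    first = flags[:n_first]
    if first.count(False) / len(first) >= first_threshold:
        return "silent"
    # Neither proposition holds: keep the previous judgment.
    return previous


V, S = True, False
# A speaker starts talking: the short voiced window fills up before
# the long silent window could confirm silence.
result = judge_voiced_first([V] * 5 + [S] * 3, n_first=8, n_second=5,
                            previous="silent")
```

Checking the shorter window first lets the quicker of the two decisions win, so in this configuration a source is recognized as voiced sooner than it is recognized as silent.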

In the preferred embodiment of the present invention, in step S2, a Max channel conversion algorithm is used to perform mono conversion on the two-channel volume value.
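
The source names the Max channel conversion algorithm without defining it. One plausible reading, shown here purely as an assumption, is that the monaural volume is the louder of the two channel volumes, so that speech present on only one channel is not averaged away. The function name `to_mono_max` is hypothetical.

```python
def to_mono_max(left_db: float, right_db: float) -> float:
    """Assumed Max conversion: take the louder of the left and right volumes."""
    return max(left_db, right_db)


# A presenter picked up mainly on the right channel.
mono_db = to_mono_max(-48.0, -14.5)
```

Under this reading the example yields -14.5, the right-channel level, whereas an arithmetic mean of the two channel levels would report a noticeably quieter value.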

In a preferred embodiment of the present invention, in step S4, the video frames of the video streams corresponding to the signal sources are intelligently switched according to the number of sound determination results indicating that the signal sources are continuously voiced in the sound determination results of the current volume detection of all the signal sources.

In a preferred embodiment of the present invention, in step S4, the number of sound determination results indicating that the signal source has continuous sound is counted, and when the number is one, the current video playing interface is intelligently switched to the video frame of the video stream corresponding to the signal source having continuous sound, and when the number is more than one, the current video playing interface is intelligently switched to the video frame indicating the panorama.

Specifically, in this embodiment, when exactly one sound judgment result indicates that its information source is continuously voiced, it may be considered that only one anchor in the television program is speaking continuously; the current video playing interface is then switched to the corresponding video picture for a close-up. If more than one sound judgment result indicates a continuously voiced information source, it may be considered that several anchors are speaking continuously; the current video playing interface is then switched to the panorama video picture so as to include all the speaking anchors.
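
The counting rule above can be expressed as a small selection function. The source does not say what happens when no information source is voiced, so this sketch keeps the current picture in that case; that choice, the function name, and the picture identifiers are assumptions.

```python
def choose_picture(judgments: dict[str, str], current: str,
                   panorama: str = "panorama") -> str:
    """Pick the video picture from the per-source sound judgment results.

    judgments -- maps a source name to "voiced" or "silent"
    current   -- picture currently shown on the video playing interface
    """
    voiced = [name for name, result in judgments.items() if result == "voiced"]
    if len(voiced) == 1:
        return voiced[0]   # close-up of the only speaking anchor
    if len(voiced) > 1:
        return panorama    # several anchors speaking: show everyone
    return current         # assumption: nobody speaking, keep the picture


picture = choose_picture({"camera_a": "silent", "camera_b": "voiced"},
                         current="panorama")
```

In the usage line only `camera_b` is voiced, so the interface would switch to that camera's close-up.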

In a preferred embodiment of the present invention, in step S3, for each volume buffer set, a sliding time window is adopted to perform volume detection on each monaural volume value to obtain a corresponding volume detection result.

Specifically, in this embodiment, the sliding step of the sliding time window may be, but is not limited to, 100 ms. For example, fig. 6 is a schematic diagram of the sliding time windows corresponding to the first preset duration and the second preset duration, and of the sliding step, for two adjacent volume detections. In the current volume detection, the sliding time windows for the first preset duration and the second preset duration start at the rightmost side; in the next volume detection, they start at the audio acquisition time of the third voiced volume detection result counted from the right, so the sliding step covers the audio durations of two voiced volume detection results. Further, at the end of the next volume detection, the two voiced volume detection results covered by the sliding step may be deleted from the volume cache set so as to update it. It is understood that the number of volume detection results covered by the first preset duration, the second preset duration, and the sliding step depends on the acquisition period of the pulse code modulation audio and on the values set for those three parameters; the illustration is only an example and does not limit the present application.
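
The sliding behavior can be sketched with a generator over a FIFO cache. Lengths are expressed as counts of cached results rather than milliseconds; for instance, a 100 ms step with a hypothetical 50 ms acquisition period corresponds to a step of 2 results, matching the two results covered by the step in fig. 6. For simplicity the sketch drops the covered results right after each window is produced, and the function name `slide` is an assumption.

```python
from collections import deque


def slide(cache: deque, window_len: int, step_len: int):
    """Yield successive windows (oldest first) and advance by step_len.

    After each window is produced, the step_len oldest entries are removed
    from the cache, which is how the volume cache set gets updated here.
    """
    while len(cache) >= window_len:
        yield list(cache)[:window_len]
        for _ in range(min(step_len, len(cache))):
            cache.popleft()


cache = deque(["r0", "r1", "r2", "r3", "r4", "r5"])
windows = list(slide(cache, window_len=4, step_len=2))
# windows -> [["r0", "r1", "r2", "r3"], ["r2", "r3", "r4", "r5"]]
```

The second window starts at the third result, two positions after the first window, and the loop stops once fewer results remain than one window needs.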

The present invention further provides an intelligent video stream switching system, which applies the above intelligent video stream switching method. As shown in fig. 7, the intelligent video stream switching system includes:

the volume calculation module 1 is used for continuously collecting the pulse code modulation audio of the dual channels of the multi-channel information sources and calculating to obtain the dual-channel volume value of the pulse code modulation audio of each information source;

the volume conversion module 2 is connected with the volume calculation module 1 and is used for performing single-channel conversion on the dual-channel volume value to obtain a single-channel volume value of each information source, and correspondingly adding the single-channel volume values corresponding to each information source into a volume cache set according to the sequence of corresponding acquisition time;

the volume detection module 3 is connected with the volume conversion module 2 and is used for respectively carrying out volume detection on each single sound channel volume value aiming at each volume cache set to obtain a corresponding volume detection result and outputting a judgment signal when the audio time length of the pulse code modulation audio corresponding to each single sound channel volume value in the volume cache set is not less than a preset value;

and the sound judgment module 4 is connected with the volume detection module 3 and is used for processing the judgment signal and each volume detection result to obtain a sound judgment result of the corresponding information source, intelligently switching the video pictures of the video streams corresponding to each information source according to the sound judgment results of the volume detection of all the information sources at this time, and updating each volume cache set respectively.
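
How the four modules might fit together is sketched below in heavily simplified form. The sketch assumes that channel volumes in decibels are already available (module 1), uses the louder channel as the monaural value (module 2), and uses one window for both preset durations with the voiced determination first (modules 3 and 4). The class name, parameter defaults, the initial panorama picture, and the behavior when no source is voiced are all assumptions.

```python
from collections import deque


class AutoDirector:
    """Toy end-to-end pipeline; parameter values are illustrative only."""

    def __init__(self, sources, window=4, step=2, threshold_db=-40.0, ratio=0.8):
        self.window = window              # results per detection window
        self.step = step                  # results dropped after each detection
        self.threshold_db = threshold_db  # volume threshold
        self.ratio = ratio                # first and second threshold
        self.caches = {name: deque() for name in sources}
        self.judgment = {name: "silent" for name in sources}
        self.picture = "panorama"         # assumed initial picture

    def push(self, source, left_db, right_db):
        """Modules 1-2: store the monaural volume (louder channel assumed)."""
        self.caches[source].append(max(left_db, right_db))

    def tick(self):
        """Modules 3-4: detect, judge, switch, then update the cache sets."""
        if any(len(c) < self.window for c in self.caches.values()):
            return self.picture           # not enough audio buffered yet
        for name, cache in self.caches.items():
            flags = [v >= self.threshold_db for v in list(cache)[: self.window]]
            voiced = sum(flags) / len(flags)
            if voiced >= self.ratio:
                self.judgment[name] = "voiced"
            elif 1.0 - voiced >= self.ratio:
                self.judgment[name] = "silent"
            # Otherwise the previous judgment is kept.
            for _ in range(min(self.step, len(cache))):
                cache.popleft()
        speaking = [n for n, j in self.judgment.items() if j == "voiced"]
        if len(speaking) == 1:
            self.picture = speaking[0]
        elif len(speaking) > 1:
            self.picture = "panorama"
        return self.picture


director = AutoDirector(["A", "B"])
for _ in range(4):
    director.push("A", -10.0, -12.0)   # anchor A is talking
    director.push("B", -60.0, -70.0)   # anchor B is quiet
shown = director.tick()                # close-up of "A"
```

Each call to `tick` corresponds to one pass through steps S3 and S4, and calling it with too little buffered audio simply leaves the current picture in place.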

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims

1. An intelligent video stream switching method is characterized by comprising the following steps:

if yes, returning to the step S1;

2. The intelligent switching method for video streams according to claim 1, wherein the volume detection result is silence or voiced; then in step S4:

3. The method for intelligently switching video streams according to claim 2, wherein the step S3, performing volume detection on each of the mono volume values includes:

if yes, outputting the volume detection result representing silence;

and if not, outputting the volume detection result indicating sound.

4. The method according to claim 2, wherein the first predetermined duration is less than the second predetermined duration, and the pulse code modulated audio data of the second predetermined duration comprises the pulse code modulated audio data of the first predetermined duration; the process of processing to obtain the sound determination result of the corresponding information source according to each of the volume detection results includes:

if yes, go to step A2a;

5. The method according to claim 3, wherein the first predetermined duration is not less than the second predetermined duration, and the pulse code modulation audio data of the first predetermined duration comprises the pulse code modulation audio data of the second predetermined duration; the process of processing to obtain the sound determination result of the corresponding information source according to each of the volume detection results includes:

if yes, go to step A2b;

step A2b, calculating the first proportion of the volume detection result indicating silence within the first preset time period, and determining whether the first proportion is less than a first threshold:

6. The method for intelligent switching of video streams according to claim 1, wherein in step S2, the two-channel volume value is mono-converted by using Max channel conversion algorithm.

7. The method for intelligently switching video streams according to claim 4 or 5, wherein in step S4, the video frames of the video streams corresponding to the respective sources are intelligently switched according to the number of the sound determination results indicating that the sound of the source continues in the sound determination results of the current volume detection of all the sources.

8. The method for intelligently switching video streams according to claim 7, wherein in step S4, the number of the sound determination results indicating that the source is continuously voiced is counted, and when the number is one, the current video playing interface is intelligently switched to the video picture of the video stream corresponding to the source that is continuously voiced, and when the number is more than one, the current video playing interface is intelligently switched to the video picture indicating panorama.

9. The method for intelligently switching video streams according to claim 1, wherein in step S3, for each volume buffer set, a sliding time window is used to perform volume detection on each monaural volume value to obtain the corresponding volume detection result.

10. An intelligent video stream switching system, applied to the intelligent video stream switching method according to any one of claims 1 to 9, the intelligent video stream switching system comprising: