CN116264620B - Live broadcast recorded audio data acquisition and processing method and related device - Google Patents

Live broadcast recorded audio data acquisition and processing method and related device

Info

Publication number
CN116264620B
CN116264620B
Authority
CN
China
Prior art keywords
audio
behavior analysis
data
image
analysis result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310434372.4A
Other languages
Chinese (zh)
Other versions
CN116264620A
Inventor
李庆余
黄智�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shengfeite Technology Co ltd
Original Assignee
Shenzhen Shengfeite Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shengfeite Technology Co ltd
Priority to CN202310434372.4A
Publication of CN116264620A
Application granted
Publication of CN116264620B
Legal status: Active (current)
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 Live feed
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N 21/4334 Recording operations
    • H04N 21/439 Processing of audio elementary streams
    • H04N 21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the field of audio processing, and discloses a live broadcast recorded audio data acquisition and processing method and a related device, which are used for improving the recording efficiency of a recording end and making the main audio clearer. The method comprises the following steps: inputting the first image data into an object behavior analysis model set for behavior analysis to obtain an initial behavior analysis result set; performing behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image series tag; determining second image data according to the image series tag, and matching audio data to be processed according to the second image data; performing background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generating an audio processing strategy according to the background audio fault type; and carrying out background audio adjustment on the audio data to be processed according to the audio processing strategy to generate a target recorded video.

Description

Live broadcast recorded audio data acquisition and processing method and related device
Technical Field
The invention relates to the field of audio processing, in particular to a live broadcast recorded audio data acquisition and processing method and a related device.
Background
With the rapid development of live broadcast technology, live broadcast recording technology has matured. The live broadcast pictures captured at the live broadcast recording end are transmitted and stored over a network, and large-scale content distribution is then carried out through a content distribution network, so that the problem of slow resource access caused by cross-regional network transmission is avoided as far as possible.
However, in the existing scheme, audio noise arises during the live broadcast recording process and can seriously affect the user's live broadcast viewing experience, so noise judgment and noise removal have to be performed manually; that is, the recording efficiency of the existing scheme is low.
Disclosure of Invention
The invention provides a live broadcast recorded audio data acquisition and processing method and a related device, which are used for improving the recording efficiency of a recording end and making the main audio clearer.
The first aspect of the present invention provides a live broadcast recorded audio data acquisition and processing method, which includes:
acquiring original recording data of a target recording object based on a preset live broadcast recording end, and performing image and audio segmentation on the original recording data to obtain first image data and first audio data;
inputting the first image data into a preset object behavior analysis model set, and respectively performing behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set to obtain an initial behavior analysis result set;
performing behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image series tag corresponding to the at least one target behavior analysis result;
determining corresponding second image data according to the image serial labels, and matching audio data to be processed according to the second image data;
performing background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generating an audio processing strategy according to the background audio fault type;
and calling a preset digital sound console, carrying out background audio adjustment on the audio data to be processed according to the audio processing strategy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data.
In combination with the first aspect, the acquiring, by the live broadcast recording end, original recording data of a target recording object, and performing image and audio segmentation on the original recording data to obtain first image data and first audio data includes:
acquiring original recording data of a target recording object based on a preset live broadcast recording end, and acquiring time stamp data of the original recording data;
inputting the original recorded data into a preset video image extraction network to extract video images according to the timestamp data, so as to obtain first image data;
inputting the original recorded data into a preset audio segmentation network to carry out audio data segmentation to obtain initial audio data, and carrying out audio transcoding on the initial audio data according to the timestamp data to obtain first audio data.
In combination with the first aspect, the inputting the first image data into a preset object behavior analysis model set, and performing behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set to obtain an initial behavior analysis result set, including:
inputting the first image data into a preset object behavior analysis model set, wherein the object behavior analysis model set comprises a plurality of object behavior analysis models which are respectively used for performing behavior analysis on different parts;
performing behavior analysis on different parts in the first image data through the object behavior analysis models to obtain a behavior analysis result of each object behavior analysis model;
performing coding storage on the behavior analysis results of each object behavior analysis model to obtain a coding value of each behavior analysis result;
and constructing an initial behavior analysis result set according to the coding value of each behavior analysis result.
In combination with the first aspect, the performing behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image serial label corresponding to the at least one target behavior analysis result, where the method includes:
performing abnormal behavior recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result;
generating at least one information tag according to the at least one target behavior analysis result;
and carrying out image serial connection processing on the at least one information tag to obtain an image serial connection tag corresponding to the at least one target behavior analysis result.
With reference to the first aspect, the determining corresponding second image data according to the image serial label, and matching audio data to be processed according to the second image data includes:
determining corresponding second image data according to the image serial labels;
acquiring an audio segment corresponding to the second image data;
and performing audio data matching on the audio segments to obtain audio data to be processed corresponding to the second image data.
In combination with the first aspect, the performing a background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generating an audio processing policy according to the background audio fault type, includes:
inputting the audio data to be processed into a preset audio fault classification model, wherein the audio fault classification model comprises: a first layer of bidirectional threshold cycle network, a second layer of bidirectional threshold cycle network and a fully connected network;
performing background audio fault analysis on the audio data to be processed through the audio fault classification model to obtain a background audio fault type;
and acquiring a strategy list, and inquiring an audio processing strategy corresponding to the background audio fault type from the strategy list according to the background audio fault type.
In combination with the first aspect, the calling a preset digital sound console, and performing background audio adjustment on the audio data to be processed according to the audio processing policy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data, including:
setting a parameter adjustment value of a preset digital sound console according to the audio processing strategy;
according to the parameter adjustment value, carrying out background audio adjustment on the audio data to be processed to obtain second audio data;
according to the second audio data, audio integration processing is carried out on the first audio data, and audio data after the audio integration processing are obtained;
and carrying out video fusion on the audio data subjected to the audio integration processing and the first image data to generate a target recording video.
The second aspect of the present invention provides a live broadcast recorded audio data acquisition and processing device, where the live broadcast recorded audio data acquisition and processing device includes:
the acquisition module is used for acquiring original recording data of a target recording object based on a preset live broadcast recording end, and performing image and audio segmentation on the original recording data to obtain first image data and first audio data;
the analysis module is used for inputting the first image data into a preset object behavior analysis model set, and respectively carrying out behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set to obtain an initial behavior analysis result set;
the construction module is used for carrying out behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image series label corresponding to the at least one target behavior analysis result;
the matching module is used for determining corresponding second image data according to the image serial labels and matching audio data to be processed according to the second image data;
the processing module is used for carrying out background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generating an audio processing strategy according to the background audio fault type;
the generation module is used for calling a preset digital sound console, carrying out background audio adjustment on the audio data to be processed according to the audio processing strategy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data.
A third aspect of the present invention provides a live recording audio data acquisition processing apparatus, including: a memory and at least one processor, the memory having instructions stored therein; and the at least one processor calls the instruction in the memory so that the live broadcast recorded audio data acquisition processing equipment executes the live broadcast recorded audio data acquisition processing method.
A fourth aspect of the present invention provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the live recorded audio data acquisition processing method described above.
In the technical scheme provided by the invention, the first image data is input into an object behavior analysis model set for behavior analysis to obtain an initial behavior analysis result set; behavior feature recognition is performed on the initial behavior analysis result set to obtain at least one target behavior analysis result, and an image series tag is constructed; second image data is determined according to the image series tag, and audio data to be processed is matched according to the second image data; background audio fault analysis is performed on the audio data to be processed to obtain a background audio fault type, and an audio processing strategy is generated according to the background audio fault type; and background audio adjustment is performed on the audio data to be processed according to the audio processing strategy to generate a target recorded video. By detecting the behavior of the target recording object in real time during live broadcast recording, faulty audio data to be processed is found in time and then corrected through audio processing, which improves the recording efficiency of the recording end and makes the main audio clearer.
Drawings
Fig. 1 is a schematic diagram of an embodiment of a method for collecting and processing live recording audio data according to an embodiment of the present invention;
Fig. 2 is a flow chart of performing behavior analysis on different portions of the first image data according to an embodiment of the present invention;
Fig. 3 is a flow chart of constructing an image series tag in an embodiment of the invention;
Fig. 4 is a flow chart of background audio fault analysis in an embodiment of the invention;
Fig. 5 is a schematic diagram of an embodiment of a live recording audio data acquisition and processing device according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of an embodiment of a live recording audio data acquisition processing device according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a live broadcast recorded audio data acquisition and processing method and a related device, which are used for improving the recording efficiency of a recording end and making the main audio clearer. The terms "first," "second," "third," "fourth" and the like in the description, the claims and the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments described herein may be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below. Referring to fig. 1, an embodiment of the method for acquiring and processing live recorded audio data in the embodiment of the present invention includes:
s101, acquiring original recording data of a target recording object based on a preset live broadcast recording end, and performing image and audio segmentation on the original recording data to obtain first image data and first audio data;
it can be understood that the execution subject of the present invention may be a live recording audio data acquisition and processing device, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as the execution subject as an example.
Specifically, the server collects original recording data of a target recording object based on a preset live broadcast recording end. The server then performs image and audio segmentation on the original recording data: it determines image feature data and audio feature data from the original recording data, and performs image and audio segmentation according to these feature data to obtain first image data and first audio data.
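As an illustration of this segmentation step, the following minimal sketch splits a recorded file into a video-only stream (first image data) and an audio-only stream (first audio data) with the ffmpeg command-line tool; the use of ffmpeg and the file names are assumptions for illustration and are not specified by the patent.

```python
# Minimal sketch: split a recorded file into a video-only part and an audio-only part.
# The ffmpeg CLI and the file names are illustrative assumptions.
import subprocess

def split_recording(original_path: str, video_out: str, audio_out: str) -> None:
    # Extract the video stream without audio (first image data).
    subprocess.run(
        ["ffmpeg", "-y", "-i", original_path, "-an", "-c:v", "copy", video_out],
        check=True,
    )
    # Extract the audio stream without video (first audio data).
    subprocess.run(
        ["ffmpeg", "-y", "-i", original_path, "-vn", "-c:a", "copy", audio_out],
        check=True,
    )

split_recording("original_recording.mp4", "first_image_data.mp4", "first_audio_data.aac")
```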
S102, inputting first image data into a preset object behavior analysis model set, and respectively performing behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set to obtain an initial behavior analysis result set;
specifically, the server inputs the first image data into a preset object behavior analysis model set. Each object behavior analysis model in the set then performs behavior analysis on a different part in the first image data: the server classifies the different parts in the first image data by category and queries a preset behavior database with the images of those parts to obtain an initial behavior analysis result set.
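A hedged sketch of the per-part analysis loop is shown below; the part names, the dictionary layout and the `predict` interface are illustrative assumptions rather than the patent's actual model interface.

```python
# Sketch of running each object behavior analysis model on its own image region.
from typing import Any, Dict, List

def analyse_parts(part_regions: Dict[str, List], model_set: Dict[str, Any]) -> Dict[str, Any]:
    """part_regions maps a part name (e.g. "face", "hands") to its cropped frames;
    model_set maps the same part name to the behavior analysis model trained for it."""
    initial_result_set = {}
    for part_name, model in model_set.items():
        # Each model analyses only the frames of the part it was trained for.
        initial_result_set[part_name] = model.predict(part_regions[part_name])
    return initial_result_set
```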
S103, performing behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image series label corresponding to the at least one target behavior analysis result;
Specifically, when performing behavior feature recognition on the initial behavior analysis result set, the server extracts data from the behavior analysis results to obtain a plurality of conventional feature indexes, extracts a plurality of Mel-frequency cepstral coefficient (MFCC) feature indexes from the initial behavior analysis result set, performs behavior feature recognition according to the MFCC feature indexes to obtain at least one target behavior analysis result, and constructs an image series tag corresponding to the at least one target behavior analysis result.
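The following sketch illustrates the MFCC feature-index extraction mentioned above; the use of librosa and the parameter values (native sample rate, 13 coefficients) are assumptions, not requirements of the patent.

```python
# Sketch of extracting MFCC feature indexes from an audio file.
import librosa
import numpy as np

def extract_mfcc_indexes(audio_path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(audio_path, sr=None)                 # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1)                                  # one averaged index per coefficient

feature_indexes = extract_mfcc_indexes("first_audio_data.wav")
```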
S104, determining corresponding second image data according to the image serial labels, and matching audio data to be processed according to the second image data;
specifically, the server determines the corresponding second image data according to the image serial labels: it preprocesses a plurality of preset candidate images according to the image serial labels, calculates a label correlation value for each candidate image, and performs image screening according to these label correlation values to determine the corresponding second image data. Finally, the server performs audio data matching according to the second image data to obtain the corresponding audio data to be processed.
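A minimal sketch of the screening-and-matching idea follows, assuming a simple tag-overlap correlation score and timestamped image and audio segments; the scoring rule and the data layout are illustrative assumptions.

```python
# Sketch: screen candidate images by a tag correlation value, then match audio by time overlap.
def screen_candidates(series_tags: set, candidates: list, threshold: float = 0.5) -> list:
    second_image_data = []
    for image in candidates:                                  # image: {"tags": [...], "start": s, "end": e}
        overlap = len(series_tags & set(image["tags"]))
        correlation = overlap / max(len(series_tags), 1)      # simple tag-overlap score
        if correlation >= threshold:
            second_image_data.append(image)
    return second_image_data

def match_audio(second_image_data: list, audio_segments: list) -> list:
    """Pick the audio segments whose time range overlaps the selected images."""
    to_process = []
    for image in second_image_data:
        for seg in audio_segments:                            # seg: {"start": s, "end": e, ...}
            if seg["start"] < image["end"] and seg["end"] > image["start"]:
                to_process.append(seg)
    return to_process
```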
S105, analyzing the background audio fault of the audio data to be processed to obtain a background audio fault type, and generating an audio processing strategy according to the background audio fault type;
specifically, the server inputs the audio data to be processed into a preset audio fault classification model, wherein the audio fault classification model comprises a first-layer bidirectional threshold cycle network, a second-layer bidirectional threshold cycle network and a fully connected network. Background audio fault analysis is performed on the audio data to be processed through the audio fault classification model to obtain a background audio fault type; a strategy list is then acquired, and the audio processing strategy corresponding to the background audio fault type is queried from the strategy list according to the background audio fault type.
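The sketch below shows one possible reading of the described classifier, assuming the "bidirectional threshold cycle network" layers correspond to bidirectional gated recurrent (GRU) layers; the feature dimension, hidden size and number of fault classes are illustrative assumptions.

```python
# Sketch of a two-layer bidirectional recurrent classifier with a fully connected output.
import torch
import torch.nn as nn

class AudioFaultClassifier(nn.Module):
    def __init__(self, n_features: int = 40, hidden: int = 128, n_fault_types: int = 5):
        super().__init__()
        self.gru1 = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        self.gru2 = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_fault_types)        # fully connected output layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features) audio feature frames of the audio to be processed
        out, _ = self.gru1(x)
        out, _ = self.gru2(out)
        return self.fc(out[:, -1, :])                          # class scores per fault type

logits = AudioFaultClassifier()(torch.randn(2, 100, 40))       # two example clips
fault_type = logits.argmax(dim=1)
```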
S106, calling a preset digital sound console, performing background audio adjustment on the audio data to be processed according to an audio processing strategy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data.
Specifically, the server calls a preset digital sound console and performs background audio adjustment on the audio data to be processed according to the audio processing strategy. The server receives the actual parameter values of the background audio to be adjusted and sets the parameter adjustment values of the preset digital sound console for those parameters. It then performs background audio adjustment on the audio data to be processed according to the parameter adjustment values to obtain second audio data, performs audio integration processing on the first audio data according to the second audio data to obtain the audio data after audio integration processing, and performs video fusion on the audio data after audio integration processing and the first image data to generate the target recorded video.
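A hedged sketch of looking up and applying a processing strategy is given below; the strategy table, fault-type names and filter settings are illustrative assumptions, and the patent itself routes the adjustment through a digital sound console rather than a software filter.

```python
# Sketch: look up the strategy for a fault type and apply a simple background adjustment.
import numpy as np
from scipy.signal import butter, lfilter

STRATEGY_LIST = {
    "low_frequency_hum":  {"highpass_hz": 120, "gain_db": 0.0},
    "background_chatter": {"highpass_hz": None, "gain_db": -12.0},
}

def apply_strategy(samples: np.ndarray, sr: int, fault_type: str) -> np.ndarray:
    strategy = STRATEGY_LIST[fault_type]
    if strategy["highpass_hz"]:
        # Remove low-frequency background content below the cutoff.
        b, a = butter(4, strategy["highpass_hz"], btype="highpass", fs=sr)
        samples = lfilter(b, a, samples)
    return samples * (10 ** (strategy["gain_db"] / 20.0))     # convert dB to a linear gain
```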
In the embodiment of the invention, the first image data is input into an object behavior analysis model set for behavior analysis to obtain an initial behavior analysis result set; behavior feature recognition is performed on the initial behavior analysis result set to obtain at least one target behavior analysis result, and an image series tag is constructed; second image data is determined according to the image series tag, and audio data to be processed is matched according to the second image data; background audio fault analysis is performed on the audio data to be processed to obtain a background audio fault type, and an audio processing strategy is generated according to the background audio fault type; and background audio adjustment is performed on the audio data to be processed according to the audio processing strategy to generate a target recorded video. By detecting the behavior of the target recording object in real time during live broadcast recording, faulty audio data to be processed is found in time and then corrected through audio processing, which improves the recording efficiency of the recording end and makes the main audio clearer.
In a specific embodiment, the process of executing step S101 may specifically include the following steps:
(1) Acquiring original recording data of a target recording object based on a preset live broadcast recording end, and acquiring time stamp data of the original recording data;
(2) Inputting the original recorded data into a preset video image extraction network to extract video images according to the time stamp data, so as to obtain first image data;
(3) The original recorded data is input into a preset audio segmentation network to carry out audio data segmentation to obtain initial audio data, and the initial audio data is subjected to audio transcoding according to the time stamp data to obtain first audio data.
Specifically, the original recording data of the target recording object is collected based on the preset live broadcast recording end, and the timestamp data of the original recording data is acquired. The server randomly extracts N frames of images from the original recording data, records the position information of the N frames in the video stream, detects the position information of the timestamp rectangular frame in each frame of image through an image timestamp detection algorithm, and splices the timestamp rectangular frame position information of the N frames with the timestamp image data to serve as the timestamp data of the video stream. According to the timestamp data, the original recorded data is input into a preset video image extraction network for video image extraction to obtain the first image data, and into a preset audio segmentation network for audio data segmentation to obtain initial audio data; the initial audio data is then transcoded according to the timestamp data to obtain the first audio data. Here, the server divides the video stream of the original recorded data into at least one group of pictures (GOP) and stores each GOP as a file, divides the audio stream of the original recorded data into at least one audio packet according to a fixed number of frames and stores each audio packet as a file to obtain the initial audio data, and finally performs audio transcoding on the initial audio data according to the timestamp data, writing the initial audio data into each GOP file to obtain the first audio data.
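The following sketch illustrates two of the sub-steps described above: sampling N random frames to locate the on-screen timestamp, and splitting the audio stream into fixed-length packets; the use of OpenCV and ffmpeg, and the segment length, are assumptions for illustration.

```python
# Sketch: sample N random frames (for timestamp detection) and segment the audio stream.
import random
import subprocess
import cv2

def sample_frames(video_path: str, n: int = 5):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    picks = sorted(random.sample(range(total), n))
    frames = []
    for idx in picks:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append((idx, frame))                       # position in the stream + image
    cap.release()
    return frames

# Split the audio stream into fixed-length packets, one file per packet.
subprocess.run(["ffmpeg", "-y", "-i", "first_audio_data.aac", "-f", "segment",
                "-segment_time", "2", "-c", "copy", "audio_packet_%03d.aac"], check=True)
```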
In a specific embodiment, as shown in fig. 2, the process of executing step S102 may specifically include the following steps:
s201, inputting first image data into a preset object behavior analysis model set, wherein the object behavior analysis model set comprises a plurality of object behavior analysis models which are respectively used for performing behavior analysis on different parts;
s202, performing behavior analysis on different parts in the first image data through a plurality of object behavior analysis models respectively to obtain a behavior analysis result of each object behavior analysis model;
s203, coding and storing the behavior analysis result of each object behavior analysis model to obtain a coding value of each behavior analysis result;
s204, constructing an initial behavior analysis result set according to the coding value of each behavior analysis result.
Specifically, the first image data is input into a preset object behavior analysis model set, wherein the object behavior analysis model set comprises a plurality of object behavior analysis models that are respectively used for performing behavior analysis on different parts. Before the first image data is input into the preset object behavior analysis model set, the server constructs training data sets for the different parts from historical image data. Behavior images are generated based on these part-specific training data sets and input into the analysis models to be trained to obtain behavior classifications for the different parts; the prediction loss of each analysis model to be trained is obtained based on these behavior classifications and the real behavior labels, and the set of analysis models to be trained is trained with the prediction loss to obtain the final object behavior analysis model set. The behavior analysis of the different parts in the first image data is then performed through the plurality of object behavior analysis models to obtain the behavior analysis result of each object behavior analysis model, the behavior analysis result of each model is encoded and stored to obtain a coding value for each behavior analysis result, and the initial behavior analysis result set is constructed according to the coding values.
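A minimal training sketch for one part-specific behavior analysis model is shown below, assuming a cross-entropy prediction loss against the real behavior labels; the optimizer, learning rate and data loader are illustrative assumptions.

```python
# Sketch: train one part-specific behavior analysis model with a prediction loss.
import torch
import torch.nn as nn

def train_part_model(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-3) -> nn.Module:
    criterion = nn.CrossEntropyLoss()                         # prediction loss vs. real labels
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for behavior_images, real_behavior_labels in loader:
            optimiser.zero_grad()
            loss = criterion(model(behavior_images), real_behavior_labels)
            loss.backward()
            optimiser.step()
    return model
```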
In a specific embodiment, as shown in fig. 3, the process of executing step S103 may specifically include the following steps:
s301, carrying out abnormal behavior recognition on an initial behavior analysis result set to obtain at least one target behavior analysis result;
s302, generating at least one information tag according to at least one target behavior analysis result;
s303, performing image serial connection processing on at least one information label to obtain an image serial connection label corresponding to at least one target behavior analysis result.
Specifically, abnormal behavior recognition is performed on the initial behavior analysis result set to obtain at least one target behavior analysis result: the server extracts the images of the initial behavior analysis result set, recognizes the facial expressions in those images as well as the actions and behavior trends in them, and generates at least one target behavior analysis result according to the facial expressions, actions and behavior trends. At least one information tag is then generated according to the at least one target behavior analysis result. Finally, the server performs image series processing on the at least one information tag to obtain the image series tag corresponding to the at least one target behavior analysis result: based on a preset sample image, the server performs tag recognition on the series tag in the at least one target behavior analysis result to obtain a target parameter value corresponding to the series tag, and then performs image series processing on the at least one information tag according to that target parameter value to obtain the image series tag corresponding to the at least one target behavior analysis result.
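The sketch below shows one way the recognized facial expressions and action trends could be turned into abnormal-behavior results and a concatenated (series) tag; the decision rule and the tag format are illustrative assumptions only.

```python
# Sketch: flag abnormal frames and concatenate their info tags into an image series tag.
ABNORMAL_EXPRESSIONS = {"startled", "annoyed"}
ABNORMAL_TRENDS = {"covers_microphone", "leaves_seat"}

def build_series_tag(results: list) -> str:
    info_tags = []
    for r in results:                      # each r: {"frame": int, "expression": str, "trend": str}
        if r["expression"] in ABNORMAL_EXPRESSIONS or r["trend"] in ABNORMAL_TRENDS:
            info_tags.append(f'{r["frame"]}:{r["expression"]}/{r["trend"]}')
    return "|".join(info_tags)             # image series tag linking the flagged frames
```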
In a specific embodiment, the process of executing step S104 may specifically include the following steps:
(1) Determining corresponding second image data according to the image serial labels;
(2) Acquiring an audio segment corresponding to the second image data;
(3) And performing audio data matching on the audio segments to obtain audio data to be processed corresponding to the second image data.
Specifically, the corresponding second image data is determined according to the image serial label: the server determines the common features of the image features in one or more pieces of associated image data according to the image serial label, and then determines the corresponding second image data according to those common features.
In a specific embodiment, as shown in fig. 4, the process of performing step S105 may specifically include the following steps:
s401, inputting audio data to be processed into a preset audio fault classification model, wherein the audio fault classification model comprises: a first layer of bidirectional threshold cycle network, a second layer of bidirectional threshold cycle network and a fully connected network;
S402, analyzing background audio faults of the audio data to be processed through an audio fault classification model to obtain a background audio fault type;
s403, acquiring a strategy list, and inquiring an audio processing strategy corresponding to the background audio fault type from the strategy list according to the background audio fault type.
Specifically, the server inputs the audio data to be processed into a preset audio fault classification model, wherein the audio fault classification model comprises a first-layer bidirectional threshold cycle network, a second-layer bidirectional threshold cycle network and a fully connected network, and background audio fault analysis is performed on the audio data to be processed through this model to obtain the background audio fault type. The server preprocesses the audio data to be processed, obtains a time-frequency mask of the equipment sound by using a deep neural network, and separates the pure equipment sound by using the time-frequency mask. An initial analysis model is used to pre-judge audio faults in the separated sound activity; for the audio data regions judged to contain an audio fault, the start and end endpoints of the audio event are detected and the audio event fragment is intercepted, and the detected fragment is then accurately identified to obtain the background audio fault type. Finally, the server acquires a strategy list and queries the audio processing strategy corresponding to the background audio fault type from the strategy list according to the background audio fault type.
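A hedged sketch of the separation and endpoint-detection steps follows: a pre-trained network is assumed to predict a time-frequency mask, the masked spectrogram is inverted, and audio-event endpoints are found with a simple energy threshold; the mask model and the threshold are illustrative assumptions.

```python
# Sketch: mask-based separation followed by energy-threshold endpoint detection.
import numpy as np
import librosa

def separate_and_detect(y: np.ndarray, sr: int, mask_model, frame_hop: int = 512):
    spec = librosa.stft(y, hop_length=frame_hop)
    mask = mask_model(np.abs(spec))                 # assumed: returns values in [0, 1], same shape
    clean = librosa.istft(spec * mask, hop_length=frame_hop)

    # Endpoint detection on the residual (suspected fault) signal.
    residual = y[: len(clean)] - clean
    energy = librosa.feature.rms(y=residual, hop_length=frame_hop)[0]
    active = energy > 2 * np.median(energy)
    events = np.flatnonzero(np.diff(active.astype(int)))      # frames where activity toggles
    return clean, events * frame_hop                          # sample positions of endpoints
```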
In a specific embodiment, the process of executing step S106 may specifically include the following steps:
(1) Setting a parameter adjustment value of a preset digital sound console according to an audio processing strategy;
(2) According to the parameter adjustment value, carrying out background audio adjustment on the audio data to be processed to obtain second audio data;
(3) According to the second audio data, performing audio integration processing on the first audio data to obtain audio data after the audio integration processing;
(4) And carrying out video fusion on the audio data and the first image data after the audio integration processing to generate a target recording video.
Specifically, the parameter adjustment values of the preset digital sound console are set according to the audio processing strategy, and background audio adjustment is performed on the audio data to be processed according to the parameter adjustment values to obtain the second audio data. The server judges whether the sound tuning parameters of the digital sound console meet the parameter adjustment condition, where the parameter adjustment condition indicates that the current parameter value of a sound tuning parameter should be adjusted; when the current parameter value is determined to meet the parameter adjustment condition, the current parameter value is adjusted to the parameter value corresponding to the parameter adjustment value. The server then performs audio integration processing on the first audio data according to the second audio data to obtain the audio data after audio integration processing: it obtains the audio data produced by analog-to-digital conversion of the first audio data, and when two channels of audio data to be integrated exist in the obtained audio data, integrates the two channels to obtain the audio data after audio integration processing. Finally, the server performs video fusion on the audio data after audio integration processing and the first image data to generate the target recorded video.
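The following sketch illustrates the final assembly: a console parameter is adjusted only when its adjustment condition is met, two audio channels to be integrated are mixed, and the integrated audio is fused with the first image data; the helper names and the ffmpeg call are assumptions for illustration.

```python
# Sketch: conditional parameter adjustment, two-channel integration, and audio/video fusion.
import numpy as np
import subprocess

def adjust_parameter(current: float, target: float, condition) -> float:
    return target if condition(current) else current          # adjust only when allowed

def integrate_channels(ch_a: np.ndarray, ch_b: np.ndarray) -> np.ndarray:
    n = min(len(ch_a), len(ch_b))
    return 0.5 * (ch_a[:n] + ch_b[:n])                         # simple equal-weight mix

# Fuse the integrated audio with the first image data into the target recorded video.
subprocess.run(["ffmpeg", "-y", "-i", "first_image_data.mp4", "-i", "integrated_audio.wav",
                "-c:v", "copy", "-c:a", "aac", "-shortest", "target_recorded_video.mp4"],
               check=True)
```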
The method for collecting and processing live recorded audio data in the embodiment of the present invention is described above, and the device for collecting and processing live recorded audio data in the embodiment of the present invention is described below. Referring to fig. 5, one embodiment of the device for collecting and processing live recorded audio data in the embodiment of the present invention includes:
the acquisition module 501 is configured to acquire original recording data of a target recording object based on a preset live broadcast recording end, and perform image and audio segmentation on the original recording data to obtain first image data and first audio data;
the analysis module 502 is configured to input the first image data into a preset object behavior analysis model set, and perform behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set, so as to obtain an initial behavior analysis result set;
a construction module 503, configured to perform behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and construct an image serial label corresponding to the at least one target behavior analysis result;
a matching module 504, configured to determine corresponding second image data according to the image serial label, and match audio data to be processed according to the second image data;
The processing module 505 is configured to perform a background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generate an audio processing policy according to the background audio fault type;
the generating module 506 is configured to call a preset digital sound console, perform background audio adjustment on the audio data to be processed according to the audio processing policy, obtain second audio data, and generate a target recorded video according to the second audio data and the first image data.
Through the cooperation of the above components, the first image data is input into the object behavior analysis model set for behavior analysis to obtain an initial behavior analysis result set; behavior feature recognition is performed on the initial behavior analysis result set to obtain at least one target behavior analysis result, and an image series tag is constructed; second image data is determined according to the image series tag, and audio data to be processed is matched according to the second image data; background audio fault analysis is performed on the audio data to be processed to obtain a background audio fault type, and an audio processing strategy is generated according to the background audio fault type; and background audio adjustment is performed on the audio data to be processed according to the audio processing strategy to generate a target recorded video. By detecting the behavior of the target recording object in real time during live broadcast recording, faulty audio data to be processed is found in time and then corrected through audio processing, which improves the recording efficiency of the recording end and makes the main audio clearer.
Fig. 5 above describes the live recording audio data acquisition and processing device in the embodiment of the present invention in detail from the perspective of a modularized functional entity; the following describes the live recording audio data acquisition processing device in the embodiment of the present invention in detail from the perspective of hardware processing.
Fig. 6 is a schematic structural diagram of a live recording audio data collecting and processing device according to an embodiment of the present invention, where the live recording audio data collecting and processing device 600 may have a relatively large difference due to different configurations or performances, and may include one or more processors (central processing units, CPU) 610 and a memory 620, and one or more storage media 630 (such as one or more mass storage devices) storing application programs 633 or data 632. Wherein the memory 620 and the storage medium 630 may be transitory or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations in the live recording audio data acquisition processing device 600. Still further, the processor 610 may be configured to communicate with the storage medium 630 to execute a series of instruction operations in the storage medium 630 on the live recorded audio data acquisition processing device 600.
The live recorded audio data acquisition processing device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the configuration shown in fig. 6 does not limit the live recording audio data acquisition processing device, which may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.
The invention also provides a live broadcast recorded audio data acquisition and processing device, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and when the computer readable instructions are executed by the processor, the processor executes the steps of the live broadcast recorded audio data acquisition and processing method in the above embodiments.
The present invention also provides a computer readable storage medium, which may be a nonvolatile computer readable storage medium or a volatile computer readable storage medium, in which instructions are stored; when the instructions run on a computer, they cause the computer to execute the steps of the live recording audio data acquisition and processing method.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. The live broadcast recorded audio data acquisition and processing method is characterized by comprising the following steps of:
acquiring original recording data of a target recording object based on a preset live broadcast recording end, and performing image and audio segmentation on the original recording data to obtain first image data and first audio data;
inputting the first image data into a preset object behavior analysis model set, and respectively performing behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set to obtain an initial behavior analysis result set, wherein the method specifically comprises the following steps of: inputting the first image data into a preset object behavior analysis model set, wherein the object behavior analysis model set comprises a plurality of object behavior analysis models which are respectively used for performing behavior analysis on different parts; performing behavior analysis on different parts in the first image data through the object behavior analysis models to obtain a behavior analysis result of each object behavior analysis model; performing coding storage on the behavior analysis results of each object behavior analysis model to obtain a coding value of each behavior analysis result; constructing an initial behavior analysis result set according to the coding value of each behavior analysis result; specifically, before the first image data is input into a preset object behavior analysis model set, different part training data sets in historical image data are constructed; generating a behavior image based on different part training data sets in the historical image data, inputting the behavior image into an analysis model to be trained to obtain behavior classification of the different part training data sets in the historical image data, obtaining prediction loss of the analysis model to be trained based on the behavior classification of the different part training data sets in the historical image data and a real behavior label, and training the analysis model set to be trained by using the prediction loss to obtain an object behavior analysis model set;
Performing behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image series tag corresponding to the at least one target behavior analysis result, wherein the method specifically comprises the following steps of: performing abnormal behavior recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result; generating at least one information tag according to the at least one target behavior analysis result; performing image serial connection processing on the at least one information tag to obtain an image serial connection tag corresponding to the at least one target behavior analysis result; specifically, an image of an initial behavior analysis result set is extracted, facial expressions in the image of the initial behavior analysis result set and actions and behavior trends in the image of the initial behavior analysis result set are identified, at least one target behavior analysis result is generated according to the facial expressions and the actions and the behavior trends in the image of the initial behavior analysis result set, at least one information tag is generated according to the at least one target behavior analysis result, image series processing is carried out on the at least one information tag to obtain an image series tag corresponding to the at least one target behavior analysis result, tag identification is carried out on the information tag in the at least one target behavior analysis result based on a preset sample image, a target parameter value corresponding to the information tag is obtained, image series processing is carried out on the at least one information tag according to the target parameter value corresponding to the information tag, and an image series tag corresponding to the at least one target behavior analysis result is obtained;
Determining corresponding second image data according to the image serial labels, and matching audio data to be processed according to the second image data;
performing background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generating an audio processing strategy according to the background audio fault type, wherein the method specifically comprises the following steps of: inputting the audio data to be processed into a preset audio fault classification model, wherein the audio fault classification model comprises: a first layer of bidirectional threshold cycle network, a second layer of bidirectional threshold cycle network and a fully connected network; performing background audio fault analysis on the audio data to be processed through the audio fault classification model to obtain a background audio fault type; acquiring a strategy list, and inquiring an audio processing strategy corresponding to the background audio fault type from the strategy list according to the background audio fault type;
and calling a preset digital sound console, carrying out background audio adjustment on the audio data to be processed according to the audio processing strategy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data.
2. The method for acquiring and processing live broadcast recorded audio data according to claim 1, wherein the acquiring original recorded data of a target recorded object based on a preset live broadcast recording terminal, and performing image and audio segmentation on the original recorded data to obtain first image data and first audio data, includes:
acquiring original recording data of a target recording object based on a preset live broadcast recording end, and acquiring time stamp data of the original recording data;
inputting the original recorded data into a preset video image extraction network to extract video images according to the timestamp data, so as to obtain first image data;
inputting the original recorded data into a preset audio segmentation network to carry out audio data segmentation to obtain initial audio data, and carrying out audio transcoding on the initial audio data according to the timestamp data to obtain first audio data.
3. The method for collecting and processing live recording audio data according to claim 1, wherein determining corresponding second image data according to the image serial label, and matching audio data to be processed according to the second image data, comprises:
Determining corresponding second image data according to the image serial labels;
acquiring an audio segment corresponding to the second image data;
and performing audio data matching on the audio segments to obtain audio data to be processed corresponding to the second image data.
4. The method for collecting and processing live broadcast recorded audio data according to claim 1, wherein the calling a preset digital sound console, performing background audio adjustment on the audio data to be processed according to the audio processing policy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data comprises:
setting a parameter adjustment value of a preset digital sound console according to the audio processing strategy;
according to the parameter adjustment value, carrying out background audio adjustment on the audio data to be processed to obtain second audio data;
according to the second audio data, audio integration processing is carried out on the first audio data, and audio data after the audio integration processing are obtained;
and carrying out video fusion on the audio data subjected to the audio integration processing and the first image data to generate a target recording video.
5. A live broadcast recorded audio data acquisition and processing device, characterized in that the live broadcast recorded audio data acquisition and processing device comprises:
the acquisition module is used for acquiring original recording data of a target recording object based on a preset live broadcast recording end, and performing image and audio segmentation on the original recording data to obtain first image data and first audio data;
the analysis module is used for inputting the first image data into a preset object behavior analysis model set, and respectively carrying out behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set to obtain an initial behavior analysis result set, and specifically comprises the following steps: inputting the first image data into a preset object behavior analysis model set, wherein the object behavior analysis model set comprises a plurality of object behavior analysis models which are respectively used for performing behavior analysis on different parts; performing behavior analysis on different parts in the first image data through the object behavior analysis models to obtain a behavior analysis result of each object behavior analysis model; performing coding storage on the behavior analysis results of each object behavior analysis model to obtain a coding value of each behavior analysis result; constructing an initial behavior analysis result set according to the coding value of each behavior analysis result; specifically, before the first image data is input into a preset object behavior analysis model set, different part training data sets in historical image data are constructed; generating a behavior image based on different part training data sets in the historical image data, inputting the behavior image into an analysis model to be trained to obtain behavior classification of the different part training data sets in the historical image data, obtaining prediction loss of the analysis model to be trained based on the behavior classification of the different part training data sets in the historical image data and a real behavior label, and training the analysis model set to be trained by using the prediction loss to obtain an object behavior analysis model set;
the construction module is used for performing behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image series tag corresponding to the at least one target behavior analysis result, and specifically comprises the following steps: performing abnormal behavior recognition on the initial behavior analysis result set to obtain the at least one target behavior analysis result; generating at least one information tag according to the at least one target behavior analysis result; and performing image series processing on the at least one information tag to obtain the image series tag corresponding to the at least one target behavior analysis result; specifically, an image of the initial behavior analysis result set is extracted; facial expressions as well as actions and behavior trends in the image are identified; the at least one target behavior analysis result is generated according to the facial expressions, actions and behavior trends; the at least one information tag is generated according to the at least one target behavior analysis result; tag identification is performed on the information tag in the at least one target behavior analysis result based on a preset sample image to obtain a target parameter value corresponding to the information tag; and image series processing is performed on the at least one information tag according to the target parameter value to obtain the image series tag corresponding to the at least one target behavior analysis result;
the matching module is used for determining corresponding second image data according to the image series tag, and matching audio data to be processed according to the second image data;
the processing module is used for performing background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generating an audio processing strategy according to the background audio fault type, and specifically comprises the following steps: inputting the audio data to be processed into a preset audio fault classification model, wherein the audio fault classification model comprises: a first bidirectional gated recurrent network layer, a second bidirectional gated recurrent network layer and a fully connected network; performing background audio fault analysis on the audio data to be processed through the audio fault classification model to obtain the background audio fault type; and acquiring a strategy list, and querying the audio processing strategy corresponding to the background audio fault type from the strategy list according to the background audio fault type (an illustrative sketch of such a classifier follows this claim);
the generation module is used for calling a preset digital sound console, carrying out background audio adjustment on the audio data to be processed according to the audio processing strategy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data.
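Purely as an illustration (not part of the claims): the analysis module above trains one object behavior analysis model per part from historical image data and a real behavior label, using a prediction loss. A minimal PyTorch sketch of such a training loop, assuming a generic classification backbone, cross-entropy as the prediction loss, and a pre-built DataLoader per part, is shown below; all names are hypothetical.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_part_model(model: nn.Module, loader: DataLoader,
                     epochs: int = 5, lr: float = 1e-3) -> nn.Module:
    """Train one object behavior analysis model on one part's training data set."""
    criterion = nn.CrossEntropyLoss()               # prediction loss vs. real behavior labels
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for behavior_images, behavior_labels in loader:
            optimizer.zero_grad()
            logits = model(behavior_images)         # behavior classification for this part
            loss = criterion(logits, behavior_labels)
            loss.backward()
            optimizer.step()
    return model

# One trained model per part forms the object behavior analysis model set, e.g.:
# model_set = {part: train_part_model(make_backbone(), loaders[part]) for part in loaders}
```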
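Purely as an illustration (not part of the claims): the processing module's audio fault classification model is described as two bidirectional gated recurrent layers followed by a fully connected network. A minimal PyTorch sketch of that architecture, assuming MFCC-style feature frames as input and a small set of background fault types, is shown below; the feature, hidden and class dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class AudioFaultClassifier(nn.Module):
    """Two stacked bidirectional GRU layers followed by a fully connected head."""

    def __init__(self, n_features: int = 40, hidden: int = 128, n_fault_types: int = 5):
        super().__init__()
        self.gru1 = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        self.gru2 = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_fault_types),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features) feature frames of the audio data to be processed
        out, _ = self.gru1(x)       # (batch, time, 2 * hidden)
        out, _ = self.gru2(out)     # (batch, time, 2 * hidden)
        return self.fc(out[:, -1])  # fault-type logits from the last time step

# e.g. logits = AudioFaultClassifier()(torch.randn(8, 200, 40))  # -> shape (8, 5)
```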
6. A live recorded audio data acquisition and processing device, wherein the live recorded audio data acquisition and processing device comprises: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invoking the instructions in the memory to cause the live recorded audio data acquisition and processing device to perform the live recorded audio data acquisition and processing method of any one of claims 1-4.
7. A computer-readable storage medium having instructions stored thereon which, when executed by a processor, implement the live recorded audio data acquisition and processing method of any one of claims 1-4.
CN202310434372.4A 2023-04-21 2023-04-21 Live broadcast recorded audio data acquisition and processing method and related device Active CN116264620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310434372.4A CN116264620B (en) 2023-04-21 2023-04-21 Live broadcast recorded audio data acquisition and processing method and related device

Publications (2)

Publication Number Publication Date
CN116264620A (en) 2023-06-16
CN116264620B (en) 2023-07-25

Family

ID=86723143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310434372.4A Active CN116264620B (en) 2023-04-21 2023-04-21 Live broadcast recorded audio data acquisition and processing method and related device

Country Status (1)

Country Link
CN (1) CN116264620B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110677716A (en) * 2019-08-20 2020-01-10 咪咕音乐有限公司 Audio processing method, electronic device, and storage medium
CN113192532A (en) * 2021-03-29 2021-07-30 安徽理工大学 Mine hoist fault acoustic analysis method based on MFCC-CNN
US11490133B1 (en) * 2019-12-09 2022-11-01 Amazon Technologies, Inc. Insertion of directed content into a video asset

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109074822B (en) * 2017-10-24 2023-04-21 深圳和而泰智能控制股份有限公司 Specific voice recognition method, apparatus and storage medium
CN113539294A (en) * 2021-05-31 2021-10-22 河北工业大学 Method for collecting and identifying sound of abnormal state of live pig
CN114530166A (en) * 2022-01-29 2022-05-24 国网福建省电力有限公司电力科学研究院 Transformer on-load tap-changer fault diagnosis method based on background sound texture
CN115119007B (en) * 2022-06-23 2023-03-03 恩平市新盈科电声科技有限公司 Big data based audio acquisition and processing system and method for online live broadcast recording

Similar Documents

Publication Publication Date Title
CN110033756B (en) Language identification method and device, electronic equipment and storage medium
EP2437255A2 (en) Automatic identification of repeated material in audio signals
US9224048B2 (en) Scene-based people metering for audience measurement
CN108615532B (en) Classification method and device applied to sound scene
CN113762377B (en) Network traffic identification method, device, equipment and storage medium
CN111312286A (en) Age identification method, age identification device, age identification equipment and computer readable storage medium
CN113129927A (en) Voice emotion recognition method, device, equipment and storage medium
CN107277557B (en) A kind of methods of video segmentation and system
CN116264620B (en) Live broadcast recorded audio data acquisition and processing method and related device
CN112383488B (en) Content identification method suitable for encrypted and non-encrypted data streams
CN112562727B (en) Audio scene classification method, device and equipment applied to audio monitoring
CN113115107B (en) Handheld video acquisition terminal system based on 5G network
CN113473117B (en) Non-reference audio and video quality evaluation method based on gated recurrent neural network
CN115424253A (en) License plate recognition method and device, electronic equipment and storage medium
CN111553408B (en) Automatic test method for video recognition software
CN113593603A (en) Audio category determination method and device, storage medium and electronic device
CN113555022A (en) Voice-based same-person identification method, device, equipment and storage medium
CN110163043B (en) Face detection method, device, storage medium and electronic device
CN113362832A (en) Naming method and related device for audio and video characters
CN117351988B (en) Remote audio information processing method and system based on data analysis
CN115953724B (en) User data analysis and management method, device, equipment and storage medium
CN116975938B (en) Sensor data processing method in product manufacturing process
CN108417221B (en) Digital interphone sound code type detection method based on signal two-dimensional recombination fusion filtering
CN117316184B (en) Event detection feedback processing system based on audio signals
CN114302107A (en) Network interaction system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant