CN116264620B - Live broadcast recorded audio data acquisition and processing method and related device - Google Patents

Live broadcast recorded audio data acquisition and processing method and related device

Info

Publication number
CN116264620B
CN116264620B
Authority
CN
China
Prior art keywords
audio
behavior analysis
data
image
analysis result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310434372.4A
Other languages
Chinese (zh)
Other versions
CN116264620A
Inventor
李庆余
黄智�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shengfeite Technology Co ltd
Original Assignee
Shenzhen Shengfeite Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shengfeite Technology Co ltd
Priority to CN202310434372.4A
Publication of CN116264620A
Application granted
Publication of CN116264620B
Legal status: Active (current)
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 Live feed
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N 21/4334 Recording operations
    • H04N 21/439 Processing of audio elementary streams
    • H04N 21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the field of audio processing, and discloses a live broadcast recorded audio data acquisition and processing method and a related device, which are used for improving the recording efficiency of a recording end and making the main audio clearer. The method comprises the following steps: inputting the first image data into an object behavior analysis model set for behavior analysis to obtain an initial behavior analysis result set; performing behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image series tag; determining second image data according to the image series tag, and matching audio data to be processed according to the second image data; performing background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generating an audio processing strategy according to the background audio fault type; and carrying out background audio adjustment on the audio data to be processed according to the audio processing strategy to generate a target recorded video.

Description

Live broadcast recorded audio data acquisition and processing method and related device
Technical Field
The invention relates to the field of audio processing, in particular to a live broadcast recorded audio data acquisition and processing method and a related device.
Background
With the rapid development of live broadcast technology, live broadcast recording technology has matured. The live broadcast pictures captured at the live broadcast recording end are transmitted and stored over a network, and large-scale content distribution is then carried out through a content distribution network, so that the problem of slow resource access caused by cross-regional network transmission is avoided as far as possible.
However, in the existing scheme, audio noise arises during the live broadcast recording process and can seriously affect the user's live broadcast viewing experience, so noise judgment and noise removal have to be performed manually; that is, the recording efficiency of the existing scheme is low.
Disclosure of Invention
The invention provides a live broadcast recorded audio data acquisition and processing method and a related device, which are used for improving the recording efficiency of a recording end and making the main audio clearer.
The first aspect of the present invention provides a live broadcast recorded audio data acquisition and processing method, which includes:
acquiring original recording data of a target recording object based on a preset live broadcast recording end, and performing image and audio segmentation on the original recording data to obtain first image data and first audio data;
inputting the first image data into a preset object behavior analysis model set, and respectively performing behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set to obtain an initial behavior analysis result set;
performing behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image series tag corresponding to the at least one target behavior analysis result;
determining corresponding second image data according to the image serial labels, and matching audio data to be processed according to the second image data;
performing background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generating an audio processing strategy according to the background audio fault type;
and calling a preset digital sound console, carrying out background audio adjustment on the audio data to be processed according to the audio processing strategy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data.
In combination with the first aspect, the acquiring, by the live broadcast recording end, original recording data of a target recording object, and performing image and audio segmentation on the original recording data to obtain first image data and first audio data includes:
acquiring original recording data of a target recording object based on a preset live broadcast recording end, and acquiring time stamp data of the original recording data;
inputting the original recorded data into a preset video image extraction network to extract video images according to the timestamp data, so as to obtain first image data;
inputting the original recorded data into a preset audio segmentation network to carry out audio data segmentation to obtain initial audio data, and carrying out audio transcoding on the initial audio data according to the timestamp data to obtain first audio data.
In combination with the first aspect, the inputting the first image data into a preset object behavior analysis model set, and performing behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set to obtain an initial behavior analysis result set, including:
inputting the first image data into a preset object behavior analysis model set, wherein the object behavior analysis model set comprises a plurality of object behavior analysis models which are respectively used for performing behavior analysis on different parts;
performing behavior analysis on different parts in the first image data through the object behavior analysis models to obtain a behavior analysis result of each object behavior analysis model;
performing coding storage on the behavior analysis results of each object behavior analysis model to obtain a coding value of each behavior analysis result;
and constructing an initial behavior analysis result set according to the coding value of each behavior analysis result.
In combination with the first aspect, the performing behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image serial label corresponding to the at least one target behavior analysis result, where the method includes:
performing abnormal behavior recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result;
generating at least one information tag according to the at least one target behavior analysis result;
and carrying out image serial connection processing on the at least one information tag to obtain an image serial connection tag corresponding to the at least one target behavior analysis result.
With reference to the first aspect, the determining corresponding second image data according to the image serial label, and matching audio data to be processed according to the second image data includes:
determining corresponding second image data according to the image serial labels;
acquiring an audio segment corresponding to the second image data;
and performing audio data matching on the audio segments to obtain audio data to be processed corresponding to the second image data.
In combination with the first aspect, the performing a background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generating an audio processing policy according to the background audio fault type, includes:
inputting the audio data to be processed into a preset audio fault classification model, wherein the audio fault classification model comprises: a first layer of bidirectional threshold cycle network, a second layer of bidirectional threshold cycle network and a fully connected network;
performing background audio fault analysis on the audio data to be processed through the audio fault classification model to obtain a background audio fault type;
and acquiring a strategy list, and inquiring an audio processing strategy corresponding to the background audio fault type from the strategy list according to the background audio fault type.
In combination with the first aspect, the calling a preset digital sound console, and performing background audio adjustment on the audio data to be processed according to the audio processing policy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data, including:
setting a parameter adjustment value of a preset digital sound console according to the audio processing strategy;
according to the parameter adjustment value, carrying out background audio adjustment on the audio data to be processed to obtain second audio data;
according to the second audio data, audio integration processing is carried out on the first audio data, and audio data after the audio integration processing are obtained;
and carrying out video fusion on the audio data subjected to the audio integration processing and the first image data to generate a target recording video.
The second aspect of the present invention provides a live broadcast recorded audio data acquisition and processing device, where the live broadcast recorded audio data acquisition and processing device includes:
the acquisition module is used for acquiring original recording data of a target recording object based on a preset live broadcast recording end, and performing image and audio segmentation on the original recording data to obtain first image data and first audio data;
the analysis module is used for inputting the first image data into a preset object behavior analysis model set, and respectively carrying out behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set to obtain an initial behavior analysis result set;
the construction module is used for carrying out behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image series label corresponding to the at least one target behavior analysis result;
the matching module is used for determining corresponding second image data according to the image serial labels and matching audio data to be processed according to the second image data;
the processing module is used for carrying out background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generating an audio processing strategy according to the background audio fault type;
the generation module is used for calling a preset digital sound console, carrying out background audio adjustment on the audio data to be processed according to the audio processing strategy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data.
A third aspect of the present invention provides a live recording audio data acquisition processing apparatus, including: a memory and at least one processor, the memory having instructions stored therein; and the at least one processor calls the instruction in the memory so that the live broadcast recorded audio data acquisition processing equipment executes the live broadcast recorded audio data acquisition processing method.
A fourth aspect of the present invention provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the live recorded audio data acquisition processing method described above.
In the technical scheme provided by the invention, the first image data is input into an object behavior analysis model set for behavior analysis to obtain an initial behavior analysis result set; behavior feature recognition is performed on the initial behavior analysis result set to obtain at least one target behavior analysis result, and an image series tag is constructed; second image data is determined according to the image series tag, and audio data to be processed is matched according to the second image data; background audio fault analysis is performed on the audio data to be processed to obtain a background audio fault type, and an audio processing strategy is generated according to the background audio fault type; and background audio adjustment is performed on the audio data to be processed according to the audio processing strategy to generate a target recorded video. By detecting the behavior of the target recording object in real time during live broadcast recording, faulty audio data to be processed is found in time and then corrected through audio processing, which improves the recording efficiency of the recording end and makes the main audio clearer.
Drawings
Fig. 1 is a schematic diagram of an embodiment of a method for collecting and processing live recording audio data according to an embodiment of the present invention;
Fig. 2 is a flow chart of performing behavior analysis on different portions of the first image data according to an embodiment of the present invention;
Fig. 3 is a flow chart of constructing an image series tag in an embodiment of the invention;
Fig. 4 is a flow chart of background audio fault analysis in an embodiment of the invention;
Fig. 5 is a schematic diagram of an embodiment of a live recording audio data acquisition and processing device according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of an embodiment of a live recording audio data acquisition processing device according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a live broadcast recorded audio data acquisition and processing method and a related device, which are used for improving the recording efficiency of a recording end and making the main audio clearer. The terms "first," "second," "third," "fourth" and the like in the description, the claims and the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments described herein may be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below. Referring to fig. 1, an embodiment of the method for acquiring and processing live recorded audio data in the embodiment of the present invention includes:
s101, acquiring original recording data of a target recording object based on a preset live broadcast recording end, and performing image and audio segmentation on the original recording data to obtain first image data and first audio data;
it can be understood that the execution subject of the present invention may be a live recording audio data acquisition and processing device, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as the execution subject as an example.
Specifically, the server collects original recording data of a target recording object based on a preset live broadcast recording end. The server then performs image and audio segmentation on the original recording data: it determines image feature data and audio feature data from the original recording data, and performs image and audio segmentation according to these feature data to obtain first image data and first audio data.
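As an illustration of this segmentation step, the following minimal sketch splits a recorded file into a video-only stream (first image data) and an audio-only stream (first audio data) with the ffmpeg command-line tool; the use of ffmpeg and the file names are assumptions for illustration and are not specified by the patent.

```python
# Minimal sketch: split a recorded file into a video-only part and an audio-only part.
# The ffmpeg CLI and the file names are illustrative assumptions.
import subprocess

def split_recording(original_path: str, video_out: str, audio_out: str) -> None:
    # Extract the video stream without audio (first image data).
    subprocess.run(
        ["ffmpeg", "-y", "-i", original_path, "-an", "-c:v", "copy", video_out],
        check=True,
    )
    # Extract the audio stream without video (first audio data).
    subprocess.run(
        ["ffmpeg", "-y", "-i", original_path, "-vn", "-c:a", "copy", audio_out],
        check=True,
    )

split_recording("original_recording.mp4", "first_image_data.mp4", "first_audio_data.aac")
```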
S102, inputting first image data into a preset object behavior analysis model set, and respectively performing behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set to obtain an initial behavior analysis result set;
specifically, the server inputs the first image data into a preset object behavior analysis model set. Each object behavior analysis model in the set then performs behavior analysis on a different part in the first image data: the server classifies the different parts in the first image data by category and queries a preset behavior database with the images of those parts to obtain an initial behavior analysis result set.
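A hedged sketch of the per-part analysis loop is shown below; the part names, the dictionary layout and the `predict` interface are illustrative assumptions rather than the patent's actual model interface.

```python
# Sketch of running each object behavior analysis model on its own image region.
from typing import Any, Dict, List

def analyse_parts(part_regions: Dict[str, List], model_set: Dict[str, Any]) -> Dict[str, Any]:
    """part_regions maps a part name (e.g. "face", "hands") to its cropped frames;
    model_set maps the same part name to the behavior analysis model trained for it."""
    initial_result_set = {}
    for part_name, model in model_set.items():
        # Each model analyses only the frames of the part it was trained for.
        initial_result_set[part_name] = model.predict(part_regions[part_name])
    return initial_result_set
```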
S103, performing behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image series label corresponding to the at least one target behavior analysis result;
Specifically, when performing behavior feature recognition on the initial behavior analysis result set, the server extracts data from the behavior analysis results to obtain a plurality of conventional feature indexes, extracts a plurality of Mel-frequency cepstral coefficient (MFCC) feature indexes from the initial behavior analysis result set, performs behavior feature recognition according to the MFCC feature indexes to obtain at least one target behavior analysis result, and constructs an image series tag corresponding to the at least one target behavior analysis result.
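The following sketch illustrates the MFCC feature-index extraction mentioned above; the use of librosa and the parameter values (native sample rate, 13 coefficients) are assumptions, not requirements of the patent.

```python
# Sketch of extracting MFCC feature indexes from an audio file.
import librosa
import numpy as np

def extract_mfcc_indexes(audio_path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(audio_path, sr=None)                 # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1)                                  # one averaged index per coefficient

feature_indexes = extract_mfcc_indexes("first_audio_data.wav")
```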
S104, determining corresponding second image data according to the image serial labels, and matching audio data to be processed according to the second image data;
specifically, the server determines the corresponding second image data according to the image serial labels: it preprocesses a plurality of preset candidate images according to the image serial labels, calculates a label correlation value for each candidate image, and performs image screening according to these label correlation values to determine the corresponding second image data. Finally, the server performs audio data matching according to the second image data to obtain the corresponding audio data to be processed.
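A minimal sketch of the screening-and-matching idea follows, assuming a simple tag-overlap correlation score and timestamped image and audio segments; the scoring rule and the data layout are illustrative assumptions.

```python
# Sketch: screen candidate images by a tag correlation value, then match audio by time overlap.
def screen_candidates(series_tags: set, candidates: list, threshold: float = 0.5) -> list:
    second_image_data = []
    for image in candidates:                                  # image: {"tags": [...], "start": s, "end": e}
        overlap = len(series_tags & set(image["tags"]))
        correlation = overlap / max(len(series_tags), 1)      # simple tag-overlap score
        if correlation >= threshold:
            second_image_data.append(image)
    return second_image_data

def match_audio(second_image_data: list, audio_segments: list) -> list:
    """Pick the audio segments whose time range overlaps the selected images."""
    to_process = []
    for image in second_image_data:
        for seg in audio_segments:                            # seg: {"start": s, "end": e, ...}
            if seg["start"] < image["end"] and seg["end"] > image["start"]:
                to_process.append(seg)
    return to_process
```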
S105, analyzing the background audio fault of the audio data to be processed to obtain a background audio fault type, and generating an audio processing strategy according to the background audio fault type;
specifically, the server inputs the audio data to be processed into a preset audio fault classification model, wherein the audio fault classification model comprises a first-layer bidirectional threshold cycle network, a second-layer bidirectional threshold cycle network and a fully connected network. Background audio fault analysis is performed on the audio data to be processed through the audio fault classification model to obtain a background audio fault type; a strategy list is then acquired, and the audio processing strategy corresponding to the background audio fault type is queried from the strategy list according to the background audio fault type.
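The sketch below shows one possible reading of the described classifier, assuming the "bidirectional threshold cycle network" layers correspond to bidirectional gated recurrent (GRU) layers; the feature dimension, hidden size and number of fault classes are illustrative assumptions.

```python
# Sketch of a two-layer bidirectional recurrent classifier with a fully connected output.
import torch
import torch.nn as nn

class AudioFaultClassifier(nn.Module):
    def __init__(self, n_features: int = 40, hidden: int = 128, n_fault_types: int = 5):
        super().__init__()
        self.gru1 = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        self.gru2 = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_fault_types)        # fully connected output layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features) audio feature frames of the audio to be processed
        out, _ = self.gru1(x)
        out, _ = self.gru2(out)
        return self.fc(out[:, -1, :])                          # class scores per fault type

logits = AudioFaultClassifier()(torch.randn(2, 100, 40))       # two example clips
fault_type = logits.argmax(dim=1)
```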
S106, calling a preset digital sound console, performing background audio adjustment on the audio data to be processed according to an audio processing strategy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data.
Specifically, the server calls a preset digital sound console and performs background audio adjustment on the audio data to be processed according to the audio processing strategy. The server receives the actual parameter values of the background audio to be adjusted and sets the parameter adjustment values of the preset digital sound console for those parameters. It then performs background audio adjustment on the audio data to be processed according to the parameter adjustment values to obtain second audio data, performs audio integration processing on the first audio data according to the second audio data to obtain the audio data after audio integration processing, and performs video fusion on the audio data after audio integration processing and the first image data to generate the target recorded video.
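A hedged sketch of looking up and applying a processing strategy is given below; the strategy table, fault-type names and filter settings are illustrative assumptions, and the patent itself routes the adjustment through a digital sound console rather than a software filter.

```python
# Sketch: look up the strategy for a fault type and apply a simple background adjustment.
import numpy as np
from scipy.signal import butter, lfilter

STRATEGY_LIST = {
    "low_frequency_hum":  {"highpass_hz": 120, "gain_db": 0.0},
    "background_chatter": {"highpass_hz": None, "gain_db": -12.0},
}

def apply_strategy(samples: np.ndarray, sr: int, fault_type: str) -> np.ndarray:
    strategy = STRATEGY_LIST[fault_type]
    if strategy["highpass_hz"]:
        # Remove low-frequency background content below the cutoff.
        b, a = butter(4, strategy["highpass_hz"], btype="highpass", fs=sr)
        samples = lfilter(b, a, samples)
    return samples * (10 ** (strategy["gain_db"] / 20.0))     # convert dB to a linear gain
```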
In the embodiment of the invention, the first image data is input into an object behavior analysis model set for behavior analysis to obtain an initial behavior analysis result set; behavior feature recognition is performed on the initial behavior analysis result set to obtain at least one target behavior analysis result, and an image series tag is constructed; second image data is determined according to the image series tag, and audio data to be processed is matched according to the second image data; background audio fault analysis is performed on the audio data to be processed to obtain a background audio fault type, and an audio processing strategy is generated according to the background audio fault type; and background audio adjustment is performed on the audio data to be processed according to the audio processing strategy to generate a target recorded video. By detecting the behavior of the target recording object in real time during live broadcast recording, faulty audio data to be processed is found in time and then corrected through audio processing, which improves the recording efficiency of the recording end and makes the main audio clearer.
In a specific embodiment, the process of executing step S101 may specifically include the following steps:
(1) Acquiring original recording data of a target recording object based on a preset live broadcast recording end, and acquiring time stamp data of the original recording data;
(2) Inputting the original recorded data into a preset video image extraction network to extract video images according to the time stamp data, so as to obtain first image data;
(3) The original recorded data is input into a preset audio segmentation network to carry out audio data segmentation to obtain initial audio data, and the initial audio data is subjected to audio transcoding according to the time stamp data to obtain first audio data.
Specifically, the original recording data of the target recording object is collected based on the preset live broadcast recording end, and the timestamp data of the original recording data is acquired. The server randomly extracts N frames of images from the original recording data, records the position information of the N frames in the video stream, detects the position information of the timestamp rectangular frame in each frame of image through an image timestamp detection algorithm, and splices the timestamp rectangular frame position information of the N frames with the timestamp image data to serve as the timestamp data of the video stream. According to the timestamp data, the original recorded data is input into a preset video image extraction network for video image extraction to obtain the first image data, and into a preset audio segmentation network for audio data segmentation to obtain initial audio data; the initial audio data is then transcoded according to the timestamp data to obtain the first audio data. Here, the server divides the video stream of the original recorded data into at least one group of pictures (GOP) and stores each GOP as a file, divides the audio stream of the original recorded data into at least one audio packet according to a fixed number of frames and stores each audio packet as a file to obtain the initial audio data, and finally performs audio transcoding on the initial audio data according to the timestamp data, writing the initial audio data into each GOP file to obtain the first audio data.
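The following sketch illustrates two of the sub-steps described above: sampling N random frames to locate the on-screen timestamp, and splitting the audio stream into fixed-length packets; the use of OpenCV and ffmpeg, and the segment length, are assumptions for illustration.

```python
# Sketch: sample N random frames (for timestamp detection) and segment the audio stream.
import random
import subprocess
import cv2

def sample_frames(video_path: str, n: int = 5):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    picks = sorted(random.sample(range(total), n))
    frames = []
    for idx in picks:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append((idx, frame))                       # position in the stream + image
    cap.release()
    return frames

# Split the audio stream into fixed-length packets, one file per packet.
subprocess.run(["ffmpeg", "-y", "-i", "first_audio_data.aac", "-f", "segment",
                "-segment_time", "2", "-c", "copy", "audio_packet_%03d.aac"], check=True)
```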
In a specific embodiment, as shown in fig. 2, the process of executing step S102 may specifically include the following steps:
s201, inputting first image data into a preset object behavior analysis model set, wherein the object behavior analysis model set comprises a plurality of object behavior analysis models which are respectively used for performing behavior analysis on different parts;
s202, performing behavior analysis on different parts in the first image data through a plurality of object behavior analysis models respectively to obtain a behavior analysis result of each object behavior analysis model;
s203, coding and storing the behavior analysis result of each object behavior analysis model to obtain a coding value of each behavior analysis result;
s204, constructing an initial behavior analysis result set according to the coding value of each behavior analysis result.
Specifically, the first image data is input into a preset object behavior analysis model set, wherein the object behavior analysis model set comprises a plurality of object behavior analysis models that are respectively used for performing behavior analysis on different parts. Before the first image data is input into the preset object behavior analysis model set, the server constructs training data sets for the different parts from historical image data. Behavior images are generated based on these part-specific training data sets and input into the analysis models to be trained to obtain behavior classifications for the different parts; the prediction loss of each analysis model to be trained is obtained based on these behavior classifications and the real behavior labels, and the set of analysis models to be trained is trained with the prediction loss to obtain the final object behavior analysis model set. The behavior analysis of the different parts in the first image data is then performed through the plurality of object behavior analysis models to obtain the behavior analysis result of each object behavior analysis model, the behavior analysis result of each model is encoded and stored to obtain a coding value for each behavior analysis result, and the initial behavior analysis result set is constructed according to the coding values.
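A minimal training sketch for one part-specific behavior analysis model is shown below, assuming a cross-entropy prediction loss against the real behavior labels; the optimizer, learning rate and data loader are illustrative assumptions.

```python
# Sketch: train one part-specific behavior analysis model with a prediction loss.
import torch
import torch.nn as nn

def train_part_model(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-3) -> nn.Module:
    criterion = nn.CrossEntropyLoss()                         # prediction loss vs. real labels
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for behavior_images, real_behavior_labels in loader:
            optimiser.zero_grad()
            loss = criterion(model(behavior_images), real_behavior_labels)
            loss.backward()
            optimiser.step()
    return model
```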
In a specific embodiment, as shown in fig. 3, the process of executing step S103 may specifically include the following steps:
s301, carrying out abnormal behavior recognition on an initial behavior analysis result set to obtain at least one target behavior analysis result;
s302, generating at least one information tag according to at least one target behavior analysis result;
s303, performing image serial connection processing on at least one information label to obtain an image serial connection label corresponding to at least one target behavior analysis result.
Specifically, abnormal behavior recognition is performed on the initial behavior analysis result set to obtain at least one target behavior analysis result: the server extracts the images of the initial behavior analysis result set, recognizes the facial expressions in those images as well as the actions and behavior trends in them, and generates at least one target behavior analysis result according to the facial expressions, actions and behavior trends. At least one information tag is then generated according to the at least one target behavior analysis result. Finally, the server performs image series processing on the at least one information tag to obtain the image series tag corresponding to the at least one target behavior analysis result: based on a preset sample image, the server performs tag recognition on the series tag in the at least one target behavior analysis result to obtain a target parameter value corresponding to the series tag, and then performs image series processing on the at least one information tag according to that target parameter value to obtain the image series tag corresponding to the at least one target behavior analysis result.
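The sketch below shows one way the recognized facial expressions and action trends could be turned into abnormal-behavior results and a concatenated (series) tag; the decision rule and the tag format are illustrative assumptions only.

```python
# Sketch: flag abnormal frames and concatenate their info tags into an image series tag.
ABNORMAL_EXPRESSIONS = {"startled", "annoyed"}
ABNORMAL_TRENDS = {"covers_microphone", "leaves_seat"}

def build_series_tag(results: list) -> str:
    info_tags = []
    for r in results:                      # each r: {"frame": int, "expression": str, "trend": str}
        if r["expression"] in ABNORMAL_EXPRESSIONS or r["trend"] in ABNORMAL_TRENDS:
            info_tags.append(f'{r["frame"]}:{r["expression"]}/{r["trend"]}')
    return "|".join(info_tags)             # image series tag linking the flagged frames
```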
In a specific embodiment, the process of executing step S104 may specifically include the following steps:
(1) Determining corresponding second image data according to the image serial labels;
(2) Acquiring an audio segment corresponding to the second image data;
(3) And performing audio data matching on the audio segments to obtain audio data to be processed corresponding to the second image data.
Specifically, the corresponding second image data is determined according to the image serial label: the server determines the common features of the image features in one or more pieces of associated image data according to the image serial label, and then determines the corresponding second image data according to those common features.
In a specific embodiment, as shown in fig. 4, the process of performing step S105 may specifically include the following steps:
s401, inputting audio data to be processed into a preset audio fault classification model, wherein the audio fault classification model comprises: a first layer of bidirectional threshold cycle network, a second layer of bidirectional threshold cycle network and a fully connected network;
S402, analyzing background audio faults of the audio data to be processed through an audio fault classification model to obtain a background audio fault type;
s403, acquiring a strategy list, and inquiring an audio processing strategy corresponding to the background audio fault type from the strategy list according to the background audio fault type.
Specifically, the server inputs the audio data to be processed into a preset audio fault classification model, wherein the audio fault classification model comprises a first-layer bidirectional threshold cycle network, a second-layer bidirectional threshold cycle network and a fully connected network, and background audio fault analysis is performed on the audio data to be processed through this model to obtain the background audio fault type. The server preprocesses the audio data to be processed, obtains a time-frequency mask of the equipment sound by using a deep neural network, and separates the pure equipment sound by using the time-frequency mask. An initial analysis model is used to pre-judge audio faults in the separated sound activity; for the audio data regions judged to contain an audio fault, the start and end endpoints of the audio event are detected and the audio event fragment is intercepted, and the detected fragment is then accurately identified to obtain the background audio fault type. Finally, the server acquires a strategy list and queries the audio processing strategy corresponding to the background audio fault type from the strategy list according to the background audio fault type.
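A hedged sketch of the separation and endpoint-detection steps follows: a pre-trained network is assumed to predict a time-frequency mask, the masked spectrogram is inverted, and audio-event endpoints are found with a simple energy threshold; the mask model and the threshold are illustrative assumptions.

```python
# Sketch: mask-based separation followed by energy-threshold endpoint detection.
import numpy as np
import librosa

def separate_and_detect(y: np.ndarray, sr: int, mask_model, frame_hop: int = 512):
    spec = librosa.stft(y, hop_length=frame_hop)
    mask = mask_model(np.abs(spec))                 # assumed: returns values in [0, 1], same shape
    clean = librosa.istft(spec * mask, hop_length=frame_hop)

    # Endpoint detection on the residual (suspected fault) signal.
    residual = y[: len(clean)] - clean
    energy = librosa.feature.rms(y=residual, hop_length=frame_hop)[0]
    active = energy > 2 * np.median(energy)
    events = np.flatnonzero(np.diff(active.astype(int)))      # frames where activity toggles
    return clean, events * frame_hop                          # sample positions of endpoints
```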
In a specific embodiment, the process of executing step S106 may specifically include the following steps:
(1) Setting a parameter adjustment value of a preset digital sound console according to an audio processing strategy;
(2) According to the parameter adjustment value, carrying out background audio adjustment on the audio data to be processed to obtain second audio data;
(3) According to the second audio data, performing audio integration processing on the first audio data to obtain audio data after the audio integration processing;
(4) And carrying out video fusion on the audio data and the first image data after the audio integration processing to generate a target recording video.
Specifically, the parameter adjustment values of the preset digital sound console are set according to the audio processing strategy, and background audio adjustment is performed on the audio data to be processed according to the parameter adjustment values to obtain the second audio data. The server judges whether the sound tuning parameters of the digital sound console meet the parameter adjustment condition, where the parameter adjustment condition indicates that the current parameter value of a sound tuning parameter should be adjusted; when the current parameter value is determined to meet the parameter adjustment condition, the current parameter value is adjusted to the parameter value corresponding to the parameter adjustment value. The server then performs audio integration processing on the first audio data according to the second audio data to obtain the audio data after audio integration processing: it obtains the audio data produced by analog-to-digital conversion of the first audio data, and when two channels of audio data to be integrated exist in the obtained audio data, integrates the two channels to obtain the audio data after audio integration processing. Finally, the server performs video fusion on the audio data after audio integration processing and the first image data to generate the target recorded video.
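The following sketch illustrates the final assembly: a console parameter is adjusted only when its adjustment condition is met, two audio channels to be integrated are mixed, and the integrated audio is fused with the first image data; the helper names and the ffmpeg call are assumptions for illustration.

```python
# Sketch: conditional parameter adjustment, two-channel integration, and audio/video fusion.
import numpy as np
import subprocess

def adjust_parameter(current: float, target: float, condition) -> float:
    return target if condition(current) else current          # adjust only when allowed

def integrate_channels(ch_a: np.ndarray, ch_b: np.ndarray) -> np.ndarray:
    n = min(len(ch_a), len(ch_b))
    return 0.5 * (ch_a[:n] + ch_b[:n])                         # simple equal-weight mix

# Fuse the integrated audio with the first image data into the target recorded video.
subprocess.run(["ffmpeg", "-y", "-i", "first_image_data.mp4", "-i", "integrated_audio.wav",
                "-c:v", "copy", "-c:a", "aac", "-shortest", "target_recorded_video.mp4"],
               check=True)
```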
The method for collecting and processing live recorded audio data in the embodiment of the present invention is described above, and the device for collecting and processing live recorded audio data in the embodiment of the present invention is described below. Referring to fig. 5, one embodiment of the device for collecting and processing live recorded audio data in the embodiment of the present invention includes:
the acquisition module 501 is configured to acquire original recording data of a target recording object based on a preset live broadcast recording end, and perform image and audio segmentation on the original recording data to obtain first image data and first audio data;
the analysis module 502 is configured to input the first image data into a preset object behavior analysis model set, and perform behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set, so as to obtain an initial behavior analysis result set;
a construction module 503, configured to perform behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and construct an image serial label corresponding to the at least one target behavior analysis result;
a matching module 504, configured to determine corresponding second image data according to the image serial label, and match audio data to be processed according to the second image data;
The processing module 505 is configured to perform a background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generate an audio processing policy according to the background audio fault type;
the generating module 506 is configured to call a preset digital sound console, perform background audio adjustment on the audio data to be processed according to the audio processing policy, obtain second audio data, and generate a target recorded video according to the second audio data and the first image data.
Through the cooperation of the above components, the first image data is input into the object behavior analysis model set for behavior analysis to obtain an initial behavior analysis result set; behavior feature recognition is performed on the initial behavior analysis result set to obtain at least one target behavior analysis result, and an image series tag is constructed; second image data is determined according to the image series tag, and audio data to be processed is matched according to the second image data; background audio fault analysis is performed on the audio data to be processed to obtain a background audio fault type, and an audio processing strategy is generated according to the background audio fault type; and background audio adjustment is performed on the audio data to be processed according to the audio processing strategy to generate a target recorded video. By detecting the behavior of the target recording object in real time during live broadcast recording, faulty audio data to be processed is found in time and then corrected through audio processing, which improves the recording efficiency of the recording end and makes the main audio clearer.
Fig. 5 above describes the live recording audio data acquisition and processing device in the embodiment of the present invention in detail from the perspective of a modularized functional entity; the following describes the live recording audio data acquisition processing device in the embodiment of the present invention in detail from the perspective of hardware processing.
Fig. 6 is a schematic structural diagram of a live recording audio data collecting and processing device according to an embodiment of the present invention, where the live recording audio data collecting and processing device 600 may have a relatively large difference due to different configurations or performances, and may include one or more processors (central processing units, CPU) 610 and a memory 620, and one or more storage media 630 (such as one or more mass storage devices) storing application programs 633 or data 632. Wherein the memory 620 and the storage medium 630 may be transitory or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations in the live recording audio data acquisition processing device 600. Still further, the processor 610 may be configured to communicate with the storage medium 630 to execute a series of instruction operations in the storage medium 630 on the live recorded audio data acquisition processing device 600.
The live recorded audio data acquisition processing device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the configuration shown in fig. 6 does not limit the live recording audio data acquisition processing device, which may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.
The invention also provides a live broadcast recorded audio data acquisition and processing device, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and when the computer readable instructions are executed by the processor, the processor executes the steps of the live broadcast recorded audio data acquisition and processing method in the above embodiments.
The present invention also provides a computer readable storage medium, which may be a nonvolatile computer readable storage medium or a volatile computer readable storage medium, in which instructions are stored; when the instructions run on a computer, they cause the computer to execute the steps of the live recording audio data acquisition and processing method.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. The live broadcast recorded audio data acquisition and processing method is characterized by comprising the following steps of:
acquiring original recording data of a target recording object based on a preset live broadcast recording end, and performing image and audio segmentation on the original recording data to obtain first image data and first audio data;
inputting the first image data into a preset object behavior analysis model set, and respectively performing behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set to obtain an initial behavior analysis result set, wherein the method specifically comprises the following steps of: inputting the first image data into a preset object behavior analysis model set, wherein the object behavior analysis model set comprises a plurality of object behavior analysis models which are respectively used for performing behavior analysis on different parts; performing behavior analysis on different parts in the first image data through the object behavior analysis models to obtain a behavior analysis result of each object behavior analysis model; performing coding storage on the behavior analysis results of each object behavior analysis model to obtain a coding value of each behavior analysis result; constructing an initial behavior analysis result set according to the coding value of each behavior analysis result; specifically, before the first image data is input into a preset object behavior analysis model set, different part training data sets in historical image data are constructed; generating a behavior image based on different part training data sets in the historical image data, inputting the behavior image into an analysis model to be trained to obtain behavior classification of the different part training data sets in the historical image data, obtaining prediction loss of the analysis model to be trained based on the behavior classification of the different part training data sets in the historical image data and a real behavior label, and training the analysis model set to be trained by using the prediction loss to obtain an object behavior analysis model set;
Performing behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image series tag corresponding to the at least one target behavior analysis result, wherein the method specifically comprises the following steps of: performing abnormal behavior recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result; generating at least one information tag according to the at least one target behavior analysis result; performing image serial connection processing on the at least one information tag to obtain an image serial connection tag corresponding to the at least one target behavior analysis result; specifically, an image of an initial behavior analysis result set is extracted, facial expressions in the image of the initial behavior analysis result set and actions and behavior trends in the image of the initial behavior analysis result set are identified, at least one target behavior analysis result is generated according to the facial expressions and the actions and the behavior trends in the image of the initial behavior analysis result set, at least one information tag is generated according to the at least one target behavior analysis result, image series processing is carried out on the at least one information tag to obtain an image series tag corresponding to the at least one target behavior analysis result, tag identification is carried out on the information tag in the at least one target behavior analysis result based on a preset sample image, a target parameter value corresponding to the information tag is obtained, image series processing is carried out on the at least one information tag according to the target parameter value corresponding to the information tag, and an image series tag corresponding to the at least one target behavior analysis result is obtained;
Determining corresponding second image data according to the image serial labels, and matching audio data to be processed according to the second image data;
performing background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generating an audio processing strategy according to the background audio fault type, wherein the method specifically comprises the following steps of: inputting the audio data to be processed into a preset audio fault classification model, wherein the audio fault classification model comprises: a first layer of bidirectional threshold cycle network, a second layer of bidirectional threshold cycle network and a fully connected network; performing background audio fault analysis on the audio data to be processed through the audio fault classification model to obtain a background audio fault type; acquiring a strategy list, and inquiring an audio processing strategy corresponding to the background audio fault type from the strategy list according to the background audio fault type;
and calling a preset digital sound console, carrying out background audio adjustment on the audio data to be processed according to the audio processing strategy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data.
2. The method for acquiring and processing live broadcast recorded audio data according to claim 1, wherein the acquiring original recorded data of a target recorded object based on a preset live broadcast recording terminal, and performing image and audio segmentation on the original recorded data to obtain first image data and first audio data, includes:
acquiring original recording data of a target recording object based on a preset live broadcast recording end, and acquiring time stamp data of the original recording data;
inputting the original recorded data into a preset video image extraction network to extract video images according to the timestamp data, so as to obtain first image data;
inputting the original recorded data into a preset audio segmentation network to carry out audio data segmentation to obtain initial audio data, and carrying out audio transcoding on the initial audio data according to the timestamp data to obtain first audio data.
3. The method for collecting and processing live recording audio data according to claim 1, wherein determining corresponding second image data according to the image serial label, and matching audio data to be processed according to the second image data, comprises:
Determining corresponding second image data according to the image serial labels;
acquiring an audio segment corresponding to the second image data;
and performing audio data matching on the audio segments to obtain audio data to be processed corresponding to the second image data.
4. The method for collecting and processing live broadcast recorded audio data according to claim 1, wherein the calling a preset digital sound console, performing background audio adjustment on the audio data to be processed according to the audio processing policy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data comprises:
setting a parameter adjustment value of a preset digital sound console according to the audio processing strategy;
according to the parameter adjustment value, carrying out background audio adjustment on the audio data to be processed to obtain second audio data;
according to the second audio data, audio integration processing is carried out on the first audio data, and audio data after the audio integration processing are obtained;
and carrying out video fusion on the audio data subjected to the audio integration processing and the first image data to generate a target recording video.
5. A live broadcast recorded audio data acquisition and processing device, characterized in that the live broadcast recorded audio data acquisition and processing device comprises:
the acquisition module is used for acquiring original recording data of a target recording object based on a preset live broadcast recording end, and performing image and audio segmentation on the original recording data to obtain first image data and first audio data;
the analysis module is used for inputting the first image data into a preset object behavior analysis model set, and respectively carrying out behavior analysis on different parts in the first image data through each object behavior analysis model in the object behavior analysis model set to obtain an initial behavior analysis result set, and specifically comprises the following steps: inputting the first image data into a preset object behavior analysis model set, wherein the object behavior analysis model set comprises a plurality of object behavior analysis models which are respectively used for performing behavior analysis on different parts; performing behavior analysis on different parts in the first image data through the object behavior analysis models to obtain a behavior analysis result of each object behavior analysis model; performing coding storage on the behavior analysis results of each object behavior analysis model to obtain a coding value of each behavior analysis result; constructing an initial behavior analysis result set according to the coding value of each behavior analysis result; specifically, before the first image data is input into a preset object behavior analysis model set, different part training data sets in historical image data are constructed; generating a behavior image based on different part training data sets in the historical image data, inputting the behavior image into an analysis model to be trained to obtain behavior classification of the different part training data sets in the historical image data, obtaining prediction loss of the analysis model to be trained based on the behavior classification of the different part training data sets in the historical image data and a real behavior label, and training the analysis model set to be trained by using the prediction loss to obtain an object behavior analysis model set;
the construction module is used for performing behavior feature recognition on the initial behavior analysis result set to obtain at least one target behavior analysis result, and constructing an image series tag corresponding to the at least one target behavior analysis result, and specifically comprises the following steps: performing abnormal behavior recognition on the initial behavior analysis result set to obtain the at least one target behavior analysis result; generating at least one information tag according to the at least one target behavior analysis result; and performing image series processing on the at least one information tag to obtain the image series tag corresponding to the at least one target behavior analysis result; specifically, an image of the initial behavior analysis result set is extracted; facial expressions as well as actions and behavior trends in the image are identified; the at least one target behavior analysis result is generated according to the facial expressions, actions and behavior trends; the at least one information tag is generated according to the at least one target behavior analysis result; tag identification is performed on the information tag in the at least one target behavior analysis result based on a preset sample image to obtain a target parameter value corresponding to the information tag; and image series processing is performed on the at least one information tag according to the target parameter value to obtain the image series tag corresponding to the at least one target behavior analysis result;
the matching module is used for determining corresponding second image data according to the image series tag, and matching audio data to be processed according to the second image data;
the processing module is used for performing background audio fault analysis on the audio data to be processed to obtain a background audio fault type, and generating an audio processing strategy according to the background audio fault type, and specifically comprises the following steps: inputting the audio data to be processed into a preset audio fault classification model, wherein the audio fault classification model comprises: a first bidirectional gated recurrent network layer, a second bidirectional gated recurrent network layer and a fully connected network; performing background audio fault analysis on the audio data to be processed through the audio fault classification model to obtain the background audio fault type; and acquiring a strategy list, and querying the audio processing strategy corresponding to the background audio fault type from the strategy list according to the background audio fault type (an illustrative sketch of such a classifier follows this claim);
the generation module is used for calling a preset digital sound console, carrying out background audio adjustment on the audio data to be processed according to the audio processing strategy to obtain second audio data, and generating a target recorded video according to the second audio data and the first image data.
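Purely as an illustration (not part of the claims): the analysis module above trains one object behavior analysis model per part from historical image data and a real behavior label, using a prediction loss. A minimal PyTorch sketch of such a training loop, assuming a generic classification backbone, cross-entropy as the prediction loss, and a pre-built DataLoader per part, is shown below; all names are hypothetical.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_part_model(model: nn.Module, loader: DataLoader,
                     epochs: int = 5, lr: float = 1e-3) -> nn.Module:
    """Train one object behavior analysis model on one part's training data set."""
    criterion = nn.CrossEntropyLoss()               # prediction loss vs. real behavior labels
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for behavior_images, behavior_labels in loader:
            optimizer.zero_grad()
            logits = model(behavior_images)         # behavior classification for this part
            loss = criterion(logits, behavior_labels)
            loss.backward()
            optimizer.step()
    return model

# One trained model per part forms the object behavior analysis model set, e.g.:
# model_set = {part: train_part_model(make_backbone(), loaders[part]) for part in loaders}
```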
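Purely as an illustration (not part of the claims): the processing module's audio fault classification model is described as two bidirectional gated recurrent layers followed by a fully connected network. A minimal PyTorch sketch of that architecture, assuming MFCC-style feature frames as input and a small set of background fault types, is shown below; the feature, hidden and class dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class AudioFaultClassifier(nn.Module):
    """Two stacked bidirectional GRU layers followed by a fully connected head."""

    def __init__(self, n_features: int = 40, hidden: int = 128, n_fault_types: int = 5):
        super().__init__()
        self.gru1 = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        self.gru2 = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_fault_types),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features) feature frames of the audio data to be processed
        out, _ = self.gru1(x)       # (batch, time, 2 * hidden)
        out, _ = self.gru2(out)     # (batch, time, 2 * hidden)
        return self.fc(out[:, -1])  # fault-type logits from the last time step

# e.g. logits = AudioFaultClassifier()(torch.randn(8, 200, 40))  # -> shape (8, 5)
```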
6. A live recorded audio data acquisition and processing device, wherein the live recorded audio data acquisition and processing device comprises: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invoking the instructions in the memory to cause the live recorded audio data acquisition and processing device to perform the live recorded audio data acquisition and processing method of any one of claims 1-4.
7. A computer-readable storage medium having instructions stored thereon which, when executed by a processor, implement the live recorded audio data acquisition and processing method of any one of claims 1-4.
CN202310434372.4A 2023-04-21 2023-04-21 Live broadcast recorded audio data acquisition and processing method and related device Active CN116264620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310434372.4A CN116264620B (en) 2023-04-21 2023-04-21 Live broadcast recorded audio data acquisition and processing method and related device

Publications (2)

Publication Number Publication Date
CN116264620A (en) 2023-06-16
CN116264620B (en) 2023-07-25

Family

ID=86723143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310434372.4A Active CN116264620B (en) 2023-04-21 2023-04-21 Live broadcast recorded audio data acquisition and processing method and related device

Country Status (1)

Country Link
CN (1) CN116264620B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110677716A (en) * 2019-08-20 2020-01-10 咪咕音乐有限公司 Audio processing method, electronic device, and storage medium
CN113192532A (en) * 2021-03-29 2021-07-30 安徽理工大学 Mine hoist fault acoustic analysis method based on MFCC-CNN
US11490133B1 (en) * 2019-12-09 2022-11-01 Amazon Technologies, Inc. Insertion of directed content into a video asset

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109074822B (en) * 2017-10-24 2023-04-21 深圳和而泰智能控制股份有限公司 Specific voice recognition method, apparatus and storage medium
CN113539294A (en) * 2021-05-31 2021-10-22 河北工业大学 Method for collecting and identifying sound of abnormal state of live pig
CN114530166A (en) * 2022-01-29 2022-05-24 国网福建省电力有限公司电力科学研究院 Transformer on-load tap-changer fault diagnosis method based on background sound texture
CN115119007B (en) * 2022-06-23 2023-03-03 恩平市新盈科电声科技有限公司 Big data based audio acquisition and processing system and method for online live broadcast recording

Similar Documents

Publication Publication Date Title
CN110033756B (en) Language identification method and device, electronic equipment and storage medium
EP2437255A2 (en) Automatic identification of repeated material in audio signals
US9224048B2 (en) Scene-based people metering for audience measurement
CN108615532B (en) Classification method and device applied to sound scene
CN113762377B (en) Network traffic identification method, device, equipment and storage medium
CN111312286A (en) Age identification method, age identification device, age identification equipment and computer readable storage medium
CN113129927A (en) Voice emotion recognition method, device, equipment and storage medium
CN107277557B (en) A kind of methods of video segmentation and system
CN116264620B (en) Live broadcast recorded audio data acquisition and processing method and related device
CN112383488B (en) Content identification method suitable for encrypted and non-encrypted data streams
CN112562727B (en) Audio scene classification method, device and equipment applied to audio monitoring
CN113115107B (en) Handheld video acquisition terminal system based on 5G network
CN113473117B (en) Non-reference audio and video quality evaluation method based on gated recurrent neural network
CN115424253A (en) License plate recognition method and device, electronic equipment and storage medium
CN111553408B (en) Automatic test method for video recognition software
CN113593603A (en) Audio category determination method and device, storage medium and electronic device
CN113555022A (en) Voice-based same-person identification method, device, equipment and storage medium
CN110163043B (en) Face detection method, device, storage medium and electronic device
CN113362832A (en) Naming method and related device for audio and video characters
CN117351988B (en) Remote audio information processing method and system based on data analysis
CN115953724B (en) User data analysis and management method, device, equipment and storage medium
CN116975938B (en) Sensor data processing method in product manufacturing process
CN108417221B (en) Digital interphone sound code type detection method based on signal two-dimensional recombination fusion filtering
CN117316184B (en) Event detection feedback processing system based on audio signals
CN114302107A (en) Network interaction system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant