CN115424634A

CN115424634A - Audio and video stream data processing method and device, electronic equipment and storage medium

Info

Publication number: CN115424634A
Application number: CN202211047686.0A
Authority: CN
Inventors: 汪秀兵; 闫振利; 赵君; 王亮
Original assignee: China United Network Communications Group Co Ltd; China Unicom Online Information Technology Co Ltd
Current assignee: China United Network Communications Group Co Ltd; China Unicom Online Information Technology Co Ltd
Priority date: 2022-08-30
Filing date: 2022-08-30
Publication date: 2022-12-02

Abstract

The electronic equipment obtains audio and video stream data comprising audio data and video data, obtains target audio characteristic information by using the audio data, obtains target video characteristic information by using the video data, and then obtains the target information density of the audio and video stream data according to the target audio characteristic information and the target video characteristic information. The target information density allows the electronic equipment to grasp the security-related conditions in the audio and video stream data promptly and comprehensively, which helps ensure the accuracy of the security operation that the electronic equipment executes according to the target information density.

Description

Audio and video stream data processing method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of information technologies, and in particular, to an audio/video stream data processing method and apparatus, an electronic device, and a storage medium.

Background

The household security alarm system monitors the security state of the living environment by using a sensor installed in a house, and performs corresponding alarm operation when the security state is abnormal. However, most of the households in China currently only acquire environment information through a camera installed at a fixed position, and utilize security equipment to process the environment information so as to obtain corresponding security information, and accordingly perform security operation.

In the prior art, security information is obtained in one of two ways. In the first, the security device uses a single recognition model to process the environmental information collected by the camera synchronously and, once the corresponding recognition result is obtained, plays the result in sync with the environmental information. The analysis of the environmental information is therefore insufficient, important information is easily overlooked, the security operation executed in response is incorrect, and personal and property safety suffers. In the second, the security device analyzes locally stored environmental information with different analysis methods to obtain multi-angle analysis results, but cannot continuously monitor the environmental information collected by the camera, so the security operation is not executed in time.

Disclosure of Invention

The application provides an audio and video stream data processing method and device, electronic equipment and a storage medium, which are used for solving the technical problems of untimely execution of security operation and low accuracy.

In a first aspect, the present application provides a method for processing audio/video stream data, where the method includes:

obtaining audio and video stream data; the audio and video stream data comprises audio data and video data;

acquiring target audio characteristic information by using the audio data, and acquiring target video characteristic information by using the video data;

acquiring target information density of audio and video stream data according to the target audio characteristic information and the target video characteristic information;

and executing corresponding security operation according to the target information density.

In the technical scheme, the electronic equipment respectively extracts the characteristics of the audio data and the video data in the audio and video stream data obtained in real time, and comprehensively analyzes the obtained characteristics to obtain the target information density related to security in the audio and video stream data, so that the electronic equipment can comprehensively know the risk related to security in the audio and video stream data through the target information density, and can timely execute corresponding security operation according to the target information density.

Optionally, the obtaining target audio feature information by using the audio data specifically includes:

analyzing the sound wave change state of the audio data to determine sound abnormal state information; the sound abnormal state information includes an abnormal level, an abnormal duration and an abnormal number of times;

performing voice recognition on the audio data to determine risk voice information; the risk voice information comprises risk keywords and/or risk tone color information;

the target audio characteristic information includes sound abnormal state information and/or risk speech information.

Optionally, analyzing the sound wave change state of the audio data to determine the sound abnormal state information specifically includes:

calculating the sound wave amplitude change frequency of the audio data, and determining an abnormal sound wave signal of which the sound wave amplitude change degree and the change frequency meet the sound abnormal state condition;

counting the abnormal sound wave signals, and determining the abnormal level, the abnormal duration and the abnormal times of the abnormal sound wave signals in the audio and video stream data;

the sound abnormal state condition is that the change degree of the sound wave amplitude is within the preset sound wave amplitude change range, and the change frequency is within the preset frequency change range.

Optionally, performing speech recognition on the audio data, and determining risk speech information specifically includes:

processing the audio data by using a voice recognition model to obtain text information corresponding to the audio data;

performing keyword matching on the text information by using preset keywords to determine risk keywords;

processing the audio data by using the voiceprint model to obtain user identity information corresponding to the audio data; the voiceprint model is a model trained using audio data of the target user;

and when the user identity information is not the target user, determining the audio data as the risk tone color information.

In the technical scheme, the electronic equipment analyzes the sound wave changes of the audio data, extracts the risk keywords, and determines the identity of the user through the voiceprint. The audio data is thus analyzed from multiple angles, which prevents the electronic equipment from omitting important characteristic information while processing the audio and video stream data and helps ensure the accuracy of the security operation that the electronic equipment executes according to the obtained information.

Optionally, the obtaining target video feature information by using the video data specifically includes:

performing principal component analysis on a target image in the video data to obtain an analysis result, and determining an abnormal image in the video data according to the analysis result;

processing the video data by using the behavior recognition model, and determining abnormal behavior information in the video data;

the target video feature information includes abnormal images and/or abnormal behavior information.

Optionally, performing principal component analysis on a target image in the video data to obtain an analysis result, and determining an abnormal image in the video data according to the analysis result, specifically including:

selecting a plurality of frames of target images from the video data according to a preset frame number interval;

performing principal component analysis on each target image to obtain the image mean value change state of the target images of adjacent frames;

and when the image mean value change state meets the preset risk image condition, determining the target image as an abnormal image.

Optionally, obtaining a target information density of the audio/video stream data according to the target audio characteristic information and the target video characteristic information, specifically including:

and processing the target audio characteristic information and the target video characteristic information by using the data information density fitting model to obtain the target information density of the audio and video stream data.

In the technical scheme, the electronic equipment separately analyzes the audio data and the video data that are split from the audio and video stream data. Security-related characteristics that appear in only one type of data are therefore not missed, which ensures that the electronic equipment analyzes the audio and video stream data comprehensively and improves the accuracy of the security operation that it executes.

In a second aspect, the present application provides an audio/video stream data processing apparatus, including:

the acquisition module is used for acquiring audio and video stream data; the audio and video stream data comprises audio data and video data;

the processing module is used for acquiring target audio characteristic information by using the audio data and acquiring target video characteristic information by using the video data;

the processing module is also used for obtaining the target information density of the audio and video stream data according to the target audio characteristic information and the target video characteristic information;

the processing module is also used for executing corresponding security operation according to the target information density.

In a third aspect, the present application provides an electronic device, comprising: a processor and a memory communicatively coupled to the processor;

the memory stores computer-executable instructions;

the processor is used for implementing the audio and video stream data processing method of the first aspect when executing the computer-executable instructions.

In a fourth aspect, the present application provides a computer-readable storage medium, in which computer instructions are stored, and when executed by a processor, the computer instructions are used to implement the audio and video stream data processing method according to the first aspect.

The electronic equipment obtains audio and video stream data comprising audio data and video data, obtains target audio characteristic information by using the audio data, obtains target video characteristic information by using the video data, and obtains the target information density of the audio and video stream data according to the target audio characteristic information and the target video characteristic information. The target information density allows the electronic equipment to grasp the security-related conditions in the audio and video stream data promptly and comprehensively, which helps ensure the accuracy of the security operation that the electronic equipment executes according to the target information density.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

Fig. 1 is a schematic flowchart of an audio and video stream data processing method provided in the present application according to an exemplary embodiment;

Fig. 2 is a schematic flowchart of an audio/video stream data processing method provided in the present application according to another exemplary embodiment;

Fig. 3 is a schematic diagram of an audio and video stream data processing method provided in the present application according to another exemplary embodiment;

Fig. 4 is a schematic structural diagram of an audio/video stream data processing apparatus according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Specific embodiments of the present application are shown by way of example in the drawings and are described in more detail below. The drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the concepts of the application to those skilled in the art with reference to specific embodiments.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

The application provides an audio and video stream data processing method and device, electronic equipment and a storage medium, which aim to solve the technical problems that security operations are not executed in time and have low accuracy. The technical idea of the application is as follows. After the electronic equipment obtains the audio and video stream data, it processes the audio data and the video data in the audio and video stream data separately and obtains the corresponding characteristic information from multiple angles, which ensures that the audio and video stream data is analyzed comprehensively. It then determines the target information density of the audio and video stream data according to the characteristic information and executes targeted security operations according to the target information density, so that the security operation is executed in time when an abnormal security state is detected.

The audio and video stream data processing method provided by the application applies to scenes in which audio and video stream data is monitored and analyzed in real time. In such a scene the electronic equipment is the execution main body. It may have a built-in audio and video stream data acquisition unit, or it may be connected to separate audio and video stream data acquisition equipment, so that it can acquire and process the original audio and video stream data in real time. The acquisition unit or acquisition equipment collects, in real time, the environmental information of the environment in which it is located and generates the corresponding original audio and video stream data.

After the electronic device obtains the original audio/video stream data, it slices the original data at a preset time interval to obtain sliced data, namely the audio/video stream data. Each piece of audio and video stream data contains the same number of frames of video data, together with the audio data corresponding to that video data. The electronic equipment splits the audio and video stream data into the corresponding audio data and video data and analyzes each of them separately to obtain the corresponding characteristic information. It then processes the characteristic information to obtain the target information density of the audio and video stream data and executes the corresponding security operation according to the target information density. The security operation includes: storing the audio and video stream data in the original audio and video stream data whose target information density is within a preset range, generating corresponding alarm information according to the target information density, and sending the alarm information to the corresponding functional department.

In one embodiment, an electronic device includes a main processing unit, an audio processing unit, and a video processing unit. The main processing unit is respectively connected with the audio processing unit and the video processing unit. After the electronic equipment obtains the original audio and video stream data, the main processing unit is used for carrying out fragmentation operation on the original audio and video stream data to obtain fragmentation data, namely the audio and video stream data. And the main processing unit splits the data according to the data types to obtain audio data and video data. The main processing unit sends the audio data to the audio processing unit so that the audio processing unit analyzes sound wave information and audio content in the audio data to determine information which is contained in the audio data and related to security protection; the main processing unit also sends the video data to the video processing unit so that the video processing unit can perform action analysis on each frame of image in the video data and change condition analysis of pixels between adjacent frames so as to determine information related to security protection in the video data. And the main processing unit determines the target information density in the audio and video stream data according to the information obtained by the audio processing unit and the video processing unit, so that corresponding security operation is executed according to the target information density.

Fig. 1 is a schematic flowchart of an audio/video stream data processing method provided in the present application according to an exemplary embodiment. As shown in fig. 1, the audio/video stream data processing method includes:

S101, the electronic equipment obtains audio and video stream data.

The audio and video stream data is generated by the audio and video stream data acquisition unit or the audio and video stream data acquisition equipment, which collects, in real time, the environmental information of the environment in which it is located. The unit or equipment transmits the collected audio and video stream data to the electronic device in real time through a preset communication mode, so that the electronic device obtains the audio and video stream data in real time.

The audio-video stream data includes audio data and video data.

More specifically, audio data in the audio-video stream data is synchronized with the video data.

S102, the electronic equipment obtains target audio characteristic information by using the audio data and obtains target video characteristic information by using the video data.

The target audio characteristic information represents characteristic information related to the security abnormal state in the audio data, and the target video characteristic information represents characteristic information related to the security abnormal state in the video data.

The security abnormal state is an environmental state including security abnormal behavior, for example: environmental conditions including hazardous actions and/or hazardous words, environmental conditions including production safety incidents.

The electronic equipment respectively processes the audio data and the video data to obtain corresponding target audio characteristic information and target video characteristic information.

More specifically, the target audio feature information includes sound abnormal state information and/or risk speech information. The sound abnormal state information is information that the waveform of the sound wave signal is abnormal, and the risk voice information is preset sensitive information existing in semantic information in the audio data.

The target video characteristic information includes abnormal images and/or abnormal behavior information. An abnormal image is one or more frames of the target video whose pixel characteristics are associated with the environmental state of a production safety accident. For example: when the pixel change state of a frame of image is consistent with the preset pixel change state observed when a production safety accident occurs, the image is determined to be an abnormal image.

S103, the electronic equipment obtains the target information density of the audio and video stream data according to the target audio characteristic information and the target video characteristic information.

The target information density represents the number of characteristics related to security abnormal events in the audio and video stream data.

The electronic equipment processes and counts the target audio characteristic information and the target video characteristic information obtained from the audio and video stream data to obtain the target information density in the audio and video stream data.

S104, the electronic equipment executes the corresponding security operation according to the target information density.

And the electronic equipment compares the target information density with a preset information density range, determines the security risk degree of the event described by the audio and video stream data, and executes corresponding security operation according to the security risk degree.

The security operation includes, but is not limited to, storing the fragment data in the audio and video stream data whose target information density is within a preset range, generating corresponding alarm information according to the target information density, and sending the alarm information to the corresponding functional department.
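As a minimal sketch of step S104, the comparison against preset information density ranges could look like the following. The band names, the numeric bounds and the action labels are hypothetical placeholders; the application only states that a preset range is used and names the three kinds of operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskBand:
    name: str        # hypothetical label for the security risk degree
    low: float       # inclusive lower bound of the density range
    high: float      # exclusive upper bound of the density range
    actions: tuple   # security operations executed for this band


# Hypothetical preset ranges; real values would be configured per deployment.
BANDS = (
    RiskBand("normal", 0, 1, ()),
    RiskBand("suspicious", 1, 5, ("store_segment",)),
    RiskBand("dangerous", 5, float("inf"),
             ("store_segment", "generate_alarm", "notify_department")),
)


def select_security_operations(density: float):
    """Return (risk degree, operations) for a target information density."""
    for band in BANDS:
        if band.low <= density < band.high:
            return band.name, band.actions
    raise ValueError("density is outside all configured ranges")
```

Keeping the ranges in a table rather than in nested conditionals makes it straightforward to adjust the preset ranges without touching the dispatch logic.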

Fig. 2 is a schematic flowchart of an audio-video stream data processing method provided by the present application according to another exemplary embodiment, and as shown in fig. 2, the method includes:

S201, the electronic equipment obtains audio and video stream data.

The electronic equipment obtains original audio and video stream data, and performs fragmentation processing on the data according to a preset fragmentation time interval to obtain a plurality of audio and video stream data.

The time lengths corresponding to the audio and video stream data are the same, and the audio and video stream data comprise video data with the same frame number and audio data corresponding to the video data. For example: the duration of a plurality of audio and video stream data obtained by the electronic equipment is 1 minute, and each audio and video stream data comprises 100 frames of video data.

After the electronic equipment obtains the audio and video stream data, extracting attribute information of the audio and video stream data, wherein the attribute information comprises the start time and the end time of the audio and video stream data, the storage type of the audio and video stream data, the size of the audio and video stream data, the file offset position of the audio and video stream data in the original audio and video stream data, the marking information of the audio and video stream data, and the marking of a key packet and a non-key packet.

In addition, the electronic device splits the audio-video stream data into audio data and video data.
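The slicing and attribute extraction of step S201 can be sketched as follows. This is an illustration under stated assumptions: the field names, the constant byte rate used to derive sizes and offsets, and the `"raw"` storage type are all hypothetical, and the last segment may be shorter than the others if the raw stream length is not a multiple of the slice interval.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentAttributes:
    start_time: float            # seconds from the start of the raw stream
    end_time: float
    storage_type: str
    size_bytes: int
    file_offset: int             # position of the segment in the raw stream
    tags: list = field(default_factory=list)   # marking information
    is_key_packet: bool = False  # key / non-key packet mark


def slice_stream(total_duration, segment_duration, bytes_per_second,
                 storage_type="raw"):
    """Build attribute records for fixed-duration segments of a raw stream.

    Assumes a constant byte rate, which is a simplification for the sketch.
    """
    if segment_duration <= 0:
        raise ValueError("segment_duration must be positive")
    segments = []
    start = 0.0
    while start < total_duration:
        end = min(start + segment_duration, total_duration)
        segments.append(SegmentAttributes(
            start_time=start,
            end_time=end,
            storage_type=storage_type,
            size_bytes=int((end - start) * bytes_per_second),
            file_offset=int(start * bytes_per_second),
        ))
        start = end
    return segments
```

For instance, `slice_stream(180.0, 60.0, 1000)` yields three one-minute segments, none of which is marked as a key packet until a later analysis step updates it.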

S202, the electronic equipment analyzes the sound wave change state of the audio data and determines sound abnormal state information.

The audio data includes a sound wave signal whose parameters include loudness and timbre.

The sound abnormal state information is information describing the presence of an abnormal situation in the acoustic wave signal, and includes an abnormality level, an abnormality duration, and the number of abnormalities.

The electronic equipment analyzes the waveform change state of the sound wave signal, determines the abnormal state of the sound, and updates the attribute information of the audio and video stream data according to the abnormal state. More specifically, it updates the marking information of the audio and video stream data and the marks of the key packets and the non-key packets. For example: 'sound abnormality' is added to the marking information of the audio and video stream data, and the audio and video stream data is marked as a key packet.

In one embodiment, the electronic device analyzes the loudness amplitude change state of the sound wave signal, i.e., analyzes decibel data of the sound wave signal which changes continuously.

When the decibel data of the sound wave signal falls within a preset decibel range, the electronic equipment determines the time period during which the sound wave signal stays continuously within that range, the duration of that period, and the corresponding abnormal level. The electronic equipment counts the number of such continuous periods and takes that number as the number of sound abnormalities in the audio data. More specifically, there is more than one preset decibel range, and each range corresponds to its own abnormal level.

For example: when the decibel data of the sound wave signal is greater than 120 decibels between 1 ms and 3 ms, the sound wave signal in this time period is determined to be a first abnormal sound wave signal, with an abnormal level of 'dangerous' and an abnormal duration of 2 ms; when the decibel data of the sound wave signal is greater than 60 decibels and less than or equal to 120 decibels between 7 ms and 12 ms, the sound wave signal in this time period is determined to be a second abnormal sound wave signal, with an abnormal level of 'abnormality present' and an abnormal duration of 5 ms.
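The decibel-range embodiment can be illustrated with a short sketch that groups consecutive samples of the same level into runs and reports each run's level and duration. The 60 and 120 decibel bounds come from the example above; the sample format, the function names and the level labels are assumptions made for illustration.

```python
def classify_db(db):
    """Map a decibel value to an abnormal level, or None if it is normal."""
    if db > 120:
        return "dangerous"
    if db > 60:
        return "abnormality present"
    return None


def find_abnormal_runs(samples):
    """Group time-ordered (time_ms, decibels) samples into abnormal runs.

    A run is a maximal stretch of consecutive samples with the same
    non-normal level. The number of runs is the number of abnormalities.
    """
    runs = []
    current = None
    for time_ms, db in samples:
        level = classify_db(db)
        if current is not None and level == current["level"]:
            current["end_ms"] = time_ms          # extend the open run
            continue
        if current is not None:
            runs.append(current)                 # close the open run
        current = ({"level": level, "start_ms": time_ms, "end_ms": time_ms}
                   if level else None)
    if current is not None:
        runs.append(current)
    for run in runs:
        run["duration_ms"] = run["end_ms"] - run["start_ms"]
    return runs
```

Fed with one sample per millisecond that reproduces the example (above 120 dB from 1 ms to 3 ms, between 60 and 120 dB from 7 ms to 12 ms), the function returns two runs with durations of 2 ms and 5 ms.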

In another embodiment, the electronic device calculates the sound wave amplitude change frequency of the audio data, determines the abnormal sound wave signals whose sound wave amplitude change degree and change frequency meet the sound abnormal state condition, counts these abnormal sound wave signals, and determines their abnormal duration and abnormal times in the audio and video stream data. The sound abnormal state condition is that the change degree of the sound wave amplitude is within a preset sound wave amplitude change range and the change frequency is within a preset frequency change range. For example, when the sound is quiet, the change degree of the sound wave amplitude is not within the preset sound wave amplitude change range even though the change frequency is within the preset frequency change range. The electronic device then determines that the event corresponding to the audio data is not dangerous, that is, not related to a security abnormality, and therefore does not determine the sound wave signal to be an abnormal sound wave signal.
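The two-part sound abnormal state condition can be sketched as below. The application does not define how the change degree and change frequency are measured, so this sketch assumes the change degree is the peak-to-peak amplitude within a window and the change frequency is the number of direction reversals per second; both definitions and all names are hypothetical.

```python
def amplitude_change_stats(amplitudes, sample_rate_hz):
    """Return (change_degree, change_frequency_hz) for one window.

    Assumed definitions: degree is peak-to-peak amplitude, frequency is
    the count of slope sign reversals divided by the window duration.
    """
    if len(amplitudes) < 3:
        raise ValueError("need at least three samples")
    degree = max(amplitudes) - min(amplitudes)
    reversals = 0
    for prev, cur, nxt in zip(amplitudes, amplitudes[1:], amplitudes[2:]):
        if (cur - prev) * (nxt - cur) < 0:
            reversals += 1
    duration_s = (len(amplitudes) - 1) / sample_rate_hz
    return degree, reversals / duration_s


def is_abnormal(amplitudes, sample_rate_hz, degree_range, frequency_range):
    """True only when both the degree and the frequency are within range."""
    degree, frequency = amplitude_change_stats(amplitudes, sample_rate_hz)
    return (degree_range[0] <= degree <= degree_range[1]
            and frequency_range[0] <= frequency <= frequency_range[1])
```

With a degree range of (5, 100) and a frequency range of (2, 10), a loud window such as `[0, 10, 0, 10, 0]` sampled at 4 Hz is flagged, while the quiet window `[0, 1, 0, 1, 0]` with the same change frequency is not, which mirrors the quiet-sound example in the text.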

S203, the electronic equipment performs voice recognition on the audio data and determines risk voice information.

The risk voice information includes risk keywords and/or risk timbre information.

In an embodiment, the electronic device may determine the risk voice information in the audio data only by the existence status of the risk keyword in the audio data.

To determine the risk keywords, the electronic equipment processes the audio data with a voice recognition model to obtain the text information corresponding to the audio data, and performs keyword matching on the text information with preset keywords. The electronic equipment determines each sentence containing a risk keyword as risk voice information. The voice recognition model is a trained model stored locally in the electronic device. The risk keywords in this embodiment are the keywords preset during training of the voice recognition model.
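The keyword-matching step can be sketched on text that a speech recognition model has already produced. The keyword list, the English sentence splitting and the function name are illustrative assumptions; the speech recognition model itself is outside the sketch.

```python
import re

# Hypothetical preset risk keywords.
RISK_KEYWORDS = ("help", "fire", "thief")


def find_risk_sentences(text, keywords=RISK_KEYWORDS):
    """Return (sentence, matched keywords) for sentences containing a keyword."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    hits = []
    for sentence in sentences:
        # Whole-word, case-insensitive matching.
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        matched = [k for k in keywords if k in words]
        if matched:
            hits.append((sentence, matched))
    return hits
```

Matching on whole words avoids flagging unrelated words that merely contain a keyword as a substring, such as "fireplace" for "fire".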

In another embodiment, the electronic device may determine the risky speech information in the audio data only by the presence state of the risky timbre information in the audio data.

To determine the risk tone color information, the electronic equipment processes the audio data with the voiceprint model to obtain the user identity information corresponding to the audio data. The voiceprint model is a model trained using audio data of the target user. In an embodiment, when the audio data contains tone color information that does not correspond to the preset user identity information, the sentence containing that tone color information is determined as risk voice information. For example: the target place only allows user A to enter, and the tone color information of user A is the tone color information corresponding to the preset user identity information. When the electronic equipment detects the tone color information of other users, the target place is in an abnormal security condition, and the electronic equipment determines the sentences containing the tone color information of the other users as risk voice information.
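The identity check can be illustrated with a common stand-in technique: comparing a fixed-length voice embedding of the utterance against enrolled embeddings of the target users by cosine similarity. This is not the voiceprint model of the application, which is not specified in detail; the embeddings, the 0.8 threshold and the function names are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return dot / norm


def is_risk_timbre(utterance_embedding, target_embeddings, threshold=0.8):
    """True when the utterance matches none of the enrolled target users.

    target_embeddings maps a user name to that user's enrolled embedding.
    """
    return all(
        cosine_similarity(utterance_embedding, enrolled) < threshold
        for enrolled in target_embeddings.values()
    )
```

In the example of a place that only user A may enter, an utterance whose embedding is close to user A's enrolled embedding is not flagged, while any other voice is treated as risk tone color information.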

In another embodiment, the electronic device can jointly determine the risk voice information in the audio data by using the risk keywords and the existence state of the risk tone color information in the audio data at the same time.

The electronic equipment determines the risk keywords present in the audio data by using the voice recognition model, determines from the voiceprint model the identity of the user who uttered the risk keywords so as to determine that user's degree of security risk, and determines the sentences containing the risk keywords as risk voice information.

After the electronic equipment determines the risk voice information, the attribute information of the audio and video stream data is updated according to the risk voice information.

S204, the electronic equipment performs principal component analysis on the target image in the video data to obtain an analysis result, and determines an abnormal image in the video data according to the analysis result.

More specifically, the electronic equipment selects multi-frame target images from the video data according to a preset frame number interval so as to prevent the influence of single-frame image abnormity on the principal component analysis result. And the electronic equipment performs principal component analysis on each target image to obtain the image mean value change state of the target images of adjacent frames, determines the target images as abnormal images when the image mean value change state meets the preset risk image condition, and correspondingly updates the attribute information of the audio/video stream data.

For example: when the principal component analysis of the multiple frames of target images shows that the red channel image mean value of each target image exceeds the preset mean value, the electronic equipment determines that there is a risk of fire in the environment where the audio and video stream data acquisition unit or the audio and video stream data acquisition equipment is located, and therefore determines the target images to be abnormal images.
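The fire example can be sketched as frame sampling followed by a red-channel mean check. This sketch covers only the mean comparison from the example and does not perform a principal component analysis; the frame representation (rows of `(r, g, b)` tuples), the threshold and the function names are assumptions.

```python
def red_mean(frame):
    """Mean of the red channel of a frame given as rows of (r, g, b) tuples."""
    pixels = [pixel for row in frame for pixel in row]
    return sum(pixel[0] for pixel in pixels) / len(pixels)


def find_abnormal_frames(frames, interval, red_threshold):
    """Sample every `interval`-th frame and return the indices of sampled
    frames whose red-channel mean exceeds the preset threshold."""
    return [
        index
        for index in range(0, len(frames), interval)
        if red_mean(frames[index]) > red_threshold
    ]
```

Sampling at a fixed frame interval keeps the cost low on long streams; the returned indices refer to positions in the original video data, so the attribute information of the segment can be updated accordingly.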

S205, the electronic equipment processes the video data by using the behavior recognition model, and determines abnormal behavior information in the video data.

The behavior recognition model is a trained model stored locally in the electronic device. The model is used to identify the actions exhibited by the user in the video.

More specifically, the electronic device determines the identity information of the users in the video data by using the target recognition model, and determines the behavior information of each user by using the behavior recognition model. The electronic device looks up each piece of identity information and its corresponding behavior information in the abnormal behavior information mapping table to determine whether the behavior information is abnormal behavior information, and updates the attribute information of the audio and video stream data according to the abnormal behavior information. The abnormal behavior information mapping table represents the mapping relation between identity information and the corresponding preset abnormal behavior information. Different identity information corresponds to different preset abnormal behavior information.

For example: user A is an elderly person, and in the abnormal behavior information mapping table, the preset abnormal behavior information corresponding to user A includes behaviors such as falling and twitching; user B is a young person, and in the abnormal behavior information mapping table, the preset abnormal behavior information corresponding to user B includes behaviors such as charging and twitching.
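A lookup in the abnormal behavior information mapping table can be sketched as a dictionary keyed by identity. The identities, the behavior labels and the names `ABNORMAL_BEHAVIOR_TABLE` and `abnormal_behaviors` are illustrative stand-ins; the recognition models that would produce the `(identity, behavior)` pairs are assumed to exist and are not shown.

```python
# Hypothetical table: identity -> preset abnormal behaviors for that identity.
ABNORMAL_BEHAVIOR_TABLE = {
    "user_a": {"falling", "twitching"},
    "user_b": {"charging", "twitching"},
}


def abnormal_behaviors(observations, table=ABNORMAL_BEHAVIOR_TABLE):
    """Keep the (identity, behavior) pairs that the table lists as abnormal.

    Identities missing from the table have no preset abnormal behaviors in this sketch.
    """
    return [(who, what) for who, what in observations if what in table.get(who, set())]
```

Because the table is keyed by identity, the same behavior can be abnormal for one user and normal for another.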

S206, the electronic equipment processes the target audio characteristic information and the target video characteristic information by using the data information density fitting model to obtain the target information density of the audio and video stream data.

The target audio feature information includes the sound abnormal state information obtained in step S202 and/or the risk speech information obtained in step S203. The target video feature information includes the abnormal image obtained in step S204 and/or the abnormal behavior information obtained in step S205.

The input data of the data information density fitting model are the target audio characteristic information and the target video characteristic information, and the output data of the model is the target information density. The target information density represents the number of features related to security abnormal events that the electronic equipment determines, according to the analysis results, to exist in the audio and video stream data.

The data information density fitting model may be a mathematical model or a model trained by using a neural network, which is not limited herein.
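Since the form of the fitting model is left open, the following is only one possible mathematical model, chosen for illustration: a weighted count of the detected features. The feature names, the weights and the function `information_density` are assumptions, not the model of the application.

```python
# Hypothetical weights; equal weights reduce the density to a plain feature count.
WEIGHTS = {
    "sound_abnormal": 1.0,
    "risk_speech": 1.0,
    "abnormal_image": 1.0,
    "abnormal_behavior": 1.0,
}


def information_density(audio_features, video_features, weights=WEIGHTS):
    """Weighted count of detected features across the audio and video inputs.

    Each input maps a feature name to the number of times it was detected.
    Feature names without a weight contribute nothing.
    """
    counts = {**audio_features, **video_features}
    return sum(weights.get(name, 0.0) * n for name, n in counts.items())
```

A trained neural network could replace this function as long as it accepts the same two inputs and returns a single density value.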

S207, the electronic equipment executes corresponding security operation according to the target information density.

The security operation is explained in detail in step S104, and is not described here again.

The relationship between the above processing steps is explained in detail below through the embodiment corresponding to Fig. 3. The relationship between the electronic device and the camera shown in Fig. 3 is only one scenario to which the proposed solution is applicable.

After a camera collects original audio and video stream data containing environment information, the data are transmitted to the electronic equipment. A main processor in the electronic equipment intercepts audio and video stream data from the original audio and video stream data in the order in which the original stream is received, and processes the intercepted data. The main processor performs the interception through a slicing operation. Each piece of audio and video stream data intercepted by the electronic equipment corresponds to the same time length and contains the same number of video data frames.
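The slicing operation can be illustrated as splitting the received frames, in arrival order, into segments of equal size. The function name `slice_stream` is hypothetical, and holding back a trailing partial segment is an assumption made so that every segment contains the same number of frames.

```python
def slice_stream(frames, frames_per_segment):
    """Split frames, in the order received, into segments of equal length.

    A trailing partial segment is not returned, because every segment is
    expected to contain the same number of frames (an assumption of this sketch).
    """
    full_segments = len(frames) // frames_per_segment
    return [
        frames[i * frames_per_segment:(i + 1) * frames_per_segment]
        for i in range(full_segments)
    ]
```

With a fixed frame rate, a fixed frame count per segment also gives every segment the same time length.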

The main processor divides the obtained audio and video stream data into audio data and video data, transmits the audio data to the audio processor, and transmits the video data to the video processor. The audio processor and the video processor simultaneously process the acquired data.

The processing of audio data by the audio processor includes: sound wave analysis, voice recognition processing and voiceprint recognition. The execution sequence of the three processing methods may be any sequence, that is, the audio processor may complete the three processing procedures in sequence, or may perform parallel processing, and the sequence of the processing procedures is not specifically limited herein.

The processing of video data by the video processor comprises: principal component analysis and behavior recognition, and the execution order of the two processing methods can be any order.
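The division of work between the main processor, the audio processor and the video processor can be sketched with a thread pool standing in for the two dedicated processors. The functions `process_audio`, `process_video` and `process_segment`, and the dictionary layout of a segment, are placeholders assumed for illustration; the actual analyses are not implemented here.

```python
from concurrent.futures import ThreadPoolExecutor


def process_audio(audio):
    # Placeholder for sound wave analysis, voice recognition and voiceprint recognition.
    return {"modality": "audio", "samples": len(audio)}


def process_video(video):
    # Placeholder for principal component analysis and behavior recognition.
    return {"modality": "video", "frames": len(video)}


def process_segment(segment):
    """Hand the audio and video parts of a segment to two workers that run concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        audio_future = pool.submit(process_audio, segment["audio"])
        video_future = pool.submit(process_video, segment["video"])
        # Both results are returned to the caller, which plays the role of the main processor.
        return audio_future.result(), video_future.result()
```

The order in which the individual analyses run inside each worker is not fixed by this sketch, which matches the statement that any execution order is acceptable.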

The processing of the audio data and the processing of the video data are explained in detail in the embodiment corresponding to Fig. 2, and are not described herein again.

The audio processor and the video processor each send their processing results to the main processor. The main processor calls the data information density fitting model to analyze all the obtained processing results and obtain the information density of the audio and video stream data, and executes a corresponding security operation when the information density is within a preset information density range. The security operation comprises: storing the current audio and video stream data locally or in a designated storage location, generating alarm information to remind surrounding users, sending alarm information to a corresponding functional department, and sending the alarm information to a user terminal associated with the electronic equipment.
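The range check that gates the security operation can be written as a small function. The operation identifiers and the name `security_operations` are hypothetical, and returning all four operations at once is an assumption, since the text does not state which of them are executed for a given density.

```python
def security_operations(density, low, high):
    """Return the operations to run when the density lies within [low, high]."""
    if not (low <= density <= high):
        return []
    return [
        "store_stream",          # store the current data locally or in a designated location
        "alert_nearby_users",    # generate alarm information for surrounding users
        "notify_authority",      # send alarm information to the functional department
        "notify_user_terminal",  # send alarm information to the associated user terminal
    ]
```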

In the above technical scheme, the electronic equipment obtains audio and video stream data comprising audio data and video data, obtains target audio characteristic information by using the audio data, obtains target video characteristic information by using the video data, and then obtains the target information density of the audio and video stream data according to the target audio characteristic information and the target video characteristic information. In this way, the electronic equipment grasps the security-related conditions in the audio and video stream data in a timely and comprehensive manner according to the target information density, which guarantees the accuracy of the security operation executed by the electronic equipment according to the target information density.

Fig. 4 is a schematic structural diagram of an audio/video stream data processing apparatus 300 provided in the present application according to an embodiment, where the audio/video stream data processing apparatus 300 includes an obtaining module 301 and a processing module 302, where,

an obtaining module 301, configured to obtain audio/video stream data; the audio-video stream data includes audio data and video data.

The processing module 302 is configured to obtain target audio feature information by using the audio data, and obtain target video feature information by using the video data.

The processing module 302 is further configured to obtain a target information density of the audio/video stream data according to the target audio characteristic information and the target video characteristic information.

The processing module 302 is further configured to execute a corresponding security operation according to the target information density.

In an embodiment, the processing module 302 is specifically configured to:

performing voice recognition on the audio data to determine risk voice information; the risk voice information comprises risk keywords and/or risk tone information;

In an embodiment, the processing module 302 is specifically configured to:

the abnormal state condition is that the change degree of the sound wave amplitude is within the preset change range of the sound wave amplitude, and the change frequency is within the preset change range of the frequency.
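The abnormal state condition can be expressed as a pair of range checks. The sketch below assumes each sound wave signal has already been reduced to an `(amplitude_change, change_frequency)` pair; the range values and the names `is_abnormal` and `count_abnormal` are hypothetical.

```python
def is_abnormal(amplitude_change, change_frequency, amp_range, freq_range):
    """Both values must fall inside their preset ranges (inclusive bounds assumed)."""
    return (
        amp_range[0] <= amplitude_change <= amp_range[1]
        and freq_range[0] <= change_frequency <= freq_range[1]
    )


def count_abnormal(signals, amp_range, freq_range):
    """Count the signals that meet the abnormal state condition."""
    return sum(is_abnormal(a, f, amp_range, freq_range) for a, f in signals)
```

The count gives the abnormal times; abnormal level and abnormal duration would need additional information that this sketch does not model.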

In an embodiment, the processing module 302 is specifically configured to:

processing the audio data by using the voice recognition model to obtain text information corresponding to the audio data;

and performing keyword matching on the text information by using preset keywords, and determining risk keywords.

In an embodiment, the processing module 302 is specifically configured to:

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the application. The electronic device 400 comprises, inter alia, a memory 401 and a processor 402, the memory 401 being adapted to store computer instructions executable by the processor. The memory 401 may include a Random Access Memory (RAM), a Non-Volatile Memory (NVM), at least one disk memory, a USB flash drive, a removable hard drive, a read-only memory, a magnetic disk or an optical disk.

The processor 402, when executing the computer instructions, implements the steps in the audio and video stream data processing method with the electronic device as the execution subject in the foregoing embodiment. Reference may be made in particular to the description relating to the method embodiments described above. The processor 402 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of hardware and software modules within the processor.

Optionally, the memory 401 may be separate from or integrated with the processor 402. When the memory 401 is separately provided, the electronic device 400 further includes a bus for connecting the memory 401 and the processor 402.

The embodiment of the present application further provides a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium, and when a processor executes the computer instructions, the steps in the audio/video stream data processing method in the foregoing embodiment are implemented.

The embodiment of the present application further provides a computer program product, which includes computer instructions, and when the computer instructions are executed by a processor, the computer instructions implement the steps in the audio and video stream data processing method in the foregoing embodiment.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. An audio and video stream data processing method, characterized in that the method comprises:

acquiring target information density of the audio and video stream data according to the target audio characteristic information and the target video characteristic information;

2. The method according to claim 1, wherein obtaining the target audio feature information using the audio data specifically comprises:

analyzing the sound wave change state of the audio data to determine sound abnormal state information; the sound abnormal state information comprises an abnormal grade, abnormal duration and abnormal times;

the target audio characteristic information comprises the sound abnormal state information and/or the risk voice information.

3. The method according to claim 2, wherein analyzing the acoustic wave changes of the audio data to determine abnormal sound status information comprises:

counting the sound wave signals, and determining the abnormal level, the abnormal duration and the abnormal times of the abnormal sound wave signals in the audio and video stream data;

and the abnormal state condition is that the change degree of the sound wave amplitude is within a preset sound wave amplitude change range, and the change frequency is within a preset frequency change range.

4. The method according to claim 2, wherein performing speech recognition on the audio data to determine risky speech information specifically comprises:

5. The method according to claim 2, wherein performing speech recognition on the audio data to determine risky speech information specifically comprises:

processing the audio data by using a voiceprint model to obtain user identity information corresponding to the audio data; the voiceprint model is a model trained using audio data of a target user;

when the user identity information is not the target user, determining the audio data as risk timbre information.

6. The method according to claim 1, wherein obtaining the target video feature information using the video data specifically comprises:

processing the video data by using a behavior recognition model, and determining abnormal behavior information in the video data;

the target video feature information comprises the abnormal image and/or the abnormal behavior information.

7. The method according to claim 6, wherein performing principal component analysis on a target image in the video data to obtain an analysis result, and determining an abnormal image in the video data according to the analysis result specifically comprises:

and when the image mean value change state meets a preset risk image condition, determining the target image as an abnormal image.

8. The method according to claim 1, wherein obtaining a target information density of the audio/video stream data according to the target audio feature information and the target video feature information specifically includes:

and processing the target audio characteristic information and the target video characteristic information by using a data information density fitting model to obtain the target information density of the audio and video stream data.

9. An audio-video stream data processing apparatus characterized by comprising:

the processing module is used for obtaining target audio characteristic information by using the audio data and obtaining target video characteristic information by using the video data;

the processing module is further used for obtaining the target information density of the audio and video stream data according to the target audio characteristic information and the target video characteristic information;

and the processing module is also used for executing corresponding security operation according to the target information density.

10. An electronic device, comprising: a processor and a memory communicatively coupled to the processor;

the memory stores computer instructions;

the processor when executing the computer instructions is configured to implement the audio-video stream data processing method according to any one of claims 1 to 8.

11. A computer-readable storage medium, wherein computer instructions are stored in the computer-readable storage medium, and when executed by a processor, the computer instructions are used for implementing the audio and video stream data processing method according to any one of claims 1 to 8.