CN110636329A

CN110636329A - Audio data processing method and device

Info

Publication number: CN110636329A
Application number: CN201810654096.1A
Authority: CN
Inventors: 焦克新 (Jiao Kexin); 安君超 (An Junchao); 卢燕青 (Lu Yanqing); 韩杰 (Han Jie)
Original assignee: Visionvera Information Technology Co Ltd
Current assignee: Visionvera Information Technology Co Ltd
Priority date: 2018-06-22
Filing date: 2018-06-22
Publication date: 2019-12-31

Abstract

The embodiment of the invention provides a method and a device for processing audio data. The method is applied to a video network that includes a video network terminal, and comprises the following steps: the video network terminal acquires audio data to be processed; the video network terminal initializes a volume display area according to the sampling rate and the sampling bit depth of the audio data, wherein the volume display area is the area of the video network terminal used to display volume information of the acquired audio data; the video network terminal determines the volume information of the audio data according to the root mean square value of the audio data; and the video network terminal draws the volume information in the initialized volume display area by means of a preset view control. The embodiment of the invention enables the video network terminal to display the volume of the audio data visually.

Description

Audio data processing method and device

Technical Field

The present invention relates to the field of video networking technologies, and in particular, to a method and an apparatus for processing audio data.

Background

The video network is a dedicated network that transmits high-definition video at high speed using a dedicated protocol on top of Ethernet hardware. It is a higher-level form of the Internet and a real-time network.

At present, in the audio/video service system of a video network, a user can hear a sound but cannot see its volume visually.

Disclosure of Invention

In view of the above problems, embodiments of the present invention provide an audio data processing method and a corresponding audio data processing apparatus that overcome, or at least partially solve, the above problems.

To solve the above problem, an embodiment of the present invention discloses a method for processing audio data. The method is applied to a video network that includes a video network terminal, and comprises: the video network terminal acquires audio data to be processed; the video network terminal initializes a volume display area according to the sampling rate and the sampling bit depth of the audio data, wherein the volume display area is the area of the video network terminal used to display volume information of the acquired audio data; the video network terminal determines the volume information of the audio data according to the root mean square value of the audio data; and the video network terminal draws the volume information in the initialized volume display area by means of a preset view control.

Optionally, the volume display area includes a volume value scale table, a plosive indication area, a volume value peak indication mark and a volume value valley indication mark.

Optionally, the video network terminal initializing the volume display area according to the sampling rate and the sampling bit depth of the audio data includes: the video network terminal calculates the maximum volume value and the minimum volume value of the audio data according to the sampling rate and the sampling bit depth of the audio data; and the video network terminal initializes the volume value scale table according to the maximum volume value and the minimum volume value.

Optionally, the video network terminal drawing the volume information in the initialized volume display area by means of a preset view control includes: the video network terminal draws, by means of the preset view control, the volume value indication area in the initialized volume display area according to the maximum volume value and the minimum volume value; by means of the preset view control, the video network terminal draws, in the volume value indication area, the maximum volume value among the volume information of the audio data acquired within a preset time period as the volume value peak indication mark, and the minimum volume value among that volume information as the volume value valley indication mark; and, by means of the preset view control, the video network terminal draws in the plosive indication area those volume values in the volume information of the audio data that exceed a preset volume threshold.
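The state behind this drawing step can be sketched in Python. This is a minimal sketch under stated assumptions, not the patent's implementation: the class name `VolumeMeterState`, modelling the "preset time period" as a fixed number of recent readings, and clamping readings into the scale range are all hypothetical, and the actual rendering through the view control is omitted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VolumeMeterState:
    """Hypothetical state behind the volume display area (not from the patent)."""
    min_db: float                # minimum volume value (bottom of the scale table)
    max_db: float                # maximum volume value (top of the scale table)
    window: int                  # number of recent readings forming the "preset time period"
    plosive_threshold_db: float  # preset volume threshold for the plosive indication area
    history: deque = field(default_factory=deque)

    def update(self, level_db: float) -> dict:
        # Clamp the reading into the initialized scale range (an assumption).
        level = max(self.min_db, min(self.max_db, level_db))
        self.history.append(level)
        # Keep only the readings that fall inside the preset time period.
        while len(self.history) > self.window:
            self.history.popleft()
        return {
            "level": level,
            "peak_mark": max(self.history),    # volume value peak indication mark
            "valley_mark": min(self.history),  # volume value valley indication mark
            # Only readings above the threshold are drawn in the plosive area.
            "plosive": level if level > self.plosive_threshold_db else None,
        }
```

A view control would call `update` once per new volume reading and redraw the level bar, both marks, and (when not `None`) the plosive indicator from the returned dictionary.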

Optionally, the video network terminal determining the volume information of the audio data according to the root mean square value of the audio data includes: the video network terminal calculates the variance value (the mean of the squared sample values) of the actual sampling points of the audio data; and the video network terminal calculates the volume information of the audio data as 10 × log10(rms), where rms is this variance value.

The embodiment of the invention further discloses an apparatus for processing audio data, applied to the video network terminal in a video network. The apparatus comprises: an acquisition module, configured to acquire audio data to be processed; an initialization module, configured to initialize a volume display area according to the sampling rate and the sampling bit depth of the audio data, wherein the volume display area is the area of the video network terminal used to display volume information of the acquired audio data; a determining module, configured to determine the volume information of the audio data according to the root mean square value of the audio data; and a drawing module, configured to draw the volume information in the initialized volume display area by means of a preset view control.

Optionally, the initialization module includes: a volume value calculation module, configured to calculate the maximum volume value and the minimum volume value of the audio data according to the sampling rate and the sampling bit depth of the audio data; and a scale initialization module, configured to initialize the volume value scale table according to the maximum volume value and the minimum volume value.

Optionally, the drawing module is configured to: draw, by means of the preset view control, the volume value indication area in the initialized volume display area according to the maximum volume value and the minimum volume value; draw, by means of the preset view control, in the volume value indication area the maximum volume value among the volume information of the audio data acquired within a preset time period as the volume value peak indication mark, and the minimum volume value among that volume information as the volume value valley indication mark; and draw, by means of the preset view control, in the plosive indication area those volume values in the volume information of the audio data that exceed a preset volume threshold.

Optionally, the determining module includes: a variance value calculation module, configured to calculate the variance value (the mean of the squared sample values) of the actual sampling points of the audio data; and a volume information calculation module, configured to calculate the volume information of the audio data as 10 × log10(rms), where rms is this variance value.

The embodiment of the invention has the following advantages:

The embodiment of the invention is applied to a video network that includes a video network terminal. The video network terminal acquires audio data to be processed; initializes a volume display area according to the sampling rate and the sampling bit depth of the audio data, the volume display area being the area of the terminal used to display the volume information of the acquired audio data; determines the volume information of the audio data according to the root mean square value of the audio data; and draws the volume information in the initialized volume display area by means of the preset view control.

By exploiting the characteristics of the video network, after the video network terminal acquires the audio data to be processed, it initializes the volume display area, determines the volume information of the audio data, and draws that information in the volume display area by means of the preset view control. The embodiment of the invention thereby enables the video network terminal to display the volume of the audio data visually.

Drawings

Fig. 1 is a schematic networking diagram of a video network of the present invention;

Fig. 2 is a schematic diagram of the hardware structure of a node server of the present invention;

Fig. 3 is a schematic diagram of the hardware structure of an access switch of the present invention;

Fig. 4 is a schematic diagram of the hardware structure of an Ethernet protocol conversion gateway of the present invention;

Fig. 5 is a flowchart of the steps of an embodiment of the audio data processing method of the present invention;

Fig. 6 is a block diagram of an embodiment of the audio data processing apparatus of the present invention.

Detailed Description

To make the above objects, features and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

The video network is an important milestone in network development. It is a real-time network that can transmit high-definition video in real time, bringing many Internet applications to high-definition video and enabling high-definition face-to-face communication.

The video network adopts real-time high-definition video switching technology and can integrate dozens of required services, such as video, voice, pictures, text, communication and data, on one network platform. Examples include high-definition video conferencing, video surveillance, intelligent surveillance analysis, emergency command, digital broadcast television, time-shifted television, online teaching, live broadcast, video on demand (VOD), television mail, personal video recorder (PVR), intranet (self-run) channels, intelligent video broadcast control and information distribution, with high-definition video delivered through a television or a computer.

To help understand the embodiments of the present invention, the video network is introduced below.

Some of the technologies applied in the video network are as follows:

Network Technology

Network technology innovation in the video network improves on traditional Ethernet to cope with the potentially enormous volume of video traffic on the network. Unlike pure packet switching or pure circuit switching, the video network uses packet switching to meet the demands of streaming (a data transmission technique that turns received data into a steady, continuous stream and transmits it continuously, so that the sound the user hears and the image the user sees are smooth, and the user can start viewing before all of the data has been transmitted). Video network technology thus combines the flexibility, simplicity and low cost of packet switching with the quality and security guarantees of circuit switching, achieving seamless network-wide switched virtual circuits and a seamless data format.

Switching Technology (Switching Technology)

The video network takes two advantages of Ethernet, asynchrony and packet switching, and eliminates Ethernet's shortcomings while remaining fully compatible with it. It provides end-to-end seamless connection across the whole network, connects directly to user terminals, and directly carries IP data packets. User data requires no format conversion anywhere in the network. The video network is a higher-level form of Ethernet and a real-time switching platform: it can transmit large-scale high-definition video across the whole network in real time, which the existing Internet cannot do, and brings many network video applications to high definition and unification.

Server Technology (Server Technology)

The server technology of the video network and the unified video platform differs from that of traditional servers: streaming media transmission is connection-oriented, data processing capacity is independent of traffic volume and communication time, and a single network layer can carry both signaling and data. For voice and video services, streaming media processing on the video network and unified video platform is much simpler than general data processing, and efficiency is improved by more than a hundred times compared with a traditional server.

Storage Technology (Storage Technology)

To handle media content of very large capacity and very high traffic, the ultra-high-speed storage technology of the unified video platform uses the most advanced real-time operating system. Program information in server instructions is mapped to specific hard disk space, so media content no longer passes through the server but is sent directly and instantly to the user terminal, with a typical user waiting time of less than 0.2 seconds. Optimized sector allocation greatly reduces the mechanical seek movement of the hard disk heads: resource consumption is only 20% of that of an IP Internet system of the same grade, yet concurrent traffic is 3 times that of a traditional hard disk array, and overall efficiency is improved by more than 10 times.

Network Security Technology (Network Security Technology)

Through measures such as independent permission control for each service and complete isolation of devices and user data, the structural design of the video network eliminates, at the structural level, the network security problems that trouble the Internet. It generally needs no antivirus programs or firewalls, is protected against hacker and virus attacks, and provides users with a structurally secure network.

Service Innovation Technology (Service Innovation Technology)

The unified video platform integrates services with transmission: whether for a single user, a private network user or an entire network aggregate, only one automatic connection is needed. User terminals, set-top boxes or PCs connect directly to the unified video platform to obtain a wide range of multimedia video services. The unified video platform replaces traditional complex application programming with menu-style configuration tables, so complex applications can be built with very little code, enabling virtually unlimited service innovation.

The networking of the video network is as follows:

The video network is a centrally controlled network structure. It may be a tree, star, ring or other network, but in every case a centralized control node in the network controls the whole network.

As shown in fig. 1, the video network is divided into an access network and a metropolitan network.

The devices of the access network part fall mainly into 3 types: node servers, access switches, and terminals (including various set-top boxes, coding boards, memories, etc.). A node server is connected to an access switch, and an access switch may be connected to multiple terminals and to an Ethernet network.

The node server is a node which plays a centralized control function in the access network and can control the access switch and the terminal. The node server can be directly connected with the access switch or directly connected with the terminal.

Similarly, devices of the metropolitan network portion may also be classified into 3 types: a metropolitan area server, a node switch and a node server. The metro server is connected to a node switch, which may be connected to a plurality of node servers.

The node server here is the same node server as in the access network part; that is, the node server belongs to both the access network part and the metropolitan area network part.

The metropolitan area server is a node which plays a centralized control function in the metropolitan area network and can control a node switch and a node server. The metropolitan area server can be directly connected with the node switch or directly connected with the node server.

Therefore, the whole video network is a network structure with layered centralized control, and the network controlled by the node server and the metropolitan area server can be in various structures such as tree, star and ring.

The access network part can form a unified video platform (the part inside the dotted circle), and multiple unified video platforms can form a video network; the unified video platforms can be interconnected through metropolitan area and wide area video networks.

1. Video networking device classification

1.1 The devices in the video network of the embodiments of the present invention fall mainly into 3 types: servers, switches (including Ethernet gateways), and terminals (including various set-top boxes, coding boards, memories, etc.). The video network as a whole can be divided into a metropolitan area network (or national network, global network, etc.) and an access network.

1.2 The devices of the access network part fall mainly into 3 types: node servers, access switches (including Ethernet gateways), and terminals (including various set-top boxes, coding boards, memories, etc.).

The specific hardware structure of each access network device is as follows:

Node server:

As shown in fig. 2, the node server mainly includes a network interface module 201, a switching engine module 202, a CPU module 203 and a disk array module 204.

Packets from the network interface module 201, the CPU module 203 and the disk array module 204 all enter the switching engine module 202. The switching engine module 202 looks up the address table 205 for each incoming packet to obtain the packet's steering information, and stores the packet in the corresponding queue of the packet buffer 206 according to that information; if the queue of the packet buffer 206 is nearly full, the packet is discarded. The switching engine module 202 polls all packet buffer queues and forwards from a queue if the following conditions are met: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero. The disk array module 204 mainly controls the hard disks, including initialization, reading and writing. The CPU module 203 is mainly responsible for protocol processing with the access switches and terminals (not shown in the figure), for configuring the address table 205 (including a downlink protocol packet address table, an uplink protocol packet address table and a data packet address table), and for configuring the disk array module 204.
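The lookup, enqueue and poll cycle of the switching engine module 202 can be sketched as follows. This is an illustrative sketch only: the class `SwitchingEngine`, the dictionary-based address table, the numeric "nearly full" limit, and dropping packets with unknown destinations are assumptions, not details from the patent.

```python
from collections import deque


class SwitchingEngine:
    """Minimal sketch of the lookup/enqueue/poll cycle described for fig. 2.

    Queue limits, the port model and the address-table format are hypothetical.
    """

    def __init__(self, address_table, queue_limit=4, send_buffer_limit=2):
        self.address_table = address_table        # destination address -> output port
        self.queue_limit = queue_limit            # "nearly full" threshold (hypothetical)
        self.send_buffer_limit = send_buffer_limit
        self.queues = {}                          # output port -> packet buffer queue
        self.send_buffers = {}                    # output port -> port send buffer

    def receive(self, packet):
        # Look up the address table to obtain the packet's steering information.
        port = self.address_table.get(packet["da"])
        if port is None:
            return False                          # unknown destination: dropped (assumption)
        queue = self.queues.setdefault(port, deque())
        # Discard the packet if its queue is (nearly) full.
        if len(queue) >= self.queue_limit:
            return False
        queue.append(packet)
        return True

    def poll(self):
        # Forward one packet from every queue whose port send buffer is not full
        # and whose packet counter is greater than zero.
        for port, queue in self.queues.items():
            send_buffer = self.send_buffers.setdefault(port, [])
            if len(send_buffer) < self.send_buffer_limit and len(queue) > 0:
                send_buffer.append(queue.popleft())
```

Keeping one queue per output port means a congested port only fills its own queue, while `poll` continues to serve the other ports.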

Access switch:

As shown in fig. 3, the access switch mainly includes network interface modules (a downlink network interface module 301 and an uplink network interface module 302), a switching engine module 303 and a CPU module 304.

Packets (uplink data) from the downlink network interface module 301 enter the packet detection module 305. The packet detection module 305 checks whether the packet's destination address (DA), source address (SA), packet type and packet length meet the requirements; if so, it allocates a corresponding stream identifier (stream-id) and passes the packet to the switching engine module 303, otherwise it discards the packet. Packets (downlink data) from the uplink network interface module 302 enter the switching engine module 303, as do packets from the CPU module 304. The switching engine module 303 looks up the address table 306 for each incoming packet to obtain the packet's steering information. If a packet entering the switching engine module 303 is going from the downlink network interface to the uplink network interface, it is stored in the corresponding queue of the packet buffer 307 in association with its stream-id; otherwise, it is stored in the corresponding queue of the packet buffer 307 according to its steering information. In either case, if the queue of the packet buffer 307 is nearly full, the packet is discarded.
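The checks made by the packet detection module 305 can be sketched as below. This is a hedged sketch: the class `PacketDetector`, the allowed-type set, the length limit, and grouping packets into streams by their (SA, DA) pair are hypothetical choices; the only grounded details are that DA and SA are 8-byte addresses whose first byte carries the packet type, and that a passing packet receives a stream-id while a failing one is discarded.

```python
import itertools


class PacketDetector:
    """Hypothetical packet detection module on the access switch's downlink side."""

    def __init__(self, allowed_types, max_length):
        self.allowed_types = set(allowed_types)  # permitted values of the DA type byte
        self.max_length = max_length             # largest acceptable packet length
        self.stream_ids = {}                     # (SA, DA) -> stream-id
        self._next_id = itertools.count(1)

    def check(self, packet):
        """Return a stream-id if the packet passes, else None (packet is discarded)."""
        da, sa = packet["da"], packet["sa"]
        # DA and SA must both be 8-byte video network addresses.
        if len(da) != 8 or len(sa) != 8:
            return None
        # The first DA byte carries the packet type.
        if da[0] not in self.allowed_types:
            return None
        if not 0 < packet["length"] <= self.max_length:
            return None
        # Packets with the same SA/DA pair share one stream-id (an assumption).
        key = (sa, da)
        if key not in self.stream_ids:
            self.stream_ids[key] = next(self._next_id)
        return self.stream_ids[key]
```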

The switching engine module 303 polls all packet buffer queues; in the embodiment of the present invention there are two cases:

If the queue goes from the downlink network interface to the uplink network interface, a packet is forwarded when the following conditions are met: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero; 3) a token generated by the code rate control module has been obtained.

If the queue does not go from the downlink network interface to the uplink network interface, a packet is forwarded when the following conditions are met: 1) the port send buffer is not full; 2) the queue packet counter is greater than zero.

The code rate control module 308 is configured by the CPU module 304 and, at programmable intervals, generates tokens for all packet buffer queues going from downlink network interfaces to uplink network interfaces, in order to control the rate of uplink forwarding.
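The two forwarding cases and the token mechanism can be sketched together. This is a simplified sketch under assumptions: time is modelled as integer ticks, the class `RateControl` and function `can_forward` are hypothetical names, and tokens accumulate without a cap for simplicity (a real token bucket would normally bound them).

```python
class RateControl:
    """Hypothetical code rate control module: adds one token to each uplink
    queue every `interval` ticks (the interval is programmable per the text)."""

    def __init__(self, interval):
        self.interval = interval
        self.tokens = {}                          # queue id -> available tokens (uncapped)

    def tick(self, now, queue_ids):
        if now % self.interval == 0:
            for qid in queue_ids:
                self.tokens[qid] = self.tokens.get(qid, 0) + 1

    def take(self, qid):
        # Consume one token if the queue has any.
        if self.tokens.get(qid, 0) > 0:
            self.tokens[qid] -= 1
            return True
        return False


def can_forward(queue, send_buffer_full, upstream, rate_control, qid):
    """Forwarding conditions from the text; only downlink-to-uplink queues need a token."""
    if send_buffer_full or len(queue) == 0:
        return False
    if upstream:
        return rate_control.take(qid)
    return True
```

Because each uplink forward consumes a token and tokens arrive only once per interval, the interval set by the CPU module directly bounds the uplink forwarding rate.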

The CPU module 304 is mainly responsible for protocol processing with the node server, configuration of the address table 306, and configuration of the code rate control module 308.

Ethernet protocol conversion gateway:

As shown in fig. 4, the apparatus mainly includes a network interface module (a downlink network interface module 401 and an uplink network interface module 402), a switching engine module 403, a CPU module 404, a packet detection module 405, a rate control module 408, an address table 406, a packet buffer 407, a MAC adding module 409, and a MAC deleting module 410.

Packets from the downlink network interface module 401 enter the packet detection module 405. The packet detection module 405 checks whether the packet's Ethernet MAC DA, Ethernet MAC SA, Ethernet length or frame type, video network destination address DA, video network source address SA, video network packet type and packet length meet the requirements. If so, it allocates a corresponding stream identifier (stream-id), the MAC deletion module 410 strips the MAC DA, MAC SA and length or frame type (2 bytes), and the packet enters the corresponding receive buffer; otherwise, the packet is discarded.

The downlink network interface module 401 checks the send buffer of its port. If there is a packet, it obtains the Ethernet MAC DA of the corresponding terminal according to the packet's video network destination address DA, adds the terminal's Ethernet MAC DA, the Ethernet MAC SA of the Ethernet protocol conversion gateway, and the Ethernet length or frame type, and sends the packet.
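The header handling in both directions can be sketched in a few lines. This is a sketch under assumptions: the function names, the `mac_table` dictionary keyed by the 8-byte video network DA, and the `ether_type` value are hypothetical; the 14-byte Ethernet header (MAC DA 6 bytes, MAC SA 6 bytes, length/type 2 bytes) is standard Ethernet framing.

```python
ETH_HEADER_LEN = 14  # MAC DA (6) + MAC SA (6) + length/frame type (2)


def strip_ethernet_header(frame: bytes) -> bytes:
    """Uplink: the MAC deletion module removes MAC DA, MAC SA and length/type."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    return frame[ETH_HEADER_LEN:]


def add_ethernet_header(packet: bytes, mac_table: dict, gateway_mac: bytes,
                        ether_type: int) -> bytes:
    """Downlink: look up the terminal MAC by the video network DA (first 8 bytes),
    then prepend the terminal MAC DA, the gateway's MAC SA and the length/type field."""
    terminal_mac = mac_table[packet[:8]]
    return terminal_mac + gateway_mac + ether_type.to_bytes(2, "big") + packet
```

Stripping is the exact inverse of adding, so a video network packet survives a round trip through the gateway unchanged.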

The other modules in the Ethernet protocol conversion gateway function similarly to those of the access switch.

Terminal:

A terminal mainly includes a network interface module, a service processing module and a CPU module. For example, a set-top box mainly includes a network interface module, a video and audio encoding/decoding engine module and a CPU module; a coding board mainly includes a network interface module, a video and audio encoding engine module and a CPU module; and a memory mainly includes a network interface module, a CPU module and a disk array module.

1.3 The devices of the metropolitan area network part fall mainly into 3 types: node servers, node switches and metropolitan area servers. A node switch mainly includes a network interface module, a switching engine module and a CPU module; a metropolitan area server mainly includes a network interface module, a switching engine module and a CPU module.

2. Video networking packet definition

2.1 Access network packet definition

As shown in the following table, the data packet of the access network mainly includes the following parts: destination address (DA), source address (SA), reserved bytes, payload (PDU) and CRC.

DA | SA | Reserved | Payload | CRC

wherein:

The destination address (DA) consists of 8 bytes: the first byte indicates the packet type (e.g., various protocol packets, multicast data packets, unicast data packets), allowing at most 256 types; the second to sixth bytes are the metropolitan area network address; and the seventh and eighth bytes are the access network address.

The Source Address (SA) is also composed of 8 bytes (byte), defined as the same as the Destination Address (DA).

The reserved byte consists of 2 bytes.

The length of the payload varies with the packet type: it is 64 bytes for the various protocol packets and 1056 bytes for unicast data packets, although payload lengths are not limited to these 2 cases.

The CRC consists of 4 bytes and is calculated in accordance with the standard ethernet CRC algorithm.
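The access network packet layout above can be packed and parsed with Python's `struct` and `zlib` modules. This is a hedged sketch: the function names are hypothetical, big-endian header packing is assumed, and the CRC is computed with `zlib.crc32` (the same CRC-32 polynomial as Ethernet) and appended little-endian as in an Ethernet FCS; the patent does not state the byte order or exactly which bytes the CRC covers.

```python
import struct
import zlib

HEADER = struct.Struct(">8s8s2s")  # DA (8 bytes) | SA (8 bytes) | Reserved (2 bytes)


def build_access_packet(da: bytes, sa: bytes, payload: bytes) -> bytes:
    body = HEADER.pack(da, sa, b"\x00\x00") + payload
    # 4-byte CRC over header + payload (coverage and byte order are assumptions).
    return body + struct.pack("<I", zlib.crc32(body))


def parse_access_packet(packet: bytes) -> dict:
    body, crc = packet[:-4], struct.unpack("<I", packet[-4:])[0]
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    da, sa, _reserved = HEADER.unpack_from(body)
    return {
        "packet_type": da[0],         # byte 1: packet type (at most 256 values)
        "metro_address": da[1:6],     # bytes 2-6: metropolitan area network address
        "access_address": da[6:8],    # bytes 7-8: access network address
        "sa": sa,
        "payload": body[HEADER.size:],
    }
```

For a protocol packet with a 64-byte payload, the total length is 8 + 8 + 2 + 64 + 4 = 86 bytes.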

2.2 metropolitan area network packet definition

The topology of the metropolitan area network is a graph, and there may be 2 or more connections between two devices; that is, there may be more than 2 connections between a node switch and a node server, or between two node switches. However, each metropolitan area network device has a unique metropolitan area network address. To describe the connection relationships between metropolitan area network devices accurately, the embodiment of the present invention introduces a parameter, the label, to uniquely describe each connection to a metropolitan area network device.

In this specification, the label is defined similarly to a Multi-Protocol Label Switching (MPLS) label. Assuming there are two connections between device A and device B, packets from device A to device B have 2 labels, and packets from device B to device A also have 2 labels. Labels are divided into incoming labels and outgoing labels: assuming the label of a packet entering device A (its incoming label) is 0x0000, the label of the packet leaving device A (its outgoing label) may become 0x0001. Network access in the metropolitan area network is centrally controlled: address allocation and label allocation are both directed by the metropolitan area server, and the node switches and node servers simply carry them out. This differs from MPLS, where label allocation results from negotiation between the switch and the server.
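The incoming-to-outgoing label swap can be sketched as a lookup table that only the metropolitan area server populates. This is an illustrative sketch: the class `LabelTable` and keying entries by (input port, incoming label) are assumptions, not details from the patent.

```python
class LabelTable:
    """Hypothetical per-device label table, populated by the metropolitan area server."""

    def __init__(self):
        self.entries = {}  # (in_port, in_label) -> (out_port, out_label)

    def install(self, in_port, in_label, out_port, out_label):
        # Called on instruction from the metropolitan area server (centralized control);
        # the device does not negotiate labels with its neighbours, unlike MPLS.
        self.entries[(in_port, in_label)] = (out_port, out_label)

    def forward(self, in_port, in_label):
        # Swap the incoming label for the outgoing label and select the output port;
        # returns None when no entry has been installed.
        return self.entries.get((in_port, in_label))
```

With `install(0, 0x0000, 3, 0x0001)`, a packet arriving on port 0 with incoming label 0x0000 leaves on port 3 with outgoing label 0x0001, matching the example in the text.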

As shown in the following table, the data packet of the metropolitan area network mainly includes the following parts:

DA | SA | Reserved | Label | Payload | CRC

namely the destination address (DA), source address (SA), reserved bytes (Reserved), label, payload (PDU) and CRC. The label format may be defined as follows: the label is 32 bits, of which the upper 16 bits are reserved and only the lower 16 bits are used, and it is located between the reserved bytes and the payload of the packet.

Based on the characteristics of the video network, one of the core concepts of the embodiments of the present invention is proposed: following the video network protocol, an area for displaying the volume information of acquired audio data is preset in the video network terminal. After the video network terminal acquires audio data to be processed, it initializes the volume display area, determines the volume information of the audio data according to the root mean square value of the audio data, and then draws the volume information in the volume display area by means of a preset view control.

Referring to fig. 5, a flowchart of the steps of an embodiment of the audio data processing method of the present invention is shown. The method can be applied to a video network that includes a video network terminal, and may specifically include the following steps:

Step 501, the video network terminal obtains the audio data to be processed.

In a specific implementation, the video network terminal may be a set-top box (STB), a device that connects a television set to an external signal source and converts compressed digital signals into television content for display on the television set.

Generally, the set-top box may be connected to a camera and a microphone to collect multimedia data such as video and audio data, and may also be connected to a television to play such multimedia data.

In the embodiment of the invention, the audio data to be processed that the video network terminal acquires may be a sound file, which consists of many data samples. For example, a 5-minute segment of 44.1 kHz, 16-bit stereo audio has 44100 samples per channel per second at 2 bytes per sample, so its total data size is 44100 × 2 × 2 × 300 = 52,920,000 bytes, or about 52.92 MB.
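The size calculation in this example can be written as a one-line helper. The function name `pcm_size_bytes` is hypothetical, and the sketch assumes uncompressed PCM with a whole number of bytes per sample.

```python
def pcm_size_bytes(sample_rate_hz: int, bits_per_sample: int, channels: int,
                   seconds: int) -> int:
    """Uncompressed PCM size: samples/s x bytes per sample x channels x duration."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds
```

`pcm_size_bytes(44100, 16, 2, 300)` reproduces the 52,920,000 bytes of the stereo example; dropping to one channel halves it.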

Step 502, the video network terminal initializes the volume display area according to the sampling rate and the sampling bit depth of the audio data.

In the embodiment of the present invention, the volume display area may be the area of the video network terminal used to display the volume information of the acquired audio data.

In a preferred embodiment of the present invention, the volume display area may include a volume value scale table, a plosive indication area, a volume value peak indication mark and a volume value valley indication mark.

In a preferred embodiment of the present invention, when initializing the volume display area according to the sampling rate and the number of sampling bits of the audio data, the video networking terminal may calculate the maximum volume value and the minimum volume value of the audio data according to the sampling rate and the number of sampling bits of the audio data, and then initialize the volume scale table in the volume display area according to the maximum volume value and the minimum volume value. Where the sampling rate, also referred to as sampling speed or sampling frequency, defines the number of samples per second that are extracted from a continuous signal and constitute a discrete signal, the sampling rate being expressed in hertz (Hz). The sampling digit, i.e. sampling value or sampling value, is a parameter for measuring the fluctuation of sound, and refers to the binary digit of the digital sound signal used in collecting and playing sound files. The video networking terminal multiplies the sampling rate of the audio data by the sampling bit number to obtain the bit rate of the audio data, namely the bit number transmitted per second, 1 bit rate can record 6 decibels of audio data, for example, the volume range of the 16-bit-rate audio data is between 96 decibels and-97 decibels, the 96 decibels are the maximum volume value of the 16-bit-rate audio data, and the-97 decibels are the minimum volume value of the 16-bit-rate audio data; the volume of the 24-bit rate audio data ranges between 144 decibels and-145 decibels, 144 decibels being the maximum volume value of the 24-bit rate audio data, and-145 decibels being the minimum volume value of the 24-bit rate audio data. 
After calculating the maximum and minimum volume values of the audio data, the video networking terminal can use the minimum volume value as the minimum scale value and the maximum volume value as the maximum scale value of the volume value scale table, and initialize the scale table with 1 dB as the smallest scale unit.
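The scale initialization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it derives the maximum value from the roughly 6.02 dB-per-bit rule, and it sets the minimum to -(max + 1) only so that it reproduces the example ranges given in the text (96/-97 dB, 144/-145 dB). The function names are hypothetical. The sampling rate is not a parameter because it affects the bit rate, not the dynamic range.

```python
import math

# One bit of sample depth adds 20 * log10(2), about 6.02 dB, of dynamic range.
DB_PER_BIT = 20 * math.log10(2)


def volume_range_db(sample_bits: int) -> tuple[int, int]:
    """Return (max_db, min_db) for a given number of sampling bits.

    max_db is the rounded dynamic range (about 6.02 dB per bit).
    min_db = -(max_db + 1) is an assumption made only to reproduce the
    example ranges quoted in the text (96/-97 dB and 144/-145 dB).
    """
    max_db = round(DB_PER_BIT * sample_bits)
    min_db = -(max_db + 1)
    return max_db, min_db


def init_volume_scale(sample_bits: int, step_db: int = 1) -> list[int]:
    """Scale ticks from min_db up to max_db inclusive, 1 dB apart by default."""
    max_db, min_db = volume_range_db(sample_bits)
    return list(range(min_db, max_db + 1, step_db))


print(volume_range_db(16))         # (96, -97)
print(volume_range_db(24))         # (144, -145)
print(len(init_volume_scale(16)))  # 194
```

With a 1 dB step, a 16-bit scale in this sketch has 194 ticks, from -97 dB up to 96 dB.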

Step 503, the video network terminal determines the volume information of the audio data according to the root mean square value of the audio data.

In the embodiment of the invention, the video networking terminal can determine the volume information of the audio data in various ways. One is the average value method: the (absolute) values of the sampling points of the audio data are accumulated and then divided by the number of sampling points to obtain the average volume value of the audio data. Taking two-channel audio data as an example, the maximum volume value is 32767; the audio data is divided into 100 equal parts, and the average volume value of each part is 3276, which gives a simple representation of the average volume of the audio data.
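The average value method can be sketched as below. This is an illustrative assumption, not the terminal's actual code. It averages absolute sample values, because signed PCM samples swing above and below zero and a plain mean of the raw values tends toward 0. It also splits the data into a configurable number of equal parts (100 by default, as in the text). Both function names are hypothetical.

```python
def average_volume(samples: list[int]) -> float:
    """Mean of absolute sample values (0.0 for an empty chunk).

    Signed PCM samples swing above and below zero, so a plain mean of the
    raw values tends toward 0; averaging absolute values avoids that.
    """
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


def average_volume_per_part(samples: list[int], parts: int = 100) -> list[float]:
    """Split the samples into `parts` consecutive chunks and average each one.

    Chunk sizes differ by at most one sample; if there are fewer samples than
    parts, some chunks are empty and average to 0.0.
    """
    n = len(samples)
    return [
        average_volume(samples[i * n // parts:(i + 1) * n // parts])
        for i in range(parts)
    ]
```

For interleaved two-channel data, this sketch treats left and right samples alike. A per-channel meter would first de-interleave them.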

In a preferred embodiment of the present invention, when determining the volume information of the audio data according to the root mean square value of the audio data, the video networking terminal may calculate the variance value of the actual sampling points of the audio data (for zero-mean audio samples this equals the mean of the squared sample values, i.e., the square of the root mean square value), and then calculate the volume information of the audio data as 10 × log10(rms), where rms denotes this variance value. This follows from the definition of the decibel (dB), a dimensionless unit that expresses the ratio of two quantities. For power quantities, dB = 10 × lg(A/B); for amplitude quantities such as voltage or current, dB = 20 × lg(A/B), where A and B are the two values being compared. Because the variance is a squared (power-like) quantity, 10 × log10(rms) equals 20 × log10 of the root mean square amplitude. In practical application, the video networking terminal first squares each sampling point, averages the squared values to obtain rms, and then calls 10 × log10(rms) to obtain the energy value, i.e., the decibel value.
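The calculation above can be sketched as follows. This is a minimal sketch that assumes the "variance value" is the mean of the squared samples and that decibels are measured relative to a sample value of 1. It also assumes silent input returns a hypothetical floor value `silence_db`, because log10(0) is undefined. The function names are illustrative only.

```python
import math


def mean_square(samples: list[int]) -> float:
    """Mean of squared sample values; equals the variance when the mean is 0."""
    return sum(s * s for s in samples) / len(samples)


def volume_db(samples: list[int], silence_db: float = 0.0) -> float:
    """Volume in dB relative to one quantization step (a sample value of 1).

    10 * log10(mean square) equals 20 * log10(RMS amplitude).
    log10(0) is undefined, so silent or empty input returns `silence_db`
    (an assumed floor, not part of the original description).
    """
    if not samples:
        return silence_db
    ms = mean_square(samples)
    if ms == 0:
        return silence_db
    return 10 * math.log10(ms)
```

For example, a signal alternating between +100 and -100 has an RMS amplitude of 100. Its mean square is 10000, so `volume_db` yields 40 dB, the same value as 20 × log10(100).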

Step 504, the video network terminal draws the volume information in the volume display area after the initialization operation according to the preset view control.

In the embodiment of the invention, the video network terminal can respectively draw the volume value scale table, the plosive indicating area, the volume value peak indicating mark and the volume value valley indicating mark in the volume display area according to the preset view control.

In the embodiment of the invention, the volume display area after the initialization operation comprises the volume value scale table.

In a preferred embodiment of the present invention, the video network terminal may use the pre-configured view control to draw a volume value indication area in the initialized volume display area according to the maximum volume value and the minimum volume value described above. The volume value indication area may be a rectangle whose upper and lower boundaries correspond to the maximum and minimum scale values of the volume value scale table, respectively; that is, the maximum volume value corresponds to the upper boundary and the minimum volume value to the lower boundary. The positions of the left and right boundaries, and hence the width of the area, may be determined according to actual needs.
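The correspondence between scale values and the rectangle's boundaries can be expressed as a linear mapping from decibels to a vertical pixel position. The sketch below is an assumption, not the patent's code. It assumes screen y grows downward, as on an Android Canvas, and that out-of-range values are clamped to the rectangle. `db_to_y` is a hypothetical name.

```python
def db_to_y(db: float, min_db: float, max_db: float, top: float, bottom: float) -> float:
    """Map a volume value to a y coordinate inside the indication rectangle.

    Screen y grows downward, so max_db maps to the upper boundary `top`
    and min_db maps to the lower boundary `bottom`. Values outside the
    scale are clamped to the rectangle.
    """
    db = min(max(db, min_db), max_db)
    fraction = (db - min_db) / (max_db - min_db)
    return bottom - fraction * (bottom - top)
```

With a 16-bit scale (-97 dB to 96 dB) drawn in a rectangle from y = 0 to y = 193, 96 dB lands on the upper boundary, -97 dB lands on the lower boundary, and -0.5 dB lands at the midpoint, y = 96.5.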

In a preferred embodiment of the present invention, the video networking terminal may use the preset view control to draw, in the volume value indication area, the maximum volume value in the volume information of the audio data acquired within a preset time period as a volume value peak indication identifier, and the minimum volume value within the same period as a volume value valley indication identifier. The volume value indication area can display the volume information of the audio data in real time, i.e., the decibel level of the audio data at the current moment. The preset time period may be set according to actual conditions, for example one minute or 45 seconds. The volume value peak indicator and the volume value valley indicator may be line segments; their form is not specifically limited in the embodiment of the present invention. In practical application, the video network terminal may use the pre-configured view control to draw the volume value peak indication mark at the scale position corresponding to the maximum volume value, and the volume value valley indication mark at the scale position corresponding to the minimum volume value.
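The peak and valley marks over a preset time period can be maintained with a sliding time window. The sketch below is a hypothetical illustration of one way to do this, not the method prescribed by the text. It assumes each decibel reading arrives with a timestamp in seconds and that readings older than the window are discarded. For simplicity it scans the window on each query. A monotonic-deque variant would give constant-time queries if the window holds many readings.

```python
from collections import deque


class PeakValleyTracker:
    """Track the max (peak) and min (valley) volume within a sliding time window."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s
        self._readings = deque()  # (timestamp_s, db) pairs, oldest first

    def add(self, timestamp_s: float, db: float) -> None:
        self._readings.append((timestamp_s, db))
        # Drop readings older than the window; the newest reading always stays.
        while timestamp_s - self._readings[0][0] > self.window_s:
            self._readings.popleft()

    def peak(self) -> float:
        return max(db for _, db in self._readings)

    def valley(self) -> float:
        return min(db for _, db in self._readings)
```

The view would then draw the peak mark at the scale position of `peak()` and the valley mark at the scale position of `valley()`.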

In a preferred embodiment of the present invention, the video network terminal may use the pre-configured view control to plot, in the plosive indicating area, any volume value in the volume information of the audio data that is greater than a preset volume threshold. The preset volume threshold may be determined according to actual situations, for example 96 decibels, and is not specifically limited in the embodiment of the present invention. A volume value greater than the preset volume threshold has exceeded the normal volume range, so it may be plotted in the plosive indicating area in a specific color or shape, such as a red bar.
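Selecting the values to plot in the plosive indicating area is a simple threshold filter. The sketch below assumes a 96 dB default threshold and a red color, taken from the examples in the text. It also assumes readings strictly greater than the threshold qualify. `PlosiveBar` and `plosive_bars` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PlosiveBar:
    index: int   # position of the reading in the input sequence
    db: float    # the volume value that exceeded the threshold
    color: str   # drawing color, e.g. "red" as in the text's example


def plosive_bars(db_values: list[float], threshold_db: float = 96.0,
                 color: str = "red") -> list[PlosiveBar]:
    """Select readings strictly greater than the threshold for the plosive area."""
    return [PlosiveBar(i, v, color) for i, v in enumerate(db_values) if v > threshold_db]
```

A reading exactly equal to the threshold is not flagged in this sketch. Whether the boundary should count is a design choice the text leaves open.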

In the embodiment of the invention, the preset view control can be determined according to the type of video networking terminal, and terminals can be classified by the operating system they use. For example, if a video networking terminal uses the Android operating system, a view control called WaveView is predefined in the terminal. The WaveView draws a rectangular area according to the maximum and minimum decibel values. The thread in the terminal that dynamically receives audio data continuously sends the calculated decibel values of the audio data to the WaveView, and in the onDraw method of the WaveView the volume values are drawn using the Android Canvas drawing API, so that the volume of the audio data is displayed visually.
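The hand-off between the audio-receiving thread and the WaveView's onDraw method can be modeled outside Android. The Python sketch below is only an analogy under stated assumptions, not the WaveView code. The receiving thread posts decibel values into a thread-safe queue, and a drawing callback drains the queue and converts the most recent readings into bar heights. `WaveViewModel` and `audio_receiver` are hypothetical names. On Android, the receiving thread would presumably call `View.postInvalidate()` after posting a value, and `onDraw(Canvas)` would draw each bar with `Canvas.drawRect`.

```python
import math
import queue
import threading
from collections import deque


class WaveViewModel:
    """Hypothetical stand-in for the WaveView control.

    post() may be called from the audio-receiving thread; on_draw() runs on
    the drawing side, drains pending values and returns bar heights in pixels.
    """

    def __init__(self, min_db: float, max_db: float, height: float, capacity: int = 100):
        self.min_db, self.max_db, self.height = min_db, max_db, height
        self._pending = queue.Queue()            # thread-safe hand-off
        self._history = deque(maxlen=capacity)   # keeps only the latest readings

    def post(self, db: float) -> None:
        self._pending.put(db)

    def on_draw(self) -> list[float]:
        # Move everything posted so far into the drawing history.
        while True:
            try:
                self._history.append(self._pending.get_nowait())
            except queue.Empty:
                break
        span = self.max_db - self.min_db
        # Clamp each reading to the scale, then convert it to a bar height.
        return [(min(max(db, self.min_db), self.max_db) - self.min_db) * self.height / span
                for db in self._history]


def audio_receiver(view: WaveViewModel, frames: list[list[int]]) -> None:
    """Simulated receiving thread: compute each frame's dB and post it."""
    for frame in frames:
        ms = sum(s * s for s in frame) / len(frame)
        view.post(10 * math.log10(ms) if ms > 0 else view.min_db)


view = WaveViewModel(min_db=-97, max_db=96, height=193)
worker = threading.Thread(target=audio_receiver, args=(view, [[100, -100]] * 3))
worker.start()
worker.join()
print(view.on_draw())
```

The bounded `deque` keeps memory constant while the view scrolls through the most recent readings. The queue keeps the receiving thread from touching drawing state directly.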

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily all required by the embodiments of the invention.

Referring to fig. 6, a block diagram of an embodiment of an audio data processing apparatus according to the present invention is shown, where the apparatus may be applied to a video network terminal in a video network, and the apparatus may specifically include the following modules:

The obtaining module 601 is configured to obtain audio data to be processed.

The initialization module 602 is configured to initialize a volume display area according to a sampling rate and a sampling bit number of the audio data, where the volume display area is an area in the video networking terminal for displaying volume information of the acquired audio data.

The determining module 603 is configured to determine volume information of the audio data according to a root mean square value of the audio data.

The drawing module 604 is configured to draw the volume information in the volume display area after the initialization operation according to the preconfigured view control.

In a preferred embodiment of the present invention, the volume display area includes: the volume value scale table, the plosive indication area, the volume value peak value indication mark and the volume value valley value indication mark.

In a preferred embodiment of the present invention, the initialization module 602 includes: a volume value calculation module 6021 for calculating a maximum volume value and a minimum volume value of the audio data according to the sampling rate and the number of sampling bits of the audio data; and the scale initialization module 6022 is used for initializing the scale of the volume value according to the maximum volume value and the minimum volume value.

In a preferred embodiment of the present invention, the drawing module 604 is configured to draw, according to the preset view control, the volume value indication area in the volume display area after the initialization operation, based on the maximum volume value and the minimum volume value.

In a preferred embodiment of the present invention, the drawing module 604 is configured to draw, according to a preset view control, a maximum volume value in the volume information of the audio data acquired within a preset time period as a volume value peak indication identifier, and draw a minimum volume value in the volume information of the audio data acquired within the preset time period as a volume value valley indication identifier in the volume value indication area.

In a preferred embodiment of the present invention, the drawing module 604 is configured to draw, according to the preconfigured view control, a volume value greater than a preset volume threshold in the volume information of the audio data in the plosive indicating area.

In a preferred embodiment of the present invention, the determining module 603 includes: a variance value calculation module 6031, configured to calculate a variance value of an actual sampling point of the audio data; and a volume information calculation module 6032, configured to calculate volume information of the audio data according to 10 × log10(rms), where rms is a variance value.

Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.

The foregoing describes in detail a method and an apparatus for processing audio data according to the present invention, and specific examples are used herein to explain the principles and embodiments of the present invention, and the descriptions of the foregoing examples are only used to help understand the method and the core ideas of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A processing method of audio data is applied to a video network, wherein the video network comprises video network terminals, and the method comprises the following steps:

the video networking terminal acquires audio data to be processed;

the video network terminal initializes a volume display area according to the sampling rate and the number of sampling bits of the audio data, wherein the volume display area is an area used for displaying volume information of the acquired audio data in the video network terminal;

the video network terminal determines volume information of the audio data according to the root mean square value of the audio data;

and the video network terminal draws the volume information in a volume display area after initialization operation according to a preset view control.

2. The audio data processing method according to claim 1, wherein the volume display area includes: the volume value scale table, the plosive indication area, the volume value peak value indication mark and the volume value valley value indication mark.

3. The audio data processing method of claim 2, wherein the video networking terminal performs initialization operation on a volume display area according to the sampling rate and the sampling bit number of the audio data, and the initialization operation comprises:

the video networking terminal calculates the maximum volume value and the minimum volume value of the audio data according to the sampling rate and the number of sampling bits of the audio data;

and the video networking terminal carries out initialization operation on the volume value scale table according to the maximum volume value and the minimum volume value.

4. The audio data processing method of claim 3, wherein the drawing, by the video network terminal, the volume information in the volume display area after the initialization operation according to the pre-configured view control comprises:

the video networking terminal draws the volume value indication area in the volume display area after the initialization operation according to the preset view control and the maximum volume value and the minimum volume value; the video networking terminal draws the maximum volume value in the volume information of the audio data acquired in a preset time period as a volume value peak value indication mark and draws the minimum volume value in the volume information of the audio data acquired in the preset time period as a volume value valley value indication mark in the volume value indication area according to a preset view control;

and the video network terminal draws the volume value which is greater than a preset volume threshold value in the volume information of the audio data in the plosive indicating area according to a preset view control.

5. The audio data processing method according to any one of claims 1 to 4, wherein the determining, by the video network terminal, the volume information of the audio data according to the RMS value of the audio data includes:

the video network terminal calculates the variance value of the actual sampling point of the audio data;

and the video network terminal calculates volume information of the audio data according to 10 × log10(rms), wherein the rms is the variance value.

6. An apparatus for processing audio data, the apparatus being applied to a video network terminal in a video network, the apparatus comprising:

the acquisition module is used for acquiring audio data to be processed;

the initialization module is used for initializing a volume display area according to the sampling rate and the number of sampling bits of the audio data, wherein the volume display area is an area used for displaying volume information of the acquired audio data in the video network terminal;

the determining module is used for determining the volume information of the audio data according to the root mean square value of the audio data;

and the drawing module is used for drawing the volume information in the volume display area after the initialization operation according to the preset view control.

7. The audio data processing apparatus according to claim 6, wherein the volume display area includes: the volume value scale table, the plosive indication area, the volume value peak value indication mark and the volume value valley value indication mark.

8. The apparatus for processing audio data according to claim 7, wherein the initialization module comprises:

the volume value calculation module is used for calculating the maximum volume value and the minimum volume value of the audio data according to the sampling rate and the number of sampling bits of the audio data;

and the scale initialization module is used for initializing the scale of the volume value according to the maximum volume value and the minimum volume value.

9. The audio data processing apparatus according to claim 8,

the drawing module is used for drawing the volume value indication area in the volume display area after the initialization operation according to the maximum volume value and the minimum volume value according to a preset view control;

the drawing module is used for drawing a maximum volume value in the volume information of the audio data acquired within a preset time period as the volume value peak value indication mark and drawing a minimum volume value in the volume information of the audio data acquired within the preset time period as the volume value valley value indication mark in the volume value indication area according to a preset view control;

and the drawing module is used for drawing the volume value which is greater than a preset volume threshold value in the volume information of the audio data in the plosive indicating area according to a preset view control.

10. The apparatus for processing audio data according to any of claims 6 to 9, wherein the determining module comprises:

the variance value calculation module is used for calculating the variance value of the actual sampling point of the audio data;

and the volume information calculation module is used for calculating volume information of the audio data according to 10 × log10(rms), wherein the rms is the variance value.