CN114861139A - Audio processing method, copyright reading method, computer device and storage medium

Info

Publication number: CN114861139A
Application number: CN202210596336.3A
Authority: CN
Inventors: 陆克松; 周文江; 姜涛; 赵伟峰; 徐东
Original assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Current assignee: Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date: 2022-05-30
Filing date: 2022-05-30
Publication date: 2022-08-05
Also published as: WO2023231679A1

Abstract

The application relates to an audio processing method, a copyright reading method, a computer device and a computer readable storage medium. The method comprises the following steps: acquiring copyright information index features corresponding to copyright information of audio to be processed, wherein the copyright information index features are used for indexing the copyright information of the audio to be processed in a copyright database; obtaining a digital watermark according to the copyright information index characteristic; and embedding the digital watermark into the audio to be processed to obtain the target audio. The method can be used for generating the digital watermark without being limited by the watermark capacity. In addition, the copyright information containing a large amount of information is placed in the copyright database and is called from the copyright database according to the copyright information index features when needed, so that complete and comprehensive copyright information can be provided, and the copyright protection effect is ensured.

Description

Audio processing method, copyright reading method, computer device and storage medium

Technical Field

The present application relates to the field of audio processing technologies, and in particular, to an audio processing method, a copyright reading method, a computer device, and a computer-readable storage medium.

Background

A digital watermark is protection information embedded into a carrier file by a computer algorithm. Identification information is embedded directly into a digital carrier, or represented indirectly, in a way that does not affect the use value of the original carrier and is not easily detected or modified. Because of these characteristics, digital watermarking is often used to protect the copyright of audio. However, the capacity of a digital watermark embedded in audio by conventional techniques is limited, which makes it difficult to meet the requirement of embedding copyright information in the audio.

Disclosure of Invention

In view of the foregoing, it is desirable to provide an audio processing method, a copyright reading method, a computer device, and a storage medium that can solve the problem of limited digital watermark capacity.

In a first aspect, an embodiment of the present invention provides an audio processing method, including: acquiring copyright information index features corresponding to the copyright information of the audio to be processed, wherein the copyright information index features are used for indexing the copyright information of the audio to be processed in a copyright database; obtaining a digital watermark according to the copyright information index characteristic; and embedding the digital watermark into the audio to be processed to obtain the target audio.

In one embodiment, the step of obtaining the digital watermark according to the copyright information index feature includes: coding the copyright information index features to obtain two-dimensional watermark images corresponding to the copyright information index features; and obtaining the digital watermark according to the two-dimensional watermark image.

In one embodiment, obtaining the digital watermark from the two-dimensional watermark image includes: simplifying the two-dimensional watermark image; and taking the simplified two-dimensional watermark image as a digital watermark.

In one embodiment, the step of performing simplified processing on the two-dimensional watermark image includes: setting pixel points of a simplified area of the two-dimensional watermark image as blank pixel points to obtain a simplified two-dimensional watermark image; wherein the simplifiable region includes at least one of a positioning mark region, a correction mark region, a timing mark region and a static region.

In one embodiment, the step of embedding the digital watermark in the audio to be processed comprises: setting a digital watermark in a target frequency band of a preset spectrogram, wherein the target frequency band is a frequency band outside a human ear perception range; converting a preset spectrogram into audio to obtain watermark audio; and synthesizing the watermark audio and the audio to be processed to obtain the target audio.

In one embodiment, a first coordinate axis of the preset spectrogram represents time, a second coordinate axis represents frequency, and the step of setting the digital watermark in the target frequency band of the preset spectrogram comprises: determining a target coordinate corresponding to the target frequency band in the second coordinate axis direction, and determining a target area of the preset spectrogram according to the target coordinate; the coordinate of the target area in the second coordinate axis direction is larger than the target coordinate; and setting the digital watermark in a target area of a preset spectrogram to obtain the spectrogram.

In one embodiment, the step of embedding the digital watermark in the audio to be processed comprises: performing framing processing on the audio to be processed, and performing time-frequency transformation processing on each frame audio obtained after the framing processing to obtain a frequency domain signal of each frame audio; embedding digital watermarks in the frequency domain signals to obtain embedded frequency domain signals; carrying out inverse processing of time-frequency transformation processing on each embedded frequency domain signal to obtain each embedded frame audio; and carrying out superposition processing on each embedded frame audio to obtain a target audio.

In one embodiment, the step of embedding a digital watermark in each frequency domain signal comprises: converting the digital watermark into a binary sequence, and performing spread spectrum processing on the binary sequence by using a pseudo-random noise sequence to obtain a spread spectrum sequence; and embedding the spread spectrum sequence into each frequency domain signal to obtain each embedded frequency domain signal.

In one embodiment, the step of embedding a digital watermark in each frequency domain signal comprises: converting the digital watermark into a binary sequence; determining a target frequency coefficient in the frequency coefficients of the frequency components of the frequency domain signal; and adjusting the target frequency coefficient of each frequency domain signal by using the binary sequence so as to embed the digital watermark in each frequency domain signal.

In one embodiment, the step of embedding the digital watermark in the audio to be processed comprises: performing frame processing on the audio to be processed to obtain each frame audio; performing time-frequency transformation processing on each frame audio to obtain a frequency domain signal corresponding to each frame audio; embedding digital watermarks in the frequency domain signals to obtain embedded frequency domain signals; carrying out inverse processing of time-frequency transformation processing on each embedded frequency domain signal to obtain each embedded frame audio; and carrying out superposition processing on each embedded frame audio to obtain a target audio.

In one embodiment, the copyright database is configured with a target server, and the target server is in a network isolation environment.

In one embodiment, the copyright information in the copyright database is updated to the changed actual copyright information when the actual copyright information is changed.

In a second aspect, an embodiment of the present invention provides an audio copyright reading method, including: acquiring a target audio; extracting a digital watermark from the target audio; obtaining copyright information index characteristics according to the digital watermark; and indexing the copyright information of the target audio in a copyright database according to the copyright information index characteristic.

In a third aspect, an embodiment of the present invention provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.

In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method described above.

According to the audio processing method provided by the application, the digital watermark generated based on the copyright information index characteristic is embedded into the audio to be processed, when the copyright of the audio to be processed needs to be read, the copyright information index characteristic in the digital watermark is extracted, and the copyright information corresponding to the audio to be processed can be read in the copyright database according to the copyright information index characteristic. Since the copyright information index feature has a smaller information amount compared with copyright information, the digital watermark generation is not limited by the watermark capacity. In addition, the copyright information containing a large amount of information is placed in the copyright database and is called from the copyright database according to the copyright information index features when needed, so that complete and comprehensive copyright information can be provided, and the copyright protection effect is ensured.

In addition, the application also provides an audio copyright reading method, computer equipment and a computer readable storage medium, and the beneficial effects are also achieved.

Drawings

FIG. 1 is a diagram of an exemplary audio processing application;

FIG. 2 is a flow diagram of an audio processing method in one embodiment;

FIG. 3 is a flowchart illustrating a process of obtaining a digital watermark according to an index feature of copyright information in an embodiment;

FIG. 4 is a spectral diagram of a two-dimensional watermark image mapped to a pending audio spectrum in one embodiment;

FIG. 5 is a flow chart illustrating embedding a digital watermark into pending audio according to one embodiment;

FIG. 6 is a schematic diagram illustrating a process of obtaining a digital watermark according to an index feature of copyright information in another embodiment;

FIG. 7 is a schematic diagram of a simplified region of a two-dimensional watermark image according to one embodiment;

FIG. 8 is a flowchart illustrating an audio copyright reading method according to an embodiment;

FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Related abbreviations and key terms in this application are defined as follows:

Digital watermarking: a creator enjoys copyright in the audio that the creator produces. To protect that copyright, the copyright information of the audio to be processed can be embedded into the audio itself. The embedded copyright information does not affect normal use of the original audio and is not easily detected or modified, yet it can be identified from the audio when the copyright needs to be verified. Digital watermarking is a common technique for achieving this effect. The copyright information of the audio to be processed is processed into a digital watermark, and the digital watermark is then embedded into the audio to be processed, such that the original sound quality is not significantly affected, or the effect on sound quality is imperceptible to the human ear. Conversely, the digital watermark can be extracted from the audio in which it is embedded, and the copyright information can be obtained by decoding the digital watermark.

Watermark capacity: the watermark capacity of a digital watermark refers to the amount of binary data that can be embedded per unit length of time of audio.

The audio processing method provided by the embodiment of the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104, or may be located on the cloud or other network server. The data storage system may store therein a copyright database, which includes copyright information of the audio, including but not limited to copyright owner, copyright granting time, purchaser information, purchase price, secondary authorization information, etc. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices and portable wearable devices, and the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart car-mounted devices, and the like. The portable wearable device can be a smart watch, a smart bracelet, a head-mounted device, and the like. The server 104 may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.

Referring to FIG. 2, the method is described by way of example as applied to the terminal 102 in FIG. 1, and includes steps S202 to S206.

S202, acquiring copyright information index characteristics corresponding to the copyright information of the audio to be processed, wherein the copyright information index characteristics are used for indexing the copyright information of the audio to be processed in a copyright database.

The audio to be processed includes, but is not limited to, works created by creators, such as songs, instrumental music, movie soundtracks, audiobooks, talk shows, and radio plays. The copyright information index feature may be obtained from a server configured with the copyright database, or may be generated by the terminal 102 using the same algorithm that the copyright database uses to generate the copyright information index feature.

It can be understood that if the digital watermark contains too much data, the listening experience of the audio may deteriorate, so it is difficult for conventional digital watermark embedding methods to embed relatively complex copyright information into audio without affecting the listening experience. To solve this problem, in this embodiment the content carried by the digital watermark is not the copyright information itself but the copyright information index feature. That is, the copyright information, which has a large information amount, is not embedded directly into the audio to be processed; instead, a brief copyright information index feature is embedded. The copyright information is stored in the copyright database, where it is not limited by the watermark capacity. After the copyright information index feature is extracted from the digital watermark of the audio to be processed, the copyright information of that audio can be conveniently found in the copyright database, so the copyright protection effect for the audio to be processed is not affected.

In some optional embodiments, the copyright information index feature may be a URL (Uniform Resource Locator) callback address, through which the copyright information of the corresponding audio is retrieved. In some embodiments, the copyright information index feature may be a character string of a certain length. The length of the character string may be selected according to the watermark capacity, and the character string may consist of one or more of numeric characters, English characters, and Chinese characters. The character string may be randomly generated or produced by an encryption algorithm, provided that a one-to-one correspondence between each character string and the audio is ensured. Optionally, when the copyright database is constructed, a mapping relationship is established between the copyright information index feature and the corresponding copyright information, so that specific copyright information can be queried according to the copyright information index feature when needed. For example, if the copyright database stores the copyright information as key-value pairs, the copyright information index feature corresponding to the copyright information may be used as the key and the copyright information stored as the value. As another example, if the copyright information is stored in the form of a linked list, the copyright information index feature may be a jump pointer by which the position corresponding to the copyright information can be reached. Optionally, the character string may also be a feature related to the audio to be processed, such as a keyword from the name, the author, or the lyrics of the audio to be processed. The storage structure of the copyright database is not limited in this embodiment, as long as the corresponding copyright information can be looked up in the copyright database according to the copyright information index feature.
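The key-value storage option described above can be illustrated with a minimal sketch. It is not the implementation of this application: the class name `CopyrightDatabase`, its methods, and the use of a random hexadecimal string as the index feature are all assumptions made for illustration, and an in-memory dictionary stands in for a real database.

```python
import secrets


class CopyrightDatabase:
    """In-memory stand-in for the copyright database (hypothetical)."""

    def __init__(self, index_length=8):
        # index_length is the number of hex characters in an index feature;
        # it would be chosen according to the watermark capacity.
        self._records = {}
        self._index_length = index_length

    def register(self, copyright_info):
        """Store copyright info as a value and return its unique index feature (the key)."""
        while True:
            index_feature = secrets.token_hex(self._index_length // 2)
            if index_feature not in self._records:  # keep keys one-to-one with audio
                break
        self._records[index_feature] = dict(copyright_info)
        return index_feature

    def lookup(self, index_feature):
        """Return the copyright info for an index feature, or None if unknown."""
        return self._records.get(index_feature)


db = CopyrightDatabase()
key = db.register({"owner": "Example Label", "granted": "2022-05-30"})
print(key, db.lookup(key))
```

Only the short key would be carried by the digital watermark; the record itself stays in the database and can be of any size.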

S204, obtaining the digital watermark according to the copyright information index feature.

S206, embedding the digital watermark into the audio to be processed to obtain the target audio.

There are many digital watermark embedding algorithms, such as spatial domain algorithm, transform domain algorithm, compressed domain algorithm, spread spectrum algorithm, least significant bit algorithm, echo hiding algorithm, and phase encoding algorithm. The watermark embedding algorithms can process the copyright information index features of the audio to be processed into digital watermarks, and then embed the digital watermarks containing the copyright information index features into the audio to be processed without influencing the normal playing effect of the audio to be processed. An appropriate digital watermark embedding algorithm can be selected according to actual situations, for example, when a Transform domain algorithm is used, in order to ensure that an imaginary number situation does not occur in the audio to be processed in the transformation process, a DCT (Discrete Cosine Transform) algorithm in the Transform domain algorithm can be used. For another example, to increase the robustness of the digital watermark, DSSS (Direct Sequence Spread Spectrum) algorithm, which is a Spread Spectrum algorithm, may be used.
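As an illustration of why a spread spectrum approach such as DSSS adds robustness, the following sketch spreads a binary sequence with a pseudo-random noise sequence and recovers it by correlation. It is a simplified sketch under stated assumptions: the function names, the chip length of 31, and the noise level are hypothetical, and the step of embedding the chips into the audio is omitted.

```python
import numpy as np


def spread(bits, pn):
    """Spread each bit over len(pn) chips using a +/-1 pseudo-random noise sequence."""
    symbols = 2 * np.asarray(bits) - 1                 # map 0/1 to -1/+1
    return np.repeat(symbols, len(pn)) * np.tile(pn, len(symbols))


def despread(chips, pn):
    """Recover bits by correlating each block of chips with the same PN sequence."""
    blocks = np.asarray(chips, dtype=float).reshape(-1, len(pn))
    return (blocks @ pn > 0).astype(int)


rng = np.random.default_rng(0)
pn = rng.choice([-1, 1], size=31)                      # hypothetical PN sequence
bits = [1, 0, 1, 1, 0]
noisy = spread(bits, pn) + rng.normal(0, 0.5, 31 * len(bits))
print(despread(noisy, pn).tolist())
```

Because each bit is decided from the sum of many chips, moderate distortion of individual chips tends to average out, at the cost of a longer embedded sequence.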

Based on the audio processing method in this embodiment, the digital watermark generated according to the copyright information index feature is embedded into the audio to be processed, and when the copyright of the audio to be processed needs to be read, the copyright information index feature in the digital watermark is extracted, so that the copyright information corresponding to the audio to be processed can be read in the copyright database according to the copyright information index feature. Since the copyright information index feature has a smaller information amount compared with copyright information, the digital watermark generation is not limited by the watermark capacity. In addition, the copyright information containing a large amount of information is placed in the copyright database and is called from the copyright database according to the copyright information index features when needed, and the copyright database can provide complete and comprehensive copyright information to ensure the copyright protection effect.

In one embodiment, the copyright database is configured on a target server, and the target server is in a network isolation environment. Network isolation technology allows two or more computers or networks to exchange information and share resources while remaining disconnected from one another; that is, the networks are physically isolated, yet data can be exchanged within a secure network environment. It can be understood that, when a digital watermark is generated from the information to be embedded, a preset watermark key and a watermark encryption algorithm may be used to encrypt the digital watermark. However, current watermark encryption algorithms are generally open source, so once the watermark key is cracked, the information embedded in the digital watermark can be extracted. Confidential copyright information may thus be exposed, and the security of the copyright information is difficult to guarantee. To improve the security of the copyright information, in this embodiment the copyright database is configured on a target server in a network isolation environment. Because the target server is in a network isolation environment, the copyright information cannot be obtained from the copyright database by means of a network attack, and the security of the copyright information is ensured.

In addition, the digital watermark in this embodiment may be encrypted or unencrypted. Because the information embedded in the digital watermark is the copyright information index feature, even if the digital watermark is unencrypted or has been decrypted, only the copyright information index feature can be obtained from it. To look up the corresponding copyright information according to the copyright information index feature, a request must be sent to the target server, and the target server can check whether the requester is entitled to obtain the copyright information. For example, the target server determines whether the network address of the requester is an intranet address (that is, an address in the same network isolation environment as the target server) or an extranet address (that is, an address in a network environment outside the network isolation environment in which the target server is located), and decides accordingly whether to return the copyright information to the requester. In one embodiment, the target server responds to a request from a requester with an intranet address and returns the copyright information corresponding to the copyright information index feature to that requester, while rejecting a request from a requester with an extranet address. That is, where the required security level of the copyright information is high, only a requester in the same network isolation environment as the target server can obtain the copyright information. In another embodiment, the copyright information includes public copyright information and secret copyright information. The target server responds to a request from a requester with an intranet address and returns both the public copyright information and the secret copyright information corresponding to the copyright information index feature.
The target server responds to a request from a requester with an extranet address and returns the public copyright information corresponding to the copyright information index feature. Some copyright information may be made known to the public, namely the public copyright information, which anyone can query according to the copyright information index feature. Other copyright information must be kept secret, namely the secret copyright information, which can be obtained only by a requester in the same network isolation environment as the target server. For example, the copyright owner, the copyright granting time, and the like may be disclosed and are set as public copyright information. Information related to business or privacy, such as the copyright purchase price and copyright purchaser information, should be kept secret and is set as secret copyright information. The public copyright information and the secret copyright information can be set according to actual needs and are not limited in this embodiment.
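The two response policies described above can be sketched as follows. This is an illustrative sketch only: the intranet range `10.0.0.0/8`, the field names, and the `strict` flag that switches between the two embodiments are assumptions, and address checking alone would not constitute network isolation.

```python
import ipaddress

# Hypothetical address range of the network isolation environment.
INTRANET = ipaddress.ip_network("10.0.0.0/8")
# Hypothetical names of the fields treated as secret copyright information.
SECRET_FIELDS = {"purchase_price", "purchaser"}


def respond(record, requester_ip, strict=False):
    """Return the part of a copyright record that the requester may see.

    strict=True models the embodiment that rejects every extranet request;
    strict=False models the embodiment that returns public information only.
    """
    if ipaddress.ip_address(requester_ip) in INTRANET:
        return dict(record)                      # intranet: public and secret
    if strict:
        return None                              # extranet request rejected
    return {k: v for k, v in record.items() if k not in SECRET_FIELDS}


record = {"owner": "Example Label", "granted": "2022-05-30",
          "purchase_price": 1000, "purchaser": "Example Buyer"}
print(respond(record, "10.1.2.3"))
print(respond(record, "8.8.8.8"))
```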

In one embodiment, when the actual copyright information changes, the copyright information in the copyright database is updated to the changed actual copyright information. It can be understood that, for the scheme of directly embedding the copyright information in the digital watermark, the digital watermark cannot be changed once being embedded, so that it is difficult to adapt to the actual use scene that the actual copyright information is dynamically changed. However, in this embodiment, the information embedded in the digital watermark is the copyright information index feature, and although the copyright information index feature cannot be changed, the copyright database may be updated on the target server, that is, after the actual copyright information changes, the target server updates the copyright information in the copyright database to the changed actual copyright information. And then, when corresponding copyright information is acquired in the copyright database according to the copyright information index characteristics, copyright information matched with the actual copyright information can be acquired, and the method is better adapted to the actual use scene.

In one embodiment, referring to fig. 3, the step of obtaining the digital watermark according to the copyright information index feature includes S302 and S304.

S302, coding the copyright information index features to obtain two-dimensional watermark images corresponding to the copyright information index features.

S304, obtaining the digital watermark according to the two-dimensional watermark image.

It will be appreciated that the encoding process converts one-dimensional text information into a two-dimensional watermark image. The two-dimensional watermark image is a black-and-white pattern distributed over a plane (in two dimensions) according to a certain rule, in which the copyright information index feature is recorded. The content of a conventional digital watermark is generally one-dimensional text information. Information may be lost while the target audio is uploaded, downloaded, or distributed, leaving the one-dimensional text information embedded in the digital watermark incomplete; since ordinary one-dimensional text information has no fault tolerance, the copyright information extracted from it may also be incomplete. In this embodiment, the copyright information index feature is therefore first converted into a two-dimensional watermark image. A two-dimensional watermark image can store one-dimensional text information at high capacity and high density, which further alleviates the problem of limited watermark capacity, and it supports error correction: the complete one-dimensional text information embedded in the two-dimensional watermark image can still be read when the image is damaged, which improves the robustness of the digital watermark. Optionally, the code system of the two-dimensional watermark image may be a stacked two-dimensional code or a matrix two-dimensional code. The stacked two-dimensional code may be Code 16K, Code 49, PDF417, or the like. The matrix two-dimensional code may be a QR (Quick Response) code, MaxiCode, or the like.
At present, algorithms for converting one-dimensional text information into a two-dimensional watermark image are relatively mature, and a corresponding encoding algorithm can be selected according to the chosen code system. Encoding algorithms often require encoding parameters to be set, including the error correction level of the two-dimensional watermark image. The higher the error correction level, the smaller the amount of one-dimensional text information that the two-dimensional watermark image can hold, so an appropriate error correction level can be selected according to the size of the copyright information index feature. Taking the QR code as an example, the error correction levels, ranked from low to high error correction capability, are level L (up to 7% of errors can be corrected), level M (up to 15%), level Q (up to 25%), and level H (up to 30%). Specifically, the candidate error correction levels may be determined first, namely those levels at which the capacity of the QR code can accommodate the copyright information index feature. The highest of the candidate error correction levels is taken as the target error correction level. When the copyright information index feature is encoded, error correction coding is performed according to the target error correction level.
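The level selection rule described above (keep the levels whose capacity fits the index feature, then take the highest) can be sketched as follows. The function name is hypothetical, and the capacity table is an assumption for illustration: it lists the byte-mode capacities commonly quoted for a version 2 QR code and should be checked against the QR code specification for the symbol version actually used.

```python
# Assumed byte-mode capacities (in bytes) of a version 2 QR code per level.
QR_V2_BYTE_CAPACITY = {"L": 32, "M": 26, "Q": 20, "H": 14}
LEVELS_LOW_TO_HIGH = ["L", "M", "Q", "H"]


def select_error_correction_level(index_feature, capacity=QR_V2_BYTE_CAPACITY):
    """Return the highest error correction level that can hold the index feature."""
    size = len(index_feature.encode("utf-8"))
    # Candidate levels: those whose capacity accommodates the index feature.
    candidates = [lvl for lvl in LEVELS_LOW_TO_HIGH if capacity[lvl] >= size]
    if not candidates:
        raise ValueError("index feature too large for this QR code version")
    return candidates[-1]                         # highest candidate level


print(select_error_correction_level("a1b2c3d4"))          # 8 bytes
print(select_error_correction_level("a1b2c3d4e5f6a7b8"))  # 16 bytes
```

A short index feature thus leaves room for the strongest error correction, which is one practical benefit of embedding an index rather than the copyright information itself.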

There are also many algorithms for embedding the digital watermark corresponding to a two-dimensional watermark image into audio, such as spatial domain algorithms, transform domain algorithms, compressed domain algorithms, spread spectrum algorithms, least significant bit algorithms, echo hiding algorithms, and phase encoding algorithms. Because the audio to be processed is a one-dimensional signal, the two-dimensional watermark image must be converted into a one-dimensional sequence when these algorithms are applied, with each element of the one-dimensional sequence representing the pixel value of one pixel of the two-dimensional watermark image. After the one-dimensional sequence is extracted from the digital watermark, the two-dimensional watermark image can be restored from it. Alternatively, the digital watermark corresponding to the two-dimensional watermark image can be mapped directly onto the spectrogram of the audio to be processed, in which case the digital watermark can be observed directly once the spectrogram of the target audio is extracted.
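The conversion between a two-dimensional watermark image and a one-dimensional sequence can be sketched with NumPy as follows. The function names are hypothetical, and the sketch assumes that the image shape is known to the extractor (for example, because it is fixed by convention), which the text above does not specify.

```python
import numpy as np


def image_to_sequence(image):
    """Flatten a 2-D black-and-white image row by row into a 1-D pixel sequence."""
    image = np.asarray(image, dtype=np.uint8)
    return image.shape, image.flatten()


def sequence_to_image(shape, sequence):
    """Restore the 2-D image from the 1-D sequence, given the original shape."""
    return np.asarray(sequence, dtype=np.uint8).reshape(shape)


image = np.array([[1, 0, 1],
                  [0, 1, 0]], dtype=np.uint8)     # 1 = black, 0 = white
shape, sequence = image_to_sequence(image)
restored = sequence_to_image(shape, sequence)
print(sequence.tolist(), np.array_equal(restored, image))
```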

In an embodiment, the digital watermark corresponding to the two-dimensional watermark image may be directly mapped onto the spectrogram of the audio to be processed, and after the spectrogram of the target audio is extracted, the digital watermark corresponding to the two-dimensional watermark image may be directly observed. Specifically, the step of embedding the digital watermark into the audio to be processed includes:

Step 1, setting a digital watermark in a target frequency band of a preset spectrogram, wherein the target frequency band is a frequency band outside a human-ear perception range.

It is understood that the spectrogram reflects the relationship among the frequency, the time, and the audio energy at each frequency and time point. The first coordinate axis represents time, the second coordinate axis represents frequency, and the color depth of each pixel point of the image (the RGB value for a color image, or the gray value for a black-and-white image) represents the audio energy. To avoid affecting the user's listening experience after the digital watermark is embedded, the digital watermark should be located in the target frequency band of the preset spectrogram. For example, the theoretical hearing range of the human ear is 20 Hz to 20 kHz, but in general audio above 15.1 kHz is difficult for the human ear to perceive, so the target frequency band can be selected as 16 kHz and above. To further ensure the imperceptibility of the digital watermark, a target frequency band of 20 kHz or above may be selected. Taking the case where the horizontal axis represents time and the vertical axis represents frequency as an example, the upper region of the preset spectrogram corresponds to higher frequencies, so the digital watermark can be arranged in the upper half of the preset spectrogram.

When the digital watermark is set to the target frequency band of the preset spectrogram, the target coordinates corresponding to the target frequency band on the second coordinate axis can be determined, and the digital watermark is set to the area where the coordinates on the second coordinate axis are all larger than the target coordinates.
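As a rough sketch of this placement step, the code below computes the target coordinate on the frequency axis and writes the watermark pixels only into bins at or above it. The names `target_bin` and `place_watermark`, the array layout `spec[frame][bin]`, and the choice to map the top image row to the highest bin (so the image appears upright when frequency increases upward) are all assumptions.

```python
import math


def target_bin(target_hz, sample_rate, n_fft):
    """Index of the first frequency bin at or above target_hz."""
    bin_width = sample_rate / n_fft  # Hz covered by one bin
    return math.ceil(target_hz / bin_width)


def place_watermark(n_frames, n_bins, watermark, first_bin, first_frame=0):
    """Return a preset spectrogram spec[frame][bin] that is zero everywhere
    except where the watermark image is written, at bins >= first_bin."""
    height, width = len(watermark), len(watermark[0])
    if first_bin + height > n_bins or first_frame + width > n_frames:
        raise ValueError("watermark does not fit in the spectrogram")
    spec = [[0.0] * n_bins for _ in range(n_frames)]
    for r, row in enumerate(watermark):
        for c, value in enumerate(row):
            # Image row 0 (top) goes to the highest bin of the region.
            spec[first_frame + c][first_bin + (height - 1 - r)] = float(value)
    return spec
```

For a 32 kHz sample rate and a 1024-point transform, `target_bin(16000, 32000, 1024)` gives bin 512, so every watermark pixel lands at 16 kHz or higher.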

And 2, converting the preset spectrogram into audio to obtain watermark audio.

The spectrogram reflects the energy at any time and any frequency point, and the corresponding amplitude can be obtained from the energy. In the time domain, an audio signal is expressed as an amplitude that changes over time, so the corresponding audio can be obtained from the spectrogram. For example, the preset spectrogram may be represented by a two-dimensional array, in which the element in the i-th row and j-th column is the pixel value in the i-th row and j-th column of the preset spectrogram, and the magnitude of the pixel value reflects the energy of the audio signal at time i and frequency j. The inverse of the time-frequency transformation processing (such as IDFT or IDCT) is performed on the elements of each row of the two-dimensional array to obtain the framed audio corresponding to each moment, and the framed audios are superimposed to obtain the audio corresponding to the spectrogram. At present, many mature function libraries contain functions for converting a spectrogram into audio, and the corresponding audio can be restored simply by passing the spectrogram to such a function; one example is the Python-based librosa library. Therefore, the preset spectrogram can be converted into audio in any such manner to obtain the watermark audio containing the digital watermark information. The specific conversion manner is not limited in this embodiment and can be selected according to the actual situation.
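A minimal standard-library sketch of this row-by-row inversion is given below. It assumes each row holds magnitudes for bins 0 to n_fft/2, assumes zero phase (the spectrogram stores no phase), and uses a Hann window with overlap-add; these choices and the function names are illustrative. Library routines such as librosa's `istft` or `griffinlim` would normally be preferred in practice.

```python
import math


def frame_from_magnitudes(magnitudes, n_fft):
    """Inverse real DFT of one spectrogram row, assuming zero phase."""
    if len(magnitudes) != n_fft // 2 + 1:
        raise ValueError("expected n_fft // 2 + 1 magnitudes per row")
    frame = []
    for n in range(n_fft):
        total = 0.0
        for k, mag in enumerate(magnitudes):
            # Bins 0 and n_fft/2 occur once in the full spectrum,
            # every other bin occurs twice (conjugate symmetry).
            weight = 1.0 if k in (0, n_fft // 2) else 2.0
            total += weight * mag * math.cos(2 * math.pi * k * n / n_fft)
        frame.append(total / n_fft)
    return frame


def spectrogram_to_audio(spectrogram, n_fft, hop):
    """Window each inverted row and overlap-add the frames into audio."""
    audio = [0.0] * (hop * (len(spectrogram) - 1) + n_fft)
    for i, magnitudes in enumerate(spectrogram):
        frame = frame_from_magnitudes(magnitudes, n_fft)
        for n, sample in enumerate(frame):
            window = 0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft)  # Hann
            audio[i * hop + n] += window * sample
    return audio
```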

And 3, synthesizing the watermark audio and the audio to be processed to obtain the target audio.

That is, the watermark audio and the audio to be processed are superimposed, for example by adding the amplitudes of the two audios at the same moments. Converting audio into a spectrogram involves time-frequency transformation processing, which is linear, so the target spectrogram obtained by converting the target audio is equivalent to the superposition of the preset spectrogram obtained by converting the watermark audio and the original spectrogram obtained by converting the audio to be processed. The target spectrogram is shown in fig. 4: the image at the upper left corner of fig. 4 comes from the preset spectrogram, and the image at the lower right corner of fig. 4 comes from the original spectrogram. Accordingly, after the target audio is obtained, if the copyright information of the target audio needs to be read, the target audio is converted into the target spectrogram. The digital watermark is obtained by displaying the target spectrogram, and the copyright information can be read by scanning the two-dimensional watermark image at the upper left corner of fig. 4.
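The linearity argument can be checked numerically with a small discrete Fourier transform. The signals below are toy assumptions (a low-frequency host and a weak high-frequency watermark, 16 samples each). Strictly, it is the complex spectrum that adds exactly; the displayed magnitudes add cleanly here because the two signals occupy different bins.

```python
import cmath
import math


def dft(signal):
    """Plain O(n^2) discrete Fourier transform of a real or complex list."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


N = 16
# Host audio: one low-frequency sinusoid (bin 1).
host = [math.sin(2 * math.pi * 1 * t / N) for t in range(N)]
# Watermark audio: a weak high-frequency sinusoid (bin 7).
watermark = [0.01 * math.cos(2 * math.pi * 7 * t / N) for t in range(N)]
# Target audio: sample-wise superposition of the two.
target = [h + w for h, w in zip(host, watermark)]

spectrum_sum = [a + b for a, b in zip(dft(host), dft(watermark))]
max_error = max(abs(x - y) for x, y in zip(dft(target), spectrum_sum))
```

Here `max_error` is at the level of floating-point rounding, and the watermark remains visible in bin 7 of the target spectrum, where the host contributes essentially nothing.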

In one embodiment, for transform domain algorithm, spread spectrum algorithm, etc., please refer to fig. 5, the step of embedding the digital watermark into the audio to be processed includes steps S502 to S510.

S502, performing framing processing on the audio to be processed to obtain each frame audio.

It can be understood that the audio to be processed as a whole is a non-stationary signal that changes continuously in the time domain, which is not conducive to analysis in the frequency domain. The framing processing decomposes the audio to be processed into a plurality of framed audios in units of frames. The time span of each framed audio is short, and the audio signal can be regarded as stationary within such a short time span. At present, many audio processing software packages provide functions for framing processing. For example, the encode function in Matlab can be used to frame audio, and its optional parameters include window length, overlap rate, and the like. The window length is the length of the window used when windowing the framed audio; the windowing operation makes the amplitude of the framed audio taper gradually to 0 at both ends, so as to reduce spectral leakage. Considering that the pitch may change considerably between two adjacent framed audios, for example when the boundary falls exactly between two syllables, an overlap rate is set so that two adjacent framed audios overlap to a certain degree, which makes the transition between framed audios smoother.
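Framing with a window length and an overlap rate can be sketched as follows. The Hann window and the function name `frame_audio` are assumptions; the source names only the two parameters.

```python
import math


def frame_audio(samples, window_length, overlap_rate):
    """Split samples into Hann-windowed frames that overlap by overlap_rate."""
    hop = int(round(window_length * (1 - overlap_rate)))
    if hop <= 0:
        raise ValueError("overlap_rate must be below 1")
    # Periodic Hann window: starts at 0 and tapers back towards 0.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_length)
              for n in range(window_length)]
    frames = []
    for start in range(0, len(samples) - window_length + 1, hop):
        frames.append([samples[start + n] * window[n]
                       for n in range(window_length)])
    return frames
```

With a window length of 8 and an overlap rate of 0.5, consecutive frames start 4 samples apart, so every interior sample is covered by two frames.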

S504, time-frequency transformation processing is carried out on each frame audio frequency, and a frequency domain signal corresponding to each frame audio frequency is obtained.

The time-frequency transformation processing is used for converting each frame audio in the time domain into the frequency domain, so that a frequency domain signal corresponding to each frame audio can be obtained. In the field of audio processing, specific ways of time-frequency Transform processing include DCT (Discrete Cosine Transform) algorithm, DFT (Discrete Fourier Transform) algorithm, and the like.

And S506, embedding the digital watermark in each frequency domain signal to obtain each embedded frequency domain signal.

The way in which the digital watermark is embedded in each frequency domain signal differs depending on the watermark embedding algorithm chosen. For example, the transform domain algorithm may include: converting the digital watermark into a binary sequence; determining a target frequency coefficient among the frequency coefficients of the frequency components of the frequency domain signal; and adjusting the target frequency coefficient of each frequency domain signal by using the binary sequence, so as to embed the digital watermark in each frequency domain signal. In the binary sequence, the pixel value of each pixel point in the image is represented by binary numbers; if the digital watermark is not a grayscale image, it can be grayed first, and the 1s and 0s in the binary sequence then represent the gray value of a pixel point. In addition, the frequency domain signal may be represented as a sum of cosine functions of different frequencies, each cosine function representing one frequency component and its coefficient being a frequency coefficient. The target frequency coefficient can be selected in many ways. For example, from the viewpoint of robustness, the energy of normal audio is mainly concentrated in the low and intermediate frequencies and is not easily lost, so the target frequency coefficient can be selected from the low or intermediate frequency coefficients. Adjusting the target frequency coefficient is equivalent to changing the amplitude of the frequency component, so the magnitude of the adjustment is controlled such that the listening experience of the adjusted audio is not affected.
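One simple way to adjust the target frequency coefficient is sketched below. The source does not specify the adjustment rule, so the additive scheme here (raise the coefficient for a 1, lower it for a 0, one bit per frame) is an assumption, as are the names `embed_bits`, `extract_bits`, and `strength`. The extraction compares against the original coefficients, in line with the comparison-based reading described later in the text.

```python
def embed_bits(freq_frames, bits, target_index, strength):
    """Embed one bit per frame by shifting the target coefficient."""
    if len(bits) > len(freq_frames):
        raise ValueError("not enough frames to carry all bits")
    embedded = [list(frame) for frame in freq_frames]
    for frame, bit in zip(embedded, bits):
        # A small strength keeps the change in amplitude hard to hear.
        frame[target_index] += strength if bit == 1 else -strength
    return embedded


def extract_bits(embedded_frames, original_frames, target_index, n_bits):
    """Recover bits from the sign of the coefficient difference."""
    pairs = list(zip(embedded_frames, original_frames))[:n_bits]
    return [1 if emb[target_index] - orig[target_index] > 0 else 0
            for emb, orig in pairs]
```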

For another example, the spread spectrum algorithm may include: converting the digital watermark into a binary sequence; performing spread spectrum processing on the binary sequence by using a pseudo-random noise sequence to obtain a spread spectrum sequence; and embedding the spread spectrum sequence into each frequency domain signal to obtain each embedded frequency domain signal. Spread spectrum technology is well developed in the field of communications; by using a pseudo-random noise sequence as a carrier, it disperses the information to be transmitted over a wider frequency band. In the application scenario of digital watermarking, embedding the digital watermark into the audio to be processed is equivalent to embedding noise into the audio to be processed, and because the binary sequence has undergone spread spectrum processing, the energy of the noise is dispersed across the frequency points, which reduces the influence on the listening experience. The technology for embedding a spread spectrum sequence into a frequency domain signal is mature and is not described here again.
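The spreading step, and the matching despreading used at extraction time, can be illustrated with a direct-sequence sketch. The seeded generator standing in for the pseudo-random noise sequence, the chip length, and the function names are assumptions; the embedding of the chips into the frequency domain signals is left out.

```python
import random


def pn_sequence(length, seed):
    """Pseudo-random +/-1 sequence; the seed acts as the shared key."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def spread(bits, pn):
    """Replace each bit by the PN sequence (bit 1) or its negation (bit 0)."""
    chips = []
    for bit in bits:
        symbol = 1 if bit == 1 else -1
        chips.extend(symbol * c for c in pn)
    return chips


def despread(chips, pn):
    """Correlate each chip block with the same PN sequence to recover bits."""
    n = len(pn)
    bits = []
    for start in range(0, len(chips), n):
        corr = sum(x * c for x, c in zip(chips[start:start + n], pn))
        bits.append(1 if corr > 0 else 0)
    return bits
```

Because each bit is spread over many chips, the correlation still recovers it when the chips are disturbed by moderate noise, which is the robustness property the paragraph above relies on.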

And S508, performing inverse processing of time-frequency transformation processing on each embedded frequency domain signal to obtain each embedded frame audio.

The inverse of the time-frequency transform process is used to convert each embedded framed audio from the frequency domain to the time domain. For example, when DCT is selected as the time-frequency Transform processing, the Inverse of the time-frequency Transform processing is IDCT (Inverse Discrete Cosine Transform). When DFT is selected for the time-frequency Transform processing, the Inverse of the time-frequency Transform processing is IDFT (Inverse Discrete Fourier Transform).
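As an illustration of a transform and its inverse, the sketch below implements an unnormalized DCT-II and the correspondingly scaled inverse in plain Python. The direct O(n^2) formulas are used for clarity; a practical implementation would presumably call a library routine instead.

```python
import math


def dct(frame):
    """Unnormalized DCT-II: X[k] = sum_t x[t] * cos(pi * (t + 0.5) * k / n)."""
    n = len(frame)
    return [sum(frame[t] * math.cos(math.pi * (t + 0.5) * k / n)
                for t in range(n))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct(): x[t] = (2/n) * (X[0]/2 + sum_{k>=1} X[k] * cos(...))."""
    n = len(coeffs)
    return [(2.0 / n) * (coeffs[0] / 2.0
                         + sum(coeffs[k] * math.cos(math.pi * (t + 0.5) * k / n)
                               for k in range(1, n)))
            for t in range(n)]
```

Applying `idct(dct(frame))` returns the original frame up to rounding error, so any difference in the embedded framed audio comes only from the adjusted coefficients.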

And S510, performing superposition processing on each embedded frame audio to obtain a target audio.

The superposition processing restores the embedded framed audios to one continuous audio, namely the target audio. After each embedded framed audio is obtained, the embedded framed audios need to be reconstructed into a complete target audio. An algorithm with a good signal reconstruction effect for audio signals is the OLA (Overlap-Add) algorithm. At present, many audio processing software packages provide functions for superposition processing. For example, the overlap function in Matlab can be used to perform superposition processing on audio; its selectable parameters include window length, overlap rate, and the like, and the same parameters as those used in the encode function in step S502 may be selected.
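A minimal overlap-add sketch is shown below; the function name and the list-based representation are assumptions. It relies on the frames having been produced with the same window length and hop as in the framing step.

```python
def overlap_add(frames, hop):
    """Sum frames into one signal, placing frame i at offset i * hop."""
    if not frames:
        return []
    window_length = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + window_length)
    for i, frame in enumerate(frames):
        for n, sample in enumerate(frame):
            out[i * hop + n] += sample
    return out
```

With a periodic Hann window and 50% overlap, the shifted windows sum to one, so interior samples of the original signal are recovered without extra normalization; the first and last half-windows are only covered by a single frame and stay attenuated.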

In one embodiment, referring to fig. 6, the step of obtaining the digital watermark according to the copyright information index feature includes S602 to S606.

S602, coding the copyright information index features to obtain two-dimensional watermark images corresponding to the copyright information index features.

And S604, simplifying the two-dimensional watermark image.

And S606, taking the simplified two-dimensional watermark image as the digital watermark.

It will be appreciated that the graphics in the two-dimensional watermark image may include both encoded graphics and functional graphics. The encoding patterns include patterns corresponding to the format information, patterns corresponding to the data codeword and the error correction codeword, and patterns corresponding to the version information. And the format information is used for reflecting the format of the embedded information in the two-dimensional watermark image. The data code word is used for extracting embedded information in the two-dimensional watermark image. The error correction code words are used for correcting errors of the two-dimensional watermark images. The version information is used to reflect the version used by the two-dimensional watermark image. And the functional graphic is at least used for positioning the two-dimensional watermark image. The functional graph can help the scanning equipment to quickly position the two-dimensional watermark image in a scene of scanning the two-dimensional watermark image to obtain the embedded information in the two-dimensional watermark image, and the processing speed is increased. However, in this embodiment, the limitation of the watermark capacity of the digital watermark should be prioritized, and in order to further reduce the amount of information in the two-dimensional watermark image, the two-dimensional watermark image may be selected to be subjected to the simplification processing, and the target of the simplification processing may be an area that does not change depending on the embedded information in the two-dimensional watermark image. For example, the position and the pattern of the functional graphic in the two-dimensional watermark image are fixed and do not change with the embedded information, so that the functional graphic can be selected as a simplified object. 
After the simplification processing, the amount of information in the digital watermark is reduced, so the method is also applicable when the watermark capacity is small. Moreover, because the simplified object is an area that does not change with the embedded information, after the simplified digital watermark is extracted from the target audio it can be restored according to the position and pattern of the simplified object, so the digital watermark still has a good scanning effect.

In one embodiment, the step of simplifying the two-dimensional watermark image includes: setting the pixel points of a simplified area of the two-dimensional watermark image as blank pixel points to obtain the simplified two-dimensional watermark image, wherein the simplified area includes at least one of a positioning mark area, a correction mark area, a timing mark area, and a static area. Referring to fig. 7, taking the QR code as an example, the functional graphic includes the positioning mark area, the correction mark area, the timing mark area, and the static area. The positioning mark area is used for determining the size and the position of the two-dimensional watermark image. The correction mark area is used for determining the center of the two-dimensional watermark image. The timing mark area is used for determining the angle of the two-dimensional watermark image. The static area is used for distinguishing the boundary between the two-dimensional watermark image and its surroundings. The position and the pattern of the simplified area in the two-dimensional watermark image are fixed; for example, the positioning mark area is located at the upper left corner, the upper right corner, and the lower left corner of the two-dimensional watermark image and consists of three large nested black and white squares. Thus, the simplified area can be restored when necessary.
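The blanking and restoring of the positioning mark area can be sketched for a QR module matrix as follows. Only the three 7x7 positioning marks are handled; treating 1 as a dark module and 0 as a blank pixel point, and the function names, are assumptions for illustration.

```python
# 7x7 positioning mark: dark outer ring, light ring, dark 3x3 centre.
FINDER = [[1 if (r in (0, 6) or c in (0, 6) or (2 <= r <= 4 and 2 <= c <= 4))
           else 0
           for c in range(7)]
          for r in range(7)]


def finder_origins(size):
    """Top-left corners of the marks: upper left, upper right, lower left."""
    return [(0, 0), (0, size - 7), (size - 7, 0)]


def simplify(image):
    """Return a copy with the positioning mark areas set to blank (0)."""
    out = [row[:] for row in image]
    for r0, c0 in finder_origins(len(image)):
        for r in range(7):
            for c in range(7):
                out[r0 + r][c0 + c] = 0
    return out


def restore(simplified):
    """Return a copy with the fixed positioning mark pattern redrawn."""
    out = [row[:] for row in simplified]
    for r0, c0 in finder_origins(len(simplified)):
        for r in range(7):
            for c in range(7):
                out[r0 + r][c0 + c] = FINDER[r][c]
    return out
```

Since the marks carry no embedded information, `restore(simplify(image))` reproduces the original matrix exactly, while the blanked areas contribute nothing to the watermark payload.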

The embodiment of the invention also provides an audio copyright reading method, which can be applied to the computer device 102 in the figure and can also be applied to the server 104. Referring to fig. 8, steps S802 to S808 are included.

S802, acquiring the target audio.

It can be understood that the target audio is audio in which a digital watermark is embedded, and the information embedded in the digital watermark is the copyright information index feature. The target audio may be obtained by the above audio processing method, or may be obtained by other processing methods.

And S804, extracting the digital watermark from the target audio.

And S806, obtaining the copyright information index characteristic according to the digital watermark.

The digital watermark extraction algorithm is the inverse of the digital watermark embedding algorithm, and the extraction algorithm is selected to match the embedding algorithm that was used when the target audio was produced.

For example, when a digital watermark obtained from a two-dimensional watermark image has been embedded by mapping it onto a spectrogram, the target audio is converted into the target spectrogram, on which the digital watermark is displayed. To convert the audio into a spectrogram, the spectrogram features can be obtained through a Short-Time Fourier Transform (STFT), and the target spectrogram is then drawn according to the spectrogram features.

For another example, when the digital watermark has been embedded in the audio to be processed by a transform domain algorithm, time-frequency transformation processing is performed on both the target audio and the audio to be processed corresponding to the target audio, and the frequency coefficients of the frequency domain signals of the two audios are compared. The target frequency coefficient is determined from the frequency coefficients that differ, a binary sequence corresponding to the digital watermark is extracted from the target frequency coefficient of each frequency domain signal, and the binary sequence is restored to the digital watermark.

For another example, when the digital watermark has been embedded in the audio to be processed by a spread spectrum algorithm, time-frequency transformation processing is performed on the target audio to obtain a plurality of frequency domain signals, the embedding points of the elements of the spread spectrum sequence are determined according to the embedding mechanism used when the spread spectrum sequence was embedded into each frequency domain signal, and all elements of the spread spectrum sequence are extracted from the embedding points of each frequency domain signal. After the spread spectrum sequence is reconstructed, the binary sequence corresponding to the digital watermark is obtained from the spread spectrum sequence by using the same pseudo-random sequence that was used when the digital watermark was embedded, and the binary sequence is restored to the digital watermark.

If the copyright information index features are encrypted in the process of generating the digital watermark according to the copyright information index features, the copyright information index features can be obtained through decryption processing corresponding to encryption processing after information is extracted from the digital watermark.

And S808, indexing the copyright information of the target audio in a copyright database according to the copyright information indexing characteristics.

Specifically, after the copyright information index features are extracted from the digital watermark, a suitable index mode can be selected according to the form of the copyright information index features. For example, if the copyright information index feature is a character string, the character string may be input in a copyright query interface corresponding to the copyright database to index the copyright information corresponding to the character string. If the copyright information index feature is a URL callback address, the corresponding copyright information can be obtained by accessing the address. The mode of accessing the address can also be selected according to the form of the copyright information index features, for example, if the URL callback address extracted from the digital watermark is in a text form, the URL callback address can be input in a browser for accessing. If the URL callback address extracted from the digital watermark is in a two-dimensional watermark image form, the two-dimensional watermark image can be scanned by two-dimensional code scanning equipment (such as various terminals), and the URL callback address is automatically skipped to after scanning.

Based on the audio copyright reading method in the embodiment, the copyright information index feature can be extracted from the target audio taking the digital watermark embedded information as the copyright information index feature, and then the copyright information corresponding to the target audio is read according to the copyright information index feature. Since the copyright information index feature has a smaller information amount compared with copyright information, the digital watermark generation is not limited by the watermark capacity. In addition, the copyright information containing a large amount of information is placed in the copyright database, and is called from the copyright database according to the copyright information index features by using the method in the embodiment when needed, so that complete and comprehensive copyright information can be provided, and the copyright protection effect is ensured.

In one embodiment, the copyright database is deployed on a target server, and the target server is in a network isolation environment. The audio copyright reading method in this embodiment may be applied to the target server itself, to a computer device or server in the same network isolation environment as the target server, or to a server or computer device outside the network isolation environment in which the target server is located. If the audio copyright reading method is applied to a subject other than the target server, it needs to be determined whether the requester is permitted to obtain the requested copyright information. That is, the step of indexing the copyright information of the target audio in the copyright database according to the copyright information index feature includes: sending a copyright information acquisition request to the target server, wherein the copyright information acquisition request includes the copyright information index feature and the requester's own network address. If the copyright information index feature is a character string, it is input in a query interface corresponding to the copyright database so as to send the copyright information acquisition request; after receiving the request, the target server can extract from the data packet both the network address from which the data packet came and the copyright information index feature identifying the copyright information to be indexed. If the copyright information index feature is a URL callback address, then when the URL callback address is accessed, the target server can obtain the network address of the accessing device and determine which copyright information the URL callback address corresponds to. 
Based on this, the target server can verify whether the requester is qualified to obtain the copyright information according to the network address.

In one embodiment, the target server responds to a request from a requester with an intranet address and feeds back the copyright information corresponding to the copyright information index feature to the requester, and the target server denies a request from a requester with an external network address. In another embodiment, the copyright information includes public copyright information and secret copyright information. The target server responds to a request from a requester with an intranet address and feeds back both the public copyright information and the secret copyright information corresponding to the copyright information index feature to the requester. The target server responds to a request from a requester with an external network address and feeds back only the public copyright information corresponding to the copyright information index feature to the requester.
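The second embodiment above can be sketched as a small request handler. Everything concrete here is hypothetical: the in-memory `COPYRIGHT_DB`, the index feature "IDX-0001", the record fields, and in particular the assumption that an "intranet address" can be approximated by the private address ranges recognized by Python's `ipaddress` module.

```python
import ipaddress

# Hypothetical copyright database keyed by copyright information index feature.
COPYRIGHT_DB = {
    "IDX-0001": {
        "public": {"title": "Example Song", "owner": "Example Label"},
        "secret": {"contract_id": "C-42"},
    },
}


def handle_request(index_feature, requester_address, db=COPYRIGHT_DB):
    """Return the copyright information the requester is allowed to see."""
    record = db.get(index_feature)
    if record is None:
        return None  # unknown index feature
    response = {"public": record["public"]}
    # Assumption: private-range addresses are treated as intranet requesters.
    if ipaddress.ip_address(requester_address).is_private:
        response["secret"] = record["secret"]
    return response
```

A requester at 10.0.0.5 would receive both parts of the record, while a requester at 8.8.8.8 would receive only the public part. The first embodiment corresponds to returning nothing at all for external addresses.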

It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown in sequence as indicated by the arrows, the steps are not necessarily performed in sequence as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least a part of the steps in the flowcharts related to the embodiments described above may include multiple steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, and the execution order of the steps or stages is not necessarily sequential, but may be rotated or alternated with other steps or at least a part of the steps or stages in other steps.

In a third aspect, an embodiment of the present invention provides a computer device, where the computer device may be a server or a terminal, and an internal structure diagram of the computer device may be as shown in fig. 9. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. When the audio copyright reading method is executed, the nonvolatile storage medium can also store a database, and the database of the computer device is used for storing copyright information. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program realizes an audio processing method when being executed by a processor, namely the computer program realizes: acquiring copyright information index features corresponding to copyright information of audio to be processed, wherein the copyright information index features are used for indexing the copyright information of the audio to be processed in a copyright database; obtaining a digital watermark according to the copyright information index characteristics; and embedding the digital watermark into the audio to be processed to obtain the target audio.

Based on the computer device in this embodiment, the digital watermark generated based on the copyright information index feature is embedded into the audio to be processed, and when the copyright of the audio to be processed needs to be read, the copyright information index feature in the digital watermark is extracted, that is, the copyright information corresponding to the audio to be processed can be read in the copyright database according to the copyright information index feature. Since the copyright information index feature has a smaller information amount compared with copyright information, the digital watermark generation is not limited by the watermark capacity. In addition, the copyright information containing a large amount of information is placed in the copyright database and is called from the copyright database according to the copyright information index features when needed, so that complete and comprehensive copyright information can be provided, and the copyright protection effect is ensured.

In one embodiment, the computer program when executed by a processor implements: coding the copyright information index features to obtain two-dimensional watermark images corresponding to the copyright information index features; and obtaining the digital watermark according to the two-dimensional watermark image.

In one embodiment, the computer program when executed by a processor implements: coding the copyright information index features to obtain two-dimensional watermark images corresponding to the copyright information index features; simplifying the two-dimensional watermark image; and obtaining the digital watermark according to the simplified two-dimensional watermark image.

In one embodiment, the computer program when executed by a processor implements: performing frame processing on the audio to be processed to obtain each frame audio; performing time-frequency transformation processing on each frame audio to obtain a frequency domain signal corresponding to each frame audio; embedding digital watermarks in the frequency domain signals to obtain embedded frequency domain signals; carrying out inverse processing of time-frequency transformation processing on each embedded frequency domain signal to obtain each embedded frame audio; and carrying out superposition processing on each embedded frame audio to obtain a target audio.

In one embodiment, the computer program when executed by the processor implements an audio copyright reading method, i.e. the computer program when executed by the processor implements: acquiring a target audio; extracting a digital watermark from the target audio; obtaining copyright information index characteristics according to the digital watermark; and indexing the copyright information of the target audio in a copyright database according to the copyright information index characteristic.

Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the audio processing method or the audio copyright reading method in any of the above embodiments.

It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others. The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum computing based data processing logic devices, and the like, without limitation.

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of these technical features that involves no contradiction should be considered to fall within the scope of this specification.

The above embodiments express only several implementations of the present application. Although their description is relatively specific and detailed, it should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims

1. An audio processing method, comprising:

acquiring copyright information index features corresponding to copyright information of audio to be processed, wherein the copyright information index features are used for indexing the copyright information of the audio to be processed in a copyright database;

obtaining a digital watermark according to the copyright information index features;

and embedding the digital watermark into the audio to be processed to obtain a target audio.
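By way of illustration only, the first two steps of claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the present application: the dictionary `copyright_db` stands in for the copyright database, and the names `register_copyright` and `index_to_bits`, as well as the choice of a truncated SHA-256 digest as the index feature, are hypothetical.

```python
import hashlib
from typing import Dict, List

# Hypothetical in-memory stand-in for the copyright database.
copyright_db: Dict[str, dict] = {}


def register_copyright(info: dict) -> str:
    """Store the full copyright record and return a short index feature."""
    canonical = repr(sorted(info.items())).encode("utf-8")
    # Assumption: an 8-hex-digit (32-bit) digest prefix serves as the index.
    index_feature = hashlib.sha256(canonical).hexdigest()[:8]
    copyright_db[index_feature] = info
    return index_feature


def index_to_bits(index_feature: str) -> List[int]:
    """Expand the hexadecimal index feature into watermark bits, MSB first."""
    n_bits = 4 * len(index_feature)
    value = int(index_feature, 16)
    return [(value >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]
```

In this sketch the watermark payload stays at 32 bits however large the stored record is, which reflects the point that the watermark carries only the index while the copyright information itself stays in the database.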

2. The audio processing method of claim 1, wherein the step of obtaining the digital watermark according to the copyright information index features comprises:

encoding the copyright information index features to obtain a two-dimensional watermark image corresponding to the copyright information index features;

and obtaining the digital watermark according to the two-dimensional watermark image.

3. The audio processing method of claim 2, wherein the step of obtaining the digital watermark according to the two-dimensional watermark image comprises:

simplifying the two-dimensional watermark image;

and taking the simplified two-dimensional watermark image as the digital watermark.

4. The audio processing method according to claim 3, wherein the step of simplifying the two-dimensional watermark image comprises:

setting pixel points of a simplified area of the two-dimensional watermark image as blank pixel points to obtain the two-dimensional watermark image after simplification processing, wherein the simplified area comprises at least one of a positioning mark area, a correction mark area, a timing mark area and a static area.
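By way of illustration only, the simplification of claim 4 can be sketched for the positioning mark areas. This is a minimal sketch under stated assumptions: the two-dimensional watermark image is taken to be a square matrix with 7 x 7 positioning marks in the top-left, top-right and bottom-left corners (as in a QR-code-like symbol), the value 0 denotes a blank pixel point, and the name `blank_positioning_marks` is hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def blank_positioning_marks(image: Matrix, size: int = 7) -> Matrix:
    """Return a copy of the image with three corner mark areas set to 0."""
    n = len(image)
    simplified = [row[:] for row in image]  # leave the input untouched
    # Assumed layout: top-left, top-right and bottom-left corners.
    corners = [(0, 0), (0, n - size), (n - size, 0)]
    for r0, c0 in corners:
        for r in range(r0, r0 + size):
            for c in range(c0, c0 + size):
                simplified[r][c] = 0
    return simplified
```

A reader that knows the assumed layout could redraw the blanked areas before decoding, so these pixels need not be carried in the watermark.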

5. The audio processing method according to any of claims 2-4, wherein the step of embedding the digital watermark in the audio to be processed comprises:

setting the digital watermark in a target frequency band of a preset spectrogram, wherein the target frequency band is a frequency band outside the perception range of the human ear;

converting the preset spectrogram into audio to obtain watermark audio;

and synthesizing the watermark audio and the audio to be processed to obtain target audio.

6. The audio processing method of claim 5, wherein a first coordinate axis of the preset spectrogram represents time and a second coordinate axis represents frequency, and the step of setting the digital watermark in the target frequency band of the preset spectrogram comprises:

determining a target coordinate corresponding to the target frequency band in the second coordinate axis direction, and determining a target area of the preset spectrogram according to the target coordinate; the coordinate of the target area in the direction of the second coordinate axis is larger than the target coordinate;

and setting the digital watermark in a target area of the preset spectrogram to obtain the spectrogram.
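By way of illustration only, the steps of claims 5 and 6 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the present application: the spectrogram is a list of frequency rows by time columns, the sampling rate is 48 kHz, each column lasts 480 samples (so each row spans 100 Hz), the conversion to audio uses simple additive sine synthesis, and the names `place_watermark`, `spectrogram_to_audio` and `mix` are hypothetical.

```python
import math
from typing import List

SAMPLE_RATE = 48_000  # assumed; must exceed twice the target frequency
FRAME_LEN = 480       # samples per spectrogram column, i.e. 100 Hz per row


def place_watermark(n_rows: int, n_cols: int, watermark: List[List[int]],
                    target_row: int) -> List[List[float]]:
    """Blank spectrogram with the watermark in rows above target_row."""
    spec = [[0.0] * n_cols for _ in range(n_rows)]
    for r, wm_row in enumerate(watermark):
        for c, pixel in enumerate(wm_row):
            spec[target_row + 1 + r][c] = float(pixel)
    return spec


def spectrogram_to_audio(spec: List[List[float]],
                         gain: float = 0.01) -> List[float]:
    """Additive synthesis: one sine per non-zero row in each column."""
    audio = []
    for c in range(len(spec[0])):
        active = [(k, row[c]) for k, row in enumerate(spec) if row[c] != 0.0]
        for n in range(FRAME_LEN):
            audio.append(gain * sum(
                m * math.sin(2 * math.pi * k * n / FRAME_LEN)
                for k, m in active))
    return audio


def mix(host: List[float], watermark_audio: List[float]) -> List[float]:
    """Sample-wise sum of the two signals, zero-padding the shorter one."""
    n = max(len(host), len(watermark_audio))
    padded_host = host + [0.0] * (n - len(host))
    padded_wm = watermark_audio + [0.0] * (n - len(watermark_audio))
    return [h + w for h, w in zip(padded_host, padded_wm)]
```

With `target_row=200`, the target coordinate corresponds to 20 kHz under these assumptions, and the watermark occupies rows 201 and above, that is, coordinates larger than the target coordinate.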

7. The audio processing method according to any of claims 2-4, wherein the step of embedding the digital watermark in the audio to be processed comprises:

performing framing processing on the audio to be processed, and performing time-frequency transformation processing on each frame audio obtained after the framing processing to obtain a frequency domain signal of each frame audio;

embedding the digital watermark in each frequency domain signal to obtain each embedded frequency domain signal;

carrying out inverse processing of the time-frequency transformation processing on each embedded frequency domain signal to obtain each embedded frame audio;

and performing superposition processing on each embedded frame audio to obtain the target audio.
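By way of illustration only, the frame-wise pipeline of claim 7 can be sketched as follows. This is a minimal sketch under stated assumptions: the transform is a naive discrete Fourier transform, frames overlap by 50% under a periodic Hann window (whose overlapped copies sum to one), and the names `embed_by_frames` and `add_tone` are hypothetical. The embedding itself is passed in as a function; `add_tone` is only a placeholder for it.

```python
import cmath
import math
from typing import Callable, List

Spectrum = List[complex]


def dft(x: List[float]) -> Spectrum:
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts)
                for n in range(n_pts)) for k in range(n_pts)]


def idft(spectrum: Spectrum) -> List[float]:
    # Only the real part is kept; the embedding function is expected to
    # preserve conjugate symmetry so that the imaginary part is negligible.
    n_pts = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_pts)
                 for k in range(n_pts)) / n_pts).real for n in range(n_pts)]


def add_tone(spectrum: Spectrum, k: int = 2, delta: float = 0.5) -> Spectrum:
    """Placeholder embedding: raise bin k and its mirror bin by delta."""
    marked = list(spectrum)
    marked[k] += delta
    marked[-k] += delta  # mirror bin keeps the time-domain frame real
    return marked


def embed_by_frames(audio: List[float],
                    embed: Callable[[Spectrum], Spectrum],
                    frame_len: int = 16) -> List[float]:
    """Frame, transform, embed, inverse-transform and overlap-add."""
    hop = frame_len // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    out = [0.0] * len(audio)
    for start in range(0, len(audio) - frame_len + 1, hop):
        frame = [audio[start + n] * window[n] for n in range(frame_len)]
        for n, sample in enumerate(idft(embed(dft(frame)))):
            out[start + n] += sample  # superposition of overlapping frames
    return out
```

When the embedding function leaves the spectrum unchanged, the superposition reproduces the input away from the first and last half frame, which is a convenient check that the framing and its inverse are consistent.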

8. The audio processing method of claim 7, wherein the step of embedding the digital watermark in each of the frequency domain signals comprises:

converting the digital watermark into a binary sequence, and performing spread spectrum processing on the binary sequence by using a pseudo-random noise sequence to obtain a spread spectrum sequence;

and embedding the spread spectrum sequence into each frequency domain signal to obtain each embedded frequency domain signal.
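By way of illustration only, the spread spectrum processing of claim 8 can be sketched as follows. This is a minimal sketch assuming direct-sequence spreading with a seeded pseudo-random sequence of +1/-1 chips; the names `pn_sequence`, `spread` and `despread` are hypothetical, and the despreading step is included only to show how a reader holding the same seed could recover the bits.

```python
import random
from typing import List


def pn_sequence(length: int, seed: int) -> List[int]:
    """Pseudo-random noise sequence of +1/-1 chips, reproducible by seed."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def spread(bits: List[int], pn: List[int]) -> List[int]:
    """Map each bit to +1/-1 and multiply it by every chip of the PN sequence."""
    return [(1 if bit else -1) * chip for bit in bits for chip in pn]


def despread(chips: List[float], pn: List[int]) -> List[int]:
    """Correlate each chip block with the PN sequence and take the sign."""
    n = len(pn)
    bits = []
    for i in range(0, len(chips), n):
        corr = sum(c * p for c, p in zip(chips[i:i + n], pn))
        bits.append(1 if corr > 0 else 0)
    return bits
```

Because each bit is spread over many chips, a few corrupted chips change the correlation only slightly, so the sign of the correlation still recovers the bit.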

9. The audio processing method of claim 7, wherein the step of embedding the digital watermark in each of the frequency domain signals comprises:

converting the digital watermark into a binary sequence;

determining a target frequency coefficient in the frequency coefficients of the frequency components of the frequency domain signal;

and adjusting the target frequency coefficient of each frequency domain signal by using the binary sequence so as to embed the digital watermark in each frequency domain signal.
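By way of illustration only, the coefficient adjustment of claim 9 can be sketched as follows. The claim does not prescribe how the target frequency coefficient is adjusted; this sketch assumes quantization of the coefficient magnitude to an even or odd multiple of a step (quantization index modulation), one bit per frame. The names `embed_bit`, `extract_bit` and `embed_sequence` and the step value are hypothetical.

```python
import cmath
from typing import List


def embed_bit(coeff: complex, bit: int, step: float = 0.5) -> complex:
    """Quantize |coeff| to an even (bit 0) or odd (bit 1) multiple of step."""
    magnitude, phase = cmath.polar(coeff)
    q = round(magnitude / step)
    if q % 2 != bit:
        q += 1
    return cmath.rect(q * step, phase)  # phase is left unchanged


def extract_bit(coeff: complex, step: float = 0.5) -> int:
    """Read the bit back from the parity of the quantized magnitude."""
    return round(abs(coeff) / step) % 2


def embed_sequence(spectra: List[List[complex]], bits: List[int],
                   target_bin: int, step: float = 0.5) -> List[List[complex]]:
    """Embed one bit per frame in the target bin, cycling through the bits."""
    marked = []
    for i, spectrum in enumerate(spectra):
        s = list(spectrum)
        s[target_bin] = embed_bit(s[target_bin], bits[i % len(bits)], step)
        s[-target_bin] = s[target_bin].conjugate()  # keep the frame real
        marked.append(s)
    return marked
```

In this sketch the bit survives any disturbance that moves the magnitude by less than half a step, so a larger step trades audibility for robustness.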

10. The audio processing method according to any one of claims 1 to 4, wherein the copyright database is configured in a target server, and the target server is in a network isolated environment.

11. The audio processing method according to any one of claims 1 to 4, wherein when actual copyright information changes, the copyright information in the copyright database is updated to the changed actual copyright information.

12. An audio copyright reading method, comprising:

acquiring a target audio;

extracting a digital watermark from the target audio;

obtaining copyright information index features according to the digital watermark;

and indexing the copyright information of the target audio in a copyright database according to the copyright information index features.
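By way of illustration only, the last two steps of claim 12 can be sketched as follows. This is a minimal sketch assuming that the watermark has already been extracted as a list of bits, that the index feature is a hexadecimal string, and that a dictionary stands in for the copyright database; the names `bits_to_index` and `lookup_copyright` and the sample record are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the copyright database, keyed by index feature.
copyright_db: Dict[str, dict] = {
    "1a2b3c4d": {"title": "Example Song", "owner": "Example Label"},
}


def bits_to_index(bits: List[int]) -> str:
    """Pack watermark bits (MSB first) into a hexadecimal index feature."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    width = len(bits) // 4
    return format(value, "0{}x".format(width))


def lookup_copyright(bits: List[int], db: Dict[str, dict]) -> Optional[dict]:
    """Return the indexed copyright record, or None if the index is unknown."""
    return db.get(bits_to_index(bits))
```

Since the record is resolved at reading time, an update to the database entry is reflected without re-embedding a watermark in the audio.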

13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 11 or the steps of the method of claim 12.

14. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 11 or the steps of the method of claim 12.